    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ducrocq et al. present research exploring the genetic link between simple multicellular group formation (ace2Δ/ace2Δ) and its interaction with cell-cycle progression mutants (e.g., cln3Δ/cln3Δ), demonstrating that this combination can provide fitness benefits during fluctuating resource conditions, resulting in a rapid increase in the fraction of multicellular cell-cycle mutants over unicellular yeast without selection for multicellular size. Because both the multicellular phenotype and the regulatory link enabling faster escape from the stationary phase are controlled by the Ace2 transcription factor, this work demonstrates that multicellularity can arise as a side-effect of a completely independent fitness advantage unrelated to the benefits of group formation itself. As a "passenger phenotype," multicellularity could thus emerge for other selective reasons, potentially facilitating a later transition to more entrenched multicellularity if novel conditions arise where group formation becomes directly beneficial.

      Strengths:

      This work is novel and exciting for research exploring the very first steps of the transition from unicellularity to simple multicellularity. This is particularly significant because the formation of multicellular groups is almost always assumed to come at a cell-level fitness cost due to reduced reproductive fitness compared to remaining unicellular. This cell-level fitness cost generally needs to be outweighed by the benefits of multicellular group formation (e.g., large size escaping predation) for the multicellular phenotype to be stable, which is true for a large number of cases studied in the literature, where the multicellular phenotype can only evolve over unicellular competitors under strong selection for multicellular groups. However, this study presents an interesting case of a genetic and environmental condition under which individual cells (forming simple multicellular clusters) can actually have higher reproductive fitness than unicellular yeast. This demonstrates that the assumed cost at the single-cell level does not always apply. In summary, this work represents a unique example contrary to common assumptions regarding the costs of multicellular phenotypes, showing that simple multicellular phenotypes can evolve and remain stable without requiring strong selection for multicellular size or other benefits of group formation.

      The claims and interpretation of the results align well with the data presented. This is due to the careful and straightforward experimental design testing predictions with a clear, stepwise methodology, ruling out alternative explanations and providing support for the proposed link between the mutations (ace2, cln3, and others), their impact on faster exit from quiescence, and thus earlier entry into reproduction in fresh media, resulting in higher fitness in the snowflake yeast phenotype compared to unicellular yeast.

      Weaknesses:

      The authors show that the same multicellular phenotype with higher cell-level fitness due to faster exit from the stationary phase can also be observed with alleles found at other loci in non-laboratory yeast strains, implying that the results are likely not specific to a peculiar case genetically engineered in laboratory strains, but that similar phenotypes may be present in nature. However, this remains to be explored further by examining the natural ecology of commercially available or wild yeast isolates and their genomes. This is by no means a weakness of this study and, therefore, not necessarily something the current work can improve. It does mean, however, that the relevance of these findings for early multicellularity in yeast, and even more so for nascent multicellularity in distinct taxa, remains to be explored in the future. Until then, it is difficult to make strong claims about how applicable these results would be for non-laboratory yeast and other taxa. Regardless, this work does its part by representing a very exciting finding.

      Reviewer #2 (Public review):

      Summary:

      Here, the authors attempt to demonstrate that a simple model of multicellularity - snowflake yeast - exhibits key ecologically relevant changes in the regulation of the cell cycle. By examining the effects of the ace2 mutation in environments where multicellularity is not directly selected for or against, and combining it with mutations in key cell cycle regulators, they hope to show that mutations driving simple multicellularity can be selectively favored due to their effects on the release from quiescence rather than their effects on multicellularity itself.

      Strengths:

      The experiments performed are extensive and thorough. The yeast genotypes examined are judiciously chosen, so as to map out a functional model of the relationship between alterations to cell cycle control and changes to multicellularity phenotypes. Multiple possible interactions are examined, with the causal link and model of the relationship between the multicellular passenger phenotype and the selectable quiescence-release phenotype being well-supported. There are extensive controls demonstrating the separation between the 'passenger' multicellular phenotype and the cell cycle regulation phenotypes examined, including haploid/diploid strains with different multicellular phenotypes but similar cell cycle regulation phenotypes, and phenocopy strains in which downstream enzymes are deleted rather than key central regulators.

      Weaknesses:

      My only concerns about these results relate to the focus on selection on cell cycle control being examined in a model of multicellularity with key core cell cycle mutations rather than in a wild-type background, as this is a somewhat artificial system.

      I believe, however, that the authors convincingly make their case that this work on the multicellular phenotypes of yeast represents a potent proof-of-concept that simple multicellularity can be driven into existence or selected for as a passenger phenotype due to pleiotropic effects of mutations under selection from real-world ecological pressures. They are able to connect this phenotype back to known mutations of particular cell cycle regulators (RB) in other multicellular lineages and demonstrate that ecologically relevant changes to the cell cycle are connected to multicellular phenotypes. As a proof of concept of the connection between these phenotypes, rather than a study of a particular event in the past of a living lineage, it makes a strong case.

      A longstanding question in the field of multicellularity is the selective pressures that can drive simple multicellularity into existence and then act on simple multicells to drive their increased size and complexity. This work brings to the table tangible evidence of the possibility that, instead of being selected for on its own, simple multicellularity can be a side-effect of selection on other key phenotypes.

      This separates the question of the origins of multicellularity and the forces that drive its further evolution. This separation can reframe how the field is studied, especially in the context of the apparent dichotomy between dozens of origins of 'simple' multicellularity across the tree of life and a few origins of 'complex' multicellularity in the history of Earth. Especially in light of other evidence that multicellularity is connected to changes in cell cycle regulation, I believe that this is an important insight that will alter the way we think about the origins of this key evolutionary transition.

      We thank the reviewers for their insightful comments on our work.

      We agree with reviewer #1 that further experiments would be needed to determine how the observations made on lab strains apply to yeast in various ecological conditions, particularly in the wild. Here we provide a proof of principle that selection for multicellularity can arise as a side effect. This obviously does not prove that it took place during yeast evolution, but we would like to emphasize that resource fluctuations are very common in ecological conditions, making it highly likely that the environmental conditions necessary for the selection of the side effects described have arisen.

      We agree with reviewer #2 that our work on yeast strains is “somewhat artificial”, as is often the case with model organisms under laboratory conditions. Importantly, though, we showed that the effect found with the cln3 knock-out mutation can be phenocopied by overexpression of WHI5 (encoding the yeast equivalent of Rb). We propose that variations in the levels of cell cycle regulators during evolution may have played a role in the selection of multicellularity as a side effect. We agree that this is merely a hypothesis to explain the selection of multicellularity (just like predator escape) and that there is no direct evidence that this occurred in the history of the lineage. Nevertheless, our work provides first evidence that such selection of multicellularity as a side effect is possible, and gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned in my public review, I very much appreciate this work, its interpretation for early multicellularity as an example opposite to the assumed cost of multicellular phenotypes, and the robust design behind the premise and claims. Therefore, my suggestions below are mostly aimed at improving the readability and data presentation.

      (1) In the abstract, Lines 24-27 (the last sentence): This statement is worded too generally and therefore reads as too strong. I think the authors' work provides an example that multicellularity itself does not need to be beneficial all the time - this is really exciting and makes sense! However, there is a substantial body of work showing the origin and maintenance of multicellularity for its direct benefits. Relative to that body of work, this represents a special case, and therefore, while we should definitely reconsider the view that "multicellularity always comes at a cell-level fitness cost," we cannot overgeneralize these findings. Please consider reframing this statement.

      Done, now line 25 (addition of “in some cases”)

      (2) Line 48 (Introduction): "This mostly concerns two major regulators, RB and Cyclin D." Which organisms are you referring to? Please specify.

      Done.

      (3) In the Introduction, there are at least three sentences that need citations: L57-58, L59-60, and L65. For instance, I do not know what makes CLN3 the yeast functional equivalent of RB, and I wanted to verify this claim, but no references are cited. Please ensure citations are provided throughout the manuscript.

      Done: references 11, 12, and 13 were added.

      (4) This is my main request regarding data collection and presentation. The authors share some microscopy images of mutant strains in Figure 2 for different purposes (e.g., Figure 2B compares the fraction of budded cells between two genotypes). However, I would appreciate seeing a collected microscopy figure showcasing the phenotypes of all genotypes that went into competition experiments, including the planktonic (WT lab strain) yeast, either where they appear or in a supplementary figure, all presented with the same magnification and scale to make them comparable. Because cell size, shape, and multicellular phenotype are all key aspects of the competition experiments, being able to see all those genotypes/phenotypes would prepare the reader to make predictions about the fitness assays and other experiments.

      Done: Supplementary Figure 1B-E was added.

      (5) Related to my previous point, I would appreciate seeing cell size measurements for the different genotypes (both single cells of planktonic genotypes and single cells forming multicellular clusters). Cell size is a key trait that directly impacts the results shown in the paper, and summary statistics comparing them would be helpful for interpreting the results.

      Done: Supplementary Figure 1F was added.

      (6) In competition experiments, the authors mix unicellular and multicellular yeast clusters at 50/50 and measure the fraction of a phenotype of interest (usually the % of snowflake). It took me a while to understand what is being counted under the "% snowflake yeast" category. This is because, while each cell in unicellular yeast should be counted as one unit, one can count a snowflake yeast composed of 50 cells as 50 units or as 1 unit. Please clearly state what is being counted for the Y-axis labeled "% of snowflake yeast" (or relabel those Y-axes in plots to make this clear).

      Done: Added in figure legend 1A and Y-axes of competition figures
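
      The ambiguity raised in point (6) can be illustrated with a minimal sketch. The function name `snowflake_fraction` and the example numbers are hypothetical and are not taken from the manuscript; the sketch only shows how far the two counting conventions can diverge, and does not indicate which convention the authors adopted.

```python
def snowflake_fraction(unicell_count, cluster_sizes, count_cells=True):
    """Return the percentage of snowflake yeast in a mixed culture.

    unicell_count: number of unicellular (planktonic) cells.
    cluster_sizes: list with the number of cells in each snowflake cluster.
    count_cells:   if True, every cell inside a cluster counts as one unit;
                   if False, each cluster counts as a single unit.
    (Hypothetical helper for illustration only.)
    """
    if count_cells:
        snowflake_units = sum(cluster_sizes)
    else:
        snowflake_units = len(cluster_sizes)
    total = unicell_count + snowflake_units
    if total == 0:
        raise ValueError("culture contains no cells")
    return 100.0 * snowflake_units / total


# 50 unicellular cells mixed with a single 50-cell cluster
by_cells = snowflake_fraction(50, [50], count_cells=True)
by_clusters = snowflake_fraction(50, [50], count_cells=False)
print(f"counting cells: {by_cells:.1f}%, counting clusters: {by_clusters:.1f}%")
```

      With these illustrative numbers, the same culture is half snowflake when cells are counted and only about 2% snowflake when clusters are counted, which is why the Y-axis label needs to state the unit.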

      (7) I recommend editing the genotype labels in figures (see, for instance, Figure 1B, C, D). In Figure 1B, the bars are labeled as "CLN3/CLN3 co-culture" or "cln3Δ/cln3Δ co-culture," etc. These are actually co-cultures of SF vs. PK (with or without a CLN3 copy). Please consider using more representative labels that will be easier for readers to understand.

      Done: this has been changed in all concerned figures

      (8) In the Results, L225, you begin referring to AMN1-368D as AMN1. I suggest using the full allelic form throughout the text so it will be clear each time that you are referring to that specific allele, as I was confused about whether you were discussing the allele or the gene AMN1 itself.

      This has been changed throughout the text.

      (9) Discussion, Lines 250-252, states that this is a "situation that is likely to happen very often under ecological conditions." Are there any examples you can cite?

      Done, as also requested by reviewer #2 (now lines 256-257).

      (10) Lines 272-275 contain a strong, general statement suggesting that co-evolution of cell cycle regulation and multicellularity could be more general (which is acceptable as speculation). However, the suggestion that this co-evolution could have "started very early in the evolution of eukaryotic cells" is too speculative. I would recommend sticking with the alternative, suggesting that the link between the two phenotypes may be a case of convergent evolution.

      Done

      (11) Lines 278-279 are both vague and too bold. The text mentions a link between cancer and multicellularity and then extends this link through cell cycle regulators. Without explaining the connection between cancer and multicellularity and then trying to link it to cell cycle regulators, all in a few words without background, this sentence is too vague. Please consider deleting this or spending more time clearly explaining the link, which would at best still be speculative.

      These speculative sentences were removed.

      (12) First, I wanted to note that I highlighted Lines 284-287, as this passage is clearly written and provides a nice argument. I also wonder if you could mention that your work shows simple multicellular cluster formation should not always come at a cost, contrary to the general assumption in the literature, and add a few citations to support that claim. This would highlight how significant this work is within the broader multicellularity literature.

      Changed in the Discussion (now lines 242-244, with additional references 30 and 31).

      (13) I recommend labeling the genotype of your "quintuple mutant" in Figure 3. You can refer to it as the quintuple mutant in the text, but I had to go back and forth to see what those mutations were when trying to think about potential genetic interactions. Even the legend of Figure 3 does not specify the genotype and refers to it only as the "quintuple mutant."

      Now explicitly stated in the title of the figure

      Reviewer #2 (Recommendations for the authors):

      I find the presented research to be of high quality, with very important implications. I have suggestions for improvement of the manuscript, but they are largely stylistic, with one paper that I believe deserves citation regarding the proteins involved. I see little need for additional experiments or analysis, just a clearer description of the results and their significance.

      (1) Line 62: Yeast CLN3 definitely performs the same role as cyclin D in the cell cycle, but has an unclear phylogenetic relationship with the rest of the cyclins. See Cross, Buchler, & Skotheim 2011 ("Evolution of networks and sequences in eukaryotic cell cycle control"). This reference also covers the functional relationship between RB and Whi5, referred to in nearby sentences, as does Medina, Walsh, and Buchler 2019 ("Evolutionary innovation, fungal cell biology, and the lateral gene transfer of a viral KilA-N domain").

      The reference has been added

      (2) Line 69: Is the question whether the evolution of G1/S regulation favors multicellularity, or whether the two are connected such that the evolution of one can affect the other?

      It is clearly the first of the two questions.

      (3) Line 73: Comma after Ace2.

      Done

      (4) Line 76: It would be clearer to specify that snowflake and ACE2 yeast were co-cultured without settling selection or other selection that explicitly favors multicellularity, unlike in experiments where multicellular evolution is observed, as in Ratcliff publications.

      This is now specified.

      (5) Line 80: Specify which phenotypes are observed for ace2 mutants, specifically both the multicellularity and the release from quiescence.

      Done

      (6) Line 146: This observation should be noted as another indication that the multicellular phenotype is not behind the selective pressure, because it is so different between unicells and multicells.

      Overall, you have very strong evidence that this is the case, and emphasizing this would benefit the paper!

      Done.

      (7) Line 151: specify that you are maintaining yeast in proliferation in coculture.

      Done.

      (8) Line 181: This is another key experiment showing that the multicellular phenotype is not the causal reason for the change in quiescence. It might make things clearer to bring all these confirmatory experiments together, particularly the haploids and the sonicated single cells.

      This is now clearly stated at line 195.

      (9) Line 225: The choice of referring to the non-laboratory strain as the 'AMN1' wild type default may be confusing to readers, who may treat the genetic background you are using as the ground truth wild type. I recommend throughout the paper always specifying the allele's amino acid to avoid any confusion.

      The genotype is now clearly presented throughout the text.

      (10) Line 238: I would continue to specify that the multicellular phenotype has no selective advantage, specifically when no selection for size is applied.

      See the added sentence at lines 242-244 (revised version).

      (11) Line 243: I would say that the evolution of cell cycle regulation may interact with the multicellular phenotype.

      This was changed (now line 248)

      (12) Line 244: Strike 'indeed' and the 'the' before AMN1 and ACE2.

      Done

      (13) Line 252: Suggest some ecological conditions under which quiescence exit is likely, such as boom and bust or moving from rotting fruit to rotting fruit.

      Done

      (14) Line 267: Are you suggesting that the specific genes AMN1 and ACE2 had particular effects on actual organisms in the past, or that it represents a broad pattern of evolution in which multicellularity could be more broadly related to exit from quiescence? I believe it is the latter, and I think that should be clearer.

      Modified as suggested

      (15) Line 280: In this paragraph, I think that the point being made could be slightly clearer - if I am not mistaken, you are making the distinction between the appearance of multicellularity and its refinement under selection, and that the former may be more common than previously believed, given this proof of concept. I think this can be made clearer. Furthermore, it is worth noting that all experiments that show effects of the multicellular phenotype are in mutant backgrounds, and explaining why this is still relevant to wild organisms. It might be taken by some as indicating that the multicellular phenotypes are not relevant to a wild population, but the connection to known RB mutations in known multicellular lineages and the fact that it is connected to a very key aspect of cell cycle regulation, I think, overcomes this issue, and this should be made clear.

      Our study reveals a genetic link between multicellularity and Whi5 and Cln3, two important G1/S cell cycle regulators. Similar genetic interactions have been observed in phylogenetically distant species, reinforcing the idea that the interplay between cell cycle regulation and multicellularity is a general feature and not a mere artifact of mutant background.

      The neutral fitness effect of multicellularity in wild-type backgrounds is of particular interest. Because multicellularity can be maintained as a side effect of selection on fundamental cellular processes, its neutral fitness effect may have provided “an evolutionary scheme” for its repeated emergence throughout the tree of life. As such, the "passenger selection" hypothesis fits well with the observations of phenotypic reversibility and facultative multicellularity, despite varying and specific selective pressures. Our work thus gives a framework to understand how multicellularity can persist in the wild, even when it is not the primary target of selection.

      (16) Line 314: What promoters are they driven by?

      Specified

      (17) Line 336: What was the culture volume, and the volume transferred?

      Specified

      (18) Line 362: How was the proportion of blue-stained cells scored? Manually, or with an imaging software cutoff?

      Specified

      (19) Figure 1: I think that the full genotypes of each strain should be specified, either in the legend or the key of the figure, rather than always specifying the ACE2 genotype and other mutations separately.

      Done as requested by reviewer #1

      (20) Figure 2E, 2F: Same as Figure 1, regarding genotypes.

      Done

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      The manuscript has several strengths, including a technically comprehensive approach that combines mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models, providing a rich and multifaceted dataset. Cross-species validation through the parallel use of mouse and human systems strengthens the generality of the observed phenotypes and increases relevance to human neurodevelopment.

      Consistent phenotypic observations across systems show that ARHGEF6 loss affects migration, neurite morphology, growth cone structure, and neuronal survival, supporting a coherent role in cytoskeletal regulation.

      There is clear evidence for developmental defects, including reduced interneuron numbers, increased apoptosis in the ganglionic eminences, and migration deficits, all well supported by quantitative analyses. Also, there is a high-quality electrophysiological characterization that demonstrates reduced firing in interneurons, providing a well-controlled functional phenotype.

      We thank the reviewer for their positive and thoughtful assessment of our manuscript. We appreciate their recognition of the technical breadth of the study, including the integration of mouse genetics, electrophysiology, live imaging in assembloids, and human organoid models. We are also grateful that the reviewer highlights the value of our cross-species approach, as a major goal of the study was to determine whether ARHGEF6 loss produces convergent developmental and cellular phenotypes in both mouse and human systems.

      Weaknesses:

      Despite the strengths mentioned above, the study has some conceptual and experimental weaknesses that reduce its impact. The mechanistic insight is limited, as the research does not directly establish how ARHGEF6 regulates downstream signaling pathways.

      We appreciate the reviewer’s constructive comment. We agree that, although our data establish a phenotypic link between ARHGEF6 loss and interneuron development, they do not directly dissect the molecular mechanisms underlying the observed defects. Our interpretation that the mutant phenotype involves dysregulation of cytoskeletal dynamics is based on the directly observed defects in actin polymerization and organization in neural progenitor cells and neuronal growth cones respectively, and is consistent with the abnormalities observed in neurite morphology and neuronal migration. This interpretation is further supported by the established role of Arhgef6 as a regulator of the small Rho GTPases Rac1 and Cdc42. Previous evidence shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Moreover, spine abnormalities in Arhgef6-knockdown ex vivo slice cultures can be rescued by expressing the active form of Pak3, a downstream effector of Rac1 and Cdc42 (Node-Langlois et al., 2006). Together, these findings support a model in which the loss of the protein affects development through cytoskeletal dysregulation, likely involving altered Rho GTPase signalling. We nevertheless agree that further experiments would be required to establish a direct causal relationship between ARHGEF6 loss, Rho GTPase activity, cytoskeletal dysregulation, and the interneuron phenotypes described here. We will therefore revise the manuscript to clarify that this mechanistic link remains an interpretation supported by our data and the literature, rather than a direct demonstration within the present study.

      Also, there is insufficient evidence for interneuron specificity; although the central claim is that ARHGEF6 plays a selective role in interneurons, the data do not adequately exclude the possibility that the observed effects reflect broader neuronal defects. The study lacks critical controls across cell types, as several phenotypes observed in organoids and progenitors, including apoptosis, reduced neuronal output, and altered morphology, could also affect multiple neuronal populations without being directly tested.

      We agree that the current data do not exclude the possibility of alterations in other neuronal lineages, specifically the excitatory lineage. With regard to this, we would like to emphasize that the investigation of excitatory cell phenotypes was beyond the scope of the present study, as this aspect has previously been examined by Ramakers et al., 2012 and Node-Langlois et al., 2006, particularly in the context of hippocampal pyramidal cells, which are among the few cell types showing consistent expression of the gene in the adult mouse brain (Allen Brain Atlas; Yao et al., 2021). In this context, it is interesting to note that, in Ramakers et al., 2012 (Figure S1), MAP2 immunostaining of hippocampal formations revealed comparable distribution and intensity of neuronal cell bodies and dendrites throughout the hippocampus of both wild-type and Arhgef6-KO animals. With regard to morphological maturation of excitatory cells, whereas we observe a simplification of interneuron morphology in both mouse and human models, Ramakers et al., 2012 reported increased dendritic arborization complexity in hippocampal pyramidal cells. With regard to migration, a direct comparison with excitatory neurons would be intrinsically difficult, as excitatory and inhibitory neurons undergo highly distinct migratory processes and are therefore not directly comparable. We greatly appreciate the reviewer’s comment, as it gives us the opportunity to better discuss the relationship between our findings and previous studies in the Discussion. We will revise the manuscript and avoid implying that the phenotype observed is exclusive to interneurons.

      Furthermore, the data are predominantly descriptive, with many results remaining correlative and failing to establish causal relationships.

      We agree that our study primarily establishes a phenotypic framework and does not fully resolve the causal hierarchy among altered survival, migration, cytoskeletal morphology, and intrinsic excitability. We will revise the manuscript to make this limitation explicit, avoiding statements that imply direct causality beyond the data presented.

      Some more comments:

      (1) Given that ARHGEF6 is a guanine nucleotide exchange factor for Rac1 and Cdc42, the absence of direct measurements of GTPase activity or downstream signaling represents a significant gap. The interpretation that the observed phenotypes are mediated through specific cytoskeletal pathways, therefore, remains inferential.

      We appreciate the comment. The interpretation that our phenotype involves dysregulated cytoskeletal dynamics is based on the observed defects in actin polymerization and F-actin organization in neuronal growth cones and is consistent with the abnormalities in neurite morphology and neuronal migration. We will explicitly state in the Discussion that, since we did not directly measure Rac1 and Cdc42 activity levels in our models, our hypothesis regarding the involvement of this molecular pathway in the establishment of the observed phenotype remains inferential, despite being supported by the current literature.

      (2) The manuscript repeatedly interprets the findings as interneuron-specific. However, several key observations are not demonstrated to be restricted to IN. Without direct comparison to excitatory neurons or other cell types, it is difficult to conclude that ARHGEF6 plays a selective role in interneurons rather than a more general role in neuronal development. The well-done analysis of the transcriptomic dataset is not sufficient to claim IN specificity. This issue is particularly important for the interpretation of the human organoid experiments, where reductions in SOX2⁺ progenitors and NEUN⁺ neurons, as well as increased apoptosis, could reflect global developmental defects. Similarly, in the mouse experiments, the reduction in GAD67⁺ cells is compelling, but it is not shown whether other neuronal populations are also affected.

      As previously mentioned, we understand the reviewer’s concern regarding the specificity of the observed phenotypes in interneurons and agree that the claims should be tempered. However, we believe that the proposed interpretation of the human organoid experiments should be reconsidered. The use of specifically ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of defects such as the reduction in inhibitory progenitors’ neuronal output, the increased apoptosis, and the morphological abnormalities of inhibitory neurons. We will acknowledge in the Discussion the limitations of the study with regard to assessing the cell-autonomous nature of the observed migration defects.

      (3) The study provides a strong phenotypic description but limited causal resolution. For example, migration defects, altered growth cone morphology, and reduced branching are all consistent with impaired cytoskeletal regulation, but the links between these phenotypes are not directly established. Likewise, while the electrophysiological data convincingly show reduced firing in interneurons, the connection between altered cytoskeletal dynamics and intrinsic excitability is not explored.

      The observed migration defects, altered growth-cone morphology, and reduced branching are consistent with impaired cytoskeletal regulation. However, we acknowledge that the mechanistic links among these phenotypes remain to be directly demonstrated. Similarly, although our electrophysiological data show reduced firing in ARHGEF6-KO interneurons, the present study does not provide direct evidence linking impaired excitability to altered cytoskeletal dynamics. In the latter case, we think that the underlying mechanisms should be further investigated at the subcellular level, particularly with respect to cytoskeleton-mediated intracellular trafficking and localization and distribution of ion channels. One limitation of the present study, which may have masked electrophysiological alterations associated with differences in membrane composition (current Figure S1D–H), is that different interneuron subtypes with distinct intrinsic properties were pooled together in the analysis. We will expand the Discussion to address these limitations.

      (4) Several aspects of data presentation could be improved. In multiple figures (e.g., Figure 1A, D; Figure 4 and Video S1, 2), the images are difficult to interpret due to high cellular density, limited magnification, or lack of clear annotation. In some cases, it is not fully clear how quantifications were performed or which regions were analyzed. Improving the visual clarity with arrows, boxes, and high-magnification inserts of the data would strengthen confidence in the conclusions.

      We would like to thank the reviewer for pointing this out. We agree that some images and videos would benefit from clearer annotation. In the revised manuscript, we will add high-magnification insets, arrows or boxes highlighting the relevant regions/cells, and clearer descriptions of the quantified regions. We will also improve legends and video labels to indicate genotype, region, and tracked cells.

      Reviewer #2 (Public review):

      The authors investigate the impact of the deletion of the small GTPase regulator ARHGEF6 on the development and physiology of interneurons. Using public databases, they first show that ARHGEF6 is enriched in interneurons or in areas that give rise to them, both in development and adulthood, in humans and mice. Using a complete KO mouse previously reported, and using a GAD67-GFP reporter mouse line, they show that in the adult mouse cortex and hippocampus, there is a notorious reduction in GFP+ cells. These mice show increased apoptotic cells at different timepoints and areas of the brain during development. In the developing cortex of ARHGEF6-KO mice, there are fewer IN in all layers of the developing cortex, and cells present processes not correctly oriented. IN from the hippocampus in culture show reduced excitability and impaired neurite branching. The authors then established isogenic hiPSCs lines to study ARHGEF6 deletion in human cells and differentiated ventral forebrain neurons, to find interneuron-related and non-related phenotypes. Most importantly, human interneurons grown in organoids show reduced branching and altered growth cone morphology. The authors claim that the novel interneuron phenotypes found in these models can explain, in part, the human intellectual disabilities associated with mutations in this protein. The study is well conducted and opens new avenues of research not only for the role of small GTPases regulation in early nervous system development, but also for how interneuron deficiencies impact a wider range of intellectual disability syndromes found in humans.

      We appreciate the reviewer’s positive evaluation of our manuscript and their recognition of this work’s potential to expand the focus of intellectual disability research on the development and function of the inhibitory system. We are particularly encouraged that the reviewer highlights the strength of our combined mouse and human cellular models, as well as the relevance of the interneuron-related phenotypes we identify across systems.

      However, most conclusions of the present version would be strengthened after considering the following comments:

      Major comments:

      (1) The reported biological processes evaluated at different developmental stages may be directly or indirectly related to ARHGEF6 function itself. As a model of a hereditary disease, full organism gene deletion is valid, since the human patients suffer from that condition as well. However, to investigate the roles of a protein, complete deletions may not be very accurate since they can give rise to phenotypes that are only indirectly related to the protein function itself. Most conclusions of the present manuscript should either be discussed in this regard or add evidence for a direct role of the protein. One such evidence is typically performed with acute knockdowns in culture, or in developing brains by in utero electroporation. For example, Figure 1C shows that the principal excitatory neurons in the hippocampus do not express ARHGEF6. However, most electrophysiological and behavioral evidence of defects in ARHGEF6-KO mice arises from evaluating these cells (Ramakers et al., 2012). I am not suggesting that either previous or actual evidence is wrong. But I believe readers would benefit from a clear distinction (or add caution notes) between a functional consequence of the deletion (that can be months away and in other cells than the actual molecular defect) and a true cell biological function of the protein under study. In favor of the authors, this is a concern with most conclusions derived from KO organisms.

      We agree with the reviewer that phenotypes observed in constitutive knockout models may, in some contexts, reflect indirect or compensatory consequences of long-term gene loss. Conditional and/or inducible knockout or knockdown approaches can certainly help dissect the nature of the observed defects and better define the effects of gene ablation at different developmental stages or in specific cell types. However, in the context of our study, it is important to note that the experiments performed in ventralized MGE-like organoids allowed us to assess the cell-autonomous nature of very early developmental defects in the inhibitory lineage, in isolation from other cell types. These defects include reduced neuronal output from inhibitory progenitors, increased apoptosis, and morphological abnormalities in inhibitory neurons. Therefore, the phenotypes reported here are less likely to reflect effects originating in, or indirectly caused by, cell types that do not express Arhgef6.

      With regard to Figure 1C, we state in the Results that “among excitatory populations, only CA3 pyramidal neurons and mossy cells exhibited expression levels comparable to those observed in inhibitory clusters (Figure 1D, Table S2),” thereby not neglecting the potential effect of the lack of a functional protein in these populations.

      (2) Figure 1E-G H I. All conclusions are made with a GAD67-GFP reporter, which is a very powerful and reliable tool for large-scale screening. All the conclusions of the paper would be strengthened if some immunohistochemical staining in the same areas of specific markers for interneurons would be added as supporting complementary evidence.

      We appreciate the insightful comment of the reviewer. Additional validation using established interneuronal markers will further strengthen the GAD67-eGFP analysis. We will perform complementary stainings (e.g., PVALB and CCK) and quantifications and include these data as a Supplementary Figure.

      (3) Cell death in development: It is surprising that the high amount of TUNEL staining during development does not translate into gross histological changes in the adult brain (studied elsewhere). Can authors discuss possible explanations?

      We appreciate the thoughtful consideration of our findings. We think that possible explanations include partial compensatory mechanisms during development, which may mitigate the long-term anatomical consequences of increased cell death. In addition, the phenotype may be restricted to specific neuronal populations or developmental windows, thereby producing functional alterations without necessarily resulting in overt macroanatomical defects. Thus, although increased developmental cell death may contribute to altered circuit assembly and neuronal output, it may not be sufficient to produce gross histological changes detectable at the adult brain level.

      (4) Section 4 (Figures 2F-J) - The authors present this staining as an analysis of migration. Normally, migration studies are performed with a "pulse-chase" paradigm, where a single cohort is labeled and then followed over time (normally by in utero electroporation of a fluorescent protein). Tissue is then fixed at different time points, and migration can be followed. On the contrary, the evidence is from a single point, in an experimental setting in which all Gad67 IN are stained, and hence, one cannot imply a defect in migration. The differences between WT and ARHGEF6-KO are obvious and interesting; it is just that they cannot be solely attributed to a problem in migration.

      Also, a true phenotype of migration in the current setting should have found that the cells that failed to migrate are accumulated in deeper layers. My impression is that the changes in IN per layer are easier explained by total cell number, rather than migration. Perhaps evaluating earlier timepoints could clarify this.

      We appreciate the reviewer’s suggestion to implement an additional time point in the in vivo migration analysis. Since an earlier in vivo time point would most likely not reveal migration-related defects, as most cells would still be confined to the ganglionic eminence (Liaci et al., 2022), we will include analyses performed at a later developmental time point as supplementary evidence. We will also revise the wording to clarify that the fixed-tissue data show altered distribution and orientation of GAD67-eGFP-positive interneurons, which are consistent with impaired migratory behavior when considered together with the in vitro live-imaging data. At the same time, we will acknowledge that reduced interneuron survival and/or neuronal output may also contribute to the observed phenotype.

      (5) It is known that ARHGEF6 deletion produces severe F-actin phenotypes in neurons. Have the authors confirmed in their hippocampal cultures GAD67 cells ALSO have these phenotypes? Stress fibers in somas, growth cones, and actin patches along neurites.

      We did not directly assess F-actin organization in GAD67-eGFP murine primary cultures. Direct analyses of F-actin organization, growth-cone morphology, and cytoskeletal organization were performed only in the human system. To further assess this phenotype, we will perform phalloidin staining on GAD67-eGFP brain sections to evaluate F-actin organization in interneurons in vivo.

      (6) Section 4. The authors present data for deficient migration of the GFP-labeled interneurons. Is it possible to assess, in the same sections, whether other cell types are also affected? Although the hypothesis that ARHGEF6 deletion will have an impact in IN is well rooted in expression data, by assessing other cell types, one can even include a positive control or evidence for a cell-autonomous phenotype.

      We thank the reviewer for their thoughtful suggestions. We agree that extending the analysis to additional cell types would provide further insight into the specificity of the phenotype; however, a comprehensive evaluation of all neuronal populations falls beyond the scope of this research. The use of ventralized MGE-like organoids enabled us to examine whether key defects were cell-autonomous, including the reduced neuronal output of inhibitory progenitors, increased apoptosis, and abnormal inhibitory-neuron morphology.

      (7) ARHGEF6 deletion has an important impact on organoid development (size, shape, etc). Have the authors analysed whether these organoids produced fewer interneurons?

      We would like to clarify that the organoids analyzed in the study are ventral MGE-like organoids and therefore the reduction in neuronal output (current Figure 4K) primarily reflects the ventral/interneuron lineage in this model.

      (8) In assembloids, the differences in migration parameters are very small between WT and ARHGEF6-KO, which reinforces that perhaps what is observed in the different layers of cortex during mouse development is likely not entirely due to migration, as concluded.

      We agree that the migration parameters in assembloids should not be interpreted in isolation. We will revise the text to emphasize that the reduction in the number of interneurons observed in the adult brains is part of a broader pattern that also includes altered neuronal output and reduced viability.

      (9) To properly weigh the present evidence -interneuron deficits- using the ARHGEF6-KO model, authors should include a deeper discussion in light of much work that has been done using these mice. How does the finding of a diminished IN population in the brain of these mice explain the large amount of electrophysiological and behavioral evidence produced before with these animals? Perhaps the most important work to discuss these aspects is the initial ARHGEF6-KO report by Ramakers and colleagues (2012), but there are others.

      We appreciate the reviewer’s emphasis on the importance of framing our findings within the broader context of the existing literature. We will expand the Discussion to better integrate previous work on ARHGEF6-KO mice. Specifically, we will discuss how reduced interneuron number and altered interneuronal function may contribute to previously reported electrophysiological and behavioral phenotypes, acting in concert with previously described alterations in excitatory neurons and synaptic plasticity (Ramakers et al., 2012).

      Minor comments:

      (1) Figure 1A. It looks clear that the GE shows the highest expression of ARHGEF6; however, the reader needs the reference levels where the log2 expression is calculated. What are the reference levels?

      We would like to thank the reviewer for pointing this out. We will clarify in the caption that the log2(RPKM+1) expression values are shown as absolute values and are not relative to a reference condition.

      (2) Have the authors compared the number of GAD67-eGFP cells in the hippocampal cultures between WT and ARHGEF6-KO mice?

      We did not rely on total GAD67-eGFP counts in dissociated hippocampal cultures because differences could reflect initial plating composition, survival, and maturation. In our experience, the MGE-like organoid system provides a more controlled in vitro context to assess neuronal output in the ventral lineage.

      (3) Section 3, as a caution note, authors should mention that it is not possible to know from the evidence provided which cells are dying.

      We agree with the reviewer and will add a cautionary statement noting that TUNEL staining alone does not identify the precise dying cell type. We will clarify that increased cell death in the ganglionic eminence and MGE-like organoids is consistent with a prominent involvement of the ventral/inhibitory lineage, while acknowledging the limits of the assay.

      (4) In the dorsal-ventral assembloids, it is expected that the ventral organoid would contain lots of GFP expression compared to the dorsal, but in the image shown (Figure 5A) both parts of the assembloid seem to have the same amount and distribution of GFP. How is that possible?

      We appreciate the thoughtful comment of the reviewer. After two weeks of fusion, a considerable number of interneurons are expected to have migrated from the ventral to the dorsal compartment of the assembloid (Birey et al., 2017; Sloan et al., 2018). In terms of distribution, we think that current Figure 5A shows a gradient of eGFP-positive cells within the dorsal compartment, with the number of labeled cells decreasing as the distance from the fusion interface between the two organoids increases. By contrast, a comparable gradient is not evident in the ventral compartment, where several labeled neurons remain present even in regions distal to the fusion site.

      Reviewer #3 (Public review):

      Summary:

      ARHGEF6 is a RAC1/CDC42 guanine nucleotide exchange factor that has been proposed to be associated with X-linked intellectual disability, but its relevance to the pathology is not well established. ARHGEF6 has been assigned a role in spine density and plasticity of hippocampal pyramidal neurons, but nothing is known about its role in interneuron development. Here, the authors show that ARHGEF6 is expressed early in development in the inhibitory lineage during the peak of interneuron generation and migration. The aim of the study is therefore to investigate whether, in addition to its role in pyramidal neurons, ARHGEF6 could play a role in inhibitory neuron development. Using both ARHGEF6-KO mice and organoids from ARHGEF6-KO hiPSCs, the authors show that ARHGEF6 plays a critical role in interneuron development and function.

      Strengths:

      The major strength of the paper is the very detailed analysis of the role of ARHGEF6 using two different systems: ARHGEF6-KO mice and deletion of ARHGEF6 in human iPSC-derived organoids. Strikingly, deletion of ARHGEF6 in both systems induces similar defects such as an increase in apoptosis, reduced neuronal output, impaired neuronal morphology, and disrupted migratory dynamics. This compelling evidence demonstrates that ARHGEF6, in addition to its already well-described role in spine formation and plasticity, is playing a crucial role during embryonic development through its function in interneurons.

      We thank the reviewer for this positive assessment of our work and for highlighting the strength of our combined in vivo and human iPSC-derived organoid approaches. We are pleased that the reviewer recognizes the consistency of the phenotypes observed across both systems and acknowledges that our findings support a crucial role, during early stages of embryonic development, for a protein previously thought to be relevant primarily in the synaptic context.

      Weaknesses:

      (1) In Figure 1, the authors show that ARHGEF6 is expressed in different regions of the brain, including the interneuron lineage, and that depletion of ARHGEF6 reduces the number of GABAergic neurons in the adult cortex and hippocampus. To try to better characterize this defect, the authors in Figure 2 investigate whether deletion of ARHGEF6 affects interneuron migration and survival during embryonic development. To do so, ARHGEF6-KO mice were crossed with the GAD67-eGFP reporter line to follow the inhibitory lineage. The authors analyse apoptosis using TUNEL staining, and show that it is significantly increased in the ganglionic eminence of ARHGEF6-KO E14.5 embryos. The authors claim that this is not the case in the cortex. However, the image shown in Figure 2A really suggests that staining is increased. Which part of the neocortex is analysed for quantification? This should be clarified.

      We would like to thank the reviewer for pointing this out. The region analyzed was the same as that used to assess GAD67-eGFP-positive cells in Figure 2F. We will clarify the exact neocortical region used for TUNEL quantification and revise the figure and legend to make the analyzed area explicit. We will also analyze additional animals to improve the accuracy of the analysis.

      (2) In Figure 2F-J, the authors investigate the migration of interneurons by analysing the GAD67-eGFP staining, and clearly show that the migratory abilities of the depleted neurons are reduced. However, the authors do not discuss the fact that, because depletion of ARHGEF6 increases apoptosis, there are fewer neurons available for migration. This is important for the interpretation of the data. This point should be clarified.

      We appreciate this comment and believe that it is particularly relevant to the interpretation of the data shown in Figure 2F–G. We will clarify the limited interpretation of this specific analysis in the Results section. The altered directionality observed in vivo, together with evidence of impaired migratory behavior obtained through in vitro live imaging, supports the possibility that altered migratory dynamics contribute to the phenotype, although increased apoptosis and reduced neuronal output may also contribute.

      (3) In Supplementary Figure S2, the authors describe the establishment of the ARHGEF6-KO human iPSC line and test the ability of these cells to undergo correct development, especially for the generation of neural progenitor cells. I was wondering why the authors do not present the data of both control and ARHGEF6-KO cells.

      We thank the reviewer for pointing this out. All stainings reported in the organoids and assembloids in this paper show that the WT ATCC-DYS0100 cell line, as well as the mutant, efficiently differentiates into neuronal tissue. The Supplementary Figure was intended to assess the impact of the mutation on the ability of the iPSC line to retain its differentiation capacity, as a preliminary step before proceeding with organoid differentiation. We will add stainings for NPC markers on the WT line to the Supplementary Figure.

      (4) At the molecular level, how ARHGEF6 depletion could affect neuronal survival is missing. In addition, as ARHGEF6 is a GEF for RAC1 and Cdc42 amongst other GEFs, I would have expected that the authors test how RAC1 activity (and Cdc42) is affected in ARHGEF6-depleted brains and in ARHGEF6-KO organoids. The measure of phalloidin staining and the anisotropy index are not really meaningful.

      We appreciate the thoughtful comment of the reviewer. Previous evidence already shows that Arhgef6 loss reduces the activity of both GTPases and deregulates the expression of the cytoskeletal regulators Pak1–3, Limk1, and Cofilin in the mouse brain (Ramakers et al., 2012). Regarding organoids, we agree that direct RAC1/CDC42 activity measurements would have strengthened the molecular mechanism. We will revise the manuscript to avoid implying that our phalloidin-based measurements alone establish the underlying dysregulated molecular pathway.

      (5) The authors show that ARHGEF6-KO forebrain organoids were markedly smaller compared to their isogenic controls, and their study suggests that ARHGEF6 expression impacts progenitor maintenance and neurogenesis. Despite representing only a minority of the total neuronal population, I was wondering whether ARHGEF6-KO mice present brain morphology defects such as microcephaly.

      We appreciate the comment. We did not perform a morphometric analysis for microcephaly in the present study. We will add this limitation to the Discussion and note that gross brain morphology changes were not reported in the previously published ARHGEF6-KO mouse characterization (Ramakers et al., 2012). We will also clarify that the smaller organoid phenotype may reflect developmental defects that are not fully compensated in a reductionist in vitro model and therefore do not necessarily imply overt microcephaly in vivo.

      References

      Allen Institute for Brain Science. Allen Mouse Brain Atlas: Arhgef6 ISH data. Available from: Allen Brain Map.

      Birey, F., Andersen, J., Makinson, C. D., Islam, S., Wei, W., Huber, N., Fan, H. C., Metzler, K. R. C., Panagiotakos, G., Thom, N., O’Rourke, N. A., Steinmetz, L. M., Bernstein, J. A., Hallmayer, J., Huguenard, J. R., & Pașca, S. P. (2017). Assembly of functionally integrated human forebrain spheroids. Nature, 545(7652), 54–59. https://doi.org/10.1038/nature22330

      Liaci, C., Camera, M., Zamboni, V., Sarò, G., Ammoni, A., Parmigiani, E., Ponzoni, L., Hidisoglu, E., Chiantia, G., Marcantoni, A., Giustetto, M., Tomagra, G., Carabelli, V., Torelli, F., Sala, M., Yanagawa, Y., Obata, K., Hirsch, E., & Merlo, G. R. (2022). Loss of ARHGAP15 affects the directional control of migrating interneurons in the embryonic cortex and increases susceptibility to epilepsy. Frontiers in Cell and Developmental Biology, 10, 875468. https://doi.org/10.3389/fcell.2022.875468

      Nodé-Langlois, R., Muller, D., & Boda, B. (2006). Sequential implication of the mental retardation proteins ARHGEF6 and PAK3 in spine morphogenesis. Journal of Cell Science, 119(23), 4986–4993. https://doi.org/10.1242/jcs.03273

      Pelkey, K. A., Chittajallu, R., Craig, M. T., Tricoire, L., Wester, J. C., & McBain, C. J. (2017). Hippocampal GABAergic inhibitory interneurons. Physiological Reviews, 97(4), 1619–1747. https://doi.org/10.1152/physrev.00007.2017

      Ramakers, G. J. A., Wolfer, D., Rosenberger, G., Kuchenbecker, K., Kreienkamp, H.-J., Prange-Kiel, J., Rune, G., Richter, K., Langnaese, K., Masneuf, S., Bösl, M. R., Fischer, K.-D., Krugers, H. J., Lipp, H.-P., van Galen, E., & Kutsche, K. (2012). Dysregulation of Rho GTPases in the αPix/Arhgef6 mouse model of X-linked intellectual disability is paralleled by impaired structural and synaptic plasticity and cognitive deficits. Human Molecular Genetics, 21(2), 268–286. https://doi.org/10.1093/hmg/ddr457

      Sloan, S. A., Andersen, J., Pașca, A. M., Birey, F., & Pașca, S. P. (2018). Generation and assembly of human brain region-specific three-dimensional cultures. Nature Protocols, 13(9), 2062–2085. https://doi.org/10.1038/s41596-018-0032-7

      Yao, Z., Nguyen, T. N., van Velthoven, C. T. J., Goldy, J., Sedeno-Cortes, A. E., Baftizadeh, F., Bertagnolli, D., Casper, T., Chiang, M., Crichton, K., Ding, S.-L., Fong, O., Garren, E., Glandon, A., Gouwens, N. W., Gray, J., Graybuck, L. T., Hawrylycz, M. J., Hirschstein, D., … Zeng, H. (2021). A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell, 184(12), 3222–3241.e26. https://doi.org/10.1016/j.cell.2021.04.021

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Move methodological and descriptive details (e.g., especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.

      In these parts, we define four phases of kinetochore motion in early mitosis. Without such a description in the main text, readers would be confused about subsequent analyses. Figure 2 is also important to show examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.

      Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).

      As suggested, we have removed the statement for the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statement for the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).

      Figure 4I: This panel is currently unclear and should be drastically simplified.

      Following this suggestion, we simplified Figure 4I by removing the ‘Start’ column, which can easily be deduced from the ‘Duration’ results and therefore provides little additional information.

      I recommend to reorganize figures as follows:

      Figure 1: Keep as single figure but simplify. Figure 1D and 1E could be combined, move unnormalized CSV to supplementary materials. Same goes for 1F.

      We have reorganized Figure 1, as suggested, and moved unnormalized data to supplemental materials.

      New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules which increases speed of congression initiation.

      If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new Figure would mostly consist of only graphs. Without examples of images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4, as they are (except for making Figure 4I simpler; see above).

      New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.

      If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figures 5 and 6 as they are, although we respect this suggestion from the reviewer.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      We have conducted new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We have combined the new results with the original Figure S7 to create Figure 8 in line with this suggestion.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one the authors used arise from different cell types.

      We did experiments along this line, using a dominant-negative LINC construct, in our previous study (Booth et al eLife 2019). LINC-DN should more specifically remove/reduce PANEM than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); i.e. in this regard, the outcome was similar to azBB treatment in the current study. One can expect that global actin inhibitors would also inhibit the PANEM formation and show effects similar to LINC-DN. By contrast, the indicated references reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as the reviewer noted. One possibility is that such differences may have arisen from different cell types – this could be important, especially given that some cells form the PANEM and others do not (Figure 8A). A second possibility is that cytokinesis, mitotic rounding and PANEM formation may rely on actin polymerization to different extents. For example, the same concentration of global actin polymerization inhibitors may affect cytokinesis, but may still allow PANEM formation to proceed without observable effects on early chromosome movements. As suggested, we discussed this topic in the Discussion (page 16, third paragraph).

      Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes during mitosis in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for effects on congression.

      As the reviewer implies, we cannot rule out that we could not detect actin associated with the spindle or centrosomes because of the difference in methods or cell lines between the current study and the literature mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 14, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 14, second paragraph). Regarding Myosin-10, azBB (a blebbistatin variant) should have negligible effects on class-X myosin, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB that we observed in the current study are due to the inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 14, second paragraph).

      C. Expansion of PANEM functional analysis

      To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]

      Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.

      As suggested, we have studied the effect of PANEM contraction in cell lines other than U2OS. We have found that when PANEM contraction was inhibited, the reduction in chromosome scattering was diminished in RPE1 cells (new Figure 8B, C). Moreover, we have found that inhibition of PANEM contraction increased polar chromosomes during prometaphase/ metaphase in RPE1 and HCT116 cells (which form PANEM), but not in HeLa cells (which do not form PANEM) (new Figure 8D, E). These results suggest that the effects of PANEM contraction, originally observed in U2OS cells, are also present in other cell lines (RPE1 and HCT116) that form PANEM.

      Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).

      In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not Phase 2 or Phase 4. Mild microtubule perturbation itself could affect chromosome motions in all four Phases. We do not think it would be so informative to study what additional effects the reduced PANEM contraction shows when combined with mild microtubule perturbation.

      Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.

      Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.

      Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.

      It is tricky to count the number of chromosomes because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. those outside of the nucleus in prophase). We therefore quantified the chromosome volume at polar regions in azBB-treated cells (Figure 6C).

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      It has been a widely accepted view in the field that chromosome congression precedes biorientation, since the publication in 2006 (Kapoor et al Science 2006). Very recently, this view has been challenged by the new publication (Vukušić & Tolić, Nat comm 2025), as indicated by this reviewer. We have mentioned this new model and discussed the new interpretation of our results based on this new model, in the Discussion (page 15; ‘It has been a widely accepted view…’).

      To explain the new interpretation of our results more clearly, we have a new diagram as a supplemental figure (Figure 9 – figure supplement 1) in the revised manuscript.

      Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).

      We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.

      Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.

      Following this suggestion, we discussed three possible mechanisms that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression, based on previous literature (Page 17): 1) the enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently to promote the congression towards the spindle midplane, and 3) the balance between CENP-E, Dynein and chromokinesin’s activities may incline to greater chromosome-arm ejection forces towards the spindle midplane.

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Introduction

      Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.

      As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of Results to Introduction (Page 4) and cited Figure 1A and 1B in Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers’ understanding of our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.

      Results (by subheading)

      First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).

      As suggested, we cited these references at the indicated part of the first section of the Results (page 5).

      Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).

      Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) have already been cited in the second section of the Results in the original manuscript. We agree that all other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).

      Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.

      We interpret that these are background signals in the cytoplasm, which do not come from kinetochores, because 1) before NEBD, they were outside of the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions such as those towards a spindle pole (Phase 2) and the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.

      Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).

      Relevant to this comment, there was an error regarding the congression speed of central kinetochores (original Figure 4H). The congression speed of peripheral kinetochores was shown correctly, but for central kinetochores it was shown incorrectly, in µm per time interval (30 s) rather than µm per minute. We amended this error in the revised manuscript (new Figure 4H). Based on the corrected data, the speed of congression is similar between peripheral and central kinetochores. The original Figure 3G (the speed of poleward motion for central kinetochores) had a similar error, which we have also corrected in the revised manuscript. We apologize for these errors and the confusion they may have caused.
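      The unit correction described above amounts to rescaling a per-frame displacement by the frame interval. The following is a minimal sketch of that conversion, not the authors' analysis code; the function name and the example value are hypothetical and chosen only for illustration.

```python
def speed_um_per_min(displacement_um: float, interval_s: float) -> float:
    """Convert a displacement measured over one imaging interval
    into a speed expressed in micrometres per minute."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return displacement_um * 60.0 / interval_s


# Hypothetical displacement per 30 s frame (not a value from the manuscript).
per_frame_um = 0.5
print(speed_um_per_min(per_frame_um, 30.0))
```

      With a 30 s interval, a value reported per frame is half the corresponding value per minute, which is the size of the discrepancy such a unit mix-up would produce.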

      Regarding this comment, if biorientation is achieved more rapidly for central kinetochores, Phase 3 (rather than congression speed) would be shorter for central kinetochores. Indeed, Phase 3 is slightly shorter for central kinetochores (control) than for peripheral kinetochores (control) (Figure 4C), but the difference is not statistically significant (t test; p = 0.21).

      Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.

      These two references have already been cited and discussed in this Result section of our original manuscript. However, considering this suggestion, we have discussed more about polar chromosome movements reported by Koprivec et al (page 11). Meanwhile, the reviewer is correct about Figure 5F, and we have clarified this point in the Figure 5F legend.

      Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.

      As suggested, we have moved the final paragraph of the Discussion in the original manuscript to make a new final section in the Results in the revised manuscript. Moreover, as suggested, we have studied the outcome of inhibiting PANEM contraction in cell lines other than U2OS (Figure 8 B–E), and have described the new results in the new final section of the Results.

      Discussion

      1. When discussing cortical actin, cite key reviews on its presence and function during mitosis: Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).

      As suggested, we have cited all these review papers in the Discussion (page 17), and mentioned the role of the cortical actin on the spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring on cytokinesis (Pollard & O'Shaughnessy, 2019).

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed the underlined criticisms as detailed above.

      Audience

      Primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may interest also the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is.

      Reviewer #3:

      Sheidaei et al., report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field as most of the literature has thus far focussed on tracking kinetochore in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression - in the absence of PANEM chromosomes end up in unfavourable locations, and they fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression which precedes the segregation process.

      Major points

      (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (e.g. Phase 3-like). It is unclear whether all kinetochores go through Phases 1, 2, 3 and 4 sequentially or whether a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.

      To respond to this comment, we would like to first clarify how we selected kinetochores for our analysis. We selected kinetochores that can be individually tracked. If kinetochore tracking was difficult (before the start of Phase 4 in control and azBB-treated cells or before observing the extended Phase 3 in azBB-treated cells) because of kinetochore crowding, we did not choose such kinetochores. For example, related to the next comment of this Reviewer, we did not include kinetochores close to spindle poles (within 4 µm) at NEBD in our analysis for the following two reasons: First, these kinetochores often did not show clear and rapid movements towards a spindle pole, which we used to define Phase 2. Second, although we referred to kinetochore co-localization with a microtubule signal for the start of Phase 2, this was difficult for kinetochores close to spindle poles because of a high density of microtubules. As requested, we have added this comment to the Method section (page 25).

      With the above selection, all selected kinetochores without azBB treatment (control) showed the poleward motion (Phase 2) and congression (Phase 4) in this order, though their extents were varied among kinetochores. All selected kinetochores with azBB treatment also showed the poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Then, Phase 1 and Phase 3 were defined as intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we judged that Phase 3 continued till the end of tracking. We have added this comment to the Method section (page 25-26).
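      The phase definitions in this response can be written as a small rule set. The sketch below is an illustrative reading of the text above, not the authors' implementation; the function name, the argument layout, and the use of time in minutes are assumptions.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (start, end), assumed here to be in minutes


def assign_phases(
    nebd: float,
    phase2: Interval,
    phase4: Optional[Interval],
    tracking_end: float,
) -> Dict[str, Interval]:
    """Derive Phase 1 and Phase 3 from the observed Phase 2 and Phase 4.

    Phase 1 runs from NEBD to the start of Phase 2 (poleward motion).
    Phase 3 runs from the end of Phase 2 to the start of Phase 4
    (congression), or to the end of tracking if no Phase 4 was observed.
    """
    p2_start, p2_end = phase2
    if not nebd <= p2_start <= p2_end <= tracking_end:
        raise ValueError("Phase 2 must lie between NEBD and the end of tracking")

    phases: Dict[str, Interval] = {
        "phase1": (nebd, p2_start),
        "phase2": (p2_start, p2_end),
    }
    if phase4 is None:
        # No congression observed: Phase 3 continues until tracking stops.
        phases["phase3"] = (p2_end, tracking_end)
    else:
        p4_start, p4_end = phase4
        if not p2_end <= p4_start <= p4_end <= tracking_end:
            raise ValueError("Phase 4 must follow Phase 2 within the tracked period")
        phases["phase3"] = (p2_end, p4_start)
        phases["phase4"] = (p4_start, p4_end)
    return phases


# Hypothetical timings, for illustration only.
print(assign_phases(0.0, (1.0, 2.0), (4.0, 6.0), 8.0))
print(assign_phases(0.0, (1.0, 2.0), None, 8.0))
```

      Under this reading, Phases 1 and 3 are never detected directly: they are the gaps left once Phase 2 and Phase 4 have been identified.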

      (2) Would peripheral kinetochore close to poles behave differently compared to peripheral kinetochore close to the midplane (figure S4)? In figure 3D, are they separated? If not, would it look different?

      Since we did not include kinetochores close to spindle poles (at NEBD), for which it was difficult to define Phase 2 (see our response to the above major point 1), in our analysis, the suggested comparison is not feasible.

      (3) Uncongressed polar chromosomes (eg., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.

      In contrast to CENPE-inhibited cells, azBB-treated cells did not show much tumbling of the spindle, though both cells showed uncongressed polar chromosomes. The reason for this difference may be fewer uncongressed polar chromosomes in azBB-treated cells. There were still modest spindle motions in azBB-treated cells. However, because kinetochore motions were assessed relative to a spindle pole (and other reference points on the spindle) in our study (Figure 2A, C), the modest spindle motions were offset in our analyses of kinetochore motions. We have clarified the underlined part in the Method section (page 24).
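      The idea of offsetting spindle motion by measuring kinetochores relative to a spindle pole can be illustrated with a minimal sketch. It covers only a rigid translation of the whole spindle; compensating for rotation would need the additional reference points mentioned above. The function name and all coordinates are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def relative_to_pole(kinetochore: Vec3, pole: Vec3) -> Vec3:
    """Express a kinetochore position in a frame anchored at a spindle pole."""
    return tuple(k - p for k, p in zip(kinetochore, pole))


def shift(point: Vec3, drift: Vec3) -> Vec3:
    """Apply the same translation that moves the whole spindle."""
    return tuple(c + d for c, d in zip(point, drift))


# Hypothetical coordinates in micrometres (not data from the manuscript).
kinetochore = (3.0, 4.0, 1.0)
pole = (1.0, 1.0, 1.0)
drift = (0.5, -0.25, 2.0)  # spindle translation between two frames

before = relative_to_pole(kinetochore, pole)
after = relative_to_pole(shift(kinetochore, drift), shift(pole, drift))
print(before == after)
```

      Because the kinetochore and the pole are displaced by the same drift, the pole-relative position is unchanged, so only motion of the kinetochore with respect to the spindle remains in the measurement.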

      (4) The work has high quality manual tracking of objects in early mitosis- if this would be made available to the field, it can help build AI models for tracking. The authors could consider depositing the tracking data and increasing the impact of their work.

      As suggested, we have included kinetochore tracking data as supplemental data in the revised manuscript (Figure 3 – source data 1–4; Figure 5 – source data 1, 2).

      Minor points

      (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.

      As suggested, we have now mentioned the number of cells, where the kinetochore motions were analyzed, in the legends for Figures 3, 4, 5, and supplemental figures.

      (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.

      Following this suggestion, we first investigated the timing of spindle elongation, relative to NEBD, in asynchronous U2OS cells (Figure 8 – figure supplement 3). We imaged cells every 5 min (it was difficult to reasonably observe enough mitotic cells using a shorter interval). Most of the cells showed no significant change in the spindle length (distance between two spindle poles) after (or around) NEBD [e.g. Cell 1 in A] or a mild reduction in it [e.g. Cell 2 in A]. Only a small number of cells (2-3 out of 26) showed a mild increase in the spindle length after (or around) NEBD [e.g. Cell 3 in A]. Because the spindle elongation after NEBD was rare and mild, it was difficult to address how the timing of spindle elongation affects the effect of PANEM on reducing chromosome scattering and on chromosome relocation from polar regions. We explained this result and discussed this topic in the Discussion section.

      (3) Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of the centrosome directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?

      We understand that the Reviewer refers to the kinetochore pivoting mechanism around a spindle pole, which was recently reported by the Tolic group (Koprivec et al., 2026). Such a pivoting mechanism would work only when the spindle elongates (i.e. the distance between spindle poles is enlarged) after NEBD. Therefore, to address this Reviewer’s question, we tried to assess how PANEM contraction contributes to relocating polar chromosomes when the spindle elongates before or after NEBD in asynchronous U2OS cells (i.e. in the situation where the kinetochore pivoting mechanism is applied or not), as we noted above in response to Point 2. However, spindle elongation after NEBD was rare and mild, and we were unable to address this issue (see our response to Point 2). We discussed this matter in the Discussion section.

      (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines.

      Because blebbistatin could kill cells by inhibiting cytokinesis, the blebbistatin sensitivity of cell growth may not necessarily reflect how essential the PANEM contraction is for chromosome congression.

      Instead, we addressed more directly how essential the PANEM contraction is for chromosome congression. We analyzed chromosome congression in RPE1 and HCT116 cells (both are N-CIN-) in the presence and absence of pnBB, the inhibitor of PANEM contraction (new Figure 8D, E). With pnBB, these cells showed congression defects, suggesting that the PANEM contraction is essential for chromosome congression in these N-CIN- cells.

      (5) Are congression times delayed in lines that naturally lack PANEM?

      For example, it takes 10-20 min for HeLa cells (lacking PANEM) to complete chromosome congression after the NEBD (Bancroft et al 2015: https://doi.org/10.1242/jcs.163659). This is not significantly different from the time (8-18 min) for chromosome congression we observed in U2OS cells (which form PANEM). We assume that cells lacking PANEM have developed a compensatory mechanism for efficient chromosome congression – we have discussed possible compensatory mechanisms in the last paragraph of the Discussion (page 17).

      (6) Page 23 "we first identified the end of congression" how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?

      The start of kinetochore oscillation was defined as the end of Phase 4 if we could track the kinetochore until that point. In some cases where the kinetochore became close to the midplane (< 2.5 µm), it was not possible to track it further due to kinetochore crowding around the spindle mid-plane – in such cases, the end of Phase 4 was assigned as the end of tracking. These definitions were not necessarily clear in the original manuscript. Moreover, in the original manuscript, it was not clearly stated that the end of Phase 4 was defined in the same way for both non-polar and polar kinetochores. We have now clarified these points in the Method section (page 25).

      (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?

      In Figure S2E (Figure 1 – figure supplement 6 in the revised manuscript), we did not observe a significant difference in the spindle-pole distance (the spindle size) between control and azBB-treated cells at any individual time points. The smallest p-value was 0.094 at 6.0 min. As suggested, we have explained this in the legend for this supplementary figure. Completion of Phase 4 is highly variable across different kinetochores within the same cell; thus, a general comment on its completion timing in cells is not feasible.

      Significance:

      The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Renard, Ukrow et al. applied their recently published computational pipeline (CHROMAS) to the skin of Euprymna berryi and Sepia officinalis to track the dynamics of cephalopod chromatophore expansion. By segmenting each chromatophore into radial slices and analyzing the co-expansion of slices across regions of the skin, they inferred the motor control underlying chromatophore groups.

      Strengths:

      The authors demonstrate that most motor units of cephalopod skin include a subregion of multiple chromatophores, creating "virtual chromatophores" in between the fixed chromatophores. This is an interesting concept that challenges prevailing models of chromatophore organization, and raises interesting possibilities for how chromatophore arrays may be patterned during development.

      This study introduces new analyses of cephalopod skin that will be valuable for the quantitative study of cephalopod behavior.

      Weaknesses:

      The authors chose to image spontaneous skin changes in sedated animals, rather than visually-evoked skin changes in awake, freely-moving animals. Spontaneous chromatophore changes tend to be small shimmers of expansion and contraction, rather than obvious, sizable expansions. This may make it more challenging to distinguish truly co-occurring expansions from background activity. The authors don't provide any raw data (videos) of the skin, so it is difficult to independently assess the robustness of the inferred chromatophore groupings.

      The patch-clamp experiments in E. berryi are used to test the validity of their approach for inferring motor units. The stimulations evoke expansions of sub-regions of each chromatophore, creating "virtual chromatophores" as predicted from the behavioral analysis. However, the authors were not able to predict these specific motor units from behavioral analysis before confirming them with patch-clamp, limiting the strength of the validation. It would be informative to quantify the results of the patch-clamp experiments - are the inferred motor units of similar sizes to those predicted from behavior?

      The authors report testing multiple experimental conditions (e.g., age, size, behavioral stimuli, sedation, head-fixation, and lighting), but only a small subset of these data are presented. It is difficult to determine which conditions were used for which experiments, and the manuscript would benefit from pooling data from multiple experiments to draw general conclusions about the motor control of cephalopod skin.

      The authors use a different clustering algorithm for E. berryi and S. officinalis, but do not discuss why different clustering approaches were required for the two species.

      Impact:

      The authors use their computational pipeline to generate a number of interesting predictions about chromatophore control, including motor unit size, their spatial distribution within the skin, and the independent control of subregions within individual chromatophores by putatively distinct motor neurons. While these observations are interesting, the current data do not yet fully support them.

      The CHROMAS tool is likely to be valuable to the field, given the need for quantitative frameworks in cephalopod biology. The predictions outlined here provide a useful foundation for future experimental investigation.

      We thank the reviewer for the thoughtful and detailed evaluation of our work and for recognizing the potential of the CHROMAS pipeline for studying chromatophore control.

      We agree that some aspects of the manuscript required clarification and additional explanation, and we have revised the text accordingly. We also now provide access to representative raw video recordings in the Data Availability section. In the E. berryi patch-clamp experiments, single motor neurons evoked expansions of sub-regions of chromatophores, consistent with the “virtual chromatophore” concept. We have now quantified the size of motor units across patch-clamp sessions, and the results show that the inferred motor-unit sizes broadly match those predicted from behavioral recordings, supporting the validity of our approach.

      We agree that pooling data across individuals would provide valuable insight into variability across animals. In practice, we recorded chromatophore activity from several animals (14 Euprymna berryi and 12 Sepia officinalis) under different experimental conditions during development of the experimental pipeline. However, acquiring long, stable, artifact-free recordings suitable for motor unit analysis is technically challenging. We now clarify this point in the manuscript. Specifically, we explain that multiple animals were recorded during pipeline development, while the analyses presented focus on recordings with the highest signal quality. We anticipate that the framework introduced here will enable future studies to collect larger datasets and compare motor unit organization across individuals, developmental stages, and species.

      HDBSCAN was used for E. berryi during initial exploratory analyses, and Affinity Propagation was adopted for S. officinalis because it better captured the correlation structure of those recordings. We did not re-analyze the E. berryi data with Affinity Propagation, and the implications of algorithm choice are now discussed in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      Overall, this is an excellent paper, making use of a newly developed system for monitoring the behaviour of chromatophores in the skin of (mostly) free-swimming bobtail squid and European cuttlefish. The manuscript is very well-written, clearly presented and very well-structured. The central finding, that individual chromatophores are connected to multiple motor neurones, is not new. Novelty instead comes from the ability to measure the actuation of chromatophore sections across wide areas of skin in free-swimming animals, showing the diversity of local motor units and reinforcing the notion that individual chromatophores are not necessarily the individual units of colour change, but rather local motor units that cover multiple neighbour and near-neighbour chromatophore muscles. This is an excellent finding and one that will shape our understanding of the neural control of cephalopod skin colour.

      Strengths:

      The methodological approach to collecting large amounts of data about local variations in the expansion of sections of chromatophores is exciting, and the analysis pipeline for clustering sections of chromatophores whose spontaneous activity correlated over time is powerful and exciting.

      Weaknesses:

      Some minor edits and typographical errors need correcting. I also had some concerns about whether the preparation for the electrophysiological section of the manuscript complies with the journal's ethical requirements, so I would urge that this be carefully checked.

      We thank the reviewer for the positive evaluation of our work and for recognizing the value of the methodological approach and the clarity of the manuscript.

      We have carefully reviewed the manuscript and corrected minor typographical errors.

      Regarding the ethical considerations raised for the electrophysiological experiments, we have carefully verified that the experimental procedures comply with the journal's ethical requirements and relevant institutional guidelines.

      Reviewer #3 (Public review):

      Summary:

      This study uses high-resolution videography and a custom computer-vision pipeline to dissect the motor control of cephalopod chromatophores in Euprymna berryi and Sepia officinalis. By quantifying anisotropic chromatophore deformations and applying dimensionality reduction methods, the authors infer that individual chromatophores can be a part of multiple motor units. Clustering analyses reveal putative motor units that often span multiple chromatophores, with diverse and overlapping geometries. Chromatophore expansion dynamics are faster and more stereotyped than relaxation, consistent with active neural contraction followed by passive recoil. Together, the results show that chromatophores function not as uniform pixels but as fractionated, coordinately controlled elements that enable flexible pattern generation.

      Strengths:

      The authors present compelling, direct evidence that (a) chromatophore deformations are anisotropic, and indirect evidence that (b) individual chromatophores can be split across multiple putative motor units. This evidence is provided through data collected over large spatial scales, but also at a sub-chromatophore resolution. This combination of scale and resolution is not possible using traditional neuroanatomical and physiological approaches alone.

      The authors also develop a new non-invasive, image analysis approach to extract information about chromatophore deformation across large spatial scales on the organism's body. In principle, this approach is applicable across species and may allow for further comparative characterization of chromatophore motor control. It is therefore a promising new tool and useful resource for the community.

      Weaknesses:

      An important weakness of the work is that the methods the authors develop can only be applied during resting, spontaneous 'flickering' activity of chromatophores. The inability to reliably apply their technique during any kind of realistic camouflage is a large limitation, as it means this method cannot be used to study the dynamics of motor control during realistic camouflage behaviors.

      Another weakness of this paper is the rather limited electrophysiological validation of the computational findings. The authors present only one electrophysiology experiment in E. berryi, the species that they used only for 'methodological development' and not for detailed characterization. A complementary electrophysiological experiment in S. officinalis, or some visualization of neuron morphology confirming that motor neurons do indeed project to multiple chromatophores, would strengthen the generalizability of their computational analysis. This would be particularly pertinent to validate the authors' claim that some motor units contain chromatophores that are quite distant from one another on the animal.

      Overall, the authors' technical contributions and method development are an important advance. This work serves as an excellent proof of concept that their method can extract useful information about chromatophore motor control. Further validation of their method is needed to fully trust the fine-scale conclusions drawn about the distribution and composition of multi-innervated chromatophores. Furthermore, the authors raise many interesting ideas about developmental constraints on circuit wiring and potential adaptive significance of multi-innervated chromatophores for certain features of camouflage patterning. Their method may be able to help resolve some of these questions in the future if it is refined and applied across developmental stages, regions of the animal, and across species.

      We thank the reviewer for their thoughtful evaluation and for recognizing the potential of the computational approach introduced in this study.

      Regarding the focus on spontaneous chromatophore activity, we have clarified earlier in the Results section why these events are necessary to isolate individual muscle activations. While large camouflage patterns are visually striking, they involve the coordinated activation of many groups of chromatophores by premotor circuits simultaneously, making the identification of individual motor units, our goal here, impossible. Our approach can, however, also be applied during active behavior, including camouflage; the questions addressed there would be different, focusing on how multiple motor units are coordinated to generate the resulting skin patterns, rather than resolving the structure of single motor units. This could be challenging if the patterns of premotor control are highly variable, thus making the detection of meaningful or interpretable motion correlations difficult. This remains to be tested.

      We also acknowledge that electrophysiological validation remains limited. Patch-clamp experiments were performed in Euprymna berryi to test predictions generated by the computational analysis, and these experiments confirmed that activation of single motor neurons can produce anisotropic expansion of chromatophore subregions. We now provide the associated datasets in the Data Availability section. We agree that complementary electrophysiological or anatomical experiments in Sepia officinalis would further strengthen the conclusions. Such experiments represent an important direction for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      General points:

      (1) Given all the experimental conditions and animals tested, the manuscript would be much stronger if the figures represented pooled data from many animals and experiments (e.g. Figure 1C).

      We agree that pooling data from multiple animals would strengthen the manuscript. In practice, we tested these experimental conditions across several animals (14 Euprymna berryi and 12 Sepia officinalis), but we selected the segments shown in the figures for their minimal artifacts and errors. Acquiring high-quality, stable recordings of this type is extremely challenging, and the presented data represent the clearest examples suitable for analysis and visualization. We hope that in the future these methods will enable not only the collection of a larger, high-quality dataset, but also comparisons across individuals, ages, species, and different regions of the mantle.

      (2) It's very unclear what animals were used for each experiment:

      (a) E. berryi: L677 states that 14 animals were filmed, and L684 implies that non-sedated individuals were used in addition to sedated animals, but it appears all the data is from a single E. berryi with sedation?

      The original wording was unclear, so we modified the sentence for clarity. The Methods now specify that 14 animals were filmed to refine the experimental pipeline and explore different conditions, while the data presented in the Results are from a single lightly sedated individual chosen for quality and stability of chromatophore activity.

      (b) S. officinalis: L692 onwards states that lots of different conditions and animals were explored, but only minimal data from a couple of animals is described in the figures. L156 states that all (?) the data comes from one head-fixed animal and one sedated and head-fixed animal. L549: The conclusion states that the pipeline was used in freely moving animals, but it appears that all of the S. officinalis were head-fixed? This is very confusing. Rather than describing the conditions of every experiment ever performed, the manuscript would benefit from explicitly stating the experimental conditions used for each figure.

      The original text was unclear. We have clarified in the manuscript which animals and experimental conditions were used for the analyses in each figure. To clarify, E. berryi was recorded without head fixation, whereas S. officinalis data were obtained under head-fixed conditions. We did film 11 S. officinalis without head fixation, and data can in principle be extracted from these recordings. Head fixation was used both to minimize visual artifacts and to enable longer, stable recordings, which was important for capturing the highest level of apparent noise in motor unit activation. This information is critical for our analyses of motor-unit organization, though not necessary for studies of broader camouflage patterns. The statement in the Conclusion was intended to convey that our computational pipeline enables large-scale analyses that would be very difficult or impossible with traditional electrophysiology, not that all data were acquired from freely behaving animals. While fully unconstrained recordings remain technically challenging due to optical and logistical constraints, we maintain that our approach provides a valid framework for analyzing freely behaving animals.

      (c) Additionally, there is a claim that the sedated condition represents the unsedated one (e.g. L151 and L643), but no data is shown to support this. L173 references Figure 6d as evidence, but 6d doesn't exist. Only L210 provides sedation/no sedation statistics for the number of components per motor unit. However, in L643 it says "and motor unit organization remained unchanged". This data needs to be shown to include that statement.

      The reference to the nonexistent Figure 6d was removed. L170 provides statistics for the number of principal components per chromatophore, and L210 provides statistics for the number of components per MU. We do not think a sub-figure is necessary. We agree, however, that “motor unit organisation” in L643 is potentially misleading, as we only compared the number of chromatophores belonging to a single MU and not the MU shape or distribution. We changed “organization” to “size (in chromatophores)”.

      (3) The text needs considerable revision. There are many typos (including multiple instances of "refs" instead of the actual references being inserted). These issues make the manuscript much more difficult to evaluate.

      Our apologies. We have now added the missing refs.

      (4) It is not clear how convincing the chromatophore groups are. For instance, Figure 4h could alternatively be interpreted as a group of 5 chromatophores in a motor group that happen to co-vary with a sixth one at a great distance. Without seeing some of the raw data (videos), it's difficult to assess how convincing it is that these chromatophores belong to the same group. I recommend analyzing: when multiple chromatophores expand together, what is the likelihood that other chromatophores also happen to expand at the same time (given the frequency that they're all changing shape spontaneously)?

      We appreciate the reviewer’s concern. Chromatophores are assigned to the same cluster because their activity, or that of their slices, covaries consistently over time. It is, of course, possible that what appears as a single motor unit may reflect two or more motor neurons acting simultaneously during the recording. Longer video segments increase confidence in the integrity of inferred motor units, but in the absence of a ground truth for motor unit spatial organization in this species at this age, it is difficult to quantify the likelihood that two motor units are being conflated. Raw video data is provided in the Data Availability section. We note, however, that most of the time motor units cannot be readily discerned by eye, because individual chromatophores and their constituent slices fluctuate continuously, and motor-unit correlations are subtle and distributed across multiple chromatophores.

      (5) The rationale for focusing on spontaneous activity is introduced relatively late in the manuscript and would benefit from being stated earlier. Examples should be provided of what this looks like (as opposed to regular chromatophore expansion). It would be valuable to see measurements across many experiments of how expanded the chromatophores are - what is the change in surface area? And what is the frequency of expansion for each chromatophore?

      Thank you for the remark. This is true. We have added a paragraph at the beginning of the Results section to clarify the rationale for focusing on spontaneous activity.

      This section now reads:

      “Because our primary aim was to describe the composition and coordination of chromatophore motor units, it was important to examine animals in the absence of the descending commands that occur during active behavior. Spontaneous activity, typically mild and “noisy” was thus ideal to enable measurements of the motion correlations between chromatophores that reflected shared motor neuron drive, rather than shared correlations due to upstream motor neuron groupings by premotor circuits.”

      We added an example of video recording of spontaneous activity in our Data Availability section.

      While quantifying expansion magnitude and frequency across experiments would indeed be valuable, these questions fall outside the primary focus of the present study, which centers on resolving motor unit organization. In the section “Dynamics of chromatophore expansion and contraction,” we analyze the speed of expansion and contraction to demonstrate that such kinetic features can be reliably detected with the temporal resolution of our video imaging approach. By isolating single muscle activations, we establish a methodological framework that can be used in future work to quantify expansion amplitude, rate of change and frequency across preparations.

      (6) Chromatophore expansion was only measured in anesthetized E. berryi, and L679 states that chromatophore expansion was triggered by shining light on the skin. However, light-mediated chromatophore expansion may be mediated by a different mechanism, so chromatophore correlations do not necessarily reflect the underlying motor control.

      We agree that there is, in principle, a theoretical risk of direct light-mediated activation of chromatophores. Yet the kinetics of this light-mediated activation are very different, and are the object of a separate, ongoing investigation by our groups. In our experiments, the illumination was applied to the whole animal rather than locally to the skin, ensuring that all chromatophores and the eyes were exposed to the same light source. By transitioning from darkness to light, we created a window in which chromatophores were partially expanded, because both fully contracted and fully expanded states would show little to no decorrelation. Within this window, we observed spontaneous fluctuations in chromatophore activity, which formed the basis for our correlation analyses. To our knowledge, direct light-mediated expansion of chromatophores has not been reported in E. berryi, although it may exist there. Finally, the size, shape, and orientation of the inferred motor units align with electrophysiological evidence, supporting the validity of our motor unit inferences.

      (7) Some figures might be better suited for the supplement. For instance, it's not clear what the significance of Figure 5 is (it's not currently sufficiently justified in the text).

      We have clarified the purpose of Fig. 5 in both the Results and Discussion sections. In the Results, we now explain that events are separated by amplitude to show that expansion–contraction kinetics can be reliably measured across a full range of chromatophore events, validating the precision of our videographic approach. In the Discussion, we highlight that this precision allows measurement of radial muscle speeds and opens avenues to study chromatophore biomechanics, including the contributions of intertwined forces such as radial muscles, elastic pigment sacs, and intercellular coupling.

      (8) Multiple chromatophores can belong to multiple clusters - this study reveals that this is because subsections of a chromatophore are controlled separately. But do the same sections (slices) of chromatophores ever belong to multiple clusters?

      Yes, it is possible. Dubas (1985) used videographic recordings to show that the same chromatophore muscle fibers could be activated by stimulation of different nerve bundles, supporting Florey’s (1969) electrophysiological evidence for polyneuronal excitatory innervation. From Dubas: "Usually, different muscle fibres were recruited by each nerve but sometimes a single muscle fibre responded to stimulation of each nerve. Variations of the stimulus voltage also produced gradation of the amplitude of shortening of individual muscle fibres. This supports the evidence above for multiple innervation of single muscle fibres."

      The petal-like distribution of motor-neuron influence shows overlapping territories, suggesting that some chromatophore sections may be influenced by multiple neurons. However, this overlap could arise from polyinnervation of individual muscles, the presence of gap junctions between muscles, or passive mechanical coupling due to the elastic properties of the pigment sac.

      With the present approach, it is not possible to disentangle the relative contributions of these mechanisms, which will require targeted physiological or anatomical experiments. For this reason, we adopted a hard clustering approach for individual chromatophore slices.

      (9) All time should be labeled in seconds, not in frames, and all distances should be measured in um or mm, not in pixels.

      We chose to present figures in pixels and frames to reflect the native units of our recordings and analyses, which preserves fidelity and reproducibility of the computational pipeline. For biological interpretation, corresponding values are converted to µm in the main text, providing the relevant real-world scale. A scale for conversion is provided in the figure legend.
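
      As a minimal sketch of the conversion described above, the following assumes a hypothetical spatial scale and frame rate. The class name and the numerical values are placeholders for illustration; the actual scale is the one given in the figure legends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical recording calibration; both fields are placeholders."""
    um_per_pixel: float        # spatial scale, micrometres per pixel
    frames_per_second: float   # acquisition frame rate

    def to_um(self, pixels: float) -> float:
        """Convert a distance from pixels to micrometres."""
        return pixels * self.um_per_pixel

    def to_seconds(self, frames: float) -> float:
        """Convert a duration from frames to seconds."""
        return frames / self.frames_per_second

    def speed_um_per_s(self, pixels_per_frame: float) -> float:
        """Convert a speed from pixels/frame to micrometres/second."""
        return pixels_per_frame * self.um_per_pixel * self.frames_per_second


# Example with made-up values: 2 um per pixel, 25 frames per second.
cal = Calibration(um_per_pixel=2.0, frames_per_second=25.0)
```

      With these made-up values, a displacement of 10 pixels corresponds to 20 µm and an interval of 50 frames to 2 s.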

      Specific comments:

      (1) L36: I'm not sure the description of virtual chromatophores here is clear enough to make sense to a more general audience.

      Addressed. We retained the concept of ‘virtual chromatophores’ in the abstract and added a brief clarifying phrase to indicate that these are functional groupings of adjacent chromatophore territories that act as single units.

      (2) L50: "Rimmed by" - consider rephrasing.

      Addressed. Replaced with “surrounded”.

      (3) L64: "refs" - actual references aren't inserted. There are multiple other examples of this.

      Addressed. Added missing references.

      (4) L100: This section could use rewriting. Some of the text reads more like a figure legend.

      Addressed. We have streamlined the main text to reduce redundancy with the figure legend.

      (5) L101: Consider the opening sentence/s providing a more general introduction to the question and approach.

      Addressed.

      (6) L104: This implies that the data presented are from 14 animals of many ages. This is only relevant if the pooled data is analyzed and presented.

      We agree that the original phrasing was ambiguous. We have modified the sentence for clarity, and explain in the Methods that 14 animals were filmed to refine the pipeline and explore experimental conditions, while the analyses shown are from a single animal.

      (7) L111: HDBSCAN should be defined.

      Addressed. The acronym has been expanded.

      (8) L173: Figure 6D doesn't exist.

      Addressed. The reference to the nonexistent Figure 6d was removed.

      (9) L193: "excluding negative (contraction) phases" This phrase requires clarification.

      Addressed. Added “see Methods” in the legend and added clarification on the reasoning in Methods.

      (10) L204: Should explain why the switch to affinity-propagation clustering was made when a different method was used for E. berryi.

      Addressed in discussion.

      (11) Figure 3: I recommend including a diagram or image of a whole cuttlefish and showing what the corresponding imaging area was in relation to the animal so the reader gets an intuitive sense of scale.

      Thank you. We have added a supplementary figure to give the reader a sense of scale.

      (12) L221/Fig 3b: These colors are supposed to represent clusters of 3 to 5 chromatophores? The clusters look much bigger.

      The figure shows clusters of 3 to 5 chromatophores, but many adjacent clusters were assigned the same color. We have changed the colors to remove this ambiguity.

      (13) Figure 3c: This would be more powerful if it represented the combined data of many experiments to draw a general conclusion. Also, shouldn't these cluster sizes match those in 2e, e.g. they get as big as 40?

      We assume the reviewer is referring to a comparison between Figures 3c and 2e. For visualization purposes, the graph in 3c was truncated to display over 90% of the data, which explains why the largest clusters appear smaller than in 2e. We modified the legend accordingly. We agree that the results would be strengthened by pooling data from additional experiments; however, acquiring high-quality, artifact-free recordings suitable for motor unit analysis is extremely challenging. We hope that our framework will enable future studies to extend this analysis.

      (14) Figure 4: I would show some of these examples earlier, to give the reader an intuitive sense of the data and claims (though it doesn't need its own figure - provide a couple of examples, and the diagram of how much of the mantle you're sampling) then put the rest in the supplement, and include some videos too.

      We agree that providing spatial context is important for readers to develop an intuitive understanding of the dataset. However, introducing examples of motor units earlier in the manuscript would, in our view, interrupt the logical progression of the Results, where motor unit identification builds on prior analyses. To address the reviewer’s concern, we have added a new supplementary figure (Fig. S1) illustrating the size and location of the sampled mantle region. In addition, we now provide representative videos in the Data Availability section to give readers direct visual access to the underlying dynamics.

      (15) Figure 4f: Is the location of the split color in each dot accurate? It's surprising that each one is split down the middle, and the pink side is always on the right - this is unintuitive given where the motor neuron is likely to be located.

      The dots and half dots represent the membership of a chromatophore to a particular cluster.

      (16) Figure 5: I didn't find this figure sufficiently justified in the text. I would move this to the supplement.

      Addressed in General point #7.

      (17) L350: States that 12 animals were patched, but the data isn't shown. It's important to show all of this data (some of which can be in the supplement).

      Addressed. We provided the data in the Data Availability Section.

      (18) Figure 5: I would quantify how many chromatophores were in each motor group across all the recording sessions, and compare this to the equivalent behavioral analysis.

      We assume the reviewer means Fig. 6. We calculated and stated the size of motor units across patching sessions.

      (19) Figure 5c: I recommend labeling each panel with a different number so you can refer to specific data.

      We assume the reviewer means Fig. 6c. We consider the figure layout clear enough to allow readers to follow the data without additional panel numbers.

      (20) L379: Typo: repeat of "quantitative"

      Addressed.

      (21) L576: Salinity should be 33-36 ppt, not %

      Addressed.

      (22) L877: The salinity units are sg? That should be stated. Though I would use the same units for salinity throughout.

      Addressed.

      Overall, this work introduces a potentially valuable quantitative framework for studying chromatophore dynamics. Addressing the points above would substantially strengthen the manuscript and clarify the scope and support for its conclusions.

      We thank the reviewer for these many helpful comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 64 - missing references for chromatophore colour with age.

      Addressed. Added missing refs.

      (2) Line 64-65 - would be good to have a little more detail about what is meant by 'migrating through the skin'. Is this a lateral process, or depth in the skin?

      Addressed. Replaced “migrating in the thickness..” with “through the thickness..” to emphasize verticality.

      (3) Line 72 - typo, should read '...individual and groups...'

      Addressed.

      (4) Remove 'In Fig 1, ...' from line 104.

      Addressed.

      (5) Figure 1 - It's unclear why some chromatophores are uncoloured with a red dot in the centre. Are these chromatophores that do not share a cluster with neighbours? If so, wouldn't it make more sense to colour the chromatophore with a unique colour of its own? Or, at the very least, make a note in the caption to indicate that all white chromatophores are not clustered with neighbours.

      Segmented chromatophores are shown in white, with coloured slices highlighting cluster membership. Uncoloured slices represent outliers. Addressed in the figure legend.

      (6) Line 119 - the concept of a 'closed virtual chromatophore' needs a few more words of explanation. The way I interpret the text as it is, is that the motor units driving colour change are not necessarily the individual chromatophores, but a motor region containing a mixture of whole and partial chromatophores innervated by the same motor neuron. If this is the case, a few extra words of description would help here to remove any ambiguity as I think this is an important concept for the paper.

      Addressed. We added a sentence clarifying the concept.

      (7) Line 173 - Figure 6d doesn't exist in the paper. Was a different panel intended? If so, please make sure to number the figures in order of appearance in the manuscript.

      The reference to the nonexistent Figure 6d was removed.

      (8) Figure 3b is very difficult to see. Perhaps consider lightening the background image. Please also indicate whether the individual colours refer to individual clusters. If this is the case, then some of these clusters look much larger than the 3-5 suggested in the caption.

      This issue has been corrected.

      (9) Line 210 - remove the bold type.

      Addressed.

      (10) Line 211 - please specify which 'two groups' you are referring to here. Presumably, this is anaesthetised and non-anaesthetised.

      Addressed.

      (11) I think that the text is missing any indication of the pixel sizes involved in extracting slice metrics, particularly from the S. officinalis data. It would be great to include some data on how many pixels span the radius of an expanded chromatophore. There is some small indication of this in Figure 2a, but a panel or two with details about the pixel size of S. officinalis chromatophores and their slices would be welcome. This would help with the judgment of the robustness of the resolution of the analysis. Looking at the y-axis in Figure 5a, there is some indication that the chromatophore radius is only 1 to 8 pixels. Is this the case?

      Figure 5a does not show chromatophore radius; it shows the relative change in peak amplitude during an expansion event. At that point the chromatophore radius is likely larger, because it is the sum of the baseline radius and the size of the peak.

      (12) Line 246-7 - reword this sentence to avoid referring to Figure 3d in the narrative. Include it in parentheses instead.

      Addressed.

      (13) Lines 408 and 409 - missing references.

      Addressed.

      (14) Line 576 - salinity should be reported in parts per thousand, not per cent.

      Addressed.

      (15) Line 593 - how were animals <50mm fed?

      Animals smaller than 50 mm were fed Neomysis spp. or small Palaemonetes spp., as noted a few lines above the description for animals larger than 50 mm.

      (16) Line 847 - typo - '...putative motor units' ramifications...'

      Addressed.

      (17) Line 854 - better to write out the [chrom_id, label] info as narrative text rather than using the variable names.

      Addressed.

      (18) Line 876 - two typos '...were reared in an artificial...'

      Addressed.

      (19) Line 877 - please use the same salinity metric as used in the earlier part of the methods.

      Addressed.

      (20) Section 898-910 - equipment details would ideally include the location of the company. E.g. (BX51W1, Olympus, Tokyo, Japan).

      Addressed.

      Reviewer #3 (Recommendations for the authors):

      I am left with a number of questions that arise from the authors' work, some of which the authors themselves briefly mention in the technical limitations section.

      (1) In relation to the first weakness, do the authors know if the recruitment patterns they identify are likely to be the same when octopi perform visually-mediated camouflage to their environment?

      Thank you for this comment. We assume the reviewer is referring to S. officinalis. There seems to be a misunderstanding: our approach is designed to reveal the smallest independent functional units—motor units—that together generate skin patterns. The technique is fully applicable to an animal displaying camouflage, but the results would necessarily differ. Camouflage patterns are composed of relatively large shapes compared to individual motor units and arise from the coordinated activation of multiple units. Disentangling motor units requires decorrelated activity, whereas visually-evoked camouflage inherently drives correlated motor-unit activation by premotor control. To use an analogy, if our goal were to map the distribution and wiring of pixels on a screen, it would be more informative to broadcast a noise signal rather than display coherent images, as the noise produces decorrelated activity that allows the underlying structure to be resolved. We have clarified this important point in the early results section.
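      The screen analogy above can be illustrated with a toy simulation. This is a minimal sketch, not the authors' pipeline: the number of motor units, the four chromatophores per unit, the noise levels, and all function names are hypothetical values chosen for illustration. It compares the mean correlation between chromatophores of different units when the units are driven independently and when they share a common drive.

```python
import numpy as np

def simulate_chromatophores(unit_drive, unit_of_channel, noise_sd, rng):
    """Each chromatophore copies the drive of its motor unit plus independent noise.

    unit_drive: (n_units, n_timepoints) array of unit activity.
    unit_of_channel: integer array mapping each chromatophore to a unit.
    """
    signal = unit_drive[unit_of_channel]
    return signal + noise_sd * rng.normal(size=signal.shape)

def mean_pairwise_correlation(activity, unit_of_channel, same_unit):
    """Mean correlation over chromatophore pairs in the same or in different units."""
    corr = np.corrcoef(activity)
    same = unit_of_channel[:, None] == unit_of_channel[None, :]
    off_diagonal = ~np.eye(len(unit_of_channel), dtype=bool)
    mask = (same if same_unit else ~same) & off_diagonal
    return float(corr[mask].mean())

rng = np.random.default_rng(0)
n_units, n_time = 3, 2000                      # hypothetical toy sizes
unit_of_channel = np.repeat(np.arange(n_units), 4)

# Decorrelated condition: every unit has its own independent drive.
independent_drive = rng.normal(size=(n_units, n_time))
# Correlated condition: a shared drive dominates all units.
shared = rng.normal(size=n_time)
correlated_drive = shared + 0.2 * rng.normal(size=(n_units, n_time))

independent = simulate_chromatophores(independent_drive, unit_of_channel, 0.3, rng)
correlated = simulate_chromatophores(correlated_drive, unit_of_channel, 0.3, rng)

between_independent = mean_pairwise_correlation(independent, unit_of_channel, same_unit=False)
between_correlated = mean_pairwise_correlation(correlated, unit_of_channel, same_unit=False)
within_independent = mean_pairwise_correlation(independent, unit_of_channel, same_unit=True)
print(within_independent, between_independent, between_correlated)
```

      In this toy setting, only the independent-drive condition separates within-unit from between-unit correlations, which is the property that unit identification relies on.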

      (2) The authors provide indirect evidence that motor neurons innervate multiple chromatophores. Can sets of radial muscles within a chromatophore be innervated by multiple motor neurons? Is there neuroanatomical evidence or experiments that could perhaps shed light on this?

      Addressed above. Same question as #1(8).

      (3) Are multi-innervated chromatophores evenly distributed across the octopus's body? For instance, could the authors compare chromatophore recruitment over multiple patches on the animal from multiple regions?

      At present, we do not have sufficient data to quantitatively compare motor-unit structure or the distribution of multi-innervated chromatophores across different body regions of cuttlefish. However, we would not necessarily expect uniformity across the skin, as distinct body regions are associated with characteristic pattern elements (e.g., the white square on the central mantle or the thicker zebra stripes along the sides). It is therefore plausible that different motor-unit geometries and densities are differentially represented across regions to support these region-specific patterns. Future recordings spanning multiple patches and body locations will be required to test this question directly.

      (4) Relatedly, is there any idea of whether chromatophore size or age corresponds with the number of motor units within a single chromatophore?

      At present, our analyses are limited to single developmental time points, and we therefore cannot directly assess whether chromatophore size or age correlates with the number of motor neurons innervating an individual chromatophore. However, this is a question that our analysis framework is explicitly designed to address. Our custom pipeline, CHROMAS (Ukrow, Renard et al., 2025), includes tools for longitudinal image alignment that allow chromatophores to be tracked within the same animal across development. Applying these scripts to developmental datasets enables future analyses linking chromatophore growth or age to changes in the motor innervation of single chromatophores.

      I understand that a full resolution to the issues raised above may require substantial additional experiments. At a minimum, further discussion of these points with integration of existing literature would elevate the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.

      The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We apologize for the confusion. We have clarified this on page 3:

      “Results for the ‘Transformers’ model are computed by computing correlations separately for five different transformer models and then taking a simple average of these correlations. Results for each individual transformer are presented in Supplementary Information Figure S2.”
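      The distinction between averaging embeddings and averaging correlations can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the function names, the toy data, and the use of Pearson correlation over the upper triangle of cosine-similarity matrices are all assumptions made for the sketch.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between rows of an (n_sentences, dim) array."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T

def rsa_correlation(model_sim, brain_sim):
    """Pearson correlation over the off-diagonal upper triangles of two similarity matrices."""
    rows, cols = np.triu_indices(model_sim.shape[0], k=1)
    return float(np.corrcoef(model_sim[rows, cols], brain_sim[rows, cols])[0, 1])

def mean_model_correlation(embeddings_by_model, brain_sim):
    """Correlate each model with the brain separately, then average the correlations."""
    per_model = {
        name: rsa_correlation(cosine_similarity_matrix(emb), brain_sim)
        for name, emb in embeddings_by_model.items()
    }
    return per_model, float(np.mean(list(per_model.values())))

# Toy data: two hypothetical models with different embedding sizes.
rng = np.random.default_rng(0)
brain_sim = cosine_similarity_matrix(rng.normal(size=(6, 20)))
embeddings_by_model = {
    "model_a": rng.normal(size=(6, 8)),
    "model_b": rng.normal(size=(6, 32)),
}
per_model, average = mean_model_correlation(embeddings_by_model, brain_sim)
print(per_model, average)
```

      Because each model is reduced to its own similarity matrix before any averaging, embeddings of different dimensionality never need to share a semantic space.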

      (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.

      Following the suggestion, we have implemented two syntactic models and discuss the results on page 10:

      “We also found that purely syntactic models based on constituency parses (see Benepar and CFG) show poor correlations with brain activity (see Supplementary Information Figure S2). Examining the corresponding RSA matrices (see Figure S1), this seems to be due to such models being overly sensitive to syntactic form, and relatively insensitive to which words are assigned to different nodes within the syntactic tree. This is most evident for the edit-distance similarity metric, and to a lesser extent also for the subtree similarity metric. This finding highlights the value of hybrid approaches designed to appropriately balance sensitivity to lexical, syntactic, and compositional information in representing semantic information at the sentence level.”

      (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.

      Although the behavioural judgements were made by different participants and involved a different task from the neuroimaging experiment, we agree that the difference is surprising and warrants more detailed consideration. We have included a more detailed discussion of this issue on page 11:

      “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task, participants read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”

      (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.

      While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. Sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.

      (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.

      We agree that the placement of figures was not ideal in the previous draft. We have reworked the manuscript so that all figures appear closer to their mention in the text, and the figure in question (now Figure 3) appears in the correct order. We have also substantially revised the discussion and included subheadings to help guide the reader through the various issues we discuss.

      (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.

      We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.

      (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.

      We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Consider including a purely syntactic baseline model. For instance, parse each sentence into a constituency tree and compute tree edit distances between pairs of trees. This would allow you to construct a sentence similarity matrix based solely on syntactic structure, and may clarify the role of syntax in sentence representations.

      See our response to Public Review comment 2.

      (2) Instead of averaging embeddings across different transformer-based models, I recommend reporting RSA results for each model individually. For instance, compare one sentence-level model (e.g., SentBERT or SimCSE) and one general-purpose language model (e.g., GPT-2 or Llama).

      See our response to Public Review comment 1.

      (3) I suggest revisiting the structure of the Results section to improve the clarity and impact of your key findings. Consider which results are most central to the paper's claims and ensure they are presented in the main text. Less central analyses (e.g., the analysis on the grid-like pattern) might be better suited for the supplementary information. Presenting behavioral results prior to neuroimaging results could also improve logical flow by first validating model similarity estimates behaviorally.

      As mentioned in our response to Public Review comment 5, we have revised the ordering of the figures to improve the flow of the main manuscript. We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript. In addition, we believe that presenting the neuroimaging results first is appropriate as this is the primary and most important contribution of our study.

      Reviewer #2 (Public review):

      (1) The stimuli are not fully controlled for lexical content across conditions. Residual lexical differences between sentences could still influence both brain and model similarity patterns. To more cleanly isolate syntactic effects, it would be useful to systematically vary only a single structural element while keeping all other lexical content constant (e.g., the boy kicked the ball / the ball kicked the boy). It would be better to engage more with the minimal pair paradigm, which is widely used in large language model probing research.

      The reviewer rightly argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs; however, this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than traditional minimal pair studies, but which also are tailored to highlight differences which are of particular theoretical interest.

      Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies.

      We have added the following paragraph on pages 9-10 contrasting our approach to previous minimal-pair studies:

      “Another approach that has seen widespread use is the presentation of minimal sentence pairs that differ only in one specified aspect, for example, interchanging subject and object in a sentence (Frankland 2015, Wang 2016, Frankland 2020, Giglio 2024), or altering adjective-noun phrases to influence composition (Graves 2010, Schell 2017, Fyshe 2019, Ciapparelli 2025). Our approach is an extension of these approaches utilising more naturalistic and complex sentences, designed to facilitate comparison of a wider range of structural manipulations (see Table 1). In more completely characterising the representational structure of various computational models in response to different structural contrasts, we can more comprehensively evaluate their adequacy as models of semantic processing in the brain.”

      (2) The comparisons are done across fundamentally different model types, including static embeddings, graph-based parsers, and transformers. The inherent differences in dimensionality and training objectives might make the conclusion drawn from RSA inconclusive. Transformer embeddings typically occupy much higher-dimensional, anisotropic representational spaces, and their similarity structure may reflect richer, more heterogeneous information than models explicitly encoding semantic roles. A lower RSA correlation in this study does not necessarily imply that transformers fail to encode syntactic information; rather, they may represent additional aspects of meaning or context that diverge from the narrow structural contrasts probed here.

      The reviewer notes that low RSA correlations do not necessarily imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning.

      The reviewer also notes that transformer embeddings are highly anisotropic; however, we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli, as shown by the pattern of results for all models in Figure S2.

      (3) The interpretation of the RSA correlation largely depends on the understanding of models. The authors suggest that because hybrid models correlate better than transformers, this implies that transformers are inferior at representing syntax. However, this is not a direct test of syntactic ability. Transformers may encode syntactic information, but it may not be expressed in a way that aligns with the RSA paradigm or the chosen stimuli. RSA does not reveal what the model encodes, and the models might achieve a good correlation for non-syntactic reasons (e.g., length of sentence, orthographic similarity, lexical features).

      The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We have clarified this in a modified paragraph on page 11:

      “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure (Chang 2024), and probing studies have found that transformers represent information about syntax and word order (Clark 2019, Manning 2020). This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Supplementary Information Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”

      We also respectfully disagree with the reviewer’s suggestions that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.

      Reviewer #2 (Recommendations for the authors):

      (1) Model dimensionality: the interpretability of cosine similarity diminishes as the dimensionality increases, and there are some math tricks to work around it. To make a fair comparison among models with different dimensionalities, it would be better to apply some dimensionality-insensitive distance metrics.

      We thank the reviewer for this suggestion. We repeated all vector-based similarity calculations using the Dimension Insensitive Euclidean Metric (DIEM). As shown in Figure S9, the results are broadly similar, though with overall somewhat lower brain correlations for most transformers compared to cosine similarity.

      (2) Depending on the scope of the current study, if the authors would like to establish whether transformers are inferior to graph-based models in representing syntax, a linear classifier using the model embeddings would be sufficient. I think this would be a more direct assessment of model syntax ability than correlation with brain data.

      As we discuss in our previous responses, our objective in this study was not to assess how well transformers can represent syntax. Rather, the goal was to assess whether internal transformer representations have similar geometric properties to patterns of brain activation. Our results indicate that transformers do represent sentence structure, but in a different manner to the human brain.

      Reviewer #3 (Public review):

      (1) The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models, which are pinpointed here, in the general case, (some) Transformers are more human-like than the other models considered.

      The reviewer argues that we overstate some of our conclusions, as several transformers achieve higher brain correlations than the hybrid model when computed over all sentence pairs, as well as on the behavioural data. In response, we first note that our primary interest in this paper is in the block diagonal sentence pairs, as these were specifically designed to interrogate how different models represent sentence structure. The analysis over all sentence pairs is presented for comparison but is not the primary focus of this paper, as also reflected in the pre-registered prediction that our VerbNet-CN hybrid model would show higher brain correlations than transformers over this block diagonal subset.

      Second, we have included a new analysis in the revised manuscript (Figure S9) where we compute brain correlations controlling for the pattern of similarities observed in the primary visual cortex (averaged over participants), as a way to control for visual similarity. This added control substantially reduces the brain correlations of the transformers, such that they all have lower correlations than VerbNet-CN and AMR-smatch even over the set of all sentence pairs. We provide interpretation of this result in the discussion.

      Third, we would like to note that one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, thereby enabling them to be used to determine which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We have added a short discussion of this issue in the revised manuscript (page 10).

      (2) There may be confounds between the critical sentence structure manipulations and visual representations of sentence stimuli. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with visual cortex representations, and computational models tend to reflect the number of words/tokens/elements in sentences. Although the study commendably controls for confounds associated with sentence length, there could still be residual effects that remain. For instance, the Graph model correlates most strongly with the visual cortex despite these sentence length controls.

      We agree with the reviewer that this is a potential confound. As noted in the previous response, we have implemented a new control analysis in which we directly control for visual similarities as reflected in participant-averaged similarities of primary visual cortex activations in response to all stimuli. These results are shown in Figures S8-S11 in the SI. We show that, with this control, transformer correlations are reduced much more than those of graph and hybrid models. Also, we note that the AMR-smatch graph model shows high correlations with other brain regions even after removing correlations with the visual cortex (Figure S10). This indicates that the model represents a range of sentence features, including both superficial visual or length-related features, as well as semantic features that are represented in common with language and other cortical regions.
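      One common way to control for a covariate such as visual-cortex similarity is a partial correlation computed by residualisation. The sketch below illustrates that general technique and is not a description of the authors' exact procedure: the function names, the linear regression step, and the toy data are assumptions made for illustration.

```python
import numpy as np

def residualise(target, covariate):
    """Remove the least-squares linear fit of `covariate` (plus intercept) from `target`."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coefficients

def partial_correlation(model_sims, brain_sims, control_sims):
    """Pearson correlation of model and brain similarities after regressing out the control."""
    model_resid = residualise(model_sims, control_sims)
    brain_resid = residualise(brain_sims, control_sims)
    return float(np.corrcoef(model_resid, brain_resid)[0, 1])

# Toy data: model and brain similarities share only a "visual" component.
rng = np.random.default_rng(0)
visual = rng.normal(size=500)                  # stand-in for visual-cortex similarities
model = visual + 0.5 * rng.normal(size=500)
brain = visual + 0.5 * rng.normal(size=500)

raw = float(np.corrcoef(model, brain)[0, 1])
controlled = partial_correlation(model, brain, visual)
print(raw, controlled)
```

      In this toy case the raw correlation is driven entirely by the shared visual component, so it largely disappears once that component is regressed out; a model that also captures non-visual structure would retain a correlation after the control.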

      (3) Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences here because different similarity metrics applied to the same model produce positive or negative correlations with brain data.

      The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. In the revised manuscript we have incorporated an entirely new similarity metric for vector-based models (DIEM similarity), as well as an extended discussion of the effect of different similarity metrics for graph and hybrid models.

      Reviewer #3 (Recommendations for the authors):

      (1) Compute separate RSAs on each sentence pair type (especially Swapped), to quantify how each sentence type manipulation contributed to the divergence between model and brain. Although the manuscript is already brimming with analyses, I think squeezing this in would be helpful because the results currently rely on qualitative inspection of group-average scatter plots to interpret how sentence pair manipulations contributed to the divergence between Transformers and humans. The Swapped condition would appear to be the centrepiece of the title and manuscript, and potentially the only condition for which confounds associated with the surface form of sentence are controlled for (because sentences should be the same words in different orders). Thus, this analysis might see to the inconvenient visual cortex correlations in Figures 3d/e.

      We respectfully disagree that computing a separate RSA for each sentence pair type would be a useful additional analysis. The motivation for the construction of our stimulus set was to provide a range of variants of a given base sentence that alter the semantic meaning and lexical content (somewhat) independently. The purpose of the ‘modified’ sentences, for instance, is to construct sentences with a similar overall meaning but lower lexical similarity due to the inclusion of many modifier words. It is precisely the comparisons across the different pair types that provide information about how each model represents sentence semantics, so restricting an analysis to only a single subset would not be very informative. Another problem with this approach is that it would dramatically reduce the number of sentence pairs analysed, thereby decreasing statistical power. In the revised manuscript we have provided additional details regarding the motivation and rationale for how our stimulus set of 108 sentences was constructed, which should help to clarify this point. The following excerpt is from page 3:

      “Within each of the six subsets, we begin with a base sentence such as ‘the cameraman brought the equipment to the director’, which we then systematically modified in various ways to create different combinations of lexical and compositional similarity, in order to dissociate these two aspects of meaning (see Table 1 for further details).”

      (2) Explaining the motivation for the sentence stimulus types. I appreciated the careful design of the dataset, but I couldn't immediately work out the motivation for all the different sentence types, and why this selection was ideal to identify divergences with Transformers. For instance, given the goal of (approximately) controlling for lexical similarity whilst varying sentence meaning, I couldn't immediately see why stimulus blocks weren't all built from rearranging the same content words (as in the Swapped condition). The negative RSA correlation with the Mean model also made me stop and think - it seems like the more similar the words in a sentence, the more different their structure, and vice versa, but I wasn't clear that this was a design feature. Thus, a few extra words motivating the conditions could be helpful for the reader, and these might helpfully lead them to anticipate the negative RSA correlation.

      As noted in the previous response, in the revised manuscript we have expanded our explanation of the rationale for the construction of our 108 sentences. In particular, Table 1 in the methods section now includes two additional columns which summarise the intended combinations of lexical and overall sentence similarity which our sentence pairs are intended to satisfy.

      (3) Explanation for why different implementations and similarity computations between variants of ostensibly equivalent Graph / Hybrid models yielded widely divergent positive vs negative brain correlations, despite both positively capturing behavioural ratings. This might incorporate a brief intuitive explanation of how Graph model similarities were computed (e.g., what SMATCH and WWLK do). In light of the above, why do different similarity algorithms applied to the Graph model yield positive and negative correlations on the same brain (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Same goes for why Hybrid and Hybrid-AMR yielded positive vs negative correlations (e.g., Figure S2 - Graph / Graph-WL a,b, diag-pairs). Acknowledge that the brain results are sensitive to similarity computations in the Discussion.

      We appreciate this suggestion. We have added an extended consideration of these issues to the discussion (pages 10-11), as well as some additional details regarding the differences between the Smatch and WWLK metrics in the methods section (page 17).

      (4) Acknowledgement and explanation of why the human similarity ratings were poor at explaining brain data in Figure 2a,b (right column diag-pairs). The poor behaviour vs brain match is indirectly implied in the Discussion as "the comparison between behavioural and fMRI data is somewhat difficult owing to the difference in task structure." However, I would suggest being upfront and explicitly mentioning and explaining the poor brain match in Figures 2a and b, because the reader will notice and wonder - especially because the models correlate strongly with the behavioural data without the models doing the human behavioral task (though this could be a possibility, see later).

      As suggested, we have included a passing reference to this in the presentation of our main results on page 5, and a lengthier discussion on page 11:

      “Our study has several limitations. First, we found a surprisingly low correlation between behavioural ratings and brain activations (see Figure 2). This may be partly explained by differences in task structure. In the behavioural experiment, participants viewed many pairs of related sentences, and were explicitly asked to pay attention to differences in the words of each sentence. In contrast, in the fMRI task participants (who were not the same as the behavioural task participants) read one sentence at a time without an explicit comparison. In addition, we suspect that presentation of so many sentence pairs with highly similar structures may have biased the way in which participants rated sentence similarity. Modifications to the behavioural task to mitigate these aspects may reduce the divergence between behavioural and brain findings.”

      (5) Brief explanation of why model vs brain correlations tended to be strongest in the visual cortex (Figure 3d,e). Currently, this issue is only mentioned in passing, however, it seems worthy of further comment.

      We thank the reviewer for highlighting this issue. We have added discussion of the potential for visual confounds at several points in the revised manuscript, including the ‘Neuroscience of semantics’ subsection on page 11. As noted, we have also added a new analysis in which we compute correlations controlling for the average RSA similarities of the primary visual cortex. We find that this additional control significantly reduces correlations for most transformer models, but produces only a more modest reduction in the correlations for most of the graph and hybrid models, particularly VerbNet-CN (see Figures S8-S11).
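      The logic of controlling for primary visual cortex similarities can be sketched as a partial correlation: regress the control similarities out of both the model and the region-of-interest similarities, then correlate the residuals. The sketch below uses Pearson correlation, simulated similarity vectors, and hypothetical function names; these are assumptions for illustration and not necessarily the exact procedure used in the manuscript.

```python
import numpy as np

def residualize(y, covariate):
    """Remove the least-squares linear fit of `covariate` (plus intercept) from y."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def partial_correlation(model_sim, roi_sim, control_sim):
    """Correlation between model and ROI similarities after removing the control."""
    return np.corrcoef(residualize(model_sim, control_sim),
                       residualize(roi_sim, control_sim))[0, 1]

# Toy case (assumption): model and ROI similarities are both driven
# by a shared visual component and are otherwise independent.
rng = np.random.default_rng(0)
n_pairs = 500
visual_sim = rng.normal(size=n_pairs)
model_sim = visual_sim + 0.5 * rng.normal(size=n_pairs)
roi_sim = visual_sim + 0.5 * rng.normal(size=n_pairs)

raw = np.corrcoef(model_sim, roi_sim)[0, 1]
controlled = partial_correlation(model_sim, roi_sim, visual_sim)
print(round(raw, 3), round(controlled, 3))
```

      In this toy case the raw correlation is high while the controlled correlation is close to zero, because the only shared signal is the visual component. A model whose correlation survives the control would retain a substantial value in the second number.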

      (6) Softening/clarifying some statements that could be misconstrued as suggesting Transformers were universally inferior models. Statements made in the Abstract/Discussion initially came over to me as implying that Transformers were universally inferior models when compared to the Graph/Hybrid models - but this appears only to be true when one looks at analyses conducted within block diagonal sentence subsets. Otherwise, when analyses are conducted on all sentences (between and within blocks, Figure 5) Llama 3 L2 provides by far the strongest brain model. Transformers also appear to yield the strongest accounts of the behavioural data, whether tested on block diagonal or all sentence pairs (Figure S3). To remedy this, I would suggest softening some statements in the Abstract/Discussion that could be misconstrued as suggesting that Transformers were universally inferior. I would also suggest explicitly acknowledging that when the entire dataset was analyzed, Transformers were most accurate, and that (some) Transformers best accounted for the behavioural data.

      We agree that there was some lack of precision in certain sections of the previous draft about the conclusions to be drawn regarding the representational capacities of transformers. We have revised the abstract and conclusion to better reflect our intended message, which is that transformers certainly can represent sentence structure and semantic roles, but that the way in which they do this (through vector representations in their hidden layers) is significantly different to how such features are represented in the human brain. In particular, we have included this new text on page 10:

      “We emphasise that our results do not show that transformers fail to represent syntactic or semantic role information. Indeed, large language models show clear capabilities of correctly interpreting sentence structure, and probing studies have found that transformers represent information about syntax and word order. This is consistent with our finding that directly prompting GPT-4 to rate sentence similarity yields very high correlations with human judgements (see Figure S3). Nonetheless, the fact that transformers can encode and utilise structural information to perform linguistic tasks does not mean that they effectively utilise this information to construct a brain-like representation of sentence meaning.”

      (7) Given that GPT-4 was already deployed to parse semantic roles for the hybrid model, and GPT-4 should be able to generate reasonable similarity ratings between sentence pairs, it struck me that an interesting addendum could be to use GPT-4 similarities derived from the human behavioral task to interpret both brain and human behavioral data. This might also help support the case for conducting analyses within a similarity-based framework.

      We appreciate this suggestion. We have added this model (GPT-4 ratings of sentence similarity) to the revised manuscript (see Figures S1-S3).

      Other changes

      As noted by reviewer 3, the full set of sentence pairs was missing from the previous draft. They have been added to the SI of the revised manuscript.

      We have renamed the Graph and Hybrid models in the manuscript to AMR-Smatch and VerbNet-CN respectively, to make clearer which models these terms refer to, and also to better differentiate them from the newly added constituency parse graph models.

      We have thoroughly revised the discussion section, incorporating feedback from all reviewers regarding areas needing additional depth.

      We have added subsections to the discussion to help the reader navigate the now lengthier section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript is much stronger now after the revision which incorporates the requested changes. We added results of new experiments and additional analyses. Although these new insights did not change the previous conclusions, we significantly reworked the Discussion and added further references to clarify the conclusions we want to make.

      Please note that we used ORN as acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs associate with the olfactory receptor co-receptor (Orco) to be trafficked to the membrane of the cilium of the ORN, where they can be contacted by pheromones and odorants. In Manduca sexta, evidence is accumulating for G-protein coupled metabotropic pheromone transduction and not for OR-Orco dependent ionotropic transduction, as shown for Drosophila melanogaster. In both insect species, besides its chaperone function, Orco can form leaky cation channels, which can regulate the spontaneous spiking activity of ORNs. In this study, we explored this role of Orco.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel open time probability. In M. sexta hawkmoths, we have already published that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2013). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco ligand candidates (OLCs) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). There, we also demonstrated that OLC15 dose-dependently antagonizes the VUAA1-dependent activation of Orco.

      Furthermore, we tested other published Orco antagonists, which were characterized in heterologous assays, in primary cell cultures of hawkmoth ORNs, as well as in in vivo assays in intact hawkmoths. We focused on amiloride-derived antagonists, because we previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific antagonists but instead affected different ion channel targets depending on the time of day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best in hawkmoth ORNs to target Orco pharmacologically. Based on those results and on comparative tests with other published Orco antagonists, we have since used a dose of 50 µM OLC15 in all further experiments as the most adequate to antagonize Orco function in Manduca. In the current study, we focus on Orco without excluding the possibility that other ion channels in the ORNs contribute to the control of membrane potential rhythms.

      We have clarified the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTTF-based model.

      We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We included these additional qPCR experiments and edited the manuscript to make clear that we measure transcript abundance, but we will not perform snRNAseq analysis due to time and financial constraints.

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). For clarification, we added the 2015 citation to the Modeling chapter in the Methods section.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs, but we know that both are expressed in the pheromone-sensitive ORNs. Thus, as the referee suggests, we added text regarding the presence and localization of OR-Orco heteromers. Consistent data collected across different experiments (heterologous expression systems, primary cell cultures of hawkmoth ORNs, in vivo/in situ studies) support that Orco homomers are present in hawkmoth ORNs. In addition to co-expression of MsexOrco and MsexSNMP-1 with either MsexOr-1 or MsexOr-4 in a heterologous expression system, MsexOrco expression alone was already sufficient to increase intracellular Ca²⁺ levels spontaneously, as a result of its property as a leaky, non-specific cation channel, and in response to VUAA1 application (Nolte et al., 2013). Both in developing hawkmoth pupae and in differentiating primary cell cultures of hawkmoth ORNs, Orco expression started during a developmental time window in which ORNs did not yet express pheromone receptors but in which Orco affected spontaneous activity and intracellular Ca²⁺ levels dependent on VUAA1 (Nolte et al., 2016). In vitro patch clamp studies of differentiating cultured hawkmoth ORNs during this time window of pupal development characterized ion channels/currents with properties of Orco as a leaky, non-specific cation channel/current that depends on protein kinase C and cyclic nucleotides (Dolzer et al., 2021, 2008; Krannich and Stengl, 2008; Stengl, 1993). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window in which ORNs already express spontaneous activity, but they do not heteromerize with pheromone receptors.
      However, we do not know whether and in what ratio homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths, because none of the OR-specific antibodies tested worked in immunocytochemical studies of hawkmoth antennae (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990). Our hypothesis of a differential distribution of Orco homomers in the soma and dendrite compartments, and of OR-Orco heteromers in the cilia, is based on the differential immunocytochemical localization of Drosophila ORs mainly in the cilia compartment (Benton et al., 2006).

      We clarified our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms, corresponding with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can easily be masked by stress, such as the disturbances during an experimentally very challenging long-term recording over several days. In addition, we have observed over the years in our animal-raising facility that, in 17:7 light-dark cycles, the originally nocturnal hawkmoth M. sexta distributes its activity over the course of the day, so that we find nocturnal as well as diurnal individuals. Thus, light-dark cycles were not enough to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths, in addition to stress signals, rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Because we focus on spontaneous activity and not on pheromone-dependent physiology in this study, we used isolated males that were never exposed to the female pheromones, taking phase dispersal into account. Therefore, it became necessary in free-running conditions to first determine the respective behavioral rhythm for each animal, and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a phase-dispersed free-running population. As requested by the referees in point (7), we added RAIN to test for rhythmicity in each of our recordings and revised the manuscript accordingly.

      Furthermore, in preliminary experiments we briefly exposed hawkmoths to pheromone the night before the start of the experiment. However, we failed to obtain phase-synchronized spiking rhythms. Most likely, a circadian pattern of pheromone exposure would have been necessary as a zeitgeber, which could not be used here due to long-term pheromone-dependent effects on spiking activity. These results are added as a supplementary figure to Fig 3.
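      The argument that rhythms average out in a phase-dispersed population can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the alignment procedure used in the manuscript: it assumes a fixed 24 h period (free-running periods may deviate from this), hourly bins, simulated firing rates, and a simple cosinor fit to estimate each animal's peak time. All function names are hypothetical.

```python
import numpy as np

def acrophase_hours(rate, t_hours, period=24.0):
    """Peak time of the best-fitting cosine with the given period (cosinor fit)."""
    omega = 2 * np.pi / period
    design = np.column_stack([np.ones_like(t_hours),
                              np.cos(omega * t_hours),
                              np.sin(omega * t_hours)])
    _, a, b = np.linalg.lstsq(design, rate, rcond=None)[0]
    return (np.arctan2(b, a) / omega) % period

def align_to_peak(rate, t_hours, period=24.0):
    """Circularly shift a recording so that its fitted peak falls at t = 0."""
    dt = t_hours[1] - t_hours[0]
    shift = int(round(acrophase_hours(rate, t_hours, period) / dt))
    return np.roll(rate, -shift)

# Eight simulated animals with identical rhythms but peaks 3 h apart.
rng = np.random.default_rng(1)
t = np.arange(48.0)                       # 48 h in 1 h bins (two full cycles)
phases = np.arange(0.0, 24.0, 3.0)
recordings = [5.0 + 2.0 * np.cos(2 * np.pi * (t - p) / 24.0)
              + rng.normal(0.0, 0.2, t.size) for p in phases]

raw_mean = np.mean(recordings, axis=0)
aligned_mean = np.mean([align_to_peak(r, t) for r in recordings], axis=0)
print(round(float(np.ptp(raw_mean)), 2), round(float(np.ptp(aligned_mean)), 2))
```

      The unaligned average is nearly flat although every simulated animal is strongly rhythmic, whereas the aligned average recovers the rhythm. The circular shift is only appropriate here because the simulated recording spans a whole number of cycles; real recordings would need a more careful treatment, and alignment should be combined with an independent rhythmicity test on each recording to limit the risk of bias.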

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording sites are located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We clarified this in the Methods section.

      In addition, our lab showed with antibody stainings against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Lee and Strausfeld (1990) mapped all types of antennal sensilla, and together with the pheromone-dependent tip recordings of Kaissling et al. (1989) it was shown that most of the male antennal sensilla are pheromone-sensitive long trichoid sensilla, with one of the two innervating ORNs always responding to bombykal, ensuring high sensitivity of pheromone detection. Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs (review: (Stengl, 2010)). This would indicate that all ORNs, whether they express ORs sensitive to pheromone or general odorants, could potentially share the same Orco-dependent spontaneous activity rhythms. Furthermore, in our lab, different experimenters who recorded in different years from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (c.f., Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum and is shared by all types of OR-Orco expressing ORNs.

      (6.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after previous phosphorylation via protein kinase C (PKC; review: (Wicher and Miazzi, 2021)). Since Orco is very conserved among insect species, it is likely that PKC- and cGMP/cAMP-dependent regulations are present for Orco in other insect species. To test this, we are currently characterizing second messenger-dependence of spontaneous spiking activity, which is the focus of a follow-up manuscript. Nevertheless, to provide more evidence for our hypothesis of the current manuscript, we added a new set of tip-recording experiments that demonstrate cAMP-dependent gating of Orco. Because of the addition of this figure, we merged figures 8-10 into Figure 8 and added the cAMP data as Figure 9.

      (6.2) … and the PTTF model proposed is somewhat disappointing.

      For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper that we refer to in the manuscript (Stengl and Schneider, 2024). We added clarification of how Orco activation can influence cAMP levels. A more elaborate PTFL clock model including many more of the identified ion channels in hawkmoth ORNs is the focus of another manuscript to come.

      (6.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by HEK cell heterologous expression studies of Orco but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on hawkmoth primary cell cultures of olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (Stengl, 2010; Stengl and Funk, 2013; Takagi et al., 2025; Wicher and Miazzi, 2021). In contrast, the hypothesis suggesting a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species rests on very sparse experimental evidence, primarily from heterologous expression systems such as HEK cells, which lack the insect’s WT molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis relies heavily on the argument that an inverse 7TM receptor cannot couple to G-proteins, which lacks careful backup via biochemical and structural studies. In addition, the ionotropic hypothesis lacks support from carefully performed physiological in vivo studies in different insect species that pay attention to the analysis of the distinct kinetic components of the ORNs’ odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication by Schneider et al. (2025)). We added references to the possible odor transduction mechanisms to the introduction.

      (6.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for conserved known cyclic nucleotide binding sites on Orco, this does not exclude indirect cAMP effects, e.g., via Orco subunits complexing with other molecules under direct cAMP control, such as other ion channel subunits. Furthermore, it does not exclude so far unknown binding sites, or sites that fold out only after a specific sequence of previous phosphorylations of the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after previous PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a zeitgeber time-dependent PKC- and cyclic nucleotide-dependent modulation of Orco. These detailed studies will be published in a follow-up publication. In the revised version of this manuscript, we added tip-recording experiments that indicate cAMP involvement in Orco gating (new Figure 9).

      (7) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable enough to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      The long-term tip-recordings of intact hawkmoths are very challenging and take a very long time to accomplish; thus, we are very happy that we succeeded in obtaining so many of them (N=40). We are thankful for the reviewers’ suggestion to use RAIN, since this analysis revealed circadian rhythms in 7 of 11 LD recordings, 8 of 12 DD recordings, and 2 of 12 OLC15 recordings. Please see also our response to (4) above, which comments on the phase dispersal of activity rhythms observed in our experiments, as well as in the behavior of hawkmoth males in the mating cage.
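
      As a rough illustration of how these counts relate to the 50% benchmark raised in comment (7), the following minimal sketch tallies the fraction of RAIN-positive recordings per condition. The counts are those stated in the response; the function names and the use of a simple threshold are assumptions made for this example and are not part of the authors' analysis.

```python
# Recordings with a significant circadian rhythm according to RAIN,
# as (rhythmic, total) per condition, taken from the response text.
rain_results = {
    "LD": (7, 11),
    "DD": (8, 12),
    "OLC15": (2, 12),
}


def rhythmic_fraction(rhythmic, total):
    """Fraction of recordings classified as rhythmic."""
    return rhythmic / total


def summarize(results, threshold=0.5):
    """Return {condition: (percent rhythmic, exceeds threshold?)}.

    The 0.5 threshold mirrors the reviewer's '>50% of recordings' remark;
    it is an illustrative benchmark, not a statistical test.
    """
    summary = {}
    for condition, (rhythmic, total) in results.items():
        fraction = rhythmic_fraction(rhythmic, total)
        summary[condition] = (round(100 * fraction, 1), fraction > threshold)
    return summary


summary = summarize(rain_results)
for condition, (percent, above) in summary.items():
    print(f"{condition}: {percent}% rhythmic, above 50%: {above}")
```

      Under this tally, LD and DD both exceed the benchmark, whereas OLC15 does not.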

      (8) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 µM OLC15).

      According to the valuable suggestions of the referees, we used RAIN to detect circadian rhythms in the spiking attributes in each individual animal. Since only 2 of 12 animals displayed a circadian rhythm in OLC15, statistical comparison of circadian amplitudes is not possible. We revised the results section accordingly and added to the figure legend to make it clearer that the heat maps in Fig 5 are representative from one animal each and not averages across animals.

      As the reviewer states correctly in (7), wavelet results of circadian rhythmicity must be interpreted carefully because of the low number of circadian cycles in ~3-4 day recordings. Since the heatmaps in Figure 5 visually revealed the presence of ultradian rhythms, the main focus of the wavelet analysis in Figure 6 is in the detection and quantification of ultradian periods up to 20 h.
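
      The caution about short recordings can be made concrete with a simple frequency-resolution argument. For a discrete Fourier transform of an evenly sampled recording of duration T, the resolvable periods are T/k for whole numbers k. The sketch below lists these periods for 3- and 4-day recordings; it assumes a plain DFT and is only an analogy for the wavelet analysis, not a reproduction of the authors' pipeline. All function names are hypothetical.

```python
def fourier_periods(duration_h, n_bins=5):
    """Periods (hours) of the first n_bins non-zero DFT frequency bins."""
    return [duration_h / k for k in range(1, n_bins + 1)]


def neighbours_of(period_h, duration_h, n_bins=10):
    """Closest resolvable periods just below and just above a target period."""
    periods = fourier_periods(duration_h, n_bins)
    shorter = max((p for p in periods if p < period_h), default=None)
    longer = min((p for p in periods if p > period_h), default=None)
    return shorter, longer


for days in (3, 4):
    duration = 24.0 * days
    print(days, "days:", [round(p, 1) for p in fourier_periods(duration)])
    print("  nearest bins around 24 h:", neighbours_of(24.0, duration))
```

      With only three cycles, the bins adjacent to 24 h lie at 18 h and 36 h, so a circadian period cannot be localized finely, whereas periods well below 20 h fall on a much denser grid of bins.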

      We revised the Methods section to include references to previous experiments that characterized the effect of different doses of OLC15 and other Orco antagonists and agonists in M. sexta antennae (Nolte et al., 2016). Please see also our response to (1).

      (9) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We revised the manuscript accordingly and clarified which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (10.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, involving also all the molecules suggested by the referees. These studies are ongoing, are very time- and money-consuming, and are beyond the scope of this manuscript. However, we added a set of experiments to this manuscript in which we demonstrate that the effect of increased cAMP on the spontaneous spiking activity is mediated by Orco (new Figure 9).

      (10.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments that test circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and will be part of another publication. In section 6.2 we already stated that our experiments do not exclude that Orco is under indirect control of the TTFL. We revised our discussion accordingly.

      The published experiments on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrated that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells that build the transepithelial potential gradient via vATPases that generate the high K⁺ concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of afferent neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      (a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We revised the discussion accordingly.

      (b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We added those experiments to the revised version of the manuscript (see our response to (2)).

      (c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We added possible negative feedback elements to the Discussion to explain how our proposed PTFL could in principle work independently of the TTFL clock.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      We have revised the discussion accordingly.

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.

      These large fluctuations stem from doing experiments at different seasons (higher temperature and humidity in the summer months, lower temperature and humidity in winter). Throughout each individual experiment, conditions were stable. We clarified the Methods section accordingly.

      Recommendations for the authors:

      The authors should post the code for their computational model to a repository like GitHub.

      The code for the computational model is now available at https://github.com/a-c-schneider/VijayanForlinoEtAl2025_Model.git

      References

      Benton R, Sachse S, Michnick SW, Vosshall LB. 2006. Atypical Membrane Topology and Heteromeric Function of Drosophila Odorant Receptors In Vivo. PLOS Biology 4:e20. DOI: https://doi.org/10.1371/journal.pbio.0040020

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. DOI: https://doi.org/10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. Journal of Experimental Biology 206:1575–1588. DOI: https://doi.org/10.1242/jeb.00302

      Dolzer J, Krannich S, Stengl M. 2008. Pharmacological Investigation of Protein Kinase C- and cGMP-Dependent Ion Channels in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Chemical Senses 33:803–813. DOI: https://doi.org/10.1093/chemse/bjn043

      Dolzer J, Schröder K, Stengl M. 2021. Cyclic nucleotide-dependent ionic currents in olfactory receptor neurons of the hawkmoth Manduca sexta suggest pull–push sensitivity modulation. European Journal of Neuroscience 54:4804–4826. DOI: https://doi.org/10.1111/ejn.15346

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Frontiers in Cellular Neuroscience 12:218. DOI: https://doi.org/10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. DOI: https://doi.org/10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Current Biology 34:1414-1425.e5. DOI: https://doi.org/10.1016/j.cub.2024.02.042, PMID: 38479388

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. DOI: https://doi.org/10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proceedings of the National Academy of Sciences 108:8821–8825. DOI: https://doi.org/10.1073/pnas.1102425108

      Kaissling KE, Hildebrand JG, Tumlinson JH. 1989. Pheromone receptor cells in the male moth Manduca sexta. Archives of Insect Biochemistry and Physiology 10:273–279. DOI: https://doi.org/10.1002/arch.940100403

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. Journal of Experimental Biology 172:345–354. DOI: https://doi.org/10.1242/jeb.172.1.345

      Krannich S, Stengl M. 2008. Cyclic Nucleotide-Activated Currents in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Journal of Neurophysiology 100:2866–2877. DOI: https://doi.org/10.1152/jn.01400.2007

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. DOI: https://doi.org/10.1038/22566

      Lee JK, Strausfeld NJ. 1990. Structure, distribution and number of surface sensilla and their receptor cells on the olfactory appendage of the male moth Manduca sexta. Journal of Neurocytology 19:519–538. DOI: https://doi.org/10.1007/BF01257241

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. Journal of Biological Rhythms 22:502–514. DOI: https://doi.org/10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. DOI: https://doi.org/10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. DOI: https://doi.org/10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. Journal of Biological Rhythms 22:43–57. DOI: https://doi.org/10.1177/0748730406295462, PMID: 17229924

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. Journal of Biological Rhythms 29:318–331. DOI: https://doi.org/10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. Journal of Biological Rhythms 27:388–397. DOI: https://doi.org/10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. DOI: https://doi.org/10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. DOI: https://doi.org/10.1523/ENEURO.0376-24.2024, PMID: 39880675

      Stengl M. 2010. Pheromone Transduction in Moths. Frontiers in Cellular Neuroscience 4:133. DOI: https://doi.org/10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. Journal of Comparative Physiology A 174:187–194. DOI: https://doi.org/10.1007/BF00193785

      Stengl M. 1993. Intracellular-Messenger-Mediated Cation Channels in Cultured Olfactory Receptor Neurons. Journal of Experimental Biology 178:125–147. DOI: https://doi.org/10.1242/jeb.178.1.125

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. Journal of Comparative Physiology A 199:897–909. DOI: https://doi.org/10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. Journal of Neuroscience 10:837–847. DOI: https://doi.org/10.1523/JNEUROSCI.10-03-00837.1990, PMID: 2319305

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Frontiers in Physiology 14:1243455. DOI: https://doi.org/10.3389/fphys.2023.1243455

      Takagi S, Abuin L, Mermet J, Lee D, Benton R. 2025. A GPCR signaling pathway in insect odor detection. DOI: https://doi.org/10.1101/2025.10.03.680299

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Current Biology 14:638–649. DOI: https://doi.org/10.1016/j.cub.2004.04.009, PMID: 15084278

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS (Eds). Insect Biology in the Future. Academic Press. p. 735–763. DOI: https://doi.org/10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell and Tissue Research 383:7–19. DOI: https://doi.org/10.1007/s00441-020-03363-x

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This rigorous and creative study uses an elegant combination of metabolomics, transcriptomics, and budding yeast molecular genetics to discover that (i) activating AMPK to maintain mitochondrial respiration fueled by cytosolic Acetyl CoA and (ii) increasing fatty acid synthesis independent of respiration drive independent pathways that increase the fitness of replicatively-aged budding yeast cells, albeit without increasing their lifespan. This work will be of interest to scientists in the field of aging and metabolism. Some clarifications in the text would address the following concerns, which would increase the impact of the study:

      (1) What does activation of AMPK (via PGDP-Sak1 expression) do to the replicative lifespan? How many bud scars, in general, do the subpopulations that are older - yet have less Tom70 (increased mitochondrial fitness) - have, after the 48 hrs time point that they are examining? How many divisions occurred in this 48hr time period - i.e. is it long enough to have all cells reach the end of their replicative lifespan? This information is important to rule out that a subset of the mutant cells just divided faster and hence had more divisions within 48 hrs (growing faster and living longer are different things). Having identical growth curves doesn't indicate per se that they all divide at the same rate, as there may be a subpopulation that divides faster and a subpopulation that doesn't grow so well.

      Increasing AMPK activity increases replicative lifespan [PMID: 25869125], but given our finding that AMPK activation splits the population, such replicative lifespan assays are hard to interpret. Bud scar counts have a similar issue. Hence we restricted the lifespan and bud scar analyses to wt and A2A, which are more homogeneous (Figures S2 B and E). A2A cells at 48h have ~25% more bud scars than wt cells. Yes, by 48h most of the cells have lost viability (Figure 2E). The reviewer is correct that you can't properly compare the lifespan curves if the cells divide at different rates, hence our follow-up test comparing the viability of wt at 48h with that of A2A at 40h, after we had confirmed that these timepoints captured cells at equivalent replicative ages (Figure 2D,E). This shows that viability of A2A is slightly lower than wt at matched age, indicating a slightly shorter lifespan.

      (2) A2A cells do not have an extended replicative lifespan (RLS) but show an increase in the "low senescence" population (Figure 2). If the cells are not becoming senescent, why don't they have longer RLS? Not having a longer lifespan seems inconsistent with the statement that "bud scar counting confirmed that A2A cells reach a higher age than wild type", which comes back to how many times the cells can divide in the 48hr timepoint studied and their rate of cell division? Also, the lifespan curve shown is plotted against time, not cell division number, which does not take into account different division times of cells within the population (described above). It would be much more useful to show standard lifespan curves showing cell division numbers per lifespan per cell.

      Our observation that cells can reach the end of life without senescing is consistent with other studies that have studied the life course of individual cells by microscopy [PMID: 31291577, 32675375]. These studies always highlight some proportion of the cells that reach the end of life with no or minimal senescence, though this fraction varies with the experimental system. The question of why cells lose viability without senescing is a complete unknown in the field, but reflects a wider lack of consensus as to why yeast lose viability with replicative age.

      We are wary about making strong statements on lifespan for exactly the reason the reviewer picks out. In liquid culture we can only assess viability over time, and it is clear from the comparison of liquid and solid media lifespans performed by the Gottschling lab [PMID: 19652178] that the culture system has a huge effect on lifespan, with cells in classical microdissection-based lifespan assays living far longer than they do in liquid. This of course means that classical microdissection assays are not very useful for A2A, so we are left with an unsatisfactory approximation. We have therefore restricted our conclusion on lifespan to simply say that the lifespan of A2A cells is not extended, which our data in Figures 2D, 2E and S2B do support (see also answer to Q1), and therefore, with the majority of A2A cells showing low senescence marks and high fitness at 48h, we can conclude that lifespan and fitness loss must be separable.

      We will note these limitations of lifespan measurements in the manuscript.

      (3) Increased "fitness" of the old cells is implied from the increased size of the colonies that the old cells can make. However, this is a measure of the fitness of the daughters per se, not the old mother cells. Are the old mothers just passing on healthier mitochondria and more lipids to the daughters, such that they can divide more times? If the aged cells have an "increased fitness", why don't they divide more times themselves (i.e. live longer?).

      Yes, colony growth speed is defined by daughter cell replication, and as long as the daughters and subsequent generations divide at the same rate irrespective of whether they come from young or old mothers, the size of the colony after 24 hours varies based on the time it took the initial mother to produce a daughter. This is what the assay really measures. We note that aged wildtype mothers often do not divide at all in the first 24 hours after being put on an agar plate (hence the tiny reported colony size), even though they do eventually produce a daughter which then forms a colony, whereas A2A cells tend to produce the first daughter rapidly whether young or old. It is known that daughters of aged wildtype mothers also divide slower, which will also contribute to differences in colony size, and this may well result from a lipid and/or mitochondrial contribution, but the primary driver of colony size in 24 hours is the time the mother took to initially divide. We will add this detail to the manuscript.
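
      The reasoning above can be sketched as a toy growth model: if every cell after the first division doubles at the same rate, the colony size at 24 hours is set almost entirely by how long the mother took to produce her first daughter. The model below is a minimal illustration under assumed values; the 1.5 h doubling time, the delays, and the function name are hypothetical and are not measurements from the study.

```python
def colony_size(hours, first_division_delay_h, doubling_time_h=1.5):
    """Approximate cell number in a colony founded by a single mother.

    Assumes the mother needs `first_division_delay_h` hours to produce her
    first daughter, after which all cells double every `doubling_time_h`
    hours (continuous approximation, identical rate for all descendants).
    """
    if hours < first_division_delay_h:
        return 1.0  # the mother has not divided yet
    growth_time = hours - first_division_delay_h
    # Two cells exist right after the first division, then exponential growth.
    return 2.0 * 2.0 ** (growth_time / doubling_time_h)


# Hypothetical delays before the first division: prompt, slow, and >24 h.
for delay in (1.5, 12.0, 30.0):
    print(f"delay {delay:>4} h -> {colony_size(24.0, delay):.0f} cells at 24 h")
```

      In this toy model, a mother that divides promptly founds a colony more than a hundred times larger at 24 hours than one that waits 12 hours, and a mother that has not divided within 24 hours remains a single cell, even though all descendants grow at the same rate.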

      As noted above, the mechanistic basis of lifespan is unknown, but although senescence can shorten lifespan, our work and that of others shows that lifespan is still limited in the absence of senescence.

      (4) The statement is made that "these experiments define two classes of aging cells with distinct metabolic needs, coherent with the model of two aging trajectories previously proposed (referencing Nan Hao's work)". However, the big difference here is that in Nan Hao's work, their two aging trajectories influenced the length of lifespan, but that does not appear to be the case here. That distinction should be made clear. Perhaps the authors could also speculate as to why the A2A yeast stops dividing after presumably the same number of cell divisions, even though they have an activated AMPK and activated fatty acid synthesis pathway.

      We will add this distinction. As noted above, we are wary of making strong statements regarding lifespan as the assays we can do in liquid culture are limited. We are therefore similarly wary about speculating about causes for the lack of lifespan difference because in reality all we can do is rule out a big effect. We would love to speculate on why the A2A cells don't have an extended lifespan, but at this point we don't have any good ideas on this point!

      (5) I am a bit confused by the use of the word "senescence" by this lab here and in their previous growth on galactose studies. If yeast don't senesce, which is usually defined as an irreversible arrest of the cell cycle where cells stop dividing, shouldn't the yeast that do not senesce still be dividing and hence have a longer lifespan? Should a different term be used rather than senescence? Such as "fitness late in life". The authors giving their definition of senescence may help reduce this apparent contradiction.

      We completely agree that this is confusing, and we noted this distinction in the Introduction. Use of the term senescence to mean a loss of fitness late in life in yeast stems from the classical definition of senescence as applied to whole organisms. However, the term senescence as applied to cells has a more specific meaning in terms of the cell cycle, as the reviewer notes. As an individual S. cerevisiae is both a cell and an organism, the terminology clashes. However, the marker we largely employ (Tom70-GFP), which in our hands is a very good proxy for fitness, was originally defined as marking the senescence entry point (SEP), so overall we feel we can't avoid the term.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigate how cytosolic acetyl-CoA metabolism influences replicative aging in budding yeast. They propose that acetyl-CoA regulates aging through three major pathways: (1) mitochondrial transport to support mitochondrial function, (2) fatty acid synthesis, and (3) global protein acetylation. The data show that AMPK activation promotes mitochondrial import of acetyl-CoA and partially mitigates mitochondrial decline in a subset of aging cells.

      Furthermore, the engineered A2A strain, which enhances mitochondrial acetyl-CoA utilization while relieving inhibition of fatty acid synthesis, increases the proportion of cells exhibiting a "low senescence" phenotype.

      Overall, this is a thoughtful and potentially impactful study that advances our understanding of metabolic control of aging. Addressing the points below, particularly by refining interpretations and, where feasible, incorporating additional analyses, will further strengthen the manuscript and its conclusions.

      Strengths:

      The study has several notable strengths. It addresses an important question by shifting the focus from lifespan to preservation of late-life fitness, which is highly relevant to aging biology. The work integrates metabolic, genetic, and functional analyses to link cytosolic acetyl-CoA flux with distinct aging outcomes, and the engineering of the A2A strain provides a clear and elegant demonstration of how coordinated pathway modulation can improve cellular fitness.

      Weaknesses:

      (1) While the manuscript focuses on mitochondrial transport and fatty acid synthesis, cytosolic acetyl-CoA is also a key regulator of histone acetylation and chromatin silencing. It would strengthen the study to consider whether acetyl-CoA depletion contributes to improved fitness through enhanced rDNA silencing. Given the well-established role of rDNA instability in yeast aging, additional experiments examining rDNA silencing and stability would be valuable. For example, monitoring rDNA copy number changes (not necessarily ERCs) under AMPK activation, oleic acid supplementation, and in the A2A strain, similar to approaches used in the authors' prior work, would help clarify whether chromatin regulation contributes to the observed phenotypes.

      We have data addressing this point that we will add to the manuscript. In short, we see no difference in gene expression from Sir2-repressed sub-telomeric regions or MAT loci, but the genome-wide gene expression dysregulation associated with age is partially suppressed in PGPD-SAK1. However, A2A does not suppress this further, so it is not critical for the suppression of senescence in A2A though we are following this up. ERC accumulation is higher in A2A at 48h, consistent with the cells being older, meaning that ERCs are unlinked to senescence onset as we have previously reported. There is a strong upregulation of transcripts from Sir2-repressed rDNA intergenic spacers with age in all genotypes, but we attribute this simply to the copy number increase of these regions on ERCs rather than a defect in silencing. We have previously looked for heritable changes in rDNA copy number arising during ageing and found (to our surprise) absolutely nothing, so we don't expect any changes under these conditions.

      (2) The current data do not fully distinguish whether AMPK activation and oleic acid supplementation act on distinct subpopulations of aging cells. An alternative explanation is that oleic acid supplementation enhances mitochondrial function and acts additively with AMPK activation, thereby increasing the fraction of cells in the "low senescence" state. Since this distinction is not central to the main conclusions, I suggest softening the language around subpopulation specificity. Emphasizing instead that the A2A strain coordinately modulates multiple branches of acetyl-CoA metabolism to improve late-life fitness would maintain the strength of the central message without overinterpretation.

      We agree that oleic acid and the lipids produced downstream of Acc1 in A2A may improve late-life fitness via enhanced mitochondrial function, and in support of this, oxygen consumption rate is marginally (though significantly) higher in A2A than PGPD-SAK1. We will add these data to the manuscript. However, we disagree with the interpretation of an additive effect, as we report throughout the study that AMPK activation and lipid biosynthesis/supplementation affect different sub-populations of cells. We do not observe populations of intermediate senescence cells; rather, by flow cytometry and fitness assays we observe individual cells in binary low senescence or high senescence states.

      (3) The manuscript proposes that lipid starvation and excess acetyl-CoA are major drivers of senescence in distinct subpopulations of wild-type aging cells. This conclusion is not yet fully supported by the presented data. Direct measurements of age-dependent divergence in acetyl-CoA and fatty acid levels at the single-cell level would be needed to substantiate this model. Based on the current evidence, a more conservative interpretation would be that aging cells exhibit differential sensitivity to perturbations in acetyl-CoA and lipid metabolism. Accordingly, I recommend revising the statement in the Abstract ("We further implicate lipid starvation and excess acetyl coenzyme A availability as major drivers of senescence...") and the corresponding discussion text to better align with the data.

      We agree and will adjust the abstract to make it clearer that the lipid starvation / excess acetyl-CoA interpretation is a model.

      Reviewer #3 (Public review):

      Summary:

      These findings suggest that PGPD-SAK1 yeast show a subpopulation with lowered TOM70-GFP expression in high bud scar staining aged cells. Deletion of CAT2 or MLS1 reduces this effect. A PGPD-SAK1 acc1S1157A double mutant (called "A2A" here) shows an even larger effect of lowered tom70 expression in high bud scar staining aged cells. Utilization of various additional mutants involved in acetyl-CoA transport, carnitine shuttle, respiration, etc., leads the authors to conclude that these shifts in TOM70-GFP in aged cells are linked to the AMPK-fatty acid metabolic regulatory system.

      Strengths:

      These extensive and clearly described experiments reveal interesting changes in TOM70-GFP intensity in subsets of aged yeast in several mutants eventually identified as linked to the AMPK-fatty acid metabolic regulatory system.

      Weaknesses:

      (1) 3 biological replicates for mRNASeq is low.

      Thank you for pointing this out. We performed another replicate after posting the initial preprint but didn’t update the figure in the eLife-reviewed version. We will add this replicate to the scatter plots and analysis in Figure 1; the findings have not changed.

      (2) While "Traditional conceptions of ageing implicate a progressive accumulation of damage leading to systemic degradation in performance until death, with evolutionary pressures acting to maximise early life fitness and fecundity at the expense of ageing health." is tangential perhaps to the data and conclusions of the study, both claims of this sentence are at best controversial, and the manuscript is no weaker for their omission.

      We actually feel that this sentence is very important to the message of the manuscript, which is that ageing does not necessarily have to involve a loss of fitness before death. Ageing is often described as the progressive wearing out of components leading to decline and death (with an old car often used as an analogy); in the ageing field this is certainly controversial, but outside the field this remains the normal understanding. We think it is important to state this widely held viewpoint with which our findings are hard to reconcile.

      Our interpretation that yeast are bet-hedging as a population growth strategy, and that this drives ageing in the long term, is a classic case of antagonistic pleiotropy. We will add this term (from the citation that is already in the manuscript) and clarify in the discussion why we introduce this concept in the introduction.

      (3) The statement that "Here, we determine the basis of senescence and fitness loss in replicatively ageing yeast" is a bit strong as a summary of the careful work presented here. If the authors had created yeast mutants that retained fitness indefinitely, this would be a more appropriate strength of claim to summarize the work.

      Indeed - we will refine this sentence.

    1. Teleshuttle ucm.teleshuttle.com › 2018 › 11 › as-we-will-think-legacy-of-ted-nelson.html Smartly Intertwingled: "As We Will Think" -- The Legacy of Ted Nelson, Original Visionary of the Web Why Nelson matters A fuller explanation of why Nelson matters is in my post from a few years ago, Digital Camelot - The Once and Future Web of Engelbart and Nelson, but here I caption its core message: If you care about modern culture and how technology is shaping it, this is worth thinking about -- A powerful eulogy for where the Web might have gone, and still may someday, and the friendship of the two people most responsible for envisioning the Web* -- Ted Nelson's eulogy for his friend Doug Engelbart, as reported by John Markoff in The Times -- with Nelson's inimitable flair.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study investigated how visuospatial attention influences the way people build simplified mental representations to support planning and decision-making. Using computational modeling and virtual maze navigation, the authors examined whether spatial proximity and the spatial arrangement of obstacles determine which elements are included in participants' internal models of a task. The study developed and tested an extension of the value-guided construal (VGC) model that incorporates features of spatial attention for selecting a simpler mental representation of the task.

      Strengths:

      (1) Original Perspective:

      The study introduces an explicit attentional component to established models of planning, offering an approach that bridges perception, attention, and decision-making.

      (2) Methodological Approach:

      The combination of computational modeling, behavioral data, and eye-tracking provides converging measures to assess the relationship between attention and planning representations.

      (3) Cross-validated data:

      The study relies on the analysis of three separate datasets, two already published and an additional novel one. This allows for cross-validation of the findings and enhances the robustness of the evidence.

      (4) Focus on Individual Differences:

      Reports of how individual variability in attentional "spillover" correlates with the sparsity of task representations and spatial proximity add depth to the analysis.

      We thank the Reviewer for their overall positive assessment of our work and their helpful comments. We have addressed each point below.

      Weaknesses:

      (1) Clarity of the VGC model and behavioral task:

      The exposition of the VGC model lacks sufficient detail for non-expert readers. It is not clear how this model infers which maze obstacles are relevant or irrelevant for planning, nor how the maze tasks specifically operationalize "planning" versus other cognitive processes.

      The method for classifying obstacles as relevant or irrelevant to the task and connecting metacognitive awareness (i.e., participants' reports of noticing obstacles) to attentional capture is not well justified. The rationale for why awareness serves as a valid attention proxy, as opposed to behavioral or neurophysiological markers, should be clearer.

      We thank the reviewer for urging further clarity here. Our work builds closely on the previous maze navigation paradigm and VGC model developed and reported by Ho et al. Nature (2022). We directly adopted variants of their maze stimuli, computational model and obstacle awareness measures, and married these with an investigation of the role of visuospatial attention. We agree that it would be useful for the reader to have a more in-depth description of the paradigm and model, and how it operationalises planning, without needing to refer back to the original Ho et al. paper. We have now added additional explanatory sections to the Introduction and Methods as follows:

      On page 4:

      “One elegant approach to forming such a simplified representation is to adaptively select the granularity of information required to complete the task (Ho et al., 2022), known as value-guided construal (VGC). Unlike previous accounts, which model human planning as a search over all items (e.g., tube lines), the VGC model predicts that a cognitively limited decision-maker selects a manageable subset of information over which to plan (i.e., a task representation), balancing utility and complexity (Ho et al., 2022). In our example, the VGC algorithm would plan over a few relevant tube lines rather than planning over all possible stations. To select the representation that achieves the best balance between utility and complexity, the model searches across all possible combinations of tube lines, computing the value (i.e., the plan’s utility minus its cost) of each representation for planning a specific journey. The algorithm then selects the representation with the highest value, which ensures that an ideal observer selects a representation which only includes the items (i.e., tube lines) that lead to successful planning while excluding as many items as possible to keep the plan as simple as possible. For our purposes, items included in the representation are considered task-relevant, while items that are not represented are considered task-irrelevant. This algorithm, therefore, provides a normative standard of an efficient plan to which we can compare people’s actual plans.”
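
      The selection rule described in this passage can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' or Ho et al.'s implementation: the function `select_construal`, the linear per-item cost, the utility function, and the tube-line names are all assumptions introduced here to show the "utility minus cost, pick the maximum" logic.

```python
from itertools import combinations

def select_construal(items, utility, cost_per_item=1.0):
    """Score every subset of items and return the one with the highest value.

    value = utility(subset) - cost_per_item * len(subset)
    The linear complexity cost is an assumption made for illustration.
    """
    best, best_value = frozenset(), float("-inf")
    for k in range(len(items) + 1):
        for subset in combinations(items, k):
            construal = frozenset(subset)
            value = utility(construal) - cost_per_item * len(construal)
            if value > best_value:
                best, best_value = construal, value
    return best, best_value

def toy_utility(construal):
    # Hypothetical journey: planning succeeds only if both lines are represented.
    return 10.0 if {"Central", "Victoria"} <= construal else 0.0

lines = ["Central", "Victoria", "Jubilee", "Northern"]
best, value = select_construal(lines, toy_utility)
print(sorted(best), value)
```

      In this toy case the two lines needed for the journey are retained and the other two are dropped, because adding them raises the cost without raising the utility. The exhaustive search grows exponentially with the number of items, which is workable only for small sets such as a handful of maze obstacles.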

      On page 6:

      “We operationalized planning using a maze navigation paradigm, akin to our tube-related example, where participants were required to plan a route through the maze, avoiding obstacles that blocked their path. Obstacles predicted by the sVGC model to be included in the representation were considered task-relevant.”

      “At the end of every trial, participants reported their awareness of specific obstacles (see Methods for details). The level of awareness reported for different obstacles provides a read-out of what features of the environment individuals were subjectively representing while solving a particular maze. While other markers of attention and awareness (for instance, behavioural or neurophysiological variables) could also be used, here we focused on direct awareness reports in order to relate our findings both to those of Ho and colleagues and to the subjective awareness reports used in consciousness science (e.g. the Perceptual Awareness Scale (Barnett et al., 2024; Overgaard & Sandberg, 2021; Ramsøy & Overgaard, 2004; Samaha et al., 2015)). Participants were instructed to maintain central fixation while planning (see dataset dSC 1), in line with previous empirical work using this task (Ho et al., 2022).”

      To visualize our effects, we binarized the predictions of the sVGC model such that obstacles with a marginalized probability greater than 0.5 were considered task-relevant, while other obstacles were considered task-irrelevant (e.g., Figure 2b). We now clarify this point in the caption of Figure 2.
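
      As a small illustration of this binarization step, the sketch below labels obstacles by thresholding their marginalized probabilities at 0.5. The function name, the obstacle labels, and the probability values are hypothetical and serve only to show that the comparison is strict, so a probability of exactly 0.5 counts as task-irrelevant.

```python
def binarize_relevance(probabilities, threshold=0.5):
    """Map each obstacle to True (task-relevant) if its probability exceeds the threshold."""
    return {name: p > threshold for name, p in probabilities.items()}

# Hypothetical marginalized probabilities for three obstacles.
example_probabilities = {"obstacle_0": 0.91, "obstacle_1": 0.50, "obstacle_2": 0.12}
relevance = binarize_relevance(example_probabilities)
print(relevance)
```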

      (2) Attention framework:

      The account of attention is largely limited to the "spotlight" model. When solving a maze, participants trace the correct trail, following it mentally with their overt or covert attention. In this perspective, relevant concepts are also rooted in attention literature pertaining to object-based attention using tasks like curve tracing (e.g., Pooresmaeili & Roelfsema, 2014) and to mental maze solving (e.g., Wong & Scholl, 2024), which may be highly relevant and add nuance to the current work. This view of attention may be more pertinent to the task than models of simultaneously tracking multiple objects cited here. Prior work (notably from the Roelfsema group) indicates that attentional engagement in curve-tracing tasks may be a continuous, bottom-up process that progressively spreads along a trajectory, in time and space, rather than a "spotlight" that simply travels along the path. The spread of attention depends on the spatial proximity to distractors - a point that could also be pertinent to the findings here.

      Moreover, the tracing of a "solution" trail in a maze may be spontaneous and not only a top-down voluntary operation (Wong & Scholl, 2024), a finding that requires a more careful framing of the link to conscious perception discussed in the manuscript.

      Conceptualizing attention as a spatial spotlight may therefore oversimplify its role in navigation and planning. Perhaps the observed attentional modulation reflects a perceptual stage of building the trail in the maze rather than a filter for a later representation for more efficient decision making and planning. A fuller discussion of whether the current model and data can distinguish between these frameworks would benefit readers.

      We thank the reviewer for highlighting relevant findings in the attention literature that were missing from our discussion. We fully agree that a complete account of the interplay of planning, navigation, and attention is likely to recruit the kind of curve-tracing processes highlighted by the reviewer. However, we emphasise that our current focus is not on the process of navigation through a maze, but on the process of construing the maze itself. In other words, we are focused not on how people represent their path from A to B, but how they represent the maze itself, which they then use as a basis for planning between A and B. The VGC model predicts that a subset of obstacles will be included in this construal. We think that a spotlight model is a good starting point for this work, because attention is being deployed across the whole maze stimulus, and then becomes attached to particular objects located in particular positions. This is a distinct process from that involved in navigating the path itself. Accordingly, our stimuli were designed such that task-relevant obstacles could be presented either proximally or distally to the optimal path (e.g., Figure 1a and Supplemental Figures S1-6). An obstacle that blocks any possible path on one side of the maze is task-relevant but located a long way from the optimal path. The results of Ho and colleagues’ (2022) third experiment demonstrate how task-relevant yet distal obstacles are better remembered than task-irrelevant proximal obstacles (see Figure 4 of Ho et al., 2022). We also observed that obstacles further away from the navigation path were often represented by participants (see Figures S1-6), which cannot be explained by curve tracing alone.

      While these results cannot definitively rule out the possibility that participants automatically trace the path while also construing the maze, they suggest that the value-guided construal process is an independent predictor of participants’ representations beyond proximity to the navigated path. To make this distinction clearer, we now cite the papers alluded to by the reviewer, in the Discussion on pages 28-29, while also acknowledging the potential for investigating attention during the navigation process itself:

      “Future work may also wish to examine the relevance of visuospatial attention for the navigation process itself in this task. While our present findings speak to how individuals perceive the maze while planning, it remains unclear how attention is deployed during navigation along a path, such as how object-based attention progressively spreads along trajectories in time and space (Pooresmaeili & Roelfsema, 2014; Wong & Scholl, 2024).”

      There is also one additional nuance to the current spotlight model that we were inspired to consider by the reviewer’s comment. This is the idea that attentional effects may spread within or along the obstacles themselves. We cannot explore this in the current data because we asked for awareness of the entire obstacles, not parts of obstacles, but it may be possible to explore this in future work, for instance, with eye tracking measures.

      More generally, the growth-cone (i.e., zoom lens) model of attention for curve tracing proposed by Roelfsema and colleagues shares considerable similarities with the spotlight model of attention. Both models argue for the grouping of spatially proximal items based on attention. While the growth-cone model argues for varying sizes of zoom lenses (i.e., receptive fields of neurons) that facilitate the tracing of proximal items, both models predict that spatially proximal items are preferentially processed together because of attention. Indeed, the spotlight model could capture these varying zoom lenses by altering the width of the attentional spotlight dynamically across the visual scene based on the spatial proximity of obstacles. Following related comments by Reviewer 2, we now investigate inter-individual differences in the attentional spotlight of participants and observe that these differences significantly predict participants’ mental representations (see Attentional spotlight model of task representations). We have now updated the Discussion to include consideration of these alternative model frameworks:

      On page 27:

      “Second, in the current work we were unable to distinguish whether these attentional effects are driven by a fixed spotlight of attention, or whether attention operates akin to a zoom lens, shifting the ‘width’ of the focus of attention according to the task demands (Eriksen & St. James, 1986; Müller et al., 2003; Schad & Engbert, 2012). The latter view would be consistent with growth-cone models of attention in which the focus of attention expands and contracts in accordance with task demands, mirroring the various receptive field sizes in the visual hierarchy (Pooresmaeili et al., 2014; Pooresmaeili & Roelfsema, 2014). In partial support of this idea, we found significant inter-individual differences in the width of participants’ attentional spotlight (Figure S11). It is also possible that attention is deployed within or along parts of obstacles, rather than on entire obstacles. Future work using naturalistic measures of eye movements may be able to address these questions.”

      (3) Lateralization of attention:

      The analysis considers whether relevant information is distributed bilaterally or unilaterally across the visual display, but does not sufficiently address evidence for attentional asymmetries across the left and right visual fields due to hemispheric specialization (e.g., Bartolomeo & Seidel Malkinson, 2019). Whether effects differ for left versus right hemifield arrangements is not made explicit in the presented findings.

      We thank the reviewer for this suggestion. To address this point, we fitted a three-way interaction model between VGC model prediction, lateralization index, and side (left vs right hemifield). We did not find evidence for the three-way effect (β= 0.01, SE= 0.02, 95% CI [-0.03, 0.04], p = 0.738; ΔBIC = 58.30 in favour of the null effect; see table below), suggesting that the side to which participants lateralized their attention did not influence their task representations. This result is now reported on page 12:

      “This effect did not vary significantly as a function of the specific hemifield (i.e., left vs right) in which task-relevant information was presented (β = 0.01, SE = 0.02, 95% CI [-0.03, 0.04], p = 0.738; ΔBIC = 58.30 in favour of the null effect; see Table S14).”

      We also explored inter-individual differences in participants’ tendency to lateralize their attention (see also the next point). We observed that participants tended to lateralize their attention slightly more to the right-hand side for non-lateralized maze stimuli, despite the normative sVGC model predicting that participants should not lateralize their attention for these stimuli (Figure 3c). These results may speak to potential asymmetries in lateralization, but given the exploratory nature of these analyses, they should be verified and replicated in future work.

      (4) Individual differences:

      Individual differences in attentional modulation are a strength of the work, but similar analyses exploring individual variation in lateralization effects could provide further insight, and the lack of such analyses may mask important effects.

      Thank you for this suggestion. In new analyses, we explored whether i) participants exhibited differences in their tendency to lateralize their awareness reports, and ii) whether the degree to which they tended to lateralize their awareness predicted their performance on a separate set of maze stimuli. In short, we observed substantial variation in participants’ tendency to lateralize their awareness (Figure S11) and found that this tendency reflected an inter-individual difference which was stable across maze types. We report these new findings on pages 14-16.

      “Inter-individual variation in lateralization of attention

      “Next, we investigated participants’ tendency to pay attention to obstacles within a single hemifield (left vs right) regardless of the sVGC model predictions. To do so, we computed an awareness lateralization index (ALI) based on participants’ awareness reports of obstacles on each trial (Figure 3a). Large positive values indicate that participants were preferentially aware of the right hemifield, whereas negative values indicate preferential awareness of the left hemifield. Values close to zero indicate that participants paid attention to both hemifields equally (see Methods for details). We observed that participants’ tendency to lateralize their awareness varied greatly across Ho datasets 1 and 2 (Figure 3b); some participants preferentially paid attention to a single hemifield, regardless of whether the sVGC model predictions were lateralized. For the dSC1 dataset, we observed that on some trials, participants significantly lateralized their awareness (|ALI| > 0.5; Figure 3c) even though the sVGC model predictions were non-lateralized. These findings suggest that participants’ tendency to pay attention to a single hemifield may represent an observable inter-individual difference in how they allocate their awareness to form task construals.”

      “To further explore these inter-individual differences, we tested whether participants’ tendency to lateralize their attention to a single hemifield was consistent across trials and maze stimuli. We observed that participants’ tendency to lateralize their attention to a single hemifield was similar for left and right lateralized maze stimuli (Spearman ρ = 0.72, Figure 3d). This suggests that participants who preferentially attended to a single hemifield did so regardless of which hemifield they should attend to. More consequentially, the tendency for participants to lateralize their awareness on maze stimuli whose model predictions were also lateralized linearly correlated with participants’ tendency to lateralize their attention on non-lateralized maze stimuli (Spearman ρ = 0.88, Figure 3d). Taken together, these findings emphasize that some individuals tend to preferentially attend to a single hemifield when planning. This tendency, importantly, represents an inter-individual difference in how participants allocate their attention across various maze types.”
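
      The exact ALI formula is deferred to the Methods and is not reproduced in this response, so the following is only one plausible form, written under stated assumptions: awareness ratings are rescaled to the range 0 to 1, obstacles are assigned to a hemifield by the sign of a horizontal coordinate, and the index is the normalized difference between mean right and mean left awareness. This assumed form matches the sign convention in the quoted text (positive for right, negative for left, near zero for balanced) and is bounded between -1 and 1, so the |ALI| > 0.5 criterion is meaningful.

```python
def awareness_lateralization_index(reports):
    """Assumed ALI: (mean right awareness - mean left awareness) / their sum.

    reports: list of (x_position, awareness) pairs for one trial, where
    x_position < 0 is the left hemifield, x_position > 0 the right, and
    awareness has been rescaled to [0, 1]. Both conventions are assumptions.
    """
    left = [a for x, a in reports if x < 0]
    right = [a for x, a in reports if x > 0]
    mean_left = sum(left) / len(left) if left else 0.0
    mean_right = sum(right) / len(right) if right else 0.0
    total = mean_left + mean_right
    if total == 0:
        return 0.0  # no reported awareness on either side
    return (mean_right - mean_left) / total

# Hypothetical trial: low awareness on the left, high awareness on the right.
trial_reports = [(-1.0, 0.2), (-2.0, 0.0), (1.0, 0.8), (3.0, 1.0)]
ali = awareness_lateralization_index(trial_reports)
print(round(ali, 3))
```

      With these made-up ratings the index is strongly positive, which under the assumed definition would count as a right-lateralized trial.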

      (5) Distinction between overt and covert attention:

      The current report at times equates eye movement patterns with the locus of attention. However, attention can be covertly shifted without corresponding gaze changes (see, for example, Pooresmaeili & Roelfsema, 2014).

      We fully agree, and thank the reviewer for prompting further reflection on this distinction. In the online experiments run by Ho and colleagues (i.e., datasets Ho1 and Ho2), participants’ eye movements were not tracked, so it could not be determined whether participants were engaging in covert or overt attention to sample maze obstacles. In our third experiment (i.e., dataset dSC1), we both recorded eye movements and explicitly instructed participants to fixate centrally while viewing the maze. This ensured that participants oriented their attention only covertly during planning (see Figures S13-14).

      We now elaborate on this important distinction in the Results section of the manuscript, page 12:

      “In addition, we monitored participants’ eye movements in dataset dSC 1 to ensure that attention shifts would be covert as opposed to overt—a distinction which could not be determined in the online samples of datasets Ho 1 and 2.”

      On page 28:

      “Importantly, while the visuospatial attention effects observed in the Ho 1 and 2 datasets are likely driven by both covert and overt shifts in attention, the findings presented in experiment 3 (i.e., dSC1 dataset) rule out the contribution of overt shifts in attention through the use of eye tracking (see Figures S13-14) (Carrasco, 2011; Pooresmaeili & Roelfsema, 2014).”

      The implications for interpreting the relationship between eye movement, memory, and attention in this setting are not fully addressed. The potential dynamics of attention along a maze trajectory and their impact on lateralization analysis would benefit from further clarification.

      We thank the reviewer for urging more clarity here. The attentional dynamics we document in our study concern how people perceive / construe the maze itself, rather than how they deploy their attention to guide active navigation. We have now sought to make this distinction clear at a number of points in the paper. The core idea is that attention acts as an early filter to select which obstacles are part of a task construal, which then affects both awareness and memory.

      We have now clarified the focus of our study in the introduction on pages 5-7:

      “Our focus in this study was to examine how participants perceive and represent their environment (the maze stimulus). This is a distinct process to how participants orient their attention during navigation itself, which is not part of our current study. To do so, we harness classical signatures of attentional selection to characterise how visuospatial attention shapes awareness of maze obstacles during planning.” … “Our focus in the present study was to examine attentional effects on participants’ perception of the maze stimulus. We did not quantify how individuals deploy their attention in the phase in which they were navigating through the maze.”

      We did not explicitly test for memory effects in our new experiments, but Ho and colleagues demonstrated that the sVGC model predicted not only awareness reports, but also participants’ memory of obstacles (see Ho et al., 2022). Indeed, task representations computed from memory or awareness reports were strikingly similar in their experiments (Spearman ρ = 0.86 between memory accuracy and awareness; ρ = 0.86 between confidence in memory and awareness). In relation to eye movements, we refer the reviewer back to our previous response, which details how eye movements were measured and controlled during maze construal.

      Figure 1 legend (b) --> (c)

      We have corrected this typo in the figure caption.

      Reviewer #2 (Public review):

      Summary:

      Castanheira et al. investigate the role of spatial attention for planning during three maze navigation experiments (one new experiment and two existing datasets). Effective planning in complex situations requires the construction of simplified representations of the task at hand. The authors find that these mental representations (as assessed by conscious awareness) of a given stimulus are influenced by (spatially) surrounding stimuli. Individual participants varied in the degree to which attention influenced their task representations, and this attentional effect correlated with the sparsity of representations (as measured by the range of awareness reports across all stimuli). Spatially grouping task-relevant information on either the left or right side of the maze led to mental representations more similar to optimal representations predicted by the value-guided construal (VGC) model - a normative model describing a theoretical approach to simplifying complex task information. Finally, the authors propose an update to this model, incorporating an attentional spotlight component; the revised descriptive model predicts empirical task representations better than the original (normative) VGC model.

      Strengths:

      The novelty of this study lies in the proposal and investigation of a cognitive mechanism through which a normative model like value-guided construal can enable human planning. After proposing attention as this mechanism, the authors make concrete hypotheses about mismatches between the VGC predictions and real human behavior, which are experimentally validated. Thus, not only does this study describe a possible mechanism for simplification of task information for planning, but the authors also propose a descriptive model, revising VGC to incorporate this attentional component.

      A strength of this paper is the variety of investigative approaches: analysis of existing data, novel experiment, and a computational approach to predict experimental findings from a theoretical model. Analyzing pre-existing datasets increases the size of the participant cohort and strengthens the authors' conclusions. Meanwhile, comparing the predictions of the existing normative model and the authors' own refined model is a clever approach to substantiate their claims. In addition, the authors describe several crucial controls, which are key to the interpretability of their results. In particular, the eye tracking results were critical.

      In summary, this paper constitutes an important step toward a more complete understanding of the human ability to plan.

      We thank the Reviewer for their thoughtful and positive assessment of our findings. We also appreciate the constructive feedback on our methodology, which we believe has substantially improved our manuscript.

      Weaknesses:

      (1) There is a critical conceptual gap in the study and its interpretation, mainly due to the reliance on a self-report metric of awareness (rather than an objective measure of behavioral performance).

      a. Awareness is tested by a 9-point self-report scale. It is currently unclear why awareness of task-irrelevant obstacles in this task would necessarily compromise optimal planning. There is no indication of whether self-reported awareness affects performance (e.g., navigation path distance, time to complete the maze, number of errors). Such behavioral evidence of planning would be more compelling.

      We thank the reviewer for prompting further reflection on the connection between construal and navigation performance. We wish to emphasise that the primary focus of our study was on measuring and modeling participants’ task construals using perceptual awareness judgments, building on the methods developed by Ho and colleagues, rather than on navigation performance itself. However, as the reviewer points out, there is a natural relationship between construal and performance – if you represent the wrong obstacles, plans may be disrupted.

      To explore the relationship between task construals and performance on the navigation task we first regressed out the effects of the sVGC model on participants’ awareness reports and computed the mean squared residuals for each trial. We then used these values to predict participants’ navigation response times on each trial. We observed a significant negative relationship, suggesting that on trials where participants’ representations showed greater deviations from the normative model, they were in fact faster at navigating the mazes. This relationship was surprising, and at odds with the initial idea that adhering to normative VGC aids in task performance. However, we think that this direction of effect may make sense if one considers that a large part of the actual construal (rather than the normative prediction) in our data was in fact driven by effects such as lateralisation which are not accounted for by the sVGC model. If one is faster at harnessing inductive biases such as lateralisation, then one may be faster to complete the maze but also show a greater deviation from the predictions of the original model.
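
      The two-step analysis described in this paragraph can be sketched as follows. The response does not specify the statistical models used, so ordinary least squares stands in here purely for illustration; the helper names and all numbers are hypothetical and are not the study's data.

```python
def ols_fit(x, y):
    """Return (slope, intercept) of a simple least-squares line y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trial_mean_squared_residuals(predicted, reported, trial_ids):
    """Regress awareness reports on model predictions, then average the
    squared residuals within each trial."""
    slope, intercept = ols_fit(predicted, reported)
    squared = {}
    for p, r, t in zip(predicted, reported, trial_ids):
        squared.setdefault(t, []).append((r - (slope * p + intercept)) ** 2)
    return {t: sum(v) / len(v) for t, v in squared.items()}

# Hypothetical data: trial 0 follows the model closely, trial 1 does not.
predicted = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
reported = [0.0, 0.5, 1.0, 1.0, 0.0, 1.0]
trial_ids = [0, 0, 0, 1, 1, 1]
msr = trial_mean_squared_residuals(predicted, reported, trial_ids)

# Second step: relate per-trial deviation to (made-up) response times in seconds.
response_times = {0: 4.0, 1: 3.0}
trials = sorted(msr)
rt_slope, _ = ols_fit([msr[t] for t in trials], [response_times[t] for t in trials])
print(msr, rt_slope)
```

      A negative slope in the second step corresponds to the pattern reported in the response: trials with larger deviations from the model predictions were solved faster.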

      To further explore these effects, we next focused on the distinction between lateralised and non-lateralised mazes. Here, we reasoned that the initial phase of lateralised attentional selection would lead to lateralised mazes being easier to navigate than non-lateralised ones. We conducted new analyses to determine whether participants navigated lateralized maze stimuli faster and with fewer moves than maze stimuli with non-lateralized model predictions. As detailed in Methods, we excluded trials in which participants significantly deviated from the optimal number of moves (9 or more moves) and took longer than 20 seconds to solve the maze. In line with our interpretation that attention operates as an inductive bias, participants were faster and deviated less from the optimal path on lateralized compared to non-lateralized mazes.

      We now report these new results on navigation performance on pages 20-21:

      “Maze navigation performance

      The previous analyses focused on participants’ task representations during planning. We next sought to explore links between participants’ task representations and maze navigation performance. Participants performed the maze navigation task near-ceiling: they solved 95% of maze stimuli in under 20 seconds, with minimal deviation from the optimal path (i.e., 9 moves or fewer). Notwithstanding this limited variance in task performance, we explored whether participants’ task construals may have impacted their navigation speed. To do so, we first regressed out the effects of the sVGC model from participants’ awareness reports and used the mean squared residuals for each trial to predict response times (see Methods for details). Surprisingly, we observed a negative relationship between mean squared residual variance and response times (β = -0.31, SE = 0.05, 95% CI [-0.41, -0.21], p < 0.001), indicating that participants were faster on trials where the sVGC model explained less variance in their awareness reports. In other words, trials in which participants deviated more from the sVGC model predictions were solved faster. We note that one reason for this may be the strong influence of the lateralisation effect on navigation performance (see paragraph below), which itself is not part of the sVGC model prediction.”

      “We then explored whether participant performance differed between lateralised and non-lateralised mazes. Here, we reasoned that the initial phase of lateralised attentional selection would lead to lateralised mazes being easier to navigate than non-lateralised ones. Consistent with this hypothesis, participants were faster (β = -0.04, SE = 5.91 × 10⁻³, 95% CI [-0.06, -0.03], p < 0.001) and followed the optimal path more closely (β = -0.59, SE = 0.09, 95% CI [-0.78, -0.40], p < 0.001) when maze stimuli were more lateralized.”

      And in the Discussion section, on page 23:

      “Mental representations and task performance

      We observed that participants were faster and deviated less from the optimal path on maze stimuli that were lateralized. This effect is not predicted by the original sVGC model but dovetails with the interpretation that early visuospatial attention operates as an inductive bias to guide the formation of simplified task representations. Surprisingly, we also observed that participants were faster to navigate mazes on trials where their simplified task representation deviated from the sVGC model prediction. We interpret this seemingly contradictory finding in the following way: there are several factors beyond the sVGC model – including, for instance, maze lateralisation – that predict both construal and performance on the maze navigation task. Further work is needed to understand how inductive biases such as lateralisation shape both construal and performance, and the real-world benefits that such strategies might afford for naturalistic stimuli.”

      b. Relatedly, it would have been more convincing to have an objective measure of awareness, for instance, how the presence or absence of a "task-irrelevant" obstacle affects performance (e.g., change navigation path distance or time to complete the maze), or whether participants can accurately recall the location of obstacles.

      We thank the reviewer for prompting further reflection on the validity and robustness of our awareness measures. We emphasise however that our focus is not (primarily) on maze navigation performance, but on task construal, which as noted in our previous response may come apart from navigation performance for a variety of reasons. Our primary goal is to measure participants’ subjective awareness of the maze as a marker of their idiosyncratic (conscious) mental representation on each trial. In doing so, we build on a rich tradition of measuring subjective awareness in consciousness and perception science (for instance, work using the Perceptual Awareness Scale, or detection judgments). In this sense, we think our awareness scale (following Ho et al.) represents a valid and straightforward way of assessing our target psychological construct. However, we also agree with the Reviewer that convergent evidence from other measures is always valuable. In Ho and colleagues’ original paper, they developed a variant of the maze task where participants had to recall the location of obstacles, as well as rate their awareness (Exp 3) and a variant in which participants could hover their mouse over hidden obstacles in the maze to reveal their location – an online metric of attentional deployment (Exp 4). These data afforded us the opportunity to validate the awareness reports against an objective measure of recall, as suggested by the Reviewer. In reanalysing these data, we observed that the obstacle awareness and memory/hover measures were strikingly correlated within two independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness). 
      These re-analyses are now reported on page 22 of our manuscript, to highlight the convergent validity of the awareness metric:

      “Finally, we examined the convergent validity of participants’ awareness reports by reanalyzing the memory recall data reported in Ho and colleagues’ experiment (Ho et al., 2022). We reasoned that participants should demonstrate similar task representations regardless of the measure used to probe the construal. In line with this prediction, we observed that the obstacle awareness reports and memory/hover measures were strikingly correlated within three independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness; see Tables S18 and S19).”
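      For readers unfamiliar with the statistic, the sketch below shows how a Spearman rank correlation between two per-obstacle measures can be computed. It is a generic illustration, not the analysis code used for the manuscript; the helper names and the data values are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank across the tied block
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-obstacle means for two measures.
awareness = [0.10, 0.40, 0.35, 0.80, 0.90]
memory_accuracy = [0.20, 0.50, 0.30, 0.90, 0.70]
rho = spearman(awareness, memory_accuracy)
print(round(rho, 2))
```

      Because the statistic depends only on rank order, it is insensitive to the different scales of awareness ratings, recall accuracy, and hover durations.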

      c. Consequently, I'm not sure that we can conclude that the spatial context does impact participants' ability to plan spatial navigation or to "incorporate task-relevant information into their construal". We know that the spatial context affects subjective (self-reported) awareness, but the authors do not present evidence that spatial context affects behavioral performance.

      Following the line of argument above, we think it’s important to separate out task construal (the simplified representation of the maze, measured by awareness reports), and the impact of this on navigation and other aspects of behaviour. The awareness reports (and other convergent measures) show that task-relevant information (as predicted by the VGC) is incorporated into the construal, a process which is modulated by spatial context. These are the key targets of our modeling. Whether this impacts performance is a distinct question, and one that we now address in our response to point a above.

      d. Another concern that may complicate interpretation is the following: Figure 3c shows improved VGC model predictions (steeper slope) for mazes with greater lateralization. However, there are notable outliers in these plots, where a high lateralization index does not correspond to good model performance. There is currently no discussion/explanation of these cases.

      The Reviewer astutely points out some outliers in our analysis. While on average lateralized maze stimuli are represented more closely to the sVGC model, there are indeed some noticeable outlier mazes. These mazes represent stimuli in which participants tended to lateralize their attention to the ‘wrong hemifield’—e.g., participants were more aware of obstacles in the right hemifield despite sVGC model predicting that obstacles on the left hemifield were task-relevant. We believe this explains the poor sVGC model fits on these trials. We note, however, that on average participants were capable of attending to the correct hemifield without explicit instructions (i.e., 9 out of 12 mazes).

      We have now included a discussion of these outliers in the results section of the paper on page 12:

      “We note that for three maze stimuli whose model predictions were lateralized there was nevertheless a poor fit to the sVGC model (see Figure 2c, right panel). These outliers correspond to maze stimuli where participants, on average, lateralized their attention to the incorrect hemifield (i.e., the opposite hemifield to that predicted by the sVGC model).”

      (2) I noticed an issue with clarity regarding task-relevance. It is currently not fully clear which obstacles are "task irrelevant". Also, the term is used inconsistently, sometimes conflating with "awareness". For example, in the "Attentional spotlight model of task representations" section, the authors state that "task-relevant information becomes less relevant when surrounded by task-irrelevant information". But they really mean that participants become less aware of those task-relevant obstacles. I assume task-relevance is an objective characteristic related to maze organization, not to a participant's construal. Indeed, the following paragraph provides evidence of model predictions of awareness.

      We apologize for any confusion regarding the terminology of our manuscript. We indeed use the terms task-relevant and task-irrelevant to refer to obstacles that are objectively predicted by the normative sVGC model or the attentional spotlight model to be included in (>0.5) or excluded from (<0.5) task construals, respectively. This designation reflects the predictions from the computational model and does not reflect participants’ reported awareness. We then ran linear hierarchical models to predict participants’ awareness reports from these model predictions. The Reviewer is correct that the task-relevance of obstacles is indeed related to the maze’s organization, and not related to participants’ subjective reports of awareness. We have now clarified this point throughout the manuscript to better emphasize the difference between the model predictions of task-relevance and participants’ subjective reports.

      On page 17:

      “To achieve this, we computed the predictions of the existing VGC model for each obstacle’s task relevance in a given maze, and averaged these predictions within an attentional spotlight of 3 squares (Figure 4a & S8, see Methods for details). This process yielded novel model predictions, whereby some obstacles which were once predicted as task-irrelevant by the normative sVGC are now predicted as task-relevant by the attentional spotlight model. We depict the effects of this spatial spotlight in Figure 4a: task-irrelevant stimuli (plotted in grey; see middle left obstacle) neighbouring task-relevant obstacles (plotted in orange) become more task-relevant, whereas task-relevant information becomes less relevant when surrounded by task-irrelevant information (see bottom right orange obstacle). This deviation in model predictions from the normative sVGC model was used to predict participants’ awareness reports. We hypothesized that this spotlight-VGC model would predict participants’ reports better than the original VGC model, which does not account for spatial attention.”
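      The averaging step described in this passage can be sketched as below. This is an illustrative simplification under stated assumptions: each obstacle is reduced to a single grid coordinate, distance is measured as the Chebyshev distance, and the obstacle's own score is included in the average. None of these choices is confirmed by the text above, and the function name `spotlight_predictions` is hypothetical.

```python
def spotlight_predictions(obstacles, width=3):
    """Average each obstacle's relevance with that of obstacles within `width` squares.

    `obstacles` maps an (x, y) grid position to a VGC relevance score in [0, 1].
    Distance is Chebyshev (max of |dx|, |dy|); this metric is an assumption.
    """
    smoothed = {}
    for (x, y) in obstacles:
        nearby = [
            score
            for (ox, oy), score in obstacles.items()
            if max(abs(ox - x), abs(oy - y)) <= width  # includes the obstacle itself
        ]
        smoothed[(x, y)] = sum(nearby) / len(nearby)
    return smoothed

# Hypothetical maze: one irrelevant obstacle next to a relevant one,
# and one relevant obstacle flanked by two irrelevant ones.
vgc = {(0, 0): 0.0, (2, 1): 1.0, (8, 8): 0.0, (9, 9): 1.0, (10, 9): 0.0}
spot = spotlight_predictions(vgc)
print(spot[(0, 0)], spot[(9, 9)])
```

      In this toy maze the irrelevant obstacle at (0, 0) gains relevance from its neighbour, while the relevant obstacle at (9, 9) loses relevance because it is surrounded by irrelevant ones, which is the qualitative pattern the passage describes.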

      (3) The behavioral paradigm has some distinct disadvantages, and the validity of the task is not backed up by behavioral data.

      a. I understand the need for central fixation, but it also makes the task less naturalistic.

      The fixation cross was required on every trial such that participants could maintain central fixation for our eye tracking experiment. While this design is less naturalistic, it allows us to examine the eye movements of participants. Requiring participants to fixate during the ‘planning’ phase of the experiment allowed us to isolate the effects of covert attention from changes in awareness due to overt shifts in attention. In other words, differences in participants’ awareness reports in the 3rd experiment cannot be explained by longer fixation times to specific obstacles.

      b. The task with its top-down grid view does not seem to mimic real human navigation. Though this grid may be similar to mental maps we form for navigation, the sensory stimuli corresponding to possible paths and to spatial context during real-life navigation are very different.

      We agree with the reviewer that while our task is engaging for participants and simple to follow, it does not mimic naturalistic navigation in humans. There is a natural tension in computational / experimental work in cognitive science in wanting to build closely on previous results and paradigms, while ensuring that results can generalise to real-world contexts. Here, our choice of paradigm and measures was closely built on previous papers using this task from Ho and colleagues (2022, 2023). While preparing this response, we learnt that the MIT group had also harnessed this same task to develop a novel dynamic variant of the VGC model (Chen et al., 2026) called the Just in Time model (JIT). The advantage of building on this prior work is that we are able to iteratively refine and expand the VGC approach, and (in our case) bring it into closer contact with work on modeling the deployment of spatial attention in human vision. The top-down aspect of the maze notably facilitated the study of the spatial deployment of attention. We now discuss the novel dynamic variant of the VGC model in our paper on page 27:

      “We close by reflecting on opportunities for further work in this area. First, an important next step is to explore the process by which task representations are formed, and how inductive biases might affect the process of task construal. The sVGC model is a normative model of the optimal task representation. Since its construction involves an exhaustive calculation over possible paths, it is not a plausible basis for a model of the psychological process by which participants actually construct task representations. More recently a process model of task construal has been proposed, the Just in Time model (JIT). The hypothesis of the JIT model is that participants’ task representations are built up over time by iteratively simulating possible paths through the maze, affording insight into the construal process (Chen et al., 2026). In future work, it would be of interest to ask whether the attentional effects we observe in our experiments could be meshed with a dynamic JIT account of construal. We speculate that visuospatial attention may operate as an early filter, limiting the space of potential construals based on coarse spatial features of the environment, constraining a dynamic selection of obstacles. Brain imaging techniques with high time resolution, such as M/EEG, may be able to shed further light on how task representations are formed as participants plan.”

      c. Behavioral performance is not reported, so it is unknown whether participants are able to properly complete the task. The task seems pretty difficult to navigate, especially when the obstacles disappear, and in combination with the central fixation.

      Behavioural performance is now reported in response to point 1a above.

      d. There is no discussion of whether/how this navigation task generalizes to other forms of planning.

      We fully agree that an important next step would be to generalise our results on construal to naturalistic forms of planning – for instance, using immersive VR mazes, and/or investigating cognitive rather than perceptual construals. We have now added a line to this effect to the Discussion on page 28.

      “An important next step to further our understanding of task representations would be to extend the current paradigm to other forms of planning and more naturalistic tasks, such as navigating immersive virtual reality (VR) environments, planning over cognitive rather than perceptual representations (e.g. planning over an abstract space), or internally-guided planning based on working memory.”

      Reviewer #2 (Recommendations for the authors):

      (1) There are, of course, benefits to simple tasks like the ones described, but it would be interesting to compare the results to a possible experiment in which a top-down grid/map is used for planning, but then task execution is carried out in a simulated environment corresponding to the map. Also, perhaps beyond the scope of the questions addressed in this paper, but I am curious how unexpected obstacles affect representations. For instance, if participants plan based on a top-down map and then begin "real" navigation but encounter an unexpected obstacle that was not indicated on the map, does this modulate representations/awareness of future obstacles (near vs. far)?

      We fully agree that all of these lines of investigation would be super interesting to pursue in future studies, and we have added a line to the discussion to that effect on page 28:

      “An important next step to further our understanding of task representations would be to extend the current paradigm to other forms of planning and more naturalistic tasks, such as navigating immersive virtual reality (VR) environments, planning over cognitive rather than perceptual representations (e.g. planning over an abstract space), or internally-guided planning based on working memory.”

      (2) Regarding self-reported awareness as a metric, an additional experiment could ask participants to recreate the maze (identify locations of obstacles after they disappear). This would be a more objective measure of awareness.

      Yes indeed, and as described above, this was a metric used by Ho and colleagues in their previous experiment. As we describe in more detail above, the task representations obtained via memory or awareness reports demonstrated striking similarity (⍴ = 0.86).

      (3) What is meant by "all possible orientations of the maze" in this Methods sentence: "For dataset dSC 1, participants solved each of these 24 mazes four times (i.e., all possible orientations of the maze)"?

      We thank the Reviewer for prompting more clarity here. We vertically and horizontally reversed mazes (i.e., left-right flipped) such that participants could not predict the location of the goal or start location. In this way, each maze stimulus had four potential orientations. This resulted in 96 trials of 24 unique mazes. We have clarified this point in the Methods section on page 30:

      Maze stimuli were vertically and horizontally reversed (i.e., left-right flipped) such that participants could not predict the location of the start or goal location. This resulted in four potential orientations of each maze across all 24 mazes, 96 trials in total.
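      As an illustration of how four orientations per maze arise, the sketch below generates the original layout, its left-right mirror, its up-down mirror, and the combination of both. This reading of "vertically and horizontally reversed" is an assumption, and the maze encoding (a list of row strings with `S` for start, `G` for goal, `#` for obstacle cells) is hypothetical.

```python
def orientations(maze):
    """Return four mirror-image versions of a maze given as a list of row strings."""
    left_right = [row[::-1] for row in maze]   # mirror across the vertical meridian
    up_down = maze[::-1]                       # mirror across the horizontal meridian
    both = [row[::-1] for row in up_down]      # both flips (a 180-degree rotation)
    return [list(maze), left_right, list(up_down), both]

# Hypothetical 3x3 maze.
maze = ["S.#", "...", "#.G"]
versions = orientations(maze)

n_unique_mazes = 24
n_trials = n_unique_mazes * len(versions)
print(n_trials)
```

      With 24 unique mazes and four orientations each, this gives the 96 trials stated above.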

      (4) For lateralization, it was unclear until reading the Methods that the lateralization index was calculated using the VGC-predicted level of task-relevance. From the main text and Figure 2, I assumed you were just counting the number of task-relevant obstacles on each side, rather than also quantifying relevance. I understood after reading the Methods, but this could be clarified further.

      We agree with the Reviewer that this was not evident from the text. We have now updated the Results section of the manuscript to clarify this point on page 11:

      “To test this hypothesis, we derived a measure of task-relevant lateralization inspired by the attention literature (Ghafari et al., 2024; Keefe & Störmer, 2021; Vollebregt et al., 2015) (Figure 2a). Specifically, we separated maze stimuli across the vertical meridian and computed the ratio of task-relevant information presented on the left versus right side derived from the sVGC model. For example, the maze shown in Figure 2a has twice as much task-relevant information in the left hemifield as in the right (lat. index = 1/3). A lateralization index of 0.0 indicates that both hemifields contain equal amounts of task-relevant information (i.e., non-lateralized). The lateralization index was computed using the continuous VGC predictions for each obstacle (see Methods).”
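      One formula consistent with the two reference points given in this passage (an index of 1/3 when the left hemifield holds twice the task-relevant information of the right, and 0.0 when the hemifields are balanced) is the normalised difference (L - R) / (L + R). The sketch below assumes that form; the exact definition is given in the Methods and may differ, and the function name, sign convention, and obstacle encoding are hypothetical.

```python
def lateralization_index(obstacles, midline_x):
    """Normalised left-right difference in summed VGC relevance.

    `obstacles` maps an (x, y) position to a continuous VGC relevance score.
    Assumed form: (left - right) / (left + right); positive values mean
    more task-relevant information in the left hemifield.
    """
    left = sum(score for (x, _), score in obstacles.items() if x < midline_x)
    right = sum(score for (x, _), score in obstacles.items() if x > midline_x)
    total = left + right
    return 0.0 if total == 0 else (left - right) / total

# Hypothetical maze, 11 columns wide, with the vertical meridian at x = 5.
left_heavy = {(1, 2): 1.0, (2, 5): 1.0, (8, 3): 1.0}   # twice as much on the left
balanced = {(1, 2): 0.7, (9, 4): 0.7}

print(lateralization_index(left_heavy, midline_x=5))
print(lateralization_index(balanced, midline_x=5))
```

      Using the continuous relevance scores rather than a count of obstacles means that one highly relevant obstacle can outweigh several weakly relevant ones on the opposite side.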

      (5) The explanation in the Methods of how the width of the attentional spotlight was chosen references Figure 1b and Supplementary Figure S2, but it seems that Supplementary Figure S8 explains this more in the caption. Also, I don't see how Figure S2 supports this.

      We apologize for this typo. The explanation of how we selected the width of the attentional spotlight should indeed reference Supplementary Figure S15 (previously Figure S8). We have now corrected this and elaborated on this choice in the Methods section on page 35:

      “We fixed the ‘width’ of the attentional spotlight to a distance of 3 squares based on the observation that the two neighbouring obstacles positively predicted the awareness of a probe. We observed that the mean and median distance between neighbouring obstacles of the 2nd rank (i.e., second closest) was 3 squares away for all mazes (Figure S15). We therefore opted to fix the value of the attention spotlight to 3 squares based on these observations. Future work utilizing this model should consider the statistics of their maze stimuli when deciding on the ‘width’ of the attentional spotlight.”
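      For readers who wish to apply the same heuristic to their own stimuli, the sketch below computes the distance from each obstacle to its second-closest neighbour and summarises it with the mean and median. It is a hypothetical illustration: obstacles are reduced to single grid coordinates, and the Chebyshev distance is an assumed metric rather than the one documented in the Methods.

```python
from statistics import mean, median

def second_neighbour_distances(positions):
    """Distance from each obstacle to its second-closest neighbour.

    `positions` is a list of (x, y) grid coordinates (at least three).
    Chebyshev distance is assumed for illustration.
    """
    distances = []
    for i, (x, y) in enumerate(positions):
        to_others = sorted(
            max(abs(x - ox), abs(y - oy))
            for j, (ox, oy) in enumerate(positions)
            if j != i
        )
        distances.append(to_others[1])  # index 1 = second-closest neighbour
    return distances

# Hypothetical obstacle layout for one maze.
layout = [(0, 0), (2, 0), (3, 3), (6, 3)]
d2 = second_neighbour_distances(layout)
print(d2, mean(d2), median(d2))
```

      A spotlight width near the typical second-neighbour distance would cover roughly the two nearest obstacles around a probe, which is the rationale given in the quoted Methods text.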

      (6) The attentional spotlight width was assumed to be 3 squares, based on the linear regression predictions of the effect of neighboring obstacles on stimulus awareness. Given the individual differences across participants, it would be interesting to choose a different attentional spotlight size for each participant. Would a participant-specific attentional spotlight width improve the predictions of the spotlight-VGC model?

      The Reviewer highlights a very interesting question: do individuals vary in terms of their attentional spotlight? To test this hypothesis, we first estimated the size of the attentional spotlight for each individual based on lateralized maze stimuli, and then used this to generate personalized attentional spotlight model predictions for each subject based on these values (Figure S11). We restricted this analysis to the dSC1 dataset, where we had substantially more trials (96 in total).

      In brief, we observed that indeed the personalized spotlight model fit participants’ awareness reports better than both a normative sVGC model and a group-level attentional spotlight model. We interpret these findings with some caution as i) a subset of individuals had flat attentional slopes and therefore were excluded from these analyses, and ii) we believe we require additional trials to ensure a robust model fit at the individual level. While our results are encouraging, we hope future investigations into inter-individual differences will extend these findings.

      We have included these additional analyses in the main text.

      On page 18:

      “To further explore inter-individual differences in task construal, we tested whether adjusting the attentional spotlight width to each participant’s awareness reports improved the predictions of the attentional spotlight model. To do so, we first determined the attentional spotlight width of each individual in the dSC1 dataset based on lateralized maze stimuli. We then generated person-specific attentional spotlight model predictions for the non-lateralized maze stimuli to avoid overfitting the data (Figure S11). We note that 7 participants had either flat attentional slopes or negative beta coefficients, which prevented the selection of an appropriate attentional spotlight width (see Methods for details). We observed a significant improvement in model fit for the person-specific attentional spotlight model relative to both the group-level attentional spotlight model (ΔBIC = -1487.39) and the normative sVGC model (ΔBIC = -1655.29). While the limited trial numbers per participant in our current dataset warrant caution in interpreting these results, they do encourage further research on inter-individual differences in attentional deployment during planning.”

      On pages 23-24:

      “Inter-individual differences in attention

      We also observed considerable inter-individual differences in attentional effects across participants (Figure 1c). While some participants were strongly influenced by the spatial context of neighbouring stimuli, others showed more limited evidence for an attentional effect (Figure 1b). Inter-individual differences in attention predicted the sparsity of participants’ simplified representations: participants with larger attention effects exhibited sparser representations. Moreover, these inter-individual differences in effects of spatial proximity could be incorporated into the attentional spotlight model by varying the width of the spotlight, resulting in better model predictions.”

      “Beyond these spatial proximity effects, we also observed that participants varied in their tendency to lateralize their attention to a single hemifield (Figure 3). This tendency was observed across all three datasets, including on maze stimuli whose value-guided model predictions were not lateralized. This suggests that although a strategy of allocating attention is sub-optimal for these maze stimuli, some individuals preferentially attend to a single hemifield in a heuristic-like fashion. This tendency to attend to a single hemifield was a robust inter-individual difference across maze stimuli (Figure 3d), and dovetails with individual-level variation in spatial proximity effects. Taken together, these findings offer novel insights into how people vary in the ways they allocate spatial attention to solve complex problems. Future research could explore how these individual differences constrain performance on other tasks that require planning and search in high-dimensional spaces.”

      On page 17 of the Supplemental Materials:

      (7) The supplementary text about lateralization effects, above Supplementary Table S8, references Table S6, but Table S6 does not seem to display lateralization results.

      We thank the Reviewer for pointing out this typo: we now refer to the correct supplementary table (S9).

      (8) Why does it matter that "the maze stimuli were not designed to test horizontal-meridian lateralization effects"? What is the effect on power? Is it because there is not a good enough range in lateralization indices? It would be good to clarify, or just remove that explanation, since the cortical retinotopy explanation seems more convincing.

      We did not specifically design the maze stimuli such that there is an equal number of obstacles above and below the horizontal meridian. As such, the lateralization index derived along the horizontal meridian does not control for the number of obstacles in each hemifield, which may influence participants’ awareness reports. In contrast, we designed maze stimuli such that this would not be a concern for the vertical meridian. We have clarified this point in the discussion on page 27.

      “Third, while we observed clear lateralization effects along the vertical meridian (i.e., left vs right hemifield), effects along the horizontal meridian were less clear (i.e., above vs below; see Table S15-16). One potential explanation of this asymmetry is the retinotopic organization of the cortex, in which spatially adjacent stimuli can be retinotopically distant if presented on the opposite side of the vertical (but not horizontal) meridian, facilitating distractor inhibition. Importantly, while the visuospatial attention effects observed in the Ho 1 and 2 datasets are likely driven by both covert and overt shifts in attention, the findings presented in experiment 3 (i.e., dSC1 dataset) rule out the contribution of overt shifts in attention through the use of eye tracking (see Figure S13-14)(Carrasco, 2011; Pooresmaeili & Roelfsema, 2014).”

      (9) For Figure 2c, it would be helpful to directly state what each dot and line mean.

      We updated the caption of Figure 2c to clarify what we are plotting: each point represents an obstacle, and each line the linear fit for a maze stimulus.

      “Each point represents an obstacle in a maze, and each line represents the model fit for that specific maze stimulus.”

      (10) Figures and wording imply there is only a single probe obstacle per trial, but methods and model imply that participants are asked to report awareness for every obstacle. This should be clarified.

      We apologize for any confusion regarding the methodology of our study. The Reviewer is correct that participants reported their awareness of every obstacle presented on a given trial. We have clarified this in the Results section of the manuscript on page 7:

      “Note, participants reported their awareness of every obstacle presented on a given trial.”

      We have also updated the caption of Figure 1 to clarify this point:

      “Once participants finished navigating the maze, they were asked to report their awareness of every obstacle presented on a given trial in a random order.”

      (11) What is the reason for the exclusion of participants (33 for experiment 1 and 26 for experiment 2)?

      Participants were excluded from the Ho et al. datasets 1 and 2 based on their preregistered exclusion criteria, as detailed in the Methods section of their paper. In short, trials were excluded if participants took longer than 20 seconds to complete the trial, or if they spent longer than 5 seconds in the initial state. Participants were excluded if less than 80% of trials remained after reaction time exclusions or if they failed 2 out of 3 comprehension checks. We have elaborated on this point in the Methods section on page 31.

      “Participants were excluded from analyses based on pre-registered exclusion criteria as detailed in (Ho et al., 2022). In short, participants were excluded if 20% or more of their trials were removed based on reaction times, or if they failed 2 out of 3 comprehension checks.”
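      The exclusion logic summarised in this response can be sketched as follows. This is a hypothetical illustration, not the preregistered code of Ho et al.: the function names and data layout are invented, and the boundary case (exactly 80% of trials remaining) follows the "less than 80% of trials remained" wording of the response text, which is an assumption.

```python
def keep_trial(total_rt, initial_rt):
    """Keep a trial unless it took over 20 s in total or over 5 s in the initial state."""
    return total_rt <= 20.0 and initial_rt <= 5.0

def exclude_participant(trials, checks_failed):
    """Apply the participant-level criteria to a list of (total_rt, initial_rt) pairs.

    A participant is excluded if fewer than 80% of trials survive the
    reaction-time filter, or if 2 or more of the 3 comprehension checks failed.
    """
    kept = sum(keep_trial(total, initial) for total, initial in trials)
    return kept / len(trials) < 0.8 or checks_failed >= 2

# Hypothetical participants with ten trials each (times in seconds).
fast = [(8.0, 1.0)] * 9 + [(25.0, 1.0)]   # one slow trial: 90% of trials remain
slow = [(8.0, 1.0)] * 7 + [(8.0, 6.5)] * 3  # three long initial states: 70% remain

print(exclude_participant(fast, checks_failed=0))
print(exclude_participant(slow, checks_failed=0))
print(exclude_participant(fast, checks_failed=2))
```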

      (12) The supplemental figures are not referenced in order, and some are not referenced at all; this should be fixed.

      We thank the Reviewer for pointing this out and have reorganized our Supplementary materials accordingly.

      Reviewer #3 (Public review):

      Summary:

      The authors build on a recent computational model of planning, the "value-guided construal" framework by Ho et al. (2022), which proposes that people plan by constructing simple models of a task, such as by attending to a subset of obstacles in a maze. They analyze both published experimental data and new experimental data from a task in which participants report attention to objects in mazes. The authors find that attention to objects is affected by spatial proximity to other objects (i.e., attentional overspill) as well as whether relevant objects are lateralized to the same hemifield. To account for these results, the authors propose a "spotlight-VGC" model, in which, after calculating attention scores based on the original VGC model, attention to objects is enhanced based on distance. They find that this model better explains participant responses when objects are lateralized to different hemifields. These results demonstrate complex interactions between filtering of task-relevant information and more classical signatures of attentional selection.

      Strengths:

      (1) The paper builds on existing modeling work in a novel manner and integrates classic results on attention into the computational framework.

      (2) The authors report new and extensive analyses of existing data that shed light on additional sources of systematic variability in responses related to attentional spillover effects.

      (3) They collect new data using new stimuli in the original paradigm that directly test predictions related to the lateralization of task-relevant information, including eye tracking data that allows them to control for possible confounds.

      (4) The extended model (spotlight-VGC) provides a formal account of these new results.

      We thank the Reviewer for their positive assessment of our manuscript and their insightful comments, which have improved the clarity of our findings.

      Weaknesses:

      (1) The spotlight-VGC model has a free parameter - the "width" of the attentional spotlight. This seems to have been fixed to be 3 squares. It would be good if the authors could describe a more principled procedure for selecting the width so that others can use the model in other contexts.

      Our choice for this parameter was informed by the spatial effects reported in Figure 1b. We observed that the two closest neighbouring obstacles to a probe had similar awareness (i.e., positive beta weights). We therefore computed the mean and median distances between each probe and its second-closest obstacle. This distance was 3 squares, as depicted in Figure S15. We fixed the width of the attentional spotlight across all studies based on this observation. We agree that future research utilizing this model may need to tune this hyperparameter depending on the mean distance between a probe and its neighbours.

      We have clarified this point in the methods section on page 35:

      “We fixed the ‘width’ of the attentional spotlight to a distance of 3 squares based on the observation that the two neighbouring obstacles positively predicted the awareness of a probe. We observed that the mean and median distance between neighbouring obstacles of the 2nd rank (i.e., second closest) was 3 squares away for all mazes (Figure S15). We therefore opted to fix the value of the attention spotlight to 3 squares based on these observations. Future work utilizing this model should consider the statistics of their maze stimuli when deciding on the ‘width’ of the attentional spotlight.”
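      The maze statistic described above can be sketched as follows. This is a minimal illustration, not the authors' code: the obstacle coordinates are invented, each obstacle is reduced to a single hypothetical centroid, and Euclidean distance in grid squares is an assumption (the manuscript's distance measure may differ).

```python
import math
from statistics import mean, median


def second_nearest_distances(centroids):
    """For each obstacle, return the distance to its second-closest neighbour.

    `centroids` is a list of (x, y) positions in grid squares (hypothetical
    representation; real obstacles occupy several squares).
    """
    result = []
    for i, (xi, yi) in enumerate(centroids):
        # Distances from obstacle i to every other obstacle, nearest first.
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        )
        if len(dists) >= 2:
            result.append(dists[1])  # rank 2 = second closest
    return result


# Toy maze with four obstacles (made-up coordinates).
toy_maze = [(0, 0), (3, 0), (0, 4), (6, 0)]
d = second_nearest_distances(toy_maze)
print(d, mean(d), median(d))
```

      Applied to one's own stimuli, the mean or median of these distances gives a candidate spotlight width in the spirit of the procedure described above.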

      Following the suggestion of Reviewer 2 point 6, we now also explored inter-individual differences in this parameter. To do so, we first used the lateralized mazes in the dSC1 dataset to determine the optimal width of the attentional spotlight for each individual.

      Then, we used this spotlight to derive model predictions for each person. We observed that these personalized attentional spotlight model predictions fit participants’ awareness reports on non-lateralized mazes better than the fixed-width spotlight model. We believe this preliminary result suggests the importance of modelling inter-individual differences in attentional deployment during planning. We report these effects on page 17.

      (2) Have the authors considered other ways in which factors such as attentional spillover and lateralization could be incorporated into the model? The spotlight-VGC model, as presented, involves first computing VGC predictions and only afterwards computing spillover. This seems psychologically implausible, since it supposes that the "optimal" representation is first formed and then it gets corrupted. Is there a way to integrate these biases directly into the VGC framework, perhaps as a prior on construals? The authors gesture towards this when they talk about "inductive biases", but this is not formalized.

      We thank the reviewer for bringing up this very important point. We think that a full computational treatment of the inductive bias would be a distinct project, but now seek to expand our discussion on the mechanisms by which representations could be formed. In this context, we specifically highlight novel computational work from the MIT group that was published as a preprint in the time since we submitted our paper, and which proposes a new process account of construal, the “Just in Time” (JIT) model. We also elaborate on a possible mechanism by which visuospatial attention may aid the dynamics of the construal process. In short, we agree with the reviewer that spatial attention may bias individuals to search over a subset of potential representations based on low-level spatial characteristics of the obstacles (e.g., their spatial spread in the visual field), prior to (or in concert with) a dynamic JIT-like selection process. We now elaborate on these possibilities on pages 27-28:

      “We close by reflecting on opportunities for further work in this area. First, an important next step is to explore the process by which task representations are formed, and how inductive biases might affect the process of task construal. The sVGC model is a normative model of the optimal task representation. Since its construction involves an exhaustive calculation over possible paths, it is not a plausible basis for a model of the psychological process by which participants actually construct task representations. More recently a process model of task construal has been proposed, the Just in Time model (JIT). The hypothesis of the JIT model is that participants’ task representations are built up over time by iteratively simulating possible paths through the maze, affording insight into the construal process (Chen et al., 2026). In future work, it would be of interest to ask whether the attentional effects we observe in our experiments could be meshed with a dynamic JIT account of construal. We speculate that visuospatial attention may operate as an early filter, limiting the space of potential construals based on coarse spatial features of the environment, constraining a dynamic selection of obstacles. Brain imaging techniques with high time resolution, such as M/EEG, may be able to shed further light on how task representations are formed as participants plan.”

      […]

      “Fourth, it will also be necessary to elaborate on how bottom-up and top-down aspects of attentional selection are combined to guide complex task representations and plans. Foundational questions remain unanswered, for instance: can multiple spatial locations be preferentially selected at once, i.e. are there multiple spotlights (Awh & Pashler, 2000; McMains & Somers, 2004; Pylyshyn & Storm, 1988; Shaw & Shaw, 1977)? There is also discourse on how spatial attention may move from one location to another: are the intervening visual regions between attended locations similarly selected (Dubois et al., 2009; Kr & Np, 1999; McMains & Somers, 2004, 2005)? Our findings tentatively suggest that individuals are able to attend to disparate spatial regions to form sparse task representations, yet there is substantial variability in how individuals orient their attention during the task. The present paradigm and computational modelling, in conjunction with carefully designed stimuli, may help resolve these outstanding questions.”

      (3) Can the authors rule out that the lateralization effects are the result of memory biases since the main measure used is a self-report of attention?

      We thank the reviewer for bringing up this important point. In our experiments, we sought to measure participants’ subjective awareness of the maze stimuli as a readout of their conscious task representation on each trial. This approach marries an extensive literature on measures of perceptual awareness in consciousness science (e.g., using the Perceptual Awareness Scale) with computational models of planning. Participants’ memory of (their awareness of) the obstacles is inherent to this approach, but just as with similar approaches in consciousness science (e.g. measures of iconic memory in the Sperling paradigm), we think it provides a reasonably “online” measure of awareness. It’s important of course to ensure that results obtained with awareness reports are not idiosyncratic, and generalise to other approaches to quantifying task representations.

      To further bolster the convergent validity of our awareness measure, we reanalyzed the data from Ho and colleagues. In their original paper, they developed a variant of the maze-navigation task where participants were asked to recall the location of obstacles as well as report their awareness (Exp 3) and a third variant of the task where participants could hover their cursors over hidden obstacles to reveal their locations (Exp 4). These data allowed us to validate the awareness reports against objective measures of recall and mouse-tracking data. We observed that the subjective awareness reports of participants were strikingly correlated with recall/hover measures across two independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness). We believe these findings validate participants’ awareness reports. These findings are now reported on page 22 of the manuscript.

      “Finally, we examined the convergent validity of participants’ awareness reports by reanalyzing the memory recall data reported in Ho and colleagues’ experiment (Ho et al., 2022). We reasoned that participants should demonstrate similar task representations regardless of the measure used to probe the construal. In line with this prediction, we observed that the obstacle awareness reports and memory/hover measures were strikingly correlated within three independent samples of participants (Spearman ⍴ = 0.86 between memory accuracy and awareness; ⍴ = 0.86 between confidence in memory and awareness; ⍴ = 0.76 between the probability of hovering over the obstacle and awareness; ⍴ = 0.65 between the duration of the mouse hovering and awareness; see Tables S18 and S19).”
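      For readers unfamiliar with the statistic reported above, Spearman's ⍴ is the Pearson correlation of the rank-transformed data. The sketch below uses only the Python standard library; the two lists are invented for illustration and are not the study's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up per-obstacle values: awareness reports vs. recall accuracy.
awareness = [0.1, 0.4, 0.35, 0.8, 0.9]
recall = [0.2, 0.5, 0.6, 0.7, 0.95]
print(spearman_rho(awareness, recall))
```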

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review)

      Summary:

      In this manuscript the authors derive a mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space.

      The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, and on the other hand restrict its validity to a limited regime of activity corresponding to quasi-synchronous neuronal populations. Therefore, rather than an exact mean-field representation, the model provides a description of a mesoscopic population of connected neurons driven by ion exchange dynamics.

      Strengths:

      The idea of deriving a mean-field model which relates the slow-timescale biophysical mechanism of ion exchange and transportation in the brain to the fast-timescale electrical activities of large neuronal ensembles.

      Weaknesses:

      The idea underlying this work is not completely implemented in practice.

      The derived mean field model does not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes. The agreement with the in vitro experiment is hardly evident, both for the mean-field model and for the network model. The assumptions made to derive the closed-form equations of the mean field model have not been justified by any biological reason, they just allow for the mathematical derivation. The final form of the mean-field equations does not clarify whether or not microscopic variables are used together with macroscopic variables in an inconsistent mixture.

      Comments on revisions:

      The main weaknesses I listed in the first report are still present, since the authors did not answer my questions on a solid basis. I report the list for completeness:

      (1) It seems that the reduction methodology that is employed is not the most suitable one for the single-neuron model they are considering.

      (2) The formulation of the mean-field derivation is unnecessarily complicated. It could be heavily simplified by following previously published approaches to derive biologically realistic neural masses.

      (3) The model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.

      Therefore, my statement remains unchanged.

      Reviewer #2 (Public review)

      Summary:

      The authors aim to develop a neural mass model characterized by a few collective variables mimicking the dynamics of a network of Hodgkin-Huxley neurons encompassing ion-exchange mechanisms. They describe in detail the derivation of the mean-field model, then they compare experimental results obtained for the hippocampus of a mouse with the neural network simulations and the mean-field results. Furthermore, they report a bifurcation analysis of the developed model and simulation of a small network containing various coupled neural masses, somehow moving towards the simulation of an entire connectome.

      Strengths:

      The authors attempt to develop a mean-field model for a globally coupled network of heterogeneous Hodgkin-Huxley neurons with an explicit ion exchange mechanism between the cell interior and exterior.

      Weaknesses:

      (1) They do not employ the reduction methodology best suited to the single-neuron model they consider.

      (2) Their derivation of the neural mass model is based on several assumptions, not all of which are well justified.

      (3) Their formulation of the mean-field derivation is unnecessarily complicated; it can be strongly simplified by following previously published approaches to derive biologically realistic neural masses.

      (4) Their model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.

      General Statements:

      The authors honestly declared the many limitations of their approach; once these are accepted, the results of the mean-field model are, as expected, somewhat inconsistent with the neural network simulations.

      The authors suggest employing this model for whole-connectome simulations to follow seizure propagation; however, I believe that a simpler model, such as the Epileptor, remains superior in this respect. The present model does include biophysical parameters, but their correspondence with the ones employed in the network dynamics remains elusive, due to the many assumptions required to derive the mean-field model. Furthermore, since it is more complicated than the Epileptor, I do not think that the present model will be widely employed by the community.

      Comments on revisions:

      The authors have corrected mistakes present in the manuscript and put a correct list of references.

      However, they refuse

      (1) To simplify the formulation of the model. The model contains unnecessary complications, as I clearly wrote in my report; the authors agree, but they do not want to change the formulation;

      (2) To derive the mean-field model in a simpler way, which is possible and which I requested many times in my report; this would help readers understand the important aspects of the derivation, without unnecessary and confusing formulations;

      (3) To compare direct simulations of the network with neural mass results in sub-section "Bifurcation analysis: emergent network states and multistability" to show bistability, as I asked.

      As a matter of fact, the modifications made do not resolve my previous doubts about the validity of the results reported in the manuscript.

      Therefore, my previous assessments remain valid.

      We thank the editors and the two reviewers for their continued engagement with our manuscript. The three weaknesses retained from the first round are essentially identical between the two public reviews:

      (i) The reduction methodology is not the most suitable for the single-neuron model we consider;

      (ii) The mean-field derivation is unnecessarily complicated;

      (iii) The model works only in highly synchronous regimes and does not reproduce the asynchronous evolution typical of neural circuits.

      Both reviewers explicitly note that their assessments remain unchanged and we have decided not to alter the formulation of the model. We use this response to state—on the public record—exactly where we agree with the reviewers, where we disagree, and why.

      On point (i): the reduction methodology.

      We fully agree with the reviewers' technical observation: the Ott–Antonsen / Lorentzian-ansatz reduction in the form introduced by Montbrió, Pazó and Roxin (2015) is exact for canonical Type I neurons (QIF), whose membrane-potential equation is quadratic, and is not directly applicable to a Type II / Hodgkin–Huxley-type neuron whose voltage dynamics is cubic-like. On this point there is no disagreement.

      Where we differ is in the conclusion the reviewers draw from this observation. The reviewers read our work as applying an inappropriate reduction methodology to an inappropriate neuron model. We instead positioned our work, from the outset, as an extension of that methodology: we keep the biophysically detailed Hodgkin–Huxley substrate (because it is the only level at which extracellular ion concentrations, depolarization block, bursting and seizure-like events are biophysically grounded), and we adapt the reduction by approximating the cubic voltage nullcline as a piece-wise quadratic with two parabolas of opposite curvature. This is explicitly an approximate, not exact, mean-field. The Lorentzian ansatz is then applied on each branch of the piece-wise quadratic, with the limitations of this construction analyzed in the manuscript.

      The reviewers' alternative—starting from a Type I canonical model and grafting on biophysical features—would indeed yield an exact mean-field, but it would forfeit precisely what motivates our work: a tractable mesoscopic description in which the slow variables are physiologically interpretable ion concentrations rather than phenomenological parameters. The trade-off is that we give up exact rigour in order to construct a bridge between the Montbrió-style next-generation neural mass models on one side and the Epileptor on the other, with the additional benefit that the parameters of the resulting neural mass retain a biophysical correspondence (e.g., [K<sup>+</sup>]_bath, Δ[K<sup>+</sup>]_int, [K<sup>+</sup>]_g, the gating variable n) that the Epileptor does not afford.

      We therefore respectfully maintain our position: the methodology is not "the wrong reduction for a Type II neuron"; it is an extended reduction designed to be applicable beyond the Type I case, with explicitly characterized validity.

      On point (ii): the formulation is unnecessarily complicated.

      We agree with the reviewers that, given the assumptions we ultimately adopt, namely that the gating variable n and the potassium concentrations Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are treated as collective (mesoscopic) variables shared by the population, with n a function of the average membrane potential, the closed neural mass equations could be reached by the more direct path used by Guerreiro et al. (2022) and the related literature (R1–R7). In the revised manuscript we now state this explicitly, and we note that the same five-dimensional system arises under either derivation.

      Our choice to follow Chen and Campbell (2022) is motivated by the fact that it makes each approximation visible at the point where it is invoked. In particular, it exposes the moment-closure step (Eq. 19), the vanishing-flux boundary condition (Eq. 28), and the locations where microscopic and mesoscopic variables enter the description. We believe that for a reader trying to extend our framework, for instance to a setting with partial heterogeneity in the slow variables, or with stochastic gating, this is the more useful presentation. We have added a remark stating that the simpler Guerreiro-type derivation reaches the same equations under our assumptions, so that readers can take whichever route they find clearer.

      On point (iii): the model only works in highly synchronous regimes.

      Here we partially agree and partially disagree, and we would like the partial disagreement to appear on the public record.

      We agree that the Lorentzian ansatz is, strictly, valid in regimes where the population's membrane potential distribution is unimodal, that is, when essentially all neurons sit on the same side of the threshold V*. Where we disagree is with the implication that the mean-field model fails outside the strongly synchronous regime. The supplementary analysis in Fig. S2, added in the previous round, quantifies the error introduced by the first-moment approximation of n as a collective variable across the full range of [K<sup>+</sup>]_bath values, spanning quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics. The fraction of neurons whose gating variable deviates from the population mean is below 2% for the parameters used throughout the manuscript, and the error becomes appreciable only during the brief transitions between sub- and supra-threshold states. These are precisely the moments at which the population is genuinely bimodal and the single-Lorentzian assumption is theoretically expected to leak. In other words, the error peaks coincide with the moments where our derivation tells us in advance that the assumption is locally invalid; the model "knows where it fails." Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3, not only in the most strongly synchronized ones.

      This is, in our view, the strongest argument we can make: we are not claiming exactness, and we are not unaware of the limitations. We have characterized them analytically (the construction of the piece-wise Lorentzian, and the theoretical reason a closed solution exists only when the two branches collapse onto one), and we have characterized them numerically (Fig. S2). The deviations are bounded, their location in parameter space is well identified, and they coincide with transitions where the underlying assumption is locally violated. We believe this constitutes a controlled approximation rather than an uncontrolled one, and we would like this distinction to be visible to readers of the Reviewed Preprint.

      We note, in this connection, that the reviewers' preferred reference point, the next-generation neural mass model of Montbrió et al. (2015), which is exact and one-to-one with its underlying network, is exact precisely because the underlying network is a network of QIF neurons. The corresponding statement for a network of Hodgkin–Huxley-type neurons with explicit ion exchange does not, to our knowledge, exist in closed form, and may not exist at all. The relevant question is therefore not whether our model matches the exactness of the QIF case, but whether the controlled approximation we provide is useful. Given the qualitative agreement with neural-network simulations across the full range of [K<sup>+</sup>]_bath, the qualitative agreement with the in vitro recordings, and the recovery of the expected bifurcation structure with new emergent regimes, we believe the answer is yes.
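      For orientation, the exact QIF reference point mentioned above can be integrated in a few lines. The sketch uses the firing-rate equations of Montbrió, Pazó and Roxin (2015) in dimensionless form (membrane time constant set to 1): dr/dt = Δ/π + 2rv and dv/dt = v² + η̄ + Jr + I − (πr)². It is not the ion-exchange model discussed in this response; the parameter values, function names and the forward-Euler scheme are illustrative assumptions.

```python
import math


def qif_mean_field_rhs(r, v, delta=1.0, eta_bar=-5.0, J=15.0, I_ext=0.0):
    """Right-hand side of the QIF firing-rate equations (time constant = 1).

    r: population firing rate, v: mean membrane potential.
    delta, eta_bar: half-width and centre of the Lorentzian excitability
    distribution; J: synaptic weight; I_ext: external input.
    Default values are illustrative only.
    """
    dr = delta / math.pi + 2.0 * r * v
    dv = v * v + eta_bar + J * r + I_ext - (math.pi * r) ** 2
    return dr, dv


def integrate(r0=0.1, v0=-2.0, dt=1e-3, steps=20000, **params):
    """Forward-Euler integration; adequate for a sketch, not for production."""
    r, v = r0, v0
    for _ in range(steps):
        dr, dv = qif_mean_field_rhs(r, v, **params)
        r += dt * dr
        v += dt * dv
    return r, v


# Relax towards the nearby low-activity fixed point and report the state.
r_end, v_end = integrate()
print(r_end, v_end, qif_mean_field_rhs(r_end, v_end))
```

      The two variables (r, v) close exactly here because the QIF voltage equation is quadratic; this is the property that is lost for a cubic-like Hodgkin–Huxley nullcline, as discussed above.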

      Other outstanding points in the review.

      Reviewer 2 reiterates the view that the Epileptor remains superior for whole-connectome seizure-propagation simulations because it is simpler and better characterized. We do not dispute that the Epileptor is more thoroughly analyzed and more parsimonious. The complementarity we propose is not a replacement but a parameter-grounding, as the Epileptor's phenomenological parameters (excitability, slow permittivity) acquire, in the present framework, an interpretation in terms of measurable biophysical quantities (extracellular potassium, intracellular potassium variation, glial buffering).

      We thank the reviewers and editors once again for their careful reading, and we are grateful that the points of disagreement have been sharpened to a state where readers can judge them transparently.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors derive a mean-field model for a network of Hodgkin-Huxley neurons retaining the equations for ion exchange between the intracellular and extracellular space.

      The mean-field model derived in this work relies on approximations and heuristic arguments that, on the one hand, allow a closed-form derivation of the mean-field equations, and on the other hand restrict its validity to a limited regime of activity corresponding to quasi-synchronous neuronal populations. Therefore, rather than an exact mean-field representation, the model provides a description of a mesoscopic population of connected neurons driven by ion exchange dynamics.

      We agree with the reviewer's characterization. Our manuscript describes the derivation as relying on "approximations and heuristic arguments" and states that "the derivation is not exact"; what we provide is a controlled, approximate mesoscopic description in which the slow variables are physiologically interpretable ion concentrations rather than phenomenological parameters. An exact closed-form thermodynamic limit is, to our knowledge, available only for canonical Type I (QIF) networks (Montbrió, Pazó and Roxin, 2015) and a few of their extensions; it is not currently known for a Hodgkin–Huxley-type network with explicit ion-exchange dynamics. We acknowledge that the original description of the regime of validity may have caused confusion on this point, and in the revised manuscript we have therefore replaced the looser formulation "strongly synchronous regimes" by the more accurate "regimes where the membrane-potential distribution is unimodal and can be reasonably approximated by a Lorentzian" throughout the manuscript.

      Strengths:

      The idea of deriving a mean-field model that relates the slow-timescale biophysical mechanism of ion exchange and transportation in the brain to the fast-timescale electrical activities of large neuronal ensembles.

      We thank the reviewer for recognizing the motivation behind our work. This explicit coupling between slow biophysical ion dynamics and fast electrical activity is precisely the feature we tried to preserve in the reduction, even at the cost of giving up exactness.

      Weaknesses:

      The idea underlying this work is not completely implemented in practice.

      We address this general statement through the four specific sub-points the reviewer raises in the paragraph that follows.

      The derived mean field model does not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes.

      We partially agree and partially disagree. We agree that the Lorentzian ansatz is strictly valid where the membrane-potential distribution is unimodal, i.e. when essentially all neurons sit on the same side of the threshold V*. We disagree with the implication that the mean-field fails outside this regime. To make this claim quantitative, we added a new supplementary figure (Fig. S2) that quantifies the deviation of individual neurons' gating variables from the population mean across the full range of [K<sup>+</sup>]_bath values—quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics. The fraction of deviating neurons is below 2% for the parameters used in the manuscript, with localized peaks only during the brief, genuinely bimodal transitions between sub- and supra-threshold states—precisely the moments at which the theory predicts the assumption to be locally invalid. Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3, not only in the strongly synchronized ones.

      The agreement with the in vitro experiment is hardly evident, both for the mean-field model and for the network model.

      We acknowledge that the experimental and simulated traces in the original Fig. 4 did not match quantitatively; this was never our intention. The figure and its caption have been reorganized in the revised manuscript to frame the comparison as qualitative: we aim to demonstrate the shared structure, i.e., the slow modulation of fast population activity by extracellular potassium fluctuations, rather than to claim a quantitative fit.

      We also added two clarifications that account for the residual differences: (i) the network simulations were intentionally run with rescaled biophysical parameters (membrane capacitance, gating time constants) to keep the computational cost feasible, a standard practice when the goal is to validate dynamical mechanisms rather than absolute timescales; (ii) the in vitro LFP recordings were AC-coupled, so the slow DC components visible in the mean-field traces are filtered out at acquisition.

      The assumptions made to derive the closed-form equations of the mean-field model have not been justified by any biological reason, they just allow for the mathematical derivation.

      We agree that the modelling assumptions were scattered through the original derivation. In the revised manuscript, the three core assumptions are stated explicitly at the point of derivation: (i) the gating variable n is treated as a collective, population-averaged variable; (ii) the potassium concentrations Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are homogeneous across the population, biophysically justified by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforces near-instantaneous equilibration at the mesoscopic scale; (iii) no heterogeneity is assumed at the level of ion dynamics. The meaning of "locally homogeneous" is now defined explicitly.

      On the biophysical motivation of the in vitro perturbation used in the experiment, we have added a new Methods subsection that explains how low extracellular Mg<sup>2+</sup> unblocks NMDARs and abolishes the divalent-cation stabilisation of the resting membrane potential, depolarising hippocampal neurons and increasing the driving force for outward K<sup>+</sup> currents. This provides a biophysical link between the experimental perturbation and the model's main control parameter, the extracellular potassium concentration. We also added a reference to the well-established model of epileptic discharges that underpins the experiment.

      The final form of the mean-field equations does not clarify whether or not microscopic variables are used together with macroscopic variables in an inconsistent mixture.

      We now explicitly acknowledge that in the spiking-network simulations the gating variable n is microscopic (each neuron has its own n_i), whereas in the mean-field derivation it is treated as mesoscopic and shared by the population. This asymmetry between modalities is discussed both in the Results and in the Limitations sections, and is identified as a likely source of some of the discrepancy between the two modalities.

      We have also made the notation in Eqs. (36)–(37) consistent (firing rate r used throughout, full current-based dV/dt̄ restored) and fixed the typos and broken equation/reference labels that contributed to the impression of inconsistency (Eqs. 18, 28, 29; the Fig. 2(c) [K<sup>+</sup>]_bath label; the lost reference at line 696).

      Reviewer #2 (Public review):

      Summary:

      The authors aim to develop a neural mass model characterized by a few collective variables mimicking the dynamics of a network of Hodgkin-Huxley neurons encompassing ion-exchange mechanisms. They describe in detail the derivation of the mean-field model, then they compare experimental results obtained for the hippocampus of a mouse with the neural network simulations and the mean-field results. Furthermore, they report a bifurcation analysis of the developed model and simulation of a small network containing various coupled neural masses, somehow moving towards the simulation of an entire connectome.

      We thank the reviewer for the accurate summary of the manuscript's structure and aims.

      Strengths:

      The author attempts to develop a mean-field model for a globally coupled network of heterogeneous Hodgkin-Huxley neurons with an explicit ion exchange mechanism between the cell interior and exterior.

      We thank the reviewer for recognizing this objective. The retention of Hodgkin–Huxley dynamics with explicit ion exchange is precisely the feature that distinguishes our framework from QIF-based reductions, and it is what enables the slow variables of the resulting mean-field to retain a direct biophysical interpretation.

      Weaknesses:

      (1) It seems that the reduction methodology that is employed is not the most suitable one for the single-neuron model they are considering.

      We agree, on technical grounds, with the observation: the Ott–Antonsen / Lorentzian-ansatz reduction is exact for canonical Type I neurons (QIF) and is not directly applicable to a Type II Hodgkin–Huxley-type neuron with a cubic-like voltage nullcline. Where we differ is in the conclusion. We did not apply an inappropriate reduction to an inappropriate neuron; we deliberately extended the methodology by approximating the cubic nullcline as a piece-wise quadratic with two parabolas of opposite curvature, and then applying the Lorentzian ansatz on each branch. The result is an explicitly approximate, biophysically grounded mean-field, with its regime of validity stated and quantified (Fig. S2).

      To make this positioning explicit, we have added a paragraph to the Introduction that situates our work within the next-generation neural mass literature (Byrne et al. 2020; Montbrió, Pazó & Roxin 2015; Guerreiro et al. 2022; Forrester et al. 2024; Perl et al. 2023; Gerster et al. 2021; and works on short-term plasticity, adaptation, conductance-based reductions, spike-timing-dependent plasticity, random connectivity and noise) and clarifies that we see our contribution as complementary to these approaches, not as a competitor to the exact QIF reductions.
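
      The idea of replacing a cubic-like profile by two parabolas of opposite curvature can be sketched as follows. This is a minimal illustration under stated assumptions: the dimensionless cubic, the junction at v = 0 and the helper names are hypothetical and are not the manuscript's fitted nullcline or its R±, c±, I± parametrization.

```python
def cubic(v):
    """Hypothetical dimensionless cubic-like profile with extrema at v = -1 and v = +1."""
    return -v**3 + 3.0 * v


def make_branch(v_ext, v_join=0.0):
    """Parabola with its vertex at the cubic extremum v_ext that also
    passes through the cubic at the junction point v_join."""
    f_ext = cubic(v_ext)
    curvature = (cubic(v_join) - f_ext) / (v_join - v_ext) ** 2
    return curvature, (lambda v: curvature * (v - v_ext) ** 2 + f_ext)


# Two branches of opposite curvature, joined at v = 0 (an assumption of this sketch).
c_minus, p_minus = make_branch(-1.0)
c_plus, p_plus = make_branch(+1.0)


def piecewise_quadratic(v):
    """Use the lower branch left of the junction and the upper branch right of it."""
    return p_minus(v) if v < 0.0 else p_plus(v)


# Largest absolute deviation from the cubic on a grid between the two extrema.
grid = [i / 100.0 for i in range(-100, 101)]
max_err = max(abs(cubic(v) - piecewise_quadratic(v)) for v in grid)
print(c_minus, c_plus, max_err)
```

      In this toy case the two branches match the cubic at both extrema and at the junction, and the deviation is largest partway between them, which mirrors the kind of bounded approximation error that has to be stated alongside such a reduction.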

      (2) The authors' derivation of the neural mass model is based on several assumptions, and not all well justified.

      We agree that, in the original submission, the modelling assumptions were scattered through the derivation. In the revised manuscript, the three core assumptions are stated explicitly at the point of derivation: (i) the gating variable n is treated as a collective population-averaged variable; (ii) the potassium concentrations Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are homogeneous across the population, biophysically justified by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforces near-instantaneous equilibration at the mesoscopic scale; (iii) no heterogeneity at the level of ion dynamics is assumed. The meaning of "locally homogeneous" is now defined explicitly. In addition, we have added Fig. S2, which quantifies numerically the error introduced by the moment-closure assumption (deviation below 2% for the parameters used in the manuscript).

      (3) The formulation of the mean-field derivation is unnecessarily complicated. It could be heavily simplified by following previously published approaches to derive biologically realistic neural masses.

      We agree that, under the assumptions ultimately adopted in our model—namely that n, Δ[K<sup>+</sup>]_int and [K<sup>+</sup>]_g are mesoscopic—the final five-dimensional system can be reached by the more direct path used by Guerreiro et al. (2022) and the related literature. We now state this explicitly in the revised manuscript and note that the same system arises under either derivation, so that the reader can take whichever route they find clearer. Our choice to retain the Chen and Campbell (2022) formalism is pedagogical: it exposes the moment-closure step (Eq. 19), the vanishing-flux boundary condition (Eq. 28), and the locations where microscopic versus mesoscopic variables enter the description, which is the more useful presentation for a reader wishing to extend the framework (e.g. to partial heterogeneity in the slow variables or to stochastic gating). We also made the notation in Eqs. (36)–(37) consistent (firing rate r used throughout, full current-based dV/dt̄ restored) and fixed a number of typos and broken equation/reference labels.

      (4) The model seems to work only for highly synchronized situations and not for the standard asynchronous evolution usually observed in neural circuits.

      We partially agree and partially disagree. We agree that the Lorentzian ansatz is strictly valid where the membrane-potential distribution is unimodal; we have replaced "strongly synchronous regimes" by this more accurate formulation throughout the manuscript. We disagree, however, with the implication that the mean-field is useful only in those regimes. Fig. S2, added in this revision, explicitly quantifies the deviation across all dynamical regimes (quiescent, bursting, seizure-like, sustained ictal and depolarization-block dynamics): it remains below 2% for the parameters used in the manuscript, with localized peaks only during the brief sub-to-supra-threshold transitions where the population is genuinely bimodal. Away from these transitions, the mean-field tracks the population average across all dynamical regimes shown in Fig. 3.

      General Statements:

      The authors honestly declared the many limitations of their approach. It is assumed that the results of the mean-field are somehow inconsistent with the neural network simulations as expected.

      We thank the reviewer for acknowledging that the limitations are honestly declared. As detailed above and quantified in Fig. S2, the deviation from the network simulations is bounded and well characterized; it is not assumed but measured.

      The authors suggest employing this model for the simulations on the whole connectome to follow seizure propagation, however, I believe that the Epileptor remains superior in this respect to this model. That indeed includes biophysical parameters but their correspondence with the ones employed in the network dynamics remains elusive, due to the many assumptions required to derive this mean-field model. Furthermore, it is more complicated than the Epileptor, I do not think that the present model will be largely employed by the community.

      We do not propose our model as a direct replacement for the Epileptor and we do not dispute that the Epileptor is more thoroughly analyzed and more parsimonious. The complementarity we propose is not a replacement but a parameter-grounding: the Epileptor's phenomenological parameters (excitability, slow permittivity) acquire, in our framework, a concrete interpretation in terms of measurable biophysical variables (extracellular potassium, intracellular potassium variation, glial buffering). Retaining the Hodgkin–Huxley substrate is essential to ground these variables biophysically.

      To make this complementarity more visible, the Limitations and Discussion section has been expanded to discuss the choice of a purely excitatory network as a first step (with excitatory–inhibitory generalizations available via the synaptic reversal potential) and to point to additional biological ingredients (calcium and other ions, plastic synapses, random connectivity and noise, adaptation, spike-timing-dependent plasticity) that the framework can accommodate, with reference to the next-generation neural mass literature.

      We thank the reviewers and editors for their careful reading. We hope this public response makes our reasoning, the limits of our approach, and the concrete revisions made in this round transparent.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In general, the writing is scattered. Every time a model is introduced, one starts from the general formulation only to find that a very simplified case is used with respect to that formulation, which is very confusing. Authors need to reduce unnecessary formulations that confuse the reader and make it clear which formulations are actually used.

      We thank the reviewer for this comment and understand the concern regarding the balance between general formulations and specific approximations. Our intention in including the more general equations and derivations (e.g., Eq. 7 and others) was pedagogical — to ensure completeness and transparency in the modeling steps, especially for readers less familiar with mean-field reductions of biophysically detailed models. These general forms also serve to clarify the assumptions underlying the simplifications we employ. In the latest version, we improved the clarity of core equations (e.g., Eq. 37), which form the basis of all simulations presented (see details below, in the answer to question 14).

      (2) The Introduction would benefit from a wider view of the literature. The literature on exact mean field models (i.e. derived from the Lorentzian Ansatz) has flourished in the last years. In particular, it would be worth considering the following papers, where exact neural mass models are applied to perform whole-brain and large-scale brain simulations:

      Forrester, M., Petros, S., Cattell, O., Lai, Y. M., O'Dea, R. D., Sotiropoulos, S., & Coombes, S. (2024). Whole brain functional connectivity: Insights from next generation neural mass modelling incorporating electrical synapses. PLOS Computational Biology, 20(12), e1012647.

      Perl, Y. S., Zamora-Lopez, G., Montbrio, E., Monge-Asensio, M., Vohryzek, J., Fittipaldi, S., Campo, C. G., Moguilner, S., Ibanez, A., Tagliazucchi, E., Yeo, B. T. T., Kringelbach, M. L., & Deco, G. (2023). The impact of regional heterogeneity in whole-brain dynamics in the presence of oscillations. Network Neuroscience, 7(2), 632-660.

      Byrne, Aine, James Ross, Rachel Nicks, and Stephen Coombes. "Mean-field models for EEG/MEG: from oscillations to waves." Brain topography 35, no. 1 (2022): 36-53.

      Gerster, M., Taher, H., Skoch, A., Hlinka, J., Guye, M., Bartolomei, F.,... & Olmi, S. (2021). Patient-specific network connectivity combined with a next generation neural mass model to test clinical hypothesis of seizure propagation. Frontiers in Systems Neuroscience, 15, 675272.

      Byrne, Aine, Reuben D. O'Dea, Michael Forrester, James Ross, and Stephen Coombes. "Next-generation neural mass and field modeling." Journal of neurophysiology 123, no. 2 (2020): 726-742.

      Benitez-Stulz, Sophie, Samy Castro, Gregory Dumont, Boris Gutkin, and Demian Battaglia. "Compensating functional connectivity changes due to structural connectivity damage via modifications of local dynamics." bioRxiv (2024): 2024-05.

      We have added the following paragraph:

      “Recently, a class of these models, called next-generation neural mass models [42], has been developed based on an analytical approach introduced by [25] that allowed for the exact derivation of mean field parameters for a population of quadratic integrate-and-fire (QIF) neurons. These can be linked to EEG/MEG oscillations [43], including epileptic seizures [43], and have been used to study various aspects of the whole-brain dynamics such as the low-dimensional manifold of the resting state [45,46], aging [47] and neural signatures of consciousness [48].”

      We have also modified the preceding paragraph of the introduction that now reads:

      “At the mesoscopic level, the observable properties of a neuronal ensemble are generally explained by statistical physics formalism of mean-field theory [19-22]. Mean-field models demonstrated a predictive value for studying the mesoscopic dynamics of neuronal populations [23], providing statistical descriptions of neuronal networks [2, 19, 24-29], which can be used to address questions related to network-level mechanisms [12, 24, 30].

      In general, neural mass models have a low enough number of parameters to be tractable and provide general intuitions regarding mechanisms underlying complex neuronal activity [31-36]. For example, statistical population measures, such as the firing rate, can be used to assess mesoscopic dynamics [1, 7, 31, 36-41].”

      (3) Moreover, conductance-based models have been already implemented in neural mass models not only in references [69, 71, 95], but also in:

      Guerreiro, I. C., Di Volo, M., & Gutkin, B. (2023). A new generation of reduction methods for networks of neurons with complex dynamic phenotypes.

      Capone, C., Di Volo, M., Romagnoni, A., Mattia, M., & Destexhe, A. (2019). State-dependent mean-field formalism to model different activity states in conductance-based networks of spiking neurons. Physical Review E, 100(6), 062413.

      We have added the following sentence:

      “Moreover, conductance-based couplings between the spiking neurons have been already implemented in neural mass models [58, 59, 91, 93, 121], but without an extracellular exchange mechanism.”

      (4) Sec. 1.1 As previously established in the literature, a system of all-to-all coupled neuronal equations can be solved exactly in the thermodynamic limit (i.e., infinite neurons limit) if the single neuron membrane potential equation is a quadratic function and if the instantaneous distribution of membrane potentials of neurons in a population is described by a Lorentzian [Montbrió, E., Pazó, D. & Roxin, A. Physical Review X 5 (2), 021028 (2015)]. This means that the thermodynamic limit can be performed for a Canonical Type I model like the quadratic integrate-and-fire.

      What is the biological justification and the reason to approximate a different neuron type (a type II neuron model), whose membrane potential equation resembles a cubic function, with a quadratic function? The fact that it can be solved in the quadratic approximation is not, in my opinion, a sufficient justification. It would be more correct to start from a type I neuron at the microscopic level with a quadratic function and then provide additional biological features.

      We thank the reviewer for raising this important point. We respectfully disagree with the notion that starting from a canonical Type I model (such as the quadratic integrate-and-fire neuron) would be a more biologically grounded approach. While the quadratic form is analytically convenient, it does not capture certain key features of neuronal excitability, particularly those related to bursting, seizure-like events, and depolarization block, which are closely tied to the cubic-like nullcline geometry arising in Hodgkin–Huxley-type models, especially in the presence of slow ion dynamics.

      Our work seeks to bridge biophysical realism with analytical tractability. The step-wise quadratic approximation we employ is specifically designed to mimic the cubic membrane potential profile that emerges from the full ion-exchange dynamics. While the Lorentzian Ansatz is not strictly justified in this case from first principles, we show that it yields a workable and biologically interpretable mean-field description, which aligns with single-neuron dynamics, population simulations, and even in vitro observations. To our knowledge, this is a novel contribution that extends mean-field modeling beyond currently available approaches, which are often restricted to simplified or phenomenological neuron models.

      In this context, using a quadratic approximation is not merely a mathematical convenience — it is a means to retain key dynamical features of more realistic (non-Type I) neurons within a tractable framework, enabling insights into complex behaviors like multistability and pathological bursting.
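
      For readers less familiar with the exact QIF reduction that this exchange refers to (Montbrió, Pazó & Roxin 2015), the following sketch integrates its two mean-field equations for the firing rate r and mean membrane potential v. It illustrates the standard QIF case only, not our Hodgkin–Huxley-based model; the parameter values, initial conditions and function names are assumptions chosen for illustration.

```python
import math


def mpr_rhs(r, v, eta_bar=1.0, delta=1.0, J=0.0, I_ext=0.0):
    """Right-hand side of the QIF mean-field equations (time constant set to 1):
    dr/dt = delta/pi + 2 r v
    dv/dt = v^2 + eta_bar + J r + I_ext - (pi r)^2
    """
    dr = delta / math.pi + 2.0 * r * v
    dv = v * v + eta_bar + J * r + I_ext - (math.pi * r) ** 2
    return dr, dv


def integrate(r0=0.1, v0=0.0, dt=1e-3, steps=40000, **params):
    """Forward-Euler integration; adequate for a sketch, not for production use."""
    r, v = r0, v0
    for _ in range(steps):
        dr, dv = mpr_rhs(r, v, **params)
        r, v = r + dt * dr, v + dt * dv
    return r, v


# With the illustrative defaults (no recurrent coupling) the system relaxes
# to a stable focus with positive firing rate and negative mean potential.
r_end, v_end = integrate()
print(r_end, v_end)
```

      Because these equations are exact for QIF populations with Lorentzian-distributed excitabilities, they set the benchmark against which an approximate, piecewise-quadratic reduction has to be judged.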

      (5) Sec. 1.2 As shown in Figure 3, the mean-field equations do not show a one-to-one correspondence with the neural network simulations, except in strongly synchronous regimes. This represents a strong limitation in the model, especially because exact neural mass models (as shown in Reference [23]) perfectly fit the dynamics of the underlying network model both in the asynchronous and in the synchronized regime.

      We appreciate the reviewer’s observation and acknowledge that our original description may have caused confusion. The model's validity is not strictly limited to strongly synchronous regimes, but rather to regimes where the distribution of membrane potentials across the neuronal population remains unimodal and can be reasonably approximated by a Lorentzian. This includes, but is not restricted to, highly synchronized states.

      We agree that this distinction is important and have clarified it in the revised manuscript (e.g., “in strongly synchronous regimes” → “in regimes where the membrane potentials' distribution is unimodal and can be reasonably approximated by a Lorentzian”).

      In contrast to exact mean-field reductions based on quadratic integrate-and-fire neurons (e.g., [23]), our model originates from a biophysically grounded HH-type neuron with ion exchange dynamics, and necessarily involves heuristic approximations to achieve a closed-form mean-field description. While this results in a less exact correspondence with network simulations in more heterogeneous or bimodal states, our goal was to retain biological interpretability and account for phenomena such as ion-driven bursting and seizure-like transitions, which are not captured by standard QIF-based neural masses.

      We see our contribution as complementary to existing exact reductions — offering a biophysically grounded alternative that remains tractable and informative in a relevant class of unimodal, mesoscopic dynamical regimes.

      (6) Sec. 1.3 In this section the authors show the comparison between in vitro experiments and simulations with both the network model and the neural mass model (Figure 4, panels a,b,c). The qualitative agreement that is supposed to be shown is hardly evident. The shape of the signals is different as is the type of bursting. The only agreement results in the fact that there are repeated spiking events at successive times in a periodic manner. However, the time scale of the simulations is different for neural network simulation and mean-field experiment, making it difficult to compare them. While the period of the bursting event is around 2 min for mean field simulation (in accordance with experiments), the time scale of the network simulation is 60 times smaller, thus meaning that we are considering completely different mechanisms and phenomena. The justification given by the authors, that "the parameters were modified to simulate shorter fluctuations (in the network of Hodgkin-Huxley neurons) for computational efficiency" is inappropriate.

      The poor agreement turns out to be even worse in the comparison between experiments and mean-field simulations shown in panels d and e of Figure 4. While the mean field simulation is characterized by a periodic behaviour both in the mean membrane potential and in the external potassium concentration, the in-vitro traces are not periodic and show an increasing irregular activity of the extracellular LFP in correspondence with increasing external potassium concentration.

      How is it possible to justify the implementation of this model if the working hypotheses are not supported by the results? The worst agreement of the network simulations with the experiments reinforces the doubt raised in the previous point: what is the reasoning underlying the choice of Hodgkin-Huxley as a single neuron model?

      We thank the reviewer for this detailed critique. We acknowledge that the comparisons in Figure 4 involve limitations, and we now provide a clearer rationale and context in the revised manuscript. First, we emphasize that our intention is not to claim a quantitative match between the experimental and simulated traces, but rather to demonstrate that our model, grounded in biophysical mechanisms such as ion exchange, is capable of qualitatively reproducing a key feature observed experimentally: the slow modulation of neuronal activity by extracellular potassium concentration. For example, both in vitro (Fig. 4a, 4d) and in our simulations (Fig. 4b, 4e), bursts of activity ride on slower oscillations of potassium, and the interplay of fast and slow dynamics is central to both.

      Regarding the discrepancy in timescales between the neural network and mean-field simulations: the network simulations were intentionally run with accelerated dynamics by rescaling biophysical parameters (e.g., membrane capacitance and gating time constants) to keep the computational cost feasible. We now clarify in the manuscript that this choice is standard practice in computational modeling when the primary goal is to validate dynamical mechanisms rather than replicate absolute timescales.

      On the shape of LFP signals: the experimental recordings were AC-coupled, and the DC components associated with slower shifts in membrane potential, such as those modeled in the mean-field simulations, are not captured in those recordings. This limits the visibility of key features like the underlying potential jumps. Additionally, no claim is made regarding a specific bursting classification in either data or simulation.

      We agree that the experimental trace in Fig. 4d shows more complex, non-periodic dynamics (e.g., slowing burst frequency and irregularity), which are not captured by our current deterministic model. These differences could plausibly arise from additional physiological processes (e.g., stochastic transitions between metastable regimes or variability in ion regulation) that are not modeled here. In future work, such phenomena may be captured by introducing noise or parameter variability (see, e.g., Saggio et al., A taxonomy of seizure dynamotypes, eLife 2020), or by allowing the parabola coefficients in the nullcline approximation to vary dynamically.

      Finally, regarding the choice of a Hodgkin–Huxley-type neuron: this model allows us to incorporate a biophysical description of ion exchange, which is central to the phenomena we study. While modeling the spiking mechanisms explicitly precludes certain mathematical simplifications available to very simplified neuron models with reset, it enables direct links between mesoscopic dynamics and measurable quantities such as extracellular potassium, an essential objective of our work. To summarize, we rearranged Fig. 4:

      Potassium can have periodic behavior with V bursting riding on top (Fig. 4a). The model also shows this behavior at different timescales (Fig. 4b, c, e).

      The LFP recording is AC-coupled (filtered), so the V jump during the bursts may not be visible (we do not have DC recordings). We make no claim about bursting class here.

      Potassium can also show more complex behavior (e.g., slowing down of burst frequency, Fig. 4d) that the deterministic model does not show; this might be captured by exploring dynamical parameters (e.g., from the parabolas or K_bath) or by adding noise that allows jumps between regimes (Saggio et al., eLife 2020).

      (7) Sec. 1.5 Here six neural masses are coupled via long-range structural connections with random weights. Simulations of the system are shown for two different values of the global coupling parameter (G = 0 and G = 100). How many realisations of the network have been considered?

      We thank the reviewer for pointing this out. The presented simulation was intended as a proof-of-concept demonstration to illustrate the model’s capacity to support network-level propagation of pathological activity. For this purpose, we considered a single representative realization of the structural connectivity with random weights. Given the deterministic nature of the model and the qualitative focus of the demonstration, additional realizations do not qualitatively change the observed behavior — namely, the transition from localized to network-wide bursting as coupling strength increases. We have now clarified this in the revised manuscript.

      “This simulation serves as a proof of concept to illustrate how local pathological activity can propagate through a network depending on the strength of coupling. We used a single representative realization of randomly weighted structural connectivity. While we did not perform a systematic exploration of different realizations or coupling strengths, we observed that the qualitative behavior, namely the emergence of network-wide bursting beyond a critical coupling threshold, remains robust across similar setups. The model is compatible with empirical connectome data and can be readily extended to simulations using realistic brain network architectures.”

      In future applications involving data-driven network architectures or variability analyses, we agree that exploring multiple realizations or empirical connectomes will be valuable.

      How do the results depend on the different choices of the random weights? What is the dependence of the emergent dynamics on G? What kind of dynamics can be observed varying smoothly the parameter G (e.g. from 0 to 100)?

      This section serves as a proof of concept to show that pathological activity in one node can propagate through the network when coupling is strong. We used a single random weight configuration and did not systematically explore variations in G or connectivity. While richer dynamics likely emerge across intermediate values of G, a full parameter sweep is beyond the scope of this study. We clarify this in the revised text (see answer above).

      (8) Sec. 2.1 In the description of the experiment it is mentioned that only Mg^{2+} is varied. What is the role played by Mg^{2+} variation in influencing the external potassium concentration variation? How the experiment can be linked to the model? How the hypothesis of introducing an equation for the potassium concentration current in the microscopic model is supported by the experiment and vice-versa?

      We thank the reviewer for this question. We have added a new subsection in the Methods explaining magnesium removal as a means to influence the external potassium dynamics:

      “The membrane of hippocampal neurons is equipped with N-methyl-D-aspartate type glutamate receptors (NMDARs). These receptors have a very high affinity for glutamate and can, in principle, be activated by ambient glutamate present at low concentrations in the brain extracellular fluid (ECF). Under normal physiological conditions, this activation does not occur because extracellular magnesium ions (Mg<sup>2+</sup>) block the NMDAR channel at membrane potentials more negative than about –50 mV; this voltage-dependent block prevents receptor activation at rest. When extracellular magnesium is removed, the block is relieved, allowing NMDARs to be activated, leading to neuronal depolarization toward the action potential threshold [117].”

      “In addition, as a divalent cation, Mg<sup>2+</sup> interacts with the negatively charged neuronal membrane, contributing to the stabilization of the resting membrane potential. Lowering extracellular magnesium concentration disrupts this effect, resulting in membrane depolarization [118].”

      “Consequently, magnesium removal not only facilitates NMDAR-dependent depolarization, but also directly depolarizes neurons. This depolarization increases the driving force for outward potassium currents through K<sup>+</sup> channels, meaning that variations in Mg<sup>2+</sup> can indirectly influence external potassium dynamics during neuronal activity.”
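
      The link between depolarization and the potassium driving force can be made concrete with the Nernst equation. The sketch below is a minimal illustration: the concentrations (4 mM outside, 140 mM inside), the temperature and the two membrane potentials are generic textbook-style values assumed for the example, not parameters taken from the manuscript.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_mV(conc_out_mM, conc_in_mM, temp_K=310.15, valence=1):
    """Nernst reversal potential in mV for an ion of the given valence."""
    return 1000.0 * R * temp_K / (valence * F) * math.log(conc_out_mM / conc_in_mM)


def driving_force_mV(v_mV, e_rev_mV):
    """Driving force V - E_rev; positive values favour outward current for K+."""
    return v_mV - e_rev_mV


# Assumed illustrative values: [K+]_out = 4 mM, [K+]_in = 140 mM, body temperature.
E_K = nernst_mV(4.0, 140.0)
rest = driving_force_mV(-70.0, E_K)    # near a typical resting potential
depol = driving_force_mV(-50.0, E_K)   # after a 20 mV depolarization
print(E_K, rest, depol)
```

      With these assumed numbers the reversal potential sits well below rest, so any depolarization enlarges V − E_K and therefore the outward potassium current through open K<sup>+</sup> channels.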

      (9) Sec. 2.6 The modified version of the continuity equation has been derived following Reference [95], where the authors consider a network of Izhikevich neurons, and each neuron is modelled by a two-dimensional system consisting of a quadratic integrate and fire equation plus an equation that implements spike frequency adaptation. In particular, in [95] the authors achieve a closed set of mean-field equations with the inclusion of the mean-field dynamics of the adaptation variable by using a Lorentzian ansatz combined with the moment closure approach. The moment closure condition is also assumed in the present manuscript (Eq. 19). Under which assumptions is the implementation of the moment closure condition justified?

      We are thankful to the reviewer (and also to Reviewer 2) for questioning the justification of the assumptions used in our formalism. We agree that the moment closure is not by itself a sufficient justification for assuming that V depends on the mean n, which is necessary for the derivation of Eq. 20; we additionally need the assumption that n can be treated as a collective variable, as is done in the works mentioned by Reviewer 2. We have also performed numerical simulations of the full system to calculate the error term introduced by this approximation, and the results in the new Fig. S2 show that it is below 2% for each of the different dynamical regimes.

      We have hence modified the justification for Eq. (19), which now reads:

      “Next we assume a first-order moment closure condition for the variable n [59], justified by the numerical simulations of the full network (see Fig. S2), which show that for most of the neurons (close to 99% for the same value of ∆ as in the other simulations) the mean of the population captures the behavior of the single neurons well [122]. Finally, putting together these factors and assuming that n can be treated as a collective variable for each neuron (see the Limitations of the model section) we arrive at” and also

      “The validity of the first moment closure, Eq. (19), as in [59], is supported by the numerical simulations, which show that, both during the silent regime and when seizure-like events occur, n<sub>i</sub> for most neurons tracks the network-averaged ⟨n | V, η⟩. In particular, fewer than 2% of the neurons fire while the mean is low, and vice versa (Fig. S2). In less synchronized scenarios (larger ∆ or smaller J), however, this value would increase, but the mean would always capture the qualitative behaviour of the population.”

      This is also now explicitly mentioned in the following paragraph:

      “Unlike the mean membrane potential ⟨V⟩ and the firing rate (r), which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”
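
      One simple way to express how well a population mean represents individual gating variables is a mismatch fraction. The sketch below is purely illustrative: the function name, the threshold of 0.5 and the synthetic population are assumptions, and it is not the analysis used to produce Fig. S2.

```python
def mismatch_fraction(n_values, threshold=0.5):
    """Fraction of neurons whose gating variable lies on the opposite side
    of `threshold` from the population mean (hypothetical metric)."""
    mean_n = sum(n_values) / len(n_values)
    mean_is_high = mean_n > threshold
    mismatched = sum(1 for n in n_values if (n > threshold) != mean_is_high)
    return mismatched / len(n_values)


# Synthetic population: 98 neurons near a low gating value, 2 outliers with high values.
population = [0.1] * 98 + [0.9] * 2
fraction = mismatch_fraction(population)
print(fraction)
```

      A small value indicates that treating n as a single collective variable loses little information at that instant; evaluating such a quantity over time and across regimes is one way a closure assumption can be checked numerically.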

      (10) Considering also the comments reported above, I think that it would make more sense to start from an Izhikevich neuron model as microscopic model and add the equations for the ionic currents as mesoscopic variables (i.e. written as population average variables), instead of starting from the Hodgkin-Huxley single neuron model and trying to make hardly justifiable approximations and simplifications.

      We respectfully disagree. While the Izhikevich model is computationally efficient, it lacks the biophysical detail required to capture key ion-driven mechanisms such as depolarization block, slow ion accumulation, and specific burst-initiation dynamics, all of which are central to our study. The Hodgkin–Huxley framework, despite requiring approximation, provides the necessary physiological grounding to link microscopic ion exchange with emergent population behavior.

      (11) Sec. 2.7 What is the advantage of using six more parameters to fit, like R-,R+,c-,c+,I-,I+?

      This is in contradiction with the spirit of deriving a mean-field model, where the number of parameters should be reduced. What is the advantage of this mean-field derivation with respect to other mean-field derivations of Hodgkin-Huxley neurons, like the one in Reference [9]?

      The additional parameters (R±, c±, I±) are not arbitrary; they compactly parametrize the cubic-like nonlinearity of the membrane potential dynamics in our stepwise-quadratic approximation. This trade-off allows us to preserve essential biophysical features of HH neurons (e.g., bursting regimes, depolarization block) within a tractable analytic framework. Compared to alternative approaches like in ref. [9], which focus on phenomenological reductions and do not yield an ODE system, our model offers more direct interpretability in terms of ion dynamics, providing a closer link between microscopic mechanisms and mesoscopic activity patterns.
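
      As an illustration of how six parameters of this kind can describe a cubic-like curve, the following minimal sketch fits one parabola on each side of a switching point. It is a sketch under stated assumptions, not the manuscript's Eq. (21): the two-branch form c±(V − R±)² + I±, the switching point V_switch, the toy cubic, and all function names are hypothetical choices made only for this example.

```python
import numpy as np

def stepwise_quadratic(V, c_minus, R_minus, I_minus, c_plus, R_plus, I_plus, V_switch):
    """Two parabolic branches joined at V_switch (assumed form, for illustration)."""
    V = np.asarray(V, dtype=float)
    left = c_minus * (V - R_minus) ** 2 + I_minus
    right = c_plus * (V - R_plus) ** 2 + I_plus
    return np.where(V < V_switch, left, right)

def fit_branch(V, f):
    """Least-squares parabola a*V^2 + b*V + d, rewritten as c*(V - R)^2 + I."""
    a, b, d = np.polyfit(V, f, 2)
    return a, -b / (2.0 * a), d - b**2 / (4.0 * a)

def toy_cubic(V):
    # Toy cubic-like nonlinearity (FitzHugh-Nagumo style), not the HH expression.
    return V - V**3 / 3.0

V_switch = 0.0  # hypothetical switching point (inflection of the toy cubic)
V_left = np.linspace(-2.0, V_switch, 201)
V_right = np.linspace(V_switch, 2.0, 201)
c_m, R_m, I_m = fit_branch(V_left, toy_cubic(V_left))
c_p, R_p, I_p = fit_branch(V_right, toy_cubic(V_right))

# Compare the six-parameter approximation with the toy cubic on the full range.
V = np.linspace(-2.0, 2.0, 401)
approx = stepwise_quadratic(V, c_m, R_m, I_m, c_p, R_p, I_p, V_switch)
max_error = float(np.max(np.abs(approx - toy_cubic(V))))
print(f"c-={c_m:.3f}, R-={R_m:.3f}, I-={I_m:.3f}; c+={c_p:.3f}, R+={R_p:.3f}, I+={I_p:.3f}")
print(f"max abs error on [-2, 2]: {max_error:.3f}")
```

      In this toy setting the two branches are fitted independently, so the approximation is not continuous at the switching point; a constrained fit would be needed to enforce continuity.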

      (12) Sec. 2.11 The derivation of the mean-field dynamics for the gating variable is rather heavy and difficult to follow. This section could be simplified, whilst also better explaining the underlying approximations and the validity of these approximations, which is currently missing.

      We agree that the derivation is technical, but we chose to retain it for transparency, as it follows the Chen and Campbell approach and makes key approximations such as moment closure explicit. We have now added a clarification that n is treated as a collective variable. We hope that the current level of detail helps readers understand the assumptions underlying the gating variable dynamics.

      (13) Sec. 2.12 The derivation of Eqs. (36) is quite confusing and needs to be re-written in a clearer form. Why are both the variables x and r present in these equations, since they are proportional according to Eq. (25)?

      We thank the reviewer for pointing this out. We have adjusted the equations to improve clarity and now consistently express the firing rate in terms of a single variable. This removes the redundancy and simplifies the presentation.

      (14) Sec. 2.13 The derivation of Eqs. (37) is quite confusing and needs to be rewritten in a clearer form.

      Both the auxiliary variable x and the firing rate r are present in this equation, the same as in Eq. (36). Therefore it is presented as a set of equations for the auxiliary variable x and for the physical variable V. Moreover in the equation for dV/dt, the quadratic term in V has disappeared and it is not clear to me which are the variables corresponding to I- and I+. In particular, in Eqs. (36) there are two different current terms I-,I+ for the two equations related to dy/dt. In Eqs. (37) there is a single term (I_{cl} +I_{Na}+I_K+I_{pump})/C_m which is identical for both equations related to dV/dt. I was expecting two different terms also in Eqs. (37).

      We appreciate the reviewer’s close reading. To improve clarity, we now express the dynamics in terms of the firing rate r, replacing \dot{x} with \dot{r} in both Eq. (36) and Eq. (37) to avoid confusion.

      As for the current terms: in Eq. (37), we reverse the stepwise quadratic approximation and reintroduce the original ionic currents from Eq. (16). This is why the expressions involving I_{\text{cl}}, I_{\text{Na}}, I_K, and I_{\text{pump}} appear as a single summed term in \dot{V}, rather than the split I_-,I_+ terms used in the stepwise approximation. We now clarify this in the text.

      We also write V as \bar{V} to clarify that it refers to the average membrane potential for the neuronal population. Finally, we wrote the final equation in a more compact form to improve clarity (new Eq.38).

      (15) Moreover, while the equation for the gating variable n can be considered as a differential equation for a mesoscopic variable since n depends on average values only, it is not clear to me if the remaining variables 𝛥[K+]_{int}, [K+]_g can be considered mesoscopic or not. Since Eqs. (37) represent a mean-field model, I expect every variable to be a mean-field variable. This could be easily achievable for the extracellular potassium concentration, but I do not understand how a site-specific microscopic variable like the intracellular potassium concentration variation can be automatically inserted in a set of mean-field equations without any averaging or intermediate steps. This is a crucial point to be clarified for the validity of the neural mass equations.

      We thank the reviewer for raising this important point. In our model, we assume spatial homogeneity at the mesoscopic scale, meaning that ion concentrations, both intra- and extracellular, are uniformly distributed across the population. As a result, variables such as Δ[K+]int and [K+]g are treated as population-level averages, consistent with the mean-field framework.

      Moreover, the rate of change of intracellular potassium is tightly coupled to extracellular dynamics via ion exchange mechanisms, justifying its inclusion as a slow, mesoscopic variable. We now clarify this modeling assumption explicitly in the text.

      “By locally homogeneous, we mean that all neurons in the population are assumed to share the same extracellular and intracellular ionic environment and are connected with identical coupling rules, allowing us to treat the population as uniform with respect to ion dynamics and connectivity.”

      “These slow variables are in addition considered to be mesoscopic, meaning they are identical for every neuron in the population.”

      Minor points:

      (1) Figure 2, panel d. Please detail the variable on the y-axis, which is not reported in the figure.

      Done

      (2) Eq. (15) is cited in many parts of the manuscript, while it seems to me it would be more appropriate to reference Eq. (2). Is this a mistake or is there a reason to cite Eq. (15)?

      The reviewer is correct; the equation label was wrong, and we have now corrected it.

      (3) Figure 4 Would it be possible to show enlargements of the mean membrane potential traces to directly compare the different bursting types shown by the simulation of the different models?

      Panel d already contains an enlarged part of the membrane potential traces. For the rest, referring back to Q6, we want to stress again that our intention is not to claim a quantitative match between the experimental and simulated traces.

      (4) Figure 5 In the caption the author refers to "the generic model, single neuron model, and epileptor model". Could you please better explain the models referred to and why they are mentioned? Are the generic model and the single neuron model those that are presented in the Materials and Methods section? Or do you refer to completely different models, as for the epileptor?

      We have removed the reference to the generic model (we had in mind the canonical model for seizures by Saggio et al. 2017), since it is not mentioned in the paper, and we have clarified that the single neuron model and the epileptor model were used to simulate seizure-like events.

      (5) Sec 2.5 As already stated above, the authors need to reduce unnecessary formulations that confuse the reader. Here, for example, Eqs. (6) and (7) are unnecessary, in view of the fact that delta spikes are used (Eq. 8).

      We thank the reviewer for the suggestion, but we disagree, and we think it is better to start the derivations from the more general case, as done with Eqs. 6-7.

      (6) Sec. 2.6 Could you please better explain why in Eqs. (15) and (16), the variable V0 is introduced, while before and after this, the variable V is used?

      We thank the reviewer for the comment. In Eqs. (15) and (16), \dot{V}_0 denotes the free term of the membrane potential equation, i.e., the component driven solely by the intrinsic ionic currents and excluding the synaptic input I_syn. Only this \dot{V}_0 term (a function rather than an independent variable) is approximated by the piece-wise quadratic expression in Eq. (21). In contrast, V is the membrane potential variable, whose dynamics are obtained by combining \dot{V}_0 with the synaptic current contribution I_syn. In summary, there is no independent variable V_0; only the function \dot{V}_0 is introduced to represent the intrinsic (non-synaptic) component of the membrane potential dynamics. We have now clarified this in the text.

      (7) In the square brackets of the r.h.s. of Eq. (18), for all the intermediate steps, it appears G^n(V,n) ϱ^V, while there should be G^n(V,n) ϱ^n.

      We thank the reviewer for catching this typo. We have corrected this in the revised manuscript.

      (8) Sec. 2.8 Here the authors affirm that "a double-Lorentzian (or a piece-wise Lorentzian) could be a suitable form for ρ^V (t, V | η). However, it is not clear under which conditions such an assumption would allow a solution to the continuity equation". What are the problems underlying the implementation of the double Lorentzian? It seems to be a more correct form than the single Lorentzian actually implemented.

      We thank the reviewer for this thoughtful question. In principle, a double-Lorentzian ansatz for ρ^V can indeed be implemented in several reasonable ways, for example, by enforcing that the combined area of the two Lorentzian components is normalized to one (to preserve the probabilistic interpretation) and by imposing smoothness constraints at their boundaries. However, despite exploring these implementations, we were unable to obtain non-trivial solutions of the continuity equation under this parametrization. The only solvable case we found is the degenerate one in which the two Lorentzians collapse onto each other (i.e., x_- = x_+ and y_- = y_+), which reduces the ansatz to the single-Lorentzian form used in the manuscript. For this reason, although the double-Lorentzian is conceptually appealing, it did not yield practically useful solutions within our framework.
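
      For orientation, the single-Lorentzian ansatz referred to here has the following standard form in the QIF literature. The notation (half-width x, center y) is assumed to match the manuscript's x and y; the manuscript's own equations should be taken as authoritative.

```latex
\rho^{V}(t, V \mid \eta) \;=\; \frac{1}{\pi}\,
  \frac{x(\eta, t)}{\bigl[V - y(\eta, t)\bigr]^{2} + x(\eta, t)^{2}},
\qquad
\int_{-\infty}^{\infty} \rho^{V}(t, V \mid \eta)\, \mathrm{d}V = 1 .
```

      In the exact QIF case, the half-width is proportional to the firing rate (r = x/π) and the center y gives the mean membrane potential, which is why a degenerate double-Lorentzian with x_- = x_+ and y_- = y_+ reduces to this expression.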

      (9) Eq. (28). The symbols used for the flux (especially those used in the second-to-last step once the inner integration is performed) are confusing and it is difficult to understand what they mean.

      We thank the reviewer for noting this issue. The problem was due to a LaTeX typo that prevented the vertical lines—indicating that the flux is evaluated at specific points—from rendering correctly. We have now corrected this.

      (10) Eq. (29) In the third step there are some misprints that impair comprehension.

      We thank the reviewer for noting this. We have corrected these misprints in the revised version.

      (11) Line 696. The reference is not displayed.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      As a really general remark, this manuscript is written in a confusing manner, the authors present their model in a general formulation and their analysis in a complicated way that in the end is not needed, as I will explain in detail in the following.

      Another general question is why the authors want to employ the neural mass reduction methodology developed in [23] to obtain exact mean-field evolution for quadratic neurons (like the quadratic integrate and fire (QIF)) for a model that reveals a cubic dependence on the membrane potential, as the FitzHugh-Nagumo neuron (that indeed is a 2d reduction of the Hodgkin-Huxley model), to obtain an approximate neural mass model that somehow works qualitatively only for synchronized dynamics? Why not use another approach more suited to derive the neural mass model for cubic nonlinearity, as the one suggested in [33] and [69] by Di Volo and co-authors? What is the rationale behind the choice of the authors?

      We appreciate the reviewer’s critical feedback and the opportunity to clarify our methodological choices. Our decision to base the mean-field model on Hodgkin–Huxley-type neurons stems from the need to retain ion channel dynamics, which are essential to capture the coupling between membrane activity and extracellular ionic concentrations. This biophysical link is central to our study and cannot be achieved using more abstract neuron models such as QIF or FitzHugh-Nagumo alone.

      Regarding the mean-field reduction method: while the Ott-Antonsen/Lorentzian framework is indeed exact for QIF neurons, we adopted a stepwise quadratic approximation to apply a similar formalism to the cubic-like dynamics of the HH model. This choice enables us to analytically capture a rich set of behaviors, including bursting, depolarization block, and seizure-like dynamics, in a tractable mean-field system.
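
      For reference, the exact QIF reduction mentioned above is commonly written (Montbrió, Pazó and Roxin, 2015) as two ODEs for the firing rate r and the mean membrane potential v. The minimal sketch below integrates that standard system with forward Euler. It is not the HH-based model of the manuscript; the function names, parameter values, and step size are illustrative assumptions.

```python
import numpy as np

def qif_mean_field_rhs(r, v, eta_bar, delta, J, I_ext=0.0):
    """Standard QIF mean-field equations (membrane time constant set to 1).

    r: population firing rate, v: mean membrane potential,
    eta_bar / delta: center and half-width of the Lorentzian excitability
    distribution, J: synaptic coupling, I_ext: external input.
    """
    dr = delta / np.pi + 2.0 * r * v
    dv = v**2 + eta_bar + J * r + I_ext - (np.pi * r) ** 2
    return dr, dv

def simulate_qif_mean_field(eta_bar=-5.0, delta=1.0, J=0.0,
                            r0=0.1, v0=-2.0, dt=1e-3, t_end=20.0):
    """Forward-Euler integration; returns arrays r(t) and v(t)."""
    n_steps = int(round(t_end / dt))
    r = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    r[0], v[0] = r0, v0
    for k in range(n_steps):
        dr, dv = qif_mean_field_rhs(r[k], v[k], eta_bar, delta, J)
        r[k + 1] = r[k] + dt * dr
        v[k + 1] = v[k] + dt * dv
    return r, v

if __name__ == "__main__":
    r, v = simulate_qif_mean_field()
    print(f"final r = {r[-1]:.4f}, final v = {v[-1]:.4f}")
```

      With the default (uncoupled, low-excitability) parameters the trajectory settles onto a low-activity fixed point; the stepwise quadratic approximation discussed here is what allows a formalism of this type to be applied to cubic-like dynamics.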

      We considered the approach of Di Volo and colleagues [33, 69], but their methodology is tailored to asynchronous irregular regimes, whereas our model is specifically designed to capture dynamics in quasi-synchronous or bursting regimes — including epileptiform activity — which are not covered by the assumptions of the Di Volo framework.

      We now clarify these modeling choices more explicitly in the revised manuscript.

      "Unlike phenomenological or reduced models, the Hodgkin–Huxley framework allows us to retain explicit ion exchange dynamics, which are essential for linking membrane behavior to extracellular potassium fluctuations. This level of biophysical detail is crucial for modeling pathological regimes such as seizure onset and propagation."

      Furthermore, the derivation of the neural mass equations is unnecessarily complicated. As a matter of fact, they approximate all the variables (except the membrane potentials of the single neurons) as collective variables (i.e. the gating variable and the potassium concentration) common to all the neurons. The neural network model for which they derive the neural mass model presents microscopic evolutions of the membrane potential cubic-like plus other global variables equal for all neurons, that depend on collective variables such as the mean membrane potential or the mean firing rate. Once clarified, the derivation of the neural mass model is much simpler, and it is not necessary to follow the approach reported in Reference [95] [Chen, L. & Campbell, S. A. Exact mean-field models for spiking neural networks with adaptation. Journal of Computational Neuroscience 50 (4), 445-469 (2022)] which is unnecessarily complicated. The authors can follow a much simpler methodology as explained by Guerreiro et al. in Reference [R6] (cited below) where the authors consider the same model studied in [95]. Such a methodology has been applied in many cases already, to introduce realistic aspects in the neural mass model [23] (see References [R1-R7] below). I strongly encourage the authors to reformulate their approach in a simpler and clearer manner, by following the approach reported in [R1-R7]. The manuscript will become more readable and it will gain in comprehension.

      We thank the reviewer for this helpful suggestion. We agree that, given the assumptions made in our derivation (i.e., shared gating and ion concentration variables across neurons), the mean-field equations could alternatively be obtained using the simpler methodology proposed by Guerreiro et al. [R6] and related works [R1–R7]. However, we chose to follow the derivation presented by Chen and Campbell [95] because it makes the approximations (e.g., moment closure, flux boundary assumptions) explicit and generalizable to future extensions. We also acknowledge that the assumption that n is treated as a collective variable is needed, and, for clarity, we have now added a remark in the manuscript indicating that the same result could be recovered more directly using the approach of Guerreiro et al.

      “We note that, under the assumption of globally shared gating and ion concentration variables across the neuronal population, the resulting mean-field equations can also be derived using simpler methods as proposed by Guerreiro et al. [58]. In this work, we follow the more general formalism of Chen and Campbell [59], which makes the role of key approximations (e.g., moment closure, vanishing flux at boundaries) explicit. This also facilitates potential generalizations to settings with partial heterogeneity or dynamic gating distributions.”

      “Finally, putting together these factors and assuming that n can be treated as a collective variable for each neuron”

      “Unlike the mean membrane potential ⟨V⟩ and the firing rate (r), which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”

      Now I will examine the whole manuscript in detail and report comments, remarks, and suggestions, numbered as (Q#), on how to improve the present manuscript to make it easier to read and more comprehensible. These are not minor remarks, just detailed ones.

      Introduction

      (Q1) The Introduction section needs a part devoted to the reduction methodology developed in [23] for QIF neurons and a presentation of previous works dealing with the introduction of biologically realistic aspects in the neural mass model derived in [23]. Here is a non exhaustive list of such papers concerning the introduction of the following realistic aspects in the neural mass developed in [23]:

      (I) short-term synaptic plasticity :

      [R1] Exact neural mass model for synaptic-based working memory H Taher, A Torcini, S Olmi, PLOS Computational Biology 16 (12), e1008533 (2020)

      [R2] Bursting in a next generation neural mass model with synaptic dynamics: a slow-fast approach H Taher, D Avitabile, M Desroches, Nonlinear Dynamics 108 (4), 4261-4285 (2022)

      [R3] Mean-field approximations of networks of spiking neurons with short-term synaptic plasticity R Gast, TR Knösche, H Schmidt, Physical Review E 104 (4), 044310 (2021)

      (II) spike frequency adaptation:

      [R4] Gast, Richard, Helmut Schmidt, Thomas R. Knösche. "A mean-field description of bursting dynamics in spiking neural networks with short-term adaptation." Neural computation 32.9 (2020): 1615-1634.

      [R5] Population spiking and bursting in next-generation neural masses with spike-frequency adaptation, A Ferrara, D Angulo-Garcia, A Torcini, S Olmi, Physical Review E 107 (2), 024311 (2023).

      (III) conductance-based neuron with a slow current (Izhikevich model):

      [R6] A new generation of reduction methods for networks of neurons with complex dynamic phenotypes, IC Guerreiro, M Di Volo, B Gutkin, arXiv preprint arXiv:2206.10370 (2022)

      (IV) spike timing-dependent plasticity:

      [R7] Mean-field approximations with adaptive coupling for networks with spike-timing-dependent plasticity, B Duchet, C Bick, Á Byrne, Neural computation 35 (9), 1481-1528 (2023).

      (V) random connectivity and noise:

      [R8] Mean-field models of populations of quadratic integrate-and-fire neurons with noise on the basis of the circular cumulant approach, DS Goldobin, Chaos: An Interdisciplinary Journal of Nonlinear Science 31 (8) (2021)

      [R9] A reduction methodology for fluctuation-driven population dynamics DS Goldobin, M Di Volo, A Torcini, Phys. Rev. Lett. 127, 038301 (2021)

      [R10] Shot noise in next-generation neural mass models for finite-size networks VV Klinshov, SY Kirillov Physical Review E 106 (6), L062302 (2022)

      I think the authors should refer in the introduction to these previous papers, where realistic biological aspects have been already introduced in the neural mass model developed in [23].

      We have added a whole paragraph devoted to the next-generation neural mass models and in particular to the other works introducing biological realism in this class of models:

      “Recently, a class of these models, called next-generation neural mass models [42], has been developed based on an analytical approach introduced by [25] that allowed for the exact derivation of mean field parameters for a population of quadratic integrate-and-fire (QIF) neurons. These can be linked to EEG/MEG oscillations [43], including epileptic seizures [44], and have been used to study various aspects of the whole-brain dynamics such as the low-dimensional manifold of the resting state [45, 46], aging [47] and neural signatures of consciousness [48]. A number of works have dealt with the introduction of biologically realistic aspects in the mostly phenomenological neural mass model derived in [25]. These included short-term synaptic plasticity [49–51], spike frequency adaptation [52, 53], spike timing-dependent plasticity [54], synaptic delay [29], random connectivity and noise [55–57], as well as an extension of the conductance-based neurons with a recovery variable [58–60].”

      (Q2) Line 117 - Please specify what you mean by locally homogeneous, here.

      Thank you for allowing us the opportunity to clarify this. We now report:

      "By locally homogeneous, we mean that all neurons in the population are assumed to share the same extracellular and intracellular ionic environment and are connected with identical coupling rules, allowing us to treat the population as uniform with respect to ion dynamics and connectivity."

      (Q3) In this sub-section the authors should clarify all the hypotheses they employ to derive the neural mass models, not only the Lorentzian approximation they did for a cubic model, but also the fact that they assume that the gating variable n is a global variable as well as that the potassium concentration are assumed to be the same for all neurons, that they assume no heterogeneity at this level. This is a fundamental aspect that should be clarified at this stage already.

      We thank the reviewer for this important observation. We agree and have revised the text in the derivation section to explicitly state all key assumptions. Specifically, we now clarify that:

      (1) The gating variable n is treated as a population-average (global) variable;

      (2) The potassium concentrations Δ[K+]int and [K+]g are assumed to be homogeneous across the neuronal population; and

      (3) No heterogeneity is assumed at the level of the ion dynamics.

      This assumption is biophysically motivated: ion concentrations — particularly extracellular potassium — tend to redistribute rapidly due to diffusion and electrochemical forces, leading to an effectively well-mixed environment at the mesoscopic scale. As such, assigning separate compartments to individual neurons is not justified in this modeling context. We now explicitly note this in the manuscript to avoid ambiguity.

      “3) We assume that the potassium concentrations, both intracellular (Δ[K+]int) and extracellular (through the buffering variable [K+]g), are homogeneous across the neuronal population. This is justified physiologically by the rapid redistribution of ions through diffusion and electrochemical gradients, which enforce near-instantaneous equilibration at the mesoscopic scale. As such, assigning separate compartments to each neuron is neither practical nor biologically meaningful in this context; 4) We assume that the gating variable n, which governs potassium conductance, can be treated as a population-averaged variable. This allows us to describe the neuronal ensemble using a reduced set of collective (mean-field) variables.”

      Comparison with neural network simulations

      (Q4) The comparison the authors perform between the microscopic model and the neural mass is misleading. From what the authors wrote, it seems that they are considering 4 variables for each neuron in the network model (this is unclear from how the model is written in Eq. (9)): I guess one for the membrane potential, one for the gating variable and two for the potassium concentrations. However, this is not the network model for which the neural mass has been developed: the neural mass has been obtained for a network made of N + 3 variables (N membrane potentials and 3 collective variables for the gate and the potassium concentrations). This is a sort of mesoscopic network model, analogous to what was done previously in references [R1,R3,R4] above and others. If the authors compared their neural mass with this mesoscopic model, the agreement between the two would be improved.

      We agree with the reviewer’s observation, and we now acknowledge this issue in the Results and in the Limitations. We have already modified the text to explicitly state that, for the mean-field derivations, n is treated as a collective variable, and we have added the following statements:

      “Also note that the gating variable n is treated as microscopic in the neural network, while in the derivations for the mean-field it is considered as a mesoscopic and identical for the whole population. This is likely responsible for some of the discrepancies between the two modalities.”

      “Moreover, the discrepancy between the two modalities would have likely been smaller if for the neural network we also adopted a gating variable that is mesoscopic and identical across the spiking neurons, as in similar works [49–51]. However, here we demonstrate the validity of the mean-field approximation even for the more natural, microscopic representation of the gating variable in the neural network.”

      Comparison with in vitro experiments

      (Q5) Experiment -- The experiment is performed in vitro on the intact Hippocampus of mice between postnatal days P5-P7. It is known [R1] that neuronal activity at an early developmental stage is provided in the Hippocampus by a network primarily driven by synchronized GABA_A that provides an excitatory action and generates giant depolarizing potentials (GDPs) [R11]. However, GDPs have frequencies in the range of 1 Hz - 0.1 Hz, not matching the oscillation frequencies reported by the authors. I have several questions here:

      (E1) At this stage (P5-P7), are the interactions among neurons essentially excitatory? If not, please explain why. Are the oscillations reported by the authors somehow related to GDPs? The depolarizing action of GABAergic transmission and the presence of GDPs during early rodent brain development, as described by Ben-Ari and some other researchers, are characteristics commonly observed in ex vivo brain preparations, but are not evident under physiological in vivo conditions (see doi: 10.3389/fphar.2012.00065).

      In our preparation—intact mouse hippocampus—GABAergic synaptic transmission is not depolarizing. This is evidenced by the fact that inhibition of ionotropic GABA_A receptors with bicuculline triggers interictal-like discharges, which are routinely used as a model of epileptiform activity (see doi: 10.1016/j.nbd.2014.12.013). Therefore, in our experiments at P5–P7, neuronal interactions are not purely excitatory, and the observed low Mg2+ induced oscillations are not related to GDP.

      (E2) What is the nature of the oscillations reported by the authors in Figure 4 ? Which is their origin, please explain in the text of the paper clearly.

      The model of epileptic discharges presented in our study was first introduced over 20 years ago and has since become a well-established paradigm for screening potential antiepileptic drugs and research on the mechanism of epileptic seizure. A detailed description of this model can be found in doi: 10.1046/j.1460-9568.2002.02143.x, and its pharmacological properties are reviewed in doi: 10.1046/j.1528-1157.2003.19503.x. These references have now been added to the manuscript for clarity.

      We have added the following:

      “The model of epileptic discharges presented in our study was first introduced over 20 years ago [115] and has since become a well-established paradigm for screening potential antiepileptic drugs and research on the mechanism of epileptic seizure [116].”

      (E3) How exactly does the concentration of extracellular potassium ions change, this is not clear even in Methods, please clarify.

      [R11] Excitatory actions of GABA during development: the nature of the nurture Y Ben-Ari, Nature Reviews Neuroscience 3 (9), 728-739 (2002).

      We have now added a new subsection in the Methods explaining how we use Mg2+ variation to influence the external potassium variation.

      “The membrane of hippocampal neurons is equipped with N-methyl-D-aspartate type glutamate receptors (NMDARs). These receptors have a very high affinity for glutamate and can, in principle, be activated by ambient glutamate present at low concentrations in the brain extracellular fluid (ECF). Under normal physiological conditions, this activation does not occur because extracellular magnesium ions (Mg<sup>2+</sup>) block the NMDAR channel at membrane potentials more negative than about –50 mV; this voltage-dependent block prevents receptor activation at rest. When extracellular magnesium is removed, the block is relieved, allowing NMDARs to be activated, leading to neuronal depolarization toward the action potential threshold [117]. In addition, as a divalent cation, Mg<sup>2+</sup> interacts with the negatively charged neuronal membrane, contributing to the stabilization of the resting membrane potential. Lowering extracellular magnesium concentration disrupts this effect, resulting in membrane depolarization [118].”

      “Consequently, magnesium removal not only facilitates NMDAR-dependent depolarization, but also directly depolarizes neurons. This depolarization increases the driving force for outward potassium currents through K<sup>+</sup> channels, meaning that variations in Mg<sup>2+</sup> can indirectly influence external potassium dynamics during neuronal activity.”
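
      The voltage-dependent Mg2+ block described above is often summarized by the phenomenological expression of Jahr and Stevens (1990). The minimal sketch below evaluates that expression to show how lowering extracellular magnesium relieves the block near rest. It is an illustration only and not part of the model in the manuscript; the function name is hypothetical, and the constants (3.57 mM, 0.062 per mV) are those of that commonly used fit.

```python
import numpy as np

def nmdar_unblocked_fraction(v_mV, mg_mM):
    """Fraction of NMDAR conductance not blocked by extracellular Mg2+.

    Jahr-Stevens-type expression: 1 / (1 + ([Mg2+]/3.57 mM) * exp(-0.062 * V)),
    with V in mV and [Mg2+] in mM.
    """
    v_mV = np.asarray(v_mV, dtype=float)
    return 1.0 / (1.0 + (mg_mM / 3.57) * np.exp(-0.062 * v_mV))

if __name__ == "__main__":
    voltages = np.array([-80.0, -70.0, -50.0, -20.0, 0.0])
    for mg in (1.0, 0.1, 0.0):  # physiological, low, and nominally zero Mg2+
        fractions = nmdar_unblocked_fraction(voltages, mg)
        print(f"[Mg2+] = {mg} mM:", np.round(fractions, 3))
```

      At hyperpolarized potentials the unblocked fraction is small for millimolar Mg2+ and approaches one as Mg2+ is removed, which is the qualitative effect exploited in the low-Mg2+ protocol.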

      (Q6) Lines 187-191 and Figure 4 -- The authors wrote: "In Figure 4.c we show the membrane potential and external potassium for a simulation of N = 3000 coupled HH-like neurons showing a similar behavior, although the parameters were modified to simulate shorter fluctuations for computational efficiency." This sentence is unclear. What is clear from Figure 4 is that the network simulations gave rise to collective oscillations on a completely different time scale (seconds rather than minutes), and the profile of the potassium concentration also has a clearly different evolution. From Figure 4 one can conclude that network simulations have nothing to do with the neural mass evolution and the experiment. I think the authors should better clarify and describe the results reported in Figure 4.

      We thank the reviewer for the observation. We have revised the relevant section of the manuscript to clarify the interpretation of Figure 4 and avoid any implication of quantitative matching. As stated in our response to Reviewer 1 (comment 6), the comparison is intended to highlight the shared qualitative structure across experimental data, the neural mass model, and the network simulation — specifically, the modulation of fast bursting by slow extracellular potassium fluctuations. The difference in timescale in the network simulation arises from rescaled parameters used for computational efficiency. We now explicitly state this and have updated the figure caption and accompanying text accordingly to reflect these points.

      (Q7) Why do the authors consider a purely excitatory network to describe the experimental results? What is the reason for this choice? Why do they not consider, as usual, balanced excitatory-inhibitory networks? Please clarify this point.

      We thank the reviewer for raising this point. We chose to model a purely excitatory network as a first step in isolating the role of extracellular potassium dynamics in generating population-level bursting. This allows us to focus on the ion-driven modulation mechanisms without introducing additional complexity from inhibitory feedback. Similar modeling choices have been made in previous studies of bursting and seizure-like dynamics (e.g., Gutkin et al.), where inhibition is omitted to emphasize intrinsic or modulatory mechanisms. We acknowledge that incorporating inhibitory populations is an important next step for capturing a broader range of dynamics, but for the current study, the excitatory-only network provides a minimal and interpretable framework aligned with our focus.

      (Q8) By comparing Figures 4 (a) and (b) it seems that the bursting activity observed in the experiment and in the mean-field simulations seem quite different, originating from different mechanisms and bifurcations, Can the authors comment on this?

      We thank the reviewer for this important observation. We have reorganized the presentation of Figure 4 and revised the accompanying text to better clarify the nature of the comparison (see also our response to Reviewer 1, point 6). Our aim is not to claim that the experimental and simulated bursts arise from identical bifurcation mechanisms, but rather to highlight shared qualitative features — in particular, slow modulation of population activity by extracellular potassium. We now also comment on the potential role of more complex or noise-driven bifurcations (see Saggio et al. 2020) in shaping experimental bursting dynamics, which are not fully captured by the current deterministic model.

      Bifurcation analysis: emergent network states and multistability

      (Q9) This sub-section will gain interest by reporting simulations of the network and of the neural mass model presenting bistable dynamics.

      We agree with the reviewer that this would be an important addition, but we believe that it goes beyond the scope of this work (for computational reasons, among others) and remains for future work. We have, however, updated the bifurcation analysis section.

      Limitations of the model

      (Q10) Lines 276- 280 -- I think that the parameters c+,c_,R+,R_ depend not only on the slow variables, potassium concentrations but also on the actual value of the gate variable n. This should be stressed.

      We thank the reviewer for this helpful observation. We agree and have clarified this in the revised manuscript. This reflects the mean-field assumption that n is treated as a collective variable, and we now make this dependency explicit in the text.

      “Furthermore, the parabola coefficients c_-,c_+, R_-, R_+ were fixed as constants, however, these coefficients could be made functions of the slow variables and the gating variable, which might unveil new dynamical regimes and extend the validity of the thermodynamic limit beyond the regimes described in this work. Also, in the case of constant values, an in-depth exploration of the parameter space is required to fully characterize the model and its bifurcation structure.”

      (Q11) The authors wrote: " Other limiting assumptions are the moment closure condition (19) and the assumptions that the functions (3) averaged across the neuronal population can be expressed as functions of the average membrane potential V and gating variable n (which is only true in the cases where the functions (3) can be reasonably approximated as linear functions in a range of V and n." Apart from that a parenthesis is lacking, I think that this last aspect has been already taken into account when performing the fit with 2 parabolas to the sum of the currents, or not? In case, please specify.

      We thank the reviewer for catching the missing parenthesis — this has been corrected in the revised manuscript. Regarding the modeling point: the two-parabola fit applies specifically to the membrane potential dynamics and captures the nonlinear dependence of the total current on V (eq.16). In contrast, the moment closure assumption involves approximating averages of nonlinear functions of both V and n, such as those appearing in the gating dynamics (e.g., n∞(V)). This is not directly accounted for by the parabola approximation, but is handled separately via the mean-field approximation of G^n as a function of the average variables (eq.15).
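
      The closure point can be illustrated numerically. The sketch below is not the manuscript's model: the sigmoid `n_inf`, its parameters, and the Gaussian spread of membrane potentials are all hypothetical choices made only to show that replacing the population average of f(V) by f(⟨V⟩) is exact for a linear f and degrades for a nonlinear f as the spread of V grows.

```python
import numpy as np

def n_inf(v, v_half=-40.0, slope=5.0):
    """Hypothetical sigmoidal steady-state activation (illustrative only)."""
    return 1.0 / (1.0 + np.exp(-(v - v_half) / slope))

def linear(v, a=0.02, b=1.0):
    """A linear function of V, for which the closure is exact."""
    return a * v + b

def closure_gap(f, v):
    """|<f(V)> - f(<V>)|: error made by evaluating f at the mean potential."""
    return abs(f(v).mean() - f(v.mean()))

rng = np.random.default_rng(0)
# Assumed populations of membrane potentials (mV), centred away from v_half
v_wide = rng.normal(loc=-50.0, scale=10.0, size=100_000)
v_narrow = rng.normal(loc=-50.0, scale=0.5, size=100_000)

print("linear, wide spread   :", closure_gap(linear, v_wide))
print("sigmoid, wide spread  :", closure_gap(n_inf, v_wide))
print("sigmoid, narrow spread:", closure_gap(n_inf, v_narrow))
```

      In this toy setting the sigmoid gap shrinks as the population becomes narrower, which is the regime where a locally linear approximation of the function is reasonable.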

      (Q12) A limitation that should be stressed is that the authors in the neural mass model consider the gate variable and the potassium concentrations, as global variable equal for all neurons, and where n depends on the mean membrane potential, to write that the moment closure (19) is a limiting assumption is honestly too clear, please be explicit here.

      We now have the following two statements:

      “These slow variables are in addition considered to be mesoscopic, meaning they are identical for every neuron in the population.”

      “In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore, ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”

      Discussion

      (Q13) The authors could discuss in this section the further biological ingredients they can introduce in their neural mass based on the previous works [R1-R9] that have already shown how to include plastic synapses, random connectivity, noise, adaptation, spike-timing-dependent plasticity, etc and which of these ingredients they consider more relevant for the whole brain dynamics.

      In order not to repeat the same statements from the Introduction, we have now added the following sentence:

      “This approach, taking into account key biophysical details, offers a first step in considering the role of the glia in neural tissue excitability. Following this direction, other ions, such as calcium should be taken into consideration, as well as other effects such as plastic synapses, random connectivity, noise, adaptation, spike-timing-dependent plasticity, as already discussed in the Introduction.”

      (Q14) The authors should also discuss why they limited their analysis to purely excitatory networks, and what would change by including excitatory-inhibitory interactions in each single mass and across neural masses, if this makes sense or not.

      As stated in our response to Q7, we chose to focus on purely excitatory networks as a first step to isolate and study the core role of extracellular potassium dynamics in driving bursting behavior. This modeling choice allows for a minimal system where the interaction between intrinsic ionic mechanisms and network coupling is most transparent.

      We also note that excitatory and inhibitory effects can be modeled within the same formalism by adjusting the synaptic reversal potential — for example, $E_{syn}=0$mV for excitatory, and $E_{syn}=-80$mV for inhibitory interactions. Including inhibitory populations would introduce additional complexity and richer dynamical regimes (e.g., oscillatory instabilities, balance states), which are certainly of interest but beyond the scope of this study.
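
      As a minimal sketch of this point, assuming a standard conductance-based synaptic term of the form g_syn · s · (E_syn − V) added to the membrane equation (the function name, the unit conductance, and the sign convention are illustrative assumptions, not taken from the manuscript), the same expression gives a depolarizing or hyperpolarizing drive depending only on the reversal potential:

```python
def synaptic_drive(v_mv, s, e_syn_mv, g_syn=1.0):
    """Conductance-based synaptic term g_syn * s * (E_syn - V).

    v_mv     : membrane potential (mV)
    s        : synaptic activation (dimensionless, >= 0)
    e_syn_mv : synaptic reversal potential (mV)
    g_syn    : hypothetical coupling strength (arbitrary units)
    """
    return g_syn * s * (e_syn_mv - v_mv)

v_rest = -65.0  # assumed membrane potential near rest (mV)
excitatory = synaptic_drive(v_rest, s=0.5, e_syn_mv=0.0)    # positive: depolarizing
inhibitory = synaptic_drive(v_rest, s=0.5, e_syn_mv=-80.0)  # negative: hyperpolarizing
print(excitatory, inhibitory)
```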

      Materials and Methods

      (Q15) Fig.2 - I think a plus is lost in panel (c) where it should be [K+bath];

      Thank you. We corrected the figure.

      (Q16) Caption of Figure 2- the authors wrote: "In the case where the derivative of the membrane potential is zero for V > V ⋆ (e.g., if the cubic function is shifted up by adding a constant current to the membrane potential derivative), the population is described by the red distribution in the steady state, and the continuity equation is governed by the negative parabola equation." This sentence is unclear, the authors mean in the case where the derivative of the membrane potential crosses zero at V > V*? Please clarify.

      We thank the reviewer for pointing this out. Yes, we refer to the case where the membrane potential derivative crosses zero at a point V>V∗. We have clarified this in the revised figure caption.

      (Q17) Lines 558-562 -- Eqs (6) and (7) are examples of unnecessary complications of which this manuscript is full of. Since the authors do not consider any synaptic dynamics and homogenous (equal) couplings, these equations are not needed, I strongly recommend removing Eqs (6) and (7) and limiting to the expression reported in Eq (8), which indeed should also be corrected see next remark.

      We appreciate the reviewer’s concern regarding clarity. As mentioned in our response to Reviewer 1, the inclusion of Eqs. (6) and (7) was intentional and serves a pedagogical purpose — to present the general structure of the network interactions before introducing simplifying assumptions. While we agree that Eq. (8) suffices for the simulations considered in this manuscript, we believe that showing the more general form helps clarify the model’s extensibility, for instance to cases with heterogeneous coupling or synaptic dynamics.

      (Q18) Eq (8) - line 562 - Since the authors assume no synaptic evolution, i.e. instantaneous post-synaptic potentials, they can clarify that Eq (8) represents the population firing rate that later will be one of the fundamental variables of the neural mass model and call it r, as in the following. Furthermore, $s_i$ does not depend on the neuron index $i$ in a fully coupled network with homogenous coupling, as in the present case, this quantity is the same for all neurons. Please drop the index and call it r since it is the population firing rate.

      We thank the reviewer for this useful suggestion. We now clarify in the text that under the assumptions of all-to-all homogeneous coupling and no synaptic dynamics, s_i is identical for all neurons and can be interpreted as the population firing rate r. This connection is made explicit in the revised manuscript.

      “Under the assumption of instantaneous synaptic transmission and homogeneous all-to-all coupling, the synaptic activation variable (s<sub>i</sub>) is the same for all neurons and corresponds to the population firing rate, which we denote by (r)”
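
      For readers comparing network simulations with the neural mass variable, a population firing rate of this kind can be estimated from spike times roughly as follows. This is a hedged sketch: the function name, the binning scheme, and the example spike times are assumptions for illustration and are not the authors' code.

```python
import numpy as np

def population_rate(spike_times, t_end, dt):
    """Binned population firing rate (spikes per neuron per unit time).

    spike_times : list with one array of spike times per neuron
    t_end       : duration of the recording (same time unit as spike_times)
    dt          : bin width
    """
    n_neurons = len(spike_times)
    n_bins = int(round(t_end / dt))
    edges = np.linspace(0.0, t_end, n_bins + 1)
    all_spikes = np.concatenate([np.asarray(s, dtype=float) for s in spike_times])
    counts, _ = np.histogram(all_spikes, bins=edges)
    # Every neuron receives the same input under all-to-all homogeneous coupling,
    # so a single population-level quantity replaces the per-neuron s_i.
    return edges[:-1], counts / (n_neurons * dt)

spikes = [np.array([0.1, 0.6]), np.array([0.2]), np.array([0.7, 0.8])]
t, r = population_rate(spikes, t_end=1.0, dt=0.5)
print(t, r)
```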

      (Q19) Line 564-567 - Here the network model is incomplete, it is not sufficient that the authors report the evolution equation for the membrane potential Eq (9). They should report the evolution equation for the gate variable n and for the potassium concentration as done in Eq (1). This request is fundamental because it is unclear from the present formulation which are the variables that are microscopic (associated with the single neuron evolution) and which are global (common to all the neurons). This is a fundamental aspect and it should be clarified. I guess that n will depend on the neuron index $i$, while the potassium concentration it is unclear how the authors will consider them, global or local. I guess that the internal density should depend on the neuron index $i$ or not ? Anyway, I would like to know exactly which network model has been simulated e.g. to obtain the results reported in Figure 3.

      We thank the reviewer for this essential clarification request. In the revised manuscript, we now explicitly state the full network model, including the evolution equations for the gating variable n_i and potassium variables. While in some simulations we consider the full microscopic model involving 4N variables (where each neuron has its own V_i, n_i, Δ[K+]int_i, [K+]g_i), for the mean-field reduction and mesoscopic comparisons we assume that the gating and potassium variables are shared across neurons. This assumption is consistent with prior work (e.g., Chen & Campbell) and is biophysically justified in the case of potassium due to its fast spatial equilibration in extracellular space. We also now mention this explicitly in the Limitations.

      (Q20) Continuity equation - Lines 568 - 597 - This part can be largely simplified and rewritten, as a matter of fact, the authors consider the gate variable n, the potassium concentrations as global (collective variables) depending on mean field values of <V> they can directly start from eq 20, by stating that they assume that the other variables (n, $\Delta[K^+]_{int}$, $[K^+]_g$) are collective variables, common to all the neurons, and that depends only on mean field variables as <V> or r. This has been done in many previous cases since the Ott-Antonsen Ansatz can be applied whenever the potential evolution is driven by quadratic terms and in the presence of mean field variables, the first indication of this was reported in 1993 by Watanabe and Strogatz for phase oscillators :

      [R12] Watanabe, Shinya, and Steven H. Strogatz. "Integrability of a globally coupled oscillator array." Physical review letters 70.16 (1993): 2391.

      Anyway, this approach has been previously employed to derive a neural mass model for networks of QIF neurons in the presence of various further neuronal variables (ranging from slow currents to plastic evolution of the couplings) describing more biologically realistic situations, see references [R1-R7] above. I strongly encourage the authors to reformulate their approach in a simpler and clearer manner, particularly interesting is for them the article [R6] by Guerriero et al, the authors examine exactly the same model as in Ref [95] [Chen, L. & Campbell, S. A. Exact mean-field models for spiking neural networks with adaptation. Journal of Computational Neuroscience 50 (4), 445-469 (2022)]. However, they solve the problem in a much more simple way, I encourage the authors to follow this approach.

      We thank the reviewer for the constructive suggestion. We acknowledge that, under the assumption that n, Δ[K+]int, and [K+]g are collective variables shared across the neuronal population, one could directly begin from Eq. (20) and proceed using the simpler approaches found in Guerriero et al. [R6] or related works [R1–R7]. However, we chose to retain the Chen & Campbell formalism, with additional clarification regarding the mesoscopic nature of the gating variable, as it explicitly highlights the key approximations used in the derivation, which may be beneficial for readers seeking to extend the method. See also the general response to Reviewer 2 at the beginning.

      (Q21) Eq (26) -- I do not think the authors can estimate explicitly <n(t)> from the equation (26), as they do for the mean membrane potential and the firing rate. This is just a formal expression representing a collective variable, I do not think that <n> will coincide with the average of the values of n_i for each neuron. Please discuss this point, and in this case show that <n> indeed coincides with the average of all of the values of the single neuron gate variable n_i.

      We thank the reviewer for raising this important point. We agree that Eq. (26) is more formal than operational, as ⟨n(t)⟩ is not directly derived from the continuity equation in the same way as ⟨V⟩ or the firing rate r. Rather, it reflects our mean-field assumption that the gating variable evolves as a collective population-averaged quantity, governed by the dynamics of the average membrane potential. In our formulation, n is treated as a global variable shared across neurons, and thus ⟨n(t)⟩ effectively is the gating variable in the neural mass model — rather than the result of averaging heterogeneous n_i. We have clarified this distinction in the text to avoid suggesting that Eq. (26) provides an explicit estimate of microscopic gating dynamics.

      “Unlike the mean membrane potential ⟨V⟩ and the firing rate (r), which can be explicitly derived from the continuity equation under the Lorentzian assumption, the expression for ⟨n(t)⟩ in Eq. (26) is formal. In our mean-field model, the gating variable (n) is treated as a global population variable, evolving deterministically as a function of the average membrane potential. Therefore ⟨n(t)⟩ corresponds to the collective gating variable assumed to be shared by all neurons, and is not computed by averaging distinct microscopic (n<sub>i</sub>) values.”

      (Q22) Mean-field dynamics for the gating variable - All this sub-section is in my opinion not useful, if the authors assume from the beginning that <n(t)> is a global variable. Indeed in the end they write for <n(t)> the evolution equation Eq (30) which is the same equation as for the single neuron gate variable (1) but for the mean values of n and <V>. I suggest removing this sub-section.

      We thank the reviewer for this suggestion. We agree that, under the assumption that n is a global collective variable, the resulting equation for ⟨n(t)⟩ is equivalent in form to the single-neuron gating equation, driven by the average membrane potential. However, we chose to retain this subsection to explicitly demonstrate how the gating dynamics enter into the mean-field formulation, especially for readers less familiar with this type of reduction. This step also mirrors the structure of the derivation used for other state variables in the model and maintains clarity for potential extensions where n may not be strictly global.

      (Q23) Line 696 - here an equation reference is lost.

      Thank you for pointing this out. We have corrected the text and restored the missing equation reference in the revised manuscript.

      (Q24) Eqs (36) -(37) -- Since the variables r and x entered in Eq (36) are essentially the same as Eq (25), apart from a constant R/pi, the use of two different names complicated in a useless manner an already complicated expression, Please decide to use everywhere r or x and then proceed consequently this applies also to Eq (37). This will also allow us to rewrite the equation in x or r in a more compact form.

      As noted in our response to Reviewer 1, point 14, we have revised Eq. (37) to ensure consistency in notation by replacing x with r throughout.

      (Q25) Eq (37) - This equation is written in a manner that is not careful enough, apart from that the authors are passed now from (x,y) to (pi*r/R,V) , therefore they should substitute everywhere x with r. Furthermore, the equation for the derivative of V is confusing, the authors should use the same approximate expression employed in eq (36) that makes explicit the quadratic dependence on V itself, otherwise, I believe that the equation is incorrect.

      In the same response to Reviewer 1, point 14, we also clarified the expression for \dot{V} in Eq. (37): we reintroduced the full current-based formulation (as in Eq. 16), reversing the quadratic approximation used earlier. This is now explicitly stated in the text, and we have improved the equation presentation to avoid confusion.

      (Q26) Eq (37) below line 708 - From this expression, it is clear that the gate variable n and the potassium variables are ruled exactly by the same equations as for the single neuron Eq (1) and that the Lorentzian Ansatz enter only in the rewriting of the evolution of the membrane potentials of the neurons in the network. In the end, the authors are doing exactly the same approximation made by many other authors [R1-R7], that these variables are collective, i.e. they are the same for all neurons, and in particular n=n(V) is a function of the mean membrane potential V. The mean field model that the authors derive corresponds to a microscopic model where the single neurons are heterogenous only in the intrinsic currents $\eta_i$, but they are all driven by collective variables, like n(V) and the potassium variables that are identical for all neurons. This should be clarified.

      We agree with the reviewer's conclusion and, as seen in the previous responses, we now explicitly acknowledge that n and the two slow variables are treated as mesoscopic variables in the mean-field derivation, while in the spiking network n remains microscopic.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors conducted a comprehensive benchmarking and evaluation of co-folding platforms, including AlphaFold3, Boltz-2, Chai-1, and the docking algorithm Dock3.7, which employs a physics-based scoring function that incorporates van der Waals interactions, electrostatics, and ligand desolvation energies. The system of interest was the SARS-CoV-2 NSP3 macrodomain (Mac1), an increasingly popular antiviral target, and the ligand sets comprised 557 unseen ligand poses (keeping the training for these co-folding platforms in mind). Additionally, the authors investigated whether the co-folding models could distinguish true ligands from non-binding small molecules. The study is thorough, with extensive statistical support and consensus across multiple metrics (chemoinformatics for quantifying ligand similarity and efficacy). The questions that the authors aim to address are whether the co-folding models struggle with memorization, whether they can distinguish between a true and a false binder, whether they replicate experimental binding affinities and efficacy, and how they compare to the physics-based docking algorithm (Dock3.7).

      We thank Reviewer 1 for this thoughtful summary of our work.

      Strengths:

      Overall, this is a scientifically solid paper. The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment.

      Weaknesses:

      My main concern is that the study's aim is a bit unclear. Modern benchmarking studies comparing physics-based docking with deep learning-based co-folding approaches (e.g., AF3, Boltz-2, Chai-1, and others) are increasingly expected to go beyond aggregate performance metrics.

      Indeed, we have gone into several examples of failures and successes for each of these methods. As we are not developing these methods ourselves, we also think this dataset will be a valuable contribution for improving them further.

      In addition to rigorous dataset construction, transparent methodology, and appropriate statistical evaluation, high-impact benchmarks typically provide actionable guidance on when each method class is most appropriate, reflecting their distinct inductive biases and practical constraints. Failure-mode analyses that link performance differences to protein flexibility, ligand chemistry, or binding-site characteristics are particularly valuable, as they move comparisons beyond "scoreboard" assessments toward mechanistic understanding.

      Right now, we do not observe meaningful trends that separate the failure modes for any individual method. This is covered in Supplementary Figures 6 and 7.

      While full biological validation is not expected, qualitative interpretation grounded in physical and biological principles strengthens conclusions. Providing reproducible workflows or reference pipelines is not mandatory, but it is increasingly viewed as a best practice because it facilitates adoption and helps contextualize results for practitioners.

      We note that our code is available (https://github.com/jongbin99/Cofolding/) and all structural data will be publicly accessible in the PDB alongside publication (we held it back only for “blinding” during peer review to avoid contamination with any new deep learning methods).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kim et al. evaluates the performance of three modern AI-based methods in predicting complex structures and binding affinities between proteins and chemical compounds. An honest 'prospective' evaluation is achieved by studying benchmark structures and chemical compounds that did not exist in the PDB at the time the AI structure prediction models (AlphaFold3, Chai-1, Boltz-2) were trained.

      Strengths:

      (1) The study addresses an important question in modern computational biology and drug discovery, and establishes the strengths and limitations of the three tools in solving various computational chemistry tasks, including compound pose prediction, active-inactive discrimination, and potency ranking.

      (2) The conclusions are based on examination of four separate targets and respective compound datasets, where for one of the targets, the authors also obtained numerous X-ray structures to serve as experimental answers for the binding pose prediction task.

      (3) The study reports relationships between structure prediction confidence, predicted energies (DOCK3.7), and affinity predictions (Boltz-2) with the geometric accuracy of compound pose prediction as well as the experimentally measured potency.

      (4) One of the key findings is the limited ability of co-folding methods to predict conformational rearrangements, which does not correlate with their ability to predict binding poses of the compounds inducing these rearrangements.

      (5) The findings could serve as useful guidelines for computational chemists in selecting appropriate software and scoring schemes for each task.

      We appreciate Reviewer 2’s summary of the novelty of the dataset and analysis.

      Weaknesses:

      While I consider this a solid study, several aspects would need to be addressed to make it really strong:

      (1) DOCK3.7 docking and scoring experiments were performed using one experimental structure of Mac1, selected from dozens of structures based on a criterion that is not sufficiently well justified. For sigma2 receptor, dopamine D4 receptor, and AmpC β-lactamase, it is not clear which structures or models were selected for docking at all. It is well known that geometry predictions, scoring, and active-inactive ROC AUCs are all strongly influenced by the selected structure. It would be important to attempt Mac1 docking using all available experimental Mac1 structures, or at least against representative structures in various conformations; it would also be quite insightful to compare results to docking of the same compound sets to AF3, Boltz-2 and Chai-1 predicted structures of Mac1. Same goes for the docking studies of sigma2, D4, and AmpC β-lactamase.

      In any program, a decision has to be made as to which template will be used for docking. We justified our choice in the methods:

      “We used this structure because the inhibitor (Z5014193706) was the most potent molecule with a structure determined around the same time as the ligands in this dataset were tested.”

      We stand by this as a reasonable assumption. Similarly, for sigma2, D4, and AmpC β-lactamase, the template was chosen in the respective papers:

      a) The σ2 receptor bound to cholesterol (PDB ID: 7MFI) was used in the docking calculations.

      - This structure was determined in the paper, the first structure of sigma2 and therefore a worthy template

      b) The D4 receptor campaign used PDB 5WIU

      - This was one of two D4 structures available and chosen because it was not bound to sodium

      c) For AmpC, the campaign used the structure in the Protein Data Bank (PDB) 1L2S

      - This maximizes comparisons to other docking studies that used the same receptor template.

      The major goal of this study is to compare different methods under reasonable (but perhaps as the reviewer points out, not optimal) conditions, not to optimize docking score.

      (2) For binding affinity predictions, as a control, authors should consider compound co-folding with an unrelated protein, or even with a pseudo-peptide that consists of a few random single amino acids - this would provide an honest baseline for such predictions.

      This suggestion would be valuable for understanding the performance for these methods from the perspective of ligand specificity (a valuable, but separate, goal). Surely this will generate some number or some prediction - but what would this baseline mean and how would it be relevant for drug discovery? Therefore, we do not think this suggestion is relevant for the issues being investigated in this manuscript.

      (3) ROC curves Figure 3 and elsewhere should be shown, and AUCs quantified/reported on a log or square-root scaled x-axis, to emphasize early enrichment, which is the area of practical significance for these predictions. For example, Figure 3A currently suggests that the pose prediction performance of AF3 exceeds that of Boltz-2 whereas the early enrichment is clearly better for Boltz-2.

      We agree with this, and added a semi-logAUC plot for Figure 3A. For Figure 5, we also generated a semi-logAUC plot to see early ligand enrichment clearly, added as Supplementary Figure 11. We added the text:

      “Considering its early enrichment performance, Boltz-2 Ligand ipTM was the strongest predictor of pose accuracy based on normalized logAUC (20.5% above random, Fig. 3a). In contrast, although Boltz-2 pIC50 showed poor overall discrimination, it overestimated its ability to enrich true positive poses at low false positive rates, despite having a weak early enrichment behavior”
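
      A semi-log AUC of this kind can be sketched as follows. The integration bounds are an assumption here (false positive rate from 0.001 to 1, a common convention in the docking literature); the response does not state the exact bounds used. The function names are hypothetical, higher scores are assumed to indicate positives, and tied scores are not treated specially.

```python
import numpy as np

def roc_points(scores, labels):
    """ROC curve (FPR, TPR) obtained by ranking from highest to lowest score."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(labels, dtype=bool)[order]
    tpr = np.r_[0.0, np.cumsum(y) / y.sum()]
    fpr = np.r_[0.0, np.cumsum(~y) / (~y).sum()]
    return fpr, tpr

def log_auc(fpr, tpr, lam=1e-3, n_grid=1001):
    """Area under TPR vs log10(FPR) on [lam, 1], normalised to lie in [0, 1]."""
    fpr, tpr = np.asarray(fpr, dtype=float), np.asarray(tpr, dtype=float)
    # Keep the highest TPR reached at each distinct FPR (upper step of the ROC)
    fpr_u, first = np.unique(fpr, return_index=True)
    last = np.r_[first[1:] - 1, len(fpr) - 1]
    tpr_u = tpr[last]
    grid = np.logspace(np.log10(lam), 0.0, n_grid)
    y = np.interp(grid, fpr_u, tpr_u)
    x = np.log10(grid)
    area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule
    return area / (x[-1] - x[0])

def random_log_auc(lam=1e-3):
    """Analytic log AUC of a random classifier (TPR = FPR) on [lam, 1]."""
    return (1.0 - lam) / np.log(1.0 / lam)

fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print("logAUC above random:", log_auc(fpr, tpr) - random_log_auc())
```

      Subtracting `random_log_auc()` gives a value "above random" in the sense used in the quoted text, under the stated assumption about the bounds.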

      (4) 'Trained set' in figures and text should probably be 'training set'? Or otherwise explain this new term the first time it is introduced.

      Thank you for pointing this out. ‘Training set’ is the correct term, and we have made the corresponding changes across all figures and text.

      (5) Figure 1 illustrates a projection onto the first two principal components of a space that apparently had only one (scalar) metric for each compound pair (% maximum common substructure or Tanimoto coefficient); the authors need to better explain the principle behind this analysis and visualization.

      This suggestion is valuable, since we often use PCA to reduce dimensionality for more complex features. To clarify, we have a full pairwise similarity matrix for all tested Mac1 compounds for each of Tc and MCS%, and the PCA for each metric is a low-dimensional representation of the corresponding pairwise similarity matrix. We also changed the Figure 1 caption to make this point clearer:

      “projection of compounds represented by their full pairwise similarity vectors (by ECFP-4 Tc and MCS%)”
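
      The idea of projecting compounds from their rows of a pairwise similarity matrix can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: fingerprints are represented as plain Python sets of on-bit indices (hypothetical stand-ins for ECFP-4 bits), and the PCA is done with a plain SVD of the column-centred similarity matrix.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def similarity_matrix(fingerprints):
    """Full symmetric pairwise Tanimoto matrix; row i is compound i's vector."""
    n = len(fingerprints)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = tanimoto(fingerprints[i], fingerprints[j])
    return sim

def pca_2d(rows):
    """Project row vectors onto their first two principal components."""
    centered = rows - rows.mean(axis=0)
    u, sing, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * sing[:2]

# Toy fingerprints: two pairs of similar "compounds" (hypothetical bit indices)
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
sim = similarity_matrix(fps)
coords = pca_2d(sim)
print(coords)
```

      In this toy case the first component separates the two groups of similar fingerprints; each compound is described by its whole similarity vector rather than by a single scalar.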

      Reviewer #3 (Public review):

      Summary:

      This study's core conclusions are well-supported by data. It is shown that co-folding outperforms docking in known ligand pose/affinity prediction (validated by RMSD and IC₅₀ correlation), struggles with false-positive discrimination in virtual screens (lower AUC values), and is complementary to docking (non-correlated errors, distinct strengths in drug discovery stages).

      Strengths:

      (1) Unprecedented prospective design with 557 novel Mac1-ligand complexes ensures rigorous, independent evaluation of co-folding methods.

      (2) Comprehensive comparison of 3 co-folding tools (AlphaFold3, Chai-1, Boltz-2) with DOCK3.7 across diverse targets and metrics enables nuanced performance assessment.

      (3) The study clearly demonstrates complementary roles of co-folding (superior pose/affinity prediction for known ligands) and docking (better hit prioritization), and addresses deep learning memorization concerns via ligand similarity analysis.

      We thank Reviewer 3 for pointing out the unprecedented and comprehensive nature of our study.

      Weaknesses:

      (1) Limited generalization to diverse protein families (e.g., no ion channels/transporters).

      We agree - we have not explored the entire proteome and these are important target classes that will surely be investigated by future studies. We focused on targets here where we had a large number of X-ray crystal structures (Mac1) and affinity/inhibition measurements from docking (the other three targets).

      (2) Ambiguity in the mechanism underlying co-folding's failure to predict rare conformational changes.

      Again, we agree. We are not the developers of these methods. We observe that these methods do not predict conformational changes with high fidelity and this weakness is an area that co-folding methods will surely prioritize in the future.

      (3) Virtual screen comparison is unbalanced (docking-prioritized hit lists bias results).

      We acknowledge this in the results: “An important caveat is that the hit-lists were composed of molecules prioritized by docking in the first place, giving it an advantage on these particular sets.” and discussion: “Finally, comparing co-folding to docking based on hit-lists themselves selected by docking is arguably unfair to co-folding. Counter-balancing this is the inclusion, in each of the three hit lists, of molecules that had mediocre and poor docking scores intentionally selected to test the correlation between docking score and hit-rate. Here too, the correlation between co-folding score and likelihood to bind, what we sometimes call a “dock-response-curve” was no better than docking’s, often worse (SFig.11).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are suggestions for revisions:

      (1) The writing is at times obtuse and hard to follow.

      This happens sometimes when multiple authors are writing together. We apologize and are happy to respond to specific areas that can be streamlined to be easier to follow.

      (2) In the Results section, "A set of 557 previously unreported Mac1 ligand complexes", the authors have compared the ligand poses across different metrics such as Tc - a standard, highly effective method in chemo-informatics and MCS (maximum common substructures); these are standard metrics for quantifying the structural similarity between pairs of small molecules. This part of the analysis checks whether this is memorization; it is critical to compare the two metrics, but it is not sufficient to draw a conclusion.

      Thank you for raising the structural similarity of the co-folded molecules to those present in the training set (resolved as Mac1 complexes and deposited in the PDB before the training cutoff dates). We have conducted an analysis in which we make a pairwise similarity comparison for all ligands present in the PDB (regardless of target), by both Tc and MCS, and overlay the clusters of ligands we tested (Mac1, AmpC, sigma2, D4). This shows where our benchmark datasets lie in the chemical space covered by the entire PDB. Each cluster (around 500 to 1300 compounds per target system) is overlaid on the cluster of all ligands deposited in the PDB (over 50,000 compounds), and each cluster was relatively diverse by both Tc and MCS.

      (3) In the "Co folding can accurately reproduce poses of ligands dissimilar to those trained." Subsection under Results, the authors' conclusions are hard to follow; they state that the co-folding models often mispredict or miss the alternative conformation, but they also predict poses that are distinct from the training set. What does that imply?

      Our interpretation is actually a somewhat unsettling one: co-folding gets the ligand pose right even when it gets the protein wrong, and even when the ligand is novel. This suggests the models may be anchoring on conserved pharmacophoric interactions (like the adenosine-mimicking purine scaffold) rather than truly modeling the physics of the full complex. We added to the results section:

      This result suggests that co-folding reliably recapitulates dominant ligand-binding interactions even in the absence of accurate protein conformational modeling, providing further support to the idea that they are learning specific interaction patterns rather than a deeper physics-based representation (Masters et al. 2025).

      (4) The Discussion section connects the results and conclusions, but it can be challenging to grasp the study's overall message.

      We think the final paragraph hits on three major points:

      - Co-folding accurately predicts ligand poses for known binders, but fails to capture conformational changes

      - Co-folding does not reliably distinguish true binders from false positives in virtual screening hit lists

      - Docking and co-folding are complementary rather than competing tools

      (5) The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment. The value of the paper would be further enhanced by explaining how it differs from seemingly similar results reported in other studies, including the one cited in this manuscript (see https://www.biorxiv.org/content/10.64898/2025.12.04.692352v1).

      The Mac1 results are completely unique. However, the docking datasets are exactly the same as those analyzed in the Menon et al. manuscript. We don’t think our results differ from the conclusions of the Menon et al. manuscript, as we wrote: “These observations are supported by a fascinating study on some of the same ligand sets as investigated here, using AlphaFold3, reaching similar conclusions (Menon et al. 2025).”

      Reviewer #3 (Recommendations for the authors):

      (1) Expand target diversity to include ion channels, transporters, etc., beyond enzymes and GPCRs.

      (2) Investigate the cause of co-folding's failure in predicting rare conformational changes (e.g., adjust sampling, MSA inputs, or add experimental constraints).

      (3) Mitigate docking bias in virtual screens (e.g., re-analyze unbiased compound libraries).

      We addressed these three points in the public review above.

      (4) Test Boltz-2's affinity predictions without linear calibration and compare with FEP.

      The data without linear calibration are included in the manuscript. Comparing such a large number of compounds with FEP is currently beyond our capabilities.

      (5) Conduct proof-of-concept to test co-folding-docking integration for better hit rates.

      We think this is well beyond the scope of this manuscript - but look forward to testing this idea in the future.

      We also got one community review that we respond to below:

      Summary

      This manuscript evaluates the performance of co-folding models when tasked with 1) the recapitulation of a large number of experimentally determined co-crystal structures of Mac1 with a series of Mac1 ligands and 2) the rescoring of hits to identify false positives originally derived from a set of large docking-based virtual screens. The evaluation leverages a dataset of crystal structures and affinity data from high-throughput crystallographic and biophysical screens, respectively. These data uniquely enable this report to focus on the ability of co-folding models to handle ligands, resulting in an analysis that is particularly timely given the wide adoption of co-folding models and the relative scarcity of such ligand-focused benchmarks among existing evaluations, which have primarily focused on protein structure prediction or binder design.

      Thank you for this thoughtful summary of our work.

      Feedback

      The experiments and analyses in the manuscript are well thought-out and do not have any significant issues. There are a few high-level points that may improve the clarity and completeness of the results. Importantly, none of the suggested additional experiments will affect the conclusions of the paper, but rather help provide additional context for the results:

      The first section presents an exciting opportunity to frame the Mac1 ligands against ligands in the PDB more broadly. It would be informative to assess whether chemotypes that are easier or harder to predict accurately and confidently are over- or under-represented in the PDB as a whole. Note that this is not a recommendation that new scaffold similarity metrics be incorporated into the analysis, but rather that analyses similar to those already performed in the manuscript are performed using all ligands in the PDB. For example, PCA-based analyses similar to those in Fig. 1c could be used to examine Mac1 ligands in the context of all PDB ligands enabling questions such as whether similarity to a nearest PDB neighbor, cluster size in a Tc/MCS PCA space, or other frequency-based measures show any relationship with prediction vs. crystal structure RMSD. Such analyses could provide additional insight into how effectively models leverage ligand information present in the PDB overall, as opposed to biases arising specifically from scaffolds represented in Mac1 structures in the PDB, which are already well covered in the manuscript. The conclusion that Tc/MCS do not correlate with the ligand RMSDs for the ligands already associated with the Mac1 is well supported, and presumably suggests that a correlation would not exist against the backdrop of the PDB, but it would be interesting to see the data using analyses similar to those already done in the manuscript nonetheless.

      We are adding new figures in SFig.1 that show how the different clusters of ligands tested in our co-folding analysis are distributed across the chemical space of the PDB. This is done by making a similarity comparison between every ligand in the PDB and those tested in our analysis by Tc and MCS%, then plotting in PCA space for each metric. We are excited to see that each dataset covers a wide scope in PCA space, but at the same time there are areas of PDB chemical space that remain unexplored by co-folding.

      Similarly, even though the four proteins used in this manuscript are not themselves the primary focus of the analysis, it would be valuable to perform a high-level assessment of the precedent for each protein in the PDB (beyond the count of liganded structures in Table S6), either in protein sequence space (e.g., MSAs) or structural space (e.g., FoldSeek). An analysis like this would provide important context about whether any of the proteins in the study have close homologs with liganded structures in the PDB, or are generally overrepresented in the PDB. The fact that the AUC for L-pLDDT for AmpC is higher than σ2 and D4, for example, is notable given the relative abundance of liganded AmpC structures in the PDB (this raises potentially interesting questions related to where DOCK3.7 and AF3 actually place the ligands, given the orthosteric β-lactam binding pocket in AmpC, although this is outside of the scope of this manuscript).

      A high-level assessment of the precedent for each protein in the PDB will definitely help to understand whether the proteins we used have close homologs with liganded structures in the PDB. Our Supplementary Table 6 covers the extent to which these liganded structures were available by the cutoff dates for AF3, Chai-1 and Boltz-2. AmpC had more homologs than sigma2 and D4, and this may explain the better AUC for AF3 L-pLDDT specifically for this target.

      A discussion of the affinity probability results (`affinity_probability_binary`) from Boltz-2 is likely warranted in the second section in addition to the pIC50s that are already reported (`affinity_pred_value`). The former seems like it would be more applicable for section 2 of the manuscript, but both warrant inclusion—they should both be calculated by default when the affinity pipeline in Boltz-2 is turned on, so it wouldn't involve any more inference.

      As the Boltz-2 affinity module outputs both a binary affinity probability and a predicted affinity value, we kept track of both metrics and tried re-ranking the hit lists using each. Where Boltz-2 performed better (sigma2, D4), the binary probability values were the more representative metric for differentiating true actives from non-binders. This was clearer in semi-logarithmic ROC plots. For AmpC, however, both Boltz-2 scoring metrics performed similarly. This inconsistency made it difficult to draw conclusions.

      Minor points

      A more detailed description of the experimental methods used to generate the ground-truth data in the introduction (even though these have been explained in prior works) would help orient the reader early on, and ground the benchmarking aspect of the story. In general, the abstract and introduction would benefit from a more cohesive through-line to tie the two complementary but orthogonal sections of the paper together.

      We will include a more thorough description alongside the PDB depositions. As for the two sections, we have tried to tie them together from the perspective of drug discovery workflows…

      The cutoffs in the "Co-folding can accurately reproduce..." section shift between 2.5 Å (from the ligand center of mass) and 2.0 Å. Is there a reason for this? Along similar lines, mentioning cutoffs for true positives/negatives when introducing the ROC analyses later on in the Mac1 section seems unnecessary since no cutoff should be necessary here.

      We used a 2.5 Å distance to the COM simply to capture “broadly the correct binding site” for fast filtering, and a 2.0 Å RMSD because that is the broadly accepted standard in the field for a “relatively correct binding pose”.
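
      The two-stage check described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the unweighted centroid stands in for the mass-weighted centre of mass, atoms are assumed to be listed in the same order in both poses, both poses are assumed to share one coordinate frame (the proteins already superposed), no symmetry correction is applied, and the function names are hypothetical.

```python
import math

def centroid(coords):
    """Unweighted centre of a list of (x, y, z) tuples (proxy for the COM)."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def centroid_distance(pred, ref):
    """Distance in Å between predicted and reference ligand centres."""
    return math.dist(centroid(pred), centroid(ref))

def rmsd(pred, ref):
    """In-place heavy-atom RMSD in Å, assuming matched atom order."""
    if len(pred) != len(ref):
        raise ValueError("poses must contain the same number of atoms")
    sq = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(sq / len(ref))

def assess_pose(pred, ref, site_cutoff=2.5, pose_cutoff=2.0):
    """Fast site filter first, then the stricter pose criterion."""
    in_site = centroid_distance(pred, ref) <= site_cutoff
    pose_ok = in_site and rmsd(pred, ref) <= pose_cutoff
    return {"in_site": in_site, "pose_ok": pose_ok}

# Toy three-atom ligand and a copy displaced by 3 Å along z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x, y, z + 3.0) for x, y, z in crystal]
same = assess_pose(crystal, crystal)
moved = assess_pose(shifted, crystal)
```

      Running the cheap centre-distance filter first means the RMSD is only evaluated for poses that already sit in roughly the right pocket.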

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2026-03407

      Corresponding author(s): Laura Cantini, Julio Saez-Rodriguez

      [The "revision plan" should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.

      The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the reprint and help them to obtain a balanced view of the paper.

      If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements [optional]

      This section is optional. Insert here any general statements you wish to make about the goal of the study or about the reviews.

      We thank both reviewers for their thorough and constructive evaluation of our manuscript.

      Reviewer 1 highlighted that the manuscript would benefit from 1) a stronger positioning of ReCoN within the existing literature on multicellular modelling and network exploration, 2) a justification of our methodological choices, including the use of Random Walk with Restart (RWR), 3) the choice of input datasets for GRN inference and an assessment of the robustness of ReCoN's predictions to noise in these networks, 4) a more systematic exploration of ReCoN's parameter space (restart probability, layer transition probabilities, filtering thresholds).

      Reviewer 2 raised concerns about 1) the generalisability of the α parameter value (by default, 0.8) across independent datasets, 2) the expected contribution of the indirect effect in prediction performances, 3) the robustness of GRN across datasets and systems, and 4) the need for more quantitative validation in the spatial/microenvironment showcase. They also pointed out an unsupported claim regarding gene knockout prediction in the abstract.

      Several clarifications on figures, methods, and writing were also requested by both reviewers.

      As the main addition to the manuscript, we propose a new showcase based on the recently published Human Cytokine Dictionary (Oesinghaus et al., 2025). This showcase will simultaneously address several reviewer concerns by allowing us to 1) test the robustness and performance of α = 0.8 in an independent dataset, and 2) evaluate the impact of different GRN inference methods (HuMMuS, SCENIC+, CellOracle, GRNBoost2) and of noise on ReCoN's predictions.

      We will conduct a systematic parameter exploration on the Heart Atlas showcase, covering restart probability and inter-layer transition probabilities. We will additionally strengthen the validation of the microenvironment showcase by providing additional comparison to matched single-cell fibroblast data.

      Regarding the manuscript, we will substantially expand the discussion to better contextualise ReCoN within existing multicellular modelling approaches, and the methods to justify our methodological choices (RWR/MultiXrank, dataset selection). We will remove the unsupported gene knockout claim from the abstract and reframe it as a future direction. In addition, we will clarify the distinction between ReCoN variants, rename them for clarity in Results section 1.2, and improve the figure legends. Finally, we will also work on the tool's documentation, including new tutorials on using spatial data and on running ReCoN with scRNA-seq-only GRN inference.

      We believe these revisions will substantially strengthen the manuscript and address the reviewers' concerns regarding the method's robustness, generalisation, and contextualisation.

      2. Description of the planned revisions

      Reviewers' comments are in blue

      Authors' answers are in black

      Proposed text modifications are in green

      Reviewer #1

      R1.1. This is a very well-written paper; the methods used are adequate, and the use cases are relevant and broad, exploiting state-of-the-art datasets and tools.

      The author's claims are mostly justified. The authors could make an effort to more explicitly cite other efforts in similar directions. The claim 'We envision ReCoN as an extension to prior multicellular modelling, offering an interesting compromise between prediction of cell type responses and understanding of their molecular coordination.' is very general and could be better substantiated. In fact, the authors do not really give examples of alternative approaches to study systems of interacting cells, other than mechanistic agent-based models, which are clearly very different.

      Response:

      We thank the reviewer for pointing out the lack of contextualisation for ReCoN in this closing discussion.

      We note that ReCoN builds notably on multicellular factor decomposition methods. We also want to emphasise the value of complementing cell communication methods that describe the big picture of multicellular interactions.

      We propose to state these two points explicitly with the following rephrasing:

      Network-based representations of multicellular systems have been an active field for many years, from early conceptual cytokine networks (Frankenstein, Alon, and Cohen 2006) to curated ligand-receptor cascades of hematopoietic tissue (Kirouac et al. 2010, Qiao et al. 2014). In parallel, and from bulk RNA-seq, the consideration of tissue specificities in GRN inference has been another way to consider the importance of the context in molecular mechanisms reconstruction (Sonawane et al. 2017). Single-cell analysis allowed decomposing tissue composition and quantifying gene expression, opening the possibility of scaling the inference of these networks and the inference of multicellular mechanisms in general, to large sets of molecules. Several methods have been developed to recover multicellularity. A first direction extends ligand-receptor interaction inference into the receiver cell response through curated signalling cascades, yielding ligand-to-target cascades (Browaeys, Saelens, and Saeys 2020, Jin et al. 2021, Zhang et al. 2021, Yan et al. 2025). A second direction leverages spatial context through explainable multi-view models that decompose marker variation into both intra- and intercellular contributions (Arnol et al. 2019, Tanevski et al. 2022), without considering the mediating cascades. Finally, the more recent family of multicellular factor decomposition methods focuses on the coordinated aspect of cellular programs rather than on the mechanisms. ReCoN's methodology proposes a network-based approach based on single-cell data and the philosophy of this last group of methods. Indeed, ReCoN aims to retrieve links between molecular drivers and such coordinated multicellular programs by bridging and exploring CCC inference and GRN modelling (Badia-i-Mompel et al. 2023) within a large and coherent heterogeneous multilayer network.

      Arnol D, Schapiro D, Bodenmiller B et al. Modeling Cell-Cell Interactions from Spatial Molecular Data with Spatial Variance Component Analysis. Cell Rep 2019;29(1):202-211.e6. https://doi.org/10.1016/j.celrep.2019.08.077.

      Badia-i-Mompel P, Casals-Franch R, Wessels L et al. Comparison and evaluation of methods to infer gene regulatory networks from multimodal single-cell data. Preprint, bioRxiv, 21 Dec. 2024, 2024.12.20.629764. https://doi.org/10.1101/2024.12.20.629764.

      Badia-i-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24(11):739-54. https://doi.org/10.1038/s41576-023-00618-5.

      Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 2020;17(2):159-62. https://doi.org/10.1038/s41592-019-0667-5.

      Frankenstein Z, Alon U, Cohen IR. The immune-body cytokine network defines a social architecture of cell interactions. Biol Direct 2006;1(1):32. https://doi.org/10.1186/1745-6150-1-32.

      Jin S, Guerrero-Juarez CF, Zhang L et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12(1):1088. https://doi.org/10.1038/s41467-021-21246-9.

      Kirouac DC, Ito C, Csaszar E et al. Dynamic interaction networks in a hierarchically organized tissue. Mol Syst Biol 2010;6(1):MSB201071. https://doi.org/10.1038/msb.2010.71.

      Oesinghaus L, Becker S, Vornholz L et al. A single-cell cytokine dictionary of human peripheral blood. Preprint, bioRxiv, 15 Dec. 2025, 2025.12.12.693897. https://doi.org/10.64898/2025.12.12.693897.

      Qiao W, Wang W, Laurenti E et al. Intercellular network structure and regulatory motifs in the human hematopoietic system. Mol Syst Biol 2014;10(7):MSB145141. https://doi.org/10.15252/msb.20145141.

      Radig J, Droit R, Doncevic D et al. Tracking biological hallucinations in single-cell perturbation predictions using scArchon, a comprehensive benchmarking platform. Preprint, bioRxiv, 27 June 2025, 2025.06.23.661046. https://doi.org/10.1101/2025.06.23.661046.

      Sonawane AR, Platig J, Fagny M et al. Understanding Tissue-Specific Gene Regulation. Cell Rep 2017;21(4):1077-88. https://doi.org/10.1016/j.celrep.2017.10.001.

      Tanevski J, Flores ROR, Gabor A et al. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol 2022;23(1):97. https://doi.org/10.1186/s13059-022-02663-5.

      Yan L, Cheng J, Nie Q et al. Dissecting multilayer cell-cell communications with signaling feedback loops from spatial transcriptomics data. Genome Res published online 12 May 2025. https://doi.org/10.1101/gr.279857.124.

      Zhang Y, Liu T, Hu X et al. CellCall: integrating paired ligand-receptor and transcription factor activities for cell-cell communication. Nucleic Acids Res 2021;49(15):8520-34. https://doi.org/10.1093/nar/gkab638.

      R1.2. Moreover, the exploration of the multilayer networks with RWR is a very reasonable choice but could there be other approaches? I think the authors could discuss this issue to briefly support their choice of this method.

      Response:

      This is a very relevant comment, as this choice was not discussed in the paper. We propose to extend the methods section on ReCoN's network exploration with a justification of this choice.

      There is currently a limited set of network exploration methods implemented for multilayer networks. These notably include pymnet (Nurmi et al., 2024), natively adapted to heterogeneous multilayer networks, and multinet (Bagavathi et al., 2019) and muxViz (De Domenico et al., 2015), initially developed for multiplex networks (e.g. a social network where the same set of nodes is present in each layer) but adaptable to more complex multilayer networks. However, to our knowledge, only MultiXrank provides a robust measure of proximity between each pair of nodes.

      Indeed, pymnet does not implement pairwise distances, and neither does muxViz, which focuses on community and motif detection. multinet does provide a pairwise distance based on shortest paths, but implements it only for nodes of the same multiplex (in our network, this would cover only pairs of genes or pairs of receptors). https://www.rdocumentation.org/packages/multinet/versions/4.3.2/topics/multinet.distance

      We provide the following additional justification for choosing RWR and MultiXrank over reimplementing or extending another method.

      • The total complexity of the RWR is O(δm) when the number of nodes is negligible compared to the number of edges, with m the number of edges and δ the number of iterations in the walk (Baptista et al., 2022, Supp. Notes 2.A; Jin et al., 2019). This linear scaling with the number of edges is particularly attractive for large networks such as ReCoN's, which can contain several million edges. The number of iterations δ, and hence the computation time, decreases as the restart probability increases, which is an important reason to keep this probability high.

      • MultiXrank is particularly attractive for its flexibility, as it makes it easy to assign different weights to the different layers and to specify the direction of the exploration.

      • It also produces deterministic results by prolonging exploration until convergence.

      • Additionally, in the context of ReCoN, the indirect effect of each cell is computed independently. In previous work we extended the implementation of MultiXrank to run RWR in parallel (Trimbour et al., 2024), so it is already suited to optimising ReCoN's explorations.

      For all these reasons, the MultiXrank implementation seemed to be the best choice for robust and efficient exploration of ReCoN's HMLN.
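
      To illustrate the cost argument, here is a minimal single-layer random walk with restart written as a power iteration over an edge list. It is a toy sketch and not MultiXrank's implementation: the graph, the function name `rwr`, and the convergence tolerance are assumptions, and every node is assumed to have at least one edge so that probability mass is conserved. Each iteration makes one pass over the edges, so the total work is the number of iterations times the number of edges.

```python
def rwr(edges, n_nodes, seed, restart, tol=1e-10, max_iter=10_000):
    """Random walk with restart on an undirected graph given as an edge list.

    Returns the stationary scores and the number of iterations used.
    """
    directed = edges + [(v, u) for u, v in edges]  # walk both directions
    out_deg = [0] * n_nodes
    for u, _ in directed:
        out_deg[u] += 1
    p = [0.0] * n_nodes
    p[seed] = 1.0
    for iteration in range(1, max_iter + 1):
        nxt = [0.0] * n_nodes
        nxt[seed] = restart  # restart mass returns to the seed
        for u, v in directed:  # one O(m) pass per iteration
            nxt[v] += (1.0 - restart) * p[u] / out_deg[u]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            return p, iteration
    return p, max_iter

# Toy graph: a 4-cycle with one chord.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
scores_high, iters_high = rwr(toy_edges, 4, seed=0, restart=0.8)
scores_low, iters_low = rwr(toy_edges, 4, seed=0, restart=0.2)
```

      On this toy graph the walk with restart probability 0.8 converges in fewer iterations than the walk with 0.2, and in both cases the scores sum to one, so they behave as a ranking over nodes rather than as magnitudes.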

      Bagavathi, A., Krishnan, S. (2019). Multi-Net: A Scalable Multiplex Network Embedding Framework. In: Aiello, L., Cherifi, C., Cherifi, H., Lambiotte, R., Lió, P., Rocha, L. (eds) Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 813. Springer, Cham. https://doi.org/10.1007/978-3-030-05414-4_10

      Manlio De Domenico, Mason A. Porter, Alex Arenas, MuxViz: a tool for multilayer analysis and visualization of networks, Journal of Complex Networks, Volume 3, Issue 2, June 2015, Pages 159-176, https://doi.org/10.1093/comnet/cnu038

      Nurmi et al., (2024). pymnet: A Python Library for Multilayer Networks. Journal of Open Source Software, 9(99), 6930, https://doi.org/10.21105/joss.06930

      Jin, Woojeong, Jinhong Jung, and U. Kang. "Supervised and extended restart in random walks for ranking and link prediction in networks." PloS one 14.3 (2019): e0213857

      R1.3. Generally the discussion should provide the reader the context in the existing literature in which the work can be set, detailing its impact. I think this could be improved.

      Response:

      We hope that the revised contextualisation proposed in response to comment R1.1 provides a first clarification of how ReCoN fits within the literature.

      We also propose to extend the description of ReCoN's impact with the following sentences in the discussion: "Unlike purely data-driven approaches, ReCoN contextualizes prior knowledge balancing both robustness through literature data, and specificity through new measurements. This mechanistic approach opens new possibilities for understanding how cellular coordination shapes tissue-level responses and for designing targeted molecular interventions."

      R1.4. Regarding the choice of datasets, it is clear that the method is quite demanding, requiring single cell and different omics to build the model, in addition to the expression dataset that is used as a use case. This inevitably leads to using a mix of datasets.

      For example in the mouse experiments the gene regulatory network was inferred from both a lymph node scRNA-seq dataset and a splenic scATAC-seq dataset, presumably due to the lack of multiome data in this setting. However the cell-cell communication network was inferred from the control case of the Immune Dictionary. Why can't the authors use the control data also for inferring GRNs?

      Is atac-seq really necessary in the inference of the GRN? What is the impact of the fact that lymph node and spleen samples might be different?

      Response:

      This is a very interesting comment, and we propose to add both 1) an explanation of our choice of datasets for generating the GRN, as a Supplementary text, and 2) a new experiment on the effect of GRNs built from multi-omics versus scRNA-seq alone.

      • Dataset choice

      We decided to infer a GRN using multiomics data, as these methods seem to perform better and are becoming the state of the art (Badia-i-Mompel et al. 2023, Trimbour, Deutschmann, and Cantini 2024, Yuan and Duren 2025).

      As scATAC-seq data were not produced for the mouse Immune Dictionary, we looked for an external dataset and used HuMMuS, the method we previously developed, as it is also based on RWR and performs well on unpaired data.

      scATAC-seq

      Our first criterion was to match the mouse model used in the Immune Dictionary dataset, which substantially reduced the number of multicellular immune cell datasets available. We extended our search to a splenic dataset, as the spleen is itself classified as a highly specialised lymphatic structure and notably contains the same cell types as classical lymph nodes.

      scRNA-seq

      While we could technically combine the single-cell RNA-seq data from the control mice of the Immune Dictionary with the spleen scATAC-seq data, the Immune Dictionary provides only 100 or fewer cells per cell type per stimulation, which would result in a low number of cells. As GRN quality seems to depend strongly on the number of cells used, we favoured a larger dataset.

      Our choice to use single-cell multiomics methods was driven by their novelty relative to scRNA-seq-based ones, the performance improvement that they seemed to offer in several benchmarks, and the aim of developing a pipeline that integrates the most complete data available for contextualisation (Badia-i-Mompel et al. 2024).

      • GRN impact on the Human Cytokine Dictionary

      While it does not relate directly to this showcase, we will also add a new dataset analysis, detailed in our response to comment R1.12. In the Human Cytokine Dictionary showcase, we propose exploring the effect of choosing different GRNs, built either from external multi-omics data or from the control scRNA-seq data of the dataset itself. We hope this can partially help users decide, in general, whether to use external datasets of higher quality or sample-specific datasets.

      Finally, we propose to add to the tool's documentation a section showing how to use ReCoN with only scRNA-seq for the GRN inference, and to report the performance of different GRNs for the Human Cytokine Dictionary dataset directly in the paper.

      R1.5. The code is very clear, we were able to install and run it and it is quite well-documented. However, a few more details should be given in the text regarding how the evaluation of the performance is carried out.

      For example: If I understand correctly, when predicting the impact of cytokine perturbations the ReCoN predictions of genes impacted are compared to differentially expressed genes identified through traditional DEG analysis. What is compared is the ranking of these genes from ReCoN with the ranking provided by DEseq2. There is no description of how this comparison of ranking gives rise to AUROC values. Also, is it just the ranking that is predicted or can they also estimate how well they can predict the effect size?

      Response:

      We thank the reviewers for pointing out the unclear technical details. DEG results were binarised to obtain the list of differentially expressed genes, using the thresholds indicated in section 4.4.4. We considered a gene as perturbed in a given cytokine treatment if the comparison of control and treated cells had a t-test p-value below 0.1 and a log-fold change above 1.

      Regarding the second, more general point of the reviewers: ReCoN scores should be considered as providing a ranking of the possible regulations, but cannot be considered proportional to the effect size. As they represent a likelihood more than a score, binarisation should be the most appropriate transformation for the validation.

      Moreover, as the scores can be seen as the probability of ending the exploration on each node, they always sum to one. This also prevents interpreting the scores as the amplitude of change. As an illustration: if a receptor regulates three genes identically, they would (hopefully) all have a score of (1 - R)/3, R being the restart probability in ReCoN, whether their expression doubles or is multiplied by 10.
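
      The point can be illustrated with a toy random walk with restart. This is only a sketch under stated assumptions, not ReCoN's implementation: the four-node network, the self-loops on the three sink genes, the value R = 0.6 and the function names are all hypothetical. With these assumptions, the stationary scores are R for the receptor and (1 - R)/3 for each gene, they sum to one, and they do not change when the three edge weights are all multiplied by 10.

```python
import numpy as np


def rwr_scores(weights, seed, restart):
    """Stationary distribution of p = restart * e + (1 - restart) * W @ p."""
    # Column-normalise so that each column holds outgoing transition probabilities.
    W = weights / weights.sum(axis=0, keepdims=True)
    n = W.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    return restart * np.linalg.solve(np.eye(n) - (1 - restart) * W, e)


def toy_network(edge_weight):
    """Node 0 is a receptor, nodes 1-3 are genes; weights[i, j] is the edge j -> i."""
    w = np.zeros((4, 4))
    w[1:, 0] = edge_weight   # receptor regulates the three genes identically
    w[1:, 1:] = np.eye(3)    # assumption: sink genes keep the walker (self-loops)
    return w


R = 0.6  # hypothetical restart probability
scores_weak = rwr_scores(toy_network(1.0), seed=0, restart=R)
scores_strong = rwr_scores(toy_network(10.0), seed=0, restart=R)
```

      Because the weights are normalised per node before the walk, only their relative values matter, which is why the scores cannot be read as an amplitude of change.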

      While it can legitimately be seen as a downside, we believe it is similar in practice to most GRN inference methods, where trying to predict the true amplitude of gene perturbations usually results in very low performance (Badia-i-Mompel et al. 2024).

      We propose changes related to this comment.

      • We would modify section 4.4.4 of the methods with the following paragraph to make explicit that it consists of a binary selection: "For each cytokine-cell type pair, differentially expressed genes were binarised: genes passing the significance thresholds (FDR P-val < 0.1 and log-fold change > 1) were labelled as positives, and all remaining genes as negatives. ReCoN scores were then used to rank all genes, and AUROC values were computed from this ranking against the binary labels."

      • We will also include a section "ReCoN scores interpretation" on the documentation website, as precise guidance on score interpretation will be particularly useful for users.
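
      The binarise-then-rank evaluation described above can be sketched as follows. This is a minimal illustration, not the code used in the paper: the function names and the toy values are invented, the thresholds follow the ones quoted in this response (p-value below 0.1, log-fold change above 1), and the AUROC is computed with the standard rank-sum (Mann-Whitney U) identity.

```python
import numpy as np


def binarise_deg(pvals, logfc, p_thr=0.1, lfc_thr=1.0):
    """Label a gene positive when p-value < p_thr and log-fold change > lfc_thr."""
    return (np.asarray(pvals) < p_thr) & (np.asarray(logfc) > lfc_thr)


def auroc_from_ranking(scores, labels):
    """AUROC from the rank-sum (Mann-Whitney U) statistic; ties get average ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative gene")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size)
    start = 0
    while start < scores.size:
        end = start
        while end + 1 < scores.size and sorted_scores[end + 1] == sorted_scores[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2 + 1  # average 1-based rank
        start = end + 1
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy values, invented for illustration only.
pvals = [0.01, 0.50, 0.05, 0.20, 0.03]
logfc = [2.0, 0.3, 1.5, 1.2, 0.4]
recon_scores = [0.40, 0.10, 0.15, 0.30, 0.05]

labels = binarise_deg(pvals, logfc)
auroc = auroc_from_ranking(recon_scores, labels)
```

      Only the order of the scores enters the computation, which matches the statement that the scores provide a ranking and not an effect size.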

      R1.6. When describing the use cases, I think a bit more detail would help.

      For example 'To identify the cell-type-specific genes associated with HF, we used the MOFAcell scores of the multicellular factor 1 (MCP1) reported in ReHeat236' I supposed the explanation is on the dataset but for the sake of clarity it would be good to expand this sentence to give at least an idea of the approach.

      Response:

      We completely agree that more explanation should be provided, so that the reader does not have to switch between articles to understand the concepts behind this showcase. As suggested by the reviewer, we propose a general description of the approach with the short paragraph below, and to remove the term "loading":

      "In the ReHeat2 study, the first multicellular factor (MCP1) was associated with heart failure. We used the gene loadings of MCP1 as a proxy for the cell-type-specific transcriptomic changes associated with heart failure, ranking genes by their absolute loading values."

      We also propose to complete the method section: "MOFAcell is a multicellular factor analysis method that decomposes multi-sample single-cell data into latent factors representing coordinated gene expression patterns across cell types. Each factor is characterised by cell-type-specific gene scores, reflecting their individual contribution to the coordinated program. In this showcase, we use the first multicellular program (MCP1), as it was associated with heart failure"

      R1.7. Regarding the calculation of the R matrix from the NichNet matrices L and G, I gather that the R matrix is calculated once and is thus fully data-independent and available just like the L and G matrices from NichNet. This was not very clear in the tutorials.

      Response:

      We are very thankful for the reviewers' involvement in testing the tool itself and its documentation. First, we propose a new website page explaining the pre-computed resources available for receptor-gene links, and we added a descriptive paragraph to the tutorials themselves.

      Second, we noticed a typo in the equation, which should actually read L = R * G with the current definition. We corrected it in the next version, and specified that R is fully data-independent and solely inferred from prior knowledge.

      R1.8. Also, this might just be a typo in the tutorial: 'The default α = 0.8 gives more weight to direct effects, which has been empirically validated. You can adjust this based on your biological question." I believe the manuscript says alpha>0.5 refers to indirect effects dominating.

      Response:

      We corrected the wording in the tutorials. Indeed, a high alpha represents a stronger indirect effect. A similar typo was present in the first equation of the paper; we are correcting it too.

      R1.9. Same for the pre-processing of the spatial data for the third use case, a little more details on how this was done would help the users and readers.

      Response:

      We propose adding a specific section about the spatial pre-processing and analysis in the methods.

      We are also adding a tutorial on spatial data. Since spatial data processing is computationally intensive without GPUs, we will also provide the data already processed, in order to allow anyone to test this tutorial too.

      R1.10. I don't see issues with the statistical power of the analysis.

      Rather, I think the authors should provide some examination of the parameter space for their model. Whereas an analysis of the impact of the alpha parameter is provided, I believe there are several more parameters that have a crucial impact, and choices for their values should be discussed.

      For example: 'In the GRN reconstruction only the links with a score above 1.5e-7 were retained in ReCoN's gene regulatory layer.' How was this chosen?

      We have identified the following parameters that are somehow justified but could be explored to have a better feel for how they impact the results

      Restart probability: How often the walker goes back to the starting seed/molecule

      Layer transition probability: How often the walker stays in the same layer - different cell? - different layers? Gamma

      Node transition within a layer: How often one jumps to a different layer

      Response:

      This is a very valid point raised by the reviewer about parameter exploration.

      We focused on exploring the alpha (direct/indirect effect) parameter, as its value was the main uncertainty when designing the model.

      We would like to address this comment by adding new explorations of the restart probability and of the transition probabilities between layers. The probability of transitioning between specific nodes inside a layer depends directly on 1) the restart probability, 2) the layer transition probabilities, and 3) the weights of the edges, which are determined before, and independently of, ReCoN's exploration.

      The Heart Atlas showcase allows each set of parameters to be evaluated in around 10 min, instead of 10 h for the Immune Dictionary. We thus propose to evaluate the restart probability and the layer transition probabilities on the data of this showcase.

      • We would explore the restart probability of 0.1 * N, with N between 1 and 9.

      • For transition probabilities, we propose varying the GRN, receptor, and cell communication importance with the following configurations: probability of staying in the CCC layer (not jumping to the receptor layer) among (0.1, 0.3, 0.5, 0.7, 0.9); probability of staying in the receptor layer (not jumping to the GRN) among (0.25, 0.5, 0.75); probability of staying in the GRN layer (not jumping to the CCC) among (0.25, 0.5, 0.75). This would result in 9 intracellular variations combined with 5 intercellular variations.
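
      The parameter grid proposed in the two bullets above can be enumerated as in the sketch below. The dictionary keys are hypothetical names chosen for illustration and are not ReCoN arguments; whether the restart grid is crossed with the layer grid is not stated here, so the two are kept separate.

```python
from itertools import product

# Restart probabilities 0.1 * N for N = 1..9.
restart_probs = [round(0.1 * n, 1) for n in range(1, 10)]

# Probabilities of staying in each layer, as listed in the response.
stay_ccc = (0.1, 0.3, 0.5, 0.7, 0.9)   # intercellular variations
stay_receptor = (0.25, 0.5, 0.75)      # intracellular variations
stay_grn = (0.25, 0.5, 0.75)           # intracellular variations

# Each configuration stores the stay probability and its complement (the jump).
layer_configs = [
    {
        "stay_ccc": c, "ccc_to_receptor": round(1 - c, 2),
        "stay_receptor": r, "receptor_to_grn": round(1 - r, 2),
        "stay_grn": g, "grn_to_ccc": round(1 - g, 2),
    }
    for c, r, g in product(stay_ccc, stay_receptor, stay_grn)
]

n_intracellular = len(stay_receptor) * len(stay_grn)
```

      This gives 9 restart values and 5 x 9 = 45 layer configurations, consistent with the "9 intracellular variations combined with 5 intercellular variations" stated above.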

      We envision an evaluation by measuring the correlation between the results of the different configurations, and the time before convergence of the results, as it could potentially increase drastically when decreasing the restart probability. If correlations below 0.9 are observed between some results, we will compare their absolute performances.
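
      A possible way to flag configurations whose results diverge is sketched below. It assumes Pearson correlation (the type of correlation is not specified in this response) and uses invented configuration names and scores; only the 0.9 threshold comes from the text.

```python
import numpy as np


def low_agreement_pairs(score_matrix, names, threshold=0.9):
    """Return (name_a, name_b, correlation) for pairs correlated below `threshold`.

    score_matrix has one row per parameter configuration and one column per gene.
    """
    corr = np.corrcoef(np.asarray(score_matrix, dtype=float))
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i, j] < threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged


# Toy scores for three hypothetical configurations over four genes.
names = ["config_a", "config_b", "config_c"]
scores = [
    [0.40, 0.30, 0.20, 0.10],
    [0.38, 0.32, 0.19, 0.11],
    [0.10, 0.20, 0.30, 0.40],
]
flagged = low_agreement_pairs(scores, names)
```

      In this toy case only the pairs involving config_c are flagged, and those are the pairs whose absolute performances would then be compared.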

      We would include the figures related to these explorations in the supplementary data. We would highlight the main findings in the method section dedicated to the random walk with restart. Finally, we would briefly describe the parameter exploration design in the first section of the results, for curious readers who would like to verify parameter choice before reading the showcases.

      R1.11. Weighting parameters: How much weight for direct or indirect effect to account for the combined effect - alpha - this is the only one that is explicitly explored.

      Response:

      We are very thankful for this comment, and we decided to modify our tutorial guidelines to make this choice more intuitive and general.

      Indeed, 1.5e-7 would hardly make sense for most methods, which would not produce such low scores. We now propose to select the first 2 million connections of GRNs, in order to keep a complete or a large portion of the network if other methods than HuMMuS are applied.

      In our case, 1.5e-7 was empirically determined from the distribution of HuMMuS scores to keep the top 2 million connections. HuMMuS networks are generally almost fully connected, which is unusual compared with classical GRN inference methods, and keeping them entirely would make the exploration time much longer.
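
      Selecting a fixed number of top-scoring connections, and reading off the score threshold that this implies, can be sketched as follows. The function name is hypothetical and the scores are toy values; in the setting described above, k would be 2 million.

```python
import numpy as np


def top_k_threshold(scores, k):
    """Return (threshold, mask) keeping the k highest-scoring edges.

    Edges tied with the k-th score are all kept, so the mask may select
    slightly more than k edges.
    """
    scores = np.asarray(scores, dtype=float)
    if k <= 0:
        raise ValueError("k must be positive")
    if k >= scores.size:
        return float(scores.min()), np.ones(scores.size, dtype=bool)
    threshold = float(np.partition(scores, -k)[-k])  # k-th largest score
    return threshold, scores >= threshold


edge_scores = [0.5, 0.1, 0.3, 0.9, 0.2]
threshold, keep = top_k_threshold(edge_scores, k=3)
```

      Expressing the cut-off as a number of edges instead of a raw score makes it transferable to GRN methods whose scores live on a different scale.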

      R1.12. Finally, this might be considered OPTIONAL but would greatly improve the work in our opinion:

      The method crucially depends on the networks that are used in the different layers and to connect layers and cell types. As we know, biological data is noisy and incomplete (FP and FN) at each level and in each datatype. It would be really useful to estimate what is the robustness of the results to this noise. Particularly, from personal experience, we think the GRNs reconstructed from data are often almost fully connected and it is exceedingly difficult to validate them in specific contexts. This means that some 'errors' are likely to be present.

      Since several methods exist for inferring GRNs one could simply compare the results using different methods for this part of the network.

      A related point involves the characteristics of the RWR algorithm, that will be quite impacted by the presence of hubs in these networks (either in single layers or across several) that is likely to impact the exploration. If proteins that are hub are effectively important, that is not a problem, but in some layers, for example, the receptor-receptor layer that presumably will contain PPIs, there might be biases in hubs being just better studied proteins, and these hubs might have an 'unjustified' weight in the walks.

      One potential approach to assess the robustness of the method to these issues could be an empirical one that just randomly perturbs the networks in ReCoN to see to what extent similar predictions are achieved.

      Response:

      We are thankful for this relevant comment on GRN and prediction stability, and would like to take it as an opportunity to support the hypothesis that different GRN methods can be used in ReCoN.

      When developing our previous HMLN-based tool, HuMMuS (Trimbour et al. 2024 - Supp Figure 6), we observed that its multilayer structure provided more robust results than individual layers. We would like to reproduce such an analysis, verifying that ReCoN results have less variability than the GRN layers individually.

      We propose to integrate a new showcase on the Human Cytokine Dictionary (Oesinghaus et al. 2025), trying to predict cytokine downstream effects similarly to the Mouse Immune Dictionary showcase.

      This showcase would be useful to confirm the contribution of the indirect effect and to test the impact of different GRNs on the results.

      We would generate different GRNs with several other GRN inference methods: SCENIC+, CellOracle, and GRNBoost2, the last using only the scRNA-seq of the control samples in the Human Cytokine Dictionary.

      GRN methods generally produce outputs with very low overlap (Badia-i-Mompel et al. 2024).

      If we observe high correlations between the ReCoN predictions associated with the different GRNs, this would already provide a validation of ReCoN's robustness to GRN noise.

      If lower correlations between ReCoN's predictions are obtained, we will add a specific permutation experiment over the HuMMuS GRN, creating different levels of artificial noise and assessing more precisely the robustness of ReCoN to GRN stochasticity.
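
      One simple way to create graded artificial noise in a GRN is to rewire the target of a chosen fraction of edges. The sketch below is only one possible design, assumed here for illustration (the permutation scheme has not been fixed in this response); the function name, the edge tuple layout and the toy gene names are hypothetical.

```python
import numpy as np


def rewire_targets(edges, noise_level, genes, seed=0):
    """Replace the target of a fraction `noise_level` of edges by a random gene.

    edges is a list of (tf, target, weight) tuples; TFs and weights are preserved.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    rng = np.random.default_rng(seed)
    noisy = list(edges)
    n_rewired = int(round(noise_level * len(noisy)))
    for k in rng.choice(len(noisy), size=n_rewired, replace=False):
        tf, _, weight = noisy[k]
        noisy[k] = (tf, genes[int(rng.integers(len(genes)))], weight)
    return noisy


genes = ["G1", "G2", "G3", "G4", "G5"]
edges = [("TF1", "G1", 0.9), ("TF1", "G2", 0.4), ("TF2", "G3", 0.7), ("TF2", "G4", 0.2)]
noisy_half = rewire_targets(edges, noise_level=0.5, genes=genes)
```

      Running the exploration on networks produced at several noise levels, and correlating the resulting scores with the unperturbed ones, would give a robustness curve.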

      Regarding PPI hub justification, our applications did not use receptor PPIs and are therefore not affected by bias at this level in the showcases. This bias could specifically be present in the receptor-gene links, as we derive them from the ligand-gene connections of NicheNet, which was itself partially based on prior knowledge. It is thus possible that some receptors are reached more often due to this bias rather than a stronger effect. It seems, however, hard to control in this context, as ReCoN currently relies on this prior knowledge. Currently, we hope that the combination of a personalised, literature-agnostic GRN with literature-based receptor-gene links can provide an interesting trade-off. In future developments, we could imagine a receptor-gene network based solely on perturbations, but it would also require controlling the bias of ligand-receptor binding couples, which limits even the use of ligand-based experiments.

      We propose adding a short point in the discussion about hub effects from RWR-based methods.

      R1.13. Please add page numbers.

      Response:

      We will add the page numbers.

      R1.14. Figures are nice and clear.

      Some specific minor points are listed here below.

      Define hMLN on first appearance fig1 caption (no page numbers..

      2nd appearance heterogeneous multilayer structure (HMLN) ...

      Response:

      We updated the legend of the figure to include the definition of the acronym, as it appears before the first occurrence in the text. (Or define at both positions?)

      R1.15. Bi_j not so clear to what it refers when first mentioned

      Response:

      Bi_j represents a weight that can be attributed to favour some cell-to-cell transitions. It is usually not necessary to use them.

      It is notably of interest for modelling 1) known spatial patterns in situ and 2) hypotheses or designs where cell types favour some connections.

      E.g., when modelling the skin, a user might want to increase connections between epidermal and dermal cells, and between dermal and hypodermal cells.

      We propose a new explanation of Bi_j to both explain its meaning in the modelling and illustrate situations for using it: "The coefficient B_{i,j} modulates the influence of cell type i on cell type j in the indirect effect computation. By default, all B_{i,j} are set to one, weighting each cell type's contribution equally per cell. However, it can be adjusted to encode additional biological knowledge, such as spatial proximity between cell types or known cooperation patterns. For instance, when modelling the skin, a user might increase B_{i,j} between epidermal and dermal cells, and between dermal and hypodermal cells, to reflect their spatial organisation."
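
      The skin example can be written down as a small coefficient matrix. This is a sketch only: the helper name, the factor of 2 and the choice to update both directions symmetrically are assumptions made for illustration, not ReCoN defaults.

```python
import numpy as np

cell_types = ["epidermal", "dermal", "hypodermal"]
index = {name: k for k, name in enumerate(cell_types)}

# Default: every B[i, j] equals one.
B = np.ones((len(cell_types), len(cell_types)))


def strengthen(B, a, b, factor):
    """Multiply B for the pair (a, b) by `factor`, in both directions (assumption)."""
    i, j = index[a], index[b]
    B[i, j] *= factor
    B[j, i] *= factor
    return B


strengthen(B, "epidermal", "dermal", 2.0)
strengthen(B, "dermal", "hypodermal", 2.0)
```

      The epidermal-hypodermal entries stay at one, which encodes the absence of direct spatial contact between these two layers.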

      R1.16. personalized interaction specificity. - maybe better word than personalised (contextualised?)

      Response:

      We agree that "contextualised" better conveys the meaning behind this model. "Personalised" might notably lead readers to expect patient-specific data, which is not the case here.

      We propose to rename the models as follows: Receptor-matrix, ReCoN-no-CCC, ReCoN-no-context, ReCoN-complete.

      R1.17. ReCoN-genetic and ReCoN, ( generic?)

      Response:

      We will correct this typo.

      R1.18. responses. It is expected to observe common behaviors in-between cell-type, that the GRN

      and the generic CCC network already contribute captures.

      • not very clear

      Response:

      We aimed here to provide an explanation for the already good performance of "ReCoN-no-context" (or its name updated according to comment R1.16), which could be surprising as no cell-type-specific information is used. The explanation proposed is the good prediction of several properties shared by all immune cell types, such as similar metabolic pathways, despite their specific roles. If we adopt a quantitative view of their transcriptome, as in this showcase, it can be expected that the cell type responses are relatively well predicted through the common properties only.

      As this is a very relevant comment, and as several of the pre-submission comments we received also related to this result, we would like to keep an explanatory sentence.

      R1.19. Figure 2b the icon of cells with double arrows might suggest phenotype shift when instead this is just communication

      Response:

      (left side) We are very thankful to the reviewers for their attention to the details of the paper and fully agree with this analysis. We propose to represent ligand emission instead of arrows, reusing the convention of Figure 1.

      R1.20. eTACs explain acronym and what they are

      Response:

      We updated the first occurrence of eTACs to "extrathymic Aire-expressing cells (eTACs)".

      R1.21. Due to very few genes being differentially

      expressed, only cDC1 was conserved and evaluated for IL22,

      Not so clear

      Response:

      As we are commenting on IL22 stimulation results, we reorganised the sentence to make it less convoluted: "For IL22 stimulation, only cDC1 presented enough differentially expressed genes."

      R1.22. In this showcase (not very clear, use case?)

      Response:

      We perceive "use case" as describing a type of use for the method, while a showcase is a specific example of a use case. We thus find "showcase" more appropriate here. We will, however, go over all uses of the word to make sure it is only used for the precise examples we provided, and not to describe "use cases".

      R1.23. different fibroblast specializations - maybe phenotypes?

      Response:

      • It is a very good suggestion, as specialisation would involve functional aspects (that we cannot really be sure of) and a chronological evolution.

      • Phenotype generally includes numerous properties, such as morphology, that we cannot validate here. We think the use of phenotype might be stronger than specialisation here. To simplify, phenotype can work; to be more precise: transcriptomic specialisation? I am honestly not sure of the best change here.

      R1.24. Figure 4b

      b) Schematic view of the deconvolution process and cell type-specific count inference from the spatial niches.

      Not so clear what the heatmap shows, rows and columns

      Spots heatmap : label niche on rectangles in cols

      And each col is a spot

      Rows are cell types or cells?

      In the cell types x spot

      Response:

      This figure can indeed benefit strongly from legend modifications. In both matrices, rows represent the genes, while columns represent the spots / individual cells deconvoluted per spot.

      • We would annotate the niche legend (here the coloured surroundings) with a symbolic drawing instead of writing it on the matrix.

      • Add the legend "genes" on the first matrix.

      • Write "deconvolution" directly on the figure.

      R1.25. Cell2location. Add reference, maybe explain basic functionality?

      Response:

      Cell2location was not referenced in the results section, and was only referenced in section 4.6.2 of the methods, as the 72nd citation. We corrected this oversight, and propose 1) a brief explanation of deconvolution right before, and 2) a brief explanation of Cell2location's particularity in inferring individual cell profiles, which is not common in spatial deconvolution.

      R1.26. reconstructing different patients, tissues, and microenvironments to predict

      context-specific molecular treatments.

      Unclear

      fibrosis in different - at

      molecular levels

      Response:

      We will modify this section title according to the reviewer's comment and the different reformulations.

      R1.27. Figure 5d myeloid and endothelial colour code inversed from 5 BC

      Response:

      The legends are individually correct, but there is no reason not to make them coherent across panels. We will update the legend of panel 5d.

      R1.28. 5d: indicating important pathways in orange should not change the colour of the nodes (purple=common, blue or green specific). Use border colour maybe?

      Response:

      We had forgotten to specify the colour code of this panel, where orange highlighted the gene sets related to molecular pathways instead of functional annotations. As the name already makes the pathway explicit, we now think that the orange background is redundant information and may create some confusion. We thus would like to update the Wnt and TNFA pathway backgrounds to ___ (more enriched in cell type), and purple (significantly enriched in all cell types).

      R1.29. 5e is not a venn diagram

      e) Venn diagram showing the overlap between transcription factors (TFs) predicted by ReCoN (green) and those previously

      implicated in fibrosis (orange) or cardiac diseases (violet). Only the top 10 TFs were annotated from literature

      sources; full sizes of fibrosis- and cardiac disease-related receptor sets can therefore not be represented.

      f) also not a venn diagram e/f now in supp

      the "NABA ECM collagens" gene set. Nodes are

      grouped by molecular type (e.g., transcription factors, receptors, ligands), and links represent the weighted,

      direct regulatory interactions present in the ReCoN-constructed

      Response:

      As the diagrams do not indicate the total number of receptors/TFs that are in the literature, they cannot be Venn diagrams. We updated the legend to: "Overlap between [...]".

      As we reorganised the paper, these plots are now only in the supplementary material; we removed the duplicate occurrence in the Figure 5 legend.

      R1.30. Why Sankey plot? Normally sankey plot represents flow (of regions changing from 1 state to another) but here this is just a weighted network?

      No communication from fibroblasts back to other cell types? No communication between ventricular/myeloid/lymphoid?

      Response:

      We are thankful for this useful feedback, which helped us realise that interesting details were missing from the paragraph.

      This is only intended for visualising a regulatory cascade, so users have to decide on one receiving cell, a set of target genes, and sending cells. It includes a specific subset of regulatory cells, and only their interactions with the target cells. Here, we illustrated the regulation of some ECM genes produced by fibroblasts.

      A Sankey diagram might indeed not be the clearest representation, as we are not modelling the whole diffusion, and not a flow per se. We propose to replace it with another representation that we hope will be more intuitive for biologists (and more aesthetic), as illustrated below:

      R1.31. as a extension to - an

      underrepresented in the current. - current framework?

      Response:

      "Framework" works perfectly to fill the missing word in the sentence.

      R1.32. However, it can't represent more - cannot

      Borrowing representation from hypergraphs, which introduces

      The network exploration implementation of ReCoN also present some limitations.

      limitations. While random walks

      with restarts offer a stable and fast exploration workflow for multilayer networks, it

      currently only considers positive weights to predict regulation strengths. It involves that the

      nature of the regulation, as activation or inhibition, has to be identified a posteriori.

      • check concordance/grammar

      Response:

      We will correct the grammatical errors raised.

      R1.33. Only the nodes that are included in one of the layers are present in the

      final results, ignoring the ones present only in bipartites.

      Unclear

      Response:

      Layers and bipartites are treated differently by the algorithm, and layer presence is necessary to appear in the results.

      In practice, it just means that receptors/ligands not paired in the CCC, or genes not regulated by any TF in the GRN, won't appear.

      We propose clarifying with this second explanation:

      "In practice, a node must have at least one connection in its layer to appear in the final results. It thus means that receptors or ligands absent from the CCC network and genes not targeted by any transcription factor in the GRN will not receive a score from the random walk exploration."

      R1.34. a scATAC - an

      Barsi et al is published https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013188

      Response:

      We updated the reference with the published article.

      R1.35. effects, allowing for modulating in a second

      time their contribution. - word order

      Response:

      We propose to formulate "allowing in a second time to modulate their contribution"

      R1.36. others. However, it is possible to adjust the Beta coefficient to

      represent it based on the available information for each dataset.

      Represent- adjust?

      Response:

      We agree with the reviewer's suggestion to use adjust.

      R1.37. We use the latter to compare the different models. - what is the latter?

      Response:

      The latter referred to the 25 cytokines of the Immune Dictionary which had at least one connection in the inferred cell communication network with CellPhoneDB. We propose clarifying this formulation to "..."

      R1.38. It resulted in the scRNA-seq in 1,789 cells with 13,167

      genes, and for the scATAC-seq in 3,759 cells with 254,545 regions.

      Check english

      Response:

      We propose replacing this sentence by the following: "It resulted in a scRNAseq dataset of 1,789 cells with 13,167 genes, and a scATACseq dataset of 3,759 cells with 254,545 regions."

      R1.39. GRETA pipeline.- reference

      Response:

      We added the citation to the paper of the GRETA pipeline in the section 4.5 of the methods: "Badia-i-Mompel et al., 2026"

      R1.40. We kept all the cells whose annotations through unsupervised clustering,

      followed by marker gene annotations, through scANVI were coherent.

      Word order

      Response:

      We propose the following reformulation to correct the sentence: "We kept all cells whose annotations were coherent between unsupervised clustering with marker-gene labelling and scANVI-based label transfer"

      R1.41. In parallel, pairs of ligands and receptors with both associated with scores above

      an absolute gene loading of 0.1 were considered potential driver interactions in HF.

      Unclear

      Response:

      In the MOFAcell results, factors correspond to linear combinations of genes that explain a large part of the data variance; the contribution of each gene is called its loading. We chose the factor that best classified patients with and without fibrosis, and kept all the top genes, i.e. all of those with a score above 0.1.

      We propose reformulating this sentence, as the word "loading" could overcomplicate it for most readers: "To identify the ligands and receptors driving heart failure, we considered all of those with an absolute contribution to the multicellular factor above 0.1."
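
      The selection rule can be sketched as a simple filter on ligand-receptor pairs. The function name, the placeholder molecule names and the toy contribution values are invented for illustration; only the 0.1 threshold on the absolute contribution, applied to both members of a pair, comes from the text.

```python
def driver_pairs(lr_pairs, contributions, threshold=0.1):
    """Keep pairs whose ligand AND receptor both exceed `threshold` in absolute value."""
    selected = []
    for ligand, receptor in lr_pairs:
        ligand_ok = abs(contributions.get(ligand, 0.0)) > threshold
        receptor_ok = abs(contributions.get(receptor, 0.0)) > threshold
        if ligand_ok and receptor_ok:
            selected.append((ligand, receptor))
    return selected


# Placeholder names and toy values; negative contributions count through abs().
contributions = {"LIG_A": 0.35, "REC_A": -0.22, "LIG_B": 0.05, "REC_B": 0.40}
lr_pairs = [("LIG_A", "REC_A"), ("LIG_B", "REC_B")]
drivers = driver_pairs(lr_pairs, contributions)
```

      The second pair is dropped because its ligand falls below the threshold even though its receptor passes.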

      R1.42. gseapy Python - reference?

      Response:

      The gseapy package was indeed not cited; we now include the citation: "Zhuoqing Fang, Xinyuan Liu, Gary Peltz, GSEApy: a comprehensive package for performing gene set enrichment analysis in Python, Bioinformatics, 2022, btac757, https://doi.org/10.1093/bioinformatics/btac757"

      R1.43. and to calculate average for each spatial context the average cell type expression.

      Unclear

      Response:

      We propose to reformulate the sentence as: "These cell-type-spot profiles were later used, for each spatial context, to create a specific cell-cell communication network and to calculate average cell type expression."

      R1.44. We only used the loadings of all cell

      types but the fibroblasts to consider the effect of the sole environment.

      Unclear

      Response:

      We propose to use "APART from the fibroblasts" to clarify the sentence, and "to ONLY consider the environment effect".

      R1.45. We realised a downstream - performed

      Response:

      We fully agree with the reviewer's suggestion.

      R1.46. The profiles inferred by ReCoN were first very correlated in all three contexts. - unclear

      Response:

      The sentence lacked clarity and deserved rephrasing. We propose: "When looking at the absolute scores of ReCoN in all three contexts, results were initially highly correlated. To focus on context-specific differences, enrichments were performed using the log-ratio of each context profile over the mean of the other profiles."
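
      The log-ratio of each context profile over the mean of the other profiles can be computed as in the sketch below. The natural logarithm and the small `eps` added to avoid division by zero are assumptions made here; the function name and the toy profiles are invented for illustration.

```python
import numpy as np


def context_log_ratios(profiles, eps=1e-12):
    """Log-ratio of each context profile over the mean of the other contexts.

    profiles has one row per spatial context and one column per gene.
    """
    P = np.asarray(profiles, dtype=float)
    n_contexts = P.shape[0]
    if n_contexts < 2:
        raise ValueError("at least two contexts are needed")
    others_mean = (P.sum(axis=0, keepdims=True) - P) / (n_contexts - 1)
    return np.log((P + eps) / (others_mean + eps))


# Toy ReCoN-like scores for three contexts and two genes.
profiles = [
    [0.2, 0.1],
    [0.1, 0.1],
    [0.1, 0.4],
]
log_ratios = context_log_ratios(profiles)
```

      A gene scored identically in all contexts gets a log-ratio of zero everywhere, so only context-specific differences drive the subsequent enrichment.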

      R1.47. Potentially the closest results are models that can predict the effect of perturbations on cell line cultures. Several approaches in the literature employ either transformers or optimal transport to predict the effect of perturbations in single cell datasets. One of the main issues is an underlying necessary assumption that the perturbation effect will be larger than the heterogeneity (in cell lines for example), which becomes increasingly difficult when considering in-vivo experiments. ReCoN obviously goes beyond this by considering explicitly the presence of different cell types but distinctions of cell types are sometimes quite arbitrary and potentially application of ReCoN to some of the in-vitro culture datasets, even on cell lines, could be a way to test its performance and benchmark it against other methods.

      The main bottleneck in the application of this framework to 'personalisation' of therapies, mentioned even in the abstract as a potential future goal for such an approach, will be the lack of data. This approach requires single cell level descriptions of the system at hand, plus additional datasets to build the model structure. To a certain extent, public data of related tissues/contexts can be used, but it will be necessary to test the dependence of performance on coherence of the input data to develop sufficient trust to use it for new predictions, especially in a medical field.

      We thank the reviewer for these reflections, which raise several distinct points that we would like to address in the discussion.

      Cell line perturbation is indeed a close and active field of research, notably with numerous models based on optimal transport and VAEs, and with relevant benchmarks (Radig et al. 2025). In our view, ReCoN takes a complementary angle, both by focusing on the effect of the environment and by using a network-driven approach that provides explainability.

      These perturbation methods are typically benchmarked on single cell line screenings, where cell-cell communication is highly limited or absent by design, while ReCoN is specifically designed to exploit interactions among multiple cell types. Furthermore, ReCoN relies on a network that aims to provide only explainable hypotheses and molecular cascades. The two families of methods also typically learn from different data: ReCoN only uses single-cell data, whereas the best perturbation prediction methods learn from a subset of perturbation experiments.

      Exploring the performance of ReCoN in perturbation prediction would require designing extensive comparisons with the state of the art that take all these nuances into account, which we believe falls outside the scope of the present study. It nevertheless raises a fundamental question for the development of future methods, namely the need to assess whether the perturbation effect is actually larger than the heterogeneity, and we propose to extend the discussion to cover these aspects.

      Secondly, this comment raised a point about cell type definition, which can be a hard task and can sometimes yield an inaccurate description of cell heterogeneity. We note that even if ReCoN relies on grouping cells in some way, it does not impose any particular cell type ontology: users can define their own cell types or cell states, since the CCC layer is typically inferred from single-cell RNA-seq alone and does not require canonical cell-type annotations. This flexibility allows ReCoN to accommodate finer or coarser groupings depending on the biological question. We do not propose a framework that accounts for diversity in ways other than homogeneous clusters of cells, but we think that this constitutes an interesting future development of ReCoN or of new multicellular modelling methods.

      Lastly, we fully agree that an important limitation for ReCoN's use is data availability and generation, which was also a limitation when identifying datasets for the manuscript's applications. We hope that the development of open source atlases will make it easier to leverage tissue-specific prior knowledge and increase potential application, prediction performances, and trust in ReCoN results.

      In conclusion, we propose to state in the discussion two new points:

      1) extending multicellular perturbations (including gene knock-out) to conditions where cell types cannot be defined prior to the analysis, or are better considered along a spectrum, will be an interesting future direction.

      2) there is now a need for broad benchmarks covering both multicellular and single-cell line tasks to evaluate the trade-off between accounting for cell heterogeneity and overall prediction accuracy.

      Radig, J., Droit, R., Doncevic, D. et al. scArchon: a scalable benchmarking framework for assessing single-cell perturbation models. Genome Biol 27, 162 (2026). https://doi.org/10.1186/s13059-026-04104-z

      R1.48. The authors could comment on how their method compares to others that do not require single cell level information. Despite clear differences, it might be important to show the advantage of using this more complex approach that requires data that is less available. Given the ease with which bulk profiles can be constructed from single cell data, it might be possible to compare the approaches directly. For example, see

      1. Wang, S. Patkar, J.S. Lee, E.M. Gertz, W. Robinson, F. Schischlik, D.R. Crawford, A.A. Schäffer, E. Ruppin Deconvolving Clinically Relevant Cellular Immune Cross-talk from Bulk Gene Expression Using CODEFACS and LIRICS Stratifies Patients with Melanoma to Anti-PD-1 Therapy

      Mike van Santvoort, Óscar Lapuente-Santana, Maria Zopoglou, Constantin Zackl, Francesca Finotello, Pim van der Hoorn, Federica Eduati. Mathematically mapping the network of cells in the tumor microenvironment. Cell Reports Methods 2025

      We propose to extend the discussion with additional methods, notably from before the development of single-cell technologies. We did not plan to include these two specific methods because, to our knowledge, they do not provide output directly comparable to that of ReCoN.

      • The first work proposes to deconvolve the bulk RNA-seq profile into cell-type-specific expression profiles. It is an interesting reference, as it could allow ReCoN to be applied even to bulk RNA-seq, but it does not provide comparable results: its final task is to infer ligand-receptor interactions, without providing downstream molecular mechanisms.
      • The second method, RaCInG, builds cell-to-cell networks for individual patients. It does not explore the molecular interactions inside the cells themselves. Its networks could be used to build personalised ReCoN models, but the method seems closer to a precursor of recent CCC methods than to ReCoN itself.

      Reviewer #2

      R2.1. It is not clear how well it performs in independent validations. Authors showed that it can predict the effect of cytokine perturbations in the immune dictionary by selecting an optimal alpha. Authors should validate that using the same alpha value of 0.8, it is possible to accurately predict the effect of cytokine perturbations in independent datasets. This is particularly concerning for cytokine-cell type pairs where the optimal alpha is not known. Therefore, the potential utility of Recon to estimate the effect of multicellular perturbations is not well established.


      Response:

      The reviewer raised a very relevant point by noting that the alpha coefficient might vary between datasets.

      The value of 0.8 was chosen because it produced the best results in two independent datasets, the Immune Dictionary and the heart failure showcases. We could thus observe some cross-dataset reproducibility. To complete these findings, we will also verify that 0.8 provides the best performance in a new showcase: the Human Cytokine Dictionary (Oesinghaus et al. 2025).

      We tried to nuance this choice by pointing to the need to confirm the importance of the indirect effect. We propose to add a sentence explicitly commenting on the impact of these new findings on the alpha coefficient and its robustness.

      It is also accurate to say that ReCoN cannot currently estimate the alpha parameter autonomously. We proposed this default value because it worked on both datasets, but it is possible that no single default value fits all datasets. Alpha is therefore only a default, and users are completely free in the current implementation of ReCoN to modify it depending on their needs.

      If the default does not suit a given dataset, one option would be to fit alpha using similar prior perturbations, when such data are available. For example, after perturbing one or a few cytokines, a user could choose the value that best explains the gene expression responses.
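
The fitting option described above could look like the following grid search. This is only a sketch under stated assumptions: it supposes that the prediction is a convex combination `(1 - alpha) * direct + alpha * indirect` (consistent with alpha = 0 meaning no indirect effect, but not confirmed here as ReCoN's exact formula), that agreement is measured with a Pearson correlation, and all function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def fit_alpha(direct, indirect, observed, grid=None):
    """Pick the alpha on a grid whose combined scores best match the observed response."""
    if grid is None:
        grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    best_alpha, best_score = None, -math.inf
    for alpha in grid:
        # Assumed combination rule: alpha weights the indirect effect.
        predicted = [(1 - alpha) * d + alpha * i for d, i in zip(direct, indirect)]
        score = pearson(predicted, observed)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


# Toy per-gene scores (hypothetical) for one previously measured perturbation.
direct_scores = [1.0, 0.0, 0.5, 0.2]
indirect_scores = [0.0, 1.0, 0.3, 0.9]
# Toy "observed" response, generated here with a true weight of 0.8 on the indirect effect.
observed_response = [0.2 * d + 0.8 * i for d, i in zip(direct_scores, indirect_scores)]

best_alpha, best_score = fit_alpha(direct_scores, indirect_scores, observed_response)
```

In practice the observed response would come from the prior perturbation experiments mentioned above, and the selected alpha would then be reused for new predictions in the same system.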


      R2.2. Authors claimed that optimal alpha value of 0.8 implies the dominance of indirect effect. But in contrast to this claim, the performance across cytokine-celltype pair only increased from 0.72 to 0.76, which seem to imply that indirect effects do not add much.

      Response:

      The size of the performance improvement is an interesting point to discuss, as including the indirect effect roughly doubles the computational time and therefore creates a trade-off between resource usage and this improvement.

      While the average improvement from combining the direct and indirect effects in the first showcase was around 5%, it reached more than 10% in some cell types. We consider this a worthwhile improvement for the current task. Indeed, the indirect effect here "only" captures the coordination of immune cells in response to a cytokine stimulation, which should not necessarily change their profiles drastically compared to isolated exposure.

      R2.3. How does the cell-type specific effects prediction perform by just considering the intracellular layers? The authors constructed multiple variants of ReCoN to estimate unicellular and multicellular effects. How is the variant ReCoN-grn different from full ReCoN where gamma is set to zero.

      Response:

      We are thankful for this comment, which will help us restructure Section 2.2.

      ReCoN-grn differs from the full ReCoN model even with a gamma value of 0, as the latter includes ligand-to-receptor weights. However, ReCoN-grn corresponds to ReCoN-generic with an alpha of 0, which does not weight ligand-to-receptor links.

      We propose to clarify this detail in Section 2.2.2 by adding, after the introduction of the ReCoN-generic model, the sentence: "Note that ReCoN-grn corresponds to the ReCoN-generic model with alpha set to zero, where no indirect effects are considered. It differs from the full ReCoN model with alpha set to zero, which still includes ligand-to-receptor weights through the receptor-gene bipartite network."

      R2.4. In section 2.2, authors assert that if matching datasets are not available, GRN layer can be extracted from other datasets. How well does the GRN layer from one system generalizes to the other system in terms of perturbation prediction?

      Response:

      It is, of course, a complex question, as the answer probably depends strongly on the studied system. However, we believe that, while it is important to consider similar systems, using the same samples for the cell-communication and GRN layers is not necessary.

      The first showcase that we propose explores exactly this case. We built the GRN from two unpaired datasets, and the cell communication layer from a third one. This provided convincing performance, justifying our earlier claim. It is also common practice in most methods that contextualise prior knowledge, which usually comes from other samples and sometimes even other organs (Browaeys, Saelens, and Saeys 2020, Jin et al. 2021, Badia-i-Mompel et al. 2023).

      To provide additional insights, we will run the new Human Cytokine Dictionary showcase using both 1) multiomics methods on external PBMC datasets, and 2) a single-cell RNA-seq only method on the Human Cytokine Dictionary directly. We will then be able to report performance for both data sources and their corresponding methods.

      To justify our claim more clearly, in line with the reviewer's comment, we propose highlighting this justification in the showcase itself: "... this showcase highlights the possibility to combine networks obtained from distinct datasets...".

      Related to combining datasets, we propose to clarify the reasons behind our choices for the Immune Dictionary showcase with the additional supplementary text proposed in response to comment R1.4.

      Badia-i-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24(11):739-54. https://doi.org/10.1038/s41576-023-00618-5.

      Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 2020;17(2):159-62. https://doi.org/10.1038/s41592-019-0667-5.

      Jin S, Guerrero-Juarez CF, Zhang L et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12(1):1088. https://doi.org/10.1038/s41467-021-21246-9.

      R2.5. In the abstract, authors claimed that ReCoN can predict the effect of gene knockouts. But authors did not show any application or validation to support this claim.

      Response:

      We indeed had no showcase that could explicitly measure the performance of ReCoN for gene knockout, although this possible application was introduced in the abstract.

      We believe that ReCoN could be used in the future to infer such perturbations, but we fully agree that this claim cannot be presented without justification.

      We propose to remove the mention of gene knockout from the abstract and to introduce it in the discussion opening instead, specifying that it will require dedicated experiments and constitutes a possible future extension of the work.

      R2.6. The communication between cells might be dependent on their spatial proximity. Is it possible to construct the CCC layer by incorporating the context-matched spatial data? How would that affect the performance of multicellular response prediction?

      Response:

      This is a very interesting comment, as numerous methods using spatial transcriptomic data have been published recently.

      In the current formulation, the beta coefficient β_{i,j} modulates the impact of cell type i on cell type j. If spatial transcriptomic data can inform on the proximity between cell types, and on its overall impact on their communication, users could enforce stronger communication between specific pairs of cell types.

      However, as ReCoN is a cell-type-centric model, spatial information can only be added at a general scale, or by modelling spatial regions independently, as presented in the microenvironment heart infarction showcase. This means that ReCoN cannot benefit from the potential of spatial transcriptomics as much as models that represent the tissue structure.

      R2.7. In the fibroblast application in Fig 4d, based on the cardiac cell types expression in region type, they are predicting fibroblast gene expression. Wouldn't the most direct benchmarking be comparison with observed fibroblast expression from the ST (after deconvolution perhaps)?

      Response:

      This was a helpful comment to guide the restructuring of the microenvironment heart infarction showcase, as we believe the overall objective of the showcase was not formulated clearly enough.

      We aim to model the impact of the environment on the transcriptome. As the complete transcriptome of a cell results from numerous interacting variables, we believe that correlating ReCoN's scores with the full transcriptome would not evaluate the predicted impact of the environment.

      For this reason, we wanted to compare the results to the specific differences attributable to the microenvironment. We focused on gene set enrichment, which seemed less noisy for such a comparative experiment, in particular with 10x Visium data, which have a particularly high dropout rate.

      We propose to strengthen the validation by providing molecular insights into the three groups of cells studied.

      The spatial data themselves are bulk, adding a layer of noise over the small number of genes captured by Visium. Instead of a correlation with the deconvoluted spots, we have equivalent single-cell RNA-seq fibroblast data annotated in the same study, which match the three modelled niches. We propose to conduct a differential expression analysis on these data and to compute a correlation between these groups and ReCoN scores, providing a quantitative analysis.

      If the correlation is low because of noise in the data (notably leading to the permutation of individual gene orders even if overall biological signals and gene set orders are conserved), we will additionally perform a pathway enrichment on these data, which would also strengthen the qualitative validation.
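
The quantitative comparison proposed above could be sketched as a rank correlation between ReCoN scores and differential expression statistics. The response only states that a correlation will be computed; the choice of Spearman's rank correlation, the use of log fold changes, and all gene names and values below are assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical per-gene values for one niche: ReCoN scores and log fold changes
# from a differential expression analysis of the matching fibroblast group.
recon_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.1, "geneD": 0.3}
log_fold_changes = {"geneA": 2.0, "geneB": 1.1, "geneC": -0.5, "geneD": 0.2, "geneE": 3.0}

# Only genes present in both tables are compared.
shared_genes = sorted(set(recon_scores) & set(log_fold_changes))
rho = spearman(
    [recon_scores[g] for g in shared_genes],
    [log_fold_changes[g] for g in shared_genes],
)
```

A rank-based measure is used in this sketch because it depends only on gene ordering, which is the quantity the authors expect noise to perturb; where SciPy is available, `scipy.stats.spearmanr` would serve the same purpose.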

      R2.8. Section 2.6 Besides the cytokine section, it is difficult to assess the added value of this approach. Likely there is a lot of valuable findings here but difficult to say because the assessment is very qualitative.

      Response:

      One of the challenges of this work was to find relevant datasets to evaluate ReCoN. We tried to complement the direct quantitative evaluation from the Immune Dictionary with another quantitative evaluation from the heart atlas multicellular programs, although the latter is a much less direct validation.

      We hope that the production of new perturbation experiments on multicellular datasets, especially cell-type-targeted perturbations, will provide more opportunities to validate the different findings and claims of our current manuscript.

      On a similar note, no existing method seemed to propose comparable predictions against which to benchmark. This led us to use the NicheNet score and the current decomposition of the ReCoN model in Section 2.2.1 to evaluate the contribution of the model.

      R2.9. The article is dense and writing should be reorganized for better readability.

      Minor issues -

      No p-values in figures.

      Response:

      We agree that integrating values directly in the panels would make the figures easier to read. We would like to introduce the p-values in panels 2d, 2e, 2f, and 2g. We had forgotten to indicate in the legend of panel 4d that all bold scores were associated with a p-value

      R2.10. Typo - ReCoN-genetic should be - ReCoN-generic.

      Response:

      We thank the reviewer for noticing the typo, which we corrected in the new version.

      R2.11. Authors may consider adding figures to describe their results on balance between direct and indirect effects in section 2.2.2.

      Response:

      Depending on the new findings on the indirect effect iterations, we propose adding a panel on their combination or a supplementary figure.

      R2.12. Redundancy in the following two lines -

      o While these approaches effectively describe what tissue-wide programs are coordinated, they generally offer limited insight into the molecular mechanisms that establish or regulate these programs.

      o Despite their ability to identify coordinated tissue-wide programs, multicellular program analyses typically offer limited insight into the underlying molecular mechanisms that orchestrate these programs.

      Response:

      We propose to remove the first sentence in the revised version of the manuscript. In our opinion, starting the next paragraph with this clarification is more helpful to guide the reader than having it at the end of the previous one.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.


      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      R2.13. The direct and indirect effects are treated in two separate steps. In reality of course these effects are operating simultaneously. I wonder if this could be better modelled by iterating through the two steps. It might be worthwhile trying to see if that improves the performance.

      We thank the reviewer for this interesting idea, and propose to add a supplementary text to present the result of this discussion to the readers.

      The direct effect is supposed to be measurable from the first iteration only, as we try to represent the effect of direct receptor binding. For the indirect effect, additional iterations could be used to model effects that are more distant in time.

      On an algorithmic note, the indirect effect already allows several "iterations", as each random walk can loop between all cell types until restart. However, it does not allow controlling the weights of the successive transitions. In practice, with a high restart probability, an extreme weight is given to the first "iteration" over the second, as there are three layers to cross to explore the next cell.

      First, we propose clarifying this section of the manuscript, to explain the depth of the indirect effect explorations.

      Biologically, it is highly possible that these iterations play an important role in explaining the complete reaction of the cells. However, we believe that this hits a major limitation of our modelling, and of RWR-based exploration in general, as it goes against the enforcement of restarts.

      We aim to represent pairwise measurements, representing the impact of one node on another. But random walks without restart are not naturally well suited to this problem, as they converge to a stationary distribution (László, Lov, and Erdos 1996). In the case of ReCoN, it means that, if we pushed the exploration indefinitely, walkers starting from any gene or receptor would have the same probability of ending up on a given node of the system.

      The restart mitigates this effect and reinforces the impact of the seeds by ensuring that the walkers stay close to the seed (Tong, Faloutsos, and Pan 2006). By iterating successively from the new distribution obtained from the RWR, we would work against this important restart probability and progressively converge toward the stationary distribution of classical random walks.
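
The argument above can be illustrated numerically on a toy graph. The sketch below is not ReCoN's multilayer implementation: the four-node graph, the restart probability of 0.7, and the iteration counts are assumptions chosen only to show that (i) a plain random walk forgets its seed, (ii) a random walk with restart stays centred on its seed, and (iii) repeatedly re-seeding the RWR with its own output drifts back to the stationary distribution.

```python
def transition_matrix(adj):
    """Column-stochastic matrix P, where P[j][i] is the probability of moving from i to j."""
    n = len(adj)
    degree = [sum(adj[i]) for i in range(n)]
    return [[adj[j][i] / degree[i] for i in range(n)] for j in range(n)]


def step(P, p):
    """Apply one random-walk step to the probability vector p."""
    n = len(p)
    return [sum(P[j][i] * p[i] for i in range(n)) for j in range(n)]


def plain_walk(P, start, n_steps=500):
    """Random walk without restart, run long enough to approach stationarity."""
    p = list(start)
    for _ in range(n_steps):
        p = step(P, p)
    return p


def rwr(P, seed, restart=0.7, n_steps=200):
    """Random walk with restart: p <- (1 - restart) * P p + restart * seed."""
    p = list(seed)
    for _ in range(n_steps):
        moved = step(P, p)
        p = [(1 - restart) * m + restart * s for m, s in zip(moved, seed)]
    return p


# Toy undirected graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = transition_matrix(adj)
degree = [sum(row) for row in adj]
stationary = [d / sum(degree) for d in degree]  # degree-proportional for undirected graphs

seed_a = [1.0, 0.0, 0.0, 0.0]
seed_b = [0.0, 0.0, 0.0, 1.0]

walk_a, walk_b = plain_walk(P, seed_a), plain_walk(P, seed_b)  # both approach `stationary`
rwr_a, rwr_b = rwr(P, seed_a), rwr(P, seed_b)                  # each stays near its seed

# Chaining RWRs: feed each output back in as the next seed distribution.
chained = seed_a
for _ in range(200):
    chained = rwr(P, chained)
```

On this toy graph the two plain walks end on the same degree-proportional distribution regardless of the seed, the two RWR profiles remain peaked on their respective seeds, and the chained RWR loses the seed information, which is the behaviour the response describes.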

      So we completely share the reviewer's opinion that the iterative nature of the indirect effect should be explored too, but we do not believe that ReCoN can model it accurately. We hope that new exploration methods will be able to decipher the importance of these iterations, once additional arguments have been gathered to justify the global interest of considering the indirect effect.

      Bibliography:


      László L, Lov L, Erdos O. Random Walks on Graphs: A Survey. 1 Jan. 1996:1-46.


      Tong H, Faloutsos C, Pan J yu. Fast Random Walk with Restart and Its Applications. Sixth Int Conf Data Min ICDM06 Dec. 2006:613-22. https://doi.org/10.1109/ICDM.2006.70.

    1. Culture and Society – Diversity and Multi-Cultural Education in the 21st Century Some travelers pride themselves on their willingness to try unfamiliar foods, like the late celebrated food writer Anthony Bourdain (1956-2017). Often, however, people express disgust at another culture’s cuisine. They might think that it’s gross to eat raw meat from a donkey or parts of a rodent, while they don’t question their own habit of eating cows or pigs. Such attitudes are examples of ethnocentrism, which means to evaluate and judge another culture based on one’s own cultural norms. Ethnocentrism is believing your group is the correct measuring standard and if other cultures do not measure up to it, they are wrong. As sociologist William Graham Sumner (1906) described the term, it is a belief or attitude that one’s own culture is better than all others. Almost everyone is a little bit ethnocentric. A high level of appreciation for one’s own culture can be healthy. A shared sense of community pride, for example, connects people in a society. But ethnocentrism can lead to disdain or dislike of other cultures and could cause misunderstanding, stereotyping, and conflict. Individuals, government, non-government, private, and religious institutions with the best intentions sometimes travel to a society to “help” its people, because they see them as uneducated, backward, or even inferior. Cultural imperialism is the deliberate imposition of one’s own cultural values on another culture. When people find themselves in a new culture, they may experience disorientation and frustration. In sociology, we call this culture shock. In addition to the traveler’s biological clock being ‘off’, a traveler from Chicago might find the nightly silence of rural Montana unsettling, not peaceful. Now, imagine that the ‘difference’ is cultural. An exchange student from China to the U.S. might be annoyed by the constant interruptions in class as other students ask questions, a practice that is considered rude in China. Perhaps the Chicago traveler was initially captivated with Montana’s quiet beauty and the Chinese student was originally excited to see a U.S.-style classroom firsthand. But as they experience unanticipated differences from their own culture, they may experience ethnocentrism as their excitement gives way to discomfort and doubts about how to behave appropriately in the new situation. According to many authors, international students studying in the U.S. report that there are personality traits and behaviors expected of them. Black African students report having to learn to ‘be Black in the U.S.’ and Chinese students report that they are naturally expected to be good at math. In African countries, people are identified by country or kin, not color. Eventually, as people learn more about a culture, they adapt to the new culture for a variety of reasons. Cultural relativism is the practice of assessing a culture by its own standards rather than viewing it through the lens of one’s own culture. Practicing cultural relativism requires an open mind and a willingness to consider, and even adapt to, new values, norms, and practices. Perhaps the greatest challenge for sociologists studying different cultures is the matter of keeping a perspective. It is impossible for anyone to overcome all cultural biases. The best we can do is strive to be aware of them. Pride in one’s own culture doesn’t have to lead to imposing its values or ideas on others. And an appreciation for another culture shouldn’t preclude individuals from studying it with a critical eye. This practice is perhaps the most difficult for all social scientists.

      Delete this entire section; it is a repeat from the beginning of 1.4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The nematode C. elegans is an ideal model in which to achieve the ambitious goal of a genome-wide atlas of protein expression and localization. In this paper, the authors explore the utility of a new and efficient method for labeling proteins with fluorescent tags, evaluating its potential to be the basis for a larger, genome-wide effort that is likely to be very useful for the community. While the evidence for the method itself is solid, carrying out this project at a large scale will require significant additional feasibility studies.

      We appreciate the editor’s recognition that the evidence for our method is solid and that a genome-wide protein atlas in C. elegans would be highly valuable to the community. However, we respectfully disagree that “significant additional feasibility studies” are required. Take the yeast proteome-wide GFP tagging project (Huh et al., Nature 2003). It achieved ~75% coverage of ~6,000 proteins directly from an established protocol without any prior significant feasibility studies, at least to our knowledge. While the C. elegans genome is 3 times that size, we would argue that our tagging protocol may even be less labor intensive as it does not involve any cloning and the screening is visual, requiring no molecular biology skills. Reviewer 3 notes: ‘They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.’

      Our pilot study validates all key parameters for genome-wide scaling: editing efficiency at novel loci with untested reagents, viability of tagged worms, and detectability of multiple spectrally separated fluorophores across expression ranges. These address the core technical, biological, and practical challenges of large-scale endogenous tagging in a multicellular organism, leaving no fundamental barriers in our view.

      The proposed cost and timeline align quite favorably with established large-scale consortium projects: e.g., ENCODE pilot analyzed 1% of the human genome at ~$55 million over 4 years; Mouse Knockout Consortium scaled to ~20,000 genes over 20 years (ongoing) with ~$100 million; Human Protein Atlas mapped ~87% of proteins with antibodies in fixed cells (through much more labor intensive methods) over 20+ years at >$100 million. With ~8% of C. elegans genes already tagged (WormTagDB) and labs already tagging entire gene classes (PMID: 40463100), scaling our protocol to the proteome is feasible, potentially covering the genome in 5-6 years by a single lab or faster with distributed effort at a reagent cost of merely $2.2 million. The main barriers now are funding commitment and assembling collaborators, not further feasibility testing.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve the efficiency of gene tagging.

      Strengths:

      This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.

      Weaknesses:

      Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans while inserting two fluorescent proteins and a co-CRISPR marker into three loci. The current work is, therefore, an incremental advance. In general, I applaud the authors' willingness to think ahead to how whole proteome tagging might be accomplished, but I predict that the advance here will be one of many small advances that will get the field to that goal.

      Our manuscript indeed builds on prior multiplex editing (including our own co-CRISPR work), but the manuscript's primary contribution is not a novel technical breakthrough per se. Instead, our main goal was to pilot and strategize a feasible path to whole-proteome tagging in C. elegans and, most critically, test the following key parameters: (1) success rate of triple pools with previously untested reagents at novel targets; (2) utility of fluorophores across expression levels; (3) major effects on tagged protein function. In prior multiplexing, we used two targets which we already knew could be edited quite efficiently, with the 3rd target a point mutation with nearly 100% efficiency. Thus, it was not at all clear that picking 3 random genes and replacing the 3rd highly efficient locus with another less efficient large insertion would work or be sufficiently scalable for thousands of novel genes with unvalidated reagents at first pass.

      The title vastly oversells the advance in my view, and the first sentence of the Discussion seems a more apt summary of the key advance here.

      Some injections target genes on the same chromosome together, which will create unnecessary issues when doing necessary backcrossing, especially if the mutation rate is increased by CRISPR.

      We disagree with the reviewer’s assessment of the need for backcrossing, for two reasons: (1) Prior studies have shown that off-target mutations are not a serious concern in C. elegans (reviewed in PMID: 26336798). For instance, WGS of strains after CRISPR/Cas9 found negligible off-target effects (PMID: 25249454, PMID: 30420468 – using similar RNP/ssDNA method and multiple guides; PMID: 23979577, PMID: 27650892 using other methods). Targeted sequencing studies have reported similar findings, using various CRISPR/Cas9 methods, with essentially no mutations at sites other than the intended target (PMID: 23995389; PMID: 23817069). (2) If the goal is to tag the entire genome, the introduction of backcrossing should not reasonably be a routine part of the initial tagging.

      Lastly, if one really does want to backcross, the existence of tags on the same chromosome is actually an advantage because it permits selection for recombinants with wild-type chromosomes.

      Also, the need for backcrossing and perhaps sequencing made me wonder if injecting 3 together really is helpful vs targeting each gene separately, since only 5 worms need to be injected.

      Apart from our disagreement regarding backcrossing, we are puzzled by the reviewer’s comment. Why would one do single tagging at a time, rather than triple tagging if the whole point is to scale up tagging? It is important to keep in mind that the rate limiting step for tagging the whole genome is the number of injections that can be done per day. Since there is no cloning to generate the repair templates/guides and all other reagents are commercially available and not sample specific, these can be prepared quite rapidly. Being able to isolate multiple lines (together or independently) from the same injection increases throughput 3-fold and in our view does not provide any disadvantages as individual tags can be isolated independently if desired.
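
The throughput argument above can be illustrated with a back-of-the-envelope sketch. The gene count of roughly 20,000 is the figure cited in the manuscript; the number of injection pools completed per day (`POOLS_PER_DAY`) is a purely hypothetical placeholder, and the sketch assumes that every pool succeeds on its first attempt, which will not hold in practice.

```python
import math

def injection_days(n_genes: int, tags_per_pool: int, pools_per_day: int) -> int:
    """Working days needed if each pool is injected once and succeeds."""
    n_pools = math.ceil(n_genes / tags_per_pool)
    return math.ceil(n_pools / pools_per_day)

N_GENES = 20_000     # approximate C. elegans gene count cited in the manuscript
POOLS_PER_DAY = 4    # hypothetical value, for illustration only

single = injection_days(N_GENES, tags_per_pool=1, pools_per_day=POOLS_PER_DAY)
triple = injection_days(N_GENES, tags_per_pool=3, pools_per_day=POOLS_PER_DAY)
print(single, triple, round(single / triple, 2))
```

Whatever value is chosen for `POOLS_PER_DAY`, the ratio between the two estimates stays close to 3, which is the point being made: when injections are the bottleneck, pool size scales throughput almost linearly.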

      Beyond the numerous technical advantages pooling provides (including lower cost and higher throughput for making injection mixes and for imaging), our results show that it yields epistemic benefits as well: we would never have noted the subcellular pattern in Fig. 6B, C with different sets of mitochondria being marked by different mitochondrial proteins had we imaged them separately or even aligned to a pan-mitochondrial landmark. As we mentioned in the discussion, grouping proteins predicted to localize to the same compartment together can simultaneously test how uniform or differentiated such compartments are during the screen.

      The limited utility of current blue fluorescent proteins makes me wonder if it's worth using at all at this stage, before there are better blue (or far red) fluorescent proteins.

      We do not think that the utility of current BFPs is that limiting. At least the theoretical brightness of mTagBFP2 is comparable to that of EGFP (PMID: 30886412), which was useful for the bulk of currently tagged proteins. Due to modestly higher autofluorescence in the blue spectrum, the practical brightness is somewhat less ideal, but we have shown that many proteins are expressed highly enough to be detected quite well with mTagBFP2 by eye at low magnification. We also note that many tags that are not visible by eye under a dissection scope become visible with long exposure cameras of widefield microscopes or modern confocal (GaAsP) detectors, so the list of genes detectable with mTagBFP2 is likely to be much longer. We routinely use mTagBFP2 to super-resolve subnuclear structures with endogenous tags (e.g., in the nucleolus), with some tags having lower annotated FPKMs than the genes tested here.

      Some literature reviews, particularly in the Introduction and Abstract, rely too much on recent examples from the authors' laboratory instead of presenting the state of the field. I'd like to have known more thoroughly what exactly has been done with simultaneous injection targeting multiple loci, comparing what has been accomplished by various laboratories to date.

      We are not sure what the reviewer is referring to. In the Abstract, we do not refer to any literature. In the Introduction, we cite 28 papers, 6 of those from our lab (4 of which provide examples of protein tags). We do not believe that this can be fairly called an unbalanced presentation of the state of the field.

      This being said, we have gladly expanded our Introduction to provide more background on co-CRISPRing. Labs have routinely used co-conversion (“coCRISPR”) markers for picking out their intended edits (e.g., point mutations or insertions), as it has been shown by multiple groups that a CRISPR/Cas9 edit at one locus correlates with efficiency at other simultaneous targets (PMID: 25161212). Generally, making point mutations with the Cas9/RNP protocol is highly efficient, especially at specific loci such as dpy-10. However, multiple FP-sized insertions have not been routinely attempted. We and only one other group have successfully attempted it using previously working targets and reagents (e.g., 28% in PMID: 26187122). Importantly, the efficiency of such multiple insertions has never been assessed at scale and using entirely untested reagents at novel sites – critical parameters to determine for a whole genome approach. So, we test here (1) the efficiency of triple insertions and (2) the chance of getting them with new and untested guides and reagents.

      In our view, since we have to use some injection/coCRISPR marker anyway for those genes which are not expressed at dissecting-scope visible levels (likely most genes), using highly expressed intended targets as improvised markers in a pooled approach makes our approach much more efficient. It allows us to find the worms with the highest chance of yielding CRISPR insertions, which we can screen with higher power methods for the dimmer targets, while enabling us to co-isolate other intended targets. Insertions, being often heterozygous in F1, can be segregated independently if desired, or homozygosed together to facilitate maintenance then outcrossed individually by those interested in studying specific genes in more detail.

      In the revised version of this manuscript, we now discuss some of these points in the introduction section:

      “Currently, around 1554 proteins representing 8% of the proteome are estimated to have been endogenously tagged (Leyhr et al., 2025). However, at current rates, tagging the proteome is projected to take around 100 years and likely involve numerous duplicate attempts on a small number of commonly studied proteins (Leyhr et al., 2025). It will thus be crucial for the field to coordinate tagging efforts and scale up tagging protocols to enable coverage of the entire genome at a reasonable timescale and cost. Given the number of injections is a major time-limiting factor, pooling multiple injections into one would at minimum cut tagging time by a factor of 3. In C. elegans, screening for novel CRISPR/Cas9-induced genomic edits is already facilitated either by use of co-injection markers (i.e., plasmids that form extrachromosomal arrays) that yield phenotypes or fluorescence in progeny of successfully injected worms, or co-editing well characterized loci using established and highly efficient reagents which likewise yield visible phenotypes. In the latter approach, termed “co-CRISPR”, worms edited at the marker locus are most likely to also carry the intended edit (Arribere et al., 2014). Recent methods for CRISPR/Cas9 mediated genomic insertions have pushed efficiencies to sufficient levels to simultaneously insert multiple fluorophores (e.g., mNeonGreen and mScarlet) as well as a co-CRISPR marker (dpy-10) at three independent loci in a single injection (Eroglu et al., 2023; Paix et al., 2015). These attempts pooled reagents previously established to work efficiently and targeted genes that were known to yield functional fusion proteins when tagged. 
      Thus, while in principle current methods could allow tagging of at least 3 independent loci in one injection if a co-CRISPR marker is omitted, it is not known to what extent such an approach could be generalized across the genome with previously unvalidated reagents (i.e., guides and repair template homology arms) at novel loci to yield functional tags.”

      Reviewer #2 (Public review):

      The manuscript by Eroglu and Hobert presents a set of strains each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work - who is this paper for?

      We consider this paper to have two purposes: (1) motivate the community to come together to consider such a genome-wide tagging approach; (2) provide a reference point for funding agencies that such an aim is not unreasonable and will provide novel interesting insights.

      As a technical method, the advance is minimal since the first author had already demonstrated that three mutations (fluorophore insertion and co-CRISPR marker) could be introduced simultaneously.

      We agree that the basic principle is similar. However, given the numbers in our original paper, it was not clear that pooling three novel large edits would work or that it would be scalable.

      The dpy-10 coCRISPR marker previously used is a highly efficient single site, with close to 100% hit rate. We also knew in the earlier study that the two pooled insertions already worked quite efficiently and did not disrupt the function of targeted proteins. Exchanging these plus dpy-10 for three novel tags was not guaranteed to succeed for many potential reasons, including both biological and technical. For instance, such a “marker free” approach necessitates that a significant number of targets in the genome should be expressed highly enough to be visible by fluorescence stereomicroscopy when tagged with current best fluorophores. The chance of disrupting gene function by tagging was also not explored in detail in C. elegans, nor whether one untested guide is generally sufficient. We think that establishing these parameters was meaningful and necessary for the goal of whole genome tagging. We have clarified some of these points in the text.

      As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community. 

      The usage of three different fluorophores is largely driven by the ability to co-inject and therefore cut injection effort by a factor of three. Moreover, having all three fluorophores together facilitates imaging and maintenance. Lastly, co-labeling has the potential to reveal unexpected patterns of co-localization or lack thereof (example: two mitochondrial proteins that we found to not have overlapping distribution). We clarified this point in the revised text in both the results and discussion.

      Finally, the interpretation of the patterns observed in the created lines is somewhat lacking. A Table with all the observations must be included. This can replace the descriptions of the observations with the different lines, which could be somewhat laborious for the reader, and are often wrong. There are numerous mistaken expectations of protein expression here, but two examples include:

      We are not convinced that our expectations are mistaken. Below we respond to the reviewer’s specific examples, and we are open to hear from the reviewer about additional cases.

      (1) The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis).

      There are multiple paralogs of this protein (see WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.

      The expression of acdh-10 is annotated in multiple scRNA datasets as intestine and epidermal enriched (CeNGEN/Taylor et al., 2021, highest in epidermis; Ghaddar et al., 2023, highest in intestine). We did not mean to imply that fatty acid metabolism does not occur in the gonad, nor that a paralog of acdh-10 could not be performing the same function in tissues where acdh-10 is not expressed.

      However, this raises an important question: why have different paralogs doing the same thing? Duplicate genes with the same function are generally not evolutionarily stable (PMID: 11073452, PMID: 24659815). That there are such striking tissue specific expression patterns of an essential or widely expressed protein class suggests that paralogs of the gene likely differ in some meaningful parameter that might align with tissue-specific functional needs or regulation. The reviewer’s statement that ‘there are no published studies about this enzyme, so we really don't know for sure what it's doing’ is in fact an excellent demonstration of our point; finding out where the duplicates are expressed can provide a starting point to uncover potential differences between the paralogs. At the very least it can delineate to what degree paralogs diverge in their expression across the proteome and identify which such cases merit further study. In a more ideal scenario, prior information of protein function could indicate that the involved pathway requires tissue specific regulation.

      (2) The expectation that HXK-1 is ubiquitously expressed.

      Three paralogous enzymes are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787).

      The cited paper (PMID: 40011787) does not show where they are expressed. We discussed redundancy/paralogs above in point 1, and in our view the same applies here. They may perform the same reaction but are likely to differ in some meaningful way, be it regulation or rate of activity, for them to be stably maintained as functional genes over evolution.

      Moreover, single-cell RNA-seq data (PMID: 38816550) also show enrichment of hxk-1 in gonadal sheath cells.

      The Ghaddar et al. and CeNGEN/Taylor et al. datasets do not show this. The scRNA paper cited (PMID: 38816550) also shows enrichment in neurons, pharynx, coelomocyte and germ cells which we did not note. In our view, these in fact further support our goals: often, transcript datasets alone (frequently used to infer tissue function) do not sufficiently predict protein expression. One can post hoc find an scRNA-seq dataset that aligns somewhat with our protein observations, but how does one know which to trust a priori? Disagreements between transcript datasets will ultimately require resolution at the protein level, in our view.

      To clarify these points, we added the following to the discussion section:

      “We also noted unexpected cell type dependent distributions of proteins involved in broadly important metabolic processes such as ACDH-10, which was depleted from the germline compared to other tissues, and HXK-1, which was highly enriched in the gonadal sheath. Notably, for these as well as other cases, scRNA-seq datasets were not sufficient to deduce a priori the observed cell type specific differences at the protein level. Importantly, many genes encoding metabolic enzymes including acdh-10 and hxk-1 have paralogs that likely perform similar catalytic functions. Yet, duplicate genes with identical functions are generally not evolutionarily stable (Adler et al., 2014; Lynch and Conery, 2000); thus such genes are likely to differ in some meaningful parameter (e.g., regulation or activity) that might align with tissue-specific functional needs. Fully annotating the expression patterns of paralogs at the protein level could indicate which tissues require unique metabolic needs and indicate which paralogous genes have undergone sub- versus neo-functionalization. For those proteins that are less functionally understood, unexpected distributions might indicate which merit further study.”

      The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4, and adult (all published) - tissues in which expression is observed in the lines presented by the authors.

      We added some of this information, such as annotated expression levels in young adults from various scRNA datasets (but not larval datasets, as we did not image these). We note that each of these studies uses a different pipeline and reports different metrics (scaled TPM/Z-score versus Seurat average expression versus TPM), so comparisons between them are not informative unless they are integrated and analyzed together.

      Reviewer #3 (Public review):

      Summary:

      The authors argue that establishing the expression pattern and subcellular localisation of an animal's proteome will highlight many hypotheses for further study. To make this point and show feasibility, they developed a pipeline to knock in DNA encoding fluorescent tags into C. elegans genes.

      Strengths:

      The authors effectively make the points above. For example, they provide evidence of two populations of mitochondria in the C. elegans germline that differ qualitatively in the proteins they express. They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.

      We appreciate the referee’s recognition that whole proteome tagging is feasible.

      Weaknesses:

      Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy, such as diSPIM, STED, and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit.

      (1) Have the authors investigated if the three fluorescent tags they use are appropriate for super-resolution microscopy of C. elegans, e.g., STED or SIM? Would Electra be better than mTagBFP2? How does mScarlet3-S2 compare to mScarlet3?

      All three tags work for ISM (i.e., Airyscan). We previously tried Electra (not for the genes tested here) but could not isolate positive tags. Given Electra is not that much brighter on paper than mTagBFP2 we did not pursue it further, though we recognize that these may simply have been unlucky injections. mScarlet3-S2 is quite a bit dimmer than mScarlet3 on paper – the advantage is that it has higher photostability. In our view, the limiting factor will be having FPs that are bright enough to screen, image and scale to the whole genome, so brightness will likely provide an advantage over photostability at this stage.

      (2) Have the authors investigated what tags could be used in expansion microscopy - that is, which retain antigenicity or even fluorescence after the protocol is applied? It may be useful to add different epitope tags to the knock-in cassettes for this purpose.

      mStayGold and mScarlet3 retain fluorescence after fixing with formaldehyde. We have not tested mTagBFP2 fluorescence in fixed worms. We agree that adding different epitope tags would be useful.

      The paper is fine as it stands. The experiments above could add value to it and future-proof it, but are not essential. If the experiments are not attempted, the authors could refer to the points above in the discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Merged figures appear saturated, and use colors that won't work for red-green colorblind viewers. 

      For all figures, we also show individual channels separately, which is common practice for making fluorescence images accessible to colorblind readers (PMID: 33788834). Figures highlighting non-overlap like 6B and C are already in accessible colors when merged (blue/green) and include a numerical quantification. 3-color RGB images preserve the greatest information for the highest number of individuals.

      (2) Targeting ubiquitously expressed genes as a proof of concept gives me some concern that this might underestimate the challenges that may be experienced with less widely expressed genes.

      While the genes were predicted to be ubiquitously expressed, many were not in practice, like HXK-1 and F54C8.1, which were also among the lower expressed genes on our list and highly cell type restricted. As discussed, the more tissue restricted a gene, the likelier that bulk RNA levels underestimate expression. Such genes are therefore more likely to be detected in a specific tissue. We routinely isolate tissue restricted endogenous tags, including those expressed in only a few neurons, with bulk FPKMs lower than the ranges tested in this manuscript.

      (3) Some results are not shown or referenced (autofluorescence, for example, is shown using a schematic in Figure 1C).

      We now provide representative images alongside what would be expected to be observed by eye during screening.

      (4) It would be useful to describe how to recover worms from what is shown in Figure 1A. 

      In the revised version, we added the following in the caption for Fig. 1A:

      “Selected worms expressing the brighter tag can be screened for dimmer tags by higher magnification and long exposure imaging. Worms can be recovered directly from slides if immobilized by levamisole as described (Ghanta et al., 2021). Alternatively, single hermaphrodite worms can be isolated, allowed to lay eggs, then screened.”

      (5) A blue bar of data must be missing from Figure 3B injection pool 5.

      As stated in the text, “All but one tag (cox-6B::mTagBFP2) was visible in the F1 generation of injected P0 animals, and these were subsequently isolated among F2 worms positive for the other tags in the pool.”

      To clarify that data points are not unintentionally omitted, we added the following text to the caption of Fig. 3B:

      “For group 5 including cox-6B::mTagBFP2, worms with detectable levels of mTagBFP2 fluorescence were not recovered in the F1 generation but were isolated among progeny of F1s positive for mStayGold and mScarlet3; we were thus unable to quantify efficiency for this locus at F1.”

      (6) Some expression or localization patterns were unexpected, but complications like germline silencing and protein mislocalization, with a small fraction localizing normally and rescuing function, were not presented as possibilities. Viability is used to confirm function, but without presenting whether this means 100% viability, less, or just the ability to maintain a strain.

      We already do discuss mislocalization and functionality issues in the Discussion, as well as tradeoffs of alternate methods. Any existing method to observe biological molecules, be it protein, RNA or DNA, has multiple drawbacks and sources of artifacts, which are unlikely to be fully eliminated in the foreseeable future.

      In regard to germline silencing of endogenously tagged genes in C. elegans, there is actually very little evidence for this. Collectively, various labs have now generated over 200 reporter alleles of germline-expressed genes (WormTagDB), with robust expression throughout the germline and retention of function. Likewise, many of our tags across fluorophores showed robust germline expression, including EEF-1A.1::mTagBFP2, Y22D7AL.10::mStayGold, and HAT-1::mScarlet3. In fact, overall transcript levels generally tended to underestimate germline enrichment at the protein level. We note that single-copy transgenes driven by the eef-1A.1/eft-3 promoter alone are frequently not expressed in the germline (PMID: 31064766); that we could detect EEF-1A.1 robustly in the germline when tagged endogenously is evidence that silencing is unlikely to be a widespread concern, and at the least less of a concern than for single-copy transgenes. We appreciate that for a transgene, presence/absence of specific sequence elements and genomic loci play a role in expression, but an endogenous tag captures all such information at a given locus.

      Indeed, we found only two reports of endogenous tags being silenced in the germline, the first being a novel tag (not fluorophore) which initially prevented expression at the tagged locus (PMID: 30109984), but after making changes to the sequence to avoid silencing signals the authors could rescue expression and thereafter saw robust expression in various novel contexts with this tag. The second example (PMID: 34547227) leaves open the possibility that germline repression of that particular gene might be a part of its endogenous regulation.

      Nevertheless, given that germline silencing is probably rare if it occurs at all, it will likely take a large-scale tagging effort to uncover such cases in sufficient numbers to study. In our view, this further justifies tagging at large, ideally genomic, scales. If we do discover that there are numerous annotated germline proteins which we don’t observe by tagging, that would be interesting to study on its own.

      (7) Halotag is presented in the Discussion as a small tag, but it is bigger than GFP.

      Thank you for catching this. We have removed the discussion of Halotag. Given the comparable size to FPs, it would be unlikely to alleviate issues of tag functionality.

      (8) It would be useful to include FPKMs and viability percentages in Table 1.

      FPKM is included in column 6, but the title for this column is cut off. In the revised table FPKM values are now shown more clearly across stages.

      We did not quantify viability percentage. In our view it does not yield an informative metric when there is little information about the protein’s required dosage for function, which was the case for most proteins here. A haplosufficient gene might yield a full brood size even if 50% of protein function is lost; conversely, a highly dose sensitive protein could yield penetrant and severe inviability with mild perturbation of function. It also is not actionable information at this stage if there is no alternate tagging strategy as a baseline of comparison. The worms we picked to image all have viable embryos as adults, so in those individuals the genes were likely to be sufficiently expressed and functional.

      (9) Because establishing that a guide works well is a limiting step for many CRISPR experiments (once a guide works well, it's easy to inject 5 worms and get lines), I wondered if testing that for many genes is what is really needed in the field at this stage. 

      Guide quality is rarely an issue in C. elegans: for every gene here we tried only one guide, and all of these guides were previously untested. We now clarified this in the discussion section:

      “Notably, we find that previously untested guide RNAs and homology arms perform exceptionally well at novel loci, as we only tested one set of reagents for each locus which yielded satisfactory tagging rates.”

      (10) For a manuscript where the injection is so central to what was done, I was surprised to read in the Acknowledgments that all of the injections were done by someone who is not included as an author.

      We are likewise surprised by such a comment but gladly clarify: Chi Chen has been with us as an expert microinjection specialist for more than 25 years and her very important technical contributions have been acknowledged in dozens of papers. Multiple authorship guidelines, including COPE’s and ICMJE’s, state that technical contributions alone do not qualify for authorship.

      Reviewer #2 (Recommendations for the authors):

      (1) We would encourage the authors to provide systematic validation of the reported insertions. The manuscript reports that 24 of 30 tags were isolated and visible, but does not clearly state whether each isolated line was confirmed by sequence‑level validation to be correctly in‑frame and free of unintended mutations at the target locus.

      We appreciate the reviewer’s concerns on fidelity. These parameters have been assessed in prior published work (e.g., PMID: 30504364, PMID: 34748534) and in our hands are in the range of 80% whenever we sequence non-fluorescent tags of similar sizes. The efficiencies we observed are high enough that one can expect to recover numerous worms with the exact intended sequence for each target, though we would argue mutations within the FP reporter are less likely to matter if it retains high fluorescence.

      (2) The manuscript presents aggregated success counts (e.g., 8/10 mTagBFP2 tags, 9/10 mStayGold, 7/10 mScarlet3) and useful narrative descriptions of injection outcomes. We also suggest including per‑locus success rates.

      Figure 3B shows per locus success rate and source data is provided for this figure. Each dot is an individual injection and the Y axis is per locus rate. We now worded this more clearly in the figure’s caption.

      “Total insertion efficiencies per locus for the indicated targets across injection pools.”
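
As a quick consistency check on the aggregated counts quoted by the reviewer (8/10, 9/10 and 7/10), the sketch below recomputes the per-fluorophore and overall isolation rates. These are counts of tags isolated, not the per-injection insertion efficiencies plotted in Figure 3B, and the variable names are illustrative only.

```python
# Tags isolated out of tags attempted, per fluorophore, as quoted in the review
isolated = {"mTagBFP2": (8, 10), "mStayGold": (9, 10), "mScarlet3": (7, 10)}

def success_rate(hits: int, attempts: int) -> float:
    """Fraction of attempted tags that were isolated."""
    return hits / attempts

rates = {fp: success_rate(*counts) for fp, counts in isolated.items()}
total_hits = sum(hits for hits, _ in isolated.values())
total_attempts = sum(attempts for _, attempts in isolated.values())
overall = success_rate(total_hits, total_attempts)

for fp, rate in rates.items():
    print(f"{fp}: {rate:.0%}")
print(f"overall: {total_hits}/{total_attempts} = {overall:.0%}")
```

The totals reproduce the 24 of 30 tags mentioned in the reviewer's first recommendation.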

      (3) For pools that required re‑injection after initial failures, we would like to see a description of the specific changes that were made to the injection mixes or procedures (e.g., new repair template prep, different Cas9 reagent lot, guide redesign). This will be useful troubleshooting information for others.

      We re-made the exact same injection mix, this time using a Nanodrop to confirm that the purity of the repair templates, as assessed by absorbance ratios (A260/230 and A260/280), was sufficient after each purification step. No other changes were made. This is now specified in the methods section in the following way:

      “For re-runs of pools 4, 6 and 10 which failed initially, we regenerated the repair templates and ensured that after each column purification, the A260/230 ratio of the purified DNA was ≥2.2 and A260/280 was 1.8 ± 0.05 when measured with a Nanodrop spectrophotometer.”
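
The purity criteria quoted above can be written as a small acceptance check. This is a minimal sketch: the function name and the floating-point tolerance are assumptions made for illustration, and only the two thresholds come from the Methods text.

```python
def template_is_pure(a260_230: float, a260_280: float) -> bool:
    """Apply the quoted thresholds: A260/230 >= 2.2 and A260/280 = 1.8 +/- 0.05."""
    ok_230 = a260_230 >= 2.2
    # Small epsilon so boundary readings such as 1.85 are not rejected by float rounding
    ok_280 = abs(a260_280 - 1.8) <= 0.05 + 1e-9
    return ok_230 and ok_280

# Hypothetical readings for three repair-template preps
readings = {"prep_A": (2.30, 1.82), "prep_B": (2.00, 1.80), "prep_C": (2.40, 1.90)}
for name, (r230, r280) in readings.items():
    print(name, "pass" if template_is_pure(r230, r280) else "re-purify")
```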

      (4) The authors state that the fluorophore sequences are codon-optimized for C. elegans. We suggest they provide the exact donor/tag sequences, specifically state whether the fluorophore sequences contain any synthetic/artificial introns, or whether other sequence modifications (e.g., silent PAM‑disrupting mutations) were included in the donor templates. 

      This information is provided in Supplementary Table 1.

      (5) Page 3: Include a reference for "The C. elegans genome encodes around 20,000 genes" 

      We added a reference to the most recent release of the genome (WS237, May 2013). Spieth et al., 2014.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to uncover novel therapeutic vulnerabilities in APC-mutant colorectal cancer (CRC), which constitutes the majority of CRC cases. They hypothesized that modulating oxygen-sensing pathways (via PHD inhibition) could disrupt adaptive stress responses in these tumours.

      Strengths:

      The study employs a powerful, two-pronged approach to identify Molidustat's targets. By using both Thermal Proteome Profiling (TPP) and an orthogonal chemical proteomic competition assay, the authors provide compelling evidence that GSTP1 is a genuine, direct off-target, effectively addressing the common limitation of indirect effects in proteomic screens.

      Weaknesses:

      (1) In Figure 1, the current data rely on a single guide RNA (sgRNA). To make the data solid, at least two independent sgRNAs targeting different regions of PHD2 should be used.

      We thank the reviewer for raising this. Clarity on the CRISPR strategy was missing from the original submission and we have now added the following to the Methods (Page 4). We did not use a single sgRNA. PHD2 was targeted with a pool of three chemically modified crRNAs:

      (IDT Alt-R; target sequences: 5'-TACAACCAGCATATGCTACA, 5'-GTGGCTGCCGAAGCCGAGCC, 5'-GATAAGATCACCTGGATCGA)

      These were delivered as in vitro assembled ribonucleoprotein complexes with high-fidelity Cas9. This format has been reported to achieve high on-target efficiency while minimising off-target cutting [1,2], such that any residual stochastic off-target events are distributed across the population and are not expected to manifest as a coherent phenotype at the population level. Working with pooled, unselected knockouts rather than single-cell clones also avoids the confounds of clonal heterogeneity that normally motivate the use of multiple independent guides and rescue experiments in single-clone workflows. We have previously validated this approach for GSTP1 knockout in a separate single-cell proteomics study [3], where loss of GSTP1 protein was observed in over 90% of single cells and GSTP1 was the most significantly altered protein between sgControl and sgGSTP1 populations.
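
For readers who want to sanity-check the three target sequences listed above, a minimal sketch follows. It only verifies sequence length, alphabet, and GC content; the expected length of 20 nt is an assumption based on the standard protospacer length for SpCas9 guides, and nothing here assesses on-target or off-target activity.

```python
GUIDES = [
    "TACAACCAGCATATGCTACA",
    "GTGGCTGCCGAAGCCGAGCC",
    "GATAAGATCACCTGGATCGA",
]

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def check_guide(seq: str, expected_len: int = 20) -> dict:
    """Basic formatting checks; expected_len=20 is an assumed protospacer length."""
    seq = seq.upper()
    return {
        "length_ok": len(seq) == expected_len,
        "valid_bases": set(seq) <= set("ACGT"),
        "gc": round(gc_fraction(seq), 2),
    }

report = [check_guide(g) for g in GUIDES]
for guide, result in zip(GUIDES, report):
    print(guide, result)
```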

      (2) Figure 3E: The Asn205 site should be mutated to test whether Molidustat inhibits GSTP1 activity via Asn205.

      This is a good suggestion, and we explored it in silico before concluding it was not tractable. We used PyMOL mutagenesis to model Molidustat binding to GSTP1 variants at the predicted contact residues: Asn205 was mutated to Ala, Gly and Ser; Trp39 (predicted to hydrogen-bond Molidustat) was mutated to Ala, Phe and Thr; and a Tyr8Phe/Asn205Ser double mutant was also modelled. In every case, Molidustat reoriented within the active site and adopted an alternative hydrogen-bonding configuration (most commonly with Tyr8), yielding a docking score equal to or better than that for native GSTP1 (Author response images 1–4). The model therefore does not predict any single or double point mutant that would ablate Molidustat binding in a clean, interpretable way, and we could not design a rational loss-of-interaction mutant on this basis. Given this limitation, and given that definitive mapping of the binding interface would require co-crystallography, which is beyond the scope of the present study, we have moved the docking model to the supplement and flagged it as predictive rather than definitive.

      Author response image 1.

      Molidustat in native GSTP1

      Author response image 2.

      Molidustat docking with mutated GSTP1, Asn205 mutated to Gln205

      Author response image 3.

      Molidustat docking with mutated GSTP1, Tyr39 mutated to Phe39

      Author response image 4.

      Molidustat docking with mutated GSTP1, Asn205 mutated to Ser205 and Tyr8 mutated to Phe8

      (3) Figure 5B and 5C: The metabolic imbalance phenotype observed upon dual knockout of PHD2 and GSTP1 requires rescue experiments to confirm on-target specificity.

      We thank the reviewer for this important point and agree that rescue experiments could represent the most direct demonstration of on-target specificity for the metabolic phenotype observed in Figures 5B and 5C. These rescue experiments are necessary when working with single clones, as they allow for comparing a knock-out clone with a reconstituted pool and sidestep the issue of clonal heterogeneity.

      In our case, we think that there is no advantage to doing so, as we work with pooled knockouts, so any clonal heterogeneity is diluted in the pool.

      One could even make the case that such a rescue experiment would introduce additional artefacts. Combined loss of PHD2 and GSTP1 leads to reduced cellular viability, with decreased proliferation and increased apoptosis, consistent with a synthetic lethal interaction. To devise a rescue experiment, we would have to isolate a single-cell clone (the pool is not a complete knockout, so WT cells would outgrow the knockout cells). A clone that has overcome the anti-proliferative insult of the double knockout is likely to have a phenotype distinct from that of the original pooled population, just as the rescued cells would differ from WT cells. For these reasons, we have not performed rescue experiments in the current study. We have added the absence of a rescue as a limitation of the study in the Discussion:

      “While genetic rescue experiments would provide definitive confirmation of on-target specificity, the pronounced loss-of-fitness and apoptotic phenotype observed upon combined PHD2 and GSTP1 loss limited the feasibility of establishing stable rescued double-knockout populations, and therefore represents a limitation of the current study.”

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to determine Molidustat targets and the potential utility of these findings. They clearly demonstrate that Molidustat interferes with GSTP1 and some other proteins on top of PHD2. They also demonstrate that PHD2 deletion is not sufficient to recapitulate Molidustat effects in cells and proteomes. Finally, they demonstrate synthetic lethality in organoids for Molidustat and APC deletion.

      Strengths:

      The data on Molidustat proteomes, GSTP1 binding, inhibition and metabolic health of organoids is really clear. All biochemical, docking and omic data are really strong. The potential impact of these findings could be the use of Molidustat in APC null tumours and awareness of potential off-target effects.

      Weaknesses:

      A main but minor weakness is that Molidustat also inhibits other PHDs, although these are less expressed. PHD1 has been shown to control the cell cycle and be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. The use of MTT is not very good to detect viability when it measures metabolism; this also needs to be discussed and perhaps supplemented with colony or cell number measurements.

      Great point; for this reason, we have assayed apoptosis throughout. In addition, we have added a clonogenicity assay with APC organoids. Organoid cells were treated with an acute dose of Molidustat. We subsequently measured the level of Lgr5 (a stem cell marker) and the ability of the cells to generate organoids (these data have been added as Figure 5 F-G).

      Reviewer #3 (Public review):

      In this paper, the authors revealed that Molidustat can induce a dose-dependent increase in Caspase-3/7 activity in the HT29 cell line, which is an APC-mutant colorectal cancer cell line. More importantly, they found that targeting PHD2 alone cannot cause cell death. By using thermal proteome profiling (TPP) and orthogonal chemical proteomic competition assays, they determined GSTP1 as a previously undiscovered off-target of Molidustat. They also revealed that combined PHD2 and GSTP1 loss leads to an increase in intracellular ROS and apoptosis. Moreover, they evaluated the effects of Molidustat in colonic organoids and showed that Molidustat has a high selectivity for colonic organoids with activated WNT signaling and/or KRAS pathway alterations, and this effect is not reproduced by hydroxylase inhibition alone, providing a new potential approach to targeting both PHD2 and GSTP1 for the treatment of APC-mutant CRC.

      Specific comments:

      (1) What is the possible molecular mechanism of dual GSTP1/PHD2 loss, inducing cell death?

      This is an important question. Our data support a model in which combined loss of GSTP1 and PHD2 disrupts cellular redox homeostasis, leading to accumulation of reactive oxygen species, increased GSSG/GSH ratios, and depletion of antioxidant buffering capacity. This redox imbalance is accompanied by downregulation of pro-survival pathways. In this context, activation of apoptotic signalling, as evidenced by increased caspase-3/7 activity and proteomic enrichment of apoptosis-associated pathways, contributes to the observed cell death phenotype.

      While apoptosis is supported by our data, the magnitude of oxidative stress suggests that additional oxidative stress-associated cell death mechanisms may also contribute. We have clarified this point in the Discussion (Page 11).

      (2) Can the authors mutate the binding site of Molidustat on GSTP1 to verify the in silico docking results?

      This is a very important question. Currently, the docking model is of limited value. Reviewer 1 raised a similar point, and we refer the reviewer to our response to Reviewer 1, point 2.

      (3) Evidence for Molidustat inhibiting PHD2 activity or stabilising HIF-1α should be provided.

      We thank the reviewer for this suggestion. Data showing HIF-1α stabilisation and evidence of downstream signalling are now included in Supplementary Figure 1.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I only have minor suggestions:

      Molidustat also inhibits other PHDs, although these are less expressed. PHD1 has been shown to control the cell cycle and be expressed in the colon, where it is needed for viability. Although this does not explain the lack of effect of other PHD inhibitors, it does warrant some discussion. The use of MTT is not very good to detect viability when it measures metabolism; this also needs to be discussed and perhaps supplemented with colony or cell number measurements.

      This is correct: PHD1 is of particular interest, given the effects that its inhibition or knockout has on the inflamed colon. We have added a new paragraph to the Discussion (Page 13) that addresses the isoform selectivity of Molidustat. We note that, although developed as a PHD2 inhibitor, Molidustat retains appreciable activity against PHD1 and PHD3 [4], and we discuss the non-redundant and in some contexts opposing roles of PHD1 and PHD2 in the colon: PHD1 loss is protective in DSS colitis [5] and restrains colitis-associated tumour growth, whereas PHD2 loss in the tumour and stroma is reported to inhibit metastasis and improve treatment response [6]. We further note that this pattern of isoform engagement is shared with other pan-PHD inhibitors that did not phenocopy Molidustat in our screens, indicating that PHD isoform profile alone is insufficient to explain Molidustat’s distinctive activity and pointing to GSTP1 off-target engagement as the key distinguishing feature. We argue that localised colonic delivery (as discussed earlier in the Discussion) would concentrate drug at the APC-mutant epithelium while limiting systemic exposure.

      We fully agree with the reviewer that MTT measures metabolic activity/NADH levels rather than viability in the strict sense, and that this is particularly relevant for a compound that perturbs redox metabolism. We have added a clonogenicity assay in APC organoids (Fig. 5 F-G) to supplement the MTT and Cleaved Caspase 3 assays already present in the manuscript.

      (1) Lee, J. K. et al. Directed evolution of CRISPR-Cas9 to increase its specificity. Nat. Commun. 9, (2018).

      (2) Sakovina, L., Vokhtantsev, I., Vorobyeva, M., Vorobyev, P. & Novopashina, D. Improving Stability and Specificity of CRISPR/Cas9 System by Selective Modification of Guide RNAs with 2′-fluoro and Locked Nucleic Acid Nucleotides. Int. J. Mol. Sci. 23, (2022).

      (3) Makar, A. N., Holkham, J., Lilla, S., Wilkinson, S. & von Kriegsheim, A. Overcoming preservation challenges to enable single-cell proteomics of fixed cell and tissue samples with retained proteome integrity. Preprint at https://doi.org/10.1101/2025.03.10.642380 (2025).

      (4) Flamme, I. et al. Mimicking hypoxia to treat anemia: HIF-stabilizer BAY 85-3934 (molidustat) stimulates erythropoietin production without hypertensive effects. PLoS One 9, (2014).

      (5) Tambuwala, M. M. et al. Loss of prolyl hydroxylase-1 protects against colitis through reduced epithelial cell apoptosis and increased barrier function. Gastroenterology 139, (2010).

      (6) Leite de Oliveira, R. et al. Gene-Targeting of Phd2 Improves Tumor Response to Chemotherapy and Prevents Side-Toxicity. Cancer Cell 22, (2012).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study used pupillometry to provide an objective assessment of a form of synesthesia in which people see additional color when reading numbers. It provides convincing evidence that subjective color ratings are matched by changes in pupil size that recapitulate brightness-mediated changes when exposed to the real color. The work provides a valuable contribution to the literature on both synesthetic perception and the use of pupillometry to probe perception and related psychological processes.

      We were pleased to learn that our manuscript was of interest to the reviewers and the editor. We thank the reviewers for their useful feedback and have addressed all their comments in the revised version. We here give the most prominent changes as quotes.

      We thank all reviewers for their very helpful input.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Knowing that small pupil-size variations accompany brightness variations (even when these are illusory), the authors asked whether pupil constrictions would accompany the synesthetic perception of a brighter color (compared with a darker one), induced by the presentation of a black-white character. This grapheme-colour synesthesia is only experienced by a few participants, sixteen of whom were enrolled in this study. The results reliably showed that a relative pupil constriction would "betray" the perception of a brighter color in these participants, while no such effect would be observed in control participants who were asked to report a color in association with each grapheme, even though they did not perceive any.

      Strengths:

      The main strength of the study lies in its combination of psychophysics (brightness ratings) and pupillometry, which allowed for showing clear-cut results.

      Weaknesses:

      Some relatively minor weaknesses concern the ancillary analyses, which tackle secondary questions and are not entirely convincing.

      (1) The linear mixed model approach is a powerful way to identify important variables, but it does not clarify whether the key factors are between-subject or between-trial variations. Some variables are inherently defined at a subject level (e.g., PA scores), others are not. I would strongly recommend an alternative visualisation of the results to examine inter-individual variability.

      Visualizing the highly idiosyncratic effects is indeed challenging. Addressing R1’s point 4 and a point brought up by R2, we updated all figures to now visualize pupil size in millimeters instead of arbitrary units. Furthermore, we added a supplementary figure (supplementary figure 4) that visualizes pupil size change without demeaning (please see reply to point 4).

      To give a better grasp of the interaction between lightness and coupling strength, we further included Supplementary Figure 5, which splits the data by lightness and coupling strength in synesthetes.

      Furthermore, as this review and response will be publicly available, Author response image 1 provides participant-mean traces per lightness bin in addition to the overall means and hopefully makes the stability/variability of effects visually clearer (in addition to the strip plots that attempt this for the average response).

      Author response image 1.

      We hope that these additional visualizations make the effects of interest more transparent. Ultimately, however, the LME figure likely provides the information best, albeit at the cost of complexity.

      (2) It is not clear why taking the first derivative of pupil size in Figure 5 would isolate the effect of arousal, eliminating those of luminance and contrast changes (in fact, one could argue for the opposite, since arousal effects are generally constant for extended periods of time while contrast effects are typically more local and transient).

      First, please note that the results in 2.3.1 cannot be explained by task or context effects such as luminance and contrast: the exact same active color reporting task (same task and context) was presented to synesthetes and non-synesthetes.

      The reviewer is correct that the first derivative does not eliminate other concurrent pupil-driving effects; this was expressed incorrectly in our original text. Indeed, any stimulus-locked effect, such as the luminance and contrast effects but also the effort effect, will be reflected similarly in the derivative measure.

      We took the derivative because pupil responses driven by non-trial-related activity, such as increasing tiredness or excitement over the course of trials, differ almost by necessity between participants, thus creating variability. However, these effects most likely unfold on a slower timescale and thus show up less in the derivative measure. Accordingly, we previously found clearer response-locked effects when using a derivative measure (Douze et al., 2025; Ten Brink et al., 2024). In this way, we also hoped to remove such between-participant variability for this between-participant analysis.

      Even with a conventional baseline-corrected analysis, we arrive at the same conclusion. We directly compared baseline-corrected pupil sizes between groups using per-timepoint LMEs that take individual differences into account; in other words, we tested the same question without relying on the derivative. Group (active control vs. synesthete) reached significance between ~1.7 s and 3 s, aligning with the derivative-based result.

      Author response image 2.

      t-values of a per-timepoint LME predicting pupil response from group (synesthete/active control). Group reached significance.

      In sum, we deem the derivative more powerful/more appropriate in this context, but the interpretation of findings does not hinge on that analysis choice (as can be seen in the Author response image 2).

      We corrected the claims about the derivative as a measure that cleans out other effects, which were indeed oversimplified as they stood. We now write:

      “Mental effort presents in task-evoked pupil dilations, yet other factors simultaneously affect the pupil, such as luminance and contrast changes at trial onset, as well as slower trends across the session (e.g., fatigue). To reduce the influence of these slower, non-trial-locked fluctuations while retaining the trial-evoked dynamics, we calculated the first derivative of the pupil time course to assess the velocity of pupillary changes (Butterworth filter, 18 Hz, order 3, 2.5 Hz lowpass, following our previous works [60, 61]).”
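
      The rationale for the derivative measure can be illustrated with a minimal sketch. It is not the authors' pipeline: a centered moving average stands in for the Butterworth low-pass filter named in the quoted Methods text, and the sampling rate, drift slope, and evoked response are hypothetical values chosen only for illustration. The sketch shows that a slow drift contributes only a small constant to the velocity signal, whereas a trial-locked change dominates it.

```python
from statistics import mean

def moving_average(x, width):
    """Centered moving average; near the edges only the available samples are used.
    (A simple stand-in for a proper low-pass filter, not the Butterworth filter itself.)"""
    half = width // 2
    return [mean(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

def velocity(x, fs):
    """First derivative (units per second) via central differences."""
    last = len(x) - 1
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - 1), min(last, i + 1)
        out.append((x[hi] - x[lo]) * fs / (hi - lo))
    return out

FS = 100  # Hz; hypothetical sampling rate

def evoked(t):
    """Hypothetical trial-locked constriction: 0.2 mm lost between 1.0 s and 1.5 s."""
    if t < 1.0:
        return 0.0
    return -0.4 * (min(t, 1.5) - 1.0)

times = [i / FS for i in range(300)]
# Hypothetical trace: 4 mm baseline, 0.02 mm/s slow drift, plus the evoked response.
trace = [4.0 + 0.02 * t + evoked(t) for t in times]
vel = velocity(moving_average(trace, 5), FS)

print(round(vel[50], 3))   # before the response: only the drift slope remains
print(round(vel[125], 3))  # during the response: drift slope plus evoked slope
```

      In this toy trace the drift adds a constant of 0.02 mm/s throughout, while the evoked constriction adds a slope of -0.4 mm/s during its window, so the velocity signal is dominated by the trial-locked change.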

      Douze, B. T., Ten Brink, A. F., Dijkerman, H. C., & Strauch, C. (2025). Pupil responses objectively index pharmacologically altered tactile sensitivity. Cortex, 193, 90-104.

      Ten Brink, A. F., Heiner, I., Dijkerman, H. C., & Strauch, C. (2024). Pupil dilation reveals the intensity of touch. Psychophysiology, 61(6), e14538.

      (3) It is a pity that responses to physical brightness modulations were only measured in the synesthete group, not in controls, as this would have allowed for ruling out differences in pupil reactivity across the two populations.

      The reviewer is correct that this would allow additional comparisons, but we argue that light responses in healthy control samples are very well documented and stereotypical. For instance, Bergamin & Kardon (2003) provide very systematic latency estimates: about 320 ms for low-luminance-change stimuli, shortening to about 250 ms for very strong luminance changes. Responses to our relatively small luminance increments should thus be expected in this range. Indeed, this also describes well the response latencies we observed in synesthetes when exposed to the colored disks. While there is no detailed information about participants in Bergamin & Kardon (2003), data from previous studies show very similar pupil light response profiles in a healthy student control population that matches our synesthetes well demographically (Strauch, Romein et al., 2022, Figure 2a, from the same lab as the present study; Koevoet et al., 2025, Figure 3a). See also our further responses: baseline pupil size in millimeters did not differ across groups.

      Together, we can safely conclude that pupil light responses in synesthetes are not different from pupil light responses in controls. We agree with the reviewer that this is a sensible point to also make in the manuscript:

      “Specifically, pupil size first responded significantly to physical luminance after 330 ms (see Supplementary Figure 7 for per-timepoint LME; in line with response latencies of similar control populations, see Bergamin & Kardon [52], Koevoet et al. [40], and Strauch et al. [53]), but only responded significantly to synesthetic lightness at about 870 ms (see also Figure 3c vs e and Figure 4 for per-timepoint LME)”.

      Bergamin, O., & Kardon, R. H. (2003). Latency of the pupil light reflex: sample rate, stimulus intensity, and variation in normal subjects. Investigative Ophthalmology & Visual Science, 44(4), 1546-1554.

      Koevoet, D., Naber, M., Strauch, C. & Van der Stigchel, S. Presaccadic Attention Shifts Up-and Downwards: Evidence From the Pupil Light Response. Psychophysiology 62, e70047 (2025).

      Strauch, C., Romein, C., Naber, M., Van der Stigchel, S., & Ten Brink, A. F. (2022). The orienting response drives pseudoneglect—Evidence from an objective pupillometric method. Cortex, 151, 259-271.

      (4) Another concern is with the visualisation of the pupil traces in Figure 3 (main results); these were heavily pre-processed (per-participant demeaned), losing any feature besides the effect of interest and generating the unrealistic expectation that perception of dark/bright colors generate a net dilation/constriction of the pupil - whereas perception-related modulations of pupil size are always relative and generally small compared to the numerous other effects registered in pupil size. It would be far better to see the actual profiles, preserving the unfolding of dilations and constrictions over time, especially since these are further analysed in Figures 4 and 5.

      Indeed, the expectation that any dark synesthetic experience would lead to pupil dilation whereas any bright synesthetic experience would lead to constriction is not warranted – it would only do that relative to the counterfactual of not having that experience.

      Many factors affect the pupillary signal at the same time, and often differently across individuals (think of tiredness etc.), making merely baseline corrected traces seemingly noisy. Our visualization highlights that there is a systematic part to that variation that lies in the synesthetic brightness experience.
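
      Per-participant demeaning of the kind discussed in this point can be sketched as follows. This is an illustrative assumption about the procedure, not a description of the authors' code: it assumes the participant's mean trace is computed across conditions at each time point and subtracted from every condition trace. All names and data values are hypothetical.

```python
from statistics import mean

def demean_per_participant(traces):
    """traces: {participant: {condition: [samples...]}}.
    Returns the same structure with each participant's across-condition
    mean trace subtracted sample by sample (assumed procedure)."""
    out = {}
    for pid, conds in traces.items():
        n = len(next(iter(conds.values())))
        grand = [mean(tr[i] for tr in conds.values()) for i in range(n)]
        out[pid] = {
            name: [v - g for v, g in zip(tr, grand)]
            for name, tr in conds.items()
        }
    return out

# Hypothetical data (mm): two participants with different overall pupil sizes
# but the same bright/dark difference.
traces = {
    "p1": {"bright": [3.00, 2.95, 2.90], "dark": [3.00, 3.05, 3.00]},
    "p2": {"bright": [5.00, 4.95, 4.90], "dark": [5.00, 5.05, 5.00]},
}
demeaned = demean_per_participant(traces)
for pid in demeaned:
    print(pid, [round(v, 2) for v in demeaned[pid]["bright"]])
```

      After demeaning, both hypothetical participants show the same relative bright/dark pattern despite a 2 mm difference in overall pupil size. This is also why demeaned traces convey only relative differences between conditions and not the absolute time course.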

      Visualizing the effects of idiosyncratic experiences that vary within and between participants is challenging. For the theoretical insight brought about through our paper in Figure 4 (synesthesia being sensory in nature), demeaning is favorable in our opinion, as it isolates the effect of interest in the visualization. However, for methodological reasons and to better show effect sizes, there is certainly value in additional transparency. We thus now provide non-demeaned traces in the supplementary material, as the reviewer suggested, and also refer to these in the main manuscript. Furthermore, all figures are now provided in millimeters, with all pupil-related analyses rerun and updated to this end (without qualitative changes to the results). This should further rectify possibly inflated expectations about the absolute size of effects and allows effects to be put into perspective across studies. We now added:

      “Pupillary data were transformed from arbitrary EyeLink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”
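
      As a rough illustration of such a conversion, the sketch below scales recorded values by a reference measurement of an artificial pupil of known diameter. It rests on assumptions not stated in the response: that the reference was recorded at the same viewing distance and with the same tracker settings, and that the scaling is linear when the tracker reports diameter and follows a square root when it reports area. The function name and all numbers are hypothetical.

```python
import math

def units_to_mm(sample_units, ref_units, ref_mm, mode="diameter"):
    """Convert an arbitrary-unit pupil sample to millimeters of diameter.

    ref_units: arbitrary-unit reading obtained for the artificial pupil
    ref_mm:    known diameter of the artificial pupil in millimeters
    mode:      "diameter" (linear scaling) or "area" (square-root scaling)
    """
    if mode == "diameter":
        return ref_mm * sample_units / ref_units
    if mode == "area":
        return ref_mm * math.sqrt(sample_units / ref_units)
    raise ValueError("mode must be 'diameter' or 'area'")

# Hypothetical calibration: a 4 mm artificial pupil recorded as 4000 units.
print(units_to_mm(4300, 4000, 4.0))           # diameter mode
print(units_to_mm(16000, 4000, 4.0, "area"))  # area mode
```

      Which mode applies depends on how the recording was configured, so the choice should be checked against the recording settings before any conversion is applied.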

      Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.

      Impact:

      Despite these weaknesses, and especially if they are adequately addressed in the review, this work is likely to improve our understanding of synesthesia, providing a new tool to quantify the subjective sensations; an interesting potential extension would be using pupillometry for tracking changes over time of the synesthetic experiences, opening up the possibility to evaluate the importance of learning for this peculiar experience.

      We were happy to read our manuscript was evaluated this positively and hope that our replies can address the remaining smaller concerns and make findings more transparent to the readers.

      Reviewer #2 (Public review):

      Synesthesia is a neurological condition where stimulation of one sensory channel leads to involuntary, automatic, and consistent experience of another, unrelated percept. For example, Sir Francis Galton (1880, Nature) famously described the robust tendency of some individuals (synesthetes) to associate numerals with a distinct color. Ever since, synesthesia has continued to attract a broad interest in the cognitive neurosciences in light of its implications for the study of domains such as perception, consciousness, and brain connectivity, among others.

      Strauch, Leenaars, and Rouw measured pupil size in a group of 16 grapheme-color synesthetes and two matched control groups. The participants were presented with gray digits - that is, visual stimuli having identical physical properties in terms of brightness. Each participant subsequently rated the corresponding evoked color and brightness: unlike controls, synesthetes did so in a very consistent and reliable fashion. Accordingly, this was also shown in their pupils: despite the same objective luminance, digits associated with brighter percepts caused their pupils to constrict, and digits associated with darker percepts caused their pupils to dilate more than controls. These results highlight how crossmodal correspondences are deeply rooted in synesthetes, and put forward pupillometry as a particularly appealing biomarker for some phenomenological experience (at least those grounded in "brightness").

      Further strengths of the technique are its temporal resolution and its responsiveness to several constructs. Across several tasks, the authors show, for example, that responses to synesthetic light are somewhat slower than responses to real light (i.e., they are likely mediated), but at the same time faster than responses to mental imagery. The role of mental imagery can also be reasonably dismissed when considering the second feature of pupil size: its responsiveness to mental effort and cognitive load. The pupils tend to dilate with demanding, challenging tasks, and this was the case when control participants were asked to report the color of a digit for which they did not consistently experience a synesthetic association. The same task was, instead, seemingly effortless for synesthetes, again speaking in favor of the automaticity of number-color correspondences in their case.

      Overall, the findings by Strauch, Leenaars, and Rouw are highly significant for the field and likely to be impactful. The strength of their evidence, when accounting for the relatively small sample size and the inherent variability of both phenomenology (color perception and subjective reporting) and physiology (pupil size), is adequate and sufficiently convincing.

      We were glad to read this overall very positive assessment of our work and thank the reviewer for the additional non-public suggestions for improvements.

      Reviewer #3 (Public review):

      Summary:

      In the present study, the authors examined pupillary responses to uncolored stimuli (number graphemes) among number-color synesthetes and non-synesthetes. After seeing a digit, the synesthetes and active control participants were asked to indicate which color they perceived using three dimensions of hue, saturation, and lightness. The lightness values were the primary independent variable for follow-up analyses. To see how the pupil responded to psychologically "bright" and "dark" digits, the authors split the reported lightness values at the median and plotted them. The synesthetes showed a pupillary constriction to digits they perceived as bright and dilation to digits they perceived as dark. Active control participants did not show that effect. In a subsequent block, only the synesthetes were shown the colors they reported perceiving as colored discs. Their pupillary responses were similar. The authors also found that the differences in pupillary responses between light and dark perceptions (with digits) were only slightly delayed in their onset to the perception of a colored disc, and therefore, the color perception accompanying a digit is unlikely to be effortful or a retrieved association, but occurs rather automatically.

      Strengths:

      The authors employed a well-controlled and designed quasi-experiment comparing color-grapheme synesthetes to non-synesthetes and showed convincingly that the color perceptions accompanying graphemes alter the physical perception of brightness. They also made a reasoned attempt to rule out the possibility that color associations are occurring effortfully via retrieved associations.

      We appreciate the positive assessment and useful suggestions for revision.

      Weaknesses:

      There are some areas in which the implications of these findings could be elaborated upon. I had the following questions:

      (1) Are the pupillary responses among synesthetes, which objectively do not seem to match the degree of physical stimulation entering the retina, in any way maladaptive for eye functioning? I understand the constriction/dilation of the pupil to not only benefit visual acuity but also to protect the retina from damage. Are synesthetes at any risk of retinal damage due to over-dilation of the pupil to brighter stimuli? Or are these effects of a magnitude that is too small to matter? As reported in arbitrary units, it was hard to know how large these effects were in terms of measurable changes in dilation (e.g., millimeters).

      This is an interesting point. Some argue that pupil size changes in the mid-range mildly affect optics, and thereby detection performance, contrast perception, and depth of field (Eberhardt et al., 2022; Mathôt & Ivanov, 2019; Ruuskanen, Boehler, & Mathôt, 2025), rather than serving a protective role for the retina (Mathôt, 2018). Indeed, any effects reported here were quite small. We agree with the reviewer that this can be made more accessible by reporting effects in millimeters. We have thus adjusted all figures accordingly and write in the Methods section:

      “Pupillary data were transformed from arbitrary EyeLink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”

      Note that even the largest effects here (those elicited by physical luminance change in block 2 for the synesthetes) only caused differences in pupil size of about 0.3 mm. This lies below the maximal pupil dilations observable in response to maximal effort (about 0.5 mm), for instance, and substantially below the full range of pupil size changes elicited through strong luminance stimulation (several millimeters). We therefore deem the changes in pupil size obtained in our study too minor to be practically maladaptive for optics/perception.

      Eberhardt, L. V., Strauch, C., Hartmann, T. S., & Huckauf, A. (2022). Increasing pupil size is associated with improved detection performance in the periphery. Attention, perception, & psychophysics, 84(1), 138-149.

      Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.

      Mathôt, S., & Ivanov, Y. (2019). The effect of pupil size and peripheral brightness on detection and discrimination performance. PeerJ, 7, e8220.

      Mathôt, S. (2018). Pupillometry: Psychology, physiology, and function. Journal of cognition, 1(1), 16.

      Ruuskanen, V., Boehler, C. N., & Mathôt, S. (2025). The Interplay of Spontaneous Pupil-Size Fluctuations and EEG Power in Near-Threshold Detection. Psychophysiology, 62(3), e70035.

      (2) Likewise, is the automatic synesthetic merging of two percepts something that could be learned such that natural synesthetes and "artificial" synesthetes would look similar? For example, if a group of non-synesthetic participants were to learn a color-grapheme association to automaticity, would you expect their pupillary responses to the graphemes to look similar to the synesthetes'? If so (or if not), would this tell us anything about the phenomenology of synesthesia?

      We find this question most interesting. Likely, different synesthesia researchers wouldn’t even fully agree on the most plausible answers to these questions. Training studies have shown that nonsynesthetes can be trained to associate particular colors to particular graphemes, as revealed in the synesthetic Stroop effect: interference effects of the learned color onto reporting the typeface color of the grapheme. The degree to which non-synesthetes can be trained to become similar to synesthetes is however still topic of debate.

      We now discuss as follows:

      “Future studies could examine to what degree training a non-synesthete to associate specific colors to particular inducers (e.g., digits), can provide similar patterns of results as genuine synesthesia (Bor et al., 2014, Colizoli et al., 2012, Rothen & Meier, 2014). Could learning produce similar brightness-related pupil effects in non-synesthetes? Similarly, would effort-linked responses diminish with increased training duration? The perhaps most interesting question relates to response latencies: Would a trained participant ever be able to produce brightness-related pupil effects as fast as a synesthete?”

      Bor, D., Rothen, N., Schwartzman, D. J., Clayton, S., & Seth, A. K. (2014). Adults can be trained to acquire synesthetic experiences. Scientific reports, 4(1), 7089.

      Colizoli, O., Murre, J. M., & Rouw, R. (2012). Pseudo-synesthesia through reading books with colored letters. PloS one, 7(6), e39799.

      Rothen, N., & Meier, B. (2014). Acquiring synaesthesia: insights from training studies. Frontiers in human neuroscience, 8, 109.

      (3) Do the synesthetic perceptions of digit graphemes merge in a sensible way? For example, if a synesthete sees a particular color with the digit 1, and a different color with the digit 9, what do they perceive when they see 19? or 1-9, or 1 9? Is there color blending, or an altogether different color perception?

      This is a very interesting question indeed. While each synesthete will have their own specific expression of synesthesia, there are regularities in how a combination of digits evokes synesthetic color. First, if asked about the color of a specific digit, each digit keeps its own color, as the color of a digit is linked to the identity of the digit (Dixon et al., 2006). Context effects are however possible, in particular when context alters the interpretation of the digit (Myles et al., 2003). A particularly common context in a multi-digit number is a dominant first digit, spreading its color to the subsequent digits in the number. However, as the digit color is linked to digit identity, what does ‘not’ happen is a mixing of colors into a qualitatively new color; for example, a yellow "1" and blue "9" do not merge into a green "19".

      Dixon, M. J., Smilek, D., Duffy, P. L., Zanna, M. P., & Merikle, P. M. (2006). The role of meaning in grapheme-colour synaesthesia. Cortex, 42(2), 243-252.

      Myles, K. M., Dixon, M. J., Smilek, D., & Merikle, P. M. (2003). Seeing double: The role of meaning in alphanumeric-colour synaesthesia. Brain and Cognition, 53(2), 342-345.

      Many thanks for the constructive assessment of our work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I am not sure I'd use the term 'cross-modal' given that the case considered here (grapheme-color) is purely visual.

      The reviewer is absolutely right: the term 'cross-modal' has a historical background rather than reflecting an exact factual accuracy. The term is still commonly used however, as it readily reflects how the induced additional experience is always of a different (sub)type than the inducing experience. There is a cross-over between experiences that might occur within the same sensory modality, or even induce awareness of a particular concept. But key to synesthesia is the crossover experience as the inducer and concurrent are different (sub)types of experiences. For example, seeing a letter can evoke a synesthetic experience of seeing a color, or evoke awareness of a particular gender or personality of that letter, but does not evoke another letter. To remain consistent with literature, we refer to 'cross-modality' when explaining the link to previous literature, but generally switched to using 'cross-over experience':

      “Therefore, synesthesia might provide a unique window into how the brain’s constructive processes can generate additional, conscious content, in cross-over experiences, often across modalities, going all the way down to the level of sensory phenomenology.”

      We adjusted throughout the manuscript accordingly.

      (2) I would not recommend focusing the introduction on the problem of qualia; this is a much more general and complex question than the one addressed in the study; the space of the introduction may be better used to present the actual object of study, giving a better picture of the synesthetic phenomenon and of previous work aimed at characterising it (behavioural, including PA scores and consistency measures, and neuroimaging). It is important to discuss how the pupillometric approach differs from the previously adopted neuroimaging techniques and what it can add to those.

      We agree that qualia is a very general and complex question. However, we respectfully disagree that this complex question is not the object of the study. What is remarkable about synesthesia is not the presence of an additional perceptual association per se, but the presence of a specific perceptual experience. As an illustration, think of a test where an unconscious color association to the word 'banana' was tested. While a generic 'yellow' could semantically be linked and would likely be obtained in the (e.g. priming) experimental results, a follow-up question of picking on a color wheel the exact shade of yellow to this association, or describing the perceptual sensation of the color, would be nonsensical to the participants.

      This sharply contrasts with the current study: synesthetes, but not non-synesthetes, indicate a perceptual sensation of additional colors, and subsequently indeed the sensory properties of this percept (experienced brightness) affect the objective reflection of this sensation (pupil size) in synesthetes but not in non-synesthetes. In our view, the presence of additional qualia is key in understanding what sets synesthetic apart from non-synesthetic associations, including so-called cross-modal correspondences (unconscious consistent associations across modalities, common to us all). We even believe that the reported qualia are what make synesthesia so interesting in the first place. We now explain this link to qualia more clearly in the introduction.

      "The most remarkable aspect of synesthesia is the subjective perceptual phenomenology of the induced colors, setting these sensations apart from color memory, thought, or amodal association. The contrast between synesthetes and non-synesthetes can thus offer an interesting doorway into examining qualia, the subjective perceptual phenomenology or first person (what's-it-like) perspective."

      We also improved the explanation of the synesthetic phenomenon, including a more detailed characterisation of behavioural measures (including consistency scores) and added neuroimaging studies. These changes have been incorporated into the text in response to previous comments (point 1- reviewer 1).

      Please note that we have chosen not to include more detailed discussion of PA scores. Our results show a trend but do not allow for a conclusive interpretation on PA scores, and we feel that placing greater emphasis on this topic might therefore be confusing or even misleading. Still, it would be a very interesting topic for follow-up research to examine how alterations in characteristics of the synesthetic experience influence pupil responses.

      The different synesthesia types all share the defining characteristics of an additional conscious and consistent experience. Synesthetes can verbally report their additional experience, and synesthetic sensations can be measured in behavioral paradigms such as the ’synesthetic Stroop’ effect, or brain activation patterns in sensory cortex [15]. Furthermore, test-retest paradigms show how synesthetic, but not non-synesthetic associations are highly specific and consistent [16-18]. Thus, over the past decades, research has established synesthesia as a ’real’ condition that can reliably be identified using behavior, neurophysiology, and neuroimaging [11, 13, 15–21]. The most remarkable aspect of synesthesia is the subjective perceptual phenomenology of the induced additional sensation, i.e., color in grapheme-color synesthesia. This sets synesthetic sensations apart from (color) memory, thought, or amodal association. Synesthesia can thus offer an interesting doorway into examining qualia, the subjective perceptual phenomenology or first person (what’s-it-like) perspective.

      We now discuss the pupillometric approach as it differs from the previously adopted neuroimaging techniques as follows:

      “Compared to neuroimaging studies [12,15,51], pupillometry may offer a more direct window into synesthetic phenomenology, as the directionality between pupil light reflex and perceived brightness is straightforward. Finally, improved understanding of the underlying processes can be obtained by contrasting responses to perceived versus actual (physical) brightness, given that the pupil light reflex is a well-characterised reflex arc involving few inferential steps.”

      This adds to the explanation that was already present on how the current approach differs from previous techniques, and what it can add to those techniques:

      "Instead, current paradigms capturing synesthesia employ objective measures, but fail to capture its phenomenology [16, 17, 21, 23]."

      (3) There are a few typos and word repetitions.

      Many thanks – we identified typos and repetitions after another set of careful reads and hope to have eradicated them completely now.

      Reviewer #2 (Recommendations for the authors):

      I am overall very supportive of this work, but addressing the following points may enrich it further:

      (1) Paragraph 2.2.1. Here, models do not seem to compare synesthetes versus controls but rather assess the effects of interest separately in the two groups. The fact that experimental effects are significant in synesthetes, but not in controls, does not tell us much about differences between groups. Controls (e.g., Figure 3) do show a similar trend, albeit clearly smaller. There is one passage in which this issue appears to be tackled (page 10): "Critically, in an LME ran on synesthetes and controls and using only graphemes and the interaction of group and lightness as predictors, we found lightness to predict pupil size in synesthetes (t = -2.754, p = 0.006), but not controls (t = -1.134, p = 0.257)." But I am not sure that the reported statistics belong to the interaction - they seem to refer to the lightness effect within each group, not the difference.

      This is an important point: power for between-group comparisons is inherently limited with n = 16 per group (comparisons of overall responses remain feasible, but things become trickier when fewer trials remain). A simple model of pupil ~ grapheme + group * lightness_scaled + (1 | participant) shows no significant interaction (even though the lightness effect is significant in one group and not in the other). The additional negative effect for group is in line with the effort-related effect reported later in the manuscript. Where does this leave us? Based on the lightness responses alone, the group difference can be characterized as a quantitative distinction, but the degree to which it is also a qualitative distinction cannot be clearly determined from the current data. We revised the manuscript so that such an interaction is no longer implied, and we explicitly point to the absence of a significant interaction.
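
      For readers less used to Wilkinson notation, the joint model named above can be written out term by term. This is only a sketch: the dummy coding of group and the normality assumptions on the random intercept and residual are the usual LME conventions and are assumed here, not taken from the manuscript.

```latex
\mathrm{pupil}_{ij} = \beta_0 + \gamma_{g(i,j)} + \beta_1\,\mathrm{group}_i
  + \beta_2\,\mathrm{lightness}_{ij}
  + \beta_3\,(\mathrm{group}_i \times \mathrm{lightness}_{ij})
  + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      Here i indexes participants, j indexes trials, \gamma_{g(i,j)} is the fixed effect of the grapheme shown on that trial, and u_i is the per-participant random intercept. Under this coding, \beta_3 is the difference in lightness slopes between the two groups, so the test of \beta_3 is the one that addresses the between-group question; the per-group lightness tests do not.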

      The sensory nature of synesthetic color is supported by within-synesthete analyses, where coupling strength parametrically modulates the lightness-pupil relationship in a theoretically predicted manner. Importantly, the effort-related findings provide a complementary and statistically robust group comparison: synesthetes and controls performing the identical color-reporting task showed significantly different pupil dilation rates, directly demonstrating that the two groups differ in how they access color information. Together, these two independent pupillometric signatures, one tracking perceptual quality, one tracking effort, converge on the same conclusion and mutually reinforce the interpretation that synesthetic color constitutes genuine sensory phenomenology.

      Author response image 3.

      We now make this more explicit in the manuscript as follows:

      “We found significant modulations of pupil size by the lightness of the grapheme's synesthetic color - sustained and in the to-be-expected time window. Specifically, the pupil constricted more for brighter reported colors, and dilated more for darker reported colors, as predicted (Average pupil size 800-4000ms, t = -3.601, p < 0.001). In an LME run for synesthetes and controls and using only graphemes and lightness as predictors, we found lightness to predict pupil size in synesthetes (t = 2.844, p = 0.004), but not controls (t = 0.606, p = 0.544). However, when taking group as interacting factor in a joint LME, there was no interaction of lightness and group (t = -0.949, p = 0.342).”

      and

      “For controls a separate model was run, now without the PA score as predictor (not assessed for controls). Neither lightness (t = -0.815, p = 0.415), coupling strength (t = 0.438, p = 0.661), nor their interaction gained significance (t = -1.058, p = 0.290; all for average pupil size between 800 ms and 4000 ms). Critically, we also ran an LME with the three-way interaction of coupling strength, group, and lightness (Wilkinson notation: pupil = grapheme + group + lightness * group + coupling strength * lightness * group + (1 | participant)). This analysis revealed a significant three-way interaction between lightness, coupling strength, and group (F = 3.86, p = .021), indicating that the lightness × coupling strength effect on pupil size was not equivalent across groups. Decomposing this interaction by group, the lightness × coupling strength slope was significant in synesthetes (t = 2.59, p = .010) but not in controls (t = -1.01, p = .311), suggesting that reported lightness and its coupling strength were more consistently related to pupil size in synesthetes than in controls. Note, however, that this decomposition does not directly test whether the two slopes significantly differ from each other. Lastly, pupil size was marginally larger in controls than in synesthetes (t = 1.94, p = .062; see later sections for more in-depth analyses)”

      (2) The authors choose to analyze pupil size in arbitrary eye tracker units. This is fine, although I would recommend assessing and reporting whether the average pupil size (e.g., during the baseline) is roughly comparable between groups. The size of the effects may be difficult to compare between groups in the presence of very different baseline pupil size.

      Please see Author response image 4 for Baseline pupil sizes per group in millimeters. There were no differences between groups.

      Author response image 4.

      F(2, 45) = 0.707, p = 0.499 (one-way ANOVA).

      We now write:

      “Baseline pupil sizes did not differ between groups (F(2, 45) = 0.707, p = 0.499).”

      We agree with the reviewer that millimeters are a more intuitive measure and updated all figures throughout manuscript and supplementary materials accordingly. We also briefly added to signal processing that this conversion was applied.

      “Pupillary data were transformed from arbitrary eyelink units to millimeters using a conversion factor obtained with an artificial eye (see Hayes & Petrov, 2016).”

      Hayes, T. R., & Petrov, A. A. (2016). Mapping and correcting the influence of gaze position on pupil size measurements. Behavior research methods, 48(2), 510-527.

      (3) If I understand correctly, the main task counted 120 trials overall (12 per digit). It seems, however, that only 3 and 4 participants remained with at least 50 trials (or 25 per median split by lightness) after preprocessing. This appears to be quite a massive data loss: is there a reason behind it? Please also clarify: the overall percentage of discarded trials; whether the median split by lightness was computed on all responses or only on those of the remaining, valid trials.

      This is an important point for clarification indeed. The exclusion of participants in Figure 3 applies only to that particular visualization, not to the statistical analyses. The linear mixed effects models (LMEs) used all available valid trials from all participants, with no participant-level exclusions. The figure-specific threshold (≥25 trials per median-split bin) was applied purely for display clarity, as plotting participants with very few trials per bin would produce unreliable/noisy and thus visually misleading traces (as we note in the figure caption and point readers to Supplementary Figure 1, which shows the same visualization without any exclusions).

      Since the paradigm required participants to repeat discarded trials until 120 valid trials were collected, all participants thus contributed exactly 120 valid trials to the analyses. There was therefore no data loss at the analysis level for the LME that is central to the claims of the manuscript (albeit more complex to grasp than the t-tests between bins).

      Why were there sometimes so few trials per brightness bin?

      First, participants differed in how dark or bright their (synesthetic or forced-report) colors were overall, meaning that differing proportions would fall above or below the 0.5 cutoff, which represented the sample well overall (but not necessarily every single participant). Note that this median split was not performed per individual but across all color reports to allow an apples-to-apples comparison.
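
      The consequence of a shared cutoff can be illustrated with a small sketch. It counts trials per participant on either side of one pooled lightness cutoff and then applies the display-only threshold of at least 25 trials per bin. The function names, the toy data, and the choice to place a lightness exactly at the cutoff in the bright bin are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def trials_per_bin(reports, cutoff=0.5):
    """Count trials per participant below vs. at/above one shared lightness cutoff.

    `reports` is an iterable of (participant_id, lightness) pairs, lightness in [0, 1].
    Assigning lightness == cutoff to the bright bin is an assumption of this sketch.
    """
    counts = defaultdict(lambda: {"dark": 0, "bright": 0})
    for participant, lightness in reports:
        bin_name = "bright" if lightness >= cutoff else "dark"
        counts[participant][bin_name] += 1
    return dict(counts)


def eligible_for_plot(counts, min_trials=25):
    """Participants with at least `min_trials` trials in BOTH bins (display only)."""
    return sorted(
        p for p, c in counts.items()
        if c["dark"] >= min_trials and c["bright"] >= min_trials
    )


# Hypothetical data: both participants contribute 120 valid trials,
# but "s02" reports mostly bright colors.
reports = (
    [("s01", 0.2)] * 60 + [("s01", 0.8)] * 60
    + [("s02", 0.7)] * 110 + [("s02", 0.3)] * 10
)
counts = trials_per_bin(reports)
plotted = eligible_for_plot(counts)
```

      In this toy example both participants enter a trial-level analysis with all 120 trials, while only "s01" would pass the plotting threshold, which mirrors the distinction drawn above between the figure and the LMEs.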

      Second, participants often reported colors that differed in Hue and Saturation, but not Lightness. This is in line with synesthetes picking certain colors more often than others, as compared with non-synesthetes (Rouw & Root, 2019; Ward et al., 2025).

      We now include a new Supplementary Figure that visualizes responses on the Hue and Saturation dimensions of HSL space for both synesthetes and controls; fully saturated reports appear on the outer edge. We refer to the supplementary figure in the caption of Figure 2 as follows:

      "See Supplementary Figure 1 for color reports on the hue and saturation axes.”

      Rouw, R., & Root, N. B. (2019). Distinct colours in the ‘synaesthetic colour palette’. Philosophical Transactions of the Royal Society B: Biological Sciences, 374(1787).

      Ward, J., Maciel, S., Rouw, R., Simner, J., & Root, N. (2025). Synaesthesia is linked to differences in music preference and musical sophistication and a distinctive pattern of sound-color associations. Psychology of Music, 53(3), 453-473.

      Minor points:

      (1) "Building on this evidence, we hypothesized that the cross modal color phenomenology in synesthesia can, if truly sensory in nature, could likewise be (...)" -> may need rephrasing (can/could).

      Many thanks, fixed.

      (2) Caption of Figure 1: "Block 2 (synesthetes only): a colored disk and gray central patch, matching the average indicated color per digit, and the number and luminance of pixels of said digit were presented to assess externally triggered light responses." -> I find this sentence a bit hard to follow; perhaps consider rephrasing it.

      Agreed, we rephrased to:

      Block 2 (synesthetes only): a colored disk was presented, colored according to the synesthete's average indicated color for that digit. At its center sat a gray patch matching the luminance and pixel area of the original digit from Block 1, together allowing assessment of externally triggered light responses.

      (3) Figure 2 b: Consider truncating the y-axis to 1 if that improves the visualization.

      We adjusted the axis accordingly and added a bit more detail in the caption for the interpretation of the measure.

      (4) Caption of Figure 3 points to "see Supplementary Figure 1", but it should probably be SF2.

      Many thanks for spotting this; all references to supplementary figures have been checked and corrected.

      Elvio Blini

      Reviewer #3 (Recommendations for the authors):

      (1) As a minor comment, there are some terms that felt overused in the manuscript. For example, the words "extraordinary" and "exceptional" were used multiple times throughout. I believe I understand the authors to mean them in their descriptive sense (i.e., outside the realm of typical experience), but in context, those words make it seem like they are touting their own experiment as "exceptional" or "extraordinary," which I don't believe was their intention.

      We agree. We removed words such as exceptional and extraordinary when they do not directly refer to the sensation throughout the manuscript (which is indeed how we intended to use it). We hope that this removes unnecessary and convoluting hyperbole.

      (2) It seemed counterintuitive to me that the color consistency score would be reverse-coded. In this case, the scores actually seem to indicate inconsistency, rather than consistency. Perhaps the raw scores can be inverted for a more intuitive interpretation that aligns with the terminology. I understand that they were following a previous publication in their method (Rothen et al., 2013).

      This manner of coding is counter-intuitive indeed. However, there are both logical and practical reasons for this approach. Importantly, this is indeed the standard way of reporting color consistency in synesthesia research (Carmichael et al., 2015; Eagleman et al., 2007; Root et al., 2025; Rothen et al., 2013). The calculation is based on a simple logic: a higher number reflects a larger distance in color space. An additional advantage is the clear and intuitive zero reference: a score of zero implies choosing the exact same color. Finally, it intuitively reflects the distinction between synesthetes and non-synesthetes; there is by definition little variation across synesthetes (visualized at the bottom of the graph), then a 'cut-off line' (if consistency is used as diagnostic tool), and then the height of the range shows how large the range in consistency is, in that particular sample of non-synesthetes. In a way we therefore inherit a confusing definition/standard, but changing it would lead to new confusion instead. We now specifically clarify this in the caption as follows:

      “Note that higher consistency is reflected in lower color distance, hence lower values [17].”
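
      The logic of a distance-based score can be shown with a minimal sketch. It takes the repeated color choices for one grapheme and returns their mean pairwise Euclidean distance. The function name, the use of a generic three-coordinate color space, and the averaging are assumptions made for illustration; this is not the exact formula or color space of the cited test batteries.

```python
from itertools import combinations
from math import dist


def color_distance_score(choices):
    """Mean pairwise Euclidean distance between repeated color choices.

    `choices` is a list of 3-tuples in some color space (an assumption here).
    Lower values mean MORE consistent choices; 0 means identical choices.
    """
    pairs = list(combinations(choices, 2))
    if not pairs:
        raise ValueError("need at least two color choices")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical repeated choices for one digit.
consistent = [(0.9, 0.1, 0.1), (0.9, 0.1, 0.1), (0.9, 0.1, 0.1)]
inconsistent = [(0.9, 0.1, 0.1), (0.1, 0.9, 0.1), (0.1, 0.1, 0.9)]

score_consistent = color_distance_score(consistent)
score_inconsistent = color_distance_score(inconsistent)
```

      Identical choices give a score of exactly zero, which is the intuitive zero reference mentioned above, and any disagreement between choices pushes the score upward.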

      Carmichael, D.A., Down, M.P., Shillcock, R.C., Eagleman, D.M., Simner, J., 2015. Validating a standardised test battery for synesthesia: does the synesthesia battery reliably detect synesthesia? Conscious. Cogn. 33, 375–385

      Eagleman, D.M., Kagan, A.D., Nelson, S.S., Sagaram, D., Sarma, A.K., 2007. A standardized test battery for the study of synesthesia. J. Neurosci. Methods 159 (1), 139–145.

      Root, N., Chkhaidze, A., Melero, H., Sidoroff-Dorso, A., Volberg, G., Zhang, Y., & Rouw, R. (2025). How “diagnostic” criteria interact to shape synesthetic behavior: The role of self-report and test–retest consistency in synesthesia research. Consciousness and Cognition, 129, 103819.

      Rothen, N., Seth, A.K., Witzel, C., Ward, J., 2013. Diagnosing synaesthesia with online colour pickers: maximising sensitivity and specificity. J. Neurosci. Methods 215 (1), 156–160.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.

      We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a Cre mouse model used (FK164) has a relatively selective expression of Drd1a Cre in cortex, but indeed some expression is seen subcortically. This is an acknowledged limitation which is now explicitly addressed in the revised manuscript.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre-positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024). This has now been described in the revised manuscript.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      We thank the reviewer for their careful attention to the statistical analyses and for noting the inconsistencies in how the results of the spectral analysis were presented: in the text we described two-way ANOVAs with according posthoc tests but in the figures significance markers were positioned based on multiple t tests. We have now carefully revised the spectral results and implemented a consistent approach in statistical reporting and spectral plots. We have updated Supplementary Table T1, Figure 3a and S2 to ensure that all statistics are presented consistently throughout the manuscript, i.e. with two-way ANOVAs and accompanying posthoc tests. Please note that we performed all spectral analyses in the range between 0.5 and 128 Hz (excluding the range between 49-51.5 Hz due to electrical noise from the power grid) but only plot the range between 0.5-30 Hz as the spectral bands most relevant for sleep neurophysiology are contained in this range.
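
      The frequency selection described here can be sketched as follows. The 0.25 Hz bin width is the value the authors give for their spectral markers; that bins sit on multiples of 0.25 Hz starting at 0.5 Hz, and that the 49-51.5 Hz exclusion is inclusive at both ends, are assumptions of this sketch. The function name is hypothetical.

```python
def spectral_bins(bin_width_hz=0.25, f_min=0.5, f_max=128.0, notch=(49.0, 51.5)):
    """Return the analysed frequency bins, skipping the mains-noise band.

    Bin placement and the inclusive notch bounds are assumptions of this sketch.
    """
    n_bins = int(round((f_max - f_min) / bin_width_hz)) + 1
    freqs = [f_min + k * bin_width_hz for k in range(n_bins)]
    return [f for f in freqs if not (notch[0] <= f <= notch[1])]


analysed = spectral_bins()                       # bins entering the statistics
plotted = [f for f in analysed if f <= 30.0]     # bins shown in the figures
```

      Under these assumptions, the statistics cover many more bins than the figures display, which is worth keeping in mind when comparing the reported tests with the plotted range.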

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      In line with the previous comment, we have adjusted markers to reflect the results from posthoc tests after two-way ANOVAs. Please note that Figure 6 and the related supplementary figures S5 and S6 have now been removed from the manuscript, as careful re-analysis indicated that the sample size was too low to support a strong conclusion regarding the comparison of orexin effects between genotypes. We stated in the text that we would only include posthoc significance when at least two consecutive bins were significant, but this was indeed not supported in our figure, where each marker reflects one 0.25 Hz bin. We have now adjusted our code to ensure that markers are plotted only when at least two consecutive bins are significant in bin-wise posthoc comparisons.
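
      The marker rule stated above (plot a marker only where at least two consecutive bins are significant) can be sketched as a small filter over bin-wise posthoc results. The function name and the example vector are hypothetical; this illustrates the rule and is not the authors' plotting code.

```python
from typing import List


def keep_consecutive_runs(significant: List[bool], min_run: int = 2) -> List[bool]:
    """Keep only runs of at least `min_run` consecutive significant bins."""
    kept = [False] * len(significant)
    run_start = None
    # A trailing False sentinel closes a run that ends at the last bin.
    for i, flag in enumerate(list(significant) + [False]):
        if flag and run_start is None:
            run_start = i                      # a new run begins
        elif not flag and run_start is not None:
            if i - run_start >= min_run:       # run is long enough: keep it
                for j in range(run_start, i):
                    kept[j] = True
            run_start = None
    return kept


# Hypothetical bin-wise posthoc outcomes (one entry per 0.25 Hz bin).
posthoc_significant = [False, True, False, True, True, True, False, False, True]
markers = keep_consecutive_runs(posthoc_significant)
```

      In this example the isolated significant bins at positions 1 and 8 are dropped, and only the run of three adjacent bins keeps its markers.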

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      Thank you for pointing this out, we have adjusted the colour scale.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      There were 2-4 non-infusion days between infusions. We have added this information to the methods.

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      (a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      (b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      (c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      (d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.

      We agree with the reviewer, and we decided to exclude this figure from the manuscript as the sample size for some key comparisons was too low to support any strong conclusions and therefore presenting this analysis is potentially misleading. We explain the rationale for excluding this analysis in the revised manuscript.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      We have adjusted the wording in the main text to reflect more precisely which comparisons are shown in the figures.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      Please note that the previously named Supplementary Figures S5 and S6 have been removed from the manuscript, and that the Supplementary Figure S7 in this comment refers to the figure currently named Supplementary Figure S5.

      We have added the statistical comparisons for Figure 3e, Supplementary Figure S5A and Figure S5b to the results section. In Figure S5c, there was an overall genotype difference, but there was no significant time x genotype interaction, so we have not performed posthoc tests and did not plot posthoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer. We have adjusted the reference to the figure S5c which was incorrect, thank you for your careful attention.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

      We agree with the reviewer and the title of this sub-section has now been changed accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      We thank the reviewer for this important comment. The ICV route of orexin administration cannot guarantee that only cortical Drd1a-Cre–expressing neurons are reached by orexin, and the Drd1a-Cre driver line is highly selective but not entirely specific for layer 6b neurons (see also response to reviewer #1, comment 1). We have therefore changed the wording of the stated effects and addressed this consideration in the Limitations section of the manuscript. Please note that, as mentioned above, Figure 6 has now been excluded from the manuscript.

      (2) The rationale for using only male rats is not provided.

      We thank the reviewer for highlighting this omission. We now provide the rationale for using only male mice in the methods section as follows: “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Better descriptions of L6b connectivity will improve clarity in the second paragraph of the Introduction (pg. 3). For example, it is not explicitly stated that L6b projects to L5 before the authors describe L5. Therefore, the L5 description seems irrelevant.

      We thank the reviewer for this request for clarification. We mention the connectivity between L6b and L5 because L5 pyramidal neurons have recently been found to play a key role in sleep-wake regulation (Krone et al., Nat. Neurosci. 2021; Honjo et al., 2025; Wasilczuk et al., 2025; Krone et al., 2025). We have now amended the corresponding section of the introduction to emphasise the potential functional relevance of this connection as follows:

      “L5, the major output layer of the cortex, is also bidirectionally communicative with higher order thalamic nuclei (Hoerder-Suabedissen et al., 2018) as well as layer 5 pyramidal neurons (Zolnik et al., 2024). Since several subtypes of L5 pyramidal neurons have recently been shown to play important roles in distinct aspects of sleep-wake regulation (Krone et al., 2021, 2025; Hong et al., 2023; Wasilczuk et al., 2025; Honjo et al., 2025; Chouafeev et al., 2025), depth of anaesthesia (Wasilczuk et al., 2025), and the influence of stress on sleep (Chouafeev et al., 2025), the projections of orexin-sensitive L6b to L5 pyramidal neurons may be key circuitry in the top-down regulation of brain states.”

      (2) There are plots where the y-axis tick label appears to be offset from the tick mark (4a, S5b, S6a).

      Thank you for spotting this graphical issue. We have removed the y-axis tick labels from Figure 4a to avoid confusion. Please note that we decided to remove Figure S5 and Figure S6, because after careful re-analysis we concluded that the group size was too small to draw conclusions on orexin spectra and that any results could be potentially misleading.

      (3) The 2-h time constant, I believe, is depicted in Figure 4H (not 4G).

      Thank you for spotting this. We have corrected the figure legends accordingly and double-checked that Figure 4G depicts the 2-h time constant and Figure 4H the 6-h time constant.

      (4) "...although there was an indication of a higher absolute theta-peak power in layer 6b silenced mice (Figure S6)," pg. 10. It is not clear to me how the data lead to this conclusion.

      Thank you for identifying this inconsistency, which resulted from a preliminary statistical analysis that was subsequently corrected. We have now improved the statistical analysis of spectral data (for more details see comments to both reviewers in the public response) and removed this statement, which is in fact no longer supported by the data.

      (5) Exclusion of female mice is not listed as a limitation.

      We now discuss this limitation as follows:

      “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      (6) A brief description of why Cplx3 and Tbr1 antibodies are being used will be helpful to include in the Methods (pg. 21) in addition to what is in the figure caption.

      We have added the following information to the methods section to clarify why we used these two antibodies: “rabbit α-Cplx3 to distinguish between L6a and L6b” and “mouse α-Tbr1 to identify the L5-6 boundary”.

      (7) Including a label/title for the Figure 2c spectral plots will be helpful. It is not immediately clear if these are light period & dark period data or frontal & occipital data.

      Thank you for pointing this out. We have updated the figure legend to clarify what is shown in this figure.

      Similar comments for S2 and S3a plots. Including a state label on the plots will be helpful in addition to the caption description.

      We have now added the state labels for Figure panels S2 and S3a for improved clarity.

      Reviewer #2 (Recommendations for the authors):

      This is a soundly conducted and well-written study that enhances our understanding of the cortical control of states of consciousness. I do not have any major concerns, but would like the authors to consider some alternate possibilities as suggested in my comments below:

      We thank the reviewer for this positive assessment of our manuscript and the helpful suggestions.

      (1) Given that the inactivation of layer6b neurons did not affect the time spent in sleep-wake states, to me it appears that these neurons likely have a role in creating the background neural conditions/oscillations supportive of an activated state rather than a direct role in behavioral state control.

      We completely agree with the reviewer and have made the wording more consistent throughout the manuscript, now using “brain state control” rather than “behavioural state control” to clarify that the main effect observed in the L6b-silenced mouse model is a change in spectral characteristics reflecting brain oscillations, rather than effects on vigilance states, which were modest.

      (2) Does the observed shift in REM sleep-related theta-peak frequency in the occipital derivation suggest changes in local neural processes, or could it be just a matter of better signal detection because theta is most prominent at or around the hippocampal region, which is approximately the location of occipital electrodes in this study.

      The source of the shift in REM sleep–related theta peak frequency in the occipital derivation cannot be established with EEG recordings alone. Additional intracortical or intrahippocampal recordings would be necessary to distinguish between the two possible explanations proposed by the reviewer. We have discussed this further in the revised manuscript.

      (3) Orexinergic system innervates multiple subcortical sites and widely covers the cortex too, because of which the effect of ICV orexins cannot be attributed to just layer6b neurons as described in the manuscript ("Layer 6b mediates effects of orexin on brain activity.").

      We agree with the reviewer that this is a limitation. We have now adjusted the subtitle of the paragraph describing the results from the ICV administration of orexin and further mention this important consideration in the ‘limitations’ section of the discussion.

      (4) While the current study is focused on sleep-wake mechanisms, the findings reported here have much broader implications for behavioral and/or brain state arousal and provide a mechanistic bridge between different states of consciousness, including general anesthesia. Therefore, the authors may consider tying these findings with the recent work on the role of the prefrontal cortex in arousal from general anesthesia and slow-wave sleep (PMID: 35436248, PMID: 29937348, PMID: 33328847).

      We thank the reviewer for this excellent recommendation. We are now citing these papers in the revised manuscript.

      (5) It's up to the authors, but I do not see the need for the section on Clinical Implications. It's very speculative, and it makes the entire discussion section heavy.

      We have considerably shortened the discussion of potential clinical implications to make the manuscript more concise.

      (6) Figure 1: It's difficult to compare the EEG power the way figures are set up right now. I think it would enhance clarity if the authors separate the plots based on state and show power from the control and silenced neuronal group in the same plot. Also, the colors are too similar (essentially a shade of green/blue) to provide effective visual resolution. This is especially true in panel d. Please consider changing the color scheme.

      This comment seems to refer to Figure 2 and subsequent figures with analysis of vigilance states and EEG spectra (Figure 1 contains histological images). We selected the colour scheme to be accessible to colour-blind individuals. Therefore, the main difference is in the saturation, not the colour, of the plots. We have tested the visibility of the colour scheme on a high-resolution screen with the original image files and can reassure the reviewer that the genotype differences, which are slightly blurred in the reduced-resolution figures provided within the combined text file for the review process, are easily distinguishable in the final figure quality.

      (7) I don't understand the y-axis scale in Figure 1. How can this be 500% and if it is, then 500% of what?

      This comment also seems to refer to the analysis of slow wave activity (SWA) in Figure 2 rather than to Figure 1 (histology figure). The percentage of SWA is normalised to the average SWA across the recording. Since NREM sleep is characterised by considerably higher SWA than wakefulness and REM sleep, the level of SWA during NREM sleep is in the range of 200-300%, and can be even higher after long wake episodes which are followed by a rebound of NREM sleep SWA. Hence, the upper limit of the y-axis in these (and subsequent) plots of SWA is 500% (of the average SWA). We have amended the figure legend to clarify that SWA is presented here as percentage of average SWA across the recording.
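      The normalisation described above can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function name and the per-epoch values are hypothetical, and it assumes that "average SWA across the recording" means the arithmetic mean over all scored epochs.

```python
def normalise_swa(swa_values):
    """Express each SWA value as a percentage of the mean SWA of the recording.

    Assumption: the reference level is the arithmetic mean over all epochs.
    """
    if not swa_values:
        raise ValueError("swa_values must not be empty")
    mean_swa = sum(swa_values) / len(swa_values)
    if mean_swa == 0:
        raise ValueError("mean SWA is zero; cannot normalise")
    return [100.0 * value / mean_swa for value in swa_values]


# Hypothetical per-epoch SWA values in arbitrary units (not data from the study).
# Low values stand in for wake/REM epochs, high values for NREM epochs.
example_swa = [1.0, 1.0, 4.0, 2.0]
print(normalise_swa(example_swa))
```

      With this definition the normalised values average 100% by construction, so epochs with SWA well above the recording mean (such as NREM sleep after long wake episodes) can exceed 200-300%.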

    1. Author Response:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel approach to subcellular spatial proteomics by combining laser microdissection with expansion microscopy and LC-MS/MS analysis (SPEx). They implement two different workflows for LMD and LC-MS/MS quantification:

      (1) The standard approach, where an area of interest is cut out by LMD, subjected to proteomics analysis, and compared to the rest of the cell without the dissected ROI.

      (2) The subtraction approach, where ROIs are removed, and the remaining cellular material is compared to samples containing both the surrounding material and the ROI.

      The authors assess the technique by applying it to subcellular targets of various sizes, volumes, and protein compositions such as the nucleus, nucleoli, and Golgi. They demonstrate that SPEx can identify proteins enriched or reduced in ROIs.

      Strengths:

      The broad, relatively easy, and inexpensive applicability of this approach to potentially many cell types and subcellular areas of interest provides an exciting alternative to subcellular fractionation, native immunoprecipitation, or genetically encoded proximity labeling constructs. Moreover, by visually selecting ROIs for subsequent analysis, subcellular context or organelle morphology can be taken into account, as discussed by the authors in the discussion section.

      Weaknesses:

      While strongly supporting the sharing of this approach, we have a number of comments and questions that will improve the impact of the manuscript:

      We thank the reviewer for the careful evaluation of our manuscript and the generally positive assessment. We plan on improving our manuscript based on the reviewers’ comments.

      (1) General:

      a) The manuscript would benefit from restructuring and language revision. In its current form, the writing is sometimes dense and verbose (in particular, the Results section). This makes it difficult to follow the authors' arguments.

      We will improve readability and clarity of the results section in the revised manuscript.

      b) The authors mention the possibility of selecting organelles based on morphology. This is left for the discussion, but it seems like a missed opportunity - the authors could compare individual organelles in different morphological states, e.g., connected vs. fragmented mitochondria.

      We agree with the reviewer’s assessment that investigating the proteome of organelles based on morphology or cellular state is an exciting application of SPEx. While we plan experiments along this line in the future, we think that these experiments are beyond the scope of this manuscript, which is meant to describe the method and its general usefulness.

      (2) Technical:

      a) Why do the authors strive and optimize for a 10x expansion factor? Is SPEx compatible with a more standard 4x expansion, as e.g., used in the classic U-ExM approach (https://www.nature.com/articles/s41592-018-0238-1)? This could be added to the discussion.

      We aimed for 10x expansion solely because our ultimate goal is to cut out very small structures. Isolating structures as small as nucleoli would not be as reliable with a lower expansion factor (e.g., 4x). We did not assess the compatibility with U-ExM. We would assume that SPEx would also work with U-ExM as the expansion method, provided that the protease treatment is omitted. Still, we performed pilots with just 4x expansion (using TREx) in the early stages of optimization. We were able to isolate single cells and obtain similar protein coverage as with 10x expansion. We will further clarify our motivation to use 10x expansion in the discussion.

      We would also like to point out that whether U-ExM is the standard method or not is rather subjective. Even though TREx was published three years later, it is also very widely used. The original expansion microscopy method was published three years prior to U-ExM.

      b) The U-ExM approach shows improved ultrastructural preservation when using 3%FA with 0.1% glutaraldehyde fixation (GA). Is SPEx compatible with the use of low amounts of GA for fixation?

      We tried different fixation methods in the early stages of this study (where expansion was not yet close to 10x). We saw a mild negative effect of GA on the expansion factor, so we avoided it in the later experiments since it also did not seem necessary to preserve the structure of our organelles of interest. However, the use of GA would generally be compatible with SPEx, potentially at the cost of a mild negative effect on expansion factor (see Author response image 1) and proteome coverage. We can add this information to the discussion.

      Author response image 1.

      Fixation methods mini-screen. Cells were fixed with the indicated reagents for 10 minutes at 37°C. After TREx expansion, the diameter of the nucleus was measured (A) and the resulting expansion factor compared to the non-expanded control was determined (B).
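      For readers who want to reproduce this kind of readout, the calculation behind panel (B) can be sketched as below. This is an illustrative sketch, not the authors' analysis: it assumes the expansion factor is the mean nuclear diameter after expansion divided by the mean nuclear diameter of the non-expanded control, and the function name and diameters are hypothetical.

```python
from statistics import mean


def expansion_factor(expanded_diameters_um, unexpanded_diameters_um):
    """Estimate the linear expansion factor from nuclear diameters.

    Assumption: ratio of mean expanded diameter to mean non-expanded diameter.
    """
    if not expanded_diameters_um or not unexpanded_diameters_um:
        raise ValueError("both diameter lists must be non-empty")
    return mean(expanded_diameters_um) / mean(unexpanded_diameters_um)


# Hypothetical nuclear diameters in micrometres (not measurements from the study).
control = [10.0, 12.0, 14.0]
expanded = [110.0, 120.0, 130.0]
print(expansion_factor(expanded, control))
```

      Comparing this ratio between fixation conditions is one way to express a fixative's effect on expansion as a single number per condition.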

      Related to the above, was the anchoring efficiency reduced only to achieve a 10x expansion factor or does this additionally affect the proteome coverage?

      We lowered the anchoring solely in order to allow for higher expansion factors. In earlier pilots we performed proteomic analysis on samples that were expanded just 4x using standard TREx expansion (also using the original anchoring strategy from the TREx publication, consisting of 0.2 mg/ml AcX overnight at RT). We presented the results of this pilot in Fig S1A. We still detected over 2,000 proteins from 10 cells, a coverage highly similar to what we found in the final experiments (Figure 2F), in which the anchoring was lower, yielding 10x expansion. Based on these data, we hypothesize that anchoring (and expansion factor!) has a negligible impact on protein coverage. We will clarify this in the manuscript.

      d) Have the authors considered using alternative anchoring approaches, such as GMA (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291506#pone.0291506.s001), which potentially increase the amount of sample retained in the hydrogel, thus allowing for better proteome coverage? This could be added to the discussion.

      We did not use alternative anchoring approaches. We modified the TREx protocol to fit our purposes and since this was sufficient, we did not explore alternatives. However, using anchoring approaches in which higher amounts of sample could be retained in the gel might be beneficial for the proteomics coverage. We will keep this suggestion in mind for future experiments. Thank you for the suggestion!

      e) The limitation of the approach to near-2D samples should be mentioned, and alternative approaches for more 3D samples could be discussed.

      We agree that SPEx is limited to near-2D samples at this point. We suggest that SPEx is applicable to 3D samples (e.g. tissues) by performing cryosectioning. TREx has been shown to be compatible with sectioned tissue (Damstra et al., 2022). We will elaborate on this in the discussion.

      f) How are peptides that are directly anchored to the hydrogel dealt with during LC-MS/MS analysis? Are they excluded, or can they be identified during the spectral search? The latter would allow us to get a deeper structural understanding of how proteins are actually anchored into hydrogels, which so far has not been assessed.

      The reviewer raises an interesting point. In general, peptides carrying the anchoring modification are analysed by LC-MS, but we did not include these specific modifications in the database search. Overall, we assumed that the labeling would be low and stochastic and hence should, if at all, only minimally affect the detection of peptides. Nevertheless, in response to the reviewer’s comment, we searched the MS data again for the crosslinking reagent linked to lysine residues. However, we could not get any confident hit for any peptide containing this modification. Since we cannot exclude that the modification precludes the identification of the corresponding peptides, we compared the number of peptides generated by trypsin cleavage after arginine and lysine. As the human proteome contains similar proportions of both amino acids, one would expect similar numbers of both peptide types to be identified. Any modification of lysine by the anchoring reagent used would prevent tryptic cleavage and thus reduce the number of lysine peptides. As shown in Author response image 2, the number of lysine-terminating peptides is only slightly lower than the number of arginine-terminating peptides. Notably, the proteomics results of a different fixed human tissue sample directly extracted by laser capture microdissection without expansion showed a very similar lysine to arginine peptide ratio. This indicates that the large majority of lysine residues are not modified or affected by the hydrogel anchoring.

      Author response image 2.

      Number of identified peptides terminating with either lysine (K) or arginine (R) across all samples shown in Figure 5F.
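      The comparison of lysine- and arginine-terminating peptides can be illustrated with a short sketch. This is not the authors' pipeline: the function name and peptide sequences are hypothetical, and it assumes a plain list of identified tryptic peptide sequences in one-letter code, from which the C-terminal residue is read directly.

```python
from collections import Counter


def count_c_terminal_k_and_r(peptides):
    """Count peptides ending in lysine (K) or arginine (R).

    Peptides ending in any other residue (for example protein C-termini)
    are ignored, as are empty strings.
    """
    counts = Counter(peptide[-1] for peptide in peptides if peptide)
    return {"K": counts.get("K", 0), "R": counts.get("R", 0)}


# Hypothetical peptide sequences (not identifications from the study).
identified = ["AAGK", "LLER", "TTPK", "GGSR", "MMQR", "VVLA"]
counts = count_c_terminal_k_and_r(identified)
print(counts, "K/R ratio:", counts["K"] / counts["R"])
```

      A K/R ratio far below that of a non-anchored reference sample would point to substantial lysine modification, whereas similar ratios argue against it, which is the reasoning applied in the response above.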

      An alternative approach to address this question would be to investigate if the peptide coverage of proteins detected by SPEx is enriched for peptides representing the folded core of proteins as opposed to the surface-exposed regions, which likely get more anchored into the hydrogel.

      Because of the negligible amounts of modified peptides, we did not investigate this potential bias of surface-exposed versus folded-core peptides.

      g) Same question regarding peptides with NHS labeling. Can they be identified, or do they just compete for ionization and thus negatively affect coverage and dynamic range of the LC-MS/MS approach?

      The reviewer raises a similar point as above for another lysine labeling used during the SPEx protocol. Again, we specifically looked for this modification by re-searching the raw MS data, but still could not identify any peptides carrying this modification on a lysine residue. Even though we cannot exclude that this rather large modification prevents detection, considering the high number of lysine-terminating peptides in our dataset (see Figure 2), we would expect that this labeling step is also stochastic and affects only a minor proportion of the proteins.

      h) How are the primary and secondary antibodies affecting the proteomics analysis identified as contaminants?

      We thank the reviewer for this comment. Since antibodies bind to proteins in a non-covalent manner, they will be released during the denaturing steps of the protocol. Of course, the antibodies will stay in the sample, be digested and analyzed and could, if very abundant, affect the analysis of the proteins from the samples. To check this possibility, we re-searched the MS data including the sequences of the antibodies used. To our surprise, we could not detect any peptides of these antibodies. This suggests that the concentrations of the antibodies used are much lower than those of the sample proteins and thus should not have any impact on the proteomics results. We also interpret this result as a benefit of our method compared to organellar-IP.

      i) Have the authors observed differences in proteomics coverage of only antibody vs NHS-labeling? Depending on the questions above, could pure antibody-based labeling increase proteomic coverage?

      We did not perform this comparative analysis, since we always used NHS dyes. In the experiments presented in this manuscript, NHS dyes allowed easy visualization of the whole cell without the use of antibodies. This NHS staining was essential for this particular setup for sample acquisition. We cut out entire cells, cells lacking the nucleus and cells lacking the Golgi apparatus, which served as critical controls. However, other ways of detecting cell boundaries could be used to avoid NHS staining. As shown above, both the anchor and the NHS labeling are sparse and stochastic. Moreover, we could not detect any impact of the antibody labeling on our results. Thus, we assume that both labeling procedures could be used.

      Reviewer #2 (Public review):

      Summary:

      This study introduces a method that combines physical expansion of cells, imaging-guided isolation of defined regions, and protein identification to enable compartment-resolved analysis of protein composition at the subcellular scale. The authors aim to address a central limitation in existing approaches, namely the loss of spatial information during sample preparation or the indirect nature of proximity-based labeling methods. Using several cellular compartments as examples, they demonstrate that their approach can recover compartment-enriched protein sets and identify candidate proteins with previously unassigned localization.

      Strengths:

      A major strength of this work is the conceptual simplicity and accessibility of the approach. By combining established techniques in a modular way, the method avoids the need for genetic manipulation or specialized labeling strategies, making it broadly adaptable across experimental systems. The ability to directly select regions of interest based on imaging represents a clear advantage over indirect enrichment strategies and allows flexible targeting of both membrane-bound and non-membrane-bound compartments.

      The experimental design is also a strong aspect of the study. The use of complementary comparison strategies-analyzing isolated compartments alongside matched "subtracted" controls-provides an internal framework for assessing enrichment and depletion, increasing confidence in spatial assignment. The application of the method across multiple organelles of different sizes and properties demonstrates versatility, and the reported specificity for several compartments is encouraging. In particular, the ability to profile small and biochemically challenging structures highlights a potentially important niche for the approach.

      Weaknesses:

      Despite these strengths, several methodological limitations constrain the interpretation of the results. The most important relates to spatial accuracy in three dimensions. While lateral resolution is improved through physical expansion, the lack of depth resolution introduces uncertainty regarding contributions from structures above and below the selected region. Although the authors argue that this does not substantially affect specificity, the current evidence is largely indirect, and a more rigorous quantification of potential contamination would strengthen this conclusion.

      Quantitative interpretation also remains challenging. Because the measurements reflect total protein abundance rather than local concentration, differences in compartment size and protein density can influence enrichment values, particularly for small structures embedded within larger volumes. This issue is evident in the analysis of smaller compartments and complicates direct comparison across conditions. Additional normalization or modeling would help clarify how to interpret these measurements.

      Another limitation concerns variability in the expansion process and its downstream consequences. Differences in expansion factor across samples may affect the definition of regions of interest and introduce variability in sampling, yet the impact of this variability is not fully explored. Similarly, the use of a modified chemical treatment to preserve proteins for downstream analysis is central to the workflow but is not extensively validated with respect to preservation of spatial organization.

      While the identification of previously unannotated proteins is an appealing aspect of the study, validation is limited to a small number of examples, and broader support from independent datasets or literature context is lacking. In addition, the study primarily focuses on steady-state measurements in a single cell type, and therefore does not yet demonstrate the ability of the method to capture dynamic or condition-dependent changes in protein localization.

      Finally, the positioning of the method relative to existing approaches could be more clearly articulated. Although qualitative comparisons are provided, a more systematic and quantitative benchmarking against alternative strategies would help readers better understand the specific advantages and trade-offs.

      We thank the reviewer for the careful evaluation of the manuscript and for the constructive feedback. We think the reviewer raises valid points and will address them in the revised manuscript.

      Reviewer #3 (Public review):

      Franziscus et al. describe an elegant approach for spatially specific proteome analysis. To achieve this, they expand fixed cells and subsequently use a laser to micro-dissect a region of interest, which is then analyzed by mass spectrometry.

      They demonstrate the effectiveness of their approach by analyzing the nucleus, nucleolus, and the Golgi, and benchmark their hits against previous datasets for these organelles.

      The manuscript is very well written and nicely guides the reader through the applied methods. The presented data is convincing, and I do not see the need for additional experimental verification of the protocol. The only minor concern is the novelty of the method and the presentation. A combination of expansion, laser microdissection, and proteomics has been applied in the past (PMID: 36450705, PMID: 39477916). In the manuscript, one of these studies is cited, though it does not become clear that this approach is already described. However, Franziscus et al. describe the approach better and make it more accessible to the reader, especially since the other studies described this methodology in combination with tissue expansion and not in combination with single cell expansion as it is done here. I would ask the authors to be clearer in the introduction about what others have already done and what their contribution is here. In general, I am convinced that the community will benefit from the presented protocol to analyze organelle proteomics in detail.

      We thank the reviewer for the careful evaluation of our manuscript and the overwhelmingly positive assessment. We apologize for the omission of the mentioned citations, and will adjust the introduction to make it clearer what has already been done and what advance our method provides.

      References

      Damstra HG, Mohar B, Eddison M, Akhmanova A, Kapitein LC, Tillberg PW. 2022. Visualizing cellular and tissue ultrastructure using Ten-fold Robust Expansion Microscopy (TREx). eLife 11:e73775. DOI: https://doi.org/10.7554/eLife.73775

      Gambarotto D, Hamel V, Guichard P. 2021. Ultrastructure expansion microscopy (U-ExM). Methods in Cell Biology 161:57–81. DOI: https://doi.org/10.1016/bs.mcb.2020.05.006, PMID: 33478697

      Liffner B, Silva TLA e., Vega-Rodriguez J, Absalon S. 2024. Mosquito Tissue Ultrastructure-Expansion Microscopy (MoTissU-ExM) enables ultrastructural and anatomical analysis of malaria parasites and their mosquito. BMC Methods 1:13. DOI: https://doi.org/10.1186/s44330-024-00013-4

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.

      Duplication of control groups across experiments

      We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified that independent groups of control mice were used for each experiment. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity and to reinforce the rigor of our experimental design (Page 15, Lines 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”

      Validation of the MASLD model

      To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters, including liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”.

      Assessment of liver injury in RagKO and anti-NK1.1 mice

      We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed a HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D)”.

      Discussion of limitations

      We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Pages 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing MAFLD patients and experimental MAFLD datasets, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 4 the authors show the number of IFN+ CD4, CD8, and NK1.1+ cells. Could they show, of the total IFNg production, how much comes specifically from NK cells and how much from other cell populations, since NK1.1 is a marker not only of NK cells but also of NKT and gamma delta T cells? Also, in Figure 2E the authors see a substantial increase in IFNg signal in T cells.

      While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that the NK1.1+CD3+ cells (NKT cells) cited on Page 7, Lines 188–192 were essentially absent in the liver tissue of LPS-challenged animals, as shown in Supplementary Figures 3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188–192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E).”

      This absence was consistent across all experimental conditions, corroborating our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, data demonstrated in Figure 2E illustrate the presence of IFNγ primarily in NK cells. Therefore, the observed IFNγ signal, attributed to NK1.1+ cells, predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.

      (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?

      The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.  

      (3) Since the survival curve and rate are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H I would just like to double-check that the authors used different controls for each experiment.

      The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.

      (4) In Figure 5 the authors state that neutrophils, but not monocytes, mediate the susceptibility of animals with NAFLD to endotoxemia. However, CXCR2i depletion and CCR2 knock out mice affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti-Ly6G, b) HFCD mice challenged with LPS and treated with anti-Ly6G do not rescue survival to levels of CHOW LPS and c) anti-Ly6G treatment helps less than CXCR2i. Therefore, from both knock out mice and depletion experiments the authors can conclude that most likely monocytes (but potentially also other cells) together with neutrophils are substantial for the development of endotoxemic shock in choline-deficient high-fat diet model.

      While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B (added to the manuscript, Page 8, Lines 213–217). The corresponding description has been added to the Results section (Page 8, Lines 213–217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. S5A and Fig. S5B).” In contrast, CCR2 deficiency primarily affects monocyte recruitment, yet in our experimental conditions, monocyte depletion or CCR2 knockout did not significantly alter the severity of endotoxemic shock, indicating that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.

      To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.

      These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.

      (6) In Figure 6A (but also others with PD-L1) did the authors do isotype control? And can they show how much of PD1+ population goes on neutrophils, and how much on all the other populations?

      To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11B+ leukocytes. These new results, detailed on Page 9, lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.

      To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245–250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. S6A, Fig. S6B, and Fig. S6D). Furthermore, PD-L1+ neutrophils showed an exacerbated migration towards the liver (Fig. 6A and 6B).”

      (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?

      The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. Although the absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments, the overall phenotype and trend were consistently maintained, namely that PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.

      (8) In Figure 7, do the authors have an isotype control for TNFa? The gating seems a bit random, so an isotype control graph as supplementary information would help a lot and make the figure more persuasive.

      To address the concern regarding gating in Figure 7, we have included the FMO control showing TNFα as a histogram in Supplementary Figure 8G. This control reaffirms the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272–274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig. S8G).”

      (9) Figure 6C IFNg+/+ mice on CHOW +LPS is same as Figure 8E mice chow +LPS but just with different numbers. Can the authors explain this?

      Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.

      (10) Figure 1E chow B6+LPS is the same as Figure 5D B6+LPS but should they be different since those should be two different experiments?

      We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.

      Reviewer #2 (Recommendations for the authors):

      (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.

      We assessed kidney injury alongside ALT, a marker of liver damage, because both the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether our physiopathological condition was liver-specific or indicative of broader systemic injury.

      (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also bring into question some of your data if there are so few NK cells? And the IFNG expression (2E) looks to mostly come from T-cells (CD8?).

      The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.

      In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.

      (3) In your methods, I think you didn't explain something. You said LPS was administered to 5–6 week old mice, but that HFCD diet was started in 5–6 week old mice and lasted 2 weeks, then LPS was administered. So LPS administration happened when the mice were 7–8 weeks old, right?

      We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.

      We have revised the Methods section (Pages 15–16, Lines 474–480) to clarify this timeline and ensure it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436–442) as follows: “Lipopolysaccharide (LPS; Escherichia coli (O111:B4), L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2 -/-, IFN-/-, and TNFR1R2 -/- mice. The HFCD was initiated in 5–6 week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration.”

      (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.

      We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic-associated fatty liver disease (MAFLD, formerly NAFLD) stage rather than the more advanced metabolic-associated steatohepatitis (MASH, formerly NASH).

      Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.

      (6) I am concerned about over interpretation of the publicly available RNA-seq data in Figure 2. This data comes from human NAFLD patients with unknown endotoxemia and mouse models using a traditional high-fat diet model. So it is hard to compare these very disparate datasets to yours. Also, if these datasets have elevated IFNG, why does your model require LPS injection?

      We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.

      Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevations of IFNγ and TNF in NAFLD are primarily derived from NK cells, T cells, and myeloid cells, respectively.

      In our experimental model, LPS administration was used to evaluate whether these immune populations particularly NK cells are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.

      (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.

      We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.  

      (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?

      Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.

      (9) Regarding Methods for RNA-seq Analysis. Was the Mitochondrial percentage cutoff 0.8%, because that seems low. And was there not a Padj or FDR cutoff for the differential expression?

      The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.
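
      As an illustration of the quality-control metric described above, the following minimal sketch computes a per-cell mitochondrial percentage and applies a cutoff. It is not the authors' pipeline: the function names, the toy counts, and the `mt-` gene-name prefix are assumptions made for the example, and the sketch further assumes that cells above the 0.8% cutoff are the ones discarded, which the response does not state explicitly.

```python
from typing import Dict, List


def percent_mito(counts: Dict[str, int], prefix: str = "mt-") -> float:
    """Percentage of a cell's total counts coming from genes whose name starts with `prefix`.

    The "mt-" prefix is an assumed naming convention for mitochondrial genes.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items()
               if gene.lower().startswith(prefix.lower()))
    return 100.0 * mito / total


def keep_cells(cells: Dict[str, Dict[str, int]], max_percent: float = 0.8) -> List[str]:
    """Return IDs of cells whose mitochondrial percentage is at or below the cutoff."""
    return [cell_id for cell_id, counts in cells.items()
            if percent_mito(counts) <= max_percent]


# Hypothetical toy counts (not real data): gene name -> number of reads in that cell.
toy_cells = {
    "cell_a": {"gene_1": 40, "gene_2": 159, "mt-gene_1": 1},  # 1 of 200 counts
    "cell_b": {"gene_1": 30, "gene_2": 60, "mt-gene_2": 10},  # 10 of 100 counts
}

for cell_id, counts in toy_cells.items():
    print(cell_id, percent_mito(counts))
print(keep_cells(toy_cells))
```

      With a cutoff this strict, only cells in which well under 1 in 100 counts map to mitochondrial genes are retained, which is why the reviewer asked whether the value was intended.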

      For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
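
      The thresholds stated above can be expressed as a simple filter. The sketch below is written in Python purely for illustration and does not reproduce Seurat's API; the `DEResult` record, the `classify` function, and the example values are hypothetical, and only the two cutoffs (adjusted p-value < 0.05, absolute log2 fold change > 0.25) come from the response.

```python
from typing import List, NamedTuple


class DEResult(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log2fc: float
    padj: float


def classify(row: DEResult, padj_cutoff: float = 0.05, lfc_cutoff: float = 0.25) -> str:
    """Label a gene 'up', 'down', or 'ns' (not significant) using the stated thresholds."""
    if row.padj < padj_cutoff and row.log2fc > lfc_cutoff:
        return "up"
    if row.padj < padj_cutoff and row.log2fc < -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical example rows (not results from the study).
rows: List[DEResult] = [
    DEResult("gene_1", 1.20, 0.001),   # significant and above +0.25
    DEResult("gene_2", -0.80, 0.010),  # significant and below -0.25
    DEResult("gene_3", 0.10, 0.001),   # significant but effect too small
    DEResult("gene_4", 2.00, 0.200),   # large effect but not significant
]

labels = {row.gene: classify(row) for row in rows}
print(labels)
```

      Requiring both conditions means that a gene with a large fold change but a non-significant adjusted p-value, or a significant but very small change, is not reported as differentially expressed.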

      (10) Regarding Methods for Flow Cytometry. How were IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? TNF and IFNG antibodies have the same fluorophore (PE), so were these stainings and analyses performed separately?

      Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. Detection of IFNγ and TNF was performed via intracellular staining using the Foxp3 staining kit (eBioscience). Due to both antibodies being conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490–493) as follows: “As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments.”

      Reviewer #3 (Recommendations for the authors):

      (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?

      Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. They are shown in Supplementary Figure 1, and the text was included in the manuscript (Page 5, Lines 116–117): “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C).”

      (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when authors analyzed the publicly available RNAseq data but they are not the most significantly elevated genes. What is the reasoning behind this cherrypicking? Why are other high DEGs not analyzed but these two are analyzed?

      We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.

      (3) Figures 3C-3G: I understand that IFNg-/- and TNFR1R2a-/- mice are not showing elevated liver damage but it may simply be because of the non-responsiveness to the LPS challenge. I suggest using a different challenge or recovery experiments with the cytokines to show that the challenge is successful and results are caused by NAFLD, truly. The same goes for Figure 6: Looking at Figure 6D one may think that IFNg deficiency alters the LPS response independent of the diet condition (or NAFLD condition).

      We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.

      Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.

      Due to current breeding limitations and a temporary issue in colony maintenance of TNF-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.

      These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.

      (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-derived death even after normal conditions. But If I compare the survival data between Figure 1 and Figure 4, Rag-/- HFCD diet mice seem to be doing better than wt mice after LPS treatment. (1 day survival vs 2 days survival). How do you explain these different outcomes?

      We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality, and variation between independent experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.

      (5) How do you explain Figure 4J in connection to the observation presented with Figure 7: TNFa tissue levels, even though significant, seem very similar between the conditions?

      We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.

      Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNFα levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.

  2. May 2026
    1. Medium rreisman.medium.com › as-we-will-think-the-legacy-of-ted-nelson-original-visionary-of-the-web-f4f69a60bd6 “As We Will Think” — The Legacy of Ted Nelson, Original Visionary of the Web | by Richard Reisman | Medium 26 November 2018 - A fuller explanation of why Nelson ... about — A powerful eulogy for where the Web might have gone, and still may someday, and the friendship of the two people most responsible for envisioning the Web* — Ted Nelson’s ...

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editors and reviewers for their assessment of this manuscript, and for the positive words highlighting the value of undertaking evaluation of small molecule drugs for snakebite in the neotropics, inclusive of the quality of this work and the value of the validated screening pipeline. We completely agree that the next steps for this work will be to evaluate the preclinical efficacy of the identified drugs in mouse models, though this considerable undertaking will form the basis of future work. Critically, the pipeline that we describe herein facilitates the selection of the most appropriate candidates to progress into such mouse studies, aligning with the 3Rs principles for minimising the need for animal research. The comment around insufficient venom characterisation seems somewhat misplaced – the objective of this project was not to characterise the venoms used, but to evaluate the in vitro inhibition of venom toxin family activities and identify the potential utility of specific repurposed drugs as therapeutics for snakebite in the neotropics. Venom characterisation of the diverse samples used in this project would represent an entire project and manuscript in its own right. We are pleased that the reviewers highlight the gap in research on serine protease inhibitors and the value this paper has in highlighting that more research is required in this area to identify a candidate that is more suitable for future clinical use than nafamostat.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Small molecule therapeutics for snakebite have received a lot of attention for their potential to close the gap between bite and treatment, where antivenom is not immediately available.

      Strengths:

      There has been a lot of focus on Africa, Asia, and India, but very little work related to neotropical regions. The authors seek to begin filling this gap in the preclinical literature. The authors use well-developed methods for preclinical assessment.

      Weaknesses:

      A clearer and more focused discussion of the limitations of the overall present work would be desirable (e.g. protection vs. rescue, why marimastat over prinomastat for in vivo assays when both have been through clinical trials for other indications; real-world feasibility of nafamostat, which has a half-life of 1-2 minutes compared to camostat, which has a half-life of hours). All of this could be improved in a revision.

      We thank the reviewer for their shared opinion of the potential value of small molecules as snakebite envenoming therapeutics and their insight on the gap in focus in the neotropics, which this manuscript aims to address.

      Our work in this manuscript included standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. In our revised manuscript we will make these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models) which, following the in vitro characterisations presented in this paper, are the logical next step for evaluating small molecule drugs for inhibiting neotropical snake venoms.

      Although both marimastat and prinomastat are repurposed drugs that have undergone clinical evaluation for other indications, marimastat has been more extensively characterised preclinically than prinomastat for snakebite, and will soon enter Phase II clinical trial evaluation for this indication (https://www.ddw-online.com/ophirex-to-produce-snake-venom-inhibitor-for-lstm-study-40669-202602/). Marimastat also has a longer half-life in humans of 8-10 hours (Millar et al. 1998), compared to prinomastat (2-5h, Hande et al. 2004). We will more clearly highlight the rationale for selecting marimastat in the revised manuscript.

      Although we appreciate the reviewer’s point regarding the short half-life of nafamostat (which is typically given by continuous iv infusion for this reason), in the manuscript we have already stated that we do not recommend the progression of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      Reviewer #2 (Public review):

      Summary:

      The authors set out to test whether a defined set of small molecules can lessen damaging effects caused by venoms from several Bothrops species, and whether these effects are consistent enough to suggest a broadly applicable approach. They present a cross-venom dataset spanning in-vitro activity readouts and blood-based functional outcomes, and include a chicken embryo model to explore whether venom inhibition can translate into improved survival. The central message is that certain small molecules can reduce specific venom-driven effects across multiple samples, providing a comparative resource for the field and a basis for prioritizing future validation.

      Strengths:

      The main value of this work is the breadth and structure of the dataset, which places multiple venoms and multiple readouts into a single, comparable framework that should be useful for readers evaluating patterns across samples. The experimental flow is generally coherent, moving from activity measurements to functional outcomes and then to an in-vivo test, which helps the reader understand how the authors link mechanism-oriented assays to more integrated endpoints. The manuscript also provides practical information for the community by highlighting which readouts appear most consistently affected across venoms, which can help guide hypothesis generation and study design in follow-up work.

      Weaknesses:

      Several aspects of the study design and framing reduce the confidence with which readers can translate the findings beyond the specific experimental context presented. The evidence base is strongest in controlled in-vitro settings, while the bridge to real-world effectiveness remains limited, particularly for understanding performance under conditions that better reflect delayed treatment and systemic exposure. As a result, the manuscript is best interpreted as a well-organized comparative screening study with promising signals, rather than a definitive demonstration of a broadly effective, deployable intervention.

      We appreciate the reviewer’s opinion on the thorough and logical workflow we present in this manuscript and the value this pipeline provides the field for future and parallel work. We agree with the reviewer that this provides a well-organized comparative screening study applicable to different snake species or therapeutics. In relation to the comment that this manuscript is not a definitive demonstration of a broadly effective, deployable intervention, we agree and are happy to clarify that while the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation.

      Reviewer #3 (Public review):

      In this work, the authors wanted to evaluate repurposed small molecule inhibitors for the treatment of envenomation by snakes of the Bothrops genus, one of the most medically relevant in the Americas. I believe the objectives of the research were clearly achieved, and compelling evidence for the ability of these molecules to neutralize enzymatic and toxic activities of metalloproteinases and phospholipases in all the tested venoms is provided. Furthermore, the work highlights the limited efficacy of the tested serine protease inhibitor, suggesting a need for drug discovery campaigns to address toxicity caused by this protein family. The methods are well designed and performed, and the use of both in vitro and in vivo methodologies makes this a thorough and robust work.

      These results are extremely relevant, since they take us one step further to a potential orally administered snakebite treatment. The existence of such a treatment could improve the outcomes for thousands of snakebite victims worldwide. I have a few comments and questions that I hope will be useful to the authors:

      We thank the reviewer for their high regard for the purpose and execution of this work. Their questions have helped us improve the manuscript and raise useful discussion points for the field.

      During the introduction, the authors mention that small-molecule inhibitors can neutralize the localized tissue damage via cytotoxicity of some venoms, and cite PLA2s, SVMPs and/or cytotoxic 3FTxs as the main causing agents of this pathology. I am not aware of any direct effect described by small molecule inhibitors on cytotoxic 3FTxs alone. Has this been observed at all? Or is it more likely that the small molecule inhibitors act on the enzymatic toxins only, preventing synergistic effects with 3FTxs?

      We apologise for this error on our part. While inhibitory molecules have been described for cytotoxic 3FTxs, these are not small molecules as alluded to in the previous version of the manuscript. We have amended this text in our revised manuscript.

      I think it would be relevant to address the effects of non-enzymatic PLA2s, such as myotoxin II, which have been described in detail within Bothrops venoms. I believe there is some evidence of Varespladib also having a neutralizing effect on the myotoxicity caused by these non-enzymatic PLA2s. I suggest adding a comment about the contribution of these toxins in the discussion or in the section where PLA2 activity of the venoms is compared. In my opinion, right now it seems like these were overlooked.

      We thank the reviewer for highlighting this point. We agree that this is highly relevant and would benefit from discussion in the revised manuscript given the nature of our assays and the non-enzymatic mechanism of action of certain Bothrops PLA<sub>2</sub>s. We have added this to the discussion.

      Regarding Marimastat and the other MP inhibitors, are there any studies showing that they don't have an effect on endogenous MPs? I understand they have been approved for human use before, but is there any indication that they would not have an effect at the doses that would be required to treat envenomation?

      Most matrix metalloproteinase inhibitors will act on endogenous MPs to at least some extent (with variable potency against different MMPs). Marimastat has demonstrated activity against endogenous metalloproteinases, including MMP1, inhibition of which was hypothesised to cause the severe joint pain seen when the drug was used chronically (i.e. frequent dosing over many weeks) for indications such as cancer, though this effect was reversible within 8 weeks of cessation of drug administration (Wojtowicz-Praga, 1998). Thus, long-term use of matrix metalloproteinase inhibitors can raise safety concerns. However, the anticipated duration of dosing for snakebite, which is an acute life-threatening condition, is a few days. It is therefore unlikely that prior safety concerns observed following chronic dosing in cancer studies would apply to its potential use as a snakebite field therapy.

      Regarding the quenched fluorescence substrate used for enzymatic activity. Is there a possibility that some of the SVMPs would not act on this substrate, and therefore their activity or neutralization is not observed? Would it be relevant to test other substrates, such as gelatin, collagen, or even specific clotting factors?

      It has been observed that certain SVMPs (specifically several PI SVMPs) are not active against this ES010 substrate in vitro. The substrate used in the in vitro SVMP assay is reported by the manufacturer as a substrate for a wide range of MMPs which target the extracellular matrix components mentioned by the reviewer, i.e. collagenases and gelatinases as well as matrilysins, stromelysins and elastase. The in vitro assay and the coagulation assays are complementary in covering the main targets of SVMPs (ECM and clotting cascade), prior to haemorrhagic assessment in the egg model. Thus, we are confident that activity for the broad range of SVMP isoforms will be captured through the screening pipeline we have developed.

      Finally, could the authors comment or provide some bibliography regarding the translatability of the chicken embryo model in the context of envenomation?

      Our current model is based on an earlier egg embryo model (Sells et al. 1997, Sells et al. 1998 and Sells et al. 2000) which described good correlations (p<0.01) with the standard WHO murine preclinical envenoming model. These studies have assessed correlations for minimal haemorrhagic doses (MHDs), LD50s and ED50s in both models for a selection of viper venoms. As chicken embryos at day 6 of development have incomplete neural arcs, the model is not well suited for assessing neurotoxic effects, but can be effectively used for addressing venom-induced haemorrhage and lethality and for testing therapeutics. In addition, a more recent study (Yusuf et al. 2023) reported almost identical LD50s for the venom of Bitis arietans between the two in vivo approaches. The model is also being pursued as a preclinical testing model by an antivenom manufacturer with the focus of reducing the use of rodents in batch release testing (Verity et al. 2021). We will provide further clarification on the rationale for using the egg model, including the supportive references outlined above, in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript provides a useful comparative dataset across multiple Bothrops venoms and supports SVMP inhibition as a broadly effective lever in the authors' in-vitro work. However, the strength of the 'pan-Bothrops' and translational claims is currently limited by insufficient characterization of the exact venom samples tested and by experimental designs that fall short of clinically realistic rescue.

      Major comments:

      (1) The venoms used in this study are historical batches and are not formally characterized beyond SDS-PAGE and literature summaries, despite well-known intra- and inter-population venom variability; this weakens the generalization of the conclusions.

      To address this comment, we have clarified in the manuscript that our venom sources are historic: "Due to the historic nature of the venom samples, the source locality is not available beyond country of origin, with the exception of B. lanceolatus which is endemic to Martinique." Figure 1 also makes clear that we agree with the reviewer that variation within Bothrops species is high. We discuss this variation and the limitations of our sampling for making broad conclusions throughout the first paragraph of the discussion, with the final sentence stating: "Future proteomic characterisations of the specific venom samples used in this study, which were all sourced from a historical collection (except for B. lanceolatus), would be informative in this regard." Although the venom composition of our samples has not been characterised, the focus of the manuscript is the characterisation of whole-venom functional activity through a wide-ranging screening pipeline, and the generalisation of our findings is supported by the diversity of the venom samples (i.e. several species) despite them not being characterised (which is not critical for the focus of the study).

      (2) On a technical comment, the venom inhibition assays appear to rely on drug-first or preincubation conditions, which can easily overestimate efficacy compared with real snakebite envenomation, where toxins distribute and engage targets rapidly. Here, a translational gap is the clinical feasibility of the 'repurposed' inhibitors, as it is unclear whether the drugs central to the conclusions (especially marimastat, prinomastat and varespladib) are realistically available or stocked in hospitals or could be deployed in regions where Bothrops envenoming occurs. I think that the manuscript should clearly distinguish this from candidates with a plausible access and delivery pathway.

      Our work in this manuscript includes standard practice of pre-incubation between drug and venom for all in vitro studies, and sequential (i.e. not co-incubation) administration in the egg model. None of our methods administer drug-first. Throughout the methods and figure legends we have made these distinctions clearer. Use of a ‘rescue’ approach in the in vitro assays is not feasible due to the rapid destruction of the substrates used for assay readouts. The clearest rationale for the use of rescue models relates to their power within in vivo preclinical models (i.e. murine envenoming models), which would be the next step for this research programme.

      While the evidence presented in this manuscript is promising, there is much work still to do before such molecules are ready for deployment for treating snakebite, including the need to complete clinical trials, cost-benefit analyses, policy changes, and manufacturing/distribution feasibility assessments. Ultimately, this manuscript supports the growing evidence of the promising utility of marimastat and varespladib, and extends this evidence to neotropical snake venoms in a comparative manner. The next step will be to evaluate the efficacy of these molecules within rescue in vivo murine preclinical models, which will be crucial for further supporting the evidence base for onward translation. To further support this point, we have added a section to the manuscript discussing the current preclinical and clinical progression of prinomastat and marimastat, which also incorporates the public comment on the selection of marimastat over prinomastat.

      (3) In my opinion, the Nafamostat results and discussion need reframing, given weak SVSP inhibition and intrinsic anticoagulant behavior at 5 µM. Excluding it from certain analyses undermines interpretability, and it may be more appropriate to include it throughout as an explicit negative control condition (showing its baseline anticoagulant effect) rather than omitting it.

      Although we understand the reviewer’s opinion here, we disagree and believe that including nafamostat as a ‘negative control’ may present a negative reflection on the benefit that an efficacious serine protease inhibitor could provide. Furthermore, as the intrinsic anticoagulant effect of nafamostat cannot be de-coupled from direct SVSP toxin inhibition, we were unable to interpret the activity, which undermines the results. This can be seen in Figure 3b, which demonstrates that a false positive result would occur. For the serine protease assay, we do clearly discuss the lack of efficacy and the justification for why EC<sub>50</sub> testing wasn’t appropriate within the guidance of our screening protocols.

      In the manuscript we have now further justified our approach in relation to the limitations of nafamostat as a snake venom serine protease (SVSP) inhibitor candidate due to its low efficacy and off-target effects. We highlight the need for the community to identify other serine protease inhibitors that might have utility for snakebite.

      (4) The data presentation needs consistent statistical analyses (currently absent for multiple key figures, including Figures 2, 3, 4, 6 and 7) and a clearer explanation for the dose of venom and drugs you choose. For example, Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients. Likewise, Bothrops venoms can contain both pro- and anticoagulant activities, so the authors should justify how their framework accounts for anticoagulant components and why the observed plasma phenotypes are interpreted as they are.

      In relation to the reviewer’s comment on the need for consistent analysis, we thank the reviewer for flagging this and have now included statistical analyses in Figures 3, 4, 6 and 7. However, Figure 2 is presented to display the variation between all the venoms and was ultimately used to select the most relevant doses for the later inhibition experiments; statistical analysis is therefore not relevant for this figure. The updated statistical analysis now includes the following, which has been added to the relevant figure legends and results sections:

      Figure 3 - Bars indicate significant results (p < 0.05) identified through one-way ANOVA with Dunnett’s multiple comparisons test against the DMSO control

      Figure 4 - Two-way ANOVA with Šídák's multiple comparisons test of each venom control compared to the matched venom treated with inhibitor

      Figure 6 - The CT and MCF data were analysed independently using one-way ANOVA with Tukey’s multiple comparisons test

      Figure 7 - Log-rank test (Mantel-Cox) with Holm-Šídák's multiple comparisons test of each treatment against the venom-only control

      We have ensured that all figure legends clearly indicate the venom and drug dose to aid the clarity which the reviewer requested.

      The comment "Figure 3 relies on a fixed 5 µM drug concentration and very different venom amounts (50-100-250 ng), but it is not discussed whether such exposures are achievable in vivo, or how these concentrations map onto expected pharmacokinetics in patients" is an understandable query. However, in vitro assessments such as those carried out in this manuscript are not designed to directly inform pharmacokinetic/pharmacodynamic interpretations, largely because they do not replicate real-world envenoming (i.e. preincubation would not occur between a venom and treatment). This is why, as stated, follow-on preclinical and clinical assessments are needed for onward progression of these inhibitors to inform dosing regimens that might achieve the necessary exposures required for in vivo venom neutralisation. That being said, PK/PD work has been initiated within Phase I trials; for example, with DMPS, Abouyannis et al. (2025) demonstrated a plasma exposure of >10 µg/mL for single doses of 1,200 mg and higher. This is equivalent to 80 µM. Although this is lower than the EC<sub>50</sub> for some venoms in the clotting assay (Figure 3J), the venom dose (50 to 250 ng/50 µL, i.e. 1,000 to 5,000 ng/mL) is estimated to be >1000 times higher than a natural envenoming by Bothrops atrox at less than 1 ng/mL in serum (https://doi.org/10.1016/j.toxicon.2022.09.010). These extrapolations therefore indicate that the doses selected in our studies would have human clinical relevance.
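
      The venom-concentration arithmetic behind this comparison can be checked with a short sketch. It assumes only the figures quoted in the response (50, 100 and 250 ng of venom in a 50 µL assay volume, and a serum venom level below 1 ng/mL); the function and constant names are hypothetical and chosen for illustration.

```python
# Figures are taken from the response text; all names here are hypothetical.
WELL_VOLUME_UL = 50.0                  # assay volume per well, in microlitres
VENOM_DOSES_NG = (50.0, 100.0, 250.0)  # venom mass per well, in nanograms
SERUM_VENOM_NG_PER_ML = 1.0            # quoted upper bound for venom in serum


def concentration_ng_per_ml(mass_ng: float, volume_ul: float) -> float:
    """Convert a mass in ng dissolved in a volume in µL to ng/mL."""
    return mass_ng / volume_ul * 1000.0  # 1 mL = 1,000 µL


for dose in VENOM_DOSES_NG:
    conc = concentration_ng_per_ml(dose, WELL_VOLUME_UL)
    # The serum level is an upper bound, so the fold difference is a lower bound.
    fold = conc / SERUM_VENOM_NG_PER_ML
    print(f"{dose:.0f} ng per well = {conc:.0f} ng/mL (at least {fold:.0f}x serum)")
```

      On these assumptions the assay concentrations span 1,000 to 5,000 ng/mL, which is where the ">1000 times" comparison comes from.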

      Finally, in terms of anticoagulant venom effects, these would be observed in our experimental approach either as reduced kinetic responses in the plasma clotting assay (as observed with nafamostat in Figure 3B) or as a prolonged clotting time in the thromboelastography assay (Figure 6). As stated in the results section "Comparison of coagulation profiles", all of the venoms tested presented with a procoagulant effect. If underlying anticoagulant activity from PLA<sub>2</sub> toxins were to arise after inhibition of the procoagulant toxins (i.e. SVMPs by marimastat), as has been seen for certain other snake venoms previously, this would result in a percentage inhibition far greater than 100% in the plasma assay (Figure 3C to I) or in a prolonged clotting time in the thromboelastography assay. These described anticoagulant profiles were not observed with any venom tested in this study.
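
      To illustrate why an unmasked anticoagulant effect would appear as an inhibition above 100%, here is a minimal sketch. It assumes a conventional normalisation in which 0% corresponds to the venom-only well and 100% to the plasma-only well; the manuscript's exact formula is not given in this response, so the function name, the formula and the numbers below are all illustrative assumptions.

```python
def percent_inhibition(venom_only: float, treated: float, plasma_only: float) -> float:
    """Percentage of the venom-induced clotting signal removed by an inhibitor.

    Assumed normalisation (not taken from the manuscript): 0% means the
    treated well matches the venom-only well, 100% means it matches the
    plasma-only (no venom) well.
    """
    venom_effect = venom_only - plasma_only
    if venom_effect == 0:
        raise ValueError("venom-only and plasma-only signals are identical")
    return (venom_only - treated) / venom_effect * 100.0


# Made-up clotting AUC values for illustration only, not data from the study.
plasma_only = 40.0
venom_only = 100.0

# Inhibitor returns clotting exactly to the plasma-only level.
full_rescue = percent_inhibition(venom_only, 40.0, plasma_only)

# Clotting falls below the plasma-only level, as an unmasked anticoagulant would cause.
unmasked_anticoagulant = percent_inhibition(venom_only, 10.0, plasma_only)

print(full_rescue, unmasked_anticoagulant)
```

      Under this normalisation, any treated signal that drops below the plasma-only control maps to a value above 100%, which is the signature the authors report not seeing.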

      (5) Finally, the in vivo evidence is limited to a chicken embryo model. To support your hypothesis, a conventional mouse model with delayed post-envenomation dosing (24-36 h monitoring) is needed to address both safety/toxicity and post-exposure efficacy, and to define a realistic therapeutic window, especially because venom toxins act very quickly and the timing of administration is central to the clinical utility of any small-molecule approach.

      We agree with the reviewer that the next important step for this research activity is utilising murine preclinical models to validate the in vitro and preliminary in vivo findings described in this manuscript. However, as stated above, this study provides the initial evidence base that the promising utility of marimastat, DMPS and varespladib as repurposed snakebite drugs extends to a range of neotropical viper venoms. Evaluating the safety, efficacy (both preincubation and rescue approaches) and PK/PD relationships to inform optimal dosing strategies of these molecules will be crucial next steps for the field. However, these activities are far from trivial and will take several years of additional research, and therefore fall outside the scope of this initial manuscript.

      To address the concern that the evidence is limited to a chicken embryo model, we have included additional sentences discussing the wider use of the egg model within snakebite research and its translation to murine studies.

      Minor comments:

      (1) Figure 2D: How do you discuss the fact that "no venom" has SVSP activity?

      The data for all in vitro assays in Figure 2 are presented as AUC from the raw data (absorbance or fluorescence), for consistency across assays. Therefore, all assays (B to D) have a background signal in the absence of venom. The SVSP assay has a greater background signal.
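
      The point about background signal can be made concrete with a small sketch. It assumes the AUC is computed from the raw kinetic trace with the trapezoidal rule and without baseline subtraction; the response does not state the exact method, and the traces below are made-up numbers rather than data from the study.

```python
def trapezoid_auc(times, signal):
    """Area under a kinetic trace, computed with the trapezoidal rule."""
    return sum(
        (t1 - t0) * (s0 + s1) / 2.0
        for t0, t1, s0, s1 in zip(times, times[1:], signal, signal[1:])
    )


# Hypothetical absorbance reads at 0, 10, 20 and 30 minutes.
times_min = [0, 10, 20, 30]
no_venom = [0.20, 0.21, 0.22, 0.23]    # substrate alone: low but non-zero signal
with_venom = [0.20, 0.45, 0.70, 0.90]  # venom added: signal rises over time

background_auc = trapezoid_auc(times_min, no_venom)
venom_auc = trapezoid_auc(times_min, with_venom)

print(f"no venom AUC: {background_auc:.2f}, venom AUC: {venom_auc:.2f}")
```

      Because the raw trace never starts at zero, the "no venom" well has a positive AUC even when no enzymatic activity is present.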

      (2) For better understanding, I would suggest adding a dedicated column in Figure 4A with Nafamostat SVSP data reported as "N/D" where applicable.

      As stated in the results, EC<sub>50</sub> assessment was not justified due to the weak inhibitory activity; adding this column would therefore be redundant.

      (3) The introduction is too long relative to the experimental content and would benefit from tightening to sharpen the motivation and unmet need.

      We thank the reviewer for their opinion and we have reviewed the introductory section again. While we made minor edits throughout, we decided not to make substantial modifications to it.

      Reviewer #3 (Recommendations for the authors):

      I only have some minor comments:

      (1) In line 100, the word "that" is repeated.

      We thank the reviewer for spotting this error, which we have corrected.

      (2) Line 433. I believe the word "compromising" should be substituted by "comprising" here.

      We thank the reviewer for spotting this.

      (3) Figure 1 and supplementary: Bothrops asper venom has been very thoroughly studied, and using only one study from Costa Rica might underestimate the venom variation within the species. I suggest looking at the following study: https://doi.org/10.1016/j.toxicon.2022.106983. Maybe it is not necessary to change anything, but worth looking into.

      We appreciate the reviewer flagging this paper; it has been added to the manuscript (reference 48) and has provided additional data for Figure 1 and Supplementary Table 1.

      (4) Methods: Given the intraspecies variation described for some of these species, I believe it is relevant to add the locality of origin of the venoms, and not only the country. I, of course, understand this is often unknown for historical samples.

      We have included the following sentence in the methods: "Due to the historic nature of the venom samples, the source locality is not available beyond country of origin, with the exception of B. lanceolatus which is endemic to Martinique."

      (5) Figure 3: It is not very accurate to show an SD when the sample number is 2. I suggest, when possible, showing the mean and the two data points in the plots. This also applies to other figures where n=2. Also, in Figure 3D, does Marimastat seem to have an anticoagulant effect, or is this just within normal variation?

      We have removed the statement "Standard deviation (SD) for all kinetic reads and standard error for AUC is reported based on Prism v10" from the statistics paragraph of the methods, but kept the sentence "The sample sizes for HTS assays including the SVMP, PLA<sub>2</sub> and coagulation experiment are the average of the means from independent assays (n >2 within each independent assay)." We understand the reviewer’s opinion on the limited meaning of SD as well as SE for Fig 3 A to I; we have therefore changed the error bars to range, as we think that displaying the individual points would result in a lack of visual and analytic clarity.

      In relation to the query about marimastat anticoagulant effect in Fig 4D, as shown in 4B marimastat has no direct anticoagulant effect. The >100% inhibition for marimastat is likely to be normal variation as this is a biological assay which has high variability. However, it could also be that the strong inhibition of the SVMPs in B. asper along with limited SVSP activity has unmasked an anticoagulant effect of the remaining PLA<sub>2</sub> toxin which has high activity in this venom. That being said, as B. asper has a similar profile, we would have expected to see a similar profile in B. atrox in both the plasma and TEG assays. Therefore, assay variation seems the most likely reason for this observation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      We appreciate the reviewer's positive feedback on our work.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.

      We agree that distinguishing proper NCCs from their prerequisites or consequences is primarily a matter of experimental design and theoretical framework, not merely of recording modality. We did not mean to imply that intracranial recordings inherently solve this dissociation. This is now explicitly stated at the beginning of this section. Instead, we argued that the high signal-to-noise ratio and spatiotemporal accuracy of sEEG offer a stronger "testing ground" for the null findings often relied on by no-report paradigms. This is now also further clarified in the revised section “Limits of noninvasive measures”.

      We also explicitly acknowledge, as the reviewer noted, that even the most precise recordings require careful task dissociations to distinguish NCCs from their prerequisites and consequences.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      We agree that our interpretation of these studies (lines 265–271 of the previous version of the manuscript) was presented too definitively. We have modified the text (now lines 314-317) to soften this conclusion and align it with the more nuanced discussion later in the manuscript. Specifically, we now frame this as a "suggested dissociation" rather than a conclusive finding (line 730), and we explicitly acknowledge that alternative interpretations remain viable.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli.') Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to make from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      We thank the reviewer for pointing out this inconsistency. We agree that stating ">200 ms" conflicts with the findings of Fisch et al. (2009), who observed dissociations as early as ~150 ms. Our goal was to contrast the very early, stimulus-driven responses with the later responses that reflect consciousness. However, as the reviewer correctly notes, the exact "onset" of these signals varies across studies and paradigms. To address this, we have removed the specific ">200 ms" mentioned in line 245 of the previous version of the manuscript and updated the timing in line 284 to "starting 150 ms" to better reflect the results of Fisch et al. We also clarify that while the exact latency depends on the paradigm, a consistent finding is that activity representing conscious contents in higher-order visual cortex follows an initial wave of unconscious processes (lines 809-810).

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      It is true that many prominent theories of consciousness were developed based on macroscopic observations, largely due to the prevalence of non-invasive recordings in humans. However, we argue that recording single-unit activity (SUA) is important for several reasons, and we made this clearer in the revised version. First, signals like fMRI, EEG (or even LFP) often conflate multiple distinct neural populations. SUA allows us to dissociate neurons representing the percept from neighboring neurons involved in task-related confounds (e.g., motor preparation or arousal) that would otherwise be blurred together. Second, some percepts might be represented by sparse coding involving a small, specific population of "concept" or "percept" cells. Electrophysiological studies in animal models reveal that various cognitive processes are encoded within neuronal subspaces that only emerge when single-unit activity is analyzed as lower-dimensional projections of the broader neural activity manifold (Mante et al., 2013; Ebitz & Hayden, 2021; Jazayeri & Afraz, 2017). Importantly, many neural computations are only discernible through the lens of population dynamics (i.e., with single-neuron activity) (Vyas et al., 2020). We believe that providing high granularity through SUA recordings prevents over-aggregation of data, ensuring that even system-level theories can build on biologically accurate foundations.

      Moreover, some theories are defined at the cellular level. For instance, the Dendritic Integration Theory (Bachmann et al., 2020) posits that the integration of feedforward and feedback signals occurs at the level of individual pyramidal neurons. Without SUA, these cellular mechanisms remain untestable. Beyond spatial granularity, SUA also provides excellent temporal granularity, which is crucial for testing theories that rely on the precise timing of spikes (e.g., neural synchrony). As LFPs reflect average activity across populations, only SUA can confirm whether individual neurons lock their spikes to a specific phase, a mechanism hypothesized to bind features into a conscious whole.

      We added these points to a new section in the revised manuscript. References:

      Bachmann, T., Suzuki, M., & Aru, J. (2020). Dendritic integration theory: A thalamo-cortical theory of state and content of consciousness. Philosophy and the Mind Sciences, 1(II).

      Ebitz, R. B., & Hayden, B. Y. (2021). The population doctrine in cognitive neuroscience. Neuron, 109(19), 3055-3068.

      Jazayeri, M., & Afraz, A. (2017). Navigating the neural space in search of the neural code. Neuron, 93(5), 1003-1014.

      Mante, V., Sussillo, D., Shenoy, K. V., & Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474), 78-84.

      Vyas, S., Golub, M. D., Sussillo, D., & Shenoy, K. V. (2020). Computation Through Neural Population Dynamics. Annual Review of Neuroscience, 43(1), 249-275.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

      We thank the reviewer for this point. We have added a discussion of fMRI-EEG to the "Limits of noninvasive measures" section (lines 167-171). While we acknowledge that fMRI-EEG is a powerful non-invasive tool for bridging spatial and temporal scales, we note that it relies on merging an indirect metabolic signal with a weak electrophysiological one filtered by the skull, which is computationally complex and often noisy. In contrast, intracranial recordings provide direct measures of both local field potentials and spiking activity within the same neural population, offering interpretability and signal-to-noise ratio that non-invasive combinations cannot match. In our view, this is not just an alternative to these methods, but a unique means of accessing the underlying neuronal ground truth.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This includes population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      We thank the reviewer for acknowledging the strength of our work.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.

      We agree that the distinction between proper NCCs and their prerequisites or consequences is a fundamental challenge that affects all recording modalities. We did not intend to imply that intracranial recordings are a "silver bullet" for solving this conceptual problem in isolation, and we now explicitly state that at the beginning of this section (line 101).

      We have revised the section on "Distinguishing NCCs from their prerequisites or consequences" to clarify that intracranial recordings are a powerful tool when used in conjunction with appropriate experimental designs, rather than a standalone solution to these conceptual difficulties.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      It is true that a null result in an intracranial study may simply reflect that the relevant neural population was not sampled by the specific electrode implantation scheme. However, we argue that interpreting null results is equally, if not more, complicated in non-invasive methods, albeit for different reasons. While M/EEG offers broader coverage, it is blind to many cortical sources because of their orientation (radial sources in MEG) or their location in deep sulci and subcortical structures. The signal-to-noise ratio of M/EEG is also much lower than that of intracranial EEG, making it more likely that null results obscure the existence of subtle effects (Parvizi & Kastner, 2018).

      To address this, we revised the manuscript to clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We now explicitly emphasize that drawing conclusions from null results based on intracranial recordings requires caution regarding electrode placement. We also point out that these approaches are complementary: M/EEG can identify large regions of interest, while sEEG can then provide high-resolution "ground truth" to confirm whether those regions are part of the NCC.

      Reference: Parvizi, J., & Kastner, S. (2018). Promises and limitations of human intracranial electroencephalography. Nature Neuroscience, 21(4), 474-483. https://doi.org/10.1038/s41593-018-0108-2

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      We agree with the reviewer that the exact spatial scale of the NCC remains a topic of ongoing debate. However, we believe that the advantage of intracranial recordings holds true whether the NCC spans millimeters or centimeters. The main spatial limitation of non-invasive electrophysiology (M/EEG) is not just its spatial resolution but also the inverse problem. Since scalp sensors detect a mixture of signals from across the brain, different cortical configurations can produce identical scalp patterns. This makes it challenging to precisely locate the NCC or distinguish it from nearby activity (e.g., motor or attentional signals). When recording intracortically, a widespread NCC could be captured across multiple adjacent channels with high accuracy. Conversely, if the NCC is focal, it can be isolated with high spatial resolution. In either case, intracranial recordings eliminate the spatial ambiguity inherent in scalp recordings. We have revised the Introduction (lines 158-164) to clarify that the "spatial advantage" of intracranial recordings also pertains to the inverse problem, not merely to the ability to record from smaller cortical areas.

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      We thank the reviewer for raising this point regarding how intracranial data is often aggregated into regions of interest. We agree that if researchers generalize findings to large anatomical regions without accounting for single-channel recordings, some of the spatial benefits of intracranial recordings are indeed mitigated. We toned down some of the original claims accordingly, and acknowledged more clearly that clinical constraints of sEEG lead to sparse coverage (lines 245-249).

      However, we maintain that even when using an ROI-based approach, intracranial recordings offer a clear advantage over non-invasive methods, in that they represent a direct measure from a specific patch of tissue, rather than a statistical estimate that may be contaminated by "leakage" from distant sources. To address the reviewer’s concern, we have updated the manuscript (lines 244-245) to emphasize the importance of relying on MNI coordinates and individual anatomy rather than solely on broad ROI labels.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential of becoming a must-read for anyone working in the field of consciousness research.

      Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradicting evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      We thank the reviewer for stating the importance of our work and its potential contribution to the field.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      We agree that a clear definition of report is essential for the reader to interpret the empirical findings presented. We have added a definition to the Introduction (lines 108-111), specifying that we use "report" to refer to any explicit behavioral response (whether verbal, manual, or otherwise) that communicates a subject’s subjective state.

      Regarding the conceptual distinction between Phenomenal and Access consciousness, we refer to recent work from some of the co-authors (Mudrik et al., 2025), which suggests that P and A should not be seen as two types of consciousness, but rather as two necessary conditions for conscious experience. While a full discussion of this distinction is beyond the scope of this review, we now clearly state that our focus is on identifying neural activity that reflects the subjective experience itself, regardless of the downstream requirements of report.

      Reference: Mudrik, L., Faivre, N., Pitts, M., & Schurger, A. (2025). On a confusion about there being two types of consciousness. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2025.11.012

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      We agree that clarifying the distinction between contents and levels of consciousness early on provides a stronger framework for the paper.

      We have added a brief clarification in the Introduction (lines 63-76): "It is also helpful to distinguish between levels of consciousness, defined as a global level of arousal or wakefulness (e.g., being awake vs. under anesthesia), and the contents of consciousness, defined as the specific subjective experiences one has while conscious (e.g., perceiving a visual stimulus; Bayne et al., 2016; Laureys, 2005). While the majority of this review focuses on 'content-specific' NCCs, the two dimensions are intrinsically linked, as global states typically set the conditions for the occurrence of specific conscious contents."

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

      We thank the reviewer again for this highly positive assessment of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to reiterate that I believe this is a very scholarly piece of writing, and I congratulate the authors on producing such a useful and timely manuscript. Below, I suggest just a few ways the authors may resolve some of the issues I raised in the public review. However, I would like to emphasise that these are merely suggestions - the authors may think of different and better ways to address these comments that are more in line with either their thinking or writing style, and I would certainly encourage the authors to follow their own preferences if they feel they are at odds with my suggestions.

      For the longer comment questioning whether intracranial recordings are really a way to isolate NCCs from their pre- and post-processing, there are two ways the authors could resolve this. One is that they collapse the section distinguishing NCCs from their prerequisites and consequences into the previous section regarding limits of noninvasive measures. For instance, they could make the point that null results are easier to interpret with intracranial recordings in this previous section. Then they could discuss how specific intracranial studies have been able to resolve questions of pre-/post- processing confounds when they introduce studies later in the manuscript. At the moment, the Distinguishing NCCs from their prerequisites and consequences section, at least to me, undermines the argument of why intracranial recordings are important because it spends too much time describing how tasks are the core component of isolating pure NCCs, and not the recording method.

      Alternatively, the authors could keep the structure as it is. In this case, I would urge the authors to emphasise the role of intracortical recordings here and to make the argument that this is a problem that intracortical recordings (rather than novel tasks) can solve more convincingly. Citing specific studies that combined intracortical recordings with no-report paradigms and emphasising how the invasive recording allowed the researchers to reach a conclusion that would not have been possible with noninvasive measures would also be helpful.

      We thank the reviewer for these useful suggestions and agree that we would not want readers to take from this paper that design issues can be fixed by using invasive recordings. Because confounding issues are crucial in research on the NCC, we believe it is important to include a section on this topic in the Introduction. However, as we explained in our response to the public review, we revised the section introducing Human intracranial electrophysiology to reflect that intracranial recordings are a complementary tool that improves the interpretability of no-report paradigms, rather than a “silver bullet” solution for confound issues. We also explicitly say now that this problem is relevant to all techniques in the study of consciousness, including intracranial recordings (line 101). Additionally, based on the reviewer’s suggestion, we have added a more detailed explanation of how studies that pair intracranial recordings with no-report paradigms provide a unique insight in the Temporal Insights section (lines 822-823).

      For my comment: Drawing misleading conclusions from certain studies, I think the public review speaks for itself. I would recommend that the authors make sure they are drawing correct conclusions from the studies they cite, and make clear from the outset where there is ambiguity in interpretation.

      We thank the reviewer for bringing these ambiguities to our attention. As explained in the response to the public review, we have modified the text accordingly.

      Finally, with regard to the single-cell analyses, I would imagine that most readers will share at least some scepticism around single neurons being the appropriate level of analysis for revealing the basis of perceptual experience. As such, I think it would strengthen the manuscript greatly if the authors could provide a brief argument as to how such work can either inform theories of consciousness or contribute more generally to the study of NCCs, given that the field and its theories are mostly biased towards studying system-level neural processes. I think single-cell analyses are extremely valuable to NCC research, and the authors have a good opportunity to frame these studies accordingly.

      We agree. As detailed in the response to the public review, we now specify (1) how a higher level of granularity in electrophysiological measurements can distinguish between awareness-related signals and confounds, (2) that these measurements provide an opportunity to study neuronal population dynamics, where various cognitive processes have been shown to emerge in animals, and (3) that single-neuron measurements are necessary to test predictions of theories that are defined at the cellular level.

      Reviewer #2 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      My compliments for having written an impressive review. Overall, I think that this is a beautiful piece of work that will be of great use to the community. My only concern is that the advantages of intracranial recordings over non-invasive methods in solving the difficulties faced in the study of NCCs are overstated.

      Here I provide more precise comments for your consideration.

      (1) On page 5, lines 100 to 102, you argue that "Scalp EEG and MEG have limited anatomical resolution due to the overlap of deep and superficial brain signals at the scalp level and, in the case of EEG, the scattering of the adjacent electrical signals through the scalp". It would be good to provide precise estimates of the spatial resolutions of EEG, MEG and intracranial recordings, with accompanying references. Consider also that MEG is relatively insensitive to deep sources. I recommend this paper: Piastra et al. 2020 https://onlinelibrary.wiley.com/doi/10.1002/hbm.25272

      We thank the reviewer once again for their positive evaluation of our work. As detailed in the response to the public reviews, we now clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We thank the reviewer for their additional suggestions and have clarified our concern about the anatomical conclusions that can be drawn from scalp EEG and MEG data (lines 158-164).

      (2) On page 11, you describe work showing that activity in the occipitotemporal cortex might reflect a precursor to consciousness, but not an NCC proper, except for the case of faces, in which the fusiform seems to behave like a true NCC. Could you discuss how these seemingly contradictory results could be reconciled?

      One possibility is that activity in some parts of the occipitotemporal cortex instantiates content-specific NCCs, i.e., correlates that are only specific to certain stimulus types (in this case: faces), while activity in other parts instantiates precursors of the NCCs. Because faces have been extensively studied, we might have uncovered the content-specific NCCs for these stimuli but not for others. This is now discussed in the text on lines 342-344. Based on reviewer 1’s suggestion, we have also toned down our claim about occipitotemporal activity being a precursor to the NCC.

      (3) From line 322, you start to discuss connectivity analyses. Adding a subheading might improve readability.

      We appreciate the suggestion; however, adding a subheading to a single paragraph would require restructuring the entire section, which could disrupt the flow. We believe the current format maintains clarity and cohesion.

      (4) In line 329, you write "It remains unclear to what extent these connectivity patterns reflect post-perceptual processing and how the signals associated with perceptual consciousness in the occipitotemporal cortex interact with frontoparietal regions." But it's not clear why this is the case.

      We meant to make two separate points: (1) these studies did not control for report-related activity using no-report paradigms and (2) there has been no investigation so far of the interaction between occipitotemporal and frontoparietal signals associated with perceptual consciousness. These two points have been clarified in the text (lines 378-381).

      (5) In line 692, it would be good to clarify that Pereira 2021 is a single-neuron study.

      This has been clarified in the text.

      (6) The phrase "more research/work is needed" is repeated several times.

      Thank you for pointing this out. To avoid redundancy, we have deleted the second mention of this phrase.

    1. Quality at parity hasn't unlocked majority adoption. Plant-based nuggets — the format that has reached sensory parity in blinded testing — still hold only 2 to 3% of the conventional nugget category. If matching taste isn't sufficient, then taste investments alone may have lower returns than the parity-headroom argument suggests.

      Think about this more and state it in a more reasoned, logical way. Note that we're largely thinking about price here (as well as taste, nutrition and availability). We're largely focused on the impact of cost and price on consumption and substitution. In fact, skeptics were saying that "we don't care too much about substitution and price impacts because 1. it has such a low market share and 2. it's not taste or nutrition comparable."

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their constructive evaluation of our manuscript. We are pleased by the overwhelmingly positive consensus regarding the quality and significance of our data. In particular, the reviewers highlighted that this is a "nice, clean study with interesting data" and noted that our in vivo functional genetic findings in the Drosophila wing are "clearly a strength" that "moves the paper beyond cell-culture correlations" to provide a "simple, straightforward take-home message".

      The principal critique across the reports concerns the extent of direct mechanistic evidence linking Groucho (Gro) to regulation of the early elongation checkpoint. Several reviewers suggested additional genomic experiments, including RNA-seq, PRO-seq, or Pol II ChIP approaches, to further examine transcription and pausing behaviour. However, we would like to point out that genomic datasets addressing these questions across multiple Drosophila cell lines have already been published, both by our own group and by others.

      The primary objective of the current study is therefore not to replicate these existing genomic analyses, but rather to build directly upon them. We identify a consistent genomic association between Gro and pausing/elongation factors across cell types. Importantly, we extend these findings beyond genomic correlations through in vivo genetic analysis in the developing Drosophila wing.

      2. Description of the planned revisions

      Reviewer 1

      The figures and text could lay out the logic of the genetic interactions for non-Drosophila readers. For example, the comparison of single and double copies of Gro-RNAi to combinatorial knockdowns, when it is additive, and when it is interpreted as synergistic.

      The statistical analyses presented in Figure 5C, including Fisher’s exact tests comparing phenotype distributions between genotypes, were intended to address the distinction between additive and synergistic genetic interactions. However, we agree that the presentation of these comparisons could potentially be made clearer for readers less familiar with Drosophila genetic interaction assays. We would therefore be open to revising the presentation of Figure 5 and the accompanying explanatory text following editorial guidance and with consideration of the intended readership of the eventual journal.

      The statistical analysis of the phenotype distributions should be shown more clearly (Fig. 5B).

      Figure 5B is intended to present the distribution of observed phenotypic classes and does not include statistical comparisons. A similar analysis has been published for experiments examining the phenotypes of moderate Groucho overexpression in the wing in the presence of HDAC inhibitors (Winkler et al., 2010; doi:10.1371/journal.pone.0010166). Statistical analyses of the genetic interaction experiments are presented separately in Figure 5C. We therefore believe the current presentation of Figure 5B is appropriate for illustrating phenotype frequencies rather than statistical inference, but we will consider moving this panel to the Supplementary material.

      Minor comments

      -Figure 5 would gain clarity if the phenotype classes/panel letters were shown more clearly on the images.

      -The legends of the wing figures should be expanded, especially for readers outside the Drosophila field.

      -"in vivo" should be italicised consistently.

      We agree that clearer labelling of phenotype classes, panel annotations and expanded figure legends could improve the accessibility of Figure 5, particularly for readers less familiar with Drosophila wing phenotypes and genetic interaction assays. We would therefore be open to revising the presentation of this figure and its accompanying legends in a future revised version.

      We thank the reviewer for noting the typographical inconsistency of italics for in vivo. This will be corrected during manuscript revision and proofing.

      Reviewer #2

      Reviewer #2 (Significance (Required)):

      I think this is nice little paper providing a simple, straightforward take-home message. It does not conceptually shake the world, and the evidence consists of (nice) correlations, with no direct proof put forward for the conclusions. I am not a Drosophila geneticist but probably rather an 'expert' on basic transcription mechanisms. I think the data in the paper are of high quality, if limited in scope, and that the conclusions are supported by the results, but I do not think the results or conclusions will have a big audience. Having said that, I found it interesting to learn about this group of repressors and their likely mode of action.

      On the other hand, it is worth emphasizing that proteins such as NELF and CDK9 would arguably be expected to be found at very many genes, as promoter-proximal pausing does exist at a plethora of genes, also genes that are house-keeping genes, i.e. not regulated by cell type or stimuli. So, lots of genes with pausing are not regulated by modulation of pausing. So, basically, the fact that knockdown of the repressor Groucho and loss of pausing is additive does not in my opinion necessarily mean that Groucho works by stabilizing pausing. Although it is admittedly a reasonable assumption, Groucho could also work by repressing transcription initiation; the genetic outcomes of 'double relief' would be the same, i.e. higher transcription levels. I think a brief comment to this effect might be appropriate, especially in the absence of (difficult to obtain) direct evidence that the transcription initiation step is not affected by Groucho.

      While we agree that the current study does not directly exclude possible effects of Groucho on transcription initiation, previously published work has already provided evidence arguing against repression by Groucho occurring primarily through inhibition of transcription initiation or prevention of pre-initiation complex assembly. Groucho-bound transcriptional start sites were previously shown to retain RNAP II occupancy, active chromatin features, and detectable basal transcriptional activity despite repression (Kaul et al., 2014).

      To acknowledge this possibility and explain why it is unlikely, we will add the sentence “While effects on transcription initiation cannot be completely excluded, previous work argues against Gro repressing transcription primarily through inhibition of transcription initiation. Gro-bound promoters remain accessible, overlap RNAP II occupancy, and retain active chromatin features and basal transcriptional activity” to the start of the third paragraph of the Discussion.

      Reviewer #3

      The methods section is lacking details on how ChIP-seq was performed in the BG3 cell line. The methods section does a good job of indicating how the data were processed. Information on the antibodies and conditions used is critical, as is whether spike-in controls were used.

      The generation of the ChIP-seq data from BG3 cells has already been published. We will add the line “The production of ChIP-seq datasets for Gro binding in Kc167, S2R+ and BG3 cells has been described elsewhere (Kaul, Schuster and Jennings, 2014; Bar-Cohen et al., 2023)” in the Analysis of ChIP-seq data subsection of the Methods.

      1. Description of analyses that authors prefer not to carry out

      Reviewer #1

      Major comments

      1. The main weakness is the lack of a mechanistic link between Gro and the early elongation checkpoint. This is really the main point for this reviewer. The manuscript builds an interesting model, and the data support a functional connection between Gro and pausing-related factors, but the mechanistic link is absent. At present, the paper relies on co-localisation of ChIP peaks and genetic interaction in vivo. This is interesting and supportive, but with several possible interpretations. The title and some parts of the text are thus a bit stronger than what is directly demonstrated. Two possibilities could be proposed: either tone down the mechanistic claim or strengthen it experimentally. A more direct assay of pause release or productive elongation after Gro depletion at endogenous targets would be highly valuable. For example, Gro-KD followed by Pol II Ser2-P ChIP, or promoter vs. gene body analysis on Gro-bound genes, ideally comparing genes with Gro at TSS vs. not-TSS, would greatly support the proposed model. If the assay is established, this seems feasible in about 4 months.

      We thank the reviewer for this thoughtful comment. We agree that the current study does not directly measure genome-wide RNAP II pause release following Gro depletion. However, several key observations linking Gro with promoter-proximal pausing have already been published and are summarised in the Introduction. Previous work demonstrated that Gro occupancy correlates with paused genes and that depletion of Gro reduces RNAP II pausing and increases elongating RNAP II at the endogenous E(spl)mbeta-HLH locus, an established target gene of Groucho-mediated repression (Kaul et al., 2014; doi.10.1371/journal.pgen.1004595). We also note that several of the experiments proposed by the reviewer have already been addressed in previous work. Specifically, Kaul et al. (2014) demonstrated that Gro depletion increases elongating RNAP II (Ser2-P) at the endogenous E(spl)mbeta-HLH locus while total promoter-associated RNAP II occupancy remains largely unchanged. Promoter versus gene body analyses in that study further supported a role for Gro in regulating progression through the early elongation checkpoint rather than transcription initiation.

      The aim of the current manuscript was therefore to build upon these earlier mechanistic and genomic observations by asking whether the relationship between Gro and pausing-associated factors extends across multiple cell types and whether it has functional significance in vivo. By integrating comparative genomic analyses with sensitised developmental genetic assays in the wing, we provide evidence that Gro functionally interacts with multiple regulators of the early elongation checkpoint during development.

      The bioinformatic part could be strengthened on "distinct TF repertoires" between cell types. The authors interpret the cell type-specific Gro recruitment as reflecting distinct transcription factor repertoires in BG3, Kc167 and S2R+ cells. This is interesting, but not really shown. To make this point more strongly, the authors could provide a map of TF expression across different cell types, especially for the TFs corresponding to the enriched motifs they discuss. Otherwise, this remains speculative. In line with this, the manuscript discusses enriched motifs in BG3 and compares them to Kc167 and S2R+ cells, but this remains a bit descriptive. A clearer side-by-side comparison would strengthen the paper. This is particularly relevant to the motifs used in interpreting cell type-specific recruitment.


      The interpretation that cell type-specific Gro recruitment reflects differences in transcription factor repertoires is based on several previously established observations already described in the manuscript. BG3 cells are derived from the larval CNS, whereas Kc167 and S2R+ cells are embryonic haemocyte-like lines (Cherbas et al., 2011; doi.10.1101/gr.112961.110). Transcriptomic analyses have further shown that these Drosophila cell lines maintain stable and distinct lineage-associated transcriptional identities, including differences in transcription factor expression (Cherbas et al., 2011). Given the diversity of transcription factors known to recruit Gro, the observed cell-type-specific binding patterns and motif enrichments are consistent with the distinct lineage-associated transcriptional programmes previously described for these cell lines.

      1. Several overlap analyses could be discussed more in depth. A few statements feel too strong for the actual percentages. For example, the GAF overlap in BG3 is around 51% genome-wide and 56% at TSS, which is meaningful, but not especially high. The text already states that it is not universal, and this point could be discussed more clearly.

      We note that the manuscript already explicitly states that overlap between Gro and GAF is not universal. Given the diversity of factors known to recruit Gro and the broad genomic distribution of GAF, we consider overlap frequencies of approximately 50% to represent a substantial association, particularly at transcription start sites. Importantly, the interpretation does not rely on complete co-occupancy between these factors, but rather on the observation that Gro-bound regions show significant enrichment for multiple factors associated with promoter-proximal pausing across different cell types.

      Similarly, for the UpSet plot, the wording around the "most frequent" combination could be toned down, because this is not a dominant pattern.

      The statement that the overlap between Gro, Nelf-E, GAF, Cdk9 and RNAP II represents the “most frequent” combination refers specifically to the relative frequency of the intersection categories within the UpSet analysis. In this context, the overlap between all five factors represents the largest intersection category identified (306 of 649 Gro peaks), with the next most frequent category containing substantially fewer peaks (90 of 649). We therefore feel that the current wording accurately describes the distribution observed in the analysis.
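
      To make the notion of an "intersection category" concrete, the sketch below counts peaks by their exact combination of overlapping factors, which is how an UpSet plot partitions a peak set, and then expresses the two counts quoted above as fractions of the 649 Gro peaks. The peak IDs and the toy overlap table are hypothetical; only the counts 306, 90 and 649 come from the response.

```python
from collections import Counter


def upset_counts(overlaps: dict[str, set[str]]) -> Counter:
    """Count peaks per exclusive intersection category.

    `overlaps` maps each Gro peak ID to the set of factors whose peaks
    overlap it. Each peak is counted once, under its exact combination.
    """
    return Counter(frozenset(factors) for factors in overlaps.values())


# Toy input with hypothetical peak IDs, for illustration only.
toy = {
    "peak1": {"Nelf-E", "GAF", "Cdk9", "RNAP II"},
    "peak2": {"Nelf-E", "GAF", "Cdk9", "RNAP II"},
    "peak3": {"GAF"},
    "peak4": set(),
}
counts = upset_counts(toy)
top_combo, top_n = counts.most_common(1)[0]
print(sorted(top_combo), top_n)

# Relative sizes of the two categories quoted in the response.
largest = 306 / 649       # intersection of all factors
next_largest = 90 / 649   # next most frequent category
print(f"{largest:.1%} vs {next_largest:.1%}")
```

      On these numbers the largest category accounts for roughly 47% of Gro peaks and the next for roughly 14%, so "most frequent" describes a plurality rather than a majority of peaks.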

      More generally, I think the manuscript needs a clearer quantitative breakdown of TSS versus non-TSS peaks for the overlap analyses with NELF, GAF, Cdk9 and CycT. Several interpretations depend on this distinction, and right now, this is not always clear enough.

      The overlap analyses presented in Figure 3 explicitly distinguish between TSS and non-TSS peaks, and the corresponding quantitative overlap frequencies are described in the Results section. We do not consider that additional breakdowns are required for interpretation of the current data as this distinction is already incorporated into both the analyses and figure presentation.


      The "enhancer chromatin" interpretation is interesting, but not fully integrated with the genomic distribution. The observation that Gro is enriched in open enhancer-type chromatin is interesting and supports the idea that Gro does not act mainly through classical repressed chromatin. However, Gro peaks are also enriched at promoters and introns, and this reviewer feels that the manuscript does not fully connect these observations. Where are these enhancer-type peaks located exactly? Are they often intronic? Can this be correlated with the distribution of Gro peaks? This would help the reader and also strengthen the discussion because intronic Gro peaks are present in the data, but are not well integrated into the model.

      In the current manuscript, “enhancer chromatin” refers to chromatin states defined by combinations of enhancer-associated histone modifications, including H3K4me1, H3K27ac and H3K56ac as defined by Skalska et al., 2015 (doi.10.15252/embj.201489923), rather than exclusively to distal intergenic regulatory regions. As described in the chromatin-state analysis, these enhancer-associated chromatin signatures do occur at intronic regulatory regions, including regions classified as active intron chromatin. We therefore do not consider the enrichment of Gro peaks at promoters, enhancers and intronic regions to be mutually exclusive observations within this framework.

      Intronic enhancer localisation is common in Drosophila, where the compact organisation of the genome results in many developmental regulatory elements residing within introns (Arnold et al., 2013; doi.10.1126/science.1232542). We therefore consider the presence of Gro peaks within intronic regions to be fully consistent with the observed enrichment of Gro binding within enhancer-associated chromatin states.

      The in vivo part is a strength, but some important points need clarification. The in vivo section is a clear highlight of the manuscript. It gives functional relevance to the model and moves the paper beyond cell-culture correlations. That said, a few points need to be clearer:

      -RNAi efficiency is not clear for the tested genes, especially the pausing factors. This is important because the differential effects between NELF subunits could simply reflect differences in knockdown efficiency.

      While differences in RNAi efficiency could potentially contribute to variation in phenotype strength between individual knockdowns, multiple biological explanations could also account for the differing effects observed between NELF subunits, including differences in protein stability, residual complex activity, or subunit-specific functions. Importantly, the central conclusion of the manuscript does not depend on quantitative comparison of phenotype strength between individual NELF components, but rather on the observation that perturbation of multiple pausing-associated factors genetically interacts with Gro in vivo.

      If RNAi validation is possible with existing reagents, this seems realistic within 3 months.

      The manuscript focuses on the genetic interactions observed between Gro and pausing-associated factors in vivo rather than on quantitative comparison between individual RNAi lines. As no specific validation experiments were proposed, we are not currently planning additional RNAi validation analyses for the present study.

      The discussion could be expanded, especially because the mechanism is not fully shown. Since the direct mechanism is still missing, the discussion could compensate. Right now, the proposed model is interesting, but it still leaves many open questions. For example:

      -Is Gro affecting the recruitment or activity of elongation factors?

      -Could looping or enhancer-promoter communication contribute?

      -How should the intronic Gro peaks be interpreted in the model?

      -In the wing, could the phenotype be discussed more mechanistically, in light of what is already known about Gro and derepression of vein-promoting genes?

      For example, a model figure could help here.


      We thank the reviewer for these thoughtful suggestions.

      Several of the points raised by the reviewer are discussed in the manuscript already. For example, we discuss the possibility that Gro influences the activity or recruitment of elongation-associated factors. We agree that enhancer-promoter communication and chromatin looping are a plausible component of this mechanism. As the Drosophila genome is compact and intronic enhancers are highly prevalent, topological looping provides a clear physical framework for how Gro molecules distributed at non-TSS sites regulate promoter-proximal machinery. Indeed, we have previously published this model (Kaul, Schuster, and Jennings, 2015; see Figure 1C; doi.10.1080/21541264.2014.1000709). Our current in vivo and genomic findings build directly upon this model, suggesting that within these established looped configurations, Gro acts locally to interface with and stabilize the pausing machinery.

      With respect to the wing phenotypes, the Discussion focuses primarily on the interpretation of the observed genetic interactions between Gro and pausing-associated factors rather than on defining the precise downstream target genes contributing to vein phenotypes. We agree that additional mechanistic dissection of these developmental phenotypes would be interesting. However, this would require a substantial expansion of the study into the detailed developmental and signalling mechanisms underlying vein specification, which lies beyond the primary focus of the current manuscript.

      OPTIONAL: It would be interesting to know whether the same peak distribution / functional logic is observed in mammalian TLE orthologs. This is not essential for the current conclusions, but it would broaden the impact.

      Determining whether similar genomic distributions and functional relationships are conserved for mammalian TLE orthologues will be an important future project. However, relatively little comparable genome-wide TLE occupancy data are currently available, meaning that such analyses would require a substantial independent undertaking beyond the scope of the present study.

      Minor comments

      -Please explain why promoters were defined as ±250 bp from the TSS. This seems rather narrow.

      Promoters were defined as ±250 bp from annotated transcription start sites. This window size is commonly used in Drosophila genomic studies, where the compact organisation of the genome means that broader windows frequently overlap adjacent genes.
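
      The ±250 bp definition can be sketched as a simple interval test. This is an illustrative assumption about how a peak might be assigned to the TSS class (any overlap with the window, 1-based inclusive coordinates, single chromosome); the response does not state the exact rule, and the coordinates and function name below are hypothetical.

```python
PROMOTER_HALF_WIDTH = 250  # bp on either side of the TSS, as stated in the response


def overlaps_promoter(peak_start: int, peak_end: int, tss_positions: list[int]) -> bool:
    """Return True if the peak overlaps any TSS +/- 250 bp window.

    Coordinates are assumed to be 1-based, inclusive, on one chromosome.
    """
    for tss in tss_positions:
        window_start = tss - PROMOTER_HALF_WIDTH
        window_end = tss + PROMOTER_HALF_WIDTH
        # Two closed intervals overlap when each starts before the other ends.
        if peak_start <= window_end and peak_end >= window_start:
            return True
    return False


# Hypothetical coordinates for illustration only.
tss_list = [10_000, 25_000]
near_tss = overlaps_promoter(9_600, 9_760, tss_list)     # reaches into a window
far_from_tss = overlaps_promoter(12_000, 12_400, tss_list)  # far from both TSSs
print(near_tss, far_from_tss)
```

      Widening `PROMOTER_HALF_WIDTH` would reclassify more peaks as TSS-associated, which is why a narrow window is the more conservative choice in a compact genome.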

      -Please clarify why S2R+ cells are included in the comparative part but are not followed in the same way in some downstream analyses.

      S2R+ cells were included in the comparative analyses to determine which aspects of Gro recruitment were shared across multiple cell types and which were cell-type specific. Some downstream analyses focused on BG3 and Kc167 cells because these lines had the most extensive corresponding datasets available for the chromatin and pausing-factor analyses performed in the current study.

      Reviewer #3

      Here Martínez Quiles and Jennings investigate the role of the Groucho repressor in BG3 cells. This extends a previous study that used S2R+ cells, published previously by one of the authors, as well as Kc167 cells. They find that Gro is recruited to gene promoters in a cell-type-specific manner. Gro associates with open chromatin, is mostly associated with enhancer regions, and is primarily excluded from regions of the genome that are repressed by Polycomb. After studying its function in cell culture, the authors investigate the role of Gro in a wing-specific background. The findings here are mostly correlative, showing that loss of Gro results in stronger phenotypic defects when combined with loss of factors including NELF-B or NELF-D, LARP7, and bin3. They propose that Gro acts to attenuate gene expression during early gene expression. This claim would be greatly strengthened if the authors provided RNA-seq data in addition to the ChIP-seq data shown in this manuscript, especially to examine gene expression patterns among the different cell lines studied here. At present, this is a correlative study that does not illuminate the mechanism of Gro in directly regulating promoter-proximal pausing or RNA polymerase behavior.

      We thank the reviewer for this suggestion. However, extensive transcriptomic analyses of Drosophila cell lines, including Kc167, S2R+ and BG3-derived lines, have already been published (Cherbas et al., 2011), together with RNA-seq analyses following Gro depletion (Kaul et al., 2014). In addition, the association between Gro occupancy and paused genes has also been reported previously (Kaul et al., 2014; Chambers et al., 2017; doi.10.1186/s12864-017-3589-6).

      While additional RNA-seq analyses could further characterise transcriptional differences between cell lines, RNA-seq alone would not directly determine whether altered transcript levels arise specifically through changes in promoter-proximal pausing, as opposed to effects on transcription initiation, transcript stability, or indirect downstream regulatory effects. We therefore do not consider additional RNA-seq analyses necessary to support the central conclusions of the present study.

      Figure 2-3: For the ChIP-seq data, scale the y-axis in the same manner to better understand enrichment between the samples.

      Because these ChIP-seq datasets were generated independently using different antibodies and experimental conditions, direct comparison of enrichment magnitudes across datasets would not be biologically meaningful. Accordingly, our analyses focus on significant peak calls and overlap relationships rather than relative signal intensity. Applying identical y-axis scaling across all tracks would obscure significant enrichment in several datasets and could therefore be misleading.

      RNA-seq data comparing the different cell lines would greatly enhance the authors' findings, as would PRO-seq to really show a relationship between Gro binding and promoter-proximal pausing.

      We note that RNA-seq datasets for Gro depletion in Kc167 and S2R+ cells have already been published previously (Kaul et al., 2014), together with evidence linking Gro occupancy to paused genes (Kaul et al., 2014; Chambers et al., 2017). We therefore do not consider that additional RNA-seq analysis would substantially strengthen the central conclusions of the current manuscript.

      Moreover, RNA-seq alone cannot distinguish whether altered transcript abundance reflects changes in promoter-proximal pausing or other mechanisms influencing transcript abundance. While PRO-seq approaches could provide further mechanistic information regarding RNAPII dynamics, such experiments are beyond the scope of the present study.

      This study helps to further clarify how Gro binds DNA in different cell types and indicates that it may intersect with factors involved in promoter-proximal pausing. The study is highly correlative and would require additional work to show a mechanistic link between Gro and transcription attenuation due to promoter-proximal pausing.

      While we agree that PRO-seq approaches could provide additional mechanistic information regarding RNAPII dynamics, establishing an appropriate experimental and analytical framework for these analyses would require a substantial extension beyond the scope of the present study. In addition, several aspects of the relationship between Gro occupancy, transcriptional repression, and promoter-proximal pausing that underpin these suggestions have already been addressed in previously published work, including RNA-seq analyses following Gro depletion (Kaul et al., 2014), evidence linking Gro occupancy with paused genes (Kaul et al., 2014; Chambers et al., 2017), and studies demonstrating that Gro-mediated repression does not occur through inhibition of pre-initiation complex assembly. The current manuscript is therefore intended to build upon these existing findings by integrating comparative genomic analyses with new in vivo genetic interaction data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study sought to investigate the role that early childhood malaria exposure plays in the development of antibody responses to unrelated pathogens and vaccine-derived antigens in Kenyan children. In this natural experiment, the authors compare antibody levels among children who have been exposed to different levels of malaria transmission by using protein microarray technology. Although the findings are of importance, the evidence remains incomplete, and the analysis would benefit from a more in-depth evaluation of potential confounders. With the appropriate analysis, the findings will be of great interest for global health, immunology, and vaccine development.

      We thank the editors for highlighting the need for a more comprehensive evaluation of potential confounding. We agree that this is a critical aspect of the study and have now undertaken additional analyses to address this directly.

      The original longitudinal cohort was designed to investigate the acquisition of naturally acquired immunity to malaria and did not include systematic collection of anthropometric/nutritional, environmental or socioeconomic data, precluding direct adjustment for these factors within the primary dataset. However, to assess whether there were population-level differences in these factors, we leveraged contemporaneous hospital-based surveillance data from the same geographic regions, which include measurements of anthropometry and nutritional status (mid-upper arm circumference [MUAC], weight-for-age, and height-for-age) and detailed infection diagnostics.

      Using this independent dataset, we fitted mixed-effects regression models adjusting for age, calendar year, and concurrent infections (RSV, parainfluenza, influenza A, human metapneumovirus, OC43). For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya. Adjusted differences were small and centred around zero (MUAC: −0.12, 95% CI −0.38 to 0.15; weight-for-age: −0.05, 95% CI −0.28 to 0.19; height-for-age: 0.08, 95% CI −0.17 to 0.33), with no consistent directional effect.
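
      The reported intervals can be checked with a few lines of arithmetic. The sketch below takes the three adjusted differences and their 95% confidence intervals from the paragraph above, tests whether each interval spans zero, and backs out an approximate standard error. The standard-error step assumes a symmetric Wald-type interval with a normal quantile of 1.96, which the response does not state.

```python
# Adjusted differences between Junju and Ngerenya with 95% CIs, as quoted above.
estimates = {
    "MUAC": (-0.12, -0.38, 0.15),
    "weight-for-age": (-0.05, -0.28, 0.19),
    "height-for-age": (0.08, -0.17, 0.33),
}

Z_95 = 1.96  # normal quantile; assumes a symmetric Wald-type interval


def summarise(estimate: float, lower: float, upper: float) -> dict:
    """Check whether the CI spans zero and back out an approximate SE."""
    return {
        "estimate": estimate,
        "spans_zero": lower <= 0.0 <= upper,
        "approx_se": (upper - lower) / (2 * Z_95),
    }


summary = {name: summarise(*values) for name, values in estimates.items()}
for name, row in summary.items():
    print(f"{name}: spans zero = {row['spans_zero']}, SE ~ {row['approx_se']:.3f}")
```

      All three intervals include zero, which is consistent with the statement that there is no evidence of a systematic difference; it does not by itself show that the groups are equivalent.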

      As the longitudinal cohort was randomly selected from these underlying populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and did not differ in their exposure to the infections included in the analysis. We have incorporated these analyses into the revised manuscript (adding a new figure focused on this analysis, Fig. 6, and updating the statistical analysis and discussion sections), and believe they substantially strengthen the evidence by addressing a key source of potential confounding.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study shows that childhood malaria can weaken the antibody response to other vaccines and infections. This suggests that early exposure to P. falciparum may have a long-lasting effect on immunity, with implications for vaccine efficacy in endemic areas.

      Strengths:

      This study stands out for its longitudinal design, the use of robust immunological techniques, and the comparison between areas with different levels of malaria exposure. Its findings reveal that early malaria can weaken the response to childhood vaccines, with important implications for public health in endemic regions.

      We thank the reviewer for this comment.

      Weaknesses:

      One of the study's main limitations is the lack of functional data confirming the clinical impact of the low antibody levels. Furthermore, although multiple immune responses were measured, other important components, such as cellular immunity, were not assessed. Furthermore, the results may not be generalizable to other regions.

      We thank the reviewer for this important comment and agree that the absence of functional immunological assays is a limitation of the current study. Our analysis was designed to determine whether early-life malaria exposure is associated with durable alterations in antibody responses to unrelated pathogens and vaccine antigens, rather than to establish the downstream functional consequences of these differences. As such, the study is able to demonstrate a broad and persistent attenuation of humoral responses but cannot directly determine whether the lower antibody levels observed translate into reduced neutralising capacity or diminished protection at the individual level.

      We have revised the manuscript to make this distinction more explicit. In the revised discussion, we now state that although reduced antibody titres to vaccine-preventable pathogens may have implications for long-term protection, the clinical significance of these differences remains to be established in future studies incorporating functional assays and clinical outcome data.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether early-life malaria exposure has long-term effects on immune responses to unrelated antigens. They leveraged a natural experiment in coastal Kenya where two adjacent communities (Junju and Ngerenya) experienced divergent malaria transmission patterns after 2004. Using 15 years of longitudinal data from 123 children with weekly malaria surveillance and annual serological sampling, they measured antibody responses to multiple pathogens using a protein microarray technology and ELISA.

      Strengths:

      (1) Extensive longitudinal data collection with weekly malaria surveillance, enabling precise exposure classification.

      (2) Use of a natural experiment design that allows for causal inference about malaria's immunological effects.

      (3) Broad panel of antigens tested, demonstrating generalized rather than antigen-specific effects.

      (4) Within-cohort analysis in Ngerenya controls for geographic and environmental factors.

      (5) Validation of key findings using both serologic microarray and ELISA.

      (6) Important public health implications for vaccine strategies in malaria-endemic regions.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) Lack of participants' characteristics (socio-economic, nutritional, physical).

      We thank the reviewer for this important comment. We have now included a detailed summary of participant characteristics in Table 1 to provide context for the study population. This includes key demographic and longitudinal variables stratified by cohort (Junju and Ngerenya), including sex distribution, age at study entry and exit, duration of follow-up, number of visits per participant, and total serum samples analysed. Detailed data on socio-economic status, nutritional status, and other environmental or physical characteristics were not consistently available across all participants and time points, and therefore could not be included. This has now been explicitly stated as a limitation in the discussion.

      (2) Somewhat limited sample size (longitudinal analysis of 123 children total), with further subdivision reducing statistical power for some analyses.

      We thank the reviewer for this important observation. The study is based on an intensively followed cohort with weekly malaria surveillance and repeated serological measurements throughout childhood, allowing detailed characterisation of early-life exposure and subsequent immune trajectories. This depth of longitudinal sampling provides resolution that is not achievable in larger cross-sectional studies. We acknowledge that subdivision of the cohort reduces statistical power for some analyses. Nevertheless, the key findings were consistent in several independent comparisons, including a reduction in antibody levels for a broad panel of antigens in the malaria-endemic setting, within-cohort analyses in Ngerenya that replicated this observation, and the confirmation of results generated on the protein microarray on the ELISA platform. The consistency of these findings across analytical approaches and measurement platforms reduces the likelihood that the observed effects are driven by small-sample variability. We have clarified this point in the revised discussion to emphasise that the strength of the study lies in the depth and longitudinal resolution of the data rather than the absolute sample size.

      (3) Potential confounding by unmeasured socioeconomic, nutritional, or environmental factors between communities.

      We thank the reviewer for this important point and agree that residual confounding between communities must be considered. As outlined in response to the editorial assessment, we have undertaken additional analyses using contemporaneous population-level data from the same regions and found no evidence of systematic differences in anthropometric indices between children from Junju and Ngerenya after accounting for age, calendar year, and concurrent infections, with effect estimates small and confidence intervals crossing zero. In addition, the within-Ngerenya analysis provides an internal comparison within a shared geographic and environmental setting, reducing the likelihood that unmeasured socioeconomic or environmental differences between communities account for the observed associations. The new analyses suggest that major population-level differences in nutritional status or infection burden are unlikely to explain the observed patterns. We have clarified this point in the revised discussion and explicitly acknowledge the possibility of residual confounding.

      (4) Lack of ability to determine the direction of the associations found between malaria exposure and other IgG levels to unrelated pathogens.

      We agree that, as an observational study, our analysis cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. However, several features of the study design strengthen the inference that early-life malaria exposure contributes to the observed differences. First, malaria exposure was characterised prospectively through intensive weekly surveillance, allowing precise temporal definition of exposure in early childhood. Second, within the Ngerenya cohort, where children were exposed to different levels of malaria due to a rapid decline in transmission, those with even limited early-life exposure exhibited lower antibody responses at 10 years of age than malaria-naïve peers, despite residing in the same geographic and environmental context. In addition, we now show that these differences are not confined to a single timepoint but are evident across the full longitudinal follow-up after adjustment for age and repeated measurements. While we cannot exclude the possibility of residual confounding or bidirectional relationships, the convergence of evidence from the natural experiment design, within-cohort contrasts, and age-adjusted longitudinal analyses supports early-life malaria exposure as a key contributor to long-term differences in antibody responses. We have clarified this in the discussion.

      (5) Despite good longitudinal data, the main analysis was conducted as a cross-sectional analysis at age 10 for many comparisons, which limits the understanding of temporal dynamics.

      We thank the reviewer for highlighting this point. While age 10 was initially used as a standardised reference point for cross-sectional comparisons, the underlying dataset is longitudinal, with repeated antibody measurements across childhood. To address this more directly, we have now complemented these analyses with antigen-specific mixed-effects regression models incorporating all available longitudinal data, with adjustment for age and a random intercept for repeated measurements within individuals. These models demonstrate that the differences between cohorts are not confined to the age-10 cross-section but are evident in an age-adjusted longitudinal framework for multiple antigens. We have retained the age-10 comparisons for reference, but the primary inference is now based on the longitudinal mixed-effects analyses. These changes are reflected in the revised results and statistical analysis sections. We thank the reviewer for this astute point, which we think has substantially improved the manuscript.
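
      For illustration, a random-intercept model of the kind described above can be written as follows. This is a sketch only: the symbols, the linear age term, and the normality assumptions are illustrative choices and are not taken from the manuscript, whose exact specification may differ.

```latex
% y_{ij}: antibody response of child i at visit j (for one antigen)
% cohort_i: indicator for cohort membership; age_{ij}: age at visit j
% u_i: child-specific random intercept capturing repeated measurements
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{cohort}_i + \beta_2\,\mathrm{age}_{ij} + u_i + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      Under this form, \beta_1 is the average age-adjusted difference between cohorts across follow-up, and children may contribute different numbers of visits.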

      (6) Statistical analysis is limited to univariable comparisons without consideration for confounders or adjusting for multiple comparisons.

      We agree that the original analyses relied primarily on univariable comparisons. In the revised manuscript, we have extended the analytical framework to include mixed-effects regression models that account for age effects and repeated measurements within individuals. These models estimate the average age-adjusted difference in antibody responses between cohorts across the full follow-up period. We have also applied false discovery rate (FDR) correction to account for multiple antigen testing. For multiple antigens, the direction and magnitude of cohort differences remain consistent under this approach, strengthening the robustness of the findings beyond the original univariable comparisons. These analyses have been incorporated into the revised results and statistical analysis sections.
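
      As an illustration of the multiple-testing step, the sketch below applies the Benjamini-Hochberg procedure, one common way to control the FDR, to a list of per-antigen p-values. The response does not state which FDR method or software was used, so the choice of Benjamini-Hochberg, the function name, and the example values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch only; the FDR method used in the manuscript
    is not specified in this response.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four antigens.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Reporting the unadjusted and adjusted values side by side shows which antigen-level associations remain significant after correction.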

      (7) No mechanistic understanding of how early malaria exposure creates lasting immunosuppression.

      We agree that this study does not directly resolve the mechanistic basis underlying the observed long-term differences in antibody responses. The primary aim of this work was to identify and characterise durable alterations in humoral immune profiles associated with early-life malaria exposure, rather than to define the cellular or molecular pathways involved. However, our findings are consistent with a growing body of experimental and clinical literature suggesting that malaria infection can induce sustained perturbations in B cell and T cell compartments, including the expansion of atypical memory B cells, altered germinal centre responses, and increased regulatory immune activity. These mechanisms have been proposed to impair the generation and maintenance of effective humoral immunity. In the revised discussion, we have clarified that the mechanistic basis of this phenomenon remains to be fully defined and have expanded the discussion of plausible pathways in light of existing literature. We now explicitly position our findings as providing population-level evidence of a durable immunological phenotype that warrants further mechanistic investigation.

      (8) No understanding of the clinical implications of the reduced IgG levels observed in the area with high malaria exposure.

      We agree that this study does not directly establish the clinical consequences of the reduced antibody levels observed in malaria-exposed children. The primary objective of this study was to characterise long-term differences in humoral immune profiles associated with early-life malaria exposure, rather than to assess downstream clinical outcomes or functional antibody activity. We have clarified this limitation in the revised discussion. Nevertheless, the breadth and consistency of the observed differences for multiple vaccine-preventable and infectious antigens raise the possibility that early-life malaria exposure may have implications for long-term immune protection. We now emphasise in the revised discussion that future studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.

      Assessment of Claims:

      The data appear to support the authors' primary claims, but the strength of the evidence is limited, and the results should be interpreted with caution. Together with the currently available evidence of P. falciparum's impact on the host's immune function, this natural experiment design provides further evidence for a relationship between early malaria exposure and reduced antibody responses. The within-Ngerenya analysis controls for geographic factors and thus enhances the quality of the evidence, however, it still fails to account for the physical, nutritional, and socio-economic factors that may have driven the observed changes. Additionally, the mechanism underlying this effect remains unclear, and the clinical significance of reduced antibody levels is not established.

      We thank the reviewer for this assessment and for recognising the strengths of the natural experiment design and within-cohort analyses. We agree that, as an observational study, our findings should be interpreted appropriately. In the revised manuscript, we have undertaken additional analyses and clarifications to strengthen the evidential basis of our conclusions and to address the points raised. To address potential confounding by nutritional and related factors, we analysed contemporaneous hospital-based surveillance data from the same geographic populations, since nutritional and socioeconomic variables were not consistently collected during longitudinal follow-up. For three independent anthropometric indices of nutritional status (mid-upper arm circumference, weight-for-age, and height-for-age), we found no evidence of systematic differences between children from Junju and Ngerenya after adjustment for age, calendar year, and concurrent infections. As the longitudinal cohort subjects were randomly drawn from these populations, these findings suggest that the two groups were broadly comparable with respect to early-life growth and nutritional status.

      We agree that the mechanistic basis of the observed differences is not resolved in this observational study. We have clarified this point in the revised discussion and expanded our consideration of plausible biological pathways based on existing literature, including perturbations in B cell and T cell compartments. Similarly, we now explicitly state that the clinical implications of reduced antibody levels remain to be determined and will require studies incorporating functional assays and clinical outcomes. We believe these revisions strengthen the manuscript by providing a more comprehensive interpretation of the data.

      Impact and Utility:

      This work has fundamental implications for understanding vaccine effectiveness in malaria-endemic regions and may contribute to informing vaccination strategies. The findings, if strengthened, would suggest that children in areas of high malaria transmission may require modified immunization approaches. The dataset provides a valuable resource for future studies of malaria's immunological legacy.

      We thank the reviewer for this comment.

      Context:

      This study builds on prior work showing acute immunosuppressive effects of malaria but uniquely attempts to demonstrate the durability of these effects years after exposure. The natural experiment design addresses limitations of previous observational studies by providing a more controlled comparison.

      We thank the reviewer for this comment.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We suggest that further analyses of potential confounders such as anthropometric indices, socioeconomic status, and comorbidities would render the evidence more robust.

      We thank the Reviewing Editor for this important suggestion. We agree that careful consideration of potential confounding factors is critical to the interpretation of these findings, and have undertaken additional analyses to address this.

      Because anthropometric and related socioeconomic measurements were not collected systematically within the original longitudinal malaria cohort, we assessed potential population-level differences using hospital-based surveillance data from the same geographic regions. This dataset includes measurements of anthropometry (mid-upper arm circumference, weight-for-age, and height-for-age) as well as detailed infection diagnostics in childhood. Using these data, we fitted regression models adjusting for age, calendar year, and concurrent, clinically diagnosed infections. For all three anthropometric indices, we found no evidence of systematic differences between children from Junju and Ngerenya, with effect estimates small and crossing zero (fig. 6). As the longitudinal cohorts were randomly selected from these populations, these findings suggest that the groups were broadly comparable with respect to nutritional status and infection exposure. With respect to socioeconomic status and comorbidities, detailed individual-level data were not available within the longitudinal cohort. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic and environmental setting, provides a complementary control for these factors. We have incorporated these additional analyses and clarifications into the statistical analysis and discussion sections of the revised manuscript, and believe they strengthen the robustness of the findings by addressing key sources of potential confounding.

      Reviewer #1 (Recommendations for the authors):

      The manuscript is well written, with clear and informative figures that effectively support the findings.

      We thank the reviewer for this comment.

      Suggestions:

      (1) Although the study well controlled for malaria exposure, other environmental or infectious factors that influence immunity could be considered:

      Nutritional status in childhood (malnutrition impacts immune response), co-infections (helminths, respiratory viruses), socioeconomic differences, or differences in access to health services, even minimal, between Junju and Ngerenya.

      We thank the reviewer for highlighting the potential influence of environmental, infectious, and socioeconomic factors on immune responses. We agree that these are important considerations in the interpretation of observational data. To address nutritional status and concurrent infectious exposures, we analysed contemporaneous hospital-based surveillance data from the same geographic populations. This dataset includes measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed clinical diagnostics for common childhood infections. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). These findings suggest that the populations from which the longitudinal cohorts were randomly selected were comparable with regard to early-life growth and nutritional status. We agree that we cannot fully exclude the influence of unmeasured factors such as helminth infections, socioeconomic variation, or subtle differences in healthcare access. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic, environmental, and healthcare setting, provides an internal control for many of these factors. The persistence of similar patterns within this setting supports malaria exposure as a key contributor to the observed differences. We have clarified these considerations in the revised discussion and believe that the additional analyses and within-cohort comparisons strengthen the robustness of our conclusions while acknowledging the limitations inherent to observational studies.

      (2) Measurement of other immunological markers:

      In addition to IgG, include: B cell subpopulations (naive, memory, atypical), cytokine levels (IL-10, IFN-γ) to characterize the immunological microenvironment.

      You could include these recommendations in the text for future studies.

      We thank the reviewer for this thoughtful suggestion. We agree that detailed immunophenotyping, including characterisation of B cell subpopulations and cytokine profiles, would provide important insight into the mechanisms underlying the observed differences in antibody responses. In the revised manuscript, we have expanded the discussion to highlight these important avenues for future work, including the potential role of altered B cell subsets and of regulatory or inflammatory cytokine environments in shaping long-term humoral responses.

      Reviewer #2 (Recommendations for the authors):

      The manuscript is well-written.

      We thank the reviewer for this comment.

      (1) Methodological Clarifications:

      Do the authors have any information regarding the characteristics of these children that could be of use in understanding their immune responses better? (e.g., weight, height, BMI, socioeconomic status, HB level, access to health care, etc.).

      We thank the reviewer for highlighting the importance of participant characteristics in interpreting immune responses. Anthropometric and related clinical measures were not collected systematically within the original longitudinal malaria cohort, as the study was designed to investigate the acquisition of naturally acquired immunity to malaria.

      To address this, we analysed contemporaneous hospital-based surveillance data from the same geographic populations, which include measurements of anthropometric indices (mid-upper arm circumference, weight-for-age, and height-for-age) alongside detailed infection diagnostics. Using regression models adjusting for age, calendar year, and concurrent infections, we found no evidence of systematic differences in anthropometric profiles between children from Junju and Ngerenya (fig. 6). Detailed individual-level data on socioeconomic status, haemoglobin levels, and healthcare access were not available within the longitudinal cohorts, which precluded direct adjustment for these factors. However, the within-Ngerenya analysis, where children with differing early-life malaria exposure were compared within the same geographic and healthcare setting, provides an internal control for many of these factors. These considerations are now clarified in the revised discussion.

      Could the authors provide more detailed statistical analysis, including power calculations and multiple comparison corrections?

      In the revised manuscript, we have extended the statistical analysis and now include antigen-specific mixed-effects regression models incorporating all available longitudinal measurements, which are described in full in the statistical analysis section. We have also applied false discovery rate (FDR) correction to account for multiple testing across antigens, and report both unadjusted and FDR-adjusted significance in the revised results. With respect to power, the sample size was determined by the number of children in the long-term surveillance cohorts who met the inclusion criteria, namely the availability of a sufficient number of longitudinal samples. We have clarified this in the revised manuscript.

      Clarify the criteria for selecting the 123-child subset from the larger surveillance cohorts.

      We thank the reviewer for this comment. The 123 children included in this analysis were selected from the larger surveillance cohorts based on the availability of sufficiently dense longitudinal serum sampling as described above. Specifically, children were required to have at least eight longitudinal samples available in the archive, enabling robust assessment of within-individual antibody trends over time. This criterion was applied to ensure adequate temporal resolution to examine the long-term stability of malaria-associated effects on antibody responses. Children with fewer available samples were therefore excluded, as limited sampling would not allow reliable characterisation of longitudinal patterns. We have clarified these inclusion criteria in the revised manuscript.

      (2) Additional Analyses and Data Presentation:

      The authors could consider dose-response analyses relating malaria episode frequency/timing to degree of immunosuppression or even AMA-1 IgG levels and degree of immunosuppression. How do they associate over time?

      We thank the reviewer for this suggestion. To address this, we examined the relationship between malaria exposure (using cumulative febrile malaria episode count derived from longitudinal surveillance data) and the magnitude of heterologous antibody responses. In mixed-effects models adjusting for age and repeated antibody measurements, higher malaria episode burden was associated with lower antibody responses against multiple antigens (fig. 7).

      Analyze whether the effects vary by specific age at malaria exposure.

      We agree that age at exposure is an important consideration. We have now assessed how the relationship between malaria burden and antibody responses varies with age by including age as a non-linear term and modelling interactions between malaria exposure and age as described above. These analyses did not suggest substantial heterogeneity in the association over age, and therefore we have retained the simpler presentation for clarity.

      Provide correlation analyses between different antibody responses to assess whether suppression is generalized.

      We have addressed this by modelling responses jointly across a panel of heterologous antigens and by examining antigen-specific associations. The direction of effect was consistent for the majority of antigens, with no evidence of opposing trends, supporting a broad rather than antigen-specific effect.

      The authors could consider moving Figures 2a and b to the supplementary material.

      We thank the reviewer for this suggestion. We carefully considered whether panels 2a and 2b could be moved to the supplementary material. However, we have retained them in the main text because they provide a simple, intuitive illustration of how AMA1 antibody responses track with malaria exposure at the individual level, complementing the population-level analysis shown in fig. 2c. We feel that this helps establish the biological validity of the microarray platform in a way that is immediately interpretable to the reader, and therefore supports the interpretation of subsequent analyses.

      The authors could consider replacing Figures 3a and b with IgG levels from ALL vaccinated children and ALL non-vaccinated children.

      We thank the reviewer for this suggestion. We would like to retain these figures for the same reasons that have been articulated above for figures 2a and b.

      (3) Discussion Enhancements:

      The authors should consider expanding the discussion to address the limitations of the data more thoroughly, particularly regarding the potential differences between cohorts that could have contributed to the results.

      We have expanded the discussion to more explicitly address potential differences between cohorts that could contribute to the observed findings, including nutritional, socioeconomic, and environmental factors.

      The discussion needs to acknowledge the lack of directionality for the associations observed. As stated above, although I agree in general terms with the observations that the authors have made, it is not possible to distinguish between a suppressive effect of malaria on immune responses to infection-derived pathogens or a protective effect of malaria that leads to less exposure to infection-derived pathogens (and consequently lower IgG levels). The mechanisms behind these could include things like different health-seeking behaviors or social interactions from kids who have malaria versus those who don't, for example.

      We agree that, as an observational study, we cannot definitively establish the direction of the association between malaria exposure and antibody responses to unrelated antigens. We have now clarified this limitation explicitly in the discussion. We acknowledge the alternative interpretations raised by the reviewer, including the possibility that differences in exposure to other pathogens, potentially driven by behavioural, environmental or healthcare-related factors, could contribute to the observed patterns. At the same time, we note that the natural experiment design, prospective malaria exposure classification, and within-Ngerenya comparisons support early-life malaria exposure as a key contributing factor. We have revised the discussion to reflect this balance.

      Extend the discussion of potential biological mechanisms underlying durable immunosuppression.

      We thank the reviewer for this suggestion. We have expanded the discussion to more fully consider potential biological mechanisms that could underlie the observed long-term differences in antibody responses. Specifically, we now discuss evidence from prior studies indicating that malaria infection can induce sustained alterations in B cell and T cell compartments, including expansion of atypical memory B cells, disruption of germinal centre responses, and increased regulatory immune activity. We position our findings as providing population-level evidence of a durable immunological phenotype, while noting that targeted mechanistic studies will be required to define the underlying pathways.

      Extend the discussion around the clinical implications of the observed antibody level differences.

      In the revised discussion, we highlight that studies incorporating functional assays and clinical outcome data will be required to determine whether these serological differences translate into altered susceptibility to infection or reduced vaccine effectiveness.

      (4) Technical Issues:

      Could the authors please:

      (1) Clarify microarray data processing and quality control procedures.

      We thank the reviewer for this request. We have expanded the methods section to provide additional detail on microarray data processing and quality control procedures.

      (2) Provide information on inter-assay variability and batch effects.

      We have expanded the methods section to clarify how these were evaluated and addressed. Inter-assay variability was monitored using pooled adult serum included on every slide as a consistent positive control. This allowed us to assess slide-to-slide consistency in signal detection across the full antigen panel. In addition, fluorophore-conjugated IgG and IgA controls were printed directly onto each miniarray to confirm scanner performance independently of antigen–antibody interactions. At the sample level, each specimen was assayed on two independent miniarrays per slide, generating four spatially separated replicate measurements per antigen. Technical variability was quantified using the coefficient of variation (CV), and measurements with CV >20% were excluded from downstream analyses.
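
      The replicate-level filter described above can be sketched as follows. Only the four replicates per antigen and the 20% threshold come from this response; the use of the sample standard deviation, the averaging of retained replicates, the function names, and the example intensities are assumptions made for illustration.

```python
from statistics import mean, stdev

CV_THRESHOLD = 20.0  # percent; measurements with CV above this are excluded


def coefficient_of_variation(replicates):
    """Return the CV (%) of replicate signal intensities.

    Uses the sample standard deviation (an assumption; the response
    does not say which estimator was used).
    """
    avg = mean(replicates)
    if avg == 0:
        return float("inf")  # CV is undefined for a zero mean; treat as failing
    return 100.0 * stdev(replicates) / abs(avg)


def filter_by_cv(measurements, threshold=CV_THRESHOLD):
    """Keep antigens whose replicate CV is at or below the threshold.

    `measurements` maps an antigen name to its replicate intensities.
    Returns the mean intensity for each retained antigen.
    """
    kept = {}
    for antigen, replicates in measurements.items():
        if coefficient_of_variation(replicates) <= threshold:
            kept[antigen] = mean(replicates)
    return kept


if __name__ == "__main__":
    # Hypothetical intensities: four spatially separated replicates per antigen.
    sample = {
        "antigen_A": [100, 105, 95, 100],  # low variability, retained
        "antigen_B": [100, 200, 50, 150],  # high variability, excluded
    }
    print(filter_by_cv(sample))
```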

      (3) Include details on how missing data were handled in longitudinal analyses.

      We thank the reviewer for highlighting this point. We have added clarification in the statistical analysis section describing how missing data were handled. Specifically, mixed-effects models were used, which accommodate unbalanced longitudinal data without requiring imputation, allowing all available observations to contribute to the analysis.

      (4) Include details of the parameters of the LOWESS analysis shown in Figure 1.

      We have expanded the figure 1 legend to include the parameters used for the loess smoothing shown, including the smoothing span.

      (5) Include details of the samples used for Figure 3d (Negative and Pooled Adult Serum).

      We have clarified in the methods the nature and purpose of the samples used in Figure 3d. The negative control consisted of phosphate-buffered saline applied to a full miniarray in place of serum, allowing assessment of background and non-specific signal in the absence of antibody binding. The pooled adult serum comprised a composite of sera from multiple healthy adults from the same setting and was included as a positive reference sample, expected to contain a broad repertoire of antigen-specific antibodies. These controls were included on each slide to enable interpretation of assay performance, with the negative control defining baseline signal and the pooled adult serum providing a consistent reference for antigen recognition across the microarray.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This work sets a benchmark for integrative 3D genomics in oncology. Its methodological sophistication and conceptual advances establish a new paradigm for studying nuclear architecture in disease.

      We appreciate the very kind words.

      Weaknesses:

      Major Issues

      (1) Functional tests would strengthen the observed links between structure and gene changes. For example, the COL12A1 gene loop formation correlates with its increased expression. Disrupting this loop using CRISPR-dCas9 at chr6 position 75280 kb could prove whether the loop causes COL12A1 activation. Such experiments would turn strong correlations into clear mechanisms.

      We agree that targeted disruption of specific loops such as COL12A1 will be important for functional validation of the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than to explore specific loop interactions. The current findings are a foundation for more targeted functional follow-up studies.

      (2) The H3K27ac looping idea needs deeper validation. Data suggests H3K27ac loss weakens loops without affecting CTCF. Testing how cohesin proteins interact with H3K27ac-modified sites would clarify this process. Degron systems could rapidly remove H3K27ac to observe real-time effects. Also, the AP-1 motifs found at dynamic loop sites deserve functional tests. Knocking down AP-1 factors might show if they control loop formation.

      We agree that modulating histone modifications or transcription factors would provide insights into the underlying mechanisms driving the changes we observed. However, such studies utilizing degrons or small molecule inhibitors that globally knock down either H3K27ac or specific transcription factors are often difficult to interpret. For example, assessing the role of AP-1 factors, as suggested, would be complicated by the variety of AP-1 proteins. In addition, H3K27ac reduction could inhibit loop strength either directly (i.e. by reducing cohesin recruitment) or indirectly (i.e. by reducing gene expression which could in turn affect loop strength). Parsing out the exact relationships between these features will require extensive follow-up work and falls outside of the scope of the current study.

      (3) Connecting findings to patient data would boost clinical relevance. The MCF10 model is excellent for controlled studies. Checking if TAD boundary weakening occurs in actual patient metastases would show real-world importance. Comparing primary and metastatic tumor samples from the same patients could reveal new structural biomarkers. If tissue is scarce, testing cancer cells with added stroma cells might mimic tumor environment effects.

      We have leveraged publicly available datasets to link the observations from the progression model to clinical samples. Specifically, we have compared our datasets to chromatin organization data in non-cancerous mammary epithelial cells (HMEC), five cell lines representing distinct cancer subtypes ranging from less (luminal) to more aggressive (triple negative, TNBC), as well as tissue samples from TNBC patients with contralateral normal controls. We explored the conservation of both loops and TADs identified in the MCF10 progression system in each of these maps, paying particular attention to how features that are differential between MCF10 cells differ across other cancer cell types. We observe a high degree of conservation of static loops and TAD boundaries among the cancer samples, as well as some degree of cell-specific changes among loops and boundaries that change during MCF10 progression. These findings are included in Supplemental Figures 3 and 4 and are discussed on page 7.

      Minor Issues

      (1) Adding a clear definition for static loops would help readers. For example, state that static loops show less than 10 percent contact change across replicates.

      Static loops are defined as loops with a fold-change of 1.5 or more between any two MCF10 cell lines and an adjusted p-value of less than 0.025 considering change across biological and technical replicates. This definition is stated on page 6.

      (2) In the ABC model analysis, removing promoter regions from the enhancer list would focus results on true long-range interactions.

      The ABC model already excludes the promoter of each gene. Only self-promoters are excluded, whereas the model allows promoters of other genes to act as potential long-range enhancers of the target gene. We have added text to make this clear (see page 11).

      (3) Briefly noting why this study sees TAD weakening while other cancer types show different patterns would provide useful context.

      Neither the mechanism of TAD boundary weakening in the MCF10 model nor the reason for the apparently different behavior among cancers is known. We expanded the discussion of this point slightly, but we refrain from making any definitive claims. We do note that differences in the types of cancer studied or in the methods used for detecting changes in TADs (i.e. different sensitivities and thresholds for detecting change) could be responsible (see page 15). We also mention that the loss of insulation at many TAD boundaries detected in our study is a subtle change in intensity that could be missed by methods tailored to find more drastic changes in TAD architecture.

      Reviewer #2 (Public review):

      While the conclusions are broadly supported, methodological and analytical refinements are required.

      We appreciate these comments.

      (1) Model representativeness. The long-term culture-adapted MCF10 genome harbours extensive aneuploidies and translocations. Validation of key COL12A1/WNT5A loop dynamics in an independent breast-cancer line (e.g., MDA-MB-231, T47D) or in patient-derived organoids/PDX models would strengthen generalizability.

      Although the generation of Micro-C datasets in additional cell lines is outside of the scope of this study, we used publicly available Hi-C data from triple negative breast cancer (TNBC) progression and patient samples (Kim, Han & Chun et al. 2022) to assess generalizability of the MCF10 model findings. While these maps are lower resolution than the Micro-C maps used in our study, they are of sufficient depth to detect loops at a similar resolution (10 kb). We report these findings in Supplemental Figures 3 and 4 and discuss them on page 7.

      We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in TNBC. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.

      It is worth noting that direct comparison at individual loci is complicated by variations in gene expression profiles between the MCF10 model and the TNBC progression model; for example, COL12A1 is not significantly upregulated between normal and TNBC tissues in this study (unlike in the TCGA-BRCA data) and is downregulated between HMEC and TNBC cell lines. Regardless, our analysis provides some indication of conserved and divergent features in the various model systems.

      (2) The study remains purely correlative; no perturbation experiments are conducted to demonstrate causal roles of chromatin loops on gene expression. CRISPR interference (CRISPR-Cas9-KRAB/HDAC) or enhancer deletion/inversion should be applied to 3-5 pivotal loops (e.g., COL12A1, WNT5A) to test their impact on target-gene expression and cellular phenotypes (e.g., proliferation, migration).

      We agree that targeted disruption of specific loops such as COL12A1 will be important for understanding the causal relationships between enhancer-promoter loop formation/dissipation and changes in gene expression. However, the intent of our current study was to profile changes in genome organization at a global scale to deduce general features of cancer progression-associated changes in genome organization, rather than exploring specific loop interactions. The current findings are a foundation for more targeted follow-up functional studies.

      (3) The manuscript lacks integration with clinical datasets. Integrate TCGA-BRCA data to assess whether elevated COL12A1/WNT5A expression associates with overall survival (OS) or distant metastasis-free survival (DMFS)

      To assess clinical significance of specific loci, we have queried expression of all differentially expressed genes in the MCF10 progression system among TCGA-BRCA expression data. We summarize our findings in Supp. Fig. 5E and discuss them on page 8.

      We found that roughly 25% of genes that change in our model also change significantly in breast cancer, but only roughly half of those genes change in the same direction (i.e. up-regulated in MCF10CA1a vs MCF10A, and up-regulated in tumor vs normal samples). Interestingly, there was a higher degree of directional agreement between late-changing genes (i.e. genes that change in MCF10CA1a compared to MCF10A and MCF10AT1) than early-changing genes (i.e. genes that change in MCF10AT1 and MCF10CA1a compared to MCF10A).

      We have also explored the impact of select highlighted genes on overall survival (OS). We present these data in Supp. Fig. 6 and discuss them on page 8. While not all genes showcased in this study have a significant impact on overall survival, most trend in the same direction as their differential expression would suggest (i.e. genes more highly expressed in tumor vs normal also have a hazard ratio above 1).

      Reviewer #3 (Public review):

      The differential topology analysis and its integration with transcription is very well done- one of the best versions of this I have read in the 3D genome field!

      We appreciate the reviewer’s endorsement.

      However, the paper is framed largely as a cancer biology study, and it teaches us much less about this. I am worried that some of the trends for each topologic feature are not going to be consistent across the pre-malignant-malignant-metastatic spectrum and would like the authors to soften some of their claims a bit regarding how this clarifies our understanding of cancer evolution.

      We agree that the strength of the study lies in its deep mapping of chromatin architecture and the landscape of enhancers and differentially expressed genes, which we hope to use to better understand the relationship between chromatin structure and gene expression, regardless of their cancer relevance. To better relate the findings in the progression system to cancer, we have added new data from direct comparisons of the MCF10 progression system with multiple patient-derived cancer cell lines and cancer tissues. These data are shown in Supp. Fig. 3 and 4 and discussed on p. 7. Regardless, we have softened the claims regarding cancer progression throughout the manuscript.

      Weaknesses:

      Major Concerns:

      (1) The integration of gene expression and chromatin loops is intriguing. The authors' differential analysis, however, omits consideration of genes that are on and simply further upregulated versus genes that transition on/off or off/on. It would be nice to see the authors break out looping patterns for these two different patterns of regulation, as it may be instructive regarding the rules for how EP loops govern transcription.

      To address different types of gene expression patterns, we analyzed 108 genes that went from an unexpressed or “off” state (2 or fewer read counts) in one cell line to an expressed “on” state (100 or more read counts) in another, and 111 genes that went from “on” to “high” (1000 or more read counts). We present these data in Supp. Fig. 8 and discuss the findings on page 9. While neither of these gene sets was enriched for differential loops, a large number of the genes overlap with loop anchors. We found a relationship between loop strength and gene expression levels; genes that are more strongly expressed are more likely to overlap with the anchor of a chromatin loop. All gene sets show similar strong trends at distal regulatory regions.
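      The read-count thresholds quoted above can be sketched as a small classifier. This is a minimal illustration, not the authors' pipeline: the function names, the "low" label for counts between the "off" and "on" thresholds, and the choice to treat "on" as 100 to 999 counts when testing for an on-to-high transition are all assumptions made here for clarity.

```python
from typing import Dict, Optional

OFF_MAX = 2       # "off": 2 or fewer read counts (threshold from the response)
ON_MIN = 100      # "on": 100 or more read counts (threshold from the response)
HIGH_MIN = 1000   # "high": 1000 or more read counts (threshold from the response)


def expression_state(count: float) -> str:
    """Map a read count to a coarse state; 'low' (3 to 99 counts) is our own label."""
    if count <= OFF_MAX:
        return "off"
    if count >= HIGH_MIN:
        return "high"
    if count >= ON_MIN:
        return "on"
    return "low"


def classify_transition(counts_by_line: Dict[str, float]) -> Optional[str]:
    """Return 'off_to_on', 'on_to_high', or None for one gene across cell lines."""
    states = {expression_state(c) for c in counts_by_line.values()}
    # Off in one line and at least "on" in another (assumption: "high" also counts).
    if "off" in states and states & {"on", "high"}:
        return "off_to_on"
    # Expressed in one line and highly expressed in another.
    if "on" in states and "high" in states:
        return "on_to_high"
    return None
```

      For example, a gene with counts of 1, 50, and 250 across the three MCF10 lines would be labelled `off_to_on` under these assumptions, while one with 150, 400, and 1200 would be labelled `on_to_high`.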

      (2) Given the paucity of differential loops at the majority of genes whose expression changes, the authors should examine chromatin subcompartments, as these may associate more with differential transcription.

      We present subcompartment analysis in Supp. Fig. 9 and discuss the findings on pages 10-11. Our CALDER compartment calls are qualitative rather than quantitative, so we examined how compartments change genome-wide and at specific promoters. We see that between any two cell types, a majority of changes occur between closely related subcompartments, i.e. from A.2.2 to A.2.1 (1 step more A-like) or B.1.1 (1 step more B-like). The promoters of differentially expressed genes have minimal subcompartment changes, but genes that shift from on to off have larger changes. Differentially expressed genes with promoters that shift by multiple subcompartments have significant impacts on fold-change, but smaller shifts have minimal impacts on gene expression. In summary, small changes in subcompartments are very common and have little impact on gene expression, while larger changes are infrequent and correlate more strongly with changes in gene expression.

      (3) The authors could push their TAD analysis further by integrating it with transcription. Can they look at genes and their enhancers that span these altered boundaries to see if these shifts impact transcription?

      We provide this analysis in Supp. Fig. 9. We started, as suggested, by looking at genes with distal enhancers (as determined by the ABC model) that span a single TAD boundary. However, the number of genes that fit this definition was relatively small, so we expanded to look at any genes with promoters in the proximity (50kb) of differential insulation score boundaries, for which we saw the same trends with more robust signal. Our findings are shown in Supp. Fig. 9 and discussed on page 10. We found that genes near weakened boundaries are not enriched for differentially expressed genes, while those near strengthened boundaries are. Comparing the fold-change of genes near strengthened, weakened, and static boundaries showed a significant inverse correlation between boundary strength and gene expression, although effect sizes were small. These results show that changes in TAD boundary insulation have small but noticeable impacts on gene expression.
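      The proximity criterion described above (promoters within 50 kb of a differential insulation score boundary) can be sketched as a simple lookup. This is a hedged illustration only: the function name, the single-coordinate representation of promoters and boundaries, and the per-chromosome sorted list are assumptions, not details taken from the manuscript.

```python
from bisect import bisect_left
from typing import List

WINDOW_BP = 50_000  # 50 kb proximity window mentioned in the response


def near_boundary(promoter_pos: int, boundary_positions: List[int],
                  window: int = WINDOW_BP) -> bool:
    """True if any boundary lies within `window` bp of the promoter.

    `boundary_positions` must be sorted and on the same chromosome as the promoter.
    """
    # First boundary at or beyond the left edge of the window.
    i = bisect_left(boundary_positions, promoter_pos - window)
    # It is a hit only if it also falls before the right edge of the window.
    return i < len(boundary_positions) and boundary_positions[i] <= promoter_pos + window
```

      Genes flagged this way could then be grouped by whether the nearby boundary was strengthened, weakened, or static before comparing fold-changes.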

      (4) The progression of cancer critically goes from a benign -> pre-malignant -> malignant -> metastatic series of steps. The AT1 line is described as 'premalignant' and thus the authors' series omits a malignant line. While I think adding such a sample is an unreasonable request at this point (as it would have had to have been studied in 'batch' with these other samples), the authors should acknowledge that they omit this step and spend some time discussing the genetic, morphologic, and phenotypic features for their 3 conditions. The images in Figure 1S aren't particularly useful- they don't tell the reader that these cells are malignant/benign. The karyotypic data are intriguing but not fully analyzed, so it is hard to know what true phenotype these cells represent. For example, malignant means DCIS/invasive carcinoma - so then what does this pre-malignant cell model represent? The described alteration in the AT1 line is a Ras oncogene, so in some sense, the transition to this line really is just +/- Ras. The authors could spend some time thinking about the effects of Ras specifically on the 3D genome.

      We have expanded our discussion of the relevance of the MCF10 model on page 4, and of the limitations of the model on page 17. The MCF10 progression model has been used extensively by many laboratories, and its properties have been discussed in detail (e.g. Polizzotti et al. 2012). Critically, the MCF10AT1 cell line is not only the product of Ras oncogene expression but was also derived from a 100-day-old precancerous lesion that formed a squamous carcinoma in a mouse, and over this time it accumulated additional changes. The MCF10AT1 line is considered pre-malignant as it has accrued critical changes that prepare it for the metastatic transition, but it does not immediately form tumors when injected back into mice. Unlike the MCF10DCIS cell line, which is malignant but not metastatic, the more aggressive MCF10CA1a is classified as both malignant and highly metastatic, forming tumors that quickly metastasize to the lungs in mouse xenografts. While both MCF10AT1 and MCF10CA1a are tumorigenic, we acknowledge the lack of a nonmetastatic malignant cell line in the discussion on page 17. We have also provided updated karyotype characterization of the cell lines used in this study in Supp. Fig. 1B and now include full composite karyotypes in the Methods section (page 18).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The reviewer’s recommendations are the same as their public review comments. See our response to the review comments above.

      Reviewer #2 (Recommendations for the authors):

      (1) If conditions permit, it is recommended to include primary human mammary epithelial cells (HMECs) to distinguish immortalisation-specific from malignancy-specific 3D changes.

      Micro-C data of equal resolution is not available for HMECs. We have, however, incorporated analysis of publicly available deeply sequenced Hi-C data of HMECs into several figures that explore the conservation of loops and TADs in these cells (Supp. Fig. 3 and 4).

      We find that chromatin loops and TAD boundaries detected across the MCF10 system are highly conserved across all other mammary epithelial lines studied. Chromatin loops that were more prominent in MCF10AT1 and MCF10CA1a lines were also significantly stronger in TNBC cells. Insulation score boundaries that were weakened in MCF10CA1a showed strong insulation across all cell lines in the TNBC system. These findings highlight that different model systems indeed have distinct profiles of structural change, just as they have distinct gene expression profiles.

      (2) The relationship between loop alterations and copy-number variations (CNVs) is not explored. If conditions permit, it is recommended to overlay differential loops with SNP/Indel/CNV data to exclude spurious differences arising from structural alterations.

      While we have not conducted an in-depth SNP analysis, we have clarified our discussion of the karyotype analysis on pages 21 and 23 and how we mitigated these effects when identifying differential loops between cell lines.

      (3) The horizontal and vertical coordinates of the diagram are difficult to view; it is recommended that the size of the text on the picture be adjusted to ensure that it is clear to read. Some of the text coordinates of the figure are labeled in gray; it is recommended that they be in black.

      The clarity of the figures has been improved.

      Reviewer #3 (Recommendations for the authors):

      I really like this paper. I think if the cancer focus can be down-emphasized (because I'm not fully clear what we've really learned about cancer), then it represents a nice dataset and a thoughtful, comprehensive analysis.

      We greatly appreciate the kind words and helpful feedback. The cancer focus has been toned down throughout the manuscript, as suggested.

      Minor Concerns:

      (1) The authors present a nice summary of the topological changes across samples. However, summary statistics can mask noise/bias and also don't fully convey the effect size of the reported changes. Highlighting individual loci and visualizing these would strengthen the paper and participate in maintaining a high standard for our genomic studies of topology, in which we summarize, but also provide representative examples. I would appreciate seeing more example plots at distinct loci (even if in the supplemental information).

      We have included several more example regions in Supp. Fig. 7 and 12, including four looped genes that change similarly between the MCF10 series and TCGA-BRCA data (2 stably looped genes and 2 differentially looped genes, 2 up-regulated and 2 down-regulated), and six differentially looped and differentially expressed genes (3 which change in the same direction as the loops, and 3 which change in the opposite direction).

      (2) "To identify loops that changed significantly during cancer progression, we assessed changes in contact frequency among every loop in each cell type, correcting for karyotypic differences that result in differences in coverage between cell lines (see Methods)." The Methods section is not adequately explained. Also, could you go a bit deeper to define if these large-scale changes shift the 3D genome specifically? This is hard, but there may be some low-hanging fruit given the otherwise fairly isogenic features in your model.

      We have added more detail to the Methods section on pages 21 and 23 on how karyotypic abnormalities were included in our analysis and differential loop calling. A deeper analysis of how large-scale karyotypic changes affect chromatin organization (i.e. through the formation of neoloops and TADs through translocations) is indeed an attractive subject, but due to its complexity requires a separate dedicated study.

      (3) "Approximately half of chromatin loops featured some combination of active gene promoters and enhancers within 10kb of loop anchors". The authors have high-resolution topology data and should be more stringent; these features should have to overlap loop anchors or at least use a distance less than 10kb, which, in some sense, forfeits the advantages of high-resolution topology data.

      The threshold of 10kb was chosen for several specific reasons: First, the loop sizes detected here are large enough that this relatively large region still represents a small fraction of the loop span, and these regions are reasonably considered anchor-proximal. Second, the loops we detect are non-punctate, both in aggregate analysis (Figure 1H, bottom) and at individual loci (see example regions), showing increased contact frequency among several 5kb or 10kb bins. Therefore, adding 10kb to either side (2 pixels on 5kb maps and 1 pixel on 10kb maps) ensures that the full region of increased contact frequency is included. Finally, ultra-resolution Hi-C data has also shown that loops remain diffuse even with 1kb resolution maps (albeit they do get smaller than the 30kb used here) (Harris & Gu 2023). We have added a brief justification of this overlap size to the text on page 24.

      (4) "These results show that not only changes in either contact frequency and enhancer activity correlate with increased gene expression, but they also correlate with each other, suggesting a potentially linked functional role during enhancer-promoter communication." The authors could use this opportunity to disentangle the contributions of loops and chromatin modifications a bit more. The exceptions are of interest - e.g., loop is stable, gene expression changes or loop changes, gene expression does not. Highlighting exemplar cases for these exceptions (rather than just a genomics summary) would be nice to see.

      The additional example regions we have included in Supp. Fig. 7 and 12 now showcase a wider variety of scenarios; in addition to more examples of static loops with gene expression changes (Fig. 2, Supp. Fig. 7E-F) and differential loops with matching gene expression changes (Fig. 4, Supp. Fig. 7C-D, Supp. Fig. 12A-C), we now also feature examples of differential loops where gene expression changes in the opposite direction (i.e. a strengthened loop at a down-regulated gene, Supp. Fig. 12D-F).

    1. I started undoing the 'participatory' design plans I unilaterally made to reconceive a collective methodology with more uncertain, voluntary, and relational dynamics. Surprisingly, this 'ineffective' ongoing turn became a strength rather than a limitation—

      No plan is unilateral, we are a conduit of past relationships, of people who have influenced us, and we acknowledge this so much that we allow the transference of autonomy through "differently abled" people guardians, stewards of nature, animal caretakers, and political representatives.

      What Volpi is looking for without stating it is a sustainable equalised economy where there are no power monopolies and notable hierarchies that may lead to oppression. But to say that "inefficiency" is a "strength" unknowingly perpetuates those oppressive structures, because this "inefficiency" is almost exclusive to wealthy people. Volpi's language is colonised, though they probably don't realise it.

      Say they get involved in an actually slow process, one where they don't propose, but wait for others to ask, one, like an ethnography, where they learn and listen and don't try to impose themselves and their ideas because that is the productivist system that academia perpetuates... then the group they end up in will either be a marginally small group of outcast people, or privileged (or both), with minimal potential impact for change; or they will end up in a bigger already existing association where they in a way "infiltrate" and only over multiple years start to achieve trust capital to push their ideas (having also taken some others' in order to claim epistemic humility and a certain representativeness).

      Let's see... how do I spell this? We must not condone infinite growth, but when it comes to things like ending poverty, I think our stance should be unambiguously clear that this is progress, that this is positive.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue, the fragility of meta-analytic findings, by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Reviewer #3 (Public review):

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      This is a point I was remiss not to better elucidate. With regards to generalisation, the text has been modified to explicitly state that generalisability in this context means no specific study dependence, just a net number of subjects required to flip a result. The text reads:

      “Atal's method is highly useful, but one possible objection is that it has the downside of non-generalisability, as it finds very specific combinations of trials and patients that would have to be re-coded (events classified as non-events and vice-versa) for results to become insignificant. For example, an Atal meta-analytic fragility of 4 pertains to a specific and often unique circumstance when 4 patients could be recoded from a specific study or combinations thereof to change outputs, but this does not generalise to any 4 patients in that meta-analysis. This makes this definition of meta-analytic fragility useful but not general, and perhaps less intuitive to interpret than a typical RCT fragility metric. In this work, we establish a generalizable meta-analytic fragility metric, based upon Ellipse of Insignificance (EOI) analysis for dichotomous outcome trials. This method creates a pool of events and non-events in both arms, adjusted for weighting, and answers the general question of how many patients would have to be effectively recoded in a meta-analysis for results to flip, without requiring specific study identification.”
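      For readers unfamiliar with the "typical RCT fragility metric" that the quoted passage uses as its point of comparison, the sketch below computes a classic single-trial Fragility Index with a two-sided Fisher exact test. It is neither EOIMETA nor Atal et al.'s algorithm; the function names, the 0.05 threshold, and the convention of recoding non-events to events in the arm that starts with fewer events are assumptions chosen for illustration.

```python
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def fragility_index(events_a: int, n_a: int, events_b: int, n_b: int,
                    alpha: float = 0.05) -> int:
    """Recodings (non-event -> event) in the arm with fewer events needed
    to raise the Fisher p-value to alpha or above; 0 if already there."""
    if events_a <= events_b:
        low, n_low, high, n_high = events_a, n_a, events_b, n_b
    else:
        low, n_low, high, n_high = events_b, n_b, events_a, n_a
    flips = 0
    while low < n_low and fisher_two_sided(low, n_low - low, high, n_high - high) < alpha:
        low += 1
        flips += 1
    return flips
```

      Calling `fragility_index(1, 100, 12, 100)` on a hypothetical trial returns the number of recoded patients at which significance is first lost. The meta-analytic question raised in the response is how to pose the same question over a weighted pool of studies without tying the answer to particular trials.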

      (2) The authors mentioned the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how the between-study heterogeneity would influence the proposed measures? Moreover, the between-study heterogeneity is high in Zhang et al's 2022 study. It would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      This is a very fair observation, and I need to better explain myself here! There are effectively two measures of heterogeneity considered in this work: the typical value from a meta-analysis, and the measure of divergence between the crude and the inverse-variance weighted adjusted estimates; when these differ by small amounts, one could conceivably use either measure. I’ve changed the text to better reflect this, including:

      “This modification is akin to pooling in a meta-analysis, and adjusts for study level heterogeneity. After this modification, a standard EOI analysis can then be applied to the vector . In addition, we can also employ ROAR analysis to the same vector, yielding the raw number of patients in either or both arms who could be added in a given direction to change the result, and the exact combination of control and experimental group redactions required to change the result from a significant finding to a null one. Caveats for implementation and interpretation are outlined in the discussion section.”

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would be helpful for preventing misinterpretation by future users. I am concerned that the audience may be confusing these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis doesn't necessarily mean wrong/untrustworthy results. A fragile but precise estimate can still reflect a true effect, but whether that size of true effect is clinically meaningful is another question. Clarifying the effect magnitude, fragility, and reliability in the discussion would be helpful.

      This is an excellent suggestion. I’ve tried to do it with percentages, as in Table 2, but these are minute in the case of the vitamin D trials, partly, I suspect, because they are extraordinarily weak. The Cohen’s h for these meta-analyses yields tiny values, which I think might be tied to the virtually negligible percentages we obtain for number needed to flip. With stronger data, it might be worth expanding this into a useful heuristic measure for robustness, though I don’t think the vitamin D data in this work is going to help us much.

      In light of the reviewer’s excellent comment, I added lines 230-240 in the revised manuscript.

      (4) Comments on revisions:

      I am unable to find the author's responses to my previous round comments (Reviewer #3) in the revision package, though replies to the other reviewers are present. I will provide my updated feedback once these responses are available for review.

      My sincere apologies, I neglected the specific comments in error – this document should address them now, thank you again for giving this your time and consideration!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fogel & Ujfalussy report an extension of a visualization tool that was originally designed to enable an understanding of detailed biophysical neuron models. Named "extended currentscape", this new iteration enables visual assessment of individual currents across a neuron's spatially extended dendritic arbor with simultaneous readout of somatic currents and voltage. The overall aim was to permit a visually intuitive understanding for how a model neuron's inputs determine its output. This goal was worthwhile and the authors achieved it. Their manuscript makes two additional contributions of note: (1) a clever algorithmic approach to model the axial propagation of ionic currents (recursively traversing acyclic graph subsections) and (2) interesting, albeit not easily testable, insights into important neurophysiological phenomena such as complex spike generation and place field dynamics. Overall, this study provides a valuable and well-characterized biophysical modeling resource to the neuroscience community.

      Strengths:

      The authors significantly extended a previously published open-source biophysical modeling tool. Beyond providing important new capabilities, the potential impact of "extended currentscape" is boosted by its integration with preexisting resources in the field.

      The code is well-documented and freely available via GitHub.

      The authors' clever partitioning algorithm to relate dendritic/synaptic currents to somatic output yielded multiple intriguing observations regarding when and why CA1 pyramidal neurons fire complex spikes versus single action potentials. This topic carries major implications for how the hippocampus represents and stores information about an animal's environment.

      Weaknesses:

      While extended currentscape is clearly a valuable contribution to the neuroscience community, this reviewer would argue that it is framed in a way that oversells its capabilities. The Abstract, Introduction, Results, and Methods all contain phrases implying that extended currentscape infers dendritic/synaptic currents contributing to somatic output, i.e. backwards inference of unknown inputs from a known output. This is not the case; inputs are simulated and then propagated through the model neuron using a clever partitioning algorithm that essentially traverses a biologically undirected graph structure by treating it like a time series of tiny directed graphs. This is an impressive solution, but it does not infer a neuron's input structure.

      We are sorry if our text could be interpreted as if we were inferring unobserved inputs from the known outputs. This was not intentional and we were unaware of the possibility of such interpretation.

      In fact, at the beginning of the Results, we started the description of the extended currentscape method by explicitly stating that we need to measure the input currents: “Our method … requires measuring the membrane and axial currents throughout the dendritic tree of a neuron (in every node of the circuit)”.

      To further clarify that our method starts with measuring the input currents, we made this information explicit already in the abstract (“Our approach relies on the iterative decomposition of the axial current flowing between neighbouring compartments in proportion to the underlying membrane currents measured in the model.”), and in the Introduction (“Even if the membrane currents are known, studying the impact of particular ion channels on the neuronal response in such a dynamical system under in vivo conditions is hindered by two major obstacles”). We also rewrote several parts of the text to remove any phrases that could imply the inference of the inputs (line 568). We believe that after clarifying this at the beginning of the paper, the readers will not misinterpret our descriptions later in the text.
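      The idea of decomposing an axial current "in proportion to the underlying membrane currents" can be illustrated with a toy recursion over a small tree of compartments. This is a deliberately simplified sketch and not the authors' implementation: the `Compartment` class, the field names, and the assumption that all currents share one sign and flow toward a fixed target are introduced here for illustration only. The response notes that the actual method prunes and partitions inward and outward subtrees separately.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Compartment:
    """Hypothetical compartment in a tree rooted at the target."""
    membrane: Dict[str, float]   # membrane currents by type (same sign in this toy)
    axial_to_parent: float       # measured axial current sent toward the target
    children: List["Compartment"] = field(default_factory=list)


def decompose(node: Compartment) -> Dict[str, float]:
    """Split node.axial_to_parent among current types in proportion to all inflows."""
    # Inflows are the node's own membrane currents plus the already-decomposed
    # axial currents arriving from its children.
    inflow = dict(node.membrane)
    for child in node.children:
        for name, value in decompose(child).items():
            inflow[name] = inflow.get(name, 0.0) + value
    total = sum(inflow.values())
    if total == 0:
        return {name: 0.0 for name in inflow}
    return {name: node.axial_to_parent * value / total
            for name, value in inflow.items()}
```

      In this toy, the shares returned for a compartment always sum to its axial current, so each current type's contribution can be tracked compartment by compartment on the way to the target.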

      Because a directed acyclic graph architecture is shown in Figure 2, it is unintuitive that the authors can infer bidirectional current flow, e.g. Figure 3 showing current flowing from basal dendrites and axon to soma, and further towards the apical dendrites. This is explained in Methods, but difficult to parse from Results amidst lots of rather abstract jargon (target, reference, collision, compartment). Figure 2 would have presented an opportunity to clearly illustrate the authors' partitioning algorithm by (1) rooting it in the exact morphology of one of their multicompartmental model neurons and (2) illustrating that "target" and "reference" have arbitrary morphological meanings; they describe the direction of current flow which is reevaluated at each time step.

      We thank the Reviewer for this comment. We agree that the concepts introduced here to explain our method are rather abstract and could be difficult to understand. To help the reader, we followed the Reviewer's suggestions and redesigned Fig. 2 to provide a step-by-step explanation of the extended currentscape method. In particular:

      We used a simpler model where the structure of the graph can be directly related to the morphology of the model.

      We show that the target node can connect multiple subtrees with axial currents flowing in different directions. We explain that in this case the inward and the outward subtrees are pruned and partitioned separately.

      We provide a glossary in Table 1 to ensure that the readers can follow our description and do not get lost amidst lots of rather abstract jargon.

      We also clarified that although the target compartment is chosen arbitrarily by the user, it remains the same for all time points throughout the analysis.

      Analyses in Figure 7, C and D, are insightfully devised and illuminating. However, they could use some reconciliation with Figure 5 regarding initiation of individual APs versus CSBs within place fields.

      We thank the reviewer for the positive comments and also for pointing out the potential source of misunderstanding. We slightly changed the text at Fig 5 to emphasize that this is a single example trial, and we added the following sentence to the paragraph describing Fig 7CD: “Consequently, the somatic current dynamics before the iAP and the CSB presented in Fig 5Cc-Dd can be regarded as illustrative samples from a broad distribution, but the differences observed between them are not representative.”

      The intriguing observations generated by extended currentscape also point to its main weakness, which the authors openly acknowledge: as of now, no experimental methods exist to conclusively test its predictions.

      We agree with the Reviewer that not being able to apply our extended currentscape method to reveal the current types driving real neurons recorded in vivo is currently a weakness of our approach. However, we would like to emphasize that it may be feasible to use it to estimate the spatial distribution of the membrane currents driving the cell based on in vivo voltage imaging data, as we briefly outline in the discussion.

      Reviewer #2 (Public review):

      Summary

      The electrical activity of neurons and neuronal circuits is dictated by the concerted activity of multiple ionic currents. Because directly investigating these currents experimentally isn't possible with current methods, researchers rely on biophysical models to develop hypotheses and intuitions about their dynamics. Models of neural activity produce large amounts of data that is hard to visualize and interpret. The currentscape technique helps visualize the contributions of currents to membrane potential activity, but it's limited to model neurons without spatial properties. The extended currentscape technique overcomes this limitation by tracking the contributions of the different currents from distant locations. This extension allows tracking not only the types of currents that contribute to the activity in a given location, but also visualizing the spatial region where the currents originate. The method is applied to study the initiation of complex spike bursts in a model hippocampal place cell.

      Strengths.

      The visualization method introduced in this work represents a significant improvement over the original currentscape technique. The extended currentscape method enables investigation of the contributions of currents in spatially extended models of neurons and circuits. 

      Weaknesses.

      The case study is interesting and highlights the usefulness of the visualization method. A simpler case study may have been sufficient to exemplify the method, while also allowing readers to compare the visualizations against their own intuitions of how currents should flow in a simpler setting. 

      We thank the reviewer for this comment. In fact, we had also considered including a simpler case study to illustrate the extended currentscape method in the original submission. In accordance with the comments from Reviewer 1, we now use a simple model to introduce the concepts in Figure 2 and provide a few examples where the reader can compare the results with their own intuition in simpler cases.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Model complexity vs. intuition/validation. The case study relies on a very complex CA1 model, making it difficult to build intuition about current flow and to validate the visualization. Inclusion of a simpler benchmark (e.g., soma plus a dendrite with two branches, fewer compartments) is recommended to demonstrate how the extended currentscape behaves in a more tractable setting.

      Inspired by the suggestions of the Reviewers, we modified Figure 2 and now first use a simple model with a soma and a dendrite with two branches to introduce the concepts of our analysis. We start with a few examples where the reader can compare the results with their own intuition in simpler cases.

      (2) Rationale and citations for input structure. The in vivo-like input design (untuned inhibition; 12 co-tuned excitatory clusters with large conductances; the goal of generating place fields) would benefit from a more explicit rationale and substantially more literature support. Alternative plausible scenarios (e.g., distributed co-tuned inputs and homosynaptic plasticity) should be articulated, and choices situated within the experimental literature on CA1 excitation/inhibition, including tuning and anti-tuning results.

      We extended the paragraph in the Results describing the input structure and added the most important references there. We added further references to the Methods section where we argue that “Reliable place cell tuning can be achieved by functional synaptic clustering without increased excitatory drive in the place field (Ujfalussy and Makara 2020) or via strong excitatory drive without input clustering (Grienberger et al., 2017, Ujfalussy and Makara, 2020). However, experimental data indicates that both of these mechanisms are present and contribute to the activity of place cells (Adoff et al., 2021,Tasciotti et al., 2025)” and “although interneurons can display spatial tuning, they typically have a broad tuning with low selectivity (Ego-Stengel et al., 2007, Dupret et al., 2013, Geiller et al., 2020). A weak disinhibition within the place field can also contribute to the selective firing of place cells (Geiller et al., 2022, Valero et al., 2022), this was not necessary for place cell activity in novel environments (Geiller et al., 2022) and the overall inhibitory input to place cells is largely untuned (Grienberger et al., 2017).”

      (3) Scope of PCA-based claims. The interpretations derived from the PCA analysis appear broader than warranted, given subcellular heterogeneity and the dominance of somatic action potential variance. These claims should be tempered with more explicit statements about what PCA can and cannot resolve in this context.

      We thank the Reviewer for the opportunity and encouragement to clarify this part of the text. We agree with the Editor and the Reviewers that the results of the PCA analysis cannot be used to support claims regarding the presence or the absence of independent dendritic events. In fact, we aimed to use it as an illustration that global activity tends to dominate PCA analysis even when the “neuron is mainly driven by strong, functionally clustered synaptic inputs to a few dendritic branches”. We acknowledge that we did not formulate this point clearly in the original submission. Therefore, we substantially rewrote this part of the Results and performed additional analysis to clarify that there is a substantial amount of soma-independent dendritic activity in our model that remains invisible to a PCA-based analysis.

      Reviewer #1 (Recommendations for the authors):

      Major concerns:

      (1) Depolarization-inactivated K+ may be an important consideration to model burst-firing.

      Our current model includes 2 kinds of transient K+ channels that show inactivation after depolarization: a proximal and a distal type, as the original model in Jarsky et al., 2005. We now made this explicit in the main text (line 178).

      (2) Description of the in vivo-like model's excitatory and inhibitory input structure needs many more citations of biological studies to communicate rationale for the authors' decisions, e.g. untuned inhibitory neurons, organization of a subset of excitatory inputs into 12 functional synaptic clusters with co-tuned presynaptic neurons and outsized synaptic conductances. The goal is clearly to create CA1 pyramidal neurons with place fields, which would be helpful to state upfront. But additionally, (a) place fields could arise from homosynaptic potentiation of distributed co-tuned excitatory inputs (e.g., Bittner, et al. 2017 study describing BTSP made no assumptions) and (b) CA1 inhibitory interneurons can be spatially tuned (Ego-Stengel & Wilson, 2006; Wilent & Nitz, 2007; Geiller, et al. 2020) and even anti-tuned (Geiller, et al. 2021).

      We thank the Reviewer for pointing out the lack of appropriate references in this section. We made the following changes in the manuscript:

      (1) Stated explicitly that the goal was to create place cell activity.

      (2) Added references to the main text to justify our choices of the inputs (lines 234-241).

      (3) We included a longer rationale for the choice of synaptic clusters and the lack of inhibitory (anti-)tuning in the Methods section, describing the neuron model. In brief, Adoff et al., 2021 reported more clustering of excitatory inputs within the place field. In our model, the degree of clustering is somewhat larger than the clusters reported. Although inhibitory neurons can be tuned, their tuning is much weaker than that of place cells and seems to play only a minor role in the generation of place fields (Grienberger et al., 2017). The presence of inhibitory anti-tuning is controversial: although Geiller et al., 2021 reported weak (~10%) anti-tuning, they did not find it in novel environments, indicating that it is not needed for spatially selective activity (lines 628-646).

      (3) Interpretation of principal component-based analyses shown in Figure 4 could be toned down. As written in section "CSBs in the CA1 pyramidal neuron", it sounds like CA1 pyramidal neuron dendrites display minimal autonomous activity. However, PCA does not seem well-suited to address the heterogeneity of subcellular voltage dynamics over physiologically relevant timescales. Somatic action potentials, and their backpropagation/modulation of dendritic voltage, would of course explain a very large fraction of variance. However, if local dendritic events summate over fine timescales to initiate somatic firing, it is hard to imagine this important nuance being detected. On the other hand, it is hard to imagine single dendritic branches driving robust somatic firing except in the relatively extreme situation in which large numbers of synapses synchronously drive the same branch to initiate a local Ca2+ spike (Figure 3, A-C).

      We agree with the reviewer that PCA cannot reveal the potential dendritic origin of somatic APs, and thus is not suitable to assess the role of local dendritic spikes in shaping the output of the cell. We wanted to highlight here that even in cells with excitable dendrites driven by strong, local input clusters, exhibiting frequent local dendritic spikes, the dendritic membrane potential dynamics will be dominated by global fluctuations with surprisingly little sign of local dynamics in the PCA components. As the reviewer also pointed out, this may not be surprising, as local events either remain spatially restricted and thus contribute little to the overall variability of the dendritic Vm, or they initiate somatic APs and will thus be counted as global events.

      To demonstrate the high propensity of local dendritic events, we analysed local Vm peaks in dendritic branches and found that ~7.6% of the peaks were not coupled to somatic APs.

      Although this number could seem low, we emphasize that most of the 92.4% of the dendritic peaks coupled to APs potentially reflect the backpropagation of the same somatic events to multiple dendritic sites. To confirm this, we performed an additional analysis measuring the spatial extent (number of branches involved) of the individual dendritic events. We found that 90% of the events remained local, restricted to a few dendritic branches, while 10% of the events were global, associated with bAPs and involving the majority of the dendritic tree. Interestingly, these global events dominate the PCA analysis and are responsible for >90% of the dendritic Vm peaks. These results are included in a new panel in Figure 4H.

      We conclude that, “this way, although only 10% of the dendritic Vm events were associated with bAPs, they were ~60-times larger than local events and they dominated the PCA analysis even in the presence of local regenerative dendritic events driven by strong, functionally clustered synaptic inputs.” We believe that this model and analysis could serve as an important benchmark for future experimental studies investigating the structure of membrane potential correlations in in vivo voltage imaging data (Lee et al., 2026).

      (4) One suggestion would be to display more data as shown in Figure 4F, with a longer X axis to clarify the temporal relationship between local dendritic spikes and the first somatic action potential.

      We added a few more examples including the CSBs presented in Fig8G-I as a new supplementary Figure S4. We also slightly extended the x-axis on this supplementary figure as the reviewer requested.

      If the models indicate that passively filtered EPSPs drive most somatic action potentials, as seems to be the case in Figure 5, then this would also be helpful to show as in Figure 4F.

      In Fig 5 we showed two examples of isolated APs. The first AP was indeed driven by passively filtered EPSPs. The second one was preceded and possibly caused by a dendritic spike, as highlighted by the black arrowhead labelled c in Fig. 5Cc. We further analysed the currents driving iAPs in Fig 7B and C, and found that there is considerable heterogeneity in the magnitude of the dendritic Na currents driving the soma before action potentials. Figure 8 and Figure S3 (now Fig. S5) show further examples for iAPs driven either by passively filtered EPSPs or dendritic spikes. We also included these examples in the new supplementary Figure S4.

      (5) Another suggestion would be to use one-hot vectors containing onset times of different event types, since this would divorce the amplitude/duration of events from their influence over total variance.

      In this paper our goal was to illustrate the ability of the extended currentscape method to reveal the origin of the axial currents driving neuronal activity. In Fig. 4, our primary intention was to characterize the membrane potential response of the model in a way that is easily comparable with experimental data. To further quantify the frequency of local events, we added a new panel showing the spatial extent of dendritic events (Fig. 4H). To make our model more comparable with recent publications, we also calculated two additional metrics used to evaluate the relationship between somatic and dendritic activity (Fig 4I-J). We hope that these additional analyses help the reader to characterize the prevalence and impact of local dendritic events on somatic activity.

      (6) From section "Input conditions for complex spike burst generation", paragraph 2: "Note that synapse density, the ion channel mechanisms and the input statistics are identical for tuft and oblique branches,...". The authors should justify this parameterization given the numerous known differences between tuft and oblique branches in both of these regards and acknowledge accompanying interpretational caveats.

      We agree with the reviewer that experimental data have demonstrated several significant differences between the tuft and oblique branches regarding both the inputs they receive and the way they process them. However, in the present paper we chose not to include these differences for several reasons:

      Here we aimed to focus on the abilities of the dendritic currentscape methods and use CSBs as a case study to illustrate how dendritic currentscape can reveal the membrane currents underlying complex neuronal responses.

      Currently there is no CA1PN model that would be able to reproduce all data regarding tuft and oblique integration and would be able to fire calcium spikes. We only wanted to make minimal modifications to the existing CA1PN model to make it capable of generating Ca-spikes and CSBs. We are currently working towards developing and extensively testing a new model, examining the role of these regional differences in CSB generation.

      Although there is information regarding input statistics and dendritic physiology in the literature, many of the relevant parameters are underconstrained. We wanted to avoid overfitting by keeping the model simple.

      By maintaining identical inputs and ion channel distribution we can distinctly highlight the special role of tuft morphology in CSB generation. Altering the inputs or the ion channel density for the tuft would make the interpretation more ambiguous, and elucidating the specific role of the different factors in CSB generation is the subject of future investigations.

      In sum, although we acknowledge that our model does not reflect the full complexity of CA1 PNs and its inputs, we regard this simplicity as a useful feature of the model. We added a section discussing potential future extensions of the model and highlighting interpretational caveats in the discussion (lines 482-490).

      (7) Given the debate in the field regarding the level of functional autonomy present in dendrites, the authors' finding that dendritic voltage largely tracks that of the soma (though see concern above re: PCA), and their access to specific currents, the authors have an important opportunity to investigate the divergence between Ca2+ and voltage sensors as reporters of dendritic activity.

      For instance, why have some studies reported relatively common isolated dendritic Ca2+ transients in CA1 pyramidal neurons while other studies, including voltage imaging studies, have reported the opposite?

      We thank the Reviewer for the opportunity to highlight a few important points regarding functional autonomy of dendrites based on the analysis of our model. We would like to first note that only parallel calcium and voltage imaging studies will be able to ultimately resolve this debate. Nevertheless, below we briefly summarize our take on this issue.

      (1) In general, most Ca2+ imaging studies found that soma-independent dendritic events are rare. "Isolated dendritic transients (no coincident somatic event; see fig. S6, C and D, for example) were overall rare. Isolated apical dendritic Ca2+ transients, which have not previously been reported in CA1PNs, were larger and more frequent than those observed in basal dendrites." (O’Hare et al., 2022). "Activity in the ... basal dendrites ... along the track but outside of the place field was rarely observed” (Sheffield and Dombeck, 2014) and “overall, isolated dendritic transients were similar in size but occurred far less frequently than coincident dendrite-soma transients”, or “data indicate that spatially reliable dendritic firing was almost exclusively yoked to somatic tuning, likely reflecting strong backpropagation of burst firing during traversals of the somatic PF” (Rolotti et al., 2022). Consistent with this observation, a dendritic Vm peak chosen randomly from any branch has ~93% probability to be related to a bAP in our model. However, it is also true that ~90% of events in the model are local events, simply because isolated events involve ~60-times fewer branches (1.8 on average) than events associated with bAPs (114 branches) in the model. If the spatial extent of typical local events is similarly small in real neurons as in the model, then even rare occurrences of dendritic events may reveal substantial dendritic independence. We added a section quantifying the functional autonomy of dendrites in the model in the main text, around Fig 4H.

      (2) Ca2+ indicators are slower and nonlinear and thus they are somewhat unreliable reporters of dendritic voltage events, especially in distal dendrites (Wu et al., 2026; Gonzalez et al., 2026). To illustrate this, we calculated three metrics in our model that were also reported in recent dendritic Ca2+ imaging studies (Rolotti et al., 2022, Sheffield et al., 2014, 2017). First, we calculated the fraction of bAPs detected in a branch (called dendrite-soma coupling in Rolotti et al., 2022, see their Fig. 2C) as a function of the distance of the branch from the soma (our new Fig. 4I). In the Ca2+ imaging data, this was essentially constant ~30% between distances 5-100 µm from the soma. In contrast, the fraction of bAPs detected in the model was 100% in this range, as bAP propagation failures did not occur before 100 µm. This is also consistent with a recent voltage imaging study showing that even low-transmission bAPs reliably propagate to the proximal dendrites (Lee et al., 2026, Fig 3G). The low and distance-independent dendrite-soma coupling reported by Rolotti et al. can only be reconciled with the known biophysics of neurons if the recorded calcium signal is an unreliable reporter of the underlying voltage. Indeed, it has been reported that Ca signals associated with bAPs can be absent in some dendritic branches (Landau et al., 2022) or that local, nonlinear Ca signals can appear in the absence of local regenerative voltage response (Weber et al., 2016, Tran-Van-Minh et al., 2016) and that the Ca signals are highly variable across cells (Eltes et al., 2019).

      Second, we calculated the fraction of local events as a function of the distance from the soma (our Fig 4J; see also Fig. 2F in Rolotti et al.). When averaged across all branches, this was somewhat lower in the model (18%) than in the data (38%) which, again, could be explained by the low reliability of detecting global voltage events in all compartments based on the calcium signal.

      Third, the range of branch-spike-prevalence (BSP) values in our model (0.5-0.9; Fig. 4H) seems at first consistent with that reported (0.4-0.8; Fig 4C of Sheffield et al., 2014; Fig 2 of Sheffield et al., 2017). However, we note that there are several important differences: for technical reasons, Sheffield et al. reported BSP for place field traversals and not for individual events, and they measured Ca2+ dynamics in the basal dendrites. Since bAPs are almost always present in all basal dendrites in the model (basal BSP > 0.9 for all events with somatic spikes) and place field traversals were always accompanied by somatic APs, BSP for basal dendrites would be nearly 1 in the model. Thus, the lower BSP values reported by Sheffield et al. could be explained by the limited reliability of the Ca2+ indicators in reporting regenerative voltage events in neuronal processes.

      We briefly discussed these differences in the Discussion (lines 474-478).

      (3) Finally, to our knowledge, there are 3 relevant in vivo voltage imaging studies in CA1 PNs. Liao et al., 2024 found that in induced place cells the tuning of dendritic events (presumably local or back-propagating Na-spike) was similar to the somatic tuning, which is consistent with our model where dendritic activity and tuning is dominated by bAPs. However, they did not acquire simultaneous signals from the dendrites and the soma so they could not study the independence of the dendritic events. Lee et al. (2026) found that only 10% of the dendritic events are not associated with a somatic spike, which is lower than the number of independent events in the model. However, the events they found were generated in the distal apical trunk (their Fig 3D) and they could not record from the most distal branches where most of the isolated events were generated in our model. Gonzalez et al., 2026 measured voltage and calcium in selected locations within the dendritic tree, and could not reliably estimate the fraction of isolated events throughout the cell. (Gonzalez et al, 2024 measured voltage only in single spines and soma, but did not quantify independent dendritic events; Wong-Campos et al., 2023 measured dendritic integration and bAPs in L23 branches; Wu et al. 2026 recorded in CA2 neurons.)

      We added a paragraph in the discussion comparing the level of functional autonomy present in the model dendrites to recent Ca- and voltage-imaging studies (lines 467-474).

      Minor concerns:

      (1) Abstract:

      There is a need to explain what currentscape is - even at the cost of not invoking its name. To a reader not familiar with currentscape, the abstract is extremely difficult to understand.

      We reworded the title and the abstract to make them more accessible to readers not familiar with the term currentscape.

      (2) "Currentscape analysis of place field dynamics" section:

      It would be helpful to emphasize upfront that dendritic determinants of individual somatic APs versus CSBs will be discussed separately. Since somatic action potentials are discussed before CSBs, I found this section initially confusing as I attributed those findings to CSBs until reading the next paragraph.

      We added a sentence to clarify that we analysed subthreshold responses, APs and CSBs separately.

      (3) Bottom of p2 discussing mixed literature on what drives CSBs in CA1 PCs:

      Overall accurate and useful point, but an important nuance is glossed over which misportrays state of field. References ex vivo studies that fail to drive CSBs with somatic current injection and in vivo study successfully doing so. These aren't really conflicting results. In vivo current injection co-occurs with spontaneous synaptic input, which is high in CA1 and results in PCs that are significantly depolarized at rest relative to those in acute slices. Bittner 2017 ex vivo results are consistent with this: CSBs driven by Cs+-based internal solution to block K+ channels (partially, using strategy of purposefully high series resistance). Similar situation in vivo given that A-type K+ channels are inactivated by depol. Resulting increase in input resistance lowers input threshold to CSB. This is clarified in Results, p.5: "Under in vivo-like synaptic input conditions (see below and Methods), dendritic Ca2+-spikes could also be evoked by somatic current injection (Fig. S1E), as in Bittner et al. (2015).", which makes p. 2 feel especially awkward.

      We agree with the Reviewer that these are not necessarily conflicting results. We rephrased this section, emphasizing that the role of the different input pathways in the initiation of CSBs is not clear.

      (4) Abbreviating "pyramidal neuron" with PC is confusing:

      PC often means place cell. The authors could change this, such that PC refers to "pyramidal cell", or else use PN as an abbreviation. It is important to avoid confusion, especially because place cell dynamics feature prominently in the manuscript.

      Thanks for the suggestion. We replaced PC with PN throughout the manuscript.

      (5) Only apical dendritic parameters are described in section 2 of Results, but the full morphology is shown in Figure 3B with basal currents shown in panels C and F. Some clarification is needed - either what currents were considered for basal dendrites and why, or else why basal dendritic current parameters were not considered for this simulation using apical dendritic current injection but nonetheless examining basal dendritic currents.

      We clarified in the text that the original model contained a standard set of Na and K channels (line 178).

      (6) Clarify "i" and "s" in the Figure 3C legend - "intrinsic" and "synaptic" white letterings are small/hard to see in the bottom subpanels.

      We now spell out intrinsic and synaptic in the Figure and increased the contrast of the letterings.

      (7) Regarding the computational benefit of recursively decomposing axial currents along an adaptively truncated acyclic graph, it would be useful to (a) include a supplemental figure benchmarking this approach to standard approaches to quantify the described gain in computational efficiency and (b) describe computing hardware in the Methods.

      We included an estimated benefit of the pruning process (line 758) as well as the utilised computing hardware and the simulation times in the Methods (line 776).

      Reviewer #2 (Recommendations for the authors):

      The manuscript is in great shape, it is well organized, and the figures are gorgeous. I believe that the extended currentscape is a great extension of the original currentscape method. In particular, the possibility of partitioning currents by the spatial location of their sources is a great addition. 

      Recommendations:

      (1) The method is applied in the context of an interesting case study that highlights its usefulness. However, the model in the study is so complex that it is difficult to develop an intuition of how currents should be flowing, and this makes it hard to intuitively validate the visualization method. I think that applying the extended currentscape in a simpler model - maybe a soma with a dendrite with two branches, fewer compartments - would be instrumental in developing this intuition. 

      We now first use a simple model with a soma and a dendrite with two branches to introduce the concepts in Figure 2 and provide a few examples where the reader can compare the results with their own intuition in simpler cases. We also added the currentscape analysis of a standard, two-compartmental model from Pinsky and Rinzel, 1994 as Supplementary Figure 1.

      (2) I found a number of typos and minor stylistic details you may want to fix in a revised version of the manuscript.

      (a) Abstract, line 12. I believe the word "recursive" is a bit technical at this point. Its meaning in this context becomes clear after one goes through the details of the algorithm (Figure 2).

      We replaced the word “recursive” with “iterative”. We hope that this will make the abstract clearer for the readers. In fact, we realized that the word iterative is a better description of the algorithm, so we replaced the “recursive” with “iterative” consistently throughout the manuscript.

      (b) Figure 1, caption. "Since we included the capacitive current, the magnitude of the inward and the outward currents is identical (Kirchhoff's law)." This sentence can be confusing. If the inward and outward currents are the same, the membrane potential doesn't change. I believe that you are including the capacitive current in the inward (or outward) currents.

      Indeed, we included the capacitive current in the inward or outward currents. We changed the text to clarify this.
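
      The bookkeeping behind this point can be checked with a small numerical sketch. It is only an illustration under assumed conventions that are not taken from the manuscript: currents are signed with negative values meaning inward and positive values meaning outward, and the helper `balance_with_capacitive` and its example values are hypothetical.

```python
from typing import Dict, Tuple


def balance_with_capacitive(currents: Dict[str, float]) -> Tuple[float, float]:
    """Return (total inward, total outward) magnitudes for one compartment
    once the capacitive current is added to the list of currents.

    `currents` holds the signed ionic, synaptic and axial currents
    (negative = inward, positive = outward; an assumed convention).
    """
    all_currents = dict(currents)
    # Current conservation: the capacitive current cancels the net sum
    # of all other currents entering or leaving the compartment.
    all_currents["capacitive"] = -sum(currents.values())
    inward = -sum(v for v in all_currents.values() if v < 0)
    outward = sum(v for v in all_currents.values() if v > 0)
    return inward, outward


# Hypothetical values in arbitrary units: net inward current of 1.5,
# so the capacitive current appears on the outward side.
inward_total, outward_total = balance_with_capacitive(
    {"Na": -3.0, "K": 1.0, "axial": 0.5}
)
```

      The two totals match even though the membrane potential is changing, because the imbalance among the other currents is carried entirely by the capacitive term.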

      (c) Lines 92-93. I do not fully understand this sentence. Are you making an assumption? What does 'continuous flow of axial current' mean?

      By ‘continuous flow of axial current’ we meant a spatially continuous stream of axial currents flowing from the reference to the target. To clarify this, we added the explanatory sentence: “i.e., if the axial current is not blocked or reversed between the reference and the target.”

      (d) Equation (1). Why are axial currents summed over j? Is this for the case of a branching point?

      The compartment can be (1) part of a continuous segment of a dendritic branch, where axial currents can flow from both the distal and the proximal direction (sum over 2 currents); (2) a branch point with 3 axial currents; or (3) a leaf compartment with only one axial current, in which case the summation is not needed. We clarified this in the text.
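
      The three cases can be illustrated with a minimal sketch. The toy morphology, the function name, and the current values below are hypothetical and are not taken from the manuscript; the only point is that the sum over j runs over however many neighbors a compartment has.

```python
# Hypothetical toy morphology (not the model from the manuscript):
# soma (0) - trunk (1) - branch point (2) - two terminal compartments (3, 4).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}

def total_axial_current(i, axial):
    """Sum the axial currents entering compartment i from each neighbor j."""
    return sum(axial[(j, i)] for j in neighbors[i])

# Illustrative values in arbitrary units; axial[(j, i)] is the current from j into i,
# so axial[(i, j)] == -axial[(j, i)].
axial = {
    (0, 1): 0.5, (1, 0): -0.5,
    (1, 2): 0.2, (2, 1): -0.2,
    (3, 2): 0.1, (2, 3): -0.1,
    (4, 2): -0.05, (2, 4): 0.05,
}

for i in sorted(neighbors):
    # Number of terms in the sum: 1 for a terminal compartment,
    # 2 within an unbranched segment, 3 at the branch point.
    print(i, len(neighbors[i]), round(total_axial_current(i, axial), 3))
```

      The same function covers all three cases because the number of terms is read from the adjacency list rather than fixed in advance.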

      (e) Figure 2, caption. Typo. "When the axial currents flows…" Should it be 'current'? - Figure 3, caption. Typo in (C) "Extended currentscape" 

      Corrected.

      (f) Figure 4. I cannot see the grey lines or the dotted lines mentioned in the caption. 

      We added an arrow highlighting the gray and the dotted lines in the figure.

      (g) Figure 5, caption. "Red boxes highlight regions analyzed in panels B-D." Because this is a spatially extended model, region may be confused with spatial location, but you are highlighting a temporal interval.

      We rephrased the caption so that it now refers to temporal intervals.

      (h) Line 341. This is a numerical experiment, correct? 

      We clarified in the text that this was indeed a simulation experiment.

      (i) Line 349. Should it be 'distributions'? 

      Corrected

      (j) Line 422. Typo. Missing space 'in vivousing'

      Corrected

      (k) Line 537. "Preprocessing membrane…" I found this entire subsection a bit confusing and hard to read.

      We rephrased this subsection to clarify it and facilitate reading.

    1. Author response:

      eLife Assessment

      This study provides fundamental insights by demonstrating that the Nanog mRNA coding sequence (CDS) and 3′UTR domains are spatially segregated and functionally distinct in pluripotent stem cells and blastocysts, with 3′UTR-enriched border cells primarily influencing morphogenesis and CDS-enriched inner cells largely regulating transcription and epigenetic programs. The work opens a novel conceptual avenue for understanding how separable mRNA domains can differentially control cell behavior and differentiation. However, the evidence is incomplete, as key aspects of the molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS RNA species, as well as causal links between their perturbation and the observed phenotypes (e.g., via rescue and deeper characterization of 3′UTR elements), remain to be fully established.

      We thank the editors and the three reviewers for their careful and constructive engagement with our manuscript. We greatly appreciate the reviewers’ recognition of the conceptual significance of the study and their thoughtful suggestions for strengthening the mechanistic and molecular characterization of the work. We have carefully considered all points raised and outline below the revisions planned for the revised manuscript.

      The phenomenon of differential CDS and 3’UTR expression is not unique to Nanog. Independent 3’UTR and CDS expression and differential CDS/3’UTR usage have been observed across multiple genes, tissues, and developmental contexts, including in genome-wide (Mercer et al., 2011) and transcriptome-scale studies (Kocabas et al., 2025; Ji et al., 2021). Prior studies have proposed that isolated 3’UTRs may arise through regulated RNA processing pathways coupled to exonucleolytic degradation and, in some cases, recapping mechanisms (Malka et al., 2017; Haberman et al., 2024). While the precise molecular mechanisms underlying isolated Nanog CDS and 3’UTR generation remain unresolved, our observations (contained here) support regulated RNA processing models. Our original submission included a brief discussion of this topic; however, the revised manuscript will include substantially expanded analyses and discussion of the generation of isolated Nanog CDS and 3’UTR species.

      The revised manuscript will address the major concerns regarding:

      (1) The molecular nature, biogenesis, and precise characterization of the separated 3′UTR and CDS mRNA species

      (2) The causal relationship between perturbation of these RNA species and the observed phenotypes, including additional rescue experiments and deeper computational characterization of putative, functional 3′UTR elements.

      Specifically:

      (A) New supplementary analyses and schematics designed to further clarify the conceptual and mechanistic framework of the study, including:

      (i) Computational examination of the Nanog 3’UTR across all reading frames for open reading frames (ORFs).

      (ii) As suggested by Reviewers 1 and 3, single cell traces of Nanog mRNA expression from the full-length mESC dataset used in this study, illustrating distinct transcript isoforms and CDS/3’UTR expression patterns across individual cells, complementing the color-coded tSNE analyses currently presented in Fig. 2.

      (iii) Expanded schematic model and analyses addressing possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR enriched RNA species, including transcript architecture, predicted RNA structural barriers, and exonucleolytic processing models.

      (iv) Expanded discussion of the predominantly nuclear localization of the Nanog 3’UTR signal and its implications for transcript biogenesis, processing, and potential noncoding functions.

      (B) Correction of all minor labeling errors.

      (C) Additional experimental analyses, including:

      - Expansion of Nanog 3’UTR overexpression and rescue experiments to include cell spreading assays.

      - Expanded analysis of the effects of ROCK pathway inhibitors on colony morphology and cytoskeletal organization.

      - Examination of the ability of ROCK inhibition to restore normal embryoid body formation.

      Collectively, these planned revisions are intended to strengthen the mechanistic framing, molecular characterization, and broader significance of the study while clarifying the interpretation and scope of the conclusions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      There is evidence that some genes encode mRNAs from which separate processed transcripts may arise, separating the coding sequence (CDS) from the 3'-UTR, and with both mRNA elements remaining stable in the cell. However, the functional consequences of these mRNA fragments have not been firmly established. In the manuscript by Yang et al., the authors probe the mRNA domain architecture of Nanog in the context of embryonic stem cell colonies and blastocysts. The authors detect spatial separation of Nanog CDS-containing mRNA from abundant Nanog 3'-UTR RNAs depending on the cell position in 2D embryonic stem cell colonies or in blastocysts.

      Strengths:

      The phenotypic analyses of the Nanog mRNA hold promise for revealing distinct roles for the Nanog encoded protein and a separate RNA encompassing the Nanog 3'-UTR.

      Weaknesses:

      There are a number of questions about the molecular nature of the mRNA species that the authors should address in order for the results to be firmly established, as noted below.

      (1) It is not clear how the authors verified that their probes are specific for Nanog CDS or 3'-UTR regions. Especially for the 3'-UTR probe, it is confusing why colonies show green only regions, suggesting only the CDS is present. I would expect the CDS and 3'-UTR probes to colocalize in the interior cells. Is it possible that the 3'-UTR probe is targeting another RNA?

      We thank the reviewer for raising the important question of probe specificity. We understand that the observation underlying this concern is the absence of colocalization between the CDS and 3’UTR probes in colony border cells.

      The absence of CDS/3’UTR colocalization in colony border cells is not due to probe failure but instead reflects the principal observation underlying the study. If Nanog CDS and 3’UTR sequences were present exclusively as intact full-length transcripts in a strict stoichiometric ratio, Nanog positive cells would be expected to be positive for both probes (appearing yellow). Instead, border cells exhibit strong 3’UTR signal with minimal or absent CDS signal, while adjacent interior cells show the opposite pattern.

      The fact that both probes robustly detect signal within the same sample, but in spatially distinct cell populations, argues that both probes are functional and that the observed differential localization reflects genuine biological differences in the levels of transcript components.

      The CDS probe targets ~300 bp within the coding region, while the 3’UTR probe targets ~300 bp within the proximal region of the Nanog 3’UTR. Hybridization specificity was validated as described in the Methods and in our previous studies (Kocabas et al 2015; Ji et al 2021), including negative controls. We additionally now provide a supplemental figure (New Figure 1-figure supplement 2A), highlighting that the Nanog 3’UTR and CDS probes label cell populations distinct from each other, further indicating their specificity.

      In addition, full-length scRNA-seq datasets from both mouse and human ESCs demonstrate differential CDS/3’UTR expression patterns for Nanog and many other genes. To further clarify this point, the revised manuscript will include single cell transcript traces from mESCs illustrating the distinct Nanog isoforms detected across individual cells (New Figure 2-figure supplement 1A).

      (2) It would help for the authors to include a graphic similar to Figure 3, Figure Supplement 1A, that diagrams the location of the CDS and 3'-UTR probes (this should also be done for Oct4 and Sox2). This graphic could also show all potential polyadenylation signals.

      We agree that additional schematic clarification would improve readability. The revised manuscript will include schematics showing the locations of the CDS and 3’UTR probes for Nanog, Sox2 and Oct4 (New Fig. 1- figure supplement 1A).

      (3) I think, based on the fluorescence patterns, there is evidence that the signal for the Nanog 3'-UTR probe is nuclear (images with DAPI staining), but this is not commented on that I could find. This should be discussed, as nuclear retention has implications for the noncoding function of the 3'-UTR fragment.

      The reviewer is correct that the Nanog 3’UTR signal is mostly nuclear. While this was noted in the original Figure 1-figure supplement 2A, we agree that the mechanistic and functional implications were not sufficiently discussed in the original manuscript. The revised manuscript will include an expanded discussion of the relationship between nuclear localization, transcript processing, and potential noncoding functions of isolated Nanog 3’UTR species.

      (4) Figure 2, Figure Supplement 1A needs a better explanation. It's not clear how the reads map to the different regions of the Nanog mature mRNA. The authors should show examples at different ratios of CDS to 3'-UTR. Do the reads have a sharp boundary at the junction of where the isolated 3'-UTR is thought to occur?

      We thank the reviewer for this suggestion. The revised manuscript will include new single cell read maps across the Nanog locus from full length mESC scRNA-seq datasets (New Figure 2-figure supplement 1A), illustrating distinct CDS enriched and 3’UTR enriched transcript isoforms across individual cells.

      These analyses indicate that some CDS dominant transcripts contain 3’UTR sequence, while many appear to contain little or no detectable 3’UTR sequence. Conversely, many 3’UTR enriched transcripts contain only minimal or truncated CDS sequence. Importantly, full CDS and 3’UTR mRNA components are frequently not present in a strict 1:1 ratio, either within individual cells or across cell populations.

      The revised manuscript will also include expanded supplementary analyses integrating transcript architecture, predicted RNA structural barriers, polyadenylation analysis, and single cell coverage patterns to further examine possible mechanisms underlying the generation of isolated Nanog CDS and 3’UTR species (New Figure 2-figure supplement 1B,C).

      (5) I looked in the Zenbu browser at human NANOG CAGE mapping in the FANTOM5 dataset. I could not see evidence for substantial capping of a 3'-UTR fragment when filtering for embryonic cell types. Given the strong signal for the 3'-UTR in border cells, I would expect to see evidence for capping if the RNA were indeed capped. This suggests that if it exists, it is likely uncapped and (as noted in point 3) is likely nuclear retained.

      Prior studies have reported isolated uncapped and recapped 3’UTR species in multiple systems (Malka et al, 2017; Haberman et al, 2024). We agree that the predominantly nuclear localization and lack of a strong CAGE signal for Nanog are important observations and will expand discussion of these points in the revised manuscript.

      (6) Are there predicted polyadenylation signals near the end of the CDS that would generate a short 3'-UTR, and are these signals conserved across mammals?

      Computational analysis of the mouse Nanog 3’UTR identifies a single canonical PAS (AATAAA) at position 1074, located at the 3’ end of the annotated 3’UTR; this terminal PAS is conserved across mammals. These analyses will be included as a supplementary figure and discussed further in the section of the revised manuscript that addresses Nanog transcript biogenesis.
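
      A scan of this kind can be sketched as simple pattern matching. In the sketch below, the function name and the toy sequence are hypothetical (the sequence is invented for illustration and is not the Nanog 3’UTR), positions are reported 1-based as an assumption about the coordinate convention, and ATTAAA is included only as an example of a commonly considered variant hexamer.

```python
import re

def find_pas(seq, hexamers=("AATAAA",)):
    """Return sorted (1-based position, hexamer) pairs, including overlapping hits."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    hits = []
    for hexamer in hexamers:
        # A zero-width lookahead lets overlapping occurrences be reported.
        for match in re.finditer(f"(?={re.escape(hexamer)})", seq):
            hits.append((match.start() + 1, hexamer))
    return sorted(hits)

# Invented toy sequence, for illustration only.
toy_utr = "GCCTGAGTTTCCATTAAAGGCTAGCAATAAAGCTTGC"

for position, hexamer in find_pas(toy_utr, ("AATAAA", "ATTAAA")):
    print(position, hexamer)
```

      Reporting the position of every hit, rather than only its presence, makes it straightforward to check whether a signal lies near the CDS or at the 3’ end of the annotated 3’UTR.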

      (7) It would help to see a zoomed-in view of the region targeted by one of the guide RNAs in the 3'-UTR, and where that site is relative to the polyadenylation signal. Is the polyadenylation signal upstream, i.e., CDS proximal?

      This will be provided in the revised manuscript (New Figure 2-figure supplement 1C,i). Two guide RNAs were used to generate the Nanog 3’UTR deletions. The downstream guide lies upstream of the terminal polyadenylation signal at nt 1074, which preserves polyadenylation of the remaining Nanog CDS containing transcript.

      Consistent with this, all Nanog 3’UTR knockout lines retain normal Nanog protein levels. The revised manuscript will include supplementary schematics showing guide RNA positions relative to the CDS, 3’UTR probes, and terminal PAS.

      (8) A final note, the use of green and red together will be challenging for those who are colorblind. Providing a different false color palette would be helpful. 

      We appreciate this attention to accessibility. The red/green color combination was chosen to provide the highest contrast between CDS and 3’UTR signals in the in situ hybridization experiments, which is important for visualizing their differential spatial localization. We will ensure that figure legends clearly indicate channel assignments throughout the manuscript.

      I am refraining from comments on the cell biology and morphological insights, as they are remote from my core expertise.

      Reviewer #2 (Public review):

      Summary:

      This manuscript shows that the coding sequence (CDS) and 3' untranslated region (3'UTR) of mRNA transcripts from the Nanog gene have distinct expression patterns and functions. In both human and mouse embryonic stem cell colonies and blastocysts, these domains are spatially segregated, with 3'UTR-enriched cells occupying the borders and CDS-enriched cells residing in the interior. CDS mRNA expression is correlated with the expected regulation of transcription and epigenetics associated with the Nanog protein. Interestingly, expression of the 3'UTR appears to play an independent role in cell behavior and colony morphogenesis. Indeed, deletion of the 3'UTR causes specific defects in cell spreading and protrusive activity, with alteration in the localization of adhesion and cytoskeleton-associated proteins. Remarkably, a large proportion of those defects are rescued upon ROCK inhibition. Deletion of either Nanog CDS or 3'UTR leads to distinct modifications in the differentiation competence.

      Strengths:

      The independent role of 3'UTR mRNA domains, although identified in neurosciences a couple of years ago, is a novel and exciting field relatively unexplored in early development.

      The manuscript offers a multilayer series of experiments in ES cell colonies, blastocysts, and embryoid bodies, including imaging, -omics, genetic and pharmacological challenges, and differentiation experiments, thereby unveiling very convincingly the role of Nanog 3'UTR in morphogenesis.

      Weaknesses:

      The pathways leading to the generation of those distinct transcript domains are unknown. Although the differential functional roles are well demonstrated, whether the expression patterns are a cause or a consequence of the cells' localization in the embryo remains to be explored.

      We thank the reviewer for these thoughtful comments and for recognizing the potential significance of independent 3’UTR functions in early developmental systems.

      Regarding the mechanisms underlying generation of distinct CDS and 3’UTR transcript domains, the revised manuscript will include new supplementary analyses and schematic models addressing possible Nanog transcript processing pathways, as outlined above.

      We agree that the relation between spatial location and Nanog 3’UTR expression is an important question. Specifically, it remains unclear whether cells first acquire high Nanog 3’UTR expression and subsequently localize to the colony border or whether border position itself promotes high Nanog 3’UTR expression.

      Our current data suggest that both processes may contribute. Deletion of the Nanog 3’UTR does not prevent colonies from establishing a border/interior pattern, indicating that high Nanog 3’UTR expression is not strictly required for the border pattern itself. At the same time, Nanog 3’UTR overexpression and rescue experiments increased the likelihood of border localization, suggesting that elevated Nanog 3’UTR expression promotes behaviors associated with border occupancy.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yang et al reported distinct functions of the protein-coding sequence (CDS) and the 3' untranslated region (UTR) in the Nanog mRNA in pluripotent stem cells. They first observed different localization patterns for the CDS and 3' UTR in embryonic stem cells and in blastocyst embryos, and this pattern correlates with cell populations in different pluripotent states based on single-cell sequencing data. To characterize the potentially distinct functions of these regions, the authors generated knockout (KO) cell lines in which either the CDS or the 3' UTR was genetically ablated. These deletions led to different phenotypes in multiple assays. These results provided evidence that the CDS and 3' UTR of an mRNA could have distinct functions. Although these results are potentially interesting, several questions need to be addressed before the validity of their conclusion can be confirmed.

      Strengths:

      This study provides evidence for distinct functions of the protein-coding sequence and 3' untranslated region of an mRNA in pluripotent stem cells. The concept could be more broadly applied.

      Weaknesses:

      The initial observation (distinct localization of CDS and 3' UTRs) and the causal relationship between the KO and phenotype need further validation.

      Major points:

      (1) The authors showed distinct localization patterns of the CDS and 3' UTRs in human and mouse ESCs and blastocysts, and the overlap between their signals was minimal (Figure 1). Does this mean that the CDS and 3' UTR RNAs exist separately? For example, in cells that only showed signals for 3' UTRs, do these RNAs only contain 3' UTRs and lack CDS? Was this confirmed by RNA-seq experiments? If so, how are they generated (i.e., by transcription from a novel promoter or partial degradation of the full-length mRNAs)? This is a key question. Without a clear characterization of these RNAs, the rest of the study cannot be substantiated.

      We thank the reviewer for raising this important question, which overlaps substantially with several key points raised by Reviewer #1 concerning the molecular nature and characterization of the Nanog CDS and 3’UTR species.

      Colony border cells exhibit strong Nanog 3’UTR signal with minimal detectable CDS signal, while adjacent interior cells show the reciprocal pattern. These observations strongly suggest the existence of distinct Nanog transcript species rather than exclusively full-length transcripts containing stoichiometric amounts of both CDS and 3’UTR sequence.

      This conclusion is independently supported by full-length Smart-seq2 scRNA seq datasets from both mouse and human ESCs, which provide transcript coverage across both CDS and 3’UTR regions.

      (2) To confirm that the phenotypes of CDS or 3' UTR KO cells were caused by the deleted regions instead of other artifacts, rescue experiments should be performed.

      Rescue experiments were included in the original submission (Fig. 4). The revised manuscript will expand these analyses to include cell spreading. We will also include additional ROCK pathway modulation experiments.

      (3) As over-expression of the 3' UTR showed a phenotype, important regions within it should be identified, and also the possibility that the 3' UTR contains open reading frame(s) and is translated should be tested.

      The revised manuscript will also include supplementary computational analyses of the Nanog 3’UTR, including open reading frame prediction, Kozak scoring, and evolutionary conservation analysis (New Figure 2-figure supplement 1B). These analyses identify no evidence for strongly supported coding potential within the 3’UTR. Further, isolated Nanog 3’UTR transcripts are largely confined to the nucleus, making active translation unlikely.

      The revised manuscript will include new supplementary analyses addressing Nanog transcript structure and possible biogenesis mechanisms (New Figure 2-figure supplement 1C).

      References:

      ViennaRNA/RNAfold: Lorenz et al. (2011) Algorithms Mol Biol 6:26. RNA secondary structure (stem-loop) and minimum free energy (MFE) prediction.

      NCBI BLASTP: Altschul et al. (1990) J Mol Biol 215:403. ORF conservation, protein sequence similarity search.

      NCBI Entrez/Biopython: Cock et al. (2009) Bioinformatics 25:1422. Sequence retrieval.

      PhastCons/UCSC multiz alignments: Siepel et al. (2005) Genome Res 15:1034. Evolutionary conservation scoring.

      UCSC Genome Browser: Kent et al. (2002) Genome Res 12:996-1006. Conservation track access.

      Eaton et al. (2020) Mol Cell 78:439. Stall model.

      Brannan et al. (2012) Genes Dev 26:2621. Stall model.

      Addition to Methods.

      ORFs (≥10 amino acids) were identified in all three forward frames according to Kozak (1987). Evolutionary conservation was assessed by BLASTP (Altschul et al., 1990) against RefSeq proteins. Poly(A) signals were identified by pattern matching for canonical and non-canonical hexamers. Conserved sequence blocks were obtained from UCSC PhastCons tracks (Siepel et al., 2005). RNA secondary structures were predicted using ViennaRNA RNAfold (Lorenz et al., 2011) with a sliding 80-nt window. The stall model for isolated transcript generation follows Eaton et al. (2020).
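
      The ORF step described above can be sketched as a scan of the three forward frames. This is a minimal illustration, not the authors' implementation: the function name and the toy sequence are hypothetical, and it assumes that an ORF runs from the first in-frame ATG to the next in-frame stop codon, which the Methods text does not state explicitly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def forward_orfs(seq, min_aa=10):
    """Yield (frame, start, end, n_aa) for ATG-to-stop ORFs in the three forward frames.

    start and end are 0-based and end-exclusive, with the stop codon included;
    n_aa counts the encoded amino acids (stop codon excluded).
    """
    seq = seq.upper().replace("U", "T")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    yield frame, start, i + 3, n_aa
                start = None  # resume the search after the stop codon

# Invented toy sequence: one 11-codon ORF in frame 2, then a 2-codon ORF that is filtered out.
toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "ATGAAATAG"

for orf in forward_orfs(toy):
    print(orf)
```

      Lowering `min_aa` shows how strongly the number of reported ORFs depends on the length threshold, which is one reason the cutoff is stated in the Methods.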

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers, and we are glad that they acknowledge this work to be a timely contribution to a quickly moving field and a valuable tool for generating testable hypotheses. We are pleased that reviewer #2 highlights that “a major strength is the combination of orthogonal evidence types” and that the tool serves to generate novel hypotheses. The revised manuscript will sharpen the positioning of the study within this context. Additional experimental evidence will be provided to address the points raised by reviewers #1 and #3.

      Reviewer #1

      1. The authors do not co-IP ARF1. This does not surprise me, as small GTPases often hydrolyse their GTP during lysis.

      We agree that this is likely due to transient association and GTP hydrolysis during lysis and will add a section to the manuscript.

      2. There have been a number of ARF1 BioID screens done. Have the authors checked whether their complex has turned up in these?

      We will include this in the revised manuscript.

      3. I am a bit confused by some of the interpretation about KO and loss of JTB staining. They interpret: "The SYS1 acts as a Golgi recruitment factor for both ARFRP1 and JTB". The ARFRP1 has been published and is a cytosolic protein, so that makes sense. However, JTB is not cytosolic but a membrane protein, so it cannot be "recruited". Maybe it is retained in the Golgi by this interaction, but if that is the case you would still expect signal on another organelle or the plasma membrane (and we see from the western blot that it isn't degraded in the lysosome). I am confused by the authors' model here.

      We will clarify the phrasing and will provide a clearer interpretation, also considering the other improved imaging experiments that will be included in the revised manuscript.

      4. The authors validate their JTB antibody and confirm that SYS1 levels are not reduced in the JTB KO; this is very clear (albeit unquantified). What I do not see validated is the SYS1 KO. I think this is quite important.

      We will validate SYS1 KO using TIDE and/or western blotting.

      5. The colocalisation in panel 3D is weak and unclear to me. It is not quantified. It is not clear if there have been 3 repeats.

      The revised manuscript will include improved imaging data. We will repeat relevant experiments, include appropriate controls and quantify where necessary.

      6. The imaging in figure 3 is not clear in places, and it stands out in a very clear manuscript. I cannot see the JTB in panel F. There are no scale bars. The dynamic range of the image is not utilised. I do not see the JTB stain in either of the SYS1 KOs, I do not see the SYS1-FLAG staining in the complement, and it is not quantified at all. It may all seem trivial, but (to me) this is an absolutely critical bit of biology data to support the informatics.

      The revised manuscript will include improved imaging data. We will repeat relevant experiments, include appropriate controls and quantify where necessary.

      7. I am a bit unconvinced by the interpretation of it being a retrograde trafficking complex. This is for 2 key reasons: 1) VSV-G is anterograde (yet, unusually, they interpret a "severe defect in retrograde transport"). 2) Even if it were only having an effect in the retrograde direction, I would still remain a little open-minded about it, as you can easily mistake trafficking of a protein in one direction for another if an unknown protein (a SNARE, for example) has defective trafficking.

      We used VSVG-KDEL in this assay. This setup specifically measures retrograde trafficking, and we will clarify this in the revised manuscript. We will also clarify in the Discussion that we confirmed a role in retrograde trafficking but cannot exclude a role in anterograde trafficking.

      Reviewer #2

      Major comment: scope and interpretation of DepMap-derived functional evidence

      The manuscript could benefit from more clearly defining the scope of the functional evidence used to nominate complexes. The central co-dependency signal is derived from DepMap 24Q2 CRISPR gene-effect profiles, which are primarily cancer cell-line fitness/proliferation data. This is an important limitation because the resulting correlations may preferentially capture complexes or pathways that influence viability in proliferating cancer cells, while missing complexes active in differentiated, tissue-specific, stimulus-dependent, or non-proliferative contexts. Conversely, some correlations may reflect shared cancer-lineage or fitness dependencies rather than direct participation in a stable complex. The authors are appropriately cautious in stating that DepCom is not a complete inventory of human protein complexes, but the title, framing, and resource description could still be read as implying a more general catalogue of functional protein complexes. The authors might consider adding a clearer introduction to DepMap and explicitly discuss how the cancer-cell-line origin of the data affects interpretation of the 518 predicted complexes. This could be addressed without new experiments, for example by adding text early in the Results section explaining what the CRISPR gene-effect scores measure, and by expanding the Discussion to clarify that DepCom represents structurally plausible complexes prioritized by co-dependency across cancer cell lines, rather than an unbiased or context-independent map of human protein complexes. The selection of highlighted examples would also benefit from clearer justification. The peroxisome, actin, WNK/TSC22D2, and Golgi/JASS examples are biologically interesting, but the rationale for choosing them is not always explicit. Were they selected because they were novel, high-confidence, disease-associated, experimentally tractable, or representative of different resource categories? Briefly stating the selection criteria would help readers understand whether these examples are illustrative case studies or representative outcomes of the pipeline.

      We agree with the reviewers' assessment that this resource should be viewed as hypothesis-generating and that the overall framing should be improved. We will revise the manuscript at the appropriate sections, according to the more detailed comments of all reviewers.

      Minor comments

      1. Clarify post-clustering removal of large/problematic protein families and complexes. In the Methods, the authors state that "clusters of histones and keratin clusters, as well as the mito-ribosome, complexes of the electron transport chain and the mediator complex" were removed because of their large sizes. This filtering step would benefit from additional detail. Please specify the criteria used to define these removed clusters, how many clusters/proteins were removed at this stage, and whether removal was based only on size or also on biological/manual curation. It would also be helpful to explain why these proteins or clusters were removed after clustering rather than excluded before graph construction and clustering, since highly connected or compositionally biased protein families could potentially influence neighboring cluster assignments. If available, a brief robustness check showing that pre-removal of these proteins gives similar candidate complexes would strengthen confidence in the clustering procedure.

      We will add the requested information to the relevant section. Alongside the manuscript, we will also provide lists of the complexes before and after every filtering step.

      2. Clarify the rationale for excluding complexes larger than 5000 residues. The 5000-residue cutoff is understandable for AF3 computational cost, but the manuscript should briefly state how many candidate complexes were excluded by this cutoff and whether this preferentially removes known large assemblies. This would help readers understand the scope of complexes that DepCom is expected to miss.

      Alongside the manuscript we will now also provide lists of the complexes before and after every filtering step.

      3. Improve wording in the CAP1/CFL1/WDR1/ACTB example. The sentence "Additionally, CAP1 works in concert with CFL1 to accelerate depolymerisation, though if a four-protein complex consisting of actin, WDR1, CAP1 and CFL1 is relevant is not clear" is difficult to parse. A possible revision might be something like: "Additionally, CAP1 works in concert with CFL1 to accelerate depolymerisation, although it remains unclear whether actin, WDR1, CAP1 and CFL1 form a stable four-protein complex in cells." This more clearly separates known biology from the speculative interpretation of the DepCom prediction.

      Wording will be improved.

      1. Improve reproducibility details for AF3 predictions. The Methods state that predictions were run using a local AF3 installation, but reproducibility would be improved by reporting relevant AF3 settings, number of seeds/models per complex, whether templates were used, how disordered regions were handled, and whether predictions were repeated for all complexes or only selected examples. This is especially important because the manuscript notes that multiple predictions can yield different subunit arrangements.

      We will provide detailed settings in the Methods section. Regarding disordered regions: all predictions used the full-length sequence (canonical UniProt ID) of each protein, so disordered residues are included. If disordered regions have low pLDDT and poor PAE, they will simply not score as interfaces in AlphaBridge. The one exception where we did crop structures is Figure 2D, and this was purely for visualization purposes; the full-length (uncropped) complex did score in the pipeline.

      Reviewer #3

      Co-essentiality is not the same as physical complex membership. This is the biggest conceptual concern. Genes in the same pathway are co-essential whether or not their products bind. The authors lean on the structural prediction step to filter this out, but that means the entire pipeline rests on AF3+AlphaBridge being correct about who interacts with whom. There is no independent benchmarking shown of how often AlphaBridge calls a true positive vs a false positive at the chosen 0.5 cutoff. Why 0.5? Where does that number come from? A short benchmarking section using known complexes (CORUM 5.0, hu.MAP 2.0, the PDB) would make the choice defensible. Right now it reads as arbitrary.

      We thank the reviewer for bringing up the need for such an important clarification. We fully agree that co-essentiality does not equal physical interaction and structure predictions are imperfect. This is precisely the logic underlying our pipeline design, not a limitation we overlooked. The two data sources are used sequentially and serve distinct roles: first, we construct protein sets that are connected through networks of predicted binary physical interactions; then we cluster these based on DepMap correlations, selecting likely physical complexes that display co-essentiality between their components.

      In other words, clustering on DepMap data alone would certainly return many spurious correlations: as the referee points out “Co-essentiality is not the same as physical complex membership”. Anchoring the search space with structural predictions substantially reduces this noise. Neither data source alone is sufficient, nor do we claim either is definitively "correct": the value lies in their combination. We hope improved phrasing in the revised manuscript will highlight this better.

      Regarding benchmarking AlphaBridge score: we have benchmarked AlphaBridge, in response to reviewer feedback on the original AlphaBridge paper (Structure, Cell Press). In the figure here it is clear that in our benchmark of PDB structures (with

      Comparison to existing resources is incomplete. I can't help but wonder what was found here that would not have been possible by analysing existing resources. CORUM 5.0 (7,193 mammalian complexes, ~71% human-derived; Tsitsiridis et al. 2024 NAR), hu.MAP 2.0 (Drew et al. 2021, ~6,965 complexes from >15,000 MS experiments), BioPlex 3.0 (Huttlin et al. 2021, 118,162 interactions in HEK293T), and the Complex Portal already cover a large fraction of the human complexome. The authors compare to PDB, the original interactome paper, and Complex Portal, but they explicitly skip CORUM and hu.MAP, both of which are central reference resources in this space. Without including these, the "60 complexes unique to DepCom" number is not really meaningful. This needs to be redone properly.

      We will add the comparison with CORUM and hu.MAP in the revision.

      Validation rate is one out of 518. The JASS work is solid, but a single experimentally validated complex out of 518 gives the reader essentially no estimate of how often the rest of the predictions are correct. Even a smaller systematic effort, say IP-MS on five to ten predicted novel complexes in the same cell line, would do an enormous amount to establish how trustworthy the resource is. The authors already have the V5/IP-MS pipeline running. Right now the manuscript implicitly asks the reader to trust 517 predictions on the strength of one validation.

      In this paper we validated one out of the 60 complexes we claim are new. Notably, we provide new biological data and demonstrate how consulting our resource, or following the same logic of combining functional and structural information, can lead to exciting new discoveries. We note that out of the 518 complexes we list, 69 are exactly mirrored in the PDB and/or Complex Portal, while for another 389 there is partial evidence. Thus, our dataset is amply validated and at the same time contains data to enable new discoveries. We also note that, following the release of our resource eight months ago, a new high-impact publication “validated” a complex we had independently picked in DepMap (Oosterheert et al., Choreography of rapid actin filament by coronin, cofilin and AIP1, Cell, 2025). We will rephrase relevant sections (also in response to reviewer 2) to increase clarity about validation.

      The functional and disease clustering is potentially circular. GO terms and STRING associations are themselves derived in large part from the published literature on protein function, including text mining channels in STRING, much of which is downstream of complex membership. Of course complexes cluster into "DNA repair" and "vesicle trafficking" if you cluster on GO and STRING. The same applies to Open Targets, which integrates GWAS Catalog, ClinVar, literature mining, and other sources. The clustering is fine as a navigation aid for the website, but it is not, as currently presented, an independent validation of anything. I would tone the discussion down accordingly.

      We did not mean to present the clustering as an independent validation. We will tone down the discussion accordingly.

      AF3 limitations on this class of problem. AF3 itself acknowledges limitations (Abramson et al. 2024, including the December 2024 addendum), and subsequent benchmarking has flagged disordered regions, dynamic/large assemblies, and certain transmembrane systems as known weak points. The JASS complex is largely transmembrane, the WNK1-TSC22D2 example involves disorder-to-order transitions, and several flagship examples involve large multi-domain proteins. The authors acknowledge some of this in passing but should state explicitly which complexes were trimmed, how the trimming choices were made, and whether predictions were repeated with different seeds to check stability. Figure S4 is a good start, but for a resource paper a more systematic seed-stability analysis is warranted.

      No complexes were trimmed for the initial AF3 predictions. The WNK1-TSC22D2 example was trimmed and re-predicted only for visualization purposes. We apologize for the misunderstanding and will state this more clearly.

      AF3 certainly has limitations. Disordered regions will almost always be assigned a poor pLDDT (even if AF3 wrongly folds them into helices), and AlphaBridge will not pick up these low-pLDDT regions as interfaces. Dynamic assemblies may likewise lead to poor confidence scores and consequently will not be picked up as interfaces by AlphaBridge. If AF3 confidence metrics are analyzed properly, the main concern for both disordered regions and dynamic assemblies is missing true positive interactions, rather than finding false positives. As we did not aim to identify all possible human complexes, we consider focusing on the most confidently predicted interactions to be a fair trade-off.

      While the JASS complex is indeed a membrane protein complex, the predictions are exceptionally confident across multiple seeds (we can provide predictions from multiple seeds for the revision), and the complex validates experimentally. Of course, structure predictions are no substitute for experimental structures, as cautioned multiple times throughout the manuscript.

      Figure S4 shows that despite the complex overall geometry being flexible, the interaction sites are predicted with high confidence across different poses. Since the aim of this study was to identify proteins interacting with each other, not accurate structures (which need to be solved experimentally), we argue that recomputing all structures with multiple seeds is disproportionately expensive computationally and would delay publication of a timely study while adding little.

      Statistics are thin in several places. On the Fisher exact test for Golgi/ER enrichment in V5-JTB IP-MS (Supplemental Table 1), an odds ratio of 2.77 is modest, and there is no comparison to a matched control IP. Is this more than expected by chance against an appropriate background? The IP-MS volcano plots show many significant proteins, but how was the background controlled? On the LLM section, no quantitative evaluation is presented at all and the assessment is admitted to be subjective.

      We will qualify the conclusions drawn from the IP-MS experiments. We maintain that together with the additional cell biology data, we build a compelling and convincing picture for this JASS complex.

      Experimentally, the background is controlled by measuring enrichment over WT cell lines that have undergone the same IP procedure as the V5-SYS1/JTB-expressing cells (lysis, incubation with the anti-V5 conjugated beads, same wash procedure and sample processing), as is standard in the field. We will clarify this in the Methods section. Regarding identification, the FDR was set to 1% at the protein and peptide levels, and peptide spectrum matches (PSMs) were additionally filtered for a SequestHT Xcorr score >1.
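
      To make the question about the Fisher exact test and its odds ratio concrete, the following is a minimal sketch of a one-sided enrichment test on a 2x2 table. The function name `fisher_enrichment`, the table layout, and all counts are hypothetical and chosen for illustration only; they are not the values behind Supplemental Table 1, and the odds ratio shown is the simple sample odds ratio.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher exact test for enrichment in a 2x2 table.

    Hypothetical layout (illustrative counts only):
                     annotated   not annotated
        IP hits          a            b
        background       c            d
    Returns (sample odds ratio, P(X >= a)) under the hypergeometric null.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)
    # Sum the probabilities of all tables at least as enriched as the observed one.
    numer = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, numer / denom


# Toy example with made-up counts.
odds_ratio, p_value = fisher_enrichment(3, 1, 1, 3)
print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.4f}")
```

      The same odds ratio can correspond to very different p-values depending on the table margins, which is why the choice of background matters for interpreting a modest value such as 2.77.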

      We agree with the referee that the LLM interpretation is subjective and cannot be benchmarked. We suggest revising the resource and the paper, only providing structured LLM prompts to facilitate users asking the right questions, but we will not provide the LLM answers as part of the resource.

      The 4×ACTB speculation. The authors themselves note the AlphaBridge score declines from 0.9 (1×ACTB) to 0.78 (4×ACTB), yet they speculate about functional implications. This is exactly the kind of post-hoc rationalisation around weak evidence that should either be supported with experiment or removed. Either remove or qualify as speculative.

      We will qualify this as speculative.

      The LLM-assisted analysis. I am genuinely uncomfortable with releasing 76 LLM-generated complex annotations as part of a published resource when the authors openly state these have "not been systematically validated". Putting these summaries on a website with the imprimatur of a peer-reviewed paper will lead to them being cited and reused. At minimum, the website needs prominent warnings on every page where an LLM summary appears, the prompts must be fully reproducible (not just downloadable as JSON), and a small validation table, say 10 complexes scored by a domain expert for accuracy of each claim, should be included as a supplemental figure. As it stands this section reads like an enthusiastic add-on that has not been thought through with the same care as the rest of the work.

      We thank the referee for bringing forward this consideration. We agree to remove the LLM answers for the 78 complexes from the manuscript and from the website, to ensure that the outputs cannot be cited. We will provide two different objective structured prompts for download, to encourage variety in responses for curious users who want to explore. We will add a prominent disclaimer noting that responses resulting from these prompts cannot be interpreted as facts without validation.

      We cannot guarantee reproducibility with modern LLM inference architectures. Even if seeds are kept the same and temperature=0, floating-point non-determinism in GPU operations, distributed inference, and batch effects may lead to different results. Furthermore, models go through many iterations rapidly. As a consequence, it is impossible for us to guarantee reproducibility.
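
      One ingredient of the floating-point non-determinism mentioned above is that floating-point addition is not associative, so the order in which partial sums are combined can change the result. The following is a minimal CPU-side illustration of that single effect, not a model of GPU or distributed inference.

```python
import math

# The same three numbers, summed in two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two groupings round differently, so the results are not bit-identical.
print(left_to_right == right_to_left)

# math.fsum returns the correctly rounded sum, which does not depend on order.
print(math.fsum([0.1, 0.2, 0.3]) == math.fsum([0.3, 0.2, 0.1]))
```

      In a parallel reduction the grouping of partial sums is typically not fixed, so small differences of this kind can appear between otherwise identical runs.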

      Cutoffs and cluster numbers need stability analysis. The cutoff for the 75th-percentile DepMap correlation (mean of random + 3 SD = 0.147) is reasonable but should be accompanied by an FDR or precision/recall estimate against a labelled reference set. The choice of 20 final clusters in functional clustering (because that gave a peak in silhouette score) and 14 for disease clustering should also be supported by stability analysis, e.g. resampling.

      The 75th-percentile cutoff is, in our opinion, well justified and sufficient for our purposes. FDR and precision/recall estimates need a set of true and false positives. The DepMap correlation clusters are an intermediate step in our pipeline and do not necessarily hold the final complexes. How can intermediate reference DepMap clusters be constructed and defined as true or false positives? Even if we scored clusters that contain a known complex as true positives, how would false positives be defined? If a cluster does not contain a known complex, that does not necessarily mean that these proteins don’t interact, just that they have not been shown to interact yet.

      We will run resampling to improve confidence in the choice of cluster number.
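
      For readers unfamiliar with the "mean of random + 3 SD" rule discussed above, a minimal sketch is given here. The function names, the use of the sample standard deviation, the "inclusive" quantile method, and all numbers are assumptions made for illustration; the sketch does not reproduce the 0.147 value or the actual DepCom implementation.

```python
from statistics import mean, quantiles, stdev


def random_pair_cutoff(null_correlations, n_sd=3.0):
    """Cutoff = mean of random-pair correlations + n_sd sample standard deviations."""
    return mean(null_correlations) + n_sd * stdev(null_correlations)


def cluster_passes(cluster_correlations, cutoff):
    """True if the 75th percentile of a cluster's pairwise correlations exceeds the cutoff."""
    q75 = quantiles(cluster_correlations, n=4, method="inclusive")[2]
    return q75 > cutoff


# Made-up correlations between randomly paired genes (the null distribution).
null = [0.0, 0.1, -0.1, 0.05, -0.05]
cutoff = random_pair_cutoff(null)

# Made-up pairwise correlations within two candidate clusters.
print(cluster_passes([0.1, 0.2, 0.3, 0.4, 0.5], cutoff))
print(cluster_passes([0.0, 0.05, 0.1, 0.1, 0.2], cutoff))
```

      A cutoff defined this way controls how far a cluster sits above the random-pair background; it does not by itself give an FDR, which would require a labelled reference set as the reviewer notes.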

      Internal numerical consistency. The bioRxiv preprint abstract refers to 354 high-confidence multi-protein complexes, while the body of the manuscript discusses 518 (224 dimers + 294 multimers). The relationship between these numbers should be stated explicitly. Likewise, the breakdown of "60 unique to DepCom" into 41 heterodimers + 19 multimeric should be reconcilable in the figures and tables. The number "9,764 unique seed proteins" should also be clarified to confirm it is the DepCom-internal seed set and not inherited from the Zhang et al. coverage or hu.MAP 2.0 (9,963 proteins). These are easy fixes but matter for a resource paper.

      BioRxiv preprint: The preprint that the reviewer read is an older version, which will be updated.

      The 9,764 unique seed proteins are taken from the Zhang et al. paper and are the human proteins identified as confidently interacting with at least one other human protein. We will make this clearer.

      Manders' overlap coefficient. The VSV-G(ts045)-KDELR retrograde-transport assay is well established and the experiment is clean, but MOC has been increasingly criticised in the colocalisation literature (Adler & Parmryd 2010, 2021). Best practice is to also report Manders' M1/M2 coefficients or Pearson's correlation alongside MOC. Adding these would be straightforward and would strengthen Fig 4B.

      We will improve co-localization measures where appropriate.
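
      The coefficients named in this comment can be sketched as follows, using their standard textbook definitions on two flattened intensity channels. The function names, the channel lists, and the zero thresholds are hypothetical; real analyses would apply background subtraction and a principled threshold, neither of which is shown here.

```python
from math import sqrt


def pearson(r, g):
    """Pearson correlation between two intensity channels."""
    mr, mg = sum(r) / len(r), sum(g) / len(g)
    cov = sum((x - mr) * (y - mg) for x, y in zip(r, g))
    var_r = sum((x - mr) ** 2 for x in r)
    var_g = sum((y - mg) ** 2 for y in g)
    return cov / sqrt(var_r * var_g)


def manders_overlap(r, g):
    """Manders' overlap coefficient (MOC); intensities are not mean-centred."""
    return sum(x * y for x, y in zip(r, g)) / sqrt(
        sum(x * x for x in r) * sum(y * y for y in g)
    )


def manders_m1_m2(r, g, thr_r=0.0, thr_g=0.0):
    """M1: fraction of channel-r signal in pixels where g is above threshold.
    M2: fraction of channel-g signal in pixels where r is above threshold."""
    m1 = sum(x for x, y in zip(r, g) if y > thr_g) / sum(r)
    m2 = sum(y for x, y in zip(r, g) if x > thr_r) / sum(g)
    return m1, m2


# Made-up pixel intensities for two channels.
red = [1.0, 0.0, 2.0, 0.0]
green = [0.0, 3.0, 4.0, 0.0]
m1, m2 = manders_m1_m2(red, green)
print(pearson(red, green), manders_overlap(red, green), m1, m2)
```

      Because MOC is not mean-centred, it stays high whenever both channels are bright in the same pixels, whereas Pearson's correlation and M1/M2 separate intensity covariation from fractional co-occurrence.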

      Minor comments 1. Page 4: "candidate sets of potential multi-protein complex members". Pick one, they are either candidates or potential, not both.

      Will be addressed.

      Page 7: "Complex 294... mechanistic basis for CFL1 and WDR1 cooperation has only recently been described". Please update the reference list and language given how recent this is.

      Will be addressed.

      Page 7: JTB is described as "poorly characterised". This is a bit too strong. JTB has been studied in the context of TGF-β-induced mitochondrial regulation (Kanome et al. 2007), cytokinesis and chromosomal passenger complex association (Platica et al. 2011), the structural characterisation of its extracellular domain (Rousseau et al. 2012), and breast cancer biomarker work (Jayathirtha et al. 2022). A more accurate framing would be "incompletely characterised, with previously reported but functionally unresolved roles". The novelty here is the Golgi connection, which is genuine.

      We will rephrase.

      Page 8: the citation of Blomen et al. 2015 Science for "Golgi-related synthetic lethality" should be checked against the actual supplementary data of that paper to confirm the JTB attribution is correct.

      Will be checked.

      Figure 1: as in many omics papers, please think of us colourblind readers. The pink-green DepMap correlation scale will be hard for some of us.

      The color scheme in use, alongside others, was tested with two colleagues that have different variants of colour blindness and was judged to be the best compromise.

      Figure 5A and 5B: 21 and 14 colour-coded clusters respectively in a single UMAP is too much. Consider splitting into separate panels by broad theme or providing an interactive version only.

      We will focus on a subsection and provide the full interactive version on the homepage.

      Page 11: "manually evaluated the quality of outputs". By whom, blinded to which model produced which output? Methods are silent on this.

      As stated above, we will remove the LLM part.

      Some figures show "hairballs" with very limited informative content. Fig. 1B left panel and the AlphaBridge wheel plots in particular convey relatively little at the size shown.

      We will try to find a way to draw the AlphaBridge circular plots in better resolution; we do note, however, that the reviewer’s observation might be an artefact of the PDF file distributed to reviewers.

      The reference list looks a bit thin on prior systematic complexome efforts. BioPlex 3.0 (Huttlin et al. 2021 Cell), hu.MAP 2.0 (Drew et al. 2021 MSB) and CORUM 5.0 (Tsitsiridis et al. 2024 NAR) should all be cited and discussed.

      We will include the additional references where appropriate.

      The discussion section drifts into general comments about AI in science that don't add much. I would cut about a third of it and use the space for a more careful framing of the actual contribution.

      We will shorten the discussion section and phrase more carefully.

      General assessment Reviewer #3: The strongest aspect of this study is the JASS complex story. The IP-MS, the SYS1-KO rescue experiment, the VSV-G(ts045)-KDELR transport assay, and the orthogonal CRISPR screens with diphtheria and Pseudomonas exotoxins together build a convincing case for JTB as a regulator of Golgi-to-ER retrograde trafficking. This part of the paper is genuinely nice work and would stand on its own. The pipeline itself, combining structural predictions with functional dependency data and filtering with AlphaBridge, is sensible and timely. It is a reasonable demonstration of how confidence filtering should be done at this kind of scale. The main limitations concern the resource framing. After reading the manuscript several times I am still trying to identify the central novel contribution beyond the JASS validation. The interactome predictions are taken from Zhang et al., DepMap is public, AF3 is public, AlphaBridge is the authors' own previously published tool, and GO/STRING/Open Targets/dbPTM are all public. The manuscript is essentially an integrative pipeline plus a website plus one experimentally followed-up complex. The framing oversells what is genuinely new. The authors' own comparison (Fig. S3) shows 60 complexes "unique to DepCom" out of 518, of which 41 are heterodimers and only 19 are multimeric. Nineteen genuinely novel multi-protein complexes is still a contribution but it is a long way from the 354/518 that the abstract and discussion implicitly emphasise. The validation rate (one of 518) and the missing comparisons to CORUM 5.0 and hu.MAP 2.0 are the two issues that most need addressing.

      We will rephrase the relevant passages to address these issues and adjust the framing. We would put forward that the main contribution of this manuscript is to present an integrative framework that combines data from orthogonal sources to highlight the potential of structure prediction models to serve as a discovery tool. The reviewer identifies correctly (albeit derogatorily) that this is “essentially” an integrative pipeline. But it is an integrative pipeline that combines genetics and computational structure predictions in a novel (to the best of our knowledge) way and surfaces interesting new biology. The biology of the JASS complex goes well beyond simple validation experiments, and we believe its discovery (based on our data) carries more value than the reviewer attributes to it.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Characterising protein complexes is a fundamental goal in modern molecular cell biology. Here, Uckelmann and colleagues have presented a solution to part of this problem. By combining functional clustering with AlphaFold modelling, they present a high-throughput bioinformatic solution. The paper and figures are exceptionally clear and well presented. The conclusions are reasonable, and the data interesting. I am a cell biologist with expertise in the molecular machinery of trafficking, so the focus of my review will be on the identification of a new complex that is proposed to have a role in retrograde trafficking. On the whole I find this an interesting and convincing finding. However, I have some comments and questions that I hope may help the authors. I will naturally focus my comments on the cell biology.

      1. The authors do not co-IP ARF1. This does not surprise me, as small GTPases often hydrolyse their GTP during lysis.

      2. There have been a number of ARF1 BioID screens done. Have the authors checked whether their complex has turned up in these?

      3. I am a bit confused by some of the interpretation about KO and loss of JTB staining. They interpret: "The SYS1 acts as a Golgi recruitment factor for both ARFRP1 and JTB". The ARFRP1 result has been published, and ARFRP1 is a cytosolic protein, so that makes sense. However, JTB is not cytosolic but a membrane protein, so it cannot be "recruited". Maybe it is retained in the Golgi by this interaction, but if that is the case you would still expect signal on another organelle or the plasma membrane (and we see from the western blot that it isn't degraded in the lysosome). I am confused by the authors' model here.

      4. The authors validate their JTB antibody and confirm that SYS1 levels are not reduced in the JTBKO; this is very clear (albeit unquantified). What I do not see validated is the SYS1KO. I think this is quite important.

      5. The colocalisation in panel 3D is weak and unclear to me. It is not quantified. It is not clear if there have been 3 repeats.

      6. The imaging in Figure 3 is not clear in places, and it stands out in a very clear manuscript. I cannot see the JTB in panel F. There are no scale bars. The dynamic range of the image is not utilised. I do not see the JTB stain in either of the SYS1 KOs, I do not see the SYS1-FLAG staining in the complemented cells, and it is not quantified at all. It may all seem trivial, but (to me) this is an absolutely critical piece of biological data to support the informatics.

      7. I am a bit unconvinced by the interpretation of it being a retrograde trafficking complex. This is for 2 key reasons. 1) VSV-G trafficking is anterograde (although, unusually, they interpret a "severe defect in retrograde transport"). 2) Even if it were only having an effect in the retrograde direction, I would still remain a little open-minded about it, as you can easily mistake trafficking of a protein in one direction for another if an unknown protein (a SNARE, for example) has defective trafficking.

      Significance

      Characterising protein complexes is a fundamental goal in modern molecular cell biology. Here, Uckelmann and colleagues have presented a solution to part of this problem. By combining functional clustering with AlphaFold modelling, they present a high-throughput bioinformatic solution. The paper and figures are exceptionally clear and well presented. The conclusions are reasonable, and the data interesting. I am a cell biologist with expertise in the molecular machinery of trafficking, so the focus of my review will be on the identification of a new complex that is proposed to have a role in retrograde trafficking. On the whole I find this an interesting and convincing finding.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The strength of this manuscript lies in the behavior: mice use a continuous auditory background (pink vs brown noise) to set a rule for interpreting an identical single-whisker deflection (lick in W+ and withhold in W− contexts) while always licking to a brief 10 kHz tone. Behaviorally, animals acquire the rule and switch rapidly at block transitions, taking a few trials to fully integrate the context cue. What's nice about this behavior is the separate auditory cue, which shows the animals remain engaged in the task, so it's not just that the mice check out (i.e., become disengaged in the W− context). The authors then use optical tools, combining cortex-wide optogenetic inactivation (using localized inhibition in a grid-like fashion) with widefield calcium imaging to map what regions are necessary for the task and what the local and global dynamics are. Classic whisker sensorimotor nodes (wS1/wS2/wM/ALM) behave as expected, with silencing reducing whisker-evoked licking. Retrosplenial cortex (RSC) emerges as a somewhat unexpected, context-specific node: silencing RSC (and tjS1) increases licking selectively in W−, arguing that these regions contribute to applying the "don't lick" policy in that context. I say somewhat because work from the Delamater group points to this possibility, albeit in a Pavlovian conditioning task and without neural data. I would still recommend the authors of the current manuscript review that work to see whether there is a relevant framework or concept (Castiello, Zhang, Delamater, 'The retrosplenial cortex as a possible 'sensory integration' area: a neural network modeling approach of the differential outcomes effect of negative patterning', 2021, Neurobiology of Learning and Memory).

      The widefield imaging shows that RSC is the earliest dorsal cortical area to show W+ vs W− divergence after the whisker stimulus, preceding whisker motor cortex, consistent with RSC injecting context into the sensorimotor flow. A "Context Off" control (continuous white noise; same block structure) impairs context discrimination, indicating the continuous background is actually used to set the rule (an important addition!). Pre-stimulus functional-connectivity analyses suggest that there is some activity correlation that maps to the context, presumably due to the continuous background auditory context. Simultaneous opto+imaging projects perturbations into a low-dimensional subspace that separates lick vs no-lick trajectories in an interpretable way.

      In my view, this is a clear, rigorous systems-level study that identifies an important role for RSC in context-dependent sensorimotor transformation, thereby expanding RSC's involvement beyond navigation/memory into active sensing and action selection. The behavioral paradigm is thoughtfully designed, the claims related to the imaging are well defended, and the causal mapping is strong. I have a few suggestions for clarity that may require a bit of data analysis. I also outline one key limitation that should be discussed, but is likely beyond the scope of this manuscript.

      Major strengths

      (1) The task is a major strength. It asks the animal to generate differential motor output to the same sensory stimulus, does so in a block-based manner, and the Context-Off condition convincingly shows that the continuous contextual cue is necessary. The auditory tone control ensures this is more than a 'motivational' context but is decision-related. In fact, the slightly higher bias to lick on the catch trials in the W+ context is further evidence for this.

      (2) The dorsal-cortex optogenetic grid avoids a 'look-where-we-expect' approach and lets RSC fall out as a key node. The authors then follow this up with pharmacology and latency analyses to rule out simple motor confounds. Overall, this is rigorous and thoughtfully done.

      (3) While the mesoscale imaging doesn't allow for cellular resolution, it allows for mapping of the flow of information. It places RSC early in the context-specific divergence after whisker onset, a valuable piece that complements prior work.

      (4) The baseline (pre-stim) functional connectivity and the opto-perturbation projections into a task subspace increase the significance of the work by moving beyond local correlates.

      Key limitation

      The current optogenetic window begins ~10 ms before the sensory cue and extends 1 s after, which is ideal for perturbing within-trial dynamics but cannot isolate whether RSC is required to maintain the context-specific rule during the baseline. Because context is continuously available, it makes me wonder whether RSC is the locus maintaining or, instead, gating the context signal. The paper's results are fully consistent with that possibility, but causality in the pre-stimulus window remains an open question. (As a pointer for future work, pre-stimulus-only inactivation, silencing around block switches, or context-omission probe trials (e.g., removing the background noise unexpectedly within a W+ or W− context block) could help separate 'holding' from 'gating' of the rule. I'm not suggesting these are needed for this manuscript, but they would be interesting for future studies.)

      We thank the reviewer for the comprehensive summary of our work.

      We also thank the reviewer for highlighting the work from the Delamater group (Castiello et al., 2021), and we now briefly discuss this paper on P. 14 Lines 434-437 writing: “RSC was shown to contribute to negative patterning in behavioral tasks requiring rats to learn that the simultaneous presentation of two stimuli lead to an opposite outcome than each individual stimulus (Castiello et al., 2021).”

      We also agree with the reviewer’s noted ‘Key limitation’ regarding the role of RSC as either maintaining context representation or serving a gating function. The reviewer proposes an exciting set of further experiments inactivating RSC at different time points to investigate when RSC activity is needed. We hope to carry out such experiments in the future. We now include a brief discussion of this interesting point on P. 14-15 Lines 455-459 writing: “First, further inactivation experiments would shed light on the timing at which RSC activity is necessary for the integration of contextual information. Specifically, it would be of great interest to inactivate RSC at different time points such as during the intertrial interval or at the transition between contexts.”

      We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section; please see below.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to understand the neural basis of context-dependent sensory processing and decision-making.

      Strengths:

      They used an innovative behavioral paradigm where the action-outcome association changes independent of the sensory stimulus. This theoretically allows the authors to disentangle the effect of behavioral context on sensory processing. Using this approach combined with optogenetic silencing, they discover that RSC activity is necessary for suppressing a lick response when the stimulus switches to the unrewarded context.

      Weaknesses:

      Sensory processing appears to be entangled with jaw/tongue movement initiation. Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information. If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate. It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.

      We thank the reviewer for the comments on our work and we agree that separating sensory processing and movement initiation is very important. In the revised manuscript, we have carried out several new analyses to specifically address the points of the reviewer. The most important point is that context-dependent activity in RSC emerges at ~50 ms after the whisker stimulus, which precedes any differences in movements of the jaw or whisker. Although sensory and motor representations become increasingly entangled after stimulus delivery, we think that the first ~100 ms after the whisker stimulus is a relatively safe period for analysing sensory processing and decision making before overt context-dependent differences in movements.

      Addressing the specific point “Activity in M1 and RSC during auditory-evoked lick responses appears to be identical to activity during whisker-evoked lick responses, indicating that movement initiation is the main driver of M1/RSC activity, rather than changes in the flow of sensory information.” - We have now directly compared the pattern of cortical activity evoked by whisker and auditory stimuli in correct trials in the W+ context (new Figure 3 – figure supplement 2). As expected, activity in wS1/wS2 and A1 is stronger in whisker and auditory trials respectively, following their sensory modalities. However, we also find a stronger response of wM1/wM2 in whisker trials as early as 40 to 60 ms following the stimulus, showing the specificity to the whisker system. We also observe a stronger response of RSC to whisker than to auditory stimulus. The auditory and whisker evoked responses are therefore different.

      Addressing the specific point “If sensory information were the main driver of the initial M1/RSC response, then auditory evoked responses should have a longer latency. Perhaps this is beyond the resolution of the calcium indicator or imaging frame rate.” – As stated above, the responses to auditory and whisker stimuli are different.

      Addressing the specific point “It is not clear from the data shown if differences in S1 activity when comparing W+ and W- stimulation are caused by context-sensitive sensory processing or whisker movement following whisker deflection.” - We think that the data shown in Figure 3F-H indicate that differences in S1 activity when comparing W+ and W- stimulation are not directly caused by context-sensitive sensory processing. On P. 9 Lines 270-273 we write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” Indeed, context separation in wS1/wS2 only emerged later than 100 ms, which is indeed confounded by the difference in movement evoked by the sensory stimulus (now quantified in new Figure 3 – figure supplement 4). On the contrary, RSC and wM1/2 responses to the whisker stimulus were different in W+ and W- at early time points (~50 ms for RSC and ~80 ms for wM1/2), which is consistent with context-dependent sensory processing. At least two hypotheses could explain the absence of an early difference in whisker-evoked activity in wS1/wS2 between W+ and W-. The first is that sensory activity in wS1/wS2 is not modulated by contextual information at all, while the alternative would imply that sensory activity is mediated by different neuronal populations depending on context, with an overall similar average response. We think this is an interesting question, which we hope to address in future experiments using Neuropixels recordings and multiphoton cellular imaging to examine the single-neuron representation of the whisker stimulus in wS1/wS2 according to context in the task presented here.

      We have of course also addressed each of the more detailed comments from the “Recommendations for the authors” section, please see below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions to strengthen the manuscript (no new data collection)

      (1) The block-switch dynamics were clearly demonstrated behaviorally. It would be very powerful to mirror this with an analysis of neural data around the block-switch: how do the various areas adjust immediately after a shift in the continuous contextual sound? Does the RSC show any evidence of changing activity patterns? How does the within-trial activity dynamic look as a function of the number of trials from the context switch? This could be done with the data collected for Figure 3 (for within-trial dynamics), but also for the pre-stimulus baseline activity data (Figure 4A-B).

      We thank the reviewer for raising this interesting point. We have now investigated the change of cortical activity at the transition between contexts (new Figure 3 – figure supplement 5). At the context transition, both to W+ and to W- contexts, we observed a rapid activation of the auditory cortex (new Figure 3 – figure supplement 5A). In addition, there appeared to be a slightly higher activation of RSC when transitioning to W- rather than to W+ (new Figure 3 – figure supplement 5A). In the future, it will be of great interest to further investigate this phenomenon.

      We also evaluated the whisker deflection-evoked responses of the different cortical regions according to the number of whisker trials from context switch (new Figure 3 – figure supplement 5B&C). This analysis revealed that while the sensory responses in wS1 and wS2 were constant over the time course of a context block, the response of wM1/2 and especially RSC became progressively lower in the W- context, consistent with the behavioral results in Figure 1 supporting time-dependent contextual integration.

      Overall, these results strengthen the role of RSC and wM1/2 in integrating contextual information to guide the response to the whisker stimulus, and we thank the reviewer for raising this important point.

      (2) It might be useful to state 'earliest among the imaged dorsal cortical areas,' and briefly acknowledge potential subcortical contributors (since those were not explored and could be earlier than cortical areas).

      We agree with the reviewer. In the Summary, on P. 2 Line 39-40 we now write: “Widefield calcium imaging revealed that retrosplenial cortex was the first dorsal cortical area to show context discrimination in response to whisker stimulation”. On P. 8 Lines 257-258, we now write: “To investigate the spatiotemporal neural dynamics underlying task execution, we recorded calcium activity across the dorsal cortex in transgenic mice”. On P. 13 Lines 416-420 we now write: “Functional imaging of cortical activity with two different genetically-encoded calcium indicators each showed similar spatiotemporal dynamics of whisker sensory processing with the earliest context-dependent divergence in signalling being detected in RSC, out of the imaged dorsal cortical areas (Figure 3).” On P. 15 Lines 470-473, we now write: “Finally, it is of course important to note that many subcortical regions (as well as non-dorsal cortical regions, which were not imaged) are likely to contribute importantly to context-dependent task performance.”

      (3) Fit a simple exponential/logistic to lick probability vs time-since-switch (your Figure 1H-style analysis) to report a time constant with CIs; it will help quantify the integration of the continuous cue.

      We thank the reviewer for this suggestion. We have fitted an exponential to the grand average data to quantify the time constants for integration of contextual information before the presentation of the first whisker stimulus of the block (see new Figure 1H). On P. 6 Lines 170-173 we now write: “To assess whether this temporal integration would differ between contexts we fitted an exponential to the time evolution of the lick probability. This suggested a faster transition to the W+ context than to the W- context (W+ time constant: 9.4 s, W- time constant: 15.5 s) (Figure 1H).”
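
      A minimal sketch of this kind of fit, assuming a single-exponential relaxation of the form p(t) = offset + amplitude · exp(−t/τ). The function name, the grid search over τ, and the synthetic lick probabilities are illustrative assumptions and are not the analysis code behind Figure 1H.

```python
import math

def fit_exponential(times, probs, taus):
    """Fit p(t) = offset + amplitude * exp(-t / tau) (assumed model form).

    tau is chosen by grid search; for each candidate tau the offset and
    amplitude follow from ordinary linear least squares.
    """
    n = len(times)
    mean_p = sum(probs) / n
    best = None
    for tau in taus:
        x = [math.exp(-t / tau) for t in times]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxp = sum((xi - mean_x) * (p - mean_p) for xi, p in zip(x, probs))
        amplitude = sxp / sxx
        offset = mean_p - amplitude * mean_x
        sse = sum((offset + amplitude * xi - p) ** 2 for xi, p in zip(x, probs))
        if best is None or sse < best[0]:
            best = (sse, tau, offset, amplitude)
    _, tau, offset, amplitude = best
    return tau, offset, amplitude

# Synthetic, noise-free data (hypothetical): lick probability rising
# from 0.2 to 0.8 with a time constant of 9.0 s after a context switch.
times = [0, 2, 4, 6, 8, 10, 15, 20, 30, 40]
probs = [0.8 - 0.6 * math.exp(-t / 9.0) for t in times]
tau_grid = [0.5 * k for k in range(2, 81)]  # candidate taus, 1.0 to 40.0 s
tau, offset, amplitude = fit_exponential(times, probs, tau_grid)
```

      With real data, confidence intervals on τ could plausibly be obtained by repeating the fit on bootstrap resamples, although that step is not shown here.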

      (4) Because catch-trial false alarms are higher in W+ than W−, report per-context d′ and criterion for whisker trials (using signal detection theory); this separates sensitivity from bias and makes the behavioral shift more interpretable. It is also further proof that the behavior is contextual (versus a compound stimulus, for example).

      We have computed the d’ and criterion for the whisker trials in the W- and W+ contexts (see new Figure 1 – figure supplement 1D). As suggested by the reviewer, this further supports that the behavior is driven by contextual information.
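
      For reference, the standard signal detection theory definitions can be sketched as follows, assuming the hit rate is the lick probability in whisker trials and the false-alarm rate is the lick probability in catch trials within the same context. The function name and the example rates are hypothetical.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', criterion) from hit and false-alarm rates.

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa             # sensitivity
    criterion = -0.5 * (z_hit + z_fa)  # response bias
    return d_prime, criterion

# Hypothetical rates for one context
d_prime, criterion = dprime_and_criterion(0.8, 0.2)
```

      A higher catch-trial lick rate in one context lowers the criterion there (a more liberal bias) without necessarily changing d’, which is why the two measures are reported separately.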

      (5) For the pre-stimulus seed-correlation analysis, can you regress out the pupil/jaw/whisker activity to confirm whether the context modulation is (or is not) movement-driven? It would be helpful to better understand whether the baseline correlation is driven by differences in low-level factors between the contexts, versus the higher-level decision rule/context.

      The reviewer raises an interesting point. However, we did not find a straightforward way to regress out movements, and thus we leave this point for future in-depth analysis. On P. 11 Lines 354-357 we now write: “It is important to note that these context-dependent changes in resting-state functional connectivity could relate to the overt context-dependent movements in the prestimulus baseline (Figure 1I&J) and/or a manifestation of higher-level internal rule representations.”

      (6) For the earliest divergence analysis, is this consistent across animals and across sessions within animals? Can you show per-mouse distributions of first-crossing times (d′>2) for RSC vs wM1/2/wS2? This would help provide confidence in this key finding.

      The d’ presented in Figure 3H is computed as the discriminability between contexts at the population level, meaning that at each timepoint (from Figure 3F) we compared the 2 distributions built on N=6 mice. As such, if the divergence between contexts were not consistent across animals, this d’ would be low. That said, as suggested by the reviewer, we further investigated this context divergence at single mouse level and single session level. Our analysis supporting the main finding (Figure 3F-H) is shown in new Figure 3 – figure supplement 3.

      First, we show the results for a single mouse across sessions in Figure 3 – figure supplement 3A. We show the stimulus aligned activity in correct whisker trials in both contexts for the 3 recording sessions. For each session we quantified the main effect size defined as the difference of the trial average between contexts. Plotting the difference of mean response, we consistently observed that RSC ramps up before wM1/2 for the 3 sessions.

      Second, across all individual mice: we further aggregated the session average responses to show discriminability between contexts for each region at the single mouse level (Figure 3 – figure supplement 3B). We show that RSC is the first region to exhibit context separation in 4 out of the 6 mice that we recorded. In the 2 other mice all regions seemed to show context separation but without clear temporal ordering.

      Finally, when averaging across mice, we observed a clear separation and first discrimination in RSC (Figure 3F-H and Figure 3 – figure supplement 3C).

      Overall, these further analyses suggest that the early divergence of RSC activity appears to be robust with a consistent mean difference in single sessions and single mice, as well as across the population of mice. We think this analysis has strengthened our manuscript and we thank the reviewer for the valuable suggestion.
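
      The population-level d’ and the first-crossing time discussed in this response can be illustrated with a short sketch. It assumes d’ is the absolute difference of means divided by the pooled standard deviation, and it uses the d’ > 2 threshold mentioned by the reviewer; the exact formula used for Figure 3H is not stated here, and the per-mouse values below are made up.

```python
from statistics import mean, variance

def discriminability(a, b):
    """d' between two samples: |mean difference| / pooled SD (assumed form)."""
    pooled_sd = ((variance(a) + variance(b)) / 2) ** 0.5
    return abs(mean(a) - mean(b)) / pooled_sd

def first_crossing(times, samples_a, samples_b, threshold=2.0):
    """Return the first time at which d' exceeds the threshold, else None."""
    for t, a, b in zip(times, samples_a, samples_b):
        if discriminability(a, b) > threshold:
            return t
    return None

# Hypothetical responses of one region for N = 6 mice at three time points
times_ms = [20, 50, 80]
w_plus = [[0.10, 0.12, 0.09, 0.11, 0.10, 0.13],
          [0.30, 0.34, 0.28, 0.33, 0.31, 0.35],
          [0.50, 0.55, 0.48, 0.53, 0.51, 0.56]]
w_minus = [[0.10, 0.11, 0.10, 0.12, 0.09, 0.12],
           [0.20, 0.22, 0.19, 0.23, 0.21, 0.24],
           [0.25, 0.27, 0.24, 0.28, 0.26, 0.29]]
crossing = first_crossing(times_ms, w_plus, w_minus)
```

      Because the pooled standard deviation is computed across mice, inconsistent divergence between animals inflates the denominator and keeps d’ low, which matches the reasoning given above.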

      (7) For the opto mapping data, could you provide P(lick) effect sizes with CIs per grid site? It would also be nice to summarize the qualitative dichotomy: RSC/tjS1 increases licking in W−; canonical wS1/wS2/wM/ALM decreases licking across contexts (to my understanding).

      We now provide the P(lick) effect sizes for the main cortical areas studied in the paper in Figure 2 – figure supplement 1C. This shows the relative change in lick probability in optogenetic trials compared to control trials for each mouse.

      Reviewer #2 (Recommendations for the authors):

      (1) Do mice move their whiskers after stimulus onset? If so, are these movements dependent on behavioral context? What causes the increase in S1 activity during auditory-evoked response trials?

      To answer the reviewer’s questions we have further investigated whisker movements following the sensory stimuli (whisker and auditory correct trials) in both contexts. The results of this analysis are presented in new Figure 3 – figure supplement 4.

      We find that mice move their whiskers shortly after the whisker stimulus in both contexts. The time course of whisker angle in correct whisker trials is similar in both contexts with a discriminability index (d’) consistently below 1. The whisker speed in response to stimulus is slightly higher in the W+ context compared to W- with a d’ slightly above 1 after ~100 ms. We also observed evoked whisker movements in auditory trials independent of context. Thus, whisker movements are indeed evoked by the sensory stimuli, but the overall context-dependent modulation of whisker movements is weak. The early differences in whisker-evoked cortical activity in W+ compared to W- contexts are therefore more likely related to the integration of contextual information than to differences in evoked movements.

      The reviewer is correct to point out that wS1 activity increases in auditory trials (Figure 3E). The response is initially very weak, but becomes more prominent after ~100 ms following the auditory tone. We do not know the underlying mechanisms, but there are several likely explanations. First, as discussed above, there are indeed some whisker movements evoked in response to the auditory stimulus (Figure 3 – figure supplement 4), which could result in sensory input to wS1. Equally, the increase could relate to licking, given the broad representation of movements in cortex and an appropriate reaction time in auditory trials (Figure 3C). Alternatively, wS1 activity in auditory trials could also be related to input connectivity from auditory cortex, top-down input from frontal cortex or subcortical regions such as high-order POm.

      (2) What do the authors think is causing the W+ vs W- difference in S1/S2 activity approximately 100ms after whisker deflection?

      The late W+ vs W- difference in wS1/wS2 activity could be explained by several factors. First this could be due to the difference in whisker movements after ~100 ms as shown in Figure 3 – figure supplement 4. Second this could be driven by the lick vs no lick activity (see reaction time in Figure 3C for whisker trials ~110 ms). Finally, this could be partly due to some movement independent top-down contextual information reaching wS1/wS2 at late time points. Overall, our claim in the paper is that there was no contextual difference in whisker primary and secondary cortices at early time points (before movement). On P. 9 Lines 270-273 we explicitly write: “Early after stimulus onset, whisker deflection evoked similar activation of primary and secondary whisker somatosensory cortices (wS1 and wS2) in both W+ and W− contexts.” In contrast, our main findings are grounded in the divergence of cortical activity in RSC and wM1/2 at early time points (<100 ms).

      (3) The choice of PC3 seems arbitrary. Is there no task-relevant information in PC1 and PC2?

      We appreciate the point raised by the reviewer and have clarified the reasoning leading to PC3 selection in the main text, where on P. 12-13 Lines 384-391 we now write: “The loadings of the first principal components were uniformly distributed and could reflect a late movement driven activation distributed across all cortical areas (Figure 4 – figure supplement 2C&D). PC2 loadings show variation along the anteroposterior axis that could reflect differences between sensory and motor regions but its time course does not separate between lick and no lick in control conditions (Figure 4 – figure supplement 2C&D). The loadings of PC3 highlighted task-related cortical regions and its time course exhibited clear differences comparing lick and no-lick trials.” In addition, we now also show the time courses for PC1 and PC2 in Figure 4 – figure supplement 2D.

      Overall, the reasoning is the following:

      PC1 has spatially-homogeneous positive loadings (Figure 4 – figure supplement 2C) and activity along PC1 gradually ramps up following sensory stimulation (Figure 4 – figure supplement 2D). It is likely driven by widespread activation of the cortex following the whisker stimulus and the lick response. As such we believe that the task-related information captured by PC1 is movement related and not necessarily informative about processing of whisker and context.

      PC2 has loadings varying along the antero-posterior axis (Figure 4 – figure supplement 2C), which could be relevant for the task, but its time course does not discriminate between lick and no lick in either W+ or W- (Figure 4 – figure supplement 2D).

      PC3 has both loadings that vary between several cortical regions involved in the task (Figure 4 – figure supplement 2C) and a time course that separates between lick and no lick in both contexts (Figure 4 – figure supplement 2D). We thus focus on PC3 to investigate the effect of optogenetic inactivation on whisker stimulus evoked activity.

      The remaining components beyond PC3 contain a very small fraction of variance and were thus not considered.

      (4) Figure 3 - Supplement 1: What explains the change in fluorescence in GFP/tdT mice during W+ stimulation? Is it brain movement on the z-dimension? Could this explain differences in calcium imaging results?

      We thank the reviewer for this question. The nature of intrinsic signals is a complex topic, but brain movement is unlikely to contribute importantly, because under similar behavioral conditions we (and others) typically find brain movements to be on the scale of a few microns. The three most widely-reported contributions to intrinsic optical changes in cortex relate to:

      (i) Light scattering – as neurons integrate synaptic inputs and fire action potentials, the neuronal elements swell slightly due to the ionic and water fluxes (see for example Vincis et al. Cell Reports 2015, doi: 10.1016/j.celrep.2015.06.016). This reduces the refractive index mismatch between the intracellular and extracellular space. This in turn reduces light scattering, which could result in fluorescence increases.

      (ii) Hemodynamics – changes in blood volume and changes in oxygenation/deoxygenation will change the absorption of light at different wavelengths, in an activity-dependent manner (also forming the basis of BOLD fMRI signals).

      (iii) Flavoproteins – endogenous fluorescent proteins, such as flavoproteins present at high levels in mitochondria, have been reported to change their fluorescence depending upon neuronal activity, presumably in relationship to increased mitochondrial activity.

      We therefore think it is very important to image GFP/tdTomato-expressing mice as controls, and we would suggest that this should be carried out more commonly in the field. Indeed, similar to our results, another study (Yogesh et al., eLife 2025, doi: 10.7554/eLife.104914) recently reported upon the importance of carefully examining intrinsic fluorescence changes, which were found to be present in both wide-field and two-photon imaging of GFP expressing mice.

      Our results reported in Figure 3 – figure supplement 1 show that GFP/tdTomato signals over the first ~120 ms following whisker stimulation were much smaller than the equivalent changes in GCaMP6f/jRGECO1a-expressing mice, and therefore would only have a minor contribution to our analyses. However, we refrained from analysing fluorescence changes at later post-stimulus times, because the intrinsic signals indeed become increasingly prominent as the mice initiate licking.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General note

      We have issued a new release of the general Peekbank database, 2026.1, which includes more data integrity checks and several more datasets. As a result of this release, the underlying dataset we use in our paper has shifted slightly. The shifts represent a relatively small proportion of the total data and thus these changes have caused only relatively minor changes to our numerical results. We also highlight that we now include a small amount of data regarding children younger than 12 months, increasing the developmental range of our analysis (see Figure 1).

      Reviewer 1 (Public review):

      The limitations of the study are acknowledged to some extent, but need to be improved and ensured that they run throughout the manuscript. Thus, in the discussion, the authors note that the approach is observational and exploratory, and highlight for me a key alternative explanation of the findings, namely that faster children could be faster due to their larger vocabulary, rather than faster children learning more words. Indeed, the latter explanation for the relationship is called into question, given that growth in speed was not related to growth in vocabulary. Here, the authors note that the null result may be related to the fact that they do not have sufficiently precise estimates of growth slopes, rather than taking the alternative explanation seriously that there may not be as causal a link between being a faster word learner and a better word learner (learn more words).

      Thank you very much for your challenging and thoughtful comments. In hindsight we did not realize that the way we were writing about our results was ambiguous between several interpretations (one of which we endorse and one of which we do not).

      We respond below to the specific suggestions about causal directionality in the longitudinal analysis, but we certainly believe that we cannot draw strong conclusions about causality from our dataset and have attempted throughout the paper to remove causal language that might have crept into our interpretation.

      In response to your comments, we have made a number of key revisions aimed at qualifying and clarifying our points:

      • The abstract now prominently notes that our design is observational: “In an observational study…”

      • The abstract notes a positive and a negative result in the relationship between word recognition and vocabulary: “Further, across a range of longitudinal models, speed, accuracy, and vocabulary were coupled. Children with overall faster word recognition tended to show faster vocabulary growth, though developmental growth in word recognition skill was not specifically associated with growth in vocabulary.”

      • The abstract removes potential causal language in the final sentence: “... these findings support the view that word recognition is a skill that develops gradually across early childhood and that this skill is deeply intertwined with early language learning.”

      • A new paragraph in the Results introduces the potential hypotheses investigated via the longitudinal models.

      • The final paragraph of the Results section sharpens the contrast between two possible growth hypotheses: “However, we did not find evidence for the stronger version of this claim: in neither the non-linear growth model nor the linear SEM did we find evidence that increases in speed were related to increases in vocabulary size. Thus, our findings do not support a ‘virtuous cycle’ model in which increases in recognition specifically lead to increases in vocabulary size.”

      We hope these changes lead to a manuscript that better aligns with the limitations of the study.

      This is especially since, but correct me if I’m wrong here, the current vocabulary size is not taken into consideration in the model examining vocabulary growth. Given the increasing number of studies showing that current vocabulary knowledge predicts vocabulary growth (Laing, Kalinowski et al, Siew & Vitevitch), one simple alternative explanation is that current vocabulary knowledge predicts both current word recognition skill and later vocabulary knowledge. Is there anything in the data speaking against this hypothesis?

      We think the reviewer’s overall point is generally correct, as we described above, but we want to clarify a specific statistical point. The non-linear longitudinal model of vocabulary growth does in fact take into account a child’s average vocabulary size. (This point feels tricky in a non-linear model but it’s actually quite similar to a linear model for the purposes of this discussion). Basically, vocabulary (at all timepoints) is modeled as a function of age, with both main effects and interactions with age. Critically, each participant is also modeled as having a random intercept capturing their deviation from the average growth pattern across ages (as expressed by the fixed effects). In this model, the “main effect” (here captured by the intercept for the logistic curve in the model) that we observe for speed indicates that vocabulary growth for individuals is predicted to be faster (their curve is shifted left) if their RTs are fast. The presence of the random effects in this model thus “controls” for the fact that some participants have overall higher vocabularies (and are shifted up relative to the average growth curve).

      But, we note that this model does not show an “interaction effect” (here captured by the null effect of RT on the slope parameter in the logistic model). That’s one of the null effects that we now call out much more prominently in the abstract and end of the results (per our response above).
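
      To make the distinction between the two effects concrete, one generic way to write a logistic growth curve with participant-level terms is sketched below. This parameterisation, and every symbol in it, is an illustrative assumption and not necessarily the specification fitted in the paper.

```latex
\[
\mathrm{vocab}_i(t) = \frac{A}{1 + \exp\!\bigl(-k_i\,(t - t_{0,i})\bigr)},
\qquad
t_{0,i} = \beta_0 + \beta_1 \log \mathrm{RT}_i + u_i,
\qquad
k_i = \gamma_0 + \gamma_1 \log \mathrm{RT}_i
\]
```

      Under this sketch, t is age, A is the asymptote, and u_i is a participant-level random effect. The “main effect” of speed corresponds to a nonzero β₁ (faster children have a curve shifted left), whereas the null “interaction effect” corresponds to γ₁ being indistinguishable from zero (speed does not change the steepness of growth).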

      Equally, while the SEM examines vocabulary growth controlling for age, I wonder about the other way around. What would happen to the effect of age on word recognition skill (in the LME model, S8) if one were to add concurrent vocabulary size? So does chronological age explain word recognition skill or vocabulary knowledge? Right now, the manuscript describes this effect purely related to chronological age, but is it age per se or other cognitive abilities, including a key change across development, namely, vocabulary size? Thus, the presentation of the skill learning hypothesis suggests that age is a proxy for experience, while you actually have here a very nice proxy for experience in terms of children’s vocabulary size.

      Again, thank you for engaging with this tricky set of issues. Overall, our goal is to adjust the manuscript to reflect points of agreement; in particular, we agree that age is a proxy for language experience, vocabulary, and other cognitive changes, and we have stated this explicitly now in the intro to the factor analyses: “In our prior analyses, chronological age acts as a proxy for greater language experience and larger vocabulary as well as a host of other correlated developmental changes in cognition. Now we explicitly explore relations to vocabulary growth and the triadic relationship between age, word recognition, and vocabulary.”

      On the statistical side, we do think that the NLME (non-linear mixed effects; the logistic growth model) effectively controls for average vocabulary size, as described above. The longitudinal SEM also relates vocabulary growth to growth in word recognition skill. In both models, we find no evidence for coupled growth; instead the evidence points to children with higher baseline word recognition skill showing faster growth in vocabulary (speed intercept significantly related to vocabulary slope, -.14, p < .01) but not the reverse (vocabulary intercept not strongly related to speed slope; -.01, ns).

      More generally, we hope our edits to the paper, detailed above, both clarify this tricky set of issues and also remove inappropriate causal language throughout.

      Critically, while the discussion is more nuanced, the way the abstract is concluded and the way the Introduction is phrased suggest that the study is able to answer a causal question, which, as the authors themselves note, is not possible. The abstract, for instance, states that word recognition becomes faster, more accurate and less variable...consistent with a process of skill learning. And also that this skill plays a role in supporting early language learning, which is very causal language. I don’t think you can really claim that you are testing the two hypotheses you suggest here. The work is definitely embedded in the context of these hypotheses, but are you really able to test them? My worry is that while the discussion is more nuanced, the extent to which this study will then be cited down the line as showing that children learn more words down the line because they are faster at recognizing words, and anything that you can do to temper such interpretations would be good for the literature. For me, this should not just be relegated to the discussion but should be touched upon in the abstract and Introduction.

      Thanks for pushing us to be more precise with how we frame and describe our findings. We agree with the reviewer that our findings do not warrant strong conclusions about the causal role of word recognition skill in vocabulary growth. Per our response above, we have now tried to carefully revise our language throughout the paper (in particular, in the abstract and introduction, as noted by the reviewer).

      Finally, it would help to talk more about the mechanisms at work in any relationship between word recognition and language learning. It seems to me that this would rely on some predictive processing framework, given the description on page 4, and it would be good to make this clear (faster and more accurately you can recognize a ball, better use this evidence to infer the speaker’s intended meaning).

      Thanks, this is a great point. We’ve revised this text and added references to predictive processing, unpacking a problematic paragraph into two:

      “Familiar word recognition -- as measured by LWL -- is hypothesized to play a key role in language learning (19). The idea, in a nutshell, is that the faster and more accurately a child can process incoming words, the more opportunities they have for learning. Consider a child hearing the utterance "Can you put the ball in the crate?" The better the child can recognize the word "ball", the better they can use this evidence to help infer the speaker's intended meaning, allowing possible inferences about the meaning of the less familiar word, "crate" (20).

      “Real time language processing, including word recognition, relies heavily on predictive processing, in which comprehenders integrate expectations from prior linguistic context with noisy and ephemeral incoming signals (21, 22). The more input a child receives, the better their predictions are likely to be, and hence the more they can learn (19, 23). Indeed, measurements of children's language input at home are consistently associated with their vocabulary size (24, 25). And, in line with this predictive processing framework, one important study found that children's word recognition speed mediated the longitudinal relationship between home language input and vocabulary growth (26). Thus, word recognition is thought to be a key support for ongoing word learning.”

      Equally, when referring to word recognition, it would be good to clarify what this refers to - how well a child knows what a word refers to (and in the context of LWL, what it does not refer to) or how quickly it directs attention to what is referred to.

      Thanks, we’ve added a capsule definition in the second paragraph, and added the sentence “This procedure [LWL] measures the general construct of word recognition by operationalizing knowledge of a meaning as visual attention to a specific named referent.” We hope this clarifies the relationship between LWL and word recognition.

      With regards to the data, I wonder if there is a clustering of kids past 24 months that is happening here, looking at Figures 1 and 2, where it seems like there is less change past the 24-month point. Is there any way to look at whether the effect of age or vocabulary on word recognition is not linear but asymptotic?

      Thanks for pointing this out; we do see what you are talking about but think it’s being handled appropriately in the analysis. In Figure 1 it clearly looks like changes to RT are asymptotic – this is why we analyze the logarithm of RT throughout the paper. In Supplement S6 we show that reaction time is indeed best fit by a log-log function. Your question about Figure 2 asks whether there is further structure beyond the log-log fit; in Supplement S7 we show some analyses that suggest a polynomial fit is not better than the log-log fit; there is some small additional linear effect of age over and above the log-log fit, but it’s minor and pretty hard to interpret in our view.
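
      To illustrate what a log-log fit means here, the sketch below regresses log(RT) on log(age) by ordinary least squares. It is a minimal illustration, not the analysis in Supplement S6: the function name `fit_log_log` and the data are hypothetical, and the reaction times are synthetic and noise-free.

```python
import math

def fit_log_log(ages, rts):
    """Least-squares fit of log(rt) = intercept + slope * log(age)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(r) for r in rts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic, noise-free values for illustration only (NOT Peekbank data):
# RT = 3000 * age^-0.5 flattens at older ages on the raw scale,
# but is a straight line once both axes are log-transformed.
ages_months = [12.0, 18.0, 24.0, 36.0, 48.0]
rts_ms = [3000.0 * a ** -0.5 for a in ages_months]
slope, intercept = fit_log_log(ages_months, rts_ms)
print(slope, math.exp(intercept))
```

      A curve that looks asymptotic on the raw scale can therefore be fully captured by a linear model on the log-log scale, which is why apparent flattening after 24 months does not by itself imply extra structure.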

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Page 3. Word production may manifest in overt behaviour but need not reflect complete knowledge. A child can say the word dog and use it to refer to a cat.

      This is a good point. Since we are not able to speak to the precision of meaning representations (an important issue in its own right), we have omitted the phrase “with incomplete knowledge.”

      Page 4. The first two sentences of the paragraph beginning with word recognition ability... don’t go together. The second sentence does not support the claim that word recognition plays a role in language learning.

      Thanks, we’ve tried to smooth out this transition as part of unpacking the role of predictive processes.

      Page 4. “predicts children’s standardized test scores years later” - make clear what test scores are here.

      We added some additional details. The specific tests were the CELF (expressive language) and the KABC (IQ), but we thought too much detail might be distracting.

      Page 5. I love Table 1, but would like for the data to be weighted somehow. So, given that some studies had a lot more trials and more children, what percentage of the data did this study contribute? That allows a clearer view of how biased the sample is in certain studies. The x in CDIS and longitudinal could be aligned to the right. I kept wondering why there was an x near some trials.

      Thanks, we’ve adjusted the table to add the percentage of the total dataset (in trials) due to each study and fixed the alignment issue.

      Page 6. 12 million individual samples: what samples are these? Individual data points per trial per time point. Making this clear would be great.

      Clarified, thanks.

      Page 9. Your accuracy measures only seem to consider the target. From what I remember of my preferential looking days, this measure usually also includes the distractor. Why do you not do this? This is especially since you have such a wide age range, so if a 12-month-old only looks for about 50 per cent of the trial and spends that time looking at the target, that is very different from a child who looks at the screen all of the trial and spends less time looking at the target here.

      Sorry for any lack of clarity: we do in fact compute accuracy as the ratio of looking to target over looking to target plus looking to distractor. We have added this information to the parenthetical referenced above: “... accuracy (more target looking; computed as the ratio of target to target plus distractor looking)”.
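
      The ratio described above can be sketched as follows. This is a hedged illustration only: the function name `looking_accuracy` and the looking times are hypothetical and are not taken from the Peekbank pipeline.

```python
def looking_accuracy(target_ms: float, distractor_ms: float) -> float:
    """Proportion of on-image looking time that falls on the target."""
    total = target_ms + distractor_ms
    if total == 0:
        raise ValueError("no looking at either image in this window")
    return target_ms / total

# Hypothetical trials in a 2000 ms window:
# one child looks for only half the window, always at the target;
# another looks for the whole window but splits attention.
brief_but_on_target = looking_accuracy(1000, 0)
full_but_split = looking_accuracy(1200, 800)
print(brief_but_on_target, full_but_split)
```

      Because time spent looking away from both images is excluded from the denominator, a child who attends only briefly is not penalized relative to one who attends throughout.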

      Page 12. I only found out that age was in this model by looking at S9.

      Thanks for mentioning this omission, we’ve clarified in the text: “We initially add age as an additional variable to our models to explore whether this factor structure relates to age; later we treat age as a predictor of latent factors.”

      Page 12. Isn’t it trivial that speed and accuracy show negative covariance, especially given how you measure accuracy? Thus, if I take longer to fixate the target, I have less time to look at the target during the trial. If, however, I included the distractor in my accuracy measure, then I could still take longer to look at the target, but still look more at the target than the distractor.

      Thanks for mentioning that this covariance is not the key result of interest; that observation didn’t come out in the text. Now we note that this covariation is “... as expected since they [speed and accuracy] are derived from the same data.” Note per above that accuracy is computed as target / (target + distractor) looking; even so, your observation is correct: slower looking at the target means lower accuracy at least to some degree.

      Page 19. If you excluded data from trials with less than 50% of timepoints, how did this vary across age? Arguably, your study has to worry less about this, given your sample size, but it would be nice to know, which you could include in the percentage of data that each study contributed to the final sample.

      Thanks, we’ve added this information to a new table in S1.

      Reviewer #2 (Public review):

      First, I wasn’t entirely clear about what the authors meant by “word recognition ability”. For much of the manuscript (including the use of the term “word recognition ability” itself), this comes across as an intrinsic ability or skill that improves with development. Alternatively, the speed and accuracy metrics taken from studies in Peekbank might capture children’s increasing knowledge of the common, concrete words typically used in these studies. To me, this is a somewhat different construct from a general skill at recognizing words. It would be helpful if the authors could clarify which construct they intend to capture, or if it is not possible to distinguish between these constructs from the Peekbank data.

      In response to this comment and related comments above, we’ve added text to the first two paragraphs trying to clarify the general construct that we’re talking about – recognizing the meaning of a word in real-time language comprehension. We’ve also clarified several times throughout the introduction that we’re talking about familiar word recognition, that is, the ability to recognize specific known words. Further, we directly acknowledge the issue above in the introduction:

      “Critically, most word recognition paradigms use words that children at the target age are reported to understand and produce. They are thus not indices of vocabulary size but rather measures of how quickly and accurately the child can recognize a familiar spoken word and use it to guide their visual attention to a referent. However, it is unknown the extent to which specific responses reflect an individual child's general speed of language processing versus their familiarity of specific words.”

      Second, and relatedly, if the source of the age-related improvements is increasing experience with the common concrete words used in the Peekbank studies, then one might expect word recognition and improvements with age to be related to word frequency, given that more frequent words are experienced more often. Word frequency predicts word knowledge when assessed using CDI data. Can effects of frequency be detected in Peekbank word recognition metrics? If not, why? Similarly, is the speed and accuracy of word recognition in Peekbank data related to CDI-derived word age of acquisition, and again, if not, why?

      This is a fascinating set of ideas, and one that we’ve pursued extensively using the Peekbank data. Unfortunately, we think it is out of scope for the current paper, which focuses on child-level metrics (including vocabulary and processing measures). The current paper doesn’t include any analysis of individual words.

      Just to expand a bit on the problem here: unfortunately, modeling word recognition as a simple linear function of (log) word frequency is only possible in the case that distractors are held constant (e.g., “ball” always has “book” as its distractor), because distractor frequency plays an important role in the recognition process. However, in our dataset, words are paired with many different distractors across studies. This property means a fairly complex model of the LWL decision process would be necessary for a model to successfully predict effects for individual words. While such a model is an exciting research goal, it’s not something we can include in the current manuscript.

      Finally, there is a bit of a risk of the main findings of this paper coming across as a foregone conclusion. I.e., how could it be otherwise that word recognition improves with development?

      Reviewer #2 (Recommendations for the authors):

      Regarding the feedback about the risk of the findings coming across as a foregone conclusion - perhaps a primary place in the paper where it would be useful to clarify this point is on page 6, in the paragraph beginning, “We investigate two specific hypotheses here. First, one influential theory...”. Here, it might be worth clarifying whether there are alternative ideas about the emergence of word recognition in childhood that predict different patterns, so that the findings of the current paper can be framed as shedding new light on word recognition in development, rather than a confirmation of the common-sense idea that word recognition must improve over development.

      Thanks, we appreciate this feedback and it’s something we’ve struggled with in this project. Our conclusion is that this paper does not constitute a binary hypothesis test of e.g., whether word recognition is linked to vocabulary development. Instead, we lean into the idea that there are empirical issues (rather than hypotheses) that have not been quantified sufficiently. Thus, we end the revised introduction with the following paragraph:

      “Across both of these issues, the contribution of our work here lies in the detailed quantitative description of development. Nearly every theory of language learning assumes some role for continuous developmental change in word recognition, but these assumptions have not previously been anchored to specific measurements. Hence neither the functional form of the assumed changes nor their concurrent and predictive relationships to vocabulary have been quantified. We leverage the Peekbank dataset to accomplish these goals.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to express our deep appreciation to the editor and reviewers for their constructive comments and suggestions, which have significantly improved the quality of our manuscript. In response, we have carefully revised the manuscript, addressed all comments, and performed additional experiments and analyses to strengthen our findings.

      (1) We repeated retrograde tracing using CTB-647 to verify precise targeting of SPN and DGC neurons, as shown in the new Figure 7.

      (2) We performed dual retrograde tracing combined with fiber photometry or optogenetic activation to investigate the role of PMC dual-projecting neurons in the control of urination, as shown in Figure supplements 11 and 12.

      (3) We conducted new experiments activating PMC<sup>ESR1+</sup> neurons after PDNx to assess their role in urination, as shown in new Figure 6.

      (4) We added a more detailed analysis of the dynamics of neural responses in PMC<sup>ESR1+</sup> neurons in Figure supplements 3F-3G.

      (5) We analyzed peak Ca<sup>2+</sup> signals in the PMC during and after the onset of EMG bursting, as shown in Figure supplement 4F.

      (6) We added a comparison of spontaneous and light-induced spikes in PMC<sup>ESR1+</sup> neurons, as shown in Figure supplements 3B–3C.

      (7) We expanded the Discussion to address how PMC<sup>ESR1+</sup> neurons coordinate bladder contraction and sphincter relaxation to control both the initiation and suspension of urination.

      We hope these revisions meet the reviewers' expectations and contribute to the improvement of our manuscript.

      Reviewer #1 (Public review):

      Summary:

      Urination requires precise coordination between the bladder and external urethral sphincter (EUS), while the neural substrates controlling this coordination remain poorly understood. In this study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators that faithfully initiate or suspend urination. Results from peripheral nerve lesions suggest that BarEsr1 neurons play independent roles in controlling bladder contraction and relaxation of the EUS. Finally, the authors performed region-specific retrograde tracing, claiming that distinct populations of BarEsr1 neurons target specific spinal nuclei involved in regulating the bladder and EUS, respectively.

      Strengths:

      Overall, the work is of high quality. The authors integrate several cutting-edge technologies and sophisticated, thorough analyses, including opto-tagged single unit recordings, combined optogenetics, and urodynamics, particularly those following distinct peripheral nerve lesions.

      We are grateful for your insightful and constructive comments, which affirmed the importance and technical depth of our work. Thank you for dedicating your expertise and time to reviewing our manuscript. Guided by your suggestions, we have revised the paper as detailed below.

      Weaknesses:

      (1) My major concern is the novelty of this study. Keller et al. 2018 have shown that BarEsr1 neurons are active during urination and play an essential role in relaxing the external urethral sphincter (EUS). Minimally, substantial content that merely confirms previous findings (e.g. Figures 1A-E; Figures 3A-E) should be moved to the supplementary datasets.

      Thank you for this valuable and constructive comment. We fully agree that the novelty of our study relative to Keller et al., 2018 must be made explicit. Keller et al. established that PMC<sup>ESR1+</sup> neurons are active during socially evoked urine-marking behavior (voluntary urination) and demonstrated their essential role in relaxing the EUS. Their study mainly focused on behavioral context and EUS relaxation. In contrast, our work addresses a distinct, mechanistic question: how these same neurons participate in reflexive, physiological urination and coordinate both bladder detrusor contraction and EUS relaxation.

      Novel aspects of the present study:

      (1) Temporal dynamics of PMC<sup>ESR1+</sup> neurons during reflexive micturition.

      Using opto-tagging and single-unit recordings, we reveal the precise firing pattern of PMC<sup>ESR1+</sup> neurons during reflexive voiding. Simultaneous fiber photometry, cystometry, and EUS-EMG recordings demonstrate that population-level activity of PMC<sup>ESR1+</sup> neurons precedes and tightly correlates with both bladder contraction and EUS relaxation, a coordination not previously demonstrated.

      (2) Causal role in reflexive urination.

      Manual closed-loop optogenetic inhibition at the onset of reflexive voiding acutely terminates EUS bursting and bladder contraction, immediately halting urine release.

      (3) Dual control of bladder and EUS.

      Optogenetic activation combined with selective pelvic or pudendal nerve transection shows that PMC<sup>ESR1+</sup> neurons drive both bladder contraction and EUS relaxation, revealing a coordinating role beyond EUS relaxation alone.

      (4) Anatomical substrate for coordinated control of bladder contraction and EUS relaxation in reflexive urination.

      Retrograde tracing identifies three spinal-projecting sub-populations: SPN-only, DGC-only, and dual-targeting neurons, providing a circuit-level explanation for the simultaneous control of bladder and EUS.

      Following your suggestion, panels that merely replicate Keller et al. (former Figures 1A–1E and Figures 3A–3E) have been moved to new Figure Supplements 1 and 7, respectively, so that the main figures now emphasize the new mechanistic findings.

      (2) I also have concerns regarding the results showing that the inactivation of BarEsr1 neurons led to the cessation of EUS muscle firing (Figures 2G and S5C). As shown in the cartoon illustration of Figure 8, spinal projections of BarEsr1 neurons contact interneurons (presumably inhibitory) that innervate motor neurons, which in turn excite the EUS. I would therefore expect that the inactivation of BarEsr1 should shift the EUS firing pattern from phasic (as relaxation) to tonic (removal of relaxation), rather than stopping their firing entirely. Could the authors comment on this and provide potential reasons or mechanisms for this finding?

      Thank you for this crucial comment. We apologize that the representative EUS-EMG traces in Figures 2G and S5C were too small to be clearly seen and that the corresponding results description was not sufficiently accurate. We have now replaced these EMG traces with enlarged versions (revised Figures 2G and S5C) and revised the corresponding Results section (lines 184, 197, 340-341). Based on the enlarged traces, we found that acute photoinhibition of PMC<sup>ESR1+</sup> neurons at the onset of phasic EUS-EMG bursting shifted the EUS firing pattern from large-amplitude phasic bursts to low-amplitude tonic firing. This suggests that ongoing activity of PMC<sup>ESR1+</sup> neurons is required to maintain phasic EUS bursting. A similar shift from phasic to tonic EUS-EMG activity during optogenetic silencing of PMC<sup>ESR1+</sup> neurons was reported by Keller et al., 2018 (Figure supplement 8C), confirming the reproducibility of the phenotype. We propose that this low-amplitude tonic activity may be mediated in part by a spinal reflex pathway (the guarding reflex) that prevents urination, whereby the loss of PMC<sup>ESR1+</sup> neuron-mediated supraspinal facilitation reduces inhibition of spinal interneurons, leading to enhanced baseline excitability of EUS motor neurons in response to bladder afferent input during bladder distension (William C. de Groat et al., Comprehensive Physiology. 2015, PMID: 25589273).

      (3) Current evidence is insufficient to support the claim that the majority of BarEsr1 neurons innervate the SPN but not DGC. The current spinal images are uninformative, as the fluorescence reflects the distribution of Esr1- or Crh-expressing neurons in the spinal cord, along with descending BarEsr1 or BarCrh axons. Given the close anatomical proximity of these two nuclei, a more thorough histological analysis is required to demonstrate that the spinal injections were accurately confined to either the SPN or the DGC.

      Thank you for raising this important concern. To rigorously verify that our spinal injections were confined to either the SPN or the DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. We injected a mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 specifically into the SPN or DGC (Methods, lines 465-466). Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without detectable spread to the adjacent region, were included in the analysis (new Figures 7A and 7E). These results confirm our original observation that PMC<sup>ESR1+</sup> neurons comprise three distinct spinal-projection subpopulations: one (19.0%) targeting the SPN, one (52.2%) innervating the DGC, and a third (28.8%) projecting to both regions (Results, lines 304–306; new Figures 7F–7H). In addition, the majority of PMC<sup>CRH+</sup> neurons project to the SPN but not the DGC (new Figures 7B–7D; Results, lines 297–301). We have assembled new Figure 7 using the newly acquired spinal images and the validated data.

      Reviewer #1 (Recommendations for the authors):

      From the abstract: "Anatomically, PMCESR1+ cells possess two subpopulations projecting to either the pelvic or pudendal nerve". I don't think these neurons directly project to either nerve.

      Thank you for this precise comment. We apologize for incorrectly stating that PMC<sup>ESR1+</sup> cells project directly to the pelvic or pudendal nerves. In the revised Abstract (lines 32–36) we have rephrased the sentence to clarify the actual anatomy: “Anatomically, PMC<sup>ESR1+</sup> neurons consist of three distinct spinal-projection-based subpopulations: one targeting the sacral parasympathetic nucleus (SPN), one innervating the dorsal gray commissure (DGC), and a third that projects to both regions, thereby enforcing the coordination of bladder contraction and sphincter relaxation in a rigid temporal sequence.” We trust this revision now accurately reflects the anatomical findings.

      Reviewer #2 (Public review):

      Summary:

      The authors have performed a rigorous study to assess the role of ESR1+ neurons in the PMC to control the coordination of bladder and sphincter muscles during urination. This is an important extension of previous work defining the role of these brainstem neurons, and convincingly adds to the understanding of their role as master regulators of urination. This is a thorough, well-done study that clarifies how the pontine micturition center coordinates different muscle groups for efficient urination, but there are some questions and considerations that remain.

      Strengths:

      These data are thorough and convincing in showing that ESR1+PMC neurons exert coordinated control over both the bladder and sphincter activity, which is essential for efficient urination. The anatomical distinctions in pelvic versus pudendal control are clear, and it's an advance to understand how this coordination occurs. This work offers a clearer picture of how micturition is driven.

      We sincerely thank you for highlighting the rigor of our study and for recognizing the advance in understanding how PMC<sup>ESR1+</sup> neurons exert coordinated, anatomically segregated control over bladder and sphincter. We also appreciate the constructive suggestions that helped us further improve clarity, which we address point-by-point below.

      Weaknesses:

      The dynamics of how this population of ESR1+ neurons is engaged in natural urination events remains unclear. Not all ESR1+ neurons are always engaged, and it is not measured whether this is simply variation in population activity, or if more neurons are engaged during more intense starting bladder pressures, for instance. In particular, the response dynamics of single and doubly-projecting neurons are not defined. Additionally, the model for how these neurons coordinate with CRH+ neuron activity in the PMC is not addressed, although these cell types seem to be engaged at the same time. Lastly, it would be interesting to know how sensory input can likely modulate the activity of these neurons, but this is perhaps a future direction.

      Thank you for this insightful comment. First, we agree that not all ESR1+ neurons are consistently engaged during urination (Figure 1B). Because bladder pressure was not measured during the opto-tagging experiments, we cannot determine whether this reflects trial-to-trial variability in population activity or pressure-dependent recruitment of additional neurons. We speculate that stronger starting bladder pressures may recruit a larger subset of ESR1+ neurons, analogous to graded, pressure-dependent recruitment observed in peripheral sensory neurons (Bruns et al., J Neural Eng. 2011, PMID: 21878706; Marshall et al., Nature. 2020, PMID: 33057202).

      Second, using fiber photometry recording and optogenetic activation, we examined the dynamics of dual-projecting neurons in the PMC that were retrogradely labeled from the SPN and DGC. Their activity correlated with bladder contraction and sphincter relaxation, and optogenetic activation sequentially induced these events to trigger urination (see Recommendation #8). Although retrograde labeling captured only a subset of dual-projecting neurons, the results indicate that they coordinate bladder and sphincter activity.

      Third, previous studies suggest that PMC<sup>CRH+</sup> cells are associated with bladder contraction and likely serve as an integration center for context-dependent micturition behavior (Hou et al., Cell. 2016, PMID: 27662084; Ito et al., Elife. 2020, PMID: 32347794). We therefore propose that PMC<sup>CRH+</sup> cells establish the baseline conditions and contextual readiness for voiding, whereas PMC<sup>ESR1+</sup> cells act as the executive command to reliably initiate and execute the event.

      Finally, we agree that sensory inputs likely modulate PMC<sup>ESR1+</sup> neuron activity. Although this falls beyond the scope of the present study, it represents an important avenue for future investigation.

      Reviewer #2 (Recommendations for the authors):

      (1) In the introduction, the authors write that Keller 2018 only showed this ESR1 population to induce EUS relaxation, but those results also do show bladder contraction with photostimulation of this population. While the authors' work extends this finding in important ways, this should be acknowledged (line 60).

      Thank you for this important correction. We have now revised the Introduction to explicitly acknowledge that stimulation of neurons expressing estrogen receptor 1 (ESR1) in the PMC (PMC<sup>ESR1+</sup>) contributes to sphincter relaxation and increased bladder pressure (Introduction, lines 60-62), as originally reported by Keller et al., 2018.

      (2) I think a more detailed analysis of the dynamics of neural responses in the PMC ESR1 neurons would be valuable. For example: are the same cells always engaged before micturition, or do different populations activate on different trials? Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity. Figure 1H shows cumulative sessions, but what do single sessions look like?

      Thank you for these valuable comments. In response, we have performed refined single-trial analyses of neuronal activity, as detailed in the point-by-point replies below.

      For example: are the same cells always engaged before micturition, or do different populations activate on different trials?

      Among 11 PMC<sup>ESR1+</sup> units that showed urination-related excitation, 8 units exhibited a consistent firing increase in every voiding trial, whereas the remaining 3 increased their discharge in >78 % of trials (Figure 1B; new Figure supplement 3F). Thus, the same PMC<sup>ESR1+</sup> cells are recruited repeatedly, rather than distinct populations being activated on different trials. We have added this clarification to Results (lines 106–108).

      Can the authors comment on the half of the opto-tagged ESR1 population that is not firing during urination? Do they ever fire? A cell-by-cell analysis of which neurons are engaged over multiple trials would be very valuable to understand the dynamics of population activity.

      Approximately half of the opto-tagged PMC<sup>ESR1+</sup> cells showed no increase in firing rate during urination, yet exhibited spontaneous spikes at other times (new Figure supplement 3G), confirming their electrical competence. Because the PMC also participates in defecation, uterine activity, and other pelvic functions (Rouzade-Dominguez et al., Eur J Neurosci. 2003, PMID: 14686905; Schellino et al., Frontiers in Neuroanatomy. 2020, PMID: 33013330; Quaghebeur et al., Auton Neurosci. 2021, PMID: 34391125), these ESR1+ neurons may serve functions other than urination. We have now added this cell-by-cell analysis and discussion to the manuscript (Results, lines 108-112).

      Figure 1 H shows cumulative sessions, but what do single sessions look like?

      As shown in new Figure supplements 3F–3G, single-session raster plots reveal that PMC<sup>ESR1+</sup> neurons display consistent firing patterns across individual trials. Neurons whose firing rate increased during urination did so in most trials (Figure supplement 3F), whereas neurons unrelated to voiding remained silent or showed no discernible rate change during voiding across trials (Figure supplement 3G). These single-session observations are consistent with the cumulative population analysis shown in Figure 1H (new Figure 1B).

      (3) Supplemental Figure 4: It seems clear from this figure that NVCs are only occurring when the sphincter fails to engage. Can the authors quantify how often this is the case?

      Thank you for this important point. We have now quantified the occurrence of non-voiding contractions (NVCs) across all 229 bladder contraction events from 3 mice shown in Supplemental Figure 4. NVCs were observed exclusively when the external urethral sphincter failed to relax, accounting for 62/229 events (27.1 %), whereas coordinated voiding contractions (VCs) occurred in the remaining 167 events (72.9 %). These new data are presented in Figure supplement 4C.
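
      The reported proportions follow directly from the event counts stated above; a minimal arithmetic check (variable names are illustrative only):

```python
# Counts as stated in the response: 229 bladder contraction events, 62 of them NVCs.
total_events = 229
nvc_events = 62
vc_events = total_events - nvc_events  # coordinated voiding contractions

nvc_pct = round(100 * nvc_events / total_events, 1)
vc_pct = round(100 * vc_events / total_events, 1)
print(vc_events, nvc_pct, vc_pct)
```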

      (4) Continuing from the above point: the authors say that the insufficient top-down drive or strength of activity from PMC ESR1 neurons is why NVCs occur. In looking closely, it also seems there is a small hump and subsequent increase in the calcium signal when the EUS bursting begins (particularly clear in Supplementary Figure 4). Could this instead mean that the bursting/urethral activity itself is feeding back onto the PMC to continue/enhance its activity, and it is instead the lack of sphincter bursting that results in the NVC? Could the authors analyze the signal during and after bursting starts? This model is consistent with one of the classic reflexes defined by Barrington, in which urethral fluid flow/activation enhances bladder contraction. The Figure 4 transection experiments do not fully answer this, as the authors are driving activity in the PMC at this time, but they could test this using PDN transection with fiber photometry recording.

      Thank you for this important point. We fully agree that EUS bursting may provide excitatory feedback to the PMC that sustains or even amplifies its activity, and that the absence of such feedback could underlie NVCs. To test this possibility, we re-analyzed the fiber-photometry traces aligned to the onset and offset of each EUS bursting episode (new Figure supplement 4). A small but consistent hump in the Ca<sup>2+</sup> signal appeared before bursting onset, and the Ca<sup>2+</sup> signal continued to rise throughout bursting (Figure supplement 4B, yellow arrow). The amplitude at bursting offset was significantly higher than both the NVC peak and the level recorded at bursting onset. These observations support the interpretation that urethral fluid flow/activation supplies excitatory feedback that reinforces PMC activity and bladder contraction, consistent with Barrington’s classic reflex. We have incorporated these new analyses into the revised manuscript (lines 145–155 and Figure supplement 4F).

      We agree that the positive-feedback loop described by Barrington’s classic urethra-to-bladder reflex is an intriguing mechanism. However, the PDN-transection experiment in Figure 4 was designed to determine if bladder contractions triggered by PMC<sup>ESR1+</sup> cells can proceed in the absence of sphincter bursting, not to evaluate this reflex. Incorporating simultaneous fiber-photometry recording into the PDN-transection experiment would therefore go beyond the scope of the present study. In future work we are keen to combine PDN transection with fiber photometry to further determine whether the urethra-to-bladder reflex contributes to the sustained PMC activity observed in our paradigm.

      (5) In Figure 4, is the timing of sphincter engagement different with ChR2 stimulation from what normally occurs? It appears that the bursting happens immediately upon activation whereas bladder contraction is a bit delayed.

      Thank you for this important observation. We have carefully re-examined the EMG traces from all animals shown in Figure 4. We confirm that the onset of sphincter bursting activity during ChR2 stimulation is indeed more rapid than during natural reflex voiding; nevertheless, the onset of phasic sphincter bursting during ChR2 stimulation remained delayed relative to the intravesical pressure rise (see Figure 8B).

      The immediate sphincter discharge visible in some trials was tonic EUS discharge or rare irregular bursting, not the typical EUS bursting. This tonic pattern corresponds to the spinal guarding reflex that suppresses urine leakage (Fowler et al., Nature Reviews Neuroscience. 2008, PMID: 18490916; Keller et al., Nature Neuroscience. 2018, PMID: 30104734). These segments were identified by their amplitude and spectral content and excluded from burst-onset analysis. Our analysis protocol therefore distinguishes tonic guarding activity from true phasic bursting, ensuring that only the latter was used to determine burst timing.

      (6) The explanation on line 299 about how spinal reflexes are impinging on this circuit is confusing. I agree that the bladder contraction stopping later than the EUS signal likely has something to do with spinal reflexes, but it seems this could instead be feedback from the urethral fluid flow, which continues bladder contractions (urethra-detrusor facilitative reflex). Could the authors clarify their thoughts here?

      Thank you for highlighting this ambiguity. We agree that the delayed cessation of bladder contraction could equally reflect either (1) the urethra-to-bladder facilitative reflex driven by ongoing urethral fluid flow or (2) spinal reflexes that we described. In the revised manuscript (Results, lines 343–349), we have re-worded the paragraph to make this dual possibility explicit, thereby avoiding an overly strong emphasis on spinal mechanisms alone.

      (7) A note on phrasing: the authors frequently say PMCESR1 cells drive sphincter relaxation, but then show an effect on sphincter bursting. Experienced readers might realize that relaxation and bursting are connected, but this might be confusing for readers and should be clarified in the text.

      Thank you for highlighting the potential ambiguity. We agree that the sentence “PMC<sup>ESR1</sup> cells drive sphincter relaxation” can seem paradoxical when our data show increased EUS bursting. In adult mice, the EUS does not remain continuously relaxed during voiding; instead, it generates rhythmic bursting composed of high-frequency spike clusters (active periods) alternating with low tonic activity (silent periods), resulting in rhythmic contractions and relaxations of the EUS. This phasic activity acts as a pump that facilitates urine flow through the narrow rodent urethra (Kadekawa et al., Am J Physiol Regul Integr Comp Physiol, 2016, PMID: 26818058). The EUS bursting activity we recorded is consistent with the results reported in previous studies (Keller et al., Nat Neurosci, 2018, PMID: 30104734; Ito et al., eLife, 2020, PMID: 32347794).

      Consequently, when PMC<sup>ESR1</sup> neurons initiate bursting, they simultaneously generate the relaxation phases that separate the spikes. To make this explicit we have replaced the phrase “PMC<sup>ESR1+</sup> cells drive sphincter relaxation” with “PMC<sup>ESR1</sup> neurons trigger EUS bursting, which generates rhythmic sphincter contractions and relaxations.” (Results, page 7, lines 219-221). We have applied similar clarifications throughout the revised manuscript (Results, lines 125-129). We hope this revision eliminates any apparent contradiction.

      (8) The question remains as to which neurons (dual projecting, single projecting, or all?) are active in natural urination. This is possible to do through dual injection of retrograde virus in SPN and DGC that could coordinately turn on GCaMP, but this challenging experiment is perhaps beyond the scope of this paper. Even so, the authors could discuss their model for whether the dual- and single-projecting neurons are all engaged at once in a natural urination event. Do the authors have any data that could provide insight as to when these sub-populations are active? Results from the opto-tagging in Figure 1 (and comment #2 about single neuron firing properties) might provide a foundation for hypotheses or insights.

      Thank you for this valuable suggestion. We have now performed the experiment you proposed: dual injection of retrograde viruses (AAV-Retro-Cre and AAV-Retro-DIO-GCaMP6s) into the SPN and DGC was used to selectively label PMC dual-projecting neurons, and a 200-µm optic fiber was implanted above the PMC to record their Ca<sup>2+</sup> dynamics during natural urination (Figure supplement 11A and Methods, lines 470–474, 652-655). Dual-projecting neurons exhibited robust activation throughout the entire voiding phase that was tightly correlated with intravesical pressure rise and EUS bursting (Figure supplements 11A–11H). However, technical limits of current retrograde tools preclude selective isolation of single-projecting (SPN-only or DGC-only) subsets for independent fiber-photometry recordings, because injection restricted to one target unavoidably labels both single- and dual-projecting cells. We now state this technical limitation explicitly (Discussion, lines 426-430).

      Accordingly, in the revised Discussion (lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how dual- and single-projecting PMC<sup>ESR1+</sup> neurons are engaged during natural urination: “Based on population dynamics obtained by fiber photometry (Figures 1D-1H, Figure supplements 1A-1F, and Figure supplements 11A-11H) and single-neuron firing properties recorded via optrode (Figures 1A-1C), we propose several mechanistic models for the engagement of dual- and single-projecting PMC<sup>ESR1+</sup> neurons during natural micturition. One possibility is that all three populations (dual-projecting, SPN-projecting and DGC-projecting neurons) are co-activated, with the dual-projecting subset acting as a “bridging amplifier” that sustains rising bladder pressure while coordinating EUS relaxation. Alternatively, SPN-projecting neurons may be recruited first to initiate bladder contraction, followed by DGC-projecting neurons that evoke EUS bursting and facilitate urine entry into the urethra; once flow begins, the urethro-detrusor facilitative reflex could recruit dual-projecting neurons to further enhance voiding efficiency. In addition, contextual or state-dependent urination—such as scent-marking behavior characterized by multiple voiding events with smaller volumes than reflexive urination—may predominantly rely on sequential and cooperative activation of single-projecting neurons. Other recruitment sequences remain conceivable. Future studies combining diverse urination-related behavioral paradigms with simultaneous recordings from projection-specifically labeled PMC neurons will be required to validate and refine these models.”

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. explored the role of estrogen receptor 1 (Esr1)-expressing neurons in the pontine micturition center (PMC), a brainstem region also known as Barrington's nucleus (Hou et al 2016, Keller et al 2018). First, the authors conducted bulk Ca2+ imaging/unit recording from PMCESR1 to investigate the correlations of PMCESR1 neural activity with voiding behavior in conscious mice and with bladder pressure/external urethral muscle activity in urethane-anesthetized mice. Next, the authors conducted optogenetic inactivation/activation of PMCESR1 to confirm its contribution to voiding behavior, and also conducted peripheral nerve transection together with optogenetic activation to confirm the independent control of bladder pressure and urethral sphincter muscle.

      We sincerely thank you for providing a thoughtful summary and insightful comments on our study.

      Weaknesses:

      (1) The study demonstrates that pelvic nerve transection reduces urinary volume triggered by PMC ESR1+ cell photoactivation in freely moving mice. Could the role of pudendal nerve transection also be examined in awake mice to provide a more comprehensive understanding of neural involvement?

      Thank you for this valuable suggestion. We conducted an additional experiment to determine the contribution of the pudendal nerve to PMC<sup>ESR1+</sup> neuron-driven voiding in awake mice. Bilateral pudendal nerve transection (PDNx) reduced the optogenetically evoked urine volume compared with sham-operated controls, yet photoactivation of PMC<sup>ESR1+</sup> neurons still reliably induced urination after PDNx (new Figure 6). Thus, bilateral integrity of the pudendal nerve is required for efficient PMC<sup>ESR1+</sup> neuron-driven voiding, most likely by transmitting the signals that entrain rhythmic EUS bursting. These data and experimental details have been incorporated into Figure 6, Results (lines 272–276), and Methods (lines 542–545).

      (2) While the paper primarily focuses on PMCESR1+ cells in bladder-sphincter coordination, the analysis of PMCESR1+-DGC/SPN neural circuits - given their distinct anatomical projections in the sacral spinal cord - feels underexplored. How do these circuits influence bladder and sphincter function when activated or inhibited? Also, do you have any tracing data to confirm whether bladder-sphincter innervation comes from distinct spinal nuclei?

      Thank you for this critical comment. To determine how PMC<sup>ESR1+</sup> neurons that target distinct sacral nuclei influence bladder–sphincter coordination, we first focused on the dual-projecting subset in a new experiment (Figure supplement 11 and Methods, lines 470–477, 652-655, 669-673). Dual retrograde virus injections into SPN and DGC selectively labelled PMC dual-projecting neurons, a subset of which are ESR1+. Fiber-photometry recordings showed that these cells were active during bladder contraction and sphincter relaxation (Figure supplements 11E-11H), whereas optogenetic activation reliably initiated urination: bladder pressure rose immediately and was followed by rhythmic EUS bursting (Figure supplements 11I-11N and 12B; Results, lines 309-313, 332-335). Thus, the dual-projecting sub-population is sufficient to coordinate bladder contraction with sphincter relaxation. Current retrograde tools do not allow selective isolation of single-projecting (SPN-only or DGC-only) subsets; injecting only one target unavoidably labels both single- and dual-projecting cells. Consequently, we cannot yet compare the functional impact of pure SPN-only versus DGC-only PMC populations. This limitation is now stated explicitly in the revised Discussion (lines 426–430).

      In our 2025 paper (Yan et al., Commun Biol, 2025, PMID: 40259086), we used PRV-based retrograde tracing to show that SPN and DGC constitute two separate spinal nuclei controlling the bladder and the EUS, respectively. Classic studies have reached the same conclusion (Yao et al., Nat Neurosci, 2018, PMID: 30361547; Karnup & De Groat, IBRO Reports, 2020, PMID: 32775758; Karnup, Auton Neurosci, 2021, PMID: 34391124). These citations and a concise summary have been added to the Results (lines 289–294).

      (3) Although the paper successfully identifies the physiological role of PMCESR1+ cells in bladder-sphincter coordination, the study falls short in examining the electrophysiological properties of PMC ESR1+-DGC/SPN cells. A deeper investigation here would strengthen the findings.

      Thank you for this thoughtful suggestion. While a detailed electrophysiological characterization of PMC<sup>ESR1+-DGC/SPN</sup> neurons would provide complementary information, the primary goal of the present study was to define the in vivo functional dynamics and behavioral role of these neurons during natural urination. As you suggested, further electrophysiological analysis of PMC<sup>ESR1+-DGC/SPN</sup> neurons will be an important direction for our future work.

      (4) The parameters for photoactivation (blue light pulses delivered at 25 Hz for 15 ms, every 30 s) and photoinhibition (pulses at 50 Hz for 20 ms) vary. What drove the selection of these specific parameters? Moreover, for photoactivation experiments, the change in pressure (ΔP = P5 sec - P0 sec) is calculated differently from photoinhibition (Δpressure = Ppeak - Pmin). Can you clarify the reasoning behind these differing approaches?

      Thank you for this opportunity to clarify our experimental design. The photoactivation protocol (25 Hz, 15 ms pulses) was chosen because PMC<sup>ESR1+</sup> neurons faithfully follow this frequency without depolarisation block and it reliably triggers voiding (Keller et al., Nat Neurosci, 2018, PMID:30104734). For photoinhibition we originally stated “50 Hz, 20 ms pulses”, but this was an error. Consistent with the same study (Keller et al., Nat Neurosci, 2018, PMID:30104734), we used continuous light (constant illumination) to maintain sustained suppression. The Methods section has been corrected (lines 659-661, 690-691).

      The ΔP formula was tailored to the temporal profile of each manipulation. For activation, ΔP (P<sub>5 sec</sub> - P<sub>0 sec</sub>) captures the rapid pressure rise after light onset; the same window was used by Hou et al. (Cell, 2016, PMID: 27662084). For inhibition, because saline infusion produces rhythmic reflex voiding, we delivered light at the onset of EUS bursting (i.e., when pressure was already near its peak). Inhibition abruptly stops the bladder contraction, so the bladder cannot return to its pre-void baseline. The Δpressure (P<sub>peak</sub> – P<sub>min</sub>) was therefore used to quantify the extent to which the ongoing pressure wave was aborted by photoinhibition. P<sub>min</sub> is the lowest value reached before the next infusion-driven upswing, making the metric insensitive to the slow baseline drift produced by continuous infusion. These clarifications have been added to the Methods (lines 676-677, 679-680, 692-693).
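
      For illustration only, the two pressure metrics described in this response could be computed from a sampled pressure trace as in the minimal sketch below. The function names, the `next_upswing` argument that marks the end of the search window for P<sub>min</sub>, and the use of linear interpolation are assumptions made for this example, not a description of the analysis code used in the study.

```python
import numpy as np

def delta_p_activation(t, p, light_on, window=5.0):
    """Photoactivation metric: pressure `window` seconds after light onset
    minus pressure at light onset (P_5sec - P_0sec for the default window)."""
    p_start = np.interp(light_on, t, p)
    p_end = np.interp(light_on + window, t, p)
    return float(p_end - p_start)

def delta_p_inhibition(t, p, light_on, next_upswing):
    """Photoinhibition metric: P_peak - P_min within the segment that runs
    from light onset to the start of the next infusion-driven upswing.
    `next_upswing` is a hypothetical, user-supplied time in seconds."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    segment = p[(t >= light_on) & (t <= next_upswing)]
    return float(segment.max() - segment.min())
```

      Because the inhibition metric takes both values from the same post-onset segment, it does not depend on a pre-void baseline.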

      (5) The discussion could further emphasize how PMCESR1+ cells coordinate bladder contraction and sphincter relaxation to control urination, highlighting their central role in the initiation and suspension of this process.

      Thank you for this valuable comment. We have revised the Discussion to emphasize that PMC<sup>ESR1+</sup> neurons coordinate urination by sequentially driving bladder contraction followed by sphincter relaxation through their dual projections to the SPN and DGC. We also emphasized that this coordination is essential for the initiation and effective execution of voiding (Discussion, lines 369-388). In addition, in the revised Discussion (Discussion, lines 389-406), we integrate fiber-photometry Ca<sup>2+</sup> signals with single-unit data from opto-tagged recordings to propose several testable, non-mutually-exclusive models for how PMC<sup>ESR1+</sup> cells are engaged during natural urination.

      (6) In Figure 8, the authors analyze the temporal sequence of bladder pressure and EUS bursting during natural voiding and PMC activation-induced voiding. It would be acceptable to consider the existence of a lower spinal reflex circuit; however, the interpretation of the data contains speculation. It is hard to say that bladder pressure measurement reflects efferent pelvic nerve activity in real time. (As a biological system, bladder contraction is mediated by smooth muscle and does not reflect real-time efferent pelvic nerve activity. As an experimental set-up, bladder pressure measurement reflects bladder pressure with some delay because of the tubing, whereas EUS bursting has no delay.) Especially for the inactivation experiment, these factors would contribute to the interpretation of data. This reviewer recommends a rewrite of the section considering these limitations. Most of the section is suitable for the results.

      We agree with the reviewer that bladder pressure, mediated by smooth muscle contraction, provides an indirect measure of efferent pelvic nerve activity and is subject to both physiological and experimental delays. Regarding potential delay from the tubing system, pressure propagates in fluid at approximately 1000 m/s (Kela & Pekka, Proceedings of World Academy of Science Engineering & Technology, 2009, DOI: 10.5281/zenodo.1080526). Given that the total tubing length in our setup is 0.5-1 meter, this gives an estimated transmission delay of only 0.5-1 ms, which is negligible compared with the observed time difference (~700 ms) between the cessation of EUS bursting and the termination of bladder contraction. Pressure transmission through the tubing is therefore not expected to introduce a meaningful temporal delay. However, we cannot exclude the possibility that the pressure measurement itself may impose such a delay, because bladder pressure does not necessarily reflect efferent pelvic nerve activity in real time. Future studies using simultaneous recordings of bladder pressure and pelvic nerve discharges will help clarify whether a true temporal delay exists. We also agree that additional physiological or peripheral factors may contribute to this difference in timing. As suggested by the reviewer, we have revised the text to consider the potential influence of other factors, such as the urethra-detrusor facilitative reflex (Results, lines 343-349).
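
      The delay estimate above is simply tubing length divided by propagation speed. A minimal sketch of that arithmetic follows, using the values quoted in the response (0.5-1 m of tubing, roughly 1000 m/s, and a lag of about 700 ms); the function name is illustrative.

```python
def transmission_delay_ms(tube_length_m, wave_speed_m_per_s=1000.0):
    """Time for a pressure wave to travel the tubing, in milliseconds."""
    return tube_length_m / wave_speed_m_per_s * 1000.0

OBSERVED_LAG_MS = 700.0  # approximate lag quoted in the response

for length_m in (0.5, 1.0):
    delay = transmission_delay_ms(length_m)
    share = delay / OBSERVED_LAG_MS
    print(f"{length_m} m of tubing: {delay:.2f} ms delay, {share:.2%} of the observed lag")
```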

      Reviewer #3 (Recommendations for the authors):

      (1) In opto-tag experiments, a comparison of average AP waveform during behavior and during light stimulation should be included as criteria. It should be mostly the same waveform.

      Thank you for bringing this to our attention. We have now added this comparison as an inclusion criterion in the revised manuscript. Figure supplement 3B shows representative examples of the average waveforms, and Figure supplement 3C displays the distribution of correlation coefficients between spontaneous and light-evoked spikes for all recorded PMC<sup>ESR1+</sup> units, all of which exhibited r > 0.8.
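
      One way such a waveform comparison could be implemented is sketched below: average the spontaneous and the light-evoked spike waveforms of a unit separately, then take the Pearson correlation between the two averages. The function names and the use of 0.8 as a cut-off are assumptions for illustration; the response reports that all units showed r > 0.8 but does not specify the exact threshold or code.

```python
import numpy as np

def waveform_correlation(spontaneous, evoked):
    """Pearson r between the mean spontaneous and mean light-evoked waveform.

    Each argument is an array of shape (n_spikes, n_samples) holding the
    waveform snippets of one unit, sampled on the same time base.
    """
    mean_spont = np.asarray(spontaneous, dtype=float).mean(axis=0)
    mean_evoked = np.asarray(evoked, dtype=float).mean(axis=0)
    return float(np.corrcoef(mean_spont, mean_evoked)[0, 1])

def passes_waveform_criterion(spontaneous, evoked, threshold=0.8):
    """True when the two mean waveforms correlate above `threshold`
    (0.8 here is a hypothetical cut-off chosen for the example)."""
    return waveform_correlation(spontaneous, evoked) > threshold
```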

      (2) Optical fiber implantation seems to be done in two different methods. In Figure 1 and Figure 2, the fiber tip is positioned just above PMC but in Figure 3 it seems to be angled. The information should be included in the Methods section.

      Thank you for this important comment. We have now clarified in the Methods that for Figures 1 and 2, the optical fibers were implanted vertically above the PMC, whereas for Figure 3, the left optical fiber was implanted at a 33° lateral angle targeting the PMC (Methods, lines 499-503).

      (3) In the closed-loop inhibition experiments of Figure 2, the parameters to start closed-loop photo-inactivation were not described in the method. If it is a manual closed loop, it should be described clearly.

      Thank you for raising this important point. We apologize for omitting these details in the original Methods. We have now added a complete description of the manual closed-loop photo-inhibition protocol, including the triggering criteria and operator-controlled timing, in the revised Methods section (lines 602–605).

      (4) In Figure 7A/E the authors provide a spinal cord image to show the injection site, but the image is misleading. The figure only shows AAV-infected CRH/ESR1 neurons in the spinal cord section. It does not indicate the AAV injection site or the terminal distribution.

      Thank you for your important comment. We apologize for providing a spinal cord image that did not accurately depict the injection site. To rigorously verify that our spinal injections were confined to SPN or DGC, we performed new retrograde-tracing experiments in ESR1-Cre and CRH-Cre mice. A mixture of AAV-Retro-DIO-mCherry or AAV-Retro-DIO-EGFP with the retrograde tracer CTB-647 was injected specifically into SPN or DGC. Only animals in which CTB-647 fluorescence was strictly limited to the target nucleus, without spread to the adjacent region, were included (new Figures 7A and 7E). These data confirmed our original observations and have been pooled in Figure 7. The manuscript and figure have been updated accordingly (Results, lines 297-301, 304-306; Methods, lines 465–466).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank both editors and the three reviewers for their constructive criticism of our work. As a result of these comments, we have made several significant revisions to the paper that we believe strengthen and clarify our major results:

      (1) Following suggestions from Reviewers #1 and #3, we have improved our introduction to the different fitness concepts (lines 105–148) and streamlined the discussion of the logit encoding (lines 175–190). In particular, we have moved the most technical points to the SI (Sec. S3).

      (2) Based on criticisms of our usage of the population dynamics model from Reviewers #1 and #3, we significantly revised our explanation of the motivation and interpretation of this model (lines 284–310 and 323–336) and our discussion of the generalizability of these results (lines 678–728), including the possible effects of interactions besides resource competition.

      (3) Following a request from Reviewer #3, we have expanded our analysis of epistasis to systematically test all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 344–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).

      (4) Following concerns from Reviewers #2 and #3 about the limited empirical data, we have expanded our analysis of the LTEE data (new main text Fig. 4, revised text on lines 416–439, and revised SI Figs. S16–S18) and have analyzed two new benchmarking datasets for bulk fitness to test our predictions (new main text Fig. 6, new Results subsection on lines 561–590, and new SI Figs. S24 and S25).

      (5) Following the criticism of Reviewer #3 about the lack of a clear recommendation on fitness quantification that provides the greatest value for a given scientific question, we have better explained what we think the scientific consequences of fitness are as a motivation for our analysis (lines 82–88, 319–322, and 615–630) and replaced the final flowchart figure with a step-by-step guide in the Methods to implement our recommendations in practice (lines 964–982).

      Reviewer #1 (Public review):

      The authors point out that the fitness estimates obtained from different experimental assays (monoculture, pairwise competition, or bulk competition) are not generally equivalent, not even with regard to the fitness ranking of different genotypes. Using a computational model based on experimentally measured growth phenotypes for knockout strains in yeast, as well as data from Lenski’s Long Term Evolution Experiment (LTEE), they derive a set of best practice rules aimed at extracting the optimal amount of information from such experiments.

      The study is very complete on a technical level and I have no suggestions for further analyses. However, I feel the readability and the conceptual focus of the manuscript could be significantly improved by rearranging the material with regard to the contents of the main text vs. the Methods and the Supplement. Detailed recommendations:

      (1) Regarding readability, the large number of references to material in the Methods and Supplement fragment the main text and make it difficult to follow.

      We understand the challenges these references pose to the flow of the main text; we have attempted to keep those references to a minimum, while ensuring that technical details of the work are fully documented and referenced for completeness.

      (2) Conceptually, it seems to me that the current presentation obscures the reasons why we should care about fitness in the first place. In the first paragraph of Results, the authors define fitness “as any number that is sufficient to predict the genotype’s relative abundance x(t) over a short-time horizon”. To me, this seems like an extremely narrow and not very interesting definition. Instead, I view fitness as an intrinsic property of a genotype that allows us to predict its performance under a range of conditions, including in particular conditions that are different from the experimental setup that was used to obtain the fitness estimates. The latter viewpoint is well expressed in Supplementary Section S1, where the authors discuss the notion of fitness potential. I would recommend to move at least part of this discussion to the main text.

      We appreciate the reviewer’s viewpoint and have moved that conceptual discussion from the SI to the beginning of the Results section to give readers a broader perspective on fitness (lines 105–148). We use “potential” in analogy with potential energy in physics and have clarified this on lines 126–135.

      What we call fitness potential, like the other notions of fitness we discuss in this paper (relative and absolute fitness), is still specific to an environmental condition. Fitness as a property intrinsic to a genotype and independent of any environment, as the reviewer mentions, is an interesting concept but beyond the scope of this paper, which is focused on analyzing fitness measurements that are inevitably environment-specific; we have clarified this on lines 142–148. While it is true that this definition of fitness is narrow, it is what can be empirically measured directly, and thus we believe it is crucial to understand how to best interpret that data.

      By comparison, the arguments in favor of the logit encoding that currently opens the Results section are rather straightforward and could be shortened significantly.

      We agree and have condensed this section (lines 175–192).
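
      For readers unfamiliar with the term, a logit-encoded relative fitness is commonly computed as the change in the log-odds of a genotype's relative abundance x per unit time. The sketch below shows that common form and how such a number predicts x over a short horizon; the function names are illustrative and the manuscript's exact definitions may differ.

```python
import math

def logit(x):
    """Log-odds of a relative abundance x, with 0 < x < 1."""
    return math.log(x / (1.0 - x))

def relative_fitness_logit(x_start, x_end, dt=1.0):
    """Change in logit(x) per unit time between two abundance measurements."""
    return (logit(x_end) - logit(x_start)) / dt

def predict_abundance(x_start, s, dt=1.0):
    """Relative abundance after time dt for a constant logit fitness s."""
    z = logit(x_start) + s * dt
    return 1.0 / (1.0 + math.exp(-z))
```

      In this form, fitness values measured over consecutive intervals add up, since the logit differences telescope.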

      (3) Similarly, the modeling strategy used in this work is quite subtle and needs to be explained more fully in the main text. The authors use growth traits (lag time, growth rate, and yield) extracted from monoculture experiments on a yeast knockout collection and feed them into a specific mathematical model to simulate pairwise and bulk competition scenarios. Since a key claim of the work is that monoculture experiments are generally poor predictors of competitive fitness, the basis for this conclusion and the assumptions on which it is based need to be described clearly in the main text. In the current version of the manuscript, this information has been largely relegated to the Methods section.

      We agree that our motivation for the population dynamics model and growth curve data was not clearly explained. We have significantly revised this section of the Results in the main text (lines 284–310).

      In particular, we recognize the potential for misunderstanding this material: we do not intend the relative fitness values calculated from this model to be interpreted as predictions of the true relative fitness between yeast deletion strains. Rather, we use the population dynamics model for our proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). We have added a statement to highlight existing work on monoculture predictors for competition outcomes [32, 34, 36, 37] on lines 453–459.
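
      To make the listed ingredients concrete (resource competition, lag phase, growth phase, saturation), the following is a minimal sketch of a pairwise competition built from monoculture-style traits. The specific model form, in which each strain grows exponentially after its lag until a shared resource is exhausted, and all names and parameter values are assumptions made for this illustration; they are not taken from the manuscript's Methods.

```python
import math

def abundance(strain, t):
    """Population size at time t: constant during the lag, exponential after it."""
    return strain["n0"] * math.exp(strain["rate"] * max(0.0, t - strain["lag"]))

def resource_consumed(strains, t):
    """Resource used by time t; each new cell costs 1/yield units of resource."""
    return sum((abundance(s, t) - s["n0"]) / s["yield"] for s in strains)

def saturation_time(strains, resource, tol=1e-9):
    """Time at which the shared resource runs out, found by bisection.
    Assumes at least one strain has a positive growth rate."""
    lo, hi = 0.0, 1.0
    while resource_consumed(strains, hi) < resource:
        hi *= 2.0  # expand the bracket until the resource is exhausted
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if resource_consumed(strains, mid) < resource:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pairwise_logit_fitness(mutant, reference, resource):
    """Change in log(mutant/reference) over one growth cycle, which equals the
    change in the logit of the mutant's relative abundance in a two-strain mix."""
    t_sat = saturation_time([mutant, reference], resource)
    final_ratio = abundance(mutant, t_sat) / abundance(reference, t_sat)
    initial_ratio = mutant["n0"] / reference["n0"]
    return math.log(final_ratio / initial_ratio)
```

      In this sketch a yield difference alone leaves the pairwise statistic at zero, because both strains stop growing at the same shared saturation time, whereas differences in lag time or growth rate shift it.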

      Reviewer #1 (Recommendations for the authors):

      In the discussion of the LTEE in Section S8, the authors write on page 8 that “we couldn’t fit the fitted values a,b in ref. 29 so we were unable to check it”. I don’t understand this sentence - is the claim that the fit in ref. 29 was incorrect?

      We have clarified this point in the SI (now Sec. S9). Our point was not that the fit in Wiser et al. 2013 is incorrect, but merely that we could not find the exact values of the fitted parameters they obtained documented in their paper, so we could not compare our own fitted parameters directly to theirs.

      Also, at the end of the section, the authors refer to theory work on the long-term fitness trend in the LTEE. Here, two early references arguing for a logarithmic increase in fitness could be mentioned as well:

      International Journal of Modern Physics B 12, 361-391 (1998) Evolution and Extinction Dynamics in Rugged Fitness Landscapes Paolo Sibani, Michael Brandt, and Preben Alstrøm

      J. Stat. Mech. (2008) P04014 Evolution in random fitness landscapes: the infinite sites model Su-Chan Park and Joachim Krug

      We thank the reviewer for providing these two references and have added them to the list of previous works on long-term fitness trends at the end of the section (now Sec. S9).

      Reviewer #2 (Public review):

      Summary:

      The manuscript “Quantifying microbial fitness in high-throughput experiments” provides a comprehensive analysis of the various approaches to quantifying fitness in microbial evolution, focusing on three primary factors: encoding of relative abundance, time scale of measurement, and the choice of reference subpopulation. The authors systematically explore how these choices impact fitness statistics and provide recommendations aimed at standardizing practices in the field. This manuscript aims to highlight how differing fitness definitions and analysis methodologies can significantly alter interpretations of mutant fitness, affecting evolutionary predictions and the overall understanding of genetic interactions in the experiments. Although this manuscript focuses on a critical issue in the quantification of fitness in high-throughput experiments, it relies heavily on only one experimental dataset (Warringer et al. 2003) and one organism, i.e., yeast (Saccharomyces cerevisiae) grown in a defined medium, so the environmental influence is not completely captured. While the theoretical framework is strong, more experimental examples with more organisms (i.e., more datasets) in their analysis and comparison would enhance the manuscript, especially its conclusion.

      We have expanded our analysis of competition data from the Long-Term Evolution Experiment in E. coli (lines 416–439), including adding a main text figure (Fig. 4) along with the three SI figures (Figs. S16–S18). We have also added two completely different data sets that directly test our predicted discrepancies in fitness estimates from bulk competition experiments. From this data we have added a new main text figure (Fig. 6), two new SI figures (Figs. S24 and S25), and a new section at the end of the Results (lines 563–590).

      We wish to clarify, though, that the aim of this study is to develop theory on fitness quantification choices and minimal examples to demonstrate the potential for discrepancies between these choices. While we appreciate the reviewer’s interest in understanding how discrepancies in fitness statistics vary across organisms and environments, that is an empirical question beyond the scope of this paper.

      Strengths:

      The choices for quantifying fitness in evolution experiments are critical and highly relevant given the increasing prevalence of high-throughput experiments in evolutionary biology. The authors methodically categorize fitness statistics and their implications, providing clarity on a complex subject. This structured approach aids in understanding the nuances of fitness measurement. The manuscript effectively highlights how different choices in fitness measurement can influence fitness rankings and the understanding of epistasis, which is important for modeling evolutionary dynamics.

      Weaknesses:

      The theoretical framework is robust, but the manuscript could benefit from more empirical examples to illustrate how different fitness quantification methods lead to varied conclusions in experiments.

      Please see our response to the previous comment on this point.

      The discussion on the choice of reference subpopulation could be expanded with the influence of the environment or the condition. Different types of reference groups might yield different implications for fitness calculations, and further elaboration would enhance this section.

      While we agree that studying how environmental conditions affect fitness is an important and interesting problem, it goes beyond the scope of this paper, which focuses on the basic theory of quantifying microbial fitness from high-throughput experiments. Applications of this theory to empirical questions about environmental variation would be best served by their own studies. We have added a statement clarifying this goal (lines 144–148).

      We are unsure how the choice of reference subpopulation is related to this issue. In our view, if the goal of a mutant fitness measurement is to predict how that mutant would behave when arising spontaneously and competing against its immediate ancestor, the gold-standard reference subpopulation must always be the mutant’s immediate ancestor, or another mutant that is known to be phenotypically equivalent to the ancestor (e.g., neutral mutants in the case of a large mutant library). Other choices of reference subpopulations would not provide directly meaningful information in this regard.

      The authors overgeneralize some findings; for instance, the implications of fitness measurement choices could vary significantly across different microbes or experimental conditions. A more detailed discussion would strengthen the conclusion.

      We certainly agree that the consequences of fitness quantification choices could vary significantly across organisms and environments; our goal for this paper is to demonstrate what discrepancies are possible in principle and in particular how they depend on basic features of microbial population dynamics (e.g., variation in yield). We have added two separate paragraphs in the Discussion section to address the generalizability of our results in the context of pairwise (lines 678–710) and bulk fitness measurements (lines 711–728).

      Overall, this manuscript is a significant contribution to the field of evolutionary biology, addressing a critical issue in the quantification of fitness, but it lacks the broader experimental support needed to make a wider claim. By systematically exploring the factors that influence fitness measurements, the authors provide valuable insights that can guide future research. The framework is computationally thorough but needs a more detailed explanation of concepts instead of generalizing.

      We have improved our explanation of several of the important concepts. In particular, we have significantly revised our explanation of the population dynamics model (lines 284–310) to emphasize its role as a null model to demonstrate how fundamental aspects of microbial growth are sufficient to cause discrepancies between fitness statistics. We have also revised two paragraphs on the generalizability of our results in the Discussion section (lines 678–728).

      Further work is needed, particularly to incorporate empirical examples and expand certain discussions to include environmental variation and their impact, which would improve clarity and applicability.

      We have added a sentence at the beginning of the Results section to acknowledge the environmental dependence of fitness (lines 142–148). We believe further discussion of that issue is beyond the scope of this paper, as it would require a significant amount of additional data and/or environmental modeling.

      Reviewer #2 (Recommendations for the authors):

      In addition to the comments from the previous sections, other specific comments:

      (1) Figure 5 needs to be populated with additional parameter details. For example, include brief descriptions of each parameter involved in the encoding, time scale, and reference choices. This will help users understand the implications of each choice. Adding these details will make the flow diagram more comprehensive, aiding researchers in implementing these steps more clearly.

      Following this comment and another comment about this figure from Reviewer #3, we decided to replace this figure with a new Methods section with step-by-step instructions (lines 964–982).

      (2) Duplication in Line 620: “Nevertheless, the fact that we see the fact that we see...” This redundancy needs to be corrected.

      We thank the reviewer for pointing this out; we have rewritten this paragraph.

      (3) More experimental data comparisons and their assessment concerning various microbial systems and multiple environmental conditions are recommended to support the claim.

      Please see our responses to the related public comments.

      Reviewer #3 (Public review):

      Summary:

      The authors present analyses of different fitness measures derived from empirical data from yeast knockout mutants and the long-term evolution experiment (LTEE) with Escherichia coli to explore discrepancies and identify preferred methods to estimate relative fitness in high-throughput experiments. Their work has three components. They first discuss the different “encodings” of relative abundance data and conclude that logit transformations are preferred because they transform nonlinear abundance trajectories into linear trajectories with greater predictive power. Next, they compare per-generation with per-growth cycle relative fitness estimates inferred from simulations of pairwise competitions based on published growth traits for the yeast strains and on published pairwise competition measurements for the LTEE data. Both data sets show quantitative and qualitative (i.e. rank order) discrepancies of estimates across different time scales, which are highlighted by considering possible underlying causes (i.e. trade-offs between growth traits) and consequences (i.e. epistasis among mutations affecting different growth traits). Finally, the authors compare simulated pairwise and bulk (i.e. where many mutants compete during a growth cycle in a single environment) competition assays based on the yeast knock-out mutants and demonstrate an optimal ratio of collective mutants to wild-type strains that minimizes both sampling error and overestimation of fitness estimates when compared with pairwise competitions.

      Strengths:

      The study deals with a highly relevant topic. Fitness is central to general evolutionary theory, but also poorly defined and implies different traits for different organisms and conditions. For microbes, which are often used in evolution experiments, high-throughput experiments may yield different measures to quantify abundance over time, from individual growth traits to bulk competition experiments. Hence, it is relevant to consider discrepancies among those measures and identify preferred measures with respect to predicting population dynamics and evolutionary processes. The present study contributes to this aim by (i) making readers aware of differences among commonly used fitness estimates, (ii) showing that simulated (yeast) and calculated (E. coli) competitive fitness may differ across time scales, and (iii) showing that bulk competitions may yield relative fitness estimates that are systematically higher than pairwise competitions. The study is rather thorough on the theory side, with extensive derivations and analyses of various fitness measures using their resource competition model in the Supplementary Information. The study ends with a few practical recommendations for preferred methods to infer relative fitness estimates, that may be useful for experimentalists and stimulate further investigations.

      Weaknesses:

      The study has several limitations. Perhaps the most apparent limitation is the lack of a clear answer to the question of which fitness measure is best “in the light of first principles”. The authors show clear discrepancies between fitness estimates across different time scales or using different reference genotypes in bulk competition and provide useful recommendations based on practical considerations (e.g. using pairwise competitions as the “golden standard”), but it remains unclear whether these measures provide the greatest value for the questions researchers may want to answer with them (e.g. predict shifts in genotype frequencies).

      We agree on the importance of considering the scientific questions researchers want to answer in determining the best way to quantify fitness. We have revised both the Introduction (lines 82–88) and the Discussion (lines 615–630) to more clearly explain possible downstream questions researchers may wish to answer with fitness data, and thus why discrepancies in that data based on analysis choices may be important.

      We believe that the text does provide a specific recommendation (second subsection of the Discussion, lines 635–658) for how to quantify relative fitness: using the logit encoding (rather than other encodings), measuring fitness per cycle (rather than per generation), and using the wild-type or a phenotypically equivalent proxy as reference subpopulation to calculate pairwise fitness in a bulk competition (rather than using the mutant library as a whole). This recommendation is based on first principles: the logit encoding is based on the principle of the logistic equation as the null model of relative abundance dynamics (lines 635–637), the choice of the per-cycle timescale is based on the principle that in non-steady-state environments the time scale for measuring selection should not depend on the wild-type growth (lines 640–645), and the choice of reference population is based on the principle that a mutant’s fitness should serve as a predictor of its dynamics when arising de novo at low frequency and competing against its wild-type (lines 648–653).
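
      The encoding and time-scale choices named in this recommendation can be illustrated with a minimal sketch. It assumes a single pairwise competition observed at the start and end of one growth cycle; the function names, the example abundances, and the convention of converting to a per-generation value by dividing by the wild-type's number of doublings are illustrative assumptions, not the authors' code.

```python
import math

def logit(x):
    """Logit encoding of a relative abundance x in (0, 1)."""
    return math.log(x / (1.0 - x))

def fitness_per_cycle(x_start, x_end, encoding="logit"):
    """Change in the encoded relative abundance over one growth cycle."""
    encode = logit if encoding == "logit" else math.log
    return encode(x_end) - encode(x_start)

def fitness_per_generation(s_cycle, wt_fold_change):
    """Assumed convention: divide by the number of wild-type doublings."""
    generations = math.log2(wt_fold_change)
    return s_cycle / generations

# Hypothetical data: the mutant rises from 10% to 15% in one cycle,
# during which the wild-type grows 100-fold.
x_start, x_end = 0.10, 0.15
s_logit = fitness_per_cycle(x_start, x_end)           # logit encoding
s_log = fitness_per_cycle(x_start, x_end, "log")      # log encoding
s_gen = fitness_per_generation(s_logit, wt_fold_change=100.0)

print(f"per cycle (logit): {s_logit:.4f}")
print(f"per cycle (log):   {s_log:.4f}")
print(f"per generation:    {s_gen:.4f}")
```

      In this sketch the per-generation value changes whenever the wild-type fold-change changes, even if the mutant's trajectory is identical, which is the dependence on wild-type growth that the per-cycle time scale avoids.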

      A second limitation is that the authors analyse fitness differences arising solely from resource competition, whereas microbes often interact via other mechanisms, e.g. the production of anticompetitor toxins, cross-feeding of metabolites, or lack of growth to enhance their persistence in stress conditions. Without simulations of these processes, understanding discrepancies among fitness measures is necessarily limited.

      We agree that other interactions are important in many microbial ecosystems and could affect measurements of fitness. We discuss the possibility of these other interactions and their potential consequences for fitness on lines 697–710.

      We focus on resource competition in this paper, however, for two reasons. One is that we are using it as a null model: resource competition is always present, and thus it provides an important baseline for discrepancies in fitness statistics in the absence of any other assumptions. Indeed, our results are that this minimal assumption alone is sufficient to produce a wide range of significant discrepancies, which provides the proof of principle that choices of fitness quantification matter. We have clarified this in a revised explanation of the population dynamics model on lines 294–304.

      The second reason is that fitness measurements of the type discussed in this paper are typically performed on mutants that have only small genetic differences with their ancestor (e.g., a point mutation or gene deletion). While more complex interactions between such similar genotypes are not impossible, we expect them to be rare, in which case resource competition is the only interaction. Explicit modeling of other interactions is an important question for future work, but would require more detailed models and data of those phenomena, and thus would go beyond the scope of the present study. We have added a sentence to explain our emphasis on resource competition on lines 298–301 and 690–697.

      In addition, the analysis of trade-offs between growth traits causing these discrepancies during resource competition seems confounded by biases in measurement error or parameter estimation, at least for growth rate and lag time (Figure 2B), where the replicate estimates for the wildtype show a similar negative correlation.

      The tradeoff between growth traits was only an incidental observation and is not necessary for the fitness statistic discrepancies we analyze in this paper; the only important pattern in the growth traits is the existence of mutants with reduced yields (so as to reduce the wild-type log fold-change in a competition) as well as variation in one other trait under selection (lag time or growth rate in this model). We have clarified this mechanism on lines 328–336, which is demonstrated by Fig. S7. Since these tradeoffs are not relevant to the results and we agree that their significance may be unreliable due to the noisiness of the data, we have removed mention of them.

      Third, the study does not validate relative fitness predictions from growth traits (as is done for the yeast mutants) with measured relative fitness estimates using competition assays, while such data are available, e.g. for the LTEE. This would strengthen their inferences about preferred fitness measures.

      The goal of our modeling with the yeast growth trait data is not to test the ability to predict competition experiments from monoculture data; that has been the focus of previous studies [32, 34, 36, 37]. Rather, we use the population dynamics model for a proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). The yeast growth curve data merely provides realistic parameters for this model, to ensure we are studying a biologically relevant regime of the dynamics. To avoid this misconception, we have revised our explanation of this model and the data on lines 284–310.

      Fourth, the analysis of epistasis between mutations affecting different growth traits (shown in Figure 3) based on the LTEE data could be better introduced and analysed more comprehensively. Now, the examples given in panels C-F seem rather idiosyncratic and readers may wonder how general these consequences of using fitness estimates based on different time scales are.

      We agree that this analysis was incomplete and missed an opportunity to emphasize this important consequence of fitness quantification. We have thus expanded this analysis into a systematic test of all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 346–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).

      Finally, the study is generally less accessible to experimentalists due to the extensive and principled treatment of specific population dynamic models and fitness inferences. This may distract from the overarching aim to identify fitness measures that are most accurate and useful for predictions of population dynamics and evolutionary processes.

      We appreciate this concern as we do hope to make the paper as broadly accessible as possible, especially to experimentalists who measure microbial fitness. To this end, we have reduced the technical discussion of encodings in the first section of the Results (lines 164–187); revised explanations of the population dynamics model (lines 284–310), importance of growth trait variation (lines 328–336), and epistasis (lines 346–395) to better emphasize the conceptual intuition of these parts; and added a step-by-step guide for our recommended best practices of quantifying fitness in bulk competition experiments (lines 964–982).

      In this light, the motivation for the initial discussion of the importance of how to best encode relative abundance (Figure 1) is unclear. Also, the conclusion, that logit encoding is preferred, because it linearizes logistic growth dynamics and “improves the quality of predictions”, is not further motivated. Experimentalists using non-linear models to infer fitness from growth curves or competition assays may miss the relevance of this discussion.

      The motivation for the discussion of encodings is that it is one of the choices made differently by researchers, mainly using either the logit (more common in experimental evolution and population genetics studies) or log encoding (more common in TnSeq analyses). As such we believe it is important to explain where this choice comes from (a transformation of relative abundance data to make it approximately linear in time, and thus amenable to characterization by a single slope parameter) and why we believe the logit encoding is more logical in most cases. We have streamlined and revised this subsection to make it clearer (lines 164–187).

      Our argument for favoring the logit encoding in most cases is based on the logistic model being a null model for relative abundance dynamics (Sec. S3). In light of the reviewer’s comments, we have realized this may be confusing because there are two common usages of logistic dynamics that are biologically distinct. What we mean by logistic model is the dynamics of relative abundance x of a mutant in competition with other genotypes:

      $$\frac{dx}{dt} = s\,x\,(1 - x). \qquad (1)$$

      Here s turns out to be the relative fitness under the logit encoding. On the other hand, researchers also use a logistic ODE to describe the dynamics of absolute abundance N of a single strain in monoculture (e.g., as in a growth curve):

      $$\frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right), \qquad (2)$$

      where r is the maximum growth rate and K is the carrying capacity.

      We believe the reviewer’s last point refers to Eq. (2), whereas our argument about the logit encoding is based on Eq. (1). We have added a note to clarify this distinction for the reader (lines 192–196).
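
      The step linking the relative-abundance logistic model to the logit encoding can be written out briefly. This is a sketch assuming the standard logistic form $dx/dt = s\,x(1-x)$ for the relative abundance $x$, with $t$ measured in whatever time unit is chosen (for example, growth cycles):

```latex
\begin{align}
\frac{d}{dt}\,\ln\frac{x}{1-x}
  &= \left(\frac{1}{x} + \frac{1}{1-x}\right)\frac{dx}{dt}
   = \frac{1}{x(1-x)}\, s\,x(1-x) = s, \\
\ln\frac{x(t)}{1-x(t)}
  &= \ln\frac{x(0)}{1-x(0)} + s\,t, \\
\frac{d}{dt}\,\ln x
  &= \frac{1}{x}\frac{dx}{dt} = s\,(1-x).
\end{align}
```

      Under this assumption the logit-encoded trajectory is exactly linear in time with slope $s$, whereas the log-encoded trajectory has slope $s(1-x)$, which approximates $s$ only while the mutant is rare ($x \ll 1$).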

      Reviewer #3 (Recommendations for the authors):

      In addition to my general comments in the public review, I have several more specific recommendations:

      (1) Line 183-189: unclear why logit-based relative fitness is preferred. Abundance data are not typically binomial.

      We agree this claim about abundance data was incorrect and have removed it. We have revised the section to focus on motivating the logit encoding from logistic dynamics of relative abundance as a null model for most systems (main text lines 175–187 and Sec. S3).

      (2) Line 205: it may be mentioned that s(logit) is the same as the “selection rate constant” often used in microbial studies.

      We have added a sentence clarifying the equivalence of the logit-encoded relative fitness to the selection coefficient in population genetics (lines 188–190).

      (3) Line 368: why do mutations that increase biomass yield also increase WT LFC? Is this, because they grow slower and hence allow the WT more time to grow?

      Mutants with higher yield allow the wild-type to achieve higher log fold-change because those mutants consume fewer resources per cell, which frees up more resources for the wild-type to consume and increase its overall growth. It’s not about growth rate or time, as this would occur even for mutants whose growth rates are identical to the wild-type’s. We have revised our explanation of how variation in growth traits differentially affects fitness statistics (lines 323–340) and epistasis (lines 361–378).
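
      The mechanism described in this answer can be illustrated with a minimal sketch under a deliberately simplified model, which is an assumption for illustration and not necessarily the authors' population dynamics model: both strains grow exponentially at the same rate with no lag, each unit of new biomass consumes 1/yield units of a single shared resource, and growth stops when the resource runs out. With equal growth rates both strains share the same fold-change F, so resource balance gives (F - 1)(N_wt/Y_wt + N_mut/Y_mut) = R.

```python
import math

def wildtype_lfc(resource, n_wt, n_mut, yield_wt, yield_mut):
    """Wild-type log fold-change when both strains grow at the same rate.

    Resource consumed per unit of fold-change is the summed initial
    biomass divided by yield; growth ends when the resource is exhausted.
    """
    demand_per_fold = n_wt / yield_wt + n_mut / yield_mut
    fold_change = 1.0 + resource / demand_per_fold
    return math.log(fold_change)

# Hypothetical parameters: equal initial biomass, only the mutant yield varies.
common = dict(resource=1.0, n_wt=0.005, n_mut=0.005, yield_wt=1.0)
lfc_low_yield = wildtype_lfc(yield_mut=0.5, **common)
lfc_equal_yield = wildtype_lfc(yield_mut=1.0, **common)
lfc_high_yield = wildtype_lfc(yield_mut=2.0, **common)

for label, lfc in [("low", lfc_low_yield), ("equal", lfc_equal_yield),
                   ("high", lfc_high_yield)]:
    print(f"mutant yield {label:>5}: wild-type LFC = {lfc:.3f}")
```

      Because the growth rates are identical in this sketch, the increase in wild-type log fold-change with mutant yield arises purely from resource accounting, not from differences in growth rate or growth time.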

      (4) Line 382-386: you may want to cite Ram et al. (2019, 10.1073/pnas.1902217116), who also did such analyses for experimental data from E. coli.

      We have cited this work as Ref. [34].

      (5) Line 415: perhaps use “bulk relative fitness” instead of “total relative fitness”, to contrast with “pairwise relative fitness”.

      We acknowledge the language in this section can be subtle. However, “bulk” is not a sufficient identifier for the concept of total relative fitness as bulk competition experiments (with many genotypes competing simultaneously) can be used to measure either total relative fitness or pairwise relative fitness. (In pairwise competition experiments with only two genotypes, these two types of fitness are identical.) As such we adhere to our original language but have added words to clarify which type of experiment (bulk or pairwise) we are talking about in a given context (e.g., on lines 495–504).
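
      The distinction drawn here can be made concrete with a short sketch. It assumes relative abundances measured at the start and end of one growth cycle and the logit encoding; the function names and the example abundances are hypothetical.

```python
import math

def total_relative_fitness(x_start, x_end, genotype):
    """Logit change of a genotype relative to the rest of the population."""
    def logit(x):
        return math.log(x / (1.0 - x))
    return logit(x_end[genotype]) - logit(x_start[genotype])

def pairwise_relative_fitness(x_start, x_end, genotype, reference):
    """Change in the log ratio of a genotype to one reference genotype."""
    return (math.log(x_end[genotype] / x_end[reference])
            - math.log(x_start[genotype] / x_start[reference]))

# Hypothetical bulk competition: m1 keeps pace with the wild-type,
# while m2 is strongly deleterious.
bulk_start = {"wt": 0.50, "m1": 0.25, "m2": 0.25}
bulk_end = {"wt": 0.60, "m1": 0.30, "m2": 0.10}
m1_total = total_relative_fitness(bulk_start, bulk_end, "m1")
m1_pairwise = pairwise_relative_fitness(bulk_start, bulk_end, "m1", "wt")

# Hypothetical pairwise competition: only two genotypes are present.
pair_start = {"wt": 0.90, "m": 0.10}
pair_end = {"wt": 0.85, "m": 0.15}
pair_total = total_relative_fitness(pair_start, pair_end, "m")
pair_pairwise = pairwise_relative_fitness(pair_start, pair_end, "m", "wt")

print(f"bulk:     total = {m1_total:.4f}, pairwise = {m1_pairwise:.4f}")
print(f"pairwise: total = {pair_total:.4f}, pairwise = {pair_pairwise:.4f}")
```

      In the bulk example, m1 is neutral relative to the wild-type yet has positive total relative fitness because the rest of the library declines; with only two genotypes the two statistics coincide.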

      (6) Line 451-453: why does a population in bulk competition consume resources more slowly than in pairwise competitions?

      Mutant libraries used in bulk competition experiments usually include a large number of deleterious mutants, which grow more slowly than the wild-type. Thus these populations typically consume resources more slowly than a population in a pairwise competition would, where a large part of the population is the wild-type.

      (7) Line 565: I don’t understand how one can compare relative fitness to other timescales.

      Relative fitness, as we’ve defined it, has units of rate, since it describes the rate of change of relative abundance (or an encoding of it) over some time scale (e.g., a batch growth cycle or a generation). Therefore it can be compared to other time scales of the system, such as the rate of new mutations arising or the rate of genetic drift fluctuations, as long as they are measured in the same units. This comparison is important to population genetics analyses, such as determining whether the population is in the strong selection-weak mutation limit or the clonal interference regime.

      (8) Line 620 repeats text.

      Thank you, we have revised this paragraph and removed the typo.

      (9) Figure 1C+D: the link between the scenarios on the left and the graphs on the right may be better explained. For example, it may help to make explicit that the 4 scenarios in panel C show the same relative fitness per cycle and that mutant and wildtype have the same growth rate, but different growth periods in both scenarios in panel D. It is also unclear whether the grey dot links to the upper scenario in D.

      We have clarified this issue in the caption and changed the colors to avoid this confusion.

      (10) Figure 2E: it is unclear why “mutants with equal fitness are assigned the lowest rank”.

      This was a technical comment about how to handle ties in our analysis of mutant rankings, but it is moot since no exact ties actually occur in our simulations. We have removed this remark to avoid confusion.

      (11) Figure 2F: the axis labels are confusing, as for the WT estimates no LFC mutant exists. It would also help to make explicit in the legend against which WT replicate/reference strain each strain has competed.

      We agree the inclusion of wild-type replicates in this plot was confusing and unnecessary, so we have removed them. The mutants compete against a wild-type with traits defined by their median values across all wild-type replicates; this is noted in Fig. 2A and the Methods section on our analysis of this data (lines 809–813).

      (12) Figure 5: I am not sure this is needed, as its information is rather limited.

      We agree and have removed this figure.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Summary: This manuscript presents a high-throughput fluorescence recovery after photobleaching (HiT-FRAP) platform to screen genes affecting the dynamics of the nucleolar scaffold nucleophosmin (NPM1). The platform included the siRNA-based screening of 65 RNA helicases, 9 phylogenetically related helicase pairs, and 290 ribosomal proteins along with selected assembly factors. These factors were classified as those accelerating or decelerating NPM1 dynamics based on the t1/2 measurements. Combined with nucleolar morphological changes, the authors identified that depletion of early-stage (A-F) and later-stage (G-H) LSU assembly factors resulted in different nucleolar phenotypes, suggesting that pre-ribosome assembly can impact nucleolar morphology. Further exploration of the potential mechanism suggested that NPM1's intrinsically disordered region (IDR) contributes to nucleolar organization and dynamics.

      Together, this well-designed study uncovered that ribosome assembly, through both early and late ribosomal precursors, can influence the biophysical properties of the nucleolus. Below please find our concerns for the authors to consider to strengthen the major conclusions.

      Major comments:

      The main conclusion that NPM1's biophysical states directly impact its interaction strength with ribosome intermediates (and thereby nucleolar dynamics) should be further strengthened as listed below:

      1). Given the nucleolus's complexity, an additional GC factor, or/and one more marker of other nucleolar regions, should be examined to substantiate the proposed impact of LSU-associated factors on nucleolar morphology (Figures 3, 4).

      We thank the reviewer for this very important point. We have now included representative images for representative hits in major phenotypic clusters co-stained for SURF6, another GC marker, which shows similar localization patterns as NPM1 (Fig. S4B). For other nucleolar subcompartments, we have included images obtained from a cell line harboring endogenously tagged FBL-mNeonGreen (a marker for the DFC) for representative hits (Fig. S4A). We see a similar overall distribution of the DFC within the GC (i.e. DFCs distribute to fill the area of the disrupted GC), confirming our screen results. We look forward to further examining the changes in nucleolar subcompartment architecture in future work.

      As additional support, we note that we probed NOG2, NOP53, and NOP2 in our IF results, all of which are GC-localized factors. We see a very similar distribution for these factors in our hits as for NPM1 (see Fig. S8D). In addition, FISH data for pre-rRNA precursors show similar morphological patterns as NPM1, further confirming our results (Fig. S7). We have noted this in the text and have also included representative images in the supplement.

      2). Additional experiments are needed to support the proposed model that ribosomal intermediates, especially the pre-LSU complexes could determine nucleolar biophysical properties through the interaction with NPM1. Their direct interaction by biochemical assays should be provided. Also, when analyzing the interaction with other nucleolar factors, the authors should provide data that show NPM1 mutant expression levels were comparable to endogenous levels (Figures 4, 6).

      We agree that directly probing NPM1's interactions with LSU precursors is critical to supporting our model, and we have addressed this through several complementary biochemical approaches. First, we performed immunoprecipitation of tagged NPM1 (NPM1-mScarlet, IP-ed using RFP-trap agarose) and assessed interaction with pre-LSU rRNA transcripts via Northern blot (Fig. 5D). We find that NPM1 interacts strongly with the 32S pre-rRNA. Second, we performed sucrose gradient sedimentation and find that NPM1 preferentially co-migrates with pre-60S complexes (Fig. 5B). Together with previous reports of NPM1-pre-LSU interactions, these data provide direct biochemical support for the proposed interaction.

      To test whether interaction strength with pre-LSUs could regulate NPM1 dynamics, we next asked whether our NPM1 mutants that differ in their dynamics in turn interact differentially with pre-LSU complexes. Using co-IP Northern blot for ITS2 and sucrose co-sedimentation, we find that NPM1 mA3 pulls down more 32S and co-sediments more robustly with pre-60S complexes, while NPM1 mB2 shows reduced association (Fig. 5D, E; Fig. S10F, G). These data support that the strength of the NPM1-pre-LSU interaction is a determinant of NPM1 exchange dynamics, and, by extension, of nucleolar biophysical properties.

      Exogenous mutant NPM1 is expressed at approximately 10% of endogenous levels (Fig. S10A). We address this in two ways. First, all interaction comparisons are made between WT and mutant exogenous constructs, not against endogenous NPM1, controlling for expression level differences. Second, we observe similar effects on interactions both in the presence of endogenous NPM1 and in null backgrounds, indicating that the differences we detect reflect NPM1 mutation, not expression level.

      3). Northern Blotting should be done to dissect which pre-rRNA intermediates interact with NPM1 and contribute to the nucleolar dynamics (Figures 4B, D, F). These additional experiments should be feasible within a reasonable timeframe.

      We agree with the reviewer and have performed northern blots for major hits in our different nucleolar phenotypes, and the results reinforce what we see by FISH and qPCR (Fig. S6B). Briefly, depletion of the “RNA Exosome” hit SKIV2L2 results in smearing of pre-rRNA precursors that harbor both ITS1 and ITS2 and an accumulation of the 12S, in keeping with its role in end-processing of these transcripts. For “Other” hit PHF5A, we see an enrichment for 47S/45S/41S species, consistent with an early precursor stall. Notably, we do not see this phenotype for depletion of “Other” hit CNOT1, which suggests multiple processing defects may lead to a similar nucleolar phenotype. Treatment with PolI inhibitor CX5461 shows a depletion in ITS1-containing transcripts, and minimal impact on ITS2-containing transcripts, similar to FISH results. Lastly, depletion of “LSU” hits NOP53 and RPF2 leads to accumulation of the 32S and 12S species, in keeping with accumulation of abortive pre-LSUs.

      In addition, the authors should provide the code and the hardware control procedures for HiT-FRAP to ensure reproducibility.

      We thank the reviewer for this thoughtful suggestion. We have made our software available on GitHub (https://github.com/jess-sheu/colony_blob_bleacher) and archived on Zenodo (https://doi.org/10.5281/zenodo.20275447).

      According to the authors' statement, all the experiments are adequately replicated, and the statistical analysis is adequate.

      Minor comments:

      To enhance clarity and focus, consider the following:

      1). Simplifying the HiT-FRAP screening section (Fig. 1-3) would emphasize the significant findings.

      We have simplified text throughout to better highlight significant findings.

      2). Expanding analysis and experimental validation could help to solidify the interdependency between rRNA / ribosome precursors and the NPM1-driven nucleolar dynamics (Fig. 4-5). Indeed, additional experiments suggested above in the major concerns should be supplemented here.

      We have performed additional experiments to demonstrate the interdependency between ribosomal precursors and their interaction with NPM1 in shaping nucleolar dynamics, as described above.

      Reviewer #1 (Significance (Required)):

      This work has established a powerful toolkit, named HiT-FRAP, to identify factors involved in the organization and regulation of the membrane-less nucleolus, which will be useful for understanding the complexity of not only the nucleolus, but likely other condensates in cells in the future. Using this platform and with the Granular Component (GC)-localized NPM1 as an indicator of nucleolar morphology, the authors found that the biophysical properties of the nucleolus are sensitive to the ordered assembly of ribosomes, in particular the LSU maturation steps at the GC. This finding is important as it suggests the interdependency between the dynamic rRNA processing and the functional assembly and morphology of the nucleolus. Further studies are warranted to analyze the dynamics of other nucleolar constituents, particularly those localized at other sub-nucleolar regions, to fully depict how exactly the nucleolar function is coordinated with its biophysical properties.

      Reviewer #2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The nucleolus is a multiphase biomolecular condensate whose primary function is ribosome biogenesis. There is mounting evidence that the material state of condensates is important for their function. Here the authors have probed how the material properties of the nucleolus respond to inhibition of ribosome biogenesis.

      They have assessed nucleolar dynamics (molecular diffusivity) of a nucleolar protein, NPM1, by fluorescence recovery after photobleaching (FRAP). NPM1 is a protein that labels the periphery of the nucleolus (the so-called granular component, GC). (The nucleolus has 3 main subcompartments: the internal fibrillar centers, the middle dense fibrillar components, and the GC).

      One of the main findings of the work is that inhibition of late steps of ribosome biogenesis increases fluidity (faster recovery of NPM1), while inhibition of earlier (and inhibition of mRNA processing -but see below) rather increases rigidification (slower recovery). They then attempt to correlate what is interpreted as biophysical changes to pre-ribosomal intermediates and interaction with NPM1.

      Practically, the authors have produced reporter cell lines (HeLa) stably expressing (CRISPR engineering) mono- or bi-allelic fluorescent versions of NPM1; they have developed a powerful platform to conduct high-throughput FRAP (this is really good); they have calibrated their system, initially with basic perturbations (ATP depletion, proteasome inhibition, etc), and then they focused on a family of trans-acting factors: the helicases, investigating systematically their effect on NPM1 recovery. They then extended their initial candidate-based screen to additional factors (using STRING interactions). This is nice and useful. Later in the work, they include in their analysis additional (morphological) features of nucleoli to cluster functionally their hits, as was done earlier by others in similar works. Finally, using recently published structural data (CryoEM), they attempt to correlate groups in the cluster with particular pre-ribosomal species. This part is less advanced and weaker than the initial part of the paper (screens and FRAP measurements).

      Major comments:

      -A major comment concerns the compositional analysis of precursor intermediates, which should be better defined. The stage assignment of particles is not quite as good as the screening part of the paper. At the RNA level, the authors provided FISH, as histograms of quantifications (see e.g. Fig 4D, and Fig SS6E). It would be necessary to show images, and to perform biochemistry. At the protein level, the authors provide immunostaining, but it does not really prove the detected protein is part of a particle.

      We thank the reviewer for this important critique. We have taken several steps to address both the stage assignment and biochemical characterization concerns.

      Regarding stage assignment: We have consolidated our LSU phenotypic clusters (previously LSU1 and LSU2) into a single "late pre-LSU" group based on their shared features and proximity in PCA space. We want to be clear that this consolidation is intended to more accurately represent what our data can support: the screen reliably identifies factors whose perturbation produces a coherent late LSU assembly phenotype, and we do not wish to overstate the resolution of state assignment from imaging data alone. Sub-cluster distinctions are retained in supplementary materials for transparency. We have revised language throughout to reflect this framing.

      Regarding biochemical characterization of intermediates: We have now performed Northern blots on strong hits within our phenotypic groups (Fig. S6B). For LSU cluster hits, we observe accumulation of the 32S and 12S species, indicating a stall in ITS2 processing, which is directly consistent with our ITS2 FISH results and confirms that the RNA-level phenotypes reflect genuine pre-rRNA processing defects rather than indirect effects. For "Other" group factor PHF5A, we observe 47/45/41S accumulation consistent with an early processing stall. We have also added representative FISH images to Fig. S7 to allow direct visual assessment of RNA-level phenotypes.

      Regarding protein-level particle assignment: We agree that IF alone cannot establish that assembly factors are incorporated into discrete pre-ribosomal particles rather than existing as free factors. To more directly test whether the LSU cluster phenotypes reflect accumulation of genuine pre-ribosomal particles rather than mislocalized free factors we used NOP53 knockdown as a representative LSU cluster perturbation and, similar to RPF2 knockdown, see an accumulation of ITS2 and NOG2 in the nucleolus by FISH and IF (Fig. 4E). We then performed nuclear sucrose gradient fractionation and found that NOG2 co-migrates with the LSU peak and does not enrich in soluble fractions (Fig. 4F-H), supporting the interpretation that late pre-LSU particles accumulate in the nucleolus upon disruption of LSU cluster genes. Importantly, we also observe a strong decrease in co-sedimentation of NPM1 with the LSU peak upon depletion of NOP53 (Fig. 4G,H). This result, together with the Northern blot and FISH data, provides biochemical and cell biological evidence that the nucleolar phenotypes we identified by HiT-FRAP are associated with accumulation of late LSU assembly intermediates.

      -Another concern is to know if NPM: a GC component located periphery of the condensate and a late assembly factor is an appropriate marker for assessing the effects on nucleolar material state of all (including early and late) inhibitions.

      Would factors involved in earlier ribosomal assembly steps, and localized more internally would not be better tools to evaluate change in material states caused by alterations in early steps?

      We appreciate this important point and agree that NPM1 reports primarily on GC dynamics. However, we would argue this is a feature rather than a limitation for two reasons.

      First, the GC is the terminal assembly compartment through which pre-ribosomal particles must transit before nuclear export. Perturbations to earlier assembly steps, including FC/DFC-localized processes, likely propagate into GC dynamics, because stalled or aberrant particles accumulate in or are excluded from the GC. NPM1 FRAP thus functions as a downstream integrator of upstream assembly status, not only a reporter of GC-proximal events. This interpretation is consistent with our observation that depletion of early factors (and, therefore, depletion of downstream intermediates) does produce detectable NPM1 phenotypes in our screen. Second, the pattern of our screen results supports rather than undermines this logic: the striking enrichment of late LSU factors and near-complete absence of SSU hits is precisely what one would predict if NPM1 reports selectively on pre-LSU flux through the GC. A sensor that reported indiscriminately on all condensate perturbations would not produce this specificity.

      We do acknowledge, however, that NPM1 cannot report on material state changes that are compartmentally confined to the FC or DFC and do not propagate outward. Extending this approach to internal markers remains an important future direction. To clarify the scope of our readout, we have revised the text to specify that we are monitoring GC dynamics, and we have added representative images of fibrillarin localization in Supplemental Figure S4A to illustrate the relationship between DFC and GC compartments in our experimental system.

      -About the engineered cell lines used for screening by FRAP (Fig 1S): NPM1-mNeonGreen (biallelic with reduced expression of NPM1) and mScarlet (heterozygous): There is a need to characterize pre-rRNA processing in both cell lines to show they are not affected for ribosome biogenesis. This is important information since the entire work is based on these cells.

      We have performed a Northern blot across the cell lines used in this paper as compared to their parent cell line and see no substantial difference in rRNA processing. We have included this data as Supplemental Figure 1D.

      The screening cells are HeLa cells implying they are not physiologically regulated for p53. Nucleolar surveillance is a key regulatory surveillance loop triggered by ribosome biogenesis inhibitions leading to p53 stabilisation. How could this affect this work? Should key findings be confirmed in diploid p53 positive cells?

      We acknowledge that our choice of HeLa cells limits our ability to distinguish cell-type-specific responses from more universal mechanisms and have added an explicit discussion of cell choice in the main text. To begin exploring the impact of p53, we performed gene depletions for representative hits across phenotypic clusters in untransformed, diploid hTERT-RPE cells that were lentivirally-transduced with NPM1-mScarlet and assessed nucleolar morphological phenotypes at smaller scale (Figure S6C, Supplementary Text). At baseline, RPE cells show more and smaller nucleoli than HeLa cells, which may reflect a difference in basal nucleolar assembly and, potentially, ribosome biogenesis, in keeping with previous observations that transformed cells rely more heavily on ribosome biogenesis than non-transformed.

      Upon gene depletion, we found that hits from the "RNA exosome" cluster show a different phenotype than seen in HeLa cells, where we observe less size difference and a marked decrease in eccentricity, which may reflect a p53 or cell type specific response. Depletion of the “Other” cluster gene PHF5A results in a milder though qualitatively similar phenotype as seen in HeLa cells, with nucleolar rounding and an increase in NPM1 intensity. Depletion of “LSU”-associated hits in RPE cells very robustly replicated most of the nucleolar features we observed in HeLa, which suggests that these are likely generalizable responses to LSU disruption. We have included this data in Supplementary Figure 5C. We note that we did not directly test whether p53 is stabilized upon depletion of our hits in RPE cells, and whether p53 activation feeds back on condensate dynamics remains an open area for future work. However, the concordance of LSU-associated phenotypes across HeLa and RPE cells, which differ substantially in p53 status, transformation state, and baseline nucleolar architecture, supports the generalizability of our core findings.

      -About factor depletion, e.g. helicases, it's important to consider direct versus indirect effects on ribosome biogenesis, the timeline of depletion should be well described in the paper. Apparently, most factors, including the helicases were depleted for 72 hours, this is very long considering most of them play important roles in essential processes for cell homeostasis implying severely reduced growth at the time of capture (and the possibility of indirect effects).

      We thank the reviewer for this important point. To directly address depletion timeline, we performed time courses for strong hits and monitored nucleolar morphology at 24 and 48 hour intervals (now included in Fig. S3D). Morphological changes begin to emerge by 48 hours across phenotypic classes; for the RPF2 LSU phenotype specifically, nucleolar expansion and decreased NPM1 intensity are detectable as early as 24 hours, inconsistent with a general stress response and more consistent with a direct downstream consequence of LSU assembly disruption. Moreover, despite all targeted genes being essential for homeostasis, phenotypic profiles are cluster-specific and associated with multiple genes of coherent function, which suggests that observed impacts are downstream of specific pathway inhibition rather than a general cellular stress response.

      -Another cause of concern is that some perturbations (factor depletion) affect very deeply nucleolar structure/morphology (eg uL2 depletion shown in Fig 2C); how easy/difficult was it to control/make sure that a correct area was obliterated in the FRAP experiment using the (remarkable) data-adaptive approach. For cases where the nucleolus was deeply affected how did you check that a significant nucleolar area had been selected for analysis? It would be good to describe this in the text.

      We manually ensured our segmentation protocol accurately captured nucleoli, defined by higher intensity regions of NPM1, for all depletion cases during screen development. Because segmentation is the key factor determining where the bleach point is placed, most bleaches, even in disrupted cases, hit the nucleolar interior. To address this point, we have included figures in the supplement (Fig. S4D) that show bleaching time courses for select highly disrupted hits uL2 and eL39.

      • Fig 6C, interaction of NPM1 constructs with pre-ribosomes: the authors have tested interaction with select nucleolar proteins (NOP53, NOP2, NOG2, and uL2), which is not the same as preribosomes.

      It would be important to see the interactions with precursors (Fig S9C, now histograms) please show the actual data, this was tested by qPCR, please show classical northern blots as RTqPCR have shown their limits in such applications.

      Indeed, we cannot distinguish between assembly factors/ribosomal proteins that are associated with NPM1 in their latent, non-pre-LSU bound state versus those that are part of a developing ribosome. We have addressed this gap in several ways. Firstly, we have performed IP-northern blots for tagged NPM1-mutants, as suggested, and find that the mA3 mutant co-IPs more 32S than WT, while the mB2 binds less (Fig. 5D). We also performed sucrose gradient analysis of pre-ribosomal complexes and find that the mA3 mutant co-sediments more with the pre-60S peak, while mB2 co-sediments less (Fig. 5E). These findings are consistent with in vitro findings in the field that B2 mediates interactions with rRNA, while A3 occludes B2 through intramolecular interactions. Collectively with our co-IP western data, we believe the evidence strongly suggests that NPM1 mutants interact differentially with pre-LSU complexes.

      -Minor comments:

      -The effects of mRNA processing disruption on nucleolar dynamics could be (is most likely) very indirect (the so-called "slow hits"). The respective time course of inhibitions is important to describe.

      We direct the reviewer to our response above for other phenotypes. For our "slow hit" / "Other" cluster, we also used the splicing inhibitor PladB as an orthogonal approach. Strikingly, nucleolar rounding was detectable within less than one hour of treatment, well before any general cell health effects would be expected, while dynamics changes required approximately 24 hours — suggesting that morphological and biophysical responses are kinetically separable and that the early morphological response is directly downstream of splicing inhibition. We have included a representative rounding timecourse in Fig. S8E.

      Reviewer #2 (Significance (Required)):

      -General assessment: strengths and limitations

      Strengths: -The automated platform for high throughput FRAP

      -The authors develop a potentially interesting model where they attempt to connect rigidification/fluidity of a condensate to its function in assembly of large ribonucleoprotein complexes. -The manuscript reads very well; it has been prepared with great care (figures). Some complicated concepts are explained very well (Introduction/Discussion). Limitations: -particle stage assignment based on FISH and immunostaining only. The authors have not demonstrated that the LSU1 cluster = state F and LSU2 cluster = states G/H

      -Advance: -Technological advance, high throughput FRAP, a powerful platform to interrogate macromolecular diffusivity.

      -Several nucleolar screens have been conducted in the past (but at steady-state, not using FRAP), in these works textural and morphological features were used together with dimensionality reduction techniques to define functional clusters of genes that impact the homeostasis of the nucleolus. Often these references are cited but it could be useful to expand a bit on some of the earlier findings to bring the new ones in perspective. Some clusters (typically, the transcriptional cluster that disrupts the nucleolus; and the late binder ribosomal proteins) have been well identified before.

      -Audience: Cell biologists, scientists involved in ribosome biogenesis research, scientists with an interest in helicases. The growing condensate community.

      -Describe your expertise: ribosome biogenesis, structure-function relationships in the nucleolus, technological development in microscopy.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors use high throughput FRAP (HiT-FRAP) in arrayed genetic screens of HeLa cells expressing nucleophosmin (NPM1)-fluorescent protein variants to monitor the biophysical properties of the nucleolus in response to genetic perturbations. HiT-FRAP uses a data adaptive imaging strategy to automatically identify and photobleach fluorescently labeled organelles in living cells and acquire movies for FRAP. Quantitative analysis of FRAP curves includes t1/2 and mobile fraction. NPM1 was monitored since it is an important nucleolar scaffolding protein that is thought to interact with many pre-ribosome intermediates.
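
      For readers less familiar with these two readouts, the following is a minimal sketch of how t1/2 and mobile fraction can be estimated from a single recovery curve. It is illustrative only and is not the authors' pipeline: the function name `frap_metrics`, the plateau estimate taken from the final fifth of the trace, and the synthetic single-exponential data are all assumptions made for this example.

```python
import math


def frap_metrics(times, intensities, pre_bleach):
    """Estimate (mobile_fraction, t_half) from one FRAP recovery curve.

    times       -- seconds after bleaching, ascending; times[0] is the
                   first post-bleach frame
    intensities -- fluorescence in the bleached region at each time
    pre_bleach  -- mean intensity of the region before bleaching

    Hypothetical helper for illustration; a real analysis would also
    correct for background and acquisition photobleaching.
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need matching time and intensity series")
    f0 = intensities[0]
    if pre_bleach <= f0:
        raise ValueError("pre-bleach intensity must exceed post-bleach")

    # Assumption: the plateau is the mean of the last fifth of the trace.
    n_tail = max(1, len(intensities) // 5)
    f_inf = sum(intensities[-n_tail:]) / n_tail
    if f_inf <= f0:
        return 0.0, None  # no measurable recovery

    # Fraction of the bleached signal that returns.
    mobile_fraction = (f_inf - f0) / (pre_bleach - f0)

    # Time to reach half of the recovered signal, found at the first
    # crossing and refined by linear interpolation between frames.
    half_level = f0 + 0.5 * (f_inf - f0)
    for i in range(1, len(times)):
        if intensities[i] >= half_level:
            y_prev, y_next = intensities[i - 1], intensities[i]
            frac = (half_level - y_prev) / (y_next - y_prev)
            return mobile_fraction, times[i - 1] + frac * (times[i] - times[i - 1])
    return mobile_fraction, None


# Synthetic recovery curve (made-up numbers, not data from the study).
tau = 10.0
times = [float(t) for t in range(101)]
curve = [0.2 + 0.6 * (1.0 - math.exp(-t / tau)) for t in times]
mobile_fraction, t_half = frap_metrics(times, curve, pre_bleach=1.0)
print(f"mobile fraction = {mobile_fraction:.3f}, t1/2 = {t_half:.2f} s")
```

      In this framing, a shorter t1/2 corresponds to the "faster" NPM1 dynamics described for LSU-associated knockdowns, and a longer t1/2 to the "slower" dynamics, while the mobile fraction reports how much of the bleached pool exchanges at all.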

      The authors depleted 65 RNA helicases (+ 9 pairs) with siRNA and found that 15 of them either increased or decreased t1/2. Knockdowns were confirmed with western blotting. RNA helicase knockdowns with faster NPM1 diffusion were associated with large subunit (LSU) assembly. Most RNA helicase knockdowns with slower NPM1 diffusion were associated with early rRNA processing via the small subunit (SSU) intermediate. The authors screened an additional 290 gene depletions of many ribosomal proteins and assembly factors. With this expanded set of perturbations, they categorized nucleoli based on four morphological features in addition to t1/2 and mobile fraction. Using principal component analysis (PCA), the authors identified clusters of genes with similar effects on NPM1 dynamics and nucleolar morphology. From this secondary screen, the majority exhibited slower NPM1 dynamics. The knockdowns associated with faster NPM1 dynamics were associated with LSU assembly, similar to the helicase experiments. The authors further analyzed several mutants of NPM1 to elucidate the likely interactions between the scaffolding protein and ribosome biogenesis factors. The accumulation of early ribosomal intermediates was associated with decreases in NPM1 dynamics, and accumulation of late intermediates led to increased NPM1 dynamics. The findings established a link between the biophysical properties of the nucleolus and the stages of ribosome biogenesis.

      Major comments:

      • The claims are supported by experimentation.
      • No additional experiments requested.
      • The experiments are adequately replicated, and statistical analysis is sufficient.
      • Methods are very detailed, which should facilitate reproducibility.

      Minor comments:

      • Prior studies are referenced appropriately.

      • A bit more coverage of background on the nucleolar scaffolding protein, nucleophosmin (NPM1) would be helpful in the introduction, perhaps in favor of the details on ribosome biogenesis

      o Paragraph 2 could be shorter or placed elsewhere

      We thank the reviewer for this suggestion and have now included some background on NPM1 in the introduction and have shortened paragraph 2.

      • Figures

      o In Figures 2 - 5: explicitly state in the figure caption what dotted lines are encircling (entire cell?)

      We have now included this in the figure captions (they encircle the nucleus).

      o In Figures 2 - 5: explicitly state what the mp-inferno LUT intensity in the images is quantitating (amount of NPM1?)

      We have now included this in the figure captions (NPM1/mScarlet intensity).

      o Figure 7: more detail in the figure caption

      We have now expanded our model figure caption.

      • The paper is quite dense with a lot of nice work, discussing many different genetic perturbations. It feels a bit overwhelming, and I think the biological significance gets somewhat lost in the presentation of all the data. Perhaps some of the presentation of results can be moved to the supplement in favor of a "leaner" main text. Currently, there are only figures in the supplement, but I feel that some of the text that is not central to the key conclusions can be moved to the supplement. I found myself getting a bit bogged down and having to re-read several times to catch the takeaway messages. Some of the clarifying statements that are found in the discussion section can be moved to the results section. In short, some reorganization would help with readability. One suggestion is to move the sections "Inhibition of rRNA transcription or the RNA exosome leads to nucleolar fragmentation" and/or "Perturbation of mRNA processing pathways results in slowed NPM1 dynamics and accumulation of rRNA precursors in the nucleolus" to the supplement.

      We thank the reviewer for this helpful suggestion. In response to this and other reviewers' comments, we have now simplified discussion of phenotypic groups, including combining the “LSU” phenotypes into a single group and discussing LSU1/2 in the supplementary text. In addition, while we have chosen to keep the “rRNA transcription/exosome” and “Other” descriptions in the main text, they have been condensed and included in one main section with the other ribosome biogenesis phenotypes to highlight this key takeaway. Remaining discussion of phenotypes is now in supplemental text, as suggested.

      Reviewer #3 (Significance (Required)):

      • General Assessment: The main claim of the paper is that nucleolar phenotype (measured by morphology and NPM1 diffusivity) is correlated with stages in ribosome assembly - i.e. the stage of ribosome assembly determines the biophysical properties of the nucleolus. A strength of the study is the wide range of genetic perturbations tested enabled by the high throughput FRAP. With FRAP, I do worry a bit about using t1/2 as the sole dynamic measurement, but it is not a deal breaker. The authors introduce morphology as another way to characterize the nucleoli.
      • The claims are well supported by extensive experiments and data. The experiments are well designed, and proper controls were conducted. To validate the method, the authors used perturbations of NPM1 dynamics from the literature including ATP depletion, blocking glycolysis and oxidative phosphorylation, inhibition with MG132, and treatment with sodium arsenite. They observed slower NPM1 diffusivity under all validation conditions.
      • Advance: The authors have introduced a high-throughput technique for extracting diffusivity with FRAP, yielding a lot of data, but I think the paper suffers a bit in trying to present so much data in the main text. The mechanistic biological insights are compelling but get a bit overshadowed. Improved organization can help the messages come across more clearly.
      • To my knowledge, there is not a similar study in the literature as the detailed mechanisms of ribosome biogenesis are not well studied.
      • Audience: The audience for this manuscript seems to be biophysical researchers, though there may be broader interest due to the wide screening of genetic perturbations.
      • Expertise: I have evaluated this manuscript from the perspective of a single-molecule biophysicist that studies protein-protein interactions between ribosome biogenesis factors. I am not an expert in FRAP, but I use FCS.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis -- mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper and I think the results will be of interest to a broad audience.

      Weaknesses:

      For the original draft of the manuscript, I had four major concerns with the study, especially related to the sampling, diet, and evidence for the 'brawn before bite' hypothesis. I still believe that the original issues that I raised may be weaknesses of the study. For example, there is still limited discussion on diets (even though the dental topographic analyses used in the study are designed for inferring diets). And I find the results a little challenging to interpret because teeth of multiple positions are included in the same samples, which seems problematic. That said, the authors have addressed each of my previous concerns and have made major revisions, including running new analyses, and thus I support the paper.

      This revised submission includes only minor changes aimed at clarifying the main text.

      Reviewer #2 (Recommendations for the authors):

      I appreciate that the authors made many improvements to their study based on reviewers' comments. I don't have any remaining major issues with the paper, but I do have several minor comments.

      Thank you for taking the time to provide additional helpful feedback on our study. We have made minor revisions to the manuscript based on your suggestions. Please see our point-by-point response below.

      Lines 48-50. I reiterate my suggestion in my previous review to explicitly state which clade is being discussed, which is important because several major mammal groups beyond placentals (metatherians, multituberculates, dryolestoids, gondwanatherians) survived the K-Pg and had very different diversification patterns. You mention "mammal taxonomic diversity" but in the next sentence say "This initial placental mammals diversification ..." and later mention "stem placental/eutherian lineages." To stay consistent, you might replace "mammal" (L48) and "placental mammals" (L50) with "eutherian(s)" (usually defined as stem + crown placentals). If you follow this suggestion, then elsewhere in the paper I recommend replacing "mammals" with "eutherians" for consistency.

      Thank you for this suggestion. We now use “mammals” throughout the text only as a general reference to the group; specific mentions of the analyzed dataset are revised to “eutherians.”

      Lines 75-83. I respect the authors' hesitancy to reconstruct specific diets for the fossil taxa (L75-83), especially considering that dental topographic analyses (DTAs) often struggle to differentiate diets in extant taxa (e.g., Pineda-Munoz et al. 2016 Methods Ecol Evol). I still think that the authors might be able to interpret dietary trends from their results (e.g., an increase in average OPCR values indicating a shift toward more herbivorous diets) - I think discussing dietary trends would be an interesting discussion topic later in the paper. That said, I also recognize that different DTA results seem to show conflicting dietary trends (based on my limited knowledge of those metrics) so maybe that complicates things too much.

      We concur with Reviewer 2 that dietary inferences of DTA data are premature, especially given the ongoing controversies of its use in studies of extant mammal teeth. We kept our current scope of discussion unchanged.

      Lines 75-77. "early mammals ... are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction." But your fossils (eutherians) are certainly within 'phylogenetic brackets' of modern clades (therians, i.e. Eutheria + Metatheria). Maybe you're alluding to the fossils being stem lineages of extant subgroups like Ungulata, which means we can't bracket them specifically within those eutherian subgroups? So, I recommend revising or expanding your statement for clarity. Also, the considerable phylogenetic uncertainty for Paleocene groups (e.g., Halliday et al. 2015) complicates this issue, which you could mention.

      We modified the sentence to now say “Additional complications with ecomorphological analysis of these stem eutherians include the uncertainty in their dietary ecology, having diverged prior to the crown radiation, and uncertainty in phylogenetic positions of Paleocene taxa [7]; thus, they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction.”

      Line 84. "We investigated dental topography-performance shifts ...". You haven't introduced dental topography or even mentioned teeth yet, and "performance shifts" is vague. So, this phrase might confuse readers. Maybe you can just erase it and start the sentence with "We investigated the timing of ecomorphological ..."?

      We made the recommended revision.

      Lines 104-105 (and elsewhere). "Dental traits paralleled Paleocene global and regional environmental conditions" and "We found that dental topographic trait variability in Paleocene mammals in south China tracked global and regional climatic changes". These conclusions seem a little too assertive to me. Your sample is grouped into 3 rough time bins (of somewhat uncertain ages) and is from a relatively small geographic range - that seems like very limited information for inferring links between dental patterns and climatic changes, especially global patterns. I think it's worth HYPOTHESIZING that dental traits are linked to environmental/climatic changes (with results like those in Figure 2A & B as evidence to support that hypothesis), but I wouldn't make that claim with any confidence. So, I recommend that you temper your relevant conclusion statements. For example, for Line 105, you could replace "We found ..." with "We posit ..." (L105). I would make similar changes to similar statements throughout the paper (e.g., L243).

      Thank you for this suggestion to temper our phrasing. We edited throughout the text to make our interpretations less assertive.

      Figure 1 (and your response to reviewers). Why was the timescale changed to 65.5 Ma for the K-Pg boundary? The K-Pg is 66 Ma (not 65.5), which is the age you mention in the text (e.g. Pg 3 L39) and is well established in the literature - see recent papers from the Paul Renne lab for a more exact age.

      We revised the figure to have the K-Pg at 66 Ma.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank you for the time you took to review our work and for your feedback! The main changes to the manuscript are:

      (1) We have performed additional experiments to increase the number of recordings from frontal and occipital electrodes (previously 51 (occipital: O1+O2) and 26 (frontal: Fp1+Fp2), now 133 and 102). The additional data have strengthened many of our results, including for example the trend for a latency difference between occipital and frontal electrodes that was likely underpowered and is now significant (Figure 3E). We have updated all relevant figures to include the additional data (Figures 2–6, Figure S4, Figure S5). None of the main conclusions have changed.

      (2) As suggested by reviewer 1, we have conducted additional experiments to rule out the possibility that the observed effects were driven by the temporal order of open and closed loop sessions (new Figure S6). We also found another 9 participants who were willing to go on the ‘vomit comet’ of six degrees of freedom (6DOF) playback (previously 5, now 14). These data have further strengthened our conclusion that playback halt responses in 4DOF and 6DOF playback are not substantially different (Figure S4).

      (3) To address the point of reviewers 2 and 3, that mismatch negativity (MMN) responses would be larger on temporal electrodes, we conducted additional experiments in which we also recorded from temporal electrodes T3–T6. We have now added a comparison of visuomotor mismatch and MMN responses on T3–T6 electrodes as Figures S8–S9. On all electrodes, visuomotor mismatch responses were larger than MMN responses.

      (4) As suggested by reviewer 1, we have added an analysis of the experience-dependent changes in mismatch responses comparing frontal and occipital responses early and late in the session (new Figure 4).

      (5) As suggested by reviewer 2, we conducted additional experiments in an independent cohort of participants (note, without concurrent EEG) to measure eye movements triggered by visuomotor mismatches. We found eye-movement speed and blink/eye-closure changes, but these had longer latency than visuomotor mismatch responses (Figure S7).

      (6) Finally, as suggested by reviewers 2 and 3, we applied independent component analysis (ICA) and time–frequency analyses to the EEG data. We show these results and explain why they are not applicable or useful in our case in the responses below.

      Please note, during the revision, we found that a part of our analysis used a bandpass of 0.2-100 Hz while a 1-100 Hz bandpass filter was used elsewhere. This has now been standardized to a 1-100 Hz bandpass filter, and the corresponding methods were updated. This resulted in no relevant changes to the figures. Additionally, the 50 Hz band-stop filter was erroneously described in the methods as 49-51 Hz. The filter used was 40-60 Hz, and the methods have been updated to reflect this.
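
      To make the corrected filter description concrete, the sketch below states which frequencies an idealized version of the two filters would keep. It is only an illustration: the function name `passes` is hypothetical, the band edges are treated as sharp cut-offs, and real filters have a gradual roll-off that depends on filter type and order, which are not specified here.

```python
def passes(freq_hz, bandpass=(1.0, 100.0), bandstop=(40.0, 60.0)):
    """Idealized check: is freq_hz kept by the band-pass and not removed by the band-stop?

    Assumption: sharp band edges. Real filters attenuate gradually near the edges.
    """
    in_bandpass = bandpass[0] <= freq_hz <= bandpass[1]
    in_bandstop = bandstop[0] <= freq_hz <= bandstop[1]
    return in_bandpass and not in_bandstop


# Filters as now described in the methods (1-100 Hz band-pass, 40-60 Hz band-stop)
kept_10hz = passes(10.0)   # inside the pass band, outside the stop band
kept_45hz = passes(45.0)   # removed by the 40-60 Hz band-stop
kept_half_hz = passes(0.5) # removed by the 1 Hz high-pass edge

# Filters as previously (erroneously) described: 49-51 Hz band-stop
kept_45hz_old_description = passes(45.0, bandstop=(49.0, 51.0))
```

      Under these assumptions, a 45 Hz component is removed by the filter actually used but would have been retained under the earlier 49-51 Hz description, which is why the correction matters for anyone interpreting activity in the low gamma range.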

      Reviewer #1 (Public review):

      In this paper, the authors wished to determine human visuomotor mismatch responses in EEG in a VR setting. Participants were required to walk around a virtual corridor, where a mismatch was created by halting the display for 0.5s. This occurred every 10-15 seconds. They observe an occipital mismatch signal at 180 ms. They determine the specificity of this signal to visuomotor mismatch by subsequently playing back the same recording passively. They also show qualitatively that the mismatch response is larger than one generated in a standard auditory oddball paradigm. They conclude that humans therefore exhibit visuomotor mismatch responses like mice, and that this may provide an especially powerful paradigm for studying prediction error more generally.

      Asking about the role of visuomotor prediction in sensory processing is of fundamental importance to understanding perception and action control, but I wasn't entirely sure what to conclude from the present paradigm or findings. Visuomotor prediction did not appear to have been functionally isolated. I hope the comments below are helpful.

      (1) First, isolating visuomotor prediction by contrasting against a condition where the same video stream is played back subsequently does not seem to isolate visuomotor prediction. This condition always comes second, and therefore, predictability (rather than specifically visuomotor predictability) differs. Participants can learn to expect these screen freezes every 10-15 s, even precisely where they are in the session, and this will reduce the prediction error across time. Therefore, the smaller response in the passive condition may be partly explained by such learning. It's impossible to fully remove this confound, because the authors currently play back the visual specifics from the visuomotor condition, but given that the visuomotor correspondences are otherwise pretty stable, they could have an additional control condition where someone else's visual trace is played back instead of their own, and order counterbalanced. Learning that the freezes occur every 10-15 s, or even precisely where they occur, therefore, could not explain condition differences. At a minimum, it would be nice to see the traces for the first and second half of each session to see the extent to which the mismatch response gets smaller. This won't control for learning about the specific separations of the freezes, but it's a step up from the current information.

      In theory, it is correct that the open loop (playback) session is predictable. In practice, however, it is unrealistic that participants could exploit this. The open loop session is a 5-minute sequence that participants have only experienced once before, when they were generating it in the closed loop session a couple of minutes earlier. It is unlikely that participants would remember the entire sequence to a precision of less than a second, which is what they would need to predict the mismatch event. However, the reviewer is correct that it is possible that the mismatch events lose salience with time, for example as a consequence of participants losing interest in the task with time, or by undergoing some form of adaptation. To address this, we repeated the experiments with the sequence of closed and open loop sessions reversed (Figures S6A-S6C), and we analyzed the responses as a function of time within the session (Figures S6D and S6E), as suggested.

      The reversed-order design consisted of (1) open loop session: a playback, in which participants viewed the recorded closed loop session of a previous participant. This was followed by (2) a closed loop session, in which participants actively walked through the tunnel and experienced visuomotor mismatch events. Using this design, we again found that responses in the closed loop session were significantly larger than in the open loop session (Figures S6A-S6C).

      In addition, we analyzed both new and previously collected data as a function of time in the session. We computed moving average responses across 10 mismatch or playback halt trials at different percentages of progress through the paradigm (Figures S6D and S6E). This analysis revealed no consistent experience-dependent changes that could account for the observed differences between closed and open loop sessions. While there was indeed some form of experience-dependent attenuation of visuomotor mismatch responses (see new Figure 4), the difference at the transition from mismatch to playback halt (and vice versa) far exceeded these adaptation effects (Figures S6D and S6E). This analysis was performed only on data from participants for whom we had both closed and open loop sessions and who met our inclusion criteria.
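
      The moving-average analysis described above can be sketched as follows. This is a minimal illustration, not the analysis code: the function name, the single amplitude value per trial, and the example numbers are all hypothetical.

```python
def moving_average_response(trial_amplitudes, window=10):
    """Average response over a sliding window of consecutive trials.

    Returns (progress_percent, mean_amplitude) pairs, where progress is the
    position of the window centre within the trial sequence (0-100 %).
    """
    n = len(trial_amplitudes)
    if n < window:
        return []
    curve = []
    for start in range(n - window + 1):
        chunk = trial_amplitudes[start:start + window]
        centre = start + (window - 1) / 2
        progress = 100.0 * centre / max(n - 1, 1)
        curve.append((progress, sum(chunk) / window))
    return curve


# Hypothetical per-trial response amplitudes (µV):
# 12 mismatch trials followed by 12 playback halt trials.
amplitudes = [5.0] * 12 + [2.0] * 12
curve = moving_average_response(amplitudes)
```

      With data of this shape, a slow within-session drift would appear as a gradual slope across the whole curve, whereas a condition effect would appear as a step confined to the windows that straddle the transition between sessions.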

      We used a similar analysis to test whether early and late responses within a session systematically differed (new Figure 4). Here, to maximize the chance of finding a difference, we compared early (first five) and late (last five) trials. Behaviorally, participants reduced their walking speed following mismatch events, with a significantly larger reduction during early trials (14.3%) than during late trials (5.7%) (Figure 4A). Neural responses mirrored this pattern primarily on frontal electrodes: frontal activity showed a clear attenuation from early to late trials (Figure 4B), consistent with the reduction in behavioral responses. In contrast, changes on occipital electrodes were much smaller between early and late trials (Figure 4C-4D). Thus, experience-related modulation is substantially stronger in frontal compared to occipital regions.

      In sum, we do not believe that the difference between visuomotor mismatch responses and playback halt responses can be explained by differences in the predictability of mismatch and playback halt events.

      (2) Second, the authors admirably modified their visual-only condition to remove nausea from 6 df of movement (3D position, pitch, yaw, and roll). However, despite the fact it's far from ideal to have nauseous participants, it would appear from the figures that these modifications may have changed the responses (despite some pairwise lack of significance with small N). Specifically, the trace in S3 (6DOF) and 2E look similar - i.e., comparing the visuomotor condition to the visual condition that matches. Mismatch at 4/5 microvolts in both. Do these significantly differ from each other?

      Yes, the 6DOF playback halt response shown in the previous Figure S3 and the mismatch response shown in previous Figure 2E are significantly different (Author response image 1).

      Author response image 1.

      Comparison of visuomotor mismatch response (A) and 6DOF playback halt response (B) from the original submission with statistics of the comparison (C).

      Nevertheless, to strengthen this conclusion, we collected additional data in the 6DOF condition. We show the comparison for participants for whom both closed loop (active) and open loop sessions (6DOF) were recorded within the same recording session (14 participants) in Figure S4. Consistent with our previous findings, visuomotor mismatch responses were significantly larger than 6DOF playback halt responses (Figures S4A-S4C). And we found no evidence of a difference between 6DOF and 4DOF playback halt responses (Figures S4D and S4E).

      (3) It generally seems that if the authors wish to suggest that this paradigm can be used to study prediction error responses, they need to have controlled for the actions performed and the visual events. This logic is outlined in Press, Thomas, and Yon (2023), Neurosci Biobehav Rev, and Press, Kok, and Yon (2020) Trends Cogn Sci ('learning to perceive and perceiving to learn'). For example, always requiring Ps to walk and always concurrently playing similar visual events, but modifying the extent to which the visual events can be anticipated based on action. Otherwise, it seems more accurately described as a paradigm to study the influence of action on perception, which will be generated by a number of intertwined underlying mechanisms.

      We are not entirely sure we understand the point here correctly. If the reviewer is suggesting that visuomotor coupling is not describable by the ideas of predictive processing, we disagree. However, given that the papers the reviewer is pointing to are premised on what seems to be a somewhat unorthodox interpretation of predictive processing when it comes to cortical circuits, we suspect this is contributing to the misunderstanding here. Let us briefly explain. In the two papers, Press and colleagues argue that most experiments cannot distinguish between “predictive cancellation” and “gated suppression”. This is indeed relatively tricky, even when one has single neuron data. The question is, does movement simply suppress sensory feedback (as is likely the case e.g. in the famous example of the cricket), or does movement result in a precise removal of only the self-generated sensory reafference? The first good evidence of the latter happening in any system is quite recent (Keller and Hahnloser, 2009). The premise the authors build their argument on is that the theory posits that “the brain predictively ‘cancels’ expected action outcomes from perception” (from the abstract of one of the papers). This is incomplete. The minimum circuit for predictive processing is composed of 3 neuron types: positive prediction error neurons, negative prediction error neurons, and internal representation neurons. Only the positive prediction error neurons have the predictive cancellation property the authors discuss. This is not the case for either negative prediction error neurons, or for the internal representation neurons. Negative prediction error neurons are excited by predictions and suppressed by sensory input (i.e. if anything, they are “predictively amplified”). This circuit is relatively well characterized in mouse cortex – for a brief summary see (Keller and Mrsic-Flogel, 2018). 
      Note, this is not our idea of course – the original formulation of predictive processing (Rao and Ballard, 1999) was built to explain end-stopping. These are responses to the absence of an expected line that were stronger than would be expected from classical theories (i.e. negative prediction error responses). In mouse visual cortex, we know that a sudden break in the coupling between locomotion and visual flow selectively activates layer 2/3 negative prediction error neurons. Thus, if human cortex also implements a predictive-processing-like circuit with positive and negative prediction error neurons, we would expect a break in visuomotor coupling to drive a measurable response in visual cortex by exciting the population of negative prediction error neurons. This is also why we are quite excited by the phase reversal of visual and mismatch responses: it could indicate that mismatch activates negative prediction error neurons first and positive prediction error neurons later, and vice versa for visual stimulation, given that negative prediction error neurons are more superficial in cortex (O’Toole et al., 2023). We do indeed find a response over occipital cortex consistent with the negative prediction error response we observe in mouse cortex. The difficulty in distinguishing “predictive cancellation” and “movement driven suppression” comes only when looking at positive prediction error type responses (that are suppressed by predictive inputs) but does not apply to negative prediction error responses. The predictive processing circuit we are testing is the one described by Keller and Mrsic-Flogel (2018) and Rao and Ballard (1999), and here the break in visuomotor coupling is a stimulus that drives negative prediction error responses. Note, other authors who have thought about cortical implementations of predictive processing (e.g. Bastos et al., 2012) have glossed over the problem that individual neurons cannot trivially encode both positive and negative errors.
      Prediction errors are a signed quantity. If neurons signal prediction errors in firing rates and are close to zero firing rate at baseline (as is the case in layer 2/3 of cortex), they cannot (short of rather exotic ideas) encode a signed prediction error. Hence such proposals are not very useful for thinking about prediction error responses in cortex. For these reasons, we see no problem with referring to the response as a prediction error response. This is in line with a large body of mouse research (using a nearly identical paradigm) on the topic.
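
      The argument that a signed error needs two non-negative channels can be illustrated with a toy calculation. This is a sketch under simplifying assumptions (scalar input and prediction, rectified-linear rates, zero baseline); the function name and values are hypothetical and are not a model of the actual circuit.

```python
def prediction_error_rates(sensory_input, prediction):
    """Split a signed prediction error into two non-negative firing rates.

    positive_pe: driven by input, suppressed by prediction (more input than predicted)
    negative_pe: driven by prediction, suppressed by input (less input than predicted)
    """
    error = sensory_input - prediction
    positive_pe = max(error, 0.0)
    negative_pe = max(-error, 0.0)
    return positive_pe, negative_pe


# Visuomotor mismatch: locomotion predicts visual flow (1.0), but the display halts (0.0).
mismatch = prediction_error_rates(sensory_input=0.0, prediction=1.0)

# Unexpected visual flow while stationary: input (1.0) without a prediction (0.0).
unexpected_flow = prediction_error_rates(sensory_input=1.0, prediction=0.0)

# Closed loop: visual flow matches the prediction, so neither population responds.
matched = prediction_error_rates(sensory_input=1.0, prediction=1.0)
```

      In this toy version, the halt of visual flow during walking drives only the negative prediction error channel, which is the response type the mismatch paradigm is designed to elicit.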

      One could of course argue that gated suppression could also mean that movement relieves suppression. Thus, one could assume that some neurons are suppressed by movement while others are enhanced. If one allows for enough neuron and stimulus specificity in the precision of the movement related suppression and enhancement of responses, the two models (predictive processing and gated suppression) become equivalent, and the discussion becomes semantic. See (Vasilevskaya et al., 2023) for an extended discussion on this point, and the reasons why we think predictive processing is a more useful model than gated suppression (keep in mind, gated suppression only explains the data if we allow for stimulus/neuron specific gain factors of the suppression, in which case the two models are equivalent).

      More minor points:

      (1) I was also wondering whether the authors may consider the findings in frontal electrodes more closely. Within the statistical tests of the frontal electrodes against 0, as displayed in Figure 3c, the insignificance of the effect of Fp2 seems attributable to the small included sample size of just 13 participants for this electrode, as listed in Table S1, in combination with a single outlier skewing the result. The small sample size stands out especially in comparison to the sample size at occipital electrodes, which is double and therefore enjoys far more statistical power. It looks like the selected time window is not perfectly aligned for determining a frontal effect, and also the distribution in 3B looks like responses are absent in more central electrodes but present in occipital and frontal ones. I realise the focus of analysis is on visual processing, but there are likely to be researchers who find the frontal effect just as interesting.

      That is correct; our data in frontal electrodes was likely underpowered. The reason we have fewer data in frontal electrodes is that eye-blink artifacts are particularly strong in frontal channels, resulting in a larger proportion of trials failing to meet our data inclusion criteria. We have now added more data from frontal and occipital electrodes by including additional experimental sessions. In addition, we applied less stringent trial-exclusion criteria, requiring that no artifacts occur within the time window −0.5 to 1 s relative to the event trigger (instead of −0.5 to 2 s). This adjustment allowed us to retain a larger number of trials. As anticipated by the reviewer, this increase in data was sufficient to confirm a significant response to the visuomotor mismatch event at both frontal electrodes (Figure 3C). The expanded dataset also revealed a significant difference in response onset times between occipital and frontal electrodes (Figure 3E), an effect that was not significant previously. In addition, we have included analysis comparing early and late mismatch responses in frontal and occipital electrodes (Figure 4).
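
      The effect of shortening the artifact-free window can be sketched as follows. This is an illustration only: the function name, the representation of artifacts as time stamps, and the example times are hypothetical and do not describe the actual artifact detection.

```python
def keep_trial(event_time, artifact_times, window=(-0.5, 1.0)):
    """Return True if no artifact falls inside the window around the event.

    All times are in seconds; `window` is relative to the event trigger.
    """
    start = event_time + window[0]
    stop = event_time + window[1]
    return not any(start <= t <= stop for t in artifact_times)


events = [10.0, 22.0, 35.0]   # hypothetical mismatch onsets (s)
artifacts = [11.5, 34.8]      # hypothetical artifact times (s)

# Previous criterion: artifact-free from -0.5 to 2 s
strict = [keep_trial(e, artifacts, window=(-0.5, 2.0)) for e in events]

# Revised criterion: artifact-free from -0.5 to 1 s
relaxed = [keep_trial(e, artifacts, window=(-0.5, 1.0)) for e in events]
```

      In this example, the first trial is rejected under the longer window because of an artifact 1.5 s after the trigger but is retained under the shorter one, while a trial with an artifact just before the trigger is rejected under both.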

      (2) It is claimed throughout the manuscript that the 'strongest predictor (of sensory input) - by consistency of coupling - is self-generated movement'. This claim is going to be hard to validate, and I wonder whether it might be received better by the community to be framed as an especially strong predictor rather than necessarily the strongest. If I hear an ambulance siren, this is an especially strong predictor of subsequent visual events. If I see a traffic light turn red, then yellow, I can be pretty certain what will happen next. Etc.

      This is a statistical argument. Every movement – throughout life – is directly and immediately coupled to sensory feedback and has been throughout evolutionary history. The vast majority of visual input you receive (we estimate, well above 99%) is the consequence of your own movements (e.g. every few 100 ms your eye movements cause a full field change in your visual input). The same is likely true of proprioceptive and somatosensory input – the vast majority is the direct consequence of your own movements (not other people poking you). This is likely different in the auditory system where a much larger fraction of the input is externally driven (depending a bit on how much one likes to talk). But even here the best predictor is self-motion (most non-self-generated sounds one experiences in life are very difficult to predict with millisecond precision). The example the reviewer gives is a good illustration of this. Take the siren that hails the appearance of an ambulance. The siren tells us that an ambulance will appear, but not how it will look, not when exactly it will appear, and with only very low resolution as to where it will appear. Incidentally, if you ask people to draw an ambulance they tend to draw a WWII style white square vehicle with a red cross on the side – a style of ambulance they likely have not ever seen in life. Their visual predictions of what they are about to see are very low resolution. We catastrophically fail at making pixel perfect predictions from learned stimulus associations of this nature. The traffic light example is difficult to compare to visual feedback control of movement as it is a much simpler prediction of a single bit in the form of a change in color of an existing object.

      In addition, consider how often (in life) you have seen an ambulance after hearing it. 100 times maybe? Maybe less. How often have you seen traffic lights change: 10 000 times? 100 000 times? Now consider how often you have experienced the visual consequences of moving your head or eyes to the left (keep in mind this includes microsaccades). At a conservative once per second, that is somewhere on the order of 1 000 000 000 times. This is not even in the same ballpark. Our brains can certainly learn to make the ambulance and traffic light type predictions - to some extent - but by far the best predictor of sensory feedback (simply by virtue of the physics of how our body interacts with the world) is self-motion.
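
      The order-of-magnitude comparison can be written out as a back-of-the-envelope calculation. The rate of once per second and the ambulance and traffic light counts are the figures given above; the waking hours per day and the number of years are assumptions introduced only for this illustration.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365

events_per_second = 1        # "a conservative once per second"
waking_hours_per_day = 16    # assumption for illustration
years_of_experience = 30     # assumption for illustration

self_motion_events = (
    events_per_second
    * SECONDS_PER_HOUR
    * waking_hours_per_day
    * DAYS_PER_YEAR
    * years_of_experience
)

ambulance_events = 100          # upper estimate used in the text
traffic_light_events = 100_000  # upper estimate used in the text

ratio_vs_traffic_lights = self_motion_events / traffic_light_events
ratio_vs_ambulances = self_motion_events / ambulance_events
```

      Under these assumptions, the count comes to roughly 6 x 10^8, i.e. approaching 10^9, which exceeds the traffic light estimate by more than three orders of magnitude and the ambulance estimate by more than six.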

      We think this is an argument we can make based on first principles, and one that is frequently overlooked in the field, as experiments often focus on training people or animals to learn novel associations that, especially in the case of mice, we often have no idea whether cortical circuits can even learn. We should focus experiments on the predictive systems our brains have evolved since long before the evolutionary appearance of ambulances and traffic lights. We understand that the reviewer may disagree with this, but unless the reviewer has a concrete example of an even stronger predictor (as measured by frequency of experience, consistency in coupling, and precision in timing – we can’t think of one), it is a point we will make.

      (3) The checkerboard inversion response at 48 ms is incredibly rapid. Can the authors comment more on what may drive this exceptionally fast response? It was my understanding that responses in this time window can only be isolated with human EEG by presenting spatially polarized events (cf. c1, e.g., Alilovic, Timmermans, Reteig, van Gaal, Slagter, 2019, Cerebral Cortex).

      We don’t know, but it is not inconsistent with previous reports. For example, compare the “standing” and “fast walking” target ERP responses in Figure 5 of (Gramann et al., 2010). Both here and in our data, the fast response peak is only really apparent in the direct comparison of visual responses recorded while participants were walking to those when they were stationary.

      While we have taken great care to calibrate the timing of the visual display with the EEG recording, one could be worried that the alignment is off by as much as tens of milliseconds. However, even if this were so, one could use P1 as a reference and determine that the fast peak precedes P1 by about 40 ms, which again would result in a latency of about 50 ms for the fast walking peak (assuming P1 peaks at about 90 ms). In sum, we have added a reference to the previous work (that we found thanks to the reviewer’s comment) but fear we have nothing intelligent to say beyond that.

      Reviewer #2 (Public review):

      Summary:

      This study investigates whether visuomotor mismatch responses can be detected in humans. By adapting paradigms from rodent studies, the authors report EEG evidence of mismatch responses during visuomotor conditions and compare them to visual-only stimulation and mismatch responses in other modalities.

      Strengths:

      (1) The authors use a creative experimental design to elicit visuomotor mismatch responses in humans.

      (2) The study provides an initial dataset and analytical framework that could support future research on human visuomotor prediction errors.

      Weaknesses:

      (1) Methodological issues (e.g., volume conduction, channel selection, lack of control for eye movements) make it difficult to confidently attribute the observed mismatch responses to activity in visual cortical regions.

      (2) A very large portion of the data was excluded due to motion artefacts, raising concerns about statistical power and representativeness. The criteria for trial inclusion and the number of accepted trials per participant appear arbitrary and not justified with reference to EEG reliability standards.

      (3) The comparison across sensory modalities (e.g., auditory vs. visual mismatch responses) is conceptually interesting, but due to the choice of analyzing auditory mismatch responses over occipital channels, it has limited interpretability.

      We have responded to these points in the more detailed itemization below.

      The authors successfully demonstrate that visuomotor mismatch paradigms can, in principle, be applied in human EEG. However, due to the issues outlined above, the current findings are relatively preliminary. If validated with improved methodology, this approach could significantly advance our understanding of predictive processing in the human visual system and provide a translational bridge between rodent and human work.

      Reviewer #2 (Recommendations for the authors):

      Overall, the study addresses an interesting and underexplored question (translation of the visuomotor mismatch responses observed in rodents to humans). Below, please find a list of specific suggestions for improvement.

      Introduction:

      (1) "updating internal representations and internal models" - what is the difference between the two, and why is it relevant to this study?

      In a nutshell, an internal model is the synaptic weight matrix that transforms between coding spaces. An internal representation is the activity pattern coding for the current representation. See (Aizenbud et al., 2025; Keller and Mrsic-Flogel, 2018) for more lengthy elaborations. The fact that the mechanism used for representation update can also be used to update internal models (i.e. solve the credit assignment problem) is likely the prime advantage of predictive processing (see work from the Bogacz lab). The relevance to the current study is justifying why predictive processing is a reasonable hypothesis for the function of cortex.

      (2) "Certain stimuli can be predicted from the preceding sensory input" vs. "Predictions can also be based on memory" - how are these two different? Do you mean specific (e.g., long-term associative or episodic) memory types in the latter?

      Correct, this is an arbitrary distinction that primarily makes sense in the light of experimental approaches. In this particular case, we were talking about spatial memory. We made this explicit to increase clarity.

      (3) "the strongest predictor - by consistency of coupling - is self-generated movement"

      (a) Externally induced movement, while not self-generated and therefore not predicted, will also generate sensory coupling, so is it really only about consistency?

      Externally induced movement (as in somebody else moving one’s arm; we are not sure this is what the reviewer means) will induce sensory-sensory coupling but not sensorimotor coupling. We might be misunderstanding the point. In case the reviewer means stimuli that trigger movement, as in us asking participants to walk, or a sudden startle stimulus that makes them jump: in all such cases there are of course sensorimotor predictions. Sensorimotor predictions are driven by efference copies of the motor command; thus, all movements, whether ‘voluntarily’ executed or triggered by an external stimulus, will drive sensorimotor predictions. (All of this of course assumes that the predictive processing theory is correct.)

      (b) Do you mean temporal consistency (minimal lags), statistical contingencies (same movements linked to the same sensory inputs), or both? How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?

      Both. We have rephrased the sentence to try to make this clearer. See also response to reviewer 1 minor point 2 above.

      How does it differentiate sensorimotor/visuomotor mismatch responses from responses to incongruent stimuli in sensory modalities (e.g. audiovisual)?

      Most cross-modal associations are much less consistent (the exact sound of a glass shattering is always slightly different and impossible for us to predict), and orders of magnitude less frequently experienced, than sensorimotor associations. Again, see also response to reviewer 1 minor point 2 above.

      (4) "Every movement is directly coupled to sensory feedback throughout life"

      This may be the case for proprioceptive and/or somatosensory feedback, but not necessarily for visual feedback (e.g., a mouse moving its tail), which is the topic of the study.

      Correct, there are movements that can be disconnected from visual feedback. Most of the time, however, most movements are not, and we are studying one of the more prominent ones that is clearly not decoupled: locomotion. The contrast we aim to highlight here very prominently is that there is still this vague idea in the field that you can take a participant, or a mouse, and expose them/it to a few tens or hundreds of trials of some sensory stimulus contingency and then probe for prediction error responses to a pattern only recently, if at all, learned. Given the life-long experience of subjects and mice, is it really surprising that oddball responses are less strong than a sensorimotor mismatch?

      (5) "However, the overall level of this motor-related activity is much higher than one would expect simply from predictions of visual feedback that are compared against visual input."

      Could you please clarify what one would expect in this case, and/or back it up with citations?

      This is in reference to the fact that there are very strong movement-related signals in the mouse visual cortex that persist even when the mouse is in complete darkness. In darkness, movements should not trigger any change in visual feedback; hence, the activity is difficult to explain as a movement-related prediction of visual flow. We have rephrased this section of the introduction to make this clearer.

      (6) "The more precise the prediction and comparison, the less motor-related activity should be detectable in visual cortex."

      I think this conflates two issues. A good match between prediction and input would indeed result in sensory attenuation. However, sensory precision, at least in active inference, can upregulate prediction error responses. Since predictions cannot be assumed to be perfect (due to external or internal noise), increased precision may therefore augment activity. See e.g. https://doi.org/10.1007/s10339-013-0571-3

      We agree with the reviewer – the phrasing here was misleading. We do not mean precision in the predictive processing sense, but the precision of sensorimotor control necessary for the behavior. We have rephrased the corresponding section of the manuscript.

      (7) Neither the introduction nor the discussion refers to previous human EEG studies on sensorimotor mismatch responses, where sensory feedback doesn't match motor actions (e.g. https://doi.org/10.3758/s13423-021-01992-z ; https://www.sciencedirect.com/science/article/pii/S0028393214003777 ; https://www.sciencedirect.com/science/article/pii/S0028393219301265).

      The studies cited by the reviewer primarily test how discrete violations of learned action–outcome associations are represented in the brain, whereas our visuomotor mismatch paradigm probes violations of continuous sensorimotor coupling during ongoing action. The paradigms are conceptually different both in how strong the coupling is (lifelong vs. learned in the experiment), and in how prediction errors are likely used (visuomotor control vs. stimulus detection). We have added a brief part to our introduction discussing this.

      Results:

      (1) A very large proportion of the dataset was excluded due to movement artefacts. This is rather problematic as

      (a) the rationale behind finding mismatch responses is that motion-related (neural) signals should affect visual cortical activity, so it's essential to disentangle these neural signals from artefacts;

      Correct, we excluded 21.7% of the total data for the visuomotor mismatch paradigm. Note that this percentage is comparable to that of similar studies of EEG recordings during movement (Oliveira et al., 2016). By “problematic”, we assume the reviewer means the fact that we have artefacts, not that we exclude trials with artefacts. The movement artefacts are typically caused by the acceleration during stepping in participants with a heavy gait. None of these movement artefacts are time-locked to any of the responses we investigate. Thus, they should just appear as increased levels of noise if not excluded. We don’t understand why the reviewer thinks this is particularly problematic for our analysis/conclusions (beyond the trivial consequence of increasing noise levels that would only cause us to underestimate the strength of the mismatch signals we report).

      (b) the criterion for the number of trials of 15 triggers (per condition?) is arbitrary and lower than widely used in the literature, so authors should demonstrate that this is a sufficient number to observe a measurable ERP even for those participants with 15 triggers;

      We have between 16 and 25 visuomotor mismatch events per participant. Author response image 2 shows a selection of single-participant examples with different numbers of trials. The number of mismatch events is limited by the fact that we introduce them approximately every 10–15 s and that the closed loop session has a total duration of 5 minutes. Thus, on average, we expect to have 24 mismatch events. But we are not sure we understand the logic of the comment: if we set the exclusion criterion too low, we just risk losing a response in the noise. And we clearly have stronger mismatch responses, with a higher signal-to-noise ratio, from an average of 20 trials than visual responses during movement from an average of 40 trials or MMN responses from an average of 28 trials.
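
      The expected count of 24 events follows from simple arithmetic. A minimal sketch, assuming the gaps between mismatch events are spread evenly between 10 and 15 s (the function name and the even-spacing assumption are illustrative and not taken from the manuscript):

```python
def expected_event_count(session_s: float, min_gap_s: float, max_gap_s: float) -> float:
    """Expected number of events when gaps are spread evenly between min and max."""
    mean_gap_s = (min_gap_s + max_gap_s) / 2.0  # 12.5 s for a 10-15 s range
    return session_s / mean_gap_s

# 5-minute closed loop session, one mismatch roughly every 10-15 s
n_expected = expected_event_count(session_s=5 * 60, min_gap_s=10, max_gap_s=15)
print(n_expected)  # 24.0
```

      Counts below this value for individual participants would then reflect trials removed by artefact rejection.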

      Author response image 2.

      Reliable ERPs can be observed with as few as 16 trials across EEG channels. (A) Histograms showing the distribution of the number of valid mismatch trials per participant for each electrode pair (Fp1–2, C3–4, P3–4, O1–2). (B) Representative EEG responses to visuomotor mismatch events from a single participant, recorded at electrode pairs Fp1–2, C3–4, P3–4, and O1–2. Waveforms were computed using the indicated number of trials (shown above each trace). Dashed vertical red lines are onset and offset of the visuomotor mismatch.

      (c) it seems that the seemingly static "visual" condition resulted in a larger proportion of data rejected due to movement (or, as later mentioned, nausea) than the "visuomotor" condition, which is counterintuitive and needs further explanation;

      This is a misunderstanding: the ‘visual paradigm’ the reviewer is referring to comprises the experiments shown in Figure 1. Here we record visual responses in both sitting and walking participants. In this experiment, as in others, exclusion was primarily driven by the part of the paradigm in which the participants were moving. To make this clearer, we have added Table S2 to the manuscript, which provides an overview of trials excluded by paradigm and session.

      (d) authors mention eye movements as a potential issue, which should be possible to detect from frontal channels. Additionally, it's not entirely clear how many datasets were discarded (the results section mentions 19/48 in the visual condition, then 4+11 in the playback condition - isn't this the same condition?)

      The visual paradigm corresponds to the data shown in Figure 1, in which participants viewed a flipping checkerboard in both a walking and a stationary session. The open loop session is part of the visuomotor paradigm shown in Figure 2, where participants were exposed to a replay of the visual flow that had been self-generated during the preceding closed loop session, including the visual flow halts that constituted visuomotor mismatches in the closed loop session. Please note, to avoid such confusion, we have attempted to standardize the usage of paradigm (visual vs. visuomotor) and session (sitting vs. walking, and closed loop vs. open loop) throughout. In addition, we have added a table to summarize the number of excluded trials by paradigm and session as Table S2 to the manuscript.

      In comments 1 and 2 of the public review, the reviewer also points out that we did not control for eye movements and, we presume relatedly, claims that we did not use common EEG reliability standards. Regarding the first point, we performed additional experiments in an independent cohort of participants to test whether eye movements could account for the visuomotor mismatch responses. We recorded eye movements during closed loop sessions and found that changes in eye speed (Figure S7A) or blink rate (Figure S7B) following the mismatch stimulus had a longer latency than visuomotor mismatch responses in EEG. This suggests that the visuomotor mismatch response cannot be explained by eye blinks or changes in eye movement speed. Regarding the second point, we are not sure we understand. Trial exclusion based on a fixed voltage threshold of 100 µV is relatively common, and our rejection rates are on par with, and on occipital electrodes even lower than, those of other work on EEG recordings during locomotion or movement (see e.g. Oliveira et al., 2016).
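
      Fixed-threshold trial exclusion of this kind can be sketched as follows. This is only an illustration: it assumes the threshold is applied to the absolute amplitude of each single-channel trial, and the function name, array layout, and simulated data are hypothetical rather than the authors' pipeline.

```python
import numpy as np

def reject_by_amplitude(epochs_uv: np.ndarray, threshold_uv: float = 100.0):
    """Keep trials whose absolute amplitude never exceeds a fixed threshold.

    epochs_uv: array of shape (n_trials, n_samples), in microvolts.
    Returns (kept_epochs, keep_mask).
    """
    peak_abs = np.max(np.abs(epochs_uv), axis=1)  # largest deflection per trial
    keep_mask = peak_abs <= threshold_uv
    return epochs_uv[keep_mask], keep_mask

# Hypothetical data: 3 trials of low-amplitude noise, the second with a 250 µV artefact.
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 5.0, size=(3, 500))
epochs[1, 200] = 250.0

kept, mask = reject_by_amplitude(epochs)
print(mask)               # [ True False  True]
print(1.0 - mask.mean())  # fraction of trials rejected
```

      Because the rule has a single parameter, the rejection rate it produces is straightforward to compare across studies.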

      Nevertheless, we did attempt to apply independent component analysis (ICA) based filtering to the EEG data (Delorme and Makeig, 2004). However, these methods were designed for high channel density recordings. With only 8 channels, ICA is unable to reliably isolate eye movement or motion artefact components of the EEG. To illustrate this, we tested two artifact-rejection strategies. In the first approach, components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if at least 90% of the component’s variance was assigned to a single artifact class (Author response image 3A). In the second, more permissive approach aimed specifically at reducing eye movement artifacts, components were removed if artifact-related activity exceeded 90% for non-eye artifacts, while the threshold for eye-related components was lowered to 60% (Author response image 3C). We lowered the threshold for excluding eye-related components to ensure that EEG signals influenced by eye movements were effectively removed. In both cases - whether the eye-component threshold was set to 90% or 60% - the averaged responses to visuomotor mismatch trials remained largely similar to the previously reported data, despite higher noise in some traces. Interestingly, when we then followed the ICA filtering with our voltage-threshold-based exclusion at a threshold of 100 µV, the resulting traces closely resembled the patterns described in the paper (Author response image 3B and 3D). Thus, we conclude that the non-ICA-filtered responses are easier to interpret, free of any potential ICA filtering artifacts, and far less dependent on parameter choices (of the ICA filtering).
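
      The two removal rules can be written as one decision function with a separate threshold for eye-related components. The sketch below is hypothetical: it assumes some classifier has already assigned each component a share per class, and the class labels, function name, and example values are illustrative, since the response does not specify the classification tool.

```python
def components_to_remove(class_shares, artifact_threshold=0.90, eye_threshold=0.90):
    """Return indices of components flagged as artefacts.

    class_shares: list of dicts mapping a class label to the share of the
    component attributed to that class (values between 0 and 1).
    """
    flagged = []
    for idx, shares in enumerate(class_shares):
        for label, share in shares.items():
            if label == "brain":
                continue  # neural components are never removed
            limit = eye_threshold if label == "eye" else artifact_threshold
            if share >= limit:
                flagged.append(idx)
                break
    return flagged

# Hypothetical classifier output for four components.
shares = [
    {"brain": 0.85, "eye": 0.10, "muscle": 0.05},
    {"brain": 0.05, "eye": 0.93, "muscle": 0.02},
    {"brain": 0.30, "eye": 0.65, "muscle": 0.05},
    {"brain": 0.04, "eye": 0.01, "muscle": 0.95},
]
strict = components_to_remove(shares)                          # all thresholds at 90%
permissive = components_to_remove(shares, eye_threshold=0.60)  # eye threshold at 60%
print(strict, permissive)  # [1, 3] [1, 2, 3]
```

      In this toy case the third component is removed only under the permissive rule, which shows how the outcome depends on the chosen thresholds.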

      Author response image 3.

      Removal of artifacts identified with ICA does not change the visuomotor mismatch responses. (A) Visuomotor mismatch responses recorded from occipital electrodes after artifact correction. Components associated with non-neural artifacts (e.g., muscle activity, line noise, eye movements) were removed only if ≥90% of the component’s variance was attributed to a single artifact class. Solid black line represents the mean, and shading indicates the SEM across participants. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but excluding trials with amplitudes exceeding 100 µV. (C) As in A, but components were removed if artifact-related activity exceeded 90% for non-ocular artifacts, while the threshold for eye-related components was lowered to 60%. (D) As in C, but excluding trials with amplitudes exceeding 100 µV.

      (2) The finding that mismatch responses are observed at all channels, with differences in amplitudes but not latencies, indicates that volume conduction may affect the results. I would strongly suggest accounting for this using a method appropriate for the very small number of channels, e.g., phase lag index.

      We are not sure we understand. The phase lag index is a method to estimate functional connectivity in a way that corrects for volume conduction (using phase lag). We make no claims about functional connectivity; thus, we are not sure what the reviewer is suggesting we do. The fact that the visual and visuomotor mismatch responses were measurable on all electrodes could indeed be in part explained by volume conduction, but we see no way to estimate the volume conduction contribution. From mouse calcium imaging data, we know that both visual and visuomotor mismatch responses spread across large parts of dorsal cortex (including frontal regions like the ACC).
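
      For reference, the phase lag index between two signals is commonly defined as the absolute value of the time-averaged sign of the sine of their phase difference, so zero-lag coupling of the kind produced by volume conduction contributes nothing. A minimal sketch, assuming narrow-band signals and an FFT-based analytic signal (the helper names and test signals are illustrative):

```python
import numpy as np

def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal via the FFT: keep DC, double positive frequencies, drop negative ones."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def phase_lag_index(x: np.ndarray, y: np.ndarray) -> float:
    """PLI = |mean over time of sign(sin(phase_x - phase_y))|."""
    dphi = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return float(np.abs(np.mean(np.sign(np.sin(dphi)))))

t = np.arange(1000) / 1000.0                    # 1 s sampled at 1 kHz
x = np.sin(2 * np.pi * 10 * t)                  # 10 Hz oscillation
lagged = np.sin(2 * np.pi * 10 * t - np.pi / 2) # same oscillation, quarter-cycle lag

pli_zero_lag = phase_lag_index(x, x)     # identical signals: no consistent lag
pli_lagged = phase_lag_index(x, lagged)  # consistent non-zero lag: close to 1
print(pli_zero_lag, pli_lagged)
```

      The measure therefore quantifies coupling between channel pairs and does not by itself remove a volume-conducted contribution from an evoked response.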

      With the addition of new data, the latency difference between occipital and frontal electrodes - previously observed only as a trend - is now statistically significant (Figure 3E). Occipital responses emerge earlier than frontal responses, suggesting that mismatch-related activity likely originates in sensory visual regions and subsequently propagates to more frontal areas, similar to what has been reported in mouse cortex (Heindorf and Keller, 2024).

      (3) The authors compare different types of mismatch responses (including auditory oddballs) in the same set of (occipital) channels, but doesn't this undermine the spatial specificity of the results? Classical auditory mismatch negativity is typically observed over central channels, so weaker amplitudes of auditory mismatch responses in occipital channels are likely trivially explained by modality differences. As such, I'm not convinced that this comparison is informative even in a qualitative manner.

      To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. The amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (new Figures S8 and S9).

      (4) On a similar note, is the polarity reversal found for visual vs. mismatch responses specific to occipital channels?

      Thank you for this interesting question. In fact, polarity reversal was consistently observed across all recorded channels; this has now been added as a main figure to the manuscript (Figure 5).

      (5) Figure S4C seems to cut off one outlier, and I don't see this outlier included in the boxplot.

      Correct, that is why we describe the boxplots in the figure legend as: “Boxes mark median, quartiles, and range of data not considered outliers.” The axes have now been adjusted to include all data points.

      Discussion:

      "A central tenet of the cortical circuit for predictive processing is the split into separate populations of neurons that compute positive and negative prediction errors (Keller and Mrsic-Flogel, 2018; Rao and Ballard, 1999)" - this may be the case for visuomotor mismatch signals or reward prediction errors, but signed PEs do not play a central role in other proposed microcircuits for predictive processing in the perceptual domain (e.g. Bastos)

      Signed prediction errors do not play a central role in proposed cortical microcircuits for predictive processing that do not burden themselves with making a concrete proposal for the implementation of the prediction error computation. The (Bastos et al., 2012) work is a good example of this. The equation for the error term provided in that paper is clearly signed (nothing stops the error from going negative), but no proposal is made for how layer 2/3 excitatory neurons are supposed to signal this quantity. With baseline activity levels close to zero in layer 2/3, there really is only one way to do this, and that is separate populations of negative and positive prediction error neurons. With non-zero baseline firing rate, one could do this bidirectionally around a mean firing rate (as is typically thought of dopaminergic RPE neurons). There are more abstract Bayesian implementations that assume logarithmic transformations that could also implement a prediction error-like system without negative firing rates. But given the absence of any physiological evidence, we will refrain from discussing these. However, most importantly, there is now considerable evidence for the existence of both negative and positive prediction error neurons in layer 2/3 of mouse visual cortex. Thus, by “cortical circuit for predictive processing” we here mean those that make biologically plausible proposals for prediction error computations. Also note, the (Rao and Ballard, 1999) model is probably the prime example for what the reviewer calls a proposed microcircuit for predictive processing in the “perceptual domain”.

      Reviewer #3 (Public review):

      Summary:

      Solyga, Zelechowski, and Keller present a concise report of an innovative study demonstrating clear visuomotor mismatch responses in ambulating humans, using a mobile EEG setup and virtual reality. Human subjects walked around a virtual corridor while EEGs were recorded. Occasionally, motion and visual flow were uncoupled, and this evoked a mismatch response that was strongest in occipitally placed electrodes and had a considerable signal-to-noise ratio. It was robust across participants and could not be explained by the visual stimulus alone.

      Strengths:

      This is an important extension of their prior work in mice, and represents an elegant translation of those previous findings to humans, where future work can inform theories of e.g., psychiatric diseases that are believed to involve disordered predictive processing. For the most part, the authors are appropriately circumspect in their interpretations and discussions of the implications. I found the discussion of the polarity differences they found in light of separate positive and negative prediction errors, intriguing.

      Weaknesses:

      The primary weaknesses rest in how the results are sold and interpreted.

      Most notably, the interpretation of the results of the comparison of visuomotor mismatches to the passive auditory oddball induced mismatch responses is inappropriate, given suboptimal electrode choices, unclear matching of trial numbers, and other factors. To clarify, regarding the auditory oddball portion in Figure 5, the data quality is a concern for the auditory ERPs, and the choice of occipital electrodes is a likely culprit. Typically, auditory evoked responses are maximal at Cz or FCz, although these contacts don't seem to be available with this setup. In general, caution is warranted in comparing ERP peaks between two different sensory modalities - especially if attention is directed elsewhere (to a silent movie) during one recording and not during the other. The authors discuss this as a purely "qualitative" comparison in the text, which is appreciated, and do acknowledge the limitations within the results section, but the figure title and, importantly, the abstract set a different tone. At least, for comparisons between auditory mismatch and visuomotor mismatch, trial numbers need to be equated, as ERP magnitude can be augmented by noise (which reduces with increased numbers of trials in the average).

      To address this point, we conducted additional auditory oddball experiments with recordings over the auditory cortex (channels T3, T4, T5, and T6). Given our central reference, these channels should capture the strongest mismatch negativity. Nevertheless, the amplitude of the visuomotor mismatch response exceeded that of mismatch negativity on all tested channels (these results are now shown in the new Figures S8 and S9), and the response power was significantly greater for the visuomotor mismatch than for mismatch negativity. Independent of the electrode we test, the visuomotor mismatch response has a power 5 to 10 times higher than that of the MMN response. In addition, the number of trials per participant that met quality criteria was comparable between the visuomotor mismatch paradigm (mean = 23 trials) and the auditory mismatch paradigm (mean = 28 trials) (Author response image 4).

      Author response image 4.

      The number of trials included for analysis is comparable between the visuomotor and oddball paradigms. (A) Histogram showing the distribution of the number of valid trials per participant for the O1–2 electrode pair in the visuomotor mismatch paradigm. (B) Same as in A but for deviant stimulus presentations in the oddball paradigm.

      And more generally, the size of the mismatch event at the scalp does not scale one-to-one with the size at the level of the neural tissue. One can imagine a number of variables that impact scalp-level magnitudes and are orthogonal to actual cortex-level activation: the size, spread, and polarity variance of the activated source (which all would diminish amplitude at the scalp due to polyphasic summation/cancelation); the variance of phase to a stimulus across trials (cross-trial phase locking) vs. the magnitude of underlying power (the former, in theory, relates to bottom-up activity and the latter can reflect feedback, which has more variability in time across trials); and the distance of the scalp electrode from the activated tissue (which, for the auditory system, would be larger (FCz to superior temporal gyrus) than for the visual system (O1 to V1/2)). None of this precludes the inclusion of the auditory mismatch, which is a strength of the study, but interpretations about this supporting a supremacy of sensory-motor mismatch - regardless of validity - are not warranted. I would recommend changing the way this is presented in the abstract.

      We agree with the point that the EEG response does not need to reflect the total cortical activation. However, the discussion in the abstract (and elsewhere) is in the context of clinical experiments where the underlying cortical activity pattern is irrelevant if it does not trigger a clinically measurable (by EEG in this case) response. The abstract only makes a comparison to MMN implicitly in this sentence “Second, a paradigm that can trigger strong prediction error responses and consequently requires shorter recording times could simplify experiments in a clinical setting.” We are not sure how to phrase this even more carefully – the statement at face value is a truism. The reviewer, we assume, takes exception to the unstated implication that visuomotor prediction errors trigger stronger responses than MMN. Given the data we have, we assume most authors would not consider it an overstatement to make that claim outright.

      Otherwise, the data are of adequate quality to derive most of their conclusions.

      The authors claim that the mismatch responses emanate from within the occipital cortex, but I would require denser scalp coverage or a demonstration of consistent impedances across electrodes and across subjects to make conclusions about the underlying cortical sources (especially given the latencies of their peaks). In EEG, the distribution of voltage on the scalp is, of course, related to but not directly reflective of the distribution of the underlying sources. The authors are mostly careful in their discussion of this, but I would strongly recommend changing the word choice of "in occipital cortex" to "over occipital cortex" or even "posteriorly distributed". Even with very dense electrode coverage and co-registration to MRIs for the generation of forward models that constrain solutions, source localization of EEG signals is very challenging and not a simple problem. Given the convoluted and interior nature of human V1, the ability to reliably detect early evoked responses (which show the mismatch in mouse models) at the scalp in ERP peaks is challenging - especially if one is collapsing ERPs across subjects. And - given the latency of the mismatch responses, I'd imagine that many distributed cortical regions contribute to the responses seen at the scalp.

      This is an excellent point; we have rephrased throughout to “over occipital cortex” instead of “in occipital cortex”.

      I think that Figure 3C, but as a difference of visual mismatch vs halting flow alone (in the open loop) might be additionally informative, as it clarifies exactly where the pure "mismatch" or prediction error is represented.

      We performed the analysis as suggested (Author response image 5). Visuomotor mismatch responses are stronger on all electrodes compared to playback halt responses. This difference is also larger in data recorded on occipital electrodes.

      Author response image 5.

      Comparison of the difference between visuomotor mismatch and playback halt on all electrodes. Average response strength was calculated within a 100 ms window centered on the peak of the average visuomotor mismatch response across all electrodes. Boxes mark median, quartiles, and range of data not considered outliers. Each circle represents data from one participant. **: p<0.01, *: p<0.05, Fp1-2: 20 participants, C3-4: 31 participants, P3-4: 35 participants, O1-2: 32 participants.

      As a suggestion, the authors are encouraged to analyse time-frequency power and phase locking for these mismatch responses, as is common in much of the literature (see Roach et al 2008, Schizophrenia Bulletin). This is not to say that doing so will yield insights into oscillations per se, but converting the data to the time-frequency domain provides another perspective that has some advantages. It fosters translations to rodent models, as ERP peaks do not map well between species, but e.g., delta-theta power does (see Lee et al 2018, Neuropsychopharmacology; Javitt et al 2018, Schizophrenia research; Gallimore et al 2023, Cereb Ctx). Further, ERP peaks can be influenced by the actual neuroanatomy of an individual (especially for quantifying V1 responses). Time frequency analyses may aid in interpreting the "early negative deflection with a peak latency of 48 ms " finding as well.

      We have performed time–frequency power and phase-locking analyses for both visual responses (Author response image 6 and Author response image 7) and visuomotor mismatch and playback halt responses (Author response image 8 and Author response image 9), as suggested. We have added the results of these analyses here rather than to the manuscript, as they are not fully developed yet. We may add them to a future publication, for which we would want to properly quantify the stability of these effects.

      In brief, time–frequency representations of power did identify potentially interesting differences between walking and sitting sessions in the visual paradigm. Inter-trial phase coherence (ITPC) revealed an early increase in alpha-band synchronization suggesting that phase alignment of alpha oscillations may contribute to the early differences in visual responses between walking and sitting. The same analyses were applied to visuomotor mismatch and playback halt responses. Time–frequency power analysis revealed an increase in delta-band power during visuomotor mismatch, consistent with previous reports linking delta activity to prediction error processing, including reward prediction errors (Cavanagh, 2015), unexpected final words (Webb and Sohoglu, 2025), and visual deviance detection (West et al., 2024). Notably, it appears as if the increase in delta power emerged first over occipital electrodes and appeared later over more frontal electrodes, forming a spatiotemporal gradient of onset across the scalp.

      Delta power changes were markedly reduced in the playback halt responses at the time of visual flow cessation. While some power changes were observed, they occurred primarily at visual flow onset rather than at flow offset. Inter-trial phase coherence analysis further revealed delta-band synchronization over occipital electrodes following visuomotor mismatch, whereas the playback halt response showed strong phase synchronization in both delta and theta bands following visual flow onset.
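
      Inter-trial phase coherence is conventionally the length of the mean unit phase vector across trials at a given time and frequency. A minimal sketch, assuming per-trial phases have already been extracted (for example by a wavelet transform, which is not specified in this response); the function name and simulated phases are illustrative:

```python
import numpy as np

def itpc(phases: np.ndarray) -> np.ndarray:
    """Inter-trial phase coherence per time point.

    phases: array of shape (n_trials, n_times), in radians, for one frequency.
    Returns values between 0 (random phase) and 1 (identical phase on every trial).
    """
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

rng = np.random.default_rng(1)
n_trials = 200
aligned = np.full((n_trials, 1), 0.3)                          # same phase on every trial
random_phase = rng.uniform(-np.pi, np.pi, size=(n_trials, 1))  # no phase locking

itpc_aligned = itpc(aligned)[0]      # 1 up to rounding
itpc_random = itpc(random_phase)[0]  # small, of order 1/sqrt(n_trials)
print(itpc_aligned, itpc_random)
```

      Since the chance level falls roughly with the square root of the trial count, conditions with different numbers of trials are not directly comparable without correction.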

      Author response image 6.

      Time–frequency representations of EEG power changes during the visual paradigm. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line indicates the time of the checkerboard reversal (0 s). (B) As in A, but recorded while participants were walking.

      Author response image 7.

      Inter-trial phase coherence (ITPC) for visual trials during sitting and walking. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following checkerboard reversal in the sitting session. The dashed red vertical line marks the time of the checkerboard reversal (0 s). (B) As in A, but recorded during walking.

      Author response image 8.

      Time–frequency representations of EEG power changes during visuomotor mismatch and playback halt responses. (A) Time–frequency maps showing changes in spectral power relative to baseline for electrodes Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.

      Author response image 9.

      Inter-trial phase coherence (ITPC) for the visuomotor mismatch and playback halt responses. (A) ITPC across trials for electrode pairs Fp1–2, C3–4, P3–4, and O1–2 following visuomotor mismatch presentation. Dashed vertical red lines are onset and offset of the visuomotor mismatch. (B) As in A, but for playback halts.

      Finally, the sentence in the abstract that this paradigm " can trigger strong prediction error responses and consequently requires shorter recording times would simplify experiments in a clinical setting" is a nice setup to the paper, but the very fact that one third of recordings had to be removed due to movement artifact, and that hairstyle modulates the recording SnR, is reason that this paradigm, using the reported equipment, may have limited clinical utility in its current form. Further, auditory oddball paradigms are of great clinical utility because they do not require explicit attention and can be recorded very quickly with no behavioral involvement of a hospitalized patient. This should be discussed, although it does not detract from the overall scientific importance of the study. The authors should reconsider putting this statement in the abstract.

      We have added a paragraph to the discussion to address these points. Note that we get robust and strong responses with very few trials (Author response image 2). The fact that we need to discard up to 21.7% of trials due to movement/eye blink artefacts does little to change the fact that we need far fewer trials and have larger and more robust responses compared to other EEG paradigms. Finally, we understand that sometimes not needing participants to pay attention to the task is useful. However, having a paradigm that is engaging and fun for participants and takes 5 minutes of recording time is probably just as often an advantage.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) In the Introduction, I'm not sure that the logic comes through as to what the authors aim to illustrate by comparing mice to humans, in terms of precision and "movement modulation". In some cases, the precision of the comparison is referred to, and in others, the precision of the prediction (I think?). I'm not sure if they mean for this to be different or not. Similarly, on line 81, "If indeed the precision of visuomotor coupling determines the amount of motor modulation of visual responses" - here I'm a little confused about "amount of motor modulation": to me, the term "modulation" refers to a conditional modifier (if moving, then suppress visual movement responses; if not moving, then amplify visual movement responses) rather than movement-driven activity. The way I'm reading it, the authors mean the latter, but I could be misunderstanding.

      We have rephrased this section of the introduction.

      (2) I think it could be helpful, in the sentence starting on line 65, to reiterate that this observation of higher-than-expected motor activity in V1 is in mice (if I'm understanding it correctly). I also found myself tangled up in the difference between motor-related activity in V1 and motor-modulation in V1 in this paragraph.

      We have rephrased this section of the introduction.

      (3) For signal power, was the amplitude squared on individual trials prior to averaging, or after averaging? If prior, it would help with separating amplitude modulations from phase variance.

      In our previous analysis, power was computed by squaring the amplitude after trial averaging (Author response image 10A). We repeated the analysis using the alternative approach in which power was calculated for individual trials and then averaged (Author response image 10B). Although this method yields substantially higher absolute power values, the overall pattern of results remains unchanged: visuomotor mismatch responses continue to show significantly higher power than visual responses. To look at the phase variance, we additionally analyzed inter-trial phase coherence (Author response image 7 and Author response image 9).
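
      The difference between the two orders of operation can be illustrated with simulated trials. This is a sketch under assumed values: the waveform, noise level, and function names are hypothetical, and only the trial count and 0.5 s window echo the numbers quoted in this response.

```python
import numpy as np

def power_after_averaging(trials: np.ndarray) -> float:
    """Square the trial-averaged waveform, then average over time."""
    return float(np.mean(np.mean(trials, axis=0) ** 2))

def power_before_averaging(trials: np.ndarray) -> float:
    """Square each single trial, then average over trials and time."""
    return float(np.mean(trials ** 2))

rng = np.random.default_rng(2)
n_trials, n_samples = 20, 250
t = np.linspace(0.0, 0.5, n_samples)                        # 0 - 0.5 s window
evoked = 10.0 * np.sin(2 * np.pi * 4 * t)                   # hypothetical phase-locked response
noise = rng.normal(0.0, 15.0, size=(n_trials, n_samples))   # trial-to-trial variability
trials = evoked + noise

p_after = power_after_averaging(trials)
p_before = power_before_averaging(trials)
print(p_after < p_before)  # True
```

      Per-trial power equals the power of the average plus the across-trial variance, so it also captures activity that is not phase-locked to the stimulus, which is consistent with the higher absolute values it yields.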

      Author response image 10.

      Visuomotor mismatch responses have more power compared to visual responses. (A) Comparison of power between visuomotor mismatch and visual responses, calculated within a 0 - 0.5 s time window following stimulus onset. Power was computed by squaring the amplitude after trial averaging. Boxes indicate the median and interquartile range, with whiskers showing the range excluding outliers; circles represent data from individual participants. ***p < 0.001. (B) Same comparison as in (A), but with power calculated by squaring the amplitude of individual trials prior to averaging.

      (4) The "the world suddenly flew forward!" response from the participant, I understand, and I believe that it is useful to illustrate a point. I do not understand the "Are you printing this? - Hi Mom! " part of the participant response, and I'm not sure it adds to the paper, beyond amusement, which seems inappropriate.

      One of the authors (the one who did none of the experiments) finds this endlessly hilarious and as the reviewer notes, it might add amusement more generally. “Inappropriate” might be a bit harsh – according to our favorite AI chatbot: “Amusement provides significant mental, physical, and social value by offering a necessary escape from routine, reducing stress, and fostering a connection. It enhances well-being through endorphin-releasing experiences and encourages social bonding, learning, and joy.” Nevertheless, we have censored the offending passage.

      Aizenbud, I., Audette, N., Auksztulewicz, R., Basiński, K., Bastos, A.M., Berry, M., Canales-Johnson, A., Choi, H., Clopath, C., Cohen, U., Costa, R.P., Filippo, R.D., Doronin, R., Errington, S.P., Gavornik, J.P., Gillon, C.J., Granier, A., Hamm, J.P., Hertäg, L., Kennedy, H., Kumar, S., Ladd, A., Ladret, H., Lecoq, J.A., Maier, A., McCarthy, P., Mei, J., Mejias, J., Mikulasch, F., Mudrik, N., Najafi, F., Nejad, K., Nejat, H., Oweiss, K., Petrovici, M.A., Priesemann, V., Rudelt, L., Ruediger, S., Russo, S., Salatiello, A., Senn, W., Sennesh, E., Sima, S., Uran, C., Vasilevskaya, A., Vezoli, J., Vinck, M., Westerberg, J.A., Wilmes, K., Xiong, Y.S., 2025. Neural mechanisms of predictive processing: a collaborative community experiment through the OpenScope program. https://doi.org/10.48550/arXiv.2504.09614

      Bastos, A.M., Usrey, W.M., Adams, R.A., Mangun, G.R., Fries, P., Friston, K.J., 2012. Canonical microcircuits for predictive coding. Neuron 76, 695–711. https://doi.org/10.1016/j.neuron.2012.10.038

      Cavanagh, J.F., 2015. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216. https://doi.org/10.1016/j.neuroimage.2015.02.007

      Delorme, A., Makeig, S., 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. https://doi.org/10.1016/j.jneumeth.2003.10.009

      Gramann, K., Gwin, J.T., Bigdely-Shamlo, N., Ferris, D.P., Makeig, S., 2010. Visual evoked responses during standing and walking. Front. Hum. Neurosci. 4, 202. https://doi.org/10.3389/fnhum.2010.00202

      Heindorf, M., Keller, G.B., 2024. Antipsychotic drugs selectively decorrelate long-range interactions in deep cortical layers. eLife 12, RP86805. https://doi.org/10.7554/eLife.86805

      Keller, G.B., Hahnloser, R.H.R., 2009. Neural processing of auditory feedback during vocal practice in a songbird. Nature 457, 187–90. https://doi.org/10.1038/nature07467

      Keller, G.B., Mrsic-Flogel, T.D., 2018. Predictive Processing: A Canonical Cortical Computation. Neuron 100, 424–435. https://doi.org/10.1016/j.neuron.2018.10.003

      Oliveira, A.S., Schlink, B.R., Hairston, W.D., König, P., Ferris, D.P., 2016. Proposing Metrics for Benchmarking Novel EEG Technologies Towards Real-World Measurements. Front. Hum. Neurosci. 10, 188. https://doi.org/10.3389/fnhum.2016.00188

      O’Toole, S.M., Oyibo, H.K., Keller, G.B., 2023. Molecularly targetable cell types in mouse visual cortex have distinguishable prediction error responses. Neuron 111, 2918-2928.e8. https://doi.org/10.1016/j.neuron.2023.08.015

      Rao, R.P.N., Ballard, D.H., 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87. https://doi.org/10.1038/4580

      Vasilevskaya, A., Widmer, F.C., Keller, G.B., Jordan, R., 2023. Locomotion-induced gain of visual responses cannot explain visuomotor mismatch responses in layer 2/3 of primary visual cortex. Cell Rep. 42, 112096. https://doi.org/10.1016/j.celrep.2023.112096

      Webb, J.M., Sohoglu, E., 2025. Cortical tracking of prediction error during perception of connected speech. https://doi.org/10.1101/2025.07.18.665498

      West, C.L., Bastos, G., Duran, A., Nadeem, S., Ricci, D., Groves, A.M.R., Wargo, J.A., Peterka, D.S., Leeuwen, N.V., Hamm, J.P., 2024. A lasting impact of serotonergic psychedelics on visual processing and behavior. https://doi.org/10.1101/2024.07.03.601959

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Rolland and colleagues investigated the interaction between Vibrio bacteria and Alexandrium algae. The authors found a correlation between the abundance of the two in the Thau Lagoon and observed in the laboratory that Vibrio grows to higher numbers in the presence of the algae than in monoculture. Timelapse imaging of Alexandrium in coculture with Vibrio enabled the authors to observe Vibrio bacteria in proximity to the algae, followed by algal death. The authors further determine the mechanism of the interaction between the two and point out similarities between the observed phenotypes and predator-prey behaviours across organisms.

      Strengths:

      The study combines field work with mechanistic studies in the laboratory and uses a wide array of techniques ranging from co-cultivation experiments to genetic engineering, microscopy and proteomics. Further, the authors test multiple Vibrio and Alexandrium species and claim that the observed phenotypes are widespread.

      Comments on revisions:

      I thank the authors for their additional work on the manuscript. My comments were addressed to my satisfaction.

      Dear Reviewer #1, we thank you for your careful evaluation of our manuscript and for the time and effort you dedicated to this review. We are pleased that the revised version has addressed your concerns to your satisfaction.

      Reviewer #2 (Public review):

      Goal summary

      The authors sought to (i) demonstrate correlations between the dynamics of the dinoflagellate Alexandrium pacificum and the bacterium Vibrio atlanticus in natural populations, (ii) demonstrate the occurrence of predation in laboratory experiments, (iii) demonstrate that predation is induced by predator starvation, and (iv) test for effects of quorum sensing and iron-uptake genes on the predation process.

      Strengths include

      - Data indicating correlated dynamics in a natural environment that increase the motivation for study of in vitro interactions

      - Experimental design allowing clear inference of predation based on population counts of both prey and predators in addition to microscopy-based evidence

      - Supplementation of population-level data with molecular approaches to test hypotheses regarding possible involvement of quorum sensing and iron uptake in predation

      Weaknesses include

      - A quantitative analysis of effects of manipulating V. atlanticus density on rates of predation would have been valuable

      - Lack of clarity in some of the methodological descriptions

      Appraisal

      The authors convincingly demonstrate that V. atlanticus can prey on A. pacificum, provide strongly suggestive evidence that such predation is induced by starvation and clearly demonstrate that both iron availability and correspondingly the presence of genes involved in iron uptake strongly influence the efficacy of predation.

      Discussion of impact

      This paper will interest those studying the diversity of forms of microbial predation and how microbial predatory behavior responds to environmental fluctuations. It will also interest those investigating bacteria-algae interactions and potential ecological controls of algal blooms. It may also interest researchers of microbial cooperation in light of the suggestion of communication between predator cells.

      Dear Reviewer #2, we sincerely thank you for the time you devoted to this second review of our manuscript. We greatly appreciate your thoughtful comments, which helped us further improve the clarity and precision of the manuscript. All your additional recommendations have been carefully considered and addressed in the revised version and in our responses below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (2) The authors' reference to Fig. 4a did not address our concern about density potentially affecting the outcomes shown in Fig. 3. Fig. 4a does not provide any quantitative effects of manipulating Vibrio density. But the new density numbers the authors added in response to point (33) do seem to address our concern, because Vibrio densities become lower in the older cultures, excluding the possibility that the increased predation in older cultures might have been due to higher Vibrio densities. We think this should be stated explicitly.

      (33) See point (2) above. We think the authors should explicitly state in the text that the increased predation in older cultures was not due to higher Vibrio densities in those older cultures, referring to their data.

      As recommended by Reviewer #2, we added the sentence “Importantly, Vibrio densities decreased with culture age, ruling out the possibility that the stronger predation observed in older cultures was driven by higher bacterial densities” in the results section “Attack of A. pacificum ACT03 is activated by V. atlanticus LGP32 starvation.”

      (45) Is it known that bacterial predators collectively feed more on other bacteria than on microbial eukaryotes in natural habitats? While this certainly seems most likely, it's stated as fact, so the statement should either be supported with relevant citations or phrased as a likely hypothesis.

      As suggested, we rephrased this sentence in the discussion section as follows: “Predatory bacteria are found in a wide variety of environments and are commonly described as feeding on other bacteria, although some cases of predation on microbial eukaryotes have also been hypothesized.”

      (46) Perhaps "Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggest that Vibrios engage in a novel form of predation in which they kill and feed on algae."

      The reference to 'developing' a predator behavior is not clear. What is meant by 'develop'? It seems unnecessary.

      The use of italics when writing Vibrio is inconsistent.

      We agree that the reference to “developing” a predatory behavior was unclear and unnecessary. We therefore revised the sentence as follows: “Conceiving predators as free-living organisms that kill other organisms and feed on them, this study suggests that Vibrio engages in a novel form of predation in which it kills and feeds on algae.” We also corrected the inconsistent use of italics for Vibrio throughout the manuscript.

      (48) The authors might wish to revise this sentence, as although M. xanthus does have a contact-dependent killing mechanism, it is our understanding that both Lysobacter and myxobacteria can kill some prey at a distance with diffusible secretions.

      The sentence “These bacteria must be in close proximity to their prey in order to cause lysis and utilize their biomass, regardless of the prey's species” was replaced by “These bacteria may require close proximity to their prey to cause lysis and utilize their biomass, although some can also kill prey at a distance through diffusible secretions”.

      (50) Why not directly say 'predatory behavior'?

      We totally agree and have reworded the sentence.

      Line by line feedback:

      28 '...the phycosphere, an interface ...'

      We agree and have revised the wording.

      24 'In the attack stage, Vibrios...'

      This sentence has been rephrased as recommended.

      35 surrounds -> surround

      The correction has been done.

      36 The lysis is induced by the cells not by the 'stage'. We would rephrase to 'in which the lysis and consumption of the dinoflagellates occurs'

      This sentence has been rephrased as recommended.

      41 'a new mechanism that could to be involved' -> 'a new mechanism that could be involved ...'

      The correction has been done.

      61 forms

      The correction has been done.

      98 'the role...in'

      The suggested correction has been performed.

      103 'Qpcr' -> 'qPCR'

      Thank you for spotting this typo. “Qpcr” was corrected to “qPCR” in the manuscript.

      125 Misplaced punctuation

      The punctuation was corrected.

      152 The use of '.' vs 'x' to indicate multiplication when writing numbers is inconsistent. In some cases both are missing.

      Numbers have been corrected throughout the manuscript.

      231 I would rephrase 'poor nutrient stress' to 'little nutrient stress' or 'no nutrient stress'

      The rephrasing was carried out as suggested.

      310 R and used packages are not cited

      We added the citation (R Core Team, 2024). Linear models, QQ plots (which are part of linear models), tests, and AICs are included in R by default and are credited to the R Core Team.

      The sentence “Statistical analyses were performed using R 3.6.3 software” was replaced by “Statistical analyses were performed using R 3.6.3 software (R Core Team, 2024) using Rstudio”.

      358 'are capable of simultaneously attacking'

      The expression “are capable of simultaneously attacking” was revised in the manuscript to improve clarity and readability.

      366 'exponential growth phase'

      We have corrected the wording to “exponential growth phase” in the revised manuscript.

      430 The large difference in incubation time between the sea-water vs nutrient-rich treatments and use of different media are unfortunate. These additional variables compromise the ability to directly ascribe observed differences to starvation.

      We agree. The sentence “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins modulated by nutrient stress (Fig. S2)” was replaced by “The comparative analysis of the proteome of V. atlanticus LGP32 incubated 60 h in artificial seawater (ENSW) versus V. atlanticus LGP32 grown 12 h in Zobell nutrient-rich medium revealed 10 proteins that were differentially abundant under these two contrasting conditions (Fig. S2)”.

      443 Somewhat unclear sentence. I would rephrase this to "Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03."

      To clarify this point, the sentence “Remarkably, among the 10 proteins identified by proteomic analysis only V. atlanticus LGP32 mutant lacking pvuB failed to attack A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001)” was replaced by “Remarkably, of the 10 proteins identified by proteomic analysis and eliminated by mutation, only elimination of PvuB prevented V. atlanticus from attacking A. pacificum ACT03 (Fig. 4C; ANOVA p <0.001).”

      445 'attack simultaneously' -> 'simultaneously attack'

      The suggested modification has been done.

      450 H3BO4 is written as Boron later, it would be good to call it boron here as well so that it is easier to make the connection for the reader.

      We agree and have modified the manuscript to call it boron here as well.

      459 'no linked' -> 'no link'

      The text was modified accordingly.

      483 'which induces' -> 'which induce'

      The correction has been made.

      519 The use of Vibrio atlanticus and V. atlanticus is inconsistent within the text.

      We have checked and modified the manuscript in accordance with the recommendations.

      807-808 The use of the phrase 'Akaike information criterion (AICc) models' is confusing. Aren't these models just generalized linear models? It should be rephrased to make clear that the AICc is just a test that is used to select which model to use.

      We clarified this point by revising the Figure 1 legend. The sentences “(C) Result of Akaike information criterion (AICc) models tested to explain the mean value of degraded Alexandrium cells (dead cells) in spring. (D) Wald test of the AICc model attributing the mean value of degraded cells of Alexandrium in spring to free Vibrio” were replaced by “(C) Results of the Akaike Information Criterion (AICc) test conducted to select a model for explaining the mean value of dead Alexandrium (degraded cells) in spring. (D) Wald test of the AICc model explaining the mean value of dead Alexandrium in spring by free Vibrio”.
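The distinction drawn here, that AICc is a criterion for choosing among fitted models and is not itself a model, can be made concrete with a short sketch. The formula is the standard small-sample correction of the AIC; the model names, log-likelihoods, parameter counts, and sample size below are hypothetical placeholders and are not values from the manuscript, whose analysis was carried out in R.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected Akaike information criterion.

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
n_obs = 30
candidates = {"model_a": (-52.0, 2), "model_b": (-45.0, 3)}

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
selected = min(scores, key=scores.get)   # the lowest AICc is preferred
```

Each candidate is an ordinary fitted model; AICc only ranks them, and any follow-up test such as the Wald test in panel (D) is then applied to the selected model.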

      827 The chronological sequence of snapshots is not very clear. Perhaps it would be clearer if pictures over a shorter timeframe were used to clearly show the gathering of the V. atlanticus cells near the algal cells.

      To address this point, we removed the first and the last 14 seconds of the snapshots to clearly show the gathering of the V. atlanticus cells near the algal cells, and we added an arrow on Fig. 2D to indicate the chronological order.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and the reviewers for the thorough and insightful comments and suggestions. Addressing them has strengthened our manuscript. We have carefully addressed all reviewer comments, as described in detail below, as well as additional comments we received from others. In addition, we made two substantive updates to the manuscript:

      (1) We improved the estimation of uncertainty in the model predictions by computing 95% confidence intervals using 120 bootstrapped datasets (instead of the 100% of 10 bootstrapped datasets in the original submission) to match the number of bootstraps used for the validation dataset.

      (2) We selected a slightly different hyperparameter value based on follow-up analyses suggested by Reviewer 1, which provided very useful information.

      Importantly, none of these changes alter the main results or conclusions of the paper.

      Beyond these changes and those outlined below, we also worked to improve the clarity of the prose throughout and added further citations to the literature.
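The first update listed above, a 95% confidence interval from 120 bootstrapped datasets, can be sketched generically as a percentile bootstrap. This is an illustration under stated assumptions, not the authors' pipeline: the statistic here is a simple mean of synthetic numbers, whereas the response describes bootstrapped datasets for the model predictions, which presumably involves refitting the model to each one. The function name `bootstrap_ci` and all data values are hypothetical.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=120, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample observations with replacement to form one bootstrap dataset.
        resample = data[rng.integers(0, n, size=n)]
        estimates[b] = statistic(resample)
    alpha = 1.0 - level
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(1)
sample = rng.normal(loc=1.0, scale=0.5, size=300)
lo, hi = bootstrap_ci(sample, np.mean)
```

With only 10 bootstrap datasets the 2.5th and 97.5th percentiles cannot be resolved, which is consistent with the original submission reporting the full range instead; 120 datasets make a percentile interval feasible.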

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents an ambitious and technically impressive attempt to map how well humans can discriminate between colours across the entire isoluminant plane. The authors introduce a novel Wishart Process Psychophysical Model (WPPM) - a Bayesian method that estimates how visual noise varies across colour space. Using an adaptive sampling procedure, they then obtain a dense set of discrimination thresholds from relatively few trials, producing a smooth, continuous map of perceptual sensitivity. They validate their procedure by comparing actual and predicted thresholds at an independent set of sample points. The work is a valuable contribution to computational psychophysics and offers a promising framework for modelling other perceptual stimulus fields more generally.

      Strengths:

      The approach is elegant and well-described (I learned a lot!), and the data are of high quality. The writing throughout is clear, and the figures are clean (elegant in fact) and do a good job of explaining how the analysis was performed. The whole paper is tremendously thorough, and the technical appendices and attention to detail are impressive (for example, a huge amount of data about calibration, variability of the stim system over time, etc). This should be a touchstone for other papers that use calibrated colour stimuli.

      Weaknesses:

      Overall, the paper works as a general validation of the WPPM approach. Importantly, the authors validate the model for the particular stimuli that they use by testing model predictions against novel sample locations that were not part of the fitting procedure (Figure 2). The agreement is pretty good, and there is no overall bias (perhaps local bias?), but they do note a statistically-significant deviation in the shape of the threshold ellipses. The data also deviate significantly from historical measurements, and I think the paper would be considerably stronger with additional analyses to test the generality of its conclusions and to make clearer how they connect with classical colour vision research. In particular, three points could use some extra work:

      (1) Smoothness prior.

      The WPPM assumes that perceptual noise changes smoothly across colour space, but the degree of smoothness (the eta parameter) must affect the results. I did not see an analysis of its effects - it seems to be fixed at 0.5 (line 650). The authors claim that because the confidence intervals of the MOCS and the model thresholds overlap (line 223), the smoothing is not a problem, but this might just be because the thresholds are noisy. A systematic analysis varying this parameter (or at least testing a few other values), and reporting both predictive accuracy and anisotropy magnitude, would clarify whether the model's smoothness assumption is permitting or suppressing genuine structure in the data. Is the gamma parameter also similarly important? In particular, does changing the underlying smoothness constraint alter the systematic deviation between the model and the MOCS thresholds? The authors have thought about this (of course! - line 224), but also note a discrepancy (line 238). I also wonder if it would be possible to do some analysis on the posterior, which might also show if there are some regions of color space where this matters more than others? The reason for doing this is, in part, motivated by the third point below - it's not clear how well the fits here agree with historical data.

      Thank you for raising this important point. We have now added analyses of the effects of the two smoothness-related hyperparameters, ε and γ (see Appendix 10).

      First, we swept a range of values for each hyperparameter (ε: 0.1 – 1; γ: 0.000001 – 0.003) and evaluated model performance using 5-fold cross-validation of the dataset used to fit the WPPM, quantifying predictive accuracy on held-out test data. We used the mean negative log likelihood averaged across the held-out data in the cross validation as our measure of predictive accuracy (Figs. S27-31).

      The two hyperparameters affect cross-validation accuracy in a similar manner. With γ fixed at 0.0003, predictive accuracy is highest for ε in the range of approximately 0.3–0.5 and drops quite rapidly for ε < 0.3. We attribute this drop to oversmoothing. Cross-validation accuracy also decreases, albeit more gradually, for ε > 0.5. We attribute this to increased variance due to undersmoothing relative to the power of our datasets. Similarly, with ε fixed at 0.4, predictive accuracy is highest for γ values between approximately 0.0001 and 0.001, declines rapidly for smaller γ (oversmoothing), and more slowly for larger γ (undersmoothing).
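The sweep described above follows a generic recipe: for each candidate hyperparameter value, run 5-fold cross-validation and score the held-out folds by mean negative log likelihood. The sketch below illustrates that recipe on a toy problem and is not the WPPM. It assumes a Gaussian-kernel smoother of binary responses whose `bandwidth` plays the role of the smoothness hyperparameter; note that in this toy a larger bandwidth means more smoothing, whereas in the response a smaller ε or γ means more smoothing. All names and data are hypothetical.

```python
import numpy as np

def kernel_predict(x_train, y_train, x_test, bandwidth):
    """Gaussian-kernel weighted average of training responses (toy model)."""
    d = x_test[:, None] - x_train[None, :]
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    p = (w @ y_train) / np.clip(w.sum(axis=1), 1e-300, None)
    return np.clip(p, 1e-3, 1 - 1e-3)         # keep log() finite

def mean_nll(y, p):
    """Mean Bernoulli negative log likelihood of responses y under p."""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def cv_score(x, y, bandwidth, k=5, seed=0):
    """k-fold cross-validated mean NLL on held-out data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        p = kernel_predict(x[train], y[train], x[test], bandwidth)
        scores.append(mean_nll(y[test], p))
    return float(np.mean(scores))

# Synthetic binary responses with a smoothly varying success probability.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 600)
p_true = 0.5 + 0.4 * np.sin(3 * x)
y = (rng.uniform(size=600) < p_true).astype(float)

bandwidths = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
scores = {h: cv_score(x, y, h) for h in bandwidths}
best = min(scores, key=scores.get)            # lowest held-out NLL
```

In this toy the held-out score is poor at both extremes: the widest bandwidth flattens real structure (oversmoothing) and the narrowest chases noise (undersmoothing), the same qualitative pattern the response reports for ε and γ.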

      Second, we examined how the hyperparameter ε affected the agreement between the WPPM fit and the MOCS validation data. Specifically, at each ε, for each participant, we computed the linear regression between WPPM thresholds and validation thresholds at 25 reference locations. Then, we examined the slope and correlation coefficient of all participants as a function of ε. We found a classic bias–variance tradeoff. Excessive smoothness introduces bias by failing to capture structure in the data, whereas insufficient smoothness increases variance in model predictions. These results further support a choice of ε = 0.4 as lying near the optimal balance between bias and variance (Fig. S32).
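The per-participant agreement measure described above, a regression slope and a correlation coefficient across 25 reference locations, can be sketched as follows. The response does not state which variable is regressed on which, so regressing model thresholds on validation thresholds is an assumption here, and the threshold values are synthetic.

```python
import numpy as np

def slope_and_correlation(model_thresholds, validation_thresholds):
    """Least-squares slope and Pearson r of model vs. validation thresholds."""
    x = np.asarray(validation_thresholds, dtype=float)
    y = np.asarray(model_thresholds, dtype=float)
    slope, _intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(r)

# Synthetic thresholds at 25 reference locations (hypothetical values).
rng = np.random.default_rng(0)
validation = rng.uniform(0.5, 2.0, size=25)
model = 0.9 * validation + rng.normal(0.0, 0.05, size=25)

slope, r = slope_and_correlation(model, validation)
```

Under this convention, a slope well below 1 would indicate that the model compresses the range of thresholds, as expected from an overly smooth prior, while a falling correlation would point to noisier predictions from an insufficiently smooth one.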

      Based on these analyses, we selected for the final analysis ε = 0.4, slightly smaller than the preregistered value used in the original submission (0.5), while retaining the original value of γ (0.0003).

      We now discuss the reasons for changing this value in the revision, and provide a more general discussion of the importance and practicalities of hyperparameter choice in Bayesian approaches to analyzing data (Discussion / Prior specification).

      (2) Comparison with simpler models. It would help to see whether the full WPPM is genuinely required. Clearly, the data (both here and from historical papers) require some sort of anisotropy in the fitting - the sensitivities decrease as the stimuli move away from the adaptation point. But it's >not< clear how much the fits benefit from the full parameterisation used here. Perhaps fits for a small hierarchy of simpler models - starting with isotropic Gaussian noise (as a sort of 'null baseline') and progressing to a few low-dimensional variants - would reveal how much predictive power is gained by adding spatially varying anisotropy. This would demonstrate that the model's complexity is justified by the data.

      In the 5-fold cross-validation analysis described above (and now presented in Appendix 10), we found that when ε or γ is small, the stronger smoothness constraint leads to threshold ellipses that are nearly identical to each other across color space. Under these conditions, model predictions show poor accuracy on held-out test data and lead to poor predictions of the validation data. This observation addresses the underlying point raised by the reviewer, albeit in a different way than suggested: it shows that a degree of spatially varying anisotropy is necessary to capture the structure of the data. We now make this point in the paper (Discussion / Prior specification).

      More broadly, we employed the WPPM as a prior that imposed smoothness but not much other obvious structure, and used this to learn about the psychometric field. We are currently working to understand how we can best use our current data to improve the prior we would apply to future measurements. There are a number of approaches to this. One would be to seek a parametric mechanistic model that can describe the current data and, to the extent this is possible, formulate prior distributions over the parameters of the model. The results reported here thus provide a foundation for deriving and evaluating more structured priors that would leverage future datasets even more efficiently, albeit by imposing more structure. We have added this perspective to the Discussion / Extensions of the WPPM framework.

      (3) Quantitative comparison to historical data. The paper currently compares its results to MacAdam, Krauskopf & Karl, and Danilova & Mollon only by visual inspection. It is hard to extract and scale actual data from historical papers, but from the quality of the plotting here, it looks like the authors have achieved this, and so quantitative comparisons are possible. The MacAdam data comparisons are pretty interesting - in particular, the orientations of the long axes of the threshold ellipses do not really seem to line up between the two datasets - and I thought that the orientation of those ellipses was a critical feature of the MacAdam data. Quantitative comparisons (perhaps overall correlations, which should be immune to scaling issues, axis-ratio, orientation, or RMS differences) would give concrete measures of the quality of the model. I know the authors spend a lot of time comparing to the CIE data, and this is great.... But re-expressing the fitted thresholds in CIE or DKL coordinates, and comparing them directly with classical datasets, would make the paper's claims of "agreement" much more convincing.

      Although we are sympathetic to this request, we have chosen not to implement the sort of quantitative comparison requested by the reviewer. The reason is that an important feature of color thresholds is that they depend on the spatial (e.g. Kelly, 1974; Poirson & Wandell, 1996; Danilova & Mollon, 2025) and temporal (e.g. Kelly, 1974) properties of the stimuli, and on the observer’s state of adaptation (e.g. Loomis & Berger, 1979; Krauskopf & Gegenfurtner, 1992). Because (as the reviewer notes below) the spatial and temporal properties of our stimuli were not matched to those of the comparison datasets, our purpose in making these comparisons was to examine qualitative agreement, as well as to situate our results in the literature and to demonstrate that our approach allows us to read out thresholds around the references and in the color spaces used in other studies. We would not expect detailed quantitative agreement with the current dataset because of differences in stimuli.

      As a consequence of this, we think we would be overreaching to quantify the differences between our data and classic datasets. This consideration is particularly important for the MacAdam measurements, where because of the matching adjustment procedure used, the observer’s state of adaptation is likely to have varied (by amounts that are difficult to estimate) from one reference to the next (e.g. Danilova & Mollon, 2025). We have clarified the manuscript with respect to these points (Results / Comparison with previous measurements).

      A point to make on this topic is that an important and interesting future direction that emerges from our work is to develop efficient methods to characterize the dependence of the full discrimination field on ancillary variables, such as those that describe spatial and temporal properties and/or the state of adaptation, which we now also mention in the paper (Discussion / Implications for the mechanisms of color perception). Although not the primary motivation, doing so would enable comparison of data with a wider range of studies.

      We do agree that the comparisons to CIELAB predictions work better when we express them in CIELAB, and have now done so (Fig. 3D; Fig. S24-S26).

      Kelly, D. H. (1974). "Spatio-temporal frequency characteristics of color-vision mechanisms." Journal of the Optical Society of America 64(7): 983–990.

      Poirson, A. B. and B. A. Wandell (1996). "Pattern-color separable pathways predict sensitivity to simple colored patterns." Vision Research 36(4): 515–526.

      Danilova, M. V. and J. D. Mollon (2025). "Effect of stimulus size on chromatic discrimination." Journal of the Optical Society of America A 42(5).

      Loomis, J. M. and T. Berger (1979). "Effects of chromatic adaptation on color discrimination and color appearance." Vision Research 19(8): 891–901.

      Krauskopf, J. and K. Gegenfurtner (1992). "Color discrimination and adaptation." Vision Research 32(11): 2165–2175.

      Overall, this is a creative and technically sophisticated paper that will be of broad interest to vision scientists. It is probably already a definitive method paper showing how we can sample sensitivity accurately across colour space (and other visual stimulus spaces). But I think that until the comparison with historical datasets is made clear (and, for example, how the optimal smoothness parameters are estimated), it has slightly less to tell us about human colour vision. This might actually be fine - perhaps we just need the methods?

      Related to this, I'd also note that the authors chose a very non-standard stimulus to perform these measurements with (a rendered 3D 'Greebley' blob). This does have the advantage of some sort of ecological validity. But it has the significant disadvantage that it is unlike all the other (much simpler) stimuli that have been used in the past - and this is likely to be one of the reasons why the current (fitted) data do not seem to sit in very good agreement with historical measurements.

      As the reviewer notes, our stimuli head in the direction of ecological validity (see also Hedjar et al., 2025) and indeed this was a consideration when we chose them, at the cost of limiting the degree of comparison we can make with prior studies (as discussed above). Another reason we chose our stimuli is that they enable the current data to be used as a basis of comparison with stimuli where we add specularity, change object shape, and vary object pose in the future. These manipulations are not possible with flat matte patches. Such experiments are of interest to us, as they will tell us about how effectively color may be used to differentiate stimuli in cases where other ecologically important variables co-vary. We now mention this motivation in the paper (Results / Task and Stimuli).

      Hedjar, L., M. Toscani and K. R. Gegenfurtner (2025). "Importance of hue: color discrimination of three-dimensional objects and two-dimensional discs." Journal of the Optical Society of America A 42(5).

      Reviewer #2 (Public review):

      Summary:

      Hong et al. present a new method that uses a Wishart process to dramatically increase the efficiency of measuring visual sensitivity as a function of stimulus parameters for stimuli that vary in a multidimensional space. Importantly, they have validated their model against their own hold-out data and against 3 published datasets, as well as against colour spaces aimed at 'perceptual uniformity' by equating JNDs. Their model achieves high predictive success and could be usefully applied in colour vision science and psychophysics more generally, and to tackle analogous problems in neuroscience featuring smooth variation over coordinate spaces.

      Strengths:

      (1) This research makes a substantial contribution by providing a new method to very significantly increase the efficiency with which inferences about visual sensitivity can be drawn, so much so that it will open up new research avenues that were previously not feasible. Secondly, the methods are well thought out and unusually robust. The authors made a lot of effort to validate their model, but also to put their results in the context of existing results on colour discrimination, transforming their results to present them in the same colour spaces as used by previous authors to allow direct comparisons. Hold-out validation is a great way to test the model, and this has been done for an unusually large number of observers (by the standards of colour discrimination research). Thirdly, they make their code and materials freely available with the intention of supporting progress and innovation. These tools are likely to be widely used in vision science, and could of course be used to address analogous problems for other sensory modalities and beyond.

      Weaknesses:

      It would be nice to better understand what constraints the choice of basis functions puts on the space of possible solutions. More generally, could there be particular features of colour discrimination (e.g., rapid changes near the white point) that the model captures less well?

      This comment bears conceptual similarity to Reviewer 1’s question about the hyperparameters of our prior, as it is basically asking whether we might be oversmoothing through the choice of form and number of basis functions. The hyperparameter sweeps we now present suggest that within the choice of basis functions we used, we are operating at a reasonable point on the bias-variance tradeoff curve - we can see bias emerging with a smoother prior, and variance increasing with a less smooth prior. Our expectation is that varying the smoothness of the prior in other ways, such as by varying the form and number of the basis functions, would lead to similar tradeoffs.

      We did perform one additional check that shows, within our current framework, that adding more basis functions is unlikely to change things much. This was to plot the fit weights as a function of Chebyshev basis order (Figure S4 in Appendix 2). These decline to near zero at the highest order we used, suggesting that adding more would not alter the inferred psychometric field, given our hyperparameter choices. Although we could explore this question further by explicitly fitting the data using more basis functions along with different hyperparameter choices, or different functional forms for the basis functions, we decided not to pursue this in favor of performing the other additional analyses we now present.
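
      The logic of this check can be sketched in one dimension. The example below is only an illustrative analogy, not the WPPM fit itself: it assumes a stand-in smooth function (`np.exp`), fits it by ordinary least squares with Chebyshev polynomials up to order 8, and inspects how the absolute weights fall off with order. The function, the grid, and the maximum order are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

# Stand-in smooth function sampled on the [-1, 1] model space (illustrative only).
x = np.linspace(-1.0, 1.0, 200)
y = np.exp(x)

# Least-squares fit with Chebyshev polynomials T_0 ... T_8.
weights = cheb.chebfit(x, y, deg=8)
abs_weights = np.abs(weights)

# For a smooth target, the weights shrink rapidly with basis order, so adding
# still higher orders would barely change the fitted function.
for order, w in enumerate(abs_weights):
    print(f"order {order}: |weight| = {w:.2e}")
```

      If the highest-order weights were not close to zero, that would be a sign that the basis is truncating structure present in the data.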

      We share the reviewer’s concern that assuming smoothness, both by assuming that isoperformance contours are elliptical and by assuming that these vary smoothly with reference, might cause us to miss features of the true underlying field in cases where that field varies rapidly or the isoperformance contours are asymmetric or non-elliptical. Our approach to this was to measure the validation thresholds and demonstrate that any bias in our WPPM-inferred field is small for these measurements. Because we shared the reviewer’s intuition that the adapting point is a candidate location where there might be less smooth variation, we measured a validation threshold at this reference for every subject. Nonetheless, we only measured in one direction around the adapting reference for each subject. We considered validation approaches where we measured full ellipses at a set of validation references, but we were worried about effects of uncertainty reduction and perceptual learning, which might distort thresholds at highly sampled locations.

      It is the case that if one wanted to study the discrimination field in more detail around a particular reference, one could concentrate trials in a smaller model space around that reference, and for the same number of trials use a prior with less smoothness relative to the underlying stimulus space. Indeed, simply halving the size of the stimulus space that maps onto the [-1,1] model space and keeping the same prior over the model space effectively halves the degree of smoothness expressed with respect to the stimulus space. Thus our methods could prove useful in studying more rapid variations in the discrimination field if one hypothesized that they might occur around particular reference choices, but this would still rest upon the elliptical assumption. To relax that assumption, one could use the threshold field estimation methods implemented in AEPsych, which incorporate a smoothness assumption but do not assume elliptical isoperformance contours. Weakening the prior in this way would, however, increase trial demand to obtain similar measurement precision.
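
      The scaling argument can be made concrete with a one-dimensional sketch. It assumes a simple linear map from stimulus coordinates onto the [-1, 1] model space and uses the number of zeros of a single Chebyshev polynomial per stimulus unit as a crude proxy for how rapidly the basis can vary. The helper names `to_model_space` and `zero_crossings_per_stimulus_unit` are hypothetical and are not part of the WPPM code.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def to_model_space(x, center, half_width):
    """Linearly map stimulus coordinates onto the [-1, 1] model space."""
    return (np.asarray(x, dtype=float) - center) / half_width


def zero_crossings_per_stimulus_unit(order, half_width):
    """Zeros of the Chebyshev polynomial T_order per unit of stimulus space."""
    coeffs = np.zeros(order + 1)
    coeffs[order] = 1.0                      # selects T_order
    n_roots = len(cheb.chebroots(coeffs))    # T_order has `order` roots in [-1, 1]
    return n_roots / (2.0 * half_width)      # stimulus window width is 2 * half_width


# Same basis order, but the stimulus window mapped onto [-1, 1] is halved.
full = zero_crossings_per_stimulus_unit(order=4, half_width=1.0)
half = zero_crossings_per_stimulus_unit(order=4, half_width=0.5)
print(full, half)
```

      With the basis held fixed in model space, halving the stimulus window doubles the variation available per stimulus unit, which is the sense in which the same prior becomes half as smooth with respect to the stimulus space.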

      As a general matter, we don’t think it is possible to leverage smoothness for trial efficiency on the one hand and at the same time be completely sure that there isn’t some aspect to the underlying ground truth that has been smoothed over. Carefully choosing the degree of prior smoothness together with the number of experimental trials in the context of a particular content problem is an important part of bringing the WPPM and related methods to bear, and one where simulation and held-out data both play an important role.

      We now bring these points out more fully in the paper (Discussion / Extensions of the WPPM framework; Discussion / Prior specification).

      Chen, C.-C., J. M. Foley and D. H. Brainard (2000). "Detection of chromoluminance patterns on chromoluminance pedestals I: threshold measurements." Vision Research 40(7): 773–788.

      The substantial individual differences evident in Figure S20 (comparison with Krauskopf and Gegenfurtner, 1992) are interesting in this context. Some observers show radial biases for the discrimination ellipses away from the white point, some show biases along the negative diagonal (with major axes oriented parallel to the blue-yellow axis), and others show a mixture of the two biases. Are these genuine individual differences, or could the model be performing less accurately in this desaturated region of colour space?

      We agree that these differences are interesting. We have now added more complete bootstrapped confidence regions in these (Appendix 8) and the other comparison figures (Appendix 6, 7, 9), so that an estimate of measurement precision is directly available in these figures. These confidence regions suggest that the individual differences in this region of color space are real. A longer-term goal is to develop more mechanistic models that can account for individual subject data through parameter choice. This might lead to insight into what differs in the visual system across individuals.

      Reviewer #3 (Public review):

      Summary:

      This study presents a powerful and rigorous approach for characterizing stimulus discriminability throughout a sensory manifold, and is applied to the specific context of predicting color discrimination thresholds across the chromatic plane.

      Strengths:

      Color discrimination has played a fundamental role in studies of human color vision and for color applications, but as the authors note, it remains poorly characterized. The study leverages the assumption that thresholds should vary smoothly and systematically within the space, and validates this with their own tests and comparisons with previous studies.

      Weaknesses:

      The paper assumes that threshold variations are due to changes in the level of intrinsic noise at different stimulus levels. However, it's not clear to me why they could not also be explained by nonlinearities in the responses, with fixed noise. Indeed, most accounts of contrast coding (which the study is at least in part measuring because the presentation kept the adapt point close to the gray background chromaticity, and thus measured increment thresholds), assume a nonlinear contrast response function, which can at least as easily explain why the thresholds were higher for colors farther from the gray point. It would be very helpful if a section could be added that explains why noise differences rather than signal differences are assumed and how these could be distinguished. If they cannot, then it would be better to allow for both and refer to the variation in terms of S/N rather than N alone.

      We agree with the reviewer. We are measuring SNR and attributing it to noise, but cannot identify from the data whether changes in SNR across color spaces are due to changes in noise, to a nonlinear relationship between stimulus space and the observer’s response space with noise in the response space held fixed, or both. We now make this point where we introduce the Results / Wishart Process Psychophysical Model and reiterate it in the Discussion / Extensions of the WPPM framework.
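
      A one-dimensional sketch illustrates why the two accounts cannot be separated from threshold data alone. The symbols are assumptions introduced here for illustration and are not the paper's notation: an internal response $r = f(x) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma(x)^2)$, a small stimulus increment $\Delta$, and a criterion sensitivity $d'_{\mathrm{crit}}$ that defines threshold.

```latex
% Hypothetical 1D illustration; f, \sigma, and d'_{\mathrm{crit}} are illustrative symbols.
\begin{align}
d'(x, \Delta) &\approx \frac{|f'(x)|\,\Delta}{\sigma(x)}, \\
\Delta_{\mathrm{thr}}(x) &\approx d'_{\mathrm{crit}}\,\frac{\sigma(x)}{|f'(x)|}.
\end{align}
```

      Under these assumptions the threshold depends only on the ratio $\sigma(x)/|f'(x)|$. A linear response with stimulus-dependent noise $\sigma(x)/|f'(x)|$ and a nonlinear response $f$ with noise $\sigma(x)$ therefore predict the same thresholds.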

      Related to this point, the authors note that the thresholds should depend on a number of additional factors, including the spatial and temporal properties and the state of adaptation. However, many of these again seem to be more likely to affect the signal than the noise.

      We don’t disagree. Indeed, as we noted in our response to a comment by Reviewer 1 and above in the context of individual differences, we are very interested in developing a mechanistically plausible model that accounts for the data. If we or others are able to do so, that would provide a basis for parsing performance into separate signal and noise effects. And if such a model has natural ways in which additional variables affect its predictions, measuring the effects of these variables would be a way to provide evidence in favor of the model (Discussion / Implication for the mechanisms of color perception - Extensions of the WPPM framework).

      An advantage of the approach is that it makes no assumptions about the underlying mechanisms. However, the choice to sample only within the equiluminant plane is itself a mechanistic assumption, and these could potentially be leveraged for deciding how to sample to improve the characterization and efficiency. For example, given what we know about early color coding, would it be more (or less) efficient to select samples based on a DKL space, etc?

      The more we are willing to assume about the structure of the psychometric field, the more efficiently we can measure it. As the reviewer correctly notes, this principle applies to trial placement as well. We are currently using an adaptive method (AEPsych) that starts with a fairly weak smoothness prior and attempts to place trials using heuristics that aim to minimize the expected uncertainty in the posterior. As we learn more about the discrimination field, we should be able to leverage stronger priors to increase trial efficiency. This point is closely related to one we made above about developing stronger priors that capture what we have learned in this study. Such priors could also help improve trial placement. For a prior that has a relatively small number of parameters, for example, perhaps a mechanistic prior, methods such as Quest+ (Watson, 2017) may be used for trial placement.

      Watson, A. B. (2017). "QUEST+: A general multidimensional Bayesian adaptive psychometric method." J Vis 17(3): 10.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not think that the authors need to perform additional experiments. However, I would like to see some additional analyses regarding the assumptions made in the fitting procedure and how they affect the final maps.

      I also think some more quantitative comparisons with historical data would be valuable - at the moment, a lot of the comparisons are simply 'by eye'.

      It would have been nice to have the code and data available during the review procedure - I'm sure these will be released with excellent documentation?

      We addressed the first two points in the public review section. The code and data are now available online. The links are provided in the paper (Methods and Materials / Data and code availability).

      Reviewer #2 (Recommendations for the authors):

      Minor points

      I have a few suggestions for additions and small changes.

      (1) Several examples of covariance matrix fields are shown in Figures 1 and 4, but these are for simulated examples. It would be nice to see the fields actually fit the data! I would be interested in seeing this for all participants in an Appendix, and maybe for participant CH in the main paper?

      We have made the changes (see Figure 4 and Figure S3).

      (2) I have not worked through all the math in the appendices line by line, but it seems to be complete, and the model validation results speak for themselves. I think the authors have done a pretty good job of explaining the model conceptually (not easy), but I struggled with the 'weighted sum' step in Figure 4 and the main text. I would appreciate a bit more hand-holding here, e.g., why is an 'overcomplete' representation needed as an intermediate, and providing an intuition of why there are 12 matrices in the overcomplete representation and what each matrix in this representation represents.

      We have now added more explanations in the figure legend and text (Fig. 4 and Methods and Materials / The Wishart Process Psychometric Model).

      (3) Individual differences: There is a section on this in the manuscript, and it's concluded that there are only "modest" individual differences. However, in Figure S20, the individual differences, I think, are huge and place observers almost in qualitatively different categories! Some observers show a radial bias in discrimination ellipses, others seem to show basically a bias along the negative diagonal, and others a mixture of both biases. These ellipses are at a desaturated part of colour space - is it possible that there are some rapid changes in the underlying noise in this region that the Wishart fit has not captured due to relatively sparse sampling or the fact that the basis functions are all fairly low spatial frequency? I wondered whether the results are constrained by the choice of Cartesian rather than polar basis functions, e.g., polar basis functions may have better allowed fine-grained changes near the white point but slower changes at higher saturations away from the white point.

      We agree that the individual differences are meaningful and, in some cases, quite pronounced. Our intent in describing the differences as “modest” was to emphasize that the overall structure of the psychometric fields remains broadly consistent across observers. We have revised the Results to note and more fully describe these differences.

      Regarding the possibility that sharp changes in the underlying noise near the achromatic point might not be fully captured by the current model, we agree that this is an important consideration. The current implementation uses relatively low-order Chebyshev basis functions that primarily capture smooth global variations in the psychometric field. While validation analyses indicate that these basis functions capture the dominant structure in the data, they may be less sensitive to sharp local variations such as those that could occur near the white point. Future work could address this by mapping the model space to a smaller region around the achromatic reference or by exploring alternative basis sets (e.g., polar or Zernike functions) that may better capture such localized structure. This is discussed above in this response and now addressed in Discussion / Extensions of the WPPM framework.

      On sampling, I wondered if the results might have been biased by the strongly biased ellipse that occurs at the grey point. If not, and the model is accurate in this region of colour space, I think this figure does show some large individual differences, and it would be good to comment on these in the individual differences section of the manuscript.

      Based on our analysis of trial placement (Fig. S1), the adaptive algorithm does not appear to have disproportionately concentrated trials near the gray point. In fact, more trials were allocated to the edges of the stimulus space than to the center. This suggests that the WPPM estimates are unlikely to be driven primarily by performance in the gray region. In addition, we examined the threshold ellipses around the gray reference in DKL space and found that they are broadly consistent across participants (Figs. S22–S23). Together, these analyses suggest that the anisotropy observed near the gray point reflects a genuine property of the psychometric field rather than an artifact of the sampling procedure.

      As noted just above, we have added additional text about individual differences in the Results and referenced it in the Discussion.

      (4) The manuscript seems unusually free of typographical errors, but I noticed that in many places "Krauskopf and Karl 1992" is cited! Also, I think something has gone wrong with the legend to Figure 2 - perhaps the order of panels was swapped around, but the legend was not fully updated. There is a repeated reference to the "summary of regression slopes" which seems to be in 2 positions, after C and G. It would make more sense to label panel G as D and progress from there, or switch the order of the panels so that G is on the bottom row.

      Thank you for catching those errors. They are now fixed.

      Reviewer #3 (Recommendations for the authors):

      A minor point (or perhaps major if your last name is Gegenfurtner) is that the reference to Krauskopf and Karl is incorrect.

      This is now fixed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In the paper, the authors compare the performance of their new version to two previous approaches. Figure 2b shows that the new toolbox performs similarly to the previous deep-learning-based toolbox, but requires only an anatomical scan, which is a significant improvement. They also compare it to an older method that uses an atlas without requiring deep learning. For eccentricity and pRF size predictions, both deep-learning methods perform better than the older approach. For polar angle, a critical parameter for delineating visual field maps, the gain is substantially less. Moreover, the comparison to the atlas method (Benson2014) is not entirely fair, as, to our knowledge, there is also a more advanced atlas version that uses Bayesian fitting methods and already performs better than the old method. To better understand the gain of using deep learning, it would be beneficial if the authors also made the comparison to this more recent atlas-based approach. Moreover, it would be useful to know the correlations for the representative participant. Some examples of relatively "bad" maps would also be useful to have (and could be provided as supplementary information).

      We thank the reviewer for their constructive feedback. We plan to expand our benchmarking section to include the Bayesian model comparison. Note, however, that the additional accuracy gain afforded with the Bayesian model of retinotopy (Benson and Winawer, 2018) results from combining anatomical data with retinotopic maps estimated with a few minutes of functional data. The Bayesian model of retinotopy without such functional data is equivalent to Benson14. We plan to report the correlations (between predicted and empirical maps) for the representative participant shown in Figure 2 and include an additional supplementary figure showing retinotopic map predictions for a participant whose predictions deviate the most from empirical maps, as suggested by the reviewer.

      Figure 2b shows that the toolbox is quite good at estimating eccentricity and polar angle parameters, but less good at estimating the population receptive field (pRF) size. I will return to this latter point.

      An interesting feature is that while the toolbox is trained on a specific data set (HCP), it can, "out-of-the-box", be applied to different existing data sets, without the need to retrain the model. This is quite important for the general utility of the method. The results for this are shown in Figure 3. Again, in panel b, it can be seen that the toolbox does a good job at estimating eccentricity and polar angle values, but performs rather poorly for pRF size: the deepRetinotopy toolbox has a strong tendency to only estimate very small pRFs, particularly when applying it across different datasets. For this reason, at the moment, these estimates appear hardly useful. It would be very helpful for readers if the authors could clarify or elaborate on this point, particularly regarding the limitations of pRF size predictions. They explain that this could be due to the use of different types of stimuli, but even within the same (HCP) dataset, the predictions primarily suggest tiny pRFs, even though the training dataset also contains larger ones (which can be better seen in supplementary Figure 4). Showing the predictions for higher-order brain areas, which have larger pRFs on average, could serve a similar evaluation purpose. Presumably, the underlying reasons are complex and could relate to the use of different stimuli, different analysis toolboxes, and how the deep learning model is currently being trained. Possibly, the abundance of small pRFs at lower eccentricity in the training set (which is usually the case in any empirical analysis) has given the model a very strong bias toward predicting small pRFs.

      There would be various ways to verify which of these components is critical. For example, the model could be trained only on the bar stimuli of the HCP dataset, or the pRFs for all stimuli and datasets could be estimated using the same software tool. The latter seems important. For example, Supplementary Figure 4 indicates a high correlation between the Stanford and NYU cohorts that have used the same stimulus and analysis package, despite having different resolutions and scanners. Further investigation into the underlying reasons for these discrepancies would strengthen the paper. It would also provide valuable guidance for users of the toolbox on which toolbox predictions to trust and which not, as well as how well the model generalizes to other stimulus types, scanners, and image resolutions.

      We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, analysis toolboxes used to estimate pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. As the reviewer pointed out, the underlying reasons are complex, and it is difficult to isolate all the potential contributing factors. However, in addition to our expanded discussion, we also intend to present results from additional experiments that assess the impact of different loss functions on the range of predicted pRF sizes (to explain how training may partly account for the differences observed in the HCP dataset). We will also perform pRF fitting on at least one dataset using the same software/encoding model as in the HCP dataset (the training data) to illustrate that the lower performance in pRF size prediction in out-of-distribution datasets is also partly explained by differences in how the empirical maps were obtained.

      An aspect that is not directly apparent from the title, abstract, and introduction is that the deepRetinotopy toolbox does not by itself produce estimates of visual area labels or boundaries. It predicts only polar angle and eccentricity values. To predict labels and boundaries, the authors combine the toolbox with an atlas (the aforementioned Bayesian atlas). For visual areas V1 - V3, it does a very good job, in that the predictions are as good as the empirical ones. Notably, the authors indicate that the predictions for V2 and, in particular, V3 are worse than for V1, but Figure 4 clearly shows that predictions are as good as the empirical ones. More cannot be expected from a model that is trained on such empirical data.

      We will edit the introduction and abstract to make it clearer that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own.

      Irrespective of the limitations with respect to predicting pRF size, the toolbox opens up functionally oriented analyses of very large cohorts of healthy participants, of which only anatomical data is available. The authors present an example of this by confirming the existence of differences in horizontal and vertical asymmetries in the field maps of the visual cortex of children and adults. While Figure 5 confirms the existence of differences, the analysis could be expanded to provide deeper insights, such as normalized developmental trajectories for both asymmetries, given the size of the dataset. This would better highlight the true power of their approach.

      Although providing insights into developmental trajectories for horizontal and vertical asymmetries is beyond the scope of the current work, as it would require aggregating datasets such that individuals’ ages span a larger range (the ABCD dataset only contains individuals between 9 and 11 years old and the HCP Young Adult dataset between 22 and 36 years old), we plan to provide some complementary analyses (differences across ages and sex within the ABCD dataset).

      While the authors address limitations with respect to studying experience-dependent atypical functional organization, they do not address how the deepRetinotopy toolbox would handle (acquired) brain lesions. Addressing this, even if only speculative, would be welcome. Another welcome addition would be to see the predictions for additional brain areas, even if those would (presumably) be worse at present. Such information would nevertheless be essential for users considering applying this toolbox. Moreover, this could be a valuable resource serving as a benchmark for future iterations of either deepRetinotopy or other approaches.

      We plan to expand and report performance evaluation across other visual areas (using Wang atlas’ parcels) to serve as a benchmarking resource. Moreover, we will expand our discussion on how deepRetinotopy would handle brain lesions.

      Reviewer #2 (Public review):

      (1) The weak point of the contribution is the choice to limit anatomical quality assessments and error quantifications to just three early regions, V1-V3, even though the deepRetinotopy toolbox can delineate over 20 regions (including parietal, ventral, and lateral regions, such as IPS0-5, hV4, VO1-2, V3A, PHC1-2, LO1-2, and TO1-2).

      (2) The limit is fine for their large-scale application of the toolbox to age groups, as here, a clear hypothesis on early cortex variability was tested.

      (3) However, the introduction of the toolbox itself warrants quality assessments and comparisons to prior models and ground truth beyond V1-V3, just like the authors did in their prior publication of the predecessor model.

      (4) This is important as the vast majority of applications of this toolbox will likely go beyond V1-V3 to delineate dorsal, ventral, and lateral regions.

      (5) For the present paper, this will require only 1 or 2 additional figures, or extending their present figures 2 and 4 along the lines of their previous figure 7 (Ribeiro et al 2021), which included error measures for high-level regions. Ideally, you provide sub-graphs separately for early visual, dorsal, ventral, and lateral regions.

      (6) Going beyond V1-V3 is important for several reasons: first, future studies applying the software beyond V3 will need quantification for reassurance and justification. Second, for the sake of transparency, even if results are noisy or on par with prior models. Third, as a benchmark or reference point for future approaches.

      We thank the reviewer for their constructive feedback, and we agree that expanding our performance assessment beyond V1-3 would be a valuable benchmarking resource. Thus, we plan to evaluate retinotopic map prediction accuracy across visual areas defined by the Wang atlas’ parcels, expanding on the results reported in Figure 2, and provide it as a supplementary figure. However, performance estimation ultimately depends on the quality of the dataset used for evaluation. The empirical maps, although treated as ground truth, may themselves misrepresent the underlying retinotopic organization. As a matter of fact, the quality of the empirical data (HCP dataset and others) is indeed lowest in some of the higher-order visual areas.

      It may be unclear from the text that the deepRetinotopy toolbox does not yet produce estimates of visual boundaries on its own. Accordingly, we illustrate how deepRetinotopy toolbox’s predictions can be combined with another tool [the Bayesian model of retinotopy from Benson and Winawer (2018)] to obtain visual area boundaries automatically. We will edit the introduction and abstract to make it clearer. Given the availability of empirical labels (currently only for V1-3) and the segmentation tool (which was only assessed for V1-3), we cannot expand Figure 4 to other visual areas as suggested.

      Reviewer #3 (Public review):

      Quantification of the Analysis: My main concern is that the analysis relies heavily on global summary measures such as correlation and Dice score. Those measures are useful, but the paper would be more informative if it also quantified boundary differences in millimeters, especially for comparisons such as the V1/V2 boundary in Figure 2. That kind of analysis would help readers understand how large the errors are in physically meaningful terms.

      We thank the reviewer for their constructive feedback. Following the reviewer’s suggestion, we plan to expand our segmentation evaluation to quantify the extent to which boundary predictions from deepRetinotopy’s maps deviate from those from empirical maps, in millimetres.
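
      One simple way to express such a deviation in millimetres is a symmetric mean nearest-neighbour distance between the two sets of boundary vertices. The sketch below is a minimal illustration and not the analysis planned for the manuscript. It assumes boundary vertices are given as 3D surface coordinates in millimetres and uses straight-line Euclidean distance, whereas a geodesic distance along the cortical surface may be more appropriate. The function name `mean_boundary_distance` is hypothetical.

```python
import numpy as np


def mean_boundary_distance(pred_xyz, emp_xyz):
    """Symmetric mean nearest-neighbour distance between two boundaries.

    Both inputs are (n_vertices, 3) arrays of coordinates in millimetres.
    Distances are Euclidean, which only approximates distance along the surface.
    """
    pred = np.asarray(pred_xyz, dtype=float)
    emp = np.asarray(emp_xyz, dtype=float)
    # Pairwise distances: rows index predicted vertices, columns empirical ones.
    dists = np.linalg.norm(pred[:, None, :] - emp[None, :, :], axis=-1)
    pred_to_emp = dists.min(axis=1).mean()   # each predicted vertex to nearest empirical
    emp_to_pred = dists.min(axis=0).mean()   # each empirical vertex to nearest predicted
    return 0.5 * (pred_to_emp + emp_to_pred)


# Toy example: two parallel boundary segments 1 mm apart.
predicted = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
empirical = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(mean_boundary_distance(predicted, empirical))
```

      Averaging in both directions keeps the measure from rewarding a predicted boundary that covers only part of the empirical one.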

      Model fitting methods: I also think the discussion of prediction failures for pRF size should be more explicit. The mismatch is likely influenced by the fact that the training data and several evaluation datasets were fit with different models and different analysis software. In particular, the network was trained on non-linear size estimates from the HCP data, while the comparison datasets were derived using other packages and, in some cases, different model assumptions. That likely contributes to the spread in Figure 3b and should be discussed more directly. It is important to discuss that the pRF parameters were derived using different software tools.

      We will expand our discussion of the limitations of pRF size prediction, highlighting that differences in visual stimuli, different encoding models for estimating pRF parameters from empirical data, and the current training of deepRetinotopy affect prediction accuracy. In addition to our expanded discussion, we intend to also present results from additional experiments that assess the impact of those factors on pRF size prediction performance.

      Clarifying Model Accuracy: If deepRetinotopy generates a true "noise-removed" representation of functional mapping based on anatomy, then fitting it to one fMRI measurement should predict a second, independent fMRI run better than the noisy data from the first run does.

      The authors possess the exact data for this test. For the HCP dataset, the empirical fMRI data were explicitly separated into two halves: "fit 2" (the first half of the fMRI runs) and "fit 3" (the second half). They correlated these two halves to establish a "noise ceiling," the maximum possible reliability of the data. Looking at their results in Figure 2b, the correlation of the deepRetinotopy predictions falls below this noise ceiling. This means that the noisy functional Half 1 actually predicts functional Half 2 better than the anatomical model does.

      The authors should state this explicitly. A side-by-side plot of Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2 would show that the anatomical model regularizes map location well, but misses reliable subject-specific variation that anatomy alone cannot capture.

      We will expand our benchmarking section to make these comparisons (“Half 1 predicting Half 2 versus deepRetinotopy predicting Half 2”) more explicit. It is important to highlight that there is more subject-specific variation that is currently not captured by our model, and it can also serve as a benchmarking resource for future model versions and newer approaches.
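
      The comparison the reviewer describes can be sketched as follows. This is a minimal illustration under stated assumptions rather than the planned analysis: each map is a one-dimensional array of per-vertex values for one participant, agreement is summarized with a Pearson correlation (reasonable for eccentricity, whereas polar angle would need a circular measure), and the function name `compare_predictors` and the toy values are hypothetical.

```python
import numpy as np


def compare_predictors(half1, half2, model):
    """Correlate two candidate predictors with held-out empirical data (half 2)."""
    half1, half2, model = (np.asarray(a, dtype=float) for a in (half1, half2, model))
    return {
        "split_half": np.corrcoef(half1, half2)[0, 1],  # noise-ceiling estimate
        "model": np.corrcoef(model, half2)[0, 1],       # anatomy-based prediction
    }


# Toy per-vertex values, chosen only to make the example runnable.
half2 = [1.0, 2.0, 3.0, 4.0]
half1 = [1.0, 2.0, 3.0, 5.0]
model = [2.0, 2.0, 3.0, 3.0]
result = compare_predictors(half1, half2, model)
print(result)
```

      A model correlation below the split-half value would indicate reliable subject-specific variation that the anatomical prediction does not capture.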

      The Hemodynamic Response Function: The assumptions used to generate the original empirical maps are permanently baked into the deep learning model. However, the authors explicitly mention the hemodynamic response function (HRF) only once, noting in the Methods that the modeled time series was "convolved with a canonical hemodynamic response function."

      Beyond this single mention, there is no direct discussion of how the assumption of a single canonical HRF across all 161 HCP training subjects might have systematically impacted or biased the network's predictions. The authors address cross-dataset differences broadly under the umbrella of "experimental design" and "fMRI preprocessing pipeline" biases, but the HRF is a core biological property that mediates the connection between the anatomy and the data. The authors should explicitly discuss how this canonical assumption limits or biases the resulting deepRetinotopy network.

      As Reviewers 3 and 1 have noted, the observed limitations in pRF size prediction stem from multiple underlying factors. One of those factors is indeed the HRF assumed in the encoding models. We will expand our discussion about factors that may introduce biases into deepRetinotopy predictions, including the HRF.
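
      To illustrate why the HRF assumption matters, the sketch below convolves a stimulus time course with a double-gamma response function. It is an illustration under stated assumptions, not the encoding model used to produce the training maps: the double-gamma form and its shape parameters (6, 16, undershoot ratio 1/6) are one common convention chosen here for the example, and `double_gamma_hrf` and `predict_bold` are hypothetical helper names.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0,
                     undershoot_ratio=1.0 / 6.0):
    """Double-gamma response at time t (seconds); parameters are assumed values."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    undershoot = (t ** (undershoot_shape - 1) * math.exp(-t)
                  / math.gamma(undershoot_shape))
    return peak - undershoot_ratio * undershoot


def predict_bold(stimulus, tr=1.0, hrf_length_s=32.0, **hrf_kwargs):
    """Causal discrete convolution of a stimulus time course with the HRF."""
    kernel = [double_gamma_hrf(i * tr, **hrf_kwargs)
              for i in range(int(hrf_length_s / tr))]
    predicted = []
    for n in range(len(stimulus)):
        total = 0.0
        for k, h in enumerate(kernel):
            if n - k >= 0:
                total += stimulus[n - k] * h
        predicted.append(total)
    return predicted


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# A 10 s stimulus block starting at t = 5 s, sampled once per second.
stimulus = [0.0] * 5 + [1.0] * 10 + [0.0] * 25
canonical = predict_bold(stimulus)
slower = predict_bold(stimulus, peak_shape=8.0)  # hypothetical slower-peaking HRF

canonical_peak = argmax(canonical)
slower_peak = argmax(slower)
print("canonical HRF response peaks at sample", canonical_peak)
print("slower HRF response peaks at sample", slower_peak)
```

      The same stimulus yields differently timed predictions under the two response functions, so a model fitted with a single canonical HRF attributes any subject-specific timing difference to other parameters. This is the kind of bias that empirical maps fitted this way could pass on to a network trained on them.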

      Scoping the Input Data and Normative Use: The authors use FreeSurfer to generate a mean curvature map for the entire midthickness cortical surface. This full-hemisphere curvature map is resampled to a standard template surface space (32k_fs_LR), acting as the data frame that feeds input features into the neural network. However, while the network receives the full geometric structure of the hemisphere, it is explicitly trained to predict retinotopic parameters only within a restricted posterior ROI, based on the Wang et al. atlas and containing roughly 3,200 vertices per hemisphere.

      A useful experiment to try, and perhaps the authors have already considered this, would be to restrict the input features exclusively to the posterior vertices. Including all anterior vertices may make it harder for the network to fit the localized visual data. A brief commentary on why the full hemisphere was retained as input could be highly informative for researchers adapting this geometric deep learning pipeline.

      Thanks for this suggestion. We have not performed a systematic evaluation of using ROIs that span a larger portion of the cortex (including the full hemisphere). It is a great idea to do so and report it in our manuscript to inform other researchers interested in adapting our pipeline. We intend to also update our toolbox by retraining our models to take all posterior vertices as suggested, which would improve the coverage of current predictions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful consideration of our work and constructive comments. We are glad that reviewers appreciated the rigor and value of our work. In response to the reviewer comments we have made the following changes:

      (1) Addition of new experiments on EndoA localization at the Drosophila NMJ (Fig. 2).

      (2) Addition of new experiments on Dap160 localization at the Drosophila NMJ (Fig. 2).

      (3) Addition of new experiments to validate Dynamin, Dap160 and EndoA antibodies (Fig. 2 – figure supplement 1).

      (4) Assessment of the activity-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 3).

      (5) Assessment of the liprin-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 8).

      (6) Addition of a limitations section to the discussion to directly address that spontaneous release was not fully ablated in our studies and might contribute to recruitment.

      (7) Addition of an outlook to the same section on what experimental avenues could address the limitations in the future.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to transient activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α. Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by distinct upstream organizers compared to active zone proteins. It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.

      We thank the reviewer for the positive assessment of our study.

      Strengths:

      The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.

      We thank the reviewer for highlighting the technical strength of our work.

      Weaknesses:

      One notable limitation, however, is the absence of interrogation of endocytic proteins previously suggested to be recruited in an activity-dependent manner, in particular, endophilin.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high-frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for the differences between our findings and previous work on Endophilin, which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord versus Drosophila NMJ terminals.” Further, that study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together with our work, these data suggest that Endophilin constitutively, but not completely, localizes to the periactive zone.

      Reviewer #2 (Public review):

      Summary:

      This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using a combination of genetic and pharmacological perturbations in Drosophila and mouse neurons, the authors show that proteins such as Dynamin, Amphiphysin, AP-180, and others are still recruited to periactive zones even when evoked release or active zone architecture is disrupted. While the results are mostly negative, the study is methodologically solid and contributes to a more nuanced understanding of synaptic vesicle recycling machinery.

      We thank the reviewer for deeming our work solid and for highlighting its importance for the field.

      Strengths:

      (1) The experimental design is careful and systematic, covering both fly and mammalian systems.

      (2) The use of advanced genetic models (e.g., Liprin-α quadruple knockout mice) is a notable strength.

      (3) High-resolution imaging (STED, Airyscan) is well used to assess spatial localization.

      (4) The findings clarify that certain core assumptions - such as strict activity dependence of endocytic recruitment - may not hold universally.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      (1) The study would benefit from a clearer positive control to demonstrate activity-dependent recruitment (e.g., Endophilin).

      We have added experiments to measure the localization of Endophilin, a protein previously reported to localize to the synaptic vesicle cloud [1], in Drosophila NMJs (Figs. 2 and 3). We observed that EndoA localized both to the synaptic vesicle cloud and to the periactive zone area. While stimulation did not enhance levels in either compartment, this outcome is not inconsistent with shuttling of protein between compartments during activity. Nevertheless, our data support a model in which EndoA, like the other tested endocytic proteins, is present at the periactive zone at rest.

      (2) The reliance on Tetanus toxin in the Drosophila NMJ experiments in my eyes is a limitation, as it does not block all presynaptic fusion events; this should be discussed more directly.

      We agree with the point of the reviewer. To more directly discuss it, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” (519-523).

      (3) The potential role of Dynamin in organizing other periactive zone proteins is not addressed and could be an important next step.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Some small changes in protein levels upon silencing are reported; their biological meaning (e.g., compensation vs. variability) is not fully clarified.

      These changes might include homeostatic adaptations. In the revised version of the manuscript, this is addressed on lines 135-137 and 405-407. We think it is overall difficult to assign biological meaning to small-magnitude changes, and chose to highlight the main point that there are no large-magnitude changes.

      (5) While alternative organizing mechanisms (actin, lipids, adhesion molecules) are mentioned, a more forward-looking discussion of how to test these models would be helpful.

      Following the reviewer’s suggestion, we have added an outlook section to the discussion where we provide suggestions for future studies (lines 510-543).

      (6) The authors should consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We have included new experiments on EndoA at the fly neuromuscular junction (Fig. 2, Fig. 3, Fig. 8, Fig. 3 – figure supplement 1) and have added appropriate discussion of these findings as outlined above.

      Reviewer #3 (Public review):

      Summary:

      This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.

      We thank the reviewer for reviewing our work.

      Strengths:

      The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.

      We are confident that our methods are sensitive enough to detect changes within synaptic compartments. First, for mouse neurons assessed with STED microscopy, we have demonstrated that we can distinguish between the N- and the C-termini of the presynaptic protein Bassoon, which are positioned only a few tens of nanometers apart [4]. We have subsequently been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart and have established that genetic manipulations of active zone proteins induce detectable disruptions as assessed by STED microscopy [4-12]. Given that the periactive zone is larger than the distances that we can resolve, we are confident that we can detect changes in this area with enough sensitivity. Second, for Drosophila NMJs, we use a carefully validated workflow that allows assessing the distribution of periactive zone proteins and can detect subtle changes [13]. Unfortunately, there are no known manipulations that lead to periactive zone disassembly that could serve as a positive control, which reflects the little knowledge available in this field. We acknowledge that there may be subtle changes in protein localization that escape the resolution of our microscopy methods or experimental design, but this would not undermine the conclusion that the periactive zone remains assembled across the manipulations that we have tested. Overall, none of the manipulations we test induces a detectable disruption of the periactive zone. Naturally, we cannot exclude milder effects and have added a limitations section to discuss this possibility and some of the subtle changes we observe.

      This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.

      We thank the reviewer for the support of the conclusion of our study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      This is a rigorous study that, while presenting largely negative data, delimitates the processes that control peri-active zone organization. In addition to the interpretive and technical comments below, we encourage the authors to consider extending this study in two areas. First, examining the activity-dependence of Endophilin, and perhaps other factors, being recruited to the PAZ, where previous research has indicated a positive role for activity. Second, further characterization of the role of miniature release events in potentially contributing to PAZ organization. Overall, this was a rigorous and well-executed study.

      We thank the reviewing editor for this positive assessment of our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The rationale for comparing chronic inhibition to acute depolarization could be more clearly articulated. While this approach may be grounded in prior studies, the physiological consequences of chronic silencing differ markedly from those of transient activity, and these distinctions should be more explicitly addressed in the interpretation of results. For example, might lower intensity, chronic stimulation be a better comparison? Since fixation takes place immediately after stimulation, the time window to capture changes in protein recruitment may be curtailed.

      We thank the reviewer for this comment. The introduction of the manuscript now includes a rationale on lines 110-112. By inhibiting evoked synaptic vesicle fusion throughout the lifespan of neurons, we assessed whether this process is necessary for periactive zone assembly and concluded that it is not a requirement. By acutely depolarizing neurons with 50 mM KCl or with a 40 Hz train of action potentials, we were able to test whether synaptic vesicle fusion triggers the rapid recruitment of endocytic proteins to the periactive zone and concluded that this is not the case for most of the endocytic proteins that we studied. While these results indicate that a constitutive pathway must exist to assemble the periactive zone, we remain agnostic as to whether stimulation paradigms not tested in our study can enhance the deployment of endocytic proteins, especially over long periods of time. This may be the case for low, chronic stimulation, as suggested by the reviewer. We clarify these limitations in a “limitations and outlook” section of the discussion (lines 510-543).

      (2) Amphiphysin stood out as the only protein showing a notable change in opposite directions under either active zone protein knockout/blockers and Liprin-α knockout. Given the predominance of negative results, it would be valuable to devote more discussion to why Amphiphysin behaves differently. What functional role might it play in this context that sets it apart from other endocytic components?

      As suggested by the reviewer, we have extended the discussion on Amphiphysin. One possibility why Amphiphysin may respond differently to different genetic manipulations or changes in stimulation is that different endocytic proteins might belong to different endocytic submachineries. This is addressed on lines 421-424. On lines 444-449, we further discuss the subtle decrease in the levels of Amphiphysin and AP-180 in Liprin-α mutants. We suggest that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus, and that this link may be partially disrupted in Liprin-α mutants. Overall, we note that Amphiphysin is still localized to the periactive zone at rest, and hence that it fits with the overall model of constitutive deployment that we propose.

      (3) The claim of activity-independence may need to be nuanced. Although the data suggest no recruitment in response to acute stimulation, the subtle changes following chronic inhibition complicate this interpretation, especially when considering redundancy. If activity-dependence is considered bidirectional, these findings might reflect a more complex regulatory mechanism. The interpretation in lines 188-190 more accurately captures this complexity than earlier generalizations.

      We agree with the reviewer that the dependence on activity should be discussed in a nuanced fashion. We have scrutinized the manuscript on this point and state throughout that recruitment is independent of evoked activity and not necessarily of any kind of activity. We believe that this interpretation is accurate because evoked release of neurotransmitter was ablated by the pharmacological and genetic manipulations that we used. Furthermore, we have included a “Limitations of the study” section in the discussion where we openly address that spontaneous fusion of synaptic vesicles cannot be ruled out as a potential mechanism to sustain periactive zone assembly (lines 514-523). Finally, we have expanded on the complexity of periactive zone assembly relative to activity. In particular, homeostasis may contribute to increased levels of endocytic proteins upon chronic blockade of evoked transmission (lines 404-406).

      (4) Given published work on endophilin's role in activity-dependent endocytic recruitment, adding endophilin (at least in the Drosophila NMJ experiments) would be highly informative.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for these findings compared to previous work on Endophilin [3], which we discuss on lines 407-410:

      “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, that study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are compatible with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (5) Line 57 might have a typo in the citation.

      We thank the reviewer for pointing this out. The citations now include: Bai et al., 2010; Jiang et al., 2024; Koh et al., 2007; Winther et al., 2013; and Winther et al., 2015. Please note that the last two citations are grouped as Winther et al., 2013, 2015, following our formatting style.

      (6) Line 208 might be missing a citation that justifies parameters.

      In the revision, this information is discussed on lines 222-224, where we cite our prior work describing these data: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023)”.

      Reviewer #2 (Recommendations for the authors):

      (1) Please consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for the differences between our findings and previous work on Endophilin [3], which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, that study finds that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are consistent with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (2) Expand the discussion of TeNT's limitations-specifically that it does not block spontaneous fusion or alternative fusion pathways-and consider referencing more stringent tools (e.g., Botulinum toxins or SNARE mutants), even if they weren't used here.

      Following the reviewer’s suggestion, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017)” (520-523).

      (3) We encourage the authors to briefly discuss whether Dynamin might contribute to periactive zone structure beyond its role in membrane fission. Loss-of-function data could be particularly informative in future work.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Clarify the interpretation of increased endocytic protein levels upon chronic silencing - are these interpreted as homeostatic responses or experimental variability?

      We suggest that these changes might include homeostatic adaptations. We note that this increase is of the same magnitude as the increase in active zone proteins following a similar pharmacological manipulation on lines 405-406, where we state that “a mechanism for this effect might be a homeostatic response (Wen and Turrigiano, 2024) similar in magnitude to the increase in active zone protein levels following activity blockade (Held et al., 2020).”

      (5) The Discussion could be strengthened by sketching out more concrete experimental approaches to test candidate mechanisms (e.g., roles for actin, lipids, adhesion molecules) in organizing periactive zones.

      The potential roles of the cell adhesion molecules (lines 430-440), cytoskeleton and lipids (442-452) are addressed in the discussion. Furthermore, following the reviewer’s suggestion, we have added the following statement (lines 541-543): “This work builds a foundation to assess alternative mechanisms and models of periactive zone assembly, including roles of the cytoskeleton, lipids, adhesion molecules, and intrinsic endocytic protein interactions”. We hope that the reviewer agrees that the discussion of our paper is not the right format to provide a concrete experimental plan for future work. In our view, the discussion should put the findings of our experiments in the context of the field.

      Reviewer #3 (Recommendations for the authors):

      (1) At a spine synapse, the endocytic zone is estimated to be between 100-200 nm from the active zone. The focus of the authors' analysis is largely outside of this region (0-150 nm), raising the question of whether the area studied may be outside of the area affected by the manipulations made. While STED systems claim ~80 nm resolution, this is rarely achieved in practice, and the authors do not report the effective resolution of their system. Reporting the resolution achieved would address this issue. In addition, super-resolution imaging does not appear to have been used at the Drosophila NMJ. The authors should clarify whether resolution limitations influenced the choice of analysis region and whether their imaging approach is sufficient to detect changes in the endocytic zone.

      We believe that it is unlikely that the relevant signals were missed. First, in mouse synapses, most signal corresponding to endocytic proteins was detected inside the selected region of interest. We selected this area because expanding the region analyzed would have reduced the sensitivity of our approach, as averaging over a larger area would dilute the signal. The resolution of our microscopy should not be a limitation either. In our previous work, we demonstrated that STED microscopy allows discriminating between the N- and C-termini of the presynaptic scaffold Bassoon, which are positioned only a few tens of nanometers apart [4]. This establishes that we can resolve differences at tens of nanometers in a biological context, which is more relevant than the resolution measured with fluorescent beads (which we have repeatedly assessed to be ~80 nm laterally). Subsequently, we have also been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart [4-12]. Given that the periactive zone spans a larger area than the distances that we can resolve experimentally in the examples above, we are confident that our measurements are sensitive enough to detect changes in this area.

      Second, for Drosophila NMJs, the choice of region of interest and the overall analysis followed a workflow validated in our previous work [13]. This method analyzes regions both immediately adjacent to and more distant from the active zone, and does not exclude any region based on distance from the active zone, as described on lines 222-224: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023).” In our previous study, we analyzed the distribution of periactive zone proteins at rest with STED microscopy and with Airyscan confocal microscopy. The resolution provided by Airyscan is reported to be ~175 nm in XY and ~400 nm in Z, which is sufficient to assess localization to the periactive zone compartment and is not inferior to imaging methods previously used to report changes in the distribution of endocytic proteins; for examples, see [1,2]. In the revised manuscript, we have added new data measuring the levels and distribution of EndoA and Dap160 using STED microscopy (Figure 3 – figure supplement 1). The results acquired with STED microscopy and with Airyscan confocal microscopy are consistent with one another.

      Overall, the accuracy of the imaging methods and analyses used in this study is sufficient to assess periactive zone structure given its size and organization.

      (2) Interestingly, in a number of cases, the authors observe significant differences in endocytic markers (Figure 1q, 4k, 6k, 6r). However, little is made of these differences. The authors should provide more discussion of these changes and how they make sense of them alongside their claims of a lack of effect from their manipulations.

      The reviewer raises a good point. We interpret these changes in two different ways. First, we suggest that changes observed in response to block of action potentials or disassembly of the active zone might be homeostatic. This is addressed on lines 135-137. Second, we discuss that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus. Several active zone proteins interact with the actin cytoskeleton. One of them is Liprin-α. This interaction may explain the decrease in the level of Amphiphysin and AP-180 at the periactive zone in Liprin-α null neurons. This is addressed on lines 444-449. We hope that the reviewer agrees that overall, we should focus on the main conclusion that deployment of endocytic proteins persists over a number of manipulations and synapse types.

      (3) The graphs in Figure 1c and 1g, 3g, 4c, 4e, 6c, and 6g do not appear to be identical. If the solid line represents the mean and the lighter color represents the distribution of these data, these data appear to be different from one another. It is surprising that these differences are not significant. What statistical tests were used to determine whether the differences in these graphs are not significant? Is the issue that a relatively low number of synapses were examined (30-60)? Did the authors conduct a power analysis?

      We apologize if the display of our data and analyses was not clear. We do not perform statistical analyses on the line profiles. Instead, we perform it on two values that are extracted from line profiles. These values are (1) the distance between the peak intensity values of the protein of interest and the marker and (2) the peak intensity values. For example, in Figure 1, distances are quantified and statistically analyzed in panel j, and the peak levels are quantified and statistically analyzed in panel k. We have clarified this in the legend of current Figures 1, 4, 5, and 7.
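
      The extraction of the two values described above can be illustrated with a minimal sketch. It assumes that each line profile is sampled at the same positions for the protein of interest and for the marker, and that the distance is taken as an absolute value; the function name `peak_metrics` and the example numbers are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def peak_metrics(positions_nm, protein_profile, marker_profile):
    """Return (peak-to-peak distance in nm, peak intensity of the protein).

    Assumption: both profiles are sampled at the same positions, and the
    distance is reported as an absolute value.
    """
    positions_nm = np.asarray(positions_nm, dtype=float)
    protein_profile = np.asarray(protein_profile, dtype=float)
    marker_profile = np.asarray(marker_profile, dtype=float)

    # Index of the maximum of each profile
    i_protein = int(np.argmax(protein_profile))
    i_marker = int(np.argmax(marker_profile))

    # Value (1): distance between the two peaks
    distance = abs(positions_nm[i_protein] - positions_nm[i_marker])
    # Value (2): peak intensity of the protein of interest
    return float(distance), float(protein_profile[i_protein])

# Toy profiles (hypothetical values), sampled every 50 nm
positions = np.arange(-200, 201, 50)
protein = np.array([1, 2, 3, 5, 8, 12, 9, 4, 2])
marker = np.array([1, 2, 4, 9, 14, 8, 4, 2, 1])
distance, peak = peak_metrics(positions, protein, marker)
```

      Under this scheme each synapse contributes one distance and one peak value, and the statistical tests compare these per-synapse values between conditions rather than the averaged line profiles.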

      (4) The authors clearly state that their experiments address the role of evoked activity in endocytic zone positioning, but they do not examine whether spontaneous vesicle fusion might play a role. Given the availability of Drosophila mutants that decrease (Doc2, Dunc-13) or increase (syt1) spontaneous release, this is a notable omission. Ideally, these mutants should be examined. And at a minimum, the authors should discuss whether spontaneous release could contribute to endocytic zone organization.

      We agree with the reviewer that spontaneous fusion of synaptic vesicles may contribute to periactive zone organization. Many of the genetic manipulations that we used in mouse neurons result in a significant decrease in spontaneous release. This includes Ca<sub>V</sub>2 triple knockouts with a ~60% decrease in spontaneous fusion [10], RIM+ELKS quadruple knockouts with a ~70% decrease in spontaneous fusion [9] and Liprin-α quadruple knockouts with a ~50% decrease in spontaneous fusion [7]. We cannot rule out that the spontaneous release that is left is sufficient to mediate assembly functions. The conclusive way to address this possibility is using a manipulation that ablates spontaneous release without altering other pathways. However, to our knowledge, this is not available. The manipulations suggested by the reviewer might suffer from similar limitations, as they would change the frequency of spontaneous release without fully ablating it, and they would also affect evoked release. We have included a limitations section in the discussion where we address this (lines 514-523), specifically stating “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited. While many of the manipulations used here, including Ca<sub>V</sub>2 knockout (Held et al., 2020), RIM+ELKS knockout (Tan et al., 2022; Wang et al., 2016) and Liprin-α knockout (Emperador-Melero et al., 2024) in hippocampal neurons, and TeNT expression in fly NMJs (Sweeney et al., 1995), result in 50% to 70% decreased spontaneous release rates, it is possible that the remaining spontaneous release supports periactive zone assembly. Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” We hope that the reviewer agrees that assessing these mutants should be a topic of future studies, given that we already test many mutants in the paper.

      (5) In Figures 1 and 6, the authors assess presynaptic protein localization in cultured neurons, but it is unclear whether these are synaptic sites. Many presynaptic proteins traffic together and can accumulate at sites lacking postsynaptic specializations. The authors should validate that the observed spatial organization occurs at bona fide synapses, ideally by co-labeling with postsynaptic markers as done in Figure 4. If methods like these were used, providing more details on how synapses were identified and selected would be useful to the reader.

      While we understand the reviewer’s point, we are confident that the structures analyzed are bona fide synapses for three reasons, as we have established before across many papers [4-8,10-12,17].

      The diameter of the structures detected using the synaptic vesicle marker Synaptophysin aligns much more closely with the size of the large vesicle clusters found at presynaptic terminals than with that of a few transport vesicles.

      In side-view synapses, the bar-like distribution of the active zone marker (Bassoon or Munc13-1) at one edge of the vesicle cloud indicates that active zone proteins are organized at one edge of the vesicle cluster—consistent with the architecture of synapses.

      Synaptophysin is one of our key markers for detecting synapses. In our cultures, most of the Synaptophysin signal colocalizes with postsynaptic markers (either PSD-95 or Gephyrin), as we have established across many studies [4,7-12]. This indicates that the markers used here are sufficient to select synapses. Furthermore, the frequency at which synapses were identified using an active zone marker as the second marker was similar to that observed when using a postsynaptic marker, suggesting that we were not randomly including unrelated structures.

      (6) Many of the images, particularly of the Drosophila NMJ, are of low quality and are shown in very small images. In addition, the quality of the images throughout the paper makes it difficult to assess the author's analysis and results. The authors should provide larger, higher-quality images that show examples of the means for each of the examples shown. This is an issue for most of the figures, but is particularly prominent in the dNMJ. A minor additional point is that the authors should be clear whether the dNMJ images are collected at super-resolution or using a conventional microscope.

      We believe that the quality of our images is sufficient for the assessments made for the following reasons:

      These images were acquired with enough spatial resolution to assess levels at the PAZ as discussed in response to this reviewer’s first comment. In our previous work, we used images acquired at the same resolution and presented in the same manner for both mouse hippocampal synapses [6,7] and Drosophila NMJs [13,18]. In those previous studies, we drew conclusions at a similar level of detail as in the current study.

      In our view, our representative images are not inferior in quality to other papers in the field addressing similar questions [1,2,19,20].

      We have selected sample images based on the quantified mean values per condition. Hence, we strived to select panels that are objectively representative regarding the quantified parameters.

      We have specified microscopy methods in the figure legends. Specifically, for Drosophila NMJs, we used Airyscan confocal microscopy and STED microscopy. For each experiment, it is now stated which microscopy method was used in the corresponding legend.

      References:

      (1) Winther, Å. M. E. et al. An Endocytic Scaffolding Protein together with Synapsin Regulates Synaptic Vesicle Clustering in the Drosophila Neuromuscular Junction. J Neurosci 35, 14756–14770 (2015).

      (2) Winther, Å. M. E. et al. The dynamin-binding domains of Dap160/intersectin affect bulk membrane retrieval in synapses. J Cell Sci 126, 1021–1031 (2013).

      (3) Bai, J., Hu, Z., Dittman, J. S., Pym, E. C. G. & Kaplan, J. M. Endophilin functions as a membrane-bending molecule and is delivered to endocytic zones by exocytosis. Cell 143, 430–441 (2010).

      (4) Wong, M. Y. et al. Liprin-alpha3 controls vesicle docking and exocytosis at the active zone of hippocampal synapses. Proc Natl Acad Sci U S A 115, 2234–2239 (2018).

      (5) Emperador-Melero, J., de Nola, G. & Kaeser, P. S. Intact synapse structure and function after combined knockout of PTPδ, PTPσ, and LAR. Elife 10, (2021).

      (6) Emperador-Melero, J. et al. PKC-phosphorylation of Liprin-α3 triggers phase separation and controls presynaptic active zone structure. Nat Commun 12, 3057 (2021).

      (7) Emperador-Melero, J. et al. Distinct active zone protein machineries mediate Ca2+ channel clustering and vesicle priming at hippocampal synapses. Nat Neurosci (2024) doi:10.1038/s41593-024-01720-5.

      (8) Tan, C., Wang, S. S. H., de Nola, G. & Kaeser, P. S. Rebuilding essential active zone functions within a synapse. Neuron 110, 1498-1515.e8 (2022).

      (9) Wang, S. S. H. et al. Fusion Competent Synaptic Vesicles Persist upon Active Zone Disruption and Loss of Vesicle Docking. Neuron 91, 777–791 (2016).

      (10) Held, R. G. et al. Synapse and Active Zone Assembly in the Absence of Presynaptic Ca(2+) Channels and Ca(2+) Entry. Neuron 107, 667-683.e9 (2020).

      (11) Chin, M. & Kaeser, P. S. The intracellular C-terminus confers compartment-specific targeting of voltage-gated calcium channels. Cell Rep 43, 114428 (2024).

      (12) Nyitrai, H., Wang, S. S. H. & Kaeser, P. S. ELKS1 Captures Rab6-Marked Vesicular Cargo in Presynaptic Nerve Terminals. Cell Rep 31, 107712 (2020).

      (13) Del Signore, S. J., Mitzner, M. G., Silveira, A. M., Fai, T. G. & Rodal, A. A. An approach for quantitative mapping of synaptic periactive zone architecture and organization. Mol Biol Cell 34, (2023).

      (14) Sweeney, S. T., Broadie, K., Keane, J., Niemann, H. & O’Kane, C. J. Targeted expression of tetanus toxin light chain in Drosophila specifically eliminates synaptic transmission and causes behavioral defects. Neuron 14, 341–351 (1995).

      (15) Kaeser, P. S. & Regehr, W. G. Molecular mechanisms for synchronous, asynchronous, and spontaneous neurotransmitter release. Annu Rev Physiol 76, 333–363 (2014).

      (16) Santos, T. C., Wierda, K., Broeke, J. H., Toonen, R. F. & Verhage, M. Early Golgi Abnormalities and Neurodegeneration upon Loss of Presynaptic Proteins Munc18-1, Syntaxin-1, or SNAP-25. Journal of Neuroscience 37, 4525–4539 (2017).

      (17) de Jong, A. P. H. et al. RIM C2B Domains Target Presynaptic Active Zone Functions to PIP2-Containing Membranes. Neuron 98, 335-349.e7 (2018).

      (18) Del Signore, S. J. et al. An autoinhibitory clamp of actin assembly constrains and directs synaptic endocytosis. Elife 10, (2021).

      (19) Imoto, Y. et al. Dynamin 1xA interacts with Endophilin A1 via its spliced long C-terminus for ultrafast endocytosis. EMBO Journal https://doi.org/10.1038/S44318-024-00145-X

      (20) Imoto, Y. et al. Dynamin is primed at endocytic sites for ultrafast endocytosis. Neuron 110, 2815-2835.e13 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the relationship between 3D chromatin architecture and innate immune gene regulation in monocytes from patients with alcohol-associated hepatitis (AH). Using Hi-C technology, they attempt to identify structural changes in the genome that correlate with altered gene expression. Their central claim is that genome restructuring contributes to the hyper-inflammatory phenotype associated with AH.

      Strengths:

      (1) The manuscript employs Hi-C technology, which, in principle, is a powerful approach for studying genome organization.

      (2) The focus on disease-relevant genes, particularly innate immune loci, provides a contextually important angle for understanding AH.

      Weaknesses:

      (1) Sample Size: The study relies on an exceptionally small cohort (4 AH patients and 4 healthy controls), rendering the results statistically underpowered and highly susceptible to variability.

      (2) Hi-C Resolution unpaired to RNA seq: The data are presented at a resolution of 100kb, which is insufficient to uncover meaningful chromatin interactions at the level of individual genes. This data is unpaired.

      (3) Functional Validation: The manuscript lacks experiments to directly link changes in chromatin architecture with gene expression or monocyte function, leaving the claims speculative.

      (4) Data Integration: The lack of Hi-C with ATAC and RNA-seq data handicaps the analysis and really makes it superficial. In short, it does not convincingly demonstrate a functional relationship.

      (5) Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      Appraisal of the Aims and Results:

      The manuscript sets out to establish a connection between chromatin architecture and AH pathology. However, the study fails to achieve its stated aims due to inadequate methods and insufficient data. The conclusions drawn from the Hi-C analyses alone are poorly supported, and the lack of functional validation undermines the credibility of the proposed mechanisms. Overall, the results do not provide compelling evidence to substantiate the authors' claims.

      Impact on the Field and Utility to the Community:

      The work, in its current form, is unlikely to have a meaningful impact on the field. The limited scope, methodological shortcomings, and lack of robust data significantly diminish its potential utility. Without addressing these critical gaps, the study does not offer new insights into the role of genome architecture in AH or provide useful methodologies or datasets for the community.

      Additional Context:

      The manuscript would benefit from a more comprehensive analysis of potential mechanisms underlying the observed changes, including the interplay between chromatin architecture and epigenetic modifications. Furthermore, longitudinal studies or therapeutic interventions could provide insights into the dynamic aspects of genome restructuring in AH. These considerations are entirely absent from the current study.

      Conclusion:

      The manuscript does not achieve its stated goals and does not present sufficient evidence to support its conclusions. The limitations in sample size, resolution, and experimental rigor severely hinder its contribution to the field. Addressing these fundamental flaws will be essential for the work to be considered a meaningful addition to the literature.

      Reviewer #2 (Public review):

      Summary:

      Dr. Adam Kim and collaborators study the changes in chromatin structure in monocytes obtained from patients with alcohol-associated hepatitis (AH) when compared to healthy controls (HC). Using high-throughput chromatin conformation capture technology (Hi-C), they collected data on contact frequencies between both contiguous and distal DNA windows (100 kb each), mainly within the same chromosome. From the analyses of those data in the two cohorts, the authors describe frequent pairs of regions subject to significant changes in contact frequency across cohorts. Their accumulation in specific regions of the genome, referred to as hotspots, motivated the authors to narrow down their analyses to these disease-associated regions, in many of which, the authors claim, a number of key innate immune genes can be found. Ultimately, the authors try to draw a link between the changes observed in chromatin architecture in some of these hotspots and the differential co-expression of the genes lying within those regions, as ascertained in previous single-cell transcriptomic analyses.

      Strengths:

      The main strength of this paper lies in the generation of Hi-C data from patients, a valuable asset that, as the authors emphasize, offers critical insights into the role of chromatin architecture dysregulation in the pathogenesis of alcohol-associated hepatitis (AH). If confirmed, the reported findings have the potential to highlight an important, yet overlooked, aspect of cellular dysregulation (chromatin conformation changes), not only in AH but potentially in other immune-related conditions with a component of pathological inflammation.

      Weaknesses:

      I regard the two most important weaknesses of the work as more methodological than conceptual. The first of these issues concerns the perhaps insufficient level of description provided on the definition of some key types of genomic regions, such as topologically associated domains, DNA hotspots, or even DNA loci showing significant changes in contact frequency between AH and HC. In spite of the importance of these concepts in the paper, no operational, explicit description of how they are defined, from a statistical point of view, is provided in the current version of the manuscript.

      Without these definitions, some of the claims that authors make in their work become hard to sustain. Some examples are the claim that randomizing samples does not lead to significant differences between cohorts; the claim that most of the changes in contact frequency happen locally; or the claim that most changes do not alter the structure of TADs, but appear either within, or between TADs. In my viewpoint, specific descriptions and implementation of proper tests to check these hypotheses and back up the mentioned specific claims, along with the inclusion of explicit results on these matters, would contribute very significantly to strengthening the overall message of the paper.

      The second notable weakness of the study pertains to the characterization of the changes observed around immune genes in relation to genome-wide expectations. Although the authors suggest that certain hotspots contain a high number of immune-related genes, no enrichment analysis is provided to verify whether these regions indeed harbor a higher concentration of such genes compared to other genomic areas. It would be important for readers to be promptly informed if no such enrichment is observed, for in that case, the presence of some immune genes within these hotspots would carry more limited implications.

      Additionally, the criteria used to define a hotspot are not clearly outlined, making it difficult to assess whether the changes in contact frequencies around the immune genes highlighted in figures 5-8 are truly more pronounced than what would be expected genome-wide.

      Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy individuals and of patients with Alcohol-associated Hepatitis.

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture then they should carefully FACsort the equivalent monocyte populations from the healthy and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs), and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

      We thank the reviewers for their careful and thorough examination of our manuscript. We agree with all of their comments regarding the limitations of the study. Many of the criticisms focus on the small sample size of our study (n=4 for healthy controls and disease patients) in both Hi-C and single-cell RNA-seq experiments, and that these experiments are unpaired, or in other words, PBMCs came from different patients for each experiment.

      Unfortunately, these experiments are fairly complicated to perform, requiring patient cells and very expensive deep sequencing. We are not currently in a position to increase the sample size easily or cost-effectively. In the case of Hi-C, we still believe our study to be of value, as Hi-C is not a commonly used technique to study disease effects on chromatin, and very few studies have employed a large enough sample size to perform statistical comparisons. Additionally, analyzing the data at a higher resolution would require deeper sequencing, and unfortunately we do not have the resources to sequence these libraries more deeply. Regarding the single-cell RNA-seq data, this dataset was generated for an earlier study [1] focusing on gene expression responses to LPS, and we were unable to get PBMCs from exactly the same patients to perform the Hi-C study.

      We disagree that our study has limited scientific value. Our study is the first to use Hi-C to show that the 3D genome architecture of primary monocytes is changed in a disease context. The only other study to follow a similar approach performed Hi-C in monocytes from 2 healthy donors and 2 systemic lupus erythematosus (SLE) patients, and in that study the data from both patients were combined prior to comparison. No statistics were performed, and the authors concluded that there were no differences in genome architecture due to disease. They did find differences between primary monocytes and the THP1 monocytic cell line, but this comparison lacked statistical analysis. Their conclusion was that inflammatory disease may not lead to genome-wide changes in architecture. Our study, though it addresses a very different disease than SLE, shows statistically significant differences between AH and healthy controls. We believe our study lays the groundwork for how Hi-C can be used to study genome architecture in human disease, and the possible downstream effects.

      Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      This is an interesting suggestion. This dataset only contains 4 AH patients, for whom we have included basic clinical data in Supplemental Table 1, including Age, HCA1c, Bilirubin, AST, ALT, Creatinine, Albumin, and MELD score. Three of the four patients have severe AH, while one (AH2) has moderate AH. Despite one patient being moderate, all four AH patients had similar correlations with each other, suggesting that the disease-specific differences we observed are not indicative of severity. More patient samples are needed to determine if genome architecture changes throughout disease progression. We have added this important discussion to the manuscript (page 12, lines 5-14).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The criteria used to determine which pairs of regions exhibit significant differences in contact frequency between alcohol-associated hepatitis (AH) and healthy controls (HC) are not disclosed. It would be beneficial for the authors to provide this information, including details such as the number of pairs tested, the nature of the statistical tests conducted, the method of multiple testing correction applied, as well as the significance thresholds used, and the number of loci-pairs below these thresholds for each chromosome. This information would greatly enhance the reader's understanding of the relevance of the reported findings.

      Thank you for this comment, though we are not sure we totally understand. All of our statistics were performed using multiHiCcompare [2], where we input all 8 datasets (.hic files from Juicer), then measured statistical differences between defined groups (HC vs AH). For our randomization studies, we randomized the group comparisons, so each group contained a mix of HC and AH.

      Second, a formal statistical definition of what constitutes a hotspot would be valuable for clarity.

      Thank you for this suggestion. Initially, hotspots were defined simply as regions of the genome with a high frequency of highly significant differential contacts. We have now adopted a more formal definition of “hotspot” based on similar criteria: a hotspot is defined by both adjusted p value and frequency of locations. First, we filtered all pair-wise chromosomal interactions by a very stringent padj < 0.0000001 to focus on only the most changed coordinates (Supplemental Table 4). Then we looked for regions of the genome with a high frequency of these differential locations. Borders for each hotspot were determined more liberally by looking at the full list of differential spots (padj < 0.05). Finally, we used code to list the genes within each interacting region. We have added these important details to the Methods (page 14, lines 11-14).
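
      The hotspot definition above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in the study: the two padj thresholds are the ones quoted in this response, whereas the minimum number of stringent hits per window (`MIN_HITS`), the contiguous-window rule for extending borders, and the input layout are hypothetical choices made for the example.

```python
from collections import Counter

STRINGENT_PADJ = 1e-7  # threshold quoted in the response
LENIENT_PADJ = 0.05    # threshold quoted for drawing hotspot borders
BIN_SIZE = 100_000     # 100 kb Hi-C windows
MIN_HITS = 3           # hypothetical cutoff; the response does not state one

def find_hotspots(contacts, min_hits=MIN_HITS):
    """Return hotspot regions as (chrom, start, end) tuples.

    contacts: iterable of (chrom, bin_start, padj), one entry per anchor
    of a differential contact (assumed input layout).
    """
    contacts = list(contacts)
    # Step 1: count highly significant hits per 100 kb window
    stringent = Counter((c, b) for c, b, p in contacts if p < STRINGENT_PADJ)
    # Windows that pass the lenient threshold are used to draw borders
    lenient = {(c, b) for c, b, p in contacts if p < LENIENT_PADJ}

    # Step 2: seed hotspots at windows with a high frequency of hits
    seeds = sorted(key for key, n in stringent.items() if n >= min_hits)

    hotspots = []
    for chrom, start in seeds:
        lo = hi = start
        # Step 3: extend borders over contiguous lenient windows
        while (chrom, lo - BIN_SIZE) in lenient:
            lo -= BIN_SIZE
        while (chrom, hi + BIN_SIZE) in lenient:
            hi += BIN_SIZE
        region = (chrom, lo, hi + BIN_SIZE)
        if region not in hotspots:
            hotspots.append(region)
    return hotspots

# Toy input (hypothetical values)
example = [
    ("chr1", 500_000, 1e-9), ("chr1", 500_000, 1e-8), ("chr1", 500_000, 1e-10),
    ("chr1", 400_000, 0.01), ("chr1", 600_000, 0.03),
    ("chr1", 900_000, 1e-9),
]
hotspots = find_hotspots(example)
```

      In the toy input, the window at 500 kb has three stringent hits and seeds a hotspot that extends over the neighboring windows passing padj < 0.05, while the isolated hit at 900 kb does not reach the frequency cutoff.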

      Third, a clear definition of the criteria used to identify different topologically associated domains (if these were indeed defined in the data and/or utilized in the analyses) would also be a helpful addition.

      Thank you for this suggestion. We did not identify or utilize TADs in any of these analyses.

      Likewise, several statements throughout the paper lack support from specific analyses, although it should be feasible to implement such analyses (or at least present them if they have already been conducted) to substantiate these claims:

      If randomizing samples does not result in significant differences between (randomized) cohorts, it would be beneficial to provide insights into the number of loci pairs that exhibit differences in frequency when using both the actual and randomized cohorts.

      Thank you for asking this question, as this is an important point. Using multiHiCcompare, if we compare HC (n=4) to AH (n=4), we get the results shown in the figures and supplementary data, but if we randomize the samples into Group 1 (HC, HC, AH, AH) vs Group 2 (HC, HC, AH, AH), we get almost no significant changes in contact frequency. To show this more robustly, we performed 5 randomized comparisons and found far fewer changes in contact frequency between groups. This shows that the changes in contact frequency we report are not random, but rather reflect a real difference between AH and HC. This point has been added to the Results (page 6, lines 15-17) and Methods (page 14, lines 16-21).
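
      To make the randomization design concrete, the sketch below enumerates every way of splitting the eight samples into two groups that each contain two healthy controls and two AH patients. The sample labels are hypothetical placeholders, and the differential test itself (run with multiHiCcompare in the study) is not reproduced here.

```python
from itertools import combinations

# Hypothetical sample labels for the 4 healthy controls and 4 AH patients
hc = ["HC1", "HC2", "HC3", "HC4"]
ah = ["AH1", "AH2", "AH3", "AH4"]

def balanced_randomizations(hc, ah):
    """Enumerate splits in which each group holds two HC and two AH samples."""
    everyone = frozenset(hc + ah)
    splits = set()
    for hc_pair in combinations(hc, 2):
        for ah_pair in combinations(ah, 2):
            group1 = frozenset(hc_pair + ah_pair)
            group2 = everyone - group1
            # Store as an unordered pair so that swapping the two groups
            # is not counted as a different randomization
            splits.add(frozenset((group1, group2)))
    return splits

splits = balanced_randomizations(hc, ah)
n_splits = len(splits)
```

      With four samples per cohort there are 18 distinct balanced splits, so the 5 randomized comparisons reported above sample a subset of the possible label shuffles.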

      If most changes in contact frequency occur locally, it would be useful to visualize the relationship between effect sizes and/or significance levels for the observed differences in frequency in relation to the distance between the involved loci. Additionally, comparing these results to the average baseline contact intensities as a function of distance would be informative. This comparison could help determine whether the distance decay in effect size/significance for the differences between AH and HC is faster or slower than the decay rates for baseline contact frequencies.

      This is a good suggestion. In our initial analysis, we made a number of figures relating chromosome positions, distance between loci, and statistics regarding the differential contact frequency. In the initial submission, we only showed Figure 3, which shows the logFC (log fold change) for the differential contact frequency by chromosomal position on both sides. To address this question, we have added a supplemental figure showing logFC as a function of the distance between two loci (new Supplemental Figure 3).
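
      One way to summarize logFC as a function of distance is sketched below. It assumes intrachromosomal contacts given as two window start positions plus a logFC value, and it averages the absolute logFC within 100 kb distance bins; the function name and the toy values are hypothetical and do not describe how the supplemental figure was generated.

```python
from collections import defaultdict

def mean_abs_logfc_by_distance(records, bin_size=100_000):
    """Average |logFC| per distance bin.

    records: iterable of (pos1, pos2, logFC) for pairs of loci on the
    same chromosome (assumed input layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos1, pos2, logfc in records:
        # Distance between the two loci, rounded down to the bin size
        distance = abs(pos2 - pos1) // bin_size * bin_size
        sums[distance] += abs(logfc)
        counts[distance] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}

# Toy differential contacts (hypothetical values)
records = [
    (0, 100_000, 2.0),
    (200_000, 300_000, -1.0),
    (0, 500_000, 0.5),
]
summary = mean_abs_logfc_by_distance(records)
```

      The same binning applied to baseline contact frequencies would allow the two distance-decay curves to be compared, as the reviewer suggests.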

      Similarly, the assertion that most changes do not affect the structure of topologically associated domains (TADs) but occur either within or between TADs should be supported by specific testing; otherwise, it should be removed.

      Thank you. Yes, we have adjusted the language in the Discussion.

      Furthermore, the authors should clarify whether differences in chromatin conformation are more pronounced around immune genes compared to genome-wide expectations. If this is not the case, it would be helpful to quantify the intensity of these differences around the highlighted genes in relation to the rest of the genome. To achieve this, I would suggest the following:

      Conduct enrichment analyses on the genes located within the most prominent hotspots to determine whether they are significantly enriched in immune genes (and/or, alternatively, in any other functional category).

      Estimate the average absolute fold change in contact frequency within all topologically associated domains (TADs) identified in the study. This would allow for the identification of immune gene-containing TADs highlighted in Figures 5-8, providing readers with a quantitative understanding of how anomalously different these genomic regions are with regard to the magnitude of their alterations in AH, compared to the rest of the genome.

      While some of the selected gene clusters appear to co-localize well with topologically associated domains (e.g., Figures 5A, 8A), others seemingly encompass either multiple TADs (Figure 6) or only portions of them (Figure 7). This should be clarified.

      Thank you, this is a great suggestion. In order to be as unbiased as possible, we took all genes present in the regions with the most significant changes in the genome (Supplemental Table 4) that we used to identify the hotspots. And you are correct, we do in fact see enrichment of genes involved in innate immune signaling. This has been added to Results (page 7, lines 19-25) and Figure 4.
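
      One common way to test this kind of enrichment is a one-sided hypergeometric test, sketched below. The response does not state which enrichment method was used, so this is an illustration of the general idea only; the function name and all gene counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_category, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_category belong to the category."""
    max_overlap = min(n_category, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_background - n_category, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 background genes, 500 annotated to the category,
# 200 genes in hotspot regions, 15 of which carry the annotation.
p_value = hypergeom_enrichment_p(20_000, 500, 200, 15)
print(f"one-sided enrichment p = {p_value:.3g}")
```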

      Finally, there are several minor issues concerning the figures that could be easily addressed to substantially enhance their readability:

      Font sizes in most figures should be increased, particularly for some axis labels and tick marks. This issue affects most figures; for instance, in Figure 4, it hinders the reader's ability to interpret the ranges of the data presented.

      Thank you, the figures have been adjusted.

      Figures 5 to 8 (panels A and B) would benefit significantly from a more consistent format. Specifically, the gene cluster boxes should also be included in the right panels, and the gene locations should be displayed on the left in a uniform format across all figures (e.g., formatting Figures 7 and 8 to match the style of Figures 5 and 6).

      Figures 5 and 6 have a similar structure to each other because we were focusing on all of the genes in that chromosomal region. Figures 7 and 8 are different because we are focusing on how the region around a certain hotspot of interest changes.

      It is also important to note that the genes plotted in Figures 8C and 8D are not the same. Concerning these two panels, it would be valuable to clarify whether the data presented pertains exclusively to monocytes. If so, information regarding the number of cells analyzed and the number of donors from which they were drawn would also be beneficial.

      These figures are generated using scRNA-seq data. They represent all of the genes expressed in that region of the genome, in their chromosomal position. If a gene is not expressed in the scRNA-seq data, then it is not shown. I have debated with myself a lot on how to show gene expression in a region of the genome, but I think this is the clearest way to show this; including the genes that have no expression would make it more confusing. But yes, if you compare HC and AH, you see some differences in the list of genes. We have added more clarity to the figure legend for this figure.

      References

      (1) Kim, A., Bellar, A., McMullen, M. R., Li, X. & Nagy, L. E. Functionally Diverse Inflammatory Responses in Peripheral and Liver Monocytes in Alcohol-Associated Hepatitis. Hepatol Commun 4, 1459-1476 (2020). https://doi.org/10.1002/hep4.1563

      (2) Stansfield, J. C., Cresswell, K. G. & Dozmorov, M. G. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 35, 2916-2923 (2019). https://doi.org/10.1093/bioinformatics/btz048

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Lemen et al. represents a comprehensive and unique analysis of gene networks in rat models of opioid use disorder, using multiple strains and both sexes. It provides a time-series analysis of Quantitative Trait Loci (QTLs) in response to morphine exposure.

      Strengths:

      A key finding is the identification of a previously unknown morphine-sensitive pathway involving Oprm1 and Fgf12, which activates a cascade through MAPK kinases in D1 medium spiny neurons (MSNs). Strengths include the large-scale, multi-strain, sex-inclusive design; the time-series QTL mapping, which provides dynamic insights; and the discovery of an Oprm1-Fgf12-MAPK signaling pathway in D1 MSNs, which is novel and relevant.

      Weaknesses:

      (1) The proposed involvement of Nav1.2 (SCN2A) as a downstream target of the Oprm1-Fgf12 pathway requires further analysis/evidence. Is Nav1.2 (SCN2A) expressed in D1 neurons?

      The authors mentioned that SCN8A (Nav1.6) was tested as a candidate mediator of Oprm1-Fgf12 loci and variation in locomotor activity. However, the proposed model supports SCN2A as a target rather than SCN8A. This is somewhat unexpected since SCN8A is highly abundant in MSN.

      Can the authors provide expression data for SCN2A, Oprm1, and Fgf12 in D1 vs. D2 MSNs?

      Author response image 1.

      We generated Author response image 1 to show that both Scn2a and Scn8a are ubiquitously expressed in MSNs and GABAergic neurons.

      (2) The authors should consider adding a reference to FGF12 in Schizophrenia (PMC8027596) in the Introduction.

      This is a relevant reference. We have cited it in the Discussion rather than the Introduction because we felt it was more relevant there.

      (3) There is recent evidence supporting the druggability of other intracellular FGFs, such as FGF14 (PMC11696184) and FGF13 (PMC12259270), through their interactions with Nav channels. What are the implications of these findings for drug discovery in the context of the present study? Could FGF12 be considered a potential druggable therapeutic target for opioid use disorder (OUD)?

      The recent success in targeting FGF14 and FGF13 protein-protein interactions with sodium channels suggests that FGF12 could indeed be a druggable target for OUD. We have added a section to the Discussion exploring the potential for developing small-molecule modulators of the FGF12-Nav interface as a novel therapeutic strategy.

      Reviewer #2 (Public review):

      Summary:

      This highly novel and significant manuscript re-analyzes behavioral QTL data derived from morphine locomotor activity in the BXD recombinant inbred panel. The combination of interacting behavioral-pharmacology (morphine and naltrexone) time course data, high-resolution mouse genetic analyses, genetic analysis of gene expression (eQTLs), cross-species analysis with human gene expression and genetic data, and molecular modeling approaches with Bayesian network analysis produces new information on loci modulating morphine locomotor activity.

      Furthermore, the identification of time-wise epistatic interactions between the Oprm1 and Fgf12 loci is highly novel and points to methodological approaches for identifying other epistatic interactions using animal model genetic studies.

      Strengths:

      (1) Use of state-of-the-art genetic tools for mapping behavioral phenotypes in mouse models.

      (2) Adequately powered analysis incorporating both sexes and time course analyses.

      (3) Detection of time and sex-dependent interactions of two QTL loci modulating morphine locomotor activity.

      (4) Identification of putative candidate genes by combined expression and behavioral genetic analyses.

      (5) Use of Bayesian analysis to model causal interactions between multiple genes and behavioral time points.

      Weaknesses:

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors.

      We have performed a thorough review of the manuscript and corrected typographical errors, including "ddactivates" and other compositional issues.

      (2) There are multiple examples of overstating the possible significance of results that should be corrected or at least directly pointed out as weaknesses in the Discussion. These include:

      (a) Assumption that the Oprm1 gene is the causal candidate gene for the major morphine locomotor Chr10 QTL at the early time epochs. Oprm1 is 400,000 bp away from the support interval of the Mor10a QTL locus, and there is no mention as to whether the Oprm1 mRNA eQTL overlaps with Mor10a.

      We have clarified this in the text. While Oprm1 is located proximal to the peak, its massive size and the presence of a strong mRNA cis-eQTL in the NAc and hippocampus that precisely overlaps with the Mor10a QTL support interval provide robust evidence for its candidacy. We have added this detail to the Results section.

      (b) Although the Bayesian analysis of possible complex interactions between Oprm1, Fgf12, other interacting genes, and behaviors is very innovative and produces testable hypotheses, a more straightforward mediation analysis of causal relationships between genotype, gene expression, and phenotype would have added strength to the arguments for the causal role of these individual genes.

      We agree that mediation analysis would be a valuable addition. We revised the Results section to acknowledge that while the Bayesian network provides a comprehensive causal hypothesis, future studies employing formal mediation analysis could further strengthen these individual gene-to-behavior links.

      (c) The GWAS data analysis for Oprm1 and Fgf12 is incomplete in not mentioning actual significance levels for Oprm1 and perhaps overstating the nominal significance findings for Fgf12.

      We have updated the manuscript to include the specific significance levels for the human GWAS findings related to Oprm1 and Fgf12. We have clarified that the OPRM1 variant rs1799971 reached genome-wide significance (OR = 1.046, p = 4.92 × 10<sup>-9</sup>). Furthermore, we have ensured that the findings for FGF12 are described as nominally significant to avoid any overstatement of the results. For example, we now specify that the top FGF12 SNP rs1553460 achieved nominal significance (OR = 1.015, p = 0.021). The Results and Discussion sections have been revised to reflect these precise statistical values.

      Appraisal:

      The authors largely succeeded in reaching goals with novel findings and methodology.

      Significance of Findings:

      This study will likely spur future direct experimental studies to test hypotheses generated by this complex analysis. Additionally, the broad methodological approach incorporating time course genetic analyses may encourage other studies to identify epistatic interactions in mouse genetic studies.

      Reviewer #3 (Public review):

      Summary:

      This is a clearly written paper that describes the reanalysis of data from a BXD study of the locomotor response to morphine and naloxone. The authors detect significant loci and an epistatic interaction between two of those loci. Single-cell data from outbred rats is used to investigate the interaction. The authors also use network methods and incorporate human data into their analysis.

      Strengths:

      One major strength of this work is the use of granular time-series data, enabling the identification of time-point-specific QTL. This allowed for the identification of an additional, distinct QTL (the Fgf12 locus) in this work compared to previously published analysis of these data, as well as the identification of an epistatic effect between Oprm1 (driving early stages of locomotor activation) and Fgf12 (driving later stages).

      Weaknesses:

      (1) What criteria were used to determine whether the epistatic interaction was significant? How many possible interactions were explored?

      By design, we tested for epistasis only between the Oprm1 and the Fgf12 loci, which amounts to a single test of a non-linear interaction. As such, there is no correction for multiple tests and no need for permutation. In other words, the “nominal” P value in this case is the only relevant P value. We have added this clarification in the Results and Methods.
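
      To make concrete what a single two-locus interaction measures, the sketch below computes the interaction contrast from the four two-locus genotype class means; it is zero when the two loci act purely additively. This is only an illustration of the quantity being tested, not the authors' statistical model, and the phenotype values are invented. Genotypes are coded B and D, following the BXD allele convention.

```python
def interaction_contrast(records):
    """records: iterable of (genotype_locus1, genotype_locus2, phenotype).

    Returns (DD - DB) - (BD - BB) computed on the class means, i.e. how much
    the effect of locus 2 changes depending on the genotype at locus 1.
    """
    cells = {}
    for g1, g2, y in records:
        cells.setdefault((g1, g2), []).append(y)
    m = {key: sum(vals) / len(vals) for key, vals in cells.items()}
    return (m[("D", "D")] - m[("D", "B")]) - (m[("B", "D")] - m[("B", "B")])

# Invented phenotype values for the four two-locus genotype classes
additive = [("B", "B", 10.0), ("B", "D", 12.0), ("D", "B", 13.0), ("D", "D", 15.0)]
epistatic = [("B", "B", 10.0), ("B", "D", 12.0), ("D", "B", 13.0), ("D", "D", 20.0)]

additive_contrast = interaction_contrast(additive)
epistatic_contrast = interaction_contrast(epistatic)
print(additive_contrast, epistatic_contrast)
```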

      (2) Results are presented for males and females separately, but the decision to examine the two sexes separately was never explained or justified. Since it is not standard to perform GWAS broken down by sex, some initial explanation of this decision is needed. Perhaps the discussion could also discuss what (if anything) was learned as a result of the sex-specific analysis. In the end, was it useful?

      We chose to analyze the sexes both separately and jointly because of significant sex differences and sex-by-strain interactions in the locomotion data. This rationale has been added to the Results section. We also discussed sex-specific results in the revision.

      (3) The confidence intervals for the results were not well described, although I do see them in one of the tables. The authors used a 1.5 support interval, but didn't offer any justification for this decision. Is that a 95% confidence interval? If not, should more consideration have been given to genes outside that interval? For some of the QTLs that are not the focus of this paper, the confidence intervals were very large (>10 Mb). Is that typical for BXDs?

      The 1.5 LOD support interval is a standard metric for most QTL mapping studies, and does correspond approximately to a 95% confidence or support interval. Large intervals are common in BXD studies when effect sizes are moderate or recombination density is lower in specific regions. We have clarified the use of the 1.5 LOD interval in the Results section.
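
      For readers unfamiliar with the metric, a 1.5 LOD support interval can be sketched as the contiguous region around the peak in which the LOD score stays within 1.5 units of its maximum. The function below is a simplified illustration that returns the outermost markers still above the cutoff (some implementations extend to the first marker beyond it); the marker positions and LOD values are hypothetical.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (left, right) positions bounding the peak where LOD >= peak - drop."""
    peak_idx = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak_idx] - drop
    left = peak_idx
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_idx
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]

# Hypothetical marker positions (Mb) and LOD scores along one chromosome
positions_mb = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0]
lod_scores = [1.0, 2.0, 4.2, 5.1, 6.0, 5.0, 4.4, 2.5, 1.0]

interval = lod_support_interval(positions_mb, lod_scores)
print(interval)
```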

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the vast majority of the figures, the text is too small to read.

      We have adjusted the font size in most of the figures.

      Reviewer #2 (Recommendations for the authors):

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors. Examples of these include:

      (a) Figure 2E&F lacks identification of Oprm1 as the gene for cis-eQTL studies.

      (b) Figure 2H is fairly uninterpretable given the small font sizes. It should be excluded, put as a supplemental figure, or reconfigured to highlight the most important findings in a more legible manner.

      (c) Figure 4b: columns in the table need to be identified by a header row.

      We thank the reviewer for these comments and have addressed them in the revised version.

      Oprm1 is now labeled in Figures 2E and 2F, Figures 2G and 2H have been moved to the Supplementary material, and a header row has been added to the table in Figure 4b.

      Reviewer #3 (Recommendations for the authors):

      Abstract

      (1) For the abstract, it might be simpler to name the alleles as "the C57BL/6J allele", etc., since B allele will confuse people unfamiliar with mouse nomenclature.

      It is critical to not confound the organism known as C57BL/6J with the genotype, allele, or haplotype that a mouse happens to inherit. Diverse types of mice inherit reference alleles but they may be only very distantly related to the C57BL/6J strain. And even the C57BL/6J strain is a moving target that accumulates mutations that are not even considered reference. For example, the mutation in Gabra2 of C57BL/6J is a de novo mutation that is not carried by many of the BXD strains, since this mutation happened in JAX foundation stock after the BXDs were first established by Dr. Ben Taylor in the 1970s.

      The convention is to refer to mouse strains by one string and RRID, the abbreviation of that strain by a common code (often B6), and the abbreviation of the allele, genotype, or haplotype by the italic letter B. This has been the recommendation of the Mouse Nomenclature Committee (on which one of the authors has been a member) for well over 50 years.

      (2) I wondered if "also associated with a high B allele" could be reworded somehow; I had to re-read that sentence several times.

      This sentence has been reworded for clarity.

      (3) Parts of the abstract are written in the present tense, but then it switches to past ("we generated" but then "a Bayesian network analysis supports...").

      We have thoroughly revised the abstract. Following standard scientific writing conventions, we now utilize the past tense to describe the specific experimental actions and results of this study. We have maintained the present tense for established biological facts and the broader significance of the findings.

      (4) While the -log(p) values are all impressive, the abstract should indicate what threshold is used for genome-wide significance and how that threshold was obtained.

      We have added the significance threshold to the Abstract.

      (5) Do the details of the MAP kinase cascade need to be explained in the abstract? It feels like a lot of detail for an abstract and represents one of the most speculative aspects of the paper. Maybe just say you identified a possible network, but save the details for the main paper.

      This is a valid suggestion. We removed the details of the MAP kinase cascade from the abstract.

      Introduction

      (1) You could add a sentence explaining why using an LMM (GEMMA) was an improvement over the prior analysis.

      We have added a sentence explaining that GEMMA improves mapping power and better controls for population structure compared to previous methods.

      (2) When mentioning Philip 2010, you could indicate that it identified Oprm1. This might be easier than "In addition to Oprm1" which confused me at first because it had not been mentioned before, so 'in addition' was jarring.

      We have revised the text to state that Philip et al. (2010) originally identified the Oprm1 locus.

      Results

      (1) There are additional instances of the tense switching between past and present in the results section.

      We have standardized the tenses in the Results section.

      (2) "Ostn, Uts2d, Ccdc50, Gm10823, Fgf12, and Mb21d2" - before giving arguments for fgf12, can you clarify if there are coding variants or eQTLs for any of these genes?

      We have added a statement clarifying the coding variants for other genes in this interval and highlighting their eQTL status.

      (3) "a total number of 4,495 high-quality nuclei transcriptomes". Consider removing the word "number".

      Removed.

      (4) "approximately 6 males and 6 females" - could you point the reader to a supplementary table that has the exact number of individuals at the end of this sentence?

      The exact number of mice used in each of the BXD strains is not recorded in the original publication by Philip et al.; only the mean and maximum were given. We have clarified that 6 is the average.

      (5) "computed using a subset" - please explain how you selected this subset (I assumed LD pruning, but why not be explicit?). How many SNPs/markers were there originally, and how many are retained?

      We have specified that the subset of markers was selected via LD pruning to represent the genetic diversity of the BXDs.

      (6) A few words about how the significance threshold was obtained (permutation?) are needed.

      We have clarified that the significance threshold was obtained through 1,000 permutations.
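
      The general logic of a permutation-derived genome-wide threshold can be sketched as follows: shuffle the phenotype across strains, record the largest marker statistic in each shuffle, and take an upper quantile of those maxima. For simplicity the sketch uses an absolute difference in class means as the marker statistic, which is a stand-in and not the linear mixed model statistic used in the study; the function names, toy genotypes, and toy phenotypes are all hypothetical.

```python
import random

def marker_stat(genotypes, phenotype):
    """Absolute difference in mean phenotype between the two genotype classes (0/1)."""
    a = [y for g, y in zip(genotypes, phenotype) if g == 0]
    b = [y for g, y in zip(genotypes, phenotype) if g == 1]
    if not a or not b:
        return 0.0
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_threshold(markers, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: upper (1 - alpha) quantile of the permuted maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                # break the genotype-phenotype link
        maxima.append(max(marker_stat(g, shuffled) for g in markers))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]

# Toy data: 50 markers scored in 20 strains, with a random phenotype
data_rng = random.Random(1)
markers = [[data_rng.randint(0, 1) for _ in range(20)] for _ in range(50)]
phenotype = [data_rng.gauss(0.0, 1.0) for _ in range(20)]

threshold = permutation_threshold(markers, phenotype)
print(round(threshold, 3))
```

      Taking the maximum across all markers within each permutation is what makes the resulting threshold genome-wide rather than per-marker.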

      (7) Some of the GWAS results are presented for males and females separately (as well as combined). This is not typical, and so maybe a sentence explaining why the authors thought there might be sex specific GWAS results would be warranted.

      The rationale for sex-specific analysis is provided in the Results section (significant sex difference and sex-by-strain interaction).

      (8) The correlation between the sexes of 0.68 could be evidence that there are sex-specific genetic effects, but could it also just be due to increased noise as you reduce sample size? What is the confidence interval for that number? Does it include 1? Or 0? If you randomly split the dataset, rather than splitting on the basis of sex, would you obtain higher correlations? The idea of sex differences is interesting, but a bit more work is needed to clarify these concerns.

      The 95% CI for the correlation of 0.68 (0.52–0.79) excludes both 0 and 1. The drop from r ≈ 0.86 at earlier intervals suggests a biological shift rather than noise due to sample size, as n remains constant (approximately 6 per sex per strain) across all time points. This divergence is driven by sex-specific genetic modifiers, such as the Fgf12 locus, which is more than twice as strong in females (LOD 10.6) as in males (LOD 4.3). We have addressed this in the revision.
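
      A confidence interval for a Pearson correlation is commonly approximated with the Fisher z-transform, as in the sketch below. The response does not say how the reported interval was computed or how many paired observations it rests on, so the value of `n` used here is purely hypothetical.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transform.

    r: sample correlation; n: number of paired observations (must exceed 3).
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

lo, hi = pearson_ci(0.68, n=60)   # n = 60 is a hypothetical sample size
print(f"95% CI: {lo:.2f} to {hi:.2f}")
```

      Because the interval width depends on n, whether an interval excludes 1 speaks directly to the reviewer's question about noise versus a genuine difference between the sexes.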

      (9) Maybe I missed it, but how did you determine the threshold for significance for the epistatic interaction? Could you also clearly indicate how many possible cases of epistasis were examined/considered, since that dictates the correction for multiple testing.

      We only tested the interaction between the Fgf12 and the Oprm1 loci.

      (10) "To further examine whether Oprm1 and Fgf12 were co-expressed in the same cells of the NAc," can you first give an indication as to why you looked in NAc versus other brain areas you might have considered?

      We have added a sentence explaining that the NAc was chosen due to its central role in opioid reward and the observed strain differences in dopamine release in this region.

      (11) "...from every cell type conveyed a weak but significant positive correlation (r = 0.08, p = 1.8e-8) between the expression of Oprm1 and Fgf12 (Figure 7e). When we performed Pearson's correlation analysis within each individual cell cluster, only D1-MSN-3 had a significant positive correlation (r = 0.35, p = 6.1e-8, Figure 7f). In contrast, D1-MSN-2 had a significantly weak negative correlation (r = -0.12, p = 0.02, Figure 7g)." Can you explain why these correlations are relevant? What hypothesis are you testing?

      We have clarified that these correlations were used to test the hypothesis that Oprm1 and Fgf12 are co-expressed and potentially co-regulated within the same neuronal subtype to support their epistatic interaction.

      (12) "After the morphine locomotion tests were complete," can you give a specific timepoint? Like, was it exactly 180 minutes after the morphine injection?

      We have specified that naloxone was injected exactly 180 minutes after the morphine injection.

      (13) I appreciate the desire to relate the results of this paper to human GWAS results; however, I don't feel there is much worth discussing beyond the Oprm1 finding. Therefore, I would suggest removing this from the results section and instead just making it a discussion topic. The results presented are clearly the weakest part of this paper, and I personally think it is a shame to end the results section with something that is not very informative. But I suspect the authors may wish to retain this section, and I leave that decision to them and the editor.

      We have retained this section but moved some of the more speculative human data discussion to the Discussion section as suggested.

      Discussion

      (1) Typo "deactivates".

      Corrected to "activates".

      (2) The last sentence in the first paragraph again discusses the comparison to humans; I would remove this.

      That sentence has been condensed.

      (3) "These data indicate that Oprm1 is a strong candidate gene for the Chr 10 locus associated with morphine-induced locomotion response." I would remind them of the eQTL for Oprm1 since this is a key piece of evidence supporting this gene as a candidate.

      We have added a reminder of the overlapping mRNA cis-eQTL for Oprm1.

      (4) "It is likely that differences in morphine-induced dopamine release are involved in the highly variable locomotor responses to morphine across the BXD family." I agree this might be true, but since you have no evidence to support this claim, is it worth mentioning at all?

      We have rephrased this as a hypothesis or cited relevant literature supporting this link in parental strains.

      (5) Could you include a sentence or two about why Philip 2010 didn't find Fgf12? Lack of markers? The difference between an LM and an LMM?

      We have added an explanation that the use of a high-density WGS-based marker set and the LMM (GEMMA) allowed for the detection of this novel locus that was previously missed.

      (6) Section titled "Cell-type specific gene expression in NAc". While this is interesting, you might also want to remind the reader that epistatic interactions do not necessarily require the genes to be expressed in the same cell or for their gene products to physically interact.

      We have added this caveat to the Discussion.

      (7) I think the Bayesian network section is not very strong. For example, they did not compare the results for their two chosen genes to the results they might have obtained if they had chosen other genes from their QTL intervals. My guess is that those other genes might have also produced results that were equally convincing. I'm not asking them to do that, but it reflects the risk of false positive results when taking an approach like this. Nevertheless, I am guessing the authors would prefer to include this section.

      We appreciate the reviewer pointing out this possibility and agree with this concern. We have added a statement acknowledging the risk of false positives in Bayesian modeling in this context and noting that these findings are intended as testable hypotheses.

      Methods

      (1) How were the 2 HS rats selected? I had the impression that Dr. Telese's lab had access to snRNA-seq data from more than 2 HS rats.

      We have clarified that these rats were selected based on their addiction-like behavior phenotypes from a larger cohort.

      (2) I didn't look back, but did the main paper point out that the rats are treated with oxycodone rather than morphine?

      We have clarified this distinction in the Methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) I think this is an important paper, but I’m puzzled about a tension in the results. On the one hand, it looks like the behavioural gains post-TT happen rather smoothly over time (Figure 5). On the other hand, muscle synergy activations change abruptly at specific days (around day ~65 for Monkey A and around day ~45 for Monkey B; e.g., Figure 6). How do the authors reconcile this tension? In other words, how do they think that this drastic behavioural transition can arise from what appears to be step-by-step, continuous changes in muscle coordination? Is it “just” subtle changes in movements/posture exploiting the mechanical coupling between wrist and finger movements, combined with subtle changes in synergies, and they just happen to all kick in at the same time? This feels to me to be the core of the paper and should be addressed more directly.

      We thank the reviewer for this insightful comment, as it touches upon the central finding of our study. The apparent tension between the smooth behavioral recovery and the abrupt shift in neural strategy is indeed a key feature of the adaptation process. We propose that this reflects the interaction of two distinct, parallel processes operating on different timescales:

      A slow, gradual skill-learning process, where the monkeys incrementally developed and refined a compensatory motor strategy (i.e., the tenodesis effect). This slow refinement is responsible for the smooth improvement seen in the behavioral metrics over many weeks.

      A fast, switch-like adaptive process, which governs the activation of the primary muscle synergies. The initial ‘swap’ strategy, while simple, was biomechanically conflicting and inefficient. The CNS only abandoned this flawed strategy abruptly once the slow learning process had rendered the new compensatory strategy “good enough” to be a viable alternative.

      Therefore, the abrupt neural shift does not cause the behavioral improvement but is rather enabled by the gradual, underlying development of a better motor solution. To address this important point more directly within the manuscript, we added a new subheading to the Discussion section. This section is dedicated to explicitly framing our findings within this multi-timescale learning model, ensuring the link between the gradual behavioral recovery and the abrupt neural shift is clearly articulated.

      (2) The muscle synergy analyses, which are an important part of the paper, could be improved. In particular:

      (a) When measuring the cross-correlation between the activation of synergies, the authors should include error bars and should also look at the lag between the signals.

      We thank the reviewer for these excellent suggestions to improve our analysis.

      Error Bars: We agree that showing trial-to-trial variability is important. In our revision, we have added a shaded envelope (representing the SD across trials) to the cross-correlation plots in Figures 6, 9 and 10.

      Time Lag: We have performed the cross-correlation analysis allowing for variable time lags and extracted the lag yielding the maximum correlation coefficient (max CC) for each session, in addition to the zero-lag correlation presented in the main figures. As hypothesized, allowing variable lags often resulted in high max CC values throughout the adaptation period, potentially obscuring the clear swap-and-revert pattern visible in the zero-lag analysis. This is likely because the primary adaptation involved changes in synergy timing rather than fundamental shape. However, the analysis of the lag itself proved informative. We observed significant fluctuations in the optimal lag during the early and mid-adaptation phases, particularly around the time of the ‘switch-back’, before the lag stabilized closer to zero in the late phase.

      We have added a description of this analysis to the Methods section. The results of the lag analysis are now presented in new Supplementary Figures S6 and S7, and a sentence summarizing this finding has been added to the Results section.
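
      The difference between the zero-lag correlation and the maximum correlation over variable lags can be illustrated with a small sketch. It correlates two equal-length activation traces at each lag within a window and reports the lag with the highest coefficient. The function names and the two synthetic traces (identical bumps offset by three samples) are hypothetical and stand in for real synergy activation profiles.

```python
import math

def corr_at_lag(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlapping samples."""
    if lag >= 0:
        xs, ys = x[: len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[: len(y) + lag]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))
    return num / den if den else 0.0

def best_lag(x, y, max_lag):
    """Return (lag, correlation) with the highest correlation within +/- max_lag."""
    return max(
        ((lag, corr_at_lag(x, y, lag)) for lag in range(-max_lag, max_lag + 1)),
        key=lambda pair: pair[1],
    )

# Synthetic traces: the second is the first delayed by three samples
trace_a = [math.exp(-((t - 20) / 4) ** 2) for t in range(50)]
trace_b = [math.exp(-((t - 23) / 4) ** 2) for t in range(50)]

zero_lag_cc = corr_at_lag(trace_a, trace_b, 0)
lag, max_cc = best_lag(trace_a, trace_b, max_lag=10)
print(f"zero-lag CC = {zero_lag_cc:.3f}, best lag = {lag}, max CC = {max_cc:.3f}")
```

      A pure timing shift lowers the zero-lag coefficient while leaving the maximum coefficient near 1, which is why the two measures can tell different stories.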

      (b) Figure 7C and related figures, the authors state that the activation of muscle synergies reverts to pre-TT patterns toward the end of the experiments. However, there are noticeable differences for both monkeys (at the end of the “task range” for synergy B for monkey A, and around 50% task range for synergy B for monkey B). The authors should measure this, e.g., by quantifying the per-sample correlation between pre-TT and post-TT activation amplitudes. Same for Figures 8I, J, etc.

      We thank the reviewer for this detailed and insightful suggestion. We agree that our use of the term ‘reversion’ should be nuanced, as the recovery of the synergy activation patterns is substantial but not perfect.

      To formally quantify these remaining differences, we performed a rigorous quantitative comparison between the pre-surgery and final-day post-surgery activation profiles. We calculated the Cosine Similarity to assess the recovery of the temporal shape, and used a Permutation Test (n=10,000) to test for statistical distinctness between the pre- and post-surgery trajectories.

      Results: We found that while the temporal shapes were highly similar (Cosine Similarity > 0.90 for all synergies), the Permutation Test confirmed that the profiles remained statistically distinct (p < 0.0001) in both animals.

      We have added this quantification to the text (Results). This confirms our nuanced interpretation: while the primary temporal features of the synergies reverted, the recovered motor program represents a novel, ‘good enough’ solution that is robust and functional, rather than a mathematically perfect restoration of the original baseline.
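
      The two measures used above answer different questions, which a minimal sketch can make concrete: cosine similarity compares the shape of two mean profiles, while a label-shuffling permutation test asks whether the two sets of trials differ at all. The test statistic chosen here (Euclidean distance between group-mean profiles), the function names, and the synthetic trials are assumptions for illustration; the response does not specify the statistic the authors used.

```python
import math
import random

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_profile(trials):
    """Average the trials sample by sample."""
    return [sum(col) / len(trials) for col in zip(*trials)]

def profile_distance(group_a, group_b):
    return math.dist(mean_profile(group_a), mean_profile(group_b))

def permutation_p(group_a, group_b, n_perm=10_000, seed=0):
    """Share of label shuffles whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = profile_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if profile_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic trials: same temporal shape, but the second set is scaled by 1.3
noise = random.Random(1)
base = [math.sin(math.pi * t / 19) for t in range(20)]
pre = [[b + noise.gauss(0, 0.05) for b in base] for _ in range(15)]
post = [[1.3 * b + noise.gauss(0, 0.05) for b in base] for _ in range(15)]

similarity = cosine_similarity(mean_profile(pre), mean_profile(post))
p_value = permutation_p(pre, post, n_perm=2_000)   # fewer shuffles to keep the demo fast
print(f"cosine similarity = {similarity:.3f}, permutation p = {p_value:.4f}")
```

      In this toy case the shapes are nearly identical yet the groups are clearly separable, the same combination of high similarity and statistical distinctness described in the response. The smallest p-value such a test can report is limited by the number of shuffles.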

      (c) In Figures 9 and 10, the authors show the cross-correlation of the activation coefficients of different synergies; the authors should also look at the correlation between activation profiles because it provides additional information.

      We thank the reviewer for this comment and the opportunity to clarify our terminology. We agree that analyzing the correlation between the full activation profiles is the most informative approach. In our manuscript, the terms ‘activation coefficients’ and ‘activation profiles’ both refer to the complete, time-varying activation patterns of the muscle synergies. Therefore, the cross-correlation analysis presented in Figures 9 and 10 is indeed the correlation between these full activation profiles. To prevent any potential ambiguity for future readers, we have revised the manuscript to use the term ‘activation profiles’ exclusively and consistently when referring to these time-varying synergy activations.

      (d) The muscle synergy analysis for Monkey B is hindered by the fact that the authors lost the ability to record from the (very) functionally relevant FDS muscle. I’d repeat the synergy analyses without this muscle to understand to what extent the observed changes with respect to baseline are driven by the lack of this data.

      We thank the reviewer for raising this important methodological point. We agree that controlling for changes in the recorded muscle set is crucial for a valid comparison between pre- and post-surgical synergy structures. The reviewer’s concern is based on the premise that the FDS muscle was included in the pre-surgical analysis for Monkey B but absent from the post-surgical analysis.

      We would like to clarify that this is not the case. Due to the loss of the FDS signal post-surgery, we made the deliberate decision to exclude the FDS muscle from ALL synergy analyses for Monkey B, including the pre-surgical baseline period. This was done for the precise reason the reviewer identifies: to ensure a direct and unbiased “apples-to-apples” comparison and to avoid introducing the lack of this muscle as a confound. Therefore, the changes in synergy structure that we report for Monkey B can be confidently attributed to genuine physiological adaptation rather than an artifact of a changing input dataset.

      (e) Figure 11: The authors talk about a key difference in how Synergy B (the extensor finger) evolved between monkeys post-TT. However, to me this figure feels more like a difference in quantity - the time course than quality, since for both monkeys the aaEMG levels pretty much go back to close to baseline levels - even if there’s a statistically significant difference only for Monkey B. What am I missing?

      We thank the reviewer for this insightful question, as it has prompted us to refine our interpretation of this key finding. The reviewer correctly notes that the recovery trajectories of Synergy B appear different, and we agree that our original explanation can be improved.

      A more parsimonious interpretation, and one that we believe aligns better with the data, is that both monkeys likely underwent a similar ‘arms race’, but we captured different phases of this process. In Monkey A, our recordings (starting Day 29) captured the escalating phase of this neuromuscular conflict. In contrast, for Monkey B, recordings began on Day 20, by which time this rapid escalation had likely already occurred and peaked. This difference in the timing of the ‘arms race’ is consistent with our behavioral observations; Monkey A struggled for a longer period before performing the task proficiently, suggesting a more protracted overall adaptation process. Thus, the apparent difference in the figures is likely a reflection of the observational window and the individual adaptation rate of each animal, rather than a fundamental qualitative difference in their adaptive strategy. We have revised the text to present this more unified and coherent interpretation.

      (f) Lines 408-09 and above: The authors claim that “The development of a compensatory strategy, primarily involving the wrist flexor synergy (Synergy C), appears crucial for enabling the final phase of adaptation”, which feels true intuitively and also based on the analysis in Figure 8, but Figure 11 suggests this is only true for Monkey B. How can these statements be reconciled?

      We believe the reviewer may be referring to Monkey A in their comment, as the strong compensatory effect is indeed seen in this animal. The core of this issue, which we have clarified in our revision, is that both monkeys developed a compensatory tenodesis grasp but used different neural strategies to achieve it.

      For Monkey A, strong evidence for this strategy is provided by a clear temporal shift in the activation of its dedicated wrist flexor synergy (Synergy C). As we have now clarified in the manuscript, the peak of this synergy’s activation moved from occurring just after object contact to just before it, a re-timing well-suited to enable a tenodesis grasp.

      For Monkey B, the strategy was one of subtle re-timing rather than scaling. While the total aggregated activation of its primary flexor synergy (Synergy A) did not significantly increase, its temporal profile shifted. Specifically, activation prior to object contact increased, providing the necessary wrist flexion for its assistive tenodesis grasp, which was kinematically confirmed in Figure 12. This was achieved by reallocating activation from the post-contact phase, resulting in an earlier activation peak for the synergy overall. Crucially, a finer-grained analysis reveals a precise temporal sequence within this synergy’s activation: the wrist flexor component (PL) consistently peaked just before object contact to enable hand opening, while the finger flexor component (FDP) peaked just after contact to secure the grasp.

      This timing resolves the apparent biomechanical conflict. It also reveals that while both monkeys converged on the same biomechanical solution (a tenodesis grasp), the observable neural implementation appeared different. However, we must be cautious in directly comparing the computed synergy structures themselves, as the analysis for Monkey B was performed without the FDS muscle. The apparent “multi-functional synergy” in Monkey B is most likely a consequence of this missing data. What is clear and robust, however, is that both monkeys converged on a remarkably similar temporal solution: they both learned to re-time the activation of their key wrist flexor muscles to the pre-grasp phase.

      In Monkey A, this was observed in the temporal shift of its dedicated wrist flexor synergy (Synergy C). In Monkey B, this was observed in the temporal shift of the Palmaris Longus (PL) muscle itself (which, in our computed synergies, was grouped into Synergy A). This convergence on an identical temporal adaptation, regardless of the computed modular organization, is the key finding. We have revised the manuscript to articulate this more precise and defensible interpretation.

      (3) Experimental design: at least for the monkey who was trained on the “artificial task” (Monkey A), it would have been good if the authors had also tested him on naturalistic grasping, like the second monkey, to see to what extent the neural changes generalise across behaviours or are task-specific. Do the authors have some data that could be used to assess this even if less systematically?

      We thank the reviewer for raising this important point regarding the generalizability of our findings across different behaviors. We fully agree that a direct comparison of both tasks in the same animal would have been a valuable experiment. Unfortunately, we do not have systematic data on naturalistic grasping for Monkey A that would allow for such a direct comparison. We therefore view the two tasks as providing complementary evidence. Monkey A’s data shows the adaptation process during a highly stereotyped behavior, while Monkey B’s data demonstrates that a similar two-phase adaptive process occurs during a more naturalistic, unconstrained task. The convergence of these findings strengthens our overall conclusion that this multi-timescale adaptation is a robust principle of motor learning. Nonetheless, the reviewer raises a fascinating question about the task-specific tuning of motor synergies, which remains an excellent direction for future studies.

      (4) Monkey B’s behaviour pre-tendon transfer seems more variable than that of Monkey A (e.g., the larger error bars in Figure 5 compared to monkey A, the fluctuating cross-correlation between FDS pre and EDC post in Figure 6Q). This should be quantified to better ground the results since it also shows more variability post-TT.

      We thank the reviewer for this excellent suggestion to formally quantify the pre-surgery behavioral variability. We have performed the suggested analysis on the "Grip Formation Time" metric (Fig. 5A), which was the comparable metric between the two tasks. Our calculation of the Coefficient of Variation (CV) confirms the reviewer’s observation. Monkey B’s pre-surgery performance was substantially more variable (CV = 81.93%) than Monkey A’s (CV = 46.62%). Furthermore, a non-parametric test for equal variances (Ansari-Bradley test) confirmed that this difference is highly statistically significant (p < 0.0001). We have added a description of this analysis to the Methods and reported this finding in the Results section to provide a clearer context for the baseline differences between the subjects.
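
      For reference, the Coefficient of Variation can be sketched as follows. The sketch assumes the sample standard deviation (n - 1 denominator) and a per-trial list of grip formation times; the function name is illustrative, and the response does not state which standard deviation estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean
```

      Because the CV normalizes by the mean, it allows variability to be compared between animals whose typical grip formation times differ. The Ansari-Bradley test itself is not reproduced here; one existing implementation is `scipy.stats.ansari`.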

      (5) Minor: Figure 12 is interesting and supports the idea that monkeys may exploit the biomechanical coupling between wrist and fingers as part of their functional recovery. It would be interesting to measure whether there is a change in such coupling (tenodesis) over time, e.g., by plotting the change in wrist angle vs change in MCP angle as a scatter plot (one dot per trial), and in the same plot show all the days, colour coded by day. Would the relationship remain largely constant or fluctuate slightly early on? I feel this analysis could also help address my point (1) above.

      We thank the reviewer for this excellent and insightful suggestion. We have performed the suggested analysis for Monkey B, plotting the trial-by-trial relationship between wrist and MCP angles for all recording days (New Figure 13).

      The results clearly show the gradual refinement of the tenodesis coupling. Pre-surgery, there was no correlation (R²=0.00). Immediately post-surgery (Day 22), the relationship was weak and variable (R²=0.16), reflecting an exploratory phase. Over the following weeks, the coupling became progressively stronger and more consistent, with the R² value peaking at 0.58 around Day 56, indicating a robust exploitation of the new strategy. The relationship then stabilized at a moderate level (R² ~0.2-0.3) in the final days. This analysis provides direct kinematic evidence for the slow, gradual skill-learning component of our two-state model. It beautifully complements our response to the reviewer’s first point by visualizing the underlying refinement process that occurred concurrently with the more abrupt neural shifts. We have added this new figure and a description of these results to the manuscript.
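
      The per-day coupling strength described above can be sketched as the coefficient of determination of a simple linear fit of MCP angle change on wrist angle change, computed separately for each recording day. The input layout (one `(day, wrist_change, mcp_change)` tuple per trial) and the function names are assumptions made for illustration.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variance in one variable, so no linear relationship
    return (sxy * sxy) / (sxx * syy)


def coupling_by_day(trials):
    """Group trials by day and return {day: R^2}, ordered by day.

    `trials` is an iterable of (day, wrist_change, mcp_change) tuples,
    one tuple per trial.
    """
    by_day = {}
    for day, wrist, mcp in trials:
        wrist_vals, mcp_vals = by_day.setdefault(day, ([], []))
        wrist_vals.append(wrist)
        mcp_vals.append(mcp)
    return {day: r_squared(w, m) for day, (w, m) in sorted(by_day.items())}
```

      Plotting the returned values against day gives the time course of the coupling, while the underlying per-trial points give the colour-coded scatter plot the reviewer suggested.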

      Reviewer #2 (Public review):

      Weaknesses:

      The most notable weakness of the study is the incompleteness of the data. [...] As a result, it is difficult to make general conclusions from the study, and it awaits further analysis or the addition of another subject.

      We thank the reviewer for this critical and accurate assessment of the study’s limitations. The reviewer is correct that the datasets for the two monkeys are incomplete in different ways and that the tasks were not identical. We fully acknowledge these limitations throughout the manuscript. Rather than viewing these differences as a weakness that prevents generalization, we propose that they offer a unique strength in the form of complementary evidence. We consider the two animals not as a direct replication, but as two distinct case studies that test the same underlying hypothesis under different conditions.

      Monkey A, with its high-quality EMG and highly stereotyped task, provides a detailed, quantitative view of the neural adaptation process, allowing us to precisely characterize phenomena like the ‘neuromuscular arms race’.

      Monkey B, with its kinematic data and more naturalistic task, provides crucial evidence that the same fundamental principles, a two-phase adaptation and the eventual development of a compensatory strategy, generalize to a less constrained, more behaviorally relevant context. We believe the key finding is the convergence of the results. Despite the differences in individual strategy, task demands, and available data, both animals demonstrated the same core "swap-and-revert" adaptive process. We propose that this convergence from heterogeneous sources lends support to the generalizability of our conclusions, suggesting that the multi-timescale adaptation we describe may be a general feature of motor learning following such perturbations. We agree that future studies with more subjects are needed to fully establish this principle. Nonetheless, we feel that the convergent evidence from these two complementary cases provides a valuable foundation for the model we present.

      A second weakness is the insufficient analysis of the movements themselves, particularly for Monkey A. [...] Since the authors have video data for both monkeys, it is surprising that it was not used to extract landmarks for kinematic analysis, or at least hand/endpoint trajectory, and how it is adjusted over time. Adding more behavior data and aligning it with the EMG data would be very helpful for characterizing motor recovery and is needed to support conclusions about underlying neural control strategies for functional improvement.

      We thank the reviewer for this important suggestion. The reviewer’s comment prompted us to re-examine our behavioral data, and we have now performed additional analyses that we agree provide a much clearer link between the neural changes and functional recovery.

      For Monkey A, we have quantified the ‘pull times’ on a day-by-day basis. This analysis reveals a clear, gradual learning curve: pull times were initially long and variable post-surgery but steadily decreased and stabilized over the recovery period. This provides a direct, quantitative measure of motor performance recovery for this animal.

      For Monkey B, we have performed a detailed analysis of the ‘grasp aperture’ prior to object contact. This kinematic analysis is particularly revealing, as it shows the development of the compensatory strategy in real-time. The grasp aperture was initially very small post-surgery, reflecting the monkey’s inability to open its hand. It then steadily increased over the next ~40 days as the monkey learned and refined the compensatory tenodesis grasp, before stabilizing at a new, functional baseline.

      We believe these new analyses directly address the reviewer’s concern by providing a more detailed picture of motor recovery. The grasp aperture data, in particular, offers a clear kinematic correlate for the slow, skill-learning process that we propose runs in parallel to the more abrupt neural reorganization. We have added these results as a new figure in the main text of our revised manuscript.

      Considering specific conclusions, the statement that the monkeys learned to use “tenodesis” over time by increasing activation of a wrist flexor muscle synergy does not seem to be fully supported by the data. [...] Given these issues, it is not clear how to align the EMG and kinematic data and interpret these findings.

      We thank the reviewer for this detailed and critical analysis. They raise an excellent point and have correctly observed that the adaptation is not a simple, uniform increase in wrist flexor synergy amplitude. Our interpretation, which we have clarified in the manuscript, is that the monkeys learned a more sophisticated strategy: a precise re-timing of the wrist flexor activation to occur earlier in the movement, specifically to pre-shape the hand for the grasp.

      For Monkey A: The reviewer correctly notes that the peak amplitude of Synergy C (the wrist flexor synergy) around the moment of grasp (0% task range) is lower in the final phase compared to baseline. However, the crucial change is temporal: the peak of this synergy’s activation shifts from occurring just after the grasp (~+1%) to occurring just before it (~-2%). This re-timing is perfectly suited to enable finger extension via the tenodesis effect immediately prior to object contact. The subsequent lower amplitude may reflect a more efficient, less forceful movement once this new skill was refined.

      For Monkey B: The reviewer is right that this monkey does not have a dedicated wrist flexor synergy and that the overall amplitude of the PL muscle does not increase dramatically. However, a closer look at its activity profile (Fig. S2-AN) reveals a clear and consistent increase in activation specifically in the pre-contact phase (~7% task range). This is the precise neural signature of the assistive tenodesis grasp that is kinematically confirmed in Figure 12. The monkey is not simply scaling up the synergy; it is strategically activating it earlier to prepare for the grasp.

      In summary, the key evidence linking the EMG to the tenodesis strategy is in the temporal domain. The learned re-timing of the wrist flexor activation to the pre-grasp phase is the crucial link that aligns the neural and kinematic data. We have revised the manuscript to make this distinction between amplitude scaling and temporal shifting clearer.

      A more minor point regarding conclusions: statements about poor task performance and high energy expenditure being the costs that drive exploration for a new strategy are speculative and should be presented as such. Although the monkeys did take longer to complete the tasks after the surgery, they were still able to perform it successfully and in less than a second and no measurements of energy expenditure were taken.

      We thank the reviewer for this important point regarding the precision of our language. We agree that statements regarding ‘high energy expenditure’ and the specific drivers for exploring a new strategy are interpretations of the data, not direct measurements, and should be framed as such.

      Our speculation about energetic cost is based on the significant increase in muscle co-activation we observed (e.g., Fig. 11), a phenomenon widely understood to be metabolically expensive. Similarly, while the monkeys were still successful, their prolonged movement times and inefficient motor patterns represent a clear performance deficit compared to their highly optimized pre-surgical baseline, which we propose acted as a driver for further adaptation. In our full revision, we have carefully revised the manuscript to soften these claims. We have used more speculative language, such as “we hypothesize that...”, “the likely cost of...”, or “may have provided the impetus for...” to ensure that our interpretations are clearly distinguished from our direct empirical findings.

      A small concern is whether the tendon transfer effect may fail over time, either due to scar tissue formation or tendon tearing, and it would be ideal if the integrity of the intervention were re-assessed at the end of the study.

      We thank the reviewer for raising this important point regarding the long-term integrity of the tendon transfer. We agree that a terminal anatomical re-assessment would be an ideal control. While a terminal assessment was not performed as part of this study’s protocol, we were able to monitor the transfer’s integrity throughout the study. We are confident the transfer remained functionally intact for two key reasons:

      (1) Physical Monitoring: We periodically used ultrasound imaging to non-invasively visualize the tendon repair, which allowed us to confirm its continued physical integrity.

      (2) Functional Evidence: This physical confirmation was corroborated by the functional data. Both animals achieved stable, proficient task performance that was maintained for months. Furthermore, the late-phase neuromuscular control strategies became highly consistent. A significant failure, such as a tendon tear or prohibitive mechanical scarring, would be incompatible with this sustained behavioral and neural stability.

      Nevertheless, we agree that a terminal assessment is an excellent methodological suggestion that should be incorporated into the design of future long-term studies of this nature.

      Reviewer #3 (Public review):

      (1) First, I find myself wondering about the physical healing process from the tendon transfer surgery and how it might contribute to the learning. Specifically, how long does it take for the tendons to heal and bear forces? If this itself takes a few months, it would be nice to see some discussion of this.

      We thank the reviewer for this insightful question about the potential contribution of the physical healing process to the adaptation timeline. Our surgical protocol was specifically designed to ensure the tendon transfer was biomechanically robust from the outset, minimizing the role of healing as a rate-limiting factor.

      We used a Pulvertaft weave technique, which is known to achieve mechanical strength equivalent to that of a native tendon shortly after the procedure (Graham et al., 2023). The repair involved more than two weaves and utilized high-strength suture material to maximize its initial force-bearing capacity. While full fibrous integration around the suture site typically occurs within approximately six weeks, the repair itself was strong enough to bear physiological forces immediately post-surgery. Therefore, the prolonged, complex, two-phase multi-month behavioral recovery and the neural reorganization we observed cannot be attributed to a slow physical healing process. Instead, this supports our conclusion that the observed timeline reflects the challenges and constraints of a purely neural adaptation and skill-learning process. To make this crucial point clear to all readers, we have added these details about the surgical method to the Methods section and included a brief discussion of its implications in the Discussion.

      (2) Second, I see that there are some changes in the muscle loadings for each synergy over the days, though they are relatively small. The authors mention that the cosine distances are very small for the conserved synergies compared to distances across synergies, but it would be good to get a sense for how variable this measure is within synergy. For example, what is the cosine similarity for a conserved synergy across different pre-surgery days? This might help inform whether the changes post-surgery are within a normal variation or whether they reflect important changes in how the muscles are being used over time.

      We thank the reviewer for this excellent and insightful suggestion. Establishing a baseline for normal day-to-day variability is an important control for our synergy analysis.

      We have performed this analysis in full. Specifically, to quantify baseline stability, we calculated the cosine similarity between the spatial synergy weights (W) of each individual recording day and the pre-surgery average. This provides a rigorous measure of day-to-day variability relative to the stable baseline structure. We have added these data to Figure 7 (Panel I), which plots the pre-surgery similarity (blue traces) alongside the post-surgery adaptation (red traces).

      We found that baseline stability was remarkably high, with cosine similarity consistently exceeding 0.99 (e.g., Monkey A: 0.99 ± 0.001). This quantification allows the reader to formally assess that the changes observed post-surgery (e.g., drops to ~0.80 or ~0.60 in Monkey B) are well outside the range of normal physiological fluctuation, representing subtle but genuine structural adaptation.
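
      The baseline-stability measure described above can be sketched as follows, assuming the spatial weights of one synergy are stored as one vector per recording day. The dictionary layout and the function names are illustrative assumptions rather than the structure of the original analysis code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def similarity_to_baseline(daily_weights, baseline_days):
    """Compare every day's synergy weights with the pre-surgery average.

    `daily_weights` maps day -> weight vector (one entry per muscle);
    `baseline_days` lists the days that make up the pre-surgery baseline.
    """
    n = len(baseline_days)
    baseline = [
        sum(col) / n
        for col in zip(*(daily_weights[d] for d in baseline_days))
    ]
    return {
        day: cosine_similarity(w, baseline)
        for day, w in daily_weights.items()
    }
```

      The spread of the values returned for the baseline days gives the range of normal day-to-day variation against which the post-surgery values can be judged.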

      (3) Last, and maybe most difficult (and possibly out of scope for this work): I would have ideally liked to see some theoretical modeling of the biomechanics so I could more easily understand what the tendon transfer did or how specific synergies affect hand kinematics before and after the surgery. Especially given that the synergies remained consistent, such an analysis could be highly instructive for a reader or to suggest future perturbations to further probe the effects of tendon transfer on long-term learning.

      We thank the reviewer for this excellent and forward-thinking suggestion. We completely agree that a detailed biomechanical model of the tendon transfer would be a powerful tool for understanding the mechanical consequences of the surgery and for interpreting the function of the recorded muscle synergies. However, creating a subject-specific musculoskeletal model with the fidelity required to accurately simulate synergy-to-kinematic transformations is a highly complex project that we feel is well beyond the scope of the current manuscript. Such an endeavor would constitute a major research project in its own right.

      Our study’s primary focus was to provide a detailed, longitudinal characterization of the in-vivo neural adaptation following this perturbation, a dataset that is itself rare and valuable. We aimed to document the physiological learning process as it unfolded over many months. Nonetheless, the reviewer’s point is exceptionally well-taken. Currently, we are constructing a monkey musculoskeletal model and performing tendon transfer on this model to investigate what kind of characteristics in the learning process reproduce the synergy changes observed in the experiments. Although this project is still in progress, to date, we have demonstrated that the robustness of synergies themselves is necessary for changes in muscle activity at the synergy level (Nakajima N, Wang S, Ogihara N, Oya T, Seki K, Funato T, Upper Limb Musculoskeletal Model of Macaque Monkey for Approaching Adaptation Mechanism to Tendon Transfer, Society for Neuroscience 2023, Washington DC, USA, 2023).

      The rich dataset we have collected in the present research could serve as an excellent foundation for developing and validating such a model in the future. We believe that combining these two approaches is a critical and exciting next step for the field, and we have highlighted this as a key future direction in our discussion.

      Recommendations for the authors:

      Reviewing Editor Comments:

      When revising the manuscript for resubmission, please try to improve the visual presentation of the data, which is a point highlighted by all three reviewers during the discussion, including making the presentation of monkey-specific results more consistent across subjects.

      We have comprehensively revised the figures to ensure a consistent and clear visual presentation, as requested. Specifically, we standardized the layout across all main and supplementary figures (placing Monkey A consistently in the top rows or left columns and Monkey B in the bottom rows or right columns) and applied unified color schemes throughout the manuscript. Furthermore, we harmonized the presentation of the analytical results, such as the specific cross-correlation pairings in Figures 9 and 10, to ensure that the data for both subjects are presented with identical logic, facilitating direct comparison.

      Reviewer #1 (Recommendations for the authors):

      (1) Please revise the writing; some words are missing (line 90), and some sentences could be clarified slightly, even if the paper is well written (lines 317-320). The paragraph including the idea of tenodesis could also be further clarified, I think.

      Thank you for pointing these out. We have corrected the missing word (osteoarthritis) on line 90. We have also revised lines 317-320 to remove ambiguity. Furthermore, the section describing the tenodesis effect (now section "Distinct neural implementations...") has been substantially rewritten for improved clarity, incorporating a more detailed explanation of the biomechanics.

      (2) In the Introduction, the authors cite Hunter and Eckstein 2009 and Mercuri and Muntoni 2013 without describing the pathological conditions; this will not be clear for nonspecialists.

      Thank you. We have added brief descriptions ("osteoarthritis, a degenerative joint disease," and "muscular dystrophy, which involves progressive muscle weakness,") directly into the Introduction sentence where these references appear.

      (3) Data presentation: I often thought that the data could be presented more clearly:

      (a) For example, Figure 3D and 4D should show error bars around the mean to have a sense of the consistency of pre-lesion behaviour. Same for other figures like Figure 6.

      We appreciate the reviewer's suggestion to visualize data consistency. (a) Figures 3D, 4D, and 6 (EMG Profiles): For these figures, we opted to display mean traces and peak markers to clearly illustrate the temporal shifts and relationships between muscles. Overlaying multiple standard deviation envelopes in these comparative plots would significantly reduce legibility. However, to fully address the reviewer's request to see the consistency of pre-lesion behavior, we direct attention to Supplementary Figure S1, which presents the complete EMG profiles with full error tubes (Mean ± SD) for every recorded muscle. (b) Quantitative Analysis Figures: We ensured that variability is explicitly visualized in all statistical analyses. The cross-correlation time-courses in Figures 6 (G-Q), 9, and 10 are plotted with shaded error tubes to show variance. Similarly, the aggregated EMG analysis in Figure 11 utilizes bar plots with explicit error bars to quantify the statistical consistency of the changes.

      (b) The autocorrelation analysis in Figure 6 should also include measures of lag if it’s not at zero lag. If it’s the latter, please specify it in the Methods.

      We thank the reviewer for this question regarding the cross-correlation analysis presented in Figure 6 (Panels G-J, P-Q). We confirm that this analysis was performed at zero time lag. To clarify this, we have added a sentence to the Methods section (Subsection "Cross-correlation analysis") explicitly stating that the EMG cross-correlations shown in Figure 6 were calculated at zero lag. We have also added a clarifying note ("at zero time lag") to the description of these panels within the Figure 6 caption.

      (c) Seeing EMG patterns similar to those presented in Figures 3D and 4D at different times post-lesion (e.g., as a Supplementary figure) would also give readers a better intuition of the neural changes.

      We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. We have added explicit cross-references to these figures in the main text.

      (d) I couldn’t fully understand the analysis in Figure 4E; clarify.

      We thank the reviewer for noticing this oversight. The reviewer is correct that Figure 4E was not referenced in the main text. This panel was intended to show the baseline kinematic profiles (MCP and wrist angles) for Monkey B's control session, corresponding to the average EMGs shown in panel 4D. Given that our more comprehensive kinematic analyses are now presented in Figure 12 and the new Figure 13, we believe panel 4E is largely redundant. To improve the clarity and focus of Figure 4, we have removed panel 4E and its description from the revised manuscript.

      (e) Some figures showing neural changes (e.g., Figures 6G-J, 6P,Q, Figures 9 and 10, and even Figure 11 for different reasons) would become more understandable if they were accompanied by the behavioural changes (e.g., something like Figure 5A on top of them).

      We agree that visualizing the temporal link between neural reorganization and behavioral recovery is essential for interpreting the data. We have implemented this suggestion by overlaying behavioral metrics onto the right y-axes of Figures 6 (G-Q), 9, 10, and 11. However, regarding the specific behavioral metric, we opted to overlay the maladaptive behavior/aberrant reaching metric (from Figure 5B) rather than the grip formation time (Figure 5A). We found that the maladaptive behavior profile provided a clearer and more direct correlate to the neural data, as its peak coincides precisely with the ‘swapped’ synergy phase, thereby effectively illustrating the functional cost of that specific neural state.

      (f) Some figure captions could be improved by adding more detail (e.g., for Figure 6).

      We agree. We have substantially expanded and improved the captions for Figure 6 and Figure 7 to make them more self-contained and guide the reader more effectively through the key findings presented in the panels. We have also reviewed other captions for clarity.

      (g) I’d show the cosine distance between synergies across days as a main figure, e.g., as part of Figure 7, because this is an important result.

      We agree that the longitudinal stability of the synergy structures is a crucial result that deserves prominence. We have implemented this suggestion by adding new panels, Figure 7 (I, K) for primary synergies and Figure 8 (K, L) for secondary synergies, which plot the cosine similarity of the spatial synergy weights across the entire experimental timeline. These panels explicitly visualize the high stability of the pre-surgery baseline (blue traces, similarity > 0.99) and contrast it with the dynamic structural tuning observed during the post-surgery adaptation (red traces), providing a clear, day-by-day account of synergy evolution as requested.
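
      For readers unfamiliar with the metric, cosine similarity between a reference synergy weight vector and the weight vector from another session can be sketched as below. This is an illustrative sketch only: the weight values, the day labels, and the choice of reference vector are hypothetical and are not taken from the manuscript. The cosine distance mentioned by the reviewer is one minus this quantity.

```python
import math


def cosine_similarity(w_ref, w_day):
    """Cosine of the angle between two synergy weight vectors."""
    if len(w_ref) != len(w_day):
        raise ValueError("weight vectors must cover the same muscles")
    dot = sum(a * b for a, b in zip(w_ref, w_day))
    norm = math.sqrt(sum(a * a for a in w_ref)) * math.sqrt(sum(b * b for b in w_day))
    if norm == 0:
        raise ValueError("weight vectors must be non-zero")
    return dot / norm


# Hypothetical spatial weights (one value per muscle) for a single synergy.
reference = [0.90, 0.70, 0.10, 0.05]
weights_by_day = {
    -5: [0.88, 0.72, 0.12, 0.05],   # pre-surgery session
    20: [0.60, 0.75, 0.35, 0.20],   # early post-surgery session
    60: [0.85, 0.70, 0.15, 0.08],   # late post-surgery session
}

similarity_by_day = {
    day: cosine_similarity(reference, w) for day, w in weights_by_day.items()
}
```

      The measure ignores overall scaling of the weights, so it tracks changes in the relative contribution of each muscle rather than changes in activation amplitude.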

      (h) In Figure 7C, D and G, H, it’d be interesting to also see in the background the EMG for the transferred muscle that belongs to each synergy, to appreciate their relationship.

      We thank the reviewer for this suggestion. To illustrate the close relationship between the primary synergies and their key constituent muscles, while avoiding visual clutter in the complex post-surgery plots, we have modified the pre-surgery panels of Figure 7 (C, D, G, H). In these panels, we have now overlaid the average pre-surgery EMG profile of the primary transferred muscle belonging to that synergy (e.g., FDS for Synergy A, EDC for Synergy B) as a thin, gray, dashed line. This visually confirms the tight correlation between the synergy profile and the muscle’s activity at baseline.

      (i) In page 10, the authors report as maladaptive behaviour the duration of the aberrant reaching component from day 29 (monkey A) and day 20 (monkey B). What was happening before those recording dates? Were the monkeys recovering?

      Thank you for this question. We have added two sentences to the start of the Results section (“Functional Recovery Follows...”) clarifying that the period between surgery and formal recordings included approximately one week of home cage recovery followed by several weeks of assisted task practice. Formal recordings began once the monkeys could perform the task consistently without assistance.

      (j) In the Methods (EMG Analysis), the authors state that they resumed their recordings post-TT “once they (the monkeys) were able to perform the task on their own”. It would be good if the authors made this more precise (e.g., based on success rate or another metric).

      We thank the reviewer for this suggestion to increase precision. We have revised the Methods section to include the specific criteria used for resuming post-surgical recordings. Recordings were restarted once the monkeys were able to perform the task independently (i.e., without assistance from the experimenter) and consistently achieved a successful trial count of at least 100 trials within a single experimental session.

      (k) Line 266- reads “Alternation of EMG activity in non-transferred muscle suggests one possibility: TT might alter the control strategy of coordinated muscle activity for hand movement by modifying the transferred muscles and their agonists as a cohesive unit”, however, some “muscles showed patterns that were incompatible with a simple swap” (Lines 255-256). Doesn’t this observation suggest that what happens is not a simple change in muscle synergies?

      We thank the reviewer for this insightful question regarding the interpretation of muscles with adaptive patterns incompatible with the primary ‘swap-and-revert’. We agree that these observations require careful consideration within the modular framework. Our interpretation is that these muscles do not represent evidence against modular control, but rather reflect the involvement of multiple modules adapting concurrently. Specifically, muscles like FCR and PL, which showed distinct patterns, are primary members of Synergy C (the wrist flexor synergy) in Monkey A. Their adaptive profile is therefore consistent with the task-specific recruitment and retiming of Synergy C as part of the compensatory tenodesis strategy, rather than being a deviation from the swap observed in Synergies A and B. Synergies represent the dominant, shared variance in muscle activity. While they capture the overall strategy, some degree of individual muscle variation or the influence of secondary synergies is expected. We have added a sentence to the Results section to clarify that these diverse patterns likely reflect the differential involvement of muscles in multiple adapting synergies. We believe the overall evidence still strongly supports the modulation of stable synergies as the primary mechanism of adaptation in this paradigm.

      (l) You may want to call synergy A and synergy B, synergy F and synergy E to make recall easier? (Same for synergy C and D, which could be F2 and E2).

      We thank the reviewer for this helpful suggestion aimed at improving clarity. We considered renaming the synergies based on function (e.g., F/E). However, given the number of figures and the complexity of a global change, and the fact that the functional roles of Synergies C and D differed between animals, we decided to retain the original A/B/C/D labels for consistency. To ensure clarity for the reader, we have carefully checked the manuscript to ensure that we consistently define the primary functional role of each synergy (e.g., "Synergy A, the primary finger flexor synergy") when it is discussed.

      (m) Lines 315-317 - “These pattens of changes in synergy 3 and 4, both contributed minimally to the EMG of transferred muscles” -> This statement puts the causality as synergies cause muscles to activate according to certain patterns, which is supported by work by several groups -including the authors- however, they could also reflect biomechanical and task constraints as others have argued; perhaps this tone would be better for the discussion?

      We thank the reviewer for this nuanced point regarding the interpretation of synergy contributions. We agree that the causal relationship between computed synergies and muscle activity is complex and can reflect both neural commands and task constraints. To address this, we have revised the sentence in question in the Results section. Instead of stating that the synergies "contributed minimally," we now state that the changes in these synergies "were associated with minimal EMG activity in the transferred muscles." This phrasing is more descriptive of the observation and less implicitly causal, while retaining the key point within the flow of the results. The subsequent sentences, which offer interpretation, are already framed speculatively ("This suggests...", "may have served...").

      (n) Line 403 How do the authors conclude from the synergy patterns in Figure 11 that the early post-TT is characterised by “an unstable and inefficient neural control strategy”? To me, this is shown clearly in the behaviour, not in these plots, unless I’m missing something?

      We thank the reviewer for this comment, which highlights the need to clearly connect our neural findings to the behavioral outcome. The reviewer is absolutely correct that the behavioral data (Fig. 5) provides the most direct evidence of instability and inefficiency during the early adaptation phase. Our intention was to argue that the neural patterns observed in Figure 11 provide a physiological correlate for this behavioral inefficiency. Specifically, the escalating aggregated EMG activity observed in the conflicted extensor synergy (Synergy B), which we term the ‘arms race’, represents significant muscle co-activation. Such co-activation is widely understood to be energetically costly and reflects a suboptimal control strategy where the CNS is essentially "fighting itself" against the altered mechanics. To make this link clearer, we have revised the concluding sentence of the relevant paragraph in the Discussion ("The early adaptation phase...") to explicitly state that this escalating co-activation is a known marker of inefficient recruitment and that it occurred concurrently with the period of poor behavioral performance shown in Figure 5.

      (o) Lines 469-471. The authors suggest that muscle synergies may be preserved post-TT because a modular approach (to motor control) may be computationally easy and metabolically cheap. To me, recent data suggest that the most parsimonious explanation is what they later say: that the nervous system may not be plastic enough to change this (e.g., see Makin and Krakauer, “Against reorganisation” also in eLife).

      We thank the reviewer for raising this important theoretical point and for referencing the relevant literature on constraints on cortical reorganization. We agree that the preservation of muscle synergies in the face of such a profound perturbation is a key finding that warrants careful interpretation. In our revised Discussion (section "The CNS Defaults to a Modular Strategy..."), we have now explicitly incorporated the perspective that synergy stability may reflect inherent constraints on neural plasticity, citing Makin and Krakauer (2023), alongside our original hypothesis regarding computational and metabolic efficiency. We present these ideas not as mutually exclusive, but as potentially complementary factors that both contribute to the CNS’s apparent preference for modulating existing modules rather than fundamentally restructuring them.

      (p) Lines 501-503. Also on interpretation. Would the metabolic cost indeed be much higher? Couldn’t the observed change in strategy be explained purely based on performance metrics?

      This is an important point. We agree that statements regarding high energy expenditure are interpretations, not direct measurements. We have carefully revised the manuscript (Abstract, Results, and Discussion) to soften these claims, using more speculative language (e.g., "likely costly," "what we propose was...") to clearly distinguish our interpretations from direct empirical findings.

      (q) Lines 538-. The authors link the initial adaptation phase to the fast process reported in adaptation studies and say that this leads to poor retention. However, it seems from their data that the behaviour is stable across (early) days, so doesn’t this rule out such an interpretation?

      We thank the reviewer for this insightful question regarding the interpretation of the early adaptive phase within the two-state model framework. The reviewer correctly notes that the early post-surgical behavior, while maladaptive, appeared relatively stable across days and did not show the rapid decay sometimes associated with the "poor retention" characteristic of the fast system. We agree that this apparent stability requires careful interpretation. In our revised Discussion (section "A Multi-Timescale Model..."), we now propose that the fast system is primarily responsible for the initial, rapid adoption of the ‘swap’ strategy in response to the large error signal. The subsequent persistence of this flawed but stable state for several weeks is likely not due to strong retention by the fast system itself, but rather reflects the time required for the parallel slow system to gradually develop a more effective compensatory strategy (i.e., the tenodesis grasp). Once this alternative strategy became viable, it enabled the abrupt "switchback," which we also attribute to the fast system recalibrating away from the highly costly swap strategy. Therefore, we believe our data is consistent with the involvement of a fast system driving rapid strategic shifts, even if the typical "poor retention" phenotype is masked by the lack of a viable alternative strategy during the early phase.

      Reviewer #2 (Recommendations for the authors):

      (1) The discussion would benefit greatly from a more careful comparison with prior work characterizing the response to experimental or clinical tendon or nerve transfer in different models.

      We thank the reviewer for suggesting these important references and for the recommendation to compare our findings more carefully with prior work. This is an excellent point, and we agree it will significantly strengthen the discussion. In our full revision, we have added a new paragraph to the Discussion section dedicated to this comparison. We discuss how our findings relate to classic work showing primate adaptive capacity beyond simple maladaptive responses (Sperry, 1947), EMG evidence for the persistence of original neural patterns alongside new ones in human patients (Illert et al., 1986), the critical role of altered peripheral biomechanics and myofascial force transmission in complicating adaptation (Maas & Huijing, 2012), and how our observation of synergy stability aligns with evidence for modular adaptation strategies (Berger et al., 2013). This comparison helps situate our unique findings of a multi-timescale process and synergy timing modulation within the broader context of motor relearning after musculoskeletal rearrangement.

      (2) Line 90 - Which disease or condition is studied in Hunter and Eckstein (2009)?

      Thank you. We have clarified this in the Introduction; the reference pertains to osteoarthritis.

      (3) Line 280 for clarity in text and as a reminder to the readers, please state which muscles are involved in each synergy grouping.

      We have updated the text (Results, 'Adaptation occurs through modulating...') to explicitly list the main contributing muscles for each synergy grouping (e.g., Synergy A: FDS and FCU for Monkey A). This provides the requested clarity regarding the functional identity of each synergy while maintaining readability. For the complete, quantitative muscle weight composition including minor contributors, we refer the reader to Figure 7 and Supplementary Table 1.

      (4) Line 180 There are differences in the time course for measurements between the behavioral metrics and EMGs. If not recorded at fixed time intervals, the differences in the time courses for the two monkeys should be explained.

      We thank the reviewer for this question regarding the time courses of our measurements. We interpret this comment in two ways, both of which we have addressed in the revised manuscript.

      First, if the reviewer is asking about the overall recording schedule, they are correct that sessions were not performed at fixed daily intervals, and the specific days sampled differed between monkeys. This non-uniform sampling was due to the practical constraints of long-term behavioral experiments (e.g., animal cooperation, scheduling, weekends) and the aim to capture data during key phases of adaptation. However, within any given session, behavioral (video) and EMG data were always collected concurrently.

      Second, if the reviewer is asking whether the set of days included differs between the behavioral plots (e.g., Fig 5) and the EMG/synergy plots (e.g., Figs 6, 9-11), this is a possibility depending on data quality criteria. Our criterion for including a session in the behavioral analysis was a minimum of 20 successful trials. However, for the more demanding synergy analysis, we required a higher minimum of 100 successful trials to ensure robust factorization. It is possible that a few sessions met the behavioral criterion but not the synergy criterion and were thus excluded from the latter analysis, leading to slight differences in the days presented across figures. To ensure full clarity, we have added text to the Methods section explicitly stating: (A) the rationale for the non-uniform daily sampling schedule, and (B) the specific minimum trial count criteria used for including data in the behavioral versus the synergy analyses, noting if this resulted in different sets of days being analyzed for different figures.
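
      The effect of the two inclusion thresholds on which days appear in each figure can be sketched as follows. Only the thresholds (20 and 100 successful trials) come from the text above; the session days, trial counts, and function name are hypothetical.

```python
MIN_TRIALS_BEHAVIOR = 20    # minimum successful trials for the behavioral analysis
MIN_TRIALS_SYNERGY = 100    # minimum successful trials for the synergy analysis


def sessions_meeting(trial_counts, minimum):
    """Return the sorted post-surgery days whose successful-trial count meets the minimum."""
    return sorted(day for day, n in trial_counts.items() if n >= minimum)


# Hypothetical successful-trial counts, keyed by post-surgery day.
trial_counts = {29: 35, 36: 120, 43: 98, 50: 15, 57: 210}

behavior_days = sessions_meeting(trial_counts, MIN_TRIALS_BEHAVIOR)
synergy_days = sessions_meeting(trial_counts, MIN_TRIALS_SYNERGY)

# Days that would appear in behavioral plots but not in synergy plots.
behavior_only_days = sorted(set(behavior_days) - set(synergy_days))
```

      Under these made-up counts, two sessions would contribute to the behavioral plots only, which is the kind of mismatch in plotted days the response describes.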

      (5) General figure comments - The figures are informative, but they could be better presented, designed, and formatted to explain the important results in the paper. The figures should be able to explain most of the key results without entirely referring to the text to find some of the details. I had a bit of trouble understanding Figure 9 & 10. I would also like to suggest that bringing raw data into some figures (e.g., EMG of different muscle groups), such as showing stability between the synergies, could improve the results and allow the story to flow with more clarity. Likewise, clearly showing the differences between baseline EMG measurements and post-surgery measurements could improve some of the result figures.

      We thank the reviewer for these important general comments on data presentation. We agree that the figures are the key to our story and are implementing several revisions based on this and other reviewer feedback to improve their clarity.

      General Presentation: We have conducted a thorough review of all figures to improve layout, consistency, and font legibility (addressing R3, 1 and the Reviewing Editor's comments). This includes adjusting the layouts of Figures 3, 4, and 6 for better alignment and clarity.

      Figures 9 & 10 (Cross-correlation): The reviewer mentioned having trouble understanding these figures. In our revision, we have substantially rewritten the captions for Figures 9 and 10 to be much more descriptive. We explicitly walk the reader through how to interpret the plots (e.g., "The ‘swap’ is evidenced by the drop in self-correlation... and a concurrent rise in antagonist-correlation...").

      Including "Raw Data" (EMG): We thank the reviewer for this suggestion to provide more intuitive examples of the neural changes. We realize we did not sufficiently highlight this in the main text, but this complete data is already available in the manuscript. Supplementary Figures S1 and S2 provide a comprehensive overview of the EMG patterns for all recorded muscles in Monkey A and Monkey B, respectively. These figures show the pre-surgery and post-surgery average profiles for all recording sessions as well as the average profiles from five different post-surgery landmark days, covering the entire adaptation period. These figures directly visualize the swap-and-revert pattern in the transferred muscles and their agonists (e.g., EDC, ED23), as well as the diverse and complex adaptations in other non-transferred muscles (e.g., FCR, PL), as requested. To make this clearer, we have added explicit cross-references to Supplementary Figures S1 and S2 within the main Results section to ensure readers are directed to this detailed data.

      Showing Differences (Pre vs. Post): To "clearly show the differences between baseline... and post-surgery measurements," we implemented the point-by-point statistical comparison of pre- vs. final-day synergy profiles (as suggested in R1, 2b). This has resulted in a new Supplementary Figure visually highlighting the precise periods in the task where the final profiles still differ significantly from baseline (Fig. S9).

      We believe these additions (new figures and improved captions) will make the results much clearer and more self-explanatory, as the reviewer suggested.

      (6) Figure 1 A table with all the acronyms would help with identifying all the muscles and their respective synergies (supplemental), especially when describing the muscles in the result of the discussion section.

      This is an excellent suggestion. We have created a comprehensive table (Supplementary Table 1) listing all muscle abbreviations, full names, primary functional groups, and assigned synergies for both monkeys. We have added a reference to this table in the Figure 1 caption and the Methods section.

      (7) Figure 2 - is this mainly from Monkey A? If so, it should be stated.

      We thank the reviewer for pointing out this omission. We have updated the caption for Figure 2 to clarify that the example data shown (ultrasound, trajectories, and quantitative plots) are from Monkey A.

      (8) Figure 3 & Figure 4 seems unbalanced because of the descriptive need to explain Monkey B’s tasks? The figure alignments could be better.

      We thank the reviewer for this comment on the visual presentation of Figures 3 and 4. The reviewer’s observation that the figures appeared ‘unbalanced’ was correct. This was a direct consequence of two issues: (1) the different tasks required slightly different schematics (the "descriptive need" the reviewer mentioned), and (2) the original Figure 4 contained an additional kinematic panel (formerly 4E) that was unique to Monkey B, which broke the parallel structure with Figure 3.

      To address this and significantly improve the alignment, we have now moved the unique kinematic panel (formerly 4E) to a new Supplementary Figure (Supplementary Figure S8). This change has allowed us to re-arrange the panels in Figures 3 and 4 so that they now follow the exact same order. We have also adjusted the layout to ensure that corresponding panels are of a consistent size. We agree that this creates a much better visual balance and makes the comparison between the two monkeys far more direct and clear, as the reviewer suggested.

      (9) Figure 5. It seems like the animals can still perform the task post-surgery, but with high variability. Maybe emphasize the differences in variability between baseline and post-surgery?

      We thank the reviewer for this suggestion to emphasize the changes in variability. We have now quantified this using the Coefficient of Variation (CV) for key behavioral metrics across different phases (Pre-surgery, Early, Mid, Late post-surgery). The results confirm the reviewer’s observation of high variability post-surgery, particularly in the early phase. For instance, Monkey A’s grip formation time CV spiked dramatically (Pre: 47% vs Early: 133%), while Monkey B’s remained high (Pre: 82% vs Early: 76%). Interestingly, while Monkey A’s variability returned close to baseline levels in the late phase (Late: 55%), Monkey B’s variability increased further (Late: 97%), suggesting persistent inconsistency despite functional recovery.

      We also observed metric-specific changes. Monkey A’s pull time became less variable than baseline later on (Pre: 65% vs Late: 43%), suggesting refinement of that action. Conversely, the CV of Monkey B’s grasp aperture remained consistently low throughout (Pre: 26% vs Late: 19%), indicating relatively precise kinematic control was maintained or quickly regained. We have added a summary of these findings to the Results section to provide a more complete picture of how behavioral variability evolved relative to baseline during the adaptation process.
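
      The Coefficient of Variation reported above is the standard deviation expressed as a percentage of the mean. A minimal sketch of a per-phase computation follows; it assumes the sample standard deviation (n - 1 denominator), which the response does not specify, and the grip formation times are hypothetical values, not the recorded data.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical grip formation times (seconds), grouped by phase.
grip_time_by_phase = {
    "Pre": [0.42, 0.55, 0.38, 0.61, 0.47],
    "Early": [0.50, 1.90, 0.75, 3.20, 0.60],
    "Late": [0.45, 0.70, 0.40, 0.66, 0.52],
}

cv_by_phase = {
    phase: coefficient_of_variation(times)
    for phase, times in grip_time_by_phase.items()
}
```

      Because the CV is normalized by the mean, it allows variability to be compared across metrics with different units, such as times and apertures.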

      (10) Figure 6 quite a confusing figure. This figure needs to be better presented. The figure legends are hard to see for Monkey A vs Monkey B. At first, I thought Monkey B’s figure legend also represented Monkey A. I would suggest reorganizing the figures for clarity and coherence.

      We agree that the original presentation of Figure 6 was dense and potentially confusing. We have completely reorganized the figure to improve clarity and coherence.

      (1) Clear Separation: The figure is now structured with a strict separation between Monkey A (Left Panels, A-J) and Monkey B (Right Panels, K-Q), with prominent headers for each subject to prevent ambiguity.

      (2) Improved Legends: We have redesigned the legends to be larger and placed them explicitly within their respective subject’s section to ensure it is immediately clear which data they describe.

      (3) Visual Consistency: We have standardized the color schemes and axis layouts across this and all other figures to reduce cognitive load and facilitate easier comparison between subjects.

      (11) Figure 12 - This figure is incomplete without Monkey A’s results. The videos in the supplemental sections seem clear enough for some kinematic analysis. The story could be more supported with more thorough measurements of the kinematics from both animals to show how they differ over time and by highlighting the two phases. As a minor note, it would be helpful to present the kinematic data together with a schematic of when during the task the data are drawn from, using the % task range scale, since that is the standard throughout the paper.

      We thank the reviewer for their suggestions regarding the kinematic analysis. We agree that a parallel kinematic analysis for Monkey A, similar to that in Figure 12, would be ideal. We did attempt this. Unfortunately, while the supplemental videos for Monkey A are sufficient for observing the overall movement trajectory, they are not suitable for the detailed joint angle analysis the reviewer suggests. The videos for Monkey A were recorded at a frame rate too low to reliably extract the rapidly changing joint angles of the wrist and fingers during the grasping movement. This is why this detailed kinematic analysis was limited to Monkey B, for which we had high-speed video recorded at 240 fps, allowing for a robust analysis of these fast movements.

      We have, however, expanded our kinematic analysis for Monkey B to show the refinement of the tenodesis strategy over the full time course (New Figure 13), which does help to highlight the different adaptive phases for that animal. We have also clarified in the manuscript (e.g., in the caption for Figure 12) that the lack of Monkey A data for this specific analysis was due to the low-resolution and low-frame-rate video available.

      We agree that defining the precise timing of the kinematic snapshot relative to our normalized task range is critical for accurate interpretation. In response, we have added a new panel (Figure 12C) that explicitly maps the kinematic snapshot to our standardized task timeline. This schematic clarifies that the joint angle analysis captures the hand configuration during the pre-shaping phase, specifically at 83 ms prior to object contact (which corresponds to -0.02% of the normalized task range). This ensures the kinematic data can be directly interpreted within the same temporal context as the EMG and synergy results presented throughout the paper.

      Reviewer #3 (Recommendations for the authors):

      First and most major: I found many of the figures much too small and incredibly difficult to read. Possibly the most difficult was Figure 7, where I had to zoom in a great deal to read what muscles corresponded to which bars. I don’t have specific suggestions here other than to make sure that figures are legible.

      We thank the reviewer for highlighting this important issue. We have comprehensively revised the figures to ensure they are legible at standard publication sizes. Specific improvements include:

      (1) Figure 7: We have significantly increased the font size of the x-axis muscle labels and optimized the bar chart spacing to ensure the muscle identities are readable without excessive zooming.

      (2) Global Updates: Across all figures, we have increased font sizes for axis labels and titles, removed unnecessary whitespace to maximize the data-to-ink ratio, and exported all final figures in high-resolution vector formats to ensure clarity.

      Second and more minor: I liked the setup of the manuscript, where the authors explained the unique benefits of their experimental methods and the question they were going after (“When confronted with structural changes to the musculoskeletal system, does the CNS adapt by modulating existing synergies, or by shifting toward more fractionated control strategies?”). However, the evolution of the paper made the answer to this question seem very confusing to me as I read it. The results show that monkeys initially modulated existing synergies in phase 1, but then reverted to the original modulation. This, in addition to the way the question was set up initially, made me think the conclusion was going to be that the synergies themselves changed in the second phase, but this paradoxically was not the case--synergies were stable throughout. I was left confused for the back half of the results section, until the discussion on tenodesis and developing compensatory movement strategies. So the answer is that the monkey learns by modulating existing synergies, but using different strategies in different learning phases. I’m not entirely sure how to avoid this confusion, but I wonder if there’s a way to foreshadow this finding earlier on.

      We thank the reviewer for this valuable feedback on the manuscript’s narrative structure. We understand how the initial framing (modulation vs. fractionation) followed by the reversion of the initial modulation could lead to confusion before the compensatory strategy is fully introduced. To address this, we have made two key adjustments in the revised manuscript:

      (1) In the Introduction, after posing the central question, we have added a sentence to subtly foreshadow that the adaptive process might be complex and multi-phasic, requiring analysis over extended timescales.

      (2) In the Results section, at the transition point between describing the reversion of the primary synergy timings and introducing the compensatory tenodesis strategy, we have added a short paragraph to explicitly signal that the reversion was not the complete solution and that a distinct compensatory strategy emerged concurrently.

      We believe these changes improve the narrative flow, provide better signposting for the reader, and mitigate the potential for confusion identified by the reviewer, making it clearer that the ultimate solution involved modulating existing synergies but via different strategies across distinct learning phases. We appreciate the reviewer’s help in identifying this area for improvement.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The paper describes a biologically plausible version of JEPA using recurrent neural networks called RPL for recurrent predictive learning. Given an embedding z<sub>t</sub>, a recurrent neural network processes these inputs with the form: c<sub>t+1</sub> = RNN(c<sub>t</sub>, z<sub>t</sub>). Then the predictive network f is predicting the future inputs with the format: min ||f(c<sub>t</sub>) − stopgrad(z<sub>t+∆t</sub>)||<sup>2</sup>. I understand that a prediction error is defined as: e = z<sub>t+∆t</sub> − f(c<sub>t</sub>) to model cortical measurements in the oddball task.
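
      The recurrence, objective, and prediction error summarized above can be written out as a forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the tanh recurrence, the linear predictor f, the random weights, and the synthetic embeddings are all hypothetical choices made for illustration, and no learning is performed. The stop-gradient only affects training, so it does not appear in a forward-only computation.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(c, z, W_c, W_z):
    """c_{t+1} = RNN(c_t, z_t); a tanh recurrence is assumed here."""
    a = matvec(W_c, c)
    b = matvec(W_z, z)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def prediction_error(c, z_future, W_f):
    """e = z_{t+dt} - f(c_t), with f assumed to be a linear map W_f."""
    prediction = matvec(W_f, c)
    return [zf - p for zf, p in zip(z_future, prediction)]


def rpl_forward(z_seq, W_c, W_z, W_f, delta=1):
    """Return per-step prediction errors and squared-error losses."""
    c = [0.0] * len(W_c)                  # initial context state c_0
    errors, losses = [], []
    for t in range(len(z_seq) - delta):
        e = prediction_error(c, z_seq[t + delta], W_f)
        errors.append(e)
        losses.append(sum(x * x for x in e))   # ||f(c_t) - z_{t+dt}||^2
        c = rnn_step(c, z_seq[t], W_c, W_z)
    return errors, losses


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


dim_z, dim_c = 3, 4
W_c = rand_matrix(dim_c, dim_c)
W_z = rand_matrix(dim_c, dim_z)
W_f = rand_matrix(dim_z, dim_c)

# Synthetic embedding sequence z_0 ... z_9 (hypothetical data).
z_seq = [[math.sin(0.3 * t + k) for k in range(dim_z)] for t in range(10)]
errors, losses = rpl_forward(z_seq, W_c, W_z, W_f, delta=1)
```

      In a trained model, an unexpected input in an oddball sequence would show up as a larger error vector e at that time step than for the expected input.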

      The RPL model is also shown to build an internal world model, with “real-world” data like the movement of moving animals or speech signals. The representation is then compared to V1 data and expected prediction error signals in an oddball setting. In a stacked hierarchy of RNN learning with RPL, the higher layers appear to learn high-level latent variables, although gradients are not propagated downward to the lower layers.

      The paper tackles an open question: Self-supervised learning is thought to be a fundamental principle to explain how computation is structured in the brain. Cortical data suggest qualitatively that prediction error is a core principle of representation learning in the brain, but the field is still looking for a simple yet expressive model that would explain how the cortex learns its representations. RPL contributes in that direction by making a useful link between cortical representation learning in RNN models and the JEPA learning algorithm that was demonstrated to scale to large world model learning from video data by Lecun’s group. It is very useful to connect this popular deep learning algorithm to cortical data.

      The model formalism is relatively elegant and simple: Simple next input prediction objectives are conceptually simple but not necessarily trivial to build at scale. There is a clear benefit in comparison with contrastive or IL methods because they are free from dataset-specific data augmentation and negative samples, thereby moving the comp neuro field towards conceptually simpler models of representation in the cortex. Yet predictive-only models (and in particular predictive models in latent space instead of pixel space) are not easy to build in a stable fashion. The JEPA family is basically intended to solve this question; it is very nice and timely to bring this to comp neuro.

      The methodology combining comp neuro and deep learning makes sense: The conceptual and qualitative analogy with cortical prediction errors is relevant and consistent with what is expected as a model of self-supervised learning in cortical models. The methodology to compare RPL with IL and CL is methodologically meaningful and grounded: showing, for instance, how some of the models fail to represent some latent structure in some toy datasets is interesting.

      (1.1) h-RPL: The h-RPL is perhaps the most creative departure from the JEPA model family. It would be interesting to say more about what was particularly difficult to see in the latent variables emerging in the hierarchical model. I often find it magical that layer-wise learning rules of this type are not learning redundant representations. Any insights why this is not the case here would be potentially insightful.

      We thank the reviewer for this comment. Regarding representational collapse in h-RPL: each local circuit independently applies the same collapse-preventing strategy as the single-level RPL model: namely, the asymmetric prediction architecture combined with the stop-grad operator. Since this mechanism operates locally within each circuit, it is sufficient to prevent collapse at every level of the hierarchy independently (see also our response to Point P1.3).

      The more subtle question is why the circuits learn non-redundant rather than identical representations across the hierarchy. We believe two mechanisms are at play here. First, the hierarchical encoder is a stacked convolutional network, meaning that receptive field sizes grow with depth. This architectural inductive bias naturally encourages successive circuits to operate on increasingly spatially integrated features, creating a structural pressure toward learning complementary rather than redundant representations. Second, the growing expressivity of the network with depth means that higher circuits have access to richer, more abstract inputs from which they can extract higher-level latent structure that is not already captured by lower circuits. Together, these factors (the local collapse-preventing mechanism and the depth-dependent growth in receptive field size and network expressivity) presumably explain why h-RPL builds an increasingly refined and non-redundant representational hierarchy.
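
      The growth of receptive field size with depth follows the standard recurrence for stacked convolutions. A minimal sketch, in which the kernel sizes and strides are hypothetical and not those of the h-RPL encoder:

```python
def receptive_fields(layers):
    """Receptive field size after each layer of a stacked convolution.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent units of this layer
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


print(receptive_fields([(3, 1), (3, 1), (3, 1)]))  # [3, 5, 7]
print(receptive_fields([(3, 2), (3, 2), (3, 2)]))  # [3, 7, 15]
```

      Each successive layer integrates over a larger input region, which is the structural pressure described above.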

      What we will do: We will expand our discussion on this point in the revised manuscript. We plan to expand our quantification on how abstractions emerge in h-RPL in future work in which we will also study variations with top-down connections.

      (1.2) In general, I fully support the type of question and ideas that the paper is putting forward. It is, however, very hard in this research field to gain insight into specific conceptual contributions or specific bits of experimental data that the model puts forward. In pointing to the following weaknesses, I am encouraging the authors to lay out more clearly what the unique hypothesis is or the contribution of the RPL model that we should remember it for.

      We thank the reviewer for the positive feedback and the constructive criticism. We agree that articulating the core contributions more crisply would strengthen the paper.

      At its heart, we believe the paper makes two contributions we hope it will be remembered for. First, while prior work has established that invariant representations can be learned via local Hebbian-like learning rules, we show that learning equivariant representations alongside a latent dynamics model requires something qualitatively different: a local circuit, one with recurrent dynamics and an asymmetric predictive architecture. RPL provides a minimal concrete instantiation of this principle.

      Second, and perhaps more broadly, the model makes a structural prediction about (cortical) neuronal circuit organization: since the encoder, integrator, and predictor each perform functionally distinct computations, the framework implies the existence of corresponding cell types and connectivity patterns one should look for in experimental data.

      What we will do: We will sharpen these messages in the revised manuscript to ensure these contributions are prominently highlighted throughout the paper.

      (1.3) Comparison with JEPA variants: JEPA variants are integrating different details into the learning algorithm. Integrating, for instance, “masking” of the latent encoder targets, or EMA in the style of BYOL or Siamese networks, for the predicted representations. It is great that RPL does not seem to need any of those (next input prediction is a natural implementation of masking, and EMA does not seem to be used). It is notoriously hard for the JEPA model to work without these features. Since some of these details are sometimes surprisingly crucial for a simulation to work, it would be good to report which of the other important details were key to live without EMA and masking. Is it the difference in learning rate, for instance? Or maybe the tasks considered are simply easy enough for any model to work; if so, it could be useful to acknowledge to what extent this is true.

      We thank the reviewer for raising this important point. There are two key mechanisms that ensure stable, non-trivial training in RPL. First, using a higher learning rate for the predictor relative to the encoder is crucial for stable training. This prevents the predictor from collapsing the encoder representations and was already noted empirically by Chen et al. (2021).

      Second, and more fundamentally, predicting at the level of the memoryless encoder output, rather than at the level of the recurrent integrator, is essential to prevent a degenerate solution in which the RNN simply learns to generate an internally predictable time series unrelated to the input. By anchoring the prediction target to the encoder, the model is forced to ground its representations in the sensory input. Intuitively, otherwise the RNN can simply “make up” a predictable time series, which satisfies the learning objective, but would not yield useful internal representations.
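
      The interplay of the stop-grad operator and the two learning rates can be made concrete in a scalar toy model. This is a sketch under strong assumptions and not the RPL implementation: the encoder and predictor are single weights `w` and `p`, the recurrent integrator is omitted, and the learning-rate values are placeholders.

```python
def toy_gradients(w, p, x_now, x_next):
    """Loss and gradients for L = 0.5 * (p * w * x_now - sg(w * x_next)) ** 2."""
    z_now = w * x_now
    target = w * x_next          # stop-grad: treated as a constant below
    err = p * z_now - target
    loss = 0.5 * err ** 2
    grad_p = err * z_now         # predictor gradient
    grad_w = err * p * x_now     # encoder gradient, prediction branch only
    return loss, grad_w, grad_p


def sgd_step(w, p, x_now, x_next, lr_enc=0.01, lr_pred=0.1):
    # The predictor is updated with a larger step than the encoder.
    _, grad_w, grad_p = toy_gradients(w, p, x_now, x_next)
    return w - lr_enc * grad_w, p - lr_pred * grad_p


loss, grad_w, grad_p = toy_gradients(w=0.8, p=0.5, x_now=1.0, x_next=2.0)
w_new, p_new = sgd_step(w=0.8, p=0.5, x_now=1.0, x_next=2.0)
```

      Without the stop-grad, the encoder gradient would gain the extra term `-err * x_next` from the target branch, which is the pathway through which the encoder could lower the loss by shrinking its own output.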

      Beyond these architectural points, previous work from our group (Srinath Halvagal et al., 2023) has shown mathematically that JEPAs without EMA avoid collapse via an implicit variance regularization mechanism, and we believe RPL benefits from the same principle. Indeed, we now have a more complete theoretical understanding of this, including identifiability proofs for the latent dynamical model under relatively mild assumptions (Mikulasch et al., 2026). This work has recently been accepted at ICML. Other than that, one has to ensure that representations are not already nearly collapsed at the beginning of training. In this paper, we used normalization layers (batchnorm) in the encoder to ensure this.

      Finally, as in all SSL paradigms, the augmentation strength is an important hyperparameter that impacts the quality of learned representations. In the temporal predictive setting, the augmentation strength is fixed by the world itself. The only knob we have to play with is the prediction horizon ∆. While we typically focused on next-time-step (∆ = 1) prediction, we saw a clear effect in the case of the speech dataset where ∆ = 8, but not ∆ = 1, yielded useful representations for the tasks (Fig. 5b).

      What we will do: We will discuss the above points more prominently in the discussion to avoid them being overlooked in the methods. Additionally, we will include a plot on the empirical prediction horizon for the speech dataset in the supplementary material for reference.

      (1.4) Comparison with IL and CL: On a high level, the comparison with IL and CL algorithms is written as conclusive. I suspect that the failure modes of IL and CL that are described are not due to the algorithms themselves, but rather to the construction of invariance statistics or the choice of negative sample sets (the sets of samples among which variance 1 is requested by VICReg). For instance, if variance (or negative sample set) is taken only across time, the variance across object identity is expected to collapse. Similarly, if the variance is taken across the object identity, the variance across time can collapse. So I wonder if the failure of IL and CL is induced by the construction of the variance definition.

      We thank the reviewer for this thoughtful point. Both RPL and CL implement an implicit variance regularizer by virtue of being JEPAs (Srinath Halvagal et al., 2023), whereas IL uses an explicit regularizer computed along both the batch and time dimensions to avoid representational and dimensional collapse. The failure modes of IL and CL therefore cannot be entirely attributed to the statistics of the input samples chosen for variance regularization, but are instead primarily determined by the choice of prediction and target representations.
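
      To illustrate what an explicit variance regularizer over both the batch and time dimensions can look like, here is a minimal VICReg-style hinge on the per-feature standard deviation. It is a sketch under assumptions: the exact form used for IL in the paper is not given in this response, and the function name, the target of 1, the population variance, and the `eps` value are illustrative.

```python
import math


def variance_penalty(reps, target_std=1.0, eps=1e-4):
    """Hinge penalty on per-feature standard deviation.

    reps: nested list indexed as reps[batch][time][feature].
    Batch and time are pooled, so variance along either dimension counts.
    """
    samples = [frame for sequence in reps for frame in sequence]
    n = len(samples)
    dim = len(samples[0])
    penalty = 0.0
    for d in range(dim):
        values = [s[d] for s in samples]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        std = math.sqrt(var + eps)
        penalty += max(0.0, target_std - std)  # zero once std reaches the target
    return penalty / dim


collapsed = [[[0.5], [0.5]], [[0.5], [0.5]]]
spread = [[[2.0], [-2.0]], [[2.0], [-2.0]]]
print(variance_penalty(collapsed), variance_penalty(spread))
```

      A collapsed representation is penalized, whereas one with sufficient spread incurs no penalty.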

      What we will do: We will clarify this in the Methods section of the revised manuscript.

      (1.5) Prediction error: When the model is compared to the recording of cortical activity in Figure 7, it is not obvious from the figure which latent space we are talking about mathematically. Is it the vector z, c, or the prediction error e? This is rather important from a neuroscientific point of view, because the prediction error e is expected to explain the neuronal data. On the other hand, the prediction error e is only used in the learning algorithm to define the loss function, but it is not the communication medium between the RNN units c (or with the encoder z).

      In the brain, since the measurements are recorded as neural activity, they are communication channels between specific units (z or c). It is probably c or z that would already explain the oddball prediction error. I believe that other models, like Forward-forward of Nejad et al., have tried quite hard to address this apparent tension. Whether or not this is resolved by RPL, I think it would be beneficial to state the problem and clarify how the algorithm addresses or ignores the issue.

      Thanks for pointing out the issue with regards to clarity and for raising the important but subtle point about prediction error representation. To answer the immediate question asking which vector we use in Figure 7, it is the vector c corresponding to the integrator representations. We agree this should be stated explicitly and will update the manuscript accordingly.

      On the more general point, we agree that the tension between recordable neural activity and the computational role of prediction errors is an important issue. We do already briefly engage with it in the Discussion (subsection “Relation to previous modeling work”), where we note that under RPL “inter-areal communication is dominated by representations rather than error signals”. However, we agree that this point should be surfaced more directly.

      To elaborate, under classical predictive coding, prediction errors are the inter-areal communication channel and are therefore expected to be directly observable in neural recordings, e.g., as oddball responses. Under RPL, this is not the case: e is computed locally within a circuit and serves only as a learning signal for synaptic plasticity, not as a signal propagated between circuits or areas. What cortex primarily encodes and communicates in our framework are predictive representations, not reconstruction errors. Accordingly, what should map onto recorded population activity are the representations c (and z), while locally computed prediction errors could in principle remain observable as more circumscribed or transient mismatch-like signals within a circuit.

      We would like to push this point further. The reviewer frames this as a tension that RPL needs to resolve, but growing neurophysiological evidence suggests that classical residual-difference prediction errors may not be a dominant mode of cortical encoding in the first place. Furutachi, Franklin, et al. (2024) showed that V1 responses to unexpected visual stimuli do not encode how input deviates from predictions, but instead selectively amplify the representation of the unexpected stimulus itself. Very recently, Furutachi and Hofer (2026) generalize this into a revised framework in which feedforward pathways transmit sensory representations modulated by prediction-error magnitude, rather than residual differences. Vasilevskaya et al. (2026) constrain the space of plausible cortical algorithms via functional-influence experiments, also concluding that no variant of standard predictive processing is consistent with the full pattern of layer 2/3 ↔ layer 5 interactions; they propose a JEPA-based model, citing RPL as a promising candidate. The model by Nejad et al. (2025) similarly shares with RPL the property that representations, rather than residual errors, propagate between circuit elements.

      Taken together, the apparent tension may be less a problem RPL needs to resolve than one it is well positioned to explain, remaining consistent with the emerging picture of cortex as encoding amplified sensory features rather than transmitting residual errors across areas.

      What we will do: We will add missing information to the main text and sharpen the Discussion with these arguments.

      (1.6) Successor representation without value? I believe the term successor representation is historically relevant in a reinforcement learning (RL) setting and has a precise mathematical definition. Without RL, I feel that learning successor representation is conceptually identical to learning a transition matrix (aka, a primitive world model). I therefore wonder if the pitch for high-level framing of the successor representation is appropriately described or trivial.

      The reviewer makes a valid point on the concept of successor representations. To answer the immediate question, it is not entirely trivial, as we not only observe the emergence of the transition structure (Fig. 6c), but also the encoding of decaying future (but not past) state occupancy (Fig. 6d,e). We largely adopted the terminology “successor-like representations” from the study by Ekman et al. (2023), but we will elaborate a bit further on why we stuck to it. As nicely pointed out by the reviewer, the term “successor representations” was introduced in the RL literature (Dayan, 1993), but was later adopted in neuroscience to describe the idea that a neuronal population encodes a predictive representation that reflects the expected future occupancy of states under a given policy. Ekman et al. (2023) use the term “successor-like representations” to describe the phenomenon in which neural activity in V1 (and hippocampus) represents both current and (discounted) future, but not past, state occupancies in a sequence learning task with no explicitly defined policy or value training. In other words, successor-like representations are simply predictive representations.
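
      The distinction between a transition matrix and a discounted occupancy of future states can be made concrete with a toy example. The sketch below computes a truncated successor matrix, the sum over k of γ<sup>k</sup>T<sup>k</sup>, for a hypothetical four-state chain A→B→C→D. The chain, the discount γ = 0.5, and the function name are assumptions chosen for illustration; they are not the stimulus sequence of Ekman et al. (2023) or a quantity computed in the paper.

```python
def successor_matrix(transition, gamma, horizon):
    """Truncated successor matrix: sum_{k=0}^{horizon} gamma**k * T**k."""
    n = len(transition)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T**0
    result = [[0.0] * n for _ in range(n)]
    for k in range(horizon + 1):
        weight = gamma ** k
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        # Advance to the next matrix power T**(k+1).
        power = [
            [sum(power[i][m] * transition[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return result


# Deterministic chain A -> B -> C -> D; D has no successor.
chain = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
M = successor_matrix(chain, gamma=0.5, horizon=3)
print(M[1])  # row for state B: [0.0, 1.0, 0.5, 0.25]
```

      Starting from B, the current state has weight 1, the future states C and D have decaying weights, and the past state A has weight 0. The one-step transition matrix alone would assign weight only to C.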

      What we will do: To deal with this dichotomy, we will replace “successor-like representations” with the term “predictive representations” in the abstract and clarify this distinction in the Results section of the revised manuscript.

      (1.7) Learning in RNN: Learning with recurrent networks appears to be a key in this model presented here (it is in the algorithm name). Yet, this aspect of the model and the literature on biologically plausible learning rules for RNN is not really discussed.

      We thank the reviewer for raising this concern. While h-RPL is one step toward more biologically plausible and spatially local learning rules, exploring it further in terms of temporal credit assignment is beyond the scope of the present study and would require a more systematic and in-depth analysis. However, moving toward more biologically plausible learning rules is an interesting research direction that we plan to explore, as we also mentioned in the Discussion (“Limitations and future research directions”).

      We think a viable strategy could be to combine a slim spatial credit assignment strategy such as feedback alignment (Nøkland, 2016; Lillicrap et al., 2016) with an online learning rule using eligibility traces for temporal credit assignment such as SuperSpike (Zenke et al., 2018) or e-prop (Bellec et al., 2020). Similar strategies have given promising results for CLAPP (Illing et al., 2021; Zihan et al., 2026).

      What we will do: Following the suggestion, we will discuss biologically plausible learning rules for RNNs in the Discussion.

      Reviewer #2 (Public review):

      This is a very interesting manuscript, which proposes a novel idea on how cortical networks may learn useful representations of sensory stimuli. The model implementing this idea is thoroughly tested in multiple experimental paradigms. The manuscript is very clearly written. I feel it may have a significant impact on our understanding of cortical circuitry.

      Reviewer #3 (Public review):

      This paper presents Recurrent Predictive Learning (RPL), a self-supervised model conceptually similar to Joint-Embedding Predictive Architecture (JEPA) models. RPL sequentially observes dynamic scenes to predict subsequent observations. A central claim of the work is that the model’s trained representations are simultaneously invariant and equivariant to transformations, such as movement properties that emerge without explicit supervision. These representational qualities are demonstrated through three experiments utilizing two simulated datasets and one naturalistic dataset. Furthermore, the latent embeddings are qualitatively compared with neural data, showing that the model reproduces the successor representation observed in human V1 and the local/global oddball effect in the monkey Prefrontal Cortex.

      The paper addresses a fundamental question relevant to both computational neuroscience and machine vision: how the brain learns representations that are simultaneously invariant and equivariant to transformations. The manuscript is well-written, easy to follow, and supported by clear visualizations.

      While JEPA-style models have recently gained significant traction in the artificial intelligence community, this paper nicely bridges the gap to neuroscience. By framing these architectures as a theory for visual learning in the brain, the authors provide valuable insights into how predictive frameworks can explain cortical processing.

      The qualitative alignment with V1 and PFC data is a particularly strong contribution, as it offers a potential mechanistic explanation for observed neural phenomena through the lens of self-supervised learning.

      (3.1) The central claim, that both invariance and equivariance emerge spontaneously, requires further scrutiny (see Ghaemi et al., NeurIPS, 2025; Garrido et al., arXiv, 2024). In particular, the synthetic “moving animal” dataset used in this paper may be too simple to fully support this claim. In latent space prediction, a model must predict both the scene content and the dynamics of movement. Because movement (whether ego-motion or external) is often highly uncertain (or multi-modal), predictive models in naturalistic settings often “collapse” toward learning purely invariant representations, ignoring the hard-to-predict dynamics. In the provided simulations, the movements are extremely predictable. In more complex scenarios, the model would likely prioritize content (invariance) over dynamics (equivariance) unless aided by action-conditioning or explicit factor estimation (Zhang et al., ICLR, 2026). The authors’ results in Figure 5 using naturalistic video seem to reflect this limitation, given the lower performance on the naturalistic videos compared to the synthetic datasets.

      We thank the reviewer for the feedback. We agree that further validation on more complex datasets would strengthen the claims, and we take this point seriously. We would welcome any suggestions from the reviewer for specific alternative datasets.

      Regarding the mouse video data specifically, we realized that this is a suboptimal benchmark rather than a shortcoming of our method. The culprit presumably is that the mice remain largely stationary, leading to a heavily imbalanced velocity distribution peaked near zero (Supplementary Fig. S9). This imbalance makes equivariance evaluation unreliable regardless of the learning algorithm. For example, end-to-end supervised training results in an R<sup>2</sup> of 0.19 compared to 0.08 ± 0.02 for RPL.
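
      For reference, the coefficient of determination used in such decoding comparisons can be sketched as follows. The data are made up for illustration and are not the values behind the numbers quoted above; the function name is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical velocities: mostly stationary frames and one moving frame.
velocities = [0.0, 0.0, 0.0, 4.0]
print(r_squared(velocities, velocities))            # perfect decoder
print(r_squared(velocities, [1.0, 1.0, 1.0, 1.0]))  # decoder that outputs the mean
```

      In this toy case the total sum of squares is 12, of which the single moving frame contributes 9, so the score is determined largely by a handful of samples when the targets are concentrated near zero.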

      Regarding the moving animal dataset, we note that the dynamics are not trivial from an SSL perspective: unlike moving MNIST (Srivastava et al., 2015), the dataset includes changes in scale and orientation, both features that invariance-focused SSL models can easily ignore, yet RPL recovers reliably. For example, this discrepancy can be seen in Supplementary Table S1 where we compare to InfoNCE and CPC. That said, we acknowledge the reviewer’s broader concern and will seek to validate RPL on more complex datasets.

      While it would be nice to compare to related work by Ghaemi et al. (2024), this study used 3DIEBench (Garrido et al., 2023). Unfortunately, 3DIEBench’s reliance on pair-based representations with annotated but random augmentations (such as rotations or color changes) precludes the possibility of smooth latent traversals that would be required for RPL to learn from the same dataset. We will look into whether it is computationally feasible to adapt or regenerate a similar dataset that meets the requirements for temporal prediction.

      Regarding stochasticity, we agree that predictive learning in latent space is most natural in approximately deterministic settings, whereas real-world sensory information often comprises non-deterministic elements. While a deeper treatment of such stochastic environments is beyond the scope of the present manuscript, it will be the focus of ongoing and future work. Regarding ongoing work, it is worth mentioning that in recent work from our group (Hauri et al., 2026), we have demonstrated that RPL’s core objective can replace the reconstruction loss in Dreamer, achieving competitive performance in complex, stochastic environments. While we did not systematically evaluate equivariance in this study, the results suggest that representation-space predictive learning is viable beyond the deterministic regime.

      What we will do: We will make the point about the real-world mouse video dataset being a poor benchmark and include the additional R<sup>2</sup> values to show that. Further, we will try to identify or generate alternative datasets to back the equivariance claims and discuss our findings in the light of previous work, e.g., Ghaemi et al. (2024). Moreover, we will sharpen our discussion of our model’s limitations in stochastic settings and highlight notable connections to related work.

      (3.2) The framing of the RPL model as an entirely new theory of representation learning is slightly overstated. The focus on prediction in representation space rather than input space is the defining characteristic of JEPA and various other Self-Supervised Learning (SSL) models, even sequential prediction. While this paper clarifies the connection between these AI frameworks and cortical circuits, the work would be strengthened by more explicitly positioning RPL within the context of existing JEPA-style models and prior SSL theories of the visual system.

      Thanks for raising this point. We are unsure what the reviewer refers to. We did not frame our work as “an entirely new theory of representation learning,” as the reviewer suggests. In fact, we highlight quite the opposite already in the title of our article, which reads: “Understanding neural circuit principles for representation learning through joint-embedding predictive architectures.” We do not claim novelty over JEPA as an ML paradigm; we adopt it precisely because it provides a principled, non-generative framework for predictive representation learning, and our goal is to develop a circuit-level instantiation that accounts for neural circuit computation. We already discuss a body of previous work on self-supervised learning and JEPAs at length. Since the reviewer did not specify what they are missing, we will briefly reiterate what is already there.

      Our contribution is a theory of representation learning in the brain, built on JEPAs as the underlying ML framework. The Title and Introduction already position our work quite explicitly this way. Specifically, we mention prior work on JEPAs (CPC, BYOL, SimSiam, I-JEPA, seq-JEPA, V-JEPA, V-JEPA 2), while noting that “most JEPAs developed in machine learning are poor models of cortical computation” because of their reliance on negative sampling, transformers, masking, static images, and/or known parametrized transformations, and motivate RPL as the minimal candidate that “must instead rely on recurrent neural dynamics, learn from streaming sensory input without masking, support both invariant and equivariant representations, and reproduce key neurophysiological observations.”

      The Discussion (“Relation to previous modeling work”) further details the specific novelties of RPL relative to existing sequential JEPA-style and SSL models like CPC (Oord et al., 2018), V-JEPA (Bardes et al., 2024), V-JEPA 2 (Assran et al., 2025), seq-JEPA (Ghaemi et al., 2024). In brief:

      RPL is a recurrent JEPA based on RNN dynamics, not transformers, and learns from streaming sensory input without masking or random negative sampling;

      It explicitly compares three prediction-error topologies (RPL vs. invariance learning vs. context-prediction; Fig. 2, Suppl. Fig. S2, S6) and shows that asymmetric recurrent prediction is essential for jointly learning invariant and equivariant representations;

      Importantly, it does so via pure temporal prediction without access to underlying transformations, a property shared by very few JEPAs. The closest exception is VJ-VCR (Drozdov et al., 2024) which uses an explicit variance-covariance regularization (VCReg) in a JEPA, which we will cite in the revised manuscript;

      It provides the first hierarchical JEPA optimizing local prediction errors at multiple levels (h-RPL, Fig. 8), as envisioned by LeCun (2022) but not previously implemented;

      It connects directly to neurophysiological data: successor-like representations in human V1 and abstract sequence representations in macaque PFC, which provides qualitative correspondence between JEPA components and cortical activity that the existing JEPA literature, focused on ML benchmarks, does not address.

      Finally, our article already includes a discussion paragraph on recent self-supervised learning models in the context of the brain where we discuss work by Nejad et al. (2025) and Asabuki et al. (2025). Most other SSL theories of the visual system rely on static images and recognition tasks (Yerxa et al., 2024; Margalit et al., 2024). However, there are two studies that include temporal prediction objectives and are worth mentioning in more detail: First, Bakhtiari et al. (2021) show that representations similar to ventral and dorsal pathways in the visual system can emerge in a two-pathway encoder architecture within the CPC model. Second, Niu et al. (2024) use a “straightening” objective together with VCReg as a practical model of the perceptual straightening hypothesis (Hénaff et al., 2019). Though not a JEPA (i.e., has no predictor network), it can decode equivariant factors in a sequential MNIST dataset where only single factors change throughout a video.

      What we will do: We will carefully review our discussion of previous work and further discuss Drozdov et al. (2024), Bakhtiari et al. (2021), and Niu et al. (2024) in the revised manuscript.

      (3.3) A significant challenge in latent-space SSL is avoiding “representational collapse” (where the model provides a trivial constant output). While the paper alludes to JEPA-like solutions, it lacks a detailed explanation (in both the text and the architectural schematics) of the specific technique used to prevent collapse. Consequently, it is difficult to evaluate the authors’ claim of “biological plausibility,” as the biological equivalents of common machine learning techniques (such as stop gradient) are not discussed.

      Thanks for pointing this out. Our model avoids collapse through the asymmetric stop-grad / predictor architecture. It does not require an EMA, provided that the predictor learns with a faster learning rate than the rest of the network (see also our response to Point P1.3).

      The use of stop-grad suggests that a circuit learning with RPL needs to compute a vector-based instructive learning signal. While we do not explicitly model the circuit-level mechanisms of how this could be implemented in the brain, excitation-inhibition balance is one possibility (Rossbroich et al., 2025). Finally, differences in learning rate can be implemented either structurally or functionally in the brain (see, for instance, Liu et al., 2025), and activity normalization has been suggested as a canonical computation in biological neural circuits (Carandini et al., 2012).

      What we will do: We will make sure to discuss these putative biological mechanisms in the revised manuscript.

      (3.4) Recent work has shown that the capacity (size) of the predictor significantly influences the learned representations in a JEPA-type world model (Garrido et al., 2024). In simpler scenarios, a large enough predictor can allow a model to “memorize” dynamics rather than learning generalized equivariant features. It would be beneficial to see how the ratio of predictor size to encoder size affects the emergence of these features.

      Thanks for raising this concern. We do not observe a noticeable difference in position and velocity decoding when changing the width or depth of the MLP predictor on the moving animals data. However, performance on rotation speed and orientation decoding scales with the width, but not the depth, of the predictor. This analysis excludes the effect of the integrator’s capacity, as it directly affects the dimensionality of the representations, even though it also effectively contributes to the prediction computation in RPL.

      What we will do: We will include a figure showing how task performance varies with the predictor’s width and depth.

      Methodological Clarifications

      (3.5) The authors mention a contrastive learning comparison but provide few details. Since contrastive learning is primarily a technique to avoid collapse, it would be a more rigorous baseline if implemented within the same architecture as RPL to isolate the effect of the predictive objective.

      Thanks for the question. We already use the same network model as in RPL for the contrastive predictive learning (InfoNCE) baseline in Supplementary Table S1, as mentioned in the main text (l.164).

      What we will do: We will describe the architecture of the non-linear predictor used for the InfoNCE baseline more explicitly in Methods.
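      For readers unfamiliar with the contrastive objective, the following is a minimal pure-Python sketch of an InfoNCE loss over a batch of predicted and target latent vectors. The dot-product similarity, the temperature value, and the use of the other batch items as negatives are assumptions made for illustration and may differ from the baseline's actual configuration.

```python
import math


def info_nce(predictions, targets, temperature=0.1):
    """Mean InfoNCE loss; targets[i] is the positive for predictions[i]."""
    total = 0.0
    for i, pred in enumerate(predictions):
        # Similarity of this prediction to every target in the batch.
        logits = [sum(a * b for a, b in zip(pred, tgt)) / temperature
                  for tgt in targets]
        # Numerically stable log-sum-exp over positives and negatives.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]  # negative log-softmax of the positive
    return total / len(predictions)
```

      When all similarities are equal, the loss equals the log of the batch size; it decreases as each prediction becomes more similar to its own target than to the other targets.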

      (3.6) In the PFC data comparison (Figure 7f), there appears to be a discrepancy where the local and global conditions show nearly identical results in PFC but different dynamics in the model. It is unclear if this is a visualization error or a genuine model deviation.

      Thanks for picking up on this subtlety in the experimental results. To clarify, it is a model deviation but an interesting one. The local and global responses do look quite similar in the original PFC data. They differ in that the global oddball (xY|xx and xx|xY) response has a secondary peak that encodes the presence of the global oddball, whereas the initial response is actually dominated by local oddball encoding (xY vs xx). Concretely, this results in the response to the xx|xY condition only showing up weakly in the data and at a time lag with respect to the initial local oddball response. Our model, however, does not show the transient initial response to local oddballs in the decoding direction for global oddballs. In a sense, the network model encodes the global oddball concept more robustly than is seen in the PFC data. That said, whether this indicates a genuine difference in representational strategies that needs to be further accounted for, or whether it is an issue stemming from limited sub-sampling of PFC neurons, remains unclear.

      (3.7) The criteria for selecting specific model variables for comparison with V1 versus PFC are not explicitly defined. Clarification is needed on whether the same latent variables were used for both brain regions or if different layers were selected.

      To clarify, the successor-like representations in human V1 and abstract representations in macaque PFC are two different experiments, so each has different latent variables requiring different RPL models. The architecture used for each experiment is detailed in Methods, and the criterion for selecting each architecture was to use the simplest one that should work given the task complexity. Throughout the paper, all representation analysis is done on the output of the integrator (c) unless stated otherwise. We hope this resolves the confusion.

      References

      Chen, Xinlei et al. (2021). “Exploring simple siamese representation learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758.

      Srinath Halvagal, Manu et al. (2023). “Implicit variance regularization in non-contrastive SSL”. In: Advances in Neural Information Processing Systems 36, pp. 63409–63436.

      Mikulasch, Fabian A et al. (2026). Understanding Self-Supervised Learning via Latent Distribution Matching. arXiv: 2605.03517[cs.LG].

      Furutachi, Shohei, Alexis D. Franklin, et al. (Sept. 2024). “Cooperative thalamocortical circuit mechanism for sensory prediction errors”. en. In: Nature 633.8029. Publisher: Nature Publishing Group, pp. 398–406. issn: 1476-4687. doi: 10.1038/s41586-024-07851-w.

      Furutachi, Shohei and Sonja B Hofer (2026). “Rethinking Predictive Processing”. In: Annual Review of Neuroscience 49.

      Vasilevskaya, Anna et al. (2026). “A functional influence based circuit motif that constrains the set of plausible algorithms of cortical function”. In: bioRxiv. doi: 10.64898/2026.01.29.702557. eprint: https://www.biorxiv.org/content/early/2026/01/29/2026.01.29.702557.full.pdf.

      Nejad, Kevin Kermani et al. (July 2025). “Self-supervised predictive learning accounts for cortical layer-specificity”. en. In: Nat Commun 16.1, p. 6178. issn: 2041-1723. doi: 10.1038/s41467-025-61399-5.

      Ekman, Matthias et al. (Feb. 2023). “Successor-like representation guides the prediction of future events in human visual cortex and hippocampus”. In: eLife 12. Ed. by Morgan Barense et al., e78904. issn: 2050-084X. doi: 10.7554/eLife.78904.

      Dayan, Peter (1993). “Improving generalization for temporal difference learning: The successor representation”. In: Neural computation 5.4, pp. 613–624.

      Nøkland, Arild (2016). “Direct feedback alignment provides learning in deep neural networks”. In: Advances in neural information processing systems 29.

      Lillicrap, Timothy P et al. (2016). “Random synaptic feedback weights support error backpropagation for deep learning”. In: Nature communications 7.1, p. 13276.

      Zenke, Friedemann et al. (2018). “Superspike: Supervised learning in multilayer spiking neural networks”. In: Neural computation 30.6, pp. 1514–1541.

      Bellec, Guillaume et al. (2020). “A solution to the learning dilemma for recurrent networks of spiking neurons”. In: Nature communications 11.1, p. 3625.

      Illing, Bernd et al. (2021). “Local plasticity rules can learn deep representations using self-supervised contrastive predictions”. In: Advances in Neural Information Processing Systems 34.

      Zihan, Wu S et al. (2026). “Can Local Learning Match Self-Supervised Backpropagation?” In: arXiv preprint arXiv:2601.21683.

      Srivastava, Nitish et al. (2015). “Unsupervised learning of video representations using lstms”. In: International conference on machine learning. PMLR, pp. 843–852.

      Ghaemi, Hafez et al. (2024). “Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models”. In: NeurIPS 2024 Workshop: Self-Supervised Learning - Theory and Practice.

      Garrido, Quentin et al. (2023). “Self-supervised learning of split invariant equivariant representations”. In: arXiv preprint arXiv:2302.10283.

      Hauri, Michael et al. (2026). “Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction”. In: arXiv preprint arXiv:2603.07083.

      Oord, Aaron van den et al. (July 2018). “Representation Learning with Contrastive Predictive Coding”. In: arXiv:1807.03748 [cs, stat]. arXiv: 1807.03748.

      Bardes, Adrien et al. (2024). V-JEPA: Latent Video Prediction for Visual Representation Learning.

      Assran, Mido et al. (2025). “V-jepa 2: Self-supervised video models enable understanding, prediction and planning”. In: arXiv preprint arXiv:2506.09985.

      Drozdov, Katrina et al. (2024). “Video representation learning with joint-embedding predictive architectures”. In: arXiv preprint arXiv:2412.10925.

      LeCun, Yann (2022). “A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27”.

      Asabuki, Toshitake et al. (2025). “Learning predictive signals within a local recurrent circuit”. In: Proceedings of the National Academy of Sciences 122.27, e2414674122. doi: 10.1073/pnas.2414674122. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.2414674122.

      Yerxa, Thomas et al. (2024). “Contrastive-equivariant self-supervised learning improves alignment with primate visual area it”. In: Advances in neural information processing systems 37, pp. 96045–96070.

      Margalit, Eshed et al. (2024). “A unifying framework for functional organization in early and higher ventral visual cortex”. In: Neuron 112.14, pp. 2435–2451.

      Bakhtiari, Shahab et al. (2021). “The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning”. In: Advances in Neural Information Processing Systems. Ed. by M. Ranzato et al. Vol. 34. Curran Associates, Inc., pp. 25164–25178.

      Niu, Julie Xueyan et al. (2024). “Learning predictable and robust neural representations by straightening image sequences”. In: Advances in Neural Information Processing Systems 37, pp. 40316–40335.

      Hénaff, Olivier J et al. (2019). “Perceptual straightening of natural videos”. In: Nature neuroscience 22.6, pp. 984–991.

      Rossbroich, Julian et al. (2025). “Breaking Balance: Encoding local error signals in perturbations of excitation-inhibition balance”. In: bioRxiv, pp. 2025–05.

      Liu, Peng et al. (2025). “Layer-specific changes in sensory cortex across the lifespan in mice and humans”. In: Nature neuroscience 28.9, pp. 1978–1989.

      Carandini, Matteo et al. (2012). “Normalization as a canonical neural computation”. In: Nature reviews neuroscience 13.1, pp. 51–62.

    1. Reviewer #1 (Public review):

      The manuscript shows that different traits of adults and larvae correlate with Red List status. The authors argue that this shows a big gap in the conservation of amphibians and that the traits of all life stages should be taken into account in amphibian conservation. Specifically, amphibian conservation should do more for the habitats where the larvae live.

      The manuscript is well written and easy to understand. The methods are sound.

      While the study will make an interesting contribution to conservation science, there are many things that I disagree with.

      I don't think that amphibian larvae and their requirements are a "blind spot" as the title suggests. When reading the manuscript, I didn't learn how conservation practice should change in response to the results.

      I wonder whether the relationship between species traits and extinction risk is of great importance for conservation. If a species is Data Deficient on the IUCN Red List, then species traits could be used to predict its Red List category. However, for other conservation projects, I don't see how this would work. How would traits be linked to captive breeding, conservation translocation, pond construction or habitat management in general? In some cases, I can envision a link between species traits and pond hydroperiod.

      Species traits are body size and morphological traits. That makes sense. However, one of the species traits was microhabitat. I find it far-fetched to call habitat a species trait. This is standard habitat ecology. It is well known that habitats matter and that different habitat types face different threats, and consequently, the species that live in those habitats. Furthermore, habitat and morphology may be confounded. For example, tadpoles in lentic and lotic habitats have very different morphologies. So is it habitat or morphology?

      I don't know how the threat status of Chinese amphibians is determined. IUCN has multiple reasons why a species can be Red Listed. One reason is range size, and another reason is population decline. Personally, I don't think they should be pooled in an analysis because they are fundamentally different reasons why a species has a high extinction risk. A reduction in population size of greater than 30% in 10 years or 3 generations is not the same thing as a small distribution range. Another issue is that IUCN developed the Green Status of species. The Green Status shows that even a species which is LC on the Red List may be significantly depleted.

      The species traits in Table 1 are mostly functional/morphological and body size related (and microhabitat). While there may be correlations between traits and Red List status, it is unknown whether this is correlation or causation. In addition, it is difficult to know which conservation interventions may be necessary now that we know that relative head width and Red List status are correlated.

      In the discussion, the authors explain why body size and other traits may affect extinction risk and whether there is a causal relationship. I agree that body size may have a direct effect because larger species are harvested more frequently (it was interesting to learn that tadpoles are harvested as well). However, as macroecological studies show, smaller species often have larger populations than larger species. Abundance may matter.

      I found it much harder to understand why relative head length and tympanum size correlated with Red List status. I wasn't convinced by the arguments in the discussion. Tympanum size may be related to hearing and anthropogenic noise. Several studies are cited which show that frogs alter their calling behaviour in response to noise. Crucially, however, they describe changes in behaviour or properties of the advertisement call, yet none show that noise has effects on population viability. If some anthropogenic stressor affects individuals, then this does not mean that it will cause a population decline. When IUCN published the second global amphibian assessment, did they list noise as a major threat to amphibians?

      There are statements that the tadpole stage is the most important stage: "a critical period for amphibian survival" (line 78-79). While there is high mortality in the tadpole stage, tadpole survival is rather unlikely to affect population survival. Many population models show this. See, for example, Biek et al. 2002 in Conservation Biology. Other papers have argued that the postmetamorphic juvenile stage is most important (Petrovan and Schmidt 2009 Biological Conservation).

      The authors repeatedly make the statement that amphibian conservation should focus more on the tadpole stage. I don't understand why this statement is made. For example, a major activity in amphibian conservation is the restoration and de novo construction of ponds (see Calhoun et al. 2014 PNAS, Moor et al. 2022 PNAS). Ponds are habitats for tadpoles. Others removed fish from amphibian breeding sites because fish prey on tadpoles (and adults; see Vredenburg 2004 PNAS). Semlitsch (2002 in Conservation Biology) argued that the management of pond hydroperiod is a critical element of amphibian recovery plans. Ponds should be temporary because this effectively removes predators that consume tadpoles. Clearly, the tadpole stage is not a neglected stage in amphibian conservation.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study fills a major geographic and temporal gap in understanding Paleocene mammal evolution in Asia and proposes an intriguing "brawn before bite" hypothesis grounded in diverse analytical approaches. However, the findings are incomplete because limitations in sampling design - such as the use of worn or damaged teeth, the pooling of different tooth positions, and the lack of independence among teeth from the same individuals - introduce uncertainties that weaken support for the reported disparity patterns. The taxonomic focus on predominantly herbivorous clades also narrows the ecological scope of the results. Clarifying methodological choices, expanding the ecological context, and tempering evolutionary interpretations would substantially strengthen the study.

      We have now thoroughly revised our manuscript in response to the editors’ and reviewers’ comments, in particular with regard to:

      (1) Sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentence around line 690:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      (2) Pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.
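      The reported percentage ranges can be read as fractions of the per-tooth analyses that reproduce the pooled result. The sketch below assumes nine analysed tooth positions (upper M1-3, lower m1-3, upper P3-P4, lower p4), which is our reading of the list given above rather than a stated denominator; under that assumption the reported bounds correspond to 7 to 9 and 2 to 5 supporting positions.

```python
# Tooth positions retained for the per-tooth analyses (assumed denominator).
TOOTH_POSITIONS = ["M1", "M2", "M3", "m1", "m2", "m3", "P3", "P4", "p4"]


def replication_percent(n_supporting, n_positions=len(TOOTH_POSITIONS)):
    """Percentage of per-tooth analyses that reproduce the pooled trend."""
    return round(100 * n_supporting / n_positions)


# Counts of supporting positions are hypothetical, chosen to match the text.
for label, count in [("lower bound, OPCR/DNE/FEA", 7),
                     ("upper bound, OPCR/DNE/FEA", 9),
                     ("lower bound, RFI/Slope", 2),
                     ("upper bound, RFI/Slope", 5)]:
    print(label, replication_percent(count))
```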

      For tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S9-11, only Fig. S10 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. It is not possible to conclude that the other tooth positions display different patterns because of the limited sample sizes.

      (3) Ecological scope of the study: although carnivorans and mesonychids are recorded from some of the time intervals examined in this study, our sampling choice of pantodonts and anagalids reflects the high abundance of available dental specimens in those clades, permitting us to make the strongest statistical inference given the incomplete fossil record. Additionally, all sampled taxa come from archaic clades that have not been determined to be specifically herbivorous; we included an additional paragraph in the introduction to explain this:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leaves some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work provides valuable new insights into the Paleocene Asian mammal recovery and diversification dynamics during the first ten million years post-dinosaur extinction. Studies that have examined the mammalian recovery and diversification post-dinosaur extinction have primarily focused on the North American mammal fossil record, and it's unclear if patterns documented in North America are characteristic of global patterns. This study examines dietary metrics of Paleocene Asian mammals and found that there is a body size disparity increase before dietary niche expansion and that dietary metrics track climatic and paleobotanical trends of Asia during the first 10 million years after the dinosaur extinction.

      Strengths:

      The Asian Paleocene mammal fossil record is greatly understudied, and this work begins to fill important gaps. In particular, the use of interdisciplinary data (i.e., climatic and paleobotanical) is really interesting in conjunction with observed dietary metric trends.

      Weaknesses:

      While this work has the potential to be exciting and contribute greatly to our understanding of mammalian evolution during the first 10 million years post-dinosaur extinction, the major weakness is in the dental topographic analysis (DTA) dataset.

      There are several specimens in Figure 1 that have broken cusps, deep wear facets, and general abrasion. Thus, any values generated from DTA are not accurate and cannot be used to support their claims. Furthermore, the authors analyze all tooth positions at once, which makes this study seem comprehensive (200 individual teeth), but it's unclear what sort of noise this introduces to the study. Typically, DTA studies will analyze a singular tooth position (e.g., Pampush et al. 2018 Biol. J. Linn. Soc.), allowing for more meaningful comparisons and an understanding of what value differences mean. Even so, the dataset consists of only 48 specimens. This means that even if all the specimens were pristinely preserved and generated DTA values could be trusted, it's still only 48 specimens (representing 4 different clades) to capture patterns across 10 million years. For example, the authors note that their results show an increase in OPCR and DNE values from the middle to the late Paleocene in pantodonts. However, if a singular tooth position is analyzed, such as the lower second molar, the middle and late Paleocene partitions are only represented by a singular specimen each. With a sample size this small, it's unlikely that the authors are capturing real trends, which makes the claims of this study highly questionable.

      With regard to sampling design: we clarified our methods section to indicate that we did not use worn or broken teeth in our initial analyses. We added the following sentence around line 690:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.

      For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. It is not possible to conclude that the other tooth positions display different patterns because of the limited sample sizes.

      Reviewer #2 (Public review):

      Summary:

      This study uses dental traits of a large sample of Chinese mammals to track evolutionary patterns through the Paleocene. It presents and argues for a 'brawn before bite' hypothesis - mammals increased in body size disparity before evolving more specialized or adapted dentitions. The study makes use of an impressive array of analyses, including dental topographic, finite element, and integration analyses, which help to provide a unique insight into mammalian evolutionary patterns.

      Strengths:

      This paper helps to fill in a major gap in our knowledge of Paleocene mammal patterns in Asia, which is especially important because of the diversification of placentals at that time. The total sample of teeth is impressive and required considerable effort for scanning and analyzing. And there is a wealth of results for DTA, FEA, and integration analyses. Further, some of the results are especially interesting, such as the novel 'brawn before bite' hypothesis and the possible link between shifts in dental traits and arid environments in the Late Paleocene. Overall, I enjoyed reading the paper, and I think the results will be of interest to a broad audience.

      Weaknesses:

      I have four major concerns with the study, especially related to the sampling of teeth and taxa, that I discuss in more detail below. Due to these issues, I believe that the study is incomplete in its support of the 'brawn before bite' hypothesis. Although my concerns are significant, many of them can be addressed with some simple updates/revisions to analyses or text, and I try to provide constructive advice throughout my review.

      (1) If I understand correctly, teeth of different tooth positions (e.g., premolars and molars), and those from the same specimen, are lumped into the same analyses. And unless I missed it, no justification is given for these methodological choices (besides testing for differences in proportions of tooth positions per time bin; L902). I think this creates some major statistical concerns. For example, DTA values for premolars and molars aren't directly comparable (I don't think?) because they have different functions (e.g., greater grinding function for molars). My recommendation is to perform different disparity-through-time analyses for each tooth position, assuming the sample sizes are big enough per time bin. Or, if the authors maintain their current methods/results, they should provide justification in the main text for that choice.

      With regard to pooled versus by-tooth position analyses: we repeated the three major analyses (DTA & FEA variability through time, tooth size and variability through time, and DTA-FEA correlation through time) for individual molars (upper M1-3, lower m1-3) and select premolars (upper P3-P4 and lower p4; lower and upper p2 samples contained fewer than 5 specimens across the three time intervals, lower p3 contained only 2 specimens for the middle Paleocene, so they were excluded from the sub-partition analyses).

      For DTA & FEA variability through time (summarized as a new figure, Fig. S5, also pasted below), OPCR, DNE, and FEA trait data are supported in 78-100% of the per-tooth analyses for both the early-middle Paleocene and middle-late comparisons. By contrast, RFI and Slope data are replicated in only 22-56% of the per-tooth analyses. We qualified the main text reporting and discussion to include these sensitivity analyses so readers can assess nuances in the data when comparing pooled sample versus per-tooth analyses.

      For the tooth size and variability through time (summarized in a new table, Table S3, also pasted below), we observed broad concordance in the pooled analyses and the per-tooth partitioned analyses. Different tooth positions provide strong support for different aspects of the observed trends, with the lower fourth premolar being the strongest driver of the overall trend. All of the significant trends in per-tooth analyses are in the same direction (i.e., decreasing size disparity and size mean through time) as the pooled sample. We added qualifying clarification in the text to bring attention to these refined results.

      For DTA-FEA correlation through time, we generated per-tooth correlation plots in three new figures (Figs. S8-10, only Fig. S9 shown here as an example). We observed that upper M1 patterns generally reflect the trend recovered from analysis of the overall dataset, but M2 and M3 results display inconsistent DTA-FEA correlations, possibly due to small sample sizes. Lower molar patterns generally replicate those recovered in the overall analyses, but lower M1 and M2 signals appear to be stronger than those for lower M3. Finally, low sample sizes make premolar correlations unstable, with the general pattern showing EP-MP strengthening followed by MP-LP stasis or weakening. Given these findings, it appears that the results in the pooled sample correlation plots are mainly driven by lower molar signals. It is not possible to conclude that the other tooth positions display different patterns because of the limited sample sizes.

      Also, I think lumping teeth from the same specimen into your analyses creates a major statistical concern because the observations aren't independent. In other words, the teeth of the same individual should have relatively similar DTA values, which can greatly bias your results. This is essentially the same issue as phylogenetic non-independence, but taken to a much greater extreme.

      It seems like it'd be much more appropriate to perform specimen-level analyses (e.g., Wilson 2013) or species-level analyses (e.g., Grossnickle & Newham 2016) and report those results in the main text. If the authors believe that their methods are justified, then they should explain this in the text.
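      The specimen-level alternative suggested here can be sketched as follows: per-tooth values are first averaged within each specimen, and disparity is then computed across specimen means so that each individual contributes one observation. The record layout, the example values, and the use of population variance as the disparity measure are assumptions for illustration and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean, pvariance


def specimen_means(records):
    """Average per-tooth trait values within each specimen.

    records: iterable of (specimen_id, trait_value) pairs (hypothetical layout).
    """
    by_specimen = defaultdict(list)
    for specimen_id, value in records:
        by_specimen[specimen_id].append(value)
    return {sid: mean(vals) for sid, vals in by_specimen.items()}


def disparity(values):
    """Disparity taken here as the population variance of the values."""
    return pvariance(values)


# Hypothetical example: specimen A contributes two teeth, specimen B one.
records = [("A", 1.0), ("A", 3.0), ("B", 4.0)]
tooth_level = disparity([v for _, v in records])
specimen_level = disparity(specimen_means(records).values())
```

      Comparing the tooth-level and specimen-level values for each time bin would show how much of a disparity pattern depends on treating teeth from one individual as independent observations.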

      Based on the per-tooth partition analyses we performed and reported above, the results now show that the overall trends described in the previous draft of the study are a composite of signals from different regions of the dentition. For example, the OPCR, DNE, and FEA trends persist across most tooth positions, whereas the Slope and RFI trends are mainly driven by lower fourth premolar patterns. The tooth size results are also mainly driven by lower fourth premolar patterns, but tooth disparity trends are broadly supported across tooth positions. These observations indicate that the overall trends remain valid, but there are nuances as to which tooth positions are driving which components of the trends. As such, we deem the overall results to be valid, and focused our revision on providing the nuances so readers can assess through-time patterns in more detail than in the previous version of the study.

      (2) Maybe I misunderstood, but it sounds like the sampling is almost exclusively clades that are primarily herbivorous/omnivorous (Pantodonta, Arctostylopida, Anagalida, and maybe Tillodonta), which means that the full ecomorphological diversity of the time bins is not being sampled (e.g., insectivores aren't fully sampled). Similarly, the authors say that they "focused sampling" on those major clades and "Additional data were collected on other clades ... opportunistically" (L628). If they favored sampling of specific clades, then doesn't that also bias their results?

      If the study is primarily focused on a few herbivorous clades, then the Introduction should be reframed to reflect this. You could explain that you're specifically tracking herbivore patterns after the K-Pg.

      We appreciate the reviewer’s suggestion that our sampling may have focused on putative herbivorous clades more than others. However, at the early stage of placental evolution during the Paleocene, and in particular among the endemic forms we studied from south China, it is unclear to us that such clear-cut ecomorphological categories were present amongst the fossil mammals. Thus, we take a more agnostic approach and do not define the dietary categories of the sample taxa (and by extension, those of the unsampled taxa). Although we recognize that representatives of certain clades, such as Carnivora, may be more reasonably interpreted as carnivores/insectivores/omnivores and, in the current context, remain unsampled, we point out that including tooth samples from rare taxa such as carnivores likely would have biased the analyses temporally. Chinese Paleocene carnivores are known only from one of the three time intervals analyzed (representing only a handful of specimens), and so would potentially inflate the disparity in that time interval relative to the others (if dentitions specialized for carnivory are assumed to be present in the Paleocene). To clarify this point, we added a paragraph in the introduction:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the generally stratigraphically limited nature of early Cenozoic sequences. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leaves some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having generalized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      (3) There are a lot of topics lacking background information, which makes the paper challenging to read for non-experts. Maybe the authors are hindered by a short word limit. But if they can expand their main text, then I strongly recommend the following:

      a) The authors should discuss diets. Much of the data are diet correlates (DTA values), but diets are almost never mentioned, except in the Methods. For example, the authors say: "An overall shift towards increased dental topographic trait magnitudes ..." (L137). Does that mean there was a shift toward increased herbivory? If so, why not mention the dietary shift? And if most of the sampled taxa are herbivores (see above comment), then shouldn't herbivory be a focal point of the paper?

      We edited the introduction to say that “We used dental topographical traits as indicators of ecomorphological diversity[28] and examined temporal shifts in tooth crown complexity, curvature, and height and their association with tooth performance in terms of deformation resistance using topographic and simulation analyses.” We also added the following to the methods section to clarify that we are using DTA as a general ecomorphological proxy and not as a direct dietary proxy.

      “Overall, we use these DTA traits as indicators of ecomorphological capacity, but do not link them explicitly to dietary categories. The craniodental morphology of archaic placental clades in general has not been demonstrated to share the same structure-function linkages as crown mammals, so the aforementioned linkages between DTA and dietary ecology in extant species only serve as evidence that DTA is a potentially useful ecomorphological proxy, without the application of those DTA-diet relationships to the Paleocene fossil mammal dataset.”

      b) The authors should expand on "we used dentitions as ecological indicators" (L75). For non-experts, how/why are dentitions linked to ecology? And, again, why not mention diet? A strong link between tooth shape and diet is a critical assumption here (and one I'm sure that all mammalogists agree with), but the authors don't provide justification (at least in the Introduction) for that assumption. Many relevant papers cited later in the Methods could be cited in the Introduction (e.g., Evans et al. 2007).

      We added the following sentence to clarify our usage of tooth crowns as ecomorphological proxies: “Teeth are among the most well-preserved parts of fossil mammals, and the fact that they interface directly with the environment through mastication makes them suitable elements for studying potential ecology-morphology linkages.”

      c) Include a better introduction of the sample, such as explicitly stating that your sample only includes placentals (assuming that's the case) and is focused on three major clades. Are non-placentals like multituberculates or stem placentals/eutherians found at Chinese Paleocene fossil localities and not sampled in the study, or are they absent in the sampled area?

      We modified the following sentence to indicate our sampling focus on placentals: “Our analyses focused on placental mammals from three of the most fossiliferous and biogeographically isolated Paleocene sedimentary sequences in paleotropical Asia: The Nanxiong, Qianshan, and Chijiang Basins in present-day south China[23–27] (Fig. S1).”

      d) The way in which "integration" is being used should be defined. That is a loaded term which has been defined in different ways. I also recommend providing more explanation on the integration analyses and what the results mean.

      If the authors don't have space to expand the main text, then they should at least expand on the topics in the supplement, with appropriate citations to the supplement in the main text.

      We replaced all mentions of “integration” with “covariation” to avoid using the loaded terminology. Covariation more accurately reflects the correlation between two sets of traits (DTA vs FEA) without invoking developmental mechanisms implied by modularity/integration.

      (4) Finally, I'm not convinced that the results fully support the 'brawn before bite' hypothesis. I like the hypothesis. However, the 'brawn before ...' part of the hypothesis assumes that body size disparity (L63) increased first, and I don't think that pattern is ever shown. First, body size disparity is never reported or plotted (at least that I could find) - the authors just show the violin plots of the body sizes (Figures 1B, S6A). Second, the authors don't show evidence of an actual increase in body size disparity. Instead, they seem to assume that there was a rapid diversification in the earliest Paleocene, and thus the early Paleocene bin has already "reached maximum saturation" (L148). But what if the body size disparity in the latest Cretaceous was the same as that in the Paleocene? (Although that's unlikely, note that papers like Clauset & Redner 2009 and Grossnickle & Newham 2016 found evidence of greater body size disparity in the latest Cretaceous than is commonly recognized.) Similarly, what if body size disparity increased rapidly in the Eocene? Wouldn't that suggest a 'BITE before brawn' hypothesis? So, without showing when an increase in body size diversity occurred, I don't think that the authors can make a strong argument for 'brawn before [insert any trait]'.

      Although it's probably well beyond the scope of the study to add Cretaceous or Eocene data, the authors could at least review literature on body size patterns during those times to provide greater evidence for an earliest Paleocene increase in size disparity.

      We added a sentence in the discussion of body size during the Paleocene to note that the largest late Cretaceous fossil mammals in China are shrew- to gopher-sized, whereas the largest early Paleocene Chinese Endemic Pantodonts are dog-sized:

      “Dog-sized CEPs such as Bemalambda reached sizes not seen in late Cretaceous mammals from China such as Zhangolestes and Kryptobaatar, which are shrew- to gopher-sized [Meng 2014]”

      Reference: Meng, J. (2014). Mesozoic mammals of China: implications for phylogeny and early evolution of mammals. Natl. Sci. Rev. 1, 521–542. 10.1093/nsr/nwu070.

      Furthermore, we tempered our discussion to restrict the “brawn before bite” hypothesis to post K-Pg recovery in the Paleocene. Body size patterns shifted in the Eocene as crown clades replaced the archaic endemic clades analyzed in our study, and much larger taxa began to appear after the PETM. Such shifts in body size involve different clades, and likely different dynamics, from those of the 10-million-year interval examined in our study, so we refrain from commenting on post-Paleocene times.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In regard to the DTA dataset: Was there a method used to 'fix' these teeth before dental topographic analyses were implemented? If so, this should be explicitly stated. If not, the authors should explain why broken, worn, or abraded teeth were used.

      We excluded the incomplete teeth from our analyses. We added the following sentence for clarification: “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      (2) The authors should explicitly explain why all tooth positions were analyzed together. Again, this is not something that is typically done, and some explanation would be helpful for readers.

      We added a paragraph in the methods section to explain both our pooled sampling approach, as well as the per-tooth analyses added in this revised manuscript:

      “Given the rarity of Paleocene fossil material from China, we combined data from different tooth positions into three pooled samples, one for each of the time intervals examined (early, middle, late Paleocene). We treated the pooled samples as representative of the range of dental topographic features and bite performance traits available to the mammal taxa under study. In this way, the variance estimates are interpreted as measures of the morphological and performance heterogeneity present in each time interval dataset. To further tease out the possibility of specific tooth positions driving the overall trends observed in the pooled samples, we also performed the DTA, FEA, DTA-FEA correlation, and tooth size through-time analyses using per-tooth data partitions.”
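
      The relationship between the pooled samples and the per-tooth partitions can be illustrated with a short sketch. The record layout, specimen identifiers, and trait values below are hypothetical and are not taken from the study; the specimen-level means are included only to show one way the non-independence of teeth from the same individual could be examined.

```python
from collections import defaultdict
import statistics

# Hypothetical records: (specimen_id, time_bin, tooth_position, trait_value).
records = [
    ("spec01", "early", "p4", 1.10),
    ("spec01", "early", "m1", 1.30),
    ("spec02", "early", "p4", 1.50),
    ("spec03", "middle", "p4", 0.90),
    ("spec03", "middle", "m1", 1.70),
    ("spec04", "middle", "m1", 1.20),
]

def group_values(rows, key):
    """Collect trait values into lists keyed by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[3])
    return dict(groups)

# Pooled sample: all tooth positions combined within each time bin.
pooled = group_values(records, key=lambda r: r[1])

# Per-tooth partition: one sample per (time bin, tooth position).
per_tooth = group_values(records, key=lambda r: (r[1], r[2]))

# Specimen-level means: one value per individual within each time bin.
per_specimen = {
    k: statistics.mean(v)
    for k, v in group_values(records, key=lambda r: (r[1], r[0])).items()
}
```

      Any disparity or mean estimate can then be computed on each grouping in turn, so that trends in the pooled sample can be compared against the partitions that contribute to them.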

      (3) I think the authors should hedge their claims a bit more and recognize the limitations of their study (e.g., sample size and tooth preservation).

      We thank the reviewer for raising this important point. We carefully read through the main text and further tempered our interpretations based on the limitations of our data. Additionally, we added a paragraph in the supplemental text to summarize the major sources of uncertainty in the sample:

      “Sample and methodological limitations

      The highly fragmentary nature of early Cenozoic mammal fossils in Asia means that even the best preserved faunas studied herein contain much missing information. First, the absence of a high-resolution chronological framework prevents the fossil data from being analyzed on a continuous time axis; the binning of the samples into three main intervals within a 10-million-year period hinders additional hypotheses about the environmental and climatic correlations of the dental structure-performance results presented. Second, the uneven sampling of the available mammalian assemblage throughout the Paleocene sites in China limits the breadth of ecomorphological categories included in the analyses; rarer taxa representing more specialized carnivore, insectivore, or herbivore forms were not included in our sampling. Third, the spatial discontinuity of stratigraphically younger (Eocene) and older (Cretaceous) mammal assemblages means that body size and ecomorphological shifts bracketing the Paleocene cannot currently be analyzed alongside the dataset presented. These limitations should be taken into account when considering the interpretations made in the main text.”

      Reviewer #2 (Recommendations for the authors):

      I'm including my Line Comments here as recommendations for the authors. But note that many of my recommendations are also in my Public Review.

      L22: "3% of sites"? Do you mean 3% of global sites?

      Yes, we revised the sentence to indicate 3% of global sites. Thank you for this suggestion.

      L35: This is nitpicky because it's not crucial to your study, but I can't help but point out that the Long Fuse, etc, hypotheses are specifically about the DIVERGENCE TIMES for Placentalia and major subclades, NOT the 'adaptive radiation' of placentals like you imply in your text. Adaptive radiations include ecomorphological diversification and are driven by ecological opportunity (e.g., Schluter 2000). (Emphasis on 'ecological.') The long fuse, short fuse, and explosive models do not include an ecological component - i.e., the diversifications could have occurred without ecological diversification. Instead, for hypotheses that are specifically on the adaptive/ecological radiation of mammals, see the Early Rise, Suppression (or Dinosaur Incumbency; Benevento et al. 2023 Palaeontology), and Late Rise hypotheses (Grossnickle et al. 2019 TREE). These hypotheses apply broadly to all mammals, not just placentals (see Box 1's figure in Grossnickle et al. 2019), but they can still be applied to mammalian subclades like eutherians/placentals (e.g., see Thomas Halliday papers).

      Thank you for helping to clarify the adaptive radiation vs. divergence time concepts. We edited this sentence to mention the adaptive radiation hypotheses instead, adding in the references provided by the reviewer.

      L39-40: I think your comment is probably accurate. But keep in mind that advocates of the Early Rise and Delayed Rise hypotheses (see citations within Grossnickle et al. 2019) might argue that other time periods, other than the Paleocene, are equally or more important.

      We added a reference to Grossnickle et al. 2019 to bring attention to potential arguments otherwise. Thank you for the suggestion.

      L48: I think the inclusion of "at higher latitudes" is a little distracting or misleading and should be erased. It implies that the taxonomic diversification was ONLY rapid at higher latitudes. But many of the references that you cite include analyses at the global or continental scale (e.g., Alroy 1999, Grossnickle & Newham 2016) and don't distinguish patterns at different latitudes. If you want to keep the point about latitudes, then I recommend inserting a separate sentence on that point.

      We removed “at higher latitudes”.

      L50: Isn't "stem lineages and those with no living relatives" somewhat redundant? Or do you mean something like "stem placental/eutherian lineages and extinct placental subgroups"?

      Yes, we adopted the suggested phrasing. Thank you.

      L53: I recommend starting a new paragraph around here (maybe starting with "Distinct from ...") that focuses specifically on introducing the 'brawn before [ecomorphological trait]' hypothesis.

      Done.

      L56: "large herbivores and their predators"? Are you just referring to mammals? Wilson (2013), which you cite, and Grossnickle & Newham (2016) argued that dietary specialists were targeted at the K-Pg, but none of the herbivores were "large" (at least relative to Cenozoic herbivores). And most faunivorous mammals at the time were probably insectivorous and not preying on herbivorous mammals, besides maybe a few outlying taxa (e.g., Altacreodus, Nanocuris). I'd revise your sentence for clarity.

      We removed “disproportionately impacting large herbivores and their predators” for clarity.

      L63: I'd replace "ecometric" with "ecomorphological". Ecometrics commonly refers to using fossil traits to infer paleo environments/climate (e.g., see papers by David Polly, Michelle Lawing, etc), which I don't think is what you're referring to here. (E.g., I don't think that brain size or jaw shape patterns were/are used to infer paleo environments.)

      Revised. Thank you.

      L85: I strongly advise against making conclusions like this: "Dental height and sharpness variability ... [spiked] in the middle Paleocene corresponding to a short-lived negative excursion in global temperature." That implies that the change in dentitions is linked to global temperature changes, which I don't think your results support. Later in the text you highlight the temporal uncertainty of your time bin ages (L650) and say that the middle Paleocene bin could be as old as ~62 Ma (L646), which is well before the negative excursion (and looks to be more in line with a positive excursion!), at least according to the Figure 1 time scale (see comment below). So, I don't think that your results even support your statement.

      We reworded this sentence to say “Dental height and sharpness variability were low in the beginning and end of the time interval, with a peak in the middle Paleocene. This pattern is observed both when dentitions are considered holistically and by tooth position in the lower dentition (Fig. S5; upper teeth display the opposite pattern).”

      L144: Using variance for disparity seems fine. But keep in mind that other disparity metrics, such as range (or sum-of-ranges for multivariate data), might produce different results. For instance, variance of RFI and Slope spike in the middle Paleocene, like you point out, but based on the values in Figure 1A, it looks like the ranges stay relatively constant through the Paleocene (although I realize that the ranges might change with bootstrapping). So, your choice of disparity metric might have a big influence on your conclusions. Alternatively, you could calculate disparity using multiple metrics (e.g., Brusatte et al. 2012 Nature Communications; Grossnickle & Newham 2016 supplemental analyses), even if it's just for supplemental analyses.

      Thank you for bringing the choice of disparity measures to our attention. We conducted a parallel set of bootstrapped disparity calculations and comparisons using range lengths (maximum trait value – minimum trait value for a given trait) and summarized the through-time trends in the same way as for the variance-based results (Fig. S5). Overall, very similar trends are observed, providing support for the variance-based data interpretation presented in the main text. We added an explanation of this additional sensitivity testing both in the main text and in the supplemental text.
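
      As a minimal sketch of this kind of sensitivity test, the example below bootstraps two disparity metrics, variance and range length, for each time bin. The trait values, the number of replicates, and the percentile-interval summary are assumptions made for illustration and do not reproduce the analyses or data of the study.

```python
import random
import statistics

def trait_range(values):
    """Range length: maximum trait value minus minimum trait value."""
    return max(values) - min(values)

def bootstrap_disparity(values, metric, n_boot=1000, seed=0):
    """Resample values with replacement; return the metric for each replicate."""
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_boot):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(metric(resample))
    return replicates

def summarize(replicates, alpha=0.05):
    """Median and percentile interval of the bootstrap replicates."""
    ordered = sorted(replicates)
    lower = ordered[int((alpha / 2) * (len(ordered) - 1))]
    upper = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
    return statistics.median(ordered), lower, upper

# Hypothetical trait values for three time bins, NOT data from the study.
bins = {
    "early": [0.42, 0.47, 0.51, 0.44, 0.49, 0.46],
    "middle": [0.30, 0.62, 0.45, 0.71, 0.38, 0.55],
    "late": [0.48, 0.52, 0.50, 0.47, 0.53, 0.49],
}

results = {}
for name, values in bins.items():
    results[name] = {
        "variance": summarize(bootstrap_disparity(values, statistics.variance)),
        "range": summarize(bootstrap_disparity(values, trait_range)),
    }
```

      If the two metrics rank the time bins in the same order, the through-time interpretation is less likely to depend on the choice of disparity measure.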

      L147: "body size disparity ... (Fig. 1B, S6A, Table 1, Data S5)." But I don't see disparity calculated or plotted in any of the figures/tables that you cite. You test for differences in disparity between time bins (Table 1), but that doesn't provide the actual disparity patterns.

      We generated a new figure (Fig. S8) to show the tooth size variance and range levels across time and data partitions, and modified this sentence to say that “Over the same time interval examined, body size disparity and mean were higher in the early Paleocene than in subsequent time intervals (Fig. S8, Table S3; also supported by premolar 4 and upper molar partition analyses), indicating that substantial increases in the disparity of dental complexity, curvature, and height lagged behind maximum tooth size disparity during the Paleocene.”

      L151-153: Maybe. But you're basing this on a much narrower temporal range (Paleocene) than the brain and jaw studies, and I think those studies observed big increases in brain/jaw disparity in the Eocene, which you don't sample. And as I explained elsewhere, I'm not convinced that your results strongly support the same pattern. At a minimum, I recommend tempering your conclusions to better reflect the uncertainty of your results.

      We tempered our statements here to say that “This suggests a ‘brawn before bite’ pattern in endemic Asian mammals, partially mirroring the endocranial and jaw functional morphology patterns identified in their North American and European counterparts [21,22]. These findings raise the possibility that an initial size-driven post-K-Pg recovery followed by ecomorphological radiation was a global phenomenon, even as regional tectonic events such as the initial collision of the Indian subcontinent with Asia and Deccan Traps volcanism influenced local mammal evolution.”

      L170: I'm not well-versed in integration (and modularity) studies, so maybe this reflects my ignorance, but I had trouble understanding sentences like this: "These findings indicate that form-function malleability, the coexistence of distinct topography-performance relationships in each time and taxon partition while overall integration between the two trait groups increases between time bins, was present throughout the Paleocene." If there is space, I recommend revising and/or breaking apart long, jargon-y sentences like that (throughout the paper) so that they're more digestible for readers.

      We simplified complex sentences such as the one the reviewer noted, in order to communicate our findings and interpretations more clearly. Thank you for the suggestion.

      L183: It's probably fine to assume most placental orders arose in the Paleocene based on fossil evidence. But keep in mind that molecular studies often argue that many orders arose in the Late Cretaceous.

      We revised the statement to indicate a “Cretaceous/Paleocene” origin of many modern mammal orders.

      L200-207: Again, this might just reflect my ignorance concerning integration analyses, but I recommend expanding on this text to better explain how your integration results support this conclusion. It seems really interesting, and I like the Garden of Eden hypothesis. It's just not immediately clear to me how your results support that hypothesis. A little more background on how to interpret the integration results would be helpful.

      We expanded the discussion here to say that “Such flexibility in dental form-function linkage permits ‘mix and match’ trait combinations rather than evolutionary change as a single unit, potentially enhancing the evolvability of feeding ecological traits as new environmental conditions arose [Goswami et al. 2015]”

      Reference: Goswami, A., Binder, W.J., Meachen, J., and O’Keefe, F.R. (2015). The fossil record of phenotypic integration and modularity: A deep-time perspective on developmental and evolutionary dynamics. Proc. Natl. Acad. Sci. 112, 4891–4896. 10.1073/pnas.1403667112.

      L218: "reached maximum tooth size disparity early". Again, I don't see size disparity plotted or reported. And without baseline comparisons (Late K or Eocene), it's hard to interpret your results and evaluate what 'maximum' means (Figure 1B).

      We revised the sentence to now say “In response, Paleocene mammal clades in south China between dental topography and bite performance later, all the while maintaining high levels of variability in dental complexity and convexity (Fig. 1).”

      Figure 1A: The time scale in the top left of the figure looks off. Shouldn't the K-Pg be at 66 Ma (not 65 Ma) and the P-E boundary at 56 Ma (not ~54 or 55)?

      We revised Fig. 1 to fix the time scale so that K-Pg is at 65.5 Ma and the P-E boundary at 56 Ma. Thank you for catching this.

      Figure 1A: Is there a different y-axis scale for the variance (red line) results?

      Yes, the y axes for the variance curves were missing. We added them back in. Thank you.

      L628-629: As I explained above, it feels like you focused your sampling just on herbivorous/omnivorous groups, and, if true, this is an important point that should be discussed at the forefront of the paper. Does your sample truly represent the total ecological diversity of the mammalian faunas at the time?

      We agree with the reviewer about the potential partial sampling of the range of ecomorphological diversity when only the most abundant clades are included in the analyses. However, we refrain from interpreting the dietary groupings represented in the dataset using an assumption of functional morphology from crown/extant clades. We added a paragraph in the introduction to bring attention to the inherent uncertainty in the ecological diversity of the dataset:

      “A major challenge with expanding analyses of post K-Pg recovery to Paleocene mammal assemblages elsewhere in the world is the stratigraphically limited nature of early Cenozoic sequences that produce fossil mammals. In Asia, Paleocene localities in China represent the best studied to date[11]. From the earliest Paleocene, highly regional and endemic faunas are known from a handful of sedimentary basins (Fig. S1A). Among the faunal elements, only the archaic placental clades Anagalida and Pantodonta are consistently sampled across the major subdivisions of the Paleocene[11]. An additional complication with ecomorphological analysis of these early mammals is the uncertainty in their dietary ecology, as they are beyond the reach of conventional phylogenetic bracketing approaches to dietary reconstruction. Phenomic analysis of the placental radiation supports insectivory as the ancestral diet of the hypothetical placental ancestor, but uncertainty in the post K-Pg availability of insects and plants in some regions leaves some doubt as to the accuracy of this ancestral state reconstruction[1]. Herein we treat the archaic Paleocene taxa in our analyses as having uncharacterized diets rather than categorizing them as insectivores, herbivores, or carnivores.”

      L653: Sorry if this is mentioned elsewhere, but did you avoid using teeth with especially worn or broken cusps? You might expand on how you chose teeth for your sample.

      We left out this detail in the original submission. Thank you for pointing this out. We had to exclude a third of the teeth because they were too worn or broken. We added the following explanation to the methods section:

      “These tooth positions were selected from a broader examination of ~300 individual teeth from 72 specimens. We vetted the specimens and excluded 99 tooth positions (~33% of teeth initially chosen for possible inclusion) from our analyses because they either (1) were partially or completely broken at the crown, (2) were in an advanced stage of attritional wear where no cusps could be identified, or (3) possessed a combination of the two aforementioned conditions.”

      L654: "specimens" should be "teeth", correct? In the preceding sentence, you say that there are 200 teeth from only 48 specimens.

      Corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. The new findings raise the intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes.

      After consultation with the referees, we would like to suggest that you insert text into the RESULTS section acknowledging two limitations of your findings remaining in the revised manuscript, as follows:

      (i) It remains possible that Kin28 abundance was reduced by splitting Tfb3, which could be a factor in reducing its occupancies at gene promoters.

      In response, the paper now contains the following sentence:

      “Kin28 levels in extracts were below the limit of detection for our antibody, so we cannot rule out that the drop in ChIP signal is partly due to reduced Kin28 levels in the split Tfb3 strains. However, the viability of the cells (Figure 2) and the Tfb3-TAP purifications (Figure 3) argue against a complete loss of Kin28.”

      (ii) Lower than wild-type expression of the Tfb3 truncations might contribute to their mutant phenotypes shown in Figs. 2 & 5.

      In response, the paper now contains the following sentence:

      “There was some variation in protein expression levels (Figure 3A, left panel, lanes 1-4), and reduced levels of the split Tfb3 may contribute to the slow growth phenotypes.”

      Public Reviews:

      Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about the publication of this manuscript and offer a few minor comments below that may help to further strengthen the study.

      We appreciate the reviewer’s positive assessment of our work and suggestions for improvement.

      Page 4 PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Fig. 1A).

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not other fully-engaged PIC structures.

      Thanks for clarifying. We note that some structures of TFIIH alone also see the long helix. Accordingly, we modified this section to read:

      “In many TFIIH and PIC structures the linker is not visible, presumably due to flexibility. However, when it is seen (Abril-Garrido et al., 2023; Greber et al., 2019), the linker emerges from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits…”

      Page 8 Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as free Core Module, or the Kinase Module must dissociate soon after.

      To my knowledge, this is still controversial in the NER field. I note the potential function on the kinase module is likely attributed to the N-terminal region of Tfb3 through its binding to Rad3.

      We are not experts on NER, but in reviews of the field this appears to be a widely held assumption. A 2008 paper from the Egly lab (Coin et al., DOI 10.1016/j.molcel.2008.04.024) is usually cited, which shows that the interaction between XPD (metazoan Rad3) and XPA is likely incompatible with XPD-MAT1 interaction. In addition to the Yu 2023 review, we now also cite a more recent publication that more extensively reviews the models for core TFIIH interactions (van Sluis et al., 2025). We looked at the multiple recently published structures of various TC-NER and GG-NER intermediate complexes, and none of them show the CAK module or even the Tfb3/Mat1 N-term, even though those proteins were typically included during assembly. We also consulted with our colleagues Johannes Walter and Lucas Farnung, who are studying various TC-NER intermediates biochemically and structurally. Although the CAK module is included in their assembly reactions, it is not visible in their cryoEM structures. They tell us that the presence of CAK would be compatible with early TC-NER intermediates, but is predicted to overlap with later interactions of XPD with the TC-NER factor STK19 (see Mevissen et al., Cell 2024). To be conservative, we modified the sentence to say “Recent structures … suggest” rather than “show”.

      Because the yeast strains used in Fig. 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      We agree that our experiment only shows that the connection between Tfb3 N- and C-term domains is not necessary for NER. The individual domains might still be able to function independently. Accordingly, we changed the heading of that section from “Disconnected core TFIIH does not cause an NER defect” to “Split Tfb3 does not cause an NER defect.” This more closely matches the figure legend title.

      Page 11. Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body.

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      That is true. But please note that this sentence was meant to describe movement of the kinase module AFTER release from Mediator (see previous sentence). Re-reading the passage, we realized the confusion arises because we propose multiple possible pathways in that paragraph. In the first half, we suggest the capture of the kinase module by Mediator might trigger the conformational changes in the linker. In the second half (where it says “Alternatively….”) we suggest the Mediator-CAK interaction could instead come first, and the release of this contact could free the CAK module to move around. We have modified the paragraph to make it clear these are two distinct models.

      Comments on revisions:

      Revised ms clarified all my points, including those I previously misunderstood.

      Thanks again for helping us improve the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This work advances our understanding of how TFIIH coordinates DNA melting and CTD phosphorylation during transcription initiation. The finding that untethered kinase activity becomes "unfocused," phosphorylating the CTD at ser5 throughout the coding sequence rather than being promoter-restricted, suggests that the TFIIH Core-Kinase linkage not only targets the kinase to promoters but also constrains its activity in a spatial and temporal manner.

      Strengths:

      The experiments presented are straightforward, and the models for coupling initiation and CTD phosphorylation and for the evolution of these linked processes are interesting and novel. The results have important implications for the regulation of initiation and CTD phosphorylation.

      Comments on revisions:

      The revised version with revisions to figures, text and new data has addressed all of our prior comments.

      We thank the reviewer for helping us improve the paper.

      Reviewer #3 (Public review):

      Summary:

      Eukaryotic gene transcription requires a large assemblage of protein complexes that govern the molecular events required for RNA Polymerase II to produce mRNAs. One of these complexes, TFIIH, comprises two modules, one of which promotes DNA unwinding at promoters, while the other contains a kinase (Kin28 in yeast) that phosphorylates the repeated motif at the C-terminal domain (CTD) of the largest subunit of Pol II. Kin28 phosphorylation of Ser5 in the YSPTSPS motif of the CTD is normally highly localized at promoter regions, and marks the beginning of a cycle of phosphorylation events and accompanying protein association with the CTD during the transition from initiation to elongation.

      The two modules of TFIIH are linked by Tfb3. Tfb3 consists of two globular regions, an N-terminal domain that contacts the Core module of TFIIH and a C-terminal domain that contacts the kinase module, connected by a linker. In this paper, Giordano et al. test the role of Tfb3 as a connector between the two modules of TFIIH in yeast. They show that while no or very slow growth occurs if only the C-terminal or N-terminal region of Tfb3 is present, near normal growth is observed when the two unlinked regions are expressed. Consistent with this result, the separate domains are shown to interact with the two distinct TFIIH modules. ChIP experiments show that the Core module of TFIIH maintains its localization at gene promoters when the Tfb3 domains are separated, while localization of the kinase module, and of Ser5 phosphorylation on the CTD of Pol II, is disrupted. Finally, the authors examine the effect of separating the Tfb3 domains on another function of TFIIH, namely nucleotide excision repair, and find little or no effect when only the N-terminal region of Tfb3 or the two unlinked domains are present.

      Strengths:

      Experiments involving expression of Tfb3 domains in yeast are well-controlled and the data regarding viability, interaction of the separate Tfb3 domains with TFIIH modules, genome-wide localization of the TFIIH modules and of phosphorylated Ser5 CTDs, and of effects on NER, are convincing. The experiments are consistent with current models of TFIIH structure and function and support a model in which Tfb3 tethers the kinase module of TFIIH close to initiation sites to prevent its promiscuous action on elongating Pol II.

      We appreciate that the reviewer finds that our main conclusions are convincing.

      Weaknesses:

      The work is limited in scope and does not provide major insights into the mechanism of transcription. The main addition to current models of transcription is that tethering of Kin28 to Tfb3 may limit kinase action from occurring downstream from the initiation site.

      The first described experiment, which purports to show that three kinases cannot function in place of Kin28 when tethered (by fusion) to Tfb3 is missing the crucial control of showing that Kin28 can support viability in the same context. This result also does not connect with the rest of the manuscript, although the experiment apparently motivated the subsequent studies reported here.

      We elected not to do this control experiment for several reasons. As reviewer 3 points out, this kinase fusion experiment turned out to be somewhat disconnected from the rest of the paper. Even though it didn’t work, we included it in the paper because the results led us to the realization that the Tfb3 C-term was actually not fully essential for viability as reported, which in turn led us to the idea of splitting Tfb3. Structural studies (https://doi.org/10.1126/sciadv.abd4420, https://doi.org/10.1073/pnas.2009627117, https://doi.org/10.7554/eLife.44771) show that, in addition to providing linkage to the core module, the C-term of Tfb3 induces a conformation change in Kin28/Cdk7 necessary for full kinase activity (which is likely why the strains without C-term are just barely viable). If we were to pursue why the fusions didn’t work, we could tether Kin28 directly to the Tfb3 linker (and may try this in the future), but then would need to also express the C-term separately for its activating function. Even then, this would be an imperfect control for the fusion experiments in Figure 1. Because we were trying to best mimic Kin28 being tethered via the accessory subunit Tfb3/Mat1, in the Figure 1 experiment we did not directly attach the kinases to Tfb3. For Ctk1/Cdk12, we fused the Tfb3 linker to the Ctk3 accessory subunit (analogous to Tfb3), and for Bur1/Cdk9, we fused to the cyclin subunit Bur2 (there is no known third subunit in this complex). The one exception was Mpk1, which has no partner subunits and is not a CDK. There are many reasons why this high-risk protein fusion experiment may not have worked, but we chose not to pursue it further at this time.

      Finally, the authors present the interesting and reasonable speculation that the TFIIH complex and connecting Tfb3 found in mammals and yeast may have evolved from an earlier state in which the two TFIIH subdomains were present as unconnected, distinct enzymes. It will be interesting to have this idea tested more thoroughly as more molecular evolutionary data becomes available.

      Comments on revisions:

      For the most part, the authors have satisfactorily addressed my previous critique. In particular, they have added to their discussion of evolutionary implications, and performed an experiment casting doubt on the assertion of a dominant negative effect, and as a consequence removed this claim from the manuscript. I also pointed out that the fusion experiments that lead off the Results section are missing the crucial control of including a Tfb3-Kin28 fusion. The authors have elected not to perform this control experiment, pointing out that even this control would be imperfect in some respects, and agreeing that this experiment is somewhat disconnected from the rest of the paper. The reason for including it, in spite of its somewhat tangential nature, is that it provides something of a rationale for the experiments that follow. I don't so much mind their retaining the experiment, as the absence of this control (and indeed, the results) does not so much impact the later results. However, I think if it is to be included, this shortcoming should be explicitly recognized, especially as a service to younger scientists who could benefit from an exposition that includes a thorough consideration of potential control experiments.

      We thank the reviewer for helping us improve the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Weaknesses:

      There are quite a few weaknesses, some related to the actual study and some more strongly related to the reporting about the study in the manuscript. The concerns are listed roughly in the order in which they appear in the manuscript.

      We truly appreciate the time and effort you dedicated to reviewing our manuscript. The weaknesses you raised are all well founded, and we agree with almost all of the suggestions detailed below, particularly regarding clarification of the statistics and the sample size determination. Please see our specific responses below.

      Major Comments

      (1) In the introduction, the authors present procrastination nearly as if it were the most relevant and problematic issue there is in psychology. Surely, procrastination is a relevant and study-worthy topic, but that is also true if it is presented in more modest (and appropriate) terms. The manuscript mentions that procrastination is a main cause of psychopathology and bodily disease. These claims could possibly be described as 'sensationalized'. Also, the studies to support these claims seem to report associations, not causal mechanisms, as is implied in the manuscript.

      Thank you for this very practical suggestion. We agree that the current statements to underline the importance of procrastination are somewhat overreaching. Upon revision, we have overall toned down such claims by explicitly stating them as “associative evidence”, and rewritten a portion of terms in a more modest and balanced style. Please see specific revisions in the main text below:

      Introduction Section (Page 5, Line 64-81)

      “Procrastination is increasingly becoming a prevalent behavioral problem around the world, which reflects the irrational voluntary postponement of scheduled tasks albeit being worse off for such delays (Blake, 2019; Steel, 2007). In the epidemiological investigations, more than 15% of adults were identified as having chronic procrastination problems, and the situation for students was worse as 70-80% of undergraduates engaged in procrastination (American College Health Association, 2022; Ferrari et al., 2005). Moreover, the behavioral genetic evidence indicates a certain heritability of procrastination in human beings as well (Gustavson et al., 2017; Gustavson et al., 2014, 2015). In addition to its prevalence, the undesirable associations between procrastination behavior and health also warrant cautions. There is cumulative evidence to show the close associations between procrastination behavior and working performance, financial status, interpersonal relationships, and subjective well-being (Ferrari, 1994; Pychyl & Sirois, 2016; Steel et al., 2021). Further, as the prospective cohort studies indicated, many mental health problems emerge alongside procrastination, particularly in sleep problems, depression, and anxiety (Hairston & Shpitalni, 2016; Johansson et al., 2023). Even worse, chronic procrastination behavior has been observed to impair general health, as manifested by the intimate associations with close system disruption, gastrointestinal disturbance, as well as a high risk of hypertension and cardiovascular disease (Sirois, 2015; Sirois, 2016). ... ”

      (2) It is laudable that the study was pre-registered; however, the cited OSF repository cannot be accessed and therefore, the OSF materials cannot be used to (a) check the preregistration or to (b) fill in the gaps and uncertainties about the exact analyses the authors conducted (this is important because the description of the analyses is insufficiently detailed and it is often unclear how they analyzed the data).

      We are sorry to report that we encountered a serious technical barrier that has made our preregistration inaccessible. OSF has disabled my account, as it claimed to have detected “suspicious user’s activities” in the account (please see the screenshot below). As a result, none of the materials already deposited in this OSF account, including the preregistration, can be accessed. We have contacted the OSF team, but received no valid technical solution to recover the preregistered report. We reckon that this may have been triggered by my affiliation change to the Third Military Medical University of the People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text, and added the “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because an inaccessible preregistration no longer meets best practices, we have also removed the statements regarding preregistration elsewhere in the revised manuscript. Furthermore, we fully understand that the inadequate methodological detail left gaps in understanding the statistics of this study. We have therefore reported extensive details in the Methods section to clarify how the analyses were conducted, so that our conclusions can be evaluated. Please see what we have added below (Comments #4-9).

      Methods Section (Page 5, Line 186-191)

      “This study fully adhered to CONSORT reporting guidelines, and was originally preregistered in the OSF repository (10.17605/OSF.IO/Y3EDT). However, due to a technical constraint related to the OSF account service (see SM), this OSF page is no longer accessible. For transparency and best practices of open science, based on the original protocol documentation, a preregistration statement has been reconstructed to clarify a priori hypotheses, sample size determinations, and analysis plans for this study (Table S1).”

      (3) Related to the previous point: I find it impossible to check the analyses with respect to their appropriateness because too little detail and/or explanation is given. Therefore, I find it impossible to evaluate whether the conclusions are valid and warranted.

      Again, we apologize for the confusion caused by inadequate statistical and methodological details. As you may know, this manuscript was previously reviewed by Nature Human Behaviour, which editorially constrained the paper length. Thus, a substantial number of details had to be omitted or removed. As you kindly suggested, we have added extensive descriptions to clarify how we carried out the statistical analyses in the present study. Please see specific instances below.

      (4) Why is a medium effect size chosen for the a priori power analysis? Is it reasonable to assume a medium effect size? This should be discussed/motivated. Related: 18 participants for a medium effect size in a between-subjects design strikes me as implausibly low; even for a within-subjects design, it would appear low (but perhaps I am just not fully understanding the details of the power analysis).

      Thank you for raising this crucial question. We have determined this a priori effect size based on the existing work we published previously (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In our pilot study (Xu et al., 2023), we identified a significant interaction effect between the single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in the laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori. To clarify, we have explicitly justified the selection of this effect size in the Methods section.

      Methods Section (Page 5, Line 206-215)

      “A full randomized block design was used to assign participants to both groups (active neuromodulation group, NM; sham-control group, SC) (see Fig. 2C). As the pilot study probing into the effect of single-session tDCS stimulation to change procrastination willingness indicated (t = 2.38, p = .02, 95% CI [0.14, 1.49]; Xu et al., 2023), statistical power was predetermined by G*Power at a relatively medium effect size (1-β err prob = 0.80, f = 0.25), yielding the total sample size at 18 to reach acceptable power (see SM Methods and Fig. S1)....”
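
      For readers less familiar with the effect-size metric used in the power analysis quoted above, the following minimal sketch converts Cohen's f = 0.25 into two other common metrics. It is illustrative only and is not part of the authors' G*Power workflow. The function names are hypothetical, and the d = 2f conversion assumes a simple comparison of two equal-sized groups, which only approximates the mixed design used in the study.

```python
# Illustrative conversions for Cohen's f (not the authors' G*Power analysis).

def f_to_eta_squared(f):
    """Proportion of variance explained that corresponds to Cohen's f."""
    return f ** 2 / (1.0 + f ** 2)

def f_to_d_two_groups(f):
    """Cohen's d for two equal-sized groups (assumption: d = 2 * f)."""
    return 2.0 * f

f = 0.25  # effect size entered in the power analysis quoted above
eta2 = f_to_eta_squared(f)
d = f_to_d_two_groups(f)
print(f"f = {f}: eta^2 = {eta2:.4f}, d = {d:.2f}")
```

      Under these assumptions, f = 0.25 corresponds to roughly 6% of variance explained and to d = 0.5, the conventional "medium" benchmark.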

      We fully understand that this sample size for detecting a medium effect size is seemingly low, and that the 18 participants for each group are apparently limited in any case. Upon double-checking these power analyses, we confirmed that this sample size requirement is indeed correct. Please see the G*Power outputs in Author response image 1.

      Author response image 1.

      Despite the absence of algorithmic errors in the power analysis, we are aware that this limited sample size may hamper statistical robustness. To address this weakness, we now explicitly urge such caution in the Limitations section:

      Limitations Section (Page 12, Line 637-640)

      “... In addition to technical limitations, given the apparently limited size of the sample (total N = 46), it warrants caution in generalizing these findings elsewhere, and necessitates further validations in a large-scale cohort.”

      (5) It remains somewhat ambiguous whether the sham group had the same number of stimulation sessions as the verum stimulation group; please clarify: Did both groups come in the same number of times into the lab? I.e., were all procedures identical except whether the stimulation was verum or sham?

      Yes, we fully followed the CONSORT pipeline to carry out this double-blind trial, and thus confirmed that all the participants in both groups had the same number of stimulation sessions in our lab. That is to say, except for the stimulation type (verum vs sham), all the procedures, equipment and even the room were identical for all the participants. For clarification, we have clearly stated this in the main text:

      Results Section (Page 9, Line 419-423)

      “In both groups, almost all participants (93.2%, 41/44) reported perceiving acceptable pain stemming from current stimulation, and believed they were receiving treatment (91.30% (21/23) for active neuromodulation group (NM), 86.95% (20/23) for sham control group (SC), χ<sup>2</sup> = 0.224, p = .636). All participants underwent identical experimental procedures except for the stimulation type (active vs sham). ...”
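
      The blinding check quoted above can be recomputed from the reported counts alone. The sketch below is a reader's check, not the authors' code: it assumes an uncorrected Pearson chi-square test on the 2×2 table (21 vs 2 in the active group, 20 vs 3 in the sham group) and uses only the Python standard library. The function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and its p value (1 degree of freedom).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Rows: active (NM) and sham (SC); columns: believed "treatment" vs not.
chi2, p = chi_square_2x2(21, 2, 20, 3)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

      With these counts the statistic and p value agree with the reported χ<sup>2</sup> = 0.224 and p = .636 to three decimals, which suggests that no continuity correction was applied.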

      (6) The TDM analysis and hyperbolic discounting approach were unclear to me; this needs to be described in more detail, otherwise it cannot be evaluated.

      We apologize for the inadequate details, which hindered a precise understanding of the TDM and the hyperbolic discounting model. The Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021). It conceptualizes procrastination as a failed trade-off between task outcome value (i.e., the motivation to act now in pursuit of the task reward) and task aversiveness (i.e., the motivation to avoid acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this theoretical model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when far from the deadline but slowly when near the deadline (Zhang et al., 2019). Considering the nonlinear dynamics inherent in this hyperbolic discounting, we therefore employed a log-spaced temporal sampling scheme (Myerson et al., 2001) to strengthen curve-fitting performance (please see the schematic diagram (https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time)):

      Specifically, based on the log-spaced temporal sampling rule, five time points were first selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00, sampling occurred at 10:00, 16:00, 18:00, 19:30, and 20:00). At each time point, participants reported task aversiveness (A) on a 0–100 Visual Analog Scale (VAS). Then, task aversiveness discounting was calculated as 1 - (A<sub>t</sub> / A<sub>earliest</sub>), where t<sub>earliest</sub> was the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), we estimated the AUC from these five data points based on the Myerson algorithm (Myerson et al., 2001), which was computed as the trapezoidal integration of task aversiveness discounting over time. With this modelling method, a higher AUC reflects stronger temporal discounting of task aversiveness, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. That is to say, if a participant shows greater discounting of task aversiveness, as reflected by a higher AUC, she/he experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination. As you kindly suggested, we have added these details to explicitly clarify how the hyperbolic discounting approach was used to determine the sampling time points and to calculate the AUC of task aversiveness discounting.

      Methods Section (Page 6, Line 268-283)

      “On the Task day, we developed a mobile app to implement the experience sampling method (ESM) for tracking one’s real-time evaluation of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As theoretically conceptualized by the temporal decision model (TDM) of procrastination, the perceived task aversiveness is hyperbolically discounted when approaching the deadline, showing sharp discounting when far away from the deadline but slow discounting once nearing the deadline (Zhang & Feng, 2020; Zhang et al., 2021). Thus, considering the nonlinear dynamics inherent in this hyperbolic discounting, the five recording moments of ESM were selected per task a priori by using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, such as moments of 10:00 (earliest), 16:00, 18:00, 19:30, 20:00 (deadline). The five sampling points meet the statistical prerequisite for hyperbolic model fitting, which requires ≥ 4 points (Green & Myerson, 2004). To do so, recording moments of tasks were individually tailored for each task per participant in this ESM procedure.”

      Methods Section (Page 7, Line 318-334)

      “... As articulated in the temporal decision model above, the task aversiveness evoked by executing a task is temporally dynamic in a hyperbolic discounting pattern, with sharp discounting far away from the deadline but slow discounting near the deadline (Zhang & Feng, 2020). To quantitatively characterize the task aversiveness with consideration for its dynamics, the model-free area under the curve (AUC) was calculated. Specifically, based on the log-spaced temporal sampling rule, task aversiveness was measured by a 100-point visual analog scale at the five sampling moments. Then, the task aversiveness discounting (A) was calculated as 1 - (A(t) / A(earliest)), where t(earliest) was the earliest sampling point, serving as the reference for immediate execution. Subsequently, using the GraphPad Prism software (v9, 525), the AUC was computed as the trapezoidal integration of task aversiveness discounting over time across the five data points, based on the Myerson algorithm (Myerson et al., 2001). By doing so, a higher AUC reflects stronger temporal discounting of task aversiveness as the deadline nears, which means that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. As for the task outcome value, it was theoretically posited as a relatively stable evaluation of the task (Zhang & Feng, 2020; Zhang et al., 2021).”
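
      The AUC computation described above can be sketched in a few lines. This is a minimal illustration rather than the authors' GraphPad Prism procedure. The ratings are invented for the example, the function names are hypothetical, and the time axis is rescaled to run from 0 (earliest sample) to 1 (deadline), a normalization assumed here in the spirit of Myerson et al. (2001).

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' clock time to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def aversiveness_discounting_auc(times, ratings):
    """Trapezoidal AUC of 1 - A(t) / A(earliest) over normalized time."""
    minutes = [to_minutes(t) for t in times]
    span = minutes[-1] - minutes[0]
    # Assumption: rescale time so the earliest sample is 0 and the deadline is 1.
    x = [(m - minutes[0]) / span for m in minutes]
    # Discounting relative to the earliest rating (0 at the first sample).
    y = [1.0 - r / ratings[0] for r in ratings]
    auc = 0.0
    for i in range(1, len(x)):
        auc += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0
    return auc

# Sampling times from the example in the text; ratings are hypothetical VAS scores.
times = ["10:00", "16:00", "18:00", "19:30", "20:00"]
ratings = [80, 60, 40, 20, 10]
auc = aversiveness_discounting_auc(times, ratings)
print(f"AUC = {auc:.3f}")
```

      In this sketch, a participant whose ratings fall faster toward the deadline obtains a larger AUC, matching the interpretation given in the text.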

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the experimental analysis of behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of experimental psychology. General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome Value and Task Aversiveness Impact Task Procrastination through Separate Neural Pathways. Cerebral cortex (New York, N.Y. : 1991), 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley interdisciplinary reviews. Cognitive science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of experimental psychology. General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (7) Coming back to the point about the statistical analyses not being described in enough detail: One important example of this is the inclusion of random slopes in their mixed-effects model, which is unclear. This is highly relevant, as omission of random slopes has repeatedly been shown to lead to extremely inflated Type 1 errors (e.g., inflating Type 1 errors by a factor of ten; e.g., a significant p value of .05 might be obtained when the true p value is .5). Thus, if indeed random slopes have been omitted, then it is possible that significant effects are significant only due to inflated Type 1 error. Without more information about the models, this cannot be ruled out.

      Thank you for sharing this very timely and crucial comment. After careful scrutiny, we identified the statistical flaw you pointed out: participants had been modeled with random intercepts only, without random slopes. As you kindly suggested, we have reanalyzed all the statistics by adding random slopes (i.e., (1 + day|SubjectID)). Results showed a statistically significant interaction effect for both procrastination willingness (β = -7.8, SE = 1.8, DF = 45.6, p < .001) and actual procrastination rates (β = -7.4, SE = 2.4, DF = 46.6, p = .004), indicating the effectiveness of multi-session neuromodulation in mitigating procrastination. In the post-hoc simple effect analyses, participants who engaged in active neuromodulation (NM) showed a significant increase in task-execution willingness (i.e., decreased procrastination willingness; NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction) and a decrease in actual procrastination rates (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), while no such effects were identified for participants in the sham control group (for willingness, SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction; for actual procrastination, SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction). Taken together, we appreciate your pointing out this crucial statistical weakness, and have confirmed that our findings remain reliable after guarding against inflated Type 1 error by adding random slopes. Moreover, as you kindly suggested, we have incorporated these statistical details, particularly those concerning the GLMM, into the main text to facilitate your evaluation. Please see specific revisions below:

      Methods Section (Page 8, Line 381-401)

      “To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) was constructed with a full factorial design for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before the deadline). Here, sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the standards issued by the National Bureau of Statistics of China (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers, from 1 (poor) to 7 (rich), on the basis of per capita annual household income. To obviate subjective rating bias stemming from individual daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The average of these self-rated emotion scores at the two time points was entered into the GLMM as a covariate of no interest, yielding the final model expression “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were also tested at baseline and after the last intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and the random-effects structure. To validate statistical robustness with a nonparametric test rather than parametric tests on continuous outcomes, we also conducted a between-group comparison of the number of tasks on which procrastination occurred, using the χ² test with φ correction or the Fisher exact test....”

      Results Section (Page 9, Line 428-449)

      “To identify whether ms-tDCS targeting the left DLPFC can alleviate subjective procrastination willingness and actual procrastination behavior, a generalized linear mixed-effects model with the Satterthwaite approximation was built, with task-execution willingness and actual procrastination rates (PR) as primary outcomes, respectively. For procrastination willingness, results showed a statistically significant interaction effect between multi-session neuromodulation and group (β = -7.8, SE = 1.8, DF = 45.6, p < .001; Fig. 3A). The post-hoc simple effect analysis demonstrated significantly increased task-execution willingness (i.e., decreased procrastination willingness) after neuromodulation in the active neuromodulation group (NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction), but no such effects were identified in the sham control group (SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction) (Fig. 3B-C). A linear uptrend in task-execution willingness was further observed across multiple sessions in the active NM group, indicating gradually increasing neuromodulation effects (Fig. 3D; p < .01, Mann-Kendall test). For actual procrastination behavior, changes in actual procrastination rates across all the sessions are detailed in Fig. 3E. Similarly, a statistically significant interaction effect was identified here (β = -7.4, SE = 2.4, DF = 46.6, p = .004), and the simple effect analysis further revealed decreased actual procrastination rates after ms-tDCS in the active neuromodulation group (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), but no such prominent changes were found in the sham control group (SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction) (Fig. 3F-G). Also, a significant downtrend in procrastination rates across all the sessions was identified in the active NM group (Fig. 3H; p < .01, Mann-Kendall test).”

      (8) Related to the previous point: The authors report, for example, on the first results page, line 420, an F-test as F(1, 269). This means the test has 269 residual degrees of freedom despite a sample size of about 50 participants. This likely suggests that relevant random slopes for this test were omitted, meaning that this statistical test likely suffers from inflated Type 1 error, and the reported p-value < .001 might be severely inflated. If that is the case, each observation was treated as independent instead of accounting for the nestedness of data within participants. The authors should check this carefully for this and all other statistical tests using mixed-effects models.

      Thank you for this timely and helpful comment. As you correctly pointed out, we did not include random slopes in the original GLMM, which risked inflating the false-positive rate (i.e., Type-I error). We have reanalyzed all GLMM statistics with random slopes added, and confirmed that all the findings remain reliable under the new models. Again, thank you for this crucial statistical advice; please see the response above for full details of the revisions made to address this comment.
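
      For reference, the random-slope specification quoted in the revised Methods, “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”, corresponds to a model of roughly the following form. This is a sketch that assumes a linear (Gaussian) mixed model; the symbols are illustrative notation introduced here and do not appear in the manuscript.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 G_i + \beta_2 D_{ij} + \beta_3 \,(G_i \times D_{ij})
          + \boldsymbol{\gamma}^{\top}\mathbf{c}_{ij}
          + u_{0i} + u_{1i} D_{ij} + \varepsilon_{ij}, \\
(u_{0i},\, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

      Here y_ij is the outcome of participant i on treatment day j, G_i is the group, D_ij is the treatment day, c_ij collects the covariates (age, gender, SES, emotions), u_0i is the participant-specific random intercept and u_1i the participant-specific random slope for treatment day. The coefficient β_3 is the Group × Treatment_Day interaction; dropping the term u_1i D_ij recovers the random-intercept-only model.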

      (9) Many of the statistical procedures seem quite complex and hard to follow. If the results are indeed so robust as they are presented to be, would it make sense to use simpler analysis approaches (perhaps in addition to the complex ones) that are easier for the average reader to understand and comprehend?

      Thank you for this practical and helpful comment. In the original manuscript, we incorporated a joint model of longitudinal and survival data (JM-LSD), in conjunction with machine learning algorithms, to strengthen the robustness of our statistical findings. Nevertheless, we agree with you on this point: there is no need to complicate the analyses by repeatedly probing the same research question to increase methodological robustness, at the expense of readability and intelligibility for a broader audience. As you suggested, we have removed these complicated statistical methods and retained only the primary ones (the GLMM and the χ² cross-tab test), together with a complementary one (the Mann-Kendall linear trend test). As a result, we have rewritten almost the whole Results section. Please see the specific instances below:

      Results Section (Page 9, Line 468-485)

      “Ms-tDCS changes task aversiveness and task-outcome value

      Both task aversiveness and task outcome value serve as key pathways determining whether one would procrastinate. To this end, we further utilized a generalized linear mixed-effects model to examine the effects of ms-tDCS on changes in task aversiveness and task outcome value. Task aversiveness changes across all the sessions are shown in Fig. 4A and 4C. We demonstrated a statistically significant decrease in task aversiveness and an increase in task outcome value via ms-tDCS in the neuromodulation group (Task aversiveness: interaction effect, β = -0.12, SE = 0.04, DF = 46.7, p = .002; simple effect, NM-before (AUC): 1.13 ± 0.53, NM-after (AUC): 1.95 ± 0.85, t.ratio = 4.5, p < .001, Tukey correction; Outcome value: β = -6.8, SE = 1.74, DF = 46.2, p < .001; simple effect, NM-before: 35.86 ± 27.82, NM-after: 73.08 ± 23.33, t.ratio = 5.0, p < .001, Tukey correction; see Fig. 4B), but not in the sham control group (Task aversiveness: SC-before (AUC): 1.07 ± 0.51, SC-after (AUC): 1.28 ± 0.46, t.ratio = 1.3, p = .20, Tukey correction; Outcome value: SC-before: 34.00 ± 25.17, SC-after: 40.13 ± 28.94, t.ratio = 0.8, p = .41, Tukey correction; see Fig. 4D). In the neuromodulation (NM) group, task aversiveness steadily decreased with the cumulative number of stimulation sessions, while perceived task outcome value increased significantly (see Fig. 4E-F, p < .05, Mann-Kendall test). Thus, these results provide causal evidence that neuromodulation of the left DLPFC reduces task aversiveness while enhancing task-outcome value.”

      Results Section (Page 10, Line 525-542)

      “Long-term effects of ms-tDCS

      We also conducted a follow-up investigation to test the long-term retention of the effect of ms-tDCS in reducing actual procrastination. All participants except one in the neuromodulation group completed the follow-up 6 months after the last neuromodulation session (N_NM = 22, N_SC = 23). A GLMM was then constructed, with the PR before the first neuromodulation vs. the PR 6 months after the last neuromodulation as covariates of interest. Results showed a statistically significant group*time interaction effect (β = 16.5, SE = 9.9, p = .049). The simple-effect model demonstrated a decrease in actual procrastination rates in the active neuromodulation group 6 months after the last stimulation compared to baseline (β = -22.05, SE = 10.0, p = .038, Tukey correction; NM-before: 40.68 ± 37.96, NM-after (6 months): 18.63 ± 29.80), and revealed null effects in the SC group (β = 1.26, SE = 9.78, p = .99, Tukey correction; SC-before: 46.47 ± 40.75, SC-after (6 months): 47.73 ± 39.18) (see Fig. 6). Furthermore, using a nonparametric χ² test to compare differences in the number of procrastinated tasks, we still found a statistically significant reduction in procrastination frequency in the NM group 6 months after neuromodulation compared to baseline (χ² = 3.30, p = .035, NM-before: 68.19% (15/22), NM-after (6 months): 40.91% (9/22)), while no significant changes were observed in the SC group (χ² = 0.11, p = .74, SC-before: 69.56% (16/23), SC-after (6 months): 73.91% (17/23)). Therefore, beyond its short-term effects, the benefit of ms-tDCS neuromodulation in reducing procrastination is retained over the long term.”
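
      The test statistics in the frequency comparison quoted above can be recomputed from the reported counts. The following is a minimal sketch, not the analysis code used in the study: it assumes a plain Pearson chi-square on the 2×2 table without continuity correction, it ignores the pairing of before/after observations within participants, and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_two_sided = math.erfc(math.sqrt(stat / 2))
    return stat, p_two_sided


# Counts quoted in the text: rows = baseline / 6 months after,
# columns = tasks procrastinated / not procrastinated.
nm_stat, nm_p = chi_square_2x2(15, 7, 9, 13)   # NM group: 15/22 vs. 9/22
sc_stat, sc_p = chi_square_2x2(16, 7, 17, 6)   # SC group: 16/23 vs. 17/23
print(f"NM: chi2 = {nm_stat:.2f}, two-sided p = {nm_p:.3f}")
print(f"SC: chi2 = {sc_stat:.2f}, two-sided p = {sc_p:.3f}")
```

      With these counts the statistic evaluates to about 3.30 for the NM group and about 0.11 for the SC group, in line with the reported values. The p-value returned by this sketch is two-sided; the excerpt does not state which tail convention was used for the reported p-values.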

      (10) As was noted by an earlier reviewer, the paper reports nearly exclusively about the role of the left DLPFC, while there is also work that demonstrates the role of the right DLPFC in self-control. A more balanced presentation of the relevant scientific literature would be desirable.

      We are grateful to you for noticing the unbalanced presentation of the literature on left DLPFC. As you kindly suggested, we have added literature to support the association between self-control and the right lateralization of the DLPFC. Please see below for what we have revised:

      Introduction Section (Page 4, Line 137-143)

      “...In addition to the left-lateralized findings, there is indeed solid evidence of significant associations between self-control and the right DLPFC, particularly given that this region functions in top-down regulation, future self-continuity representation and social decisions (Huang et al., 2025; Lin and Feng, 2024; Knoch & Fehr, 2007). Nevertheless, Xu and colleagues demonstrated null effects of anodal stimulation of the right DLPFC on either value evaluation or emotional regulation for changing procrastination willingness (Xu et al., 2023).”

      (11) Active stimulation reduced procrastination, reduced task aversiveness, and increased the outcome value. If I am not mistaken, the authors claim based on these results that the brain stimulation effect operates via self-control, but - unless I missed it - the authors do not have any direct evidence (such as measures or specific task measures) that actually capture self-control. Thus, that self-control is involved seems speculation, but there is no empirical evidence for this; or am I mistaken about this? If that is indeed correct, I think it needs to be made explicit that it is an untested assumption (which might be very plausible, but it is still in the current study not empirically tested) that self-control plays any role in the reported results.

      We truly appreciate your pointing out this conceptual weakness. Yes, your understanding of the causal chain is correct: we conceptually speculate that HD-tDCS stimulation over the left DLPFC operates via self-control to change procrastination, but we did not empirically validate this component of the chain: brain stimulation→increased self-control→increased task outcome value→decreased procrastination. Specifically, we did not collect data to directly measure self-control at either baseline or post-neuromodulation. Therefore, we agree with your suggestion to state this explicitly in the main text. Following this advice, we have revised a portion of the Conclusion to make clear that the role of self-control in mitigating procrastination is hypothesis-generating, and have further acknowledged this in the Limitation section:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized roles of self-control in procrastination, and offers a validated, theory-driven strategy for interventions.”

      Results Section (Page 10, Line 489-492 and 520-522)

      “Given the dual neurocognitive pathways identified above—reduced task aversiveness and increased task-outcome value—we proposed that these changes, conceptually driven by enhanced self-control via ms-tDCS over left DLPFC, account for how neuromodulation reduces procrastination. ...”

      “In summary, these findings demonstrated a mechanistic pathway underlying procrastination: self-control, conceptualized as being governed by the left DLPFC, mitigates procrastination by plausibly increasing task-outcome value.”

      Discussion Section (Page 13, Line 642-645)

      “Moreover, this study did not collect data for assessing participants’ self-control at either baseline or post-neuromodulation, thereby limiting our ability to determine whether the effects on procrastination were uniquely attributable to neuromodulation-induced changes in self-control. ...”

      (12) Figures 3F and 3H show that procrastination rates in the active modulation group go to 0 in all participants by sessions 6 and 7. This seems surprising and, to be honest, rather unlikely that there is absolutely no individual variation in this group anymore. In any case, this is quite extraordinary and should be explicitly discussed, if this is indeed correct: What might be the reasons that this is such an extreme pattern? Just a random fluctuation? Are the results robust if these extreme cells are ignored? The authors remove other cells in their design due to unusual patterns, so perhaps the same should be done here, at least as a robustness check.

      Thank you for raising this highly important and helpful comment. Indeed, we fully understand that this result is somewhat extraordinary, a fact that was equally striking to us when unblinding the data. After carefully scrutinizing the data and statistics, we are thrilled to confirm that this pattern is true. In support of this observation, we were gratified to receive numerous thank-you letters from participants who engaged in active neuromodulation. They expressed gratitude to us, and reported that they have substantially ameliorated procrastination behavior in real-life activities after completing the trial. While this does not constitute formal scientific evidence, we are also glad to see the benefits of this neuromodulation for those procrastinators.

      Two reasons could account for this pattern. One interpretation attributes it to “scalar inflation”. In the present study, the procrastination rate was calculated as 1 minus the task-completion rate (e.g., 80%, 60%, 40%) by the deadline. At sessions #6 and #7, all the participants completed their real-life tasks before the deadline, yielding a 0% procrastination rate (1 minus a 100% completion rate) without any between-individual variation. Thus, rather than there being no individual variation in procrastination, this scalar (the procrastination rate) is too insensitive to capture subtle differences. For instance, although Participants #1 and #2 both showed a 0% procrastination rate, meaning that both completed their tasks before the deadline, Participant #1 might have completed the task 3 hours before the deadline, whereas Participant #2 might have completed it only 10 minutes before. In this case, “scalar inflation” leads us to perceive both participants as having equivalent procrastination rates, although Participant #2 may have a higher procrastination level than Participant #1. As conceptually defined in the field, procrastination is contextualized as “not completing a task before the deadline”. Thus, if a task is completed before the deadline, regardless of whether it was finished close to or far in advance of the deadline, this case is defined as “no procrastination”. In the present study, the primary outcome is whether a participant procrastinated on a real-life task in real-world settings, irrespective of when she/he completed this task. Thus, this scalar, the procrastination rate, fits our conceptualization of procrastination.

      Another reason is the potential cumulative effect of sequential multi-session tDCS stimulation. As shown by the Mann-Kendall trend tests, the procrastination rates show a significant linear downtrend in the active neuromodulation group across sessions, even after removing sessions #6 and #7. This indicates that the reduction in procrastination may accumulate as the number of sessions increases, implying a potential “dose-dependent effect”. Although this interpretation is speculative, such a “dose-dependent effect” in neuromodulation has been well documented in previous studies, which show a robust linear association between the number of sessions and effectiveness (cf. Cole et al., 2020; Hutton et al., 2023; Sabé et al., 2024; Schulze et al., 2018). Therefore, although this extreme pattern is somewhat extraordinary compared to previous observations, it is plausible.
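
      To make the trend test referred to above concrete, the following is a minimal sketch of a Mann-Kendall test. It is not the analysis code used in the study: it assumes the normal approximation with tie and continuity corrections (which is coarse for only seven sessions), the function name `mann_kendall` is hypothetical, and the input values are invented for illustration and are not study data.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Mann-Kendall trend test with tie and continuity corrections.

    Returns (S, Z, two-sided p-value) using the normal approximation.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    # S = number of increasing pairs minus number of decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis, reduced for tied groups.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18
    if var_s == 0:
        return s, 0.0, 1.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical per-session mean rates, invented purely for illustration.
example_rates = [45.0, 38.0, 30.0, 21.0, 12.0, 5.0, 1.0]
s, z, p = mann_kendall(example_rates)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}")
```

      A negative S with a small p-value indicates a monotonic downtrend. A robustness check of the kind described here would simply rerun the function on the series with the last one or two sessions dropped.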

      Yes, carrying out a robustness check by removing session #6, session #7, or both is a great idea. We believe that this analysis guards against potential biases from extreme cells. In this check, all the group*treatment_day interaction effects remained significant when removing either session #6 or session #7, or both (all p-values < .05), indicating high statistical robustness. Please see Supplementary Tables S3 and S4.

      Taken together, in spite of their being extraordinary, we confirm that those findings are statistically robust to extreme outliers. As you kindly suggested, we have added those findings of the robustness check into the revised Supplemental Materials section.

      References

      Cole, E. J., Stimpson, K. H., Bentzley, B. S., Gulser, M., Cherian, K., Tischler, C., Nejad, R., Pankow, H., Choi, E., Aaron, H., Espil, F. M., Pannu, J., Xiao, X., Duvio, D., Solvason, H. B., Hawkins, J., Guerra, A., Jo, B., Raj, K. S., Phillips, A. L., … Williams, N. R. (2020). Stanford Accelerated Intelligent Neuromodulation Therapy for Treatment-Resistant Depression. The American journal of psychiatry, 177(8), 716–726. https://doi.org/10.1176/appi.ajp.2019.19070720

      Hutton, T. M., Aaronson, S. T., Carpenter, L. L., Pages, K., Krantz, D., Lucas, L., Chen, B., & Sackeim, H. A. (2023). Dosing transcranial magnetic stimulation in major depressive disorder: Relations between number of treatment sessions and effectiveness in a large patient registry. Brain stimulation, 16(5), 1510–1521. https://doi.org/10.1016/j.brs.2023.10.001

      Sabé, M., Hyde, J., Cramer, C., Eberhard, A., Crippa, A., Brunoni, A. R., Aleman, A., Kaiser, S., Baldwin, D. S., Garner, M., Sentissi, O., Fiedorowicz, J. G., Brandt, V., Cortese, S., & Solmi, M. (2024). Transcranial Magnetic Stimulation and Transcranial Direct Current Stimulation Across Mental Disorders: A Systematic Review and Dose-Response Meta-Analysis. JAMA network open, 7(5), e2412616. https://doi.org/10.1001/jamanetworkopen.2024.12616

      Schulze, L., Feffer, K., Lozano, C., Giacobbe, P., Daskalakis, Z. J., Blumberger, D. M., & Downar, J. (2018). Number of pulses or number of sessions? An open-label study of trajectories of improvement for once-vs. twice-daily dorsomedial prefrontal rTMS in major depression. Brain stimulation, 11(2), 327–336. https://doi.org/10.1016/j.brs.2017.11.002

      (13) The supplemental materials, unfortunately, do not give more information, which would be needed to understand the analyses the authors actually conducted. I had hoped I would find the missing information there, but it's not there.

      We apologize for the uninformative supplemental materials (SM) in the original submission. As you suggested, we have added a substantial number of details to the main text to clarify how we conducted the data analyses, and have also tightened the whole SM section to improve readability and comprehensibility. We hope that the revised manuscript offers clear and adequate information for a broad readership to understand the methods and statistics.

      In sum, the reported/cited/discussed literature gives the impression of being incomplete/selectively reported; the analyses are not reported sufficiently transparently/fully to evaluate whether they are appropriate and thus whether the results are trustworthy or not. At least some of the patterns in the results seem highly unlikely (0 procrastination in the verum group in the last 2 observation periods), and the sample size seems very small for a between-subjects design.

      Thank you for this very helpful summary. As you suggested above, we have overhauled the manuscript to address the points listed here: we added relevant literature to balance our claims, added substantial detail to report the statistics fully and transparently, conducted a robustness check to confirm that our findings hold when the extreme patterns (sessions #6 and #7) are excluded, and justified how the sample size was determined a priori to fulfill medium statistical power. Please see above for full details regarding how we addressed those comments, point by point.

      Reviewer #2 (Public Review):

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct current stimulation targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They report that tDCS effects on task-execution willingness and procrastination are mediated by task outcome value and claim that this neuromodulatory intervention reduces procrastination rates quantified by their task. Although the study addresses an interesting question regarding the role of DLPFC on procrastination, concerns about the validity of the procrastination measures moderate enthusiasm for the study and limit the interpretability of the mechanism underlying the reported findings.

      Strengths:

      (1) This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The approach is solid and aims to address an important question regarding the putative role of DLPFC in modulating chronic procrastination behavior.

      (2) The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Thank you for taking your invaluable time to review our manuscript, warmly applauding the strength in research design and the conceptualization of scaling task aversiveness, as well as kindly sharing such helpful and insightful evaluations. As you correctly pointed out, we are aware of the absence of detailed, clear and understandable reporting of measures (e.g., real-world procrastination), statistics and methods, in the original manuscript. Following all your suggestions, we have thoroughly revised this manuscript to address those comments that you kindly made, point-by-point. Please see the full response underneath.

      Weaknesses:

      (1) The lack of specificity surrounding the "real-world measures" of procrastination is problematic and undermines the strength of the evidence surrounding the DLPFC effects on procrastination behavior. It would be helpful to detail what "real-world tasks" individuals reported, which would inform the efficacy of the intervention on procrastination performance across the diversity of tasks. It is also unclear when and how tasks were reported using the ESM procedure. Providing greater detail of these measures overall would enhance the paper's impact.

      We genuinely appreciate your raising this very crucial comment. We are sorry for omitting a tremendous number of methodological details to comply with the editorial requirement on the manuscript’s length, which hampered the comprehension of how we measure “real-life tasks” and “real-world procrastination”.

      As shown in the schematic diagram of the experimental procedure (Fig. 1), the experimental protocol alternated between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On each Neuromodulation Day, participants received either active or sham HD-tDCS and, critically, were instructed before stimulation to specify a real-life task they were required to complete the following day, with a deadline between 18:00 and 24:00. This ensured ≥24 hours between neuromodulation and task execution, isolating offline after-effects. For instance, on Day #2 (a Neuromodulation Day), before stimulation, participants were asked to report a real-life task with a deadline between 18:00 and 24:00 on the next day’s “task day” (Day #3) (please see the schematic diagram in Author response image 2).

      Author response image 2.

      There are some real-life tasks that they reported in our experiment as examples: “Complete and submit a homework assignment”, “Complete a standardized English proficiency test”, “Complete an online course module required for applying a Class C driver’s license”, “Prepare slides for a seminar presentation”, “Practice guitar”, “Practice Chinese calligraphy”, and “Do the laundry”. Reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating a plausible task diversity.

      On each “task day”, participants engaged in an intensive Experience Sampling Method (iESM) protocol via a custom-built mobile app. Using this app, participants were required to report a subjective task-execution willingness score (i.e., a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”; procrastination willingness = 100 – the task-execution willingness score), the subjective task aversiveness (i.e., a one-item 100-point visual analog scale), the subjective task outcome value (i.e., a one-item 100-point visual analog scale), and the objective procrastination rate, respectively.

      Rather than self-reported scores from those one-item visual analog scales, we asked participants to report real “task completion rate” for the objective quantification of the “real-world procrastination behavior”. Specifically, at the deadline, each participant was asked to report whether she/he had completed this task. If she/he reported not having yet completed the task (i.e. procrastination behavior emerged), she/he was further required to report the percentage of the task completed (1% - 99%), which was defined as the task completion rate. By doing so, we could calculate the real-world procrastination rate for the real-life task as the “1 – the task completion rate”. For instance, if a participant did not complete her/his real-life task before the deadline (i.e. she/he procrastinated this task) and reported completing 75% of this task at the deadline, her/his real-world procrastination rate was computed as the 25% (1 - 75%) (Please see the schematic diagram in Author response image 3).
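
      The two outcome definitions described above reduce to simple arithmetic. As a minimal sketch (the function names are hypothetical, and percentages on a 0 to 100 scale are assumed):

```python
def procrastination_rate(completion_rate_percent):
    """Procrastination rate (%) = 100 minus the task completion rate (%) at the deadline."""
    if not 0 <= completion_rate_percent <= 100:
        raise ValueError("completion rate must lie between 0 and 100")
    return 100 - completion_rate_percent


def procrastination_willingness(task_execution_willingness):
    """Procrastination willingness = 100 minus the task-execution willingness score."""
    if not 0 <= task_execution_willingness <= 100:
        raise ValueError("willingness score must lie between 0 and 100")
    return 100 - task_execution_willingness


# The worked example from the text: 75% of the task completed at the deadline.
print(procrastination_rate(75))    # 25
# A task fully completed before the deadline counts as no procrastination.
print(procrastination_rate(100))   # 0
```

      Because any task completed before the deadline maps to 0, the measure cannot distinguish between participants who finish early and those who finish just in time.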

      Moreover, rather than merely a self-reported task completion rate, each participant was also asked to upload proof (e.g., screenshots of submitted assignments, photos of printed documents, system timestamps) to the ESM digital system for validation.

      Author response image 3.

      To determine the sampling time points for this mobile app in the ESM, we capitalized on both the conceptual temporal decision model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), theoretically conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., motivation to act now in pursuit of the task reward) and task aversiveness (i.e., motivation to avoid acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis in this theoretical model is that task aversiveness is hyperbolically discounted when approaching the deadline: it is discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang et al., 2019). To maximize statistical power for fitting dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) (please see the schematic diagram at https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time).

      By this fitting algorithm (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00). Once the task-specific five sampling time points were determined per participant, this mobile app sent a digital message to ask her/him to immediately report the task aversiveness and the task outcome value then. As the primary outcomes, the procrastination rate (i.e., 1 – the task completion rate) and the procrastination willingness were sampled at the deadline point.
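
      The idea of sampling more densely toward the deadline can be illustrated with a short sketch. This is not the procedure used in the study: it assumes that the time remaining until the deadline shrinks by a constant factor between prompts, and both the function name `log_spaced_sampling_times` and the default `ratio` of 0.4 are hypothetical choices made for illustration.

```python
from datetime import datetime


def log_spaced_sampling_times(earliest, deadline, n_points=5, ratio=0.4):
    """Return n_points prompt times from `earliest` to `deadline`.

    The time remaining until the deadline shrinks geometrically by `ratio`
    from one prompt to the next; the last prompt falls on the deadline itself.
    """
    if n_points < 2:
        raise ValueError("need at least two sampling points")
    if not 0 < ratio < 1:
        raise ValueError("ratio must lie strictly between 0 and 1")
    if deadline <= earliest:
        raise ValueError("deadline must be later than the earliest prompt")
    span = deadline - earliest
    times = [deadline - span * (ratio ** i) for i in range(n_points - 1)]
    times.append(deadline)
    return times


# Example: a task due at 20:00 with the first prompt at 10:00 (arbitrary date).
prompts = log_spaced_sampling_times(
    datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 20, 0)
)
for t in prompts:
    print(t.strftime("%H:%M"))
```

      With these hypothetical settings the prompts fall at roughly 10:00, 16:00, 18:24, 19:22, and 20:00, which is close to, but not identical with, the example schedule given in the text.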

      Furthermore, yes, we fully concur with you on this great idea, that is, transparency about task diversity strengthens the generalizability of our findings. In response, we have tabulated these real-life tasks that were reported in this experiment in the independent Appendix 1, with automatic translations from Chinese to English via Qwen GPT. Please see below for what we have added to the main text:

      Methods Section (Page 6-7, Line 238-308)

      “Nested cross-sectional longitudinal design

      This study used a nested cross-sectional longitudinal design to investigate whether the multiple-session anodal HD-tDCS targeting the left DLPFC could reduce actual procrastination behavior and to probe how this effect manifests. To assess procrastination in daily life, we implemented a 15-day protocol alternating between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On the Neuromodulation days, the 20-min anodal HD-tDCS neuromodulation targeting the left DLPFC was performed for HD-tDCS active group at intervals of 2 days, while the sham-control group received sham HD-tDCS training. This HD-tDCS training was repeated for a total of seven sessions, and lasted 15 days (see Fig. 1a). Crucially, to capture procrastination in ecologically valid contexts, prior to receiving either active or sham HD-tDCS (administered between 09:00–18:00), participants were instructed to specify a real-life task they were personally obligated to complete the following day, with a self-defined deadline strictly constrained to 18:00–24:00 to ensure ≥24 hours between stimulation offset and task deadline, thereby isolating offline after-effects. This task should meet the following three criteria: (a) it should be already assigned in the real-world settings; (b) deadline should be constrained to 18:00-24:00 (see above); (c) it should be more likely to induce procrastinate. By doing so, more than 300 real-life tasks were collected, spanning academic (e.g., “submit a statistics homework assignment”), occupational (e.g., “draft and email a project proposal”), administrative (e.g., “complete online application for Class C driver’s license”), self-improvement (e.g., “practice guitar for ≥30 minutes”), domestic (e.g., “do laundry ”), and health-related (e.g., “running 2,000m for exercise”). Full task list has been tabulated in the Appendix 1. 
As primary outcomes, all participants were required to report their task-execution willingness (TEW) (Zhang & Feng, 2020; Zhang, Liu, et al., 2019) for a real-life task 24 hours post-neuromodulation. Procrastination willingness was then quantified as 100 - TEW score (see below for details). Furthermore, we asked participants to report the actual completion rate (CR) of the task at the deadline (e.g., participant A had finished 90% of the homework at the deadline and reported this to us at that time). Accordingly, the actual procrastination rate (PR) was quantified as 1 - CR.

      For Task Days, we developed a mobile app to implement the experience sampling method (ESM) for tracking participants’ real-time evaluations of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits brought about by completing the task before the deadline (Zhang & Feng, 2020). As conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted as the deadline approaches, being discounted sharply when far from the deadline but slowly once the deadline nears (Zhang & Feng, 2020; Zhang et al., 2021). Given the nonlinear dynamics inherent in this hyperbolic discounting, five ESM recording moments were selected a priori for each task using a log-spaced temporal sampling scheme (Myerson et al., 2001), with sampling density increasing toward the deadline (e.g., 10:00 (earliest), 16:00, 18:00, 19:30, and 20:00 (deadline)). Five sampling points satisfy the statistical prerequisite for hyperbolic model fitting (≥ 4 points; Green & Myerson, 2004). Recording moments were therefore individually tailored to each task for each participant. To control for the confounding influence of daily emotions on the evaluation of task aversiveness, we used the averaged PANAS scores at 10:00 (morning) and 16:00 (afternoon), collected with the same ESM app, as anchoring points to quantify daily emotions. Before each HD-tDCS session, each participant was required to report a real-life task due the next day. To capture the long-term effect of HD-tDCS (i.e., an interval of at least 24 hours between HD-tDCS and task completion), the reported task deadline was required to fall between 18:00 and 24:00.
Once a sampling time was reached, the app sent a digital message asking participants to fill in an online form for data collection.

      Quantification of covariates of interest

      This study had two outcome variables: task-execution willingness and procrastination rate (PR). Task-execution willingness evaluates one’s subjective inclination to avoid procrastination (Zhang & Feng, 2020). Participants reported their task-execution willingness on a 100-point scale (0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). This metric was recorded 24 hours after neuromodulation to examine its long-term effects. PR quantifies the extent to which a task has been procrastinated, and was calculated as 1 - CR (task completion rate). Critically, at the precise deadline, the app prompted participants to (a) indicate task completion status (yes/no), and if incomplete, (b) report the percentage completed (1–99%), defined as the task CR, while simultaneously uploading objective evidence (e.g., screenshots of submitted files, photos of physical outputs, system-generated logs, or app-exported records). If the task was completed before the deadline, the CR was 100% and the PR was 0% (1 - CR). PR was recorded at the actual task deadline for each participant. We also re-assessed actual procrastination using PR 6 months after the last neuromodulation session to test the long-term retention of the neuromodulation effect.”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the Experimental Analysis of Behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of Experimental Psychology: General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome value and task aversiveness impact task procrastination through separate neural pathways. Cerebral Cortex, 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley Interdisciplinary Reviews: Cognitive Science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of Experimental Psychology: General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (2) Additionally, it is unclear whether the reported effects could be due to differential reporting of tasks (e.g., it could be that participants learned across sessions to report more achievable or less aversive task goals, rather than stimulation of DLPFC reducing procrastination per se). It would be helpful to demonstrate whether these self-reported tasks are consistent across sessions and similar in difficulty within each participant, which would strengthen the claims regarding the intervention.

      Thank you for raising this crucial point. We agree that the reported effects may vary with task difficulty and task-execution proficiency, which could confound the effect of stimulation on procrastination. As you correctly note, because we did not collect data on the difficulty or other relevant characteristics of the tasks, we cannot completely rule out this confounder when interpreting our findings. We have therefore explicitly acknowledged this limitation in the Discussion section.

      On the other hand, although we lack quantitative evidence, the risk of confounding the main effects with disparities in task characteristics was controlled experimentally. As reported above, all reported tasks were required to meet three criteria: (a) they were already assigned in a real-world setting; (b) the deadline was constrained to 18:00-24:00; (c) they were likely to induce procrastination. Accordingly, each participant was clearly instructed to report a real-life task that was likely to be procrastinated in real-world settings, and was not allowed to report easy, readily achievable, or low-cost tasks. Consistent with this, the reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥ 2,000m for exercise), indicating plausible task diversity and difficulty. We also observed high within-subject task homogeneity. For instance, the tasks reported by Participant #5 were almost all academic activities across all sessions. Therefore, as the task list shows (please see Appendix 1), these self-reported tasks were plausibly consistent across sessions and similar in difficulty within each participant.

      In addition, almost all participants reported that they were receiving treatment, with 91.30% (21/23) in the active neuromodulation group (NM) and 86.96% (20/23) in the sham control group (SC) (χ<sup>2</sup> = 0.224, p = .636), indicating that the double-blinding was effective. If participants had learned across sessions to report more achievable or less aversive task goals, their procrastination willingness and procrastination rates would have decreased steadily across sessions, irrespective of whether they were in the active neuromodulation group or the sham group. However, no such decreasing trend was found in the sham control group (Mann-Kendall test: procrastination willingness, tau = 0.60, p = .13; procrastination rate, tau = 0.61, p = .13), indicating no statistically significant learning or strategic effect on task performance. Again, thank you for this crucial comment; we hope these clarifications address it.
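
      As a reader-side check of the two tests named above, the following minimal Python sketch recomputes the blinding comparison from the reported counts (21/23 vs. 20/23) and shows how a Mann-Kendall trend statistic is formed. It assumes a Pearson chi-square without continuity correction, which reproduces the reported values, and a tau defined as S divided by the number of pairs; the function names and the session series are hypothetical illustrations, not the authors' code or data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def mann_kendall(series):
    """Return the Mann-Kendall S statistic and tau = S / (n * (n - 1) / 2)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    return s, s / (n * (n - 1) / 2)


# Rows: group (NM, SC); columns: believed they received treatment (yes, no).
chi2, p = chi_square_2x2(21, 2, 20, 3)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")

# Hypothetical session means, for illustration only (not the study's data).
s, tau = mann_kendall([42, 40, 44, 41, 45, 43, 46])
print(f"S = {s}, tau = {tau:.2f}")
```

      A negative S (and tau) would indicate a decreasing trend across sessions; the significance test reported by the authors additionally requires the sampling distribution of S, which this sketch omits.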

      Limitations Section (Page 12, Line 637-640)

      “In addition, although participants were instructed to report valid real-life tasks with a high probability of being procrastinated, we did not measure task difficulty or consistency across sessions for each participant. Consequently, interpreting the effects of neuromodulation on procrastination as “unique contributions” warrants caution. ...”

      (3) It would be helpful to show evidence that the procrastination measures are valid and consistent, and detail how each of these measures was quantified and differed across sessions and by intervention. For instance, while the AUC metric is an innovative way to quantify the temporal dynamics of task-aversiveness, it was unclear how the timepoints were collected relative to the task deadline. It would be helpful to include greater detail on how these self-reported tasks and deadlines were determined and collected, which would clarify how these procrastination measures were quantified and varied across time.

      We do appreciate your highlighting the importance of clarifying how to measure procrastination, substantially helping readers to interpret these findings. As reported above, the primary outcomes of this experiment included subjective procrastination willingness and objective actual procrastination rate. For the subjective procrastination willingness, using the purpose-built mobile app, participants were required to report subjective task-execution willingness score (i.e., one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). Thus, the procrastination willingness was computed as “100 – the task-execution willingness score”. For the objective procrastination rate, rather than self-reported scores from those one-item visual analog scales, we asked participants to report the real “task completion rate from 1% to 99%” for the objective quantification of the “real-world procrastination behavior”. Full details can be found in Response #1.
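
      The two outcome definitions above reduce to simple arithmetic, sketched below in Python under the assumption that the completion rate is stored as a percentage and the procrastination rate as a proportion. The function names and input checks are hypothetical and are not taken from the authors' app.

```python
def procrastination_willingness(tew):
    """Procrastination willingness = 100 - task-execution willingness (0-100 scale)."""
    if not 0 <= tew <= 100:
        raise ValueError("TEW must lie between 0 and 100")
    return 100 - tew


def procrastination_rate(completed, percent_completed=None):
    """Procrastination rate PR = 1 - CR, with CR expressed as a proportion.

    A task completed before the deadline has CR = 100%, hence PR = 0.
    Otherwise the participant reports a completion percentage from 1 to 99.
    """
    if completed:
        return 0.0
    if percent_completed is None or not 1 <= percent_completed <= 99:
        raise ValueError("incomplete tasks need a completion percentage of 1-99")
    return 1 - percent_completed / 100


# Example from the text: 90% of the homework finished at the deadline.
print(procrastination_willingness(70))
print(round(procrastination_rate(False, 90), 2))
```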

      To determine the sampling time points for quantifying the AUC, we drew on both the conceptual Temporal Decision Model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., the motivation to act now in pursuit of task reward) and task aversiveness (i.e., the motivation to avoid acting now in order to avoid negative experiences). Once task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang et al., 2019). To maximize statistical power for fitting dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001). Following this scheme, five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with sampling density increasing toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00).

      Once the five task-specific sampling time points were determined for each participant, the mobile app sent a digital message at each time point asking her/him to immediately report the current task aversiveness and task outcome value. After capturing task aversiveness at those five time points, task aversiveness discounting was calculated as 1 - (A(t) / A(earliest)), where A(t) is the aversiveness rating at time t and A(earliest) is the rating at the earliest sampling point (e.g., 10:00), which serves as the reference for immediate execution. Subsequently, using GraphPad Prism (v9, 525), we estimated the AUC from those five data points based on the Myerson algorithm (Myerson et al., 2001), i.e., by trapezoidal integration of task aversiveness discounting over time. Under this approach, a higher AUC reflects stronger temporal discounting of task aversiveness: participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. In other words, a participant with a higher AUC experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly yielding less procrastination.
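
      The AUC computation described above can be illustrated with a minimal Python sketch. It assumes that time is rescaled to the interval from 0 (earliest sampling point) to 1 (deadline), in the spirit of the normalization in Myerson et al. (2001), and that discounting is 1 - A(t) / A(earliest). The ratings are hypothetical, the function names are illustrative, and the settings the authors used in GraphPad Prism may differ.

```python
def clock_to_hours(clock):
    """Convert an 'HH:MM' string to hours since midnight."""
    hh, mm = clock.split(":")
    return int(hh) + int(mm) / 60


def aversiveness_auc(clock_times, ratings):
    """Trapezoidal area under the aversiveness-discounting curve.

    x: elapsed time, rescaled so the earliest point is 0 and the deadline is 1.
    y: discounting, 1 - A(t) / A(earliest).
    """
    if len(clock_times) != len(ratings) or len(ratings) < 2:
        raise ValueError("need matching lists with at least two points")
    if ratings[0] <= 0:
        raise ValueError("the earliest rating must be positive")
    t = [clock_to_hours(c) for c in clock_times]
    if any(later <= earlier for earlier, later in zip(t, t[1:])):
        raise ValueError("sampling times must be strictly increasing")
    span = t[-1] - t[0]
    x = [(ti - t[0]) / span for ti in t]
    y = [1 - r / ratings[0] for r in ratings]
    # Sum the areas of the trapezoids between consecutive sampling points.
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


# Sampling times from the example in the text; ratings are made up.
times = ["10:00", "16:00", "18:00", "19:30", "20:00"]
ratings = [80, 60, 50, 44, 40]
print(round(aversiveness_auc(times, ratings), 4))
```

      With this definition, a series whose ratings never drop below the earliest rating yields an AUC of 0, and steeper declines in rated aversiveness yield larger values, matching the interpretation given above.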

      Taken together, following your suggestion, we have added substantial detail to the revised manuscript to clarify how procrastination was measured, when the data were sampled, and how the AUC was estimated. Please see Response #1.

      (4) There are strong claims about the multi-session neuromodulation alleviating chronic procrastination, which should be moderated, given the concerns regarding how procrastination was quantified. It would also be helpful to clarify whether DLPFC stimulation modulates subjective measures of procrastination, or alternatively, whether these effects could be driven by improved working memory or attention to the reported tasks. In general, more work is needed to clarify whether the targeted mechanisms are specific to procrastination and/or to rule out alternative explanations.

      Yes, we fully agree with you on this consideration: we should tone down the conclusions currently claimed in the main text, given the inherent shortcomings mentioned above. As you helpfully suggested, we have moderated our overall claims regarding the effects of multi-session neuromodulation in alleviating chronic procrastination. Please see specific instances below:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized roles of self-control in procrastination, and potentially offers a validated, theory-driven strategy for interventions.”

      Conclusion Section (Page 13, Line 657-664)

      “In conclusion, this study potentially provides an effective way to reduce both procrastination willingness and actual procrastination behavior by using neuromodulation of the left DLPFC. Furthermore, such effects were observed as long-term after-effects at 2-day intervals, and were also partly retained at the 6-month follow-up. More importantly, this study found that ms-tDCS neuromodulation could decrease task aversiveness and increase task outcome value, and further demonstrated that the increased task outcome value could predict decreased procrastination, a relationship conceptually driven by enhanced self-control. In this vein, the current study enriches our understanding of the neurocognitive mechanism of procrastination by showing the prominent role of increased task outcome value in reducing procrastination. It may also provide an effective method for intervening in human procrastination.”

      Moreover, as clarified above, in addition to the objective measure of procrastination behavior, we used a one-item 100-point visual analog scale (“How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”) to measure subjective procrastination willingness. Subjective procrastination willingness decreased significantly across neuromodulation sessions in the active group, but not in the sham control group, consistent with the observed reduction in the objective procrastination measure. We also agree that it is crucial to note that we cannot conclude that the effect of neuromodulation on procrastination arises uniquely from increased task outcome value. Because we did not measure other factors, such as working memory and attention, we cannot rule out other neurocognitive pathways. To address this point, we have removed or rephrased such statements throughout the revised manuscript, and explicitly restricted the interpretation of this neurocognitive mechanism (i.e., increased task outcome value) to the theory-driven framework of the temporal decision model.

      Reviewer #3 (Public review):

      This manuscript explores whether high-definition transcranial direct current stimulation (HD-tDCS) of the left DLPFC can reduce real-world procrastination, as predicted by the Temporal Decision Model (TDM). The research question is interesting, and the topic - neuromodulation of self-regulatory behavior - is timely.

      Many thanks for kindly dedicating time to review our manuscript, and for the helpful comments detailed below. Thank you for appreciating the novelty of this study.

      However, the study also suffers from a limited sample size, and sometimes it was difficult to follow the statistics.

      Thank you for pointing out these crucial concerns. As you correctly note, the sample size is relatively small, but we confirm that it provides adequate statistical power to detect a medium effect size.

      For estimating the sample size, we determined the a priori effect size based on the existing work we published (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In this pilot study, we identified a significant interaction effect between single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) for changing procrastination willingness in laboratory settings, indicating a medium effect size. Therefore, this pilot study provides supportive evidence to determine this effect size a priori.

      Using the G*Power software and assuming a medium effect size, we determined that a total sample size of N<sub>total</sub> = 34 could reach adequate statistical power. Please see the G*Power output in Author response image 1.

      As for the statistics, we genuinely acknowledge that the vague methodological descriptions and complex algorithms indeed complicated the understanding of the methods and statistics. To address this, echoing the comment raised by Reviewer #1, we have removed the complicated statistics and methods, and further clarified how we used the generalized linear mixed-effect model (GLMM) for statistical analysis. Please see the specific revisions below:

      Methods Section (Page 8, Line 378-403)

      “Statistics

      All statistical analyses were implemented in R (https://www.rstudio.com/) and R-dependent packages.

      To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) with a full factorial design was constructed for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before the deadline). Sex, age and socioeconomic status (SES) were modeled as covariates of no interest. Following the classification issued by the National Bureau of Statistics of China (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers from 1 (poor) to 7 (rich) on the basis of per capita annual household income. To control for subjective rating bias stemming from daily mood, we separately measured participants’ daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The average of these self-rated emotion scores at the two time points was entered into the GLMM as a covariate of no interest, yielding the final model expression “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were also tested at baseline and post-last-intervention with Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and the random-effects structure. To validate statistical robustness with a nonparametric test instead of parametric tests on continuous outcomes, we also conducted a between-group comparison of the number of tasks on which procrastination occurred, using the χ<sup>2</sup> test with φ correction or Fisher’s exact test.
Regarding the 6-month follow-up investigation, the same GLMM was also fitted to examine the long-term retention of the effect of neuromodulation on actual procrastination.”

      The preregistration and ecological design (ESM) are commendable, but I was not able to find the preregistration, as reported in the paper.

      We are sorry to report a serious technical barrier that has rendered our preregistration invisible and inaccessible. OSF has disabled my account, claiming to have detected “suspicious user’s activities” in it. This has prevented access to all materials deposited in this OSF account, including the preregistration. We have contacted the OSF team, but received no workable technical solution for recovering the preregistered report (please see the screenshot below). We suspect that this may be due to my affiliation change to the Third Military Medical University of People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported it in the main text, and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because a reconstructed preregistration no longer conforms to best practices, in addition to transparently reporting this case, we have removed the preregistration statements elsewhere in the revised manuscript.

      Overall, the paper requires substantial clarification and tightening.

      We are grateful for your evaluation, and we fully agree with you. In response, we have added a tremendous number of details to clarify how to measure procrastination, how to conduct the statistical analyses, and how to collect real-life tasks, as well as other experimental materials. Please see the revisions in the Methods section of the revised manuscript. Again, thank you for those helpful suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Supplemental Materials, page 4, lines 163 to 167 seem to be from a different manuscript (as the section talks about neural markers, significant clusters, and brain networks).

      We are sorry for erroneously embedding this irrelevant section here. We have removed it, and have double-checked the document to avoid such mistakes.

      (2) I'm no expert here, but some of the trace and density plots in the SOM look problematic (e.g., Figure S5 top panel). But it's not made clear to which model/analysis these plots belong, so they are not very helpful without that information.

      Thank you for bringing these potentially problematic plots to our attention. Following your suggestion, these results have been removed from the SM to improve readability and comprehensibility.

      (3) Table S1 reports side effects "from the neurostimulation" (this is also the language used in the main manuscript), but having the flu is rather unlikely to be a side effect from the stimulation, isn't it? Thus, this language is highly confusing, and when reading the main text, it's not clear that these are just life events that are most likely unrelated to the stimulation, but have the potential to affect the measured variables (i.e., ultimately, they seem a source of noise).

      We apologize for this confusing wording. Here, “side effects” referred to confounding effects of unexpected life events that uncontrollably disrupt task execution and performance, such as “having the flu” or “an unexpected mandatory CCP (Communist Party of China) meeting assignment”. To avoid misunderstanding, we have rephrased “side effects” as “unexpected life events disrupting task execution” in both the main text and the SM.

      (4) The use of the English language could be improved.

      Thank you for your very practical suggestion. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include greater detail about the ESM procedure and details of the self-reported tasks. This would help rule out potential confounds of difficulty or learning (e.g., participants may have learned to identify more achievable and less difficult tasks across the sessions, which would mean they are learning to perform the task better rather than to procrastinate less). Further elaboration on the quantification of procrastination measures would help clarify the mechanism underlying this behavior, which is important for clarifying how these effects arise and what aspect of procrastination behavior is being targeted by the tDCS intervention (and rule out alternative explanations).

      We wholeheartedly appreciate this crucial recommendation. As mentioned above, we have followed your suggestions, in particular by adding to the revised manuscript extensive detail on how the real-life tasks were collected (with consistent and plausible difficulty across sessions), how the sampling time points were determined, and how the metrics were quantified (e.g., subjective procrastination willingness score, objective procrastination rate, AUC of task aversiveness, and task outcome value). We believe these revisions substantially improve the readability and clarity of the manuscript. Please see the specific revisions and clarifications above.

      (2) It would be helpful to proofread for grammatical and spelling typos (e.g., DLPFC is spelled incorrectly in line 140, Satterwaite is spelled incorrectly in Line 415).

      Thank you for your kind suggestion. Both spelling typos have been corrected, and we have double-checked the revised manuscript to ensure no such typos remain. As you kindly suggested, we have invited a proofreading editor to edit and polish the English of the revised manuscript.

      (3) Please clarify in Figure 4 that a higher AUC is associated with lower task aversiveness (which is stated in the methods but not clearly in the figure).

      Many thanks to you for your helpful suggestion. As you kindly suggested, we have clarified this case in the figure legend.

      Reviewer #3 (Recommendations for the authors):

      I want to see the preregistration.

      Thank you for your helpful recommendation. As we replied above, a serious technical issue on OSF occurred, making our preregistration invisible and inaccessible. OSF has disabled my account, claiming to detect “suspicious user’s activities” in my account. As a result, there is no access to all materials that were already deposited in this OSF account, including this preregistration. We have reconstructed this preregistration based on archived documents, and reported it in the SM. As we reported above, although this partially addresses the problem, it no longer fulfills the best practices of preregistration. Consequently, in addition to transparently reporting this case, we have removed all the preregistration statements throughout the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This interesting paper probes the problematic relationships between the classical "spiralian" taxa, i.e., annelids, molluscs, brachiopods, platyhelminths and nemerteans, and shows that the branches leading to them are so short as to be unreliable guides to their relationships. This, in turn, has important implications for how we view the origin of the animal phyla.

      Strengths:

      A very careful analysis of a famous old problem with quite significant results. The results seem to be robust and support their conclusions.

      It often passes uncommented that many different trees are published about animal relationships, yet some parts of the tree seem extremely difficult to resolve; the spiralians are perhaps the most difficult case. More recently, problems about sponges or ctenophores as sister groups to the rest of the animals have alerted us to major areas of uncertainty in large-scale phylogenetic reconstruction; this paper is a welcome reminder that other, perhaps even harder, problems exist which may be difficult to ever resolve with the (molecular) data we have.

      Weaknesses:

      The paper could have perhaps drawn out some of the implications of its results in a clearer manner.

      Reviewer #2 (Public review):

      Summary:

      The relationships among the phyla making up Spiralia - a major clade of animals including molluscs, annelids, flatworms, nemerteans and brachiopods - have been challenging from a phylogenomic perspective despite decades of molecular phylogenetic effort. Every topology uniting subsets of these phyla has been recovered with apparent support in at least one study, yet no consensus has emerged even from large-scale genomic datasets. Serra Silva and Telford set out to determine whether this instability reflects a genuine biological signal being obscured by analytical limitations, or whether it reflects a rapid, near-simultaneous origin of these phyla that has left behind in modern genomes far too little phylogenetic information to resolve. They focused deliberately on five phyla, reducing the problem to a tractable set of 15 unrooted and 105 rooted topologies, and applied a suite of complementary approaches across two independent datasets and multiple substitution models to test whether any topology is significantly preferred over alternatives.

      Strengths:

      (1) The conceptual framing of the problem is excellent, and the study makes a convincing case across several lines of evidence. By enumerating all possible topologies and demonstrating empirically that every one of the 15 unrooted arrangements has been recovered as the preferred solution in at least one published study, the authors make a strong argument about the state of the field. The use of two entirely independent datasets as a consistency check is great, and convergence between them, where it occurs, substantially strengthens confidence in the conclusions.

      (2) It is my view that the simulation framework is a particular strength. Generating data on a fully unresolved star tree and scoring those data under both correctly-specified and misspecified substitution models provides convincing evidence that the strong preference for rooting Spiralia on the flatworm branch is, at least partly, an analytical artefact driven by the exceptionally long branch in combination with compositional heterogeneity across sites. This is an important methodological demonstration with implications beyond spiralian phylogenetics, as the same issue is likely to affect other deep, long-branched lineages in the animal tree of life.

      (3) The randomised taxon-jackknifing approach is a very nice addition here. The demonstration that preferred topologies shift depending on which species happen to be sampled (even within the same phylum) is a convincing indicator of weak signal, and provides a practical caution for future studies that may report strong support for a particular spiralian arrangement based on a fixed taxon sample.

      (4) The branch-length analyses, benchmarking internal interphylum branches against the already disputed and extremely short branch uniting deuterostomes (work also by this group), are well-conceived and solid.

      (5) I think it is worth highlighting the notable intellectual honesty throughout the paper: the authors do not overstate their results, correctly acknowledging that while the unrooted topology grouping molluscs with brachiopods and flatworms with nemerteans emerges most consistently, this preference is not statistically significant under more adequate substitution models and may itself carry some artefactual component.

      Weaknesses:

      (1) The restriction to five phyla is the most significant limitation. The authors acknowledge this and give a clear computational justification, but readers should be aware that the paper's convincing conclusions apply specifically to the five focal phyla and that the evidence remains incomplete with respect to spiralian phylogeny as a whole.

      (2) The treatment of substitution model adequacy, while commendably thorough for site-heterogeneous models, is necessarily bounded. The authors note that models accounting for non-stationarity, across-lineage compositional heterogeneity, or mixtures of tree histories might yield different results, and that even the most sophisticated currently available approaches have not produced consistent spiralian topologies across studies. This is not a criticism of what has been done here - the analytical scope is reasonable and well-implemented - but it means the paper cannot be read as a definitive demonstration that no model will ever resolve these relationships. The distinction between a true hard polytomy and a radiation that is effectively unresolvable given current data and methods could be drawn more sharply in the discussion.

      (3) The reticulation-aware coalescent analyses are presented somewhat briefly relative to the likelihood-based topology scoring. The finding that flatworms are recovered within a paraphyletic jaw-bearing animal clade in both summary trees - interpreted as long-branch attraction - is striking, and its implications for gene-tree-based approaches to spiralian rooting deserve more discussion than they currently receive.

      (4) The central conclusions - that interphylum branches in Spiralia are extraordinarily short, that topological preferences are strongly model-dependent and taxon-sampling-sensitive, and that an ancient rapid radiation is the most parsimonious explanation - are convincingly supported by the evidence presented. The identification of flatworm long-branch attraction as an important confounding factor in rooting analyses is itself an important and well-demonstrated result.

      Conclusion:

      This paper clearly makes an important contribution to the ongoing debate about spiralian relationships and, more broadly, to methodological discussions about how to handle anciently diversified clades where phylogenetic signal is genuinely limited. The exhaustive topology-scoring framework combined with taxon-jackknifing and simulation under unresolved trees is a valuable methodological template that could usefully be applied to other notoriously difficult nodes in the animal tree. I thoroughly enjoyed the discussion of the implications of these findings for interpreting Cambrian fossils and the evolutionary history of shells, segmentation, larval types and other characters - it is both thoughtful and thought-provoking and will be of broad interest well beyond the phylogenomics and zoology communities. From a very practical perspective, the data and scripts provided make the work useful to researchers wishing to apply similar approaches to other groups.

      Reviewer #3 (Public review):

      Summary:

      This paper addresses the controversial internal relationships within the Spiralia, a major clade of invertebrate animals including molluscs, annelids, brachiopods and flatworms.

      Strengths:

      Performs a range of empirical analyses and simulations that address the core question. Although a favoured unrooted topology finds some support, this is not strongly endorsed in the paper.

      Weaknesses:

      (1) Only considers a subset of relevant phyla (e.g. gastrotrichs are relevant to the phylogenetic position of Platyhelminthes), although how this would change the scale of the analyses (i.e. number of topologies) is addressed in the paper.

      (2) Discussion of Spiralia evolution and broader context, particularly the relevance for the fossil record. Line 448: our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction, which have unusual character combinations, have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      (3) This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense, see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We thank the reviewers for their kind comments. Please see below for detailed responses to all identified weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some minor comments that might help improve the paper:

      (1) Abstract L17. "Most analyses on the 15 unrooted trees showed a preference for the same topology but the support over other solutions was non significant" - I don't really understand this sentence in the context of the paper; it makes it sound as if the tree is, after all, well resolved! Would "non-significant" or "not significant" be better than "non significant"?

      Having read the rest of the paper I see what this refers to (uT4), but still I don't understand the second clause.

      Re-written to clarify.

      (2) Introduction L31. This makes it sound as if phoronids are actually part of brachiopods, and while that was recovered by Cohen and Weydmann 2005, I'm not sure if it's really a general result. In addition, rather than using "brachiopods plus phoronids" everywhere, you could use "Brachiozoa" (Cavalier-Smith 1998, Biol. Rev).

      We have updated our text and figures to use Brachiozoa.

      (3) L36-37. Yes, but the presence of Chaetognatha in this clade is suggestive that their primitive body size is not small.

      Have made clear that chaetognaths are not all tiny.

      (4) L85. Kumar et al. may have claimed that Spiralia are as old as 670, but many other analyses would suggest a range of different results. Why choose just this one? In addition, this age seems rather incompatible with your results.

      We agree this maximum age is highly improbable (the principal point remains the deep age of the protostomes). We have used a different reference and refer to a generally acceptable minimum age only.

      (5) L88. The key part of this sentence, "proving a hard polytomy", comes at the end of a long set of references that makes it hard to connect to the lead-in "given the age of", so I would suggest rephrasing.

      Rephrased for clarity.

      (6) L109. It is unclear what this means in the context: "and even support multiple topologies".

      Re-worded for clarity.

      (7) Figure 1. Why did you choose to indicate brachiopods plus phoronids as a larval form, unlike the other clades? Perhaps it's because we don't know what the last common ancestor of the two looked like (unless P is an ingroup of B), but that's arguably true for some of the other clades as well!

      Apologies, this was laziness as we already had a line drawing of an actinotroch larva. Have improved the images in figures 1 and 5 where required.

      (8) L164. Reticulation-aware analyses. As I understand it, this would include introgression, hybridization, etc. However, incomplete lineage sorting has also been invoked, not just for Cambrian-explosion age events but also for other major radiations, such as for angiosperms and birds. How significant might ILS be for generating the results you get?

      Section title amended. Results section updated to reflect this. We now explicitly mention the potential impact of ILS and introgression on spiralian relationships in our discussion.

      Unrooted trees analysis:

      (9) L405 on. Maybe it would be worth including a figure showing the relative branch lengths of uT4. All the images of trees show similar-length branches, which gives off the wrong impression within the context of the paper!

      We understand the motivation, but we worry that showing uT4 as the sole phylogram may end up with this being interpreted by a casual reader as being the main result of the paper. Hopefully the figures with branch lengths encompass this information well enough and with no danger of misinterpretation.

      (10) L430 on. Why is this a "conservative" interpretation?

      Yes agreed not clear. Have changed to “We interpret our results as showing that…”

      (11) You mention synapomorphy accumulation time and implicitly equate shortness of branches with shortness of time. However, other options are available under varying diversification rate models (e.g. ClaDs, Barido-Sottani et al. 2023 Syst. Biol.; CET, Budd and Mann 2025, Syst.Biol.). In particular, the latter paper shows that when unusually large clades are selected for study (as is arguably the case here), then those clades are likely to have started with very high "evolutionary tempo", which speeds up all aspects of evolution, including diversification rates.

      In the Budd and Mann scenario large clades begin with high tempo of cladogenesis, high substitution rate and high diversification rate (rapid origin of new characters). This would suggest that the period of the radiation was extra rapid (even less time than in a ‘normal’ period during which smaller clades emerge) so we feel the point stands.

      (12) L449. Maybe refer to the Song et al. paper again here on scaphopods plus bivalves, as it makes the same sort of points, albeit in a slightly different context.

      We thank the reviewer for the suggestion and have added the citation where relevant.

      (13) Finally, to return to L20. You mention implications for the Cambrian fossil record, but then fail to deliver any!

      We have hopefully addressed this remark in the discussion better (at least to the extent we are qualified to).

      Yet if you are correct, then synapomorphy accumulation would unite groups of phyla, and would surely lead to a scenario highly incompatible with clock models suggesting deep origins of clades (as they would all be more fossilisable).

      Apologies but we don’t completely understand this point as ‘synapomorphy accumulation would unite groups of phyla’ is a little ambiguous. Of course, this is generally true, but our results suggest there was little opportunity to accumulate identifiable synapomorphies linking pairs, triplets or quartets of our 5 spiralian phyla.

      In addition, clock results suggest rather long periods of time leading to the phyla, which would imply that there would have to be extremely slow rates of molecular evolution to yield the short early branches here. Also, it might be worth referring to papers compatible with this view, such as Wernström, J.V. et al., EvoDevo 13, 17 (2022). https://doi.org/10.1186/s13227-022-00202-8 or some of the palaeo literature, such as Budd and Jackson 2016, Phil Trans.

      The referee refers to clock results suggesting a (deep) Ediacaran origin of Lophotrochozoa/Spiralia. We interpret the spiralian radiation itself as rapid but, in the absence of a clock analysis, we cannot comment on when it took place.

      Reviewer #2 (Recommendations for the authors):

      (My not very) Major points - as I feel this is an excellent paper.

      (1) The coalescent-based summary tree analyses warrant expansion. The recovery of flatworms within a paraphyletic jaw-bearing animal clade in both summary trees is a striking result attributed to long-branch attraction, but this interpretation would be strengthened by examining whether pruning or downweighting the longest-branching taxa within those groups affects the outcome, or by reporting per-node quartet scores more fully. This would make the reticulation-aware results more directly informative and would bring this section into better balance with the detailed likelihood-based analyses.

      We thank the reviewer for the suggestion of the expanded analyses. We have now done these, and they yielded essentially the same results as the unpruned analyses. Additionally, while not discussed, we ran the Astral analyses on the subset of gene-trees where all groups of interest (spiralian phyla and superphyletic Ecdysozoa, Deuterostomia, etc.) were monophyletic and found no changes to interphylum quartet scores beyond those due to enforced (super)phylum monophyly, with Platyhelminths still recovered within Gnathifera.

      We have expanded our description of the results slightly as well as our discussion. Location of the tables with detailed quartet scores and local posterior probabilities has been added to Fig. S1’s legend.

      (2) It would strengthen the paper to include at least a brief analysis or explicit discussion of whether any currently available models accounting for non-stationary or across-lineage compositional heterogeneity show any change in the pattern of support, even if only tested on a subset of topologies. A null result here would itself be informative and would make the conclusions more robust to the concern that unexamined model classes might behave differently.

      We thank the reviewer for the suggestion, but this represents a considerable amount of new work and we think it falls outside the scope of the present work. We have, as suggested, included this as a discussion point.

      (3) The authors note that topologies grouping flatworms with ribbon worms appear among the higher-scoring arrangements even under model misspecification in simulations. It would be helpful to comment explicitly on whether the apparent signal for this grouping should therefore be regarded with particular scepticism, or whether it survives artefact correction in any of the analyses, as this is a grouping that has appeared repeatedly in the literature and readers will want guidance on how to interpret it.

      We do state that the nemertean+platyhelminth grouping seems likely to be at the least emphasised by an artefact (as the referee points out it is common to the higher scoring trees in the star tree simulations). We state that this suggests “…that this grouping derives some support from systematic errors.” We now return briefly to this in the discussion.

      Writing and presentation

      (1) The abstract states that rooting Spiralia on the flatworm branch "is a long-branch artefact" - this is slightly stronger than the language used in the body of the paper, where the authors correctly write that this preference is "at least enhanced by" the artefact. The abstract phrasing should be softened to reflect the more nuanced conclusion in the text.

      Good point. Done.

      (2) A brief signposting sentence near the start of the Results, setting out the overall analytical logic before the individual sections begin, would help orient readers. The strategy - score all topologies, test robustness to model choice and taxon sampling, then use simulation to identify artefactual signals - is clear in retrospect but would benefit from being made explicit upfront.

      We have taken this suggestion on board. The summary seemed in the end better placed as the final part of the introduction.

      (3) Figure 3 is complex and would be easier to interpret with a brief explanatory note in the legend clarifying what a wide versus narrow range of log-likelihood scores across topologies means in practical terms for statistical resolution between trees.

      Added sentence to legend.

      Minor Corrections:

      (1) The Figure 2 legend contains a typographical error: "shorter than the short, disputed deuterostome branch" should read "shorter than."

      Done

      (2) At least one reference appears to carry a future publication year (Ishii et al., 2026) and should be verified for accuracy before final submission.

      This reference is correct per the journal’s website. We did find Google Scholar to list it as being from 2025.

      Reviewer #3 (Recommendations for the authors):

      (1) Abstract/SI definitions of Spiralia/Lophotrochozoa

      While I don't have strong feelings about this, if Spiralia is being used as an apomorphy-based name, then it still might be equivalent to Lophotrochozoa, as spiral cleavage in Gnathostomula jenneri was illustrated by Riedl (1969). Although no other studies have replicated this observation, this should at least be mentioned.

      Sorry this reference to gnathostomulid spiral cleavage was included in a longer version of the discussion of nomenclature. This was first reduced in length (which was when the mention of gnathostomulid spiral cleavage was dropped) then finally moved to the supplementary material. We have now re-included mention of this in the discussion in supplementary info.

      The SI text suggests that the name Lophotrochozoa, as used in its original form by Halanych et al. (1995), was a node-based definition, and that this name is for the sister group of Ecdysozoa. However, in that paper, the name is actually defined "as the last common ancestor of the three traditional lophophorate taxa, the molluscs, and the annelids, and all of the descendants of that common ancestor". This definition would exclude Gnathifera, and depending on the internal relationships of the non-gnathiferan phyla, may be equivalent (or not) to the usage of the name Spiralia adopted in the present paper. The perils of mixing node- and apomorphy-based definitions of clades are clear, and the situation is less straightforward than the paper suggests, and (somewhat unhelpfully given the subject of the paper) may only become clearer if the relationships of non-ecdysozoan protostomes are resolved.

      We believe that the community universally understood the definition of Lophotrochozoa following the 1997 paper (by the authors who also provided the original 1995 definition). This 1997 definition included both chaetognaths and rotifers as examples of the Gnathifera. The Spiralia, in contrast, began life not even as a name for a clade but a description of a character shared by some apparently unrelated taxa – similar to a grouping of ‘carnivores’. The introduction of a new name was, we suggest, unhelpful. We hope that by defining our terms up front the meaning in the current paper is clear.

      (2) Introduction

      Line 76. Some references needed regarding claims that there was a polymeric brachiopod ancestor, e.g. Gutman (1978), Temereva and Malakhov (2011), Guo et al. (2023). Likewise for the chaetae of brachiopods, annelids and molluscs, e.g. Schiemann (2017), as it's key to trace where these ideas originated.

      Added

      Figure 1. This is a nice illustration of the uncertainty in the relationships of these groups. However, I kept checking which thumbnail image was which for nemerteans and annelids. A minor suggestion, but perhaps a polychaete instead for the annelid?

      We have replaced the rather poor image of an earthworm with a polychaete and also now include labels. We hope the improved images are more helpful. Good point.

      (3) Results

      Branch length comparison. I understand why the deuterostome stem was chosen as the branch for comparison from the point of view of phylogenetic uncertainty. However, what about the branch leading to Ecdysozoa or the branch subtending Lophotrochozoa and/or Gnathifera? Given that the short internodes are used as an argument underpinning uncertain relationships, can we be sure that Gnathifera is not nested within the group of interest, especially given that Gnathifera contains many long-branched taxa and the root may be misplaced within the group?

      We have added the Lophotrochozoa and Ecdysozoa median lengths to our plots and now discuss both the lophotrochozoan branch in our results.

      Line 249. Given that Spiralia is the group of interest, why were the Gnathiferans also chosen at random?

      The point of the experiment was to see the effect of taxon sampling on the consistency of the resulting topology. Random sampling across the tree seems helpful in this context. We chose Gnathifera as one group to sample from as this ensured they would be present in all trees. This seems appropriate as they are the sister group of the clade of interest and as such their inclusion reflects a choice a typical investigator might make when choosing which species to include. Additionally, as noted in the reviewer’s earlier comment, Gnathifera includes many long-branched taxa and we wanted to ensure our root-placement results were robust to this aspect of taxon sampling.
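
      The random taxon sampling described in this response can be sketched roughly as follows. This is a hypothetical illustration only: the group names are taken from the text, but the taxon labels, the number of species drawn per group, the number of replicates, and the function name are all assumptions rather than the authors' actual settings or code.

```python
import random


def jackknife_replicates(taxa_by_group, per_group, n_reps, seed=0):
    """Draw n_reps random taxon subsets, each with per_group species from every group.

    Sampling from every group (including the outgroup-like Gnathifera)
    guarantees that each group is represented in every replicate.
    """
    rng = random.Random(seed)  # fixed seed so replicates are reproducible
    replicates = []
    for _ in range(n_reps):
        sample = {}
        for group in sorted(taxa_by_group):
            species = taxa_by_group[group]
            if len(species) < per_group:
                raise ValueError(f"{group} has fewer than {per_group} species")
            sample[group] = sorted(rng.sample(species, per_group))
        replicates.append(sample)
    return replicates


# Placeholder labels, not real species names from either dataset.
example_taxa = {
    "Gnathifera": ["gna_1", "gna_2", "gna_3", "gna_4"],
    "Mollusca": ["mol_1", "mol_2", "mol_3", "mol_4"],
    "Annelida": ["ann_1", "ann_2", "ann_3", "ann_4"],
    "Platyhelminthes": ["pla_1", "pla_2", "pla_3", "pla_4"],
}

if __name__ == "__main__":
    for rep in jackknife_replicates(example_taxa, per_group=2, n_reps=3, seed=1):
        print(rep)
```

      Each replicate would then be scored against the full set of candidate topologies; if the preferred topology changes between replicates, that points to weak signal rather than a robust result.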

      (4) Discussion

      Line 448. Our current understanding of the early spiralian fossil record is quite consistent with the main results of this paper. For example, there are very few claims for fossils that sit on the short branch leading to Spiralia (or Lophotrochozoa as defined here) that this paper discusses. Many of the key fossils that inform on the characters discussed in the introduction that have unusual character combinations have an apomorphy of one of the phyla discussed, and so are resolved as members of the stem lineages of particular phyla.

      This is what you would expect with long phylum stem lineages (line 148) and a short Spiralia stem lineage. For example, the mollusc Wiwaxia has chaetae but a mollusc-like radula (Smith 2012), and the conchiferan mollusc Pelagiella has chaetae and a coiled shell (Thomas et al. 2020). The only fossil groups that are routinely discussed as belonging to the stem lineage of more than one phylum are the tommotiids, which have chaetae, segmentation and a complex mineralised skeleton (but not shells in the brachiopod/mollusc sense, see Guo et al. 2023), but they sit on the lophophorate stem lineage, a synapomorphy-rich group the monophyly of which the present paper endorses (e.g. line 435). The fossil record is consistent with the scenario presented in line 442, e.g. convergent loss or reduction of chaetae and segmentation and convergent evolution of shells in molluscs and brachiopods.

      We accept these points (though are clearly not experts on these fossils). We have (slightly tentatively given our lack of expertise) expanded our discussion to include these fossil taxa with their combinations of characters.

  3. bafybeihwigujdzh7xrbwmf2t2zv5eku6cr3reb5qzqmhgrpnfdd2ryhh7y.ipfs.dweb.link bafybeihwigujdzh7xrbwmf2t2zv5eku6cr3reb5qzqmhgrpnfdd2ryhh7y.ipfs.dweb.link
    1. Who is doing the job of helping you to map the expanding frontier of your knowledge

      Paraphrased: Who is doing the job of creating a map of the Web frontier as you explore it. (200????) Or indeed a map of the territory already explored, ready to resume exactly where you left off.

      How Hyperpost addresses these problems

      Deep rearrangeability and repurposability supported by a new "Cosmology for Computing".

      Capture Intertwingularity as scaffoldings of everything you care about in one place

      Add new capabilities at the Meta Level. HyperPost: The Thought Processor for Google+. Social Knowledge Network: intersection of Knowledge Graph, Google+ Circles and Thought Graphs.

      Connected neighborhoods of nodes thus conveyed contain not only the information presented in the narrative trails, but they also contain as it were the entire scaffolding with which they were erected. Trail blazing as in the Memex.

      Thought Vectors in Concept Space

      Kernel for tinkerable Hypermedia; direct manipulation interfaces to suit personal needs; the Lively Kernel project. Through associations: "The Human mind works by association" (As We May Think, The Atlantic).

      To have sufficient built-in capability in the kernel to support tinkerable Hypermedia formats incorporating direct manipulation interfaces to suit personal needs, as was done in the Lively Kernel project.

      Thought Graph

      Search for Things as you write. The sentences that you write are Nodes. Structural links.

      HyperPost invites us to search and mention all the things that are important in the context of our thoughts and that are related to the things we write about. The sentences you write down are turned into Nodes in a Thought Graph.

      Public Knowledge Graph

      Incorporate Entities from Google's Knowledge Graph. Wikidata auto-suggest boxes.

      When you want to mention some Thing, the search box autosuggests matching entities drawn from Wikidata. A new node in the user's Personal Knowledge Graph is created that references the node in Google's Knowledge Graph.

      Personal Entities

      When you reach the edge of your recorded knowledge: the Wiki Gambit. Create your own on the fly. Automatic contextualization for thoughts and discovered web resources. Focus on what you write, not where you put it.

      In case no public entity matches the user's search, a new Personal Entity node is created in the user's Personal Knowledge Graph. This is analogous to the greatest gambit of the wiki: when you reach the edges of your knowledge, just create a new page for it. Here it is more fine-grained; it is just a node. You do not need to think up a name for a page. Nor do you need to worry about where it is created, because the identity of the node is independent of where you put it.

      Context of Discovery and Justification

      Like the eval and apply of LISP.

      Lakatos's term Context of discovery can be created by marking trails during your web research with HyperPost, whereas in the Context of justification linking to the web resources discovered completes the circle.

      Blaze Trails

      Attach Narrative Trails to entity nodes. Link to web resources. The context for a sentence automatically contextualizes the linked resource.

      It is possible to attach Narrative Trails to any entity node so that more information about it can be further elaborated. These narrative trails comprise sequences of paragraphs, which in turn consist of sentences for individual thoughts. In addition, links to web resources can be attached so that they are linked to relevant contexts and will not be lost.

      Deep Re-arrangeability and Re-purposing

      Reuse any trails or context through transclusion. Every sentence is a node; it can be moved and transcluded in any context. Produce social media posts, blog posts, presentations, project plans and issue trackers rooted in your own graph of all your articulated knowledge.

      By providing suitable structural links, all kinds of presentation formats, such as social media posts, blog posts, presentation slides, etc., can be applied to an arbitrary network of nodes in the Thought Graph. Combine that with transclusion and we have the "Deep Rearrangeability" required to solve Ted Nelson's problem with "imitating paper".

      Demo

      This presentation was created in HyperPost. Posts can be derived from it and will be published.

      HyperPost is used to generate the presentation; it remains the master. Working with it preserves all the familiar characteristics of a word processor, augmented to accommodate thoughts and knowledge in their native associative graph model.

      Conclusion

      HyperPost shows the way to overcome the problem with paper. It is put forward as one possible way forward to reinvent hypertext for Academia.

      Availability

      This presentation will shortly be available at the Hyperpost landing page. For people who sign up for the beta, an extended version will be made available presenting a much larger graph, containing our development road map. It will be dynamically extended.

      Thanks

      And thanks for all the fish.

      map the expanding frontier

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Comments on revised version:

      The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.

      Reviewer #2 (Public review):

      General comments on the revisions:

      My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).

      There are two areas where the authors had to make major changes or justifications and where further comment is merited:

      RNA-seq.

      The most significant issue was the originally underpowered RNA-seq, which had only two replicates. This has now been repeated with four replicates. This has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in their response was: "Given the robustness of the stage-specific transcriptome, and the legal constrains associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered, and that maximum robustness of the data is obtained from the minimum sample size, is an important part of experimental design for ethical use of animal models. Essentially, the replication here could have been avoided if the original study had used one more animal. However, the new version of the RNA-seq brings appropriate confidence to the interpretation of the data.

      Phosphoproteomics.

      The authors provide a robust justification of their strategy for the phosphoproteomics and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". The way missing values were dealt with is also explained: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills in some of the gaps I was missing from the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/systems-based approach such as this one. The authors also edited the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they were presenting and switched to the terminology of "normalised phosphorylation level"; I think this is an appropriate response.
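
      The two quoted rules can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the authors' pipeline: the `SiteObservation` record and both function names are hypothetical, and the sketch assumes that the identification and localisation criteria must be met within the same replicate, which the quoted sentence does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional

FDR_MAX = 0.01        # identification FDR < 1%, as quoted in the review
LOC_PROB_MIN = 0.75   # localisation probability > 0.75, as quoted in the review


@dataclass
class SiteObservation:
    """One phosphosite measured in one replicate (hypothetical record layout)."""
    fdr: float
    localisation_prob: float
    intensity: Optional[float]  # None marks a missing value


def passes_inclusion(replicates: List[SiteObservation]) -> bool:
    """Keep the site if at least one replicate meets both thresholds.

    Assumption: both criteria must hold in the same replicate.
    """
    return any(
        r.fdr < FDR_MAX and r.localisation_prob > LOC_PROB_MIN
        for r in replicates
    )


def may_impute(condition_replicates: List[SiteObservation]) -> bool:
    """Imputation is allowed only if the condition has an observed value."""
    return any(r.intensity is not None for r in condition_replicates)
```

      Under these assumptions, a site with a single confident replicate is retained, while a condition in which every replicate is missing is left without imputed values.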

      Overall, in the absence of follow up experiments on specific individual examples, some of the claims in the original submission were toned down and reflect a more neutral description of the data now. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

      Reviewer #3 (Public review):

      Summary:

      The authors proposed to use a 5-layer systems-level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani. This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view on how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      With the increased sample number for the RNA-seq, the authors' assumptions in the revised version are now adequately supported by their data.

      The use of a proteasomal inhibitor adds interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

      According to the reviewers’ comments, we made the following minor changes:

      As suggested by reviewer 1, we have extended the discussion of the results related to the analysis of the ubiquitination pattern by Western blot analysis as follows: “Proteasome inhibition blocked amastigote-to-promastigote differentiation, without inducing rapid global accumulation of ubiquitinated proteins (Figure S7C, upper panel) consistent with a quiescent-like state and low basal ubiquitin–proteasome system activity in amastigotes. After 18 h, ubiquitination levels remained similar to untreated cells, indicating that protein turnover and ubiquitin accumulation are primarily driven by developmental remodeling rather than acute proteasome inhibition. In promastigotes, the lack of detectable change (Fig. S7C, lower panel) may also reflect high basal ubiquitination, engagement of compensatory pathways such as autophagy, and/or only partial proteasome inhibition.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      - Supplementary figure 3 is not referenced in the main text.

      - The authors removed the "infinite" sign from figures 3 and 4 to better present the data according to their chosen approach to missing values when LFQ=0. However, the sign is still present in the respective figure legends, please adjust.

      Supplementary Figure 3 (Figure S3) is now referenced in the main text as requested.

      The "infinite" sign has been removed from the legends of Figures 3 and 4 as requested.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum, an organism that unusually shows an acristate mitochondrion during the asexual part of its life cycle and then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins, look for timing of expression during the parasite life cycle, and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes singly and in parallel and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using EM techniques, they show that the morphology of gametocyte mitochondria is abnormal in the knockout lines, although there is great variation.

      Major comments:

      The manuscript is interesting and is an intriguing use of a well studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail in areas around methodology and statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates and the statistical tests and data presentation are not clear enough to conclude the reduction in transmission that is claimed. Perhaps this could be improved with clearer text?

      We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.

      To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.

      Regarding mosquito experiments, while we indeed reported a reduction in transmission and oocyst numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence “with the transmission reduction of [numbers]….” and we included the sentence “The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers“.

      More specific comments to address:

      Line 101/Fig1E (and figure legend) - What is this heatmap showing? It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide information.

      We added the information “high molecular mass gels with lower acrylamide percentage” to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).

      Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)?

      Please clarify.

      We thank the reviewer for pointing this out – this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and MitoTracker in U-ExM. The figure annotation has now been corrected.

      Figure 2C - what is the statistical test being used (the methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution." but what test is this?)?

      The statistical test is now included in the Materials and Methods section with the sentence “The fitted model was used to obtain estimated means and contrasts and were evaluated using Wald Statistics”. The test is now also mentioned in the figure legend.
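
      For readers unfamiliar with the term, a Wald statistic for a single model contrast is the estimate divided by its standard error, compared against a standard normal distribution. The sketch below is a generic illustration using only the Python standard library; it is not the authors' analysis code, it does not fit the mixed model itself, and the function name is hypothetical.

```python
import math
from typing import Tuple


def wald_z_test(estimate: float, std_error: float) -> Tuple[float, float]:
    """Return the Wald z statistic and two-sided p-value for one contrast.

    Assumes the estimate is approximately normally distributed, which is the
    usual large-sample justification for a Wald test.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = estimate / std_error
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

      In a negative binomial model with a log link, the estimate passed in would typically be a log ratio of mean oocyst counts between two lines, with the standard error taken from the fitted model.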

      Also the choice of a log10 scale for oocyst intensity is an unusual choice - how are the mosquitoes with 0 oocysts being represented on this graph? It looks like they are being plotted at 10^-1 (which would be 0.1 oocysts in a mosquito which would be impossible).

      As the data spans three orders of magnitude with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we went with a standard approach to handle 0s in log transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log-scaled y-axis and relabelled the lowest tick as ‘0’. This ensures that mosquitoes with zero oocysts are shown along the x-axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
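
      The separation described here, between where zero counts are drawn and what value enters the statistics, can be sketched as follows. This is a minimal illustration with hypothetical function and parameter names, not the authors' plotting code; the display floor is an arbitrary choice that only needs to lie below the smallest positive count.

```python
from typing import List


def log_axis_positions(counts: List[int], floor: float) -> List[float]:
    """Return y-positions for a log-scaled axis.

    Zero counts are placed at `floor`, which is meant to sit below an axis
    break and carry the tick label '0'. The substituted value is for display
    only and must never be used in the statistics.
    """
    positive = [c for c in counts if c > 0]
    if floor <= 0 or (positive and floor >= min(positive)):
        raise ValueError("floor must be positive and below the smallest positive count")
    return [c if c > 0 else floor for c in counts]


def mean_count(counts: List[int]) -> float:
    """Summary statistic computed on the true counts, zeros included."""
    return sum(counts) / len(counts)
```

      Keeping the two functions separate makes it harder for the placeholder value to leak into the model fit.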

      Figure 2D - it is great that the data from all feeding replicates has been shared; however, it is difficult to conclude any meaningful impact on transmission with the knock-out lines when there is so much variation and so few mosquitoes dissected for some datapoints (10 mosquitoes is a very small sample size). For example, Exp1 shows a clear decrease in mic19- transmission, but then Exp2 does not really show as great an effect. Similarly, why does the double knock-out have better transmission than the single knock-outs? Surely there would be a greater effect?

      We agree with the reviewer and with the new sentence added, as per major point, we hope we clarified the concept. Note that original Figure 2D has been moved to the supplementary information, as per minor comment of another reviewer.

      Figure 3 legend - Please add which statistical test was used and the number of replicates.

      Done

      Figure 4 legend - Please add which statistical test was used and the number of replicates.

      Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.

      Figure 5C - the 3D reconstructions are very nice, but what does the red and yellow coloring show?

      Indeed, the information was missing. We added it to the figure legend.

      Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages."

      How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?

      Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.

      Furthermore, even though we do not discuss this in the article, we are aware of mitochondria-targeting drugs that are known to block mosquito transmission. We want to point out that it is difficult to distinguish the disruption of the ETC, and therefore an impact on energy conversion, from the impact on the essential pathway of pyrimidine synthesis, which is highly relevant in microgamete formation. Still, a recent paper from Sparkes et al. 2024 showed the essentiality of mitochondrial ATP synthesis during gametogenesis, so it is very likely that mitochondrial energy conversion is highly relevant for transmission to the mosquito.

      Reviewer #1 (Significance):

      This manuscript is a novel approach to studying mitochondrial biology and does open a lot of unanswered questions for further research directions. Currently there are limitations in the use of statistical tests and detail of methodology, but these could easily be addressed with a bit more analysis or better explanation in the text.

      This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research.

      My expertise is in Plasmodium cell biology.

      We thank the reviewer for the praise.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Major comments:

      (1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as the conformation of the dimers and rows modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed, and likely dysfunctional (see below). Thus, I do not agree with the title that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.

      We thank the reviewer for taking the time to review our manuscript.

      Based on the reviewers’ interpretation we conclude the title does not come across as intended. We have changed the title to: “The role of MICOS in organizing mitochondrial cristae in malaria parasites”

      The Discussion section starting from line 373 also suffers from overinterpretation as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) compared to the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate on such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating neither is epistatic.

      We do agree with the reviewer’s notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.

      The following paragraphs (line 387 to 422) continue with such unnecessary overinterpretation to the point that it is confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript and should be shortened. Yes, maybe there are other putative MICOS subunits that may linger in the KOs that are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.

      We shortened this paragraph.

      (2) While the authors went to impressive lengths to detect any effect on lifecycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration, or membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have and not resorting to complexome profiling (although this would be definitive if it is feasible):

      i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.

      Interesting suggestion. As our staining and imaging conditions are suitable for such analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset which we collected for Figure 3. We did not, however, detect any difference in MitoTracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary figure S6.

      ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.

      While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.

      iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.

      While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.

      To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.

      I will also conclude that complexome/shotgun proteomics may be a useful tool also for identifying other putative MICOS subunits by determining if proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.

      (3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus have concluded that these are truly acristate. This can very well be true, or there can be immature cristae forms that evaded detection at the resolution they used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are pretty small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be even smaller still. Minute levels of sampled complex III and IV plus complex V dimers in ABS that were detected previously by the authors by complexome profiling would argue for the presence of minuscule and/or very few cristae.

      I think that the authors should hedge their claim that ABS is acristate by briefly stating that there still is a possibility that minuscule cristae may have been overlooked previously.

      We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer’s point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing ‘fully acristate’ to ‘acristate’.

      This brings me to the claim that Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag; a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed, but below the expression limits of the assay, as the protein exhibits low expression levels when mitochondrial activity is upregulated.

      We agree with the reviewer that the absence of a detectable epitope-tag signal does not definitively exclude low-level expression, and we have therefore replaced the term ‘absent’ with ‘undetectable’ throughout the manuscript. In context with previous findings of low-level transcripts of the proteins in a study by Lopez-Berragan et al. and Otto et al., we also added the sentence “The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.” to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall within the <25th percentile, suggesting that these low values likely represent background signal rather than biologically meaningful expression. This interpretation is further supported by proteomic datasets in PlasmoDB, which report PfMIC19 and PfMIC60 expression in gametocyte and mosquito stages, but not in asexual blood stages.

      To address this point, the authors should determine if mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack either transcript. RT-qPCR using polyT primers can be employed to detect these transcripts. If the levels of these mRNAs in WT ABS are equivalent to those in the dKO, the authors can make a pretty strong case for the absence of cristae in ABS.

      We appreciate the reviewer’s suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.

      They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).

      Searching for the CX9C motifs is a valuable suggestion. In response to the reviewer's suggestion we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1 F).

      Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.

      In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.

      (5) Statistical significance. Sometimes my eyes see population differences that are considered insignificant by the statistical methods employed by the authors, e.g. Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods such as Student's t-test for pairwise comparisons?

      The graphs in figures 3, 4 and 5 got a makeover, such that they are now shown in linear scale as violin plots (also following a suggestion from further down in the reviewer’s comments). We believe that this improves interpretability. ANOVA was kept as the statistical test to ensure correction for multiple comparisons, which cannot be performed with a standard t-test. A full overview of statistics and exact p-values can also be found in the newly added supplementary information 2.
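
      The concern about uncorrected pairwise tests can be made concrete with a short calculation. The sketch below is a generic illustration, not the authors' analysis: it assumes the comparisons are independent, the function names are hypothetical, and Bonferroni is shown only as the simplest correction, since the response does not state which post hoc correction accompanied the ANOVA.

```python
def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-test significance threshold that keeps the familywise rate <= alpha."""
    return alpha / n_comparisons
```

      With three mutant lines each compared to WT at a per-test level of 0.05, `familywise_error_rate(0.05, 3)` gives roughly 0.14, which is why a procedure that corrects for multiple comparisons is the more conservative choice.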

      Minor comments:

      Line 33. Anaerobes (e.g. Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria.

      We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.

      Line 56: Unclear what authors mean by "canonical model of mitochondria"

      To clarify we changed this to “yeast or human” model of mitochondria.

      Lines 75-76: This applies to Mic10 only

      We removed the “high degree of conservation in other cristate eukaryotes” statement.

      Line 80: Cite DOI: 10.1016/j.cub.2020.02.053

      Done

      Fig 2D: I find this table difficult to read. If the authors keep the table format, at least get rid of the 'mean' column, as this data is better depicted in 2C. I suggest depicting this data as in 3B, showing the proportion of infected vs uninfected flies in all experiments, and then moving the modified table to the supplement. It is important to point out that experiment 5 appears to be an outlier, with reduced infectivity across all cell lines, including WT.

      To clarify: the mean reported in the table indicates the mean per replicate while the mean reported in figure 2C is the overall mean for a given genotype that corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided to not include a graph for infected and non-infected mosquitoes as this information would be partially misleading, highlighting a phenotype we argue to be influenced by the strong variability.

      Fig. 3C-G: I feel like these data repeatedly lead to same conclusions. These are all different ways of showing what is depicted in Fig 2B: mitochondria gross morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to supplement and replaced by the beautiful images.

      Thank you for the nice comment on our images. We have now moved part of the graphs to supplementary figure 6 and only kept the Relative Frequency, Sphericity and total mitochondria volume per cell in the main figure.

      Line 180: Be more specific with which tubulin isoform is used as a male marker and state why this marker was used in supplemental Fig S6.

      We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.

      Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.

      To clarify the biological effect that we can conclude from the measurement, we added an explanation about it in the respective section of the results, and we decided to replace the raw results of the plug-in readout with the deduced relative dispersion.

      Line 222: Report male/female crista measurements

      We added Supplementary information 2, which contains the exact statistical tests and outcomes for all presented quantifications as well as a per-sex statistical analysis of the data from figure 4. Correspondingly, we extended supplementary information 2 with a per-sex colour code for the thin section TEM data.

      Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems like the data spread is wider in the double KO. This would also solve the problem with the standard deviation extending beyond 0%.

      We changed this accordingly.

      Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).

      This has been changed accordingly.

      Line 320: incorrect citation. Related to point 1 above.

      Correct citation is now included in the text.

      Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.

      This has been changed accordingly.

      Line 343-345: The sentence and citation 45 are strange. Regarding the former, it is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. About the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Mitochondrial fragmentation is often a physiological response to stress. I would just rewrite so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least note that this could be one of several possibilities.

      The sentence has been substituted following the indication of the reviewer, though we still include the data from human cells, as this has also been shown in Stephens et al. 2020.

      Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.

      Done

      Line 376-377; 'deplete functionality' does not make sense, especially in the context of talking about MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.

      We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.

      Other suggestions for added value

      (1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)?

      While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead comigrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial membrane space bridging (MIB) complex are not mentioned in the manuscript, we decided to not include the information in Author response image 1.

      Author response image 1.

      Reviewer #2 (Significance):

      The manuscript by Tassan-Lugrezin is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask the question whether MICOS is essential for this process. They conclude based on their data that the answer is no, which the authors consider unprecedented. But even if their claim is true that ABS is acristate, this supposed advantage does not really bring any meaningful insight into how MICOS works in Plasmodium.

      First the positives of this manuscript. As has been the case with this research team, the manuscript is very sophisticated in the experimental approaches that are made. The highlights are the beautiful and often conclusive microscopy performed by the authors. Only the localization of Mic60 and Mic19 was inconclusive due to their very low expression unfortunately.

      The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Mitochondrial deformation is tolerated by life cycle stage differentiation, with a modest but significant reduction of oocyst production being observed.

      However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which from the title and focus of the manuscript, is the main goal of the authors.

      In its current form, the manuscript reports some potentially important findings:

      (1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.

      (2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.

      (3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation

      (4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of Plasmodium itself than anything about the essentiality of Mic60, i.e. Plasmodium life cycle progression tolerates defects to mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth in examined conditions does differ from its essentiality in other eukaryotes. But fitness costs were not assayed (e.g. by competition between mutants and WT in infection of mosquitoes).

      (5) Decreased fitness of the mutants is implied by a reduction of oocyst formation.

      While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.

      This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS that sets it apart from other iterations is the apparent absence of the Mic10 subunit. Purification of plasmodium MICOS via the epitope tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in plasmodium.

      Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium.

      This is a salient issue as maybe crista morphology is decoupled from OXPHOS capacity in Plasmodium, which links to the apparent tolerance of mitochondrial morphology in cell growth and differentiation. I suggested in section A experiments to address this deficit.

      Finally, the authors could assay fitness costs of MICOS ablation and associated phenotypes by assaying whether mosquito infectivity is reduced in the mutants when they are directly competing with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation may contribute to Plasmodium's remarkable ability to adapt.

      I realize that the authors put a lot of effort into their study and again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are still better ways to increase the impact of the study aside from overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.

      We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.

      With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for initial formation of cristae (as opposed to organization thereof) is a confirmation of something that has been assumed by the field, without being the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript. We also argue that the finding that MICOS is not involved in the initial formation of cristae is relevant to Plasmodium biology and beyond. As for the insights into how MICOS works in Plasmodium, we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin section and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      We thank the reviewer for their time and compliment.

      Major comments:

      (1) The authors should do more to present their findings in the right context, in particular by:

      i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      We extended the introduction to include this information.

      iii) at the end of the introduction, the motivating hypothesis is formulated ad hoc: "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      To clarify we rephrased the sentence to: “Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated.”

      (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum but this is not compared to the expected length or the size in S. cerevisiae.

      To solve the reference issue, we added the uniprot IDs we compared to see that the annotated ORF is bigger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast all throughout the figure, but then talk about human in this specific sentence.

      Regarding whether the true N-terminus is known. Short answer: No, not exactly.

      However, we do know that the Pf version is about double the size of the yeast protein.

      As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.

      To clarify, we now emphasize that we ran the AlphaFold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chapell et al., now cited in the manuscript), and revised the wording to make clear what we are comparing in which sentence.

      (3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins?". The authors should explain which data they are referring to, as some of the data in Fig 3 and 4 look similar and all significance tests relate to the wild type, not between the different mutants, so it is not clear if any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      In reply to this and other comments from the reviewers, we added multiple testing across all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests used.

      (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      We deleted this statement.

      (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      This sentence has been removed.

      (6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      This sentence has been deleted in the revised version of the manuscript.

      Minor comments:

      (1) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      Title is changed accordingly

      - Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.

      Done, the paper is now cited

      - Page 2, line 58: for a more complete picture the authors should also cite the work of others here which shows that although at very low levels, e.g. complex III (a drug target) and ATP synthase do assemble (Nina et al, 2011, JBC).

      Done

      - Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.

      The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.

      - Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.

      We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).

      - Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked as low/very low. So, its existence is unknown, not just its "function".

      We adapted the domain description to “a stack of two parallel beta-sheets" and replaced the statement on unknown function by the statement “Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown.”

      - Fig 1B: The authors show two AlphaFold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is however an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible.

      We appreciate the reviewer’s suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full-length proteins, we believe that including fragment-based structures would be less informative in this context.

      - Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods: how well was the antibody working in their hands, and how do they interpret the absence of a WB band compared to transcriptomics data?

      The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall below the 25th percentile, suggesting that these signals likely represent background. Nevertheless, we acknowledge that low-level protein expression below the detection limit of western blot analysis cannot be excluded. To reflect these considerations, we added the sentence: ‘The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.’

      - Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.

      Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondria), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases accessibility for the antibodies. However, U-ExM comes at the cost of being prone to dotty background signals, therefore potentially hiding low abundance, naturally dotty signals such as the signal of MICOS proteins that localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we do not have experience with, nor access to, this particular technique/method.

      - Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).

      The limitations of other methods are described in the respective results section.

      We added a clarifying sentence in the results section of Figure 4:

      “Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae.”

      This statement refers to the length/width measurements of cristae.

      In the context of Figure 4D we mention the following (see preprint lines 229 – 230): “We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19, PfMIC60, or both.”

      For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 – 273): “Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range.”

      - Line 404: perhaps undetected or similar would be a better description than "hidden"?

      The sentence does not exist in the revised manuscript.

      Reviewer #3 (Significance):

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle and defects in infection effectivity are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism. The limitation of the study stems from what is already known about MICOS and its subunits in great detail in yeast and humans with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight on MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is however important given the divergence found in mitochondrial biology and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging as the morphological defects are diverse and the fitness defects appear moderate/mild.

      As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest for specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the role of the insulin receptor and the insulin growth factor receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, with proteinuria and glomerulosclerosis developing over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines where both receptors were deleted. Loss of both receptors was highly deleterious with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.

      This is primarily a descriptive study and no technical concerns are raised. The mechanism of how insulin and IGF1 signaling regulates splicing is not directly addressed, but the findings potentially implicate phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Direct measurement of GFR and serial serum creatinines might also enhance our understanding of progression of disease; proteinuria is a strong sign of renal injury. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful but may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      Significance:

      With the GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that Insulin and IGFR signaling play important roles in other cells of the kidney. So, there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.

      Comments on revised version:

      I'm satisfied with the revised manuscript and the responses to my previous concerns.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mice. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long read RNA sequencing was used to assess splicing events. Podocyte transcripts manifesting intron retention were identified. Dual knock-out podocytes manifested more transcripts with intron retention (18%) compared with wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well-written. Mouse work was approved in advance. The four figures are generally well-designed, with bars and superimposed dot plots.

      Methods are generally well described.

      Comments on revised version:

      Coward and colleagues have done an excellent job of responding to all the reviewer comments.

      Thank you.

      Reviewer #4 (Public review):

      Summary and background:

      This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. Insulin/IGF signaling system in mammals evolved as three gene reduplicated peptides (insulin, IGF-1, and IGF-2) and their two receptors IR and IGF1R that cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor and metabolic mTORC1-mediated and other processes including the splicing of RNA for protein synthesis.

      Thank you for this additional review and for assessing our paper with new suggestions (we addressed the previous suggestions to the satisfaction of the other reviewers). Of note, regarding this introduction, the podocyte is a terminally differentiated cell and may have unique responses to insulin / IGF, as it is accepted that it does not generally proliferate (hence we consider understanding the actions of insulin / IGF and their receptors to be of interest). Indeed, we have recently shown a contrasting effect of IGF signalling in the podocyte: partial suppression of the IGF1 receptor is beneficial, in contrast to near-complete suppression, which results in mitochondrial dysfunction (PMID:38706850).

      Mouse IR/IGF1R double knockdown model:

      A double knockdown mouse model was generated by interbreeding mice with different genetic backgrounds carrying floxed sites for IR and IGF-1R to produce mixed background offspring with both floxed IR and IGF-1R genes. These mice were crossed so that the podocin promoter-driven Cre (that comes on at about embryonic day 12 as podocytes are developing) would delete IR and IGF-1R genes. Since podocin is believed to be an absolutely podocyte-specific protein, this podocin promoter is predicted to specifically knock down the IR and IGF1R genes only in podocytes. The weight and growth of double KO offspring were not different from controls, but some proportion of the double knockdown mice subsequently developed proteinuria by 6 months and 20% died, although no specific data is provided to identify the cause of the deaths since eGFR was not decreased. Surviving mice were evaluated at 6 months of age. The efficacy of knockdown was not demonstrated in the mouse model itself, although a temperature-sensitive cell line developed from these double knockdown mice showed that expression of IR and IGF-1R proteins in the Cre-treated cell line were both reduced by about 50% (no statistical analysis of this result provided).

      In the knockout mice, proteinuria was significantly increased by 6 months, but not at earlier time points. Histologic analysis showed proteinaceous casts, glomerulosclerosis and interstitial fibrosis. Podocyte number was stated to be reduced by about 30% in double knockdown mice, although the method by which this was evaluated seems to have been by counting WT1 positive nuclei in glomerular cross-sections, an approach that is well-known not to be a reliable way of assessing true podocyte number. No information is provided about podocyte size, density or glomerular volume.

      Comment: If IR/IGF1R deletion plays a significant role in normal podocyte function sufficient to cause proteinuria and glomerulosclerosis then the effect of reduced IR and IGF1R protein expression on podocyte function would have been expected to produce a phenotype before 6 months. A more likely scenario to explain the overall result is that deleting the IR and IGF1R genes at about embryonic day 12 impacted podocyte development to a variable extent such that some mice developed fewer podocytes per glomerulus than other mice. As mice grow and their glomeruli and glomerular capillary area increase, those mice with fewer podocytes would not be able to completely cover the filtration surface with foot processes and would develop proteinuria and glomerulosclerosis. If reduced podocyte number per glomerulus is the proximate cause of the observed proteinuria, then modulation of the body and kidney growth rate by calorie restriction to slow growth (lower circulating IGF-1 levels) would be expected to be protective, while a high protein high calorie diet (higher circulating IGF-1 levels) or uni-nephrectomy to increase kidney growth rate would be expected to enhance proteinuria and glomerulosclerosis.

      Thank you for these comments. In response to them:

      (1) WT1 as a marker of podocyte number. We agree this may not be the most accurate way of precisely measuring podocyte number but it is widely accepted in the field (PMID:33655004 / PMID:38542564) and we think it convincingly shows fewer podocytes at 6 months.

      (2) Podocyte size and density were not measured. This was not the focus of the paper and the histology obviously showed a significant phenotype in several mice (Figs 1D-F). Of note we did objectively assess a glomerulosclerosis index (Fig 1D). We took the approach to understand mechanism through non-biased proteomics and phospho-proteomics of conditionally immortalised podocytes in which we had convincingly knocked down the insulin and IGF1 receptors (Figure 2).

      (3) You did not study the mice earlier to ascertain the developmental phenotype. We concede we did not do this but there was no significant proteinuria detected early in the mice so we elected not to increase mouse numbers by studying them then (which we consider good practice for reduction, replacement and refinement). We suspect there would have been subtle changes in those mice that had significantly reduced simultaneous IR and IGF1R knockdown. It was precisely because of this that we generated a conditionally immortalised podocyte cell line with robust simultaneous knock-down of both receptors.

      (4) You did not show significant insulin and IGF1 receptor knockdown in the conditionally immortalised cell line (reviewer states it was 50%). We clearly knocked both receptors down (insulin and IGF1R) in the podocyte line by >80%, which was highly statistically significant (p<0.00001; Figure 2A). We agree this was crucial (and we made the cell line because of the variability in the mouse model).

      The model as used may be more representative of a variable degree of podocyte depletion than an effect of impaired IR/IGF1R signaling. Therefore, although the phenotype may be ultimately attributable to the IR/IGF1R gene deletions the proteinuria and glomerulosclerotic phenotype itself was probably a consequence of defective podocyte development. Examining podocyte number, size, density and glomerular volume at earlier time points (4 weeks) would help to answer this question. Therefore, a more appropriate title would be "The insulin/IGF axis is critically important (for) normal podocyte development and deployment". In this context the effect of the knockdowns on splicing would make more sense.

      Please see our response (above). We think our final conclusion that in the podocyte the insulin/IGF axis is important for spliceosome activity and control is valid. This is due to our findings (both total and phospho proteomics results) and considering recent other papers showing this axis can rapidly phosphorylate a variety of spliceosome proteins in different cell types (PMID:39939313 / PMID:32888406), all discussed in detail in the manuscript.

      Cell culture studies. A cell line was generated using a temperature sensitive SV40 system that has been previously reported from this laboratory. A detailed analysis is provided to show that double knockout cells exhibited abnormal spliceosome activity. This forms the basis for the conclusion that "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte". There are several concerns that weaken this conclusion.

      (1) In the double knockdown cell culture system about 30% of cells were "lost" by 3 days and about 70% of cells were "lost" by 5 days. The studies were done at the 3-day time point. It is not clear whether "lost" cells were in the process of dying, stress-induced detachment, or just growing more slowly than control due to reduced IR and IGF-1R signaling. These processes could have impacted splicing in a non-specific way independent of IR/IGF1R signaling itself.

      (2) Can a single cell line derived from the double floxed mice be relied on to provide an unbiased picture of the effect of deleting IR and IGF-1R? Presumably, the transfection and selection process will select for cells that survive thereby including unknown biases, possibly related to spliceosome function. Is a single cell line adequate? These investigators have extensive experience with this type of analysis, but this question is not addressed in the discussion.

      (3) To determine whether the effect is specific to reduced IR/IGFR signaling the deletion of IR and IGF-1R could be corrected by transfecting full length IR and IGF-1R cDNAs into the cells to restore normal IR/IGF1R signaling. If restoring intact IR and IGF-1R expression and activity in transfected cells returns spliceosome activity to normal this would be evidence that receptors themselves play some role in spliceosome activity, as opposed to the downstream effect on growth limitation/stress on the cells.

      (4) Other ways of testing whether the splicing effect is specifically due to reduced IR/IGF-1R signaling would be to (a) block IR and IGF1R receptors using available inhibitors, (b) remove or reduce insulin, IGF-1 and IGF-2 levels in the culture medium, (c) use low glucose and amino acid culture medium to slow growth rate independent of receptor function, (d) or block intra-cellular signaling via the IR and IGF-1R receptors through mTORC1 inhibition using rapamycin or other signaling targets.

      (5) It would be useful to determine whether stressing the cultured cells in other ways (e.g. ischemia, toxins, etc.) also results in the same splicing abnormalities.

      Point 1. 70% cell loss was observed at day 7 (not day 5). We found approximately 20% loss at day 3. We opted to go for this early date hypothesising the key detrimental processes would be clear then. This 3-day time point also ensures there has been enough time to allow for the expression of Cre recombinase, receptor gene excision and degradation of existing endogenous IR/IGF1R following lentiviral transduction. Interestingly we did not find a major “death or apoptosis” signal in our data then but agree it should be considered. We think this is a specific pathway as we have examined several other conditionally immortalised detrimental podocyte cell lines previously using proteomics with a much more severe phenotype of cell death (e.g. podocyte GSK3 alpha/beta knockdown) and we detected NO spliceosome signal (PMID:30679422). Furthermore, there are now other podocyte proteomics “stress” studies that have been published in which there is proteinuria and significant cell loss / death that also do not show spliceosome dysfunction. These include studying the detailed proteosomal signature of podocytes stressed with Doxorubicin and Lipopolysaccharide endotoxin LPS in mice (PMID:32047005) and bradykinin stimulation of rat podocytes (PMID:32518694).

      Point 2. Yes, we think it is valuable and reproducible. We generated a podocyte cell line from insulin receptor and IGF1 receptor homozygous floxed cells. Hence there is no selection bias in the cells when generating the line as both receptors are effectively intact. We then temporally “knocked down” the receptors with extrinsic lentiviral Cre.

      Importantly we validated our cell line findings both back in the cells (with Western blotting) and in our transgenic receptor knockdown mice and found evidence of spliceosomal dysregulation (Figure 3E and 3F). Also as discussed above the spliceosome has been identified in other models in the insulin/IGF pathway.

      Point 3. We don’t think the experiment of knocking down the receptors and then reconstituting them would prove this hypothesis. This is because if splicing abnormality was due to generalised cell dysfunction (which we do not think is the case in this situation) then putting the receptors back may simply restore cell health and the spliceosomal function (i.e. it does not prove it is via the receptors). Secondly, the process of transduction with multiple lentiviruses may be inherently stressful to the cell and there may be a high level of extrinsic receptor inserted which may also be confounding/detrimental. Finally, as discussed there are now several lines of evidence describing insulin / IGF signalling to spliceosomal proteins which we consider important (discussed in the paper in detail).

      Point 4. We think modulating the receptors using the Cre-lox approach is the cleanest approach (with fewer off-target effects) to interrogate the insulin / IGF axis. It allows us to differentiate the cells by thermo-switching (which is crucial for this terminally differentiated cell) and then robustly knock down both receptors simultaneously to investigate mechanism. We agree these supplementary approaches may give some extra information if their limitations (e.g. off-target effects of inhibitors) are also taken into consideration.

      Point 5. They do not. Please see response to point 1 above regarding GSK3, Doxorubicin, LPS and bradykinin challenge.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies which allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

      No weaknesses to address.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time on explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?

      Comments on revised manuscript:

      I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., w/o other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

      We agree with the reviewer that we may have advanced into discussing arousal-related effects in the previous version of the manuscript without providing a thorough explanation for why we think the slow drift axis is associated with changes in the monkey’s arousal levels. Arousal has been linked to the size of the pupil as well as movements of the eyes in numerous previous studies. We have made the following changes in the revised manuscript to address the reviewer’s concern:

      (1) When first describing how the spiking responses of SC neurons fluctuate over the course of a recording session (Lines 130-132), we have used the phrase "slow fluctuations in the spiking responses" rather than "arousal-related fluctuations in the spiking responses". Then, when describing these effects in more detail (Lines 136-147), we have explained why we think these fluctuations may be related to arousal. The following text has been added in the revised manuscript for clarification:

      “We found that this low-dimensional pattern of activity in the SC was also correlated with pupil size in the present study and with simultaneously recorded data in the prefrontal cortex (PFC), pointing to a link between this brain-wide fluctuation and changes in the monkeys’ arousal levels while performing the task.” (Lines 136-147)

      (2) We have changed the subheading in Line 183 of the revised manuscript from "Arousal-related fluctuations are present in the SC and correlated with pupil size and fluctuations in PFC activity" to "Slow fluctuations in SC spiking activity are correlated with pupil size and PFC activity". Given that we have not yet explained the results linking these fluctuations to arousal at this stage of the manuscript, we believe that this revised title is more accurate and avoids jumping too quickly to arousal-related fluctuations without first explaining the link between SC slow drift, pupil size and PFC activity.

      (3) We have provided additional justification for using pupil size and PFC activity to assess whether SC slow drift is associated with changes in the monkeys’ arousal levels. In a previous study, we computed an identical slow drift axis for spiking responses in visual cortex (V4) and PFC, and investigated how these low-dimensional neural activity patterns, which were themselves strongly correlated, were associated with various eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). Results showed that pupil size was the strongest predictor of slow drift in V4 and PFC. Given that the eye metrics were also strongly correlated with each other, we believe that the observed relationship between SC slow drift, pupil size and PFC activity provides sufficient evidence to suggest that the fluctuations observed in the SC are arousal-related. The following text has been added to the Results section of the revised manuscript:

      “Moreover, previous work in our laboratory computed a similar slow-drift axis using spiking activity in visual cortex (V4) and PFC, and investigated the relationship between these low-dimensional neural activity patterns and different eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). In addition to observing a strong correlation between V4 and PFC slow drift, we found that, relative to the other eye-related metrics, pupil size was the strongest predictor of these fluctuations (Johnston et al., 2022a). Thus, to further confirm the link between the SC slow drift axis and changes in the monkeys’ arousal levels while they performed the MGS task, we next sought to explore if projections onto the SC slow drift axis were associated with pupil size.” (Lines 236-344)

      Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity like in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the neurons with lower motor index of the SC. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.

      Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.

      I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.

      Comments on revised manuscript:

      The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide clearer details below:

      In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.

      We thank the reviewer for this comment and apologize that the citation to Baumann et al., PNAS, 2023 was missing in the previous version of the manuscript. In addition to including this citation in the revised version, we have provided a much more comprehensive description of all three cited studies and clarified that, in addition to replicating the results of Jagadisan and Gandhi, Baumann et al., PNAS, 2023 showed that the subspaces for the visual and motor epochs are orthogonal to each other. The following lines have been added to the Introduction of the revised manuscript:

      “A similar separation has been observed for visual and motor responses in the SC (Jagadisan and Gandhi, 2022; Ayar et al., 2023; Baumann et al., 2023). For example, Jagadisan and Gandhi (2022) used linear microelectrode arrays to investigate why early eye movements are not triggered when neuronal responses to a visual target, presented before a delayed saccade to that target, cross a threshold. They found that population activity in the SC was less stable during the visual epoch of a delayed saccade task, relative to the saccade epoch. Moreover, saccades could be evoked more easily by patterned microstimulation when the temporal structure of the microstimulation was stable across electrodes, providing a potential explanation for how downstream regions differentiate between visual and motor responses. Similar results were reported by Baumann et al. (2023) who found that the strength of SC motor responses during a saccade to a visual image depends on the features of that image (e.g., contrast, orientation). When dimensionality reduction was applied to the spiking responses of neuronal populations in the SC, the population trajectory during the initial visual response to the image was orthogonal to that during the motor response. These findings replicate the separation in temporal population structure reported by Jagadisan and Gandhi (2022) and support the results of Ayar et al. (2023). They found that, although not completely orthogonal, population activity in the SC is distinct for visual and motor responses during the same oculomotor task and across different tasks, which could further facilitate the decoding of signals related to sensation, action and context by downstream regions.” (Lines 110-127)

      Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye also changes from the gray background. Thus, visually-responsive neurons will have a different amount of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the higher effects seen in the more visual neurons.

      We apologize that our analysis did not fully address the reviewer’s concern that the presence of fluctuations in visual neurons and their absence in motor neurons may have arisen indirectly due to changes in the amount of light entering the eye caused by changes in pupil size. As per the reviewer’s suggestion, we have now raised the possibility that visual neurons in the SC may have firing rates that are monotonically related to slow trends in overall luminance induced by pupil size changes, whereas motor neurons do not. Although we believe this to be an unlikely explanation, the paragraph from lines 374-398 has been modified to better describe this possibility, including the following text:

      “Given that slow drift is found in traditionally defined visual areas (e.g., area V4) and in regions that show mixed selectivity for multiple task variables (e.g., PFC) (Cowley et al., 2020), it seems unlikely that slow drift is caused by luminance fluctuations alone and more likely that it reflects global changes in arousal. At the same time, these arousal-related fluctuations covary with changes in pupil size (Johnston et al., 2022a), which could modulate the amount of light entering the eye from the display. This might affect visual neurons but not motor neurons due to their lack of visual sensitivity. Because SC neurons exist on a continuum, with visual responses decreasing and motor responses increasing from the intermediate to deep layers (Massot et al., 2019; Heusser et al., 2022) and no clear categorical boundary for motor-only neurons, any readout strategy would still need to avoid corruption of the motor output by slow drift, even if it were caused by changes in the amount of light entering the eye.” (Lines 387-398)

      The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.

      We thank the reviewer for bringing this to our attention. We believe this issue may have arisen during conversion of the manuscript file for review, as the figures were of sufficient quality and the equations visible in the version that appeared online (https://doi.org/10.7554/eLife.99278.2). In any case, we will ensure that high-resolution figures are submitted with the revised manuscript and apologize that they were low resolution in the previous version.

      I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? i.e. how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.

      We agree that clarification is needed here and thank the reviewer for their comment. The eccentricity of the targets was set to match the endpoints of the evoked saccades, which for some sessions were relatively close to the fovea. The mean eccentricity of the targets across sessions was 4.52° (SD = 2.89°). These values are now reported in the Methods section of the revised manuscript (Line 637). For the neuron shown in Figure 2–figure supplement 2, the eccentricity of the targets was 3°. Previous research has shown that some SC neurons respond during microsaccades as well as slightly larger saccades (see Hafed & Krauzlis, 2012, J. Neurophysiol., Fig. 4B). This likely explains why the neuron shown in Figure 2–figure supplement 2, which had a receptive field at ~3° based on saccades evoked by microstimulation, also responded during microsaccades. We apologize that this was not explained in the previous version and agree that it could have been confusing for the reader. To address this, the legend for this supplementary figure has been edited in the revised version and now reads:

      “(B) PSTH for an SC neuron that responded around the time of a microsaccade. Firing rates were computed in 1ms bins, averaged across trials and smoothed using a Gaussian function (σ = 5ms). Note that the targets were set to 3° in this session based on saccades evoked by microstimulation (see Methods). Previous research has shown that some SC neurons respond during microsaccades as well as to slightly larger saccades (Hafed and Krauzlis, 2012). This likely explains why this SC neuron, which had a RF at ~3° based on saccades evoked by microstimulation, also responded around the time of a microsaccade.” (Lines 1026-1031)
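
      For readers unfamiliar with the PSTH described in this legend, the computation (1 ms bins, trial averaging, Gaussian smoothing with σ = 5 ms) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' analysis code: the function name `psth`, its parameters, the event-aligned spike-time input format, and the truncation of the kernel at ±4σ are all assumptions made here for the example.

```python
import numpy as np

def psth(spike_times_per_trial, t_start, t_stop, bin_ms=1.0, sigma_ms=5.0):
    """Trial-averaged firing rate (spikes/s) in fixed bins, Gaussian-smoothed.

    Hypothetical helper: spike_times_per_trial is a list of arrays of spike
    times in ms, each aligned to the event of interest (e.g. microsaccade onset).
    """
    # Bin edges covering [t_start, t_stop] in steps of bin_ms
    edges = np.arange(t_start, t_stop + bin_ms, bin_ms)
    counts = np.zeros(len(edges) - 1)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    # Average across trials and convert counts per bin to spikes per second
    rate = counts / len(spike_times_per_trial) / (bin_ms / 1000.0)
    # Gaussian kernel truncated at +/- 4 sigma (an assumption), unit area
    half = int(np.ceil(4 * sigma_ms / bin_ms))
    x = np.arange(-half, half + 1) * bin_ms
    kernel = np.exp(-0.5 * (x / sigma_ms) ** 2)
    kernel /= kernel.sum()
    # mode="same" keeps one smoothed value per bin; edges are zero-padded
    return edges[:-1], np.convolve(rate, kernel, mode="same")
```

      Calling `psth(trials, -100.0, 100.0)` would return the left edge of each 1 ms bin together with the smoothed rate. Because the kernel is normalised to unit area, smoothing redistributes the rate across neighbouring bins without changing its total, except near the ends of the window where the zero padding applies.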

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target, but additional clarification regarding analyses and implications for vision and oculomotor control would broaden the impact of the study.

      We thank the editors and reviewers for their thorough evaluation of our work. We have carefully revised the manuscript and substantially reworked the Discussion to address all of the points raised, eliminate redundancies, streamline the text, and clarify the implications of our findings for vision and oculomotor control. We have also expanded the documentation of our power analyses and conducted the additional analyses requested by the reviewers. Our point-by-point responses are provided.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to midrange spatial frequencies (4-8 cycles/degree), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.

      Strengths:

      The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.

      Weaknesses:

      The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.

      We thank the reviewer for the helpful comments. In the Discussion, we have now considered additional factors that could have contributed to the observed attentional effects. First, the exogenous cue might have functioned as a temporal warning signal. However, the interval between cue and stimulus onset was fixed across trials, meaning that the cue did not provide temporal information beyond what participants could already anticipate. Furthermore, participants completed a large number of trials (≥ 4000), making it highly likely that the temporal relationship between trial onset and target onset was overlearned. These considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions.

      Another possibility is that the 100% validity of the exogenous cue could potentially have promoted endogenous attentional engagement. Yet, several characteristics of our task strongly limited the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to the observed attentional benefits in our task.

      Regarding the points on statistical reporting and participant details, we followed the reviewer’s suggestions by adding post hoc power analyses and providing more comprehensive reporting of the linear model outputs (see Appendices 1 and 2). We also expanded the description of the training procedures conducted with participants prior to formal data collection in the Methods section.

      We thank the reviewer for raising the important question of how our findings may relate to oculomotor control. To address this, we analyzed trials excluded from the manuscript due to saccades. This analysis revealed that saccade latencies were shorter in the valid condition than in the neutral condition (see Figure 2 — Supplementary Figure 2). This earlier saccade onset may reflect exogenously triggered preparatory activity in the oculomotor system in response to the salient cue. Future studies are needed to examine whether this preparatory mechanism serves to efficiently guide microsaccades or saccades toward behaviorally relevant stimuli in everyday vision. We have incorporated this point into the Discussion, highlighting a potential mechanistic link between exogenous attention and oculomotor behavior.

      Reviewer #2 (Public review):

      Summary:

      This study aims to test whether foveal and non-foveal vision share the same mechanisms for endogenous attention. Specifically, they aim to test whether they can replicate at the foveola previous results regarding the effects of exogenous attention for different spatial frequencies.

      Strengths:

      Monitoring the exact place where the gaze is located at this scale requires very precise eye-tracking methods and accurate and stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal vision and non-foveal vision, adding more data supporting this parallel.

      Weaknesses:

      The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.

      We thank the reviewer for raising these important issues. In response, we have expanded the Discussion to link our findings to prior work. First, we included a direct comparison of our effect sizes with those reported in previous studies. This analysis revealed that our effect sizes are highly comparable to those of earlier studies (see Figure 3 — Supplementary Figure 4). Second, we contextualized our findings within the normalization model of attention in the Discussion. We detected a mixture of contrast and response gain effects, consistent with predictions from the normalization framework given our experimental design. Finally, we extended the Discussion to consider potential underlying neural mechanisms. Specifically, we suggested that differences in attentional modulation, particularly the manifestation in response gain vs. contrast gain between the fovea and extrafovea, may reflect distinct characteristics of foveal neurons relative to those in extrafoveal regions.

      Reviewer #3 (Public review):

      Summary:

      This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.

      Strengths:

      The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.

      Weaknesses:

      The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.

      We thank the reviewer for this comment. Our Methods section continues to transparently discuss these limitations, as well as the fact that these limitations are shared with most published studies in psychophysics. Additionally, we now include measures of uncertainty for all key effects (see Appendices 1 and 2), and we have reported effect sizes throughout the Results section. Finally, we have added post hoc power analyses to the Methods. Following previous approaches to power calculation for related experiments, we found that our study was sufficiently powered to detect the main effect of attention and had moderate power to detect the interaction between attention and spatial frequency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manipulation of attention raises some interpretive concerns. Since only valid and neutral cue conditions were included, the results might reflect differences in temporal predictability rather than true spatial reorienting of attention. In other words, the valid cue could act mainly as a temporal warning signal that reduces uncertainty about stimulus onset. Without invalid trials or a non-predictive control cue, it remains difficult to separate spatial and temporal contributions to exogenous attention.

      We thank the reviewer for raising this point. We would like to clarify that there was no temporal uncertainty in stimulus onset: across all conditions and trial types, the stimulus was presented at the same time relative to the start of the trial, i.e., 600 ms after the start. Yet, we acknowledge that the close temporal proximity between the cue and stimulus in valid trials could serve as an additional temporal warning signal, potentially conferring an advantage relative to the neutral condition. While we cannot completely rule out a contribution of such temporal cueing within the constraints of the current experimental design, we believe its impact was limited. Specifically, the fixed cue-stimulus interval reduced the cue’s ability to convey additional temporal information. Furthermore, observers completed a large number of trials (≥4000), and the temporal contingency between trial onset and target onset was likely overlearned. Taken together, these considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions. We now mention this in the revised Discussion (lines 309-318).

      We recognized that the original Figure 2 illustrating the experimental paradigm may have caused confusion regarding the timing structure of the task. We have therefore updated the figure to more explicitly illustrate the trial timeline in both conditions.

      (2) The reported effects seem small, and no power analysis is provided. With only seven participants, the study may not have enough statistical power to confirm that the observed differences are reliable or generalizable. Although the technical precision in gaze and stimulus control is impressive, it cannot offset the limitations of a small sample. The authors should include effect size estimates, confidence intervals, and ideally a post-hoc power analysis.

      The statistical results are reported only as χ² values from model comparisons, which do not show the direction or size of the effects. For clarity and transparency, these tests should be accompanied by fixed-effect estimates with their standard errors and confidence intervals, so readers can better assess both the reliability and perceptual relevance of the findings.

      The reviewer raised several important points regarding the study's statistical rigor.

      In the revised manuscript, we now report effect size estimates (Cohen’s d) in the Results section and Appendices. Effect sizes were in the medium-to-large range, including the effect of attention on contrast sensitivity at 4 and 8 CPD, and the difference in attentional benefit on contrast sensitivity between 4 and 12 CPD and between 8 and 12 CPD. We have also included the full model outputs, including standard errors and confidence intervals, in the Appendices.

      The sample size for the current study was determined based on the magnitude of the attentional effects observed in our previous work (Guzhang et al., 2021). The experimental design and dependent measures were highly similar across the two studies, and the prior study revealed a robust effect, which accounted for a substantial proportion of within-observer variance in a tightly controlled repeated-measures design.

      We have revised the manuscript, adding bootstrap-based power estimates, following the procedure described by Jigo and Carrasco (2020), using data from Guzhang et al. (2021). Assuming the effect size in our current study would be comparable to the prior one, 2 to 12 observers were randomly sampled with replacement, and a one-way repeated-measures ANOVA with attention as the main factor was used. This procedure was repeated 10,000 times, and power was estimated as the proportion of iterations yielding a significant main effect for each sample size. The results of this analysis indicate that a sample size of five observers would have been sufficient to achieve approximately 80% power to detect the main effect of attention in the prior study. Based on these estimates, the sample size used in the current study (seven observers) is adequately powered.
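
      The resampling logic of this bootstrap power estimate can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure's actual code: it assumes attention has only two levels (valid vs. neutral), in which case a one-way repeated-measures ANOVA is equivalent to a paired t-test on per-observer differences (F = t²). The function name `bootstrap_power` is hypothetical, significance is assessed against tabulated two-sided critical t values at α = 0.05, and resamples with zero variance are conservatively counted as non-significant.

```python
import math
import random
import statistics

# Two-sided critical t values at alpha = 0.05, keyed by degrees of freedom
# (covers sample sizes of 2 to 12 observers).
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def bootstrap_power(effects, n_observers, n_iter=10000, seed=0):
    """Estimate power by resampling per-observer attention effects.

    effects: per-observer differences (valid minus neutral) from a prior
             data set; any values passed in here are the caller's data.
    n_observers: sample size to evaluate (2 to 12).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # Draw observers with replacement.
        sample = [rng.choice(effects) for _ in range(n_observers)]
        sd = statistics.stdev(sample)
        if sd == 0.0:
            # Degenerate resample: treated as non-significant (assumption).
            continue
        t = statistics.mean(sample) / (sd / math.sqrt(n_observers))
        if abs(t) > T_CRIT_05[n_observers - 1]:
            hits += 1
    # Power is the proportion of resamples with a significant effect.
    return hits / n_iter
```

      Calling the function for each sample size from 2 to 12 gives a power curve, from which the smallest sample size reaching a target power (for example 80%) can be read off.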

      We also conducted a post hoc power analysis to evaluate the power of our design to detect the main effects and their interaction. It was performed using the R package simr, which estimates statistical power for mixed-effects models through model-based simulation. Specifically, simr generated datasets based on the fixed- and random-effect structure of the fitted model, preserving the observed effect sizes and variance components. For each simulated dataset, the model was refit, and the effect of interest was tested. By repeating this procedure 501 times across different sample sizes, power was estimated as the proportion of simulations in which the effect was statistically significant. Based on these post hoc simulations, we estimated that our study had high power (>95%) to detect the main effects and moderate power (>65%) to detect the interaction. Although the estimated power for the interaction was lower than for the main effects, the observed effect size was substantial (as indexed by Cohen’s d), indicating that the interaction was not trivially small.

      We now describe these analyses in lines 501-532 in the Methods section.

      (3) The task seems quite demanding, requiring fine spatial discrimination, very small stimuli, and head stabilization with a bite bar. It is not clear whether participants were naïve or experienced observers. If they had prior psychophysical training, practice effects could have influenced the results, particularly given the lack of invalid trials. The manuscript would benefit from clarifying participants' experience level and describing any training or familiarization procedures.

      We appreciate the reviewer’s concern regarding potential training effects. All observers had prior experience with similar tasks, but were naïve to the scope of this study. Each participant underwent an initial familiarization phase of approximately 50 trials with the experimental setup of this study. They then completed an additional ~50 trials to estimate their individual contrast thresholds per spatial frequency level before we proceeded with data collection at the five predefined contrast levels.

      Based on our experience, we have found that, for experiments similar to the one described here, observers quickly adapt to the setup and are generally able to maintain reliable fixation and stable performance, even during the initial training phase. In addition, each participant completed approximately 400 trials before the data collection started. Even observers who began the session with no prior experience would have become practiced with the setup by the time the actual data-collection phase started, during which ~4000 trials were collected per observer. Therefore, whether an observer participated in previous experiments is unlikely to meaningfully affect the results, as the large number of trials ensures comparable levels of task familiarity across individuals.

      Crucially, valid and neutral trials were interleaved throughout the session. Any general learning or practice would therefore influence both conditions equally. Despite this, we still observed clear performance improvements in the valid condition relative to the neutral condition, indicating that the observed benefits cannot be attributed solely to practice and reflect an attentional enhancement. We have elaborated on the training procedures in the Methods (lines 411-429).

      Finally, we recognize that the lack of invalid trials may raise concerns given our 100% spatially predictive cue, as noted in Reviewer 3’s first comment. We refer the reader to our response to that point for a more detailed discussion of cue validity and the distinction between exogenous and endogenous influences in our paradigm.

      (4) The study would benefit from a clearer connection between the behavioral results and possible underlying neural mechanisms. How might the observed changes in contrast sensitivity relate to known physiological processes at the retinal, thalamic, or cortical level? The discussion could be strengthened by framing the findings within established models of attentional modulation or by referring to known effects of attention in the early visual cortex.

      This is an important point, and we agree that framing the findings within established models of attentional modulation can strengthen the discussion. We believe that the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) offers a useful framework for interpreting our behavioral findings, especially the attention-related changes in contrast sensitivity and asymptotic performance observed at the foveal scale. We have now added to the Discussion (lines 264-307) a more detailed treatment linking our results to this model and considering, explicitly as speculation, how known physiological processes at different stages may contribute to the observed effects.

      (5) The ecological relevance of the results is not fully developed. The authors propose that the observed effects may resemble natural attentional shifts triggered by salient events, yet the brief, highly localized flashes used here are somewhat artificial. A more likely interpretation is that these mechanisms relate to oculomotor control within the fovea, perhaps reflecting preparatory activity for microsaccades or fine fixation adjustments. Considering this view could broaden the impact of the findings and link them to current discussions on the relationship between attention and oculomotor control.

      We thank the reviewer for raising this important point regarding the ecological relevance of our findings, which we did not sufficiently address in the original manuscript. Although we briefly motivated scenarios that engage exogenous attention at high spatial resolution, such as detecting road signs or traffic lights at a distance while driving, we did not fully elaborate on how such attentional processes may link to downstream visual and oculomotor functions.

      In our experiment, observers maintained fixation and avoided saccades throughout the trial. Nevertheless, in a subset of trials (on average 17% ± 3%), observers made saccades after stimuli disappeared and prior to providing a response. Typically, these movements were microsaccades with amplitudes smaller than 0.5°, directed toward the target location, in both valid and neutral trials. These saccades were discarded prior to the analyses performed in the manuscript. Inspired by the reviewer’s feedback, we decided to examine the saccade latency in these trials relative to the onset of the response cue to assess whether exogenous cueing influenced oculomotor timing. Notably, we observed an earlier onset of microsaccades in valid compared to neutral trials (71 ms ± 50 ms faster, P < 0.01). We have now added this observation as Figure 2 — Supplementary Figure 2 in the manuscript. Because the presence of an exogenous pre-cue was the only difference between the two trial types, the earlier microsaccade onset likely reflects exogenously triggered preparatory activity in the oculomotor system in response to the salient pre-cue. Such fine-grained attention may prime potential eye movements toward behaviorally relevant stimuli for further examination. This interpretation is consistent with the reviewer’s suggestion and supports a mechanistic link between exogenous attention and oculomotor behavior, extending the ecological relevance of our findings. This point has been added to the Discussion on lines 329 to 340.

      We also conducted an analysis to examine ocular drift behavior following the response cue. Although trials included in the manuscript analyses were constrained such that fixation during target presentation remained within a small window (10’ radius) around the fixation marker, we did not assess whether gaze subsequently drifted closer to the target location after the response cue. One possibility is that exogenous attention might bias ocular drift, shifting the preferred locus of fixation closer to the target. To address this, we computed the average Euclidean distance between gaze position and the target location following response cue onset for valid and neutral trials. However, we found no significant difference in gaze-target distance between valid and neutral trials (p = 0.57).
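
      The gaze-target distance measure described above can be sketched as follows. The function name and the representation of gaze as (x, y) samples in arcminutes are assumptions made for illustration only.

```python
import math


def mean_gaze_target_distance(gaze_samples, target):
    """Average Euclidean distance between gaze samples and the target.

    gaze_samples: list of (x, y) gaze positions recorded after response cue
                  onset, in arcminutes (assumed unit).
    target: (x, y) target location in the same coordinate frame.
    """
    if not gaze_samples:
        raise ValueError("need at least one gaze sample")
    tx, ty = target
    distances = [math.hypot(x - tx, y - ty) for x, y in gaze_samples]
    return sum(distances) / len(distances)
```

      Computing this value per trial and then comparing its distribution between valid and neutral trials yields the kind of comparison reported above.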

      Although the spatial cueing approach has long been used to probe exogenous attention in a controlled manner in psychophysical experiments, we fully recognize the importance of understanding attention under more naturalistic viewing conditions that allow observers to freely move their eyes. Developing paradigms that incorporate more naturalistic, salient stimuli would be an important direction for future work, enabling investigation of exogenous attention in ecologically valid settings and its influence on sequential actions and processes, including oculomotor behavior.

      (6) There is no statement about the availability of the data and code used for the experiment.

      We have now added the data and code for the analysis pipeline to the Open Science Framework (OSF).

      Reviewer #2 (Recommendations for the authors):

      (1) The study could discuss the strength of the effect and how it relates to previous studies.

      We thank the reviewer for raising this point. To facilitate direct comparison with the study by Jigo and Carrasco (2020), we computed attentional benefit as the ratio of contrast sensitivity between the valid and neutral conditions (now shown in Figure 3 — Supplementary Figure 4). In their data, the attentional benefit at 0° eccentricity peaked just below 4 CPD, with a ratio of approximately 1.2, corresponding to a ~20% increase in contrast sensitivity. This magnitude closely matches the benefit we observed for fine-grained attentional shifts within the foveola at spatial frequencies between 4 and 8 CPD (17% ± 12% and 16% ± 14% for 4 and 8 CPD, respectively). We have added this comparison to the Discussion (lines 246-262).
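
      As a worked illustration of this metric, the attentional benefit can be expressed as a ratio of contrast sensitivities and converted to a percentage increase. The function names below are hypothetical, and taking sensitivity as the reciprocal of the contrast threshold is the conventional definition, assumed here rather than taken from the manuscript.

```python
def contrast_sensitivity(threshold):
    """Contrast sensitivity as the reciprocal of the contrast threshold (assumed definition)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return 1.0 / threshold


def attentional_benefit(sensitivity_valid, sensitivity_neutral):
    """Return (ratio, percent_increase) of valid relative to neutral sensitivity."""
    ratio = sensitivity_valid / sensitivity_neutral
    return ratio, (ratio - 1.0) * 100.0
```

      For example, a ratio of 1.2 corresponds to a 20% increase in contrast sensitivity, which is the conversion used in the comparison above.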

      In addition, we acknowledge that prior studies have reported heterogeneous attentional effects, including pure contrast gain, pure response gain, or a mixture of the two. We now explicitly reference these findings in the Discussion and use the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) to explain how differences in stimulus configuration, attention field size, and eccentricity may account for discrepancies between our findings and prior studies examining attention in the extrafovea or when broadly distributed across the fovea (lines 264-307).

      (2) Minor details:

      (a) The abstract mentions gaze-contingent-display, but if I understand correctly, the stimulus was not presented in a gaze-contingent manner.

      That’s correct. Although stimuli were not presented gaze-contingently, we used a gaze-contingent calibration procedure (see Methods, lines 386-389) to achieve higher precision in localizing the line of sight. This increased accuracy was essential for selecting trials in which stimuli remained at the intended eccentricity relative to the preferred locus of fixation. To avoid potential confusion, however, we have removed this detail from the abstract.

      (b) Line 361: What is the manual calibration the authors are referring to? It does not appear to be described.

      The text has been updated to explain more explicitly what auto and manual calibrations are.

      (c) Line 402: There may be a typo towards the end of the line "t0" should be "to"?

      Text has been updated. Thank you.

      (d) Line 405. What are the units of 30?

      It’s in arcminutes. Text has been updated.

      Reviewer #3 (Recommendations for the authors):

      I found this paper very interesting, with a solid methodological approach and excellent data analyses. The authors present a well-designed psychophysical study that contributes valuable insights into the mechanisms of attention in the foveola. The methodology is rigorous, and the analyses are thoughtfully conducted and clearly presented.

      That said, I would like to offer a few comments and suggestions for clarification and further consideration:

      (1) Exogenous attention:

      If a 100% spatially predictive cue is compared to a neutral cue, the observed attentional effect should not be described as (purely) exogenous, since the cue fully predicts where the post-cue will request a response. This situation represents a case in which attention is exogenously driven but endogenously maintained (see e.g., Chica et al., 2013, Behavioural Brain Research). I recommend clarifying this distinction in the manuscript (and title) to avoid conceptual ambiguity.

      We thank the reviewer for raising this important conceptual point. We agree that because the pre-cue was 100% spatially predictive, the resulting attentional allocation cannot be considered purely exogenous. Although the abrupt, salient onset of the cue obligatorily triggers an exogenous shift of attention, its validity could also promote endogenous maintenance of attention at the cued location. Yet, several characteristics of our task strongly limit the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to perceptual encoding in our task.

      We also considered the possibility that our response cue (a retro-cue indicating the target location) might recruit endogenous attention to the internal perceptual representation. Importantly, however, this retro-cue was equally informative in valid and neutral conditions. Any enhancement driven by the retro-cue should therefore benefit both trial types to the same extent. The fact that we still observe a robust advantage in valid trials supports the conclusion that the performance improvements predominantly reflect fast, spatially specific exogenous facilitation rather than slower endogenous processes.

      We have revised the manuscript to clarify that although the cue obligatorily triggers an exogenous attentional shift, its 100% validity could allow for endogenous attention maintenance, as shown by Chica et al. (2013). We also added an explanation to the Discussion (lines 319-327) detailing why such endogenous contributions are unlikely to drive our main results, given the rapid cue-target timing in our task. Finally, to further prevent ambiguity, we updated the manuscript title to refer to “exogenously triggered attention,” rather than simply “exogenous attention.”

      (2) Interpretation of statistical effects:

      The statement "Therefore, asymptotic performance showed only independent, additive effects of frequency and attention, without a systematic influence of spatial frequency on the attentional benefit" seems not to be supported by the data, as the main effect of frequency was not significant.

      We thank the reviewer for this helpful observation. We agree that the original phrasing did not accurately reflect the results, as the main effect of spatial frequency was not significant (p = .0545). We have revised the sentence to “Therefore, asymptotic performance reflected an effect of attention alone, with no detectable contribution of spatial frequency or of the interaction between spatial frequency and attention” to avoid implying such an effect (lines 210-211).

      If data from two participants were missing in one condition, the authors should consider replacing this data with new participants.

      We agree with the reviewer that having two observers with missing data in one condition is not ideal. However, the 20 CPD condition was deliberately positioned near the resolution limit at the tested eccentricity and was therefore extremely demanding. Observers also had to monitor two stimulus locations simultaneously, further increasing task difficulty. This condition was challenging for all observers and, despite testing up to the highest contrast, two of seven observers were unable to perform above chance, indicating that for a non-trivial fraction of observers, this condition was effectively unmeasurable with our paradigm. As noted in the manuscript, the 20 CPD condition also has a statistical limitation: thresholds clustered near the upper bound (approaching 100% contrast), compressing the dynamic range and markedly reducing variance relative to lower spatial frequencies, which violates the homoscedasticity assumption of linear models. For these reasons, we did not pursue additional data collection in this condition. Nevertheless, we report the data that were successfully obtained, as they remain informative about performance near the resolution limit.

      Finally, we note that even when setting aside the 20 CPD condition, our data support this conclusion: comparisons between 4 and 12 CPD, as well as between 8 and 12 CPD, revealed large differences in the magnitude of the attentional benefit (d = 0.65, 95% CI [0.11, 1.18] and d = 0.62, 95% CI [0.08, 1.14], respectively). To further quantify these effects, we now report Cohen’s d for these spatial-frequency comparisons throughout the Results text and in the tables in the Appendices.

      (3) Sample size:

      As this is a psychophysical experiment with many trials and few participants, I am curious about how the authors determined the appropriate sample size and the number of trials required to detect the expected effects. Given that many effects were found to be significant, it seems that statistical power was adequate; however, it would be helpful if the authors could explain how this issue was addressed a priori during experimental planning.

      We appreciate that the reviewer raised this point. Please see the reply to the second point from Reviewer 1, who raised a related question about statistical power.

      (4) Figure 2 clarification:

      In Figure 2B, I do not fully understand the "Valid" and "Neutral" representation. Both conditions include a post-cue indicating the right position; however, in the neutral condition, there is a central fixation square, whereas in the valid condition, there is not. Please clarify this aspect of the figure. I think I understood the paradigm, but this part of the figure is misleading.

      The precue exists only in the valid condition. However, there is a mistake in panel B: the fixation marker is missing in the valid condition.

      We thank the reviewer for pointing this out. We have updated Figure 2 to explicitly show the sequence of valid vs. neutral trials. The fixation mark remained on the screen throughout the trial in both the valid and neutral conditions. After a 500 ms fixation period, an exogenous cue was presented for 30 ms in valid trials, followed by a 70 ms interval before stimulus onset. In neutral trials, no cue was presented, and the screen remained blank for 100 ms before the stimuli appeared. In both conditions, a response cue appeared 50 ms after stimulus offset.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      (1a) The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug with PpsB. However, it is unclear why only the leucine uptake pathway was affected.

      We observe that semapimod treatment interferes with the uptake of L-leucine, but not of pantothenate, in the mc2 6206 strain. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the underlying mechanism(s) by which L-leucine uptake is modulated by this small molecule.

      (1b) We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      By BLI-Octet we do not find any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is a transporter of L-leucine.

      (1c) Does semapimod binding affect its activity?

      Our study suggests that semapimod treatment alters PDIM architecture, making it restrictive to L-leucine. However, at present the exact mechanism is not clear. Further studies are required to thoroughly examine the effect of semapimod on Mtb PpsB activity and alterations in PDIM by mass spectrometry.

      (1d) Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      Based on the published report by Mulholland et al. and on the vancomycin susceptibility phenotype in our study, both strains appear to have comparable PDIM levels.

      (2) The authors show an interesting result where they observed antibacterial activity of semapimod against H37Rv only in vivo and not in vitro. Why do the authors think this is the basis of this observation? It is possible semapimod has an immunomodulatory effect on the host since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      (3) The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by including validations using PDIM mutants, including del-ppsB Mtb and other genes of the PDIM locus, whether in vivo this mutant would be more susceptible (or resistant) to semapimod treatment.

      PDIM is a virulence factor, and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection, even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      (4) Prolonged subculturing can introduce mutations in PDIM, which can be overcome by supplementing with propionate (Mulholland et al, Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations would result in Semr strains with propionate supplementation along with prolonged semapimod treatment.

      Because extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. While performing the initial screening of drugs, we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review): 

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy. 

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      (1) The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the relevant connection between changes in PDIM and amino acid transport remains incomplete.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small-molecule inhibitors, we came across a molecule, semapimod, which selectively kills the leucine auxotroph (mc2 6206), but not the WT Mtb. To understand the mechanism underlying the differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoring leuCD and panCD expression on the susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of the leuCD genes, the mc2 6206 strain becomes resistant to killing by semapimod. In contrast, panCD expression had no effect on the semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which exhibits differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, these results provide the first evidence that semapimod treatment perturbs L-leucine uptake in the leucine auxotroph.

      To gain further mechanistic insight into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain, which exhibits point mutations in four genes, including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored the susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about certain modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will be helpful to examine the effect of semapimod on Mtb physiology.

      (2) Also, the fact that the drug does not function on WT bacteria makes it a weak candidate to consider its usefulness for a therapeutic option.

      We agree that semapimod is not an appropriate drug candidate against TB, owing to its inhibitory effect on the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would help the pathogen establish chronic infection. However, a significant reduction in bacterial loads in lungs and spleen upon semapimod treatment, despite inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake can be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review): 

      (1) Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine. 

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence that semapimod treatment perturbs L-leucine uptake in the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain accumulates a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, it appears that semapimod treatment brings about certain modifications in PDIM that make it restrictive to L-leucine. Future studies are required to gain detailed mechanistic insights into the effect of semapimod on Mtb physiology.

      (2) Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the level of other amino acids including BCAA by mass spectrometry.

      (3) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

      Recommendations for the authors:

      (1A) Intracellular leucine can decrease from:

      inhibition of transport/uptake via semapimod as the authors claim or

      decreased uptake/requirement of many metabolites due to cells entering static growth arrest from challenge by semapimod

      To rule out the growth-inhibitory effect of semapimod on L-leucine uptake, we estimated intracellular L-leucine in Mtb after a brief exposure of 24 hours to 50 ng/ml semapimod (please refer to Materials and Methods). We confirmed that 24 hours of treatment with 50 ng/ml semapimod does not cause cells to enter static growth arrest.

      (1B) increased consumption/utilization of leucine for some programmed response to semapimod challenge

      Our results show reduced expression of genes involved in leucine catabolism such as accD1, bkdA and bkdB in semapimod-treated cells, and thus the above hypothesis seems unlikely.

      (1C) Additional metabolites should be measured to determine the specificity of the semapimod challenge.

      As mentioned below, we measured intracellular valine in the semapimod-treated Mtb 6206 by LC-MS/MS, which shows no change in its level. These observations thus corroborate a specific effect of semapimod on L-leucine level in the cell.

      (2) The effect of Semapimod on L-leucine uptake is largely based on indirect evidence, without showing reduced transport of the amino acid. Gene expression data is not enough to prove that the amino acid transport is blocked. More compelling evidence is required to confirm this mechanism.

      The authors could perform leucine uptake assays to directly confirm the functioning of Semapimod, inhibiting L-leucine transport. Another possibility would be to try out measuring intra-bacterial leucine levels for drug-treated versus untreated M. tuberculosis strains.

      The data presented in Fig. 3b show lower intracellular L-leucine upon semapimod treatment; in contrast, the Sem<sup>R</sup> strain exhibits ~3-fold more intracellular L-leucine, as estimated by mass spectrometry (please refer to our response to comment #6 below). Together, these observations indicate an inhibitory effect of semapimod on L-leucine uptake by the auxotroph.

      (3) The authors show that the overexpression of leuC-leuD restores Semapimod resistance in the auxotroph (Figs. 3C-3E). Is it possible to examine Semapimod resistance of WT-H37Rv or the complemented mutant grown in leucine-limiting conditions? This sort of evidence will be more direct on the specific drug-target beyond the auxotroph (mc<sup>2</sup> 6206).

      Because the endogenous L-leucine synthesis pathway is functional in WT-H37Rv, as well as in the complemented auxotrophic strain, leucine-limiting conditions are not expected to have any effect on susceptibility to semapimod.

      Author response image 1.

      (4) Biolayer Interferometry (BLI) shows Semapimod binds to PpsB (Fig. 6); however, there is no clear evidence that it disrupts PDIM synthesis. More direct evidence would be to study the effect of Semapimod on a ppsB mutant (may be a knock-down). This would prove the specificity of Semapimod for PpsB. Likewise, it would be worth looking into the effect of Semapimod using mutant M. tuberculosis defective for PDIM synthesis.

      As recommended by the peer reviewer, we created a ppsB knockdown strain in Mtb mc2 6206 by CRISPRi and examined its vulnerability to semapimod treatment. As can be seen in Author response image 1, the ppsB KD strain shows lower susceptibility to semapimod than the pDcas9-control strain, which exhibits significant growth inhibition on the 7H11-OADS-PL agar plate containing 200 nM semapimod.

      (5) Metabolomics experiments would benefit from including other control BCAAs like isoleucine and valine to determine if decreased intracellular levels of leucine are specific to semapimod or a general consequence of growth arrest from an antimicrobial agent.

      As suggested by the reviewer, we measured intracellular valine as well as proline levels in the semapimod-treated Mtb 6206 by LC-MS/MS; the data presented in Supplementary Figure 5 clearly show no change in their levels upon semapimod treatment.

      (5) Figure 3c, pyrazinamide susceptibility assay could be included on the panCD strain to ensure complementation leads to functional panCD. Parent strain would be resistant to PZA, complement strain would be susceptible. (doi: 10.1038/s41467-019-14238-3).

      The wild-type Mtb 6206 is unable to grow in the absence of pantothenate. We verified resumption of growth of Mtb 6206 in 7H9-OADS-L-leucine medium lacking pantothenate upon PanCD overexpression, which provides more direct evidence of the expression of functional copies of panCD genes.

      (6) Does the Sem-R mutant have increased levels of leucine?

      As can be seen in Supplementary Figure 7, the Sem<sup>R</sup> strain shows a ~3.0-fold increase in the intracellular L-leucine level when compared with the WT strain. In contrast, a comparable level of another BCAA, valine, is observed in both strains.

    1. In summary, we find striking mathematical similarity among animal ‘song’ vocalizations and human musical sounds regarding the stability of CES, along with some well-defined differences across evolutionary distant taxa that evolved singing behavior independently (i.e. anurans, birds, primates)

      I don't think you have yet shown enough to make this claim. The paper would greatly benefit from the use of null models. Given that all the data used in the paper share informational structure (i.e., they're all detectable in some way as "songs" to humans), are the relationships you identify actually surprising? It's very hard to gauge this without a null, either empirical or simulated, to compare to.

      Similarly, it's impossible to know whether the comparisons are fair without a deeper examination of how parameter choice (e.g., window sampling size) may differentially affect the CES estimation across song types. Even if you find significant similarity between human/animal song types compared to a null, how will you know if this isn't a product of bias in your CES estimation?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 affects the transcriptional activation of pachytene. Znhit1 is a subunit of the SRCAP chromatin remodeling complex and a depositor of H2AZ, and in cKO spermatocytes, H2AZ is not deposited into the gene region. The authors claim that this is why the PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during the meiotic prophase.

      Strengths:

      The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.

      Weaknesses:

      Overall, the data is inconsistent with the authors' claims and does not support their final conclusions. In addition, the sample used may not be the most suitable for the analysis, but a more suitable sample would dramatically improve the overall quality of the paper.

      Thank you for your comprehensive summary of our study and your thoughtful insights into its strengths and weaknesses. We greatly appreciate this valuable feedback, which helps us further improve our work. Below, we provide a detailed response addressing each of the points you raised.

      Reviewer #1 (Recommendations For The Authors):

      Major revisions:

      Surprisingly, many genes were upregulated in the scRNA-seq results. How many XY genes are included? Discuss why many genes are up-regulated in Fig. 5E whereas bulk RNA-seq showed only 70 genes were down-regulated. Since apoptosis-related factors are up-regulated in Fig5E, could these up-regulated genes be due to the high content of the transcriptome of dead cells? As you know, cell death starts, but randomly and violently disrupts the transcriptome, so we think it is not desirable to analyze the transcriptome with dead cells in the mix. Describe this point appropriately in the text or generate new data without dead cells.

      We sincerely appreciate the reviewer’s critical points. Below, we address each point sequentially:

      (1) To address the question about XY-linked genes, we utilized scRNA-seq data to identify differentially expressed sex chromosome genes in spermatocytes at different stages. Our analysis revealed an aberrant activation of XY-linked genes relative to controls. Specifically, 120 XY-linked genes were aberrantly activated in zygotene-stage spermatocytes, and 119 XY-linked genes showed aberrant activation in pachytene-stage spermatocytes (revised Fig. 4F). This observation directly indicates that Znhit1 knockout impairs Meiotic Sex Chromosome Inactivation (MSCI), a finding that aligns with our prior characterization of XY chromosome synapsis defects in Znhit1-deficient spermatocytes.

      (2) Two key reasons explain the discrepancy between scRNA-seq and bulk RNA-seq results:

      First, scRNA-seq employs a more permissive threshold for identifying DEGs (log2 fold change [log2FC] = 0.25), thereby enhancing sensitivity to subtle expression changes and enabling the detection of more upregulated genes. In contrast, bulk RNA-seq uses a stricter threshold (log2FC = 1), which filters out these subtly upregulated transcripts, resulting in fewer DEGs overall.

      Second, scRNA-seq can capture cell subset-specific differential expression. In contrast, bulk RNA-seq averages signals across mixed cells, masking such subset-specific expression changes.

      These clarifications have been included in the Data Analysis section of the revised manuscript.
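The effect of the two thresholds described above can be illustrated with a minimal sketch. The gene names and log2FC values below are hypothetical, and the helper `count_degs` is not part of any analysis package; real pipelines would also apply an adjusted p-value cutoff, which is omitted here for brevity.

```python
def count_degs(log2_fold_changes, threshold):
    """Count up- and downregulated genes whose |log2FC| meets the threshold."""
    up = sum(1 for lfc in log2_fold_changes.values() if lfc >= threshold)
    down = sum(1 for lfc in log2_fold_changes.values() if lfc <= -threshold)
    return up, down


# Hypothetical log2 fold changes (knockout vs. control) for six genes.
example_log2fc = {
    "geneA": 0.3,
    "geneB": 0.6,
    "geneC": 1.4,
    "geneD": -0.4,
    "geneE": -1.2,
    "geneF": 0.1,
}

# Permissive cutoff of the kind described for scRNA-seq (log2FC = 0.25).
permissive = count_degs(example_log2fc, 0.25)
# Stricter cutoff of the kind described for bulk RNA-seq (log2FC = 1).
strict = count_degs(example_log2fc, 1.0)

print("permissive (up, down):", permissive)
print("strict (up, down):", strict)
```

With the same underlying values, the permissive cutoff reports more genes in both directions, which is the first of the two reasons given for the discrepancy between the data sets.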

      (3) We fully agree with the reviewer’s concern that dead cells could confound transcriptomic analyses. Before downstream analysis, we excluded non-viable cells via stringent QC: cells with mitochondrial RNA (mtRNA) content exceeding 15% were removed, as high mtRNA content is a well-established marker of cell death or compromised viability. To further validate that upregulated genes were not driven by dead cell contamination, we analyzed the correlation between the expression of apoptosis-related genes and mtRNA fractions in our data. This analysis revealed no significant correlation (Pearson correlation coefficient, r = -0.02; please see Author response image 1). These results collectively rule out dead cell transcriptome contamination as the primary cause of the observed gene upregulation.

      Author response image 1.

      Scatter plot showing the Pearson correlation between apoptosis-related genes and mitochondrial RNA fractions in scRNA-seq data.
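The quality-control logic described in point (3) can be sketched as follows, assuming each cell is represented by a mitochondrial RNA fraction and an apoptosis-gene score. The field names `mito_fraction` and `apoptosis_score`, the helper functions, and the example cells are hypothetical; the actual analysis was presumably run with a dedicated scRNA-seq toolkit.

```python
import math


def filter_viable(cells, max_mito_fraction=0.15):
    """Keep cells whose mitochondrial RNA fraction does not exceed the cutoff."""
    return [c for c in cells if c["mito_fraction"] <= max_mito_fraction]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical per-cell values, for illustration only.
cells = [
    {"mito_fraction": 0.04, "apoptosis_score": 0.8},
    {"mito_fraction": 0.09, "apoptosis_score": 0.5},
    {"mito_fraction": 0.12, "apoptosis_score": 0.9},
    {"mito_fraction": 0.22, "apoptosis_score": 0.7},  # removed: above 15%
    {"mito_fraction": 0.07, "apoptosis_score": 0.6},
]

viable = filter_viable(cells)
r = pearson_r(
    [c["mito_fraction"] for c in viable],
    [c["apoptosis_score"] for c in viable],
)
print(f"{len(viable)} viable cells, r = {r:.2f}")
```

A coefficient near zero among the retained cells, as reported above (r = -0.02), would indicate that apoptosis-gene expression does not track the mitochondrial RNA fraction.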

      Line 280-286: The data in Figures 7I and J are confusing: as shown by KAS-seq, it is natural that ssDNA is not formed in the promoter region in the Znhit1-cKO sample because transcription does not proceed, but why is ssDNA formed in the enhancer region in the first place in the control and then lost in the Znhit1-cKO sample? Generally, it is said that in the enhancer region, including the super-enhancer region, double-stranded DNA is not dissociated, thus not forming ssDNA. Discuss why the loss of ssDNA in the enhancer region affects transcription with appropriate citations. Also, show whether genes downstream of the missing ssDNA in the promoter region have abnormal transcriptional activity, along with the RNA-seq data. Furthermore, in the region shown in Figure 7I, why is the chromatin even more open in Znhit1-cKO, as shown by ATAC-seq? Discuss whether this is related to transcriptional progression or aberrant substitution with H2A. If the function of ZNHIT1 is to replace H2A with H2AZ for PGA, it is not necessary to show the H2A level in Znhit1-cKO.

      We appreciate the reviewer’s constructive comments.

      (1) ssDNA dynamics in enhancer regions: Emerging evidence demonstrates that active enhancers undergo transient DNA unwinding to form ssDNA, a process critical for transcriptional regulation through the transcription of enhancer RNAs (eRNAs). KAS‑seq is sufficiently sensitive to detect ssDNA in enhancer regions (Kim et al., 2010; Wu et al., 2020). It has been shown that H2A.Z (deposited by the ZNHIT1-SRCAP complex) is required for maintaining enhancer accessibility and dynamic unwinding (Sporrij et al., 2023). In this study, we found that Znhit1 deletion and defective H2A.Z incorporation impaired enhancer ssDNA formation, indicating that ZNHIT1-H2A.Z plays an important role in the activity of both promoters and enhancers.

      (2) Impact of ssDNA loss on transcription: To address how missing ssDNA affects transcriptional activity, we further analyzed changes in KAS‑seq signals following Znhit1 knockout. Overall, KAS‑seq signals were significantly reduced upon Znhit1 depletion, confirming that Znhit1 is essential for ssDNA formation. Further examination of KAS‑seq signals at promoters of downregulated genes also revealed reduced signals (revised manuscript, Fig. S8). In contrast, KAS-seq signals of upregulated genes remained relatively low and showed no changes in both the control and knockout groups, and their upregulation probably results from indirect regulation. These results underscore the importance of ZNHIT1-mediated chromatin states in regulating ssDNA formation and gene expression.

      (3) Aberrant chromatin openness in Znhit1-cKO (ATAC-seq): The increased chromatin accessibility detected by ATAC-seq likely represents a disorganized, nonfunctional state rather than productive transcriptional openness. H2A.Z normally constrains chromatin dynamics to facilitate ordered transcriptional regulation (Cole et al., 2021); its absence in Znhit1-cKO leads to higher ATAC-seq signals, suggesting that this aberrant openness fails to support proper assembly of the transcriptional machinery.

      Minor revisions:

      Line 106. The text says that they looked for chromatin factors, but the legend says that they looked for epigenetic factors. The text must be consistent.

      We have corrected it in the revised manuscript (line 801).

      Line 107. Although it is stated that the transcriptional data published here were used, it appears from the cited references that they are scRNA-seq data. A clear explanation is required in the text or legend.

      We have revised this data as scRNA-seq data (line 107).

      Line 141-143: Using TUNEL analysis in Figure 4F, the authors show that Znhit1-cKO testis cells contain many dead cells. Describe the type or stage of the apoptotic cells.

      We appreciate the reviewer’s suggestion. Specifically, we performed TUNEL staining on testes isolated from P14 mice, a critical time point for pachytene development (revised Fig. 2D). In addition, the scRNA-seq data showed that apoptosis-related genes were significantly upregulated in pachytene-stage spermatocytes (revised Fig. 4D). To further validate this observation, we performed scRNA-seq on P35 testis samples. The results revealed a significant reduction in late pachytene-stage spermatocytes in Znhit1-cKO samples (revised Fig. 2F), consistent with apoptotic loss of pachytene cells. Collectively, these data confirm that Znhit1 knockout impairs pachytene-stage spermatocyte development.

      The authors claimed that the loss of Znhit1 lowers the transcription of a group of genes involved in homologous recombination, including Rnf212, causing a delay in homologous recombination; however, if the process of homologous recombination is delayed, homologous chromosome pairing and synapsis are affected unless DSB repair is completed. Provide a satisfactory explanation for the fact that DNA damage remains on autosomes despite complete synapsis, as shown in Figure 3C, which is likely not solely due to delayed homologous recombination.

      Thank you for this insightful comment. We fully agree that persistent autosomal DNA damage cannot be explained solely by delayed homologous recombination. To resolve this question, we further analyzed autosomal synapsis through SYCP1 and SYCP3 staining. While autosomal synapsis appeared morphologically complete, we identified subtle but significant synapsis defects in autosomal terminal regions (revised Fig. 3A). This suggests that Znhit1 knockout also results in autosomal synapsis defects. We speculate that these synapsis defects are associated with the unresolved autosomal DNA damage we observed.

      Lines 150-163. With regard to XY unpairing in Znhit1-cKO pachytene spermatocytes, there is insufficient discussion as to whether this is due to transcriptional aberrations.

      Thank you for highlighting the need to link transcriptional aberrations to XY unpairing in Znhit1-cKO pachytene spermatocytes. To address this, we analyzed sex chromosome transcription using scRNA-seq data. Relative to controls, 120 XY-linked genes were aberrantly activated at zygotene, and 119 were upregulated at pachytene in Znhit1-cKO spermatocytes (revised Fig. 4F), directly demonstrating that Znhit1 knockout disrupts Meiotic Sex Chromosome Inactivation (MSCI). Given that intact MSCI is required to stabilize XY synapsis in pachytene spermatocytes, we conclude that the observed XY unpairing is likely a direct consequence of these sex chromosome transcriptional abnormalities. We have added this information to the revised manuscript (lines 221-226).

      Line 187-194. Analysis of the scRNA-seq data is shown in Figure 4, but it lists several genes as stage-specific markers, some of which do not have well-understood meiotic functions. Please cite a reference paper that provides sufficient evidence to qualify this stage.

      In response to this comment, we have refined the presentation of marker genes used for cell annotation (revised Fig. S4B). We have incorporated relevant references supporting their utility as stage-specific markers for the meiotic stages (line 187).

      Line 225-233: If Znhit1 is important for H2AZ deposition and regulates PGA through it, how does it regulate HR-related genes that are expressed earlier through H2AZ deposition during the pachytene stage? For example, Rnf212 is not specifically expressed during the pachytene stage but is one of the targets of MEIOSIN, so it is expressed at an earlier stage.

      Thank you for this insightful comment. We fully acknowledge the reviewer’s key observation that HR-related genes such as Rnf212 are MEIOSIN targets that initiate transcription at earlier meiotic stages, before the pachytene stage. Our stage-resolved scRNA-seq data further showed that the expression of Ccnb1ip1 and Rnf212 was significantly upregulated from zygotene to pachytene, following their initial transcriptional onset. We next showed that the loss of H2A.Z deposition induced by Znhit1 deletion specifically impaired this pachytene-specific secondary transcriptional activation, rather than the early MEIOSIN-driven expression onset (please see Author response image 2).

      Author response image 2.

      Plots showing the expression levels of the indicated genes in scRNA-seq data.

      Line 245-251: As shown in Figure 6E, more than 14,000 genes have H2AZ peaks. In contrast, only approximately 60% of the genes downregulated by Znhit1-cKO appeared to be directly affected by H2AZ. Are the remaining 40% of genes regulated in a different way that is not mediated by H2AZ? Also, only a few percent of the genes with H2AZ peaks are affected, but why are only genes with A-MYB involvement affected, as shown in Figure 7?

      Thank you for these insightful and constructive comments. The ~40% of downregulated genes not directly linked to H2A.Z were likely regulated through indirect mechanisms. H2A.Z deposition mediated by ZNHIT1 may influence upstream transcriptional regulators (e.g., transcription factors or coactivators), whose dysregulation in turn affects these genes.

      The selective effect of H2A.Z loss on A-MYB target genes is explained by the strict context-dependent function of H2A.Z, which requires stage-specific partner transcription factors to exert its regulatory activity. During the zygotene-to-pachytene transition, A-MYB acts as the master regulator of pachytene gene activation and forms a functional collaborative complex with H2A.Z to drive target gene transcription. Disrupted H2A.Z deposition upon Znhit1 deletion specifically impairs the activity of this A-MYB-H2A.Z complex, leading to selective downregulation of A-MYB targets. Other H2A.Z peak-associated genes may rely on alternative cofactors and compensatory mechanisms.

      Line 245-256: Figures 6 and F show that the localization of H2AZ is reduced in Znhit1-cKO mice, which means that no substitution with H2A occurs. If so, show it in the data because the localization of H2A should be increased compared to that in the control.

      To clarify the status of H2A, we have now performed immunofluorescence staining for H2A. While H2A.Z deposition was clearly impaired following Znhit1 deletion, the global level of H2A did not change significantly (Author response image 3). We speculate that the absence of a compensatory increase in H2A is likely due to the intrinsically low abundance of the histone variant H2A.Z relative to canonical histone H2A under physiological conditions.

      Author response image 3.

      Immunostaining of SYCP3 and H2A in spermatocytes in testis sections of control and Znhit1-sKO mice. Scale bar, 40 μm.

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.

      Strengths:

      The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.

      Weaknesses:

      (1) Current literature demonstrates that meiotic mutants arrest at one of two stages: midpachytene (stage IV of the seminiferous cycle) or metaphase I (stage XII of the seminiferous cycle). This study documents that in the Znhit1 KO the midpachytene marker H1t appears normally, but that cells arrest before diplotene. If this is true, then arrest must occur during late pachytene, which based on my knowledge has never been documented for a meiotic KO. To resolve this, the authors should present stronger histological substaging evidence to support their claim.

      Thank you for this insightful and constructive comment. To achieve high-resolution tracking of cell lineage progression, we performed scRNA-seq analysis using P35 testes in this revised manuscript. The scRNA-seq data showed that germ cells progressed normally through all meiotic stages and successfully gave rise to spermatids in control groups. By contrast, in the Znhit1 knockout group, late pachytene spermatocytes decreased significantly, and only very few subsequent germ cell types were observable (revised Fig. 2F, G). Although a few diplotene spermatocytes and meiotic metaphase I cells were detectable in the scRNA-seq data, these cells appeared abnormal, as evidenced by their extremely low Pou5f2 expression. We have revised our description of the meiotic arrest stage in the manuscript.

      (2) The authors overlooked the possible effects of Znhit1 deletion on MSCI. Defective MSCI is a well-established cause of pachytene arrest. Actually, the fact that they see X-Y pairing failure should alert them even more strongly to this possibility because MSCI failure is often associated with defective X-Y pairing. This could be easily addressed by examination of their RNAseq data.

      To address the concern that Znhit1 deletion may impact Meiotic Sex Chromosome Inactivation (MSCI), we analyzed XY-linked gene expression using scRNA-seq data from spermatocytes at distinct stages. Our analysis revealed aberrant activation of XY-linked genes in Znhit1-cKO spermatocytes relative to controls. Specifically, 120 XY-linked genes were activated at zygotene, and 119 XY-linked genes were upregulated at pachytene (revised Fig. 4F). This observation directly demonstrates that Znhit1-cKO impairs MSCI, which aligns with our prior characterization of defective X-Y chromosome synapsis in Znhit1-deficient spermatocytes. To explicitly resolve this concern, we have integrated these MSCI-focused RNA-seq analyses into the revised Results section (lines 221-226).

      (3) The recombination assays need attention.

      In the text the authors state that they studied RPA2 and DMC1, but the figures show RPA2 and RAD51.

      The RPA counts are not quantitated.

      The conclusion that crossover formation fails (based on MLH1 staining) is not justified. This marker does not appear in wt males until late pachytene, so if cells in this mutant are dying before that stage, MLH1 cannot be assessed.

      The authors state that gH2AX persists in the KO, but I'm not convinced that they are comparing equivalent stages in the wt and KO. In Figure 3C, the pachytene cell is late, whereas in the mutant the pachytene cell is early or mid (when residual gH2AX is expected, even in wt males).

      Previous work (PMID: 23824539) has shown that antibodies reportedly detecting pATM in the sex body are non-specific. I therefore advise caution with the data shown in Figure 3D.

      We appreciate the reviewer’s detailed feedback on our recombination assays and have addressed each concern as follows:

      (1) Discrepancy between text and figures (RPA2/DMC1 vs. RPA2/RAD51): We have corrected this in the revised manuscript.

      (2) Quantitation of RPA2 foci: We have supplemented quantitative analysis of RPA2 foci (revised Fig. S3).

      (3) Conclusion on crossover failure: Single-cell RNA sequencing data from P35 testes definitively confirmed that Znhit1 knockout spermatocytes successfully progressed to the late pachytene stage, ruling out the possibility that our MLH1 staining results are confounded by cell death or arrest before this critical stage. In addition, analysis of transcriptome datasets revealed significant downregulation of important genes required for homologous recombination and crossover formation, including Ccnb1ip1 and Rnf212. Reduced expression of these essential factors may impair the assembly of MLH1 crossover foci. These data demonstrate that ZNHIT1 is essential for proper homologous recombination and crossover formation during male meiosis. We have revised the text to emphasize this context.

      (4) γH2AX persistence and stage matching: We have replaced the images with more representative, stage‑matched pachytene spermatocytes from wild‑type and Znhit1‑KO mice (revised Fig. 2C). Furthermore, prompted by the insightful comment from Reviewer 1, we carefully re‑examined autosomal synapsis and identified abnormal synapsis specifically at the terminal regions of autosomes in Znhit1‑deficient spermatocytes (revised Fig. 3A). These data together confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) pATM staining issue: Following the reviewer’s advice, we carefully reviewed the relevant literature (PMID: 23824539) and confirmed that the anti‑pATM antibody may exhibit non‑specific staining on the XY chromosomes. Accordingly, we have removed the pATM staining data presented in Figure 3D from the revised manuscript to ensure the accuracy and rigor of our results.

      (4) RNAseq data. The authors show convincingly that Znhit1 activates genes that are normally upregulated at the zyg-pachytene transition. They should repeat the analysis for genes normally upregulated at the prelep-lep and lep-zyg transitions to show that this effect is really pachytene-gene specific.

      We appreciate this suggestion. To clarify the stage specificity of ZNHIT1’s regulatory role, we analyzed genes upregulated at the prelep-lep and lep-zyg transitions. Our results showed that Znhit1 knockout had little impact on the overall expression levels of these genes (as shown in revised Fig. 4B). In contrast, as we previously reported, genes upregulated at the zygotene-pachytene transition were markedly downregulated in Znhit1-cKO. These findings further confirm the specificity of ZNHIT1 in regulating pachytene gene expression.

      (5) I am puzzled that the title and overall gist of the study focuses on H2A.Z, when it is Znhit1 that has been deleted.

      We appreciate the reviewer’s observation and have revised the study title as suggested. Specifically, the title is now updated to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis.”

      Reviewer #3 (Public Review):

      Summary:

      Sun et al. present a manuscript detailing the phenotypic characterization of loss of Znhit1 in male germ cells. Znhit1 is a subunit of the chromatin regulating complex SRCAP that functions to deposit the histone variant H2A.Z. Given that meiosis, and specifically meiotic recombination, occurs in the context of the dynamic condensing of chromosomes, the role of chromatin regulators in general, and histone variants specifically, in mammalian meiosis is an active area of research. Previous work has shown that H2A.Z is found at the locations of recombination in plants, although H2A.Z was previously not found at recombination sites in mammalian meiosis. Here the authors use a conditional approach to ablate Znhit1 in spermatocytes and characterize a block in meiosis in prophase I in the transition from pachytene to diplotene stage.

      Strengths:

      The authors combine current methods in immunohistochemistry and functional genomics to provide strong evidence of meiotic block upon the loss of Znhit1. They find that loss of Znhit1 leads to reduced incorporation of the histone variant H2A.Z, specifically at promoters and enhancers. Further, RNA sequencing found more genes are down-regulated upon loss of Znhit1 compared to upregulated, suggesting that incorporation of H2A.Z is critical for the expression of genes necessary for successful meiotic progression.

      A strength of the manuscript is tying the locations of changes in H2A.Z deposition with binding of the transcription factor A-MYB, providing a mechanism that can potentially combine the changes in chromatin regulation with variable binding of a transcription factor in gene expression in pachytene stage spermatocytes.

      Weaknesses:

      A weakness is the single-cell RNA experiment using cells from 16-day-old male mice. The authors suggest that the rationale for the experiment was to determine where the Znhit1-sKO mutant showed an arrest in meiosis, and claim that this is the pachytene stage. However, in the 'first wave' of meiosis 16-day-old mice are just beginning to enter pachytene, so cells from later meiotic stages will be largely absent in these tubules. This is clear from the UMAP showing a similar pattern of cell distributions between wild-type and mutant mice. Using older mice would have better demonstrated where the mutant and wild-type mice differ in cell-type composition.

      We appreciate the reviewer’s constructive comment. To resolve this issue, we have added new scRNA‑seq data from testes of P35 mice, which harbor a full spectrum of meiotic stages, including late pachytene, diplotene, metaphase I spermatocytes, and post-meiotic spermatids. Compared with wild-type controls, Znhit1-sKO testes exhibited a marked reduction in late pachytene spermatocytes and a near-complete loss of post-pachytene cell types, directly validating the pachytene-stage meiotic arrest (revised Fig. 2F, G). All updated analyses have been integrated into the manuscript to strengthen our conclusions.

      The authors use the term pachytene genome activation (PGA) in the manuscript to suggest a novel process by which genes are specifically increased in expression in the pachytene stage of meiotic prophase I, without reference to literature that establishes the term. If the authors are putting forward a new concept defined by this term, it would strengthen the manuscript to describe it further and delineate what the genes are that are activated and discuss potential mechanisms.

      We appreciate the reviewer’s valuable feedback on our use of the term "pachytene genome activation (PGA)".

      To address this, we have revised the text to explicitly frame PGA as a stage-specific transcriptional program observed in our data, defined by the coordinated upregulation of a distinct set of genes during the pachytene stage of meiotic prophase I.

      (1) Definition and Gene Set: Using the scRNA-seq dataset, we formally defined PGA as the transcriptional wave characterized by genes with increased expression in pachytene vs. zygotene spermatocytes (n = 1,560 genes). Functional enrichment analysis shows these genes are primarily involved in DNA repair, cilium organization, and spermatid development (Table S3), consistent with the biological process of germ cell development.

      (2) Relationship to existing literature: While PGA as a term is not widely established, our data align with prior observations of pachytene-specific transcriptional upregulation (Alexander et al., 2023; Ernst et al., 2019; Turner, 2015). Importantly, Alexander et al. report a ~3-fold increase in transcription in late meiotic stages, starting from pachynema. We have added these citations to clearly illustrate the relevant advances in the field (lines 68-71).

      (3) Regulation of pachytene-stage gene expression: We further delineate that PGA is regulated by ZNHIT1-dependent H2A.Z deposition. Znhit1 deletion resulted in significant downregulation of 70.1% (1,094 out of 1,560) of these genes. This links PGA to chromatin-based regulation, where ZNHIT1-dependent H2A.Z deposition enables pachytene-specific transcription.
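
      The proportion quoted in point (3) can be checked with a short calculation. The sketch below uses only the two gene counts stated in the response; the variable names are illustrative and are not taken from the authors' analysis code.

```python
# Counts quoted in the response text.
pga_genes = 1560       # genes upregulated in pachytene vs. zygotene spermatocytes
downregulated = 1094   # of those, significantly downregulated after Znhit1 deletion

# Fraction of the PGA gene set that is downregulated, as a percentage.
percent = round(100 * downregulated / pga_genes, 1)
print(f"{downregulated} / {pga_genes} = {percent}%")
```

      The result agrees with the 70.1% reported in the response.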

      Generally speaking, the authors present solid evidence for a pachytene block in male germ cell development in mice lacking Znhit1 in spermatocytes. The evidence supporting a change in gene expression during pachytene, that more genes are downregulated in the mutant compared to increased expression, and changes in histone modification dynamics and placement of H2A.Z all support a role in alterations in meiotic gene regulation. However, the support that changes in H2A.Z impacting meiotic recombination (as suggested in the manuscript title) is less supported, rather than a general cell arrest in the pachytene stage leading to cell death. The conclusions around the role of Znhit1 influencing meiotic recombination directly could use further justification or mechanistic hypothesis.

      We acknowledge the reviewer’s comments. Indeed, existing data support the presence of a pachytene block in spermatocytes of Znhit1-deficient mice, along with aberrant pachytene gene expression and impaired H2A.Z deposition.

      In response, we made the following revisions: (1) we adjusted the manuscript title and conclusion to reduce emphasis on a direct H2A.Z-recombination link and to focus instead on ZNHIT1/H2A.Z in pachytene gene regulation and meiotic progression; (2) we now state that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery (lines 314-319).

      Reviewer #3 (Recommendations For The Authors):

      Quality of the images for meiotic spreads - images have low contrast and are tiny. It is difficult to see the SYCP3 results even when the images are magnified on the computer screen.

      We have provided new images with high resolution to ensure a clear visualization of SYCP3 signals.

      Line 165 - indicates the results for DMC1, although the figure suggests the results are for RAD51 foci.

      We have corrected this mistake.

      Line 306 - this manuscript 'confirms' that H2AZ is not found at mammalian recombination sites, a result already in the literature.

      We have corrected this mistake (lines 309-312).

      Reviewing Editor Comments:

      Major points and revisions highlighted by the reviewers:

      (1) Meiotic prophase in Znhit1KO: The main questions to clarify are the stage and status of progression, the analysis of apoptosis, and the consequences of gene expression on the X and Y. Additional analysis for DSB repair foci, gH2AX is also required. Those analyses are needed to answer reviewer 2. Even if H2AZ was not detected at recombination hotspots, it may be possible that it plays a role in DSB repair but the level is too low for detection. This should be discussed as H2AZ was shown to be involved in DNA repair.

      We sincerely appreciate the reviewing editor’s constructive comments.

      (1) Stage and progression of meiotic prophase: We added scRNA-seq data from P35 testes. The results confirmed that Znhit1-KO spermatocytes arrest at late pachytene and that post-pachytene stages (diplotene, metaphase I) were nearly absent (revised Fig. 2F, G).

      (2) Apoptosis analysis: We addressed this by showing that apoptosis-related genes were upregulated in pachytene spermatocytes at the single-cell level (revised Fig. 4D). To further validate this finding, we performed scRNA-seq analysis on P35 testis samples. Our results revealed a marked reduction in late pachytene spermatocytes in Znhit1-cKO testes (revised Fig. 2F, G), consistent with apoptotic depletion of pachytene-stage cells. Together, these data confirm that Znhit1 ablation impairs pachytene-stage spermatocyte development.

      (3) X/Y gene expression consequences: To address this key point, we performed stage-resolved analysis of XY-linked gene expression using scRNA-seq data from different-stage spermatocytes. Compared with controls, we detected aberrant ectopic activation of XY-linked genes in Znhit1-KO spermatocytes: 120 XY-linked genes were inappropriately activated at zygotene, and 119 remained abnormally upregulated at pachytene (revised Fig. 4F). These results provide direct evidence that Znhit1 deletion impairs Meiotic Sex Chromosome Inactivation (MSCI).

      (4) DSB repair issue: We have replaced the images with more representative, stage‑matched pachytene spermatocytes (revised Fig. 3C). The revised images show consistently increased γH2AX signals in Znhit1-KO spermatocytes. Prompted by Reviewer 1’s comment, we identified abnormal synapsis at autosomal terminal regions in mutant cells. Together, these results confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) Potential role of H2A.Z in DSB repair: Though H2A.Z was nearly undetectable at recombination hotspots, we discuss two possibilities: (1) ZNHIT1-H2A.Z depletion dysregulated DSB repair-related genes; (2) Current ChIP-seq sensitivity may miss low-abundance H2A.Z at hotspots, which could support repair via chromatin remodeling. Future high-resolution assays (super-resolution imaging, DSB-targeted ChIP-seq) are proposed to validate this. We agree that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on recombination machinery.

      (2) Gene expression analysis. The first consequence of H2AZ depletion is gene expression downregulation. However, it may not be surprising that some genes are down and others upregulated. There are likely secondary and indirect effects including the upregulation of some genes. The authors should explain and discuss this point so as to answer the questions raised by reviewers 1 and 2.

      The primary consequence of H2A.Z depletion in pachytene spermatocytes is indeed widespread downregulation of genes. We explain the coexistence of upregulated genes through three key points.

      (1) Technical differences between scRNA-seq and bulk RNA-seq (addressing Reviewer 1): scRNA-seq captures cell-type-specific differentially expressed genes that bulk RNA-seq masks (bulk averages signals across mixed cells, hiding changes in rare subsets). Additionally, scRNA-seq uses a lower log2(fold change) threshold (0.25 vs. 1 in bulk RNA-seq), detecting subtle upregulations missed by bulk analysis.

      (2) No dead cell contamination (addressing Reviewer 1): Stringent quality control excluded cells with >15% mitochondrial RNA. Apoptosis-related genes showed no significant correlation with mitochondrial RNA fractions (Pearson correlation coefficient, r = -0.02; please see Author response image 1), ruling out dead cell transcriptome interference.

      (3) Secondary/indirect effects (addressing Reviewers 1 & 2): Upregulated genes likely result from indirect regulatory cascades. H2A.Z depletion may disrupt upstream transcription factors, leading to compensatory upregulation of their downstream genes or cell stress responses to meiotic arrest. Notably, Znhit1 knockout specifically impacts genes upregulated at the zygotene-pachytene transition, while genes upregulated at preleptotene-leptotene or leptotene-zygotene transitions remain largely unaffected (revised Fig. 4B), confirming the specificity of H2A.Z’s direct regulatory role and framing upregulation as non-targeted indirect effects.
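
      The quality-control and threshold arguments in points (1) and (2) can be illustrated with a minimal sketch. It runs on synthetic cells, not the study's data: the field names `mito_fraction` and `apoptosis_score`, the random values, and the use of an absolute log2 fold change are all assumptions made for illustration. Only the 15% mitochondrial cutoff and the 0.25 and 1 thresholds come from the response, and the authors' actual pipeline is not described here.

```python
import math
import random

MITO_CUTOFF = 0.15  # from the response: cells with >15% mitochondrial RNA are excluded


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def passes_log2fc(mean_mutant, mean_control, threshold):
    """True if the absolute log2 fold change reaches the threshold (assumed criterion)."""
    return abs(math.log2(mean_mutant / mean_control)) >= threshold


# Synthetic cells with hypothetical values, used only to exercise the functions.
rng = random.Random(0)
cells = [
    {"mito_fraction": rng.uniform(0.0, 0.3), "apoptosis_score": rng.gauss(0.0, 1.0)}
    for _ in range(500)
]

# Step 1: drop cells above the mitochondrial cutoff.
kept = [c for c in cells if c["mito_fraction"] <= MITO_CUTOFF]

# Step 2: correlate the apoptosis score with the mitochondrial fraction in retained cells.
r = pearson_r(
    [c["mito_fraction"] for c in kept],
    [c["apoptosis_score"] for c in kept],
)
print(f"retained {len(kept)} of {len(cells)} cells, r = {r:.3f}")

# Step 3: a 1.3-fold change clears the scRNA-seq threshold (0.25) but not the bulk one (1).
print(passes_log2fc(1.3, 1.0, 0.25), passes_log2fc(1.3, 1.0, 1.0))
```

      A correlation near zero in the retained cells is the pattern the response reports (r = -0.02), and the last line shows why a lower fold-change threshold admits subtle changes that a bulk threshold would miss.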

      (3) The authors should also test the effect of Znhit1KO on the 1196 genes (up PreL/L) and 1325 (up L/Z) as shown in Figure 5D for the PGA. Also in Figure 5B, there is no evaluation of the statistical significance of the variation, this should be revised. X and Y genes should be analysed. KAS-Seq should be correlated with gene expression analysis, and several points as mentioned in the reviews below should be better explained and discussed.

      (1) Effect of Znhit1-KO on PreL/L- and L/Z-upregulated genes: we analyzed the 1196 genes upregulated at the PreL/L transition and 1325 genes upregulated at the L/Z transition. Znhit1 knockout had minimal effect on the expression of these early meiotic gene sets (revised Fig. 4B), whereas genes activated at the zygotene‑pachytene transition were strongly downregulated in Znhit1-KO spermatocytes. These results confirm the specific role of ZNHIT1 in regulating pachytene‑stage gene expression. We have also added a statistical evaluation for the variation shown in Fig. 4B.

      (2) X/Y-linked gene analysis: Analysis of stage‑resolved scRNA‑seq revealed aberrant ectopic activation of 120 XY‑linked genes at zygotene and 119 at pachytene in Znhit1-KO spermatocytes (revised Fig. 4F), demonstrating impaired Meiotic Sex Chromosome Inactivation (MSCI).

      (3) KAS-seq correlation with gene expression: We analyzed the link between KAS‑seq signals and gene expression, and we found that Znhit1 depletion caused a global reduction in KAS‑seq signals, especially at promoters of downregulated genes (revised Fig. S8). Genes with increased expression showed low KAS‑seq signals in both control and mutant groups, likely reflecting indirect regulation. These results highlight the essential role of ZNHIT1 in transcriptional regulation.

      (4) The title should refer to Znhit1, and the effect on meiotic recombination activities may be an indirect consequence of prophase progression arrest, even if some recombination genes are downregulated. This point is important as noted by reviewer 3.

      We fully acknowledge Reviewer 3’s key point and have revised the manuscript title to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis” to reduce emphasis on a direct H2A.Z-recombination link.

      Regarding meiotic recombination activities: The downregulation of recombination-related genes (e.g., Ccnb1ip1, Rnf212) stems from impaired pachytene-stage transcriptional programs caused by ZNHIT1-dependent H2A.Z deposition defects, which in turn lead to prophase progression arrest. Thus, the observed recombination abnormalities may be a secondary consequence of the meiotic prophase arrest, rather than a direct regulatory effect of ZNHIT1 on recombination machinery. This clarification has been integrated into the Discussion section (lines 314-318).

      (5) The recent structural analysis of SRCAP should be cited: Yu et al. Cell Discovery (2024) 10:15 https://doi.org/10.1038/s41421-023-00640-1.

      We have cited this reference in this revised manuscript (lines 234-236).

      (6) The authors should read and answer the specific revisions asked for by the reviewers.

      We have thoroughly read and systematically addressed all specific revisions requested by Reviewers 1, 2, and 3, as detailed in the revised manuscript and supplementary data.

      References

      Alexander, A.K., Rice, E.J., Lujic, J., Simon, L.E., Tanis, S., Barshad, G., Zhu, L., Lama, J., Cohen, P.E., and Danko, C.G. (2023). A-MYB and BRDT-dependent RNA Polymerase II pause release orchestrates transcriptional regulation in mammalian meiosis. Nature communications 14.

      Cole, L., Kurscheid, S., Nekrasov, M., Domaschenz, R., Vera, D.L., Dennis, J.H., and Tremethick, D.J. (2021). Multiple roles of H2A.Z in regulating promoter chromatin architecture in human cells. Nature communications 12, 2524.

      Ernst, C., Eling, N., Martinez-Jimenez, C.P., Marioni, J.C., and Odom, D.T. (2019). Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nature communications 10, 1251.

      Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187.

      Sporrij, A., Choudhuri, A., Prasad, M., Muhire, B., Fast, E.M., Manning, M.E., Weiss, J.D., Koh, M., Yang, S., Kingston, R.E., et al. (2023). PGE(2) alters chromatin through H2A.Z-variant enhancer nucleosome modification to promote hematopoietic stem cell fate. Proceedings of the National Academy of Sciences of the United States of America 120, e2220613120.

      Turner, J.M. (2015). Meiotic Silencing in Mammals. Annu Rev Genet 49, 395-412.

      Wu, T., Lyu, R., You, Q., and He, C. (2020). Kethoxal-assisted single-stranded DNA sequencing captures global transcription dynamics and enhancer activity in situ. Nature methods 17, 515-523.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.

      Weaknesses:

      The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text, and could benefit from a clearer presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.

      Thank you for your thoughtful feedback. Regarding the discrepancy between experiment and theory in relation to Michaelis-Menten kinetics, we recognize that our initial explanation may not have been explicit enough. Our intent was to illustrate that if DNA binding is a saturable process, then while the absolute concentration of Dl bound to DNA will increase with total Dl levels, the fraction of Dl bound to DNA will decrease. We used Michaelis-Menten kinetics only as a familiar example to convey this concept but did not intend to suggest that the system strictly follows Michaelis-Menten behavior. To clarify this point, we removed mention of Michaelis-Menten as an illustrative analogy and stuck specifically with discussing the system as “saturating.” This primarily affected text in the paragraph starting on Line 204, but also Lines 323-325.
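
      The point about saturable binding can be made concrete with a small numerical sketch. It assumes a generic hyperbolic saturation curve chosen only for illustration; as the response states, the system is not claimed to follow any particular kinetic form. The function name and the `capacity` and `k_half` parameters are hypothetical, and the concentrations are in arbitrary units.

```python
def saturation_table(free_levels, capacity, k_half):
    """Return (free, bound, bound_fraction) for each free concentration.

    Bound concentration follows a generic saturating curve:
    bound = capacity * free / (k_half + free), which approaches `capacity`.
    """
    rows = []
    for free in free_levels:
        bound = capacity * free / (k_half + free)
        total = free + bound
        rows.append((free, bound, bound / total))
    return rows


rows = saturation_table([1.0, 5.0, 25.0, 125.0], capacity=10.0, k_half=5.0)
for free, bound, fraction in rows:
    print(f"free={free:6.1f}  bound={bound:5.2f}  bound fraction={fraction:.3f}")
```

      With these illustrative numbers the bound concentration rises toward the capacity while the bound fraction falls as the total increases, which is the qualitative behavior the response describes for a saturating process.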

      Regarding the concern about potential confounding effects due to the presence of wild-type GFP-tagged Dorsal (Dl[wt]-GFP): we understand the importance of addressing this point more directly. Therefore, we have imaged the Dorsal-GFP gradient in embryos expressing the UAS-dl[S280P]-GFP or the UAS-dl[S317A]-GFP constructs in the absence of the BAC-recombineered Dl-GFP construct. In both cases, the dl mutants by themselves were not able to recapitulate enough of the Dl gradient to test our hypotheses. We have added this analysis to Supplemental Figure 4 and mentioned this figure on Lines 333-336 and 354-358. Furthermore, we explicitly mention that our failure to reject the null hypothesis in the Toll phosphorylation mutant case may be due to the additional copy of Dl[wt]-GFP (the BAC-recombineered construct), with text added to Lines 343-345, 365-369 (Results) and 408-418 (Discussion).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al., use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyways (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding it is unclear what is the biological impact of this effect in a location where Dl is nearly absent. As can be seen in Figure 3F, the fraction of immobile is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling as well since the fraction is very low anyway.

      We thank the reviewer for pointing out the places where we could strengthen our explanations. Here we first address the criticism, also raised by the other reviewer, that the fraction of immobile Dl increases only a small amount (Fig. 5A). [In our reply to the next comment, we address the question of biological implications.] We attempted to explain this small effect size in the manuscript; however, we understand that we could clarify further and, given the fact that eLife has no restraints on space, we added more explanation in the main text.

      In essence, even though the effect was statistically significant, the effect size was small because the mutation was “diluted” by the presence of a wildtype Dl protein tagged with GFP. We were willing to deal with this dilution because the alternative was that, according to previous literature, without any wildtype Dl, no Dl gradient would be present in the reduced Toll phosphorylation mutants, and only a very weak Dl gradient (weakened on both ends) would be present in mutants that reduced Cact binding. We were confident that, with our quantitative approaches, we would be able to detect the diluted effect.

      However, because both reviewers have criticized this diluted effect, in this resubmission, we have included analysis of GFP-tagged mutants without the presence of wildtype Dl protein. Unfortunately, these embryos lack a discernible Dl gradient and cannot be analyzed in such a way as to test the hypotheses that the mutants were generated for.

      Even so, the effect of the Cact-binding mutant was strong enough that we were able to statistically distinguish it from embryos expressing only wildtype Dl-GFP, even with the dilution effect. On the other hand we have also included a caveat that our failure to statistically distinguish Toll phosphorylation mutants from wildtype may be due to the dilution effect. We now also explicitly state the concerns about a lack of a discernible Dl gradient and have included figures of full mutants in the supplement. See also our discussion of Reviewer 1’s similar comment.
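
      The dilution argument above can be sketched numerically. This is a minimal sketch, assuming that the measured immobile fraction in a mixed pool is the concentration-weighted average of the wildtype and mutant fractions; the function name `observed_fraction` and all numerical values are hypothetical and are not taken from the manuscript.

```python
def observed_fraction(f_wt, f_mut, mutant_share):
    """Immobile fraction measured in a mixed pool of GFP-tagged protein.

    Assumption (not from the source): the measurement reports the
    concentration-weighted average of the two species.
    """
    if not 0.0 <= mutant_share <= 1.0:
        raise ValueError("mutant_share must lie between 0 and 1")
    return mutant_share * f_mut + (1.0 - mutant_share) * f_wt


# Hypothetical immobile fractions for wildtype and mutant protein.
f_wt, f_mut = 0.05, 0.25

# Apparent effect size (difference from wildtype) at several mutant shares.
effects = {
    share: observed_fraction(f_wt, f_mut, share) - f_wt
    for share in (1.0, 0.5, 0.25)
}
for share, effect in effects.items():
    print(f"mutant share {share:.2f}: apparent effect {effect:.3f}")
```

      Under this assumption the apparent effect scales linearly with the mutant share, so a statistically detectable difference requires a larger underlying effect when wildtype protein is present.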

      While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is more written as a 'tools' paper (i.e., to exemplify how FCS methods and analysis can be used for biological discovery). This is ok, but I think that the authors should discuss further what are the biological implications of these findings other than the contribution to uncovering the biophysical parameters.

      Here we underscore the biological implications of our discovery that Cact is present in the nucleus on the dorsal side. The reviewer mentioned that Cact in the nucleus on the dorsal side appears to have little overall effect, because this is the location of the embryo where there is very little Dl in the first place, which raises the question of whether this discovery is impactful.

      While we previously used the final paragraph of the discussion to touch on the implications of this discovery, we acknowledge that we could have spent more time on the explanation. As such, we have expanded this final paragraph into two paragraphs. In the first of the two, we discuss in more detail the implications specifically of the Dl/Cact interactions in the dorsal-most nuclei, as understood by the results of this paper. In brief, knowing that Dl in the dorsal-most nuclei is bound by Cact results in an updated understanding of the Dl gradient, with increased dynamic range, robustness, and precision (but unknown shape).

      In the second of the two paragraphs, we discuss this result in light of our recent work on imaging Cact in live embryos, in which we have shown that Cact is present in all nuclei at roughly uniform levels. Taken together, we suggest that it is possible that Cact is bound to Dl in all nuclei (not just the dorsal-most), which would allow us to estimate the shape of the overall Dl gradient by subtracting off the fluorescence that stems from Dl/Cact complex.

      For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What is then the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should report in Figure 5 also what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

      We appreciate the reviewer’s suggestion that the rejection of the hypothesis that phosphorylation of Dl by Toll impacts Dl/DNA binding could be expanded upon further. For the role of Dl phosphorylation by Toll: we previously mentioned that this phosphorylation is known to enhance the nuclear import or retention of Dl, and that mutation of serine 317 to an alanine abolishes Toll-mediated phosphorylation of Dl, which results in embryos with no Dl gradient. We had also mentioned that phosphorylation of Dl is not known to affect its DNA binding, which is the hypothesis we sought to test by creating the dl[S317A]-GFP mutants. We did not image any mutants, or the UAS-dl[wt]-GFP control, in the lateral regions, for two reasons. First, this region is easily the smallest of the three regions, in terms of the percentage of the DV axis (see Fig. 1A). Second, because of the dilution effect, we knew the effect size would be small, and as such, we imaged only on the extreme ends of the gradient so that the most clear conclusion could be drawn about the effect that Toll phosphorylation might have on DNA binding of Dl.

      The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.

      We agree that there is some distortion of the relative spatial extents of the Dorsal gradient when NCR is used as an independent variable on a plot. However, we prefer the NCR on the horizontal axis because it is closer to the functional variable (Dl concentration, rather than spatial location) for the properties we studied.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I really enjoyed the first part of this paper and have only minor suggestions for improvement of the presentation. I am confused about the experimental approach for the final figure, distinguishing phosphorylation and cactus-dependent effects. I'll divide my comments between "First Part/General Suggestions", "Last Part", and finish with some minor typo observations.

      The gist of the issues with the last part of the paper could boil down to insufficient detail/explanation of the section. The discrepancy with expectation with Michaelis-Menten kinetics is presented in a total of three sentences and is not necessarily obvious to the general readership of eLife. The mutants chosen to distinguish the phosphorylation and cactus mechanisms could be described more (why these? aren't other residues phosphorylated?) and possibly why also having wild-type GFP-Dl in the measurements isn't confounding. Since there is unlimited space in this journal, it may be advisable to use this space to fill out these rationales and ideas.

      First part/General Suggestions:

      (1) For the RICS data, (Figures 1 and 2) there is a nice correlation between WT NC ratio and the selected low/med/hi Dl activity mutants. More-or-less the median values in, say, Figure 1E-G are reflected in Figure 1H. However, with the ccRICS data (Figure 3), it looks like there is less correspondence between the range of fraction bound estimates in, for instance, "ventral" in Figure 3D and '10b' in Figure 3E. Can the authors comment on this? Should the reader be able to make this kind of comparison, or does something about data collection for the wt/NCR measurements preclude direct comparison of magnitudes with the panel of mutants? (imaging setup, laser power, etc)?

      The reviewer is correct that there seems to be a discrepancy in the values of ψ between the wt embryos (ventral side) and the Toll10B embryos. It should be noted that the Toll10B embryos are not “ventral-like” in every way, in part because they have unknown activated Toll levels that might be above or below what is seen at the ventral midline in wildtype embryos, and in part because there is no DV gradient, and thus no shuttling in these embryos that would accumulate total Dorsal on the ventral midline. As such, comparisons between Toll10B embryos and the ventral side of wildtype embryos are not exactly one-to-one, and we are more confident in comparing among the mutants in an allelic series. To address this question, we have added a sentence to the end of the second paragraph of the “Dorsal/DNA binding exhibits a spatial gradient” subsection of the Results (Lines 233-235).

      (2) Materials and methods: Mounting and imaging of Drosophila embryos: the authors cite the "488 nm laser intensity ranged from 0.5% to 3.0%..." The values presented here are not useful for the general reader or an individual looking to replicate these conditions, as emission power produced from such values will vary from instrument to instrument. It is standard in these cases to report an estimated laser power (measured in watts) for each laser line, and a clear description of how such measurements were made (stationary beam, under scanning conditions, with what detector, etc). These measurements are valuable and the authors are strongly encouraged to report such measurements for their setup.

      We appreciate the reviewer’s suggestion and understand the importance of providing absolute laser power values for reproducibility. We have now included the laser power (in watts) for the laser lines on both microscopes used in this study. The revised text can be found in the Materials and Methods section, in the Lines 535-536 and 540.

      (3) The presentation of the data in Figure 4 is difficult to understand. Are the kymographs (A lower) representing the entire length of the big white arrow in A upper? Or do the dashed lines indicate the x-axis limits of the kymograph? It is difficult to tell from the figure legend, where the dashed lines are described as "areas where Dl-GFP movement is measured out of the nucleus." I believe that the authors can make these measurements and that Figure 4B reflects properties of "movement" of Dl out of the nucleus, but how they get there from these data is not clear to this reader. Perhaps a cartoon explaining the green lines and the orange lines in the kymograph or tightening the legend would help.

      We thank the reviewer for their feedback and understand the need for greater clarity in the text of the pCF section and in Figure 4. The widths of the kymographs in the lower panels correspond to the full widths of the images in the upper panels. The pCF measurements were taken at the y-coordinates at the level of the white arrows. The dashed vertical lines connecting the upper and lower panels illustrate two cases of locations along the x-axis of the image where Dl is crossing from inside a nucleus to outside. In the two illustrated cases, these crossings are accompanied by either zero Dl molecules being observed to cross the nuclear barrier (ventral image/kymograph on left) or delayed crossing of Dl molecules (dorsal image/kymograph on right). To address this concern, we have added more detail to the Fig. 4 legend and greatly expanded on a discussion of what pCF does in the text (the second and third paragraph of the section). We have also updated Fig. 4 to align with new explanations from the text: namely, describing the y-axis of the kymographs as Δt (instead of log(time)) and explicitly showing that the pair correlation is for pairs of pixels that are Δx = 6 pixels apart. Further details were also added to the relevant Methods section.
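
      The pair-correlation calculation described above can be sketched as follows. This is a generic illustration of a pair correlation function computed between pixels a fixed distance apart, not the analysis pipeline used in the study; the function name `pair_correlation`, the normalization convention, and the synthetic data are assumptions made for the example. Only the pixel separation of 6 is taken from the text.

```python
import numpy as np


def pair_correlation(carpet, dx, max_lag):
    """Pair correlation between pixels `dx` apart for time lags 0..max_lag-1.

    carpet: 2D array of intensities with shape (n_time, n_pixels).
    Returns an array of shape (max_lag, n_pixels - dx), where entry
    [lag, x] correlates pixel x at time t with pixel x + dx at time t + lag.
    """
    carpet = np.asarray(carpet, dtype=float)
    n_time, n_pix = carpet.shape
    first = carpet[:, : n_pix - dx]   # signal at position x
    second = carpet[:, dx:]           # signal at position x + dx
    out = np.empty((max_lag, n_pix - dx))
    for lag in range(max_lag):
        a = first[: n_time - lag]
        b = second[lag:]
        # Normalized cross-correlation minus 1 (assumed convention).
        out[lag] = (a * b).mean(axis=0) / (a.mean(axis=0) * b.mean(axis=0)) - 1.0
    return out


# Synthetic carpet: independent noise everywhere, except that pixel 6
# repeats the signal of pixel 0 with a delay of 3 time steps.
rng = np.random.default_rng(0)
carpet = rng.poisson(5.0, size=(2000, 12)).astype(float)
carpet[3:, 6] = carpet[:-3, 0]

g = pair_correlation(carpet, dx=6, max_lag=10)
peak_lag = int(np.argmax(g[:, 0]))
print("lag of maximum pair correlation for pixel 0:", peak_lag)
```

      In this toy example the correlation for the pixel pair (0, 6) peaks at the imposed delay, which is the sense in which a kymograph of the pair correlation reports the time a molecule takes to travel between the two positions. A column with no peak corresponds to no detected crossing.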

      (4) DV position in the wild-type imaging experiments is operationally determined through measurement of the Dorsal NC ratio. This makes sense, but the strategy is buried in the first paragraph of the results, and not discussed in the M & M. For readers unfamiliar with imaging the fly embryo or the nuances of the Dl gradient, perhaps a sentence or two explaining that embryos were oriented randomly along the DV axis, and DV positions of the imaging region were estimated by measuring the Dl NC ratio.

      We thank the reviewer for this helpful suggestion. To improve clarity, we have added a description of how DV position was determined to the Materials & Methods section (paragraph starting on Line 520). Specifically, we now state that embryos were randomly oriented along the DV axis and that we used the Dorsal NC ratio of intensity as a proxy for measuring the DV position in imaging experiments. Additionally, we have added a statement to the Results section to ensure that this strategy is more clearly introduced (Lines 143-144). We appreciate this recommendation, as it will help readers unfamiliar with fly embryo imaging better understand our approach.

      (5) It would be nice to report the corresponding NC-ratio values for Dl in each of the mutant conditions, perhaps as a supplement to Figure 1. Currently, Figure 1H relies on the (admittedly well-established) properties of the three mutants, but it feels that an additional nice quantitative link in the data can be drawn out here. Do the authors see the strict correlation between the wt and mutant diffusivity measurements at specific NC-ratios?

      We are hesitant to try to draw direct comparisons between the mutants and the behavior of the wildtype embryo at the corresponding NCR. This is because, in the context of these uniform mutants, the NCR is determined by a combination of at least three factors that we cannot measure or control for: the unknown strength of Toll signaling, the unknown capacity of Toll signaling (ie, the potential saturation of the cytoplasmic enzymes controlled by Toll signaling), and, most importantly, the lack of a shuttling mechanism that concentrates Dl on the ventral side of the embryo. As such, the NCR does not represent a continuous variable that transforms the behavior of one mutant into another (or from mutants into wt DV coordinates), as it does along the DV axis in wildtype embryo. This is why the mutant studies are presented as boxplots. At best, we were comfortable only in using the uniform mutants as an allelic series to produce gross trends. We have added a brief statement describing the shuttling caveat to the Results section (Lines 173-177).

      (6) In the section related to Dl nuclear export, the language used to describe Dl kinetics is ambiguous. The term "movement" is used seemingly as a catch-all for nuclear-import-export as distinguished from diffusion. However, diffusion is also a form of movement. Could this section be reworked to explicitly distinguish nuclear import-export and diffusive movements?

      We appreciate the reviewer’s suggestion and agree that the language used to describe Dl kinetics could be more precise. By way of explanation, the pCF analysis calculates the time scale on which Dl can exit the nucleus. pCF only gives a signal if it sees the same Dl molecule twice, at two different locations after some Δt amount of time has passed. Because of this, if a given Dl molecule in a ventral nucleus is being tracked, then that molecule has some probability that it is bound to DNA initially, which means it will take, on average, longer to exit the nucleus than a Dl molecule not initially bound to DNA. Therefore, on the ventral side, the time scale on which Dl exits the nucleus is longer than on the dorsal side (where DNA binding is not happening). This can be true even if the nuclear export rate constants are the same on the ventral side vs the dorsal side. As such, we were careful to choose language that did not imply that we were talking about a nuclear export rate constant. We have added this discussion to the end of the relevant Results section (Lines 308-315).

      We have also revised this section to explicitly distinguish between the mobility associated with exiting the nucleus and diffusive movement, while still trying to distinguish between the time scale of exiting the nucleus vs the nuclear export rate. Specifically, we now refer to ‘time scale of nuclear export’ when discussing transport across the nuclear envelope and reserve the term ‘diffusion’ for passive intracellular movement. Furthermore, we have edited a sentence in this section (Lines 291-293) to describe the distinction we are making between the time scale measured by pCF and the time scale commonly associated with nuclear export (that is, the reciprocal of the rate constant). We hope this clarification improves readability and conceptual clarity.
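
      The distinction between the time scale of exiting the nucleus and the nuclear export rate constant can be illustrated with a minimal two-state sketch. The model is an assumption made for illustration: a free molecule either exits at rate `k_out` or binds DNA at rate `k_on`, and a bound molecule must unbind at rate `k_off` before it can exit. The rate names and values are hypothetical and are not measurements from the study.

```python
def mean_exit_time(k_out, k_on, k_off, start_bound=False):
    """Mean first-passage time out of the nucleus in a two-state sketch.

    Solving the first-passage equations
        T_free  = 1/(k_on + k_out) + k_on/(k_on + k_out) * T_bound
        T_bound = 1/k_off + T_free
    gives T_free = (1/k_out) * (1 + k_on/k_off).
    """
    t_free = (1.0 / k_out) * (1.0 + k_on / k_off)
    return t_free + 1.0 / k_off if start_bound else t_free


k_out = 1.0  # identical export rate constant on both sides (assumed)

# Dorsal-like case: no DNA binding. Ventral-like case: binding allowed.
dorsal = mean_exit_time(k_out, k_on=0.0, k_off=1.0)
ventral = mean_exit_time(k_out, k_on=0.5, k_off=1.0)
ventral_bound = mean_exit_time(k_out, k_on=0.5, k_off=1.0, start_bound=True)

print(dorsal, ventral, ventral_bound)
```

      With the same `k_out` in both cases, the mean exit time is longer whenever DNA binding occurs, and longer still for a molecule that starts out bound. Only in the absence of binding does the exit time reduce to the reciprocal of the export rate constant.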

      Last Part:

      (1) There is an undersold argument centered on Michaelis-Menten kinetics that needs to be explicitly presented, especially since it motivates the final experiments of the paper, which are challenging. In the two sections describing how the data do not adhere to expectations based on Michaelis-Menten Kinetics, the assertion that "the fraction of immobile Dl is expected to decrease with increasing nuclear total Dl concentration" is only intuitively true if the system is saturated. Is the system demonstrably saturated? Another interpretation of this would be that these results demonstrate that the system is likely not saturated. In any case, the authors need to devote some space in the introduction and/or results and/or discussion to fully motivate this point.

      We agree that the reviewer has raised an important point: if the system is very far from saturation, then the fraction of immobile Dl is not expected to decrease with increasing nuclear total Dl concentration. But neither would it increase; it would instead stay flat. To correct this mistake, we have edited the sentences in question to acknowledge the far-from-saturation scenario, saying “at best, [the fraction bound] remain[s] constant” (Line 209). As such, our original point, which is that in no case would the fraction immobile increase [unless something else is going on besides affinity-based binding to DNA], is still valid.
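
      The behavior of bound concentration versus bound fraction in a saturating system can be sketched with a simple hyperbolic binding curve. The functional form, the parameter names `b_max` and `k_d`, and all values are illustrative assumptions; the sketch does not claim that Dl/DNA binding follows this particular curve.

```python
import numpy as np


def bound_concentration(total, b_max, k_d):
    """Hypothetical saturating curve: bound = b_max * total / (k_d + total)."""
    total = np.asarray(total, dtype=float)
    return b_max * total / (k_d + total)


def bound_fraction(total, b_max, k_d):
    """Fraction of the total pool that is bound."""
    total = np.asarray(total, dtype=float)
    return bound_concentration(total, b_max, k_d) / total


# Total concentrations spanning far-from-saturation to saturation
# (arbitrary units; b_max and k_d are assumed values).
total = np.array([0.001, 0.01, 0.1, 1.0, 10.0, 100.0])
b_max, k_d = 1.0, 1.0

conc = bound_concentration(total, b_max, k_d)
frac = bound_fraction(total, b_max, k_d)

for t, c, f in zip(total, conc, frac):
    print(f"total={t:8.3f}  bound={c:.4f}  fraction={f:.4f}")
```

      In this sketch the bound concentration rises monotonically with the total, while the bound fraction is nearly flat at the lowest totals and falls as the system saturates. At no point does the fraction increase.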

      (2) Wouldn't any argument on the basis of Michaelis-Menten need to rely on the assumption that the system is at steady-state? Reeves 2012 concludes that during the times measured here, Dl does not reach a steady state. It would be good, in the context of the point above, for the authors to clarify how this impacts the expectations of saturation and the application of M/M kinetics.

      We thank the reviewer for raising this important point. We apologize for not being clear on our points about M/M kinetics and would like to stress again that we are not claiming the system has M/M kinetics. We appealed to M/M kinetics only as a simple, intuitive example of a saturating system to point out the difference between bound concentration vs bound fraction as functions of total concentration. We did this because previous feedback on our manuscript suggested that the difference between these two variables needed to be made clearer. Because this point seemed controversial with both reviewers, we removed all mention of M/M kinetics and simply refer to the system as “saturating.” For further explanation, see the first paragraph of our response to Reviewer 1’s “weaknesses” in the public review.

      (3) It is not clear to me how the inclusion of wild-type, GFP-tagged dorsal in the experimental setup for Figure 5 is not confounding. For the S317 (phospho-) mutant, GFP-tagged alleles of both phospho- and wild-type Dl are expressed. The reasoning is that not enough phospho-mutant Dl gets into the nucleus, and this makes it difficult to distinguish the dorsal from the ventral side of the embryo, so in a dl mutant background, there is expression of wt GFP-dl from a BAC, and nos>Gal4 driven expression of a GFP-tagged S317A mutant dl. The measurements show that on the ventral side of the embryo, there is no difference in the fraction of bound Dl. Couldn't this be predominantly binding of wildtype GFP-Dl? How is this interpretable? Wouldn't it be easier to perform these measurements in a Tl 10b background (or to cross in UAS>Tl[10b]) and for the only GFP-tagged dl to be S317A? The same goes for the S234 mutant (could be done in the pelle mutant background).

      We thank the reviewer for raising the point that the confounding effect of wildtype Dl makes it difficult to interpret the results from the 317A mutant. Under the circumstances of the experimental design, we can best conclude that, if the null hypothesis is incorrect, the effect size was too small to detect with our sample size. As such, we have modified our discussion of the results of this experiment to carefully explain this caveat (rather than confidently saying that Toll phosphorylation has no effect). For further explanation, see the second paragraph of our response to Reviewer 1’s “weaknesses” in the public review, as well as our response to the related question raised by Reviewer 2 in the public review.

      Minor issues/typo stuff:

      (1) This reviewer notes that the submitted materials contain neither line numbers nor page numbers.

      We appreciate the reviewer’s feedback. We have now included line numbers and page numbers in the revised manuscript for easier reference.

      (2) First paragraph of results: "We imaged small regions of the embryo..." The parenthetical statement only cites pixel size and directs the reader to the methods. Without the total number of pixels, the pixel size value does not clarify how "small" the imaged region is. Consider including the xy area, pixel dimensions, and pixel size here to assert the smallness of the imaged area.

      We have added the requested information.

      (3) Second paragraph, Introduction: "Dorsal, one of three (Drosophila) homologs to mammalian NF-kB" (Add Drosophila). Also, aren't these orthologs?

      We have made these changes.

      (4) Last sentence of last paragraph in the introduction: Kind of a throw-away sentence. Consider revising.

      We thank the reviewer for making this point; the sentence was originally constructed to state that our quantitative measurements resulted in a biologically significant discovery. However, because Reviewer 2 also mentioned the question of biological significance, we have changed this final sentence to explicitly mention what the biological significance is: namely, an understanding of the Dl gradient that has superior dynamic range, spatial range, robustness, and precision.

      (5) Where is the median line in the S317A boxplot in Fig 5C?

      The median line is at ψ = 0. We have added an explanation of this to the Figure legend.

      (6) Materials & Methods: Fly transformation, typo: "Drosophila embryos were injected with 0.5 µl of each pUAST construct..." The volume of an entire Drosophila embryo is less than 0.5 µl; please revise the units to reflect the value injected. Most likely an absolute volume unit was stated where a concentration of an injection solution, delivered at significantly smaller volumes, was intended.

      We thank the reviewer for catching this typo. It was intended to indicate a concentration of 0.5 ng/μL, and we have made the appropriate changes.

      Reviewer #2 (Recommendations for the authors):

      (1) Perhaps this has been described in a prior publication (if this is the case, please simply state this somewhere in the Methods section where Dl-GFP embryos are described), but since Dl-GFP embryos have one copy of endogenous dl and one copy of Dl-GFP, how do potential differences in tagged vs. non-tagged Dl interactions with DNA or Cact affect their findings?

      The reviewer brings up a good point, and we acknowledge that any time a protein is tagged with GFP, the behavior of the protein may be affected. We have now explicitly added this caveat to our discussion in a new paragraph on Lines 420-429.

      (2) In the Discussion section, the authors argue that a major implication of their findings is the possibility that Cact binds Dl in the nuclei would imply that the true (active) Dl gradient may be unknown unless the unbounded Dl is separated from the Dl/Cact (inactive form). While this is an interesting point, this idea is not supported by the findings of Figure 5B where there is no effect in the fraction of Dl bound to DNA in the reduced Cactus binding mutants. The authors should report what happens in lateral regions in Figure 5 because perhaps there is an effect there (see comment on this in the Public Review).

      We thank the reviewer for the insight, as we did not directly discuss the implications of the middle column of Fig. 5B on our hypothesis. Indeed, our hypothesis is not supported by Fig. 5B; it is instead inconclusive (failure to reject H0). This is why we designed the second experiment (Fig. 5C) to test the Cactus hypothesis, because the effect size would be greater on the dorsal side.

      Furthermore, as pointed out by both reviewers, the presence of wildtype Dl-GFP in these experiments is confounding. We have discussed this elsewhere in our rebuttal, but briefly, this problem resulted in needing larger effect sizes to detect a statistically significant difference between wt and the mutant populations. This was a necessary evil that we were willing to deal with in order to ensure the Dl gradient could be established so that the dorsal vs ventral sides would be distinguishable. We have added a fuller discussion of these issues to the relevant Results section (Lines 333-336, 343-345, 354-359, 365-369) and also the Discussion section (Lines 412-418), including underscoring the fact that, from a falsification standpoint, the results in Fig. 5B do not allow us to reject either null hypothesis, possibly due to the confounding effect of wildtype Dl. We appreciate the reviewer’s point about this, and believe the changes suggested by the reviewer have improved the manuscript.

      On the other hand, we respectfully disagree with the reviewer that investigating either mutant in the lateral regions of the embryo would bear fruit. To the first approximation, it would be the average between the behaviors on the ventral vs. dorsal sides. For the S317A mutant, neither the ventral nor the dorsal side was conclusive in regards to our hypotheses. (Although we admit here that further investigation into why the S317A column in Fig. 5C was statistically different from wildtype, in the opposite direction from the S234P mutant, may be interesting in future work.) For the S234P mutant, the data were more conclusive on the side of the embryo where the effect size was expected to be large enough to detect a difference. In the lateral regions, the expectation would be that the effect size would be intermediate, which would make the interpretation of the results more difficult (i.e., more likely to be inconclusive). In contrast, as Fig. 5C is already conclusive, we are not confident there would be more information gained by imaging the lateral regions.

      (3) Is Figure 5A a wild-type embryo? If so, I think that the labels are misleading or unclear. Also, is it the same image as in Figure 1A? If so, I suggest replacing this with a schematic since it does not add any new data.

      We have eliminated the labels for the mutants and have added the following comment to the figure 5 legend “Same embryo as in Fig. 1A”.

      (4) Also in Figure 5, I suggest using labels to indicate the schematics instead of simply using their location. You could use 5A', 5A'' and 5A''', for example.

      We have made the suggested changes.

      (5) The use of some technical labels makes some figures difficult to read. I suggest using more simple labels for mutants in Figure 3F (replace R063C) or Figure 5B, C (replace S234P and S317A).

      We have made changes to Fig. 3F, Fig. 5B,C, and the corresponding places in the figure legends. We have labeled R063C as ↓DNA, S317A as ↓Toll, and S234P as ↓Cact.

      (6) I suggest reporting p-values consistently. For example, in Figure 4B, they use one or two asterisks to denote p-values less than 0.07 and 0.05, respectively, which is somehow arbitrary and unconventional. Why not report the actual values as in Figure 5C, for example? (By the way, I would report in Figure 5B the actual p-values as well, since a nonsignificant value is also reported in Figure 5C. Also in Figure 5C, report values in the same notation (decimal or scientific), i.e., either put 0.005 as 5x10^-3 or 10^-3 as 0.001).

      We have made the suggested changes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high-density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period, as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One potential concern is the apparent mismatch between the neural and behavioral data. Neural analyses indicate a clear separation of the activity associated with each mixture that is independent of the animal's ultimate choice. This would seemingly indicate that the animals are making errors despite correctly encoding the stimulus. Based solely on the neural data, one would expect the psychometric curve to be more "step-like" with a significantly steeper slope. One potential explanation for this observation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding at the single neuron level. Given the difficulty of isointensity matching concentrations, this concern is not paramount. However, the apparent mismatch between the neural and behavioral data should be acknowledged/addressed in the text.

      We thank the Reviewer for the insightful comments and thoughtful suggestions. Our electrophysiological recordings show that GC dynamically encodes stimulus concentration of mixture elements, dominant perceptual quality, and decisions of directional lick. With regard to the encoding of mixtures, the clear separation of activity associated with each mixture (Figure 3) is present at a trial-averaged pseudo-population level, and average activities associated with more similar, intermediate mixtures are closer to each other in this space. At the single-trial level, activities evoked by similar, intermediate mixtures are much harder to separate. This increased similarity can lead to behavioral errors resulting from either incorrect encoding of the stimulus or from the inability to interpret the stimulus to guide the correct decision. The psychometric function, which shows that more distinct stimuli (100/0 vs 0/100) lead to fewer mistakes than more ambiguous, intermediate mixtures (55/45 vs 45/55), is consistent with the increased ambiguity of responses to intermediate mixtures.

      The Reviewer is correct that there could be a slight mismatch in the perceived intensity of the mixture components. This mismatch could be the reason for the slight asymmetry in our psychometric function (Figure 1B). However, it is not uncommon for mice in these 2AC tasks to also have a motor laterality bias in their responses that manifests itself for the more ambiguous stimuli. We chose not to model this bias given its subtlety and its unknown origin. Rather, we chose to model an ideal scenario in which stimuli have matched intensity and no motor bias exists. In the revised manuscript we discuss this issue.

      Reviewer #1 (Recommendations for the authors):

      (1) The apparent mismatch between neural and behavioral data. I am providing more details in this section to hopefully better illustrate my concern.

      (a) Based on the authors' psychometric curve, sucrose appears to be a more salient signal causing the behavior to be shifted (e.g., a 50/50 mixture results in a >60% predicted behavioral performance). If both sucrose and salt were intensity-matched, a 50/50 mixture should result in a behavioral performance near 50%. The increased salience of sucrose could cause the animals to have lower overall performance despite accurate neural encoding. Alternatively, certain animals could display a strong side bias, skewing the data slightly. These issues have seemingly been fixed in the model data, which displays a more balanced psychometric curve. Accordingly, the model data seemingly displays a larger shift in error trials as compared to correct trials (Figure 6A).

      The reviewer is correct in observing that the average experimental psychometric curve in Figure 1B shows a slight shift in favor of the sucrose side with a 50/50 mixture. We fit psychometric curves to each session and the mean value of P(Sucrose choice | Stimulus = 50/50) across sessions was significantly different from 0.5 (one-sample t-test, p = 0.003), with 5 probabilities below 0.5 and 18 above it.
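
      As a rough complementary check on the reported session counts (5 sessions below 0.5 and 18 above), an exact two-sided sign test can be computed directly from those two numbers. This is only a sketch: it is not the one-sample t-test reported above, it ignores the magnitude of each session's deviation, and the function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(n_below: int, n_above: int) -> float:
    """Exact two-sided sign test p-value under H0: P(above) = 0.5."""
    n = n_below + n_above
    k = min(n_below, n_above)
    # Probability, on one side, of a split at least as lopsided as observed.
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double for a two-sided test and cap at 1.
    return min(1.0, 2 * one_sided)


# Session counts taken from the response: 5 below 0.5, 18 above.
p_value = sign_test_two_sided(n_below=5, n_above=18)
print(f"two-sided sign test p = {p_value:.4f}")
```

      With these counts the sign test gives a p-value of roughly 0.011, pointing in the same direction as the reported t-test while using less of the information in the data.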

      This slight bias could be attributed to a slight mismatch in the perceived intensity of the mixture components and/or lateral motor biases. In any case, it is subtle and its origins were not a focus of this study.

      Models were not trained to match the animals’ psychometric curves, but rather to choose correctly in an ideal scenario where stimuli have matched intensities. This explains why the model simulations lack the bias observed in animal behavior data.

      We do not believe that there is a mismatch between the experimental behavioral and neural data, as trial-averaged pseudo-population trajectories are farther in neural space for more discriminable stimuli and closer in neural space for more similar stimuli, consistent with behavioral performance that is high for more discriminable stimuli and low for more similar stimuli. Moreover, as the model also shows, a clear separation of trial-averaged trajectories still results in a sigmoidal performance function for trial-to-trial behavior.

      Finally, subtle behavioral biases would not necessarily be expected to appear in our dPCA analyses since we used this technique to find a single axis that best separates all stimuli conditions regardless of choice when the pseudo-population data are projected upon it. Additional modes of activity that explain less overall variance might better reflect biases.

      (b) Although I am not an expert at these analyses, I wonder whether the elevated bump (i.e., >0) in Figure 3C of the 55/45 mixture that occurs early in the stimulus presentation further supports the hypothesis mentioned above and could indicate an early signal of salience/increased intensity?

      The reviewer is correct that the 55/45 trajectory features a brief positive wave right after stimulus delivery before going negative. While this may be related to stimuli not being explicitly balanced for intensity, it could also reflect a signal related to ambiguity or balanced mixtures. We are hesitant to interpret this positive deflection as conclusive evidence of a bias in neural activity, given its short duration and the natural variability of neural signals.

      (2) The increase in step-perception neurons after the decision period is confusing (Figure 4C). The text states (line 246) "the analysis reveals a small and time-invariant proportion of step-perception neurons". However, the proportion doubles after the decision-making process, which is seemingly a significant change. Why does this occur? This observation is noticeably missing from the network data. Could it be attributed to a mislabeling of "step-choice" neurons, given the correlation between the left/right decision and sweet/salty? Either way, it is very noticeable and should be addressed.

      We cannot be sure of the reason for the increase in step-perception neurons after decisions. One possibility is that they are acting as feedback for learning, encoding the percept to compare with choice and outcome to improve performance. The model, which presumably learns the task differently from the animals, does not seem to leverage this signal for its own learning. We have modified the text, now referring to a “small but consistently present proportion” of step-perception neurons, and included this proposed explanation in the Discussion.

      (3) Optional: I think the authors are missing an opportunity to analyze the temporal aspect of this multiplex code using their network-based modeling approach. A significant proportion of neurons fall into different categories (i.e., step-perception/linear, etc.) at different time points. However, the virtual ablation experiments remove any neuron that falls into one of these categories at any time. By limiting the cell-specific virtual ablation to specific time windows, you could (I think) provide stronger evidence for the temporal sequence of the encoding of these perceptual aspects.

      This was an excellent suggestion for an additional modeling experiment, so we performed it. A new supplemental figure (Figure S8) and additional text in the revised manuscript showcase the results. In summary:

      In terms of behavioral results, ablating the linear coding units in the beginning (that is, silencing all units that are labeled linear in any bin within the first 1.2 s after stimulus onset for the entirety of the 1.2 s) significantly reduces performance, as does ablating the step-perception or step-choice coding units at the end (1.2 s prior to choice). The remaining combinations of coding type and timing of the ablation do not affect performance.

      Regarding the dynamics of coding types (compare Figure 7A), stimulus coding activity was significantly blunted only by ablating the linear coding units in the beginning, whereas choice coding activity was diminished by ablating the choice coding units at the end or by ablating the linear coding units at either the beginning or the end.
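
      The selection rule described above (silence every unit that carries a given label in any bin of a window, for the whole window) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function names, labels, and activity values are made up, and in an actual RNN the silencing would be applied during simulation so that downstream units are affected, rather than as a mask on stored activity.

```python
def units_to_ablate(labels, category, window):
    """Return indices of units labeled `category` in any bin of `window`.

    labels: list (one entry per unit) of per-bin category strings.
    window: (start_bin, stop_bin), with stop_bin exclusive.
    """
    start, stop = window
    return [u for u, per_bin in enumerate(labels)
            if category in per_bin[start:stop]]


def silence(activity, units, window):
    """Return a copy of `activity` with `units` zeroed for every bin in `window`."""
    start, stop = window
    silenced = [row[:] for row in activity]
    for u in units:
        for t in range(start, min(stop, len(silenced[u]))):
            silenced[u][t] = 0.0
    return silenced


# Toy example: 3 units x 6 time bins (hypothetical labels and rates).
labels = [
    ["linear", "linear", "other", "other", "step-choice", "step-choice"],
    ["other", "other", "other", "step-perception", "step-perception", "other"],
    ["other", "linear", "other", "other", "other", "other"],
]
activity = [[1.0] * 6 for _ in labels]

early = (0, 3)  # stands in for the first window after stimulus onset
early_linear = units_to_ablate(labels, "linear", early)
ablated = silence(activity, early_linear, early)
print(early_linear, ablated)
```

      Note that unit 0 is labeled step-choice later in the trial but is only silenced during the early window, which is the distinction the time-restricted ablation is meant to capture.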

      Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I have a couple of suggestions to further enhance the authors' important conclusions:

      My main comment is the distinction between constrained and unconstrained units. The authors train a small percentage of units to match the real neural data (constrained units), and then find some unconstrained units that are similar to the real neural data and some that are not. As far as I could tell, the relative fraction of constrained and unconstrained units in the trained RNN is not reported; I assume the constrained ones are a much smaller population, but this is unclear. The selection of different groups of neurons for the RNN ablation experiments appears to be based on their response profiles only. Therefore, if I understood correctly, both constrained and unconstrained units are ablated together for a given response category (e.g., linear or step-perception). It would be useful, therefore, to separately compare the effects of constrained vs. unconstrained RNN units.

      We thank the Reviewer for the constructive feedback. The Reviewer is correct that ablations were carried out with respect to response categories only and included both constrained and unconstrained units.

      The ratio of total units to constrained units was fixed at 5.88, thus constrained units were ~17% of the network and unconstrained units were ~83%. This value is specified in the Methods (RNN: Components and dynamics), but we have reported it in the Results of the revised manuscript for clarity.

      We have also edited the Methods because they wrongly stated that the ratio of unconstrained (rather than total) units to constrained units was 5.88.
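
      The difference between the two readings of the 5.88 ratio is easy to check numerically. The short sketch below (variable names are ours) shows that only the total-to-constrained reading reproduces the ~17% / ~83% split quoted above.

```python
ratio = 5.88  # value reported in the response

# Reading 1 (corrected): ratio = total units / constrained units.
constrained_if_total = 1 / ratio
unconstrained_if_total = 1 - constrained_if_total

# Reading 2 (original Methods wording): ratio = unconstrained / constrained.
constrained_if_unconstrained = 1 / (1 + ratio)

print(f"total/constrained:         {constrained_if_total:.1%} constrained, "
      f"{unconstrained_if_total:.1%} unconstrained")
print(f"unconstrained/constrained: {constrained_if_unconstrained:.1%} constrained")
```

      Under the second reading the constrained fraction would be closer to 14.5%, so the correction to the Methods changes the implied network composition by a few percentage points.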

      Specifically:

      (1) For the analyses in the initial version of the manuscript, the authors should specify how many units in each ablation category are constrained and unconstrained.

      In the revised manuscript, we have specified the fractions of constrained and unconstrained units within each response category. For convenience, they are reported here: linear = 194 constrained and 691 unconstrained units; step-perception = 147 constrained and 840 unconstrained units; step-choice = 129 constrained and 814 unconstrained units; “other” = 353 constrained and 1739 unconstrained units.
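
      For readers who prefer proportions, the counts listed above can be converted into the constrained fraction within each response category. The sketch below uses only the numbers quoted in this reply. Whether a unit can appear in more than one category, and how counts were pooled across networks, is not stated here, so the categories are treated independently.

```python
# (constrained, unconstrained) counts as quoted in the response.
counts = {
    "linear": (194, 691),
    "step-perception": (147, 840),
    "step-choice": (129, 814),
    "other": (353, 1739),
}

fractions = {}
for name, (constrained, unconstrained) in counts.items():
    fractions[name] = constrained / (constrained + unconstrained)
    print(f"{name:16s} {fractions[name]:.1%} constrained "
          f"({constrained} of {constrained + unconstrained})")
```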

      (2) The authors should repeat Figure 6, but only for unconstrained units to test how much of the effects in the initial version of Figure 6 are driven by constrained vs. unconstrained RNN units.

      In the revised version we have included two additional supplemental figures (Figures S5-6) where the analyses of Figure 6 are carried out separately for constrained and unconstrained units. In short, the results for the constrained units strongly resemble those for the experimental data, while the results for the unconstrained units strongly resemble those for all model units.

      (3) The authors should repeat Figure 7, but performing ablations separately on the constrained and unconstrained units to examine how the network behaves in each case and the resulting "behavioral" effect.

      The revised version includes a supplemental figure (Figure S7) with the results of these additional ablation simulations.

      In summary:

      In terms of behavioral performance, the prior results showing that ablating linear, step-perception, or step-choice units significantly impairs performance, while ablating “other” has no significant effect, hold even if ablation is restricted to only constrained or only unconstrained units. There is a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs performance more, most likely due to their larger population size.

      In terms of dynamics, to impair stimulus coding by ablating step-choice units, you must ablate them all; to impair stimulus coding by ablating linear or step-perception units, however, ablating just the unconstrained ones suffices. As before, ablating linear, step-perception, or step-choice units significantly impairs choice coding activity, while ablating “other” units does not; these results hold even if ablation is restricted to only constrained or only unconstrained units. Finally, there is again a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs dynamics more, most likely due to the larger population size.

      Reviewer #2 (Recommendations for the authors):

      (1) In addition to panel 5B, it would be informative to show data from individual mice and the corresponding RNNs trained on each mouse, to assess how closely they match. If available, including one representative example of a good match and one of a less accurate match would help the reader get a better sense of the data.

      Figure 5B shows the average behavioral performance of the model. Individual models were not trained directly on the psychometric curves of experimental sessions; they were trained to perform the task correctly. After successful training, model simulations were run with input noise to be able to produce a sigmoidal psychometric curve. However, although the input noise was tuned to capture the overall correct rate of the corresponding experimental session, we did not attempt to match the details of the psychometric curve. See also the next reply.

      (2) In addition to panel 5C, it would be useful to add examples of experimentally observed PSTHs and the corresponding activity trajectory for the units in the RNN trained to match them, for all the other coding patterns (step-perception and step-choice).

      We note that the PSTH in 5C is not an example of a linear coding unit as the Reviewer implies, but simply one with a good fit, and here the model’s output was produced in the absence of input noise. In order to classify step-perception and step-choice responses one needs error trials, but the model was trained without this input noise that induces errors (and produces a sigmoidal psychometric function) to match experimental PSTHs from correct trials only. Post-training simulations were then run with input noise to induce error trials, and model unit response profiles were classified based on this. However, there is no guarantee that error trials in the model match the error trials in the experiment; therefore, step-perception and step-choice units in the model may or may not be step-perception and step-choice units in the data. Despite this limitation, the revised manuscript includes additional examples, in Figure S2, of experimentally observed PSTHs and their corresponding model activity, to supplement Figure 5C and provide a better sense of the goodness-of-fit.

      (3) Electrophysiological data in Figure 2 - It would be helpful to provide statistics on how many neurons change their activity in each session.

      In the revised manuscript we have included across-session statistics for proportions of neurons that are taste-responsive and that show decision preparatory activity. We have also included tables (Tables S1 and S3) with the numbers of neurons that are taste-responsive and that show preparatory activity for each session in the experimental and model data.

      (4) Peak auROC selection - How was the peak auROC selected? Selecting only one bin for the peak could be potentially problematic and may result in the incorrect identification of an outlier that does not faithfully represent the neuron's overall activity. The peak selection could instead be based on several consecutive bins showing a consistent trend. If this approach was already implemented, the authors should explicitly describe it in the Methods section.

      Peak auROC was selected from a single bin (with average duration about 50 ms). While it is true that this may result in outlier neurons that transiently prefer one stimulus strongly but more consistently prefer the other, we opted for a simple criterion to sort the neurons into two categories for visualization. Adopting more stringent criteria that consider multiple bins may result in neurons that cannot be placed in either category, and we wanted a way to examine the entire pseudo-population. Also, the entire auROC trace is visualized in the heatmap, so potential outliers are not hidden and can be assessed by eye.
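
      The trade-off between the two criteria can be illustrated with a small sketch. Everything here is hypothetical (function names, the labels "A" and "B" for the two preferred stimuli, the run length `k`, the `margin`, and the example trace). It is meant only to show why a single-bin rule always assigns a category while a consecutive-bin rule can return none.

```python
def preference_single_bin(auroc):
    """Classify by the single bin farthest from chance (0.5)."""
    peak = max(auroc, key=lambda v: abs(v - 0.5))
    return "A" if peak > 0.5 else "B"


def preference_consecutive(auroc, k=3, margin=0.1):
    """Classify by the first run of k consecutive bins on the same side of
    chance, each at least `margin` away from 0.5. Returns None if no such
    run exists."""
    run_side, run_len = None, 0
    for v in auroc:
        if v >= 0.5 + margin:
            side = "A"
        elif v <= 0.5 - margin:
            side = "B"
        else:
            side = None
        if side is not None and side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, (1 if side is not None else 0)
        if run_side is not None and run_len >= k:
            return run_side
    return None


# Made-up trace: one transient bin favoring A, then a sustained preference for B.
trace = [0.5, 0.95, 0.5, 0.35, 0.3, 0.32, 0.5]
print(preference_single_bin(trace))    # driven by the transient bin
print(preference_consecutive(trace))   # driven by the sustained run
print(preference_consecutive([0.5, 0.55, 0.45, 0.5]))  # no qualifying run
```

      In this toy trace the two rules disagree, and a nearly flat trace is left unclassified by the stricter rule, which is the situation the single-bin criterion avoids.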

      Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision-making tasks, reflecting both sensory and decision variables. In the present study, Lang et al. set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods, with reference to the behavioral task and electrophysiological recordings/data analysis, are straightforward, solid, and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice design that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally-defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

      We thank the Reviewer for the positive assessment of our study.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I'm missing a clearly stated specific hypothesis and what is predicted on the basis of that hypothesis. What is the alternative?

      The null hypothesis is that single neuron activity patterns, even when clearly structured, do not matter for population activity or behavior. Alternatively, they do matter for these phenomena, and our model supports the alternative hypothesis. We have made this hypothesis clearer in the Introduction.

      (2) Discussion: Much of the text is a recap of the Introduction and Results sections. Please elaborate on the specific insights gained from the findings. The idea that tuned neurons in the sensory cortex are the basis for perception and perceptual decisions concerning the features being represented by those neurons is generally accepted. What the present study adds to this insight could be described more explicitly. On the other hand, the idea that small populations of tuned neurons are responsible for perception of taste/perceptual decisions about taste appears in contrast with previous accounts where stimulus features/decisions are reflected in correlated changes in activity across distributed populations of taste cortical neurons, including ones that are not necessarily tuned or even overtly responsive. How do the present findings relate to this idea?

      This is a very good point about reconciling these findings with past ones that have focused on coordinated changes across ensembles of neurons, i.e., metastable dynamics of internal (hidden) states. There is a brief mention of metastability toward the end of the Discussion, but we agree it deserves elaboration.

      This work does emphasize single unit activity, but in the context of, and as relevant to, population activity. We believe that the findings and frameworks of previous studies and those presented here are compatible rather than mutually exclusive. There is no reason why neurons with the coding patterns we studied here cannot coordinate with others to participate in the formation of different metastable states. The question of which—neurons with specific response profiles, or ensemble activity patterns that may involve these neurons?—is necessary and sufficient for producing perception and behavior during the mixture-based decision-making task is interesting but rather difficult to answer because of the single units’ contribution to both alternatives. One would need to utilize a manipulation that disrupts ensemble coordination without disrupting single unit activity to differentiate between them. We have made these points clearer in the Discussion.

      (3) Results: RNNs were based on data from single sessions -- how many neurons of each tuning type were observed in each session? In particular, there were 23 sessions but only 25 neurons total tuned to choice, suggesting that modelled choice neurons were based on ~1 neuron.

      The revised manuscript includes the session-by-session breakdown of response types for both experiment and model in two supplementary tables (Tables S2 and S4). We note that there are 25 neurons tuned to choice during the last 500 ms of the trial prior to decision, but 114 out of 626 neurons in total are tuned to choice in some time bin in the experimental data.

      (4) Minor: Indicate the time windows used for analysis of stimulus sampling, delay, and choice on the figures.

      The revised manuscript now includes the illustration of sampling and delay windows in Figure 2C-D, since we averaged the values over these windows for use in a 2-way ANOVA. All other figures either are associated with bin-by-bin analyses and have the first central and lateral licks (T and D) indicated, or have the time windows specified (e.g., Figure 4B, which uses [T, T + 0.5 s] and [D - 0.5 s, D]).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be overextended given the nature of the data (solely behavioral), the reliance on repeated d′ measures may obfuscate some of the results without clearer psychometric or regression-based analyses, and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

      We thank the editors for their positive assessment of the data quality and the novelty of our behavioral task, and for pointing out the limitations inherent in behavioral studies.

      We would like to clarify one important point regarding the use of d′ measures. While d′ was included to quantify sensitivity, our conclusions are not based solely on repeated d′ measures. In addition to d′, we analyzed raw behavioral data (correct and incorrect choice rates), and categorization performance was assessed using psychometric curves fitted with logistic regression models. These complementary analyses provide converging evidence and ensure that our interpretations are supported by multiple robust measures.
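
      For reference, the standard signal-detection definition of sensitivity is the difference between the z-transformed hit rate and false-alarm rate. The sketch below is a generic textbook version, not the computation described in the manuscript's Methods: the mapping of task responses onto "hits" and "false alarms" and the log-linear correction for extreme rates are assumptions made for illustration.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) keeps the rates away
    from 0 and 1, where the z-transform is infinite. This correction is an
    assumption of this sketch.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Made-up trial counts for illustration only.
print(d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90))
print(d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50))
```

      Because d′ combines two response rates into one number, reporting the raw correct and incorrect choice rates alongside it, as described above, lets readers see which of the two rates drives a given change.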

      In the revised manuscript, we have further strengthened the analyses by including additional regression-based assessments, reporting effect sizes for subtle effects, and refining the statistical methods for clarity and transparency.

      We fully acknowledge that this work is behavioral and does not directly reveal the underlying neural mechanisms. Nonetheless, the translational framework we have developed establishes a robust foundation for future studies. This platform can be directly applied in clinical research on autism and other neuropsychiatric conditions involving sensory-cognitive interactions, and provides a solid basis for subsequent mechanistic, causal, or computational investigations to uncover the neural circuits mediating these effects.

      We greatly appreciate the editors’ and reviewers’ guidance and believe the revisions have clarified and strengthened the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      We appreciate the reviewer’s statement highlighting the importance of our study.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

      We thank the reviewer for these constructive comments. We acknowledge that aspects of the analyses were previously difficult to follow, and we have reworked the Results section to improve clarity and transparency.

      We would like to emphasize that all d′ measures are complemented by analyses of raw response rates (correct and incorrect choices), ensuring that our interpretations are not solely dependent on this metric. In addition, we applied standard psychometric analyses wherever possible. For the training phase, only two stimulus amplitudes were presented, which precluded the construction of full psychometric curves; however, for the categorization phase, psychometric analyses were feasible and are reported in Figure 3. Specifically, psychometric functions were fitted to the data using logistic regression, allowing us to estimate both categorization bias (threshold) and precision (slope) across stimulus intensities. These analyses revealed no evidence of differences in categorization bias or precision in Fmr1-/y mice across stimulus strengths.
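
      To make the threshold and slope terminology concrete, the sketch below fits a two-parameter logistic psychometric function by maximum likelihood and reads off both quantities. It is a minimal illustration under stated assumptions: the data are synthetic (generated from known parameters, in arbitrary stimulus units), the Newton-Raphson fitting routine and all names are ours, and the manuscript's actual fitting procedure may differ.

```python
import math


def fit_logistic(x, n_choices, n_trials, iterations=50):
    """Maximum-likelihood fit of P(choice) = 1 / (1 + exp(-(b0 + b1 * x))).

    Uses Newton-Raphson on the binomial log-likelihood. Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, n_choices, n_trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)
            g0 += yi - ni * p            # gradient w.r.t. b0
            g1 += (yi - ni * p) * xi     # gradient w.r.t. b1
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


# Synthetic data: expected choice counts from known parameters (not real data).
true_b0, true_b1 = 0.5, 1.5
amplitudes = [-2.0, -1.0, 0.0, 1.0, 2.0]   # arbitrary, centered stimulus units
trials = [100] * len(amplitudes)
choices = [n / (1.0 + math.exp(-(true_b0 + true_b1 * a)))
           for a, n in zip(amplitudes, trials)]

b0, b1 = fit_logistic(amplitudes, choices, trials)
threshold = -b0 / b1   # stimulus value at which P(choice) = 0.5 (bias)
slope = b1             # steepness of the curve (precision)
print(f"threshold = {threshold:.3f}, slope = {slope:.3f}")
```

      In this parameterization a shift in threshold moves the curve left or right without changing its steepness, whereas a change in slope alters how sharply choices switch between categories, which is why the two are reported separately.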

      Following the reviewer’s suggestion, we have also added general linear model analyses that account for trial history, providing a complementary perspective on decision-making dynamics. Finally, while the calculation of d′ is detailed in the Methods, we have revised the Results to clearly explain its use and appropriateness in each relevant analysis.

      These revisions aim to provide a clearer, more comprehensive picture of the data while ensuring that all conclusions are supported by multiple complementary measures.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative.

      We thank the reviewer for the careful reading of our manuscript and for these constructive comments. We agree that our study is purely behavioral, and we appreciate the opportunity to clarify the scope and interpretation of our findings. The primary goal of this work was to characterize behavioral patterns during tactile discrimination and categorization in a translationally relevant mouse model of autism.

      Although we did not include direct neural recordings, causal manipulations, or computational modeling, our analyses combining choice behavior, sensitivity measures from signal detection theory, psychometric curves, and regression-based models of trial history provide a detailed and robust characterization of perceptual learning, stimulus discrimination, categorization, and the interplay of cognitive processes with tactile perception. The manuscript has been revised to explicitly state that our conclusions are behavioral, emphasizing that this work establishes a foundation for future studies aimed at elucidating the neural and circuit mechanisms underlying these sensory–cognitive interactions.

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered.

      Alternative explanations for our findings including differences in motivation, fatigue, satiety, stereotyped licking, or reward valuation were carefully considered. As described in the Methods, only testing sessions with >70% correct performance on the training stimuli (12 µm and 26 µm) were included, excluding sessions with reduced motivation, fatigue, satiety, or stereotyped licking that could confound performance on low- or high-salience stimuli.

      Although differences in reward valuation could affect learning speed, we observed no genotype differences in training duration (Fig. 1B-D, Fig. S1C-D). Sessions with disengagement were analyzed only during epochs of active task performance (information added to the revised Methods section, lines 619-620). Reward-driven choice biases were unlikely, as no genotype differences were observed in categorization bias (Fig. 3F) and GLM analyses confirmed that previous reward outcome did not affect current choices (Fig. 4D).

      Finally, altered reward valuation could increase miss rates. Elevated miss rates in Fmr1<sup>-/y</sup> mice were restricted to the lowest-intensity stimulus (12 µm) under high cognitive load, demonstrating a salience- and context-specific effect inconsistent with generalized motivational or reward deficits. The Discussion has been updated to clarify these points and delimit the scope of our interpretations (lines 483-499).

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. References to Load Theory were meant to provide conceptual inspiration for assessing attention in high cognitive load conditions during categorization, rather than to indicate a formal test. Moreover, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced facilitation of across-category discrimination. Finally, we agree that citing Adaptive Resonance Theory, which is grounded in artificial neural network models, could be misleading, and we have revised the text accordingly.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      We thank the reviewer for this comment and agree that our study is purely behavioral and does not provide direct mechanistic evidence for top-down pathway dysfunction. In the first version of the manuscript, the term “top-down” was used at the behavioral level, referring to the influence of higher-order cognitive processes (e.g., categorization, attention, sensory and choice history integration) on tactile perception, rather than to imply specific neural circuits.

      We acknowledge that identifying the neural pathways underlying these effects would require extensive mechanistic experiments, including identifying the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself and performing pathway-specific recordings and manipulations. Such work represents a substantial mechanistic research program beyond the scope of the present study.

      To clarify that our study does not provide insights into the neural underpinnings of the studied behavioral processes, we have revised the manuscript, removing the term “top-down” or replacing it with “higher-order processes” where appropriate. We also explicitly noted that future work using neural recordings or causal manipulations will be needed to uncover the neural underpinnings of these behavioral phenomena (lines 508-510).

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      We recognize that terms such as “reduced top-down categorization influence” and “choice consistency bias” are derived from behavioral observations. However, we respectfully note that these behavioral inferences are widely used in clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021) and are not inherently speculative.

      The translational impact of our work lies in the development of a robust behavioral platform that allows precise dissection of tactile perception and cognitive influences in a manner directly comparable to clinical studies. While we agree that neural, circuit-level, or causal manipulations would provide valuable mechanistic insight, the current study establishes a foundational behavioral framework that can guide and inform future investigations into the underlying neurobiological substrates.

      To ensure clarity, we have revised the manuscript throughout to explicitly indicate that all conclusions are based on behavioral measures and do not imply mechanistic evidence.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      We chose to present both statistically significant effects and trends to ensure transparency and to highlight that commonly used aggregate measures, such as d′, can sometimes obscure meaningful underlying patterns. In the text, p-values between 0.05 and 0.1 are described as trends without over-interpreting their significance. To further support interpretation, we have now computed effect sizes (Hedges’ g) for all subtle effects. In the revised manuscript, all interpretations of non-significant effects have been reworded to avoid overstatement.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      The number of mice used per genotype is consistent with standard practices in behavioral studies of sensory processing. To complement statistical analyses and account for small sample sizes, we have calculated effect sizes (Hedges’ g) for all subtle or trend-level effects (p ≈ 0.05–0.1), providing a measure of effect magnitude independent of sample size.
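
      For reference, a minimal sketch of a Hedges' g computation is given below. It assumes the common form of the statistic (Cohen's d with a pooled standard deviation, multiplied by the approximate small-sample correction 1 − 3/(4(n₁ + n₂) − 9)); the function name and the example values are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance

def hedges_g(group_a, group_b):
    """Standardized mean difference with a small-sample bias correction."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance from the two unbiased sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    # Approximate correction factor (assumed form, see lead-in).
    correction = 1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0)
    return cohens_d * correction

# Hypothetical per-animal miss rates, for illustration only.
group_1 = [0.3, 0.4, 0.5]
group_2 = [0.2, 0.3, 0.4]
print(round(hedges_g(group_1, group_2), 3))
```

      With three animals per group the correction factor is 0.8, which illustrates how strongly the correction shrinks the estimate at small sample sizes.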

      As the reviewer correctly noted, no animals were excluded as outliers, since observed variability reflects true biological differences rather than experimental or technical errors. In the revised manuscript, we re-examined all datasets for potential outliers, and when identified, analyses were performed both with and without the data point. Any results sensitive to single animals are explicitly reported. This procedure is now detailed in the Methods section (lines 675-679).

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      We thank the reviewer for highlighting this important point. To control for false positives arising from multiple comparisons, we applied the Bonferroni correction. This information has been added to the Methods section (line 682) to ensure transparency and reproducibility of all statistical tests.
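
      The Bonferroni procedure can be sketched as follows. This is a generic illustration, not the code used for the manuscript; the function name, the alpha level of 0.05, and the example p-values are assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and the corresponding reject decisions."""
    m = len(p_values)
    # Multiply each p-value by the number of comparisons, capping at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject

# Hypothetical p-values from three comparisons, for illustration only.
adjusted, reject = bonferroni([0.01, 0.02, 0.2])
print(adjusted, reject)
```

      Equivalently, each raw p-value can be compared against alpha divided by the number of comparisons.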

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      We thank the reviewer for raising this point, as this was not done intentionally. In the revised manuscript, miss rates for high- and low-salience stimuli were reanalyzed using a mixed-effects linear model, which appropriately accounts for repeated measurements within sessions (Fig. 5; Results section: lines 320-340). This analysis confirmed that Fmr1<sup>-/y</sup> mice exhibit increased miss rates specifically at the 12 µm amplitude, with the effect disappearing at higher low-salience amplitudes (18 µm). Post-hoc comparisons with Bonferroni correction revealed a strong trend for increased misses at 12 µm (t-test: t = -2.8437, p = 0.058, Hedges’ g = 1.23), while no significant differences were found at other amplitudes. The Methods section has been updated to detail this statistical approach for analyzing miss rates (lines 686-687).

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

      As mentioned above, our goal was not to directly test theoretical frameworks such as Adaptive Resonance Theory, Load Theory of Attention, or Weak Central Coherence, but rather to provide a context for interpreting our behavioral findings. In the revised manuscript, we have removed references to the Load Theory from the Results section and reframed the Discussion to emphasize that our results are consistent with certain predictions from these cognitive theories, without implying that the experiments directly assessed them. This clarifies that the interpretations are based on observed behavioral patterns, while still acknowledging the potential relevance of these frameworks to better understand tactile perception and cognition in autism.

      Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      We appreciate the reviewer’s positive assessment regarding our study’s translational value and the importance of our behavioral findings.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, provides additional insights into learning dynamics. In response, we have added these analyses to the revised manuscript (Fig. S1, Fig. S2), which illustrate both individual and group-level learning trajectories and trial-by-trial licking patterns.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While this is an interesting and important question, and is motivated by previous preclinical and clinical findings, it falls outside the scope of the current manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main Comments

      (1) This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism vs. WT controls. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention. The experiments seem well performed, with interesting results. I found certain aspects of the analysis not clearly explained, which made it difficult at times to understand.

      Please see specific details in the comments below.

      (2) To measure sensitivity, the authors present many comparisons of d' - sometimes between pairs of stimuli (or sometimes even for a single stimulus level).

      (a) Firstly, the calculation of d' for a single stimulus value is unclear (because the same proportion of high/low choices for a given stimulus can result from shifts in bias/criterion).

      We agree with the reviewer that calculating d′ for a single stimulus conflates sensitivity with response bias/criterion differences. For this reason, the panels showing d′ for individual stimulus amplitudes during training (Fig. 1F and 1G in the original manuscript) have been removed from the manuscript.

      In addition, we revised our d’ (Fig. 1E) and criterion calculations (Fig. 2A), treating the high amplitude stimuli as “signal” and low amplitude stimuli as “noise”, based on the Signal Detection Theory. The formulas used in the revised manuscript take into account correct responses during high amplitude stimuli and wrong responses during low amplitude stimuli to calculate the sensitivity and bias of the mice during discrimination in the training period.

      Sensitivity (d′) is now computed as:

      d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)

      and the criterion (c) as:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]
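
      The two formulas above can be sketched in a few lines. This is an illustrative sketch rather than the analysis code of the study: the function names are hypothetical, and the clipping of rates away from 0 and 1 (by half a trial) is an assumption added here so that the z-transform stays finite.

```python
from statistics import NormalDist

# z-transform: inverse of the standard normal cumulative distribution function.
_z = NormalDist().inv_cdf

def _clip(rate, n_trials):
    # Assumption (not from the source): keep rates away from 0 and 1
    # by half a trial so that z(rate) is finite.
    lower = 0.5 / n_trials
    return min(max(rate, lower), 1.0 - lower)

def dprime_and_criterion(right_given_high, right_given_low, n_high, n_low):
    """Sensitivity d' and criterion c from right-lick rates for high and low amplitudes."""
    z_high = _z(_clip(right_given_high, n_high))
    z_low = _z(_clip(right_given_low, n_low))
    d_prime = z_high - z_low
    criterion = -0.5 * (z_high + z_low)
    return d_prime, criterion

# Hypothetical rates for illustration only: 80% right licks on high-amplitude
# trials and 20% right licks on low-amplitude trials, 100 trials each.
d, c = dprime_and_criterion(0.8, 0.2, 100, 100)
print(round(d, 2), round(c, 2))
```

      In this symmetric example the criterion is zero, meaning no bias toward either lickport, while d′ is approximately 1.68.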

      (b) Secondly, while calculating d' makes sense for comparing two stimulus levels (like in the training condition), in the test condition (with a spread of stimuli), this becomes a little tedious - at times difficult to follow and unclear.

      I would have thought that sensitivity (at least for overall performance) would be better compared using data from all the stimuli - e.g. either using:

      (i) the sigma of the psychometric curve (although the downside of that approach is that it ignores history effects), or

      (ii) a logistic regression for the choices, given the stimuli, where the weights assigned to the stimulus magnitude indicate sensitivity (the advantage of that approach is that history effects, like the previous trials/choices can be used as regressors in the model). Accordingly, it can simultaneously also quantify the history effects. This could even be expanded to a GLMM (mixed effects for different mice).

      We thank the reviewer for this very valuable feedback. Indeed, during the testing phase, we calculated sensitivity d’ to probe the overall categorization sensitivity (Fig. 3H).

      (i) This analysis was only complementary to the psychometric curves (fitted on the rightward lick rate for each stimulus amplitude using a general linear model – Fig. 3A). As the reviewer proposes, we had calculated the sigma of the psychometric curve (Fig. 3G, slope) to assess categorization precision. Sensitivity calculations have also now been revised using the aforementioned formula (d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)).

      (ii) To incorporate history effects, we implemented generalized linear models (GLMs) with a binomial link function to predict high-salience licks (right-lick choices) based on the current stimulus, trial history, genotype, and their interactions. A main-effects model included current stimulus, previous stimulus, previous outcome, previous choice, and genotype, followed by interaction terms to assess genotype-specific modulation of history effects. These analyses are now presented in the new Figure 6.
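
      To illustrate how trial-history regressors of this kind can be assembled, the sketch below builds one design row per trial from the preceding trial. It is not the code used in the study: the function name and dictionary keys are hypothetical, and it assumes that every trial in the input has a recorded choice (miss trials would need separate handling). Genotype would be added as a per-animal column before fitting.

```python
def history_design_rows(trials):
    """Pair each trial with the one before it to build history regressors.

    trials: list of dicts with keys 'stimulus_um', 'choice_right' (0 or 1)
    and 'rewarded' (0 or 1), in the order the trials were presented.
    The first trial is dropped because it has no predecessor.
    """
    rows = []
    for previous, current in zip(trials, trials[1:]):
        rows.append({
            "current_stimulus": current["stimulus_um"],
            "previous_stimulus": previous["stimulus_um"],
            "previous_choice": previous["choice_right"],
            "previous_outcome": previous["rewarded"],
            "choice_right": current["choice_right"],  # response variable
        })
    return rows

# Hypothetical three-trial session, for illustration only.
session = [
    {"stimulus_um": 12, "choice_right": 0, "rewarded": 1},
    {"stimulus_um": 26, "choice_right": 1, "rewarded": 1},
    {"stimulus_um": 18, "choice_right": 1, "rewarded": 0},
]
for row in history_design_rows(session):
    print(row)
```

      Rows of this form can then be passed to any binomial regression routine, with interaction terms between genotype and each history regressor added at the modeling stage.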

      The resulting coefficients are shown in Fig. 6A. As expected, decisions were primarily driven by current stimulus amplitude (Fig. 6A, B). Both genotypes displayed a tendency to repeat previous choices (Fig. 6A, C), while previous reward outcomes did not influence current choice (Fig. 6A, D). Notably, stimulus amplitude history showed genotype-specific effects: WT mice were negatively influenced by the previous stimulus, whereas Fmr1<sup>-/y</sup> mice remained unaffected (Fig. 6A, E).

      To clearly visualize these findings, we plotted psychometric curves and marginal effects accounting for current stimulus, previous choice, previous outcome, and previous stimulus (Fig. 6B-E). These analyses are now fully integrated into the Methods (lines 688-702), Results (Fig. 6, lines 341-369), and Discussion (lines 469-479) sections of the revised manuscript.

      (3) I find some of the terminology used confusing/misleading:

      (a) The term "Categorization thresholds" can be misleading - in psychometric curves, "thresholds" often refer to the sigma (SD) of the fitted curve used to measure sensitivity (inversely related). Here, I think that the meaning is in terms of the PSE/criterion. Perhaps the terminology can be improved to prevent confusion on this matter. E.g., I think that here the authors mean a measure of bias/criterion/PSE or similar. Correct? Not really a perceptual "threshold".

      We thank the reviewer for pointing this out. In our analysis, the term “threshold” referred to the inflection point (i.e., the midpoint parameter μ) of the fitted logistic psychometric function used to categorize high- versus low-amplitude stimuli. We agree with the reviewer that we could also use the term “Categorization bias”. We originally opted to avoid this term so as not to confuse readers, given that the criterion (signal detection theory) is referred to as “response bias”. However, seeing as the term “threshold” may be confusing as well, we adopted the term “Categorization bias” in the updated version of the manuscript (lines 282, 284, 637-638, 785, Fig. 3F).

      (b) Similarly, I think that "Categorization accuracy" can be misleading when describing the slope of the psychometric curve. Performance could have a steep slope but still be quite inaccurate (e.g., if there is a big bias). Perhaps "precision" is a better description of the slope?

      We thank the reviewer for this suggestion. The slope of the psychometric curve is often referred to as “sensitivity” in the literature (Carandini and Churchland, 2014), but in our original manuscript we used the term “accuracy” to avoid confusion with the d′ measure from signal detection theory. We have revised the manuscript and Figures with the term “precision” as the reviewer suggested (lines 282, 284, 637-638, 786, Fig. 3G).

      Minor Comments

      (1) Abstract: "determines how autistic individuals engage" - there are other factors too. So, I think that "determines" is a little strong. Perhaps "influences" is more appropriate.

      We have incorporated the reviewer’s suggestion (line 7).

      (2) Figure 1 F, G. On the one hand, d' is defined as "sensitivity (d') in discriminating between high- and low-salience stimuli" - that seems to make sense. But then d' is also calculated and presented for each salience level on its own. How was this done? Namely, percent correct (or proportion of choices high/low salience) could be affected by criterion shifts as well as sensitivity. This makes calculating the d' for a single (low or high) salience stimulus ambiguous. So, how do these authors make this conclusion?

      We agree that calculating d′ for a single stimulus amplitude is ambiguous, because the resulting value conflates true stimulus sensitivity with shifts in response bias or criterion. Consequently, all analyses and figures reporting d′ for individual high- or low-salience stimuli (e.g., Figures 1F and 1G) have been removed from the revised manuscript.

      In the updated analyses, d′ is calculated only across high- versus low-salience stimuli, following standard Signal Detection Theory procedures, ensuring that it reflects true discriminability between the two categories (Methods, line 631; Figure 1E).

      (3) "Our results showed comparable correct choice rates in Fmr1-/y and WT mice (Fig. 1H), for both high- and low-salience stimuli (Fig. S1C-D). In contrast, Fmr1-/y mice presented a significantly higher rate of incorrect choices (Fig. 1I)." - aren't correct choices and incorrect choices complementary (i.e., 1-x) in a 2AFC? How is this possible?

      We thank the reviewer for pointing this out. Correct and incorrect choices are complementary at the single-trial level if miss trials are excluded. However, in our analyses, correct and incorrect choice rates were calculated by normalizing the number of correct or incorrect responses to the total number of trials (including misses), which breaks this complementarity and contributes to the differences observed in Fig. 1H–I. This was clarified in the Methods section (lines 616-617). Moreover, incorrect responses were less frequent than correct ones and are thought to reflect lapses, response bias, and impulsive responding rather than sensory performance, making them more sensitive to genotype-dependent differences in behavioral control. Based on this concept, we further examined whether incorrect choices were preferentially associated with specific stimulus amplitudes and assessed response bias and prior effects.
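
      A small numerical sketch makes the normalization explicit. The function name and the trial counts are hypothetical and serve only to illustrate why the two rates need not sum to one when misses are part of the denominator.

```python
def choice_rates(n_correct, n_incorrect, n_miss):
    """Rates normalized to all trials, with miss trials included in the denominator."""
    total = n_correct + n_incorrect + n_miss
    return n_correct / total, n_incorrect / total, n_miss / total

# Hypothetical session with 100 trials, for illustration only.
correct, incorrect, miss = choice_rates(70, 20, 10)
print(correct, incorrect, miss)
print(correct + incorrect)  # less than 1 whenever there are miss trials
```

      Two groups can therefore share the same correct rate while differing in incorrect rate, provided their miss rates differ in the opposite direction.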

      (4) The conclusion that "they showed a strong trend toward reduced sensitivity for low-salience stimuli (Fig. 1G)" has a confound - it could be that there was a criterion shift (rather than differences in sensitivity)?

      We agree with the reviewer that the previously reported trend in sensitivity for low-salience stimuli could reflect a criterion shift rather than true differences in sensory sensitivity. Because sensitivity estimates for individual stimulus amplitudes are not well-defined in a 2AFC framework, we have removed the sensitivity calculations for high- and low-salience stimuli considered independently. Instead, we now present salience-specific differences using correct and incorrect response rates for each stimulus amplitude, which more directly capture performance differences without assuming changes in sensory sensitivity (Fig. 1G-I, S1E-F).

      (5) Figure 3D, E - I stumbled over this in comparison to Figure 3B, C. That is because (a) In D and E, the authors compare right-lick responses (reporting high salience) to stimuli of 12 μm and 14 μm amplitude (Figure 3D) and low-salience lick rates for the same (Figure 3E). I would have thought that these approaches are simply complementary (1-x) - see related minor question above/below. So, what is the advantage of presenting them both?

      We presented both panels to clarify the source of the observed differences in performance. Specifically, showing right-lick responses (reporting high-salience choices) alongside low-salience lick rates allows us to distinguish whether reduced high-salience reporting arises from an actual shift in choice (e.g., increased leftward licking) or from an increase in miss trials at the lowest amplitude (12 µm). By presenting both, we can demonstrate that the effect is primarily driven by an increase in leftward choices rather than by missed responses, providing a more precise interpretation of behavioral changes. The complementary analysis for leftward choices has now been moved to the supplemental material (Fig. S5A) and the reason for this analysis has been clarified in the Results (lines 275-276).

      (b) In B and C, the authors compare two differences in stimulus magnitude (2 and 4 μm), but in Figure 3D and E, only one difference (2 μm) from two perspectives. I was expecting a comparison with stimuli differing by 4 μm in amplitude (comparable to the high stimulus comparison of 26 μm vs. 22 μm stimuli).

      We have indeed analyzed the 12 μm versus 16 μm stimulus pair, which corresponds to a 4 μm difference and is reliably discriminated by both genotypes. In the original manuscript, we did not include this comparison because of the differences already seen at a 2 μm amplitude difference. Based on the reviewer’s suggestion, we have now included the 12 μm vs. 16 μm comparison in the revised manuscript (Results, lines 270-272; Fig. 3E) to provide a complementary perspective consistent with the high-salience comparisons (26 μm vs. 22 μm).

      (c) "Sensitivity d' for high- and low-salience stimuli was calculated based on the Correct and Incorrect choice rate for high- and low-salience stimuli respectively." How were trials for which the animal did not respond taken into account? Were these part of the denominator? Or were these excluded when calculating proportions? (related to the Q regarding Figure 3 D,E above).

      Indeed, the Miss trials were part of the denominator. This is now clarified in the Methods section (line 631).

      (d) "c = d'(high)- d'(low)." - I did not understand this fully. There were several high and several low stimuli - so how were these calculated? Pooled for high and pooled for low? Per stimulus difference?

      This was indeed calculated for pooled high and low amplitudes during testing. In the revised manuscript, criterion c has been recalculated based on the average correct high rate (for stimuli of 20-26 µm amplitude) and average incorrect low rate (for stimuli of 12-18 µm amplitude), using the same formula as in the analysis of the training dataset:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]

      Pooling across amplitudes allows us to obtain a single summary measure of response bias toward the right lickport, independent of stimulus discriminability. This approach is consistent with standard signal detection theory practices when multiple stimulus levels are present.

      If the inter-trial interval is 5-10s, how is a 5s timeout a punishment?

      The 5 s timeout serves as a punishment by temporarily delaying access to the next trial and potential reward, thereby reducing the overall reward rate. Even though the inter-trial interval (ITI) varies between 5 and 10 s, the timeout increases the effective delay before the next opportunity to earn a reward, discouraging incorrect responses. This is consistent with standard operant conditioning procedures, where brief timeouts act as negative consequences without being overly severe. Across most trials, the timeout effectively reduces expected reward rate, though its impact is minimal when the ITI is already long.

      Reviewer #2 (Recommendations for the authors):

      Task-related questions:

      (1) What evidence is there that the 40 Hz, 12 μm stimulus is "low salience" while the 40 Hz, 26 μm stimulus is "high salience"? This seems like an arbitrary distinction without showing sensitivity curves across a group of animals. Better definitions of the stimuli and the actual forces applied are necessary.

      We thank the reviewer for this comment. Based on our previous work (Semelidou et al., bioRxiv; Accepted in Advanced Science), both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli are clearly suprathreshold. In the present study, however, stimulus salience is defined in a relative and operational manner within this suprathreshold range.

      Specifically, analysis of miss trials (Fig. S3E) shows that the 40 Hz, 12 μm stimulus consistently elicited a higher proportion of missed responses compared to the 40 Hz, 26 μm stimulus across animals, indicating lower behavioral performance for the lower-amplitude stimulus. We therefore refer to the 12 μm stimulus as “low salience” and the 26 μm stimulus as “high salience” to denote relative differences in perceptual strength and attentional engagement within the suprathreshold range, rather than differences in detectability or absolute sensory sensitivity. This definition has been clarified in the Methods (lines 583-587) and Results sections (lines 115-119; lines 225-227).

      (2) Sensitivity curves/detection thresholds for each mouse should be included in the study.

      We thank the reviewer for this suggestion. Sensitivity curves and detection thresholds for low-amplitude and low-frequency vibrotactile forepaw stimulation have been systematically characterized in our previous study (Semelidou et al., bioRxiv; Accepted in Advanced Science). In that work, we demonstrated that stimuli with similar amplitudes and an even lower frequency (10 Hz) than those used in the present study are reliably detectable by mice, confirming that both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli fall within the suprathreshold range.

      Because the goal of the present study was not to determine absolute detection thresholds but rather to examine discrimination and categorization performance within a suprathreshold range, we did not re-establish full psychometric detection curves for each mouse.

      We have clarified this rationale in the revised manuscript (Results, lines 108-113; Methods, lines 577-579).

      (3) What force is being applied during stimulus presentations? 12 or 26 μm does not provide enough information about the stimuli applied. What are the physical parameters of the indenter? What material, what tip size?

      Vibrotactile stimuli were delivered to the forepaw via a piezoelectric actuator. A 12.7 mm stainless steel post (ThorLabs) was mounted on the actuator vertically and a 0.6 mm stainless steel rod (ThorLabs) was clamped horizontally onto this post. The horizontal rod served as the contact bar on which the animal rested its right forepaw.

      Stimuli were sinusoidal vibrations at 40 Hz with peak-to-peak displacements of 12 μm (low salience) or 26 μm (high salience). The actuator displacement was calibrated prior to experiments to ensure accurate vibration amplitudes.

      Animals were positioned in the setup to ensure stable and consistent forepaw contact with the rod delivering the vibration. Pilot experiments with an extra sensor to monitor forepaw placement confirmed that the mice did not remove their forepaws from the bar before stimulus delivery. All this information is now added in the Methods section (lines 552-555, 580-582).

      (4) Only one vibration stimulus was used (40 Hz) - this preferentially activates specific subsets of low-threshold mechanoreceptors and not others. A range of vibrotactile stimuli (with varying frequencies) would be more useful. From this limited range of stimuli, it is difficult to assess whether the findings would extrapolate to other types of stimuli.

      We agree that using a single vibration frequency limits the generalization of our findings across the full range of mechanoreceptor subtypes and vibrotactile stimulus conditions. In the present study, we deliberately focused on amplitude discrimination within the flutter range (<50 Hz), as this frequency preferentially activates subsets of low-threshold mechanoreceptors relevant for flutter perception and is commonly used in clinical studies of tactile amplitude discrimination (Puts et al., 2014, 2017; Asaridou et al., 2022). By holding frequency constant and varying only amplitude, we were able to isolate amplitude-dependent perceptual and decision-making processes while minimizing frequency-dependent variability and to facilitate direct translational comparisons with human studies using similar flutter stimuli.

      We acknowledge, however, that extending the paradigm to additional, higher frequencies would help determine whether the observed effects generalize across mechanoreceptor channels. We have now added this point as a future direction in the Discussion section (lines 510-514).

      (5) The methods indicate that during the implementation of the water-restriction protocol, mice had access to a solid water supplement in their home cage. How did they control for how much water supplement was consumed by each mouse before the testing sessions?

      We thank the reviewer for raising this point. The solid water supplement was divided into premeasured individual portions, and each mouse received its allotted amount only after the daily training/testing session. Daily body weight measurements were used to monitor hydration and ensure that all animals maintained stable body weight. If necessary, supplemental water was adjusted to maintain animals within the approved weight range. This procedure is now described in the Methods section (lines 567-571).

      (6) A control version of the test, perhaps using a different sensory modality, would be useful for making conclusions.

      We agree that testing other sensory modalities would provide a useful control for assessing the generalizability of the observed effects. However, in the present study, we intentionally focused on the tactile modality, as touch has been shown to play a critical role in autism across sexes and predict other core behavioral symptoms. This makes touch particularly relevant for investigating translational mechanisms in this model.

      By specifically targeting tactile perception, we aimed to investigate the link between sensory discrimination, decision-making, and cognitive modulation within a modality that is strongly implicated in autism. Previous studies in autistic individuals have demonstrated similar interactions between cognitive processes and perceptual decision-making in the visual domain, suggesting that such effects may not be modality-specific. Nevertheless, extending this paradigm to additional sensory systems would be valuable to directly test whether comparable cognitive influences on perception generalize across modalities. We have now incorporated this perspective as a future direction in the Discussion section (lines 514-518).

      Reviewer #3 (Recommendations for the authors):

      There are several questions:

      (1) It is important to show stimulus intensity-response curves representing tactile responses for both WT and Fmr1-/y mice.

      We thank the reviewer for this important comment. Detection sensitivity curves for low-amplitude and low-frequency vibrotactile stimulation of the forepaw have been characterized in detail in our previous study (Semelidou et al., bioRxiv; now accepted in Advanced Science). In that work, we showed that stimuli at or above 8 µm amplitude and 10 Hz frequency are reliably detected by both WT and Fmr1<sup>-/y</sup> mice.

      Based on these findings, the current study employed vibrotactile stimuli at a higher frequency (40 Hz) and amplitudes of 12 µm and above, ensuring that all stimuli were well within the suprathreshold range for both genotypes. This experimental choice was made to specifically probe discrimination, categorization, and decision-making processes, rather than basic sensory detection. As a result, the behavioral effects reported here cannot be attributed to differences in stimulus detectability.

      We have clarified this rationale in the revised manuscript to make explicit that the absence of full intensity-response curves in the current study reflects a deliberate focus on suprathreshold perceptual and cognitive processes rather than sensory threshold differences (Results, lines 108-113; Methods, lines 577-579).

      (2) There is no difference in the time it takes to learn the task between WT and Fmr1-/y mice. But how does the learning rate curve look? Is there a difference in the slope between WT and Fmr1-/y early vs late into learning?

      We thank the reviewer for this suggestion. To directly address whether learning dynamics differed between genotypes, we analyzed learning curves across training.

      We first computed the correct choice rate per day for each animal (Fig. S2A) and fit a mixed-effects model including training day, genotype, and their interaction. This analysis revealed no genotype differences in baseline performance or learning rate, with a minimal Genotype × Day interaction (Fig. S2A-top, Fig. S2C).

      We also computed the slope of the learning curve for each individual, which showed no difference across genotypes (Fig. S2B). Within-animal day-to-day performance variability was likewise comparable across groups (Fig. S2A-bottom, S2D).

      These analyses indicate that WT and Fmr1<sup>-/y</sup> mice exhibit similar learning trajectories during training. The learning curves are now included in Figure S2, described in the Results (lines 140–151) and detailed in the Methods (lines 644-658).

      (3) It would be useful to see raster plots of licks for different trials and the corresponding lick density plots for early vs late trials.

      We thank the reviewer for this suggestion. To visualize trial-by-trial behavior, we included example lick traces from an early 100-trial session and a late 100-trial session, alongside the corresponding raster plots of licks (Fig. S1A–B).

      (4) Consistent with the first question, examples of intermediate learning stages would help gain more insight into how both WT and Fmr1-/y mice learn.

      In line with the reviewer’s suggestion, we examined whether WT and Fmr1<sup>-/y</sup> mice showed different performance during intermediate stages of learning. To this end, we defined the middle three days of the training period of each animal as the intermediate learning phase. We compared both the mean correct-choice rate and individual learning slopes across this interval. Statistical analyses revealed no significant genotype differences in either measure, indicating comparable performance and learning dynamics during the intermediate phase of training (lines 152-156).

      (5) How does the learning rate change with increased cognitive load for both WT and Fmr1-/y mice?

      We thank the reviewer for this question. While our experimental design did not include a manipulation of cognitive load during the learning phase itself, we assessed whether increased cognitive load affected performance by analyzing behavior on the first day of testing, when animals were required to categorize and discriminate among a larger set of stimuli compared to training.

      Using performance on the training stimuli during this first testing session as a proxy, we found no significant difference between WT and Fmr1<sup>-/y</sup> mice in correct choice rate (Author response image 1). This indicates that increased cognitive load did not differentially affect performance on familiar stimuli across genotypes at this stage.

      Because this analysis does not reflect learning rate per se, but rather performance under increased task demands after learning had already occurred, we did not incorporate it into the main Results section. Instead, it is presented here to directly address the reviewer’s question.

      Author response image 1.

      Correct choice rate for the 12 µm and 26 µm stimuli during the first day of testing when the cognitive load is high.

      (6) How does the learning rate change if the sensory stimuli are more challenging for both WT and Fmr1-/y to detect?

      We thank the reviewer for this question. In the present study, animals were deliberately trained using well-separated, suprathreshold low- and high-salience stimuli to ensure reliable stimulus detection and to avoid confounding learning rate with perceptual difficulty or discrimination limits.

      A recent study (Heimburg et al., 2025) has shown that learning is slower when the difference between the two training stimuli is reduced. Based on these results, we would expect that decreasing the separation between low- and high-salience stimuli would similarly increase training duration for both WT and Fmr1<sup>-/y</sup> mice, since our results do not indicate any discrimination or categorization deficits in the mouse model of autism. However, directly testing how stimulus difficulty modulates learning rate would require a dedicated manipulation of stimulus spacing during training and was beyond the scope of the current study.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals.

      These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank all reviewers for the valuable feedback and critical insight on our study. We acknowledge the concern that the manuscript, in its initial form, appeared descriptive and did not provide the mechanistic insight inferred from the current data. In the revised manuscript, we will (i) more clearly delineate what mechanistic inferences can be drawn from the existing data, (ii) expand our discussion of the caspase-independent mechanisms, and (iii) incorporate additional experiments/analyses aimed at identifying downstream effectors that mediate the observed phenotypes. In this revision plan, we have included six new figures addressing some of the major issues raised by reviewers.

      1. Specifically, to address questions about mechanistic insight, we generated stable ACSL1:HaloTag-expressing hESCs (currently presented as Figure 1A for reviewers). ACSL1 is a critical enzyme that catalyzes the first step of fatty acid oxidation at the outer mitochondrial membrane. Our previous analysis and work from the Opferman lab demonstrated that ACSL1 contains a BH3-like domain. Thus, we examined the effects of MCL-1 inhibition on the mitochondrial localization of this enzyme. Our findings indicate that MCL-1 inhibition displaces ACSL1 from the mitochondria (Figures 1B-C for reviewers). Our interpretation of the effects of MCL-1 inhibition is twofold: 1) as we show in our data, MCL-1 inhibition disrupts the mitochondrial cristae, altering the microenvironment for fatty acid oxidation, and 2) as seen in cancer cells, the MCL-1 inhibitor may also displace ACSL1 from the mitochondria. In the new version of the manuscript, we will focus on these two mechanisms as mechanistic outcomes of MCL-1 inhibition.
      2. We have included data from cells treated with perhexiline (CPT1/2 inhibitor) and etomoxir (CPT1a inhibitor) (Figure 2 for reviewers). This experiment determines whether direct perturbation of the FAO pathway mimics the effects of the MCL-1i.
      3. We have assayed the effects of MCL-1 inhibition on oxygen consumption rates in NPCs (currently presented as Figure 3 for reviewers).
      4. We will perform MCL-1:MICOS proximity ligation assays and/or immunoprecipitation assays to determine whether MCL-1 inhibitors disrupt the association of MCL-1 with MICOS. Preliminary data suggesting an association (albeit, very weak) are shown in Figure 4 for reviewers.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This study claims that beyond its canonical anti-apoptotic function, MCL-1 has essential non-apoptotic roles in human neurodevelopment. Pharmacologic inhibition of MCL-1 in human neural stem cells disrupts mitochondrial inner membrane architecture by destabilizing cristae and the OPA1-MICOS complex, leading to swollen mitochondria with disorganized cristae. These structural defects impair fatty acid oxidation and lipid droplet homeostasis, linking cristae integrity to metabolic competence. Independently of apoptosis or proliferation, MCL-1 inhibition selectively depletes intermediate neural progenitors, indicating a direct role in lineage progression. Overall, the work positions MCL-1 as a key regulator of mitochondrial structure-metabolism coupling that instructs neural progenitor identity and human neurogenesis.

      Overall: The study does a good job of using (in most assays) caspase inhibition (e.g., QVD treatment) to block apoptotic responses induced by MCL-1 inhibition. As a result, many of the phenotypes caused by inhibition are likely to be independent of caspase activation. Consequently, this manuscript would be of interest to researchers who study the BCL-2 family and cell death signaling, mitochondrial bioenergetics and dynamics, neurodevelopment, and cellular metabolism. However, as currently presented, the manuscript is only descriptive and lacks mechanistic insight.

      We thank Reviewer 1 for the insightful evaluation of our work. We are encouraged that the reviewer finds the study relevant to investigators in the fields of BCL-2 family biology, mitochondrial dynamics and bioenergetics, neurodevelopment, and cellular metabolism. We also thank the reviewer for pointing out the need to increase the mechanistic insight of our findings. As mentioned above, in the revised manuscript, we are proposing to address this.

      Major Concerns:

      1) The authors only use a single MCL-1 inhibitor and never use other non-targeting BH3-mimetics (such as venetoclax) as negative controls. This seems like a missed opportunity to demonstrate that the phenotypes observed are MCL-1 dependent.

      This is an excellent point. We will include venetoclax (ABT-199) to examine its effect on intermediate progenitors (TBR2+) and early-born neurons (βIII-tubulin+).

      2) There is no mechanism proposed in this study other than reliance upon QVD as not affecting the phenotypes. As submitted, the manuscript only can speculate that these phenotypes are due to non-apoptotic roles of MCL-1 inhibition. The authors have missed an opportunity to explore MCL-1's non-apoptotic functions directly.

      Mechanistically, we propose MCL-1 is acting in 2 ways: 1) as we show in our data, MCL-1 inhibition causes disruption of the mitochondrial cristae, altering the microenvironment for fatty acid oxidation, and 2) as seen in cancer cells, MCL-1 inhibitors may also displace ACSL1 from the mitochondria.

      In the past few weeks, since receiving the initial reviews, we have focused on testing the second possibility, since the accumulation of lipids was also seen in cancer cells (see PMID: 38503284). We have successfully generated stable ACSL1:HaloTag-expressing hESCs (Figure 1A for reviewers). Our findings, included here, show that ACSL1 is displaced from the mitochondria by MCL-1 inhibition in NPCs (Figures 1B-C for reviewers).

      Other concerns exist that weaken the impact of the study.

      1. Figure 1 should include the fact that QVD inhibition (shown in Sup Fig 2) does not obviate the phenotype induced by pharmacological inhibition of MCL-1 on mitochondrial morphology.

      We would like to clarify that QVD does prevent the phenotypes induced by MCL-1 inhibition on mitochondrial morphology. In Fig 1B, we report an increase in volume and surface area at 24h and 48h along with a decrease in mitochondrial content at 48h when NPCs were treated with MCL-1i only. However, NPCs co-treated with QVD in Supp Fig 2B did not exhibit any significant morphological phenotypes on average or at min/max values. Reviewer 1 may be referring to Fig 1B’s corresponding min/max values presented in Supp Fig 2A, where we reported an increase in max volume.

      | Figure # | Volume | Surface Area |
      | --- | --- | --- |
      | Fig 1B (MCL-1i only, avg values) | increase (avg vol) | increase (avg) |
      | Supp Fig 2B (MCL-1i+QVD) | no change | no change |
      | Supp Fig 2A (MCL-1i only, max/min values) | increase (max vol) | no change (max) |

      For clarity, we will move Supplementary Fig 2A into Supplementary Fig 1.

      Figure 2 would benefit from evidence that caspase inhibition does not repress the phenotype on mitochondrial cristae morphology (volume and area). Furthermore, the FIB-SEM data are very hard to appreciate as the size precludes visualization of individual mitochondria.

      While we included the visualization of the segmented mitochondria and cristae (Figure 2C), as well as snapshots through the z-stack for segmented cristae only (Figure 2E) and segmented mitochondria separately (Supp Figure 3A) in the original manuscript, we are also now attaching the FIB-SEM 3D reconstruction videos (New Supplementary Videos 1-2 for reviewers) (1. Mito and cristae, 2. Cristae only, 3. Mito only) for ease of visualization.

      Figure 3 reports that MIC60 and OPA1 appear to be downregulated in response to MCL-1 inhibition, but these appear to be more significant only when QVD is added. Why would the phenotype be obscured in the non-QVD setting (Fig. 2B&C)? How does MCL-1 inhibition lead to changes in MIC60/MICOS/OPA1? This seems quite preliminary at this point.

      In Figures 3B and 3C, we report decreased protein levels of short-form OPA1 and MIC10 only, not MIC60. We argue that our data with QVD show that the cell death function of MCL-1 (i.e., inhibiting cell death effectors from initiating the caspase cascade) is not the main trigger of the phenotypes we report (cristae dysregulation and fatty acid oxidation disruption). However, cells without functional cristae and/or with defects in FAO may not be able to survive long-term. Thus, QVD treatment preserves cells that might otherwise not survive the dismantling of such an essential structure. To confirm this, we have performed immunofluorescence of cleaved caspase 3 (Figure 5 for reviewers). These results show that MCL-1 inhibition at the time points of our study indeed does not result in increased activation of Caspase-3. We reported similar results of MCL-1 inhibition in oligodendrocyte precursor cells (Gil and Hanna et al., Glia, 2025, PMID: 41420072).

      The loss of MIC60 and OPA1 should repress electron transport chain function; are such impacts observed in the cultured cells? This could be shown by assessing oxygen consumption, etc. Such data would enhance the authors' conclusion that MCL-1 inhibition leads to defects in mitochondrial physiology.

      We completely agree with this comment by Reviewer 1. In our revision, we will include an assessment of mitochondrial oxygen consumption rate, using the Seahorse analyzer (mitochondrial stress test), of NPCs treated with MCL-1i. Preliminary data (n=3) are currently presented as Figure 3 for reviewers. Interestingly, these data show a more nuanced cellular response. Consistent with our conclusion that MCL-1 inhibition does not cause apoptotic cell death, MCL-1i did not affect mitochondrial respiration at baseline. The specific deficits appear in spare respiratory capacity and maximal respiration, meaning cells can sustain routine mitochondrial function but lose the ability to respond to increased energetic demand. This suggests MCL-1 inhibition creates a mitochondrial reserve deficiency rather than a generalized bioenergetic failure. The results with caspase inhibitors show a near-zero OCR across both 24h and 48h timepoints, and significant reductions in maximal respiration, spare respiratory capacity, and non-mitochondrial OCR. Remarkably, these conditions are not detrimental to newborn neurons, as shown in Figure 7. This is very interesting because it suggests that, under severe bioenergetic failure, neural stem cells (PAX6+) can differentiate into newborn neurons in a TBR2-independent manner. More relevant to this study, our results unequivocally demonstrate that TBR2-positive cells depend on the non-apoptotic function of MCL-1.
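
      For readers less familiar with the mitochondrial stress test, the respiration parameters named above are conventionally derived from the OCR trace as in the following Python sketch. This is only an illustration of the standard definitions, not the authors' analysis pipeline; the function name and the example numbers are hypothetical.

```python
def mito_stress_parameters(baseline_ocr, max_ocr_after_fccp, non_mito_ocr):
    """Derive standard mitochondrial stress test parameters from OCR readings.

    baseline_ocr:       OCR before any injection.
    max_ocr_after_fccp: highest OCR after uncoupler (FCCP) injection.
    non_mito_ocr:       OCR remaining after rotenone/antimycin A.
    """
    basal = baseline_ocr - non_mito_ocr          # basal respiration
    maximal = max_ocr_after_fccp - non_mito_ocr  # maximal respiration
    spare = maximal - basal                      # spare respiratory capacity
    return {"basal": basal, "maximal": maximal, "spare_capacity": spare}


# Hypothetical readings (pmol O2/min), for illustration only.
example = mito_stress_parameters(100.0, 250.0, 20.0)
```

      Because spare capacity is the difference between maximal and basal respiration, a treatment can leave basal respiration unchanged while still reducing maximal respiration and spare capacity, which is the pattern described for MCL-1i alone.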

      In Figure 4, the differences between transcripts (qPCR data) and protein (immunoblot) data are often confusing and not well explained. Why do the authors propose that mRNA expression is decreasing whereas the protein expression is increasing? Example CPT1. Furthermore, it is unclear what these data mean functionally? Is this reflective of enhanced lipid oxidation or simply a response to inhibition of fatty acid oxidation? Clarification of the impact of these findings is necessary.

      We agree with Reviewer 1 that the results could be hard to interpret. However, the effects of MCL-1 inhibitors on the transcription of fatty acid oxidation genes have been described in the work of Opferman and Walensky (PMID: 36198266). We speculate that the effects on transcription are triggered by mitochondrial signaling. The mechanistic insight into this phenomenon would be an interesting next step.

      In the case of CPT1, we addressed this comment and found that the difference is due to differential expression of isoforms. The RT-qPCR shown in Figure 4 is on CPT1c, whereas the western blot is on CPT1a. Unfortunately, after trying several products, we determined that there are no good antibodies for CPT1c. Thus, since we can’t compare gene and protein expression, we will include CPT1a RT-qPCR data to complement the western blot.

      The increase in lipid droplet number induced by MCL-1 inhibition has been previously documented, but it is unclear whether this increase is related to an inability to oxidize lipid (defective fatty acid oxidation) that leads to increases in the cellular abundance or whether this indicates that MCL-1 inhibition leads to enhanced storage. Do other inhibitors of fatty acid oxidation lead to similar increases in lipid droplet size and abundance? Does QVD inhibition affect this phenotype?

      This is a great point raised by Reviewer 1, and one we have also wondered about. We conducted an experiment using C16 BODIPY to address this point (Figure 6 for Reviewers). We observed no changes in C16 lipid droplet accumulation in count, volume, or surface area when cells were treated with MCL-1 inhibitor for 24 hours total, with or without a starvation period in the last 6 hours of treatment. However, we observed significant pan-lipid droplet accumulation in the same conditions. This contrast suggests that FAO of exogenous long-chain fatty acids is not reliant on MCL-1. This finding does not rule out a requirement of MCL-1 for other FAO processes, especially given the major limitation of how much C16 BODIPY (fluorescent palmitate) can be administered to the cells (10 µM), which was 10-fold less than what we exogenously supplied to the cells for the pan-BODIPY experiment (100 µM, see Figure 5). It is entirely possible that this small dose was not enough to detect any lipid droplet accumulation.

      We have now also included experiments using etomoxir and perhexiline to assess their effects on TBR2/PAX6 (Figure 2 for reviewers). The results indicate that inhibiting the FAO pathway does not fully mimic the effects of MCL-1i on TBR2. However, we show that MCL-1i displaces ACSL1 from the mitochondria, a step that is upstream of CPT1/2. We suggest a model in which the coordinated non-apoptotic function of MCL-1 at the outer mitochondrial membrane promotes ACSL1 activity and, in the inner mitochondrial membrane, regulates mitochondrial cristae morphology. While our data point to this model, we are limited by the available tools to investigate it further, but it will be a great direction for future experiments.

      For Figure 6, while these data may be very meaningful, as presented they are very hard to appreciate. Insets that show the neuronal populations would help to convey the point that the differentiation is impacted. Also, are there other methods that could confirm these observations (qPCR to show changes in differentiation).

      We agree with Reviewer 1. In the new version of the manuscript, we will include panels that zoom into the cell populations we quantified. The current panels will go to a new Supplemental figure. We will also add TUBB3 to the qPCR panel in the new version.

      Figure 7 is also very hard to appreciate. What is the reader to see? Can these be quantified? It seems that QVD may be rescuing in this figure; does this suggest that MCL-1 inhibition might be inducing death? All of this needs to be quantified.

      We will provide quantification of βIII-tubulin branching, and it will be included next to the images provided.

      BCL-XL has also been implicated in affecting mitochondrial electron transport chain function (See PMID: 19255249, 21926988, 21987637). Can BCL-XL inhibitors affect any of the phenotypes associated here?

      We will include experiments to test the effect of BCL-2 and BCL-XL inhibitors on TBR2 cells to address this comment.

      Please carefully avoid using the term "MCL-1 loss" when talking about pharmacological inhibition. Only genetic ablation (e.g. knockout, silencing, etc.) should be termed loss.

      We have now removed the reference to MCL-1 loss in line 199.

      Reviewer #1 (Significance (Required)):

      The study advances in human cells the impacts of MCL-1 inhibition. They replicate many impacts previously observed in mouse systems and refine analyses to impacts on MICOS complex, lipid droplet storage, and neuronal differentiation. While these findings are important and would be well received by a wide audience, the study fails to provide almost any mechanistic insight into how these phenotypes are being induced. The only common theme is that blocking caspase activation in many assays fails to block the phenotype.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript by Hanna et al. investigates non-apoptotic roles of MCL-1 in human neural stem cells and connects MCL-1 inhibition to mitochondrial cristae formation and beta-oxidation. Connecting these roles to brain development, the authors also show a reduction in the number of progenitor cells upon MCL-1 inhibition, independently of caspase activity. Throughout their work, the authors make use of an impressive array of imaging techniques. While the methods used offer sufficient evidence to connect MCL-1 inhibition to cristae architecture, the mechanistic underpinnings of this effect remain unexplored.

      We thank Reviewer 2 for the thoughtful and positive assessment of our manuscript. We appreciate the reviewer’s recognition that our study reveals non-apoptotic roles of MCL-1 in human neural stem cells. We are also grateful for the acknowledgment of the imaging approaches employed, which allowed us to connect MCL-1 function to cristae architecture with multiple complementary techniques. We acknowledge the reviewer’s point that the mechanistic basis by which MCL-1 influences cristae structure remains insufficiently defined. In the revised manuscript, we will clarify the limitations of the current data, expand our discussion of potential mechanisms, and incorporate additional analyses to identify downstream effectors that mediate these structural and metabolic changes.

      Major comments:

      - In Fig. 1B, the very same representative images are shown for both conditions (DMSO and S63845) at 48 hours.

      We are grateful to Reviewer 2 for catching this unintentional duplication that occurred during figure preparation. We have now corrected this issue.

      - For Western Blot analysis, it looks like the authors only quantified the band density of their proteins of interest without considering varying levels of control protein (Actin) levels. Normalizing the protein levels to actin would account for any differences in loaded protein amounts (although a Ponceau staining might be preferable still to exclude this). This is especially relevant for Fig. 4E, where actin levels visibly differ between the conditions.

      All WB quantifications were normalized to Actin (this detail is now added to the y-axis of all band density graphs and figure legends). In addition, we will transform the data to a logarithmic scale to “normalize” for gel-to-gel variability.
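
      The normalization described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names and densitometry values are hypothetical, and the log base (base 2, relative to a control lane) is an assumption, since the response only states that a logarithmic scale will be used.

```python
import math


def normalize_to_loading_control(target, actin):
    """Divide each target band density by the Actin density of the same lane."""
    if len(target) != len(actin):
        raise ValueError("target and actin must have one value per lane")
    return [t / a for t, a in zip(target, actin)]


def log2_relative_to_control(ratios, control_index=0):
    """Express each normalized ratio as log2 fold change versus the control lane.

    Assumption: base-2 logs relative to one control lane; the source does not
    specify the base or the reference.
    """
    reference = ratios[control_index]
    return [math.log2(r / reference) for r in ratios]


# Hypothetical densitometry readings (arbitrary units), one value per lane.
target_density = [1200.0, 600.0, 900.0]
actin_density = [1000.0, 1000.0, 1500.0]

ratios = normalize_to_loading_control(target_density, actin_density)
log_fold_changes = log2_relative_to_control(ratios)
```

      In this made-up example the third lane has more target signal than the second, but also more Actin, so both lanes end up with the same normalized ratio. This is the situation the reviewer flags for Fig. 4E.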

      - The authors offer evidence that MCL-1 inhibition impedes proteolytic cleavage of OPA1-L into the OPA-1-S isoforms, yet do not explore the mechanism behind this. Since OPA1 is cleaved by both OMA1 and YME1L, determination of the levels of these proteases could help shed some light on the mechanism leading to cristae reorganization.

      We will follow up on Reviewer 2's comment with a WB analysis of OMA1 and YME1L in cells treated with an MCL-1 inhibitor.

      - Generally speaking, while the authors show all those effects (cristae defects, FAO dysfunction) upon MCL-1 inhibition, it would be interesting to see whether any of those effects can be rescued by blocking FA import, e.g. through carnitine palmitoyltransferase 1a (CPT1a) inhibition with etomoxir, to understand if they are downstream of altered FA supply. This could affect cristae morphology through altered cardiolipin biogenesis.

      This is an excellent point, which was also raised by Reviewer 1. We have now included experiments using etomoxir and perhexiline to assess their effects on TBR2/PAX6 (Figure 2 for Reviewers). As mentioned above, the results indicate that inhibiting the FAO pathway does not fully mimic the effects of MCL-1i on TBR2. However, we show that MCL-1i displaces ACSL1 from the mitochondria, a step that is upstream of CPT1 and 2. We suggest a model in which the coordinated non-apoptotic function of MCL-1 at the outer mitochondrial membrane promotes ACSL1 activity and, in the inner mitochondrial membrane, regulates mitochondrial cristae morphology. While our data point to this model, we are limited by the tools to investigate it further, but it will be a great direction for future experiments. The suggestion of Reviewer 2 that the effects on FAO could impact cardiolipin biogenesis is a very exciting possibility. However, it is difficult to test with the tools available.

      - In line 262 the authors discuss that mitochondria lose metabolic function upon MCL-1 inhibition. This claim would require additional experiments. While the authors look at lipid droplet accumulation and FAO enzymes, there are many more aspects to mitochondrial metabolic function that should be investigated. While measuring the oxygen consumption rate via Seahorse might require additional resources (optional), measurements of ATP production, ROS generation or determination of the mitochondrial membrane potential should be feasible.

      We fully agree with Reviewer 2's comment, which was also raised by Reviewer 1. In our revision, we will include an assessment of the mitochondrial oxygen consumption rate of NPCs treated with MCL-1i, measured using the Seahorse analyzer (mitochondrial stress test). These data are presented as Figure 3 for reviewers. Interestingly, these data show a more nuanced cellular response. While MCL-1i does not globally collapse mitochondrial respiration at baseline, the specific deficits appear in spare respiratory capacity and maximal respiration, meaning cells can sustain routine mitochondrial function but lose the ability to respond to increased energetic demand. This suggests MCL-1 loss creates a mitochondrial reserve deficiency rather than a generalized bioenergetic failure. The results with caspase inhibitors show a near-zero OCR across both 24h and 48h timepoints, and significant reductions in maximal respiration, spare respiratory capacity, and non-mitochondrial OCR. These conditions are detrimental for TBR2-positive NPCs (Figure 6), but not for newborn neurons (Figure 7).
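
      For readers less familiar with the stress-test readouts named above, the sketch below shows how they are conventionally derived from an OCR trace. It assumes the standard injection order (oligomycin, FCCP, rotenone/antimycin A) and the usual definitions; the function name and the OCR values are hypothetical, and the authors' own analysis may differ in detail.

```python
def mito_stress_parameters(basal_ocr, oligo_ocr, fccp_ocr, rot_aa_ocr):
    """Derive conventional mitochondrial stress test parameters.

    Each argument is a list of OCR readings (e.g. pmol O2/min) taken during
    one phase of the assay, in measurement order.
    """
    # Oxygen consumption remaining after complex I/III inhibition.
    non_mito = min(rot_aa_ocr)
    # Basal respiration: last reading before the first injection.
    basal = basal_ocr[-1] - non_mito
    # Maximal respiration: highest reading after uncoupling with FCCP.
    maximal = max(fccp_ocr) - non_mito
    # Spare respiratory capacity: headroom between maximal and basal.
    spare = maximal - basal
    # ATP-linked respiration: drop caused by oligomycin.
    atp_linked = basal_ocr[-1] - min(oligo_ocr)
    return {
        "non_mitochondrial": non_mito,
        "basal": basal,
        "maximal": maximal,
        "spare_capacity": spare,
        "atp_linked": atp_linked,
    }


# Hypothetical OCR readings for a single well.
params = mito_stress_parameters(
    basal_ocr=[100.0, 102.0, 101.0],
    oligo_ocr=[40.0, 38.0, 39.0],
    fccp_ocr=[180.0, 200.0, 190.0],
    rot_aa_ocr=[20.0, 18.0, 19.0],
)
```

      Under these definitions, a condition can leave basal respiration unchanged while lowering maximal respiration, which shows up as reduced spare capacity only. This is the pattern the response describes as a reserve deficiency.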

      - While the authors "propose a model in which MCL-1 associates with MICOS", they do not offer direct scientific evidence to support this hypothesis. Co-immunoprecipitation experiments or e.g. proximity ligation assays would better support the proposed model.

      We agree with this statement. As a preliminary step, we have performed proximity ligation assays and immunoprecipitation analyses to test for this interaction (see below and Figure 4 for reviewers), and the results indicate an interaction, albeit a very weak one. In the revised version of the manuscript, we will attempt to repeat these experiments with MCL-1i.

      - While Fig. 7 shows representative images, quantification e.g. for the truncation of neuronal processes is missing.

      We will provide quantification of βIII-tubulin branching, which will be included alongside the images provided.

      - In lines 219f. the authors state that they "observed a significant downregulation of PAX6 and EOMES at 24 hours that was not rescued by QVD co-treatment". While there is still a trend towards a downregulation, there is no statistical significance anymore. In fact, PAX6 levels almost mirror those of SOX2 which is not described as "downregulated" by the authors. In order to be more consistent, I would suggest rephrasing this part, or at least reword it to be less absolute.

      In the new version, we will clarify that while QVD rescued TBR2 and PAX6 transcript levels at 24h, it did not rescue them at 48h. We will also mention the downregulation of SOX2 at 48h that persists with co-treatment.

      - Brinkmann et al. (2025) also investigated cristae structure upon MCL-1 deletion in vivo and found no effect when MCL-1 was replaced with other Bcl-2 family members. It would be interesting to combine MCL-1 inhibition with overexpression of MCL-1 versus BCL-XL to reconcile some of the discrepant findings.

      While this is a great suggestion for future studies, there are some complications. Specifically, it is likely that the inhibitor may also target the overexpressed MCL-1 and thus, a mutant form is needed.

      To address this, we generated a Flag-tagged MCL-1 construct with a mutated BH3 domain, previously described by Kotschy et al. Nature 2016. We validated the construct in HeLa cells, but unfortunately the mutant protein appears to be significantly less stable than the WT construct, complicating analysis of this experiment.

      Minor comments:

      - In Supp. Fig. 1C the MCL-1 protein is shown both to run above 37kDa (upper panel) and below 37 kDa (lower panel). Could the authors please comment on why this is the case?

      The observed variation is caused by drift in the gel during electrophoresis. In Fig 1C, the protein ladder is on the edge of the gel, whereas in Fig 1E, the protein ladder is in the middle of the gel, and the last sample is on the edge and also exhibits edge drift.

      - In line 64 of the introduction the authors mention clinical trials yet do not give a citation for these trials making it hard to judge whether the content of these trials is actually related to the brain.

      This information is anecdotal, based on an Amgen press release.

      - MCL-1 as well as ACSL-1 are sometimes written without the hyphen both in the text and figures.

      We will carefully check the manuscript before submission.

      - Lines 92-94 and 106-108 essentially highlight the same existing knowledge gap. Maybe the content of these two paragraphs could be combined in order to avoid repetition.

      We thank Reviewer 2 for this suggestion. We will do this in the new version of the manuscript.

      - In Fig. 1A, the authors provide a schematic for their experimental design. While the figure legend is very thorough, some of this information (like the days of collection) could also be included in the figure itself. The same is true for schematics in the following figures.

      We agree with this and will incorporate the suggestion in the new version.

      - Fig. 2A includes a typo (analyze) but would maybe also be more suitable for the supplement figures or could even be combined with Fig. 1A as not much new content is added.

      We have already incorporated these changes in the new version of the manuscript.

      - Regarding statistical analysis, could the authors please comment on why they did not consider one-sample t-tests suitable for the cases where control values were set at 1 (e.g. Fig. 4B, C for the relative expression).

      This is a valid suggestion. We will rerun RT-qPCR data using a one-sample t-test.
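
      As a minimal sketch of the test the reviewer suggests, the function below computes the one-sample t statistic for relative expression values against a fixed control value of 1. The function name and data are hypothetical. The p-value is not computed here; it would be read from a Student's t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_1samp`).

```python
import math
import statistics


def one_sample_t(values, popmean=1.0):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == popmean."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are required")
    sample_sd = statistics.stdev(values)  # uses the n - 1 denominator
    if sample_sd == 0:
        raise ValueError("t statistic is undefined when all values are identical")
    standard_error = sample_sd / math.sqrt(n)
    t_stat = (statistics.fmean(values) - popmean) / standard_error
    return t_stat, n - 1


# Hypothetical relative expression values from three biological replicates,
# each already expressed relative to a control fixed at 1.
t_stat, dof = one_sample_t([0.5, 0.7, 0.6], popmean=1.0)
```

      Because the control is fixed at 1 by construction, it has no variance, which is why a one-sample test against 1 is suggested in place of a two-sample comparison.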

      - In lines 247f. the authors state that "inhibition of MCL-1 leads to [...] and disassembly of the MICOS complex as well as OPA1". This sounds like OPA1 is still cleaved upon MCL-1 inhibition, which is not at all what the authors showed and further discuss. Rewording of the sentence would help in avoiding any misunderstandings.

      We agree with this comment and have now reworded the paragraph: “Inhibition of MCL-1 leads to structural collapse of the cristae likely due to the possible disassembly of the MICOS complex, as suggested by decreased MIC10 levels, and interruption of OPA1 cleavage, as suggested by decreased short-form OPA1, two scaffolds required for cristae maintenance.”

      - In lines 210f. the authors state that "quantitative imaging increased the average and maximum volume of lipid droplets". While there is definitely a trend towards an increase for the maximum volume, the increase is in fact not statistically significant. This should be reflected in the wording.

      We have reworded this to “Quantitative imaging revealed a significant increase in average lipid droplet volume and a trending increase in maximum volume of lipid droplets.”

      - In Fig. 6 the overlap between TBR2 and PAX6 is hard to judge when printed out. Including a zoom-in may make it easier to judge.

      We agree with Reviewer 2. In the new version of the manuscript, we will include panels that zoom into the cell populations we quantified. The current panels will go to a new Supplemental figure. We will also add the TUBB3 to the qPCR panel in the new version.

      - In Fig. 7 the color-coding is listed in the figure legend but is missing from the figure itself. If the authors could include this, as they did for the other figures, it would further improve this figure.

      We agree. We have specified the channel color in the new figure.

      - Line 238 should reference Fig. 7A, as Fig 7B does not exist.

      Thanks for catching this. It is already corrected.

      - In the figure legends the authors state that biological replicates were used. Were technical replicates also performed?

      Yes, technical replicates were performed for RT-qPCR.

      Reviewer #2 (Significance (Required)):

      The authors make use of a wide array of imaging techniques to further elucidate non-apoptotic roles of MCL-1. The study has the potential to offer new insights into mitochondrial biology on the level of basic research rather than translational. While the methods used offer sufficient evidence to connect MCL-1 inhibition to cristae architecture, the mechanistic underpinnings of this effect remain unexplored. Nevertheless, the study offers additional knowledge on the role of MCL-1 in human neural stem cells, whereas previous research mostly focused on cardiomyocytes or cancer cells.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this study, Gama et al. describe a non-canonical role for the anti-apoptotic protein Myeloid Cell Leukemia-1 (MCL-1) in mitochondrial cristae organization and suggest a role of MCL-1 in regulating metabolism and neuronal differentiation. Using fluorescence microscopy imaging and electron microscopy, the authors show changes to mitochondrial morphology upon treatment with MCL-1 inhibitor S63845. MCL-1 inhibition results in altered protein and transcript levels of some key proteins involved in mitochondrial cristae organization and fatty acid metabolism. While some of the findings are interesting and indeed point towards a non-canonical role of MCL-1, several key conclusions of the authors are not sufficiently supported by the data shown in the manuscript.

      We thank Reviewer 3 for the careful evaluation of our manuscript. We appreciate the reviewer’s recognition that our study identifies a potential non-canonical role for MCL-1 in mitochondrial cristae organization, metabolism, and neuronal differentiation. As with Reviewers 1 and 2, we are encouraged that the reviewer finds these observations interesting and suggestive of previously unappreciated functions for MCL-1. We agree that stronger evidence is required to firmly link MCL-1 inhibition to specific changes in MICOS organization and metabolic regulation. In the revised manuscript, we will (i) more clearly distinguish between observations and mechanistic inferences, (ii) temper conclusions where appropriate, and (iii) incorporate additional analyses and controls to better substantiate the proposed model.

      Major comments:

      1. The authors try to disentangle the apoptotic and non-apoptotic role of MCL-1 through addition of a caspase inhibitor. However, I am not convinced that phenotypes found under the addition of caspase inhibitor are necessarily caused by non-canonical functions independent of apoptosis. It could also be that the observed changes happen upstream of caspase activation. In addition, many of the described findings, such as CPT1 expression changes, only happen in the presence of the caspase inhibitor. If one follows the logic of the authors, changes associated with non-canonical MCL-1 functions should happen under MCL-1 inhibition and caspase inhibition, but not with MCL-1 inhibition only.

      The reviewer is right that we expected non-canonical functions to happen under MCL-1 inhibition and caspase inhibition. Our data with QVD show that the cell death function of MCL-1 (i.e., inhibiting cell death effectors from initiating the caspase cascade) is not the main trigger of the phenotypes we report (cristae dysregulation and fatty acid oxidation disruption). However, cells without functional cristae and/or with defects in FAO may not be able to survive long-term. Thus, QVD treatment preserves these cells that may not survive the dismantling of such an essential structure. To confirm this, we performed immunofluorescence of cleaved caspase 3 (Figure 5 for reviewers). These results show that, indeed, MCL-1 inhibition at the time points of our study doesn’t result in increased Caspase-3 activation. We reported similar results of MCL-1 inhibition in oligodendrocyte precursor cells (Gil and Hanna et al., Glia, 2025, PMID: 41420072).

      The authors show no data on the viability of the cells in response to the MCL-1 inhibitor. To exclude secondary effects of the inhibitor, at least some of the results should be validated with an MCL-1 knock down.

      We will include this experiment in our revised manuscript. To check the effects of MCL-1 knockdown on TBR2-positive cells, we tested 5 different ASOs for MCL-1. Knockdown efficiency with ASOs was very low (on average

      In Figure 1, the authors show immunofluorescence data of mitochondria and nucleus staining and conclude that MCL-1 inhibition alters mitochondrial morphology. Based on the images shown in Fig. 1, I do not think that individual mitochondria can be segmented to measure their volume and length. In addition, some metrics such as mitochondrial content are not explained in the text or methods.

      We can achieve mitochondrial segmentation with a SoRa Spinning Disk Confocal Microscope, which has a lateral (XY) resolution of approximately 120–150 nm and an axial (Z) resolution of approximately 300–320 nm. All images are first denoised prior to sharpening using the Richardson-Lucy deconvolution algorithm. Additionally, the FIB-SEM data are consistent with the IF data (both show an increase in mitochondrial volume and surface area).

      We agree with Reviewer 3 that we need to explain some metrics in the revised version. We will specify the meaning of mitochondrial content (count of all mitochondria in FOV, not normalized to Hoechst).

      In Fig. 2 B-D, the authors show TEM and FIB-SEM imaging to demonstrate alterations in the cristae architecture upon treatment with MCL-1 inhibitor. However, based on the images shown, it looks as though cristae area and density are reduced under S63845 treatment in TEM images, while the FIB-SEM data come to the opposite conclusion. In addition, the quantification of cristae volume quantified as cristae volume in percentage is unclear to me.

      We apologize for the confusion. No conclusions about the cristae area and density were made using the TEM data, because TEM data represent a single snapshot section of a mitochondrion without a discernible orientation. Cristae in the TEM images were described only as “aberrant”. These preliminary observations of cristae changes were then followed up with FIB-SEM, 3D reconstruction of intact mitochondria, and quantification of volume.

      In the new version of the manuscript, we will specify that the cristae volume is normalized to the volume of its respective mitochondria (i.e., how much of the mitochondrial volume is attributed to cristae).

      The change in CPT1/2 protein levels (Fig. 4) is interesting but does not directly prove that fatty acid oxidation is altered, as concluded by the authors. For this, the authors would need to directly measure fatty acid oxidation for example using Seahorse or metabolic tracing experiments. Also, to prove that the MCL-1 inhibition affects neural differentiation through fatty acid oxidation, a rescue experiment should be performed through CPT1 overexpression.

      We agree that this is an important point. We have optimized the fatty acid oxidation test using Seahorse and will make sure to include it in the revised version of the manuscript.

      In Figure 6, the authors show decreased intermediate progenitor cells after MCL-1 inhibition by immunofluorescence staining. I am not convinced that this can be concluded from the data shown, since the concentration of intermediate progenitor cells is very close to the noise levels. Since the MCL-1 treated cells look much less sparse, I don't think the percentages can be compared (total counts are between 2-20). Although this data might give some indication that differentiation could be impaired, the measured effect could be very well due to lower viability of the cells. The authors need to control for this or come up with a different method for measuring differentiation.

      The number of TBR2-positive cells is low, but we disagree with the reviewer’s assessment of noise levels. We focused on cells expressing only TBR2 and rigorously examined this population of cells. The percentages are compared to account for the lower density of the MCL-1i-treated cultures, as the IPC counts are normalized to the Hoechst total cell count within the FOV. Moreover, the immunofluorescence images are complemented with RT-qPCR, which shows significant downregulation of EOMES (gene encoding TBR2).

      Figure 7 is missing quantification.

      We will include this quantification in the revised version of the manuscript.

      Reviewer #3 (Significance (Required)):

      General assessment: The manuscript reports an interesting finding, which suggests a non-canonical role of MCL-1 in mitochondrial remodeling, regulation of fatty acid oxidation and neuronal fate. While this finding would be highly interesting and relevant, the presented data do not sufficiently support this conclusion. Further experiments would have to be performed to prove causality.

      Advance: Should the authors manage to prove their hypothesis by additional experiments, this would indeed advance the field on mitochondrial remodeling and its effect on neuronal differentiation by identifying a novel molecular player.

      Audience: mitochondrial biology, cell biology, developmental neuroscience

      Own expertise: mitochondrial biology, cell biology, advanced imaging techniques

    1. Reviewer #3 (Public review):

      Summary:

      The authors aim to compare proposed models of perceptual decision making using a joint modeling approach, where they fit models to both behavioral outcomes as well as CPP. Most notably, they compare a standard evidence accumulation model with models that track the evidence without integrating it over time (extrema detection). The authors report that the joint CPP-behavioral data do not discriminate between two of their proposals.
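
      The distinction between the two decision rules can be illustrated with a toy simulation. This is a deliberately stripped-down sketch under stated assumptions (discrete time steps, Gaussian sample noise, symmetric bounds, hypothetical parameter names); it is not the authors' model, which includes further components such as sampling-onset variability.

```python
import random


def simulate_trial(drift, bound, rule, n_steps=500, noise_sd=1.0, rng=random):
    """Simulate one trial and return (choice, step).

    choice is +1 or -1 for the bound that was reached, or 0 if no bound was
    reached within n_steps.
    """
    accumulated = 0.0
    for step in range(1, n_steps + 1):
        sample = drift + rng.gauss(0.0, noise_sd)
        if rule == "integration":
            # Evidence accumulation: the running sum is compared to the bound.
            accumulated += sample
            decision_variable = accumulated
        elif rule == "extrema":
            # Extrema detection: each momentary sample is compared on its own.
            decision_variable = sample
        else:
            raise ValueError("rule must be 'integration' or 'extrema'")
        if decision_variable >= bound:
            return 1, step
        if decision_variable <= -bound:
            return -1, step
    return 0, n_steps


# Noise-free illustration: weak constant evidence reaches the bound only when summed.
integration_result = simulate_trial(0.5, 2.0, "integration", noise_sd=0.0)
extrema_result = simulate_trial(0.5, 2.0, "extrema", noise_sd=0.0)
```

      With noise switched off, the integrator crosses the bound after a few samples, whereas the extrema detector never does, because no single sample exceeds the bound. With noise present, both rules produce choices and response times, which is why distinguishing them requires more than behavioral averages.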

      Strengths:

      This is an interesting finding that reinforces the idea that what we believe to see based on aggregation over trials may not be what happens on every single trial. The models are creative, and the simulations are convincing, relating the models to multiple neural markers of decision formation. These include the CPP but also mu/beta power spectra.

      Weaknesses:

      The paper makes some strong points, and the work seems generally well-executed. The weaknesses that I identified are twofold:

      (1) Embedding in the literature/exposition of the main argument.

      The focus in the introduction is on the noise-free nature of the stimulus and the prolonged presentation time. However, after reading the paper, I felt these were mostly experimental design choices that enable comparison of the different models using the CPP. Perhaps my misreading of the goals of the paper stems from two other observations:

      a) The fact that the stimulus is noise-free does not entail that perception is noise-free. Thus, the argument that using a noise-free stimulus precludes the necessity of temporal integration seems not completely valid. Of course, one could argue that noise is limited in this case, but that makes a noise-free stimulus more of a design choice.

      b) The focus on prolonged stimulus presentation, but at the same time the contrast with expanded judgement, did not make sense to me. Perhaps, as a non-native speaker, I am misreading the subtle difference between "protracted sampling" and "longer sampling", but again, the longer duration seems mostly a design choice.

      More could be said about the optimality of the extrema detection methods. In particular, decades of work (centuries?) have shown that evidence integration is an optimal decision-making procedure: For example, the Sequential Probability Ratio Test is Bayes-optimal wrt mean RT (Wald, 1946); evidence accumulation together with collapsing threshold serves to maximize rewards in repeated choices (e.g., Bogacz et al., PsychRev, 2006; Boehm et al. APP, 2020). Given all this work, why would the brain have evolved to adopt a different mechanism? I realize that the paper is not about optimal decision making, but some discussion of this point seems warranted.

      (2) Modeling choices.

      The authors introduce a parameter, sampT, that represents uncertainty in the sampling onset time. It was not clear to me whether this parameter represented an offset of all trials, or a distribution (probably the latter). I wonder how exactly this parameter was integrated into the models, and in particular, if and how it interacts with the starting-point parameters. My intuition is that on a single-trial, IF early sampling occurs, you can model that with either a negative sampT and z at 0, or with sampT at 0 but a shift in z. This would suggest trade-offs between these parameters, making them hard to estimate independently. Since the paper does not depend on the identification of parameter estimates, this may not be a huge problem, but nevertheless it is good to explore the consequences.

      The way the Bounded Integration model (BIntg) is formulated seems very close to the EZ-diffusion model (Wagenmakers et al., PBR, 2007). This model states that the proportion of correct responses Pc = 1/(1+exp(-B*D/s^2)), with B and D the bound and drift rate parameters, respectively. However, filling in the numbers for the high contrast condition from Table 2, and assuming that s=2 (because the model description states that dt=2, with s undefined), I get a Pc of 80% for the 1.6H condition. This seems substantially less than what Figure 2 suggests.
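
      The calculation the reviewer describes can be reproduced with a few lines. The sketch below only encodes the formula quoted above; the function name is hypothetical, and the inputs are illustrative rather than the Table 2 values, which are not reproduced here.

```python
import math


def ez_proportion_correct(bound, drift, s=1.0):
    """Proportion correct under the EZ-diffusion relation Pc = 1/(1+exp(-B*D/s^2))."""
    return 1.0 / (1.0 + math.exp(-bound * drift / s**2))


# Any parameter combination with B*D/s^2 = ln(4) gives Pc = 0.8.
pc_example = ez_proportion_correct(bound=math.log(4), drift=1.0, s=1.0)

# Doubling s while holding B and D fixed divides the exponent by four,
# which pulls Pc towards chance.
pc_larger_s = ez_proportion_correct(bound=math.log(4), drift=1.0, s=2.0)
```

      The second call shows why the assumed value of s matters for the reviewer's estimate: Pc depends on s only through the ratio B*D/s^2.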

      On some occasions, it is unclear to me what modeling choices are being made:

      a) It seems as if the models are fit on accuracy data alone (before introducing the neural data). This seems suboptimal given that the authors do report differences in RT.

      b) Are the models fit on all data combined, or on the data of individual participants? Fitting individual participant data is preferred, as combined or aggregated data may be distorted by individual differences.

      c) The authors seem to suggest that the diffusion coefficient s is estimated (in the section "Integration models"). Most likely, however, this is set to a fixed value. Obviously, it matters for the model comparison using AIC whether this parameter was freely estimated or not.

      Not really a weakness, but I wondered about the effect of stimulus duration on RT. In particular, what hypothesis (or post hoc explanation) do the authors have for these RT effects? I could think of at least three hypotheses that are consistent with the behavioral data:

      a) H1: The shorter the evidence duration, the more likely participants are to require a double-check before response execution, reflecting their uncertainty about their decision.

      b) H2: There is a collapsing threshold that initiates at stimulus offset, leading to quicker responses on trials where there is more evidence.

      c) H3: Motor preparation is correlated with the evidence signal, which leads to faster responses on trials with more evidence.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distributions of recruitment probabilities were not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”
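
      The recruitment probability discussed in this passage can be sketched as the fraction of strides in which a unit fires. The definition used here (a unit counts as recruited in a stride if it fires at least one spike) is an assumption for illustration, and the function name and spike counts are hypothetical.

```python
def recruitment_probability(spikes_per_stride):
    """Fraction of strides in which the motor unit fired at least one spike."""
    if not spikes_per_stride:
        raise ValueError("at least one stride is required")
    recruited_strides = sum(1 for count in spikes_per_stride if count > 0)
    return recruited_strides / len(spikes_per_stride)


# Hypothetical spike counts for one unit across eight strides.
probability = recruitment_probability([0, 3, 0, 0, 2, 0, 0, 1])

# A unit like this one would fall in the subset recruited in under 50% of strides.
is_low_recruitment = probability < 0.5
```

      Computing this value per unit gives the distribution that the dip test is applied to within each muscle head.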

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the complex interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
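
      As an illustrative sketch (not the authors' code), the refractory-period criterion quoted above can be expressed in a few lines. The function names and the example spike times are hypothetical; the 1 ms limit and the 2% threshold are taken from the quoted Methods text.

```python
def isi_violation_fraction(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals shorter than the refractory limit."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no intervals to violate
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(isis)


def passes_isolation(spike_times_s, max_fraction=0.02, refractory_s=0.001):
    """True if no more than max_fraction of ISIs violate the refractory limit."""
    return isi_violation_fraction(spike_times_s, refractory_s) <= max_fraction


# Hypothetical spike times in seconds; one ISI (0.5 ms) violates the 1 ms limit.
contaminated = [0.0, 0.010, 0.0105, 0.030, 0.050]
clean = [0.0, 0.010, 0.020, 0.030]
```

      A unit such as `contaminated` would fail the check, whereas `clean` would pass; the cross-correlogram review described in the text is a separate, manual step that this sketch does not cover.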

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then calculated the correlation between a unit’s waveform over either trials or bins in which at least 30 spikes were present. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”
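
      The waveform-stability measure described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each spike is assumed to be available as an equal-length snippet from the unit's largest-amplitude channel, snippets are assumed to be grouped by trial or by 50 ms stride bin beforehand, and all function names are hypothetical. The 30-spike minimum comes from the quoted text.

```python
from math import sqrt
from statistics import median


def median_waveform(snippets):
    """Pointwise median across a list of equal-length spike snippets."""
    return [median(samples) for samples in zip(*snippets)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def waveform_stability(snippets_by_group, min_spikes=30):
    """Correlate median waveforms between all pairs of groups (trials or
    stride bins) that contain at least min_spikes spikes."""
    medians = [median_waveform(s) for s in snippets_by_group
               if len(s) >= min_spikes]
    return [pearson(medians[i], medians[j])
            for i in range(len(medians))
            for j in range(i + 1, len(medians))]


# Hypothetical data: the same waveform shape at two amplitudes, plus a
# group with too few spikes that is excluded from the comparison.
template = [0.0, 1.0, 4.0, -3.0, -1.0, 0.0]
group_a = [template] * 30
group_b = [[2.0 * v for v in template]] * 30
group_c = [template] * 5
correlations = waveform_stability([group_a, group_b, group_c])
```

      Because Pearson correlation is insensitive to overall scaling, this measure tracks waveform shape rather than amplitude, which is one reason it is a natural fit for recordings in which electrode-to-fiber distance may change during the stride.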

      We have included a figure supplement to Figure 1 to highlight the effectiveness of our spike sorting.

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while signal-to-residual is often an excellent metric, it is not ideal for our high-resolution EMG signals since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting (only) the largest motor unit, since subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1 that demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. These common inputs may be reflected in sparsely recruited motor units being active in overlapping rather than different strides. Indeed, in the following text added to the Results, we report that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units is correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary across locomotor strides.”
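
      The comparison between empirical and conditional recruitment probability described above can be illustrated with a short sketch. It assumes that, for two simultaneously recorded units, per-stride spike counts are available as aligned lists and that "recruited" means at least one spike in the stride; the function names and example counts are hypothetical.

```python
def recruitment_probability(counts):
    """Fraction of strides in which the unit fired at least one spike."""
    return sum(c > 0 for c in counts) / len(counts)


def conditional_recruitment_probability(counts_a, counts_b):
    """P(unit A recruited | unit B recruited) over the same strides.

    Returns None if unit B is never recruited."""
    a_when_b = [a for a, b in zip(counts_a, counts_b) if b > 0]
    if not a_when_b:
        return None
    return sum(a > 0 for a in a_when_b) / len(a_when_b)


# Hypothetical spike counts per stride for two units in the same muscle.
unit_a = [0, 2, 0, 3, 0, 0, 1, 0]
unit_b = [0, 1, 0, 2, 0, 0, 0, 0]

p_a = recruitment_probability(unit_a)                            # 3 of 8 strides
p_a_given_b = conditional_recruitment_probability(unit_a, unit_b)  # 2 of 2 strides
```

      In this toy example unit A is recruited on 37.5% of all strides but on every stride in which unit B is active. Collecting such pairs of values across all unit pairs yields the paired samples to which a signed-rank test, as named in the text, could then be applied.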

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated Author response image 1 to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e. that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure 5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4) Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
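
      The distinction drawn above between the two rate measures can be made concrete with a small sketch. It is illustrative only: the kernel width (`sigma_s`), the time step, and the stride window are assumptions that are not given in this response, and the function names are hypothetical. The point it shows is that a smoothed peak rate is defined for strides with zero or one spike, whereas an ISI-based rate needs at least two spikes.

```python
from math import exp, pi, sqrt


def smoothed_peak_rate(spike_times_s, sigma_s=0.01, dt_s=0.001,
                       t_start=0.0, t_stop=1.0):
    """Peak of the spike train convolved with a unit-area Gaussian (spikes/s).

    Returns 0.0 for a stride with no spikes."""
    if not spike_times_s:
        return 0.0
    norm = 1.0 / (sigma_s * sqrt(2.0 * pi))
    n_steps = int(round((t_stop - t_start) / dt_s)) + 1
    peak = 0.0
    for i in range(n_steps):
        t = t_start + i * dt_s
        rate = sum(norm * exp(-0.5 * ((t - s) / sigma_s) ** 2)
                   for s in spike_times_s)
        peak = max(peak, rate)
    return peak


def isi_peak_rate(spike_times_s):
    """Peak instantaneous rate (1 / shortest ISI), or None with < 2 spikes.

    Assumes spike times are distinct."""
    times = sorted(spike_times_s)
    if len(times) < 2:
        return None
    return max(1.0 / (b - a) for a, b in zip(times, times[1:]))


no_spikes = []
one_spike = [0.5]
burst = [0.10, 0.11, 0.13]
```

      For `one_spike`, the smoothed measure returns the height of a single kernel, while `isi_peak_rate` returns `None`, which mirrors the reason given above for preferring the smoothed measure in a dataset where such strides are common.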

      (5) Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show relationships in unit activation with both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample size than the main analysis) do not reach statistical significance in all cases.

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistical recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons’ operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically” - as defined in our paper - refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically founded to compare the results of separate statistical tests based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follow is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate<sub>fast</sub>/rate<sub>slow</sub> of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)<sub>fast</sub>/p(recruitment)<sub>slow</sub> of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds is extremely hard to understand - after reading many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. Particularly, this allowed for multiple solutions to explain why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to have no recruitment of a unit. However, to create a model of motor unit activity that includes both recruitment and rate, it must be possible that a recruited unit can have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate along the following piecewise function:

      Eq. 1:

      Eq. 2:

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.”
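
      The equations themselves are not reproduced in this response, so the following sketch should be read as an assumption: it uses a standard zero-inflated Poisson form that is consistent with the verbal description above, namely P(y = 0) = (1 - p) + p·exp(-λ) and P(y = k) = p·λ^k·exp(-λ)/k! for k > 0. It may differ from the authors' Eq. 1 and Eq. 2. The sketch also treats y as a non-negative integer, and it replaces a numerical optimizer with a coarse grid search; all function names are hypothetical.

```python
from math import exp, lgamma, log


def zip_neg_log_likelihood(p, lam, counts):
    """Negative log-likelihood of a zero-inflated Poisson model.

    p   : probability that the unit is recruited on a stride (0 < p < 1)
    lam : Poisson mean when recruited (lam > 0)
    """
    nll = 0.0
    for y in counts:
        if y == 0:
            # Either not recruited, or recruited with zero Poisson outcome.
            nll -= log((1.0 - p) + p * exp(-lam))
        else:
            nll -= log(p) - lam + y * log(lam) - lgamma(y + 1)
    return nll


def fit_zip_grid(counts, p_steps=99, lam_steps=100, lam_max=None):
    """Coarse grid search for (p, lam) minimizing the negative log-likelihood."""
    lam_max = lam_max or 1.5 * max(max(counts), 1)
    best = None
    for i in range(1, p_steps + 1):
        p = i / (p_steps + 1)
        for j in range(1, lam_steps + 1):
            lam = lam_max * j / lam_steps
            nll = zip_neg_log_likelihood(p, lam, counts)
            if best is None or nll < best[0]:
                best = (nll, p, lam)
    return best[1], best[2]


# Hypothetical per-stride values: 60 silent strides and 40 active strides.
counts = [0] * 60 + [3] * 10 + [4] * 15 + [5] * 10 + [6] * 5
p_hat, lam_hat = fit_zip_grid(counts)
```

      With these toy data the fitted p is slightly above the raw fraction of active strides (0.40), because the model attributes a small share of the silent strides to recruited units that happened to produce no spikes.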

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (𝜆, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p<sub>fast</sub>-p<sub>slow</sub> and 𝜆<sub>fast</sub>-𝜆<sub>slow</sub>). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the null model parameters for each new group and found the difference between like terms. To estimate the distribution of possible differences, we bootstrapped this result using 1000 random redistributions of the pooled set of strides. Following the permutation test, the 95% confidence interval of this final distribution reflects the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the true difference in rate or recruitment terms exceeds this confidence interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
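
      The pooling-and-redistribution procedure described above could be sketched as follows. This is a simplified illustration rather than the authors' implementation: for brevity the statistic is the raw fraction of strides with at least one spike instead of the fitted model parameters, the random seed and example data are hypothetical, and only the 1000 redistributions and the 95% interval are taken from the text.

```python
import random


def fraction_recruited(counts):
    """Fraction of strides with at least one spike."""
    return sum(c > 0 for c in counts) / len(counts)


def permutation_null_interval(slow, fast, statistic, n_perm=1000, seed=0):
    """95% interval of statistic(fast) - statistic(slow) under the null.

    Strides from both groups are pooled and randomly redistributed into
    two groups with the original sample sizes."""
    rng = random.Random(seed)
    pooled = list(slow) + list(fast)
    null_diffs = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        new_slow, new_fast = pooled[:len(slow)], pooled[len(slow):]
        null_diffs.append(statistic(new_fast) - statistic(new_slow))
    null_diffs.sort()
    lower = null_diffs[int(round(0.025 * n_perm))]
    upper = null_diffs[int(round(0.975 * n_perm)) - 1]
    return lower, upper


# Hypothetical spike counts per stride in the slowest and fastest quartiles.
slow_strides = [0] * 80 + [2] * 20
fast_strides = [0] * 30 + [3] * 70

observed = fraction_recruited(fast_strides) - fraction_recruited(slow_strides)
lower, upper = permutation_null_interval(slow_strides, fast_strides,
                                         fraction_recruited)
reject_null = observed < lower or observed > upper
```

      The same function accepts any statistic computed from a list of strides, so a model-based estimate of rate or recruitment probability could be substituted for `fraction_recruited` without changing the resampling logic.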

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point and we agree that the claim made about the nervous system is too strong given the results. Even with Author response image 2 that the Reviewer suggested, there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.
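
      An aggregate histogram of this kind could be computed as in the sketch below. It assumes spike times have already been expressed relative to stride onset; the bin width, time window, function name, and example data are hypothetical rather than taken from the figure.

```python
def aggregate_psth(spike_times_by_unit, bin_s=0.05, t_start=0.0, t_stop=0.3):
    """Count spikes from all units in fixed-width bins relative to stride onset.

    Spikes outside [t_start, t_stop) are ignored."""
    n_bins = int(round((t_stop - t_start) / bin_s))
    counts = [0] * n_bins
    for times in spike_times_by_unit:
        for t in times:
            idx = int((t - t_start) // bin_s)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return counts


# Hypothetical spike times (s, relative to stride onset) for two units.
unit_1 = [0.012, 0.055, 0.251]
unit_2 = [0.020, 0.260, 0.400]  # last spike falls outside the window
psth = aggregate_psth([unit_1, unit_2])
```

      Pooling spikes across units in this way shows the overall time course of activation in each muscle, which is what distinguishes a shifted onset from a more gradual or variable rise.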

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim on the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the current dataset. As the Reviewer points out, stride frequency increases with increased speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), we cannot dissociate the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence, of works that showed that recruitment speed is the main determinant of the rate of force development (see for example Dideriksen et al. (2020) J Neurophysiol; J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion on the synaptic inputs from the data presented by the authors. The differences in firing rates may solely arise from other factors than distinct synaptic inputs, such as the different intrinsic properties of the motoneurons or the reception of distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To lower the strength of our claim, we replaced usage of the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring that >98% of each unit's spikes fell outside of 1 ms of each other. Moreover, as described above, we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms across both the duration of individual recording sessions (roughly 30 minutes) and the rapid changes in limb position within individual stride cycles (roughly 250 msec).
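
      The isolation criterion described above can be sketched in code. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes the criterion means that more than 98% of a unit's inter-spike intervals exceed 1 ms, that spike times are given in seconds, and the function names are hypothetical.

```python
def fraction_outside_refractory(spike_times_s, refractory_s=0.001):
    """Return the fraction of inter-spike intervals longer than refractory_s.

    Assumption: spike_times_s holds one unit's spike times in seconds.
    """
    times = sorted(spike_times_s)
    if len(times) < 2:
        # With fewer than two spikes there are no intervals to violate.
        return 1.0
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    n_ok = sum(1 for isi in intervals if isi > refractory_s)
    return n_ok / len(intervals)


def passes_isolation(spike_times_s, refractory_s=0.001, min_fraction=0.98):
    """Hypothetical acceptance rule: keep the unit if >98% of intervals exceed 1 ms."""
    return fraction_outside_refractory(spike_times_s, refractory_s) > min_fraction


# A regular 100 Hz train passes; a train with one 0.5 ms interval out of three fails.
clean_train = [i * 0.01 for i in range(100)]
contaminated_train = [0.0, 0.010, 0.0105, 0.030]
print(passes_isolation(clean_train), passes_isolation(contaminated_train))
```

      A stricter or looser window could be substituted through the `refractory_s` and `min_fraction` arguments, both of which are illustrative defaults taken from the figures quoted in the response.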

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis (Author response image 3), which shows bell-shaped distributions for some (but not all) distributions. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper, and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions especially Figure 6: The text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). FUNCTIONAL SIGNIFICANCE OF CELL SIZE IN SPINAL MOTONEURONS. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7(2007), 1–26. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. Author response:

      The following is the authors’ response to the current reviews.

      We are pleased that Reviewer 3 appreciated our findings and found the temporal lag between the expression of TFF1 and TFF3 during signaling particularly interesting. The reviewer also advised us not to overemphasize that this lag arises from phase separation of ERα at the TFF1 locus, as the use of 1,6-hexanediol alone is not sufficient to conclusively establish whether ERα condensates undergo liquid–liquid phase separation. We agree with this assessment and have revised the manuscript accordingly. Specifically, we have modified the title to remove reference to phase separation and have updated the text throughout the manuscript to avoid claiming that the observed condensates are a result of phase separation. The revised title is: “Ligand-dependent Enhancer Activation Indirectly Modulates Non-target Promoters in a Chromatin Domain.”

      With these changes, we are proceeding with the Version of Record using the revised version of the manuscript.

      ———

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion, but I think they can be fixed with more effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      The deletion of enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. Authors should elaborate more on this as this is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion of the requirement of the enhancer for TFF1 expression in the revised manuscript (lines 368-385).

      In Fig. 1, TFF3 expression is shown to be induced upon E2 signaling through qRT-PCR, while smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, does GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig.1 should be adjusted to TFF3 expression to appreciate its expression changes.

      We have now included a browser shot of the TFF3 region showing the GRO-seq signal over the E2 time course (Fig. S1C). We observed increased transcription towards the 3’ end of the TFF3 gene body at 3h, which corroborates the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts are somewhat different. We speculate that biases in measurements that depend on PCR amplification could be larger for genes expressed at low levels, and that smFISH using intronic probes may be a more sensitive assay for detecting such changes.

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snap shots in fixed cells, can authors comment on whether the same cell that expresses TFF1 at 1h, expresses TFF3 at 3h? Perhaps, the calculations taking total number of cells that express these genes at 1 and 3h would be useful.

      As pointed out by the reviewer, since these are fixed cells, we cannot comment on the fate of the same cell at two time points. To address this limitation, future work could employ cells with endogenous tags for TFF1 and TFF3 and utilize live-cell imaging techniques. In a fixed-cell assay, as the reviewer suggests, it can be investigated whether the fraction of cells showing high TFF3 expression at 3h is similar to the fraction showing high TFF1 expression at 1h. To quantify these fractions, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identified truly high-expressing cells by taking the mean plus one standard deviation (for single-cell-level data) at E2-1hr as the threshold for TFF1 (80 and above transcript counts) and the mean plus one standard deviation (for single-cell-level data) at E2-3hr as the threshold for TFF3 (36 and above transcript counts). The fraction with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We should note that if the transcript counts were normally distributed, a predetermined fraction would be expected to lie above these thresholds, and comparable fractions could arise from the underlying statistics alone. In our experiments, however, this is unlikely to be the case given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion in the single-cell distributions. Of course, despite the fractions being comparable, we cannot be certain that it is the same set of cells that goes from high expression of TFF1 to high expression of TFF3, but that is certainly a possibility. We thank the reviewer for pointing out this comparison.

      Author response image 1.

      The graph represents the percent of cells that show high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The threshold was determined by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C). The mean and standard deviation over single-cell data were calculated, and the mean plus one standard deviation was used as the threshold for identifying high-expressing cells. For TFF1, which is maximally expressed at 1h, the threshold used was 80. For TFF3, which is maximally expressed at 3h, the threshold used was 36. The fractions of cells expressing above 80 and 36 for TFF1 and TFF3, respectively, were calculated from three different repeats. The mean of means and the standard deviations from the three experiments are plotted here.
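
      The thresholding described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names and the toy counts are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the response does not specify which was used.

```python
from statistics import mean, stdev


def high_expression_threshold(pooled_counts):
    """Mean plus one standard deviation of pooled single-cell transcript counts.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(pooled_counts) + stdev(pooled_counts)


def percent_high(counts, threshold):
    """Percent of cells whose transcript count is at or above the threshold."""
    return 100.0 * sum(1 for c in counts if c >= threshold) / len(counts)


def summarize_repeats(repeats, threshold):
    """Mean and standard deviation of the per-repeat percentages."""
    percents = [percent_high(counts, threshold) for counts in repeats]
    return mean(percents), stdev(percents)


# Toy data only: three hypothetical repeats of per-cell transcript counts.
repeats = [[10, 100], [100, 100, 10, 10], [10, 10, 10, 100]]
pooled = [count for repeat in repeats for count in repeat]
threshold = high_expression_threshold(pooled)
print(threshold, summarize_repeats(repeats, threshold))
```

      Because outliers inflate both the mean and the standard deviation, the fraction of cells above such a threshold is not fixed in advance for skewed count data, which is the point the response makes about non-normal distributions.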

      Authors conclude that TFF3 is not directly regulated by enhancer or estrogen receptor. Does ERa bind on TFF3 promoter? 

      The ERα ChIP-seq performed at 1h and 3h of signaling suggests that the TFF3 promoter is not bound by ERα, as shown in supplementary Fig. 1B and S1B. However, one peak upstream of the TFF1 promoter is visible, and it is lost at 3h.

      Minor comments:

      Reviewer’s comment -The figures would benefit from resizing of panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included this in the discussion in the revised manuscript (lines 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      A high level of ERalpha expression seems to diminish the transcriptional response. Thus, the results in Fig. 4 have potential insight into ER-mediated transcription. Yet this observation is not pursued in great depth, for example with mutagenesis of ERalpha. However, this phenomenon - which falls under the general description of non-monotonic dose response - is treated at great depth in the literature (e.g., PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the discussion section (lines 400-413). We thank the reviewer for the insight on non-monotonic behavior.

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment, and apologize if they feel we misrepresented the argument from Cote et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are just representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting with a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was first verified by comparison to manual counts. Even in the 2D representations, the extragenic intronic signals show up at similar thresholds to the transcription sites.

      The signal does not arise from non-specific background labeling, for the following reasons:

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we have analyzed the number of alleles firing (sites of transcription) at a given time in a cell under the three conditions. We counted the sites of transcription in a given cell and calculated the percentage of cells showing 1, 2, 3, 4 or >4 sites. We see that the percentage of cells showing a single site of transcription for TFF1 is very high in uninduced cells and decreases at 1h. At 1h, the percentage of cells showing 2, 3 and 4 sites of transcription increases, and it goes down again at 3h (Author response image 2A). This agrees with the interpretation made from mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2, 3 and 4 sites of transcription increases slightly at 3hr compared to uninduced and 1hr (Author response image 2B). We can also see that several cells have no alleles firing at a given time, as quantified in the graphs on the right showing the total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • There is literature on post-transcriptional splicing of RNA beyond our work, which suggests that intronic signal can be found at relatively large distances from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear dispersed intronic signals can arise because more than one allele can fire at a time in different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only 1 allele is firing at E2-1 hour (Author response image 2C) or in uninduced cells (Author response image 2D). Furthermore, Cote et al. discuss that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intron signal dispersal can occur in the case of posttranscriptional splicing (Coté et al., 2023). Additionally, other studies have suggested that transcripts in cells do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint exists to ensure intron removal before release, which results in splicing and release being kinetically uncoupled from each other (Coulon et al., 2014). Similarly, using live-cell imaging, it was shown that splicing is not always coupled with transcription, and that this could depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which results in delayed recognition) (Vargas et al., 2011). Drexler et al. showed that, in contrast to the shorter Drosophila transcripts, splicing of the terminal intron can occur post-transcriptionally in mammalian cells (Drexler et al., 2020). Using RNA polymerase II ChIP-Seq time course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production (Honkela et al., 2015). This was attributed to the faster transcription of shorter genes, whose rapid completion can lead to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lapse between transcription and splicing for genes that are early responders during signaling (Zambrano et al., 2020). The presence of significant numbers of TFF1 nascent RNA in the nucleus in our data is consistent with the above observations.

      • Uniform intensities across many transcripts suggest that these are true signals arising from RNA molecules, which would not be the case for non-specific background signal (Author response image 2E).

      • Splicing occurs in the nucleus, and intron-containing pre-transcripts should therefore be nuclear localized. Thus, intronic signals should remain in the nucleus, unlike mature mRNA, which translocates to the cytoplasm after processing, so that exonic signals can be found in both the nucleus and the cytoplasm. In keeping with this, we observe no cytoplasmic signal for the intronic probes, which remain localized within the nucleus as expected (Author response image 2F), while exonic signals are observed in both compartments. This suggests to us that the signal comes from true pre-transcripts. There is no reason for non-specific background labelling to remain restricted to the nucleus.

      • We observe that the mean intronic label counts for both TFF1 and TFF3 increase upon E2 induction compared to the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes are drastically reduced in the TFF1-enhancer-deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically upon induction and enhancer deletion suggests that the signal is not an artefact and arises from true nascent transcripts that are sensitive to stimulus or enhancer deletion.

      • We expect colocalization of intronic signals with exonic signals in the nucleus, while there can be exonic signals that do not colocalize with intronic signals, representing more mature mRNA. Indeed, we observe a clear colocalization between the intronic and exonic signals in the nucleus, while exonic signals can occur independently of intronic signals in both the nucleus and the cytoplasm. This clearly demonstrates that the intronic signals in our experiments are specific and not simply background labelling (Author response image 2G).

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus, away from the site of transcription, is not an artefact. We hope the reviewer will agree with us. These analyses have now been included in the manuscript as Supplementary Figure 6 and have been added at lines 106-111, 201-204, 215-217 and 231-235. We thank the reviewer for raising this important point.

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF1 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted, and cells with no alleles firing were not included. The graph on the right shows the number of cells with zero or a non-zero number of alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF3 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted, and cells with no alleles firing were not included. The graph in the middle shows the number of cells with 2, 3, 4, or greater than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or a non-zero number of alleles firing. C. Images from a single molecule RNA FISH experiment showing transcripts for InTFF1 in cells induced for 1 hour with E2. The image shows that when a single allele of TFF1 is firing, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. D. Images from a single molecule RNA FISH experiment showing transcripts for InTFF1 in uninduced cells. The image shows that when a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. E. Line profiles through several transcripts in the nucleus show uniform and similar intensities, indicating that these are true signals. F. Representative 60X images from a single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1 (top) and InTFF3 and ExTFF3 (bottom). 
The image shows that there is no intronic signal in the cytoplasm, while exonic signals can be found both in the nucleus and the cytoplasm. The scale bar is 5 microns. G. Representative 60X images from a single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1. The image shows that all intronic signals are colocalized with exonic signals but, as expected, not all exonic signals are colocalized with intronic signals; the non-colocalized exonic signals represent more mature mRNA. The scale bar is 5 microns.
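
      The summary statistic used in panels A and B ("mean of means" with SEM across N=3 repeats) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the input layout (one list of per-cell transcription-site counts per repeat) and the example numbers are all hypothetical.

```python
import math
import statistics


def percent_with_k_sites(site_counts, k):
    """Percentage of active cells (>= 1 transcription site) that show exactly k sites."""
    # Cells with no active allele are excluded, as described in the caption.
    active = [c for c in site_counts if c > 0]
    return 100.0 * sum(1 for c in active if c == k) / len(active)


def mean_and_sem(values):
    """Mean and standard error of the mean across independent repeats."""
    mean = statistics.fmean(values)
    # SEM = sample standard deviation / sqrt(number of repeats)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical site counts per cell for N = 3 repeats (illustrative, not study data).
repeats = [[1, 1, 2, 0], [1, 2, 2, 0, 3], [1, 1, 1, 2]]
per_repeat = [percent_with_k_sites(r, k=1) for r in repeats]
mean_pct, sem_pct = mean_and_sem(per_repeat)
print(f"{mean_pct:.1f} % +/- {sem_pct:.1f} % (SEM, N={len(repeats)})")
```

      Computing the percentage within each repeat first, and only then averaging, treats each repeat as one independent observation, which is what an SEM with N=3 implies.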

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson's group are particularly insightful (Rodriguez et al., 2019). We have now included them in the discussion (lines 106-111 and 420-424), where we discuss the differences and similarities between our findings and those of the Larson and Mancini groups (Patange et al., 2022; Stossi et al., 2020). 

      The 4C-Seq data from Oh et al. (2021) are fully consistent with our observation in Fig. 3: they also observed little to no interaction between TFF1e and TFF3p in WT cells, and TFF1e became engaged with TFF3p only upon TFF1p deletion. In agreement with this, we also observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show an interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, indicating that when the primary promoter is not available the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells or at any time point of E2 signalling.

      In the opinion of this reviewer, there are few - if any - experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6 hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).  

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown, using an ImmunoFISH assay, that ERα forms condensates on the TFF1 chromatin region (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and on random regions, clearly showing the appearance of a condensate at the TFF1 site. Further, deletion of TFF1e reduces the size of this condensate. Thus, we expect that these ERα condensates are characterized by higher-order interactions and are disrupted by treatment with 1,6-hexanediol. These condensates are sub-micron in size, as mentioned by the reviewer, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment that causes several irreversible changes to the chromatin, although we have tried to use it at a low concentration for a short period of time, and it has been used in several papers (Chen et al., 2023; Gamliel et al., 2022). The opposite pattern of TFF1 vs. TFF3 expression upon 1,6-hexanediol treatment suggests that the effect is specific. Further, ERα mutants (N-terminal IDR truncations) can be used to perturb condensates; however, the transcriptional response of these mutants is also altered because of perturbed recruitment of coactivators that recognize the N-terminus of ER, which makes it difficult to distinguish ERα functions from condensate formation.

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869), Article 7869. https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10), Article 10. https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides a valuable characterization of individual sarcomere's contractility and synchrony in spontaneously beating cardiomyocytes as a function of substrate stiffness. The authors, however, provide an incomplete explanation for the observed heterogeneous and stochastic dynamics, so that the work remains mainly descriptive. The work will be of interest to scientists working on muscle biophysics, nonlinear dynamics, and synchronization phenomena in biological systems.

      We appreciate the reviewer’s insightful comments. A detailed explanation of the described phenomena in the form of a theoretical model and simulations was not included in our manuscript, because we believed it would be most impactful to present a detailed quantitative statistical description of the experiments in one manuscript and then introduce the model, which we already had in preparation, in a separate manuscript to avoid diluting the overall message.

      However, following the reviewers’ advice, we have now included a comprehensive model into the revised manuscript. This model qualitatively and quantitatively explains the experimentally observed phenomena and introduces a novel class of coupled relaxation oscillators based on a non-monotonic force-velocity relationship of individual sarcomeres. We believe that this addition significantly strengthens the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors experimentally demonstrated the heterogeneous behavior of sarcomeres in cardiomyocytes and that a stochastic component exists in their contractile activity, which cancels out at the level of myofibrils.

      Strengths:

      The experiments and data analysis are robust and valid. With very good statistics and unbiased methods, they show cellular activity at the individual level and highlight the heterogeneity between biological networks. The similarity of the results to the study cited in [24] demonstrates the validity of the in vitro setup for answering these questions and the feasibility of such in-vitro systems to extend our knowledge of physiology.

      Weaknesses:

      Compared to the current literature ([24]), the study does not show a high degree of innovation. It mainly confirms what has been established in the past. The authors complemented the published experiments by developing an in vitro setup with stem cells and by changing the stiffness of the substrate to simulate pathological conditions. However, the experiments they performed do not allow them to explain more than the study in [24], and the conclusions of their study are based on interpretation and speculation about the possible mechanism underlying the observations.

      We thank the reviewer for contextualizing our work with the literature. We appreciate the comparison to the study by Kobirumaki-Shimozawa et al. which we cite prominently. They observed stochastically varying beating patterns of individual sarcomeres on a beat-to-beat basis. They propose that this arises from a "titin-based mechanism" operating stochastically, which they interpret as being fundamentally linked to sarcomere-length-dependent effects. This interpretation differs from our model. We feel that the inclusion of our comprehensive model in the revised manuscript will emphasize the significance and novelty of our findings. Our work proposes a distinct alternative mechanistic explanation for the observed stochasticity, grounded in the force-velocity relationship and intrinsic stochasticity, and presents additional novel dynamic phenomena (such as popping and high-frequency oscillations) not reported in the literature yet. We outline the key advancements of our study below:

      (1) Physiologically Relevant Human Model System: Our study utilizes human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs). Using a human cell model provides direct relevance for understanding human cardiac physiology and pathophysiology, overcoming limitations inherent in translating findings from rodent models. The hiPSC-CMs exhibit key physiological differences from the mouse ventricular myocytes observed in [24], most notably beating at a significantly lower frequency (~1 Hz or 60 bpm) compared to mice (~5-8 Hz or 300-500 bpm). This difference in timescale is critical as it allowed us to resolve complex intra-beat dynamics that may be different and also harder to observe in mouse cardiomyocytes.

      (2) Advanced Experimental Methodology and Resolution: We developed a novel assay incorporating our SarcAsM algorithm for high-throughput tracking and analysis of individual sarcomere dynamics. This approach gave us spatial resolution better than 20 nm at significantly higher sampling rates than previous studies, including Kobirumaki-Shimozawa et al. Furthermore, our high-throughput in vitro approach made it possible to analyze vastly larger datasets than, e.g., the study by Kobirumaki-Shimozawa et al. (which reports observations from fewer than 20 myofibrils, encompassing less than 200 sarcomeres in total). While we recognize that in-vivo tissue studies present unique experimental challenges, the substantially greater statistical power of our study is crucial for reliably characterizing the complex, stochastic dynamics we report. The enhanced resolution and statistical robustness are not merely incremental; they enable the detailed identification and analysis of heterogeneous behaviors that were previously inaccessible or could not be characterized with the same level of confidence.

      (3) Novel Observed Phenomena: Our high-resolution data reveals specific dynamic behaviors, such as sarcomere "popping" and high-frequency oscillations during contraction, which, to our knowledge, have not been previously reported or characterized in cardiomyocytes. The resolution limitations and the high beating frequency in mouse models may not have permitted the observation of these subtle, but potentially important phenomena.

      (4) Distinct Mechanistic Explanation and Model: Kobirumaki-Shimozawa et al. propose a qualitative model where sarcomere motion variability primarily arises from length-dependent activation. This view is essentially a static one, based on a long history of isometric skeletal muscle experiments, where time-dependent forces are not relevant. We argue that in highly dynamic cardiomyocytes this may not be the most useful approach. While we acknowledge length dependence can play a role, our integrated experimental-theoretical work proposes a different primary mechanism. Our model demonstrates that the observed stochastic heterogeneity and beat-to-beat variations, including the oscillatory motion and popping, can be quantitatively explained by dynamic instabilities arising from a non-monotonic force-velocity relationship of individual sarcomeres in conjunction with intrinsic sarcomere-level stochastic fluctuations. The model emphasizes the active, transient nature of force generation rather than solely assuming length dependence. Our model provides an alternative explanation for the observed dynamics, and a quantitative, mechanism-based understanding.

      Reviewer #2 (Public Review):

      Summary:

      Sarcomeres, the contractile units of skeletal and cardiac muscle, contract in a concerted fashion to power myofibril and thus muscle fiber contraction.

      Muscle fiber contraction depends on the stiffness of the elastic substrate of the cell, yet it is not known how this dependence emerges from the collective dynamics of sarcomeres. Here, the authors analyze the contraction time series of individual sarcomeres using live imaging of fluorescently labeled cardiomyocytes cultured on elastic substrates of different stiffness. They find that reduced collective contractility of muscle fibers on unphysiologically stiff substrates is partially explained by a lack of synchronization in the contraction of individual sarcomeres.

      This lack of synchronization is at least partially stochastic, consistent with the notion of a tug-of-war between sarcomeres on stiff sarcomeres. A particular irregularity of sarcomere contraction cycles is 'popping', the extension of sarcomeres beyond their rest length. The statistics of 'popping' suggest that this is a purely random process.

      Strengths:

      This study thus marks an important shift of perspective from whole-cell analysis towards an understanding of the collective dynamics of coupled, stochastic sarcomeres.

      Weaknesses:

      Further insight into mechanisms could be provided by additional analyses and/or comparisons to mathematical models.

      We thank the reviewer for the feedback. We have enhanced the manuscript with a comprehensive dynamic model, which we also contrast with previously proposed models.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript of Haertter and coworkers studied the variation of length of a single sarcomere and the response of microfibrils made by sarcomeres of cardiomyocytes on soft gel substrates of varying stiffnesses.

      The measurements at the level of a single sarcomere are an important new result of this manuscript. They are done by combining the labeling of the sarcomeres z line using genetic manipulation and a sophisticated tracking program using machine learning. This single sarcomere analysis shows strong heterogeneities of the sarcomeres that can show fast oscillations not synchronized with the average behavior of the cell and what the authors call popping events which are large amplitude oscillations. Another important result is the fact that cardiomyocyte contractility decreases with the substrate stiffness although the properties of single sarcomeres do not seem to depend on substrate stiffness.

      The authors suggest that the cardiomyocyte cell behavior is dominated by sarcomere heterogeneity. They show that the heterogeneity between sarcomeres is stochastic and that the contribution of static heterogeneity (such as composition differences between sarcomeres) is small.

      Strengths:

      All the results are to my knowledge new and original and deserve attention.

      Weaknesses:

      However, I find the manuscript a bit frustrating because the authors only give very qualitative explanations of the phenomena that they observe. They mention that popping could be explained by a nonlinear force-velocity relation of the sarcomere leading to a rapid detachment of all motors. However, they do not explicitly provide a theoretical description. How would the popping depend on the parameters and in particular on the substrate stiffness? Would the popping statistics be affected by the stiffness? It is also not clear to me how the dependence on the soft gel stiffness of the cardiomyocyte cell can be explained by the stochasticity of the sarcomere properties. Can any of the results found by the authors be explained by existing theories of cardiomyocytes? The only one I know is that of Safran and coworkers.

      I also found the paper very difficult to read. The authors should perhaps reorganize the structure of the presentation in order to highlight what the new and important results are.

      We are grateful for this detailed and critical feedback. The observed phenomena (stochastic heterogeneity, popping, high-frequency oscillatory motion) can indeed be explained by a non-monotonic force-velocity relation along with stochastic fluctuations of individual sarcomeres. At the time of initial submission of this manuscript, we already had a theoretical model in preparation, which both qualitatively and quantitatively explains the observed phenomena. As a result, we included certain interpretations preemptively, which caused some lack of clarity in the absence of the full model. We have now added the model to this manuscript, providing a mechanistic interpretation of our findings. The model differs from prior models in that it emphasizes time-dependent forces, which are typically disregarded in models built to understand isometric skeletal muscle experiments.

      We have shortened, streamlined and restructured our manuscript to improve the readability and accessibility of our study.

      Recommendations for the authors:

      There is a consensus among reviewers that the link between the stiffness dependence of the observed stochastic dynamics and the proposed tug-of-war mechanism is unclear. More quantitative support and discussion is required, possibly using theoretical modeling.

      We are grateful for the insightful and comprehensive feedback by both editor and reviewers. As suggested, we have now added a comprehensive model explaining the observed phenomena and presenting a new conceptual view on cardiac muscle dynamics.

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed an interesting question related to the dynamics of cardiac cells and their multiscale dynamics. They did a good job in terms of experimental design and data analysis. However, I fear that they do not contribute enough new information to the topic.

      The authors should refer to the study in [24] and explain better the difference between these two studies. Although the different approaches are quite obvious, it is not clear to me what additional insights they add to the problem. They conducted their experiments with different stiffnesses. However, the conclusions they draw from the study are based on speculation (e.g. about the behavior of myosin heads in relation to shortening and relaxation), while their data mainly confirm previous studies. They need to address more explicitly the novelty of their study.

      Novelty and Comparison with Previous Studies: We understand the concern about distinguishing our contribution from prior work, specifically Kobirumaki-Shimozawa et al., 2021.

      As detailed in our public response, these are the key advances:

      Use of a medically relevant human iPSC-CM model vs. mouse cardiomyocytes.

      Superior spatial and temporal resolution via our SarcAsM algorithm, revealing novel phenomena like popping and high-frequency oscillations not previously reported.

      Significantly greater statistical power due to our high-throughput in vitro assay.

      We added a distinct mechanistic explanation based on the dynamic force-velocity relationship and sarcomere-level stochasticity, contrasting with the static, deterministic titin/length-dependence focus of previous studies.

      Interpretation and Speculation: We acknowledge that without the explicit model, some interpretations in the initial submission appeared speculative. As noted in our public response, we had already started to develop a theoretical model explaining our observations at the time of submission, targeting a second follow-up publication. Including interpretations based on this unpublished model prematurely clearly caused confusion. We now include the full model in the revised manuscript.

      Integration of the Theoretical Model: We have now fully integrated the model into the revised manuscript. The model explicitly demonstrates how the non-monotonic force-velocity relationship of individual sarcomeres leads to dynamic instabilities around a critical force threshold. This instability along with stochasticity drives a 'tug-of-war' between coupled sarcomeres, generating complex emergent behaviors.
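
      To illustrate in the simplest terms why a non-monotonic force-velocity relationship implies a critical force threshold, the following toy sketch counts how many velocities are compatible with a given load. It is not the model from the revised manuscript: the cubic curve `force(v) = v**3 - v`, the dimensionless units and the function names are assumptions chosen only because a cubic is the simplest non-monotonic curve.

```python
def force(v):
    """Toy non-monotonic force-velocity curve (arbitrary cubic, dimensionless)."""
    return v ** 3 - v


def count_velocity_solutions(load, v_min=-2.0, v_max=2.0, n_steps=4000):
    """Count velocities v in [v_min, v_max] with force(v) == load.

    Solutions are located as sign changes of force(v) - load on a uniform grid.
    """
    step = (v_max - v_min) / n_steps
    solutions = 0
    previous = force(v_min) - load
    for i in range(1, n_steps + 1):
        current = force(v_min + i * step) - load
        if previous * current < 0:
            solutions += 1
        previous = current
    return solutions


# The turning points of the cubic lie at v = +/- 1/sqrt(3), where |force| = 2/(3*sqrt(3)),
# roughly 0.385. Below that threshold a load is compatible with three velocities,
# above it with only one.
print(count_velocity_solutions(0.1))  # load below the threshold
print(count_velocity_solutions(1.0))  # load above the threshold
```

      In this toy picture, a small change in load near the threshold removes two of the three possible velocities, so the system must jump to the remaining branch. Whether and how this carries over to coupled sarcomeres depends on the details of the authors' model, which is not reproduced here.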

      Mechanistic Explanation Beyond Length-Dependence: Our model quantitatively reproduces all key experimental findings (stochastic heterogeneity, popping, oscillations) without relying on length-dependent activation effects. This strongly supports our conclusion that the active, transient dynamics of individual sarcomeres governed by the force-velocity relationship are fundamental drivers of these complex contractile patterns. We believe this provides a significant conceptual advance, highlighting a potentially underappreciated aspect of sarcomere dynamics. Previous models focused mostly on length-dependence, historically based on skeletal muscle fiber experiments that were often done under static, isometric conditions. We feel that the new model represents a substantial paradigm shift in understanding highly dynamic muscles such as heart muscle.

      We are confident that the inclusion of the model addresses the majority of the reviewer's concerns.

      Additional comments:

      The authors write of a tug-of-war competition between the sarcomeres, and I'm not sure what they mean by that. I would spend more words explaining this point, especially because it seems to be an important point to describe their results. Similarly, they talked about an all-or-nothing phenomenon when they described the elongation of sarcomeres. What do they mean by this?

      We have revised the manuscript where clarification was needed and now define the terms mentioned more explicitly.

      (1) "Tug-of-War": We used this term metaphorically to describe the mechanical competition between linearly coupled sarcomeres within a myofibril, especially when contracting against rigid external boundary conditions. While it is not a perfect analogy, the metaphor intuitively captures the inherent instability of this interaction: similar to how a team in a real tug-of-war might suddenly yield when one person tires and the rest of the team gets overloaded, rather than steadily losing ground, the dynamic instability arising from the non-monotonic force-velocity relationship (detailed in our model, lines 300ff) can cause individual sarcomeres to abruptly change state (e.g., shorten or rapidly lengthen) while under tension from their neighbors. We have removed the term from the title and now use it more sparingly within the manuscript to better reflect its role as an illustrative analogy.

      (2) "All-or-Nothing" Elongation (Popping): The term "popping" describes our experimental observation of sudden, rapid, and extensive elongation of individual sarcomeres. This typically occurs late in the contraction cycle during early relaxation, when overall force may be declining, but individual sarcomeres can still experience significant tension from their neighbors. We described this specific type of rapid elongation in the original manuscript as an "all-or-nothing" phenomenon because, typically, sarcomeres in these events yield rapidly and strongly overshoot their resting length without recovering in a given activation cycle. The speed of popping events is substantially higher than the speed of coordinated gradual shortening observed during systoles that is driven by bound myosin heads. This observation strongly suggests an instability-driven, avalanche-like unbinding of myosin heads from the actin filaments during these events.

      We agree that the term "all-or-nothing" is not precise, and we have removed it, as it is not essential for describing the observed "popping" dynamics.

      The authors claim that the popping frequency increases as a function of stiffness. However, Figure 4E does not really seem to be a common practice in terms of statistical significance. A better description could help to remove this doubt.

      We clarified the presentation of popping frequency data and its statistical interpretation.

      (1) Popping Frequency vs. Substrate Stiffness (previously Figure 4D, now Figure 3G):

      We first note that the dependence of popping frequency on substrate stiffness was presented in the original Figure 4D, not 4E. In the revised, shortened manuscript, it can now be found in Fig. 3G. Due to the large number of observations (N) in our dataset, the slight upward trend in popping frequency with increasing substrate stiffness (original Figure 4D) does reach statistical significance using standard tests. For details, see the figure captions.

      (2) Popping Frequency vs. Sarcomere Resting Length (previously Figure 4E, now Figure 3H):

      Figure 4E addresses the relationship between popping frequency and the individual sarcomere's resting length. To generate this plot, we binned sarcomeres based on their measured resting length (in intervals of 0.02 µm) and calculated the mean popping frequency within each bin across all conditions. We have now clarified this in the figure caption.
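
      The binning described above can be sketched as follows. This is a minimal illustration and not the analysis code used for the figure: the function name `binned_popping_frequency`, the input format (two parallel lists, one of resting lengths and one of per-sarcomere popping frequencies), and the choice of bin origin at zero are assumptions. Only the 0.02 µm bin width is taken from the text.

```python
import math
from collections import defaultdict


def binned_popping_frequency(resting_lengths_um, popping_freqs, bin_width_um=0.02):
    """Mean popping frequency per resting-length bin.

    Returns a list of (bin_lower_edge_um, mean_frequency, n_sarcomeres),
    sorted by bin. Bins start at multiples of bin_width_um (an assumption).
    """
    bins = defaultdict(list)
    for length, freq in zip(resting_lengths_um, popping_freqs):
        # The small epsilon keeps values that sit on a bin edge from being
        # pushed into the lower bin by floating-point rounding.
        index = math.floor(length / bin_width_um + 1e-9)
        bins[index].append(freq)
    return [
        (round(index * bin_width_um, 6), sum(values) / len(values), len(values))
        for index, values in sorted(bins.items())
    ]
```

      Returning the number of sarcomeres per bin alongside the mean makes it easy to see which bins are sparsely populated before interpreting a trend.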

      (3) Interpretation of Length Dependence:

      While Figure 3H clearly shows that longer sarcomeres are more prone to popping, we argue this is likely a modulating factor rather than the sole underlying cause. Two key observations support this interpretation:

      Even very short sarcomeres (e.g., < 1.65 µm resting length) exhibit a non-zero popping frequency (around 5-10%), indicating that popping is not exclusive to long sarcomeres.

      The distribution of resting lengths, now added to the graph, is narrower than the wide range (1.6-2.0 µm) plotted in Figure 3H. Popping still occurs stochastically within a myofibril of sarcomeres with relatively similar resting lengths.

      Therefore, while length clearly influences the probability of popping, the phenomenon itself appears to be fundamentally stochastic, occurring across a range of lengths. This is consistent with our model in which dynamic instabilities (driven by the non-linear force-velocity relationship) and stochastic fluctuations are the primary triggers, while length affects the probability of occurrence.

      Changes in Manuscript:

      We have revised the text associated with Figures 3G and 3H to clarify the distinction between stiffness and length dependence.

      We have added a statement in the Methods section and figure legends (e.g., Legend for Fig 3) explaining our approach to statistical analysis and interpretation for large datasets where standard p-values may be less informative.

      We believe these clarifications directly address the reviewer's concerns about the data presentation and interpretation in Figure 3.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting study, which however could and should be extended, see below. The current manuscript contains much less information than its length suggests; its figures contain partially redundant data.

      Taking into account this critical feedback, we have restructured, streamlined and shortened the manuscript to improve readability and accessibility.

      (1) How regular are the cellular contraction cycles?

      Have the authors computed a coefficient of variation of cycle durations?

      Does this regularity depend on substrate stiffness?

      We have substantially improved the detection accuracy of contraction intervals compared to our initial submission (for details, see the SarcAsM preprint, https://www.biorxiv.org/content/10.1101/2025.04.29.650605v1). We calculated the beating rate variability (defined as the standard deviation of cycle durations) and found a low variability, averaging less than 0.05 s across the tested conditions. The distribution of this variability is positively skewed, with the majority of values clustering near zero. We have added new panels showing these results to Fig. S2B.
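
      As a minimal sketch of the quantities involved, the snippet below computes the variability measure defined above (standard deviation of cycle durations) together with the coefficient of variation the reviewer asks about. It assumes a hypothetical input, a list of contraction start times in seconds for one cell, and uses the sample standard deviation; neither the function name nor these choices are taken from the manuscript.

```python
import statistics


def cycle_variability(contraction_starts_s):
    """Return (sd_s, cv) of cycle durations from contraction start times.

    sd_s is the sample standard deviation of cycle durations in seconds;
    cv is sd_s divided by the mean cycle duration (dimensionless).
    """
    durations = [
        later - earlier
        for earlier, later in zip(contraction_starts_s, contraction_starts_s[1:])
    ]
    if len(durations) < 2:
        raise ValueError("need at least three contraction start times")
    sd_s = statistics.stdev(durations)
    cv = sd_s / statistics.mean(durations)
    return sd_s, cv
```

      The coefficient of variation normalizes by the mean cycle duration, which can help when comparing cells or conditions that beat at different rates.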

      (2) Which experiments could the authors perform to identify the origin of the apparent 3-Hz oscillations?

      Would these oscillations persist even if the cardiomyocytes would not beat?

      We now address these questions in the revised manuscript.

      (1) Active Nature: The ~3 Hz oscillations are clearly linked to active contraction. They are absent in quiescent, non-beating cardiomyocytes observed under identical conditions, confirming that they are not passive fluctuations or baseline cellular tremors.

      (2) Signal Fidelity: We are confident these are genuine physiological events, not artifacts. Our high temporal resolution (~15 ms frame time) and tracking accuracy (< 20 nm) allow reliable detection because events are well above system noise. This is now explained in the revised manuscript.

      (3) Can the authors augment their study by modeling?

      For example, could the experimental data be fitted by a Kuramoto-type model of the form d phi_i / dt = eps*sin( Omega - phi_i ) + lambda*sin( phi_i - phi_i+1 ) + xi_i, combining phase-locking of sarcomere oscillations with phase phi_i to intracellular calcium oscillations with phase Omega, and anti-phase synchronization between neighboring sarcomeres, as well as noise xi?

      If yes, how would the coupling strength depend on substrate stiffness?

      We now added a model. While a Kuramoto-type phase model is powerful for studying synchronization, we determined that a more mechanistic approach was required. Crucially, sarcomeres are mechanically coupled in series within a myofibril, and this direct physical linkage is not well-represented by the abstract, phase-based coupling of a Kuramoto model.

      Instead, our model comprises serially coupled sarcomeres, each governed by an underdamped Langevin equation. This framework allowed us to infer the force-velocity relation without any prior assumptions directly from our experimental data, revealing a critical non-monotonic characteristic. As we now emphasize in the revised manuscript, this behavior is mathematically equivalent to a Van-der-Pol relaxation oscillator, which reflects the instability-driven nature of the system.

      Furthermore, and in line with the reviewer's suggestion, our model incorporates a stochastic noise term which we found essential for reproducing the observed phenomena. Without this noise term, the characteristic sarcomere dynamics do not emerge (Fig. 5).
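
      For readers who want to compare the two approaches, the phase model written out in the reviewer's comment (3) can be integrated with a simple Euler-Maruyama scheme. The sketch below illustrates only the reviewer's equation and is not part of the model presented in the manuscript. Several choices are assumptions: the calcium phase advances linearly as Omega = omega_ca * t, the noise xi_i is Gaussian white noise of strength sigma, the last sarcomere in the chain has no neighbor term, and all function and parameter names are hypothetical.

```python
import math
import random


def simulate_phase_chain(n=5, eps=1.0, lam=0.5, sigma=0.1, omega_ca=2 * math.pi,
                         dt=1e-3, steps=5000, phi0=None, seed=0):
    """Euler-Maruyama integration of
    d phi_i/dt = eps*sin(Omega - phi_i) + lam*sin(phi_i - phi_{i+1}) + xi_i.

    Returns the list of phases after `steps` time steps.
    """
    rng = random.Random(seed)
    if phi0 is None:
        phi = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    else:
        phi = list(phi0)
        n = len(phi)
    for step in range(steps):
        big_omega = omega_ca * step * dt  # assumed linear calcium phase
        new_phi = []
        for i in range(n):
            drift = eps * math.sin(big_omega - phi[i])
            if i + 1 < n:  # open chain: last unit has no i+1 neighbor
                drift += lam * math.sin(phi[i] - phi[i + 1])
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            new_phi.append(phi[i] + drift * dt + noise)
        phi = new_phi
    return phi
```

      Note that the coupling here is between abstract phases, which is exactly the limitation raised above: the serial mechanical linkage of sarcomeres does not appear explicitly in such a model.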

      (4) What is the maximally extended length of titin, and how does this length correspond to the maximal length of popping sarcomeres?

      The force-extension curves of titin have been measured in single-molecule experiments (and the packing density of titin is known) - can the authors use this information to infer the forces acting inside sarcomeres?

      We thank the reviewer for this thoughtful question. While sarcomere length during popping can be measured, inferring the corresponding intra-sarcomeric force is not straightforward in a living, contracting cardiomyocyte. The relationship between extension and force is complex and dynamic, involving multiple molecular components.

      Our data show elongations up to 0.5 μm during popping events. While this magnitude is plausibly within the extensibility range of titin and other mechanically relevant components (Caporizzo & Prosser, 2021; Loescher & Linke, 2023), directly inferring force from this observation is challenging. In such a multi-component system with both active and passive elements, total force comprises several factors that cannot be disentangled from a simple length measurement alone. First, the system is dominated by active, velocity-dependent force generation of cross-bridges, which our model shows is non-monotonic. Second, titin exhibits a restoring force that is strongly strain-rate dependent (Rief et al., 1997), critical during rapid elongation. Third, viscous drag forces within the sarcomere are also highly strain-rate dependent, contributing significantly during rapid length changes. Fourth, other structural elements such as microtubules and intermediate filaments contribute to viscoelastic properties, particularly at high strains (Caporizzo & Prosser, 2021). This complex interplay makes it impossible to map a given sarcomere length to a unique force value using single-molecule titin data alone.

      (5) I urge the authors to make their raw data openly available.

      We agree on the importance of data availability. While the complete raw imaging dataset is several hundred gigabytes and thus impractical to deposit, we have uploaded a comprehensive dataset to Zenodo to ensure full reproducibility. This repository includes a representative subset of raw imaging data (50 cells per condition), with corresponding sarcomere motion data provided in a readable JSON format. Crucially, the deposition also contains the complete aggregated data underlying all figures and statistical analyses presented in the manuscript. All provided data can be programmatically accessed and analyzed using our `SarcAsM` Python API. The data can be accessed at: https://doi.org/10.5281/zenodo.17564384.

      Minor

      (1) How did the authors determine the start and end of contraction cycles when analyzing their data?

      The start and end points of each contraction cycle were identified using ContractionNet, a custom convolutional neural network we developed for this purpose. This method, used for all analyses in the revised manuscript, detects contraction intervals with high accuracy directly from sarcomere dynamics time-series data and significantly outperforms the threshold-based approach used previously. The complete methodology, algorithm description, and validation of ContractionNet are detailed in our companion paper on the SarcAsM analysis software (www.biorxiv.org/content/10.1101/2025.04.29.650605v1, see Fig. S6).

      (2) What are the measurement errors in determining Delta_SL?

      The measurement error for the Z-band trajectories is approximately 17 nm. This high tracking accuracy is achieved with our deep-learning-based Z-band segmentation approach, which employs a 3D convolutional neural network (3D U-Net) to leverage both spatial and temporal context for robust Z-band segmentation in noisy, high-speed recordings. A full description of this validation is available in our SarcAsM companion paper (see Figure S3 therein).

      (3) Does popping occur while other sarcomeres are still contracting?

      This is an important point. Yes, popping frequently occurs while other sarcomeres within the same myofibril are still actively shortening. This simultaneity is clearly visualized in the newly added Movie M1, which displays a phase-space plot (velocity vs. length change relative to rest) for all tracked sarcomeres over time. In this visualization, popping events appear as trajectories moving into the top-right quadrant (rapid elongation), while concurrently, other sarcomeres are represented by points in the left quadrants (negative velocity), indicating ongoing shortening. We have included Movie M1 as supplementary material.

      (4) The authors argue that their data on popping sarcomeres is consistent with homogeneous popping probabilities.

      (5) Can the authors assess in simulations how dispersed the popping probabilities of individual sarcomeres could be before they would notice a statistically significant difference to the homogeneous case?

      This question touches on a key challenge in analyzing these complex dynamics. A direct statistical test of popping probability for each individual sarcomere is not feasible, as the number of events per sarcomere over our observation time is too low for robust single-unit analysis. Consequently, our approach relies on testing the cumulative distributions of inter-event spatial distances and temporal gaps across all sarcomeres within a given region (LOI).

      In nearly half of the analyzed LOIs, these cumulative distributions were statistically indistinguishable (p > 0.05) from the geometric distribution expected for a single, homogeneous stochastic process. This provides strong support for our primary conclusion that popping is fundamentally a random phenomenon.

      For the cases that deviate from the homogeneous model, we argue that this does not refute the underlying stochasticity of the events. Instead, we propose this is the expected statistical signature of pooling data from a population of sarcomeres that have slight, intrinsic variations in their individual popping probabilities due to factors like resting length or structural integrity. Even if each sarcomere's popping is a locally random event, a cumulative test performed on a population with varied baseline probabilities is expected to detect a deviation from a simple, homogeneous model.

      Regarding the requested simulation study: While we agree this would be methodologically informative, the sensitivity to detect probability dispersion depends on multiple interacting factors (number of sarcomeres per LOI, observation time, event rates, and the assumed form of heterogeneity). Any single simulation scenario would therefore be highly model-dependent and of limited generality. Rather than introducing additional assumptions, we base our conclusions on the observed agreement with the homogeneous model in approximately half of LOIs and the correlation of deviations with measurable properties (Fig. 4E). A comprehensive statistical analysis would constitute a substantial methodological study beyond the scope of this mechanistically focused manuscript.
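
      The comparison against a geometric distribution mentioned above can be sketched as follows. The response does not specify the exact test statistic, so this example only computes a Kolmogorov-Smirnov-type maximum deviation between the empirical cumulative distribution of gaps and a fitted geometric distribution. It assumes the gaps are positive integers (for example, the number of cycles or sarcomere positions between successive popping events); the function names are hypothetical, and a p-value would require a procedure suited to discrete distributions, which is not shown.

```python
def geometric_cdf(k, p):
    """P(gap <= k) for gaps k = 1, 2, ... with per-trial event probability p."""
    return 1.0 - (1.0 - p) ** k


def max_cdf_deviation(gaps):
    """Fit a geometric distribution to integer gaps and return
    (p_hat, maximum absolute deviation between empirical and fitted CDF)."""
    if not gaps or min(gaps) < 1:
        raise ValueError("gaps must be a non-empty list of integers >= 1")
    n = len(gaps)
    p_hat = n / sum(gaps)  # maximum-likelihood estimate: 1 / mean gap
    max_dev = 0.0
    for k in range(1, max(gaps) + 1):
        empirical = sum(1 for g in gaps if g <= k) / n
        max_dev = max(max_dev, abs(empirical - geometric_cdf(k, p_hat)))
    return p_hat, max_dev
```

      A small deviation is compatible with a single, homogeneous event probability, whereas pooling units with different probabilities would tend to produce an excess of both very short and very long gaps relative to the fitted curve.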

      (6) Can the authors measure sarcomere rest length and check if this rest length is correlated with the popping probability of individual sarcomeres?

      Yes, we performed this analysis. As shown in Figure 3H (previously Fig. 4E), we found a positive correlation between sarcomere resting length and popping frequency, confirming that longer sarcomeres have a higher probability of popping.

      Importantly, however, the popping probability remains non-zero even for shorter sarcomeres. As detailed in our response to Reviewer #1 regarding this figure, we interpret resting length as a significant modulating factor that influences popping probability, rather than the sole determinant of the phenomenon.

      (7) Several mathematical models of sarcomere contraction exist (e.g., crossbridge models).

      (8) Could the authors perform computer simulations of several such stochastic sarcomere models coupled in series?

      Alternatively, could the authors discuss this?

      As I understand, references 16-18 model myofibril contraction assuming static variability of sarcomeres, but do not account for stochasticity in the contractility of individual sarcomeres.

      We thank the reviewer for this excellent suggestion. We have performed such simulations, and the theoretical model is a central component of our revised manuscript (new Figures 4 and 5; manuscript lines 316ff).

      As the reviewer points out, previous models (e.g., refs 12 and 14 in our manuscript) have often relied on predefined static variability between sarcomeres to explain heterogeneous behavior. Our work takes a fundamentally different approach. We model the myofibril as a chain of serially coupled sarcomeres, where the dynamics of each unit are governed by an underdamped Langevin equation. This formulation inherently incorporates stochasticity and describes the interplay between a non-monotonic, velocity-dependent active force, a length-dependent passive force, and the mechanical coupling to its neighbors.

      Crucially, the model parameters were not assumed, but were instead inferred by fitting the model directly to our experimental data using a gradient-free optimization algorithm. This data-driven stochastic model was sufficient to quantitatively reproduce key observed phenomena, including high-frequency oscillations and popping events. Our central finding is that these complex behaviors emerge naturally from the coupled system, driven by the non-monotonic force-velocity relationship and intrinsic stochastic fluctuations. This demonstrates that predefined static heterogeneity is not required to explain the observed dynamics.

      (9) The manuscript could be shortened (e.g., lines 52-56 in the introduction provide little extra value).

      We have significantly revised the entire manuscript to improve clarity and readability. We have removed sentences in the introduction as suggested and substantially restructured major sections. One of the main reasons for this was the integration of our theoretical model, which was originally prepared as a separate manuscript. This required us to completely reframe the introduction and reorganize the figures and results.

      We are confident that these extensive changes have resulted in a stronger, more concise and impactful paper that now integrates our experimental findings with a theoretical model.

      (10) Figure 2 is overloaded with data. Several panels could be moved to the SM without compromising the key message.

      Introducing the notation in panels Figures 2A-C does not seem ideal to me; maybe add a cartoon?

      We agree that Fig. 2 was dense. We have redesigned panels A-F to improve clarity and better guide the reader. We now use a consistent color-coding scheme to link the extrema in the phase portraits (A-C) to the corresponding distributions of individual sarcomeres (E-G). We have also revised the accompanying text to make the figure's logic more transparent.

      We have considered moving panels A-C to the supplementary materials. However, we believe their placement in the main text is crucial for two reasons:

      (1) Revealing Core Dynamics: The length-velocity phase portrait is the first visualization that reveals the underlying near-oscillatory dynamics of individual sarcomeres. This was not an assumed behavior but a critical experimental observation that directly motivated our entire theoretical modeling effort. We now also provide animated versions of these plots (Movies X-Y) to further illustrate these complex dynamics.

      (2) Enabling Model-Experiment Comparison: A phase portrait is a standard tool for comparing experimental data with theoretical models. Retaining it in the main text allows us to directly compare data and model in our new Figures 4 and 5, providing a clear validation of our model.

      (11) Similarly, Figures 4F, G, and H seem dispensable to me.

      (I also wonder how clear the analogy of a coin flip is if a biased coin with probabilities p and 1-p needs to be used.)

      We agree that the previous Figure 4F, which served a purely illustrative purpose, was dispensable and have removed it. The "coin flip" analogy was potentially confusing and we have removed it.

      As part of a broader restructuring of the manuscript, the quantitative analyses from the original Figures 4G and 4H are now presented as Figures 3I and 3J. They provide important supporting evidence for the stochastic nature of the resulting popping events. We believe retaining this quantitative analysis is valuable, and we hope that by streamlining the figure and removing the analogy, we have addressed the reviewer's concerns.

      (12) Equation (1) is unnecessarily complicated. The same holds for Equation (2).

      It might make sense to separate definitions for serial and mutual correlations.

      (This would also simplify the axes labels in Figure 3C.)

      (13) The notation used in Equation (1) is not fully clear.

      I assume t denotes a unit-less time index and T is the unit-less duration of a contraction cycle, measured in multiples of a fixed time interval?

      Regarding comments (12) and (13):

      We thank the reviewer for these helpful suggestions. In response to comment (12), we have separated the definitions for the mutual (r<sub>m</sub>) and serial (r<sub>s</sub>) correlation coefficients, presenting them as distinct calculations rather than as special cases of a single, more complex formula. This makes their definitions more direct and explicit. The calculation for the serial correlation coefficient has also been streamlined into a concise inline definition.

      In response to comment (13), we have clarified the notation in Equation (1). In the manuscript text (lines 208f), we now explicitly state that 𝑡 represents the discrete, unitless time index (i.e., the frame number) within a time-series, and 𝑇 is the total number of frames (i.e., the total duration in frames) of a given contraction cycle.

      While Equation (1) itself is the standard definition for the uncentered correlation coefficient and cannot be algebraically simplified, we have added text to specify this and justify its use. This metric (equivalent to cosine similarity) is appropriate for our analysis as it assesses the similarity in the shape of motion patterns, independent of their mean values.
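
      For concreteness, an uncentered correlation coefficient (cosine similarity) between two time series x_t and y_t over t = 1, ..., T can be computed as in the sketch below. Equation (1) itself is not reproduced in this response, so this is the textbook form of the metric rather than a transcription of the manuscript's formula; the function name and the interpretation of the inputs as per-frame length changes of two sarcomeres are assumptions.

```python
import math


def uncentered_correlation(x, y):
    """Uncentered correlation (cosine similarity) of two equal-length series:
    sum_t x_t*y_t / (sqrt(sum_t x_t^2) * sqrt(sum_t y_t^2))."""
    if len(x) != len(y):
        raise ValueError("series must have the same number of frames")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("correlation is undefined for an all-zero series")
    return dot / norm
```

      Unlike the Pearson coefficient, no mean is subtracted from either series, and rescaling one series by a positive factor leaves the value unchanged.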

      Finally, to further streamline the paper, we have removed the velocity correlation analysis and the corresponding parts of Figure 3.

      (14) The authors should make clear in all figures what is experiment and what is simulation.

      We have now clarified the nature of each graph in the figure captions.

      (15) The caption of Figure 3C could be simplified.

      We have simplified all figure captions.

      (16) I found Figure 3A hard to understand.

      We concluded that Figure 3A was confusing and did not add essential information to the manuscript. We have removed it entirely.

      Reviewer #3 (Recommendations For The Authors):

      In conclusion, I think that the manuscript would gain a lot if some more precise and more quantitative interpretation of the results were given. This might require a collaboration with theorists.

      We have integrated a novel theoretical framework into the revised manuscript (new Figures 4 and 5; manuscript lines 300ff), as described above.

      This new section introduces a data-driven, stochastic dynamical model that simulates the myofibril as a chain of serially coupled sarcomeres. Each sarcomere's motion is governed by an underdamped Langevin equation, a formulation that inherently accounts for stochasticity. Crucially, our model incorporates a non-monotonic force-velocity relationship inferred directly from our experimental data, rather than relying on predefined static variability between sarcomeres, a key distinction from previous theoretical work.

      This integrated model successfully and quantitatively reproduces all major experimental phenomena described in the paper, including high-frequency oscillations and stochastic "popping" events. It demonstrates that these complex behaviors emerge naturally as dynamic instabilities from the coupled system. This addition elevates the manuscript from a descriptive study to one that provides a predictive, mechanism-driven framework for understanding sarcomere dynamics.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The factors that create and maintain diversity in host-associated microbiomes remain poorly understood. A better understanding of these factors will help in the efforts to leverage the adaptive potential of the microbiome to help solve pressing problems in health and agriculture.

      Experimental evolution provides a promising path forward as we can track the causes and consequences in the emergence of novel variants, but experimental evolution remains underutilized in host-microbiome interactions. Here, Gracia-Alvira et al. utilize a long-term experimental evolution study in Drosophila simulans under hot and cold temperature regimes to identify strain-level variation in an important fly bacterium, Lactiplantibacillus plantarum. They identify three strains of L. plantarum, which are most prevalent in their respective three temperature regimes, suggesting that these are locally adapted bacteria. Then, using a combination of genomics, in vitro, and in vivo assays, Gracia-Alvira et al. attempt to understand the factors that led to the differentiation of the hot and cold L. plantarum and their impacts on the fly host.

      Strengths:

      This is an excellent use of experimental evolution to track the emergence of novelty in the microbiome. The genomic analyses are all solid and appropriate for the data sets. It is especially striking that the comparisons with the other, independent experimental evolution studies in different labs (and across continents between Portugal and South Africa) show a consistent response to temperature. Many have disregarded the microbiome as it is something that is too sensitive to seemingly innocuous variables (particularly in the fly microbiome), such that we cannot find generalities. However, this finding highlights the potential for experimental evolution to uncover these dynamics. The question of how strains emerge and are maintained is timely and is one of the key open questions in host-microbiome evolution currently.

      Weaknesses:

      (1) The framing in the title and throughout the discussion about "subspecies competition" does not match the data that was collected. The subspecies competition requires actually tracking the competitive outcomes between the hot, cold, and unevolved L. plantarum. In the in vivo work, I can see that mixes of the strains were made, but they did not track whether the cold strain outcompeted the hot strain in vivo under cold conditions, for example.

      We thank the reviewer for the honest concern and take this opportunity to defend our claim of "subspecies competition" used across the manuscript. As the reviewer states, subspecies competition requires tracking the competitive outcomes between the three clades, and this is what we did by sampling and sequencing across ten years of experimental evolution (Figures 4 and S3). For this reason, we point out that the subspecies competition assessment comes from the direct observation of changes in relative abundance across the time series, and not from the follow-up experiments in vivo or in vitro.

      While Figure 4 is suggestive that there is ongoing competition in the hot temperature regime, this is not necessarily shown in the cold, which is dominated by the C clade. It could also be that the bacteria cannot survive in the flies at the different temperatures. The growth curve assays hint that the bacteria can grow, but the plate reader couldn't actually maintain the 18 °C temperature (line 455). So all of this evidence is very indirect and insufficient to say that strain competition is driving these patterns.

      We thank the reviewer for the alternative hypothesis that could explain the observed subspecies dynamic. We rule out that dominance of clade C in the cold occurs because the other two clades cannot grow in this regime based on three pieces of evidence:

      (1) In the time series, clades H and U decrease, but never disappear (Figures 4 and S3), even showing some peaks of abundance in specific replicate populations (Figure S3).

      (2) We isolated individuals belonging to clade H in the cold-evolved populations, as shown in Figure 2. This is direct evidence that clade H persists in the cold-evolved populations, although in low abundance.

      (3) We did grow the three taxa in fly food petri dishes incubated at both temperature regimes, observing growth in all cases.

      We will include the food growth experiment in the revised manuscript as further supporting evidence for growth in both regimes.

      (2) The in vivo results are interesting in that there appears to be a fitness cost of clade C, but the explanation is underdeveloped. I say under-developed because in Figure 4, the cold L. plantarum remains much higher throughout adaptation to the hot temperature regime than the hot L. plantarum in the cold regime. The hot L. plantarum is low abundance throughout the cold regime. I felt like this observation was not explained, but it seems relevant to understanding the strain dynamics.

      We acknowledge that a strong fitness cost of clade C is observed in axenic D. melanogaster. In the native host, D. simulans, with reduced microbiome, we observed delayed development that could even be an advantage depending on the situation, as pointed out by reviewer 3 in the recommendations.

      Even if we assume that flies colonized with clade C are less fit in the experimental evolution, another caveat is whether the flies can actively select for the L. plantarum clade. Under this assumption, a clade that imposes a fitness cost on the fly (clade C) should be selected against over time because the flies colonized by this clade will have fewer offspring, or develop later than the rest. Alternatively, as the microbiome is shared among all the individuals in the population, the host might not be able to “purge” the pernicious clade, and L. plantarum dynamics might be controlled solely by the relative fitness between clades in the given experimental treatment. We will discuss this hypothesis in the revision as a way to explain the relationship between the abundance of each clade and the effect on the host.

      I will also note that this is not the first time that L. plantarum or other Lactobacillus have been shown to exert fitness costs to Drosophila. Gould, PNAS, 2018, shows that both Lactobacillus plantarum and Lactobacillus brevis in mono-association have lower fitness (measured through Leslie matrix projections using lifespan and fecundity) than axenic flies. Many studies of wild Drosophila fail to find Lactobacillus, or it is low abundance (e.g., Chandler, PLoS Genetics, 2014; Wang, Environmental Microbiology Reports, 2018; Henry & Ayroles, Molecular Ecology, 2022; Gale, AEM, 2025). This might help provide useful context for the in vivo results.

      We thank the reviewer for the references. These observations will be compared to our phenotypic results and discussed in the revised version of the manuscript.

      (3) The data in Figure 4 are compelling to focus on the L. plantarum variants. However, I can see from the methods that the competitive mapping included only other strains of Wolbachia.

      We appreciate the thorough reading of the methods by the reviewer. The competitive mapping comprised two steps: first, we discarded the reads that mapped to Drosophila, Wolbachia and additional potential contaminants from sequencing facilities (human, dog...). This step leaves the reads originating from the whole external microbiome of the flies, including L. plantarum. The second competitive mapping step recruits the reads that map to any clade of L. plantarum.

      It is not clear how other members of the microbiome changed in response to the temperature regimes. As I note in point #2, given that Lactobacillus is often rare, it is not clear what the rest of the microbiome looks like over the course of adaptation. Indeed, it seems like Mazzucco & Schlotterer, PRSB, 2021 did a broader analysis of the microbiome and found that Acetobacter is by far the most common bacterium (I think this data is also part of the data shown here?). Expanding on why or why not in this context is important and will improve this study, particularly if the focus is on connecting these evolutionary dynamics to ecological competition to explain the emergence of strain diversity.

      We acknowledge that the rest of the Drosophila microbiome is not addressed in this study, as we wanted to focus the storyline around the intraspecific dynamics found in L. plantarum. We consider that a complete characterization of the whole Drosophila microbiome would unnecessarily elongate the paper and thus we treat it as a constant biotic factor.

      We must point out that our dataset is not the one reported by Mazzucco & Schlötterer, which was done in D. melanogaster, rather than D. simulans. Nevertheless, both experiments share the same infrastructure, temperature regimes and fly maintenance.

      We will include a list of taxa that were isolated from the populations, and we will report L. plantarum prevalence and abundance across the experiment, in order to give readers context on the microbiome beyond L. plantarum.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gracia-Alvira et al. investigated how environmental temperature affects competition among members of the microbiome, with a focus on intraspecific diversity, using the Drosophila model. Notably, the authors identified three clades of Lactiplantibacillus plantarum from a natural population of Drosophila simulans collected in Florida. They tracked the dynamics of these three bacterial clades under two temperature conditions over the course of more than ten years. Using comparative genomics and phylogeny, they showed that these three bacterial clades likely adapted to their host independently in a temperature-specific manner. Further, by combining in vitro culture and in vivo mono-association assays, they demonstrated the functional divergence of these three bacterial clades phenotypically, including their growth dynamics and effects on host fitness. Lastly, they performed pathway analysis and speculated on key genomic variance supporting such functional divergence.

      Strengths:

      The laboratory evolutionary experiment in response to cold or hot environmental temperature is impressive, given its more than ten years of experimental time period. This collection of achieved microbiome samples paired with the fly host data can be a valuable resource for the field.

      Weaknesses:

      The laboratory evolutionary experiment can be limited due to its artificial experimental setup. For example, wild flies rely on a more diverse set of food sources and are constantly exposed to new bacterial inoculations, whereas under laboratory conditions, flies live in a more restricted ecosystem. In addition, environmental temperatures differ among different locations, but they also involve seasonal changes within the same region. This manuscript can be strengthened with further discussions that elaborate on these limitations.

      As the reviewer has correctly noted, our experimental setting is not exempt from limitations. Lab-reared flies are fed a defined standard diet. Furthermore, although the system is not completely closed to bacterial migration, migration is limited because replicate populations are not allowed to mix during the maintenance of the flies. For this reason, we consider our laboratory setting a compromise between observing wild populations, which undergo all biotic and abiotic stresses but cannot be manipulated, and evolving the bacteria in the absence of the host, or in gnotobiotic hosts, in which biotic interactions are not fully considered. We will expand on this in the new version of the manuscript.

      Moreover, the extent of host effects involved in these experiments remains ambiguous, because it is unclear whether these Lactiplantibacillus plantarum mostly reside within fly guts or on Drosophila medium. The laboratory evolutionary experiment possibly favored better colonizers on Drosophila medium under either cold or hot temperatures, which subsequently can saturate fly guts. As fully dissociating these variables can be experimentally tedious, the authors may want to comment more on these aspects in the discussion. Or they may want to consider some measurements. For example, measuring the growth rate of these bacteria on Drosophila medium under different temperatures, in addition to the current MRS culture experiments, or measuring the portion of the Lactiplantibacillus on Drosophila medium versus these stably colonizing fly guts.

      The reviewer's point was briefly addressed in the Results chapter: "Phenotypic differences in liquid culture".

      Reviewer #3 (Public review):

      Summary:

      The study presents an analysis of 297 pangenomes derived from 20 populations of Drosophila simulans, at 19 time points for fast-reproducing individuals in a hot environment, or at 10 time points for slow-reproducing individuals in a cold environment, over a period of more than 10 years. The authors select a particular microbial component of the pangenomes and study the dynamics of Lactiplantibacillus plantarum strains in two environments. They discover that the revealed operational taxonomic units could be divided into three phylogenetic clades, which have their own genomic and genetic features, different adaptive capabilities that depend on the environment, and have a distinct impact on the fitness of the host.

      Strengths:

      The authors prove that bacterial microbiome components are sensitive to the environment and could rapidly (years) be fixed in eukaryotic populations. This study establishes a tractable model that potentially enables the study of variability of the physiological influence of distinct strains of an important commensal species, Lactiplantibacillus plantarum, on the Drosophila host. It is clearly shown that this single species consists of several phylogenetically and functionally diverse strains. The authors did not limit their interest to their own model, but rather they have integrated a comparative approach by analysing phylogenetic relationships among 92 described L. plantarum strains.

      Overall, the study is novel and delivers important discoveries of a longitudinal, well-replicated experiment, generating a substantial amount of genomic data. It highlights an important dimension of research that environmental selection operates at the subspecies level.

      Weaknesses:

      Even though the authors show only one particular example by conducting their longitudinal experiment, they honestly acknowledge failures important for interpretation of the biological significance of the results (gnotobiotic mono-association experiments were done with D. melanogaster, but not D. simulans) and therefore they state limitations of their conclusions (weaker effects in the non-axenic flies are due to the presence of other taxa or to higher-order interactions with other members of the microbiome). These interactions could significantly affect bacterial growth, metabolism, and physiological influence on the host.

      We agree with the reviewer that the use of gnotobiotic animals is a limitation, as by "tuning" the flies' microbiome we are modifying the interactions between members, which can potentially change the phenotypic outcome. Nevertheless, we use it as a complementary approach, rather than the only inference in our study.

      The authors exploit the results of their experiment to speculate about a wide range of evolutionary phenomena, like within-species competition, ecological adaptation and evolution of the host, fitness advantage of bacteria to the host, the benefits of parasitism or mutualism, the domestication of the microbiome, etc. At the end, they conclude that their study "highlights that even subspecies diversity plays a key role in adaptation to environmental temperature". However, the potential mechanisms of such adaptation are barely discussed, so that the focus of the study shifts from the temperature-induced changes in microbial population structures toward metabolism-related adaptations of clade representatives that enable them to diversify their carbon and nitrogen sources. The role of the temperature factor remains elusive.

      We acknowledge that our study does not fully resolve the mechanism by which a different clade ends up dominating each temperature regime. The MRS liquid experiment was an attempt to answer whether differences in optimal growth temperature could explain the temperature-specific abundance of the two clades. Our experiments showed, however, that this was not the case. Beyond this point, it is hard to disentangle the role of the temperature, as it could also act indirectly on the bacteria, for example, through the host or the food.

      A second observation in our time series was that a third clade, U, was unfit in both regimes despite starting the experiment in high abundance. For this reason we also studied what made this clade less fit. Based on our analyses, we propose that the decrease of clade U was driven by the shift to a laboratory diet, shared by all experimental populations.

      In addition to that, the paper has a clearly minimalistic experimental approach to address functional properties of the revealed L. plantarum strains, so that their own fitness, or their relationship with the Drosophila host, is characterised superficially. Therefore, the authors' discourse can be speculative rather than factual (especially when the authors use the expression "likely" to share their guesses in the "Results" section). Nevertheless, these minor drawbacks do not undermine the novelty of the discovered phenotypes and the importance of their further investigation.

      We consider the reviewer's concern and will tone down the phrasing when reporting our findings in the revised version of the manuscript.

    1. R0:

      Reviewer #1:

      This paper examines factors associated with Shigella-attributed diarrhea among children aged 6–35 months in Malawi, including a novel assessment of seasonal effect modification. The analyses are technically rigorous and appropriately applied to the observational dataset, and the findings provide valuable evidence to guide targeted interventions, including the forthcoming Shigella vaccine rollout. I recommend publication pending a few minor revisions noted below.

      • The introduction explicitly situates the burden of Shigella in an economic context, but are there any other lenses, perhaps more human-centric, through which we can think about the implications of this burden?
      • Please include references for the categorization methods of diarrhea severity and WASH in the “Predictor Variables” section of the Methods.
      • How did you approach producing age group bins for this study? Was it data driven or decided a priori based on some contextual motivation?
      • Please add a statement justifying Poisson regression as the chosen analytic method.
      • Given that some samples were tested by culture, some by qPCR, and some by both, it would be beneficial to add more clarification in the methods about the different testing procedures and results classification. For the samples tested by both methods, what happened if one result was positive and the other negative? In the discussion you state “Notably, 43% of qPCR-positive cases were also culture-positive, supporting the clinical relevance of the qPCR-detected cases”, however only 43% overlap between the two methods actually seems quite low – can you point to any other studies that have looked at this?
      • I believe when using generalized estimating equations (GEE) to account for clustering it is standard to report the number of clusters and distribution of cluster size.
      • These analyses rely on an assumption of missing data at random/completely at random, however the complete case analysis conducted excludes 32% of children with missing vaccination data. Please elaborate on this missingness in the limitations beyond reduction in sample size to include the potential bias introduced if the data is not in fact missing randomly. Alternatively, it may be worthwhile to consider inclusion of the observations with unknown vaccination status as a third category – which may still have relevant interpretation given the reality of often not knowing children’s vaccination status when designing interventions.
      • Was there any consideration of prior antibiotic use among patients reporting to the clinic with diarrhea? Please elaborate how this may or may not influence these results (perhaps in the limitations).
      • Please standardize spelling of “enrollment” throughout the manuscript (sometimes one vs two ls).
      • In Table 1 the Wasting “None” group percentage needs a decimal instead of a comma.

      Reviewer #2:

      The findings are interesting, but the manuscript needs thorough revision considering the following critical points.
      • Clearly mention the inclusion and exclusion criteria for selection of patients in the current study.
      • What were the limitations of the study?
      • Shigella were isolated and identified using culturing. What was the species distribution of Shigella?
      • Mention the duration of the study (months/years) in the abstract.
      • Briefly describe how the culture and qPCR were used for detection of Shigella. Is Shigella DNA detected directly in the fecal sample, or is it detected from culture?
      • Define the criteria for household drinking water source categorization (improved, unimproved).
      • The abbreviations used in the tables should be defined in each table’s footnote.

      Academic Editor:

      Two reviewers have evaluated your manuscript and provided their comments below. In particular, please provide more detail in the methods section on the microbiological methods for Shigella detection and add information on any missing critical variables e.g. in the table footnotes. In addition and given the high missingness for vaccination status, conducting a sensitivity analysis for the multivariable models while excluding vaccination would aid with the interpretation of the study findings.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      1) Summary

      This study investigates the mechanochemistry of Arp2/3-mediated branched actin networks at the level of individual branch junctions under load. Using microfluidic single-filament/branch force assays (including constant-force flow and open-chamber imaging), the authors quantify debranching, re‑nucleation, and mother- vs daughter‑interface stability across nucleotide states of Arp2/3 (ADP-Pi, ADP, and an ADP-BeFx proxy for ADP-Pi). They further test the effects of two branch regulators (GMF and cortactin). Key findings include: (i) ADP-Pi and ADP complexes share similar force dependence but differ markedly (~20×) in intrinsic dissociation rate; (ii) phosphate turnover on the Arp2/3 complex is rapid; (iii) affinity for Pi drops when Arp2/3 loses its daughter filament; (iv) quantification from model fits uncovers large stability differences between daughter and mother interfaces of the Arp2/3 complex; (v) extraordinarily high stability of ADP-Pi-like Arp2/3 on the mother filament; and (vi) distinct effects of GMF and cortactin on force‑dependent stability. Overall, the work combines technically demanding measurements with mechanistic modeling to probe how nucleotide state and regulatory factors tune branch mechanics.

      2) Major comments:

      1. Low force kinetics and completeness of survival curves (Figure 1). "For all forces, the surviving curves exhibited a clear single exponential behavior...." While the data can be fitted to monoexponential decay curves, data at low forces is clearly incomplete. >90% of branches have not dissociated by the end of the experiment. For the particular data shown in 1C (F00nN, n=60 total branches) it means that the time information is coming from

      Essential; experiment might already be performed. Otherwise straightforward to do (weeks time).

      In figure 1B, we indeed show a Survival curve for ADP-Arp2/3 complex branch dissociation at 0 pN up to 900 seconds. As now shown in updated supp figure S2, the data was in fact acquired for at least 5000 seconds for ADP-Arp2/3 and ADP-Pi states (N=2 repeats for each condition, with n = 60 and 90 branches for ADP-Arp2/3 branches, and 90 and 132 branches for ADP-Pi-Arp2/3 branches). The debranching rates reported in the initial submission were already obtained by fitting the surviving curves over the whole duration of the experiments.
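To illustrate why a survival curve in which most branches have not yet dissociated can still constrain a single-exponential rate, here is a minimal sketch of the standard maximum-likelihood estimate for right-censored exponential data (rate = number of observed events divided by total observation time). It is not the authors' fitting procedure, which fits the survival curves directly; the function name and the example times are hypothetical.

```python
import math


def exponential_rate_mle(event_times, censored_times):
    """Maximum-likelihood rate for a single-exponential process.

    event_times: times (s) at which a branch was seen to dissociate.
    censored_times: observation times (s) of branches still attached
        when the recording ended (right-censored observations).
    Returns (rate, standard_error), both in s^-1.
    """
    n_events = len(event_times)
    exposure = sum(event_times) + sum(censored_times)
    if n_events == 0 or exposure <= 0:
        raise ValueError("need at least one event and positive exposure")
    rate = n_events / exposure
    # Asymptotic standard error from the Fisher information.
    std_err = rate / math.sqrt(n_events)
    return rate, std_err


# Hypothetical data: 60 branches followed for 5000 s, only 4 dissociate.
events = [400.0, 1200.0, 2500.0, 4100.0]
censored = [5000.0] * 56
rate, std_err = exponential_rate_mle(events, censored)
print(f"rate = {rate:.2e} s^-1, standard error = {std_err:.2e} s^-1")
```

With few events the relative uncertainty is large (it scales as one over the square root of the number of events), which is why longer acquisitions and more branches tighten the estimate at low force.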

      1. Stability Analysis (Figure 4). I can follow much of the arguments presented in the stability analysis of the daughter vs mother interfaces, which is in principle extremely interesting! However, there are some concerns here:

      i) The authors emphasize the zero force ratio derived from fits (which is linked to the stability difference of the two interfaces in the absence of force) despite this being only weakly constrained by data. Intuitively in the model, the stability difference should grow to very large values as the re-nucleation ratio approaches 1 at low force. This combined with the noise in the data poses an issue in my opinion. Looking at the data and the error margin, I think that the authors cannot state with high confidence that there is a real difference between the relative stability of the daughter and mother interfaces between the two nucleotide states of the complex.

      Essential; analysis and textual revision only

      We thank the reviewer for this comment. The difference in stability between the two interfaces is strongly constrained by the shape of the branch renucleation ratio versus force curve, and its value at 0 pN. This is illustrated in the figure shown below (new Supp Fig. S8), showing the dissociation rates of the two interfaces (in 'dashed' and 'point-dashed' style) that contribute to the overall debranching rate in each nucleotide condition. Despite the limited force range at which we probed the debranching rate, the branch renucleation ratio curve informs us on which interface is the weakest, and how this evolves with force.

      We have assessed the confidence intervals of the parameters obtained from the fits, taking into account the error bars on our experimental datapoints. They indicate that the simultaneous fits of the debranching rate and the branch renucleation ratio curves indeed constrain the parameters quite strongly. These confidence intervals are now reported in the main text and in the summarizing table.

      We have repeated branch renucleation experiments for ADP-BeFx- and ADP-Pi-Arp2/3 complex branches (see new figure 4C&D, and our response to the next point). We believe these new measurements allow a better assessment of the relative stability between the two interfaces for Arp2/3 complex branch junctions in the ADP-BeFx state.

      Still, we agree with the reviewer that the dispersion of the experimental data does not allow us to have strong confidence in the crossover force and relative stability difference of the interfaces. Therefore, we have slightly toned down the way we present and discuss the differences in stability when comparing the two nucleotide states.

      ii) For ADP-Pi, the renucleation ratio essentially remains flat over the measured force range. Hence, the data can only provide little leverage to estimate both the zero force ratio and, more importantly, the differential distance to the transition state in the slip-bond model in my opinion, which will show in the crossover force. Consequently, the quoted ">100×" stability difference at F=0 and the crossover force >20pN are driven largely by extrapolation rather than direct constraint by data. Given the high number of free parameters in the model, I would anticipate that several crossover forces and differential distances might explain the data nearly equally well. Instead of loosely reporting exact number from fits, I would have hoped for some sort of sensitivity analysis, for instance relying on profile likelihoods. Also parameter values could be reported as bounds (e.g crossover force≫measured range) rather than precise point estimates. This issue re-occurs (albeit not as drastically) for the cortactin experiments (Figure 6).

      Essential; analysis and textual revision only

      As mentioned in our response to the previous point, we have repeated renucleation experiments for ADP-BeFx-Arp2/3 complex branches (and also for Arp2/3 complex branches in the presence of 50 mM Pi) (see new figure 4C&D) to better characterize the differential distance to the transition state. The crossover force for the ADP-BeFx state is now 13.5 pN and the ratio of the stability between the two interfaces is roughly 100.

      We agree with the reviewer that the dispersion of the experimental data does not allow us to have strong confidence in the crossover force and relative stability difference of the interfaces. We have thus toned down the way we report these values. We do believe though that the difference we report between the ADP and ADP-BeFx states appears to be significant and needs to be acknowledged.

      As a side note, it has proven challenging to pull on branches at forces higher than 7 pN. To apply a large force on the branch junction, we need a high flow rate. In this case, the height of the filaments (both mother and daughter filaments) above the surface appeared to deviate from what we have established in our previous studies (Jegou et al, Nat. Comm. 2013 & Wioland et al, PNAS 2019). This may originate from the fact that branched filaments have a more complex shape than an individual filament. Accurately characterizing the evolution of the branch height as a function of the flow rate and applied force would require quite extensive additional work, which, we believe, is beyond the current focus of this study on the stability of Arp2/3 complexes.

      iii) One important expectation from the "two slip bond" model is that branch dissociation rates should not necessarily scale mono-exponentially as they mostly do over the accessible force range of the paper. However, once the "minor" pathway of dissociation from the mother starts to dominate at high forces, rates become more force sensitive. This is nicely recaptured by the model fits in Figure S6 but deserves some explanation in the text. Otherwise, people will simply remember the "ADP-Pi is 20-fold more stable than ADP at all forces" message.

      Essential; textual revision only

      We now have rephrased the key sentences (in the Abstract and Results sections) to more clearly state that the debranching rate is not increasing mono-exponentially with force.

      In the Abstract: "Remarkably, we find that branch junctions are over 30-fold more stable when the Arp2/3 complex is in the ADP-Pi rather than ADP state, and that force accelerates debranching with similar exponential factors in both states."

      In the Results section: "The debranching rate seems to increase exponentially with the applied pulling force, in the range of 0 to 6 pN (Fig. 1F; see more refined analysis below). This behaviour is predicted by the Bell-Evans model for a slip bond."
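The point about non-mono-exponential behaviour can be sketched numerically. The snippet below assumes, as a simplification, that debranching proceeds through two independent Bell-type slip-bond pathways (rupture at the daughter interface or at the mother interface), each with rate k0·exp(F·x/kBT). It is not the authors' fitted model, and every parameter value is hypothetical and chosen only for illustration.

```python
import math

KBT_PN_NM = 4.1  # thermal energy near room temperature, in pN*nm


def bell_rate(k0, x_nm, force_pn):
    """Bell slip-bond rate: k0 * exp(F * x / kBT)."""
    return k0 * math.exp(force_pn * x_nm / KBT_PN_NM)


def debranching_rate(force_pn, daughter, mother):
    """Total rate when either interface can rupture independently."""
    return bell_rate(*daughter, force_pn) + bell_rate(*mother, force_pn)


def daughter_fraction(force_pn, daughter, mother):
    """Fraction of debranching events that occur at the daughter interface."""
    k_d = bell_rate(*daughter, force_pn)
    return k_d / (k_d + bell_rate(*mother, force_pn))


def crossover_force(daughter, mother):
    """Force at which both interfaces rupture at the same rate."""
    (k0_d, x_d), (k0_m, x_m) = daughter, mother
    return KBT_PN_NM * math.log(k0_d / k0_m) / (x_m - x_d)


# Hypothetical (k0 in s^-1, distance to transition state in nm) pairs.
DAUGHTER = (1e-3, 1.0)
MOTHER = (1e-5, 2.0)

for force in (0.0, 5.0, 10.0, 20.0, 30.0):
    print(force, debranching_rate(force, DAUGHTER, MOTHER),
          daughter_fraction(force, DAUGHTER, MOTHER))
print("crossover force (pN):", crossover_force(DAUGHTER, MOTHER))
```

Below the crossover force the total rate is dominated by the weaker interface and looks mono-exponential in force; above it the pathway with the larger transition-state distance takes over and the force sensitivity increases.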

      iv) One important prerequisite for the model is that isolated Arp2/3 complexes (without a daughter filament) should dissociate with equal rates from mother filaments at all flow rates. Since the Arp2/3 complex prefers mother filament curvature, forces experienced by the mother might change its off-rate. It would be good to refer to this assumption in the text and experimentally verify it. I could not find it in the paper nor in Ghasemi et al 2024.

      Essential; simple experiment (a weeks time).

      We thank the reviewer for this important comment.

      First, we investigated whether the viscous drag force, applied on the ADP-Arp2/3 complexes which remain bound to mother filaments could affect their stability. We have performed branch renucleation experiments at different flow rates but with the same pulling force on branch junctions (average force 3.9 pN) by adapting the length of the daughter filament. As shown in new supp. figure S11 (shown below), we did not observe any significant differences between 'low' and 'high' flow rates. If the off-rate of the surviving Arp2/3 was significantly affected by the flow, this would have led to a variation of the renucleation ratio with the flow rate.

      Second, we have investigated the impact of the tension experienced by the mother filament at the location of the branch junction for ADP-Arp2/3 complex branches, with the same pulling force on the branches (average 4.1 pN pulling force on branches). We have quantified the debranching rate from three groups of branches depending on their position along mother filaments. As shown in new supp. figure S12 (shown below), we can observe a small trend, where the debranching rate decreases with the tension on the mother filament at the branching point.

      Doubling the tension on the mother filament from 15 to 30 pN decreases the debranching rate by a third. However, pairwise logrank tests performed between the survival fractions of the three binned groups do not report any statistically significant difference (all p values > 0.05). One possible explanation for this is that the height of the mother filament in the microfluidic flow increases linearly from the anchoring point to the free barbed end. As a consequence, the pulling force on the branches will be higher, as branches experience faster flows.

      For these same groups, upon branch dissociation, all remaining-bound Arp2/3 complexes are exposed to the same flow rate; the branch renucleation ratios were similar. Thus branch renucleation ratio seems to not significantly depend on the tension experienced by the mother filament at the branching point.

      Similarly, Pandit et al PNAS 2020, Extended figure S1, also reported no detectable impact of the mother filament tension on the debranching rate in their assay.
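For readers unfamiliar with the pairwise logrank comparison mentioned above, the following is a minimal, self-contained sketch of a two-group logrank test for right-censored survival data. It is a generic textbook implementation, not the authors' analysis code, and the function name and example data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test.

    times_*: observation times; events_*: 1 if the branch dissociated
    at that time, 0 if the observation was censored.
    Returns (chi2, p_value) with one degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk and numbers of events at time t in each group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("not enough events to compute the statistic")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical example: group B dissociates much later than group A.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

For three binned groups, the function would be applied to each of the three pairs; whether and how to correct for multiple comparisons is a separate choice not addressed in this sketch.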

      v) The force dependence of the branch re-nucleation rate (Fig 3D) has been measured previously by the same group (Ghasemi et al). While the data in the older paper has not been fitted by a model, the trend of the data in the previous paper looks conspicuously different. Are there any explanations for this? I speculate that it might be related to actin and ATP not being saturated (low-force re-nucleation rate rarely exceeds 80%) in Ghasemi et al., but it would be good to know what the authors think about this.

      Essential; textual revision only

      This is a good point. We have plotted the data of the renucleation ratio from ADP-Arp2/3 complex from figure 1F of Ghasemi et al, Sc. Adv. 2024 (performed at 0.3 and 1 µM actin), together with the data of the current study from figure 4D (performed at 1.5 µM actin). We feel this comparison could be of interest to the readers, and have thus integrated it in the manuscript as new supp. figure S13 (shown below).

      As expected, the branch renucleation ratio is lower with lower concentrations of actin. The experimental data points from Ghasemi et al are similarly well fitted by the branch renucleation function obtained for 1.5 µM multiplied by a scaling parameter, which reflects the fact that the branch renucleation ratio is actin concentration dependent (Fig. 6A in Ghasemi et al). This scaling parameter was the only free parameter of those fits.

      Since the branch renucleation ratio depends on the actin concentration as $0.97 \cdot \frac{k_{on}([\mathrm{actin}] - C_c)}{k_{on}([\mathrm{actin}] - C_c) + k_{off}^{\text{ATP-Arp2/3}}}$, with $k_{on}$ = 3.4 µM⁻¹·s⁻¹ and $k_{off}^{\text{ATP-Arp2/3}}$ = 0.66 s⁻¹ from (Ghasemi et al. 2024), the scaling parameters obtained from the fits give estimates of the actin concentration in these experiments of 0.6 (±0.05) and 0.9 (±0.2) µM for the experiments performed at 0.3 and 1 µM, respectively, in (Ghasemi et al. 2024).
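As a worked illustration of how a fitted scaling parameter maps back to an effective actin concentration, the sketch below evaluates the renucleation-ratio expression quoted above and inverts it. The rate constants are the ones quoted in the response; the critical concentration Cc is not given there, so the value of 0.1 µM used here is an assumption, and the function names are hypothetical.

```python
K_ON = 3.4        # µM^-1 s^-1, quoted from Ghasemi et al. 2024
K_OFF_ATP = 0.66  # s^-1, quoted from Ghasemi et al. 2024
PLATEAU = 0.97    # prefactor of the renucleation-ratio expression
CC = 0.1          # µM, ASSUMED critical concentration (not stated in the text)


def renucleation_ratio(actin_um, cc=CC):
    """Renucleation ratio as a function of actin concentration (µM)."""
    growth = K_ON * max(actin_um - cc, 0.0)
    return PLATEAU * growth / (growth + K_OFF_ATP)


def scaling_factor(actin_um, reference_um=1.5, cc=CC):
    """Ratio relative to the reference condition (1.5 µM actin)."""
    return renucleation_ratio(actin_um, cc) / renucleation_ratio(reference_um, cc)


def effective_actin(scale, reference_um=1.5, cc=CC):
    """Invert the expression: actin concentration implied by a scaling factor."""
    y = scale * renucleation_ratio(reference_um, cc) / PLATEAU
    if not 0.0 <= y < 1.0:
        raise ValueError("scaling factor outside the range of the model")
    return cc + K_OFF_ATP * y / (K_ON * (1.0 - y))


for conc in (0.3, 0.6, 0.9, 1.0, 1.5):
    print(conc, round(renucleation_ratio(conc), 3), round(scaling_factor(conc), 3))
```

Because the ratio saturates with actin concentration, a given uncertainty on the scaling parameter translates into a larger uncertainty on the inferred concentration as the plateau is approached.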

      1. Stability of the authentic ADP-Pi-Arp2/3 complex on the mother filament. The extraordinary stability of the isolated ADP-BeFx-Arp2/3 complex on mother filaments is surprising, especially considering that both ATP and ADP states are much more labile (Ghasemi et al 2024). I would recommend repeating this experiment in the authentic ADP-Pi state with labelled Arp2/3 complexes as a more direct readout, even if this would require working with very high phosphate concentrations.

      Essential; simple experiment (a weeks time).

      We have followed the recommendation of the reviewer and have performed new experiments using fluorescent Arp2/3 complexes for ADP, ADP-BeFx and ADP-Pi states, now displayed in new figure 5C (also shown below).

      For fluorescent Arp2/3 complexes remaining bound to the mother filament, the Arp2/3 complex - mother filament interface is ~100 times more stable in the ADP-BeFx state (0.0046 s⁻¹) compared to the ADP state (0.56 s⁻¹). We also assessed the dissociation of surviving ADP-BeFx-Arp2/3 complexes using unlabelled Arp2/3 complexes (previously in figure 4B, repeated experiment shown in new supp. figure S10), which also indicates a remarkable stability.

      The dissociation curve of surviving Arp2/3 complexes in the presence of 50 mM Pi and 200 µM ATP in solution reflects the mixture of Arp2/3 dissociating in the ADP/ATP state and ADP-Pi-Arp2/3 that can either dissociate in the ADP-Pi state or lose their Pi and dissociate in the ATP state. Despite the presence of 50 mM Pi, the rate at which ADP dissociates and ATP reloads is much faster than Pi binding. Fitting this survival curve with a function that accounts for the initial double populations and the evolution of the ADP-Pi population (see Methods) gives a good estimate of the Pi release rate.

      OPTIONAL: Further, but beyond the scope of the present paper, would be titrating phosphate in these experiments, which would even allow the authors to independently verify the reduced Pi affinity for Arp2/3 in the mother filament. Of note, this affinity difference is needed to satisfy detailed balance in the reaction scheme (Fig 4 D)!

      We thank the reviewer for this suggestion. High concentrations of phosphate in the buffer render glass surfaces quite sticky in our assays. We've tried several different passivation strategies (BSA, PLL-PEG, K-casein, ...) but none gave satisfactory results. So titrating phosphate, by going beyond 50 mM phosphate, proved to be quite challenging.

      Detailed balance, considering the two possible routes connecting the ADP-Pi-Arp2/3 complex branch junction state and the surviving ADP-Arp2/3 complex state, can be written as $K_{\text{Pi rel.}}^{\text{branch junction}} \cdot K_{\text{debranching}}^{\text{ADP-Arp2/3}} = K_{\text{debranching}}^{\text{ADP-Pi-Arp2/3}} \cdot K_{\text{Pi rel.}}^{\text{surviving Arp2/3}}$. Some of these affinity constants are not known, because reverse reaction rates, such as the rebinding of a daughter filament to a surviving Arp2/3, cannot be determined. It is thus hard to determine how the affinity of Pi for Arp2/3 complex changes between Arp2/3 complexes at branch junctions and surviving Arp2/3 complexes on mother filaments.

      While we cannot determine the affinity constant of Pi for a surviving Arp2/3 complex, our data indicate that the dissociation rate of Pi is higher from Arp2/3 complexes at branch junctions (koff = 0.21 s⁻¹) than from surviving Arp2/3 complexes (koff = 0.05 s⁻¹). This unexpected finding indicates that surviving Arp2/3 complexes adopt a conformation where the nucleotides are readily exchanged, but where the 'back door' for Pi release is less open. We now discuss this point in our revised manuscript.

      1. Importance of "surviving" ADP-Pi-Arp2/3 complexes. The authors show a) rapid turnover of Pi on the ADP-Arp2/3 complex in both branch- or mother filament-bound state and b) the lowered Pi affinity of the latter. Nonetheless, they emphasize the importance of long-lived "surviving" ADP-Pi bound complexes on the mother (even stated in the abstract). I understand that this fraction shows under some experimental conditions (BeFx), but unless I am missing something, most complexes should rapidly lose their phosphate and either exchange nucleotide or dissociate from the mother under physiological conditions. Please clarify or tone down.

      Essential; textual revision only

      We thank the reviewer for their remark. We have tried to clarify this aspect in the manuscript.

      The departure rate of fluorescent surviving Arp2/3 complexes, together with the branch renucleation data, now shows that surviving ADP-Pi-Arp2/3 complexes are quite stable on mother filaments: they detach and release their Pi slowly, so that branch regrowth will occur provided there is actin in solution. In the absence of actin monomers, as the reviewer correctly points out, the surviving ADP-Pi-Arp2/3 complex will predominantly release its Pi and thus become a surviving ADP-Arp2/3 complex. We have modified the text to avoid any confusion.

      1. GMF mechanism. The authors claim that GMF "...accelerates the departure of the surviving Arp2/3 complex from the mother...". I assume that they infer this from decrease in the re-nucleation ratio. However, alternatively GMF could simply dwell on the complex, inhibiting re-nucleation without promoting dissociation from the mother. The authors should either monitor Arp2/3 dwell times directly to discriminate between these possibilities or be more cautious in their conclusions.

      Essential; simple experiment (a week's time) or textual revision.

      In Ghasemi et al. Sci. Adv. 2024, we examined the departure of Arp2/3 from the mother filament after GMF-induced debranching using fluorescent Arp2/3. Most of the fluorescent Arp2/3 dissociated from mother filaments within the same frame as the branch, i.e. within 0.5 seconds after the debranching event, and none were visible after another second. This could be due to Arp2/3 departing with the branch or to an accelerated departure after branch dissociation. In any case, this rules out the possibility that GMF would dwell on the surviving complex for a substantial amount of time without promoting dissociation from the mother.

      In the present manuscript, we now show that increasing the ATP concentration 10-fold (from 0.2 to 2 mM) is sufficient to restore the branch renucleation ratio to its level without GMF. This shows that GMF does not cause Arp2/3 to leave with the branch, but rather that it (also) acts on the surviving Arp2/3 complex, in a way that is countered by high concentrations of ATP. More specifically, it suggests that GMF accelerates the departure of the surviving ADP-Arp2/3 complex, either directly or by hindering the reloading of ATP, and that GMF does not affect the surviving Arp2/3 complex once it has reloaded ATP.

      We now discuss these two non-mutually exclusive possibilities for the accelerated dissociation of the surviving ADP-Arp2/3 complex in the manuscript.

      6. Cortactin mechanism and the "leash model". I must say that the cortactin data are the most puzzling part of the paper and hard to reconcile with what we know from structure. I was hoping to find some of this resolved in the discussion. However, I do not understand the "leash model" in the discussion section for cortactin-mediated branch stabilization: "This would explain the observed increase in branch survival compared to the absence of cortactin. As the pulling force is increased, this rebinding mechanism becomes less efficient." According to my understanding of the data, this is opposite to what happens. Cortactin only stabilizes the labile interface at elevated forces! Some re-writing might help here.

      Essential; textual revision.

      We thank the reviewer for having us think more thoroughly about the model we initially proposed. We now believe that our 'leash' mechanism is not able to fully recapitulate our observations in a simple and satisfactory manner.

      We now propose a much simpler model, where the binding of cortactin to the Arp2/3 complex at the branch junction simply changes the energy landscape of the Arp2/3-daughter interface without the need to invoke a rebinding of the daughter filament upon branch departure. We have updated our interpretation of the data in the Discussion section accordingly.

      Overall, our results on the impact of cortactin on branch renucleation highlight a surprising behaviour that would require further investigation to fully decipher the underlying molecular mechanism.

      3) Minor comments

      Organization: - I do not want to impose on how to best tell the story, but I felt that Fig1 A-D and Fig 2 A-B belong to one logical unit (nucleotide dependence), whereas Fig 1 E-F and Fig 2 C belong to the other (Pi binding and exchange). Perhaps consider re-organizing to streamline presentation?

      We thank the reviewer for their suggestion. We agree that it flows more naturally as suggested, and have made the changes! Thank you.

      Semantics/Typos: - Abstract: "... ADP-Pi and ADP-Arp2/3 detach with the same exponential increase as a function of force...". Increase should refer to the dissociation rate, which should be added to the sentence.

      We have corrected this.

      Results page 8: "...and the majority of Arp2/3 complexes detach from the mother filament while remaining bound to the branch at the debranching time." "Branch" should likely be daughter here, as there is no branch after dissociation of either interface.

      We have corrected this, thank you.

      Results page 13: "Exposing ADP-BeFx-Arp2/3 complex branch junctions to a saturating amount of GMF...". It is strange to imply saturation, because GMF likely simply does not bind to the complex in this nucleotide state with appreciable affinity. Suggest to change to "high".

      We have made the changes accordingly.

      Discussion page 18: "Moreover, in mammalian Arp2/3, His80 in Arp3 (corresponding to His73 in mammalian actin) is not methylated, and corresponds to residue N77 in Arp3, which is also not modified." N77 likely belongs to Arp2?

      We have made the changes accordingly.

      Discussion page 19: "We showed that Pi affinity for Arp2/3 complexes at branch junctions is around 3.7 mM (Fig. 1), a value which lies within the reported 1-10 mM Pi concentration measured in the cytosol in different mammalian cell types". Notably, this is not too different from F-actin, which should be mentioned. By this measure alone, free inorganic phosphate could also directly regulate actin filament stability!

      We now mention this and discuss that intracellular Pi can also impact actin filament nucleotide state.

      Future interest (non essential): - It would be utterly exciting (but beyond current scope) to quantify how instantaneous debranching rates evolve for naturally aging branches starting from ATP-Arp2/3 complexes!

      We thank the reviewer for this remark. It is indeed quite beyond the scope of the current study, as this would require a way to probe ATP-Arp2/3 complex branches while daughter filaments are still quite short (so pulling on them is difficult). An interesting alternative could be to use ATP analogs, such as App-NHp (aka AMP-PNP), to stabilize this state. However, some studies have mentioned that App-NHp is not very stable.

      Significance

      General assessment:

      This is a compelling and carefully executed study that delivers a clear mechanistic framework for how Arp2/3 branch junctions fail and re‑form under load. The central strength is the tight integration of state‑of‑the‑art reconstitutions with careful and original kinetic analysis. The experimental design is elegant and experiments have been carried out to a masterful standard. The figures are clear, the statistics are appropriate with some exceptions as detailed above. There are very few labs in the world that could have achieved this feat!

      A few aspects could be further strengthened, most notably the explanation and application of the "two slip bond" model as well as slightly more restraint in speculating around specific regulatory mechanisms. However, these are minor refinements that do not detract from the important contributions of the paper.

      Overall, the work clearly merits publication with high priority after revision; most requested changes are textual/analytical with very few targeted experiments, which would substantially strengthen core claims.

      We thank the reviewer for their positive evaluation of our manuscript. We hope that our responses to the detailed points above, along with the corresponding revisions of the manuscript, will alleviate their concerns.

      Advance relative to prior literature: The major novel findings of the paper are already summarized above. There is some recent work done on the subject of branch mechanics by the authors (Ghasemi et al 2024, PMID: 38277459) and others (Pandit et al 2020 PMID: 32461373), but the focus of the present work is clearly unique and there is plenty of novel insight.

      Audience and impact: Primary audience: specialists in cytoskeleton dynamics, in vitro reconstitution, single molecule biophysics, and mechanobiochemistry. Secondary: researchers in cell motility, morphogenesis and mechanobiology, physicists working on active matter, and modelers studying force-producing and load-bearing biopolymer networks. The results and analysis framework should inform quantitative models of branched network turnover under load and the interpretation of regulatory factor action in vivo and in cells.

      Reviewer expertise: Actin dynamics; biochemical reconstitution; single molecule approaches; biophysics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Xiao et al examine the molecular events occurring when Arp2/3 complex-mediated actin filament branches are removed from mother actin filaments. They do this using a microfluidics assay with purified proteins combined with single filament TIRF imaging of branched actin filaments with distinct fluorescent labels. The contributions of different nucleotide states of the Arp2/3 complex are tested in conjunction with the force exerted on the branches and the involvement of the regulatory proteins GMF and cortactin. The data seem comprehensive and highly quantified in response to concentration, force, fraction of branches and survival times and branching rates. They find that ADP-BeFx and high phosphate concentrations (leading to the ADP-Pi state) lead to a slower debranching rate at a given level of force applied. The ability to rapidly switch the buffer gives powerful information about response times of debranching compared with other actin remodelling events. They use renucleation experiments to determine that the previous debranching event most often occurs at the Arp2/3 complex/daughter interface, showing that filaments will be ready to re-branch in the stable ADP-Pi bound state. GMF addition allows debranching of the ADP state to occur at a lower force. Cortactin acts similarly to the ADP-Pi state to increase branch stability.

      Specific comments

      The pulling force on the branches seems to arise from different flow rates in the microfluidics. Viscous drag is mentioned and I can see there is methylcellulose in the buffer. It would be helpful to have the explanation of the conversion between flow and force, even if it has been standard in previous work.

      We apologize if this was unclear: in microfluidics experiments, the buffer does not contain methylcellulose. Methylcellulose is only used for 'open chamber' experiments, where no force is applied to Arp2/3 branches, to maintain them in the TIRF field of excitation (Figure S2).

      To better clarify the conversion between flow and force, we have rephrased and extended the Methods section to explain how the force on the branch junction is computed based on the local flow velocity and the length of the daughter filament.

      Pg 5 - what was the motivation to titrate phosphate? It seems a stretch that intracellular Pi levels are tuning branching inside cells more than protein-mediated control (GMF or cortactin) - can the authors evidence this at all?

      We are not claiming that the level of Pi plays a stronger regulatory role than proteins. We show that inorganic phosphate tunes the state of the Arp2/3 complex, which in turn modulates the action of regulatory proteins, such as GMF and cortactin.

      Nonetheless, we do show that the contribution of inorganic phosphate is quite central as it can (1) strongly stabilize branch junctions (~30-fold decrease in the dissociation rate), and (2) tune the activity of GMF and cortactin on Arp2/3 complexes at branch junctions as well as on the 'surviving' Arp2/3 complexes that remain bound to mother filaments.

      We thus titrated phosphate and found that its impact on Arp2/3 complex stability is significant in the range of Pi concentration that is explored in cells. For the sake of completeness, and following a comment from reviewer #1, we now also mention the affinity of Pi for actin subunits in filaments in the Discussion, and discuss the impact of intracellular Pi on actin itself.

      Minor comments

      • In the introduction, while the structural and mutagenesis evidence is clearly stated, in other cases a bit more detail would be helpful e.g. 'biochemical studies', which referred to measurement of hydrolysis rates using radiolabelling

      We have made changes to more precisely define which biochemical assays were used in previous studies.

      • Page 3 Figures shouldn't be referenced in the introduction

      We have removed the references to the figures from the introduction.

      • Page 3 slip bond behaviour needs explanation

      We now explain the concept where it is first used in the manuscript, as follows: "The debranching rate seems to increase exponentially with the applied pulling force, in the range of 0 to 6 pN (Fig. 1F; see more refined analysis below). This behaviour of accelerated debranching with the increase of the applied force is similar to the 'slip bond' concept, as predicted by the Bell-Evans model of the force-dependent lifetime of the interaction between two proteins".
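
      The slip-bond behaviour quoted above can be made concrete with a minimal sketch of the Bell form, k(F) = k0 · exp(F · x / kBT). The zero-force rates and the distance parameter x below are hypothetical placeholders chosen only for illustration (the 30-fold ratio echoes the ADP versus ADP-Pi difference mentioned in this response); they are not fitted values from the manuscript.

```python
import math

# Thermal energy at about 298 K, expressed in pN·nm.
KBT_PN_NM = 4.11


def bell_rate(k0_per_s, force_pn, x_nm):
    """Bell slip-bond rate: the dissociation rate grows exponentially with force."""
    return k0_per_s * math.exp(force_pn * x_nm / KBT_PN_NM)


# Hypothetical parameters (assumptions for illustration, not measured values).
K0_ADP = 3e-3      # zero-force debranching rate, ADP state (s^-1)
K0_ADP_PI = 1e-4   # zero-force debranching rate, ADP-Pi state (s^-1)
X_NM = 1.0         # distance to the transition state, shared by both states (nm)

for force in (0.0, 2.0, 4.0, 6.0):
    k_adp = bell_rate(K0_ADP, force, X_NM)
    k_adp_pi = bell_rate(K0_ADP_PI, force, X_NM)
    # With a shared x, the ratio between the two states does not depend on force.
    print(f"F = {force:.0f} pN  k_ADP = {k_adp:.4g}  k_ADP-Pi = {k_adp_pi:.4g}  ratio = {k_adp / k_adp_pi:.1f}")
```

      Under these assumptions, two states that share the same exponential force dependence keep a constant fold difference in rate at every force, which is the pattern a common slope on a semi-log plot would reflect.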

      • Figure 1B seems to be a theoretical schematic which is superfluous

      We suppose that the reviewer is actually referring to figure 3B of the initial manuscript, describing the energy potential of a molecular interaction as a function of the reaction coordinate. We agree with the reviewer that it is not absolutely required and we have removed it.

      • Figure 4D is helpful, different weight lines might help even more to explain the dominant pathways

      We have made modifications to the biochemical reaction scheme in this figure (now figure 5F in the revised version). We hope we succeeded in improving its readability. Since the different paths depend on mechano-chemical parameters, there is no real dominant pathway per se.

      **Referee cross-commenting**

      Rev1 sounds like the specialist here. I can't comment on their requests. Some similar points arise between the reviewers which need addressing.

      Reviewer #2 (Significance (Required)):

      Significance

      Taking a look at references 16 and 19, I do not find it clear what is achieved differently in the current work compared to these papers and what agrees and what disagrees. If it's a species difference I might expect the two species would be analysed side-by-side in this paper.

      We thank the reviewer for this important comment. The goal of our study was not to compare the behaviour of mammalian and yeast Arp2/3 complexes.

      We now try to better explain that the motivation of the present work is to address how the nucleotide state of the Arp2/3 complex tunes actin branch mechanosensitive stability, and regulates interactions with well known Arp2/3 complex binding proteins. Most of the reactions are quantified here for the first time. Moreover, the experiments with branch junctions in different nucleotide states are done under controlled mechanical conditions, providing the first direct measurements of the force-dependence of the debranching reactions. Our detailed kinetic analysis of the full reaction scheme allows us to model the different binding interfaces of the Arp2/3 complex.

      In addition, it is worth noting that:

      1. Species matter, and this is why refs 16 and 19 can give the impression of disagreeing on the ability to renucleate branches thanks to the stability of surviving Arp2/3 complexes on mother filaments.
      2. In ref 16 (Pandit et al, PNAS 2020) species are mixed (yeast Arp2/3 and mammalian alpha actin from skeletal muscle), likely leading to a different behaviour compared to the all-mammalian protein situation we examine in our current work. In particular, with mixed species one misses the ability to renucleate, as shown in our previous study Ghasemi et al (ref 19). However, since mixing species does not correspond to anything physiological, we do not think it is worth repeating these conditions alongside our experiments.
      3. Further, the analysis carried out in ref 16 suffers from important limitations: the force was unknown (not calibrated) and the data was fitted by a model that compounded several reactions, providing only an indirect estimation of the rates, in particular at zero force. In contrast, we have worked with calibrated forces (including dedicated experiments at zero force) and we have carried out specific experiments to directly measure several rates.
      4. In ref 19 (our earlier work) we did not investigate the impact of the nucleotide state of the branch junction at all, and we did not systematically measure the dissociation rates as a function of force. Contrary to Pandit et al, we directly measure the difference in branch stability at zero force between ADP and ADP-Pi states and show that the ~ 30 fold difference holds true at all probed forces. Last, the force dependence of the branch renucleation success rate gives us crucial information on which of the two Arp2/3 complex interfaces ruptures first.

      I'm not understanding how the authors can distinguish effects of adding phosphate and BeFx on Arp 2 and 3 compared to effects on actin. Importantly, are possible accompanying changes in the actin filament a confounding factor?

      We have checked that the nucleotide state (ADP-BeFx and ADP-Pi versus ADP) of the mother and daughter filaments has no impact on branch stability:

      • In the experiments shown in figure 2F, where the buffer condition to which branches are exposed is quickly changed from phosphate buffer to buffer without phosphate, we observe a rapid change of branch stability. Actin subunits at the branch junction are in F-actin conformation according to recent cryoEM observations (ref. Chavali et al, Nat Comm. 2024; Liu et al, NSMB 2024). These actin subunits, initially in the ADP-Pi state, are expected to age and become ADP with a rate of ~ 0.007 s⁻¹ (ie half-time of 100 s; ref. Jegou et al, PLoS Biology 2011, Oosterheert et al, NSMB 2023), a much lower rate than the observed change of the debranching rate (0.21 s⁻¹). This means that the debranching rate is independent of the nucleotide state of daughter and mother filaments.

      • In new supp. Figure S4, we show that the debranching rate is similar for ADP-Arp2/3 complex branch junctions initiated from ADP- or ADP-BeFx-actin mother filaments.

      • In new supp. Figure S9, we initially exposed branch junctions to a BeFx solution then monitored debranching and branch renucleation in our standard buffer (ie without BeFX or Pi). We observed multiple rounds of branch renucleation, the first with ADP-BeFx-actin daughter filaments, and the following with daughter filaments never exposed to BeFx. They all had the same debranching rates and renucleation success rates.
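
      The timescale argument in the first point above can be checked with a short calculation. This is a minimal sketch that assumes simple first-order kinetics for both processes; the two rates are the ones quoted in this response, and the function name is ours.

```python
import math


def half_time(rate_per_s):
    """Half-time of a first-order process: t_1/2 = ln(2) / k."""
    return math.log(2) / rate_per_s


ACTIN_PI_RELEASE = 0.007  # s^-1, ADP-Pi to ADP ageing of actin subunits (quoted above)
BRANCH_SWITCH = 0.21      # s^-1, observed change of the debranching rate (quoted above)

print(f"actin subunit ageing half-time: {half_time(ACTIN_PI_RELEASE):.0f} s")
print(f"branch stability switch half-time: {half_time(BRANCH_SWITCH):.1f} s")
print(f"rate ratio: {BRANCH_SWITCH / ACTIN_PI_RELEASE:.0f}-fold")
```

      The actin half-time comes out at about 99 s, against about 3.3 s for the change in branch stability, a roughly 30-fold separation.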

      The paper is quite specialist to read and the advance appears to be incremental. My expertise is in molecular pathways to actin regulation outside the main area of the paper.

      The results we present in this study are often unexpected, and some run counter to long-standing assumptions. The regulation of Arp2/3-nucleated branches is of importance for the stability and the force-generating capabilities of many actin networks in cells. Last, most of the measurements that we present had never been done, mainly because the experiments are difficult to achieve and require specific tools to monitor several events while controlling the applied force.

      We believe our results are of broad interest as they run counter to long-standing assumptions. We have rewritten the text in several instances to convey our message more clearly.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Please find enclosed the review of the manuscript "Inorganic phosphate in Arp2/3 complex acts as a rapid switch for the stability of actin filament branches" by Xiao et al.

      The authors provide a detailed investigation of how the nucleotide bound to the Arp2/3 complex affects branch stability under flow force. From a kinetic perspective, this is an elegant study with generally high-quality data, although some conclusions rest on assumptions rather than direct experimental evidence.

      We thank the reviewer for their positive feedback. We have improved our manuscript and performed important additional experiments to provide more direct experimental evidence of our conclusions.

      A key question concerns the physiological relevance of these findings. For instance, the concept of branch regrowth may not be applicable in cellular contexts, since forces by actin polymerization would displace existing branches away from sites where they generate these active forces. The authors should clarify the relevance of regrowth during active force generation by branched networks.

      We thank the reviewer for this comment. Our in vitro results indeed point to a previously unreported property of branched actin networks, i.e. the ability of Arp2/3 complexes to readily renucleate branches in the ADP-Pi state and that it does require reloading ATP within Arp2/3.

      Branched actin networks, especially lamellipodia or endocytic patches, do exert active force thanks to actin polymerization of the individual branches at the forefront. However, the whole actin network is exposed to stress, and the architecture of the network (inter-branch distance, crosslinks between branches, ...) presumably has a strong impact on its mechanical properties.

      In the case of other types of branched actin networks, such as the actin cortex, myosin motors put the whole network under tension. Such pulling forces on actin branches, depending on their amplitude, can lead to branch regrowth and network self-repair.

      We have modified the text to make the physiological relevance clearer.

      Additionally, all experiments employ flow conditions that branches would probably not experience in cells; notably, the flow direction in the cellular context would be reversed. Altering the flow direction relative to the branches could affect not only the relationship between flow rate and branch stability, but potentially other system properties as well.

      We agree with the reviewer that in cells branches will not experience flow conditions similar to the ones we use in our in vitro assay. Nonetheless, in cells we expect mechanical stress on the branch junction to be applied in all directions. In lamellipodia, the compressive force applied at the leading edge is expected to result in diverse local orientations of the force on individual branch junctions within the network (as explained in Lappalainen et al. Nat Rev MBC 2022). Also, branch junctions are found in the cell cortex, where they are exposed to pulling forces resulting from the action of myosin motors and crosslinkers on mother and daughter filaments.

      The impact of flow direction was addressed in our previous publication (Ghasemi et al, Sci. Adv. 2024, figure 2) and, to a lesser extent, by the lab of Enrique de la Cruz in Pandit et al, PNAS 2020 (ref. 16). We reported that flow direction has a minimal effect, if any, on the branch dissociation rate and renucleation ratio.

      Reviewer #3 (Significance (Required)):

      Furthermore, the study appears not to account for the mother filament (particularly its nucleotide state) or the actin subunit bound to the Arp2/3 complex. The authors should discuss why their interpretation focuses exclusively on the Arp2/3 complex rather than on the actin filaments or Arp2/3-bound actin subunit.

      We have checked that the nucleotide state (ADP-BeFx and ADP-Pi versus ADP) of the mother and daughter filaments has no impact on branch stability:

      • In the experiments shown in figure 2F, where the buffer condition to which branches are exposed is quickly changed from phosphate buffer to buffer without phosphate, we observe a rapid change of branch stability. Actin subunits at the branch junction are in F-actin conformation according to recent cryoEM observations (ref. Chavali et al, Nat Comm. 2024; Liu et al, NSMB 2024). These actin subunits, initially in the ADP-Pi state, are expected to age and become ADP with a rate of ~ 0.007 s⁻¹ (ie half-time of 100 s; ref. Jegou et al, PLoS Biology 2011, Oosterheert et al, NSMB 2023), a rate much lower than the observed change of the debranching rate (0.21 s⁻¹). This means that the debranching rate is independent of the nucleotide state of daughter and mother filaments.

      • In new supp. Figure S4, we show that the debranching rate is similar for ADP-Arp2/3 complex branch junctions initiated from ADP- or ADP-BeFx-actin mother filaments.

      • In new supp. Figure S9, we initially exposed branch junctions to a BeFx solution then monitored debranching and branch renucleation in a regular buffer. We observed multiple rounds of branch renucleation, the first with ADP-BeFx-actin daughter filaments, and the following with daughter filaments never exposed to BeFx. They all had the same debranching rates and renucleation success rates.

      An important concern involves the use of KPi (inorganic phosphate). Based on our experience, KPi appears to have effects beyond simply impacting nucleotide state; actin filaments seem to assemble differently in the presence of KPi. The authors should exercise caution in their interpretation of KPi-based experiments.

      KPi concentrations of up to 50 mM Pi did not slow down the barbed end elongation rate in our experiments.

      Overall, while the technical quality and kinetic analyses are state-of-the-art, relating this work to physiological contexts remains challenging, and some conclusions appear overstated.

      We have made changes in the discussion to try to more clearly relate our in vitro observations and conclusions with the cellular context where branch renucleation could have a strong impact on the architecture and mechanics of actin networks.

    1. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution. The revision now uses morphological cable length for some analyses rather than straight-line distance, which improves the realism and interpretability of these results.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric. While it is likely that some FC-SC relationship does exist at large scales, it does not in my view justify use of this metric for evaluating noise removal methods, since such methods may inadvertently remove real neural correlates. This concern remains unaddressed in the revision.

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. The authors examine the ratios of input and output neurotransmitters in different defined regions. However, I think it would be more useful to integrate the neurotransmitter information more fully into the assessment of SC, for instance: examining the signed weight (excitatory - inhibitory), or by examining the excitatory and inhibitory networks separately.

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods; the fact that similar processing steps can be used on time-series data does not make them surprisingly similar, and does not in my view constitute evidence of "similar design concepts."

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec et al, PNAS 2024).

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors analyze connectome data from Drosophila and compare the physical wiring with functional connectivity estimated from calcium imaging data. They quantify structure-function relationships as a correlation of the two connectivity modalities. They report correlations roughly comparable to what has been described in the literature on sc/fc relationships in mammalian connectome data at the meso-scale. They then repeat their analysis, focusing on segregated versus unsegregated synapses. They derive separate connectomes using one or the other class of synapse. They show differential contributions to the sc/fc relationships by segregated versus unsegregated synapses.

      Strengths:

      There is nice synthesis of multimodal imaging data (Ca and EM data from flies and meso-scale data from human and marmoset).

      Thank you very much for your comments.

      Weaknesses:

      (1) The paper is written in an unusual way. The introduction intermingles results with background, making it hard to figure out what precisely is being tested.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) There are also major methodological gaps. Though the mammalian connectomes are used as a point of reference, no descriptions of their origins or processing are included.

      The reanalysis of marmoset data is presented in Ext. Data Figure. However, as pointed out by other reviewers, the data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      (3) A major weakness stems from the actual calculation of the sc/fc correlation. In general, SC is sparse. In the case of the EM connectomes, it is *exceptionally* sparse (most neural elements are not connected to one another). The authors calculated sc/fc coupling by correlating the off-diagonal elements of sc (the logarithm of its edge weights) and fc matrices with one another. The logarithmic transformation yields a value of infinity for all zero entries. The authors simply impute these elements with 0. This makes no sense and, depending on whether these zero elements are distributed systematically versus uniformly random, could either inflate or deflate the sc/fc correlations. Care must be taken here.

      Thank you for pointing this out. As you mentioned, the SC matrix becomes increasingly sparse as the number of ROIs increases (Ext. Data Fig.2-2b). In contrast, the FC matrix may contain values even when there are no direct connections between ROIs (indirect connections). We investigated this issue. To deal with the same problem, Honey et al. (2009) [6] resampled the elements of the SC matrix in rank order using a Gaussian distribution and calculated the FC-SC correlation between this resampled SC and FC.

      Ext. Data Fig.2-2a shows a comparison between resampled SC (Honey et al.’s method) and log-scaled SC (our method). Up to 200 ROIs, the proportion of SC matrix elements that are zero is less than 10% (Ext. Data Fig.2-2b), so few log-scaled elements require zero replacement. In this situation, Gaussian resampling tends to increase the correlation coefficient (Ext. Data Fig.2-2a). On the other hand, with 10,000 ROIs, where sparsity is extremely high, the proportion of SC matrix elements that are zero exceeds 70%. In this situation, 70-80% of the zeros are randomly assigned from the smaller end of the Gaussian distribution, which lowers the correlation coefficient (Ext. Data Fig.2-2a, c, d). For these reasons, we believe that log-scaled SC has less bias than resampling with a Gaussian distribution, and conclude that using log-scaled SC as is in this paper is reasonable. Log-scaled SC has also been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. To show that we have considered this issue, Ext. Data Fig.2-2 has been added to the manuscript.
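
      The two SC transformations compared above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy edge values, and the Gaussian parameters (mean 0.5, SD 0.1) are assumptions chosen for illustration only, and ties among zero entries are broken at random, which is one possible reading of "randomly assigned".

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def log_scaled(sc):
    """log10 of each edge weight; zero entries (log -> -inf) are replaced by 0.
    Note that a weight of 1 also maps to 0 under this scheme."""
    return [math.log10(w) if w > 0 else 0.0 for w in sc]

def gaussian_resampled(sc, mean=0.5, sd=0.1, seed=0):
    """Replace edge weights by sorted Gaussian samples in rank order.
    Tied values (e.g. the zeros) receive their samples in random order.
    mean/sd are illustrative assumptions, not values taken from the paper."""
    rng = random.Random(seed)
    samples = sorted(rng.gauss(mean, sd) for _ in sc)
    order = sorted(range(len(sc)), key=lambda i: (sc[i], rng.random()))
    out = [0.0] * len(sc)
    for rank, i in enumerate(order):
        out[i] = samples[rank]
    return out

# Hypothetical off-diagonal elements of an SC matrix (synapse counts) and FC matrix.
sc_edges = [0, 3, 0, 120, 15, 0, 48, 1]
fc_edges = [0.05, 0.21, 0.10, 0.62, 0.35, 0.02, 0.44, 0.18]

r_log = pearson(log_scaled(sc_edges), fc_edges)
r_res = pearson(gaussian_resampled(sc_edges), fc_edges)
print(f"log-scaled r = {r_log:.3f}, Gaussian-resampled r = {r_res:.3f}")
```

      With many zero entries, the resampled values of the zeros depend on the random tie-breaking, so the resampled correlation varies with the seed, whereas the log-scaled version is deterministic.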

      (4) Further, in constructing the segregated versus unsegregated connectomes, they use absolute thresholds for collecting synapses. It is unclear, however, whether similar numbers of synapses were included in both matrices. If the number is different, that might explain the differential relationship with fc; one matrix has more non-zero entries (and as noted earlier, those zero entries are problematic).

      Author response image 1.

      a, Sparsity rate histogram of SC matrix with cPPSSI (0-0.1) and subsampled null SC matrices corresponding Fig.4e. Red line indicates sparsity rate of SC matrix with cPPSSI (0-0.1). b, Sparsity rate histogram of SC matrix with cPPSSI (0.9-1) and subsampled null SC matrices corresponding Fig.4f. c, Sparsity rate histogram of SC matrix with reciprocal synapse (≤2𝜇𝑚) and subsampled null SC matrices corresponding Fig.4i.

      Thank you for pointing this out. The number of synaptic connections in the SC matrix differs greatly between the matrices extracted with cPPSSI (0-0.1) and cPPSSI (0.9-1) (Fig. 4e, f). However, when 99 null SC matrices were generated for each and compared with the cPPSSI-extracted matrices, the FC-SC correlation was significantly higher or lower. At this point, since the sparsity rates of the null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the sparsity rates of the extracted SCs (red lines) fall within those of the null-generated matrices. This figure was added to Ext. Data Fig.4-5, and the main text was also revised. The sparsity rates are 0.52 for cPPSSI (0-0.1) and 0.123 for cPPSSI (0.9-1). Since both cases involve comparisons with null SC matrices that have closely matched sparsity rates, we believe comparison using log-scaled SC is appropriate.
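
      The sparsity check described above can be sketched as follows. This is a hypothetical illustration rather than the procedure used in the manuscript: the synapse list, the uniform random subsampling, and the function names are all assumptions, and sparsity is taken here as the fraction of zero entries over all matrix elements.

```python
import random

def sparsity_rate(matrix):
    """Fraction of entries in a square matrix that are exactly zero."""
    n = len(matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / (n * n)

def subsampled_null(synapses, n_rois, k, rng):
    """Build an SC matrix from k synapses drawn at random without replacement.
    Each synapse is a (pre_roi, post_roi) pair; this sampling scheme is an assumption."""
    m = [[0] * n_rois for _ in range(n_rois)]
    for pre, post in rng.sample(synapses, k):
        m[pre][post] += 1
    return m

rng = random.Random(0)
n_rois = 6
# Hypothetical full synapse table: 400 synapses between random ROI pairs.
all_synapses = [(rng.randrange(n_rois), rng.randrange(n_rois)) for _ in range(400)]

# Hypothetical "extracted" matrix built from 20 synapses, and 99 nulls of the same size.
extracted = subsampled_null(all_synapses, n_rois, 20, rng)
nulls = [subsampled_null(all_synapses, n_rois, 20, rng) for _ in range(99)]

null_rates = [sparsity_rate(m) for m in nulls]
observed = sparsity_rate(extracted)
inside = min(null_rates) <= observed <= max(null_rates)
print(f"observed sparsity = {observed:.3f}, within null range: {inside}")
```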

      (5) There was also considerable text (in the results) describing the processing of the Ca data. In this section, the authors frequently refer to some pipelines as "better" or "worse" (more or less effective). But it is not clear what measures they adopted to assess the effectiveness of a pipeline.

      The detailed registration flow for the Ca data is described in the “Preprocessing of D. melanogaster calcium imaging data” section of Materials and Methods (Ext. Data Fig. 1-1a). Then, optimal nuisance factor removal methods and smoothing size were investigated. We used both correlation analysis (FC-SC correlation) and ROC curve analysis (FC-SC detection). Since signals are assumed to be transmitted between regions based on SC, when SC is treated as the ground truth, we considered a pipeline with higher FC-SC similarity and higher FC-SC detection performance to be better. We updated the Results section to include this point.
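
      As a rough illustration of the ROC-based "FC-SC detection" criterion, the sketch below treats any non-zero SC entry as a true connection and scores each ROI pair by its FC value. The binarization rule, the toy values, and the function name are assumptions; the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: probability that a randomly chosen positive
    pair scores higher than a randomly chosen negative pair (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ROI pairs: SC synapse counts (ground truth) and FC values (scores).
sc_edges = [0, 3, 0, 120, 15, 0, 48, 1]
fc_edges = [0.05, 0.21, 0.10, 0.62, 0.35, 0.02, 0.44, 0.18]

connected = [w > 0 for w in sc_edges]  # assumed binarization: any synapse counts
print(f"FC-SC detection AUC = {roc_auc(fc_edges, connected):.3f}")
```

      Under this criterion, a preprocessing pipeline would be ranked higher if it yields a larger AUC for the same SC ground truth.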

      Reviewer #2 (Public review):

      Summary:

      Okuno et al. investigate the structure-function relationship in the fruit fly Drosophila melanogaster. To do so, they combine published data from two recent synapse-level connectomes ("hemibrain" and "FlyWire") with a dataset comprising functional whole-brain calcium imaging and behavioural data. First, they investigate the applicability of fMRI pre-processing techniques on data from calcium imaging. They then cross-correlate this pre-processed functional data with structural data extracted from the connectomes, including a comparison to humans. The authors proceed to compare the two connectomes and find significant differences, which they attribute to differences in the accuracy of the synapse detections. Next, they present a novel algorithm to quantify whether neurons are segregated (pre- and postsynapses are spatially separate) or unsegregated (pre- and postsynapses are mixed). Using this approach, they find that unsegregated neurons may contribute more to function than segregated neurons. Applying a general linear model to the functional dataset suggests that activity in two brain areas (Wedge and AVLP) is suppressed during walking. The authors identify a GABAergic neuron in the connectome that could be responsible for this effect and suggest it may provide feedback to the fly's "compass" in the central complex.

      Strengths:

      The study tackles a relevant question in connectomics by exploring the relationship between structural and functional connectivity in the Drosophila brain. The authors apply a range of established and adapted analytical methods, including fMRI-style preprocessing and a novel synaptic segregation index. The effort to integrate multiple datasets and to compare across species reflects a broad and methodical approach.

      Thank you very much for your comments.

      Weaknesses:

      The manuscript would benefit from a clearer overarching narrative to unify the various analyses, which currently appear somewhat disjointed. While the technical methods are extensive, the writing is often convoluted and lacks crucial details, making it difficult to follow the logic and interpret key findings. Additionally, the conclusions are relatively incremental and lack a compelling conceptual advance, limiting the overall impact of the work.

      (1) The introduction currently contains a number of findings and conclusions that would be better placed in the results and discussion to clearly delineate past findings from new results and speculations.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) The narrative would benefit greatly from some clear statements along the lines of "we wanted to find out X, therefore we did Y".

      Thank you for pointing this out. In many biology papers, the problem is clear, but as you say, this paper starts by comparing the very fine SC and FC of flies, which makes the problem unclear and the results sporadic. We have revised the structure of the introduction.

      (3) More concise terminology would be helpful. For example, the connectomes are currently referred to as either "hemibrain", "FlyEM", "whole-brain", or "FlyWire".

      Thank you for pointing this out. We revised the manuscript to separate "hemibrain" and "whole-brain" from "connectome." "hemibrain" and "whole-brain" retain their original meanings.

      (4) The abstract claims "a new, more robust method to quantify the degree of pre- and post-synaptic segregation". However, the study fails to provide evidence that this method is indeed more robust than existing methods.

      We apologize, but this information was not included in the main figures or the Results section. It is presented in the Methods section and Ext. Data Fig. 4-1i, j. We moved related texts from the Methods to the Results section.

      (5) The authors define unsegregated neurons as having mixed pre- and postsynapses in the same space. However, this ignores the neurons' topology: a neuron can exhibit a clearly defined dendrite with (mostly) postsynapses and a clearly defined axon with (mostly) presynapses, which then occupy the same space. This is different from genuinely unsegregated neurons with no distinct dendritic and axonal compartments, such as CT1.

      Thank you for pointing this out. Regarding this point, we think it is difficult to discuss the neuron’s topology in this paper. We defined PPSSI and demonstrated only that unsegregated neurons with mixed pre- and post-synapses are scattered throughout the brain (Ext. Data Fig. 4-2e). Further research is needed to determine the relationship with morphology in individual neurons.

      One possibility is that inhibitory, non-spiking unsegregated neurons, such as the CT1 amacrine cell [24, 27, 28] or interneurons in the antennal lobe [29], may be widely used throughout the brain (WAGN is also a candidate for this). Grimes et al. [34] noted that “The retina is a beautiful example of a neural network that optimizes signal processing capacity while minimizing cellular cost.” To maintain the signal dynamic range, A17 amacrine cells must optimize the processing units and wiring costs. If one unit equaled one cell, an enormous number of cell bodies would be required, reducing the number of processing units per volume and increasing the energy cost during development. To optimize this, they proposed arranging units capable of parallel processing within a single cell, thereby optimizing the processing units and wiring costs per volume.

      Signal bursts might also occur in the central nervous system (CNS), in which case CNS neurons also require dynamic range adjustment. The concept of optimizing processing units per volume is highly compelling and is thought to apply not only to the retina but throughout the entire brain.

      (6) It is not entirely clear where the marmoset dataset originates from. Was it generated for this study? If not, why is there a note in the Ethics Declaration?

      The marmoset data were reported in [10] and were not generated for this study. We therefore removed the Ethics Declaration.

      (7) On the differences between hemibrain and FlyWire: What is the "18.8 million post-synapses" for FlyWire referring to? The (thresholded) FlyWire synapse table has 130M connections (=postsynapses). Subsetting that synapse cloud to the hemibrain volume still gives ~47M synapses. Further subsetting to only connections between proofread neurons inside the hemibrain volume gives 19.4M - perhaps the authors did something like that? Similarly, the hemibrain synapse table contains 64M postsynapses. Do the 21M "FlyEM" post-synapses refer to proofread neurons only? If the authors indeed used only (post-)synapses from proofread neurons, they need to make that explicit in results and methods, and account for differences in reconstruction status when making any comparisons. For example, the mushroom body in the hemibrain got a lot more attention than in FlyWire, which would explain the differences reported here. For that reason, connection weights are often expressed as, e.g., a fraction of the target's inputs instead of the total number of synapses when comparing connectivity across connectomic datasets. Furthermore, in Figure 3b, it looks like the FlyWire synapse cloud was not trimmed to the exact hemibrain boundaries: for example, the trimmed FlyWire synapse cloud seems to extend further into the optic lobes than the hemibrain volume does.

      Thank you for pointing this out. FlyEM connectome data version 1.2 was downloaded and used as described in Data Availability. This data is provided in the format defined by https://neuprint.janelia.org/public/neuprintuserguide.pdf, and we extracted neurons and synapses from it.

      The entire segmentation comprises 28M bodies, of which 99,644 are Traced (proofread) neurons. In addition, there were 73M synapses (pre- or post- alone), 87M records in synapseSets and 128M records in synapseSet-to-synapse. When we extracted post-synapses between Traced neurons, the total number was 21.4M (i.e., connections from Traced neurons to other body fragments like Orphans were excluded).

      The FlyWire dataset (v783) was downloaded from the flywire codex and Zenodo. This dataset contained 139,255 proofread neurons and 54.5M (pair of pre- and post-) synapses, as described in Dorkenwald et al. [13], with 18.8M post-synapses in the regions corresponding to the hemibrain primary ROIs. We have updated the Results and Methods sections by taking into account your comment.

      In Fig. 3b, these images were created using a mask that extended the boundaries of the hemibrain primary ROIs, making the boundaries unclear. Therefore, we corrected the images in Fig. 3b by adjusting the mask so that the boundaries were properly aligned.
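
      The normalization mentioned in the reviewer's comment, expressing each connection as a fraction of the target's total input rather than as a raw synapse count, can be sketched as below. The matrices and the function name are hypothetical; the point is only that two datasets with different overall synapse detection rates can give identical normalized weights.

```python
def input_fraction(counts):
    """counts[i][j] = number of synapses from source i to target j.
    Returns a matrix in which each column is divided by the target's total input."""
    n = len(counts)
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [
        [counts[i][j] / col_totals[j] if col_totals[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical counts for the same two regions in two datasets; dataset B
# detects twice as many synapses overall.
dataset_a = [[0, 30], [10, 10]]
dataset_b = [[0, 60], [20, 20]]

print(input_fraction(dataset_a))
print(input_fraction(dataset_a) == input_fraction(dataset_b))
```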

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have a higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution.

      Thank you very much for your comments.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric.

      Thank you for pointing this out. We believe that the FC in C. elegans uses cell body dynamics, which is different from the synaptic population dynamics in a region of fly calcium imaging or fMRI data (BOLD [Blood Oxygenation Level Dependent] signal). The BOLD signal in a region is thought to correspond to the neurovascular coupling of synaptic population dynamics. Furthermore, compartmentalization of a neuron has been observed in C. elegans (Hendricks et al., 2012)*, showing different dynamics across neuron compartments. Thus, the dynamics of the cell body and the dynamics of the synaptic population in other regions are different in C. elegans. We speculate that there is some relationship between FC and SC across regions, because the FC-SC correlation in the fly brain reached r=0.87 with 20 ROIs (Fig. 2d). We believe that this result is different from the cell body dynamics in C. elegans.

      *Hendricks et al., “Compartmentalized calcium dynamics in a C. elegans interneuron encode head movement,” Nature 487, 99-103 (2012)

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. However, here the authors do not perform any analyses with neurotransmitter information.

      A comparison between the FC-SC correlation and neurotransmitter type is described in the Results section. We investigated the ratios of neurotransmitter input (Ext. Data Fig. 3-2a) and output (Fig. 3f) in each region, and investigated the relationship between this ratio and the FC-SC correlation for each neurotransmitter. This revealed significant correlations for acetylcholine (r=0.39, p=0.0013) and GABA (r=-0.25, p=0.046) (Fig. 3g). That is, the higher the percentage of excitatory connections, the higher the FC-SC correlation; conversely, the higher the percentage of inhibitory connections, the lower the FC-SC correlation.

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods-the fact that similar processing steps can be used on time-series data does not make them surprisingly similar, and does not in my view, constitute evidence of "similar design concepts."

      Thank you for pointing this out. As you say, the fiber bundles measured by DTI and the EM connectome are completely different kinds of data. Nevertheless, the fact remains that the FC-SC correlation is high in both the fly and human brains. As mentioned above, both the regional signal from calcium imaging and the BOLD signal from fMRI are based on synaptic population dynamics. It was estimated that 43% of the energy consumption in the gray matter is due to synaptic activity of neurons (Harris et al., 2012), and the BOLD signal fluctuates greatly due to this activity. Furthermore, synaptic activity is thought to be much faster than the activity of microglia and astrocytes, so the FC of fMRI is thought to mainly capture the regional correlation of synaptic activity. In other words, in both flies and humans, although the size is different, the pre-synaptic activity in one region and the pre-synaptic activity in another region via neural fibers are being compared in a common manner in the form of FC-SC.

      In addition, non-spiking unsegregated neurons exist in mammals as well, such as the amacrine cell of the retina [34], and even pyramidal cells in the neocortex show local mixtures of pre- and post-synapses (Ext. Data Fig.1-2). If a functional unit is realized by a local compartment within a neuron, as mentioned in [34], the fly will be a powerful model organism for investigating such units, and its functional “design concept” may also be useful for mammals.

      Harris et al., “The Energetics of CNS White Matter,” J. Neurosci., 2012, 32 (1) 356-371

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      Thank you for pointing this out. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections to address your comment.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec* et al, PNAS 2024).

      As you mentioned, functional analysis has limitations in spatial resolution. In particular, the resolution in the Z axis is 4 μm, which is 1,000 times lower than the resolution of electron microscopy data. This makes it difficult to perfectly match synaptic activity to a synapse in the structural data. Furthermore, spatial smoothing is applied to functional images to absorb inter-individual variability, which can only provide blurred results for group analyses. These are considered limitations of the methods used in fMRI analysis. Despite these limitations, we applied GLM analysis to walking behavior and observed a clear region of inactivity. This region roughly corresponds to the synaptic cloud of a neuron named WAGN (Fig.5b and c). This neuron also connects to WPNb and ANs in the connectome data, suggesting that it may be related to walking behavior. This is merely a screening reference; therefore, further biological experimentation is needed to pursue this topic.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We should emphasize that the reviewers encouraged revision and resubmission. If the reviewers' comments were to be addressed in full in a revision to strengthen the evidence, this would significantly increase the impact of the findings and the relevance of the work to the fly neuroscience community and to the connectomics field more broadly.

      Thank you very much for your comments.

      Major Issues:

      (1) Structural correlation and functional correlation measure very different aspects of network data, yet a simple correlation between the off-diagonal elements of the two is used. It would be expected that this would not be directly proportional, and it's not clear why this would be a sensible measure. The authors need a better solution for dealing with the zero entries in the SC matrix. Replacing the infinities with zeros and then running the linear regression to get an SC/FC relationship is not appropriate. Even with a better metric, given that both intuition and other studies have shown a weak correlation between FC and SC, using FC-SC correlation as a quality descriptor for other properties is not proper. Furthermore, the authors don't account for neurotransmitter identity in the structural data, which would have strong implications for the relationships between FC and SC.

      Thank you for pointing this out. To investigate this issue we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate is low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the FC-SC relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and Gaussian resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Log-scaled SC has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. When zero replacement is undesirable, using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown). It may be possible to compare various methods, but this is outside the scope of this study and requires further research.

      The C. elegans studies presented by Reviewer #3 showed a weak correlation between FC and SC. However, C. elegans neurons do not fire and exhibit different calcium fluctuations depending on the region (Hendricks et al., 2012). This suggests that the cell body and various synaptic terminal regions have different FCs, which is consistent with the objective of our study (neuronal compartmentalization). If a functional unit is locally composed of multiple neurons and synapses, it is expected that SC and FC from that region will show a strong relationship. Larger regions would include multiple functional units, and a relationship between SC and FC would also be found, which is consistent with the results of our study. The C. elegans study compared the FC of the cell body (one region) with the SC of the whole cell (not the same region), which would be inconsistent.

      (2) Synaptic segregation on neurons can be topologically present even if pre- and post-synaptic synapses are present in similar regions of space, as an axon branch and dendrite branch can overlap in space but remain distinct along the arbor. The authors emphasize a region-based definition that does not reflect cellular anatomy. Moreover, the authors do not make an argument for their claim of better robustness of their new synaptic segregation measures.

      Author response image 2.

      Distance calculation for DBSCAN. a, Example synapse pair (black dots) used for the distance calculation. The red line shows the straight-line distance, and the green line shows the morphology-based distance. DBSCAN places the two synapses in the same cluster based on the straight-line distance, but in different clusters based on the morphology-based distance.

      Thank you for pointing this out. We changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy. We updated all related figures, such as Figure.4, Ext. Data Figure.4-1, 4-2, 4-3, 4-4, Figure.5h. Also, we updated related text in the Results and Methods sections.
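
      The difference between the two distance definitions can be shown with a toy skeleton. This is a minimal sketch under stated assumptions, not the authors' implementation: synapses are assumed to be already attached to skeleton nodes, the coordinates and the `eps` threshold are invented, and the morphology-based distance is taken as the shortest path along skeleton edges.

```python
import heapq
import math

def path_distance(coords, edges, src, dst):
    """Shortest distance between two skeleton nodes measured along the arbor
    (Dijkstra over edges weighted by their Euclidean length)."""
    adj = {node: [] for node in coords}
    for a, b in edges:
        w = math.dist(coords[a], coords[b])
        adj[a].append((b, w))
        adj[b].append((a, w))
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

# Hypothetical arbor: two parallel branches that run side by side.
# Nodes 1 and 3 are the branch tips where the two synapses sit.
coords = {0: (0, 0, 0), 1: (0, 10, 0), 2: (1, 0, 0), 3: (1, 10, 0)}
edges = [(0, 1), (0, 2), (2, 3)]

straight = math.dist(coords[1], coords[3])     # straight-line distance
along = path_distance(coords, edges, 1, 3)     # morphology-based distance
eps = 2.0                                      # illustrative clustering radius

print(f"straight-line = {straight}, along arbor = {along}")
print("same neighborhood (straight-line):", straight <= eps)
print("same neighborhood (morphology):", along <= eps)
```

      A full pairwise matrix of such path distances could then be passed to a density-based clustering routine that accepts precomputed distances, for example scikit-learn's `DBSCAN` with `metric="precomputed"`.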

      (3) Reviewers found the overall structure of the paper is difficult to follow, with sections appearing disjoint and the aims of different sections not well described. This extended to the paper organization as well, with the introduction not clearly setting up the questions and being distinct from the results. The manuscript would benefit from a clearer overarching narrative to unify the various analyses.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (4) Similarly, there are several descriptions of data and analysis that are unclear or lacking, including the source of the marmoset data and how the FlyWire synapse was subsampled.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      We have updated the Results and Methods sections regarding the extraction of "traced" neurons and synapses in FlyEM connectome data, and the extraction of post-synapses in hemibrain primary ROIs in FlyWire connectome data.

      (5) Comparisons between FlyWire and Hemibrain have shown many similarities and some clear examples of inter-individual variability. There was concern that technical decisions with handling FlyWire synapse sampling were responsible for some of the differences observed between the datasets.

      In response to Reviewer #2's question, we answered that both FlyEM and FlyWire use proofread neurons and their connecting synapses. We also updated Fig. 3b and the Results and Methods sections.

      Reviewer #1 (Recommendations for the authors):

      The paper is written in an unusual way. It would be helpful if the introduction read more like a standard introduction. Describe the relevant background that the reader needs to understand the results that come later. Frame the experiments in terms of a question or hypothesis. Results should be relegated to the results section (or, if you like, a final paragraph that summarizes the findings). They should not be intermingled throughout the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      The authors must be more attentive in terms of how they construct the segregated/unsegregated connectomes. I suggest exploring various thresholds/bins, but also considering proportionality thresholds that match the number of synapses.

      Thank you for pointing this out. As pointed out by other reviewers, we changed from using DBSCAN based on the straight-line distance between synapses to DBSCAN based on the morphology-based distance via the branch nearest to the synapse (Author response image 2a). This resulted in a synaptic segregation measure that incorporates cellular anatomy.

      We also considered the sparsity rates of the SC matrices. Since the sparsity rates of the null SC matrices differed substantially from those of the SC matrices extracted by cPPSSI, we regenerated the null SC matrices, shown in Fig. 4e and 4i. As shown in Author response image 1, we ensured that the sparsity rates of the extracted SCs fall within those of the null-generated matrices. This figure was added to Ext. Data Fig.4-5, and the main text was also revised.

      The authors need a better solution for dealing with the zero entries in the sc matrix. Replacing the infinities with zeros and then running the linear regression to get an sc/fc relationship is not appropriate.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in less zero replacement. Therefore, log-scaled SC is likely to more accurately represent the relationship. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. It may be possible to compare various methods, but this is outside the scope of this study and requires further research.

      It would be useful to include a description of where the human/marmoset datasets came from. It would be useful to describe the processing of those datasets and whether they're comparable to how the fly data was processed.

      As pointed out by other reviewers, the marmoset data was obtained in [10], and the processing is also described in [10]. Therefore, we have revised the caption and removed the Ethics Declaration.

      The pre-processing of fly calcium imaging data is described in the Methods section. Unfortunately, this processing method is not comparable to that used in humans/marmosets as it was highly customized.

      The authors report sc/fc correlations for the human/marmoset datasets based on single papers. However, in the human case, especially, the strength of sc/fc correlations is highly variable. Not just based on number/size of parcels, but based on amount of data, processing pipeline, single-subject versus group averaged (incidentally, single-subject sc/fc is much lower than group-averaged, which has big implications for this study, where the fly datasets are, in essence, N=1 studies).

      Yes, there are numerous FC-SC correlation studies. We consider Honey et al. (2009) [6] to be a highly representative study. It reported r = 0.39 to 0.48 for individual participants with 998 ROIs and r = 0.36 for the group average, which increased to r = 0.53 when absent or inconsistent structural connections were excluded. Thus, single-subject correlations may not be much lower than group-averaged ones. Since the SC for a fly is an N=1 study, the FC-SC correlation for the same individual cannot be calculated. We think further research will be necessary.

      Reviewer #2 (Recommendations for the authors):

      Abstract:

      Please introduce the term "ROI"

      Thank you for pointing this out. We have revised the Abstract.

      Introduction:

      (1) On a general note: the introduction reads like an extended abstract (i.e., a mix of results and discussion).

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (2) Line 43: Does this mean FC-SC correlation is higher in flies but not significantly so? Please clarify.

      We performed a Mann-Whitney U test, and the difference was not significant (p = 0.2667).
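
      For readers unfamiliar with the test, the following Python sketch shows how a Mann-Whitney U statistic and an exact two-sided p-value can be obtained for small samples by enumerating all group relabelings. The function names and the sample values are hypothetical; they are toy numbers and not the correlations compared in the study.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x: count of (xi, yj) pairs with xi > yj, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating every split of the pooled data (small samples only)."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    mean_u = n1 * len(y) / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - mean_u)
    extreme = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count relabelings at least as far from the null mean as the observed split.
        if abs(mann_whitney_u(gx, gy) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Toy correlation values for two groups (hypothetical).
group_a = [0.61, 0.58, 0.66]
group_b = [0.40, 0.52, 0.45, 0.59]

print("U =", mann_whitney_u(group_a, group_b))
print("exact two-sided p =", exact_two_sided_p(group_a, group_b))
```

      With only a handful of values per group, the exact p-value can take just a few discrete levels, so a non-significant result at this sample size is weak evidence in either direction.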

      (3) Line 51: The "confidence" score does not indicate the degree of synaptic detection.

      In the NeuPrint user guide, https://neuprint.janelia.org/public/neuprintuserguide.pdf it states “confidence - The certainty that an annotated synapse is correct and valid.” Since “degree of synaptic detection” may be difficult to understand, we changed it to “certainty of an annotated synapse.”

      (4) Line 59-61: This statement needs refining: post-synapses do not "receive" neurotransmitters, action potentials aren't conducted along nerve fibres.

      We changed “receive” to “sense.” About “action potentials,” we changed “conduct an action potential” to “graded potentials”, and removed “along nerve fibers.”

      (5) Line 61: calcium activity as detected via GCaMP correlates with (electric) neuronal activity - please cite relevant GCaMP literature here.

      We added F. Helmchen and J. Waters, "Ca2+ imaging in the mammalian brain in vivo," Eur J Pharmacol., vol. 447, pp. 119-129, 2002.

      (6) Line 76: "interconnected" is rather vague; just say "many Drosophila neurons are reciprocally connected".

      Thank you for pointing this out. Lin et al. (2024) performed a motif analysis and found many reciprocal, three-node, and rich-club connections. However, the introduction was updated and this sentence was removed.

      (7) Line 77: comparing unsegregated vs reciprocal synapses is overly simplistic; these are separate features of the same object - i.e., a synapse can be reciprocal and at the same time be segregated in the presynaptic neuron but unsegregated in the postsynaptic neuron.

      Thank you for pointing this out. As you say, the relationship is complicated. In this paper, we are concerned with the degree of segregation of pre- and post-synapses, and we are looking at the segregation within a neuron. In this case, nearby reciprocal synapses (<=2 μm) are included in unsegregated synapses. We have made a correction to the sentence.

      (8) Line 79: I don't understand how we get from unsegregated synapses to local activity.

      Retinal amacrine cells have extensive unsegregated synapses, which provide local feedback inhibition of burst inputs [34]. We changed the text around these descriptions.

      (9) Line 80: What does "more essential function" mean?

      We removed this sentence.

      (10) Line 85: "as shown earlier": Is this based on results in this study or prior work? See also the general above note on mixing results/discussion into the introduction.

      Thank you for pointing this out. We have revised the introduction to make it more concise.

      (11) Line 85-87: I don't understand how the applicability of certain fMRI analysis methods in turn means that functional activity is locally compartmentalized. Did you mean to say something along the lines of "we applied common fMRI methods which showed functional activity is locally compartmentalized"?

      These sentences discuss the commonality between fMRI (BOLD signal) and calcium signal, which both represent presynaptic population dynamics within a local region (voxel). Furthermore, unsegregated synapses are widespread throughout the fly brain (Ext. Data Fig.4-2) and can also be observed in human pyramidal cells (Ext. Data Fig.1-2). Unsegregated synapses suggest local compartment activity [33, 34, 39, 40] and contribute more to functional activity (Fig.4b). Therefore, the similar trend in FC-SC correlation (Fig.2d) between humans and flies suggest that both species exhibit localized compartmental activity via unsegregated synapses throughout the entire brain.

      Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      (12) Line 87: Please provide a reference for "common among various species".

      Thank you for pointing this out. Because these sentences contain many conclusions, they have been moved from the Introduction to the Discussion section.

      Results:

      (1) Line 91-92:

      (a) Please explain where the calcium data came from, how it was generated, etc.

      We added the data source and a reference (Brezovec et al. [14]).

      (b) Please clarify: what registration method?

      This is not simple. Please see the Methods section and Ext. Data Fig.1-1. This is also indicated in the text.

      (c) "calcium image" → "calcium image data"?

      We changed “calcium image” to “calcium imaging data”.

      (d) What is the "FDA template"?

      This is a brain template created by Brezovec et al. [14]. JRC2018 is a well-known brain template, but it was created by immunostaining postmortem brains and did not fit well with calcium imaging data from living flies. Therefore, we used the FDA template.

      (2) Line 93: Please introduce the term "ROI".

      We added “(Region of Interest)” in Line 38.

      (3) Line 94: Ito et al., Neuron (2014) "A systematic nomenclature for the insect brain" is a better reference for Drosophila neuropils; for the hemibrain, the ROIs were generated to match that original atlas

      Thank you for pointing this out. We added a reference.

      (4) Line 95/96: It is unclear what was used as the basis for the k-means/distance-based clustering

      This was because we wanted to investigate whether nuisance factor removal methods are robust, even for such diverse types of ROI. We added this point to the text.

      (5) Line 120ff: I'm not sure how the total number of ROIs is relevant for comparing flies and humans, given (a) the huge difference in brain size and (b) the difference in resolution of the functional data.

      Indeed, the fly brain and the human neocortex are completely different. We are investigating whether there are commonalities between them using a metric called FC-SC correlation. As described in our answer for (11), both the fMRI (BOLD signal) and calcium signal represent presynaptic population dynamics within a local region (voxel). FC represents the synchronization of synaptic activity between regions, and SC represents the structural connectivity of neurons. Both flies and humans showed high SC-FC correlation and showed similar trends (Fig. 2d), so we believe it would be interesting to investigate this phenomenon.

      (6) Line 123: "by contrast" is misleading here since, as you say, there isn't really a difference.

      We changed “by contrast” to “and.”

      (7) Line 141: I'm somewhat worried that the differences between FlyWire and hemibrain synapse counts are due to the issues mentioned above.

      Thank you for the comment, but we are not sure what “the issues mentioned above” refers to.

      (8) Line 148: There is no evidence that any differences in synapse are due to the resolution or anisotropy (as suggested in the introduction).

      We apologize that we don’t have direct evidence for it. We changed this to the sentence “This may be caused by differences in detection accuracy resulting from the resolution of EM scanning, but not to inter-individual variability.”

      (9) Line 155: References "39,45" have no brackets.

      These are not reference numbers but brain regions, namely Brodmann areas 39 and 45.

      (10) Line 155-157: I don't think we can infer the composition of brain areas in humans based on a tenuous correlation in flies; this is highly speculative and really should be in the discussion.

      In humans, there are areas with strong and weak FC-SC correlations [8], which may be due to the E-I (Excitatory-Inhibitory) balance of connections. We investigated this possibility by comparing the correlation between neurotransmitters and FC-SC correlations in the fly brain. We slightly changed this sentence.

      (11) Line 159: I find the first 2-3 sentences in this paragraph confusing. Are you saying that you did all these things in the prior results sections, or that you wanted to look at X and therefore you did Y? Maybe there is an issue with the tense here?

      We changed the sentences around this description.

      (12) Line 161: "whole-brain" = FlyWire?

      We changed “whole-brain” to “FlyWire”.

      (13) Line 163: Please explain the "PPSSI" acronym.

      This is now explained on Line 75.

      (14) Line 165: The description of how the cPPSSI was calculated is hard to follow. For example, what's the "fraction of synapse number".

      We changed our sentences around this description to be clearer. The cPPSSI is the degree of segregation within a cluster and is also assigned to each synapse. The PPSSI is then the average of the cPPSSI values of all synapses in a neuron.
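
      The averaging step described above can be sketched as follows. This Python snippet assumes that the cPPSSI of each cluster has already been computed (its definition is given in the manuscript and is not reproduced here), lets every synapse inherit the value of its cluster, and takes the PPSSI of a neuron as the mean over its synapses. The function name and the input layout are hypothetical.

```python
def neuron_ppssi(clusters):
    """Mean cPPSSI over all synapses of one neuron.

    clusters: list of (n_synapses, cppssi) tuples, one per synapse cluster.
    Each synapse inherits its cluster's cPPSSI, so the mean over synapses
    equals a cluster-size-weighted mean of the cluster values.
    """
    total_synapses = sum(n for n, _ in clusters)
    if total_synapses == 0:
        raise ValueError("neuron has no synapses")
    return sum(n * c for n, c in clusters) / total_synapses

# Toy neuron with two clusters (hypothetical values):
# 10 synapses in a cluster with cPPSSI 1.0 and 30 synapses in a cluster with cPPSSI 0.2.
example_clusters = [(10, 1.0), (30, 0.2)]
print("PPSSI:", neuron_ppssi(example_clusters))
```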

      (15) Line 166: Is there a difference between "cPPSSI" and "PPSSI"?

      Yes, there is. Please see our answer for (14).

      (16) Line 167: "The result showed a histogram resembling a normal distribution" → I suggest running a normality test.

      Thank you for pointing this out. We ran a Lilliefors test, and the result was p = 0.001 (significantly different from a normal distribution). Because numerous values have PPSSI = 1, the distribution is not judged to be normal. We therefore changed this description.

      (17) Line 173: I am somewhat worried about a selection bias in your correlation of segregated vs unsegregated synapses. First, it seems like only a small fraction of neurons are in the 0-0.1 and 0.9-1 PPSSI range. I would suggest running a proper correlation between PPSSI and FC-SC correlation instead of looking at just the two extremes. Second, your examples for segregated neurons (APL + CT1) are large neurons that densely innervate spatially close and functionally very similar neuropils. If the sample of unsegregated neurons consists mainly of these large interneurons, I'm not at all surprised that they contributed strongly to FC-SC correlation.

      Thank you for pointing this out. For this work we investigated synapses (not neurons), extracting those with cPPSSI of 0-0.1 and 0.9-1, and performed a rank test against the FC-SC correlation of randomly sub-sampled synapses. We aimed to demonstrate that unsegregated synapses in particular contribute strongly to the FC-SC correlation, and we hope to investigate overall trends in a future study.

      (18) Line 185: I don't think the function of reciprocal synapses is "considered to be clear". There are examples of feedback inhibition through reciprocal synapses, in particular in the visual system, but that does not mean that this is true across the board.

      We changed “considered to be clear” to “considered to be clearer than unsegregated synapses.” Of course, the function of reciprocal synapses is unknown for the whole brain, but we think it is better studied than that of unsegregated synapses.

      (19) Line 188 / Figure 4h: that figure panel does not appear to show transmitter pairs.

      Figure 4h (FlyWire) showed transmitter pairs. Ext. Data Fig.4-1g did not, because FlyEM does not have transmitter information.

      (20) Line 192: Please clarify "functionally common".

      We changed our sentences to clarify this.

      (21) Line 199: "ventral nerve code" → "ventral nerve cord".

      We fixed this typo.

      (22) Line 201: I don't think you can use "conversely" here.

      We changed “Conversely” to “Moreover.”

      (23) Line 201: How certain are you that the WAGN neuron is the only candidate? Also, it would be nice to provide the neuron IDs so that people can identify them in the connectome.

      Thank you for pointing this out. We added Root ID: 720575940644632087 in the text. Actually, we found several GABA neuron candidates, such as 720575940637611365, 720575940644632087, 720575940613552947, 720575940640333109 and 720575940612264817. We investigated whether ER1(L) was present in these downstream connections and found that 720575940644632087 had the strongest connection with the largest number of synapses, so we adopted this.

      (24) Line 207: When you say "the left WAGN was strongly connected", are those connections not also present for the right WAGN?

      There is a right WAGN (Root ID: 720575940624377224), but it does not have strong interconnections with WPNb tier 2/3 (left) neurons. For the right WAGN, there are few inputs from WPNb tier 2/3 (left). We added “(left)” in the text.

      (25) Line 212: I don't think you can use "however" here.

      We removed “however.”

      (26) Line 214: "well unsegregated" → "very unsegregated"?

      This sentence was removed, because we recalculated Fig. 5h.

      Ethics Declaration:

      It seems the marmoset data were reported on in [10], so why is there a reference to the generation of the dataset?

      Yes, marmoset data were reported in [10], so we removed the Ethics Declaration.

      Reviewer #3 (Recommendations for the authors):

      (1) In my opinion, the title and framing of this manuscript dramatically overstate the results presented here. Also, the results presented in the different figures in this manuscript seem disjointed and are not very related to each other.

      Thank you for pointing this out. We have rewritten our manuscript slightly to address this. Inter-individual variability is relevant to both SC and FC. Regarding SC, we think the difference in the number of synapses between the two individuals is due to the difference in detection power caused by differences in the resolution of the electron microscope. Regarding FC, as stated in the Results section, “Spatial smoothing is useful for absorbing inter-individual variability and conducting second-level group analysis.” Increasing the smoothing size improves the correlation and AUC between group-averaged FC and SC, indicating the presence of inter-individual variability in FC (Fig. 2b, Ext. Data Fig. 2-1b, especially when the number of ROIs is high). We added this text in the Introduction and Results sections.

      (2) There are multiple ways to compute structural correlation matrices-the methods the authors implemented should be discussed in greater detail in the manuscript.

      Thank you for pointing this out. To investigate this issue, as pointed out by other reviewers, we compared the FC-SC correlation between the Gaussian resampled SC approach used in Honey et al. (2009) [6] and the log-scaled SC approach used in this study (Ext. Data Fig.2-2a). With a small number of ROIs, the sparsity rate was low (Ext. Data Fig.2-2b), resulting in fewer zero replacements. Therefore, log-scaled SC is likely to represent the relationship more accurately in our study. Furthermore, with a large number of ROIs, the sparsity rate exceeds 70%, and resampled SC randomly assigns a large number of zero elements from the smaller end of the Gaussian distribution. This tends to lower the correlation (Ext. Data Fig.2-2c, d), suggesting that log-scaled SC provides fairer results. Using connection weights (the proportion of connections originating from the target region among all connections) can yield results similar to log-scaled SC (data not shown), because this matrix can also be very sparse. The log-scaled SC approach has been used in previous studies [9, 68] and is considered a simple method for showing the relationship (correlation) between FC and SC. It may be possible to compare various methods in depth, but this is outside the scope of this study and requires further research.

      (3) The use of the FC-SC detection score defined by the authors should be discussed and justified more extensively in the text.

      Thank you for pointing this out. This has already been discussed in [10]. We defined our own “FC-SC detection score,” but we consider the overall approach to be well established in the literature. For example, Stafford et al. (2014) carried out FC-SC detection for 168 mouse cortical regions, and obtained 78.26% sensitivity and 81.69% specificity for the top 1% of SC. Hori et al. (2020) also investigated FC-SC detection for 55 cortical regions of the marmoset brain left hemisphere, achieving an AUC of 0.72. We think FC-SC detection is an index that evaluates the relationship between FC and SC from a different angle than FC-SC correlation and is worthwhile.

      Hori et al., (2020). Comparison of resting-state functional connectivity in marmosets with tracer-based cellular connectivity. NeuroImage, 204, 116241.

      Stafford et al., (2014). Large-scale topology and the default mode network in the mouse connectome. Proc. Natl. Acad. Sci. U.S.A., 111(52), 18745-18750.
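
      The general idea of FC-SC detection cited above can be illustrated with a short Python sketch: the strongest fraction of SC entries is treated as the set of true connections, FC values are used as detection scores, and sensitivity, specificity, and AUC are computed. This is a generic illustration under stated assumptions and not the FC-SC detection score defined in our manuscript; the function names, the cutoff, and the toy values are hypothetical.

```python
def top_fraction_labels(sc_values, fraction):
    """Label the strongest `fraction` of SC entries as true connections (1), the rest as 0."""
    k = max(1, round(len(sc_values) * fraction))
    threshold = sorted(sc_values, reverse=True)[k - 1]
    return [1 if v >= threshold else 0 for v in sc_values]

def auc(scores, labels):
    """AUC as the probability that a true connection scores above a non-connection (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when FC >= cutoff is called a detected connection."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < cutoff)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Toy vectors of ROI-pair values (hypothetical, not data from any cited study).
sc_values = [120, 3, 0, 45, 0, 8, 300, 1, 0, 15]
fc_values = [0.7, 0.2, 0.1, 0.5, 0.3, 0.1, 0.8, 0.6, 0.0, 0.2]

labels = top_fraction_labels(sc_values, 0.3)
print("AUC:", auc(fc_values, labels))
print("sensitivity, specificity:", sensitivity_specificity(fc_values, labels, 0.5))
```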

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Background and unknown in the field:

      This study investigates how fibroblast alignment influences the migration of intestinal epithelial cells, contributing to tissue integrity and repair. It is well established that intestinal fibroblasts are important regulators in the tissue through their ability to secrete essential paracrine factors for the epithelium. However, it is less well understood if they also play additional structural, tissue architecture instructing role and how the communication between the fibroblasts and the epithelia is regulated.

      Advance over state of the art:

      Here the authors have set up an elegant three-component system to investigate this. They have gone beyond the recent advances of culturing intestinal and colonic organoids in 2D (in a manner that preserves crypt- and villus-like organization) and bioengineered an epithelial-stromal model comprising organoid-derived intestinal epithelial cells (IECs), primary intestinal fibroblasts, and a basement membrane matrix. Using this model, they have uncovered that fibroblasts enhance the directed and persistent migration of IECs. They used scRNAseq to carefully analyse the stromal cell populations present in their co-cultures of primary mouse intestinal subepithelial fibroblasts and organoid-derived intestinal mouse epithelial cells. They observed that this reflected well the stromal cell-type composition as well as the paracrine activity previously reported for these cells in tissue. Using a clever system with Matrigel and an elastomeric barrier, the authors were able to induce non-epithelial gaps in different scenarios (IECs alone or with fibroblasts or with conditioned media) and observe the wound closure as well as the presence of specific cell types. They observed that the epithelial monolayers showed significant gap closure when in direct contact with fibroblasts compared to controls. Interestingly, the enhanced efficiency of epithelial migration and gap closure, in the presence of fibroblasts, was independent of PGE2-EP4 signaling and was not due to differences in cell proliferation. Instead, the imaging revealed that the fibroblasts were in direct contact with the epithelium. The authors observed that in the absence of fibroblasts the migration properties of cells in the villus and the crypt regions were dramatically different and the fibroblast presence was necessary to efficiently synchronize these to support gap closure. In addition, the presence of fibroblasts enhanced the directionality of the epithelial cell migration.
      Detailed imaging and image analyses revealed that gap closure involved activation of the fibroblasts and coordinated co-alignment of IECs and fibroblasts. They also explored matrix deposition of the fibroblasts during the process and found that they deposited aligned ECM fibers that guide epithelial migration. Mere cell-derived matrix (devoid of live fibroblasts) was able to partially recapitulate the fibroblast-coordinated epithelial migration, indicating that the fibroblast-generated matrix and its alignment are key contributors to the phenotype.

      Comments:

      This is overall a very interesting and well-written study. The imaging and the image analysis are state-of-the-art and the bioengineered model is an exciting advancement over current methods developed by these researchers and others. This study meets all the criteria for publication in the sense that all the experiments seem to be carefully conducted, with appropriate controls and sufficient quantifications and statistics. The claims made by the authors are supported by the data. This is currently suitable to be published as a method/protocol and as a descriptive study uncovering interesting cross-talk and co-dependencies of epithelial and stromal cells during injury repair. There are of course aspects that could improve the study further, like more mechanistic insight into the underpinnings of the direct epithelia-fibroblast interaction and its involvement in the directed IEC migration. However, these may be topics to investigate in a future study.

      Reviewer #1 (Significance (Required)):

      The strengths of the study are the highly in vivo relevant model system that is amenable to imaging and detailed image analysis of distinct cell populations. This may be adapted by others in the field and has the potential to transform the way cell dynamics in the intestinal epithelium are visualized and investigated in vitro.

      We thank the reviewer for their thoughtful and positive assessment of our work, and their recognition of the relevance of the bioengineered epithelial-stromal model and its potential for quantitative imaging and analysis of epithelial and fibroblast dynamics.

      We agree that further mechanistic insight into epithelial-fibroblast crosstalk would strengthen the study. While the current manuscript establishes this tractable system and identifies a role for fibroblast organization and matrix alignment in coordinating epithelial migration, we also aim to deepen the mechanistic understanding in the revision. As outlined in our response to Reviewer 2, we will perform additional experiments to further investigate the epithelial-fibroblast crosstalk and force-dependent interactions underlying this process.

      We believe that these additions will complement the current findings and strengthen the conceptual contribution of the study beyond its methodological advances.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Please find enclosed my review comments on the manuscript entitled "Fibroblast alignment coordinates epithelial migration and maintains intestinal tissue integrity" by Jordi Comelles et al.

      In this manuscript, the authors use a bioengineered epithelial-stromal system composed of organoid-derived intestinal epithelial cells, primary intestinal fibroblasts, and a basement membrane matrix to show that direct physical interactions between fibroblasts and epithelial cells drive a large-scale organization of the fibroblast network. This spatial reorganization, in turn, promotes persistent and oriented migration of epithelial cells, ultimately enabling restoration of the intestinal epithelium in an in vitro gap-closure assay. Overall, while the authors use an elegant in vitro model to study intestinal wound closure, and more specifically the role of fibroblasts in this context, I find this manuscript not suitable for publication in its present form. The data are overinterpreted, the novelty is limited, and the molecular mechanisms underlying WAE-fibroblast interactions are insufficiently addressed.

      We thank the reviewer for their contribution to the revision process with their valuable assessments. We will address their specific points below.

      Figure 1 - What are the units of the "fraction gap closure" shown in panels d and e? Is it expressed as a percentage?

      We thank the reviewer for pointing this out. The "fraction of gap closed" was calculated as (A(t = 0h)-A(t))/A(t = 0h), where A(t = 0h) corresponds to the initial gap area and A(t) is the area of the gap measured at time point t. With this definition, the fraction of gap closed is dimensionless: it is 0 at the initial time point, reaches 1 if the gap is fully closed, and takes negative values if the gap area increases beyond the initial size, as observed in some replicates of the control condition. To avoid misinterpretation, we will express this quantity as a percentage (i.e., multiplied by 100), as suggested by the reviewer. Moreover, we realized that it was ill-defined in the methods section. This will also be corrected in the revised version.
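
      The definition above can be written as a short Python sketch. The function names and the example areas are hypothetical and chosen only to show the three cases mentioned in the text (no closure, full closure, and a gap that grows beyond its initial size).

```python
def fraction_gap_closed(initial_area, area_t):
    """(A(t = 0h) - A(t)) / A(t = 0h): 0 at the start, 1 when fully closed, negative if the gap grows."""
    if initial_area <= 0:
        raise ValueError("initial gap area must be positive")
    return (initial_area - area_t) / initial_area

def percent_gap_closed(initial_area, area_t):
    """Same quantity expressed as a percentage."""
    return 100.0 * fraction_gap_closed(initial_area, area_t)

# Hypothetical gap areas (arbitrary units) measured over time for one replicate.
areas = [2.0, 1.5, 0.5, 0.0]
closure = [fraction_gap_closed(areas[0], a) for a in areas]
print("fraction closed over time:", closure)

# A gap that widens from 2.0 to 2.5 gives a negative value.
print("widening gap (%):", percent_gap_closed(2.0, 2.5))
```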

      "Actually, epithelial monolayers achieved the most effective gap closure when cultured in direct physical contact with fibroblasts (Figure 1e and Movies 2 and 3)." From the data shown in panels c, d, and e, it appears that fibroblast-conditioned medium alone promotes efficient gap closure, comparable to the + fibroblast condition.

      We agree with the reviewer that the original closing sentence overstated the effect. While both fibroblast-conditioned medium and direct fibroblast contact promote efficient gap closure compared to control conditions, the data do not support a consistent difference between these two conditions. We will therefore remove this statement in the revised version to more accurately reflect the results.

      Figure 2 - The use of a cell proliferation inhibitor during the gap-closure assay would help determine the contribution of cell proliferation at the migration front.

      We agree with the reviewer that inhibiting proliferation would help assess the contribution of cell proliferation to gap closure. However, in the 2D gap-closure assay, our Ki67 immunostaining showed no significant differences in the proportion of proliferative cells between conditions, either within the monolayer or at the migration front. This suggests that differential proliferation is unlikely to account for the differences in gap closure observed between control and fibroblast-containing conditions.

      We note that, in a separate 3D organoid assay, fibroblast-derived signals induced a WAE-like transcriptional program associated with reduced Ki67 mRNA expression, indicating that fibroblasts can promote a more migratory epithelial state without increasing proliferation. Thus, while proliferation may contribute to epithelial homeostasis and repair, our data do not point to it as the main determinant of the differences observed in the 2D gap-closure phenotypes.

      In addition, pharmacological inhibition of proliferation would likely perturb the homeostasis of the organoid-derived epithelial monolayers, in which proliferative crypt compartments are essential, and would be difficult to restrict to epithelial cells without also affecting fibroblasts in co-culture. For these reasons, although such experiments could inform the general contribution of proliferation to gap closure, we do not think they would directly clarify the differences observed between conditions in our system.

      Figure 2f and 2g - Has a dose-dependent effect of PGE2 been tested?

      We thank the reviewer for pointing this out. We did not perform a dose-response analysis of PGE2 in this study, as our aim was to assess the involvement of the PGE2-EP4 axis rather than to characterize its quantitative dynamics. We therefore selected a concentration based on previous work demonstrating dose-dependent induction of the WAE program in 3D organoid systems (Miyoshi et al., 2017). In that study, 1 µM PGE2 was sufficient to induce a significant increase in the WAE marker Cldn4, and we used this concentration as a biologically relevant reference condition. We will clarify this in the methods section.

      Figure 2i - The + fibroblast + EP4i condition (pink) is missing.

      We thank the reviewer for pointing this out. The + fibroblast + EP4i condition is present in the plot but not visually distinguishable because it overlaps with the + fibroblast condition and is therefore masked by it. As shown in Figure S4e, the + fibroblast + EP4i condition falls within the variability range of the + fibroblast condition. To improve clarity, we will revise the figure to ensure that this condition is visually identifiable.

      "This suggests a mechanical or contact-mediated role for fibroblasts in preserving epithelial integrity and promoting coordinated migration beyond their paracrine signaling." While PGE2-EP4 signaling does not appear to be involved in the fibroblast-mediated enhancement of gap-closure efficiency, the conclusion that physical interactions are more important than paracrine effects is overstated. For instance, an experimental condition in which fibroblast-conditioned medium is inactivated (boiling for 5 minutes) would strengthen this conclusion. In addition, inhibition of actomyosin contractility in fibroblasts would be informative.

      Figure 3 - The data presented here do not convincingly support the dismissal of conditioned medium as a contributing factor. The differences between the + fibroblast-conditioned medium and + fibroblast conditions are modest. In both cases, epithelial cells migrate and gaps close.

      We agree with the reviewer that inhibition of actomyosin contractility in fibroblasts would provide valuable insight into the role of force-dependent interactions in epithelial-stromal coupling. However, pharmacological inhibitors of the Rho-ROCK-myosin pathway (e.g., blebbistatin, ML-7, or the ROCK inhibitor Y-27632) would also affect epithelial contractility in our co-culture system, making it difficult to specifically attribute any observed effects to fibroblast mechanics.

      We also agree that paracrine signaling plays an important role in epithelial gap closure. Indeed, supplementation of control media with PGE2 improves gap closure compared to control conditions, although it does not reach the levels observed with fibroblast-conditioned medium, suggesting that additional soluble factors contribute beyond the PGE2-EP4 axis. However, time-lapse imaging revealed direct and dynamic interactions between fibroblasts and epithelial cells (Movie 6; Figure S5a-d; Movie 7), which prompted us to further investigate the contribution of physical interactions, as addressed in Figure 3.

      In Figure 3, we analyzed migration at the single-cell level, in contrast to the tissue-level measurements used for gap closure quantification. In organoid-derived intestinal monolayers, two distinct compartments can be identified: crypt-like and villus-like regions. In vivo, these compartments exhibit different migration behaviors: cells in the crypt are primarily displaced due to crowding, whereas cells in the villus actively migrate, as suggested by the presence of cryptic lamellipodia (Krndija et al., 2019). Consistent with this, tracking individual cells revealed that crypt cells are largely static, while villus cells migrate toward the gap. This compartmentalized behavior was observed in both control and fibroblast-conditioned medium conditions. Strikingly, in the presence of fibroblasts, this differential behavior was reduced, resulting in coordinated migration of both crypt and villus regions.

      This mismatch between compartments in control conditions may contribute to the appearance of discontinuities ("holes") within the epithelial layer during migration. In control experiments, these defects failed to close, whereas in conditioned medium they closed slowly or incompletely. In contrast, in the presence of fibroblasts, these disruptions were rapidly and efficiently resolved, indicating improved tissue integrity.

      Additionally, analysis of individual trajectories near the migration front showed that cells exhibit significantly increased directional persistence (i.e., movement aligned with the direction of gap closure) in the presence of fibroblasts compared to conditioned medium alone.
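
      As an illustration only (this is not the authors' analysis pipeline), one common way to quantify directional persistence is the net displacement along the gap-closure direction divided by the total path length of the track. The function name, the `direction` parameter, and the example tracks below are hypothetical and chosen purely for demonstration.

```python
import math


def directional_persistence(track, direction=(1.0, 0.0)):
    """Net displacement along `direction` divided by total path length.

    `track` is a list of (x, y) positions ordered in time.
    `direction` is the assumed gap-closure direction (hypothetical default: +x).
    Returns a value in [-1, 1]; 1 means straight motion along `direction`.
    """
    # Normalise the reference direction to a unit vector.
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm

    # Total distance travelled along the track.
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    if path_length == 0:
        return 0.0

    # Net displacement projected onto the reference direction.
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    return (dx * ux + dy * uy) / path_length


# Two toy tracks with the same start and end points.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 1), (2, -1), (3, 0)]

p_straight = directional_persistence(straight)
p_zigzag = directional_persistence(zigzag)
```

      Under this definition, both toy tracks cover the same net distance toward the gap, but the meandering track scores lower because its path is longer.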

      Taken together, while paracrine signaling from fibroblasts contributes to epithelial migration and gap closure, the physical presence of fibroblasts induces qualitative changes in epithelial behavior, including coordinated migration across compartments, improved hole closure, and enhanced directional persistence.

      • *

      Figure 4a - "Upon removal of the barrier (t = 0 h), fibroblasts at the epithelial front were small and evenly distributed, with no prominent α-SMA fibers present." Here, fibroblasts are α-SMA positive but not elongated. α-SMA may therefore not be the most appropriate marker. What are the levels of phosphorylated MLC2? These may increase during wound closure. Also, fibroblast culture promotes α-SMA expression; the fibroblasts used in this assay may therefore not represent the healthy fibroblasts found in vivo.

      We agree with the reviewer that fibroblasts are α-SMA positive at early time points but are not yet elongated. In our system, we observe that α-SMA is already present at t = 0 h, while fibroblasts progressively elongate and reorganize α-SMA into prominent fiber structures over time. This suggests that changes in α-SMA organization, rather than its initial presence, are associated with fibroblast activation during gap closure.

      We note that baseline α-SMA expression may be influenced by in vitro culture conditions prior to the assay, which could differ from the state of fibroblasts in vivo. We will clarify this point in the Discussion to better contextualize our observations relative to native fibroblast populations.

      In addition, we agree that assessing phosphorylated myosin light chain 2 (pMLC2) levels would provide complementary information on contractile activity. We will therefore perform pMLC2 staining, as suggested, to further evaluate force generation by fibroblasts during the wound closure process.

      • *

      Figure 5 - Fibroblast alignment could also result from paracrine signals secreted by epithelial cells. This possibility should be tested.

      We thank the reviewer for this suggestion. To test whether fibroblast alignment could be driven by epithelial-derived paracrine signals, we will culture fibroblasts in conditioned medium collected from epithelial monolayers undergoing gap closure (control condition without fibroblasts) and quantify their alignment over time. This will be compared to fibroblasts maintained in standard fibroblast medium.

      This experiment will directly assess whether epithelial-derived soluble factors are sufficient to induce fibroblast alignment, or whether direct physical interactions are required.

      • *

      In summary, this manuscript demonstrates that epithelial cells migrate more efficiently on extracellular matrix proteins deposited and oriented by fibroblasts. This concept is not novel. Identifying the molecular mechanisms governing interactions between WAE and subepithelial fibroblasts would significantly enhance the novelty and impact of this study.

      • *

      Reviewer #2 (Significance (Required)):

      • *

      In this manuscript, the authors use a bioengineered epithelial-stromal system composed of organoid-derived intestinal epithelial cells, primary intestinal fibroblasts, and a basement membrane matrix to show that direct physical interactions between fibroblasts and epithelial cells drive a large-scale organization of the fibroblast network. This spatial reorganization, in turn, promotes persistent and oriented migration of epithelial cells, ultimately enabling restoration of the intestinal epithelium in an in vitro gap-closure assay. Overall, while the authors use an elegant in vitro model to study intestinal wound closure, and more specifically the role of fibroblasts in this context, I find this manuscript not suitable for publication in its present form. The data are overinterpreted, the novelty is limited, and the molecular mechanisms underlying WAE-fibroblast interactions are insufficiently addressed.

      We thank the reviewer for this thorough and critical assessment. We have clarified the overstatements in the rebuttal and we will modify the text to address concerns regarding overinterpretation and clearly acknowledge the limitations of our approach. In particular, we will refine the framing of the study to better distinguish between the contributions of paracrine signaling and physical epithelial-stromal interactions.

      To address the reviewer's concerns regarding mechanism and novelty, we will perform additional experiments aimed at further characterizing epithelial-stromal cross-talk, and experiments to assess fibroblast contractility and its contribution to epithelial coordination.

      We believe that these revisions and proposed experiments will strengthen the manuscript and clarify its conceptual contribution.

      • *

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      • *

      Summary:

      - Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      The study by Comelles et al. focuses on how primary intestinal fibroblasts contribute to organoid-derived intestinal epithelial migration in wound healing assays. Using fibroblast-epithelial co-cultures in a 2D in vitro gap closure system, the authors found that direct interaction with fibroblasts drives cohesive and directed migration of intestinal epithelia toward the gap. They further propose that long-range fibroblast alignment promotes the deposition of extracellular matrix (ECM) proteins in an oriented fashion, contributing to directed epithelial migration.

      Major comments:

      - Are the key conclusions convincing?

      Some of the key conclusions of this manuscript are not entirely convincing given the available data. The manuscript would benefit from additional evidence and/or clarifications to support their conclusions. See comments below.

      • *

      - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      (Fig 4a) The authors claim that fibroblasts become activated during gap closure as evidenced by the enhanced assembly of a-SMA fibers 24 hours following barrier removal. Yet, long a-SMA fibers are also observed when fibroblasts are cultured in the absence of epithelial cells or barrier removal (Fig. S1b). To support this conclusion, the authors should consider including additional controls to account for potential time-dependent assembly of a-SMA fibers (e.g., fibroblast-only control).

      We thank the reviewer for pointing this out. We agree that a fibroblast-only control would be important to account for potential time-dependent assembly of α-SMA fibers. We will therefore perform additional experiments monitoring α-SMA organization in fibroblasts cultured alone over time, which will allow us to better interpret the dynamics observed in the co-culture conditions.

      • *

      (Fig. 5a) The authors conclude that fibroblasts align parallel to the direction of epithelial migration during gap closure. While quantifications are convincing, again, a fibroblast-only control accounting for time-dependent spreading and elongation (as seen in Fig. S1) is missing. Including such a control would strengthen their claim that alignment is specific to the gap closure context rather than a time-dependent phenotype.

      We agree with the reviewer that, given the intrinsic ability of fibroblasts to form ordered domains with long-range alignment, this control would be highly informative. We will therefore quantify fibroblast alignment over time in fibroblast-only cultures, which will allow us to determine to what extent the long-range organization observed in co-culture is specific to the gap closure context.

      • *

      (Fig 6) The authors claim that fibroblast-derived aligned ECM drives directional epithelial migration. While fibronectin fibers appear scarce and weakly aligned with the direction of migration, laminin and type IV collagen fibers are barely detectable (Fig. 6f). This may reflect a defect in ECM deposition rather than fiber alignment, which contrasts with Fig. S1, where fibroblasts are shown to deposit and assemble laminin and type IV collagen fibers. One possible explanation is that primary fibroblasts were not cultured long enough to allow robust ECM deposition. Alternatively, the observed effect may be specific to fibronectin, which is consistent with fibroblasts being its major source. The authors should revise their interpretation or provide additional evidence to support their current claim.

      We thank the reviewer for this important point. We agree that differences in ECM signal within the gap may reflect not only fiber alignment but also differences in the amount of protein deposited. In the +fibroblast condition, fibroblasts in the gap have more time to secrete ECM compared to the "empty gap" condition, where fibroblasts remain confined beneath the epithelium.

      In addition, the presence of Matrigel likely masks the contribution of certain ECM components, making laminin or type IV collagen more apparent than fibronectin. We will therefore revise the interpretation of these results to explicitly acknowledge the contribution of ECM abundance in addition to alignment.

      • *

      (Fig 6i) The authors propose that the presence of ECM alone within the gap enhances epithelial gap closure compared to empty gap conditions, although gap closure remains less effective than in the presence of primary fibroblasts. From the figure legend and methods, it seems that the decellularized ECM condition is generated using NIH-3T3 fibroblasts cultured for 8 days, whereas the other conditions used primary fibroblasts cultured for 1 day (Fig. 6a-h). This comparison is confounded by differences in cell source and ECM deposition time. If I am misunderstanding this, please clarify, otherwise consider repeating the decellularized ECM condition using primary fibroblasts and matching culture times for a fair comparison. Along these lines, please include images showing that ECM fibers remain intact following decellularization.

      We thank the reviewer for this suggestion. We will include additional staining to confirm that ECM fibers remain intact after decellularization in the revised version.

      Regarding the use of NIH-3T3 fibroblasts for CDM generation, this choice was made to minimize potential residual paracrine signaling from primary intestinal fibroblasts after decellularization. We acknowledge that this introduces differences in cell source.

      Concerning culture time, we followed established protocols for CDM formation, which recommend extended culture periods (≥8 days) to allow robust ECM deposition (Cukierman et al., 2001; Franco-Barraza et al., 2016; Godeau et al., 2020). We will clarify these points in the revised manuscript and discuss the limitations associated with these differences.

      • *

      - Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Yes. The additional experiments outlined above would help support the current conclusions of the manuscript, rather than to explore new directions beyond its scope.

      • *

      - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Yes, the additional experiments primarily involve the inclusion of controls and additional immunofluorescence imaging to their existing experimental setups. They should be relatively straightforward to implement (~2-3 months).

      • *

      - Are the data and the methods presented in such a way that they can be reproduced?

      Yes.

      • *

      - Are the experiments adequately replicated and statistical analysis adequate?

      Overall, yes. But some plot legends should specify the number of replicates analyzed (e.g. Fig. 2b, Fig. 2d, Fig. 3h).

      We will review and correct these issues.

      • *

      Minor comments:

      - Specific experimental issues that are easily addressable.

      (Fig. 1c-e) The authors state that intestinal epithelial monolayers exhibit the most effective gap closure when in direct contact with fibroblasts. However, fibroblast-conditioned media and co-cultures show comparable gap closure efficiencies (Fig. 1e). The authors should consider revising this interpretation based on the provided data.

      We thank the reviewer for pointing this out, which was also raised by Reviewer 2. As discussed above, we agree that the original statement overstated the effect. Both fibroblast-conditioned medium and direct fibroblast contact promote efficient gap closure compared to control conditions, and we will revise the text accordingly to reflect that no consistent quantitative difference is observed between these two conditions.

      • *

      (Fig. 3b) The authors suggest that crypt-like epithelial cells undergo migration when grown on fibroblasts, but not in conditioned media alone. This is interesting, but it is not clear how they identify crypt-like cells for tracking. The authors should clarify if crypt-like cells are defined based on markers or inferred from their morphology.

      We thank the reviewer for this comment. In these tracking analyses, crypt-like cells were identified based on morphology. As shown in Figure S3 and in Larrañaga et al., 2025, crypt-like cells, defined by specific molecular markers, are significantly smaller than villus-like cells and form high-density regions. These features allow their identification based on morphology in fluorescently labeled monolayers. We will clarify this criterion in the Methods section of the revised manuscript.

      • *

      (Fig 3f-h) The authors conclude that fibroblasts promote directed epithelial cell motility based on cell trajectory analysis. Although they state that this analysis is performed on epithelial monolayers, their tdTomato epithelial population appears sparse in some conditions (control and conditioned media; Fig. S6a). Such variability in cell density may bias measurements of migration directionality at the cell-level, unless a mixed population is being used for tracking. The authors should clarify whether this analysis was indeed conducted on confluent monolayers.

      We thank the reviewer for this comment. For trajectory analysis, we used a mixed population of tdTomato-positive and non-fluorescent epithelial cells in some experiments to facilitate individual cell tracking. Importantly, epithelial monolayers were confluent in all conditions analyzed. We will clarify this in the Methods section.

      • *

      (Fig 6b) Their gap closure experimental setup indicates that fibroblasts are cultured on a Matrigel-coated surface, which should already contain abundant laminin and type IV collagen. Thus, it is unclear why type IV collagen is not detected underneath fibroblasts. The authors should explain why this is the case for clarity.

      We thank the reviewer for pointing out this observation. Indeed, fibroblasts are cultured on a Matrigel-coated surface, which contains laminin and collagen type IV among many other components. We observed thick collagen-rich structures between the fibroblasts and the epithelia, which we attributed not only to collagen secreted by the fibroblasts but also to a rearrangement of the collagen available in the coated surface. We will clarify this in the Discussion of the revised version.

      • *

      - Are prior studies referenced appropriately?

      Yes

      • *

      - Are the text and figures clear and accurate?

      Mostly. Figures 6d and 6g seem to be duplicated by mistake.

      We thank the reviewer for noting this. We will correct this mistake.

      • *

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      There are some missing frames in Movie 2. If they are not available, it's okay to include black frames, so that the sequence remains consistent with the timestamps.

      The authors may consider using asterisks as significance indicators instead of reporting precise p-values directly on their plots. Having this format would facilitate visual comparison of statistical significance across conditions.

      Displaying single channels of experiments where co-cultures are used would help to better interpret their data.

      We thank the reviewer for pointing out these issues and for their valuable suggestions. We will correct the errors in the movie and improve the presentation as suggested where possible.

      • *

      Reviewer #3 (Significance (Required)):

      • *

      - Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This study provides a valuable contribution to understanding how fibroblasts influence intestinal epithelial migration. The main advance lies in the use of a co-culture system combining organoid-derived intestinal epithelial cells that assemble into a crypt-villus organization with primary intestinal fibroblasts in a 2D gap closure system. This approach allows the authors to examine epithelial-fibroblast interactions in a more physiologically relevant context compared to prior work.

      We thank the reviewer for their positive assessment of the significance of our work.

      • *

      - Place the work in the context of the existing literature (provide references, where appropriate).

      Addressed above.

      • *

      - State what audience might be interested in and influenced by the reported findings.

      Cell and developmental biology, extracellular matrix biology, tissue regeneration.

      • *

      - Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Tissue morphogenesis, cell motility, extracellular matrix dynamics.

      We thank the reviewer for their positive assessment and for their suggestions to improve the manuscript.

      • *
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) They start by incubating LFA-1 with iRBCs and show by flow analysis that a substantial population of these iRBCs binds to the LFA-1 (Figure 1C). They do conduct the control with uninfected RBCs, but put this in the supplementary material. As this is a critical control, I think that it should be moved to Figure 1C as it is essential to allow interpretation of the iRBC data. The authors also do not state which strain of P. falciparum they used (line 144). This is critical information as different strains have different variant surface antigens and should be included. With these changes, this data seems convincing.

      We thank the reviewer for this important suggestion. We agree that the uninfected RBC (uRBC) control is critical for interpreting the specificity of LFA-1 αI-Fc binding. In the revised manuscript, we have ensured that these control data are clearly presented and appropriately referenced in the main text; however, we have retained them in the Supplementary Information (Supplementary Figure S1) to maintain clarity and avoid overcrowding Figure 1, while still ensuring their visibility and accessibility to the reader. Importantly, these data demonstrate negligible binding of LFA-1 αI-Fc to uRBCs compared to iRBCs, supporting specificity. We have explicitly stated the parasite strain used (Plasmodium falciparum 3D7) in the Methods section (line 475).

      (2) They next incubated LFA-1 with the iRBCs, cross-linked and conducted a pulldown, identifying GP130 as a binding partner. Using cross-linkers is a dangerous strategy as it risks non-specific cross-linking. Did they try without cross-linking and find an interaction?

      We agree that cross-linking can introduce potential artefacts. To mitigate this, we included hIgG control pulldown experiments performed under identical conditions. Proteins identified in the control eluate were excluded as background (summarized in Supplementary Table S1). Importantly, PfGBP-130 was the only protein specifically enriched in the LFA-1 αI-Fc pulldown across all three biological replicates (Fig. 2A, Venn Diagram). While cross-linking was used to stabilize transient interactions, consistent enrichment of PfGBP-130 across the three biological replicates precludes any concerns of non-specificity.

      (3) They raised antibodies to PfGBP and showed IFA, which reveals that these antibodies stain iRBCs (Figure 2Ciii). This experiment lacks a critical control of uninfected RBCs, which needs to be included to show that the staining is specific. Without this, it is not possible to conclude that there is iRBC-specific staining with PfGBP.

      The question pertains to Fig. 2Biii. The IFA images include both infected and neighboring uninfected erythrocytes within the same field. No PfGBP-130 staining is observed in uninfected cells. PfGARP staining, performed specifically to identify parasite-infected cells and confirm surface localisation, shows complete concordance with PfGBP-130 staining. This shows that the antibodies we raised specifically recognise only infected RBCs.

      (4) They then conduct a pulldown using LFA-Fc, which does show GP130 only in the presence of the LFA-Fc, but not when empty beads are used. This is convincing. BLI measurements are also used to study this interaction (Figure 2Ci). The BLI data is presented in such a way that any association phase is obscured by the y-axis, which makes it impossible to know whether there is binding here. I think that the data needs to be shown with some baseline before the addition of the ligand so that the association can be seen. The data is also a bit messy with a downward drift and the curves showing different shapes, for example, with the 1.0 µM curve seeming to have a different association rate. Also, is this n=1? I think that this data needs to be repeated and replicated. This is the only data which shows a direct interaction between LFA1 and GBP, as pulldowns are done with lysates, which might contain bridging components. I think that it is important to repeat the BLI or use additional biophysical methods to assess binding, to obtain more convincing data.

      We sincerely thank the reviewer for highlighting this important concern regarding the BLI data presentation and interpretation. We would like to clarify that the baseline signal prior to ligand addition was subtracted during data processing; therefore, the plotted curves represent the net response following ligand association. However, we agree that this may have obscured the visualization of the association phase. Accordingly, in the revised manuscript, we have re-plotted the data with adjusted y-axis scaling to better capture the association kinetics. In addition, to ensure robustness and reproducibility, the BLI experiments were performed in multiple independent replicates (n ≥ 3) using independently purified protein batches. The original figure showed a representative dataset; we have now included averaged sensorgrams along with standard deviation in the calculated KD values [KD = (1.7 ± 0.22) × 10⁻⁸ M] (Figure 2C (i)). These revisions provide a clearer and more accurate representation of the binding interaction.
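
      To put the reported affinity in context, the short sketch below computes equilibrium fractional occupancy at several ligand concentrations. It assumes a simple 1:1 binding model, which the response does not state explicitly; the function name and the chosen concentrations are hypothetical and are not taken from the manuscript.

```python
def fraction_bound(ligand_conc_molar, kd_molar):
    """Equilibrium fractional occupancy under an assumed 1:1 binding model."""
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)


KD = 1.7e-8  # M, the mean value quoted in the response

# Hypothetical ligand concentrations, chosen only for illustration.
for conc in (1.7e-8, 1.0e-7, 1.0e-6):
    print(f"{conc:.1e} M -> fraction bound {fraction_bound(conc, KD):.3f}")
```

      Under this assumption, occupancy is one half when the ligand concentration equals the KD and approaches saturation at concentrations well above it.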

      (5) The authors next do some modelling of the putative complex. This is done by homology modelling and docking, which is not the most up-to-date method and is over-interpreted. Personally, I would remove this data as I did not find it convincing, and it is not important for the story. If the authors wish to include it, then I think that they should validate the modelling by mutagenesis to show that the residues which the models indicate might bind are involved in the interaction.

      We thank the reviewer for this thoughtful comment regarding the modelling analysis. We agree that computational docking and homology-based modelling have inherent limitations and should not be over-interpreted. In our study, these analyses were included strictly as supporting evidence to provide a structural framework for the PfGBP-LFA-1 interaction, while the primary conclusions are based on direct biochemical and functional validation, including pull-down, BLI measurements, receptor knockdown, and cellular inhibition assays. Importantly, the use of docking approaches such as ClusPro, followed by interface analysis and MD simulations, is a widely accepted and routinely used strategy to generate testable hypotheses for protein-protein interactions, particularly when experimental structures are unavailable (e.g., Comeau et al., 2004; Weng et al., 2019). We believe that the current modelling serves as a useful complementary analysis that is consistent with, and supportive of, the experimentally validated interactions.

      (6) They next made GP130 and tested the binding of this to THP-1 cells, which are often used as a model for macrophages. They observe greater binding of PfGBP-Fc to these cells when compared with hIgG and show that LFA-1 siRNA reduces this binding. I was a little confused about how the flow plots related to the graph in the bottom right corner of Figure 3Bii. In the flow plots, the hIgG control shows 12.8% of cells in the gated region, while the unstained cells have 5.63%, but the MFI data shows a decrease in binding for hIgG vs unstained cells. How is this consistent? Also, the siRNA reduces the number of cells in the gated region from 66.6% to 25.9%, which is still substantially more than the 5.63% in the unstained control. This also doesn't seem quite consistent with the MFI data. Could the authors explain this? Also, perhaps an additional experiment would be to add soluble LFA-1 into this assay as an additional control to determine whether this blocks PfGBP binding to the THP-1 cells? It could be that there are additional mechanisms of binding, which would explain why the siRNA has a partial effect. The same is true for the NK cell experiments in Figure 3Ci, in which the siRNA has a partial effect. The authors also test binding to HEK, HepG2 and 'stem' cells and claim 'only background levels of binding', but in each case, there is more binding to these cells by PfGBP-Fc than by hIgG, albeit less than in THP-1 and NK cells. Why have the authors decided that these increases are not significant? All in all, these experiments do indicate a role for the GBP-LFA1 interaction in the binding of immune cells to iRBCs, but perhaps not as absolutely as is suggested.

      We thank the reviewer for this insightful comment. The apparent discrepancy arises because the flow plots depict the percentage of cells within a defined positive gate, whereas the graphs quantify mean fluorescence intensity (MFI) across the entire population. We have revised the figure legend to state this. Regarding the partial reduction in binding upon LFA-1 (CD11a) knockdown, we agree that this indicates LFA-1 is a major but not exclusive contributor, which is biologically plausible given incomplete siRNA depletion and the known avidity-dependent nature of integrin interactions. Importantly, our conclusion is supported by multiple orthogonal approaches (αI-domain binding, LC-MS/MS identification, BLI, docking, receptor knockdown, and functional blockade). We also appreciate the suggestion of soluble LFA-1 competition, which we acknowledge as an important future experiment. Finally, we have revised the text regarding HEK293T, HepG2, and stem cells to reflect that PfGBP-Fc binding is minimal but not absent, consistent with low or absent expression of LFA-1 in non-immune cells. Overall, we have moderated our claims to state that PfGBP-LFA-1 interaction is a dominant and functionally relevant mechanism, while not excluding additional low-affinity or accessory interactions.

      Figure legend change: Representative flow plots depict the percentage of cells within a predefined positive gate, whereas the accompanying summary graph quantifies fluorescence intensity across the analyzed population. These two metrics report distinct properties of the distribution and are therefore not expected to be numerically identical.
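
      The toy example below illustrates how the two metrics can rank samples in opposite order. It is a sketch with made-up intensity values, not data from the study; the gate threshold, sample names, and the use of the arithmetic mean for MFI are all assumptions made for illustration.

```python
from statistics import mean


def percent_positive(intensities, gate):
    """Percentage of events whose intensity exceeds the gate threshold."""
    return 100.0 * sum(x > gate for x in intensities) / len(intensities)


GATE = 100  # hypothetical positive-gate threshold (arbitrary units)

# Sample A: few events above the gate, but those events are very bright.
sample_a = [10] * 90 + [5000] * 10
# Sample B: many events just above the gate, none of them bright.
sample_b = [50] * 50 + [150] * 50

pct_a, mfi_a = percent_positive(sample_a, GATE), mean(sample_a)
pct_b, mfi_b = percent_positive(sample_b, GATE), mean(sample_b)

print(f"A: {pct_a:.1f}% gated, mean intensity {mfi_a:.0f}")
print(f"B: {pct_b:.1f}% gated, mean intensity {mfi_b:.0f}")
```

      Here sample B has the larger gated fraction while sample A has the larger mean intensity, so a higher percentage in the gate does not by itself imply a higher MFI.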

      (7) The authors next produce CHO cells with PfGBP on the surface. These cells bind to LFA-1 specifically. When these cells were incubated with primary NK cells, they did see increases in activation markers, which were reduced by the addition of anti-CD11a, suggesting these to be specific. They also conduct the same experiment with anti-GBP with iRBCs, but this is in a different figure. It would be easier for the reader if Figure 5B were in the same figure as Figure 4B, as it is related data using the same method. I found this data convincing, showing that the LFA1:GBP interaction does contribute to immune cell recognition and activation.

      We thank the reviewer for this positive assessment and helpful suggestion regarding figure organization. We agree that the CHO-PfGBP and iRBC-based NK cell activation assays represent conceptually related experiments that both address LFA-1-PfGBP dependent activation using similar readouts. We have retained separate panels to distinguish the reductionist CHO-based system from the physiologically relevant iRBC context. We believe that the combined evidence from both systems strengthens the conclusion that PfGBP-LFA-1 interaction is a key contributor to NK cell recognition and activation.

      (8) The authors next conduct an experiment in which they assess parasite growth in the presence of NK cells and in the presence of anti-GBP. They use Hoechst staining as a measure of parasite growth and claim that NK cells reduce the number of parasites, but that anti-GBP abolishes this effect (Figure 5A). I found this experiment very unconvincing as there are small effects and no demonstration of significance. More commonly used approaches to study parasite growth are lactate dehydrogenase GIA assays or calcein-AM labelling. I did not find this experiment convincing and would either remove or supplement with additional data using a more robust assay, with repeats and tests of statistical significance.

      We respectfully disagree that the assay should be removed, because flow-cytometric quantification of P. falciparum parasitemia using DNA dyes such as Hoechst is a widely used, accepted, and high-throughput approach for measuring infected erythrocytes and parasite growth, with clear separation of infected from uninfected RBCs and good reproducibility across malaria studies (Dent et al., 2009; Jang et al., 2014). Importantly, closely related immune-cell killing experiments in the malaria field have used the same general strategy, co-culture with effector cells followed by flow-cytometric enumeration of parasitemia to infer parasite control, including the seminal NK-cell study by Chen et al., 2014, which our assay design follows conceptually, and later work showing reduced parasitemia after co-incubation with cytotoxic lymphocytes measured by nucleic-acid dye flow cytometry. We therefore believe the experiment is methodologically valid and directly relevant to the biological question, namely whether disrupting PfGBP-LFA-1 engagement alters NK-cell-mediated restriction of parasite expansion.

      Reviewer #2 (Public review):

      (1) PfGBP-130 is proposed to be a membrane protein based on a single predicted transmembrane domain. Figures 2b and 3a show ribbon schematics with this TM domain at residues 51-68, in agreement with TM prediction algorithms such as TMHMM 2.0 and Phobius. However, this predicted TM is upstream of the PEXEL motif (residues 84-88, sequence RILAE), a conserved sequence for parasite protein export to host cytosol that is proteolytically processed at its 4th residue. Thus, residues 1-87 are removed from PfGBP-130 prior to export, yielding a mature protein without predicted TMs. Prior studies have determined that the mature PfGBP-130 lacks TMs and is retained as a soluble protein in host cell cytosol (PMID: 19055692, 35420481). Thus, the authors' model of PfGBP-130 as a surface-exposed membrane protein conflicts with both computational analysis of the mature protein and these prior reporter studies. An important simple experiment would be to evaluate PfGBP-130 membrane association in immunoblots using the authors' PfGBP-130 antibody after hypotonic lysis (PMID: 19055692) and after alkaline extraction (e.g. 100 mM Na2CO3, pH 11 as frequently used, PMID: 33393463). If the prior studies and computational analyses are correct, the protein will be predominantly in the soluble and/or alkaline supernatant fractions.

      We thank the reviewer for this important observation regarding PfGBP-130 topology and export. We agree that the presence of a PEXEL motif supports proteolytic processing and that the mature protein may lack a classical transmembrane domain. However, consistent with our model of surface accessibility, we would like to clarify that in an independent proteomic study performed in our laboratory on the membrane-enriched fraction of Plasmodium falciparum-infected erythrocytes, PfGBP-130 was reproducibly identified by LC-MS/MS among membrane-associated proteins (data not shown; can be provided upon request). These findings support the conclusion that, irrespective of the absence of a canonical transmembrane domain, PfGBP-130 is associated with the iRBC membrane compartment, likely via peripheral or protein-complex–mediated interactions, as described for several exported Plasmodium proteins.

      (2) Many findings rely on the specificity of antibodies generated against PfGBP-130 or NK cell receptors. Although the authors have included key controls (use of isotype control antibodies, lack of anti-PfGBP-130 binding to uninfected cells), cross-reactivity between P. falciparum antigens is well-recognized and could significantly undermine the interpretation of experiments (PMID: 2654292 and 1730474 provide key examples of antigens recognized by antibodies raised against other proteins). For example, the surface localization in IFA experiments (Figure 2B(iii)) could reflect anti-PfGBP-130 binding to an unrelated parasite surface antigen, a possibility not addressed by any of the authors’ controls. As another example, the iRBC lysate immunoblot using this antibody in Fig. 2B(iv) suggests a MW of 95 kDa, which corresponds to the unprocessed pre-protein before export; cleavage in the PEXEL motif yields a processed mature protein of 85 kDa, which should be readily resolved from the pre-protein in immunoblots (PMID: 19055692). A better immunoblot using immature infected cell stages might show both the pre-protein and the mature protein as a doublet band.

      We thank the reviewer for raising this important concern regarding antibody specificity. We agree that cross-reactivity among P. falciparum antigens is a known issue and have taken multiple steps to ensure specificity in our study. First, the anti-PfGBP-130 antibodies were generated against a defined recombinant fragment and show no detectable binding to uninfected RBCs and no signal in hIgG control immunoprecipitates, supporting specificity. Importantly, in our LC-MS/MS analysis of LFA-1 αI-domain pull-downs, PfGBP-130 was specifically enriched and consistently identified across replicates, independently validating the target recognized by the antibody. Furthermore, the same antibody detects a single dominant band in both iRBC lysates and αI pull-down fractions, arguing against widespread cross-reactivity. Regarding the apparent molecular weight (~95 kDa), we agree that this likely corresponds to the precursor form, and that a processed form (~85 kDa) may not be well resolved under our current conditions.

      (3) PfGBP-130 is not essential for in vitro cultivation (PMID: 18614010 and MIS of 1.0 in the piggyBac mutagenesis screen as tabulated on plasmodb.org, indicating a highly dispensable gene). The authors should use the knockout line as a control in their IFA localization experiments to address antibody specificity. More fundamentally, their model predicts that NK cells should not recognize or kill infected cells from the knockout line when compared to their untransfected parent. Such results with the knockout line would compellingly support the authors' model without reliance on antibodies that may cross-react with other parasite antigens. PMID: 18614010 reported that the PfGBP-130 knockout exhibited increased membrane rigidity, suggesting an intracellular scaffolding protein rather than a surface localization and use as a ligand for LFA-1 interaction and NK cell-mediated killing.

      We agree that a PfGBP-130 knockout line would provide a powerful genetic validation of both antibody specificity and the proposed functional role of PfGBP-130 in NK cell recognition. At present, such experiments were not included in this study, and we acknowledge this as an important limitation. However, we would like to emphasize that our conclusion does not rely on antibody-based localization alone; rather, it is supported by multiple orthogonal approaches, including LFA-1 αI-domain pull-down coupled to LC-MS/MS, biophysical interaction analysis, receptor knockdown, and functional blocking assays. In addition, in one of our previous proteomic analyses of the membrane-enriched fraction of infected erythrocytes, PfGBP-130 was identified among the proteins present in the membrane fraction, supporting its association with the iRBC membrane compartment despite lacking a classical mature transmembrane domain.

      (4) PfGBP-130 non-essentiality raises the question of why the gene would be retained if it triggers NK cell-mediated killing of infected cells in vivo. Presumably, this killing would pose strong selective pressure against retention of PfGBP-130. Some speculation is warranted to support the model.

      We thank the reviewer for this thoughtful evolutionary question. We agree that if PfGBP-130 enhances NK-cell recognition, its retention likely reflects a context-dependent fitness trade-off rather than a simple benefit or cost. This situation is not unusual in P. falciparum: several exported or surface-associated proteins are retained despite being immunogenic because they also provide advantages in other settings, such as erythrocyte remodeling, cytoadhesion, niche adaptation, immune modulation, or transmission. The clearest precedent is the PfEMP1/var system, in which highly immunogenic surface antigens are nevertheless strongly maintained because they mediate sequestration and in vivo fitness, while antigenic variation limits continuous immune exposure (Chew et al., 2022). Similarly, other variant surface antigens such as STEVOR and RIFIN are retained despite immune recognition because they contribute to erythrocyte binding, antigenic diversity, and immune evasion or modulation (Niang et al., 2009; Sakoguchi et al., 2025). More broadly, many P. falciparum genes that appear dispensable in standard in vitro culture are nevertheless preserved because culture does not recapitulate the selective pressures present in vivo, including splenic clearance, endothelial interactions, immune attack, and within-host competition.

      Reviewer #3 (Public review):

      (1) Anti-GBP130 antibodies are used in the cellular assays to block the interaction between GBP130 and LFA1. They should therefore also block interactions between GBP130 and LFA1 recombinant proteins in the biolayer interferometry experiment. Do the authors have data to show this? Similarly, the anti-CD11a antibodies used to block the interaction in the cellular assays should also block the in vitro interaction between recombinant LFA1 and GBP130.

      We thank the reviewer for this insightful suggestion. We agree that demonstrating antibody-mediated inhibition of the recombinant PfGBP-LFA-1 interaction would provide an additional orthogonal validation of the interface. While such blocking experiments were not included in the original BLI dataset, our current study already establishes the specificity of this interaction through multiple independent approaches, including αI-domain pull-down and LC-MS/MS identification, BLI-derived high-affinity binding (KD ~10⁻⁸ M), structural docking, receptor knockdown, and antibody-mediated inhibition in cellular systems. We note that antibody-mediated blocking in a purified biophysical system is not always directly comparable to cellular assays, as epitope accessibility, orientation on biosensor surfaces, and conformational states of integrins (which are known to undergo activation-dependent structural changes) can influence inhibition efficiency. Nonetheless, we fully agree that this represents an important validation experiment.

      (2) The structural modelling analysis of the predicted complex between GBP130 and LFA1 (Figure 2cii) predicts that the majority of the important GBP130 interface residues are located in the region D509-N607. However, the authors present BLI data for the GBP130-LFA1 interaction, which used the N-terminal fragment of GBP (residues 69-270), which does not include the GBP130 residues predicted to be important for the formation of the complex between the two proteins. Could the authors provide an explanation for how an interaction was observed with the GBP130-N fragment, which does not contain the residues predicted to be important for interacting with LFA1?

      We thank the reviewer for this important observation. We agree that the structural model predicts a major interaction interface within the D509-N607 region of PfGBP-130; however, this does not preclude the existence of additional or auxiliary binding determinants within the N-terminal region used in our BLI assays (aa 69-270). PfGBP-130 is a multi-domain, repeat-containing protein, and such proteins frequently exhibit distributed or multivalent interaction interfaces, where individual regions can independently engage binding partners with lower affinity while the full-length protein achieves higher avidity through cooperative interactions. In our study, the BLI data using the N-terminal fragment demonstrate that this region is sufficient to mediate direct interaction with the LFA-1 αI domain, whereas the structural model based on full-length predictions likely captures a dominant or higher-affinity interface in the C-terminal region. Importantly, the interaction is supported by multiple orthogonal datasets, including pull-down/LC-MS/MS, cellular binding assays, and functional inhibition, indicating that the observed binding is not an artefact of fragment choice.

      Author response image 1.

      To further examine this, we performed docking and binding energy analyses comparing the full-length PfGBP-130-LFA-1 complex with the N-terminal domain-LFA-1 complex. Using the PRODIGY server, the predicted binding affinity for the full-length complex was -9.8 kcal/mol, whereas the N-terminal domain complex exhibited a still favorable binding energy of -5.6 kcal/mol. Similarly, HawkDock (v2) analysis yielded binding energies of -22.2 kcal/mol for the full-length complex and -14.1 kcal/mol for the domain-only complex. While reduced relative to the full-length protein, these values remain well within the range of stable protein-protein interactions, supporting the ability of the N-terminal region to independently contribute to binding. These energy calculations take into account all non-covalent interactions. For clarity, hydrogen bonds have been specifically highlighted in the figure to represent the key interaction interface.
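      As a rough aid to reading the PRODIGY values above, a predicted binding free energy can be converted to an approximate dissociation constant through ΔG = RT ln(Kd). The sketch below is illustrative only: it assumes the two PRODIGY values are standard binding free energies at 25 °C, and the function name and temperature default are choices made for this example rather than part of the analysis. The HawkDock energies are not converted here, since this sketch does not assume they are absolute binding free energies.

```python
import math

# Gas constant in kcal/(mol*K); the temperature default (25 degrees C) is an assumption.
R_KCAL_PER_MOL_K = 1.987204e-3


def dissociation_constant(delta_g_kcal_per_mol, temperature_k=298.15):
    """Return Kd in mol/L from a binding free energy, using dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# PRODIGY values quoted in the response above.
for label, delta_g in [("full-length", -9.8), ("N-terminal fragment", -5.6)]:
    kd = dissociation_constant(delta_g)
    print(f"{label}: dG = {delta_g} kcal/mol -> Kd ~ {kd:.1e} M")
```

      Under these assumptions, -9.8 kcal/mol corresponds to a Kd in the tens-of-nanomolar range and -5.6 kcal/mol to a Kd in the tens-of-micromolar range, which gives a sense of how large the predicted difference between the two complexes is.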

      (3) There is no section in the materials and methods describing how the BLI was performed; this should be added. The highest concentration of GBP130 used in the interaction measurements is 1.4 µM, almost 100x the measured Kd (0.015 µM) for the GBP130-LFA1 interaction. At these high concentrations of GBP130, I would expect to start seeing saturation of binding, but the interferometry curves show that saturation is not close to being reached. This strongly suggests that the binding of GBP130 to LFA1 is non-specific.

      We thank the reviewer for raising these important technical points. We have included a detailed description of the biolayer interferometry (BLI) methodology in the Materials and Methods section in the manuscript. Regarding the concern about lack of saturation at higher analyte concentrations, we respectfully disagree that this necessarily indicates non-specific binding. In BLI assays, incomplete saturation can arise from several well-recognized factors, including suboptimal orientation or partial inaccessibility of immobilized ligand on the biosensor, mass transport limitations, or heterogeneous binding populations, which are particularly relevant for integrins such as LFA-1, whose αI domain exists in multiple conformational states with distinct affinities. Importantly, the interaction exhibits clear concentration-dependent association and dissociation kinetics that fit a 1:1 binding model with a KD in the nanomolar range, which is inconsistent with non-specific interactions that typically show poor fitting and minimal dissociation. Furthermore, the specificity of the PfGBP-LFA-1 interaction is supported by multiple independent lines of evidence in our study, including selective enrichment in αI-domain pull-downs, absence in IgG controls, reduction upon CD11a knockdown, and functional inhibition by blocking antibodies in cellular assays. We have now clarified these points in the revised manuscript and tempered the interpretation to acknowledge potential experimental constraints of BLI while maintaining that the cumulative data strongly support a specific interaction.
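      The saturation expectation discussed in this exchange can be made concrete with the equilibrium occupancy of an ideal 1:1 interaction, θ = C / (C + KD). The sketch below is a minimal illustration under that idealized model only; it uses the KD (0.015 µM) and top concentration (1.4 µM) quoted in the comment, while the intermediate concentrations and the function name are hypothetical. It does not model the factors listed in the response (ligand orientation, mass transport, heterogeneous binding populations) or whether the association phase was long enough to reach equilibrium.

```python
def fractional_occupancy(analyte_um, kd_um):
    """Equilibrium fraction of ligand bound for an ideal 1:1 interaction."""
    return analyte_um / (analyte_um + kd_um)


KD_UM = 0.015  # KD quoted in the reviewer comment, in micromolar

# 1.4 uM is the quoted top concentration; the lower values are hypothetical
# points added only to show the shape of the curve.
for conc_um in (0.015, 0.15, 1.4):
    theta = fractional_occupancy(conc_um, KD_UM)
    print(f"{conc_um:>6} uM ({conc_um / KD_UM:5.1f} x KD): occupancy = {theta:.3f}")
```

      In this idealized picture, an analyte concentration roughly 93 times the KD corresponds to about 99% occupancy at equilibrium, which is the basis of the reviewer's expectation; the response argues that real BLI surfaces can depart from this ideal.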

      Minor points:

      (1) For the pulldown experiments, can the authors confirm that cross-linking was also performed for the protein A beads + hIgG control?

      Yes, DTSSP cross-linking was performed identically in the protein A beads + hIgG control arm. This is consistent with the control design described in the manuscript.

      (2) If the recombinant CD11a I subdomain used as a probe is correctly folded and functional, it should bind ICAM1. Do the authors have this data?

      We agree that ICAM-1 binding is an important functional validation for the recombinant CD11a αI probe (Hogg et al., 1998). The isolated αI domain of LFA-1 is well established as the principal ICAM-1-binding module, and soluble αI-domain reagents have previously been shown to bind/block ICAM-1 interactions. We did not include this control in the current version.

      (3) Were the authors able to perform the reciprocal pull-down, using pfGBP130-N-Fc to pull down LFA1 from cell surfaces?

      We did not perform a reciprocal pull-down with PfGBP130-N-Fc and native cell-surface LFA-1 in the present study; we agree this would be a useful orthogonal experiment.

      (4) After identifying GBP130 as a co-purifying protein in the LFA-1 pull-down experiments, the authors select an N-terminal fragment of GBP130 to recombinantly express and use. How did the authors narrow down which region of GBP130 interacted with LFA-1?

      The N-terminal PfGBP130 fragment (aa 69-270) was selected empirically as a tractable, soluble recombinant segment containing a defined repeat-containing extracellular region, rather than because we had already mapped the full LFA-1-binding interface. We agree with the reviewer that our structural model suggests that additional residues, including a likely dominant interface outside this fragment, may contribute to the full interaction, and we have clarified that the N-terminal fragment should be interpreted as a minimal binding-competent region, not necessarily the sole binding site.

      (5) As erythrocytes age, their surface undergoes biochemical changes, most notably a drop in levels of sialylation, decreasing the net repulsive negative charge, and they generally become more adherent. Can the authors exclude the possibility that, rather than binding to a parasite-derived ligand, LFA alpha 1 is instead binding to a marker of older erythrocytes? In the data presented, increased binding of LFA alpha 1 is observed as parasites progress through the life cycle, but the host erythrocytes will be ageing during parasite replication, which could account for the increased levels of LFA alpha 1 binding. To rule out this explanation, data from LFA alpha 1 staining of age-matched uninfected erythrocytes could be provided.

      We agree that erythrocyte aging can alter surface sialylation and adhesiveness, and loss of sialic acid is known to reduce erythrocyte surface charge and increase adhesiveness. However, our data argue against aging alone explaining the signal, because LFA-1 αI-Fc binding was compared with uninfected RBC controls and the interaction led to enrichment of a parasite-derived ligand, PfGBP130, in pull-down/MS analyses.

      (6) Figure 3b(i) Surface staining of THP1 cells was performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. But no accompanying staining data using an anti-LFA1 antibody are shown, so it is not possible to determine whether staining profiles with GBP-130 Fc match staining profiles with anti-LFA1 antibodies. This is important to show what proportion of LFA1-positive cells can recognise parasite-derived GBP-130 Fc.

      (7) Figure 3c(i) Surface staining of peripheral NK cells is performed using GBP-130 Fc as a probe, which should detect all LFA1-positive cells. Here, as well, there are no staining data using an anti-LFA1 antibody. This would allow a comparison between cell population LFA1 staining with an anti-LFA1 antibody and cell population LFA1 staining with GBP-130 Fc. The two staining profiles should be similar as both probes bind the same surface marker. However, it appears this might not be the case because the staining data using GBP-130 Fc show that only a minor proportion of NK cells (~20%) stain positive, but the majority of peripheral NK cells usually express CD11a, as it is a key adhesion molecule in the formation of immune synapses with target cells. This suggests that GBP-130 can only bind to a subset of NK cells, and if it is binding LFA1, then it can only play a role in mediating the formation of an immune synapse with this subpopulation of NK cells. Could the authors include a comment in the manuscript making clear that the GBP-130 only assists a small proportion of NK cells in adhering to parasite-infected erythrocytes? Are there any reasonable hypotheses as to why GBP-130 was only able to stain a small subpopulation of LFA1-expressing NK cells?

      For minor comments 6 and 7:

      We agree that parallel staining with anti-CD11a would help relate PfGBP130-Fc binding to total LFA-1-positive THP-1 and NK-cell populations. Importantly, LFA-1 expression and ligand binding competence are not equivalent, because integrin binding depends strongly on activation/conformation and avidity state; in NK cells, only a subset can display LFA-1 in a partially activated conformation at baseline despite broader CD11a expression. Thus, a smaller PfGBP130-Fc-positive subset than the total CD11a-positive population is biologically plausible and does not imply inconsistency.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study examines the evolution of virulence and antibiotic resistance in Staphylococcus aureus under multiple selection pressures. The evidence presented is convincing, with rigorous data that characterizes the outcomes of the evolution experiments. However, the manuscript's primary weakness is in its presentation, as claims about the causal relationship between genotypes and phenotypes are based on correlational evidence. The manuscript needs to be revised to address these limitations, clarify the implications of the experimental design, and adjust the overall narrative to better reflect the nature of the findings.

      Thank you for your feedback. Here, we summarize the major changes made in the revised manuscript:

      (1) We did not test causality between mutations and phenotypes in our study. We were intentional about not using causal wording (“mutation X caused/led to/resulted in phenotype Y”), and only discussed these results using the terms “correlation” and “association”, and only when they were statistically significant. We understand that some readers may view these terms as being equivalent to “causation”, thus in the revision, we have modified our wording as suggested (please see below for specific lines).

      (2) We agree that experimental evolution in nematodes is not a direct simulation of evolution in humans. The goal of our study was, first and foremost, a test of how multiple selective pressures can shape pathogen evolution. This point was presented in the first paragraph, the second to last paragraph of the Introduction (which included our hypotheses), and the last paragraph of the manuscript. References to humans and other mammalian systems were intended to point out similarities between our findings and what had already been found in S. aureus outside the lab. Despite differences between mammals and nematodes, several parallels arose at both the phenotypic and genomic levels, which is interesting from an evolutionary standpoint. We understand that more experiments and tests would be needed before we can make claims about the selective pressures acting on S. aureus outside the lab. We presented some information in the context of humans because a large part of the literature on S. aureus is on its role as a major bacterial pathogen; we did not want to neglect this aspect of its natural life history.

      In the revised manuscript, we are more explicit in stating these points, as well as tempering some language regarding human infection, and removing some references to humans. Please see below for specific lines as well as justification for specific references to humans/mammalian systems.

      (3) We have included additional details on the experimental design below. We hope this is sufficiently clarifying.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate how methicillin-resistant (MRSA) and sensitive (MSSA) Staphylococcus aureus adapt to a new host (C. elegans) in the presence or absence of a low dose of the antibiotic oxacillin. Using an "Evolve and Resequence" design with 48 independently evolving populations, they track changes in virulence, antibiotic resistance, and other fitness-related traits over 12 passages. Their key finding is that selection from both the host and the antibiotic together, rather than either pressure alone, results in the evolution of the most virulent pathogens. Genomically, they find that this adaptation repeatedly involves mutations in a small number of key regulatory genes, most notably codY, agr, and saeRS.

      Strengths:

      The main advantage of the research lies in its strong and thoroughly replicated experimental framework, enabling significant conclusions to be drawn based on the concept of parallel evolution. The study successfully integrates various phenotypic assays (virulence, growth, hemolysis, biofilm formation) with whole-genome sequencing, offering an extensive perspective on the adaptive landscape. The identification of certain regulatory genes as common targets of selection across distinct lineages is an important result that indicates a level of predictability in how pathogens adapt.

      Thank you very much.

      Weaknesses:

      (1) The main limitation of the paper is that its findings on the function of specific genes are based on correlation, not cause-and-effect evidence. While the parallel evolution evidence is strong, the authors have not yet performed the definitive tests (i.e., reconstruction of ancestral genes) to ensure that the mutations identified in isolation are enough to account for the virulence or resistance changes observed. This makes the conclusions more like firm hypotheses, not confirmed facts.

      We have replaced instances of “association” and “correlation” with wording similar to that suggested where applicable, including:

      L 342 – 344: “The loss of SCCmec and ACME was more often identified in populations exhibiting an increase in total growth from the ancestor outside the host…”

      L 371 – 375: “Mutations in three genes were regularly identified in populations exhibiting significant increases in virulence from the ancestor: codY, gdpP, and pbpA. Mutations in agr in general were not associated with changes in overall virulence, but MSSA populations harboring mutations in this gene were more likely to exhibit greater virulence compared to MRSA populations (Wilcoxon rank sum exact test P = 0.045).”

      L 377: “Mutations in specific genes were often found in populations able to hemolyze red blood cells…”

      L 379 – 381: “There were also significant differences between the mutations regularly identified in oxacillin-resistant populations evolved from the MSSA ancestor...”

      L 384 – 385: “By contrast, mutations in agr were often in populations exhibiting loss of hemolytic activity, consistent with previous findings...”

      L 409 – 410: “Mutations that arose during experimental evolution are regularly found in strains associated with human systemic infections.”

      We have also stated that ancestral reconstruction is needed:

      L 553 – 555: “Future experiments may include introducing these mutations into the ancestral background to directly link the mutations in these genes to evolved virulence.”

      (2) In some instances, the claims in the text are not fully supported by the visual data from the figures or are reported with vagueness. For example, the display of phenotypic clusters in the PCA (Figure 6A) and the sweeping generalization about the effect of antibiotics on the mutation rates (Figure S5) can be more precise and nuanced. Such small deviations dilute the overall argument somewhat and must be corrected.

      In reference to Fig. 6A, we have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Line 442

      In reference to Fig. S5, we conducted statistics to include both MRSA and MSSA populations and examined the effect of oxacillin on the number of mutations. While oxacillin had a significant effect on the number of mutations, we agree with the reviewer that this may be driven by the MRSA populations and have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the results of an evolution experiment where Staphylococcus aureus was experimentally evolved via sequential exposure to an antibiotic followed by passaging through C. elegans hosts. Because infecting C. elegans via ingestion results in lysis of gut cells and an immune response upon infection, the S. aureus were exposed separately across generations to antibiotic stress and host immune stress. Interestingly, the dual selection pressure of antibiotic exposure and adaptation to a nematode host resulted in increased virulence of S. aureus towards C. elegans.

      Strengths:

      The data presented provide strong evidence that in S. aureus, traits involved in adaptation to a novel host and those involved in antibiotic resistance evolution are not traded off. On the contrary, they seem to be correlated, with strains adapted to antibiotics having higher virulence towards the novel host. As increased virulence is also associated with higher rates of haemolysis, these virulence increases are likely to reflect virulence levels in vertebrate hosts.

      Weaknesses:

      Right now, the results are presented in the context of human infections being treated with antibiotics, which, in my opinion, is inappropriate. This is because

      (1) exposure to the host and antibiotics was sequential, not simultaneous, and thus does not reflect the treatment of infection, and

      (2) because the site of infection is different in C. elegans and human hosts.

      We have removed the two sentences referencing site of infection:

      Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      For our rationale for discussing humans in general, please see below.

      Nevertheless, the results are of interest; I just think the interpretation and framing should be adjusted.

      Thank you very much.

      Reviewer #3 (Public review):

      Summary:

      Su et al. sought to understand how the opportunistic pathogen Staphylococcus aureus responds to multiple selection pressures during infection. Specifically, the authors were interested in how the host environment and antibiotic exposure impact the evolution of both virulence and antibiotic resistance in S. aureus. To accomplish this, the authors performed an evolution experiment where S. aureus was fed to Caenorhabditis elegans as a model system to study the host environment and then either subjected to the antibiotic oxacillin or not. Additionally, the authors investigated the difference in evolution between an antibiotic-resistant strain, MRSA, and an isogenic susceptible strain, MSSA. They found that MRSA strains evolved in both antibiotic and host conditions became more virulent, and that strains evolved outside these conditions lost virulence. Looking at the strains evolved in just antibiotic conditions, the authors found that S. aureus maintained its ability to lyse blood cells. Mutations in codY, gdpP, and pbpA were found to be associated with increased virulence. Additionally, these mutations identified in these experiments were found in S. aureus strains isolated from human infections.

      Strengths:

      The data are well-presented, thorough, and are an important addition to the understanding of how certain pathogens might adapt to different selective pressures in complex environments.

      Thank you very much.

      Weaknesses:

      There are a few clarifications that could be made to better understand and contextualize the results. Primarily, when comparing the number of mutations and selection across conditions in an evolution experiment, information about population sizes is important to be able to calculate the mutation supply and number of generations throughout the experiment. These calculations can be difficult in vivo, but since several steps in the methodology require plating and regrowth, those population sizes could be determined. There was also no mention of how the authors controlled the inoculation density of bacteria introduced to each host. This would need to be known to calculate the generation time within the host. These caveats should be addressed in the manuscript.

      While the population sizes within hosts and generation time could be determined, we would need to conduct additional experiments (e.g., infecting nematodes with S. aureus, then crushing, plating, and counting colony forming units across time intervals) in order to obtain measurements for pathogen growth in hosts across time. For experimental evolution, we crushed a set number of dead nematodes (30) and all bacteria that were released were allowed to grow in liquid media before an aliquot (25%) was used to seed the next passage. Picking and crushing nematodes across 48 populations for one time point was an arduous task. The additional steps of picking, crushing, and plating nematodes across multiple time intervals while experimental evolution was being performed would not have been logistically feasible.

      In terms of the inoculation density of bacteria, all nematodes were placed on abundant lawns of S. aureus. Nematodes were exposed to full lawns throughout the entire infection step; bacteria remained in abundance. While we do not know the exact inoculum each individual nematode was exposed to, we know that they ingested the bacteria because of the high mortality rate. Furthermore, we followed the same procedure for every replicate across every host-associated treatment. Host individuals within and across passages were also genetically identical to one another. Altogether, these factors allowed for more consistency across the experiment, such that relative inoculum size should be similar across individual hosts. Please refer to the evolution experiment diagram (Author response image 1) for more details.

      Ultimately, while knowing the absolute population size, inoculum size, and generation time within the host is interesting, the number of rounds of selection (the number of times each population was exposed to the selective pressures) is also important in addressing our major question. Every treatment, which started out from one ancestral clone (MRSA or MSSA), was exposed to the same number of bouts of selection (passages), yet we see significant divergence in terms of traits and mutations. Future directions would certainly involve determining the number of steps (e.g., number of generations within hosts) required to reach these end points, but not knowing exactly how many steps were required does not detract from addressing the larger question of determining how pathogens respond to multiple selective pressures.
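
      For readers unfamiliar with the quantities the reviewer refers to, the following is a minimal sketch of how generations and mutation supply could be estimated for one growth step, assuming colony-forming-unit (CFU) counts at the start and end of that step were available. All numbers, function names, and the per-division mutation rate are hypothetical placeholders; none were measured in this study.

```python
import math


def generations(n_initial: float, n_final: float) -> float:
    """Number of doublings needed to grow from n_initial to n_final cells."""
    if n_initial <= 0 or n_final < n_initial:
        raise ValueError("need 0 < n_initial <= n_final")
    return math.log2(n_final / n_initial)


def mutation_supply(n_initial: float, n_final: float, mu_per_division: float) -> float:
    """Expected number of new mutations, taking cell divisions as n_final - n_initial."""
    if n_initial <= 0 or n_final < n_initial:
        raise ValueError("need 0 < n_initial <= n_final")
    return mu_per_division * (n_final - n_initial)


# Hypothetical values for a single growth step (not taken from the study).
inoculum_cfu = 1e6   # assumed CFU at the start of the step
final_cfu = 1e9      # assumed CFU at the end of the step
mu = 1e-3            # assumed mutations per genome per division

print(f"generations: {generations(inoculum_cfu, final_cfu):.2f}")
print(f"expected new mutations: {mutation_supply(inoculum_cfu, final_cfu, mu):.3g}")
```

      Under these assumptions, populations that reach larger final sizes undergo more divisions and therefore have a larger mutation supply, which is why treatment-level differences in population size matter when comparing mutation counts.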

      Another concern is the number of generations the populations of S. aureus spent either with relaxed selection in rich media or under antibiotic pressure in between the host exposure periods. It is probable then that the majority of mutations were selected for in these intervening periods between host infection. Again, a more detailed understanding of population sizes would contribute to the understanding of which phase of the experiment contributed to the mutation profile observed.

      We conducted every step of the evolution experiment on the same timeline. For example, all replicates across treatments were grown in liquid media at the same time (see Author response image 1). All populations were exposed to the same selective pressures at this step of the experiment. We can then compare populations that were subsequently exposed to hosts against those that were not. Populations passaged without a host served as the control. Mutations that were unique to host-exposed populations would more likely contribute to the traits of interest, compared to mutations that were in common between the host-exposed and no-host treatments. Similar comparisons could be made with the oxacillin-exposed and no-oxacillin populations.

      In general, the only differences between treatments would be driven by the treatments themselves. Given that we are interested in treatment-level effects, any differences in population size or generation time between treatments could contribute to the treatment effects we observe, and thus were not something we aimed to hold uniform across our experiment.
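
      The comparison between host-exposed and no-host populations described above amounts to a set difference. A minimal sketch follows, assuming each treatment is summarized as a set of mutated genes; the gene labels and the function name are hypothetical placeholders, not results from the study.

```python
def partition_mutations(treatment: set[str], control: set[str]) -> tuple[set[str], set[str], set[str]]:
    """Split mutated genes into treatment-only, shared, and control-only groups."""
    treatment_only = treatment - control   # candidates linked to the treatment
    shared = treatment & control           # likely driven by the common steps
    control_only = control - treatment
    return treatment_only, shared, control_only


# Hypothetical gene labels (placeholders only).
host_exposed = {"geneA", "geneB", "geneC"}
no_host = {"geneB", "geneD"}

host_only, shared, no_host_only = partition_mutations(host_exposed, no_host)
print("host-specific:", sorted(host_only))
print("shared:", sorted(shared))
print("no-host only:", sorted(no_host_only))
```

      The same function could be applied to oxacillin-exposed versus no-oxacillin populations by swapping the two input sets.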

      Author response image 1.

      Schematic of procedural steps involved in one passage of S. aureus through nematodes (+host -ox) compared to without nematodes (-host -ox).

      Recommendations for the authors:

      Reviewing Editor Comments:

      We encourage you to address all other comments raised by the reviewers; however, the review team has identified the following points as the most critical and fundamental to improve your manuscript:

      (i) Reframing the narrative: You will need to adjust the narrative so that the study is presented as a "proof of principle" rather than a direct simulation of a human infection.

      While we referenced human infection, we believe the study had been presented as a proof of principle. Examples include:

      (1) We discussed the gap of knowledge in the first paragraph: “It is unclear how virulence evolves in the face of more than one selective pressure and whether this trait is constrained or facilitated by antibiotic resistance.” Lines 86 – 88

      (2) In the second to last paragraph in the Introduction, we presented the main hypotheses: “Adaptation may require resources to be expended toward either virulence or antibiotic resistance, leading to a trade-off between these traits (Ferenci, 2016). Alternatively, weaker selection from sub-MIC antibiotics may interact synergistically with hosts and facilitate the evolution or maintenance of high virulence and antibiotic resistance.” Lines 176 – 179

      (3) The last paragraph concluded with “Our findings ultimately emphasize the importance of considering the host context in the evolution of antibiotic resistance. Integrating multiple traits, such as virulence, antibiotic resistance, and fitness may be critical in identifying the factors that facilitate host shifts and persistence of drug-resistant pathogens.” Lines 613 – 616

      These paragraphs, which set up the context for our work, did not primarily discuss human infections.

      In the revised manuscript, we have further tempered language regarding human infection:

      L 169 - 172: “Experimentally evolving S. aureus in C. elegans thus allows us to track the early stages of virulence and antibiotic resistance evolution in novel host populations with the potential to identify conserved genomic regions underlying evolved traits.”

      L 595 – 596: “Additional direct tests are needed to evaluate the role of these mutations in adaptation of S. aureus to different infection sites.”

      L 610 – 611: “Pathogen evolution in a tractable invertebrate animal model yielded phenotypes and genotypes similar to those identified in mammalian hosts, highlighting the utility of evolution experiments to identify potential ecological and genetic mechanisms that may give rise to pathogen traits conserved across systems.”

      And removed some references to humans:

      In the Introduction: “In the host, antibiotic concentrations will gradually decline after administration due to metabolism and excretion.”

      In the Discussion: “…in addition to infection of antibiotic-treated hosts, where there is uneven distribution of drugs across tissues.”

      Otherwise, our rationale for referencing humans/mammalian systems in our Introduction includes:

      Setting the context of our study system: we discussed humans and clinical significance when we first introduced S. aureus (lines 132 – 151) and experimental evolution (lines 153 – 172). Much of what is known about S. aureus outside the lab comes from its interactions with humans; thus, we wove in relevant information that has been discovered in other organisms.

      Hemolysis: This ability is important for S. aureus virulence toward C. elegans (Sifri et al., 2003).

      S. aureus genomic database: we intended to leverage this large-scale database of genomes isolated from S. aureus outside the lab to compare patterns emerging from experimental evolution to those in existing isolates. Due to its relevance as a major bacterial pathogen, most of the isolates happen to be from clinical settings.

      (ii) Adjusting the causal language: You will need to soften the language so that correlational claims do not appear to be causal.

      We have adjusted language as noted above.

      (iii) Clarifying methodological aspects: You will need to provide more details on the methodology, such as population sizes, and clarify the implications of these in the conclusions of the work.

      We have provided additional explanation of methodology and the role of control (no host) treatments above.

      Reviewer #1 (Recommendations for the authors):

      The paper is robust, and the study is of great significance. Tackling the subsequent issues would greatly enhance the paper and elucidate its findings.

      Major Recommendations:

      (1) Revising Causal Language: The main flaw of the manuscript lies in its presentation of correlational data as if it were causal. We highly suggest a thorough review of the text to soften causal language when connecting genotypes to phenotypes. The absence of ancestral reconstruction should be recognized as a constraint. Assertions ought to be presented as robust, evidence-based hypotheses. For instance, rather than saying a mutation "associated with significant increases in virulence," you might say "was regularly identified in groups that developed increased virulence, strongly suggesting this gene's role in the adaptation." This will more precisely clarify the contribution of the work.

      We have softened language and stated that ancestral reconstruction is needed as noted above.

      (2) Expand on Parallel Mutations: The examination of parallel evolution in Figure 4A is intriguing but would be notably stronger with additional details. I suggest including an additional supplementary figure or table detailing the specific non-synonymous mutations identified in the highly parallel genes (e.g., codY, agr, gdpP). It is essential for the reader to understand whether parallel evolution is happening at the gene level (different mutations in a single gene) or at the nucleotide level (the precise same mutation appearing again). Kindly specify if any of these mutations were nonsense mutations, as this suggests that the loss-of-function is advantageous.

      The full table of mutations is on figshare (10.6084/m9.figshare.28745558). We have added a Supplemental Table (Table S2) containing mutations in genes occurring in more than two populations. Many of these mutations were not the same, indicating parallel evolution at the gene level (lines 315 – 317).

      Minor Recommendations for Clarity and Accuracy:

      (1) Introduction:

      Lines 176-177: Please add a citation for the statement describing the function of the SCCmec cassette, as this is established knowledge.

      Done.

      (2) Results:

      Section Title (Line 254): The title "Host and sub-MIC antibiotic promoted growth..." is imprecise. Figure 3B shows that it is the combination of these factors that promotes growth in MRSA, while oxacillin alone is detrimental. Please revise the title to reflect this synergistic effect.

      “Synergistically” has been added to the title: “Host and sub-MIC antibiotic synergistically promoted growth of MRSA…” Lines 269 – 270

      Lines 261-263: The description of Figure 3B is incomplete. The text should explicitly state that the -host+ox treatment resulted in the lowest growth for MRSA, which provides a critical contrast and suggests a fitness cost.

      We have added “By contrast, exposure to sub-MIC oxacillin alone yielded the lowest growth, suggesting a fitness cost.” Lines 277 – 278

      Line 294: The claim that "Sub-MIC oxacillin selection also resulted in more mutations" is a generalization not supported for the MSSA genotype, according to Figure S5. Please revise this sentence to specify that this effect was observed in the MRSA populations.

      We have clarified: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence ( = 5.92, P = 0.015), although this is likely driven by MRSA populations.” Lines 310 – 311

      Lines 419-421: The claim that the +host+ox populations in Figure 6A "formed a distinct cluster" is an overstatement, as there is visible overlap with one other treatment (e.g., host-ox). Please revise this to more accurately describe the visual data (e.g., "clustered together, largely separating...").

      We have revised the statement as suggested: “…where populations exposed to host and sub-MIC oxacillin clustered together, largely separating from all other treatments…” Lines 442 – 443

      Lines 422-424: The interpretation of the MRSA PCA (Figure 6A) focuses on the correlation between virulence and sub-MIC growth. However, the correlation between "biofilm production" and "growth without oxacillin" appears visually stronger. Please address this correlation as well for a more complete interpretation.

      We have added “For MRSA populations, biofilm production and growth without oxacillin also appeared to be positively correlated.” Lines 447 – 448

      (3) Discussion:

      Lines 469-470: The statement that "exposure to oxacillin resulted in pathogens causing the greatest host mortality" is imprecise. The data in Figure 2A show that it is the combination of host and oxacillin. Please revise this for accuracy and add a direct citation to Figure 2A here.

      We have added clarification: “Nonetheless, we observed differing evolutionary trajectories, where exposure to oxacillin in host-associated treatments resulted in pathogens causing the greatest host mortality.” Lines 496 – 498

      Reviewer #2 (Recommendations for the authors):

      After reviewing the paper and reading the previous reviews from PLoS Biology, my biggest criticism of the paper is the way the story is told. In principle, the results are interesting and relevant, but the analogy to human infection and immune system/ antibiotic treatment strategies does not fit entirely with the experimental design or the results. I think the motivation needs to be reframed. In the study, antibiotic exposure is purely environmental, i.e., not in the host. How does environmental antibiotic use affect in vivo evolution, as this is not tested? As previous reviewers have pointed out, S. aureus is not an enteric pathogen in humans but most often causes skin infections. Furthermore, much of the results and discussion is focused on haemolysis of red blood cells, a cell type that C. elegans does not have. What the paper does present, on the other hand, and something that is interesting and novel, is a test in a model system of how a bacterial pathogen evolves to competing selection pressures. I might have hypothesised a priori that these competing pressures result in trade-offs, something which there is no evidence of, even though growth rate does not appear to be negatively impacted as a consequence of selection for drug resistance and virulence together. Instead, many traits are correlated and seemingly at the mechanistic level. This is cool and is a proof of principle, even if the system does not completely mirror reality, and I think the story should be told as such.

      We agree entirely with the reviewer that testing how pathogens respond to multiple selective pressures and the resulting lack of trade-offs are significant and interesting. We presented this question (lines 86 – 88) and our hypothesis about such trade-off in the Introduction (lines 176 – 179). As stated above, we had framed our paper to highlight these points and have removed references to antibiotic concentrations in treated humans.

      We measured and discussed hemolysis because it is important for virulence toward C. elegans (lines 195 – 197) (Sifri et al., 2003). We believe our manuscript contained a reasonable discussion of this trait. For example, three panels of the main figures presented the main hemolysis results (Figures 2B, 2C, and 2D), whereas 23 other panels did not involve hemolysis at all. In the Discussion, hemolysis took up half of the shortest paragraph (lines 509 – 519) and an additional sentence (lines 589 – 591), out of seven total paragraphs.

      Specific comments:

      (1) L137-138. Can S. aureus really survive for long periods of time outside of the host? Can you clarify this statement? Do you mean it is an opportunistic pathogen and can also replicate in the environment?

      S. aureus can form biofilms and persist for weeks on inert surfaces (Kramer et al., 2024; Tran et al., 2023), indicating that it may replicate in non-host environments. We have included the phrase “opportunistic pathogen” to clarify (line 145).

      (2) L187 - to ascertain

      Corrected.

      (3) Figure 2B - there seems to be a benefit of haemolysis activity to oxacillin resistance, perhaps a crossover in mechanism? In MSSA, without a host, it goes to complete fixation, whereas it is completely lost when antibiotics aren't present. I know this is discussed later, but I would appreciate a more detailed hypothesis of why this could be.

      Antibiotics have been found to induce expression of virulence traits, such as in the case of oxacillin and hemolysis. Thus, it is reasonable that exposure to oxacillin during evolution would maintain MSSA’s hemolytic ability. We hypothesize that the loss of hemolysis in the absence of oxacillin may be due to the cost of hemolysis expression. Without a stimulant (oxacillin), hemolysis may not be expressed as often and may be subject to deleterious mutations. Alternatively, the stress that cells were under may have favored virulence in some way, rather than the direct action of the antibiotic.

      (4) L225-228 - As C. elegans do not have red blood cells, why would we expect this? Do you see increased lysis of C. elegans gut cells? Or could it be due to iron accumulation as you are growing the staph on BHI?

      We measured and correlated nematode mortality with hemolytic ability because hemolysis had been found to be involved in virulence toward C. elegans (Sifri et al., 2003). The hemolysis phenotype is a surrogate for S. aureus virulence gene expression.

      (5) Figure 3A - There seems to be a growth cost of evolving oxacillin resistance in the absence of a host. Why might this be?

      MRSA populations exposed to oxacillin without a host during evolution visually exhibited the lowest growth rate. While this is an interesting question, the result was not statistically significant, so we cannot speculate in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Some claims in the introduction are either uncited or incorrectly stated. The second sentence has a claim about the interplay between antibiotic resistance and virulence with no citation listed. Additionally, there is a claim about S. aureus "evading detection" by attacking the host's immune cells. That is by definition not avoiding detection. Perhaps phrasing it as resisting host immune function would make it clearer.

      We have added a citation (lines 80 – 81) and clarified our wording: “Once inside the host, S. aureus resists host immune function by hindering or lysing immune cells.” Lines 140 – 141

      (2) Once in the introduction and once in the discussion, the authors referred to S. aureus as a novel pathogen for C. elegans; I do not think enough is known to make this statement.

      This S. aureus strain is novel because it was isolated from humans, so at least in its recent evolutionary past, it has not interacted with C. elegans. Furthermore, we used a C. elegans isolate (N2) that had been frozen and maintained in the lab on E. coli, and had not been exposed to other microbes in its recent evolutionary past. Finally, S. aureus has not been found to be a native pathogen of C. elegans in nature (Ekroth et al., 2021).

      (3) Key suggestion: Change Figure 1C to reflect the design better. So you could have the +OXA before the host and then have an arrow looping back again to show the cycle of each step. So a figure that would have something like: MRSA > +OXA > +host > +OXA > MRSA.

      We have updated the figure as suggested.

      (4) Suggest changing "greatest" on line 191, section header to greater.

      Done.

      (5) Line 258: Rich media can still provide selective pressures that are difficult to quantify - fast growth, cofactor and other nutrient limitations due to that fast growth

      We have adjusted our wording: “Importantly, rich media reduced the risk of introducing additional selective pressures than those being tested.” Lines 273 – 274

      (6) Why were intergenic mutations routinely ignored? These can often be very important phenotypically.

      We had focused on genes because there was a sufficient number of genes to discuss, but we have added a Supplemental Table (Table S2) containing all mutations (including intergenic and synonymous) appearing in more than 2 populations. We have also added information regarding mecA, an accessory gene, highlighting the role non-core genes may have in shaping bacterial evolution:

      “Despite evolving in similar environments, MRSA and MSSA populations differing only in the presence of an intact accessory gene (mecA)—proceeded on divergent evolutionary paths…” Lines 66 – 68

      “Carriage of Staphylococcal cassette chromosome mec (SCCmec), which encodes mecA, an accessory gene that provides resistance…” Lines 187 – 188

      “As MRSA and MSSA only differed in the presence of an intact mecA gene at the start of the experiment, accessory genes may play important roles in shaping bacterial evolution (Jackson et al., 2011).” Lines 472 – 474

      (7) Line 294: more mutations than what?

      We have clarified the sentence: “Sub-MIC oxacillin selection also resulted in more mutations than in its absence…” Lines 310 – 311

      (8) Lines 295-297: wording is pretty confusing. It seems that the discussion is about increased mutation rates, possibly due to hypermutators resulting from mutL or recA mutations, but this isn't well-thought out and much is implied here. Furthermore, see the above comment about comparing mutations across conditions - it's hard to make inferences of mutation rates without knowing the mutation supply as a result of varying population sizes across conditions and through the experiment.

      We have clarified the sentence: “…there were only two mutations in DNA and mismatch repair genes (mutL and recA), suggesting repair genes were not the sole mechanism involved.” Lines 313 – 314

      Because all populations evolved from one ancestral clone (either MRSA or MSSA), all mutations that are found at the end of the experiment would have arisen de novo from that ancestor. Since all populations experienced the same number of passages/rounds of selection, we determined mutation rate by counting the number of mutations that were found at the last passage for each replicate population. Populations that acquired significantly more mutations had a higher mutation rate in terms of # of mutations/# of selection rounds.
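
      As a minimal sketch of this calculation, assuming hypothetical mutation counts per replicate population and a hypothetical number of passages (neither is taken from the study), the number of mutations accumulated per selection round could be computed as follows. This quantity is a per-passage accumulation, not a per-generation mutation rate.

```python
from statistics import mean


def mutations_per_passage(mutation_counts: list[int], n_passages: int) -> list[float]:
    """Mutations at the final passage divided by the number of selection rounds."""
    if n_passages <= 0:
        raise ValueError("n_passages must be positive")
    return [count / n_passages for count in mutation_counts]


# Hypothetical end-point mutation counts per replicate population (placeholders only).
counts_by_treatment = {
    "+host +ox": [12, 10, 14],
    "-host -ox": [4, 6, 5],
}
n_passages = 10  # assumed number of selection rounds, identical across treatments

for treatment, counts in counts_by_treatment.items():
    rates = mutations_per_passage(counts, n_passages)
    print(f"{treatment}: mean {mean(rates):.2f} mutations per passage")
```

      Because every population experienced the same number of passages, dividing by that number rescales the counts without changing their ranking across treatments.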

      (9) Line 486: typo "Mutations genes".

      Corrected.

      (10) Line 487: "antibiotics may allow" is awkward; suggest changing to more precise language, possibly relating to pleiotropy if that is what was meant here.

      We had intended to mean “adaptation [to antibiotics] may allow”. We have clarified: “Mutations in genes involved in resistance to antibiotics were found more often in populations with increased virulence, suggesting that antibiotic adaptation may also favor evolution of virulence.” Lines 514 – 516

      REFERENCES

      Ekroth AKE, Gerth M, Stevens EJ, Ford SA, King KC. 2021. Host genotype and genetic diversity shape the evolution of a novel bacterial infection. ISME Journal 15:2146–2157. DOI: https://doi.org/10.1038/s41396-021-00911-3, PMID: 33603148

      Kramer A, Lexow F, Bludau A, Köster AM, Misailovski M, Seifert U, Eggers M, Rutala W, Dancer SJ, Scheithauer S. 2024. How long do bacteria, fungi, protozoa, and viruses retain their replication capacity on inanimate surfaces? A systematic review examining environmental resilience versus healthcare-associated infection risk by “fomite-borne risk assessment.” Clinical Microbiology Reviews. PMID: 39388143

      Sifri CD, Begun J, Ausubel FM, Calderwood SB. 2003. Caenorhabditis elegans as a model host for Staphylococcus aureus pathogenesis. Infection and Immunity 71:2208–2217. DOI: https://doi.org/10.1128/IAI.71.4.2208-2217.2003, PMID: 12654843

      Tran NN, Morrisette T, Jorgensen SCJ, Orench-Benvenutti JM, Kebriaei R. 2023. Current therapies and challenges for the treatment of Staphylococcus aureus biofilm-related infections. Pharmacotherapy 43:816–832. DOI: https://doi.org/10.1002/phar.2806, PMID: 37133439

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies using ribosome/phage display generated against the Bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis deficient mutant of SMC so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for Smc activity perturbation. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions, as well as chromosome segregation defects. The authors use a clever approach by making chimeras between Bacillus and S. pneumoniae Smc to narrow down to specific regions within the Bacillus Smc coiled-coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And do the sybodies affect the interaction of ScpAB with SMC?

      It is understandable that such measurements for 14 sybodies are challenging, and not essential for this study. Nonetheless, it would be informative to have biochemical characterization of sybody interaction with the Smc-ScpAB complex for at least 1-2 candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not trivial, even for a smaller selection of sybodies. We have now incorporated ELISA data as new Table S1, which shows that most sybodies support clear binding to Smc-ScpAB. Curiously, while (only) some sybodies show a clear preference for ATP-bound or unbound Smc, this is not a strong predictor of the strength of phenotype observed in vivo. We have also attempted to characterize the binding of Smc to sybodies by other methods including pull-downs, cross-linking, and by biophysical methods (GCI). However, we prefer to not include these data as the outcomes are not clear due to inconsistencies in the behaviour of purified sybodies.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled-coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts ScpAB interaction (as data ruling this possibility out has not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface that we have confidently identified by the mapping using chimeric proteins. Indeed, many ways of binding are possible, including disruption of ScpAB interaction. However, since the mapped binding sites are located on the SMC coiled coils, the latter scenario seems unlikely and would be an indirect consequence of altered coiled coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with GFP and performed live cell imaging. This shows that sybodies without strong phenotypes are similarly expressed, at least at low inducer concentration. Moreover, many sybodies localize as foci in the cell, presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We have included example data in the revised version of the manuscript as Figure S4 and Figure S5. Notably, a sybody (Sb007) with a weak growth phenotype shows focal localization at low inducer concentration and high expression levels when fully induced, comparable to sybodies with strong phenotypes. Altogether, this suggests that the lack of phenotype is not due to absence of sybody expression or localization.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether the sybody action results in an smc-null effect or specifically an ATP hydrolysis deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple, Smc conformations. Consistent with this idea, the phenotypes of sybody expression are similar to those of the null mutant rather than the ATP-hydrolysis defective EQ mutant, which displays even more severe growth phenotypes in B. subtilis. To highlight this point, we have added the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state.”

      “ELISA data revealed that nearly all clones bind purified Smc-ScpAB (Table 1). However, the ELISA signals of only few Sybodies showed clear dependence on the presence or absence of ATP and DNA (Table S1).”

      Significance:

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well-designed. The findings from the study will be significant to the broad field of genome biology, synthetic biology and also SMC biology. Specifically, the coiled coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function and, in parallel, demonstrated the potential of synthetic sybodies/designed binders for inhibition of protein activity.

      Reviewer #2 (Public review):

      Summary:

      Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      Significance:

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Reviewer #3 (Public review):

      Summary:

      Gosselin et al. use the sybody technology to study effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of the single domain antibodies, using the "transition state" mutant Smc. They identify 14 such mutant sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments, but in my view mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single - and catalytically trapped form of SMC. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it is biassed towards sybodies that bind to Smc in a quite special way, it seems. Using wild type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      The reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on this point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this is not expected to restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). Notably, the effect size of ATP/DNA on ELISA signals was not a strong predictor of the strength of phenotypes observed in vivo. The text has been revised accordingly. See line 84 and line 92.

      We are thus quite confident, based on prior work (and on the now included ELISA data), that the Smc ATPase mutation did not strongly bias the selection in one way or another. The surprising bias towards coiled coil binding sites likely has other explanations, as these sites likely form a preferred epitope recognized by sybodies from the loop library.

      Line 105: Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis. This could be tested using Western blotting - I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results of two of the three libraries but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then sybodies are likely ineffective in inactivating Smc in B. subtilis, with the notable exceptions of the sybodies that we have isolated and characterized in this manuscript. We have added this notion to the manuscript.

      Fig. 2B: it is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show there is a clear segregation defect when sybodies are expressed. I believe the authors should state, though, that this is not a replication block, but a failure to segregate origins.

      We agree that this is an important point. We have added the following statement to clarify this point: “These elongated cells are known to harbour expanded nucleoids, consistent with delayed oriC separation rather than delayed DNA replication”

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised why the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts of testing direct binding with mixed outcomes and decided to not include those results in the light of the stronger and more relevant in vivo mapping. However, we have added ELISA results (new Table S1) that support a direct interaction.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, rather indirect experiments. This leads to the point “Revealing Smc arm dynamics through synthetic binders” in the discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: “More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells.” This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation; at least this is not clearly shown.

      We agree and have simplified the statement by removing “stabilize” and “transient”.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are less well represented in the SMC literature (when compared to the heads and hinge domains, for example), likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Significance:

      The work describes the gaining and use of single-binder antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      Strengths:

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      Weaknesses:

      The study primarily presents descriptive observations and includes limited quantitative analyses or genetic modifications. Molecular mechanisms are typically interrogated through the use of pharmacological inhibitors rather than genetic approaches. Furthermore, the precise semantic distinction between JAIL and JBL requires additional clarification, as current evidence suggests their biological relevance may substantially overlap.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations. We have expanded our discussion on the distinction between JAIL and JBL and hope that this will clarify why – in our opinion – these terms should be used differentially in different cell biological contexts (see below and lines 348-374 in the manuscript).

      Reviewer #2 (Public review):

      Summary:

      In Maggi et al., the authors investigated the mechanisms that regulate the dynamics of a specialized junctional structure called junction-based lamellipodia (JBL), which they have previously identified during multicellular vascular tube formation in the zebrafish. They identified the Arp2/3 complex to dynamically localize at expanding JBLs and showed that the chemical inhibition of Arp2/3 activity slowed junctional elongation. The authors therefore concluded that actin polymerization at JBLs pushes the distal junction forward to expand the JBL. They further revealed the accumulation of Myl9a/Myl9b (marker for MLC) at the junctional pole, at interjunctional regions, suggesting that contractile activity drives the merging of proximal and distal junctions. Indeed, chemical inhibition of ROCK activity decreased junctional mergence. With these new findings, the authors added new molecular and cellular details into the previously proposed clutch mechanism by proposing that Arp2/3-dependent actin polymerization provides pushing forces while actomyosin contractility drives the merging of proximal and distal junctions, explaining the oscillatory protrusive nature of JBLs.

      Strengths:

      The authors provide detailed analyses of endothelial cell-cell dynamics through time-lapse imaging of junctional and cytoskeletal components at subcellular resolution. The use of zebrafish as an animal model system is invaluable in identifying novel mechanisms that explain the organizing principles of how blood vessels are formed. The data is well presented, and the manuscript is easy to read.

      Weaknesses:

      While the data generally support the conclusions reached, some aspects can be strengthened. For the untrained eye, it is unclear where the proximal and distal junctions are in some images, and so it is difficult to follow their dynamics (especially in experiments where Cdh5 is used as the junctional marker). Images would benefit from clear annotation of the two junctions. All perturbation experiments were done using chemical inhibitors; this can be further supported by genetic perturbations.

      We have added annotations to several figures and paid particular attention to the proximal and distal junctions.

      We have previously analyzed the effects of different ve-cadherin (cdh5) mutant alleles on EC rearrangements (Paatero et al., 2018; Sauteur et al., 2014). These mutants show complex defects (e.g. hypersprouting, reduced contact inhibition during anastomosis) in EC behavior early in vascular tube formation. We find that analysis of JBL dynamics and function is very difficult in such situations. The use of small molecule inhibitors allows acute interventions within a defined time window and avoids the pleiotropic effects of genetic ablations.

      Reviewer #3 (Public review):

      The paper by Maggi et al. builds on earlier work by the team (Paatero et al., 2018) on oriented junction-based lamellipodia (JBL). They validate the role of JBLs in guiding endothelial cell rearrangements and utilise high-resolution time-lapse imaging of novel transgenic strains to visualise the formation of distal junctions and their subsequent fusion with proximal junctions. Through functional analyses of Arp2/3 and actomyosin contractility, the study identifies JBLs as localized mechanical hubs, where protrusive forces drive distal junction formation, and actomyosin contractility brings together the distal and proximal junctions. This forward movement provides a unique directionality which would contribute to proper lumen formation, EC orientation, and vessel stability during these early stages of vessel development.

      Time-lapse live imaging of VEC, ZO-1, and actin reveals that VEC and ZO-1 are initially deposited at the distal junction, while actin primarily localizes to the region between the proximal and distal sites. Using a photoconvertible Cdh5-mClav2 transgenic line, the origin of the VEC aggregates was examined. This convincingly shows that VE-cadherin was derived from pools outside the proximal junctions. However, in addition to de novo VEC derived from within the photoconverted cell, could some VEC also be contributed by the neighbouring endothelial cell to which the JBL is connected?

      Yes, the green (non-converted) VE-cadherin can indeed originate from either of the two cells. The main point we want to make, based on our observations, is that the red (converted) VE-cadherin from the proximal junction (as defined by the ROI) does not contribute to the distal junction.

      As seen for JAILs in cultured ECs, the study reveals that Arp2/3 is enhanced when JBLs form by live imaging of Arpc1b-Venus in conjunction with ZO-1 and actin. Therefore Arp2/3 likely contributes to the initial formation of the distal junction in the lamellipodium.

      Inhibiting Arp2/3 with CK666 prevents JBL formation, and filopodia form instead of lamellipodia. This loss of JBLs leads to impaired EC rearrangements.

      Is the effect of CK666 treatment reversible? Since only a short (30 min) treatment is used, the overall effect on the embryo would be minimal, and thus washing out CK666 might lead to JBL formation and normalized rearrangements, which would further support the role of Arp2/3.

      We have performed washout experiments and find that the ectopic filopodia disappear when the inhibitor is removed. This experiment is shown in supplementary Figure 3 and supplementary Movies 12 and 13.

      From the images in Figure 4d it appears that ZO-1 levels are increased in the ring after CK666 treatment. Has this been investigated, and could this overall stabilization of adhesion proteins further prevent elongation of the ring?

      This is an interesting thought and we have taken a closer look. There is quite a bit of sample-to-sample variation in the ZO1 signal. The quantification (Author response image 1) indicates that there is no increase in the CK666 treated embryos on average.

      Author response image 1.

      To explore how the distal and proximal junctions merge, spatiotemporal imaging of Myl9 and VEC is conducted. It indicates that Myl9 is localized at the interjunctional fusion site prior to fusion. This suggests pulling forces are at play to merge the junctions, and indeed Y-27632 treatment reduces or blocks the merging of these junctions.

      For this experiment, a truncated version of VEC was used, which lacks the cytoplasmic domain. Why have the authors chosen to image this line, since lacking the cytoplasmic domain could also impair the efficiency of tension on VEC at both junction sites? This is as described in the discussion (lines 328-332).

      This line was used because it labels the entire JBL protrusion more clearly. We have also included an example using the VE-cad-Venus line (supplementary Figure 4b), which shows a Myl-Cherry pattern consistent with the other examples.

      Since the time-lapse movies involve high-speed imaging of rather small structures, it is understandable that these are difficult to interpret. Adding labels to indicate certain structures or proteins at essential timepoints in the movies would help the readers understand these.

      We have added annotations and labels to all movies. We have also improved annotations in several figures (i.e. Figs. 1, 2, 5, 6 and 7).

      Recommendations for the authors:

      Reviewing Editor Comments:

      Overall, the reviewers are supportive of the manuscript but identify a number of areas where the clarity of the presented data could be improved, and further quantification could be provided to strengthen your conclusions. We would encourage you to address these minor concerns as best you can and to consider the recommendations of all three reviewers when deciding how to revise your manuscript.

      Reviewer #1 (Recommendations for the authors):

      Lumen formation is a fundamental morphogenetic event essential for the function of all tubular organs, notably the vertebrate vascular network, where continuous and patent conduits ensure blood flow and tissue perfusion. The mechanisms by which endothelial cells organize to create and maintain luminal space have historically been categorized into two broad strategies: cell shape changes, which involve alterations in apical-basal polarity and cytoskeletal architecture, and cell rearrangements, wherein intercellular junctions and positional relationships are remodeled to form uninterrupted conduits. The study presented here focuses on the latter process, highlighting a unique morphogenetic module, junction-based lamellipodia (JBL), as the driver for endothelial rearrangements.

      JBL are described as oscillating membrane protrusions emerging at endothelial junctions, operating in a ratchet-like manner to mediate convergent cell movements. This ratchet mechanism allows endothelial cells to approach each other, thereby aligning and joining local luminal segments into a continuous vascular structure. The study employs in vivo high-resolution time-lapse imaging, a technically demanding method that captures spatiotemporal dynamics of cytoskeletal and adhesion complexes during JBL activity with unprecedented detail.

      The key mechanistic insight from this work is the requirement of the Arp2/3 complex, the classical nucleator of branched actin filament networks, for JBL protrusion. This implicates Arp2/3-mediated actin polymerization in pushing force generation, enabling plasma membrane advancement at junctional sites. The dependence on Arp2/3 positions JBL within the family of lamellipodia-like structures, but the junctional origin and function distinguish them from canonical, leading-edge lamellipodia seen in cell migration.

      An intriguing observation is that a novel junction arises at the distal pole of a JBL. This distal junction is formed from a pool of VE-cadherin that is spatially redistributed from regions outside the initial JBL domain. The distal junction then merges with the proximal junction through a process dependent on actomyosin contractility, as was judged by Myl9 recruitment.

      The alternation between pushing forces (Arp2/3-dependent JBL protrusion) and pulling forces (actomyosin-driven junction fusion) defines JBL as a bidirectional mechanical module. Inhibition of actomyosin prevents merging of proximal and distal junctions, thereby stalling lumen continuity. This two-phase system, actin-based extension followed by actomyosin-mediated constriction, ensures both elongation and maturation of endothelial arrangements, ultimately securing vascular patency.

      This manuscript represents a robust and thoughtfully executed study that advances our understanding of lumen formation during vascular development. The overarching conclusions are well substantiated, and the results section provides a clear and detailed exposition of the key findings. I appreciate the explanatory movie at the end. Nevertheless, I offer several remarks for further improvement:

      (1) The fluorescent images presented are visually compelling, yet lack quantitative analysis in the initial figure. Although quantification is included in Figure 3, it is advisable to incorporate this analysis into Figure 1 as well. Early presentation of quantification will help the reader to appreciate the impact and significance of the findings from the outset.

      We appreciate the reviewer’s suggestion and have now added line graphs to measure the spatiotemporal intensities of the Utrophin and ZO-1 reporters in Figure 1b. These measurements demonstrate the sequence of F-actin protrusion and subsequent junctional movement. In Figure 1a, we have added a double-headed arrow which shows the overall movement of the junction towards the dorsal side of the forming DLAV.

      (2) For the fluorescence images, further quantitative analysis of membrane overlap, either in terms of width or pixel overlap, would enhance the rigor of the study. Temporal quantification of overlap may provide valuable insights into the stability and reproducibility of the process across experimental replicates.

      JBL are quite heterogeneous with respect to size, shape and dynamics, which makes quantifications of membrane overlap (JBL size) across experimental replicates difficult. We have published some quantifications on JBL orientation and oscillation in our previous paper (Paatero et al., 2018, Nat. Commun., Figures 1 and 2), which are in agreement with our current study.

      (3) When referencing the role of Arp2/3, the authors employ an ArpC1b transgenic fish. The results section should thus specifically address the involvement of ArpC1b, rather than generalizing to Arp2/3. In the discussion, it would be appropriate to speculate on the potential involvement of the complete Arp2/3 complex. Notably, the use of CK is acknowledged as a broadly accepted inhibitor of actin polymerization.

      As ArpC1b is a subunit of an active Arp2/3 complex (Padrick et al., 2011), we have used ArpC1b-Venus as a readout for Arp2/3 localization. The construct has been validated before in cell culture (Law et al., 2021) as well as in zebrafish (Malchow et al., 2024), and the spatiotemporal distribution of the reporter was shown to be consistent with that of the Arp2/3 complex. We state this in the results section (lines 173-178) and subsequently use the term Arp2/3 to facilitate reading of the text. In the corresponding figure legends, we maintain the term ArpC1b. CK666 interferes with the dimerization of Arp2 and Arp3 subunits and thus prevents activity of the Arp2/3 complex.

      (4) The discussion regarding JAIL versus JBL involvement remains challenging to interpret. If JAIL structures arise from the loss of cell-cell contacts, both JAIL and JBL resemble membrane protrusions and are likely governed by similar molecular mechanisms, predominantly actin polymerization and Arp2/3 activity, with probable contribution from Rac1 signaling. The precise semantic distinction between JAIL and JBL warrants further clarification, as their biological relevance may be overlapping.

      We agree with the reviewer. Below we outline the reasons why lamellipodial protrusions that emanate from cell-cell junctions should not be indiscriminately called JAIL, but that JAIL and JBL constitute different cellular activities acting in different tissue contexts. We have modified the text in the Discussion (lines 348-374).

      (1) JAIL have originally been described in cell culture experiments (Abu-Taha et al., 2014). According to this and subsequent papers by the same group, local dissolution of endothelial adherens junctions (i.e. downregulation of VE-cadherin) triggers the formation of lamellipodia-like structures. These protrusions eventually retract, followed by the reestablishment of EC junctions.

      (2) In our in vivo studies, we observed lamellipodial protrusions during endothelial cell rearrangements, and we call these structures JBL (Paatero et al., 2018). While JBL appear very similar to JAIL in general (i.e. regulation by Arp2/3 and its localization), we also observe two critical differences. For one, JBL form while maintaining the original (proximal) junction. Moreover, a distal junction is formed at the front edge of the JBL, leading to a “double junction” configuration. In our current manuscript, we have examined the role of actomyosin contractility and find that it correlates with and is required for the merging of proximal and distal junctions during JBL cycles. These observations indicate that the proximal and distal junctions are essential components of JBL function during endothelial cell elongation and rearrangements. These salient and distinct features prompted us to adopt the term junction-based-lamellipodia (JBL), in order to differentiate them from JAIL.

      (3) We would like to argue that JAIL and JBL represent similar but distinct lamellipodia-like protrusions. JAILs are associated with the maintenance of endothelial integrity, and control permeability and trans-endothelial cell migration, as has been suggested by several publications (Cao et al., 2017; Kipcke et al., 2025; Seebach et al., 2021; Taha et al., 2014). In contrast, JBL drive cell rearrangements by step-wise elongation of cell junctions, leading to convergent cell movements.

      (4) Although JAIL have also been implicated in endothelial cell migration (Cao and Schnittler, 2019; Cao et al., 2017; Seebach et al., 2021), neither junctional patterns nor junctional dynamics have been analyzed in this context. We therefore propose that JAIL and JBL are actin-based protrusions forming at endothelial cell-cell junctions, but act in different contexts to provide cell motility (JBL) or endothelial integrity (JAIL).

      (5) Some of the quantification plots, specifically in figures 5d and 6c, do not display significant differences or distribution patterns. It would be beneficial to revise these graphs to clearly represent statistical significance and underlying data distributions.

      Because of the spatiotemporal heterogeneity, it is difficult to perform statistical quantifications across samples. In Figure 5c/d, we have imaged/analyzed myl9-EGFP in a mosaic situation, in which only one of the interacting cells expresses high levels of myl9-EGFP. This is a rare situation and we managed to image only this example. Nevertheless, it is consistent with our other expression data of myl9-reporters and also with our previous photoconversion experiments using photoconvertible UCHD (Paatero et al., 2018, Figure 4), which show that actin-rich JBL form at the front end of the endothelial cell in the direction of junction elongation. In Figure 5d, we have quantified the average intensity of GFP signal within the region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that the signal is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.

      (6) The observation of myosin recruitment does not inherently imply a concomitant increase in actomyosin contractile activity. The inclusion of phospho-MLC staining would considerably strengthen the evidence for enhanced actomyosin activity.

      This is a good suggestion and we have extensively tried different anti-P-Myl antibodies (and protocols), but did not get them to work reliably on zebrafish embryos. We therefore rely on published work that has established the correlation between the recruitment of myosin light chain and increased actomyosin tension (Fernandez-Gonzalez et al., 2009; Munjal et al., 2015).

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1a is not described/mentioned in the Results.

      We have corrected this (lines 102-108). We have also added measurements to better present the different dynamics of F-actin (UCHD) and ZO1 within the JBL and the relative endothelial cell movements (see Figure 1b), as suggested by reviewer #1.

      (2) In Figure 3a, the authors claim that Arp2/3 is deposited at the distal side of the junction ring. While it is clear where the proximal junction is (ZO1-rich), the distal junction is less so (hardly any ZO1). It is therefore difficult to agree based on this time-lapse imaging that Arpc1b-Venus is at the distal junction. Can the authors please include panels showing merged channels and annotate where the proximal and distal junctions are?

      The activation of the Arp2/3 complex and the formation of the distal junction are sequential events. We see that ArpC1b oscillates with an accumulation at the onset and during JBL protrusion. In contrast, the distal junction is formed when the protrusive activity has been stopped. One caveat of the analysis shown in Figure 3a is that our ZO1 reporters label the distal junction only very weakly – this is in particular the case for the ZO1-tdTomato knock-in. The distal junction is better visible in VE-cadherin and UCHD reporters, as shown in Figures 5 to 7.

      (3) In Figures 3b and c, it is also difficult to distinguish proximal and distal junctions in these images. Please mark the boundaries in the image panels (Figure 3b) and indicate on the x-axis where the proximal and distal junctions are (Figure 3c).

      In Figure 3b, we show ArpC1b-Venus and mRuby-UCHD side-by-side. This Figure demonstrates that the Arp2/3 complex maintains its position at the front of the JBL during the protrusive phase (always distal to the UCHD signal). The imaging is done at very short intervals (1/30sec), which makes it difficult to follow entire oscillations due to photo-bleaching of the ArpC1b reporter.

      (4) The treatment of CK666 resulted in perturbed localization of Arpc1b-Venus. Therefore, the inhibition of junctional elongation can also be explained by the mislocalization of Arp2/3, rather than the inhibition of Arp2/3 activity at the junctions. Can the authors discuss this or perform another experiment that is more specific to manipulating Arp2/3 activity?

      CK666 is a well-established inhibitor of Arp2/3. Structural and functional analyses have shown that CK666 interferes with the interaction between Arp2 and Arp3, thereby preventing the activation of the complex (Hetrick et al., 2013; Padrick et al., 2011). We therefore conclude that the phenotypes we observe in CK666 treatment are due to Arp2/3 inhibition.

      It is possible that CK666 prevents ArpC1b binding to the Arp2/3 complex. However, published work suggests that ArpC1b can bind to Arp2/3 also in its inactive state (Chou et al., 2022). Thus, we can only speculate why we lose localization of ArpC1b under CK666. We prefer not to do so.

      (5) In Figures 5d and 6c, is the quantification of Myl9 intensity of one cell only? If so, can the authors show the dynamics of average Myl9 intensity i) between forwarding and non-forwarding JBL poles and ii) as the proximal and distal junctions merge several endothelial cells?

      Figure 5c/d depicts two interacting cells, expressing different levels of Myl9a-EGFP. This is a rare experimental situation and we managed to image only this example. We quantified the average signal at both poles of the junctional ring within a region of interest. The newly added error bars indicate the standard deviation between pixel intensities within the ROI. The analysis has been done on immunofluorescent images, therefore a dynamic analysis over time is not possible.

      In Figure 6c, we have analyzed the Myl9b-mCherry intensities and find that Myl9b-mCherry is redistributed during a JBL cycle. The spatial distribution is evident from the heat-map and we have not included a standard deviation. Myl9b-mCherry levels are very heterogeneous, and it is not possible to quantify intensities across samples. We have, however, included four more examples of Myl9b-mCherry distribution in Supplementary Figure 4. The patterns observed in these samples are consistent with those in Figure 6.

      (6) Figure 5. The 'f' in the figure legend should be 'e' since there is no panel 'f'.

      We have corrected this.

      (7) Figure 7. As the boundaries for proximal and distal junctions are not always clear, especially when Cdh5 appears as clusters, how do you determine where the two junctions are in order to measure the interjunctional space? Please offer a clearer explanation in the Methods.

      We have added the following in the M&M. “Junctional merging tracking: Speed of junctional merge was evaluated by monitoring isolated junctional rings during DLAV formation. Inhibitor treatment: Y-27632 (75 μM) or DMSO (1%) was applied 30 min before mounting. The same concentrations of chemicals were applied to the low-melting-point agarose mounting medium and the E3 medium on top of it before imaging the junctions for 10-15 min on an Olympus SpinSR spinning disc microscope. Distances were measured using Fiji software. In each frame, the interjunctional distance was defined as the maximum distance between the proximal and distal junctions. A line was manually drawn between the proximal and distal junctions in Fiji, and its length was recorded. The same proximal and distal junction landmarks were used consistently across all time points.”
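
      The measurement described in this Methods text can be sketched in a few lines. This is an illustration only: the line was drawn manually in Fiji, and the Methods text does not state how a speed was derived from the per-frame distances. The helper names, the pixel size parameter, and the use of a least-squares slope as the merge speed are therefore assumptions.

```python
import numpy as np


def line_length_um(start_xy, end_xy, pixel_size_um=1.0):
    """Length of a straight line between two landmarks, converted to micrometres.

    start_xy, end_xy: (x, y) pixel coordinates of the proximal and distal
    junction landmarks (hypothetical representation of the manually drawn line).
    """
    dx = end_xy[0] - start_xy[0]
    dy = end_xy[1] - start_xy[1]
    return float(np.hypot(dx, dy) * pixel_size_um)


def merge_speed_um_per_min(times_min, distances_um):
    """Estimate the speed at which the interjunctional distance shrinks.

    Assumption: speed is taken as the negative slope of a straight-line
    fit of distance against time.
    """
    times = np.asarray(times_min, dtype=float)
    distances = np.asarray(distances_um, dtype=float)
    if times.size < 2 or times.size != distances.size:
        raise ValueError("need at least two matched time/distance points")
    slope, _intercept = np.polyfit(times, distances, 1)
    return float(-slope)
```

      A positive value indicates that the proximal and distal junctions are approaching each other; a value near zero would correspond to stalled merging.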

      (8) One would think that upon the inhibition of junctional mergence (by ROCK inhibition), actin polymerization would persist to push the distal junction forward to elongate the JBL. However, there is instead a decrease in junctional elongation (Figure 7b). Can the authors speculate why? Additionally, junction elongation can probably be achieved by continuous "pushing" of the distal junction alone (through actin polymerization). Can the authors speculate why there is a need/what is the benefit of merging proximal and distal junctions for junction elongation?

      These are all very interesting questions, but they are quite complex and would require extensive and speculative answers, which is outside the scope of this study. Nevertheless, here are a few quick thoughts on these issues.

      (1) When endothelial cells elongate, they have to overcome tensile forces at the junctions (generated by the subjunctional actomyosin belt). JBL are providing a tractive and deforming force, which overcomes the tensile force and thus promotes junctional elongation.

      (2) The distal junction is then providing an anchor to which the actin cytoskeleton can attach. The space between proximal and distal junction becomes a compartment of local actomyosin contraction, which provides the force for the ratchet to move the proximal junction forward → junctional mergence.

      (3) Thus, it is not the protrusion (pushing) itself that elongates the cell but the elongation of the junction (driven by actomyosin contraction)!

      (4) The maintenance of the proximal junction is most likely needed to ensure endothelial integrity during the JBL cycles.

      (5) How the frequency and the size of JBLs are regulated is not known. One possible player that might be involved is an internal clock mechanism (e.g. a feedback loop via small GTPases (such as Rac) → Arp2/3 regulation). Another possibility is that JBL size is limited by it sweeping up basally localized VE-cadherin (in cis-configuration). Increasing cell-cell adhesion (by VE-cad trans-interactions between the JBL and the underlying cell) eventually stops the protrusion. It is also possible that a cell-autonomously controlled mechanism of F-actin polymerization (actin pulses) is involved in the regulation of the JBL cycle length.

      (9) The animation showing the molecular mechanism of JBL function during endothelial junction elongation (Video 25) is very helpful in understanding the dynamic coupling between junctional proteins, actomyosin cytoskeleton, and junction remodelling. However, I wonder why there are no Myosin II proteins binding to the actin bundles during the merging of proximal and distal junctions (between 0:25 and 0:28), since this is one of the main findings by the authors in this study.

      Since we show two JBL cycles, we want to spread the information over both of them.

      References:

      Cao, J. and Schnittler, H. (2019). Putting VE-cadherin into JAIL for junction remodeling. J. Cell Sci. 132.

      Cao, J., Ehling, M., März, S., Seebach, J., Tarbashevich, K., Sixta, T., Pitulescu, M. E., Werner, A. C., Flach, B., Montanez, E., et al. (2017). Polarized actin and VE-cadherin dynamics regulate junctional remodelling and cell migration during sprouting angiogenesis. Nat. Commun. 8, 1–20.

      Chou, S. Z., Chatterjee, M. and Pollard, T. D. (2022). Mechanism of actin filament branch formation by Arp2/3 complex revealed by a high-resolution cryo-EM structure of the branch junction. Proc. Natl. Acad. Sci. U. S. A. 119, e2206722119.

      Fernandez-Gonzalez, R., Simoes, S. de M., Röper, J. C., Eaton, S. and Zallen, J. A. (2009). Myosin II Dynamics Are Regulated by Tension in Intercalating Cells. Dev. Cell 17, 736–743.

      Hetrick, B., Han, M. S., Helgeson, L. A. and Nolen, B. J. (2013). Small molecules CK-666 and CK-869 inhibit actin-related protein 2/3 complex by blocking an activating conformational change. Chem. Biol. 20, 701–712.

      Kipcke, J. P., Odenthal-Schnittler, M., Aldirawi, M., Franz, J., Bojovic, V., Seebach, J. and Schnittler, H. (2025). TNF-α induces VE-cadherin-dependent gap/JAIL cycling through an intermediate state essential for neutrophil transmigration. Front. Immunol. 16.

      Law, A. L., Jalal, S., Pallett, T., Mosis, F., Guni, A., Brayford, S., Yolland, L., Marcotti, S., Levitt, J. A., Poland, S. P., et al. (2021). Nance-Horan Syndrome-like 1 protein negatively regulates Scar/WAVE-Arp2/3 activity and inhibits lamellipodia stability and cell migration. Nat. Commun. 12, 5687.

      Malchow, J., Eberlein, J., Li, W., Hogan, B. M., Okuda, K. S. and Helker, C. S. M. (2024). Neural progenitor-derived Apelin controls tip cell behavior and vascular patterning. Sci. Adv. 10, 1174.

      Munjal, A., Philippe, J. M., Munro, E. and Lecuit, T. (2015). A self-organized biomechanical network drives shape changes during tissue morphogenesis. Nature 524, 351–355.

      Paatero, I., Sauteur, L., Lee, M., Lagendijk, A. K., Heutschi, D., Wiesner, C., Guzmán, C., Bieli, D., Hogan, B. M., Affolter, M., et al. (2018). Junction-based lamellipodia drive endothelial cell rearrangements in vivo via a VE-cadherin-F-actin based oscillatory cell-cell interaction. Nat. Commun. 9.

      Padrick, S. B., Doolittle, L. K., Brautigam, C. A., King, D. S. and Rosen, M. K. (2011). Arp2/3 complex is bound and activated by two WASP proteins. Proc. Natl. Acad. Sci. U. S. A. 108, E472–E479.

      Sauteur, L., Krudewig, A., Herwig, L., Ehrenfeuchter, N., Lenard, A., Affolter, M. and Belting, H. G. (2014). Cdh5/VE-cadherin promotes endothelial cell interface elongation via cortical actin polymerization during angiogenic sprouting. Cell Rep. 9, 504–513.

      Seebach, J., Klusmeier, N. and Schnittler, H. (2021). Autoregulatory “Multitasking” at Endothelial Cell Junctions by Junction-Associated Intermittent Lamellipodia Controls Barrier Properties. Front. Physiol. 11.

      Taha, A. A., Taha, M., Seebach, J. and Schnittler, H. J. (2014). ARP2/3-mediated junction-associated lamellipodia control VE-cadherin-based cell junction dynamics and maintain monolayer integrity. Mol. Biol. Cell 25, 245–256.

    1. Reviewer #2 (Public review):

      Summary:

      The authors analyzed the temporal dynamics of gene expression patterns within the inflammatory response transcriptome following TNF stimulation, and proposed that the splicing rate of certain introns is a key mechanism of regulating mature mRNA expression rate.

      Strengths:

      The measurement strategy is generally well-designed to understand the core question of splicing rate and gene expression. The subsequent computational analysis, as well as the mutation and repair studies, further supports the claims. The writing and presentation of the results are also generally clear and easy to follow. I think this manuscript will be of interest to a wide audience.

      Weaknesses: 

      I do have some questions regarding some of the results and conclusions, and I think either more analysis or more explanation and discussion can make the claims more solid. Please see below for details:

      (1) On the hybrid capture method and the RNA coverage results: The strategy of enriching for the last exon before sequencing does have significance in linking pre-mRNA and mature mRNA. If I understand correctly, this enriches for pre-mRNA molecules that are about to finish the full-length elongation of RNA polymerase. However, is this strategy biased towards measuring the splicing rate variation on introns closer to the 3-prime end? For example, if a gene takes 5 minutes for the RNA polymerase to elongate through the full length of the gene, for intron #1 that's very close to the 5' end, you can't tell if it takes 20s to be spliced out or 4 minutes, as both will show as fully spliced out in the sequencing library. In other words, for introns near the 5' end, a consistent "CoSI=1" pattern in the data doesn't necessarily suggest a true consistent fast splicing of that intron. Do you observe any general pattern of the measured "slowness" in relation to the 5'-3' location of the introns? If so, should the 5' introns be specially considered or even excluded from certain analyses that use all introns?

      (2) Following on my last point, it may benefit the readers if the author can provide a more detailed comparison of possible sequencing library construction choices. For example, is it feasible to also enrich for other exons for the sequencing library, etc?

      (3) Figure 1C: Are there biological replicates, and should there be error bars and statistics on the plot? Similarly, in places like Figure 2, Supplemental Figure 4C, Supplemental Figure 6, etc., is there any statistical analysis that can be done to show if the claimed differences are statistically significant?

      (4) The logic behind measuring the half-lives of introns seems a little unclear to me. From the time-dependent RNA coverage plots in Figure 2, it seems that, if we assume a constant transcription elongation rate, then the splicing rate of a specific intron can vary across time after TNF stimulation, as represented by the temporal change of CoSI values, or the heights of the coverage plot relative to neighboring exons. This means the splicing rate or half-life of an intron is not necessarily constant but may be time-dependent, at least in the case of TNF stimulation. Shouldn't the half-life measurements be designed in a way to measure the half-life at multiple time points after TNF stimulation? And maybe the measured half-lives of some introns will show as time-dependent?

      (5) In Supplemental Figure 6, the interpretation is a little confusing to me: If delayed splicing is causing delayed expression of the corresponding gene, shouldn't the non-immediate gene groups (early/intermediate/late) have low CoSI beginning from the early time points (e.g. 4 minutes)? Why does the slowdown of splicing seem to peak at a later time point? Does it mean immediately after TNF stimulation, there's a different mechanism in delaying the expression of the non-immediate gene groups? Maybe it's better to have more explanation or use a different visualization to show what non-immediate gene groups are experiencing at very early time points.

      (6) On the fine-tuning of the deep sequence model: it's a little unclear whether the input and output are time-dependent. It's stated that expression at multiple time points is used for training, but it's unclear whether the model outputs time-dependent expression patterns and whether the time information is used as input.
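
      The concern in point (1) can be made concrete with a toy calculation. This is a sketch under strong simplifying assumptions, not a model taken from the manuscript: elongation is treated as constant, each intron is treated as needing a fixed time to be spliced, and all positions, rates, and function names below are hypothetical.

```python
def observation_window_s(intron_end_bp, last_exon_start_bp, elongation_bp_per_s):
    """Time between completion of an intron and polymerase arrival at the last exon.

    Only splicing events that finish within this window are visible as
    'spliced' in a library enriched for the last exon.
    """
    if elongation_bp_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    remaining_bp = max(last_exon_start_bp - intron_end_bp, 0)
    return remaining_bp / elongation_bp_per_s


def appears_spliced(splice_time_s, window_s):
    """True if an intron needing splice_time_s would look fully spliced."""
    return splice_time_s <= window_s


# Hypothetical gene: 30,000 bp transcribed at 100 bp/s, i.e. about 5 minutes.
RATE_BP_PER_S = 100.0
LAST_EXON_START_BP = 29_000

for label, intron_end_bp in [("5' intron", 1_000), ("3' intron", 26_000)]:
    window = observation_window_s(intron_end_bp, LAST_EXON_START_BP, RATE_BP_PER_S)
    for splice_time_s in (20.0, 240.0):
        print(label, f"window={window:.0f}s", f"splice_time={splice_time_s:.0f}s",
              "appears spliced:", appears_spliced(splice_time_s, window))
```

      In this toy setting the fast and slow 5' introns are indistinguishable, whereas the same two splicing times are separated for the 3' intron, which is the positional bias the question asks the authors to examine.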

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete due to concerns about the new HCR data.

      Thanks for this assessment. All concerns raised by the reviewers regarding the HCR data and others are followed by our responses below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific gene (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Fig. 1). These tumors contain non-mutant cells - termed SGC for single-germ cells. 75% of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Fig. 2). The authors demonstrate that block in differentiation in SGC is a result of suppression of bam expression (Fig. 2). They present data suggesting that in 73% of SGCs BMP signaling is low (assessed by dad-lacZ) (Fig. 3) and proliferation is less in SGCs vs GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Fig. 4). They show data that bam-mutant cells secrete Dpp, but this data is not compelling (see below) (Fig. 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Fig. 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells

      (3) Appropriate use of quantification and statistics

      Thank you for your valuable comments, and we greatly appreciate them.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, or in some germaria, etc or in a few germaria.

      This concern was addressed in the rebuttal. The line number is 106, not line 103.

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      This concern was addressed in the rebuttal. However, these statements are not on lines 331-335 but instead start on line 339. Please be accurate about the line numbers cited in the rebuttal. They need to match the line numbers in the revised manuscript.

      We have rechecked the line numbers and confirmed that the mismatch arose from the Word-to-PDF conversion process on the eLife website. As this issue has recurred and reviewers’ file-format preferences are unknown to us, we have added a clarifying note at the beginning of each response letter: “Please note that the line numbers cited refer to the revised manuscript in the Microsoft Word format”.

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, of the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      The authors did not perform additional staining for GSC-enriched proteins like Sex lethal and nanos.

      The 70-75% of SGCs that have low BMP signaling display the following characteristics: 1) dot-like spectrosomes, 2) positivity for Dad-lacZ, and 3) absence of bamP-GFP expression. This combination of traits is sufficient to classify them as GSC-like cells. Neither Sex lethal nor Nanos is expressed exclusively in GSCs (Chau et al., 2009; Li et al., 2009), rendering them unsuitable for distinguishing GSC-like from cystoblast-like cells.

      (4) All experiments except Fig. 1I (which shows a single germarium with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than figure 1) with hs-flp?

      In the rebuttal, the authors stated that they used nos>flp for all figures except for Fig. 1I. It would be more convincing for them to prove in Fig. 1 that there is no phenotypic difference between the two methods and then switch to the nos>FLP method for the rest of the paper.

      We appreciate this suggestion. These data are included in Figure 1-figure supplement 3 in the revised manuscript.

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens when they look at young females (e.g., 2-day-old)? I assume that the nos>flp is working in larval and pupal stages and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      The authors did not supply any data to prove that the clones were larger in 14-day-old flies than in younger flies. Additionally, the age of "younger" flies was not specified. Therefore, the authors did not satisfactorily answer my concern.

      We appreciate this critical comment. Figure 1J includes the SGC phenotype data from 1-, 7-, and 14-day-old flies. Both 1- and 7-day-old flies are younger flies in our analyses. The evidence that germline clones were larger in 14-day-old flies than in younger flies was provided in Figure 1-figure supplement 2 in the revised manuscript.

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact on the clonal analyses diagrammed in Fig. 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated so it is not possible to discern one vs two copies of GFP.

      In the rebuttal, the authors stated that they cannot differentiate one vs two copies of GFP. They used other clone labeling methods in Fig. 4 and 6. I think that the authors should make a statement in the manuscript that they cannot distinguish one vs two copies of GFP for the record.

      Thank you for this suggestion. Such a statement has been added in the revised manuscript (Lines 177-178).

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with dpp-lacZ enhancer trap in Fig 5A,B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries and yet LacZ is very faint in Fig. 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues including the ovary.

      The HCR FISH in Fig. 5 of the revised manuscript needs an explanation for how the mRNA puncta were quantified. Currently, there is no information in the methods. What is meant by relative dpp levels? I think that the authors should report, in an unbiased manner, the "number" of dpp or gbb puncta in TFs. For the germaria, I think that they should report the number of puncta of dpp or gbb divided by the total area in square pixels counted. Additionally, the background fluorescence is noticeably much higher in bamBG/delta86 germaria, which would (falsely) increase the relative intensity of dpp and gbb in bam mutants. Although I commend the authors for performing HCR FISH, these data are still not convincing to me.

      We appreciate these critical comments. Due to variable puncta sizes and frequent clustering in TF and cap cells (see Figure 5A, C), direct quantification of puncta number was unreliable. Therefore, we quantified mean fluorescence intensity instead, as described in the revised figure legend of Figure 5 (Lines 603-604). In Author response image 1A, B (modified from Figure 5A, C), magenta ovals indicate empty background fluorescence areas, which appear similar between w<sup>1118</sup> (wild-type control) and bam<sup>-/-</sup> germaria. In Author response image 1B, the yellow oval outlines a neighboring germarium, not an empty area (see the DAPI channel).

      Author response image 1.

      In situ-HCR results of dpp and gbb in wild-type and bam mutant germaria. Magenta ovals indicate empty areas displaying only background fluorescence. In panel (B), the yellow oval outlines a neighboring germarium, not an empty area (see the DAPI channel below).

      (8) In Fig 6, the authors report results obtained with the bamBG allele. Do they obtain similar data with another bam allele (i.e., bamdelta86)?

      The authors did not try any experiments with the bamdelta86 allele, despite this allele being molecularly defined, whereas the bamBG allele is not.

      While we agree that repeating the experiments in Figure 6 with bam<sup>Δ86</sup> would be helpful, our mosaic analysis strategy for two genes on different chromosome arms is technically complex (see genotypes in Source data 1). Switching from bam<sup>BG</sup> to bam<sup>Δ86</sup> would necessitate extensive and time-consuming genetic recombination. Given that both alleles induce the SGC phenotype indistinguishably (Figure 1J), we believe that repeating these experiments with bam<sup>Δ86</sup> would not alter our key conclusion. We appreciate your understanding regarding this technical complexity.

      Reviewer #2 (Public review):

      In the current version, Zhang et al. have made substantial improvements to the manuscript. It is now easier to read, and the data are more solid compared with the previous version, supporting their conclusion that tumor GSCs secrete stemness factors (BMPs and Dpp) to suppress the differentiation of neighboring wild-type GSCs. This study should benefit a broad readership across developmental biology, germ cell biology, stem cell biology, and cancer biology.

      Thank you for your valuable comments, and we greatly appreciate them.

      However, the following suggestions may further improve the clarity and rigor of the research content:

      (1) Clarification of sample size (n).

      Each germarium can contain highly variable numbers of SGCs, sometimes reaching 50-100. When reporting "n" values, the authors are encouraged to also indicate the number of germaria analyzed. For example, in lines 126-128:

      "Notably, 74% of SGCs (n = 132) were GFP-negative, while the remaining 26% were GFP-positive (Figure 2B, C). This suggests that SGCs can be categorized into two distinct groups: those resembling GSCs (GSC-like) and those resembling cystoblasts (cystoblast-like)." Please clarify how many germaria were examined to obtain n = 132.

      We appreciate this comment. In 14-day-old fly ovaries, each germarium that met our criterion for quantifying the SGC phenotype contains approximately 1.5 SGCs (see Figure 1K). For the specific analysis of the “132” SGCs presented in Figure 2C, we did not record the number of germaria from which they originated.

      In addition, it is unclear whether the authors intend to suggest that the GFP-negative SGCs are GSC-like or cystoblast-like; this point should be clarified.

      Thank you for this suggestion. We intend to suggest that the bamP-GFP-negative SGCs are GSC-like; this information has been added in the revised manuscript (Line 129).

      (2) Improvement of Fig. 6 in situ hybridization images.

      The in situ hybridization images in Fig. 6 are not fully convincing. The control images, in particular, would benefit from higher resolution and enlarged views of the germarium region.

      Thank you for this valuable suggestion. The enlarged views of both the control and bam<sup>-/-</sup> germarium regions were included in Figure 5A, C in the revised manuscript.

      In panel C, abundant signals are also present outside the germarium, which may complicate interpretation and should be clarified or controlled for.

      In the right panel of Figure 5C, the abundant signals noted by the reviewer originate from neighboring germaria (see the DAPI channel), not from empty areas, which would be expected to show only background fluorescence. For more details, please refer to our response to Question (7) raised by Reviewer #1.

      Alternatively, the authors could strengthen the in situ analysis by using bam mutants or bam dpp / bam gbb double mutants as controls to better define signal specificity.

      We appreciate this comment. Homozygous dpp or gbb mutants are lethal, precluding the generation of dpp bam or gbb bam double-mutant flies. Additionally, the GFP signal was drastically reduced during our HCR processing, preventing mosaic clone analysis.

      Reviewer #3 (Public review):

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors generated by differentiation-arrested mutations (bam and bgcn) inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring wild-type germline stem cells.

      Strengths:

      The study uses a well-established in vivo model to address an important biological question concerning the interaction between germline tumor cells and wild-type (WT) germline stem cells in the Drosophila ovary. If the findings are substantiated, this study could provide valuable insights that are applicable to other stem cell systems.

      Thank you for your valuable comments, and we greatly appreciate them.

      Weaknesses:

      The authors have addressed some of my concerns in the revised submission. However, the data presented do not allow the authors to distinguish whether the failed differentiation of WT stem cells/germline cells results from "arrested differentiation due to the loss of the differentiation niche" or from "direct inhibition by tumor-derived expression of niche-associated molecules Dpp and Gbb".

      Blocking Dpp or Gbb secretion specifically from germline tumor cells promoted differentiation of neighboring wild-type germ cells (Figure 6). This indicates that BMP ligands secreted by germline tumors are required to inhibit this differentiation. However, we cannot rule out the possibility that disruption of the differentiation niche also contributes to the SGC phenotype, a point highlighted in the manuscript (Line 204).

      The critical supporting data, HCR in situ results, are not sufficiently convincing.

      Below, we provide a point-by-point reply addressing each of your specific recommendations.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      It is surprising that the authors failed to induce germline tumors at the adult stage, as this has been reported by many labs and would allow for time course analysis of the SGC phenotype. As a result, the data in this manuscript address only events occurring after germline tumor formation (with clonal induction at the larval stage) and focus on the already present "arrested wild-type germ cells", without providing insight into the process by which these arrested germ cells are formed.

      In our hands, inducing germline clones by the hs-FLP method at the adult stage was efficient in males but not in females, despite subjecting adult flies to intensive heat-shock at 37°C.

      The HCR in situ data exhibit a high background.

      Regarding the background issue, please see our response to Reviewer #1’s Question (7).

      First, the signal appears stronger in TF cells than in cap cells.

      As demonstrated by Li et al. (Li et al., 2016), dpp-lacZ (P4-lacZ) signals are also stronger in TF cells than in cap cells (see their Figure 4D').

      Second, both dpp and gbb are detected broadly in somatic cells including escort cells. These observations are inconsistent with published data.

      As shown in Figure 5A and C, dpp and gbb were detected broadly in somatic cells of bam<sup>-/-</sup> germaria, but not in those of w<sup>1118</sup> (wild-type) controls. To our knowledge, no previous study has reported the expression pattern of these ligands in a bam mutant background.

      To demonstrate the tumor-derived dpp and gbb, the HCR in situ analysis could be performed in the germarium with mosaic clones. If these niche-associated molecules are indeed expressed in tumor cells, the authors should observe a mosaic expression pattern of these molecules, with signal "ON" in tumor cells and "OFF" in neighbouring arrested germ cells.

      This is a great idea and was indeed our original approach. However, GFP signal was drastically reduced during our HCR processing, ultimately precluding mosaic clone analysis.

      References

      Chau, J., Kulnane, L.S., and Salz, H.K. (2009). Sex-lethal facilitates the transition from germline stem cell to committed daughter cell in the Drosophila ovary. Genetics 182, 121-132.

      Li, X., Yang, F., Chen, H., Deng, B., Li, X., and Xi, R. (2016). Control of germline stem cell differentiation by Polycomb and Trithorax group genes in the niche microenvironment. Development 143, 3449-3458.

      Li, Y., Minor, N.T., Park, J.K., McKearin, D.M., and Maines, J.Z. (2009). Bam and Bgcn antagonize Nanos-dependent germ-line stem cell maintenance. Proc Natl Acad Sci U S A 106, 9304-9309.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors Hall et al. establish a purification method for snake venom metalloproteinases (SVMPs). By generating a generic approach to purify this divergent class of recombinant proteins, they enhance the field's accessibility to larger quantities of SVMPs with confirmed activity and, for some, characterized kinetics. In some cases, the recombinant protein displayed comparable substrate specificity and substrate recognition compared to the native enzyme, providing convincing evidence of the authors' successful recombinant expression strategy. Beyond describing their route towards protein purification, they further provide evidence for self-activation upon Zn2+ incubation. They further provide insights on how to design high-throughput screening (HTS) methods for drug discovery and outline future perspectives for the in-depth characterization of these enzyme classes to enable the development of novel biomedical applications.

      Strengths:

      The study is well-presented and structured in a compelling way. The purification strategy results in highly pure protein products, well characterized by size exclusion chromatography and SDS-PAGE, and confirmed by mass spectrometry analysis. Further, a significant portion of the manuscript focuses on enzyme activity, thereby validating function. Particularly convincing is the comparability between recombinant vs. native enzymes; this is successfully exemplified by insulin B digestion. By testing the fluorogenic substrate, the authors provide evidence that their production method of recombinant protein can open up possibilities in HTS. Since their purification method can be applied to three structurally variable SVMP classes, this demonstrates the robust nature of the approach.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The universal applicability of the approach could be emphasized more clearly. The potential for this generic protocol for recombinant SVMP zymogen production to be adapted to other SVMPs is somewhat obscured by the detailed optimization steps. A general schematic overview would strengthen the manuscript, presented as a final model, to illustrate how this strategy can be extended to other targets with similar features. Such a schematic might, for example, outline the propeptide fusion design, including its tags, relevant optimizations during expression, lysis, purification (e.g., strategies for metal ion removal and maintenance of protease inactivity), as well as the controllable auto-activation.

      In the revised version of the manuscript, we moved the detailed description of the optimisation of SVMP expression, including mature SVMP expression, Marimastat addition, active site mutations and fusion of propeptides, into the supplement as supplementary text. We hope this improves the clarity and flow. As suggested, we now include a new figure outlining the SVMP production strategy and optimisation steps in the revised manuscript (new Figure S1).

      The product obtained from the purification protocol appears to be a heterogeneous mixture of self-activated and intact protein species. The protocol would benefit from improved control over the self-activation process. The Methods section does not indicate whether residual metal ions were attempted to be removed during the purification, which could influence premature activation.

      We agree that improved control of self-activation would be desirable. However, there is an issue: previous studies reported that (1) SVMP zymogens are processed within secretory cells of the venom gland (Portes-Junior et al., 2014), and (2) mature SVMPs accumulate in secretory vesicles during venom production (Carneiro et al., 2002). Accordingly, preventing the auto-processing of SVMP zymogens is difficult to achieve, because this would require Zn<sup>2+</sup> depletion within the insect cells during production, which would result in cytotoxicity. We have included this information in the updated Discussion section of the revised manuscript.

      Additionally, it has not been discussed whether the shift to pH 8 in the purification process is necessary from the initial steps onwards, given that a lower pH would be expected to maintain enzyme latency.

      The shift to pH 8 is required for the affinity purification of the SVMP zymogens from the medium, involving the poly-histidine-tag and immobilized metal affinity chromatography (IMAC). At lower pH, the histidines would become protonated, preventing binding of the His-tag to the column. Thus, with the His-tag the shift to pH 7.5 or pH 8 is necessary.
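      The pH dependence described above can be illustrated with the Henderson-Hasselbalch relation. The following is only a minimal sketch: the side-chain pK<sub>a</sub> of 6.0 is an assumed textbook value for free histidine (the effective value within a His-tag may differ), and the function name is hypothetical rather than part of the reported method.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of imidazole side chains carrying a proton at a given pH.

    Henderson-Hasselbalch for a single titratable group.
    pka=6.0 is an assumed textbook value for free histidine.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Compare an acidic buffer with the pH range used for IMAC binding.
    for ph in (5.0, 6.0, 7.5, 8.0):
        print(f"pH {ph}: {fraction_protonated(ph):.1%} of histidines protonated")
```

      Under this assumption, about half of the histidines would be protonated at pH 6 but only around 1% at pH 8, which is consistent with the need to raise the pH before loading the column.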

      The characterization of PIII activity using the fluorogenic peptide effectively links the project to its broader implications for drug design. However, the absence of comparable solutions for PI and PII classes limits the overall scope and impact of the finding.

      We agree that such assays would be extremely useful. However, the development of fluorescence-based high-throughput assays to test for PI and PII SVMP activity is beyond the scope of this study. Here, our overarching objective is to report a broadly applicable production method for PI, PII and PIII SVMPs.

      Overall, the authors successfully purified active SVMP proteins of all three structurally diverse classes in high quality and provided convincing evidence throughout the manuscript to support their claims. The described method will be of use for a broader community working with self-activating and cytotoxic proteases.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      The aim of the study by Hall et al. was to establish a generic method for the production of Snake Venom Metalloproteases (SVMPs). These have been difficult to purify in the mg quantities required for mechanistic, biochemical, and structural studies.

      Strengths:

      The authors have successfully applied the MultiBac system and describe with a high level of detail the downstream purification methods applied to purify the SVMP PI, PII, and PIII. The paper carefully presents the non-successful approaches taken (such as expression of mature proteins, the use of protease inhibitors, prodomain segments, and co-expression of disulfide-isomerases) before establishing the construct and expression conditions required. The authors finally convincingly describe various activity assays to demonstrate the activity of the purified enzymes in a variety of established SVMP assays.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The manuscript suffers from a lack of bottoming out and stringent scientific procedures in the methodology and the characterization of the generated enzymes.

      As an example, a further characterization of the generated protein fragments in Figure 3 by intact mass spectroscopy would have aided in accurate mass determination rather than relying on SEC elution volumes against a standard. Protein shape and charge can affect migration in SEC.

      We agree that intact MS would be useful to determine the mass of the produced SVMPs. In this manuscript, we performed SEC as a purification step, removing aggregates. Furthermore, SEC allowed determining if the SVMPs form monomers or dimers. MS characterisation of intact SVMPs (and their PTMs) is not trivial and beyond the scope of this manuscript (see below).

      Also, the analysis of N-linked glycosylation demonstrates some reactivity of PIII to PNGase F, but fails to conclude whether one or more sites are occupied, or whether other types of glycosylation are present. Again, intact mass experiments would have resolved such issues.

      We concur that glycosylation of SVMPs is an important question. However, analysing the glycosylation of the SVMPs is beyond the scope of this manuscript; it is actually a project on its own: Intact MS can indeed provide information on glycosylation but is not very precise. Unambiguous assignment of the number and occupancy of glycosylation sites is more challenging, especially for large, glycosylated proteins such as our PIII SVMP zymogen. In practice, confident mapping of glycosylation sites would require peptide-level mass spectrometry following enzymatic digestion (Trypsin and Multi-Enzymatic Limited Digestion, ideally). Sample preparation, method optimization, MS acquisition, and data analysis together would require a significant investment. Moreover, we do not have access to the native PIII SVMP from Echis carinatus sochureki venom - this is the main point of our manuscript: we describe a protocol to produce SVMPs which could not be purified from venom. Therefore, a comparison of the glycosylation of the recombinant SVMP and the native SVMP cannot be performed unfortunately (see below).

      The activity assays in Figure 4 are not performed consistently with kinetic assays and degradation assays performed for some, but not all, enzymes, and there is no Echis ocellatus comparison in Figure 4h.

      This is correct. The suggested control experiment is not possible for the PII SVMP and PIII SVMP because we cannot purify the native PII and PIII SVMPs from Echis venom. We have highlighted this information in the revised manuscript in the insulin B degradation section.

      Overall, whilst not affecting the main conclusion, this leaves the reader with an impression of preliminary data being presented. For consistency, application of the same assays to all enzymes (high-grade purified) would have provided the reader with a fuller picture.

      In the revised manuscript, we included new data showing the requested characterisations of all three SVMPs.

      We have included the respective assays in Figure 5 and Supplementary Figure S11. In the original manuscript, we had omitted these assays as the data show no enzymatic activity in the respective assays. Specifically, we show that (1) PII does not cause insulin B degradation (Fig. S11b), (2) that the PI and PII SVMPs do not degrade the fluorogenic peptide which is prototypic for PIII SVMPs and MMPs (Fig. S11a), (3) PI and PIII do not cause platelet aggregation because they lack the entire disintegrin domain (PI) or the RGD motif (PIII) (Fig. 5a), and (4) that the PI and PII SVMPs, like the PIII SVMP, are not pro-coagulant and do not cause blood clotting (Fig. 5d,5e and Fig. S11c). We also included this new information in the main text of our revised manuscript.

      Overall, the data presented demonstrate a very credible path for the production of active SVMPs for further downstream characterization. The generality of the approach to all SVMPs from different snakes remains to be demonstrated by the community, but if generally applicable, the method will enable numerous studies with the aim of either utilizing SVMPs as therapeutic agents or enabling the generation of specific anti-venom reagents, such as antibodies or small molecule inhibitors.

      Thank you.

      Reviewer #3 (Public review):

      Summary:

      The presented study describes the long journey towards the expression of members of the SVMP toxin family from snake venom, which are toxins of major importance in a snakebite scenario. As in the past, their functional analysis relied on challenging isolation; the toxins' heterologous expression offers a potential solution to some major obstacles hindering a better understanding of toxin pathophysiology. Through a series of laborious and elegantly crafted experiments, including the reporting of various failed attempts, the authors establish the expression of all three SVMP subtypes and prove their activity in bioassays. The expression is carried out as naturally occurring zymogens that autocleave upon exposure to zinc, which is a novel modus operandi for yielding fusion proteins and also sheds some new light on the potential mechanism that snakes use to activate enzymatic toxins from zymogenic preforms.

      Strengths:

      The manuscript draws from an extensive portfolio of well-reasoned and hypothesis-driven experiments that lead to a stepwise solution. The wet-lab data generated is outstanding, although not all experiments along this rocky road to victory were successful. A major strength of the paper is that, translationally speaking, it opens up novel routes for biodiscovery since a first reliable platform for expression of an understudied, yet potent toxin class is established. The discovered strategy to pursue expression as zymogens could see broad application in venom biotechnology, where several toxin types are pending successful expression. The work further provides better insights into how snake toxins are processed.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      The manuscript contains several chapters reporting failed experiments, which makes it difficult to follow in places.

      Based on a similar comment of Reviewer 1, we now moved the ‘failed’ experiments reporting on SVMP expression optimisation to the supplement as new supplementary text. We hope that the revisions have improved the clarity and overall readability of our manuscript.

      The reporting of experimental details, especially sample sizes and replicates, could be optimised.

      The number of replicates has now been added to the figure legends in the revised manuscript. Detailed experimental information is found in the revised Methods part.

      At the time of writing, it remains unclear whether the glycosylations detected on the PIII SVMP could have an impact on the bioactivities measured, which is a major aspect, and future follow-ups should clarify this.

      A detailed analysis of glycosylation of the PIII SVMP is beyond the scope of our manuscript (see above, response to Reviewer 2). Our manuscript describes a generic protocol to produce active SVMPs. Importantly, we cannot purify the native PIII SVMP from Echis carinatus sochureki venom. Therefore, it is not possible to compare our PIII SVMP with the native PIII SVMP.

      We agree that this is an important question, and we will aim in the future to perform such a comparison of a different insect cell-produced PIII with a native PIII SVMP that can be readily purified from venom.

      Finally, the work, albeit of critical importance, would benefit from a more down-to-earth evaluation of its findings, as various persistent obstacles still need to be overcome.

      We consider cytotoxicity to be the principal bottleneck in SVMP production. In this study, we present a strategy to overcome this bottleneck.

      Major comments to the manuscript:

      (1) Lines 148-149: "indicating that expressing inactivated SVMPs could be a viable, although inefficient, approach". I think this text serves a good purpose to express some thoughts on the nature of how the current draft is set up. It is quite established that various proteases cause extreme viability losses to their expression host (whether due to toxicity, but surely also because of metabolic burden), which is why their expression as inactive fusion proteins is the default strategy in all cases I have thus far seen. I believe that, especially in venom studies, this is of importance given the increased toxicity often targeting cellular integrity, and especially here, because Echis are known to feed on arthropods at younger life history stages, making it very likely that some venom components are especially active against insects and other invertebrates. With that in mind, I would argue that exploring their production in inactive form is the obvious strategy one would come up with and not really the conclusion of a series of (well-conducted and scientifically sound!) experiments. For me, the insight of inactive expression is largely confirmatory of what is established, unless I miss something in the authors' rationale. If yes, it would be important to clarify that in the online version.

      We agree that producing zymogens represents a straightforward strategy and, in hindsight, we wish we had tested this first; it would have saved us, and apparently many others, significant effort. However, realising this and implementing the approach took us considerable time and insight, as we describe in this manuscript. The alternative strategies we describe in the manuscript, in particular the use of inhibitors and active-site mutation, have been successfully applied for recombinant production of diverse enzymes before, including enzymes that are toxic to host cells.

      We have revised the manuscript as requested and moved the optimisation of SVMP expression to the Supplement. We hope this has improved the clarity and overall readability of the text and thus addresses the reviewer’s comment.

      (2) Line 173: Here, AlphaFold 3 was used, whereas in previous sections (e.g., line 153, line 210), it was AlphaFold 2. I suggest using one release across the manuscript.

      Thank you for bringing this to our attention. In the revised version of the manuscript, we clarified that all models were generated using AlphaFold 3.

      (3) Line 252-254: I fully agree, the PIII SVMP is glycosylated. Glycosylation is an important mediator of snake venom activity, and several works have described their importance in the field. This raises the question of which glycosylations have been introduced here in the SVMP, and whether these belong to those found in snakes. This is important as insects facilitate thousands of N- and O-glycosylations to modulate the activity of their proteome, of which many are specific to insects. If some of these were integrated into the SVMP, this could have an impact on downstream produced bioassays and also antigenicity (the surface would be somewhat different from natural toxins, causing different selection).

      We agree that glycosylation is important and warrants a follow-up in the future.

      However, most publications we found reported that de-glycosylation has a negative effect on stability and solubility of SVMPs, which is expected to have a knock-on effect on toxin activity (e.g. Andrade-Silva et al., 2025; DOI: 10.1021/acs.jproteome.5c00249). It will be difficult to separate the two effects from each other. We found only a few examples where SVMP glycosylation (sialylation and N-glycosylation) modulated proteolytic and haemorrhagic functions, including interaction with substrates such as fibrinogen (Schluga et al., 2024; https://doi.org/10.3390/toxins16110486; Chen et al., 2008; DOI: 10.1111/j.1742-4658.2008.06540.x; Nikai et al., 2000; DOI: 10.1006/abbi.2000.1795, PMID: 10871038). In our manuscript, we show that our PIII SVMP is very cytotoxic and highly active in casein, fibrinogen and ESO10 degradation assays, with a K<sub>M</sub> and k<sub>cat</sub>/K<sub>M</sub> comparing favourably with other SVMPs and MMPs. We are not aware of a specific substrate for this particular PIII SVMP that depends on a distinct glycosylation pattern. Recombinant production of SVMPs that require a specific glycosylation pattern would be a challenge in all commonly used expression systems (yeast, plant, insect cells and mammalian cells). In fact, insect cell expression systems could be advantageous in this respect because the Sf21 and High Five (Hi5) lepidopteran cell lines we utilised are well-characterized for their ability to perform post-translational modifications on complex secreted proteins:

      (1) N-Glycan conservation: Both Sf21 and Hi5 cells typically produce N-glycans that are trimmed to a core 'paucimannose' structure (Man3GlcNAc2), often with an alpha1,6-fucosylation. While snakes can produce more complex, sialylated N-glycans, glycomic studies of native venoms (e.g., Bothrops venom) have demonstrated that high-mannose and paucimannose structures are also prevalent in native SVMPs. Therefore, the recombinant glycoforms produced in our system are not 'unnatural' in the snake venom context but rather represent a subset of the native glycan microheterogeneity.

      (2) Occupancy vs structure: The critical function of glycosylation in PIII SVMPs is often thought to be structural, facilitating correct folding and protecting the large metalloprotease and disintegrin-like domains from proteolytic degradation. Because Sf21 and Hi5 cells recognize the same N-glycosylation sequon (Asn-X-Ser/Thr) as reptilian cells, the site-occupancy remains consistent with the native protein, preserving the overall topography of the toxin.

      (3) Activity and authentic self-processing: We acknowledge that insect-specific alpha1,3-fucosylation can occur in Hi5 cells and is potentially antigenic. As the recombinant SVMPs will be used for binder selections and for testing in silico designed binders, useful binders will be selected based on neutralising activity against venom toxins. Here, our assays focused on auto-activation and proteolytic activity, which is primarily driven by the catalytic Zn<sup>2+</sup>-site and the protein backbone.

      As stated above, analysis of glycosylation pattern of the PIII SVMP is a project on its own and beyond the scope of this manuscript.

      We have incorporated some of the above information into the discussion section of the revised manuscript to clarify that insect cell glycosylation does not recapitulate the full diversity of SVMP glycosylation observed in native venoms.

      (4) General comment for the bioassays: It would be good to specify the replicates again and report the data, including standard deviations.

      We included this information in the figure legends.

      Discussion:

      I think the data generated in the study is very valuable and will be instrumental for pushing the frontiers in SVMP research, but still I would like to see a bit of modesty in their discussion. As I have pointed out above, it is unclear which effect the glycosylations may have (i.e., are the glycosylations found reminiscent of natural ones?), despite their being functionally important. Also, yes, isolation of SVMPs is challenging, but the reality is that their expression is equally challenging, as evidenced by the heaps of presented negative data (with which I have no problems, I think reporting such is actually important). So far, the "generic" protocol has been used to express one member per structural class of Echis SVMP, but no evidence is provided that it would work equally well on other members from taxonomically more distant snakes (e.g., the PIII known from Naja oxiana). It is very likely, but at the time of writing, purely speculative.

      We have expressed additional PIII SVMPs from Echis and Daboia species and will report their production and characterisation in due course.

      Lastly, the reality is also that the expression in insect cells can only be carried out by highly specialized labs (even in the expression world, as most laboratories work with bacterial or fungal hosts), whereas the isolation can be attempted in most venom labs. That said, production in insect cells also has economic repercussions as it will be very challenging to generate yields that are economically viable versus other systems, which is pivotal because the authors talk about bioprospecting and the toxins used in snakebite agent research.

      We thank the reviewer for this perspective on the practicalities of protein expression. However, we respectfully disagree with the characterization of insect cell expression as an inaccessible or economically non-viable platform for toxin research. We offer the following points:

      (1) Prevalence and accessibility: Contrary to the suggestion that insect cell expression is restricted to highly specialized labs, the Baculovirus Expression Vector System (BEVS) has become a cornerstone of modern biologics production, structural biology and biochemistry. For instance, our MultiBac system (which is but one of several systems currently widely in use) is utilised by over 1,000 laboratories and institutions, academic and pharma/biotech, worldwide. The maturation of commercially available kits, automated platforms, and standardized protocols has moved this technology into the mainstream, making it a standard tool for any lab requiring high-quality eukaryotic proteins.

      (2) Biological necessity: Bacterial (E. coli) and fungal (P. pastoris) systems are widely accessible; however, they appear to be fundamentally incapable of producing functional SVMPs. SVMPs require complex disulfide-bond formation, intricate folding, and N-glycosylation for stability and solubility. Bacterial systems have been widely tried by us and others but typically result in very low expression or misfolded inclusion bodies. Of note, we had originally invested significant effort to adapt P. pastoris to the production of eukaryotic proteins we are interested in, without success, before moving on to the MultiBac system. The SVMPs that we analysed here are highly cytotoxic, which makes the baculovirus/insect cell system a logical choice, given that the cells are no longer 'living' after infection with the baculovirus (but are more akin to membrane-enveloped bioreactors). Thus, one can make the argument that insect cells represent the most accessible middle ground that provides the folding apparatus and the post-translational modifications (PTMs) required for biological relevance, and it is possible to produce mg amounts of SVMP proteins per litre of cell culture, as reported here in our manuscript.

      (3) Economic viability and bioprospecting: Regarding the economic argument, we contend that viability in bioprospecting is defined by functional yield rather than simple volume. Producing large quantities of non-functional or misfolded protein in a cheaper system is economically inefficient. Furthermore, for snakebite research, the ability to produce specific, pure isoforms recombinantly without the contamination of other toxic venom components found in native isolations is essential for high-throughput screening and drug design.

      (4) Scalability: Historically, insect cell production was seen as expensive, but current bioreactor technology and reduction in consumables and media costs allow for significant scaling. Many therapeutic reagents (vaccines, viral vectors, protein biologics) are produced routinely in baculovirus/insect cells. For the purposes of bioprospecting and lead identification, the yields provided by our Hi5/Sf21 system are sufficient for rigorous downstream bioassays and structural characterization.

      Again, I believe the paper is highly important and excellently crafted, but I think especially the discussion should see some refinement to address the drawbacks and to evaluate the paper's findings with more modesty.

      Thank you. We included the discussion about glycosylation patterns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is not entirely clear to me if the final constructs are indeed "fusion-proteins" (line 172, 974), in the sense of chimeric proteins. From the current description, it appears that the prodomain is encoded in the same gene rather than fused as a separate domain. Thus, referring to these constructs as fusion proteins may overstate the degree of protein engineering involved in the study.

      This is correct. In the revised manuscript, ‘fusion protein’ is only used in the context of the propeptide SVMP fusion construct to avoid confusion.

      (2) Figure 2J: It is difficult to assess how much protein is secreted relative to the intracellular amounts. The blot is surely misleading, as the effective protein dilution differs substantially between the intracellular and extracellular fractions. Providing an estimate of the relative dilution of extracellular protein would help clarify the extent of secretion.

      We estimate that the SNP and SN fractions are at least 10-times more concentrated than the media fraction. The blot is analytical and not quantitative.

      (3) The manuscript appears to use both AlphaFold 2 and AlphaFold 3 for structural predictions. Clarification on the choice of the version and its impact on results would improve consistency.

      In the revised version of the manuscript, we clarify that all structural models were generated using AlphaFold 3.

      (4) Figure S3b and others: a clear description of the antibodies used in the Western blots would be appreciated (including in the methods).

      We included this information in the figure legends and a paragraph in the methods section for Western blots in the revised manuscript.

      (5) MTT cytotoxicity testing would be more convincing if done in a concentration-dependent manner.

      We repeated this assay using different concentrations of SVMPs and show the results as a new Figure 5f in the revised manuscript.

      (6) Figure S3c: It could be interesting to show the sequence coverage to get an impression of what part of the protein is there.

      We have included this information as Supplementary Figure S4d in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Overall, the study is presented in a step-by-step manner, and its conclusions are valid.

      (1) As suggested in the public review, further characterization of the purified material would be good, for example, by intact mass spectrometry to characterize the enzymes in further detail.

      Preliminary MALDI-MS analysis (performed in Loic Quinton’s laboratory) of our PIII SVMP revealed a broad and heterogeneous mass distribution, consistent with heterogeneity caused by the presence of multiple glycoforms (which is not unlike the microheterogeneity in native snake venom). However, owing to the inherent limitations of MALDI-MS for the analysis of glycoproteins, our data do not allow determination of the number of occupied N-glycosylation sites or the identification of additional types of glycosylation.

      Moreover, the relatively large molecular mass of these proteins (zymogen 70.2 kDa protein only, mature PIII 50.6 kDa protein only) makes analysis by electrospray ionisation mass spectrometry technically challenging.

      An MS-based deep analysis of the glycosylation patterns would therefore be a project on its own, and beyond the scope of the present manuscript.

      (2) The studies involving PII appear challenging due to low yields and stability of the enzyme and the mentioned self-degradation. Some studies, such as the casein-degradation, would benefit from working with a well-characterized batch of enzymes to ensure it is not auto-degrading during the experiment.

      We believe that the finding that the PII SVMP degrades itself after incubation with Zn2+ is an important observation. It is novel to the best of our knowledge. Moreover, the key message of our manuscript is that we can produce and characterise novel SVMPs that cannot be readily purified from venom (and thus are not well characterised).

      Besides, there are very few intact PII SVMPs in venom (e.g. Suntravat et al. BMC Molecular Biol 2016); the vast majority cleaves itself into a PI and a disintegrin.

      (3) Figure 4h. Degradation of insulin is only shown for recombinant PIII, not the native enzyme, and therefore doesn't convey any information with respect to how well they compare.

      We do not have any native PII and PIII SVMPs available for a comparison with the recombinant SVMPs (in our manuscript we show expression of new, uncharacterised SVMPs). We included the PIII SVMP in the original manuscript to show that the enzyme is active and has a different specificity compared to PI SVMP. In the revised manuscript, we also included the PII SVMP insulin B degradation assay in Supplementary Figure S11b.

      (4) Figure 5a. Inconsistent use of enzymes - data for PII is presented (both as mature protein and Zymogen) and compared to PIII, but not PI, as both zymogen and mature protein. The current data presentation is confusing and gives the idea of the manuscript assembled with figures produced during the exploratory phase of the study, and not from subsequent experiments systematically conducted for the purposes of clarity and completeness.

      In the revised manuscript, we included the missing enzymatic characterisations in Figure 5 (panels a and e) and Supplementary Figure S11a-c. These data were initially not included because the respective enzymes are inactive in these assays.

      (5) The manuscript would benefit from editing to make it more concise. For an early-career reader, it is of interest and utility to follow the thought and experimental processes that led to the successful solution, but there is a risk of losing the reader's interest along the way by going through expression experiments that did not "work" in the typical sense of the word. To this reviewer, there is no added value in a full paragraph around co-expression with disulfide isomerase, as it did not improve the protein yield. A single sentence, "co-expression with PDI did not improve yields," with a reference to a supplemental figure would convey that message.

      We have moved the optimisation of SVMP expression to the Supplementary Information, which we hope has improved the clarity and flow of the main text.

      We note that the hypothesis that co-expression of protein disulfide isomerases (PDIs) enhances yields of functional SVMPs, given the high expression of PDIs in snake venom gland cells, is well established in the field. While we consider PDIs (and other chaperones) likely to play an important role in SVMP expression, we were unable to demonstrate this effect using the baculovirus-insect cell expression system and hypothesize that efficient insect and/or baculoviral PDIs are already present.

      (6) Similarly with N-linked glycosylation, the section needs a headline (line 241) and firming up of a sentence like "and possibly not all of the glycosylation..." which is vague and appears to state that it was not really of interest to pursue this further. My view is that either an experiment is done properly with a stated aim and purpose, interpreted, and then, based on whether the results are of interest to the main story or not, they are included. If N-linked glycosylation is to be included in the manuscript, it should be with a purpose (e.g., N-linked glycosylation affects enzyme activity). As it stands, the message is "there is some N-linked glycosylation" without further explanation, and this generates information without justifying the inclusion hereof.

      Please see our reply above regarding an in-depth characterisation of insect cell glycosylation of the recombinant PIII SVMP without access to the native enzyme for comparison. In our revised manuscript, we confirm that the PIII SVMP is glycosylated and that this at least partly accounts for the apparent discrepancy in molecular weight observed in SEC and SDS PAGE. We have modified the text to clarify the purpose of the PNGase deglycosylation experiment.

      (7) The manuscript, in its current form, appears to have been copied from a Thesis with very detailed step-by-step logic and description. While this is useful in a scholarly context, a scientific manuscript should be presented more compactly, assuming the readers know basic biochemistry.

      We trust that this Reviewer finds the revised version of our manuscript more compact and concise. 

      Reviewer #3 (Recommendations for the authors):

      (1) Material and Methods plus Figures:

      Please report the number of replicates per experiment and how data is presented (means/ medians/ standard deviation/ others), and add error bars to the plots where needed.

      In the revised manuscript we have included the number of repeats in the figure legends.

      (2) Abstract

      Line 4: I would not say that SVMPs are the most potent viper toxins. This place is probably taken by some of the highly neurotoxic PLA2, such as Crotoxin. Nevertheless, SVMPs are surely some of the most important toxins responsible for pathophysiological effects stemming from viper envenoming, but I would suggest rephrasing for accuracy.

      In the revised manuscript, we have modified this sentence.

      (3) Introduction

      Lines 27-31: I would like to see a reference supporting the existence of all SVMP types across vipers.

      We have included references supporting the existence of PI, PII and PIII SVMPs in viper venom. We also rewrote the sentence to state that “representatives of all three sub-classes are present in different viper venoms.” This clarifies that we do not say that all classes are present in all venoms.

      Lines 59-60: I am not sure if this should be considered such an important impediment. Essentially, many vipers yield double- to triple-digit mg amounts of crude venom per specimen from only a single milking.

      We have rewritten this text in the revised manuscript.

      Currently, it is not possible to purify any given SVMP of interest from venom; in particular, for E. ocellatus, SVMP isoform mixtures are typically purified rather than individual enzymes (see also the introduction section of our manuscript, line 57ff). Also, many SVMPs are not present in sufficient amounts in the venom. Here, we provide an approach to recombinantly produce any SVMP of interest, independent of its abundance in the venom.

      (4) Results

      Line 102: The fall armyworm's genus name is Spodoptera, not Spotoptera. Please correct the typo.

      Done. Apologies for our oversight.

      Line 311: Please provide the data at least as a supplement.

      In the revised manuscript, we have included this experiment in Supplementary Figure S6c.

      Lines 432-433: It would be useful to clarify whether the protein should have a pro-coagulant activity (or not).

      We have changed this sentence as follows in the revised manuscript: This shows that our recombinantly produced SVMPs have no pro-coagulant activity, which was unknown before.

    1. Some recommendation algorithms can be simple such as reverse chronological order, meaning it shows users the latest posts (like how blogs work, or Twitter’s “See latest tweets” option). They can also be very complicated taking into account many factors, such as:

       - Time since posting (e.g., show newer posts, or remind me of posts that were made 5 years ago today)
       - Whether the post was made or liked by my friends or people I’m following
       - How much this post has been liked, interacted with, or hovered over
       - Which other posts I’ve been liking, interacting with, or hovering over
       - What people connected to me or similar to me have been liking, interacting with, or hovering over
       - What people near you have been liking, interacting with, or hovering over (they can find your approximate location, like your city, from your internet IP address, and they may know even more precisely) This perhaps explains why sometimes when you talk about something out loud it gets recommended to you (because someone around you then searched for it). Or maybe they are actually recording what you are saying and recommending based on that.

      I agree and have personally witnessed this happening to me as a prominent social media user myself. The social media algorithms are programmed so that you spend as much time on their app as possible. To do this, they make sure they have our attention on their platform at all times by showing us content they think we like and therefore will watch the most. For example, recently I have been watching a lot of Instagram videos about the upcoming FIFA World Cup, and not only do I get videos now, but also my friends, whom I have sent reels to, and those who are the closest to me.

    2. When social media platforms show users a series of posts, updates, friend suggestions, ads, or anything really, they have to use some method of determining which things to show users. The method of determining what is shown to users is called a recommendation algorithm, which is an algorithm (a series of steps or rules, such as in a computer program) that recommends posts for users to see, people for users to follow, ads for users to view, or reminders for users. Some recommendation algorithms can be simple such as reverse chronological order, meaning it shows users the latest posts (like how blogs work, or Twitter’s “See latest tweets” option). They can also be very complicated taking into account many factors, such as:

       - Time since posting (e.g., show newer posts, or remind me of posts that were made 5 years ago today)
       - Whether the post was made or liked by my friends or people I’m following
       - How much this post has been liked, interacted with, or hovered over
       - Which other posts I’ve been liking, interacting with, or hovering over
       - What people connected to me or similar to me have been liking, interacting with, or hovering over
       - What people near you have been liking, interacting with, or hovering over (they can find your approximate location, like your city, from your internet IP address, and they may know even more precisely) This perhaps explains why sometimes when you talk about something out loud it gets recommended to you (because someone around you then searched for it). Or maybe they are actually recording what you are saying and recommending based on that.
       - Phone numbers or email addresses (sometimes collected deceptively [k1]) can be used to suggest friends or contacts.

       And probably many more factors as well!

      I think algorithms are the most important part of creating a social media platform, because they collect data from users and show them what they want or what they have mentioned before on their accounts. For example, on Facebook we can see friend suggestions or "people you may know" when we have met someone just once or talked about them. It is as if Facebook is reading your mind, which sounds creepy in a funny way. So the algorithm is working: if you and someone else have some connections in common, Facebook kind of “guesses” that you might know each other. I wonder how much social media platforms really know about us.
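
      The two kinds of recommendation algorithm described in the quoted passage can be sketched in a few lines. This is only an illustrative toy, not how any real platform works: the `Post` fields, the `score` function, and all of its weights are hypothetical choices made for this example, and only three of the listed factors (recency, whether the author is someone I follow, and number of likes) are included.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    author: str
    posted_at: datetime
    likes: int


def reverse_chronological(posts):
    """Simplest recommender: show the newest posts first."""
    return sorted(posts, key=lambda p: p.posted_at, reverse=True)


def score(post, now, following, w_recency=1.0, w_friend=2.0, w_likes=0.01):
    """Hypothetical weighted score; the weights are arbitrary assumptions."""
    age_hours = (now - post.posted_at).total_seconds() / 3600
    recency = 1.0 / (1.0 + age_hours)  # newer posts get a larger value
    friend = 1.0 if post.author in following else 0.0
    return w_recency * recency + w_friend * friend + w_likes * post.likes


def rank_by_score(posts, now, following):
    """Multi-factor recommender: highest combined score first."""
    return sorted(posts, key=lambda p: score(p, now, following), reverse=True)


now = datetime(2024, 1, 1, 12, 0)
posts = [
    Post("alice", now - timedelta(hours=1), likes=3),
    Post("bob", now - timedelta(hours=10), likes=500),
    Post("carol", now - timedelta(hours=5), likes=10),
]

newest_first = [p.author for p in reverse_chronological(posts)]
scored = [p.author for p in rank_by_score(posts, now, following={"carol"})]
print(newest_first)  # ['alice', 'carol', 'bob']
print(scored)        # ['bob', 'carol', 'alice']
```

      With these made-up weights, the heavily liked older post overtakes the newest one, which shows how the same set of posts can be ordered very differently depending on which factors the algorithm rewards.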

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Response to Reviewers

      We would like to thank the reviewers for their constructive comments on our manuscript. We have addressed all comments made by the reviewers with additional experimental data, data analyses, and text edits. A detailed point-by-point response to the reviewers is documented below.

      Summary of new/amended data panels

      Fig 2C (Rev 2): Cell-by-cell quantification of the GFP fluorescence intensity as a surrogate measure of wild-type (WT) vs mutant Pfn1 rescue construct expression levels in B16F1 KO-rescue studies.

      Figs 1B, 2A, 3C, 4A, 4C (Rev 1, 3): Inclusion of zoomed images of PIP2 staining of select regions of interests.

      Figs 6B, 6D (Rev 2): Quantification of phospho-PKC substrate antibody immunoblots of MDA-231 and B16F1 cells with or without Pfn1 KO.

      Fig 3E (not requested by the reviewers): Time-lapse images of PIP2 biosensor and F-actin in HEK-293 cells.

      Fig 3H (Rev 3): Half-life comparison of LatB-induced PIP2 and F-actin responses

      Fig S1 (Rev 1): F-actin and PIP2 staining of MDA-231 cells with or without treatments of myosin inhibitor blebbistatin.

      Figs 6G-I (Rev 2, 3): Quantification of various parameters from Ca2+ imaging studies.

      Fig 6J-M (Rev 2): Images and quantification of correlative PIP2 and DAG biosensor studies in HEK-293 cells.

      Fig 7 (not requested by the reviewers): A schematic model of how Pfn1 loss leads to PIP2 reduction in cells.

      Fig S2 (not requested by the reviewers): Effect of Pfn1 knockdown on PI4P in HEK-293 cells.

      Fig S3B (Rev 2): A list of top 100 (50 up, 50 down) differentially expressed genes in response to Pfn1 KO in MDA-231 cells.

      Point-by-Point response

      REVIEWER 1

      1. "The quantifications of the PIP2 levels were apparently done simply by measuring the fluorescence intensities of wild-type and knockout cells stained with monoclonal anti-PIP2 antibody. However, the knockout cells appear more spread compared to the wild-type cells (Fig. 1B), and this can possibly affect the quantifications (e.g. there may be more plasma membrane ruffles/folds in the wild-type cells). Thus, I recommend that in all critical quantifications the authors would also use a general plasma membrane marker to confirm that the PIP2-density (and not just morphology of the plasma membrane) is indeed affected by Pfn1-depletion".

      Response: For PM PIP2 analysis, we specifically quantified the total rather than the average PM PIP2 staining intensity (as also previously done in other studies - Hammond et al. J. Cell Science 2006; Biochem. J 2009) for three reasons. First, PIP2 is non-uniformly distributed across the PM, and therefore the average intensity calculation collapses a lot of biologically meaningful spatial information. Second, the average intensity calculation is impacted by significant cell shape and area differences that exist between cells within a group as well as between groups. Third, the integrated PM intensity is a better metric of how much total PIP2 is available for metabolic turnover on a cell-by-cell basis. These justifications are now detailed in the revised manuscript.

      In our previous study (Ricci et al., J. Biol. Chem 2024, PMID 38141770), we utilized orthogonal techniques (immunostaining, lipid dot blot) in multiple cell lines to demonstrate that total PIP2 as well as PIP2 intensity at the plasma membrane (PM) (based on manual tracing of hundreds of cells in immunostaining experiments) are reduced by silencing Pfn1 expression, and conversely, elevated upon Pfn1 overexpression. We would like to clarify here that in our present study we used an automated pipeline in CellProfiler to detect cell edges and quantify integrated PM intensity of PIP2 in control vs Pfn1 knockout (KO) cells, and our present findings in the Pfn1 KO setting recapitulated our previous findings in the transient knockdown setting. While our CellProfiler pipeline accurately detects the cell edges, we address the reviewer's comment on confirmation of findings with a PM marker by providing new experimental data in HEK-293 cells transfected with fluorescence biosensors of PIP2 and DAG along with a PM marker (iRFP-Lyn11), which also shows reduction of PIP2 fluorescence staining at the Lyn11-positive PM regions in Pfn1 knockdown cells relative to control cells (see new data panels Figs 6J, L).
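
      The distinction between integrated (total) and average intensity drawn in this response can be illustrated with a toy calculation. This is not the authors' CellProfiler pipeline and the pixel values are invented for illustration; the sketch only shows that two cells carrying the same total signal can have different mean intensities when their membrane areas differ.

```python
def integrated_intensity(pixels):
    """Total signal: sum of all pixel intensities in the traced region."""
    return sum(pixels)


def mean_intensity(pixels):
    """Average signal: total divided by the number of pixels (area)."""
    return sum(pixels) / len(pixels)


# Hypothetical membrane pixel intensities (arbitrary units).
compact_cell = [10, 10, 10, 10]          # small area, bright pixels
spread_cell = [5, 5, 5, 5, 5, 5, 5, 5]   # twice the area, dimmer pixels

print(integrated_intensity(compact_cell), integrated_intensity(spread_cell))  # 40 40
print(mean_intensity(compact_cell), mean_intensity(spread_cell))              # 10.0 5.0
```

      In this made-up case the integrated values are identical while the mean halves for the more spread cell, which is the area dependence of the average that the response refers to.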

      "To get a better idea about which cellular actin filament structures are important for regulating the PIP2-levels at the plasma membrane, one could also use a larger repertoire of actin/myosin inhibitors (CK666, cytochalasin-B, blebbistatin). By using these compounds, one may e.g. uncover if the Arp2/3-nucleated branched actin networks and/or contractile actomyosin structures would specifically contribute to regulation of the plasma membrane PIP2 levels".

      Response: We thank the reviewer for this suggestion. We have now evaluated the effect of blebbistatin treatment on PIP2 in MDA-231 cells (now shown in supplementary Fig S1). A previous study showed that the major effects of blebbistatin on the actin cytoskeleton are disintegration of actin stress fibers, softening of cortical actin, and transformation of lamellipodial actin into a loose network of accumulated amorphous actin structures that correspond to membrane ruffles (Shutova et al., 2012). These phenotypes were also recapitulated in our experimental settings. In general, blebbistatin-treated cells exhibited protrusive structures in random directions with PIP2 enrichment in peripheral F-actin-rich regions (consistent with the LatB experimental data) and a higher (p=0.09) overall cell edge PIP2 staining vs vehicle-treated cells, further underscoring the impact of actin cytoskeletal perturbation on PM PIP2.

      "The effects of PLCb3 silencing on Pfn1-dependent changes in the PIP2 levels are interesting. To gain better insight into the underlying mechanism, one could also check if the levels of active (phosphorylated) PLCb3 are affected upon Pfn1-depletion".

      Response: We would like to point out that unlike PLCg, PLCb is not activated by phosphorylation. While the literature has documented certain site-specific phosphorylations of PLCb by PKC (in a feedback manner) and PKA, these phosphorylation events, if anything, have an inhibitory effect on PLCb activity. Since our data support the model that Pfn1 loss leads to an increase in PLC-mediated PIP2 hydrolysis and downstream PKC activation, we feel that probing for such inhibitory feedback phosphorylation events will not provide any mechanistic insights.

      "In the 'Discussion', the authors speculate that Pfn1 H119E mutant may have more frequent interactions with PIP2 as compared to wild-type Pfn1. This does not make much sense, because Pfn1 binding to PIP2 is very weak (e.g. ref. 28), and it is unlikely that introducing a negatively charged glutamate would increase its affinity to negatively charged headgroup of PIP2. Thus, it seems unlikely that Pfn1 would affect the PIP2 content of plasma membrane through direct interactions with PIP2".

      Response: We did not mean to imply that glutamate substitution of the H119 residue would necessarily increase Pfn1's intrinsic affinity to negatively charged PIP2. While PIP2 binding of WT vs H119E-Pfn1 has never been quantified in biochemical assays, we previously (Bae et al. PNAS 2010; PMID 21115820) showed that H119E substitution does not affect the membrane fraction of ectopically overexpressed Pfn1 in cells. Along this line, Pascal-Goldschmit and colleagues (PMID: 7673143) also showed that the analogous mutant H119D-Pfn1 inhibits PLCg-mediated PIP2 hydrolysis as efficiently as WT-Pfn1, further underscoring the fact that H119D/E-Pfn1 is not defective in membrane phosphoinositide binding. Our data largely support a model that Pfn1-dependent PIP2 alteration is predominantly related to its actin-regulatory function. However, since Pfn1's binding to actin and PIP2 are mutually exclusive, we cannot absolutely rule out a minor (possibly insignificant) contribution of Pfn1's ability to block PIP2 hydrolysis by direct PM interaction. We therefore offered a hypothetical scenario where the H119E-Pfn1 mutant may have more frequent interaction with PM PIP2 simply because it is not able to interact with actin. We have now better clarified this argument in the "Discussion" section of the revision.

      "The cell images in Fig. 2A are a bit difficult to follow due to the large number of cells in the images. One could perhaps show higher resolution images with few knockout and rescue cells in the same field of view and indicate the rescued cells in these images e.g. with arrows".

      Response: As requested by the reviewer, we have now shown zoomed images in Fig 2A in the revision.

      "Please clearly describe in each figure legend what the error bars represent"

      Response: We have now clearly mentioned in the Statistics section of "Materials and Methods" that all error bars represent standard deviation unless explicitly mentioned otherwise.



      REVIEWER 2

      1. "The data show that actin binding-deficient mutants of Pfn1 do not rescue the knockdown. In these experiments, it is critical to quantitate the relative expression levels of the mutants. The model that Pfn1 regulation of PIP2 requires interactions with actin is not really clear - is it due to Pfn1 targeting by actin binding, or Pfn1 regulation of actin itself? Either possibility seems possible, and the experiments do not distinguish them".

      Response: We thank the reviewer for these comments. First, since GFP and Pfn1 rescue constructs are linked by an IRES, we analyzed GFP fluorescence intensity of cells selected for PIP2 analyses as a surrogate measure for comparing the relative expressions of Pfn1 rescue constructs across the various groups. As per these analyses (based on measurements of hundreds of cells from 3 different experiments), the average GFP expression of cells chosen for PIP2 analyses was found to be comparable between the various Pfn1 KO rescue groups (now shown in Fig 2C). Therefore, we argue that our observed phenotypic differences related to PIP2 are not confounded by the expressions of various Pfn1 rescue constructs.

      Second, it is known that Pfn1 loss leads to pronounced reduction in lamellipodial F-actin content (as shown in Figs 3A-B). Our LatB experimental data (Figs 3E-G) show that actin depolymerization leads to pronounced PM PIP2 reduction within minutes. Based on these findings, taken together with additional evidence for increased basal PLC activity signature readouts in Pfn1-deficient cells (i.e. greater baseline PKC activity, greater PM DAG/PIP2 ratio from biosensor studies as recommended by the reviewer (new data - shown in Figs 6J-M)), we postulate (concurring with Reviewer 3) that disruption of the cortical cytoskeleton (possibly also accompanied by removal of PIP2-binding adaptor proteins) may enhance PIP2's accessibility to hydrolytic enzymes. In fact, two previous studies (Cho et al., PNAS, 2005 and Andrade et al., Scientific Reports 2015) have demonstrated that actin filament disruption increases PM mobility of PIP2. There is also evidence for actin depolymerization-induced uncaging of PLC from the cortical actin network (Huang et al, Planta, 2009). Therefore, in principle, Pfn1 loss may cause more frequent PLC-PIP2 interaction and enhance baseline PIP2 hydrolysis by either increasing PM diffusion of PIP2 and/or uncaging of PLC. We have now included a schematic working model (Fig 7) to illustrate this concept and added these points in the discussion. However, a direct demonstration of increased PIP2 accessibility of PLC in Pfn1-deficient cells is beyond the scope of the present study; this is something we will pursue in the future.

      "The knockdown data on PLCbeta is convincing with regard to its role in PIP2 reductions, but the paper does not explain how actin-Pfn1 interactions regulate PLCbeta".

      Response: Please see our detailed response to the previous comment that specifically addresses how we envision Pfn1 negatively regulates PLC-mediated PIP2 hydrolysis via modulating actin cytoskeleton.

      "The transcriptome data must be provided along with the data in Figure 5 - otherwise it is impossible for the reader to evaluate. The fact that the data is being used in another paper is not an adequate reason for its omission".

      Response: The transcriptomic data is now displayed in Supplementary Figure S3, where we have now listed the top 100 (50 up, 50 down) differentially expressed genes in response to Pfn1 KO in MDA-231 cells (see panel B in Fig S3). We are in the process of submitting the FASTA file to the GEO database.

      "The PKC substrate data is not convincing. The blots are messy, and there is no quantitation".

      Response: Since the phospho-PKC substrate antibody is supposed to recognize all proteins phosphorylated by PKC, we expect to see multiple bands. The intensity of each lane in its entirety approximates PKC activity, since the antibody detects proteins of multiple molecular weights phosphorylated at their serine residues. We have replaced the B16-generated data with a better-quality blot and added quantifications with statistical analysis (Figs 6B, D).

      "The calcium data should include statistical analysis of the differences".

      Response: We have now performed statistical analyses of the calcium data. Specifically, we compared the peak amplitude, integrated Ca2+ signal (area under the curve), and the post-stimulation resting value between control and Pfn1 knockdown groups. As per these analyses, we did not see any significant difference in either the peak amplitude or integrated Ca2+ signal between the control and Pfn1 knockdown groups, further underscoring the fact that Pfn1 loss does not necessarily confer on cells an increased ability to respond to agonists (i.e. LPA-induced GPCR activation in this specific case). However, we noted that the post-stimulation resting Ca2+ signal was elevated in Pfn1-deficient cells relative to control cells (p < […] PIP2 hydrolysis and/or reduced re-uptake of cytosolic Ca2+ by the endoplasmic reticulum and/or reduced efficiency of Ca2+ export. These analyses are now included in Figs 6G-I in the revision.

      "The discussion of DAG and PA levels is problematic. As the authors are aware, whole cell lipidomics can easily miss small changes in specific compartments. If the authors think that lipid sensor analysis of PM DAG and PA would strengthen the analysis, then this should be included. The large change in PC levels does seem to suggest an alternative source of PA. While the authors present arguments against a role for PLD, this could be directly tested. In any case, the finding of a nearly 100-fold greater change in PC than in PA raises questions about what the whole cell PA measurement is really detecting".

      Response: We thank the reviewer for these comments and experimental suggestions. First, we completely agree with the reviewer that whole cell lipidomic analyses fail to detect small changes in specific compartments; we mention this point in the revision. In the revision, we have displayed our lipids of interest as individual line plots connecting control and Pfn1 KO group experiment-by-experiment to show the trend of lipid change in each experiment. As per these analyses, in 4 out of 5 experiments, the total DAG increased in Pfn1 KO cells. However, the large experiment-to-experiment variability in the absolute content as well as Pfn1-dependent changes in DAG precluded us from achieving statistical significance between the two groups. The large variability in the measured DAG content in our experiments is not totally surprising since cellular DAG level is known to fluctuate with growth and/or be impacted by unintended changes in the chemical parameters of culture condition. However, the largest pool of DAG is in the ER/Golgi, and since whole cell lipidomic measurements fail to reveal PM DAG due to PIP2 hydrolysis, as per the reviewer's recommendation, we now include lipid biosensor experimental data (Fig 6J-M) of control vs Pfn1 knockdown HEK-293 cells to demonstrate that the PM DAG-to-PIP2 ratio (an indicator of the basal PIP2 hydrolysis efficiency) is increased upon Pfn1 depletion. We believe that these new correlative PIP2/DAG biosensor data further strengthen our conclusion.

      Regarding the reviewer's comment on the orders of change in PC vs PA, we clearly mentioned in the original discussion that it is highly unlikely that PA increase in Pfn1-deficient cells is reflective of increased PLD-mediated conversion of PC for two reasons. First, we saw disproportionate orders of magnitude of changes in the content of PA (~3000 pmol/mg increase) vs PC (>200,000 pmol/mg decrease) in response to Pfn1 KO in MDA-231 cells. Second and more importantly, since monomeric actin directly binds to and inhibits the activity of PLD, the expected increased G-to-F-actin ratio in Pfn1-deficient cells, if at all, would likely result in diminished PLD activity reducing PLD-mediated conversion of PC to PA.

      In our opinion, since DAG is the direct hydrolysis product of PIP2 and we are now able to demonstrate elevated PM DAG-to-PIP2 ratio in Pfn1-deficient cells in biosensor experiments, PA biosensor studies are not necessary.

      REVIEWER #3

      1. "General: Scale bar labels are too small, please also provide time-stamps for time course measurements" Response: These concerns have been addressed in the revision.

      "As with every antibody stain, there is a remaining risk that a change in the cellular context affects an off-target of the antibody (e.g., a protein phosphorylation site). I think that this is not particularly likely, but I'd control for it, which can be done in a straightforward manner: The authors could do a strong-detergent treatment to rule out a potential off-target effect of the antibody (e.g., 0.1% Triton X-100, 1 h). This should remove all (non-amino-) lipids from the sample, including the phosphoinositides. Overall, binding of the antibody should be strongly reduced, fluorescence images should be much dimmer & the effect of the Pfn1 KO should mostly disappear."

      Response: The PIP2 antibody used in the present study is a well-vetted and widely used antibody in literature. Notably, two papers published by Dr. Hammond (one of the co-authors), an expert in phosphoinositide signaling, previously showed selectivity of this antibody by blocking with lipids, neomycin, and PH-domain of PIP2-binding proteins (Hammond et al, J. Cell Sci, 2006; Biochem J. 2009). We cite these papers in the revision.

      "Figure 1: Please show images in a larger zoom, cell details are barely visible (same for Figure 3). I also would not use "PM PIP2 levels" in the legend, as nuclei appear visibly lighter, indicating that some PIP2 is likely present in other membranes. The type of PIP2 staining should be specified in either the Figure itself or in the legend."

      Response: We would like to clarify here that we used an automated pipeline in "cell profiler" to detect cell edges and quantify integrated PM intensity of PIP2 in control vs Pfn1 knockout (KO) cells; so nuclear membrane PM is not accounted for in the analyses. We have zoomed PIP2 images in Figure 1 as the reviewer suggested. These changes are incorporated in the revision.

      "Figure 3: Same comment as for Figure 1, zoomed images would really help, especially for the PM/Cytosol distribution of the PIP2 biosensor"

      Response: Zoomed images of Fig 3 have been provided in the revision.

      "The lag time in the dissociation of the PIP2 sensor is interesting, as is the fact that the kinetic of PIP2 biosensor release is (visually) slower. I recommend to do a couple of simple fits to quantify these effects. If my impression holds, this would be a strong support of the author's interpretation that actin depolymerization actually leads to a loss of PM PIP2 - a simple binding/unbinding kinetic would be much closer to the actin depolymerization kinetic".

      Response: As suggested by the reviewer, we have done curve fitting of these data to calculate the half-life of F-actin and PIP2 (results shown in Fig 3H). As per these calculations, the mean half-life of PIP2 (~ 1min) is significantly longer than that of F-actin (~2.2 min) which further supports our interpretation that actin depolymerization leads to a loss of PM PIP2.
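
      The half-life calculation described above can be sketched as follows. This is a minimal illustration only: it assumes a single-exponential decay model, I(t) = I0 * exp(-k t), fitted by least squares on the log-transformed signal, which the response does not confirm as the model actually used. The function name `fit_half_life` and the synthetic trace are hypothetical.

```python
import math

def fit_half_life(times, intensities):
    """Fit I(t) = I0 * exp(-k * t) by linear regression on log(I).

    Returns (k, half_life), where half_life = ln(2) / k.
    Assumes all intensities are positive and background-subtracted.
    """
    logs = [math.log(v) for v in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay rate is the negative slope of log(I) vs t
    return k, math.log(2) / k

# Synthetic, noise-free trace (illustrative values, not data from the study)
times = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]  # minutes
true_k = 0.5                             # per minute
trace = [math.exp(-true_k * t) for t in times]

k, t_half = fit_half_life(times, trace)
print(f"k = {k:.3f} per min, half-life = {t_half:.3f} min")
```

      Fitting each signal (F-actin and the PIP2 biosensor) separately in this way yields two half-lives that can then be compared across cells.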

      "Figure 4: Same comment as for Figures 1 and 3, zoomed images would be most helpful."

      Response: Zoomed images have been provided in the revision.

      "Figure 5G: It looks like the two conditions were internally normalized. Given that we're looking at differential levels of PIP2/IP3/DAG, I think it is very possible that baseline Ca levels are also different. I'd either report in au or do a global normalization which would also capture any difference between the two conditions. This should also clarify whether there are differences in post-stimulus steady state Ca levels, as it currently looks like".

      Response: Since we used a transfectable Ca2+ biosensor (GCaMP), to account for cell-to-cell variation in the actual expression of the biosensor, we had to baseline-correct GCaMP fluorescence by normalizing each kinetic datapoint readout to the average pre-stimulation value on a cell-by-cell basis. However, we have now performed additional analyses. Specifically, we calculated the peak amplitude, integrated Ca2+ signal (area under the curve), and the post-stimulation resting value for each of the two groups. As per these analyses, we did not see any significant difference in either the peak amplitude or integrated Ca2+ signal between the control and Pfn1 knockdown groups, further underscoring the fact that Pfn1 loss does not necessarily confer cells an increased ability to respond to agonists (i.e. LPA-induced GPCR activation in this specific case). However, we noted that the post-stimulation resting Ca2+ signal was elevated in Pfn1-deficient cells relative to control cells (p2 hydrolysis and/or reduced re-uptake of cytosolic Ca2+ by endoplasmic reticulum and/or reduced efficiency of Ca2+ export. These analyses are now included in Figs 6G-I in the revision.
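
      The per-cell normalization and the three summary metrics named above can be sketched as follows. This is an illustrative sketch under stated assumptions: the trapezoidal rule for the integrated signal, the use of the last few frames as the "post-stimulation resting value", and all function and variable names are hypothetical choices, not details given in the response.

```python
def normalize_trace(trace, n_baseline):
    """Divide every point by the mean of the pre-stimulation frames (F / F0)."""
    f0 = sum(trace[:n_baseline]) / n_baseline
    return [f / f0 for f in trace]

def summarize(trace, times, n_baseline, n_tail):
    """Return peak amplitude, area under the curve and resting level for one cell.

    All metrics are computed on the baseline-normalized trace; the peak and
    the area are measured relative to the baseline value of 1.0.
    """
    norm = normalize_trace(trace, n_baseline)
    post = norm[n_baseline:]
    post_t = times[n_baseline:]
    above = [v - 1.0 for v in post]
    # Trapezoidal integration of the signal above baseline
    auc = sum(
        0.5 * (above[i] + above[i + 1]) * (post_t[i + 1] - post_t[i])
        for i in range(len(post) - 1)
    )
    return {
        "peak_amplitude": max(above),
        "auc": auc,
        "post_stim_resting": sum(norm[-n_tail:]) / n_tail,
    }

# Made-up fluorescence values for a single cell (arbitrary units)
times = [0, 1, 2, 3, 4, 5, 6, 7]
trace = [100, 100, 100, 300, 200, 150, 120, 120]
summary = summarize(trace, times, n_baseline=3, n_tail=2)
print(summary)
```

      Because each cell is divided by its own pre-stimulation mean, differences in biosensor expression cancel out, but so does any difference in absolute baseline Ca2+ between conditions, which is the limitation the reviewer raised.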

      "Please increase the font size in Figure 6C, this is barely readable".

      Response: We have now replaced that panel with one with bigger font texts.


      "Do the authors think that most PIP2 is actually in lipid-protein complexes and actin depolymerization with the corresponding removal of PIP-binding adaptor proteins exposes previously shielded PIP2 molecules to enzymatic hydrolysis?"

      Response: Yes, we certainly think that is the most likely scenario. Please see our detailed response to Reviewer 2's comment #1. We have now clearly included this in the discussion and added a schematic mechanistic model to better illustrate our thinking (Figure 7).

      "The lipidomic changes are extremely interesting. This could indicate a change in overall cellular architecture which goes beyond PIPs. SM/Chol/PC all go down - I'd interpret that this as a relatively lower content of Plasma membrane and ER. It would be interesting to see if the surface to volume ratio of the cell changes - a comparison with total Cardiolipin as a proxy for mitochondrial membrane size could also be informative. It may very well be that the Pfn1 KO effects on structural membrane lipids are the more important finding - but elucidating that mechanism is beyond the scope of the current manuscript. I look forward to learning about it in the next story".

      Response: We thank the reviewer for this insightful comment. However, this is something we would consider as a scope of future studies.

    1. Cognitive Load Theory tells us that our brains need a certain level of difficulty to process information deeply. If something is too easy – like, say, getting AI to write an essay for you – your brain doesn’t engage enough to form lasting knowledge.

      This is the first time I've heard of this theory, and it makes sense that learning requires a certain level of effort for information to be processed deeply and retained. If something is too easy, such as having AI write an essay for you, your brain may not engage as much, which can limit learning. However, I think this depends on how the tool is used. For example, if AI is used as a tool to support studying, such as explaining concepts or organizing ideas, it can still involve active thinking and help in understanding and retaining information. In contrast, letting it write a paper requires no thinking skills and far less active thinking.

    1. In

      Try something like this maybe:

      In this chapter, we will consider models where the variance also depends on covariates. This leads to the following model specification:

      \begin{gather} Y_i \sim N(\mu_i, \sigma_i^2) \nonumber \\ \mu_i = \beta_0 + X_{1i}\beta_1 + X_{2i}\beta_2 + \ldots + X_{pi}\beta_p \nonumber \\ \sigma^2_i = f(Z_i; \delta) \nonumber \end{gather}

      where \sigma_i^2 is written as a function, f(Z_i; \delta), of predictor variables (Z) and additional variance parameters \delta. The set of predictor variables (Z) may overlap with or be distinct from the predictors used in the mean model (X).

      If you don't go with this version, I suggest splitting that long sentence into a few distinct sentences and replacing the appositive expressions set off by "," with something more readable that doesn't break up the flow as much.
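
      A concrete instance of f(Z_i; \delta) might also help readers at this point. As an assumed illustration only (not necessarily a form used in the chapter), the variance could change exponentially with a single covariate Z_i, with \sigma^2 as a hypothetical baseline variance:

      \begin{gather} \sigma^2_i = \sigma^2 \exp(2\delta Z_i) \nonumber \end{gather}

      Setting \delta = 0 recovers the constant-variance model, \sigma^2_i = \sigma^2, so the usual linear model is a special case.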

  4. Apr 2026
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Overall, this study adds a large amount of data for the scyphozoan Aurelia coerulea by producing several single-cell RNA sequencing libraries that cover the transition from polyp to medusa. The study provides a modern view of cell type diversity and cell-specific transcriptome changes during this period of extreme morphological change in this particular cnidarian lineage, which is understudied. Certain unique cell subtypes, including neural cell subtypes and muscle cell subtypes which are specific to different life stages, are discussed in detail, providing some new insights.

      My overall assessment is that the manuscript has good potential to be impactful, but in its current form it is somewhat clunky and overly complex to read, the figures were too crowded and difficult to comprehend, and the authors did not provide enough context regarding the current state of knowledge and what this study adds to it. In particular, Figure 1 and the section about striated and smooth muscles sharing partial transcriptomic profiles need the most work. The results were presented in the context of the anthozoan Nematostella but this should be broadened further to include other cnidarian single-cell studies, such as those from Hydra and Clytia which are both medusozoans like Aurelia. The writing throughout could be streamlined and simplified to better highlight the major findings as described in the abstract of the paper. Several figures were not well presented or clear and could be improved or decluttered to better communicate and support important results. In addition, some methods were totally missing, and I was unable to access the github repository associated with the paper which should detail all analyses described in the paper. In its current form, reproducibility of analyses would be quite limited. I did greatly appreciate the inclusion of the data on the UCSC Cell Browser, which allows anyone to access the single cell data matrix for visual exploration.

      Answer: We thank the reviewer for the overall positive assessment and have tried to address all of the comments that follow.

      Major comments: The Introduction section was very short - only three paragraphs. I feel that this section could be expanded to give more context about Aurelia as a research organism, and the current resources available. This includes genomic and transcriptomic resources, particularly those focused on the transition between life cycle stages (polyp to medusa). Any other relevant background on cell type diversity, or anything known about the molecular profile of specific cell types found in different life stages, should also be included here. Do marker genes already exist for some of the important cell types discussed in the manuscript? It would be better to present the current state of knowledge, and context for why this study was done, how it builds upon current knowledge, and what it adds to our current understanding so that the study is properly framed from the beginning.

      Answer: The Introduction was expanded and now also explains to what extent medusa-specific cell types have been investigated so far. This additional information is highlighted in blue typeface in the manuscript.

      In the Results section, I find the sentence on p. 4, "Further, ~70% of these gene models do not have readily identifiable orthologs and thus represent putative orphan genes" to be rather confusing. What analysis was performed to determine this percentage, and which set of organisms were compared? Doesn't this percentage seem rather high for a cnidarian? Or is this referring to orthologs outside of cnidaria? Please comment further on how this percentage was determined and possible explanations for it being this high. Right now, it just feels tacked on to this paragraph with no context or further explanation which leads to the confusion.

      Answer: This statement originally referred to the lack of either a best BLAST hit or any protein domain annotation for the sequence. This number has dropped to only 47% with the most recent mapping tool, a value fairly commonly found in other animal genomes. Nonetheless, this statement has been removed from the manuscript.

      Figure 1. There are many issues with this figure that encompass how I felt generally about the figures of the paper. The figure should ideally take up the entire width of the page rather than squishing some text next to the figure.

      Answer: The figures are intended to be a full page; they are also embedded in the text to facilitate review of the manuscript, and the full-resolution figures are included for proper review. In the revised version we have kept this comment in mind to ensure the figures are legible.

      Figure 1A: The colors of the different developmental stages from which tissue was sampled (e.g. polyp1, polyp2, polyp.clover) do not seem to match between legend and figure. For example, the "polyp.clover" stage is circled in blue in the schematic, but given a green dot in the legend. The "medusa.manubrium" is circled in orange in the schematic, but given a purple dot in the legend. Suggest making the colors match between legend and schematics.

      Answer: The colors correspond to the grouped stages and colour palette used for the life cycle stage divisions. This has been considered in the revised figure.

      Figure 1E: In Panel E, the labels showing that the top graph is "polyp" and the bottom graph is "medusa" are much too small. Increase the font size of the labels. The font size for the GO terms themselves are also too small.

      Answer: This figure has been removed in the revision; attention has been paid to font sizes in the revised figures.

      Figure 1F: The bulk of this study centers around the single-cell RNA sequencing data and resulting analyses from these data. As such, I would expect the cellular atlas resulting from these data to be similarly highlighted. In Figure 1F, the annotated cell atlas as presented is much too small, making it impossible to even add the labels for the different clusters directly on the UMAP. Suggest increasing the size substantially to at least half of the page width, so that it is possible to do so.

      Answer: This has been removed in the revision; the full distribution of the identified clusters is now Figure 2. We do not include all of the population sub-types on the UMAP in this figure, as this is simply a visualization tool and the distribution of the sub-types on that map is not necessarily informative. Rather, we include the relative proportions of the sub-types/states in the bar plot, and the relationships between these clusters in the tree.

      -There should also be a complementary figure in the supplement that shows all of the individual clusters, each in different colors and clearly annotated with labels, rather than just showing multiple clusters that were combined into the major cell types. There is an example of this in the Clytia single cell paper (see Chari et al. 2021 Figure 2A vs Fig S9).

      Answer: A fully coloured UMAP with all cell states is available in supplementary Figure S3.

      -The graph on the right of this panel showing the "Distribution of cell types in time and space" is overly complicated with all of the colors, and the meaning is lost as it is quite difficult to interpret at this very small size. Suggest removing and possibly showing as a supplemental figure so that its meaning is easier to assess.

      Answer: This barplot is now larger and includes both the partitions (major cell populations, as seen in the UMAP) and the proportion of individual cell clusters. We feel this is an intuitive way to illustrate the relative distributions of all cell type states across the dataset as a whole, and so we keep this in the main figures of the manuscript.

      -In addition, striated muscles are marked on the overall UMAP; however, it is not noted until later that the smooth muscles are part of the "outer epidermis" cluster. Suggest altering the legend or the text of the figure itself to show where the smooth muscles are thought to be in the overall UMAP, especially since they are specifically discussed in depth later in the manuscript. Exactly which "part" of the outer epidermis cluster includes the smooth muscle cells?

      Answer: We have added the smooth muscle cluster in the main figure UMAP.

      Figure 1G: Panel G, for example, is not useful in conveying its point as the text labels are too tiny and the figure is overly complex to be squished into a panel of this figure. Suggest removing and making 1G a supplemental figure by itself or perhaps together with 1C (as they are linked) where it is more legible. The figure legend text for Fig 1G is also confusing as it refers to "scyphozoa" in (C) but there is no "scyphozoa" in 1C, only "medusa".

      Answer: This is now Figure 1D and E and is given increased space in the figure. We feel the message that the medusa-specific gene set is not restricted to medusa-specific cell types is an important one, and so we have kept this in the main figure. We provide a table with all gene annotations in the supplement so that it is accessible to anyone with further interest (DS1.1a and DS1.1b).

      Text, p. 6: The explanation for how the clusters were annotated in Fig 1 and Fig 2 is much too vague. The text states, "We identified 9 broadly defined cell populations, for which we assign identities by assessing up-regulated gene lists (Data S1.3)." What does this mean? How exactly were the up-regulated gene lists assessed? This needs to be clarified further. What genes were used to label these clusters or groups as particular cell types? How does the annotation relate to Supplemental Tables S1.3 and S1.3b? Does the previous literature need to be cited to support these annotations based on specific genes? Suggest doing a better job overall and providing more detail and context explaining how the single cell clusters were annotated.

      Answer: We have expanded our description of how we assigned identities to the nine principal cell type families as follows:

      (pg. 8) The inner epithelia, or gastrodermis, expresses several collagens that are a characteristic of the inner cell layer of anthozoans (39); the outer cell layer houses the ring musculature and is rich in contractile proteins. The striated muscle cluster is also rich in contractile protein and is the only principal cell population absent from the polyp-derived samples (Fig. 2C). The mucin gland expresses mucin-like-proteins, whereas the digestive gland expresses other digestive enzymes, and the neural cluster expresses synapsin and other conserved known neural regulators such as ashA. The cnidocytes express mini-collagens and are enriched in pathways targeting the endoplasmic reticulum (40).

      Text, starting on p14: "Striated and smooth muscles share partial transcriptomic profiles." This section is highly confusing and could do with some simplification in both text and figures. - The genes for which expression is shown in Fig. 5, 6 and 7 are not properly introduced or given nearly enough context in the text. For example, the text states, "To investigate the dynamics of muscle formation, we further compared phalloidin staining of muscle fields with in situ hybridization detection of specific cluster marker expression in polyps (Fig. 5), strobila (Fig. 6), and ephyra (Fig.7)." However, it is not until the legend of Figure 7 and also much later in the text (in the Discussion, p23) that it is noted what types of muscles each of the genes used in ISH actually mark ("While a small set of genes are shared across the two muscle phenotypes (e.g. stmyhc1 and mrlc2), others are more specific to either phenotype (eg. stmyhc5 in striated muscle; myophilin-like-2 in smooth muscle) (Fig.8A), which were verified by in situ hybridization (Figs.5,6,7)". This needs to be rewritten and improved for flow and clarity purposes.

      Answer: Figures 5, 6 and 7 were reassembled in a different structure according to the reviewer's suggestion. Specifically, we now present the muscle anatomy together first, followed by molecular validations from the atlas data. Marker genes used for in situ hybridization (ish) were introduced as suggested. The text was rewritten to reflect the changes in the figures. In general, figures and text were simplified to make the muscle chapter clearer.

      • Suggest that the authors show an overall UMAP of smooth and striated muscle (perhaps the smooth muscle subtypes are part of the large 'outer epidermis' cluster; see the comment for Figure 5B above), and then include featureplots that show the expression of each of the genes used in ISH in these clusters. This might make it clearer as to what type of muscle the genes should be highlighting within each developmental stage. It might look something similar to what is shown in Figure 7P (although it is unclear how the featureplots shown in this figure relate to the UMAP shown in Figure 5B). In addition, the featureplots in Figure 7P only show 3 out of the 4 genes used in ISH which is not helpful. Featureplots should be clearly shown for all genes discussed. This is essential to linking the pattern in the single-cell data to the expression data and is the minimum required to provide clear understanding.

      Answer: We took this suggestion under consideration when re-compiling the figures. The feature plots and the in situ hybridizations are now found in the same figure (Figure 6).

      • The text reads, "To investigate the dynamics of muscle formation, we further compared phalloidin staining of muscle fields with in situ hybridization detection of specific cluster marker expression in polyps (Fig. 5), strobila (Fig. 6), and ephyra (Fig.7)." However, Figure 6 also contains images of ephyra (Fig6. P-S). Suggest that those panels could be included in Figure 7.

      Answer: This text no longer appears in the manuscript. The relevant section now reads as follows (p15:17):

      “We assessed the anatomic location of the muscle fields by phalloidin staining in Aurelia polyps, strobilae and ephyrae (Fig.5). Polyps have three distinct smooth muscle fields (Fig. 5A,B-G): the radial muscles of the oral disc (Fig. 5D), the longitudinal tentacle muscles (Fig. 5E), and the longitudinal retractor muscles that run along the body column (Fig. 5F,G (35)). During strobilation, fragments of the polyp retractor muscles are retained in the early ephyra (Fig. 5J (35)). Striated muscles appear coronally around the oral disc, oriented radially along the lappets of early detached ephyra (Fig. 5L-N). At the tips of the lappets, the border of the coronal muscle, and at the base of the manubrium, fibres show a mixed organization of smooth and striated myofibrils (Fig. 5O,P). These findings corroborate previous studies that used light- (26) or electron microscopy (24,25).

      We next compared expression patterns expected from our single cell data with the phalloidin-based anatomy of smooth and striated muscles. As expected, several genes were shared between the smooth and striated muscle cluster (Fig.6E), while others were highly specific to either smooth (Fig.6C,D) or striated muscle cluster (Fig.6P; Data S1.11). Different calponin paralogs show distinct expression in the different muscle types (Fig. 7A). For example, calponin1 is specific to the smooth retractor muscle of the polyp and no other subpopulation of the smooth muscle type (Fig. 6A-C). At the strobila stage, expression of calponin1 is still visible in fragmented retractor muscles, consistent with the single cell expression profile (Fig. 6F). By comparison, mrlc2 expression marks the locations of all smooth muscle populations in polyps including tentacle muscles, radial muscles of oral disc and retractor muscles of the body column (Fig. 6D,E).”

      • There are parts of this section text where reference to the Figures is complicated and not easy for the reader to follow. I got particularly confused in trying to follow this part of the manuscript. For example, a sentence on p15 reads, "mrlc2 and stmyhc1 reads are detected in both muscle types (Fig. 7pFig. 5M, Fig 6C,E,G-P, Fig. 7J-L,N-P), and ISH indicates that the expression is localised to the fields of striated muscles in ephyrae (Fig.7J,K,N), as well as the smooth muscle populations in polyps including longitudinal tentacle muscles, radial muscles of oral disc and retractor muscles of the body column (Fig. 5M, Fig.6H,I,L,M), and the muscles of the manubrium in the meta-ephyra (Fig. 7L,O)." It is quite difficult to keep jumping between Figures and panels to look at this. A better organization of the Figures and much clearer text that doesn't jump around could go a long way to making it easier to follow.

      Answer: We thank Reviewer 1 for the suggested changes. We feel that recombining the results from previous versions of the figures helped to improve the clarity in this section. Single cell data was updated to include a UMAP of the muscle subset and gene expression plots highlighting the differential expression in smooth, striated, or both muscle types, corresponding to the in situ hybridization (ish) gene expression profile. The figure (Fig. 6) is now arranged in a way that allows the reader to easily follow the results for the spatial validation of both muscle types, since ish for all life stages is shown in one panel together with the muscle subset UMAP and gene expression plots. Additionally, the two muscle clusters are now also labelled in Fig. 2A to give the reader a better understanding of where the muscle clusters are located in the UMAP of the full object.

      The text now reads (Fig. 6, figure caption): (Q) Feature plots of all marker genes on the muscle-specific subset. (R) Reference UMAP of the whole dataset (left) and the subset (right). (S) Distribution plot of muscle types across the different Aurelia life stages (left) and medusa tissues (right).

      Discussion -The authors do try to put their results into context with the two Aurelia genome papers (Gold et al. 2018, and Khalturin et al. 2019) and two additional bulk transcriptome studies (Fuchs et al. 2014, Brekhman et al. 2015), but not until the first part of the Discussion. In principle, this would be fine. However, in practice, their discussion of these studies is somewhat vague and generalized and did not really provide a clear review or analysis of how adding in cell-type specific data is helping our understanding. The argument about how their results fit with previous findings was confusing and unclear. They start by discussing "genome usage" but then switch to talking about cell type diversity across life stages. The connections between "genome usage", "gene representation", and cell types was not easy to follow. Suggest rewriting this section to clearly discuss the findings in this manuscript in the context of previous studies with straightforward and precise language.

      -In the discussion about the neural subtypes, comparisons are only made to Nematostella where there are also two major neural classes. It would be even better to include discussion of single-cell data related to neurons in other cnidarians, such as Hydra, where there is detailed discussion of neuron subtypes in both a published manuscript (Siebert et al. 2019, Science) and a preprint (Primack et al. 2023, biorxiv) and Clytia (Chari et al. 2021, Science Advances). I do see that Clytia and Podocoryna are mentioned in the next section of the Discussion, specifically related to the Otx gene.

      Answer: We thank the reviewer for pointing out this oversight. We have incorporated comparative observations from the published Hydra dataset in this regard.

      Pg 21 “This contrasts with the distribution of n1 and n2 class neurons in the freshwater hydrozoan polyp Hydra vulgaris, of which only three of the fifteen sub-types are of the ins-positive n1 type (“ec2”, “en2”, and “en3”: Fig. S8D; (58)). Similarly, in the Clytia medusa only one of the three neuron groups (neuron cells “A” (16)) have INSM reads and thus could be considered type 1 neurons as defined here.”

      -The section about muscle subtypes in the Discussion would need to be rewritten in accordance to changes suggested above for the Results for this section.

      Answer: The Discussion was rewritten according to the changes made in the Results section, as suggested by Reviewer 1.

      Materials and Methods -In the section "Comparison with Nematostella" the authors discuss running OMA to generate the set of identified 1:1 orthologs but never go on to mention how many orthologs were identified. Please report this number so it is clear whether this is a small or large subset of the total analyzed. In a recent study of the Hydra AEP strain (Cazet et al. 2023 Genome Research), a similar analysis was done between Hydra and Clytia and they found 5979 genes with 1:1 orthologs between the two species. There should also be a supplemental datasheet that provides a list of these orthologs (See Supplemental Data S17 provided in Cazet et al. 2023 as an example). I am curious to know how many 1:1 orthologs were found between Aurelia and Nematostella. I would expect there to be a smaller overall number than between Hydra and Clytia due to the larger phylogenetic distance between these two taxa. I also strongly suggest that the Cazet et al. 2023 paper should be referenced, as it was the first time an attempt to compare single-cell datasets between two cnidarian species was done. The current manuscript took an alternative approach to comparing Aurelia to Nematostella, so it would be good to acknowledge this and justify the methods used in this manuscript compared to those used in Cazet et al. 2023.

      Answer: We recognize our oversight in not properly referencing the previous study comparing two cnidarian species. We have now integrated this reference and include the requested information regarding our OMA analysis as follows:

      In total 4311 1:1 gene orthologs between the two species were identified (Data S2.). A similar comparison using OrthoFinder (90) between Hydra and Clytia, both members of the Hydrozoa clade, found 5979 1:1 orthologs (66). OMA was preferred in this study over other available orthology databases because it outputs a high-confidence predicted 1:1 gene orthology list that can be used directly to combine multi-species data.
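
      How a 1:1 orthology list can be used "directly to combine multi-species data" can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline and not OMA's interface: it assumes orthology is available as a plain list of gene pairs, and every gene identifier and function name below is invented for the example.

```python
from collections import Counter

def one_to_one(pairs):
    """Keep only pairs in which each gene appears exactly once on its side."""
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return {a: b for a, b in pairs if left[a] == 1 and right[b] == 1}

def rename_to_shared_ids(counts, mapping):
    """Re-key one species' expression values by the other species' gene IDs.

    Genes without a 1:1 ortholog are dropped, so both datasets end up
    indexed by the same set of identifiers.
    """
    return {mapping[g]: v for g, v in counts.items() if g in mapping}

# Invented ortholog pairs: (species A gene, species B gene)
pairs = [
    ("AurA", "NvA"),  # 1:1
    ("AurB", "NvB"),  # AurB has two partners (one-to-many)
    ("AurB", "NvC"),
    ("AurD", "NvE"),  # NvE has two partners (many-to-one)
    ("AurF", "NvE"),
]
mapping = one_to_one(pairs)
shared = rename_to_shared_ids({"AurA": 5, "AurB": 2}, mapping)
print(mapping, shared)
```

      Restricting to 1:1 pairs discards genes with lineage-specific duplications, which is one reason the number of usable orthologs shrinks as phylogenetic distance grows.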

      -There are missing descriptions of methods throughout the paper. One example is in the section about Transcription Factor families that are over or underrepresented amongst upregulated genes compared to their distribution in the genome - I could not find any description of the methods used to identify these Transcription Factor families in the dataset of Aurelia upregulated genes. How were these families chosen? How were they identified in this dataset?

      Answer: Transcription factors were identified and classified using the Animal Transcription Factor Database version 4. (https://guolab.wchscu.cn/AnimalTFDB4/#/). This information has been added to the manuscript methods.

      -I noticed in the Data and materials availability statement and a few other places in the manuscript, a github repository was mentioned: https://github.com/technau/AureliaAtlas. I tried to access this repository to review what was included, but unfortunately it is not accessible. I found seven repositories within github.com/technau but the AureliaAtlas was not one of them. This repository should include all scripts to generate all figures and other analyses in the paper and should be made available to reviewers to better understand exactly how all analyses were completed. A good example of how this could be done is found in the repository related to Cazet et al. 2023 (https://github.com/cejuliano/brown_hydra_genomes), which is very comprehensive and easy to follow. -When I looked through a similar repository https://github.com/technau/CellReports2022/ from the Steger et al. 2022 Cell Reports Nematostella single-cell paper from this same group, I find it to be rather disappointing. They apparently included all code to generate all figures in a single R file that is not easy to follow and not well commented. If this is the same strategy used for this manuscript, I feel that a much stronger effort could be made to make the analyses of this Aurelia manuscript transparent by producing a github that is more like that of https://github.com/cejuliano/brown_hydra_genomes from the Cazet et al. 2023 paper which organizes each type of analysis in a different github subfolder and within each subfolder they include very detailed information and comments explaining each step of each analysis. Doing this would go a long way to making the analyses in this manuscript more transparent and easier to follow and would certainly put some of my concerns to rest.

      Answer: We thank the reviewer for pointing this out. We have ensured that the GitHub page is publicly accessible. We have provided all of the R scripts necessary to generate the analyses and figures. The structure is improved over that of the Steger paper: separate scripts are provided for each step, including importing and processing the raw data for the Seurat workflow, data processing to assess the life cycle and first clustering, analyses of each subset, and finally calling results from the previous scripts to generate all figures contained in the manuscript.

      Minor comments:

      Figures: Figure 2A: In the legend it says "Colour code as in (B) and (C)" but it's really referencing the colors in Figure 1A, correct? It is confusing to have to look back to Figure 1A to understand the colors here.

      Answer: The original figures 1 and 2 have been modified and combined into a single figure in this version.

      Figure 2D: Typo in the word "proteins" in the title of this panel.

      Answer: This word no longer appears in the revised figures.

      Figure 3F: The placement of the tree and the two featureplots for myc3 in Nematostella and Aurelia is confusing. Suggest moving the featureplot for Aurelia myc3 so that it is beside Nematostella (to the right of the tree) or move the featureplot for Nematostella myc3 so that it is beside the Aurelia featureplot (to the left of the tree).

      Answer: We thank the reviewer for this suggestion and have edited this figure accordingly by moving the myc3 expression plots alongside all of the others.

      Figure 4B: The description of this panel reads, "Distribution-histogram across all samples, medusa-specific cell clusters are highlighted with black outline.", however as a reader, the black outline is not very clear. Suggest making it bolder. In addition, this black outline is a little confusing - it should mark the medusa-specific cell clusters; however, the black outline appears in cell clusters in strobila and ephyra?

      Answer: The black outline has been widened for clarity. Medusa-specific cell types are defined by their absence from the polyp samples, because medusa-specific tissues are already being generated in the strobila stage and thus these transcriptomic profiles begin to appear there. We added a clause in the figure legend to clarify this, as well as within the main text where medusa-specific cell states are first defined.

      Pg.8: “ In total we find 12 cell type states that are not represented (

      Figure 5B: It is unclear from where this reference UMAP was derived. Does it come from the overall UMAP, showing the 'outer epidermis' cluster only, with the putative smooth muscle cells in red? Or is it the 'outer epidermis' cluster plus the striated muscle cluster? Suggest making this clearer (see below for larger edits to this section of the manuscript).

      Answer: This has been addressed. Figure 6R now includes both the full dataset inset, as well as the muscle-only subset and is consistent with the rest of the manuscript in this regard.

      Figure 5K/L/M: It is unclear which parts of the polyp in K is used for the images shown in L or M. Both come from the large red box, but it is unclear from which part L and M were made. In addition, the subtraction of the background from the image (to make it look white) is distracting and makes the image itself look artificial.

      Answer: New brightfield images were included to give a better understanding of the region of interest. The images in which the background was subtracted were replaced with the original pictures and contrast was enhanced to brighten the background.

      Figure 6C, G-S: - Not sure what the blue boxes around these panels are meant to highlight? - Also not sure what the image in the left of panel C is. Perhaps an oral view of the strobila? The legend or panel itself should mention this. - Again, subtraction of the background from the image (to make it look white) in panels C, D and E is distracting and makes the image itself look artificial.

      Answer: The figure was redone and the boxes are not present anymore.

      Figure 6J, M, N, O: - For someone not accustomed to looking at images of strobilating polyps, it is unclear what part and what orientation these images are taken of. Suggest including some of these details in the figure legend at least. Fig 6O actually looks like an ephyra, but is annotated as an "advanced strobila"?

      Answer: The figure was redone (Fig. 6) with appropriate schematics next to the images.

      Figure 7H: - Not sure what the white lines in this panel are meant to indicate?

      Answer: The white lines were removed.

      Results: p5 - In this sentence, "Because these four pouches look like a cloverleaf from above, we call this stage the "clover-polyp", suggest changing "clover-polyp" to match the Figure 1A (where it is written as polyp.clover), or change the text in the Figure to match the text in the manuscript.

      Answer: We made sure to match this in the revised figure.

      p8 - In this sentence, "the bZIP protein family are over-represented as terminal cell type markers, while the number of zinc-finger proteins of the N2C2 class are under-represented", the "N2C2" class the authors refer to is not clear. Is there a typo here? In the figure to which this sentence refers (Figure 2D), the proteins referenced are "zf-H2C2" or "zf-C2H2".

      Answer: This no longer appears in the current manuscript.

      p9 - Typo - should be "medusozoans" rather than "medusazoans".

      Answer: This has been corrected.

      p11+ - Section titled, "Aurelia neural complement reveals two neural classes with similarities to anthozoan neurons" - I found the classification of N1 and N2 to be confusing, since initially they are described as neural clusters, however N1 in particular is shown to consist of primarily secretory, non-neural cell types. For example, when looking at Figure 4A and B, it is evident that N1 contains only a relatively small number of neural cell-types (in shades of orange), while most of the cells are other secretory, but non-neural cell types (in shades of brown). Not sure if the authors should alter the title to reflect this? For example, instead of 'neural' classes, they could be called 'neuro-secretory' or 'mixed neural and secretory classes'?

      Answer: We appreciate that this was confusing and have adjusted the heading accordingly. However, we choose to maintain the designation as N1 and N2 class to reflect the distinction between the insulinoma-positive and pou4-positive major cnidarian neuroglandular sub-types, as defined in our earlier Nematostella work (Steger et al., 2018). We also include a comment in the discussion regarding the support for this distinction in other published cnidarian datasets, as follows:

      “This contrasts with the distribution of n1 and n2 class neurons in the freshwater hydrozoan polyp Hydra vulgaris, of which only three of the fifteen sub-types are of the ins-positive n1 type (“ec2”, “en2”, and “en3”: Fig. S8D; (58)).”

      p11 - Text reads, "Class 1 neurons in the medusa are also most prevalent within the gastrodermis and manubrium, and includes one subtype that first appears in the strobila and is found in all medusa tissue samples ("n1.3.medusa"; lower black box Fig. 4F).", however there is no "lower black box" in Figure 4F apparent.

      Answer: Re-evaluation of the detectable cell states after updating the mapping tool, which addresses issues associated with an overabundance of isoforms, resulted in the dissolution of this putative medusa-specific cell state. This profile is also found within the polyp, and so the second half of this sentence has been removed.

      p13 - The text reads, "We find that class 2 neurons all express elevated levels of specific alpha- and beta- tubulins (TBA1-like3 and TBB-like-1; Fig. 4D).". Make the capitalization of your gene names (TBA1-like3, etc) consistent between text and figure throughout (in Fig. 4D the gene names are lower case).

      Answer: We have taken care to be consistent throughout the manuscript.

      p14 - In the first paragraph of this page, Fig. 4C is referenced twice, however both times the referencing sentence does not match this panel (most likely the authors meant to reference 4E, F or G).

      Answer: This has been corrected.

      p14 - The final sentence of this upper paragraph, "Specific tubulin-paralog expression within the class n2 neurons suggest that this is the portion of the nervous system labelled by the β-Tubulin antibody." is confusing. Do you mean that the b-tubulin antibody is most likely labelling the product of the tbb-like-1 gene that is shown in the featureplot in Fig 4D? Suggest rewriting this sentence for clarity.

      Answer: This sentence has been re-written as follows: “Specific tubulin-paralog expression within the class n2 neurons suggests that these two genes are translated into proteins recognised by this commercial β-Tubulin antibody. Furthermore, this antibody labelling suggests that the MNN is composed of N2 class neurons.” pg 14

      p14 - on this page and others in the manuscript, there are instances of the word "Aurelia" not being italicized.

      Answer: This has been corrected.

      p14 - In this sentence, "In the sea anemone Nematostella, anemone-specific gene duplications of members of the PaTH (Paraxis, Twist Hand-related) bHLH family of protein coding genes was driving the diversification of muscle cell types (29)." the "was driving" part of the sentence is grammatically clunky. Suggest rewording slightly. (e.g. "...protein coding genes drive the diversification of muscle cell type").

      Answer: We changed this to ‘drove’.

      -Myophilin-like2 in the text of the manuscript is written as myofilin-like2 in the figure panels (e.g. Fig 5L, Fig. 6D). Make consistent between text and figures.

      Answer: We changed all references to myophilin to calponin, which is the better known name of the vertebrate ortholog.

      p15 - on this page and several instances thereafter, "in situ" is not italicized as it should be.

      Answer: This has been corrected.

      p19 - In the line, "Taken all together these data suggest that the contractile apparatus in the Scyphozoa, using here Aurelia as a proxy, is similar to the bilaterian smooth muscle contractile complex (Fig. 8C)." this should really reference Fig. 8 B-C

      Answer: This has been corrected according to the newest figure.

      Reviewer #1 (Significance (Required)):

      General assessment:

      I believe this manuscript adds a significant amount of useful data and provides some novel insights into scyphozoan cell types across an important life history transition from polyp to medusa in Aurelia. Adding the dataset to the UCSC Cell Browser is a strength. I think there is the potential to make this an impactful paper but in its current form, it is pretty messy, and not clearly presented, and lacks some transparency. The greatest weaknesses lie in not framing the work adequately or putting it into enough context with previous work and also not relating it to other medusozoans; in the Figures which are overly crowded, and confusing rather than being clear and supporting the results; and in the lack of explanation for some methods like how cell clusters were annotated, how transcription factor families were determined; and the lack of access to the github data repository, which raises questions of reproducibility. It will take a good amount of restructuring figures and reframing to make the study clear and impactful and the methods and analyses reproducible.

      Advance: If the weaknesses are addressed adequately, this study does contribute new insights in the area of further understanding changes across an important scyphozoan life cycle transition in terms of diversity of cell types and their cell-type transcriptomes, opening up further questions which can now be addressed.

      Audience: The broader cnidarian community will be interested in this study. People studying cell type evolution and cell type novelty across the tree of life will also be interested. Anyone looking for examples of how to use modern approaches to understanding life cycle changes in animals will be interested.

      My expertise is in cnidarian cellular and molecular biology and evolution including working with model cnidarian research organisms and employing techniques and approaches similar to those used in this study.

      We thank this reviewer for their detailed comments and suggestions, and feel the manuscript is much improved in its current form. We hope that we have satisfied all concerns raised here.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper is well-written and serves as a valuable resource not only for the cnidarian community but also for researchers studying more broadly cell type identity and evolution. A key cell type enabling the transition from polyp to free-swimming medusa is the cnidarian striated muscle, which has only been morphologically identified in medusozoan jellyfish. While this study does not include functional analyses, it lays the foundation for the Aurelia research community to leverage single-cell atlas data for future investigations.

      Key experiments supporting the paper's main conclusions are missing:

      •At the beginning of the Results section, the authors mention identifying a previously undescribed developmental stage, which they name "clover-polyp". However, they do not later discuss whether this newly identified stage has a distinct gene expression signature. This point should be addressed in the paper or removed.

      Answer: We do not find any transcriptomic signature specific to this stage. We keep this designation as a morphological indicator of a strobilation-competent polyp, but have re-worded our introduction of this term as follows:

      “The first external sign of strobilation is the expansion of the body column into four pouches that are filled with multiple folds of inner cell layer epithelia (Fig. 1A), and resembles a cloverleaf from above; we call this stage the “clover-polyp”.”

      •A key reference is missing in the following sentences:

      "The anthozoan Nematostella vectensis has two principal neural sub-families that have been described that correspond to those with insulinoma expression (n1) and those with pou4 expression (n2) (13,14)."

      "The class n1 family also includes putatively non-neural secretory cell types ("s"), which are enriched in genes associated with digestion and extracellular matrix production (Data S1.10). These data suggest a close relationship between neurons and gland cells, like what has been suggested in other cnidarians (13,27)."

      "Thus, similar to that described for the anthozoan Nematostella vectensis (13,14), Class 1 neurons and related secretory cells comprise the predominant type of neuroglandular cells in the polyp stage. Further, these are the primary neuroglandular cells within the gastrodermis of the medusa."

      The first functional analysis of NvInsm1+ expressing neurons and secretory cells in Nematostella vectensis was conducted in this study (Tournière, O. et al., 2022), making it essential to cite this work.

      Answer: We thank the reviewer for drawing this oversight to our attention. This has been corrected in the revised manuscript.

      • To validate the neuronal component of this single-cell data, it is essential to confirm the N1 and N2 populations and demonstrate that they do not overlap. I recommend performing in situ hybridization or antibody staining for Insm1+ and Pou4+ cells (or any other suitable markers for these populations) to show that they are expressed in distinct cells/region in Aurelia.

      Answer: We appreciate the reviewer's comment; however, there are unfortunately no specific antibodies available for Insm1 or Pou4, or for any other n1/n2-specific neuronal marker protein. Moreover, we find in situ hybridization in this system to be very challenging except for highly expressed structural genes. Neurons are particularly difficult because they are very small cells embedded between many other cell types. We attempted to validate the distribution of different neuron populations with colorimetric in situ hybridization, FISH, and HCR (hybridization chain reaction). However, we were not successful in labelling individual neuron bodies and visualising their cytoplasmic RNA content to distinguish individual cells and therefore individual neuron types. Regardless, to validate at least the neuronal cell types, we were able to correlate pan-neuronal tbb-like expression with β-Tubulin antibody staining, and RFamide antibody staining with specific neuronal subpopulations.

      •What is labelled in yellow in Figure 5C? The legend should be updated.

      Answer: Figure 5C does not exist in the current version of the manuscript.

      •Figure 5i, j, and k, are not clear, the paper would benefit with bright field pictures.

      Answer: The images were replaced and some bright field photos have been incorporated into both new figures.

      •Each figure should connect specific gene expression at a given stage with the corresponding single-cell expression data in a dot plot. For instance, in Figure 6, myofillin-like 2, mhc1, and mhc2 should be accompanied by their respective single-cell expression data at this stage in a dot plot.

      Answer: done!

      • The authors repeatedly refer to the polyp as asexual and the medusa as sexual; however, they do not mention any gonadal cluster nor discuss its absence from their single-cell data.

      Answer: We have added the following sentence to the current manuscript to account for this: “Despite its larger size, this animal was still reproductively immature and so no gonadal tissues were collected.”

      •The authors include EdU experiments in Figure S2 but discuss them only briefly in the text. If these experiments provide new insights, they should be elaborated on; otherwise, they could be removed from the manuscript.

      Answer: We have removed these data from the manuscript.

      • As this paper is primarily a resource for the cnidarian community, ensuring easy access is crucial for enabling species comparisons. I recommend making the data openly available through a single-cell portal, as done in Juliano et al. (2019).

      Answer: We have already released these data on the UCSC cellbrowser platform, as was stated in the manuscript. These data have been updated to reflect the current status of the analyses and are publicly available at www.jellyfish-atlas.cells.ucsc.edu

      Reviewer #2 (Significance (Required)): This well-written paper is a valuable resource for the cnidarian community. A key cell type driving the transition from polyp to free-swimming medusa is the cnidarian striated muscle, which has only been morphologically identified in medusozoan jellyfish. While the study lacks functional analyses, further biological validations, such as in situ hybridizations, are needed to confirm the single-cell data. Nevertheless, it lays a strong foundation for the Aurelia research community to utilize single-cell atlas data in future studies. To maximize its impact, the authors should ensure the data is easily accessible to the broader scientific community.

      We thank this reviewer for their recognition of the importance of this work. We have ensured that the data are available for download through the UCSC cell browser, and all scripts used in the data analysis are available on our github page. We additionally included our new gene models that are associated with the single cell data on the companion UCSC genome browser website, which now hosts the NCBI genome assembly with our gene models.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Link and collaborators presents a well-executed and thorough analysis (statistically significant) of cell types and developmental trajectories in Aurelia coerulea, a cnidarian with a medusa stage. While previous cnidarian cell atlases have focused on embryo-to-polyp development, this study uniquely incorporates adult medusa-stage cells, providing novel insights into cnidarian biology.

      The authors successfully identify a broad range of cell types and precursors in both polyp and medusa stages. By comparing transcriptional profiles, they demonstrate the presence of new cell types, such as neurons, in the medusa. Notably, they provide compelling evidence for the coexistence of both striated and smooth muscle within cnidarians-a topic they have explored in previous work. Their morphological analysis further suggests that striated and smooth muscle forms can exist within single cells, which is particularly intriguing. Overall, the results are convincing.

      A major strength of this study is the extensive number of cells analyzed and the rigorous classification of cell identities based on transcriptional profiles. Unlike many single-cell studies, the authors complement their findings with morphological, immunochemical, and in situ data, strengthening their conclusions. Conducting such an analysis without a fully annotated genome presents a significant challenge, yet the authors navigate this limitation effectively.

      One relative limitation, common to many single-cell studies, is the lack of detailed spatial information on the identified subtypes. While the authors have made efforts in this direction, a higher-resolution atlas that pinpoints these subtypes within the body would enhance the impact of the study. The absence of transgenic tools with cell-type-specific enhancers makes this difficult, but it remains a valuable avenue for future research. Despite this, the study's novelty and quality-particularly its inclusion of medusa-stage data-make it a strong candidate for publication in any journal associated with Review Commons.

      Minor Comments: • The term "terminal cell type markers" may not be the most appropriate for transcription factors that regulate state or specification. A more precise term, such as "state or specification transcriptional regulators," might be preferable.

      Answer: This term does not appear in the revised manuscript.

      • The suggestion that cell-type specification is not governed by a random collection of TFs seems self-evident. If not TFs, what alternative regulatory mechanisms (e.g., post-transcriptional regulation, small RNAs) are being implied?

      Answer: In the revised manuscript we no longer focus on the TFs.

      • The rationale behind the observation that "'early' cells separate along three principal trajectories (cnido.1, cnido.2, and cnido.3m), then converge upon a second mature transcriptomic phenotype" could be more clearly explained.

      Answer: This is a phenomenon that is now well established for cnidarians from the perspective of single cell transcriptomics (Chari et al, 2021: Clytia; Steger et al, 2022, Cole et al 2024, Plessier and Marlow 2026: Nematostella; Cazet et al 2023: Hydra). This phenomenon is also described here in terms of the transcription factors that are activated sequentially in both Aurelia and Nematostella. We have modified the introductory text to better place these observations in context as follows:

      Recently we reported that within the sea anemone Nematostella vectensis, specification of the distinct cnidocyte types is marked by a diverging transcriptomic profile corresponding to the formation of the different capsule types, which then undergo a molecular switch demarcated by up-regulation of GFI1B and converge upon a secondary neural-like expression profile (11). Notably, we find a similar forked trajectory within the cnidocyte population of Aurelia (Fig. 3A). A cluster of SoxC-expressing ‘early’ cells separates along two principal trajectories (cnido.1, cnido.2), which then converge upon a second mature transcriptomic phenotype upon activation of jun/fos (Fig. 3E).

      • The illustrations of the nervous system in the ephyra and rhopalia are intriguing but lack spatial context for different neuronal populations beyond the positioning of class 2 neurons ("alpha- and beta-tubulin cells").

      Answer: We added a better introduction to gain more understanding of the different neuron populations in contrast to various findings of related publications. The text now reads:

      This rhopalia nervous system develops during polyp-medusa metamorphosis and is composed of specialized light- (pigment cup) and gravity- sensing (lithocyte/statocyst) cells, segregated into individual compartments with different developmental origins (12). Rhopalia development involves the gene expression of otx1, pit1 and brn3 in the pigment-cup (10),.... p4/5

      Further, we used findings from previous studies to add a more elaborate description to our results and we finally discuss it, for example:

      The ins-negative populations in both species express pou4 orthologs, also called brn3 (10), that is expressed also within the cnidocyte lineages and thus further supports claims of a close relationship between cnidocytes and insulinoma-negative/pou4-positive n2 neurons (13,14,52). p22

      • Muscle characterization is well-supported by phalloidin staining and gene markers, but is there a specific marker for smooth muscle? Myophilin-like-2 is mentioned, but is it definitive?

      Answer: Yes, there are many, as tabulated in supplemental Data S1.11. For example, myophilin-like-2 [calponin] is a specific marker for smooth muscle cells, and this is demonstrated via in situ hybridization in Fig. 6.

      • The finding that ~40% of genes distinguishing smooth and striated muscle lack homologs in other animals is striking. It may be worth investigating their expression patterns via in situ hybridization, particularly for those that differentiate muscle types. The fact that these genes are of unknown affinity does not mean they are uninformative.

      Answer: A variety of factors can lead to a lack of orthology information amongst the gene models, including fragmented gene models and the inclusion of unidentified lncRNAs, amongst others. Because of this ambiguity, and because we have not identified which of these factors apply, we have removed this observation from the current manuscript. In fact, with the updated mapping tool and current gene annotations this number has fallen to only ~28% of the identified muscle-specific gene models, against a total of ~38.7% unannotated gene models in the entire transcriptome. This is similar to other cell types in the dataset (between ~20%-35%), and also similar to the number of unannotated genes in the sea anemone Nematostella vectensis (36.5% overall).

      • The incompleteness of Aurelia genomes is acknowledged as a limitation. However, since the San Diego strain genome appears to be the most complete, is there a reason it was not used in this study? Was it not possible to recover the same strain?

      Answer: We have a standing culture in the lab that was used for these collections. While we considered generating a genomic assembly for this laboratory strain, we have concluded that this is not an effective use of resources at this time. We have, however, updated the reference used for mapping: it is now a re-analysis of the available Aurelia coerulea isolate AC-2021 genome (NCBI: GCA_039566865.1), annotated with the Gnomon 9.0 automated annotation pipeline and supplemented with our in-house transcriptome to recover ~5000 additional gene model coordinates on the genome. These are now available via the UCSC genome browser website.

      We further thank this reviewer for the overall positive assessment of our work, and hope that the revised version further strengthens the data analysis and contribution to the community as a whole.

      **Referees cross-commenting**

      Referees, I generally agree with their assessments. Below, I outline my main concerns and suggestions for improvement.

      Figures and Data Presentation

      I concur with Referee 1 that the figures are overcrowded, making it difficult to interpret individual panels. The excessive number of panels within a single figure creates unnecessary complexity. Some of these could be moved to the supplementary materials to improve readability. It seems that the authors aim to present every possible data analysis, but this is not necessary within the main text. As Referee 1 also noted, the key findings should be clearly visible, allowing the reader to follow the story without getting lost in excessive detail.

      Answer: We have re-structured most of the figures with this in mind and hope that we have achieved better clarity. Many of the data analyses in the previous versions have been removed where they were not directly related to the observations highlighted in the current version.

      Additionally, the annotation of clusters remains unclear, a concern also raised by other referees. The manuscript would benefit from a more explicit description of how these clusters were assigned.

      Answer: We have expanded our description of how we assigned identities to the nine principal cell type families as follows:

      (pg. 8) The inner epithelium, or gastrodermis, expresses several collagens, which is characteristic of the inner cell layer of anthozoans (39); the outer cell layer houses the ring musculature and is rich in contractile proteins. The striated muscle cluster is also rich in contractile proteins and is the only principal cell population absent from the polyp-derived samples (Fig. 2C). The mucin gland expresses mucins, the digestive gland expresses other digestive enzymes, and the neural cluster expresses synapsin and other conserved known neural regulators such as ashA. The cnidocytes express mini-collagens and are enriched in pathways targeting the endoplasmic reticulum (40).

      Writing and Discussion

      While I do not have major concerns with the writing, I suggest expanding the discussion, particularly regarding the relationship between muscle cell types and the diversification of paralogs. If the figures are streamlined, the text can also be made more concise, avoiding exhaustive references to every individual data point.

      Clarifications on the Muscle Section

      Several aspects of the muscle analysis require clarification: • The differences between muscle cell types are based on a set of differentially expressed genes, 40% of which (in each set) are of unknown affinities. However, it is surprising that the regulatory genes shared between both muscle profiles are expressed in bilaterian smooth muscles. The manuscript does not address whether bilaterian striated muscles share regulatory genes with the Aurelia striated muscle set. This comparison would be valuable.

      Answer: With the latest mapping tool the percentage of muscle-specific genes of unknown affinities has dropped to ~28% and we no longer highlight this observation in the manuscript. Regarding the regulatory genes shared with smooth muscles of bilaterians, we feel this may be a misunderstanding. In Fig. 7 we clarify that these are structural proteins regulating the contraction of the muscle (e.g. Myosin light chain kinase and calponin). With respect to the developmental regulators, e.g. muscle cell type determining transcription factors, we list several in Data S1.3b, S1.4b. A broader phylogenetic and also functional analysis of these transcription factors in different jellyfish species is the focus of another collaborative study and therefore we do not include an in-depth discussion of this topic in the current manuscript.

      • The high proportion of unknown genes is concerning. Is this due to issues with the transcriptome assembly, or is it a consequence of insufficient comparative analyses? The statement that "Mapping to this final transcriptome increased confidently mapped genes to 60%" raises questions: does this mean that 40% of differentially expressed genes remain unmapped? This point should be clarified.

      Answer: With the latest mapping tool, we now recover a confident alignment for ~80% of the sequences (see supplementary data S2.1). With the previous tool this value was only 60%, which means that 40% of the sequence data could not be used at all to generate the expression matrix. This is a different feature of the data analysis than the identity of the gene models. However, the statement mentioned here no longer appears in the current version of the manuscript.

      • Given the large number of differentially expressed genes with unknown function, could the authors perform in situ hybridization assays on a subset of these genes? This could provide insights into their spatial expression patterns and potential functional relevance.

      Answer: This is an intriguing suggestion; however, given that in situ hybridization for genes expressed at medium and low levels is extremely difficult in this organism, we feel that this is beyond the scope of this study.

      • Both muscle types appear to rely on a similar contractile apparatus but exhibit differential usage of paralogs. This finding is intriguing but is not sufficiently discussed. Are other cell types associated with the differential use of paralogs? Expanding this discussion would add depth to the manuscript.

      Answer: We thank the reviewer for this insightful comment. Indeed, there is circumstantial evidence that differential usage of paralogs is also found among other cell types, e.g. neurons. We discuss the example of a few other genes, e.g. ATOH-like transcription factors and myc. However, the diversity of neuronal populations is very large, which makes the picture quite complex. We are currently working on a phylogenetic framework of cell type families, also across species, to address this point, but this requires more theoretical and methodological work. In this paper, we therefore restricted the analyses to the structural proteins of the two types of muscles, which facilitates the assignment of paralogs to either muscle. We point out that this is reminiscent of the differential expression of paralogs in the fast and slow contracting muscle cell types in Nematostella, suggesting that such subfunctionalization may generally drive the physiological diversification of muscle cell types in cnidarians (and in animals in general). Future work aims to address this on a broader scale, as suggested by the reviewer.

      Neuronal Subtypes

      I reiterate my previous comment regarding neuronal types: • The enrichment of neural subtypes in the medusa stage is an interesting, albeit expected, finding. However, the manuscript lacks details regarding their specific spatial distribution within the body. Providing this information would enhance the biological relevance of the findings.

      Answer: In situ hybridization for neurons is a challenge in all cnidarians, because the small neurons with very thin neurites are embedded and intermingled between many other cell types. In Aurelia, this has proven to be particularly difficult. At the very best, one might see small cell bodies stained, but the staining fails to visualize neurites. We also tried HCR (hybridization chain reaction) in combination with antibody staining (b-Tubulin) to get to single cell resolution. However, the results were not conclusive and we therefore refrain from showing them in the paper. As an alternative, we connected the findings of previous studies (Nakanishi et al., 2009, 2010), in terms of certain types of neurons located in different compartments of the rhopalia and corresponding marker genes, with our single cell data (introduction/discussion). We acknowledge that more work needs to be done, ideally by generating specific antibodies against neuronal antigens. However, this is beyond the scope of this paper.

      References

      I also agree with Referee 2 that some statements require further substantiation with appropriate references. Strengthening these points with supporting literature would improve the rigor of the manuscript.

      Answer: We added appropriate references at all places indicated, as detailed above.

      Final Remarks

      Overall, while the study presents interesting findings, the manuscript would benefit from a clearer organization of figures, a more explicit explanation of muscle and neural subtype findings, and a deeper discussion on the significance of unknown genes and paralog usage. Addressing these concerns will enhance the clarity and impact of the paper.

      Reviewer #3 (Significance (Required)):

      Overall, this is a significant and well-supported study that advances our understanding of cnidarian cell diversity and muscle evolution. By examining how cell types change across the polyp and medusa stages, this study provides valuable insights not only into cnidarian development but also into broader evolutionary questions regarding the emergence of new body plans and tissue types. As a developmental biologist specializing in invertebrates, I find the results of this work particularly remarkable. It provides valuable insights into the developmental processes occurring in pre-bilaterian animals, shedding light on how cell types emerge and diversify in early-diverging metazoans.

      Answer: We thank reviewer 3 for this positive evaluation.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Link et al. have studied cell type diversity in the scyphozoan Aurelia coerulea. More specifically, they compared several stages in the animal's life cycle using single-cell RNA-seq. Many members of the cnidarian clade Medusozoa (scyphozoans included) have a metagenetic life cycle that includes a sessile, clonally reproducing polyp and a free-swimming, sexually reproducing medusa (jellyfish). The two phases are fundamentally different in their functional morphology, but the cellular basis of this difference has been unknown. The authors generated single cell RNA-seq libraries from eight life-cycle stages of the animal, including polyps and medusae. Their main finding is that different cell types underlie the polyp-medusa transition in this animal. Although expected intuitively, this finding has never been demonstrated experimentally. Moreover, a recent study on a colonial hydrozoan (Salamanca-Diaz et al. 2025) has shown that colony parts, as opposed to different life stages, use largely the same cellular components. Therefore, the current study is of broad interest to developmental and evolutionary biologists. Overall, the experiments and data analyses have been performed to a high standard, the figures are of good quality, and the manuscript is well written. Below are a few minor points to be addressed.

      The Aurelia strain used in the study is somewhat ambiguous (suggested to be A. coerulea). The authors' statements on pp. 24, 25 are somewhat confusing--they first say they got over 90% alignment to the San Diego strain genome assembly but then state (in the 'Transcriptome mapping' section) that they got only 40% of their reads aligned, forcing them to use Trinity de novo transcriptome assembly. Please clarify.

      Answer: Alignment to the genome is different from assignment of the alignment to a gene model. Ambiguous alignment cannot be assigned, and missing gene models would not have an assignment. However, we have switched the mapping tool used for this dataset for one that fits both genome sequence alignment AND gene model assignment better than the previously available choices. We now have ~80% of all sequences unambiguously aligned to the genome.

      p. 7: the authors state that some transcription factor families are over/underrepresented as terminal type markers. How do they know which cells are terminally differentiated?

      Answer: We have removed our focus on transcription factor families in this work and recognize that the definition of a terminally differentiated cell state from single cell transcriptomics has not been clearly defined.

      The homeobox gene Tlx has been reported to be associated with medusa development, being absent in taxa without medusae (Travert et al. 2023). Is it expressed in the Aurelia medusa (I couldn't find it in the data), and if so, where?

      Answer: This is indeed a good point that we were also interested in. However, Tlx is detected ONLY in the ephyra libraries and at very low levels, which is why we chose to avoid discussing it: the low detection prevents accurate reporting of the expression and could instead reflect a mapping problem for this gene (mis-annotated 3’ end). As information for this reviewer, the gene model shows some spurious reads specifically in a few neuron subtypes, and outside the ephyra it is detected at low levels ONLY in the medusa library for medusa neuron n.7 (n2.7m).

      I do not quite understand the authors' arguments for independent striated muscle evolution in cnidarians and bilaterians. Key striated muscle genes (e.g., titin) are present in hydrozoan and anthozoan genomes; furthermore, the expression pattern of Otx is not indicative because its function in medusozoans is unknown. What are the arguments against an alternative scenario in which striated muscles evolved before the cnidarian-bilaterian split, but were lost in anthozoans?

      Answer: This is indeed a complex question, which requires a more thorough and targeted comparative analysis. We note that a BLAST hit for Titin can be misleading due to the many domain repeats of Titin, which are also found in other proteins. To be more prudent, we removed this part from the manuscript. This will be the subject of a future, thorough study.

      p. 27, the link https://github.com/technau/AureliaAtlas is broken.

      Answer: We appreciate this comment and have ensured that the GitHub archive is publicly available with all relevant scripts associated with all versions of the bioRxiv record.

      p. 24 (limitations of the study section), the authors refer to "cosmopolitan species"; they probably mean "genus".

      Answer: We changed to “taxon” and dropped cosmopolitan.

      p. 24-25 on two occasions in the M&M sections, the authors put the abbreviation first and the initials in brackets (ASW and BSA).

      Answer: This has been corrected.

      "Metagenic" should be "metagenetic"

      Answer: This has been corrected.

      Reviewer #4 (Significance (Required)):

      The study is of broad interest to developmental and evolutionary biologists. It addresses an important question, not dealt with directly in previous studies.

      Answer: We thank reviewer 4 for this positive and encouraging assessment.

    1. Reviewer #2 (Public review):

      Summary

      Previous studies by some of the authors of the current manuscript showed that healthy human newborns memorize recently learned nonsense words. They exposed neonates to a familiarization period (several minutes) when multiple repetitions of a bisyllabic word were presented, uttered by the same speaker. Then they exposed neonates to an "interference period" when newborns listened to music or the same speaker uttering a different pseudoword. Finally, neonates were exposed to a test period when infants heard the familiarized word again. Interestingly, when the interference was music, the recognition of the word remained. Recognition of the word was measured by using the NIRS technique, which estimates the regional brain oxygenation at the scalp level. Specifically, the brain response to the word in the test was reduced, unveiling a familiarity effect, while an increase in regional brain oxygenation corresponds to the detection of a "new word" due to a novelty effect. In previous studies, music does not erase the memory traces for a word (familiarity effect), while a different word uttered by the same speaker does.

      The current study aims at exploring whether and how word memory is interfered with by other speech properties, specifically changes in the speaker, given that young children can distinguish speakers by processing the speech. The authors' main hypothesis anticipates that new speaker recognition would produce less interference in the familiarized word because somehow neonates "separate" the processing of both words (the familiarized word uttered by one speaker, and the interfering word uttered by a different speaker), memorizing both words as different auditory events.

      From my point of view, this hypothesis is interesting since the results would help estimate the role of the speaker in word learning and speech processing early in life.

      Major strengths:

      (1) New data from neonates. Exploring neonates' cognitive abilities is a big challenge, and we need more data to enrich the knowledge of the early steps of language acquisition.

      (2) The study contributes new data showing the role of speaker (recognition) on word learning (word memory), a quite unexplored factor. The idea that neonates include speakers in speech processing is not new, but its role in word memory has not been evaluated before. The possible interpretation is that neonates integrate the process of the linguistic and communicative aspects of speech at this early age.

      (3) The study proposes a quite novel analytic approach. The new mixed models allow exploring the brain response considering an unbalanced design. More than the loss of data, which is frequent in infants' studies, the familiarization, interference and learning processes may take place at different moments of the experiment (e.g. related to changes in behavioural states along the experiment) or expressed in different regions (e.g. related to individual variations in optodes' locations and brain anatomy).

      Main weaknesses:

      I did not find major weaknesses. However, I would like to have more discussion or explanation in the following points.

      (1) It would be good to report the contribution of each infant to the analysis, i.e. how many good blocks, 1 to 5 in sequence 1 and 2, were provided by each infant.

      (2) Why did the factor "blocknumber" range from 0 to 4? The authors should explain what block zero means and why not 1 to 5.

      (3) I would suggest trying to integrate the changes in brain activity across the 3 phases, that is, whether changes in familiarization relate to changes in the test and interference phases. For instance, in Figure 2, the brain response distinguishes between same and novel words that occurred over IFG and STG in both hemispheres. However, in the right STG there was no initial increase in the brain response, and the response for the same word was higher than the one for novel words in the 5th block.

      (4) Similarly, it is quite amazing that the brain did not increase the activity with respect to the familiarization during the interference phase, mainly over the left hemisphere, even if both the word and speaker changed. Although the discussion considers these findings, an integrated discussion of the detection of novel words and the detection of a novel speaker over time may benefit from a greater integration of the results.

      Appraisal

      The authors achieved their aims, because the design and analytic approaches showed significant differences. The conclusions are based on these results. Specifically, the hypothesis that neonates would memorize words after interference, when the interfering speech is pronounced by a different speaker, was supported by the data in blocks 2 and 5, and the authors discussed the potential mechanisms underlying these findings, such as separate processing for different speakers, likely related to the recognition of speaker identity.

      I think the discussion is well structured, although I would suggest integrating the changes across the three phases of the study, perhaps comparing with other regions not related to speech processing.

      Evaluating neonates is a challenge because their physiology is constantly changing. For instance, within 9 minutes newborns may transition between different behavioral states and experience different physiological needs.

      This study offers the opportunity to inspire looking for commonalities and individual differences when investigating early memory capacities of newborns.

      Comments on revisions:

      The authors provided satisfactory answers to my concerns.

      I recognize that, for technical and ethical reasons, studies with neonates are particularly challenging; however, with a well-balanced design such as the one the authors applied, even data from small samples constitute valuable sources to advance the field.

      The neonate brain works in a particular state of intense metabolic, functional and structural change, which we are far from understanding. The current data help fill this gap in knowledge.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Summary:

      This manuscript investigates whether newborns can use speaker identity to separate verbal memories, aiming to shed light on the earliest mechanisms of language learning and memory formation. The authors employ a well-designed experimental paradigm using functional near-infrared spectroscopy (fNIRS) to measure neural responses in newborns exposed to familiar and novel words, with careful counterbalancing and acoustic controls. Their main finding is that newborns show differential neural activation to novel versus familiar words, particularly when speaker identity changes, suggesting that even at birth, infants can use indexical cues to support memory.

      Strengths:

      Major strengths of the work include its innovative approach to a longstanding question in developmental science, the use of appropriate and state-of-the-art neuroimaging methods for this age group, and a thoughtful experimental design that attempts to control for order and acoustic confounds. The study addresses a significant gap in our understanding of how infants process and remember speech, and the data are presented transparently, with clear reporting of both significant and non-significant results.

      Weaknesses:

      However, there are notable weaknesses that limit the strength of the conclusions. The main recognition effect is restricted to a specific subgroup of participants and emerges only during a particular testing window, raising questions about the robustness and generalizability of the findings. The sample size, while typical for infant neuroimaging, is modest, and the statistical power is further reduced by missing data and group-dependent effects. Additionally, the claims regarding episodic memory and evolutionary implications are somewhat overstated, as the paradigm primarily demonstrates memory retention over a few minutes without evidence of the rich, contextually bound recall characteristic of fully developed episodic memory.

      Overall, the authors have achieved their primary aim of demonstrating that speaker identity can facilitate memory separation in newborns, providing valuable preliminary evidence for early indexical processing in language learning. The results are intriguing and likely to stimulate further research, but the limitations in effect robustness and theoretical interpretation mean that the findings should be viewed as an important step forward rather than a definitive answer. The methods and data will be of interest to researchers studying infant cognition, memory, and language, and the study highlights both the promise and the challenges of probing complex cognitive processes in the earliest stages of life.

      We thank the reviewer for their thoughtful and positive assessment of our work, and for giving us the opportunity to clarify points that may have been unclear in the original manuscript.

      First, considering that the recognition response was quite consistent in previous studies, we expected the effect to emerge within a specific testing window, in either the first or the second block, depending on task difficulty. Accordingly, our analytical approach was designed to reflect this expectation, which was subsequently confirmed by the results. Second, the main recognition effect is not restricted to a specific subgroup of participants. Recognition responses were observed in both groups in the left IFG and bilateral STG. The only group-specific modulation was found in the right IFG, where the effect was primarily driven by Group A. This suggests that activity in this specific region may be influenced by contextual factors such as the nature and amount of recently processed stimuli. We have clarified these points in the revised manuscript to avoid the impression that the core effect is limited to a subset of participants or not generalizable across studies. 

      Regarding the sample size, a formal calculation was initially attempted based on the effect size reported in a closely related ANOVA-based study (Benavides-Varela et al., 2011; Study 2: Word recognition after intervening melodies, main effect for the comparison same vs novel word [F(1,26) = 19.318; p<0.0001; effect size f = .87]). However, inputting this information into dedicated software (G*Power; α = 0.05; number of groups = 1; number of measurements = 2) leads to an estimated sample size of N = 5 to 7 (depending on the desired power, range = 0.80-0.95). This sample size is unrealistically small and not representative of current research standards in the field. A proper formal power analysis for the LMM is otherwise hard to perform, as we lack information about the expected variance and random-effects structure. We therefore aligned our sample size with prior newborn studies using similar stimuli and experimental designs, and with fNIRS studies in newborns and infants (for recent meta-analyses see De Roever et al., 2018; Boek et al., 2023; Gemignani et al., 2023; which examined studies with mean N = 24; N range = 1-86 and sample sizes often including various conditions and groups). Note also that our design includes a within-subject comparison, and our analytical approach models subject-level variance and handles unbalanced datasets and missing data (which are common in infant studies), thereby improving statistical sensitivity. We have now explicitly clarified this choice in the Introduction.
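
      As a rough illustration of the effect-size arithmetic behind such a calculation (a sketch only, not the G*Power procedure itself), Cohen's f can be recovered from a reported F statistic via partial eta squared. The function name below is hypothetical, and the use of this particular conversion is an assumption: the response does not state how the value f = .87 was obtained.

```python
import math


def cohens_f_from_f_stat(f_stat: float, df_effect: int, df_error: int) -> float:
    """Convert an ANOVA F statistic to Cohen's f via partial eta squared.

    Hypothetical helper for illustration; assumes the standard relations
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error) and
    f = sqrt(eta_p^2 / (1 - eta_p^2)).
    """
    eta_p_sq = (f_stat * df_effect) / (f_stat * df_effect + df_error)
    return math.sqrt(eta_p_sq / (1.0 - eta_p_sq))


# Values quoted in the response: F(1,26) = 19.318
f_value = cohens_f_from_f_stat(19.318, 1, 26)
print(f"Cohen's f = {f_value:.3f}")
```

      Under this conversion, the quoted F(1,26) = 19.318 corresponds to an f of roughly 0.86, in the same range as the f = .87 cited above. The sample-size step itself depends on further settings in the power software (for example, the assumed correlation among repeated measures) that are not given here, so it is not reproduced.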

      Finally, we revised the discussion to ensure that interpretations are aligned with our findings, by including a limitations section and a more explicit note regarding theories of memory.

      Episodic memory is a multifaceted construct that matures over time through the integration of the what–who–where–when information. The present study does not aim to demonstrate the presence of a fully developed episodic memory system at birth; rather, it shows that specific features of episodic-like processing (i.e., what–who) are already bound from the first days of life. Future studies may track the progressive integration of additional episodic-related components leading to a mature episodic memory system.

      Reviewer #1 (Recommendations for the authors):

      (1) I wonder why a control condition with same-speaker interference was not included. Adding such a control would allow you to directly test whether the observed effects are truly due to speaker changes, rather than other acoustic or procedural factors. If it is not feasible to add this condition, please discuss its absence explicitly and clarify how it impacts the interpretation of your findings.

      We thank the reviewer for raising the issue of a same-speaker interference control. A similar control has been tested previously using a closely related paradigm, showing that recognition does not persist when neonates hear another word produced by the same speaker during the retention period (Benavides-Varela et al., 2011). As noted in the manuscript, there were some methodological differences between that study and the current one. Most importantly, in the present study familiarization was reduced (from ten to five blocks) and the retention interval increased (two to three minutes), making the current paradigm more demanding. We reasoned that, if newborns forgot the word in the prior (less challenging) study, they would also forget it here if a same-speaker interference control had been implemented. With the current manipulation, despite the difficulty of the paradigm, the recognition response was observed. This pattern suggests that speaker change, rather than general procedural factors, is central to the observed effect. Given these prior findings and the ethical constraints of testing newborns, we believe that adding a new same-speaker control is not essential. We have now made this rationale more explicit in the manuscript (discussion section, limitations, p. 16), hoping that this clarification will make our methodological choices clearer.

      (2) It wasn't clear if Group A and Group B have the same number of infants, and whether they were randomly assigned. Please specify.

      Participants were initially assigned to Group A or Group B in a counterbalanced way to maintain comparable group sizes. Due to attrition and subsequent exclusion for various reasons (e.g., low signal quality, fussiness, technical issues), the final sample consisted of 17 infants in Group A and 15 infants in Group B. We have now specified this information in the revised manuscript (p. 20).

      (3) Please specify the exact number of fNIRS channels assigned to each region of interest (ROI), as it is currently difficult to map the channel numbers in Supplementary Table 2 to the optode montage shown in Figure 2. Additionally, report the percentage of usable channels after quality control.

      The inferior frontal gyrus left and right ROIs comprised 4 channels each, the superior temporal gyrus left and right ROIs 5 channels each, and the parietal lobe left and right ROIs 7 channels each. This information has been added to the methods section, along with the average number of channels contributing to each ROI after data rejection and the percentage of channels rejected throughout the recording (p. 23).

      (4) Also, a formal power analysis to justify your sample size would be helpful for evaluating the reliability of your findings and is increasingly expected in developmental neuroimaging research.

      Thanks for this suggestion. As stated in the public response, we agree that power analyses constitute an important component of methodological rigor in the field. In our case, a formal calculation was initially attempted based on the effect size reported in a closely related ANOVA-based study (Benavides-Varela et al., 2011; Study 2: Word recognition after intervening melodies, main effect for the comparison same vs novel word [F(1,26) = 19.318; p<0.0001; effect size f = .87]).

      However, inputting this information into dedicated software (G*Power; α = 0.05; power range = 0.80-0.95; number of groups = 1; number of measurements = 2) leads to an estimated sample size of N = 5 to 7, which is unrealistically small and not representative of current research standards in the field. A proper formal power analysis for the LMM is otherwise hard to perform, as we lack information about the expected variance and random-effects structure. We therefore aligned our sample size with prior newborn studies using similar stimuli and experimental designs, and with fNIRS studies in newborns and infants (for recent meta-analyses see De Roever et al., 2018; Boek et al., 2023; Gemignani et al., 2023; which examined studies with mean N = 24; N range = 1-86 and sample sizes often including various conditions and groups). Note also that our design includes a within-subject comparison, and our analytical approach models subject-level variance and handles unbalanced datasets and missing data (which are common in infant studies), thereby improving statistical sensitivity.

      (5) The manuscript references episodic memory explicitly in the abstract and introduction, emphasizing the role of speaker identity in enabling episodic-like memory from birth. However, this concept is not sufficiently addressed or delineated in the discussion. Episodic memory is generally understood as recalling events with contextual details, involving complex integrative processes that extend beyond simple recognition of auditory stimuli. Your paradigm demonstrates memory retention over a few minutes but does not provide strong evidence for the hallmark features of episodic memory, such as contextual binding or autobiographical recollection. Moreover, infant speech recognition and memory formation in early life are influenced by the immediacy and complexity of sensory input, which may not necessarily engage fully developed episodic systems. Clarifying these distinctions and making sure your interpretations and claims are consistent with them would enhance the conceptual clarity of the manuscript.

      We agree that episodic memory is a multifaceted construct that, in its mature form, entails the ability to retrieve past events with contextual detail, typically involving autobiographical recollection and the integration of what–who–where–when information (Tulving, 1993). Our study does not aim to demonstrate the presence of a fully developed episodic memory system at birth, nor do we claim that newborns’ performance satisfies all hallmark criteria of mature episodic memory.

      Here, we focused on sensitivity to speaker identity as a contextual dimension relevant to memory formation. Within this narrower sense, both the patterns of activation and the localization of the response provide evidence for early source–content binding (i.e., what–who), which can be considered a foundational aspect of episodic-like processing. Following up on this foundational step, future studies may track the gradual integration of additional aspects (where–when), ultimately leading to the maturation of a fully functional human episodic memory system.

      We have now clarified this point in the revised manuscript (p. 17).

      (6) Please add a dedicated limitations section. This should address the group-dependent nature of your main effects, the timing-specific recognition response, and any other methodological constraints that may impact the generalizability of your results.

      We thank the reviewer for this comment. We have done our best to set out the limitations of our study in the text (p. 16), specifically regarding the reasons for the lack of a control condition and the effects of frequent changes in sleep states in newborns.

      (7) Consider revising sections where claims may be overstated, particularly regarding episodic memory and evolutionary implications.

      These sections have now been revised in the abstract and throughout the manuscript to ensure that interpretations remain proportionate to the data and consistent with current theoretical frameworks.

      Reviewer #2 (Public review):

      Summary:

      Previous studies by some of the authors of the current manuscript showed that healthy human newborns memorize recently learned nonsense words. They exposed neonates to a familiarization period (several minutes) in which multiple repetitions of a bisyllabic word were presented, uttered by the same speaker. Then they exposed neonates to an "interference period" in which newborns listened to music or to the same speaker uttering a different pseudoword. Finally, neonates were exposed to a test period in which they heard the familiarized word again. Interestingly, when the interference was music, recognition of the word remained. Word recognition was measured using the NIRS technique, which estimates regional brain oxygenation at the scalp level. Specifically, the brain response to the word in the test was reduced, unveiling a familiarity effect, while an increase in regional brain oxygenation corresponds to the detection of a "new word" due to a novelty effect. In previous studies, music did not erase the memory traces for a word (familiarity effect), while a different word uttered by the same speaker did.

      The current study aims at exploring whether and how word memory is interfered with by other speech properties, specifically changes in the speaker, while young children can distinguish speakers by processing the speech. The authors' main hypothesis anticipates that new speaker recognition would produce less interference in the familiarized word because somehow neonates "separate" the processing of both words (familiarized uttered by one speaker, and interfering word, uttered by a different speaker), memorizing both words as different auditory events.

      From my point of view, this hypothesis is interesting, since the results would contribute to estimating the role of the speaker in word learning and speech processing early in life.

      Strengths:

      (1) New data from neonates. Exploring neonates' cognitive abilities is a big challenge, and we need more data to enrich the knowledge of the early steps of language acquisition.

      (2) The study contributes new data showing the role of speaker (recognition) on word learning (word memory), a quite unexplored factor. The idea that neonates include speakers in speech processing is not new, but its role in word memory has not been evaluated before. The possible interpretation is that neonates integrate the process of the linguistic and communicative aspects of speech at this early age.

      (3) The study proposes a quite novel analytic approach. The new mixed models allow exploring the brain response considering an unbalanced design. More than the loss of data, which is frequent in infants' studies, the familiarization, interference and learning processes may take place at different moments of the experiment (e.g. related to changes in behavioural states along the experiment) or expressed in different regions (e.g. related to individual variations in optodes' locations and brain anatomy).

      Weaknesses:

      I did not find major weaknesses. However, I would like to have more discussion or explanation on the following points.

      (1) It would be fine to report the contribution of each infant to the analysis, i.e. how many good blocks, 1 to 5 in sequence 1 and 2, were provided by each infant.

      (2) Why did the factor "blocknumber" range from 0 to 4? The authors should explain what block zero means and why not 1 to 5.

      (3) I would suggest attempting to integrate the changes in brain activity across the 3 phases. That is, whether changes in familiarization relate to changes in the test and interference phases. For instance, in Figure 2, the brain response distinguishes between same and novel words that occurred over IFG and STG in both hemispheres. However, in the right STG there was no initial increase in the brain response, and the response for the same was higher than the one for novels in the 5th block.

      (4) Similarly, it is quite amazing that the brain did not increase the activity with respect to the familiarization during the interference phase, mainly over the left hemisphere, even if both the word and speaker changed. Although the discussion considers these findings, a more integrated discussion of the detection of novel words and the detection of a novel speaker over time would be beneficial.

      Appraisal:

      The authors achieved their aims because the design and analytic approaches showed significant differences. The conclusions are based on these results. Specifically, the hypothesis that neonates would memorize words after interference, when the interfering speech is pronounced by a different speaker, was supported by the data in blocks 2 and 5, and the potential mechanisms underlying these findings were discussed, such as separate processing for different speakers, likely related to the recognition of speaker identity.

      I think the discussion is well-structured, although I may suggest integrating the changes across the three phases of the study. Maybe comparing with other regions, not related to speech processing.

      Evaluating neonates is a challenge because their physiology is constantly changing. For instance, in 9 minutes, newborns may transition between different behavioral states and experience different physiological needs.

      We thank the reviewer for their constructive and positive appraisal of our work and for drawing attention to points that benefited from further clarification or discussion in the manuscript.

      In the following, we address each point in turn, using the numbering of the reviewer’s identified concerns.

      (1) In the Methods section (“Data Processing and Analysis”, p. 22), we have added detailed information about the number of data points contributed by each infant to the analyses.

      (2) The factor “blocknumber” ranged from 0 to 4 for statistical purposes, allowing Block 0 to serve as the reference (intercept) in the model. This coding facilitated the interpretation of parameter estimates. We now clarify this in the revised manuscript (p. 7).

      (3) Thanks for this relevant suggestion. In the Discussion, we now explicitly discuss the relationship across phases. We also acknowledge that a thorough examination of these issues lies beyond the scope of the present study, as it will require future work based on multivariate and connectivity analyses.

      (4) We thank the reviewer for this comment. In the revised manuscript, we have expanded the Discussion to clarify the absence of a strong novelty response during interference. The discussion highlights how the temporal properties of the hemodynamic response and the functional demands of each phase jointly shape the observable fNIRS signal in newborns, with purely sensory novelty effects likely increasing with maturation.

      Finally, we agree that evaluating the transitions between sleep states can further strengthen and clarify the results obtained in the present study. This has now been added as one of the limitations of this study.
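
      The reference coding described in response (2) can be illustrated with a minimal sketch. It assumes, purely for illustration, that block number is treated as a categorical factor with treatment (dummy) coding and no other predictors; the function name and the toy values are hypothetical and are not taken from the study. Under that assumption, the intercept equals the mean response in Block 0, and each remaining coefficient is the difference between a later block and Block 0.

```python
from statistics import mean


def treatment_coded_estimates(responses_by_block):
    """Least-squares estimates for a one-factor model with Block 0 as reference.

    With treatment coding and a single categorical predictor, the intercept is
    the mean of the reference level (Block 0) and every other coefficient is
    the difference between that block's mean and the reference mean.
    """
    intercept = mean(responses_by_block[0])
    effects = {
        block: mean(values) - intercept
        for block, values in responses_by_block.items()
        if block != 0
    }
    return intercept, effects


# Hypothetical toy responses for blocks coded 0 to 4 (not data from the study).
toy = {
    0: [1.0, 3.0],
    1: [2.0, 4.0],
    2: [0.0, 2.0],
    3: [1.0, 1.0],
    4: [3.0, 5.0],
}

intercept, effects = treatment_coded_estimates(toy)
print("intercept (Block 0 mean):", intercept)
print("differences from Block 0:", effects)
```

      Coding the first block as 0 rather than 1 is what lets the intercept be read directly as the response in the first block; the full mixed model in the manuscript includes further terms that this sketch deliberately omits.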

    1. Author response:

      [These author responses are to reviews from another journal.]

      Reviewer #1:

      This manuscript investigates the behaviour of a variety of clock proteins in cultured cells when epitope tagged and transiently expressed, and tries to draw general implications for the endogenous function of circadian clock proteins.

      Clock proteins are expressed at low levels in most cells, and so the clock interacting proteins (other kinases, phosphatases, ubiquitin-conjugated enzymes, etc.) are likewise probably at low abundance. Over-expression of one or two or even three components of a multicomponent system is going to produce odd and obscure non-physiological imbalances. The authors do not extend detailed study of these imbalances to more physiologic levels so the importance of their observations to clock function is not clear, and importantly, they are not tested in more biologically relevant models.

      To study the function of components within a system, the steady state must be perturbed in one way or another. This can be achieved through pharmacological treatment, mutagenesis, downregulation, or overexpression. Such interventions are inherently non-physiological, and the relevance of the resulting observations must therefore be carefully validated.

      In our study, the purpose of PER2 overexpression was to investigate its subcellular dynamics in the absence and presence of CRYs, specifically CRY1. This is far less trivial than it might appear at first glance, because our data clearly show that PER2 overexpression triggers, within 24 h, the accumulation of endogenous CRY1 (Fig. 1A), due to PER2-mediated stabilization of CRY1 (Fig. 4). PER2 overexpression also induces the accumulation of endogenous PER1, CK1, and BMAL1 (Fig. 2).

      This effect was not considered in previous studies, such as Yagita et al. (2002), in which PER2 subcellular localization was assessed at a single time point following transient transfection. Yagita et al. found roughly equal proportions of cells with PER2 exclusively in the nucleus, exclusively in the cytoplasm, or distributed between both compartments. Such extreme cell-to-cell variability cannot be explained solely by PER2’s shuttling dynamics, as that would imply synchronous export in one cell and synchronous import in another.

      Our time-resolved analysis of DOX-induced PER2 expression strongly suggests that the variability reported by Yagita et al. reflects a heterogeneous population of unsynchronized cells at different temporal stages along a trajectory from cytoplasmic PER2 (unbound) to nuclear PER2 fully saturated with CRYs (bound), owing to stabilization of endogenous CRYs. Similarly, Öllinger et al. (2014) analyzed PER2 nuclear export in cells constitutively expressing PER2-Dendra. Under such steady-state conditions, PER2-Dendra is already in complex with endogenous CRYs. The slow export rate and lack of dependence on additional CRY1 expression therefore likely reflect export of the complex, which is intrinsically slow.

      Thus, prior to our work, no data on the true shuttling dynamics of PER2 were available.

      Importantly, our results show not only that CRY1 promotes nuclear accumulation of PER2 (as reported by Öllinger et al.) but also that, conversely, PER2 promotes cytosolic accumulation of CRY1, depending on their expression ratio. Since CRY1 is predominantly nuclear and PER2 predominantly cytosolic, and because a PER2 dimer can bind one or two CRY1 molecules, our data suggest that the shuttling equilibrium depends on PER2 saturation state: a PER2 dimer bound to one CRY1 remains cytosolic, whereas a dimer bound to two CRY1 is nuclear.

      These observations are novel and have not been reported previously. They were only possible through time-resolved analysis of overexpressed proteins.

      A number of the findings are confirmatory rather than novel - the phosphorylation-regulated nuclear-cytoplasmic shuttling of CK1 and PER proteins is long known, and it's not clearly stated what is novel here. 

      We acknowledge prior work by Milne et al. (2001), who showed that kinase-dead CK1 is predominantly nuclear and that prolonged treatment with leptomycin B (16 h) enhances its nuclear localization. We cite this study at the beginning of the relevant paragraph. While we confirm these earlier observations, our work extends them in several important and novel ways:

      (1) Rapid dynamics of CK1 localization – We show that pharmacological inhibition of CK1 with PF670 induces rapid (within 1 h) depletion of CK1δ from the centrosome, accompanied by nuclear accumulation and elevated CK1δ levels. These kinetics have not previously been reported. We also show that proteasome inhibition with MG132 enhances centrosomal staining, indicating that centrosomal binding sites are not saturated. Together, the data show that CK1δ equilibrates rapidly between its binding partners.

      (2) Integration of localization with protein stability – We relate the known localization patterns of WT CK1 and the kinase-dead mutant K38R to CK1 degradation dynamics and further compare them to the tau-like kinase mutant CK1δ-R178Q. This integration of subcellular localization data with turnover mechanisms provides new mechanistic insight.

      (3) Comprehensive regulatory model – In the revised manuscript, we now include a schematic summarizing how CK1δ is posttranslationally regulated via subcellular shuttling, nuclear degradation, and dynamic interactions with binding partners (Figure EV5C). To our knowledge, such a comprehensive view of CK1δ regulation, linking localization, stability, and partner association, has not been presented before.

      We believe these additions clearly distinguish our findings from prior reports and highlight the novel aspects of our study.

      The formation of PER and CRY and CK1 complexes likewise is well established. The finding that formation of multiprotein complexes stabilizes otherwise unstable over-expressed proteins is interesting but not novel.

      We fully agree that the existence of PER–CRY–CK1 complexes is well established. It is also known that PER2 stabilizes CRY1 by occupying the FBXL3 binding site and that CRY1 promotes the nuclear accumulation of PER2. We do not present these established interactions as novel findings.

      Our novel contribution, as outlined above, is the discovery that the shuttling and subcellular localization of PER2 and CRY1 are mutually dependent on their expression ratio. Specifically, we show for the first time that the steady-state shuttling distribution of PER2 alone is cytosolic due to its rapid nuclear export, whereas CRY1 is predominantly nuclear (known). Given that CRY1 facilitates the nuclear import of PER2 (known) and that a PER2 dimer can bind either one or two CRY1 molecules, our data showing that cytoplasmic PER2-CRY1 foci contain less CRY1 than nuclear foci lead us to conclude that cytoplasmic PER2 complexes contain one CRY1 molecule, while nuclear complexes contain two.

      This model provides a mechanistic explanation for the distribution of PER2 between the cytosol and nucleus and for the relatively lower cytosolic CRY1 levels. Most importantly, we further show (for the first time) that CK1-mediated phosphorylation of PER2 displaces CRY1. This phosphorylation event would produce PER2 dimers with one or no CRY1 bound, promoting their export to the cytosol. We believe this represents a novel and potentially important mechanism for regulating circadian clock function.

      The results from many of the imaging assays are not quantitated, and the figures often show single cells. It's hard to draw statistical significance from these.

      The phenotypes we report here are the result of multiple technical and biological replicates (n > 3). Image analysis and statistical analysis were performed when required. We show additional examples in the EVs.

      There are a number of phenomena seen whose physiological relevance is unclear. In figure 1, forced over-expression of CRY1 and PER2 leads to formation of nuclear foci. It is unlikely these foci form at non-overexpressed levels, and so the general interest and relevance is not high nor investigated. This reduces the impact of the finding.

      It has been shown that PERs and CRYs do not form thermodynamically stable, large (detectable) foci under physiological conditions, as we have stated in the manuscript. Whether these proteins have the propensity to form smaller, more dynamic structures of physiological relevance is an interesting question that could be explored elsewhere, but it is not relevant to our study. In our work, these foci are simply convenient markers for analyzing the interaction and subcellular (co)localization of clock proteins under investigation. In the revised version, we have kept the analysis of these foci and the discussion of their potential relevance to a minimum in order to avoid confusion and unnecessary discussions.

      The finding that CK1δ is kept in the dephosphorylated state by binding to PER has been established previously by Johnson and colleagues and should perhaps be mentioned (Qin, JBR 2015; doi: 10.1177/0748730415582127).

      There is clearly a misunderstanding here. Qin et al.’s data show that, in a cell-free system, CK1ε phosphorylates PER2 and also autophosphorylates its C-terminal tail (autoradiograph, Fig. 1E).  

      However, because PER2 phosphorylation is carried out by CK1ε that is tightly anchored to PER2, there is competition between PER2 phosphorylation and tail autophosphorylation. As a result, the kinetics of tail phosphorylation are slower (Fig. 3B and quantification in C) than those observed with free CK1ε (as seen in the presence of the p53 substrate, Fig. 3A,C). We believe that this is also happening in the cell.

      Author response image 1.

      Our data, in contrast, address a different point. It has been known from the Virshup lab for decades that CK1δ/ε undergo futile cycles of (auto)phosphorylation and dephosphorylation, resulting in an active, dephosphorylated kinase in cells because cellular phosphatases are more efficient than CK1 autophosphorylation. We now show that CK1δ is also efficiently dephosphorylated when bound to PER2 (Fig. 3). Nevertheless, despite dephosphorylation of PER2-bound CK1δ, PER2 itself becomes hyperphosphorylated, indicating that cellular phosphatases act differently on these two substrates. To clarify this point, we inhibited phosphatases with calyculin A (CalA). Under these conditions, both PER2 and PER2-bound CK1δ became efficiently hyperphosphorylated (new Fig. 3).

      The degradation of kinase-active but not inactive CK1 is only shown here with 50-fold overexpressed protein so it's interesting, but the relevance to circadian biology is not made clear. The fact that over-expressed CK1 is degraded primarily in the nucleus is interesting, but needs further characterization - is this affected by the epitope tag? Is it true of endogenous CK1 or only over-expressed CK1? Is this not seen with e.g. other forms of CK1, e.g. lacking the C-terminus?

      The observation that unassembled kinase is rapidly degraded is most clearly demonstrated by overexpression experiments. However, Fig. 3 shows that overexpression of CRY1 and PER2 leads to the accumulation of elevated levels of endogenous CK1δ (untagged), indicating that endogenous kinase is likewise degraded in the absence of a stabilizing binding partner. In addition, we present data showing that overexpression of tagged CK1δ reduces the levels of endogenous, untagged CK1δ, further supporting the conclusion that unassembled endogenous CK1δ is unstable and subject to degradation.

      Further characterization of the CK1 degradation pathway is of considerable interest and could form the basis of a separate study, particularly to identify the components that mediate activity-dependent nuclear export and activity-dependent nuclear degradation. The Δ-tail kinase is expressed at very low levels, although interpretation is complicated by the possibility that this reflects pleiotropic effects.

      The final figure, showing that nuclear CK1 is the form responsible for shortening rhythms, is interesting. Is this because massive increases in nuclear CK1 alter PER, or BMAL/CLOCK, or proteasome activity?  

      Our data show that cells expressing either nuclear or cytosolic CK1 are viable, proliferate normally, and maintain a functional circadian clock. Therefore, overexpression of the kinase does not produce pleiotropic effects.

      To assume it's due to PER phosphorylation is in disagreement with the studies of Meng et al. Neuron 2008 DOI 10.1016/j.neuron.2008.01.019.

      The data are not in disagreement with Meng et al.; in fact, they align quite well. Meng et al. showed that CK1ε-tau shortens the circadian period, which we had also previously reported for CK1δ-tau-like (Marzoll et al., 2022). We now demonstrate that CK1δ-tau-like is enriched in the nucleus, contributing to its period-shortening phenotype. Furthermore, we show that active CK1δ (but not CK1δ-K38R) promotes cytoplasmic accumulation of PER:CRY complexes, consistent with PER2 degradation in the cytosol as described by Meng et al.

      Taken together, these findings suggest that PER proteins acquire their CK1 in the nucleus, and this interaction determines the circadian period length. Following a time delay—set by the kinetics of PER2 phosphorylation—PER2:CRY complexes are exported to the cytosol along with their bound CK1, where they are subsequently degraded.

      Reviewer #2:

      Interactions between the circadian clock proteins PER1/2 with CK1d/e and CRY1/2 influence each of their stability, subcellular localization, and activity, as countless studies over the last two decades have shown. However, many questions still remain, especially in light of newer models of the transcription-translation feedback loop (TTFL) in which the repression phase relies on two distinct mechanisms, a phosphorylation-dependent displacement of the transcription factor by CK1-PER-CRY complexes from DNA early in repression, and a CRY1-dependent sequestration of the transcription factor activation domain later in repression. In particular, questions remain about mechanisms triggering nuclear entry/export and activity of these proteins in the cytoplasm and nucleus.

      Here, the authors utilize a system of induced and/or transient overexpression of proteins with or without fluorophores to track subcellular localization, stability, and interactions. As the authors point out throughout the manuscript, the overexpression of these clock proteins often causes them to behave differently from the endogenous proteins. It looks as though the authors have done their best to account for these changes, and they have certainly been rigorous in pointing them out, but there is concern that some of the conclusions may be influenced by this overexpression. For example, the relevance of work related to the overexpression-dependent foci is unclear.

      Same answer as to Reviewer 1: It has been shown that PERs and CRYs do not form thermodynamically stable, large (detectable) foci under physiological conditions, as we have stated in the manuscript. Whether these proteins have the propensity to form smaller, more dynamic structures of physiological relevance is an interesting question that could be explored elsewhere, but it is not relevant to our study. In our work, these foci are simply convenient markers for analyzing the interaction and subcellular (co)localization of the clock proteins under investigation. In the revised version, we have kept the analysis of these foci and the discussion of their potential relevance to a minimum in order to avoid confusion.

      The findings that the stability of the kinase depends on localization, its intrinsic activity, and interaction with PER2 are interesting and important. Use of the CKBD deletion to show that CK1 stabilization depends on its anchoring interaction with PER2 is a nice touch. The authors bring up an excellent point that most of the potential phosphorylation sites on PER1 and PER2 have not been functionally characterized aside from the phosphoswitch mechanism. Their observation that CK1 eventually induces cytoplasmic localization of the CK1-PER-CRY1 complex and the release of CRY1 is intriguing. In particular, the finding that pretreatment of PER2 with CK1 in vitro blocked its ability to interact with CRY1 is very interesting. However, the absence of mechanistic data to explore this in more detail limits the impact of this conclusion. Using the system they have established here to identify the site(s) on PER2 and/or CRY1 that lead to this would help to solidify this work and increase its impact. Overall, there are some interesting findings here but the inclusion of some competing viewpoints and mechanistic data would strengthen the impact of the work.

      Major

      (1) The characterization of the tau-like CK1 mutant R178C as less active than the wild type enzyme is not entirely correct: it is less active on the FASP region as described, but it has increased activity on S478 in the phosphodegron that is independent of inhibition from the FASP region (Gallego et al. PNAS, 2007 and Philpott et al. eLife, 2020). It is still possible that some of the period shortening effects of the mutant could arise from enhanced nuclear accumulation, but the oversimplified description of the mutant as less active should be corrected.

      In the revised version, we discuss that the enhanced nuclear localization of the Tau-like kinase may contribute, at least in part, to period shortening, similar to how forced nuclear overexpression of wild-type kinase also shortens the period. We emphasize, however, that CK1 Tau is compromised in its priming-dependent activity, whereas its priming-independent activity is context-specific and enhanced toward the β-TrCP site.

      (2) One of the main conclusions from the paper, that CK1 induces cytoplasmic localization of the CK1-PER2-CRY1 complex and subsequent release of CRY1, would be strengthened significantly by identifying the phosphorylation site(s) responsible for the cytoplasmic localization of the complex and the release of CRY1. The system they have developed here seems ideal to identify these sites.

      We fully agree with the reviewer. We substituted the known phosphorylation sites in PER2 surrounding the CRY-binding domain, but this had no effect on the phosphorylation-dependent release of CRY1. Therefore, a more systematic analysis will be required, including the possibility that phosphorylations in CRY1 itself may contribute. To this end, we are generating PER2 and CRY1 variants in which all Ser/Thr residues are replaced by Ala. Using these constructs alongside the wild-type versions, we will use PCR to systematically create hybrids in which specific regions containing phosphorylation sites are exchanged.

      Nevertheless, this will require considerable time and effort, and we believe this investigation exceeds the scope of the present manuscript and will address it in future work.

      (3) The concept of delayed release of CRY1 presented here is an interesting one. It's unclear why the authors have also not incorporated prior findings (Ukai-Tadenuma et al. Cell, 2012, Koike et al. Science, 2012) that peak levels of CRY1 are expressed in a later phase than CRY2, PER1, and PER2. It seems like figure EV6 should reflect the observation that CRY2 is the predominant cryptochrome present during early repression (Koike et al. Science, 2012).

      The reviewer is absolutely right: the expression phases of CRY1, CRY2, PER1, and PER2 are important. I have recently discussed these issues in detail in a News & Views article in The EMBO Journal, commenting on a paper by Smyllie et al. In this News & Views article, I discuss that the presently available data suggest that CRY1 is always present throughout the circadian cycle and keeps circadian transcription partially repressed even at peak phases of expression. In the revised version, I refer to these publications, including those mentioned by the reviewer. However, I would like to keep the model presented in the supplementary figure as simple as possible and specifically focused on the work presented in this manuscript, rather than presenting a comprehensive conceptual model of the circadian clock.

      (4) The model presented in figure EV6 and described throughout the text shows that PER-CRY complexes interact with CK1 in the nucleus, and not in the cytoplasm prior to nuclear entry. Prior work on endogenous protein complexes has shown that CK1-PER-CRY complexes exist in the cytoplasm very early on in the repression phase (Aryal et al. Mol Cell, 2017; ref. 14 in the manuscript). Work by Sancar and colleagues (Cao et al. PNAS, 2020) also shows with endogenous proteins that CK1d has a circadian pattern of nuclear entry (or possibly retention) concomitant with PER2 that is dependent on the presence of PERs and CRYs. Together, these data seem to be inconsistent with your model.

      We think the data are not inconsistent. The recent Smyllie et al. paper in EMBO Journal shows that PER2 is present in both the cytosol and the nucleus at all times when it is expressed, but cytosolic PER2 is not saturated with CRY, which is more nuclear. Our data demonstrate that PER2 shuttles between the cytosol and the nucleus depending on its occupancy with CRYs (see schematic Fig. 1). Occupancy, in turn, depends on expression levels and binding affinities, including those of CRY2 and PER1. Consequently, PER2 complexes could shuttle continuously throughout the circadian cycle, either because they are not saturated with CRYs (owing to the balance between expression levels, freely available CRY, and binding affinity) or, later in the cycle, because CRYs are displaced by phosphorylation. If PER2 acquires casein kinase in the nucleus early in the cycle, it will shuttle out to the cytosol together with the bound CK1. We believe this does occur, but early in the circadian cycle the saturation of PER2 with casein kinase is likely to be very low due to the limited availability of CK1 in the nucleus. I am aware that not everyone will share this interpretation point by point, but discussing it at greater length and in more detail exceeds the scope of the present manuscript.

      Reviewer #3:

      This manuscript by Serrano and co-workers is a tight body of work that provides much needed insights into the regulation of clock proteins by CK1D, and into the regulation of CK1D itself. While the whole paper relies on artificial overexpression of chimeric/tagged proteins that may have significant differences in the function, the stability and subcellular distribution of the endogenous proteins they are supposed to model, this limitation has been clearly stated by the authors, and their study nevertheless provides important insights.

      While the authors have specified which Ck1d isoform (Ck1d1) they are overexpressing in their model cell lines, they may have thought to consider that the overexpression of one Ck1 homologue may affect the endogenous expression of the other homologues and their isoforms, e.g. Ck1d1 overexpression may cause an increase in Ck1d2 or Ck1e, which would in turn affect the conclusions.

      We show in revised Fig. 3 that overexpression of CK1δ1 reduces the expression of endogenous CK1δ1/2. This is consistent with our prediction that overexpressed and endogenous CK1 (including CK1ε) compete for the same stabilizing binding partners, leading to rapid degradation of unassembled kinases.

      Moreover, the antibody they used for endogenous Ck1d (which is ab85320, also mentioned as AF12G4 but that is the clone number, not the catalogue number) is discontinued and its specificity against Ck1d1, Ck1d2 or even the highly identical Ck1e, has not been clearly demonstrated. We know from Fig 3 that it can detect Ck1d1 but it would be great if the authors would provide additional evidence for the specificity of this antibody, for example by overexpressing Ck1d1/Ck1d2/Ck1e to see really which "endogenous" Ck1 we are seeing.

      Do the three bands seen, for example, in Fig 4A correspond to the different isoforms? This simple experiment would reinforce the conclusions.

      We show in the revised figure that the antibody recognizes CK1δ1 and CK1δ2, but not CK1ε. In U2OS cells, the antibody detects a single band (Figure); we do not know whether this represents predominantly one splice isoform or both, which are not resolved. However, this distinction is not relevant for our interpretation, because overexpression of tagged CK1δ1 reduces the expression of whichever endogenous kinase is present.

      There are no minor comments, as the figures, the figure legends and main text are all of good quality and ready for publication.

      Reviewers’ Responses to Point-by-Point Response to Peer Review 

      Referee #1:

      I appreciated the additional efforts by the authors to improve the manuscript. Unfortunately, the underlying approach of forced over-expression remains artifact-prone, and has been largely supplanted by readily available knockin and targeted mutagenesis methods. Over-expression may give clues, but I think more rigorous mechanistic validation is needed to make this compelling. I cannot support publication of this manuscript.

      Referee #2:

      In their response to reviewers, the authors make the valid point that the steady state of a system is usually perturbed to study it. In this study, they have used overexpression of the clock proteins PER2, CRY1 and CK1 to study their effects on subcellular dynamics and stability. In justifying this choice, they refer to several papers that similarly overexpressed at least one of these components, stating that their time-resolved approach brings novel insights. However, there is a missed opportunity here to translate any lessons learned from overexpression studies to a system where the proteins are expressed at physiological levels and stoichiometry.

      The authors reply to reviewer 1 stating that they conclude PER proteins acquire CK1 in the nucleus, but this does not account for other studies showing an apparent PER-CK1 complex in the cytoplasm during the early phases of repression and/or a pattern of PER-dependent nuclear entry of CK1 (Lee et al. 2001, Cell; Aryal et al. 2017 Mol Cell; Cao et al. 2021 PNAS). Given that all 3 of these studies were done with native expression levels, it seems incumbent upon the authors to demonstrate that their conclusions from the overexpression study are physiologically relevant by translating them in some way to a more native system. This also addresses a point made by reviewer 2, major concern 4 that was not satisfactorily addressed by the authors. Perhaps they could validate their hypothesis of PER shuttling and interactions with CK1 or CRY1 that alter this in a native system similar to Aryal or Cao et al. with the use of nuclear export inhibitors?

      The response to reviewer 2, major concern 1 is thoughtful and much appreciated. However, simplifying the effects of the tau mutation on CK1 as decreasing the rate of priming-dependent but not priming-independent phosphorylation is not quite accurate: the tau mutation also decreases the rate of priming-independent phosphorylation of S662 (in humans) (Philpott et al. 2020, eLife).

      Other papers appearing in this journal seem to all include at least one major new mechanistic insight. Although the authors do a diligent job in characterizing the overexpressed proteins in this system, some of their conclusions are at odds with prior studies of the system in more native conditions, so the potential impact of this work is unclear. To verify these conclusions or test new ones (i.e., that CK1 disrupts PER-CRY1 interactions), they should use their insights to generate mutations or make perturbations in a native system and demonstrate that they still hold.

      Referee #3:

      The authors have adequately addressed the reviewers' comments, and it is my opinion that the manuscript is ready for publication. It is true, as previously mentioned by other reviewers, that the evidence presented relies on overexpression, which for the other reviewers seems to preclude publication. However, I find this to be too strict an opinion.

      If the authors had indeed provided evidence using CRISPR-Cas9-mediated genetic manipulation and tagging/mutating endogenous genes for all their experiments, thereby providing more physiological evidence of how clock proteins interact, they would probably have submitted their manuscript to an alternative journal with a higher impact.

      As it stands, it is my opinion that, considering the evidence and limitations of the study, this manuscript is a good match for the journal.

      Author Rebuttal:

      Apologies for the delayed reply regarding our manuscript. In the meantime, we have added several new experiments which address the comments of the reviewers and more. These are now included as Figures 1C, EV3, 4D, 6E, 6F, EV6D, and EV7.

      Figure 1C reinforces our observations from Figure 1B showing that induction of stably-integrated PER2 also results in accumulation of endogenous CRY1 at a timescale that is compatible with the gradual localization of overexpressed PER2 into the nucleus.

      Figure EV3 addresses several technical comments from Reviewers #3 and #1, respectively: Figure EV3A shows that our CK1δ antibody recognizes CK1δ1 and CK1δ2, but not CK1ε. Figures EV3B and C clearly show that overexpression of our transgenic CK1δ results in decreased endogenous CK1δ, which further demonstrates the rapid turnover of active kinase.

      Figure 4D addresses the comment from Reviewer #2. We clearly show that CK1δ is not kept in a dephosphorylated state by binding to PER. In addition to our direct comment on this point, Figure 4D shows that CK1δ, regardless of whether it is expressed alone or in complex with PER2, is phosphorylated to a similar extent when the cells are treated with the phosphatase inhibitor CalA. As indicated in our direct response, we are more interested in the observation that cellular phosphatases act differently on PER2 compared to CK1δ despite both being in the same PER:CK1δ complex (as shown by the clear stabilization of overexpressed CK1δ by co-expression of PER2).

      Figures 6E, 6F, and EV6D demonstrate that our findings from overexpression systems also hold in a more physiological context, addressing comments from Reviewers #1 and #2. Figure 6E shows that dephosphorylation of PER2 leads to its relocalization from the cytosol to the nucleus, while Figure 6F analyzes the subcellular localization of PER2 in the context of a functional circadian clock in U2OS cells. The latter demonstrates that PER2 is predominantly nuclear early in the circadian cycle, but redistributes to the cytosol at later time points. We included these experiments in response to the reviewer’s request for a more physiological context. Since we are not a mouse lab, this cell-based system represents the most physiological model we can provide. Figure 6F shows the dynamics of endogenous PER2 in DEX-synchronized cells. At early time points, PER2 is predominantly nuclear, likely due to the incorporation of CRY1 to form the PER:CRY complex. At later time points, PER2 is redistributed between the cytoplasm and nucleus due to PER2 phosphorylation. Importantly, these results are consistent with and recontextualize the results from Liu et al. (Xie et al., PNAS, 2023) showing that hypophosphorylated PER2 at early time points post-DEX is predominantly nuclear, whereas hyperphosphorylated PER2, which appears later post-DEX, is predominantly cytoplasmic.

      Finally, Figure EV7 provides a model of how the subcellular distribution of CK1δ affects its assembly into the PER:CRY complex, emphasizing how the nuclear kinase enacts its role in the circadian clock.

      Response to Reviewers:

      We were disappointed by the categorical rejection of overexpression experiments. Without a specific discussion of why they would be inappropriate or not sufficient in the context of the work presented here, the blanket assertion that overexpression inevitably produces artifacts functions more as a rhetorical device than as a substantiated scientific argument. The fact that the term ‘physiological’ generally carries a positive connotation, whereas ‘overexpression’ is often perceived negatively, does not in itself justify the categorical rejection of experiments.

      While we appreciate that some reviewers may personally prefer alternative strategies, we believe that the suitability of any approach must be evaluated in light of the specific biological questions being addressed. I cannot see a single specific point in the reviewers’ responses indicating that any of our experiments yielded artificial results. It is true that targeted knock-in and mutagenesis methods are available; however, these approaches are simply not suited to the questions raised in this manuscript. We also fully agree that, whenever possible, insights from overexpression studies should be validated in systems with a functional clock where proteins are expressed at physiological levels. We did this using U2OS cells, and we note the compatibility of our results with those in the literature obtained using endogenously tagged constructs. We have cited several recent studies that have investigated the subcellular distribution and circadian dynamics of endogenous or endogenously tagged clock proteins in mice (Cao et al, 2021; Smyllie et al, 2016, 2022, 2025) and U2OS cells (Öllinger et al, 2014; Gabriel et al, 2021; Xie et al, 2023). While we cannot substantially expand on these previous observations, we confirm them in the revised version by demonstrating the nuclear-to-cytoplasmic relocalization of PER2 in U2OS cells over the course of a circadian cycle. In addition, we show that this process is, in principle, reversible: when CK1 is inhibited with PF670, overexpressed hyperphosphorylated cytosolic PER2 becomes dephosphorylated and accumulates in the nucleus.

      Overall, we consider our approach not only complementary but also essential, as it enables us to address two key questions that would otherwise be difficult or even impossible to resolve:

      (1) Mutual impact of PER2 and CRY1 on subcellular dynamics and the role of PER2 phosphorylation

      Evidence from mouse liver (Cao et al, 2021), mouse SCN (Smyllie et al, 2022, 2025), and U2OS cells (Xie et al, 2023) indicates that a substantial fraction of PER2 remains cytoplasmic throughout its expression cycle, even in the presence of CRY1, which promotes PER’s nuclear import. The mechanisms underlying this cytoplasmic retention remain unclear, and no circadian function has yet been attributed to the cytosolic PER2 pool. Our study addresses how PER2 abundance, phosphorylation state, and stoichiometry relative to CRY1 govern their interaction and subcellular dynamics. This is physiologically relevant because PER1/2 and CRY1/2 proteins oscillate in expression and degradation out of phase, such that their concentrations, stoichiometry, and phosphorylation state vary systematically over the circadian cycle. Transient transfection and inducible overexpression combined with time-lapse microscopy are essential here, as they uniquely allow us to modulate protein ratios and CK1δ levels and to resolve their dynamics.

      Previous work established that CRY1 is nuclear and promotes PER2 nuclear accumulation (Smyllie et al, 2022). Our data extend this by showing that subcellular distribution is determined by the CRY1:PER2 ratio. While CRY1 alone is nuclear, we show that PER2 alone is cytoplasmic due to rapid nuclear export. Mixed conditions reveal ratio-dependent shifts: at low CRY1-to-PER2 ratios, CRY1 relocalizes to the cytoplasm, whereas at high ratios, PER2 is retained in the nucleus. We explain this behavior by PER2 dimerization: dimers bound to two CRY1 molecules remain nuclear, while dimers bound to a single CRY1 localize to the cytosol. Such species can be expected to form in a physiological context depending on binding affinities and rhythmic expression levels and ratios across circadian time. Importantly, we show that CK1δ-mediated phosphorylation destabilizes PER2 and CRY1 interactions. From this, we infer that PER2 dimers with only a single bound CRY1 transiently form and accumulate in the cytosol, consistent with the lower CRY1-to-PER2 ratio that we observe in the cytosol and that has also been reported in the SCN (Smyllie et al, 2025). With continued phosphorylation, PER2 dimers lose CRY1 altogether, while the released CRY1 accumulates in the nucleus. We suggest that this mechanism supports and extends the late repressive phase of the circadian cycle. Recent data show that hypophosphorylated PER2 is predominantly nuclear, whereas hyperphosphorylated PER2 is largely cytoplasmic in mouse liver (Cao et al, 2021; Xie et al, 2023), linking our data to a physiological context.

      Taken together, these findings suggest a mechanism whereby stoichiometry, subunit composition, and CK1δ phosphorylation determine PER:CRY complex composition and localization. Crucially, these complexes and their dynamic relocalization could only be observed using inducible overexpression; knock-in strategies at endogenous levels would not be able to capture such states.

      (2) Posttranslational regulation and subcellular homeostasis of CK1δ and impact on the clock

      Previous work has shown that nuclear export of CK1δ depends on its kinase activity (Milne et al, 2001). Here, we further demonstrate that unassembled CK1δ is subject to degradation, with nuclear turnover accelerated by its catalytic activity. Thus, when evaluating the impact of CK1δ mutants on the circadian clock, one must consider not only kinase activity but also protein stability and subcellular distribution. We find that CK1δ availability for PER2 differs between cytosol and nucleus. In particular, nuclear CK1δ is limiting, and its abundance directly determines circadian period length. This is significant because subcellular CK1δ availability and posttranslational regulation have not previously been examined or incorporated into circadian clock models, as the kinase has been assumed to be non-limiting given its constant expression throughout the circadian cycle. Complex formation between CK1δ and PER is a well-established determinant of circadian timing, with CK1δ overexpression known to shorten period length. Our data explain why: the binding equilibrium between CK1δ and PER must be finely tuned. Previous studies suggested that PER associates with CK1δ in the cytosol and enters the nucleus as a PER:CRY:CK1δ complex (Lee et al, 2001; Aryal et al, 2017). Our data suggest that nuclear PER is not saturated with CK1δ. This is because levels of free, active CK1δ in the nucleus are low, owing to its rapid export or degradation by the nuclear proteasome, which limits its availability for PER binding.

      Our overexpression studies support this mechanism. NES-tagged CK1δ overexpression does not alter circadian period length, because it fails to increase nuclear CK1δ levels: Each PER molecule can coimport only one kinase, a process already occurring in wild-type cells, and the few co-imported molecules rapidly equilibrate with the nuclear pool, where they are subject to export or degradation. In contrast, NLS-tagged CK1δ overexpression directly increases nuclear kinase abundance by antagonizing export, thereby enhancing PER binding and shortening circadian period. This multilayered regulation of CK1δ stability and localization and its consequences for PER2 availability would not have been revealed without targeted overexpression. Our findings therefore fill a key knowledge gap and remain fully consistent with previous studies (Lee et al, 2001; Aryal et al, 2017; Cao et al, 2021).

      Conclusion: In sum, our findings are novel and physiologically relevant, aligning with data from mouse liver and SCN. While studies at strictly endogenous protein levels are important and necessary, perturbation of steady state is a standard strategy to uncover and observe novel mechanisms. Endogenous-level experiments would demand technically unrealistic systems (for example, even the simplest case, analyzing the subcellular dynamics of PER2 alone, would require cells lacking PER1, CRY1/2, and CK1δ/ε). Moreover, adjustment of PER2-to-CRY1 ratios cannot be achieved with stably integrated genes and of course not at physiological expression levels. Thus, inducible overexpression is not merely practical but currently the most feasible approach to dissect these dynamics. We complement our findings with data from U2OS cells with a functional clock, showing that the availability of nuclear CK1δ directly determines circadian period length. Although specific aspects of our extended model require further experimental validation, no published evidence contradicts it to date. Mechanistic discussions of the circadian clock have so far focused primarily on PER protein degradation. Our model broadens this perspective by incorporating CK1δ homeostasis, PER:CRY complex composition, subcellular localization, and their regulation by phosphorylation. In doing so, it provides a detailed framework to be critically tested and refined in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript presents a compelling new in vitro system based on isogenic co-cultures of human iPSC-derived hepatocytes and macrophages, enabling the modelling of hepatic immune responses with unprecedented physiological relevance. The authors show that co-culture leads to enhanced maturation of hepatocytes and tissue-resident macrophage identity, which cannot be achieved through conditioned media alone. Using this system, they functionally validate immune-driven hepatotoxic responses to a panel of drugs and compare the system's predictive power to that of monocyte-derived macrophages. The results underscore the necessity of macrophage-hepatocyte crosstalk for accurate modelling of liver inflammation and drug toxicity in vitro.

      The manuscript is clearly written and addresses a key limitation in liver organoid systems: the lack of immune complexity and tissue-specific macrophage imprinting. Nevertheless, several conclusions would benefit from a more careful interpretation of the data, and some important controls or explanations are missing, particularly in the flow cytometry gating strategies, stress marker validation, and cluster interpretations.

      Strengths:

      (1) Novelty and Relevance: The study presents a highly innovative co-culture system based on isogenic human iPSCs, addressing an unmet need in modelling immune-mediated hepatotoxicity.

      (2) Mechanistic Insight: The reciprocal reprogramming between iHeps and iMacs, including induction of KC-specific pathways and hepatocyte maturation markers, is convincingly demonstrated.

      (3) Functional Readouts: The application of the model to detect IL-6 responses to hepatotoxic compounds enhances its translational relevance.

      Weaknesses:

      (1) Several key claims, particularly those derived from PCA plots and DEG analyses, are overinterpreted and require more conservative language or further validation.

      We agree that PCA does not allow maturation trajectories to be followed. We had stated as a hypothesis that the co-culture was promoting maturation, which we later validated by examining the expression of key hepatocyte markers as well as by Pearson correlation with fetal hepatocytes.

      (2) The purity of sorted hepatocytes and macrophages is not convincingly demonstrated; contamination across gates may confound transcriptomic readouts.

      We agree and have highlighted and addressed this limitation in our discussion. Unfortunately, it is a limitation of bulk sequencing that a small amount of contamination might be present. However, the TPM values of ALB in the iMacs, for example, are extremely low, especially when compared to the hepatocytes, indicating that the level of contamination is likely to be very low. Likewise, the expression of CSF1R in the co-cultured iHeps is also extremely low. This has been included in Supp Fig 1F and G.

      (3) Stress response genes and ER stress/apoptosis signatures are not properly assessed, despite being potentially activated in the system.

      This has been included in Supp Fig 2C, where we’ve included the expression of ATF4, CASP3 and CASP9. Although there’s a significant difference in ATF4 expression between Day 0 and Day 7 iHep only/Co-culture, there is no significant difference between the Day 7 iHep only and Day 7 iHep Co-culture. There are no significant differences in CASP3 and CASP9 expression across all the samples.

      (4) Some figure panels and legends lack statistical annotations, and microscopy validation of morphological changes is missing.

      Although we agree that the morphology changes would be interesting, we think that this question is unfortunately outside the scope of our study. Although Kupffer cells are in direct contact with hepatocytes, they migrate from the liver parenchyma into the sinusoidal spaces where they primarily reside. We do not think that the morphology would add much to the paper, especially given that this is a 2D model.

      (5) The co-culture model with monocyte-derived macrophages is not fully characterised, making comparisons less informative.

      Although we agree that it would be interesting to look more closely at the monocyte-derived macrophage co-cultures as well, we think that this would be better suited to a future study, as the transcriptomic analysis would likely include confounding effects of patient-specific transcriptomic changes, and our primary focus was on developing an isogenic co-culture system.

      Reviewer #2 (Public review):

      Summary:

      This study builds on work by Glass and Guilliams showing that mouse Kupffer cells depend on the surrounding cells, including endothelium, hepatocytes, and stellate cells, for their identity. Herein, the authors extend the work to human systems. It nicely highlights why taking monocyte-derived macrophages and pretending they are Kupffer cells is simply misleading.

      Strengths:

      Many, including human cells, difficult culture assays, and important new data.

      Weaknesses:

      This reviewer identified minor queries only, rather than 'weaknesses' as such.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors establish a human in vitro liver model by co-culturing induced hepatocyte-like cells (iHEPs) with induced macrophages (iMACs). Through flow cytometry-based sorting of cell populations at days 3 and 7 of co-culture, followed by bulk RNA sequencing, they demonstrate that bidirectional interactions between these two cell types drive functional maturation. Specifically, the presence of iMACs accelerates the hepatic maturation program of iHEPs, while contact-dependent cues from iHEPs enhance the acquisition of Kupffer cell identity in iMACs, indicating that direct cell-cell interactions are critical for establishing tissue-resident macrophage characteristics.

      Functionally, the authors show that iMAC-derived Kupffer-like cells respond to pathological stimuli by producing interleukin-6 (IL-6), a hallmark cytokine of hepatic immune activation. When exposed to a panel of clinically relevant hepatotoxic drugs, the co-culture system exhibited concentration-dependent modulation of IL-6 secretion consistent with reported drug-induced liver injury (DILI) phenotypes. Notably, this response was absent when hepatocytes were co-cultured with monocyte-derived macrophages from peripheral blood, underscoring the liver-specific phenotype and functional relevance of the iMAC-derived Kupffer-like cells. Collectively, the study proposes this co-culture platform as a more physiologically relevant model for interrogating macrophage-hepatocyte crosstalk and assessing immune-mediated hepatotoxicity in vitro.

      Strengths:

      A major strength of this study lies in its systematic dissection of cell-cell interactions within the co-culture system. By isolating each cell type following co-culture and performing comprehensive transcriptomic analyses, the authors provide direct evidence of bidirectional crosstalk between iMACs and iHEPs. The comparison with single-culture controls is particularly valuable, as it clearly demonstrates how co-culture enhances functional maturation and lineage-specific gene expression in both cell types. This approach allows for a more mechanistic understanding of how hepatocyte-macrophage interactions contribute to the acquisition of tissue-specific phenotypes.

      Weaknesses:

      (1) Overreliance on bulk RNA-seq data:

      The primary evidence supporting cell maturation is derived from bulk RNA sequencing, which has inherent limitations in resolving heterogeneous cellular states and functional maturation. The conclusions regarding hepatocyte maturation are based largely on increased expression of a subset of CYP genes and decreased AFP levels - markers that, while suggestive, are insufficient on their own to substantiate functional maturation. Additional phenotypic or functional assays (e.g., metabolic activity, protein-level validation) would significantly strengthen these claims.

      We have added a discussion on the limitations of our study.

      (2) Insufficient characterization of input cell populations:

      The manuscript lacks adequate validation of the cellular identities prior to co-culture. Although the authors reference previously published protocols for generating iHEPs and iMACs, it remains unclear whether the cells used in this study faithfully retain expected lineage characteristics. For example, hepatocyte preparations should be characterized by flow cytometry for ALB and AFP expression, while iMACs should be assessed for canonical macrophage markers such as CD45, CD11b, and CD14 before co-culture. Without these baseline data, it is difficult to interpret the magnitude or significance of any co-culture-induced changes.

      We apologise for this oversight. Some of the markers were used to determine the purity of the iMacs before co-culture, but we omitted these plots for brevity. We have now added the purity plots in Supp Fig 2E, showing that the iMacs were more than 90% pure before co-culture. We acknowledge the concern about cross-contamination for bulk sequencing, and have added in Supp Fig 2G and H the expression of ALB in the iMac fraction, as well as the expression of CSF1R in the iHep fraction, showing minimal contamination with our gating strategy.

      (3) Quantitative assessment of IL-6 production is insufficient:

      The analysis of drug-induced IL-6 responses is based primarily on relative changes compared to control conditions. However, percentage changes alone are inadequate to capture the biological relevance of these responses. Absolute cytokine production levels - particularly in response to LPS stimulation - should be reported and directly compared to PBMC-derived macrophages to determine whether iMAC-derived Kupffer-like cells exhibit enhanced cytokine output. Moreover, the Methods section should clearly describe how ELISA results were normalized or corrected to account for potential differences in cell number, viability, or culture conditions.

      We apologise if this was unclear. The cytokine production from dosed cells was normalized based on the viability of cells measured from the same well.

      (4) Unclear mechanistic interpretation of IL-6 modulation:

      The observed changes in IL-6 production upon drug treatment cannot be interpreted solely as evidence of Kupffer cell-specific functionality. For instance, IL-6 suppression by NSAIDs such as diclofenac is well known to result from altered prostaglandin synthesis due to COX inhibition, while leflunomide's effects are linked to metabolite-induced modulation of immune cell proliferation and broader cytokine networks. These mechanisms are distinct from Kupffer cell identity and may not directly reflect liver-specific macrophage function. Consequently, changes in IL-6 secretion alone - particularly without additional mechanistic evidence or analysis of other cytokines - are insufficient to conclude that co-culture with hepatocytes drives the acquisition of bona fide Kupffer cell maturity.

      We fully agree with the reviewer and have highlighted this in our discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) GSE ID for RNA-seq data has not been provided.

      This has been included.

      (2) Line 291: Can the authors specify what they mean by "state-of-the-art"?

      What we mean here is what others in the field have also recently described. We have rewritten this to be clearer.

      (3) Lines 299-300: check sentence for grammar mistakes.

      We have rewritten and clarified this.

      (4) Figure 1B: The PCA does not really allow for following maturation trajectories. Also, all samples (day 3 Co-iHep, day 7 Co-iHep, day 7 iHep) look as if they cluster more or less together. Therefore, the conclusion drawn in lines 303-305 does not hold. Why is day 3 iHep not also shown here?

      We agree that PCA does not allow maturation trajectories to be followed. We had stated as a hypothesis that the co-culture was promoting maturation, which we later validated by examining the expression of key hepatocyte markers as well as by Pearson correlation with fetal hepatocytes.

      (5) Can the authors show that the cells that they are sorting in the double negative gate are indeed hepatocytes? Typically, these cells are big in cell size; therefore, showing the FSC/SSC gate would also be important.

      We have added the FSC/SSC gate in Supp Fig 1E to show that the populations have different sizes.

      (6) Can the authors provide microscopy pictures of iHeps, iMacs, and the co-cultured cells for the reader to appreciate whether the morphology of cells already changes during the co-culture experiments?

      Although we agree that the morphology changes would be interesting, we think that this question is unfortunately outside the scope of our study. Although Kupffer cells are in direct contact with hepatocytes, they migrate from the liver parenchyma into the sinusoidal spaces where they primarily reside. We do not think that the morphology would add much to the paper, especially given that this is a 2D model.

      (7) Please show expression of apoptotic and ER stress genes comparing Day7 iHeps and Co-iHeps, since genes such as c-Fos and Ppp2r3b can also be associated with cellular stress.

      This has been included in Supp Fig 2C, where we’ve included the expression of ATF4, CASP3 and CASP9. Although there’s a significant difference in ATF4 expression between Day 0 and Day 7 iHep only/Co-culture, there is no significant difference between the Day 7 iHep only and Day 7 iHep Co-culture. There are no significant differences in CASP3 and CASP9 expression across all the samples.

      (8) In addition to the genes shown in Figure 1E, could the authors extract a longer gene list of maturing hepatocytes and display them all in bar graphs or heatmaps, or similar? E.g., Albumin expression is shown later, but why not show it already here?

      There are not many differences in the canonical hepatocyte markers, which is why we chose to show only the interesting genes that were different. This can be seen in the later ALB expression plot, where there wasn’t a difference in ALB expression after 7 days of co-culture. Instead, we have included a new heatmap in Supp Fig 2B showing the top 40 genes that contribute to the similarity by Pearson correlation.

      (9) Along these lines, how do the authors ensure that they are culturing only hepatocytes and do not have a mixture of cells that may "dilute" the hepatocyte signature?

      Unfortunately, this is a limitation of our methodology, although the expression of key hepatic markers is routinely confirmed by qPCR to ensure that the majority of the cells are hepatocyte-like.

      (10) Lines 347-350: similar to the interpretation of the PCA for hepatocytes, this is a completely random interpretation. The expression of ALB in the co-cultured iMacs indicates that there are some hepatocytes that ended up in the macrophage gate.

      We agree and have highlighted and addressed this limitation in our discussion. Unfortunately, a small amount of contamination is a known limitation of bulk sequencing; however, the TPM values of ALB in the iMacs, for example, are extremely low, especially when compared to the hepatocytes, indicating that the level of contamination is likely to be very low. Likewise, the expression of CSF1R in the co-cultured iHeps is also extremely low. This has been included in Supp Fig 1F and G.

      (11) Figure 2D: Among the pathways shown, there are also stress pathways (acute phase response, HMGB1). Also for these cells, control of apoptotic and ER stress signatures is necessary.

      As mentioned, we have included some stress genes in Supp Fig 2C to address this.

      (12) Lines 385-386: Why would FCGRA3 indicate tissue residency? Is there literature to support this statement?

      CD16 is a marker often used to distinguish Kupffer cells from the surrounding cells, although it is also expressed by non-classical monocytes. We have clarified the text here (Lines 356-357).

      (13) Figure 3E: ALB and other genes were at the same or even lower levels expressed in D7 compared to D3. Why is that? Are the cells starting to de-differentiate after 7 days? Please discuss.

      This is a very interesting question that we were wondering about ourselves as well, although sadly we do not have an answer yet. We hypothesized that this might be due to the activation of cell proliferation/developmental programmes as the cells are kept together for longer, as shown by the expression of morphogens like OSM and IGF-2 after co-culture. We have added some discussion of this (Lines 532-540).

      (14) Line 459: Word "in" is double

      We thank the reviewer for catching this; it has been corrected.

      (15) Figure 5: The findings are interesting, but the co-culture model remains somewhat unclear. Can the authors show, e.g., using qRT-PCR, how hepatocytes are developing in this culture system? If the development with monocyte-derived macrophages is altered, then one would expect that also the cellular response is different.

      We agree with the reviewer, but we think that this question would be better answered in a follow-up study. We were looking to answer whether the addition of isogenic iMacs would change the drug response of iHeps, and were using the PBMC-derived macrophages here as a control. A more complete study taking into account the genetic background of the donor PBMC-derived macrophages would be much more informative, but is sadly outside the scope of our present study.

      (16) Lines 482-484: The authors talk about LPS-treated cultures and refer to Figure 4. However, there is no graph shown for LPS.

      We apologise for being unclear here, but the co-cultures were co-treated with LPS during the drug stimulation assays, as it had been shown that LPS increases the sensitivity of the liver toward hepatotoxic drugs. We have clarified this in the main text (Lines 435-437).

      Reviewer #2 (Recommendations for the authors):

      (1) It would be nice to add some protein production by the hepatocytes. For example, can they produce albumin or some other protein that can be measured? Perhaps I missed this.

      Albumin and urea production were assessed in the hepatocytes prior to co-culture in Supp Fig 1C; however, we did not measure the protein level changes after co-culture, as the co-culture would also contain a significant number of macrophages, which we thought might affect the readout. Instead, after co-culture the primary analysis was done on the RNA levels of ALB and other cytochrome genes after sorting in Fig 3.

      (2) Was there an increase in hepatocyte number? Did one cell outgrow the other, or did they maintain numbers?

      The relative proportion of the iHeps remained the same, although we did see an expansion in the iMac population after 7 days by flow cytometry in Fig 1D.

      (3) What happens if the iMACs and the iHeps are grown in Costar chambers with pore sizes too small to allow for cell contact, but allowing supernatant to be continuously exposed to both cell types?

      We were primarily focused on the acquisition of a KC-like phenotype in the iMacs with regard to the question of direct contact, which was why we chose to use conditioned iHep media as part of the iMac experimental setup. However, it would be very interesting to see in a follow-up study if the converse is also true, and whether secreted factors from the iMacs alone would be sufficient to drive the changes we observed in the iHeps after co-culture.

      (4) The discussion could use a brief paragraph on some limitations and what could be added to the co-culture system. For example, could stellate cells and sinusoidal endothelium also impart KC identity? Would growing KCs on endothelium provide a more natural substratum?

      Once again, these are very interesting questions which are unfortunately outside the scope of our study. However, we have included a short section discussing this in the paper, as we do think that it would be interesting to look at iMacs educated by hepatocytes vs stellate cells, for example (Lines 530-536).

      (5) The axonal guidance pathway in early iMACs is interesting. A recent report in vivo showed that macrophages migrate from the liver parenchyma into the sinusoids in neonates when they are still immature. The process could be chemotaxis, or it could be repulsion by parenchyma. Numerous axonal guidance molecules are repulsive, pushing axons away (robo/slit, etc). The migration of Kupffer cells into sinusoids could be a repulsive rather than a chemoattractant pathway. Did the RNA seq data provide any interesting molecules in this regard?

      Reviewer #3 (Recommendations for the authors):

      This manuscript presents a conceptually well-designed approach to modeling hepatocyte-macrophage crosstalk in vitro. The authors develop a co-culture system aimed at recapitulating key aspects of Kupffer cell (KC) identity and hepatocyte maturation. The data convincingly show that macrophages acquire KC-like features under co-culture conditions. However, several major issues limit the strength of the conclusions, the depth of mechanistic insight, and the translational impact of the work.

      First, the study relies heavily on bulk RNA-seq data with minimal functional or protein-level validation - particularly for hepatocyte maturation. To substantiate claims of functional maturation, additional assays measuring albumin secretion, urea production, and CYP activity are essential. Furthermore, the omission of zonation-associated markers (e.g., GLUL, CPS1, CYP2E1) leaves a critical gap in assessing whether the iHEPs achieve physiologically relevant functional states.

      Second, statistical interpretation and reporting are inconsistent. Significant and non-significant findings are frequently conflated, which risks overinterpretation. For instance, the reported reduction in HNF4A expression is not statistically significant, and AFP expression is only significantly reduced in Day 7 co-iHEPs - yet these distinctions are not clearly stated.

      Third, although the authors emphasize the role of cell-cell contact in promoting KC identity, no experiments (e.g., transwell separation, adhesion-blocking assays) directly test this claim. As a result, the mechanistic basis for this conclusion remains speculative.

      Finally, while the data support enhanced macrophage differentiation toward a KC-like phenotype, the evidence that co-culture significantly promotes hepatocyte maturation is far less convincing and requires additional functional, mechanistic, and statistical validation before firm conclusions can be drawn.

      Minor comments:

      (1) Methodology: The choice of a 2.5:1 iHEP:iMAC ratio is not justified. This proportion does not reflect physiological hepatocyte-to-KC ratios in vivo and should be either rationalized or benchmarked against native liver composition.

      We admit that the ratio here is on the higher side, but it has been previously reported that there can be between 20 and 40 macrophages per 100 hepatocytes (1:5 to 1:2.5) in the adult mouse liver (Baratta et al., 2009), while admittedly in the developing mouse liver the ratio is closer to 1:4 (Lopez et al., 2011). We chose 1:2.5 as we anticipated that not all of the macrophages would be able to attach, and would thus be lost during media change, as evidenced by the flow cytometry on Day 3 of the co-culture, where only 20% of the cells had clear CD45 and CD14 expression. We have clarified our methodology in the paper (Lines 141-143).

      (2) Effect of iMAC on iHEP (Section 3.2, Supplementary Figure 1E):

      (2.1) The authors should explain why Day 3 co-cultured iHEPs show stronger transcriptomic similarity to primary hepatocytes than Day 7 cells. Possible biological mechanisms (e.g., transient paracrine signaling or temporal changes in maturation dynamics) should be discussed.

      We have added some discussion for this (Lines 309-311, 536-540).

      (2.2) The figure legend refers to "fetal hepatocytes," while the correlation map states "hepatocytes." This discrepancy must be clarified. Moreover, if fetal hepatocytes are used as the reference, and the goal is to assess maturation, comparisons to adult hepatocytes are necessary. 

      The comparison was done against fetal hepatocytes, and this has been clarified in the figure. We chose to use fetal hepatocytes here as it would be unfair to compare iPSC-derived cells that are less than 3 weeks old to adult human tissue, and any similarities or differences between the mono/co-cultures and the adult tissue might be due to the shifting transcriptomic landscape during development. However, we do recognise the nuanced nature of using “maturation” here, and what we mean is that the iPSC-derived cells become more similar to their in vivo counterparts.

      (2.3) Baseline characterization of both cell types before co-culture is insufficient. For iHEPs, flow cytometry data on ALB and AFP positivity rates should be presented, along with post-co-culture changes. For iMACs, marker expression (CD45, CD11b, CD14) should be shown before and after co-culture. The methods mention CD163, CX3CR1, and CD11b, but these data are absent from the results. Additionally, the gating strategy for cell sorting prior to bulk RNA-seq must be clearly described - including how potential cross-contamination of cell fractions (e.g., macrophages in the hepatocyte population) was excluded.

      We apologise for this oversight. Some of the markers were used in determining the purity of the iMacs before co-culture, and we did not end up including these plots for brevity. We have now added the purity plots in Supp Fig 2E, showing that the iMacs were more than 90% pure before co-culture. We acknowledge the concern about cross-contamination for bulk sequencing, and have added in Supp Fig 2G and H the expression of ALB in the iMac fraction, as well as the expression of CSF1R in the iHep fraction, showing minimal contamination with our gating strategy.

      (3) IGF2 Expression: The observed upregulation of IGF2, a fetal marker, contradicts the conclusion that co-culture promotes hepatocyte maturation. This inconsistency should be addressed, and possible explanations (e.g., transient fetal-like activation driven by macrophage-derived signals) discussed. The lack of statistical significance for this finding must also be explicitly noted.

      We thank the reviewer for pointing this out. The expression of IGF2 was actually significantly different when comparing the Day 0 Hepatocyte only and Day 7 Hepatocyte only to the Day 3 Co-cultured Hepatocytes, but the significance is lost with the Day 7 co-cultured Hepatocytes. One possible explanation is as the reviewer suggested, that there is a transient program that is activated upon co-culture that is subsequently downregulated. We have updated the figure and text, and added some discussion to reflect this (Lines 309-311, 536-540).

      (4) Effect of iHEP on iMAC: The reported upregulation of KC-related genes is overstated. Changes in LYVE1 and ID1 are not statistically significant (Figure 2G), yet they are presented as meaningful. Clear separation of statistically significant results from non-significant trends is critical to avoid overinterpretation.

      We apologise for this; it was never our intention to present these markers as significant. Rather, we presented them because we thought they would be of interest to the audience. We have clarified the text to reflect that these are non-significant trends (Lines 367-369).

      (5) Mimicking In Vivo Clinical Responses:

      (5.1) The authors' conclusion that IL-6 responses are not recapitulated when iMACs are replaced by monocyte-derived macrophages (MoMs) is not fully supported by the data presented. In fact, the MoM co-cultures exhibit a noticeable trend toward increased IL-6 production (e.g., approximately 150% with LTG at 66.6 µM and 400 µM), suggesting that some degree of responsiveness is retained. To substantiate the claim that the observed cytokine modulation is unique to iKC-containing co-cultures, the authors should perform direct statistical comparisons of absolute IL-6 secretion levels between iKC and MoM co-cultures at each drug concentration. Such analyses are essential to determine whether the differences are statistically significant and biologically meaningful, and to clarify whether the observed effects truly reflect KC-specific functionality rather than general macrophage activation.

      (5.2) The effects of drug exposure on hepatocytes themselves are not addressed. It is important to evaluate whether the co-culture remains viable under treatment, whether it recovers after drug withdrawal, and whether there is evidence of cytotoxicity or irreversible phenotypic loss.

      (6) Interpretation of IL-6 Modulation and Model Specificity:

      The authors show that IL-6 secretion in their co-culture system varies in response to multiple hepatotoxic drugs and parallels some reported clinical trends - notably, a concentration-dependent decrease with diclofenac (DIC) and leflunomide (LFM). They further report that this pattern is not observed in hepatocyte-PBMC-derived macrophage co-cultures, and they conclude that iMAC/iKC-like cells are essential for capturing immune-mediated hepatotoxic responses. However, the data presented do not fully justify such a conclusion. Several key mechanistic issues weaken the interpretation:

      (6.1) Mechanistic ambiguity in the DIC response: The decrease in IL-6 following DIC exposure is most likely attributable to reduced prostaglandin E₂ (PGE₂) production via COX inhibition, which secondarily suppresses IL-6 signaling. This effect is a general pharmacological property of NSAIDs and is not necessarily reflective of Kupffer cell-specific pathways. Direct evidence - such as prostanoid quantification or PGE₂ rescue experiments - is required to establish that the observed effects are liver-specific rather than nonspecific NSAID responses.

      (6.2) Pharmacogenetic complexity in the LFM response: LFM-induced hepatotoxicity is highly variable and largely dependent on CYP2C9 polymorphisms, which determine conversion to the active metabolite teriflunomide. Because hepatotoxicity and the associated cytokine responses are not universal among patients, a simplified co-culture model lacking metabolic diversity cannot be assumed to faithfully reproduce patient-specific immune responses. The observed IL-6 suppression could arise from differences in metabolic activation, intracellular exposure, or indirect signaling changes rather than from intrinsic KC-specific mechanisms.

      These points significantly undermine the authors' claim that IL-6 modulation provides definitive evidence of model specificity or predictive value. At minimum, the manuscript should (i) explicitly acknowledge these mechanistic limitations, (ii) include supporting data such as prostanoid profiling, CYP2C9 modulation, or teriflunomide quantification, and (iii) temper its claims regarding the model's capacity to recapitulate immune-mediated hepatotoxicity. Without such evidence, the current interpretation risks overstating the functional significance and translational relevance of the co-culture system.

      We fully agree with the reviewer and have highlighted this in our discussion (Lines 540 – 551).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The analysis of neural morphology across Heliconiini butterfly species revealed brain area specific changes associated with new foraging behaviours. While the volume of the centre for learning and memory, the mushroom bodies, was known to vary widely across species, new, valuable results show conservation of the volume of a center for navigation, the central complex. The presented evidence is convincing for both volumetric conservation in the central complex and fine neuroanatomical differences associated with pollen feeding, delivered by experimental approaches that are applicable to other insect species. This work will be of interest to evolutionary biologists, entomologists, and neuroscientists.

      Many thanks for your assessment and time handling this manuscript. We value the constructive input of both reviewers and believe that the result is an improved publication.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors previously reported that Heliconius, one genus of the Heliconiini butterflies, evolved to be efficient foragers to feed pollen of specific plants and have massively expanded mushroom bodies. Using the same image dataset, the authors segmented the central complex and associated brain regions and found that the volume of the central complex relative to the rest of the brain is largely conserved across the Heliconiini butterflies. By performing immunostaining to label a specific subset of neurons, the authors found several potential sites of evolutionary divergence in the central complex neural circuits, including the number of GABAergic ellipsoid body ring neurons and the innervation patterns of Allatostatin A expressing neurons in the noduli. These neuroanatomical data will be helpful to guide future studies to understand the evolution of the neural circuits for vector-based navigation.

      We thank Reviewer 1 for the constructive feedback and criticism, which have strengthened this publication.

      Strengths:

      The authors used a sufficiently large scale of dataset from 307 individuals of 41 species of Heliconiini butterflies to solidify the quantitative conclusions and present new microscopy data for fine neuroanatomical comparison of the central complex.

      Weaknesses:

      (1) Although the figures display a concise summary of anatomical findings, it would be difficult for non-experts to learn from this manuscript to identify the same neuronal processes in the raw confocal stacks. It would be helpful to have instructive movies to show a step-by-step guide for identification of neurons of interest, segmentations, and 3D visualizations (rotation) for several examples, including ER neurons (to supplement texts in line 347-353) and Allatostatin A neurons.

      We approached this with the following logic:

      All 3D segmentations were animated, to illustrate how they are generated from raw imaging data. This means we are providing a video file for each major species group (Heliconius/outgroup-Heliconiini) for Figure 4 (general CX anatomy), Figure 7 (ER neuron projections), Figure S5 (ER neuron/bulb anatomy). This visual connection should help the reader relate 3D segmentations to image stacks. We have also added a reference to these videos in the relevant Figure captions.

      We also annotated image stacks, but did so selectively. We annotated key stacks of Figure 4 (general CX anatomy), Figure 7 (ER neuron projections) and Figure S5 (ER neuron/bulb anatomy), and include a reference to them in the figure captions.

      We refrained from annotating stacks of Figures 5, 6, 8 and S4. This is because we believe that the annotations we have performed in the figure panels will be sufficient for readers interested in the finer detail of these anatomies who are familiar with general CX anatomy.

      We believe that our approach will help the reader to gain a visual illustration of those parts of the manuscript which report key results and novel insights, such as ER neuronal variation, and that the data and figures collectively provide accessible information sufficient for this purpose.

      Text changes in Figure captions 4, 7 and S5: “See animated 3D segmentations and annotated stacks in file repository.”

      (2) Related to (1), it was difficult for me to assess if the data in Figure 7 support the authors' conclusions that ER neuron number increased in Heliconius melpomene. By my understanding, the resolution of this dataset isn't high enough to trace individual axons and therefore the authors do not rule out that a portion of "ER ring neurons" in Heliconius may not innervate the ER, as stated in Line 635 "Importantly, we also found that some ER neurons bypass the ellipsoid body and give rise to dense branches within distinct layers in the fan-shaped body (ER-FB)". If they don't innervate the ellipsoid body, why are they named as "ER neurons"?

      Thanks for pointing this out. We believe this is primarily a nomenclature issue but have tried to clarify it in the text.

      Ultimately, the neurons from this group that project to the EB, forming the actual ring neurons, and those that project to the FB, whose function is thus far unclear, emerge from the same lineage, DALv2 (as determined by Kandimalla et al 2023), and therefore share a common developmental origin (also noted by Homberg et al 2018). To acknowledge this common origin and to simplify nomenclature, and thereby aid comprehension by non-experts, we specify which DALv2 progeny project to which areas, but refer to both adult neuron populations as “ER neurons”. We have changed the following text to state our definition explicitly, which we hope mitigates the understandable confusion.

      Lines 354-357: “Here, we refer to these neurons, as well as those neurons projecting to the fan-shaped body (GU neurons in [66]), as ER neurons due to their common developmental origin [45,66] and to simplify anatomical descriptions.”

      Lines 386-387: “Whether these ER neurons solely branch in the fan-shaped body, as shown for GU neurons elsewhere [66] or have additional side branches entering the ellipsoid body is not clear.”

      (3) Discussions around the lines 577-584 require the assumption that each ellipsoid body (EB) ring neuron typically arborises in a single microglomerulus to form a largely one-to-one connection with TuBu neurons within the bulb (BU), and therefore, the number of BU microglomeruli should provide an estimation of the number of ER neurons. Explain this key assumption or provide an alternative explanation.

      Thanks for this. We do not think that our hypothesis necessarily requires any specific assumptions regarding the ratio of microglomeruli to ER or TuBu neurons. Even in Drosophila the ratio of ER to MG is only approximately 1:1, as some microglomeruli seem to combine into one. In other species this relationship might be very different. Indeed, our data suggest that in outgroup-Heliconiini the ratio is 4.4 microglomeruli to 1 ER neuron, and in Heliconius it is 3.4. However, as these MG numbers are extrapolated and cannot be precisely counted, they may be too imprecise to support a definite conclusion, which is why we do not mention this in the text. Importantly, extrapolation in its current form is a valid additional way for us to describe overall bulb anatomy (alongside bulb volume and average microglomerulus size).

      In any case, the inference we make here is that a conserved bulb anatomy in volume, MG number and size supports our assumption that the additional neurons in the ER neuron group/DALv2 progeny do not arborize in the bulb, but do so in the SMP/SLP region and in the fan-shaped body. We believe we have described this inference accurately in the current manuscript.

      An additional point, not mentioned in the manuscript, but emerging through lineage annotations of connectome data, is that some DALv2 progeny have been identified as MBONs as well as being GABAergic, which could potentially be the ER-FB neurons that we describe (Schlegel et al 2024 Nature). We refrain from mentioning this here, as it is too speculative, but we thought the reviewer may be interested in this observation.

      (4) The details of antibody information are missing in the Key resource table. Instead of citing papers, list the catalogue numbers and identifier for commercially available antibodies, and describe the antigen, and whether they are monoclonal or polyclonal. Are antigens conserved across species?

      We have now added substantial information to Table 2, including research resource identifiers (RRIDs) and antigen descriptions, as well as information about specificity and conservation. In the text itself, in line 757, we already provide publications that have illustrated conservation very extensively.

      We believe that with the additional information provided in Table 2, all necessary information is now provided.

      (5) I did not understand why authors assume that foraging to feed on pollens is a more difficult cognitive task than foraging to feed on nectar. Would it be possible that they are equally demanding tasks, but pollen feeding allows Heliconius to pass more proteins and nucleic acids to their offspring and therefore they can develop larger mushroom bodies?

      This is an excellent point. Our current understanding is that pollen feeding is a cognitively more demanding task, because a) the density of pollen resources is lower than that of nectar resources, and b) the competition for pollen is higher (pollen is depleted quickly, and Heliconius compete with each other and with other taxa, including hummingbirds). There is therefore a benefit to high foraging efficiency, which favours the evolution of learning. This is likely reinforced by the long lives of Heliconius, which live up to a year compared to ~4 weeks for most outgroups, and by the temporal stability of major pollen resources, which means that a memorised location provides a benefit for long periods of time (Young and Montgomery 2020 Proc B).

      We now refer to an additional publication (Young and Montgomery 2020 Proc B) in lines 103-104 for a fuller description of the ecology of pollen feeding, and in the current manuscript simply focus on the impact of mushroom body expansion on the CX.

      Reviewer #2 (Public review):

      Summary:

      In this study, Farnsworth et al. ask whether the previously established expansion of mushroom bodies in the pollen foraging Heliconius genus of Heliconiini butterflies co-evolved with adaptations in the central complex. Heliconius trap line foraging strategies to acquire pollen as a novel resource require advanced spatial memory mediated by larger mushroom bodies, but the authors show that related navigation circuits in the central complex are highly conserved across the Heliconiini tribe, with a few interesting exceptions. Using general immunohistochemical stains and 3D reconstruction, the authors compared volumes of central complex regions, and unlike the mushroom bodies, there was no evidence of expansion associated with pollen feeding. However, a second dataset of neuromodulator and neuropeptide antibody labeling reveals more subtle differences between pollen and non-pollen foragers and highlights sub-circuits that may mediate species-specific differences in behavior. Specifically, the authors found an expansion of GABAergic ER neurons projecting to the fanshaped body in Heliconius, which may enhance their ability to path-integrate. They also found differences in Allatostatin A immunoreactivity, particularly increased expression in the noduli associated with pollen feeding. These differences warrant closer examination in future studies to determine their functional implication on navigation and foraging behaviors.

      We thank Reviewer 2 for the constructive and thorough review. We believe that addressing these criticisms has improved this publication.

      Strengths:

      The authors leveraged a large morphological data set from the Heliconiini to achieve excellent phylogenetic coverage across the tribe with 41 species represented. Their high-quality histology resolves anatomical details to the level of specific, identifiable tracts and cell body clusters. They revealed differences at a circuit level, which would not be obvious from a volumetric comparison. The discussion of these adaptations in the context of central complex models is useful for generating new hypotheses for future studies on the function of ER-FB neurons and the role of Allatostatin A modulation in navigation.

      The conclusions drawn in this paper are measured and supported by rigorous statistics and evidence from micrographs.

      Weaknesses:

      The majority of results in this study do not reveal adaptations in the central complex associated with pollen foraging. However, reporting conserved traits is useful and illustrates where developmental or functional constraints may be acting. The implied hypothesis in the introduction is that expansion of mushroom bodies in Heliconius co-evolved with central complex adaptations, so it may be helpful to set up the alternate hypotheses in the beginning.

      Thank you for this relevant comment. We have added to the text in lines 124-128, as follows:

      “Indeed, these circumstances permit us to test the hypotheses that modifications in the mushroom bodies either occurred in isolation from other integrative centres, or that they occurred in concert with specific changes in centres, such as the central complex. This provides insights into the functional flexibility of two interacting, integrative centres across evolutionary time.”

      In the main text, the authors describe differences in GABAergic neurons "across several species" but only one Heliconius and one outgroup species seem to be represented in the figures. ER numbers in Figure 7H are only compared for these two species. If this data is available for other species, it would strengthen the paper to add them to the analysis, since this was one of the most intriguing findings in the study. I would want to know if the increased ER number is a trend in Heliconius or specific to H. melpomene.

      This points to imprecise phrasing. We indeed have additional data in other species, but unfortunately not to an extent that would permit quantification of cell numbers, which is why we chose to put these data into the supplement, Fig. S4.

      We modified the text to point more directly to the additional data in Fig S4; lines 362-368 now read:

      “…, we noticed a pronounced difference in a portion of projections leading into the fan-shaped body and a strong difference in signal inside layer III in our two focal species H. melpomene and D. iulia, as well as other representatives of the Heliconiini tribe (Figure S4A-B, Figure 7). To understand how these differences could have occurred, we quantified ER neuron numbers in our focal species, and identified a significant difference, reflecting a 35% increase in Heliconius (t = 4.221, P = 0.004; Figure 7H).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Add a detailed description about each of the tiff files that were deposited at https://doi.org/10.5281/zenodo.15304965. It was hard for me to relate these raw images with the Figure panels. For instance, "Melp_GAD_26-F_detailed_conc.tif" in the Figure 7 folder seems to be used to make Figure 7L and N, but that information is cryptic.

      We agree with the reviewer. We added further descriptions, and have created a detailed readme file which explains which original file refers to which figure. Together with the efforts for Reviewer 1’s first comment, we hope that this updated version of our repository is easier to understand.

      In addition, we corrected the image orientation in some of the supplied files, which was originally incorrect.

      (2) Add descriptions about the dataset for large-scale volumetric analysis. With the current methods and texts, it is hard to understand what kinds of staining and microscopes were used. I initially thought that they could be micro-CT data.

      We have made two improvements:

      We have added an additional readme file to explain the different datasets, and which datasets were used for each figure, to relate them to the original data deposited at zenodo.org (see your previous comment).

      We have added descriptions in several places in the manuscript file, i.e.

      Lines 133-135, now reading “To assess evidence of volumetric changes in the central complex and associated neuropils, we drew data from a large dataset of immunostained brains from 307 individuals of 41 species, …”

      Lines 144-149, now reading “We used a combination of phylogenetic comparative analysis across a large dataset of brains immunostained against the structural marker synapsin in 41 species and 307 individuals, and more targeted sampling of species that represent the behavioural and neuroanatomical diversity of Heliconiini for more fine-scale assessments of patterns of divergence in substructures of the CX with various antibodies (Figure 1A-B).”

      (3) Line 275: Non-expert readers would need an explanation about what the gamma lobe is.

      Agreed and added in line 273

      “Some of the ventral projections seemed to directly originate from the γ lobe, a portion of the mushroom body, thus potentially labelling projections of mushroom body output neurons into the fan-shaped body (Figure 5a-c) [12,21].”

      (4) Figures 4 I-L are missing.

      We modified the figure caption accordingly, and address annotated differences more directly. This section now reads

      “G/H: Labelling reveals two distinguishable layers in the fan-shaped body while additional staining elsewhere reveals further detail (arrows in G/H-2/3). Thicker tract conflations indicate the columnar architecture determined through the four columnar neuron bundles (arrowheads in G/H-3). Labelling in the EB reveals two pronounced layers (arrows in G/H-1/2), while obvious columns could not be indicated. PB protocerebral bridge, FB fan-shaped body, EB ellipsoid body. A anterior, P posterior. Scale bars are 50 μm.”

      (5) In the current version of Figure 1B, AOTU is displayed with the mushroom body. The authors can emphasize its relation to the central complex by showing it on the right side of panels together with the central complex.

      Great suggestion. We have done this now. We have kept the AOTU at the scale of the MB, indicated by the different scale bars of the bottom of the figure, as we’re showing the CX at a slightly larger scale.

      (6) Figure 1C: What do the colors of the lines represent?

      We now changed these colours so that they correspond to the colours chosen in Figures 2 and S2 as well as in a previous publication of the lab, added an asterisk next to Heliconius aoede, and added text to the figure legend:

      “Colour indicates focal groups here and elsewhere [29]. The asterisk at the branch of H. aoede indicates a secondary loss of pollen feeding.”

      (7) Figures 2A and B: What does the size of the circles represent? I guess that small ones are individuals, and larger ones are species averages. Plots with only species averages would be easier to see. It is difficult to distinguish Heliconius and Heliconius aoede in these panels. It would be easier if Heliconius circles were outlined with thin black lines.

      Thanks for this. We wanted to keep both the averages and individual data points in one figure, as to not overcrowd the manuscript with additional figures. We still hope that the changes we made address the confusion sufficiently. We made the following modifications to Figure 2 and S1 and S2:

      (1) Added text in the figure legend clarifying what solid and transparent circles indicate (“Solid data points indicate species averages, while opaque circles indicate individual data points.”)

      (2) Added, as suggested, additional contours, to all Heliconius data points, and added corresponding text to the legend (“Black contours indicate Heliconius sp. data points.”)

      (3) Changed opacity settings of individual data points.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 391 and Methods. It was unclear how the extrapolated microglomeruli numbers were calculated. Please clarify this in the methods.

      Agreed. We substantially modified the text to address this.

      Lines 392-396: “We generated high resolution images of the bulb to determine its size (Figure S5 C-F), and 3D segmented seven microglomeruli per individual, from which we generated an extrapolated approximation of total microglomeruli number by dividing bulb volume by average microglomerulus volume. This was necessary as most microglomeruli were not discernible from each other (Figure S5 G-H).”

      Lines 862-873: “To segment the bulb, we created high resolution images and were particularly careful to only segment the area of the bulb that comprised large synapses/glomeruli, excluding parts of the LEa/IT projection. This was essential, because we relied on extrapolating the total number of microglomeruli from a subset of segmented microglomeruli and the total volume that contained microglomeruli, which means any section containing tracts and not glomerular structures would skew the estimated total number of microglomeruli. Extrapolation was necessary, as not all microglomeruli were visually discernible. We achieved an unskewed bulb volume by leaving out dense pieces of tubulin-positive tract material. We segmented seven microglomeruli per individual from the posterior section of the bulb, where they were most clearly visible, to get the most comparable impression across individuals and species. We then calculated average microglomerulus volume and divided bulb volume by this average to determine an approximation of microglomeruli number.”
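
      The extrapolation described in the quoted Methods text can be illustrated with a minimal sketch. It assumes only what the text states: the estimate is bulb volume divided by the mean volume of the segmented microglomeruli. The function name, the units, and all numerical values are hypothetical placeholders, not data from the study.

```python
from statistics import mean


def estimate_microglomeruli_count(bulb_volume, segmented_volumes):
    """Approximate the total count as bulb volume / mean microglomerulus volume.

    bulb_volume       -- volume of the glomerular part of the bulb (tracts excluded)
    segmented_volumes -- volumes of the individually segmented microglomeruli
                         (seven per individual in the text), in the same units
    """
    if bulb_volume <= 0:
        raise ValueError("bulb_volume must be positive")
    if not segmented_volumes or any(v <= 0 for v in segmented_volumes):
        raise ValueError("segmented_volumes must be non-empty and positive")
    return bulb_volume / mean(segmented_volumes)


# Hypothetical example values (arbitrary volume units), for illustration only.
example_volumes = [10.0, 12.0, 11.0, 9.0, 10.0, 8.0, 10.0]  # mean = 10.0
print(estimate_microglomeruli_count(5000.0, example_volumes))  # 500.0
```

      Because the estimate scales directly with bulb volume, any tract material included in the segmented bulb would inflate the count, which is why the text stresses excluding it.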

      (2) Line 439. It would be helpful to add that Kaiser et al. studied honeybees.

      Agreed! Now reads in lines 443-444

      “Moreover, Kaiser et al. [75] identified Allatostatin A expression in three fan-shaped and two ellipsoid body layers in the honey bee brain, …”

      (3) Line 492. "outcome" should be "outcomes".

      We believe that this refers to original line 481. Corrected. Thank you.

      (4) Figure 3B. If there is significance to the colors and triangle directions, please include a key/legend.

      We have added:

      “Cell type depictions are examples with localisation inside each neuropil being purely visual (as well as their colour), while triangles indicate approximate output sites.”

      We also corrected the following issues that were noted during our revisions:

      line 587, wrong reference.

      We updated references 37 and 44, which are now respectively

      Hodge, E. A. et al. Modality-specific long-term memory enhancement in Heliconius butterflies. Philos Trans R Soc Lond B Biol Sci 380, 20240119 (2025).

      Hodge, E. A. et al. Conservation of sensory pathways implies a localised change in the mushroom bodies is associated with cognitive evolution in Heliconius butterflies. Evol qpag005 (2026) doi:10.1093/evolut/qpag005.

      Figure S5 had an error in panels C and D: the images were swapped, so that the pictures shown in C were actually those of H. melpomene belonging in D, and vice versa. The other panels were correct. We have corrected this.

      In the data submitted on Zenodo: we corrected a few inconsistencies in channel colours and orientation in the .tiff files for Fig 6, 8 and S4.

      We added important bulb 3D segmentation files to the repository on Zenodo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is an important paper that reports in vivo physiological abnormalities in the hippocampus of a rat model of traumatic brain injury (TBI). In this study, authors focused on changes in theta-gamma phase coupling and action potential entrainment to theta, phenomena hypothesized to be critical for cognition. While the authors provide solid evidence of deficits in both features post-TBI, the study would have been stronger with a more hypothesis-driven approach and consideration of alterations of the animal's behavioral state or sensorimotor deficits beyond memory processes.

      We would like to thank the reviewers for their comments on our manuscript. By incorporating their feedback, we were able to make our hypotheses more clear, expand our analyses to compare physiological processes across similar behavioral states, and address extra hippocampal input and potential sensorimotor confounds in our data.

      Specifically, we have added new data in Figure 5 showing how theta amplitude correlates with theta-gamma PAC and entrainment strength. We have also added supplementary Figure 1 demonstrating that there are no differences in exploration or movement velocity in injured animals compared to shams. Supplementary Figures 2, 3, and 4 were added to compare oscillatory power while animals were still, moving at a higher velocity, and following a broadband power shift correction respectively. We also added Supplementary Figure 7 demonstrating that there were no differences in firing rates between sham and injured animals while they were still or moving and Supplementary Figure 8 showing no changes in pyramidal cell bursting. Finally, we added Supplementary Figure 10 showing that there was no difference in velocity or distance traveled during testing in the MWM between sham and injured animals and that learning curves were similar across groups before sham/injury surgery. We believe that the addition of this data significantly improves our manuscript by more strongly controlling for the animal’s behavioral state in our analyses and provides strong evidence that significant sensory/motor deficits were not present in injured animals at this injury level and time point post injury. Below we address specific points raised by the reviewers.

      Reviewer #1 (Public review):

      Summary:

      This study investigated how traumatic brain injury affects oscillatory and single-unit hippocampal activity in awake-behaving rats.

      Strengths:

      The use of high-density laminar electrodes enabled precise localization of recording sites. To ensure an unbiased, rigorous approach, single-unit analysis was performed by a reviewer who was blind to experimental conditions. A proof of concept study was undertaken to characterize the pathology that resulted from the specific TBI model used in the main study. There was an effort to link abnormalities in hippocampal activity to memory disruption by running a cohort of rats on the Morris Water Maze task.

      Weaknesses:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion. The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported. It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments. There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units. Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group.

      In order to address these important concerns, we have made the following changes:

      (1) We have updated the results section to include more rationale for the recordings and analyses used to clarify our hypotheses. In addition, we hope that our extensive characterization will lay the groundwork to inform future studies investigating circuit-specific disruptions following TBI and neuromodulatory therapies.

      (2) The number of rats used for the spatial working memory experiment is reported in the text and figure legend.

      (3) We have added supplemental Table 2 to include the requested statistical information (t-statistic, degrees of freedom, and 1 vs 2-tailed analyses).

      (4) Unfortunately, we did not have adequate occupancy to robustly extract and compare place cell properties across groups and environments which obscured the rationale of our study design and limited us to more rudimentary analyses. While animals did actively explore the two environments, the relatively short recording time limited the spatial sampling of the two-dimensional environment. We were able to extract putative place cells and found some evidence that place cells in TBI rats had lower spatial information content than in shams (as has previously been described). However, we did not feel that place cell analyses were rigorous enough to include in this manuscript due to the limited spatial sampling. Future studies in the lab will assess how TBI affects place cell information content, stability, and phase precession with better occupancy.

      (5) We have added Supplemental Table 1 that includes the total number of units recorded for each animal.

      (6) The spatial working memory deficit we report in the MWM is not a novel finding in this model of TBI. However, we wanted to ensure that LFPI in our hands at this injury level reproduced this known deficit. Importantly, the swim speed and distance traveled during testing did not differ between groups, suggesting that differences were not due to motor deficits. Additionally, the learning curves before sham/LFPI surgery were the same across groups. This data has been added to the manuscript in Supplementary Figure 10. While we did not test animals in a version of the task where the platform was visibly marked, previous studies have demonstrated that sham and injured rats perform comparably in a version of the MWM where the platform is visible or when a constant start location is used. These citations have been added to the manuscript.

      Reviewer #1 (Recommendations for the authors):

      For a more rigorous way of analyzing changes in hippocampal firing patterns across environments, see Wills et al 2005 for example.

      Addressed in point 4 above

      Spatial working memory tasks should always be compared with a control task to rule out confounding performance variables. Examples would be to use a variant of the MWM task that does not require the hippocampus such as using a visible escape platform.

      Addressed in point 6 above

      Statistics are typically reported including a t-statistic and degrees of freedom, not just the p-value. In addition, the authors should indicate whether the t-test is one or two-tailed.

      Addressed in point 3 above

      Reviewer #2 (Public review):

      Summary:

      The authors investigate changes in theta-gamma phase amplitude coupling, and action potential entrainment to theta following traumatic brain injury (TBI). Both phenomena are widely hypothesized to be important for cognition, and the authors report deficits in both after TBI. The manuscript is well-written, the figures are well-constructed, and the authors' use of high-level analysis methods for TBI EEG data collected from awake, behaving animals is welcome.

      Major Comments:

      The animal n's are small (4 sham and 5 injured). In Figure 3, for instance, one wonders if panels D and E might have shown significant differences if more animals had been recorded.

      There are conflicting reports regarding the effect of LFPI on single cell firing rates. This is likely due to differential task demands and variations in LFPI severity across studies. We agree that the firing rates do appear to be trending; however, overall firing rate changes can be difficult to interpret. Because firing rates are influenced by behavior and brain state, we further separated firing rates into epochs when animals were moving or still and found similar trends that did not reach significance (data added in Supplementary Figure 7). We also assessed bursting in pyramidal cells to investigate whether potential changes in bursting influenced overall firing rates, and we found no differences between sham and injured animals across conditions (data added in Supplementary Figure 8). While the n’s are small when considered by animal, the number of units is actually fairly large, so if there were robust effects (as there were for the entrainment analyses), we would expect to see significant differences.

      The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

      This is an excellent point that has now been addressed with the addition of Supplementary Figure 4. We used a well-established method (Donoghue et al 2020) to flatten power spectra in order to compare specific frequency bands in the context of a broadband shift. After applying this correction, we show that theta power is still reduced in injured rats compared to shams. While there is no difference in gamma power between groups in the corrected power spectra, this result should be interpreted with caution, especially since there is not a large distinct peak in the gamma frequency range in the power spectrum of either sham or injured animals. However, if this is interpreted to mean that gamma power is not different between sham and injured animals, it makes the PAC data even more compelling. While there is clearly a broadband shift, the frequency range of this shift is still limited in the frequency domain to ~4-90 Hz, which contains physiologically relevant frequencies associated with synaptic currents. Importantly, the power spectra of sham and injured animals converge at low (<4 Hz) and high (>100 Hz) frequencies. This suggests that slow oscillations, which could include delta and respiration-associated oscillations, are not affected by TBI (though sleep recordings would be needed to properly address this). High-frequency activity can include ripples and HFOs, which need to be separately extracted when comparing between groups due to their transient nature. However, overall spiking activity, including the depolarizing spike and the afterhyperpolarization, contributes significantly to power in the high frequency range. Because this general high-frequency power is not different between groups, it suggests that the limited range of the broadband power reduction still contains important physiological signals. This broadband shift may result from a global reduction in or desynchronization of synaptic input to CA1. The specific mechanisms behind this broadband shift and the consequences it has on coding information in the hippocampus are fascinating questions that we hope will be specifically investigated in future studies. This point is now addressed in the Discussion.
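
      The idea of flattening a power spectrum before comparing narrow bands can be sketched in a few lines. This is a deliberately simplified illustration, not the Donoghue et al. method itself, which additionally models oscillatory peaks when estimating the aperiodic component. The sketch assumes the aperiodic part is a straight line in log-log space and fits it by ordinary least squares; the function names and the synthetic spectrum are hypothetical.

```python
import math


def fit_aperiodic(freqs, power):
    """Least-squares line through (log10 f, log10 P).

    Returns (offset, exponent) such that log10 P ~= offset - exponent * log10 f.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return offset, -slope


def flatten_spectrum(freqs, power):
    """Subtract the fitted aperiodic line; what remains is power above the 1/f trend."""
    offset, exponent = fit_aperiodic(freqs, power)
    return [
        math.log10(p) - (offset - exponent * math.log10(f))
        for f, p in zip(freqs, power)
    ]


# Synthetic spectrum (hypothetical): 1/f^2 background plus a bump centred at 8 Hz.
freqs = list(range(1, 51))
power = [100.0 / f ** 2 * (1.0 + 3.0 * math.exp(-((f - 8) ** 2) / 2.0)) for f in freqs]
flat = flatten_spectrum(freqs, power)
peak_freq = freqs[flat.index(max(flat))]  # frequency with the largest residual power
```

      Comparing the flattened values within a band between groups then asks whether band power differs beyond what a broadband offset or slope change would explain. A plain line fit is biased by large peaks, which is one reason the published method handles peaks explicitly.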

      Reviewer #2 (Recommendations for the authors):

      Minor Comments:

      Please define your reference waveform for theta - is it theta recorded on the channel containing the cell? Average theta for all electrodes in SP? SP + SO? Theta for the nominal "St. pyr." channel? Please define.

      For all entrainment analyses, entrainment was measured referenced to the theta oscillation recorded from st. pyr. on the specific shank where the unit was detected. We added clarification in the results and methods sections regarding this point.

      Similarly, even though the peak of the theta wave appears from the figures to be taken as 0 degrees, please explicitly state this in the text.

      This has been added to the results and methods.

      Did the authors check for any difference between interneurons in SP and interneurons in SO?

      This is an excellent suggestion that we had hoped to investigate as it could inform whether specific interneuron populations were affected. However, we did not record enough units in st. ori to make this comparison.

      On page 8, Figures 3E and 3F are incorrectly labeled 4E and 4F.

      This has been fixed.

      Figure 1, panel C: please add a numerical scale to the colored scale bar.

      This has been added

      Figure 1, panel F: how was the significance between the frequency bands calculated?

      Statistics were done using a t-test at each frequency point with significance set at α=0.01 for multiple comparisons. This has been clarified in the figure legend and methods.
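
      A frequency-by-frequency comparison of this kind can be sketched as follows. The sketch makes several assumptions that the response does not specify: a two-sample Student t-test with pooled variance, one spectrum per animal, and a caller-supplied two-tailed critical value (3.499 corresponds to α=0.01 with 7 degrees of freedom, i.e. 4 + 5 animals). All names and numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample Student t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    if se == 0.0:
        return 0.0, df  # no variability in either group: treat as no evidence
    return (mean(a) - mean(b)) / se, df


def significant_frequencies(freqs, group_a, group_b, t_crit):
    """Return the frequencies at which |t| exceeds the critical value.

    group_a, group_b -- lists of spectra, one per animal, each aligned with freqs
    t_crit           -- two-tailed critical t for the chosen alpha and df
    """
    hits = []
    for i, f in enumerate(freqs):
        t, _ = pooled_t([s[i] for s in group_a], [s[i] for s in group_b])
        if abs(t) > t_crit:
            hits.append(f)
    return hits


# Hypothetical power values at 4, 8 and 100 Hz for 4 sham and 5 injured animals.
freqs = [4, 8, 100]
sham = [[1.0, 5.0, 0.5], [1.2, 5.2, 0.4], [0.9, 4.8, 0.6], [1.1, 5.1, 0.5]]
injured = [[1.1, 3.0, 0.5], [1.0, 3.2, 0.6], [0.9, 2.9, 0.4],
           [1.2, 3.1, 0.5], [1.0, 2.8, 0.5]]
print(significant_frequencies(freqs, sham, injured, t_crit=3.499))  # [8]
```

      Passing the critical value in explicitly keeps the sketch free of external dependencies; in practice a statistics library would supply exact p-values.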

      Figure 3, panel A legend: Please add "Spike at 0 ms omitted for clarity.”

      This has been added

      Figure 4, panel A, right side: please provide the MVL for this cell, so that readers have a benchmark for evaluating the MVL as a parameter. A sample poorly entrained cell, with MVL, would also be informative.

      We added the MVL for this cell. We were unable to add a poorly entrained cell without making the figure more confusing.
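
      For readers unfamiliar with the measure, the mean vector length (MVL) can be computed from the theta phase at which each spike occurs. The sketch below uses the standard circular-statistics definition (length of the average unit vector) and follows the convention stated above that the theta peak is 0 degrees. The function name and example phases are hypothetical, and the manuscript's own pipeline may differ in detail.

```python
import cmath
import math


def mean_vector_length(spike_phases_deg):
    """Return (MVL, preferred phase in degrees) for a list of spike phases.

    Each spike is treated as a unit vector at its theta phase. The MVL is the
    length of the average vector: 0 means no phase preference, 1 means every
    spike occurs at exactly the same phase.
    """
    if not spike_phases_deg:
        raise ValueError("at least one spike phase is required")
    resultant = sum(
        cmath.exp(1j * math.radians(p)) for p in spike_phases_deg
    ) / len(spike_phases_deg)
    preferred = math.degrees(cmath.phase(resultant)) % 360.0
    return abs(resultant), preferred


# Hypothetical examples: a perfectly locked cell and a cell with no preference.
locked_mvl, locked_phase = mean_vector_length([90.0, 90.0, 90.0, 90.0])
flat_mvl, _ = mean_vector_length([0.0, 90.0, 180.0, 270.0])
```

      When the MVL is close to zero, the preferred phase is not meaningful and should not be interpreted.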

      Raw data must be provided for the Morris Water Maze experiments described in Supplementary Figure 3.

      We added data showing no difference in the swim velocity or distance traveled between the sham and injured groups during memory testing as well as data showing that the two groups had similar learning curves during training before sham/injury surgery. See Supplementary Figure 10.

      Antibody 22C11 for APP has been shown to be non-specific when used for immunocytochemistry (it may be fine for Westerns). In addition, using a biotinylated secondary with an ABC kit for visualization risks contamination by post-injury changes in biotin. Reviewed in Xiong et al., 2023, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580020/.

      As is standard practice in neuropathology, negative controls were run for all of these experiments (identical preparations minus the primary antibody). No non-specific staining was present that could be misinterpreted as APP-positive axonal profiles in either sham or injured tissue. While beyond the scope of this response, there are many reasons the authors of the cited paper may have had non-specific staining, including a concentration 450 times that utilized here and the absence of an antigen-retrieval technique in their protocol.

      Tummala et al. used in vivo calcium-imaging after TBI and also investigated single-cell activity in familiar and novel environments, and when moving or still. The authors could consider discussing their work.

      We have added a citation for this paper

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors studied the effects of traumatic brain injury created by LFPI procedure on the CA1 at the network level. The major findings in this study seem to be that the TBI reduces theta and gamma powers in CA1, reduces phase-amplitude coupling in between theta and gamma bands as well as disrupts the gamma entrainment of interneurons. I think the authors have made some important discoveries that could help advance the understanding of TBI effects at the physiological level, however, more investigations into deciphering the relationship of the behavioral and brain states to the observed effects would help clarify the interpretations for the readers.

      Strengths:

      The authors in this study were able to combine behavioral verification of the TBI model with the laminar electrophysiological recordings of the CA1 region to bring forward network-level anomalies such as the temporal coordination of network-level oscillations as well as in the firing of the interneurons. Indeed, it seems that the findings may serve future studies to functionally better understand and/or refine the therapies for the TBI.

      Weaknesses:

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensorimotor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated with the entrainment of spikes to the theta rhythm and also, in some cases, to gamma oscillations. So, it is important to disentangle the differences in behavioral variables and the observed effects. As an example, the authors' claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed results of reduced entrainment might, on one hand, be due to the decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or the unreliable oscillations of the LFP bands such as theta and gamma, rather than memory encoding and retrieval related disruption of spike synchrony to the rhythms, while on the other hand, they may simply be due to reduced excitability in the neurons, particularly in the behavioral and brain state in which the effects were observed, rather than disrupted temporal code. Hence, further investigations into dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

      We appreciate the Reviewer’s insights into disentangling the complex interactions between power, entrainment, and excitability, and have attempted to dissociate these further in our analyses. Regarding the broad effects of TBI, we agree that TBI affects many brain regions outside of the hippocampus as well as white matter pathways containing axons from areas where pathology is not visible, which likely results in widespread changes to LFPs across regions and altered behavior. Here we report disrupted network activity in the hippocampus, which is likely a consequence of numerous pathologies across multiple brain regions. In the discussion, we speculate that disrupted power and coupling come from desynchronization of inputs (especially those from the mEC and MS) as well as changes to local circuits within the hippocampus, which combine to disrupt temporal coding. While the disrupted processes we report in the hippocampus are implicated in computational processes thought to support learning and memory, we acknowledge that results from this study do not causally reveal a specific mechanism that is directly responsible for cognitive impairments. We have changed the language of the quoted sentence from the abstract to make our claim less causal, as we agree that the direct effects of these results on cognition are difficult to quantify because animals were not performing a spatial navigation task with measurable outcomes during recordings. We have also removed the graphical abstract as we believe it is an oversimplification of the results given new analyses.

      Regarding the possible contribution of sensory and motor deficits or differences in behavioral states to the observed changes, we agree that it is essential to consider potential sensorimotor deficits as well as the animal’s behavioral state when comparing oscillations and single unit activity in the hippocampus, especially since these phenomena have been extensively linked to movement velocity and exploration. To address this, we have added Supplementary Figure 1 showing that there are no differences in movement velocity or exploration time between sham and injured animals. Because animals were simply foraging during electrophysiological experiments, we do not expect there to be any major additional behavioral differences that would influence oscillations or spiking once locomotion is controlled for, though differences in attention or arousal cannot be ruled out. Additionally, analyses throughout the manuscript are performed independently during periods when animals were moving or still. Data in Figures 1 and 2 also only include data from the familiar environment to rule out any effects of novelty on hippocampal oscillations. Supplementary Figures 2 and 3 were added to demonstrate that TBI-associated reductions in power were consistent when animals were still and when a higher threshold for movement (>20 cm/sec) was used. Finally, Supplementary Figure 10 was added showing no differences in swim velocity or distance traveled in the MWM between sham and injured animals, further suggesting that there are no significant sensorimotor deficits at this injury level and timepoint. Additionally, previous studies have demonstrated that sham and injured rats perform comparably in a version of the MWM where the platform is visible or when a constant start location is used, which provides further support that sensorimotor deficits are not responsible for memory deficits in this task (see above).

      Regarding the contribution of neuronal excitability to the reported changes, we agree that changes in the excitability of neurons could have a strong effect on entrainment. Importantly, we show that the disrupted oscillations recorded in the injured hippocampus do not coincide with significant changes in neuronal firing rates between sham and injured animals. We have added Supplementary Figure 7 demonstrating this holds true both when animals are still and when they are moving. Additionally, we have added Supplementary Figure 8 showing no differences in pyramidal cell bursting between sham and injured animals. While this suggests that there are not major changes in excitability, homeostatic plasticity mechanisms may impact firing rates and bursting, and the extent of these effects and their role on entrainment are unclear. This point was added to the Discussion.

      To address the effects of LFP power on entrainment strength, Figure 5 has been updated to show theta and gamma entrainment strength as well as theta-gamma PAC as a function of theta amplitude. We found that, during periods of comparable theta power, interneurons from sham and injured animals are similarly entrained to theta, but pyramidal cells from injured animals become significantly more entrained to theta than in shams. We address the potential implications of these results in the Discussion.
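
      The amplitude-matched comparison described above can be illustrated with a minimal sketch. It assumes entrainment strength is measured as the mean resultant length (MRL) of spike phases, which is a common choice but is not confirmed here as the measure used in the manuscript. The function names, bin edges, and toy data are hypothetical.

```python
import cmath
import math

def mean_resultant_length(phases):
    """Length of the mean unit vector of spike phases (radians).
    0 means no phase locking, 1 means perfect locking."""
    if not phases:
        return float("nan")
    mean_vec = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vec)

def entrainment_by_amplitude(spike_phases, spike_amplitudes, bin_edges):
    """MRL of spike phases within each theta-amplitude bin [edge_k, edge_k+1)."""
    result = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [p for p, a in zip(spike_phases, spike_amplitudes) if lo <= a < hi]
        result.append(mean_resultant_length(in_bin))
    return result

# Hypothetical toy data: each spike has a theta phase and the theta amplitude
# at the time it fired. Low-amplitude spikes are spread around the cycle;
# high-amplitude spikes all fall at the same phase.
phases = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2, 0.1, 0.1, 0.1, 0.1]
amplitudes = [0.2, 0.3, 0.4, 0.1, 1.2, 1.5, 1.1, 1.9]
mrl_low, mrl_high = entrainment_by_amplitude(phases, amplitudes, [0.0, 1.0, 2.0])
```

      Comparing groups within the same amplitude bin, rather than across all spikes, is what separates a change in entrainment from a change in theta power.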

      Reviewer #3 (Recommendations for the authors):

      The authors have stated on page 7 and Figure 2E, "Taken together, injured rats show a decrease in the strength of theta-gamma PAC that is specific to st. pyr, and a shift in peak gamma amplitude to a later phase of theta in both st. pyr and st. rad". Is the shift in the peak position greater than expected by chance?

      We are unaware of a rigorous method that would allow us to compare this shift statistically. We have reported the observed shift and avoided calling the shift significant for that reason.

      The authors state on page 9 "cells (sham familiar=1.63±0.23 Hz, n=51, injured familiar=2.11±0.20 Hz, n=141, p=0.446; sham novel=1.84±0.18 Hz, n=55, injured novel=2.23±0.21 Hz, n=134, p=0.170; mean±SEM; ks-test; Fig 4E) between sham and injured groups, but a higher percentage of pyramidal cells were active (firing rate >0.1Hz) in both the familiar and novel environment in injured rats compared to shams (sham=74%, injured=87%, p=0.025, Fisher's exact test; Fig 4F)." Do the authors mean Figures 3E and 3F respectively in place of Figures 4E and 4F?

      This has been fixed.

      Regarding the finding of similar firing rates and differences in the overlap of the neurons that were active in between injured and control animals, it is imperative to study the differences in behaviors of the animals. First of all, it seems appropriate to quantify and compare the immobility and mobile periods as well as the movement velocity of the animals in both groups. Then, it would be interesting to see if any behavioral variables correlate with the firing characteristics of the cells in both the sham and the injured animals. Since hippocampal cells have been known to have different levels of recruitment and firing rates according to different behavioral states such as movement velocity, some of the similarities or differences in neural findings might as well be attributed to the differences in behaviors in between the groups. However, some differences may be observed in the injured rats despite similar behavior and the LFP powers. In other words, studying the effects of injury during similar behavioral (e.g. firing rate as a function of movement velocity) and brain states (e.g. categorical effects of awake theta state, type two theta, and ripple states on firing rates and the entrainment) might help dissociate some effects that might only be due to difference in the behavior caused by the injury throughout the brain and might as well have less to do with specific injury induced local circuits level deficits in the hippocampus. The results in Figures 4, 5, and 6 reveal such interesting differences and hence, it becomes even more important to quantify and correlate behavioral states (movement velocity and theta/ripple) to the neuronal characteristics (LFP power, PAC, firing rates, and entrainment) presented in Figure 3.

      These are excellent points, and we have addressed them in the following ways:

      We added Supplementary Figure 1 demonstrating that there were no differences in movement velocity between sham and injured animals during electrophysiological recordings.

      Power and PAC analyses were done exclusively when the animal was moving to compare across similar behavioral states. Additionally, these analyses were constrained to recordings from the familiar environment to rule out any effects of novelty. Because animals were simply foraging during recordings, we do not expect other behavioral factors besides movement velocity to play a major role in these processes. We have also added Supplementary Figures 2 and 3, which demonstrate that TBI-associated differences in oscillatory power follow similar trends when animals are still (Sup. Fig 2) or when a higher movement threshold (>20 cm/sec) is used (Sup. Fig 3). We also added Supplementary Figures 7 and 8 showing that there were no significant differences in firing rates or bursting while animals were still or while they were moving.

      The Discussion was expanded to discuss how TBI may disrupt circuits outside the hippocampus which may contribute to our findings. Additionally, we acknowledge the limitation that these recordings were not obtained while animals were doing a quantitatively measurable spatial navigation task which limits our ability to assess whether changes are truly behaviorally relevant.

      We have also updated Figure 5 to show entrainment across different levels of theta power.

      Elaborating on the abovementioned point, Figures 4B and 4E depict a finding that mean entrainment is reduced in the injured during immobility. The following factors may contribute to the results:

      (1) Reduction in theta power during immobility (reduced attention and/or LFP profile due to brain-wide injury), which makes theta cycles unreliable, which can contribute to the results.

      (2) Changes in neural firing properties during immobility, such as reduced burst rates or firing rates during immobility.

      (3) As the authors claimed in the graphical abstract, there might be an actual disruption of temporal code associated with the memory encoding. It would be awesome if the temporal disruption could be investigated during the comparable theta power and behavioral states. This analysis would test whether there is an unconfounded disruption in the temporal code in the hippocampus due to the injury. In any case, it would be ideal to isolate the epochs during sleep in which animals were in theta state and exclude ripple states to make a definitive assessment of the aforementioned factors. These further investigations would also help the interpretations made by authors in the discussion section such as "This can disrupt type II theta which occurs when animals are not actively moving and exploring the environment. We found that single unit entrainment to theta was substantially decreased in injured rats when they were not moving, a phenomenon not seen in shams, which suggests a disruption in type II theta. This provides further evidence that cholinergic signaling may be dysfunctional following TBI."

      (1) While theta power is reduced in injured animals, it can still be reliably detected even at rest. We added Supplementary Figure 2 showing power spectra while animals were not moving, and a distinct peak can be seen in the theta frequency range. Additionally, clear peaks in entrainment can be seen in the theta frequency band in Fig 4B while animals were still. This suggests that theta can still be reliably detected in injured animals even when they are not moving. However, we agree that reduced attention or arousal could contribute to these changes, and this point has been added to the Discussion.

      (2) We added Supplementary Figures 7 and 8 showing no differences in firing rates or bursting parameters between groups during periods of immobility.

      (3) We updated Figure 5 which now shows entrainment strength as a function of theta amplitude. We found that the theta entrainment strength of both pyramidal cells and interneurons increased with increasing theta amplitudes. We address potential implications of these changes in the Discussion.

      On page 10 the authors state, "theta entrainment strength drastically increased when rats began moving in injured but not sham animals." It is unclear if the effect was confined to the periods when rats started movement. Also, it would be of interest to investigate whether movement epochs and velocity were affected in the periods when the effects were observed.

      This was not confined to the exact points when the rats started moving. We removed the word “began” for clarity. See point regarding velocity above.

      On page 12 the authors state, "On test day, injured rats had a lower memory score than shams (sham=114.8 ± 21.8, n=9; injured=51.5±6.8, n=14; p=0.020; mean ± SEM; Welch's t-test) indicating poor spatial memory (Sup Fig 3A)." The result is the validation of the TBI injury on a hippocampal-dependent Morris water maze task. However, it would be nice to see the quantification of the movement velocity in the water maze and the trajectory length in each group to further dissect whether animals were constrained in the movement and hence, they could not get to the platform or they forgot where it was located. Also, it would help to compare the rats' performance after sham or TBI surgeries to their performance during the training before the surgeries (assuming the data during the training periods were recorded as well).

      We have added Supplemental Figure 10 to include all of this information. Importantly, movement velocity and distance traveled were not different between groups on testing day, and the learning curves of both groups were the same before sham/injury surgery.
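
      The memory-score comparison quoted above can be recomputed from its summary statistics alone. The following is a minimal sketch, assuming the reported ± values are SEMs (as the quote states) and that the test is an unpaired Welch's t-test. The function name is illustrative, and the p-value is left out because the Python standard library has no t-distribution.

```python
import math

def welch_from_sem(mean_a, sem_a, n_a, mean_b, sem_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, SEMs, and sample sizes."""
    var_a, var_b = sem_a ** 2, sem_b ** 2  # squared SEM equals s^2 / n for each group
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Values quoted in the review: sham = 114.8 +/- 21.8 (n=9), injured = 51.5 +/- 6.8 (n=14)
t_stat, dof = welch_from_sem(114.8, 21.8, 9, 51.5, 6.8, 14)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      Looking up the resulting t and df in a t-distribution table or a statistics package gives the two-sided p-value.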

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study utilises fNIRS to investigate the effects of undernutrition on functional connectivity patterns in infants from a rural population in Gambia. fNIRS resting-state data recording spanned ages 5 to 24 months, while growth measures were collected from birth to 24 months. Additionally, executive functioning tasks were administered at 3 or 5 years of age. The results show an increase in left and right frontal-middle and right frontal-posterior connections with age and, contrary to previous findings in high-income countries, a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months of age. Additionally, the study describes some connectivity patterns, including stronger frontal interhemispheric connectivity, which is associated with better cognitive flexibility at preschool age.

      Strengths:

      The study analyses longitudinal data from a large cohort (n = 204) of infants living in a rural area of Gambia. This already represents a large sample for most infant studies, and it is impressive, considering it was collected outside the lab in a population that is underrepresented in the literature. The research question regarding the effect of early nutritional deficiency on brain development is highly relevant and may highlight the importance of early interventions. The study may also encourage further research on different underrepresented infant populations (i.e., infants not residing in Western high-income countries) or in settings where fMRI is not feasible.

      The preprocessing and analysis steps are carefully described, which is very welcome in the fNIRS field, where well-defined standards for preprocessing and analysis are still lacking.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level and investigates how restricted growth influences connectivity patterns at 24 months, it does not explore the links between adverse situations and developmental trajectories for functional connectivity. Considering the longitudinal nature of the dataset, it would have been interesting to apply more sophisticated analytical tools to link undernutrition to specific developmental trajectories in functional connectivity. The authors mention that they lack the statistical power to separate infants into groups according to their growing profiles. However, I wonder if this aspect could not have been better explored using other modelling strategies and dimensional reduction techniques. I can think about methods such as partial least squares correlation, with age included as a numerical variable and measures of undernutrition.

      We agree with the reviewer that this complex and rich longitudinal dataset would benefit from more sophisticated analytical approaches to characterise developmental trajectories in functional connectivity and to more directly link them to measures of undernutrition. However, conducting such analyses would require substantial additional methodological development, model validation, and careful interpretation, which fall beyond the scope and timeline of the present manuscript. Our aim here was to provide a clear and robust characterisation of functional connectivity changes during the first two years of life and to examine associations with growth outcomes at a specific developmental stage, while ensuring methodological transparency and statistical reliability. Importantly, these more advanced trajectory-based analyses are currently being pursued in the final phase of the BRIGHT project (BRIGHT IMPACT), in collaboration with expert statisticians and data scientists. This ongoing work aims specifically to leverage the longitudinal richness of the dataset to model developmental trajectories and their associations with early-life adversity and nutritional factors. We therefore see the present study as an important foundation for these forthcoming analyses.

      Connectivity was assessed in 6 big ROIs. While the authors justify this choice to reduce variability due to head size and optodes placement, this also implies a significant reduction in spatial resolution. Individual digitalisation and co-registration of the optodes to the head model, followed by image reconstruction, could have provided better spatial resolution. This is not a weakness specific to this study but rather a limitation common to most fNIRS studies, which typically analyse data at the channel level since digitalisation and co-registration can be challenging, especially in complex setups like this. However, the BRIGHT project has demonstrated that it is possible and that differences in placement affect activation patterns, which become more localised when data is co-registered at the subject level (Collins-Jones et al., 2021). Could the co-registration of individual data have increased sensitivity, particularly given that longitudinal effects are being investigated?

      We agree with the reviewer that the fNIRS community should work toward more precise methods for spatial registration of optodes, not only at the group level but also at the subject level, in order to make more precise inferences about the locations of activations. However, we followed a very thorough offline procedure to model headgear placement based on each participant’s photographs, which we believe complements the coregistration work performed by Collins-Jones et al. (2021). As reported in the fNIRS data acquisition section, “Infants were excluded from further analysis if the band was excessively high over the front above the eyebrows” (line 409, methods section). Moreover, channel displacement was measured from the photos, and channels whose displacement was “equal or greater than 1.6 cm were renumbered, so that each channel was shifted either backward or forward one full channel location in space” (line 413, methods section). While these practices are thoroughly followed in the BRIGHT project, we are aware that they are not part of the standard procedure in many infant fNIRS studies. We hope that this work provides guidance for other researchers on how to coregister infant fNIRS data.

      Considering the spatial resolution of fNIRS, which is on the order of centimetres, and the thorough procedure combining fNIRS–MRI coregistration with channel displacement assessment based on photographs, we do not think that individual-level coregistration would have significantly increased the sensitivity of the results.

      I believe that a further discussion in the manuscript on the application of global signal regression and its effects could have been beneficial for future research and for readers to better understand the negative correlations described in the results. Since systemic physiological changes affect HbO/HbR concentrations, resulting in an overestimation of functional connectivity, regressing the global signal before connectivity computation is a common strategy in fNIRS and fMRI studies. However, the recommendation for this step remains controversial, likely depending on the case (Murphy & Fox, 2017). I understand that different reasons justify its application in the current study. In addition to systemic physiological changes originating from brain tissue, fNIRS recordings are contaminated by changes occurring in superficial layers (i.e., the scalp and skull). While having short-distance channels could have helped to quantify extracerebral changes, challenges exist in using them in infant populations, especially in a longitudinal study such as the one presented here. The optimal source-detector distance that minimises sensitivity to changes originating from the brain would increase with head size, and very young participants would require significantly shorter source-detector distances (Brigadoi & Cooper, 2015). Thus, having them would have been challenging. Under these circumstances (i.e., lack of short channels and external physiological measures), and considering that the amount the signal is affected by physiological noise (either coming from the brain or superficial tissue) might change through development, the choice of applying global signal regression is justified. Nevertheless, since the method introduces negative correlations in the data by forcing connectivity to average to zero, I believe a further discussion of these points would have enriched the interpretation of the results.

      We added a paragraph discussing the choice of using GSR in our pipeline in the discussion of the manuscript as follows: “Importantly, these results remained significant even without GSR, indicating that our findings are not solely driven by preprocessing choices. While the use of GSR in FC studies remains debated (Murphy & Fox, 2017), in the absence of short channels (which are difficult to use reliably with infants (Emberson et al., 2016)) and external physiological measures, applying GSR represented the most appropriate preprocessing option. Failure to correct for systemic physiological fluctuations can, in fact, lead to artificially elevated connectivity estimates in fNIRS data (Abdalmalak et al., 2022)” (line 250, discussion section).
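
      The reviewer's point that GSR introduces negative correlations can be illustrated with a minimal sketch. It assumes the global signal is the across-channel mean and is removed from each channel by ordinary least squares. The toy time series and function names are hypothetical and do not reproduce the manuscript's pipeline.

```python
import math
from itertools import combinations

def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def global_signal_regression(channels):
    """Regress the across-channel mean time course out of each channel."""
    n_t = len(channels[0])
    global_sig = [mean([ch[t] for ch in channels]) for t in range(n_t)]
    g_mean = mean(global_sig)
    g_c = [g - g_mean for g in global_sig]          # demeaned global signal
    g_var = sum(g * g for g in g_c)
    cleaned = []
    for ch in channels:
        ch_mean = mean(ch)
        ch_c = [x - ch_mean for x in ch]            # demeaned channel
        beta = sum(x * g for x, g in zip(ch_c, g_c)) / g_var
        cleaned.append([x - beta * g for x, g in zip(ch_c, g_c)])
    return cleaned

# Hypothetical toy data: a shared (systemic) component plus small channel-specific terms.
shared = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
specific = [
    [0.3, -0.1, 0.2, 0.0, -0.2, 0.1, -0.3, 0.0],
    [-0.1, 0.2, 0.0, 0.3, 0.1, -0.3, 0.2, -0.2],
    [0.0, 0.1, -0.3, -0.2, 0.3, 0.0, 0.1, 0.2],
]
channels = [[s + e for s, e in zip(shared, eps)] for eps in specific]
cleaned = global_signal_regression(channels)

raw_min = min(pearson(a, b) for a, b in combinations(channels, 2))
gsr_min = min(pearson(a, b) for a, b in combinations(cleaned, 2))
residual_sums = [sum(ch[t] for ch in cleaned) for t in range(len(shared))]
```

      Because the cleaned channels sum to zero at every time point, their covariances must sum to a negative value, so at least one pair becomes anticorrelated even when every raw correlation was positive.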

      Reviewer #2 (Public review):

      Strengths:

      The article addresses a topic of significant importance, focusing on early life growth faltering in low-income countries-a key marker of undernutrition-and its impact on brain functional connectivity (FC) and cognitive development. The study's strengths include the laborious data collection process, as well as the rigorous data preprocessing methods employed to ensure high data quality. The use of cutting-edge preprocessing techniques further enhances the reliability and validity of the findings, making this a valuable contribution to the field of developmental neuroscience and global health.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      The study fails to fully leverage its longitudinal design to explore neurodevelopmental changes or trajectories, as highlighted by all three reviewers. The revised manuscript still primarily focuses on FC values at a single age stage (i.e., 24 months) rather than utilizing the longitudinal data to investigate how FC evolves over time or predicts cognitive development. Although the authors acknowledge that analyzing changes in FC (ΔFC) would reduce degrees of freedom (to ~30) and risk interpretability, they do not report or discuss these results, even as exploratory findings.

      As suggested, we added the table reporting the results of the associations between changes in functional connectivity (ΔFC) between 5 and 24 months and cognitive flexibility in the supplementary materials (Table SI3). We additionally explored the relationship between changes in growth and cognitive flexibility as suggested by Reviewer #3 and we reported these additional analyses in the text as follows: “We also explored whether changes in growth and changes in functional connectivity between 5 and 24 months were associated with cognitive flexibility at preschool age, but we did not find any significant association (Table SI3 and Table SI4).” (line 213, results section).

      Furthermore, the study lacks specificity in identifying which specific brain networks are affected by growth faltering, as the current exploratory analyses mainly provide an overall conclusion that infant brain network development is impacted without pinpointing the precise neural mechanisms or networks involved.

      We added this limitation in the discussion as follows: “While the impact of undernutrition on brain development has been documented in LMICs (46), herein, we provided empirical evidence that growth faltering specifically in infants younger than five months of age impacts observable development of functional brain networks in the second year of life. Future studies may be needed to pinpoint which specific brain networks are impacted” (line 279, discussion section).

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth, and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants has a different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility, in different ways between 4- and 5-year-old preschoolers, but results did not survive corrections for multiple comparisons.

      Strengths

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses

      Data analyses were constrained by the limited number of children with longitudinal data on NIRS functional connectivity. Nevertheless, considering more advanced statistical modelling approaches would be relevant to further explore neurodevelopmental trajectories as well as relationships with early growth and later cognitive development.

      While in this study we selected specific FC and outcome variables based on our hypothesis, the final phase of the BRIGHT project, known as BRIGHT IMPACT, aims to apply advanced statistical models to integrate a range of project variables into a single comprehensive analysis. We have acknowledged this in the discussion as follows: “Applying more advanced statistical modelling methods and structural equation modelling analyses may provide greater insight with further investigations in contexts of adversity and, in turn, establish which outcomes are predicted by FC” (line 309, discussion section).

      The abstract and end of the discussion should make it clearer that the associations between FC and cognitive flexibility are results that need to be confirmed, insofar as they did not survive correction for multiple comparisons.

      We have acknowledged this in the abstract as follows: “Our results highlight the measurable effects that poor growth in early infancy has on brain development and the possible subsequent impact on pre-school age cognitive development, underscoring the need for early life interventions throughout global settings of adversity”.

      We have acknowledged this in the discussion as follows: “While our results are consistent with previous studies, we acknowledge that the significant associations between early FC and later cognitive flexibility do not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample” (line 300, discussion section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1 B and C the authors should indicate that the results refer to HbO.

      We have added the suggested specification in the caption of the figure as suggested.

      (2) Figure SI2. Please indicate in the caption that these are the results when pre-processing did not include global signal regression.

      We have added the suggested specification in the caption of the figure as suggested.

      Reviewer #3 (Recommendations for the authors):

      (1) The sentence l529-531 ("To investigate whether FC early in life predicted...") should be more explicit as it is not clear which of the two variables is regressed by the other: is it the measure of cognitive flexibility that is regressed by FC, as the hypothesis suggests? Were other variables considered in the regression model? (For linear regression with only one "prediction" variable, the square root of the coefficient of determination R² is equal to the correlation between the two variables.)

      Yes, it is the measure of cognitive flexibility that is regressed by FC. We have rephrased it in the text as follows: “we regressed later cognitive flexibility against FC that showed a significant change across the first two years of life”. There were no other variables in the regression model.
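
      The reviewer's parenthetical remark can be checked numerically: with a single predictor, the square root of R² equals the absolute value of the Pearson correlation. The following is a minimal sketch with hypothetical FC and cognitive flexibility values, not data from the study.

```python
import math

def simple_regression(x, y):
    """Ordinary least squares of y on a single predictor x.
    Returns slope, intercept, R^2, and the Pearson correlation r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2 = 1.0 - ss_res / syy
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r2, r

# Hypothetical values: FC (predictor) and later cognitive flexibility (outcome).
fc = [-0.20, -0.05, 0.00, 0.10, 0.15, 0.30]
flexibility = [0.40, 0.55, 0.35, 0.60, 0.70, 0.65]
slope, intercept, r2, r = simple_regression(fc, flexibility)
```

      The identity holds only for a model with one predictor and an intercept, which matches the regression described in the response.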

      (2) A summary table of the statistical results for FC-cognitive flexibility associations should be included as for other analyses, in addition to Figure 3B.

      We added a table of the results for the association between FC and cognitive flexibility in the supplementary materials (Table SI2, page 10), matching the same colours of Table 2. We referenced the table in the text in the main manuscript (line 211, result section).

      (3) Figure 3B: The legend should specify that these results did not survive corrections for multiple comparisons.

      We have specified this in the legend of Figure 3 as suggested.

      (4) For the young pre-schooler group, it seems that the age is around 4 years (age mean +/- SD=47.96 +/- 2.77 months) and not 3 years as indicated at several places in the manuscript.

      We found only one instance in which we erroneously said that the younger preschoolers were around 3 years. We replaced “Gambian infants from BRIGHT were cross-sectionally assessed at the age of 3 or 5 years for cognitive flexibility” with “Gambian infants from BRIGHT were cross-sectionally assessed between the age of 3 and 5 years for cognitive flexibility” (line 489, method section).

      (5) The authors use the term "intra-hemispheric" connections for the ones within each of the 6 sections. This might be misleading since fronto-posterior connections are also intra-hemispheric ones. Specifying "short-range" or "within-section" connections might be clearer.

      As suggested by the reviewer, we replaced “intra-hemispheric” with “intra-hemispheric within section” where appropriate through the whole manuscript.

      (6) Abstract: what is the justification for using the term "optimal" for describing developmental trajectories of FC?

      The term “optimal” refers to knowledge about typical developmental trajectories, coming especially from fMRI studies, as mentioned in the introduction: “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). [...]. Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 93-106, introduction).

      (7) The confidence interval should be added in Figure SI3.

      As suggested, confidence intervals have been added in Figure SI3.

      (8) Other scatterplot examples of associations might be added as supplementary information.

      As suggested, we added several additional scatterplots to Figure SI3 (with confidence intervals as noted in the comment above) to show other associations between changes in growth and FC at 24 months.

      (9) Figure SI6: % in x-axis is still indicated.

      We apologize for the oversight; all the percentage signs have now been removed from the x-axis tick labels.

      (10) The authors might show the (even not significant) results of the associations between changes in growth and cognitive flexibility in supplementary information.

      As suggested, we added the table reporting the results of the associations between changes in growth (ΔWLZ) and cognitive flexibility in the supplementary materials (Table SI3). We additionally explored the relationship between changes in functional connectivity and cognitive flexibility as suggested by Reviewer #2 and we reported these additional analyses in the text as follows: “We also explored whether changes in growth and changes in functional connectivity between 5 and 24 months were associated with cognitive flexibility at preschool age, but we did not find any significant association (Table SI3 and Table SI4).” (line 213, results section).

    1. In the past, in the field of education, we often referred to this concept as “parent involvement” rather than “family engagement.” We use the term “family” instead of “parent” to recognize that MLs may live with and have strong relationships with family members instead of or in addition to parents. These family members may play a crucial role in the student’s education and should be included by schools and communities (Staehr Fenner, 2014). The use of the word “engagement” rather than “involvement” indicates an active partnership and shared responsibility between families and educators.

      I can see why the concept of "parent involvement" was changed to "family engagement" and I think the term is all the better for the word change. It's true that some ML students may be living with extended family rather than with their parents; they could be living with an aunt and uncle, or their household could include both their parents and their extended family all under one roof. And having the ML student's family "engaged" instead of "involved" sends a more positive message of wanting to give the family a chance to be included not only in their own child's education but also in the class as a whole. An ML student's family sharing their experiences can benefit the non-ML students as well and help them understand the culture their ML classmates come from.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      In my view, the presentation of the data is in some cases not ideal. The phrasing of some conclusions (e.g., group-attacks and wolf-pack-hunting by the bacteria) is in my opinion too strong based on the herein provided data.

      We agree with your comment and have replaced the terms “group attacks” and “wolf-pack hunting” with “attacks” throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 2AB, please add the name of the statistical test and the number of replicates that the data is based on to the figure legend.

      We thank Reviewer #1 for highlighting the need for more detail. We have revised the manuscript accordingly. The captions of Figures 2, 3, 4 and S1 were revised to include the name of the statistical test and the number of replicates. Asterisks indicate significant differences in a multiple comparison test (one-way ANOVA with post hoc Tukey test): * P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001.

      (2) Figure 2C is this figure referred to in the text?

      We apologize for this oversight. Figure 2C was replaced by new figures 2C and 2D and the old figure 2C is now referenced in the manuscript as Fig 3B1.

      (3) Movie 1, could the movie please also be provided as .mp4? I suggest including individual images across time in the main figure so that readers do not rely on opening a supplementary file for this key finding of the study.

      In the revised manuscript, all the videos were converted to mp4 format and individual images across time were included in Figure 2C and 2D (Chronological snapshots of one attack) and in figure 3B1 (Chronological snapshots of the complete event), thereby improving the readability of the manuscript.

      (4) Figure 3A2 (text l. 355), I am afraid I do not find this figure.

      Fig. 3A2, which previously corresponded to Fig. 3B1, now corresponds to Fig. 2C and Fig. 2D. This has been corrected in the revised version of the manuscript.

      (5) Lines 356ff, I am afraid that I find it hard to follow what the authors refer to as the right cell or the left cell. I suggest either adding labels to the movies or providing individual images across multiple timepoints into the main figure that can be labelled and bring across the point.

      Arrows have been added to videos 3–5 to clearly indicate the cells referred to in the text and facilitate tracking across time.

      (6) In general, for all the microscopy, on how many cells have these phenomena been observed? What is n=x? Has this been quantified?

      We thank the reviewer for pointing this out.

      In the caption of Fig. 3, the sentence “(A) Percentage of motile A. pacificum ACT03. (B) A. pacificum ACT03 attacked by V. atlanticus LGP32 and (C) A. pacificum ACT03 lysis after 0, 15, 30, 45 and 60 min of interaction.” was replaced by “(A) Cumulative percentage of motile A. pacificum ACT03 cells. (B) Cumulative number of cells attacked by V. atlanticus LGP32 and (C) Cumulative cell lysis after 0, 15, 30, 45 and 60 minutes of interaction.” In the Fig. 3 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was also added.

      In Fig. 4 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added.

      In Fig. S1 caption, the sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added.

      (7) Figure S1A, does this figure show means plus/minus standard deviation? If yes, please add this to the figure legends.

      In Fig. S1 caption, the sentence “Error bars represent the standard deviation of the mean of three independent experiments” was added.

      How do the authors explain the big variation in the test condition and not in the control?

      Regarding the higher variation observed in the test condition compared to the control, this may reflect, on the one hand, biological variability between independent batches of 60-h V. atlanticus cultures used to prepare the supernatants and, on the other hand, heterogeneity in the physiological status of independent algal batches (N = 3; 2 × 10^4 cells; see Materials and Methods, Co-culture assay), which may not be perfectly synchronized. In contrast, the control condition consists of A. pacificum cultures incubated in fresh medium without bacterial supernatant, for which algal motility is highly reproducible and thus shows very little variation.

      (8) Line 375, "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Movie 5) and was induced by the old-starved culture supernatant of V. atlanticus LGP32 (Fig. S1)." Is this reference to Figure S1 correct? S1 shows motility, doesn't it? I don't see how this data supports the statement made in this sentence.

      We apologize for this unclear message.

      "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Video 5) and was induced by the old-starved culture supernatant of V. atlanticus LGP32 (Fig. S1)." was replaced by "The lysis phase corresponded to initial vesicle formation followed by the bursting of A. pacificum ACT03 cells (Fig. 3C and 3C1)."

      And “We next tested whether this lytic effect was mediated by thermostable molecule(s) secreted by Vibrio.” was replaced by “We next tested whether this lytic effect was linked to Vibrio culture supernatant and mediated by thermostable molecule(s) secreted by Vibrio.”

      (9) Line 388ff, "Group attacks were observed on non-degraded A. pacificum ACT03 cells, but not on previously lysed cells." No reference to a figure is provided. I am afraid I don't see the data that this statement is based on.

      As it is impossible to show an absence of attacks, we have simply clarified the basis of our experiment.

      “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 60-hour culture of V. atlanticus LGP32, which induced 25% lysis of A. pacificum ACT03 cells. Next, the corresponding V. atlanticus LGP32 cells were added. During exposure, attacks were observed only on undegraded A. pacificum ACT03 cells, but not on previously lysed cells” was replaced by “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 126-hour culture of V. atlanticus LGP32, which induced lysis of 70% of the A. pacificum ACT03 cells (Figures 3C and 3C1, arrow 2 and video 4). Next, cells of V. atlanticus LGP32 from a 60-hour culture, capable of attacking A. pacificum ACT03 cells (Fig. 3B), were added. For 1 hour of exposure, no attack was observed on the previously lysed algae.”

      (10) Figure 4a, Based on the labeling of the figure, in particular the x-axis, it is not fully clear to me what I am looking at.

      Figure 4A has been reworked and its legend modified. We hope that this graph is clearer now.

      (11) Line 428, did the authors consider complementing the pvuD deletion mutant and testing for gain of function when providing the gene in trans?

      We did not investigate pvuD in this study and did not construct a pvuD deletion mutant. We therefore assume that the recommendation refers to pvuB, which was the focus of our work. Unfortunately, we did not perform this experiment. However, several lines of evidence support the implication of PvuB and the vibrioferrin uptake system in this process: (i) the loss of attack behaviour is specific to the mutant in the vibrioferrin uptake pathway and (ii) our expression and proteomic data show a strong induction of vibrioferrin uptake components under starvation and iron-manipulated conditions, which correlate with the attack phenotype.

      (12) Use of the term "group attack" in parentheses in the text, but in the section header and title. Is there really sufficient actual data to say that this is a "group attack"? What exactly are the indications for this being a behaviour of a group?

      We agree with you. The terms “group attacks” and “wolf-pack hunting” were replaced by the more neutral term “attacks” throughout the manuscript.

      (13) Table S1 and S2, those tables give a nice overview. Do the authors provide the raw data based on which they make a claim on "+" and "-" in the individual categories? I would prefer to see the actual data or at least have the possibility to look into this.

      In the revised versions of Tables 1 and 2, we have improved the captions and clarified the meaning of each column in order to avoid any ambiguity between the results of this study and the bibliographic information.

      Specifically regarding Table 2:

      We do not present any visuals of the interaction between Vibrio and Alexandrium because these species all look alike. Regarding the other algae species tested in interaction with Vibrio, phenomena other than lysis or cell attack have been observed and are the subject of specific laboratory studies.

      (14) Line 456 "first study", line 40f "first evidence of a new mechanism". I suggest toning this down a bit and being clearer in the abstract about this being a working model that can be suggested based on individual bits of data.

      We thank Reviewer #1 for this helpful suggestion.

      In the summary:

      “This is the first evidence of a new mechanism that could to be involved in regulating Alexandrium spp. blooms and giving Vibrio a competitive advantage in obtaining nutrients from the environment.” was replaced by “The interaction model we propose here suggests that Vibrio could play a role in regulating the proliferation of Alexandrium spp., giving it a competitive advantage in obtaining nutrients from the environment.”

      In the discussion:

      “Considering predator as a free organism that feeds at the expense of another, this study is the first evidence of the capacity of some Vibrio to develop a predatory strategy against an alga. This behaviour differs from parasitism, because the survival of Vibrio is not exclusively dependent on algae in environment” was replaced by “Consider a predator as a free-living organism that kills its prey and feeds on it, this study provides data suggesting the ability of Vibrios to develop an original predator-like behaviour to kill and feed on algae.”

      (15) Line 469 "Overall, these observations show that V. atlanticus LGP32 is able of wolf-pack hunting behaviour." I see the similarities. I feel that the term "show" is a bit too strong here, or I suggest referring to "wolf-pack-like behaviour".

      The sentence “Overall, these observations show that V. atlanticus LGP32 is able of wolf-pack hunting attack behaviour” was replaced by “Overall, these observations suggest that V. atlanticus LGP32 can exhibit a predator-like behaviour”.

      Reviewer #2 (Public review):

      As weaknesses, Reviewer #2 lists:

      (1) A lack of early, clear definitions for several important terms used in the paper, including 'predation', 'coordination' and 'coordinated action', 'group attack', and 'wolf-pack hunting', along with a corresponding lack of criteria for what evidence would warrant use of some of these labels. (For example, does mere simultaneity of attacks of an A. pacificum cell by many V. atlanticus cells constitute "coordination"? Or, as it seems to us, does coordination require some form of signalling between predator cells?)

      The term “coordinate” was replaced by “simultaneous” throughout the manuscript.

      The terms “group attack” and “wolf-pack hunting” were replaced by “attack” throughout the manuscript.

      (2) Absence of controls for cell density in the test for starvation effects on predatory behaviour; unclear how the length of incubation affects the density of V. atlanticus cells.

      We thank the reviewer for pointing this out.

      A cell density experiment had already been performed (cf. Fig. 4A).

      The sentence “All percentages were determined based on a minimum of 2,000 cells of A. pacificum ACT03.” was added to the captions of Fig. 3, Fig. 4 and Fig. S1.

      (3) Lack of clarity in some of the methodological descriptions.

      The Methods section has been checked and some improvements have been made.

      Reviewer #2 (Recommendations for the authors):

      (A) Title

      (1) Could 'induces' be better than 'promotes'?

      We agree with Reviewer #2. The initial title, “Starvation of the bacterium Vibrio atlanticus promotes lightning group-attacks on the dinoflagellate Alexandrium pacificum”, was replaced by “Starvation of the bacterium Vibrio atlanticus induces simultaneous attacks on the dinoflagellate Alexandrium pacificum”.

      (B) Abstract

      (1) Perhaps define phycosphere in the abstract - many readers might not know this word.

      We have revised the abstract to define the term phycosphere and added the sentence “This occurs in the microenvironment surrounding phytoplankton cells, the phycosphere. An interface rich in nutrients and organic molecules exuded by the cell.”

      (2) Perhaps "on dinoflagellates".

      We thank Reviewer #2 for this suggestion. We have revised the abstract by replacing “on the dinoflagellates species” with “on dinoflagellates”.

      (3) Line 33 - The word 'prey' is used without a claim of predation having yet been made; only killing has been claimed so far.

      We agree and have replaced the word “prey” by “algae” in the abstract.

      (4) Line 34 - It is unclear whether the description refers to the 'attack stage' or to 'wolf-pack attack' in general. The sentence is written in such a way that it seems to refer to 'wolf-pack attack'. However, this would seem to be incorrect, with the description being specific to V. atlanticus.

      To avoid this ambiguity, we have removed the sentence “resembles the ‘wolf-pack attack’ strategy” from the abstract.

      (5) Line 35 - Should there be a 'consumption phase'?

      We agree with Reviewer #2; “degradation” was replaced by “consumption”.

      (6) If predation is claimed later in the manuscript (which it is), it should be explicitly claimed in the abstract.

      We thank Reviewer #2 for this helpful suggestion.

      We have revised the abstract. The sentence “Results showed that Vibrio atlanticus was able to coordinate lightning group attacks then kill the dinoflagellate Alexandrium pacificum ACT03” was replaced by “The results showed that Vibrio atlanticus was capable of attacking and killing the dinoflagellate Alexandrium pacificum ACT03”.

      (C) Main text

      (1) Line 54 - Perhaps "Among HAB-causing organisms...".

      We agree with the reviewer’s suggestion and have revised the wording.

      (2) Line 56 - "that, together with..., form the "Alexandrium tamarense" complex".

      We agree with the reviewer’s suggestion and have revised the sentence.

      (3) Line 57 - What this "complex" is and its significance should be explained.

      “Among them, Alexandrium pacificum is a flagellated eukaryotic unicellular organism that together with Alexandrium tamarense and Alexandrium fundyense form the "Alexandrium tamarense" complex (Hadjadji et al., 2020)” was replaced by

      “Among them, Alexandrium pacificum is a flagellated eukaryotic unicellular organism that together with Alexandrium tamarense and Alexandrium fundyense form the "Alexandrium tamarense" complex, responsible for paralytic shellfish poisoning worldwide (Hadjadji et al., 2020)”

      (4) Line 58 - What is a Rephy survey?

      We clarified this point: “by rephy survey” was replaced by “by the French phytoplankton observation and monitoring network (Rephy)”.

      (5) Line 59 - 'resulting in' instead of 'resulting of'.

      We agree with the reviewer and have replaced “resulting of” with “resulting in”.

      (6) Line 65 - It seems that ', influencing the time of appearance of blooms' would be more correct than the current phrasing. The current phrasing is unclear regarding the relation between species, tolerance range, and the time of appearance of blooms.

      To address this point, “Depending on the phytoplankton species, the tolerance range of physicochemical parameters is different and influences the time of appearance of blooms” was replaced by “Depending on the species of phytoplankton, tolerance to physicochemical parameters varies, which influences when blooms occur.”

      (7) Line 76 - Run-on sentence which should probably be split after the reference to Wang et al., 2020.

      We agree with the reviewer and have split the sentence.

      (8) Line 89 - What are these observations?

      This sentence was reformulated.

      “Based on observations from the natural environment showing a potent relationship between Vibrio and Alexandrium algae bloom events, this study aim to determine in vitro, the main factors implicated in this relationship” was replaced by “This study aims to describe observations made in the natural environment between Vibrio bacteria and Alexandrium algal blooms, and to determine in vitro the main factors involved in this relationship.”

      (9) Line 94 - This is the first clear reference to a predator-prey interaction, and it is stated as if it's established. Is it not a central goal of the study to demonstrate that predation is even happening?

      Based on the title and abstract, I would have expected the major claims of the paper highlighted in the abstract to be:

      (i) that predation of algae by bacteria occurs in this system,

      (ii) there is a social component of predation,

      (iii) claims about what induces this predatory behaviour.

      The summary has been amended accordingly, and the term “predation” has been removed, along with all sentences referring to it.

      (10) Line 99 - What does n.d. mean?

      This point was addressed in the revised version.

      (11) Line 97 section - specify qPCR.

      This point was clarified in the revised version.

      (12) Line 139 - Mentioning the oligonucleotides in this part of the methods seems out of place. Would this not fit better in the section on Gene expression analysis?

      This sentence was discarded from this paragraph.

      (13) Line 147 - Where did the co-cultured phytoplankton species come from?

      To address this point, a reference to Table 2 was added.

      (14) Line 149 - Is it known if the phytoplankton strains had all grown to the same density after 24 hours?

      The doubling time of dinoflagellates in laboratory culture is between 5 and 7 days. Over the duration of the experiments, the dinoflagellate concentration did not change significantly.

      The phrase “(doubling time between 5 and 7 days)” was added.

      (15) Line 150 - Was the density of the Vibrio cultures at the different incubation times measured? Density might play an important role in predation, and so it would be important to control for density in these assays.

      The concentrations of live Vibrio in each individual culture were not measured. However, the role of Vibrio density in attacks was measured and is shown in Figure 4A, and it can also be observed in Fig. 2B.

      (16) Line 153 - How long was the co-incubation?

      The incubation times were added in the revised version.

      (17) Line 158 - What is meant by "independent experiments", more exactly?

      To clarify this point, “Data are the means of three independent experiments” was replaced by “The data come from three independent experiments using independent phytoplankton cultures and independent bacterial cultures.”

      (18) Line 161 - Perhaps give the source information about the Vibrio strain at its first mention.

      A reference has been added in the revised preprint.

      (19) Line 163 - line 141 refers to multiple non-axenic species, whereas here "the algal strain" is referred to.

      And

      (20) Line 164 - language phrasing throughout the manuscript could use some polishing, e.g., "this means that additional bacteria...".

      To address this comment, “As the algal strain used in the study is not axenic, means that additional bacteria, other than the V. atlanticus LGP32, are potentially present in the experiments.” was replaced by “As the A. pacificum ACT03 strain (table 2) used in the study is not axenic, there is potential for bacteria other than V. atlanticus LGP32 to be present in the experiments.”

      (21) Line 208 - Why were both magnitude and p-value criteria used rather than just p-values?

      In the present proteomic approach, each experimental condition was measured six times, and the mean value was used to reduce random noise. We then selected differences that were large enough to matter biologically: this is a central criterion, and a change of at least 2-fold was required in order to focus exclusively on biologically relevant differences, which allowed us to control for the effect size. However, the differences also had to be statistically significant, so we applied a threshold of P < 0.01, meaning that a difference at least this large would be expected less than 1% of the time if there were no true effect. We considered that using both criteria makes the results meaningful and trustworthy, not just a small or random fluctuation.
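
      The dual criterion described above can be illustrated with a minimal sketch. It assumes that the p-values have already been computed by a separate statistical test (not shown), and all protein names, abundance values and p-values below are hypothetical; the function name and its parameters are likewise illustrative and are not taken from the manuscript.

```python
import math
from statistics import mean


def select_proteins(abundances, p_values, min_fold=2.0, alpha=0.01):
    """Keep proteins that pass BOTH the fold-change and the p-value criteria.

    abundances: {protein: (control_replicates, treated_replicates)}
    p_values:   {protein: p-value from a test computed elsewhere}
    Returns {protein: log2 fold change} for the proteins that pass.
    """
    threshold = math.log2(min_fold)  # a 2-fold change is |log2 FC| >= 1
    selected = {}
    for protein, (control, treated) in abundances.items():
        # Average the replicates first, then compare the two means.
        log2_fc = math.log2(mean(treated) / mean(control))
        if abs(log2_fc) >= threshold and p_values[protein] < alpha:
            selected[protein] = round(log2_fc, 2)
    return selected


# Hypothetical data: six replicate measurements per condition.
abundances = {
    # large and significant: kept
    "protein_A": ([10, 11, 9, 10, 10, 10], [40, 42, 38, 41, 39, 40]),
    # significant but below 2-fold: dropped
    "protein_B": ([10, 11, 9, 10, 10, 10], [12, 13, 11, 12, 12, 12]),
    # above 2-fold but not significant: dropped
    "protein_C": ([10, 30, 5, 20, 2, 8], [25, 60, 10, 45, 5, 30]),
}
p_values = {"protein_A": 0.0001, "protein_B": 0.004, "protein_C": 0.08}

hits = select_proteins(abundances, p_values)
print(hits)
```

      In this toy example only protein_A is retained: protein_B fails the effect-size criterion and protein_C fails the significance criterion, which is the situation the combined filter is meant to handle.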

      (22) Line 270 - Were these three replicate experiments also "independent"; if yes, in what sense?

      “All experiments were conducted in triplicate” was replaced by “The experiments were performed using biological triplicates, each of which was analyzed in triplicate.”

      (23) Line 296 - Perhaps "the temperature-sensitivity (or resistance) of" rather than "the nature of".

      The modification was made in the new manuscript.

      (24) Line 307 - The sentence mentions only one influential period that was removed from the dataset, but the word 'whenever' suggests multiple occurrences.

      We agree; “whenever” was replaced by “because”.

      (25) Line 325 - line 327 - The rationale behind the first part of the following sentence isn't clear to me, and what is meant by the second part is also not clear.

      To clarify this point, “This result is consistent with the difficulty that Vibrio has in growing at temperatures below 20°C and with the complex interacting factors driving bloom dynamics (Laanaia et al., 2013)” was replaced by “This result is consistent with the difficulty Vibrio has in growing at temperatures below 20°C and with the many environmental factors that influence the dynamics of algae proliferation (Laanaia et al., 2013)."

      (26) Line 327 - line 328 - Hard to interpret; does this refer to living algal cells, or all algal cells, living and degraded?

      To improve clarity, “Interestingly, in spring 2015, the mean densities of all Alexandrium cells and of free-living Vibrio were positively correlated” was replaced by “Interestingly, in spring 2015, the mean densities of Alexandrium cells (living and degraded) and of free-living Vibrio were positively correlated”.

      (27) Figure 2 - These results strongly point to predation, but why the Vibrio population would already be elevated in the co-culture treatment relative to the control immediately after inoculation (0 hrs) is not clear.

      The experiments were not conducted at the same time, and the first value on the graphs corresponds to the concentration of Vibrio determined after 1 hour of exposure/incubation, not at time 0. Figures 2A and 2B have been modified accordingly, and substantial changes have been made to the relevant section of the results.

      (28) Line 348 - There's no mention of Figure 2C in the main text, or of the statistical test associated with it in the Figure 2 legend.

      To address this comment, Figure 2C has now been cited in the main text, and the statistical analysis method has been added to the Figure 2 caption.

      (29) Line 352 - Text descriptions of videos are not easy to connect with the video content. Label the file names the same as how they are referred to in the text.

      We agree with you. The sentence “Epifluorescence microscopy observation of GFP-labelled V. atlanticus LGP32 (previously grown in Zobell medium) in interaction showed that A. pacificum ACT03 cells that had lost their motility were attacked individually by V. atlanticus LGP32 before being lysed (Fig. 2C and Video 1).” was rephrased and replaced by “Epifluorescence microscopy observation of GFP-labelled V. atlanticus LGP32 (previously grown in Zobell medium) in interaction showed that V. atlanticus LGP32 simultaneously attacks A. pacificum ACT03 cells (Fig. 2C and Video 1).”

      (30) Movie 1 could be cut to remove uninteresting footage at the start. What indicates lysis? Is the deformation of the cells an indication of lysis?

      To respond to this comment, Video 1 has been shortened and, in the caption, “degraded” was replaced by “lysed”.

      (31) Line 353 - Video could be zoomed in more on a few typical attacks to remove visual noise.

      A chronological overview of an attack has been added to Figure 2 corresponding to Figure 2D, and a chronological overview of the overall event has been added to Figure 3 corresponding to Figure 3B1.

      (32) Line 355 - There does not seem to be a Figure 3A2.

      To address this point, Fig. 2 and Fig. 3 have been revised for clarity (see above).

      (33) Figure 3 - Can the authors fully exclude an effect of bacterial density as distinct from an effect of growth/starvation phase? It would be helpful to determine bacterial viable population densities at 12, 36, 60, and 126 hrs of incubation in Zobell medium, and to control for density in testing for effects on algae.

      Information on the densities of Vibrio incubated in Zobell medium for 12, 36, 60, and 126 hours has now been included in the results section “Attack of A. pacificum ACT03 is activated by V. atlanticus LGP32 starvation.”

      (34) Line 363 - It is unclear how the degradation of the flagella is apparent from movie 3. It would be helpful to have a comparison with healthy flagella.

      Alexandrium cells with intact flagella move so quickly that it is impossible for us to follow them and film their flagella with the tools at our disposal.

      For greater clarity, arrows have been added to videos 3, 4 and 5.

      (35) Line 364 - Sudden change from referring to the recording as 'video' instead of movie. What is meant by erratic swimming? The cell does not seem to move much.

      To address this comment, “Movie” was replaced by “Video” throughout the manuscript and “erratic swimming” was replaced by “irregular swimming”.

      (36) Line 365 - How did you observe the detachment of the flagellum?

      The detachment of the flagellum can be observed using a confocal microscope. This process was filmed and presented in Video 3. Arrows have been added to the video to clearly indicate the flagellum detachment.

      (37) Line 368 - Perhaps this is due to it not being clear regarding which movie is meant, but there is no clear attack visible in movie 4.

      To make this clearer, arrows have been added to Video 4 to indicate attached cells.

      And the sentence in the caption of the video 4 “Vibrio, filmed under a confocal microscope, attacks in groups one immobilized Alexandrium cell then moves on to attack — still as a group — another cell without touching the other whole cells, suggesting active communication between Vibrio cells” was rewritten and replaced by “This video, recorded under a confocal microscope, shows Vibrios simultaneously attacking a first immobilized Alexandrium cell, then moving on to attack a second cell without ever targeting the other cells present, suggesting active communication between the Vibrio bacteria.”

      (38) Line 369 - It seems the peak attach % was reached at 45 minutes, not 15-30 minutes.

      We apologize for the confusion. For clarity, in the caption of Fig. 3, the sentence “(A) Percentage of A. pacificum ACT03 motile cells. (B) cells attacked by V. atlanticus LGP32 and (C) cells lysis after 0, 15, 30, 45 and 60 min of interaction” was replaced by “(A) Cumulative percentage of motile A. pacificum ACT03 cells. (B) Cumulative number of cells attacked by V. atlanticus LGP32 and (C) Cumulative cell lysis after 0, 15, 30, 45 and 60 minutes of interaction.”

      (39) Line 382 - "clearly show role of nutrient limitation", see comment re controlling for any role of bacterial density.

      To address this point, information on Vibrio densities was added to the manuscript (cf. comment 33).

      (40) Line 385 - line 386 - Phrasing unclear.

      We have revised the text accordingly: “To this aim, A. pacificum ACT03 in exponential growth phase was first exposed for 30 min to supernatant from 60 hours starved V. atlanticus LGP32 Zobell media that induced 25% lysis of A. pacificum ACT03 cells and next to the corresponding V. atlanticus LGP32 cells. Group attacks were observed on non-degraded A. pacificum ACT03 cells, but not on lysed cells.” was replaced by “To this end, A. pacificum ACT03 in exponential growth phase was first exposed for 30 minutes to the supernatant of a 126-hour culture of V. atlanticus LGP32, which induced lysis of 70% of the A. pacificum ACT03 cells (Figures 3C and 3C1, arrow 2 and video 4). Next, cells of V. atlanticus LGP32 from a 60-hour culture, capable of attacking A. pacificum ACT03 cells (Fig. 3B), were added. For 1 hour of exposure, no attack was observed on the previously lysed algae.”

      (41) Line 413 - Is this the only pathway for quorum sensing in V. atlanticus?

      Indeed, the last two sentences of this paragraph are unclear.

      To address this point:

      “By targeted mutagenesis of key genes involved in QS pathways ΔluxM (HAI-1 production), ΔluxS (AI-2 production) and ΔluxR (high-density QS master regulator) did not lead to any change in the attack behaviour of V. atlanticus LGP32 (Fig. 4C).” was replaced by “Targeted mutagenesis of key genes involved in two of the three known QS pathways in vibrios (Fig. S3), ΔluxM (HAI-1 production), ΔluxS (AI-2 production), and ΔluxR (main high-density QS regulator), did not result in any changes in the attack behavior of V. atlanticus LGP32 (Fig. 4C).”

      And “Taken together these results showed that attack by V. atlanticus LGP32 is not link to QS.” was replaced by “Combined with the absence of overexpression of the CqsS gene (inducible by CAI-1) involved in the last known QS pathway in Vibrio (Fig. S3), these results indicated that the attack by V. atlanticus LGP32 is most likely unrelated to QS.”

      (42) The references to tropism aren't clear.

      You're right, there's no reason to use the term tropism here. We have removed it.

      (43) Line 439 - Why was H3BO4 used as a control for the addition of FeCl3?

      For clarity, the sentence “Boron being known to be a regulator or capable of being transported by vibrioferrin (Romano et al., 2013; Weerasinghe et al., 2013), we tested its potential involvement in the interaction but no effect was evidenced here.” was replaced by “Given that boron is known for its role in regulating a global bacterial cellular response to phytoplankton and to bind to vibrioferrin (Romano et al., 2013; Weerasinghe et al., 2013), we tested its potential involvement in simultaneous vibrio attacks. Compared to the Zobell control, no effect on the number of attacks was observed.”

      (44) Line 441 - line 449 - Should explicitly say in text that no attacks were observed for any species other than the Alexandrium and Gymnodinium species.

      We agree and have explicitly stated in the text that no attacks were observed for any species other than Alexandrium and Gymnodinium.

      (45) Line 454 - line 455 - The last part of this sentence seems a strange statement, since

      (i) it has long been known that predatory bacteria can eat a wide range of eukaryotes, (ii) one of the cited papers (Perez et al) actually highlights a case of bacterial predation on algae, and (iii) in the next paragraph the authors themselves highlight Streptomyces predation of algae.

      To make this clearer, « Among predators, predatory bacteria are found in a wide variety of environments, and like bacteriophages and predatory protists, they have been reported to prey exclusively on other bacteria » was replaced by “Among predators, predatory bacteria are found in a wide variety of environments and, like bacteriophages and predatory protists, feed primarily on other bacteria, although a few cases of predation on microbial eukaryotes have also been reported.”

      (46) Line 455 - Better to clarify the authors' definition of a predator at the start of the paper. The offered definition seems more like a definition of 'consumer' than 'predator', as the latter normally involves both the killing and consumption of other organisms, not just consumption with some kind of "expense".

      To address this comment:

      - “predator behaviour” was replaced by “predator-like behaviour”

      - and “Considering predator as a free organism that feeds at the expense of another, this study is the first evidence of the capacity of some Vibrio to develop a predatory strategy against an alga. This behaviour differs from parasitism, because the survival of Vibrio is not exclusively dependent on algae in environment” was replaced by “Consider a predator as a free-living organism that kills its prey and feeds on it, this study provides data suggesting the ability of Vibrios to develop an original predator-like behaviour to kill and feed on algae.”

      (47) Line 457 - Don't see the benefit of trying to distinguish from parasitism here, especially since parasitism can be facultative, whereas the authors' phrasing suggests that it is always obligate.

      You are right, this sentence has been deleted.

      (48) Line 463 - line 464 - The authors should clearly explain exactly what detailed aspects of Myxococcus and Lysobacter predation they think the "attack stage" of V. atlanticus resembles.

      Accordingly, “The second stage, the ‘attack stage’ corresponding to physical contact between Vibrio and Alexandrium resembles the ‘wolf-pack attack’ strategy described for Myxococcus xanthus and Lysobacter regardless of the prey species used, M. xanthus must be in close proximity to prey cells in order to induce their lysis and to benefit from their biomass (Martin, 2002; Perez et al., 2014)” was replaced by “The second stage, the ‘attack stage’ corresponding to the physical contact between Vibrios and Alexandrium, is similar to the strategy used by Myxococcus xanthus and Lysobacter. These bacteria must be in close proximity to their prey in order to cause lysis and utilize their biomass, regardless of the prey's species (Martin, 2002; Genovesi et al., 2013; Perez et al., 2016; Zhang et al., 2020)”

      (49) Line 466 - line 467 - The comparison to bacteria clustering around lysed cells is surprising since the authors show that V. atlanticus does not attack already lysed cells.

      The sentence was rephrased: “This phenomenon is comparable to that of bacteria clustering around lysed ciliate cells” was replaced by “Visually, this phenomenon resembles bacteria clustering around lysed ciliate cells.”

      (50) Line 469 - Missing is a statement of exactly what criteria constitute "wolf-pack hunting behaviour" and exactly how V. atlanticus meets those criteria.

      To address this point, “wolf-pack hunting behaviour” was replaced by “predator-like behaviour”.

      'Able of' should be corrected to 'Capable of'.

      We agree and have reworded the sentence.

      (51) Line 470 - Consider starting a new paragraph for the material on quorum sensing.

      Accordingly, we have separated the section concerning the QS pathway from the section concerning the iron pathway.

      (52) As part of their discussion on the role of iron uptake, can the authors comment on any relationship between starvation and iron uptake, and in particular the observations that, while general nutrient deprivation induces attacks, supplementation with a specific nutrient (iron) also induces attacks (Figure 4D)? Do bacteria starved for general growth substrates take up more iron than growing bacteria?

      To respond to this comment, “Future study could demonstrate further the role of vibrioferrin in group attack, by adding iron-saturated vibrioferrin to algae-Vibrio co-cultures.” was replaced by “Interestingly, if a general nutrient deficiency causes attacks, iron supplementation increases the number of attacks (Figure 4D), suggesting the importance of iron absorption in the attack behavior. Future studies should determine whether nutrient deficiency increases the iron absorption capacity of Vibrios and whether this plays a major role in the attack mechanism.”

      (53) Line 486 - Of what is boron known to be a regulator?

      To respond to this comment, “Given that boron is known for its regulatory properties and for being transportable by vibrioferrin” was replaced by “Given that boron is known for its role in regulating a global bacterial cellular response to phytoplankton and to bind to vibrioferrin”.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This paper describes the localisation of DNA repair proteins, which carry out their DNA repair function in the nucleus, to the cytoplasmic Golgi apparatus. Using the Human Protein Atlas to identify candidates, the authors use antibody localisation to show that a significant number of DNA repair proteins also localise at the Golgi. It appears that proteins involved in common DNA repair pathways localise to common regions of the Golgi. The Golgi-nucleus distribution of the DNA repair proteins changes upon DNA damage, indicating a dynamic relationship. The authors focus on the DNA repair protein RAD51C and show that its loss from the Golgi and translocation to the nucleus upon DNA damage is mediated by the ATM kinase. Anchoring at the Golgi is shown to be mediated by the golgin giantin. A functional role for giantin in DNA repair is shown in knockdown studies, supporting a mechanism whereby Golgi anchoring of RAD51C, and possibly other DNA repair proteins, by giantin, is required to maintain proper control of DNA repair. The data are clear and support the authors' conclusions. The data are carefully quantified throughout. I found the text easy to read.

      Major points:

      1.) To validate the Golgi localisation, KD using siRNA was used. It was deemed that a signal reduction of 25% was enough to indicate specific antibody labelling. This seems like a low number, and not very stringent. For some of the hits, expressing tagged versions of the proteins would greatly strengthen the Golgi assignment. This may not be possible for all, but for RAD51C would seem an important experiment.

      Response: We thank the reviewer for raising the important issue of antibody validation stringency. We agree that for a single-candidate study, a larger reduction after knockdown would generally be preferable. In our case, the 25% cutoff was used only in the primary high-content screening step as part of an intentionally inclusive two-stage workflow, for the following reasons:

      First, because this dataset is generated in a screening format across hundreds of targets, knockdown efficiency, protein turnover, and the relative size of the Golgi-associated pool are unknown and highly variable between genes. For many proteins the Golgi pool represents a small fraction of total cellular signal, and a modest change in total abundance can translate into a smaller absolute change in the Golgi ROI after segmentation, background subtraction, and imaging noise. We therefore selected a permissive cutoff to reduce false negatives and ensure we did not systematically miss candidates with slower turnover, partial knockdown, or small Golgi pools. This strategy is consistent with large-scale subcellular mapping efforts, including the Human Protein Atlas, where genetic depletion by siRNA is used as a key validation pillar for immunofluorescence localization and is combined with additional validation strategies when deeper confidence is required (Stadler et al, 2012). Furthermore, it is important to note that this validation was performed in a high-content screening format in which fixation, permeabilisation, antibody concentration, and blocking conditions were kept uniform across all candidates rather than optimised for each individual antibody. In standard single-target immunofluorescence experiments, these parameters would be titrated to maximise signal-to-noise for the specific antibody and antigen in question. Under non-optimised screening conditions, the absolute magnitude of signal change upon knockdown is inherently attenuated compared to what would be expected from a purpose-optimised assay. We therefore consider a 25% reduction threshold under these uniform, non-optimised screening conditions to be a meaningful and appropriately calibrated criterion.

      Second, we wish to clarify that the primary intent of our screen was not to validate the Golgi-nuclear localisation of any single protein in isolation, but rather to identify whether entire functional pathways are represented at the two organelles. This is precisely why the bioinformatic network analysis was performed as an integral part of the workflow, and not as an afterthought. The finding that the validated hit list is significantly enriched for coherent functional clusters, most notably a network spanning multiple core DNA repair pathways (HR, MMR, BER, MMEJ), serves as an in silico validation of the dataset as a whole. The emergence of pathway-level organisation, with proteins from the same repair pathways co-associating, localising to the same Golgi sub-compartments, and redistributing in the same direction upon genotoxic stimuli, provides biological coherence that goes beyond what individual antibody validation can offer, and substantially reduces the likelihood that the Golgi signal represents a collection of unrelated false positives.

      Third, our mechanistic conclusions do not rely on the 25% screening threshold. For RAD51C, we used multiple orthogonal validation approaches, including independent antibodies recognizing distinct RAD51C epitopes and genetic depletion, supported by biochemical evidence.

      In response to this comment, we have provided the full screening validation dataset as source data (Supplementary Table S1), including intensity changes for the candidates, so that readers can inspect the distributions and apply their own thresholds. We have also clarified in the Results section the rationale behind our screening strategy (lines 128-139) and the role of the bioinformatic network analysis as an integral validation step (lines 141-156).

      Turning to the specific suggestion of tagged RAD51C, we fully agree that tagged proteins can provide valuable orthogonal validation. We attempted endogenous tagging using CRISPR-mediated homologous recombination but were unable to obtain viable colonies following editing, consistent with the essential role of RAD51C in homologous recombination. We also attempted ectopic expression of tagged RAD51C but were unable to obtain constructs that preserved physiological expression levels, maintained robust cell viability, or produced interpretable localization. This difficulty is not unique to our laboratory: colleagues working on RAD51 paralog complexes have reported that tagging or overexpression of RAD51C perturbs both its localisation and its ability to form functional paralog complexes, and published studies (Greenhough et al, 2023; Rawal et al, 2023; Somyajit et al, 2015; Berti et al, 2020) all use purified complexes or untagged proteins for functional assays. We discussed these challenges extensively with experts in the DNA damage repair field at several international meetings (EMBO Sounio, Keystone Symposia, German DNA Repair Society). For these reasons, we relied on orthogonal approaches that do not require tagging (genetic depletion plus independent antibodies, and biochemical fractionation) to support the Golgi localization claim. We agree with the reviewer that this represents a limitation of this study, and we addressed these concerns in the discussion of our revised manuscript (lines 630-641).

      2.) The total signal should be quantified for each DNA repair protein upon genotoxic stress, in addition to the Golgi to nucleus ratio. For many of the proteins it looks like the total signal goes down, which could influence interpretation.

      Response: We thank the reviewer for this important point. We wish to clarify that our imaging pipeline uses marker-based segmentation throughout: the Golgi compartment is segmented using GM130 and the nucleus using Hoechst, because unsegmented whole-cell masks without organelle markers yield unreliable intensity measurements in this experimental setup. True total cellular signal is therefore not directly accessible in this dataset. In the revised manuscript we provide the absolute fluorescence intensities for both the Golgi and nuclear compartments separately. In addition, we now include total (Golgi + nuclear) intensity measurements for each protein (Supplementary Figures 3D, 4D, and 5E) as the most reliable proxy for overall protein distribution. These data are presented alongside the redistribution ratio to enable comprehensive interpretation.

      As the reviewer correctly notes, a subset of proteins shows a reduction in total signal after treatment, particularly with doxorubicin. This is consistent with known effects of doxorubicin-induced DNA damage on cellular proteostasis, including widespread ubiquitination and suppression of protein translation (Halim et al, 2018). Several DDR regulators are subject to ubiquitin-dependent turnover following genotoxic stress, such as CHK1 (Zhang et al, 2005). More broadly, ubiquitin- and proteasome-mediated regulation is an integral component of the DNA damage response and can affect the abundance and detectability of DDR factors (Brinkmann et al, 2015). Changes in abundance are therefore an expected biological feature of the response. For this reason, we used the Golgi-to-nucleus ratio as the primary redistribution readout, as it captures relative compartmental partitioning independently of changes in total protein levels.
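The reasoning behind using a ratio can be illustrated with a minimal sketch. The function name and the intensity values below are hypothetical (arbitrary units, not measurements from the study); the sketch only shows that a uniform loss of signal in both compartments lowers the Golgi + nuclear total while leaving the Golgi-to-nucleus ratio unchanged, whereas redistribution changes the ratio even when the total is constant.

```python
def compartment_readouts(golgi, nucleus):
    """Return (golgi_to_nucleus_ratio, golgi_plus_nuclear_total).

    `golgi` and `nucleus` are mean fluorescence intensities of the two
    segmented compartments (hypothetical values, arbitrary units).
    """
    if nucleus <= 0:
        raise ValueError("nuclear intensity must be positive")
    return golgi / nucleus, golgi + nucleus


# Illustrative numbers only, chosen to make the arithmetic easy to follow.
untreated = compartment_readouts(golgi=400.0, nucleus=200.0)

# Uniform 50% loss of signal in both compartments:
# the total drops, but the ratio is the same as in the untreated case.
uniform_loss = compartment_readouts(golgi=200.0, nucleus=100.0)

# Redistribution from Golgi to nucleus without any loss of protein:
# the total is the same as in the untreated case, but the ratio changes.
redistributed = compartment_readouts(golgi=200.0, nucleus=400.0)

for label, (ratio, total) in [
    ("untreated", untreated),
    ("uniform loss", uniform_loss),
    ("redistributed", redistributed),
]:
    print(f"{label}: ratio={ratio:.2f}, total={total:.0f}")
```

Reporting both numbers side by side, as the response describes, lets a reader separate a change in abundance from a change in partitioning.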

      3.) The study would benefit from live imaging of the Golgi to nucleus translocation of RAD51C. This would give a better indication of dynamics.

      Response: We agree that live imaging would directly visualize the dynamics of RAD51C redistribution between the Golgi and the nucleus. This was indeed one of our initial goals following the identification of the Golgi-associated RAD51C pool. However, as described above in our response to Major Comment 1, live imaging requires a fluorescently tagged RAD51C construct, and all tagging strategies we attempted, both endogenous CRISPR-mediated tagging and ectopic expression, failed to yield cell lines with robust signal while preserving physiological behaviour. This appears to be a broader challenge for highly conserved and functionally constrained DNA repair proteins, and is not unique to our laboratory.

      Given these constraints, we focused on tag-independent approaches: multiple independent RAD51C antibodies combined with genetic depletion controls, quantitative fixed-cell time courses, and biochemical fractionation. These orthogonal datasets together support compartment-specific changes over time in a manner consistent with redistribution. We have clarified this limitation explicitly in the manuscript and avoided any wording that could be interpreted as implying direct single-molecule tracking in live cells. We present this as an important avenue for future work, contingent on the development of viable RAD51C-expressing cell lines (lines 630-641).

      4.) The double depletion experiments suggest a functional relationship between giantin and RAD51C. But they do not formally show it. Experiments to more directly address the functional role of the interaction between these two proteins would strengthen the study.

      Response: We agree with the reviewer that double depletion alone cannot formally prove that the physical Giantin-RAD51C interaction is the sole determinant of the observed DDR phenotypes. However, we would like to highlight the breadth of evidence we have assembled in support of this functional relationship:

      • Physical interaction between endogenous Giantin and RAD51C demonstrated by colocalisation (Figure 4F-G) and co-immunoprecipitation (Figure 4H-I).
      • Damage-induced dissociation of the Giantin-RAD51C complex that is prevented by ATM inhibition or Importazole treatment, directly linking the interaction to the DDR signalling axis (Figure 3K-P).
      • Premature nuclear accumulation of RAD51C upon Giantin depletion, producing aberrant nuclear foci lacking canonical HR markers and impaired ATM signalling (Figure 4B-E & J-M).
      • DR-GFP reporter assay confirming that Giantin depletion reduces HR efficiency to approximately 60% of control, consistent with the reduction previously reported in the genome-wide HR screen (Adamson et al. 2012) and validating the functional significance of Giantin in HR (Figure 5L).
      • Partial rescue of ATM phosphorylation, genomic instability and proliferation phenotypes by RAD51C co-depletion, arguing for RAD51C as a functionally relevant conduit of the Giantin-dependent phenotype (Figures 5M-5P). These observations are further supported by the established literature on RAD51C function, its roles in CHK2 phosphorylation, replication fork stabilisation, and RAD51 filament formation (Badie et al, 2009; Somyajit et al, 2015; Prakash et al, 2022), providing a mechanistically coherent framework in which mislocalisation of RAD51C, whether directly or indirectly through Giantin, leads to dysregulation of DDR signalling and repair capacity, as we directly demonstrate with the HR efficiency assay.

      Nonetheless, we fully agree that the most direct proof of the functional relevance of the physical Giantin-RAD51C interaction would come from separation-of-function experiments, ideally using an interaction-deficient Giantin mutant or an RAD51C variant unable to bind Giantin. We wish to be transparent that both approaches face substantial technical barriers in this system. RAD51C tagging consistently compromised cell viability and protein function, precluding the generation of interaction-deficient variants at physiological expression levels. Engineering an interaction-deficient Giantin mutant presents an independent challenge: Giantin is one of the largest Golgi matrix proteins (~376 kDa), composed almost entirely of extended coiled-coil domains that are resistant to structural prediction, and identifying a discrete RAD51C interaction interface without disrupting broader scaffolding function would require a dedicated structural and biochemical programme. We have framed these explicitly as the most important future priorities in the Discussion (lines 555-564), rather than over-interpreting the current data.

      5.) The Kaplan-Meier plots in Fig S9 seem to be quite selective in that only breast cancer is shown. Does giantin reduction correlate with poor prognosis in other cancers?

      Response: We thank the reviewer for this suggestion. We initially focused on breast cancer because RAD51C is a clinically established hereditary breast and ovarian cancer susceptibility gene (Meindl et al, 2010; Ghannoum et al, 2023), providing direct clinical context for a study centred on RAD51C dynamics and genome stability. We agree, however, that restricting the survival analysis to a single cancer type can appear selective.

      To address this directly, we expanded the in-silico survival analysis of Giantin (GOLGB1) using GEPIA2 (Tang et al, 2019) across all available TCGA cohorts (overall survival, median cutoff, FDR correction). In the pooled pan-cancer analysis, higher GOLGB1 expression is significantly associated with improved overall survival (HR(high) = 0.75, p = 6.6 × 10⁻¹⁵). When stratified by tumour type, the majority of individual associations do not reach statistical significance. The two most robust statistically significant associations are kidney renal clear cell carcinoma (KIRC; HR(high) = 0.57, p = 3.4 × 10⁻⁴), where high GOLGB1 expression is associated with improved survival, and lower-grade glioma (LGG; HR(high) = 1.5, p = 0.036), where the association is in the opposite direction. A significant association is also observed in thymoma (THYM; HR(high) = 7.3, p = 0.031), though this should be interpreted with caution given the small cohort size (n = 59). Notably, the breast cancer association observed in the KM Plotter analysis (HR = 0.71, p = 1.8 × 10⁻¹¹; n = 4,929) does not reach significance in the TCGA BRCA cohort (HR = 1.1, p = 0.68; n = 1,070), most likely reflecting the substantially smaller sample size of the TCGA cohort, which is approximately 4.6-fold smaller and therefore underpowered to detect a modest effect. These context-dependent associations are consistent with the tumour-type-specific roles of Golgi scaffolding proteins and are discussed accordingly in the revised manuscript.
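The point about statistical power can be made concrete with a rough sketch. It uses Schoenfeld's approximation for the number of events a two-sided log-rank comparison needs, with equal group sizes as implied by a median cutoff. This is an illustration under assumed settings (5% significance level, 80% power), not the analysis performed with GEPIA2 or KM Plotter, and the function name is hypothetical. The approximation counts events (deaths) rather than patients, and the response reports only cohort sizes, so the sketch shows how sharply the requirement grows as the hazard ratio approaches 1 rather than settling whether a given cohort is adequately powered.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80, allocation=0.5):
    """Approximate number of events needed to detect `hazard_ratio`.

    Schoenfeld's approximation for a two-sided log-rank test:
        events = (z_{1-alpha/2} + z_{power})^2
                 / (p * (1 - p) * ln(HR)^2)
    where p is the fraction of patients in one group (0.5 for a median cutoff).
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    denominator = allocation * (1 - allocation) * math.log(hazard_ratio) ** 2
    return math.ceil((z_alpha + z_power) ** 2 / denominator)


# Hazard ratios quoted in the response; alpha and power are assumed values.
events_km_plotter_hr = schoenfeld_events(0.71)  # breast cancer, KM Plotter
events_tcga_brca_hr = schoenfeld_events(1.1)    # breast cancer, TCGA BRCA

print("events needed for HR = 0.71:", events_km_plotter_hr)
print("events needed for HR = 1.10:", events_tcga_brca_hr)
```

Because the required number of events scales with the inverse square of the log hazard ratio, an effect close to HR = 1 needs many more events than a modest one, which is why a smaller cohort can fail to reach significance for the same underlying association.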

      In the revised manuscript we have retained the original breast cancer Kaplan-Meier plots and supplemented them with a pan-cancer survival map across all TCGA cohorts (lines 611-625; Figure S9G) and a summary table (Supplementary Table 3) reporting hazard ratios, sample sizes, and p-values for each tumour type, allowing readers to assess the clinical relevance of GOLGB1 expression.

      Minor points: There are a few grammatical errors here and there. The figures do not appear in the correct order in the text, which makes the early parts of the paper a bit difficult to follow. Some of the figures don't seem to clearly match the text. For example, it is mentioned that RAD51C labelling was done with 3 different antibodies. I could not find this data.

      Response: We thank the reviewer for these helpful observations. In the revised manuscript we have (i) carefully proofread the text and corrected grammatical errors throughout; (ii) revised the Results section to ensure that figures and supplementary figures are cited in sequential order and that each panel is explicitly introduced before being discussed, improving readability in the early sections; and (iii) corrected figure callouts to ensure they match the text. In particular, the statement that RAD51C labeling was performed with three different antibodies has been linked to the corresponding figure panels in the Results section. Antibody identifiers, sources, and dilutions are clearly reported in the Methods and in Supplementary Table S1.

      Reviewer #1 (Significance (Required)):

      This paper is novel and should be of significant interest to the field. It has important implications for how we think about the Golgi apparatus, and for how DNA repair pathways may be controlled. The pattern is clearly complex, with many DNA repair proteins localising to the Golgi, and some showing opposite dynamics. However, by focussing on RAD51C and giantin, the paper nicely demonstrates a novel mechanism for controlling DNA repair by these proteins.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Background - Eukaryotic cells rely on tightly regulated DNA repair pathways to preserve genome stability under the constant threat of both endogenous and exogenous genotoxic stress. While the nucleus, and to a lesser extent the mitochondria, is the primary site where DNA damage is detected and repaired, accumulating evidence indicates that extranuclear organelles, particularly the Golgi apparatus, play a surprisingly important role in modulating stress signaling, proteostasis, and the trafficking/activation of key DNA repair factors.

      Emerging evidence has shown that genotoxic stress can result in a major remodeling of the Golgi apparatus; however, the crosstalk between the Golgi and the nucleus, and its contribution to the DNA damage response, remains poorly defined. The present study offers timely insight by examining the spatiotemporal behavior of DNA repair proteins that shuttle between the Golgi and the nucleus, and how this trafficking contributes to the maintenance of genomic stability.

      Main findings - The authors employed the Human Protein Atlas (HPA) project to shortlist proteins that might link Golgi-nuclear function and validated each candidate using an siRNA-mediated antibody-validation pipeline, thereby identifying 163 proteins that localize to both the Golgi and the nucleus. Bioinformatic analysis of these candidates revealed a significant enrichment for DNA damage response (DDR) regulators, including multiple factors from core DNA repair pathways, suggesting that a portion of the DDR machinery may reside in the Golgi at steady state. Interestingly, the authors observed that dual-localizing DDR proteins undergo lesion-specific redistribution between the Golgi and the nucleus in response to specific types of DNA injuries. For instance, BER and MMEJ proteins shifted from nucleus to Golgi in response to doxorubicin, whereas MMR and HR proteins redistributed from Golgi to nucleus. This trend was reversed with H2O2 or KBrO3 treatments.

      To gain further insight into the link between the DDR and Golgi-nuclear communication, the authors focused on the HR factor RAD51C, which also plays a key role during the replicative stress response. The authors noticed that RAD51 is significantly associated with the Golgi, in addition to its known nuclear pool. Interestingly, they demonstrated that doxorubicin triggers the ATM-dependent release of this Golgi-tethered RAD51C pool and its Importin-β-mediated import into the nucleus, where it forms repair-associated foci. They further identified Giantin as the Golgi scaffold that anchors RAD51C at steady state in this subcellular compartment and showed that its depletion leads to premature nuclear accumulation of RAD51C, formation of aberrant RAD51C foci lacking canonical HR markers, reduced ATM activation, elevated genomic instability, and increased cell proliferation.

      Together, this study revealed an underappreciated and functionally meaningful spatiotemporal level of regulation within the DDR, suggesting that the Golgi, rather than functioning solely as a trafficking organelle, acts as a platform that anchors, releases, and temporally controls the availability of key DNA repair factors in response to genotoxic stress. In particular, the authors demonstrated that the timely and regulated release of RAD51C from the Golgi is essential for maintaining genome stability and is dependent on canonical DDR signaling pathways, including ATM activation and Importin-β-mediated nuclear import.

      Overall Critique - This manuscript offers a novel and compelling perspective on the regulation of the DDR by positioning the Golgi as an active participant in the spatiotemporal control of DNA repair factors. By integrating multiple experimental layers, including a systematic localization screening, a sub-Golgi mapping, several dynamic redistribution assays, and functional perturbation read-outs, the authors built a strong and coherent case for a biologically meaningful Golgi-nucleus communication axis during the DDR. Therefore, the study is timely and highly relevant for the DNA repair field, with broader implications for our understanding of how subcellular organelles coordinate genome maintenance and cellular homeostasis.

      While the manuscript is clearly written and the figures are coherent and supportive of the main findings of the study, several issues should be addressed to ensure full interpretability and reproducibility.

      Major Comments

      1. Limited use of agents causing genotoxic stress - The authors report intriguing lesion-specific shifts in Golgi-nuclear redistribution, yet much of the mechanistic work relies heavily on doxorubicin, a pleiotropic drug that induces diverse forms of DNA damage beyond DSBs. Expanding the core analysis of the study to include a broader panel of mechanistically defined genotoxins (e.g., etoposide, camptothecin, neocarzinostatin, or ionizing radiation) would substantially strengthen the conclusion that the trafficking patterns reflect damage-type specificity rather than drug-specific off-target effects. Such broader analysis would also clarify whether Golgi-nucleus communication responds differentially to replication-associated breaks, Topo II-dependent lesions, oxidative stress, or crosslinks.

      Response: We thank the reviewer for this important point. We would first note that while doxorubicin is indeed pleiotropic, its primary and best-established mechanism of action is the poisoning of Topoisomerase II, leading to DNA double-strand breaks, a mechanism it shares with etoposide (van der Zanden et al, 2021; Thorn et al, 2011). The additional effects of doxorubicin, including reactive oxygen species generation and chromatin remodelling, are well-documented but secondary to this DSB-inducing activity, as we note in the revised manuscript. Nonetheless, the goal of this study was not to comprehensively map lesion-specific trafficking for every DDR protein, but rather to establish the existence of a dynamic Golgi-nucleus redistribution axis and then focus mechanistically on the validated targets, in this case RAD51C. The lesion-dependent redistribution patterns are therefore presented as an initial, hypothesis-generating observation emerging from our screening and characterisation framework. A systematic, lesion-by-lesion dissection of redistribution kinetics across the broader DDR network would represent a substantial additional study and is beyond the scope of the present work.

      Importantly, our key mechanistic observations for RAD51C are not restricted to doxorubicin. We tested a panel of genotoxic agents covering mechanistically distinct lesion classes: camptothecin (CPT; Topoisomerase I-associated replication breaks), etoposide (ETO; Topoisomerase II-dependent DSBs), and mitomycin C (MMC; interstrand crosslinks) (Figures S8A-S8I). Across all DSB-inducing agents, RAD51C consistently redistributed from the Golgi to the nucleus, demonstrating that this response is not a doxorubicin-specific off-target effect. Notably, RAD51C did not redistribute in response to oxidative lesions induced by hydrogen peroxide or potassium bromate, consistent with its established role in homologous recombination and DSB repair rather than oxidative damage pathways, as discussed in the manuscript. This lesion-type selectivity provides additional evidence that the Golgi-nuclear redistribution we observe is a biologically specific response rather than a non-selective stress effect.

      *2. Functional implications of RAD51C redistribution for HR efficiency - Although the study convincingly demonstrates a release of RAD51C from the Golgi and its subsequent nuclear foci formation, it remains unclear how this redistribution influences HR efficiency. Incorporating a functional HR assay (e.g., DR-GFP reporter, RAD51 filament assembly, or fork protection assays) would help determine whether Golgi-anchored RAD51C release is directly required for HR or instead primarily modulates upstream DDR signaling. *

      Response: We thank the reviewer for this important suggestion. We have performed DR-GFP reporter assays to directly assess HR efficiency following Giantin and RAD51C depletion. Depletion of Giantin reduced HR efficiency to approximately 60% of control levels, and RAD51C depletion to approximately 40%, consistent with the HR reduction previously reported in the genome-wide HR screen (Adamson et al, 2012). Co-depletion of Giantin and RAD51C reduced HR to levels comparable to RAD51C depletion alone, suggesting that the effect of Giantin on HR is mediated primarily through RAD51C, consistent with RAD51C being the key effector of the Giantin-dependent spatial regulatory mechanism we describe. These data are included in the revised manuscript (lines 455-465; Figure 5L).
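
      The epistasis reasoning above can be made explicit with a short worked example. The sketch below is illustrative only and is not the analysis performed in the manuscript: it takes the two approximate single-depletion values quoted above and compares what co-depletion would be expected to give under two simple models. The multiplicative "independent pathways" null model and the function name `expected_if_independent` are assumptions introduced here for illustration.

```python
# Illustrative arithmetic only. The two inputs are the approximate values quoted
# in the response; the multiplicative null model is an assumption made for this
# sketch, not the authors' statistical analysis.

def expected_if_independent(single_a: float, single_b: float) -> float:
    """Expected HR efficiency (fraction of control) after co-depletion if the
    two depletions acted through independent pathways (multiplicative model)."""
    return single_a * single_b


def expected_if_epistatic(single_a: float, single_b: float) -> float:
    """Expected HR efficiency if both factors act in the same pathway, so the
    double depletion is no worse than the stronger single depletion."""
    return min(single_a, single_b)


giantin_kd = 0.60  # Giantin depletion: ~60% of control
rad51c_kd = 0.40   # RAD51C depletion: ~40% of control

independent = expected_if_independent(giantin_kd, rad51c_kd)
epistatic = expected_if_epistatic(giantin_kd, rad51c_kd)

print(f"Independent pathways: about {independent:.0%} of control")
print(f"Same pathway (epistasis): about {epistatic:.0%} of control")
```

      Under these assumptions, a co-depletion value close to the RAD51C-only level, rather than the lower value predicted for independent effects, is the outcome one would expect if Giantin acts on HR through RAD51C.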

      *In addition, the manuscript does not fully reconcile how Golgi-tethering of RAD51C fits with its well-established nuclear roles during replication stress, where timely availability of RAD51C is essential for fork stabilization and restart. *

      Response: We agree that the nuclear function of RAD51C during replication stress is well established and important to reconcile with our findings. Our imaging data consistently show a detectable nuclear RAD51C population at steady state across all cell lines examined, and we do not propose that RAD51C is exclusively Golgi-localised. We suggest that the two pools serve distinct functional purposes: the constitutive nuclear pool supports ongoing replication fork stabilisation and restart, processes that require RAD51C availability independently of acute DNA damage, while the Golgi-tethered fraction represents a damage-responsive reserve that is released acutely upon DSB induction in an ATM-dependent manner. We wish to be transparent that this two-pool model is speculative at present: formally distinguishing the contributions of each pool would require direct labelling of the Golgi-anchored fraction, which was not technically feasible in this system, as discussed above. Nonetheless, this model is consistent with established principles of signal-responsive protein sequestration in cell biology, and is directly supported by our Giantin depletion data: premature release of the Golgi pool leads to aberrant nuclear RAD51C foci lacking canonical HR markers and impaired ATM signalling, demonstrating that unscheduled nuclear accumulation is actively detrimental rather than simply redundant. We have added a paragraph to the revised Discussion explicitly framing the two-pool distinction as a working model and identifying direct pool-identity tracking as an important future direction (lines 566-587).

      *3. Specificity of Giantin-related phenotypes - The phenotypes observed upon Giantin depletion (e.g., increased micronuclei, comet tail moments, impaired ATM signaling, and elevated proliferation) could partially reflect a global dysfunction of the Golgi rather than RAD51C-specific tethering defects. Although co-depletion of RAD51C provides partial rescue, additional controls examining Golgi integrity, trafficking competence, or rescue with siRNA-resistant Giantin would help confirm specificity and distinguish direct from indirect effects. *

      Response: We thank the reviewer for raising this important concern, which was a central consideration throughout our investigation. We address it through three complementary lines of evidence.

      First, regarding Golgi structural integrity and trafficking competence: as previously reported, Giantin depletion has not been associated with strong Golgi fragmentation or major morphological alterations (Koreishi et al, 2013; Bergen et al, 2017; Stevenson et al, 2021), and we observed no significant Golgi fragmentation upon Giantin knockdown in our system. Consistent with the literature, Giantin has been implicated in specific cargo trafficking, most notably collagen secretion, rather than general secretory pathway function (Stevenson et al, 2021). To directly confirm that general Golgi trafficking competence was preserved in our experimental system, we performed the VSV-G-YFP trafficking assay (Presley et al, 1997), a well-established functional readout of general secretory trafficking. Giantin depletion did not result in a significant change in trafficking efficiency compared to control siRNA (Rebuttal Figure 1), consistent with the literature and arguing against a general collapse of Golgi function as the basis for the phenotypes observed.

      Rebuttal Figure 1. VSV-G-YFP trafficking assay.

      (A) Representative images of cells treated with control siRNA or giantin siRNA. Nuclei are stained with Hoechst. Total VSV-G-YFP (YFP-tsO45G) signal is shown together with antibody staining against VSV-G in non-permeabilized cells to assess cell surface levels. Scale bars, 10 μm.

      (B) Quantification of VSV-G trafficking from two independent biological replicates.

      Second, the phenotypes are RAD51C-dependent and not a generic Golgi dysfunction: the genomic instability and DDR signalling defects we observe upon Giantin depletion are not phenocopied by GMAP210 depletion, another Golgin family member, indicating that the phenotypes are not a generic consequence of Golgin loss. Critically, we now directly demonstrate using the DR-GFP reporter assay that Giantin depletion reduces HR efficiency to approximately 60% of control, and that co-depletion of RAD51C produces no further reduction beyond RAD51C depletion alone, consistent with RAD51C epistasis over Giantin for HR capacity (Figure 5L). This functional epistasis, together with the physical interaction between Giantin and RAD51C by co-immunoprecipitation, their co-localisation within the same Golgi sub-compartment, and the partial rescue of ATM phosphorylation, micronuclei formation and proliferation phenotypes upon RAD51C co-depletion, provides a coherent mechanistic chain linking Giantin specifically to RAD51C-dependent DDR outcomes. While we cannot formally exclude indirect contributions from other Giantin-associated factors, none of our observations are consistent with the phenotype arising from non-specific Golgi perturbation.

      Third, Giantin may play a broader role in connecting DDR signalling to cytoplasmic and Golgi-resident processes, beyond RAD51C tethering alone: we consider this a feature of the biology rather than a confound. Golgins are well established as multi-cargo scaffolding platforms, and Giantin in particular occupies a strategic position where several processes converge: the tethering of DDR factors, the regulation of damage-induced signalling cascades, and the directional trafficking of repair factors between compartments. This would explain why Giantin depletion produces a phenotype that extends beyond what RAD51C co-depletion alone can fully rescue, and is consistent with the pathway-level coherence we observe across our screen. Understanding the full complement of Giantin-associated DDR interactions represents one of the most compelling directions emerging from this work.

      In response to this comment, we have expanded the Discussion (lines 545-565) to explicitly propose that Giantin functions as a broader organisational node coordinating multiple DDR factors, while our data specifically and consistently implicate RAD51C as a primary conduit.

      *4. Positioning of ATM in the Golgi-nuclear signaling - While ATM inhibition prevents RAD51C release, its spatial and mechanistic basis of this regulation remains obscure. It is not clear whether ATM acts locally at the Golgi, through cytoplasmic pools, or indirectly via nuclear feedback signaling. Clarifying or discussing this point in more depth would improve the mechanistic coherence of the proposed model. *

      Response: We thank the reviewer for raising this important mechanistic question. The spatial basis of ATM action at the Golgi is indeed an emerging and exciting area of cell biology. A growing body of evidence demonstrates that ATM associates with the Golgi membrane through binding to phosphatidylinositol-4-phosphate (PI4P), and that this Golgi-resident pool modulates the magnitude and kinetics of the nuclear DDR (Ovejero et al, 2023). Importantly, the most recent work in this area demonstrates that Golgi-associated ATM is not merely a passive reservoir but is enzymatically active and capable of phosphorylating Golgi-resident substrates (Soulet et al, 2026), providing a compelling mechanistic basis for how damage-induced ATM signalling could reach the Golgi to license RAD51C release.

      To directly examine whether ATM localises to the Golgi in our system and whether its activation state changes upon DNA damage, we performed a biochemical Golgi enrichment assay using the Minute™ Golgi Apparatus Enrichment Kit (Cat #: GO-037) to examine ATM distribution across cis- and trans-Golgi fractions. Fraction purity was validated using GM130 (cis-Golgi), TGN46 (trans-Golgi), and HSP60 (membrane fraction) (Rebuttal Figure 2A). This analysis revealed that ATM is detectable in the total membrane fraction and enriched in the cis-Golgi fraction under basal conditions (Rebuttal Figure 2A). Under normal physiological conditions, activated ATM (pATM) was absent from Golgi-enriched fractions (Rebuttal Figure 2B), but was detectable in the cis-Golgi fraction following doxorubicin-induced genotoxic stress (Rebuttal Figure 2C). While these observations are preliminary and require further validation, they are consistent with the emerging literature and raise the intriguing possibility that ATM is recruited to and activated at the Golgi in a damage-dependent manner, where it could act locally to license RAD51C release.

      Rebuttal Figure 2. Biochemical Golgi fractionation confirms ATM enrichment in cis-Golgi compartments.

      *Western blot of HeLa-K fractions enriched for cis- and trans-Golgi membranes, probing for (A) ATM under basal conditions, (B) pATM under basal conditions, and (C) pATM after treatment with DOX (40 μM). Markers: GM130 for cis-Golgi, TGN46 for trans-Golgi, and HSP60 for the membrane fraction (MEM).*

      We consider the precise spatial and mechanistic dissection of ATM signalling at the Golgi, and its relationship to nuclear feedback, to be one of the most exciting directions to emerge from this work, and one that we hope our study has helped to open. We have expanded the Discussion (lines 525-543) accordingly to place our findings in the context of the emerging Golgi-ATM literature and to frame this as an important unresolved question for future investigation.

      *5. RAD51C is examined in silo, without consideration for the BCDX2 complex - RAD51C is exclusively analyzed in isolation, despite its well-established function as part of the BCDX2 paralog complex (RAD51B-RAD51C-RAD51D-XRCC2). Because RAD51C does not normally operate as a standalone factor, it is unclear why only RAD51C, among all paralogs, would be subjected to Golgi tethering, ATM-dependent release, and Importin-β-driven nuclear import. This raises important mechanistic questions: Are other BCDX2 members also Golgi-associated? Do they undergo similar trafficking dynamics? Does Golgi tethering selectively regulate RAD51C, or does the complex translocate together? Addressing these points would greatly strengthen the biological plausibility and mechanistic coherence of the proposed model. *

      Response: We thank the reviewer for raising this important point. We fully agree that RAD51C functions as a core component of the BCDX2 (RAD51B-RAD51C-RAD51D-XRCC2) and CX3 (RAD51C-XRCC3) paralog complexes, and that its canonical roles in HR and replication fork protection occur within these assemblies. Our decision to focus on RAD51C was driven by the screening data: of the DDR proteins identified, RAD51C displayed the most robust Golgi-associated pool, the clearest damage-induced redistribution dynamics, and a tractable anchoring interaction with Giantin that could be interrogated biochemically.

      We would also note that extending this analysis to other RAD51 paralogs is not straightforward with current tools. The available commercial antibodies against RAD51B, RAD51D and XRCC2 perform poorly in immunofluorescence applications, and most localisation studies for these proteins have relied on overexpression of tagged constructs, a strategy that, as discussed above, risks perturbing both localisation and complex assembly. The lack of reliable antibodies for endogenous paralog detection at the resolution required for Golgi localisation analysis represents a genuine technical barrier that we encountered directly during this study.

      Whether Golgi association and ATM-dependent release involve RAD51C alone or extend to other BCDX2 or CX3 members is therefore a genuinely open and important question. We note that our co-immunoprecipitation experiments were performed on total cell lysate and cannot distinguish whether the Golgi-associated RAD51C is complexed with other paralogs or represents a monomeric subpopulation. Golgins are well established as multi-cargo scaffolding platforms, and it is entirely plausible that Giantin organises a broader paralog module rather than tethering RAD51C as an isolated subunit. A systematic analysis of RAD51 paralogs for Golgi localisation and lesion-dependent trafficking, enabled by improved tools such as proximity labelling or endogenous tagging approaches compatible with essential proteins, would determine whether the BCDX2 complex translocates as a unit or whether individual subunits are differentially regulated, with potentially distinct consequences for HR fidelity. We have revised the manuscript accordingly and identify this as an explicit priority for future work in the revised Discussion (lines 583-602).

      Minor Comments

      1. Pathway-specific sub-Golgi localization patterns - The finding that DDR proteins map to distinct cis/trans Golgi subdomains is an interesting and potentially important observation. However, the dataset is limited to 15 proteins, making the proposed pathway-level trends (e.g., HR factors enriched in cis-Golgi; BER/MMEJ factors enriched in trans-Golgi) preliminary. Strengthening this conclusion by increasing the number of DDR proteins analyzed would help determine whether sub-Golgi compartmentalization contributes meaningfully to DNA repair pathway regulation.

      Response: We thank the reviewer for this constructive suggestion. We agree that extending sub-Golgi mapping to a larger number of DDR proteins would be valuable, and we present the current dataset explicitly as a first, hypothesis-generating map rather than a definitive pathway atlas.

      We would like to highlight, however, that the value of this observation lies not simply in the number of proteins mapped, but in the biological coherence of the patterns that emerge. The finding that proteins from the same repair pathway tend to occupy the same Golgi sub-compartment (BER and MMEJ factors enriching in the trans-Golgi, HR factors in the medial/cis-Golgi), and that this sub-compartmental positioning correlates with the direction of their redistribution upon genotoxic stress, is a pattern that would be unlikely to arise by chance across 15 independently validated proteins. This internal consistency argues that the sub-Golgi organisation reflects genuine pathway-level biology rather than noise, even if the dataset is not yet exhaustive. Together with the bioinformatic network analysis, which independently supports pathway-level clustering across the broader validated hit list, these observations reinforce each other as complementary layers of evidence.
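
      One way a reader could put a number on "unlikely to arise by chance" is a label-permutation test. The following is a minimal sketch under stated assumptions, not an analysis from the manuscript: the pathway and compartment labels are hypothetical placeholders for 15 proteins, and the `purity` statistic and function names are introduced here purely for illustration.

```python
import random
from collections import Counter


def purity(pathways, compartments):
    """Count proteins that sit in the most common compartment of their pathway."""
    by_pathway = {}
    for pathway, compartment in zip(pathways, compartments):
        by_pathway.setdefault(pathway, Counter())[compartment] += 1
    return sum(max(counts.values()) for counts in by_pathway.values())


def permutation_p_value(pathways, compartments, n_perm=10_000, seed=0):
    """Fraction of random compartment shufflings that cluster by pathway at
    least as well as the observed labels (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = purity(pathways, compartments)
    shuffled = list(compartments)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if purity(pathways, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical placeholder labels for 15 proteins; NOT data from the manuscript.
toy_pathways = ["HR"] * 5 + ["BER"] * 5 + ["MMEJ"] * 5
toy_compartments = ["cis"] * 5 + ["trans"] * 10

p_value = permutation_p_value(toy_pathways, toy_compartments)
print(f"Permutation p-value for the toy labels: {p_value:.4f}")
```

      With real per-protein assignments substituted for the placeholder lists, the same function would give an empirical estimate of how often pathway-level clustering of this strength appears when compartment labels are assigned at random.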

      2. Is the Golgi-released RAD51C indeed the pool that enters the nucleus? The major assumption of the study is that the RAD51C population released from the Golgi upon DNA damage is the same pool that subsequently accumulates in the nucleus to form repair foci. While the imaging and fractionation data are consistent with this model, the study does not directly track or distinguish Golgi-derived RAD51C from cytoplasmic or pre-existing nuclear pools. Without a method to specifically label, pulse-chase, or track the Golgi-anchored fraction, it remains formally possible that nuclear RAD51C originates from other subcellular reservoirs.

      Response: We thank the reviewer for highlighting this important mechanistic point, which we agree cannot be fully resolved with the current dataset. Several independent lines of evidence are nonetheless consistent with a model in which the Golgi-associated pool contributes directly to damage-induced nuclear accumulation.

      • Our time-resolved imaging demonstrates a reciprocal decrease at the Golgi and a concurrent increase in the nucleus following genotoxic stress, consistent with redistribution rather than independent compartment-specific changes (Figures 3E-3I).
      • Biochemical fractionation provides an orthogonal readout of the same reciprocal shift under identical conditions (Figures 3J and S6D).
      • ATM inhibition simultaneously prevents Golgi loss and blunts nuclear accumulation, while Importin-β perturbation blocks nuclear entry, together supporting an active and regulated translocation route (Figures 3K-3P).
      • Giantin depletion, which releases the Golgi-tethered RAD51C pool prematurely, leads to aberrant nuclear RAD51C foci lacking canonical HR markers and impaired ATM signalling, strongly supporting that the Golgi-tethered fraction has functional consequences in the nucleus consistent with it being the relevant pool (Figures 4B-4E and 4J-4M).
      • In the revised manuscript we have included cytoplasmic RAD51C signal quantification across the doxorubicin time course (Figure 3H). The cytoplasmic signal shows only a moderate and gradual reduction that is kinetically distinct from the sharp Golgi decrease and does not precede the nuclear increase. This pattern is inconsistent with a large pre-existing cytoplasmic reservoir driving the nuclear accumulation; if the cytoplasmic pool were the primary source, one would expect a rapid and prominent cytoplasmic decrease coinciding with or preceding nuclear accumulation, which we do not observe. Instead, the data are more consistent with rapid transit of Golgi-released RAD51C through the cytoplasm rather than stable cytoplasmic accumulation prior to nuclear entry. We acknowledge that definitive pool-identity tracking would require spatially restricted labelling approaches such as Giantin-proximal TurboID or photoactivatable tagging strategies, which are precluded by the technical constraints on RAD51C tagging described above. We have revised the manuscript to avoid overstatement on this point and identify these approaches as important future directions (lines 297-305 & lines 715-719).

      Reviewer #2 (Significance (Required)):

      General assessment - This study presents a novel and conceptually compelling view of the DNA damage response (DDR) by positioning the Golgi apparatus as an active regulator of the spatiotemporal availability of DNA repair factors. The strongest aspects of the work include its integration of a systematic immune-localization screening, a sub-Golgi compartment mapping, dynamic redistribution assays, and functional perturbations to build a coherent model of Golgi-nucleus communication during genotoxic stress. The mechanistic focus on RAD51C provides a clear case study linking organelle-level regulation to genome stability.

      • Advance - To my knowledge, this is the first comprehensive demonstration that the Golgi can serve as a spatiotemporal coordination node for DDR proteins, including those involved in HR. The identification of a substantial pool of RAD51C, and reportedly other DDR factors, anchored within specific Golgi subdomains represents a significant conceptual advance. The demonstration that Golgi-tethered RAD51C is released in an ATM-dependent manner and subsequently participates in nuclear foci formation suggests a previously unrecognized organelle-level regulatory checkpoint in genome maintenance. This work therefore extends current models of the DDR by revealing a layer of intracellular coordination that bridges classical nuclear pathways with cytoplasmic organelle function.*

      • Audience - This study will be of strong interest to a specialized audience in the fields of DNA repair, genome stability, and cell biology, particularly those studying the spatial organization of repair pathways and intracellular stress signaling. It will also appeal to researchers investigating organelle biology, intracellular trafficking, and the broader coordination of cytoplasmic and nuclear responses to stress. Beyond these communities, the work may be relevant to cancer, as it suggests new mechanisms by which organelle perturbations or Golgi-associated scaffolding proteins could influence therapeutic responses or genomic instability.

      Reviewer expertise - Field of expertise: DNA repair, genome stability, organelle biology, cancer cell biology.*

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *This study investigates the communication between the Golgi complex and the nucleus of the cell, which remains a largely unexplored field. The authors used publicly available siRNA and antibody data from the Human Protein Atlas as a basis for finding overlap between the proteomes of the two cellular compartments. In validating the data from the HPA, the study finds a novel cluster of DNA repair proteins present in the Golgi, which they validate and resolve to sub-compartmental localization. To do so they use immunofluorescence (IF) localization on cis- and trans-Golgi cisternae marked by GM130 and TGN46, respectively. The authors find that many of the fully validated proteins present in both the nucleus and Golgi redistribute between the Golgi and the nucleus dependent on the protein and the type of DNA lesion. They focused on RAD51C, a recombination factor. They show that RAD51C resides in both the cis- and trans- subsections prior to damage and responds to DNA damage in an ATM-dependent manner via release of a Golgi-based pool bound to Giantin, which is then imported into the nucleus via Importin-β. Knockdown experiments showed that Giantin regulates RAD51C spatially and temporally. The work reveals a dynamic interchange of proteins between the Golgi and nucleus that controls cell functions beyond the classic secretory, membrane trafficking, and PTM roles of the Golgi. The authors build on prior work on Golgi impacts on DDR, offering an alternative cellular compartment for storage of DDR factors prior to damage. Overall, the data is timely and relevant, as it finds new roles for the Golgi in DNA damage response (DDR) regulation. The data is largely convincing and well controlled. The IF data is presented in black and white single channels and merged in color, which allows good comparison of the different protein stains.
      The scope of the initial screen of HPA antibodies and Golgi/Nuclear dual proteomes is impressive, and the overlap of DDR proteins is characterized for fifteen different proteins at a sub-compartmental level. The focus on RAD51C as a member of the HR pathway was a strong choice, and the study presents interesting information on its regulation by Golgi complex members, as well as a feedback loop with pATM. The possibility of the Golgi storing specific DDR factors in specific compartments is well-supported and intriguing. There are a few major and minor points that should strengthen the paper and improve clarity prior to publication. *

      Major Comments:

      *1. Much of the strength of the IF data is lost in the choice of scale for presentation of the data. In almost all cases, enlarged sections should be shown of the areas currently indicated by arrow, in all channels. This is done well in Figure 3A, where an area of the Golgi is enlarged and the overlap of RAD51C in the GM130-marked Golgi is clearly visible in the merged channel, even when printed out. I would highly recommend including the white box and enlarged in all images and channels, while keeping the representative fields as is (e.g. if the image is 40mm, draw a 7mm box around representative cells/Golgi, and enlarge to 15mm in the bottom left). This change should be made to F1E, F2F, F3E, F3J, and F3M, as well as having enlarged figures in the corners in all supplementary data IF figures. Where possible, a fully enlarged image of the bounding box could also be included. Some of the IF data would be strengthened by using the nuclei stain to draw a masking outline to include in the black and white channels, to clearly delaminate what is Golgi-localized and what is nuclear. *

      Response: We thank the reviewer for this helpful suggestion and fully agree that enlarged insets substantially improve the visibility of Golgi-localised signal, particularly when figures are printed. We share the reviewer's view that alternative display formats with larger insets would be preferable, and we have implemented enlarged boxed regions wherever space constraints permitted.

      Specifically, we have added boxed regions with enlarged insets to Figure 1E and to all panels of Figure 3. For Figure 2, the number of conditions and proteins displayed simultaneously within the constraints of standard journal figure dimensions made it impractical to include enlarged insets for all panels without reducing the overall field size to the point of losing contextual information. We have nonetheless improved the visibility of the Golgi signal in Figure 2 as much as possible within these constraints, and note that the final figure layout will be further optimised in line with the journal's specific formatting guidelines. In addition, all figures have been provided as high-resolution image files to allow electronic magnification, enabling readers to inspect the Golgi-localised signal in detail beyond what is visible in the printed version.

      Regarding the use of nuclear outline masks in single-channel images, we tested this approach but found that given the number of structures present within each field, including Golgi stacks, nuclear foci, and cytoplasmic signal, overlaying nuclear outlines on individual channels added visual complexity that made the images harder rather than easier to interpret. As an alternative, we have included a full-colour merged panel, when possible, which we consider a cleaner way to delineate nuclear versus Golgi-localised signal and allows the reader to directly compare compartment-specific distributions across channels.

      *2. There is a lack of consistency in the representative images shown by IF. For example, Figure 1 gives the impression of very little RAD51C in the nucleus but this is rightly shown to not be the case in Supp. Fig 2A. The same is true of the various images of LIG1. The authors should use representative data that better reflects the distribution of the proteins being studied and maintain consistency across images. If there is a lot of variation in staining patterns, the authors should show images and percentages corresponding to the variations especially for the key gene studied, RAD51C.*

      Response: We agree and have replaced the representative IF panels for RAD51C and LIG1 with images that better reflect the quantified distributions across biological replicates. The revised panels were selected to match the quantified compartment intensities shown in the accompanying graphs rather than representing outlier cells. We would also note that the apparent discrepancy between Figure 1E and Supplementary Figure S2A partly reflects a difference in imaging conditions: Supplementary Figure S2A and Figure 2F were acquired directly from the high-content screening pipeline under uniform, non-optimised antibody and fixation conditions at widefield resolution, whereas Figure 1E shows representative single optical section confocal images acquired after candidate identification with antibody conditions optimised for each individual protein. The improved signal-to-noise in the optimised confocal images more faithfully captures the dual Golgi and nuclear localisation of RAD51C, and the apparent difference between the two image sets is therefore expected rather than inconsistent. We have updated the figure legends to clarify the imaging modality and conditions for each panel. Furthermore, the quantified distribution of RAD51C across Golgi, nuclear and cytoplasmic compartments across multiple cell lines is shown in Figure 3B and 3D, providing a population-level representation of the dual localisation that complements the representative images shown in Figure 1E.

      *3. The initial screening by siRNA-mediated knockdown pipeline that validated and confirmed dual Golgi and nuclear localization of 163 of the 329 dual-localization HPA proteins does not have any data included. This seems like a very large amount of data to gloss over and not include even as supplementary data. This should be included as source data, and discussion of the in-text information should be strengthened. The data included with the networking of these validated proteins is strong, but the process of elimination and validation has not been shown. In addition, the antibody information included in the supplementary data does not include dilution factors or blocking factors, which would be beneficial to include for future studies.*

      Response: We agree and have addressed this in full. We note that the HPA antibody validation data, including immunofluorescence images and siRNA knockdown results, are publicly available for inspection on the Human Protein Atlas website (www.proteinatlas.org) for the majority of candidates, providing an independent layer of verification. In the revised submission, we additionally provide the complete siRNA-mediated validation dataset generated in our laboratory as source data (Table S1; lines 1025-1041), including for each candidate the HPA antibody identifier, gene symbol, Ensembl ID, antibody staining pattern, siRNA identifier, cell number per replicate, and normalised Golgi and nuclear signal ratios for both experimental replicates. This allows readers to inspect the validation metrics directly and apply alternative thresholds if desired. We have also expanded the antibody information to include diluent conditions (4% FBS in 0.1% Triton-X100 for all HPA antibodies used at 2 μg/ml in the screening pipeline), enabling reproducibility and reuse of the dataset by the community.

      *4. The authors should expand upon the paragraph lines 155-162 to include more discussion on Figure S2A and S2B. The expanse of this data is some of the strongest in the paper, and it should be further discussed in-text. Also, the rationale behind the choice in the specific proteins that are included in these analyses/figures is not always clear in-text, and more attention should be spent on the narrowing down of the analysis to the final proteins. This is also especially important as many of the DDR proteins chosen are not the most common DDR proteins. Also note in text that the Golgi marker GM130 (presumably) was used for the screening, which means that some proteins which are only localizing to the TGN46 trans Golgi might have been lost in the validation step (or, explain why this is not the case).*

      Response: We expanded the Results text (lines 141-163) to discuss Figures S2A and S2B in more depth and clarified the rationale for selecting the final set of DDR proteins taken forward, including considerations of pathway representation, bioinformatic annotations, and literature-described roles in DNA repair. We would also note that the identity of the DDR proteins identified in this screen was determined by the HPA dataset and the unbiased validation pipeline rather than by prior assumptions about which repair factors would be present at the Golgi. The presence of less commonly studied DDR factors is therefore a direct reflection of the screen output, and we consider this one of the strengths of the approach.

      We would also like to address the reviewer's concern about potential GM130-based bias directly: at the widefield or confocal resolution used in the high-content screening pipeline, the Golgi apparatus appears as a single perinuclear structure and cis- and trans-Golgi subdomains cannot be resolved. GM130 was therefore used purely as a segmentation marker to define the Golgi compartment as a whole rather than to selectively label the cis-Golgi cisternae. The resulting Golgi mask captures signals from the entire Golgi ribbon, including trans-Golgi regions, meaning that proteins with exclusively trans-Golgi localisation would not have been systematically excluded at the screening stage. Sub-compartmental resolution of cis versus trans localisation was only possible in subsequent analyses using nocodazole-dispersed mini-stacks imaged by confocal microscopy with co-staining for both GM130 and TGN46.

      *5. The relationship between Giantin loss, increased cell proliferation, and elevated endogenous DNA damage as it relates to RAD51C remains insufficiently resolved and requires further clarification. Several of the proliferation assays used are not optimal for addressing changes in cell growth. For example, Figure 5O appears to quantify cell numbers by counting fields from IF images, which is an unconventional approach. This should be done by growth curves, luminescent viability or colony formation assays. In addition, this point will be greatly strengthened by performing rescue experiments for Giantin directly (instead of co-depletion as a means of rescue) and/or using a mutant of RAD51C that does not bind to Giantin. If these additional experiments are beyond the current scope, the conclusions should be softened in the discussion. *

      Response: We thank the reviewer for raising these important points, which we address in turn:

      Giantin-RAD51C relationship and mechanistic interpretation. We acknowledge that establishing the full causal chain between Giantin loss, RAD51C mislocalisation, elevated endogenous DNA damage and increased cell proliferation is challenging within the scope of a single study, and we discuss this openly in the Discussion (lines 555-564). Our evidence collectively includes: physical interaction between endogenous Giantin and RAD51C by co-immunoprecipitation (Figures 4H and 4I), premature nuclear accumulation of RAD51C upon Giantin depletion (Figures 4B-4E and 4J-4M), a new experiment showing direct reduction of HR efficiency in the DR-GFP assay (Figure 5L), impaired ATM signalling (Figures 5J and 5M), elevated genomic instability (Figures 5A-5E), and epistatic rescue by RAD51C co-depletion (Figures 5M-5P). These observations are further contextualised by the established literature on RAD51C function: RAD51C is known to regulate CHK2 phosphorylation and cell cycle checkpoint signalling (Badie et al, 2009), stabilise replication forks (Somyajit et al, 2015), and promote RAD51 filament formation required for DSB repair (Prakash et al, 2015). Dysregulation of these functions through Giantin-dependent mislocalisation provides a mechanistically coherent explanation for the elevated genomic instability and altered proliferation we observe, and is entirely consistent with our model. Together, the experimental evidence and the published biology of RAD51C support a model in which Giantin spatially regulates RAD51C to maintain proper DDR signalling and HR capacity.

      We agree that separation-of-function tools would further strengthen this model and identify these as important future priorities. We wish to note, however, that both approaches face substantial technical barriers in this system. As described in our response to Reviewer 1 Major Comment 1, RAD51C tagging, whether by CRISPR-mediated endogenous editing or ectopic expression, consistently compromised cell viability and protein function, precluding the generation of interaction-deficient variants at physiological expression levels. Engineering an interaction-deficient Giantin mutant presents an independent and considerable challenge: Giantin is one of the largest Golgi matrix proteins (~376 kDa), composed almost entirely of extended coiled-coil domains that are intrinsically difficult to model structurally, and identifying a discrete interaction interface with RAD51C without disrupting the broader scaffolding function of the protein would require a dedicated structural and biochemical programme. We therefore consider these important but substantial future directions rather than straightforward experimental additions to the current study.

      Proliferation assays. Colony formation assays provide a rigorous readout of long-term proliferative capacity, and these data are presented for single knockdown conditions in Figures 5F-5I. The cell number quantification in Figure 5P was specifically included to assess the double knockdown of Giantin and RAD51C simultaneously, a condition not covered by the colony formation assay. We respectfully note that automated fluorescence microscopy-based nuclear counting is a well-established approach for measuring cell proliferation in siRNA screening contexts. Nuclear counting from high-content imaging has been used as a direct readout of cell growth and proliferation in RNAi screens (Boutros et al, 2004; Martin et al, 2014; Garvey et al, 2016; Mikheeva et al, 2024), and has been shown to produce results comparable to or superior to conventional viability assays including MTT and flow cytometry-based methods (Mikheeva et al, 2024). We have nonetheless clarified in the revised figure legend that Figure 5P reports relative cell number quantified by automated nuclear counting from high-content imaging fields as a secondary concordant measure alongside the colony formation data, rather than a standalone proliferation assay.

      *6. It is unclear from the discussion and from presented data whether proteins are directly transported between the Golgi and the nucleus, or whether they go into the cytoplasm for a transient period, presumably when they could interact with Importin β. There is also some data where cytoplasm signal could be quantified to address this (Figure 3E-I). *

      Response: We thank the reviewer for this mechanistic point. In the revised manuscript we have included cytoplasmic RAD51C signal quantification alongside Golgi and nuclear measurements for the doxorubicin time course (lines 297-305; Figure 3H). The cytoplasmic signal shows a moderate and gradual reduction distinct in both magnitude and kinetics from the sharp Golgi decrease, consistent with a transient cytoplasmic intermediate rather than a stable pool. Regarding the identity of the translocating pool, two observations directly support a Golgi origin. First, Importazole treatment prevents RAD51C release from the Golgi following genotoxic stress and simultaneously reduces nuclear RAD51C foci formation, demonstrating that Importin-β-mediated import is required both for Golgi clearance and for productive nuclear accumulation. Second, Giantin depletion, which prematurely releases the Golgi-tethered pool, leads to aberrant nuclear RAD51C foci, directly linking the Golgi-anchored fraction to nuclear accumulation. Together these data support a model in which Golgi-resident RAD51C transits through the cytoplasm for Importin-β-mediated nuclear import. We acknowledge that without direct labelling of the Golgi-anchored fraction, the precise contribution of each subcellular pool to the nuclear accumulation cannot be fully resolved with the current dataset. We discuss the development of appropriate tagging strategies as an important future direction to dissect the dynamics of this process in further detail.

      *7. Statistical analysis on experiments with more than two samples needs to be performed with ANOVA and a follow-up post-hoc test, not with two-tailed unpaired Student's t-test, which only compares the control and each individual sample. This type of analysis inflates the Type 1 error rates (false positives) in your datasets. For example, the two-tailed unpaired Student's t-test is appropriate in Figure 2F-H, but not in Figure 3 when the samples are timepoints. In this case, use a one-way ANOVA with Tukey's post-hoc test (if you want to show all comparisons), or Bonferroni/Sidak (if you only need to compare several samples). *

      Response: We agree with the reviewer and thank them for highlighting this important statistical issue. We have revised the statistical analysis for all experiments involving more than two groups to avoid inflation of Type I error rates caused by multiple pairwise Student's t-tests. Specifically, for Figures 3F-I, 4C-E, and Figure 5, the data were reanalysed using one-way ANOVA followed by the appropriate multiple comparisons post hoc test. The Methods section and corresponding figure legends have been updated to clearly state the statistical tests used for each dataset.
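
      To illustrate the reviewer's point about Type I error inflation, the short sketch below computes the family-wise error rate for several uncorrected tests at the same significance level, together with the Bonferroni and Sidak per-test thresholds. It is not part of the authors' analysis: the function names are hypothetical, and the formula assumes independent tests, which is only an approximation when every comparison shares the same control group.

```python
# Illustrative sketch only; function names are hypothetical and the tests
# are assumed to be independent (an approximation for shared-control designs).

def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n uncorrected tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


def sidak_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold giving exactly alpha family-wise, if tests are independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


if __name__ == "__main__":
    alpha = 0.05
    for n_tests in (1, 3, 5, 10):
        print(
            f"{n_tests:2d} tests: "
            f"uncorrected FWER = {familywise_error_rate(alpha, n_tests):.3f}, "
            f"Bonferroni = {bonferroni_alpha(alpha, n_tests):.4f}, "
            f"Sidak = {sidak_alpha(alpha, n_tests):.4f}"
        )
```

      With five timepoints each compared to the control at a threshold of 0.05, the uncorrected family-wise error rate under this assumption is roughly 0.23, which is why a one-way ANOVA with a multiple-comparisons post hoc test is preferred.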

      Minor Comments:

      General

      1. Throughout the text, the reference to many figures and supplementary figures in the same sentence, with little discussion of the data therein makes it hard to follow. In-text referencing is particularly confusing in the section "Dual-localising DDR proteins dynamically redistribute between the Golgi and nucleus in response to specific types of DNA injuries," where the reader is switching between multiple figures and supplementary figures.

      Response: We thank the reviewer for this helpful comment. In the revised manuscript, we have improved the readability of the text and revised the figure references to make them clearer. We hope these revisions make the manuscript easier to follow and allow readers to better inspect the figures.

      1. In figures that display technical replicates as individual data points, consider distinguishing each replicate by using different marker shapes (e.g., repeat 1 = upright triangle; repeat 2 = inverted triangle; repeat 3 = diamond). This would provide additional clarity regarding the consistency and repeatability of each technical repeat.

      Response: We thank the reviewer for this suggestion. We have updated the data presentation to distinguish biological replicates using different marker shapes in datasets where replicate tracking is of particular relevance to the interpretation. For datasets where individual replicate values are already clearly separable, we have maintained the existing presentation to avoid unnecessary visual complexity.

      1. Make sure all western blot data includes the marker size (F3C and F5L has none, F4H/I have size of proteins not size of markers).

      Response: We added missing marker sizes to our western blot data in the revised manuscript.

      1. Be consistent with use of capitalization in figure legends and graph/figure labels.

      Response: We made sure that capitalisation is consistent across figure legends and graph/figure labels in the revised manuscript.

      Figure 2

      In Figure 2A, please include in the figure itself that GM130 is the cis Golgi, and TGN46 is the trans Golgi (Figures should not be dependent on the text for full understanding).

      Response: We revised Figure 2A and 2C to label GM130 as cis-Golgi and TGN46 as trans-Golgi within the figure, making it self-explanatory.

      1. Why are LRIG2 and LRRIQ3 not included in the 2E cis vs trans Golgi data, when all other proteins from F1D are included? Include, or comment on in-text.

      Response: Both LRIG2 and LRRIQ3 are included in 2E in both the original and revised manuscript.

      1. Be sure to include scale bar data in each figure legend (F2A-E is currently missing it), and include updated scales included in the enlarged data.

      Response: Scale bar data is now included in each figure legend in the revised manuscript.

      1. In Figure 2F, make sure that the merged green channel is presented at the same intensity as it is in the single black and white channel, as the green looks very overexposed in several of the merged (CCAR1 DMSO merged is the most noticeable).

      Response: We agree and thank you for pointing this out. We have now revised the images and corrected the issue by updating all image panels in the figure.

      1. In Figure 2G, include the grey label in the figure legend.

      Response: We thank the reviewer for this comment. The grey label has now been included in the figure legend in the revised manuscript.

      1. In Figure 2G-H, the method of data presentation in the graphs coupled with the statistical analysis is confusing and should be expanded upon in the legend.

      Response: We agree that the amount of data presented may appear overwhelming. In the revised figure, we have adjusted the placement of the statistical annotations to improve clarity. We have also improved the figure legend to make the figure easier to read and interpret.

      Figure 3

      Figure E/F/G: Is there cytoplasmic quantification as well? Your rationale is that the Golgi RAD51C goes into the nucleus, but via the cytoplasm (due to Importin β import); do you see the cytoplasmic levels increase? Or is it too dilute to notice a difference? At least, this omission needs to be mentioned in-text.

      Figure H/I also include the quantification of the cytoplasmic fraction. It is mentioned in-text on line 272, but not quantified. This comes up as a big question: Do the proteins go directly between the Golgi and nucleus, or do they go through the cytoplasm?

      Response: We thank the reviewer for both of these related points. As described in our response to Major Comment 6 above, we have added cytoplasmic RAD51C signal quantification to the doxorubicin time course in the revised manuscript (Figure 3H) and discuss the implications for the proposed translocation route.

      Figure 3A, 3E, and if the data is present for 3J and 3M, could all benefit from using the nuclei staining as a mask to draw an outline around the nucleus in the other channels, and then show a merge in full color instead of a nuclei-only channel. Also note from the major comments, that this data especially is so small to see without enlarged images.

      Response: We thank the reviewer for this suggestion. Regarding nuclear outline masks, we tested this approach but found that the number of structures present in each field, including Golgi stacks, nuclear foci and cytoplasmic signal, made overlaid outlines visually confusing rather than clarifying. We have instead included a full-colour merged panel in Figure 3E, which we consider a cleaner way to distinguish nuclear from Golgi-localised signal while preserving the spatial context of the data.

      Regarding image size, we have added enlarged insets to Figures 3E, 3J and 3M in the revised manuscript. We have chosen to display multiple cells per panel rather than a single enlarged cell in order to capture the heterogeneity of the cell population, which we consider important for an accurate representation of the data. All figures have been provided as high-resolution image files to allow electronic magnification, enabling detailed inspection of the signal beyond what is visible in the printed version. We acknowledge that the constraints of standard journal figure dimensions limit how large individual panels can be, and the final layout will be optimised in line with the journal's formatting guidelines.

      *In-text discussion of the results from Figure 3 has an in-depth discussion of the NLS and NES in RAD51C, but this is not followed up on with site-directed mutagenesis or any data; perhaps move this to the discussion instead of results section. *

      Response: We have removed the discussion of the NLS and NES from the Results section.

      Figure 4

      Comments from earlier figures hold, with size of enlarged events and using the nuclei as an outline in the single channels. E.g. Figure 4F arrows appear to point to nothing at the chosen scale. The zoom in 4G is insufficient, as the chosen feature is so small it is not even visible in full fields.

      Response: We thank the reviewer for this comment. The arrows in Figure 4F indicate individual nocodazole-dispersed Golgi mini-stacks, which are displayed at higher magnification in Figure 4G. The full field in Figure 4F is intentionally shown to illustrate the degree of Golgi dispersion achieved by nocodazole treatment, a context that may be unfamiliar to readers outside the Golgi field, before zooming into a single representative mini-stack in Figure 4G for the cisternal localisation analysis.

      *Figure 4H and 4I need to show the size of the markers. *

      Response: The marker sizes are now included in the revised manuscript.

      *The representative image in 4L for siGiantin pATM has no pATM foci, while the quantification in 4M has a reduction from ~50% to ~25%, so this image is not representative of this data, or the data quantification is not as strong as the actual data. *

      Response: We thank the reviewer for this observation. We wish to clarify that the quantification in Figure 4M reports the mean percentage of RAD51C foci co-localising with pATM across the entire cell population from three independent biological replicates. A reduction from ~50% to ~25% therefore reflects a population-level shift in co-localisation frequency, not that every individual cell shows exactly 25% co-localisation. Given the inherent cell-to-cell variability in foci number and co-localisation, individual cells will span a range of values around this mean, and the representative image shown in Figure 4L reflects one such cell.
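
      The distinction between a population mean and a single cell can be made concrete with a small sketch. The per-cell percentages below are invented purely for illustration and are not taken from Figure 4M; the helper name is likewise hypothetical. The point is only that a population mean of 25% is compatible with individual cells showing no co-localisation at all.

```python
# Illustrative sketch with made-up numbers; these are NOT the authors' data.
from statistics import mean

# Hypothetical percentage of RAD51C foci co-localising with pATM, one value per cell.
per_cell_colocalisation = [0, 10, 20, 25, 30, 40, 50]


def summarise(values):
    """Return the population mean together with the per-cell range."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


if __name__ == "__main__":
    summary = summarise(per_cell_colocalisation)
    print(f"population mean: {summary['mean']}%")
    print(f"per-cell range:  {summary['min']}% to {summary['max']}%")
```

      In this toy population the mean is 25%, yet one cell has no co-localising foci and another has 50%, so a single representative image cannot by itself convey the population-level value.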

      Figure 5

      *Figure 5A has overexposure of the nuclei stain in order to visualize micronuclei. Readjust the levels, and enlarge the images for better visualization. (is this DAPI-stained? Please label). *

      Response: The display levels of the nuclear stain in Figure 5A are intentionally set to allow visualisation of micronuclei, which are significantly dimmer than the main nucleus and would not be detectable at display settings optimised for the primary nuclear signal. This is standard practice in micronuclei quantification studies and is necessary to accurately identify and score these structures. The nuclear stain is Hoechst 33342, and this has been explicitly labelled in the revised figure legend.

      *Figure 5A-C: Figure 5A does not show siRAD51, but it is included in the DMSO only graph. Please either show RAD51 data in 5A and 5C, or do not include in 5B. If the DMSO and ETO experiments were performed separately and that accounts for this discrepancy, then show separately. *

      Response: We thank the reviewer for this observation. The siRAD51C condition is included in Figure 5B as an internal positive control, consistent with its well-established role in genome stability. RAD51C depletion combined with etoposide treatment resulted in severe cellular toxicity and insufficient cell numbers for reliable quantification, and this condition was therefore excluded from Figure 5C. This has been clarified in the revised figure legend.

      *Figure 5M the white label is difficult to see in the green box. *

      Response: We have updated the label colour in Figure 5M to improve visibility against the green background in the revised manuscript.

      *Supplementary Figures*

      Consider reordering/ subdividing supplementary figures for ease of reference during reading.

      Response: We thank the reviewer for this suggestion. The current supplementary figure structure was intentionally designed to minimise the total number of supplementary figures and maintain a logical correspondence with the main figures, avoiding a situation where readers need to navigate an extensive supplementary section, a concern the reviewer raised regarding figure presentation. We believe the current organisation achieves a reasonable balance between completeness and accessibility.

      SF1 and SF2A: Include enlarged boxes or full images so that data is visible.

      Response: As described in our response to Major Comment 1, all figures have been provided as high-resolution image files to allow electronic magnification. Space constraints within standard journal figure dimensions preclude the addition of enlarged insets to all supplementary panels without substantially reducing the contextual field of view.

      *SF3A, SF4A, and SF5A: Include enlarged images, include nuclei marker if possible (otherwise, the nuclear intensity is not proven nuclear). *

      Response: We appreciate the suggestion, but adding enlarged insets and nuclei markers to all panels in Figures S3A, S4A and S5A would disproportionately increase the length and complexity of the supplementary section, making it harder rather than easier to navigate. The nuclear intensity measurements are derived from automated segmentation of the Hoechst channel using CellProfiler, which reliably defines nuclear boundaries independently of the antibody channel, and are therefore not dependent on visual confirmation of nuclear localisation in each representative image.

      *SF3B-C, SF4B-C, and SF5 B-D: Change the data presentation in the same method as changed for F2G-H. *

      Response: We have updated the figure legends for Figures S3B-C, S4B-C and S5B-D to improve readability.

      SF3D: List proteins in the same order as in B and C.

      Response: The proteins in Figure S3D are listed in the same order as in Figures S3B and S3C.

      SF6D: Label M N and C more clearly. Include size labels.

      Response: We have added clearer labels for the membrane (M), nuclear (N) and cytoplasmic (C) fractions and included molecular weight size markers in the revised Figure S6D.

      *SF7A-B: Include enlarged. *

      Response: We respectfully note that the purpose of Figures S7A-B is to display the overall cellular response to inhibitor treatments across the cell population, rather than to highlight specific subcellular structures. Enlarged insets would reduce the number of cells visible per panel and would not add scientific value in this context. The Golgi and nuclear signals are clearly visible at the chosen magnification.

      *SF8: Include arrows as in previous experiments, include enlarge. *

      Response: Arrows have been added to Figure S8 to indicate Golgi and nuclear RAD51C signal, consistent with the annotation style used in the main figures. The images already show two representative cells per condition to maximise the visible detail at the chosen scale.

      *SF9G: G is labelled, but not included. *

      Response: Figure S9G has been added in the revised manuscript, showing the pan-cancer overall survival map for GOLGB1 expression across all TCGA cohorts generated using GEPIA2. The figure legend has been updated accordingly.

      *Reviewer #3 (Significance (Required)): *

      * The work finds new roles for the Golgi in regulation of DNA damage responses and the screen could be an important dataset (but results need to be made available) for the DNA repair community. The scope of the initial screen of HPA antibodies and Golgi/Nuclear dual proteomes is impressive, and the overlap of DDR proteins is characterized for fifteen different proteins at a sub-compartmental level. The work provides important insights into RAD51C regulation, however, there are key mechanistic insights and control experiments missing from the studies involving RAD51C and Giantin, dampening its impact. The idea of an alternative cellular compartment for storage of DDR factors prior to damage is interesting, and suggests the spatial regulation of specific lesion responses are stored in specific sub-compartments of the Golgi, which could contribute to repair regulation.*

      References:

      Adamson B, Smogorzewska A, Sigoillot FD, King RW & Elledge SJ (2012) A genome-wide homologous recombination screen identifies the RNA-binding protein RBMX as a component of the DNA-damage response. Nat Cell Biol 14: 318-328

      Badie S, Liao C, Thanasoula M, Barber P, Hill MA & Tarsounas M (2009) RAD51C facilitates checkpoint signaling by promoting CHK2 phosphorylation. J Cell Biol 185: 587-600

      Bergen DJM, Stevenson NL, Skinner REH, Stephens DJ & Hammond CL (2017) The Golgi matrix protein giantin is required for normal cilia function in zebrafish. Biol Open 6: 1180-1189

      Berti M, Teloni F, Mijic S, Ursich S, Fuchs J, Palumbieri MD, Krietsch J, Schmid JA, Garcin EB, Gon S, et al (2020) Sequential role of RAD51 paralog complexes in replication fork remodeling and restart. Nat Commun 11: 3531

      Boutros M, Kiger AA, Armknecht S, Kerr K, Hild M, Koch B, Haas SA, Paro R, Perrimon N & Heidelberg Fly Array Consortium (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832-835

      Brinkmann K, Schell M, Hoppe T & Kashkar H (2015) Regulation of the DNA damage response by ubiquitin conjugation. Front Genet 6: 98

      Garvey CM, Spiller E, Lindsay D, Chiang C-T, Choi NC, Agus DB, Mallick P, Foo J & Mumenthaler SM (2016) A high-content image-based method for quantitatively studying context-dependent cell population dynamics. Sci Rep 6: 29752

      Ghannoum S, Fantini D, Zahoor M, Reiterer V, Phuyal S, Leoncio Netto W, Sørensen Ø, Iyer A, Sengupta D, Prasmickaite L, et al (2023) A combined experimental-computational approach uncovers a role for the Golgi matrix protein Giantin in breast cancer progression. PLoS Comput Biol 19: e1010995

      Greenhough LA, Liang C-C, Belan O, Kunzelmann S, Maslen S, Rodrigo-Brenni MC, Anand R, Skehel M, Boulton SJ & West SC (2023) Structure and function of the RAD51B-RAD51C-RAD51D-XRCC2 tumour suppressor. Nature 619: 650-657

      Halim VA, García-Santisteban I, Warmerdam DO, van den Broek B, Heck AJR, Mohammed S & Medema RH (2018) Doxorubicin-induced DNA damage causes extensive ubiquitination of ribosomal proteins associated with a decrease in protein translation. Mol Cell Proteomics 17: 2297-2308

      Koreishi M, Gniadek TJ, Yu S, Masuda J, Honjo Y & Satoh A (2013) The golgin tether giantin regulates the secretory pathway by controlling stack organization within Golgi apparatus. PLoS One 8: e59821

      Martin HL, Adams M, Higgins J, Bond J, Morrison EE, Bell SM, Warriner S, Nelson A & Tomlinson DC (2014) High-content, high-throughput screening for the identification of cytotoxic compounds based on cell morphology and cell proliferation markers. PLoS One 9: e88338

      Meindl A, Hellebrand H, Wiek C, Erven V, Wappenschmidt B, Niederacher D, Freund M, Lichtner P, Hartmann L, Schaal H, et al (2010) Germline mutations in breast and ovarian cancer pedigrees establish RAD51C as a human cancer susceptibility gene. Nat Genet 42: 410-414

      Mikheeva AM, Bogomolov MA, Gasca VA, Sementsov MV, Spirin PV, Prassolov VS & Lebedev TD (2024) Improving the power of drug toxicity measurements by quantitative nuclei imaging. Cell Death Discov 10: 181

      Ovejero S, Kumanski S, Soulet C, Azarli J, Pardo B, Santt O, Constantinou A, Pasero P & Moriel-Carretero M (2023) A sterol-PI(4)P exchanger modulates the Tel1/ATM axis of the DNA damage response. EMBO J 42: e112684

      Prakash R, Rawal Y, Sullivan MR, Grundy MK, Bret H, Mihalevic MJ, Rein HL, Baird JM, Darrah K, Zhang F, et al (2022) Homologous recombination-deficient mutation cluster in tumor suppressor RAD51C identified by comprehensive analysis of cancer variants. Proc Natl Acad Sci U S A 119: e2202727119

      Prakash R, Zhang Y, Feng W & Jasin M (2015) Homologous recombination and human health: the roles of BRCA1, BRCA2, and associated proteins. Cold Spring Harb Perspect Biol 7: a016600

      Presley JF, Cole NB, Schroer TA, Hirschberg K, Zaal KJM & Lippincott-Schwartz J (1997) ER-to-Golgi transport visualized in living cells. Nature 389: 81-85

      Rawal Y, Jia L, Meir A, Zhou S, Kaur H, Ruben EA, Kwon Y, Bernstein KA, Jasin M, Taylor AB, et al (2023) Structural insights into BCDX2 complex function in homologous recombination. Nature 619: 640-649

      Somyajit K, Saxena S, Babu S, Mishra A & Nagaraju G (2015) Mammalian RAD51 paralogs protect nascent DNA at stalled forks and mediate replication restart. Nucleic Acids Res 43: 9835-9855

      Soulet C, Catalan J & Moriel-Carretero M (2026) The DNA Damage Response kinase ATM restricts Golgi extension. bioRxiv

      Stadler C, Hjelmare M, Neumann B, Jonasson K, Pepperkok R, Uhlén M & Lundberg E (2012) Systematic validation of antibody binding and protein subcellular localization using siRNA and confocal microscopy. J Proteomics 75: 2236-2251

      Stevenson NL, Bergen DJM, Lu Y, Prada-Sanchez ME, Kadler KE, Hammond CL & Stephens DJ (2021) Correction: Giantin is required for intracellular N-terminal processing of type I procollagen. J Cell Biol 220

      Tang Z, Kang B, Li C, Chen T & Zhang Z (2019) GEPIA2: an enhanced web server for large-scale expression profiling and interactive analysis. Nucleic Acids Res 47: W556-W560

      Thorn CF, Oshiro C, Marsh S, Hernandez-Boussard T, McLeod H, Klein TE & Altman RB (2011) Doxorubicin pathways: pharmacodynamics and adverse effects. Pharmacogenet Genomics 21: 440-446

      van der Zanden SY, Qiao X & Neefjes J (2021) New insights into the activities and toxicities of the old anticancer drug doxorubicin. FEBS J 288: 6095-6111

      Zhang Y-W, Otterness DM, Chiang GG, Xie W, Liu Y-C, Mercurio F & Abraham RT (2005) Genotoxic stress targets human Chk1 for degradation by the ubiquitin-proteasome pathway. Mol Cell 19: 607-618

    1. I deal with two kinds of writing blocks. One occurs when we cannot write in fluent, timely fashion. This first sort of block is a familiar pressure for many of us (and for our students). The second kind of writing block refers to the paradoxical reluctance evidenced by academicians who could but do not offer help to stymied colleagues or students as writers.

      This is interesting and makes me think of other instances outside of writing where I may experience that paradoxical reluctance.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work demonstrates that MORC2 undergoes phase separation (PS) in cells to form nuclear condensates, and the authors demonstrate convincingly the interactions responsible for this phase separation. Specifically, the authors make good use of crystallography and NMR to identify multiple protein: protein interactions and use EMSA to confirm protein: DNA interactions. These interactions work together to promote in vitro and in cell phase separation and boost ATPase activity by the catalytic domain of MORC2.

      However, the authors have very weak evidence supporting their potentially valuable claim that MORC2 PS is important for the appropriate gene regulatory role of MORC2 in cells. Exploring causal links between PS and function is an important need in the phase separation field, particularly as regards the role of condensates in gene regulation, and is a non-trivial matter. Any study with convincing data on this matter will be very important. For this reason, it is crucial to properly explore the alternative possibility that soluble complexes, existing in the same conditions as phase-separated condensates, are the functional species. It is also critical to keep in mind that, while a specific protein domain may be essential for PS, this does not mean its only important function pertains to PS.

      In this study, the authors do not sufficiently explore the role that soluble MORC2 complexes may play alongside MORC2 condensates. Neither do they include enough data to solidly show that domain deletion leads to phenotypes via a loss of phase separation per se, rather than the loss of phase separation being a microscopically visible result, not cause, of an underlying shift in protein function. For these reasons, the authors' conclusions regarding the functional role of MORC2 condensates are based on incomplete data. This also dampens the utility of this work as a whole, since the very nice work detailing the mechanism of MORC2 PS is not paired with strong data showing the importance of this observation.

      We thank the reviewer for this thoughtful and constructive critique. We agree that establishing a causal link between phase separation (PS) and biological function—particularly in transcriptional regulation—is a central and non-trivial challenge in the condensate field. We also appreciate the reviewer’s emphasis on two critical alternative interpretations: (i) that soluble MORC2 complexes, rather than condensates, may represent the primary functional species, and (ii) that loss of phase separation upon domain deletion could reflect a downstream consequence of altered protein function rather than its cause.

      To address these concerns, we have performed a series of new experiments specifically designed to decouple condensate formation from condensate dynamics, thereby allowing us to interrogate the functional relevance of MORC2 condensates more rigorously.

      First, to overcome the limitation of domain deletions, which may affect MORC2 function beyond phase separation, we introduced a micropeptide-based kill switch (KS) at the C terminus of MORC2. This strategy has recently emerged as a powerful approach to selectively reduce condensate dynamics without disrupting protein expression, folding, or domain architecture [1]. Importantly, unlike the CC3 or IDRa deletion mutants, MORC2+KS robustly forms nuclear condensates but exhibits markedly reduced internal dynamics, as demonstrated by FRAP analyses showing minimal fluorescence recovery after photobleaching (Fig. 6a-c). This strategy therefore allows us to perturb condensate material properties independently of MORC2 domain integrity.

      Second, we systematically compared the transcriptional consequences of rescuing MORC2-knockout HeLa cells with MORC2FL, condensation-deficient mutants (ΔCC3 and ΔIDRa), and the dynamics-defective MORC2+KS (Fig. 6d). Despite being expressed at substantially higher levels than MORC2FL (Fig. 6e), all three mutants showed a striking and consistent failure to restore MORC2-dependent transcriptional regulation (Fig. 6f-h). This effect was particularly pronounced for transcriptionally repressed genes, including two sets of high-confidence MORC2 targets reported in prior studies (Fig. 6i and Fig. S10). These findings demonstrate that neither increased protein abundance nor the mere presence of condensate-like structures is sufficient to restore MORC2 function.

      Third, our data instead support a model in which both soluble MORC2 complexes and dynamic MORC2 condensates are required for full transcriptional regulation activity. While soluble MORC2 is likely involved in target recognition and complex assembly, our results indicate that proper condensate formation—and critically, condensate dynamics—are essential for effective transcriptional repression and activation. The inability of the MORC2+KS mutant to rescue transcriptional defects, despite intact condensate formation, points away from a model in which MORC2 condensates represent only microscopically visible byproducts of MORC2 activity.

      We believe these new data strengthen the manuscript by pairing the detailed mechanistic dissection of MORC2 phase separation with direct functional evidence, enhancing the conceptual impact and biological significance of the study.

      Strengths:

      Static light scattering and crystallography are nicely used to demonstrate the dimerization of MORC2FL and to discover the structure of the CC3 domain dimer, presumably responsible for the dimerization of MORC2FL (Figure 1).

      Extensive use of deletion mutants in multiple cell lines is used to identify regions of MORC2 that are important for forming condensates in the nucleus: the IBD, IDR, and CC3 domains are found to be essential for condensate formation, while the CW domain plays an unknown role in condensate morphology (Figure 3). The authors use NMR to further identify that the IBD domain seems to interact with the first third of the centrally located IDR, termed IDRa, but not with the latter two-thirds of the IDR domain (Figure 4). This leads them to propose that phase separation is the product of IBD:IDRa interaction, CC3 dimerization, and an unknown but important role for the CW domain.

      Based on the observation that removal of the NLS resulted in diffuse cytoplasmic localization, they hypothesized that DNA may play an important role in MORC2 PS. EMSA was used to demonstrate interaction between DNA and several MORC2 domains: CC1, CC2, IDR, and TCD-CC3-IBD. Further in vitro microscopy with purified MORC2 showed that DNA addition significantly reduces MORC2 saturation concentration (Figure 5).

      These assays convincingly demonstrate that MORC2 phase separates in cells, and identify the protein domains and interactions responsible for this phenomenon, with the notable caveat that the role of the CW domain here is left unexplored.

      We thank the reviewer for their positive and detailed assessment of the strengths of our study. Our understanding of the CW domain’s function remains preliminary. Although we observed that the CW domain can influence condensate size, the IDR, IBD, and CC3 domains constitute the core structural elements driving phase separation. Consequently, the CW domain was not a primary focus of the current study. Nonetheless, investigating its functional contributions represents an interesting avenue for future work.

      Weaknesses:

      Although the authors demonstrated phase separation of MORC2FL, their evidence that this plays a functional role in the cell is incomplete.

      Firstly, looking at differentially upregulated genes under MORC2FL overexpression, the authors acknowledge that only 10% are shared with differentially regulated genes identified in other MORC2FL overexpression studies (Figure 6c, d). No explanation is given for why this overlap is so low, making it difficult to trust conclusions from this data set.

      We thank the reviewer for raising this important concern. In response, we have improved the quality and robustness of our RNA-seq analysis by repeating the experiments with optimized sample handling and increased sequencing depth. Using this updated dataset, we identified a considerably higher overlap between MORC2-regulated genes in our study and those reported previously.

      Specifically, we observed 84 overlapping genes with the study by Nikole L. Fendler et al. [2], corresponding to approximately 32% of the MORC2-regulated genes reported in that work (Fig. 6i). In addition, we identified 102 overlapping genes with the dataset reported by Iva A. Tchasovnikarova et al. [3], representing approximately 22% of the genes identified in that study (Fig. S10b).

      We note that complete concordance with previous reports is not expected, given substantial differences in experimental design. For example, Fendler et al. employed a doxycycline-inducible MORC2 expression system [2], whereas our study relies on transient overexpression in MORC2-knockout HeLa cells. In contrast, Tchasovnikarova et al. compared transcriptomes between MORC2 knockout and wild-type cells [3], rather than MORC2 rescue conditions. Moreover, RNA-seq results are inherently influenced by cell line batch variability, sequencing depth, and analysis pipelines, all of which differ across studies.

      Taken together, we consider an overlap in the range of ~20–30% to be reasonable and biologically meaningful in the context of these experimental differences, and we believe that the revised RNA-seq data provide a more reliable foundation for our conclusions regarding MORC2-dependent transcriptional regulation.

      Secondly, of the 21 genes shared in this study and in earlier studies, the authors note that the differential regulation is less pronounced when a phase-separation-deficient MORC2 mutant is overexpressed, rather than MORC2FL (Figure 6e). This is taken as evidence that phase separation is important for the proper function of MORC2. However, no consideration is made for the alternative possibility that the mutant, lacking the CC3 dimerization domain, may result in non-functional complexes involving MORC2, eliminating the need for a PS-centric conclusion. To take the overexpression data as solid evidence for a functional role of MORC2 PS, the authors would need to test the alternative, soluble complex hypothesis. Furthermore, there seems to be low replicate consistency for the MORC2 mutant condition (Figure S6a), with replicate 3 being markedly upregulated when compared to replicates 1 and 2.

      We thank the reviewer for raising these important concerns. In the revised manuscript, we have substantially strengthened both the experimental evidence and the data presentation to directly address the alternative “soluble complex” interpretation as well as the issue of replicate consistency. Specifically, we now provide data that clarify the functional impact of phase-separation-deficient MORC2 mutants and explicitly show replicate-level RNA-seq analyses. Fig. 6 and Fig. S10 support these improvements and enhance both the robustness and transparency of our transcriptional analyses. Collectively, these revisions directly address the reviewer’s concerns regarding the functional interpretation of MORC2 phase separation.

      Thirdly, the authors close by examining the in-cell PS capabilities and ATPase activity of several disease-associated mutants of MORC2 (Figure 7). However, the relevance of these mutants to the past 6 figures is unclear. None of these mutations is in regions identified as important for PS. Two of the mutations result in a higher percentage of the cell population being condensate-positive, but this is not seemingly connected to ATPase activity, as only one of these two mutants has increased ATPase activity. Figure 7 does not add any support to the main hypotheses in the paper, and nowhere in the paper do the authors investigate the protein regions where the mutations in Figure 7 are found.

      We thank the reviewer for raising this point regarding Fig. 7. At the current stage, the results for disease-associated mutations are primarily descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity [4], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent [4]. Our results further suggest that MORC2’s phase separation behavior is independent of both ATP and DNA binding affinity, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      We would also like to emphasize an additional observation that may help contextualize the relevance of N-terminal mutations. Although deletion of the MORC2 N-terminus does not prevent the remaining C-terminal region from forming nuclear condensates, these C-terminal condensates exhibit a marked loss of fluorescence recovery in FRAP assays (Fig. S11). This finding suggests that while the N-terminus is not strictly required for condensate assembly, it plays an important role in regulating condensate fluidity. Accordingly, disease-associated mutations distributed across the N-terminal region may influence MORC2 function by modulating condensate material properties rather than condensate formation per se. Based on this hypothesis, we evaluated the fluidity of condensates formed by the E236G and T424R mutants. FRAP measurements indicated substantially reduced fluorescence recovery in E236G, whereas T424R exerted minimal effects (Fig. 7e, f).

      Overall, our interpretation of the results in Fig. 7 is still at a preliminary stage. Nevertheless, the role of the MORC2 N-terminus in modulating condensate fluidity, together with the observed impairment caused by the E236G mutation, appears to be robust, although the underlying mechanism remains to be elucidated. We have incorporated additional discussion on this point and consider it an important direction for future study.

      Reviewer #1 (Recommendations for the authors):

      (1) Why does MORC2 overexpression lead to changes in gene regulation that are so different from past MORC2 overexpression studies? This is unsettling to me.

      (2) Likewise, why is replicate 3 for the MORC2ΔCC3 variant so different from replicates 1 and 2? Perhaps repeating this experiment would be helpful, both for showing better repeatability and perhaps as regards pulling out a stronger phenotype.

      We have repeated the experiments and obtained improved data quality.

      (3) A better explanation of the relevance of Figure 7 to the story of the rest of the paper, especially the phase-separation of MORC2, would be important to improving this paper.

      We thank the reviewer for this suggestion. We have performed additional experiments and expanded the discussion.

      (4) Are expression levels of mutant proteins in Figure 7 uniform between mutants? If not, is it possible that expression levels might account for the difference in condensate-positive cells between mutants?

      We cannot fully exclude the possibility that differences in expression levels contribute to the observed differences among mutants. In our experiments, equal amounts of plasmid DNA were used for transfection across all conditions. We did not directly quantify post-transfection protein expression levels by immunoblotting or similar approaches, and even if certain mutations do affect protein expression, fully normalizing expression levels across mutants would be technically challenging.

      Importantly, we note that MORC2 does not form condensates in all transfected cells, even when EGFP fluorescence indicates robust expression levels that are comparable to, or even exceed, those observed in condensate-positive cells. This observation suggests that high expression alone is not sufficient to drive MORC2 phase separation in cells. Therefore, we do not favor the interpretation that the E236G and T424R mutations enhance MORC2 condensation simply by increasing MORC2 protein expression levels.

      Minor:

      (1) I would suggest considering using the term "dynamic" rather than "liquid-like", as FRAP is technically a measurement of the dynamicity of a protein within a volume, rather than a measurement of the actual fluidity of that volume.

      We thank the reviewer for this helpful suggestion. We agree that FRAP measurements primarily report protein mobility and condensate dynamics rather than the physical fluidity of the condensates. We have therefore revised the manuscript to replace “liquid-like” with “dynamic” where conclusions are based on FRAP analyses.

      (2) A further investigation of the role of the CW domain would be very interesting, since it clearly has a major role in condensate morphology. Perhaps CW confers important heterotypic interactions which contribute to compositional control of the MORC2 condensates, and thus function and morphology? However, due to the complexity of this specific question and the potentially marginal improvement offered by this paper, I do not think this is a critical addition.

      We thank the reviewer for this insightful suggestion. We have noted this possibility in the Discussion as an important avenue for future investigation.

      (3) Why is TCD not tested alone by EMSA for affinity to DNA in Figure 5?

      Our inference regarding the DNA-binding capacity of the TCD domain was based on comparative EMSA analyses. Specifically, we found that the TCD–CC3–IBD fragment was able to bind DNA, whereas the CC3–IBD fragment alone showed no detectable DNA binding. From this comparison, we inferred that the TCD domain is responsible for the observed DNA-binding activity.

      Because the TCD domain does not affect MORC2 condensate formation, it was not a central focus of the present study, which primarily aims to elucidate the mechanisms underlying MORC2 phase separation and its functional relevance. For this reason, we did not further test TCD alone by EMSA in Figure 5.

      Reviewer #2 (Public review):

      Summary:

      The study by Zhang et al. focuses on how phase separation of the chromatin-associated protein MORC2 could regulate gene expression. Their study shows that MORC2 forms dynamic nuclear condensates in cells. In vitro, MORC2 phase separation is driven by dimerization and multivalent interactions involving the C-terminal domain. A key finding is that the intrinsically disordered region (IDR) of MORC2 exhibits strong DNA binding. They report that DNA binding enhances MORC2's phase separation and its ATPase activity, offering new insights into how MORC2 contributes to chromatin organization and gene regulation. The authors try to correlate MORC2's condensate-forming ability with its gene silencing function, but this warrants additional controls and validation. Moreover, they investigate the effect of disease-linked mutations in the N-terminal domain of MORC2 on its ability to form cellular condensates, ATPase activity, and DNA-binding, though the findings appear inconclusive in the manuscript's current form.

      Thank you for your thorough and constructive review of our manuscript. In response to the concerns raised regarding the functional relevance of MORC2 condensate formation, we have redesigned and expanded the experiments presented in Fig. 6 and Fig. S6 to directly link MORC2’s condensate-forming capacity with its transcriptional regulatory function. These new experiments provide additional controls and validation, strengthening the causal relationship between MORC2 condensate dynamics and gene regulation.

      At the current stage, the results for disease-associated mutations are descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity [4], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent [4]. Our results further suggest that MORC2’s phase separation behavior is also independent of both ATP and DNA binding, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      Strengths:

      The authors determined a 3.1 Å resolution crystal structure of the dimeric coiled-coil 3 (CC3) domain of MORC2, revealing a hydrophobic interface that stabilizes dimer formation. They present extensive evidence that MORC2 undergoes liquid-liquid phase separation (LLPS) across multiple contexts, including in vitro, in cellulo, and in vivo. Through systematic cellular screening, they identified the C-terminal domain of MORC2 as a key driver of condensate formation. Biophysical and biochemical analyses further show that the IDR within the C-terminal domain interacts with the C-terminal end region (IBD) and also exhibits strong DNA-binding capacity, both of which promote MORC2 phase separation. Together, this study emphasizes that interactions mediated by multiple domains (CC3, IDR, and IBD) drive MORC2 phase separation. Finally, the authors quantified the effect of removing the CC3 on the upregulation and downregulation of target gene expression.

      We thank the reviewer for their appreciation of the key findings presented in this manuscript.

      Weaknesses:

      Though the findings appear compelling in isolation, the study lacks discussion on how its findings compare with previous studies. Particularly in the context of MORC2-DNA binding, there are previous studies extensively exploring MORC2-DNA binding (Tan, W., Park, J., Venugopal, H. et al. Nat Commun 2025), and its effect on ATPase activity (ref 22). The contradictory results in ref 22 about the impact of DNA-binding on ATPase activity, and ATPase activity on transcriptional repression, warrant proper discussion. The authors performed extensive in-cellulo screening for the investigation of domain contribution in MORC2 condensate formation, but the study does not consider/discuss the possibility of some indirect contributions from the complex cellular environment. Alternatively, the domain-specific contributions could be quantified in vitro by comparing phase diagrams for their variants. While the basis of this study is to investigate the mechanism of MORC2 condensate-mediated gene silencing, the findings in Figure 6 appear incomplete because the CC3 deletion not only affects phase separation of MORC2 but also dimerization. Furthermore, their investigation on disease-linked MORC2 mutations appears very preliminary and inconclusive because there are no obvious trends from the data. Overall, the discussion appears weak as it is missing references to previous studies and, most importantly, how their findings compare to others'.

      We thank the reviewer for their careful assessment of MORC2’s DNA-binding properties and its relationship with ATPase and transcriptional activities. We would like to offer the following clarifications to address these concerns, which will also be incorporated into the Discussion section of the revised manuscript.

      First, recent work by Tan et al. [5] similarly identified multiple DNA-binding sites in MORC2, consistent with our findings, though there are discrepancies in the precise binding regions. In particular, they reported that isolated CC1 and CC2 domains do not bind 60 bp dsDNA, which contrasts with our observations. We attribute this difference to the types of DNA used in the assays. In our study, we employed 601 DNA, a defined nucleosome-positioning sequence, which differs substantially from randomly designed short dsDNA. For instance, prior work by Christopher H. Douse et al. [54] also confirmed that MORC2’s CC1 domain can bind 601 DNA.

      Second, in the study by Fendler et al. [2], DNA binding was reported to reduce MORC2’s ATPase activity, an observation that appears inconsistent with the results presented in our Fig. 5j. A critical distinction between the two studies lies in the experimental systems used: Fendler et al. [2] employed truncated MORC2 constructs and 35 bp double-stranded DNA (dsDNA), whereas our experiments utilized full-length MORC2 and 601 DNA (a sequence with high nucleosome assembly potential). These differences, including the absence of potentially regulatory C-terminal regions in the truncated construct and the differing length and structural properties of the DNA substrates, introduce variables that substantially complicate direct comparison of the ATPase activity results.

      Separately, Douse et al. [4] demonstrated that the efficiency of HUSH complex-dependent epigenetic silencing decreases as MORC2’s ATP hydrolysis rate increases, implying an inverse relationship between ATPase activity and silencing function. Notably, our current work has not established a direct mechanistic link between MORC2 phase separation and its ATPase activity. Thus, we refrain from inferring that the effect of MORC2 phase separation on transcriptional repression is mediated through modulation of its ATPase function; this remains an important question to address in future studies.

      Finally, we have redesigned and expanded the experiments presented in Fig. 6 and Fig. S6 to directly link MORC2’s condensate-forming capacity with its transcriptional regulatory function.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) Unaddressed discrepancies with the previous study:

      (a) Inadequate discussion of Reference 22 and apparent contradictions. Notably, Reference 22 provides evidence for reduced ATPase activity upon DNA binding, in contrast to the current study's observations. Moreover, Reference 22 demonstrates that ATP hydrolysis (ATPase activity) is inversely associated with MORC2-mediated gene silencing, whereas this study concludes that 'the silencing function of MORC2 requires its ATPase activity'. These apparent contradictions warrant a more thorough discussion to reconcile the differences, including potential mechanistic explanations and experimental context that could account for the discrepancies. Additionally, the authors should discuss potential reasons why Ref. 22 may not have observed phase separation during MORC2 biophysical analysis. For instance, in Ref. 22, SEC-MALS was performed at 2 mg/mL (~16 µM) MORC2 FL in the presence of 150 mM NaCl, conditions that could influence phase behavior based on the current manuscript's results. Addressing whether differences in protein construct, buffer composition, or experimental design might account for this discrepancy would strengthen the discussion.

      We thank the reviewer for pointing out the apparent discrepancies between our results and those reported in Ref. 22. We agree that these differences warrant explicit discussion, and we have revised the Discussion accordingly to clarify the experimental and conceptual distinctions between the two studies.

      First, regarding the effect of DNA binding on ATPase activity, Ref. 22 examined MORC2 ATPase activity under conditions where MORC2 does not undergo detectable phase separation, whereas our ATPase assays were performed under conditions in which MORC2 readily forms condensates in the presence of DNA. We therefore propose that the observed increase in ATPase activity in our study may reflect a distinct biochemical regime in which phase separation and/or high local protein concentration modulates enzymatic activity. Importantly, our data do not exclude the possibility that DNA binding per se can inhibit ATPase activity under non-condensing conditions, as reported in Ref. 22.

      Second, with respect to transcriptional repression, Ref. 22 reported an inverse correlation between ATP hydrolysis and MORC2-mediated silencing, whereas our study finds that ATPase activity is required for efficient repression. We suggest that these observations are not necessarily contradictory but may reflect different regulatory layers of MORC2 function. Specifically, ATP binding and hydrolysis may be required for MORC2 structural remodeling and chromatin engagement, while excessive or dysregulated ATP hydrolysis could impair stable silencing complexes, as suggested previously [4]. We now explicitly discuss this possibility in the revised manuscript.

      Finally, we appreciate the reviewer’s suggestion regarding the absence of phase separation in Ref. 22. Indeed, SEC-MALS experiments in Ref. 22 were conducted at ~16 µM MORC2 in the presence of 150 mM NaCl (the purification condition is 500 mM NaCl, 10% glycerol), conditions that, based on our phase diagrams, are close to or above the saturation concentration but are also strongly influenced by ionic strength. This combination of factors explains why the UV peak from SEC-MALS is not indicative of a homogeneous sample [3].

      (b) The DNA binding capacity of individual MORC2 domains was tested in Fig. 5. IDR appears to be the strongest DNA binder among others. Is this the effect of IDR being isolated from the rest of the protein? A recent paper (Tan, W., Park, J., Venugopal, H. et al. Nat Commun 2025) also investigated DNA binding capacity of different regions of MORC2 using hydrogen-deuterium exchange experiments and EMSA. Interestingly, it can be seen in Figure S9 that the DNA binding capacity of different regions changes when compared together to when in isolation (MORC2 1-603 vs 1-265; 1-495; 496-603). In line with the above, MORC2 IDR's interaction with DNA warrants additional investigation, taking the system as a whole to avoid misinterpretation arising from non-specific interactions.

      We appreciate the reviewer’s insightful comments regarding domain-specific DNA binding and the potential caveats of studying isolated regions. In Figure 5, our EMSA analyses show that the isolated IDR exhibits the strongest DNA-binding signal among the tested fragments. We agree that this observation may, at least in part, reflect the removal of structural or regulatory constraints imposed by the full-length protein.

      Consistent with the reviewer’s point, Tan et al. [5] demonstrated that DNA-binding behavior of MORC2 regions differs when analyzed in isolation versus in the context of larger constructs. We have now incorporated this comparison into the Discussion and explicitly note that DNA binding by the IDR should be interpreted as a contextual and potentially cooperative property rather than an autonomous function.

      Importantly, our conclusions do not rely on the IDR acting as an independent DNA-binding module in vivo. Rather, we propose that the IDR contributes to DNA engagement and phase behavior within the architectural framework of full-length MORC2. We now emphasize this limitation and highlight the need for future studies that probe DNA binding in the context of intact MORC2 or minimally perturbed constructs.

      (2) MORC2 DNA binding impacting phase separation and ATPase activity:

      While it is clear that MORC2:DNA interaction facilitates MORC2 phase separation, the impact on ATPase activity is not conclusive. First, they observe an opposite trend (compared to ref. 22) for DNA binding on MORC2's ATPase activity. Secondly, it is not clear if the increase in ATPase activity is mediated by DNA binding or phase separation. The ATPase activity was measured at 1 µM MORC2 protein concentration in the presence of DNA, where MORC2 appears to phase separate. To draw more definitive conclusions, additional controls are necessary. Specifically, a phase separation-deficient mutant (from this study) and a DNA-binding-deficient mutant (see ref. 22) should be included to disentangle the contributions of DNA binding and phase separation to ATPase activity. The choice of ATP-binding-deficient mutant N39A as a negative control seems inconclusive in this regard. Additionally, why is there an increase in ATP hydrolysis rate for the ATP-binding-deficient mutant in the presence of DNA, resulting in ATP hydrolysis rates similar to WT MORC2? This raises further questions about the underlying mechanism.

      We agree with the reviewer that disentangling the contributions of DNA binding and phase separation to ATPase activity is challenging and that our current data do not fully resolve this issue. As noted, ATPase assays were performed at protein concentrations (1 µM) where MORC2 undergoes DNA-induced phase separation, making it difficult to distinguish whether enhanced ATP hydrolysis arises directly from DNA binding or indirectly from condensate formation.

      We acknowledge that inclusion of additional mutants, such as phase-separation-deficient or DNA-binding-deficient variants, would provide a more definitive mechanistic separation of these effects. However, generating and validating such mutants in a manner that preserves overall protein integrity is beyond the scope of the current study. Accordingly, we have revised the text to present our findings more cautiously and to frame the observed ATPase enhancement as a correlation rather than a causal mechanism.

      Regarding the ATP-binding–deficient N39A mutant, we agree that its behavior in the presence of DNA raises interesting mechanistic questions. We now explicitly note this unexpected observation and discuss possible explanations, including partial ATP binding, altered oligomeric states, or indirect effects mediated by condensate formation.

      (3) Dissecting the domain-specific contribution in MORC2 phase separation:

      (a) While in cellulo data indicate that the presence of IDR, NLS, CC3, and IBD is all essential for MORC2 condensate formation, it is not clear if this is the effect of the complex cellular environment or whether it is intrinsic for MORC2 phase separation ability. In lines 256-259, the authors suggest IDRa interaction with IBD may serve as a nucleation mechanism for LLPS. In other places, it has been mentioned that CC3 dimerization acts as a scaffold for condensate formation. It is not clear if all of these are essential for MORC2 phase separation, or one of them is essential while the other domain(s) facilitates the phase separation. Though Figure 3 provides a qualitative overview of the contribution of different regions in MORC2 phase separation in cellulo (as influenced by the complex cellular environment and substrate interactions), the absolute domain contribution in phase separation would be better studied in vitro by quantitatively comparing phase diagrams (for example, c-sat vs temperature) of different domain deletion constructs.

      We thank the reviewer for highlighting the distinction between intrinsic phase separation propensity and cellular context-dependent effects. Our in cellulo screening was designed to identify regions required for condensate formation under physiological conditions, where chromatin, binding partners, and macromolecular crowding are present. We agree that this approach does not directly quantify the intrinsic phase separation contribution of individual domains.

      While CC3 dimerization, IDR–IBD interactions, and nuclear localization all contribute to condensate formation, our data do not imply that these elements are mechanistically equivalent. Rather, we propose that CC3 provides a structural scaffold, while IDR-mediated interactions lower the energetic barrier for condensation. We have revised the manuscript to clarify this hierarchical model and to avoid implying that all domains contribute equally or independently.

      We agree that quantitative in vitro phase diagrams would provide valuable insight into intrinsic domain contributions. Whereas neither the MORC2ΔCC3-IBD (1–900) fragment nor the CC3-IBD (900–1032) fragment induces phase separation on its own, a mixture of the IDR and the CC3–IBD fragment drives robust phase separation; additionally, phase separation is entirely abrogated in the absence of domain–domain interactions. Together, these observations indicate that phase separation is contingent on specific domain combinations and their interactions.

      (b) Similarly, for line 228-231: 'Notably, condensates formed exclusively in the nucleus and not in the cytoplasm of transfected HeLa cells, suggesting that chromatin-associated nuclear factors, such as DNA, may contribute to the nucleation or stabilization of MORC2 condensates.' This is an important observation made by the authors. Since MORC2 readily phase separates in vitro under physiological conditions, it is important to discuss why MORC2 does not make condensates in the cytoplasm (in the case of MORC2deltaNLS). In this regard, how does the concentration of overexpressed EGFP-MORC2 constructs compare with in vitro tested droplets of MORC2?

      We thank the reviewer for highlighting this important conceptual point. Although MORC2 readily undergoes phase separation in vitro under physiological buffer conditions, the absence of condensate formation in the cytoplasm of cells expressing MORC2ΔNLS underscores the importance of the nuclear environment in promoting MORC2 assembly.

      The cytoplasm differs fundamentally from the nucleus not only in overall molecular composition but also in the availability of high-valency scaffolds such as chromatin. We propose that chromatin-associated components, particularly DNA, provide a platform that locally concentrates MORC2 and increases its effective valency, thereby facilitating nucleation or stabilization of condensates in the nucleus. In contrast, the cytoplasm lacks such scaffolds, even when MORC2 is expressed at appreciable levels. In cultured cells, MORC2 is seldom observed in the cytoplasm. While specific experimental contexts may facilitate its cytoplasmic localization, such observations are rarely reported [6]. In transfection-based systems, MORC2 predominantly displays droplet-like behavior in the nucleus. Notably, in endogenous EGFP–MORC2 chimeric mice, we detected punctate MORC2 structures in the neuronal cytoplasm of the brain and spinal cord. The functional significance and biophysical state of cytoplasmic MORC2 remain largely unexplored.

      With respect to protein concentration, while EGFP-MORC2 is robustly expressed in cells, direct comparison between cellular expression levels and the protein concentrations used in vitro is inherently challenging. Importantly, in vitro phase separation is driven by bulk protein concentration under defined conditions, whereas in cells, effective local concentration and interaction valency are strongly shaped by spatial confinement and chromatin association. We have revised the manuscript text to emphasize this distinction and to avoid interpreting nuclear specificity as a purely concentration-dependent phenomenon.

      (c) Lines 227-228: '... CW domain restricts condensate overgrowth or fusion', this inference is based on CTDdeltaCW puncta being larger in size (Figure 3a). However, in Figure 4h MORC2deltaIDRb and MORC2deltaIDRc also result in larger puncta. Making a final conclusion that the CW domain restricts condensate overgrowth or fusion warrants additional investigation.

      We thank the reviewer for pointing out the limitation of our original conclusion. We agree that inferring from the enlarged CTDΔCW puncta (Figure 3a) that the CW domain regulates condensate size was insufficiently rigorous.

      Re-analysis of existing data identifies clear phenotypic disparities between the mutants: MORC2ΔIDRb/ΔIDRc mutants show two distinct phenotypes (reduced puncta number with enlarged size, or unchanged puncta number with uniform enlargement), and their total puncta area per cell is comparable to the WT. By contrast, CTDΔCW mutants display markedly larger puncta relative to the WT. Based on this distinction, we have revised our conclusion to a more cautious formulation: "These observations suggest that the CW domain may participate in regulating initial nucleation size and the exact molecular mechanisms require further investigation."

      (4) MORC2 condensate-mediated gene silencing:

      This is one of the key investigations of this study where the authors evaluate the ability of MORC2 condensates to regulate gene silencing (transcriptional repression). The major concern here is that the authors are drawing their conclusion based on a CC3 domain deletion mutant of MORC2 and comparing it with wild-type MORC2. Notably, the CC3 domain is responsible for MORC2 dimerization, and as the authors quote, 'The dimeric assembly of CC3 is essential for maintaining the structural integrity of the protein', the absence of CC3 would have a direct impact on its function (such as ATPase activity). With these considerations, it is not clear whether the effect of CC3 domain deletion on gene regulation is an effect of no phase separation or a consequence of loss of function. This necessitates additional validation by including other controls, such as IBD domain deletion mutant, IDRa domain deletion mutant, where the phase separation is impeded without affecting dimerization.

      We appreciate the reviewer’s concern regarding the interpretation of CC3 deletion experiments. We agree that CC3 deletion affects both dimerization and phase separation, complicating attribution of gene regulatory effects solely to condensate formation. Our intention was not to claim that loss of repression arises exclusively from impaired phase separation, but rather to demonstrate that disrupting condensate-dynamic capacity correlates with impaired silencing.

      To directly address these concerns, we have performed a series of new experiments specifically designed to decouple condensate formation, condensate dynamics, and protein abundance, thereby allowing us to more rigorously interrogate the functional relevance of MORC2 condensates.

      First, to overcome the limitation that domain deletions may affect MORC2 function beyond phase separation, we introduced a micropeptide-based kill switch (KS) at the C terminus of MORC2. This strategy has recently emerged as a powerful approach to selectively reduce condensate dynamics without disrupting protein expression, folding, or domain architecture [1]. Importantly, unlike the CC3 or IDRa deletions, MORC2+KS robustly forms nuclear condensates but exhibits markedly reduced internal dynamics, as demonstrated by FRAP analyses showing minimal fluorescence recovery after photobleaching (Fig. 6a-c). This strategy therefore allows us to perturb condensate material properties independently of MORC2 domain integrity.
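
      As a rough illustration of how "minimal fluorescence recovery" can be expressed as a number, FRAP curves are often summarized by a mobile fraction: the share of the bleached signal that has returned once the recovery curve plateaus. The sketch below assumes mean intensities of the bleached region in consistent units; the function name and the example values are hypothetical and are not taken from Fig. 6.

```python
def mobile_fraction(pre_bleach, post_bleach, plateau):
    """Fraction of the bleached signal that has recovered at the plateau.

    All three inputs are mean intensities of the bleached region,
    expressed in the same (arbitrary or normalized) units.
    """
    bleached = pre_bleach - post_bleach
    if bleached <= 0:
        raise ValueError("post-bleach intensity must be below pre-bleach intensity")
    return (plateau - post_bleach) / bleached


# Hypothetical, made-up intensities for illustration only.
dynamic = mobile_fraction(pre_bleach=1.0, post_bleach=0.2, plateau=0.9)
arrested = mobile_fraction(pre_bleach=1.0, post_bleach=0.2, plateau=0.25)
print(f"dynamic condensate:  mobile fraction = {dynamic:.3f}")
print(f"arrested condensate: mobile fraction = {arrested:.3f}")
```

      A value near 1 corresponds to near-complete exchange with the surrounding pool, while a value near 0 corresponds to an arrested assembly of the kind described for MORC2+KS.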

      Second, we systematically compared the transcriptional consequences of rescuing MORC2-knockout HeLa cells with MORC2FL, condensation-deficient mutants (ΔCC3 and ΔIDRa), and the dynamics-defective MORC2+KS (Fig. 6d). Despite being expressed at substantially higher levels than MORC2FL (Fig. 6e), all three mutants showed a striking and consistent failure to restore MORC2-dependent transcriptional regulation (Fig. 6f-h). This effect was particularly pronounced for transcriptionally repressed genes, including two sets of high-confidence MORC2 targets reported in prior studies (Fig. 6i and Fig. S10). These findings demonstrate that neither increased protein abundance nor the mere presence of condensate-like structures alone is sufficient to restore MORC2 function.

      Third, our data instead support a model in which both soluble MORC2 complexes and dynamic MORC2 condensates are required for full transcriptional activity. While soluble MORC2 is likely involved in target recognition and complex assembly, our results indicate that proper condensate formation and, critically, condensate dynamics are essential for effective transcriptional repression and activation. The inability of the MORC2+KS mutant to rescue transcriptional defects, despite intact condensate formation, argues against a model in which MORC2 condensates are merely microscopically visible byproducts of MORC2 activity.

      We believe these new data strengthen the manuscript by pairing the detailed mechanistic dissection of MORC2 phase separation with direct functional evidence, enhancing the conceptual impact and biological significance of the study.

      (5) Uncertain impact of pathogenic MORC2 mutations:

      Line 356-365: While the statements such as "disease-associated mutations primarily affect enzymatic and phase behaviors rather than DNA affinity" and "these findings provide mechanistic insight into how specific mutations may contribute to distinct pathological outcomes" are conceptually compelling, the data presented in Figure 7b-d do not appear to fully support these conclusions. For many of the mutants, the differences from WT across key parameters (condensation, ATPase activity, and DNA binding) are either modest or statistically insignificant. As such, drawing a unified mechanistic conclusion from these datasets may overstate what the data actually support.

      We agree that the effects of disease-associated MORC2 mutations described in Fig. 7 are modest and, in some cases, statistically insignificant. Our intention was to document observable trends rather than to propose a unified mechanistic framework. We have revised the manuscript to temper these conclusions and to emphasize the descriptive nature of these data.

      (6) Important conceptual clarifications:

      (a) Intrinsically disordered regions (IDRs) are not synonymous with phase separation. As the authors show, it is a combination of IDR-mediated interactions and CC3 dimerization that contributes towards the phase separation of MORC2. While IDRs can act as scaffolds for multivalent weak interactions that may promote biomolecular condensate formation, many IDRs serve other roles (such as mediating transient interactions, signaling, or regulatory functions) without undergoing phase separation. Researchers should avoid generalizing the assumption that the mere presence of IDRs in a protein implies its ability for phase separation. In this regard, authors should consider restructuring some of their generalized statements: Line 87-88: 'Recent studies suggest that intrinsically disordered regions (IDRs) can drive liquid-liquid phase separation (LLPS)' and Line 159-161: 'we noticed a long unstructured region at its C-terminus (Fig. S1b), a characteristic often associated with proteins capable of phase separation'.

      We agree that IDRs are not synonymous with phase separation and have revised the Introduction to avoid generalized statements. The revised text now emphasizes that IDRs can contribute to phase separation in a context-dependent manner and act in concert with structured oligomerization domains such as CC3-IBD.

      (b) Liquid-liquid phase separation: I would suggest switching the phrase to just phase separation. The rationale is that the in vitro studies of MORC2 (FRAP, droplet imaging) do not show liquid-like behavior, but perhaps liquid-solid. The FRAP studies suggest liquid-like behavior for some of the constructs. Given the differences in viscoelastic properties across the in vitro and in cellulo studies, it is better to generalize to "phase separation". Movies for droplet fusion and FRAP, wherever applicable, would be much appreciated. As the nature of in vitro MORC2 droplets appears different than in cells, movie representations of the above would enable readers to better assess the viscoelastic nature of the droplets (whether liquid, gel, etc).

      We appreciate the reviewer’s insight regarding the viscoelastic properties of MORC2. Our experimental data indeed show a disparity in dynamics between the two environments: while in vitro MORC2-FL condensates exhibit relatively low internal mobility, the in cellulo MORC2-FL puncta display high dynamics, characterized by rapid internal recovery in FRAP assays and droplet fusion events (Fig. S2f).

      This contrast suggests that the intracellular microenvironment plays a critical role in regulating the material state of MORC2 condensates. Consequently, we have focused on providing in vivo fusion data, as we believe in vitro characterizations (such as fusion or FRAP under various artificial conditions) may not faithfully represent the physiological behavior of MORC2. We have revised the manuscript to use the more general term “phase separation” or “condensation” and have added a discussion on these limitations to avoid overinterpreting the material properties observed in vitro.

      (7) Methods:

      (a) Figure 6 S2b: If phase separation occurs at, say, 1.8 µM protein concentration, this indicates that the protein has reached its saturation concentration (c-sat). Beyond c-sat, any additional protein should partition into the dense phase, while the concentration of the dilute phase remains constant. However, in this figure, the dilute phase concentration appears to increase with increasing total protein concentration, which is inconsistent with expected phase separation behavior. As the methods section does not have any sub-section for the sedimentation assay, it becomes difficult to understand how this experiment was performed, whether there is any technical discrepancy in the way soluble and pellet fractions were handled and processed for loading onto the gels. This is also the case with Figure 3d.

      We thank the reviewer for carefully examining the sedimentation assay and for raising this important conceptual point. We agree that, for an ideal two-phase system at thermodynamic equilibrium, the concentration of the dilute phase is expected to remain constant once the saturation concentration (c-sat) is reached.

      In our study, the sedimentation assay was used as an operational readout to assess concentration-dependent partitioning rather than to quantitatively define equilibrium phase boundaries. The assay involves centrifugation-based separation of supernatant and pellet fractions followed by SDS–PAGE analysis, and therefore does not necessarily report the equilibrium concentrations of coexisting dilute and dense phases. In particular, this approach can be influenced by incomplete physical separation of phases, kinetic trapping, and redistribution of material during handling, especially in systems where condensate maturation or internal reorganization occurs on longer timescales.

      Consequently, the apparent increase in the supernatant fraction with increasing total protein concentration likely stems from kinetic limitations and inherent technical constraints of the sedimentation assay, rather than a genuine deviation from classical phase separation behavior. These caveats are now explicitly clarified in the Methods section, with similar limitations of centrifugation-based assays for defining equilibrium phase behavior of biomolecular condensates reported previously.
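
      The ideal behavior discussed above can be made concrete with a mass-balance (lever rule) sketch: below c-sat all protein stays in a single phase, and above it the dilute phase is pinned at c-sat while the dense-phase volume fraction grows. The c-sat of 1.8 µM echoes the reviewer's example; the dense-phase concentration of 200 µM and the function name are assumptions chosen only for illustration.

```python
def ideal_two_phase_split(c_total, c_sat, c_dense):
    """Return (dilute-phase concentration, dense-phase volume fraction).

    Assumes an ideal two-phase system at equilibrium with fixed
    coexistence concentrations c_sat (dilute) and c_dense (dense).
    """
    if not 0 < c_sat < c_dense:
        raise ValueError("require 0 < c_sat < c_dense")
    if not 0 <= c_total <= c_dense:
        raise ValueError("require 0 <= c_total <= c_dense")
    if c_total <= c_sat:
        # Below saturation: one phase, no condensate.
        return c_total, 0.0
    # Lever rule: c_total = phi * c_dense + (1 - phi) * c_sat
    phi = (c_total - c_sat) / (c_dense - c_sat)
    return c_sat, phi


C_SAT = 1.8      # µM, the reviewer's illustrative value
C_DENSE = 200.0  # µM, hypothetical dense-phase concentration

for c_total in (1.0, 1.8, 5.0, 10.0, 20.0):
    dilute, phi = ideal_two_phase_split(c_total, C_SAT, C_DENSE)
    print(f"total {c_total:5.1f} µM -> dilute {dilute:4.1f} µM, "
          f"dense volume fraction {phi:.4f}")
```

      In this idealized picture the supernatant concentration plateaus at c-sat, so a supernatant signal that keeps rising with total protein reflects a departure from equilibrium partitioning or from clean physical separation of the two phases.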

      (b) Figure 4: The NMR comparisons appear to be primarily qualitative, lacking quantitative analyses such as chemical shift perturbation (CSP) and intensity ratio plots, which would offer deeper mechanistic insights. The NMR spectra detailing interactions among the IDR domains need to be quantified.

      We thank the reviewer for the suggestion. We have now performed quantitative CSP analyses for the NMR data shown in Fig. 4, and the corresponding CSP plots have been added to the revised manuscript (Fig. S7).

      As expected for interactions mediated by intrinsically disordered regions involved in phase separation, the observed CSPs are generally small. Notably, the CSP profile of IDRa closely matches that observed for the full-length IDR, whereas IDRb and IDRc show minimal perturbations. These results indicate that the interaction is primarily mediated by IDRa, with little contribution from the remaining regions.
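
      For readers unfamiliar with CSPs expressed in Hz, the sketch below shows one common way to combine 1H and 15N shift changes: convert each ppm difference to Hz at the corresponding Larmor frequency and take the Euclidean norm. This is an assumption made for illustration and may differ from the equation used in the manuscript; the 15N/1H frequency ratio of about 0.1013 is approximate, and the example shift differences are invented.

```python
import math

H1_MHZ = 600.0             # 1H Larmor frequency of the spectrometer
N15_TO_H1_RATIO = 0.1013   # approximate 15N/1H frequency ratio (assumption)
N15_MHZ = H1_MHZ * N15_TO_H1_RATIO


def csp_hz(delta_h_ppm, delta_n_ppm, h_mhz=H1_MHZ, n_mhz=N15_MHZ):
    """Combined 1H/15N chemical shift perturbation in Hz.

    A shift difference of 1 ppm at a Larmor frequency of f MHz equals f Hz,
    so each dimension is converted separately before combining.
    """
    delta_h_hz = delta_h_ppm * h_mhz
    delta_n_hz = delta_n_ppm * n_mhz
    return math.hypot(delta_h_hz, delta_n_hz)


# Hypothetical per-residue shift differences (ppm), for illustration only.
example_shifts = {"res_A": (0.010, 0.00), "res_B": (0.005, 0.08)}
for residue, (d_h, d_n) in example_shifts.items():
    print(f"{residue}: CSP = {csp_hz(d_h, d_n):.2f} Hz")
```

      Working in Hz avoids choosing an empirical scaling factor between the 1H and 15N dimensions, at the cost of making the values specific to the field strength at which the spectra were recorded.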

      Peak intensities were also analyzed but did not reveal additional residue-specific trends. Together, the quantitative CSP data support our conclusion that the interaction is weak, dynamic, and region-specific, consistent with an IDR-driven, phase-separation-related mechanism. We have added the following statement to the Methods: CSPs were calculated in Hz at 600 MHz using the following equation:

      Minor comments:

      (1) Line 59-60: The Authors mention the HUSH-complex and then the MORC protein family, but do not discuss the relation between the two.

      We thank the reviewer for this comment. We have revised the Introduction to explicitly state that MORC2 may serve as a component of the HUSH complex and to clarify the functional relationship between MORC family proteins and HUSH-mediated transcriptional repression.

      (2) Line 74: 'Despite their structural similarities...', similarities between what all?

      We agree that this statement was ambiguous. We have revised the text to explicitly specify that the comparison refers to structural similarities among MORC family members.

      (3) Line 75: 'MORC-mediated repression remains...', this is the first time the word 'repression' is mentioned in the text and directly as an outstanding question.

      We have revised the Introduction to introduce the concept of transcriptional repression earlier and to provide appropriate context before posing it as an outstanding question.

      (4) The third paragraph does address issues in comments 1 and 3 to some extent, but the introduction needs some restructuring to provide a proper flow of information.

      We agree that the Introduction required restructuring. We have revised this section to improve logical flow, better integrate prior studies, and more clearly articulate the motivation and scope of the present work.

      (5) Line 83-85: How does the presence of IDRs suggest potential regulatory mechanisms?

      We have revised this sentence to clarify that IDRs may contribute to regulatory mechanisms by enabling multivalent and dynamic interactions, rather than implying that IDRs inherently confer regulatory function or phase separation capability.

      (6) Line 106-107: 'To determine whether MORC2 has N- and C-terminal dimerization interfaces similar to those...', reference 14 has already established that CC3 (denoted as CC4 in ref 14) is responsible for dimerization. Consider acknowledging their work in this regard?

      We thank the reviewer for this reminder. We have now explicitly acknowledged Ref. 14, which previously established the role of CC3 (denoted CC4 in that study) in MORC2 dimerization.

      (7) Lines 117-122: Are the authors comparing morphology from negative stain EM with AlphaFold predicted structure (Figure S1a and S1b)? If so, providing a zoomed-in inset from Figure S1a would be helpful.

      Yes, the comparison was intended to relate the negative-stain EM morphology to the AlphaFold-predicted architecture. We have added a zoomed-in inset in Fig. S1a to facilitate clearer comparison.

      (8) Line 152-153: '...even under varying physiological conditions', what are these varying conditions? Are the authors trying to point towards any of their specific results?

      We have revised this phrase to explicitly refer to variations in salt concentration and protein concentration tested in our in vitro assays.

      (9) Line 154-155: 'The dimeric assembly of CC3 is essential for maintaining the structural integrity of the protein', if it has been established, then please provide a reference.

      We thank the reviewer for this suggestion. For MORC family proteins, C-terminal coiled-coil–mediated dimerization is necessary for correct homodimer formation and functional stability (Xie et al., 2019, Cell Commun Signal. 17:160, Ref 14 in the revised manuscript).

      (10) Line 159-161: 'we noticed a long unstructured region at its C-terminus (Figure S1b), a characteristic often associated with proteins capable of phase separation25.', again authors are generalizing a statement which is, in most cases, context-dependent. For example, ref 25 mentions that unstructured regions or IDRs serve as a scaffold for multivalent interactions.

      We agree with the reviewer and have revised this sentence to avoid generalization. The revised text now emphasizes that IDRs may facilitate multivalent interactions in a context-dependent manner, rather than being intrinsically indicative of phase separation. Additionally, we have explicitly cited the mechanistic insight from Reference 25 that IDRs serve as scaffolds for multivalent interactions, to strengthen the logical link between the structural feature and its potential functional relevance.

      (11) Methods section for NMR (Line 665-667) mentions that nucleotides were added to a final concentration of 10 mM. There is no figure or section for MORC2 NMR with added nucleotides/DNA.

      We thank the reviewer for pointing this out. The nucleotide (ATP) addition was part of preliminary NMR trials and is not directly associated with the figures presented. We have deleted this in the Methods section to avoid confusion.

      (12) Line 285-294: Authors compare the effect of DNA binding on the phase separation of both MORC2FL and MORC2 CTDdeltaCW and conclude that DNA-induced condensation is primarily mediated through interactions with the IDR-NLS region. This appears not to be backed by proper control experiments. The authors do not show whether DNA binding mediates any phase separation for the isolated NTD or not? Similarly, what is the effect of DNA binding on MORC2 deltaIDR?

      We thank the reviewer for this insightful comment and agree that additional controls are essential for rigorously dissecting the contribution of DNA binding to MORC2 phase separation. Our interpretation that DNA-enhanced condensation is primarily mediated through the IDR–NLS region was based on comparative analyses of MORC2FL and MORC2 CTDΔCW, together with EMSA results demonstrating that DNA binding activity is conferred by the IDR–NLS–containing region. We acknowledge, however, that DNA binding alone is not sufficient to infer phase separation behavior.

      To address this point, we have performed additional analyses using the isolated NTD’ (residues 1–536) and MORC2 ΔIDR–NLS mutants (Fig. S6). The isolated NTD’ exhibited detectable DNA binding [4] but did not undergo DNA-induced condensation under conditions in which MORC2FL or MORC2 CTDΔCW (residues 537–1032) readily formed condensates, indicating that DNA binding by itself is insufficient to drive phase separation. In parallel, MORC2 ΔIDR–NLS mutants showed severely compromised solubility and stability in vitro, which limited their quantitative characterization in phase separation assays. Nevertheless, under the conditions tested, these mutants did not display DNA-enhanced condensation comparable to MORC2FL.

      Taken together, these observations support a model in which the IDR–NLS region plays a critical role in coupling DNA binding to condensation, while additional domains are required to sustain robust phase separation. We have revised the manuscript text to clarify the experimental scope and to avoid overinterpreting the contribution of DNA binding in the absence of fully reconstituted control systems.

      (13) How did the authors assign the backbone amide NMR chemical shifts for MORC2?

      Backbone assignments of MORC2 IBD (1004–1032) were obtained using SOFAST versions of standard triple-resonance experiments, including HNCACB and CBCACONH, recorded at 298 K. Residual assignment ambiguities were resolved using 15N-edited HMQC-NOESY-HMQC spectra.

      (14) Line 256: 'The partial compaction of IDRa...', what does the author mean here with 'partial compaction'? How did they measure compaction here?

      Regarding the term “partial compaction”, we apologize for the error: this phrase was mistakenly used in place of “key component”.

      (15) Line 312-315: Why is there even a MORC2 readout for MORC2 KO cells with only EGFP? Also, the authors suggest that IDR deletion may impair mRNA stability or transcription; however, the expression levels of MORC2 deltaIDR and MORC2 deltaCC3 do not appear drastically different in Figure 3a.

      We thank the reviewer for raising these points. The apparent MORC2 signal in MORC2 knockout cells transfected with EGFP alone is due to the presence of residual MORC2 mRNA. Although CRISPR–Cas9–mediated knockout introduces a frameshift that prevents MORC2 protein expression, the mRNA can still be detected by RNA-seq. This is because nonsense-mediated decay (NMD), which targets transcripts with premature stop codons for degradation, is not always 100% efficient. Therefore, some MORC2 transcripts remain and produce detectable RNA-seq reads, even though no functional protein is expressed.

      Regarding the apparent discrepancy in expression levels, Fig. 3a displays only EGFP-positive cells, within which the fluorescence intensity of MORC2ΔIDR and MORC2ΔCC3 appears comparable to that of WT MORC2. However, the overall fraction of EGFP-positive cells is markedly reduced for these mutants compared to WT. Thus, while expression levels among successfully transfected cells are similar, fewer cells express detectable levels of the ΔIDR or ΔCC3 constructs across the total population. We therefore interpret this reduction in EGFP-positive cell fraction as reflecting impaired expression efficiency of these mutants, potentially arising from altered transcriptional output, mRNA stability, or protein stability. We have revised the manuscript text to clarify this distinction and to avoid overinterpreting the underlying mechanism in the absence of direct measurements.

      Author response image 1.

      EGFP, EGFP–MORC2 (FL), EGFP–MORC2 (ΔCC3), and EGFP–MORC2 (ΔIDR) were re-expressed in MORC2-knockout HeLa cells. Confocal imaging revealed that full-length MORC2 formed condensates in the nucleus, whereas mutants lacking either the CC3 or IDR domain failed to exhibit such behavior. Notably, under identical experimental conditions, we observed a marked reduction in the transfection efficiency of the EGFP-MORC2 (ΔIDR) construct. In contrast to the other variants, EGFP signals for ΔIDR were detectable in only a small fraction of the total cell population, despite consistent DNA loading and protocol synchronization. This observation suggests that the IDR might be required not only for biomolecular condensation but also for maintaining the steady-state levels of the MORC2 mRNA/protein or overall cellular fitness.

      (16) Line 330: 'MORC2 deltaCC3 failed to repress any of the 18 downregulated targets...'. This does not appear to be entirely true as repression of some targets (LBH, TGFB2, GADD45A) are closer to MORC2 FL than the EGFP control.

      We thank the reviewer for pointing out this inconsistency and for highlighting the need for precise wording. We have updated the dataset and revised the text to describe the results more accurately. We now describe that the mutants impair MORC2FL-mediated transcriptional regulation, consistent with the overall trend observed across these target genes.

      (17) Line 347-350: Based on the percent of cells with condensates, the authors conclude that CMT2Z-linked E236G and SMA-linked T424R mutants promote MORC2 phase separation. Again, the effect of these mutations on MORC2 condensation in cells may be direct or indirect. This can be investigated by comparing the in vitro effect of these mutations on MORC2 phase separation.

      We thank the reviewer for raising this important point and fully agree that the effects of disease-associated MORC2 mutations on condensate formation in cells may arise from either direct alteration in intrinsic phase separation propensity or indirect influences mediated by the cellular environment.

      In our study, disease-associated MORC2 mutants were assessed for condensate formation in HEK293F cells. Attempts were made to characterize these mutants in vitro; however, the E236G mutant exhibited markedly reduced solubility and stability upon purification, which precluded reliable in vitro phase separation analysis. We therefore evaluated the impact of E236G in cells and found that this mutation significantly impaired the dynamics of nuclear MORC2 condensates. For the T424R mutant, we note that its intracellular condensates displayed FRAP recovery kinetics comparable to those of WT MORC2, suggesting broadly similar dynamic properties of the assemblies formed in cells, but not necessarily implying a direct enhancement of intrinsic phase separation.

      In light of these considerations, we have revised the text in Lines 347–350 to avoid attributing a direct causal role of these mutations in promoting MORC2 phase separation. Instead, we now describe the observed increase in the fraction of cells containing condensates as a descriptive cellular correlation. We further emphasize that systematic in vitro characterization of disease-associated MORC2 mutants will be required to distinguish direct from indirect effects and represents an important direction for future investigation.

      (18) The discussion section lacks referencing to individual figures in the results section as well as previous literature.

      We agree with the reviewer that the Discussion would benefit from clearer integration with both the Results figures and prior literature. In the revised manuscript, we have substantially restructured the Discussion to explicitly reference key figures when interpreting experimental findings and to more clearly distinguish conclusions drawn from specific datasets. In addition, we have expanded citations to previous studies where relevant, particularly in the context of MORC2 DNA binding, ATPase regulation, chromatin association, and disease-linked mutations. These revisions aim to better situate our findings within the existing literature and to guide readers more clearly between experimental observations and their interpretation.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Zhang et al. demonstrates that MORC2 undergoes liquid-liquid phase separation (LLPS) to form nuclear condensates critical for transcriptional repression. Using a combination of in vitro LLPS assays, cellular studies, NMR spectroscopy, and crystallography, the authors show that a dimeric scaffold formed by CC3 drives phase separation, while multivalent interactions between an intrinsically disordered region (IDR) and a newly defined IDR-binding domain (IBD) further promote condensate formation. Notably, LLPS enhances MORC2 ATPase activity in a DNA-dependent manner and contributes to transcriptional regulation, establishing a functional link between phase separation, DNA binding, and transcriptional control. Overall, the manuscript is well-organized and logically structured, offering mechanistic insights into MORC2 function, and most conclusions are supported by the presented data. Nevertheless, some of the claims are not sufficiently supported by the current data and would benefit from additional evidence to strengthen the conclusions.

      Thank you for your insightful review and constructive suggestions, which have been invaluable in refining our manuscript.

      The following suggestions may help strengthen the manuscript:

      Major comments:

      (1) The central model proposes that multivalent interactions between the IDR and IBD promote MORC2 LLPS. However, the characterization of these interactions is currently limited. It is recommended that the authors perform more systematic analyses to investigate the contribution of these interactions to LLPS, for example, by in vitro assays assessing how the IDR or IBD individually influence MORC2 phase separation.

      We appreciate the reviewer’s insightful comment regarding the characterization of IDR–IBD interactions. In this study, we combined NMR spectroscopy, domain deletion analysis (in vivo), and in vitro phase separation assays to demonstrate that interactions between the IDR and IBD contribute to MORC2 condensate formation. To systematically assess the individual contributions of the IDR and IBD to MORC2 phase separation, we performed in vitro reconstitution assays using purified domain constructs (Fig. S6). Neither the isolated IDR nor the IBD alone exhibited phase separation under buffer conditions approximating the physiological environment, indicating that each domain is individually insufficient to drive condensation. Upon the addition of 10% PEG8000, phase separation was selectively observed for the IDR but not for the IBD, suggesting that the IDR possesses an intrinsic propensity for phase separation that can be enhanced by molecular crowding. Importantly, when the IDR and IBD were mixed, phase separation was robustly induced, supporting a model in which cooperative inter-domain interactions between the IDR and IBD promote MORC2 condensation. In the absence of PEG, no phase separation was observed for the IDR–IBD mixture. These observations imply that IDR–IBD interactions cannot drive phase separation on their own, but require cooperation with CC3-mediated dimerization to achieve this process, which is the central point we wish to emphasize.

      (2) The authors mention that DNA binding can promote MORC2 LLPS. It is recommended that they generate a phase diagram to systematically assess how DNA influences phase separation.

      We agree that constructing a full phase diagram would provide a more systematic evaluation of the effect of DNA on MORC2 phase separation. In the current study, we assessed DNA-dependent condensation across multiple protein and DNA concentrations, which consistently showed that DNA enhances MORC2 phase separation. At low protein concentration (0.5 µM), phase separation requires sufficient DNA, whereas increasing either DNA or protein concentration promotes liquid droplet formation. At high DNA and protein concentrations, amorphous structures dominate, indicating a transition away from dynamic assemblies. We have clarified this point in the Results and Discussion sections and now note that a comprehensive phase diagram analysis represents an important direction for future work.

      (3) The authors use the N39A mutant as a negative control to study the effect of DNA binding on ATP hydrolysis. Given that N39A is defective in DNA binding, it could also be employed to directly test whether DNA binding influences MORC2 phase separation.

      We thank the reviewer for this constructive suggestion. The purified wild-type MORC2(1–603) exhibited weak but detectable ATPase activity, whereas the N39A mutant was completely inactive [5]. On this basis, the N39A mutant was used in this study as an ATP-binding-deficient negative control [3]. However, no evidence has been provided to demonstrate that the N39A mutant is defective in DNA binding. Importantly, both our results and previous studies [5-6] indicate that MORC2 engages DNA via multiple domains, suggesting that a single-point mutation is unlikely to significantly compromise its overall DNA-binding capacity.

      (4) Many of the cellular and in vitro LLPS experiments employ EGFP fusions. The authors should evaluate whether the EGFP tag influences MORC2 phase separation behavior.

      We appreciate the reviewer’s concern regarding the potential influence of the EGFP tag. The use of EGFP fusions in our study was primarily to maintain consistency with the in-cell experiments. Importantly, we confirmed that EGFP alone does not undergo phase separation in cells, and this observation is consistent with previous studies [7]. Additionally, in vitro phase separation of MORC2 was independently validated using Cy3-labeled CTD (Fig. S5), which recapitulated the condensate formation seen with EGFP-fused protein. Together, these results indicate that the EGFP tag does not significantly influence MORC2 phase separation, supporting the validity of our conclusions.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors claim to have obtained nucleic acid-free protein, but no data are provided to support this assertion. It is recommended that they include appropriate validation to confirm the absence of nucleic acids.

      We thank the reviewer for highlighting this point. To validate that the purified MORC2 protein is indeed free of nucleic acid contamination, we now provide additional experimental evidence (e.g., A260/280 measurements, agarose gel analysis, or EMSA in Fig. 5), which has been added to the Methods section and Table S2.

      Note: Agarose gel analysis of MORC2 constructs to confirm the absence of nucleic acids. The pET32 vector served as the positive control; 0.05 mg of each protein preparation was analysed. E denotes E. coli and H denotes HEK293F.

      (2) The FRAP recovery curves are not normalized to 0, making comparison difficult. The authors should normalize the post-bleach intensity to 0 and re-plot the curves to allow a more standard interpretation of mobile fractions.

      We agree with the reviewer and have now normalized the FRAP recovery curves by setting the post-bleach intensity to 0. The revised plots are presented in the Figures (2f, j, l; 6c, 7f), allowing for more direct comparison of mobile fractions across different conditions.
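      The normalization described above can be sketched as follows. This is only an illustrative example, assuming a simple two-point scheme in which the mean pre-bleach intensity is mapped to 1 and the first post-bleach frame to 0; the response does not state the exact procedure used (for instance, whether background subtraction or acquisition-bleaching correction was applied). The function names and the example trace are hypothetical.

```python
def normalize_frap(intensities, n_prebleach):
    """Rescale a FRAP trace so that pre-bleach = 1 and first post-bleach frame = 0.

    intensities: raw fluorescence values, pre-bleach frames first (hypothetical input).
    n_prebleach: number of frames acquired before the bleach pulse.
    """
    if n_prebleach < 1 or n_prebleach >= len(intensities):
        raise ValueError("need at least one pre-bleach and one post-bleach frame")
    pre = sum(intensities[:n_prebleach]) / n_prebleach  # mean pre-bleach level
    post0 = intensities[n_prebleach]                    # first post-bleach frame
    span = pre - post0
    if span <= 0:
        raise ValueError("post-bleach intensity must be below the pre-bleach level")
    return [(value - post0) / span for value in intensities]


def mobile_fraction(normalized, n_plateau=3):
    """Estimate the mobile fraction as the mean of the last n_plateau normalized points."""
    tail = normalized[-n_plateau:]
    return sum(tail) / len(tail)


# Made-up trace: two pre-bleach frames, then recovery to a plateau.
trace = [100.0, 100.0, 20.0, 40.0, 60.0, 60.0, 60.0]
norm = normalize_frap(trace, n_prebleach=2)
fraction = mobile_fraction(norm)
```

      With the post-bleach point fixed at 0, the plateau of the normalized curve can be read directly as an estimate of the mobile fraction, which is what makes curves from different conditions comparable.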

      (3) The HSQC spectra for IBD appear inconsistent: the peak positions in Fig. 4C do not align with those shown in panels D-F. The authors should verify the spectral assignments and ensure consistency across figures.

      We thank the reviewer for pointing this out. The apparent inconsistency arose from the fact that different spectral regions were displayed in Fig. 4c versus Fig. 4d-f for visualization purposes, which may have given the impression of mismatched peak positions. The spectral assignments themselves are consistent across all panels.

      To avoid confusion, we have now adjusted the spectral window shown in Fig. 4c to match that used in Fig. 4d-f. The revised figure ensures consistent presentation of the same spectral region across all panels.

      References:

      (1) Zhang, Y., Stöppelkamp, I., Fernandez-Pernas, P. et al. Probing condensate microenvironments with a micropeptide killswitch. Nature 643, 1107–1116 (2025).

      (2) Fendler NL, Ly J, Welp L, et al. Identification and characterization of a human MORC2 DNA binding region that is required for gene silencing. Nucleic Acids Res. 53(4):gkae1273 (2025).

      (3) Tchasovnikarova, I., Timms, R., Douse, C. et al. Hyperactivation of HUSH complex function by Charcot–Marie–Tooth disease mutation in MORC2. Nat Genet 49, 1035–1044 (2017).

      (4) Douse, C. H. et al. Neuropathic MORC2 mutations perturb GHKL ATPase dimerization dynamics and epigenetic silencing by multiple structural mechanisms. Nat Commun 9, 651 (2018).

      (5) Tan, W., Park, J., Venugopal, H. et al. MORC2 is a phosphorylation-dependent DNA compaction machine. Nat Commun 16, 5606 (2025).

      (6) Sánchez-Solana B, Li DQ, Kumar R. Cytosolic functions of MORC2 in lipogenesis and adipogenesis. Biochim Biophys Acta. 1843(2):316-326 (2014).

      (7) Li, C.H., Coffey, E.L., Dall’Agnese, A. et al. MeCP2 links heterochromatin condensates and neurodevelopmental disease. Nature 586, 440–444 (2020).

    1. Reviewer #4 (Public review):

      Summary:

      In this manuscript, the authors present data describing the development of a model of ALS in rhesus macaques. They use a viral intersectional model to overexpress TDP-43 in a population of motor neurons and then study the spread of the pathology about 7 months later. They demonstrate that both the cervical spinal cord and motor cortex (new and old M1) are full of TDP-43, suggesting that the pathology spreads from the single motor pool to presumably related neurons.

      Strengths:

      This is a super-important study in two main ways:

      (1) This could be the birth of a really important model, one that is really needed for making progress in understanding ALS and the development of therapeutics. There are shortfalls with all the rodent models. Models dependent on cell cultures are superb for understanding cell-autonomous processes, but miss out on connectivity, particularly the long-range connectivity. Organoids may ultimately prove to be beneficial, but they would need cortex, spinal cord, and muscle, and translatability from them is not assured. So an NHP model is needed, and this may be it. Furthermore, the Methods are meticulously described and will undoubtedly facilitate reproducibility.

      (2) The concept of the spread of pathology has been proposed for some time, I think, based initially on the detailed clinical observations of Ravits and colleagues. The authors have looked at this directly and provide supporting evidence for this interesting hypothesis. They show spread locally and contralaterally in the spinal cord (although a figure would be nice) and to the motor cortex.

      Taking only these 2 points into account is more than sufficient for me to be enthusiastic about this work.

      Weaknesses:

      I'd like to make a couple of points that if addressed, could, in my view, help the authors strengthen this work.

      (1) We don't know how many MNs were transduced by the rAAV. There was no tdTom expression, for whatever reason. The authors show an image of a control experiment with a single MN transduced, but there should be a red motor pool, at least in the control experiments. The impression that I get is that very few were transduced, and, in my mind, this makes the findings even more interesting - maybe you don't need many "starter" MNs.

      (2) Continuing on this point, this leads the authors to conclude that all BR MNs have died. They support this by the reduced MN count (see point 3). Firstly, do we know how many BR MNs there are in the rhesus macaque, and does the reduction seen correspond to this number? Secondly, and more importantly, the muscle looks normal on MRI at 28 weeks - it does not look like a denervated muscle. The authors state that it has maybe been reinnervated, but by what, if all the BR MNs are dead? This does not seem like a plausible explanation to me. Muscle histology, NMJs, and fibre typing would have been useful to understand what's going on with the MNs. (And electrophysiology would have been wonderful, but beyond the scope of this study.)

      (3) Some MN biologists, like me, fuss a lot about how to count MNs, which is almost as difficult as counting the number of angels on the head of a pin. Every method has its problems. Focusing on the two methods here: (a) ChAT immunohistochemistry is pretty good in healthy states, but we don't know what happens to ChAT expression in different diseases, particularly when you have a new model. If its expression is decreased, then it is not a good marker for MNs; (b) Identifying MNs based on the size and morphology of neurons in the ventral horn is also insufficient. For example, ~30% of neurons in a typical pool are small gamma MNs, and a significant proportion (depending on the muscle) of the remainder will be small alpha MNs. So what one is counting is, at best, the large alpha MNs, not all the MNs in a pool. And in ALS, it's these largest MNs that are affected at the earliest stages. The small ones might be fine. So results will be skewed. (Hence, it would be interesting to see if the muscle had a higher proportion of Type I fibres after being reinnervated by S-type MNs.)

      (4) Statistics. These are complex experiments looking at the spread of a disease. The experimental unit is therefore the monkey, n=2. In each monkey, multiple sections are analysed, which are key technical replicates and often summative. For example, do we care about the average cell number in Figures 4D, E, 5 I, J or 6G, H, or rather the total cell number? Do the error bars mean anything? To be clear, I am by no means minimising the importance of the overall convincing findings. But I do not think this statistical analysis is particularly meaningful.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The authors have used a macaque (two animals only) to follow the migration of 'seeded' TDP-43 protein in neuronal pathways - thus mimicking the spread of ALS in the human CNS. Previous experiments in rodents failed to demonstrate this, posing interesting and important biological differences, possibly related to the UMN-LMN system in higher order apes and humans.

      Strengths: 

      An important step forward. 

      Weaknesses: 

      No weaknesses were identified by this reviewer. Only 2 animals were used, but that is appropriate given the sensate status of the macaque. In the opinion of this reviewer, the results are entirely convincing. 

      Reviewer #2 (Public review): 

      Summary: 

      There are astonishingly few papers trying to reproduce the process of initiation and spreading that Braak's studies have suggested and postulated. The authors should be applauded for pioneering such a difficult experiment. They overexpressed the TDP-43 protein in the motor neuron pool of the brachioradialis muscle and showed that by this technique, motor neurons in this pool died, and the muscle got denervated. They had evidence of a spreading process from the spinal cord to the cortex, demonstrated by showing widespread deposits of phosphorylated TDP-43 bilaterally in the cervical cord and the motor cortex. By their experiment, they created a dying-backwards model, not a model of corticofugal spread, like that shown by Braak. No muscle weakness was observed, not even in the brachioradialis.

      Strengths: 

      The strength of this innovative study is the fact that this spreading experiment uses the phylogenetically young connectome of primates (macaques). They also made the thought-provoking observation of spreading from the cord to the motor cortex, not the corticofugal spread model observed by Heiko Braak. This is thought-provoking because this enables the observer to compare their model with the findings in humans. 

      Weaknesses: 

      The following aspects are not a weakness but need to be better explained for the interested reader - and potentially improved in future studies for which the authors laid the foundation: 

      (1) Why do the authors use the brachioradialis motor neuron pool to overexpress TDP-43? More is known about other muscles and how they are embedded in the motor connectome of primates. Why not the biceps brachii or the hand extensors or - even better - the small muscles of the hand? These are known to be strongly monosynaptically connected with the motor cortex. The authors should explain this. I am unclear if there was a specific reason which I did not see or understand. In my view, the brachioradialis is not the best representative of the primate connectome, for example, to examine this model and compare it with the corticofugal spread. 

      The brachioradialis muscle was chosen primarily for reasons of animal welfare; our concern when designing the experiments was that the muscle we chose for injection might become very wasted and weak before the experiment had been completed. If we had injected a hand muscle, this would have affected manipulation, feeding and grooming behaviours, whereas had we injected biceps brachii or forearm extensors, this would have affected more important behaviours requiring strength for body support in the home cage (e.g. climbing, swinging, etc.). The advantage of choosing brachioradialis is that there is some functional redundancy; in macaques, compared to biceps brachii, brachioradialis has a relatively minor role in elbow flexion and supination of the forearm. We therefore reasoned that there should be physiological compensation for any weakness in brachioradialis, and thus minimal effects on normal behaviour.

      A secondary practical consideration was the importance of good quality MR imaging of the injected muscle and the positioning of the focussing coil; because of the physical constraints related to the monkey sitting in our narrow-bore scanner, the forearm muscles were the optimal choice. 

      With reference to the ‘primate connectome’, whilst hand muscles are known to have strong cortico-motoneuronal connections, we have shown previously that monosynaptic cortico-motoneuronal connections are as strong in muscles innervated by the deep radial nerve (like brachioradialis) as in intrinsic hand muscles (Witham et al, 2016).

      Finally, for the purposes of these experiments, all we required was a method for inoculating TDP-43 into a motor neuron pool within the spinal cord, without direct surgical trauma to the spinal cord. Our aim was to test the hypothesis that extracellular TDP-43 is sufficient to cause spreading neuronal changes in macaque, similar to those observed in human ALS/MND; our aim was not to replicate the actual pattern of human MND observed clinically.

      These points will be addressed in a revised version of the manuscript. 

      (2) In Braak's experiment, only (seemingly soluble) non-phosphorylated TDP-43 "crossed" synapses. Phosphorylated TDP-43 did not do this. The authors of this study saw phosphorylated TDP-43 in motor neurons and the cortex. Is there any potential explanation for how it crosses synapses? If it really does, there is an obvious difference to the human situation which needs to be emphasized and explained (in the future).

      To clarify, there was no evidence of phosphorylated TDP-43 crossing synapses. It is more likely that excess non-phosphorylated TDP-43 crossed synapses, and that this then subsequently led to TDP-43 phosphorylation.  

      (3) There were significant deposits of phosphorylated TDP-43 in oligodendrocytes in humans. Whilst I understand that one experiment cannot solve every question - I am curious about whether the authors saw anything in oligodendrocytes? 

      We have not looked at this.

      (4) Which was the pattern of damage? Of course, this pattern is not likely to have a monosynaptic pattern - like in humans... but was there a pattern? Did it have a physiologically meaningful basis? Was there any relation to the corticofugal monosynaptic pattern? What are the differences? The authors speak of "multiple waves". Does this mean that if this were a corticofugal model, for example, oculomotor neurons would also degenerate?

      The description of ‘multiple waves’ in paragraph 2 of the discussion section is entirely hypothetical, based on the assumption that there are different mechanisms by which TDP-43 spreads through the nervous system, from slow local spread by diffusion to more rapid long-range axonal spread to widely separated regions. For the neuropathological staging analysis, we therefore looked at different brain regions (hypoglossal nuclei, reticular formation, inferior olives, frontal cortex, temporal cortex and hippocampal formation). This analysis only showed loss of motor neurons in the spinal cord ipsilateral to the side of the muscle injections, in segments consistent with the location of brachioradialis motoneurons. We did not demonstrate a ‘pattern of damage’ as described in humans in our experiments because this is a pre-symptomatic pre-clinical model, with no established ‘damage’ from each wave. We speculate that this is because animals were terminated too early in the disease process.

      However, whilst there was no established neuronal degeneration outside the cervical spinal cord, the observation that there were more pTDP-43 positive Betz cells in left (contralateral to the brachioradialis injection) New M1 than Old M1 (see Figure 6I and J) would support spread via monosynaptic connections to motoneurons; New M1 is where most monosynaptic cortico-motoneuronal connections originate.

      Reviewer #3 (Public review): 

      Summary: 

      In this paper by Jones and colleagues, a non-human primate model is described in which wild-type TDP-43 is expressed in the cervical spinal cord. This gave rise to loss of motor neurons in the ventral horn at that level in the cervical spinal cord. MRI of the muscles revealed increased intensity in the mostly affected brachioradialis muscle, suggesting this muscle becomes denervated. At the neuropathological level, TDP-43 and pTDP-43 staining in the cytoplasm is increased, not only at the specific level of the cervical spinal cord, but also at a distance.

      Strengths: 

      A clear strength is the state-of-the-art focal expression of the TDP-43 transgene at a focal site in the cervical spinal cord. This is achieved by combining a general expression of a flipped loxP flanked TDP-43 vector using AAV9 intrathecal administration, followed by an intramuscular AAV2 hSyn CRE-TdTomato vector in the brachioradialis muscle in order to induce focal recombination and expression of TDP-43 in motor neurons innervating this muscle on one side.

      Another strength is the non-human primate background, which is much closer to the human situation. 

      Weaknesses: 

      Given the complexity and cost of the model, the n is very low. 

      As is common in most studies in non-human primates, we have carried out all statistical analysis within one animal (e.g. the comparison of motoneuron numbers between left and right cord). We then show that results are reproducible in two animals. Although the number of animals is lower than in a typical rodent study, we see this as an advantage of the model, adhering to the 3Rs principle of ‘reduction’.

      The design of the experiments and the results shown about the toxicity induced by this focal TDP-43 expression do not allow us to conclude that it is a good model for ALS for several reasons. It is not clear that the TDP-43 overexpression results in spreading weakness or in spreading motor neuron loss. The neuropathological changes described suggest that there is a kind of stress response, which extends to regions away from the site of primary damage, but more is needed to provide convincing evidence that there is spreading of disease pathology reminiscent of human ALS. 

      As already noted in our response to Reviewer 2 (point 1), animal welfare is an important consideration when designing these complex experiments in primates. We could not therefore justify allowing the animals to survive until extensive wasting and weakness were evident, recapitulating the human disease. 

      The model developed in these experiments is therefore a pre-symptomatic pre-clinical model, in which animals are terminated before pathology leading to widespread motor neuron loss is evident. At post mortem we do have evidence of motor neuron loss in the segments supplying brachioradialis (C4-C8).

      Stress of various forms, including blunt trauma (e.g. Anderson et al, 2021), stab/electrode insertion injury (e.g. Zambusi et al, 2022), chemical (e.g. arsenite) exposure (e.g. Huang et al, 2024), or hypoxia (Marcus et al, 2021) can result in pathological nucleocytoplasmic translocation of TDP-43. In our model, there was no direct trauma to the brain or spinal cord ante mortem, excluding one major cause of tissue stress. Hypoxia during the process of euthanasia is possible, but we would expect there would not be enough time before death for this to manifest as TDP-43 translocation. In the literature TDP-43 translocation due to stress is diffuse; we have demonstrated that in our model the TDP-43 pathology is not diffuse but selective. For example, there was no evidence of disease in the oculomotor nuclei; in the primary motor cortex (M1) there are significantly more pathological changes in the evolutionarily younger ‘NewM1’ compared to the neighbouring ‘OldM1’.

      It is therefore improbable that our findings could be explained by ‘a kind of stress response’. Our findings are better explained by spread of the TDP-43 protein.

      Reviewer #4 (Public review): 

      Summary: 

      In this manuscript, the authors present data describing the development of a model of ALS in rhesus macaques. They use a viral intersectional model to overexpress TDP-43 in a population of motor neurons and then study the spread of the pathology about 7 months later. They demonstrate that both the cervical spinal cord and motor cortex (new and old M1) are full of TDP-43, suggesting that the pathology spreads from the single motor pool to presumably related neurons. 

      Strengths: 

      This is a super-important study in two main ways: 

      (1) This could be the birth of a really important model, one that is really needed for making progress in understanding ALS and the development of therapeutics. There are shortfalls with all the rodent models. Models dependent on cell cultures are superb for understanding cell-autonomous processes, but miss out on connectivity, particularly the long-range connectivity. Organoids may ultimately prove to be beneficial, but they would need cortex, spinal cord, and muscle, and translatability from them is not assured. So an NHP model is needed, and this may be it.

      Furthermore, the Methods are meticulously described and will undoubtedly facilitate reproducibility. 

      (2) The concept of the spread of pathology has been proposed for some time, I think, based initially on the detailed clinical observations of Ravits and colleagues. The authors have looked at this directly and provide supporting evidence for this interesting hypothesis. They show spread locally and contralaterally in the spinal cord (although a figure would be nice) and to the motor cortex. 

      Taking only these 2 points into account is more than sufficient for me to be enthusiastic about this work. 

      Weaknesses: 

      I'd like to make a couple of points that if addressed, could, in my view, help the authors strengthen this work. 

      (1) We don't know how many MNs were transduced by the rAAV. There was no tdTom expression, for whatever reason. The authors show an image of a control experiment with a single MN transduced, but there should be a red motor pool, at least in the control experiments. The impression that I get is that very few were transduced, and, in my mind, this makes the findings even more interesting - maybe you don't need many "starter" MNs. 

      Unfortunately, we cannot know how many motoneurons were transduced.

      However, the reviewer may be correct, that it is actually only a small fraction of the brachioradialis pool. This is supported by the evidence for rather focal denervation seen on MRI.

      (2) Continuing on this point, this leads the authors to conclude that all BR MNs have died. They support this by the reduced MN count (see point 3). Firstly, do we know how many BR MNs there are in the rhesus macaque, and does the reduction seen correspond to this number? Secondly, and more importantly, the muscle looks normal on MRI at 28 weeks - it does not look like a denervated muscle. The authors state that it has maybe been reinnervated, but by what, if all the BR MNs are dead? This does not seem like a plausible explanation to me. Muscle histology, NMJs, and fibre typing would have been useful to understand what's going on with the MNs. (And electrophysiology would have been wonderful, but beyond the scope of this study.) 

      To clarify, we did not conclude that all brachioradialis motor neurons had died, but rather that all transduced motor neurons in the brachioradialis pool had died. As noted above, when these cells die and the muscle is denervated, the MRI signal changes occupy only a small volume of the muscle and are transient. We would not expect to see long-term MRI changes in muscle anatomy after this limited denervation-reinnervation event.

      Analysis of muscle histology, including fibre typing, is outwith the scope of this initial paper reporting the model; we hope that this will form the basis of a future publication.

      (3) Some MN biologists, like me, fuss a lot about how to count MNs, which is almost as difficult as counting the number of angels on the head of a pin. Every method has its problems. Focusing on the two methods here: (a) ChAT immunohistochemistry is pretty good in healthy states, but we don't know what happens to ChAT expression in different diseases, particularly when you have a new model. If its expression is decreased, then it is not a good marker for MNs; (b) Identifying MNs based on the size and morphology of neurons in the ventral horn is also insufficient. For example, ~30% of neurons in a typical pool are small gamma MNs, and a significant proportion (depending on the muscle) of the remainder will be small alpha MNs. So what one is counting is, at best, the large alpha MNs, not all the MNs in a pool. And in ALS, it's these largest MNs that are affected at the earliest stages. The small ones might be fine. So results will be skewed. (Hence, it would be interesting to see if the muscle had a higher proportion of Type I fibres after being reinnervated by S-type MNs.) 

      This is an interesting point, and we agree that each method used to quantify MN number carries its own limitations. The problem of MN identification is heightened in an MND-like pathological state, especially when considering evidence of reduced ChAT activity in spinal motoneurons in end-stage disease in post mortem human samples (Oda et al, 1995), and more recent evidence from Casas et al. (2013), who demonstrated early presymptomatic reduction in ChAT expression in SOD1G93A mice. It is important to note that this was a modest reduction, not complete abolition of signal (76% of control levels). ChAT immunoreactivity was still present and motor neurons were still identifiable as ChAT-positive at this pre-clinical stage of disease. As counts in our study were performed based on detecting ChAT in cells, it seems unlikely that we would miss cells. However, we cannot rule this out. If indeed this did occur, it would mean that the reduced motoneuron counts which we observed reflect not only cell death, but also profound motoneuron dysfunction which is presumably the proximal precursor to cell death.

      We acknowledge that size-based criteria applied to ChAT-positive neurons will preferentially capture large alpha motor neurons, and that gamma motor neurons and small alpha motor neurons are likely underrepresented in our counts. Our counts therefore reflect the large alpha motor neuron population rather than the total motor neuron pool. We believe that this is not a critical limitation in the context of the present study. Large alpha motor neurons are the population of primary pathological interest in ALS and related MND, being the earliest and most severely affected subtype. The selective vulnerability of fast-fatigable large alpha motor neurons in ALS is well established, and their preferential loss is the defining feature of disease progression in both human post mortem tissue and rodent models (Lalancette-Hébert et al., 2016). In this respect, our size threshold selects for precisely the population whose degeneration is most relevant to the disease phenotype we are modelling. 

      We intend to include comments on these important points in the revised version of the manuscript.

      In response to the final point regarding muscle histology and proportions of Type I fibres, as stated above, reporting of muscle histology, including fibre typing, is planned for a separate publication.

      (4) Statistics. These are complex experiments looking at the spread of a disease. The experimental unit is therefore the monkey, n=2. In each monkey, multiple sections are analysed, which are key technical replicates and often summative. For example, do we care about the average cell number in Figures 4D, E, 5 I, J or 6G, H, or rather the total cell number? Do the error bars mean anything? To be clear, I am by no means minimising the importance of the overall convincing findings. But I do not think this statistical analysis is particularly meaningful. 

      Here, the experimental unit is the tissue slice, mounted on a slide for histological analysis, and not the monkey. All statistical comparisons are made within a single animal. We then show that the findings can be replicated in two animals, both of which show significant results. This is a standard approach taken in primate neuroscience, given the need to reduce animal numbers to the minimum consistent with producing convincing results.

    1. On 2026-04-09 21:38:21, user Alizée Malnoë wrote:

      The manuscript by Fridman et al. explores the unexpected finding that Aeromonas jandaei antagonistically employs a Type VI secretion system (T6SS) in a liquid environment. While researching the effector protein Awe1, which forms part of the T6SS apparatus, the authors observed T6SS-dependent intoxication of susceptible bacteria. Using a novel fluorescence-based screening method (named LiQuoR for liquid quantification of rivalry), the authors further determine that this intoxication is contact-dependent, and that contact between kin and non-kin Aeromonas bacteria in liquid is mediated by specific adhesins. Fridman et al. also identify additional marine bacteria capable of inflicting T6SS-mediated intoxication in liquid media, suggesting a mechanism for specific and contact-dependent bacterial competition and positing that such competition in liquid media may be more common in marine bacteria than previously documented. These findings have exciting implications for bacterial antagonism, potentially shifting the paradigm of how we view bacterial interactions in marine environments. We found this study to be well-written, containing high-quality data. Overall, the data presented in this manuscript are done well and support the claims made by the authors. We outline some major and minor adjustments aimed at aiding the clarity of reporting and presentation, strengthening the findings, as well as providing additional context for a broader audience.

      Major Comments<br /> - We are interested in the broader implications of the LiQuoR assay, particularly pertaining to this workflow’s application to different bacteria. The observation that the amount of prey luminescence in WT on solid media grew/increased after 4 h seemed counterintuitive to us (Figure 1E). It seems as if this result could make the workflow less sensitive for experiments done solely on solid media; further explanation of this finding would clarify the workflow’s applicability to other solid surface experiments. Is this related to surface area? While this does not change the findings that inhibition is occurring in both liquid and in solid, it would enhance the clarity of these results to provide speculation on why this was seen.<br /> - We are curious about your perspective on the observation that kin-kin aggregation facilitated by CaCl2 supplementation does not increase kin intoxication but does increase non-kin intoxication (Figure 2A). Please speculate on this result in the discussion. Is the concentration used physiological? <br /> - While the images shown in Figure 2B make it clear that aggregates are forming in liquid media, we have a suggestion to improve the strength of these results and account for the images not shown. For instance, quantification of the % of prey cells displaying Sytox staining would more strongly demonstrate the presence of permeabilized E. coli in multiple aggregates. This quantification could substitute for Figure 2C (which can be moved into the supplemental): it was not totally clear to us why an orthogonal view was included here. If this is significant for the findings, it would increase clarity to include an explanation for an audience less familiar with this system.<br /> -Lines 192-214: From a genomics perspective, we think further explaining how potential adhesins were identified would be helpful to increase the clarity and reproducibility of the experimental design. 
Please explain how you narrowed down these adhesins and located them in the genome, and why adhesins were targeted for this analysis over other proteins that could facilitate a physical interaction between predator and prey species. Define the acronyms and provide rationale for naming. <br /> -Figure 6B nicely demonstrates that intoxication takes place in liquid between certain marine bacteria but not in V. para. However, please include a control showing that V. para does intoxicate prey in solid media to strengthen these findings and confirm that this strain of V. para is capable of intoxicating prey under typical conditions.<br /> -Given the significance of the TssB deletion for the core message of this work that type VI intoxication occurs in liquid media, please consider including data that confirm the TssB deletion (e.g., Sanger sequencing) in the supplemental material or as source data. A complementation assay of TssB to show that regaining TssB restores the awe1 toxicity would be valuable.<br /> - Lines 224-225/Figure 5: We are curious and excited about the implications of the balance between kin-aggregation and non-kin aggregation and how this may aid our understanding of bacterial interactions in marine environments. Based on our understanding of these results, the observation that deletion of CraAj (responsible for kin-kin aggregation) increased non-kin intoxication (mediated by LapAj) could suggest that aggregation between two kin cells, which both contain the needed immunity proteins, could dampen the intoxication of nearby non-kin cells. This result is implied by the data but not specifically speculated on or addressed. Though it may not be within the scope of this experimental design, our group was intrigued by these findings. Given your expertise in this area, consider discussing how these bacterial interactions may play out and/or including these observations as part of Figure 5.

      Minor Comments<br /> -All figures: In the legends, it is stated “these experiments were repeated three times with similar results”. Please define what is meant by an experiment (e.g., technical or biological replicate).<br /> -All figures: We felt that having the exact p-values indicating statistical significance is not necessary. For instance, in Figure 3B and 3D, we found it distracting that all of the values were reported as significant at <1E-4, even when they appear different from each other. If this is simply a cutoff value, it would be helpful to keep that consistent between figures. Also, Figure 6A/B: The p-values presented, specifically the comparison between WT and T6SS – supplemented with 1 mM CaCl2 (6A) and the two left hand panels of 6B, do not appear to match the differences shown between the experimental groups. By eye, these groups do not appear different from one another but are shown to be either highly statistically significant or not statistically significant at all.<br /> - Figure 1A: To increase readability, we suggest that the colors could be more intuitive here: put WT in grey and then mix colors for double mutants. Bringing the light pink line (Δawe1 ΔtssB + pAwe1) to the front of the graph would further increase clarity.<br /> -Figure 1B/F: Making the color scheme consistent between 1B and 1F would increase clarity.<br /> -LiQuoR assay: As there is often some level of variation in expression levels when working with a transformed population, confirmation that all prey strains luminesce to a similar level would provide further validation of this novel assay (similarly to what is done in Fig. S3B). <br /> -Figure 2A: The colored box legends showing whether CaCl2 is present or absent are inverted relative to one another, which we found to be confusing. 
To increase readability, please place them on the same side.<br /> -Figure 3B,C,D,E: To help guide the eye on the graphs, we suggest adding dashed lines between each new mutation group (+/- TssB).<br /> - Figure S1: Please include a loading control to verify assay input. <br /> - Table S1: Clarify the gene and strain for each mutation.<br /> - Lines 112-113: It serves as an excellent control that the action of the T6SS apparatus is required for intoxication; however, since the T6SS apparatus is contained within the bacterium, would spent media contain free-floating T6SS proteins, or are these proteins only ejected from the bacterium in the presence of prey species? Please clarify. Direct evidence, such as immunoblotting, that effectors are present in the spent media from WT would make this claim more compelling.<br /> - Line 35: While this part of the introduction provides excellent background regarding the role of T6SS in interactions with eukaryotic cells, it would be helpful to also specifically mention the role of T6SS in prokaryotic communities, as much of the later work focuses on competition between bacteria.<br /> -Lines 70-71: A more thorough background on Aeromonas (lifestyle, importance, etc.) is warranted.<br /> -Line 84: Please provide the exact genotype when first introducing this mutant; it would improve clarity for the reader to explicitly state that this is a double mutant.<br /> -Line 97: Clarify here that “Aj prey” in this paragraph refers to Aj which do not possess the cognate immunity protein, as the current phrasing could be interpreted to mean “prey of Aj”.<br /> -Line 138: “Desired conditions for competition” is vague. Is solid media also incubated with shaking or is it static?<br /> -Lines 156-157: The statement that all three effectors are injected into prey cells is broad and not necessarily supported within these findings. 
The injection of one effector could be favored, but other effectors could compensate in its absence.<br /> -Line 189: Describes Aj as stably binding to other competing bacteria. To this point, imaged aggregates have been fixed, so the stability of aggregates may not be known.<br /> -Line 248: Here, it is mentioned that there was a switch from using the Lux operon to using the RFP mCherry for improved cell detection. It might be helpful to clarify which fluorescent tag was used for each assay, as multiple different fluorescent tags are used.<br /> -Line 317: As the choice to test CaCl2 and the biological relevance of calcium for Aeromonas hosts is explained earlier in the manuscript, it would be interesting to include a brief explanation about the choice to include sodium chloride when assessing Vibrio intoxication rates. Presumably, sodium chloride was picked because Vibrio is commonly found in brackish water, but someone from outside the field may not be familiar with this biology. Additionally, since Aeromonas can be found in both fresh and brackish water, an interesting follow-up experiment would be to test the Aeromonas strains under different salinities.<br /> -Lines 375-377: Needs citation.<br /> -Line 385: Clarify “under specific conditions not addressed within the scope of this study”.

      Carter Collins and Lily Pumphrey (Indiana University Bloomington) - not prompted by a journal; this review was written within a Peer Review in Life Sciences graduate course led by Alizée Malnoë with input from group discussion including Camy Guenther, Josy Joseph and Tahreem Zaheer. We are part of the Dept. of Biology where Julia Van Kessel’s group is located, Julia is a collaborator of the corresponding author and did not influence the choice of this preprint for our class.

    1. On 2025-12-19 20:20:10, user Michael Ailion wrote:

      This manuscript documents careful genetic analysis to better understand where and how Rho signaling acts in the C. elegans egg laying circuit. The authors demonstrate that Rho functions in mature neurons to promote egg laying, as well as in vulval muscle. By using calcium imaging, the authors were able to demonstrate how Rho signaling (specifically in the HSN neurons) regulates cell excitability presynaptically (HSN) and postsynaptically (vulval muscles). We found the experiments to be well designed, the data to be robust, and the major conclusions to be supported by the data.

      Minor comments:

      1) The introduction included a detailed analysis of the Gq signaling pathway and the candidate targets that regulate neuronal activity (i.e. DAG-regulated effectors and ion channels), but the scope of the paper does not include testing or identifying the targets downstream of TrioRhoGEF/Rho. On the other hand, the focus of this manuscript is neurotransmission in the egg laying circuit, and little detail is provided about how and what neurotransmitters are released by HSN. Only in the results section is NLP-3 mentioned, but it is known that both serotonin and NLP-3 released from HSN each contribute significantly to egg laying. <br /> 2) The authors conclude that Rho promotes synaptic transmission, and this is on the whole correct, but the authors could be more careful/precise with their wording and interpretations. As noted in comment 1, both serotonin and NLP-3 contribute to synaptic transmission in the egg laying circuit, but it is not known how directly these two components act in synaptic transmission. For example, NLP-3 is a neuropeptide that is released from dense core vesicles (DCVs), and it is possible that serotonin is also incorporated into DCVs as well as synaptic vesicles. In addition, serotonin and NLP-3 are known to act extrasynaptically as well as synaptically, and it is possible that Rho contributes to extrasynaptic release of serotonin and NLP-3. <br /> 3) When analyzing their data, the authors bin calcium imaging measurements in the active vs inactive state. The active and inactive egg laying states are characteristic for wildtype worms, but as the authors show, altering the activity of the HSN affects egg laying. Another interpretation of their data is that when Rho is activated (HSN::Rho-1(G14V)) the worm is always in the active egg laying state, and when Rho is inhibited (HSN C3 Transferase) the worm never enters the active egg laying state. 
While we don’t think they need to change how they analyze the data, the authors could just add this interpretation to the discussion. <br /> 4) We feel that the authors should include a more detailed discussion of why they see a difference in the effect of expressing dominant negative Rho (T19N) vs the C3 transferase in HSN. Why did Rho-1(T19N) expressed in HSN not show such a clear inhibition of calcium activity and egg laying as the C3 transferase expressed in HSN?<br /> 5) In general, gain-of-function experiments are hard to interpret. Activated Rho could increase cell excitability, but that does not necessarily mean that is the function of Rho normally. The loss-of-function experiments are more convincing, aside from the discrepancy we noted in comment 4. This could be noted in the discussion. <br /> 6) Lines 148 & 179: provide more detail or a reference for how extrachromosomal arrays were integrated.<br /> 7) Lines 195 & 214: it is unclear how GCaMP arrays were confirmed by mCherry fluorescence (nlp-3p::mCherry) given that these strains also have arrays carrying tph-1p::mCherry and both nlp-3p::mCherry and tph-1p::mCherry should express in the HSNs.<br /> 8) Line 339: the authors conclude that Rho acts “downstream of Trio RhoGEF.” However, the data show that a Trio mutant is only partially bypassed by expression of activated Rho – i.e. # of eggs is intermediate between the Trio mutant alone and activated Rho alone. These data are consistent with Rho acting downstream of Trio, but with RhoGEF activity still contributing to full activation of the “activated” Rho(G14V). The data would also be consistent with Trio and Rho acting at least partially in parallel, which could occur within the same cell or in different cells. A further complication to the interpretation of these data is that different activated Rho arrays are used in the WT and Trio mutant backgrounds. 
These different arrays could have different expression levels, which is a big caveat to making these comparisons. Ideally, one would use the same array in the WT and Trio mutant backgrounds.<br /> 9) p. 16, lines 348-459: many of the Fig 2 callouts on this page refer to the wrong panel.<br /> 10) Line 347: says 70%, but the data in the figure show >80%.<br /> 11) Line 348: says 3 +/- 1 eggs, but Fig 2B says 3 +/- 0.2 eggs for same strain.<br /> 12) Line 363: we were confused by this. Are the authors suggesting that you can’t quantitatively compare the effects of the HSN vs. muscle specific expression of activated Rho(G14V) because the arrays are mosaic? While it is true that the arrays may be mosaic, they also carry an mCherry marker expressed in the same cells, so they should know whether the array is expressing activated Rho as intended in the worms assayed, and it is unclear why mosaicism is an issue. A bigger issue to quantitatively comparing these strains is that they probably have different expression levels of activated Rho.<br /> 13) Line 396: “outside of egg-laying active states (Figure 3A).” However, the data in Fig 3A show HSN activity “during an egg-laying active state” according to the figure legend. Data showing activity outside egg-laying active states are not shown, but should be presented.<br /> 14) Line 423: it is unclear how “instantaneous” transient frequency is defined. This should be added to the methods or figure legend.<br /> 15) Line 428: says “more than 5 transients per minute” but the data in Fig 3C show it to be just under 4 transients per minute.<br /> 16) Line 561-562. “This difference largely resulted from a lack of twitch transients around egg-laying events in C3T-expressing animals.” This argument doesn’t make sense to us. 
How could a lack of twitch transients affect the amplitude of the transients that are seen?<br /> 17) Line 648: “we do not see dramatic effects on HSN morphology and presynaptic structure upon Rho inactivation.” Presynaptic structure was not assayed, so this should be cut.

      Reviewed (and signed) by Amy Clippinger and Michael Ailion

    1. On 2025-11-19 21:19:50, user Daniel Vásquez-Restrepo wrote:

      This preprint already received a “major revision” decision. Unfortunately, the original reviewers were not available to evaluate it again, and the process stalled. Despite sending 15 additional peer-review invitations, no one agreed to take it on. Although the manuscript has now entered a new review process, I am attaching the previous reviewers’ comments.


      Reviewer 1

      This isn’t a finding as not only is it already available information, the use of the available IUCN maps and statuses was part of the methodology.

      R/ We rephrased the sentence to clarify that it refers to the underlying data itself and not to our results.

      I like the approach they’ve taken, but none of this is novel information or unexpected.

      R/ Although it is well known that mountains promote diversity and endemism at a global macroevolutionary scale, this information has not been explicitly tested in Colombian squamates in conjunction with threat categories. We consider that clearly stating the result of hotspots of diversity and endemism in Colombian squamates can help local environmental policies. Therefore, while our results are consistent with theoretical expectations, this alignment does not diminish the novelty of our findings, as we provide the first quantitative analysis supporting these patterns in the local context.

      This is the main novel finding of the work and I’d recommend reorganising the text to stress this.

      R/ We modified several sections of the text to emphasize the finding highlighted by the reviewer, also in accordance with comments made by the other reviewer.

      Unclear what this means in the context of this paper.

      R/ We rephrased the section for clarity.

      This is just the existing EDGE list, so I’m not sure it warrants mentioning as an output here.

      R/ In accordance with a comment from Reviewer 2, we acknowledge that this is a local rather than a global list, and that species rankings may differ between the two. Therefore, we believe it is an output worth highlighting. Nevertheless, we have clarified in the text the differences between the local and global scores and their implications.

      This entire paragraph seems superfluous, and this work has nothing to do with the latitudinal gradient so it’s a strange thing to focus discussion on.

      R/ While we briefly mention the latitudinal gradient, the main purpose of this introductory paragraph is to provide general context on biodiversity, leading into the key argument of the subsequent sections: the need to understand biodiversity and extinction risk as multidimensional phenomena. We have made minor adjustments to better integrate the role of the latitudinal gradient in promoting tropical diversity, thereby reinforcing the importance of prioritizing conservation efforts in regions of exceptionally high biodiversity.

      Suggested added context as this was unclear as worded.

      R/ We accepted the reviewer’s suggestion and revised the text accordingly.

      I’m not sure this follows - more that, as the paragraph goes on to say, it results in a lack of understanding of the impacts and vulnerability of the species.

      R/ We rephrased the idea to make it clearer.

      This seems to be an inappropriate reference, as Paez et al. 2006 focused on turtles rather than squamates. Please check and reword as needed.

      R/ We double-checked the reference and confirmed that it is correct, as it covers not only turtles but all Colombian reptiles (including squamates, crocodiles, and turtles).

      This seems inconsistent with the earlier statement that “a local assessment is lacking” - should this rather say a recent local assessment? Though as the paper goes on to reference a 2015 ‘local assessment’, it’s unclear what this section means.

      R/ We agree with the reviewer and revised the text to clarify that we refer to a recent assessment that also considers different facets of biodiversity, not just species richness (i.e., taxonomic diversity).

      The figure given later is 597, and that was used as the basis for the analysis. This may be a discrepancy due to a later update, but the same Reptile Database update should be cited throughout the paper for consistency.

      R/ In the Introduction, we refer to the most recent estimate of 620 reptile species for Colombia, based on the latest update of the Reptile Database (2024). However, the analyses in this study were based on the 2023 version of the database, which listed 597 species at that time. Given that the analyses were conducted using the 2023 data, and a complete reanalysis would be required to incorporate the updated figures, we chose to retain the original dataset to ensure consistency and reproducibility. We have clarified this point in the text to avoid confusion.

      Better to use the term ‘squamates’ rather than ‘reptiles’ if crocs and turtles are to be excluded.

      R/ Done, we have consistently replaced "reptiles" with "squamates" throughout the text where appropriate.

      Once again, this could benefit from clarity. The data in the Reptile Database should be reviewed with reference to available material and literature to be used as a formal checklist, but it should be ‘complete’ - it’s more likely to erroneously list species from a country than to miss ones that actually occur there.

      R/ We agree with the reviewer and rephrased the sentence to make the idea clearer.

      Are the authors able to explain the discrepancy between this figure and the maps (which represented 81% of the dataset)? Most IUCN assessments will have maps, but no IUCN maps will be associated with species that don’t have assessments.

      R/ The figures were validated against the information provided in Table S1. As the reviewer correctly points out, there are more assessments than polygons, consistent with the supplementary material. The figure of 77% corresponds to 461 species (excluding DD and NE categories) out of 597 species in our dataset (461/597 = 0.77). Meanwhile, the figure of 81% refers to 481 species with available geographic information, including species categorized as DD (481/597 = 0.81). The discrepancy arises because DD species were included when considering geographic data but excluded from threat category analyses. We have revised the Methods and Results sections to clarify this distinction explicitly. Also, we updated the previous 77% figure to include DD species too, increasing it to 92%.
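
      As a quick check, the two coverage figures quoted in this response can be re-derived from the counts it gives. The following is a minimal Python sketch; the constant and function names are hypothetical, and only the counts 461, 481, and 597 come from the text.

```python
# Counts quoted in the response above; all names here are illustrative only.
TOTAL_SPECIES = 597          # species in the 2023 dataset
ASSESSED_EXCL_DD_NE = 461    # assessed species, excluding DD and NE categories
WITH_GEOGRAPHIC_DATA = 481   # species with geographic information, including DD


def percent_of_dataset(count: int, total: int = TOTAL_SPECIES) -> int:
    """Return `count` as a whole-number percentage of the dataset."""
    return round(100 * count / total)


print(percent_of_dataset(ASSESSED_EXCL_DD_NE))   # threat-category coverage
print(percent_of_dataset(WITH_GEOGRAPHIC_DATA))  # geographic coverage
```

      The two printed values correspond to the 77% and 81% figures discussed above. The updated 92% figure cannot be re-derived this way, because the response does not state the underlying count.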

      This is not a sufficient way to evaluate whether the assessments are likely to need updating - the Criteria take account of the distribution and extent of threats to each species, not simply its distribution. The ‘needs update’ tag is applied by the Red List only to assessments more than 10 years old, which is all that should be mentioned here.

      R/ We understand the reviewer’s concern and acknowledge that a mismatch between EOO and threat classification is not sufficient by itself to determine if an update is needed. We have separated these ideas in the text: first, we highlight species whose assessments are formally tagged as “needs update” after 10 years; second, we discuss species whose EOO does not align with their current threat classification. We moved the second point to the 3.2 Geographic patterns section, and expanded the Discussion to better explain these observations.

      See above. The authors didn’t ‘show’ this, they interpreted the Criteria incorrectly.

      R/ See previous answer. We further expanded the Discussion section to better frame this point.

      I would consider it suitable for the manuscript to be more fully revised as a shorter paper, as the region-scale analysis within Colombia and the phylogenetic results are of more interest than the well-trodden path of identifying the Andes as an area of greater endemism than Amazonia and the additional analyses included in the paper render its main findings somewhat opaque in places.

      R/ We consider that highlighting the Andes as an area of high endemism is necessary to provide context for interpreting the patterns of phylogenetic diversity. While it may be a well-known topic, not all readers will have the same background. Although the manuscript is extensive because it covers taxonomic, geographic, and phylogenetic patterns, its current length (ca. 6,300 words, excluding references) is well within the 9,000-word limit for Original Research articles in Biodiversity and Conservation and only slightly above the typical 5,000-word range. Nevertheless, we made an effort to shorten unnecessary sections to improve focus and clarity. For example, we removed some analysis related to diversification rates and extinction risk, since, as Reviewer 2 pointed out, some metrics depending on branch lengths may be biased.


      Reviewer 2

      L393-405: it is important to acknowledge the phylogenetic incompleteness of a national-level analysis, and how that might be affecting these results – divergence times are influenced by phylogenetic coverage and structure, removing >90% of squamate species from the phylogeny will give you divergence times between Colombian species, not true lineage age/divergence time information. This could be addressed with sensitivity analyses to explore how lineage age varies between pruned and complete trees, or with stronger discussion of the pitfalls of this approach in the methods and discussion, with clearer wording in the results.

      R/ We appreciate the reviewer’s insightful comment and fully agree. We performed additional calculations to assess sensitivity, and indeed, the age of some lineages can be severely affected, while others remain largely unchanged. Following the reviewer’s recommendation, we revised the Methods and Discussion sections to place greater emphasis on the limitations of using evolutionary metrics derived from pruned trees and on the considerations needed when interpreting these results. As the reviewer also notes, these results are not necessarily incorrect, since global conservation priorities do not always align with local ones. Additionally, we introduced local and global subscripts to our metrics to explicitly distinguish between them.
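
      The pruning effect discussed in this exchange can be illustrated with a toy example. The sketch below is not the authors' sensitivity analysis: the tree, tip names, and branch lengths are hypothetical, and terminal branch length is used as a simple stand-in for lineage age.

```python
# Toy rooted tree stored as child -> (parent, branch length).
# All names and lengths are hypothetical and chosen only for illustration.
PARENT = {
    "A": ("n1", 2.0),
    "B": ("n1", 2.0),
    "n1": ("n2", 3.0),
    "C": ("n2", 5.0),
    "n2": ("root", 5.0),
    "D": ("root", 10.0),
}


def ancestors(node):
    """List (ancestor, distance from node) pairs, nearest ancestor first."""
    out, dist = [], 0.0
    while node in PARENT:
        node, length = PARENT[node]
        dist += length
        out.append((node, dist))
    return out


def terminal_branch_length(tip, retained):
    """Terminal branch length of `tip` once the tree is pruned to `retained`."""
    depths = []
    for other in retained:
        if other == tip:
            continue
        other_ancestors = {name for name, _ in ancestors(other)}
        # The first ancestor of `tip` shared with `other` is their MRCA.
        depths.append(
            next(d for name, d in ancestors(tip) if name in other_ancestors)
        )
    return min(depths)


full = {"A", "B", "C", "D"}
pruned = {"A", "D"}  # e.g. only these two tips occur in the focal country
print(terminal_branch_length("A", full), terminal_branch_length("A", pruned))
print(terminal_branch_length("D", full), terminal_branch_length("D", pruned))
```

      In this toy tree, tip A's terminal branch grows from 2.0 to 10.0 once its close relatives are pruned, while tip D's stays at 10.0. This mirrors the observation above that some lineage ages change strongly under pruning while others remain largely unchanged.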

      L407-418: Distinction is needed between EDGE scores and national EDGE scores (literally just saying ‘national EDGE scores’ would suffice). It may also be useful to identify national-specific priorities – i.e. high ranking national EDGE species that are not highly ranked in global context. There are EDGE scores available for all vertebrates at the global level here (https://www.nature.com/articles/s41467-024-45119-z). There are endemic Colombian squamates that are high EDGE in this study and also high EDGE at the global scale (e.g. Lepidoblepharis miyatai) but also species that are high EDGE nationally because of the phylogenetic diversity they are solely responsible for in Colombia, but the responsibility for which is shared beyond Colombia’s borders. These key cases can be instrumental in ensuring species that are globally ‘safe’ but locally important do not fall through the cracks.

      R/ Please refer to the previous response. We now explicitly distinguish between national EDGE scores and global EDGE scores throughout the text and highlight cases where species are locally important but not necessarily globally prioritized.

      L41 and throughout: “threatenedness” = “extinction risk” or “level of threat”.

      R/ Done.

      Throughout: It’s the IUCN Red List, not IUCN, particularly when referring to versions of the Red List database.

      R/ Done.

      L145: make it clear you’re referring to national endemics.

      R/ The Resolución 0126/2024 from Colombia’s Ministry of Environment (MADS) covers not only national endemics but all species occurring within the country’s administrative boundaries.

      L167: ensure it’s clear that it’s imputation based on taxonomy alone.

      R/ Done.

      L182: check references.

      R/ We reviewed the references cited at this point and confirm they are correct.

      L222-224 and throughout: phylogenetic diversity == Faith’s PD – the other measures are indices of phylogenetic distance/relatedness that are calculated in same units as PD, but are not phylogenetic diversity – that should be clarified.

      R/ Done. We clarified that Faith’s PD refers specifically to phylogenetic diversity, while the other metrics represent measures of phylogenetic relatedness or distance.

      L393: extinction risk should not be thought of as a trait evolving but as the manifestation of extrinsic and intrinsic factors.

      R/ Agreed. We rewrote the sentence.

      L393-397: unclear what the relationships discussed are, and what they infer.

      R/ We have removed this section from both the Methods and Results. Given that the correlations discussed involved metrics dependent on branch length — and, as the reviewer previously pointed out, branch lengths can be affected by pruning the phylogenetic trees — we decided to eliminate this section. Overall, it did not substantially contribute to the text or to the discussion.

      L428-429: This is higher than, or at least comparable to, the global % of DD/NE squamates I think, so might not be considered relatively low for squamates.

      R/ We rewrote the sentence to clarify that it is comparable to or higher than the global percentage, as the reviewer correctly pointed out.

      L429-432: it might be worth highlighting how taxonomists and others can contribute to rapid reassessment of species with basic information in ecological publications see: https://doi.org/10.1016/j.biocon.2018.01.022

      R/ Done. We incorporated the reviewer’s suggestion.

      L442-444: Unclear what is meant here. A species can be assessed as CR with a wide range if it’s under population decline criteria, and a small-ranged species can be assessed as not-threatened if there is no evidence of decline/ongoing degradation.

      R/ This comment was also raised by Reviewer 1. We addressed it accordingly by revising the text to clarify that species can indeed have wide distributions and still qualify as Critically Endangered if facing significant threats, and vice versa. Please refer to our responses to Reviewer 1.

    1. On 2025-11-03 07:59:20, user Zoya Yefremova wrote:

      Dear colleagues,

      I read with great interest your preprint describing Tamarixia citricola Hansson and Guerrieri sp. nov. (Hymenoptera: Eulophidae), a putative new parasitoid of Diaphorina citri discovered during a classical biological control program in Cyprus. Congratulations on this interesting contribution to the taxonomy and biological control of psyllid pests.

      If I may, I would like to respectfully draw your attention to a publication that may be relevant to your study: Burckhardt, D., Yefremova, Z.A., & Yegorenkova, E. (2015). Diaphorina teucrii sp. nov. and its parasitoid Tamarixia dorchinae sp. nov. from the Negev desert, Israel (Zootaxa 3920 (3): 463–473). I apologise for the self-reference, but given the biogeographical proximity and the relevance of the Israeli Tamarixia fauna to the region, it was somewhat surprising not to see it cited.

      In Israel, we have documented five native species of Tamarixia, including T. dorchinae, which shares several morphological characters with what you describe as T. citricola, particularly in forewing and antennal structure across sexes. A comparative discussion of these taxa might offer further insights into whether the specimens from Cyprus are truly distinct species. A discussion comparing the putative new species with other taxa in the region is warranted anyway.

      Additionally, I think that host specificity in Tamarixia is generally more consistent with psyllid host genus rather than the associated plant. This ecological pattern may be worth emphasizing in your discussion.

      We are in the process of barcoding the Tamarixia species of Israel, and a comparison with your material would be most useful.

      Thank you again for sharing this work,

    1. On 2025-10-08 14:25:37, user Michal Tal wrote:

      Since I was asked to review this paper several months ago and waived my anonymity on review, I'm sharing my review publicly here as a comment. The TL;DR is that I think this paper is both very informative and very important. However, it does need to be contextualized as a deep study of a recovery cohort, which is then being compared to public data from cohorts with a significant percentage of people who are not recovering, and that needs to be accounted for. Comparing immune cells from the PBMC fraction of blood of people who all went on to recover to cells from tissue of cohorts including those made up of 40% people who did not go on to recover does not allow for making conclusions about differences between the blood and the tissue without accounting for the differences in immune responses of those on a trajectory to recover and those who are not. Those immune responses could look very different, both in the blood and in the tissue.

      Here is my full review:

      This is an important and comprehensive study by Rostomily et al., "Multiomics Reveals Compartmentalized Immune Responses and Tissue-Vascular Signatures in Lyme Disease," which significantly advances our understanding of the immunopathology of acute Lyme disease (LD). I found it easy to read, and the figures were clear and compelling. By employing a longitudinal, multiomics approach integrating plasma proteomics, metabolomics, and PBMC immunophenotyping, supplemented with a meta-analysis of skin lesion transcriptomics, the authors present a compelling narrative of compartmentalized immunity. They propose that the robust alterations in circulating plasma proteins and metabolites, linked to endothelial barrier stability, metabolic reprogramming, and symptom severity, are predominantly driven by local immune processes within the skin and associated vasculature, while systemic PBMCs remain largely quiescent. It is quite surprising to see the PBMCs and metabolites show such fast resolution, and it feels like this is likely related to the complete recovery seen in this cohort. This work offers novel insights into effective immune responses against Borrelia burgdorferi and the kinetics of recovery from infection, particularly highlighting vascular involvement, and provides a valuable resource for future biomarker discovery and therapeutic development in LD.

      A critical aspect for the authors to address, perhaps in the limitations or discussion, is the high recovery rate observed in their patient cohort. The manuscript states, "Following antibiotic treatment, symptoms resolved in most patients, with only a few reporting mild symptoms attributable to LD at 6 months or at 1 year post-treatment". This contrasts with broader literature suggesting that 10-20% of LD patients develop Post-Treatment Lyme Disease Syndrome (PTLDS) with persistent symptoms. It would be beneficial for the authors to discuss why their cohort experienced such a high recovery rate. Were specific exclusion criteria applied that might have inadvertently selected for individuals less prone to PTLDS (e.g., absence of certain co-morbid conditions known to be risk factors; that’s very interesting to speculate about)? The methods section details exclusions such as fibromyalgia, chronic fatigue syndrome, traumatic brain injury, prolonged undiagnosed somatic complaints, morbid obesity, sleep apnea, autoimmune disease, uncontrolled cardiopulmonary or endocrine disorders, recent malignancy, liver disease, major psychiatric illness, or substance abuse. While extensive, it's worth considering if these fully account for the low PTLDS rate. Additionally, the cohort demographics (Figure 1B) show a skew towards male patients (27 male vs. 22 female). Given that some infection-associated chronic illnesses, including potentially PTLDS, may skew female, could this gender distribution contribute to the observed recovery outcomes? Clarification on these points would help contextualize the study's findings regarding the typical immune trajectory of acute LD.

      Major Points:

      1) Comparability of Meta-Analysis Cohorts: The conclusions regarding skin-derived systemic signals rely heavily on meta-analyses of public datasets (GSE63085, GSE154916, GSE169440). It is crucial to provide a more detailed comparison of the clinical characteristics (symptoms, treatment, PTLDS rates) of these external cohorts with the primary study cohort. For instance, the GSE63085 PBMC dataset is from a cohort with a reported PTLDS-like symptom rate of ~46%, substantially different from the near-complete recovery in the current study's cohort. These differences should be explicitly discussed as they could influence the nature and interpretation of immune responses. I wonder if the major differences seen in the skin vs PBMCs here are driven more by immune differences in people who are on a trajectory to recover vs those who are not. There are public datasets available on PBMCs as well, such as from the SLICE cohort including those on a trajectory to recover and those who are not. These should be compared in the analysis.

      2) The finding of largely quiescent PBMCs in the face of infection and systemic mediator changes is surprising. The authors should expand their discussion to contextualize this observation against other types of infections, or Borrelia infections where people go on to develop Borrelia infection-associated chronic illness. For example, how does this compare to PBMC responses in other chronic infections, tissue-localized (versus systemic/blood-borne) infections, or infections caused by slow-growing (like B. burgdorferi) versus fast-growing bacteria? Or, back to point #1, is this more just what PBMCs look like in someone who has been successfully treated with an antibiotic for a bacterial infection, and is this just what being on track to a full recovery looks like? That would explain why this looks so different from PBMC profiles in chronic illnesses like TB/HIV/HCV/T. cruzi, but would make more sense in the context of a cleared infection and recovery. One additional thing to consider is that most of the immune granulocyte cells will be spun out of the PBMC fraction, but that does not mean those responses aren't circulating in the blood; they just won't be found in the PBMCs.

      3) The T3 timepoint seems to stand out compared to T2 or T4, and it’s not clear why, and this isn’t adequately addressed in the discussion or limitations.

      Minor Points (Organized by Figure):

      Figure 1: Study overview and clinical manifestations

      Panel B: The gender distribution is skewed male. It would be useful to know if any sex-based differences in the measured parameters were analyzed, as this was not apparent throughout the manuscript.

      Panel C: The T3 (6 months) timepoint for C6 ELISA is missing; only T1, T2, and T4 are shown. Is that because T3 looks weird throughout, and you didn’t want to show it?

      Panel F: It would be helpful to indicate which correlations are statistically significant (e.g., using asterisks or by highlighting significant bubbles).

      Figure 2: Differential expression of circulating proteins and their correlation with symptoms

      Panel A: The separation into fast- and slow-resolving clusters is a very interesting and insightful presentation. However, the text states PRDX5 remained significantly elevated at T2, but this is not immediately clear from the heatmap's visual representation for PRDX5 in the T1-T2 comparison. Only IL17C is labeled as significant (T2-T3) in the slow responding genes.

      Panel B: It is unclear why not all rows are labeled as they appear in A across all pathway comparisons, which makes it harder to assess the full dynamics. Maybe this circles back to the fact that some of them were significant T1-T2, and not T1-3, but then again yes T1-4 so maybe it looked messy to show it that way? But this way you only show the first set it was significant for, and not the dynamics in between…

      Panel C and D: The rationale for selecting T1 versus T3 for this heatmap of cardiovascular, metabolism, and organ damage proteins could be clarified, especially as Panel A focuses on pairwise comparisons across all timepoints. And at other times T3 seems to be intentionally excluded. Displaying patient-based trends rather than just row-based averages might also be informative. The asterisks on the left indicating significance are somewhat hard to read on the opposite side from the label.

      Panel E: This is a visually appealing figure, though the bundling can make specific correlations slightly challenging to trace.

      Figure 3: Integrated community analysis and diagnostic modeling

      Panel A: Could the authors add descriptions of any shared features or overarching themes among the analytes within each of the three largest communities beyond endothelial disruption/protection? The rho scale for symptom correlations (-0.4 to 0.6) suggests many correlations are not very strong, indicating that adding statistical significance for these symptom correlations would be beneficial.

      Panel B, C, D: These ROC curves are interesting for diagnostic potential. Suggestion: If data are available, showing a baseline ROC curve using standard clinical diagnostic features (e.g., EM presence, basic serology if used for initial classification rather than just inclusion) could provide a useful comparison for the multiomic models.

      Figure 5: Minimal peripheral changes in acute LD

      Panel A: The highest variance explained by PC1 in the PCA of PBMC abundances is relatively low (18.4% for patients, 16.2% for controls), suggesting considerable heterogeneity not captured by the main principal components.

      Panel B: The decrease in plasmablasts over time would possibly be expected if it aligned with the development of memory B cells. But that doesn’t seem to be the case from this data. That might be a fit with what Nicole Baumgarth has described in B6 mice, and definitely warrants further discussion.

      Panel C: The UMAP visualization shows minimal separation. Without non-recovered patients, it's difficult to discern disease-specific trends versus inter-individual variability.

      Figure 6: Dramatic changes in a case with severe disseminated disease

      The boxplots effectively highlight how different the severe outlier patient is. This case underscores the point that systemic activation can occur. I really wonder if compared to publicly available data from people who did and did not recover after their acute infection, if you would see a lot more of this. Replicating this in a dataset with more patients with severe, non-recovering disease would be necessary to draw broader conclusions about this hyperinflammatory state.

      Figure 7: Skin immune responses reflect plasma protein and metabolic signatures

      Panel A: The source/location of "unaffected skin" biopsies can influence cellular profiles. This should be addressed.

      Panel C: The differential expression of CXCL8 (IL-8) across various skin-resident cell types is very interesting, as is LILRB4 expression in skin-resident cells, which would support the tissue-based regulation hypothesis as long as we had more comparators between the symptoms and inflammatory state of the individuals in these cohorts.

      I think this paper is both very informative and very important. However, it does need to be contextualized as a deep study of a recovering cohort, perhaps being compared to cohorts with more people who are not recovering, and that needs to be accounted for.

    1. On 2025-06-02 17:22:57, user Karl Milcik wrote:

      We reviewed this paper as part of our regular journal club. Below is a collection of the comments made by the various group members:

      --- 1 ---

      It's unclear why asymmetry in the latent embeddings is required.

      No mention of the model predicting trivial results during training due to the symmetric KL? Ablation might reveal that the loss weights require very careful tuning to avoid predictions or that the reference distribution is extremely important.

      There are a number of implicit assumptions being made with the model architecture, primarily that there is sufficient information to align two datasets. It becomes an issue when combining datasets from very different modalities (e.g. scRNA-seq and sc proteomics). Adding multiple modalities is definitely possible, but the overlapping information becomes smaller and additional information is lost. It would be good to see where the model stops working. Small datasets will similarly carry little information: is there a minimum number of samples for the model to function as expected? (An exact number is not required, but getting a sense with a few datasets of different modalities would be informative.) As-is, we wouldn't expect the model to apply to most single-cell datasets.

      Aligning modalities of extremely different dimensionality implies either redundant information in one modality or information loss. This should be discussed.

      Specifics of training, hyperparam optimization, etc. would be better in a supplemental (assuming the targeted venue allows it). The main contribution appears to be the combination of the various losses. The article could be shortened by focusing on that when describing the method.

      Re: training procedure. No mention of balancing the different modalities. "Difficult" modalities would be more difficult to learn. Early stopping could be preventing complex modalities from being sufficiently mapped because the simpler modalities are overfit faster than the complex ones are learned.

      Evaluation metrics: NMI is very similar to the symmetric KL that is used to train the model. I'm not sure if it's a reliable metric for this.

      Fig. 2a: the figure amounts to "the model removed information," which is the point of batch correction but doesn't quantify what other information was lost. Fig. 6 suggests that quite a bit of biological information is lost.

      Fig. 3: scRNA reconstruction is producing high values for some genes when it shouldn't (purple cluster, top). If one were to use this, we would conclude that those genes are highly differentially expressed when they are not in the original data. This is a fatal problem.

      --- 2 ---

      1. Lack of Evaluation in Downstream Biological Applications

      While UniVI shows strong performance in latent space alignment and cross-modality prediction, its utility in downstream biological tasks (e.g., identifying novel cell subtypes, inferring regulatory programs, or reconstructing differentiation trajectories) remains underexplored. Demonstrating improvements in real biological discovery would substantially enhance the manuscript's impact.

      2. Insufficient Validation of Generalizability Across Conditions

      The datasets used in evaluation are mostly standard and clean (e.g., PBMCs from 10x Genomics). It is unclear whether UniVI generalizes well to more diverse or challenging settings (e.g., different sequencing technologies, species, or tissues).

      3. No Ablation Studies to Justify Model Design

      The architecture includes several important design choices (e.g., β-VAE, shared and private latent spaces, MoE layers), but the manuscript lacks ablation experiments to validate the contribution of each component.

      4. Lack of Interpretability for Latent Space Representations

      The latent space is central to UniVI’s function, but its biological interpretability is not addressed. It is unclear which features (genes, peaks, proteins) drive the alignment, or how latent dimensions relate to known biology.

      5. Failure Cases and Limitations Are Not Discussed

      The manuscript does not address situations where UniVI might fail or yield poor alignments. Understanding when and why the method breaks down would be critical for end users.

      --- 3 ---

      1) They mention that scATAC-seq is not reliable for determining cell type specificity. Why, then, did they include ATAC-seq?

      2) The datasets they use are reliable, but it would be good for them to explain why exactly they preferred these datasets and databases; there is not much information about this.

      --- 4 ---

      Figure 4: recommend labeling panels rather than referring to top left, etc. In the boxplots at the top left, UniVI and totalVI seem really similar in NMI, ARI, and ACC, but no formal statistical comparison was done.

      Usability may be limited if you have to manually fit the model with your own data.

      Is overfitting a problem with very small datasets? Is computational time a problem with very large datasets (e.g., early stopping used)?

      --- 5 ---

      -Use of the model to generate new data is stated and referenced throughout, but I felt the true utility of this is underexplored. Why would someone want to do this? The authors mentioned data augmentation, but the authors could be more explicit on any other uses.

      -Did the authors consider using alternative methods to grid search for their training procedure (e.g., neural architecture search)? Also what were the ranges of values searched and with what step sizes?

      -For adding >2 modalities, are there any considerations with computational complexity and training time at a certain point? How would this scale to K>2?

      -In general, the paper is well organized and detailed, but almost to a fault. I suggest moving details less relevant to the average reader into a supplemental section. For example, knowing the function calls and variables probably isn't relevant to most readers. Those that want to know that could look in the code or point the reader to a supplement. These somewhat irrelevant details to the figures were also mixed with critical details such that I felt a little lost on trying to pick out the most important parts of the methods.

      -On the same note, simple details are often over-explained or restated multiple times in the text (e.g., the explanation for subsetting the data to obtain non-overlapping labels is repeated several times), while more complex concepts such as the Beta term, mixture of experts model, etc. are often underexplained in my opinion.

      -For Figure 1, I am still confused about what exactly UniVI provides a benefit over in some panels versus just looking at individual UMAPs and annotating by the labels, since these are already known. A more specific explanation of why a shared latent space is useful for finding new biology would help.

      -Exploring more on the fringe cases in which data does not align is interesting. For example, the authors mention cell 59 aligning closer to a Dendritic cell than B cell. They mention this could be biological variation or technical error, but exploring more about this 'misalignment' in this and other datasets could be a key way of identifying unique insights from this model, though it would require biological validation. Perhaps the authors could suggest some such experiments as future work to tie in dry and wet lab approaches/experimental designs that would complement this model in the lab.

      --- 6 ---

      In the paper, the authors mention that approximately 1% of the dataset shows inconsistent alignment. Could you elaborate on how this might be interpreted as reflecting dynamic cellular states in continuous development? A deeper discussion of this would be very helpful.

      --- 7 ---

      Figure 7: how can the authors prove, or better illustrate, that the reconstruction retains the biological signal?

      It’s weird that the error did not increase significantly with the higher dropout rate. The same goes for the correlation.

      When no dropout is applied, the correlation between the raw and reconstructed data is only 0.52. Does this suggest that the pathways have changed significantly? It may be necessary to check which pathways have changed and which have not.

      --- 8 ---

      QC metrics are lacking, and it is not stated whether any filtering was applied to the data. Transparency is missing in the QCs.

      --- 9 ---

      A limitation is that this must be only used for measurements made from the exact same cells - we cannot apply this framework to cells measured in parallel with different methods.

      Figure 2: not sure that they compared to CCA or OT, which were introduced as alternatives in the beginning.

      Figure 2 : I like that they show the measurement pairs for each cell - can they quantify this globally somehow?

      The distinction between “imputation” and alternative mode reconstruction is unclear from their description; they mention fitting a gaussian mixture model with their data and then using that for input - does that mean they use the true values from one measurement modality and then use all zeros for the other? Why not simply run a forward pass from the one modality encoder and then use the opposite decoder?
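
      The forward pass suggested in the comment above can be sketched as follows. This is a minimal illustration, assuming one encoder and one decoder per modality that share a latent space; the random linear weights, the dimensions, and every name are hypothetical stand-ins and do not reflect UniVI's actual code or API.

```python
import random

random.seed(0)
N_GENES, N_PROTEINS, N_LATENT = 6, 3, 2  # toy dimensions, chosen arbitrarily


def random_matrix(n_rows, n_cols):
    """Random weights standing in for trained parameters (hypothetical)."""
    return [[random.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


def linear(vector, weights):
    """Compute vector @ weights for a single cell."""
    return [sum(v * row[j] for v, row in zip(vector, weights))
            for j in range(len(weights[0]))]


rna_encoder = random_matrix(N_GENES, N_LATENT)         # RNA -> latent
protein_decoder = random_matrix(N_LATENT, N_PROTEINS)  # latent -> protein


def rna_to_protein(rna_profile):
    """Encode with the RNA encoder, then decode with the protein decoder."""
    latent = linear(rna_profile, rna_encoder)  # latent mean only, no sampling
    return linear(latent, protein_decoder)


cell = [random.gauss(0.0, 1.0) for _ in range(N_GENES)]
predicted_protein = rna_to_protein(cell)
```

      No values from the second modality, zero-filled or otherwise, enter this pass; only the RNA profile is used as input.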

      They comment on higher expression levels having higher reconstruction MSE - this is a common feature of autoencoders that compress the range of predictions so as to minimize error from any large magnitude predictions. The methods claim to have used pp.scale(), which should have removed this effect of the measurements' original magnitude?

      It would be interesting to know what are the limits in terms of minimum (or maximum) features per modality and minimum measurements for training.

      Based on figure 4, the claim that UniVI “outperforms existing state of the art integration methods” does not appear to be statistically supported. It appears to be indistinguishable from TotalVI and perhaps even Seurat. The authors should compute p values using random samples of the data with replacement (I think these experiments used identical samples, which would violate the assumption of independence for t-testing). TotalVI appears to have been published over 4 years ago in Nature Methods. However they claim that TotalVI requires “modality specific priors”. This “prior” appears to be a specific model term that is learned from the data to account for background, so I agree that UniVI is more generalized but not by as much as I thought before seeing this prior work.
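
      One possible form of the resampling suggested above is a paired bootstrap over cells, sketched below. It assumes per-cell correctness (1 or 0) is available for two methods evaluated on the same cells; the toy values, the function name, and the choice of accuracy as the metric are hypothetical and not taken from the manuscript.

```python
import random


def paired_bootstrap(correct_a, correct_b, n_boot=2000, seed=0):
    """Resample cells with replacement; return a 95% interval for the
    accuracy difference (A minus B) and the fraction of resamples in
    which A is not better than B."""
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_boot):
        # The same resampled cells are used for both methods (paired design).
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    frac_not_better = sum(d <= 0 for d in diffs) / n_boot
    return lower, upper, frac_not_better


# Toy labels: method A is right on 90 of 100 cells, method B on 88.
method_a = [1] * 90 + [0] * 10
method_b = [1] * 88 + [0] * 12
lower, upper, frac_not_better = paired_bootstrap(method_a, method_b)
```

      If the interval includes zero, or the fraction of resamples in which A is not better is large, the data do not support a claim that one method outperforms the other.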

      The authors should be careful about statements of distance based on UMAP “The model preserved meaningful cellular distinctions, with closely related populations remaining spatially proximate in the latent space, underscoring UniVI’s ability to harmonize intra-modality variation while retaining biologically relevant structure.”

      Figure 6C is a neat application of this data. Does this scale beyond this data and how can it be less slushy in the representations?

      Can this be fit on very deep single cell omic data and then applied to predict missing depth from more shallow studies?

      It would be interesting to repeat the dropout experiment with multiple random dropouts to get a sense of variance in the genes that are dropped out.
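
      The repeated-dropout experiment suggested above could look roughly like the sketch below. The `reconstruct` function is a hypothetical placeholder (column-mean filling of zeros) that stands in for the trained model, and the toy matrix, dropout rate, and number of repeats are arbitrary assumptions.

```python
import random
import statistics


def apply_dropout(matrix, rate, rng):
    """Set each entry to zero with probability `rate`."""
    return [[0.0 if rng.random() < rate else v for v in row] for row in matrix]


def reconstruct(matrix):
    """Placeholder for the model: fill zeros with the column mean of nonzeros."""
    col_means = []
    for j in range(len(matrix[0])):
        vals = [row[j] for row in matrix if row[j] != 0.0]
        col_means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[v if v != 0.0 else col_means[j] for j, v in enumerate(row)]
            for row in matrix]


def mse(a, b):
    """Mean squared error over all entries of two equally shaped matrices."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def repeated_dropout_error(matrix, rate, n_repeats=20, seed=0):
    """Mean and standard deviation of the error over independent dropout masks."""
    errors = []
    for r in range(n_repeats):
        rng = random.Random(seed + r)  # a different mask for each repeat
        errors.append(mse(matrix, reconstruct(apply_dropout(matrix, rate, rng))))
    return statistics.mean(errors), statistics.stdev(errors)


data_rng = random.Random(42)
data = [[data_rng.uniform(1.0, 5.0) for _ in range(10)] for _ in range(30)]
mean_err, sd_err = repeated_dropout_error(data, rate=0.5)
```

      Reporting the standard deviation alongside the mean shows how much of any difference between dropout rates is attributable to the particular entries that happened to be masked.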

      I’m confused why the pre and post reconstruction heatmaps in figure 7 bear no resemblance even with 0% dropout. Are these hierarchically clustered differently, or should we be able to compare the shapes between them?

      Is there overlapping information between true SCP and SCT (beyond cite-seq where the proteomic measurement part is substantially limited based on the number of antibodies)?

      Does this work well beyond measurements from blood cells (what seems like an easy case)?

      --- 10 ---

      I was hoping to see more of the unified cell state concept play out in its experiments. I feel like they got sidetracked (or rather, realized they didn’t have enough to really fulfill that ambition), but it would be nice to have that addressed more clearly.

      I was wondering if weights trained for a single modality as paired to a second modality could be transferred to a third modality comparison. Doubtful, but it would be interesting to explore.

      Not sure if this is something that you actually want to include in the review. It was more what I was focusing on and was somewhat dissatisfied by.

      The text in the figures is too small to read, generally speaking. I found issues with all figures with the possible exception of the first.

      Figure 1b, Cell-Cell Alignment is not intuitive. It goes from a UMAP to decode as a graph figure, and is not consistent with the batch correction element of the same subfigure. It’s an odd inconsistency.

    1. Reviewer #1 (Public review):

      Summary:

      The question of how or whether "extensive memory training affects neocortical memory engrams" (to use the words of the authors) is an interesting question and an area where I think there is room for advancing current knowledge. That said, I do not think the current paper succeeds in meaningfully addressing this question. At a conceptual level, I really struggled with the predictions and interpretations of the findings. There are also several elements of the experimental paradigm and analysis decisions that feel incompatible with the claims that are made. While the manuscript does demonstrate that several measures of neural pattern similarity differ between the various groups of individuals, the issue is that it is difficult to draw clear conclusions from these findings.

      Strengths:

      (1) This is a very unique dataset. Being able to recruit and enroll high-level memory athletes is impressive.

      (2) In principle, comparing memory athletes to control subjects, active control subjects (who received working memory training), and trained subjects (who received method of loci training) is very appealing.

      (3) In several ways, the authors were rigorous in their analyses.

      (4) In principle, the question of how memory training influences neural similarity vs. dissimilarity is of potential interest.

      Weaknesses:

      (1) As far as I can tell, the training manipulation is fully confounded with instructions. That is, subjects were only instructed to use the method of loci if they had completed method of loci training (or if they were the memory athletes). For the training group, in the pre-training session, there was no strategy instruction (subjects could do whatever they wanted), but post-training, they were told to use the method of loci. I understand the argument, of course, that naïve subjects might not be very good at using the method of loci if they had no experience with it. But, it does seem entirely possible that some (or even many) of the observed fMRI results that are attributed to "extensive training" are better explained by strategy use. That is, maybe the effects can be explained by TRYING to use the method of loci as opposed to actual proficiency with the method of loci. It seems impossible to address this, given the design of the experiments. As such, any claims about the effects of memory training, per se, feel inappropriate. It feels equally plausible that the effects are due to the strategy instruction. If the same results could be obtained through a simple strategy manipulation without ANY training at all, that would radically alter the interpretation of the effects. I think the strategy use account is, in fact, quite viable because it is very easy to improve subjects' memories with a method of loci instruction (relative to no strategy instruction) without ANY practice at all. Obviously, practice does improve memory performance with the method of loci, but my point is that even without any meaningful practice, there is likely to be SOME immediate benefit to adopting the method of loci as a strategy. There is also the question of why the effects for the memory athletes weren't obviously stronger than for the trained group, given that the memory athletes have much more experience with the method of loci. 
Ultimately, the problem with the current design is that I don't see how one can tease apart the role of training, per se, vs. strategy use.

      (2) There is no clear theoretical framework for the predictions or interpretations. The Results section is mostly a list of lots of different permutations of analyses (similarity within a group, between groups, between trials, across trials between subjects, during encoding vs. retrieval, frontal vs. hippocampal vs. parietal ROIs, etc). For each analysis, I did not have an intuition for what the prediction should be (e.g., should athletes have higher or lower pattern similarity?), and even after seeing all the results, I still do not have an intuition for how to interpret them. For the main results related to dissimilarity in prefrontal cortex, I would have, if anything, predicted the opposite: that when individuals are trained to use a common strategy, there would be MORE similarity between them. The Discussion acknowledges a very wide range of possible factors that might contribute to measures of similarity/dissimilarity, but I am ultimately left feeling that I have no idea how to interpret the results because the design and analyses were not structured such that any of these interpretations could be teased apart.

      (3) Same theme: the analyses shift from frontal regions (when looking at encoding) to hippocampus and precuneus (when looking at temporal recency). This shift in ROIs is confusing. The analyses (encoding vs. recognition) are essentially confounded with the ROIs (frontal vs. hippocampal/precuneus), so it's hard to know whether different analyses yielded different patterns or different ROIs yielded different patterns. Why were the frontal regions that were important for encoding ignored for the temporal recency judgments? And the fact that medial temporal lobe regions showed opposite effects to the frontal regions during encoding did not get much attention. Given that there were opposing patterns (dissimilarity vs. similarity) across different brain regions, the framing of the paper (that "the method of loci may bolster uniqueness") feels like a very selective representation of the data.

      (4) One of the more surprising aspects of the analyses (or at least one of the analyses) is that representational similarity analyses (RSA) are used to compare the average activity pattern (averaged across all trials) between different individuals. At a conceptual level, this really just reduces to a univariate analysis. It is not standard (or intuitive) to think about RSA that is essentially blind to the actual representational content. In other words, averaging across trials obviously washes out the content, and what is left are process-level effects. For process-level analyses, univariate analyses are far more common and seem more straightforward. However, these 'RSA' analyses are described as reflecting the "uniqueness of each word-location association" (an account which strongly implies content-level effects). This feels like an inappropriate description of what the analyses actually reflect.

      (5) I think the analysis looking at trial-by-trial similarity during word encoding (showing greater dissimilarity among the experienced individuals) is a somewhat interesting result, but again, I think the interpretation is very difficult. It is hard (or, impossible, I think) to get a clear sense of what is driving those differences. Is it the association of a unique spatial context? Is it somehow a product of better encoding, per se (as opposed to distinct spatial contexts)? These things could be tested by actually manipulating the spatial contexts in a more controlled way. For example, the paper by Liu et al. that is cited several times - and also a just-published paper by Christopher Baldassano (Nature Human Behaviour) - each used a very controlled paradigm where the (imagined) spatial location associated with each item was known/manipulated. However, the design of the current study does not allow for these things to be teased apart.

      (6) Relatedly, the training group seemed to receive instruction on a common spatial route, but, surprisingly, "Participants were free to choose which route and how many they would use to anchor the 72 items." Thus, if I understand correctly, we don't know whether the trained individuals were using common or distinct locations. And the fact that they learned a 50-location route but then studied a 72-word list is also a bit strange. Not having control or knowledge of the location that was associated with each word (sequence position) is a major limitation and also a major difference between the current study and other recent studies. For that matter, the word order was also randomized, so there was no control over whether the words and/or locations matched. These issues really complicate interpretation.

      (7) Again, same theme: for the result showing lower trial-by-trial similarity (within-subject similarity), the question is why, exactly, training/experience is associated with lower trial-by-trial similarity. Does training specifically or preferentially lead to greater differentiation between temporally-adjacent trials (as in Liu et al)? Does it lead to greater differentiation IF subjects associate each word with a unique location? Or maybe there is a more abstract effect of sequence/position that is independent of spatial location? Importantly, each of these three possibilities that I mention here has a precedent in prior studies that were more tightly controlled. But here, there is no way to tease these apart because of the experimental design, limiting the conclusions.

      (8) The ISC analysis described on p. 9 (line 328) is confusing. If I understand correctly, correlations between different trials were not computed (e.g., subject 1 trial 1 was not correlated with subject 2 trial 2). Rather, trial 1 was always correlated with trial 1 (in other subjects). Thus, it is not clear whether trial-level alignment matters at all. Maybe the same results would be obtained if there were no correspondence across subjects in trial number. Or if the trial order was shuffled within the subject. Given this, I simply don't know how to think about the data. And why did memory athletes show higher pattern similarity in this analysis as opposed to lower pattern similarity (as in some other analyses)? And why was this analysis performed by comparing memory athletes to each other as opposed to memory athletes to non-athletes? And, conceptually, why was this selective to the memory athletes or to the precuneus? And why was it selective to the temporal order test and not encoding? I am not asking the authors to answer each of these questions; rather, the point I am trying to make is that this analysis, and many of the analyses, seem to raise more questions than they answer.

      (9) The ISC analyses are interpreted in terms of scene construction and context reinstatement, but these conclusions go (very) far beyond what the data actually show. Again, I don't see how this analysis lends itself to a meaningful conclusion. And this general critique applies to many of the analyses reported in this paper.

      (10) The fact that words were in random order per subject also makes the ISC analysis even more confusing to think about. The memory athletes had unique spatial routes (that they used for the method of loci) and unique word lists. So, why would it make sense to look at trial-level ISC? At a conceptual level, I simply don't understand what this is intended to capture.

      (11) Differences in the pattern of results between the encoding and temporal memory recognition task are hard to make sense of and are not addressed in much detail. Why would it make more sense to have across-trial similarity during recognition than during encoding? I think any account of this is very speculative.
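
      The control raised in point (8), whether trial-level alignment across subjects matters at all, can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the function names (`mean_isc`, `shuffled_isc`, `simulate`), the pattern sizes, and the noise level are all assumptions made for the example. It compares trial-matched inter-subject correlation with the same measure after shuffling trial order within each subject.

      ```python
      import math
      import random

      def pearson(u, v):
          """Pearson correlation between two equal-length vectors."""
          n = len(u)
          mu, mv = sum(u) / n, sum(v) / n
          cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
          su = math.sqrt(sum((a - mu) ** 2 for a in u))
          sv = math.sqrt(sum((b - mv) ** 2 for b in v))
          return cov / (su * sv)

      def mean_isc(patterns):
          """Trial-matched ISC: patterns[s][t] is subject s's pattern on trial t."""
          n_sub, n_trials = len(patterns), len(patterns[0])
          rs = [pearson(patterns[a][t], patterns[b][t])
                for a in range(n_sub)
                for b in range(a + 1, n_sub)
                for t in range(n_trials)]
          return sum(rs) / len(rs)

      def shuffled_isc(patterns, n_perm=200, seed=0):
          """Average ISC after permuting trial order independently per subject."""
          rng = random.Random(seed)
          total = 0.0
          for _ in range(n_perm):
              shuffled = [rng.sample(trials, len(trials)) for trials in patterns]
              total += mean_isc(shuffled)
          return total / n_perm

      def simulate(trial_specific, n_sub=4, n_trials=8, n_vox=20, seed=1):
          """Hypothetical data: shared signal is either trial-specific or common to all trials."""
          rng = random.Random(seed)
          common = [rng.gauss(0, 1) for _ in range(n_vox)]
          per_trial = [[rng.gauss(0, 1) for _ in range(n_vox)] for _ in range(n_trials)]
          return [[[(per_trial[t][v] if trial_specific else common[v]) + rng.gauss(0, 0.1)
                    for v in range(n_vox)]
                   for t in range(n_trials)]
                  for _ in range(n_sub)]

      content = simulate(trial_specific=True)   # shared content differs by trial
      process = simulate(trial_specific=False)  # same shared pattern on every trial
      print(mean_isc(content), shuffled_isc(content))
      print(mean_isc(process), shuffled_isc(process))
      ```

      If matched and shuffled ISC are about equal, as in the simulated process-level case, the effect does not depend on trial correspondence across subjects; a clear drop after shuffling would indicate trial-specific shared content.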

    1. On 2025-05-06 22:01:39, user Young Cho wrote:

      1. Key Findings:

         The researchers conducted a comprehensive comparison of 16S rRNA gene sequencing (metagenomics) and meta-transcriptomic (RNA-seq) analyses to profile the microbiota of the female reproductive tract (FRT). They revealed that 16S rRNA sequencing effectively identified the bacterial taxa present; however, this method did not account for the functional or metabolic activity of the bacteria. The meta-transcriptomic sequencing captured gene expression, identifying which microbes are transcriptionally active. This distinction is interesting, for it became clear that microbial communities inferred from DNA-based methods do not always reflect active or beneficial contributors to the local ecosystem. The study found profound differences between the DNA and RNA profiles from the same samples, leading to significantly different conclusions as to which microbes dominate the FRT environment.

         For example, the Lactobacillus species that are traditionally considered beneficial and dominant in healthy FRT were abundant in 16S profiles but exhibited low transcriptional activity in RNA-seq data. In contrast, the potentially pathogenic or dysbiosis-associated genera like Gardnerella and Prevotella were underrepresented in 16S data but demonstrated high transcriptional activity, especially in samples where DNA-based methods did not identify their presence as significant. These findings suggest that the mere presence of Lactobacillus may not be a reliable indicator of vaginal health unless the bacteria are also metabolically active. By exposing the divergence between microbial abundance and activity, the study challenges the assumption that taxonomic dominance equals functional influence, allowing the authors to propose that integrating both DNA and RNA-based molecular profiling is essential to an accurate understanding of the microbial dynamics in the FRT and to improve diagnostics and interventions for female reproductive health.

      2. Results:

      3. Figure 1 effectively supports the paper’s conclusion by demonstrating that integrating 16S rRNA gene sequencing and meta-transcriptomic analysis reveals the different microbes in the female reproductive tract. This means this approach can detect both live and dead microbes.
      4. Figure 2 shows the abundance of various microbes in the female reproductive tract from utilization of the dual approach and supports the dual method in significantly increasing or improving the detection of microbial composition.
      5. Suggestions for the box plots: add asterisks to mark significant differences, and possibly show only the top 10 most abundant or significant genera to reduce clutter and highlight the meaningful results.
      6. Figure 3 supports the conclusions by showing the microbial diversity in the endometrium by comparing it to the vagina based on the different sample types as well. The significance was shown by asterisks as well.
      7. Figure 4 strongly supports the conclusion by showing both DNA and RNA profiles across the samples and the limitations of depending on RNA-based profiling alone. The table also supports the importance of careful interpretation of RNA-seq data, as seen in the dramatic drop in read numbers from human reads to microbial reads.
      8. Suggestions include grouping or ordering samples on the x-axis by sample type; at present the order appears random. This would make it easier to compare patterns. For human vs. microbial reads, format the numbers clearly with commas or decimals.
      9. Figure 5 effectively shows overlapping and unique genera, as well as emphasizing method- and tissue-specific differences. They were able to show that the microbiome composition detected varied according to method and the type of sampling.
      10. Figure 6 supports conclusions by showcasing that the main genera varied by method and emphasizing that microbial activity does not necessarily equal abundance. Suggestions include adding significance asterisks to show the differences.
      11. Figure 7 reinforces the conclusions of the paper by showing functional activity vs structural presence and that specific genera in the endometrium may be undermined by relying on DNA-based approaches alone.
      12. Figure 8 strongly supports the conclusion that the 16S rRNA and meta-transcriptomic approaches result in different microbial profiles and that both approaches are essential to understand the endometrium microbiome. They directly compared DNA vs RNA endometrial biopsy samples.
      13. Figure 9 is an excellent figure illustrating the workflow and summary for characterizing the microbiome and microbiota in different sample types. It was clear about their methods, analyses, and objectives.
      14. Discussion:

          I liked that the beginning of the discussion section started off reiterating the importance of this study. One area that could have been improved was the first sentence. What diseases or medical conditions can 16S rRNA gene sequencing of the female reproductive microbiota help? Further into the discussion, I liked that the authors explained the importance of each experiment, for example going into detail about why the Tao Brush and decontamination were a necessary step. Another area of improvement would be discussing the future directions with this novel concept. After these findings, how else can the authors use it to advance their understanding of the female microbiota? Other than that, I thought the discussion summarized the findings of the study well.
      15. Methods:

          Overall, the methods were well-written with thorough explanations as to why each experiment was conducted, and this section was nicely organized. There are some gaps in information that could be elaborated upon to make the paper more digestible. For example, the study cohort consisted of women aged 27-42 years. I think the authors could have done a better job at explaining how they defined the exclusion criteria for reproductive age range. I noticed that when I google “reproductive age range”, there are a variety of ranges, and I am curious as to why the authors chose this range. In addition, the study cohort consisted of 44 women and the validation cohort consisted of 5 women. Why is the validation cohort so much smaller? Does this affect any statistical analysis? Another section that could be expanded upon is on page 27, where they discuss 16S rRNA gene sequencing. A bit more of an explanation as to why the V4 hypervariable region was amplified may be helpful. As for Figure 9, while the figure is relatively easy to follow, I think there were other ways to display the workflow that could have helped the readers more. Other sections such as the DNA and RNA isolation and Bioinformatics methods were easy to follow and understand.
      16. Strengths and Limitations:

          One of the strengths of this study lies in its innovative side-by-side comparison of 16S rRNA gene sequencing and meta-transcriptomic analysis applied to the same clinical samples from the female reproductive tract. This dual approach offers a more nuanced view of the microbiota by distinguishing between microbial presence and metabolic activity, an important distinction that previous studies relying solely on DNA-based techniques have overlooked. The authors implemented a rigorously controlled sample processing pipeline that included steps to minimize host RNA contamination, which increases the reliability of microbial transcript detection. The study is supported by robust bioinformatics workflows with clear visualizations like principal component analysis and taxonomic heatmaps that effectively illustrate the divergence between DNA and RNA-based microbial profiles. The findings have important implications for clinical diagnostics, for they suggest that relying solely on taxonomic abundance may be insufficient to assess microbial function or pathogenic potential in reproductive health contexts.

          There are a few limitations that constrain the broader applicability of the study’s conclusions. The relatively small sample size of only ten women limits the statistical power and restricts the generalizability of the results across diverse populations; moreover, the cross-sectional nature of the study means it captures a snapshot in time and cannot account for dynamic changes in the microbiome across different phases of the menstrual cycle, pregnancy, or infection. While the detection of microbial transcripts adds a valuable functional layer, the study stops short of validating gene expression with proteomic or metabolomic data, leaving open questions about whether detected transcripts translate to actual protein production or metabolic impact. Also, the authors do not account for host physiological factors such as hormone levels, immune activity, or vaginal pH, which could influence microbial transcriptional activity. Addressing these variables in future studies would help refine interpretations and improve the clinical relevance of microbial activity profiles.
      17. Editorial Decision:

          Overall, I think the paper is relatively well-written and breaks down each section in a digestible way. I am not often exposed to this type of research, but I was able to follow along. I have some minor suggestions, which mainly involve adding more detail to help the reader understand more.

          The overall paper does an excellent job in presenting the results in a way that allows the reader to follow along and understand their methods and the why. They effectively showed the benefits and specificity of the dual method through comparison of certain methods alone to emphasize how significant their dual method approach is. The results show the significance of implementing a dual approach for the potential clinical use to impact gynecological disease.

          Suggestions:

          ● Some results could be grouped together, such as Figures 2 and 3. It would be neat to show together both the abundance of microbes in the female reproductive tract and the diversity of microbes. Figures 7 and 8 could also be combined, since both cover the abundances of the most abundant microbes in endometrial brush vs endometrial biopsy samples and compare DNA vs RNA. Combining these figures into one would allow the reader to quickly see the pattern or any differences.

          ● In the methods section, there are a couple of spelling/grammatical errors. On page 25, under the sample collection header, the word “gynaecologist” is spelled incorrectly. The proper spelling for this should be gynecologist. On page 26, in “two additional aliquot were…”, it should be aliquots, written in the plural. Then, on page 27, the sentence reads “a double purification with magnetics beads…”; shouldn’t it be magnetic beads?

          ● In the discussion, they can place more emphasis on interpreting the discrepancies, such as why there is a discordance between DNA and RNA. They could also dig deeper as to why the RNA-based analysis provided higher resolution in detecting certain pathogens, even in the endometrium.

          Some minor revisions should be considered to strengthen the manuscript and improve its clarity and reproducibility. First, we recommend expanding the discussion on the clinical relevance of microbial activity profiling. For example, how might the distinction between dormant and transcriptionally active bacteria influence treatment strategies for recurrent bacterial vaginosis or fertility assessments? Second, it would be helpful to include a brief statement on whether sequencing batch effects were assessed or controlled, especially since subtle technical variability can influence community composition in small-sample studies. Clarifying this will reinforce confidence in the strength of the researchers’ findings. Third, the methods section should provide more detail in regards to RNA integrity metrics, for RNA quality is critical in meta-transcriptomic studies where degradation can skew transcriptional profiles.

          The paper fills a methodological and conceptual gap in the field and provides a framework for future studies incorporating both taxonomic and functional dimensions of microbiome analysis.
    1. On 2025-04-24 20:51:03, user Alizée Malnoë wrote:

      The manuscript by Peterman et al. investigates the role of microtubule dynamics in Langerhans cell morphology, phagocytosis, and directed migration in the epidermis. Through live imaging in zebrafish explants, the study shows that microtubules originating from a perinuclear microtubule organizing center (MTOC) guide the extension of dendrites for effective debris engulfment and enable precise migration toward tissue damage. When microtubules are disrupted, Langerhans cells become less efficient at phagocytosis and lose directional control during migration. These defects are linked to altered actin cytoskeleton polarity through the RhoA/Rho-associated kinase (ROCK) signaling pathway. The findings highlight how microtubule-dependent cell polarity enables immune cells to respond effectively within complex epithelial microenvironments. We found this study to be well-written and containing high-quality data that advances the fields of microtubule and immune cell biology. Overall, the data presented in this manuscript are done well and support the claims made by the authors. We outline some major and minor adjustments aimed at aiding the clarity of reporting and presentation.

      Major comments

      Page 10, Lines 286-289: We felt it was somewhat unsupported that F-actin accumulation in the trailing half of the cell was “consistent with the idea depolymerizing microtubules increases RhoA activity at the rear of the cell.” While the data clearly show a disruption in F-actin distribution with nocodazole treatment, we felt it was not clear that this would increase F-actin in the trailing half rather than evenly throughout the cell. Our lack of expertise in the field may lead to our misinterpretation of this sentence; however, we felt additional explanation is needed (e.g. on the Lifeact-mRuby reporter) to clarify the section and support the conclusions drawn. Consider including a schematic of the model to ease interpretation of the data shown in Figure 4.

      Minor comments

      Page 2: It may be more effective to explicitly introduce RhoA/ROCK in the introduction rather than first mentioning it on page 10. This could connect your ideas more thoroughly, even if it’s just a brief mention in the introduction.

      Page 3, Line 102: You mention that the mpeg1.1 promoter labels multiple macrophage populations. Is there a concern that you’re labeling more than Langerhans cells in the epidermis, and that cells could be confused due to their altered morphology during the treatment?

      Page 3: The writing may be clearer if all acronyms (i.e. EMTB as ensconsin microtubule binding domain, EB3 as end-binding 3) are defined at their first use.

      Figure 1D: We found this panel somewhat difficult to interpret. Consider showing this panel in two dimensions displaying the percentage of EMTB+ dendrites as a function of the number of dendrites per cell.

      Figure 1K: It appears that the nocodazole treatment has one outlier (value of 100 µm). Does removing this datapoint change the significance of the treatment on maximum dendrite length?

      Figure 2E: It was unclear how the distance between the MTOC and phagosome was determined, i.e. whether the phagosome was measured from the point most distal or proximal to the cell body.

      Figure 2J: We thought your data would be most effective if you showed both the number and percentage of engulfment events for both control and nocodazole-treated cells to demonstrate how many events happened under each condition.

      Figure 3B: It appears that there are fewer Langerhans cells present in nocodazole-treated samples. Is this a significant impact or just coincidence in the images shown? Furthermore, could off-target effects or toxicity be impacting the migration differences seen here?

      Page 10, Lines 263-264: There may be a typo here, where it’s omitted that the “Langerhans cells had a smaller meandering index” were nocodazole-treated.

      Figure 4: A quantification of RhoA activation, e.g., using immunoblot, would be stronger evidence to support the conclusion that disruption of microtubules alters actin polarity through the RhoA/ROCK signaling pathway. This may be technically challenging: can one compare pulled-down microtubules to quantify RhoA binding between treated and non-treated?

      Figure 5: We’re interested to see if nocodazole-treated Langerhans cells would respond similarly to vehicle-treated (5C) or paclitaxel-treated (5D-E), especially considering the impacts of nocodazole on dendrite morphology (decreased cell dendrite number with increased length) you showed in Fig. 1G and Supplemental Video 3. We don't think this is a necessary experiment but may be worth including to provide alternative evidence of the impact of microtubule alteration on cell migration. We also found the placement of Figure 4 to disrupt the line of thinking connecting Figure 3 and 5. Consider moving Figure 5 after Figure 3 for logical flow, as Figure 4 is more mechanistic and addressing the question of the role actin plays in this process.

      Page 14, Line 385: It seems there may be a typo here, where “n=128 cells counted from N=13 scales” should include that these are in paclitaxel conditions.

      Page 14, Line 395: You mention that “acute chemical perturbations” were used in this paper. We thought that the laser ablation and/or scratch injury assays may be more accurately described as a physical or mechanical perturbation rather than chemical, but this may be from a lack of familiarity with writing conventions within the field.

      Page 15: In the second paragraph under “Cell motility,” there’s no name given for the image processing software used, which we think would be helpful to include.

      Methods, Line 522: Could you write the exact percentage of DMSO used for the vehicle controls either here or directly in the figure legends?

      Supplemental Videos: We found your supplemental videos extremely informative. Would it be possible to include these in the main text?

      Madison McReynolds and Mandkhai Molomjamts (Indiana University Bloomington) - not prompted by a journal; this review was written within a Peer Review in Life Sciences graduate course led by Alizée Malnoë with input from group discussion including Sally Abulaila, Kim Kissoon, Michael Kwakye, Madaline McPherson, Habib Ogunyemi, Octavio Origel, and Warren Wilson.

    1. On 2025-03-10 04:21:18, user Young Cho wrote:

      Dear Authors,

      Thank you for sharing your insightful work, "In Silico Engineering of Stable siRNA Lipid Nanoparticles: Exploring the Impact of Ionizable Lipid Concentrations for Enhanced Formulation Stability." Your study makes an important contribution to the field of lipid nanoparticle (LNP) research by highlighting the role of ionizable lipid concentrations in siRNA encapsulation and stability. The use of coarse-grained molecular dynamics (MD) simulations and steered molecular dynamics (SMD) provides a detailed molecular-level understanding of LNP formation, which is particularly valuable for optimizing RNA-based drug delivery systems.

      Summary

      This study examines how neutral and positive ionizable lipids influence LNP stability and siRNA encapsulation efficiency. The findings indicate that LNPs with positive ionizable lipids encapsulate siRNA more effectively than those with neutral lipids, likely due to their integration with phospholipids and the prevention of siRNA escape. Interestingly, low and medium concentrations of both neutral and positive DLKC2 showed better compartment formation and encapsulation efficiency compared to high concentrations. Additionally, neutral lipids exhibited greater aggregation, which could impact LNP stability.

      Introduction

      Your introduction effectively establishes the relevance of this study by situating it within the broader context of LNP research. The discussion of existing challenges in siRNA delivery and the role of lipid composition is well-articulated. However, further elaboration on how your study builds upon previous experimental findings could help connect computational insights with practical applications.

      Results

      The figures and data presentation generally support your conclusions. Figure 4, which illustrates LNP compartment formation at different lipid concentrations, is particularly valuable in showing how high concentrations lead to instability. We do think that some quantitative metrics such as bilayer thickness, lipid density, or compartment size would enhance the strength of these findings. Additionally, since water is omitted from the visualizations for clarity, we think it would be beneficial to include a figure that shows water so we can visualize the hydration effects and lipid-water interactions.

      Discussion

      Your discussion effectively compares findings with prior studies, reinforcing that positive ionizable lipids enhance siRNA encapsulation and that lipid aggregation in neutral systems may reduce stability. While you mention previous molecular dynamics studies (e.g., Paloncýová et al. and Trollmann & Böckmann), we feel a more direct comparison of numerical data and trends from these works would further contextualize your results. Additionally, discussing potential experimental validation techniques (e.g., cryogenic electron microscopy or encapsulation efficiency assays) could provide future directions for integrating simulations with laboratory-based studies. However, that is just our opinion as we do understand that this study takes on a more computational approach and is still very impactful.

      Suggestions for Improvement

      1. Expand quantitative analysis in Figure 4 by including measurements of bilayer thickness, lipid density, and compartment size to provide a more rigorous validation of LNP stability.
      2. Clarify the role of hydration in lipid-water interactions since water was omitted in visualizations.
      3. Strengthen comparisons with previous molecular dynamics and experimental studies by integrating direct numerical contrasts.
      4. Discuss potential experimental validation approaches that could complement your computational findings and enhance their real-world applicability.

      Final Thoughts

      Overall, this paper presents a well-structured and valuable contribution to siRNA delivery research. With minor refinements in quantitative analysis, literature comparisons, and discussion of experimental validation, the study could be even more impactful. Thank you for your efforts in advancing the field of LNP-based RNA therapeutics!

      Best regards,

      UHM MBBE 602 Graduate Students

    1. On 2024-12-06 17:54:14, user Malte Elson wrote:

      The remarks below are a summary of the points discussed during the Cake Club of the Psychology of Digitalisation lab at University of Bern ( https://www.dig.psy.unibe.ch/studies/cake_club_/index_eng.html ). They do not reflect the opinions of each individual journal club participant. Any responses to these points should be addressed to Malte Elson.

      In their preprint, Spiess et al. (2024) illustrate the impact of influential data points on statistical significance in linear regression analyses. The authors reanalyzed data from three high-impact journals by searching for the term “linear regression” and digitizing graphs of the included papers (due to the absence of raw data). Their findings revealed that excluding influential data points often rendered previously significant results non-significant. The simulations included in the study largely confirmed expected outcomes, supporting the overall argument for incorporating leave-one-out analyses in data analysis practices. The authors ultimately advocate for broader adoption of such methods to enhance the robustness of statistical conclusions.
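
      For readers unfamiliar with the idea, a leave-one-out check of this kind can be sketched in a few lines. This is only an illustration under stated assumptions, not the preprint's implementation: the function names and the toy data are hypothetical, and the two-sided 5% critical t-values (2.306 for 8 degrees of freedom, 2.365 for 7) are passed in by hand so that the example needs nothing beyond the standard library.

      ```python
      import math

      def slope_t(x, y):
          """Return (slope, t statistic of the slope) for the least-squares line y = a + b*x."""
          n = len(x)
          mx, my = sum(x) / n, sum(y) / n
          sxx = sum((xi - mx) ** 2 for xi in x)
          sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          syy = sum((yi - my) ** 2 for yi in y)
          slope = sxy / sxx
          sse = max(syy - slope * sxy, 0.0)    # residual sum of squares
          se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
          return slope, slope / se

      def influential_points(x, y, t_crit_full, t_crit_loo):
          """Indices whose removal flips the significance of the slope.

          t_crit_full and t_crit_loo are the critical |t| values for n-2 and
          n-3 degrees of freedom; they are supplied by the caller (assumption).
          """
          _, t_full = slope_t(x, y)
          significant = abs(t_full) > t_crit_full
          flips = []
          for i in range(len(x)):
              xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
              _, t_i = slope_t(xs, ys)
              if (abs(t_i) > t_crit_loo) != significant:
                  flips.append(i)
          return flips

      # Toy data (hypothetical): nine unremarkable points plus one high-leverage point.
      x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
      y = [2, 1, 3, 2, 1, 3, 2, 1, 3, 12]
      print(slope_t(x, y))
      print(influential_points(x, y, t_crit_full=2.306, t_crit_loo=2.365))
      ```

      In this toy data set the full regression is significant only because of the last point; removing it leaves a slope that is close to zero.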

      We found the paper to be interesting and an illustrative contribution to statistical education, both in terms of the potential fragility of published claims and as an illustration of an intuitive but underused outlier detection method. We identified points that might allow the authors to strengthen future versions of the manuscript, including some critical points about potential weaknesses or absences in the current version of the manuscript.

      1) TERMINOLOGY CONFUSION AND REPORTING ISSUES<br /> * Graphs vs. Papers: There is some confusion regarding the unit of analysis, and probably some reporting errors: On p. 4, l. 115, the paper states that the sample was 24 + 30 + 46 = 100 graphs, whereas on p. 6, l. 170 the authors state they examined 100 publications (going by Table 1, this is a simple clerical error, and should say graphs).

      * Similarly, the description of the columns in Table 1 (p. 11) is confusing, and we think has at least one reporting error:

      * It is unclear what “Hits” represent: Are these unique papers, or do the search engines of Science/Nature/PNAS return the same paper multiple times for each instance of the search term (“linear regression”)?

      * What does "number of graphs that were not shown" mean? We think these are instances of linear regressions that simply were not reported with a corresponding graph in the original publication, but they could also be graphs that were missing, inaccessible, or excluded.<br /> * The “Articles” column is described as “number of Articles in which the analyzable graphs were found” (p. 11, l. 314), but we think these are the 21 articles in which the 29 “influential variables” were found. The number of articles with analyzable graphs is not reported. It thus remains unclear how many papers were included, and how many graphs were analyzed from each paper.

      * On p. 6, the authors report having identified 29 graphs in 21 papers in which the removal of one data point changes the result of a linear regression (see also Figure 1). On p. 6, l. 179 the “incidence” (“prevalence” would be the more accurate term) of changes in papers is reported as ~20%. However, this puts papers (21) in the numerator and graphs (100) in the denominator, which underestimates the prevalence. At the graph level, it should be 29/100 = 29%. The paper-level prevalence cannot be calculated because the authors do not report the number of papers with analyzable graphs (see above).

      * We strongly recommend reporting a PRISMA flowchart to clarify the inclusion/exclusion of graphs and papers. In the same vein, the paper lacks basic information about the included studies, such as sample sizes or the distribution of p-values. Other information, e.g. citation metrics, would also help emphasize the importance of the present study.

      * The authors refer to “Supplementary Data 1” (p. 4, l. 121) but provide no link.

      2) SAMPLING STRATEGY <br /> * The study focuses on digitizable graphs without overlapping data points, inherently excluding studies with (1) larger samples and (2) homogeneous effects, where overlapping data points should be more frequent. This selection skews the included papers towards studies with smaller samples and p-values near 0.05 (due to lower power and publication bias / p-hacking), which are more susceptible to the illustrated effects. This is not a problem per se, but means the findings (including the prevalence rate) are about a narrower population of studies. Either way, the selection effects should be discussed in the paper.

      * It is not fully clear how it was decided which graphs are analyzable and which are not. Moreover, on p. 4, l. 127-130 the authors state that the obtained regression parameters match those reported in the paper closely, but they do not further explain what exactly this means, or what happened when they did not match.

      3) ANALYSES AND CONCLUSIONS <br /> * The analysis does not account for dependencies when multiple graphs from the same paper, which will likely be based on the same data (which are then susceptible to the exclusion effects), are included.

      * In a way, the susceptibility of findings to the removal of a single data point is a restatement of issues related to small samples. Small samples are inherently more fragile, and larger sample sizes are more robust to the influence of removing (or adding) single data points and render p-values (and other estimates) more stable. This is not to say that the findings reported are not interesting; however, we were wondering whether a table of all included studies sorted by observed p-value and sample size would have flagged the same fragile papers. This is also not to say that dfstat is redundant, and we absolutely see the pedagogical value in being able to point at individual data points that “cause” a finding to be significant. Rather, we would be interested in the extent to which dfstat converges with common heuristics.

      * Relatedly, the authors decry that influence measures such as dfstat are largely ignored, even by statisticians (p. 4, l. 139). This may well be, but of course, statisticians (and non-statisticians) are obviously aware of issues related to low power and small samples, and one of these issues is the problem of spurious findings (e.g. due to few, extreme data points).

      * The authors largely blame frequentist statistics, particularly on p. 10, where e.g. they state that “[a]s long as stating significance or not is still based on the ubiquitous α = 0.05 threshold, these statements can be sensitive to the presence of a single data point.” (l. 282-284). However, it is unclear how this follows from their findings. Any inference (not just α = 0.05) could be susceptible to the influence of single data points when the estimate is close to the criterion. Moreover, particularly when the sample size is low, any metric’s value (e.g. point estimates) will vary as a function of the removal of individual data points, regardless of whether the inference is threshold-based or not. This is simply a property of statistical models fit to a limited amount of data. So again, the issue seems to be with small sample sizes.
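
To make the last point concrete, here is a minimal sketch of how a point estimate alone (no significance threshold involved) can swing when single observations are removed from a small sample. The data and the helper names (`ols_slope`, `leave_one_out_slopes`) are our own illustrative assumptions; this is not the authors' dfstat implementation, and the numbers are not taken from any study in the preprint.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_out_slopes(xs, ys):
    """Re-estimate the slope with each single observation removed in turn."""
    return [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


# Hypothetical toy data: a flat cloud of five points plus one extreme point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [2.0, 1.0, 2.0, 1.0, 2.0, 10.0]

full = ols_slope(xs, ys)
loo = leave_one_out_slopes(xs, ys)
print(f"full-sample slope: {full:.3f}")
print(f"leave-one-out slopes range from {min(loo):.3f} to {max(loo):.3f}")
```

The spread of the leave-one-out slopes is a property of the small sample itself, which is why we think the discussion should separate sample-size fragility from the choice of α.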

      4) RECOMMENDATIONS AND FUTURE DIRECTIONS<br /> Things we would have liked to see:

      * Additional analyses, such as leave-two-out or leave-k-out methods. The leave-one-out analyses are providing a good intuition of how fragile some small-sample study results are. Additional leave-k-out analyses would provide further information about the fragility of the entire sample.

      * So far, the authors are concerned with the fragility of results as an outcome of removing data points. An additional study exploring the reverse scenario would be valuable. Specifically, it could investigate how extreme an additional data point would need to be to alter results, and how adding non-extreme data points could mitigate the relative weight of extreme data points.

      * Discussing dfstat as a robustness metric (“How many individual data points would have to be removed/added to render a significant result nonsignificant or vice versa?”).

      * A discussion of how dfstat could be used for p-hacking by showing researchers which data points they would have to remove to turn a nonsignificant study result into a significant one.

      * The authors graciously and immediately shared data and code with one of us who requested it, and we thank them for this. We would like to see this data and code provided in a public repository and linked to in a future version of the manuscript.

      * We note that the authors chose to anonymise their data so that the reader cannot tell which original study’s results are robust or not. Personally, we think that meta-scientific interests are best served by making this information public; that is, we would like this data to not merely be used to illustrate the method but also inform the reader about the fragility or robustness of those publications’ results. Of course, not everyone agrees with this practice - perhaps the authors could comment on their perspective on this issue in a future version of the manuscript.
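
The leave-k-out and robustness-metric suggestions above could be prototyped roughly as follows. This is a sketch under our own assumptions: the function names (`two_sided_p`, `slope_p_value`, `min_removals_to_flip`), the toy data, and the brute-force search are illustrative and are not the authors' dfstat code. The p-value is obtained by numerically integrating the Student t density with Simpson's rule so that the example needs only the standard library; a statistics package would normally be used instead.

```python
from itertools import combinations
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|].

    Uses Simpson's rule (steps must be even); adequate for illustration only.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def slope_p_value(points):
    """p-value for the slope of a simple linear regression (needs >= 3 points)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0 or syy == 0:
        return 1.0  # no variation, so no estimable association
    r2 = min(1.0, sxy * sxy / (sxx * syy))
    if r2 == 1.0:
        return 0.0  # perfect fit
    t = sqrt(r2 * (n - 2) / (1 - r2))
    return two_sided_p(t, n - 2)


def min_removals_to_flip(points, alpha=0.05, max_k=3):
    """Smallest k for which deleting some k points moves p across alpha.

    Returns (k, indices_of_first_set_found), or None if no k <= max_k works.
    """
    significant = slope_p_value(points) < alpha
    for k in range(1, max_k + 1):
        if len(points) - k < 3:
            break
        for drop in combinations(range(len(points)), k):
            kept = [p for i, p in enumerate(points) if i not in drop]
            if (slope_p_value(kept) < alpha) != significant:
                return k, drop
    return None


# Hypothetical toy data, not taken from any study in the preprint.
points = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2), (10, 10)]
print("full-sample p:", round(slope_p_value(points), 4))
print("minimum removals to flip:", min_removals_to_flip(points))
```

The exhaustive search grows combinatorially with k, so for realistic sample sizes one would need a cap on k (as with the assumed `max_k` here) or a heuristic search. The same loop also makes the p-hacking concern visible, since it reports exactly which points to drop.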

    1. On 2024-07-15 17:30:04, user priyanka.bajaj3193@gmail.com wrote:

      Reviewed by Priyanka Bajaj and Christian B. Macdonald (UCSF)

      Summary:

      Fusion oncoproteins occurring from genomic rearrangements are commonly observed in cancers and often drive oncogenesis. Although these fusions frequently involve kinases or transcription factors, they are a diverse group at both molecular and functional levels, and a unified description of their oncogenetic properties is lacking. Robust methods for predicting oncogenicity of unknown fusions would be immediately clinically useful, making this an important gap. At a more basic level, this points to a gap in our ability to describe a key biological phenomenon. Some recent work has tackled this problem by examining the physicochemical properties of fusion oncoproteins, notably [1], but this is essentially still an open question.

      In this manuscript, the authors present a language model of fusion oncoproteins, FusOn-pLM, by fine-tuning ESM-2 with two recent databases of human fusion oncoproteins. They compare random masking vs. one using their previous fine-tuned ESM-2 model SaLT&PepPr and benchmark their results on a number of tasks, demonstrating reasonably increased specificity on specific tasks and improvement with non-random masking. The model training and benchmarking are sound and convincingly demonstrate the improvement.

      Despite this, the lack of clarity about what unifies fusion oncogenes is a major challenge. Language models can be powerful ways to learn these sorts of definitions in a less biased way, and in that light this is an important step towards clarifying this basic gap. However, as written, the work uses a working definition of fusion oncogene that is based on physicochemical properties that may or may not be specific to oncogenes. Examining the benchmarking tasks the authors use makes this clearer: they are almost entirely predictions of condensate and IDR properties rather than oncogenetic ones. The one truly cancer-specific benchmark, differentiating carcinoma classes, is fairly narrow and no model performs particularly well here. As a result, we are unsure how strongly this model will perform in discrimination or generalization tasks.

      Another general problem for the field is the lack of negative controls. Gene fusions are relatively common mutations, but bona fide oncogenic fusions are a small fraction of all fusions, making this a class imbalance problem. Even within tumors, the majority of fusions are thought to be passengers rather than driver mutations. Any predictor should be able to discriminate between these, but the lack of good data on non-oncogenetic fusions makes this challenging. This is evident in this work, where the model’s discrimination is not strongly tested.

      In summary, we believe this is technically strong work which addresses a pressing need, and which also presents some general strategies for domain-specific language model fine-tuning, but which is unfortunately hamstrung by defects in the available data and conceptualization of the field that are outside of the authors’ control. As presented, it will be of interest to AI practitioners and oncofusion researchers, but the clinical utility is unclear.

      Major points:

      1) As discussed, we think the concept of an “oncofusion” is somewhat diffuse, as it describes an extremely heterogeneous set of proteins. This makes the prediction task particularly difficult. While the introduction discusses the barriers to prediction of fusion oncoproteins due to their intrinsically disordered regions and large size, we believe a bit more care with the effective definition they are using is warranted. Related to this is the choice of FOdb to train their model, which is essentially a database of condensate properties of oncofusions rather than oncogenetic ones. The implications of this choice also warrant a bit more discussion.

      2) We wonder if there is a class imbalance problem. The databases used to fine-tune their model have a small fraction of possible fusion proteins, and don’t contain large amounts of negative training information. We are thus unsure if FusOn-pLM’s significant improvements over ESM-2 are specific to driver fusion oncogenes.

      3) The method is not contextualized with respect to prior work in computational oncofusion prediction and characterization. Such methods are few ([2],[3],[4],[5],[6] among others) but important to understand FusOn-pLM’s performance.

      4) Several experimental datasets for fusion oncogenes have been published, including [7], [5], and [8]. Evaluating FusOn-pLM on these would be a compelling way to show its utility and would provide a more specific oncogenetic task.

      Minor points:

      1) Figure 2D: Although FusOn-pLM does a slightly better job of classifying carcinomas into two classes (BRCA vs. STAD), the performance metrics for this task are the worst across the board. What does this mean for the prediction problem overall? Does the fact that IDR and condensate properties are much better predicted mean that the model is actually not learning an oncogenetic task? This seems worthy of more discussion.

      2) Figure 4A: The authors present a FusOn-pLM embedding visualization of fusion oncoproteins, along with the corresponding head and tail protein sequences. It would be beneficial to clarify whether the protein sequences used for the head and tail counterparts are full-length sequences or only up to the exon breakpoint that forms the chimeric fusion protein. This information can be included in the Materials and Methods section.

      3) Figure 4A: The authors demonstrate that FusOn-pLM is able to separate out fusions from their head and tail components. To demonstrate that it is learning more specific embeddings for fusion oncoproteins, a comparison of the embeddings with untuned ESM-2 would be appropriate.

      4) Figure 4B: In the main text of the Results section, the authors write “FusOn-pLM largely clusters sequences by key properties such as the fraction of polar, charged, and disordered residues as well as the propensity to form pi-pi and pi-cation interactions and prion-like domains, via the PLAC NLLR score.” From the data shown in Figure 4B, this conclusion seems fine by eye for polar residues and NLLR scores, but not for disordered residues and pi-pi/pi-cation interaction propensity. Without quantification of the clustering, we are not sure this statement is supported.

      References:<br /> 1. Tripathi S, Shirnekhi HK, Gorman SD, Chandra B, Baggett DW, Park C-G, et al. Defining the condensate landscape of fusion oncoproteins. Nat Commun. 2023;14: 6008.<br /> 2. Shugay M, Ortiz de Mendíbil I, Vizmanos JL, Novo FJ. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions. Bioinformatics. 2013;29: 2539–2546.<br /> 3. Abate F, Zairis S, Ficarra E, Acquaviva A, Wiggins CH, Frattini V, et al. Pegasus: a comprehensive annotation and prediction tool for detection of driver gene fusions in cancer. BMC Syst Biol. 2014;8: 97.<br /> 4. Lovino M, Montemurro M, Barrese VS, Ficarra E. Identifying the oncogenic potential of gene fusions exploiting miRNAs. J Biomed Inform. 2022;129: 104057.<br /> 5. Li J, Lu H, Ng PK-S, Pantazi A, Ip CKM, Jeong KJ, et al. A functional genomic approach to actionable gene fusions for precision oncology. Sci Adv. 2022;8: eabm2382.<br /> 6. Liu J, Tokheim C, Lee JD, Gan W, North BJ, Liu XS, et al. Genetic fusions favor tumorigenesis through degron loss in oncogenes. Nat Commun. 2021;12: 6704.<br /> 7. Frenkel M, Hujoel MLA, Morris Z, Raman S. Discovering chromatin dysregulation induced by protein-coding perturbations at scale. bioRxiv. 2023. doi:10.1101/2023.09.20.555752<br /> 8. Kobayashi Y, Oxnard GR, Cohen EF, Mahadevan NR, Alessi JV, Hung YP, et al. Genomic and biological study of fusion genes as resistance mechanisms to EGFR inhibitors. Nat Commun. 2022;13: 5614.

    1. On 2024-06-07 16:53:51, user Reviewer 6 wrote:

      I am a C. elegans researcher with some familiarity with the topics discussed. I do not personally know, nor have I interacted with any of the authors involved. I have read in detail both the preprint and the response in the comments. Below I provide some comments in the hope that they will hone arguments from both sides. For brevity, I refer to the authors of this preprint as “the authors” and Dr. Coleen Murphy as “CM”.

      Summary:<br /> In my view, there are two issues here (1): the technical reproducibility of the choice assay; and (2) the physiological importance of CM’s results in a natural setting given the points raised by the authors. While CM makes some valid arguments on (1) – the authors should really have shown at least a few assays that attempted to follow the protocol exactly as stated by CM – the deviations here are in my view minor enough to raise significant questions about the choice assay and its interpretation. I believe the authors are justified in stating that (2) if the variables discussed here indeed significantly obscure detection of the phenotype, then the ecological significance of the inherited learned avoidance in a natural setting is in question. This is especially important given that, contrary to CM’s response, the authors do in fact see learned avoidance of PA14 as well as daf-7 expression at P0 and F1 in some experiments (indicating that the learning was induced) but not beyond in the F2 progeny of these same worms which displayed learned avoidance. Below is a detailed discussion of these points.

      Specific comments:<br /> - CM states that the lack of naïve PA14 preference seen by the authors is a “serious cause for concern”. In CM’s 2024 paper (Fig 1, https://journals.plos.org/p... ), worms are tested for bacterial food choice between OP50 (the lab food) versus bacterial species C. elegans may be exposed to in the wild. However, it seems that worms naively avoid OP50 (i.e. ‘prefer’ test bacteria) in essentially every comparison made by CM. This is contrary to reports by other labs (PMID: 38228683, PMID: 38228683) and in my view potentially a more serious concern with the assay. Contrary to CM’s assertion, while CM’s group and others see *mild* PA14 preference in naïve worms, other groups also do not observe such a preference in naïve worms or report more variable results (e.g., PMID: 21172617, PMID: 28877481, PMID: 31371455). Overall, the authors did replicate P0 and F1 learned avoidance in some runs and had a “learning index” consistent with prior reports in these experiments, so I do not see how the lack of purported naïve PA14 preference (which is quite minor and variable to begin with) is a significant concern here. <br /> - Looking at the authors’ raw data (table S2) for individual experiments, it seems the authors used <200 worms as advised by CM for most of their plates. The “up to 770 on a spot” was from a single plate, so I do not think this would change the conclusions of the authors. The authors compared worm density with choice index and found that there is no correlation within the ranges tested here.<br /> - “no azide or other paralytic used” (CM) – the authors claim to have tested this and state that addition of azide did not affect their results. They also claim that worms make a choice within 15 minutes and do not leave the respective lawn in the first hour of the assay. But none of these data are shown (they should be). 
<br /> - It seems that CM’s group counts worms in proximity to lawns “if they are within a few millimeters of the bacterial spot.” (STAR protocol). This may introduce systemic bias given the OP50 and PA14 lawns are clearly visibly distinct. Again, this raises questions to me regarding the reliability of this assay for interpreting minute effects and making broad generalizations.<br /> - Aspirating worms for counting would be unlikely to affect results.<br /> - The fact that conditions tested by the authors are varied between experiments is in my view a strength of this study given they did not observe F2 effects in any of their tests (you would normally change parameters rather than keep repeating the same protocol if you were unable to reproduce something, no?). However, testing variables/conditions such as temperature, light/dark etc. are informative only in a context where the authors have first fully followed through on the exact CM protocol with no deviations. So, I do think it is crucial to show a few attempts where the protocol is followed exactly as stated by CM.<br /> - The use of Triton X after bleaching may be a concern as CM points out. Though seemingly low (0.01%), this may hypothetically make bleached (i.e. already somewhat stressed) embryos or newly hatched L1s more vulnerable to pathogenic bacteria or alter their physiology. I do not see a point in including Triton X during or after bleaching, it is not standard nor required and is certainly a confounding variable. However, given the CMC of Triton X is 0.02% and the authors use below this concentration and only during plating, I would be surprised if this led to a dramatic change in the phenotype observed.<br /> - I do not find CM’s critique on daf-7 expression to be substantive. CM asserts that the authors do not see elevated daf-7p::gfp expression. Except they do! Which is especially evident with the single copy (SC) construct Fig 2 under SC at both 20 and 25 °C. 
The magnitude of P0 daf-7 increase with the SC construct (~2 fold) is similar to what other groups observe at this generation (albeit with the multicopy strain, so it is hard to compare). I think the use of a single copy reporter is a strength of this paper, but in the future assays of daf-7 expression should really be done using endogenous CRISPR/Cas9 reporters. That an F2 response is not observed in runs where there is a high upregulation in the F1 generation is consistent with the authors’ interpretation.<br /> - The authors should show representative images of what is being quantified as CM states, as without this we do not know which neurons are being assayed. I do not think averaging both ASI neurons in a worm is a concern – even if there is an increase in one ASI, it would still be reflected in the average (as long as the correct neuron is being quantified). It may even reduce variability or bimodality to average the two, given the brightness of reporters on a confocal image can depend on the depth of the imaging plane as the authors state. <br /> - CM states that chunking is an unusual way to maintain the fluorescent strain. But this is a genomically INTEGRATED multi copy array (ksIs2), no? The point of the authors is that the fluorescence expression and associated Rol marker are unstable in their expression, which is not unusual for such integrated repetitive multicopy arrays. This is not an extrachromosomal array wherein fluorescent worms need to be picked to maintain the array, so CM’s statement that it is “standard accepted practice” to do so is simply wrong. In fact I find it quite concerning if CM’s group picks fluorescent worms to maintain this strain as it biases the worms for an epigenetic state in which the integrant is poised for expression, which may indicate other epigenetic issues in the strain’s background (i.e., lack of silencing of repetitive sequences). 
The instability of this strain I assume is why the authors obtained a single copy daf-7 reporter, which in any case would supersede any results obtained from a multicopy array. CM says nothing about the single copy integrant results, and given that the authors observe P0 and F1 upregulation with the single copy integrant, I think the case is solid that there is no response observed in F2 worms from F1s showing daf-7 upregulation. An endogenous CRISPR/Cas9 reporter (e.g., transcriptional/SL2::GFP if a translational fusion is not possible) would really push home this point. <br /> - CM states that the authors’ replicates show poor “consistency”. However, we can only see this because the authors, unlike CM, show each experiment independently! We have no idea whether every experiment CM performed actually displayed learned avoidance behaviour, given the source data for CM’s choice assays is apparently not public. CM’s reports only show all learning experiments in aggregate, and I believe if the authors aggregated all their runs herein to a single plot, they would indeed see a seemingly ‘consistent’ avoidance effect. CM could easily address this by releasing raw/source data for choice/learning assays.<br /> - CM claims that in their hands behaviour from a set of training plates is ‘always’ consistent, but data are not shown. Both sides need to avoid making important claims without showing data.<br /> - CM states that the authors’ use of the same population to assay and then maintain for the next generation may confound the results. Again, the authors need to do the assay exactly as stated by CM, but if a few extra minutes of suspension in buffer really so obscures the phenotype beyond any detection, then how ecologically relevant can it possibly be? To my knowledge, there is no major phenotype that is completely ablated by a few minutes additional incubation in buffer. 
By this standard nothing involving washing off worms in a buffer would be interpretable.<br /> - It is interesting that sid-1 and 2 mutants do not show a learned F1 avoidance, but daf-7 expression is still elevated. It may be sufficient to have one SID protein for elevated daf-7 expression in progeny but require both for the behavior. Given both sid-1 and 2 are RNA transport channels, without double mutants and reliable daf-7 readout from an endogenous reporter, it is difficult for either group to infer any epistatic relationships between these genes. <br /> - I read the protocol file with notes from CM. I did not find any changes that are severe enough to cause concern and it seems that these are more clarifications/updates than changes to the fundamental principles of the assay. I also did not find the authors’ statements on this disingenuous, as there were clearly differences between the original STAR protocol and the updates provided. It is important for both parties here to refrain from personal attacks and address the substance of the arguments made.<br /> - I did find that some details in the STAR protocol were excessive, e.g., the height of plate stacks. I appreciate the detail but again, this raises the question that if such artificial variables really influence the phenotype so severely that it is no longer at all detectable, how physiologically relevant or robust can the phenotype be? <br /> - The statistical error in the STAR protocol pointed out by the authors: it seems either CM is misinterpreting a two-way ANOVA or that this was an oversight. I did not find this point too important overall as correcting such a statistical error would not change the conclusion of CM’s papers given the magnitude of effects previously described. <br /> - CM states that expression of P11 is essential for TEI. 
In CM’s 2020 paper (Kaletsky et al) it is stated that: “moreover, training on a P11 mutant that disrupts the perfect match to maco-1 but conserves P11 secondary structure induced no avoidance (Fig. 4e)”. As written, it seems essential not just for TEI (F2 effect) but also the P0 learning itself (unless CM can clarify that it is only required for the F2+ effect and that in Fig 4e only F2+ are being tested). So as I understand it, if lack of P11 expression is the issue, then there should be no P0 or F1 avoidance at all in any of these runs. Given the authors do not see an F2 effect in worms with robust P0 and F1 responses, it seems that this point is moot. I also do not think the authors can be blamed for any putative lack of P11 expression as it seems that for this portion (PA14 growth) they adhered to the protocol quite closely and explored various PA14 lines including those obtained from CM’s and other labs.

      In summary, I think CM’s response is insufficient to alleviate many of the key concerns raised by the authors herein. I do not believe the lack of naïve PA14 attraction is a major concern, as there are literature examples where (a quite minor) naïve PA14 attraction is not observed. Furthermore, this is also confounded by CM’s recent (2024) paper wherein their worms prefer essentially every bacterium among a panel over OP50 in a naïve test, again contrary to prior reports from other labs. This makes me question the robustness as well as any broad conclusions that can be drawn from this assay.

      The authors do also observe P0/F1 learned avoidance and elevated daf-7 expression contrary to CM’s rebuttal. I agree that the effects shown are not consistent between experiments here, but we cannot say whether this is simply because here we see individual runs of an inherently inconsistent assay, whereas CM’s papers show only aggregated data (the source data for the choice assays are not public). The major concern is that in those populations with P0/F1 responses (meaning the learning has been successfully induced), there is no further inheritance of avoidance beyond F1, and similarly for daf-7 wherein populations expressing high daf-7 at P0 and F1 do not transmit this to progeny. I believe this precludes “basic concerns about [the authors’] bacterial and C. elegans growth conditions, assay conditions, and assay techniques”. Overall, while it is important for the authors to show a few runs where the protocol is followed exactly as described by CM, I believe the deviations here are minor enough that even if they were able to replicate the transgenerational effect successfully, the sensitivity of the effect to such minutiae would greatly diminish its physiological relevance to the worms - and its importance as an adaptive paradigm of transgenerational epigenetic inheritance - in a natural setting.

      I also do not find it constructive for any party involved to address anything other than the scientific substance of arguments or engage in personal attacks. Given the attention and broad reach these studies have garnered, as well as the important implications, it is essential – and the normal course of the scientific endeavor – for such claims to be rigorously tested.

      I also very much appreciate that the authors have shared these observations, and find it very commendable that CM has responded in a timely and comprehensive manner (as well as been responsive to the authors in refining their protocol).

    2. On 2024-06-05 18:16:30, user Coleen Murphy wrote:

      Point-by-point critique of Gainey et al. 2024:

      Figure 1: <br /> 1. (A-C) It has been reported by many groups that PA14 is mildly attractive to C. elegans, that is, given a choice between PA14 and OP50, worms choose PA14 [1,2]. However, in almost every assay shown in this paper, the worms prefer OP50 over PA14 – that is, they are already avoiding PA14 - prior to training (naïve preference), which is odd. This suggests that the authors are not using conditions that are standard, either in PA14 or OP50 growth or in choice assays (see note about choice assay performance). This is a serious cause for concern that is independent of any training conditions. In fact, as far as we can see, in only one case (Fig. 1C, F1) did their experiments replicate the naïve choice results observed by other groups. <br /> 2. Choice assays: their “choice assays” involve putting 3-4x the recommended number of worms on a plate (up to 770 on a spot!), letting them roam for variable amounts of time (“30-60 minutes”) without trapping them (no azide or other paralytic used), and then putting them in a 4°C incubator (which does not immediately halt worm movement), then counting them. None of this follows our published choice assay protocols, or the standard chemotaxis assay protocol [3–6]. Putting more than 200 worms on a single plate can lead to altered choice because of crowding. In the absence of a paralytic, worms change their preference due to various factors, including adaptation; therefore, in this case, the worms’ first choice (which is what we measure in all our assays) is not being measured. They also count the worms by “aspirating” the worms off of the plate, which is not standard in any behavioral assays, as far as we know.<br /> 3. Table 2 and Figure 1: There are almost no true replicates, as in each experiment, at least one condition is changed. (For example, the authors only tested the PA14 we sent them in one replicate - Exp 3). <br /> 4. daf-7p::GFP imaging experiments (Fig. 
1D, F, H) – Hunter and colleagues do not report seeing increased daf-7p::gfp expression in the P0 generation. Increased daf-7p::gfp expression after exposure to PA14 has been reported by multiple groups7, not just ours, and is usually not small or highly variable, as it is due to the combination of bacterial cues and P11 small RNA; if they cannot replicate this basic result, it suggests that something is seriously wrong with their protocols or technique, or their worms are very sick, even before trying to use our protocol to train worms. <br /> 5. Additionally, they do not report the expression of daf-7p::gfp in the ASJ neuron7, which is very strange, since we have been able to reliably replicate Meisel, et al.’s finding in the P0 generation. Therefore, it is not clear from which neuron the authors are quantifying daf-7p::gfp levels. <br /> 6. Instead of imaging and reporting fluorescence levels in individual neurons, the authors averaged fluorescence intensity/worm, which is explicitly not what we did or others have done, because different neurons in each worm can have different intensities – particularly if they are the ASI rather than ASJ neurons. <br /> 7. While we see modest decreases in fertility after PA14 training, the authors report severe decreases in fertility: about one fifth of normal egg production, and a severe developmental delay) in their F1 generation that we do not observe. Both facts indicate that their worms are very sick, even the worms that have not been exposed to PA14. If their worms are extremely sick, it might account for the small number of progeny, poor imaging results, and a developmental delay that shifted the training times. This could be a result of overbleaching, which causes developmental delays; the bleaching protocol described in Gainey et al. deviates from our published protocol. 
Additionally, they add Triton X100 to their final M9 wash, which is used (although at a higher concentration) to permeabilize embryos in other protocols. We are not aware of any bleaching protocols that include Triton in a wash step, and our lab certainly does not; this addition might also damage the progeny.

      Figure 2
      1. P0 imaging data suggest that the daf-7p::gfp response to PA14 is not reproducible in their hands; again, this has nothing to do with our paper or protocols, but rather suggests that they cannot replicate previous results in the field that precede our work.
      2. Does “25°C” mean that the worms were grown at or assayed at 25°C, or both? This high temperature is generally hard on the worms.
      3. Technical note: it appears that instead of consistently picking fluorescent daf-7p::gfp animals, the authors “chunked” large groups of worms, resulting in populations of non-fluorescent animals in their experiments.
      4. The scales of the P0 and F1 data are extremely different (due to sickness of the P0s?).

      Figure 3
      1. Note that panels A, C, and D are repeated from Figure 1.
      2. The authors discuss “OP50 aversion”, but this does not make sense, since both trained and untrained animals are placed on HGs after bleaching.
      3. Their naïve choice index in F1 is sometimes even lower than in the P0 (Fig. 3D).
      4. There is no consistency in their results across replicates, within experiments, or across figures of the paper: not just the inability to see an F2 effect, but in their naïve chemotaxes, P0 trained choice indices, and F1 results. The authors claim that their F1 assays are reproducible, but only 3 out of the 9 assays in this figure show F1 learned avoidance.
      5. In 3J, data that are not replicates, as they have been obtained using different conditions, have been pooled.
      6. Gainey et al. observe substantial variation in behavior between training plates (Figure 3, Table 2, S2 annotated protocol), and incorrectly treat each training plate as a biological replicate rather than a technical replicate. (Each training plate is seeded and grown in the same conditions, and worms from the same bleached population are added onto the plates; therefore these are not biological replicates but rather technical replicates. Biological replicates require starting with different worm populations and carrying out the whole experiment independently.) In our hands, behavior from a set of training plates is always consistent.
      7. Additionally, we note that the authors use the same population of worms for the choice assays and subsequently for bleaching, meaning that worms are held in liquid for an extended time before bleaching; this may cause the worms additional stress, which may interfere with behavior.
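
      To make the distinction in point 6 concrete, the following is a minimal Python sketch of collapsing training plates (technical replicates) to one value per independently repeated experiment before any statistics are run. The choice-index values, the experiment labels, and the function name `biological_replicate_means` are hypothetical; they are not taken from either lab's data or protocols.

```python
from statistics import fmean


def biological_replicate_means(plates_by_experiment):
    """Collapse technical replicates (plates) to one mean per independent experiment."""
    return {exp: fmean(values) for exp, values in plates_by_experiment.items()}


# Hypothetical choice indices: three independent experiments, three plates each.
plates = {
    "exp1": [0.10, 0.30, 0.20],
    "exp2": [-0.20, 0.00, -0.10],
    "exp3": [0.40, 0.50, 0.60],
}

means = biological_replicate_means(plates)

# Sample size if every plate is (incorrectly) counted as a biological replicate.
n_technical = sum(len(values) for values in plates.values())
# Sample size when only independent experiments are counted.
n_biological = len(means)
```

      With these made-up numbers, pooling plates would suggest n = 9, whereas counting independent experiments gives n = 3.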

      Figure 4
      1. OP50 growth conditions: this would only matter if the controls and experimentals were grown on different plate types, which is not the case (but if the authors are in fact putting the controls on different plates from experimentals, then the experiment is done incorrectly).

      Figure 5
      1. We also found that sid-1 and sid-2 are required, but since their controls are inconsistent (Fig. 3) in the first place, it is hard to know how to interpret their data.
      2. Other mutants (rde-1, hrde-1, sid-1, sid-2) still show increased daf-7p::gfp in F1; again, these data are hard to interpret since they do not show a wild-type control that worked here. This also has little bearing on our work, since other training paradigms (e.g., 4- and 8-hour training that engages small RNA-independent pathways) also induce daf-7p::gfp. It is also unclear which neuron (ASI vs ASJ) they are imaging.

      Discussion
      1. daf-7p::gfp - Picking fluorescent worms or rollers is standard worm husbandry; it is not a “result” to say that they noticed that Rol can be lost, but it does indicate that they should have discarded any results that they obtained before noticing that the array might have been lost in the worms they assayed. The fact that they have brought this up more than once suggests that they are not using standard accepted practices to maintain transgenic lines.
      2. Dennis Kim’s work on phenazine-induced avoidance has been oddly neglected in this work [7]. Kim’s group found that phenazine-1-carboxamide induces Pdaf-7::gfp expression in the ASJ neuron, which we see quite reliably in our assays as well. No Pdaf-7::gfp imaging of the ASJ neuron is presented in this work, suggesting that either the PA14 they grew also did not make phenazines, or their image analysis is unreliable.
      3. They made a lot of changes to our protocol (temperatures, light/dark, etc.). We cannot find in this paper a single example of an experiment that followed our protocol entirely.
      4. The authors make a point of calling OP50 a pathogen, which is odd; C. elegans grown on OP50 typically live for 2-3 weeks. They cite Garigan et al. 2002 [8], which showed that when worms get old (past 15 days) the pharynx eventually stops grinding up bacteria and the gut starts to fill up with OP50, and that killing bacteria does slightly extend lifespan, but this is not an effect observed in young (Day 1) animals on the short timescales used in the experiments here. In any case, since both control and trained animals are grown on HG plates with OP50, it cannot explain the behavior of the control animals.
      5. The authors also never replicate the “bias towards Pseudomonas in choice assays (Ha et al., 2010; Lee et al., 2017; Moore et al., 2019)”. Those papers also used OP50 vs PA14 to demonstrate this bias towards Pseudomonas, so it is unclear how the authors think that their failure to replicate this basic finding is somehow supportive of any of their arguments. It is more likely that there is something fundamentally wrong in their initial conditions that has prevented the replication of all other groups’ findings, not just ours. Moreover, in our experiments, other than the 24 hrs of training on PA14 vs OP50, our control and trained animals are always on the same plates. This argument makes no sense, unless the authors have introduced an additional variable of plating control worms on one kind of plate/bacteria and their trained animals on a different plate/bacteria (which we do not do).
      6. It is unclear why the authors grew worms at different temperatures. 20°C is the standard temperature for worm growth and assays.
      7. In our hands, the naïve OP50-PA14 choice index is not significantly different between P0 (when NGM plates are used) and the subsequent generations (when HG plates are used). The survival assay correlates well with the idea that their worms are very sick, much sicker than we see in our assays, although the sparse intervals in both assays make it difficult to draw any conclusions. In particular, it is not possible to conclude that the bacteria are “more lethal”, since they are trying to compare lifespans from two different labs. If the bacteria are more lethal, it might be due to their PA14 cultivation conditions or the health of their worms. Given that they see massive leaving and desiccation of worms, they might indeed be growing PA14 under much more pathogenic conditions.
      8. The authors state: “Near the conclusion of these experiments, we received an updated protocol that included several clarifying edits and additional deviations from the published protocols (C. Murphy, Personal communication).”

      We clarified our protocols, we didn’t “deviate” from them. This is a concerning way to present our email communications in which we tried to correct errors in their protocol and offer constructive advice; we even extended an invitation to Hunter to visit our lab to learn the assay. We are happy to provide these emails if necessary.

      In order to help others, we continuously update our lab’s protocols to make clarifications that will help future users. Any note from the Murphy lab is an example of this type of updating. For example, later we made a new bacterial construct that used a Kan marker and constitutive promoter instead of an Ara inducible promoter and Carb marker to streamline experiments. This is not a deviation, it is a natural progression of the research in our lab and our practice of continuously improving our assays and updating protocols.

      It is disingenuous for the authors to present our updates to our protocols as if we have “deviated” from them – in every instance, we gave the authors all of the information that we had available to us at the time. Our suggestions were made genuinely and in good faith, with the assumption that the authors wanted to get the assay working rather than using it to point out changes in our protocol.

      Moreover, this statement corroborates our assertion that all or most of the data in this paper seem to have been generated using a protocol that differs significantly from our lab’s, as the bulk of their experiments appear to have been done before contacting us: “Incorporating these changes into our procedures did not reliably alter our results.” (no data shown)

      1. “[T]his example of TEI is insufficiently robust for experimental investigation of the mechanisms of multigenerational inheritance” – The authors failed to test the fundamental requirement for transgenerational inheritance, that is, the expression of P11 sRNA by PA14, which only happens on plates at 25°C. Since they cite our subsequent papers where we first identified P11 sRNA as the key to TEI [9], then our finding that the Cer1 retrotransposon is also required for P11-mediated TEI [10], and then our finding that other Pseudomonas species use a similar small RNA to induce TEI [11], they are definitely aware of this fact. Thus, it is not clear to us why they have not attempted to test P11 sRNA levels while searching for conditions that would replicate our findings. As a result, we can never know whether P11 sRNA was produced in any of the conditions that the authors tested in the experiments shown.

      Together, Hunter and colleagues’ failure to replicate the basic naïve attraction to PA14 over OP50 demonstrated by other labs, their failure to replicate the P0 daf-7 expression published by other labs, and their failure to reliably replicate the P0 and F1 behaviors shown by other labs suggests to us that there are more basic concerns about their bacterial and C. elegans growth conditions, assay conditions, and assay techniques independent of any of the attempts to replicate the findings from our work.

      References
      1. Zhang, Y., Lu, H., and Bargmann, C.I. (2005). Pathogenic bacteria induce aversive olfactory learning in Caenorhabditis elegans. Nature 438, 179–184. https://doi.org/10.1038/nat....
      2. Ha, H., Hendricks, M., Shen, Y., Gabel, C.V., Fang-Yen, C., Qin, Y., Colón-Ramos, D., Shen, K., Samuel, A.D.T., and Zhang, Y. (2010). Functional Organization of a Neural Network for Aversive Olfactory Learning in Caenorhabditis elegans. Neuron 68, 1173–1186. https://doi.org/10.1016/j.n....
      3. Moore, R.S., Kaletsky, R., and Murphy, C.T. (2019). Piwi/PRG-1 Argonaute and TGF-β Mediate Transgenerational Learned Pathogenic Avoidance. Cell 177, 1827-1841.e12. https://doi.org/10.1016/j.c....
      4. Moore, R.S., Kaletsky, R., and Murphy, C.T. (2021). Protocol for transgenerational learned pathogen avoidance behavior assays in Caenorhabditis elegans. STAR Protoc. 2, 100384. https://doi.org/10.1016/j.x....
      5. Kauffman, A.L., Ashraf, J.M., Corces-Zimmerman, M.R., Landis, J.N., and Murphy, C.T. (2010). Insulin Signaling and Dietary Restriction Differentially Influence the Decline of Learning and Memory with Age. PLoS Biol. 8, e1000372. https://doi.org/10.1371/jou....
      6. Kauffman, A., Parsons, L., Stein, G., Wills, A., Kaletsky, R., and Murphy, C. (2011). C. elegans Positive Butanone Learning, Short-term, and Long-term Associative Memory Assays. J. Vis. Exp., 2490. https://doi.org/10.3791/2490.
      7. Meisel, J.D., Panda, O., Mahanti, P., Schroeder, F.C., and Kim, D.H. (2014). Chemosensation of Bacterial Secondary Metabolites Modulates Neuroendocrine Signaling and Behavior of C. elegans. Cell 159, 267–280. https://doi.org/10.1016/j.c....
      8. Garigan, D., Hsu, A.-L., Fraser, A.G., Kamath, R.S., Ahringer, J., and Kenyon, C. (2002). Genetic analysis of tissue aging in Caenorhabditis elegans: a role for heat-shock factor and bacterial proliferation. Genetics 161, 1101–1112. https://doi.org/10.1093/gen....
      9. Kaletsky, R., Moore, R.S., Vrla, G.D., Parsons, L.R., Gitai, Z., and Murphy, C.T. (2020). C. elegans interprets bacterial non-coding RNAs to learn pathogenic avoidance. Nature 586, 445–451. https://doi.org/10.1038/s41....
      10. Moore, R.S., Kaletsky, R., Lesnik, C., Cota, V., Blackman, E., Parsons, L.R., Gitai, Z., and Murphy, C.T. (2021). The role of the Cer1 transposon in horizontal transfer of transgenerational memory. Cell 184, 4697-4712.e18. https://doi.org/10.1016/j.c....
      11. Sengupta, T., St. Ange, J., Kaletsky, R., Moore, R.S., Seto, R.J., Marogi, J., Myhrvold, C., Gitai, Z., and Murphy, C.T. (2024). A natural bacterial pathogen of C. elegans uses a small RNA to induce transgenerational inheritance of learned avoidance. PLOS Genet. 20, e1011178. https://doi.org/10.1371/jou....

    1. On 2024-01-16 14:43:15, user Reviewer1 wrote:

      This study investigates the distribution of food source partitioning across major groups of the animal kingdom. The overarching aim is to create a global trophic pyramid of biomass, partitioned by food source. The authors collected a large dataset on diet composition from the literature and other sources, ensuring a broad taxonomic spread. They then estimate diet partitioning for major taxonomic groups (~class) by averaging species-level data, and further estimate partitioned food source biomass by multiplying with class-level biomass estimates. This is taken to provide a representation of a trophic pyramid, and the findings are discussed in the light of this concept. The major claim of this study is that they find a middle-heavy trophic pyramid, with invertivory more prominent (by biomass) than herbivory.

      The study pursues a very interesting question in studying the trophic pyramid on a global level. The authors have invested a lot of effort in compiling a large dataset on species-level diet partitioning, and such a dataset would certainly be very valuable for species-level comparisons and analyses, such as the taxonomic distribution of feeding styles or the evolutionary history of feeding specialisations. However, such questions are not the focus of the present study. Rather, an attempt is made to convert this species-level dataset into a trophic pyramid of food source biomass. In the process, the authors make several sweeping assumptions and generalisations, resulting in analyses that are not at all well supported by the underlying data.

      First, the conversion of species-level data to class-level partitioning of food sources, by averaging the data from available species, assumes that the compiled species are representative of the group (class) as a whole, and that a simple species average would provide a meaningful group average. Both are highly doubtful and not supported by any data.

      Second, the assumption is made that the class-level partitioning of food sources can be transformed into a partitioning of diet biomass by multiplying it with that group’s estimated biomass value. However, this will yield the biomass of that specific partition (e.g., the combined body mass of all vertebrate herbivores) and not the biomass of their diet.

      Third, species groups (and their biomass) are assigned to a trophic level by their food source type, which leads to the three categories “herbivores” (= primary consumers), “invertivores” (= secondary consumers) and “vertivores” (presumably considered as predators including apex predators as they are placed at the top of the pyramid in Fig. 2). This is a strong oversimplification and does not represent a trophic pyramid. Most worryingly, the category “invertivores” will lump many higher-level consumers (third-level, fourth-level…) into the secondary consumer category, which as a result has by far the highest proportion (= biomass in this analysis). Thus, one of the key claims of the study, that the global trophic pyramid is middle-heavy, is likely due to a methodological artifact.
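
      The lumping problem can be illustrated with a toy food chain. The sketch below assigns each taxon a trophic level of 1 plus the mean trophic level of its prey (a common simplification that ignores diet proportions) and compares it with a food-source category of the kind used in the manuscript. The chain, the taxon names, and both function names are hypothetical and serve only as an illustration.

```python
def trophic_level(taxon, diet, _cache=None):
    """Return 1.0 for producers, else 1 + the mean trophic level of the taxon's prey."""
    if _cache is None:
        _cache = {}
    if taxon not in _cache:
        prey = diet[taxon]
        if not prey:
            _cache[taxon] = 1.0
        else:
            prey_levels = [trophic_level(p, diet, _cache) for p in prey]
            _cache[taxon] = 1.0 + sum(prey_levels) / len(prey_levels)
    return _cache[taxon]


# Hypothetical linear food chain: each taxon maps to the list of taxa it eats.
diet = {
    "plant": [],
    "aphid": ["plant"],
    "ladybird": ["aphid"],
    "spider": ["ladybird"],
    "spider_wasp": ["spider"],
}
plants = {"plant"}
invertebrates = {"aphid", "ladybird", "spider", "spider_wasp"}


def food_source_category(taxon, diet):
    """Classify a taxon only by what it eats, ignoring its position in the chain."""
    prey = diet[taxon]
    if not prey:
        return "producer"
    if all(p in plants for p in prey):
        return "herbivore"
    if all(p in invertebrates for p in prey):
        return "invertivore"
    return "mixed"


levels = {taxon: trophic_level(taxon, diet) for taxon in diet}
categories = {taxon: food_source_category(taxon, diet) for taxon in diet}
```

      In this toy chain the ladybird, spider, and spider wasp sit at trophic levels 3, 4, and 5, yet all three fall into the single category “invertivore”.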

      In summary, the study attempts a methodological shortcut for deriving a trophic biomass dataset from species-level data, without verifying the assumptions. At the current time, there appears to be no ready substitute for species-level abundance or biomass data. Until such data are available for the majority of organisms, analyses of trophic pyramids on a global level may be premature.

      Recommendations for the authors:

      As mentioned in my public review, I commend the authors on compiling such a large and potentially very valuable dataset on species-level diet partitioning. I believe such a dataset can be very informative for species-level analyses, or possible investigations into the evolution of such partitioning. However, such a dataset cannot be transformed into a trophic dataset without corresponding data on species abundances and/or biomass. Your attempts to perform this transformation without such data unfortunately fall short, as they require a series of sweeping assumptions that are almost entirely unsupported by real-world data.

      I will attempt to explain my views in the sections below:

      Title
      The title is misleading: in the current form, the manuscript deals with many more analyses than the number of herbivore and predatory species in each class. Though as I mentioned, this species-level analysis is actually the most relevant (and valid) analysis in your study while the trophic pyramid aspect is not.

      Introduction
      You provide a very nice overview of the different concepts of trophic pyramids and their development over time. As you point out, all these variants of the pyramid include a measure of scale for each level, such as ‘abundance’, ‘biomass’, or ‘energy’. It is also implicit in this introduction that this concept considers multiple levels (L42: “…food chains…”, L45: “…and so on up to…”) and not just three as in your following analysis.

      Materials and Methods
      The success of the method hinges on the representativeness of the selected species. Such representativeness is highly unlikely, as data on diet composition will be much more readily available for large or well-studied organisms, which are not necessarily the most important members of their class (by number or biomass). The authors themselves acknowledge that for many groups, even with a minimum of ~500 species per group, still only ~0.3 to 1.3% of described species are covered for Insecta, Arachnida, Mollusca and Crustacea (L265-267). In addition, I would strongly argue that even with good taxonomic coverage, as is achieved for birds and mammals, calculation of the group average has to consider the highly differing abundance and/or biomass of separate species. To illustrate these points, I would like to highlight the study’s data on the Arachnida (Figs. 1 and 4). About 20% of their diet is considered as “parasite vertebrate”, with a considerable biomass. Without knowing the details of the species that were considered, I would assume that the majority of these are ticks, as these feed on (mostly) vertebrate blood. Roughly speaking, we know of maybe 60 000 species of Arachnida, of which perhaps 1000 are ticks. On the species level, ticks therefore seem to be highly overrepresented in the dataset, possibly because it is straightforward to infer their food source from their specialized morphology. On the other hand, the Arachnida data do not seem to include very many oribatid mites, of which there are around 12 000 known species that are almost exclusively detritivores. In addition, oribatid mites are known to be extremely abundant in soils, so their biomass is likely many times that of ticks. A similarly obvious over-representation in terms of diet and biomass occurs in the marine dataset with “vertivore crustacea”. Please note that I only picked some obvious examples here, but that the same issues will be prevalent in all animal groups.
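
      The difference between a simple species average and a biomass-weighted average can be shown with a toy calculation. The sketch below uses three made-up arachnid records; the taxon labels, diet fractions, biomass values, and function names are all hypothetical and are not drawn from the manuscript's dataset.

```python
def species_average(records, key):
    """Unweighted mean of a diet fraction across species records."""
    return sum(r[key] for r in records) / len(records)


def biomass_weighted_average(records, key):
    """Mean of a diet fraction weighted by each species' biomass."""
    total_biomass = sum(r["biomass"] for r in records)
    return sum(r[key] * r["biomass"] for r in records) / total_biomass


# Hypothetical records: two tick species with low biomass and one
# oribatid mite species with high biomass (arbitrary biomass units).
records = [
    {"taxon": "tick_A", "vertebrate_parasite": 1.0, "biomass": 1.0},
    {"taxon": "tick_B", "vertebrate_parasite": 1.0, "biomass": 1.0},
    {"taxon": "oribatid_mite_A", "vertebrate_parasite": 0.0, "biomass": 98.0},
]

unweighted = species_average(records, "vertebrate_parasite")
weighted = biomass_weighted_average(records, "vertebrate_parasite")
```

      Under these assumed values the species average attributes about two thirds of the group's diet to vertebrate parasitism, while the biomass-weighted figure is 2%.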

      Indeed, I believe that your method “validation” using bird species data shows that your estimate can be very unreliable, even for a well-covered group such as birds. Your Results (L345-347) show that “the respective contribution of invertebrates and vertebrates switched from 56% and 8% in the estimate to 23% and 45% in the species-weighted partitioning”. These are very large differences.

      A further point I would like to raise: using an animal group’s biomass to gauge the biomass of the separate diet partitions seems to oversimplify matters. You are assuming that the body biomass equals the diet biomass. However, foods have very different nutritional content (e.g., carbohydrates/protein/fiber). A panda and a polar bear may have fairly similar body weights, but the panda needs to eat a much larger biomass of plant matter because of its poor nutritional content.

      Overall, the Methods section is a little disjointed, and is difficult to match to the Results section. Also, some of the chosen methods are not well justified or explained. E.g.,
      - How were Wikipedia sources selected and “confirmed” (L130), or how was the literature searched (L132)?
      - How did you incorporate a diet category that only exists for a single class (“plant-derived”, L150)?
      - How did you deal with separate diet data for juveniles and adults (L157)?
      - L184ff: It remains unclear why you compare your global dataset to two location-specific datasets. What did you aim to achieve? A validation of the global dataset in this manner appears dubious, as local datasets may always remain location-specific.
      - What is your justification for collecting a further dataset on dinosaur diet? You mention that you aim “to test if herbivory is related to higher body mass and lower metabolic rate” (L206), but then compile only diet data for these dinosaurs (inferred from dental morphology, adding a further level of uncertainty), and no data on body mass or metabolic rate. In addition, I would think that your dataset on mammal diet composition would be much more suitable for this purpose, as it appears to be quite comprehensive and would include many species with “high” body mass. Also, in extant mammals the diet composition has presumably been directly quantified, and not just inferred from dental morphology.
      - L214ff: Why have a specific method for assessing human diet? We are just one more species in your dataset.
      - L223ff: The use of reptile biomass data for amphibians is not justified. Your assessment that the differences in average body mass and population density ‘cancel each other out’ cannot be verified. If you do not have a good biomass estimate for amphibians, you cannot include this group in the analysis.
      - L265ff: Your statistical “validation” of achieving representative data from poor species coverage is inappropriate. By sampling 0.3% of bird species 10 000 times and calculating an average, you merely verify that you can calculate a good average from ~300 000 (~30 species x 10 000) data points that are, overall, randomly sampled. To “validate” your approach, you need to investigate the variance of your 10 000 repeat samples, which presumably is extremely large.
      - L265ff: The Methods appear to be incomplete here, as the Results section describes an analysis that was weighted by bird species biomass and abundance (L340).
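
      The point about the L265ff validation can be sketched in Python: averaging many small subsamples recovers the overall mean almost by construction, whereas the spread of the individual subsample estimates shows how unreliable any single ~0.3% sample is. The diet values below are synthetic, and the population size (10 000 species), the sample size (30), and the function name `subsample_estimates` are assumptions chosen only to mirror the numbers mentioned above.

```python
import random
import statistics


def subsample_estimates(values, sample_size, n_repeats, seed=0):
    """Draw n_repeats random subsamples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(values, sample_size))
        for _ in range(n_repeats)
    ]


# Synthetic "invertebrate fraction of diet" for 10 000 hypothetical species.
population_rng = random.Random(42)
population = [
    population_rng.choice([0.0, 0.0, 0.1, 0.5, 0.9, 1.0]) for _ in range(10_000)
]
true_mean = statistics.fmean(population)

estimates = subsample_estimates(population, sample_size=30, n_repeats=10_000)

# What averaging the repeats shows: the grand mean sits close to the true mean.
mean_of_estimates = statistics.fmean(estimates)

# What matters for a single real-world sample: the spread of the estimates.
spread = statistics.stdev(estimates)
ordered = sorted(estimates)
lower, upper = ordered[250], ordered[9749]  # central ~95% of the estimates
```

      Comparing `mean_of_estimates` with `true_mean` reproduces the kind of agreement reported as validation, while `spread` and the interval from `lower` to `upper` describe the uncertainty attached to any one sample of 30 species.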

      Results
      Throughout the manuscript, but particularly noticeable in the Results section, you are using misleading terms to refer to your data and results. I believe this stems from the multiple assumptions you make to derive trophic pyramid data from a species-level dataset. E.g.
      - Fig. 1: “species in most animal groups”; this figure shows the group average diet composition, not the species proportions.
      - L355: “partitioning of diets… expressed as biomass (Fig. 2)”; this figure actually shows the biomass of the trophic group, not their diet.
      - Etc.

      L333: “we assumed a homogenous distribution of biomass across trophic levels in each group” – a further example of an unfounded assumption that weakens your analyses and conclusions considerably.

      The data on dinosaur diet is missing from the Results.

      Discussion
      As outlined above, I believe that your main conclusion of a middle-heavy global trophic pyramid is not supported by your analyses, nor are your other conclusions on the trophic pyramid. Your study does not support the conclusion of a “paradigm shift” (cf. L407).

      Finally, some further minor comments:
      L173: what is the category “Food I”, and why is it relevant to mention these categories here?
      L311: Conservation areas might include some “important species” that are missing elsewhere, but that should not distract from the fact that species lists remain highly biased and incomplete there, as everywhere. Most obviously, Kruger NP is bound to have more than 13 species of insect (Fig. S4). And certainly such species lists do not consider the microfauna to a meaningful degree.
      L402: It seems very unfair to disparage previous efforts as biased, when your own study is based on highly incomplete datasets and unfounded assumptions.
      L476f: I find the definition of a carnivore from Román-Palacios et al. in this context highly misleading. Heterotrophs include fungi, which does not make a fungivore a carnivore.
      L494: There might have been larger insects in the prehistoric past (at least we know of one large dragonfly), but that hardly makes them “megafauna”.
      L523: “a world without insect would potentially mark the end of complex life on Earth” – there is certainly complex life in marine environments, where insects are not prevalent and their potential decline might not have large impacts.
      L676: “more abundant” – you are not considering abundance here.
      Fig. S8: Here you are literally comparing a species group with a single species (humans). I presume that your reasoning is that the diet of humans has important impacts on the global food web. This is a nice case in point that you absolutely need species-level information on abundance/biomass to construct trophic pyramids and food webs.

    1. On 2023-11-14 16:49:20, user James Mallet wrote:

      Congratulations on this provocative paper which I read with great interest.

      However, I have some questions about the meaning of the results. Your paper suggests that previously, the prevailing belief has been that there is more hybridization, and therefore more gene flow between species, in plants than in animals. However, your preliminary discussion suggests that this is actually an artefact of “rely[ing] on morphological traits to arbitrarily define species (16),” where ref. 16 is Mallet 2005 in TREE. Although it is true that the data summarized in Mallet 2005 was indeed based largely on morphologically identified species (and their hybrids), it doesn’t rely on a morphological species concept. Anyone who knows taxonomy of any group of organisms knows also that morphology is a rather good, although not foolproof, guide to species status; two sister species, when they co-occur in sympatry, will typically display two modes in multivariate morphospace. Actually, Mallet in 1995 and 2005 argues for a genotypic cluster definition of species, which certainly applies to molecular markers as well as morphology. Two related species, if they co-occur in sympatry, will display a series of genetic differences that enables them to be identified, even if they hybridize. There are two modes in the multivariate genotypic distribution; the relationship with the classical taxonomist’s morphological identification of species is clear.

      Then you argue “the emergence of molecular data ... enables substituting the human-made species concept with genetic clusters that quantitatively vary in their level of genetic distance (18),” where ref. 18 is Galtier 2019 in Evolutionary Applications. Now that is interesting, as I think Galtier proposes “Species are defined as entities sufficiently diverged such that gene flow (arrows) is very rare or inexistent” (his Fig. 1). In other words, he appears to have a species concept such that gene flow between species is zero. Any gene flow, he argues, would render the situation “ambiguous”.

      Later, perhaps recognizing that this is too extreme, Galtier proposes using a reference species based system: “...to identify taxa in which large amounts of data are available, and species boundaries are consensual, or can be agreed on. Species delineation in any other taxon could thus be achieved so as to maximize consistency with the reference [taxa].”

      Now perhaps this dickering about what is a species appears rather unreasonable, since I think we all know (and Nicolas Galtier certainly seems to agree) that there is a continuum between populations that are not species and those that are species. However, in order to disprove the prevailing narrative that plant species hybridize more than animal species, you really must take a stance on what you mean by a species, and what you mean by a population that is not a species. My natural history knowledge of flowering plants and animals such as insects and birds suggests that plant species that co-occur in sympatry really do have a higher rate of hybridization than animal species. Not only is a greater fraction of species involved, but when they do hybridize, there are usually a lot more hybrids.

      But you will say perhaps: “that is not really the question we attempt to answer.” And indeed it is not, so perhaps you should not have complained that that finding about whether species hybridize was an artefact, which you appear to do.

      The question you are really attempting to answer, I think, is: “is introgression more common in plants than in animals for a given level of genetic divergence, DA?” Rather than a question about species, it seems to me you are asking a question here that is independent of what your (or the reader’s) species concept is (unless you argue that a species has a certain threshold level of genetic divergence).

      After arguing that “the Tree of Life” is “interrupted by species barriers that are progressively established in their genome as the divergence between evolutionary lineages increases,” you then argue that “The consequences of reproductive isolation can therefore be captured through the long-term effect of barriers on reducing introgression locally in the genomes, which provides a useful quantitative metric applicable to any organism (4).”

      Ref. 4 is Westram et al. (2022) J. Evol. Biol. “What is reproductive isolation?” Westram et al. show that it’s actually very hard to measure overall reproductive isolation, RI, which they say is determined by the level of “effective migration” at neutral loci, or the fraction of the rate of neutral genes that actually establish (reduced due to species barriers) in the recipient population, me, divided by the rate of “potential gene flow,” m, into the population caused by the potential for hybridization and backcrossing, or RI = 1 - me/m. Effective gene flow depends on where in the genome you measure it; in which direction you measure gene flow; whether populations are parapatric or sympatric; whether you want to measure it using an “organismal” or “genetic” focus (in Westram et al.’s terminology). Furthermore, it depends on who is measuring it and how. Everyone who measures it seems to have somewhat different measures of reproductive isolation (Sobel, J. M., & Chen, G. F. (2014). Unification of methods for estimating the strength of reproductive isolation. Evolution, 68, 1511–1522). It doesn’t provide a very useful comparative measure applicable at the whole species level at all. My colleague from Boston University and I conclude from perusing the lengthy discussions in Sobel & Chen and Westram et al. that measuring overall reproductive isolation is unlikely to be useful, and we would be better off just accepting that it is a vague heuristic which expresses something about species (Mallet, J., & Mullen, S.P. 2022. J. Evol. Biol. 35:1175-1182). In contrast, one can readily measure some of its many components, such as “hybrid inviability”, “assortative mating” and so on, and these remain useful and interesting at the whole species level and as comparative indicators.
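
      For concreteness, the definition quoted above, RI = 1 - me/m, can be written as a short function. This is only an illustrative sketch: the function name and the input checks are assumptions made here, not anything taken from Westram et al., and it presumes that both rates have already been estimated somehow, which is exactly the hard part.

```python
def reproductive_isolation(m_e: float, m: float) -> float:
    """Return RI = 1 - m_e / m.

    m_e: effective migration rate at neutral loci (after barriers act).
    m:   potential gene flow into the population (before barriers act).
    Both inputs are assumed to be known; estimating them is the hard part.
    """
    if m <= 0:
        raise ValueError("potential gene flow m must be positive")
    if m_e < 0:
        raise ValueError("effective migration m_e cannot be negative")
    return 1.0 - m_e / m


# No effective gene flow at all gives RI = 1 (complete isolation);
# effective gene flow equal to potential gene flow gives RI = 0.
print(reproductive_isolation(0.0, 0.1))
print(reproductive_isolation(0.1, 0.1))
```

      The point of the sketch is that RI needs two inputs, and sequence data can at best inform the first of them.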

      Again, it may seem a distraction that I am discussing what is reproductive isolation, but it seems important here, because you are using a measure of reproductive isolation, and then relating it to genetic distance. In Westram et al., the main concern was to develop an experimental measure of reproductive isolation. Westram et al. cautioned against estimating reproductive isolation from sequence data, which is the method you employ here. Their reasoning is that sequence divergence is a consequence only of actual gene flow, me (after taking into account barriers to gene flow), and that there is no way of estimating “potential gene flow” from the same data. In the main part of the paper (e.g. the data points in Fig. 1A), there seems to be a non-continuous measure of reproductive isolation, such that “migration” has a value of 1, whereas “isolation” has a value of zero. It was not entirely clear to me why this should be so, since, whatever it is, it seems clear to me that reproductive isolation should surely be a continuous parameter. Delving into the supplement, I found that “genetic isolation” was indicated “when our ABC framework yields a posterior probability P(migration) < 0.1304. This threshold was empirically determined by the robustness test conducted in (Ref. 6).” Similarly, the same robustness test yielded “strong statistical support for ongoing migration ... when the posterior probability P(migration) > 0.6419.” Pairs of taxa with intermediate posterior probabilities were considered “ambiguous” and were discarded. Note that P(migration) is not the actual mixing rate of the populations, me, or the fraction of the genome exchanged, but, if I understand it correctly, the posterior probability that any gene flow at all occurs. This is a very different measure of reproductive isolation from that proposed by Sobel & Chen or Westram et al., or anyone else.
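
      As I read the supplement, the classification rule amounts to something like the following sketch. The two thresholds are the values quoted above; the function and label names are hypothetical, and the treatment of values exactly at a threshold as “ambiguous” is an assumption, since the supplement only states strict inequalities.

```python
ISOLATION_THRESHOLD = 0.1304   # P(migration) below this: "genetic isolation"
MIGRATION_THRESHOLD = 0.6419   # P(migration) above this: "ongoing migration"


def classify_pair(p_migration: float) -> str:
    """Map a posterior probability of migration to a discrete label.

    Note that the input is a posterior probability that any gene flow
    occurs, not a migration rate, so the output says nothing about how
    much gene flow there is.
    """
    if not 0.0 <= p_migration <= 1.0:
        raise ValueError("a posterior probability must lie in [0, 1]")
    if p_migration < ISOLATION_THRESHOLD:
        return "isolation"
    if p_migration > MIGRATION_THRESHOLD:
        return "migration"
    return "ambiguous"  # such pairs were discarded from the analysis


for p in (0.05, 0.40, 0.90):
    print(p, classify_pair(p))
```

      Written this way, it is plain that a continuous quantity is being reduced to a binary outcome, with the intermediate pairs dropped.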

      I think the reason for your choice of a measure of reproductive isolation is indicated by the second question you ask in the introduction: “At what level of molecular divergence do species become fully isolated?” This is related to a common conception of species as irreversibly independent lineages, and the idea that speciation will be “complete” when gene flow becomes zero. But in fact, the “completion” of speciation in this sense seems rather unlikely. The progressive loss of compatibility between diverging lineages seems likely to follow some sort of continuous probabilistic failure law, similar to the way lightbulbs fail over time. The simplest failure law is log-linear with time, although more complex models such as the accelerating “snowball” model of hybrid incompatibility, or the likely “slowdown” model for selective reinforcement, are also possible (Gourbière, S., & Mallet, J. 2010. Are species real? The shape of the species boundary with exponential failure, reinforcement, and the "missing snowball". Evolution 64:1-24); but all have a long asymptotic tail. You seem to recognize this stretched out right-hand side timescale by plotting genetic divergence on a log scale in Fig. 1 (although why is “net divergence,” Nei’s DA, the correct scale on which to base such an analysis? You do not explain or justify this). Nonetheless, by making an argument for complete isolation as an endpoint, you ignore the asymptotic nature of compatibility decline to zero. Based on the data we analyzed, it is rather hard to estimate the shape of the failure curve, mainly because the accumulation of incompatibilities is so variable, even among closely related species, such as Drosophila fruit-flies, for example. This variability between pairs of species shows up only in the data, and not in the fitted curve in Fig. 1A, but is more evident from Fig. 1B.
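
      To illustrate what I mean by an asymptotic tail, here is a minimal sketch of the simplest case, assuming that “log-linear with time” means exponential decay of compatibility at a constant failure rate. The function name and the rate used are arbitrary illustrative choices, not estimates from any data.

```python
import math


def compatibility(t: float, failure_rate: float) -> float:
    """Expected compatibility after divergence time t under a constant
    failure rate, i.e. exp(-failure_rate * t).

    The logarithm of this quantity declines linearly with t, and the
    function never returns exactly zero for finite t in its useful range.
    """
    if t < 0 or failure_rate < 0:
        raise ValueError("t and failure_rate must be non-negative")
    return math.exp(-failure_rate * t)


# Compatibility shrinks by a constant factor per unit time but stays positive.
for t in (0, 1, 10, 100):
    print(t, compatibility(t, failure_rate=0.5))
```

      Under such a law there is no divergence level at which isolation becomes “complete”; any threshold for full isolation has to be chosen by convention.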

      Overall, I remain somewhat unconvinced that plants have a more rapid accumulation of species barriers than animals. I agree it is likely that many plants have “less efficient dispersal modalities” than most mobile animals, and that this might mean that actual gene flow becomes lower for plants at a distance from one another, but this is a little different from what I think one would mean by “species barriers.” Reproductive isolation and species barriers should generally be rather independent of geography; in other words reproductive isolation at close range is what we are primarily interested in. This is the problem of using a measure of reproductive isolation that depends purely on actual gene flow. I therefore remain unconvinced that my natural history observations of many plant hybrids in nature, and very few animal hybrids, are not reliable indicators of lower levels of reproductive isolation among plants than among animal species.

    1. On 2023-10-02 17:42:18, user Neil Greenspan wrote:

      The manuscript by Killian et al. is a valuable contribution to the investigation of both the biological and biophysical aspects of the humoral immune response elicited in the context of allogeneic organ transplantation. I do, however, have some reservations regarding the interpretations of the authors.

      1) The authors suggest that individual amino acid residues shared between an allogeneic HLA antigen and a self-HLA antigen should be viewed as “self.” I view this act of classification as problematic. When a donor HLA antigen differs by one or more amino acids from a host HLA antigen encoded at the same locus, the entire protein is classified, at least from some perspectives routinely adopted in transplantation immunology, as non-self.

      One way to rationalize this view, which may conflict with the perspective expressed by the authors in this manuscript, is to suggest that what matters in antibody-antigen interaction are the thermodynamic roles of the amino acids that constitute HLA antigens, not their identities. The claim is that the relevant biochemical/biophysical properties of a given shared amino acid at a particular position in the primary structures of the self and allogeneic HLA molecules can be altered meaningfully as a consequence of the one to several amino acid differences between these proteins. For example, a lysine or tryptophan that is oriented slightly differently in the self vs. the allogeneic molecules or that is more or less likely to fluctuate in certain directions is not necessarily thermodynamically equivalent in the two proteins.

      2) If the above assertion is accepted, then the claim that breaches of tolerance are critical for damage to the allograft is not demonstrated. While it is of interest to know that self-reactive B cells are generated it is not clear from this study that the antibodies produced by these B cells cause graft damage in vivo. While I acknowledge the evidence that autoreactive anti-A*24:02 antibodies can bind to allogeneic A*01:01 with potentially meaningful intrinsic affinities, that is a necessary but not sufficient condition for contributing meaningfully to clinical allograft tissue damage, especially in the context of a single patient with an autoimmune disease. Experiments designed to test the hypothesis, in a broader range of transplant patients, that such antibodies do contribute to allograft rejection episodes would be of interest.

      In the context of the potential role of autoantibody responses in allotransplantation, it has been accepted for some years that generation of autoantibodies to a variety of proteins can accompany alloimmune responses to an allograft. Some investigators have offered evidence that the presence of such antibodies is associated with damage to allografts. At present, I do not think we know with certainty the extent to which, if at all, such autoantibodies contribute to allograft damage or whether they can do so in the absence of pathogenic alloantibodies.

    1. On 2023-08-30 08:41:17, user Jose E Perez-Ortin wrote:

      This new model for explaining mRNA buffering is a very interesting piece of work. We would like to suggest some possible improvements to be considered by the authors at this preprint stage, before it is published in a journal.

      In some parts of the manuscript it is said that mRNA buffering is perfect, as total mRNA concentration and even individual mRNA concentrations are invariant. We think that this is overstated. For instance, in the graphs in Sun et al 2013 (ref. #9; Figure 1), the variability in total mRNA may be as high as 50%. In fact, in García-Martínez et al 2004 (ref. #15; Figure 2) we published that during the carbon source change mRNA concentration also changes by a factor of 2. We wonder if this could be important for the modeling, because it seems that one of the advantages of the RS model is that it predicts robust buffering, contrary to previous feedback models.

      The manuscript omits citations of some papers that we consider important for the field of mRNA buffering, such as Mena et al 2017 (doi: 10.1093/nar/gkx974). This paper is especially relevant because the current preprint describes in the Introduction section that total mRNA concentration is constant as the cell volume increases (refs. 19-22) but forgets to mention this piece of work, which was the first one to show that degradation rate perfectly balances production rate during cell volume change. Instead of our paper, the preprint cites ref. #27, which is 4 years older than Mena et al 2017.

      Garcia-Martinez et al 2023 (doi: 10.1016/j.bbagrm.2023.194910) is also highly relevant. We described in that article a mathematical model that explains mRNA buffering using a simpler mechanism consisting of only one mRNA binding factor that co-transcriptionally imprints mRNAs. That model also predicts that synergistic changes in synthesis and degradation rates will provoke faster and stronger responses, as described in some experiments. We also previously published a multiagent model in Begley et al 2019 (doi: 10.1093/nar/gkz660), which combines mRNA imprinting and feedback mechanisms. That paper also demonstrates that Ccr4 and Xrn1 act in parallel with different sets of target genes. We have also demonstrated in that paper and in two others (Begley et al 2021 doi: 10.1080/15476286.2020.1845504; and Medina et al 2014 doi: 10.3389/fgene.2014.00001) that protein factors such as Ccr4 and Xrn1 act not only at the transcription initiation level but also in elongation. We think it would be helpful for this manuscript to discuss the differences between these models and the proposed RS model.

      Finally, as for the model in Figure 4c, we do not understand why the activation of a degron used by Chappleboim et al 2022 (ref. #16) only degrades cytoplasmic Xrn1 molecules (Xc) and leaves Xp molecules intact. All Xrn1-degron molecules (Xc, Xp, Xn) will be proteolyzed after auxin addition. This can affect the predictions made by the RS model.

    1. On 2023-08-21 17:16:09, user Cristiane Paula Gomes Calixto wrote:

      Revision comments from: Cristiane Paula Gomes Calixto, Flaviane Lopes Ferreira, João Francisco Canal, João Henrique Servilha, Lucca de Filipe Rebocho Monteiro, Victória de Carvalho

      The manuscript titled “Epigenetic and transcriptional landscape of heat-stress memory in woodland strawberry (Fragaria vesca)” aims to investigate the inheritance of heat-induced epigenetic and transcriptional changes in Fragaria vesca through asexual reproduction. The study analyses genome-wide DNA methylation and differential gene expression in the initial generation (heat-stressed and control) and their three subsequent non-stressed asexual generations. The authors observed a decreasing transfer of the stress-induced molecular memory across the generations. Their work has originality/novelty, and we believe the biological question they seek to answer can be interesting for the plant sciences community.

      We would like to provide some suggestions which we believe might enhance the quality of the manuscript. Please be aware that these suggestions are not exhaustive.

      Major comments

      • Please include additional information so as to allow the research to be replicable and reproducible. For example, saying “9:00-11:00 a.m.” might not be precise enough. Using the specific light zeitgeber would better inform when samples were harvested in the diel cycle (lines 140 and 161). Another example, the description “Illumina paired end read sequencing (150 bp)” appears to omit crucial details concerning the specific options utilised in the NGS experiment. Important information, such as mRNA selection method, library construction kit, sequencing platform, and the strand-specificity of reads, among other factors, should be included. Line 192: Please state which transcriptome was used with Salmon. Line 282: Which clustering method was used to build the heatmaps?
      • The claim that “… genes linked to gibberellin pathways may contribute to a short transcriptional memory.” should be discussed with the literature.
      • Line 642-644: Kindly review the claim in relation to what is depicted in the figure.

      Minor comments

      • We recommend English editing to enhance grammar and clarity.
      • Scientific names must always be italicised. In the first appearance of the species, it is also required to list the person (or team) who first made the scientific name of that taxon available.
      • Lines 131, 134 and 144: could you please add the light intensity in µmol m⁻² s⁻¹?
      • Line 135: Is there a specific scientific or practical rationale for maintaining consistent temperatures in stress assays throughout both day and night, while implementing varying diel thermocycles for control and recovery conditions?
      • Line 158: We found it a bit difficult to understand what was actually collected.
      • Line 166: please add the reference where we can find more details on the bisulfite method used.
      • Line 193: It would make it easier for the reader to understand what the authors mean by DEG if the DESeq2 default parameters were described here. Is it log-fold change, p-value cut-off, etc.?
      • Lines 205-207: Could you provide information on the duration of the heat-stress treatments?
      • Lines 264-267: Do terms like "low," "hypermethylation," and "hypomethylation" refer to a comparison with data from control samples? The comparison between different samples was not really clear to us. The same applies to “significantly different” (line 281).
      • Figure 1A: We think this figure could be improved to help the reader understand the temperatures used for CM. Additionally, could you confirm whether the application of 24°C on recovery days precisely occurred for 48 hours? It seems that the temperature might not be exactly 24°C, and we think the figure could provide more precise details.
      • Figure 1B: Why are scissors, “2w” and “sampling” shown only on the right-hand side of the figure?
      • Figure 1C: Detecting differences among samples based on the y-axis is proving to be challenging for us. The authors might want to contemplate plotting by C contexts on the x-axis, or alternatively, segmenting the y-axis into three distinct regions where resolution could be enhanced around 1-5, 13-17, and 38-42.
      • Figure 3B: Is it possible to apply colour shading similar to that seen in a heatmap for this figure?
      • Figure 3D: Kindly review the genes mentioned in the figure legend in relation to what is depicted in the figure.
      • Lines 280-281: The phrase between the brackets seems a bit confusing. We recommend rephrasing it for clarity.
      • It might be advisable for the authors to verify whether they are employing a colour-blind-friendly palette.
      • Some of the finer details in the figures are quite challenging to discern, making it difficult to interpret the results.
      • The expression patterns of several FvHSFs were described previously (López et al., 2022), some also undergoing promoter demethylation. How do the expression patterns of these HSFs change in response to a temperature gradient challenge? We believe the paper would considerably improve if heat-shock proteins and chaperones are also investigated.

    1. On 2023-07-06 11:17:24, user Nick Leigh wrote:

      This is a well written and clear manuscript comparing successful and defective heart regeneration in zebrafish versus medaka, respectively. The experiments are well designed and the interpretation is careful and thorough. These kinds of studies are essential and, now powered by single cell sequencing, can cast wide nets that enable unbiased description and investigation of this process. As clearly stated by the authors, the description provided here undoubtedly provides numerous follow-ups, questions, and hypotheses about regenerative success and failure. The authors should also be commended for creating a webtool to allow others to query their dataset.

      1. “Cross-species data integration was effective as both zebrafish and medaka cells were represented in each major cluster”. Agreed that across the major clusters there is good agreement. I’m more curious about whether this is potentially overfitting: are you losing a different cluster only present in one species? From published data, could we expect any different clusters between these two species? (This is addressed a bit later on with zEP cells.) In general, it may be worth exploring a couple of other strategies for cross-species integration to try and probe this further (point 6 also addresses this).

      2. The scale of the interferon deficiency in the medaka is striking. It’s mentioned that DAMPs from necrotic cells could be a driver of interferon responses, but building on some of your prior work (Balla et al. 2020 PMID: 32413307), are the zebrafish all harboring some virus at this point and the medaka not? Could a viral/microbiome-related reason result in lack of IFN signaling? Relatedly, it would be interesting to see if medaka have type IV interferons (https://www.nature.com/arti...) (and if these are included in these one-to-one comparisons, or if they are even annotated in the current version of the zebrafish genome). Finally, is there evidence of any DAMP response? For example, are there still other chemokines and cytokines (potentially NFkB nuclear translocation) being produced in medaka and just specifically not an IFN signature? This is getting at the question of whether this is specifically lack of IFN signaling or whether medaka are hyporesponsive to, for example, DAMPs.

      3. Is recruitment responsible for increased macrophages in zebrafish or is it expansion of tissue-resident cells? This could affect the conclusion drawn in medaka that they are not recruiting macrophages.

      4. Figure 3H: the proportion of TNFa-positive cells is reported, but what about the absolute number? Given the relatively higher numbers of macrophages in the zebrafish it would be interesting to see how these compare. The ratio of pro- versus anti-inflammatory macrophages could be an interesting metric to report. Do the zebrafish ever mount a substantial pro-inflammatory response? It’s suggested that highly regenerative animals undergo a quick switch from pro- to anti-inflammatory and that this is important for regeneration, but data demonstrating that are sparse at best and the question remains whether there is ever a robust pro-inflammatory response in regenerative animals.

      5. The paragraph starting with “We know relatively little about the makeup..” is a bit unclear. What type of cells are you referring to? Are these the fibroblast-like cells or fibroblasts? The concluding sentence leads one to believe fibroblasts are being studied, but earlier on “epicardial cells expressing collagens” are discussed. Do you find collagen expression by macrophages? (https://pubmed.ncbi.nlm.nih....) Are mmp15/16 implicated in regeneration?

      6. Regarding the zEP cells and their potential uniqueness to zebrafish, it would be interesting to explore SAMap or other tools and see if they still remain separate (https://github.com/atarasha..., https://www.biorxiv.org/con..., note: this paper integrates zebrafish heart single cell data with 4 other species and could be worth looking at). As noted by the authors, more work is needed here. Whole mount FISH of hearts from both zebrafish and medaka would be quite interesting to see if zEPs can be detected anywhere.

      7. The mammalian studies are interesting and could be worth expanding. It would be insightful to tie back into the first few figures and the major findings there. Can you learn anything new from the mouse dataset with the perspectives gleaned from the fish comparison? For example, what is happening with the ISGs in the mouse? It could also be interesting to compare to salamander heart regeneration to provide another evolutionary intermediate (https://www.nature.com/arti...).

      8. Do primordial cardiomyocytes wane with age? Do larval/developing medaka contain these cells and do these young medaka regenerate their hearts? (Perhaps not experimentally feasible.)

      9. What is the role for the compact myocardium when not in regeneration? Why is there so much diversity in its size across species?

      10. Do you think there is a unifying reason for lack of regeneration in medaka? You uncover quite a few differences.

      Minor stuff:

      This is a biased comment, but it would be really interesting to know if there is divergence between replicates. You could pull out each sample with some genotype-based demuxing. Check out: https://www.life-science-al.... This might also aid with DE analysis (https://www.nature.com/arti...).

      “To investigate the contributions of epicardial-derived cells to the fibrotic response, we re-clustered all cells expressing epicardial-specific markers tcf21 and tbx18, and re-clustered them into four…” This is a bit confusing, with the double re-clustering.

      Do medaka lack cortical cardiomyocytes or are they just less abundant? The last line of the figure 6 results section suggests an absence with the use of “lack”.

      One could consider whether side-by-side violins might better illustrate comparisons between time points.

      Figure 6E and G, with numbers for cluster labels, are not very clear. Perhaps these could be labeled with the top markers they express, or more information could be added to the figure legend to explain. Including the species on the figure for E-F and G-H could also help orient readers more quickly.

    1. On 2023-06-19 11:25:43, user Adrien Jolly wrote:

      These views are my own only (I did not involve any of my past or present colleagues and collaborators)

      Thank you for this ambitious endeavour, I have some comments.

      Comment on the discussion:

      While I agree with your point on exponential distributions (we make exactly this point in ref. 36), the claim that one generally cannot fine-tune the variability of the cycle phase length with ODEs (p 15-16) is misleading. We do exactly that with our sub-steps approach (which you mention), which in fact permits the modulation of the cycle phase length variability from quasi-invariable to exponential.

      It has actually been shown that mammalian cycle phase durations follow Erlang distributions (as they arise from our sub-steps). In our work, we estimate the coefficients of variation of the cycle phase length when identifiable from the data and discuss the information generally contained in this regard in our thymidine analogue incorporation experiments.
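
      To make the sub-steps argument concrete, here is a minimal sketch (not our published code; the function names and parameter values are illustrative assumptions). A phase of fixed mean duration is split into k identical exponential sub-steps, so its duration is Erlang-distributed with coefficient of variation 1/sqrt(k): k = 1 recovers the exponential case and large k approaches a fixed duration.

```python
import math
import random


def erlang_cv(k: int) -> float:
    """Coefficient of variation of a phase made of k identical
    exponential sub-steps (an Erlang distribution with shape k)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 / math.sqrt(k)


def sample_phase_duration(mean: float, k: int, rng: random.Random) -> float:
    """Draw one phase duration as the sum of k exponential sub-steps.

    Each sub-step has rate k / mean, so the total keeps the same mean
    whatever the number of sub-steps.
    """
    rate = k / mean
    return sum(rng.expovariate(rate) for _ in range(k))


# Same mean phase length, decreasing variability as k grows.
rng = random.Random(0)
for k in (1, 4, 100):
    draws = [sample_phase_duration(10.0, k, rng) for _ in range(5)]
    print(k, round(erlang_cv(k), 3), [round(d, 1) for d in draws])
```

      In an ODE formulation each sub-step is simply one additional linear compartment, which is why the variability can be tuned without leaving the ODE framework.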

      Comments on the agent-based model:

      Minor comment

      (1) EdU/BrdU incorporation. In my hands EdU is not immediately detectable in the thymus following injection (as is often assumed) and it takes more than 30 minutes to label all the cells in S phase (admittedly it was IP injection, and as you perform intravenous injection, labeling should be significantly faster).

      Here, if I understand correctly, you assume 0.5% of DNA being labeled is sufficient for detection which would label most cells after a couple of minutes in S phase when the analogue is present. Did you check this was the case (thymus collection 10 min after injection for instance)?

      You further assume that labeling stops abruptly after 45 minutes. However it does not seem plausible for incorporation to cease suddenly and a gradual decay of the analogue availability seems more realistic.

      Since you have time course data which can reduce the dependence of the result to the initial labeling phase, the effect of these assumptions might be very limited but it would still be useful to check to what extent these assumptions affect your result.

      (2) I found it difficult to identify exactly what data you use for fitting (which is of course essential to judge the results) and how the data are assigned to your model. I think some clarifications would really be beneficial to the readers. Do you use exclusively EdU/BrdU information or do you combine this information with total DNA content (distribution of cells across the cell cycle)?

      Here is a point of particular concern:

      (3) I understand that you do not allow transition from cycling to "long G1"/”quiescent”, sometimes finding these “long G1” cells represent >90% of a given population. Given the duration of each stage of thymic development (60 h for total DP for instance according to your 2017 review) it seems very unlikely that labeled cells do not contribute significantly to the quiescent subpopulations within 20 hours and in any case, this should certainly not be assumed a priori.

      From my own experience, ignoring the cells transiting to quiescence/long G1 after the initial label incorporation might greatly distort your result by affecting the rate of entry into S phase. I expect the introduction of a transition rate should improve your fit when a quiescent population is present (for later time points in particular), although I cannot conclude based on the data you currently present. In general, I find the exclusion of cells from the dynamics (which sometimes turn out to represent the overwhelming majority of the population) to be an extreme decision and I don’t think this should be made without strong evidence (simulations?) that this does not invalidate your result.

      (4) Along the same line, differentiation and transmission of labeled or unlabeled cells between compartments should be considered carefully. Differentiation can certainly affect percentages of labeled cells in a downstream compartment over time.

      While in some cases, the influx compared to local proliferation can be negligible (given the difference in size between compartments and respective cycling properties), it is a point which should be addressed for each cell compartment.

      If I understand correctly, your model poses that cells leaving a compartment are replaced exclusively by non-labeled cells. This is not neutral and, in some cases, may cast significant doubts on your predictions. For instance DN4 are directly downstream of the highly proliferative DN3b, and DN4 cells will be progressively replaced mostly by labeled cells as time goes on.

      At the very least, it should be discussed compartment by compartment why you think the assumption of exclusive influx of non-labeled cells holds given what is known of T cell development dynamics.

      While you have certainly built an important dataset, the manuscript at its current stage gives the impression that some essential features of the EdU/BrdU dynamics have been overlooked in the agent-based model. I hope my comments will prove helpful.

    1. On 2023-05-22 22:45:16, user Fraser Lab wrote:

      Summary:

      In protein engineering projects, it is always desirable to screen as efficiently as possible. Screening a relatively small number of variants becomes especially important when enzyme activity cannot be coupled to a high throughput sequencing readout. The major goal of the paper is to provide a proof of concept scoring and filtering system for selecting among proteins generated using computational methods to meet this challenge of efficient screening. They consider proteins generated using 2 machine learning methods and one phylogenetic method (ancestral reconstruction).

      The end result is a scoring filter combining the language model ESM-1v (which uses only sequence information) and the deep learning method ProteinMPNN (which is trained directly to find the most probable amino acid for a protein backbone predicted by AlphaFold2). After using heuristics to account for some simple idiosyncrasies of merging generative models with reality (ensuring sequences start with Met, removing repetitive sequences, accounting for localization signals), their filtering steps result in an enrichment of active sequences.

      The major success of the paper is a pipeline that actually works for selecting active sequences both in the experiments they conduct and (to some extent) literature examples. The table of potential protein failure modes is particularly useful as a baseline approach and reference for people designing sequences with computational methods. It is especially insightful to see how a few deliberate filtering steps in the training process can produce a big change in the outcome.

      We expect that a combination of sequence and structure-based filters will be used for prioritizing screening resources in the future. This paper lights the way of how to do that. The next steps will be to take into account structural features beyond stability (which is presumably covered by the AF2/ProteinMPNN), such as catalytic residue positioning, pocket size complementarity to substrate, etc. These are presumably implicitly captured by ESM-1. The next logical step (beyond this paper) is to go beyond statistical combination of these two scoring features to account for such features explicitly or with a new integrated deep learning approach.

      Major points:

      We are a bit confused about the exact value and sequencing of each part of the selection/filtering pipeline. We interpret experiment 3 as:<br /> Apply ESM-1v and Quality Filters and then apply a ProteinMPNN filter on top of that. <br /> Select Negative Controls by selecting sequences that fail the first filter (ESM-1v and Quality Filters) but are within 1% sequence identity to the closest natural sequence for some positive.<br /> The quality checks discussed in the supplementary information seem to have substantial impact. If the selected control sequences failed this quality check, it’s not clear whether the success of the pipeline is due to these heuristic quality checks or due to the computational filtering. These filters are biologically simple, such as starting with a methionine, removing long repeats and not having a transmembrane domain - and it is kind of amusing to one of us (JF) that generative models have these pathologies so commonly. More discussion on why these filters were applied and what the distribution of effects was for the quality filters vs the in silico filters would help clarify the impact of each stage.
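
      To make the "biologically simple" quality checks concrete, the following is a minimal sketch of two of them (starting with a methionine and rejecting long tandem repeats). It is not the pipeline used in the paper: the function names and the repeat thresholds (`max_unit`, `min_copies`) are hypothetical choices for illustration, and the transmembrane-domain check is omitted because it would require an external predictor.

```python
import re


def starts_with_met(seq: str) -> bool:
    """Check that the sequence begins with methionine (M)."""
    return seq.upper().startswith("M")


def has_long_repeat(seq: str, max_unit: int = 4, min_copies: int = 4) -> bool:
    """Return True if a motif of 1..max_unit residues occurs at least
    min_copies times in tandem. Both thresholds are illustrative
    assumptions, not values taken from the paper."""
    seq = seq.upper()
    for unit in range(1, max_unit + 1):
        # (.{unit}) captures one motif; \1{n,} requires n further tandem copies.
        pattern = re.compile(r"(.{%d})\1{%d,}" % (unit, min_copies - 1))
        if pattern.search(seq):
            return True
    return False


def passes_quality_filters(seq: str) -> bool:
    """Combine the two heuristic checks sketched above."""
    return starts_with_met(seq) and not has_long_repeat(seq)


if __name__ == "__main__":
    for s in ["MKTAYIAKQR", "KTAYIAKQR", "MAGSGSGSGSGSKL", "MKAAAAAAL"]:
        print(s, passes_quality_filters(s))
```

      Keeping each heuristic in its own function makes it straightforward to switch filters on and off one at a time, which is the kind of ablation that would separate the effect of the quality checks from that of the model-based scores.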

      This confusion then extends to determining how each of the two computational methods affect the selection. The authors contend that “no single metric would be sufficiently generalizable to screen against multiple sequence failure modes” and hypothesize that ProteinMPNN and ESM-1v “may capture distinct features.” However, because negative controls were selected only after failing the initial ESM + Quality Filters, it’s impossible to know what effect adding ProteinMPNN on top of ESM had. This is even more relevant given that the structures used to obtain ProteinMPNN scores are first generated with AlphaFold. AlphaFold can be computationally intensive (expensive to run) and therefore it is imperative that we understand how much this part of the pipeline contributes to the overall success of the selection process. The authors themselves contend that “Structure-supported metrics, including Rosetta-based scores, AlphaFold residue-confidence scores, and likelihoods computed by neural network inverse folding models, take into account protein atom coordinates potentially directly capturing protein functionality, however, they can be impractical to compute, especially when evaluating thousands of novel sequences.” This is something that can potentially be teased out. In the case of the paper only 200 proteins were selected using ProteinMPNN; however, if many sequences end up passing the ESM filter and budget allows, it would be within reason to expand this random ESM selection.

      In summary, it is a bit hard to tell (without some ablation studies) which different pipeline components and filters drive the results. Additionally, it would have helped if these same quality filters were applied in Round 2 but that doesn’t seem to be the case? A deeper discussion on the selection of quality filters would also point the way forward with combining more “functional” structural features as outlined above.

      Minor points:

      1) The authors generalize the results with a few literature examples: “similar results were obtained by independently validating COMPSS on previously published datasets of six enzyme families generated by models not considered in the present study.” Looking at the results in more detail reveals that some of these (including one that we generated!) are very small samples and this caveat should be discussed. In 3 out of the 6 studies, only 1 sequence was selected by their pipeline. In another of the 6, 2 were selected. In all 4 of these studies, a number of actives were missed. The limited number of selected sequences makes it hard to know how effective the pipeline really is in these 4 studies. Further, such a stringent filter is not practical, especially when we consider the fact that the authors don’t discuss the level of activity across positive and negative active compounds. It’s entirely possible that you could miss very active sequences and select only moderately active sequences. In one retrospective, the results were truly similar; however, in the remaining study, the filters worked far from as intended.

      Even more, my team has observed in its own work that the sensitivity of machine learning models for scoring can be heavily dependent on the sequences the models have seen before. It would have been useful for the authors to consider how the tested enzymes overlap with the model training data to understand whether these scorers generalize outside the models’ training distribution.

      2) The authors largely discount natural sequence identity as a metric:<br /> “Surprisingly, neither sequence identity to natural sequences nor AlphaFold2 residue-confidence scores were predictive of enzyme activity.”

      I think it’s important to qualify this with the fact that we are looking at sequences in the 70 to 90% range with very little dynamic range here. In their first experiment they looked at sequences in the 70 to 80 range. In their second they looked at sequences in the 80 to 90 range. In their third experiment they looked at sequences in the 50 to 80 range but their filters end up selecting for sequences in the 70-80 range anyways. So it’s possible that locally, identity might not select for activity but globally, it could be a first filtering step on its own (which maybe is obvious and hence why it’s not more qualified?). Also to note is that sequence identity seemed to fare as well as or better than other metrics in identifying functional GAN-generated sequences and could be its own generative method:

      More problematic I think is figure 3f and figure 3g:

      It seems like the inactive controls are largely in a separate part of the tree compared to the active sequences passing and control. Does this have anything to do with the fact that these features failed the sequence-based quality filters? Second, it suggests an approach where if you have some idea of where to focus on in the tree you could use sequence identity to those natural sequences as a metric for selection. Of course this information may not be readily available but the authors should discuss whether we could have hypothesized that the failing controls would have failed beforehand by considering their phylogenetic origins.

      Technical points:

      1) There is some problem with this sentence:

      “CuSOD training sequences had only a single Sod_Cu domain, while MDH had an Ldh_1_N followed by an Ldh_1_C domain and no other Pfam domains that generally only rarely occur in 6.3% and 1.7% of sequences in both families, respectively.”

      It’s much better captured in the supplementary material:

      “For CuSOD, 1,632 out of 25,701 proteins (6.3%) had aberrant architectures. For MDH, 1,127 out of 65,639 (1.7%) had aberrant architectures.”

      2) It’s not clear where/how they selected the natural test sequences for rounds 1 and 2. We assume it’s from the curated set of data but that’s not necessarily a given. Further, it seems that in Round 2, sequences were selected to span the range of ESM scores. Was this done for the test natural sequences as well?<br /> “Only 13 test natural sequences were selected, as we had already screened five similar natural sequences in the remediation for Round 1.”

      “Besides the identity range, the experimentally tested sequences were selected to span the entire range of scores on each metric (Supplementary Table 4)”

      3) The authors should be more explicit on the natural sequence identities in each round. You can find this information in the supplement if you pay attention to the figures, but I think that it should be explicitly stated in the section “Round 2: Calibration data for COMPSS” that sequences are selected in the 80-90 range and in Round 3 that the filters resulted in sequences with >69% identity.

      4) The following section is confusing:

      “To further test the hypothesis that poor truncation selection was responsible for the lack of observed activity in the Round 1 CuSOD natural test sequences, we assayed an additional 16 natural SOD proteins (pre-test group)…”

      It should be stated at the beginning that 14 of the 16 test sequences are CuSOD sequences and 2 of the sequences are FeSOD sequences vs letting the reader figure that out later in the paragraph. Additionally, it would help the audience to say explicitly that 3/7 bacterial sequences with clipping also passed or include the table from the supplement up front. 3/7 doesn’t seem clearly distinct from 4/5.

      5) What’s the reason for changing the esm-msa sampling method in round 2? Did they observe some benefit or was this purely a computational choice?

      6) I think the text for a and b are switched in the figure 2 description. a is the AUC figure and b is the correlation figure. Further, for panel a, if the test sequences are natural sequences, is the identity score meaningless here?

      7) From the supplement: “We skipped the 'starts with M' filter because very few of the sequences in these sets start with M, and did not subset by identity to closest training sequence.” This modification to the pipeline should be mentioned in the discussion of the external validation tests. Or they should speculate what would happen if they just added an M at the beginning of every sequence?

      8) Looking at the figures in the supplement e.g. Fig 30 it seems like they had quantitative activity values. It would have been nice to discuss if there was any correlation between scores and activity for ranking purposes. Was this not included because of variance in the assay?

      Joel Beazer (Profluent) and James Fraser (UCSF)

    1. On 2023-05-11 13:43:40, user ADRIAN TREVES wrote:

      Pre-publication review of "Forecasting dynamics of a recolonizing wolf population under different management strategies" by Petracca et al. https://doi.org/10.1101/202...

      Reviewed by<br /> Adrian Treves, PhD<br /> Professor of Environmental Studies, Founder and Director of the Carnivore Coexistence Lab, University of Wisconsin-Madison<br /> +1-608-890-1450<br /> http://faculty.nelson.wisc.... (which includes full disclosures of potentially competing interests in the CCC.php page)<br /> Direct inquiries to atreves@wisc.edu

      11 May 2023

      I appreciate that Dr. Petracca and colleagues posted their manuscript to a preprint server to facilitate independent review and scientific debate. Such preprints are a healthy step in our field to improve the reliability of science.

      Also I acknowledge the risk posed by preprints, such as policy-makers or the public running with results or inferences before they have been approved by qualified peer scientists. I think two aspects of the preprint process guard against such undesirable outcomes: (a) peer reviews attached to the preprint as a comment should serve to caution against such precipitous use of preprints, and (b) the authors can reinforce the need for caution in subsequent revisions to the preprint, even citing their pre-reviewers. The science-policy interface in which this work lies is fraught with difficulties.

      Also I acknowledge these sorts of models are complex and difficult to parameterize realistically with confidence. None of my comments or criticisms below are meant to undermine the hard work put in, but rather they are meant to improve the final product, improve outcomes for wolves, and improve the policy that may result from applied research. Thanks in advance for reading my comments in that spirit.

      I have chosen not to cite much research below, instead calling the authors’ attentions to our website (above) where peer-reviewed substantiation of all my assertions can be found. I welcome peers’ emails to atreves@wisc.edu if anyone has trouble finding the evidence.

      Most of my comments relate to Tables 1 and 2 and the associated scenarios.<br /> A question about Table 1: the caption includes "Lethal removal rate was calculated directly from state agency records." Please provide those with annual numbers and locations (East or West) to help the reader understand the geographic and spatial context of that assertion.

      The annual lethal removal rate was a single point estimate of 0.04. I don’t understand why this was treated as a constant not bracketed by annual variability? Later, the authors wrote "In scenario 1 (“Baseline”) we simulated all relevant factors, as described below, at levels observed in the data collection period (2009-2020)." All factors include those affecting the human-caused mortality, right? There are numerous studies documenting a variable annual rate of lethal removal. There seem to me to be other issues with assuming a constant annual lethal removal rate in baseline and the scenario for increased removals below.

      The assumptions that seem to be made about constant annual lethal removal in the baseline or the increased removal scenarios might be summed up as "livestock losses will never get better or worse so long as the current rate of removal is applied randomly to wolf packs and entire packs are removed." I don’t mean to caricature the assumption, I mean to make it plainer so it can be scrutinized.

      1. If lethal removal is assumed to be effective in preventing livestock loss as WDFW has implied in the past, then it seems surprising that the model would treat it as ineffective or needing constant renewal. Can this be justified scientifically and by reference to articles that have not themselves been undermined by subsequent work? I call your attention to recent reviews of the literature on lethal removal which indicate unpredictable effects of lethal removal of wolves, resulting in increases, decreases and no change in livestock losses depending on study and site and years (the latter of no effect in the majority of cases, see studies of wolf removal by Grente, Krofel, and Santiago-Ávila).

      2. Is predation on livestock random? If not, how does the imposition of a random scheme affect the model (a sensitivity analysis would be useful); many studies reveal that predation on livestock is not spatially random or uniform. Rather livestock losses are sometimes highly predictable from spatial features and wolf pack demographics. Therefore, I also call your attention to risk models that are analogous to resource selection functions, which have been used to model livestock loss in our region among others (see my lab website and search for "risk" and "forecast" please).

      3. Has WDFW lethal removal eliminated entire packs and in what percentage of cases? This baseline information might be helpful in interpreting the scenarios. I discuss partial or entire pack removal further below.

      I was confused by the increased removals scenario and the harvest scenario. Given they are differentiated I have to assume increased removals is NOT public hunting, trapping, hounding, etc. It is unclear what conditions might lead to such an increase in lethal removals. The authors wrote "In scenarios 4 and 5 (“Increased removals”), we simulated an increased number of lethal management removals such that 30% of the wolf population[*] would be removed every four years, corresponding to an annual removal rate of 8.5%." Does this replace the baseline removal rate or supplement it? I didn’t see a scientific justification for the value of 30% and I don’t understand where 8.5% came from (30/4 = 7.5%). Even if I add the baseline it does not reach 8.5%. I’m sure I’m missing something but the calculation could be clarified.
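
      One reading that would reproduce the quoted 8.5% figure is offered here as an assumption, not as the authors' documented method: if a constant annual rate r is applied to the surviving population each year, then 30% cumulative removal over four years means (1 - r)^4 = 0.70, which gives r = 1 - 0.70^(1/4). The short sketch below compares this compounded rate with the simple division 30/4; the function name is hypothetical.

```python
def annual_rate_from_cumulative(cumulative_fraction: float, years: int) -> float:
    """Constant annual removal rate r satisfying
    (1 - r) ** years == 1 - cumulative_fraction.

    Assumes removals compound on the surviving population each year,
    which is an interpretation and not a method confirmed by the manuscript.
    """
    if not 0.0 <= cumulative_fraction < 1.0:
        raise ValueError("cumulative_fraction must be in [0, 1)")
    if years <= 0:
        raise ValueError("years must be positive")
    return 1.0 - (1.0 - cumulative_fraction) ** (1.0 / years)


compounded = annual_rate_from_cumulative(0.30, 4)
simple = 0.30 / 4

print(f"compounded annual rate: {compounded:.4f}")  # 0.0853
print(f"simple division:        {simple:.4f}")      # 0.0750
```

      Under this reading the 8.5% is not the baseline rate added to 7.5%; whether it replaces or supplements the baseline removal rate of 0.04 would still need to be stated in the manuscript.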

      Another concern about this scenario is that it uses a flat mortality rate (% of population) regardless of conditions. That seems to simulate population reduction (sometimes called culling) but applied randomly to entire packs. Given that is a highly unusual pattern of management, it would help to understand the rationale behind it. See below where other more common scenarios are NOT considered. Therefore, I do not understand the criteria applied when selecting scenarios that deserve modeling and scenarios that do not deserve modeling.

      "Harvest"<br /> See issues with terminology in the section on Minor comments below.<br /> Every 6 months: This is an unusual off-take pattern. Readers may be tempted to assume that the policy-makers among the authors or their superiors in state agencies are planning two seasons of wolf-killing per year. The authors might wish to address why such an unusual wolf-killing system was included in this paper. Also, the method that allows only adults or juveniles yet simulates twice-a-year 'harvest’ assumes the public can avoid killing pups. Is there evidence for that assumption? The assumption seems dubious on its face but regardless it requires some consideration of methods of 'harvest’ and accidental non-target killing.

      Additive: While this is more conservative than any compensatory scenarios, it still does not acknowledge the many sources of evidence for super-additive mortality when the public begins killing wolves: Creel, Vucetich, Chapron, or when wolf-killing is liberalized in general: Santiago-Avila, Louchouarn, Suutarinen, Liberg, Treves. There are now more than ten studies quantifying the super-additive effects on population dynamics or the undocumented losses of wolves when killing is liberalized (i.e., undocumented deaths that can be attributed to policies of liberalized killing).

      The OMISSION of any alternative scenario with super-additive mortality and the OMISSION of alternative scenarios with increases in illegal killing triggered by the harvest and increased removals scenarios are problematic. I capitalized the word OMISSION to emphasize that they are not scientific decisions but value-based decisions about which scenarios to publish and which not to publish.

      Value-based decisions are akin to unstated assumptions derived from personal or organizational preferences / beliefs / policies. Assumptions about parameter values or interactions between variables should be transparently stated and usually justified scientifically. Unstated assumptions in a modeling paper seem to me to be scientific missteps because the range of possible parameter values was circumscribed for reasons that are not transparent or justified by peer-reviewed research.

      Also, please note that an attempt to scientifically justify circumscribed parameter values might require an even-handed summary of evidence for and against the assumed constraints on parameter values. For example, the increased removal scenarios (currently unjustified) might be paired with a lowered removal scenario or a scenario that curbs ongoing mortality sources such as poaching or vehicle collisions, hypothetically. To me it seems easier to evaluate alternative scenarios even-handedly than to justify the current ones.

      Furthermore, my concern is that the decisions about which scenarios to publish in the current manuscript leave unanswered 'why these scenarios and not others?' And the authors do not touch upon alternative scenarios for how wolf-human coexistence might play out differently. Instead, the scenarios presented in this paper are a subset of wolf-human coexistence and that subset is slanted towards negative views of wolves (more killing). For example, there is nothing scientific telling us to simulate lethal removal at level x or y. We explored this problem in sustainable use models in Frontiers in Conservation Science in 2021.

      My criticism is meant to be constructive as it is not too late to adapt your models to positive wolf-human coexistence scenarios, such as those involving provisioning to improve wolf reproduction or survival, increasing wild prey bases in regions with low prey, better enforcement against unregulated, human-caused mortality, use of non-lethal methods to protect livestock etc. I understand WDFW might never undertake such actions but that does not constrain scientists seeking approximations of reality. Also, administrations change, private actors / organizations sometimes step in, and background conditions change especially for a simulation run for 50 years. <br /> I hope you see how a subset of scenarios was presented for non-scientific reasons.

      Please remind readers that the selection of scenarios is value-based not science-based. Moreover, the selection of parameters within scenarios may also be value-based. For example, partial pack removals — simulated in your methods when "excess" removals are randomly assigned to another pack short of full pack removal — is NOT suggested to be effective in any study, even Bradley et al. 2015. Moreover, can the latter study even be used to justify the effectiveness of removal of entire wolf packs? I don’t think so. Consider that Santiago-Ávila et al. 2018 showed Bradley et al. 2015 was not reproducible until and unless the methods are clarified. Also, the 2018 article identified a possible statistical bias favoring lethal removal. If the data were to be shared (another hallmark of reproducibility), the bias minimized, and the methods clarified, one might argue that full pack removal has a scientific basis. But we’re not there yet.

      Because I noticed omissions of scenarios and circumscribed parameter values without explicit statement of assumptions and missing literature, I offer a comment on potentially competing interests.

      The scientific community has changed position on this in recent years and is increasingly recognizing the potentially distorting effects of values and ideology on scientific research. Nothing is necessarily disqualifying but all should be disclosed fully and transparently. Ideological commitments expressed through memberships in civil society and professional societies (e.g., TWS or AFWA), institutional policy positions (e.g., WDFW’s current policies), and personal affiliations or rivalries, might all place pressures on individuals that reflect competing interests. These can affect the unstated assumptions, literature reviews (what is cited and summarized versus omitted) and the methods chosen and analyses used, in addition to the traditional issues relating to financial interests. I am not referring to one or two articles being missed but a pattern of omitting peer-reviewed research in highly ranked international journals as I noted here. I emphasize the issue of potentially competing interests as a way to inspire greater public confidence in the scientific endeavor. Thanks for your kind attention.

      Again I admire your decision to publish preprints so that pre-publication review has an opportunity to influence the future manuscript and perhaps public policy.

      Minor concerns<br /> Terminology: <br /> The term "recovery" has a meaning in US federal and state endangered species law as you all no doubt are aware. Recovery in its legal sense may lead policy-makers to shift regulatory schemes to down- or delist wolves. Therefore it is not a value-neutral scientific term and could be viewed as prejudicial. I see passages in your text where recover(y) is appropriate but others where it was used to refer to recolonization or population growth. There I recommend instead using recolonizing or geographic spread or numerical rebound which do not imply a legal status. This seems especially relevant when scenario outcomes suggest a low likelihood of achieving legal recovery.

      Relatedly, I recommend careful consideration of certain jargon words that may be mainstream in wildlife management but are not commonplace in ecological sciences or policy among all publics – and may have value-based or moral connotations, e.g., harvest and depredation. In place of harvest I suggest "permitted, regulated wolf-killing by the public", because harvest is a euphemism that holds implicit assumptions about the values of wolves and motivations of humans who participate. To see why not to use 'depredation’, look at the first definition in the Oxford English Dictionary. I used it for years but now see the error.

      Finally, the discussion of non-lethal methods might benefit from updating to include studies since 2010 on livestock-guarding dogs, and systematic reviews of effectiveness 2016-2021.

    1. On 2023-04-30 15:24:14, user Gul Zerze wrote:

      I sincerely thank both Emil Thomasen and Kresten Lindorff-Larsen for their time, careful reading, and comments on the manuscript. Below, I attach my responses to each point, reproducing each comment. Since this comment system cannot display modified visuals, the added and modified visuals can be seen in the published version of the manuscript (doi: 10.1021/acs.jctc.2c01273)

      Comment:<br /> The manuscript by Zerze reports on molecular dynamics simulations of the intrinsically disordered low complexity domain (LCD) of FUS using a beta version of the coarse-grained force field Martini 3. The author performed simulations to study the formation of FUS LCD condensates under varying protein-water interaction strengths (in the Martini force field) and at different NaCl concentrations, and concludes that strengthening protein-water interactions by a factor of 1.03 improves the agreement with experimental transfer free energies between the dilute and dense phases. Additionally, the author concludes that the NaCl concentration affects condensate morphology and protein-protein interactions in the condensate, and that the effect of NaCl concentration on protein-protein interactions in the condensate is sensitive to rescaling of the protein-water interactions. The manuscript provides an interesting and novel benchmark of the (beta) Martini 3 model in predicting phase separation of IDPs, and reveals potential shortcomings of the model in predicting protein concentrations in (or volumes of) the condensed and dilute phases. This benchmark will be useful for readers who wish to simulate liquid-liquid phase separation of IDPs with Martini 3, and the work will be interesting to a wider audience interested in the biophysics of IDPs and their condensates.<br /> Below we outline some questions and comments that the author might take into account when revising the manuscript. Our main comment regards a clearer assessment of the convergence of the simulations and correspondingly the lack of error estimates for observables calculated from the simulations. We also suggest a clearer presentation of the experimental data used to validate the simulations. While some of these changes are mostly textual, in other cases we suggest additional simulations. We realize that some of these simulations require substantial resources; if these are beyond what is available, we suggest at least to clarify caveats as per the points below.

      The author’s response: I thank the reviewer for their scrutiny and thoughtful comments that greatly helped substantiate the optimization analysis in the revised version of the manuscript.

      Comment: We have the following suggestions for revisions to the manuscript:<br /> 1)<br /> Fig. 1 and 2: The finding of non-spherical droplets is interesting and intriguing. To examine whether the formation of these shapes in the simulations with higher salt and λ-values represent stable states or perhaps trapped metastable states of the system, we suggest that:<br /> 1a) The author runs simulations with the parameters that give rise to non-spherical morphologies (e.g. λ=1.025 and 50 mM NaCl) starting from the structure of the spherical droplet (for example formed with λ=1.0 and no salt) and observes whether the non-spherical morphology is recovered or the droplet remains stable. If the droplet remains stable, then the effect of salt concentration on the inter-chain contacts (Fig. 6) could be assessed without potentially confounding factors from different dense phase morphologies.

      The author’s response: Following the reviewer’s suggestion, I have performed an additional set of simulations for all λ values (1, 1.01, 1.02, 1.025, 1.03) at 50 mM salt concentration starting from a preformed spherical droplet. The initial condition with the preformed droplet is obtained from the last saved frame of the λ=1 simulation for 0 mM salt. We ran the simulations for 10 microseconds each. Within the given time frame the droplet remained stable for λ values 1, 1.01, 1.02, and 1.025 without a dilute phase concentration. I have now added these findings to the supporting information (Figure S5).<br /> I also modified the main text (Page 9 last paragraph and Page 10 first paragraph) as follows:

      “Recent studies from independent groups show that the nonspherical droplet formation might be a kinetic arrest, playing an important role in droplet maturation and aging [51–53]. To test whether the nonspherical morphologies we observed are impacted by the initial conditions, we rerun 50 mM at all λ values starting from a preformed droplet (last saved configuration of 0 mM salt, λ = 1 condition). We simulated each λ for 10 μs and presented the analysis in Figure S5. Within the given simulation time, the initially spherical droplets stayed intact and spherical, except for λ=1.03, which had one copy of the FUS LC protein exchange back and forth between the dense and dilute phases. The enlarged droplet in the case of λ=1.03 also deviated from its initially spherical shape. These findings show that the nonspherical morphology was not reproducible for λ values less than 1.03 when starting from a preformed spherical droplet. We argue that the strength of effective protein-protein interactions at low λ are largely responsible from the initial spherical droplet staying intact.”

      Since the droplets stayed nearly spherical, I also analyzed the contact formation in these simulations (50 mM added salt, initially starting from a spherical preformed droplet) and presented the findings in Figure S7.

      I also discussed these findings in the main text as follows (Page 19, 20, the last paragraph before Conclusions):

      “Finally, we also examined the contact formation for the case of 50 mM added salt that starts from a preformed droplet (see Identification of condensate formation subsection for the description). As presented in Figure S5, we found that the initially spherical droplet remains largely spherical within the simulation time (never forms rod-like percolated structures) for this case. Therefore, this case helps us assess the effect of salt concentration on the inter-chain contacts without potentially confounding factors from different dense phase morphologies. Figure S7 shows both the contact propensity (A.) and the effect of salt concentration (B.) on the contact propensity. Figure S7A shows that the contact propensity decreases as the λ parameter increases, similar to the findings in Figure 5. Figure S7B shows, however, that the change in contact fraction with respect to 0 mM salt at λ = 1 is weaker (resembling λ = 1.02 at 50 mM salt in Figure 6A) although the salting out effect at high λ (λ = 1.025 and 1.03) are more prominent and stronger compared to those in Figure 6A.”

      Comment: 1b) The author shows time-series or distributions of an observable that reports on the dynamics of the proteins in the non-spherical droplet (e.g. Rg, mean square displacement, residue-residue contacts) and/or of an observable that reports on the dynamics of the droplet shape (e.g. the x-, y-, and z-components of the gyration tensor).

      The author’s response: Following the reviewer’s suggestion, we added an analysis of observables that report on the dynamics of shape and size fluctuations and presented them in Figure S4.

      We also modified the main text (Page 9, second half of the second paragraph) to discuss these findings:

      “We also investigated the time dependence of the size and shape of these morphologies by quantifying the radius of gyration (Rg) and the ratio of the smallest and largest eigenvalues of the gyration tensor (Figure S4). The latter offers a measure of sphericity of droplets. We found that low λ cases (λ = 1, 1.01, 1.02) at 0 mM salt have the most spherical morphologies. Beyond λ = 1.025 at no salt, the cluster formation is not tight (as evident from the Rg) so it also loses its sphericity. The condition that shows percolation (λ = 1 at 50 mM salt) has the largest deviation from the sphericity (it is rod-like instead) combined with a large Rg.”

      Comment: 1c) Additionally, independent replicas of droplet formation for each condition and parameter set would be ideal, but we realize that this would be expensive in computational resources and may be infeasible.

      The author’s response: We agree with the reviewer that the molecular simulations presented in this work are highly computationally demanding (e.g., a 10-microsecond simulation at a given salt concentration and a given λ takes about 25 days of wall-clock time, occupying 28 CPUs and 4 GPUs). While it is certainly computationally demanding to replicate all λ parameters at all salt concentrations, we have now rerun the 50 mM salt concentration at all λ parameters, starting each from a completely different initial condition (a preformed droplet). We found that the morphology was not reproducible within the given simulation time at low λ, highlighting the initial condition dependence at low λ conditions. We now discuss this in the main text (Page 21, Conclusions).

      “We also note that we observed an initial condition dependence of the morphology at low λ conditions at 50 mM salt. This finding emphasizes the necessity of future work for exploring condensate morphology with proper advanced sampling techniques.”

      Comment: 2) “As λ increases, the volume of the dense phase increases (and condensed phase concentration decreases accordingly) until the system is not capable of forming a dense phase (λ >1.03)”: From Fig. 1 it seems that the rate of cluster formation decreases as λ increases. Is it not then possible that droplet formation at λ>1.03 is stable at equilibrium, but occurs on time-scales greater than those tested in the simulations? To support the statement that no droplets are stable at λ>1.03, we suggest that the author runs simulations with a higher value of λ starting from the structure of the spherical droplet (formed with λ=1.0 and no salt) to observe whether the droplet is dissolved or remains stable.

      The author’s response: Following the reviewer’s suggestion, we have performed a simulation for λ=1.04 at no salt condition starting from the preformed spherical droplet (last saved configuration of λ=1.0 at 0mM salt) and we found that the droplet quickly dissolves for λ=1.04. This finding is now presented in Figure S3.

      The main text is also modified as follows (Page 9, end of the first paragraph):

      “To further verify that no droplets are stable beyond λ = 1.03, we also ran λ = 1.04 simulations at no salt conditions starting from a preformed spherical droplet (last saved configuration of λ = 1 at 0 mM salt). We then analyzed the cluster formation as a function of time (Figure S3) and found that the initial droplet dissolves quickly (at a timescale shorter than that of the formation of the droplets).”

      Comment: 3) Figure 3: The use of the radial distribution does not seem ideal for the droplets that have a non-spherical morphology, as certain distances will report on an average over the dense and dilute phases. This should at a minimum be discussed.

      The author’s response: Following the reviewer’s suggestion, we have added further discussion of the radial density distribution to the main text (Page 12, first paragraph):

      “This approach works reasonably well for droplets that have spherical/ellipsoidal shapes. However, since the condensates for the conditions with finite salt concentrations significantly deviate from a sphere (they do not show a clear plateau as the center is approached), we used a surface reconstruction method [54] to estimate the volume and concentration instead of fitting the radial density profiles/using the limiting values.”

      Comment: 4) Table 1: It seems that the discrepancy between the sigmoidal fit approach and the surface reconstruction approach increases with λ, possibly due to sensitivity to the shape of the droplets, illustrating that there might be significant uncertainty associated with the reported dense phase volumes. We think it would be useful to have an error estimate for the reported dense phase volumes (e.g. an error over volume calculation approaches and/or over different probe sizes).

      The author’s response: The volume obtained by surface reconstruction is indeed highly sensitive to the probe size. To justify the probe size that I used, I directly compared the protein concentrations from the sigmoidal fit with those from surface reconstruction calculated with different probe sizes (Figure S3 in the old SI, Figure S6 in the revised SI). Based on that comparison, a probe radius of 10 Å minimized the differences across all lambda values, which is why I chose it. For the uncertainty/error estimates, I performed block averaging analyses (please also see the response to point 7).

      Comment: 5) Table 2 and Fig. 4: We suggest that the author more explicitly states which experimental data was used for comparison with the simulations in Fig. 4. We also suggest a more direct comparison with experimental data points where possible (e.g. by showing the experimental values of csat as a function of NaCl concentration).

      The author’s response: We used two experimental papers to extract the experimental data. The first is reference 36, in which the authors state: “Using incubation on ice to increase the driving force for droplet formation followed by centrifugation to fuse the droplets due to their higher density, our 15 ml samples of 1 mM FUS LC phase-separated to form an ∼400 μl viscous, protein-dense phase stable for weeks at room temperature. FUS LC concentration in the phase is approximately 7 mM (120 mg/ml FUS LC) as determined by spectrophotometry.”

      We note that the salt concentration is not specified in this case (or the authors obtained approximately the same protein concentration in the dense phase regardless of the salt concentration). Also, the thermodynamic conditions defined there do not exactly correspond to those in our simulations. That is partly why we looked for multiple sources of experimental data. The other experimental work that we used is reference 39, in which the authors state that “The relative intensity of the glutamine side chain residue NMR resonances in the condensed phase compared to a standard concentration (100 μM) dispersed phase FUS LC suggests a concentration of 27.8 mM = 477 mg/ml in the condensed phase.”

      The corresponding NMR experiments were carried out at 25 °C in 50 mM MES, 150 mM NaCl, pH 5.5. These conditions do not exactly correspond to our thermodynamic conditions, either. Since an exact match in conditions is not available, we preferred not to present a direct comparison of dense phase concentrations and instead show a range in Figure 4. We have now modified the main text (Page 15, right above the Contact Maps subsection) to state the source of the data more explicitly:

      “The experimental data range is referenced from the work by Fawzi and coworkers; [36,39] where reference [36] measures the FUS LC concentration in the dense phase as approximately 120 mg/mL (spectroscopically) and in reference [39], a 477 mg/mL FUS LC concentration is deduced from the relative intensity of the glutamine side chain residue NMR resonances in the condensed phase (compared to a standard protein concentration in the dispersed phase, which is given as 100 μM, or 1.71 mg/mL). 477 mg/mL FUS LC dense phase has been obtained from 15 ml samples of 1 mM FUS LC solutions [36] (from which we calculated the dilute phase concentration as approximately 14.3 mg/mL). We used these dense phase and their respective dilute phase concentrations to calculate the experimental range of transfer free energy (gray-shaded areas in Figure 4).”
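      The dilute-phase estimate of approximately 14.3 mg/mL can be reproduced with a simple mass balance on the reference 36 numbers quoted in this response (15 ml of 1 mM FUS LC, a dense phase of about 400 μl at about 7 mM). The sketch below is a worked check, not the author's calculation. It assumes the molar mass implied by the quoted equivalence of 100 μM and 1.71 mg/mL, and it assumes the common convention ΔG = -RT ln(c_dense/c_dilute) for the transfer free energy with a temperature of 298.15 K; neither the convention nor the temperature is stated in this response.

```python
import math

# Molar mass implied by the quoted equivalence 100 uM = 1.71 mg/mL (i.e. 1.71 g/L).
MOLAR_MASS_G_PER_MOL = 1.71 / 100e-6

def dilute_concentration_mM(total_mM, total_mL, dense_mM, dense_mL):
    """Mass balance: protein not in the dense phase stays in the remaining volume."""
    dilute_umol = total_mM * total_mL - dense_mM * dense_mL  # mM * mL = micromol
    return dilute_umol / (total_mL - dense_mL)

def mM_to_mg_per_mL(c_mM):
    """Convert mmol/L to mg/mL (numerically equal to g/L)."""
    return c_mM * 1e-3 * MOLAR_MASS_G_PER_MOL

def transfer_free_energy_kJ_per_mol(c_dense, c_dilute, temperature_K=298.15):
    """Assumed convention: dG = -RT ln(c_dense / c_dilute); temperature is an assumption."""
    gas_constant = 8.314462618e-3  # kJ mol^-1 K^-1
    return -gas_constant * temperature_K * math.log(c_dense / c_dilute)

# Reference 36 numbers as quoted: 15 ml of 1 mM, dense phase ~0.4 ml at ~7 mM.
c_dilute = mM_to_mg_per_mL(dilute_concentration_mM(1.0, 15.0, 7.0, 0.4))
print(round(c_dilute, 1))  # approximately 14.3 mg/mL
print(transfer_free_energy_kJ_per_mol(120.0, c_dilute))
```

      Only the ratio of the two concentrations enters the free energy, so the result does not depend on whether both are expressed in mM or in mg/mL.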

      Comment: 6) “We used the “tiny” bead type (TQ1) both for Na+ and Cl- ions”: The author should clarify the reason for and possible effects of choosing the TQ1 bead type, as TQ5 is, we think, the standard bead type for Na+ and Cl- ions in Martini 3.

      The author’s response: We would like to clarify that tiny refers to the bead type being Txx. We then also would like to clarify that TQ5 type was not available in the MARTINI version that we used. Ion topology file in the version that we used only had TQ1 types as the ion type. We are pasting the contents of “martini_v3.0_ions.itp” file below:

      ;;; IONS
      ;

      ;;;;;; SODIUM ION

      [moleculetype]
      ; molname nrexcl
      TNA 1

      [atoms]
      ;id type resnr residu atom cgnr charge
      1 TQ1 1 ION NA 1 1.0

      ;;;;;; CHLORIDE ION

      [moleculetype]
      ; molname nrexcl
      TCL 1

      [atoms]
      ;id type resnr residu atom cgnr charge
      1 TQ1 1 ION CL 1 -1.0

      ;;;;;; CHOLINE ION

      [moleculetype]
      ; molname nrexcl
      NC3 1

      [atoms]
      ;id type resnr residu atom cgnr charge
      1 Q0 1 ION NC3 1 1.0

      ;;;;;; CALCIUM ION

      [moleculetype]
      ; molname nrexcl
      SCA 1

      [atoms]
      ;id type resnr residu atom cgnr charge
      1 SQ2 1 ION CA 1 2.0

      Since we understand that this is causing a confusion, we modified the sentence as below (Page 6, right above the Simulation Details section):

      “We used the relevant TQ bead types for Na+ and Cl- ions and kept the ion-water and ion-protein interactions unmodified.”

      For further details of the parameters (e.g., epsilon-sigma), we made our topology and run parameter files publicly available (please see the response to the point 10).

      Comment: 7) We suggest that the author, where possible, reports error estimates for the various observables, for example from block error analysis and/or repeated simulations.

      The author’s response: We performed block averaging analysis (using two blocks) for volume estimation (and, accordingly, for the protein concentration in the dense phase) and included the error estimates in Table 1 (Page 12). We note that for most λ parameters, the error was less than 1%, but we have now added the errors larger than 1% in Figure 4. We modified the Table 1 caption as:

      “…. Statistical errors calculated by block averaging of the data (dividing the equilibrated data into two equal blocks) are less than 1% at low λ conditions. Errors larger than 1% are reported.”
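      A minimal sketch of a block-averaging error estimate is given below for orientation. It assumes the error is taken as the standard error of the block means; the response does not state the exact estimator used, and the function name is hypothetical.

```python
import math
import statistics

def block_average(samples, n_blocks=2):
    """Mean and standard error estimated from the means of equal-length blocks.

    Assumption: the error is the sample standard deviation of the block means
    divided by sqrt(n_blocks). Trailing samples that do not fill a block are dropped.
    """
    block_len = len(samples) // n_blocks
    if n_blocks < 2 or block_len == 0:
        raise ValueError("need at least two non-empty blocks")
    means = [statistics.fmean(samples[i * block_len:(i + 1) * block_len])
             for i in range(n_blocks)]
    mean = statistics.fmean(means)
    error = statistics.stdev(means) / math.sqrt(n_blocks)
    return mean, error

# Usage: relative error in percent for a time series of dense-phase volumes.
# mean, error = block_average(volume_series)
# relative_error_percent = 100.0 * error / abs(mean)
```

      With only two blocks this estimator reduces to half the absolute difference between the two block means, so it is a coarse indicator of drift between the first and second halves of the equilibrated data rather than a converged statistical error.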

      Comment: 8) It would be useful to include a discussion of the effects of simulation convergence and simulation starting configurations on the reported results.

      The author’s response: We added a discussion of the reproducibility issue and the initial condition dependence to both the Results and Discussion section and the Conclusions section (please also see the responses to points 1a and 1c).

      Comment: 9) A discussion of the potential differences in the effect of non-bonded cut-offs in the dilute and dense phase would also be useful.

      The author’s response: We used a fairly large cutoff distance (1.1 nm) for the short-range treatment of vdW and electrostatic interactions, but a potential nonbonded cutoff effect that I can think of concerns the long-range treatment of electrostatics. While vdW interactions decay with a large power of r in the denominator (and therefore contribute negligibly to the potential at large r), we may argue that the long-range treatment of electrostatics might be a concern in general. It is well known that a simple cutoff of electrostatic interactions introduces artifacts in the phase behavior of anomalous liquids that have two distinct phases [e.g., J. Chem. Phys. 131, 104508 (2009)]. Here, we applied the reaction field method for the long-range treatment of electrostatics. In this method, a given particle is assumed to be surrounded by a spherical cavity of finite radius within which the electrostatic interactions are calculated explicitly. Outside the cavity, the system is treated as a dielectric continuum. Any net dipole within the cavity induces a polarization in the dielectric, which in turn interacts with the given molecule. The reaction field method allows the replacement of the infinite Coulomb sum by a finite sum plus the reaction field. One caveat of this approach might be the nonuniform distribution of the particles within the system (i.e., one protein-dense phase and one protein-dilute phase), which may jeopardize the assumption that the region outside the cavity is a uniform continuum dielectric. While this caveat may make the Ewald summation (or particle mesh Ewald, a faster version of the Ewald sum) look preferable, we note that the Ewald sum and reaction field techniques yield nearly identical phase behavior for liquid crystals (which are also nonuniform in nature) (see Molecular Physics 92(4), 723-734 (1997)). We discussed some of these points in the main text as follows (Page 6, third from the last sentence):

      “Long-range electrostatic interactions were calculated using a generalized reaction field method [45]. We note that a long-range treatment of electrostatic interactions is essential to obtain accurate phase behavior [46].”
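      For readers unfamiliar with the method, the pair energy of the plain reaction-field scheme, in the form documented in the GROMACS reference manual, can be sketched as follows. This is an illustration only: it shows the plain (not the generalized) reaction field, the 1.1 nm cutoff is taken from the response above, and the dielectric values (relative permittivity 15 inside the cutoff, infinite permittivity beyond it) are assumed typical Martini settings rather than values confirmed here.

```python
import math

COULOMB_CONSTANT = 138.935458  # kJ mol^-1 nm e^-2, electric conversion factor

def reaction_field_energy(q_i, q_j, r, r_cut=1.1, eps_r=15.0, eps_rf=math.inf):
    """Plain reaction-field pair energy in kJ/mol for charges in e and r in nm.

    V(r) = f * q_i * q_j / eps_r * (1/r + k_rf * r^2 - c_rf) for r < r_cut, else 0.
    eps_r and eps_rf are assumed values, not taken from the manuscript.
    """
    if r >= r_cut:
        return 0.0
    if math.isinf(eps_rf):
        ratio = 0.5  # limit of (eps_rf - eps_r) / (2 * eps_rf + eps_r)
    else:
        ratio = (eps_rf - eps_r) / (2.0 * eps_rf + eps_r)
    k_rf = ratio / r_cut ** 3
    c_rf = 1.0 / r_cut + k_rf * r_cut ** 2  # shifts the energy to zero at the cutoff
    return COULOMB_CONSTANT * q_i * q_j / eps_r * (1.0 / r + k_rf * r ** 2 - c_rf)
```

      The constant c_rf makes the energy vanish continuously at the cutoff, which is what distinguishes this treatment from a simple truncation of the Coulomb interaction.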

      Comment: 10) It would be very useful if the inputs/settings (including starting configurations) used for simulation and code for analysis were available.

      The author’s response: Following the reviewer’s suggestion, we uploaded the initial configurations and run files for all lambda values at 0 mM and 100 mM salt to GitHub and made them publicly available. We now note the availability of the data in the main text by modifying the last paragraph of the Modeling subsection as follows:

      “Equilibrated initial conditions, topology files, and run parameter files for all λ values of 0 mM and 100 mM salt are publicly available on GitHub (https://github.com/gzerze/m...

      Comment: We also have the following suggestions for minor revisions to the manuscript:

      1) “We kept the protein-protein interactions unmodified (and no additional elastic backbone constraints were applied)”: The author should clarify whether this includes assignment of secondary structure and/or side chain angle and dihedral restraints (ss and scfix in Martinize).

      The author’s response: Yes, this would apply for any restraints (i.e., they would remain unmodified). This particular protein, FUS LC, is left fully flexible, without any backbone/side chain structure. We clarified this in the main text by modifying the relevant part in the Modeling subsection:

      “No elastic backbone (or side chain) constraints were applied (i.e., FUS LC is kept fully flexible). We kept the protein-protein interactions unmodified but systematically tested a range of scaled protein-water interactions.”

      Comment: 2) “All simulations were performed using GROMACS MD engine (version 2016.3).”: Error in references.

      The author’s response: The references are fixed.

      Comment: 3) In the Cluster Formation Analysis section: We suggest that the author cites the specific package used (e.g. SciPy).

      The author’s response: Following the reviewer’s suggestion, we added the name of the routine and the related reference by modifying the relevant part of the Cluster Formation Analysis subsection as follows:

      “Any two protein molecules are considered to be in the same cluster if any two beads of the molecules are within 0.5 nm (or less) distance from each other. Based on this criterion, we built adjacency matrices and then found the connected components by using the compressed sparse graph routines of public Python libraries [50]”
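      The clustering criterion quoted above can be illustrated with a short sketch. It builds the same kind of molecule-level adjacency information and labels the connected components, but it uses a plain-Python depth-first search instead of the sparse graph routines the author refers to, and it ignores periodic boundary conditions. The function names are hypothetical, and the all-pairs distance check is written for clarity rather than speed.

```python
import math
from itertools import combinations

def molecules_in_contact(mol_a, mol_b, cutoff=0.5):
    """True if any bead of mol_a is within `cutoff` (nm) of any bead of mol_b."""
    return any(math.dist(p, q) <= cutoff for p in mol_a for q in mol_b)

def find_clusters(molecules, cutoff=0.5):
    """Label molecules by cluster; returns (number of clusters, list of labels).

    `molecules` is a list of molecules, each a list of (x, y, z) bead positions.
    """
    n = len(molecules)
    neighbors = {i: set() for i in range(n)}  # adjacency list of the contact graph
    for i, j in combinations(range(n), 2):
        if molecules_in_contact(molecules[i], molecules[j], cutoff):
            neighbors[i].add(j)
            neighbors[j].add(i)
    labels = [-1] * n
    n_clusters = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:  # depth-first traversal of one connected component
            node = stack.pop()
            for other in neighbors[node]:
                if labels[other] == -1:
                    labels[other] = n_clusters
                    stack.append(other)
        n_clusters += 1
    return n_clusters, labels
```

      The size of the largest cluster at each saved frame, obtained by counting the most frequent label, gives the kind of cluster-formation time series discussed elsewhere in this response.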

      Comment: 4) Fig. 2: There are small red dots on the droplets, which should either be explained in the figure text or removed.

      The author’s response: Following the reviewer’s suggestion, we remade Figure 2 by removing the red dots.

      Comment: 5) Fig. 3: It would be useful for the reader if the NaCl concentration was labelled at the top of each column. Additionally, the radial distribution of the ion concentration is shown as two separate rows, which we assume corresponds to Na+ and Cl- ions. This should be clearly labelled.

      The author’s response: Following the reviewer’s suggestion, we updated Figure 3 with proper labels.

      Comment: 6) “We found the largest water fraction For the ionic species…”: Typo?

      The author’s response: We have now removed that incomplete sentence.

      Comment: 7) Fig. 4: Depending on how the plot is updated with more details on the experiments, perhaps the range shown on the y-axis could be made smaller.

      The author’s response: Figure 4 is updated as presented above (please see the response to point 7 above).

      Comment: 8) Fig. 5: May be clearer with a colourmap with three colours, as in figure 6.

      The author’s response: Figure 5 uses a color scale that changes the colors uniformly from black to white. For contact maps (like Figure 5), since the range of change is sequential growth of fraction, we thought a perceptually uniform sequential color scale fits better as opposed to a divergent color scale (e.g. the color scale in Figure 6).

    2. On 2022-11-27 12:46:31, user Kresten Lindorff-Larsen wrote:

      Review of “Optimizing the Martini 3 force field reveals the effects of the intricate balance between protein-water interaction strength and salt concentration on biomolecular condensate formation” by Gül H. Zerze
      Reviewed by F. Emil Thomasen and Kresten Lindorff-Larsen

      Comments: The preprinted manuscript by Zerze reports on molecular dynamics simulations of the intrinsically disordered low complexity domain (LCD) of FUS using a beta version of the coarse-grained force field Martini 3. The author performed simulations to study the formation of FUS LCD condensates under varying protein-water interaction strengths (in the Martini force field) and at different NaCl concentrations, and concludes that strengthening protein-water interactions by a factor of 1.03 improves the agreement with experimental transfer free energies between the dilute and dense phases. Additionally, the author concludes that the NaCl concentration affects condensate morphology and protein-protein interactions in the condensate, and that the effect of NaCl concentration on protein-protein interactions in the condensate is sensitive to rescaling of the protein-water interactions. The preprint provides an interesting and novel benchmark of the (beta) Martini 3 model in predicting phase separation of IDPs, and reveals potential shortcomings of the model in predicting protein concentrations in (or volumes of) the condensed and dilute phases. This benchmark will be useful for readers who wish to simulate liquid-liquid phase separation of IDPs with Martini 3, and the work will be interesting to a wider audience interested in the biophysics of IDPs and their condensates.

      Below we outline some questions and comments that the author might take into account when revising the manuscript. Our main comment regards a clearer assessment of the convergence of the simulations and correspondingly the lack of error estimates for observables calculated from the simulations. We also suggest a clearer presentation of the experimental data used to validate the simulations. While some of these changes are mostly textual, in other cases we suggest additional simulations. We realize that some of these simulations require substantial resources; if these are beyond what is available, we suggest at least to clarify caveats as per the points below.

      We have the following suggestions for revisions to the manuscript:

      1) Fig. 1 and 2: The finding of non-spherical droplets is interesting and intriguing. To examine whether the formation of these shapes in the simulations with higher salt and λ-values represent stable states or perhaps trapped metastable states of the system, we suggest that:

      1a) The author runs simulations with the parameters that give rise to non-spherical morphologies (e.g. λ=1.025 and 50 mM NaCl) starting from the structure of the spherical droplet (for example formed with λ=1.0 and no salt) and observe whether the non-spherical morphology is recovered or the droplet remains stable. If the droplet remains stable, then the effect of salt concentration on the inter-chain contacts (Fig. 6) could be assessed without potentially confounding factors from different dense phase morphologies.

      1b) The author shows time-series or distributions of an observable that reports on the dynamics of the proteins in the non-spherical droplet (e.g. Rg, mean square displacement, residue-residue contacts) and/or of an observable that reports on the dynamics of the droplet shape (e.g. the x-, y-, and z-components of the gyration tensor).

      1c) Additionally, independent replicas of droplet formation for each condition and parameter set would be ideal, but we realize that this would be expensive in computational resources and may be infeasible.

      2) “As λ increases, the volume of the dense phase increases (and condensed phase concentration decreases accordingly) until the system is not capable of forming a dense phase (λ >1.03)”: From Fig. 1 it seems that the rate of cluster formation decreases as λ increases. Is it not then possible that droplet formation at λ>1.03 is stable at equilibrium, but occurs on time-scales greater than those tested in the simulations? To support the statement that no droplets are stable at λ>1.03, we suggest that the author runs simulations with a higher value of λ starting from the structure of the spherical droplet (formed with λ=1.0 and no salt) to observe whether the droplet is dissolved or remains stable.

      3) Figure 3: The use of the radial distribution does not seem ideal for the droplets that have a non-spherical morphology, as certain distances will report on an average over the dense and dilute phases. This should at a minimum be discussed.

      4) Table 1: It seems that the discrepancy between the sigmoidal fit approach and the surface reconstruction approach increases with λ, possibly due to sensitivity to the shape of the droplets, illustrating that there might be significant uncertainty associated with the reported dense phase volumes. We think it would be useful to have an error estimate for the reported dense phase volumes (e.g. an error over volume calculation approaches and/or over different probe sizes).

      5) Table 2 and Fig. 4: We suggest that the author more explicitly states which experimental data was used for comparison with the simulations in Fig. 4. We also suggest a more direct comparison with experimental data points where possible (e.g. by showing the experimental values of csat as a function of NaCl concentration).

      6) “We used the “tiny” bead type (TQ1) both for Na+ and Cl- ions”: The author should clarify the reason for and possible effects of choosing the TQ1 bead type, as TQ5 is, we think, the standard bead type for Na+ and Cl- ions in Martini 3.

      7) We suggest that the author, where possible, reports error estimates for the various observables, for example from block error analysis and/or repeated simulations.

      8) It would be useful to include a discussion of the effects of simulation convergence and simulation starting configurations on the reported results.

      9) A discussion of the potential differences in the effect of non-bonded cut-offs in the dilute and dense phase would also be useful.

      10) It would be very useful if the inputs/settings (including starting configurations) used for simulation and code for analysis were available.

      We also have the following suggestions for minor changes to the manuscript:

      1) “We kept the protein-protein interactions unmodified (and no additional elastic backbone constraints were applied)”: The author should clarify whether this includes assignment of secondary structure and/or side chain angle and dihedral restraints (ss and scfix in Martinize).

      2) “All simulations were performed using GROMACS MD engine (version 2016.3).”: Error in references.

      3) In the Cluster Formation Analysis section: We suggest that the author cites the specific package used (e.g. SciPy).

      4) Fig. 2: There are small red dots on the droplets, which should either be explained in the figure text or removed.

      5) Fig. 3: It would be useful for the reader if the NaCl concentration was labelled at the top of each column. Additionally, the radial distribution of the ion concentration is shown as two separate rows, which we assume corresponds to Na+ and Cl- ions. This should be clearly labelled.

      6) “We found the largest water fraction For the ionic species…”: Typo?

      7) Fig. 4: Depending on how the plot is updated with more details on the experiments, perhaps the range shown on the y-axis could be made smaller.

      8) Fig. 5: May be clearer with a colourmap with three colours, as in figure 6.

    1. On 2023-04-01 23:15:29, user Vitaly V. Ganusov wrote:

      Review of the paper by Shin et al. “Lung injury induces a polarized immune response by self antigen-specific FoxP3+ regulatory T cells” (MICR 603 Immunology JC)

      Summary.

      We know that central tolerance – removal of T cells specific to self antigens – is not 100% efficient and some self-reactive T cells do accumulate in the periphery. This leaky process is likely responsible for some autoimmune reactions observed in humans. However, how such self-reactive T cells are activated remains poorly defined. The authors developed an interesting system where they have T cells recognizing a specific antigen that was engineered to be expressed in lung epithelial cells (OVA + 2W + gp66). Using an antigen with several epitopes allows one to investigate how the T cell response to one of these epitopes impacts the endogenous immune response to other epitopes. Interestingly, the authors found that transfer of T cells specific to the gp66 epitope into mice does not result in an inflammatory response to the 2W epitope by endogenous, 2W-specific CD4 T cells. Instead, the authors observed expansion of 2W-specific Tregs. The response was different in the lymph nodes vs. the lung. Interestingly, after the primary response, immunization with 2W peptide with an adjuvant did not result in expansion of conventional, 2W-specific T cells, indicating induction of tolerance. Expansion of 2W-specific Tregs was also observed after intranasal inoculation of LPS into mice. Overall, this study provides an interesting view on how an ongoing immune response may influence the response of self-specific CD4 T cells.

      Positive feedback.

      There are a lot of interesting things about this paper. First, the system to have a lung-restricted antigen that has several well defined epitopes is highly innovative. The methodology to accurately count the number of naive T cells in the whole mouse (we talk about 10-100 cells per mouse!) is impressive. Looking at the endogenous response, without transfer of monoclonal TCR-Tg T cells, is really fundamental. The way the authors look at two tissues - lymphoid (lymph nodes) and lung - is important. The use of LPS injection as a model for lung injury is interesting as it also allows one to look at actual pathology (mouse weight) as a medically relevant read-out. The text is short (perhaps in some places too short, see below for comments) and figures are relatively clear (see comments). Having an experimental layout for how the mice were treated, along with what was harvested for each experiment, was very useful. Finally, having many different lines of mice is very impressive!

      Major Concerns

      I do not understand how transfer of naive T cells results in pathology in the lung (Fig 1 results). Per basics of immunology, 3 signals are needed to activate T cells - i.e., there is a need for inflammation to induce an immune response and trafficking to the lung. Perhaps activated T cells were transferred, but that was not clear from the experimental design in Fig 1. The authors must provide a better rationale for how transfer of naive T cells causes IgM in BAL to increase. Tracking the immune response of transferred cells (e.g., activation markers, division history by CFSE, cell numbers in LNs/spleen over time) would be needed. Also, it would be very important to perform titration experiments to show how the number of transferred T cells impacts pathology. Similarly, why day 7 was chosen as the point to measure the endogenous response was not clear.

      While measurements of T cells in lymph nodes and spleen are typically efficient (most cells are recovered), isolation of activated T cells from nonlymphoid tissues, especially the lung, is highly inefficient and may be biased (some subsets could be better isolated than others, PMID: 25957682). The Treg bias in lung samples must be confirmed using microscopy. Furthermore, when T cells are isolated from tissues, contamination with blood means that cells in the circulation may be detected as if they were in the parenchyma (PMID: 24385150). Experiments must be repeated to include intravascular staining to separate cells in the blood vs. parenchyma to indicate that Tregs in the lung are in fact in the lung.

      I found it weird that the authors claim that 2W-specific Tregs are responsible for suppression of endogenous responses to 2W upon antigen+adjuvant injection and yet, depletion of Tregs did not result in a new response. A simpler interpretation is induction of anergy in endogenous T cells upon exposure to Ag in the absence of strong inflammation. Text must be carefully curated to avoid bias towards one favorite explanation.

      Focus on SLOs and lung is clear but I wonder if using another control peripheral tissue that did not express the antigen could be useful. For example, measuring T cell accumulation in the liver may be a useful control.

      It was not clear if expression of OVA is actually restricted to the lung. Perhaps some more thorough analysis of other tissues would be helpful to verify the absence of leakiness of the gene expression.

      Minor concerns

      Having numbers for lines in the paper could allow for better referencing to specific statements made in the paper.

      While for most immunologists Tregs are FoxP3, some younger researchers may not know this. Mentioning that this is how you define Tregs would be useful. Also, assessing the function of these T cells would be useful.

      Please do not use “ns” or “**” to denote statistical significance. Use actual p values, e.g., p=0.34 or p=0.012. Additionally, indicating fold difference between groups (effect size) could be also useful.

      In introduction: Whether autoimmune responses are driven by naive T cells or by cross-reactive memory T cells is unclear. Cross-reactivity may be a simpler explanation given that memory T cells may require lower thresholds for activation.

      Authors should describe better different epitopes used in the construct, e.g., gp66 is from LCMV.

      Why did the authors use gp66-specific CD4 T cells and not OVA-specific OTII cells? Are the results the same if using T cells of a different specificity?

      Are the detected Tregs derived from the thymus, or are these naive T cells “converted” to the Treg phenotype? I don’t think that the current data allow discrimination between these alternatives.

      When indicating difference in expansion in the Results section, please indicate how much (how many fold) is that expansion.

      How is the lung injury by LPS dependent on the LPS dose? Perhaps this needs to be discussed.

      I wonder whether measuring the kinetics of the response, e.g., before day 7 and after, may be useful. We know that exposure to self antigens typically results in deletion of naive CD8 T cells (PMID: 10843383).

      Which specific LNs were isolated? This probably should be listed in materials and methods section.

      I wonder if plotting some data as paired (e.g., Fig 1 - 2W vs. SMARTA) could reveal some additional information.

      How were Tr1 cells gated? Some flow cytometry graphs may be useful here (Suppl Fig S2).

      Suppl Fig 3 would benefit from an experimental design panel.

    1. On 2023-01-05 20:51:34, user Gregory Way wrote:

      Wong et al. present a deep learning approach called MOAProfiler (MP) to specifically predict compound mechanism of action (MOA) from Cell Painting images. They benchmark MP against CellProfiler (the standard image-based profiling approach) and DeepProfiler (an emerging image-based profiling approach also based on deep learning) using two publicly available datasets (JUMP-pilot and LINCS). They evaluate these approaches using precision, recall, and F1 score at k for held-out MOA predictions and by comparing similarity between same-MOA and different-MOA profiles. They report an astounding 1,000% performance increase for MP over CellProfiler and DeepProfiler in grouping like MOAs. We thank the authors for posting their work as a preprint!

      The primary innovation in MP is the specific training approach. MP uses the same architectural backbone as DeepProfiler (EfficientNet), but trains the model directly to predict compound MOA (instead, DeepProfiler uses EfficientNet to derive representations). Additionally, MP does not perform single-cell segmentation, instead training using full field of views and a series of data augmentations. The authors use the last layer as the per-compound feature embedding in their performance benchmarks. The authors also include several convincing supplementary analyses that further support their claims.

      Overall, the paper presents a very interesting observation and pushes against a commonly held mindset of analyzing Cell Painting data with generalist/universal approaches. Instead, the paper suggests that fine-tuned models for specific applications are vastly superior for specifically tailored tasks.

      However, we have two major concerns and several relatively minor comments that the authors might clarify in order to strengthen their findings and claims.

      Major concerns:

      • Our primary concern involves publicly available resources. Namely, the GitHub URL is not public: https://github.com/pfizer-r.... Because we were unable to access the code, we were not able to perform a detailed code review. Additionally, the authors link to the CellProfiler and DeepProfiler embeddings they used to benchmark. These embeddings were derived from https://github.com/broadins.... These are not the official LINCS and JUMP resources, and at least one of the links pointed to level 3 profiles, which are not normalized. This could at least partially explain the exceptionally poor performance for CellProfiler and DeepProfiler.
      • Second, the authors train two separate MP models for both datasets. Did the authors try applying a trained-MP on the alternative dataset? The authors state: “To simulate the real-world use case of identifying MOAs of unknown held-out compounds, we performed an analysis where we split the dataset by compound instead of by wells (Methods).” We imagine that analyzing future compounds using embeddings of a pre-trained MP is also a common real-world application. This analysis would also reveal the level of overfitting occurring in each independently trained dataset. Would combining datasets improve performance?
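      To make the distinction between splitting by well and splitting by compound concrete, here is a minimal sketch using only the Python standard library. It is not the authors' code (which we could not access); the record layout, the `split_by_compound` helper, and the 30% test fraction are illustrative assumptions.

```python
import random
from collections import defaultdict


def split_by_compound(wells, test_fraction=0.3, seed=0):
    """Assign whole compounds to train or test so that no compound spans both."""
    by_compound = defaultdict(list)
    for well in wells:
        by_compound[well["compound"]].append(well)

    compounds = sorted(by_compound)
    random.Random(seed).shuffle(compounds)

    n_test = max(1, round(test_fraction * len(compounds)))
    test_ids = set(compounds[:n_test])

    train = [w for c in compounds if c not in test_ids for w in by_compound[c]]
    test = [w for c in compounds if c in test_ids for w in by_compound[c]]
    return train, test


# Hypothetical well records: 2 plates x 10 wells, 5 compounds.
wells = [
    {"well": f"plate{p}-well{i}", "compound": f"cmpd{i % 5}"}
    for p in range(2)
    for i in range(10)
]

train, test = split_by_compound(wells)
train_compounds = {w["compound"] for w in train}
test_compounds = {w["compound"] for w in test}
print(sorted(train_compounds), sorted(test_compounds))
```

      A split by well would instead shuffle the individual records, so replicate wells of the same compound could land in both sets. Applying a model trained on one dataset to the other is the stricter version of the same idea, with the dataset as the held-out group.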

      Minor comments and concerns:

      • The authors state: “Although traditional computer vision techniques have proved useful, they often require much fine-tuning and require human intelligence and intuition for deciding which phenotypic features and their parameters are important to measure.” We think this is a really good point, and we are glad someone else brought up the parameters and all the fine-tuning that typically needs to happen, even for generalist approaches.

      • The authors state: “In contrast, deep learning has emerged as a tool for learning and encoding meaningful representations (i.e. embeddings) without requiring humans to know beforehand what features may be useful for the task of interest.” We may have missed this, but the authors might decide to mention the deep learning limitation of having unlabeled and difficult-to-interpret features.

      • Figure 1B needs a scale bar

      • The authors state: “We divided the dataset such that 60% of the wells were assigned to training, 10% to validation, and 30% to test (Methods).” What does “class-balanced the training set” mean? Is this during cross validation? The authors should clarify.

      • The authors state: “We also ensured each MOA’s test wells spanned multiple plates (at least seven, Figure 2D, left)”. However, Figure 2D shows that most MOAs in LINCS spanned fewer than 7 plates, what did the authors do with those?

      • The authors state: “We also included the negative DMSO as a class to learn but excluded it from all performance metrics because of its overrepresentation in the dataset”. It would be helpful for the authors to clarify how they handled positive controls. Also related, the authors state: “we performed four analyses to assess how well the embeddings captured MOA-specific features.” How did DMSO perform? It would be interesting to see the distribution of DMSO probabilities across classes, which could point to classes with no effect or how often DMSO features might be influenced by batch effects.

      • For Figure 3A, the authors should clarify that their supervised learning architecture was multi-class. This is not explicitly stated.

      • The authors state: “On the held-out test set, the model achieved an area under the precision recall curve (AUPRC) of 0.46 (random AUPRC = 0.006) for image field classification over 176 MOA classes (Figure 3A)”. How are the authors calculating this random AUPRC? If this is theoretical, the authors should compare performance with a model trained with a randomly shuffled baseline.

      • Additionally, the authors state elsewhere: “it was able to correctly predict MOAs for 10.2-13.6% of the compounds in a space of 176 possible MOAs. Compared to a random baseline of 0.6%, this is a 17.9-23.9x improvement.” This raises another question of how the authors formed the baselines. Also, why did the authors choose not to include DP and CP in this evaluation?

      • The Figure 4B plate map might be wrong. There appear to be more DMSO wells than expected, and what are the NAs?

      • How did the authors determine the categories “strongly correlated” and “weakly correlated”? At different thresholds did MP still outperform?

      • The authors state: “Performance varied depending upon whether we predicted MOA by the neural network’s classification output or by a compound’s latent similarity to training compound embeddings”. The authors should clarify how they determined these classification outputs and latent similarities as they are introduced.

      • The authors state: “(delta = 0.44 for MP, Figure 3C). For both CP and DP, the difference was smaller (delta = 0.03 for CP, 0.03 for DP).” The authors define delta in the figure legend, but this should also be clearly delineated in the methods.

      • We were confused by the legend in Figure 4D. Why is each model shown at a different k? Is this the optimal k? MP doesn’t look optimal at k=4.

      • The authors state: “From a low-dimensional t-distributed stochastic neighbor embedding (TSNE) visualization of embeddings from three example MOAs, we could see that different compounds with the same MOA were clustered together with different MOAs inhabiting different areas in latent space (Figure 3G).” How did the authors choose these three example MOAs? Why not include all of them? It would be nice to visualize all embeddings for both datasets, and the TSNE plots look a bit strange, with highly similar distances between points.

      • The authors state: “However, the model created embeddings that were clustered by MOA despite each MOA being represented by multiple compounds (Supplemental Figure 1).” Supplementary Figure 1 is not a specific enough reference - there are multiple panels and it is unclear which panel the reader should focus on.

      • The authors state: “We found minor differences in classification accuracy (0.54 vs .50) suggesting that the model was not leveraging much confounding edge-specific features for its learning” Given the number of NA’s (especially in LINCS platemap in Figure 4), normalization to remove batch effects or TSNE/UMAP to suggest no batch effects would be more convincing.

      • Figure 6G mentions different shapes in the legend but all look like circles in the image (they are different but it's very hard to tell). The authors also forgot to include the letter g in the figure legend.

      • Does supplementary figure 3 show MP embeddings? This is not explicitly stated.

      • Performance across MOA counts for MP is impressive! Very strong performance at low n

      • In the discussion, the authors state: “Second, all the analyses were performed on compounds with just one known MOA. Understanding drugs that are associated with multiple MOAs is an important task, but our study did not address this question.” The authors seem to avoid explaining this in-depth throughout the article. Why is it an important task and is their justification for not including drugs with multiple MOAs good enough? They mention that they didn’t include compounds with multiple MOAs to simplify the compound space and limit polypharmacology intricacies. Did they try to include compounds with multiple MOAs? If so, I think they should report the results. If the results are bad, then that could give insight into how we can improve performance.

      • In the discussion, the authors state: “Although DP is another deep learning based approach to phenotypic profiling that also uses an EfficientNet backbone architecture, we observed larger performance gains with MP.” What was the authors’ rationale for using EfficientNet? Also, “architecture” here and in other sentences appears to have a broad meaning. Could another word be substituted for greater specificity? We think it would be helpful to include a diagram of their model architecture.

      • The authors state: “We permuted each channel’s brightness and contrast independently by a random factor in the range of 0 to 0.30 (just for the LINCS dataset).” This seems non-traditional; the authors should provide a citation. Why not perform this in the JUMP dataset?

      • The authors state: “As a final training augmentation step, we performed random 90-degree rotations on each image, along with random horizontal flips.” The authors should specify how many augmentations they performed, how much this expanded the dataset, and whether any specific augmentations were particularly helpful.

      • The authors state: “We kept only the compound data that had no more than one known MOA according to the CLUE Connectivity Map…” How often does compound data have more than one MOA from the CLUE Connectivity Map? Would it create a significant difference in results if others were included? How was CLUE connectivity data joined with or used as a filter for JUMP1?

      • The authors state: “We trained for 100 epochs and selected the model that had the highest accuracy on the validation set. We used a learning rate of 0.1, a weight decay of 0.0001, a dropout rate of 0.2, a learning momentum of 0.9, a learning rate scheduler with a gamma decay of 0.1 at epoch 50 and 75, and batch size of 56 for training.” Did the authors perform any sort of hyperparameter optimization? How did they select these hyperparameters?

      • For their CellProfiler pipelines, the authors do not explain why they used specific modules. The pipeline utilizes various different modules that I haven’t seen in other pipelines so it would help to know what is being done if there were notes.
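      On the random-baseline questions raised in the comments above: one plausible reading, which is our assumption and is not confirmed by the manuscript, is that the baseline is uniform guessing over the 176 MOA classes. The short sketch below checks whether the quoted figures are consistent with that reading.

```python
# Number of MOA classes quoted from the manuscript.
n_classes = 176

# Assumption (not confirmed by the authors): the "random" baseline is
# uniform guessing, so the chance-level hit rate is 1 / n_classes.
chance = 1 / n_classes

# Reported fraction of compounds with correctly predicted MOA (10.2-13.6%).
reported_low, reported_high = 0.102, 0.136

fold_low = reported_low / chance
fold_high = reported_high / chance

print(f"chance level     = {chance:.4f}")
print(f"fold improvement = {fold_low:.1f}x to {fold_high:.1f}x")
```

      Under this assumption the chance level is about 0.006 (0.6%) and the improvement is roughly 18x to 24x, close to the quoted 17.9-23.9x given rounding of the inputs. If this is how the baselines were formed, stating it in the Methods and adding an empirical label-shuffled control would resolve the question.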

      Reviewed by: Gregory P. Way, PhD; Jenna Tomkinson; Roshan Kern; Dave Bunten; Parker Hicks; Rose Doss; Keenan Manpearl

    1. On 2022-10-13 19:16:01, user BacillusBaRosh wrote:

      Author responses to feedback posted on hypothes.is - cut and paste because could not figure out how to respond there https://hypothes.is/a/5fVcAEaSEe2k4CPVTDZz7Q

      AtanasRadkov, Oct 7, on "Magnesium modulates Bacillus s…" (www.biorxiv.org)

      General comments:

      This study carefully delineates the role of magnesium in cell division versus cell elongation. The results are really important specifically for rod-shaped bacteria and also an important contribution to the broader field of understanding cell shape. Specifically, I love that they are distinguishing between labile and non-labile intracellular magnesium pools, as well as extracellular magnesium! These three pools are really challenging to separate but I commend them on engaging with this topic and using it to provide alternative explanations for their observations!

      A major contribution beyond prior findings on the effects of magnesium is the authors’ ability to visualize the number of septa in the elongating cells in the absence of magnesium. This is novel information and I think the field will benefit from the microscopy data shown here.

      I completely agree with the authors that we need to be more careful when using rich media such as LB. It is particularly sad that we may be missing really interesting biology because of that! It’s worth moving away from such media or at least being more careful about batch to batch variability. Batch to batch variability is not as well appreciated in microbiology as it is for growing other cell types (for example, mammalian cells and insect cells).

      For me, the most exciting finding was that a large part of the cell length changes within the first 10min after adding magnesium. The authors do speculate in the discussion that this is likely happening because of biophysical or enzymatic effects, and I hope they explore this further in the future!

      I love how the paper reads like a novel! Congratulations on a very well-written paper!

      Kudos to the authors for providing many alternative explanations for their results. It demonstrates critical thinking and an open-mind to finding the truth.

      Comment: Figures 2C, 3C, 6A, 8B, S1B, and S3B → please include an indication of statistical significance.

      Response: Easy to add.

      Comment: For your overexpression experiments, do the overexpressed proteins have a tag? It would be helpful to have Western blot data showing that the particular proteins are actually being overexpressed. I think the phenotypes that you observe are very compelling, so I don’t doubt the conclusions. Western blot data would just provide some additional confirmation that you are actually achieving overexpression of UppS, MraY, and BcrC.

      Response: The proteins are untagged. For UppS and BcrC, cell shortening occurs with addition of inducer, so there is a strong indication that expression is occurring. A western would provide information about the degree of overexpression, but we don’t think it is necessary to support the conclusions drawn. Do you think there is an alternative possibility that needs to be excluded? We note that in another preprint (https://www.biorxiv.org/con... the authors delete the native uppS in their inducible Phy-uppS strain (Fig S4) and at 100 uM IPTG (10X less than what we used in our experiment) the cells have wt growth on LB plates, so we at least know the Phy-uppS is functional and made (or they would die!). We are introducing the uppS deletion into our strain to see if we can identify a concentration of IPTG that doesn’t affect cell growth but still induces shortening.

      For MraY, the result is negative, so you are spot on – it is impossible to tell from the data shown whether this is due to lack of overexpression. We only know the strain is correctly made from sequencing. We will investigate if there is an antibody or functional fusion available. The reason we were not sure it was worth doing is that the MraY reaction is reversible (15131133). This means that without a phenotype, there is no simple way to know the reaction can even be pushed forward even if the overexpression is confirmed (more negative data). We actually overexpressed some other proteins that act downstream (MraY, MurJ, AmJ) and they were also negative for shortening. Probably we should remove the negative data or reword to make the caveats of the negative result clear.

      Question: Based on your data, there are definitely differences in gene expression when you compare cells grown in media with and without magnesium. Because the majority of the cell length increase occurs in such a short time though (the first 10min), I was wondering if you think that some or most of it is not due to gene expression?

      Response: The shortening is even faster than 10 min (not only statistically significant, but also obvious qualitatively if we mount immediately after adding Mg2+). We did not include the first timepoint because the original purpose was to check everything was ready with the microscope – did not expect shortening so fast! We can definitely add that data in. When we saw it, we tried to capture the transition on pads, but going from culture to pad seems to stress the cells too much in the small window where the cool stuff happens. Since growth rate doesn’t appear to be a big factor in those initial divisions, we might be able to grow at lower temp and shift to pads for an adjustment period before adding Mg2+. Did not play with it much due to lack of resources atm, but a flowcell setup would probably be best.

      In short, we think rapid divisions right after transition do not require transcription or translation. It really “smells” more like a biophysical thing.

      Question: Do you have any hypotheses about what is most likely to be affected by magnesium? Do you think the membrane may be affected?

      Response: We have a lot of hypotheses – all of which are speculative. There could be an extracytoplasmic enzyme involved in envelope synthesis that is sensitive to Mg2+ availability, such that its activity is affected at lower concentrations. There is some old literature with membrane preps that suggests PG synthesis requires higher Mg2+ than teichoic acid synthesis. If Und-P is limiting, higher Mg2+ may make the pool more available to make the septum. Tingfeng initially hypothesized there might be a receptor/signal mechanism but has not been able to identify one. Und-P seems to be important, but “availability” is not just pool, but how fast (and where!) the flipping across the membrane occurs. If Und-PP needs to be dephosphorylated to Und-P before being flipped back to the cytoplasmic side, anything that affects the PPi equilibrium would be predicted to affect the reaction rate, with lower Pi (in periplasm or pseudoperiplasm in case of G+) favoring the dephosphorylation. Cell wall associated Mg2+ could shift equilibrium to be more favorable for a Und-PP phosphatase more closely associated with the divisome. I could go all day… In short, we don’t know enough!

      Question: Why do you think less magnesium activates this program of less division and more elongation? Additionally, why is abundant magnesium activating a program of increased cell division and less elongation? Do you think there is some evolutionary advantage, especially considering how important magnesium is for ATP production?

      Response: In the window we looked at, the elongation rate is constant (not less or more) and only the division frequency changes. Some bacteria (like Caulobacter and to a lesser extent E. coli) clearly elongate and divide simultaneously, so there is some competition for substrate (like Lipid II). Septators like Bacillus seem to delineate the two processes more, but we have found conditions where even Bacillus invaginates during division, so it’s not absolute. Like eukaryotic cells, bacteria undoubtedly have mechanisms to only commit to a round of DNA replication when there is some signal that resources are sufficient. Clearly with some bugs, this is not the case with cell division. The alternative possibility is that every cell cycle there is an opportunity to divide if some threshold of *something(s)* is reached. There is a hypothesis from the Mtb literature that it may be GTP, but it’s not at all clear that is sufficient. In yeast, size at cell division is affected by perturbing the 1-C pool.

      Question: Related to this previous question, I also wonder if this magnesium-dependent phenotype would extend to other unicellular organisms, maybe protists or algae? That would be a really exciting direction to explore!

      Response: It’s a great question – lots to do! We didn’t even look at another Gram-positive, but we plan to. It’s trickier to limit Mg2+ in Gram-negatives (see 27471053 – we tried the Bsub homolog for those wondering – it’s not responsible for the phenotype we see).

      Question: Regarding the zinc and manganese experiments, why do you think they lead to additional phenotypes compared to magnesium? Do you have any hypotheses?

      Response: We have hypotheses, but if my (Jen’s) twitter engagement is any indication, way too speculative for public consumption at present. Need grant to acquire preliminary data to write grant.

      Question: Regarding your results that Lipid I availability may be a major problem for cell division in the absence of magnesium, do you think that is due to effects magnesium has on the enzymes directly, or do you think magnesium affects the substrate availability/conformation by coordinating the phosphate groups? Or something else, maybe membrane conformation?

      Response: Several proteins involved in envelope synthesis (like UppS) are Mg2+ dependent enzymes. But at least for any intracellular players, levels of Mg2+ should be more than high enough to support enzyme activity even when levels are low (0.8 – 3.0 mM is Bsub range I recall off top of head). Could have impact extracytoplasmically by lowering pool sponged into the cell wall, but intuition (for what that is worth) is that it is not the coordination of an enzyme with a metal that is impacted but rather the equilibrium with other ions like Pi and H+, and that this impacts net ATP synthesis. Lots to think about and do, and no simple answers. When Tingfeng started the project, the idea was to find the mechanism – didn’t realize we were asking “how does the cell work?” Turned out to be a bit much for a dissertation project :)

      -Jen Herman and Tingfeng Guo

    1. On 2022-10-08 16:37:04, user Michael Ailion wrote:

      This paper aims to understand how toxin-antidote (TA) elements are spread and maintained in species, especially in species where outcrossing is infrequent and the selfish gene drive of TA elements is limited. The paper focuses on the possible fitness costs and benefits of the peel-1/zeel-1 element in the nematode C. elegans. A combination of mathematical modeling and experimental tests of fitness is presented. The authors make a surprising finding: the toxin gene peel-1 provides a fitness advantage to the host. This is a very interesting finding that challenges how we think about selfish genetic elements, demonstrating that they may not be wholly “selfish” in order to spread in a population.

      This paper is of interest to evolutionary biologists and population geneticists. It provides empirical evidence that supports a previous hypothesis of how selfish toxin-antidote elements spread in non-obligate outcrossing species. While the experiments and data are appropriate for addressing this hypothesis, one major conclusion is not supported by the data and one other major conclusion is supported only weakly.

      Strengths

      1. The authors support results found with a zeel-1 peel-1 introgressed strain by using CRISPR/Cas9 genetic engineering to precisely knock-out the genes of interest. They were careful to ensure the loss-of-function of these generated alleles by using genetic crosses.

      2. Similarly, the authors are careful with controls, ensuring that genetic markers used in the fitness assays did not affect the fitness of the strain. This ensures that the genes of interest are causative for any source of fitness differences between strains, therefore making the data reliable and easily interpretable.

      3. A powerful assay for directly measuring the relative fitness of two strains is used.

      4. The authors support relative fitness data with direct measurements of fitness proximal traits such as body size (a proxy for growth rate) and fecundity, providing further support for the conclusion that peel-1 increases fitness.

      Weaknesses

      1. One major conclusion is that peel-1 increases fitness independent of zeel-1, but this claim is not well supported by the data. The data presented show that the presence of zeel-1 does not provide a fitness benefit to a peel-1(null) worm. But the experiment does not test whether zeel-1 is required for the increased fitness conferred by the presence of peel-1. Ideally, one would test whether a zeel-1(null);peel-1(+) strain is as fit as a zeel-1(+);peel-1(+) strain, but this experiment may be infeasible since a zeel-1(null);peel-1(+) strain is inviable.

      2. The CRISPR-generated peel-1 allele in the N2 background only accounts for 32% of the fitness difference of the introgressed strain. Thus, the effect of peel-1 alone on fitness appears to be rather small. Additionally, this effect of peel-1 shows only weak statistical significance (and see point 5 below). Given that this is the key experiment in the paper, the major conclusion of the paper that the presence of peel-1 provides a fitness benefit is supported only weakly. For example, it is possible that other mutations caused by off-target effects of CRISPR in this strain may contribute to its decreased fitness. It would be valuable to point out the caveats to this conclusion, or back it up more strongly with additional experiments such as rescuing the peel-1(null) fitness defect with a wild-type peel-1 allele or determining if introduction of wild-type peel-1 into the introgressed strain is sufficient to confer a fitness benefit.

      3. The strain that introgresses the zeel-1 peel-1 region from CB4856 into the N2 background was made by a different lab. Given that N2 strains from different labs can vary considerably, it is unclear whether this introgressed strain is indeed isogenic to the N2 strain it is competed against, or whether other background mutations outside the introgressed region may contribute to the observed fitness differences.

      4. Though the CRISPR-generated null allele of peel-1 only accounts for 32% of the fitness difference of the zeel-1 peel-1 introgressed strain, these two strains have very similar fecundity and growth rates. Thus, it is unclear why this mutant does not more fully account for the fitness differences.

      5. Improper statistical tests are used. All comparisons use a t test, but this test is inappropriate when multiple comparisons are made. Importantly, correction for multiple comparisons may decrease the already weak statistical significance of the fitness costs of the peel-1 CRISPR allele (Fig 3E), which is the key result in the paper.

      6. N2 fecundity and growth rate measurements from Fig 2B&C are reused in Fig 3C&D. This should be explicitly stated. It should also be stated whether all three strains (N2, the zeel-1 peel-1 introgressed strain, and the peel-1 CRISPR mutant) were assayed in parallel as they should be. If so, a statistical test that corrects for multiple comparisons should also be used.

      7. It appears that the same data for the controls for the fitness experiments (i.e. N2 vs. marker & N2 vs. introgressed npr-1; glb-5) may be reused in Fig 2A and 3E. If so, this should be stated. It should also be stated whether all the experiments in these panels were performed in parallel. If so, this may affect the statistical significance when correcting for multiple comparisons.
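      To illustrate the concern about multiple comparisons in points 5 to 7, here is a minimal sketch of the Holm-Bonferroni step-down adjustment, written with the Python standard library only. The p values are hypothetical and are not taken from the paper; Holm is one common correction among several, and the authors may reasonably choose another (for example, an ANOVA with a post hoc test).

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p values, in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p values from three pairwise t tests.
raw = [0.04, 0.01, 0.03]
adjusted = holm_adjust(raw)

for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw p = {p_raw:.3f} -> Holm-adjusted p = {p_adj:.3f}")
```

      In this toy example, a comparison with a raw p of 0.04 would pass a 0.05 threshold when tested alone but not after adjustment, which is why the correction matters for a result that is already weakly significant.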

      Minor<br /> points

      1. Though the mathematical modeling is interesting from a<br /> theoretical point of view, we feel that it oversells the rationale behind the<br /> experiments, setting up a “straw man” argument to knock down. Also, the modeling<br /> relies on rather high assumptions of the possible carrying cost of peel-1/zeel-1. For example, the modeling<br /> of the effect of outcrossing rate on peel-1/zeel-1<br /> frequency assumes a selection coefficient of 0.35, which seems rather<br /> arbitrary and high. Where does this number come from? Is there any precedence<br /> for this high carrying cost? In our opinion, the idea that energy expenditure<br /> or leaky toxicity accounts for such a high carrying cost seems unlikely.

      2. The two studies cited for “outcrossing rates typical for<br /> C. elegans” estimated vastly different outcrossing rates (~20% or ~1%).<br /> The model presented in Fig S1 specifically uses the lower estimates (0-2%), so<br /> the Sivasundar & Hey paper is miscited here. It is unclear whether there is<br /> a good rationale to go with the lower rate estimates.

      3. The measurement of body-size is unclear in the main<br /> text. Only when reading methods did we realize that body-size is more of a<br /> proxy for growth rate rather than an end-point measurement of worm size.

      4. What is the temporal distribution of egg laying of the<br /> N2 and N2peel-1(null) strains? Based on how the<br /> data collection is described in the Methods, the authors should already have<br /> these data. Does egg-laying start at the same time in the two strains? The fact<br /> that strains carrying peel-1 grow<br /> faster but also apparently produce more sperm (which might slow them down)<br /> makes an analysis of this worthwhile, especially since fitness depends on when<br /> eggs are laid, not just how many. Some more characterization of this fitness<br /> trait seems appropriate and useful for beginning to understand how peel-1<br /> may be increasing fitness. Given that the number of sperm limits how many eggs<br /> are laid, the presence of peel-1 apparently results in more sperm. It is<br /> surprising that a gene exclusively expressed in developing sperm can lead to<br /> production of more sperm.

      5. Line 65: the statement “similar elements have not been identified in obligate outcrossing Caenorhabditis nematodes” is somewhat misleading. TA elements may not have been identified in obligate outcrossing nematodes because of research bias since genetic experiments are easier to perform in non-obligate outcrossers and it is unclear that there have been extensive searches for TA elements in outcrossing nematodes. Furthermore, as the mathematical models in this study suggest, TA elements will spread quickly with increasing rate of outcrossing. Since a TA element’s non-fixation within a species has historically been a prerequisite for its discovery, the rapid TA element fixation that would generally occur in obligate outcrossers would make their identification more challenging.

      6. Line 209-210: it is stated that this is the “first measurement of the fitness cost of a TA element to the host” and “first demonstration that a TA element can benefit the organism.” These claims may be overstated. It has been previously shown in several cases that TA elements can provide fitness benefits in bacteria, such as improved antibiotic resistance (e.g. Bogati et al. 2022, PMID: 34570627).

      7. More details about the CRISPR protocol would be helpful. It is unclear whether Cas9/sgRNAs were introduced as RNPs or plasmids (and at what concentrations). It is unclear how worms were screened for edits. It is also unclear how many Dpy or Rol worms were screened and how many peel-1 or zeel-1 edited worms were found (the efficiency of CRISPR). The meaning of the shaded portion of the repairing oligo sequences in the table is not explained. Finally, it is not stated whether CRISPR-generated mutant strains were outcrossed.

      Reviewed (and signed) by Lews Caro and Michael Ailion

    1. On 2022-09-19 14:09:03, user Gregory Way wrote:

      We reviewed this preprint as a part of Arcadia's preprint review initiative: https://twitter.com/Arcadia...

      Peidli et al. present a data resource (for single-cell perturbations) and apply energy distance (e-distance) to quantify differences in perturbations. For the data resource, the authors focus on curating single-cell RNAseq and ATACseq measurements perturbed with CRISPR, drug treatments, and a few other perturbation types. The authors curate a total of 44 datasets. Overall, the paper is very well written with a sound logical flow. However, many elements of the paper seem incomplete. We provide several specific comments regarding our views on how the paper could improve. We thank the authors for posting their preprint and code publicly.

      Our two primary comments are:

      1. The data are not harmonized from reads. Instead, the authors process (in most cases) already processed read count by gene matrices. The authors also use different versions of scanpy to process different datasets. This is definitely still valuable, but the authors should state these facts earlier and probably decrease the use of “harmonization”. Additionally, there is no evaluation to determine the effect or benefit of this read count harmonization. Calculating e-distance before and after harmonization across datasets might be helpful.

      2. E-distance is not sufficiently benchmarked. The math and intuition are described marvelously, but how does E-distance behave across datasets and common perturbations? How does subsampling read depth impact E-distance calculations? How does drug dose impact e-distance? How does sequencing technology impact e-distance? How does modifying the distance metric within the E-distance calculation impact calculations?

      We also have several general comments on different aspects of the paper and github repository. We hope that the authors can benefit from our deep dive on the paper. Thanks again!

      Introduction

      • Definition of single-cell perturbation data (SCPD)

      Overall, this subsection is more of a “methods/techniques overview” of how to collect SCPD rather than defining what SCPD actually is. What is output from these techniques?<br /> - The authors should define these data in more detail.<br /> - The authors should also further define the techniques as it is helpful to have a general idea of why the data collected from the techniques are “good” and not just “more data are better”.

      Motivation for distance measure of high-dimensional profiles:

      • The authors claim that E-distance can identify strong or weak perturbations. It’s unclear what a strong or weak perturbation is. I was unable to find this information from a quick google search so I think they should define that here (not found in methods either).

      Motivation for unifying datasets

      • Their motivation only seems to be “it doesn’t exist yet because it’s difficult to do” so therefore we should do it. What will/could come of the integrated and standardized datasets? What would we hope to find?

      Web Interface

      • The authors claim, “a web interface for data access, analysis and visualization is available at scperturb.org.” There is data access on that site, but analysis and visualization appear to be absent using Brave and Safari browsers.
      • It seems that one would require a computer with enough memory (500G) to run scPerturb to reproduce the analysis. The authors present solutions for how to overcome these requirements, but it did not seem that they attempted to solve them.
      • The authors state that there are Quality Control plots for each dataset on the website, but we could not find them.

      Results<br /> - The authors should briefly describe the methods underlying the statement “dense low-dimensional embeddings of the original data (see Methods for details)” in a bit more detail upon introduction.<br /> - It is surprising to me that there are so many cells with 2 perturbations (proportionally to a single perturbation) (sup fig 1). Is this because of an overweighting of a specific study?<br /> - It might be helpful to add targeted sequencing depth to table 1 per study, also helpful to add the sequencing platforms used.<br /> - Data source trust: Zenodo sources appear to be auxiliary data downloads as opposed to direct sources. How might other researchers assume trust in the sources? Are the included metadata implied or entrusted to the authors?<br /> - Are the UMAPs in Figure 3E the same UMAP space or are the spaces fit independently in both panels?<br /> - Need to provide a bit more rationale for why the authors chose E-distance over the other options.<br /> - Did they calculate E-distance for all perturbations? Sup Fig 3 shows this, so maybe? It was not obvious where to find the measurements.<br /> - There are only 11 drug perturbations in common. This is a very interesting observation! How many genes are perturbed in common datasets?

      Methods<br /> - For the scATAC-Seq data, it’s not clear to me if they perform LSI jointly across all samples or not. This would cause non-interoperability across datasets if not done jointly since each LSI dimension may mean something different in each dataset. In addition, they provide a peaks x counts matrix -- which is dataset specific. I would suggest aligning jointly using a uniform set of peaks -- Running MACS2 on all datasets would be a huge benefit to the community.<br /> - How do the different versions of scanpy impact data processing? Typically, harmonized data are generated with a single pipeline.<br /> - When performing subsampling to fit PCA, did the authors transform the full data subsequently? In other words, does the PCA fitting step impact cell count for e-distance calculation?<br /> - What distance measure is used in the E-distance calculation for ||x_i - x_j||? L2? For perturbations, comparing L2 to other metrics would help benchmark the method.
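      To make the question about the inner distance concrete, here is a minimal sketch of an energy-distance calculation in which the pairwise metric can be swapped. It assumes the standard form E(X, Y) = 2·δ_XY − σ_X − σ_Y, where δ_XY is the mean pairwise distance between the two groups and σ_X, σ_Y are the mean pairwise distances within each group. This is not the scPerturb implementation: the function names, the toy coordinates, and the choice of metrics are all hypothetical and only illustrate the kind of benchmark we have in mind.

```python
import math
from itertools import product


def euclidean(p, q):
    """Plain L2 distance between two points."""
    return math.dist(p, q)


def sq_euclidean(p, q):
    """Squared L2 distance between two points."""
    return math.dist(p, q) ** 2


def mean_pairwise_distance(a, b, metric):
    """Mean of metric(p, q) over all pairs with p in a and q in b."""
    total = sum(metric(p, q) for p, q in product(a, b))
    return total / (len(a) * len(b))


def energy_distance(x, y, metric=euclidean):
    """E(X, Y) = 2 * delta_XY - sigma_X - sigma_Y for two groups of cells."""
    delta_xy = mean_pairwise_distance(x, y, metric)
    sigma_x = mean_pairwise_distance(x, x, metric)
    sigma_y = mean_pairwise_distance(y, y, metric)
    return 2.0 * delta_xy - sigma_x - sigma_y


# Toy low-dimensional embeddings (made-up values, not real data).
control = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
perturbed = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]

for name, metric in (("euclidean", euclidean), ("sq_euclidean", sq_euclidean)):
    print(name, energy_distance(control, perturbed, metric))
```

      Running the same comparison with several metrics on perturbations shared across datasets would show how sensitive the ranking of perturbations is to this choice.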

      Code/Github<br /> - It seems to us a good idea to spend time improving the existing model / code at https://github.com/theislab.... The authors should justify why they are not contributing to existing open source code.<br /> - I can’t find the script “fragments2outputs.R” in their github. From their paper: “All features described in the overview above were computed with ArchR functions. For details inspect the “fragments2outputs.R” script in our code repository (see Data Availability).”

      Data Repo comments:<br /> - Manual data testing for reproducibility within https://github.com/sanderla... (one must perform the steps, the repo doesn’t provide or outline within the code itself)<br /> - Suggests using “mamba” but does not provide instructions on how to install mamba <br /> - Would suggest a small description for each folder in the directory (README) explaining its contents <br /> - There’s no usage example on how to download the data or use the program<br /> - Would be best to have a notebook (or bash script) that describes the entire workflow. <br /> - The notebooks are not sequentially executed and there are no execution instructions<br /> - What environment (OS/hardware/configuration/etc) is required to run the code?<br /> - Is notebook (.ipynb) output expected within committed code? (should these have been scrubbed with nbconvert/jupytext?)

      Data Availability<br /> - Based on this section, their website only contains the first three bullet points (e.g scRNA-seq data, scATAC-seq data, and details about the datasets). We could not easily find the last three bullet points (Quality control plots for each dataset, Filtering, e.g., by readout or type of perturbation, Commands for direct file download using the Unix command curl)

      This review was produced jointly at The University of Colorado by:

      Gregory P. Way, PhD<br /> Natalie Davidson, PhD<br /> Erik Serrano<br /> Parker Hicks<br /> Jenna Tomkinson<br /> Dave Bunten

    1. On 2022-09-07 14:26:19, user Feng Yang wrote:

      I am the corresponding author of the original study. [Journal name redacted to follow bioRxiv's policy] rejected this Preprint based on our Concerns on their concern. Unfortunately, I do not know how to publish the PDF file of our response (it does not fit BioRxiv since our PDF file does not contain additional experimental data). I am pasting it below. We welcome open discussion based on solid experimental data and are looking forward to more independent studies in this area.<br /> Re: On the therapeutic potential of MAPK4 in triple-negative breast cancer <br /> Feng Yang<br /> Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas<br /> * Corresponding Author: Feng Yang, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030. Phone: 713-798-8022; Fax: 713-790-1275; E-mail: fyang@bcm.edu<br /> Boudghene-Stambouli et al. recently published “On the therapeutic potential of MAPK4 in triple-negative breast cancer” in BioRxiv concerning our Nature Communications publication, “MAPK4 promotes triple negative breast cancer growth and reduces tumor sensitivity to PI3K blockade.”, published 11 January 2022 (1). We want to reply to their comments as follows.<br /> Boudghene-Stambouli et al. essentially detected a similar MAPK4 protein expression pattern (Our report (1) vs. Boudghene-Stambouli et al., Fig. 1c) in the human TNBC cells, when using the same commercially available antibody AP7298b. However, they claimed, “We failed to detect a specific ERK4 band in any of the cell lines, including Hs578T cells transfected with human ERK4 cDNA.” They then used their own “validated custom polyclonal ERK4 antibody that we use routinely in our laboratories” to produce a different MAPK4 expression pattern (Boudghene-Stambouli et al., Fig. 1c). They provided a siRNA knockdown for the “validation” of their antibody. In this case, Boudghene-Stambouli et al. 
largely ignored our previous publications using the commercially available AP7298b to successfully confirm the overexpression, knockdown (up to five independent shRNAs), and knockout of MAPK4 in many human cancer cell lines and in “normal” cells (1-4). AP7298b can also detect a purified GST-MAPK4 fusion protein in the GST pulldown assays and the purified Flag/His-tagged wild-type and mutated MAPK4 proteins in the in vitro kinase assays (2). It should be noted that instead of our extensive validation of AP7298b using many MAPK4-overexpressing, knockdown (up to five independent shRNAs), and knockout cells as well as purified MAPK4 proteins (overexpressed/purified from both prokaryotic and eukaryotic cells), Boudghene-Stambouli et al. only used a single siRNA to “validate” their un-named custom antibody. Besides, they did not confirm HA-MAPK4/Erk4 overexpression in their Hs578T cells (Boudghene-Stambouli et al., Fig. 1c). Please note, due to the sensitivity of different antibodies, even if an HA-positive western blot is provided, it may not confirm significantly increased ectopically overexpressed MAPK4 expression over the endogenous MAPK4. Finally, their custom antibody detected many non-specific bands compared to AP7298b (Boudghene-Stambouli et al., Suppl. Fig. 1c, which was included in their submission recently rejected by [Journal name redacted to follow bioRxiv's policy] after peer-review). Therefore, we have concerns over Boudghene-Stambouli et al.’s concern on MAPK4 protein expression levels in the MAPK4-high TNBC cell lines that we used in our study (1).<br /> It is well-known that mRNA and protein abundances may not correlate well in biological systems. Therefore, Boudghene-Stambouli et al.’s concern about the variation of MAPK4 mRNA expression across the cell lines will not carry that much weight. We also noticed that Boudghene-Stambouli et al. used our reported 5’ primer but a modified 3’ primer for their qPCR data in Fig. 1a. 
We wonder whether they have performed qPCR using our reported 5’ and 3’ primers to detect MAPK4 expression (3), and what were the results? Besides, although we have not systematically examined MAPK4 mRNA expression in human TNBC cell lines as we did for human prostate cancer cell lines (3), we did confirm MAPK4 expression by qPCR in MDA-MB-231, SUM159, as well as the non-small cell lung cancer H1299 cells. Besides, Zeng et al. independently showed MAPK4 mRNA and protein expression in HCC1937 and MDA-MB-231 cells (5), two of the TNBC cell lines concerned by Boudghene-Stambouli et al. Without knowing the quality of Boudghene-Stambouli et al.’s RNA-seq data, we could not comment on their Fig. 1b data.<br /> Another concern of Boudghene-Stambouli et al. is their failure to verify our reported MAPK4-AKT signaling axis, a conclusion drawn from their Fig. 2 data. Without providing their data, the corresponding author Dr. Meloche has communicated with me about this issue. At that time, I provided the following answer. “I am not sure if you did a transient transfection in the 293 cells. Unlike MK5, phosphorylation of AKT is subjected to many more direct and indirect regulations in the cells. It is hard to imagine that you can easily detect MAPK4 phosphorylation of cell endogenous AKT in the transiently transfected 293 cells. It can be a hit and miss, especially if you do not carefully monitor cell confluency. I think that we only reported data from the stable 293T cells overexpressing MAPK4 or MAPK4 phosphorylating a co-transfected AKT in 293T cells. In the latter case, we suspect that these ectopically overexpressed AKT are less susceptible to endogenous cellular posttranslational modifications and more susceptible to the regulation of overexpressed MAPK4. 
Again, unless you can’t repeat our data, such as MAPK4 phosphorylating a co-transfected AKT in 293T cells, I do not see a common ground for our debate here either.” Now I see the experimental data, and Boudghene-Stambouli et al. did perform a transient transfection and tried to detect phosphorylation change of endogenous AKT, which we have already expressed concern about in our previous personal communications. Interestingly, as a positive control for their Fig. 2 data, Boudghene-Stambouli et al. showed MAPK4 enhanced the phosphorylation of an ectopically overexpressed but not endogenous MK5, raising concern about this so-called positive control per se. We are also unsure how much MAPK4 was overexpressed compared to endogenous MAPK4 (Western blots on GFP could not provide that information) nor the nature of the seemingly increased AKT T308 phosphorylation in the MAPK4 transfected 293 cells (Boudghene-Stambouli et al., Fig. 2).<br /> I want to finish this discussion using what I wrote to Dr. Meloche in another email. “Without detailed information from your side, it is hard for me to guess what happened. I want to emphasize several technical details that may help. 1. Please collect cells at about 50%-70% confluency. If your lab collected cells at very high confluency, please try this. 2. We have been using Dox-inducible knockdown and overexpression approaches. We typically maintain the cell culture without Dox induction and do a couple of days (such as three days) induction just before the experiments. 3. If you use a non-induction system as we did in some of our studies, please ensure that you only use the engineered cell lines at early passages. You can do this by freezing down many vials from a very early passage and only using the thawed-out cells for minimal additional passage(s). The cancer cells in culture may adapt to the cellular “stress” from long-term MAPK4 overexpression or knockdown.”<br /> We welcome open discussions based on solid experimental data. 
We will do our best to help if any group meets technical difficulty in repeating our data under the reported experimental conditions. We have validated our MAPK4-AKT signaling in more than 20 human cancer cell lines (Ref. (1-3), and unpublished data), and additional independent reports also confirmed MAPK4 phosphorylates/activates AKT in human cancer cells (5, 6). We welcome and are looking forward to more independent studies in this area.<br /> References <br /> 1. Wang W, et al. MAPK4 promotes triple negative breast cancer growth and reduces tumor sensitivity to PI3K blockade. Nat Commun. 2022;13(1):245.<br /> 2. Wang W, et al. MAPK4 overexpression promotes tumor progression via noncanonical activation of AKT/mTOR signaling. The Journal of clinical investigation. 2019;129(3):1015-1029.<br /> 3. Shen T, et al. MAPK4 promotes prostate cancer by concerted activation of androgen receptor and AKT. The Journal of clinical investigation. 2021;131(4).<br /> 4. Cai Q, et al. MAPK6-AKT signaling promotes tumor growth and resistance to mTOR kinase blockade. Sci Adv. 2021;7(46):eabi6439.<br /> 5. Zeng X, et al. MAPK4 silencing together with a PARP1 inhibitor as a combination therapy in triple-negative breast cancer cells. Molecular medicine reports. 2021;24(2).<br /> 6. Tian S, et al. MAPK4 deletion enhances radiation effects and triggers synergistic lethality with simultaneous PARP1 inhibition in cervical cancer. J Exp Clin Cancer Res. 2020;39(1):143.

    1. On 2022-08-19 20:54:50, user Stephanie Wankowicz wrote:

      Summary: In this paper the authors set out to develop new methods for refinement of models into cryo-EM density maps. There are three primary interrelated contributions:

      -Assigning “responsibility” for different regions of the map to a model and then fitting GMM as a real space B-factor. This is a new way to model atomic B-factors, since it is done in real space, compared to reciprocal space in most other software.<br /> -Sampling an ensemble based on those B-factors. The major success of this paper was that the authors created a new ensemble method that samples within the B-factors to improve the fit of hundreds of cryo-EM maps, demonstrating that their method is robust and can be done in a high throughput manner.<br /> -Refinement procedures for composite maps based on smoothing of responsibility. The examples all seem to be from individual maps with different levels of resolution across the map, not from true composite maps (calculated from different masking procedures for example). This part was very confusing for us to follow and although there are methodological links to the B-factor assignment/ensemble modeling parts of the paper, it might be better explained in a separate manuscript.
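      As a reading aid for the first contribution, the sketch below shows how we understand “responsibility” in a generic Gaussian mixture: each atom is an isotropic 3D Gaussian, and a map position is shared among atoms in proportion to their weighted densities at that position. This is the textbook GMM E-step under our own assumptions, not the TEMPy-REFF code; the function name, the isotropic widths, and the toy coordinates are hypothetical.

```python
import math


def responsibilities(point, centers, sigmas, weights):
    """Fraction of one map position attributed to each atom (sums to 1).

    Each atom k is an isotropic 3D Gaussian with center centers[k],
    width sigmas[k] and mixture weight weights[k]. The shared factor
    (2*pi)**-1.5 cancels in the normalisation and is left out.
    """
    log_terms = []
    for center, sigma, weight in zip(centers, sigmas, weights):
        d2 = sum((p - c) ** 2 for p, c in zip(point, center))
        log_terms.append(
            math.log(weight) - 3.0 * math.log(sigma) - 0.5 * d2 / sigma**2
        )
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_terms)
    unnormalised = [math.exp(t - top) for t in log_terms]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Two hypothetical atoms 5 units apart with equal width and weight.
atom_centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
atom_sigmas = [1.0, 1.0]
atom_weights = [0.5, 0.5]

on_first_atom = responsibilities((0.0, 0.0, 0.0), atom_centers, atom_sigmas, atom_weights)
midpoint = responsibilities((2.5, 0.0, 0.0), atom_centers, atom_sigmas, atom_weights)
print(on_first_atom, midpoint)
```

      In this picture a position equidistant from two equally wide atoms is split evenly between them, which is the situation raised in major comment 7.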

      Major comments:<br /> 1. The introduction only briefly discusses B-factors and doesn’t lay out what is distinct about this method. For a contrast, sampling is discussed with references and contrast:<br /> “ The sampling itself is usually based on either molecular dynamics (MD)4,9, minimisation10, normal mode analysis and/or gradient following techniques11,12, or Fourier-space based methods2.”<br /> Similarly, B-factor refinement should be discussed. The way Phenix and Refmac handle it (real vs. reciprocal space), the limitations that the GMM addresses, etc.

      1. With regard to sampling, there are other methods that are now similar for generating ensembles (the EMMI work from Vendruscolo and Bonomi for example). It would be useful to contrast the limitations of those methods and how this method is distinct. For example, this method seems likely to be much more computationally simple to run. It would also be good to benchmark against examples of those ensemble methods in terms of RMSF/inferred B-factors.

      2. When you refer to the TEMPy-REFF models in each case study are they always ensemble models using segmentation?

      3. How are the weights for each focus map decided for when creating a composite map? Stated in ‘combining focused maps into a single overall composite map, with optimal weights of the focused maps.’ (page 3)<br /> We think that more information on how you are generating ensembles belongs in the results section which will help clarify the paper. Some additional specifics we think would make this section strong include: Are the ensembles being created for different segments of the model (based on map segmentation) or the entire model? When creating an ensemble, what is the input model? Has it already gone through iterations of the map to model fitting? How are ensemble models represented? Please provide examples and discuss how you would like these models interpreted.

      4. Please clarify how b-factors are represented in your ensemble models and input into maps. Furthermore, in the discussion you state ‘We address this challenge using B-factor estimation. We find, as previously shown by us and others, that an ensemble of equally-well fitted models represents this local variability better than a single model.’ (page 16). However, it is unclear how the b-factors integrate with the ensemble model to represent local resolution. Please clarify which part of your model correlates with local resolution.

      5. On average, how many models were included in an ensemble? Please provide a graph of CCC values versus number of models in an ensemble for more examples (ie more than SI Figure 7). How are you thinking about the trade-off between a more complex model versus a small gain in CCC? How deterministic is this procedure? Can you repeat and compare at least one dataset? If you generate multiple ensembles starting from the same structure - do you get the same number of models out and are they similar?

      6. If we understand the calculations correctly, the increase in CCC comes from those models being refined independently, not collectively (which makes the increase all the more impressive). Does this suggest the ensemble captures both precision and accuracy (as discussed here: https://pubmed.ncbi.nlm.nih...) and therefore the sampling allows escaping of local minima in a clever way? Are there other examples like the His alternative conformation that can help speak to this?

      7. When assigning responsibility for a part of the map that may be able to similarly explain two parts of the model, how does the method decide which part of the model should fit in that segment of the map?<br /> Please provide more insight on the interpretation of uncertainty of discrete positions of different sidechains as described in the sentence ‘ensemble adopting either (bottom inset), or uncertainty in the exact side chain confirmation (bottom inset) of two residues (Y76 and L78)’. How is uncertainty measured? Is the RMSF similar or comparable to what would be inferred by B-factors? Please compare the numbers you are reporting to other traditional refinement software such as REFMAC and Phenix. It’s unclear whether this is capturing anharmonic motions in a really different way or just sampling the B-factor harmonic component.
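      The RMSF versus B-factor comparison we ask for can be sketched as follows, assuming the usual isotropic relation B = (8π²/3)·RMSF², where RMSF² is the mean square displacement of an atom about its mean position across the ensemble. The function names and the two-model toy ensemble are hypothetical; a real comparison would use the deposited ensemble and the refined B-factors.

```python
import math


def rmsf_per_atom(ensemble):
    """RMSF of each atom across an ensemble.

    ensemble is a list of models; each model is a list of (x, y, z)
    tuples with atoms in the same order in every model.
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    result = []
    for atom in range(n_atoms):
        coords = [model[atom] for model in ensemble]
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        msd = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        result.append(math.sqrt(msd))
    return result


def rmsf_to_b_factor(rmsf):
    """Isotropic B-factor implied by an RMSF (same length units, squared)."""
    return (8.0 * math.pi**2 / 3.0) * rmsf**2


# Toy ensemble: the first atom moves between models, the second does not.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
toy_rmsf = rmsf_per_atom(toy_ensemble)
toy_b = [rmsf_to_b_factor(r) for r in toy_rmsf]
print(toy_rmsf, toy_b)
```

      If the B-factors implied by the ensemble track the refined B-factors closely, the ensemble is mostly resampling the harmonic component; systematic departures would point to anharmonic or multi-state behaviour.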

      Minor comments:<br /> 1. In Figure 1a, please provide more description about what you are representing with the blue and orange circles in the responsibility estimation.<br /> 2. How does your method represent very high resolution structures with low b-factors but high numbers of alternative conformers (specifically looking at PDBs: 7A4M, 7A5V of Apoferritin and GABA receptor)?<br /> 3. In Figure 5a, please clarify how you are normalizing the B-factor.<br /> 4. Please deposit output models in Zenodo or some other public repository.<br /> 5. What does SMOCf stand for? Please introduce this briefly in the results.

      Review by Stephanie Wankowicz & James Fraser

    1. On 2022-08-09 15:01:20, user Uri Ben David wrote:

      Response to “Revisiting the effects of Cas9 on p53-inactivating mutations reveals sex-biased genome editing by CRISPR-Cas9”.

      Authors: Oana M. Enache, Veronica Rendo, Rameen Beroukhim, Todd R. Golub and Uri Ben-David

      A couple of years ago we reported Cas9-induced p53 signaling in cancer cell lines (ref 1). Here, Guo and Xiong address the possibility that this finding is affected by cell line sex biases (ref 2). In their preprint, they are trying to make 3 points related to our paper. We will address each of these points separately.

      1) TP53 mutations also shrink and not only expand upon Cas9 introduction.

      To study the trend of p53-inactivating mutations to expand or shrink following Cas9 introduction, we performed an analysis of pre-existing subclonal mutations (Fig. 3d in ref1). As mentioned in our paper several times, we deliberately restricted this analysis to pre-existing mutations with 0.02 < AF < 0.48 or 0.52 < AF < 0.98 in the parental cell line. The reason for the focus on subclonal mutations in this analysis is that the tendency of mutations to expand or shrink can only be tested in subclonal events, as clonal events can only shrink and not expand, whereas non-detected events can only emerge but not shrink. Inclusion of such clonal mutations would therefore bias the analysis. We found a highly significant trend for subclonal inactivating TP53 mutations to expand following Cas9 introduction (Fig. 3d in ref1), and TP53 ranked 1st among all genes in this respect (Fig. 3e in ref1). In contrast, Guo and Xiong used different selection criteria for inclusion and exclusion of mutations. Two of the shrinking mutations identified in their Fig. 1a (in OVK18 and C2BBE1) are clonal mutations (with AF of ~0.5 or ~1 in the parental population). We argue that it is improper to include clonal mutations in this analysis, and it is clearly wrong to report them as “TP53 inactivating subclonal mutations” (legend to Fig. 1a in ref2). The third mutation that they identified as shrinking (in A2780) was also not analyzed by us, since it is a known SNP that is pretty prevalent in the population (>1% in gnomAD (ref3); see Supplementary Data 3 and our exclusion criteria described in the Methods section of ref1). We therefore think that it is a mistake to consider this mutation as an ‘inactivating TP53 mutation’ as well.<br /> Importantly, if one were to include the clonal inactivating mutations that Guo and Xiong have added to our analysis in their Fig. 1a (ref2), then there is no justification for the exclusion of mutations that were not detected at all (AF~0) in the parental cell line but were present in the Cas9-expressing cell line, such as the mutation observed in the cell line SNU1 (Fig. 3c in ref1). However, this event was excluded in Fig. 1a of ref2. Similarly, if one were to include known SNPs in the analysis, then there is no reason to exclude the one in the cell line JHH7, which emerged from AF=0 to AF=1 (and was excluded both from our original analysis and from Fig. 1a of ref2). In other words, the inclusion criteria for Fig. 1a of ref2 are inconsistent. <br /> Lastly, if we add the clonal mutations to the analysis (but exclude the known SNPs), there is still a significant trend for the expansion of TP53-inactivating mutations (p=0.03 in a one-tailed McNemar test for directionality). Guo and Xiong’s statement that they found “significantly shrinking inactivating subclonal mutations of TP53 in Cas9-cells, which means Cas9 also selects against TP53 inactivating mutations” (Abstract of ref2) is therefore misleading. (We note that Guo and Xiong report that “four inactivating mutations from four cell lines were shrinking (P=0.039)”, but their manuscript does not provide any information about the statistical test that was applied to calculate significance.)
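      For readers who want to see what a one-tailed test for directionality involves, here is a minimal sketch. It uses the exact form of the McNemar test, which reduces to a one-sided binomial (sign) test on the counts of mutations that moved in each direction. The function name is hypothetical and the counts are made up for illustration; they are not the counts from either paper, and the exact variant of the test used in our analysis may differ.

```python
from math import comb


def one_tailed_exact_mcnemar(n_expanding, n_shrinking):
    """P(at least n_expanding of the changes are expansions) under H0.

    H0: a changing mutation is equally likely to expand or to shrink,
    so the number of expansions is Binomial(n, 0.5) with
    n = n_expanding + n_shrinking.
    """
    n = n_expanding + n_shrinking
    upper_tail = sum(comb(n, k) for k in range(n_expanding, n + 1))
    return upper_tail / 2**n


# Hypothetical counts, chosen only to illustrate the calculation.
p_value = one_tailed_exact_mcnemar(n_expanding=12, n_shrinking=3)
print(p_value)
```

      Because the test depends only on these two counts, the choice of which mutations are eligible to be counted (clonal, subclonal, known SNPs) directly determines the result, which is why consistent inclusion criteria matter.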

      2) There is a potential sex-bias in our results.

      We did not test whether any of our results were affected by a potential sex bias. Given that p53 has an effect on X chromosome inactivation, we cannot rule out the possibility that sex may affect p53 signaling following Cas9 introduction. However, sex representation in our cell line cohort was very balanced, and Cas9-induced p53 activation and selection were found in both male and female lines. Of the 43 TP53-WT lines used for the gene expression analyses, 21 were female, 21 were male, and one was of unknown sex; of the 122 TP53-mutant lines, 62 were female, 59 were male, and one was of unknown sex. Moreover, we used TP53-WT cell lines from both sexes (3 male lines, 2 female lines, 1 of unknown sex) to validate p53 activation following Cas9 introduction, and detected p53 pathway activation in both the male and the female lines (Fig. 2 and Extended Data Fig. 2 in ref1). Of the 10 cell lines in which a TP53 mutation was found to emerge or expand (Fig. 2c,d in ref1), 6 were female and 4 were male. Therefore, there is no evidence for any sex bias in these results.<br /> While Guo and Xiong raise an interesting hypothesis, they do not provide any real evidence that any of our results were indeed affected by sex bias. Instead, they make a few anecdotal statements on the matter:

      a) “The largest fold-change of p53 activation was observed in a female cell line (BT159)”.<br /> This is meaningless, as we tested the mRNA expression in 165 cell lines and protein expression in 9 cell lines. Guo and Xiong do not report any systematic comparison of the expression changes between male and female cell lines (although all of the data necessary for such analysis are available in our original paper).

      b) “There were more DNA damage foci in MCF7, which is a female cell line”. This assay was performed in only 3(!) cell lines, precluding any meaningful interpretation of sex bias. We also note that Cas9-induced p53 activation was actually mild in MCF7, compared to other male and female cell lines (Fig. 2e in ref1), further weakening this particular anecdotal claim.

      c) “The largest TP53-inactivating subclonal mutations expanding or shrinking (293T, HCC1419, and OVK18) is seen in female lines”. This claim does not hold true if OVK18 is removed from the analysis. Moreover, according to Fig. 1a of ref2, 2 out of 4 shrinking mutations and 4 out of 10 expanding mutations are actually seen in male lines, so the trend of mutations to expand or to shrink seems to be pretty sex-balanced.

      d) In the final paragraph of their manuscript, Guo and Xiong state that “We think the possible sex-biased effects of Cas9 may provide a possible reason for their failure to detect p53 activation in Cas9-expressing HCT116 (male) cells." This is factually wrong. We found significant activation of p53 in HCT116 cells transduced with Cas9, as is clearly shown in Extended Data Fig. 2d and 2e of ref1.

      We note that the majority of the manuscript by Guo and Xiong (Fig. 1b-d, Supplementary Fig. S1-S4, Supplementary Table S1) is an analysis of sex bias in CRISPR screens, which does not directly pertain to our paper. Sex biases in CRISPR screens may have nothing to do with the Cas9-induced p53 signaling that we observed. Moreover, we compared CRISPR to shRNA screens and found significant differences associated with p53 mutation status (Fig. 5 in ref1). Guo and Xiong do not discuss this at all, nor do they provide any evidence that this analysis was affected by cell line sex bias.

      3) TP53 mutation status of some cell lines is inaccurate in our paper.

      The Supplementary Note of ref2 reads: "We found that 11 cell lines (RERFLCAI, SISO, SNU761, COV644, COLO684, HS294T, G292CLONEA141B1, D283MED, G401, SJSA1, and SNU1041) used as TP53-WT (Fig. 5a and Supplementary Data 5 in ref.1) by Enache et al. actually have non-silent TP53 mutations (Supplementary Table S2), although this should not affect their conclusions."

      There are 698 cell lines in Supplementary Data 5 and Fig. 5a, and we clearly did not validate the TP53 mutation status of each individually, but rather followed established annotations. There are several ways to classify TP53 mutation status in cell lines, and mutation calling algorithms constantly evolve. As described in our Methods section (ref1), we followed the annotations by Giacomelli et al. (ref4), which are based on the CCLE cell line annotations (ref5), according to which all of the 11 cell lines listed above are TP53-WT. These annotations have since been updated, however, and in the version downloaded by Guo and Xiong (22Q2, https://depmap.org/portal/), these cell lines are now classified as TP53-mutant. Importantly, exclusion of these cell lines has no effect on the outcome of the single analysis in which they were used (Fig. 5a in ref1; p=8.8×10⁻⁶ instead of the original p=2.7×10⁻⁵; one-tailed t-test). Therefore, the slight discrepancy between the annotations used by us and those used by Guo and Xiong is irrelevant to the points that they raise.

      In summary, we thank Guo and Xiong for raising the intriguing possibility that sex may affect the cellular response to Cas9, in particular in the context of p53 pathway activation. However, this question remains open for now, as more research and data analysis are needed to determine whether this speculation is correct.

      References

      1. Enache, O. & Rendo V. et al. Cas9 activates the p53 pathway and selects for p53-inactivating mutations. Nat Genet 52, 662-668 (2020).

      2. Guo M. & Xiong Y. Revisiting the effects of Cas9 on p53-inactivating mutations reveals sex biased genome editing by CRISPR-Cas9. This preprint.

      3. Karczewski K.J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443 (2020).

      4. Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat Genet 50, 1381–1387 (2018).

      5. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
    1. On 2022-06-14 18:49:19, user CJ San Felipe wrote:

      In this paper, the authors analyze an intrinsically disordered region (IDR) of the yeast general recognition factor Abf1 with the aim of identifying functional determinants of Abf1’s IDR. The advantage of the authors’ plasmid shuffle experiments is that they allow the study of many mutations and variations of Abf1. The authors reveal that Abf1 possesses an essential motif (EM) as well as several contextual residues that work together to mediate Abf1’s function. Upon further investigation of compositionally and functionally similar IDRs, the authors hypothesize that sequence specificity and chemical context in IDRs functionally overlap with each other rather than act independently, and propose a 2D model to describe the contributions of each in IDRs.

      The major success of this paper is in developing a model that reconciles two contributors to IDR function: sequence specificity and chemical context. The major weakness of the paper is that the model is not comprehensively backed with control experiments. The 2D landscape model presented argues that modulation of essential motifs and contextual amino acids can produce several binding modes; however, no data is presented to show that these chimeras are viable because they interact with the same factors or function in the same way that IDR2 does. Therefore, we can’t be certain if these are off-target effects or the same interactions that occur with IDR2 as put forward in the model. In addition, we found that some aspects of the organization of the paper may require more clarity. Overall, the paper reveals some of the functional determinants for Abf1’s IDR and proposes an intriguing model for the functional determinants of other IDRs, but it could be difficult for these findings to be generalized.

      Major points

      p.4: It is unclear to us why the minimal viable construct IDR2 449-662 is the background reference construct. Is it possible that IDR1 (absent in this construct) could provide unknown benefits in particular situations? For example, given the unknowns of Abf1’s interactome, is it possible that IDR1 helps to activate transcription of other genes that could rescue IDR2 mutants? Perhaps the presence of IDR1 could confer viability for IDR2 mutants that were deemed not viable in later experiments. Plasmid shuffle assays with IDR2 mutants that also have IDR1 present could be control experiments that answer this question.

      p.4: The constructs generated in this paper are tested for viability via plasmid shuffle assay, but there is no control experiment to ensure that these constructs are still interacting with the same partners or functioning in the same way that wildtype IDR2 does. One possible control experiment to test this could be to choose an Abf1-interacting partner based on proteomic literature on Abf1, and perform a co-immunoprecipitation/Western blot to see if the partner is still present across different IDR2 mutants. This control experiment should be done with full length Abf1, the background reference construct (with no IDR1), as well as a construct without the EM and a shuffled construct to represent the two extremes of the 2D landscape.

      p.5: The decision to choose the G4 motif does not have a strong justification or explanation. In figure 3F it is shown from the alignment between Abf1 and Gal4 that the region considered to have sub-homology does not overlap with the essential motif of Abf1 nor does it show similarity in its sequence. Therefore, in our view, it does not appear that Gal4 has an EM that is homologous to the EM of Abf1.

      Figure S1 PDF: By eye, it appears that there is large variation between the strains considered inviable – for example, FUS_1_163_WT clone 3 on page 6 and Shuffle 3 clones 2 and 3 on page 3 are both marked as inviable yet differ in growth. It could be helpful to readers if the authors explained why a binary classification of viable vs inviable was used in this study, as opposed to a sliding scale quantification.

      Minor points

      For a future direction, after identifying the essential motif in IDR2 (EM), we think it would be compelling to go back to the orthologs initially tested to see how conserved the essential motif is evolutionarily and to see how divergent the orthologs that were inviable were. We also feel that this could be incorporated into the paper’s discussion.

      Figure 3: Panels G-K were difficult for us to understand due to the sheer number of constructs presented. To us, the contrast between sequence-specific motif and chemical context would be clearer if panels E and K were combined, perhaps with labels “sequence specificity” and “chemical context” below the respective constructs, to underscore the two ends of the spectrum that these panels represent and to emphasize the unexpected viability of the constructs in K.

      p.2-3: The hypothesis that poorly conserved IDRs may still retain functional conservation is compelling, but the proteome-wide analysis of disorder leading up to this hypothesis could be clarified in the methods section. In particular, it would be helpful to include an explanation of why and how disorder score from metapredict and predicted pLDDT were used in conjunction with each other, as opposed to using the predicted consensus disorder score from metapredict alone.

      We review non-anonymously: Daphne Chen, CJ San Felipe, James Fraser (UCSF).

    1. On 2022-05-16 21:15:23, user Jingyi Jessica Li wrote:

      We thank Dr. Hejblum et al for sending us a draft of this article on May 3 before posting it. Below I'm pasting our reply sent to Dr. Hejblum et al on the same day. We believe that our discussion will be beneficial for the community.

      Dear Dr. Hejblum and all,

      Thank you for sending us your correspondence draft. We appreciate your professionalism.

      The main message of our article is that using popular methods without a sanity check may lead to inflated FDR, and permutation offers an easy sanity check.

      We agree that normalization is a tricky issue, and when samples do not need normalization (as is the case for permuted samples, which all come from the same "condition"), normalization may introduce unwanted bias, violate the null hypothesis, and thus deteriorate the FDR control. Meanwhile, we stand with our fundamental assumption that permuted samples should contain no true DE genes. Since many DE methods include normalization as an internal step and only accept count data as input, the only way to fairly compare them is to apply each method as a whole pipeline, not just its DE statistical test step, to the permuted samples. (That is, the "normalization first" approach in your manuscript is inapplicable to the DE methods that only accept count data, unless we dissect these methods and modify their code, which is beyond the scope of our benchmark study.) As a result, any bias introduced by normalizing the permuted samples (which do not need normalization) would be reflected in the actual FDR inflation. The Wilcoxon test is an exception because it is not a DE analysis pipeline, so we applied it to permuted samples without doing normalization in Figure 2A. This explains why our Figure 2A differs from your Figure 1A.
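The permutation sanity check described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the pipeline used in either manuscript: it uses a plain Wilcoxon rank-sum test with a normal approximation and a Benjamini-Hochberg step, and it does not call DESeq2, edgeR, or dearseq. All function names and the synthetic count matrix are hypothetical.

```python
import random
from statistics import NormalDist


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, midranks for ties)."""
    values = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = len(values)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1][0] == values[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = mid
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, values) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bh_discoveries(pvalues, alpha=0.05):
    """Number of rejections under the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * i / m:
            k = i
    return k


def permuted_discoveries(counts, labels, rng, alpha=0.05):
    """Shuffle condition labels, rerun the test per gene, and count 'DE' calls.

    After shuffling, no gene should be truly DE, so a large count is a warning sign.
    """
    shuffled = labels[:]
    rng.shuffle(shuffled)
    pvals = []
    for gene in counts:
        a = [v for v, lab in zip(gene, shuffled) if lab == 0]
        b = [v for v, lab in zip(gene, shuffled) if lab == 1]
        pvals.append(rank_sum_pvalue(a, b))
    return bh_discoveries(pvals, alpha)


# Synthetic example (assumed data): 50 genes, 6 samples per condition.
rng = random.Random(0)
labels = [0] * 6 + [1] * 6
counts = [[rng.randint(0, 100) for _ in labels] for _ in range(50)]
n_false_calls = permuted_discoveries(counts, labels, rng)
print("discoveries on permuted labels:", n_false_calls)
```

In practice one would repeat the shuffle many times and apply the whole DE method of interest in place of the rank-sum step; the sketch only shows the shape of the check.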

      We would like to clarify that our study is not a comprehensive benchmark because (1) there are numerous DE methods and (2) we did not want to dilute the cautionary message against using the popular DESeq2 and edgeR without a sanity check. Hence, we did not do a dissection of each method to find out how to fix the inflated FDR issue. Our dearseq results are based on dearseq (asymptotic), not dearseq (permuted), because we deemed dearseq (asymptotic) more appropriate when the sample size is large.

      We appreciate your clarification about the effect of normalization on the dearseq performance, and your results motivated us to think about the problem more clearly. However, we respectfully disagree with your conclusion that dearseq outperforms Wilcoxon in your results. Our reasoning is that only dearseq (asymptotic), not dearseq (permuted) has a slight power advantage over Wilcoxon, but dearseq (asymptotic) does not guarantee to control the FDR when the sample size is under 40; on the other hand, Wilcoxon only sacrifices power but not FDR control when the sample size is small. Nevertheless, we agree that dearseq is advantageous in that it can account for more complex experimental designs.

      We would be happy to publicly respond to your correspondence when needed. We believe that our discussion will be beneficial for the community.

      Best,
      Jessica


      Jingyi Jessica Li, Ph.D.

      Associate Professor
      Department of Statistics
      University of California, Los Angeles

      http://jsb.ucla.edu

    1. On 2022-03-21 04:02:34, user Andrew Bell wrote:

      First, well done on achieving such high resolution, and largely noninvasively. Now we can start to see evidence of what is really going on in the cochlea. Your work raises a whole lot of issues, but I’ll just mention a few key findings. I hope you find these comments helpful.

      1. In the abstract a fairly provocative statement is that the data is not explained by current theories. In my view, I don’t think this is quite right, as the motions you reveal appear to be the result of simple resonance between the rows, an idea first raised in my PhD thesis and then in several associated publications. Perhaps the most germane are Bell & Fletcher (2004), The cochlear amplifier as a standing wave, JASA 116, 1016; https://doi.org/10.1121/1.1766053 and Bell (2012), A resonance approach to cochlear mechanics, PLOS One, https://doi.org/10.1371/journal.pone.0047918. Both papers set out a scheme whereby the three rows of OHCs work together to establish a resonant element which gives rise to a standing wave between the rows. Tuning thus depends largely on the row spacing, not the stiffness of the BM. The OHCs are stimulated virtually instantaneously by the fast pressure wave (OHCs are pressure sensors, for which I’ve made a case elsewhere), not the conventional travelling wave. In this way, I think most of your findings can be accommodated, as set out below.

      2. In your Introduction you say that a special sort of phasing is required in order to amplify the travelling wave. This is not necessary if you look at it in terms of resonance. As Bell (2012) broadly explains, the travelling wave is simply the observed result of what happens in response to a graded bank of highly tuned resonators that are almost simultaneously excited by a fast pressure wave. The delay observed is then simply Q/pi cycles, where Q is the tuning sharpness. In other words, I suggest you may be looking at things back-to-front causally: it is the resonance that gives rise to an apparent TW, not that the TW is a causal entity that, through very careful phasing, is able to amplify BM motion and give rise to a large peak! That is, the OHCs don’t amplify motion at all; instead they are pressure transducers which, via electromotility, vibrate in response to the sound pressure surrounding them (the OHCs contain pressure-sensitive ion channels). I’ve published a number of papers on this, and I’m happy to discuss the idea with you in more detail if you wish. In brief, I am suggesting that, if we look at cochlear mechanics differently, the TW is an epiphenomenon of a tuned bank of active elements. The elements are local oscillators – there doesn’t need to be global coupling in order to propagate a TW.

      3. I am suggesting that each triplet of OHC1, OHC2, and OHC3 act together like a guitar string arranged radially. However, unlike a string, there is a fluid connection between the rows (a squirting wave) so that the wave travels at a particularly low velocity. Applied to your observations, at a BF of 46 kHz the wave traverses OHC1 to OHC3 (a distance of about 30 um) in 1/46000 of a second – that is, a speed of about 1 m/s. As an explanation, so-called squirting waves have such low phase velocities, and anatomically are well suited to act in the space between the TM and reticular lamina, as Bell & Fletcher (2004) describe. Electromotility of the OHCs causes squeezing in that space, generating squirting waves.

      4. At their tuned frequency (BF), the amplitude of vibration is largest, and that is consistent with a resonating element that is tuned to that frequency. Thanks to your high resolution, we can see the activity of each of the three OHCs. In Figure 5c there seems to be a larger amplitude of vibration for RL3 than RL1; another radial profile at a different level (Figure 7c) shows that the amplitudes are about equal. Given the intricate geometry, I think that the findings are generally consistent with a radial standing wave with the OHCs at the antinodes.

      5. Now, about the phases. The 3 OHCs seem to have about the same phase, and this is consistent with a standing wave between them. A standing wave is a wave that oscillates in time but whose profile of peak amplitude does not move in space. My papers suggest that OHC2 acts in antiphase to OHC1 and OHC3, an arrangement which is closer to a xylophone bar than a guitar string. In other words, each OHC sits at an antinode, and the result is a full-wavelength standing wave. Your OCT device sees all the OHCs vibrating at the same amplitude, but doesn’t see the wave moving backwards and forwards between them. Other phase arrangements may be possible, but the full wavelength case is probably the simplest. For a guitar string, there is only 1 antinode and 2 nodes, so if this applied in the cochlea, all the work would be done by OHC2 (we wouldn’t need 3 rows of OHCs).

      6. Taking together all the above, I hope you may appreciate that if we had a ringing xylophone bar between OHC1 and OHC3 then an OCT device would see all the OHCs vibrating at the same amplitude and the same phase. It would require special techniques to detect the standing wave, and I wonder if your device has that capability. This would provide convincing evidence in favour of a resonance model.

      7. Note that in my papers I regard the phase lag at resonance to reflect the group delay of the resonators. For a linear resonator, the group delay amounts to Q/pi cycles. It is interesting to look at the group delays you recorded in Figure 8f-h and Figure 10e,f. At BF (resonance) they show a phase lag of 2–3 cycles. So if one considers these delays to derive from a linear resonator (not strictly true, but perhaps not too far off), then the associated Q values would be pi times 2 or 3, which is about 6–9. Such Q values are roughly the same as those measured otoacoustically for the gerbil.
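The arithmetic in points 3 and 7 can be checked with a few lines. This is a minimal sketch that takes the quoted figures at face value (a span of about 30 um crossed in one period at 46 kHz, and a delay of Q/pi cycles for a linear resonator); the function and constant names are illustrative only.

```python
import math

ROW_SPAN_M = 30e-6  # OHC1 to OHC3 distance, about 30 um as quoted in point 3
BF_HZ = 46_000      # best frequency quoted in point 3


def wave_speed(distance_m, frequency_hz):
    """Speed of a wave that covers distance_m in one period of frequency_hz."""
    return distance_m * frequency_hz


def q_from_delay(delay_cycles):
    """Invert delay = Q / pi cycles (linear-resonator assumption from point 7)."""
    return math.pi * delay_cycles


speed = wave_speed(ROW_SPAN_M, BF_HZ)
q_low, q_high = q_from_delay(2), q_from_delay(3)
print(f"speed ~ {speed:.2f} m/s")            # speed ~ 1.38 m/s
print(f"Q ~ {q_low:.1f} to {q_high:.1f}")    # Q ~ 6.3 to 9.4
```

Both results agree with the rounded values given above (about 1 m/s, and Q of roughly 6 to 9).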

      In summary, I suggest it is possible to interpret your findings using a different causal chain, the inverse of what you have done. That is, the causal chain may involve the direct electromotility of OHCs in response to sound pressure, and not that OHCs have to very carefully amplify atomic-scale BM motions to create a traveling wave – and this approach simplifies cochlear mechanics enormously. The alternative view is that the BM may just be a supporting membrane for an array of tuned elements, which are independently excited by the fast pressure wave. Indeed, it is interesting that the ITER team (Khanna and colleagues) adopted this view more than 30 years ago. They said that “The present observations suggest that the outer hair cells vibrate mechanically along their axes in response to acoustical stimulation.” (p.188) https://doi.org/10.3109/00016488909138336. It is perfectly possible to look at your data in a different, but internally consistent, way.

      I hope this helps us move towards the truth of the matter. Best wishes for your publication. Andrew Bell.

    1. On 2022-02-19 17:21:19, user Charles Warden wrote:

      Hi,

      Thank you very much for posting this preprint.

      I appreciate your interest in COHCAP, but I thought that I should mention a couple things:

      1) You cited the COHCAP corrigendum, not a primary reference for the method or applications.

      This would be OK if you were citing something that was specifically said in the corrigendum. Likewise, there are comments that complement the factual errors that were formally corrected.

      However, I think you may have meant to cite the following?

      https://pubmed.ncbi.nlm.nih...

      I apologize that I think this is confusing. PubMed correctly lists the 2019 citation as an "Erratum" if you view the original publication, although the separate listing for the corrigendum might look similar to a regular publication in PubMed among a set of search results.

      2) The default setting for the methylation threshold is 0.7 and the default setting for the unmethylated threshold is 0.3.

      For the patient data, we do offer using 0.3 as a troubleshooting suggestion. This may already be clear to some or most readers, although I wanted to mention again that some testing of various parameters may be needed. I also tend to use COHCAP along with at least 1 other method (such as methylKit) to try and assess the data, even if only 1 method is used in the paper (which may or may not be COHCAP).

      There is a newer location for support questions on GitHub (https://github.com/cwarden4...), but most previous questions are still on SourceForge (https://sourceforge.net/p/c...).

      So, I think that is OK, but I am not sure if something like the following helps give additional context for readers of this paper:

      https://sourceforge.net/p/c...

      I believe that you referenced the use of thresholds rather than the method, but I am not saying that a beta value of 0.31 is truly significantly different from a beta value of 0.29 by itself. The thresholds are 2 possible criteria out of several parameters considered in COHCAP, with the goal being to look for differential methylation.
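To make the point about thresholds concrete, here is a minimal sketch of how fixed cutoffs of 0.7 and 0.3 label individual beta values. It is an illustration only and is not COHCAP code (COHCAP is an R package and combines these cutoffs with other criteria); the function name, the label strings, and the use of strict inequalities are assumptions.

```python
METHYLATED_MIN = 0.7    # default methylated threshold mentioned above
UNMETHYLATED_MAX = 0.3  # default unmethylated threshold mentioned above


def classify_beta(beta, methylated_min=METHYLATED_MIN, unmethylated_max=UNMETHYLATED_MAX):
    """Label one beta value using the two cutoffs alone (strict inequalities assumed)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta values are expected to lie in [0, 1]")
    if beta > methylated_min:
        return "methylated"
    if beta < unmethylated_max:
        return "unmethylated"
    return "intermediate"


for beta in (0.29, 0.31, 0.85):
    print(beta, classify_beta(beta))
```

Two nearly identical values such as 0.29 and 0.31 receive different labels here, which is why a cutoff on its own says nothing about statistical significance.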

      I hope this helps.

      Thank you again!

      Sincerely,
      Charles

    1. On 2022-01-28 20:16:36, user corihuel wrote:

      Dear Authors,

      Your work was recently reviewed and discussed by the Bacterial Pathogenesis and Physiology Journal Club here at the University of Alabama at Birmingham (UAB). As part of our review of pre-prints, we compile comments from our discussion that we think may better your publication.

      Overall, our group found the manuscript to be a very interesting read with detailed information on the structure/function of SteD emerging. We can tell that considerable thought went into each experiment as well as figure production. Your lab has shown an exceptional amount of rigor in your experimental designs that made it difficult to refute your findings. This study was very well done, and we all enjoyed discussing it.

      Below we point out some comments and aspects that we feel could improve on the manuscript.

      1) We felt that the text was a little difficult to follow, though it is probable that this will be alleviated once the paper has been properly formatted, as the figures help a great deal in understanding the text.

      2) We very much appreciated the short anecdotes in the manuscript explaining the specific actions of the chemicals used for your experiments. None in our journal club work this closely with transport systems and it made understanding your work much easier.

      3) We were curious about your justification for using a melanoma cell line in your studies rather than an APC line like BMDMs. We’ve noticed that it has been used for other Salmonella studies, but we think it necessary that you justify in the text why you use this cell line.

      4) The order of your figures is a little confusing, specifically figures 1-3. We think it would really help if you were to either combine Figures 1 and 3 in some manner or reorder them so that Figure 3 comes just after Figure 1, rather than being interrupted by Figure 2. This would streamline the reading and comprehension of your data greatly.

      5) On the topic of Figure 3, we were curious as to why you found the specificities you did and yet continued to use the region 13 mutation rather than the S68A G69A mutations in your experiments for Figure 4, especially given the problems you had with region 13 mutation expression and release from Salmonella.

      6) Our group wanted to extend our compliments on your inclusion of the protein diagrams throughout your paper. The visualization made it easy to understand the mutations made and really helped with the overall comprehension of the paper and the experiments you were completing. On this note, however, we don’t think it necessary to highlight the F and Y residues in Figure 7. They are discussed in the text but are not tested in the figure. That depiction would be better included in a supplemental figure showing the experimental results from those mutations.

      7) Lastly, we believe Figure 5C should be moved to supplementary since it only confirms that your siRNA worked as intended.

      Sincerely,
      UAB Bacterial Pathogenesis and Physiology JC

    1. On 2022-01-21 22:03:41, user Debelouchina Lab wrote:

      Hello! This is the Debelouchina Lab at University of California, San Diego. We have begun doing preprint manuscript reviews during our “journal clubs” as a way to enhance our engagement with current literature and to hopefully assist with the manuscript if possible! Our lab also studies the behaviors of biomolecular liquid-solid transitions – with a focus on protein structure. We selected this manuscript out of curiosity for the spatial origins of solidification in liquid-liquid phase separated systems.

      Liquid-liquid phase separation (LLPS) is central to the spatiotemporal organization of biomolecules in the cell. Many of the proteins that are thought to mediate LLPS have also been found in pathological aggregates and fibrils that are associated with neurodegenerative disease. It has been demonstrated that liquid-like phase separated bodies can adopt gel-like or solid morphologies over time, which suggests that LLPS droplets may serve as nucleation points for pathological aggregates. This manuscript interrogates this process by characterizing the spatial characteristics of the liquid-to-solid transition within individual alpha-synuclein condensates using a set of fluorescence and infrared microscopy techniques. The authors found that droplets solidify from a central focal point that can be imaged through associated changes in fluorescence lifetime (via fluorescence lifetime imaging, FLIM) and protein secondary structure (via Fourier transform infra-red microscopy, FTIRM). To emphasize this significance in the text, we think it may be helpful if the authors added more background and discussion of previous literature on the spatial origin of solidification.

      These findings are exciting as they add new insight into biomolecular liquid-to-solid transitions, and relevant due to the potential role for liquid-to-solid transitions in neurodegenerative disease. We find that the combination of fluorescence microscopy techniques used here presents a strong model for studying spatiotemporal material properties of biomolecular condensates, which are challenging to characterize from a structural perspective due to their inherent heterogeneity and sensitivity to environmental factors. The power of these techniques is shown in their ability to complement the FLIM data with protein mobility (FRAP), structure (FTIRM), and interaction (FRET) components, providing a comprehensive look into the liquid-to-solid transition. We appreciated the use of small fluorophores rather than fluorescent proteins, as well as the confirmation by fluorophore-free techniques (TEM & cryo-SEM). Overall, we find that the data and the resulting model for the spatiotemporal dynamics of the liquid-solid transition are compelling.

      One area we are curious about is the sample handling: keeping a sample hydrated for 20 days is difficult. Would you be able to add a few words about the robustness of this moisture chamber in the main text? These aspects of the experimental design might not be obvious to a reader unfamiliar with the practical considerations of experiments like this, so more discussion would be helpful to anyone trying to reproduce the experiments. In a similar vein, a paragraph about the practical aspects of FLIM in the context of LLPS would be helpful. We also wondered about the necessity of the solidification timeline: how would the microscopy procedures described here work for a system that progresses to solid much faster than 20 days? What are the time limitations of these techniques? Would a faster system be expected to have the same center-growth effect as seen here?

      We were surprised that droplets appear to solidify from the exact center of the droplet in every case. If the model for solidification is that it begins from a (random) nucleation point, then why would droplet solidification always begin exactly in the center, as opposed to the inner or outer center regions that are mapped in Figure 1? We were left wanting more information about this, especially since FLIM is capable of resolving changes on these scales. It would be interesting to see if there are any cases where solidification does not begin from the exact center of the droplet.

      Some minor comments:

      - While the figures are clear and well-organized, a more colorblind-friendly palette could be used.
      - Infrared is occasionally hyphenated throughout the text.
      - The abstract figure may be clarified if the FLIM images were all of a single droplet, matching the cartoon.
      - The schematics describing the planes on the droplet are beautifully done and very helpful to understanding the figures.
      - Figure 1: formatting error with (e) placement.
      - Figure 2: (c) As we are unfamiliar with FTIRM, we thought it may be useful to have the corresponding secondary structure to each wavenumber (like the supplementary table 1 information) in the figure. Similarly, while supplementary figure 7 has a monomer and fibril control, we would have enjoyed that in the main figure.
      - Figure 4: (c) We wonder how consistent these recoveries are for several different droplets at the same time point.
      - For the TEM data (Fig 5), the results are a little bit different from other attempts to perform TEM on LLPS systems (for example, here: https://pubs.acs.org/doi/10...). A discussion of precedent would be appreciated in the main text.
      - Supplementary Fig. 11: We thought these EM images were fascinating and are curious if such images exist elsewhere for biomolecular condensates.


      We appreciated the chance to read and review this manuscript,
      The Debelouchina Lab

    1. On 2022-01-11 20:42:36, user Mina Bizic wrote:

      I would like to congratulate Rachel Szabo and colleagues on their great work and effort put into this manuscript. The goal of analyzing such a high number of particles has been something I have been calling for ever since my work cited in the comment by Dr. Jacob Cram (Bizic-Ionescu et al., 2018). It’s exciting to see the efforts you have made in this direction.

      It’s equally exciting to see that my conclusion from 2018 that the initial colonization of particles is stochastic, is strongly featured in your paper title and well supported by your results.

      As Dr. Cram has mentioned in his comment, we discussed your study and have come up with several aspects that we feel deserve some attention and most likely should be better addressed in the manuscript. Some of these aspects were raised by Dr. Cram in his comment. However, we felt that our opinions on this manuscript were dissimilar enough to warrant separate comments, with some observations that overlap and some that differ.

      My general query goes to the applicability of the results to the natural environment, given several biases introduced by the chosen experimental system. I will list here my opinion on the source of these biases.

      1) The concentration of seawater is likely to have generated an unrealistic microbial community. This is for three reasons (A) concentration of particle-attached microbes, (B) concentration of large bacteria, and (C) non-concentration of DOM: <br /> (A) Filtering the water through a 63 µm mesh should leave all particles smaller than this size in the water. The subsequent step of gentle centrifugation most likely further concentrated these microparticles, increasing their abundance above natural concentrations. <br /> (B) The gentle centrifugation likely selected for larger bacteria, as smaller cells may not be concentrated by a 5 min 4000 g run. <br /> (C) Finally, the seawater DOM on which bacteria can feed was not concentrated in this process. <br /> Therefore, the resulting inoculum used for the experiment contains a size-selected microbial community and a microparticle enrichment which, in the absence of ambient DOM, will rapidly drive the experiments towards consumption of the particulate organic matter at rates not representing the natural environment.

      2) The incubation time and small volumes: While samples were collected as early as 12 h, the experiment ran for 166 h in a closed microwell. It has been shown by many, including my colleagues and me, that after 24 h at the latest, the community in the experiment does not represent the environmental one (for example: Baltar et al., 2012; Ionescu et al., 2015; Herlemann et al., 2019). Therefore, seeing such long experiments conducted in fully closed systems, as in this paper, makes me wonder to what degree the rates of events observed in the lab are similar to rates in nature.

      3) One possible problem with the incubation system used is the effect of the microwell surface on microbial activity. Ploug and Jørgensen (1999), for example, came up with the net-jet system for measuring microprofiles on organic matter aggregates. However, aside from the effect of direct contact of particles with surfaces on particle properties and the microbial activity on them, a second issue is the formation of biofilms on the surfaces of the incubation system. Heterotrophic activity is known to increase in closed incubation systems (e.g. Fogg and Calvario-Martinez, 1989; Ionescu et al., 2015). Though it was shown that these biasing effects will occur regardless of bottle size (Hammes et al., 2010), these will likely have a stronger effect in very small incubation volumes (Herlemann et al., 2019), consuming oxygen and nutrients. I don’t recall reading whether the O2 concentration was monitored. My guess is that the system became anoxic relatively fast, unlike it would be in a natural environment. How does this affect the nature of associated (and active) bacteria?

      Having said that, I support the authors’ overall conclusion and applaud the effort that went into the data collection and analyses. I am aware from my own work of the difficulties of obtaining and maintaining such a large number of particles in open systems, such as the one my colleagues and I designed. However, I think that the biases introduced by an experimental system should be openly discussed in the manuscript, and, if possible, you should explain how your results remain valid despite them. This is even more important when you often discuss late-stage particles, which are the most likely to be affected by the aspects mentioned above.

      Sincerely,

      Mina Bizic

      References

      Baltar, F. et al. (2012) Prokaryotic community structure and respiration during long-term incubations. Microbiology open, 1, 214–224.

      Bizic-Ionescu, M. et al. (2018) Organic Particles: heterogeneous hubs for microbial interactions in aquatic ecosystems. Front. Microbiol., 9.

      Fogg, G. E. and Calvario-Martinez, O. (1989) Effects of bottle size in determinations of primary productivity by phytoplankton. Hydrobiologia, 173, 89–94.

      Hammes, F. et al. (2010) Critical evaluation of the volumetric “bottle effect” on microbial batch growth. Appl. Environ. Microbiol., 76, 1278–1281.

      Herlemann, D. P. R. et al. (2019) Individual physiological adaptations enable selected bacterial taxa to prevail during long-term incubations. Appl. Environ. Microbiol., 85.

      Ionescu, D. et al. (2015) A new tool for long-term studies of POM-bacteria interactions: Overcoming the century-old Bottle Effect. Sci. Rep., 5.

      Ploug, H. and Jørgensen, B. B. (1999) A net-jet flow system for mass transfer and microsensor studies of sinking aggregates. Mar. Ecol. Prog. Ser., 176, 279–290.

    2. On 2021-12-30 15:35:00, user Jacob Cram wrote:

      This is a public comment on Szabo et al. “Ecological stochasticity and phage induction diversify bacterioplankton communities at the microscale”, submitted to bioRxiv on Sep 21, 2021.

      Understanding the dynamics by which microorganisms attach to and grow on particles is an important and contemporary field in microbial ecology, and in the understanding of the factors that influence the role of particle flux in the global carbon cycle. Szabo et al. focus on the randomness of this process. By taking ~1000 identical chitin beads and incubating them in seawater from the same sample, and looking at the community structure 100 beads at a time, over the course of seven days, the authors aim to quantify how much variability there is in the microbial take-over of these particles.

      The authors applied shotgun metagenomics to each and every particle, focusing on assembling genomes into metagenome assembled genomes (MAGs).

      Several key findings stand out to me:

      1) There is substantial variability over time in the microbial community structure, and on the number of microorganisms present per particle. <br /> 1a) The authors suggest that random variation in which bacteria attach to the particles and when they attach drives much of this variability.

      2) There do not appear to be statistical associations between which microorganisms are on a given particle. That is, if a given species “A” is common on particle A and not particle B, that has no bearing whatsoever on the abundance of any other microbe on either particle.<br /> 2a) Such a finding suggests that there are essentially no meaningful interactions between the microbes on the particles. Cross feeding, predation, symbiosis, chemical warfare, all believed to be important for microbial communities (Fuhrman and Steele 2008; Steele et al. 2011) would each be expected to lead to some sort of statistical association between organisms, but in this scenario at least such patterns are essentially absent.

      3) The authors looked for contigs (partial phage genomes) and identified which appeared to “bin into the MAGs of their bacterial hosts”, suggesting that they were lysogenic with and therefore part of the genome of at least some members of that host. The more copies of this contig were present, the more active this phage was said to be. They found associations between the activity of these phages and the apparent growth of their hosts and negative associations between bacterial abundance and the presence of these phages.<br /> 3a) The authors suggest that stochastic absence of particular phages can lead to the situations where their hosts can rapidly take over a particle.

      I found this to be a very thought provoking manuscript and it raises a number of interesting and testable questions for future research. The sequencing and assembly of so many metagenomes, especially on very low biomass samples is an impressive technological feat (and clearly required diligent work on the part of the authors) which will be of value to the community at large. While some of my comments below are critical, I want to be clear that I was quite impressed with this paper and share these comments because I think the research is important and merits reflection.

      I have comments about each of the three main points listed above that I would like to share. I have not, as of yet, been asked to review this manuscript for any journal, but would be happy for any editor to use my comments. After preparing this review, I discussed it with Dr. Mina Bizic and she indicated that she shares my opinions. Dr. Bizic had several additional comments which she plans to make separately.

      Comment 1: On Stochasticity

      The authors make the case that there is randomness in the attachment and growth dynamics of microbial communities on particles. They suggest this because the variability between the communities on the particles is much higher than that of the surrounding water samples. However, I suspect that random variability in which rare taxa end up in each incubation could drive many of the patterns that they see.

      As context, in this experiment, chitinous beads (~80 micron diameter) are enclosed, one per well, in 96 well plates and incubated in 175 µl of seawater. The microbes and particles have been concentrated in this small volume up to ten times by centrifugation. That is, volumes of whole seawater were filtered and then centrifuged, and the bottom 1/10, presumably containing intact cells and small particles from the environment, was retained. This means that each bead is incubated with essentially 1.75 ml of seawater worth of microbes and microaggregates.

      I suspect that microbes that are adapted to degrading chitinous beads are scarce in the water, perhaps near or slightly below a concentration of 1 per 1.75 ml. In this case, there could be random variability in whether chitin degrading microbes end up in any given well. Furthermore, a big driver in the randomness between which bacteria are in a well could be the presence of chitinous particles (smaller than the 60 micron filtration cutoff) in the background water. Ambient chitinous particles likely contain communities that would be adapted to break down chitinous beads. If one well happens to have one of these particles, that particle is likely to come in contact with the bead near the beginning of the experiment, in which case the microbes on the microaggregate can take over the bead. If such particles are absent, then perhaps the takeover of the bead doesn’t happen, or happens more slowly. Thus the stochastic process that drives the variability that the authors see may be in the starting community of the water in which the particle is incubated. If these organisms are rare, they would be likely to be missed by the sequencing, which can only sample the most abundant organisms. As they sequenced the seawater samples to a depth of ~500,000 reads per sample, and maintained about 25% of the samples (Table S5), this means that they essentially considered ~125,000 sequences per sample. Assuming the water had on the order of 1 million bacteria per ml, we might expect that any organism present at lower than ~10 copies per ml would likely be missed by their process. As there is an amplification step in their sequencing (supplementary methods), their method may be even less sensitive to rare organisms.
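      The back-of-envelope numbers in this comment can be reproduced with a short sketch. It uses the values quoted above (175 µl wells concentrated about tenfold, ~500,000 reads with about 25% retained, roughly 1 million cells per ml). It additionally assumes that reads are drawn in proportion to cell abundance and that rare cells are distributed among wells as a Poisson process; both are simplifying assumptions for illustration, and the function names are hypothetical rather than taken from the manuscript.

```python
import math


def seawater_equivalent_ml(well_volume_ul, concentration_factor):
    """Volume of original seawater (ml) represented in one well."""
    return well_volume_ul * concentration_factor / 1000.0


def detection_limit_per_ml(total_cells_per_ml, usable_reads):
    """Abundance (cells per ml) at which a taxon is expected to yield one read.

    Assumes reads are sampled in proportion to cell abundance.
    """
    return total_cells_per_ml / usable_reads


def prob_well_has_taxon(taxon_cells_per_ml, equivalent_ml):
    """Poisson probability that a well receives at least one cell of a taxon.

    Assumes cells are distributed independently among wells.
    """
    expected_cells = taxon_cells_per_ml * equivalent_ml
    return 1.0 - math.exp(-expected_cells)


equivalent_ml = seawater_equivalent_ml(175.0, 10.0)   # 1.75 ml
usable_reads = 500_000 * 0.25                         # ~125,000 reads
limit = detection_limit_per_ml(1e6, usable_reads)     # 8 cells per ml

# A taxon present at 1 cell per 1.75 ml is far below the detection limit,
# yet it would still be expected in roughly 63% of wells.
p_present = prob_well_has_taxon(1.0 / 1.75, equivalent_ml)
print(equivalent_ml, limit, round(p_present, 3))
```

      Under these assumptions, a taxon can be well below the sequencing detection limit in the seawater and still be present in some wells and absent from others, which is the well-to-well variability discussed here.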

      Indeed, it is clear that the sequencing of the seawater didn’t catch every organism that could colonize the particles because per Table S7, some of the jackpot taxa (taxa that take over some particles) are either never seen or rarely seen in the seawater samples. Since they must have come from the seawater, it is clear that some species are missed by sequencing.

      Thus I contend that some of the particle to particle variability is likely from well to well variability in which microbes were stochastically placed in wells with each particle.

      On the other hand, it is possible that this stochasticity is environmentally relevant. For instance, an 80 micron bead that sinks through 100 m of the water column only clears a total volume of ~500 μl {π × (80 μm / 2)^2 × 100 m ≈ 503 μl} and so it is possible that microbes beyond this abundance in water would actually be unlikely to encounter a particle as it sinks out of the photic zone, for instance.
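      The swept-volume estimate can be checked with a few lines of code. This is a minimal sketch that treats the sinking bead as sweeping out a cylinder with its own cross-section; it ignores cell motility, diffusion, and flow around the particle, and the function names are illustrative only.

```python
import math


def swept_volume_ul(diameter_um, depth_m):
    """Volume (µl) of the cylinder swept by a sphere sinking through depth_m."""
    radius_m = diameter_um * 1e-6 / 2.0
    volume_m3 = math.pi * radius_m ** 2 * depth_m
    return volume_m3 * 1e9  # 1 m^3 = 1e9 µl


def expected_encounters(cells_per_ml, volume_ul):
    """Expected number of cells lying in the swept volume."""
    return cells_per_ml * volume_ul / 1000.0


volume = swept_volume_ul(80.0, 100.0)
print(round(volume, 1))                            # about 503 µl
print(round(expected_encounters(1.0, volume), 2))  # about 0.5 cells at 1 cell per ml
```

      Under this simple geometry, a taxon would need to be present at roughly 2 cells per ml for the bead to meet one cell of it, on average, over 100 m.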

      Comment 2: On interactions

      I’m surprised that there don’t seem to be interactions between organisms, but their graphical lasso based statistics seem reasonable to me.

      I’m furthermore surprised the authors did not seem to consider Bižić-Ionescu et al. (2018)’s paper, which has a very complementary design to this paper, but seemed to find the opposite pattern with respect to microbial interactions.

      Bižić-Ionescu et al. (2018) presented a very similar project, in which the authors also had replicate particles, though fewer than in the paper by Szabo et al. Key differences were that the authors used a flow-through rolling tank which exposed the particles to more water, and that those authors used (larger) aggregates of algae rather than chitinous beads as their particles. Bižić-Ionescu et al. did not quantify the variability in microbial abundance and so would not have seen the abundance dynamics that Szabo et al. saw, if they had occurred. Like Szabo et al. (this manuscript), they suggested that differences in the timing of microbial colonization of particles drive a lot of the particle-to-particle variability. Bižić-Ionescu et al. also saw statistical patterns that suggested interactions, as well as expression of genes for microbial interactions, including antagonistic processes. I hope the authors will consider the possible differences between the two systems and why those might lead to different dynamics, and what that says about the robustness and environmental realism of the patterns seen in both experiments.

      Comment 3: On viral contigs and non-assembled microbes

      The authors consider viruses that bin into MAGs which I presume means that they are often or always part of the microbial genome of a particular organism. I am not an expert on this process, but it seems to me a reasonable way of assigning viruses to hosts. I note that other validated tools for metagenomic host assignment are also available (Zielezinski et al. 2021). I presume there are many viral contigs that did not bin to a specific MAG. Why did the authors choose to ignore these?

      Similarly, the authors focus only on those species that assemble into MAGs. I presume there is a bunch of microbial diversity that doesn’t assemble (since my impression is that in most communities not all sequenced contigs end up as part of a MAG). Could the authors expand on why they chose to ignore this diversity, and on what impact looking only at assembled bacteria, and not the rest of the microbial diversity, might have on the analysis?

      I thank the authors for sharing this pre-print in a public forum and encourage them to consider these comments.

      Sincerely,<br /> Jacob Cram

      References

      Bižić-Ionescu M, Ionescu D, Grossart H-P. Organic Particles: Heterogeneous Hubs for Microbial Interactions in Aquatic Ecosystems. Front Microbiol [Internet]. 2018 [cited 2019 Dec 18];9. Available from: https://www.frontiersin.org...

      Fuhrman J, Steele J. Community structure of marine bacterioplankton: patterns, networks, and relationships to function. Aquat Microb Ecol. 2008 Sep 18;53:69–81.

      Steele JA, Countway PD, Xia L, Vigil PD, Beman JM, Kim DY, et al. Marine bacterial, archaeal and protistan association networks reveal ecological linkages. ISME J. 2011;5(9):1414–25.

      Zielezinski A, Deorowicz S, Gudyś A. PHIST: fast and accurate prediction of prokaryotic hosts from metagenomic viral sequences. Bioinformatics. 2021 Dec 14;btab837.

    1. On 2022-01-07 09:57:20, user David Bhella wrote:

      To help readers understand the process of peer-review, I am adding the peer-reviewer comments and article submission history for all of my preprints. For this article, although I was senior author - I was not corresponding author as the work was largely led by Dr Swetha Vijayakrishnan.

      The article was rejected without review at two journals prior to being sent for review at the journal of record. It underwent two rounds of review before acceptance.

      Reviewer Comments (Round 1):

      Reviewer 1

      In their manuscript Vijayakrishnan et al. use Tokuyasu sections for electron microscopic imaging in the frozen hydrated state (‘cryo’). Tokuyasu sections are commonly used for immuno EM imaging in cell biology and then combined with dehydration. Direct imaging in the frozen-hydrated state results in higher molecular preservation compared to dehydration and resin embedding. The method is broadly applicable and relatively straightforward compared to cryo-FIB milling but does not allow comparable resolution levels.

      It was interesting to see that this manuscript again highlights the possible usefulness of cryo imaging of Tokuyasu sections. However, on the experimental side the reviewer does not see the novelty. In particular, Bos et al (ref. 13) seems to cover all novelty claims of the manuscript (application to cell culture, correlation with light microscopy). The remaining possibly novel aspect is the analysis of viruses by subtomogram averaging, which may shed some light on the quality of sample preparation. Nevertheless, the description of methods and analysis is somewhat superficial at this point. The conclusions on the association of pUL36 remain somewhat vague and do not appear statistically significant. Given the low resolution (~6 nm) indeed not too much can be concluded. Overall, the manuscript appears to touch on many things, but there is little novelty and there are few conclusive results.

      Major points:

      • Page 5: “To our knowledge however, use of this method has thus far been confined to 3D imaging of tissue specimens 10-13.” This claim appears to be incorrect as Bos et al (ref. 13) applied the approach to cell culture – just as in this manuscript. Thus, it should be specifically stated which new contribution this manuscript makes to the field.

      • Page 5: “Here we present a modified strategy that combines correlative light microscopy and cryo-ET to locate regions of interest (ROI) in re-vitrified cell sections”. Again: what specifically is the novelty compared to ref. 13?

      • Page 14: “We successfully implemented this method …”. How do the authors validate their success? There are no quantifications provided. Is the method available?

      • Page 14: “To our knowledge this is the first attempt to implement this method on sub tomograms.” Previous implementations have already been reported in Schmid et al, PLOS Pathogens, possibly also later.

      • Probably the major problem in cryo-sectioning is the resulting compression. Thus, the reviewer would have expected an analysis of the effect of compression on subtomogram averages. Such analysis should be relatively straightforward given the available high-resolution structures of capsids.

      • The resolution of subtomogram averages appears overly low. Have the authors focused alignment and/or resolution measurement on specific parts of the capsids to compensate for compression and/or variable density in the core?

      • In the discussion the authors only compare cryo imaging of Tokuyasu sections to cryo-FIB milling / cryo-ET. A comparison to high-pressure freezing with freeze substitution and resin embedding should also be included.

      Reviewer 2

      In this manuscript, Vijayakrishnan et al. present an approach that allows the visualization of cells that have previously been fixed (cross-linking) prior to imaging using electron cryo-microscopy. In this case the sample is subsequently vitrified in a state where the macromolecules have been chemically altered but in a way that allows direct imaging as opposed to imaging a counterstain, such as osmium or uranyl compounds. The fixation of material is normally avoided due to the significant chemical alteration of macromolecules within the sample, and makes the analysis of additional densities associated with any such macromolecule a potential minefield to study. The reviewer appreciates the need to make the analysis of cellular material using cryoEM easier, but is unconvinced that performing structural biology in a background of chemical fixation is an appropriate route to go and will inevitably lead to structural information that is wrong.

      The attempt to visualize viral capsids is an interesting application and is one that is sensible. The capsids in the nucleus look to have retained much of their native architecture. The argument put forward that the C-capsids in the nucleus have extra densities present seen in mature capsids is strong, but is beset by a lack of control experiments, a lack of analysis in terms of other material found in their preparations, and a lack of appropriate interpretation of secondary analyses.

      1. The use of fixation using this approach results in the material not being in a “native” state as is the case with regular cryoEM methods. This is significant as this alteration to the macromolecular structure means that any subsequent structural analysis will be potentially affected by artefacts of this approach. This reviewer believes therefore that one must be very careful when analyzing the results of any potential structural analysis in this manuscript.

      2. The authors have not presented the proper controls for some of the interpretation of their results.

      A control is needed here to structurally analyse the herpesvirus capsid with the CVSC (positive control) after fixation – this should be relatively easy if you fix the mature virions and do sub-volume averaging on these virions to assess whether deformities in the CVSC structure are introduced. It is a gross misrepresentation to compare this structure in a fixed/unnatural/dead state to one from the EMDB determined in a frozen-hydrated state as has been done in Figure 5.

      These controls also apply to the subsequent analysis to determine the architecture of the capsid pore vertex.

      The capsids found in the cytosol exhibit significant breakage and are distorted when compared to those found inside the nucleus (see figure 3), and this is not commented on in the manuscript. This is a significant concern as it would suggest that there is some damage to the structural integrity of potential targets. It is also a shame, as the cytosolic capsids would appear to me to be a great target to compare structurally with the nuclear capsids.

      This would also be a concern were anyone wanting to use this approach to target processes occurring in the cytosol, as it seems there is a greater effect on macromolecules in this subcellular compartment.

      1. The cells are initially grown to confluency as a monolayer and then infected with the virus for 12hr. At this time point the cells are fixed, which completely kills the cells. The cells are then scraped and pelleted. One assumes that after fixation of cells there is significant disruption to the structural integrity of the cells – a picture or demonstration of the state of the cells after this treatment would help to understand what exactly goes into the subsequent steps. Figure S1 shows widespread DAPI staining, illustrating the point that there is significant mixing of compartments, making it difficult to know exactly what is being imaged. My concern is that an additional step is needed to ascertain what is being imaged, as the DAPI is almost everywhere.

      2. The attempt to classify the capsid 5-fold vertices makes the analysis of the CVSC confusing and brings up further questions about what is really going on as the analysis done here restores the CVSC to the B-capsids.

      The techniques outlined aim to address a curious debate in the herpesvirus field – namely whether the capsid vertex specific component (CVSC) is present on C-capsids in the nucleus, and it is important to frame the conclusions of the paper in this context. The protein component has multiple names that reflect the belief among different members in the herpesvirus community as to its true role or when/where/how it functions in relation to the capsid. The CVSC is made up primarily of pUL17 and pUL25 with a significant contribution of two helices from the C-terminal tail of pUL36. pUL36 is a very large protein, and its presence in the nucleus is unlikely in its full-length state. Debate continues in the field as to whether splice isoforms of pUL36 contribute to binding at the CVSC in the nucleus.

      In the present study, extra density is visible on C-capsids that is not visible on other capsid types (A and B), though in the case of B-capsids this density is visible after classification. These discrepancies need to be cleared up as the resolution limitation on the capsids makes it impossible to say what components are visible on the CVSC at this point – UL17 and UL25? UL17, UL25 and extra density (UL36?)?

      1. Washing the sections a few times in PBS after infiltration would seem to this reviewer not wholly effective at removing the sucrose. Figure 3 – halo around the multimembrane.

      2. The sentence “The pentaskelion density in B-capsids is more prominent than C-capsids; likely owing to far greater numbers of B-capsids (526) used during processing than of C-capsids (125). These data support our suggestion that low occupancy of CATC on B-capsids led to weaker density in icosahedrally averaged density maps. They are clearly visible upon asymmetric reconstruction (Figure 6), but not during symmetric reconstruction (Figure 4).” The significance of this analysis is not clearly explained.

      3. Introduction is far too long – I suggest the authors rewrite in order to make it more concise and streamlined i.e. significance of SPA, the play-off between cryoET and classical methods and the need to find more approachable methods. This introduction could be written with the same effect in half the space.

      4. It would really help the reader to have a correlative figure in the supplement (for example in S1) that goes from light microscopy

      Figure 1.<br /> The DAPI stain is present in the field of view of both cytosolic and nuclear regions – why is this?

      It is very hard to discern in this figure how the determination of what is nucleus and what is cytoplasm is made.

      Figure 2.<br /> Why have the authors excluded fluorescence data from this figure? One would assume this would be the most effective use of their correlative approach as it is possible to actually discern cellular features directly through EM here.

      The segmentation in Figure 2b is something of an eyesore. I would redraw or redesign a means of highlighting the membranes.

      It is not easy to see the different types of capsid with this annotation- an A-capsid is not highlighted (left of field of view) for example, and the box is not immediately obvious. Why box and arrow? Why not all box or all arrow?

      Panel d is completely unannotated. Why is there a halo around the multilamellar vesicle (that is not a CTF effect)?

      Figure 3.<br /> The authors should comment on why the capsids in panels A and B look undamaged, while those in C and D exhibit significant damage/deformation.

      Figure 4.<br /> Why does the density interface between the capsid coat and the inner regions blur as you move from A to B and C? A myriad of cryoEM structures of viral capsids have been determined and do not exhibit such an artefact.

      Figure 5.<br /> The comparison in this figure is not appropriate at all for a number of reasons: the structures are determined via different means (fixed vs non-fixed), the structures are at completely different resolutions which I consider to be a cynical attempt to improve how the authors’ own data appear - the figure on the right should at least be presented at the same resolution as the authors. The colour scheme is inadequate to show the CVSC, which should be the only thing visible here to help the reader to see what the authors are referring to in its entirety.

      Figure 6. <br /> The data in this figure are in conflict with those shown in Figure 5, and lead to some confusion. Symmetrically-determined 5-fold vertices are classified in an asymmetric manner. Therefore, the number of icosahedrally-related positions for the C-capsids remains the same. The data suggest that if you relax the symmetry then the CVSC density on the C-capsids smears due to low numbers – but this seems completely illogical to me. Why would an ordered density smear? Remember that this structure can be refined to ~3.5Å in cryoEM. If the occupancy in B-capsids is too low to get an effective CVSC in an icosahedral reconstruction why would it be better in an asymmetric classification unless the structure of the CVSC is different to that of A-capsids? What happens if you reduce the number of particles from each virus type to be the same number? Does the B-capsid density also smear?

      Once again, using the EMDB structure as shown in C is inappropriate.

      Figure S1.

      It is almost impossible to know how the authors came to the determination that these are different regions of the cell from this figure. This figure makes it clear that it is hard to determine what parts of the cell belong to where.

      Reviewer Comments (Round 2):

      Reviewer 2

      It is a shame that the current pandemic has resulted in the shutdown of the Authors’ Institute, and the reviewer would like to express their sympathy for this situation. Hopefully, things will change in the coming months.

      The lack of experiments validating either the method or the major results (putative CVSC density on the capsid surface in the nucleus) is still a major concern, and without such experiments it is not possible for the reviewer to recommend publication.

      1. Publishing a single structural result at low resolution and without further validation either from comparison to other subcellular structures (e.g. ribosome or cytosolic capsids) or using biochemical means (e.g. immunolabelling with nanogold of CVSC components) adds to the confusion in the literature as to whether the CVSC is present, in part, whole or not at all, in the nucleus or not and such a publication would not be beneficial. Should analyses on other components also lead to structures that exhibit no difference to results previously published one can be more confident in novel results – though still not 100%.

      2. It is still unclear to this reviewer how exactly capsids are determined to be nuclear from the analysis of Figures 1, 2, and S1. While it is possible to see regions of membranes, the fact that the cells are disrupted using their methodology combined with the presence of DAPI in multiple regions adds to the confusion as to whether the nuclear capsids are indeed nuclear capsids. In Figure 1, it is possible to make out blue dots and red dots separate from one another and together. Capsids also contain DNA and would likely be stained by DAPI. This is not followed up in Figure 2, which is annotated manually. Is it assumed all capsids away from membranous regions are nuclear?

      3. In Figure S1, the caption says nuclei are stained with DAPI. Everything seems to be stained with DAPI.

      4. Figure 4 separates A-, B- and C-capsids. C-capsids would be more prevalent in the cytosol as this is a sign of maturity. Through following Figure 1 and Figure 2 it is not clear how the isolation of populations from subcellular components is achieved. The authors should think about how to make this process clearer.

      5. In terms of the method itself, the Authors propose this as a relatively easy method for routine examination of macromolecules in situ. This should mean that subcellular structures should not be difficult to determine and well-known samples should be examined.

      This reviewer would like to re-assert the point that chemical fixation in a background macromolecular milieu is prone to artifacts. As such, fixation in cellulo vs in vitro is different. This is reflected in a statement that remains in the text of the manuscript:

      “The use of chemical fixative may cause some structural artefacts, possibly contributing to the low resolution of capsid structures in our study (5-6 nm), in comparison to resolutions obtained from subtomogram averaging of proteins from unfixed cryo-ET of for example purified virions (0.8-2 nm).”

      In the Authors’ rebuttal, they make the point that gradient fixation methods have been previously employed to determine structures of macromolecular complexes. However, the objective of these methods is to stabilise complexes of recombinantly expressed and isolated macromolecules that are prone to falling apart under buffer conditions. Furthermore, the complexes are known as they are biochemically characterised. The original grafix paper (Kastner et al., 2008) argues that the potential for the technique to improve structure determination is due to homogeneity, and this is borne out by the citations of that article.

      There is also a contradiction in logic: if chemical fixation is one stated factor potentially limiting the resolution of the capsids in this manuscript, why then can grafix methods be used elsewhere to determine high-resolution structures? Is it due to the presence of cross-linked entities or due to a lack of particles? Such questions are why I feel more work is required to validate the major finding.

      Finally, a 40-60Å structure is not equivalent to a 3-4Å structure and should not be presented as such.

      Reviewer 3

      In this manuscript, Vijayakrishnan et al describe the in-situ structures of HSV-1 capsids within the nuclei of host cells determined by subtomogram averaging, coupled with correlative light and electron microscopy (CLEM) and cryo-electron tomography (cryo-ET) of re-vitrified cell sections. Although at low resolutions, the reconstructions of the three types of capsids show the major components of penton, hexon and triplex. In addition, extra densities on the C-capsids within the nucleus, contributed by the capsid-vertex specific component (CVSC), are readily observed. The structural work is interesting in that the authors demonstrate an economical, easier and high-throughput approach to determine the in-situ structures of viruses using re-vitrified sections. However, a number of overstatements and concerns need to be corrected or addressed before publication.

      1. In the abstract on page 2: “Our reconstructions reveal that the capsid associated tegument complex is present on capsids prior to nuclear egress.” This is an overstatement. Previous single particle cryo-EM works have demonstrated that the CVSC binds to capsid prior to nuclear egress (Conway et al., JMB 2010; Homa et al., JMB 2013; Dai et al., Science 2018 and Ref 29).

      2. In the introduction on page 6: “Our data reveal the presence of the CVSC pentaskelion on HSV nucleocapsids in the nucleus, suggesting that capsids may bind the tegument protein pUL36 (VP1/2) prior to nuclear egress.” This is again an overstatement. Previous single particle cryo-EM works have already revealed the presence of the CVSC on HSV C capsids purified from the nucleus.

      3. On page 10: “There has been uncertainty in the HSV field of how pUL36 and pUL37 are recruited to the capsid, if this happens within the nucleus or after nuclear egress. To shed light on this question, we carried out cryo-ET on the mutant lacking pUL37 (FRΔUL37).” Given the low resolution of the HSV capsid reconstruction determined by the authors, this work does not help to resolve the uncertainty over how pUL36 is recruited to the capsid.

      4. Page 14: “Moreover, our analysis revealed pronounced star-like CVSC density at the penton vertex in the C-capsids, comparable to previously reported high-resolution structures of capsids within purified HSV-1 virions.” The CVSC density of the nucleocapsid from the virion is clearly better than its counterpart from the C-capsid. While the nucleocapsid shows strong CVSC densities extending from the penton to the triplex Ta, the C-capsid shows much weaker and smaller tegument densities that bind only to the penton of the C-capsid (Fig. 5).

      5. Page 15: “Our method opens the possibility of determining and characterising specific complexes and their interactions at high-resolution within the functional context of the cell or tissue, providing snapshots of important and dynamic events in biology.” Given the poor resolution of the HSV capsid determined in this work, this statement is hard to justify.

      6. Page 20: “The subvolumes were subjected to 3D classification with a T value of 5, to reconstruct a single 5-fold vertex, without refining orientations and origins. A total of 10 classes were calculated with one of them identified to have apparent pentaskelion density over the 5-fold axis, corresponding to CVSC, in both B-capsids and C-capsids.” It is well established that all the vertices of the HSV C-capsid and virion nucleocapsid are fully occupied by CVSC, so why does only one of the ten classes have apparent pentaskelion density over the 5-fold axis in the C-capsid?

      7. Legend for Figure 5 on page 22: “high-resolution structure of purified capsids from within the nucleus at an equivalent resolution.” This sentence should be corrected. First, the structure is from the virion nucleocapsid, not from a nuclear capsid; second, the structure has already been filtered to low resolution and cannot be described as high-resolution.

    1. On 2022-01-06 08:28:03, user David Bhella wrote:

      To help readers understand the process of peer-review, I am adding the peer-reviewer comments and article submission history for all of my preprints. This article presents the work of a number of students and post-docs who passed through my lab over many years; we attacked the problem from a number of different directions before we achieved an interpretable structure through the application of cryo-electron tomography and sub-tomogram averaging.

      The paper was rejected without review by two journals. We made it out to review at the next journal we submitted to, but unfortunately the article was rejected following one negative review. I found the quality of that review rather disappointing, but the journal refused our appeal (see below).

      Fortunately, we had a far better experience at the journal of record, where the paper was handled by a very supportive editor and peer-reviewers were positive about our work. The review process there is transparent; the critique is available on the publisher site.

      Here is the peer-review report that led to the paper being rejected.
      Thanks to reviewer 2 for their constructive report. Reviewer 1 - not so much.

      Reviewer: 1

      This is a paper that might have been submitted 10 (or even 20) years ago, but is so far from current standards in cryo-EM that I have no enthusiasm for seeing it published, even in a more specialized journal. The authors talk about how the problems frustrated attempts at a Fourier-Bessel 3D reconstruction, but it has been many years since people used such approaches. Modern software, such as Relion or cryoSPARC, all use iterative realspace methods for helical reconstruction. The analysis of the lattice is based upon one horribly noisy power spectrum from one tube. Many other large diameter tubes have been studied at high resolution, and almost all of these involve variability in diameters. The authors should look at Kalia et al., Nature, 2018 on Drp1 tubes, or Junglas et al., Cell, 2021 on PspA tubes to see how such problems are routinely treated. The paper is filled with statements such as how the features they see are "morphologically very similar to previously described decameric and undecameric rings produced by recombinant expression of RSV N" or how "making accurate measurements of the lattice was challenging" or "leading to these densities appearing to be more closely packed in the sub-tomogram average than they actually are". Given all of this, I found all of the modeling highly questionable.

      Reviewer: 2

      General comments

      RSV is an important human pathogen and the main cause of bronchiolitis in newborn children. There is no vaccine nor efficient antiviral compounds against this virus and the exact architecture of virions remains to be deciphered. In this work, the authors have used cryogenic electron microscopy (cryoEM) and cryogenic electron tomography (cryoET) to study the architecture of real RSV particles. They also used a particular technique to obtain these impressive data, the growing of RSV particles directly on transmission electron microscopy grids before flash freezing. This important detail was critical to obtain original filamentous viral particles instead of heterogenous and anarchic shaped virions as seen in previous publications. The use of a 300 keV electron microscope allowed images of unprecedented high quality, revealing a couple of quite unexpected results: (1) viral particles are much more organized than expected; (2) the matrix layer is formed by M-dimers geometrically organized as a curved lattice; (3) the presence of ring-shaped assemblies, likely formed of the nucleocapsid protein N and RNA and packaged within RSV particles in addition to the helical, long and filamentous viral genome encapsidated by the N protein; (4) there is a helical ordering of the glycoproteins on the virus surface (5) … that tend to cluster in pairs.

      The structural data presented in this manuscript are novel, convincing and make a significant contribution to the field. The data show for the first time that RSV particles exhibit helical symmetry at two levels, the matrix protein and the surface glycoproteins.

      Using the previously resolved atomic structure of M dimers, they modeled the lattice of M dimers that coordinates virion assembly and helical ordering of the glycoproteins at the surface of virions.

      The viral genomic RNA, 15 kb in length, is encapsidated by the viral nucleocapsid protein (N) to form a left-handed helical ribonucleoprotein complex. However, when N was previously expressed as a recombinant protein, N-RNA rings were obtained in bacteria or using the baculovirus system; but their presence and role in infected cells and their possible presence in viral particles were totally unknown. The presence of N-RNA rings in viral particles was an unexpected and intriguing result, raising new questions: in particular, do these N-RNA rings packaged in virions play a role in the viral cycle or are they packaged incidentally? Do they contain some specific RNAs? The images indicate that they are located around the central nucleocapsid containing the viral genome.

      Specific comments

      Although the paper is well written, there are a lot of references which are not the right ones, missing or misplaced:

      Introduction

      “The viral RNA is encapsidated by multiple copies of the viral encoded nucleocapsid protein (N) to form a left-handed helical ribonucleoprotein complex (or nucleocapsid - NC).” Reference 6 (Bakker et al., 2013) should be placed at the end of this sentence as well as ref 14 (Liljeroos et al., 2013).

      “This serves as the template for RNA synthesis by the RNA dependent RNA polymerase (RdRp)6,7”: the demonstration that the nucleocapsid serves as a template for the polymerase was not shown in references 6 & 7. This assumption was for a long time inferred from data obtained with paramyxoviruses and rhabdoviruses. In Garcia et al., 1993 (doi: 10.1006/viro.1993.1366), transient coexpression of RSV N and P proteins in eukaryotic cells resulted in the formation of cytoplasmic inclusions that resembled the inclusion bodies found in infected cells. In Garcia-Barreno et al., 1996 (doi: 10.1128/JVI.70.2.801-808.1996), the interaction domains between P and N were identified, then further in Slack and Easton, 1998 (doi: 10.1016/s0168-1702(98)00042-2), Khattar et al., 2001 (doi: 10.1099/0022-1317-82-4-775), Castagne et al., 2004 (doi:10.1099/vir.0.79830-0.), Tran et al. 2007 (ref 33), Asenjo et al., 2008 (doi: 10.1016/j.virusres.2007.11.013). Sourimant et al. in 2015 (doi: 10.1128/JVI.03619-14) showed that P binds L through its C-terminal region, which was confirmed by Gilman et al. (ref 10).

      “… thought to occur in virus induced cytoplasmic organelles called inclusion bodies8,9.”: should refer to Rincheval et al., Nat Commun. 2017 too (doi: 10.1038/s41467-017-00655-9), which was the first paper showing that viral RNA synthesis occurs in inclusion bodies for RSV.

      “The RdRp comprises two proteins: the catalytic large (L) protein and the phosphoprotein (P) that mediates the interaction with the NC 10.”: again, reference 10 only describes the structure of the PL complex.

      “the matrix protein (M), which coordinates virion assembly together with M2-1”. The role of M2-1 in the architecture of RSV is still debated. Although the location of M2-1 between M and the nucleocapsid was suggested by Kiss et al., 2014 and Liljeroos et al., 2013, Meshram and Oomens in 2019 (https://doi.org/10.1016/j.v... have shown that P, M and F are sufficient for the formation of viral pseudoparticles, which was confirmed by Bajorek et al., 2021 (doi: 10.1128/JVI.02217-20). Furthermore, incorporation of N in VLP did not need M2-1 (Forster et al., 2015 ref 15; Fig.6A). Although in Li et al. 2008 (doi:10.1128/JVI.00343-08) some experiments suggested that M2-1 is needed to recruit M to inclusion bodies, this was contradicted by Bajorek et al., 2021, who also showed that M directly interacts with P.

      “M2-1 forms a second layer at the virion interior, under the M-layer, and associates with NCs 13,14.”: again, I think the authors have transformed a hypothesis into an assertion considered as definitively accepted.

      “High resolution structures for some of the envelope associated proteins of both RSV and HMPV have been determined by X-ray crystallography, including the matrix proteins 15-17 the F glycoprotein 18,19 and M2-1 20,21.”: again, the first structure of RSV M2-1 was published by Tanner et al., 2014 (doi:10.1073/pnas.1317262111).

      Results
      Legend of Fig.2: the authors highlighted with colors the presence of glycoproteins, M protein and M2-1 protein on the tomogram (“The lipid bilayer is highlighted in pale blue, the matrix layer in orange and the M2-1 layer in dark blue.”). Although there can be no ambiguity for surface glycoproteins, the situation is more uncertain for M and especially for M2-1. A formal demonstration of the presence of these proteins would require additional experiments such as immunogold labeling (not compatible with cryo-EM) or correlative microscopy. Could it be, for example, the phosphoprotein? Although highly disordered, this protein could be compacted and folded in the viral particles. The authors should be more prudent and speak of probable or putative localization for these last two proteins, as they do in the text where they say “Underlying the lipid bilayer is a contiguous density that we attribute to the matrix protein (M).”

      “The virion interior is densely packed with viral nucleocapsids, mainly having the characteristic herringbone morphology 31 and suggesting that in common with several other mononegavirales, RSV virions are polyploid 32 (fig 3A, movie S1 timepoint 1m 08s).”: in the picture and in the movies we only see one continuous helical nucleocapsid. Were several nucleocapsids on the same axis along the filamentous viral particles? Were several parallel nucleocapsids observed in some portions? In Fig.3A the herringbone structure is placed at the centre of the viral filament; was this always the case? Was the length of nucleocapsids as expected or were there some truncated genomes?

      The presence of N-RNA rings in the viral particles was unexpected and very surprising; the authors say: “…strongly suggest that many of these objects may indeed be N-RNA rings, perhaps being products of aborted genome replication.” Can the authors exclude that these objects could contain cellular RNAs? Recombinant expression of RSV N protein has shown that there is no apparent sequence specificity for RNA encapsidation. Cellular short RNAs such as tRNA could also be encapsidated in rings.

    1. On 2021-12-10 16:52:55, user Alizée Malnoë wrote:

      The manuscript by Ruiz-Sola et al. investigates the relationship between photoprotection responses, carbon concentrating mechanisms (CCM) and CO2 availability in Chlamydomonas reinhardtii. While photoprotection responses, mediated by LHCSR3, LHCSR1 and PSBS, are traditionally described as triggered by excess light, this manuscript highlights the role of intracellular CO2 levels (deriving both from the environment and from mitochondrial metabolism) in regulating these responses. Indeed, it demonstrates that photoprotection, and especially LHCSR3-mediated responses, are on the one hand inhibited in conditions in which inorganic carbon is abundant (acetate and external CO2 supply) and on the other hand induced in conditions of reduced CO2 availability. Furthermore, CCM are also induced under high light (HL), in response to a drop in intracellular CO2 levels due to increased photosynthetic carbon fixation.

      While changes in the expression levels of both LHCSR3 and CCM genes at different CO2 concentrations and under HL, respectively, were previously reported, this manuscript is novel in connecting these observations in an elegant experimental set-up with several genetic backgrounds, confirming the hypothesis through the use of mutants affected in mitochondrial respiration and of metabolic modeling. The proposed model for light-independent regulation of photoprotection is convincing and solidly backed up by data. In addition, a role for CIA5 in positively regulating LHCSR3 (and to a lesser extent PSBS) mRNA expression and in negatively regulating LHCSR1 at the post-transcriptional level is shown.

      However, we have some comments and suggestions to improve the manuscript, listed below.

      Major comments
      Figure 3, and corresponding result paragraph pages 6 to 8:
      - A large part of the results (1.5 pages) focuses on modelling the interaction between acetate metabolism and intracellular CO2 levels. Although we are not experts in mathematical modeling and thus are unable to give proper feedback regarding this part of the paper, we think it adds little value to the main results of the paper. This is especially true as the modelling relies on a number of assumptions (listed at the bottom of page 7) which are supported by neither literature nor experimental data, weakening the solidity of its conclusions. As it is, only assumption iv (page 7, “the acetate uptake is low (...) for the mutants (as indicated in Fig 2C and F)”) is backed up by data.
      We suggest moving Figure 3 to the Supplementary material and shortening its description in the results and discussion. Please also provide better support to justify assumptions i to iii, as well as the assumption that photon uptake is not altered in the mutants (e.g. do they have similar chlorophyll content?), to make the conclusions more solid.
      - Page 6, “In line with the experimentally observed values, we found that the predicted generation times for the icl and dum11 strains (...) did not differ from those of LL grown WT cells”. Please provide the experimental values for the mutant strains, or rephrase the sentence.

      In Figure S1F to K:
      - During exposure to L2, the basal fluorescence Fo’ in the presence of acetate (and to a lesser extent CO2) rises together with the maximal fluorescence Fm’. Please provide an explanation or hypotheses for this, and state whether it might affect the ETR and NPQ calculations.
      Also consider replacing “qE” with “fast-induced fluorescence quenching” or simply “NPQ”, as other regulation mechanisms might affect these fluorescence measurements.
      - Please specify the time points you used for assessment of Fo, Fm, and calculation of qE.
      To make this figure more understandable, please provide clearer fluorescence traces in Figure S1 (C-K), showing only Fo, Fm and Fm' (ideally one plot for each genotype to be consistent with the Y(II) and NPQ plots, L-N and O-Q) and a separate panel with Fo and Fo'.
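
      For reference, a minimal sketch of how NPQ and Y(II) are commonly computed from the fluorescence levels discussed above, assuming the standard definitions NPQ = (Fm - Fm')/Fm' and Y(II) = (Fm' - F)/Fm'. The function names and the numerical values are hypothetical and are not taken from the manuscript; the authors' own calculation may differ.

```python
def npq(fm: float, fm_prime: float) -> float:
    """Stern-Volmer non-photochemical quenching: (Fm - Fm') / Fm'."""
    if fm_prime <= 0:
        raise ValueError("Fm' must be positive")
    return (fm - fm_prime) / fm_prime


def y_ii(fm_prime: float, f_steady: float) -> float:
    """PSII operating efficiency: (Fm' - F) / Fm'."""
    if fm_prime <= 0:
        raise ValueError("Fm' must be positive")
    return (fm_prime - f_steady) / fm_prime


# Hypothetical, normalized fluorescence values for illustration only.
fm, fm_prime, f_steady = 1.0, 0.5, 0.25
print("NPQ  =", npq(fm, fm_prime))
print("Y(II) =", y_ii(fm_prime, f_steady))
```

      Under these assumed definitions Fo' does not enter either quantity directly, so the result depends mainly on which time points are used for Fm and Fm'.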

      Figure 6B and corresponding text page 11:
      - Please provide an explanation for the cia5 mutant line accumulating high LHCSR1 protein and not fully reverting to wild type level in the complementation line under VLCO2 (and dark/air). This aspect needs to be taken into account and clarified, especially in light of the proposed role of CIA5 as an LHCSR1 regulator at the post-transcriptional level. Rephrase this sentence “However, LHCSR1 protein over-accumulated in the cia5 mutant under all conditions tested, although the WT phenotype was only partially restored in cia5-C (Fig. 6B)” as this is the case only for HL/air.

      Minor comments
      Title: Please add “algal” to the title, or a similar clarification.
      Introduction:
      - Page 3, when mentioning carbonic anhydrases (CAH) as part of the CCM, please list the ones involved in the CCM. Not all CAH are part of the CCM (it is also useful to see their names, since the expression levels of some of them are measured in the results part).
      - Page 4, in the sentence "Here, using genetic, transcriptomic and mathematical modelling approaches, we demonstrate that the inhibition of LHCSR3 accumulation and CCM activity by acetate is at the level of transcription and a consequence of metabolically produced CO2" please replace "transcriptomic" with "expression analysis on selected genes", since no transcriptomics work has been shown in this manuscript.
      - Page 4, please reformulate the sentence "This work emphasizes the critical importance of intracellular CO2 levels in regulating LHCSR3 expression and how light mediated responses may be indirect and reflect changes in internal CO2 levels resulting from light intensity dependent, photosynthetic fixation of intracellular CO2". Based on the previous reports and on this work, we can say that internal CO2 levels are important in regulating activation and inhibition of LHCSR3-photoprotection mechanisms, but this does not mean that the light effect is indirect; this has not been proved yet. Furthermore, photoprotection by NPQ could lead to a diminished CO2 fixation rate (especially sustained “photoinhibitory” quenching types), thereby increasing internal CO2 concentration, which would according to your model repress photoprotective genes. This could be the case for genes involved in qE but may not be a general rule for “photoprotection”. The title could also reflect that aspect by specifying NPQ or qE in lieu of photoprotection.

      qRT-PCR results:
      - qRT-PCR results are described here as "mRNA accumulation". Please replace this nomenclature with "relative expression levels" or "relative gene expression".
      - It is stated in the methods, page 17, that the results presented are normalized to a reference standard gene, GBLP. However, the results presented seem to be (also?) normalized to the WT LL air. Is this correct? If so, please state this clearly. Instead of normalizing the data to the WT LL air, we suggest normalizing the transcript abundance of the target genes in each sample to your internal reference standard gene (GBLP) only.
      - Please provide a description of how the relative gene expression levels were calculated. We suggest determining the ΔCt of each sample relative to the reference standard and reporting 2^(-ΔCt) as the final value.
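
      The calculation suggested above can be sketched as follows. This is a minimal illustration, assuming ΔCt = Ct(target gene) - Ct(reference gene, here GBLP); the function name, sample labels and Ct values are hypothetical and are not data from the manuscript.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Return 2^(-dCt), where dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for illustration only (not from the manuscript).
samples = {
    "sample_A": {"target": 24.0, "reference": 18.0},
    "sample_B": {"target": 21.0, "reference": 18.0},
}

for name, ct in samples.items():
    value = relative_expression(ct["target"], ct["reference"])
    print(f"{name}: 2^(-dCt) = {value}")
```

      Reporting 2^(-ΔCt) per sample keeps each value tied only to the internal reference gene, so no single condition (such as WT LL air) is forced to 1.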

      Paragraph "LHCSR3 transcript accumulation is impacted by acetate metabolism":
      - Page 4, the transition between TAP and HSM media is not clear here.
      - Page 4, rest of the text and figure legends, please indicate CO2 concentration in ppm (consistent with Figure 6D) instead of 5% CO2.
      - The icl-C line does not behave the same.

      Paragraph "CO2 generated from acetate metabolism inhibits accumulation of LHCSR3 transcript and protein":
      - Page 5, “RHP1 (...) encodes a CO2 channel shown to be CO2 responsive and to accumulate in cells growing in a high CO2 atmosphere”. It is unclear here whether RHP1 is sensitive to intracellular, extracellular, or both levels of CO2. Please better describe how the protein levels reflect the intracellular CO2 concentration.
      - Since Figure 1 includes results described both in this and in the previous paragraph, we suggest grouping the results described in Fig. 1 in a single paragraph and giving a shorter but clearer description of the results.
      - Fig 1: you could merge Fig 1A and C in a single plot with WT, icl, icl-C and dum11 in LL and HL to make the comparison between the mutants clearer. The same can be done for panels B and D.

      Paragraph “Impact of carbon availability in other qE effectors”
      - Page 8, "We took HL acclimated cells that typically accumulate both LHCSR3 and LHCSR1 proteins (Fig. S2A) and performed photosynthetic measurements in the absence or presence of 20 mM sodium bicarbonate; the bicarbonate addition was just before performing the photosynthetic measurements. As expected, bicarbonate enhanced rETR (Fig. S2B) and … almost completely suppressed qE despite the fact both LHCSR3 and LHCSR1 had accumulated in the cells (Fig. S2)". The accumulation of these proteins was not checked in the presence of bicarbonate in this particular experiment (the bicarbonate was added shortly before measuring photosynthetic parameters). Please rephrase the sentence.
      - Page 9 and Figure 4B and Figure 5C, "PSBS protein accumulation could not be evaluated because it was not detectable under the experimental conditions used." It is surprising that you could not detect PSBS in these conditions (600 uE), while it was possible in the conditions described in Fig 6B. At least the HL conditions (600 uE) were the same in these two experiments. Please provide an explanation for this or, if that is not possible, rephrase without mentioning PSBS expression and accumulation in the text and, for clarity, remove Fig 4A.
      Paragraph “CCM1/CIA5 links HL and low CO2 responses”
      - Page 9, "To elucidate the molecular connection between photoprotection and CCM, we analyzed mRNA accumulation from the CCM genes encoding LCIB and LCIE (involved in CO2 uptake), HLA3, LCI1, CCP1, CCP2, LCIA, BST1 (Ci transporters), CAH1, CAH3, CAH4 (carbonic anhydrases) and the nuclear regulator LCR1, all previously shown to be strongly expressed under low CO2 conditions (see (49) for a review on the roles of each of these proteins and (45) for the more recently discovered BST1)." Please provide the full names for the abbreviated proteins that were not mentioned earlier in the text.

      Paragraph “Intracellular CO2 levels regulate photoprotective and CCM gene expression in the absence of light”
      - Page 11 and Figure 6C: the figure is unclear, making the quantification hard to read and understand. Please consider replacing the “LHCSR3 (r.u.)” line above the panel by a histogram clearly displaying the LHCSR3/ATPB ratio; add error bars. If no repeats/errors are available, please refrain from using these quantification data and rephrase the paragraph on page 11 to replace quantitative statements ("...which was reflected by a 3-fold change in the accumulation of the protein…", "and 21 fold (protein) compared to air dark conditions (Fig. 6A-C)...", "...and protein level (by a factor of ~9)...") by qualitative ones.
      - Page 11, "This CIA5-independent regulation of mRNA in the presence of light could account for the contribution of light signaling in LHCSR3 gene expression, possibly via phototropin (10)". This should be addressed properly in the discussion section.
      - Page 11, “the cia5 mutant did not accumulate significant amounts of LHCSR3 protein under any of the conditions tested (Fig. 6B)”. The lack of LHCSR3 in HL in the cia5 mutant is quite striking considering that its transcript level is quite high and similar to wild type. Please provide a possible explanation for this observation.
      - Page 12, please replace "in accord" with "in line" or "it fits the hypothesis".
      - Page 12, Fig 6E, for clarity, please develop the statement "In contrast to LHCSR3, sparging with VLCO2 only partly relieved the suppression of transcript accumulation for the CCM genes in the presence of DCMU (Fig. 6E)". For instance, consider adding “..., bringing it back to LL levels instead of the accumulation observed in HL in the control (see dotted line in Fig. 6E)”.

      Discussion
      - Page 13, "Increased CO2 levels were found to dramatically repress LHCSR3 mRNA accumulation, in agreement with previously published works (34, 35), but had little impact on accumulation of LHCSR1 and PSBS transcripts". It is hard to say whether it has little or no impact on PSBS gene expression. We suggest not putting emphasis on the difference in PSBS expression levels.
      - Page 14, beginning of last paragraph, “Our data demonstrate that most of the light impact on LHCSR3 expression is indirect”. Please tone down these sentences and discuss them with regard to the recent study by Redekop et al. (ref. 46). We suggest replacing this sentence with "Our data demonstrate that, besides varying with changes in the light environment, LHCSR3 gene expression is also tightly linked to changes in intracellular CO2".
      - Page 14, "It is tempting to propose that CO2 could be considered as a retrograde signal for remote control of nuclear gene expression, integrating both mitochondrial and chloroplastic metabolic activities". This sentence is very speculative, although clearly marked as such. To further soften the point, please consider adding “Further studies will be needed to confirm or refute this possibility”.
      - Page 15, "The CIA5-independent light-dependent induction of photoprotective genes possibly involves phototropin, as previous shown (10), but may also involve retrograde signals such as reactive species (46, 77). Our findings also highlight the need to develop an integrated approach that examines the role of CO2 and light, with respect to CO2 fixation, photoreceptors, and redox conditions on the regulation of photoprotection and to consider photoprotection in a broader context that includes various processes involved in managing the use and consequences of absorbing excess excitation". If you want to discuss the relationship between photoprotection and photoperception, you should give more context; otherwise it is hard to follow for readers who are not familiar with this possible connection. The manuscript does not show any experiments related to photoperception, yet it is mentioned four times in the paper. In our opinion this does not fit in the discussion of this manuscript.
      - Data S2A, please replace “reaction names” by “enzyme names”.
      - Figures S1C to K, Figure S2C, Figure S4A to C, it is stated that the fluorescence is normalized to Fm, when it seems to be normalized to the maximum fluorescence reached during the experiment (highest Fm’ point). Please correct either the figures or the legend.
      - Figure S2B, it is stated that the statistical analyses are shown in the graph, though they appear to be missing.

      Maria Paola Puggioni and Aurélie Crepin (Umeå University) - not prompted by a journal; this review was written within a preprint journal club with input from group discussion including Alizée Malnoë, Jingfang Hao, André Graça, Pierrick Bru, Jack Forsman.

    1. On 2021-10-28 09:34:38, user Peter Ellis wrote:

      What an ABSOLUTELY fascinating system! This paper blew my mind clean out my ears. Excellent work :-)

      I have only one quibble, relating to lines 329-333, i.e. the potential for conditional Y-linked drive.

      You show that it is possible for a Y-borne gene to favour transmission of the paternal X (and oppose transmission of the paternal Y) in matings between XY males and X*Y females. I think it would be worth pointing out that the paternal Y cannot be selected to drive against itself. Rather, in this case the maternal Y is being selected to drive against the paternal Y.

      In the case of the two-step pathway (b2'+3), a Y-borne drive modifier can only invade the population if it acts in X*Y females, not if it acts in XY males, because it is the maternal copy of the Y that is favoured by the drive in these matings - the paternal copy is disfavoured.

      The same applies to the one-step pathway b2. Even if a single Y-linked gene is responsible for both directions of conditional drive, if its only mode of action is by perturbing sperm function, then it will be rapidly selected to become an unconditional driver. It must therefore act in X*Y females as well.

      This means that conditional drive almost certainly has two separate mechanisms of action: one acting paternally, and the other acting maternally. This makes the two-step pathway much more likely than the one-step pathway, and may give some clue towards tracking down the mechanism of action - the proposed mandarin vole system in ref 11 (maternal Y acts via imprinting to inactivate an essential gene on the X*, so only embryos that inherit a paternal X can survive) is a beautifully elegant solution, and blew my mind for a second time in one evening.

      I personally think the most likely course of events is:

      1) Acquisition of unconditional Y-drive, acting paternally.
      We know that there is a paternally-acting sex ratio drive system in Mus musculus, and some of the interacting partners (Sstx and Ssty) are also present in rat. So this is likely quite ancient. We also recently showed that the proximate mechanism for this is probably differential motility of X and Y-bearing sperm.
      https://pubmed.ncbi.nlm.nih...

      2) Appearance of a feminising X*, facilitated by the presence of Y drive

      3) Development or enhancement of compensation in X*Y females to improve fertility via polyovulation.
      In a transgenic system that eliminates male embryos in the peri-implantation period, we show that there is some inherent compensation of litter size in Mus musculus. So it seems some element of polyovulation may be common in rodents, allowing for a certain amount of pre-/peri-implantation attrition without reducing litter size. This seems like the sort of phenotype that could relatively easily be increased to allow greater levels of compensation.
      https://www.biorxiv.org/con...

      4) Development of conditional drive in which X*Y females drive against the paternal Y.
      Once compensation is well established in step 3, the X*Y mothers have more scope to eliminate even more embryos prior to implantation and thus select only the ones they want.

      Mechanistically, all this can be most readily tested by IVF and/or embryo transplantation experiments - are these techniques established for Mus minutoides yet?

      Once again, thanks for one of the most enjoyable papers I've read in a long time!

    1. On 2021-07-29 10:14:35, user Michael Coleman wrote:

      This is a really interesting article on a topic we tend to take for granted and then realise we (or at least I!) just hadn't thought about and certainly couldn't explain. Some mechanisms for microtubule polarity sorting in axons had been previously proposed but were recognised as being insufficient to fully explain the observations. Very nice original science with important implications for nervous system development, axon regeneration and neurodegenerative disease.

      Summary of findings

      Unlike dendrites, axons have microtubules that are almost all oriented with their growing (+) ends outwards. The mechanistic basis of this is not completely understood. Axonal microtubules are in a constant state of dynamic equilibrium, with their + ends growing but being subject to periodic ‘catastrophe’ that shortens them, either by dying back from their previously growing + ends or by severing them to create two 'daughter' microtubules, each with the potential for new growth. Unlike in other cell types, axonal microtubules are not attached to the centrosome but form a tiling array along the axon composed of individual microtubules from a few microns to over 100 microns in length (see work of Peter Baas and colleagues). Some kind of relationship between this dynamic equilibrium and selection of polarity appears likely but it has been unclear what that might be.

      To understand the mechanism, Jakobs et al used live imaging of microtubules in Drosophila axons in culture, labelled with EB1-GFP, which marks the growing tip. They find that during early axon growth in culture, microtubules with their + ends oriented distally have a growth advantage over those in the opposite orientation, so that over time + end-out becomes the dominant orientation.

      First, they show that each microtubule growth event is (on average) longer if the microtubule is further distal in a growing axon and if the microtubule is oriented + end-out. The difference between + end-out and + end-in microtubules is more marked distally.

      Then, they measure the shrinkage distances in these same orientations and locations using double labelling of EB1 and tubulin. They use a mathematical model to show that + end-out oriented microtubules near growing tips have essentially unbounded growth (since the average growth event is longer in distance than the average shrinkage event), while in other locations and orientations average microtubule length stabilises because of the larger contribution of shrinkage events.
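      The intuition behind that result (unbounded growth when the average growth event exceeds the average shrinkage event, a stable length otherwise) can be sketched with a toy simulation. This is only an illustrative sketch and not the model used by Jakobs et al.: it assumes exponentially distributed growth and shrinkage distances and one event of each kind per cycle, and the function name `simulate_length` and all parameter values are hypothetical.

```python
import random


def simulate_length(mean_growth, mean_shrink, cycles=2000, seed=0):
    """Toy model: alternate one growth and one shrinkage event per cycle.

    Distances are drawn from exponential distributions (an assumption made
    here for simplicity) and the length can never drop below zero.
    Returns the final length in arbitrary units.
    """
    rng = random.Random(seed)
    length = 0.0
    for _ in range(cycles):
        length += rng.expovariate(1.0 / mean_growth)  # growth event
        length -= rng.expovariate(1.0 / mean_shrink)  # shrinkage event
        length = max(length, 0.0)  # a microtubule cannot have negative length
    return length


# Growth events longer than shrinkage events on average: length keeps increasing.
print(simulate_length(mean_growth=1.5, mean_shrink=1.0))
# Shrinkage events longer on average: length hovers near a small steady value.
print(simulate_length(mean_growth=1.0, mean_shrink=1.5))
```

      In this sketch the sign of the average net displacement per cycle (mean growth minus mean shrinkage) decides between the two regimes; the exact distributions matter much less than that sign.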

      Using two methods to disrupt microtubule polymerisation (nocodazole and increased osmolarity) they then confirmed the importance of this +/- growth difference in establishing unipolarity. They also hypothesised that microtubule growth-promoting proteins locally synthesised at the axon tip, such as p150, would explain the longer growth cycles of + end-out oriented microtubules there, and supported this hypothesis with p150 knockdown and dominant negative mutants. Again, removing the growth length differential also removed the orientation difference.

      Finally, they address the orientation imbalance in more proximal axon regions that is less easy to explain based on a p150 gradient. They propose a model in which dynein-mediated sliding of – end out orientated microtubules towards the cell body, and templating of new microtubules, essentially matching existing orientation bias, could explain these differences. No additional data are presented for this part but it clearly forms a new hypothesis for further testing.

      Implications

      Axonal transport deficits are an important driver of axon loss and neurological deficits. For example, mutations in the anterograde motor protein KIF5A are associated with hereditary spastic paraplegia, Charcot-Marie-Tooth disease and ALS, all disorders of long axon degeneration in which distal regions are affected first. Toxic blockade of axonal transport, for example in vincristine neuropathy, is also an important cause of axon damage. This article sheds light on the basic mechanisms that establish, and presumably also maintain, effective directional axonal transport.

      Severe defects in this process of selection would be expected to result in failure of neuronal differentiation or axon growth. The likely phenotypic outcome of a severe defect would be embryonic lethality but partial defects could also occur and could therefore underlie disorders of axonal transport even if axons do initially form and carry out the process. Indeed, p150 mutations are associated with ALS. It would be really interesting to know how such mutations affect microtubule polarity and whether this underlies pathogenesis in these cases of ALS, or indeed in any other neurodegenerative disorders. It is challenging to address this in vivo, even in animal models, because of the requirement for live imaging of microtubule growth so I am not aware of any previous studies, but it is in principle an achievable aim now this mechanism has been identified.

      Limitations

      At present these findings are limited to Drosophila axons (seemingly dispersed starting from the entire CNS?) so it remains to be confirmed whether there are similar patterns in mammalian axons, and in different neuronal subtypes (e.g., CNS/PNS, motor/sensory, etc).

      Minor suggestions for improvement

      Just a presentational thing but in Fig 1E legend, would it be clearer to say ‘blue, right to left downwards’ than ‘blue, left to right upwards’ since these microtubules are in fact growing from right to left? Or probably the colour-coding explained in part D is already sufficient without this extra explanation?

      A bit more introduction to what is templating and sliding would be helpful.

      It would be just marginally easier to follow without the switch in axon orientation between Figs 1-3 and Fig 4. But this is a minor point that perhaps just keeps our reversal learning sharp anyway!

      Questions for the authors

      Superficially, it could be imagined that the more stable an axonal microtubule the better, since they are so crucial for axonal transport. Yet, this is clearly not the case, otherwise the state of dynamic equilibrium would not have evolved. Does this new model for selection of orientation shed any light on what that advantage of the dynamic equilibrium is?

      Studies of shrinkage events are so far limited to shrinkage from the distal end. Is there any contribution also from severing and how could that be measured?

      If +end-out microtubules at the distal end have unbounded growth what eventually stops them? Something must do this in the end because otherwise a mature axon would be clogged with lots of microtubules extending right up to the distal tip. Is this one of the functions of severing?

      In Fig 3b and c, there seems to be not only a decrease in + end-out growth distances but an increase in the growth of – end out microtubules. The same is true in Fig 3j and k when p150 is disrupted. Are these consistent observations and what could explain them? It would seem more likely that these interventions would disrupt microtubule growth regardless of orientation?

      To what extent do you think similar mechanisms may operate in mature axons, or is this phenomenon limited to axon growth stages? At the very least it seems likely that they also recur during axon regeneration, but in this context it would be very interesting to know if there are CNS/PNS differences in vertebrates given the difference in axon regeneration.

    1. On 2021-07-19 20:54:30, user stephens999 wrote:

      A Review of Zheng et al, Universal prediction of cell cycle position using transfer learning, by Matthew Stephens

      This paper provides a new approach (tricycle) for predicting the position of a cell in the cell cycle. The approach claims to work regardless of cell type, species and sequencing assay. There are several things to like about the paper. In particular, the tricycle method is very simple: i) compute the first two PCs on 500 annotated cell-cycle genes in a data set where cell cycle is the primary source of variation; ii) project any future observations to this 2-d embedding and compute the polar angle to predict its cell cycle position. Further, the empirical results are promising. At the same time I think the paper could be substantially improved by removing or reducing some of the less innovative parts, toning down some of the rhetoric, and focussing on the most convincing empirical results. My comments expand on these suggestions.
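      To make the two steps summarized above concrete, here is a minimal sketch in Python with NumPy. It is not the tricycle package's implementation: the function names `fit_reference_pcs` and `polar_angle_position`, the choice to center new data with the reference gene means, the optional `center` argument, and the synthetic data (8 genes rather than 500) are all assumptions made for illustration.

```python
import numpy as np


def fit_reference_pcs(ref_expr):
    """Learn the first two principal axes from a reference data set.

    ref_expr: array of shape (cells, genes), already restricted to the
    chosen cell-cycle genes and suitably normalised (assumed here).
    Returns the per-gene means and a (genes, 2) loading matrix.
    """
    gene_means = ref_expr.mean(axis=0)
    centered = ref_expr - gene_means
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return gene_means, vt[:2].T


def polar_angle_position(new_expr, gene_means, loadings, center=(0.0, 0.0)):
    """Project cells onto the reference PCs and return angles wrapped to 0..2*pi."""
    embedding = (new_expr - gene_means) @ loadings
    x = embedding[:, 0] - center[0]
    y = embedding[:, 1] - center[1]
    return np.mod(np.arctan2(y, x), 2 * np.pi)


# Synthetic example: 8 genes peaking at evenly spaced phases, 100 cells.
true_phase = np.linspace(0, 2 * np.pi, 100, endpoint=False)
gene_peak = np.linspace(0, 2 * np.pi, 8, endpoint=False)
expr = np.cos(true_phase[:, None] - gene_peak[None, :])

means, loadings = fit_reference_pcs(expr)
angles = polar_angle_position(expr, means, loadings)
```

      Because the sign and orientation of principal components are arbitrary, the angle recovered this way matches the true phase only up to a rotation and a direction of travel around the circle. The `center` argument is there only so that the effect of measuring the angle from a point other than (0,0) can be explored.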

      Main comments:

      1. I found most of the material on PCA not to be especially novel or interesting. The use of PCA to determine cell cycle position has a long history (including many papers cited here), and existing mathematical results already go far beyond the analysis presented here. The behavior of PCA on cyclic phenomena is much more general than presented here, and does not rely on sinusoidal functions or "two distinct peaks" etc. Rather it stems from the result that cyclic phenomena lead to circulant covariance matrices, and all circulant matrices have the same eigenvectors: the columns of the discrete Fourier transform matrix. The result is that, when the covariance patterns primarily reflect cyclic phenomena, the first two PCs will form a circle/ellipse. See Novembre and Stephens (2008) and references therein for further discussion. Figure 1 is useful for summarizing the method, but most of the other material could be condensed or removed and I think the paper would be improved because it would better focus on what is actually new and interesting, the tricycle method (currently not introduced until p6) and the empirical assessments of its performance.

      2. The paper left me asking myself this: what is the strongest empirical support that tricycle cell cycle assignments work in practice? To me, Fig 5 panels c and g are the most convincing, because they are quantitative comparisons with an alternative technology (and one that is often considered the "gold standard" in this area). I also liked the quantitative comparisons with other methods, and it seems some of those might be worth including in the main text. In contrast, the results in Fig 4 are not quantitative, and overall not that compelling. The top row of panels are kind of useful in demonstrating you get something like a circle, but we don't actually know that this corresponds to cell cycle from this picture (unless I misunderstood, the colors are inferred, not known). And looking at the mPancreas results one might be tempted to use (-3,0) as the center of the circle, which would change computation of polar angle quite a bit. Is there reason to think that sticking with (0,0) is better? If so, any idea why the circle shows this shift? (Similar issues arise, to a lesser extent, with HippNPC.) The Top2A results are, on their own, too noisy to be convincing -- why not show R2 plots for all cell-cycle genes (which could be contrasted with non-cell-cycle genes, and also compared with other methods)? And as far as I can see Fig 4c is, at best, only interesting once one is convinced that the cell cycle is being correctly inferred -- nothing here to say that the cell cycle inferences are accurate. To be clear, I'm not saying the method does not generalize well across data sets; I'm saying that the evidence for this needs to be more clearly presented.

      3. A less fundamental issue: I don't really think describing this as an example of "transfer learning" is helpful. Indeed it is not even clear to me it is accurate. For example, in the cited Pan et al 2008, they describe the transfer learning problem as follows: "In a transfer learning setting, some labeled data Dsrc are available in a source domain, while only unlabeled data Dtar are available in the target domain." That does not apply here - everything is based on unlabelled data.

      More generally, giving the approach a name like "transfer learning" seems to suggest that there is something going on to actually make this transfer from one dataset to another, or some deeper theoretical reason to think it should work -- but I don't believe either of these is true. You are just hoping that the PC weights learned in one (carefully chosen) data set will also work to capture cell cycle on other data sets. It isn't obvious in advance that this rather simple approach would work well, and the major contribution of the paper is to assess this empirically.

      4. The abstract is hyperbolic. "ubiquitous applicability of transfer learning"; "can predict any cell's position in the cell cycle", "universally accurate", "eminently pertinent"...
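      The linear-algebra fact invoked in the first main comment (every circulant matrix has the columns of the discrete Fourier transform matrix as eigenvectors) can be checked numerically. The sketch below uses only the Python standard library; the helper names and the particular covariance-like first row, which decays with circular distance, are arbitrary choices made for illustration and are not taken from the paper under review.

```python
import cmath
import math


def circulant(first_row):
    """Build a circulant matrix: row i is the first row rotated right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def dft_column(n, k):
    """Column k of the n x n DFT matrix (unnormalised, one sign convention)."""
    return [cmath.exp(2j * cmath.pi * k * t / n) for t in range(n)]


def eigen_ratios(matrix, vector):
    """Return (M v)_i / v_i for every i; constant if v is an eigenvector of M."""
    product = [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]
    return [p / v for p, v in zip(product, vector)]


n = 12
# Entries depend only on the circular distance between positions (symmetric).
first_row = [math.exp(-min(t, n - t) / 2.0) for t in range(n)]
cov = circulant(first_row)

ratios = eigen_ratios(cov, dft_column(n, k=1))
spread = max(abs(r - ratios[0]) for r in ratios)
print(spread)  # close to zero: the DFT column is an eigenvector
```

      For a symmetric circulant matrix such as this one the eigenvalue is real, so the real and imaginary parts of the DFT column (a cosine and a sine of the same frequency) are each eigenvectors with the same eigenvalue, and plotting one against the other traces a circle. Whether that pair corresponds to the leading two PCs depends on the spectrum of the particular covariance.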

      Minor:

      • p2 you introduce the term "cell cycle pseudotime" only to explain later that it is not really a time at all. Why not just go straight into "cell cycle position" or "cell cycle phase"? (Also, the term "wall time" may not be familiar to all readers?)

      • p5, left column: Figure 2d -> 2f?

      • p8, right column: is "superficial" the right word here?

      • Some of the loess fits (e.g. Fig 2 d-f; Fig 4 panel b, especially mHippNPC) don't look visually very good. Is this just an artifact of having 0s, whose density is impossible to see due to overplotting, or is loess over-smoothing? Might trend filtering, as used in Hsiao et al, work better?

      Refs:

      J Novembre and M Stephens. Interpreting principal component analyses of spatial population genetic variation. Nat Genet 40(5):646-649, May 2008.

    1. On 2021-07-16 14:56:45, user Claudiu Bandea wrote:

      Will Borgs Illuminate the Evolutionary Origin of Ancestral Viral Lineages?

      Borgs - another remarkable discovery by Banfield Lab that could illuminate the origin of ancestral viral lineages (1); the other discoveries I have in mind are the huge phages (2) and ARMAN/Thermoplasmatales inter-species connections (3).

      True to their data, Al-Shayeb et al. (1) seem, at least for a moment, to limit their speculations on the nature and evolutionary origin of Borgs to open questions: “Are they giant linear viruses or plasmids unlike anything previously reported? Alternatively, are they auxiliary chromosomes?” Then, to my big surprise, the authors, rather casually, write: “Perhaps they were once a sibling Methanoperedens lineage that underwent gene loss and established a symbiotic association within Methanoperedens …” (1). So, why is this a big surprise?

      Over the last four decades or so, I have been searching for data and observations that are consistent with, or support, the Fusion Hypothesis on the origin and nature of the ancestral or emerging viral lineages (4-6). Although it is clear that the extant viruses originated from other viruses, and there is compelling evidence that the endogenous viral elements, such as transposons and plasmids, originated from exogenous viral lineages, the evolutionary origin of the ancestral viral lineages has remained enigmatic.

      According to the Fusion Hypothesis, the ancestral viral lineages originated from parasitic cellular organisms, including endo- and ecto-parasites that, to increase their access to the resources present in their environmental niche (i.e. the host cell), fused their cell membrane with the host cell membrane, thereby losing their own cellular organization within the host cell. However, after synthesizing their proteins and other specific molecules and replicating their genome, this novel type of organism induced the morphogenesis/differentiation of cell-like reproductive forms (i.e. virus particles, or virions), which started a new life cycle by fusing with new host cells. [Metaphorically, the Fusion Hypothesis places the ancestral viruses at the intersection of Hollywood and Greek ‘mythologies,’ in which 'viral Borgs' assimilate their hosts, and reemerge just like the Phoenix. Factually, within the host cell, viruses, which have been historically and conceptually misidentified with the virions (4-9), are considered to be in the eclipse phase, designated as “The time between infection by (or induction of) a bacteriophage, or other virus, and the appearance of mature virus within the cell” (10)].

      A fundamental premise of the Fusion Hypothesis is that only symbiotic/parasitic lineages that have a cellular and molecular composition, and processes compatible with those of their host cells (e.g. an archaeal lineage parasitizing another archaeal lineage) have the opportunity to evolve into a viral lineage (4-6); this implies that bacterial or archaeal lineages parasitizing eukaryotic host cells, for example, are unlikely to be able to evolve into viral lineages, regardless of the degree of their genome/proteome reduction (11). Another intriguing inference from this evolutionary model is that numerous cellular lineages evolved into viral lineages throughout the history of life, and that, remarkably, this process might still be active (5-6).

      The Fusion Hypothesis is a radical departure from the conventional thinking on the evolutionary origin and nature of ancestral viral lineages, including the historical reductive hypothesis, which lost its appeal more than half a century ago because it could not explain the gradual evolutionary transition from cellular organisms to viruses (15), which have been conceptually misidentified with the virions and have been erroneously defined based on their physical, biochemical and biological properties (4-9). Perhaps no one has questioned the dogma of viruses as virus particles more explicitly, and in stronger terms, than Jean-Michel Claverie, one of the leading researchers in the field of giant viruses, who asked: “what if we have totally missed the true nature of (at least some) viruses?” (8). Claverie answered this intriguing question in a rather revealing way: identifying viruses with the virus particles, he wrote, might “be a case of ‘when the finger points to the stars, the fool looks at the finger’” (8).

      Nevertheless, it is likely that very few readers of this note are familiar with, or have even heard of, these radical perspectives on the origin and nature of viruses. That might change, though, if researchers realize that, as discussed next, these new perspectives might better explain the existing data and observations and might open new research avenues and objectives for grant applications.

      Fortunately, there are only two broad ways of thinking about the evolution of viruses, and these paradigms could critically inform the hypotheses on the origin and nature of ancestral viral lineages: (i) viruses have evolved and diversified from simple to more complex entities by increasing the size of their genome/proteome/virions, or (ii) vice versa, they have diversified by reductive evolution. The first paradigm supports the hypothesis that the ancestral or incipient viral lineages were simple genetic entities, usually referred to as ‘replicons’, which apparently preceded the cellular organisms at the dawn of life (13-14), and the second paradigm supports the hypothesis that the incipient viruses originated from more complex organisms, as suggested in the Fusion Hypothesis.

      Because of the high rate of genome evolution and rampant sequence exchanges among various viruses and their hosts, the current sequence analyses cannot clearly differentiate between the two broad evolutionary pathways. Nevertheless, the hypothesis that the complex viruses have evolved from simpler siblings currently dominates the literature and discussions in the field (e.g. 13-14). This perception, though, is in stark contrast to the well-established fact that all intracellular parasitic or symbiotic microorganisms, which number in the thousands of species, have evolved toward a smaller genome/proteome/cell size. Although, similar to their free-living ancestors or relatives, these parasitic and symbiotic cellular organisms do occasionally acquire new genetic material, there is overwhelming evidence that, overall, these species have experienced reductive evolution; and this principle apparently also applies to many free-living species. If this is indeed the case, why would viral lineages evolve in the opposite direction? Without addressing this critical question, the dominance of the simple-to-complex hypothesis on the origin and evolution of viruses is questionable.

      Although, just like any symbiotic/parasitic cellular species, viruses can occasionally increase the size of their genome/proteome (the ‘accordion model’ of viral evolution), it is difficult to define the selective forces leading to the overall evolution of a parasitic organism towards complexity within an intracellular environment. Also, it would be difficult to envision the development of experimental approaches addressing the evolution of ‘replicons’ into simple and, eventually, into more complex viruses; interestingly, Howard Temin’s protovirus hypothesis on the origin of extracellular viruses from endogenous viruses (15) was abandoned when it became clear that the millions of endogenous viruses present in humans and other species originated from exogenous viral lineages, not vice versa.

      On the contrary, the Fusion Hypothesis on the origin and diversification of viral lineages by reductive evolution is consistent with the life cycle of many viruses, which fuse with their host cells to start their intracellular development (4-6). Given the nature of their intracellular environment, which can provide basically unlimited resources, including ribosomes and other components of the metabolic and informational machineries, and considering the dominance of deleterious mutations over beneficial ones, as well as the strong selection for increasing their reproductive rate, it is likely that, overall, viruses have experienced reductive evolution. And, very importantly, this reductive evolution is in line with that of all symbiotic and parasitic cellular species.

      Nevertheless, the huge advantage and appeal of the Fusion Hypothesis is that it can be addressed experimentally in the laboratory using various experimental models (5, 6). Even more thrilling is that, as I previously argued (5), some parasitic/symbiotic cellular lineages are currently in the process of natural transition from a cellular to a viral type of biological organization. To realign this discussion with the Al-Shayeb et al. study and intuition (1), it is likely indeed that the ancestor of the 'colorful Borg' was “a sibling Methanoperedens lineage that underwent gene loss and established a symbiotic association within Methanoperedens”, after fusing with it and losing its cellular organization. So are the Borgs viral lineages?

      To answer this question, we need to add a few more ‘dimensions’ to the Fusion Hypothesis. As I previously discussed (4-5), the paradigm behind this hypothesis is the ‘cellular fusion’ or ‘hybridization’ phenomenon. In principle, two cellular organisms can interact and co-evolve in multiple ways: (i) one cell enters the other, keeps its individualizing membrane (i.e., cell-like structure), and integrates its symbiotic lifestyle and life cycle in synchrony with those of the host cell, as has been the case with the mitochondria and chloroplast lineages; (ii) a parasitic cellular organism enters its host cell, maintains its cellular structure, and after reproduction leaves the host cell, which is a very common phenomenon; (iii) a parasitic cellular organism enters the host cell by a membrane fusion mechanism, synthesizes its components using the host’s resources, and induces the assembly of cell-like progeny (i.e., virions) that leave the host cell and restart the viral life cycle by fusing with new host cells; (iv) in an analogous case, a parasitic cellular organism enters the host cell by a membrane fusion mechanism, ‘assimilates’ the host cell, synthesizes its components using the host’s resources, and induces the host cell to divide and fuse with other cells, which is another putative viral type of biological organization; and, finally, (v) two related/compatible cellular organisms fuse with each other (i.e., hybridize) and integrate their metabolism and life cycle, generating a new hybrid organism; this has likely been a very common phenomenon in the history of life, but because of the integration of the sibling partners, it is difficult to detect.

      It remains to be seen exactly into which group of biological organization and co-evolutionary pathway the Borgs and their apparent ‘partners,’ the Methanoperedens lineage, fall, but the discovery of Borgs, and the mystery surrounding their nature and evolutionary origin, should stimulate interest in developing experimental approaches for addressing the Fusion Hypothesis on the origin of viruses. Additionally, studying the fusion/hybridization of various cellular lineages should open new avenues for studying cellular evolution and for dissecting various metabolic and informational machineries.

      I think it is meaningful to end this note with the inspiring remarks by Jill Banfield (16), the senior author of the Al-Shayeb et al. (1) article:

      I repeat- I haven’t been this excited about a discovery since CRISPR. We found something enigmatic that, like CRISPR, is associated with microbial genomes. We have named these unique entities #BORGs.

      *Imagine a strange foreign entity, neither alive nor dead, that assimilates and shares important genes... A floating toolbox, likely full of blueprints, some that we may one day harness, like CRISPR… Wait- wouldn’t that just be a virus? a megaplasmid? a mini-chromosome? No… #BORGs are unique...

      BORGs are huge, a third the size of their methane-eating hosts, they have assimilated many metabolism-relevant genes, and they have combinations of features not seen before... #BORGs are like turbo boosters for their host’s methane metabolism. This means they could have significant climate impacts...*

      This discovery started in deep mud and was brought to light by an analysis of around 10 billion DNA snippets. That such an approach could reveal something with potentially global ramifications!

      In 2021, I will again sit across the table from Jennifer Doudna (@doudnalab) and we will talk about how we might begin to explore the technological and environmental importance of this discovery...

      This may be an example of the type of basic, discovery-based science that can ultimately tackle the big problems that face our world, the type of discoveries that @elonmusk is seeking through his current 100M @xprize

      Basic science, starting with fieldwork and looking at what nature has invented, is important if we are to discover things that we could not imagine. This type of science deserves more funding. Without it, the world would not be meeting the #BORGs

      References:

      1. Al-Shayeb et al. 2021. Borgs are giant extrachromosomal elements with the potential to augment methane oxidation. bioRxiv: https://www.biorxiv.org/con... doi: https://doi.org/10.1101/202....
      2. Al-Shayeb et al. 2020. Clades of huge phage from across Earth’s ecosystems. bioRxiv: https://www.biorxiv.org/con... doi: https://doi.org/10.1101/572362.
      3. Comolli LR, Banfield JF. 2014. Inter-species interconnections in acid mine drainage microbial communities. Front Microbiol. 5:367.
      4. Bandea CI. 1983. A new theory on the origin and the nature of viruses. Journal of Theoretical Biology 105(4), 591-602.
      5. Bandea CI. 2009. The origin and evolution of viruses as molecular organisms. Nature Precedings: https://www.nature.com/arti...
      6. Bandea CI. 2019. Are Antarctic Nanohaloarchaeota Emerging Viral Lineages? PrePrints: https://www.preprints.org/m...
      7. Forterre P. 2010. Giant viruses: conflicts in revisiting the virus concept. Intervirology. 53:362-78.
      8. Claverie JM. 2006. Viruses take center stage in cellular evolution. Genome Biol. 7, 110.
      9. Racaniello V. 2010. The virus and the virion. Virology Blog. http://www.virology.ws/2010...
      10. Definition of “Eclipse phase.” 2021. Biologyonline. https://www.biologyonline.c...
      11. Husnik et al. 2021. Bacterial and archaeal symbioses with protists. Current Biology. doi: 10.1016/j.cub.2021.05.049
      12. Luria SE and Darnell JE. 1967. General Virology. Wiley, New York.
      13. Koonin et al. 2006. The ancient Virus World and evolution of cells. Biol Direct. 1-27
      14. Krupovic et al. 2019. Origin of viruses: primordial replicators recruiting capsids from hosts. Nat Rev Microbiol. 17(7):449-458.
      15. Temin HM. 1976. The DNA provirus hypothesis. Science. 192(4244):1075-80.
      16. Banfield J. 2021. Comments on the discovery of Borgs. https://twitter.com/banfiel... ; https://twitter.com/hashtag...
    1. On 2021-06-28 02:48:27, user Stephen Goldstein wrote:

      1. I think including an unrelated email to the SRA was unwise. A reader could reasonably infer from it that Chinese scientists somewhat broadly are involved in unscrupulous data handling and sharing practices. My understanding from others with respect to that specific email is that the data in question are back on the SRA, and the pangolin CoV sequences associated with that paper are available on GISAID. Implicating researchers unrelated to the Wang et al. paper in this matter seems unfair. I don't think it serves a positive purpose, and it can carry a negative connotation for Chinese researchers.

      2. It's of course true that you recovered the raw data files, and you do reference the Wang et al. preprint and paper. However, I think you need to acknowledge that Wang et al. specifically describe the mutations assigning these sequences to lineages A and B and even reference the lineage split (called L and S at the time). So while the raw sequences are newly recovered, the key information gleaned from them was not concealed. Your response on Twitter, that the data are less useful for analysis purposes in a paper table, is something you can bring up to still support your argument that this was underhanded (though I disagree about the strength of evidence for this). But I think the reader currently comes away thinking that not only the raw data but also the genetic diversity information associated with them was concealed and, as you know, this is not the case.

      3. In general, it doesn't surprise me at all that the earliest sequences recovered might not actually represent the first infections. Since the outbreak didn't really catch attention until super-spreading at the Huanan market, almost all viruses preceding that went unsampled. If the first human infections were in November as calculated (maybe at Huanan, maybe not, maybe there and somewhere else), then these viruses could not be the first sequenced examples, and in fact sequences of the first infections likely do not exist. So I don't think it is unusual that the first reported sequences are more distant from the bat viruses, even if Lineage B is derived. I would argue it's actually expected. It may be particularly difficult to identify the first cases of a respiratory disease, often with unremarkable symptoms, compared with infections with a more unusual presentation.

      4. I agree A may be a better root than B, though the proper root may also be between them. However, the details of this particular rooting issue are somewhat beyond my phylogenetic expertise.

      5. It does not necessarily follow, however, that B is descended from A in humans. I think it's just as likely (or more likely, for the reasons below) that this split occurred in an intermediate host and that the two lineages represent independent spillovers. These sequences are from January, WA-1 is from January, and there's one A virus from Dec (maybe?) in the WHO report. The existing evidence is therefore consistent with contemporaneous introduction of these lineages, rather than lineage A entering the human population first and B diverging from within lineage A diversity. Apparent intermediate sequences may result from early Illumina pipelines calling low-coverage bases as Wuhan-1 (the reference), making it appear that some LinA sequences were LinA plus a B mutation, though this requires additional study. There is precedent for diversity of SARSr-CoVs arising in an intermediate animal reservoir. Four animal sequences of SARS-CoV sampled in spring 2003 differed by 0 to 8 nucleotides in the spike gene, following several months of transmission among animals in wildlife markets, which were not shut down until the following winter.

      6. Given the above, the Huanan market, if it was a spillover site, is certainly not the only spillover site. The Lineage A virus in the WHO report was linked to an unnamed market and one beneficial outcome of your work highlighting these sequences would be if epidemiological data can be linked to these sequences. I believe Huanan is a plausible spillover site with subsequent human-to-human transmission for Lineage B. The limited infections in early December (and molecular clock analyses) point to perhaps a mid-late Nov introduction there with limited onward transmission for some time before super-spreading commenced.

      7. In terms of tone, I suggest sticking to the findings and staying away from assigning motive, in particular to individual researchers in undoubtedly difficult circumstances. The Chinese government has obviously been obfuscatory throughout this pandemic, as with most things. Notably, the most well-documented obfuscation related to the early stages of the epidemic was the denial to the WHO team that live mammals were sold at Huanan, which we now know to be untrue. Criticism of the Chinese government is therefore firmly within bounds. Based on the limited information available, I believe extreme caution with respect to criticizing the Wang et al. authors is warranted.

      8. You obviously need to add something in response to the NIH statement about the data removal, and the revelation that eight other data sets were also removed from the SRA.

      -Stephen Goldstein, PhD

    1. On 2021-06-14 19:49:00, user Fraser Lab wrote:

      EDITORIAL COMMENTS

      Reviewers agree that this is an excellent showcase of state-of-the-art native MS as applied to membrane proteins. The detection of a small drug bound in the complex with the membrane is an impressive technical achievement. There is some concern that these experiments may teach us more about the limitations of native MS than about AM2 function specifically; even in the face of that concern, this manuscript is valuable. The key technical considerations that merit further caveats/discussion in the manuscript are:

      1) contrasting how insertion into detergent/nanodisc vs. translation and incorporation into “real” membranes might affect the results

      2) given differences in native mass spec and biases toward certain oligomers flying better, etc., is there any orthogonal metric that could be used to calibrate how each oligomer might be biased, or to calibrate the reproducibility?
      - See especially this comment by Reviewer #3: The authors offer two interpretations of their data in the discussion: 1) that it is very challenging to capture the pure tetramer 2) that the oligomeric states of AM2 are more complex than previously thought. The former is unlikely to have any physiological relevance while the latter could have important implications for development of novel therapeutics. A third interpretation could also be that the oligomeric profile observed is a byproduct of the native MS technique utilized. This manuscript would be much more impactful if this study included experiments to differentiate between these possibilities.

      3) the concentration dependence (of AM2 and of detergents) of the results

      James Fraser (UCSF)

      Note: I solicited some reviews and am acting as an “editor” and authenticator of their expertise to preserve their anonymity. Happy to facilitate any interactions between authors, reviewers, or any other interested party.

      REVIEWER #1

      In this study, Townsend and colleagues utilize native-state mass spectrometry to characterize the oligomeric state distribution of matrix protein 2 from influenza A (AM2) in response to varying environmental conditions and pharmacologic agents. AM2 is a well-characterized viroporin, one of a class of small transmembrane proteins that oligomerize into ion-conducting channels during viral infection. Viroporins are clinically validated drug targets, and investigating the structural and mechanistic properties of viroporins is important for understanding their roles in the viral replication cycle and could aid future drug discovery.

      Most prior structural insights into AM2 have been obtained by X-ray crystallography or NMR. This manuscript adds to this structural investigation of AM2 by using native-state MS to investigate AM2 oligomeric states in the solution state and in nanodiscs, which could better reflect the physiologic membrane context. Their key findings are that 1) AM2 adopts a range of oligomeric states (monomers to hexamers) and 2) the distribution of these oligomers varies depending on environmental conditions (lipid composition, pH), small-molecule inhibitors, and mutations. The relative quantification of AM2 oligomer polydispersity is uniquely enabled by the authors’ use of native-state MS. This contrasts with the predominantly tetrameric state that has been appreciated from prior structural studies of AM2. The authors’ findings present a compelling case for investigators to employ careful experimental design and data interpretation when working on AM2/viroporins and other dynamic and oligomeric proteins. The implications of this polydispersity for AM2 function and viral replication remain unknown. Insights into the energetics and dynamics of interconversion of these oligomers, and application to other viroporin homologs, are also areas for future investigation.

      The manuscript is written clearly and the researchers’ rationale and methods are described in detail. Specific comments are listed below:

      How were the equilibration time and temperature of the samples for native-state MS analysis chosen? These two parameters (among others) can have significant effects on the population distribution of oligomers observed.

      Page 5, first paragraph. “The precise oligomeric state distribution varied substantially between replicate measurements, indicating variable and relatively nonspecific oligomerization.”

      Could the authors provide some context/examples on this variation between replicates? For most figures, a representative spectrum or an average with error bars (with no individual data points noted) is presented.

      Could the authors comment on what implications the observed replicate variability would have for their interpretations of AM2 polydispersity?

      Could the authors explain why they conclude that the oligomerization is driven by relatively non-specific interactions? Prior structures of AM2, at least of the tetramer, show a symmetric oligomer with specific contacts being made at the interface between the monomers to form a conducting pore. Would the authors expect the interactions in the non-tetrameric states to be similar to or different from those observed in the tetramer?

      Were oligomers/aggregates larger than hexamers observed?

      In Figure 4, the distribution with 0 μM AMT of WT AM2 solubilized in C8E4 appears quite different from that in Figure S1 and in the Figure S9 QToF data. Could the authors comment on the reproducibility of these distributions?

      Monomeric AM2 appears to be very low or non-existent in detergent, but is present in nanodiscs. Could the authors comment on how the detergent vs nanodisc environment could be responsible for the observed differences?

      Did the authors investigate the dependence on the AM2 to nanodisc ratio on the oligomeric distribution of AM2?

      The authors suggest that the S31N mutant is unable to bind amantadine because it is locked in a predominantly non-binding pentameric state (based on Figure 4 data). However, in nanodiscs, the S31N mutant forms monomers/dimers/trimers but no larger oligomers. Could the authors comment on this observed difference in their data, and how the authors’ proposed mechanism of resistance relates to previous studies on the mechanism of the S31N mutant?

      Page 9: “Importantly, AM2 S31N nanodiscs did not show any mass defect shifts upon addition of amantadine, confirming specificity of drug binding.” Could the authors include this data, potentially in the supplementary file?

      REVIEWER #2

      The paper by the Marty group uses native MS of nanodiscs to investigate the oligomerization state and drug-binding properties of the viral matrix protein 2 from influenza A (AM2) in different chemical environments. Interestingly, AM2, which is thought to exist primarily as a tetramer, is shown in this study to be highly sensitive to the chemical environment and displays a distribution of assembly states, depending on pH and lipid composition. The findings that illuminate the polydispersity of AM2 provide new potential mechanisms of influenza physiology and pathology. The data are of high quality and reproducible, and the manuscript is well written. I recommend addressing the points raised below.

      1) According to the materials and methods section, the protein was analyzed at a concentration of 50 μM (of the monomer?), which is quite high. Understandably, if a tetramer is expected, then higher amounts of the monomer are needed. However, since the protein appears in a range of assembly states, non-specific oligomerization should be ruled out.

      2) In the few cases in which dilution experiments were performed, the extent of dilution is not indicated, i.e., what are the starting and end concentrations?

      3) The data in Figs. 4, S1-S6 and S9 are processed. Can the authors provide representative raw spectra, so that the quality of the data can be assessed?

      4) The discussion section should be extended, with emphasis on the biological relevance of the results. For example, what is the composition of the natural host membrane, and how similar is it to the membranes tested? How can polydispersity in assembly states benefit the influenza virus? Do any of the tested conditions mimic the natural environment of the host membranes? Can any conclusions be drawn as to the endogenous assembly state of AM2 in the host cells? From a structural and chemical point of view, what is the mechanism by which pH or lipid content affects assembly?

      5) AM2 is post-translationally modified. Can the authors comment on this aspect and on how they think it affects the assembly state distribution?

      6) In Figs. 4, S1, S2 and S3 the concentration of AM2 is not indicated.

      7) The mass defect analysis should be explained.

      8) Raw data of the IM-MS results shown in Fig. S6 should be provided.

      9) Theoretical and measured masses, including mass measurement errors, should be added (also for drug binding), perhaps in a table.

      10) Figure 2, in panels E and F the y axis in the inset is distorted.

      11) What does the cartoon in figure 5 demonstrate?

      REVIEWER #3

      In Townsend et al. the authors utilized native mass spectrometry to characterize the oligomerization state of the influenza A M2 channel in different environments and found that in contrast to what has been previously reported, AM2 exists in multiple oligomeric states depending on pH, lipid composition, and presence of drug. Of note, this study utilizes native MS to measure drug binding to a membrane protein in an intact lipid bilayer, which is technically challenging. Although this is a novel application of native mass spectrometry, additional experiments are needed to provide convincing data that would support the main conclusion, namely that the oligomeric state of AM2 is actually more polydisperse than previously reported. This manuscript would be greatly improved by addressing the following questions:

      Major points:

      1. The authors offer two interpretations of their data in the discussion: 1) that it is very challenging to capture the pure tetramer 2) that the oligomeric states of AM2 are more complex than previously thought. The former is unlikely to have any physiological relevance while the latter could have important implications for development of novel therapeutics. A third interpretation could also be that the oligomeric profile observed is a byproduct of the native MS technique utilized. This manuscript would be much more impactful if this study included experiments to differentiate between these possibilities.

      1. The authors note that "There are several dozen X-ray or NMR structures of the AM2 TM domain in a variety of membrane mimetics, all depicting monodisperse homotetramers", yet most of their conditions do not replicate this finding. Could the authors please comment in more detail on how their conditions differ from the previously reported structural studies which indicate AM2 is present as a homotetramer? The authors mention that most studies used high concentrations of drug - are there other explanations as to why they observed high variability and complex instability where others did not? Do all the previous studies use drug to stabilize the complex? In cases where they did not use drug, what was different?

      2. The fact that the replicate measurements showed significant variation suggests that these results may be due to technical complications rather than truly reflecting distinct complex formation. Did the authors consider using a positive control - perhaps something else known to form a tetrameric complex of similar molecular weight for comparison? This would help build confidence that utilizing native MS for this application can provide reliable data.

      3. In figure 2 and S1, please provide intensity values associated with each condition. Larger complexes are harder to ionize and more likely to inadvertently dissociate in the gas phase. It is impossible to understand how well AM2 ionized in each of these conditions when it is presented as a percent of total. Have the authors considered creating covalently bonded versions of dimer, trimer, and tetramer AM2 to use as standards to accurately quantify the amount of each complex in each condition?

      4. In figure S2, as protein concentration increases, a shift towards higher molecular weight complexes is observed. Is it possible this is due to protein aggregation and unlikely to be observed in physiological conditions?

      5. The "orthogonal measurements confirm oligomeric sensitivity" section is confusing. What do the authors mean by oligomeric sensitivity? It is also unclear how the SEC data supports the authors' claims about the oligomeric state of AM2.

      6. Please explain the statement "very small signals for bound drug were observed". Does this refer to the signal from AM2 or from the drug itself or for drug bound to AM2?

      Minor:

      1. Could the authors please comment on why the select conditions were chosen for figure 2? Supplemental figure 1 is more informative and is worth including in the main figures. Similar question for the other figures where partial datasets are shown in the main text.

      2. Please clarify the concentration of AM2 used in Figures 1, 2, 3, 4 and S1 and S3.

      3. Clarify which detergent was used in figure S9.

      REVIEWER #4

      The authors of this manuscript explore the effects of detergents, drugs, pH, and lipids on the oligomerization state of a well-studied viroporin from the influenza A virus, the M2 channel. Using native mass spectrometry as their main approach, the authors show that pH and the chemical nature of the membrane or membrane mimetic influence the observed polydispersity of M2. While native mass spectrometry captures a distribution of oligomeric states that was not seen in previous analytical studies, the question, ultimately, is whether this polydispersity is physiologically relevant or whether it highlights the need for rigorous testing and vetting of membrane mimetics for structural and functional studies.

      In the initial detergent study, the authors investigate how various detergents affect oligomerization of the channel at different pH. They show that certain detergents favor different oligomeric states over others and capture an array of states in the detergents tested. They then show that the binding of drug to the WT shifts the observed population distribution to favor the tetramer. They repeat these experiments with the S31N mutant, which forms pentameric assemblies in the given conditions.

      To see the effects of lipid bilayers on the oligomerization state of M2, they assembled M2-incorporated nanodiscs. They show that the choice of lipid composition of the nanodiscs is crucial to the observed distribution of states, with DPPC being the lipid that favors the homotetramer. Moreover, they show that they are able to detect mass defect shifts from drug binding, corroborating earlier work in the field. The authors repeat the nanodisc studies with the S31N mutant. From their lipid studies with and without drug, they again rationalize that the drug resistance of the mutant to amantadine and rimantadine may arise from the formation of small oligomers that preclude binding.

      The big question is whether these newly observed states are physiologically relevant or whether they’re an artifact of the physicochemical nature of the local environment. Overall, the authors clearly show that the assembly of M2 is sensitive to its chemical environment, and from their data, seem to suggest that the observed polydispersity reflects the true distribution of states in the physiological context. The data showing the polydispersity are very convincing and serve as a reminder that the choice of membrane mimetic plays a critical role in determining which oligomeric state, whether functional or otherwise, is favored. However, if the point is that these non-tetrameric states have some biological or channel function, then the authors bear the burden of proof.

      Major Comments:

      • Why are the lipid nanodisc experiments only done at pH 7.4 and not at other pH values? In the detergent study, we clearly see a change in the oligomerization state brought on by a change in pH, and the authors speculate that the change in pH in the endosome could shift the oligomerization state to higher-order oligomers, so why is there no pH-dependent study of M2 in nanodiscs?

      • There have been several studies that look at the effects of a completely different set of detergents on the conformational landscape of the channel using solution NMR (Thomaston et al. JACS 2019) or different lipids using solid state NMR (Mandala et al. JMB, 2017): how does this study compare to these results? If the authors do the detergent study with solution state NMR, would they see evidence for polydispersity? Similarly, if the authors do these same native MS experiments using the detergents and/or lipids discussed in these two manuscripts, would they see polydispersity or do these conditions favor the exclusive formation of the homotetramer? The choices for lipids/detergents are orthogonal to what has been published in the literature, so a couple of experiments with the same sample conditions (i.e. lipid/detergent and pH) would be insightful as to whether the previous conditions just happen to favor the homotetramer.

      • In the amantadine-binding study of the WT and S31N in detergent micelles, the authors noted no major changes to the oligomeric state distribution for the mutant and conclude that the absence of a shift is indicative of lack of drug binding. They also suggest that the known drug resistance of the S31N variant arises because this mutant is locked into a novel pentameric state that is impervious to drug-binding. While this is an interesting hypothesis, their MS data does not prove that the drug is not binding. Moreover, they note that even in their WT samples, which show clear shifts, there is a lack of signal from the bound drug in their MS results, so how can the authors make the claim that S31N is not binding the drug? A similar comment can be made about the S31N nanodisc study, although the experimental evidence for drug-binding in the WT lends more support to this conclusion than the one made in the detergent study.

      Minor Comments:

      • Can the authors rule out effects from the varying peptide:detergent ratios? Each of these samples was run at 2x CMC (seemingly standard in the native MS field) with a constant monomer concentration of 50 μM, which works out to very different peptide:detergent ratios. At the same peptide:detergent ratios, how do the distributions compare to each other?

      • Since the higher order oligomers (i.e. hexamers) in LDAO seem stable, could they potentially crosslink these samples to get a low-resolution structure of the hexamer?

      • Is there polydispersity evident in other detergents for S31N?

      • Previous studies (Ref #35 in this manuscript, for example) which look at the oligomerization of M2 using analytical ultracentrifugation used dodecylphosphocholine (DPC) micelles as the membrane mimetic. Using this particular detergent, the authors of the JMB publication showed that the monomer-tetramer equilibrium was cooperative in the presence and absence of the drug amantadine. Is there a reason why DPC was not used in this study? It would be interesting to see what distribution of states this technique captures in the detergent primarily used for the classical analytical ultracentrifugation experiments.

      • Can the authors comment on why the drug-binding studies were only done in C8E4 detergent? How does the drug affect the distributions of the oligomers in other detergents? Would the larger hexamer observed in LDAO also bind the drug?

      • The authors comment that the thickness and fluidity of the membrane are known to modulate M2 activity and suggest in their discussion that these changes are due to a shift in the observed population of states. Functional studies (i.e. liposomal proton flux assays) in the various lipids tested would be helpful to drive this point home. I would like to see how the activity of M2 changes in these lipids and how it relates to the distribution of states observed in the native MS.

      • The authors commented on the bilayer thickness/saturation of DPPC as a potential reason for the tetrameric specificity of M2 in these conditions. Similar speculation into the chemical or physical properties of the detergents that give rise to the observed oligomeric distributions would be welcome.

      • Figures

      o Figure 2: Since the main take-home message from the figure is the deconvolved mass spectra, which clearly illustrate the polydispersity of the sample, it may help to flip the inset and the mass spectra or move the mass spectra to the supplemental. To someone who isn’t in the field of native MS, the representative mass spectra are distracting and detract from conclusions illustrated in the deconvolved spectra.

      o Figure 3: A similar comment to the remarks made in Figure 2 can be made for this figure as well.

      o Figure 5: Is there a reason for the exclusion of S31N data? Since the drug-binding can be clearly seen in the corresponding WT samples, it would be better to swap out one of the WT-AMT figures (since they both are very similar) for one that shows the S31N with the drug even if no clear mass defect shift is seen. The two concentrations of AMT binding to WT is probably meant to show

    1. On 2021-06-04 16:07:21, user Andrew Alamban wrote:

      “A biosensor to gauge protein homeostasis resilience differences in the nucleus compared to cytosol of mammalian cells”. Raeburn et al.<br /> doi: https://doi.org/10.1101/202...<br /> Reviewed by Andrew Alamban* and Linh Tram*<br /> *University of California San Francisco

      Summary:<br /> In the cell, there is an extensive network of protein quality control machinery that maintains protein homeostasis. A disruption in this network may lead to protein aggregation, which is a hallmark for many neurodegenerative diseases. This has prompted a need to develop a biosensor that can measure chaperone activity in the cell, which the authors have done in their previous work (Wood, R. et al. 2018. Nat. Comm.). One way to gauge chaperone activity is to measure their ability to bind unfolded proteins, also known as “holdase” activity, to prevent aggregation. We found that the authors give a helpful explanation of how their previously-designed biosensor works, reducing the need for the reader to reference the previous publication.

      In this manuscript, the authors improve upon this tool to include nuclear localization or export sequences (NLS or NES, respectively) to probe protein homeostasis in the nucleus or cytosol, respectively. This control of biosensor localization is very impressive. Using this new capability, they show that 1) holdase activity in the cytosol is more abundant than in the nucleus and 2) imbalance in protein homeostasis - by co-expressing the huntingtin exon 1 mutant - can reallocate chaperone supply in different areas of the cell.

      A long-standing view in the proteostasis field is that the quality control machinery is more abundant in the cytosol. Their new biosensor supports this view by showing that there is more holdase activity detected in the cytosol than in the nucleus.

      The authors show that Huntingtin (Htt) inclusions can affect the fluorescence analysis of cells via flow cytometry. We appreciate that they addressed this limitation of their biosensor. They propose a workaround by measuring FRET using microscopy instead of their flow cytometry method. Using this workaround, the authors find evidence that the cell can reallocate quality control machinery between the cytosol and the nucleus.

      By adding the NLS or NES, the authors have extended the biosensor’s capability to answer more questions about proteostasis. While the constructed biosensor only used barnase, which binds to Hsp70 and Hsp40 family chaperones, as the model protein, the scheme suggests a potential to expand the scope of the biosensor by using different model proteins that bind other quality control proteins beyond Hsp70 and Hsp40 families.

      Major Points:

      The authors modify their previously-developed biosensor to restrict its localization to either the cytosol or nucleus using an NES or NLS, respectively. Because protein folding is essential to the biosensor’s function, the authors validate these new modifications by measuring protein stability via urea denaturation of the wild type* (WT*) barnase. The authors only perform the validation for the WT* but not for the mutants. Could the differences in the lower slope gradient observed in Figure 2B for the mutants be due to the NES or NLS affecting the mutant barnase stability differently than WT*?

      There seems to be a discrepancy between data from Fig 2B and Fig 3B. Fig 2B shows that there is a lower slope gradient in the cytosol than in the nucleus. Looking more closely at Fig 3B, it almost looks like the nucleus has a lower slope gradient than the cytosol. This contradicts the conclusion from Fig 2B that the cytosol has more holdase activity when in Fig 3B, it looks like the nucleus has more. How could these differences be reconciled?

      Minor Points:

      Introduction:<br /> On page 1, line 57, the abundance of unfolded-like barnase is not detected by FRET but rather by the absence of FRET.

      Duplicate citations on refs. 5 and 6

      In the paragraph that starts on page 1/line 59, I was able to understand the motivation for creating the biosensor. However, the authors go on to explain that they added localization sequences without motivating a reason for why the comparison between the nucleus and the cytosol is important. The authors have this information in their discussion (line 226-227) and a brief mention of this in the introduction would help motivate the study.

      Methods:<br /> In line 270, it was unclear to me what it means to “decouple the expression of the two plasmids”. More detail in this section would also help in the ease of reproducibility of the work.

      Catalog numbers should be included for all materials

      In line 280, there’s a typo. “Ovine” should be “bovine”

      We like that the authors provide scripts alongside the example datasets for their image analysis. This aids in reproducibility

      Figure 1:<br /> Labeling style for Fig 1A could benefit from the cartoon in the style of the (Wood, R. et al. 2018. Nat. Comm. Fig. 1), where the conformations of the bait, as well as the other proteins, were explicitly shown

      We are curious about how the linker control was designed since the linker control was not introduced in the initial biosensor paper (Wood, R. et al. 2018. Nat. Comm.) Which factors determine the linker control’s length and its amino acid sequence?

      It looks like both localization sequences (NLS and NES) are appended to the biosensor in Fig 1A, which contrasts with what is described in the Results.

      Fig 1D: the panel is unclear because “D” and “A” are not labeled.

      Mentioning that urea acts as a denaturing agent would be helpful, especially to newcomers unfamiliar with the assay

      Figure 2:<br /> Figure 2C: How was the percentage of cells with aggregates calculated? The figure legend suggests that the percentage is derived from the ratio (upper slope)/(lower slope).

      In lines 113-114, the authors observed that the I25A,I96G mutant was potentially outside of the detected dynamic range of the biosensor. However, this mutant was still used in subsequent experiments without further explanation.

      Figure 3:<br /> In Figs 1 and 3, using the same color for the Hoechst dye would improve continuity across figures.

      What drove the choice for using the Y66L Emerald as the transfection control rather than an empty vector?

      Figure 4:<br /> It would be useful to see a color map for the FRET map on the side to get a better idea of the range

      What do the white and red arrows mean in Figure 4? We think that the white arrow indicates inclusion-targeted and the red arrow indicates diffuse-targeted.

      The signal that the white arrow is referring to in Figure 4A for the nucleus is barely visible

      In Figure 4, would it be possible to use different line styles for the WT* and the mutants?

      In Figure 4, WT should be labeled as WT*

    1. On 2021-05-27 17:09:25, user Allan Konopka wrote:

      I found this work via Antonia Fernandez-Garcia’s blog post from summer 2020, and thought it very intriguing. As I have a deep interest in physiological microbial ecology, I have wondered for some time now “whither metagenomics?” and this approach that categorizes GC’s by their “knownness” is helpful. Muren asked me to make further comment on a tweet (https://twitter.com/Hamatsa...) here, to hopefully start a conversation.

      So first, what is the objective of applying metagenomics? Sometimes stated (at least in grant proposals) is to “develop a predictive understanding of microbial communities.” But this implies knowing the function of the relevant gene products in adequate specificity (i.e., what specific biochemical function they carry out). We could all come up with lists of important functions, but let me identify 3 which I think are particularly problematic re: the databases of information.

      1. Premise: the instantaneous activity rates of microbes are limited by the fluxes of an essential resource (for chemoheterotrophic bacteria, this is most often the diversity and concentrations of organic energy substrates)<br /> Inference: the breadth, levels of expression, and biochemical affinities of specific transport proteins are critical to understand interspecific competition in natural habitats.<br /> Problem: inadequate specificity – if “known” as (for example) an ABC transporter, this isn’t helpful in predicting in which cases a microbe has a selective advantage. [please correct me if there is recent work that improves this issue]

      2. Premise: microbes/microbial communities rarely (if at all) exist in steady-state conditions. Rather, there are both regular and stochastic environmental perturbations to which organisms may evolve different strategies in response. [Side note: my fav paper on this is Nature’s Pulsing Paradigm, Estuaries 18: 547-555 (1995) by the three Odum brothers. Although about estuaries, easy to think how it applies to other systems and down to microscale.]<br /> Inference: Genes for regulation will be key here. <br /> Problem: I haven’t found much metagenomics work that addresses these regulatory proteins [please correct if necessary, as I have not done an exhaustive search of literature]. Likely (?) similar problem to transporters – motifs identifiable, but specificity of binding site unknown.<br /> Although most genome-scale simulation models (generally of one organism) generate a steady-state solution (and hence less useful ecologically), one can apply heuristics to simulate what you think you know re regulation (but this is outside metagenomics itself)

      3. Premise: The extreme end of the “Pulsing Paradigm” is microbes in highly spatially structured habitats (soils, deep sediments, etc) in which the resource pulses are temporally rare<br /> Inference: evolutionary strategies that favor low/very low rates of metabolism (dormancy) are better than the “optimistic” one of high macromolecular content in terms of maintaining viability until the next pulse<br /> Problem: relatively weak understanding by microbial physiologists of dormancy (going beyond endospores)

    1. On 2021-04-14 01:10:41, user stephens999 wrote:

      A review of Chris Wallace's preprint "A more accurate method for colocalisation analysis allowing for multiple causal variants", by Matthew Stephens

      Summary

      This paper introduces an extension of the "coloc" method for colocalization to deal with multiple causal variants in a region. This extension exploits a recently-introduced method for fine mapping (SuSiE). The extension is attractive in its simplicity, and simulations show it to perform better than some alternative approaches. The paper also suggests a way to speed up computations by pre-filtering out "non-significant" SNPs.

      The key idea of combining SuSiE and coloc is nice, and I think that with some improvements to the presentation this will make a nice publishable contribution.

      The idea of speeding up SuSiE by pre-filtering SNPs is also attractive from a practical point of view, but it has some potential downsides that I feel are not sufficiently emphasized and explored (even though the manuscript does end with a statement that trimming might not be beneficial in general fine mapping). Specifically, trimming out non-significant SNPs could increase the potential for false positive identifications, and indeed such a result has been previously reported in https://www.biorxiv.org/con... (their Figure S7). It's not clear to me how, if at all, this is reflected in the results shown here. Maybe it is simply the case, as the paper suggests in the discussion, that "Coloc benefits from comparing posterior probabilities across... two traits". But the overall way that the manuscript deals with false positive (or indeed false negative) identifications is not clear. (Maybe methods are applied with some knowledge of the true number of causal effects? It isn't clear to me.) Since there are also other potential ways to speed up computation (see comments below), I am not really convinced that the pre-filtering approach is really the way to go, and would like to see at least a stronger assessment of the potential downsides.

      Main Comments

      1. The presentation of the method requires more details, including more precise equations showing how quantities computed by SuSiE are used/combined. For example you could introduce $\alpha_{lj}$ for the matrix of posterior probabilities output by SuSiE and then give explicit expressions for the Bayes Factors being computed ($BF_{lj}$) in terms of $\alpha_{lj}$. I'm not sure what $P_0$ is (is it something output by SuSiE?) Is $\pi=1/p$ where p is the number of SNPs in the region, or something else? How do you set the maximum number of effects in SuSiE (L in the SuSiE paper)? Do you get SuSiE to estimate the number of effects by estimating the prior variance, or do you fix the prior variance? If $L_g$ is the number of effects identified by SuSiE in the GWAS and $L_e$ the number identified by SuSiE in the eQTL study, do you end up running coloc $L_g * L_e$ times (as suggested by "for every pair of regressions across traits" on p3)? How do you combine/summarise the results from all these different runs of coloc?

      2. Presentation of colocalization results also needs more details. Can you say explicitly what is an "AA" or "BB" comparison and an "AB-like signal"? From the description on p3 I thought the simulations would include settings where there were 2 causal variants in each trait, but no sharing. But Fig 3 seems to suggest only a small portion of potential configurations of up to 2 signals in each trait are actually included - is that right? (why?) And in Fig 3, what happens if SuSiE finds a signal in one trait and not in the other - what comparison do you make? (Or do you force SuSiE to find the right number of effects in each trait by fixing L to the true value? If so, is that cheating?) Is the smaller height of the AA bar for susie_0 compared with other methods -- and indeed the slightly smaller height of all bars -- something to be concerned about? Are all methods equally applicable if (as is always the case) you do not know the true number of causal signals in each trait?

      3. Figure 1 compares only the PIPs at causal variants. Since in practice we don't know the causal variants, one should also care about PIPs at non-causal variants. Is there a tendency for SuSiE to inflate PIPs at non-causal variants when trimming?

      4. It seems there are many potential ways to improve computation other than filtering out non-significant SNPs, and many of them may ultimately be better choices (although filtering is obviously very simple to implement!) I don't think the discussion in the paper really adequately reflects the options available or the many issues involved.

      Although I did not see it explicitly said anywhere, I believe the paper is using the susie_rss function for applying SuSiE to summary data. The details of this function are not included in the original SuSiE publication, but I believe that at the time this work was done susie_rss worked by performing an initial eigendecomposition of the reference LD matrix R, which makes it possible to convert the summary data into "transformed data" to which regular SuSiE can be applied. This approach is appealing from a software engineering point of view, but not necessarily the most efficient, computationally. The eigendecomposition of R is quite expensive, being O(p^3) where p is the number of SNPs. The subsequent application of SuSiE to the transformed data is O(p^2) per iteration. Thus if p is sufficiently large the eigendecomposition step will likely dominate the susie_rss computation (and Figure 2 does indeed suggest computation may increase something like p^3?)

      One way to reduce computational complexity would therefore be to avoid the eigendecomposition step, and we are currently actively exploring this in our development of susie_rss. However, note that computing R itself is already an O(np^2) operation, where $n$ is the number of samples in the reference sample used to compute R. So if n is big then this computation (which is basically considered free in this paper since R is precomputed) could be the dominant computational cost. Alternatively if n<<p, then one should perhaps entirely avoid forming R -- in the case n<<p an eigendecomposition of R can be obtained by doing an SVD of the reference genotypes (O(n^2p)) which will be cheaper than forming R (O(np^2)) when n<<p. In the future it seems quite likely that pre-computed R and eigen(R) could be made available for some large panels, avoiding the need for each user to compute them. Once these pre-computations are done there may no longer be any need to filter SNPs.

      Other comments and details

      - p3: Although the number of potential models increases exponentially, SuSiE computation does not increase exponentially.

      - p4: "We labelled each comparisons considered...." I did not understand this sentence.

      - p4: "... having strongest posterior support for H_4" - this should be H_3?

      - p8: "this does apply to single trait" - missing *not*?

      - In the second row-set of Figure 3, is the figure on the LHS wrong? (The methods suggest colocalization but the figure shows no shared variant...)

      - On p7 the r2 threshold is 0.8 but on p4 it is 0.5. Are these referring to different thresholds?
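The n<<p point in main comment 4 can be written out explicitly. This is a sketch under stated assumptions: $X$ denotes the $n \times p$ reference genotype matrix with each column centred and scaled to unit variance, and $U$, $D$, $V$ denote its thin SVD. This notation is introduced here for illustration and does not appear in the preprint or the review.

```latex
R = \frac{1}{n-1} X^{\top} X,
\qquad
X = U D V^{\top}
\;\Longrightarrow\;
R = V \left( \frac{D^{2}}{n-1} \right) V^{\top}.
```

Under these assumptions the columns of $V$ are eigenvectors of $R$ with eigenvalues $d_i^2/(n-1)$, and the remaining eigenvalues are zero. The thin SVD costs $O(n^2 p)$ when $n < p$, compared with $O(n p^2)$ to form $R$ and $O(p^3)$ for its eigendecomposition.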

    1. On 2021-04-12 20:52:59, user Alexis Germán Murillo Carrasco wrote:

      Dear authors,

      First of all, I would like to thank all of you for your invaluable effort to improve Peruvian scientific research. To support this effort, I would like to raise some points about your preprint.

      The use of Syrian hamsters as a study model is interesting. Several articles have noted similarities between Syrian hamsters and humans in COVID-19 disease. In these animals, the response to SARS-CoV-2 infection is usually stronger in aged (rather than young) individuals, as happens in humans. In the methods section, you describe the use of 4-5-week-old Golden Syrian hamsters. I therefore believe that the age of these animals could influence the interpretation of the histopathological results. I suggest you review the published data (and discussion) in PMC7412213 and PMID32571934.

      Regarding your challenge experiment, I felt the scientific rationale for the doses of the vaccine candidates administered to the animals was lacking. In Figure 9A, I would expect to see higher levels (above 80%) of viral isolate for all cases at 2 dpi. Can you explain possible reasons for this result a bit more? Also, I think it would be interesting to see a statistical comparison between 2, 5, and 10 dpi, at least for the most important candidate in your proposal (rLS1-S1-F).

      In the text, you wrote: "This is consistent with previous studies, which reported that viral load is reduced to undetectable levels by 8 days after infection in the hamster animal model". Today we know that viral load is detectable up to 14 days after infection in Syrian hamsters. I think different factors (such as the age and sex of these animals) could mediate this variation. You should probably update this information in your preprint, especially in the discussion.

      You also wrote: "Being lyophilized, this vaccine candidate is very stable and can be stored for several months at 4-8⁰C". However, I think your western blot of products stored for up to 50 days is not sufficient evidence for this claim. You could add results on the biological effect of previously stored vaccine candidates. Also, you may consider testing candidate vaccines stored for more than 2 months. More generally, I suggest showing more technical details, such as qPCR efficiency curves (or efficiency ranges) for all studied genes.

      Finally, I kindly hope these comments can improve your high-quality work and stimulate further studies in Peru. I look forward to your next version (or published article). Please share it with me when it comes out.

      Best regards,

      Alexis M.

    1. On 2021-04-05 00:48:31, user Pablo Jenik wrote:

      This is nice work, a nice contribution to our understanding of petal morphogenesis. But I'm biased towards mosaic work! I take slight issue with the characterization of our older work: "In Arabidopsis that has simple and unfused petals, petal shape and size were never fully restored when AP3 was expressed in one cell layer only (Jenik and Irish, 2001)". Although we showed that full size required the cooperation of both layers, the L1 did appear to control organ shape in Arabidopsis. I think this is relevant because, although the authors focus mostly on growth (size), it is clear that wico (L1) flowers also have the right shape of the limbs, similar to the results in Arabidopsis. I can't tell from the pictures whether the tube shape (not size) in wico is abnormal or not, but it may be good to expand the discussion about the distinction between growth (size) and shape. I also found it thought provoking that, while in Arabidopsis cell fate (epidermal and subepidermal) is clearly cell autonomous (from our work), here it depends on which layer is wild type and the position in the petal. Different signaling or, as they mention, some protein movement in one species but not the other? Interesting!

    1. On 2021-03-25 14:02:08, user Magnus Kjaergaard wrote:

      Response to eLife reviewer 3. Our answer in italics.

      Reviewer #3 (General assessment and major comments (Required)):

      In this manuscript by Hansen et al., the authors describe three low (3.0 to 4.0 Å) resolution crystal structures of Ca2+-ATPase from Listeria, a gram positive bacterium. Two are crystal structures of wild type protein with BeF3- and AlF4- in the absence of Ca2+, thus, likely to represent the E2P ground state and E2~P transition state. The third one is a structure of a G4 mutant, in which 4 Gly residues are inserted into the A-domain-M1 linker, with BeF3- and Ca2+ present in crystallisation, designed to capture the E2P[Ca2+] state. Authors state, however, the three structures are virtually the same and that the E2·BeF3- crystal structure represents a state just prior to ("primed for") dephosphorylation. They also propose that proton counter transport "mechanism" is different from that of SERCA.

      ===== <br /> As Listeria Ca2+-ATPase has been studied by single-molecule FRET, its crystal structures will certainly contribute to our understanding of ion pumping. Furthermore, different from SERCA, Listeria Ca2+-ATPase transports only one Ca2+ per ATP hydrolysed. Therefore, how site I is managed is an interesting topic, although let's not forget the same 1:1 stoichiometry is observed with plasma membrane Ca2+-ATPase (PMCA), for which an EM structure appeared in 2018 (ref. 9). The authors indeed find that the Arg795 side chain extends into binding site I. This part is solid and a more elaborate (and interesting) discussion could be made than what is currently described.

      Another solid finding is that the two E2·BeF3- crystal structures are similar to the E2·AlF4- crystal structure, although how similar is unclear as a structural superimposition reporting an RMSD is not provided and the presented figure makes it difficult to judge directly; the structures are viewed from almost one direction, which makes it unfeasible to discern the differences in M1 and M2 and in the horizontal rotation of the A-domain. Two or three structures are superimposed, but with cylinders and again viewed from only one direction. As the authors designate that the structures represent H+ occluded states, it is important to clearly show the extracellular gate is really closed to H+ (and not only to Ca2+). For completeness, they should also examine the effect of crystal packing on the A-domain position. <br /> A new view of the structures after a 90-degree rotation has been added to Figures S2 and 6 to make it easier to judge domain orientation. Additionally, we have added a new supplementary table S2 containing RMSDs for pairwise alignments of LMCA1 and SERCA structures.<br /> A new supplementary figure S3 has been added, which shows crystal packing of the A domain in the three structures. The packing differs between G4 and WT structures. As the contacts are on the outer surface of the headpiece, we think it is unlikely that they affect any of the structural interpretation in the manuscript, but we have added the following sentence to the discussion of the headpiece orientation: <br /> “The A domain makes different crystal contacts in WT and G4 structures (Figure S3), so changes in the domain orientation should be interpreted with caution.”
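For readers unfamiliar with the RMSD values mentioned in the response above, the quantity can be sketched as follows. This is a minimal illustration, assuming the two structures have already been superimposed and their atoms matched one-to-one; the function name and the toy coordinates are hypothetical and are not taken from the manuscript or from any alignment software.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Each argument is a list of (x, y, z) tuples in angstroms. The two
    structures are assumed to be superimposed already; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy coordinates for illustration only (not real atomic positions).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
shifted = [(x + 1.0, y, z) for (x, y, z) in reference]  # rigid 1 A shift along x

print(rmsd(reference, reference))
print(rmsd(reference, shifted))
```

Because no superposition is performed here, a rigid shift shows up directly in the value; alignment tools remove such rigid-body differences before reporting RMSD.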

      With regard to the point that the E2·BeF3- structure is "primed for dephosphorylation", only Fig. 2 (now Figure 3) is shown, in which differences appear to be the path of the TGES loop and the orientation of the Glu167/183 side chain. Their atomic models show that there is plenty of space for the Glu167 sidechain to take an orientation similar to that of Glu183 in SERCA. The authors should, however, provide an omit annealed Fo-Fc map for the Glu167 side chain and explain why that is the preferred and only orientation. If a Glu side chain is free to move, it could adopt a different orientation in less than a nanosecond. If it does, then the difference in the orientation of the Glu side chain does not sufficiently explain "the rapid dephosphorylation observed in single-molecule studies". The authors place further emphasis on proton occlusion and countertransport. However, this part of the manuscript is more speculative and, as detailed later, should at least be entirely moved to the Discussion section.

      We have added a new supplementary Figure S5 showing an omit annealed Fo-Fc map for Glu167. This shows that the side chain has the preferred location that we discuss. We would like to clarify that the pre-organization of the catalytic site is not merely a question of the rotamer of the side chain of Glu167, but also requires the TGES loop to break interactions to reorganize its backbone structure. This can be seen e.g. in Figure 3C. <br /> Proton occlusion and counter-transport will be addressed below.

      ===== <br /> As mentioned, the authors place a larger emphasis on proton countertransport. Here a number of issues show up. First of all, I think they have frequently used the term "occlusion" improperly. From my understanding, occlusion of a site (or ion) means that the site (or ion) is inaccessible from either side of the membrane. This means more than closure of the gates, as the two gates have to stay closed for a substantial length of time (i.e. locked). It is experimentally well established with SERCA that Ca2+ ions are occluded in E1P species. It can be shown that the lumenal gate is closed for Ca2+ in the E2 state. However, that does not necessarily mean that the gate for *H+* is also closed. As far as this reviewer knows, nobody has actually demonstrated that H+ is occluded, even in the E2 state of SERCA.

      Furthermore, the authors presume that protons enter the binding sites through a different pathway from that used for Ca2+ release, citing ref 26. However, if they do, can closure of the gate for Ca2+ really mean closure of the gate for H+? This seems a contradictory statement as the authors designate the E2·BeF3- state in Listeria Ca2+-ATPase as a proton occluded state (p.12). Apparent closure of the gate for Ca2+ on the extracellular side in a crystal structure seems insufficient for such a statement. One must keep in mind that a crystal structure merely provides a possible conformation in that particular state. It may not, however, represent the most populated conformation for that state. It is equally plausible that the E2·BeF3- complex takes a closed conformation for only a small fraction of the time. At this resolution it is simply not possible to determine if H+ occupies the binding site in the crystal structure. Furthermore, although it may be possible to show the gate is closed for Ca2+, it would be very difficult to show the gate is closed for H+. Thus, more experimental evidence is required to support that the structure represents a H+ *occluded* state.

      The authors write in the Abstract "Structures with BeF3- mimicking a phosphoenzyme state reveal a closed state, which is intermediate of the outward-open E2P and the proton-occluded E2-P* conformations known for SERCA". In essence this statement is fine, although what "closed" means is still unclear to me. In Figure 1 (now Figure 2), the authors state that "LMCA1 structures adopt proton-occluded E2 states". This statement is a bit misleading, because, in E2·BeF3-, the lumenal (extracellular) gate can in fact be opened and closed, at least with SERCA. As the authors recognize (p.14), the BeF3- complex of SERCA can be crystallised in two conformations, one with the lumenal gate closed (with thapsigargin) and the other with the gate open; yet, they write "In SERCA, the calcium-free BeF3 -complex adopts an outward-open E2P state,..." (p.8). This is for lumenal (extracellular) Ca2+, not for H+. Further evidence is required to establish that the extracellular gate of LMCA1 is fixed in a closed position for H+ in E2·BeF3-. Again more experimental evidence is required to support that E2·BeF3- is a H+ occluded state.

      The underlying challenge is that it is incredibly difficult to demonstrate proton occlusion experimentally: the protons are invisible in most crystal structures, and experimental variation of the H+ concentration affects many parts of the molecule. This means that it is not possible to get the same level of evidence for occlusion as for e.g. Ca2+, and, as the reviewer states, this has also not been achieved for other pumps.

      This does not mean that it is impossible to deduce information about protonation states and H+ pathways from a crystal structure. First, a buried side chain is unlikely to be charged unless it is paired with a neutralizing charge, and we can thus reasonably deduce protonation states from structure-driven pKa prediction. Second, it is known from functional studies that LMCA1 and other Ca2+-ATPases counter-transport protons, so some of the transport site residues must be protonated. We think it is reasonable to interpret the crystal structure in terms of the most likely residues involved in proton counter-transport.

      We agree with reviewer #3 that the crystal structure only represents a single (likely highly populated) conformation. However, this criticism is equally true of any other crystal or cryoEM structure, and does not prevent such structures from being useful. It is tricky to precisely map proton access, as protons can be relayed via protonatable residues, i.e. “proton wires”. It is unlikely that any experimental method would unambiguously probe proton accessibility, and molecular dynamics would be unlikely to be conclusive due to the coupling between dynamics and protonation state. As absolute proton occlusion is difficult to demonstrate, we think it is more useful to think in terms of relative rates of proton exchange. All other things being equal, a residue that is fully exposed to the solvent will exchange protons more rapidly than a residue that relies on proton relaying or breathing motions in a protein. In this context, it is reasonable to consider this state a proton-occluded state.

      To reflect this, we have edited the manuscript as follows:

      We have edited the “Results” section so it focuses on the immediate structural interpretation, i.e. pKa prediction and comparison of ion pathways. Discussion of the mechanisms that strays from the immediate structural interpretation has been moved to the “Discussion” section as proposed. The section headers have been updated to reflect this so now they discuss “Ion pathways and binding sites” and “Transport site protonation” rather than the “Mechanism of proton counter-transport”. Overall, we have softened the language describing proton occlusion to reflect that this is our best current interpretation and not established fact. Furthermore, we have qualified the statement about what a proton occluded state is:

      “It should be noted that occlusion has a slightly different meaning for protons than e.g. Ca2+, as it is difficult to experimentally demonstrate proton occlusion. Furthermore, a crystal structure only provides a single snapshot of a protein and it is likely that protein dynamics will allow proton access to a certain extent. In the following, we describe a state as proton occluded, if the ion binding site is closed to direct solvent access.”

      The authors write that "SERCA has two proposed proton pathways: a luminal entry pathway [26] and a C-terminal cytosolic release pathway [27]" (p. 9). One has to be careful here, as the luminal entry pathway has not been experimentally confirmed in SERCA. The authors write that "The luminal proton pathway has been mapped to a narrow water channel ... [26]". But since the pathway is not confirmed in SERCA, I don't think it can be used to justify that the corresponding part of LMCA1 is mainly hydrophobic and that protons cannot enter through this pathway.

      As discussed above, experimental confirmation of a proton pathway is really tricky, but the structural comparison of the different residues in this region is unambiguous. We think it is reasonable to keep this comparison in the manuscript, but have rephrased it to the “proposed” luminal proton pathway, and removed the word “mapped”, which suggests experimental verification.

      The description on the exit pathway for H+ also needs clarification. They describe (p. 10; first line) "In SERCA it consists of a hydrated cavity...[27]. ... M7 in LMCA1 further blocks the pathway ... and LMCA1 therefore does not appear to have a C-terminal cytosolic pathway either" and rationalize that "This may explain why no distinct proton pathways are required in LMCA1". I think it should be made clearer that this is a *proposal* rather than an established *fact*.

      This section has been re-phrased and merged into the discussion.

      As H+ release takes place in the E2 to E1 transition the authors state that the E2·BeF3- structure of LMCA1 is different from that of SERCA. However, I don't think they can confidently make such statements without E1 and E2 structures of LMCA1. Furthermore, these descriptions (discussion) should not be in the "Results" section. As they conclude that LMCA1 use the Ca2+ release pathway, which is assumed to be the same as that in SERCA (even though no Ca2+ release pathway is visualised in their crystal structures), for H+ entry, why does SERCA not use the same pathway? I think experimental evidence is required for a proposal that H+ binds to E309 from the cytoplasmic side.

      Proton release likely takes place in the E1 state, not the transitions. Getting a crystal structure of this state would be great, but falls outside the scope of a revision. We compare our crystal structures of LMCA1 to the E2 crystal structures of SERCA, and they are clearly more similar to the E2-AlF state (see new Table S2). This is a straightforward alignment of a protein to its closest homologue with an available structure, so we think it is fair to keep this in the “Results”.

      As this paper focuses on LMCA1 and not SERCA, we think that both protonation of E309 and ion pathways in SERCA fall outside the scope of the manuscript except as a reference for LMCA1. However, as SERCA has additional pathways, it will presumably be a question of kinetic competition.

      The issue of proton counter-transport is dealt with above.

      Additionally all the minor comments from reviewer #3 have been dealt with in the updated version 2 of the manuscript.

    1. On 2021-02-20 19:31:38, user Ekaterina Shelest wrote:

      Further major concerns.

      FunOrder is positioned as a tool for “automated identification of essential genes in a BGC” (for people who deal with BGCs, this means all cluster genes, because usually clusters are compact and spare genes are rare). But the input is already a set of BGC genes, so, first of all, the clusters are not really identified. We can only speak about some refined annotation. Given that the emphasis is made on biosynthetic genes and not all BGC genes, it is only partly refined. This makes all the statements about the importance of better cluster annotation, provided in the introduction, obsolete. Secondly, where do the input BGC genes come from? In the case of a new genome, will this be a set of genes in some vicinity of the PKSs and NRPSs (if yes, in which?)? Or a result of preliminary BGC annotation with antiSMASH and/or CASSIS? This should be specified. For known genomes and BGCs, again, what is the source of the BGC information? MIBiG, antiSMASH, other databases, literature? Where were the examples used in this study taken from? Table 2 provides MIBiG IDs but not for all clusters; where do the others come from?

      MATERIAL AND METHODS

      FunOrder - Workflow

      1. Practically the only part of the tool that deals with evolutionary questions is treeKO. This is fine. But it is not clear to me, if the “speciation history” is shown by the authors of treeKO as less significant in detection of co-evolution, why do you consider it at all? What’s the point of a combined measure that includes something that is less trustable and informative (“speciation history”, in this case)? The examples are not convincing; if you want to use a measure, you should show it’s useful.

      2. I did not understand what the point of making a curated proteome database was. In which sense is it curated? Did you filter something out? If yes, what, and on which principles? Is it just a collection of 134 proteomes from JGI and NCBI? Could you please explain the principle on which they were selected? One can blast against all ascomycetes in JGI and get many more hits for the query genes. Why limit yourselves to just 134, many of which are of the same genera? If the reason is just to rename the sequences, assigning a species identifier, this can be done with any genome/proteome with a simple script; there is no need to keep the proteomes in a special database.

      Performance evaluation.

      Hmm… I was puzzled by the effort of manual comparison of 102 control BGCs, each with at least 3 genes. Did I understand it correctly, was it literally manual? Why did you do that? (Was it a practical assignment to a class of students?) I had a feeling that this manual assessment was then used as a gold standard to set up a threshold for the tool. But why? Why not simply select parameters of treeKO, which would allow to re-identify the true positive BGC genes? Eventually, this is what was done, setting up the treeKO parameters; I don’t understand the sense of the manual evaluation step.

      Measures of the performance.

      Here we come to an interesting part.

      The worries start with this: “we calculated three measures (two measures for the positive control BGCs and one for the negative control BGCs)”. In general, positive and negative controls are treated identically. Otherwise, they are not controls. Or did you mean something different?

      Speaking about the proposed measures themselves, they are confusing. To start with, TP, TN, FP, FN are already defined with clear definitions and there is no need to re-define them. What you measure in your experiment and put in a confusion matrix ARE already TP, FP, and so on. A phrase like “obtained values for FCGM and ERM were classified as true positives (TP) or false negatives (FN), and the values for NCV were classified as true negative (TN) or false positives (FP).” is bewildering. You cannot classify ERM or FCGM or anything based on them into TP, FN, etc., because you use the real (measured) TP, FN, FP to calculate ERM, FCGM, and NCV! It seems that you are going in circles.

      Probably you haven’t noticed that your notations “a”, “b”, “c”, correspond to FN, FP, P. The “number of genes necessary for the biosynthesis of a SM, that did not cluster with the other necessary genes in the FunOrder analysis” to me translates into “genes that we expected to be there but haven’t found”, which is a typical FN. So, your “a” from equation 1 is the FN. Moreover, your FCGM is not a new measure but just the sensitivity, or true positive rate (TPR), or recall, this is evident if you use standard notations:

      a=FN; c=P; c-a=P-FN=TP; => (c-a)/c=TP/P=TPR.

      What’s the point of inventing new notations?

      ERM is nothing else than accuracy:

      By definition ACC=(TP+TN)/(P+N)

      ERM=1-(a+b)/d; a=FN; b=FP (if there were no other genes that should not belong to the cluster); d=P+N; =>

      ERM=1-(FN+FP)/(P+N)=(P+N-FN-FP)/(P+N)=(TP+TN)/(P+N)=ACC
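      The two identities above can also be checked numerically. The following is a minimal sketch, assuming (as argued above) that a=FN, b=FP, c=P and d=P+N; the function names and the confusion-matrix counts are hypothetical and are not taken from the manuscript.

```python
def fcgm(a, c):
    """FCGM in the manuscript's notation, as read above: (c - a) / c."""
    return (c - a) / c


def erm(a, b, d):
    """ERM in the manuscript's notation, as read above: 1 - (a + b) / d."""
    return 1 - (a + b) / d


def sensitivity(tp, fn):
    """Standard true positive rate (recall): TP / P."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Standard accuracy: (TP + TN) / (P + N)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, chosen only for illustration.
tp, fn, fp, tn = 7, 3, 2, 8

# Map the standard counts onto the manuscript's letters (assumed mapping).
a, b = fn, fp
c = tp + fn            # P
d = tp + fn + fp + tn  # P + N

print(fcgm(a, c), sensitivity(tp, fn))
print(erm(a, b, d), accuracy(tp, tn, fp, fn))
```

      Under this mapping both pairs of numbers coincide for any choice of counts, which is the point of the algebra above.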

      I must also point out that the way the equations are written is… a bit strange. It’s some brackets obsession there. There is no need for brackets in an expression like 1-a/c, the division goes before subtraction anyway. Same for a/d+b/d; moreover, you are allowed to sum up the fractions. The scary expression for NCV looks actually like this:

      1-g/2d(d-1)

      No need for three classes of brackets, especially between the factors of the multiplication.

      Regarding the NCV, I did not fully understand what is meant by g. It is defined as a “number of … distances in all matrices” but this does not make sense. Is it the number of genes of the considered cluster on strict and combined distances at selected thresholds, in other words, genes that fulfil the condition to be considered as clustered? If yes, then this is just TP. If no, what is it, then? It’s also not clear, why 2d(d-1)? In general, could you please explain how this NCV measure was defined, derived and why?

      Results and discussion:

      “In our experience, evaluating only the numerical values is not enough for a thorough analysis of a BGC and it is necessary to consider all provided visualisations for a thorough data interpretation” – Usually visualisations are used for illustration or as supportive material. The idea of computational tools is to switch from human interpretations, which may be biased, to something more systematic, isn’t it? There are ways to extract the results of cluster analyses and operate with numbers.

      By the way, the Fig. 3 legend is mixed up.

      Performance evaluation

      As I think that all metrics are calculated incorrectly, further discussion of the results is senseless. But even if the metrics were correct, they could hardly be considered good.

      This is not surprising because, as I said, we shouldn’t expect that all genes in the clusters are co-evolving.

      More comments to come!

    1. On 2021-02-01 15:12:18, user Melissa Bu wrote:

      Hello Drs. Alkhatib et al.,

      My name is Melissa and I am an undergraduate biomedical sciences student at UCLA. A few classmates of mine and I chose your manuscript to present for our Journal Club seminar course. We wanted to share some of the feedback we collected from our ~15 peers and professor on your excellent work, and hope it may be of use to your revision process:

      In general, we were curious about the stage of TNBC tumors obtained from patients for the gene expression profiling? We wondered if the stage of tumor would affect the selection of onco-proteins for subsequent FACS analysis.

      For fig. 1, we appreciated the simple and effective coloring, as well as the use of sketches to illustrate the workflow. We thought it might be a good idea to clarify that the schematic illustrated in fig. 1 is of the experimental order, not the treatment order (since elsewhere in the manuscript the targeted therapy is described as being administered prior to radiation therapy). Additionally, what were the demographics of the patients from which the xenografts were derived? Were they from a diverse sample of patients? Furthermore, since the solution schematic is illustrated in fig. 3, we thought it would be worth considering the omission of the bottom half of figure 1. If it is kept, however, we wondered about the definition of "non-proliferative": does this mean the tumors are still present, just no longer growing? Or does it imply a new subpopulation (from the original mass prior to RT and targeted therapy)?

      We also wanted to learn more about the 14 processes in supplementary table 1, but had trouble comprehending it and thought it could be modified to be more accessible to readers outside of your research niche.

      For fig. 2, we noticed a small typo in panel a where "patients" was mis-spelled as "pateints." We also wondered what the colors meant for the plot in panel b, and why R^2 was used. We thought panel b might be suitable as a supplementary figure instead. In panel c, we were unsure whether one of the red/blue outlines should read "down due to" as opposed to both reading "up". In terms of coloring, we thought it may make the figure more clear if the outline colors were orange and green, for example, corresponding to the red and blue solid boxes. In addition, we thought it could be beneficial to include somewhere in the text that CD326 was not participating in the processes, since we could not find this marker involved in any of the processes. For the sake of reading ease, would you consider assigning more distinct colors to EGFR and CD326 in the selected onco-markers key?

      Fig. 3 was really helpful for understanding the workflow and purpose of each step in the project!

      Fig. 4: in panel g, we were confused about the label placement of "CSSS" and suggest that the "CSSS" label currently placed vertically instead be placed horizontally. In its place we suggest "process number." We were also curious about the lettering of the CSSS barcodes: are they in chronological order (curious because b and f correspond to the "Early" and "Late" sub-populations)? In terms of presentation, we wondered why the squares were now black and grey as opposed to the red and blue presented in earlier figures.

      In panel a, we suggest increasing the font size of the protein names. We also noticed that the tops of the error bars for the 15 Gy group are detached for both Her2 and cMet plots. In part b, we suggest cleaning up the underlying blue grid structure, as well as putting up the Flow gating for more natural interpretation.

      Fig. 5: We thought it might make panel a more clear if "E," "L," and "P" were written out and perhaps also represented with different colors as opposed to the line patterns currently used to differentiate between the groups. In panel b, to avoid confusion of indicating other panels, you could consider labelling the CSSSs as "Early" and "Late" instead of "b" and "f." In the figure legend, we think readers would appreciate it if RT and C could be written out in full for clarity's sake. Since the orientation of panels i and j is confusing, and there is generally lots of data packed in fig. 5, we suggest that it can be split into two separate figures.

      Fig. 6: We thought that panel a may be unnecessary, since we had little trouble comprehending the mice radiation process. Instead, we suggest replacing panel a with a timeline of the mice workflow. We thought panel d of this figure was very clear and well-done! In part e, since all groups have RT, instead of "+" across, you could consider a single line or a simple description. Importantly, for the sake of color-blind readers, it would be beneficial to use more differentiable colors for the bar colors here.

      Fig. 7: We thought that the "14d post RT group" in panel a could be changed from green to a different color, since it's currently labeled as the same color as the "RT+T+C" treatment group in later panels. In part b, we thought the colors could, again, be more distinguishable for accessibility's sake. In panel c, we were curious about the arrows pointing at the green group—could you explain in the legend why these arrows were placed? Since there is a lot to look at in panel c, we believe it would be beneficial to space out the graphs. In panel e, we were curious about the use of "E" drug, and suggest, for consistency's sake (with panel c), that this group be omitted (or added to panel c). We would again suggest "Early" and "Late" instead of "b" and "f" for the CSSSs depicted in panels b and f of this figure.

      In general, we learned a great deal about TNBC, single cell surprisal analysis and other valuable techniques from reading your manuscript. Thank you for sharing this exciting and important work. The writing was overall easy to understand and compelling, and we particularly appreciated your use of multiple models (i.e. human data, cell lines, and mouse models). Again, my peers and I are undergraduates, so we are giving feedback from a baseline level of knowledge. We really appreciate your efforts in offering better solutions for this deadly disease. After reading about the level of specificity and the strategic approach with which you are investigating TNBC, we feel more hopeful for the future of patients with TNBC.

    1. On 2020-12-02 21:30:36, user Alexis Rohou wrote:

      I was asked by a journal to review this manuscript. Below is my review

      ***

      This manuscript explores the observation that Thon rings visible in amplitude spectra of micrographs decrease in amplitude as a function of spatial frequency (distance from the origin in F space) and that this decrease is more pronounced in micrographs collected with larger objective lens defocus.

      Since the height of Thon rings from images of test specimens can be taken as an estimator of recoverable signal-to-noise ratio in experimental data recorded under identical conditions, this has led many practitioners to prefer to collect data as close to focus as possible. The dominant assumption in the field has been that the observed defocus-dependent contrast attenuation is due to imperfect spatial coherence of the electron source, but this manuscript provides compelling evidence that another phenomenon is responsible.

      The authors note that a significant amount of signal is delocalized beyond the edges of the field of view and so cannot be recovered. Further, the authors point out that single-sideband (SSB) signal in the collected image (be it from features in the field of view but near its edges, or delocalized from features not present in the field of view), while it contributes power to the image, does not contribute to Thon rings because its amplitude is not modulated by the CTF.

      I find the authors' evidence in support of this compelling:

      - experimentally, the nodes (local minima) between Thon rings do not reach the "noise floor" as would be predicted if all contrast in the image arose from phase contrast attenuated by a spatial-coherence envelope. Computationally, the authors show that this "Thon ring floor" is raised under conditions where more of the recorded image power consists of SSB signal (increased defocus or small field of view)
      - theory predicts that, at the fluences normally used in cryoEM, the spatial coherence of the illumination supplied by modern electron sources is such that one would not expect significant defocus-dependent attenuation effects
      - most compelling, the relative intensity of Thon rings in actual images is well predicted by the fraction of image features for which signal for both side bands is recorded (Fig 4)

      My only significant reservation with this manuscript is about the "messaging", and specifically this sentence of the abstract: "The principal conclusion is that much higher values of defocus can be used than is currently thought to be possible".

      While the authors have convinced me that the negative effects of defocus were misunderstood and overstated, their claim that higher defocus could be used with no ill effect should be qualified (preferably in the abstract, and in the main text) to make it clear that they are only referring to the imaging part of the experiment, and not the image processing part of experiments, where high defocus values would force users of most packages to use very large box sizes at various parts of the process, creating unusually large computational burdens, and/or other problems may occur. If the authors want to keep the claim as is, they should add experimental results that support it, e.g. high-resolution apoferritin reconstructions obtained from both low and high defocus datasets, along with characterization of the mean SSNR, ResLog plot, or similar, in each case. Probably better to keep the paper more or less as is and just qualify this claim, in my opinion.
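      To give a rough sense of scale for the box-size concern, the sketch below estimates how far signal at a given resolution is displaced by defocus, using the common approximation that delocalization is about wavelength × defocus × spatial frequency (spherical aberration neglected). The 300 kV accelerating voltage, the 120 Å particle diameter and the function names are assumptions made for illustration; the defocus and resolution are example values, not a statement about how the authors processed their data.

```python
from math import sqrt

# Physical constants (SI units).
PLANCK = 6.62607015e-34           # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
LIGHT_SPEED = 2.99792458e8        # m / s


def electron_wavelength_angstrom(voltage_v):
    """Relativistic electron wavelength in Angstrom for a given voltage."""
    energy = ELEMENTARY_CHARGE * voltage_v
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2
    momentum = sqrt(2 * ELECTRON_MASS * energy * (1 + energy / (2 * rest_energy)))
    return PLANCK / momentum * 1e10


def delocalization_angstrom(defocus_angstrom, resolution_angstrom, voltage_v=300e3):
    """Approximate displacement of signal at the given resolution.

    Uses wavelength * defocus * spatial frequency and ignores the
    spherical-aberration term, so it is only a first-order estimate.
    """
    wavelength = electron_wavelength_angstrom(voltage_v)
    return wavelength * defocus_angstrom / resolution_angstrom


# Example values: 8 um defocus, 1.44 Angstrom resolution, assumed 300 kV.
shift = delocalization_angstrom(8e4, 1.44)

# Assumed particle diameter; the box must hold the particle plus the
# delocalized signal on either side.
particle_diameter = 120.0
box_size = particle_diameter + 2 * shift

print(f"delocalization: {shift:.0f} A, minimum box: {box_size:.0f} A")
```

      Under these assumptions the box would need to be well over an order of magnitude wider than the particle itself, which is the kind of computational burden referred to above.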

      Beyond that, I have more minor suggestions / questions.

      (1) Abstract: I'd encourage the authors to consider removing the sentence about correcting mag distortion ("We also show (...) many orientation") - if I understood correctly, this becomes significant only at very large defocus, and only if averaging spectra to a 1D curve before fitting. For these reasons, I think this is a rather minor point of the paper. In the context of the abstract, I think this aside distracts from the main message.

      (2) Abstract: "and Ewald sphere correction". Perhaps I missed it, but I don't recall reading in the main text an explanation of why defocus should allow for better Ewald sphere correction, or a demonstration that this is the case. I suggest removing this from the abstract, or adding text explaining this, or a citation to a reference that does (on that note, after a quick re-read of Russo & Henderson 2018, I also don't see an obvious demonstration there that higher defocus yields better Ewald sphere curvature correction, but I'd happily stand corrected).

      (3) Page 3: "This is because compensating information, which unfortunately is of no use, may enter the image from features that are outside the field of view." On first read, this sentence confused me - I think because the phrase "compensating information" threw me off. How about something like "This is because unrelated single-side-band signal delocalized from features outside the field of view may enter the image."?

      (4) Page 4: "Since delocalized (...) high defocus values to record images (Russo and Henderson 2018b)". I think that for readers who, like me, are not well versed in the optics and maths of SSB imaging, this statement is difficult to understand. Could it be explained a little further / clarified? To spell out my confusion: why does the feasibility of recovering SSB information even in the absence of the Friedel mate mean that it should be advantageous to operate at higher defocus?

      (5) Same paragraph ("We note that information in (...) become greatly reduced"). This whole paragraph argues (I think) that collecting highly defocused images is OK. Yet wasn't one of the points of Downing & Glaeser (2008), cited in this paragraph, that the larger the defocus, the more CTF correction schemes or Wiener filters fail at retrieving all of the information (due to the "twin image" problem)? My apologies if I'm misunderstanding; if that's the case, perhaps other readers will also need a bit more hand-holding through this paragraph.

      I loved all the detail poured into M&M, so I suggest specifying further:

      (6) Page 5: "annular zones of 1 reciprocal-space pixel" - how was interpolation done here? Nearest neighbor?

      (7) Page 5: "floated" - I assume this means adding a constant so that the average value is zero?

      (8) Page 6: "Smooth curve" - fix capitalization. Also, what kind of smooth curve?

      Results:

      (9) Page 6: "The integrated power at 2.35 Å" - measured how? In real space in the white box?

      (10) Page 6: "(67% of intensity)" - 67% of which intensity?

      (11) Page 6: "~0.23 nm" - to guide the eye, please add a second x axis in figure 2, or replace the existing one, so that we can look for the 0.23 nm feature.

      (12) Page 7: "The mean value of this noise spectrum can be regarded as the "zero baseline" for the power spectra of images recorded with a specimen". This noise floor will rise as a function of the number of electrons incident upon the detector. The choice of illumination condition when collecting "no-object"/"beam-only" images for these experiments is therefore important. I assume that the authors used the same illumination conditions as had been used in the actual experiment with a specimen. Is this correct? Either way, could the authors briefly mention somewhere what illumination conditions were used for this?

      I expect that using the same illumination condition would lead to an overestimate of the height of the noise floor. Indeed, during experiments with specimens, some fraction of electrons will be lost to apertures, leading to an overall decrease in the average number of electrons reaching the detector. One may thus expect the actual noise floor in "with-specimen" experiments to be even lower, perhaps making the authors' point even more striking.

      Discussion:

      (13) Page 7: "did not prevent images at 8 um defocus from being recorded at a resolution of 1.44 Å". Is this shown somewhere? Fig 1C shows 1.3 um defocus, not 8 um.

      (14) Figure 2a: could the X axis be re-labelled, or also labeled with spatial frequency in nm-1 or Å-1 - this would help locate the 3.5 Å bump mentioned in the discussion

      (15) Suppl Figs 4 and 5: here also, having a second X axis, or a second set of labels with spatial frequencies would be helpful.

      (16) Figures S4 and S5: The lower bound of the Thon rings is "raised" with increased defocus, as predicted by the increase in SSB signal, but why is this lower bound so much higher at around 0.5 Nyquist, while remaining low at the origin and edges of F space? Is this predicted by the model? Does it correspond to the FT of the shape of the circular mask used in generating the simulated images?

      (17) Page 9: "to interference between the contributions (...) which is 2a". This sentence reads as though the two SSB beams are interfering constructively or destructively with each other. Unless I'm mistaken the interference is between the scattered beams and the unscattered beam, is it not? That's certainly what the next sentence seems to say.

      (18) Page 9: "The persistence of lattice images within (...) displaced from the particle". Likely because of my lack of expertise, and specifically because I do not know what the "coherence diameter" is, this sentence was lost on me.

      (19) Page 10: "We note that this behavior is different (...) envelope function". For completeness, how about adding a supplementary plot overlaying the observed behavior (as in Fig 4) and the prediction from the spatial coherence (at whatever beam characteristics best fit the data, to point out perhaps that an unrealistic illumination semi-angle would be needed to fit the data)? This would help readers like myself who are not quite certain what one would expect such plots to look like if spatial coherence were really at play here.

      (20) On the subject of Figure 4, I am curious about why the last few points of the 2.3 Å series seem so far off the prediction. The authors made a point of saying that the power spectra were so oversampled that even at that frequency, they had 3 pixels sampling each ring. So why the discrepancy, if not undersampling/aliasing? This made me curious: what would an equivalent plot from the simulation data look like? Would the Thon ring amplitudes from this synthetic experiment be a closer match to the predictions (dashed lines in Figure 4)? If not, perhaps this mismatch is due to poor sampling of these very fine rings at high defocus after all?

      Summary and conclusions

      (21) Here might be a good place to formulate some caveat about the practicalities of processing data collected at very large defocus.

      Figures & supplements

      (22) Figure S5: this would seem to argue strongly against evaluating the power spectrum using patches - would the authors agree? If so, how about mentioning it in passing somewhere? The optimal way to compute power spectra for the purpose of CTF parameter fitting is still a topic being discussed in the literature of late, and this observation would seem to be relevant.

    1. On 2020-12-01 23:43:34, user Adrian Barnett wrote:

      This is a useful experiment given the shortage of experiments into funding. As Guthrie et al (reference #1) stated: "We need to overcome the reluctance of funders and scientists to acknowledge the uncertainties intrinsic to allocating research funding, and encourage them to experiment with peer review and other allocation processes". The results are broadly supportive of a simpler and cheaper peer review system.

      The agreement between reviewers was not adjusted for chance (e.g., using Gwet’s statistic). I agree with this approach as the raw agreement is what researchers are interested in (their only question is always, “Was I funded or not?”). We can account for chance by setting a threshold for an acceptable level of agreement, e.g., an agreement of 75%. This threshold would ideally be based on discussions with the research community.

      The differences in agreement were tested using chi-squared, but these are paired categorical data and so I think McNemar's test would be better. Although I'm not sure that p-values are useful given the sample size and the potential for a p-value of 0.05 to be interpreted as demonstrating equivalence. I would focus on the confidence intervals and whether they rule out an important difference in agreement.
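
      As an illustration of why the pairing matters, a minimal sketch of an exact McNemar test follows. It uses only the discordant pairs: `b` counts items that agreed under the first process but not the second, and `c` the reverse. The function name and the example counts are hypothetical, not values from the preprint.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair falls either way with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: 3 pairs discordant one way, 9 the other.
print(mcnemar_exact(3, 9))
```

      Concordant pairs do not enter the calculation at all, which is the main difference from a chi-squared test on the unpaired table.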

      The authors use Wald intervals but the sample size is small and the proportion is sometimes close to one, hence the normal assumption may start to be strained. I would consider using a bootstrap interval.
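
      A small sketch of the problem, assuming a hypothetical sample of 19 agreements out of 20 (not a figure from the preprint): the Wald interval can run past 1, while a percentile bootstrap interval stays inside [0, 1]. The function names and default settings are illustrative choices.

```python
import random
from math import sqrt


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def bootstrap_interval(successes, n, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a proportion, resampling with replacement."""
    rng = random.Random(seed)
    data = [1] * successes + [0] * (n - successes)
    props = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    lo = props[int((alpha / 2) * n_boot)]
    hi = props[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


print(wald_interval(19, 20))       # upper limit exceeds 1
print(bootstrap_interval(19, 20))  # limits stay within [0, 1]
```

      One caveat: if the observed proportion is exactly 1, every resample is identical and the percentile interval collapses to a point, so the bootstrap is not a cure-all at the boundary.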

      Although face-to-face meetings for peer reviewers may increase trust they also are a networking opportunity and could disadvantage those not invited or unable to attend (e.g., researchers caring for children). It is also a great learning opportunity for the reviewers about what makes a good application.

      Minor comments<br /> - Table 1 shows summary statistics not "the distribution" <br /> - "no negative or positive reactions to the use of random selection were received from applicants" but was feedback asked for or were there only unsolicited comments?<br /> - The success rates here are very high compared with other schemes. This may put less pressure on the system and allow it to conduct more novel experiments such as modified lotteries.

    1. On 2020-11-24 17:42:09, user Fraser Lab wrote:

      There are clinically relevant proteins that are difficult to target for drug discovery due to the lack of an obvious binding site. Cryptic binding pockets are often difficult to identify, and may not exist on some proteins of clinical interest. This manuscript examines the relationship between cryptic pockets and ethylene glycol bound in crystal structures. However, it is hard to follow and the organization/ordering of different sections can likely be improved to make a more logical flow. There are three sections:

      First, the instigating observation is that a mutant (W->A) creates a small cavity in a xylanase that the authors work on. Seeing the WT protein (4QCE) overlaid with the mutant will make the presentation of this new binding site more clear and the engineered nature of the inciting cryptic binding site more transparent.

      Second, the authors then compare how often cryptic pockets are observed interacting with ethylene glycol in enhanced MD simulations (in three systems). This type of analysis expands the initial mixed solvent experimental work https://www.nature.com/arti... and is similar to previous analyses, e.g., https://pubmed.ncbi.nlm.nih... and other references in their manuscript. The comparison of MD simulations between RBSX and NPC-2 and IL-2 is incomplete. RBSX is simulated with EDO already in the cryptic pocket, but there is no explicit solvent or co-solvent simulation of apo RBSX demonstrating that EDO can identify a cryptic pocket on RBSX, as is shown for NPC-2 and IL-2. It would be nice to see a comparison of explicit co-solvent simulations using EDO and PGO as organic probes for identifying cryptic pockets. Is this teaching us more than FTMap and related fast methods would? We’re not sure this section compares properly to the state of the art, and we doubt it improves on it.

      Third, they compare retrospective examples of crystallographically bound ethylene glycols (in two systems in results, and then a long discussion on a kinase in the discussion) with eventual optimization into those pockets through medicinal chemistry. If such an analysis were carried out even more broadly, it would be of significant interest. Due to the widespread use of EG, glycerol and other small molecules as cryoprotectants, this seems to have potential.

      Some minor points:

      “reiterating that cryptic pockets in general prefer to stay in closed-state in absence of the ligands”<br /> Isn’t this a post hoc fallacy because they are cryptic?

      “For years, efforts to develop inhibitors against K-RAS, an oncogene mutated in human cancers, were unsuccessful until a new cryptic site was found leading to successful targeting of K-RAS (4,9)”<br /> these references are PRETTY different in terms of impact. I also think this misses the nuance between cryptic and covalent

      "often have negative outcomes"<br /> ? not sure what that could be

      " Importantly, the information that, which of the probes used in fragment screening have potential to identify cryptic sites is lacking. The identity of such probe molecules having validated “cryptic-site finding” potential can significantly reduce time, efforts and expenditure in fragment screening experiments for identification of cryptic sites."<br /> not sure what these sentences mean

      Figure legends need to be more direct and recapitulate what’s in the main text.<br /> Fig 3 - not clear that you're comparing water accessibility of Ala6 between open and closed states

      I would remove the X angle labels on Fig 1 from the model view

      The displaced green sticks in 4B are confusing. I understand the rotation F66 undergoes upon EDO binding, but the change in Y100 position seems to depend more on the backbone than on the rotamer. If it depends on the backbone, I would include some way to signify that, especially if the residue retains the same rotamer angle.

      Molecular views could be set to same for given model

      6F - unclear why EDO molecules behind surface are shown.

      Fig 7D is unclear

      Fig 8B, 8C is confusing due to overlay

      We were prompted to review this by a journal and post this comment non-anonymously, James Fraser and Roberto Efraín Díaz (UCSF)

    1. On 2020-11-02 18:26:41, user David Klinke wrote:

      Based on a class exercise in reviewing pre-prints, students generated the following critique of this pre-print. We hope that you find these comments helpful.

      Makaryan and Finley have submitted a pre-print addressing a gap in the field’s understanding of possible methods to combat NK cell exhaustion. They developed a computational model that describes the dynamics of GZMB and PRF1, which showed that suppressing phosphatase activity maximized GZMB and PRF1 secretion but depleted the intracellular pools of GZMB and PRF1. As a result, they investigated further by modifying their model with a synNotch system. They found that the optimal synNotch system is dependent on the frequency of NK cell stimulation. The ultimate goal of the work was to provide insights that could be used in clinical applications for the engineering of robust NK cells resistant to exhaustion. Although this work is of interest to the field, there are some concerns that could be addressed in the next version. These are outlined below.

      -What results did you find the most interesting and why?

      The methods presented in this paper were of particular interest to me. As a researcher new to the field of computational modeling and Bayesian frameworks such as the Metropolis-Hastings algorithm, this reviewer appreciates the opportunity to read about what others are doing in the field using such methods.

      This reviewer found the results relating to the optimal synNotch system and its dependence on the number of rounds of stimulation particularly interesting. Specifically, the fact that the inhibition of SHP is not a beneficial long term strategy because of the accumulation of phospho-proteins. From the model diagram, one would think that this would be effective long term by eliminating the inhibition coming from the pSHP node, but the interdependencies make for a more interesting optimal case.

      This manuscript has the potential to open up opportunities for new work in the engineering of NK cells for use in immunotherapies, which is of particular interest in cancer research, however, this reviewer believes that there are some concerns that need to be addressed before the results can provide any actionable insight.

      Major Concerns:<br /> - Some assumptions assign seemingly arbitrary values to certain parameters, which could be regarded as “hyper-parameters” of the model. This reviewer would use a hyper-parameter optimization method to find the best values and thereby improve model accuracy. Hyper-parameter tuning is simply an optimization that searches for the set of hyper-parameters giving the best-performing model. In practice, one can specify a grid of acceptable values for each hyper-parameter, train one model for each combination of values, and finally select the model that performs best.

      • Regarding Figure 2, is there any assessment of the accuracy of the model? What about adding a test set to evaluate its performance? A validation set is different from a test set: the validation set is used while building the model, for parameter selection and to guard against overfitting, so it can be carved out of the training data. A non-linear model that is fit on the training set alone is likely to reach very high accuracy there by overfitting and then to perform poorly on a test set. One therefore chooses a validation set that does not overlap the data used for fitting and uses it to tune the parameters of the model. The test set, by contrast, is used only to evaluate the performance of the trained model.

      • Significant concern lies in some of the assumptions made for this model. In particular, the setting of the upper bound of the initial value of synNotch receptor based on the CHO cells modified to produce IgG is questionable. While the manuscript already points out the dissimilarities between CHO and NK cells and between the synNotch receptor and human IgG, the specific value of 10 uM, which I assume was chosen because it was the approximate average of the range from the CHO experiment, also presents problems. The results presented in Figures 4B and 4C regarding the difference between the optimal amount of R0 for the two pathways were specifically dependent on this value of 10 uM that was arbitrarily chosen. What would have happened if you had arbitrarily chosen the minimal value of 0.3 uM in that range so that it matched the initial amount of NKG2D? Or the maximum value of 20 uM which would be closer to the initial value of CD16? The importance of this upper bound in the trends presented in the results section should warrant a more sound basis for the choice of value. Other important assumptions, such as the value of the weight constant used to determine the emphasis on minimizing exogenous material versus maximizing cytolytic molecules, should have some backing in the literature and be further explored as opposed to being chosen for simplicity.

      • This reviewer also believes this model requires further validation beyond that currently presented. At this point, all validation was done internally using a subset of the same data set used to train the model (from Srpan et al.). A second data set, either from Srpan or preferably repeated in the Finley lab, should be used as validation to ensure the model is not highly specific to the single data set used, but that it can be generalized to the dynamics as a whole.
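
      The grid search and the separate validation and test sets raised in the concerns above can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: the data are synthetic, the "model" is a one-parameter ridge-penalised slope rather than the authors' ODE model, and the names `three_way_split`, `fit_slope` and `grid_search` are hypothetical.

```python
import random
from itertools import product


def three_way_split(n_items, frac_val=0.2, frac_test=0.2, seed=0):
    """Shuffle indices once and cut them into disjoint train/validation/test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(n_items * frac_test)
    n_val = int(n_items * frac_val)
    return indices[n_test + n_val:], indices[n_test:n_test + n_val], indices[:n_test]


def fit_slope(points, ridge):
    """Toy model: least-squares slope of y = a * x with a ridge penalty."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + ridge)


def mse(points, slope):
    """Mean squared error of the fitted line on a set of points."""
    return sum((y - slope * x) ** 2 for x, y in points) / len(points)


def grid_search(train, val, grid):
    """Fit one model per grid point; keep the one with the lowest validation error."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        slope = fit_slope(train, **params)
        err = mse(val, slope)
        if best is None or err < best[2]:
            best = (params, slope, err)
    return best


# Synthetic data (not from the preprint): y = 2x plus Gaussian noise.
rng = random.Random(1)
data = [(i / 10, 2.0 * (i / 10) + rng.gauss(0.0, 0.5)) for i in range(1, 51)]
train_idx, val_idx, test_idx = three_way_split(len(data))
train = [data[i] for i in train_idx]
val = [data[i] for i in val_idx]
test = [data[i] for i in test_idx]

ridge_grid = {"ridge": [0.0, 0.1, 1.0, 10.0]}
best_params, best_slope, val_err = grid_search(train, val, ridge_grid)
test_err = mse(test, best_slope)  # computed once, after all tuning is finished
print(best_params, val_err, test_err)
```

      The point of the design is that the test set plays no part in choosing the hyper-parameter, so its error is an honest estimate of how the selected model generalises.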

      Minor concerns:<br /> - While the manuscript overall flows well and tells a cohesive story, there were small sections when reading that information would be unclear, only to be clarified later in the paragraph or in the next paragraph. One such instance was the discussion of the Akaike information criterion for the three different models that were tested. In the beginning of the paragraph as the addition of crosstalk and synthesis/decay reactions was discussed, it was unclear that you were forming multiple models. When arriving at the sentence “Excitingly, all candidate models demonstrated a good agreement with experimental observations”, it wasn’t understood that there were multiple combinations of parameters being investigated in different models, which caused confusion. The explanation of the AIC and Table 1 at the end of the paragraph helped to provide clarity, but if a reader chooses to go back in the manuscript rather than reading forward to find their answer, it may cause further confusion. It may be helpful to clarify some of these basic pieces of information throughout the manuscript to ensure understanding.

      • In addition, supplementary file S3 is not available on the bioRxiv site. As this contains all of the supplementary figures, it is important that this be available with the manuscript for optimal clarity.

    1. On 2020-08-24 15:01:09, user Gary Linz wrote:

      OK, I'll start! I recommend that resolution, accuracy and precision be separated into their components. All things being equal, who would want resolution of the different populations? Ah, but all things are not equal. Putting these all under one column is creating the wrong impression. For example, resolving the four equal numbered standards by NFCM is quite impressive for both standards, but if you take a look at the PS mixture, up to a 50% error in size is reported, to go with a "bonus" peak. I am not a big fan of 50% errors. If this system oversizes the bright particles, is it then undersizing the weak scattering EV samples? So here we have an issue with accuracy. When I look at these TEM data I see a lot of 40-60 nm particles, and a significant number of 100 plus nm particles. I cannot for the life of me figure out what the nCS1 instrument is measuring to get the highest counts, with almost all between 65-100nm. I have old eyes, so maybe that's it. I will note that the NTA instrument is the only one that measured the larger particles in the EV samples, so another accuracy issue. Regarding precision, it was mentioned that with some measurements it was a challenge to get the same or similar answers three times. This is a precision problem (I am looking at nCS1 data). I would have to look at the complete SOP to determine whether the NTA data could have been improved, but if Min brightness of 30 were used for the EV and/or Si samples, that would explain the missing smaller particles.

      The ZetaView instrument will resolve 60nm and 130nm EV in a single sample if run properly, that is the resolution we offer. We would rather measure EV samples correctly than a hard particle mixture that, as the authors point out, may not have much bearing on EV measurements. Nanoparticles in this size range do scatter to the 6th power, which is a huge issue for dynamic light scattering measurements. It is also a fairly annoying problem for NTA instruments using a 20x objective. I consider it a minor problem for the ZV with a 10x objective. Remember, we are tracking the diffusion of particles, not the light intensity.

      The table: try as we might, we could not get useful data on the nCS1 under 5e8 particles per ml, let alone 1e7. At higher concentrations we note clogging. The ZV detection range is 1e5 to 1e9, the useful measurement range in my experience is 5e6 to 3e8, sample dependent. These are two different specs and should be reported as such.

      Size detection limit: I have heard that one of the advantages of the nCS1 is that there is a hard stop at 65nm or 50nm, cartridge dependent. At least in the hands of a novice user, I don't think so. It seems deciding what to include as data and what to omit is quite arbitrary. Perhaps experience is needed (or an orthogonal technique). Now, I have measured reliably down to a 50nm mode on EV samples by scatter, 70nm is very routine. With that in mind, the cut-off for NTA is NOT 70nm. Our size range for EV is 30nm to 1000 nm. We cover this range in a couple of measurements by adjusting camera settings. The nCS1 does their range by using multiple 8-10 dollar cartridges.

      Sample size: one might get the impression that the ZV requires much more sample than the other platforms, the typical amount of material needed is 2-20 microliters, diluted to 2ml.

      Time to run a sample: UNC emailed me a couple of weeks ago to say they are running 24-32 samples per hour, versus 2 per hour with their old NS500. Your results may vary.

      Experience: it matters! Three of these platforms were relatively new to the users, only the nCS1, to my knowledge, was used for an extended period, three years or so?

      References: any reference about the limitations of NTA using NanoSight instrumentation does not translate to NTA in general. It is largely 8 year old technology. Furthermore, our PMX110 systems with CD camera are not nearly as capable as our CMOS based systems.

      nanoView and NFCM: I am rooting for these companies, as I think they both have potential to add a tremendous amount to this community. I will continue to think the nCS1 is the most dangerous instrument being offered until I'm proven wrong.

      Conflicts: I built Particle Metrix Inc. in North America, my views are through that lens. I would like to also mention that Michael Pauliatis has a close relationship with Spectradyne, having participated in a promotional webinar and having been given access to experimental cartridges that are not available to the general research community at this time. This is not to say you cannot trust what I've written here today, or what Dr Pauliatis has published or presented, just that our thoughts are colored by our associations. Ever the diplomat, Gary

    1. On 2020-08-23 20:22:08, user Alexis Rohou wrote:

      I was asked to review this manuscript for a journal. Below are my comments. I hope they are helpful.

      Outcomes of the 2019 EMDataResource model challenge: validation of cryo-EM models at near-atomic resolution

      This is a report on the most recent "model challenge", organized in 2019 by EMDR, during which 4 maps obtained from cryoEM datasets were used as targets for atomic model building and refinement. A number of teams submitted models, which were then run through an extensive suite of model validation tools. The study's design, which included three experimental maps of the same target at varying resolutions, made it possible to draw rich conclusions from the comparisons of validation metrics to each other. I found the analysis to be thorough and informative.

      This manuscript is a useful historical record of the state of the art in model validation today, a marker of how far the field has come in the last few years, and it makes a number of observations and recommendations that should be of interest to all cryoEM practitioners faced with checking whether the model they are building into their map is as correct as could be. Of course, it will also be of profound interest to those involved in the development of methods for atomic model building, refinement and validation. For these reasons, I should think publication with only minor modifications would be warranted.

      One danger with reports emanating from large-scale collaborations between a number of groups is that the text might end up purely descriptive, with any strong conclusions watered down or avoided altogether. This minimizes the potential to ruffle feathers, but also reduces utility to practitioners in the field. This manuscript walks that line pretty skillfully and manages to deliver a few key lessons and recommendations, but I think some sections still skirt around the issues and "just" state observations without delivering as strong a message as they could (or should, in my opinion). Specifically, I thought the sections entitled "Evaluating metrics" at times read like they should have been titled "Describing metric behaviors", because they did not deliver the result of the evaluation, e.g. they avoided clearly stating where some methods have shortcomings and may not be suitable for archive-wide, routine use as robust validation metrics, while other methods checked more of these boxes. In other words, I'd encourage the authors to be more explicitly (self-)critical of the metrics they characterized - what's missing, what can be improved upon?

      For example, the Section "Evaluating Metrics: Fit-to-Map" concludes with (p10, l8-10): "Collectively these results reveal that multiple factors such as experimental map resolution, presence of background noise, and density threshold selection can strongly impact Fit-to-Map score values, depending on the chosen metric." I know this will be picked up later in Recommendation 3, but I believe this is the point where you need to say that these are not desirable features in a validation measure to be used archive-wide or for all new depositions (right?).

      In this same section:<br /> - p9,l12-13: "The observed trend is expected: by definition each metric assesses a model’s fit to the experimental map in a manner that is sensitive to map resolution." Two things:<br /> 1. I don't recall that, by definition, EMRinger cares about resolution - what I mean is that it's a property of the algorithm that it scores better for high-res maps, but it's not embedded in its definition that it should do so, is it? This is in contrast to map-model FSC which underpins the most-commonly-used definition of cryoEM resolution (Rosenthal & Henderson 2003), and Q score which is defined following a "point resolution"-like metric.<br /> 2. A reasonable outside observer (or at least I) might have had the expectation that the correlation metrics of cluster 1 should also score better as resolution improves... after all, don't we want scores to reward us for higher resolutions and better, more correct interpretations? So why wasn't this expectation also there for cluster 1, and if it was, how about pointing out that the expectation was violated by that cluster?<br /> - Cluster 1: I believe the important thing here is that these are real-space, not frequency-normalized, correlation scores. I suggest changing "The cluster consists of six correlation measures" to "The cluster consists of six real-space correlation measures"<br /> - p9, l7-9: "The observed trend arises at least in part because as map resolution increases, the level of detail that a model map must faithfully replicate in order to achieve a high correlation score must also increase." Presumably, one important factor is the resolution at which the model-maps are generated. If the model-maps were generated at, say, 1.2 Å resolution, one might expect the opposite trend in real-space correlation scores: the score should increase as the resolution improves and approaches 1.2 Å. Is there something that can be stated briefly about the resolution at which these methods generate their model-maps? Also, as I said above, doesn't this seem like the "wrong" behavior? Shouldn't validation metrics give better scores for higher resolutions? If not, perhaps explain why, but if so, I think it is worth stating that this is not a desirable property.

      Separately:<br /> - p10, l19-20: I have a question, to which I am not sure an answer exists. Is the fact that 33 of the submitted models have zero outliers expected (statistically speaking) given the resolution range? In other words, what's the p-value of this occurring? On a very naive level, I bet this would be highly unlikely unless Rama restraints were used. Can the authors make any statements about this, e.g. "more than half of submitted models had zero Rama outliers, which would be extremely unlikely in the absence of restraints used during refinement".<br /> -p 10, l21-22: Would the authors be comfortable adding a statement along the lines of, "and the reduced utility of Rama outliers as a validation metric" at the end of the paragraph? I just think we could do with reviewers and the field in general acknowledging that zero Rama outliers is actually weird, not expected, and not a sign of truly improved models.<br /> - p12, l9-11: "A wide variety (...) in different places". Please re-check this sentence. Is it missing "were used to produce a model"?

      • p5, l4 (intro): "Researchers can now routinely produce structures at near-atomic resolution" This is fluffy language, which doesn't actually mean much because you haven't yet defined explicitly what you mean by "near-atomic resolution". If you mean better than 3 Å (which I think is how you define "near-atomic" later), I would encourage you to consider whether you really believe that these resolutions are truly routine. I would disagree and I think your figure 1 also disagrees.

      -p 5, l16: perhaps "derived" -> "convened" ?

      • p7, l26-30: If I understand correctly, this is what is sometimes referred to as "peptide flip". Given that this is one of the most common problems in models from cryoEM maps, I think this warrants more detailed explanation. For example, I admit to not having a good, intuitive grasp of the problem, as evidenced by the fact that the second point (about how refining locally in the Rama plot pulls the geometry to the wrong local minimum) is almost lost on me. By this I mean that I understand what the authors are stating, but that I would be quite incapable of explaining why it is the case that Rama refinement leads to a worse solution. Also, it is not intuitively clear to me what exactly is meant by side chains being "pushed further in the wrong direction".

      My suggestion: could the authors add a figure depicting the geometry of a problematic bond, perhaps next to the Rama-refined (wrong) version as well as the flipped and corrected version of the geometry? Whatever might be the most simplified depiction of this, I would suggest without an experimental mesh, and erring toward the abstract, rather than realistic.

      • p32, l27-28: "but to the same cluster in b" - that isn't saying anything, is it? After all, all measures were in the same, and only, cluster...

      • Figure 4 - I suggest a few tweaks to improve intelligibility on first read:<br /> -- label panels a and b to differentiate them, e.g. "a - per-model correlation", "b - per-target correlation"<br /> -- label the clusters "c1", "c2", "c3" rather than 1,2,3<br /> -- panel c: It's not obvious at first glance that 1,2,3 in c refer to 1,2,3 in a. Switching to c1,c2,c3 notation may help, but also how about framing them in red like they were in panel a, rather than black?

      Alexis Rohou<br /> 23-Aug-2020

    1. On 2020-07-30 12:10:14, user Shankar Srinivas wrote:

      Thank you for starting the discussion here Alfonso, and for the detailed, helpful comments. Our responses (on behalf of all the authors) below:

      • You rightly point to several recent single cell transcriptomic characterisations of non-human primate embryogenesis. In addition to the ones you cited, there is also the study from Niu et al. (PMID: 31672917). Comparing the human gastrula data with these would certainly be interesting, although there are a number of caveats (some of which you also point out). The Nakamura data set is valuable but unfortunately there are relatively few cells from the stages comparable to our CS7 gastrula, making a meaningful comparison difficult (36 cells at E16 and 53 cells at E17 and of these, approximately half annotated as epiblast). The in vitro cultured embryos are exciting for the opportunities they open up, however, there are several factors that can confound a meaningful comparison. For example, for the cultured human embryo data, the stage is not comparable (they had to stop at 14 dpf) and again the number of cells is relatively small (70). Similarly, in the Ma et al. dataset, at 17 dpf, there are only 43 cells from embryonic tissue. More importantly, given that these samples are cultured, it would be difficult to determine whether any differences between human and the monkey are due to the species differences or the culture. <br /> For these reasons, although we were tempted to compare the human gastrula data with these data-sets, we decided to prioritise comparisons with the mouse because it would provide clearer insights.

      • Regarding neural differentiation and ‘marker’ gene expression/co-expression: as you say, SOX2 and OTX2 are co-expressed in the rostral neuroectoderm, but this doesn’t imply that cells co-expressing these two markers are necessarily neuroectoderm. Epiblast cells also co-express these two markers - eg. see mouse gastrula atlas - https://marionilab.cruk.cam... . Just looking at markers can be a blunt tool that does not lend itself to categorical classification, particularly of related cell types/states. Therefore, wherever possible, we used orthogonal information (location of cells) to help annotate the clusters. As you note, we found expression of SOX2 and OTX2 in the rostral domain, but they weren’t only in the rostral domain – they are also co-expressed caudally (the Epiblast cluster is 45% rostral and 55% caudal). In Sup fig 6, we look quantitatively at several markers.

      • Regarding amnion: We mention in the text that the cluster we annotate as Ectoderm likely includes both embryonic and extra-embryonic (=amnion) ectoderm. Regarding your point about POSTN as a marker of the amnion – as you say, a cursory look may indeed lead one to annotate that cluster as amnion, but if one looks deeper, at the data in the paper you cite (Dobreva et al. PMID: 29884675), one can see clear expression of POSTN in the yolk sac mesoderm (Figure 3a) as well as amniotic mesoderm. So though POSTN is undeniably a ‘marker’ of amniotic mesoderm, it is equally a ‘marker’ of the yolk sac mesoderm. Moreover, 69% of the cells from that cluster were collected from the yolk sac, arguing against it representing amnion. This again demonstrates the danger of allowing ‘marker’ genes to take on a life of their own.<br /> An interesting point to consider is the remaining 31% of cells in this cluster that are spatially allocated to the embryonic disk. The simplest explanation for this is the imprecision of the micro-dissection, which might have left behind a little yolk sac around the fringes of the embryonic disc. An alternative explanation however is that this 31% represent amniotic mesoderm (which, along with YSM would be expected to be POSTN +ve) and would imply that at CS7, amniotic mesoderm is transcriptionally very similar to yolk sac mesoderm.

      • Regarding hemogenic progenitors - none of the text books on human embryology that we use speak of the blood forming at E13/E14 and the review by Alexander Medvinsky that you cite also indicates that the earliest this occurs is between CS7 and CS8. There is disagreement in the human embryology literature regarding the correspondence between Carnegie Stages and ‘Embryonic Days’ or ‘days post fertilisation’ that can cause confusion. To add to this confusion, as we know from the mouse (e.g., see the Lawson and Wilson 2016 staging) there is considerable embryo to embryo variability in the rate of development, so it can be tricky to estimate the precise age post-fertilisation of an embryo on the basis of its Carnegie stage categorisation. This is why as far as possible, we used CS throughout the preprint and gave a reasonably broad range of days this might correspond to. <br /> Additionally, in our analysis there is much more detail than the mere indication of the presence of primitive blood islands: we have identified specific cell populations that would be thought to arise much later and have never been described in human at this early stage before, e.g., EMPs.

      • Regarding PGC: Existing studies of PGCs are either from NHP or from cultured embryos, while ours is the first unequivocal demonstration of the presence of PGC in an in utero developed human embryo as early as CS7.

      • Regarding the node: trying to identify it is certainly on our to-do list.

      As you mention, we think there are still lots of insights that will emerge from this dataset. While we focused on some discoveries we found particularly interesting, we are aware of the extreme richness and complexity of these data and look forward to insights emerging from the analyses of others with expertise and interests different to ours.

      Shankar and Antonio

    1. On 2020-07-22 18:35:18, user Guest wrote:

      "We first performed 20 simulations (680 µs total simulation time) of two GTP-bound K-Ras proteins (PDB 4DSN) in aqueous solvent (Figure S2A, left). In one simulation, the two K-Ras proteins formed stable interactions mediated in part by a bound GTP (Movie S2). This model is compelling because it provides a direct explanation for the GTP-dependence of K-Ras dimerization. Hereafter we will refer to this model as the GTP-mediated asymmetric (GMA) dimer model. "

      "Because K-Ras dimerization occurs at the membrane, we then performed 23 simulations (363 µs total simulation time) of two GTP-bound K-Ras proteins anchored to the membrane by their farnesylated Cys185 (fCys185) residues31 (Figure S2A, right). In one of these membrane simulations (Figure 2A and Movie S3), the K-Ras proteins also formed the GMA dimer; the structure is virtually identical to that obtained from the solvent simulations (Figure 2B, upper panel). "

      I'm curious what happens in the 19+22=41 simulations (~990 µs out of 1040 µs of simulation) not discussed in the manuscript, and if any quantitative analyses/measures were used to decide on the dimer model that you proposed. Was this structure the only structure that was found in both solvent and membrane simulations? Were any of the other dimers that formed reproduced in multiple simulations? Is there a quantitative metric that could be applied that points to the dimer model you accepted? Did you use mutational data to select the final model? Did you run 23 simulations of membrane association because the first 22 didn't reproduce the solvent model?
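
      The figures in the question above can be checked with a few lines of arithmetic. This is only a sketch: the per-run lengths are an assumption (equal-length runs within each set), since the quoted excerpts give only the totals.

```python
# Totals quoted from the manuscript excerpts above.
solvent_runs, solvent_total_us = 20, 680.0
membrane_runs, membrane_total_us = 23, 363.0

total_us = solvent_total_us + membrane_total_us  # combined simulation time

# ASSUMPTION: all runs within a set have the same length.
# The quoted text does not state individual run lengths.
per_solvent_us = solvent_total_us / solvent_runs
per_membrane_us = membrane_total_us / membrane_runs

# One solvent run and one membrane run formed the GMA dimer.
discussed_us = per_solvent_us + per_membrane_us
undiscussed_runs = (solvent_runs - 1) + (membrane_runs - 1)
undiscussed_us = total_us - discussed_us

print(f"{undiscussed_runs} runs, about {undiscussed_us:.0f} of {total_us:.0f} µs")
# prints "41 runs, about 993 of 1043 µs"
```

      Under that equal-length assumption, the result is consistent with the "~990 µs out of 1040 µs" estimate.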

      I'd also be curious to hear a comment on the computational efficiency/inefficiency of this approach. It seems you've run 1.04 milliseconds of simulations and thrown out 0.990 ms to build a dimer model. What happens if you try to use the existing data you used to validate your model (mutation data, NMR line broadening) as a restraint in a docking method such as HADDOCK (https://haddock.science.uu.... Given the key role of salt bridges, it seems you may have been able to simply search for complementary electrostatic surfaces to build the dimer model, and then run short MD refinements.

      Essentially, what I am asking is: do you think this is a good use of long time-scale MD? The amount of simulation required to model a dimer interface is simply astonishing.

    1. On 2020-07-15 20:50:55, user Jeffrey Ross-Ibarra wrote:

      While the connection between repeat content and life history in plants is known, this paper does a nice job of suggesting a connection between telomere length and flowering time in three plant species. I think the main thing that could help, although a big ask, is to connect telomere variation to life history mechanistically. TERT knockouts in thaliana exist, for example (and if my quick read is correct, live longer and fail to flower). But work on a mechanism would go a long way to reassuring that the results aren't simply correlative.

      I would like to see the selection analysis done without ascertaining the two haplotypes. Perhaps iHS or something would be good here? I worry ascertainment of the two haplotypes may give spurious signals of selection.

      I would like to see genome size used as a covariate in analyses throughout the paper. We know genome size correlates with flowering time, and if I understand the approach to counting repeats correctly, I could imagine a scenario where two plants with similar telomere length nonetheless get different estimates because genome size changes the relative proportion of kmers.
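
      To make the concern concrete, here is a minimal sketch of how a proportion-based telomere estimate would shift with genome size. The counting model (telomere fraction = telomeric base pairs / genome base pairs) is a simplification and may not match the preprint's method, and all numbers are hypothetical.

```python
def telomere_kmer_fraction(telomere_bp: float, genome_bp: float) -> float:
    """Expected fraction of genomic k-mers that are telomeric.

    Simplification: roughly one k-mer per base pair, so the fraction
    is just telomeric bp divided by total genome bp.
    """
    if genome_bp <= 0 or not 0 <= telomere_bp <= genome_bp:
        raise ValueError("need 0 <= telomere_bp <= genome_bp and genome_bp > 0")
    return telomere_bp / genome_bp


# HYPOTHETICAL values: both plants carry identical total telomere length,
# but plant B has a larger genome (e.g. more non-telomeric repeats).
telomere_bp = 30_000
genome_a_bp = 135e6
genome_b_bp = 150e6

frac_a = telomere_kmer_fraction(telomere_bp, genome_a_bp)
frac_b = telomere_kmer_fraction(telomere_bp, genome_b_bp)
print(f"plant A fraction: {frac_a:.3e}, plant B fraction: {frac_b:.3e}")

# Rescaling by genome size recovers the same absolute telomere length,
# which is the motivation for using genome size as a covariate.
print(frac_a * genome_a_bp, frac_b * genome_b_bp)
```

      In this toy case the relative estimate for plant B is about 10% lower even though the telomeres are identical.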

      I think given how strong population structure is in thaliana, using more than the first few PCs may be warranted. I'd also like to see some comparison/discussion of these results to the telomere-length mapping in Abdulkina et al. (https://www.nature.com/arti..., which are not impacted by flowering time and don't find TERT as a candidate gene (maybe both haplotypes aren't present in their parents?). Of course, TERT makes sense as a candidate and their results overlap with a RIL pop, so I don't doubt this finding. Nonetheless, I think more stringent control of pop structure and comparison to the MAGIC pop are probably warranted.

      Maybe also worth comparing other repeats -- do we see the same trend if we look at other common repeat types? Long et al. 2013 (https://www.nature.com/arti... find massive differences in ribosome repeats in thaliana between populations that also differ in flowering time (and perhaps worth noting the connection between ribosome biology and telomeres in Abdulkina et al.).

      Some discussion of the percent variation explained I think is warranted. In each of the three species, telomere abundance explains at most a few percent of the variation in flowering time. Is this expected?

    1. On 2020-07-02 13:45:06, user Concerned Biophysicist wrote:

      This is very cool work, and the public engagement of folding at home aspect is great to raise awareness/excitement about computational biophysics.

      As a scientist working in the field, however, I do wonder if having "to Combat Covid 19" in the title might be crossing a threshold that we as a field collectively decided exists for a reason. Much (most?) of the applied work in computational chemistry and biophysics is on disease-related proteins, and many of the methods we all work on have relevance to drug discovery, so we could all be constantly claiming/marketing most of our papers as "fighting X disease" or "towards a cure of Y diseases" while in reality most of what we do is fundamental basic science, with an eye towards pharmaceutically relevant discovery in the future. This science is just as important as applied pharmaceutical research in the research ecosystem, and does ultimately lead to tools and insights that are relevant to the pharmaceutical industry, but I think there is something to be said about keeping some of our powder dry in terms of the claims we make about what is essential basic science and what is pharmaceutical research, so as not to create an arms race in the field to market all of our methodological work as having a dramatic immediate effect in curing disease, and end up devaluing and lowering the profile of the essential basic science that makes all this research possible.

      Bluntly speaking, if we all start slapping "to cure cancer" on the titles of every paper that is about developing molecular simulation or drug discovery tools and every paper that studies proteins related to cancer, we may drum up a little buzz and be able to eke out some extra press in the short term, but eventually, there is backlash to overselling a field. Other scientists will start to view all of our claims of the value of and potential pharmaceutical relevance of our work as oversold and less credible. This skepticism could creep into funding priorities and funding decisions (for national funding agencies and VCs), so it can affect more than just the labs that are pushing the boundaries of how boldly we claim that "computational biophysics research = curing disease".

      I get that folding at home is playing a different public-facing role in our field than most academic and pharmaceutical/biotech labs, and I think a lot of it is great, but the simplification/boldness of some of the claims does make me worry a bit about an inevitable backlash for the entire field.

    1. On 2020-06-29 20:02:07, user Jing Peng wrote:

      Dear authors,<br /> My name is Jing Peng, a scientist from UC Davis. I am happy to take this opportunity to congratulate you on the publication of the paper “FoodMine: Exploring Food Contents in Scientific Literature” on bioRxiv. The idea of using computational methods to analyze published studies to enlarge and annotate food composition databases from the scientific literature is fascinating.

      The existing food composition database is unbelievably lacking in critical information on most of the actual composition of food. The current food databases are asymmetrical. For essential nutrients such as minerals and vitamins, food scientists have identified each specific type, such as iron, zinc, vitamin C, and vitamin D. Each compound has its unique name and related compound-specific research. But for most of the non-essential nutrients, there is only a vague “class” name for them, such as carbohydrates. There are lots of unique and independent compounds in the "class" carbohydrate, and they each have a specific name and feature. However, current food databases contain neither their names nor their functions. We need to understand each chemical compound and its effects. If food databases are lacking in such basic and important information, how do nutritionists provide the most effective advice to the population? Right now, most people, including some scientists, acquiesce to the vague definitions of those nutrients and the shortage of annotations in the food database. It is easy for people to lose the vision of measuring all compositions in food. But it is the food compositions that help us understand diet and the relationship between diet and food. Without such basic information, talking about diet is insubstantial.

      The central idea of using scientific literature as a database and extracting information from those data is engaging. This approach demonstrates the successful extraction of novel compounds that were not included in existing food databases. If taken to its logical conclusion, it is indeed imaginable as the authors suggest to recommend diets based on the chemical composition of the food. However, this logic and its lack of imagination of food and health more broadly is a problem I have with the paper. Food exists in multiple dimensions. Compounds that are beneficial to people’s health are one important reason for people to choose food, but not the only one. When people think or talk about the food, they will not only talk about the chemical compounds of food, but also describe the appearance, taste, smell, and texture of food. Appearance and smell would contribute to the first impression of food. If food does not exhibit an attractive appearance and flavor, people will hesitate to taste it. Even with appearance and odor that are themselves attractive to people, without delicious taste and texture, people will still give up on the experience. So only measuring chemical compounds of interest to health and ignoring the other aspects of food is limiting. Food is joy. A strategy based on chemical compounds solely to give food recommendations is emotionless.

      Food is multi-dimensional and so are people and they are different. Since each individual has his/her own sensory preference, they choose foods and diets based on their preferences. So, the brilliant idea of constructing a chemical compound network in food, even considering taste may not be sufficiently precise to provide useful food advice for the whole population. In order to individualize diet and give more focused food advice, each individual's diet preference is key. How do the authors imagine that their methods could measure the responses of people to foods with sufficient accuracy to capture their diet preferences? In place, such databases would create a more complete food network combined with food composition network annotated for personal preference. As food databases become more thorough and acquire the dimensions of individual dietary preferences, we could imagine using technologies and computational methods to provide more precise, sustainable, and enjoyable food for people.

      In the end, I would like to congratulate the authors for such inspiring ideas, using computational methods to extract information about chemical compounds in food to expand existing food databases. I look forward to more multidimensional research to define future food database structures and contents. As a person who is going to work in food systems, my future in food depends on usable information and enlarged food composition databases.

      Best,<br /> Jing Peng

    1. On 2020-05-22 19:51:11, user Kenneth W Witwer wrote:

      This preprint confirms some previous findings that miRNA:EV ratios are quite low, and that in some cell culture supernatants (as also suggested elsewhere for biofluids), most miRNAs are found outside EVs. Also that host EV proteins are much less fusogenic than those of viruses, particularly those like VSV.

      I think that the greatest disagreements with this manuscript, which includes rigorous approaches, will be around how strongly the conclusions are presented. In my opinion, the authors certainly have a right to be a little provocative in their language, but perhaps some more caveats could be introduced in revision. It's still possible that longer exposure times, different conditions, etc. could lead to uptake with some functional relevance.

      A few random comments:

      "These experiments also indicated that, depending on individual reporter plasmids, 20–300 miRNA copies per cell reduced the luciferase activity by half (data not shown)."<br /> -Showing these results would greatly strengthen the paper by showing how little miRNA would be needed.

      "A higher ratio of EVs per cell led to a reduction of the Renilla luciferase signal probably because a very high EV concentration was toxic to the cells"<br /> -This was quite interesting to me, as we tend to see a trophic effect of EVs in other systems. I am not sure that we can generalize this result.

      Regarding Figure 6C: I would prefer to see, additionally, an experiment where miRNA mimics were introduced to the donor cells, not just miRNA-expressing plasmids, to be sure plasmids were not transferred. Although since no effect was observed, this does not affect the current conclusions.

      I may have missed it, but where are the viability data? The methods mention viability tests, but I did not see the results. Dying cells may release large amounts of miRNA, and this could greatly affect EV vs non-EV miRNA ratios.

      Figure 7A was interesting and puzzling to me. I would have expected that the mini-UC pellet would be the least pure and most "contaminated" with non-EV miRNA, followed by SEC-separated material and then density gradient. If this were the case, one would expect higher miRNA:particle ratios for the UC pellet. However, the UC pellet seems to yield fewer RNAs per particle than the others. I'm not sure how much we can read into this, but the result does not seem entirely consistent with the conclusion that more purified EVs have lower RNA:particle ratios. A nice addition to this figure would be to show results from the input, too. There, one would expect many more RNAs per particle compared with the separated fractions (at least for particles in the size range detected by NTA).

    1. On 2020-05-05 18:53:52, user Taekjip Ha wrote:

      Thank you very much for sharing your interesting manuscript! We used your preprint as one of the journal club papers in the Single Molecule & Single Cell Biophysics course for graduate students of Johns Hopkins University during the Covid-19 lockdown. Students also practiced peer reviews as the final assignment. I am submitting their formal reviews here and hope you find them useful.

      Taekjip Ha


      Reviewer 1.

      Summary<br /> Overall I enjoyed this methods development paper and thought the technique showed promise for future application. I think this work is suitable for publication after some minor revisions, mainly expanding the discussion of interesting results and considering any remaining experimental limitations, or lack thereof.

      This paper characterizes the technique ABEL-FRET, which combines Anti-Brownian ELectrokinetic trapping and single-molecule FRET (smFRET) to achieve long-time imaging of freely diffusing biomolecules. The introduction describes smFRET as a molecular ruler and points out that current methods are restricted to either immobilization of molecules (which can aberrantly impact structure or function) or diffusive molecules (which limits imaging time). ABEL-FRET is situated as a method to get the best of both worlds. The authors show that their version of ABEL-FRET is more efficient than existing smFRET modalities utilizing confocal or TIRF microscopy, nearing the shot-noise limit of theoretical photon counting precision and resolving single base pair differences in dsDNA. Illustrating the potential uses of their setup, the authors then use ABEL-FRET to examine three example systems. Example systems include: the spontaneous switching of Holliday junction isomers, ssDNA binding kinetics of the bacterial recombinase RecA, and the kinetics of single-stranded DNA-binding protein (SSB) sliding on ssDNA. This paper’s kinetic results are largely in agreement with previously published data obtained using immobilization smFRET. Returning to their Holliday junction and SSB models, the authors propose their method is also amenable to hydrodynamic profiling, providing conformational and binding stoichiometric information.
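
      For readers less familiar with the term, the shot-noise limit mentioned above is commonly approximated with binomial photon-counting statistics: if N photons are detected in total and each lands in the acceptor channel with probability E, the standard deviation of the measured FRET efficiency is sqrt(E(1-E)/N). The sketch below uses that textbook idealization (no background, crosstalk, or detector corrections); it is not necessarily the exact expression used in the manuscript.

```python
import math


def shot_noise_sigma(efficiency: float, total_photons: int) -> float:
    """Idealized shot-noise-limited standard deviation of FRET efficiency.

    Assumes each detected photon is an independent Bernoulli trial that
    lands in the acceptor channel with probability `efficiency`.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if total_photons <= 0:
        raise ValueError("total_photons must be positive")
    return math.sqrt(efficiency * (1.0 - efficiency) / total_photons)


# Precision improves as 1/sqrt(N): 100x more photons gives 10x less noise.
for n_photons in (100, 1_000, 10_000):
    print(n_photons, round(shot_noise_sigma(0.5, n_photons), 4))
```

      The 1/sqrt(N) scaling is why longer observation of a single trapped molecule, and therefore more collected photons, translates directly into tighter FRET efficiency estimates.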

      The main contribution of this paper is that it makes a previously proposed method a novel reality and performs an initial characterization of its precision in mostly DNA centered assays. The major strengths of this paper include: clarity in explaining the methodology, comparing findings to the existing field of knowledge as confirmation of technical accuracy, and writing style. The weaknesses of this paper lie predominantly in the lack of an expanded discussion which may have answered many questions that arose.

      Major comments:<br /> This paper proposes that the transient event indicated by a black arrow in Figure 3d may be a new dynamic state of RecA. The presented data is not strong enough to fully support this claim or rule out the possibility that the transient event represents an optical aberration or noise. Theoretically one could put an arrow in any transient peak and propose a new state. To solidify this claim, more experimental replicates could be collected to see if this peak persists (indicating a real event) or disappears as background noise. If sufficient replicates were already tested and the event was present in all, then it would be helpful to see the new state indicated on multiple representative traces to prove its constancy. The number of experimental replicates could also be explicitly stated on this figure or the S12b graph moved into the main figure to support this claim. As the proposition of a potentially new RecA state would contribute greatly to the existing field of knowledge, it warrants further discussion or obvious proof in the text. Since this ABEL-FRET technique is a major technological upgrade from existing methods, any new information collected from it should be thoroughly validated to prove its reliability. Maybe information in the supplement should be added as new figures or more explicitly presented.

      A similar major point concerns the use of ABEL trapping and its potential electrokinetic impacts on charged biomolecules. Since this paper focuses on negatively charged DNA and positively charged DNA-interacting proteins, it would benefit from references or control experiments showing that the applied voltages do not change endogenous binding dynamics. This concern was addressed in Supplemental Note 3, but it is not obvious from the one sentence mention in the main text. Although it is understandable that not all concerns can be addressed in the main text, expanding the discussion of any controls which answer common questions gains added favor for innovative methods.

      Minor comments:<br /> Overall this paper was clear in word choice and grammar. Minor comments are just more questions that popped up while reading which could easily be addressed in the discussion without the need for further experimentation:

      -In this microfluidic device setup, is diffusion in the z-axis an issue at all? Are biomolecules able to diffuse in and out of focus at any point? Would such diffusion impact FRET efficiency background noise?

      -The discussion states that this method should be compatible with any FRET-labeled biomolecules. Have dynamics of other proteins been tested yet (i.e. those not focused around DNA or DNA binding)? How would things change if flexible proteins (more susceptible to voltage changes) are trapped and imaged? Are there restrictions to what biomolecules can be profiled using this method?

      -Additionally, have any FRET fluorophore pairs other than Cy3-Cy5 been tested with this technique? Since it seems as if confocal microscopy was used here, could this technique be optically limited compared to other forms of single molecule imaging that rely on higher resolution microscopes? Does this matter for measurements of hydrodynamic profiling?


      Reviewer 2.

      Although single-molecule Förster resonance energy transfer (smFRET) has been used widely since its introduction over two decades ago, there is still room to tweak and improve this method for additional biological applications. Many smFRET methods rely on tethering molecules to a surface, which can disrupt their activity or function. Additionally, immobilization eliminates the possibility of interrogating hydrodynamics concomitantly with distance information provided by FRET. However, without surface immobilization, tagged molecules diffuse in and out of the detection volume rapidly, preventing long observation times. One promising method to overcome the limitations inherent in immobilizing molecules for smFRET is Anti-Brownian Electrokinetic (ABEL) trapping. In an ABEL trap, a single molecule’s position is monitored in real time, and its Brownian motion is cancelled out by applying electrokinetic force, keeping the molecule within the field of view for an extended amount of time. This allows for longer observation times, without the need to tether the molecule of interest. In this work, the authors extend the possible observation time of ABEL-FRET, achieve high resolution by obtaining high precision FRET efficiency measurements, and are able to combine hydrodynamic measurements with smFRET.
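
      The trapping principle described above (measure the position, then push back against the displacement) can be illustrated with a toy one-dimensional simulation. This is only a sketch under strong assumptions: idealized proportional feedback, perfect position measurement, and hypothetical parameter values. It is not the authors' control scheme.

```python
import math
import random


def rms_position(n_steps: int, diffusion: float, dt: float,
                 gain: float, seed: int = 0) -> float:
    """RMS measured position of a 1D Brownian particle under feedback.

    gain = 0 is free diffusion; gain = 1 fully cancels the measured
    displacement after every step (idealized, noise-free measurement).
    """
    rng = random.Random(seed)
    step_sigma = math.sqrt(2.0 * diffusion * dt)  # Brownian step size
    x = 0.0
    sum_sq = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, step_sigma)  # random thermal kick
        sum_sq += x * x                  # position as seen by the tracker
        x -= gain * x                    # feedback pushes toward the center
    return math.sqrt(sum_sq / n_steps)


# HYPOTHETICAL parameters: D in µm^2/s, dt in s.
free = rms_position(20_000, diffusion=50.0, dt=1e-5, gain=0.0)
trapped = rms_position(20_000, diffusion=50.0, dt=1e-5, gain=0.5)
print(f"free RMS: {free:.3f} µm, trapped RMS: {trapped:.3f} µm")
```

      With feedback switched on, the particle stays within a few step sizes of the center instead of wandering away, which is what permits the long observation times discussed in this review.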

      The authors achieve a longer sampling time than has previously been reported; they are able to observe a FRET pair within the ABEL trap for up to ten seconds, an exciting advancement in the field. Additionally, their high precision FRET efficiency measurements allow them to achieve single base pair resolution when observing double stranded DNA labelled on either end with a FRET pair. Due to the long observation times and the fact that this technique is tether-free, they are also able to profile the hydrodynamics of molecules caught in the ABEL trap. The paper is well written, the logic is sound and clearly spelled out, and most proper controls are included.

      Although current events prevent most of us from performing new experiments in the lab, there are several points it would be worthwhile for the authors to address. First, what, if any, effect does the ABEL trap have on protein hydrodynamics? Although the authors demonstrate that increasing the electrokinetic force applied by the trap does not impact the kinetics of Holliday junctions, it would be reassuring to see the same validation performed with a tagged protein. Several proteins with different charge states would be preferable, to confirm that diffusion is not significantly altered by the forces necessary to contain a protein within the trap. If the authors have data speaking to this question, it would be worthwhile to include; if not, they might speculate on why they are not concerned about electrokinetic effects on proteins. Similarly, are charged ions, such as Mg2+ used in the Holliday junction experiments, affected by the ABEL trap? Could the electrokinetic forces applied affect the local concentration of these small molecules, influencing the biological processes being observed? More discussion of this would be beneficial.

      My second issue relates to data interpretation. The authors state that with their high resolution, it is possible to detect additional transient states that have been missed by previous methods. The data supporting this come from experiments to validate their technique by investigating RecA-ssDNA nucleofilament dynamics. The authors convincingly reproduce past experiments that have identified three different conformations. In addition, they argue, the resolution of their experiment allows them to identify more transient states that have gone undetected in the past (shown in Fig 3d, Fig S12b). Although it is possible that these additional FRET efficiency peaks are indeed newly discoverable states, due to the low number of occurrences observed, it is difficult to distinguish them from noise. Until it is possible to reproduce these results with a larger sample size, or via an independent method, we should be cautious in our interpretation of the additional peaks.

      The remaining questions and limitations do not, however, detract from the significance of the technical advancements this paper introduces. The increased resolution and ability to couple smFRET measurements with hydrodynamics are important steps forward in realizing the potential of smFRET. It will be exciting to see what interesting biology can be uncovered with this improved technique.


      Reviewer 3.

      Wilson and Wang present a technique for acquiring single molecule Förster resonance energy transfer (smFRET) measurements that avoids the potential confounds of established smFRET techniques by using Anti-Brownian ELectrokinetic (ABEL) trapping to capture free molecules in solution. They demonstrate the ability of this technique to measure sub-nanometer distances on dsDNA species and detect changes in DNA conformational states on a millisecond timescale with the same fidelity as traditional tethered smFRET techniques and with enhanced precision. The authors highlight the inherent ability of their ABEL-FRET technique to constantly sample molecular charge and diffusion, which allows them to temporally pair FRET signal and diffusion kinetics in order to profile molecular species in three-dimensional space. Through these pilot experiments, Wilson and Wang showcase the utility of a unique single-molecule imaging technique that generates measurements comparable to those of tethered smFRET while providing the added benefit of hydrodynamic profiling.

      The primary justification for ABEL-FRET, as framed by the authors in their introduction, is the ability of the technique to circumvent the potential confounds introduced by traditional smFRET techniques, which either immobilize molecules by covalent tethering or lack the temporal longevity needed to probe the conformational dynamics of free molecules in solution. The major challenges arising from tethered smFRET, as emphasized by the authors, are 1) shortcomings in signal detection precision caused by a field of view limited to the molecule-coverslip interface, 2) an inability to extract diffusion information due to covalent tethering, and 3) the potential of covalent tethering to introduce biochemical consequences on conformation or function. The presented data unquestionably supports the ability of ABEL-FRET to capture molecules on a much longer timescale than with contemporary untethered techniques. The authors provide good evidence supporting the ability of ABEL-FRET to make detection measurements with greater precision than tethered smFRET (Fig 1b) and devote a significant number of experiments to showing the benefits of hydrodynamic profiling afforded uniquely by ABEL-FRET (Fig 4). While the aforementioned improvements are alone enough to justify the utility of ABEL-FRET for measuring single molecule conformational dynamics, the ability of ABEL-FRET to avoid the potential biochemical pitfalls of molecular tethering is never directly tested. Taking this into consideration, an introduction that places more emphasis on the optical and diffusion limitations of tethered smFRET, rather than the biochemical limitations, would better position and highlight the strengths of ABEL-FRET that are directly supported by the data. Likewise, more discussion could be devoted to speculating or explaining why ABEL-FRET signal detection allows for such highly precise FRET efficiency measurements, as this finding is striking and strongly justifies the utility of ABEL-FRET over previous smFRET techniques.

      In experiments probing the conformational states of DNA species in the presence of RecA (Fig 3 and Fig S12) the authors provide data showing the distribution of observed FRET states (Fig S12b). While these results are interpreted as three separate populations, thus conformations, and the existence of a “minor state” is highlighted, more replicates are needed to separate these populations in order to fully support this interpretation. Though more data is needed to confirm RecA binding states, the experiments exploring the binding conformations of RecA and SSB provide good foundational data that can serve as models or examples for reference in future studies of interactions where the binding conformations/dynamics are unknown.

      The authors demonstrate well the ability of ABEL-FRET to make highly precise measurements for as long as seconds and to extract the same conformational populations as tethered smFRET from FRET efficiency measurements. These strengths validate ABEL-FRET as a technique comparable to its contemporaries; however, the ability to simultaneously extract smFRET and diffusion information from ABEL-FRET highlights the uniqueness and justifies the necessity of this technique in molecular profiling. The authors elegantly demonstrate the ability to uncover more conformational populations than previously identified with their own smFRET measurements alone by applying an orthogonal diffusion axis to their FRET efficiency measurements. In doing so, they provide direct evidence for two separate biological phenomena and simultaneously demonstrate the unique capabilities of ABEL-FRET.

      Wilson and Wang provide a linear and logical introduction to their new single-molecule profiling technique, ABEL-FRET. They validate the technique’s ability to produce conformational data on par with its contemporaries while also demonstrating that ABEL-FRET performs with greater temporal longevity than free molecule smFRET and better optical precision than tethered smFRET. The authors go on to show the unique ability of ABEL-FRET to integrate both FRET and diffusion information in order to unveil conformational populations previously unresolved with traditional smFRET. This paper presents exciting new technology with evident utility and great promise.

    1. On 2020-05-05 18:47:07, user Taekjip Ha wrote:

      Thank you very much for sharing your interesting manuscript!<br /> We used your preprint as one of the journal club papers in the Single<br /> Molecule & Single Cell Biophysics course for graduate students of Johns<br /> Hopkins University during the Covid-19 lockdown. Students also practiced peer<br /> reviews as the final assignment. I am submitting their formal reviews here <br /> and hope you find them useful.

      Taekjip Ha


      Reviewer 1.

      Summary of Evaluation:

      Here, Janissen et al. describe a novel mechanism by which viral RNA-dependent<br /> RNA polymerases (RdRp) undergo induced template switching during RNA synthesis.<br /> These template switching reactions can be intermolecular, resulting in<br /> homologous recombination, or intramolecular, resulting in copy-back synthesis.<br /> Typically, RNA analogues introduced as antivirals result in chain termination or<br /> lethal mutagenesis, but non-single-molecule experiments may have misclassified<br /> instances of template switching as termination, so that template switching<br /> went undetected. Therefore, by utilizing a single-molecule approach, the authors<br /> are able to analyze RdRp pauses, backtracking, and copy-back synthesis, which<br /> they ultimately determine can be induced by the addition of a<br /> pyrazine-carboxamide antiviral nucleotide with an unconfirmed mechanism.<br /> Overall, the paper makes a compelling argument for viral RdRp backtracking and<br /> recombination induction as a third mechanistic class of antivirals, although a<br /> few components of the experimental design and conclusions may require further<br /> experimentation. The use of a single-molecule approach to probe the in vitro<br /> dynamics of RdRp synthesis, though previously described, proves powerful in<br /> elucidating how reversals and recombination, particularly of EV-A71 RdRp, may<br /> occur. Further, the data suggest that the recently approved antiviral T-1106<br /> may be acting through this recombination mechanism, which has not been<br /> previously described. <br /> The article benefits from well-structured and balanced figures that successfully<br /> convey the data at hand in a straightforward manner. Although occasionally<br /> verbose (discussion) and too brief elsewhere (conformational dynamics results), the<br /> paper’s writing successfully conveys the importance of the findings and supports<br /> the findings with appropriate literature references. 
The work itself tells a<br /> fairly complete story with logical transitions and progression between<br /> experiments and conclusions. The paper is overall of high quality, though<br /> further controls and validation may be necessary to fully substantiate some<br /> claims as detailed below. Though not necessarily field-defying, the paper<br /> introduces the possibility of novel mechanisms of antiviral therapeutics that<br /> could serve to push human health forward and is deserving of high recognition.

      Summary of Data:

      To elucidate the mechanism by which template switches occur, the authors first<br /> utilized a magnetic tweezers assay to determine the elongation or retraction of<br /> an RNA-RNA complex that served as a read-out of RdRp synthesis or<br /> backtracking/reversal, respectively. Using the RdRp from EV-A71, a virus more<br /> prone to recombination than the RdRp of their previous work (poliovirus, PV),<br /> the authors show instances of RdRp backtracking and magnetic bead retraction,<br /> which they conclude is due to a template switching mechanism that leads to<br /> copy-back RNA synthesis and formation of defective viral RNA products. Utilizing<br /> an EV-A71 RdRp variant analogous to a previously described mutation in PV RdRp<br /> that impairs recombination, the authors showed that while replication was not<br /> impaired, the EV mutant showed a 100-fold decrease in viral titer in an assay<br /> that required recombination for successful viral replication. The mutant virus<br /> was also highly attenuated in a mouse model relative to WT, in agreement with<br /> previous work that PV RdRp requires recombination to cause disease in a mouse<br /> model. <br /> Mutant EV-A71 RdRp showed increased pause probability and pause duration, but<br /> decreased reversal probability compared to WT, which suggested a decreased<br /> ability to backtrack in the mutant. The orthologous PV mutant showed similar<br /> results. Using molecular dynamics simulations, the authors showed that the<br /> EV-A71 mutant had a smaller RNA-binding channel compared to WT, with the mutant<br /> channel more closely resembling the PV RNA-binding channel size. The dynamics<br /> data corroborates a mechanism by which EV-A71 RdRp, but neither its mutant nor<br /> PV RdRp, has a binding channel large enough to accommodate copy-back RNA<br /> synthesis. 
<br /> Finally, the authors utilized the antiviral ribonucleotide T-1106, a drug with<br /> inconsistent mechanistic understanding, in cell-based viral recombination<br /> experiments. The WT EV-A71 RdRp showed increased pausing, pause duration, and<br /> reversal probability in the presence of T-1106. In the recombination assay,<br /> T-1106 increased recombination in WT EV-A71 but not the recombination defective<br /> mutant. The authors also show that the mutant RdRp does not lead to viral<br /> resistance to T-1106.

      Major Issues:

      • In the single molecule assay, the retraction of the magnetic bead is<br /> attributed to copy-back synthesis. Though this is a plausible mechanism, the main<br /> evidence for it is the similar kinetics of elongation and<br /> copy-back. While a valid assumption given the data shown, further validation is<br /> required to definitively say that copy-back synthesis is occurring. The most<br /> obvious way to validate this is through the determination of the RNA products.<br /> Though it may be difficult to detect RNA products in these single molecule<br /> experiments, this information is crucial to confirm that copy-back synthesis is<br /> indeed occurring, especially since this mechanism is central to conclusions<br /> drawn throughout the paper.

      • The in vitro experiments in this paper exclusively look at intramolecular<br /> template switching (though this must be further validated as stated above).<br /> However, most if not all of the cell-based assays exclusively assay for<br /> intermolecular recombination (luciferase donor assay). Though the correlation<br /> between the two types of recombination is believable, validating that<br /> intermolecular recombination trends hold in vitro and that intramolecular trends<br /> hold in the cell-based assays is a crucial control. Without this data, the<br /> mechanistic conclusion of copy-back and recombination sharing an intermediate is<br /> jeopardized.

      Minor Issues:

      • The paper would benefit from greater elaboration on the effects of defective<br /> viral genomic products on viral replication to provide context for the activity<br /> of the purported new antiviral mechanistic target. What is known about defective<br /> viral genomic products?

      • In the T-1106 assays, a 400µM T-1106 concentration is the only concentration<br /> that significantly increased recombination. This is not elaborated in the paper.<br /> Would you not expect higher concentrations to also have increased recombination?

      • The flow of the paper is occasionally interrupted by terse, short sentences<br /> and the occasional grammatical error. Luckily, this is a bioRxiv preprint and these are<br /> easily fixed prior to peer review.


      Reviewer 2.

      Summary: <br /> Janissen et al describe a third mechanistic class of antiviral ribonucleotides<br /> that utilize RdRp template-switching reactions, an interesting topic that is<br /> highly relevant today and can be especially appreciated in the context of the<br /> recent COVID-19 pandemic. They first demonstrate the need for new broad-spectrum<br /> antiviral therapies and identify viral polymerases as a powerful target, placing<br /> special emphasis on RNA-dependent RNA polymerases (RdRp). The currently approved<br /> antiviral nucleotides fall into two functionally distinct mechanistic classes;<br /> they are either chain terminators that stop nucleic acid synthesis or lethal<br /> mutagens which increase mutational load on the viral genome. However, these<br /> often have off-target effects, which led to the emergence of a new class of<br /> antiviral nucleotides known as the favipiravir (T-705) class which requires the<br /> cellular nucleotide salvage pathway. Within this class, the nucleoside analog,<br /> T-1106, has high efficacy but its mechanism of action is unknown which prevents<br /> FDA approval. This work is an expansion of a previous study that used a magnetic<br /> tweezers approach to illustrate that pausing and backtracking of the elongating<br /> Poliovirus (PV) RdRp was enhanced by incorporation of T-1106 into the nascent<br /> RNA. They noted that traditional polymerase elongation assays would have missed<br /> the backtracked state, which they believe provides evidence for a third<br /> mechanistic class of antiviral ribonucleotides that rely on RdRp mediated inter-<br /> (homologous recombination) or intramolecular (copy back RNA synthesis) template<br /> switching. 
<br /> In hopes of elucidating this mechanism, Janissen et al hypothesized that T-1106<br /> induced backtracking generates a free 3’ single-stranded RNA end, which<br /> functions as an intermediate for template switching and results in a reduction<br /> of viral replication. In this work they 1) characterized the recombination prone<br /> Enterovirus (EV) RdRp in their magnetic tweezers system, 2) developed a<br /> recombination deficient EV RdRp (Y276H), 3) briefly analyzed the structures of<br /> WT and mutant PV and EV RdRps in silico, and 4) explored the effect of T-1106 on<br /> the WT and mutant RdRps. They used the same magnetic tweezers approach as in the<br /> previous study to demonstrate that EV RdRp pauses similarly to what was seen for<br /> PV RdRp, which is inversely correlated with nucleotide concentration. Their more<br /> interesting finding was that unlike PV RdRp, EV RdRp displays reversals. They<br /> proposed a probable reversal mechanism in which EV RdRp pausing leads to<br /> backtracking that produces a free single stranded 3’ RNA end which can serve as<br /> a primer for copy-back RNA synthesis as observed by a decrease in bead height.<br /> To connect reversals with recombination they generated a recombination deficient<br /> RdRp mutant, Y276H, which is orthologous to the known PV mutant, Y275H.<br /> Replication, plaque formation, and genome amount of virus titer were determined<br /> to be similar for WT and Y276H EV RdRp. They confirmed that this mutant was<br /> recombination deficient and showed that oral inoculation of EV Y276H resulted in<br /> attenuation of virulence in hSCARB2 mice compared to WT. Y276H had increased<br /> pausing, decreased processivity, and decreased reversals which were still<br /> pause-dependent. This finding was puzzling but was proposed to be due to<br /> increased stability of Y276H on the free 3’ RNA end, rendering it unavailable<br /> for reversals. Similar results were observed for the PV RdRp Y275H. 
In hopes of<br /> explaining why EV RdRp can reverse but not PV RdRp, and the impacts of the mutations,<br /> they superimposed the structures and conducted molecular dynamics simulations.<br /> They concluded that PV had the smallest RNA channel which was similar to EV<br /> Y276H and that EV WT RdRp had the largest channel, enabling it to undergo<br /> copy-back RNA synthesis. Finally, they explored the effect of T-1106 on EV RdRp<br /> template switching, which was shown to increase dwell time, reduce processivity,<br /> increase pause probability and duration, and increase reversal probability, all<br /> of which were claimed to be reflective of intramolecular template switching.<br /> They concurrently assessed T-1106’s effect on PV RdRp which was reflective of<br /> intermolecular template switching. They therefore concluded that antiviral<br /> ribonucleotides lead to increased backtracking and recombination, in which<br /> recombined products are not replication-competent and thus lead to a decrease in<br /> virulence. Most of the claims in this paper are substantiated by data, however,<br /> there are some major and minor flaws outlined below that need to be addressed<br /> prior to acceptance. A revised version of this paper would be suitable. <br /> General Feedback:<br /> Overall, this paper provides compelling evidence that RdRps display pausing and<br /> backtracking behavior. Their magnetic tweezers platform allows for single<br /> molecule analysis of the RdRps, which to my knowledge, has not been done before<br /> besides through their previous paper. The existence of a third mechanistic class<br /> of antiviral ribonucleotides is substantiated by their data, however, they only<br /> briefly addressed T-1106. The majority of the paper is spent characterizing EV<br /> RdRp in their magnetic tweezers system, with only one figure dedicated to T-1106<br /> effects. 
It may be more beneficial to split this paper in two, with one paper<br /> focusing on characterizing EV RdRp and comparing it to PV RdRp and the other<br /> determining the effects of T-1106, especially considering the T-1106 experiments<br /> that must be done to confirm its viability as an antiviral. Additionally, to<br /> convincingly establish the existence of an entire third mechanistic class, the favipiravir<br /> class of antiviral ribonucleotides should all be analyzed in their system. In<br /> general, the paper has an acceptable and organized flow, with only minor<br /> adjustments necessary (see minor issues). The experiments appear reproducible<br /> and robust. Their work in PV and EV RdRp recombination is mainly confirmatory,<br /> however, their platform allows for analysis of this process at the<br /> single-molecule level which reveals novel insight into the mechanism. There are<br /> a few major issues that need to be addressed, outlined below, to support this<br /> finding and complete the paper. The discussion of the paper is also repetitive<br /> and should be edited to be more concise.

      Major Issues:

      The article did not discuss the intrinsic ability of RdRps to undergo template<br /> switching, which they extensively showed in their assays. If recombination and<br /> copy-back synthesis are intrinsic why would these be a valuable target as an<br /> antiviral? Is there a specific level or cut off where too much recombination<br /> becomes detrimental? I would like to see an assay in which they determine the<br /> level of recombination necessary to decrease virulence.

      I would like to see more virulence studies, why didn’t they treat the hSCARB2<br /> mice with T-1106? This should be conducted to directly address T-1106 efficacy<br /> both in the context of WT and Y276H EV RdRp treated mice.

      While the single molecule experiments demonstrate copy-back synthesis<br /> (intramolecular template switching) the cell-based experiments exclusively<br /> quantify homologous recombination (intermolecular template switching). This<br /> paper should contain an experiment that directly quantifies copy-back synthesis<br /> in a cellular context. Since copy back RNA synthesis should generate hairpins,<br /> RNA seq could be conducted to determine if sequences with hairpin-forming<br /> properties are enhanced in cells infected with EV RdRp and treated with T-1106<br /> compared to WT.

      The magnetic tweezers approach was also unable to directly quantify<br /> intermolecular template switching. If possible, another template could be<br /> introduced to assay if pausing, and thus no change in bead height, becomes<br /> indefinite, which could indicate that the RdRp has left the initial template.

      They claim that T-1106 has no effect on EV Y276H RdRp, but they show a<br /> significant reduction of recombination at higher doses (100-fold, figure 6H), so<br /> the data does not substantiate their claim.

      Bar graphs are no longer an acceptable form of data presentation, these figures<br /> should be converted to dot plots to show data variability and illustrate<br /> replicates.

      Is there a way to generate a mutant that has a greater propensity for<br /> recombination? If so, this would allow for a direct analysis of whether<br /> increasing recombination leads to decreased virulence. Another way to address<br /> this question would be to compare PV and EV virulence, especially in a mouse<br /> model, since EV RdRp is more recombination prone.

      Minor Issues:

      This pausing-backtracking phenomenon was shown in both their previous work and<br /> in this paper, however, it was not confirmed through the use of other methods.<br /> Confirming the pausing phenomenon through other methods would be beneficial,<br /> perhaps using nanopore sequencing and/or single molecule tracking of the RdRp in<br /> cells to elucidate kinetic rates and interaction dynamics.

      The structure and dynamics information seems out of place and is not very<br /> informative. This figure may be better suited at the beginning of the paper, or<br /> may not be needed at all, to describe the structural differences between PV and<br /> EV RdRps, and the greater propensity for EV RdRp recombination. It could later<br /> be mentioned that the mutants display pore sizes similar to PV RdRp which could<br /> be shown in a supplementary figure. These data show a smaller channel width for<br /> Y276H compared to WT EV RdRp. How could a smaller channel width affect<br /> backtracking ability, especially since PV and EV RdRps both display backtracking<br /> ability? How could this relate to the function of an antiviral ribonucleotide,<br /> does the nucleotide interfere with pore interactions? These questions are not<br /> adequately addressed and would contribute to the paper. Additionally, would a 4<br /> Å difference be sufficient to render the PV RdRp unable to accommodate a three<br /> stranded intermediate at the time of initiation?

      They did not hypothesize as to why a 400 µM T-1106 concentration had the optimal<br /> response in their recombination assay. This should be addressed.

      Determining the molecular basis for the reduced recombination capabilities of<br /> the recombination deficient RdRps would be beneficial but may be the grounds for<br /> a separate paper. For example, how might the Y275(6)H mutation be stabilizing<br /> the polymerase and reducing recombination?

      Recommendation: <br /> I recommend revision of this article before acceptance in which the major and<br /> minor issues are addressed.


      Reviewer 3.

      The search for efficacious anti-viral therapeutics has become prominent in<br /> light of the recent coronavirus outbreak, with the RNA polymerase being a common<br /> target. A recent class of pyrazine carboxamide antiviral nucleotide and its<br /> analogs have shown promise, but there is ambiguity in the mechanism of action.<br /> It has been shown to increase backtracking during elongation for the poliovirus<br /> (PV) RNA-dependent RNA polymerase (RdRp), which may free the nascent 3’ end and<br /> allow for a template switch and recombination, producing inviable viral genome.<br /> This study used the more recombination-prone enterovirus (EV) RdRp to establish<br /> this connection between backtracking and recombination using a magnetic tweezers<br /> platform.<br /> The magnetic bead in this assay is tethered to a surface with ssRNA, and<br /> annealed to it is a template with a hairpin serving as a primer for the RdRp. As<br /> the RdRp polymerizes, the annealed RNA is displaced from the tethered RNA. At<br /> forces of >8pN, this causes the tethered RNA to lengthen, which is monitored by<br /> observing the height of the bead. This simple and highly informative assay<br /> showed that the EV RdRp is able to reverse, likely from the freed nascent 3’ end<br /> annealing to and elongating off itself and allowing reannealing to the template<br /> RNA. <br /> They then generated an EV mutant (Y276H) orthologous to the recombination<br /> deficient Y275H PV mutant and used a clever cellular assay with a gene construct<br /> reporting on recombination ability to show it is also recombination deficient,<br /> and also less deadly. 
This assay leaves non-recombined genomes unable to produce<br /> virus and expressing luciferase, and recombinants are viable and have low<br /> reporter output.<br /> To connect recombination ability and reversals, this mutant was tested for its<br /> ability to backtrack using the magnetic bead assay, and while it showed<br /> increased pausing, it showed decreased backtracking and reversals. To test if<br /> this was due to stabilization of the 3’ nascent RNA freed with the WT, they<br /> evaluated each RdRp in an in vitro RNA synthesis assay and found the mutant had<br /> a slower nucleotide incorporation rate. This same assay was performed with the<br /> PV WT and mutant RdRp to show similar results, but they importantly note that PV<br /> does not undergo reversals.<br /> Next the authors looked to the structure and dynamics of the PV and EV WT and<br /> mutant enzymes for insight into the mechanism of backtracking and recombination.<br /> They did not find obvious differences in crystal structure between EV and PV or<br /> between EV WT and mutant model. From here they did a lot of molecular dynamics<br /> simulations that I don’t fully understand, but they essentially tracked the<br /> distance between two residues within the RNA tunnel of the EV WT, EV mutant, and<br /> PV WT RdRps. Interestingly, the average distance was largest in the EV WT RdRp,<br /> smallest for PV WT, and the EV mutant was in between (but closer to PV). This is<br /> good suggestive data to show for the implications of RNA tunnel width for<br /> reversal ability, but they make no bold claims.<br /> In their last set of experiments, the authors again used the magnetic bead<br /> assay to assess EV RdRp movement but with the T-1106 drug. Unlike the<br /> recombination-deficient mutant, the drug caused a decrease in pausing and an<br /> increase in reversals. 
When the cellular recombination assay was applied with<br /> the WT EV RdRp and the T-1106 drug was administered, there was an increase<br /> in recombinant-proficient plaque formation and a decrease in<br /> recombination-dependent reporter protein output. When the drug was applied to<br /> the mutant RdRp in the recombination assay, there was no activity to suggest<br /> that recombination was taking place.<br /> In supplementary Figure 6 the authors tested the sensitivity of each of the EV WT<br /> and mutant RdRps to a titration of T-1106 concentrations. This was a great assay<br /> to perform, as it shows that even as the virus accumulates mutations in the<br /> polymerase the drug remains proficient. However, the sensitivity of the mutant<br /> to the drug is surprising since the drug was shown to cause no significant<br /> increase in the recombination ability of the mutant polymerase. While not stated<br /> explicitly, this could be addressed in the model posed in figure 6K as the<br /> aborted RNA synthesis.<br /> The model proposed from these data shows a logical conclusion drawn about how<br /> the drug is functioning on the polymerase, and the data were overall extremely<br /> well articulated. The experiments were mostly well described and straightforward<br /> while also being innovative and informative. This could be valuable information<br /> in drug development and testing for anti-RNA viral therapeutics.

      Major points<br /> • Figure 2B and F: why is the mutant about equal to WT in the plaque assay, but<br /> has a significantly higher survival rate in vivo? You mention this is consistent<br /> with PV, but propose no reason.<br /> • Figure 3G and 4G: Mechanistically, why does a decreased rate of nucleotide<br /> incorporation correspond to an increase in polymerase stability?<br /> • Figure 6H: I would have liked to see the magnetic bead assay for the T-1106<br /> drug applied to the EV RdRp mutant.<br /> • Supplementary Figure 6: How is it that the mutant can still be so sensitive to<br /> the drug? It should maybe be discussed that the T-1106 drug is inhibiting some<br /> other property of the enzyme that leads to recombination as well as normal<br /> function.

      Minor points<br /> • Figure 4: Why was the magnetic bead assay performed for the PV WT and mutant<br /> RdRp?<br /> • Why do you think PV RdRp doesn’t undergo reversals? Perhaps something to do<br /> with the RNA tunnel width?<br /> • Figure 5: What are the next experiments that should be done to explore the<br /> structure/dynamics? Is the tunnel width a potential factor in reversal ability?<br /> • There should be a sentence clarifying that the cellular recombination assay<br /> used in Figure 6 is the same as the one in Figure 2.

    1. On 2020-04-20 23:12:30, user Charles Warden wrote:

      Hi,

      Thank you very much for posting this pre-print.

      I am interested in the overall topic, and I have some specific questions about the analysis:

      1) Figure 3B makes me wonder about the effect of sample size on the results (but I think it is very good that you showed this, along with Table 1) --> In the abstract, you mention relatively large numbers (8184 individuals with European ancestry, 966 individuals with African ancestry, and 649 individuals of East Asian ancestry). However, if the events are rare, then the total number of samples per event (such as a given BRCA2 mutation) should be small (increasing variability, even in the set of individuals of European ancestry).

      o For example, how many cases (and controls) have the novel BRCA2 mutation in lung squamous cell carcinoma (and/or in ovarian serous cystadenocarcinoma)? Am I correctly understanding that everything except S1982fs in BRCA2 for 8 OV cancer samples (and a couple other mutations found in 2 samples) are only found in 1 sample each?

      o Also, I apologize, but I am also having some difficulty finding “S1982fs” (or “1982”, from Figure 2A) in the main text. However, I see Y1710fs described in Figure 4C as well as the main text (even though I see a note that variant is not in ClinVar). My understanding is that the abstract is describing a gene-level test for BRCA2 in OV, but how many BRCA2 variants did you observe in the 30 African OV samples (which you think may show an ancestry-specific difference)?

      o If Y1710fs is related to the BRCA2 result in the abstract, can you please provide the dbSNP identifier (or some other accession number) for that variant (and any others where you expect a difference), in order to maximize the chance of finding more information? For example, I was also trying to check BRCA Exchange (or data.color.com, etc.), but I am not sure if I need to use a different nomenclature to describe the variant(s).

      o If I have not correctly understood the case counts, did you check that the individuals with the mutations are not related to each other? Off the top of my head, I can only remember a report of viral cross-contamination (rather than mislabeling, etc.). I also think it should be less common in TCGA than a project like the 1000 Genomes project. However, I am essentially wondering what sort of artifacts (either technical or biological) could be checked for. With the 20% read fraction threshold that you describe, I think that should help with most "index hopping" but it seems like there probably should be something that can be checked. So, samples from the same patient or related family members are probably unlikely, but it is something we know can be checked (and hopefully you can think of some better possible confounding factor to check).

      o I see that “germline SNVs were identified using the union of variant calls between Varscan[12] and GATK[13]. Germline indels were identified using Varscan, GATK, and Pindel[14]”. I also see that you visually inspected variants using IGV (which is good, if I understand correctly). However, for the candidate variants, did they tend to be found using both VarScan and GATK?

      2) Are there other studies where you can re-process the raw data in a similar way to check if the results replicate in cohorts that had a higher fraction of individuals with African or East Asian ancestry (even if it is only for a limited number of cancer types)? It looks like you have done something to this effect for Figure 1B, but I wonder if you can get more evidence (and/or work with more primary data, if I understand the table correctly).

      o Visually, there is detection of BRCA2 variants in both African and European ancestry individuals. In addition to wondering if the lack of an East Asian difference is a sample size issue (in Figure 1A), you describe other studies in Figure 1B. Did you re-process that data, to call variants in a similar way? I could find the reference for Churpek et al. 2015, but I couldn’t find the reference to Gayther et al. 1997 in the paper (and I think there are at least 2 possible citations: in AJHG and Nature Genetics). Also, I am assuming that there was first a difference within the TCGA data – if so, can you create a table with multiple p-values / FDR values, as well as the absolute case counts?

      o You say “we tested 33 cancers in European Ancestry, 15 cancers in African Ancestry, and 8 cancers in East Asian ancestry” in the text. I think the criteria of 20 cases sounds small, especially if looking for rare variants. For example, I wouldn’t say this is enough samples to be making a clinical decision in other patients (or at least I would say there is a need to be transparent about the data being used, and continual collection of data and revision of estimates is important). However, I agree that you should try to have some sort of filter: I am not sure exactly what is the best way to communicate this, but maybe you could grey out the cells on Table 1 when the current criteria for testing are not met?

      o You probably already know this, but I think you can probably get some extra WGS samples from other studies in ICGC: https://icgc.org/

      o The data type can vary, but there is at least a SNP chip dataset with 473 African cases and 885 Japanese cases in phs000517.v3.p1 (for breast cancer). This may not be the best example, but I hope something in dbGaP may be able to help.

      3) Do any of the specific candidates that you focus on for validation fall under the “Other LOH” category?

      o Essentially, when I look at the TCGA results, I wonder if the African versus European difference for BRCA2 is significant (or if they are essentially replications of a similar finding), especially if there are only a total of 30 African OV cases to begin with (although I am also a little confused about the different color being used for the European BRCA1 carrier frequency, which is less than the African ancestry value; I think this is because there is a varying threshold for red versus orange between ancestry groups, but that is what I am trying to double-check). If you found a consistent result between ancestries, validation of a finding in a different ethnicity would be important (especially since my understanding was that BRCA2 mutation carriers should usually have less predictive pathogenic mutations than BRCA1 for the overall gene, even though I thought that the BRCA2 variants should be relatively more common for the overall gene). However, my understanding is that the current limitation would be really knowing whether the variant was not present in individuals of European ancestry (kind of like BRCA2 should be mutated in individuals with Asian ancestry, even if the gene-level test couldn’t detect differences at the current sample size).

      o I would also expect the gene-level frequencies to vary, if this is for all cancer types (versus just OV or just BRCA). However, I still wonder about the change in BRCA1 vs BRCA2 ranking for the European individuals, which I think is mostly due to “Other LOH” variants for BRCA1. Are there any thoughts about what could be causing those and/or if there could be any confounding factors, so that the “Other LOH” calls might have a higher false positive rate? I am not sure if I am reading too much into this, but it did catch my attention.

      4) Figure 3A shows read fractions. As a reminder for myself (and other readers), this is still supposed to be germline mutations (rather than somatic mutations). I understand that 3B and 3A are being tied together (where an LOH event can cause the allele fraction of the pathogenic variant to increase), but my questions relate to how many samples are used to calculate the read fractions for each dot. I think they may already be answered from the questions above, but maybe this particular question is more about whether you are emphasizing similarities or differences.

      o To be fair, it looks to me like the purple dots are in a roughly similar region for all 3 ancestry groups. So, if you are emphasizing mostly similar results, then I think this is OK. Indeed, you say “several predisposing genes are shared across patients” in the abstract.

      o However, if that is true, then I wonder if “ancestry-specific” may not be the best way to describe most of the germline differences (in the title), even if you do try to focus on a few variants that you believe vary more between ancestry groups. For example, you could say something like “Investigation of candidates with possible ancestry-specific frequencies…”?

      5) The number of admixed (or ambiguous) individuals seemed small to me, although maybe that is more common in some areas than others. While it may not matter so much for African or East Asian ancestry, I wonder if that could affect anything. Perhaps more importantly, is there some measure to attempt to respect the patient’s wish to declare a race/ethnicity? If so, does that mean there is also reported race/ethnicity that you can double-check (and exclude individuals without reported race/ethnicity)? I am guessing removing more admixed individuals would mostly decrease the European count, but I don’t really know for certain (especially in terms of who wouldn’t want to report that information, for validation).

      o Also, do you have enough SNPs to use something like RFMix to check if the ancestry for a particular region of the chromosome (containing the candidate gene) matches the largest fraction of ethnicity for the individual? For example, about 2% of my genome has African ancestry, but I would self-report myself as European ancestry (and that is the most accurate for my overall ancestry).

      Also, some other notes:

      • For citation #1, the sentence says “National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program” but the reference says “Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. American Cancer Society; 2018;68:7–30.”. So, I think either the sentence or the reference needs to be changed.<br /> • I see a supplemental file with 3 Figures, but I don’t see the 6 Supplemental Tables. Am I overlooking something, or do extra files need to be uploaded?<br /> • There is a reference for the AIM markers in “accepted draft attached”. However, I don’t see such a draft. Is there an earlier pre-print that you can reference?<br /> • There is also a part that says “Table 1Error! Reference source not found.)”, which probably needs to be revised.

      Thank you again!

      Sincerely,<br /> Charles

    1. On 2020-04-06 04:48:08, user Alexis Rohou wrote:

      I was asked to review this manuscript for a journal.

      5-April-2020<br /> Alexis Rohou, Genentech<br /> (I do not review anonymously)

      In this manuscript, Beckers & Sachse describe an algorithm to estimate the resolution of a 3D reconstruction obtained from single-particle cryoEM. Their method is notable in that it requires as input only two reconstructions, each from one half of the available dataset ("half-maps"), and knowledge of any applied symmetry. Also notable, the method makes no assumptions about the statistical properties of the signal and noise within the half-maps, and it does not rely on any Fourier Shell Correlation (FSC) threshold "criterion". These are notable achievements, which if reproduced and implemented in commonly-used image processing packages, could be highly impactful to the field of cryoEM and to other fields. I wholeheartedly recommend publication.

      The algorithm turns on two methods.

      First the authors use permutation sampling, whereby Fourier components within a shell of one of the half-maps are scrambled to simulate the null hypothesis, which is that the half-maps have no signal in common in that shell. Provided the shell has enough Fourier voxels, a large number of permutations can be generated. By calculating FSCs between one half map and numerous scrambled versions of the other, the authors show convincingly that the distribution of FSC values under the null hypothesis can be measured empirically. Once this distribution of FSCs under the null hypothesis is known, a statistical test can be performed at every shell to ask the question: is it highly unlikely that the measured FSC value (between the original, non-scrambled half maps) would have occurred in this shell under the null hypothesis? If the answer is positive, then we deem that there was detectable signal in that shell (and therefore, at that resolution).
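
      The permutation test described above can be sketched for a single shell as follows. This is a minimal illustration, not the authors' implementation: the function names (`shell_fsc`, `permutation_p_value`, `simulate_shell`) and the toy Gaussian data are assumptions made for this example, and the Hermitian symmetry of real-map Fourier transforms is ignored for simplicity.

```python
import math
import random


def shell_fsc(f1, f2):
    """Fourier Shell Correlation for one shell of complex Fourier coefficients."""
    num = sum((a * b.conjugate()).real for a, b in zip(f1, f2))
    den = math.sqrt(sum(abs(a) ** 2 for a in f1) * sum(abs(b) ** 2 for b in f2))
    return num / den if den > 0 else 0.0


def permutation_p_value(f1, f2, n_perm=1000, seed=0):
    """Empirical p-value of the observed FSC under the 'no shared signal' null.

    The coefficients of the second half-map shell are shuffled n_perm times;
    each shuffle gives one FSC value drawn from the null distribution.
    """
    rng = random.Random(seed)
    observed = shell_fsc(f1, f2)
    shuffled = list(f2)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if shell_fsc(f1, shuffled) >= observed:
            exceed += 1
    # The +1 terms keep the estimate strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


def simulate_shell(n_voxels, noise_sigma, shared_signal, seed):
    """Toy half-map shells: an optional common signal plus independent noise."""
    rng = random.Random(seed)

    def noise():
        return complex(rng.gauss(0, noise_sigma), rng.gauss(0, noise_sigma))

    signal = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) if shared_signal else 0j
              for _ in range(n_voxels)]
    half1 = [s + noise() for s in signal]
    half2 = [s + noise() for s in signal]
    return half1, half2


if __name__ == "__main__":
    h1, h2 = simulate_shell(200, 0.5, shared_signal=True, seed=1)
    print("signal shell:", permutation_p_value(h1, h2))
    n1, n2 = simulate_shell(200, 1.0, shared_signal=False, seed=3)
    print("noise shell: ", permutation_p_value(n1, n2))
```

      A shell with a shared signal should give a p-value near the smallest attainable value, 1/(n_perm + 1), whereas a pure-noise shell gives a p-value that is roughly uniform between 0 and 1.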

      The manuscript convinced me that permutation sampling is a powerful approach to using the FSC without necessitating the derivation of thresholds or criteria.

      The second method used by the authors aims to reduce the risk that a false positive occur; that is, the risk that a shell be deemed to contain signal when in truth it doesn't. This False Discovery Rate (FDR) correction of the p-value to account for the "multiple test problem" is known to be valuable and important in the treatment of many problems, but to my mind the manuscript does not really make the case convincingly that it is necessary in this case.

      To some extent, this is purely academic curiosity - I do not suggest that FDR control should not be used. Rather, I think the manuscript would be stronger if the authors clearly showed the pitfall of not using FDR control in their algorithm, specifically in the context of testing whether a measured FSC denotes the presence of signal. For example, in Figure 2a, how many more red crosses would be drawn if it weren't for FDR control? In Supplementary Figure 5a, what would happen to the estimated resolution if FDR were "turned off" and a simple, fixed p-value used instead? Or perhaps not using FDR correction would affect the behavior described in Supp Fig 6b (left panel) at small window sizes. In a similar vein, nowhere in the manuscript do the authors explicitly and precisely define the "multiple testing problem" we are faced with when using the FSC. Is the problem that the same test is applied to many shells? Intuitively, I would have thought that those shells where FSC approaches 1.0 do not contribute at all to any "multiple testing problem" (since they are vanishingly unlikely to ever give a false positive), but rather that only those near FSC ~ 0 might be problematic. Is that so? These and other questions would be addressed by a more explicit description of the problem in this case. Again, I'm not suggesting the algorithm be changed, only that a more explicit description of the multiple testing problem be given for readers like myself who are non-experts in that field.
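
      To make the comparison between a fixed p-value cutoff and FDR control concrete, here is a small sketch using the Benjamini-Hochberg step-up procedure, one standard FDR-controlling method. Whether the manuscript uses exactly this procedure is an assumption, and the per-shell p-values below are invented for illustration only; they are not taken from the manuscript.

```python
def fixed_threshold(p_values, alpha=0.05):
    """Call a shell significant whenever its own p-value is at most alpha."""
    return [p <= alpha for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= alpha * k / m,
    and call the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_k = rank
    significant = [False] * m
    for idx in order[:largest_k]:
        significant[idx] = True
    return significant


# Hypothetical p-values for ten shells, ordered from low to high resolution.
shell_p_values = [1e-6, 1e-5, 1e-4, 0.021, 0.04, 0.045, 0.3, 0.5, 0.8, 0.95]

n_fixed = sum(fixed_threshold(shell_p_values))
n_fdr = sum(benjamini_hochberg(shell_p_values))
print(f"fixed cutoff: {n_fixed} shells, Benjamini-Hochberg: {n_fdr} shells")
```

      In this toy case the fixed cutoff accepts three borderline shells that the step-up procedure rejects, while the shells with very small p-values are accepted either way. That is consistent with the intuition that only shells near FSC ~ 0 are affected by the correction.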

      The authors show that their method, labeled FDR-FSC, gives resolution estimates that are similar to most author-reported resolutions in the EMDB, which is impressive given that FDR-FSC does not require any knowledge about masks (necessary for most workflows) or molecular weights (necessary for workflows where an unmasked FSC is calculated and then scaled to compensate for the limited number of real-space voxels describing ordered mass). I agree with the authors' suggestion that the community should consider using the FDR-FSC as a standard way to automatically estimate resolution upon deposition.

      Another, perhaps more important, aspect of this algorithm deserves better discussion. In simulated experiments involving noiseless half-maps simulated at 2.5 Å resolution, the authors show that as noise is added to the half maps, the FDR-FSC resolution estimate remains approximately constant (until very high levels of noise), whereas the FSC=0.143-estimated resolution decreases with increasing added noise. This seems to me a fundamental difference between FDR-FSC and earlier proposed methods for resolution estimation using the FSC, illustrated by the apparently paradoxical observation made by the authors that, having added so much noise that no high-resolution features are recognizable anymore in the real-space map, the FDR-FSC resolution estimate was still 2.5 Å. This suggests, as does the authors' discussion (page 12, "One potential (...) structure of interest."), a shift from using the FSC as an estimator of spectral signal-to-noise ratio, as has been done since the inception of FRC/FSC almost 40 years ago, to using the FSC solely as a tool to detect the highest shell at which any signal (correlation between half-maps) is detectable. The authors tie this to noise tolerance and the need (or lack thereof) for masking, but I think it's much more fundamental than this, and I'd encourage the authors to draw this distinction more clearly and explicitly in their discussion (assuming I understood correctly).

      Is this what we really want from a resolution measure?? If I gave my chemist colleagues a map dominated by noise where the side chains and ligands are not visible but assured them that the map is 2.5 Å-resolution, and that they should trust the atomic model I built into it, do you think they would believe me? This is perhaps the most fundamental risk I see in this whole paper. I think the authors need to explain this paradox much better. (Maybe I misunderstood!)

      In addition to the above suggestions for improvements, below is a list of comments, questions and suggestions which the authors may like to consider when improving the manuscript further (line numbers refer to the PDF from the journal). In fact, many of the points in the figures I believe should be addressed before resubmission.

      FIGURES<br /> - Figure 1b: I don't understand why in Fig 1b the red (permutation) distribution looks much narrower and peakier than the blue (leading me to expect that the blue (simulation) curve should have longer, fatter tails), and yet the ECDFs in Fig 1c seem to overlap so well. This suggests to me that I don't really understand how 1b was plotted exactly. I suggest explaining somewhere: How was the count normalization done? / What are "normalized counts"?<br /> - 2a: Could the authors show somewhere (either in this panel, or in a suppl) what would have happened just with permutation, but no FDR correction? How many more crosses would there be? Which shells?<br /> - Fig 2: Panels b, c and e are exactly as would be expected. Overall this figure nicely makes the point that the "FDR-FSC" method circumvents the need for masking (or as the authors call it "noise removal")<br /> - Figure 2 title: please change. "noise removal" is too vague, ill-defined. Be more specific. You are specifically referring to "solvent noise" here, or "background noise". In fact, really what you are doing is more commonly referred to as "masking". "Noise removal" sounds more "clever", but I'd advocate for more straightforward wording.<br /> - Figure 3<br /> -- There is no strong convention in the field, but I feel pretty strongly about this one. I advocate for red = hot = more movement/disorder = worse resolution; blue = cool = less movement = better resolution. Are you convinced? If not, no big deal, I'll survive. I guess.<br /> -- a: This map has psize of 1.4Å. At first glance it looks like FDR-FSC and ResMap both gave many pixels an estimated resolution of 2.8Å, which rings alarm bells - it's not usually a good sign when many estimates end up at the boundary of allowed/tested values. But perhaps I'm mis-reading the bar graphs. It's very difficult to see what's going on at the high resolutions. Could the authors re-plot the histograms with a non-linear x scale?
Perhaps spatial frequency (1/x) rather than resolution (x), say? This would make it easier to see what's going on. Or maybe a detailed view near 2.5-3.0, or some other way to clarify?<br /> -- a, histograms: is the y-axis really counts? Why are the values <1.0?<br /> - Figure 4<br /> -- please name the proteins. Not many of us know our EMD identifiers by heart.<br /> -- panel d: again, I like blue and red the other way<br /> -- panels a-c: wait, I thought you were using blue-red, but now actually you're using yellow/black... could you please pick one colormap and stick with it?<br /> - Supp Fig 5<br /> -- a: this suggests to me that the FDR p-value correction wasn't really necessary... a simple fixed p-value test would have done the job... am I missing something?<br /> -- e: these plots suggest that actually FDR-FSC overestimates the resolution in low-noise conditions (overestimates = gives more optimistic resolution estimates than expected, lower numerical values). I suggest saying so in the main text. I don't really understand why. Do you?<br /> -- legend: "up to 4.0 standard deviations". Std devs of what? Is the signal scaled to 1.0 sigma?? Was the noise added everywhere? If so, isn't it worrying that the FDR-FSC doesn't give worse resolution estimates when noise is added? Or do you mean just noise in the solvent regions? (original notes before I understood [I hope] what was going on. Please be more explicit in describing the experiment to avoid such confusion for future readers)<br /> - Supp Figure 6, lower left panel: This is very nice, demonstrates much better behavior and less sensitivity to window size than fixed threshold. See my comments elsewhere in the main text.

      MAIN TEXT<br /> - p5, l27: "In order to account for this multiple testing problem". You have not introduced or defined any multiple testing problem, nor even what a multiple testing problem is, or why it's a problem. <br /> - overestimate/underestimate. These are difficult words, often too vague. Numerous ambiguous uses of these words are sprinkled in the manuscript. I recommend avoiding them. The first example was:<br /> - p6, l1: "can give rise to overestimates in comparison with". Overestimates of what? The sigma is overestimated, but the resolution will be underestimated. Might be worth making the language more precise here.<br /> - p6, l30: "i.e." - I think you meant e.g.<br /> - p6, l48: "more closely" - Yes, but the permutation and simulation are still quite off... simulated 0.15 is matched by permutation 1.0. That's quite a big margin, isn't it? Sorry, I don't have a specific suggestion here, but I wish I understood the discrepancy.<br /> - p7, l8 "70%" Am I correct in thinking this is consistent with the ratio of the longest dimension within the sphere (1) to the longest dimension within a cube (sqrt(2) ~ 1.4) 1/1.4 ~ 0.7 ? If so, does a similar relation hold with the Hann window?<br /> - p8, l4-8: "While the 0.143 FSC threshold decreases from (...) only fluctuates at the second decimal digit". OK, but this is entirely as expected. A fairer comparison might have been to the behavior of the FSC compensated by the factor introduced by Sindelar & Grigorieff (so called FSC_part in cisTEM). Oh, and the 0.143 FSC threshold does not decrease, the estimated resolution does.<br /> - p8, l11: "at 2.5 Å resolution and added different amounts of white Gaussian noise." Was the noise added to all voxels in real-space, or just the solvent voxels? There is confusion, because the previous experiment just concerned the noise from the solvent parts of the map.
This confused me for a long time, especially since if the noise is added to all voxels, I "wanted"/expected the chosen resolution estimate to get worse as more noise is added. Please be more explicit. See also comments to Figures, later.<br /> - p8, l21: "4.0 standard deviations". Standard deviations of what? What is the power of the signal? 1 sigma? Is the signal also white in power?<br /> - p8, l21 "the resolution remains constant (...) [even] when high-resolution noise entirely dominates visibility of the structural features". This confused me to no end for a long while, but I now think I understand that this is due to the fundamental difference I described above. Here are the notes I made on first reading this passage - perhaps reformulate to save the next readers some time. I was very confused:<br /> -- If a map is dominated by noise and no structural features are visible, surely the reported resolution should be very bad! What's going on here? Is this a case of white noise added to a signal completely dominated by low frequencies (in other words, much more power in the lower frequencies?) so that the white noise floor doesn't affect the SNR significantly at resolutions of interest??? Please: (1) specify where the noise was added, (2) specify the power spectrum of the map before noise was added, or equivalently show SSNR curve(s)<br /> - p9, l4: "in a fully automated fashion without any user interference". I know users are annoying, but perhaps in this case you could just say "user input"?<br /> - p9, l31-2: "tends to underestimate resolution" If you are going to say this (which is true), you should perhaps also say that the FDR-FSC overestimates the resolutions under those same conditions. Actually... perhaps try to find a way to say this that avoids "overestimate" or "underestimate".<br /> - p10, l2: "the resolution is overestimated at 3.8-4.0 Å". Wait, overestimated resolutions means that the resolution was worse than it should have been?
That should be an underestimate of the resolution, shouldn't it (it's less well resolved)? When discussing Sup Fig 5d/e, you used "underestimate" to mean an estimated resolution that was worse (higher number) than the supposed truth. I suggest you remove "overestimated" or "underestimated" when talking about resolution, because it's ambiguous, and use phrasing like "the estimated resolution tended to be worse than expected".<br /> - p10, l11-15: "Following Cardone (...) we confirm that using window sizes of seven times(...) resolution determination". I don't think your results support this assertion. In the well-resolved part of the map (Supp Fig 6b left), you show good (and constant) estimates of 3.4 with krr=4.4 to 14.7. In the solvent region of the map (Supp Fig 6b, right), I would argue that the actual resolution is ill-defined, so it's not clear what the krr is in that case. If you disagree with me on this, please state explicitly what features of your result lead you to suggest that a krr of 7 is desirable.<br /> - p10, l31-33: "The local resolution histogram (...) covers both aspects (...) well.". "covers both aspects (...) well" is a vague statement, which I find difficult to agree with. In my opinion, for example, neither method does a good job of estimating the resolution in the detergent micelles - at almost all values of the window size, both methods assign resolutions of ~5 Å, or even 2.8 Å (!) to parts of the detergent micelle (Supp Fig 6a)<br /> - p10, l40: "avoiding overestimation of the resolution in the low-resolution map parts". How do you know that it avoided overestimation of the low-resolution map parts? For example, the estimates for the voxels describing the detergent micelle look like they are ~ 4-10 Å. Are you suggesting that these are correct estimates? What evidence supports this? If those are indeed correct resolution estimates, this would suggest that e.g. detergent molecules are well ordered within such micelles.
I don't think this is the consensus on micelles.<br /> - p13, l4: "correlations (...) between resolution shells (...) due to uncertainties in alignment". How do real-space alignment errors lead to correlations between neighboring shells?? Real-space alignment uncertainties might convolute the images with an error (Gaussian?) kernel, leading to an envelope function in Fourier space, but this doesn't convolute or correlate neighboring Fourier shells with each other... Only the limited support of particles (or masking) in real space leads to correlations between neighboring Fourier components, as far as I know.<br /> - p13, l19-29: "This is a general property when local FSC (...) can affect the resolution determination". I would argue that FDR-FSC performs significantly better with smaller windows (see Supp Fig6, where a window of 15 pxl was sufficient to give correct estimate in center of the structure)<br /> - p13, l34: typo: "FDR-FSC", not "FDR-FDR"<br /> - p16, l38-9 "In cases of insufficient sampling below 10, the program will generate a warning message". I guess this means that this algorithm will not properly estimate local resolution in low-resolution parts of maps.<br /> - p16, l46 "multiple testing problem". Could the authors succinctly summarize what is meant by "multiple testing problem"?<br /> Specifically: what are the multiple tests - are they referring to the fact that the same test is performed on numerous independent (?) shells? What is the potential problem - that given enough shells, the probability of a false claim that the actual FSC is above the desired threshold becomes appreciable? These things may be obvious to the authors, but even to this reviewer, who tried to refresh his memory by re-scanning the authors' 2018/9 manuscript, a little bit more "hand holding" would help to make it easier to follow this section.<br /> - p17, l4: "(Beckers et al., 2019)". Specifically what part of that paper?
Also, wasn't the earlier paper concerned with testing whether a particular real-space voxel's density is significantly above solvent noise? Is the same reasoning really valid here in the context of Fourier shells? I'm doubtful, so please elaborate. I prefer papers that stand alone as much as reasonably possible, rather than having to extrapolate from earlier papers.<br /> - p17, l34: "sub-sampling". Sub-sampling how exactly? Perhaps by only sampling in a geometrically-bounded region of Fourier space?<br /> - p18, l16. "a soft circular mask". Studying soft circular masks and their effects is interesting, but the preceding text ("depend on the specific shape and volume of the mask") made me expect discussion of realistic masks. I have serious doubts as to whether this is a good approximation for 90% of protein complexes under study. Take an ion channel - my guess is that the smallest circular mask that encloses it is probably going to leave at least ~50% solvent voxels within it.

    1. On 2020-03-31 14:36:50, user NYUPeerReview wrote:

      NOTE: This paper was selected for discussion and critique in “Peer Review in the Life Sciences”, a course for PhD students at the New York University School of Medicine. This course aims to build skills in the critical reading of the scientific literature, and provide formal training in the process of peer review. Following discussion as a class, three students wrote this peer review under the guidance of course instructors Damian Ekiert and Gira Bhabha.

      The BAM complex is essential for maintenance of the outer membrane of Gram-negative bacteria. It is known that the BAM complex is responsible for export of the outer membrane lipoprotein RcsF. How the BAM complex interacts with its lipoprotein binding partners, like RcsF, at the molecular level is poorly understood. In this paper, the authors determined the crystal structure of the inward-open conformation of BamA bound to RcsF at 3.8 Å resolution. The authors found RcsF lodged in the β-barrel of BamA and identified specific regions of interaction. Using structural comparisons of their model against previously solved conformational states of BamA, the authors found that RcsF binding is incompatible with the outward-open conformation of BamA. Lastly, the authors used crosslinking experiments to demonstrate that cellular levels of BamCDE modulate the formation of BamA-RcsF complexes and may promote the maturation of RcsF-Omp complexes. <br /> Our class was excited by the results of this paper, and had a nice discussion about the implications of this work for the field. Though many details of the structure such as side chains are not clearly defined at 3.8 Å resolution with significant diffraction anisotropy, the model appears to be well-refined and provides useful insights into this important structure. Moreover, the structural interactions between BamA and RcsF were validated using multiple orthogonal methods. The experiments were thorough and appropriate, providing insight into a previously undetermined transport mechanism. Below are some comments that came up during our class discussion, and we hope will be helpful to the authors:

      1) While an interesting observation was made with the BamAΔloop1 mutants, the rationale for selecting this loop for deletion was unclear to us. Were other loops also tested but caused loss of Bam activity? It would be nice to know if there is no disruption of RcsF interaction with BamA upon deletion of a different non-essential loop.

      2) The authors suggest a model in which the flux of incoming OMP substrates triggers conformational changes in BamA and the release of RcsF to its OMP partners. However, we didn’t see any experiments presented that directly addressed this (e.g., changing Omp expression levels and assessing RcsF maturation in response). Based on the presented data in the paper (notably, where OmpA-RcsF cross-linked product was observed with BamABCDE overexpression, but not BamA or BamAB), we think a key finding of this work is that RcsF binding to BamA is dependent on BamA conformation and that the BamA conformation may be influenced by the presence of Bam accessory proteins. The triggers of BamA conformational cycling and exchange of RcsF to OMP partners remained unclear to us.

      3) In Extended Data Figure 4, we noticed that all the residues chosen for incorporation of the lysine analog are on the inside of the BamA β-barrel. We discussed that including a crosslinker location away from the RcsF binding site that is not expected to crosslink (as a negative control) would strengthen the data and clearly demonstrate the specificity of the assay.

      4) Our class had a brief discussion about how the rise of pre-prints may change the general practice in the field of releasing structural data to coincide with publication. As pre-prints increasingly gain visibility and can serve as a means of establishing priority of discovery, it seems worthwhile for the structural biology community to discuss/reassess when structural models and maps should be shared, and perhaps redefine a standard in the field. Should they still be released upon publication in a journal? Or should we be thinking about releasing this data along with the pre-print? Having the coordinates available while reading a structure manuscript can make it much easier to grasp the key points.

      5) We found some of the figures a bit difficult to follow. For example, in Figure 3A, it is hard to see the steric clash between the outward-open conformation of BamA and RcsF. Perhaps a different color choice, rendering, and/or orientation of the outward-open state would be helpful for the reader.

      6) We noticed that the structure statistic table did not include Ramachandran statistics or CC1/2. We think it would be good to include both of these.

      7) We were confused about the nature of the experiment presented in Extended Figures 1B and 1C. Was native gel electrophoresis performed on purified complexes, then the resulting 8 bands excised, and then somehow separated by SDS-PAGE? It was unclear from the main text, legend, and methods sections.

      8) The sensorgram in Extended Figure 5C is missing labels and is not described in the legend (i.e. what do the colors represent?).

    1. On 2020-03-24 18:25:53, user Katrina wrote:

      Review of “Photosynthetic protein classification using genome neighborhood-based machine learning feature” by Sangphukieo et al., provided by the discussions during a computational biology journal club at University of Tennessee at Knoxville

      Summary: <br /> The authors find that their genome neighborhood-based model can identify known photosynthetic proteins with 94% accuracy and an F1 minor of 0.892, which is higher than prior sequence-based models and blastp methods. The novelty of this work is the extraction of gene neighborhood network features. These features are generated by determining gene neighbors by summing the total branch length of the tree for each shared gene content between the 154 genomes normalized by a quartile cutoff. They compare their method to several other software methods for classifying photosynthetic genes, feature reduction methods and different model classifiers.

      The authors claim that identifying photosynthesis-related genes is hard because photosynthetic components are temporarily present in plants and experimental identification costs time and money. The motivation is to improve photosynthetic efficiency, since the current photosynthesis efficiency is at 6%. Previous computational approaches rely on sequence similarity and can falsely label genes as photosynthesis related. Machine learning provides a non-homology way to identify photosynthesis genes. No previous computational methods for predicting protein function incorporate gene cluster information. The method was tested on photosynthetic and non-photosynthetic proteins from UniprotKB. They propose that this work includes genomic context and sequence similarity, two criteria that may be useful for identifying function for photosynthetic genes.

      Major comments:

      What is the exact purpose of this work? Is it to create an essential cyanobacteria genome, or to understand the genes involved in photosynthesis in cyanobacteria? What is the fundamental knowledge that we are learning from identifying photosynthetic genes vs. non-photosynthetic genes?

      How were proteins from UniprotKB annotated to be photosynthetic vs. non-photosynthetic? Is this based on GO ontology? Is GO ontology best for verifying photosynthetic genes or is there a better metric/dataset to use? How does filtering types of relationships based on GO ontology impact the accuracy of the methods? What if only experimentally verified GO ontology relationships are included? Do experimentally verified relationships alone give high accuracy in identifying photosynthetic genes?

      How do the results change if only experimentally verified photosynthetic proteins are used instead of GO-annotated ones? Were the labels too generic? In what ways could the GO ontology annotations be incorrect, and could the proteins be labeled better?

      It is not clear what data the tree was built on to determine the phylo score. The proteomes of all 154 genomes, only the genes shared between the 154 proteomes, or the whole genome sequences?

      Why are the e-value thresholds needed to get high accuracy for predicting novel photosynthetic proteins so high (greater than 1)?

      How were the 154 genomes selected from NCBI? How were representatives from each of the 7 phyla determined? Which NCBI database was used: RefSeq, Genbank, SRA? There are over 10,000 cyanobacteria genomes in Genbank.

      Minor comments:

      On line 38, the motivation is to improve photosynthetic efficiency since the current photosynthesis efficiency is at 6%. What does 6% efficiency mean for photosynthesis?

      For table 2, were there duplicate features included in the model when combining all features for all e-value cutoffs (line 248) or were duplicate features removed?

      Figure S3, on feature minimization, shows a trend of higher accuracy, F1 minor and MCC as the number of features increases, not as it decreases. Please check lines 241-243 again. The data do not support the claim that feature reduction helps model prediction.

      On line 160, the data are split into 90% for training the model and 10% for testing. Why this cutoff? Usually the cutoff is 60/40. Is the model overfit to this data?
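
      One way to probe the overfitting concern is to report k-fold cross-validation alongside a single 90/10 split. The following is a minimal sketch, not the authors' pipeline: the function name `stratified_kfold` and the toy labels are hypothetical, and only the Python standard library is used.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of each class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx

# Toy balanced labels standing in for photosynthetic (1) / non-photosynthetic (0).
labels = [1] * 50 + [0] * 50
splits = list(stratified_kfold(labels, k=10))
```

      Each of the ten folds serves once as the held-out 10%, so the spread of the metric across folds gives a rough indication of how much a single split could mislead.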

      On line 138, a normalization method is applied to the phylo score. Why not divide the phylo score by the average phylo score to normalize? Please explain the rationale for the quartile method, how setting different cutoffs could impact the results, and what the quartile cutoff for level 2 is.

      In table 3 on line 533, recall and precision would be good to report for photomod when identifying known photosynthetic genes. Why weren't those metrics included?

      On line 335, these are all sequence-based methods. Can you confirm why you think your method is not sequence-based?
      On line 312, cite S9 where the table is validated.
      On lines 116-117, what is the rationale for this?

      Why do neighboring genes have to be conserved in at least 3 genomes? What is the rationale for this cutoff, and how was it determined?

      On line 101, what was the rationale for blastp cutoffs of 1E-10, 1E-50 and 1E-100?

      On line 149, you state there are exactly 6,430 photosynthetic and 6,430 non-photosynthetic proteins. Is it true that the numbers of photosynthetic and non-photosynthetic proteins are exactly equal?

      The F1 minor covers only the classification of photosynthetic genes. Do you think it is important to classify non-photosynthetic genes as well, and to have an F1 that includes them in the metric? Please justify the use of the F1 minor metric, given that it may overemphasize noise, and explain why F1 was not used. What is the F1?

      For readers, can you include the supplemental material at the end of the main file and incorporate the figures into the main text? This will help readers comment and follow better.
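
      To make the comparison between a single-class F1 and metrics that cover both classes concrete, the sketch below computes the F1 for each class, their unweighted (macro) average, and MCC from one confusion matrix. The counts are hypothetical, and treating "F1 minor" as the F1 of the photosynthetic class alone follows the description in this comment, not the manuscript's definition.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Return per-class F1, macro F1 and MCC from a 2x2 confusion matrix."""
    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0
    f1_pos = f1(tp, fp, fn)   # photosynthetic treated as the positive class
    f1_neg = f1(tn, fn, fp)   # roles swap for the non-photosynthetic class
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"f1_pos": f1_pos, "f1_neg": f1_neg,
            "f1_macro": (f1_pos + f1_neg) / 2, "mcc": mcc}

# Hypothetical counts, for illustration only.
m = binary_metrics(tp=80, fp=10, fn=20, tn=90)
```

      Reporting the per-class values side by side shows whether a good score on one class hides poor performance on the other.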

      The current colors in the figure are not legible if paper is printed out in black and white.

      The current figures are rasterized images. Can you please provide them as vector graphics so that they scale (e.g., Figure 3)? You could also include full high-resolution images at the end of the file and low-resolution images in the paper text itself.

      Thank you for including line numbers

    1. On 2020-03-10 16:52:36, user Jef Vizentin-Bugoni wrote:

      "The transition from trait-based to abundance-based linkage rules corresponds with a decline in floral trait diversity" corroborates predictions of the 'neutral-niche continuum model' (Vizentin-Bugoni, J., Maruyama, P. K., de Souza, C. S., Ollerton, J., Rech, A. R., & Sazima, M. (2018). Plant-pollinator networks in the tropics: a review. In Ecological networks in the tropics (pp. 73-91). Springer, Cham.)

      Based on similar insights, we produced (in the review above) a simplified model in which we specifically predict that in communities with high trait variation, niche-based processes (or trait-based, as you call them) tend to be more important than neutral-based processes (or abundance-based, as you call them) as drivers of species interactions. The underlying mechanism we propose for the first scenario is that more biological constraints (morphological, phenological, chemical, etc.) exist, limiting species interactions. In contrast, random chance of encounter should prevail when trait diversity is low and, therefore, traits do not importantly constrain species interactions. I think your work may be the first formal test of this model, which is, however, overlooked in this preprint. Hopefully this could be amended in a further version. Otherwise, this is great work.

      Jef

    1. On 2020-03-09 19:39:13, user Fraser Lab wrote:

      This manuscript by Leander, M., et al. uses TetR as a model system to explore the robustness of an allosteric response (in this case coupling drug and DNA binding) to mutation. The paper uses high-throughput mutational scanning to identify variants that compromise function, using FACS coupled to deep sequencing. As a follow-up, the authors conduct a break-and-restore secondary screen in which they generated libraries in the backgrounds of 5 deleterious mutations to identify rescuing suppressor mutations with FACS, followed by sampling with Sanger sequencing. They use structural modeling (in particular Rosetta and MD) to develop potential mechanistic explanations for these mutations.

      Overall, the data presented shows that empirically identified allosteric residues appear to be distributed across TetR, are not conserved, and have a variety of structural mechanisms potentially underlying them. The authors take this to mean that broadly, allostery is distributed and not conserved. The generality of the present approach is perhaps a bit overstated ("profound impact", “radically reframe”), but this is a great example of leveraging the classic strategy of identifying suppressor mutants using a functional screen while taking advantage of the new power and massively parallel nature of modern high throughput sequencing. With the focus on plasticity and robustness there could be increased citations/discussion of previous work on protein robustness and strategies involving suppressor mutations. Many of their conclusions could be put in context with previous work on allostery in this system (see: Reichheld and Davidson, PNAS, 2009), which puts forth an alternative subdomain folding model that is not really considered here.

      One of the main arguments in the introduction is that previous works weren’t comprehensive. From our reading, only one experiment, presented in the ‘structural hotspots more conserved than allosteric’ section, measured all (or a nearly comprehensive set) of the mutations with deep sequencing. While the libraries were made, it is unclear why Sanger sequencing as opposed to deep sequencing was used for the break-and-restore experiments. Moreover, the paper does not make clear which statistical tests are used to validate qualitative observations. For example, somewhat arbitrary thresholds are set and used to define whether a region is an allosteric hotspot. In general, the thermodynamic coupling between one residue and another is not binary, and so it does not make sense to treat the data qualitatively. It makes more sense to develop a quantitative score for whether a residue is allosteric or not based on deep scanning mutational data. For example, if some mutations are harder to rescue, you should expect not only that fewer residues will rescue them, but also that those that do should have higher coupling than for mutations that are easier to rescue, a core argument in the paper. This should be measured and tested quantitatively. Percentages should be reported somewhere regarding each of the rescued background libraries. It’s quite possible all this data is there, just not presented clearly.

      Similarly, if the assignment of allostery is made quantitative, it would be easy to calculate the correlation between allosteric residues and conservation; or, as is, it would be easy to calculate the z score between the conservation of dead vs allosteric residue populations. This would quantitatively back up the paper’s claim that allosteric residues are not conserved. There are many other examples throughout the paper where it would be appropriate to do a statistical test.
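
      As an illustration of the kind of test meant here, the sketch below compares the conservation of two residue populations with a two-sided permutation test on the difference in means. This is swapped in for the z score mentioned above because it makes no distributional assumption. The conservation scores are hypothetical placeholders, not data from the manuscript.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in means of two groups."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical per-residue conservation scores, for illustration only.
dead = [0.91, 0.85, 0.88, 0.95, 0.79, 0.90]
allosteric = [0.42, 0.55, 0.61, 0.38, 0.47, 0.66]
diff, p_value = permutation_test(dead, allosteric)
```

      With a quantitative allostery score per residue, the same function could be replaced by a rank correlation between that score and conservation.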

      Overall, the paper is hard to follow as written. For example, it is confusing that the mutations in various mutational backgrounds are presented prior to the single mutational data. Perhaps it would make more sense if the single mutation datasets were presented first, followed by the rescuing mutations in the background of these mutations. It is unclear as is whether the deep sequencing data from the single mutational libraries were used in deciding mutations to be used as backgrounds for the second order mutations.

      The major successes of the paper are the “break-restore” cycle of mutagenesis and integrating one potential structural framework to develop mechanistic explanations for some mutations which is often the lacking step in deep scanning mutational studies. The major concern we have with this data is that the timescale of the MD simulations (while still impressively long microseconds) is still insufficient to get at many issues of folding of subdomains (see again Reichheld and Davidson) and other aspects of the conformational ensemble that may mediate allostery in this system (esp. if it is not simply a matter of an “active” and an “inactive” structure).

      Specific points:

      Throughout the paper, it is unclear why methods were chosen, how assays were developed, and whether statistical tests were done. Some examples:
      * How were libraries generated? Chip-DNA is not sufficient information. From the methods, it looks like inverse PCR and Golden Gate were used. High-level information should be in the body of the paper. How do these libraries compare to similarly generated libraries?
      * There are triple mutations in the library. Where did these come from?
      * Nowhere in the paper is the quality of the libraries discussed. How much WT is present? How many of the possible variants were observed? How much coverage was there on the effective size of the library (considering WT) at the sorting/sequencing? Baseline library statistics (WT %, % present, bias) are needed to determine how well the NGS experiments went.
      * How was the threshold for ‘low’ GFP decided on? Were any controls used? More broadly, were controls used to determine any thresholds? Example raw data for this experiment should be in the supplement.
      * In the disrupt-and-restore first-step experiment presented in Fig1C, it’s mentioned that there were many mutations that disrupted but 5 were chosen as background for secondary libraries. How many mutations were disruptive? Was this the data presented later in Fig3? If not, this primary screen should be in the supplement. Why these 5, apart from them being distributed across TetR? Strongest signal? Did they represent distinct clusters?
      * How is partial vs full rescue of function described? How do you think about positions that can have varied impacts of rescue vs those that have a range of responses? For example, D53V and N129D seem to all be rescued more or less the same amount, whereas (impossible to know as a reader without statistics...) R49A and especially G102D have vastly different responses.
      * Fig1C ranks mutants by mean. Ranking by mean does not seem appropriate, given that G102D in Fig1C is the second most easily rescued whereas in Fig2B it is the hardest to rescue. This seems odd. In the next section this idea is discussed somewhat, and maybe it does not make sense to rank order this data.
      * How and why were thresholds chosen? Why couldn’t this same analysis be done on the Fig1C data by binning fluorescence? If 1000 mutants were done, why are there not 1000 mutants in FigS3? Where is that raw data?
      * The authors discuss that rescuing residues are either unique to a given mutant background or shared across multiple. They call this ‘variant-specific regional bias’. However, only 200 out of a possible ~3000 variants per background are sampled, so it is hard to know whether this analysis is meaningful. It is unclear why these experiments were done with clonal sequencing and not Illumina sequencing. An added benefit would be being able to do thermodynamic cycle calculations to quantify the coupling between all mutations. This would just require sequencing baseline libraries as well.

      * 5/20 mutations having a signal was used as a threshold for allosteric residue classification. This seems somewhat arbitrary unless this was quantitatively determined to be a good threshold. It makes more sense for every residue to get a coupling score based on depletion of weighted sequencing reads and to have a statistically defined threshold (R packages like DESeq2 can do this easily) for calling a residue allosterically coupled.
      * Thermodynamic coupling is not binary, so enrichments could be quantitative. Then it will be easier to judge the data and easier to calculate statistics. How many residues were missing from the dataset? How common are allosteric sites? Looking at FigS4, it is hard to know if white residues are missing data or mutations that don’t meet the cutoff.
      * A statistical test could be used to back up the statement that allosteric residues aren’t conserved. As is, it would be easy to calculate the z score between the conservation of dead vs allosteric residue populations. Really, there should be a quantitative score that could be used to calculate correlations between conservation and, later, centrality.
      * A baseline high-throughput experiment was done without ligand to see how TetR is inhibited without induction. The authors interpret GFP no-ligand mutations as destabilizing DNA binding. However, mutations could alternatively impact baseline expression through TetR structure disruption or dimerization. This should be mentioned.
      * Why was a triple mutant chosen for the rescued MD simulations when H44F had a stronger signal (Fig 1C)? Also, a double mutant would be better to limit higher-order epistatic effects.
      * In figure 4d there does appear to be broadening in the distributions and a shift to the left in two populations. Is this meaningful? Is there any insight into why the triple mutant isn’t all the way back to WT?
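
      The thermodynamic cycle calculation and quantitative coupling score suggested in the points above could look roughly like the sketch below. It treats the log2 enrichment of each variant as a proxy for its functional effect and defines coupling as the deviation of the double mutant from additivity. The function names, the pseudocount, and the example values are all assumptions for illustration, and read counts are assumed to be normalized for sequencing depth beforehand.

```python
import math

def log_enrichment(selected_count, baseline_count, pseudocount=0.5):
    """Log2 ratio of read counts after vs. before sorting (pseudocount avoids log(0))."""
    return math.log2((selected_count + pseudocount) / (baseline_count + pseudocount))

def coupling(e_wt, e_a, e_b, e_ab):
    """Double-mutant-cycle term: deviation of the double mutant from additivity."""
    return (e_ab - e_wt) - ((e_a - e_wt) + (e_b - e_wt))

# Hypothetical log2 enrichments, for illustration only:
# background mutation A is strongly deleterious, B is nearly neutral,
# and the double mutant AB is close to wild type (i.e., B rescues A).
e_wt, e_a, e_b, e_ab = 0.0, -3.0, -0.5, -0.2
eps = coupling(e_wt, e_a, e_b, e_ab)
```

      A large positive value indicates that the second mutation rescues the first far more than additivity predicts; values near zero indicate independent effects. Replicate measurements would be needed to put a statistical threshold on this score.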

      Throughout the manuscript there are broad generalizations that are not consistent with our view of the literature. Here are some examples:
      * Authors discuss TetR having a high degree of allosteric capacity based on the results. However, without more datasets or discussing previous work in this space it is hard to say whether TetR has a high allosteric plasticity.
      * The authors postulate that ease of rescuing a dead variant may correlate with how stabilized the inactive state of the protein is. However, the literature has certainly considered this and should be discussed/cited if this section remains.
      * The authors talk about how their work radically reframes the problem and is very impactful. We will leave the impact for history, but this is a pretty classic strategy and we fail to see what is “radical” about it. It is a great example of using modern technology on a “classic” system - that is cool!

      Throughout the manuscript there are explanations whereby the logic is unclear. Here is an example that would benefit from further explanation:
      * After the site-specific mutation section, the authors conduct Rosetta modeling to develop putative mechanistic explanations for several of the mutations. Here the authors see reduced helix-turn-helix stability; however, there is no explanation of its significance.

      Insufficient background/missing citations
      Throughout the manuscript, background is lacking and many citations are missing. Here are some examples:
      * ‘Thermodynamic does not require spatial connectivity’ should have a citation.
      * ‘Allosteric signaling occurs through redundant and robust networks’: based on one example from one paper it is improper to generalize. There should be citations here, as there are certainly more examples of allostery being redundant.
      * The authors discuss allosteric hotspots but do not cite the work that came up with the concept. For example, earlier in the paper Rama Ranganathan’s work is cited, and it should be cited again here.
      * Citations are needed for the work that identified mutations in the DBD and LBD.
      * Centrality is used to identify residues associated with allostery. The authors mention that in some instances it does not predict their allosteric classification. How does this compare to previous evaluations of centrality’s performance as an allosteric metric?
      * More discussion of how the field views the conservation of allostery would be good. Overall, it’s not entirely novel that allosteric sites are not as conserved as catalytic/binding sites. See Fig1b of Yang J-S, Seo SW, Jang S, Jung GY, Kim S (2012) Rational Engineering of Enzyme Allosteric Regulation through Sequence Evolution Analysis. PLoS Comput Biol 8(7): e1002612.

      A major rationale and point the authors make in the introduction is that previous studies have been exhaustive; however, many of the examples the authors give are clonal experiments with limited sample size. Some examples:
      * If this is 200 variants per position, this is nowhere near exhaustive. How is there only 1 variant for G102D in Fig2A when in 1C there were more? Were any statistical thresholds used for the data in Fig 2B?
      * The authors discuss that rescuing residues are either unique to a given mutant background or shared across multiple. They call this ‘variant-specific regional bias’. However, only 200 out of a possible ~3000 variants per background are sampled, so it is hard to know whether this analysis is meaningful. It is unclear why these experiments were done with clonal sequencing and not Illumina sequencing. An added benefit would be being able to do thermodynamic cycle calculations to quantify the coupling between all mutations. This would just require sequencing baseline libraries as well.

      Figures
      Fig 1B
      It would be nice to see raw data somewhere for gating, to get a sense of what the library data looked like. It is unclear why only the top and bottom gates were collected and not a series of bins. It would also be good to get a sense of what percentage of the population these gates represented.
      Fig 1C
      How many replicates were done for each? There should be extensive statistical tests here between mutants, WT and background single mutations.
      Why are there triple mutants? It seems triple mutants shouldn’t be included, as that starts moving into high-order epistatic space and is hard to discuss.
      It is unclear why the mean was used to rank order these, as clearly several don’t fall quite in line, especially G102D.
      Fig 1D
      Hard to read labels. Poor contrast.
      Fig 2A
      Seeing the raw data for these would be good. I don’t think it’s appropriate to use binning for this data; instead there should be a numerical value for fold induction. Then induction could be scored quantitatively. Statistical tests are also needed.
      Fig 2B
      The raw data for this would be good to have in the supplemental figures.
      Fig 2C
      Hard to read residue labels. It would be nice to have an example that has an allosteric explanation, as all of these are just direct interactions.
      Fig 2D
      This hypothesis could have been more fully tested if full libraries were characterized.
      Fig 3A
      Really hard to interpret this. The distributions are clear, but there should be a quantitative comparison.
      Fig 3C
      Same comment as Fig 3A.
      Fig 3D
      Needs better labeling. What is top and bottom? Also, pointing out where the modelled residues are in 3C would be good.

      Grammar:
      There are missing ‘a’, ‘the’, etc., but here are some examples as well as a couple of other issues:

      Page3:Line7: ‘the’ decentralized
      Page3:Line10: Unclear what ‘they’ refers to.
      Page4:Line5: ‘Time and again’ and ‘myriad’ are redundant
      Page4:Line14: ‘a’ biochemical understanding
      Page4:Lines19-20: ‘a’ promoter and ‘that’ promoter
      Page6:Line11: ‘a’ high degree
      Page6:Line16: ‘allosteric’ signaling
      Page7:Line11: Break up the one massive paragraph after sentence 10 in the site-specific rescuability of allosteric dysfunction section.
      Page8:Line15: Why are hotspots in parentheses? This is confusing.

      We were prompted to review this by a journal, James Fraser and Willow Coyote-Maestas

    1. On 2020-03-07 12:38:58, user Tanai Cardona Londoño wrote:

      This is a REALLY fascinating study. Thank you.

      I want to point out a couple of aspects regarding the evolution of PSII and offer a perspective that you may find interesting.

      PSII evolved as a homodimer and it is very likely, if not a certainty, that some form of water oxidation had already evolved before the duplications that led to the heterodimeric core of PSII. Those are the duplications leading to D1 and D2, and to CP43 and CP47.

      I think beyond the following papers [1-3], no one else has actually considered the implications of the homodimeric transition and the heterodimerization process of PSII for the origin of water oxidation and the Mn4CaO5 cluster. See also our most recent work:

      Oliver et al., (2020) Origin of photosynthetic water oxidation at the dawn of life, bioRxiv, doi.org/10.1101/2020.02.28.... (I just uploaded this to the preprint service too!)

      Our studies on the evolution of these duplications indicate that the homodimeric stage of PSII was extremely transient and short-lived [3]. The transition is not measured in hundreds of millions of years, but perhaps just in number of generations. We know this from the requirement of an exponential decay in the rates of evolution of the core of PSII from the point of duplications [3]. This is independent of how ancient the duplications are, but the younger PSII is, the faster the heterodimerization process. But why is this important?

      The heterodimerization of PSII is a process that occurred to improve water oxidation, and includes stuff like the energetic optimization of TyrD to be able to oxidize the S0 state to S1 in the dark; or the evolution of electron-transfer side pathways that involve the Cytochrome b559 bound to one side of the reaction centre for protection; or the evolution of the bicarbonate-mediated control of QA, also for protection; not to mention, the evolution of the extrinsic proteins themselves.

      Therefore, a photosystem that produced birnessite-type oxides, that existed before water oxidation, could have only occurred before the core duplications of PSII, and might have been an extremely short-lived evolutionary transition.

      I think there is a growing consensus that a form of oxygenic photosynthesis was happening by about 3.0 billion years ago [4], but the Mn deposits studied by Johnson et al. [5] are from rocks 2.45 billion years old. My own work suggests that water oxidation may be even older than 3.0 billion years. So, if there was ever a PSII that produced birnessite-type oxides, and did not split water, and that could have survived for hundreds of millions of years… it could not have been, necessarily so, a “transitional” evolutionary stage towards the evolution of water oxidation.

      In other words, it would mean that there was a lineage of photosystems that had evolved and became optimized, through hundreds of millions of years of natural selection, to do exactly that: to oxidize Mn without water oxidation, as suggested by Johnson et al. for the 2.45 billion-year-old rocks, for example. Do you see what I mean?

      And who is to argue that such a photosystem did not originate from a water-splitting photosystem to begin with? If that is the case, and there was a lineage of Mn-oxidizing photosystems related to water-splitting PSII, how can we prove that the ancestral state was actually Mn oxide production and not water oxidation instead? An interesting change of perspective, right?

      Furthermore, if there was ever an organism with a Mn-oxidizing photosystem that lasted for hundreds of millions of years (I don’t rule out the possibility at all), then there is a good chance that they may still be around, as no one has actually looked for them.

      References
      1. Rutherford, A.W., et al., Photosystem II and the quinone–iron-containing reaction centers, in Origin and evolution of biological energy conversion, H. Baltscheffsky, Editor. 1996, VCH: New York, N. Y. p. 143–175.

      2. Rutherford, A.W., et al., Photosystem II: evolutionary perspectives. Philos. Trans. Royal Soc. B, 2003. 358: p. 245-253.

      3. Cardona, T., et al., Early Archean origin of Photosystem II. Geobiology, 2019. 17: p. 127-150.

      4. Catling, D.C., et al., The Archean atmosphere. Science Advances, 2020. 6: p. eaax1420.

      5. Johnson, J.E., et al., Manganese-oxidizing photosynthesis before the rise of cyanobacteria. Proc. Natl. Acad. Sci. U.S.A., 2013. 110: p. 11238-11243.

    1. On 2020-01-24 09:13:52, user ani1977 wrote:

      Very timely publication! And thanks for releasing the data :) I see the genome https://www.ncbi.nlm.nih.go... is based on mapping, as far as I could read in the M&M; I am wondering if de novo assembly was also performed? Otherwise, the reads, generously shared, seem to be available at http://virological.org/t/pr... and I can give it a go... BTW, why HeLa for "Determination of virus infectivity" (Fig. 4)? We think it may not be a good system for this, given that we have shown an antiviral response with just mock transfection https://www.sciencedirect.c...

    1. On 2019-12-30 18:12:36, user Fraser Lab wrote:

      The major goal of this paper is to use diffuse scattering data to inform models of collective protein motions. This is a landmark paper that unites many disparate observations in the field and pushes the state of the art forward much more so than any paper since Wall et al, 1997 PNAS.

      Through careful data collection, the authors are able to separate Bragg and diffuse scattering. The major experimental advance over previous work is that their fine-scale analysis enables them to integrate diffuse halos surrounding the Bragg peaks. This data yields the observations needed to model lattice dynamics. They find that lattice dynamics explain a significant fraction of the diffuse scattering data. Nonetheless, the authors noticed residual B-factors and turned to internal protein motions to explain the remaining disorder, which leaves signals both around the Bragg peaks and in hazy streaks and clouds between them.

      To explain these residual features, they tested both normal modes analysis (NMA) and full molecular dynamics (MD). Furthermore, they were able to use Patterson analysis to choose between redundant NMA models, conquering an outstanding challenge in the field of macromolecular diffuse scattering. Surprisingly, the NM model that accounts for lattice motions and internal protein motions matches the data better than a crystalline MD model. What does this mean for MD that a reduced representation fits better?

      Overall, the data collection and processing are extremely thorough. Opening up these analytical methods to the community is the next step - and publishing their code is the only essential revision we would request prior to publication.

      Despite our enthusiastically positive interpretation, we do have a few minor questions and requests for clarification:

      While examining the exponential decay in halos around the Bragg peaks, why are the 100 most intense peaks between 2 Å and 10 Å focused on? In Figure 2 it appears that there is a skew in the distribution of exponents toward a sharper decay (n > 2). How do the histograms look when more halos are sampled? Is it possible that this sharp decay could be explained by Bragg peaks that are leaking into adjacent voxels?
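
      For readers unfamiliar with how such an exponent is typically estimated, here is a minimal sketch. It assumes the halo intensity falls off as a power law, I ∝ q^(-n), with distance q from the Bragg peak, which is how we read the exponent n above; the fit is an ordinary least-squares line in log-log space. The function name and the synthetic profile are illustrative only and do not reproduce the authors' procedure.

```python
import math

def fit_power_law_exponent(q, intensity):
    """Least-squares slope of log(I) vs log(q); returns n for I ~ q**(-n)."""
    x = [math.log(v) for v in q]
    y = [math.log(v) for v in intensity]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return -sxy / sxx

# Synthetic halo profile with a known exponent of 2, for illustration only.
q = [0.01 * i for i in range(1, 21)]
intensity = [5.0 / qi ** 2 for qi in q]
n = fit_power_law_exponent(q, intensity)
```

      Bragg intensity leaking into the voxels nearest the peak would inflate the innermost points and steepen the fitted slope, so repeating the fit with those points excluded is one way to test the leakage question raised above.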

      The authors are rigorous and explicit in their modeling efforts and make impressive strides forward. Still, we are left with questions about these models. For refinement of the lattice dynamics model, a small fraction of halos were chosen. Why did the authors not use all the halos? Why was the angular range of 2 Å to 2.5 Å chosen for refinement? Why does this resolution range differ from the analysis of halo decay ( 2 Å to 10 Å)?

      As we commented above, we were surprised to see that a NMA model matched the diffuse intensities better than a crystalline MD model. We wonder whether incorporating the isotropic component of the diffuse scatter would alter this interpretation? Furthermore, since the authors scrupulously subtracted sources of isotropic background scatter, why was the remaining isotropic portion of diffuse scattering not used for refinement of the NMA and MD models?

      Using diffuse scattering data to distinguish between competing models of motion has been a longstanding challenge in the field of macromolecular diffuse scattering, and we are impressed with the authors’ work in this regard. This is really a breakthrough! We were surprised to see how subtle the effects of restraining domain motions were upon the ΔPDF in Figure S17. Can the authors comment on the statistical significance of this difference? What is the uncertainty in the Patterson map, and how does this play into the interpretation of the best model?

      We have no major stylistic recommendations. The figures are elegant and clearly represent the main points of the paper. Similarly, the text is clear and concise, with thorough expansion in the supplemental material.

      On a final note, this paper pushes the field forward, and we believe there is room for further speculation. A few areas to consider:
      * How might crystallographers who encounter more mosaic Bragg peaks (these are some of the least mosaic crystals in existence!) separate the Bragg signal from the diffuse signal to analyze halos?
      * In what ways can NMA models and MD be further improved to match diffuse scattering data?
      * What complications might arise in crystals with more complex unit cells, and how can this be overcome?
      * How do they reconcile the results of ref 18 with their analysis of the lattice dynamics (different systems obviously)?

      The authors have done an excellent job of carefully collecting data, thoroughly analyzing it, and clearly explaining their work. We think that digging into the questions above may add to the already substantial impact of this paper, and look forward to their replies. Nonetheless, we think this important paper is worthy of publication as is (noting the caveat of code release).

      We review non-anonymously, James Fraser and Alex Wolff (UCSF)

    1. On 2019-12-25 21:33:59, user HonSing wrote:

      Hi Authors,

      As the first person to image the ultrastructure of EVs with AFM (tapping mode, non-liquid), I am trying to understand the value of this preprint. I adore manuscripts/works that use AFM because in my opinion, there is just no other way to understand their volumetrics. But there are always major assumptions when using AFM for volumetrics.

      Firstly, the cantilevers used (SNL-10 - or "A" in this preprint) have a forward angle of 15degrees and a back angle of 25degrees. This means that while you state it is impossible, multiple traces/scans must be performed if you want to measure the radius properly. As I understand it, the radius and height is critical for your analysis. From that you derive the contact angle, and then you will arrive at your stiffness (k) metric.

      I did not see any incorporation of the forward/back angle into your calculations. If not, I will share with you why it is so critical. I remember that when I was a PhD student, I was informed by AFM experts that the angle of the tip will lead to an artefact when imaging and trying to ascertain what an object may topographcially look like. The tip is not perfectly shaped like a needle, it is a triangle shape and it is substantially larger than the object being scanned. Hence, vesicles/non-vesicles will become larger/wider than they really are.

      I think that when you look at your different vesicle preps (which is a very cool experiment), you will find that due to your masks, some liposome preps are larger than others. Ie., the DOPC has smaller diameters, 40-60nm; whereas the DSPC has a median diameter of 120nm. This is a major difference when your team makes comparisons between liposome preps. When accounting for the geometry of the cantilever tip, this means that you are unfortunately comparing apples versus oranges. You need to control for size before calculating your stiffness (K). I think you will understand what I mean when you compare the LUT scales of the DOPC versus POPC, DPPC, and DSPC (40nm, 90nm, 140nm, 190nm). These are progressively larger and larger. This is why your Figure 4 (contact angle) exhibits a near linear relationship. The vesicles are simply just larger and hence, the contact angle is greater because the cantilever tip is not a perfect vertical tip, but a big-ass triangle.

      The title states that it is a high-throughput screening method. But let's be very honest with each other: anyone who does AFM on EVs is already doing this "high-throughput" imaging by simply zooming out to a 5-10um FOV. In that instance, they will be able to image tens or hundreds of EVs in a single FOV. I'm quite sure this is not novel if this is something that all AFMers do when they are simply trying to look for the mica coverslip (when the cantilever engages the object of interest).

      I also would like to see images of actual microparticles/microvesicles/ectosomes etc., what your calculated contact angles are, and what they look like. I think you will quickly see that they are no longer spherical but really highly heterogeneous objects with an irregular radial geometry and "rough" topography. That is because they will contain things like actin filaments and other structural components. That in itself would make a max/min XY radial measurement (that this work asserts) to arrive at a contact angle and stiffness (k) measurement an inaccurate one.

      I think what would be valuable is a calculation that accounts for the forward/back angle of a cantilever and its limits when imaging a quasi-3D object, such as an EV. I think that is the most important issue that faces AFMers: the tip itself produces an imaging artefact, and I have seen very little on how we account for how big this error is when we image dome-shaped things smaller than 500nm. I did enjoy reading about this work and I hope that with more work, it will be something that I and many other AFMers can cite and refer to in the future. Good luck!

      Cheers,
      Hon S. Leong

    1. On 2019-11-17 23:04:04, user Eran Halperin wrote:

      We have to strongly disagree with the comment by Teschendorff. As in several cases in Jing et al., Teschendorff makes another false claim about the TCA paper in his comment below: We do provide in the TCA package an option to learn the tensor, which is of interest (and works well, as demonstrated in the TCA paper), however, TCA should be applied differently for the task of association testing (i.e., EWAS). Specifically, we used Equation (13) in the Methods of the TCA paper for association testing; we clarified this in the paragraph that follows Equation (13) in our paper: "In this paper, whenever association testing was conducted, we used this direct modeling of the phenotype given the observed methylation levels."

      Importantly, in his commentary, Teschendorff does not acknowledge the fact that there are two innovative components in the TCA paper: (1) inferring a three-dimensional tensor of cell-type-specific levels from two-dimensional bulk data, and (2) direct modeling of phenotypes as having cell-type-specific effects, given the observed methylation levels, which allows integrating over the hidden tensor information; as pointed out in the TCA paper (and instructed in the vignette and manual of the TCA package), this is the preferred way to perform EWAS using TCA. While the estimates of the tensor may also be used for EWAS (as performed by Jing et al.), this option is substantially less powerful, as it does not take into account the differences in variance between samples. For more details see Equation (13) in the TCA paper.

      Also, we concur that there is value in the CellDMC paper as a benchmarking paper for previous methods. However, our argument is that CellDMC is not a new approach (although in their own words, in the CellDMC paper Teschendorff and his colleagues present it as a “novel statistical algorithm”), as the same method has been previously applied to gene expression (Westra et al., PLoS Genetics 2015, Shen-Orr et al., Nature Methods 2011), while to the best of our knowledge, TCA is a new approach, with its advantages and disadvantages.

      Finally, we would like to emphasize that we disagree with most of the claims made by Jing et al. in their paper, however, these claims are irrelevant as long as they present irrelevant results based on an irrelevant application of TCA. If any of the reviewers or editors of Jing et al. would be interested in a more detailed criticism of their claims, we will be happy to provide it, although we do not think that it is needed at this point.

    1. On 2019-10-28 23:45:14, user Charles Warden wrote:

      Hi,

      I have a minor point about a citation:

      "The only other paper we identified which performed a systematic benchmarking of pseudocounts is Warden et al [2], however they limited the range of their pseudocount to be between 0 and 1; and as we’ve seen the optimal value may be much larger."

      The fold-change calculation is from FPKM values. While you can do something similar with Count-Per-Million (CPM) values, that would still not be exactly the same as a pseudocount. In other words, an FPKM of 1 probably is a much more conservative threshold than a pseudocount of 1 (if you are talking about normalized counts, whose exact value can vary depending upon other samples).

      Also, I apologize, but I think accessing that paper has become trickier recently. However, you can see all of the original content here:

      http://cdwscience.blogspot....

      Thank you for putting together this paper!

      Sincerely,
      Charles

    1. On 2019-09-13 18:09:41, user Timothée wrote:

      As much as I see the need to quantify biases and trends in the hiring process, I have a number of concerns with the data collection and data release associated with this paper.

      As far as I can tell, the inference of gender has been done based on names, pictures, and pronouns, which is biased, and is actively erasing colleagues who express gender non-normatively, or are read as a different gender. This is not a mere methodological point; it is a practice that is actively harmful to the overall effort on Equity, Diversity and Inclusion, by specifically applying bias to the more marginalized. I think this should be discussed in a lot more detail in the manuscript, but I do not think that the methodology is at all reliable.

      Second, this dataset contains nominative information on EU citizens (which is in likely violation of the GDPR), and seems to contain information that was divulged by third parties. As much as I understand that people may have given their consent to communicate data for the purpose of the analysis, I wonder whether explicit consent for un-masked data publication was given, and what the data retention policy is.

      Finally, I was surprised to see no mention of the IRB approval process. This is likely an oversight on the side of the author, but I wish that the preprint could be amended with the IRB approval, or the clear statement that the approval was not needed.

      We cannot afford a cavalier attitude towards data publication when it involves people, and I do not think that this preprint does a particularly good job at this (which is not a comment on the quality of the underlying scholarship).

    1. On 2019-08-24 12:50:04, user WJR wrote:

      Regarding the paper, "Sex solves Haldane's Dilemma" (currently unpublished), by Donal A. Hickey and G. Brian Golding. The following comments concern the paper and its accompanying computer simulation. These comments arise primarily from reading the software code, and may be less obvious from reading the paper.

      SUMMARY:

      The paper needs clarifications and expanded discussion on key points. (1) The simulation is biologically unrealistic in ways that lend support to the paper's conclusions. (2) The simulation artificially (and completely) removes the advantages of asexuality, and also artificially decreases the disadvantages of sexuality. (3) The paper thereby reaches the (questionable) conclusion that sex provides faster evolution. The paper will need to clarify these matters, if it is to be successful.

      (a) SELECTIVE ADVANTAGE:

      The paper specifies that the beneficial alleles have a selective advantage of 0.02. However, the ambiguity of that wording might mislead readers. The authors ought to explicitly clarify that they mean a homozygote will have an advantage of 0.04.

      That is significant here, because that figure is much higher, (between 4 and 40 times higher), than is typical of the textbooks/papers in this field. This high a selective advantage will need justification. Especially since this high selective advantage is used for each of 100 separate alleles simultaneously.

      (b) STARTING FREQUENCY:

      The simulation begins with a cloned population of identical genomes, and initializes these by randomly creating beneficial alleles at each locus. The starting frequency of these is set to 0.05, (which is 1 out of 20). In other words, each individual, at each diploid locus, has nearly **a ten-percent chance** of possessing a beneficial allele. And this high starting frequency occurs at each of 100 loci simultaneously. This unusually favorable starting situation needs more justification in the paper.
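      The "nearly ten-percent" figure can be checked with a short calculation. This is a minimal sketch that assumes the two allele copies at a diploid locus are drawn independently at frequency p; the function names are illustrative and are not taken from the simulation code.

```python
def carrier_probability(p):
    """Chance that a diploid individual carries at least one beneficial
    allele at a locus, with both copies drawn independently at frequency p."""
    return 1.0 - (1.0 - p) ** 2

def expected_beneficial_copies(p, n_loci):
    """Expected number of beneficial allele copies per diploid genome."""
    return 2 * n_loci * p

p0 = 0.05
print(round(carrier_probability(p0), 4))       # chance per locus
print(expected_beneficial_copies(p0, 100))     # copies across 100 loci
```

      With p = 0.05 the per-locus chance comes to 0.0975, and an average individual already starts with about ten beneficial copies spread across the 100 loci.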

      (c) RANDOM GENETIC DRIFT:

      Sexuality uses a randomized recombination of alleles, while asexuality does not. Because of that, a sexual species experiences more random genetic drift than does an otherwise equivalent asexual species. And this excess genetic drift often eliminates beneficial alleles. These tend to be randomly eliminated when they are yet few in number. In a sexual population, this random genetic drift is like extra genetic 'noise', that can push a rare beneficial allele into extinction.

      In an extremely large sexual population, a newly-minted beneficial allele will succeed with a probability of only about 2*s. For example, an allele with a typical selection coefficient, s=0.01, will be eliminated 98 times out of a hundred. (For s=0.001, it is eliminated 998 times out of a thousand.) The situation is worse for smaller population sizes, because genetic drift is stronger there.

      In the above-described way, genetic drift is a disadvantage to sex. But the simulation minimizes that disadvantage by using a large population size (=100,000), together with high initial frequency (=0.05), together with high selection coefficients (s=0.04). This setup virtually guarantees that none of the beneficial alleles will be lost through this genetic drift. Indeed that is the case, as seen in the posted results of the simulation. This artificially benefits the sexual population in the simulation.
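      The two regimes described above can be compared numerically. The sketch below uses Kimura's diffusion approximation for the fixation probability of an allele with heterozygote advantage s and starting frequency p in a diploid population of size N; this formula is brought in here for illustration and is not part of the simulation. Taking s = 0.02 as the per-allele advantage is an assumption based on the values quoted in this comment.

```python
import math

def fixation_probability(s, n, p):
    """Kimura's diffusion approximation, genic selection:
    u(p) = (1 - exp(-4*N*s*p)) / (1 - exp(-4*N*s))."""
    if s == 0:
        return p  # neutral case: fixation probability equals frequency
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)

N = 100_000
# A single new mutant copy (p = 1/(2N)) with s = 0.01: close to 2*s.
new_mutant = fixation_probability(0.01, N, 1 / (2 * N))
# The simulation's starting point as described: p = 0.05, s = 0.02.
simulated_start = fixation_probability(0.02, N, 0.05)
print(new_mutant, simulated_start)
```

      Under these assumptions a single new copy fixes roughly 2% of the time, while an allele starting at a frequency of 0.05 in a population of 100,000 is essentially certain to fix, which matches the observation that no beneficial alleles are lost in the posted results.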

      (d) MUTATION RATE:

      The simulation uses an unusual manner of mutation, where harmful mutations are entirely disallowed. Instead, only a specific type of back-mutation is allowed; where a beneficial allele reverts back to the original allele, (which has a multiplicative fitness contribution of 1.0). In this way, the simulation artificially eliminates the problem of error catastrophe (also known as mutational meltdown), since fitness is automatically never allowed to fall below 1.0.

      Also, the back-mutations occur at an extremely low rate, given by:

      Mutation_rate_per_progeny = MUT_RATE * number_of_loci * 2 * p

      where:
      MUT_RATE = 1.0e-08, given as a mutation rate per gametic locus;
      number_of_loci = {1, 2, 4, or 100};
      p is the frequency of the beneficial alleles (which starts near 0.05 and ends near 1.0);
      the "2" is because each progeny is a diploid.

      The factor 'p' arises because the back-mutation merely converts an existing beneficial allele back to the original allele. Due to that handling, the mutation rate varies throughout the simulation; it starts low (for p=0.05), and slowly increases by a factor of twenty (for p=1.0). This varying mutation rate is peculiar.

      These back-mutations are the *only* mutations throughout the simulation. (Note: The simulation is hard-coded for 400 generations, with a population size of 100,000 progeny each generation.) Yet the mutation rate is so low that this entire simulation will sometimes experience not even one mutation. This low rate of mutation is trivial, and can be ignored.

      This must be compared with recent measurements of the human mutation rate, which is around 100 new mutations per progeny. That is about 50 million times higher than the highest rate employed in the simulation. The paper needs much more justification of its handling of harmful mutation. An explicit attempt should be made. [Note: This issue runs far deeper than it first appears.]
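      The arithmetic behind these mutation-rate statements can be reproduced directly from the quoted formula. This is a minimal sketch using only the constants given in this comment (MUT_RATE, population size, generation count, and the figure of about 100 new mutations per human progeny); the function names are illustrative, and p = 1.0 is used as an upper bound on the rate.

```python
MUT_RATE = 1.0e-8       # per gametic locus, as quoted above
POP_SIZE = 100_000      # progeny per generation
GENERATIONS = 400

def back_mutation_rate_per_progeny(n_loci, p):
    """Mutation_rate_per_progeny = MUT_RATE * number_of_loci * 2 * p."""
    return MUT_RATE * n_loci * 2 * p

def max_expected_mutations(n_loci):
    """Upper bound on expected back-mutations over a whole run (p = 1)."""
    per_progeny = back_mutation_rate_per_progeny(n_loci, 1.0)
    return per_progeny * POP_SIZE * GENERATIONS

highest_rate = back_mutation_rate_per_progeny(100, 1.0)
ratio_to_human = 100 / highest_rate

for loci in (1, 2, 4, 100):
    print(loci, max_expected_mutations(loci))
print(highest_rate, ratio_to_human)
```

      With a single locus the upper bound is below one expected mutation per run, consistent with runs that see none at all; with 100 loci it is at most about 80 across 40 million progeny.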

      The remaining items (below) address the simulation's handling of sexuality versus asexuality.

      (e) FECUNDITY and REPRODUCTION RATE:

      In the simulation of sexual reproduction, the FECUNDITY is set to 2. That is, for males the FECUNDITY is 2, and for females the FECUNDITY is 2. The authors ought to remind readers that such a female would need to produce 4 progeny. This arrangement correctly represents the fact that half the female's reproduction goes toward reproducing her mate's genetic material.

      However, in the simulation of asexuality, the FECUNDITY is likewise set to 2, which is a mistake. It should be 4. That way, the females produce 4 progeny in both cases (sexual versus asexual). We must compare apples to apples.

      Asexuality is twice as efficient at transmitting its genetic material into the next generation. But the simulation artificially cut the asexual reproduction rate in half, thereby disallowing this advantage of asexuality.

      (f) The SLOWING-EFFECT versus STARTING FREQUENCY:

      A human-like population has around 23 chromosome pairs. There is no linkage between alleles on different chromosomes, and such alleles segregate independently. (Also, a human-like population has a somewhat higher recombination rate than used in the simulation.) Because of those things, a collection of, say, 100 different alleles, (randomly distributed across the genome), would expect little or no linkage between them. To a first approximation, they would segregate independently. And this produces a well-known disadvantage of sexual reproduction. That is, yes, sex can bring favored alleles together into one progeny, but it tears them apart just as effectively. (Some theorists describe sexual reproduction as a genetic shredding machine, each generation shredding and re-mixing the genomes.)

      By 'tearing apart' the beneficial combinations of alleles, sex slows evolution. This slowing-effect is strongest when the beneficial alleles are yet rare, at low frequencies. Then, they can only fleetingly exert their combined selective effect, before sexual reproduction separates them again. This is all standard theory.

      This slowing-effect doesn't happen in asexual populations. Once a beneficial combination of alleles is obtained, it is not shredded or separated. Rather, it is inherited, intact, into the next generations.

      This slowing-effect ordinarily places a sexual population at a disadvantage. But the simulation minimizes that disadvantage by starting the beneficial alleles at an extraordinarily high frequency (=0.05), thereby artificially avoiding the worst of the slowing-effect.

      (g) EPISTASIS:

      The above-described slowing-effect is even stronger when there is epistasis. (Epistasis occurs when a group of alleles have a combined selective effect that is much stronger than the sum of their effects taken individually.)

      And the simulation employs strong epistasis. (The epistasis in this simulation occurs through its use of a multiplicative-fitness model with high selection coefficients over many loci.)
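      To make the scale of the multiplicative model concrete, the sketch below compares the fitness of a genotype carrying many beneficial allele copies when effects are multiplied versus simply summed. The per-copy advantage of 0.02 and the 100 diploid loci are the values quoted in this comment; the function names are illustrative and the additive version is only a point of comparison, not something the simulation uses.

```python
def multiplicative_fitness(n_copies, s=0.02):
    """Fitness when each beneficial copy multiplies fitness by (1 + s)."""
    return (1.0 + s) ** n_copies

def additive_fitness(n_copies, s=0.02):
    """Fitness when each beneficial copy simply adds s."""
    return 1.0 + s * n_copies

# 100 diploid loci, all homozygous for the beneficial allele: 200 copies.
w_mult = multiplicative_fitness(200)
w_add = additive_fitness(200)
print(round(w_mult, 2), round(w_add, 2))
```

      For a fully beneficial genotype the multiplied value is roughly ten times the summed one (about 52 versus 5), which is the sense in which the combined effect is much stronger than the sum of the individual effects.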

      The evolutionary genetics literature regards the following as a robust and firm result: Sex-with-epistasis makes evolution slower than asexuality-with-epistasis. So how does the simulation minimize this slowing-effect? See below.

      (h) CHROMOSOME NUMBER and RECOMBINATION RATE:

      The paper seeks to challenge that prevailing view and show that sex speeds evolution. The paper aims to prove it via simulation. Unfortunately, the simulation attempts it by artificially decreasing one of the classic disadvantages of sex. It does that by reducing the chromosome number to 1, (and also by slightly reducing the recombination rate). This allows the substituting alleles to (unrealistically) experience linkages that would be unexpected in a human-like population. In the simulation, the substituting alleles are all on *one* chromosome; with various groupings effectively linked together as one; transmitted together into progeny as one; exerting their combined selective effect as one; generation after generation. And this situation makes them substitute faster. In other words, the simulation artificially increases the speed under sexuality by mimicking asexuality.

      For this simulation to effectively challenge the prevailing view and resolve this question, a more life-like chromosome number would be needed, (say, 23 to 25). This would be a reasonably simple change to the software. [For example, take the simulation's model with 4 loci, and increase it to 25 chromosomes. Easier still, just let all the alleles segregate independently. There would still be 100 alleles, and the computer run-time would be about the same.]

      (i) THE HORSE RACE:

      After initializing the simulation, no further beneficial alleles are added throughout the duration. You can think of this as lining up many race horses together at a starting gate, then after the start, no further horses are added to the race. In the simulation, (with all the horses lined up at the starting gate), all the beneficial alleles are guaranteed to eventually join-up together within the sexual individuals. But that is forbidden in an asexual population. That is the advantage of sexuality.

      But that setup artificially disallows a major advantage of asexuality. That is, new horses (i.e., new beneficial alleles) are added to the race throughout time, continuously, through mutation. Then an asexual species can more rapidly acquire those. How? As mentioned above, an asexual female's genome effectively has double the reproduction rate of its sexual peers. This allows a fit asexual female to more rapidly increase its sub-population size, and thereby (through having a larger size) more rapidly 'receive' its next beneficial mutation. (For example, if a sub-population is ten times larger, then that group receives its next beneficial mutation ten times sooner, and then the cycle begins anew.) This real advantage of asexuality is explicitly disallowed in the simulation.

      That fact undermines the legitimacy of the simulation for comparing sexual and asexual populations. Fixing this would require substantial alterations to the simulation and paper.

      CONCLUSION:

      There are at least eight distinct ways this simulation is biologically unrealistic, and these give the uncanny appearance of having been tuned to support the authors' conclusions. That is an undesirable result, as we all want a simulation we can rely on, and believe in. I encourage the authors to continue their work (with software upgrades and such), as I believe it can lead to a useful research tool.

    1. On 2019-08-01 15:07:03, user stephens999 wrote:

      This interesting and impressive paper presents extensions, implementation and application of a recently-developed statistical methodology (the knockoff filter) to large GWAS (UK Biobank). The methods provide guaranteed control of False Discovery Rates when testing pre-specified contiguous groups of SNPs (or other variants). Importantly, the null hypothesis being tested here is not the commonly-used null that the group of SNPs is *marginally* unassociated with the trait; instead the null is that the group is *conditionally* unassociated with the trait given all other observed SNPs. This conditional test is in many ways more informative than conventional marginal tests because it ensures that a significant group cannot be explained by linkage disequilibrium (LD) with other measured SNPs outside the group. Thus the conditional test comes closer to identifying groups of potentially-causal SNPs than do conventional marginal tests.

      The paper is very well presented, and the results and comparisons with other methods seem generally appropriate and interesting. My main request is that the paper should better highlight the limitations of the method -- specifically, at high resolution ("fine-mapping") the need to confine tests to pre-specified contiguous groups of SNPs seems a clear disadvantage compared with existing fine-mapping methods. This is not to take away from the other important contributions of this work.

      Major Comment

      As mentioned above, the main limitation of the current implementation (and perhaps the whole framework?) is the requirement that groups of tested markers be both contiguous and pre-specified. At coarser resolutions, where the main goal is to identify genomic regions (conditionally) associated with the trait, these requirements are not a major limitation. However at fine-scale resolutions, where one is trying to get down to the likely causal markers, these requirements become more bothersome. For example suppose we have 4 SNPs, in order, A-B-C-D, and A and D are in very strong LD with each other (say LD of 1 for concreteness), but not in strong LD with B or C, and A is the causal SNP. Then the contiguity requirement of knockoffZoom will not allow it to refine the association beyond the entire group (A-D), even though in principle one could narrow it down further to SNPs A and D. Existing fine-mapping methods do not have this limitation and could report (A,D) as the set of potential causal markers. Further, even if the contiguity requirement were relaxed (e.g. to allow prespecified non-contiguous groups), the need to prespecify groups to be tested may still limit the resolution to which associations can be refined.
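      The A-B-C-D example can be written out as a toy calculation. The sketch below is purely illustrative: the LD matrix is made up to match the scenario described, and neither function corresponds to anything in knockoffZoom or in any fine-mapping package. It contrasts the set of SNPs indistinguishable from the causal one with the smallest contiguous block that contains that set.

```python
def indistinguishable_set(r2, causal, threshold=1.0):
    """Indices of SNPs whose LD (r^2) with the causal SNP reaches threshold."""
    return sorted(j for j, value in enumerate(r2[causal]) if value >= threshold)

def smallest_contiguous_group(indices):
    """Smallest run of adjacent positions that covers all given indices."""
    return list(range(min(indices), max(indices) + 1))

snps = ["A", "B", "C", "D"]
# Hypothetical r^2 values: A and D in perfect LD, weak LD with B and C.
r2 = [
    [1.0, 0.1, 0.1, 1.0],
    [0.1, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.1],
    [1.0, 0.1, 0.1, 1.0],
]

candidates = indistinguishable_set(r2, causal=0)
block = smallest_contiguous_group(candidates)
print([snps[i] for i in candidates])   # non-contiguous candidate set
print([snps[i] for i in block])        # contiguous group covering it
```

      The candidate set has two SNPs while the contiguous block has four, which is the loss of resolution the comment is pointing at.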

      For this reason I think it is premature to claim "...KnockoffZoom unifies locus discovery and fine-mapping into a coherent statistical framework" (p15). Specifically, I think its abilities to solve the fine-mapping problem are not yet adequate to make this claim, and that studies interested in fine mapping will continue to want to use existing Bayesian fine-mapping methods like SUSIE (quite possibly as a complement to knockoffZoom) to refine associations as far as possible. In any case, the limited resolution that comes with testing contiguous pre-specified marker groups should be better highlighted in the text.

      Besides better highlighting this limitation in text, the comparisons with fine-mapping methods should be extended to quantify the effect. Currently the comparisons show the "width" of region identified by each method (Figure 4, right panel). However, fine-mapping methods do not strictly identify a region but a set of SNPs, so the figure should also compare the number of SNPs identified by each method. It would also be informative to show the minimum pairwise LD between the markers identified -- does knockoffZoom sometimes report markers not in high LD with one another due to the contiguity constraint? (Incidentally, the y axis on this figure is too large to see the interesting region, which for fine-mapping is <0.1 Mb. Getting to a region of 0.5 Mb is not really fine mapping in my opinion.)

      It would also be interesting to get the authors' perspective on how easy or difficult it might be for the contiguity requirement to be relaxed in the future. (Also the pre-specification requirement, although this seems more fundamental.)

      Other main comments

      • Some aspects of Table 1 are surprising to me. E.g. the number of BMI findings going from 24 -> 0 -> 15 as resolution increases. Shouldn't power increase as larger groups are tested? (I realize there are fewer tests as groups get bigger... so this is not a simple issue.) The hypothyroidism results are perhaps even weirder. Can you provide any intuitive explanation for why this might occur? Is it simply chance, since the knockoff procedure can produce different results if run multiple times?

      • The introduction criticizes the two-step approach as "not fully satisfactory because it requires switching models and assumptions in the middle of the analysis, obfuscating the interpretation of the findings and possibly invalidating type-I error guarantees." However, from Table 1 (see above comment), performing separate analyses at different resolutions appears to have similar problems regarding interpretation. The method that avoids "floating" discoveries at high resolution (Supplement S1B) seems to address this, but at a cost in power. What is that cost in power for the analyses here? How does Table 1 look if you apply that method? (with or without the 1.93 factor mentioned in the supplement).

      • As I understand it the output at each resolution depends on a single generation of the knockoff variables, and so the method will report different significant results each time it is run? Is this correct? If so, how different/similar are the results if you run things a second time with another knockoff realization? (It could suffice to do one trait twice to illustrate this.)

      • The notation (X,Xtilde) suggests that the knockoffs are always included after the real variables in the input file to the lasso/bigsnpr. In principle the location of the knockoffs in the input file should not matter when a convex method like lasso is being applied (with the exception of variables with LD=1, which is already dealt with here as a special case). However, if one were to replace the lasso with non-convex methods the non-random order of the markers into the method could lead to failure to control FDR (e.g. if the method has a bias towards choosing columns earlier in the list of covariates). Further, even for convex methods, there is some concern that numerical issues could arise to create this bias. As a safety check I suggest running the method with randomly ordered columns, or if that is too much of a pain simply reversing (Xtilde,X) to check it makes no difference.
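      The suggested safety check could be organised along the following lines. This is a minimal sketch with made-up data and a deliberately simple stand-in importance measure (absolute covariance with the trait); in a real check the stand-in would be replaced by the actual lasso fit, and none of the names below come from the paper's software. The idea is to score the columns in their original order, score them again after shuffling, map the shuffled scores back, and compare.

```python
import random

def marginal_scores(columns, y):
    """Stand-in importance measure: |covariance| of each column with y."""
    n = len(y)
    y_mean = sum(y) / n
    scores = []
    for col in columns:
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (v - y_mean) for c, v in zip(col, y)) / n
        scores.append(abs(cov))
    return scores

def scores_after_shuffling(columns, y, score_fn, seed=0):
    """Shuffle column order, score, then map scores back to original order."""
    order = list(range(len(columns)))
    random.Random(seed).shuffle(order)
    shuffled_scores = score_fn([columns[j] for j in order], y)
    restored = [0.0] * len(columns)
    for position, original_index in enumerate(order):
        restored[original_index] = shuffled_scores[position]
    return restored

# Toy data: 3 "real" columns followed by 3 "knockoff" columns.
rng = random.Random(1)
columns = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(6)]
y = [2.0 * a + rng.gauss(0, 1) for a in columns[0]]

direct = marginal_scores(columns, y)
shuffled = scores_after_shuffling(columns, y, marginal_scores)
max_discrepancy = max(abs(a - b) for a, b in zip(direct, shuffled))
print(max_discrepancy)
```

      The stand-in is order-invariant by construction, so the discrepancy here is zero; any non-trivial discrepancy when the real fitting procedure is plugged in would indicate the kind of order dependence the comment is worried about.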

      • I found the references to the Li-Stephens model vs fastPHASE model in the Supplement confusing. The description of Li-Stephens as "This HMM describes the distribution of genotypes as a patchwork of latent ancestral motifs" is incorrect - this describes the fastPHASE model. The Li-Stephens model describes each haplotype as a patchwork of other observed haplotypes, not latent motifs. As I understand the text all the models here are essentially fastPHASE models not Li-Stephens models. Please clarify.

      • The results in the supplement that reduce forward-backward calculations to O(K) and O(K^2) look similar to results that are already well established (e.g. Fearnhead and Donnelly, 2001, Estimating recombination rates from population genetic data, Genetics). Is there anything new here?

      • Please provide more details about the comparisons with other methods, including versions of software and the settings used. Ideally the code used to run the comparisons with other methods should be made available - even without documentation this can be invaluable for others to see what was done.

      Other comments/questions:

      • Getting the method working on problems of UK biobank scale is impressive, even though limited to "only" 591k SNPs. Would applying to ~50 million SNPs be feasible, and require about 100 times the computation? For coarse resolution it might not matter much to include the extra SNPs, but for fine-mapping it ultimately seems important to include as many SNPs as possible.

      • The paper discards tests where the knockoffs are very highly correlated with the original variables (which makes sense as they have no power). For intuition I would be interested to see the distribution of the correlation of knockoffs with the original variables (say at the finest resolution).

      • What is the MAF distribution of the variants analyzed here? Does the method work equally well for common vs rare variants? (I ask because the LD models may tend to work best for common variants.)

      • It would perhaps be helpful to cite (and contrast with) previous work that attempts to control error rates of conditional tests of groups of variables (e.g. work on hierarchical testing by Yekutieli, Meinshausen, Bühlmann etc).

      Minor:

      p6: "by likelihood of the trait" -> "in distribution of the trait"

      p11: "its intrinsic limitations discussed above" - I do not see where they were discussed.

      p12: "As the resolution increases, we report fewer findings" - not always!

      Table 1: I suggest giving resolution in terms of kb instead of Mb. Is 0.000 down to single SNP resolution?

      p16: "possible *to* construct"

      refs: markov -> Markov ; uk -> UK

    1. On 2019-07-30 22:49:30, user Charles Warden wrote:

      Thank you for putting together this paper.

      I was a little concerned when I saw "We estimate that a sample sequenced to the depth of 70 million total reads will typically have sufficient data for accurate gene expression analysis." for a couple reasons:

      1) For most gene expression projects, I think 10 million aligned reads is OK and 20-30 million total reads is often pretty safe. However, the exonic percentage varies by library protocol, and I'm not sure about the unique read conversion (or whether that conversion also varies between library protocols and sample types).

      2) I think the specifics have to be figured out for specific protocols (and raw data can be used for research purposes in different applications, or to check the validity of processed data).

      For 1), I think that was justified from both my own experience (with 50 bp single-end reads), as well as Liu et al. 2014 / Wang et al. 2011 / Tarazona et al. 2011. I noticed those papers while responding to this discussion.

      For 2), I don't exactly have a paper to show this, but I would say differential expression between groups requires testing / optimization per-project. So, you couldn't really define criteria that will work in all possible gene expression projects. While kind of messy, I have some notes from a Twitter discussion this past weekend.

      However, I think part of the discrepancy for 2) is different interpretations of "differential expression," "over-/under-expression," and "outlier expression". I am mostly thinking of the 10-20 million total polyA reads for differential expression and genes with clear expression / over-expression. If you are talking about a pattern that would be more likely to be a technical artifact, I can see how extra effort would be needed for gene expression analysis. For example, if you could have 2-3 biological replicates from slightly different sections of a sample (each with 10-20 million reads), that starts getting close to a total of 70 million total reads for that sample.

      Your Figure 1A and Figure 4C (and possibly Figure 3C) make me think there is more agreement than I originally expected from the abstract (since that emphasizes something with a threshold of 10-20 million MEND reads). However, I would say 90% specificity may be more reasonable for sensitivity (instead of 95%), for whatever metric is captured by that test. In general, I think 80% accuracy for a genomic signature is pretty good, and I think you need to be careful about over-fitting. That was part of the Twitter discussion that I linked above, but that is also described in my genomics for "hypothesis-generation" blog post.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Shan et al. seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA-seq, they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective, they provide evidence that CHI3L serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs, they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. The article is well written, potentially highlights a new mechanism of macrophage dysfunction in MASH, and the authors have addressed some of my concerns. However, some concerns about the current data continue to limit my enthusiasm for the study. Please see my specific comments below.

      Major:

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysMCre) experiments is flawed. The authors have added new data that suggests LysM-Cre only leads to a 40% reduction of Chil1 in KCs and that this explains the difference in the phenotype compared to the Clec4F-Cre. However, this claim would be made stronger using flow-sorted TIM4hi KCs, as the plating method can lead to heterogeneous populations and thus an underestimation of knockdown by qPCR. Moreover, in the supplemental data the authors show that Clec4f-Cre x Chil1flx leads to a significant knockdown of this gene in BMDMs. As BMDMs do not express Clec4f, this data calls into question the rigor of the data. I am still concerned that the phenotype differences between Clec4f-Cre and LysM-Cre are not related to the degree of knockdown in KCs but rather some other aspect of the model (microbiota, etc.). It would be more convincing if the authors could show the CHI3L reduction via IF in the tissue of these mice.

      We thank the reviewer for these constructive comments. We have performed FACS sorting of KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) or MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) from Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> or Clec4f<sup>∆Chil1</sup> mice, respectively. Compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in KCs from Clec4f<sup>∆Chil1</sup> mice but did not differ in MoMFs (Revised Figure S3B). In addition, compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in MoMFs from Lyz2<sup>∆Chil1</sup> mice but by only roughly 40% in KCs (Revised Figure S5B). These revised data support the phenotypic difference between Lyz2-CKO and Clec4f-CKO mice.

      We agree with the reviewer that the significant knockdown of Chil1 in BMDMs from Clec4f<sup>∆Chil1</sup> mice is confusing. To maintain the rigor of our data, we have removed this part from our manuscript.

      Additionally, we performed immunofluorescence staining to detect Chi3l1 expression in liver tissues of these mice. The results show a reduction of Chi3l1 expression in KCs (TIM4+F4/80+ cells) of both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice, with a more pronounced decrease in Clec4f<sup>∆Chil1</sup> mice (Author response image 1).

      Author response image 1.

      The expression of Chi3l1 in liver tissues of Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice. Immunofluorescent staining to detect Chi3l1 (green) expression in liver sections of Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice under normal chow diet. TIM4 (KC marker, white); F4/80 (macrophage marker, red); nuclei were counterstained with DAPI. Scale bar = 20 µm and 10 µm (inset).

      (2) Figure 4 suggests that KC death is increased with KO of Chil1. The authors have added new data with TIM4 that better characterizes this phenotype. The lack of TIM4 low, F4/80 hi cells further supports that their diet model is not producing any signs of the inflammatory changes that occur with MASLD and MASH. This is also supported by no meaningful changes in the CD11b hi, F4/80 int cells (predominantly monocytes and early MdMs). It is also concerning that loss of KCs does not lead to an increase in Mo-KCs as has been demonstrated in several studies (PMID: 37639126, PMID: 33997821). This would suggest that the degree of resident KC loss is trivial.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the degree of resident KC loss is trivial, since 60% of KCs have died at 16 weeks compared with week 0 (Revised Figure 5D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure 5D), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4-MoKCs. This timing is supported by prior studies, where TIM4KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (PMID: 33440159; PMID: 32888418). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      (3) The authors demonstrated that Clec4f-Cre itself was not responsible for the observed phenotype, which mitigates my concerns about this influencing their model.

      We thank the reviewer for this comment and are pleased they agree that our control experiment using Clec4f-Cre alone confirms that the phenotype is specific to our genetic manipulation and not an artifact of the Cre driver.

      (4) I remain somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. The authors agree that mRNA levels of this gene are hard to see in the datasets; however, they argue that IF demonstrates clear evidence of the protein, CHI3L. The IF in the paper only shows a high-power view of one KC. I would like to see what percentage of KCs express CHI3L and how this changes with HFHC diet. In addition, showing the knockout IF would further validate the IF staining patterns.

      We thank the reviewer for their thoughtful and constructive feedback. We agree that our initial conclusion regarding Chil1 expression in liver macrophages relied heavily on prior observations and was not sufficiently supported by the data presented. In response, we have revised our conclusion to state: "Hepatic macrophages express Chi3l1 and upregulate its expression following HFHC feeding." (Revised manuscript, page 4, lines 136-137)

      To strengthen this finding, we have replaced the original high-power image of a single Kupffer cell with a representative low-power view showing multiple F4/80+ macrophages (Revised Figure 1A). Furthermore, we performed quantitative colocalization analysis, which revealed that under normal chow diet (NCD), approximately 8% of F4/80+ macrophages are Chi3l1-positive. This proportion significantly increases to 15% upon HFHC feeding (Revised Figure 1A).

      Additionally, to validate the specificity of the Chi3l1 immunofluorescence signal, we have included staining of liver sections from Chil1 knockout mice. In contrast to wild-type mice, Chi3l1 signal was completely absent within F4/80+ macrophages in Chil1<sup>-/-</sup> mice, confirming the specificity of the staining (Revised Figure 1B, Revised manuscript, page 4, lines 152-157).

      Minor:

      (1) The authors have answered my question about liver fibrosis. In line with their macrophage data, their diet model does not appear to induce even mild MASH.

      We thank the reviewer for this observation. We agree that under our HFHC dietary conditions, the mice do not develop MASH pathology. However, we believe this early-stage model is a strength of our study, as it allows us to dissect the initial role of the Chi3l1-glucose interaction in regulating Kupffer cell fate during early MASLD, prior to the onset of significant fibrosis. This approach enables us to capture early macrophage adaptations (such as Chi3l1 upregulation) that might otherwise be masked or become secondary to the overt inflammation and scarring characteristic of late-stage MASH models.

      Reviewer #2 (Public review):

      In the revised version of the manuscript, the authors have attempted to address my questions, however, a number of my original concerns still remain.

      Firstly, I had asked for a validation of the different CRE lines used - Lysm and Clec4f. The authors have now looked at BMDMs and KCs (steady state) from these animals. They conclude LysM only targets BMDMs not KCs, while CLEC4F targets both KCs and BMDMs. This I do not understand, BMDMs do not express CLEC4F so why are they targeted with this CRE? Additionally, BMDMs are not the correct control here, rather the authors should look at the incoming moMFs in the livers of these mice in the MASLD setting. Similarly, the KO in the MASLD KCs should be verified.

      We thank the reviewer for these constructive comments. We have performed FACS sorting of KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) or MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) from Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> or Clec4f<sup>∆Chil1</sup> mice fed NCD or HFHC for 4 weeks, respectively. Compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in KCs from Clec4f<sup>∆Chil1</sup> mice but did not differ in MoMFs at both 0 and 4 weeks (Revised Figure S3B). In addition, compared with Chil1<sup>fl/fl</sup> mice, mRNA levels of Chil1 were reduced by more than 90% in MoMFs from Lyz2<sup>∆Chil1</sup> mice but by only roughly 40% in KCs at both 0 and 4 weeks (Revised Figure S5B). These revised data support the phenotypic difference between Lyz2-CKO and Clec4f-CKO mice.

      Then I had asked for validation of macrophage expression of Chil1 in other MASLD human and mouse datasets. The authors have looked into this, but the data provided do not suggest it is highly expressed by these cells either in the other mouse models or in the human. Nevertheless, they include a statement suggesting a similar expression pattern (although also being expressed by other cells). This is not an accurate discussion of the data and hence must be revised. This also prompted me to take another look at their data and this has left me querying the data in Figure 1D. Is the percent expressed 1%? In Figure 1C the scale goes from 0-100 but here 0-1. If we are talking about expression in 1% of cells which would fit with the additional public mouse data now analysed then how relevant are any of these claims? How sure are the authors that the effects seen are through KCs/moMFs? In figure 1D all cells profiled by scRNA-seq should be shown not just MFs to get a better sense of this data. What is macrophage expression of Chil1 compared with all other liver cells?

      We thank the reviewer for the thoughtful feedback. We agree that the expression pattern of Chil1 should be described more accurately. To address this point, we examined four additional publicly available scRNA-seq datasets, including two mouse MASLD models and two human MASLD datasets (Author response image 2). Across these studies, the cell type with the highest Chil1 expression varied, whereas Chil1 transcripts were detected at relatively low frequency in macrophages (~1% of cells; Author response image 2C, E, K). To better present these data, we regenerated the UMAP plots to include all captured liver non-parenchymal cells, defined using the top two lineage-specific markers (Author response image 3A–B). Consistent with Figure 2A–C, violin plots show that Chil1 is highly expressed in neutrophils, with only modest expression detected in macrophages (Author response image 3C). Further analysis of monocyte/macrophage subsets indicates that approximately 1% of MoMFs or KCs express Chil1 (Author response image 3D–F). As the reviewer noted, the y-axis in Author response image 3F ranges from 0–1%, reflecting the low transcriptional detection frequency of Chil1 in macrophages, which is consistent with the additional public datasets analyzed.
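
      The "percent expressed" quantity discussed here can be sketched as the share of cells in each cluster with at least one detected transcript. The snippet below is a minimal illustration under that assumption; the function name and the toy counts are hypothetical and are not taken from any of the datasets cited in this response.

```python
from collections import defaultdict


def percent_expressing(counts, clusters):
    """Percent of cells per cluster with at least one transcript detected."""
    total = defaultdict(int)
    detected = defaultdict(int)
    for count, cluster in zip(counts, clusters):
        total[cluster] += 1
        if count > 0:
            detected[cluster] += 1
    return {c: 100.0 * detected[c] / total[c] for c in total}


# Hypothetical toy data (not real measurements):
# 200 macrophages, 2 with a detected transcript; 10 neutrophils, 8 detected.
counts = [1, 2] + [0] * 198 + [5] * 8 + [0] * 2
clusters = ["Macrophage"] * 200 + ["Neutrophil"] * 10

result = percent_expressing(counts, clusters)
print(result)
```

      A detection frequency computed this way depends on sequencing depth and dropout, which is one reason a low value need not imply absent protein.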

      We also recognize that mRNA detection by scRNA-seq does not necessarily reflect protein abundance. Therefore, we assessed Chi3l1 protein expression in hepatic macrophages using immunofluorescence staining for F4/80, TIM4, and Chi3l1 in liver sections from mice fed either normal chow diet (NCD) or HFHC diet. These analyses show that Chi3l1 protein is detectable in both KCs (TIM4<sup>+</sup>F4/80<sup>+</sup>) and MoMFs (TIM4<sup>-</sup>F4/80<sup>+</sup>) (Revised Figure 1A). Quantitative colocalization analysis revealed that under NCD conditions, approximately 8% of F4/80<sup>+</sup> macrophages are Chi3l1-positive, which increases to ~15% following HFHC feeding (Revised Figure 1A). To confirm antibody specificity, we additionally performed staining in Chil1 knockout mice. In contrast to wild-type mice, Chi3l1 signal was completely absent in F4/80<sup>+</sup> macrophages from Chil1<sup>-/-</sup> mice, validating the specificity of the staining (Revised Figure 1B). Together, these results suggest that low-abundance Chil1 transcripts may be under-detected by scRNA-seq, whereas immunofluorescence captures accumulated protein. Importantly, our functional experiments using Clec4f-Cre–mediated deletion directly support that the observed phenotypes are mediated through Kupffer cells, regardless of expression levels in other liver cell types.

      In response to the reviewer’s comments, we have made the following revisions:

      (1) Softened our conclusion to: “Hepatic macrophages express CHI3L1 and upregulate its expression following HFHC feeding” (Revised manuscript, page 4, lines 136–137).

      (2) Included representative low-magnification images showing multiple F4/80<sup>+</sup> macrophages along with quantitative analysis (Revised Figure 1A).

      (3) Added immunofluorescence staining of Chil1<sup>-/-</sup> liver sections demonstrating complete absence of Chi3l1 signal in F4/80<sup>+</sup> macrophages, validating antibody specificity (Revised Figure 1B).

      (4) Regenerated UMAP plots to display all liver non-parenchymal cells and clearly indicate the low detection frequency of Chil1 transcripts in macrophages (Author response image 3).

      (5) Revised the relevant text to more accurately describe Chil1 expression patterns in hepatic macrophages (Revised manuscript, page 4, lines 136–157).

      Author response image 2.

      Analysis of Chil1 expression in additional single-cell RNA sequencing datasets. (A-C) Chil1 expression in a mouse model of NASH. (A) t-SNE projection of cell clusters from scRNA-seq data (GSE1283338) of livers from C57BL/6J mice fed a control or NASH diet for 30 weeks. (B) Dot plot showing scaled Chil1 expression across all identified cell clusters. (C) Dot plot of scaled Chil1 expression after excluding the neutrophil cluster, highlighting expression in macrophage populations. Analyzed cell clusters and cell numbers: KC_H (healthy, 1178); KC3_Control (1142); KC_N (NASH, 1045); KN_RM (recruited macrophage in KC niche, 950); Proliferating_KC (364); PDC_Control (356); Ly6CHi_RM (320); LSEC (299); NK_NKT (393); B_cell (244); DC_1 (107); DC_2 (118); Ly6CLo_RM (127); Hepatocyte (57); PDC_NASH (46); Neutrophil (21). (D-E) Chil1 expression during NAFLD progression in a mouse Western diet model. (D) t-SNE projection of cell clusters from scRNA-seq data (GSE156059) of livers from C57BL/6J mice fed a Western diet with fructose/sucrose for 12, 24, and 36 weeks. (E) Dot plot showing scaled Chil1 expression across all identified cell clusters. Analyzed cell clusters and cell numbers: capsule macs (250), LAMs (1419), Ly6chi monocytes (6912), mac1 (638), moKCs (767), Patrolling monocytes (690), Prolif.macs (521), Resident KCs (3629), Transitioning monocytes (3615). (F-H) Chil1 expression in human cirrhotic liver biopsies. (F) t-SNE projection of cell clusters from scRNA-seq data (GSE136103) of healthy and cirrhotic human liver samples. (G) Dot plot showing scaled Chil1 expression across major cell lineages. (H) Dot plot of scaled Chil1 expression specifically within the mononuclear phagocyte (MP) population. Analyzed cell clusters and cell numbers: B cell (1951); cycling (967); Epithelia (3751); ILC (10091); mast cell (2511); Mesenchyme (2382); MP (10874); pDC (317); Plasma cell (877); T cell (19076). (I-K) Chil1 expression in a human NAFLD explant. (I) t-SNE projection of cell clusters from scRNA-seq data (GSE190487) of a human NAFLD liver explant. (J) Dot plot showing scaled Chil1 expression across all identified cell clusters. (K) Dot plot of scaled Chil1 expression within the MP subpopulations. Analyzed cell clusters and cell numbers: B cell (1278); Cycling (152); MP (2897); pDC (391); Plasma cell (85); T cell (1551); KC (403); SAMac (scar-associated macrophages, 723); TM (tissue monocytes, 1265).

      Author response image 3.

      Hepatic macrophages express Chi3l1. (A-D) Wild-type C57BL/6J mice were fed either a normal chow diet (NCD) or HFHC for 16 weeks. NPCs were isolated and subjected to BD Rhapsody scRNA sequencing. (A) Uniform manifold approximation and projection (UMAP) plots illustrate the clustering of NPCs from the livers of mice fed NCD and HFHC. Major cell types are colored. (B) Heatmap showing the mean expression of the top two markers of each cell type. (C) Violin plots show the RNA expression of Chil1 between NCD and HFHC livers in each cell cluster. (D) UMAP plots depict the clustering of monocytes/macrophages in the livers of mice fed NCD and HFHC. Cell clusters are color-coded. (E) Dot plot displays the scaled gene expression levels of lineage-specific marker genes in different cell clusters. (F) Dot plot shows the scaled gene expression levels of Chil1 in the indicated cell clusters.

      The cell death had also previously concerned me, in that 40-60% of KCs were TUNEL +ve. I do not understand how 60% are +ve at 8 weeks but then they have more or less the same number of TIM4+ cells at 16 weeks? How can this be? Why do the TUNEL +ve cells not die? This concern remains as I don't understand how they reached these numbers given the images. Additionally, larger images were not provided to be sure that the images in the figure are representative. Now in the images provided, there are clearly cells which are TIM4+ where the TUNEL does not overlap; likely it is in an LSEC or other neighbouring cell. Indeed, also taking Fig S11b as an example, there are ~7 KCs and at best 1 expresses TUNEL, so how do they get to 60%?

      We thank the reviewer for this constructive feedback. We agree that the sustained TUNEL positivity without corresponding KC depletion presents an apparent paradox. Based on our data, we propose that TUNEL-positive KCs represent cells in a prolonged stressed or pre-apoptotic state rather than undergoing immediate clearance. This interpretation is supported by the relatively stable TIM4+ cell numbers between 8 and 16 weeks, which would be inconsistent with rapid cell death and removal. Previous studies (PMID: 33440159; PMID: 32888418) have similarly documented gradual KC loss during MASLD progression, supporting our view that KC death occurs over an extended timeframe rather than acutely.

      Regarding quantification concerns, we acknowledge that the representative images in the original figure may have been misleading. To address this, we have now quantified KC apoptosis using low-magnification fields across multiple liver sections to ensure statistical rigor. Figure S11B (now Revised Figure S9B) presents these data, showing that under NCD conditions, KC apoptosis rates are minimal in both genotypes. Following HFHC feeding, apoptosis rates are comparable between Chil1<sup>fl/fl</sup> and Lyz2<sup>ΔChil1</sup> mice. Importantly, we have replaced all TIM4/TUNEL co-staining images with low-magnification representative images in the revised figures (Revised Figure 1A, 1B, 5E, S9A, S9B). These images better reflect the quantitative data and confirm that the originally highlighted high-magnification fields were not representative of global apoptosis rates.

      Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      Here are my comments:

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID:31250532) in the context of fibrosis, which is a main observation from the current study.

      We thank the reviewer for raising this important point and acknowledge previous studies linking Chi3l1 to macrophage function in liver disease. However, several aspects of our work extend beyond these prior reports. First, although global Chi3l1 deficiency has been shown to promote macrophage apoptosis in toxin-induced fibrosis models (PMID: 31250532), our study demonstrates that Chi3l1 differentially regulates the fate of two distinct hepatic macrophage subsets, embryo-derived Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs), in MASLD. To our knowledge, this subset-specific regulation of hepatic macrophages has not been previously described. Second, we identify a previously unrecognized metabolic mechanism by which Chi3l1 regulates macrophage survival. Specifically, we find that Chi3l1 binds glucose and promotes glucose uptake, thereby protecting the highly glucose-dependent KCs from metabolic stress–induced death, while exerting minimal effects on MoMFs. This mechanism is distinct from the previously reported Fas/Akt-mediated pathway (PMID: 31250532) and highlights a metabolic checkpoint controlling macrophage subset-specific vulnerability. Third, our findings reveal context- and cell type-dependent roles of Chi3l1. While myeloid-specific deletion of Chi3l1 has been reported to ameliorate steatohepatitis and fibrosis (PMID: 37166517), our KC-specific deletion model shows that loss of Chi3l1 in KCs exacerbates disease, indicating a previously unrecognized protective role of Chi3l1 in KCs during early MASLD. Together, these findings provide new insights into macrophage subset-specific regulation, identify a novel glucose-related metabolic mechanism, and reveal context-dependent functions of Chi3l1 in MASLD pathogenesis.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      We thank the reviewer for raising this important point regarding the specificity of the genetic models and the apparent discrepancy with the study by Feldstein and colleagues (PMID: 37166517). To address these concerns, we performed additional experiments to directly assess the efficiency and cell-type specificity of Chi3l1 deletion in our models.

      (1) Efficiency and specificity of LysM-Cre and Clec4f-Cre models

      We isolated KCs (CD45<sup>+</sup> F4/80<sup>hi</sup> CD11b<sup>low</sup> TIM4<sup>hi</sup>) or MoMFs (CD45<sup>+</sup> F4/80<sup>low</sup> CD11b<sup>hi</sup> Ly6G<sup>-</sup> TIM4<sup>-</sup>) by FACS from Chil1<sup>fl/fl</sup>, Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice fed either NCD or HFHC diet. Consistent with the known specificity of these Cre lines, Clec4f-Cre resulted in >90% reduction of Chil1 mRNA in KCs with no significant change in MoMFs (Revised Figure S3B), confirming efficient KC-specific deletion. In contrast, LysM-Cre reduced Chil1 expression by >90% in MoMFs but only ~40% in KCs (Revised Figure S5B). These data support the reviewer’s concern that LysM-Cre mediates incomplete recombination in KCs, whereas the Clec4f-Cre model provides KC-specific deletion, explaining why the phenotype observed in Lyz2<sup>∆Chil1</sup> mice is relatively modest.

      (2) Relationship to the study by Feldstein et al.

      We agree that our LysM-Cre results appear different from those reported by Feldstein and colleagues. However, considering the new recombination data and differences in disease models, we believe the findings are complementary rather than contradictory. First, the disease models differ substantially. Feldstein et al. used a CDAA-HFAT diet for 10 weeks, which rapidly induces severe inflammation and fibrosis, whereas our study employed a long-term HFHC diet, modeling the more gradual metabolic progression of MASLD. These distinct disease contexts may engage different CHI3L1-dependent pathways. Second, the mechanistic focus differs. Feldstein et al. reported that myeloid Chi3l1 promotes steatohepatitis and fibrosis through inflammatory macrophage recruitment and IL13Rα2-mediated stellate cell activation. In contrast, our study identifies a metabolic mechanism in which CHI3L1 binds glucose and promotes glucose uptake, protecting metabolically vulnerable KCs from stress-induced death. Finally, and importantly, KC-specific deletion using Clec4f-Cre recapitulates the key phenotypes observed in our study, including effects on KC survival and metabolic regulation. This confirms that the observed effects are KC-autonomous and not due to broader Cre activity in other myeloid populations.

      Together, these additional experiments clarify the recombination efficiency of our models and demonstrate that our conclusions are supported by KC-specific genetic evidence.
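
      The recombination efficiencies quoted above (>90% and ~40%) are percent reductions in Chil1 mRNA relative to the Chil1<sup>fl/fl</sup> control. A minimal sketch of that calculation follows; the function name and the relative-expression values are hypothetical placeholders chosen only to mirror the stated percentages, not measured data.

```python
def percent_reduction(control_level, knockout_level):
    """Percent reduction of knockout expression relative to control."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return 100.0 * (control_level - knockout_level) / control_level


# Hypothetical relative expression values, with the floxed control set to 1.0.
kc_clec4f_cko = percent_reduction(1.0, 0.08)  # illustrates a ">90%" reduction
kc_lyz2_cko = percent_reduction(1.0, 0.60)    # illustrates a "~40%" reduction

print(round(kc_clec4f_cko, 1), round(kc_lyz2_cko, 1))
```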

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      We thank the reviewer for this valuable suggestion. To address this point, we tested our key findings in an additional MASH model using a methionine–choline-deficient (MCD) diet. First, we examined Chi3l1 expression in this model. Wild-type mice fed an MCD diet for 6 weeks showed significantly increased Chi3l1 mRNA and protein levels in liver tissues compared with NCD controls, confirming diet-induced upregulation (Revised Figure 3A–B). To determine the functional contribution of Kupffer cell–derived Chi3l1, we subjected Clec4f<sup>ΔChil1</sup> mice and Chil1<sup>fl/fl</sup> controls to MCD feeding for 6 weeks. Body weight was comparable between genotypes throughout the feeding period (Revised Figure 3C). However, KC-specific deletion of Chi3l1 significantly exacerbated MCD diet–induced liver pathology, including increased steatosis, inflammation, and fibrosis, as indicated by higher MASLD activity scores, enhanced Oil Red O staining, increased Sirius Red deposition, and elevated α-SMA expression (Revised Figure 3D). Consistent with these histological findings, Clec4f<sup>ΔChil1</sup> mice exhibited an increased liver index, whereas serum ALT levels remained comparable between groups, suggesting increased hepatic lipid accumulation rather than aggravated hepatocellular injury (Revised Figure 3E). In addition, serum and hepatic triglyceride levels and serum cholesterol were significantly elevated, while hepatic cholesterol levels were not significantly different from controls (Revised Figure 3E). Together, these results validate our findings in an independent MASH model and further support a protective role for Kupffer cell–derived Chi3l1 in limiting steatosis and disease progression (Revised manuscript, page 5, lines 188-205).

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

      We thank the reviewer for raising this important point. We agree that additional human validation would further strengthen the translational relevance of our findings. We initially attempted to examine macrophage cell death in human liver samples by performing TUNEL and F4/80 co-staining on human liver cancer tissues. However, we did not detect clear colocalization in these samples. We speculate that this may reflect differences in disease context and stage, as the available samples represent end-stage liver disease, whereas our study focuses on early MASLD progression. Despite this limitation, we provide several lines of evidence supporting the human relevance of our findings. First, analysis of multiple public human MASLD scRNA-seq datasets demonstrates Chi3l1 expression in hepatic macrophages (Figure 2F–K). Second, analysis of public bulk RNA-seq datasets shows that Chi3l1 expression positively correlates with MASLD disease activity and progression (Revised Figure 1E–F). Third, our observations are consistent with previous clinical studies reporting elevated CHI3L1 levels in patients with MASLD/MASH and advanced liver disease. We acknowledge that functional validation in primary human macrophages or human liver tissues would further strengthen the translational significance of this work. This limitation and future direction have now been added to the Discussion (Revised manuscript, page 10, lines 409–411).

      Comments on revisions:

      The authors have done a thorough job addressing my comments. However, I am not convinced about the MCD diet model, which is somewhat hidden in the Supplementary Files. Neither seems MASH different nor are any fibrosis data shown to support the conclusions. I am not satisfied with this part of the revised manuscript, and I do not agree that the second MASH model would support the conclusions.

      We thank the reviewer for their continued careful evaluation and for highlighting the need for clearer presentation of the MCD model data. To address this concern, we have substantially revised this section of the manuscript. First, the MCD model results have now been moved from the Supplementary Figure to a new main figure (Revised Figure 3) to improve visibility and clarity. Second, we have added additional fibrosis analyses, including Sirius Red staining and α-SMA immunostaining, to directly assess fibrotic changes. These analyses show that MCD feeding induces significant collagen deposition in control mice and that fibrosis is further increased in Clec4f<sup>ΔChil1</sup> mice (Revised Figure 3D). Importantly, the MCD model recapitulates the key phenotypes observed in the HFHC model, with KC-specific Chi3l1 deletion leading to increased MASLD progression. These findings support the conclusion that the protective role of Kupffer cell–derived Chi3l1 is not restricted to a single dietary model, but is observed across distinct models of steatohepatitis. We hope that these revisions clarify the results and strengthen the evidence supporting our conclusions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor:

      Line 73 - should be moMfs not moKCs

      We thank the reviewer for this helpful comment. The term moKCs was used intentionally in line 73 to refer to monocyte-derived Kupffer cells, rather than MoMFs (monocyte-derived macrophages). To avoid potential confusion, we have clarified the terminology in the revised manuscript.

      Methods: diet is mentioned for 6 weeks but for HFHC should be 16.

      The correction has been made in the Methods section (page 3, line 115).

      Liver/body weight ratios are >3 then I think it is body/liver weight ratio?

      We thank the reviewer for this query. The reported values represent liver-to-body weight ratios, calculated as (liver weight ÷ body weight) × 100%. A value of ~3% is consistent with the expected range for mice with MASLD-associated hepatomegaly.

      This clarification has been added to the revised figure legend.

      Figure 5F - what happens in Clec4f-CRE mice fed HFHC?

      We thank the reviewer for this question. Western blot analysis showed that HFHC feeding upregulated Chi3l1 protein in the livers of Clec4f-Cre mice (Author response image 4), similar to the increase observed in wild-type mice.

      Author response image 4.

      The expression of Chi3l1 in serum of Clec4f cre mice. (A) Western blot to detect Chi3l1 expression in murine serum of Clec4f cre mice before and after HFHC feeding. n=3 mice/group.

    1. On 2019-07-01 19:48:25, user Julius Adler wrote:

      July 2, 2019: some changes to April 18, 2019

      Drosophila Mutants that Are Motile but Respond Poorly to All Stimuli Tested: Mutants in RNA Splicing and RNA Helices, Mutants in The Boss

      Lar L. Vang and Julius Adler

      The following idea was presented in 2011 in “My Life with Nature” by Julius Adler, p. 60:

      “Recently I conceived a new idea. “The Boss is the thing inside every organism – humans, other animals, plants, microorganisms – that is in charge of the organism. I don’t mean this in any mystical or spiritual or religious sense, but rather I mean it in terms of chemistry and physics. You may think that The Boss is a wild idea, and certainly the evidence for it is poor, but I think it’s true, and at least it’s a hypothesis to be tested.”

      Now we have tested this idea:

      Adler and Vang (2016) and Vang and Adler (p. 13, 2018) reported Drosophila mutants that lack all responses to external and internal stimuli at 34 degrees, but at room temperature these mutants are not deficient. This means that activity by The Boss can be eliminated at 34 degrees but the activity is still present at room temperature.

      And they (Vang and Adler, 2016) reported a Drosophila mutant that lacks responses to all stimuli tested at both 34 degrees and room temperature. That indicates that this mutant lacks behavioral action by The Boss.

      (It must be admitted that the defects in these mutants were caused by defects in The Boss.)

      What is The Boss? It is a mechanism that acts as described in Figure 10 of Adler, 2016:

      https://uploads.disquscdn.c... https://uploads.disquscdn.c...

      Fig. 10 of Adler, 2016

      The idea that each organism has something in control of the organism is novel. Before this, it was believed that each organism has properties that are largely independent of each other. Now it is suggested that all the properties are controlled by a single factor, The Boss, which directs both the interior and the outside of the organism. The Boss is to be found in humans, other animals, plants, and microorganisms. The evidence for this idea is incomplete.

      Adler J (2011) My life with nature. Ann Rev Biochem 80 42-70.

      Adler J (2016). A search for The Boss: The thing inside each organism that is in charge. Anat Physiol Biochem Int J Vol.1, 2016.

      Adler J, Vang LL (2016) Decision making by Drosophila flies. bioRxiv March 24, 2016.

      Vang LL, Adler J (2018) Drosophila mutants that are motile but respond poorly to all stimuli tested: Mutants in RNA splicing and RNA helices, mutants in The Boss. bioRxiv October 1, 2018.

    1. On 2019-05-19 23:04:36, user Charles Warden wrote:

      Thank you for putting together this pre-print.

      I am sure that there are some situations where higher read coverage can be beneficial. Admittedly, I think other applications like mutation calling would have a relatively greater need for more reads (and that would depend upon the evenness of coverage for your library type, and possibly what genes you have the greatest need to check mutations in), but I think it is perfectly reasonable to focus on the differential expression part for one paper.

      That said, when I saw the tweet mentioning "We find > 70% published studies would have benefitted from increasing number of reads sequenced", I was a little worried about the influence it could have on readers for the following reasons:

      1) If somebody is considering purchasing a Desktop sequencer for RNA-Seq analysis, I think 2-6 Million reads to cover genes with above average expression may be a better option than using targeted gene panels. For example, if you do re-analysis (with unique read counts and updated differential expression methods, like DESeq2, limma-voom, etc.), I think the MiSeq data from the cuffdiff2 paper shows reasonable results (for treatments with clear gene expression changes).

      2) In most cases, I am more concerned about people having replicates than needing more reads (at least for gene expression).

      I apologize that I think it may be a little while before I can focus more on point #1, but I tried to take a quick look at this paper.

      I think that it is great that you performed benchmarks with DESeq2, edgeR, and limma-voom (although maybe you want to change “limma” to “limma-voom” in the abstract?). I apologize for not being able to find this on the superSeq page (although I did find the reminder for the previous biocLite() command for dependencies to be helpful), but are tables of pre-processed counts (and their gene lists with all 3 methods) readily available for the 1,021 contrasts?

      I am also glad that you are looking at differentially expressed gene counts (and not just unique read sequences) for your rarefaction plot, since I think that is a more relevant measure for whether you get functionally relevant results. However, in terms of the vignette example, I think the difference between 1338.968 “Estimated number of discoveries” at read depth of 1 and 1888.286 “Estimated number of discoveries” at read depth of 3x is within the range that could be achieved from changing the p-value method and/or changing the FDR cutoff (from 0.05 to 0.25 or 0.50, for example).

      Similarly, I am concerned about some of the maximum gene counts in the pre-print, which look like pretty much the entire genome in Figure 2 (and are already above 2000-4000 genes in the theoretical example in Figure 1). I think the best balance for functional enrichment is often around 1000-2000 total genes (~5-10% of genes). So, I would be interested in knowing if your framework can answer a question like “What is the range of reads needed to identify 1000 or 2000 differentially expressed genes?”

      While some treatments have greater effects, I think 10-20 million reads for a human polyA library is probably usually OK (and perhaps double that for a ribosome-depleted library, with a lower exonic percentage). I think that is pretty much what Figure 1A shows (although that looks like close to 30 million reads), but I am wondering if there is a figure derived from your ~1000 comparisons (and/or a parameter that can be added to plot pre-computed values in the R package).

      Also, am I correctly understanding that you downloaded pre-processed counts? Did you look at some of the most extreme differences and test reprocessing the samples to see if that helped the differentially expressed gene counts become more similar? There are situations where I would prefer to start from FASTQ files and process all samples the same way.

      For example, 60,000 in Figure 5 seems like it probably includes transcripts – is it possible to only look at unique gene-level counts (that is admittedly what I would be interested in checking)? Or, are there outliers that can be excluded if you only look at human and mouse experiments (trying to control for annotation effects)? Also, I’m not hugely concerned about the annotation in model organisms like yeast or fly, but the total number of genes in the genome is going to have some effect (both in terms of the effect on the differential expression models, as well as having very different genome sizes and maximum gene counts).

      Finally, going back to my original point #2, I would expect replicates should help reduce false positives. With large enough sample sizes, I would expect to pick up more subtle effects. However, with 1-3 replicates, I think fewer genes to narrow down candidates may be beneficial (rather than increasing the number of genes identified). For example, at an estimated FDR of 0.05, how many genes are identified between biological replicates for the same group (to see if increased sensitivity may actually be affecting the accuracy of the estimation to allow more false positives, which seems likely if you are identifying >20% of the genome, in my opinion).

      Or, it is a slightly different point, but I think 6 replicates are used in Figure 3A. If 6 replicates exist for an experiment, what is the effect of having 3 replicates at the current coverage versus 6 replicates at halved coverage? Sometimes, getting people to even do comparisons with triplicates can be a challenge.

      I apologize that this is kind of a long comment, but that is because I think this is an important topic. When I get to the point of being able to post some pre-prints, I realize that answering questions from long commenters can take time, but I think that is very important for the scientific community (in terms of helping put together the best possible paper for peer-review).

    1. On 2019-05-10 06:28:34, user Milind Watve wrote:

      Our manuscript was rejected by a leading journal with comments by three reviewers. We expressed our desire that, in the spirit of transparency of the review process, the reviewers’ comments and our responses should be allowed to be posted and made public. Two of the three reviewers and the journal editors agreed to the request and therefore we are posting their comments and our responses to them here. Although the journal editors consented to post them, on the advice of Biorxiv admin, we are keeping the journal, editors as well as reviewers anonymous.

      Rejection is a part of the game and we respect the editors’ decision. However, the reasons for rejection should be transparent so that readers can make their own judgment about the fairness of the editorial process. Transparency would make the review process more responsible and we express our full support to it.

      We thank the editors and all the three reviewers for their inputs. We would have been happier if reviewer 1 also agreed to post his comments.

      Milind


      Reviewer #1:

      Did not respond to the request for consent to post the comments.

      Reviewer #2:

      The authors provide a systematic literature study on the question: “does insulin signaling decide glucose levels in the fasting steady state?”. The answer is a clear no. Although the overview looks solid (I am not an expert in all the literature on glucose homeostasis, so I cannot decide on that, really), the conceptual aspects of this study are rather weak. This may very well reflect the general weakness in conceptual thinking in biomedical sciences, but certainly the control engineers that build feedback control systems for artificial pancreas applications will find the answer trivial. The authors use biologically fuzzy terminology, such as “drivers” and “navigators”, CSS and TSS, and later r and K strategies, where terminology of control theory would be most appropriate. Not a single reference to control theory, where an integral feedback principle could explain much, if not all of the observations, it seems.

      Response: The reviewer appropriately captures the state of control theory and models by the words “much, if not all”. All the models of glucose homeostasis today explain only a small part of the demonstrated features of glucose homeostasis and of diabetes. The “much” is a very small fraction of reality and most models stop at explaining only some of the features. Not being able to explain a certain empirical finding does not immediately invalidate a model. However, a direct contradiction with empirical findings certainly raises questions about the model. The model suggested by the reviewer below is an excellent example of it.

      For illustration: if the CSS model that the authors use in the supplements is slightly modified by:

      dGlc/dt = (Gt+L) - K1 Glc - Ins_sens K2 Ins
      dIns/dt = K3 Glc - d

      (so insulin removal is independent of the insulin level), then at steady state of this coupled system (where dGlc/dt = dIns/dt = 0):

      Glc_s = d/K3
      Ins_s = {(Gt+L) - (K1/K3) d}/(Ins_sens K2)

      Thus, Glc at steady state is independent of insulin sensitivity, or glucose production or consumption. It is also said to be perfectly adapted to these parameters. So if Ins_sens is lower, Ins_s will be higher but Glc_s remains the same: a perfect basis for the HOMA index!

      Only the experiments with reduced removal of Ins (parameter d) would be expected to have lower glucose, but of course this is a very very simple model of glucose homeostasis. Also poor synthesis of insulin by impaired beta cells would lower K3 and this may explain higher fasting glucose levels.
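The steady state quoted by the reviewer can be checked numerically. The sketch below is only an illustration: it integrates the two equations with a simple forward-Euler scheme and compares the result with the closed-form values Glc_s = d/K3 and Ins_s = {(Gt+L) - (K1/K3) d}/(Ins_sens K2). The function names and all parameter values are hypothetical choices made for this example and do not come from the manuscript or the review.

```python
def steady_state(gt_plus_l, k1, k2, k3, d, ins_sens):
    """Closed-form steady state of the reviewer's modified model."""
    glc_s = d / k3
    ins_s = (gt_plus_l - k1 * glc_s) / (ins_sens * k2)
    return glc_s, ins_s


def simulate(gt_plus_l, k1, k2, k3, d, ins_sens,
             glc0=5.0, ins0=1.0, dt=0.002, steps=50_000):
    """Forward-Euler integration of
         dGlc/dt = (Gt+L) - K1*Glc - Ins_sens*K2*Ins
         dIns/dt = K3*Glc - d
    Starting values, step size and step count are arbitrary (assumed)."""
    glc, ins = glc0, ins0
    for _ in range(steps):
        dglc = gt_plus_l - k1 * glc - ins_sens * k2 * ins
        dins = k3 * glc - d
        glc += dt * dglc
        ins += dt * dins
    return glc, ins


if __name__ == "__main__":
    # Hypothetical parameters, chosen only so that the system is stable.
    params = dict(gt_plus_l=10.0, k1=1.0, k2=1.0, k3=1.0, d=5.0)
    for sens in (1.0, 0.5):
        print(sens, simulate(ins_sens=sens, **params),
              steady_state(ins_sens=sens, **params))
```

Under these assumed values, halving `ins_sens` leaves the simulated glucose unchanged and doubles the simulated insulin, which is the "perfect adaptation" the reviewer describes. Lowering `d` lowers the steady-state glucose in this model, since Glc_s = d/K3.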

      Response: This is an interesting model and a perfect example of how, in order to explain one empirical finding, the model contradicts many others. Certainly the model accounts for hyperinsulinemia in response to insulin resistance without a change in glucose level. However, it does not explain the results of insulin degrading enzyme knockouts, which would decrease d and are thereby expected to decrease glucose, but that does not happen in experiments. Further, we ran simulations with this model to see whether the FG-FI correlation in the steady state would be different than during post glucose load dynamics. Even in this model the regression correlation parameters remain the same and only the range shifts upwards. Thus the model suggested by the reviewer does not account for the experimental and epidemiological results that we cite in this manuscript.

      The focus of our manuscript is to look at convergence of many sets of experiments and therefore suggesting a model that satisfies one but not others is not an appropriate solution.

      The other problem with the model suggested by the reviewer is that it makes an assumption of constant degradation rate of insulin independent of its standing concentration. Most biochemical decays are known to follow negative exponential. If you want to make an assumption that deviates from the general pattern, you need a justification and validation for the assumption. In the case of insulin there is published literature on the half-life of insulin. So the baseline assumption should be that insulin degradation follows half-life dynamics and if you want to make any other assumption, you need convincing justification for it.

      So I am a bit puzzled. What is the point of this paper? Does anyone take CSS seriously, really? Again, I do not know all the literature but I am sure there are good models out there that can and do explain T2D and glucose homeostasis very well.

      Response: The whole point is that in the existing literature there isn’t a model that does so. Believing that there are good models out there is not sufficient for the reviewer. If there is any, kindly point it out specifically.

      Should ….(Journal name)…. fix a failure in the education of doctors? And if ….(journal name)… decide they want to do that, please teach them the right vocabulary and conceptual frame work, and properly cite the control theory literature!

      Response: We would be glad if control theory has a model that is compatible with all the empirical results pointed out in our manuscript. It is not enough for the reviewer to say that there are. Kindly point out specifically if there really are. As far as we know there aren’t any. But this manuscript is not intended as a review of models; it rather lays out the set of experimental results and epidemiological patterns that any model of glucose homeostasis needs to explain. This set has been put together for the first time and that is the main contribution of the paper. Our central argument is that glucose homeostasis needs to take into account all these results TOGETHER. You cannot look at a partial picture again and say there are models that are compatible with the partial picture.

      To the best of our knowledge, none of the existing models would explain all of them together. We are suggesting here that this is because the set of foundational assumptions of these models is not correct. We are suggesting what change might be needed in it. Building models with the new set of assumptions would certainly deserve a separate publication. Our manuscript is not intended to give the answer; we are defining the question in a broader perspective that has not been taken so far.

      Specific comments:

      1. “The belief that this product (HOMA) reflects insulin resistance is necessarily based on the assumption that insulin signalling alone quantitatively determines glucose level in a fasting steady state.”

      I really do not get this. See the above simple model: many parameters determine the steady state levels, but if Ins_sens is lower (or L is higher by less insulin inhibition), steady state insulin is higher at the same glucose concentration, so HOMA makes perfect sense to me. Obviously, there can be other ways to change HOMA, but it is simple and effective in the clinic.

      Response: HOMA does make sense w.r.t. the above model but as pointed out earlier this model has multiple flaws and unless we have a model that is compatible with all experimental and epidemiological results it is difficult to claim that HOMA makes sense.

      1. “There is a subtle circularity in the working definition of insulin resistance. Insulin resistance is blamed for the failure of normal or elevated levels of insulin to regulate glucose…. However, clinically insulin resistance is measured by the inability of insulin to regulate glucose. Such a measure cannot be used to test the hypothesis that insulin resistance leads to the failure of insulin to regulate glucose.”

      Sorry but the circularity is so subtle that I miss it. If the argument is that insulin regulation is impaired in insulin resistance (what’s in the name), people should measure the action of insulin, right? What is wrong here?

      Response: To explain the circularity in different words:

      (i) Insulin is unable to regulate glucose because the body has insulin resistance

      (ii) Insulin resistance is measured as the inability of insulin to regulate glucose

      (iii) Put (i) and (ii) together, it reads “insulin is unable to regulate glucose because of the inability of insulin to regulate glucose”

      Isn’t this circular enough or is more clarification needed?

      2. line 437: suddenly, “hysteresis” appears out of nowhere. What is this? Please explain properly if relevant, do you really think these poor doctors know what that is?

      Response: We agree and will revise the text here to explain the context without the word “hysteresis”.

      In brief, the comments by this reviewer are thought provoking and we learnt a lot while addressing them, but they leave us with a little bit of doubt about the soundness of his/her ideas about control theory.

      --

      Reviewer #3:

      This is a very interesting question, and a novel approach to addressing it. I have focussed primarily on the systematic review aspects.

      1. The meta-analysis technique used is essentially "vote counting", and this is not recommended (see https://handbook-5-1.cochra...) for reasons given in the reference.

      Response: Many many thanks to the reviewer for pointing this out. We read the link carefully to find that our analysis is very sound by these guidelines. It does not recommend vote counting in significant versus non-significant types of outcomes. But it clearly says,

      “To undertake vote counting properly the number of studies showing harm should be compared with the number showing benefit, regardless of the statistical significance or size of their results. A sign test can be used to assess the significance of evidence for the existence of an effect in either direction”

      This is precisely what we have done. So this comment validates our analysis and increases our confidence. Thanks once again.

      2. I could find no mention of a PROSPERO registration - this is important

      Response: We agree and will improve during revision.

      3. There is no attempt, as far as I can see, to address the possibility of publication bias

      Response: Publication biases are discussed already in the main text line 125-129, but we will elaborate more and also include in supplemental table 3.

      4. The analysis is not reported in a way consistent with the PRISMA guidelines (although these relate to reviews of human data, they have lessons for animal reviews)

      Response: We made our best attempts to follow PRISMA guidelines for animal experiment reviews as well. It would have been more useful if any inconsistency was specifically pointed out by the reviewer.

      5. There is, as far as I can see, no assessment of risks of bias in the contributing animal studies

      Response: We agree and would be glad to improve on this.

      6. In my view, it is not enough to say that data will be made available on acceptance - part of peer review should be to ensure that it is made available in a form which is complete, comprehensible and useable, so it needs to be available (even if only through a private link) at this stage.

      Response: That is certainly possible and will be done for the revised version.
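The sign test mentioned in the quoted guidance can be sketched as an exact two-sided binomial test on the number of studies pointing in each direction. This is a minimal illustration and not the analysis from the manuscript: the function name `sign_test_p` and the study counts in the usage example are hypothetical, and studies showing no direction are assumed to have been dropped beforehand.

```python
from math import comb


def sign_test_p(n_benefit: int, n_harm: int) -> float:
    """Exact two-sided sign test.

    Under the null hypothesis each study is equally likely to point in
    either direction, so the smaller count follows Binomial(n, 0.5).
    """
    n = n_benefit + n_harm
    if n == 0:
        raise ValueError("need at least one study with a direction")
    k = min(n_benefit, n_harm)
    # Probability of a count at least as extreme as k in the lower tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical counts: 12 studies in one direction, 3 in the other.
    print(sign_test_p(12, 3))
```

Only the direction of each study enters the calculation, not its effect size or significance. That matches the quoted guidance, and it is also why the reviewer regards vote counting as a weak form of synthesis.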

      Regarding the animal experiments these should be reported according to the ARRIVE guidelines, and as far as I can see (I may have missed it, or you may have done it but not reported it) these were non randomised unblinded experiments without an a priori sample size calculation.

      Response: We see the importance of reporting these details for the primary experiments that we performed, but for the review and meta-analysis section we do not have control over what the authors did.

      In a nutshell, comments by all the three reviewers are a convincing reinforcement that our central argument is sound and strong. We agree with many of the refinement suggestions and look forward to publishing a revised version soon.

    1. On 2019-04-26 17:10:06, user Kristen Naegle wrote:

      From the UVA Systems Biology Journal club discussion of this paper 4/23/19

      We found this to be a really interesting paper with a timely machine learning method on a topic with a lot of room to advance. The authors do a great job motivating the needs in the field, based on limitations of existing methods. Specifically, it is exciting to see a method that seeks to learn globally from all kinases and to extract kinase features that shape kinase-substrate specificity. We found we could not completely understand some key features of the model and its use with the text as it stands and we hope our experience with this manuscript, as outlined below, will be of help to the authors.

      Models and model interpretation

      We had some confusion about the model as implemented, especially around whether certain aspects were used to make the model interpretable vs. what was in the model.

      1. PSSMs: A major strength of the neural network approach is the ability to learn and encode conditional dependence between positions in the kinase and amongst positions in the substrate. However, as currently depicted in the approach, it seems that the final predictor relies on collapsing the RNN model into a PSSM and scoring substrates across RNN-derived PSSMs. If this is the case, it is unfortunate to rely on a scoring methodology that is incapable of incorporating conditional dependence between positions. It would be great if the paper could clarify the methodology and explore prediction results that avoid the PSSM as a primary scoring function.

      2. Attention Matrix: The attention matrix is really interesting and has a lot of power to explore specificity determining positions. However, we were unclear about some of the details about the attention matrix, its use, and its presentation in this work:

      2a. Is the feature selection process that determined the attention matrix values used in the final classifier? As written, we were unclear about this. On the one hand, performance as a function of forward feature selection was given. On the other hand, if there are ultimately only 15 kinase sequence features used, then it seems unlikely that that broad range of mutations lands in those features and would make it impossible to score differences as a result of kinase mutations.

      2b. The attention matrix in Figure 2 appears to highlight more than 15 kinase features, and suggests there are family-specific kinase features. However, the text suggests there was a universal set of 15 kinase features. How these 15 were chosen was also under debate in terms of the effectiveness and resolution of the feature selection method. Given the intense growth in performance between 5 and 15 features, it seems it would be beneficial to increase the testing of performance at a higher resolution (1:15 features with one at a time addition).

      2c. It was clearly stated how many features selected by DeepSignal overlapped with KinSpect and DoS, but it would also be nice to know how many KinSpect and DoS features were not identified by DeepSignal (set differences vs. set intersections).

      3. Model Details:

      3a. Is this a “deep” neural network - where are the layers of convolution? Are there hidden layers?

      3b. What are the exact inputs to the model?

      3c. How long is the sequence retained in the recurrent neural network? Is there a limit to how far back the LSTM considers?

      3d. How is allostery incorporated in the model (e.g. as conditional dependence)? Long-range interactions not encoded in local sequence space would appear to be missed unless the entire sequence is considered throughout the recurrent neural network.

      Figure 3 and related methods:

      The choice of negative data is hard when the training set only contains positives. The authors used a method that is consistently used in the field. However, because it is a random draw and makes many assumptions about the draw (that there are not false negatives in the set), we felt it would be beneficial to test the robustness of conclusions drawn by repeating this analysis across many resamples of a negative set. This would help us understand the sensitivity or robustness of the conclusions to that particular selection of data. Additionally, it is not clear what model hyperparameters have been tuned to generate the precision-recall and AUROC analyses for the comparator predictors.

      Generalizability of learning on global kinases and training imbalance

      We were intrigued by the results in Figure 2E. We think this is a really interesting experiment to test applicability of a globally learned model. We noticed that the only tyrosine kinase in this batch (as a result, we assume, of being the only tyrosine kinase with more than 100 substrates annotated in the training set) was affected the most when predicted by a model of all kinases in that set, when compared to a single-kinase SRC model. We feel that may suggest that if a training set is predominantly skewed towards serine/threonine kinases, it will not produce the ideal model for tyrosine kinases. As tyrosine and serine/threonine signaling are separated both evolutionarily and physicochemically, it seems reasonable to make two models of kinase-substrate predictions and explore the results of those independently to assess whether the attention value matrices and performance differ greatly. We also wondered if data skew in Figure 2E analyses or more broadly could be a factor (perhaps it would be beneficial to add an analysis of the training data itself).

      Mutation analysis

      In addition to the confusion we noted earlier about how the attention value matrix and feature selection is wrapped back into the model and its effect on the ability to test mutational effects, we also wondered what the “false positive rate” was on determination of cancer genes as a function of kinase-substrate misregulation using DeepSignal. The authors focus on capturing known oncogenes (as a function of percent covered), but we wished to know how many total were predicted to be detrimental and whether this differed greatly between DeepSignal and MSM/D-PEM (i.e. both specificity and sensitivity). One representation that might be helpful is to display the total number of predicted cancer genes with the proportion of true highlighted in the subset.

      SH2 domain analysis

      As some of our members are very familiar with the problems with the published SH2 domain data (e.g. that they cannot be merged as there are disagreements, different types, and different scales), we understand why the authors chose to build individual models for each dataset. However, in the mutation analysis, it is unclear what final SH2 domain model they used and the authors do not provide the same level of detail on what was learned in the SH2 domain as they did for kinases. In addition to providing more clarity in the methods used for mutation analysis (as it relates to SH2 domains), it would likely be beneficial to do a sensitivity analysis in the outcomes about predicted oncogenic mutations as a result of isolating the kinase and SH2 domain components. Finally, although the paper used was cited, it would be helpful to describe in more detail exactly how an oncogene was determined for readers to better interpret the method and results provided here.

      Signed by:

      Kristen Naegle, Ben Jordan, Kevin Janes on behalf of the University of Virginia Systems Biology Journal Club (Journal Club of 4/23/19)

    1. On 2019-04-25 19:30:54, user Madhavi Adiga wrote:

      Hi,

      I'm Madhavi Adiga, and I have just started my graduate studies in the Pharmacology Dept. I'm interested in tumour angiogenesis and want to build my project in this field. As a start, I have been looking at different tumor model systems. While searching, I found this article, in which the LLC subcutaneous tumor model is used. Several other studies inject LLC cells with Matrigel as a substrate to minimize variability in tumor size during the course of tumor growth. My first question: you started with a low number of LLC cells and followed the tumors for up to 16 days without any solid substrate support, so the cells may leak into surrounding tissues rather than staying confined to the injection site (which I think may lead to variability between the study groups). How does this differ from injecting with a solid substrate (Matrigel), which may give the tumor cells more support to grow in a confined region? Secondly, on what basis did you choose 16 days as the criterion for sacrificing the mice? Was it based on humane endpoint criteria, or did you do a growth curve study to select 16 days? If you kept the knockout mice you used (s1pr1) for longer, would they develop more tumor?

      Please let me know, as this would be helpful for my further studies.

    1. On 2019-02-16 20:45:24, user GuyguyKabundi Tshima wrote:

      Patients with a negative thick blood smear were excluded from the small sample taken to explain HIV-malaria coinfection.

      These excluded patients interested me later because of the performance of PCR diagnosis of malaria, which can detect as positive the cases that were negative on thick smear, including asymptomatic cases, which are then treated to reduce the parasite biomass.

      The positive slope means that weight loss under ART goes along with the number of malaria episodes, and if we do not want to see the weight gained under ART erased by malaria, all necessary means (clinical, paraclinical, therapeutic and nutritional) must be set in motion to prevent HIV-positive subjects from developing malaria disease.

      In 2013, I interacted again with a reader's questions.

      Q. A reader writes: For my part, I would have liked the data of this work to be supported by laboratory results from your own investigations.

      A. At variance. I know my answer is LOW: "For my part, I was advised by the original supervisor to collect existing data at AMOCONGO, I was authorized by the Vice-Dean in charge of Research, Specialization and Aggregation, and I received the approval of the Ethics Committee of the national program for the fight against AIDS and sexually transmitted infections (PNLS/IST)." The essence of the question is the guarantee of the integrity of the data, which I can attest to, having collected the data from the medical files myself.

      Q. A reader writes: Can we present an aggregation thesis on a basis as tenuous as the one you present to us: the medical files!<br /> Comments: In this case, the elements of the records used were designed by others. You have analyzed these data from a perspective that you set for yourself. Hence the poverty of the material presented for your subject: the medical files!

      A. At variance. I know that my answer is still LOW, for the same reason as in 1: invoking the original supervisor is not a "scientific" argument. Here too, the substance of the question is the integrity of the data.<br /> The medical forms were used to finalize a process in which the original Promoter advocated for the collection of the data necessary for the finalization of the thesis project.

      Q. A reader writes: What do we mean by prospective study?<br /> Comments: In my opinion, shared by most researchers, a prospective study is one in which the researcher masters the essential stages of research from beginning to end. He establishes his program of study: he plans the statistical methods, then collects his data, himself or with collaborators, in the laboratory or in the field. Then he analyzes the data collected and draws the conclusions.

      A. At variance. I know my answer is in the MIDDLE: "In my opinion, shared by the late Dr. Mulumba Madishala Paul (Biomedical research: methodological bases and elements of biostatistics. Biométrix Editions, Kinshasa. 74 pages, 1994, 2001), it is both right and wrong that most researchers consider any study conducted on the basis of medical records as retrospective. In our article, this is an authentic prospective study because the data collected there are of a longitudinal nature (weight at admission, at 3, 6 and 12 months under ART)". I plan to add 2 or 3 other article references, as this is a major criticism of my methodology. So far I have noted that this prospective / retrospective definition is not consensual, and modern epidemiologists therefore recommend no longer using this terminology: this is the position of a biostatistics course that can be seen on the site of the Faculty of Medicine of Pierre and Marie Curie University (http://www.chups.jussieu.fr... consulted on 28.10.2015).

      Q.4. A reader writes: You talk about a prospective study in the case of a study conducted by rereading medical records. It is therefore from a prospective viewpoint, covering the first year of putting patients on triple therapy, that this study was conducted.

      A.4. In agreement. My answer is GOOD, but I have to point out the limitations of my results. I note that the limitations of the thesis should be emphasized and well defined.

      Q.5. A reader writes: Compared to the work (ANTERRETROVIRAL FLOODING AND INTERACTIONS WITH MALARIA), what is the original contribution of this work?

      A.5. In agreement. This work had this conclusion: "there is on average no change in weight in the first year under ART". The original contribution of this work is that it must be understood that the link between Selenium and NADPH oxidase was not formally established. And I did not study it with data, but through articles.

      The subject WEIGHT FLUCTUATION UNDER ART AND POTENTIAL INTERACTIONS WITH MALARIA

      "Weight loss under ART and potential interactions with malaria"

      Problem statement<br /> The rapid increase in access to antiretroviral therapy in developing countries has brought new challenges. These include the unprecedented need for lifelong treatment of a lifelong infectious disease, and the pressure this will place on health services [Khoo S., 2004]. Gaps in current knowledge urgently require emphasis on the change in body weight under antiretroviral therapy and on the different interactions with other drugs, including antimalarials [Khoo S., 2004]. Malaria is spread across areas of the world where resources are limited, and most of these areas have also been shaken by the HIV pandemic.

      Research hypotheses<br /> There are potentially many different ways in which the two diseases interact at the political, social and public health levels, as well as new evidence of how one can affect the pathogenesis and outcomes of the other [Khoo S., 2004]. As access to antiretroviral drugs increases and new combinations of antimalarials are evaluated, it is important that potential interactions between therapies for these two infections are also reviewed [Khoo S., 2004].

      Main objective

      Contribute to the fight against HIV / AIDS infection and malaria, two major diseases in the Democratic Republic of Congo with scary figures:<br /> - Malaria: 10% of global mortality<br /> - HIV: 3,000,000 Congolese are infected (?)

      Specific objectives

      • This study was undertaken to evaluate the evolution of the body mass index (BMI), or Quetelet index, of patients living with HIV / AIDS (PVV) under antiretroviral therapy in a malaria endemic area.
      • Provide clinicians with a nutritional monitoring tool for people living with HIV in a malaria endemic area

      Methods

      • A simple random sample of 72 medical records of patients followed in Kinshasa was taken at the medical center of ACS / AMOCONGO, a specialized NGO in the Democratic Republic of the Congo, but missing data for the height variable in many cases forced us to consider only the weight variable in order to evaluate the evolution of the nutritional status of PVV under ART.
      • The CD4 lymphocyte variable before treatment was also taken. For the latter, there were also missing data. In fact, CD4 lymphocytes were considered as confounders.
      • At ACS / AMOCONGO all patients' medical records are listed from A to Z: at random we chose the letter D and took the first 72 patient records in which the following variables were found: age, weight before and after ART (weight at the date of the last visit), malaria (suspected clinically and confirmed by a thick/thin blood smear), antimalarials (quinine, sulfadoxine-pyrimethamine, artemether-amodiaquine) and ART (all patients were on Triomune: stavudine, lamivudine and nevirapine)

      Results

      The percentage of PVV with high CD4 lymphocyte levels compared with that of PVV with collapsed CD4 lymphocyte levels was 15.79% vs. 84.21%, a ratio of about 1/5 (patients with collapsed CD4 counts were about 5 times as numerous as those with high CD4).<br /> The percentage of PVV with high CD4 lymphocyte levels and malaria, compared with that of PVV with collapsed CD4 lymphocyte levels and malaria, was 5.26% vs. 31.58%, a ratio of 1/6 (patients with collapsed CD4 counts were 6 times more likely to be malaria patients than those with high CD4).<br /> Quinine was prescribed first-line, followed by sulfadoxine-pyrimethamine and artemisinin-amodiaquine.<br /> • Weight gain was observed in 16.67% of patients, compared with weight loss in 61.11%, a ratio of about ¼ (about 1 patient gained weight for every 4 who lost weight during HIV-malaria co-infection)

      Discussion

      All of these results should be considered with the following confounding factors:<br /> - the level of CD4 lymphocytes (generally classified as collapsed if less than 410 and elevated if higher than 410 CD4 cells / mm³)<br /> - patient income (which can determine the quality of the diet),<br /> - the duration of ARV treatment<br /> - associated opportunistic infections.<br /> 72 patients: a small sample? But representative, because it was calculated according to the formula: n ≥ Zα² · p · q / d²<br /> n: sample size; p: HIV prevalence; d: precision<br /> confidence level of 95%, so α = 0.05 and Zα = Z0.05 = 1.96 ≈ 2<br /> p = 0.046 = 4.6%<br /> q = 1 - p = 1 - 0.046 = 0.954<br /> d = 0.05 = 5%<br /> n ≥ 4 × 0.046 × 0.954 / 0.0025 ≈ 70<br /> Nevertheless, this being an exploratory study, we will complete our data to arrive at a sample of at least 200 patients. The information gathered corroborated, on more than one point, the results of the work presented by Saye Khoo, David Back and Peter Winstanley in June 2004 at WHO in Geneva on interactions between HIV and malaria (1)<br /> The results obtained will allow integration of care.
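The sample size arithmetic quoted above can be checked with a short sketch. The function name `sample_size_for_proportion` is hypothetical, introduced only for illustration; the inputs (p = 0.046, d = 0.05, Z rounded to 2) are the ones stated in the comment, and the unrounded Z = 1.96 is shown alongside for comparison.

```python
import math


def sample_size_for_proportion(p: float, d: float, z: float = 1.96) -> float:
    """Minimum n from n >= z^2 * p * (1 - p) / d^2 (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if d <= 0.0:
        raise ValueError("d must be positive")
    return z ** 2 * p * (1.0 - p) / d ** 2


# Values quoted in the comment: prevalence p = 4.6 %, precision d = 5 %.
n_rounded_z = sample_size_for_proportion(p=0.046, d=0.05, z=2.0)   # Z rounded to 2
n_exact_z = sample_size_for_proportion(p=0.046, d=0.05, z=1.96)    # Z kept at 1.96

# Round up, since a sample cannot contain a fraction of a patient.
print(math.ceil(n_rounded_z), math.ceil(n_exact_z))
```

With Z rounded to 2 the bound is just above 70, which matches the figure given in the comment; keeping Z = 1.96 gives a slightly smaller bound, so the 72 records sampled satisfy either version.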

      Conclusion

      In conclusion, this study has shown that the following points deserve attention in cases of HIV-malaria coinfection:<br /> - malaria is an aggravating factor which, with fever, induces catabolism and requires energy<br /> - to this we must also add its symptoms and the side effects of antimalarials (anorexia,…) that can lead to decreased dietary intake and weight loss.

      Recommendation

      For weight monitoring, we recommend using the "Body-Check System" (KORONA), originally designed for fitness; we think, with the agreement of our promoter, that it can be adopted for the nutritional monitoring of subjects living with HIV because it can:<br /> - measure body fat (energy source)<br /> - indicate the body water rate<br /> - display BMI or body mass index<br /> - display the consumption in Kcal

      Key words: antiretrovirals, antimalarials, body mass index, weight gain, weight loss, Kinshasa (Democratic Republic of Congo)

      Bibliography

      1. Khoo S., Back D., Winstanley P. The potential for interactions between antimalarial and antiretroviral drugs. In AIDS 2005, 19: 995-1005.
      2. Back D., Gatti G., Fletcher C., Garaffo R., Haubrich R., Hoetelmans R., et al. Therapeutic drug monitoring in HIV infection: current status and future directions. AIDS 2002; 16 (Suppl 1): S5-S37.

      Q. A reader writes: Viral load: Reason advanced: it was not in our database (missing data). This reason is not valid, because the real reason is that, at the time, no laboratory in Kinshasa yet had equipment for measuring this viral load.

      A. In agreement.

      Q. A reader writes: Do different ART regimens have any effect?

      A. In agreement. They have effects, but in our sample, all patients were under the same ART regimen in first-line treatment with triomune-40.

      Q. A reader writes: We know that some ART regimens lead to resistance more easily than others.

      A.15. In agreement.

      Q. A reader writes: Opportunistic diseases and comorbidities: not taken into account. Is this a valid hypothesis?

      A. According to our routinely collected data, our model does not exclude another model that could take this valid hypothesis into account. The important thing for a model is its interpretation:<br /> - Our model is limited to weight on admission and at 12 months under ART.<br /> - However, its interpretation takes into account opportunistic diseases and co-morbidities.<br /> And it is obvious that severe malaria-HIV / AIDS co-infection should be cited first in a tropical area.<br /> It is this explanation that our model has brought. Apart from severe malaria, the possible causes of weight loss are:<br /> - HIV itself, which is supposed to be inactive under ART<br /> - other opportunistic diseases, which are eliminated as CD4 lymphocytes recover with ART<br /> - other comorbidities such as cirrhosis or diabetes, which can be controlled.<br /> But malaria, which is often severe in immunocompromised patients, is overlooked: there are no guidelines for the treatment of HIV-malaria co-infection on a global scale, according to Flateau's review of the literature, which states that because of the lack of rigorous diagnostic criteria to prove malaria, the precise assessment of the effect of malaria in HIV-infected patients is limited (Flateau CG: 2011).

      Q. A reader writes: Civil status: was it not mentioned in the health data consulted?

      A. In agreement. Yes, it was missing on some medical records consulted.

      Q. A reader writes: Absence of control with HIV (-).

      A. In agreement. The study focused on the medical records of HIV + patients under ART.

      Q. A reader writes: Some limitations could have been overcome.

      A. In agreement.

      Q. A reader writes: Targets of insulin: hepatocyte, adipocyte, myocyte,... there is also the neuron!

      A. In agreement.

      Q. A reader writes: p.44 (6th line): ... TNFα increases which catabolism: that of proteins, carbohydrates or lipids?

      A. The proteins.

      Q. A reader writes: It can be understood that excess production of SOD, which releases H2O2, the precursor of the hydroxyl radical HO•, according to the reaction it catalyzes: 2 O2•− + 2 H+ → H2O2 + O2, may exacerbate oxidative stress. But how do we integrate into this exacerbation the opposite phenomenon of insufficient production of SOD?

      A. In agreement. It is the excessive production of SOD that forces the organism to use another, non-enzymatic pathway with NADPH oxidase, which involves selenium in its composition. This is the key to the thesis: fever (malaria or HIV) activates NADPH oxidase. HIV is blocked by ARVs. So if there is fever in an HIV subject on ARVs, the HIV factor is eliminated, while the severe malaria factor due to the endemic area is always present, which leads us to say that this fever is mostly of malarial origin. NADPH oxidase fights oxidative stress (SOD). Selenium intake tends to increase the antioxidant role played by NADPH oxidase.

      Q. A reader writes: Among non-enzymatic antioxidants, there is not only selenium; we must also mention Vit C and Vit E.

      A. In agreement, but selenium is a powerful non-enzymatic antioxidant, more powerful than Vit C and Vit E together.

      Q. A reader writes: Introduction of a parameter different from the previous ones: 200 CD4 / μl, whereas everywhere else in the work you speak of 50 CD4 / μl. How do you reconcile this change of cell count?

      A. In agreement. I recall that the cut-off for ARVs is less than 200 CD4 / μl, whereas in the records consulted, the patients had a quarter of this number, less than 50 CD4 / μl; so on admission, patients had severely compromised immunity and were therefore vulnerable to developing severe malaria.

      Q. A reader writes: The title of table 4 is not precise: it is actually about analysis of variance for the four moments of weight: 0, 3, 6 and 12 months.

      A. In agreement.

      Q.A reader writes: You write: HIV infection increases the repetition of episodes of severe malaria.

      A. In agreement.

      Q.A reader writes: Will weight loss be associated with HIV or repeat episodes of severe malaria?

      A. In agreement. HIV is inactive on ARVs, so weight loss would be associated with the repetition of severe malaria episodes that activate the enzyme NADPH oxidase.

      Q. A reader writes: We know that HIV is already associated with weight loss. So?

      A. In agreement. HIV is inactive on ART, so weight loss would be associated with the repetition of severe malaria episodes that activate the enzyme NADPH oxidase.

      Q. A reader writes: the variance analysis table shows the test of non-significance of the weights on admission, after 3, 6 and 12 months?

      A. I agree

      Q. A reader writes: Apparently from your statistical results, you only have 2 variables: response variable (Y) ; Predictive variable 1 (X1). Finally, the equation used would be: Y = a + b1X1.

      A. In agreement. Weight loss at 12 months on ART (y) can be adequately modeled from the diagnosis of severe malaria on admission (x) as y = a + bx, where "a" is a constant and "b" is the slope of the linear regression.
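To make the one-predictor model discussed in this exchange concrete, here is a minimal least-squares sketch with an intercept and a slope. The helper `fit_simple_ols` is hypothetical and the six data points are made up for illustration; they are not taken from the study's medical records.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of a line with intercept and slope; returns (intercept, slope)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative numbers, NOT data from the study.
severe_malaria = [0, 0, 0, 1, 1, 1]                     # x: 1 = severe malaria on admission
weight_change_kg = [2.0, 1.0, 3.0, -1.0, -2.0, -3.0]    # y: weight change after 12 months

intercept, slope = fit_simple_ols(severe_malaria, weight_change_kg)
print(f"intercept = {intercept:.2f} kg, slope = {slope:.2f} kg")
```

With a binary predictor such as a yes/no diagnosis, the intercept is the mean outcome in the group without the diagnosis and the slope is the difference between the two group means, which is one way to read the sign of the slope in this discussion.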

      Q. A reader writes: The binary logistic regression. We read ... Using Minitab software, we calculate the binary logistic regression as follows: Severe malaria = Number of CD4 <50 cells / μl (no separation) Weight (in) on admission (no separation) Weight (in) 12 months later ... It would have been clearer to systematize your model: Y = severe malaria; X1 = CD4; X2 = initial weight; X3 = weight after 12 months. That would have given the equation:<br /> Y = a + b1X1 + b2X2 + b3X3.

      A. In agreement.

      Q. A reader writes: it is necessary to begin by exposing the complete model with Y = Initial weight, X1 = CD4 / μl, X2 = Weight after 12 months, X3 = severe Malaria, X4 = severe HIV / malaria coinfection, X i + j = diabetes, cirrhosis, etc ...

      A. In agreement.

      Q. A reader writes: This raises the question of how many predictive variables (2, 3, 4, 5, etc ...) have been incorporated into your initial logistic regression model: (1) CD4 / μl, (2) weight after 12 months, (3) severe malaria, (4) HIV / severe malaria coinfection, (5) diabetes, (6) cirrhosis, (7) tuberculosis, (8) cancer, (9) age ... etc. This is not explicit in your text. From 9, 10 or 11 predictive variables onward, there is the conceptual problem of how useful each of these variables is for inclusion in the model. This problem needs to be explained clearly, because we would have to show the chi-square table for each predictive variable so that we can appreciate its significance.

      A. In agreement: 3 predictor variables were incorporated in the initial logistic regression model: (1) CD4 / μl <50 cells, (2) initial weight, (3) weight after 12 months.<br /> The logistic regression was not significant; however, it had shown, in the records consulted, a link between the diagnosis of severe malaria on admission (y) and a CD4 count <50 cells / μl (x1).<br /> So I switched to linear regression to adequately model weight loss at 12 months on ARVs (y) from the diagnosis of severe malaria on admission (x):<br /> y = a + bx, where "a" is a constant and "b" is the slope of the linear regression.<br /> y = Weight after 12 months, x1 = Diagnosis of severe malaria at admission, x2 = HIV / severe malaria co-infection, x i + j = diabetes, cirrhosis.

      Q. A reader writes: No evaluation of the accuracy or the reproducibility and the reliability of counting CD4 in the laboratory of AMOCONGO. For good reason: retrospective study!

      A. At variance. Good reason: AMOCONGO is a social structure, not one with a scientific purpose. The laboratory survives on time-limited subsidies.

      Q. A reader writes: Your Conceptual Model is not well explained: in the box beginning with ... 72 medical ... all the text included in this box should be reduced to a bare minimum, returning the rest in the text.

      A. In agreement. Here is the conceptual model well explained ; Evolution of the weight of HIV-positive subjects on antiretroviral treatment in an area of malaria endemic

      Q. A reader writes: BMI or IMC (Body Mass Index in French).<br /> This index is calculated by the formula: BMI = Weight (kg) / [Height (m)]². Your work is titled: "Evolution of BMI ..."; in addition, your sample is limited to adults (age ≥ 18 years). Under these conditions, within 12 months, can the height of a subject vary significantly enough to affect BMI?

      A. No. In agreement.

      Q. A reader writes: Of course, BMI is a ratio that changes when one of its terms, numerator or denominator, changes. Does the analysis of the medical records of your sample suggest such a change in subject height?

      A. No. In agreement.

      Q. A reader writes: If this is not the case, then replace Evolution of BMI by Evolution of weight ...

      A. I agree
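The point made in this exchange, that for adults with constant height BMI changes only through weight, can be illustrated with a short sketch. The function `bmi` and the weights and height below are hypothetical values chosen for illustration; they are not taken from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical adult: height assumed constant over the 12 months of follow-up.
height_m = 1.70
weights_kg = {0: 60.0, 3: 58.0, 6: 57.0, 12: 55.0}   # month -> made-up weight

bmi_by_month = {month: bmi(w, height_m) for month, w in weights_kg.items()}

# With height fixed, the relative change in BMI equals the relative change in weight.
bmi_ratio = bmi_by_month[12] / bmi_by_month[0]
weight_ratio = weights_kg[12] / weights_kg[0]
print(f"BMI ratio = {bmi_ratio:.4f}, weight ratio = {weight_ratio:.4f}")
```

Because the height term cancels in the ratio, tracking weight alone carries the same information about change over time as tracking BMI, which is consistent with the suggestion to retitle the work in terms of weight.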

      Q. A reader writes: You describe Eastern DRC as an unstable malaria area and Kinshasa as a stable malaria area. Have you determined the number of patients from that area who were eventually included in your sample of 72 patients?

      A. No. In agreement. However, this work draws our attention to the vulnerability of an HIV+ person who leaves an unstable malaria area and comes for treatment in a stable malaria area: he or she runs the risk of developing more severe forms of malaria. And we know that the war increased the number of HIV-positive women in Eastern DRC through the rapes suffered by daughters and sons in this part of the country during the atrocities; this no longer needs to be demonstrated, given all the African forces that had settled there during the war of liberation.

      Q. A reader writes: MATERIAL. You say: Toshiba computer, medical files, sheets of paper, pens. Is it really worth listing sheets of paper and ballpoint pens among the material used? Why not add chairs and tables too! In the end, your material consisted only of patients' medical files!

      A. In agreement.

      Q. A reader writes: Admit it's simple!

      A. In disagreement. The medical forms were used for the finalization of the thesis, which is part of the overall theme of POVERTY, with 5 articles published as PREPRINTS and 2 submitted for peer review. And talking about POVERTY is not a meager subject. Regarding weight loss, there are 10 key messages:<br /> - 1. HIV-AIDS and malnutrition are interdependent.<br /> - 2. HIV affects nutrition through multiple mechanisms. Its impact starts early during asymptomatic infection and continues throughout the life cycle.<br /> - 3. HIV exposure and HIV infection worsen infant malnutrition issues.<br /> - 4. Infants who are not breastfed because of maternal choice, illness or mortality are particularly vulnerable to malnutrition.<br /> - 5. Nutritional interventions benefit HIV patients.<br /> - 6. Nutritional education can improve adherence to ARVs and other drugs used to treat opportunistic infections.<br /> - 7. The objectives for nutrition education vary at different stages of infection: asymptomatic HIV, HIV and AIDS, and, post-mortem, for surviving members of the family.<br /> - 8. Priority actions include nutrition for a positive life, nutritional management of disease, management of interactions between ARVs and foods, and therapeutic feeding for moderately and severely malnourished HIV-positive children and adults, infants and young children, and the elderly in accommodation or palliative care.<br /> - 9. Nutrition interventions for people living with HIV / AIDS include the food supply and the assessment of nutritional status, supportive advice, targeted nutritional supplements, and links to food supply and food security programs.<br /> - 10. Nutrition education, care and support are important elements of HIV care and should be considered from the start when planning programs.

      Q. A reader writes: In a real environment, can we observe a phenomenon with p = 0.00? No.

      A. In agreement.

      Q. A reader writes: The probability that you report as 0.000 is in fact a very low probability, which should be given as 0.0003 ... 0.00005 ... At least indicate that it is less than some value, rather than stating that it is 0.000!

      A. In agreement.
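The reporting convention the reader asks for can be sketched in a few lines. The helper `format_p_value` and its default threshold of 0.001 are assumptions made for illustration, not something prescribed in the discussion above.

```python
def format_p_value(p: float, floor: float = 0.001, digits: int = 3) -> str:
    """Report very small p-values as an upper bound instead of a rounded 0.000."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < floor:
        # Below the reporting threshold: state the bound, never "0.000".
        return f"p < {floor:g}"
    return f"p = {p:.{digits}f}"


print(format_p_value(0.00005))
print(format_p_value(0.0312))
```

Statistical packages often print three decimals by default, which is where a displayed 0.000 comes from; reporting the bound makes clear that the probability is small but not zero.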

      Q. A reader writes: Where do you plan to present the research question?

      A. In agreement. The research question is presented in the introduction.

      Q. A reader writes: A summary should summarize the essence of the work: a brief introduction with the objective of the subject, the methods used in a few words, the essential results and the conclusion, and it should not exceed a certain number of words: 250 words! That's not what we find in your summary.

      A. In agreement.

      Q. A reader writes: The title of the project is needlessly long. We can shorten it by replacing it with: PROSPECTIVE STUDY ON BMI EVOLUTION OF HIV / AIDS SUBJECTS UNDER ART IN MALARIA ENDEMIC AREA.

      A. In agreement.

      Q. A reader writes: The whole page and the ¾ of the page are devoted to the mechanisms of oxidative stress in the progression of HIV and malaria. Is it in the acknowledgments the appropriate place to talk about these mechanisms?

      A. In agreement. No, it's in the generalities.

      Q. A reader writes: Can it be understood that these are febrile patients with a diagnosis of severe malaria and a CD4 count <50 cells / μl ... Is it not better expressed that way?

      A. In agreement.

      Q. A reader writes: ... confounding factors such as opportunistic infections (OI), helminths, poverty, diabetic, cirrhosis, ... In this line, what is the grammatical role of "diabetic": adjective or noun? If it is an adjective, how can you list it with nouns: infections, poverty, etc ...? Replace "diabetic" with "diabetes".

      A. In agreement.

    1. On 2019-02-07 15:55:21, user UMass microbial ecology jclub wrote:

      Thank you for this paper. It does a nice job of demonstrating that priming effect is in the eye of the beholder. We read it for journal club today, and I am summarizing some comments and suggestions we came up with, primarily related to the display of the data. This is because the objective set out for the paper (see if bacteria can grow on NOM) is not in line with much of the introduction, experimental design, or interpretation of the results. We suggest 1. see if bacteria grow on NOM, and 2. how the presence of LOM affects this. Figure 1 should then be just NOM minus C-free controls, with a separate figure (figure 2) plotting just the composite and mix samples. Even better, just plot the priming effect through time by subtracting the composite from the mix. At present, figure 1 is complete information overload, and making everything divided by or subtracted from some control will go a long way to remedying this, and hopefully also to getting rid of the ANOVA tables. We would also suggest plotting the respiration data as a rate rather than as cumulative respiration, so that figures 1 and 2 can be compared more directly.

      A strength of your paper is that it shows that whether priming effect exists depends on whether you look at respiration or growth. However, what we are usually interested in when we think of priming is how much of the native organic matter will be lost. If you have any measures of the remaining LOM or NOM to indicate whether more was lost overall under priming, this would be a great addition. Including in particular LOM data from the different components to show if the crash and burn growth was a response to depleting the LOM or whether LOM became limiting would also be very useful in interpreting the priming results.

      Finally, a strong theoretical basis for why time matters for priming effect is much needed; is a priming effect real if it is not consistent? What does it mean? How does growing the cells on acetate and then switching them to NOM affect results compared to another source? Do bacteria undergo batch culture in estuaries, or is it more like chemostats? Physiology is very different during different growth phases and this may ultimately change the conclusions made in the paper.

    1. On 2019-01-26 14:53:59, user Jingjing Liang wrote:

      Response to Dormann et al.

      Thank you for your comments on Liang et al. 2016. It is always stimulating when someone is discussing our findings. There are many interesting questions you raise, and others neither you nor we have yet wrestled with fully. Please find, below, our response to your comments as numbered on Page 1.

      (1) The authors computed “relative tree species richness” in such a way that it represents a gradient from boreal to tropical plots, rather than in local species richness. When instead computing species richness relative to the maximum value in the region the effect of species richness on productivity is dramatically reduced.

      Response: Thank you for your suggestion in your first sentence. However, confining our analysis strictly to the ecoregion level would render us unable to derive a true global biodiversity-productivity relationship (BPR), which should account for both intra- and inter-ecoregion variability. There are likely a variety of different ways of assessing this; ours and yours are just two. Considering mounting concerns on the delineation of ecoregion boundaries (e.g. Jepson and Whittaker 2002), an ecoregion-level study would create substantial problems of its own. Thus we believe both options (yours and ours) have strengths and weaknesses, and address the same overall question but from different angles. There are many other issues that could be, and should be, addressed in grappling with how best to do this. These include whether productivity should be standardized (i.e. the issues raised for richness might also apply in some way for productivity), and how best to standardize either richness or productivity (as there are a number of ways of doing this). We are working on delving further into these issues.

      Regarding the point in the second sentence, we disagree that the BPR relationship is dramatically reduced when examined at eco-regional scales. We will demonstrate below that even when we use relative tree species richness at an ecoregion-level, the trendline and standard error bands are similar to the global trend as reported by Liang et al. 2016.

      For this demonstration, we selected the three grassland biomes (i.e. Montane Grasslands and Shrublands, Flooded Grasslands and Savannas, and Temperate Grasslands, Savannas and Shrublands), because your graphs on Page 32 suggest that these biomes do not conform to the global trend of Liang et al. 2016. For this analysis, we combined the three biomes, because there are fewer than 2000 plots for Montane Grasslands and Shrublands and Flooded Grasslands and Savannas together, and almost half of the plots within these two ecoregions are monocultures.

      The combined grassland biomes have a total of 23,133 plots (including ~3000 monoculture plots). For simplicity, we ignored the spatial autocorrelation, and the result from a robust bootstrapping estimation (Efron and Tibshirani, 1993) is quite consistent with the global trend of Liang et al. 2016 (Fig. B1) (see the Appendix for the R script for estimating BPR for the grassland biomes). This is also generally true for most of the other ecoregions (not shown), as long as there are a sufficient number of plots and a sufficient number of mixed-species plots. In fact, the theta values we have produced to date across regions don’t systematically differ from the global one, although we are still working on making sure we are doing these appropriately. So we are unclear how you arrived at the values you did. Additionally, we also think that perhaps we (and you or anyone else working with these data) should eventually re-run everything eliminating data from the desert ecoregion. Looking at the total lack of productivity there it is hard to justify calling anything that went into that group a forest.

      We acknowledge that performing an ecoregion-level study would be a good supplement to Liang et al. 2016. We would be glad to collaborate with you or anyone else on this idea. Additionally, we believe that examining alternative approaches to the global relationship, including non-parametric models and different ways of standardizing productivity, richness, or both, would be worth doing.

      We also note that we have some residual questions about your approach. We are unable to understand how a global line like yours (your left panel, Figure 1) could average and max out around 2.5 for productivity when so many of the ecoregions with most of the data have means so much higher than that. Additionally, you call the x-axis of your first panel in Figure 1 “relative local species richness”, which confuses us. If your draws were across all data, then the ‘relative’ value is not ‘local’, even if you used the maximum value of each draw rather than the global max as we did (but we are not entirely sure what you did). If the maximum richness was from each draw, should your x-axis be “sample max” rather than “local max”? Are we misinterpreting what you did, or is this just unclearly labeled?

      https://uploads.disquscdn.c...

      Figure B1. Estimated BPR curve (with 95% confidence interval bands), using an ordinary least squares (OLS) model, based on the three grassland biomes (i.e. Montane Grasslands and Shrublands, Flooded Grasslands and Savannas, and Temperate Grasslands, Savannas and Shrublands). We converted species richness (S) to relative species richness (S_hat): S_hat = S *100 / 271.

      (2) Plots are overwhelmingly from temperate forest; indeed only some 2500 plots are from the tropics (equivalent to 0.4%), despite these forests representing around 30% of the world’s forest. Stratifying the plots accordingly weakens the TSR-P-relationship.

      Response: Thanks for the concern raised in your first sentence. We are well aware of that problem, and have even discussed it in our paper. Of course, this is just one more case of a general trend of under-documentation of all species (not just trees) from developing countries. This is a problem all researchers from developed countries should at least be aware of and try to mend as best we can; we at the GFBi are doing our part and are currently trying to collect more samples from the tropics for future research studies.

      Regarding your second point, we recognize that stratifying the plots may make the results more robust, but its effect would be limited and will not alter the overall global trend, because you already stated in your comments (Page 20) that “the (stratification) effect is moderate, with slightly lower values than the original non-stratified approach. This result suggests that also with non-stratified sampling always some tropical plots with high species richness are drawn, making the original Š robust to unrepresentative sampling.”

      Additionally, because the data are overwhelmingly temperate, roughly 3% boreal, and <1% tropical, and draws in Liang et al. 2016 were random across the globe, most of the 500-stand draws in our original 2016 paper were likely to have most data from non-tropical sites, so the influence of tropical high-diversity, high-productivity sites was likely modest, unless they had extremely high influence per datum on the overall fitted function because of their position in data space (which is possible). This is relevant to your concern (above) about our global result being influenced by the sharp gradient in boreal to tropical forests in both productivity and richness. Similarly, boreal stands would not have shown up very often: maybe 15 or so times on average in each 500-stand draw, with tropical stands drawn twice or so on average in each 500-stand draw. In contrast, if our data had hypothetically been roughly equally representative of boreal, temperate and tropical forests, the global relationship might have been much more influenced by the gradient from low-diversity, low-productivity boreal to high-diversity, high-productivity tropical forests. In other words, our original data and fits were likely strongly temperate in flavor, despite our concerns about the undue influence of the boreal-tropical gradient. It may in fact be that we should have a different concern: not that the boreal-tropical gradient exerted too much influence on our published global fitted relationship of productivity-richness, but that our global analysis ‘undercounted’ the impact of tropical and boreal forests on the global relationship, given that the vast majority of stands in each 500-plot draw were temperate. We are not yet sure how best to check these issues.
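
      The per-draw counts mentioned in this paragraph can be checked with a short calculation. The sketch below is illustrative only: it assumes a total of about 720,000 plots, about 2,500 tropical plots and a boreal share of 3%, all rounded from figures quoted in this discussion, and the variable names are our own.

```r
# Approximate figures from the discussion (assumptions, not exact counts).
n_total    <- 720000
n_tropical <- 2500
n_boreal   <- round(0.03 * n_total)
draw_size  <- 500

# Expected number of plots of each type in one draw without replacement.
expected_tropical <- draw_size * n_tropical / n_total
expected_boreal   <- draw_size * n_boreal / n_total

# Probability that a single draw contains no tropical plot at all
# (hypergeometric distribution: 0 tropical plots among draw_size draws).
p_no_tropical <- dhyper(0, m = n_tropical, n = n_total - n_tropical,
                        k = draw_size)

cat("Expected boreal plots per draw:  ", expected_boreal, "\n")
cat("Expected tropical plots per draw:", expected_tropical, "\n")
cat("P(no tropical plot in a draw):   ", p_no_tropical, "\n")
```

      Under these assumptions the expected counts are simply the draw size multiplied by each biome's share of the data, and the hypergeometric term shows how often a draw would contain no tropical plot at all.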

      (3) In the spatial regression model, distances between plots were computed without taking the spherical nature of earth into account. This had little effect on the slope estimate of the TSR-P-relationship.

      Response: Thank you for sharing your insight into and findings about this. We appreciate it. We recognize that calculating distances between plots by taking the spherical nature of earth into account may slightly improve the accuracy of our estimated BPR. The magnitude of such improvement is yet to be determined by future research.
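
      To give a sense of scale for this point, the following minimal sketch compares a planar distance computed directly on longitude and latitude degrees with a great-circle (haversine) distance. The function names, the mean Earth radius of 6371 km and the example coordinates are illustrative assumptions and are not part of the original analysis.

```r
# Great-circle distance (km) between two points given in decimal degrees.
# Assumes a spherical Earth with mean radius 6371 km (an approximation).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Planar distance in degrees, treating Lon/Lat as Cartesian coordinates.
planar_deg <- function(lon1, lat1, lon2, lat2) {
  sqrt((lon2 - lon1)^2 + (lat2 - lat1)^2)
}

# The same 10-degree longitude gap at two hypothetical latitudes.
d_equator <- haversine_km(0, 0, 10, 0)
d_boreal  <- haversine_km(0, 60, 10, 60)
p_equator <- planar_deg(0, 0, 10, 0)
p_boreal  <- planar_deg(0, 60, 10, 60)
```

      The planar distance is identical at both latitudes, whereas the great-circle distance at 60° latitude is roughly half of that at the equator, which is the kind of distortion a degree-based distance introduces.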

      (4) The computational burden of the spatial model required subsampling the data to 500 data points. The authors did not correctly compute confidence intervals for this approach, wrongly interpreting subsampling as bootstrapping and additionally incorrectly computing bootstrap standard errors. A correct subsampling-based estimation led to approximate tripling of the reported confidence interval.

      Response:

      Thank you for raising this concern. Bootstrapping is only efficient at depicting a global trend if the re-sampling size is close to the global sampling size (Efron and Tibshirani 1993). However, for our study, the 500-plot subsample is far from our global sampling size (>700,000). Considering that you used a minimalist approach, in which “while Liang et al. (2016) run 10000 bootstraps, we only do 50,” (p.8) your suggested global results represent, in fact, only about 50 × 500 = 25,000 plots, or approximately 3 percent of the global sample. In other words, there is a 97% information loss in your approach.

      In the textbook description of bootstrapping by Efron and Tibshirani (1993), echoed by many (e.g. Hesterberg 2015), it is outlined that the bootstrap sample should be equal in size to the original sample, and that any smaller re-sampling size would lead to a biased estimate of the standard error. This is also the main reason why you did not find a significant global BPR, as you should have.
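
      The role of the re-sampling size can be illustrated on simulated data. The sketch below is not the analysis from either side of this exchange: the data, the slope value of 0.26, the sample sizes and all object names are assumptions chosen for illustration. It contrasts a bootstrap with resamples of the original size against subsamples of size m drawn without replacement, and applies the commonly used sqrt(m/n) rescaling to the subsample spread.

```r
set.seed(1)

# Simulated data (assumption): log-productivity depends on log-richness
# with a hypothetical slope of 0.26.
n    <- 5000
logS <- log(sample(1:50, n, replace = TRUE))
logP <- 1 + 0.26 * logS + rnorm(n, sd = 0.8)
sim  <- data.frame(logS = logS, logP = logP)

# Slope of the log-log regression fitted to the selected rows.
slope_of <- function(rows) {
  unname(coef(lm(logP ~ logS, data = sim[rows, ]))[2])
}

# (a) Classical bootstrap: resamples of size n, drawn with replacement.
boot_slopes <- replicate(200, slope_of(sample(n, n, replace = TRUE)))
se_boot     <- sd(boot_slopes)

# (b) Subsampling: draws of size m << n, without replacement.
m          <- 500
sub_slopes <- replicate(200, slope_of(sample(n, m, replace = FALSE)))
se_sub_raw <- sd(sub_slopes)            # spread at sample size m
se_sub_n   <- se_sub_raw * sqrt(m / n)  # rescaled towards sample size n

# Analytical standard error from the full-sample fit, for reference.
se_lm <- summary(lm(logP ~ logS, data = sim))$coefficients["logS", "Std. Error"]

print(c(bootstrap = se_boot, subsample_raw = se_sub_raw,
        subsample_rescaled = se_sub_n, full_sample_lm = se_lm))
```

      In this toy setting the raw spread across small subsamples describes uncertainty at sample size m, not at the full sample size, which is why the choice of re-sampling size matters for the reported interval.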

      Allow us to demonstrate, with R-code (in blue) and outputs, how we have derived our results. While there is well-established literature regarding the validity of the subsampling method we have taken, less is known about an appropriate choice of the size of a subsample and the number of subsamples. With a global sample size over 600,000, we have chosen the subsample size to be 500 and a total of 10,000 subsamples out of consideration for computational feasibility and adequate representation of the global sample. Our approach leading to these choices is indeed ad hoc and the standard errors are at best approximations. We welcome ideas and possible collaboration to establish more rigorous approaches. On the other hand, with a large amount of data and thus information, statistical significance is not difficult to attain.

      1. For each random subset of 500 plots, we consider this subset a separate study unit (one can regard this as equivalent to a subregion). In the Geospatial Random Forests model, we calibrate one biodiversity-productivity relationship (BPR) curve based on this subset. With a global sampling size of >700,000, we find that it takes more than 2,000 subsets of 500 samples in our global BPR analysis, so that any single plot would have been accounted for at least once in the analysis. To be safe, we used 10,000 subsets (i.e. iterations) (Fig. 1);

      https://uploads.disquscdn.c...

      Figure 1. A graphic demonstration of the Geospatial random forests model. We randomly select 500 plots from across the world as one study unit or “subregion” (yellow), calibrate one biodiversity-productivity relationship (BPR) using the model, and draw a ceteris paribus BPR curve. Repeating this 10,000 times provides sufficient global coverage, as each plot is covered ~7 times on average (500*10000/720000≈7). Note that actual subregions can be spatially discontinuous depending on the randomization.

      A major strength of this approach is that it does not require any a priori assumption on the population distribution or any a priori delineation of forest type units across the world, within which forests have similar conditions. This is especially useful because there is no universally accepted forest type delineation across the world (FAO 2015).
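
      The coverage figure quoted in the caption of Figure 1 can be examined with a short calculation. This is a sketch under simplifying assumptions: draws are treated as independent, the total of 720,000 plots and the draw size of 500 are taken from the caption, and the function names are hypothetical.

```r
# Expected share of plots that appear in at least one of k independent draws
# of size m taken without replacement from N plots.
expected_coverage <- function(k, m = 500, N = 720000) {
  1 - (1 - m / N)^k
}

# Average number of times a given plot is drawn across k draws.
mean_times_drawn <- function(k, m = 500, N = 720000) {
  k * m / N
}

draws <- c(50, 2000, 10000)
coverage_table <- data.frame(
  draws         = draws,
  mean_times    = mean_times_drawn(draws),
  share_covered = expected_coverage(draws)
)
print(coverage_table)
```

      The first function is the complement of the probability that a given plot is missed by every draw, so it separates the average number of visits per plot from the share of plots visited at least once.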

      1. Load the global data set. Note that we removed plots with extreme species richness or productivity values (i.e. those beyond the 99.996th percentile), and plots with zero species richness or productivity.

      # Load packages

      library(nlme)

      # Load plot-level data

      # Download GFB1_data_figshare.xlsx from Figshare and convert to a csv file

      data <- read.csv("GFB1_data_figshare.csv")
      data <- subset(data, P>0)
      data <- subset(data, S>0)

      quantile(data$S,0.99996)
      quantile(data$P,0.99996)

      data1 <- subset(data,data$S<=270 & data$P<=533 & data$S >0 & data$P>0) # removed 894 plots with 0 or extreme S and P values
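
      The cutoffs 270 and 533 in the filter above are typed in after inspecting the quantile output. As a minimal sketch, the same filtering can be driven by the computed quantiles directly. The data below are simulated stand-ins, since the Figshare file is not reproduced here; only the column names S and P follow the script, and everything else is an assumption.

```r
set.seed(42)

# Simulated stand-in for the plot-level table (values are not real data).
plots <- data.frame(
  S = rpois(10000, lambda = 5),              # species richness, may be 0
  P = rlnorm(10000, meanlog = 1, sdlog = 1)  # productivity, always > 0
)

# Drop plots with zero richness or productivity before computing cutoffs,
# mirroring the order of operations in the script above.
plots <- subset(plots, S > 0 & P > 0)

# Upper cutoffs at the 99.996th percentile of each variable.
s_max <- quantile(plots$S, 0.99996)
p_max <- quantile(plots$P, 0.99996)

# Keep plots at or below both cutoffs and report how many were removed.
plots_clean <- subset(plots, S <= s_max & P <= p_max)
n_removed   <- nrow(plots) - nrow(plots_clean)
cat("Removed", n_removed, "of", nrow(plots), "plots\n")
```

      Deriving the thresholds in code keeps the filter consistent if the input file changes, at the cost of no longer showing the cutoff values explicitly in the script.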

      1. For each subset of 500 plots (without replacement), we consider this subset a separate study unit (one can regard this as equivalent to a subregion). We draw one BPR curve based on this subset, using our geospatial random forests, by keeping other variables constant at their sample mean, only increasing species richness from 1 to 271 (the global maximum).
      #############################################################
      ############# Derive Global GeoRF Estimation #####################
      #############################################################

      logP <- log(data1$P)

      # Jitter coordinates to avoid duplicated values

      Lon1 <- data1$Lon+ runif(length(data1$Lon),-0.0001,0.0001)
      Lat1 <- data1$Lat+ runif(length(data1$Lat),-0.0001,0.0001)
      data1 <- cbind.data.frame(data1, logP, Lat1, Lon1)

      ###### Loop ##################

      coef <- matrix(0, nrow=10000, ncol=20) # Coef Matrix

      for(i in 1: 10000) {
        tryCatch({
          training <- data1[sample(1:nrow(data1), 500, replace=FALSE),] # turn 'replace' off to maximize inclusion of new plots
          logS <- log(training$S)
          training <- cbind.data.frame(training, logS)
          gls1 <- gls(logP~ logS + G + T3 + C1 + C3 + PET + IAA + E, data=training, method="ML", corr= corSpher(form = ~ Lon1 + Lat1, nugget = TRUE), control=glsControl(singular.ok=TRUE))
          coef[i,3] <- i
          coef[i,4] <- logLik (gls1)
          coef[i,5] <- AIC (gls1)
          coef[i,6] <- BIC (gls1)
          # Generalized coefficient of determination
          gls0 <- gls(logP~ 1, data=training, method="ML")
          R2 <- 1-exp(logLik(gls0)-logLik(gls1))^(2/500)
          coef[i,7] <- R2
          coef[i,8] <- coef(gls1)[1]
          coef[i,9] <- coef(gls1)[2]
          coef[i,10] <- coef(gls1)[3]
          coef[i,11] <- coef(gls1)[4]
          coef[i,12] <- coef(gls1)[5]
          coef[i,13] <- coef(gls1)[6]
          coef[i,14] <- coef(gls1)[7]
          coef[i,15] <- coef(gls1)[8]
          coef[i,16] <- coef(gls1)[9]
          coef[i,17] <- 0
          # Baseline (S=1) productivity
          # logS + B1 + T3 + C1 + C3 + PET + IAA + E
          newdata <- data.frame(logS=0, G=mean(training$G), T3=mean(training$T3), C1=mean(training$C1), C3=mean(training$C3),PET=mean(training$PET), IAA=mean(training$IAA), E=mean(training$E))
          coef[i,20] <- exp(predict(gls1,newdata))
          # counter (total corrected to match the 10,000 iterations of the loop)
          cat(i, " of ", 10000, date(),"Theta=",coef(gls1)[2], "R2=", R2, "\n" )
          # remove files
          rm(training, newdata, gls1, R2)
        }, error=function(e){})
      }
      coef_df <- as.data.frame(coef)

      names(coef_df) <- c("0", "0", "i", "Loglik", "AIC", "BIC", "R2","const","theta", "B", "T3", "C1", "C3", "PET", "IAA", "E", "0", "0", "0", "P_1")

      write.csv(coef_df, "global_estimates.csv")
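
      For reference, the generalized coefficient of determination computed inside the loop can be written out as follows. The symbols are notation introduced for this note only: $\ell_1$ and $\ell_0$ denote the maximized log-likelihoods of the fitted model `gls1` and of the intercept-only model `gls0`, and $n = 500$ is the subsample size.

```latex
R^2 \;=\; 1 - \left[\exp(\ell_0 - \ell_1)\right]^{2/n}
    \;=\; 1 - \exp\!\left(-\frac{2}{n}\,(\ell_1 - \ell_0)\right),
    \qquad n = 500 .
```

      The two forms are algebraically identical; the first mirrors the R expression `1-exp(logLik(gls0)-logLik(gls1))^(2/500)` term by term.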

      1. Repeating the foregoing step 10,000 times, we obtain a set of subregions that together cover the entire global forest range. Meanwhile, we have 10,000 curves (green in the following Fig. 2) that represent possible BPRs across the world. Treating each subregion as an independent study unit, instead of a bootstrapping re-sample, we can calculate and plot the mean and standard error (SE) of the predicted BPR curves across the world as shown in the figure below (mean: black line, with red curves representing the 95% C.I.).
      ##############################################################

      # Draw estimated Biodiversity-Productivity Relationship (BPR) curves #########

      data <- read.csv("global_estimates.csv")

      theta <- data$theta
      mean(theta)
      P_base <- mean(data$P_1)

      # Predict P over an increased S from 1 to global max (271), which corresponds to S_hat from 100/271 to 100

      S <- seq(1,271,1)
      S_hat <- S*100/271

      P_est <- data.frame(matrix(0, 10000, ncol =273))
      P_est[,1] <- P_base
      P_est[,2] <- theta

      for (i in 1:10000){
        P_est[i,3:273] <- P_est[i,1] * S ^ P_est[i,2]
      }

      # Demonstration plot only shows the first 18 iterations

      plot(S_hat,colMeans(P_est[,3:273]), ylim=c(0,20), type="l",col = "blue", ylab="P")
      for (i in 1:18){
        P_est[i,3:273] <- P_est[i,1] * S ^ P_est[i,2]
        lines(S_hat,P_est[i,3:273],col = "green")
      }

      # Confidence intervals

      lines(S_hat,colMeans(P_est[,3:273])+1.96*apply(P_est[,3:273], 2, sd)/sqrt(10000), ylim=c(0,20), type="l",col = "red")
      lines(S_hat,colMeans(P_est[,3:273])-1.96*apply(P_est[,3:273], 2, sd)/sqrt(10000), ylim=c(0,20), type="l",col = "red")
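
      The row-by-row loop above evaluates the power-law curve P = P_base * S^theta once per iteration. A vectorized sketch of the same calculation is given below. It runs on simulated theta values standing in for the contents of global_estimates.csv, and the number of iterations, the theta distribution and the baseline of 2.5 are assumptions for illustration.

```r
set.seed(7)

# Stand-in values (assumptions): 1,000 simulated theta estimates and a
# fixed baseline productivity; the script above reads both from a CSV file.
theta  <- rnorm(1000, mean = 0.26, sd = 0.05)
P_base <- 2.5
S      <- seq(1, 271, 1)
S_hat  <- S * 100 / 271

# One row per iteration, one column per richness value: P_base * S^theta.
curves <- outer(theta, S, function(th, s) P_base * s^th)

# Pointwise mean curve and the spread of the curves at each richness value.
mean_curve <- colMeans(curves)
sd_curve   <- apply(curves, 2, sd)

# Band based on the standard error of the mean, as in the script above.
upper <- mean_curve + 1.96 * sd_curve / sqrt(nrow(curves))
lower <- mean_curve - 1.96 * sd_curve / sqrt(nrow(curves))

plot(S_hat, mean_curve, type = "l", col = "blue", ylab = "P")
lines(S_hat, upper, col = "red")
lines(S_hat, lower, col = "red")
```

      Using `outer()` builds the whole matrix of curves in one call, which avoids repeated row assignment into a data frame and makes the band calculation a pair of column-wise summaries.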

      https://uploads.disquscdn.c...

      Figure 2. Sample BPR curves from the 10,000 estimated curves from across the world. The figure is nearly identical to Fig. 3A of Liang et al. 2016, with some minor differences due to the random process. For easy comparison across the world, we set the base value of P as 2.5 m³ ha⁻¹ yr⁻¹, and convert species richness (S) to relative species richness (S_hat): S_hat = S *100 / 271.

      1. To demonstrate that the estimated global mean and confidence interval from our Geospatial random forests model (Fig. 2) are a good proxy of the true global BPR trend, we compare this result with the outcome of an ordinary least squares (OLS) model, whose estimates are based on the entire sample (with >700,000 plots).

      ## A comparison with OLS model ##

      data <- read.csv("GFB1_data_figshare.csv")
      data1 <- subset(data,data$S<=270 & data$P<=533 & data$S >0 & data$P>0) # removed 894

      logP <- log(data1$P) # recomputed here because data1 is reloaded above
      logS <- log(data1$S)
      ols1 <- lm(logP~ logS + G + T3 + C1 + C3 + PET + IAA + E, data=data1)

      theta <- coef(ols1)[2]
      summary(ols1)
      se_theta <- 2.100e-03 # standard error of theta, entered by hand

      S <- seq(1,271,1)
      S_hat <- S*100/271
      P_base <- 2.5

      P_est_ols <- P_base * S ^ theta # mean predicted BPR
      P_est_ols_ub <- P_base * S ^ (theta+1.96* se_theta) # upper bound of 95% CI
      P_est_ols_lb <- P_base * S ^ (theta-1.96* se_theta) # lower bound of 95% CI

      plot(S_hat, P_est_ols, ylim=c(0,20), type="l",col = "blue", ylab="P")

      # Confidence intervals

      lines(S_hat,P_est_ols_ub, ylim=c(0,20), type="l",col = "red")
      lines(S_hat,P_est_ols_lb , ylim=c(0,20), type="l",col = "red")

      The corresponding line plot is printed below. According to this graph, the BPR has the same curvature, but estimated productivity (P) is in general 10-20% lower than the estimated values from the Geospatial random forests, presumably due to the fact that spatial autocorrelation is not accounted for in the OLS model. Nevertheless, the confidence interval from the OLS model generally matches the confidence interval from the Geospatial random forests (Fig. 2).
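
      In the OLS script, the standard error of theta is assigned as a literal number. As a minimal sketch, the same quantity can be read from the fitted model object so that it stays in step with the data. The example uses simulated data with a hypothetical slope of 0.26, and all object names other than those reused from the script are assumptions.

```r
set.seed(3)

# Simulated stand-in data (assumption): a log-log relationship with noise.
n    <- 2000
logS <- log(sample(1:100, n, replace = TRUE))
logP <- 0.9 + 0.26 * logS + rnorm(n, sd = 0.7)
fit  <- lm(logP ~ logS)

# Slope estimate and its standard error, taken from the coefficient table.
coef_table <- summary(fit)$coefficients
theta      <- coef_table["logS", "Estimate"]
se_theta   <- coef_table["logS", "Std. Error"]

# Power-law curve and its bounds, following the construction used above.
S        <- seq(1, 271, 1)
P_base   <- 2.5
P_est    <- P_base * S^theta
P_est_ub <- P_base * S^(theta + 1.96 * se_theta)
P_est_lb <- P_base * S^(theta - 1.96 * se_theta)
```

      Note that this band reflects uncertainty in theta only, with the baseline P_base held fixed, which matches the construction in the script but is narrower than a band that also accounts for uncertainty in the intercept.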

      https://uploads.disquscdn.c...

      Figure 3. Estimated BPR curve (with 95% confidence interval bands), using an ordinary least squares (OLS) model, based on the entire GFB sample with >700,000 plots. For easy comparison across the world, we set the base value of P as 2.5 m³ ha⁻¹ yr⁻¹, and convert species richness (S) to relative species richness (S_hat): S_hat = S *100 / 271.

      (5) As noted earlier (Schulze et al., 2018), some 4% of the plots had productivity values (far) beyond what is biologically plausible (Stape et al., 2010). The likely reason is that small plots with large inventory errors in the productivity may lead to erratically high values. Not taking this into account in the analysis, e.g. by down-weighting plots with productivities above 30 m² ha⁻¹ y⁻¹, at least indicates an unreflected use of data.

      Response:

      Thank you for your concern. As shown in the R-code above, we have removed extremely high productivity values, above the top 0.004 percent quantile (P<=533). It is admittedly a difficult task to filter out the potentially biased values from such a large sample, but we are working with data scientists and data contributors to further improve the accuracy of our data.

      References

      Efron, B., and R. J. Tibshirani. 1993. An introduction to the bootstrap. Chapman & Hall, New York.

      FAO. 2015. Global Forest Resources Assessment 2015 - How are the world’s forests changing? Food and Agriculture Organization of the United Nations, Rome, Italy.

      Hesterberg, T. C. 2015. What teachers should know about the bootstrap: Resampling in the undergraduate statistics curriculum. The American Statistician 69:371-386.

      Jepson, P., and R. J. Whittaker. 2002. Ecoregions in Context: A Critique with Special Reference to Indonesia. Conservation Biology 16:42-57.

      Appendix: R script for estimating BPR for the grassland biomes

      # Estimate BPR curves by ecoregion

      # (C) Jingjing Liang 2018

      library(nlme)

      # Load plot-level data

      # Download GFB1_data_figshare.xlsx from Figshare and convert to a csv file

      data <- read.csv("GFB1_data_figshare.csv")
      data <- subset(data, P>0)
      data <- subset(data, S>0)
      attach(data)

      # Montane Grass and shrubs

      data1 <- subset(data, data$Ecoregion==10 | data$Ecoregion==9 | data$Ecoregion==8)
      data1 <- subset(data1,data1$P<=quantile(data1$P,0.999))

      ######## BPR Estimation
      ###### Bootstrapping ##################

      coef <- matrix(0, nrow=50, ncol=101) # Coef Matrix

      for(i in 1: 50) {
        tryCatch({

          training <- data1[sample(1:nrow(data1), 23133, replace=TRUE),]
          logP <- log(training$P)

          Lat1 <- training$Lat + rnorm(length(training$Lat))
          Lon1 <- training$Lon + rnorm(length(training$Lon))
          training <- cbind(training, logP, Lat1, Lon1)

          S_max <- max(training$S)
          SR <- training$S/S_max*100

          logS <- log(SR)
          training <- cbind(training, logS)

          lm1 <- lm(logP~ logS + G + T3 + C1 + C3 + PET + IAA + E, data=training)

          # Derive ceteris paribus BPR curve

          newdata <- data.frame(logS=log(seq(1,100,1)), G=mean(training$G),T3=mean(training$T3), C1=mean(training$C1), C3=mean(training$C3),PET=mean(training$PET), IAA=mean(training$IAA), E=mean(training$E))
          coef[i,1] <- coef(lm1)[2] # theta
          coef[i,2:101] <- exp(predict(lm1,newdata))
          plot(coef[i,])
          # counter
          cat(i, " of ", 50, date(), "\n" )

          # remove files (the fitted model in this script is lm1, not gls1)

          rm(training, newdata, lm1)

        }, error=function(e){})
      }

      coef_df <- as.data.frame(coef)

      write.csv(coef_df, "Ecoregion_Grasslands_BPR.csv")

      # Plot mean and 95% CI of bootstrapping

      plot(seq(1,100,1),colMeans(coef_df[,2:101]), ylim=c(0,6), type="l",col = "blue", ylab="P",xlab="S_relative")

      # Confidence interval

      lines(seq(1,100,1),colMeans(coef_df[,2:101])+1.96*apply(coef_df[,2:101], 2, sd), ylim=c(5,8), type="l",col = "red")
      lines(seq(1,100,1),colMeans(coef_df[,2:101])-1.96*apply(coef_df[,2:101], 2, sd), ylim=c(5,8), type="l",col = "red")

      # End of the code

    1. On 2018-11-29 09:06:32, user Conrad Mullineaux wrote:

      Speculative hypothesis papers can be fun and good for stimulating debate. But, to be useful, I think they need to present a plausible and coherent scenario (something that at least has a chance of being true) and they need to pay reasonable attention to the facts. I’m not sure that’s the case here. My main concerns are:

      1. Fig. 3. The feedback loop looks neat, but it ignores the fact that the local [O2] around the nitrogenase need not correlate to any significant extent with the global atmospheric [O2]. Huge discrepancies could occur, due to local environmental conditions, and also due to the metabolic activity of the cell itself. Considering only the latter factor, the intracellular [O2] could be much higher than ambient (due to PSII activity) or much lower than ambient (due to respiration). If the nitrogenase doesn’t actually see the global atmospheric [O2], such a feedback loop could not clamp global [O2] at any particular level as proposed.

      2. P.5 “If diazotrophic cyanobacteria are grown under conditions where they have sufficient CO2 and light, and with N2 as the sole N source, then they grow and accumulate no more than 2% oxygen in their culture atmosphere (16). The 2% O2 remains constant during prolonged culture growth because this is the O2 partial pressure beyond which nitrogenase activity becomes inhibited. With greater O2, nitrogenase is inactivated and there is no fixed N to support further biomass accumulation. With less O2, nitrogenase outpaces CO2 fixation until the latter catches up, returning O2 to 2% in the culture.” The outcome of this experiment will come as a surprise to anyone who has observed diazotrophic cyanobacteria happily growing without a combined nitrogen source at 21% ambient O2 (it depends on the cyanobacterium, of course). The result is a key plank of the authors’ argument, but it’s not clear if, when or how the experiment has been carried out. It’s not as straightforward as it seems, and nothing like that is described in the cited reference (16: Berman-Frank et al 2003). The nearest thing in that paper is a statement that a specific cyanobacterium, Plectonema boryanum, is unable to fix nitrogen above certain ambient [O2] levels. The limits are actually rather lower than the 2% quoted: 0.5% in the light and 1.5% in the dark (16). Plectonema is a specialist for microaerobic environments, and most other diazotrophic cyanobacteria are not so susceptible to O2 inhibition.

      3. P.5 “Cyanobacteria have evolved mechanisms to avoid nitrogenase inhibition by oxygen, including N2 fixation in the dark, heterocysts or filament bundles as in Trichodesmium. Critics might counter that any one of those mechanisms could have bypassed O2 feedback inhibition.” Indeed they might. The authors go on to brush aside their imaginary critic on 3 grounds, none of which seem valid. “First, evolution operates without foresight”. Foresight isn’t needed: there would have been an immediate selective advantage to acquiring an O2 protection mechanism. “Second, the mechanisms that cyanobacteria use to deal with modern O2 levels appear to have arisen independently in diverse phylogenetic lineages, not at the base of cyanobacterial evolution when water oxidation had just been discovered”. Very likely so, but what about the next 2 billion years? “Third, the oldest uncontroversial fossil heterocysts trace to land ecosystems of the Rhynie chert”. It may or may not be the case that heterocysts evolved late, but, in any case, heterocysts are not significant contributors to marine nitrogen fixation: in extant cyanobacteria it’s the other protection mechanisms that allow cyanobacteria to make a huge contribution to oceanic nitrogen fixation even in the presence of 21% atmospheric O2. What about those other mechanisms?

      The fact that different lineages of cyanobacteria have independently come up with at least 3 different ways to protect their nitrogenase from O2 indicates that evolving such mechanisms is not really such a big deal. The authors’ scenario suggests that for a period approaching 2 billion years there was a nitrogen-limited biosphere with cyanobacterial nitrogenase operating right up against an inhibitory concentration of O2. There would have been intensive selective pressure for adaptations to protect the nitrogenase from oxygen. The scenario depends on the assumption that no cyanobacterium was able to develop a protection mechanism that would allow nitrogen fixation at >2% O2, despite selective pressure operating over a period of about 2 billion years and the availability of multiple solutions to the problem, as seen in extant cyanobacteria. I’m afraid that’s implausible, and I suggest that we need to look elsewhere for an explanation of the low O2 level through the Proterozoic.