  1. Mar 2026
    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript describes the pattern of relaxed selection observed at spermatogenesis genes in gorillas, presumably due to the low sperm competition associated with single-male polygyny. The analyses to detect patterns of selection are very thorough, as are the follow-up analyses to characterize the function of these genes. Furthermore, the authors take the extra step of determining function in vivo with a Drosophila model.

      This is an excellent paper. It addresses the interesting phenomenon of relaxation of selection as a genomic signal of reproductive strategies using multiple computational approaches and follow-up analyses that pull in data from GO, mouse knockouts, a human infertility database, and even Drosophila RNAi experiments. I really appreciate the comprehensive and creative approach to analyze and explore the data. As far as I can tell, the analyses were performed soundly and statistics are appropriate. The Introduction and Discussion sections are thoughtful and well-written. I have no major criticisms of the manuscript.

      We thank you for your kind words!

      The main area that I would suggest for improvement is in the "Caveats and Limitations" section of the Discussion. Currently, the first paragraph of this section states the obvious that genetic manipulation of gorillas is not feasible. Beyond a reminder to the reader that this was a rationale for the Drosophila work, it isn't really adding much insight. The second paragraph is a brief discussion of the directionality of change. I think it comes across as overly simplistic, with a sort of "well, we can never know" feel. Obviously, there are plenty of researchers who do model change to infer direction and causation, and there are plenty of published papers attempting to do so with respect to mating systems in primates.

      We understand these statements might seem trivial, but they are meant to fully acknowledge, particularly to non-evolutionary biologists, that we cannot do the genetics to “prove” these putatively deleterious mutations really are so (hence the statement about forward/reverse genetic experiments), nor establish causation (since this mating system evolved once in the history of gorillas, we cannot know directionality in this lineage, although we could infer it if we had species in which different stages were extant, for example).

      I do not think the authors need to remove these paragraphs, but I do encourage them to turn the "Caveats and Limitations" section into something more meaningful by addressing limitations of the work that was actually done rather than limitations of hypothetical things that were not done. A few areas come to mind. First, the authors should discuss the effect of gene-tree vs species-tree inconsistencies in the analyses, which could affect the identification of gorilla-specific amino acid changes and/or the dN/dS estimates. Incomplete lineage sorting is very common in primates, including at the gorilla-chimp-human splits (Rivas-González et al. 2023). It would be nice to hear the authors' thoughts on how that might affect their analyses. Second, the dN/dS-based analyses assume the neutrality of synonymous substitutions. Of course, that assumption is not completely true; it might be true enough, but the authors should at least note it as a caveat. Third, and potentially related, is the consideration that these protein-coding genes may be functioning in other ways, such as via antisense transcription. The genes under relaxed selection may be on their way to becoming pseudogenes and evolving as such at the sequence level, but many pseudogenes continue to be transcribed in the sense or antisense direction for regulatory purposes. I don't think there is a way to incorporate this into the authors' analyses, but it would be nice to see it acknowledged as a caveat or limitation.

      We thank you for the helpful suggestion and have added a discussion of these issues in the reworked Caveats and limitations section (lines 639 - 710).

      Reviewer #1 (Recommendations for The Authors):

      This is an excellent paper with thorough and creative approaches to address an interesting connection between genotype and phenotype. Stylistically the paper is very well written.

      We thank you for your kind words.

      Page 3: I suggest deleting the word "vaginal" so the sentence reads "... the evolution of female traits such as anatomical features that allow female control...". Most of the well-documented examples of cryptic female choice are in animals that do not have vaginas like insects, fish, and birds, including the reference given at the end of the sentence (Brennan et al. 2007 on waterfowl).

      We agree and have made this edit.

      Page 3: I would delete the words "multimale-multifemale" when discussing gorillas, to make the sentence read "Most gorillas, for example, live in groups with age-graded...". The use of "multimale-multifemale" here is not exactly wrong, but can be confusing to the reader since the authors essentially use "multimale-multifemale" as a synonym for "polygamous" in the previous paragraph.

      We agree and have made this edit.

      The writing in the Materials and Methods fluctuates between present and past tense. The authors should pick a consistent style, probably past tense by convention.

      We have edited the Materials and Methods to use only the past tense.

      "Drosophila" is sometimes italicized and sometimes not. Make this consistent.

      To ensure consistency, italics were used only when genus and species were shown together (i.e., Drosophila melanogaster).

      In the main text, a few reference typos/confusions:

      Box 1, Figure 1B caption: I believe this "Dixson, n.d." reference should be Dixson (2009), if it refers to the book (Oxford Press).

      Yes, that is the case. Thank you for having spotted this. The reference has been corrected.

      Page 21: The authors use the term "false exons" and "fake exons" in the same paragraph. Are these the same thing? If so, just use "false exons" both times.

      These are the same; we have changed “fake” to “false”.

      Page 22-23, maybe elsewhere: The Smith et al. reference includes Martin's first name.

      Thank you for bringing this issue to our attention. The reference has been corrected.

      Page 25: in the parenthetical listing of scientific species names, the word "and" should not be italicized. In this same section, there's really no reason to include "gorilla" as the subspecies. It isn't given for the other species.

      Corrected.

      Page 27: Missing period in the second paragraph after "(Guyonnet et al. 2012)".

      Corrected.

      Page 29: Should read "... available in gnomAD that would allow us to exclude..." (or possibly "... available in gnomAD that would allow the exclusion of ...").

      Corrected.

      Page 33, figure legend of Appendix Figure 1A: "gray line" not "gray liner".

      Corrected.

      Box 1, Figure 1A: This is confusing in a few ways. First, the gorilla red dot is labeled "Gorilla", but the chimpanzee and bonobo dots are not labeled. Perhaps in the legend the colors could be indicated, such as "... percentage of body mass for gorilla (red), common chimpanzee (dark blue), and bonobo (light blue)"? Secondly, the bar chart shows the testes/body mass ratio but it is not clear what they are scaled to. Should there be a second y-axis on the right side of the plot?

      The bar chart showed the testis weight/body weight ratio (log), but it is not really necessary. We have removed the bar chart and labeled chimpanzees and gorillas.

      Figure 1D: I found myself confused by the vertical label of "Percent of genes with w>1 in Gorilla". Because all genes are in the stacked histogram, my first thought was that ~99% of the genes have w>1 (gray). Would be more clear if the label was the same as 1G ("Percent of genes").

      We agree and have made this change.

      The text in the figures is extremely small. I don't know what it will look like once it is fully formatted for publication, so I'll leave those concerns to the editor/publisher.

      We will wait until the proofs to determine if this figure needs to be split into multiple figures with larger text.

      References in the reference section need a LOT of cleaning up. It does not appear that any manual editing was done. Please check for consistency in capitalization, italicization, abbreviations, missing information, etc. The level of neglect to this section is frankly unprofessional.

      I (VJL) apologize for this; it is entirely my fault. To explain but not justify, I have dyslexia, and the shifting combination of text, numbers, punctuation, fonts, and font styles makes it difficult to see the inconsistencies. To mitigate this, I use a reference manager to format references (like everyone else) and almost always have someone proofread the reference section, but I didn’t do that with this manuscript. I apologize for the oversight. My dedicated co-authors have cleaned the reference section.

      Reviewer #2 (Public Review):

      As outlined in the public review, this is a nicely executed molecular evolutionary study. The analyses and overall patterns described in gorillas appear rigorous and convincing. The fundamental limitation here is a lack of comparative context to specifically establish the connection to mating system or the uniqueness of these overall patterns to gorillas.

      We thank the reviewer for the compliments. However, there is some confusion about the hypothesis we tested. We hypothesized that genes involved in male reproductive biology would have relaxed selective constraints in gorillas because of their mating system, not that polygynous mating systems would lead to relaxed selection. While that may be true, it is not the hypothesis we tested, nor do we state that the overall pattern we observe is unique to gorillas. Our data, however, support our claims: 1) We performed an unbiased selection scan in gorillas and identified genes with K<1, an evolutionary signature of reduced selection intensity; 2) We found that those genes were enriched for male reproductive functions; and 3) Some of those genes had effects on male reproduction in both Drosophila screens and in infertile men. These are the results one would expect if our hypothesis were true.

      To partly address the concern that our results do not have a connection to mating systems or may reflect an overall pattern rather than a gorilla-specific one, we ran RELAX using the same dataset but in the elephant seal, another species with a highly polygynous mating system. Although elephant seals are polygynous, they differ from gorillas in that their spermatogenesis does not undergo persistent deterioration but instead follows a seasonal pattern. According to the comprehensive study by Laws (The Elephant Seal (Mirounga leonina Linn.): III. The physiology of reproduction; Scientific Reports, 15, Falkland Islands Dependencies Survey, 1956), male gamete production is upregulated during the mating season and is mostly inactive throughout the rest of the year. Of the 573 genes with K<1 in gorillas, only 14 also have K<1 in elephant seals, which had 350 genes with K<1. A GO analysis of the 350 elephant seal K<1 genes does not identify enrichment in spermatogenesis-related terms; in fact, the list of GO terms is quite broad. A potential, if admittedly speculative, interpretation of these findings is that although elephant seals are polygynous, the selective pressure on their spermatogenesis is not relaxed (unlike in gorillas) because of the seasonal nature of their mating period. In other words, by having a temporally narrower window for reproductive success than gorillas, elephant seals do not experience weakened selective constraint on male gametogenesis. Regardless, the low overlap in relaxed genes between the two tested polygynous species supports the view that this reproductive strategy is probably associated with different evolutionary signatures in the genome depending on the species, a likely reflection of the complex, nuanced, and multi-factorial nature of such strategies. We include this analysis in the Appendix (lines 1112 - 1132).

      While there is much that I like about the study and approach, this is a substantial shortcoming that really limits the significance of the findings, especially given that lineage-specific patterns were also analyzed by Scally et al. (2012) over a decade ago.

      While Scally et al. (2012) reported the initial sequencing, assembly, and analyses of the gorilla genome, the method they used to characterize selective pressure on coding genes (the branch and branch-site models implemented in PAML) is misspecified for detecting relaxed selection (PMID: 25540451). Under relaxed selection, the dN/dS of sites under purifying selection will move towards 1, the dN/dS of sites under positive selection will also move towards 1, and some sites will not experience a change in dN/dS. The PAML test used by Scally et al. (2012) averages dN/dS across all sites, rather than having distinct rate categories for each of the three selection classes. A change in dN/dS toward 1 under the PAML model can arise because the strength of positive selection is weaker in the foreground lineage than in the background lineage, even if there is still positive selection acting on some sites. Averaging across all sites also means there is little power to detect relaxed selection, even when it is present. Furthermore, the PAML test used by Scally et al. (2012) is underpowered to detect relaxed selection because it depends on selective regimes in background species. Scally et al. (2012) also used only six species, which further underpowers their test of relaxation: if one or more of those species experience an increase in their dN/dS rate, the background rate will increase, giving the appearance of a decrease in the gorilla lineage even if its dN/dS rate has not changed. We elaborate on this in the Appendix (lines 1036 - 1073). Finally, the method implemented in PAML does not allow for synonymous rate variation across sites or for multi-nucleotide mutations per codon. Ignoring synonymous rate variation dramatically inflates the false positive rates of selection tests (PMID: 32068869), as does ignoring multi-nucleotide mutations (PMID: 29967485 and PMID: 37395787); we have added a discussion of these issues in our Caveats and limitations section (lines 683 - 710).
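
      A minimal Python sketch of why relaxation pulls both purifying and positive selection classes toward neutrality, assuming the RELAX parameterization described in the paper cited above (PMID: 25540451), in which each test-branch ω equals the corresponding reference-branch ω raised to the power K. The ω classes below are hypothetical values chosen for illustration, not estimates from this study, and the function name is illustrative only.

```python
def relax_transform(omegas, k):
    """Map reference-branch omega values to test-branch values under RELAX: omega_T = omega_R ** k."""
    return [w ** k for w in omegas]


# Hypothetical reference omega classes: purifying, neutral, and positive selection
reference = [0.1, 1.0, 4.0]

relaxed = relax_transform(reference, 0.5)      # K < 1: selection relaxed
intensified = relax_transform(reference, 2.0)  # K > 1: selection intensified

for w_ref, w_rel, w_int in zip(reference, relaxed, intensified):
    print(f"omega_R = {w_ref:.2f} | K = 0.5 -> {w_rel:.3f} | K = 2 -> {w_int:.3f}")
```

      With K = 0.5, the purifying class (ω = 0.1) rises to about 0.32 and the positive class (ω = 4) falls to 2, while ω = 1 is unchanged; K > 1 pushes both classes away from 1. A single site-averaged dN/dS cannot separate these opposing shifts, which is the misspecification argument made above.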

      Reviewer #2 (Recommendations for The Authors):

      Specific comments

      Framing: Overall, the connection to mating system is referred to with variable levels of certainty, some appropriate, others overstated. The paper title uses 'coincident', which is appropriate but at odds with the stronger conclusions that are emphasized throughout. Elsewhere the phrasing is much stronger (abstract, discussion), implying a direct statistical association with mating system variation that has not been established. Elsewhere the term 'association' is used in the same manner, but in instances where a statistical association is tested and demonstrated (tests of enrichment, etc).

      We are unsure why the Reviewer considers our claims overstatements. The patterns of molecular evolution we found are ‘associated with’ and ‘coincident with’ the gorilla mating system, and we believe our results are ‘compelling’. Our tests for relaxed and positive selection are statistically associated with a polygynous social system, as we hypothesized a priori. We have taken care to ensure a more consistent framing of this connection throughout the manuscript to avoid potential misinterpretations of causality.

      Page 7, elsewhere- It is essential to compare the reported patterns (percentage of relaxed genes in gorilla, patterns of enrichment, etc) to other primate lineages to identify whether this number is enriched due to mating system or whether these patterns are typical of sperm genes across mammals. The implication here and throughout is that the specific pattern reflects specific aspects of gorilla mating biology, but this is never established. Additionally, it would be interesting to know the relative number of genes under positive selection across species (or across great apes).

      We agree that these controls would be informative if we were using a PAML-like approach. With the RELAX method, however, the intensity of selection in the foreground is compared to that in the background through the parameter K, and K only becomes significantly less than one if the intensity of selection is relaxed in the foreground. If these patterns were common to sperm genes across mammals, the foreground and background would not differ significantly. Our a priori hypothesis was that genes related to male reproductive biology would show evidence of a decrease in the intensity of selection (both positive and purifying), which we tested and found to be true. In this regard, we can conclude that the gorilla mating system is associated with patterns of molecular evolution in the species’ genome.
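
      Whether K differs significantly from 1 is typically assessed with a likelihood-ratio test comparing a null model with K fixed at 1 against an alternative with K estimated freely. The sketch below assumes that both models have already been fitted (for example, by RELAX in HyPhy) and that the statistic is referred to a χ² distribution with one degree of freedom; the log-likelihoods and K estimate are hypothetical placeholders, and the function names are illustrative only.

```python
import math


def relax_lrt(lnl_null, lnl_alt):
    """Likelihood-ratio test of K = 1 (null) against K free (alternative), 1 df.

    lnl_null, lnl_alt: maximized log-likelihoods of the two fitted models.
    Returns the LRT statistic and its asymptotic chi-square(1) P-value.
    """
    lr = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp tiny negatives from optimizer noise
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


def classify(k_hat, p_value, alpha=0.05):
    """Label the test lineage from the fitted K and its LRT P-value."""
    if p_value >= alpha:
        return "no significant change"
    return "relaxed (K<1)" if k_hat < 1 else "intensified (K>1)"


# Hypothetical log-likelihoods and K estimate for one gene (not values from this study)
lr, p = relax_lrt(lnl_null=-10234.7, lnl_alt=-10229.1)
print(lr, p, classify(k_hat=0.42, p_value=p))
```

      Because the null hypothesis is K = 1 for the test lineage relative to the reference lineages, a shift shared equally by foreground and background leaves K near 1 and the test non-significant, which is the logic of the response above.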

      While we too would find it interesting to know the relative number of genes under positive selection across species (or across great apes), that is not the study we performed and it is beyond the scope of this one (and we only identified 96 genes that were positively selected in gorillas, suggesting that few genes are positively selected across species).

      Page 8, bottom, elsewhere- "13,491 background set" elsewhere this is 13,310 (abstract). The number of genes here is different, and the set seems to change across multiple parts of the paper without explanation. This could be a simple typo, however, it may affect statistical analysis if the problem is widespread, especially when assessing enrichment of (presumably) small sets of genes.

      This is partly true and partly a typo. We generated 13,491 alignments, 13,310 of which had HUGO gene symbols. These 13,310 genes were used in all subsequent studies. We have re-written the text to clarify this point, and have added a statement: “We thus generated a dataset of 13,491 orthologous coding gene alignments from the genomes of 261 Eutherian mammals, corresponding to 62.7% of all protein-coding genes in the gorilla genome. Of the 13,491 alignments, 13,310 had an identifiable HUGO gene symbol and were used in all subsequent analyses (lines 158 - 162).”

      Related to this, it is difficult to determine how many genes these GO associations are based on. Even small numbers of genes can result in very significant results with these tests. How many genes are these associations based on? This connection is a key component of the overall narrative that changes in sperm competition have a large effect on genome-wide shifts.

      All analyses, including the over-representation analyses (ORA), are based on the 13,310 genes with identifiable HUGO gene symbols. The dataset submitted with this manuscript includes these 13,310 genes (as well as the genes with K<1 and K>1). The foreground consisted of the 578 genes with K<1, which are listed in Figure 1 – source data 3. The minimum number of genes annotated to a GO or pathway term was 3. While it is unlikely that statistically significant GO term enrichments result from only a few genes annotating to each term, such a scenario would produce small P-values with a high false discovery rate, and readers can decide what false discovery rate they are willing to accept.
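
      To make the concern about small terms concrete, here is a stdlib-only Python sketch of a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg adjustment. It uses the universe (13,310 genes), foreground (578 K<1 genes), and minimum term size (3) given above; the term names, sizes, and overlaps are hypothetical, and this is a generic ORA sketch rather than the exact tool used in the study.

```python
from math import comb


def hypergeom_sf(k, universe, annotated, drawn):
    """One-sided P(X >= k) for X ~ Hypergeometric(universe, annotated, drawn)."""
    upper = min(annotated, drawn)
    numerator = sum(
        comb(annotated, i) * comb(universe - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(universe, drawn)  # exact integer arithmetic, one final division


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjustment; returns q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


UNIVERSE, FOREGROUND = 13310, 578  # gene counts stated in the response
MIN_TERM_SIZE = 3                   # minimum annotated genes per term, as stated

# Hypothetical terms: (name, genes annotated in universe, of which in foreground)
terms = [("term_A", 3, 3), ("term_B", 40, 9), ("term_C", 120, 8), ("term_D", 2, 2)]
kept = [t for t in terms if t[1] >= MIN_TERM_SIZE]
pvals = [hypergeom_sf(hit, UNIVERSE, size, FOREGROUND) for _, size, hit in kept]
qvals = benjamini_hochberg(pvals)
for (name, size, hit), p, q in zip(kept, pvals, qvals):
    print(f"{name}: {hit}/{size} in foreground, P = {p:.2e}, q = {q:.2e}")
```

      A term with only three annotated genes, all of which fall in the foreground (term_A), already yields a nominal P-value below 10⁻⁴, which is why both the minimum term size and the FDR-adjusted values matter when reading enrichment tables.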

      How many of these 578 genes are plausibly related to reproduction? Apologies if I missed this detail, but Figure 3 does not convey this. Could you speak to this directly in the text and include a table or supplemental table of the GO terms to show the differences in enrichment between classes of genes, and counts per term?

      These data are included in Figure 3 – source data 1.

      One of the key results is the relative frequency of relaxed constraint versus positive selection. This is expected on some level as the form of recurrent positive directional selection detected with these models is usually relatively rare. However, it is not at all clear that it is rarer in gorillas versus other mammals, as implied.

      Our comparison of relaxed constraint to positive selection was intended to explore whether more genes experienced one pattern of molecular evolution or the other within gorillas; we do not imply that positive selection is rarer in gorillas than in other mammals.

      Likewise, I was wondering how the dataset itself may be biased toward this result. If I understand correctly, you are requiring very high levels of conservation (251/261 species) for inclusion in the dataset, resulting in ~60% of all gorilla genes being included. Rapidly evolving genes that are targets of recurrent positive selection also often tend not to be highly conserved across such a deep phylogenetic sample. It would be good to acknowledge this potential bias when implying meaning to the differences in relative rates of the two forms of selection.

      Our results are unlikely to be subject to this bias. The RELAX test relies on accurately estimating K in background lineages, which requires that we include as many species as possible. The tradeoff is a reduction in the number of genes included in the dataset due to evolutionary dynamics across a wide range of species. However, it's not that 40% of the genes are excluded because they are evolving so rapidly we cannot identify or align them, it mainly reflects the fact that we cannot identify the gene in 251 of the 261 species included in the dataset (due to gene loss, etc).

      Page 9 - The results here (and in Figure 3D) show that relaxed genes are enriched broadly across spermatogenesis cell types except for Sertoli cells. But the Sertoli cells and a few non-significant cell types are the only things to compare to. Instead, it would be interesting to identify single-cell expression patterns from other tissues, or even bulk RNA, as scRNA data may be limited in the species. This would show that these genes are enriched in testis compared to other tissues, as opposed to just being broadly expressed. Additionally, the authors could compare to the other primate testis scRNA data available in Murat et al. Without such comparisons the interpretations here seem limited.

      We did not test whether K<1 genes were enriched in other cell types because: 1) we had an a priori hypothesis that genes with K<1 would be enriched in cells involved in male reproduction, rather than enriched in testis cell types compared to all other cell types; and 2) the number of genes with K<1 is relatively small and the number of known cell types is very large; at least one estimate points to ~400 major cell types in a higher primate (PMID: 37722043). Using a P-value threshold of 0.05 from a hypergeometric or Fisher's exact test and a Bonferroni correction to control for multiple hypothesis testing, the P-value for enrichment in any cell type would need to be below 0.000125, which we are unlikely to achieve.
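
      The threshold quoted above follows from a Bonferroni correction, assuming one enrichment test per cell type (m ≈ 400) at a familywise α of 0.05:

```latex
\alpha_{\mathrm{adj}} = \frac{\alpha}{m} = \frac{0.05}{400} = 1.25 \times 10^{-4}
```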

      More comprehensive functional comparisons could provide evidence that even though relaxed constraint is present in all lineages, perhaps relaxed constraints in the gorilla lineages are more related to sperm formation and function.

      The RELAX test is a relative one; while relaxed constraint may be present in other lineages, to observe a statistically significant K<1 in gorillas the degree of relaxation would have to have a greater effect size in gorilla than in other lineages.

      I was also a little unclear about what to make of the interpretation of K<1 versus K>1 enrichment by cell type. The enrichment of K<1 genes is called out as noteworthy because this is when spermatogenesis-specific genes begin to be expressed, but then the K>1 result is dismissed as occurring during pachytene, which is a transcriptionally permissive state of the testis. To be clear, pachytene is also a critical checkpoint for fertility, and enhanced purifying selection at this step could reasonably be interpreted as being at odds with the entire erosion-of-reproduction argument. This seems to be a selective interpretation in service of the overall narrative. Also, permissive transcription is not limited to the pachytene stage, and the relaxation of constraint concomitant with increased specificity and permissive expression during the later stages of spermatogenesis is a well-known result in mammals, not anything that can be ascribed to gorillas and their change in mating system.

      We agree with the Reviewer’s comment and have removed the K<1 versus K>1 interpretation from the manuscript.

      Page 13 - The LOF enrichment identified from this random sampling is borderline significant. An improved approach would be to perform permutations of random samplings and identify the range of significance based on 1000+ permutations.

      We have redone the burden test with population-matched groups to confirm the reliability of this association (lines 435 - 446). In addition, we now acknowledge in the Caveats and limitations section that our observations could benefit from a permutation analysis (lines 695 - 697).
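
      For readers unfamiliar with the permutation approach suggested above, this is a minimal Python sketch of a one-sided label-permutation test for excess rare-variant burden in cases. It assumes hypothetical per-individual counts of qualifying (e.g., loss-of-function) variants and uses the difference in mean burden as the test statistic; it is an illustration of the technique, not the burden test used in the manuscript.

```python
import random


def permutation_burden_test(case_counts, control_counts, n_perm=10000, seed=1):
    """One-sided permutation test: is mean variant burden higher in cases than controls?

    case_counts, control_counts: per-individual counts of qualifying variants.
    Returns (observed difference in mean burden, empirical P-value).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    pooled = list(case_counts) + list(control_counts)
    n_cases = len(case_counts)

    def mean_diff(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_diff(pooled)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        if mean_diff(pooled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical P-value strictly above zero
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical data: 9 of 100 cases and 25 of 500 controls carry one qualifying variant
cases = [1] * 9 + [0] * 91
controls = [1] * 25 + [0] * 475
observed, p_value = permutation_burden_test(cases, controls, n_perm=5000)
print(f"observed difference = {observed:.3f}, permutation P = {p_value:.4f}")
```

      Because rare variants can differ sharply between populations, labels would need to be shuffled only within ancestry-matched strata (or the analysis restricted to a single ancestry) to keep the null distribution free of population structure.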

      Page 17, bottom- Statements like these are overstating the correlation as the comparative analyses were not shown.

      We agree and have edited the text to avoid potential overstatements.

      It is good to include the role of the female reproductive tract. Shouldn't the unbiased screen pull these out anyway? The authors did find some female GO terms enriched. What additional information or experiments would be needed to test the hypothesis of female compensation? The expectations for this should be made clearer.

      Given the nature of these putative female compensatory mechanisms (primarily acting on the oviduct and lower uterus, as speculated in lines 586 – 601), it is currently impossible to functionally test them in gorillas. The continued development of in vitro systems mimicking the female reproductive tract may allow such studies in the future.

      Page 18, middle- Pleiotropy is an important consideration and this paragraph discusses some valuable points. However, this is another section that could be improved by discussing the relaxed constraints in later spermatogenesis, which likely suggest that genes expressed in later stages are less pleiotropic and more testis-specific.

      We agree and have added a brief discussion of this in lines 619 - 622: “It is also possible that the negative consequences of deleterious pleiotropy become less pronounced at later stages of spermatogenesis as meiotic and post-meiotically expressed genes are enriched for testis-specific functions (PMID: 36544022).”

      Page 27, Bottom- The criteria for selecting the genes to target here are interesting and disconnected from the claimed interpretation of the results. If you're targeting genes with reliable expression in Drosophila, it is not surprising that a percentage of them will lead to fertility loss. Shouldn't the background be a random set of testis-expressed genes? This test would show that relaxed constraint is a strong way to screen for fertility genes. Additionally, the authors previously showed that these genes were enriched in scRNA-seq data from gorilla, and likely other species. Suggesting that you identified genes 'lacking evidence' of a role in spermatogenesis in previous studies is misleading when many of these genes are present in testis RNA datasets and enriched for sperm GO terms. I would argue that genes found to be expressed in testis and in spermatogenesis-specific cell types certainly have evidence of being involved in spermatogenesis.

      We thank you for the helpful suggestion. We have generated a new background group composed of a random set of testis-expressed genes. More specifically, using previously published Drosophila testis expression data (PMID: 30249207), we randomly selected 156 genes with TPM>1 (transcripts per million) and determined the percentage of them with reported spermatogenic or male fertility defects in Drosophila. We observed that 18 (11.5%) had previously been demonstrated to be functionally required for male reproductive fitness. This percentage is slightly higher than what we had previously observed for a random selection of Drosophila genes (9.6%, an update using the latest available data to the 7.7% reported in the original version). Nevertheless, both figures are still well below the 27.6% hit rate we found for the Drosophila orthologs of the gorilla K<1 genes. We have added this new information to the manuscript (lines 380 - 386).

      Regarding the potential correlation between expression and function in spermatogenesis, we and others have shown that the majority of the protein-coding genome is expressed during spermatogenesis in both vertebrate and invertebrate species (PMID: 39388236). Although the reasons for such widespread transcription in the male germ line are not entirely clear, it calls for a cautious approach to correlating expression with function. Indeed, our recent analysis of 920 genes reliably expressed in insect and mammalian spermatogenesis revealed that only 27.2% of them caused male reproductive impairment when individually silenced in the Drosophila testis (PMID: 39388236). Since genetic redundancy needs to be taken into consideration when dealing with a biological process so central to the survival of a species, we take the more stringent approach of only considering a gene to be functionally involved in spermatogenesis if there is phenotypic evidence (from our RNAi assay or from previous publications) that its disruption is associated with spermatogenic impairment and/or abnormal fertility. We have added this clarification to the manuscript (lines 349 - 363).

      Page 17 "Our data ... suggests that gorillas may be at the lowest limit of male reproductive function that can be maintained by natural selection (at least in mammals or vertebrates)." I realize this is the speculation section, but this is a massive overstatement. There is absolutely nothing in your data or results that supports this statement, nor is this supported by the extensive comparative reproductive data in mammals. For example, there are many mammalian systems that show lower metrics of reproductive function than gorillas. Likewise, the sperm abnormality indices in Box 1F are nowhere near as severe as those found in many species that still somehow manage to reproduce.

      We agree and have edited the text to avoid potential overstatements (see above).

      Reviewer #3 (Recommendations for The Authors):

      (1) More discussion is needed as to whether their results could be explained by a reduction in effective population size in gorillas.

      Thank you for raising this important point. As you know, reduced effective population size can lead to an increased load of deleterious mutations/relaxed selection intensity. However, we do not believe that it substantially affects our observations. Indeed, relatively few genes have K<1 and those are enriched in sperm biology. Given that a reduced effective population size will plausibly increase the load of deleterious mutations and relaxed selection across many genes, it is unlikely that such a broad phenomenon would result in a specific enrichment in genes related to male reproductive biology. We have added this reasoning to the Caveats and limitations section (lines 675 - 682).

      (2) Properly controlled genetic association testing when performing a burden test is essential, and methods that allow for some variants to be associated with increased fertility should be considered. Rare variants are much more likely to show population-specific differences, and selecting humans from two potentially very different cohorts and sample sizes can easily lead to confounding. I suggest performing a principal component analysis to ascertain the degree of genetic differentiation between these cohorts, and use this to guide the selection of a subset of the control cohort as well.

      We agree and have replicated this analysis using only individuals of European descent; our conclusions have not changed but the P-values have become lower (lines 435 - 446).
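
      The reviewer's concern about confounding is easiest to see in the simplest form of a gene-level burden test. This form collapses each individual to carrier or non-carrier of any qualifying rare variant and then compares carrier frequencies between cases and controls. The sketch below assumes that form and a one-sided Fisher's exact test. It is not necessarily the test used in the manuscript, and all names and counts are hypothetical. Because it only asks whether cases carry more variants, it cannot detect variants associated with increased fertility, which is why the reviewer suggests methods that allow effects in both directions (for example, variance-component tests such as SKAT). Any ancestry restriction or matching, such as keeping only individuals of European descent or selecting controls by principal components, has to be applied before the counts are taken.

```python
from math import comb


def carrier_counts(genotypes):
    """Return (carriers, non_carriers) for one gene.

    genotypes: one list per individual of alt-allele counts (0, 1 or 2)
    at the gene's qualifying rare variants.
    """
    carriers = sum(1 for g in genotypes if any(x > 0 for x in g))
    return carriers, len(genotypes) - carriers


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls; columns: carriers, non-carriers.
    Returns P(case carriers >= a) given the fixed table margins.
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    upper = min(n_cases, n_carriers)
    hits = sum(
        comb(n_cases, i) * comb(n_controls, n_carriers - i)
        for i in range(a, upper + 1)
    )
    return hits / comb(n_cases + n_controls, n_carriers)


# Hypothetical genotypes for one gene (rows = individuals, columns = variants).
cases = [[0, 1], [0, 0], [2, 0], [0, 0]]
controls = [[0, 0], [0, 0], [0, 0], [1, 0]]
a, b = carrier_counts(cases)
c, d = carrier_counts(controls)
print(a, b, c, d, fisher_one_sided(a, b, c, d))
```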

      (3) Citations should also be included in Table 1, for each relevant phenotype. You may also want to consider a more general comparison of p-values and effect sizes of genome-wide association studies for human male infertility to test for an enrichment in/nearby genes showing relaxed selection along the gorilla lineage. In other words, do the relaxed genes in the gorilla lineage have an enrichment of small p-values for being associated with male infertility.

      Citations have been included in Table 1, as suggested, and the table has been updated to include the latest reported phenotypes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study presents an interesting investigation into the role of trained immunity in inflammatory bowel disease, demonstrating that β-glucan-induced reprogramming of innate immune cells can ameliorate experimental colitis. The findings are novel and clinically relevant, with potential implications for therapeutic strategies in IBD. The combination of functional assays, adoptive transfer experiments, and single-cell RNA sequencing provides comprehensive mechanistic insights. However, some aspects of the study could benefit from further clarification to strengthen the conclusions.

      We are grateful for the reviewer’s positive assessment of our study and constructive suggestions to improve the manuscript.

      Strengths:

      (1) This study elegantly connects trained immunity with IBD, demonstrating how β-glucan-induced innate immune reprogramming can mitigate chronic inflammation.

      (2) Adoptive transfer experiments robustly confirm the protective role of monocytes/macrophages in colitis resolution.

      (3) Single-cell RNA sequencing provides mechanistic depth, revealing the expansion of reparative Cx3cr1⁺ macrophages and their contribution to epithelial repair.

      (4) The work highlights the therapeutic potential of trained immunity in restoring gut homeostasis, offering new directions for IBD treatment.

      Weaknesses:

      While β-glucan may exert its training effect on hematopoietic stem cells, performing ATAC-seq on HSCs or monocytes to profile chromatin accessibility at antibacterial defense and mucosal repair-related genes would further validate the trained immunity mechanism. Alternatively, the authors could acknowledge this as a study limitation and future research direction.

      We appreciate your comment on assessing the chromatin accessibility of HSCs induced by β-glucan training, as epigenetic reprogramming is known to be one of the mechanisms underlying trained immunity, as suggested by many groups, including ours. To delineate the genome-wide epigenetic reprogramming induced by β-glucan (BG), we reanalyzed publicly available chromatin-profiling datasets in which ATAC-seq was performed on HSCs from control and β-glucan-trained mice (accession number: CRA014389). Comparative analysis revealed that HSCs from BG-trained mice showed pronounced enrichment at promoters and distal intergenic regions, which are key regulatory loci governing transcriptional activity (Fig. S7A). This divergent genomic targeting was further corroborated by distinct signal distribution profiles (Fig. S7B), supporting a pronounced, upregulation-driven remodeling of the epigenomic landscape by BG treatment. Functional annotation of these epigenetically primed promoters by GO term analysis revealed significant enrichment of immune-relevant processes, including leukocyte migration, cell-cell adhesion, and chemotaxis (Fig. S7C). Consistently, KEGG pathway analysis highlighted the enrichment of signaling cascades such as chemokine signaling and cell adhesion molecules (Fig. S7D), reinforcing the involvement of BG-induced trained immunity in inflammatory and mucosal homing pathways.

      Furthermore, promoter-centric enrichment of terms related to “defense response to bacterium” (Fig. S7E) underscored the role of BG in priming antibacterial transcriptional programs, a crucial axis for maintaining intestinal homeostasis. Locus-specific examination of chromatin states further validated BG-induced epigenetic modifications in the upstream regions of selected target genes, including Gbp5, Gbp2, S100a8 and Nos2 (Fig. S7F). Collectively, our integrative reanalysis demonstrates that BG reshapes the epigenomic architecture at regulatory elements, thereby orchestrating immune gene expression programs directly relevant to IBD pathophysiology and mucosal immunity (Lines 201-211).
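
      The promoter versus distal breakdown in Fig. S7A comes from assigning each accessible peak to a genomic category by its position relative to annotated transcription start sites (TSSs). A minimal sketch of that assignment follows. It assumes that a peak counts as promoter-proximal when its midpoint lies within ±3 kb of a TSS, a window commonly used by peak-annotation tools; the reanalysis's actual settings are not stated here. It also collapses every other peak into a single non-promoter class, which simplifies the finer categories (exon, intron, distal intergenic) that annotation tools report. All coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Assumed promoter window: peak midpoint within +/-3 kb of a TSS (hypothetical choice).
PROMOTER_WINDOW_BP = 3000


def build_tss_index(tss_sites):
    """Group TSS positions by chromosome and sort them for binary search.

    tss_sites: iterable of (chrom, position) pairs.
    """
    index = {}
    for chrom, pos in tss_sites:
        index.setdefault(chrom, []).append(pos)
    for positions in index.values():
        positions.sort()
    return index


def nearest_tss_distance(index, chrom, pos):
    """Distance (bp) from pos to the closest TSS on chrom, or None if there is none."""
    positions = index.get(chrom)
    if not positions:
        return None
    i = bisect_left(positions, pos)
    # The closest TSS is either just before or just after the insertion point.
    neighbours = positions[max(i - 1, 0): i + 1]
    return min(abs(pos - p) for p in neighbours)


def classify_peaks(peaks, index, window=PROMOTER_WINDOW_BP):
    """Count peaks as 'promoter' or 'non-promoter' using each peak's midpoint.

    peaks: iterable of (chrom, start, end) tuples, e.g. parsed from a BED file.
    """
    counts = Counter()
    for chrom, start, end in peaks:
        d = nearest_tss_distance(index, chrom, (start + end) // 2)
        counts["promoter" if d is not None and d <= window else "non-promoter"] += 1
    return counts


# Hypothetical toy coordinates, for illustration only.
tss_index = build_tss_index([("chr1", 10_000), ("chr1", 50_000), ("chr2", 2_000)])
toy_peaks = [("chr1", 9_000, 9_500), ("chr1", 30_000, 30_400),
             ("chr2", 4_500, 4_900), ("chr3", 100, 200)]
print(classify_peaks(toy_peaks, tss_index))
```

      Running this on peak sets from control and BG-trained HSCs would produce a simplified version of the genomic-distribution comparison summarized in Fig. S7A.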

      Reviewer 1 (Recommendations for the authors):

      (1) It would be helpful to include a schematic summarizing the proposed mechanism for reader clarity.

      We appreciate this suggestion and have prepared a graphical abstract, shown in Author response image 1.

      Author response image 1.

      (2) Discuss potential off-target effects of β-glucan-induced trained immunity (e.g., risk of exacerbated inflammation in other contexts).

      We appreciate this important comment regarding potential off-target effects or side effects of β-glucan-induced trained immunity. Because trained immunity is known to augment inflammatory responses upon heterologous stimulation and has been implicated in chronic inflammation-prone conditions such as atherosclerosis, this is an important consideration. Previous in vivo studies have shown that β-glucan pretreatment can enhance antibacterial or antitumor responses without inducing basal inflammation after one week of administration (PMID: 22901542, PMID: 30380404, PMID: 36604547, PMID: 33125892). Nevertheless, it remains possible that β-glucan-induced trained immunity could have unintended effects in certain contexts, which warrants further investigation and caution. We have discussed this potential caveat in the Discussion (Lines 299-302).

      Reviewer #2 (Public review):

      Summary:

      The study investigates whether β-glucan (BG) can reprogram the innate immune system to protect against intestinal inflammation. The authors show that mice pretreated with BG prior to DSS-induced colitis experience reduced colitis severity, including less weight loss, colon damage, improved gut repair, and lowered inflammation. These effects were independent of adaptive immunity and were linked to changes in monocyte function.

      The authors show that the BG-trained monocytes not only help control inflammation but confer non-specific protection against experimental infections (Salmonella), suggesting the involvement of trained immunity (TI) mechanisms. Using single-cell RNA sequencing, they map the transcriptional changes in these cells and show enhanced differentiation of monocytes into reparative CX3CR1⁺ macrophages. Importantly, these protective effects were transferable to other mice via adoptive cell transfer and bone marrow transplantation, suggesting that the innate immune system had been reprogrammed at the level of stem/progenitor cells.

      Overall, this study provides evidence that TI, often associated with heightened inflammatory programs, can also promote tissue repair and resolution of inflammation. Moreover, this BG-induced functional reprogramming can be further harnessed to treat chronic inflammatory disorders like IBD.

      Strengths:

      (1) The authors use advanced experimental approaches to explore the potential therapeutic use of myeloid reprogramming by β-glucan in IBD.

      (2) The authors follow a data-to-function approach, integrating bulk and single-cell RNA sequencing with in vivo functional validation to support their conclusions.

      (3) The study adds to the growing evidence that TI is not a singular pro-inflammatory program, but can adopt distinct functional states, including anti-inflammatory and reparative phenotypes, depending on the context.

      We are grateful for your positive assessment of our study and recognition of its translational implications. We particularly appreciate the acknowledgment that our work expands the therapeutic potential of β-glucan–mediated trained immunity in ameliorating colitis.

      Weaknesses:

      (1) The epigenetic and metabolic basis of TI is not explored, which weakens the mechanistic claim of TI. This is especially relevant given that a novel reparative, anti-inflammatory TI program is proposed.

      We appreciate your valuable comment highlighting the importance of the epigenetic and metabolic basis of TI in providing mechanistic insight. Previous studies, including work from our group (S.-C. Cheng), have extensively characterized the epigenetic and metabolic signatures of monocytes from BG-trained mice, primarily in the context of inflammatory genes. We acknowledge that these aspects are not directly addressed in the current manuscript, which aimed to build on the foundation of β-glucan-induced trained immunity established by many groups, including ours, and to evaluate its potential as a therapeutic approach in colitis.

      That said, we fully agree that the epigenetic profile of key pathways should be analyzed. In line with our response to the similar question raised by Reviewer 1, we reanalyzed relevant public datasets and summarize the findings in Supplementary Figure S7. This ATAC-seq analysis provides an epigenetic basis for the enhanced inflammatory and antibacterial capacity of monocytes, which is imprinted at the level of the HSC compartment.

      (2) The absence of a BG-only group limits interpretation of the results. Since the authors report tissue-level effects such as enhanced mucosal repair and transcriptional shifts in intestinal macrophages (colonic RNA-Seq), it is important to rule out whether BG alone could influence the gut independently of DSS-induced inflammation. Without a BG-only control, it is hard to distinguish a true trained response from a potential modulation caused directly by BG.

      We thank the reviewer for this important suggestion. Although we did not perform qPCR for mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a BG-only control group (Colitis_d0). These results indicate that BG preconditioning alone does not alter baseline expression of colon mucosal repair genes, supporting the conclusion that the observed effects occur in the context of DSS-induced inflammation.

      (3) Although monocyte transfer experiments show protection in colitis, the fate of the transferred cells is not described (e.g., homing or differentiation into Cx3cr1⁺ macrophage subsets). This weakens the link between specific monocyte subsets and the observed phenotype.

      We thank the reviewer for this important point. We acknowledge that direct in vivo tracking of the adoptively transferred monocytes to confirm their homing to the colon and differentiation into specific macrophage subsets would strengthen the mechanistic link. However, due to technical limitations in reliably tracing the fate of transferred cells in our experimental setting, we were unable to provide this direct evidence. Instead, we present strong correlative and functional evidence that supports the proposed model:

      (a) Following BG pretreatment, we observed a significant decrease in circulating Ly6Chi monocytes specifically at the peak of colitis (day 7, Fig. 5D), concurrent with a marked increase in monocytes/macrophages within the colonic lamina propria (Fig. 2D). This inverse relationship strongly suggests enhanced recruitment of monocytes from the blood into the inflamed colon upon BG training.

      (b) Using CX3CR1-GFP reporter mice, we found that BG pretreatment led to an increased proportion of colonic myeloid cells in an intermediate state (P5: Ly6C⁺MHCII⁺CX3CR1⁺, Fig. 5F). This population represents monocytes actively undergoing differentiation into intestinal macrophages, supporting the idea that BG accelerates the monocyte-to-macrophage transition in situ.

      (c) Our scRNA-seq analysis independently revealed an expansion of monocyte-derived macrophage clusters (e.g., Macro1, Macro2) in BG-treated mice, which express canonical tissue macrophage markers (including Cx3cr1) and genes associated with tissue repair (e.g., Vegfa, Fig. 4A, 5H, 5I).

      These data collectively indicate that BG-trained monocytes exhibit enhanced capacity for colonic recruitment and preferential differentiation toward reparative macrophage subsets, which aligns with the protective phenotype observed after adoptive transfer. We have explicitly noted the absence of direct fate-mapping data as a limitation in the revised Discussion and agree that future studies employing advanced tracing techniques would be valuable to definitively establish this cellular trajectory (Lines 378-380).

      (4) While scRNA-seq reveals distinct monocyte/macrophage subclusters (Mono1-3), their specific functional roles remain speculative. The authors assign reparative or antimicrobial functions based on transcriptional signatures, but do not perform causal experiments (depletion or in vitro assays). The biological roles of these cells remain correlative.

      We agree that the functional role of CX3CR1⁺ macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1⁺ macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived; together, these results suggest that β-glucan pretreatment alters monocyte capacity in a way that directly contributes to the enhanced alleviation of colitis. However, because we could not identify a cluster-specific marker, which is currently the biggest caveat of scRNA-seq-defined subclusters, we were unable to obtain direct causal evidence by specifically depleting these cells. Nevertheless, the results of the monocyte adoptive transfer experiment with Ccr2 KO mice strongly suggest that monocytes are crucial for this protective effect. We fully acknowledge this limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1⁺ macrophage function are based mainly on transcriptional profiling and association with protective phenotypes, rather than on direct causal evidence (Lines 400-404).

      (5) While Rag1⁻/⁻ mice were used to rule out adaptive immunity, the potential role of innate lymphoid cells (ILCs), particularly ILC2s and ILC3s, which are known to promote mucosal repair (PMID: 27484190), was not explored. Given the reparative phenotype observed, the contribution of ILCs remains a confounding factor.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, the contribution of ILCs was not evaluated in our current examination of the BG-trained immunity effect. Because adoptive transfer of trained monocytes into CCR2 KO mice recapitulated the colitis-alleviating phenotype, we think that β-glucan-enhanced protection depends, at least in part, on trained monocytes. Nevertheless, we acknowledge that we cannot rule out a role for ILCs in this process, and we discuss this limitation in the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we make it clear in the revision that ILC contributions cannot be excluded (Lines 322-326).

      Reviewer 2 (Recommendations for the authors):

      (1) The authors do not provide direct mechanistic evidence of TI (e.g., epigenetic and metabolic reprogramming). The absence of such data weakens the mechanistic strength of the TI claim. The authors should soften the terminology to BG-induced myeloid reprogramming suggestive of trained immunity, and acknowledge and discuss this limitation.

      We appreciate your comment highlighting the lack of direct epigenetic and metabolic assessment in our current study. Previous work from our group (S.-C. Cheng) and others has extensively documented the epigenetic and metabolic profiles of monocytes from β-glucan–trained mice, focusing primarily on inflammatory-related genes. Based on this established foundation, our current manuscript focuses on exploring the translational potential of BG-induced trained immunity.

      That said, as mentioned in our response to the identified weakness, we reanalyzed public epigenetic datasets with a focus on pathways related to reparative and antibacterial functions and integrated the results into the revised manuscript (Fig. S7, Lines 201-211).

      (2) CX3CR1⁺ macrophages' role is not functionally validated. The data relies solely on scRNA-seq and cluster annotations, which are insufficient to confirm functional roles in vivo. Depletion or in vitro studies would provide stronger causal evidence. The authors should acknowledge this limitation in the Discussion.

      We agree that the functional role of CX3CR1⁺ macrophages is not comprehensively validated and is currently inferred from scRNA-seq clustering. Our flow cytometry data show increased CX3CR1⁺ macrophages in the BG-TI group, and our CCR2 KO and monocyte adoptive transfer experiments indicate that these macrophages are monocyte-derived; together, these results suggest that β-glucan pretreatment alters monocyte capacity in a way that directly contributes to the enhanced alleviation of colitis. However, because we could not identify a cluster-specific marker, which is currently the biggest caveat of scRNA-seq-defined subclusters, we were unable to provide direct causal evidence. We fully acknowledge this limitation of the current study and clarify in the Discussion that our conclusions regarding CX3CR1⁺ macrophage function are based mainly on transcriptional profiling and association with protective phenotypes, rather than on direct causal evidence (Lines 395-404).

      (3) Rag1⁻/⁻ mice retain innate lymphoid cells (ILCs), particularly ILC3, which are mucosal and produce IL-22, contributing to tissue repair (PMID: 21502992; PMID: 32187516). The potential for BG to activate ILCs remains unexplored in this study. This limits the interpretation of whether the observed protection arises from monocyte/macrophage reprogramming or is partially mediated by residual ILC activity. The authors should explicitly acknowledge this limitation and discuss the possible contribution of ILCs to the observed phenotype.

      We appreciate your valuable comment regarding the potential role of ILCs in the observed mucosal repair. Indeed, the contribution of ILCs was not evaluated in our current examination of the BG-trained immunity effect. Because adoptive transfer of trained monocytes into CCR2 KO mice recapitulated the colitis-alleviating phenotype, we think that β-glucan-enhanced protection depends, at least in part, on trained monocytes. Nevertheless, we acknowledge that we cannot rule out a role for ILCs in this process, and we discuss this limitation in the revised manuscript.

      The literature (PMID: 21502992; PMID: 32187516) supports a role for ILC3-mediated IL-22 production in tissue repair, which could overlap with our observed effects. However, our monocyte adoptive transfer experiments show that monocytes alone can alleviate DSS-induced colitis, suggesting a dominant role for monocytes in this context. Nonetheless, we make it clear in the revision that ILC contributions cannot be excluded (Lines 322-327).

      (4) Figure 1: It would help to clarify whether a BG-only control group (without DSS) was included in the design. This would be critical to determine if BG alone alters the colon. If omitted, the authors should clearly state this and consider adding such a group in future experiments. This would help define the baseline effects of BG and support the claim that its benefits are dependent on TI (upon the second challenge, DSS).

      We appreciate this valuable suggestion. While we did not perform qPCR to assess mucosal repair genes in Figure S1C and Figure S1D, our colon RNA-seq analysis in Figure 5G included a dedicated BG-only control group at baseline, before DSS treatment (Colitis_d0). These data indicate that BG preconditioning alone does not alter the baseline expression of colon mucosal repair genes.

      (5) Figure 3 - It would strengthen the conclusions to include a vehicle-treated PBS BMT donor control group, or to state its absence. It is unclear whether the protective effect observed in recipients of BG-treated BM is due to trained immunity or to non-specific effects of transplantation, irradiation, or batch variation.

      We fully agree that it is critical to include a vehicle-treated PBS BMT control to rule out non-specific effects of transplantation, irradiation or batch variation. In every experiment, after irradiation we included a blank control in which mice received PBS only (no bone marrow), to confirm that irradiation had successfully ablated the recipients' bone marrow. Mice receiving PBS only died after 8 days, whereas mice receiving bone marrow from either the PBS-control or the BG-treated group survived. We also performed flow cytometry to confirm successful bone marrow transplantation (Fig. S5C). We have added a description of the vehicle-treated BMT control to the Materials and Methods section for clarification (Lines 456-466).

      (6) No gene expression or phenotypic data is provided for monocytes/macrophages in BMT recipients; therefore, it cannot be confidently stated that these cells were reprogrammed. Expression/phenotypic data should be added or discussed.

      We thank the reviewer for raising this important point. We acknowledge that a detailed transcriptomic or phenotypic analysis of donor-derived tissue-resident myeloid cells in the BMT recipients would provide the most direct evidence for their reprogrammed state.

      While our BMT study focused primarily on assessing the transferability of the protective phenotype via endpoint disease parameters and circulating immune cell composition, we present a coherent and compelling line of evidence supporting the conclusion that BG's training effect is maintained within the hematopoietic system of recipients and mediated by reprogrammed myeloid cells:

      (a) A key finding is the significant increase in the proportion of donor-derived Ly6Chi monocytes in the peripheral blood of recipients receiving BG-trained bone marrow (Fig. 3J). This is not a bystander effect but direct evidence that BG-induced training of donor hematopoietic stem/progenitor cells instructs a biased differentiation program towards a specific effector precursor population within the new host, demonstrating the functional persistence of the trained state post-transplantation.

      (b) The core of reprogramming in trained immunity lies in persistent epigenetic and functional changes. Our new analysis of public datasets (Fig. S7) confirms that BG directly reshapes the chromatin accessibility landscape in hematopoietic stem cells (HSCs), particularly at loci regulating immune and antibacterial responses. This provides the fundamental mechanism explaining how the trained phenotype is both long-lasting and transplantable: the reprogramming occurs at the progenitor level.

      (c) The most causally compelling data in our study comes from the independent adoptive transfer experiment, where transfer of purified BG-trained monocytes alone was sufficient to ameliorate colitis in recipient mice (Fig. 3K, L). This definitively proves that the trained monocytes themselves carry the protective functional program. It strongly suggests that these reprogrammed monocytes/macrophages are the likely effectors mediating protection in the BMT model.

      (d) Our interpretation aligns with well-established paradigms in the field. Previous studies confirm that the BG-trained phenotype (e.g., enhanced cytokine potential) can be transferred via BMT or monocyte adoptive transfer. For instance, Haacke et al. (PMID: 40020679) demonstrated that splenic monocytes from BG-trained donors, when transferred into arthritic recipient mice, led to elevated inflammatory cytokine (e.g., Tnf, Il6) expression in recipient joints, directly proving the maintained functional reprogramming of trained cells in a heterologous host environment. This provides a strong precedent supporting the functional activity of transferred trained cells in our model.

      (7) The study is consistent with emerging evidence that distinct TI programs may exist depending on the stimulus and context, including immunoregulatory and tissue-reparative responses (PMID: 35133977; PMID: 31732931; PMID: 32716363; PMID: 30555483). The authors should integrate this perspective into the Discussion to acknowledge that their findings may represent one example of such context-dependent, potentially reparative TI programs. This would place the study within the growing literature describing functional heterogeneity in innate immune training.

      We appreciate this suggestion and have incorporated it into the Discussion. In the revised manuscript, we discuss how our findings of BG-induced protective myeloid reprogramming align with the concept of tissue-reparative or immunoregulatory TI, which is distinct from the pro-inflammatory TI phenotypes described in other contexts. By highlighting the functional heterogeneity of innate immune training, we position our work as an example of a stimulus-specific, reparative TI program (Lines 356-379).

      Reviewer #3 (Public review):

      Summary:

      In the present work, Yinyin Lv et al. offer evidence for the therapeutic potential of trained immunity in the context of inflammatory bowel disease (IBD). Prior research has demonstrated that innate cells pre-treated (trained) with β-glucan show an enhanced pro-inflammatory response upon a second challenge.

      While an increased immune response can be beneficial and protect against bacterial infections, there is also the risk that it will worsen symptoms in various inflammatory disorders. In the present study, the authors show that mice preconditioned with β-glucan have enhanced resistance to Staphylococcus aureus infection, indicating heightened immune responses.

      The authors demonstrate that β-glucan training of bone marrow hematopoietic progenitors and peripheral monocytes mitigates the pro-inflammatory effects of colitis, with protection extending to naïve recipients of the trained cells.

      Using a dextran sulfate sodium (DSS)-induced model of colitis, β-glucan pre-treatment significantly dampens disease severity. Importantly, the use of Rag1⁻/⁻ mice, which lack adaptive immune cells, confirms that the protective effects of β-glucan are mediated by innate immune mechanisms. Further, experiments using Ccr2⁻/⁻ mice underline the necessity of monocyte recruitment in mediating this protection, highlighting CCR2 as a key factor in the mobilization of β-glucan-trained monocytes to inflamed tissues. Transcriptomic profiling reveals that β-glucan training upregulates genes associated with pattern recognition, antimicrobial defense, immunomodulation, and interferon signaling pathways, suggesting broad functional reprogramming of the innate immune compartment. In addition, β-glucan training induces a distinct monocyte subpopulation with enhanced activation and phagocytic capacity. These monocytes exhibit an increased ability to infiltrate inflamed colonic tissue and differentiate into macrophages, marked by increased expression of Cx3cr1. Moreover, among these trained monocyte and macrophage subsets, other gene expression signatures are associated with tissue and mucosal repair, suggesting a role in promoting resolution and regeneration following inflammatory insult.

      Strengths:

      (1) Overall, the authors present a mechanistically insightful investigation that advances our understanding of trained immunity in IBD.

      (2) By employing a range of well-characterized murine models, the authors investigate specific mechanisms involved in the effects of β-glucan training.

      (3) Furthermore, the study provides functional evidence that the protection conferred by the trained cells persists within the hematopoietic progenitors and can be transferred to naïve recipients. The integration of transcriptomic profiling allows the identification of changes in key genes and molecular pathways underlying the trained immune phenotype.

      (4) This is an important study that demonstrates that β-glucan-trained innate cells confer protection against colitis and promote mucosal repair, and these findings underscore the potential of harnessing innate immune memory as a therapeutic approach for chronic inflammatory diseases.

      Thank you for the positive evaluation and constructive feedback on our manuscript.

      Weaknesses:

      However, FPKM is not ideal for between-sample comparisons due to its within-sample normalization approach. Best practices recommend using raw counts (with DESeq2) for more robust statistical inference.

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we have in fact used raw count data as input for DESeq2. FPKM values were only used for visualization purposes, such as in heatmaps and clustering analyses. We correct this description in the revised manuscript to accurately reflect our analysis workflow. (Lines 488-499)
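
      The distinction the reviewer draws can be illustrated in a few lines. FPKM rescales each sample by its own library size and by gene length. DESeq2, by contrast, expects un-normalized counts, estimates per-sample size factors itself (by default with the median-of-ratios method), and fits a negative binomial model whose variance depends on the count scale. The sketch below reimplements only those two normalizations in plain Python for illustration. It is not DESeq2's code, omits its dispersion and testing steps, and uses hypothetical names and counts.

```python
import math
from statistics import median


def fpkm(count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene per million mapped fragments (one sample)."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)


def size_factors(counts):
    """Median-of-ratios size factors, in the style of DESeq2's default normalization.

    counts: dict mapping gene -> list of raw counts, one entry per sample.
    Genes with a zero count in any sample are skipped (their geometric mean is 0).
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    return [median(r) for r in ratios]


# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1.
example_counts = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 7]}
print(size_factors(example_counts))
print(fpkm(500, 2_000, 20_000_000))
```

      In practice the raw count matrix goes to DESeq2 unchanged, and FPKM is reserved for heatmaps and clustering, which matches the workflow described above.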

      Reviewer 3 (Recommendations for the authors):

      (1) Current best practices recommend working with raw count data when using DESeq2 to ensure statistically robust differential expression analysis between samples. However, for visualization and clustering, like heatmaps, FPKMs can be used. Could the authors explain why they have used FPKM for differential gene expression analysis?

      We appreciate the reminder about best practices for RNA-seq analysis. We apologize for the inaccurate description in the Materials and Methods section. For all differential expression analyses, we have in fact used raw count data as input for DESeq2. FPKM values were only used for visualization purposes, such as in heatmaps and clustering analyses. We correct this description in the revised manuscript to accurately reflect our analysis workflow. (Lines 488-499)

      Minor Comment

      (1) Line 92: remove extra word "that".

      We remove the extra word “that” from Line 92 in the revised manuscript.

      (2) Line 201: please state here what "GBP" stands for, as it appears first.

      We define “GBP” as “Guanylate-Binding Protein” at its first appearance (Line 201 in the original submission; Line 213 in the revised manuscript).

      (3) Line 235: consider rewriting "we analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid"; added spacing for "day 7", "which", and "the".

      We revise the sentence in Line 235 to read: “We analyzed the day 7 RNA-seq data, which revealed significant enrichment of the myeloid…” to improve readability (Lines 246-247).

      (4) Line 290: consider rewriting " as seen in conditions such as rheumatoid arthritis and ...".

      We revise Line 290 to: “as observed in conditions such as rheumatoid arthritis and…” for clarity. (Lines 301-302)

      (5) Lines 375-376: please check the sentence starting with a lowercase letter, "with minor modifications, by assessing".

      We correct the sentence to start with a capital letter: “With minor modifications, by assessing…” (Lines 422-423)

      (6) Line 399: kindly consider adding "was" after "cDNA".

      We revise Line 399 to include “was” as suggested: “cDNA was synthesized…” (Line 446)

      (7) Lines 346-347: consider adding "which" after "monocytes": "We transferred BG-preconditioned monocytes which significantly alleviated clinical symptoms".

      We revise Lines 346-347 to include “which” as suggested for grammatical clarity. (Lines 385-386)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The worm-like chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al. (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the force-extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), the stall duration depends on neither the precise value of the extension nor the precise value of the force at stall.
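      For readers who want to reproduce a predicted curve like the one in Fig. 1B, the worm-like chain force-extension relation can be evaluated with the Marko-Siggia interpolation formula. The following is a minimal Python sketch, not the authors' code. It assumes kT = 4.1 pN·nm and the 50 nm persistence length cited above, and it uses a hypothetical contour length of 1020 nm (roughly 3 kbp of dsDNA), chosen only because it places 6 pN near the ~960 nm stall extension quoted in the Supplementary Methods excerpt later in this response.

```python
import math

KT = 4.1        # thermal energy, pN*nm (approximately room temperature)
P = 50.0        # persistence length, nm (consensus value cited in the text)
L_C = 1020.0    # contour length, nm -- HYPOTHETICAL (~3 kbp dsDNA)


def wlc_force(x, p=P, lc=L_C, kt=KT):
    """Marko-Siggia interpolation: force (pN) at extension x (nm)."""
    r = x / lc
    if not 0.0 <= r < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (kt / p) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)


def wlc_extension(force, p=P, lc=L_C, kt=KT, tol=1e-9):
    """Invert wlc_force by bisection (force increases monotonically with x)."""
    lo, hi = 0.0, lc * (1.0 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, p, lc, kt) < force:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_stall = wlc_extension(6.0)
print(f"extension at 6 pN: {x_stall:.0f} nm")
```

      Because the force diverges only as the extension approaches the contour length, the stall extension is set mainly by the assumed contour length; the persistence length shifts it by a few percent.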

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not sure that the present experiments even induce a catch-bond (in the sense reported in earlier papers).

      It is true that Kunwar et al. measured binding durations at super-stall loads and used them to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this point. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not super-stall forces either.

      (3) I appreciate the concerns about the vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the vertical force only a problem for kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in such a way that it pulls the kinesin motor UPWARDS (a vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We softened this term in our revision to “nearly parallel to the microtubule” (Line 464). In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.
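      The <1% figure follows from simple geometry: for a straight, taut tether, the ratio of the vertical to the parallel force component equals the ratio of the height offset between the two attachment points to their separation along the microtubule. The sketch below assumes a straight tether and uses a hypothetical 5 nm height offset (not a value taken from the Supplementary Methods); at a ~960 nm separation, the offset would have to exceed ~9.6 nm before the vertical component reached 1% of the parallel one.

```python
import math


def force_components(total_force, horizontal_sep, height_offset):
    """Split the tension in a straight tether into parallel and vertical parts.

    total_force    -- tension along the tether (pN)
    horizontal_sep -- distance between anchor points along the microtubule (nm)
    height_offset  -- height difference between the anchor points (nm)
    """
    length = math.hypot(horizontal_sep, height_offset)
    f_par = total_force * horizontal_sep / length
    f_vert = total_force * height_offset / length
    return f_par, f_vert


# HYPOTHETICAL 5 nm height offset; ~960 nm separation at a ~6 pN stall
f_par, f_vert = force_components(6.0, 960.0, 5.0)
print(f"vertical/parallel = {f_vert / f_par:.4f}")
```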

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not consistently mean a catch bond in the classical sense (i.e., a protein-protein interaction whose dissociation rate decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociates from a tubulin protein), there is a slip state during which the reattachment rate is higher than for a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the terms used (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interactions is not readily applicable to motors with slippage.

      We acknowledge that our treatment of kinesin-3 was confusing. In response, we deleted any reference to kinesin-3 catch-bond in the Results section, and restricted it to the Discussion where it is interpretation. In Line 635 in the Discussion, we softened the statement of catch-bond activity to “…all three dominant kinesin transport families display catch-bond like behavior at stall…”. We acknowledge that, classically, the catch/slip bond nomenclature refers to simple protein-protein interactions and is easier to interpret there. However, the term ‘catch-bond’ has been used in the literature for myosin, dynein and kinesin, and thus we feel that it is sufficiently established to use it here.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution, and we calculated a corrected kinesin-3 stall duration that accounts for these undetected slips. These data and this analysis are included as a new Supplementary Figure S8. In the main text, on Lines 283-293, we included the following text:

      “It was notable that the kinesin-3 stall durations at high load are longer than the ramp durations at low load, because this indicates that the kinesin-3 off-rate slows with increasing load. However, because kinesin-3 had the most slip events at stall, we were concerned that there may be undetected slip events below the 60 nm threshold of detection that led to an overestimation of the kinesin-3 stall duration. To test this hypothesis, we plotted the distribution of kinesin-3 slip distances at stall, fit an exponential, and calculated the fraction of missed slip events (Fig. S8). From this analysis, we calculated a correction factor of 1.42 that brought the kinesin-3 stall duration down to 1.33 s. Notably, this stall duration value is still well above the kinesin-3 ramp duration value of 0.75 s in Fig. 3C and thus does not qualitatively change our conclusions.”

      We thank the reviewer for this suggestion.
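      One way to estimate a correction of this kind, sketched here under stated assumptions and not necessarily the procedure used for Fig. S8, is to fit an exponential to the slips detected above the 60 nm threshold, compute the fraction expected below it, and assume that every slip (detected or not) terminates a stall. The function names and the example slip distances below are hypothetical.

```python
import math


def fit_truncated_exponential(slips_nm, threshold_nm=60.0):
    """MLE of the mean slip distance from slips detected above a threshold.

    The exponential is memoryless, so the excess (distance - threshold) of
    the detected slips is exponential with the same mean; its MLE is the
    mean excess.
    """
    excess = [d - threshold_nm for d in slips_nm if d >= threshold_nm]
    if not excess:
        raise ValueError("no slips above threshold")
    return sum(excess) / len(excess)


def missed_fraction(mean_nm, threshold_nm=60.0):
    """Fraction of slips expected to fall below the detection threshold."""
    return 1.0 - math.exp(-threshold_nm / mean_nm)


def corrected_duration(observed_s, mean_nm, threshold_nm=60.0):
    """Shorten an observed stall duration to account for undetected slips.

    ASSUMPTION: every slip ends a stall, so the true termination rate is the
    observed rate divided by the detected fraction.
    """
    detected = 1.0 - missed_fraction(mean_nm, threshold_nm)
    return observed_s * detected


# HYPOTHETICAL slip distances (nm), for illustration only
example_slips = [75.0, 120.0, 64.0, 310.0, 180.0, 95.0]
mean_slip = fit_truncated_exponential(example_slips)
print(f"mean slip {mean_slip:.0f} nm, missed fraction {missed_fraction(mean_slip):.2f}")
```

      Under this sketch, a correction factor of 1.42 (the inverse of the detected fraction) would correspond to a mean slip distance of 60/ln(1.42) ≈ 171 nm; that number is derived from the formula, not reported in the manuscript.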

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration, it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point. In addition, what we do in Fig. 3 is to separate out the ramps from the stalls and, using a statistical model, compute a separate duration parameter (the inverse of the off-rate) for the ramp and the stall. What we find is that the relationship between ramp, stall, and unloaded durations is different for the three motors, which is interesting in itself.
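      The history-independence argument can be checked numerically: if detachment occurs at a constant rate, waiting times are exponential, and the mean time remaining after any start point t0 equals the overall mean. The sketch below uses Python's standard random module with an arbitrary, hypothetical mean of 3 s; it illustrates the memoryless property only and is not the authors' statistical model.

```python
import random


def residual_mean(tau, t0, n=200_000, seed=1):
    """Mean remaining duration after t0, over durations that exceed t0."""
    rng = random.Random(seed)
    durations = [rng.expovariate(1.0 / tau) for _ in range(n)]
    residuals = [t - t0 for t in durations if t > t0]
    return sum(residuals) / len(residuals)


# With a constant off-rate, the clock can start anywhere: the mean
# remaining time after t0 is (up to sampling noise) the same mean tau.
print(residual_mean(tau=3.0, t0=0.0))
print(residual_mean(tau=3.0, t0=2.0))
```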

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We revised this sentence to the following: “In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to continue generating force after a small rearward displacement, rather than fully detaching and ‘resetting’ to zero load.” (Line 339-342)

      It should be noted that, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, the detachment and reattachment we measure with our DNA tensiometer may not carry over directly to a vesicle in a cell, where the geometry is different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al. (Toleikis et al., 2020) and Sudhakar et al. (Sudhakar et al., 2021). In particular, Sudhakar et al. used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al., who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in their Fig. S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after a slip a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. To address this point, we added in the Discussion on lines 654-656:

      “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      To address the question of neutravidin acting as a roadblock, we did the following. Because of the sequence of injections used to assemble the tensiometer in the flow cell, there are often some residual GFP-kinesin motors that aren’t attached to DNA and thus serve as internal controls for unloaded motility on the neutravidin-functionalized Mt. We quantified the run durations of these free kinesin-GFP and found that their run duration was 0.92 s (95% CI: 0.79 to 1.04 by MEMLET). This is slightly lower but not statistically different from the 1.04 s [0.78, 1.31] on control microtubules in Fig 2A. This result is included in Figure S6 in the revised manuscript.

      We don’t have a precise estimate for the amount of neutravidin on the microtubules. Based on Fig. 3C of Korten and Diez (Korten and Diez, 2008), the reduction in the unloaded run duration that we see corresponds to a ~2% biotinylation ratio. We polymerize Mt with 10% biotinylated tubulin and add 8 nM neutravidin to the flow cell, so in principle the microtubules could be 10% biotin-streptavidin coated. However, there are a number of uncertainties that push this estimate lower: a) the precise degree of biotinylation, b) whether the %biotinylated tubulin in polymerized microtubules is lower than the mixing ratio due to unequal incorporation, and c) what fraction of the biotinylated tubulin is occupied by neutravidin when using this neutravidin flow-in method. Thus, our best estimate is ~2% biotin-streptavidin functionalization.

      The ramp durations in Fig. 3 provide another argument that biotinylated microtubules are not affecting the motors. Compared to unloaded durations for each motor, the kinesin-1 ramps were longer, the kinesin-2 ramps were the same, and the kinesin-3 ramps were shorter. That argues against any systematic effect of biotinylation on motor run durations, with the caveat that family-dependent differences could in principle be masking an effect. The fact that ramp durations aren’t systematically longer or shorter than the unloaded run durations also argues that the stalls we see, which are at the expected extension length of the dsDNA, are not caused by neutravidin roadblocks.

      The final point the reviewer brings up is whether neutravidin may be contributing to the rescues from slip events that we observe. This is difficult to fully rule out. However, because the unloaded run durations aren’t significantly altered by the biotin-streptavidin on the microtubules, we don’t expect the rescue events following a slip to be significantly affected. In principle, we could systematically increase and decrease the biotinylation and see whether the slip rescues change, but we haven’t done this.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history-independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although non-parametric methods such as K-M make no assumptions about the distribution, they require the user to know exactly where the start time is.
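      To show why a constant-rate model tolerates censoring, here is a maximum-likelihood analogue for right-censored exponential durations. It is a simplified stand-in, not the Markov-Bayesian model described in the Supplementary section. Each uncensored duration contributes log(k) - k·t to the log-likelihood and each censored one contributes -k·t, so the mean-duration estimate is (total observed time)/(number of uncensored events). Under the same constant-rate assumption, an unknown start time (left censoring) leaves the observed remainder exponential with the same mean, so observed portions can be used directly. The function name and example data are hypothetical.

```python
def censored_exponential_mle(durations, censored):
    """MLE of the mean duration for an exponential process with right censoring.

    durations -- observed durations (s)
    censored  -- True where the interval was cut short (e.g. the ramp reached
                 stall or the motor reached the microtubule end) rather than
                 ending in detachment
    """
    if len(durations) != len(censored):
        raise ValueError("durations and censored must have equal length")
    n_events = sum(1 for c in censored if not c)
    if n_events == 0:
        raise ValueError("need at least one uncensored event")
    # k_hat = n_events / sum(t), so the mean duration is sum(t) / n_events
    return sum(durations) / n_events


# HYPOTHETICAL ramps: the 2 s ramp ended at stall (censored)
ramps = [1.0, 2.0, 3.0]
hit_stall = [False, True, False]
print(sum(ramps) / len(ramps))                   # naive mean ignores censoring
print(censored_exponential_mle(ramps, hit_stall))
```

      In this toy case the naive mean is 2 s while the censoring-aware estimate is 3 s, illustrating how treating censored intervals as detachments biases the duration downward.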

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, the first point is that the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s don’t affect the fit very much). The second point is that when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed very well with the single-exponential fits (Table S2). Specifically, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections to the kinesin-3 unloaded run durations due to finite microtubule lengths. To address this point in the revision, we added the following note in Table S2: “* Because the Markov-Bayesian model, which is unaffected by left and right censoring of data, gave the same unloaded run durations for kinesin-3 as the MEMLET fit, we did not correct the kinesin-3 unloaded run durations for any right censoring due to finite microtubule lengths.” We also added the following point in the legend of Fig. S1: “A fraction of kinesin-3 unloaded run durations were limited by the length of the microtubules, but fitting to a model that took into account missed events gave a similar mean duration as an exponential fit, and so no correction was made (Table S2).”

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In Figure S6 kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. We clarified this in the revised Figure S6 legend. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      We addressed this point in lines 200-212 of the revised manuscript:

      “We carried out two additional control experiments. First, to confirm that the neutravidin used to link the DNA to the microtubule wasn’t affecting kinesin motility, we analyzed the run durations of kinesin-1 motors on neutravidin-coated microtubules and found no change compared to unlabeled microtubules (Fig. S6). Second, we measured the run duration of kinesin-1 linked to a DNA tether that was not bound to the microtubule and thus was being transported (Fig. S6). The kinesin-DNA run duration was 1.40 s, longer than the 1.04 s of motors alone (Fig. 2A). We interpret this longer duration to reflect the slower diffusion constant of the dsDNA relative to the motor alone, which enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event.“

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We changed this text (Lines 265-267) to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Yes, the model is overdefined, but that is largely by design, and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
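      Exponential load dependence of this kind is commonly written in a Bell-type form, k(F) = k0·exp(F·d/kBT), where the sign of the characteristic distance d determines whether load speeds up (slip-like) or slows down (catch-like) the transition. The sketch below illustrates only this functional form; the values of k0 and d are hypothetical and are not the fitted parameters of the model in Fig. 5.

```python
import math

KT = 4.1  # thermal energy, pN*nm (approximately room temperature)


def load_dependent_rate(k0, force_pn, distance_nm, kt=KT):
    """Rate constant k(F) = k0 * exp(F * d / kT).

    distance_nm > 0: the rate speeds up with load (slip-like)
    distance_nm < 0: the rate slows with load (catch-like)
    distance_nm = 0: load-independent (ideal bond)
    """
    return k0 * math.exp(force_pn * distance_nm / kt)


# HYPOTHETICAL parameters (k0 = 1 per s), for illustration only
for d in (1.0, 0.0, -1.0):
    k6 = load_dependent_rate(k0=1.0, force_pn=6.0, distance_nm=d)
    print(f"d = {d:+.1f} nm: mean lifetime at 6 pN = {1.0 / k6:.2f} s")
```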

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the Discussion of the revision, we added text to note that this behavior is indicative of an ideal bond (not a catch-bond) on Lines 480-483: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics and instead characteristic of an ideal-bond.” We also added a sentence in the Introduction highlighting this work, Lines 84-87: “Fourth, when kinesin-1 was connected to a bead through a micron-long segment of DNA and hydrodynamic forces were imposed on the bead, motor interaction times were insensitive to hindering loads up to 3 pN, indicative of an ideal-bond.”

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like an approximately linear spring with stiffness ~0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very low compliance (high stiffness) as the tether approaches its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. We added the following paragraph in Lines 101-111 in the Geometry Consideration section of the Supplementary Methods.

      “Another consideration when comparing the DNA tensiometer to optical trap measurements is the relative stiffness of the trap and dsDNA. Optical trap stiffnesses are generally in the range of 0.05 pN/nm [12,13]. To calculate the predicted stiffness of the dsDNA spring, we computed the slope of the theoretical force-extension curve in Fig. 1B. The stiffness is highly nonlinear and is <0.001 pN/nm below 650 nm extension. At the predicted stall force of 6 pN (960 nm extension), the dsDNA stiffness is ~0.2 pN/nm, which is stiffer than most optical traps but similar to the estimated 0.3 pN/nm stiffness of kinesin motors themselves [12,13]. An 8 nm step at this stiffness leads to a 1.6 pN jump in force, so it is reasonable to expect that motors are dynamically stepping at stall. Therefore, there is no reason to expect that stiffness differences between optical traps and the dsDNA spring are affecting the motor detachment kinetics.”
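      The stiffness values quoted above can be checked by differentiating the Marko-Siggia worm-like chain interpolation. This Python sketch is not the authors' calculation: it assumes kT = 4.1 pN·nm, a 50 nm persistence length, and a hypothetical 1020 nm contour length (roughly 3 kbp), and with those inputs it gives about 0.0009 pN/nm at 650 nm and about 0.2 pN/nm at 960 nm. The 8 nm force jump is a linearized estimate (stiffness × step size); because the curve keeps steepening, the exact force change over one step would be somewhat larger.

```python
KT = 4.1        # thermal energy, pN*nm
P = 50.0        # persistence length, nm
L_C = 1020.0    # contour length, nm -- HYPOTHETICAL (~3 kbp dsDNA)


def wlc_stiffness(x, p=P, lc=L_C, kt=KT):
    """dF/dx (pN/nm) of the Marko-Siggia interpolation at extension x (nm).

    F = (kT/P) * (1/(4(1-r)^2) - 1/4 + r), with r = x/lc, so
    dF/dx = (kT/(P*lc)) * (1/(2(1-r)^3) + 1).
    """
    r = x / lc
    if not 0.0 <= r < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (kt / (p * lc)) * (0.5 / (1.0 - r) ** 3 + 1.0)


for x in (650.0, 960.0):
    print(f"{x:.0f} nm: {wlc_stiffness(x):.4f} pN/nm")

# Linearized force change for one 8 nm step at ~960 nm extension
step_jump = wlc_stiffness(960.0) * 8.0
print(f"~{step_jump:.1f} pN per 8 nm step")
```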

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-Gly domain of p150Glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. In response to the point from Reviewer #3, we added the following sentence on Lines 654-656: “Additionally, any ‘rolling’ of a spherical cargo following motor detachment will tend to suppress the motor reattachment rate.”

      Regarding a dynamic tether, we agree that this is interesting. Some kinesins have a second, non-canonical binding site that achieves this tethering (e.g., ncd and Cin8); p150Glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport, kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      We agree that at first glance those jumps are puzzling. To investigate this question the first thing we did was to go back to our tensiometer dataset and look systematically at jumps for all three motors. We found roughly 4-6 large jumps like these for all three motors (kinesin-1: 250 +/- 99 nm (mean +/- SD; N=5); kinesin-2: 249 +/- 165 nm (N=6); kinesin-3: 490 +/- 231 nm (N=4)). Thus, although the apparent jumps may be more pronounced due to the specific rebinding kinetics of kinesin-2, this behavior is not unique to this motor. (Note that the motor binding position distribution in Fig. S2 is taken from initial binding positions that follow a clear period of detachment; thus, not all jumps are captured there.)

      Our interpretation is that these apparent jumps simply reflect the long length and high compliance of the dsDNA tether. For instance, below 650 nm extension the stiffness is k < 0.001 pN/nm (see Reviewer #3, point #1 above). Thus, we expect large fluctuations of the tethered motor when it is not bound to the microtubule. One reason that these events look like ‘jumps’ is that the sub-ms fluctuations during detached periods are not captured by the ~25 fps movies (40 ms frame acquisition time). Instead, the fitted Qdot position represents the average position during the acquisition window. In fact, because of these rapid fluctuations (and the limited depth of the TIRF illumination field), the position often cannot be determined during these periods (e.g., see gaps at ~2.5 s, 11 s and 24 s in Fig. 1F).
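
      Why a 40 ms exposure hides sub-ms excursions can be illustrated by treating the unbound, tethered Qdot as an overdamped particle in a harmonic well (an Ornstein-Uhlenbeck process) and averaging its position over each frame. This is a crude sketch: the harmonic approximation to the WLC, the spring constant (0.001 pN/nm, the upper bound quoted above) and the 0.2 ms relaxation time are all assumptions, not values fitted to the data.

```python
import math
import random


def _sd(values):
    """Sample standard deviation using compensated summation."""
    mean = math.fsum(values) / len(values)
    return math.sqrt(math.fsum((v - mean) ** 2 for v in values) / (len(values) - 1))


def simulate_frame_averages(k=0.001, tau_ms=0.2, frame_ms=40.0, dt_ms=0.05,
                            n_frames=100, kT=4.1, seed=1):
    """Return (SD of instantaneous positions, SD of frame averages), in nm.

    The unbound Qdot is modeled as an Ornstein-Uhlenbeck process in a
    harmonic well of stiffness k (pN/nm) with relaxation time tau_ms.
    Both values, and the harmonic model itself, are assumptions.
    """
    rng = random.Random(seed)
    sd = math.sqrt(kT / k)                   # equilibrium SD, nm
    decay = math.exp(-dt_ms / tau_ms)        # exact one-step OU decay
    kick = sd * math.sqrt(1.0 - decay ** 2)  # noise that keeps variance fixed
    steps = int(round(frame_ms / dt_ms))     # samples per camera frame

    x = rng.gauss(0.0, sd)
    samples, frame_means = [], []
    for _ in range(n_frames):
        frame = []
        for _ in range(steps):
            x = decay * x + kick * rng.gauss(0.0, 1.0)
            frame.append(x)
        samples.extend(frame)
        frame_means.append(math.fsum(frame) / steps)  # what the camera reports
    return _sd(samples), _sd(frame_means)


inst_sd, frame_sd = simulate_frame_averages()
print(f"instantaneous SD ~ {inst_sd:.0f} nm, frame-averaged SD ~ {frame_sd:.0f} nm")
```

      Under these assumptions the frame-averaged positions scatter roughly ten-fold less than the instantaneous position (approximately a factor of sqrt(2·tau/T_frame)), so individual excursions are smoothed into occasional, apparently abrupt, position changes.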

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.
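
      For context on the censoring issue itself, one standard remedy (not necessarily the analysis used in the manuscript) is to treat runs that end at the microtubule tip as right-censored. For an exponential duration distribution, the maximum-likelihood time constant is then the total observed time divided by the number of runs that actually ended by detachment. The run durations below are hypothetical.

```python
def censored_exponential_mle(durations, censored):
    """MLE of an exponential time constant with right-censored runs.

    durations: observed run times (s).
    censored: True where the run was cut short (e.g., the motor reached the
    microtubule end) rather than ending by detachment.
    """
    if len(durations) != len(censored):
        raise ValueError("durations and censored must have equal length")
    n_events = sum(1 for c in censored if not c)
    if n_events == 0:
        raise ValueError("need at least one uncensored run")
    # Every run contributes its observed time; only detachments count as events.
    return sum(durations) / n_events


# Hypothetical runs: the 3.0 s run reached the microtubule end.
runs = [0.4, 1.1, 0.7, 2.2, 3.0]
ends_at_tip = [False, False, False, False, True]
naive_tau = sum(runs) / len(runs)
tau_hat = censored_exponential_mle(runs, ends_at_tip)
print(f"naive mean {naive_tau:.2f} s, censoring-corrected {tau_hat:.2f} s")
```

      Ignoring censoring (the naive mean) biases the time constant downward, which is the concern raised for unloaded runs on finite-length microtubules.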

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t₁, which would be a poor fit to the experimental distributions in Fig. 4D-F.
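
      The distinction can be seen in survival functions. A single detached state with one reattachment rate gives an exponential recovery-time survival curve whose logarithm has a constant slope, whereas a sum of three exponentials bends. In the sketch below, the amplitudes and time constants are placeholders, not the values fitted in Fig. 4D-F.

```python
import math

# Hypothetical (amplitude, time constant in s) pairs; not the fitted values.
TRIPLE = [(0.5, 0.02), (0.3, 0.2), (0.2, 2.0)]
SINGLE = [(1.0, 0.02)]


def survival(t, components):
    """P(recovery time > t) for a mixture of exponentials."""
    return sum(a * math.exp(-t / tau) for a, tau in components)


def log_slope(t, components, h=1e-4):
    """d ln S / dt; equals the constant -1/tau only for a single exponential."""
    return (math.log(survival(t + h, components))
            - math.log(survival(t - h, components))) / (2.0 * h)


for t in (0.01, 0.1, 1.0):
    print(f"t = {t:>4} s: single {log_slope(t, SINGLE):8.2f} /s, "
          f"triple {log_slope(t, TRIPLE):8.2f} /s")
```

      On a semilog plot the single-state model is a straight line, so it cannot capture both the fast early recoveries and the long tail that a triple exponential describes.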

      Recommendations for the authors: 

      Reviewing Editor Comments:

      The reviewers are in agreement with the motivation and approach of this study. The use of DNA tethers is an important advance in tethering motor proteins to gain insight into how motors respond to load. However, all 3 reviewers express reservations about how well the results support the claims. In particular, the use of the term catch bond was problematic, with Reviewer #2 suggesting some alternative nomenclature. Reviewer #1 expressed concern about the experimental evidence for the predicted force-extension curve shown in Figure 1. I agree with the reviewers that additional experimental evidence would be required to conclude that kinesin has catch-bond detachment kinetics.


      Reviewer #2 (Recommendations for the authors):

      (1) By eye, the run lengths, e.g., of kin-1 look very long in Figure S1 ... certainly above the expected 1 µm. Please check and comment.

      We agree that the long runs do stick out by eye in this figure. To address this point, we analyzed the run lengths and run times from the kymograph shown in Fig. S1. Fitting the run duration distribution gave t = 1.31 s with a 95% CI of 0.96 to 1.67 s. This is slightly longer than the 1.04 s duration in Fig. 2A, but the 95% CI includes this population mean, so the Fig. S1 data are not significantly different from it. The run time distribution from the S1 kymograph is given in Author response image 1.
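
      A percentile-bootstrap confidence interval for an exponential time constant can be sketched as follows. The response does not state how its CI was obtained, so the bootstrap method, the simulated durations and the 1.3 s generating value are illustrative assumptions; the final line mirrors the reasoning above by asking whether a reference mean (1.04 s) lies inside the interval.

```python
import random
import statistics


def bootstrap_tau_ci(durations, n_boot=2000, level=0.95, seed=0):
    """Point estimate and percentile-bootstrap CI for an exponential tau.

    For uncensored exponential data the maximum-likelihood tau is the
    sample mean, so each bootstrap replicate is a resampled mean.
    """
    rng = random.Random(seed)
    n = len(durations)
    estimates = sorted(
        statistics.fmean(rng.choices(durations, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return statistics.fmean(durations), estimates[lo_idx], estimates[hi_idx]


# Simulated run durations (s), standing in for the Fig. S1 kymograph data.
gen = random.Random(42)
runs = [gen.expovariate(1.0 / 1.3) for _ in range(40)]
tau_hat, ci_lo, ci_hi = bootstrap_tau_ci(runs)
overlaps_reference = ci_lo <= 1.04 <= ci_hi
print(f"tau = {tau_hat:.2f} s, 95% CI [{ci_lo:.2f}, {ci_hi:.2f}] s, "
      f"contains 1.04 s: {overlaps_reference}")
```

      Checking whether a reference value falls inside the CI is a quick screen; a formal two-sample comparison would also account for the uncertainty in the reference mean.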

      Author response image 1.

      (2) The upper right kymograph in Figure 4A does not show a motor return to the baseline. Also, the scale bars, etc., are unreadable. Please modify.

      Our purpose in showing the kymographs in Fig. 4A was to illustrate the specific features of slips and of fast and slow reattachment. Because we enlarged the kymographs to show those features, we could not show the entire return to baseline. As suggested, we enlarged the scale bars and the kymograph labels to make them readable.

      Reviewer #3 (Recommendations for the authors):

      (1) The frequent references to 95% confidence intervals disrupt the flow of the text. Perhaps the confidence intervals could be listed in a table rather than in the body of the text.

      We deleted those from the text; they are shown in Fig. 2D and listed in Table S2.

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Korten, T., and S. Diez. 2008. Setting up roadblocks for kinesin-1: mechanism for the selective speed control of cargo-carrying microtubules. Lab Chip. 8:1441-1447.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nat Commun. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds number. Am J Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schäffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nat Commun. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schäffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17:e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This important study functionally profiled ligands targeting the LXR nuclear receptors using biochemical assays in order to classify ligands according to pharmacological functions. Overall, the evidence is solid, but nuances in the reconstituted biochemical assays and cellular studies, and in the terminology of ligand pharmacology, limit the potential impact of the study. This work will be of interest to scientists interested in nuclear receptor pharmacology.

      Strengths:

      (1) The authors rigorously tested their ligand set in CRTs for several nuclear receptors that could display ligand-dependent cross-talk with LXR cellular signaling and found that all compounds display LXR selectivity when used at ~1 µM.

      (2) The authors tested the ligand set for selectivity against two LXR isoforms (alpha and beta). Most compounds were found to be LXRbeta-specific.

      The majority of ligands were found to be LXRβ-selective; however, examples of non-selective and LXRα-selective ligands were identified. It should be noted that this is a small compound set of literature ligands with reasonable structural diversity.

      (3) The authors performed extensive LXR CRTs, carried out correlation analyses with cellular transcription and gene expression, and performed classification profiling using heatmap analysis, seeking to use relatively easy-to-collect biochemical assays with purified ligand-binding domain (LBD) protein to explain the complex activity of full-length LXR-mediated transcription.

      Weaknesses:

      (1) The descriptions of some observations lack detail, which limits understanding of some key concepts.

      We hope that the changes to the submitted manuscript add clarity. Several observations reinforce aspects of the literature and are a corollary of the observation that the majority of ligands with agonist activity more strongly stabilize/induce coactivator-bound complexes with LXRβ. This results in general LXRβ selectivity for agonists and also more variability in the response of LXRα to different ligand chemotypes. The most significant observations were for partial agonists that stabilize corepressor binding, in particular for the complex with LXRα.

      (2) The presence of endogenous NR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data.

      This is generally a confounding factor for ligands with apparent antagonist activity and is a source of ambiguity in designating inverse agonists across the nuclear receptor research field. Theoretically, this could also impact weak and partial agonists; however, this requires further study.

      (3) The normalization of biochemical assay data could confound the classification of graded activity ligands.

      Normalization to T0 (100%) and vehicle (0%) is applied to most data. It is not clear to us how this would confound data interpretation. T0 is a very reliable and reproducible agonist without significant bias towards either LXR isoform.
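
      The two-point normalization described here maps each raw TR-FRET signal onto a percent-of-T0 scale, so ligands more efficacious than T0 exceed 100% and signals below vehicle become negative. A minimal sketch, assuming the in-plate control means have already been computed; all numbers are hypothetical.

```python
def normalize_to_controls(signal, vehicle_mean, t0_mean):
    """Percent response on a scale where vehicle = 0% and T0 = 100%."""
    span = t0_mean - vehicle_mean
    if span == 0:
        raise ValueError("T0 and vehicle controls give the same signal")
    return 100.0 * (signal - vehicle_mean) / span


# Hypothetical in-plate control means and sample TR-FRET ratios.
vehicle, t0 = 0.20, 1.20
samples = {"partial response": 0.70, "above T0": 1.50, "below vehicle": 0.10}
for name, ratio in samples.items():
    print(name, round(normalize_to_controls(ratio, vehicle, t0), 1))
```

      Because both anchors come from the same plate, plate-to-plate drift in absolute signal cancels to first order.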

      (4) The presence of >1 coregulator peptide in the biplex (n=2 peptides) CRT (pCRT) format will bias the LBD conformation towards the peptide-bound form with the highest binding affinity, which will impact potency and interpretation of TR-FRET data.

      Multiplex assays must be optimized to balance the binding affinities of the coregulator peptides (bearing in mind that these are somewhat artificial small peptide constructs that are hoped to reflect binding of the much larger coregulator protein itself). Since the dominant theory of NR tissue selectivity is based on the cellular availability (i.e., concentration) of coregulators, an analogous balance exists in a cellular context.

      (5) Correlation graphical plots lack sufficient statistical testing.

      Correlations are now supported by statistical data and we have added hierarchical clustering analysis.

      (6) Some of the proposed ligand pharmacology nomenclature is not clear and deviates from classifications used currently in the field (e.g., hard and soft antagonist; weak vs. partial agonist, definition of an inverse agonist that is not the opposite function to an agonist).

      Classifications currently used in the field vary from one NR to another, and the use of partial and inverse agonist, in particular, is usually qualitative, unclear, and often misleading. We expand on these classifications with respect to the labels we use to classify pCRT responses to LXR ligands. In agreement with the reviewer, we have replaced IA (inverse agonist) with RA (reverse agonist) as a label specifically associated with pCRT analysis.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript by Laham and co-workers, the authors profiled structurally diverse LXR ligands via a coregulator TR-FRET (CRT) assay for their ability to recruit coactivators and kick off corepressors, while identifying coregulator preference and LXR isoform selectivity.

      The relative ligand potencies measured via CRT for the two LXR isoforms were correlated with ABCA1 induction or lipogenic activation of SRE, depending on the cellular context (i.e., astrocytoma or hepatocarcinoma cells). While these correlations are interesting, there is some leeway to improve the quantitative presentation of these correlations. Finally, the CRT signatures were correlated with the structural stabilization of the LXR:coregulator complexes. In aggregate, this study curated a set of LXR ligands with disparate agonism signatures that may guide the design of future nonlipogenic LXR agonists with potential therapeutic applications for cardiovascular disease, Alzheimer's disease, and type 2 diabetes, without inducing mechanisms that promote fat/lipid production.

      Strengths:

      This study has many strengths, from curating an excellent LXR compound set to the thoughtful design of the CRT and cellular assays. The design of a multiplexed precision CRT (pCRT) assay that detects corepressor displacement as a function of ligand-induced coactivator recruitment is quite impressive, as it allows measurement of ligand potencies to displace corepressors in the presence of coactivators, which cannot be achieved in a regular CRT assay that looks at coactivator recruitment and corepressor dissociation in separate experiments.

      Weaknesses:

      I did not identify any major weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Page 2. "The endogenous ligands ... activate LXR via canonical or alternate mechanisms." What is an alternate mechanism?

      We made small modifications to the Fig. 1 caption to identify a mechanism alternative to the canonical one: LXR transcriptional complexes are RXR heterodimers that can be activated either by a canonical mechanism of coregulator recruitment or by an alternative de-repression mechanism.

      (2) Page 5: "Notably, the 25 amino acid SRC-1 peptide is the only coactivator tested for LXR binding that has the fluorophore remote from the coactivator peptide." What does this mean, and could it influence the results?

      The sentence has been expanded to clarify the meaning. Notably, the 25 amino acid SRC-1 peptide is the only coactivator, amongst those tested for LXR binding, which has the fluorophore remote from the coactivator peptide: i.e., the only coactivator tested that uses a fluorophore-labeled anti-tag antibody to bind the tagged coactivator rather than a fluorophore-labeled coactivator. In methods based on fluorescent tags (CRT, TR-FRET, fluorescence polarization, etc.), a fluorophore that interacts directly with the receptor can generate a maximal signal that differs depending on this interaction: i.e., the identity of the coregulator used in CRT can influence the response. As seen in Figures 6 and S6, maximal response is dependent on ligand and coregulator.

      (3) Page 5: "The [CRT] assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." The dose-dependent activity in the CRT assays is more classically defined as a functional "potency", not "affinity".

      The text has been changed to remove “measure of affinity”: The assay measures the ligand-dependent EC50 for ligand-induced coactivator recruitment to LXR; the affinity of the ligand for the LXR:coregulator complex contributes to this potency.
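
      To make the potency/affinity distinction concrete: EC50 is the ligand concentration giving a half-maximal response in a concentration-response fit. The sketch below assumes the common four-parameter logistic (Hill) model, in which EC50 is a parameter of the response curve rather than a binding constant; the parameter values are hypothetical.

```python
import math


def four_param_logistic(conc, bottom, top, ec50, hill=1.0):
    """Response at ligand concentration conc (same units as ec50)."""
    if conc <= 0:
        return bottom  # limit of the curve as conc -> 0 for hill > 0
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


# Hypothetical agonist: EC50 = 50 nM, response normalized to T0 (0-100%).
for c in (5.0, 50.0, 500.0):
    print(f"{c:>5} nM -> {four_param_logistic(c, 0.0, 100.0, 50.0):.1f}%")
print("pEC50 =", round(-math.log10(50e-9), 2))
```

      In a CRT assay the fitted EC50 reflects ligand binding together with the coregulator-recruitment step it drives, which is why it is reported as a potency.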

      (4) Page 5: "Perhaps surprisingly, considering the description of multiple LXR ligands as partial agonists, most agonists studied gave maximal response at the same level as T0, behaving as full agonists." Can the authors speculate as to why partial agonist activity is not observed in their CRT assays when it has been observed in CRT assays for other nuclear receptors?

      This section has been reworded. Please also note the apparent partial agonist activity observed in CRT assays for multiple coactivators, as shown in Figures 6 and S6 (see also point (2) above). Although many LXR ligands have been reported to display partial agonist activity, most agonists studied in this specific biotin-SRC-1 CRT assay gave a maximal response at the same level as T0, behaving as full agonists.

      (5) Page 5: "Conformational cooperativity of LBD residues beyond these two amino acids leads to different conformations of Leu274 and Ala275 that generally favor ligand binding to LXRβ." Where are these residues located? Why are they important?

      We have simplified this paragraph, which introduces the interesting observations and interpretation of Ding et al., to illustrate potential contributions to isoform selectivity: The ligand binding pockets of the two LXR isoforms differ by only one amino acid, located in helix 3 (H3: LXRα-Val263 and LXRβ-Ile277). Interestingly, eliminating this difference by mutating both residues to alanine (V263A and I277A) was observed to lower, but not ablate, isoform selectivity in reporter assays.[108] Supported by modeling studies, this observation by Ding et al. led to the suggestion that conformational cooperativity of LBD residues beyond these two amino acids generally favors ligand binding to LXRβ. Therefore, most reported ligands, including those examined in the current work, are LXRβ-selective or non-selective.

      (6) Some correlation plots are described to show "poor" correlations without showing the underlying statistical fits. All correlation plots should show Pearson and Spearman correlation coefficients and p-values within the figures.

      This section of the manuscript has been completely reworked with full correlation analysis and statistics. There is no substantive change in data interpretation.
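
      The Pearson and Spearman coefficients requested by Reviewer #1 can be computed as below; Spearman's coefficient is Pearson's applied to ranks, with tied values given their average rank. This is an illustrative sketch: p-values are omitted (library routines such as scipy.stats.pearsonr and scipy.stats.spearmanr provide them), and the potency values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples of length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson on average ranks)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pEC50 values: biochemical CRT vs. cellular reporter assay.
crt_pec50 = [7.9, 7.2, 6.8, 6.1, 5.5]
cell_pec50 = [7.5, 7.4, 6.2, 6.0, 5.9]
print(f"Pearson r = {pearson(crt_pec50, cell_pec50):.2f}, "
      f"Spearman rho = {spearman(crt_pec50, cell_pec50):.2f}")
```

      Reporting both coefficients is useful because Spearman only tests for a monotonic relationship and is less sensitive to a single outlying ligand.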

      (7) The normalization of TR-FRET data could introduce undesired bias when comparing activities. The methods section should provide more details about normalization of CRT data, including stating whether the control compounds' activity data were collected on the same CRT 384-well plate on the same day, or different plates, or different days, etc.

      This is now clarified in SI materials and methods section. In-plate controls are always used.

      (8) The authors describe their pCRT assay as "multiplex", whereas "biplex" might be more accurate, as they only used two peptides.

      “Biplex” is commonly used in reference to qPCR, Bio-Plex is a commercial antibody-based assay, and “duplex” is a term used in nucleic acid research. Therefore, we feel that multiplex is a simpler, more generic term that is suitable and can be extended if a third coregulator is added.

      (9) The pCRT assays use the same peptide concentrations (200 nM). However, the peptides will have different affinities for the LBD, which may bias ligand-dependent pCRT profiles. The peptide that binds with higher affinity in the absence of ligand will bias the LBD conformation and impact ligand affinity. Can the authors comment on any limitations of the pCRT approach vs. a normal CRT? Did the authors perform any optimization to see if increasing peptide concentrations (>200 nM) or having different concentrations (e.g., 400 nM SRC1 and 200 nM NCorR2) influences the pCRT data, extracted parameters, correlations, etc.?

      As we write in the Limitations section, our assays are focused on ligand-dependence, whereas other excellent studies focus more on coregulator-dependence. The length and affinity of peptide constructs varies and therefore it is important to “balance” corepressor and coactivator concentrations. The most important conclusions from our pCRT assays concern the ability of some ligands to stabilize corepressor binding in the monoplex CRT and the universal ability of coactivator complex stabilization to eject the corepressor in the multiplex assay. Furthermore, without measurements and correlations in “natural” cellular contexts, the CRT data obtained in cell-free conditions is somewhat artificial. We evaluated a range of peptide concentrations to assess signal-to-background and overall assay performance. Each new receptor added to the panel underwent rigorous optimization to establish robust and reliable assay conditions. This included identifying a suitable positive control for each receptor, determining the optimal coregulator selection and concentration, and refining other key parameters such as buffer composition and total well volume. The concentrations reported represent the optimized balance—producing a strong, reproducible signal without oversaturation or disproportionate contribution from any individual assay component.
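
      The need to balance peptide concentrations can be illustrated with a simple competitive-binding picture. This sketch assumes that the two peptides bind the receptor mutually exclusively and are in large excess over the LBD (so free peptide is approximately total peptide); the Kd values are invented, and only the 200 nM starting concentration comes from the text.

```python
def competitive_occupancy(conc_a, kd_a, conc_b, kd_b):
    """Fractions of receptor bound to peptide A, to peptide B, or to neither.

    Assumes mutually exclusive binding at one site and peptides in large
    excess over receptor, so free peptide ~ total peptide. Concentrations
    and Kd values must share units (e.g., nM).
    """
    wa = conc_a / kd_a
    wb = conc_b / kd_b
    denom = 1.0 + wa + wb
    return wa / denom, wb / denom, 1.0 / denom


# Hypothetical Kd values (nM) for one ligand state; SRC1 = A, NCOR2 = B.
for src1_nm in (200.0, 400.0):
    f_src1, f_ncor2, f_apo = competitive_occupancy(src1_nm, 1000.0, 200.0, 300.0)
    print(f"SRC1 {src1_nm:.0f} nM: SRC1-bound {f_src1:.2f}, "
          f"NCOR2-bound {f_ncor2:.2f}, unbound {f_apo:.2f}")
```

      In this picture, doubling one peptide shifts occupancy away from the other even with no change in ligand, which is one way to see why the peptide concentrations had to be optimized together.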

      (10) Page 11. The authors introduce a few ligand classification terms that are not standard in the field and unclear: "soft" vs. "hard" antagonist, "weak" vs. "partial" agonist, and their definition of an inverse agonist that, in classical pharmacologic terms, should have an opposite (inverse) function to an agonist. Furthermore, the presence of endogenous LXR ligands within cells may confound the correlation of ligand activity of cellular assays to biochemical assay data. See the following paper for an example of ligand-dependent classification and activation mechanisms when there are endogenous cellular ligands at play: https://elifesciences.org/articles/47172

      The paragraph discussing nomenclature went through many iterations of terminology and a further paragraph was removed that discussed problems with ligand classification in the broader field of NR pharmacology: this has now been added back. We apologise for not citing the excellent Strutzenberg et al. paper on RORa pharmacology, which is now included. In this paper, Griffin and co-workers also use terms that are not standard in the field, such as “silent agonist”, which covers, in part, ligands that we describe as “weak agonists”. A standard, definitive lexicon of terms across NRs is unfortunately problematic. We have added 2 paragraphs:

      The nomenclature for NR ligands often lacks precision and differs across NR classes. SERM (a subset of selective NR modulator) is used to describe varied families of ER ligands that show tissue-selective agonist and/or antagonist actions. Unfortunately, “partial agonist” is also widely used to describe SERMs, even though its use is usually pharmacologically incorrect and biased agonist may be a more accurate label.[124] The majority of reported ER ligands are SERMs, even some that cause ER degradation, because they are transcriptionally active. Consequently, the term “pure antagonist” (PA) has been used to differentiate transcriptionally null ligands[125]; although, pure antagonist/antiestrogen was originally introduced to describe antagonism of both AF1 and AF2 functions.[90]

      Elegant work by Griffin’s team on RAR-related orphan receptor C (RORɣ) is interesting, because it used a combination of HDX-MS and CRT and defined categories of RORɣ ligands.[126] In addition to full agonist, “silent agonist” was introduced to include endogenous and synthetic partial agonists; although, by definition, partial agonists should antagonize full agonists. On the antagonist side of the spectrum, “active antagonist” was used to describe ligands that reduce cellular activity to baseline; and “inverse agonist” for ligands that reduce cellular transcription below baseline and induce recruitment of corepressors. Curiously, inverse agonist has almost never been used to describe ER ligands and is used frequently for other NR ligands, mostly for ligands that reduce transcription below baseline, without any evidence for corepressor recruitment. GSK2033 and SR9238 show inverse agonist activity in cells (Figs 3, 5); however, neither is capable of recruiting SMRT2 or NCOR2 to LXR (Fig. 7).

      (11) Figure 9A and Figure S8. Could hierarchical clustering analysis be used to more rigorously compare the activities of the ligands?

      We have now added hierarchical clustering analysis (Figs 4 and S4). It should be noted that such an analysis becomes much more valuable as the number of ligands increases.
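
      As an illustration of what such clustering does, the sketch below implements plain agglomerative clustering with average linkage on Euclidean distances between ligand response profiles. It is a minimal stand-in: the distance metric, linkage and software used for Figs 4 and S4 are not stated here, and the profiles (percent responses across four hypothetical readouts) are invented.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters (mean pairwise distance) until n remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(euclidean(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical normalized responses (%) for four readouts per ligand.
profiles = {
    "ligand_A": [100, 95, 10, 90],
    "ligand_B": [105, 98, 12, 85],
    "ligand_C": [20, 15, 60, 5],
    "ligand_D": [25, 10, 70, 0],
    "ligand_E": [60, 55, 30, 50],
}
groups = average_linkage(profiles, 2)
print(groups)
```

      With only a handful of ligands, a single intermediate profile (ligand_E here) can decide how groups form, consistent with the point that clustering is more informative for larger ligand sets.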

      (12) How does cellular potency correlate to pCRT vs. CRT potencies? Does pCRT better explain cellular potency?

      We have added this specific correlation (multiplex CRT vs. monoplex CRT).

      (13) The authors should provide an SI table of parameters (potency values) used for correlation and heatmap analyses.

      Tables have been added to SI accordingly.

      Reviewer #2 (Recommendations for the authors):

      This manuscript has many strengths, but can still be improved by addressing the following critiques:

      (1) I am surprised the team did not find a ligand with a higher efficacy than T0. Please would you explain why T0 seems to have maxed out ligand efficacy for both LXRalpha and LXRbeta?

      Several ligands gave superior efficacy to T0 in cell-based reporter assays and in CRT assays shown in Figures 6 and S6: AZ876, BE1218, and MK9 gave maximal response higher than that of T0.

      (2) In the subsection, "Activity and isoform selectivity of LXR ligands", you mentioned that "The assay measures the EC50 for coactivator recruitment, a measure of ligand binding affinity." This is incorrect. EC50 is a measure of ligand potency, not affinity.

      See our response to Reviewer #1, point (3).

      (3) In Figure 3 it is unclear what was used to normalize the antagonist responses in Panel F. Also, I recommend changing the y-axis of Panel F to -100 to 50 to get a better view of the response.

      This has been clarified: zero is vehicle control. Change to y-axis is made.

      (4) In Figure 4, the correlation R-squared values should be presented as a Table to have a better qualitative assessment of the correlations. It is challenging to judge which correlations are better by relying only on visual inspection. I also recommend moving the two panels from Figure S3 to Figure 4 as panels E and F.

      Extensive changes to Figure 4 have been made in response to this comment and to those of Reviewer #1, who asked for these values to be shown in the figures (Reviewer #1, points (6) and (12)).

      (5) In Figure 5, the fold changes in panels G, H, and I could better be presented as a bar graph. Also, the cytotoxicity of ligands needs to be assessed. For instance, in BE1218, there is a sharp decrease in fold change going from ~1 µM to ~10 µM. This will also confirm if the downward trends for SR9238 and GSK2033 are "real" and not as a result of cells dying off at higher ligand concentrations.

      In our many studies of potent NR ligands, we observe cell growth inhibition at concentrations above 3 µM. This is true for ER ligands, such as tamoxifen, with explanations in the literature including membrane disruption and low-affinity cytoplasmic binding proteins. In specific response to the reviewer’s query, we include cell viability measurements in the Supplementary Information. There is no loss of cell viability in HepG2 cells.

      (6) Several ligands induce recruitment of coactivators but with minimal ability to displace corepressors. Physiologically, what would be the expected effect of these ligands on LXR activity?

      We have defined such ligands from pCRT analysis as weak agonists (WA); however, pCRT shows that WA ligands induce corepressor loss in the presence of coactivator. Depending on coregulator balance, isoform expression, and the importance of the derepression mechanism in a specific cell context, WA ligands might be expected to be differentiated from SA (strong agonist) ligands.

      (7) In the subsection, "synchronous coregulator recruitment by multiplex, precision CRT" you mentioned that "For LXRbeta, the correlation between SRC1 recruitment in monoplex and multiplexed CRT is good," but the data is not shown. I think it would be better to show this data for transparency.

      See our response to point (4) and to Reviewer #1. Done.

      (8) In Figure 9, Panel A, the heat map is quantitated as 0-150. Is this fold change? If so, add this label to the figure legend.

      It is Normalized Response as %, which is now added.

      (9) In Figure 9, Panel B, please explain why in all cases, CoA-bound LXR resides at a higher energy level than the CoR-bound, and the apo LXR is at a lower energy level than the CoA-bound protein. A coregulator-bound (holo) protein structure is generally a lower energy (more stable) structure than the unbound (apo) protein. The binding of a coregulator stabilizes the protein's conformation and shifts the equilibrium towards a more thermodynamically favorable state. Using the same argument, it does not make sense to me that the CoR-bound LXR is on the same energy level as the apo LXR.

      This schema reflects our observations in pCRT. No signal was observed for coactivator-bound (holo) protein in the absence of ligand, whereas a signal was observed for corepressor-bound (holo) protein in the absence of ligand. Therefore, the CoA-bound LXR is higher in energy than apo-LXR (+ unbound CoA). Conversely, the signal for CoR-bound LXR can be reduced or increased by ligands, requiring the CoR-bound LXR to be of similar energy to apo-LXR (+ unbound CoR).

      (10) In the Figure 9B caption, does "measured at 1 µM" pertain to the concentration of ligand or of coregulator? This is unclear. You should report the concentrations of both ligand and coregulator.

      Clarified in caption.

      (11) In Figure S4, the signal for SR9238 shoots up to ~300 units at ligand concentrations >3 µM. Please explain what could have contributed to this anomalous activation and why this was moved to the Supplementary File and not shown in the main figure (Figure 5).

      The HepG2-SRE assay is a NanoLuc reporter assay, unlike the CCF-ABCA1 assay, which uses firefly luciferase. There is substantial anecdotal evidence that the furimazine/NanoLuc system is susceptible to stabilization enhancement. The RT-PCR data presented in Fig. 5 confirm that this is an artifact for some biphenyl sulfones.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents results supporting a model that tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the stem cell niche and inhibit the differentiation of neighboring cells. The valuable findings show that GSC tumors often contain non-mutant cells whose differentiation is suppressed by the GSC tumorous cells. However, the evidence showing that the GSC tumors produce BMP ligands to suppress differentiation of non-mutant cells is incomplete. It could be strengthened by the use of sensitive RNA in situ hybridization approaches.

      Thank you for your valuable assessment. RNA in situ hybridization evidence has been added to the revised manuscript (Figure 5A-D) to support that GSC tumors produce BMP ligands.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This preprint from Shaowei Zhao and colleagues presents results that suggest tumorous germline stem cells (GSCs) in the Drosophila ovary mimic the ovarian stem cell niche and inhibit the differentiation of neighboring non-mutant GSC-like cells. The authors use FRT-mediated clonal analysis driven by a germline-specific driver (nos-Gal4, UASp-flp) to induce GSC-like cells mutant for bam or bam's co-factor bgcn. Bam-mutant or bgcn-mutant germ cells produce tumors in the stem cell compartment (the germarium) of the ovary (Figure 1). These tumors contain non-mutant cells, termed SGCs for single-germ cells. 75% of SGCs do not exhibit signs of differentiation (as assessed by bamP-GFP) (Figure 2). The authors demonstrate that the block in differentiation in SGCs is a result of suppression of bam expression (Figure 2). They present data suggesting that in 73% of SGCs, BMP signaling is low (assessed by dad-lacZ) (Figure 3) and that proliferation is lower in SGCs than in GSCs. They present genetic evidence that mutations in BMP pathway receptors and transcription factors suppress some of the non-autonomous effects exhibited by SGCs within bam-mutant tumors (Figure 4). They show data that bam-mutant cells secrete Dpp, but these data are not compelling (see below) (Figure 5). They provide genetic data that loss of BMP ligands (dpp and gbb) suppresses the appearance of SGCs in bam-mutant tumors (Figure 6). Taken together, their data support a model in which bam-mutant GSC-like cells produce BMPs that act on non-mutant cells (i.e., SGCs) to prevent their differentiation, similar to what is seen in the ovarian stem cell niche.

      Strengths:

      (1) Use of an excellent and established model for tumorous cells in a stem cell microenvironment.

      (2) Powerful genetics allow them to test various factors in the tumorous vs non-tumorous cells.

      (3) Appropriate use of quantification and statistics.

      We greatly appreciate your valuable comments.

      Weaknesses:

      (1) What is the frequency of SGCs in nos>flp; bam-mutant tumors? For example, are they seen in every germarium, or in some germaria, etc, or in a few germaria?

      This is a good question. Because the SGC phenotype depends on the presence of both germline tumor clones and out-of-niche wild-type germ cells, our quantification was restricted to germaria containing both. In 14-day-old fly ovaries, 70% of germaria (432/618) met this criterion (Line 103). These germaria contained an average of 1.5 SGCs each (Figure 1K).

      (2) Does the breakdown in clonality vary when they induce hs-flp clones in adults as opposed to in larvae/pupae?

      Our attempts to induce ovarian hs-FLP germline clones by heat-shocking adult flies were unsuccessful, with very few clones being observed. Therefore, we shifted our approach to an earlier developmental stage. Successful induction was achieved by subjecting late-L3/early-pupal animals to a twice-daily heat shock at 37°C for 6 consecutive days (2 hours per session with a 6-hour interval, see Lines 331-335) (Zhao et al., 2018).

      (3) Approximately 20-25% of SGCs are bam+, dad-LacZ+. Firstly, how do the authors explain this? Secondly, for the 70-75% of SGCs that have no/low BMP signaling, the authors should perform additional characterization using markers that are expressed in GSCs (i.e., Sex lethal and nanos).

      These 20-25% of SGCs are bamP-GFP<sup>+</sup> dad-lacZ<sup>-</sup>, not bam<sup>+</sup> dad-lacZ<sup>+</sup> (see Figure 2C and 3D). They would be cystoblast-like cells that may have initiated a differentiation program toward forming germline cysts (see Lines 122-130). The 70-75% of SGCs that have low BMP signaling exhibit GSC-like properties, including: 1) dot-like spectrosomes; 2) dad-lacZ positivity; 3) absence of bamP-GFP expression. While additional markers would be beneficial, we think that this combination of properties is sufficient to classify these cells as GSC-like.

      (4) All experiments except Figure 1I (where a single germarium with no quantification) were performed with nos-Gal4, UASp-flp. Have the authors performed any of the phenotypic characterizations (i.e., figures other than Figure 1) with hs-flp?

      Yes, we initially identified the SGC phenotype through hs-FLP-mediated mosaic analysis of bam or bgcn mutant germ cells in ovaries. However, as noted in our response to Weakness (2), this approach was very labor-intensive. Therefore, we switched to the more convenient nos>FLP system for subsequent experiments. In our observations, there was no difference between these two approaches in inducing the SGC phenotype.

      (5) Does the number of SGCs change with the age of the female? The experiments were all performed in 14-day-old adult females. What happens when they look at a young female (like 2-day-old). I assume that the nos>flp is working in larval and pupal stages, and so the phenotype should be present in young females. Why did the authors choose this later age? For example, is the phenotype more robust in older females? Or do you see more SGCs at later time points?

      These are very good questions. The SGC phenotype was consistent over the 14-day analysis period (Figure 1J) and was specifically dependent on the presence of germline tumor clones. In 14-day-old fly ovaries, these clones were both larger and more frequent than in younger flies. This age-dependent enhancement in clone size and frequency significantly improved our quantification efficiency (see Lines 101-112).

      (6) Can the authors distinguish one copy of GFP versus 2 copies of GFP in germ cells of the ovary? This is not possible in the Drosophila testis. I ask because this could impact the clonal analyses diagrammed in Figure 4A and 4G and in 6A and B. Additionally, in most of the figures, the GFP is saturated, so it is not possible to discern one vs two copies of GFP.

      Thank you for this valuable comment. It was also difficult for us to distinguish between one and two copies of GFP in the Drosophila ovary. In Figure 4A-F, to resolve this problem, we used a triple-color system, in which red germ cells (RFP<sup>+/+</sup> GFP<sup>-/-</sup>) are bam mutant, yellow germ cells (RFP<sup>+/-</sup> GFP<sup>+/-</sup>) are wild-type, and green germ cells (RFP<sup>-/-</sup> GFP<sup>+/+</sup>) are punt or med mutant. In Figure 4G-J, we quantified the SGC phenotype only in black germ cells (GFP<sup>-/-</sup>), which are wild-type (control) or mad mutant. In Figure 6, we quantified the SGC phenotype only in green germ cells (both GFP<sup>+/+</sup> and GFP<sup>+/-</sup>), all of which are wild-type.

      (7) More evidence is needed to support the claim of elevated Dpp levels in bam or bgcn mutant tumors. The current results with the dpp-lacZ enhancer trap in Figure 5A, B are not convincing. First, why is the dpp-lacZ so much brighter in the mosaic analysis (A) than in the no-clone analysis (B)? It is expected that the level of dpp-lacZ in cap cells should be invariant between ovaries, and yet LacZ is very faint in Figure 5B. I think that if the settings in A matched those in B, the apparent expression of dpp-lacZ in the tumor would be much lower and likely not statistically significant. Second, they should use RNA in situ hybridization with a sensitive technique like hybridization chain reactions (HCR) - an approach that has worked well in numerous Drosophila tissues, including the ovary.

      Thank you for this critical comment. The immunofluorescent staining and confocal settings in the original Figure 5A were the same as those in 5B. In our observations, the levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results from the revised Figure 5. Instead, HCR-FISH data have been added (Figure 5A-D) to support that bam mutant germline tumors secrete BMP ligands.

      (8) In Figure 6, the authors report results obtained with the bamBG allele. Do they obtain similar data with another bam allele (i.e., bamdelta86)?

      No. Given that bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in inducing the SGC phenotype (Figure 1J), we believe that repeating these experiments with bam<sup>Δ86</sup> would be redundant and would not alter the key conclusion of our study. Thank you for your understanding!

      Reviewer #2 (Public review):

      While the study by Zhang et al. provides valuable insights into how germline tumors can non-autonomously suppress the differentiation of neighboring wild-type germline stem cells (GSCs), several conceptual and technical issues limit the strength of the conclusions.

      Major points:

      (1) Naming of SGCs is confusing. In line 68, the authors state that "many wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology." However, bam or bgcn mutant GSCs are also referred to as "SGCs," which creates confusion when reading the text and interpreting the figures. The authors should clarify the terminology used to distinguish between wild-type SGCs and tumor (bam/bgcn mutant) SGCs, and apply consistent naming throughout the manuscript and figure legends.

      We apologize for any confusion. In our manuscript, the term "SGC" is reserved specifically for wild-type germ cells that maintain a GSC-like morphology outside the niche. bam or bgcn mutant germ cells are referred to as GSC-like tumor cells (Lines 89-90), not SGCs.

      (a) The same confusion appears in Figure 2. It is unclear whether the analyzed SGCs are wild-type or bam mutant cells. If the SGCs analyzed are Bam mutants, then the lack of Bam expression and failure to differentiate would be expected and not informative. However, if the SGCs are wild-type GSCs located outside the niche, then the observation would suggest that Bam expression is silenced in these wild-type cells, which is a significant finding. The authors should clarify the genotype of the SGCs analyzed in Figure 2C, as this information is not currently provided.

      The SGCs analyzed in Figure 2A-C are wild-type, GSC-like cells located outside the niche. They were generated using the same genetic strategy depicted in Figures 1C and 1E (with the schematic in Figure 1B). The complete genotypes for all experiments are available in Source data 1.

      (b) In Figures 4B and 4E, the analysis of SGC composition is confusing. In the control germaria (bam mutant mosaic), the authors label GFP⁺ SGCs as "wild-type," which makes interpretation unclear. Note, this is completely different from their earlier definition shown in line 68.

      The strategy to generate SGCs in Figure 4B-F (with the schematic in Figure 4A) is different from that in Figure 1C-F, H, and I (with the schematic in Figure 1B). In Figure 4B-F, we needed to distinguish punt<sup>-/-</sup> (or med<sup>-/-</sup>) germ cells from punt<sup>+/-</sup> (or med<sup>+/-</sup>) germ cells. As noted in our response to Reviewer #1’s Weakness (6), it was difficult for us to distinguish between one and two copies of GFP in the Drosophila ovary. Therefore, we chose to use the triple-color system to distinguish these germ cells in Figure 4B-F (see genotypes in Source data 1).

      (c) Additionally, bam<sup>+/-</sup> GSCs (the first bar in Figure 4E) should appear GFP<sup>+</sup> and Red<sup>+</sup> (i.e., yellow). It would be helpful if the authors could indicate these bam<sup>+/-</sup> germ cells directly in the image and clarify the corresponding color representation in the main text. In Figure 2A, although a color code is shown, the legend does not explain it clearly, nor does it specify the identity of bam<sup>+/-</sup> cells alone. Figure 4F has the same issue, and in this graph, the colors do not match Figure 4A.

      The color-to-genotype relationships for the schematics in Figures 2A and 4E are provided in Figures 1B and 4A, respectively. Due to the high density of germ cells, it is impractical to label each genotype directly in the images. In contrast to Figure 4E, the colors in Figure 4F do not represent genotypes; instead, blue denotes the percentage of SGCs, and red denotes the percentage of germline cysts, as indicated below the bar chart.

      (2) The frequencies of bam or bgcn mutant mosaic germaria carrying [wild-type] SGCs or wild-type germ cell cysts with branched fusomes, as well as the average number of wild-type SGCs per germarium and the number of days after heat shock for the representative images, are not provided when Figure 1 is first introduced. Since this is the first time the authors describe these phenotypes, including these details is essential. Without this information, it is difficult for readers to follow and evaluate the presented observations.

      Thank you for this constructive suggestion. These quantification data have been added to the revised Figure 1 (Figure 1J, K).

      (3) Without the information mentioned in point 2, it causes problems when reading through the section regarding [wild-type] SGCs induced by impairment of differentiation or dedifferentiation. In lines 90-97, the authors use the presence of midbodies between cystocytes as a criterion to determine whether the wild-type GSCs surrounded by tumor GSCs arise through dedifferentiation. However, the cited study (Mathieu et al., 2022) reports that midbodies can be detected between two germ cells within a cyst carrying a branched fusome upon USP8 loss.

      Unlike wild-type cystocytes, which undergo incomplete cytokinesis and lack midbodies, those with USP8 loss undergo complete cell division, with the presence of midbodies (white arrow, Figure 1F’ from Mathieu et al., 2022) as a marker of the late cytokinesis stage (Mathieu et al., 2022).

      (a) Are wild-type germ cell cysts with branched fusomes present in the bam mutant mosaic germaria? What is the proportion of germaria containing wild-type SGCs versus those containing wild-type germ cell cysts with branched fusomes?

      (b) If all bam mutant mosaic germaria carry only wild-type GSCs outside the niche and no germaria contain wild-type germ cell cysts with branched fusomes, then examining midbodies as an indicator of dedifferentiation may not be appropriate.

      We appreciate your critical comment. bam mutant mosaic germaria indeed contained wild-type germline cysts, as evidenced by an SGC frequency of ~70%, rather than 100% (see Figures 2H, 4F, 4J, 6F, 6I, and Figure 6-figure supplement 3C). Since the SGC phenotype depends on the presence of bam or bgcn mutant germline tumors, we quantified it as “the percentage of SGCs relative to the total number of SGCs and germline cysts that are surrounded by germline tumors” (see Lines 103-108). Quantifying the SGC phenotype as "the percentage of germaria with SGCs" would be imprecise. This is because the presence and number of SGCs were variable among germaria with bam or bgcn mutant germline clones, and a small number of germaria entirely lacked these clones. The data of "SGCs per germarium with both germline clones and out-of-niche wild-type germ cells" have been added to the revised Figure 1 (Figure 1K).

      (c) If, however, some germaria do contain wild-type germ cell cysts with branched fusomes, the authors should provide representative images and quantify their proportion.

      Such germaria can be seen in Figures 2G, 3B, 3C, 6D, 6E, and 6H. The percentage of germline cysts can be calculated as “100% - SGC%”.

      (d) In line 95, although the authors state that 50 germ cell cysts were analyzed for the presence of midbodies, it would be more informative to specify how many germaria these cysts were derived from and how many biological replicates were examined.

      As noted in our response to points a) and b) above, the germ cells surrounded by germline tumors, rather than germarial numbers, are more precise for analyzing the phenotype. For this experiment, we examined >50 such germline cysts via confocal microscopy. As the analysis was performed on a defined cellular population, this sample size should be sufficient to support our conclusion.

      (4) Note that both bam mutant GSCs and wild-type SGCs can undergo division to generate midbodies (double cells), as shown in Figure 4H. Therefore, the current description of the midbody analysis is confusing. The authors should clarify which cell types were examined and explain how midbodies were interpreted in distinguishing between cell division and differentiation.

      We assayed for the presence or absence of midbodies specifically within the wild-type germline cysts surrounded by bam or bgcn mutant tumors, not within the tumors themselves (Lines 96-97). As detailed in Lines 90-100, the absence of midbodies was used as a key criterion to exclude the possibility of dedifferentiation.

      (5) The data in Figure 5 showing Dpp expression in bam mutant tumorous GSCs are not convincing. The Dpp-lacZ signal appears broadly distributed throughout the germarium, including in escort cells. To support the claim more clearly, the authors should present corresponding images for Figures 5D and 5E, in which dpp expression was knocked down in the germ cells of bam or bgcn mutant mosaic germaria. Showing these images would help clarify the localization and specificity of Dpp-lacZ expression relative to the tumorous GSCs.

      Thank you for your constructive comment. RNA in situ hybridization data have been added to support that bam or bgcn mutant germline tumors secrete BMP ligands (Figure 5A-D).

      (6) While Figure 6 provides genetic evidence that bam mutant tumorous GSCs produce Dpp to inhibit the differentiation of wild-type SGCs, it should be noted that these analyses were performed in a dpp⁺/⁻ background. To strengthen the conclusion, the authors should include appropriate controls showing [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] SGCs and [dpp<sup>+/-</sup>; bam<sup>+/-</sup>] germ cell cysts without heat shock (as referenced in Figures 6F and 6I).

      Schematic cartoons in Figure 6A and 6B demonstrate that these analyses were performed in a dpp<sup>+/-</sup> background. Figure 6-figure supplement 1 indicates that dpp<sup>+/-</sup> or gbb<sup>+/-</sup> does not affect GSC maintenance, germ cell differentiation, or female fly fertility. Figure 6C is the control for 6D and 6E, and 6G is the control for 6H, with quantification in 6F and 6I. We used nos>FLP, not the heat shock method, to induce germline clones in these experiments (see genotypes in Source data 1).

      (7) Previous studies have reported that bam mutant germ cells cause blunted escort cell protrusions (e.g., Kirilly et al., Development, 2011), which are known to contribute to germ cell differentiation (e.g., Chen et al., Frontiers in Cell and Developmental Biology, 2022). The authors should include these findings in the Discussion to provide a broader context and to acknowledge how alterations in escort cell morphology may further influence differentiation defects in their model.

      Thank you for teaching us! We have now cited and discussed these two papers in the revised manuscript (Lines 197-199).

      (8) Since fusome morphology is an important readout of SGC identity versus differentiation, all clonal analyses should include fusome staining.

      SGCs are readily distinguishable from multicellular germline cysts based on morphology. In some clonal-analysis experiments, fusome staining was not feasible due to technical limitations such as channel saturation or antibody incompatibility. Thank you for your understanding!

      (9) Figure arrangement. It is somewhat difficult to identify the figure panels cited in the text due to the current panel arrangement.

      The figure panels were arranged to optimize space while ensuring that related panels are grouped in close proximity for logical comparison. We would be happy to consider any specific suggestions for an alternative layout that could improve clarity.

      (10) The number of biological replicates and germaria analyzed should be clearly stated somewhere in the manuscript, ideally in the Methods section or figure legends. Providing this information is essential for assessing data reliability and reproducibility.

      The detailed quantification information is labeled directly in figures or described in figure legends, and all raw quantification data are provided in Source data 2.

      Reviewer #3 (Public review):

      Summary:

      Zhang et al. investigated how germline tumors influence the development of neighboring wild-type (WT) germline stem cells (GSC) in the Drosophila ovary. They report that germline tumors inhibit the differentiation of neighboring WT GSCs by arresting them in an undifferentiated state, resulting from reduced expression of the differentiation-promoting factor Bam. They find that these tumor cells produce low levels of the niche-associated signaling molecules Dpp and Gbb, which suppress bam expression and consequently inhibit the differentiation of neighboring WT GSCs non-cell-autonomously. Based on these findings, the authors propose that germline tumors mimic the niche to suppress the differentiation of the neighboring stem cells.

      Strengths:

      This study addresses an important biological question concerning the interaction between germline tumor cells and WT germline stem cells in the Drosophila ovary. If the findings are substantiated, they could provide valuable insights applicable to other stem cell systems.

      We greatly appreciate your valuable comments.

      Weaknesses:

      Previous work from Xie's lab demonstrated that bam and bgcn mutant GSCs can outcompete WT GSCs for niche occupancy. Furthermore, a large body of literature has established that the interactions between escort cells (ECs) and GSC daughters are essential for proper and timely germline differentiation (the differentiation niche). Disruption of these interactions arrests germline cell differentiation in a state with weak BMP signaling activation and low bam expression, a phenotype virtually identical to what is reported here. Thus, it remains unclear whether the observed phenotype reflects "direct inhibition by tumor cells" or "arrested differentiation due to the loss of the differentiation niche." Because most data were collected at a very late stage (more than 10 days after clonal induction), when tumor cells already dominate the germarium, this question cannot be resolved. To distinguish between these two possibilities, the authors could conduct a time-course analysis to examine the onset of the WT GSC-like single-germ-cell (SGC) phenotype and determine whether early-stage tumor clones containing only a few tumor cells can suppress the differentiation of neighboring WT GSCs. If tumor cells indeed produce Dpp and Gbb (as proposed here) to inhibit the differentiation of neighboring germline cells, a small cluster, or perhaps even a single tumor cell generated at an early stage, might prevent the differentiation of its neighboring germ cells.

      Thank you for your critical comment. The revised manuscript now includes a time-course analysis of the SGC phenotype (Figure 1J). Our data in Figure 6 demonstrate that BMP ligands from germline tumors are required to inhibit SGC differentiation. Furthermore, we have incorporated into the manuscript the possibility that disruption of the differentiation niche may also contribute to the SGC phenotype (Lines 197-199).

      The key evidence supporting the claim that tumor cells produce Dpp and Gbb comes from Figures 5 and 6, which suggest that tumor-derived dpp and gbb are required for this inhibition. However, interpretation of these data requires caution. In Figure 5, the authors use dpp-lacZ to support the claim that dpp is upregulated in tumor cells (Figure 5A and 5B). However, the background expression in somatic cells (ECs and pre-follicular cells) differs noticeably between these panels. In Figure 5A, dpp-lacZ expression in somatic cells is clearly higher than in 5B, and the expression level in tumor cells appears comparable to that in somatic cells (dpp-lacZ single channel). Similarly, in Figure 5B, dpp-lacZ expression in germline cells is also comparable to that in somatic cells. Providing clear evidence of upregulated dpp and gbb expression in tumor cells (for example, through single-molecule RNA in situ hybridization) would be essential.

      We greatly appreciate your critical comment. In our data, the expression levels of dpp-lacZ in terminal filament and cap cells were highly variable across germaria, even within the same ovary. We have omitted these results in the revised Figure 5. RNA in situ hybridization data have been added to visualize the expression of BMP ligands within bam mutant germline tumor cells (Figure 5A-D).

      Most tumor data present in this study were collected from the bam[86] null allele, whereas the data in Figure 6 were derived from a weaker bam[BG] allele. This bam[BG] allele is not molecularly defined and shows some genetic interaction with dpp mutants. As shown in Figure 6E, removal of dpp from homozygous bam[BG] mutant leads to germline differentiation (evidenced by a branched fusome connecting several cystocytes, located at the right side of the white arrowhead). In Figure 6D, fusome is likely present in some GFP-negative bam[BG]/bam[BG] cells. To strengthen their claim that the tumor produces Dpp and Gbb to inhibit WT germline cell differentiation, the authors should repeat these experiments using the bam[86] null allele.

      Although a structure resembling a "branched fusome" is visible in Figure 6E (right of the white arrowhead), it is an artifact resulting from the cytoplasm of GFP-positive follicle cells, which also stain for α-Spectrin, projecting between germ cells of different clones (see the merged image). In both our previous (Zhang et al., 2023) and current studies, bam<sup>BG</sup> was functionally indistinguishable from bam<sup>Δ86</sup> in its ability to block GSC differentiation and induce the SGC phenotype (Figure 1J). Given this, we believe that repeating the extensive experiments in Figure 6 with the bam<sup>Δ86</sup> allele would be scientifically redundant and would not change the key conclusion of our study.

      It is well established that the stem niche provides multiple functional supports for maintaining resident stem cells, including physical anchorage and signaling regulation. In Drosophila, several signaling molecules produced by the niche have been identified, each with a distinct function - some promoting stemness, while others regulate differentiation. Expression of Dpp and Gbb alone does not substantiate the claim that these tumor cells have acquired the niche-like property. To support their assertion that these tumors mimic the niche, the authors should provide additional evidence showing that these tumor cells also express other niche-associated markers. Alternatively, they could revise the manuscript title to more accurately reflect their findings.

      Dpp and Gbb are the key niche signals from cap cells for maintaining GSC stemness. Our work demonstrates that germline tumors can specifically mimic this signaling function, not the full suite of cap cell properties, to create a non-cell-autonomous differentiation block. The current title “Tumors mimic the niche to inhibit neighboring stem cell differentiation” reflects this precise concept: a partial, functional mimicry of the niche's most relevant activity in this context. We feel it is an appropriate and compelling summary of our main conclusion.

      In the Method section, the authors need to provide details on how dpp-lacZ expression levels were quantified and normalized.

      Because of the highly variable expression levels in terminal filament and cap cells, we have omitted the dpp-lacZ results in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Not all readers may be familiar with the nos>FLP/FRT or hs-FLP/FRT systems. It would be helpful if the authors could briefly introduce these genetic mosaic systems and explain how they were used in this study before presenting the results.

      Thank you for this constructive suggestion. Such brief introduction has been added to the revised manuscript (Lines 64-70).

      (2) Lines 68-70: "Surprisingly, ...outside the niche retained a GSC-like single-germ-cell (SGC) morphology, even when encapsulated within egg chambers (Figure 1C, D, Figure 1-figure supplement 1)."

      (3) The figure citation is not appropriate, as Figures 1C and 1D do not show "single germ cells (SGCs) encapsulated within egg chambers." To improve clarity, the authors could revise the sentence as follows: "Surprisingly, wild-type germ cells located outside the niche retained a GSC-like single-germ-cell (SGC) morphology (Figures 1C and D), even when encapsulated within egg chambers (Figure 1-figure supplement 1)." This modification would make the description consistent with the figure content and easier for readers to follow.

      Thank you for teaching us! The manuscript has been revised following this suggestion (Lines 70-73).

      (4) Lines 106-110: The description is confusing. The authors state, "Under normal conditions... Notably, 74% of SGCs (n = 132) were GFP-negative, while the remaining 26% were GFP-positive (Figure 2B, C)." However, Figure 2B shows the bam mutant mosaic germaria, and Figure 2C does not specify the genotypes of the germaria used for the analysis of GSCs, CBs, and SGCs. The authors should clarify the experimental conditions and genotypes corresponding to each panel. In addition, it would be more informative to indicate how many germaria these quantified GSCs, CBs, and SGCs were derived from.

      (5) Throughout the manuscript, the authors report the number of SGCs analyzed (e.g., Lines 149-151). However, it would be more informative to also indicate how many germaria these quantified SGCs were derived from. Providing this information would help readers assess the sampling size and variability across biological replicates.

      Thank you for your suggestion. As shown in Figure 2B, these wild-type (RFP-positive) GSCs and CBs were also derived from bam mutant mosaic germaria. The phrase "under normal conditions" has been deleted from the revised manuscript to prevent any potential ambiguity. Given the specificity of the SGC phenotype, counting the germ cells surrounded by germline tumors, rather than the number of germaria, provides a more precise quantification (Lines 103-108). The data of "SGCs per germarium with both germline clones and out-of-niche wild-type germ cells" have been added to the revised Figure 1K.

      Reviewer #3 (Recommendations for the authors):

      (1) Additionally, the authors should clarify what the "red dot" signal in the GFP-positive cap cell in Figure 3F (left panel) represents.

      The “red dot” is an asterisk that is used to mark a cap cell (Line 620).

      (2) Finally, on line 266, "bamP-GFP-positive" should be corrected to "bamP-GFP-negative."

      It should be “bamP-GFP-positive”, not “bamP-GFP-negative” (see Figure 2B).

      Reference:

      Mathieu, J., Michel-Hissier, P., Boucherit, V., and Huynh, J.R. (2022). The deubiquitinase USP8 targets ESCRT-III to promote incomplete cell division. Science 376, 818-823.

      Zhang, Q., Zhang, Y., Zhang, Q., Li, L., and Zhao, S. (2023). Division promotes adult stem cells to perform active niche competition. Genetics 224.

      Zhao, S., Fortier, T.M., and Baehrecke, E.H. (2018). Autophagy Promotes Tumor-like Stem Cell Niche Occupancy. Curr Biol 28, 3056-3064.e3.

      Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Lymphatic vessels drain tissue fluid, absorb lipids, and traffic immune cells. Recent studies on adaptive immunity have identified lymphatics as a potential key target to treat inflammation-associated diseases. In this context, studies on lymphatic sprouting, i.e. the process by which lymphatics expand, are timely. Although zebrafish lymphatics are somewhat different from mammalian lymphatics, the zebrafish has still been a useful model for the identification of the key players regulating lymphatic vessel growth, thus presenting potential targets for pre-clinical studies.

      Woutersen et al. have studied the shp2a and shp2b double mutant zebrafish and identified a requirement for shp2 in lymphatic vessel formation 3-5 days post fertilization. The authors state that shp2 is required for migration and differentiation of the future lymphatic vessels but not for the formation of the venous intersegmental vessels (in contrast to other relevant genes, such as vegfr3). The phenotype is rescued by the expression of wild-type but not mutant shp2.

      Major comments:

      The authors use shp2 deleted strains, live imaging and mRNA rescue experiments. The results, as such, are convincing and the reporting is accurate, allowing reproduction of the experiments. Still, some of the conclusions are not fully backed up by the presented results and would need further experimentation as outlined below:

      1. The other "lymphatic vessel mutants", such as vegfr3, vegfc, and grb2, also cause blood vessel phenotypes, i.e. have an effect on venous intersegmental vessels. The authors state that the shp2 mutants are the first ones to have a lymphatic vessel-specific phenotype. The authors should discuss whether this is due to maternal contribution, i.e. a long half-life of maternal shp2 mRNA or protein. To back up the statement, the authors should investigate later angiogenesis events (developmental or induced) to show that shp2 is not required.

      * We cannot exclude the possibility that maternally contributed Shp2 is responsible for normal venous intersegmental vessel formation. However, this is unlikely, because at the same time, we did observe defects in lymphangiogenesis. It is unlikely that the half-life of Shp2 is regulated differentially in endothelial cells that contribute to future vISVs compared to future ISLVs.

      To show that shp2 has a lymphatic endothelium autonomous role, the authors show that the vegfc mRNA expression is not altered. Authors should quantify the in situ signals (vegfc and vegfr3) and use non-specific probes to show the level of non-specific staining. It is still possible that shp2 would have a lymphatic endothelium-independent role, for example, in Vegf-c processing. Authors should discuss this or delete shp2 in an endothelium-specific manner. Authors should also stain, use in situ hybridization or qPCR (of extracted flt4 reporter-expressing cells) to show that shp2 is expressed in lymphatic endothelial cells.

      * Expression of vegfc was assessed to establish whether loss of Shp2 affected its expression, not to show that Shp2 has a lymphatic endothelium autonomous role. In situ hybridization is semi-quantitative at best. The vegfc in situ hybridizations are similar between wild type and knock-out and do not indicate that vegfc expression is altered in a way that would warrant further investigation by qPCR. On the other hand, the flt4 in situ hybridizations show a clear reduction in signal in Shp2 double knockout embryos, which was confirmed by qPCR experiments (Fig. 3g). We cannot exclude the possibility that Shp2 has a role in Vegfc processing as suggested by the reviewer and we have included a statement to this effect in the Discussion of the revised version (lines 411, 412). In situ hybridization patterns are not very informative for Shp2, because Shp2 is expressed in most, if not all, cells, which results in rather indiscriminate expression patterns (Bonetti et al. 2014, PLoS ONE 9, e94884. doi:10.1371/journal.pone.0094884).

      The authors highlight lymphatic endothelial cells and precursors with the flt4 (vegfr3) reporter. Furthermore, the authors write "a pivotal role for Shp2 signaling in the migration and differentiation of lymphatic endothelial" but do not provide any evidence for differentiation except the presence of flt4 (vegfr3) reporter-expressing cells. To use a second method for detecting lymphatic vessels and to investigate differentiation, the authors should show and quantify Prox1 expression in PCV endothelial cells prior to sprouting and in migrating future lymphatic endothelial cells.

      * We changed “differentiation” in the title and in the abstract to “formation”, because we do not provide formal proof that Shp2 is involved in differentiation of lymphatic endothelial cells. We routinely use Tg(flt4:mCitrine; flt1:tdTomato) reporters to highlight lymphatic endothelial cells. We have also used Tg(fli1a:GFP; kdrl:mCherry) to highlight lymphatic endothelial cells. Because the signals were more robust, we mainly used the former transgenic line. We have included representative images of the Tg(fli1a:GFP; kdrl:mCherry) line in Supplementary Figure 1 as a second method for detecting lymphatic vessels. We included a statement to this effect in the text (line 182-188).

      SHP2 has not been linked to VEGFR3 earlier, but has been shown to control VEGFR2. However, it is not obvious whether SHP2 is a positive or a negative regulator of VEGFR2. Here, authors should try to stain pErk in sprouting control and shp2 deleted cells, similar to their previous study (Mauri et al. 2021), to show the effect of shp2 loss on the growth factor receptor downstream signaling.

      * We have considered staining pErk using whole mount immunohistochemistry. However, subsequent imaging of the target cells is extremely difficult, because we would be interested in a subset of endothelial cells, the ones that are sprouting. Timing is also an issue, because we would be interested to image these cells around the time they are sprouting. Only a small number of endothelial cells sprouts and these cells will be hard to discern from surrounding endothelial cells. Some of the surrounding endothelial and non-endothelial cells may express high levels of pErk as well. Hence, interpretation of the pErk immunohistochemistry data is extremely difficult. It would be interesting to use a reporter line for MAPK activation, which might allow for imaging specifically of the target cells in double or triple transgenic backgrounds, but this is beyond the scope of this paper.

      Reporting the sample numbers: In most of the experiments/figures, the authors do not provide sufficient information. The number of independent experiments and biological replicates should be shown for each, even representative, experiment. Data should always be derived from more than one independent experiment.

      * We have included the number of experiments for the different experiments and we have increased the number of embryos for the different conditions to include the data of at least 8 samples for each experiment.

      Minor comments:

      P.13 rows 269-271: "In addition, we observed normal perfusion and blood flow in the established vISV connections of the ptpn11a-/-ptpn11b-/- embryos and their siblings, suggesting that Shp2 is dispensable for the formation of vISVs.". The authors should show all the data mentioned in the manuscript. If this is shown in a provided movie, please, indicate which one.

      * In the revised version, we refer to Figure 7d, where perfusion of vISVs is evident (line 278).

      Figure legend 6: change "arrow" to "arrowhead".

      * This has been corrected.

      **Referee cross-commenting** No further comments

      Reviewer #1 (Significance (Required)):

      The current manuscript is focused on the characterization of the shp2 mutant embryo phenotype and the rescue experiments. Upon completion of the above-mentioned experiments, the manuscript presents shp2 as a novel regulator of lymphatic vessel formation/lymphatic endothelial cell survival. As such, this notion is quite isolated, since there is no biochemical evidence of, for example, VEGFR3-SHP2 interaction. Broader impact (and audience) would be reached if the authors could show the molecular mechanisms governed by Shp2. Now, in the absence of this data, the impact is moderate. Still, lymphangiogenesis researchers would find the results interesting, thus potentially opening new avenues.

      Reviewer's field of expertise: Lymphatic endothelium. No expertise in zebrafish.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Woutersen et al. describe the effect of single and double knockouts of the zebrafish SHP2 orthologs ptpn11a and ptpn11b. Although some effects of single deletion of ptpn11a are observed, compound deletion results in profound ablation of VEGFR3 (flt4 in zebrafish)-dependent but, interestingly, not Tie1-dependent lymphangiogenesis. Rescue experiments with genes encoding WT and mutant forms of SHP2 indicate that intact SH2 domains, PTP activity, and C-terminal tyrosines are required. They also observe differential rescue by the zebrafish analogs of Noonan syndrome (NS) and Noonan syndrome with multiple lentigines (NS-ML) mutants.

      Overall, this is a comprehensive analysis of the effects of WT and mutant SHP2 in lymphatic development in zebrafish. I support its publication with minimal revisions addressing the points below.

      1) For the general reader, it would be helpful to include (in the Supplementary Materials or in Fig. 1) a diagram showing the steps in lymphatic development described in the Introduction that shows the position of the various structures that are subsequently referred to only by abbreviations.

      * In the introduction, we refer to Hogan and Schulte-Merker 2017 Dev Cell 42, 567-583, a review that shows schematics and all the abbreviations we use in our manuscript.

      2) For several figures, there is no statement of what the arrowheads and asterisks point to, either in the text or in the figure legends (e.g. Fig. 2, Fig. 5, Fig. 7). Also, Fig. 6 has "arrowheads", not "arrows". Please check all figure legends carefully to ensure that they fully describe the results shown.

      * We have included statements of what the arrowheads and asterisks in all figures indicate in the revised version.

      3) In the legend to Fig. 1, the authors state that ptpn11a-/- embryos have a "slim" phenotype. How was this assessed-and can it be quantified?

      * We have not systematically quantified this trait of ptpn11a-/- fish and we have not studied the functional consequences, if any. This is a qualitative characteristic that is obvious when analyzing the embryos. We do not want to put much emphasis on the slim phenotype and we have removed the statement from the legend of Fig. 1 in the revised version (line 738).

      4) In the experiments shown in Fig. 6 (and Supplemental movie 1), the authors show that initial sprouting occurs in double mutant embryos, but the sprouts are unable to connect to an aiSV. There are clearly sprouts in the double mutant embryos shown, but there appear to be fewer of them. Do normal numbers of initial sprouts form?

      * Close analysis of the imaging data indicates that normal numbers of initial sprouts form in the double mutant, one sprout for each intersegmental vessel.

      5) If possible, the authors should show immunoblots for all the rescue experiments to convince the reader that each construct was expressed appropriately.

      * Whereas this is an interesting suggestion, this is technically not feasible, because the amount of material from individual embryos is not sufficient for detection of microinjected Shp2 protein by immunoblotting. In fact, only part of the embryo would be available, because a part is needed for genotyping, as we use incrosses of heterozygous fish to generate embryos for the injections. Instead, we expressed constructs encoding GFP and the autoproteolytic peptide 2A linker to the N-terminal side of Shp2a and variants. In line 121, we provide a reference to the paper where we first used this construct, which includes a schematic representation of the construct (Bonetti et al., 2014, Development 141, 1961-1970, DOI: 10.1242/dev.106310). We assessed GFP fluorescence at 1 dpf and discarded embryos that did not express GFP, thus selecting for embryos that did express Shp2 (variants).

      6) The finding of incomplete, or in the case of ptpn11D61G, lack of rescue of lymphangiogenesis by RASopathy-associated mutants is particularly interesting. Have the authors looked at why this is so, i.e., does sprouting occur in D61G-reconstituted embryos? Is migration then blocked or accelerated? Is fusion to aISVs defective? Although not necessary for the current publication, such information would certainly strengthen the paper. Also, I am not sure that I agree with the authors' statement that the two NS-ML mutants rescue equally to WT; A462T, in particular, is at least nominally less effective, and if the n were higher, it might well show statistically lower rescue. The authors should consider tempering this statement.

      * We are planning to investigate in-depth the effects of Shp2-D61G and other NS-associated genes on lymphangiogenesis, but this is beyond the scope of this paper. Here we demonstrate that Shp2 variants rescue or not, upon expression of synthetic mRNA encoding Shp2 variants by microinjection at the one-cell stage. We have tempered our statement about the NS-ML mutants in the text (line 369-372): “Both NSML variants rescued the lymphangiogenesis defects in ptpn11a-/-ptpn11b-/- embryos to the extent that there was no significant difference with their wild type and heterozygous siblings anymore (Figure 10b).”

      7) In the Discussion, the authors reference recent papers on lymphatic defects in NS patients. Although there is no harm in citing these papers, lymphatic abnormalities have been noted in NS patients since the initial descriptions of the syndrome. Either those papers or a review should be cited as well.

      * We have included a reference (line 486) to the review by Roberts et al. 2013 Lancet 381, 333-342, https://doi.org/10.1016/S0140-6736(12)61023-X in addition to the recent papers we cited that report lymphatic anomalies in human NS patients, based on lymphangiograms.

      8) The authors might want to note that peripheral edema has been universally associated with SHP2 inhibitor treatment in patients.

      * It is an interesting notion that peripheral edema is the second most frequently occurring side effect in response to SHP2 inhibitor treatment in human subjects (Johnson ML et al. 2024 Mol Cancer Ther 2025;24:384–91 doi: 10.1158/1535-7163.MCT-24-0466). We have included a statement to this effect in the Discussion of the manuscript (line 423-430).

      9) Also, why do the authors think that Tie1 signaling does not require SHP2? It would be interesting to note for the reader that SHP2 has been reported to bind to activated Tie1 and discuss anything known about SHP2 requirements for Tie1 action in mammalian systems.

      * SHP2 interacts with many RTKs that are involved in many developmental processes. Zebrafish embryos lacking functional Tie1 display reduced endothelial and endocardial cell numbers and reduced heart size (Carlantoni et al. 2021 Dev Biol. 469:54-67. doi: 10.1016/j.ydbio.2020.09.008). Whereas we have not investigated this in detail, we have not observed obvious defects in cardiac development. Yet, Tie1 signaling has been implicated in lymphangiogenesis and we cannot exclude involvement of defective Tie1 signaling due to lack of functional Shp2 in the Shp2 double knockouts.

      **Referee cross-commenting** No further comments

      Reviewer #2 (Significance (Required)):

      This is a comprehensive study of the role of SHP2 in lymphatic development, using zebrafish as a model. Although descriptive, this paper is important because mutations in SHP2 are associated with lymphatic abnormalities and SHP2 inhibitors cause lymphedema. Also, the unique features of the zebrafish system allow the authors to define the steps and signaling pathways defective in these models.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      SHP2 is an adaptor protein that plays an important role in the RAS/MAPK pathway. Abnormal activity in this pathway has been implicated in various cancers as well as in developmental disorders such as Noonan syndrome. Here, the authors show the important role of Shp2 in physiological lymphatic development in zebrafish using various Shp2 mutants. This promising manuscript, however, needs some adjustments and further clarifications.

      Results section:

      • Transmitted light images of ptpn11a-/- ptpn11b-/- embryos are not consistent throughout the figures. The larva in Figure 1 is particularly severely affected compared to images of the same line at 5 dpf in the rest of the article (e.g. Supp Fig. 1c, Supp Fig. 4c and 4l). The authors should show consistent representative images. Was there a range of phenotype severity in this model? Additional phenotype details and quantifications should be included for this double knockout model.

      * We consistently observed a range of phenotypes in the double mutant embryos since the first description of the phenotype (Bonetti et al. 2014, PLoS ONE 9, e94884. doi:10.1371/journal.pone.0094884). The variation depends on the families that are being used to generate the embryos. This is why we include non-injected controls for all injection experiments. Whereas not all double homozygous embryos show edemas, edemas are representative of the phenotype.

      • Lines 165-167: "Loss of functional Shp2a in ptpn11a-/- ptpn11b+/+ embryos induced a pleiotropic phenotype from 4 days post fertilization (dpf) onwards (Figure 1a-d) and was previously shown to be embryonic lethal". Line 178: "Wild-type siblings and single mutants showed normal lymphatic vasculature...". There is a discrepancy between these two sections, because one of the single mutants is embryonic lethal. What was the cause of lethality in this model, and was it vascular-related? Could the authors provide more detail about this?

      * In our view, there is no discrepancy between these sections. The ptpn11a-/-ptpn11b+/+ embryos start to show a morphological phenotype at 4 dpf, but lymphangiogenesis is normal in these embryos. The embryos lacking functional Shp2a do not survive long after reaching 5 dpf and we have never obtained adult ptpn11a-/- fish. Hence, Shp2a is required for normal zebrafish embryogenesis, but lymphangiogenesis is only impaired in embryos lacking all Shp2. We have not investigated lethality of ptpn11a-/-ptpn11b+/+ embryos or larvae in detail, but the absence of a functional swim bladder (Fig. 2c) is likely causing lethality. We have no indication that lethality was vascular related.

      • The authors created various mutant zebrafish models crossed with the double transgenic flt4:mCitrine;flt1:tdTomato line. In the double mutant, it is surprising to see a marked decrease in arterial tdTomato expression. Please choose a more representative image or add further explanations.

      * The tdTomato signal in this particular experiment is reduced in the double mutant compared to the other genotypes we show here. We believe that by coincidence the embryo in Figure 2d is heterozygous for tdTomato, whereas the other embryos are homozygous. The conclusion of this experiment is not affected by this apparent difference in expression: double homozygous embryos lack the lymphatic vasculature.

      • The authors have shown clear defects in the zebrafish model in Figure 1. It is confusing that zebrafish were imaged at 4 dpf (line 176) and Figure 2 shows images at 4 dpf, whereas the TD is fully visible and developed at 5 dpf. The authors should correct this or show both sets of images at 4 and 5 dpf (one set can be placed in the supplement). Also, the text refers to the presence of the TD at 5 dpf (lines 184-185) and the corresponding quantification (Figure 2e), whereas the images in Figure 2 are from 4 dpf fish.

      * The thoracic duct is detectable in all segments of zebrafish embryos at 4 dpf (Fig. 2a). Morphological defects do not necessarily correlate with defective development of the thoracic duct. However, severe edemas in the double knockouts distort the vasculature and/or interfere with imaging of the thoracic duct and therefore we assessed the presence of the thoracic duct at 4 dpf. Line 193 – the quantifications were done using embryos at 4 dpf. We have corrected this mistake in the text of the revised version.

      • Lines 167 and 173: the authors describe the model as embryonic lethal without stating how old the larvae were. Could you please add this information?

      * The term “embryonic lethal” is technically not correct, because the embryos do not die in significant numbers before they reach 5 dpf. We have rephrased this to “lethal after the embryonic stage” (line 168 and 174) to be more accurate. We have not established exactly when the larvae died. Most embryos survive until 5 dpf, and we never obtained adult ptpn11a-/- fish. Establishing when the larvae die is considered an animal experiment under European law. We have chosen not to sacrifice larvae just to establish when they died.

      • The authors claim that no significant lymphatic deficiencies were observed in the single Shp2a or Shp2b mutants. Is this result due to compensatory mechanisms from one isoform to the other? Further molecular quantifications such as qPCR or Western blot could be performed in both single mutants to characterize this phenomenon.

      * Indeed, we believe that redundancy between Shp2a and Shp2b is the reason why there are no lymphatic deficiencies in the single mutants. Previously, we have shown that Shp2a and Shp2b are both functional, that both Shp2a and Shp2b rescue developmental defects and that Shp2a and Shp2b are both expressed in zebrafish embryos (Bonetti et al., 2014 PLoS ONE 9: e94884, doi:10.1371/journal.pone.0094884). Moreover, expression of either Shp2a or Shp2b rescued defects in the lymphatic vasculature in double knockout embryos (Fig. 4), which is consistent with Shp2a and Shp2b having compensatory roles.

      • Figure 3 - the authors show differential development of the head vasculature. It would be consistent with the rest of the figures to keep the same labelling and colors rather than black and white images. The authors nicely added Figures 3c and 3f as helpful schematics; it would be useful to highlight all of the structures shown there in the zebrafish images (e.g. BLEC) and to use arrows of a different color for each structure. Adding single mutant images as supplementary figures would be important to confirm that there are no significant defects.

      Measurements and quantification should be performed to validate the authors claim of missing and impaired lymphatic structures. Could the authors provide details about the vascular vessels of the head, is there any consequence in the blood vasculature ?

      Additionally, using a nuclear line or a nuclear staining is essential before making any conclusion about lymphatic cell population abnormality.

      * We provide the representation as shown in Figure 3, because the contrast of the flt4:mCitrine signal is superior in this black and white representation compared to the green signal on black background representation. We have included differently colored arrowheads to indicate the different lymphatic structures and we have included representative images of the single mutants in Supplementary Figure 2.

      Our conclusions regarding the lymphatic vasculature in the head are qualitative. Most lymphatic structures are missing altogether in the double mutant, which does not allow meaningful quantification. We have not observed obvious defects in the blood vasculature in the double mutant.

      We conclude that lymphatic vasculature does not develop normally. A nuclear reporter line would be required to conclude that the number of lymphatic cells is aberrant in the double mutant, which is interesting, but is not what we conclude from these experiments.

      • Figure 4 - The authors performed rescue experiments with injection of mRNA to demonstrate that the lymphatic KO phenotype was due to the lack of functional Shp2. Successful mRNA injection, and hence increased Shp2a/Shp2b expression, should first be confirmed using qPCR to validate the experiment. Representative images corresponding to the quantifications should be added to the figure to support the authors' results.

      * The constructs we used for the rescue experiments contain GFP fused to the autoproteolytic peptide 2A and Shp2 (variant) (Bonetti et al., 2014, Development 141, 1961-1970, DOI: 10.1242/dev.106310). These constructs drive expression of the fusion protein, which is cleaved into GFP and the Shp2 variant. Hence, expression of GFP is indicative of expression of Shp2. We routinely discarded embryos that did not express GFP at 1 dpf, thus selecting embryos that express the Shp2 (variants).

      • Figure 5 - The authors should perform this experiment with a nuclear line or a nuclear staining before drawing any conclusion about the number of PL cells. Additional clarifications about the methods of quantification should be included. The authors should count the number of segments/missing segments instead. Individual values with standard deviation should be shown in the graph instead of only the mean value and standard deviation, and this should be specified in the figure legend.

      * We agree with the reviewer that counting cells with a nuclear reporter would be superior to the way we quantified the number of PL cells in the transgenic flt4:mCitrine reporter line. It is possible that if two PL cells are very close together, they will be counted as one and hence that the numbers we provide are an underestimate of the total number of PL cells. We feel that this potential intrinsic error in counting would be the same for all conditions/ genotypes. The point of Figure 5 is that the double mutants have no PL cells and the other genotypes have similar numbers of PL cells. The potential intrinsic error would not alter the conclusion of this figure. We have included how we counted the number of PL cells in the legend to Fig. 5 and we included the standard deviation in Fig. 5e.

      • Figure 6 - Time-lapse imaging shows aberrant sprouting in the double mutant compared to control larvae. However, it is not clear whether this process is just delayed or completely impaired in the mutant: time-lapse experiments should be performed at later stages. The time points chosen for the wild-type and mutant images appear to differ; it would be best to use the same time points to highlight the difference between the two groups. The authors affirm that vISV formation is unaffected in the double mutant larvae; however, it is hard to confirm that statement with black and white images and supplementary movies. Raw confocal images and movies should be included instead to distinguish lympho-venous and arterial structures.

      * The supplementary movies and Fig. 6, which is derived from these movies, show lack of PL cell formation in the double mutant (Fig. 6B). PL cell formation is clearly visible in wild type embryos (Fig. 6A). The sprouts that (are supposed to) give rise to PL cells are indicated with arrowheads. In both embryos, vISV formation is evident in the ISVs next to the ones where PL cells start to form, i.e. the ISVs next to the ones indicated with arrowheads. Sprouting of the endothelial cells is best observed in the time lapse movies. Whereas the exact timing may be different due to the exact conditions, the developmental timing of the sequence of images is similar between the wild type and the double mutant. The black and white representation gives higher contrast than the original fluorescent movies/ pictures, which is why we prefer this representation.

      • Figure 7 - Figure 7d does not correlate with the previous imaging included in Figure 2; in fact, fluorescent expression appears inverted between the two figures. Please standardize this, as they are not comparable. Quantification of the percentage of veins may not be the best parameter to investigate the normality of the vISVs. Measurements of vISV diameter would be more relevant. Individual values with standard deviation should be shown in the graph instead of only the mean value and standard deviation, and this should be specified in the figure legend.

      * We believe the intensities of the signals in Figure 7d and Figure 2d may be different, because the embryo in Figure 2d may be heterozygous for the flt1:tdTomato transgene, whereas the embryo in Figure 7d is homozygous. Whereas the intensities of tdTomato are different, we clearly observe the absence of the lymphatic vasculature in Figure 2d and normal formation of vISVs in Figure 7d. We have indicated in the legend of the figure that the percentage of vISVs was determined in the number of embryos indicated and that the average percentage is plotted in the graph with the error bars indicating the standard deviation (lines 787-789).

      • Figure 8 - The authors analyzed flt4 and vegfc expression in the mutant embryos to further characterize lymphangiogenesis in the model. The fold change in flt4 expression appears to be decreased in the double mutant compared to control. It would be useful to also quantify it in uninjected and ptpn11a+/- ptpn11b-/- groups as additional control groups. Images of ptpn11a+/+ ptpn11b+/+ embryos should be added. The lack of consistency between images and quantification is confusing.

      Considering that quantifications in other figures were performed on a large number of larvae, whereas only 3 double mutants were included in this figure, it would be important to increase the number of ptpn11a-/- ptpn11b-/- embryos for this experiment. To confirm that vegfc expression is normal, its fold change in expression should be included, as was done for flt4.

      Figure number is missing.

      * QPCR was done with ptpn11a+/+ptpn11b-/- and ptpn11a-/-ptpn11b-/- embryos, corresponding to the genotypes that were used for in situ hybridization. No injections were performed in the framework of this experiment. Because ptpn11a+/+ptpn11b-/- embryos formed lymphatic vasculature like wild type embryos (Figure 2), we focused on embryos derived from an incross of ptpn11a+/-ptpn11b-/- fish, generating ptpn11a-/-ptpn11b-/- double mutant embryos as well as ptpn11a+/+ptpn11b-/- and ptpn11a+/-ptpn11b-/- siblings. In situ hybridization indicated that flt4 expression was reduced, which was confirmed by QPCR. We have not included vegfc in the QPCR experiments, because the in situ hybridization experiments did not suggest a difference in expression between the genotypes. The figure number was added.

      • Figure 9: A different background line was used for this figure (fli1a:eGFP;kdrl:mCherry vs flt4:mCitrine;flt1:tdTomato). Could the authors explain the purpose of this change and add a brief experiment to confirm that the findings and phenotype do not differ between the two lines? The overall purpose of this set of experiments is not very clear; one or two transition sentences and rephrasing parts of this section could help readers understand the objective and results.

      * A different transgenic background was used for this figure. Like Tg(flt4:mCitrine;flt1:tdTomato), the Tg(fli1a:eGFP;kdrl:mCherry) line allows analysis of the lymphatic vasculature (all lymphatic vessels are labeled with eGFP, not mCherry). The results were the same for the two transgenic lines. The flt4:mCitrine signal is more robust than the fli1a:eGFP signal, which is why we showed images of the former in most of the figures. Representative images of the Tg(fli1a:eGFP;kdrl:mCherry) line are shown in Supplementary Figure 1. We have included a statement to explain the objective of this part (lines 311-312): “We used mutants of Shp2a to assess which signaling functions of Shp2 are required for normal lymphangiogenesis.”

      • Figure 10 - Correlating zebrafish data with human disease is very interesting and highlights the importance of this work. The authors characterize the effect of NS and NSML variants on morphological and lymphatic defects in zebrafish embryos and find that these variants significantly rescue the anomalies in double mutant larvae. Since these variants have opposite effects (increasing signaling activity in NS and decreasing it in NSML), the authors should add a few words about how two opposite variants could have the same outcome in the zebrafish model. It may also be helpful to include information about these diseases, including the lymphatic complications, in the introduction.

      * In the discussion, we included a paragraph where we discuss the effects of the NS and NSML variants and why both variants may rescue the phenotype in Shp2 double knockout embryos (lines 458-488).

      • In Supplementary Figure 4, double mutant fish expressing Shp2a A462T seem to develop edema. As in Figure 8, data in some groups of the supplementary figures were collected from only 3 larvae per group (2 in Supplementary Fig. 2l), which is weak considering that this in vivo model allows a very high number of embryos to be generated. The authors should increase the number of larvae to at least N=10 per group to make the results more robust.

      Line 357: "... was observed more frequently in Shp2a-D61G injected double mutant embryos". This statement should be supported by appropriate quantification and statistical analysis.

      * We increased the number of embryos that we evaluated for each condition of the injection experiments to at least 9.

      Lines 361-362: " (cf. Figure 4, 10b)" - is this a typo?

      * We have altered the statement (lines 369-372) to: “Both NSML variants rescued the lymphangiogenesis defects in ptpn11a-/-ptpn11b-/- embryos to the extent that there was no significant difference with their siblings anymore (Figure 10b).”

      Materials and Methods section:

      Overall, this section needs significant clarification, considering the amount of work and data that have been collected. Additionally, each reagent, material, solution and objective needs to be rigorously referenced with its catalog number and supplier name.

      * The catalog numbers of special reagents have been added.

      Each software package should also have its version specified and be correctly cited (e.g. ImageJ software version 2.14.0/1.54f and reference: Schneider, C. A., Rasband, W. S., & Eliceiri, K. W. (2012). NIH Image to ImageJ: 25 years of image analysis. Nature Methods, 9(7), 671-675).

      * We have indicated the version number and included a reference to the ImageJ software in the revised version (lines 136-137).

      • Constructs, mRNA synthesis: Were the sequences validated? If yes, how? Please explain.

      * The constructs were validated by sequencing. The mRNA synthesis was verified by running aliquots of the mRNA on agarose gels. Based on the signal on gel, the concentration was adjusted to ensure that equal amounts of mRNA of each Shp2 variant were injected at the one-cell stage.

      • Microscopy: Please provide precise references for the objectives that were used to capture images.

      * We included references to the objectives that were used in microscopy in the Materials and Methods section.

      • Quantification: Please specify how all quantifications were made. How were the data in Figures 5e and 7e collected?

      * In the legend to Fig. 5, we indicated how the data were quantified (lines 772-774): “Quantification of the number of PL cells in the trunk at 54 hpf. The number of PL cells was counted in the trunk of 54 hpf embryos over the length of 10 somites and the average number of PL cells is depicted. The error bars indicate the standard deviation.” In the legend to Fig. 7 we have included a statement on how the percentage of venous ISVs was determined (lines 787-789): “The percentage of veins in siblings and double homozygous mutants was determined in the indicated number of embryos (n) and is depicted. The error bars indicate the standard error.”

      Statistical analysis: Specify how data are expressed (e.g. mean ± s.e.m.). The authors seem to have confused the choice of statistical tests. Differences between experimental groups should be evaluated with the Mann-Whitney test only when two groups are compared. Differences between three or more experimental groups (the case in this paper) should be evaluated with an analysis of variance (ANOVA), followed by a Tukey-Kramer post hoc test when the results are significant.

      * We used the Mann-Whitney test to compare the groups in pairs, i.e. the ptpn11a+/+ptpn11b-/- control group compared to ptpn11a+/-ptpn11b-/-, or compared to the ptpn11a-/-ptpn11b-/- double knock-out. This is reflected in the brackets we use to indicate significance, or the lack thereof, between samples, e.g. Figure 4.

      Suggestions on additional supplemental figures:

      • Beginning of introduction gives an impression of a review article about vascular development in larvae, authors should shorten it and/or add a supplementary schematic to support this long description.

      * We aimed to be complete in order to help the reader better understand the rest of the paper.

      • Alignment of the different proteins of the study both in human and zebrafish to show homology

      * For an alignment of the Shp2a and Shp2b proteins with human SHP2, we refer to our previously published paper (Bonetti et al., 2014, PLoS One 9, e94884, doi:10.1371/journal.pone.0094884).

      • Schematic of protein domains, binding domains and location of variants

      * This is an interesting suggestion, but for space reasons, we decided not to include such schematics.

      **Referee cross-commenting** No further comments

      Reviewer #3 (Significance (Required)):

      SHP2 is an adaptor protein that plays a critical role in regulating the RAS/MAPK signaling pathway. Dysregulation of this pathway has been implicated in various cancers and developmental disorders, including Noonan Syndrome. In this study, the authors demonstrate the essential function of Shp2 in physiological lymphatic development in zebrafish by examining multiple Shp2 mutant models. This promising manuscript, however, needs some adjustments and further clarifications.

      I believe the appropriate audience for this research is specialized - primarily scientists and researchers working in basic biomedical research, particularly in molecular biology, developmental biology, and signaling pathways. The study's focus on zebrafish models and the mechanistic role of Shp2 in lymphatic development positions it within the scope of fundamental biology rather than translational or clinical application, though it has relevance to both.

      As a member of a vascular malformations laboratory, my research focuses on advancing biomedical research through an integrative approach combining in vivo research, molecular biology, translational medicine, and public health. More specifically, my current work focuses on specific genes causing complex lymphatic anomalies and drug discovery using zebrafish models.

    1. Author response:

      General Statements

      First, we would like to thank the editor at Review Commons for the efficient handling of our manuscript. We also apologize for our delayed response.

      We would like to thank all three reviewers for their careful evaluation of our work and their constructive feedback, which will provide a valuable basis for improving the figures and the text, as described below. We expect to be able to complete the revision following the plan described below quickly.

      We would like to note that the reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on the following point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this does not restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a smaller fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). We will revise the manuscript text accordingly to clarify this point.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies, generated by ribosome/phage display against the Bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis-deficient mutant of Smc so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for perturbation of Smc activity. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions as well as chromosome segregation defects. The authors use a clever approach, making chimeras between Bacillus and S. pneumoniae Smc to narrow down the specific regions within the Bacillus Smc coiled coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may be locking Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      (1) Lack of direct in vitro binding measurements:

      The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And do the sybodies affect the interaction of ScpAB with Smc?

      It is understandable that such measurements for 14 sybodies are challenging and not essential for this study. Nonetheless, biochemical characterization of the sybody interaction with the Smc-ScpAB complex for at least 1-2 of the candidate sybodies described here would be informative.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not easy even for a smaller selection of sybodies. We have data that show direct binding of Smc to sybodies by various methods, including ELISA, pull-downs and biophysical methods (GCI). Initially, we omitted these data from the manuscript, as we are convinced that the mapping data obtained with chimeric Smc proteins are more definitive and relevant. During the revision, we will incorporate the ELISA data showing direct binding and indicating a lack of preference for a specific state of Smc.

      (2) Many modes of sybody binding to Smc are plausible

      The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody binding disrupts the ScpAB interaction (as data ruling this possibility out have not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far even the latest versions of AlphaFold are not able to clearly delineate the binding interface. Indeed, many modes of binding are possible, including disruption of the ScpAB interaction. However, since the main binding site is located on the Smc coiled coils, the latter scenario would likely be an indirect consequence of an altered coiled-coil configuration, consistent with our current interpretation.

      (3) Sybody expression in vivo

      Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with gfp and performed live cell imaging. This showed that they are all roughly equally expressed and that they localize as foci in the cell presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We will include this data in the revised version of the manuscript.

      (4) Sybodies should phenocopy ATP hydrolysis mutant of Smc

      The sybodies were screened against an ATP hydrolysis-deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis-deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether sybody action results in an smc-null effect or specifically an ATP hydrolysis-deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple, Smc conformations. Consistent with this idea, the phenotypes resemble those of the null mutant rather than those of the ATP hydrolysis-defective EQ mutant, which displays even more severe growth phenotypes. We will add the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state (add Vazquez Nunez et al., 2021).”

      “ELISA data confirm that nearly all clones bind Smc-ScpAB; however, their binding shows little or no dependence on the presence of ATP or DNA.”

      Minor comments:

      (1) It was surprising that no sybodies were found that could target both Bacillus and S. pneumoniae Smc, for example sybodies targeting the head regions of Smc, which might work in a more universal manner. Could the authors comment on the coverage of the sybodies across the protein structure?

      It is rather common that sybodies (like antibodies and nanobodies) exhibit strong affinity differences between highly conserved proteins (> 90 % identity). The underlying reasons for such strong discrimination are i) the location of less conserved residues primarily at the target protein surface and ii) the large interaction interface between sybody and target, which offers multiple vulnerabilities to disturbance, in particular through bulky side chains resulting in steric clashes. Another frequently observed phenomenon is sybody binding to a dominant epitope, which also often applies to nanobodies and antibodies. A prominent example of this is the set of dominant epitopes on the SARS-CoV-2 RBD.

      (2) Growth curves (Fig. S3) show a large jump in recovery in growth under sybody induction conditions. Could the authors address this observation here and in the text?

      We suppose that this recovery represents suppressor mutants and/or (more likely) improved growth in the absence of functional Smc during nutrient limitation (see Gruber et al., 2013 and Wang et al., 2013). We will add this statement to the text.

      (3) L41- Sentence correction: Loop can be removed.

      Ah, yes, sorry for this confusing error. Thank you.

      (4) L525 - bsuSmc 'E' :extra E can be removed.

      To do. Thank you.

      (5) References need to be properly formatted.

      To do. Thank you.

      (6) The authors should add in figure legend for Fig 1i) details on representation of the purple region, and explain the grey strokes for orientation of the loop.

      To do.

      (7) How many cells were analysed in the cell biological assays? The legends should include this information.

      To be included.

      Reviewer #1 (Significance):

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well designed. The findings from the study will be significant to the broad fields of genome biology, synthetic biology and SMC biology. Specifically, the coiled-coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function, and in parallel demonstrated the potential of synthetic sybodies/designed binders for inhibiting protein activity.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Review: "Single Domain Antibody Inhibitors Target the Coiled Coil Arms of the Bacillus subtilis SMC complex" by Ophélie Gosselin et al, Review Commons RC-2025-03280 Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions.

      The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the Smc-dependent chromosome organization lethal phenotype, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB.

      In summary, the authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Some specific comments:

      Line 75: "likely stabilizing otherwise rare intermediates of the conformational cycle." - sorry, why is that being concluded? Why not stabilizing longer-lived conformations?

      We will clarify this statement!

      Line 89: Sorry, possibly our lack of understanding: why first ribosome and then phage display?

      Ribosome display allows screening of around 10^12 sybodies per selection round (technically unrestricted library size), whereas for phage display the library size is restricted to around 10^9 sybodies, because producing a phage library requires transformation of the phagemid plasmid into E. coli, which introduces a diversity bottleneck. This is why the sybody platform starts with ribosome display. It switches to phage display from round 2 onwards because the output of the initial round of ribosome display is around 10^6 sybodies, which can easily be transferred into the phage display format. Phage display is used to minimize selection biases. For more information, please consult the original sybody paper (PMID: 29792401).

      Line 100: Why was only lethality selected? Less severe phenotypes not clear enough?

      Yes, colony size is more difficult to score robustly, as the sizes of individual transformant colonies can vary quite widely. Moreover, the number of isolated sybodies was already at the limit of what we could analyze further.

      Line 106: Could it be tested somehow if convex and concave library sybodies fold in Bs?

      We did not focus on the non-functional sybody candidates, and only sybodies of the loop library turned out to cause functional consequences at the cellular level. Notably, we will include gfp imaging showing that non-lethal sybodies are expressed at levels similar to those of toxic sybodies. Given the identical scaffold of concave and loop sybodies (they only differ in their CDR3 length), we expect that the concave sybodies fold in the cytoplasm of B. subtilis. For the convex sybodies, which have a different scaffold, this will be tested.

      Line 125: Could Pxyl be repressed by glucose?

      To our knowledge and experience, repression by glucose (catabolite repression) does not work well in this context in B. subtilis.

      Line 131: The SMC replacement strain is a cool experiment and removes a lot of doubts!

      Thank you! (we agree).

      Line 141: The mapping is good and looks reliable, but looks and feels like a tour de force? Of course, some cryo-EM would have been lovely (lines 228-229 understood, it has been tried!).

      Yes, we have made several attempts at structural biology. Unfortunately, Smc-ScpAB is not well suited for cryo-EM in our hands and crystallography with Smc fragments and sybodies did not yield well-diffracting crystals.

      Line 179: Mmmh. Do we not assume DNA binding on top of the dimerised heads to open the CC (clamp)?

      We will clarify the text here.

      Line 187: Having sybodies that presumably keep the CC together (closing) and some that do not allow them to come together correctly (opening) is really cool and probably important going forward.

      Thank you!

      Figure 1 Ai is not very colour-blind friendly.

      We are sorry for this oversight. We will try to make the color scheme more inclusive. Thank you for the notification.

      Optional: did the authors see any spontaneous mutations emerge that bypass the lethal phenotype of sybody expression?

      No, we did not observe spontaneous mutations suppressing the phenotype, possibly due to the limited number of cell generations observed. We tried to avoid suppressors by limiting growth, but selecting for suppressors may indeed be a good future approach to further fine-map the binding sites and to obtain insights into the mechanism of inhibition.

      Optional: we think it would be nice to try some biochemical experiment with BMOE/cysteine-crosslinked B. subtilis Smc in the mid-region (4N or next to it) of the Smc coiled coils to try to further strengthen the story. Some of the authors are experts in this technique and strains might already exist?

      We have indeed tried to study the impact of sybody binding on Smc conformation by cysteine cross-linking. However, we were not convinced by the results and thus prefer not to draw any conclusions from them. We will add a corresponding note to the text.

      Reviewer #2 (Significance):

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Thank you!

      Reviewer #3 (Evidence, reproducibility and clarity):

      Gosselin et al. use the sybody technology to study the effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA-binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of single-domain antibodies, using the “transition state” mutant Smc. They identify 14 such sybodies that are lethal when expressed in vivo because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc “neck” and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA.

      The study is well done and presented and shows that the strategy is very potent in finding a means to quickly turn off a protein's function in vivo, much quicker than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments but, in my view, mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single, catalytically trapped form of Smc. They speculate why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it seems to be biased towards sybodies that bind to Smc in a quite special way. Using wild type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      As explained above, we are quite confident that the Smc ATPase mutation did not bias the selection in an obvious way. The surprising bias towards coiled-coil binding sites likely has other explanations; for example, the coiled coils may form a preferred epitope recognized by sybodies.

      Lines 105-106: "Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis." This could be tested using Western blotting; I am assuming antibodies against sybodies are commercially available. However, this test is not important for the overall study; it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results much, but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then likely few other sybodies are effectively lethal in B. subtilis, with the exception of the ones isolated and characterized. We have added this notion to the manuscript. We have also tested the expression of non-lethal sybodies by gfp-tagging and imaging. These results will be included in the revision.

      Fig. 2B: It is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the “counting” method, as the images show a clear segregation defect when sybodies are expressed; however, I believe the authors should state that this is not a replication block but a failure to segregate origins.

      We agree that this is an important point and will add a corresponding comment to the text.

      Testing of the binding sites of sybodies on the SMC complex is done indirectly, using chimeric Smc constructs. I am surprised that the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts to test direct binding, with mixed outcomes, and decided not to include those results in light of the stronger and more relevant in vivo mapping. However, we will add ELISA results and briefly discuss grating-coupled interferometry (GCI) data and pull-downs.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, these are rather indirect experiments. This leads to the section “Revealing Smc arm dynamics through synthetic binders” in the Discussion. The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: “More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells.” This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation, or at least this is not clearly shown.

      We agree and will carefully rephrase this statement. Thank you.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are otherwise largely neglected in the SMC literature, likely (at least in part) due to the mild effect of single point mutations on coiled coil dynamics.

      Reviewer #3 (Significance):

      The work describes the generation and use of synthetic nanobodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region of SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.

      Description of analyses that authors prefer not to carry out

      As pointed out above, there are a few minor points that we prefer not to address experimentally. In particular, we do not consider it necessary to determine the expression levels of sybodies that were non-inhibitory. We also wish to note that we attempted to obtain structural and additional biochemical data, and to that end performed cryo-EM, crystallography, and cysteine cross-linking experiments. Unfortunately, we did not obtain sybody complex structures, and the cross-linking data were not conclusive. We also wish to note that the first author has finished her PhD and left the lab, which limits our capacity to add further experiments. However, as the reviewers also pointed out, the main conclusions are already well supported by the data.

    1. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tkacik et al describe their efforts to reconstitute and biochemically characterize ARAF, BRAF, and CRAF proteins and measure their ability to be paradoxically activated by current clinical and preclinical RAF inhibitors. Paradoxical activation of MAPK signaling is a major clinical problem plaguing current RAF inhibitors, and the mechanisms are complex and relatively poorly understood. The authors utilize their preparations of purified ARAF, BRAF, and CRAF kinase domains to measure paradoxical activation by type I and type II inhibitors, utilizing MEK protein as the substrate, and show that CRAF is activated in a similar fashion to BRAF, whereas ARAF appears resistant to activation. These data are analyzed using a simple cooperativity model with the goal of testing whether paradoxical activation involves negative cooperativity between RAF dimer binding sites, as has been previously reported. The authors conclude that it does not. They also test activation of B- and CRAF isoforms prepared in their full-length autoinhibited states and show that under the conditions of their assays, activation by inhibitors is not observed. In a particularly noteworthy part of the paper, the authors show that mutation of the N-terminal acidic (NtA) motif of ARAF and CRAF to match that of BRAF enhances paradoxical activation of CRAF and dramatically restores paradoxical activation of ARAF, which is not activated at all in its WT form, indicating a clear role for the NtA motif in the paradoxical activation mechanism. Additional experiments use mass photometry to measure BRAF dimer induction by inhibitors. The mass photometry measurements are a relatively novel way of achieving this, and the results are qualitatively consistent with previous studies that tracked BRAF dimerization in response to inhibitors using other methods. 

      Overall, the paper establishes that WT CRAF is paradoxically activated by the same inhibitors that activate BRAF, and that ARAF contains the latent potential for activation that appears to be controlled by its NtA motif. The biochemical activation data for BRAF are qualitatively consistent with previous work.

      Strengths:

      While previous studies have put forward detailed molecular mechanisms for paradoxical activation of BRAF, comparatively little is known about the degree to which ARAF and CRAF are prone to this problem, and relatively little biochemical data of any sort are available for ARAF. Seen in this light, the current work should be considered of substantial potential significance for the RAF signaling field and for efforts to understand paradoxical activation and design new inhibitors that avoid it.

      Weaknesses:

      There are, unfortunately, some significant flaws in the data analysis and fitting of the RAF activation data that render the primary conclusion of the paper about the detailed activation mechanism, namely that it does not involve negative cooperativity between active sites, unjustified. This claim is made repeatedly throughout the manuscript, including in the title. Unfortunately, their data analysis approach is overly simplistic and does not probe this question thoroughly. This is the primary weakness of the study and should be addressed. A full biochemical modeling approach that accurately captures what is happening in the experiment needs to be applied in order for detailed inferences to be drawn about the mechanism beyond just the observation of activation.

      The authors' analysis of their RAF:MEK "monomer" paradoxical activation data (Figures 1, 3, and Tables 1, 2) suffers from two fundamental flaws that render the resulting AC50/IC50 and cooperativity (Hill) parameters essentially uninterpretable. Without explaining or justifying their choice, the authors use a two-phase cooperative binding model from GraphPad Prism to fit their activation/inhibition data. This model is intended to describe cooperative ligand binding to multiple coupled sites within a preformed receptor assembly, and does not provide an adequate description of what is happening in this complicated experiment. Specifically, it has two fundamental flaws when applied to the analysis in question:

      (a) It does not account for the ligand-depletion effects that occur with high-affinity drugs and that profoundly affect the shapes of the dose-response curves being fit.

      The chosen model is one of a class of ligand-binding models that are derived by assuming that the free ligand concentration is effectively equal to the total ligand concentration. Under these conditions, binding curves have a characteristic steepness, and the presence of cooperativity can be inferred from changes in this steepness as described by a Hill coefficient. However, many RAF inhibitors, including most of the type II inhibitors in this study, bind to the dimerized forms of at least one of the RAF isoforms with ultra-high affinity in the picomolar range (particularly apparent in Figure 1 with LY inhibiting BRAF). Under these conditions, the model assumption is not valid. Instead, binding occurs in the high-affinity regime in which the drug titrates the receptor and effectively all the added drug molecules bind, so there is hardly any free ligand (see e.g. Jarmoskaite and Herschlag eLife 2020 for a full description of this "titration" regime). The shapes of the curves under these conditions reflect the total amount of RAF protein (and to some extent drug affinity), rather than the presence of cooperativity. Fitting dose response curves with the chosen model under these conditions will result in conflating binding affinity and protein concentration with cooperativity.
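      To illustrate the titration regime described above, the sketch below compares fractional occupancy from the exact 1:1 binding quadratic (which accounts for ligand depletion and is the same algebra that underlies the Morrison tight-binding equation) with the simple hyperbolic form that assumes free ligand equals total ligand. It is a minimal, hypothetical example: it assumes a single non-cooperative site, ignores RAF dimerization, and the concentrations are illustrative values, not data from the manuscript.

```python
import math


def complex_conc(e_total: float, i_total: float, kd: float) -> float:
    """Exact [EI] for 1:1 binding from the mass-balance quadratic.

    Does not assume free inhibitor equals total inhibitor, so it remains
    valid when the enzyme titrates the inhibitor (kd << e_total).
    """
    b = e_total + i_total + kd
    # The smaller root is the physically meaningful one.
    return (b - math.sqrt(b * b - 4.0 * e_total * i_total)) / 2.0


def occupancy_exact(e_total: float, i_total: float, kd: float) -> float:
    """Fraction of enzyme sites occupied, including ligand depletion."""
    return complex_conc(e_total, i_total, kd) / e_total


def occupancy_hyperbolic(i_total: float, kd: float) -> float:
    """Fraction occupied if free inhibitor is assumed equal to total inhibitor."""
    return i_total / (i_total + kd)


if __name__ == "__main__":
    e_total = 10.0  # nM enzyme sites (illustrative)
    kd = 0.01       # nM, i.e. 10 pM affinity (illustrative)
    for i_total in (1.0, 5.0, 10.0, 20.0):
        exact = occupancy_exact(e_total, i_total, kd)
        simple = occupancy_hyperbolic(i_total, kd)
        print(f"I_total={i_total:5.1f} nM  exact={exact:.3f}  hyperbolic={simple:.3f}")
```

      With these numbers the exact occupancy rises roughly in proportion to added inhibitor until it saturates near I_total ≈ E_total, whereas the hyperbolic form already predicts near-complete occupancy at 1 nM. In this regime the curve shape is set mainly by the enzyme concentration rather than by Kd.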

      (b) It does not model the RAF monomer-dimer equilibrium, which is dramatically modulated by drug binding, rendering the results RAF-concentration dependent in a manner not accounted for by the analysis.

      The chosen analysis model also fails to consider the monomer-dimer equilibrium of RAF. This has two ramifications. Since drug binding is coupled to dimerization to a very strong degree, the observed apparent affinities of drug binding (reflected in AC50 and IC50 values) are functions of the concentration of RAF molecules used in the experiment. Since dimerization affinities are likely different for ARAF, BRAF, and CRAF, the measured AC50 values also cannot be compared between isoforms. This concentration dependence is not addressed by the authors. A related issue is that the model assumes drug binding occurs to two coupled sites on preformed dimers, not to a mixture of monomers and dimers. "Cooperativity" parameters determined in this manner will reflect the shifting monomer-dimer equilibrium rather than the cooperativity within dimers. Additionally, the inhibition side of the activation/inhibition curves is driven by binding of the drug to the single remaining site on the dimer, not to two coupled sites, and so one cannot determine cooperativity values for this process in this manner.
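      The concentration dependence described above already follows from the monomer/dimer mass balance alone. As a minimal sketch, assuming drug-free homodimerization 2M ⇌ D with dissociation constant Kd = [M]²/[D] (the drug-coupled equilibrium of, for example, the Kholodenko model is deliberately left out), the fraction of protomers in dimers can be computed from the total protein concentration. The 30 µM value below is illustrative only (it is of the order of the AUC estimate for isolated RAF kinase domains mentioned by Reviewer #3), and the function names are hypothetical.

```python
import math


def monomer_conc(c_total: float, kd_dimer: float) -> float:
    """Free monomer [M] for 2M <-> D with Kd = [M]^2 / [D].

    c_total is the total protomer concentration, c_total = [M] + 2[D].
    Uses a rearranged root that avoids cancellation when c_total << kd_dimer.
    """
    y = 8.0 * c_total / kd_dimer
    return 2.0 * c_total / (1.0 + math.sqrt(1.0 + y))


def fraction_in_dimers(c_total: float, kd_dimer: float) -> float:
    """Fraction of protomers that are in dimers, 2[D] / c_total."""
    return 1.0 - monomer_conc(c_total, kd_dimer) / c_total


if __name__ == "__main__":
    kd = 30_000.0  # nM, i.e. 30 uM (illustrative)
    for c in (10.0, 100.0, 1_000.0, 30_000.0, 300_000.0):
        print(f"C_total={c:>9.0f} nM  fraction in dimers={fraction_in_dimers(c, kd):.4f}")
```

      Because this fraction moves from well under 1% to most of the protein over the printed range, any apparent AC50 or IC50 that depends on dimer formation will also depend on how much RAF is present in the assay.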

      As a result of both of these issues, the parameters reported in the tables do not correctly reflect cooperativity and cannot be used to infer the presence or absence of negative cooperativity between RAF dimer subunits. To address these major issues, the authors would need to apply a data analysis/fitting procedure that correctly models the biochemical interactions occurring in the sample, including both the monomer-dimer equilibrium and how this equilibrium is coupled to drug binding, such as the one developed by Kholodenko (Cell Reports 2015). Alternatively, the authors should remove the statements claiming a lack of negative cooperativity from the manuscript and alter the title to reflect this.

      The bell-shaped dose-response model that we employed is the sum of two dose-response curves, one that activates and one that inhibits. That is a simple way of capturing the essence of paradoxical activation: the superposition of drug-induced activation at low inhibitor concentrations with inhibition at higher concentrations. That said, we agree completely with the reviewer that the model does not capture the complexity of what is happening in the experiment. We worked extensively with the Kholodenko model (which we implemented in KinTek Explorer), which accounts for the effect of drug on the monomer/dimer equilibrium and for the affinity of drug for each protomer of a dimer (and can therefore model positive or negative cooperativity as well as non-cooperative binding). We could obtain excellent fits with this model with positive cooperativity (perhaps not surprising, considering that this is a 12-parameter model), with reasonable Kd values for drug binding and monomer/dimer equilibrium. However, we ultimately chose not to include this analysis when we realized that the fits were not at steady state: the underlying Kon and Koff rates corresponding to the reasonable Kd values for monomer/dimer formation were unreasonably slow. We could also obtain superficially reasonable fits with negative or non-cooperative binding, but close inspection revealed that they did not accurately fit the steepness of the inhibition phase of the dose-response curves for type II inhibitors. Even the Kholodenko model does not capture all the key aspects of our experiment, most notably competition with ATP, the effect of ATP on the monomer/dimer equilibrium, and the divergent conformations of the kinase required for binding ATP versus a type II inhibitor. We put some effort into explicitly including ATP in the model, but quickly decided that it was beyond our modeling expertise (and it was also not feasible to implement in KinTek Explorer).

      In the end, we settled on the bell-shaped dose-response model because it was the simplest model that fit the data. We expect to include a supplemental figure/note in the revised manuscript discussing our work with the Kholodenko model. We will also acknowledge the limitations of the bell-shaped dose-response model.
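      The sum-of-two-curves idea can be written out explicitly. The sketch below is one generic parameterization, assumed for illustration and not necessarily identical to the equation implemented in GraphPad Prism: a rising Hill term for the activation phase minus a second rising Hill term for the inhibition phase, both on a log10 concentration axis. All parameter names (bottom, top, floor, and so on) are hypothetical.

```python
def hill_fraction(log_conc: float, log_mid: float, n_hill: float) -> float:
    """Rising Hill curve (0 to 1) evaluated at a log10 concentration."""
    return 1.0 / (1.0 + 10.0 ** ((log_mid - log_conc) * n_hill))


def bell_response(log_conc: float, bottom: float, top: float, floor: float,
                  log_ac50: float, n_act: float,
                  log_ic50: float, n_inh: float) -> float:
    """Activation phase minus inhibition phase.

    bottom: signal with no drug; top: activated level between the phases;
    floor: signal at saturating drug. n_act and n_inh are the apparent
    Hill slopes of the activation and inhibition phases.
    """
    activation = (top - bottom) * hill_fraction(log_conc, log_ac50, n_act)
    inhibition = (top - floor) * hill_fraction(log_conc, log_ic50, n_inh)
    return bottom + activation - inhibition


if __name__ == "__main__":
    params = dict(bottom=1.0, top=3.0, floor=0.5,
                  log_ac50=-9.0, n_act=1.0, log_ic50=-6.0, n_inh=2.0)
    for log_c in (-12, -10, -9, -7.5, -6, -4):
        print(log_c, round(bell_response(log_c, **params), 3))
```

      In a fit, n_inh plays the role of the apparent Hill slope of the inhibition phase. As Reviewer #1 notes, a steep value from a phenomenological form like this does not by itself establish cooperative binding.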

      This reviewer is also concerned that the steepness of the inhibition phase of the curves may be the result of enzyme titration with these tight-binding inhibitors, rather than a result of positive cooperativity. We are reasonably sure that this is not the case. The shape of these curves and the IC50/AC50 values obtained are relatively insensitive to enzyme concentration, and we will include additional data in our revision to demonstrate this. Also, the steep Hill slopes are unique to the type II inhibitors, which require a distinct inactive conformation of the kinase. Type I inhibitor SB590885 is similar in potency to the type II inhibitors but does not exhibit this effect. If we were simply titrating enzyme, we would expect to see this with SB590885 as well.
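      The enzyme-concentration test described above has a simple quantitative benchmark. For a non-cooperative 1:1 tight-binding inhibitor described by the Morrison equation, the midpoint of the inhibition curve is IC50 = Ki,app + [E]T/2, so in the titration regime the IC50 should scale almost in proportion to enzyme concentration, whereas it should barely move when Ki,app is much larger than [E]T. The sketch below, with hypothetical function names and illustrative numbers, computes that expected fold change. It ignores dimerization and cooperativity, so it serves only as a reference point for a measured dependence.

```python
def morrison_ic50(ki_app: float, e_total: float) -> float:
    """Midpoint of the Morrison tight-binding curve (1:1, non-cooperative)."""
    return ki_app + e_total / 2.0


def expected_ic50_fold_change(ki_app: float, e_low: float, e_high: float) -> float:
    """Predicted IC50 ratio when enzyme is raised from e_low to e_high."""
    return morrison_ic50(ki_app, e_high) / morrison_ic50(ki_app, e_low)


if __name__ == "__main__":
    e_low, e_high = 2.0, 8.0  # nM, a 4-fold enzyme range (illustrative)
    for ki in (0.01, 1.0, 100.0):  # nM
        ratio = expected_ic50_fold_change(ki, e_low, e_high)
        print(f"Ki,app={ki:>6} nM  expected IC50 fold change={ratio:.2f}")
```

      A measured IC50 that stays nearly constant across such an enzyme range is what one would expect outside the titration regime; one that tracks the enzyme range closely would point to titration.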

      Also, we will clarify in the revised manuscript that our interpretation of positive cooperativity of inhibition by type II inhibitors is also supported by our prior work with 14-3-3-bound RAF dimers (Tkacik et al, JBC 2025). This is a much simpler experiment, as dimers are pre-formed. We have now done a thorough study of the effect of enzyme concentration on the IC50 and apparent cooperativity in dimer inhibition, which we will include in our revised manuscript. These experiments confirm that we are not in a regime where we are titrating enzyme.

      As an aside, with respect to models that incorporate free inhibitor concentration, we did try to fit our 14-3-3-bound dimer inhibition data (in Tkacik et al, JBC 2025) with the Morrison equation for tight-binding inhibitors, which does take free ligand concentration into account. The fits were not reasonable with type II inhibitors, at least in part due to the non-ATP-competitive behavior of the type II drugs. Also, the Morrison equation does not model cooperativity.

      Some other points to consider

      (1) The observation that ARAF is not activated by type II inhibitors is interesting. A detailed comparison of the activation magnitudes between inhibitors and between A-, B-, and CRAF is hampered by the arbitrary baseline signal in the assay, which arises from a non-zero FRET ratio in the absence of any RAF activity. The authors might consider background correcting their data using a calibration curve constructed using MEK samples of known degrees of phosphorylation, so that they can calculate turnover numbers and fold activation values rather than an increase over baseline. This will likely reveal that the activation effects are more substantial than they appear against the high background signal.

      We will explore this for our revision.
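      One possible way to implement the calibration suggested above is sketched below. It assumes a series of MEK standards of known fractional phosphorylation measured under assay conditions, a TR-FRET ratio that rises monotonically with that fraction, and initial-rate conditions. Between standards it interpolates linearly. The function names and all numbers are hypothetical.

```python
from bisect import bisect_left


def fraction_phospho(ratio: float, standards: list[tuple[float, float]]) -> float:
    """Map a TR-FRET ratio onto fractional MEK phosphorylation.

    standards: (fraction_phosphorylated, fret_ratio) pairs; the ratio is
    assumed to increase monotonically with the fraction. Values outside the
    calibrated range are clamped to its ends.
    """
    pts = sorted(standards, key=lambda p: p[1])
    ratios = [r for _, r in pts]
    if ratio <= ratios[0]:
        return pts[0][0]
    if ratio >= ratios[-1]:
        return pts[-1][0]
    k = bisect_left(ratios, ratio)  # first standard with ratio >= query
    (f0, r0), (f1, r1) = pts[k - 1], pts[k]
    return f0 + (f1 - f0) * (ratio - r0) / (r1 - r0)


def turnover_per_min(fraction: float, mek_total_nm: float,
                     raf_nm: float, minutes: float) -> float:
    """Average MEK molecules phosphorylated per RAF per minute."""
    return fraction * mek_total_nm / (raf_nm * minutes)


if __name__ == "__main__":
    standards = [(0.0, 0.20), (0.25, 0.50), (0.5, 0.80), (1.0, 1.20)]
    basal = turnover_per_min(fraction_phospho(0.26, standards), 100.0, 1.0, 30.0)
    drug = turnover_per_min(fraction_phospho(0.90, standards), 100.0, 1.0, 30.0)
    print(f"basal={basal:.3f}/min  with drug={drug:.3f}/min  fold={drug / basal:.1f}")
```

      Expressing both conditions as turnover removes the arbitrary FRET baseline, so that activation can be reported as a fold change rather than as an increase over background.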

      (2) The authors note that full-length autoinhibited 14-3-3-bound RAF monomers are not activated by type I and II inhibitors. However, since this process involves the formation of a RAF dimer from two monomers, the process would also be expected to be concentration dependent, and the authors have only investigated this at a single protein concentration. Since disassembly of the autoinhibited state must also occur before dimerization, it might be expected to be kinetically disfavored as well. Have the authors tested this?

      Good points. We have carried out this experiment at more than one enzyme concentration and differing reaction times, and also failed to see activation. However, we have not systematically explored either variable.

      (3) ATP concentration modulates activation. While this is an interesting observation, some of this analysis suffers from the same issue discussed above, of not considering high-affinity binding effects. For instance, LY is not affected by ATP concentration in their data (Figure 4D), but this is easily explained as being due to its very tight binding affinity, resulting in titration of the receptor and the shape of the inhibition curve reflecting the amount of RAF kinase in the experiment and not the effective Kd or IC50 value.

      As discussed above, we’ve convinced ourselves that we are not simply titrating enzyme. It occurred to us that such an effect could explain both the steepness of the inhibition curves with LY and other type II inhibitors and the apparent ATP-insensitivity. Our studies of concentration-dependence and the correlation of this effect with the type II binding mode argue against this possibility.

      Finally, as an overarching comment to this Reviewer and the others, we understand well that our enzyme inhibition studies (here and in Tkacik 2025) do not rise to the level of a formal demonstration of cooperative ligand binding. We envision a future study in which we could address this directly, perhaps by using single-molecule fluorescence to observe on/off rates for binding of fluorescently tagged inhibitors to immobilized RAF dimers. (This is clearly beyond the scope of the present work.)

      Reviewer #2 (Public review):

      This manuscript by Tkacik et al. uses in vitro reconstituted systems to examine paradoxical activation across RAF isoforms and inhibitor classes. The authors conclude that paradoxical activation can be explained without invoking negative allostery and propose a general model in which ATP displacement from an "open monomer" promotes dimerization and activation. The biochemical work is technically sound, and the systematic comparison across RAF paralogs (along with mutational/functional analysis) across inhibitor classes is a strength.

      However, the central mechanistic conclusions are overgeneralized relative to the experimental systems, and several key claims, particularly the dismissal of negative allostery and the proposed unifying model in Figure 6, are not directly supported by the data presented. Most importantly, the absence of RAS, membranes, and relevant regulatory context fundamentally limits the physiological relevance of several conclusions, especially regarding the current clinical type I.5 RAF inhibitors and paradoxical activation.

      Overall, this is a potentially valuable biochemical study, but the manuscript would benefit from more restrained interpretation, clearer framing of scope, and revisions to the model and title to better reflect what is actually tested.

      (1) A central issue is that the biochemical system lacks RAS, membranes, 14-3-3 and endogenous regulatory factors that are known to be required for paradoxical RAF and MAPK activation in cells. As previous work has repeatedly shown and the authors also acknowledge, paradoxical activation by RAF inhibitors is RAS-dependent in cells, and this dependence presumably explains why full-length autoinhibited RAF complexes are refractory to activation in the authors' assays.

      Importantly, the absence of paradoxical activation by type I.5 inhibitors in this system is therefore not mechanistically informative. Type I.5 inhibitors (e.g., vemurafenib, dabrafenib, encorafenib), but not Paradox Breakers (e.g., plixorafenib), robustly induce paradoxical activation in cells because binding of the inhibitor to inactive cytosolic RAF monomer promotes a conformational change that drives RAF recruitment to RAS in the membrane, promoting dimerization. The inability of type I.5 inhibitors to suppress the newly formed dimers is the basis of the pronounced paradoxical activation in cells. In the absence of RAS and membrane recruitment, failure to observe paradoxical activation in vitro does not distinguish between competing mechanistic models.

      As a result, conclusions regarding inhibitor class differences, and especially the generality of the proposed model, should be substantially tempered.

      We will emphasize the limitations of our highly simplified experimental system in the revised manuscript, and temper some of our interpretations. And while the lack of membranes/RAS/14-3-3 in our system and the lack of observed PA with type I.5 inhibitors is a limitation of our study, we disagree that it renders our study of type I.5 inhibitors mechanistically uninformative. As seen here and consistent with prior studies, the binding mode of these compounds disfavors formation of the kinase dimer. While this may be overcome by 14-3-3 binding and other effects in the cellular context, it reflects a fundamental mechanistic difference as compared with type I and type II inhibitors, which also exhibit paradoxical activation.

      (2) The authors argue that their data argue against negative allostery as a central feature of paradoxical activation. However, the presented data do not directly test negative allostery, nor do they exclude it. The biochemical assays do not recreate the cellular context in which negative allostery has been inferred. Further, structural data showing asymmetric inhibitor occupancy in RAF dimers cannot be dismissed on the basis of alternative symmetric structures alone, particularly given the dynamic nature of RAF dimers in cells.

      Most importantly, negative allostery was proposed to explain paradoxical activation by Type I.5 RAF inhibitors, yet these inhibitors do not paradoxically activate in the assays presented here. The absence of paradoxical activation in this system, therefore, cannot be used to argue against a mechanism that is specifically invoked to explain cellular behavior not recapitulated by the assay.

      To be clear, we are not dismissing the possibility of negative cooperativity. And we do not think of our model as an alternative to the negative cooperativity model – rather it is a generalization that can account for paradoxical activation by diverse inhibitor classes, irrespective of positive, negative or non-cooperative modes of inhibition. We will emphasize these points in the revised manuscript.

      If negative allostery were a requisite feature of PA, we would not expect to see PA with type II inhibitors. As discussed in our response to Reviewer 1, we see clear evidence of positively cooperative inhibition of 14-3-3-bound RAF dimers by type II inhibitors (Tkacik JBC 2025) and in the present study, we find clear paradoxical activation by type II inhibitors (and there are many reports in the literature of PA by type II inhibitors in cellular contexts).

      (3) The model presented in Figure 6 is conceptually possible but remains speculative. Key elements of the model, including RAS engagement, membrane recruitment, 14-3-3 rearrangements, and the involvement of cellular kinases and phosphatases, are explicitly absent from the experimental system. Accordingly, the model is not tested by the data presented and should not be framed as a validated or general mechanism. The figure and accompanying text should be clearly labeled as a working or conceptual model rather than a mechanistically supported conclusion.

      We will revise the text to more clearly reflect that this is a working model, and importantly, that it is based on a large literature in this area in addition to the relevant experimental work in this manuscript.

      (4) The manuscript states that type I.5 inhibitors do not induce paradoxical activation in the biochemical assay because their C-helix-out binding mode disfavors dimerization. While this is true in isolation, it overlooks the well-established fact that type I.5 inhibitors (with the exception of paradox breakers) clearly promote RAS-dependent RAF dimerization in cells. This distinction is critical and should be explicitly acknowledged when interpreting the in vitro findings.

      We will explicitly make this point in the revised manuscript.

      (5) The title suggests a general mechanism for paradoxical activation across RAF isoforms and inhibitor classes, whereas the data primarily address type I and type II inhibitors acting on isolated kinase-domain monomers. A more accurate framing would avoid the term "general" and confine the conclusions to C-helix-in (type I/II) RAF inhibitors in a reduced biochemical context.

      As noted above, and in our response to Reviewer 3 below, we will clarify the contribution of the data in the present manuscript to the model, and that the model is based more broadly on the literature on PA and our insights into RAF structure and regulation. We will also revise the title to avoid the implication that the model arises mainly from the experimental data in the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Tkacik et al. systematically characterized all three RAF kinase isoforms in vitro with all three types of RAF inhibitors (Type I, I½, and II) to investigate the mechanism underlying paradoxical activation.

      In this study, the authors reconstituted heterodimers of A-, B-, and C-RAF kinase domains bound to non-phosphorylatable MEK1 (SASA), mimicking the monomeric auto-inhibited state of RAF. These "RAF monomers" were tested for MEK phosphorylation with increasing concentrations of all three types of RAF inhibitors (Type I, I½, and II). This study is reminiscent of a previous study by the same team measuring RAF kinase activity in the presence of all three types of inhibitors in the context of dimeric RAF isoforms stabilized by 14-3-3 proteins (Tkacik et al 2025 JBC). RAF monomers had little to no activity at low concentrations of inhibitors (consistent with their "monomeric state"). Addition of type I½ inhibitors did not induce paradoxical activation because, in this context, they do not induce the RAF dimerization required for activation, as observed by MP. Addition of type I and type II inhibitors led to paradoxical activation consistent with the RAF dimerization induced by these inhibitors, as observed by MP. Interestingly, type II inhibitors induced activation only for B- and C-RAF and not A-RAF.

      At high concentrations of type II inhibitors, kinase activity is inhibited with strong or weak positive cooperativity for BRAF and CRAF, respectively. This observation is very similar to what the authors previously observed with their dimeric RAF system. Interestingly, when the NtA motif is modified by phosphomimetic mutations in A- and C-RAF, basal kinase activity is stronger, but most importantly, inhibitor-induced paradoxical activation is much stronger with both type I and II inhibitors. This demonstrates that mutation of the NtA motif of ARAF and CRAF sensitized them to paradoxical activation by type II inhibitors.

      The authors also tested the effect of ATP on the paradoxical activation observed in their RAF "monomer" system. As in their previously published assay with 14-3-3-stabilized dimeric RAF, the authors observed the expected shift of the IC50 with Type I inhibitors, while Type II inhibitors seem to behave as non-competitive inhibitors. The authors next reconstituted the MAP kinase pathway (with RAF monomers at the top of the phosphorylation cascade) to test amplification of paradoxical activation. Again, Type I½ inhibitors did not induce paradoxical activation, while Type I and II inhibitors did. The authors tested the inhibitors with FL auto-inhibited RAF/MEK/14-3-3 complexes, where, in contrast to the "RAF monomer" experiments, FL B- and C-RAF were not paradoxically activated but were inhibited by all three types of inhibitors.
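      The contrast between ATP-competitive and non-competitive behavior has a textbook quantitative form. Under Michaelis-Menten kinetics with classical mixed-type inhibition (inhibitor constant Ki for free enzyme and αKi for the enzyme-ATP complex), IC50 = (Km + [ATP]) / (Km/Ki + [ATP]/(αKi)). Letting α → ∞ recovers the Cheng-Prusoff relation for a purely ATP-competitive inhibitor, IC50 = Ki(1 + [ATP]/Km), and α = 1 gives an ATP-independent IC50. The sketch below assumes free inhibitor ≈ total inhibitor and no dimerization, so it is a reference point rather than a model of the RAF assays; the Ki and Km values and the function name are illustrative assumptions.

```python
import math


def mixed_ic50(ki: float, alpha: float, atp: float, km_atp: float) -> float:
    """IC50 for classical mixed-type inhibition (Michaelis-Menten kinetics).

    alpha = math.inf: purely ATP-competitive (Cheng-Prusoff).
    alpha = 1: pure noncompetitive, IC50 independent of ATP.
    Assumes free inhibitor ~= total inhibitor (no tight binding).
    """
    if math.isinf(alpha):
        return ki * (1.0 + atp / km_atp)
    return (km_atp + atp) / (km_atp / ki + atp / (alpha * ki))


if __name__ == "__main__":
    ki, km = 1.0, 50.0  # Ki in nM, Km,ATP in uM (illustrative)
    for atp in (10.0, 100.0, 1000.0):  # uM
        comp = mixed_ic50(ki, math.inf, atp, km)
        noncomp = mixed_ic50(ki, 1.0, atp, km)
        print(f"ATP={atp:>6} uM  competitive IC50={comp:.1f} nM  "
              f"noncompetitive IC50={noncomp:.1f} nM")
```

      Under these assumptions a purely competitive inhibitor shifts about 20-fold between 10 µM and 1 mM ATP, while a noncompetitive one does not shift at all; intermediate α values fall in between.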

      Overall, Tkacik et al. tackle an important question in the field for which definitive experiments and thorough biochemical investigation to understand the molecular mechanisms for the inhibitor-induced paradoxical activation are still missing, and of high importance for future drug development.

      Strengths:

      The biochemical experiments here are rigorously executed, and the results obtained are highly informative in the field to decipher the intricate mechanisms of RAF activation and inhibitor-induced paradoxical activation.

      Weaknesses:

      The interpretation of the results in the context of the current state of the art is ambiguous and raises questions about the relevance of introducing a new model for inhibitor-induced paradoxical activation, particularly since the findings presented here do not clearly contradict established paradigms. I believe some clarification and precision are required.

      While our model does not conflict with established paradigms (because it can allow for negative cooperativity) our experimental findings (here and in Tkacik et al JBC 2025) are in conflict with the negative allostery model. We will work to clarify this in the revised manuscript.

      Main comments:

      (1) Figure 2:

      The authors comment on the expected greater increase (for a cascade assay) in the magnitude of ERK phosphorylation compared to what was observed for MEK phosphorylation. However, this observation might reflect the stoichiometries used in the assay, with ~40 times more MEK than RAF (250 nM vs 6 nM), which might favour pERK over pMEK.

      The authors should clarify their rationale for the protein concentration used in this assay and explain how protein stoichiometry was taken into account for the interpretation of their results.

      The Reviewer makes a good point: the concentrations and ratios chosen are expected to make a substantial difference in observed amplification. We intended this experiment more as a qualitative demonstration of cascade amplification and will clarify this in the revised manuscript.

      In addition, the authors should justify comparing pMEK and pERK TR-FRET values when different anti-phospho antibodies were used. Antibodies may have distinct binding affinities for their epitopes. Could this not lead to differences in FRET signal amplitudes that complicate direct comparison?

      Also a good point, we will note this limitation in the revised manuscript.

      (2) Supplementary Figure 2:

      The authors mentioned that the inhibitors did not activate the FL auto-inhibited RAF complexes; however, they did inhibit the TR-FRET signal.

      Can the authors comment on the origin of the observed basal activity? Would the authors expect self-release of the RAF kinase protein from the auto-inhibited state in the absence of RAS, leading to dimerization and activation? Alternatively, do the inhibitors at low-concentration relieve the auto-inhibited state, thereby driving dimerization and activation?

      We think that the baseline activity that is being inhibited is due to low concentrations of active dimer in our autoinhibited state preparations.

      Did the authors test the addition of RAS protein to their in vitro system to determine whether "soluble" RAS is sufficient to release the protective interactions with RBD/CRD/14-3-3 and lead to inhibitor-induced paradoxical activation of FL RAF?

      We did not, but we’ve thought about it. We expect that soluble RAS would not be activating. We have previously carried out extensive studies of BRAF activation by soluble vs. farnesylated RAS in a membrane environment (liposomes) and observed partial activation with the latter (Park et al, Nature Communications 2023).

      (3) Figure 5B:

      The authors said that the Kd values obtained from their MP assay are consistent with prior studies of RAF homodimerization and RAF:MEK heterodimerization. While this is true of the previous studies of RAF:MEK interaction by BLI (performed by the same team), the Kd of isolated RAF kinase homodimerization has been measured at ~30 µM by AUC in the cited references (24, 27, and 37).

      The authors should discuss the discrepancy between their Kd of homodimerization and the reported Kd values in the literature. At the concentration used for MP, it is surprising to observe RAF dimerization while the Kd of homodimerization has been measured at ~30µM (in the absence of MEK).

      We will cite/discuss these differences in our revised manuscript.

      Would the authors expect the presence of MEK to influence the homodimerization affinity for the isolated KD?

      Perhaps, but likely only modestly. We do not think this explains the discrepancy noted above.

      (4) Conclusions:

      Several times in the introduction and the conclusion, the authors suggest that the negative allostery model (where "inhibitor binding to one protomer of the dimer promotes an active but inhibitor-resistant conformation in the other") is a model that applies to all types of RAF inhibitors (I, I½, and II).

      However, from my understanding and all the references cited by the authors, this model only applies to type I1/2 inhibitors, where indeed the αC-IN conformation in the second (inhibitor-free) protomer of the RAF dimer might be incompatible with the type I1/2 inhibitors inducing the αC-OUT conformation. The type I and type II inhibitors are αC-IN inhibitors and are expected to bind both protomers of RAF dimers with similar affinities. Therefore, the negative allostery model does not apply to the type I and type II inhibitors. The difference in the mechanism of action of inhibitors is even used to explain the difference in the concentration range in which inhibitor-induced activation is observed in cells. The description of the state of the art in this study is confusing and does not help to properly understand their argumentation to revise the established model for paradoxical RAF activation.

      We will work to clarify these complicated issues in the revised manuscript. While the reviewer is correct that the negative allostery model was developed in the context of type I.5 inhibitors, there are many examples in the literature of it being used to explain paradoxical activation by type I and type II inhibitors as well.

      Can the authors clarify their analysis of the state of the art on the different mechanisms of action for the paradoxical activation of RAF by the different types of RAF inhibitors?

      We’ll try!

      (5) Conclusions:

      "Our results suggest that negative allostery (or negative cooperativity) is not a requisite feature of paradoxical activation. The type I and type II inhibitors studied here induce RAF dimers and exhibit paradoxical activation but do so without evidence of negative cooperativity, nor do they appear to inhibit intentionally engineered RAF dimers with negative cooperativity (25). Indeed, type II inhibitors exhibit apparent positive cooperativity while type I inhibitors are non-cooperative inhibitors of RAF dimers (25)."

      Can the authors explain how results on the paradoxical activation induced by type I and type II inhibitors inform or challenge a model that specifically applies to type I1/2 inhibitors?

      As noted above, the negative allostery model has also been widely applied irrespective of inhibitor type (rightly or wrongly). Essentially any review or discussion of the topic will explain in one way or another how inhibitor binding to one side of a dimer leaves the opposite side active but resistant to inhibitor. Our model is agnostic with respect to cooperativity of inhibition – essentially we are pointing out a simple circumstance that seems to have been lost in the focus on negative allostery. Paradoxical activation is a result of drug action on RAF monomers, while inhibition is a result of drug action on RAF dimers. Because these are distinct molecular species/complexes, they can be expected to differ in their affinity for RAF inhibitors, irrespective of type. Because binding of ATP in the active site of RAF monomers stabilizes the inactive monomeric state, displacing ATP can promote activation/dimerization. For any inhibitor that is more potent at displacing ATP from a monomer than from an active dimer, we could expect to observe a window of paradoxical activation.
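      The argument in this response can be illustrated with a deliberately minimal toy model, sketched below under assumptions that are ours rather than the authors': inhibitor binding to the open monomer promotes dimerization, only inhibitor-free protomers within dimers are active, and monomers and dimers bind the inhibitor with independent, hypothetical Kd values. No allosteric coupling between protomers is included.

```python
def occupancy(inhibitor_uM: float, kd_uM: float) -> float:
    """Fractional occupancy of a single binding site (hyperbolic binding)."""
    return inhibitor_uM / (inhibitor_uM + kd_uM)


def relative_activity(inhibitor_uM, kd_monomer_uM, kd_dimer_uM, basal_dimer=0.05):
    """Toy model of MEK-directed RAF activity, normalized to no inhibitor.

    Hypothetical assumptions, for illustration only:
    - ATP-displacing inhibitor binding to the open monomer makes it
      dimer-competent, so the dimer fraction rises with monomer occupancy;
    - within dimers, only inhibitor-free protomers phosphorylate MEK,
      and dimer protomers bind inhibitor with kd_dimer_uM.
    """
    dimer = basal_dimer + (1.0 - basal_dimer) * occupancy(inhibitor_uM, kd_monomer_uM)
    free_sites = 1.0 - occupancy(inhibitor_uM, kd_dimer_uM)
    return dimer * free_sites / basal_dimer


# Inhibitor 100-fold more potent on monomers than on dimers (hypothetical values).
concs = [10 ** (k / 10) for k in range(-40, 41)]  # 1e-4 to 1e4 uM
profile = [relative_activity(c, kd_monomer_uM=0.01, kd_dimer_uM=1.0) for c in concs]
window = [c for c, a in zip(concs, profile) if a > 1.0]
print(f"peak activity {max(profile):.1f}x baseline; "
      f"activity falls below baseline above ~{window[-1]:.2g} uM")
```

      Nothing in this sketch couples the two protomers allosterically: the activation phase comes from inhibitor-driven dimerization of monomers, and it grows as the monomer binds the drug more tightly than the dimer does. The basal dimer fraction and Kd values are arbitrary.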

      The authors often refer to their previous study (reference 25), where they tested inhibition of engineered RAF dimers by all three types of inhibitors. While I agree with the authors that in reference 25 the type I and type II inhibitors inhibit RAF dimers without exhibiting negative cooperativity (as expected from the literature and the current model), the authors did observe some negative cooperativity for type I1/2 inhibitors in their study, most notably for the type I1/2 PB (with Hill slopes ranging from -0.4 to -0.9, indicative of negative cooperativity).

      Correct! Although we do note the caveat that weak inhibition can also give rise to apparent negative cooperativity.
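      The caveat about weak or incomplete inhibition can be illustrated with a textbook calculation, sketched here with hypothetical IC50 values. Summing two independent populations, each inhibited with a Hill coefficient of 1 but with different sensitivities (for instance, an inhibitor-sensitive species and a largely resistant one), yields a composite curve whose apparent Hill coefficient is below 1, even though neither population is cooperative. The apparent coefficient is read from the IC10 to IC90 span, using the identity IC90/IC10 = 81^(1/n) that holds for a single Hill curve.

```python
import math


def hill_inhibition(conc, ic50, n=1.0):
    """Fraction of activity remaining for a Hill inhibition curve."""
    return 1.0 / (1.0 + (conc / ic50) ** n)


def two_population_activity(conc, ic50_sensitive, ic50_resistant, frac_sensitive):
    """Two independent, non-cooperative (n = 1) populations, summed."""
    return (frac_sensitive * hill_inhibition(conc, ic50_sensitive)
            + (1.0 - frac_sensitive) * hill_inhibition(conc, ic50_resistant))


def conc_at_activity(curve, target, lo=1e-6, hi=1e6, iters=200):
    """Geometric bisection for the concentration where a decreasing curve hits target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if curve(mid) > target:
            lo = mid  # still above target: need more inhibitor
        else:
            hi = mid
    return math.sqrt(lo * hi)


def apparent_hill(curve):
    """Hill coefficient implied by the IC10-IC90 span: n = ln(81) / ln(IC90 / IC10)."""
    ic10 = conc_at_activity(curve, 0.9)  # 10% inhibition
    ic90 = conc_at_activity(curve, 0.1)  # 90% inhibition
    return math.log(81.0) / math.log(ic90 / ic10)


print(apparent_hill(lambda c: hill_inhibition(c, 1.0)))  # ~1.0 for a single site
print(apparent_hill(lambda c: two_population_activity(c, 1.0, 100.0, 0.5)))  # below 1 (about 0.6)
```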

      While the observation that type II inhibitors display positive cooperativity is both novel and very interesting, from what I understand the results from Thakick et al. 2025 and the current study appear more in line with the current paradigm in the field (which describes paradoxical activation with negative cooperativity for type I1/2 inhibitors and no negative cooperativity for type I and II inhibitors) than disproving the current model and supporting a new one.

      In this context, can the authors clarify how their results challenge the current model for paradoxical activation?

      While the differences in binding modes and structural effects of type I.5 vs. type I and type II inhibitors are well known in the field, we do not know of any work that suggests paradoxical activation arises from anything other than negative allostery. As one example of the negative allostery model being applied beyond type I.5 inhibitors, Rasmussen et al. observe allosteric coupling asymmetry in binding of type II inhibitors to BRAF and attribute the observed paradoxical activation to “induction of dimers with one inhibited and one catalytically active subunit” (Rasmussen et al., Elife 2024). They also studied type I inhibitors in this work, but did not observe paradoxical activation.

      (6) Conclusions:

      The authors describe the JAB34 experiment from Poulikakos et al. 2010 to conclude that "While this experiment cleanly demonstrates inhibitor-induced transactivation of RAF dimers, it is important to recognize that the differential inhibitor sensitivity of the two subunits in this experiment is artificial - it is engineered rather than induced by inhibitor binding as the negative allostery model proposes."

      Indeed, the JAB34 experiment demonstrated inhibitor-induced transactivation, but the Poulikakos et al. 2010 study does not discuss differential inhibitor sensitivity. The negative allostery model was proposed later by the Poulikakos team in other papers (Yao et al. 2015 and Karoulia et al. 2016), in which JAB34 was not used.

      Can the authors clarify how the JAB34 experiments question differential inhibitor sensitivity?

      Good point, we neglected to discuss the Yao and Karoulia papers and will do so in our revised manuscript.

      (7) Conclusions:

      "Considering that the conformation required for binding of type I.5 inhibitors destabilizes RAF dimers, it is unclear how an inhibitor binding to one protomer would be able to transmit an allosteric change to the opposite protomer, if that inhibitor's binding causes the existing dimer to dissociate."

      The authors should comment on whether 14-3-3 proteins might overcome negative regulation by type I1/2 inhibitors, similar to what has been shown for ATP, which acts as a dimer breaker like type I1/2 inhibitors.

      Certainly we expect that they will, and we will discuss this in our revised manuscript.

      (8) Conclusions:

      "Furthermore, the complex effects of type I.5 inhibitors on dimer stability and the clear resistance of active RAF dimers to these inhibitors complicates interpretation of inhibition data - weak or incomplete inhibition of an enzyme can be difficult to discern from true negative cooperativity (43). As we discuss below, the clear resistance of RAF dimers to type I.5 inhibitors is alone sufficient to explain their ineffective inhibition during paradoxical activation, without invoking negative allostery." 

      The authors should explain how they reconcile this statement and their proposal of a new model that does not rely on negative allostery with their previous findings showing negative cooperativity for RAF dimer inhibition with type I1/2 inhibitors.

      As discussed above and in responses to other Reviewers, we do not exclude negative cooperativity for type I.5 inhibitors. That said, we are skeptical, even in light of our own findings of apparent negative cooperativity by type I.5 compounds, due in part to the caveats the reviewer highlights above.

      (9) Conclusions:

      Here, the authors propose a new universal model to explain paradoxical activation of RAF by all types of RAF inhibitors:

      " Our findings here, in light of structural studies of RAF complexes and prior cellular investigations of paradoxical activation, lead us to a model for paradoxical activation that does not rely on negative allostery and is consistent with activation by diverse inhibitor classes. In this model, the open monomer complex is the target of inhibitor-induced paradoxical activation (Figure 6). Binding of ATP to the RAF active site stabilizes the inactive conformation of the open monomer, which disfavors dimerization. Displacement of ATP by an ATP-competitive inhibitor, irrespective of class, alters the relative N- and C-lobe orientations of the kinase to promote dimerization (30, 35). Once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK."

      From my understanding, the novelty of this new model is twofold: a) the open monomer is the target of the inhibitor-induced paradoxical activation and b) once dimerized, inhibitor dissociation from one or both sides of the dimer would allow phosphorylation and activation of MEK.

      Novelty a) implies, as the authors stated, that "Inhibitor-induced activation and inhibition act on distinct species - activation on the open monomer and inhibition on the 14-3-3-stabilized dimer". The authors should explain what they mean by "activation of the open monomer", while only RAF dimers are catalytically active (except for BRAF V600E mutant)?

      We will clarify – by activation we mean promoting conversion of the open monomer to a dimer.

      For novelty b), the authors should explain more clearly what experimental results support this new model.

      We will more explicitly detail how our results here as well as prior work in the field support this model.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      1) Summary

      This study investigates the mechanochemistry of Arp2/3-mediated branched actin networks at the level of individual branch junctions under load. Using microfluidic single-filament/branch force assays (including constant-force flow and open-chamber imaging), the authors quantify debranching, re-nucleation, and mother- vs daughter-interface stability across nucleotide states of Arp2/3 (ADP-Pi, ADP, and an ADP-BeFx proxy for ADP-Pi). They further test the effects of two branch regulators (GMF and cortactin). Key findings include: (i) ADP-Pi and ADP complexes share similar force dependence but differ markedly (~20×) in intrinsic dissociation rate; (ii) phosphate turnover on the Arp2/3 complex is rapid; (iii) affinity for Pi drops when Arp2/3 loses its daughter filament; (iv) quantification from model fits uncovers large stability differences between the daughter and mother interfaces of the Arp2/3 complex; (v) extraordinarily high stability of ADP-Pi-like Arp2/3 on the mother filament; and (vi) distinct effects of GMF and cortactin on force-dependent stability. Overall, the work combines technically demanding measurements with mechanistic modeling to probe how nucleotide state and regulatory factors tune branch mechanics.

      2) Major comments:

      1. Low-force kinetics and completeness of survival curves (Figure 1). "For all forces, the surviving curves exhibited a clear single exponential behavior...." While the data can be fitted to monoexponential decay curves, the data at low forces are clearly incomplete: >90% of branches have not dissociated by the end of the experiment. For the particular data shown in 1C (F = 0 pN, n = 60 total branches), it means that the time information is coming from

      Essential; experiment might already be performed. Otherwise straightforward to do (weeks' time).

      In figure 1B, we indeed show a survival curve for ADP-Arp2/3 complex branch dissociation at 0 pN up to 900 seconds. As now shown in updated supp. figure S2, the data were in fact acquired for at least 5000 seconds for the ADP-Arp2/3 and ADP-Pi states (N = 2 repeats for each condition, with n = 60 and 90 branches for ADP-Arp2/3 branches, and 90 and 132 branches for ADP-Pi-Arp2/3 branches). The debranching rates reported in the initial submission were already obtained by fitting the survival curves over the whole duration of the experiments.
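      The concern about incomplete low-force survival curves is, in general terms, a question of right-censored data. As an illustration only (the exact fitting procedure used in the manuscript is not reproduced here), the sketch below shows the standard maximum-likelihood estimate of a single-exponential off-rate when many branches are still attached at the end of observation: every branch contributes its time at risk, but only observed debranching events enter the numerator. The example data are hypothetical.

```python
import math


def exponential_rate_mle(durations_s, dissociated):
    """Maximum-likelihood off-rate for single-exponential survival with right censoring.

    durations_s: time each branch was followed (until debranching, or until the
                 end of observation if it never dissociated).
    dissociated: True if debranching was observed, False if the branch was censored.
    For an exponential model the MLE is (number of events) / (total time at risk).
    """
    events = sum(1 for d in dissociated if d)
    exposure = float(sum(durations_s))
    rate = events / exposure
    # Asymptotic standard error from the Fisher information: rate / sqrt(events).
    se = rate / math.sqrt(events) if events else float("inf")
    return rate, se


# Hypothetical data: 3 debranching events, 2 branches still attached at 5000 s.
rate, se = exponential_rate_mle([100, 200, 300, 5000, 5000],
                                [True, True, True, False, False])
print(f"k_off = {rate:.2e} +/- {se:.2e} s^-1")
```

      Because the standard error scales as 1/√(number of events), extending acquisition to at least 5000 s improves precision mainly by adding debranching events, not by adding branches that stay attached.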

      2. Stability Analysis (Figure 4). I can follow much of the arguments presented in the stability analysis of the daughter vs mother interfaces, which is in principle extremely interesting! However, there are some concerns here:

      i) The authors emphasize the zero force ratio derived from fits (which is linked to the stability difference of the two interfaces in the absence of force) despite this being only weakly constrained by data. Intuitively in the model, the stability difference should grow to very large values as the re-nucleation ratio approaches 1 at low force. This combined with the noise in the data poses an issue in my opinion. Looking at the data and the error margin, I think that the authors cannot state with high confidence that there is a real difference between the relative stability of the daughter and mother interfaces between the two nucleotide states of the complex.

      Essential; analysis and textual revision only

      We thank the reviewer for this comment. The difference in stability between the two interfaces is strongly constrained by the shape of the branch renucleation ratio versus force curve, and its value at 0 pN. This is illustrated in the figure shown below (new Supp Fig. S8), showing the dissociation rates of the two interfaces (in ‘dashed’ and ‘point-dashed’ style) that contribute to the overall debranching rate in each nucleotide condition. Despite the limited force range at which we probed the debranching rate, the branch renucleation ratio curve informs us on which interface is the weakest, and how this evolves with force.

      We have assessed the confidence intervals of the parameters obtained from the fits, taking into account the error bars on our experimental datapoints. It seems to indicate that the simultaneous fits of the debranching rate and the branch renucleation ratio curves indeed constrain the parameters quite strongly. These confidence intervals are now reported in the main text and in the summarizing table.

      We have repeated branch renucleation experiments for ADP-BeFx- and ADP-Pi-Arp2/3 complex branches (see new figure 4C&D, and our response to the next point). We believe these new measurements allow a better assessment of the relative stability between the two interfaces for Arp2/3 complex branch junctions in the ADP-BeFx state.

      Still, we agree with the reviewer that the dispersion of the experimental data does not allow us to have a strong confidence on the crossover force and relative stability difference of the interfaces. Therefore, we have slightly toned down the way we present and discuss the differences in stability when comparing the two nucleotide states.

      ii) For ADP-Pi, the renucleation ratio essentially remains flat over the measured force range. Hence, the data can only provide little leverage to estimate both the zero-force ratio and, more importantly, the differential distance to the transition state in the slip-bond model in my opinion, which will show in the crossover force. Consequently, the quoted ">100×" stability difference at F = 0 and the crossover force >20 pN are driven largely by extrapolation rather than direct constraint by data. Given the high number of free parameters in the model, I would anticipate that several crossover forces and differential distances might explain the data nearly equally well. Instead of loosely reporting exact numbers from fits, I would have hoped for some sort of sensitivity analysis, for instance relying on profile likelihoods. Also, parameter values could be reported as bounds (e.g., crossover force ≫ measured range) rather than precise point estimates. This issue re-occurs (albeit not as drastically) for the cortactin experiments (Figure 6).

      Essential; analysis and textual revision only

      As mentioned in our response to the previous point, we have repeated renucleation experiments for ADP-BeFx-Arp2/3 complex branches (and also for Arp2/3 complex branches in the presence of 50 mM Pi; see new figure 4C&D) to better characterize the differential distance to the transition state and the crossover force. The crossover force for the ADP-BeFx state is now 13.5 pN, and the stability ratio between the two interfaces is roughly 100-fold.

      We agree with the reviewer that the dispersion of the experimental data does not allow us to have a strong confidence on the crossover force and relative stability difference of the interfaces. We have thus toned down the way we report these values. We do believe though that the difference we report between the ADP and ADP-BeFx state appears to be significant and needs to be acknowledged.

      As a side note, it has proven challenging to pull on branches at forces higher than 7 pN. To apply a large force on the branch junction, we need a high flow rate. Under these conditions, the height of the filaments (both mother and daughter filaments) above the surface seems to deviate from what we have established in our previous studies (Jegou et al, Nat. Comm. 2013 & Wioland et al, PNAS 2019). This may originate from the fact that branched filaments have a more complex shape than an individual filament. Accurately characterizing the evolution of the branch height as a function of the flow rate and applied force would require quite extensive additional work, which, we believe, is beyond the current focus of this study on the stability of Arp2/3 complexes.

      iii) One important expectation from the "two slip bond" model is that branch dissociation rates should not necessarily scale mono-exponentially, as they mostly do over the accessible force range of the paper. However, once the "minor" pathway of dissociation from the mother starts to dominate at high forces, rates become more force sensitive. This is nicely captured by the model fits in Figure S6 but deserves some explanation in the text. Otherwise, people will simply remember the "ADP-Pi is 20-fold more stable than ADP at all forces" message.

      Essential; textual revision only

      We now have rephrased the key sentences (in the Abstract and Results sections) to more clearly state that the debranching rate is not increasing mono-exponentially with force.

      In the Abstract: “Remarkably, we find that branch junctions are over 30-fold more stable when the Arp2/3 complex is in the ADP-Pi rather than ADP state, and that force accelerates debranching with similar exponential factors in both states.”

      In the Results section: “The debranching rate seems to increase exponentially with the applied pulling force, in the range of 0 to 6 pN (Fig. 1F; see more refined analysis below). This behaviour is predicted by the Bell-Evans model for a slip bond.”
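      The point raised in comment (iii), that a two-interface model need not give a single exponential dependence on force, can be sketched as follows. This is a generic two-pathway Bell-Evans calculation with hypothetical parameters (zero-force rates k0 and distances x for the daughter and mother interfaces), not the manuscript's fitted values; kBT is taken as 4.1 pN·nm. The fraction of debranching events that leave Arp2/3 on the mother is the quantity that, scaled by the probability of renucleation before the surviving complex leaves, would set the branch renucleation ratio.

```python
import math

KBT = 4.1  # pN*nm, thermal energy near room temperature


def bell_rate(k0, x_nm, force_pN):
    """Bell-Evans slip bond: k(F) = k0 * exp(F * x / kBT)."""
    return k0 * math.exp(force_pN * x_nm / KBT)


def two_pathway(force_pN, k0_d, x_d, k0_m, x_m):
    """Total debranching rate and fraction of events that leave Arp2/3 on the mother.

    d: daughter interface (complex stays on the mother, branch can renucleate)
    m: mother interface (complex leaves together with the daughter)
    """
    k_d = bell_rate(k0_d, x_d, force_pN)
    k_m = bell_rate(k0_m, x_m, force_pN)
    return k_d + k_m, k_d / (k_d + k_m)


def crossover_force(k0_d, x_d, k0_m, x_m):
    """Force at which both interfaces fail equally often (needs x_m > x_d)."""
    return KBT * math.log(k0_d / k0_m) / (x_m - x_d)


def log_slope(force_pN, params, dF=1e-3):
    """d ln(k_total) / dF: constant for a single slip bond, force-dependent here."""
    k_hi, _ = two_pathway(force_pN + dF, *params)
    k_lo, _ = two_pathway(force_pN - dF, *params)
    return (math.log(k_hi) - math.log(k_lo)) / (2 * dF)


# Hypothetical parameters: (k0_d [s^-1], x_d [nm], k0_m [s^-1], x_m [nm]).
params = (1e-3, 0.5, 1e-5, 2.0)
f_star = crossover_force(*params)
print(f"crossover force ~{f_star:.1f} pN")
for force in (0.0, 5.0, f_star, 25.0):
    k_tot, frac_daughter = two_pathway(force, *params)
    print(f"F={force:5.1f} pN  k={k_tot:.2e} s^-1  daughter-side fraction={frac_daughter:.2f}  "
          f"dlnk/dF={log_slope(force, params):.3f} pN^-1")
```

      With these placeholder values, the total rate looks close to a single exponential at low force, where the daughter interface dominates, and becomes steeper above the crossover force, where the mother interface takes over.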

      iv) One important prerequisite for the model is that isolated Arp2/3 complexes (without a daughter filament) should dissociate with equal rates from mother filaments at all flow rates. Since the Arp2/3 complex prefers mother filament curvature, forces experienced by the mother might change its off-rate. It would be good to refer to this assumption in the text and experimentally verify it. I could not find it in the paper nor in Ghasemi et al 2024.

      Essential; simple experiment (a week's time).

      We thank the reviewer for this important comment.

      First, we investigated whether the viscous drag force, applied on the ADP-Arp2/3 complexes which remain bound to mother filaments could affect their stability. We have performed branch renucleation experiments at different flow rates but with the same pulling force on branch junctions (average force 3.9 pN) by adapting the length of the daughter filament. As shown in new supp. figure S11 (shown below), we did not observe any significant differences between ‘low’ and ‘high’ flow rates. If the off-rate of the surviving Arp2/3 was significantly affected by the flow, this would have led to a variation of the renucleation ratio with the flow rate.

      Second, we have investigated the impact of the tension experienced by the mother filament at the location of the branch junction for ADP-Arp2/3 complex branches, with the same pulling force on the branches (average 4.1 pN pulling force on branches). We have quantified the debranching rate from three groups of branches depending on their position along mother filaments. As shown in new supp. figure S12 (shown below), we can observe a small trend, where the debranching rate decreases with the tension on the mother filament at the branching point.

      Doubling the tension on the mother filament from 15 to 30 pN decreases the debranching rate by a third. However, pairwise log-rank tests performed between the survival fractions of the three binned groups do not show any statistically significant difference (all p values > 0.05). One possible explanation for this trend is that the height of the mother filament in the microfluidic flow increases linearly from the anchoring point to the free barbed end. As a consequence, the pulling force on the branches will be higher, as branches experience faster flows.
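      For readers unfamiliar with the pairwise comparisons mentioned above, a textbook two-sample log-rank test for right-censored survival data is sketched below in plain Python. It is not necessarily the implementation used for the manuscript, and the example data are hypothetical; in practice an established library would normally be used, with a multiple-comparison correction when testing three groups pairwise.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test for right-censored survival data.

    times_*:  time of debranching (event) or of last observation (censored).
    events_*: True for an observed debranching, False for a censored branch.
    Returns (chi-square statistic with 1 degree of freedom, two-sided p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                 # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk, group A
        d = sum(1 for ti, ei, _ in data if ei and ti == t)         # events at time t
        d_a = sum(1 for ti, ei, g in data if ei and ti == t and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1.0 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    # Chi-square (1 d.o.f.) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical survival times (s) for two position bins; False = still attached.
bin_near = ([40, 95, 130, 210, 400, 600], [True, True, True, True, False, False])
bin_far = ([60, 120, 180, 300, 600, 600], [True, True, True, False, False, False])
chi2, p = logrank(*bin_near, *bin_far)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```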

      For these same groups, upon branch dissociation, all Arp2/3 complexes that remained bound are exposed to the same flow rate, and the branch renucleation ratios were similar. Thus, the branch renucleation ratio does not seem to depend significantly on the tension experienced by the mother filament at the branching point.

      Similarly, Pandit et al PNAS 2020, Extended figure S1, also reported no detectable impact of the mother filament tension on the debranching rate in their assay.

      v) The force dependence of the branch re-nucleation rate (Fig 3D) has been measured previously by the same group (Ghasemi et al). While the data in the older paper have not been fitted by a model, the trend of the data in the previous paper looks conspicuously different. Are there any explanations for this? I speculate that it might be related to actin and ATP not being saturating (low-force re-nucleation rate rarely exceeds 80%) in Ghasemi et al., but it would be good to know what the authors think about this.

      Essential; textual revision only

      This is a good point. We have plotted the data of the renucleation ratio from ADP-Arp2/3 complex from figure 1F of Ghasemi et al, Sc. Adv. 2024 (performed at 0.3 and 1 µM actin), together with the data of the current study from figure 4D (performed at 1.5 µM actin). We feel this comparison could be of interest to the readers, and have thus integrated it in the manuscript as new supp. figure S13 (shown below).

      As expected, the branch renucleation ratio is lower with lower concentrations of actin. The experimental data points from Ghasemi et al are similarly well fitted by the branch renucleation function obtained for 1.5 µM multiplied by a scaling parameter, which reflects the fact that the branch renucleation ratio is actin concentration dependent (Fig. 6A in Ghasemi et al). This scaling parameter was the only free parameter of those fits.

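      Since the branch renucleation ratio depends on the actin concentration as

      $$r = 0.97\,\frac{k_{\mathrm{on}}\left([\mathrm{actin}] - C_c\right)}{k_{\mathrm{on}}\left([\mathrm{actin}] - C_c\right) + k_{\mathrm{off}}^{\mathrm{ATP\text{-}Arp2/3}}}$$

      where $C_c$ is the critical concentration, $k_{\mathrm{on}} = 3.4\ \mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$ and $k_{\mathrm{off}}^{\mathrm{ATP\text{-}Arp2/3}} = 0.66\ \mathrm{s}^{-1}$ (Ghasemi et al. 2024), the scaling parameters obtained from the fits give estimates of the actin concentration in these experiments of 0.6 (±0.05) and 0.9 (±0.2) µM for the experiments performed at 0.3 and 1 µM, respectively, in (Ghasemi et al. 2024).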
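      The arithmetic behind this estimate can be sketched as follows: evaluate the renucleation-ratio expression at 1.5 µM, multiply by a fitted scaling parameter, and invert the expression for the actin concentration. The rate constants and the 0.97 prefactor are those quoted above; the critical concentration C_c = 0.1 µM and the scaling values in the example are assumptions for illustration only, so the sketch is not meant to reproduce the quoted estimates. It also assumes that the force and actin dependences of the ratio factorize, as the single scaling parameter implies.

```python
R_MAX = 0.97      # prefactor of the renucleation-ratio expression above
K_ON = 3.4        # uM^-1 s^-1, quoted above
K_OFF_ATP = 0.66  # s^-1, departure of surviving ATP-Arp2/3, quoted above
C_C = 0.1         # uM, critical concentration: ASSUMED here for illustration


def renucleation_ratio(actin_uM):
    """Renucleation ratio as a function of actin concentration (expression above)."""
    growth = K_ON * (actin_uM - C_C)
    return R_MAX * growth / (growth + K_OFF_ATP)


def actin_from_ratio(ratio):
    """Invert renucleation_ratio for the actin concentration; requires 0 < ratio < R_MAX."""
    p = ratio / R_MAX
    growth = K_OFF_ATP * p / (1.0 - p)
    return growth / K_ON + C_C


def actin_from_scaling(scale, reference_uM=1.5):
    """Actin concentration implied by scaling the reference (1.5 uM) curve by `scale`."""
    return actin_from_ratio(scale * renucleation_ratio(reference_uM))


for scale in (0.6, 0.8, 0.9):  # hypothetical fitted scaling parameters
    print(f"scale {scale}: ~{actin_from_scaling(scale):.2f} uM actin")
```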

      3. Stability of the authentic ADP-Pi-Arp2/3 complex on the mother filament. The extraordinary stability of the isolated ADP-BeFx-Arp2/3 complex on mother filaments is surprising, especially considering that both ATP and ADP states are much more labile (Ghasemi et al 2024). I would recommend repeating this experiment in the authentic ADP-Pi state with labelled Arp2/3 complexes as a more direct readout, even if this would require working with very high phosphate concentrations.

      Essential; simple experiment (a week's time).

      We have followed the recommendation of the reviewer and have performed new experiments using fluorescent Arp2/3 complexes for ADP, ADP-BeFx and ADP-Pi states, now displayed in new figure 5C (also shown below).

      For fluorescent Arp2/3 complexes remaining bound to the mother filament, the Arp2/3 complex - mother filament interface is ~100 times more stable in the ADP-BeFx state (0.0046 s⁻¹) compared to the ADP state (0.56 s⁻¹). We also assessed the dissociation of surviving ADP-BeFx-Arp2/3 complexes using unlabelled Arp2/3 complexes (previously in figure 4B, repeated experiment shown in new supp. figure S10), which also indicates a remarkable stability.

      The dissociation curve of surviving Arp2/3 complexes in the presence of 50 mM Pi and 200 µM ATP in solution reflects a mixture of Arp2/3 complexes dissociating in the ADP/ATP state and ADP-Pi-Arp2/3 complexes that can either dissociate in the ADP-Pi state or lose their Pi and dissociate in the ATP state. Despite the presence of 50 mM Pi, ADP dissociation and ATP reloading are much faster than Pi binding. Fitting this survival curve with a function that accounts for the two initial populations and the evolution of the ADP-Pi population (see Methods) gives a good estimate of the Pi release rate.
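      A minimal version of such a two-population survival function is sketched below. It is a hypothetical parameterization of the scheme described in this paragraph, not the function given in the Methods: a fraction of surviving complexes starts in the ADP-Pi state and either dissociates directly or releases Pi and then dissociates at the ADP/ATP-state rate, while the remaining fraction starts in the ADP/ATP state. Pi rebinding is neglected, consistent with the statement that nucleotide exchange outpaces Pi binding.

```python
import math


def surviving_fraction(t, frac_adp_pi, k_off_adp_pi, k_pi_release, k_off_other):
    """Fraction of surviving Arp2/3 complexes still bound to the mother at time t (s).

    Hypothetical scheme (not the Methods function):
      ADP-Pi state  --k_off_adp_pi-->  dissociated
      ADP-Pi state  --k_pi_release-->  ADP/ATP state  --k_off_other-->  dissociated
    A fraction (1 - frac_adp_pi) starts in the ADP/ATP state; Pi rebinding is neglected.
    """
    a = k_off_adp_pi + k_pi_release  # total exit rate from the ADP-Pi state
    still_adp_pi = math.exp(-a * t)
    if math.isclose(a, k_off_other):
        # Degenerate case of the sequential-decay solution.
        converted = k_pi_release * t * math.exp(-a * t)
    else:
        converted = (k_pi_release / (k_off_other - a)
                     * (math.exp(-a * t) - math.exp(-k_off_other * t)))
    started_other = math.exp(-k_off_other * t)
    return frac_adp_pi * (still_adp_pi + converted) + (1.0 - frac_adp_pi) * started_other


# Rates of the order quoted in this response; the 0.5 starting fraction is invented.
params = dict(frac_adp_pi=0.5, k_off_adp_pi=0.0046, k_pi_release=0.05, k_off_other=0.56)
for t in (0, 5, 20, 60, 120):
    print(t, round(surviving_fraction(t, **params), 3))
```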

      OPTIONAL: Further, but beyond the scope of the present paper, would be titrating phosphate in these experiments, which would even allow the authors to independently verify the reduced Pi affinity for Arp2/3 in the mother filament. Of note, this affinity difference is needed to satisfy detailed balance in the reaction scheme (Fig 4 D)!

      We thank the reviewer for this suggestion. High concentrations of phosphate in the buffer render glass surfaces quite sticky in our assays. We’ve tried several different passivation strategies (BSA, PLL-PEG, κ-casein, …) but none gave satisfactory results. Titrating phosphate beyond 50 mM therefore proved to be quite challenging.

      Detailed balance, considering the two possible routes connecting the ADP-Pi-Arp2/3 complex branch junction state and the surviving ADP-Arp2/3 complex state, can be written as

      $$K^{\mathrm{Pi\ rel.}}_{\mathrm{branch\ junction}} \cdot K^{\mathrm{ADP\text{-}Arp2/3}}_{\mathrm{debranching}} = K^{\mathrm{ADP\text{-}Pi\text{-}Arp2/3}}_{\mathrm{debranching}} \cdot K^{\mathrm{Pi\ rel.}}_{\mathrm{surviving\ Arp2/3}}$$

      Some of these affinity constants are not known, because reverse reaction rates, such as the rebinding of a daughter filament to a surviving Arp2/3 complex, cannot be determined. It is thus hard to determine how the affinity of Pi for the Arp2/3 complex changes between Arp2/3 complexes at branch junctions and surviving Arp2/3 complexes on mother filaments.

      While we cannot determine the affinity constant of Pi for a surviving Arp2/3 complex, our data indicate that the dissociation rate of Pi is higher from Arp2/3 complexes at branch junctions (koff = 0.21 s⁻¹) than from surviving Arp2/3 complexes (koff = 0.05 s⁻¹). This unexpected finding indicates that surviving Arp2/3 complexes adopt a conformation where nucleotides are readily exchanged, but where the ‘back door’ for Pi release is less open. We now discuss this point in our revised manuscript.

      4. Importance of "surviving" ADP-Pi-Arp2/3 complexes. The authors show a) rapid turnover of Pi on the ADP-Arp2/3 complex in both the branch-bound and mother filament-bound states and b) the lowered Pi affinity of the latter. Nonetheless, they emphasize the importance of long-lived "surviving" ADP-Pi-bound complexes on the mother (even stated in the abstract). I understand that this fraction appears under some experimental conditions (BeFx), but unless I am missing something, most complexes should rapidly lose their phosphate and either exchange nucleotide or dissociate from the mother under physiological conditions. Please clarify or tone down.

      Essential; textual revision only

      We thank the reviewer for their remark. We have tried to clarify this aspect in the manuscript.

      As we now show using the departure rate of fluorescent surviving Arp2/3 complexes together with branch renucleation data, surviving ADP-Pi-Arp2/3 complexes are quite stable on mother filaments because they detach and release their Pi slowly, such that branch regrowth will occur provided there is actin in solution. In the absence of actin monomers, as the reviewer correctly points out, the surviving ADP-Pi-Arp2/3 complex will predominantly release its Pi and thus become a surviving ADP-Arp2/3 complex. We have modified the text to avoid any confusion.

      5. GMF mechanism. The authors claim that GMF "...accelerates the departure of the surviving Arp2/3 complex from the mother...". I assume that they infer this from the decrease in the re-nucleation ratio. However, alternatively, GMF could simply dwell on the complex, inhibiting re-nucleation without promoting dissociation from the mother. The authors should either monitor Arp2/3 dwell times directly to discriminate between these possibilities or be more cautious in their conclusions.

      Essential; simple experiment (a week's time) or textual revision.

      In Ghasemi et al. Sci. Adv. 2024, we examined the departure of Arp2/3 from the mother filament after GMF-induced debranching using fluorescent Arp2/3. Most of the fluorescent Arp2/3 complexes dissociated from mother filaments within the same frame as the branch, i.e. within 0.5 seconds after the debranching event, and none were visible after another second. This could be due to Arp2/3 departing with the branch or to an accelerated departure after branch dissociation. In any case, this rules out the possibility that GMF would dwell on the surviving complex for a substantial amount of time without promoting dissociation from the mother.

      In the present manuscript, we now show that increasing the ATP concentration 10-fold (from 0.2 to 2 mM) is sufficient to restore the branch renucleation ratio to its level without GMF. This shows that GMF does not cause Arp2/3 to leave with the branch, but rather that it (also) acts on the surviving Arp2/3 complex, in a way that is countered by high concentrations of ATP. More specifically, it suggests that GMF accelerates the departure of the surviving ADP-Arp2/3 complex, directly and/or by hindering the reloading of ATP, and that GMF does not affect the surviving Arp2/3 complex once it has reloaded ATP.

      We now discuss these two non-mutually exclusive possibilities for the accelerated dissociation of the surviving ADP-Arp2/3 complex in the manuscript.
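      The logic of the ATP rescue can be sketched as simple kinetic partitioning of the surviving ADP-Arp2/3 complex between ATP reloading (pseudo-first-order in ATP) and departure from the mother. The 0.56 s⁻¹ departure rate is the ADP-state value quoted earlier in this response; the reload constant and the GMF effect size are hypothetical placeholders. GMF is modeled either as accelerating departure or as slowing reloading, the two possibilities discussed above.

```python
def reload_probability(atp_uM, k_reload_per_uM_s, k_off_adp_per_s):
    """Probability that a surviving ADP-Arp2/3 reloads ATP before leaving the mother.

    Competition between two first-order processes: ATP reloading
    (pseudo-first-order, k_reload * [ATP]) and departure (k_off_adp).
    """
    k_reload = k_reload_per_uM_s * atp_uM
    return k_reload / (k_reload + k_off_adp_per_s)


K_RELOAD = 0.01   # uM^-1 s^-1, hypothetical placeholder
K_OFF_ADP = 0.56  # s^-1, ADP-state departure rate quoted in this response
GMF_FACTOR = 5.0  # hypothetical effect size of GMF

scenarios = {
    "no GMF": (K_RELOAD, K_OFF_ADP),
    "GMF speeds departure": (K_RELOAD, K_OFF_ADP * GMF_FACTOR),
    "GMF slows reloading": (K_RELOAD / GMF_FACTOR, K_OFF_ADP),
}
for name, (k_load, k_off) in scenarios.items():
    low = reload_probability(200.0, k_load, k_off)    # 0.2 mM ATP
    high = reload_probability(2000.0, k_load, k_off)  # 2 mM ATP
    print(f"{name:22s} 0.2 mM ATP: {low:.2f}   2 mM ATP: {high:.2f}")
```

      In this minimal scheme the two GMF mechanisms are indistinguishable, because the partition depends only on the ratio of the reload and departure rates; in both cases a 10-fold increase in ATP largely closes the gap with the no-GMF case. Distinguishing them would require an additional observable, such as direct dwell-time measurements.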

      6. Cortactin mechanism and the "leash model". I must say that the cortactin data are the most puzzling part of the paper and hard to reconcile with what we know from structure. I was hoping to find some of this resolved in the discussion. However, I do not understand the "leash model" in the discussion section for cortactin-mediated branch stabilization: "This would explain the observed increase in branch survival compared to the absence of cortactin. As the pulling force is increased, this rebinding mechanism becomes less efficient." According to my understanding of the data, this is the opposite of what happens. Cortactin only stabilizes the labile interface at elevated forces! Some re-writing might help here.

      Essential; textual revision.

      We thank the reviewer for having us think more thoroughly about the model we initially proposed. We now believe that our ‘leash’ mechanism is not able to fully recapitulate our observations in a simple and satisfactory manner.

      We now propose a much simpler model, where the binding of cortactin to the Arp2/3 complex at the branch junction simply changes the energy landscape of the Arp2/3-daughter interface without the need to invoke a rebinding of the daughter filament upon branch departure. We have updated our interpretation of the data in the Discussion section accordingly.

      Overall, our results on the impact of cortactin on branch renucleation highlight a surprising behaviour that will require further investigation to fully decipher the underlying molecular mechanism.

      3) Minor comments

      Organization: - I do not want to impose on how to best tell the story, but I felt that Fig1 A-D and Fig 2 A-B belong to one logical unit (nucleotide dependence), whereas Fig 1 E-F and Fig 2 C belong to the other (Pi binding and exchange). Perhaps consider re-organizing to streamline presentation?

      We thank the reviewer for their suggestion. We agree that it flows more naturally as suggested, and have made the changes! Thank you.

      Semantics/Typos: - Abstract: “... ADP-Pi and ADP-Arp2/3 detach with the same exponential increase as a function of force...”. Increase should refer to the dissociation rate, which should be added to the sentence.

      We have corrected this.

      Results page 8: "...and the majority of Arp2/3 complexes detach from the mother filament while remaining bound to the branch at the debranching time." "Branch" should likely be daughter here, as there is no branch after dissociation of either interface.

      We have corrected this, thank you.

      Results page 13: "Exposing ADP-BeFx-Arp2/3 complex branch junctions to a saturating amount of GMF...". It is strange to imply saturation, because GMF likely simply does not bind to the complex in this nucleotide state with appreciable affinity. Suggest to change to "high".

      We have made the changes accordingly.

      Discussion page 18: "Moreover, in mammalian Arp2/3, His80 in Arp3 (corresponding to His73 in mammalian actin) is not methylated, and corresponds to residue N77 in Arp3, which is also not modified." N77 likely belongs to Arp2?

      We have made the changes accordingly.

      Discussion page 19: "We showed that Pi affinity for Arp2/3 complexes at branch junctions is around 3.7 mM (Fig. 1), a value which lies within the reported 1-10 mM Pi concentration measured in the cytosol in different mammalian cell types". Notably, this is not too different from F-actin, which should be mentioned. By this measure alone, free inorganic phosphate could also directly regulate actin filament stability!

      We now mention this and discuss that intracellular Pi can also impact actin filament nucleotide state.
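      As a rough illustration of what a Pi affinity of ~3.7 mM implies across the 1-10 mM cytosolic range cited above, the sketch below assumes simple single-site binding at equilibrium, so that the fraction of branch-junction Arp2/3 complexes carrying Pi is [Pi] / ([Pi] + Kd). This is an assumption for illustration only: it ignores binding kinetics, competing ligands and any cooperativity, and is a back-of-the-envelope estimate rather than a result of the study.

```python
KD_PI_MM = 3.7  # Pi affinity for branch-junction Arp2/3 quoted above (mM)


def fraction_pi_bound(pi_mM: float, kd_mM: float = KD_PI_MM) -> float:
    """Equilibrium occupancy of a single binding site: [Pi] / ([Pi] + Kd).

    Assumes simple, non-cooperative binding at equilibrium (illustrative only).
    """
    return pi_mM / (pi_mM + kd_mM)


# Span of cytosolic Pi concentrations cited in the discussion (1-10 mM).
for pi_mM in (1.0, KD_PI_MM, 10.0):
    print(f"[Pi] = {pi_mM:4.1f} mM -> fraction Pi-bound ~ {fraction_pi_bound(pi_mM):.2f}")
```

      Under these assumptions, occupancy would range from roughly one fifth to roughly three quarters across the cited range, so changes in cytosolic Pi within physiological bounds could plausibly shift the balance between ADP and ADP-Pi branch junctions.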

      Future interest (non essential): - It would be utterly exciting (but beyond current scope) to quantify how instantaneous debranching rates evolve for naturally aging branches starting from ATP-Arp2/3 complexes!

      We thank the reviewer for this remark. It is indeed quite beyond the scope of the current study, as this would require a way to probe ATP-Arp2/3 complex branches while daughter filaments are still quite short (so pulling on them is difficult). An interesting alternative could be to use ATP analogs, such as App-NHp (aka AMP-PNP), to stabilize this state. However, some studies have mentioned that App-NHp is not very stable.

      Significance

      General assessment:

      This is a compelling and carefully executed study that delivers a clear mechanistic framework for how Arp2/3 branch junctions fail and re‑form under load. The central strength is the tight integration of state‑of‑the‑art reconstitutions with careful and original kinetic analysis. The experimental design is elegant and experiments have been carried out to a masterful standard. The figures are clear, the statistics are appropriate with some exceptions as detailed above. There are very few labs in the world that could have achieved this feat!

      A few aspects could be further strengthened, most notably the explanation and application of the "two slip bond" model as well as slightly more restraint in speculating around specific regulatory mechanisms. However, these are minor refinements that do not detract from the important contributions of the paper.

      Overall, the work clearly merits publication with high priority after revision; most requested changes are textual/analytical, with very few targeted experiments, which would substantially strengthen the core claims.

      We thank the reviewer for their positive evaluation of our manuscript. We hope that our responses to the detailed points above, along with the corresponding revisions of the manuscript, will alleviate their concerns.

      Advance relative to prior literature: The major novel findings of the paper are already summarized above. There is some recent work on the subject of branch mechanics by the authors (Ghasemi et al 2024, PMID: 38277459) and others (Pandit et al 2020 PMID: 32461373), but the focus of the present work is clearly unique and there is plenty of novel insight.

      Audience and impact: Primary audience: specialists in cytoskeleton dynamics, in vitro reconstitution, single molecule biophysics, and mechanobiochemistry. Secondary: researchers in cell motility, morphogenesis and mechanobiology, physicists working on active matter and modelers studying force producing and load-bearing biopolymer networks. The results and analysis framework should inform quantitative models of branched network turnover under load and the interpretation of regulatory factor action in vivo and in cells.

      Reviewer expertise: Actin dynamics; biochemical reconstitution; single molecule approaches; biophysics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Xiao et al examine the molecular events occurring when Arp2/3 complex-mediated actin filament branches are removed from mother actin filaments. They do this using a microfluidics assay with purified proteins combined with single filament TIRF imaging of branched actin filaments with distinct fluorescent labels. The contributions of different nucleotide states of the Arp2/3 complex are tested in conjunction with the force exerted on the branches and the involvement of the regulatory proteins GMF and cortactin. The data seem comprehensive and highly quantified in response to concentration, force, fraction of branches, survival times and branching rates. They find that ADP-BeFx and high phosphate concentrations (leading to the ADP-Pi state) lead to a slower debranching rate at a given applied force. The ability to rapidly switch the buffer gives powerful information about response times of debranching compared with other actin remodelling events. They use renucleation experiments to determine that debranching most often occurs at the Arp2/3 complex/daughter interface, showing that filaments will be ready to re-branch in the stable ADP-Pi bound state. GMF addition allows debranching of the ADP state to occur at a lower force. Cortactin acts similarly to the ADP-Pi state to increase branch stability.

      Specific comments

      The pulling force on the branches seems to arise from different flow rates in the microfluidics. Viscous drag is mentioned and I can see there is methylcellulose in the buffer. It would be helpful to have the explanation of the conversion between flow and force, even if it has been standard in previous work.

      We apologize if this was unclear: in microfluidics experiments, the buffer does not contain methylcellulose. Methylcellulose is only used for ‘open chamber’ experiments, where no force is applied to Arp2/3 branches, to maintain them in the TIRF field of excitation (Figure S2).

      To better clarify the conversion between flow and force, we have rephrased and extended the Methods section to explain how the force on the branch junction is computed based on the local flow velocity and the length of the daughter filament.

      Pg 5 - what was the motivation to titrate phosphate? It seems a stretch that intracellular Pi levels are tuning branching inside cells more than protein-mediated control (GMF or cortactin) - can the authors evidence this at all?

      We are not claiming that the level of Pi plays a stronger regulatory role than proteins. We show that inorganic phosphate tunes the state of the Arp2/3 complex, which in turn modulates the action of regulatory proteins, such as GMF and cortactin.

      Nonetheless, we do show that the contribution of inorganic phosphate is quite central as it can (1) strongly stabilize branch junctions (~30-fold decrease in the dissociation rate), and (2) tune the activity of GMF and cortactin on Arp2/3 complexes at branch junctions as well as on the ‘surviving’ Arp2/3 complexes that remain bound to mother filaments.

      We thus titrated phosphate and found that its impact on Arp2/3 complex stability is significant over the range of Pi concentrations found in cells. For the sake of completeness, and following a comment from reviewer #1, we now also mention the affinity of Pi for actin subunits in filaments in the Discussion, and discuss the impact of intracellular Pi on actin itself.

      Minor comments

      • In the introduction, while the structural and mutagenesis evidence is clearly stated, in other cases a bit more detail would be helpful, e.g. 'biochemical studies', which referred to measurements of hydrolysis rates using radiolabelling

      We have made changes to more precisely define which biochemical assays were used in previous studies.

      • Page 3 Figures shouldn't be referenced in the introduction

      We have removed the references to the figures from the introduction.

      • Page 3 slip bond behaviour needs explanation

      We now explain the concept where it is first introduced in the manuscript, as follows: “The debranching rate seems to increase exponentially with the applied pulling force, in the range of 0 to 6 pN (Fig. 1F; see more refined analysis below). This behaviour of accelerated debranching with the increase of the applied force is similar to the ‘slip bond’ concept, as predicted by the Bell-Evans model of the force-dependent lifetime of the interaction between two proteins”.

      • Figure 1B seems to be a theoretical schematic which is superfluous

      We suppose that the reviewer is actually referring to figure 3B of the initial manuscript, describing the energy potential of a molecular interaction as a function of the reaction coordinate. We agree with the reviewer that it is not absolutely required and we have removed it.

      • Figure 4D is helpful, different weight lines might help even more to explain the dominant pathways

      We have made modifications to the biochemical reaction scheme in this figure (now figure 5F in the revised version). We hope we succeeded in improving its readability. Since the different paths depend on mechano-chemical parameters, there is no real dominant pathway per se.
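      To make the slip-bond behaviour described in the response on slip bonds above more concrete: in the Bell-Evans model, the dissociation rate under a force F is k(F) = k0 exp(F x‡ / kBT), where k0 is the zero-force rate and x‡ the distance to the transition state. The Python sketch below is illustrative only: k0 and x‡ are hypothetical placeholders, not fitted parameters from the manuscript, and kBT ≈ 4.1 pN·nm assumes room temperature. It also shows that two states sharing the same x‡ but differing ~30-fold in k0 keep that same ratio at every force, which is one simple reading of ADP-Pi and ADP-Arp2/3 detaching "with the same exponential increase as a function of force".

```python
import math

KBT_PN_NM = 4.1  # approximate thermal energy at room temperature, in pN*nm (assumption)


def bell_evans_rate(force_pn: float, k0_per_s: float, x_dagger_nm: float) -> float:
    """Slip-bond dissociation rate under force (Bell-Evans): k0 * exp(F * x_dagger / kBT)."""
    return k0_per_s * math.exp(force_pn * x_dagger_nm / KBT_PN_NM)


# Hypothetical parameters for illustration only (NOT values from the manuscript).
X_DAGGER_NM = 2.0          # assumed distance to the transition state (nm)
K0_ADP = 0.03              # assumed zero-force debranching rate, ADP state (s^-1)
K0_ADP_PI = K0_ADP / 30    # assumed ~30-fold slower ADP-Pi state (s^-1)

for force in (0.0, 2.0, 4.0, 6.0):  # force range mentioned in the text, 0-6 pN
    k_adp = bell_evans_rate(force, K0_ADP, X_DAGGER_NM)
    k_adp_pi = bell_evans_rate(force, K0_ADP_PI, X_DAGGER_NM)
    # With a shared x_dagger, the ADP / ADP-Pi ratio is the same at every force.
    print(f"F = {force:.0f} pN: k_ADP = {k_adp:.3g} s^-1, "
          f"k_ADP-Pi = {k_adp_pi:.3g} s^-1, ratio = {k_adp / k_adp_pi:.1f}")
```

      Plotting log k against F gives a straight line of slope x‡/kBT; that linearity is what an exponential increase of the debranching rate with force refers to.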

      **Referee cross-commenting**

      Rev1 sounds like the specialist here. I can't comment on their requests. Some similar points arise between the reviewers which need addressing.

      Reviewer #2 (Significance (Required)):

      Significance

      Taking a look at references 16 and 19, I do not find it clear what is achieved differently in the current work compared to these papers and what agrees and what disagrees. If it's a species difference I might expect the two species would be analysed side-by-side in this paper.

      We thank the reviewer for this important comment. The goal of our study was not to compare the behaviour of mammalian and yeast Arp2/3 complexes.

      We now try to better explain that the motivation of the present work is to address how the nucleotide state of the Arp2/3 complex tunes the mechanosensitive stability of actin branches and regulates interactions with well-known Arp2/3 complex binding proteins. Most of these reactions are quantified here for the first time. Moreover, the experiments with branch junctions in different nucleotide states are done under controlled mechanical conditions, providing the first direct measurements of the force-dependence of the debranching reactions. Our detailed kinetic analysis of the full reaction scheme allows us to model the different binding interfaces of the Arp2/3 complex.

      In addition, it is worth noting that:

      1. Species matter, and this is why refs 16 and 19 can give the impression of disagreeing on the ability to renucleate branches, which relies on the stability of surviving Arp2/3 complexes on mother filaments.
      2. In ref 16 (Pandit et al, PNAS 2020) species are mixed (yeast Arp2/3 and mammalian alpha actin from skeletal muscle), likely leading to a different behaviour compared to the only mammalian protein situation we examine in our current work. In particular, with mixed species one misses the ability to renucleate, as shown in our previous study Ghasemi et al (ref 19). However, since mixing species does not correspond to anything physiological, we do not think it is worth repeating these conditions alongside our experiments.
      3. Further, the analysis carried out in ref 16 suffers from important limitations: the force was unknown (not calibrated) and the data was fitted by a model that compounded several reactions, providing only an indirect estimation of the rates, in particular at zero force. In contrast, we have worked with calibrated forces (including dedicated experiments at zero force) and we have carried out specific experiments to directly measure several rates.
      4. In ref 19 (our earlier work) we did not investigate the impact of the nucleotide state of the branch junction at all, and we did not systematically measure the dissociation rates as a function of force.

      Unlike Pandit et al, we directly measure the difference in branch stability at zero force between the ADP and ADP-Pi states and show that the ~30-fold difference holds true at all probed forces. Finally, the force dependence of the branch renucleation success rate gives us crucial information on which of the two Arp2/3 complex interfaces ruptures first.

      I'm not understanding how the authors can distinguish effects of adding phosphate and BeFx on Arp 2 and 3 compared to effects on actin. Importantly, are possible accompanying changes in the actin filament a confounding factor?

      We have checked that the nucleotide state (ADP-BeFx and ADP-Pi versus ADP) of the mother and daughter filaments have no impact on branch stability:

      • In the experiments shown in figure 2F, where the buffer to which branches are exposed is quickly switched from a phosphate-containing buffer to one without phosphate, we observe a rapid change in branch stability. Actin subunits at the branch junction are in the F-actin conformation according to recent cryo-EM observations (ref. Chavali et al, Nat Comm. 2024; Liu et al, NSMB 2024). These actin subunits, initially in the ADP-Pi state, are expected to age and become ADP with a rate of ~0.007 s-1 (i.e. a half-time of ~100 s; ref. Jegou et al, PLoS Biology 2011, Oosterheert et al, NSMB 2023), a much lower rate than that of the observed change in the debranching rate (0.21 s-1). This means that the debranching rate is independent of the nucleotide state of the daughter and mother filaments.

      • In new supp. Figure S4, we show that the debranching rate is similar for ADP-Arp2/3 complex branch junctions initiated from ADP- or ADP-BeFx-actin mother filaments.

      • In new supp. Figure S9, we initially exposed branch junctions to a BeFx solution, then monitored debranching and branch renucleation in our standard buffer (i.e. without BeFx or Pi). We observed multiple rounds of branch renucleation, the first with ADP-BeFx-actin daughter filaments, and subsequent ones with daughter filaments never exposed to BeFx. They all had the same debranching rates and renucleation success rates.
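      The timescale argument in the first point above can be checked with first-order kinetics, where the half-time is t½ = ln 2 / k. The sketch below uses only the two rates quoted in the response (0.007 s-1 for ageing of ADP-Pi actin subunits, 0.21 s-1 for the change in debranching rate) and assumes both processes are single-exponential; it is a consistency check, not part of the published analysis.

```python
import math


def half_time(rate_per_s: float) -> float:
    """Half-time of a single-exponential (first-order) process: ln(2) / k."""
    return math.log(2) / rate_per_s


# Rates quoted in the response above (s^-1).
K_ACTIN_PI_RELEASE = 0.007  # ageing of ADP-Pi actin subunits to ADP
K_DEBRANCH_SWITCH = 0.21    # change in debranching rate after Pi washout

t_actin = half_time(K_ACTIN_PI_RELEASE)
t_switch = half_time(K_DEBRANCH_SWITCH)

# Fraction of junction actin subunits still ADP-Pi when the debranching
# rate has completed half of its change (first-order decay exp(-k t)).
still_adp_pi = math.exp(-K_ACTIN_PI_RELEASE * t_switch)

print(f"actin ageing half-time:           {t_actin:.0f} s")
print(f"debranching switch half-time:     {t_switch:.1f} s")
print(f"actin still ADP-Pi at that point: {still_adp_pi:.0%}")
```

      Under these assumptions, roughly 98% of junction actin subunits would still be in the ADP-Pi state when the debranching rate is already halfway to its new value, which is why the fast switch is attributed to the Arp2/3 complex rather than to actin.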

      The paper is quite specialist to read and the advance appears to be incremental. My expertise is in molecular pathways to actin regulation outside the main area of the paper.

      The results we present in this study are often unexpected, and some run counter to long-standing assumptions. The regulation of Arp2/3-nucleated branches is important for the stability and the force-generating capabilities of many actin networks in cells. Finally, most of the measurements that we present had never been done before, mainly because these experiments are difficult and require specific tools to monitor several events while controlling the applied force.

      We believe our results are of broad interest precisely because they challenge these assumptions. We have rewritten the text in several places to convey our message more clearly.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Please find enclosed the review of the manuscript "Inorganic phosphate in Arp2/3 complex acts as a rapid switch for the stability of actin filament branches" by Xiao et al.

      The authors provide a detailed investigation of how the nucleotide bound to the Arp2/3 complex affects branch stability under flow force. From a kinetic perspective, this is an elegant study with generally high-quality data, although some conclusions rest on assumptions rather than direct experimental evidence.

      We thank the reviewer for their positive feedback. We have improved our manuscript and performed important additional experiments to provide more direct experimental evidence of our conclusions.

      A key question concerns the physiological relevance of these findings. For instance, the concept of branch regrowth may not be applicable in cellular contexts, since forces generated by actin polymerization would displace existing branches away from the sites where they generate these active forces. The authors should clarify the relevance of regrowth during active force generation by branched networks.

      We thank the reviewer for this comment. Our in vitro results indeed point to a previously unreported property of branched actin networks, i.e. the ability of Arp2/3 complexes to readily renucleate branches in the ADP-Pi state and that it does require reloading ATP within Arp2/3.

      Branched actin networks, especially lamellipodia and endocytic patches, do exert active forces thanks to the polymerization of individual branches at the front. However, the whole actin network is also exposed to stress, and the architecture of the network (inter-branch distance, crosslinks between branches, …) presumably strongly impacts its mechanical properties.

      In other types of branched actin networks, such as the actin cortex, myosin motors put the whole network under tension. Depending on their amplitude, such pulling forces on actin branches can lead to branch regrowth and network self-repair.

      We have modified the text to make the physiological relevance clearer.

      Additionally, all experiments employ flow conditions that branches would probably not experience in cells. Notably, the flow direction in the cellular context would be reversed. Altering the flow direction relative to the branches could affect not only the relationship between flow rate and branch stability, but potentially other system properties as well.

      We agree with the reviewer that in cells branches will not experience flow conditions similar to the ones we use in our in vitro assay. Nonetheless, in cells we expect mechanical stress on the branch junction to be applied in all directions. In lamellipodia, the compressive force applied at the leading edge is expected to result in diverse local orientations of the force on individual branch junctions within the network (as explained in Lappalainen et al. Nat Rev MBC 2022). Also, branch junctions are found in the cell cortex, where they are exposed to pulling forces resulting from the action of myosin motors and crosslinkers on mother and daughter filaments.

      The impact of flow direction was addressed in our previous publication (Ghasemi et al, Sci. Adv. 2024, figure 2) and, to a lesser extent, by the lab of Enrique de la Cruz in Pandit et al, PNAS 2020 (ref. 16). We reported that flow direction has a minimal effect, if any, on the branch dissociation rate and renucleation ratio.

      Reviewer #3 (Significance (Required)):

      Furthermore, the study appears not to account for the mother filament (particularly its nucleotide state) or the actin subunit bound to the Arp2/3 complex. The authors should discuss why their interpretation focuses exclusively on the Arp2/3 complex rather than on the actin filaments or Arp2/3-bound actin subunit.

      We have checked that the nucleotide state (ADP-BeFx and ADP-Pi versus ADP) of the mother and daughter filaments has no impact on branch stability:

      • In the experiments shown in figure 2F, where the buffer to which branches are exposed is quickly switched from a phosphate-containing buffer to one without phosphate, we observe a rapid change in branch stability. Actin subunits at the branch junction are in the F-actin conformation according to recent cryo-EM observations (ref. Chavali et al, Nat Comm. 2024; Liu et al, NSMB 2024). These actin subunits, initially in the ADP-Pi state, are expected to age and become ADP with a rate of ~0.007 s-1 (i.e. a half-time of ~100 s; ref. Jegou et al, PLoS Biology 2011, Oosterheert et al, NSMB 2023), a rate much lower than that of the observed change in the debranching rate (0.21 s-1). This means that the debranching rate is independent of the nucleotide state of the daughter and mother filaments.

      • In new supp. Figure S4, we show that the debranching rate is similar for ADP-Arp2/3 complex branch junctions initiated from ADP- or ADP-BeFx-actin mother filaments.

      • In new supp. Figure S9, we initially exposed branch junctions to a BeFx solution, then monitored debranching and branch renucleation in a regular buffer. We observed multiple rounds of branch renucleation, the first with ADP-BeFx-actin daughter filaments, and subsequent ones with daughter filaments never exposed to BeFx. They all had the same debranching rates and renucleation success rates.

      An important concern involves the use of KPi (inorganic phosphate). Based on our experience, KPi appears to have effects beyond simply impacting the nucleotide state: actin filaments seem to assemble differently in the presence of KPi. The authors should exercise caution in their interpretation of KPi-based experiments.

      KPi concentrations up to 50 mM did not slow down the barbed-end elongation rate in our experiments.

      Overall, while the technical quality and kinetic analyses are state-of-the-art, relating this work to physiological contexts remains challenging, and some conclusions appear overstated.

      We have made changes in the discussion to try to more clearly relate our in vitro observations and conclusions with the cellular context where branch renucleation could have a strong impact on the architecture and mechanics of actin networks.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      While the authors have proved their hypothesis by temporally increasing the activity of cholinergic neurons at different life stages through the auxin-inducible degron system, their work raises two major concerns. First, they might want to discuss the conflicting data from Zullo et al (Nature 2019, vol 574, pp 359-364). For example, the authors show that increasing the activity of acr-2-expressing neurons after the 7th day of adulthood increases lifespan. However, Zullo et al (2019) show that the reciprocal experiment, inhibiting cholinergic neuron activity on the 1st day or the 8th day of adulthood, also increases lifespan. Is this because the two studies are using different promoters, that of the acr-2 ACh receptor (this work) versus that of the unc-17 vesicular ACh transporter (Zullo et al., 2019)? The two genes are expressed in different subsets of cells that do not completely overlap. CeNGEN shows that acr-2 is expressed in motor and non-motor neurons, but some of these neurons are also different from those that express unc-17. Is it possible that different cholinergic neurons also have opposite lifespan effects during adulthood? Or is it because both lack of signaling and hypersignaling can lead to a long-life phenotype? Leinwand et al (eLife 2015, vol 4, e10181) previously suggested that disturbing the balance in neurotransmission alone can extend lifespan. A simple discussion of these possibilities in the Discussion section is likely sufficient. Or can the auxin treatment and removal be confounding factors? Loose and Ghazi (Biol Open 2021, vol 10, bio058703) show that auxin IAA alone can affect lifespan and that this effect can depend on the time the animal is exposed to the auxin.

      We thank the reviewer for the thoughtful comments and valuable suggestions. In response, we have expanded the Discussion section to address the points raised, as detailed below.

      We fully agree with the reviewer that the different results between our study (activating acr-2-expressing neurons) and Zullo et al. (inhibiting unc-17- expressing neurons) are most likely due to the distinct cholinergic neurons targeted. Our new preliminary data further support this neuron-specific model, as inhibition of acetylcholine synthesis at mid-late life stages produces opposing lifespan effects in different cholinergic neurons. At the same time, we cannot rule out the alternative possibility raised by the reviewer (eLife, 2015) that both activation and inhibition of neuronal activity may extend lifespan by similarly disrupting the balance of neurotransmission. This hypothesis requires further experimental validation in the context of cholinergic motor neurons. Regarding the potential technical concern related to auxin exposure (Biol Open, 2021), our control experiments using 0.5 mM auxin did not show non-specific lifespan effects.

      Accordingly, in the revised manuscript, we have discussed the first two possibilities in the Discussion by stating (pages 17-18): “Nevertheless, it is still unclear whether other neuronal populations share similar temporal regulatory mechanisms. A previous study reported that inhibiting cholinergic neuron activity (using the unc-17 promoter) extends lifespan regardless of timing[2], which is different from the temporal lifespan regulation we observed in cholinergic motor neurons (using the acr-2 promoter). This discrepancy is likely due to differences in the subsets of neurons targeted, as the unc-17 promoter labels a broad repertoire of cholinergic neurons, while the acr-2 promoter mainly marks cholinergic motor neurons[53]. Thus, the distinct lifespan-modulating effects of cholinergic motor neurons may be overshadowed by opposing contributions from other cholinergic subtypes when a mixed population is manipulated. Alternatively, both activation and inhibition of cholinergic activity may perturb the balance of neurotransmission, leading to similar effects on lifespan[54]. It will be interesting to test these hypotheses in future studies.”

      Second, the daf-16-dependence of the early longevity-inhibiting effect of ACh signaling needs clarification and further experimentation. The authors present a model in Figure 6D, where DAF-16 inhibits longevity. This contradicts published literature. Libina et al (Cell 2003, vol 115, pp 489-502) have shown that intestinal DAF-16 increases lifespan. From the authors' data, it is possible that ACh signaling inhibits DAF-16, not promotes it as they have drawn in Figure 6D.

      We thank the reviewer for this important point. We agree that intestinal DAF-16 promotes longevity. Our original model in Figure 6D aimed to show that the larval pathway shortens lifespan by inhibiting DAF-16, not that DAF-16 itself shortens lifespan. The arrowhead style used in the original Figure 6D may have given the impression that DAF-16 shortens lifespan; we apologize for this. We have now fixed this error in Figure 6D. In addition, as suggested, we have performed additional daf-16 experiments (see below).

      In Figure 3F, the authors used Pacr-2::TeTx, which inhibits cholinergic neuron activity, to show an increase in the expression of DAF-16 targets. Why did the authors not use the worms that express the transgene Pacr-2::syntaxin(T254I), which increases cholinergic neuron activity? What happens to the expression of DAF-16 targets in these animals? Does their expression go down? What happens if intestinal daf-16 is knocked down in animals with increased cholinergic neuron activity, instead of reduced cholinergic neuron activity?

      Thank you for these insightful questions. In Figure 3F-H, we used TeTx instead of syntaxin(T254I) to investigate the function of DAF-16 in the early-stage pathway for two main reasons. First, the Pacr-2::TeTx transgene extends lifespan in early life by inhibiting cholinergic activity, which provides a genetic background complementary to that of syntaxin(T254I) for characterizing the role of DAF-16. Second, the TeTx manipulation is expected to activate DAF-16 and upregulate its target genes; this approach is more sensitive than measuring gene downregulation in Pacr-2::syntaxin(T254I) transgenic worms.

      We fully agree with the reviewer that performing the corresponding experiments in the syntaxin(T254I) background would strengthen the overall evidence. As suggested, we have now examined the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms, and performed intestine-specific RNAi of daf-16 in the same background. We found that these worms exhibit downregulation of DAF-16 target genes. Furthermore, intestinal daf-16 knockdown did not further shorten the already reduced lifespan of these transgenic worms. Together, these results from both the TeTx and syntaxin(T254I) lines confirm that cholinergic motor neurons require DAF-16 in the intestine to regulate lifespan. These new data are now described in Figure S5A-S5D (pages 11-12): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons.”, and “RNAi of daf-16 in the intestine abolished the ability of cholinergic motor neurons to regulate lifespan at early life stage (Figure 3G, 3H and Figure S5C-S5E).”

      Recommendations for The Authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) “The Methods section needs to be clarified/expanded.”

      (a) “For example, are the authors using indole-3-acetic acid or a synthetic auxin? How long does it take for syntaxin to be made after the removal of the auxin?”

      We have now included the auxin type and recovery time in the Methods section on auxin treatment by stating (page 24): “natural auxin indole-3-acetic acid (G&K Scientific)”, and “Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (b) “How much FUDR was used in some of the lifespan assays?”

      2 μg/mL FUDR was used in some of the lifespan assays. We have now included the concentration in the Methods section on lifespan assays by stating (page 23, line 526): “2 μg/mL 5-Fluoro-2’-deoxyuridine (FUDR) was included in assays involving TeTx transgene worms, unc-31 and unc-17 mutant worms, which show a defect in egg laying.”

      (c) “In line 494 of the Methods section, worms were anesthetized with 50 mM sodium azide. That concentration seems a bit high.”

      This was indeed an error; we used 5 mM NaN3. This has now been corrected in the text (line 548).

      (d) “What are the concentrations of the transgenes used in the extrachromosomal arrays?”

      We have now included the concentrations in the Methods section on strains and genetics (lines 507-509, page 22): “Microinjections were performed using standard protocols. Each plasmid DNA listed above in the transgenic line was injected at a concentration of 50 ng/μL. Each marker for RNAi was co-injected at a concentration of 25 ng/μL.”

      (2) “Gene expression can vary in different parts of the worm intestine. Do the measurements in Figure 6C represent the entire intestine or only certain parts of the intestine?”

      We have now specified the intestinal area used for quantification in the Methods section on microscopy (page 24): “and the entire intestine area was selected by ImageJ”, and in the legend of Figure 6C (page 36): “The entire intestinal area was selected for measurement.”

      (3) “In Figure S1C, does tph-1 have a slight effect? Might serotonin partly counteract the effects of ACh?”

      We thank the reviewer for raising this interesting point regarding the potential role of serotonin. We have re-examined our data in Figure S2C (the original Figure S1C) and agree that loss of tph-1 partly counteracted the lifespan-shortening effect of the Pacr-2::syntaxin(T254I) transgene in early life, though the suppression over the whole lifespan is slight. To assess whether the acr-2 promoter-driven manipulation might directly affect serotonergic neurons, we checked the CeNGEN database. acr-2 transcripts can be detected in serotonergic neurons (ADF, HSN, and NSM), but at extremely low levels. It is therefore unlikely that the Pacr-2::syntaxin(T254I) transgene exerts its primary effect by substantially altering serotonin release. While an indirect interaction between cholinergic and serotonergic signaling in lifespan regulation remains possible, it falls beyond the primary focus of the current study, and we plan to follow this up in future studies. We now point this out in the text (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (4) “Where else is GAR-2 expressed? Might there be redundancies between neuronal and intestinal GAR-2?”

      We appreciate this insightful question. Based on available single-cell gene expression atlases of C. elegans at both embryonic and adult stages[1,2], gar-2 expression has been detected not only in neurons and the intestine, but also in additional tissues such as the muscle. Because neuronal or intestinal gar-2 RNAi did not affect the ability of cholinergic motor neurons to extend lifespan in mid-late life, and as also suggested by another reviewer, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, whereas muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppressed this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, with GAR-3 acting primarily in the intestine and GAR-2 primarily in the muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 in the muscle will be investigated further in future studies. The new data are now described in Figure S8 (pages 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H) had an effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (1) Packer, J. S. et al. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science 365, doi:10.1126/science.aax1971 (2019).

      (2) Roux, A. E. et al. Individual cell types in C. elegans age differently and activate distinct cell-protective responses. Cell Rep 42, 112902, doi:10.1016/j.celrep.2023.112902 (2023).

      (3) Chun, L. et al. Metabotropic GABA signalling modulates longevity in C. elegans. Nat Commun 6, 8828, doi:10.1038/ncomms9828 (2015).

      (4) Izquierdo, P. G. et al. Cholinergic signaling at the body wall neuromuscular junction distally inhibits feeding behavior in Caenorhabditis elegans. J Biol Chem 298, 101466, doi:10.1016/j.jbc.2021.101466 (2022).

      (5) “In line 344, please correct "fwork" to "work".”

      This has now been fixed.

      (6) “In line 360, please correct "acts" to "act".”

      This has now been fixed.

      (7) “Please check citations within the main text. Some of the citations do not fit the cited material. For example, in line 112, reference 28 is not about GABAergic neurons.”

      We thank the reviewer for pointing out these important details. We have now carefully checked and corrected the citations throughout the manuscript as suggested.

      Reviewer #2 (Recommendations for The Authors):

      (1) “How are the authors assessing the efficacy of the TeTx manipulations in their strains? Likely TeTx has a concentration-dependent effect. Are there any phenotypes associated with the loss of cholinergic signaling? Also, does TeTx expression in cholinergic neurons alter the neuronal activity of other associated neurons, or alter muscle integrity?”

      Thanks for the question. Our observations show that overexpression of TeTx results in defects including small body size, slow growth, egg-laying deficiencies, and severe locomotion impairment, all of which are associated with the loss of cholinergic signaling. While we did not directly examine the activity of interconnected neurons in our strains, we tested muscle integrity by recording the muscle response to 1 mM levamisole and found that overexpression of TeTx does not affect muscle integrity. To circumvent these pleiotropic complications, we instead employed syntaxin(T254I) transgenic worms, which exhibit only slight locomotion defects, to further characterize the temporal effect of cholinergic motor neurons on lifespan. These data are now presented in Figure S1A and described in the text (page 6): “Overexpression of TeTx induces characteristic phenotypes of cholinergic deficiency, such as developmental delay and severe locomotion impairment[32], yet does not compromise muscle function (Figure S1A).”

      (2) “The authors are expressing TeTx throughout the lifespan of the animal, including during development. How does this contribute to the organismal phenotype?”

      As described above, chronic TeTx expression from the egg stage results in developmental delay, similar to the developmental phenotype of unc-17 mutant worms, which are defective in acetylcholine transmission. However, unlike TeTx overexpression, unc-17 mutation has no effect on lifespan[3], suggesting that the developmental delay caused by TeTx overexpression is unlikely to account for the lifespan phenotype.

      (3) “A previous study has shown that increasing cholinergic activity by altering ACR-2 expression can cause neurodegeneration (DOI: https://doi.org/10.1523/JNEUROSCI.1515-10.2010). Does overexpressing syntaxin, or AID-mediated degradation of syntaxin cause motor neuron degeneration, which could also contribute to the lifespan phenotype?”

      We thank the reviewer for raising this important point regarding potential motor neuron degeneration. In response, we performed confocal microscopy to assess the motor neurons. Worms expressing the Pacr-2::syntaxin::mCherry transgene did not exhibit defects in the number or morphology of labeled neuronal cell bodies compared with control worms expressing Pacr-2::mCherry. This observation indicates that chronically increased cholinergic activity through syntaxin overexpression does not induce motor neuron degeneration under our experimental conditions. These data are now presented in Figure S1B and described in the text (page 7): “This transgene simply shortened lifespan without causing a pleiotropic effect (Figure 1B), and critically, without inducing motor neuron degeneration (Figure S1B).”

      (4) “Figures 1I-1L: The authors do not show how long it takes for the expression of syntaxin to be restored following the removal of auxin from plates. This would be important to assess the age-dependent effects of neuronal signaling.”

      We thank the reviewer for pointing this out. In general, complete restoration of syntaxin expression occurred within 24 hours after auxin withdrawal. We now point this out in the text (last sentence on page 24): “Expression of syntaxin(T254I) can be suppressed by auxin treatment and restored in 24 hours following auxin removal.”

      (5) “In Figures S1A-E: Although the mutant backgrounds decrease the lifespan of animals expressing the Pacr2::syntaxin(T254I) transgene, the lifespan of these transgenic animals appears to be extended compared to what was shown in Figure 1B. Is this the case? (can these experiments be repeated alongside wild-type N2s to assess if their lifespan is indeed extended compared to the N2?). Also, if so, could it be that the lifespan effects are modified to different extents by other small neurotransmitters?”

      We thank the reviewer for pointing this out. All the experiments presented in the current Figure S2 (original Figure S1) were performed alongside wild-type N2 controls, which are now included in the updated Figure S2. These data show that, in the Pacr-2::syntaxin(T254I) transgenic background, loss of unc-25 (GABA) or tph-1 (serotonin) leads to a further extension of lifespan, while loss of the other genes had no effect. Importantly, while unc-25 mutation also extends lifespan in wild-type worms, tph-1 mutation does not. This observation indicates that the lifespan effects of cholinergic signaling can be modulated by serotonin. We now point this out in the text (page 9): “As a control, we also tested mutants deficient in other types of small neurotransmitters, including glutamate (eat-4), GABA (unc-25), serotonin (tph-1), dopamine (cat-2), tyramine (tdc-1), and octopamine (tbh-1), but detected no effect, with the exception of tph-1, which showed a modest, partial suppression of the phenotype (Figure S2A-S2F). This observation suggests that the lifespan effects of cholinergic signaling can be modulated by serotonin.”

      (6) “RNAi of several of the receptors appear to modulate wild-type lifespan. Although I understand that this is not the main focus of the manuscript, the fact that this occurs should be mentioned in the results and discussed later on.”

      We thank the reviewer for pointing this out. As suggested, we now note this in the text (page 9): “Notably, RNAi of several ACh receptors such as acr-11 appears to shorten wild-type lifespan, whereas RNAi of several other ACh receptors such as acr-9 extends wild-type lifespan, suggesting lifespan-modulating potential of ACh receptors (Figure S3).”

      (7) “Cholinergic signaling and ACR-6 have been previously shown to regulate pharyngeal pumping/feeding behavior. (https://doi.org/10.1016/j.jbc.2021.10146”). Could the requirements for ACR-6/cholinergic signaling in longevity be related to caloric restriction/nutritional intake which in turn could be expected to alter DAF-16 and HSF-1 activity? These previous studies should be referenced and discussed.”

      Thanks for the suggestion. As suggested, we have examined the pumping rate of acr-6 mutant worms and found that acr-6 mutation slightly reduced the pumping rate. As the decrease is relatively minor, we do not expect a major dietary restriction (DR) effect, though we cannot completely rule out this possibility. Furthermore, because acr-6 acts in the pharynx to regulate pumping but in the intestine to mediate the lifespan effects of cholinergic signaling, we do not expect pumping to contribute substantially to our pathway. These new data are now presented in Figure S4I. As suggested by the reviewer, we now point this out in the text (page 10): “Previous data has shown that cholinergic signaling and ACR-6 may control pharyngeal pumping[42]. As expected, we found that acr-6 mutation slightly reduced pumping rates (Figure S4G).”

      (8) “The expectation for the studies in Figure 3/DAF-16, is that animals expressing Ex[Pacr-2::syntaxin(T254I)], should have downregulated DAF-16 in the intestine. This needs to be shown through some method (increased daf-16 activation upon loss of cholinergic signaling does not necessarily imply that the converse is also true).”

      We thank the reviewer for the insightful suggestion. The reviewer suggested additional measurements to confirm that DAF-16 is the downstream transcription factor in the intestine, specifically testing whether syntaxin(T254I) transgene signaling inhibits DAF-16 activity. We have followed this suggestion by performing two different assays. First, as also suggested by the first reviewer, we measured the expression of DAF-16 target genes in Pacr-2::syntaxin(T254I) transgenic worms and found that these genes were downregulated, consistent with the notion that increasing cholinergic motor neuron activity inhibits DAF-16. These data are now presented in Figure S5A. Second, we examined the subcellular localization of DAF-16 in the intestine and found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, suggesting that ACR-6 inhibits DAF-16, which is consistent with our model. These new data are now presented in Figure S5E. As suggested by the reviewers, we now point this out in the text (page 11): “As expected, the expression level of sod-3 and mtl-1, two commonly characterized DAF-16 target genes, was upregulated in transgenic worms deficient in releasing ACh from cholinergic motor neurons (Figure 3F), and downregulated in transgenic worms with enhanced ACh release from cholinergic motor neurons (Figure S5A), consistent with the notion that DAF-16 acts downstream of cholinergic motor neurons. To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promoted nuclear translocation of DAF-16, confirming that ACh signaling inhibits DAF-16 activity (Figure S5B).”

      (9) “Similarly, it would be good to have additional lines of evidence that signaling through GAR-3 impinges on HSF1, and that the lifespan effects are not due to non-specific effects of hsf-1 knockdown, which could lead to several un-related deficiencies and compromise lifespan (Figure 5b).”

      We thank the reviewer for the valuable suggestions. The reviewer correctly noted that the lifespan effect observed with hsf-1 RNAi could involve non-specific deficiencies. In response, we examined the subcellular localization of HSF-1 in the intestine upon gar-3 overexpression using the strain EQ87 (iqIs28[pAH71(hsf-1p::hsf-1::gfp) + pRF4(rol-6)]). The induced nuclear translocation of HSF-1 was weak. This result suggests that GAR-3 may modulate HSF-1 activity through a mechanism distinct from, or more subtle than, robust nuclear accumulation, or that its effect is highly dependent on expression level and timing.

      (10) “Figure 6: An N2 control should be provided to assess the specificity of the mCherry signal from the intestine (given autofluorescence in the animals' gut).”

      Thanks for the suggestion. As suggested by the reviewer, we have now included the control in Figure S10.

      Reviewer #3 (Recommendations for The Authors):

      (1) “While the model is consistent with the data, there are alternatives that were not addressed. Additionally, there are some deficiencies in the interpretation of results that should be addressed, in my opinion. Possibly most importantly given the claims, the authors should address an alternative model: that it is the level of acetylcholine signaling that matters. Is it possible that the level auxin-inducible degradation of syntaxin(T254I) in acr-2 expressing cells is age dependent, such that one level increases lifespan and the other shortens it, and that the timing doesn't matter at all? A chronic dose response to auxin concentration would address if the level of syntaxin is a non-monotonic determinant of lifespan.”

      We sincerely thank the reviewer for raising this important alternative model. The reviewer suggested that the apparent temporal effect we observed might instead be explained by an age-dependent change in the efficiency of the AID system in degrading syntaxin(T254I) in acr-2-expressing cells; that is, different levels of acetylcholine signaling, rather than timing, would produce opposite lifespan outcomes. We agree that this is a formal possibility that our current data cannot fully rule out. However, other data in the manuscript argue against it. For example, the intestinal expression of ACR-6 and GAR-3 exhibited a temporal switch between early and mid-late life, supporting a time-dependent mechanism. In addition, the differential requirement for the downstream transcription factors DAF-16 and HSF-1 in early and mid-late life, respectively, provides further evidence for a temporal mechanism. Thus, while the possibility raised by the reviewer cannot be formally excluded, the temporal mechanism we propose may play an important role.

      The reviewer suggested performing a chronic dose-response experiment with varying auxin concentrations. In fact, when we first employed the AID system to temporally manipulate motor neuron output at different life stages, we tested the potential effects of auxin concentration. Using the soma-expressed TIR1 system, we found that restoring syntaxin(T254I) activity from day 10 of adulthood extends lifespan regardless of whether the prior suppression was maintained with 0.1 mM or 0.5 mM auxin. This suggests that the pro-longevity effect is likely not triggered by differences in the efficacy of prior suppression within this concentration range. We acknowledge that the tested dose range may not cover potential threshold concentrations, and we cannot exclude a non-linear relationship between auxin concentration and degradation efficiency. We agree that a comprehensive chronic dose-response analysis remains a valuable future direction, and we plan to employ more precise tools to investigate the interplay between signal level and temporal context in lifespan regulation. The auxin concentration data are now presented in Figure S1C-S1D and described in the text (page 7): “Comparable outcomes were obtained with both 0.1 mM and 0.5 mM auxin treatments (Figure S1C-S1D).” As suggested by the reviewer, we also discuss the alternative model in the Discussion (page 19): “An alternative mechanism based on differential levels of cholinergic signaling could also contribute to the observed lifespan effects.”

      (2) “Several times, including in several section headings, it is claimed that daf-16 (eg line 205-206) and acr-6 (eg line 185-186) function "early in life". This was not tested, so the claim is not warranted. For instance, these genes could act later in life to respond to signals made or sent early in life, or they could act both early and late, or only early (as they claim).”

      We thank the reviewer for this precise and important clarification. The reviewer is correct that our genetic interventions do not by themselves define the temporal window.

      Our experimental rationale was based on the observation that the lifespan-shortening effect of Pacr-2::syntaxin(T254I) expression is similar whether it is induced throughout life or specifically during larval stages (early life), indicating that the detrimental effect results from enhanced motor neuron output in early life. We therefore used the lifelong expression paradigm as a tool to genetically dissect the downstream pathway triggered by early-life neuronal activation. We acknowledge the reviewer's point that this design does not formally prove that daf-16 or acr-6 acts only in early life; they could be required continuously or again later. However, we note that our expression data show that intestinal ACR-6 expression is restricted to early life, which is consistent with a primary early-life function in this context.

      To reflect this more accurate interpretation, we have revised all relevant statements, including section headings. We now consistently state that daf-16 is required for the lifespan-shortening effect of cholinergic motor neurons, rather than claiming that it functions "in early life". We have also toned down the discussion of their temporal function (page 12): “Because this lifespan-shortening effect results from enhanced motor neuron output in early life and overwrites its beneficial effect at later stages, we propose this signaling circuit mediates the lifespan-shortening effect in early life.”

      (3) “In line 118, they note that such intervention led to a complex effect on the lifespan curve "by initially promoting worm's survival followed by inhibiting it at later stages." I think that while findings from later experiments support a time-dependent lifespan effect stemming from syntaxin function in the cholinergic motor neurons, this experiment's TeTx expression in those neurons is not time-dependent. Lifespan is an endpoint measure, so there is no sense in which a non-timed perturbation has an early or late effect on an individual. Rather, the effect on survival they observed is at the population level, their intervention increases the average lifespan while decreasing the worm-to-worm variation in lifespan.”

      We thank the reviewer for the critical and precise comment regarding our interpretation of the survival curves of TeTx transgenic worms. As suggested by the reviewer, we have revised the text (page 6): “Surprisingly, such intervention led to a complex effect on the population survival curve by reducing both early mortality and the proportion of long-lived individuals (Figure 1A). Specifically, the 25% lifespan of these worms was prolonged, while their 75% and maximal lifespan were slightly shortened, leading to a mean lifespan slightly increased or unchanged compared to that of wild-type worms. This suggests that inhibiting cholinergic motor neurons may exert temporally distinct effects on survival, leading to decreased individual variation in lifespan.”

      (4) “The layout of the plots separating the responses of wild type and mutants to different panels makes it often difficult to interpret the results. For instance, do acr-6, gar-3, and other receptor mutants or knockdowns affect lifespan on their own? If they do, it matters to the interpretation whether they live longer or shorter than the wild type: which of the mutants phenocopy the lack of a lifespan-extending signal that activates them? Which phenocopy lacks a lifespan-shortening signal that activates them? Could they phenocopy the effect of an inhibitory signal? And critically, are the effects of these mutants on lifespan consistent with their model?”

      “The paper would be stronger if they determined when ACR-6 and GAR-3 functions are necessary and sufficient. Is it possible that the receptor doesn't matter, just that there be one of the two expressed in the intestine, and that other mechanisms determine the lifespan response to modulation of syntaxin(T254I)? What does time-dependent knockdown of these receptors do to daf-16 and hsf-1 localization and to the transcription of the targets of these transcription factors?”

      We thank the reviewer for these insightful comments. We have addressed the points as follows:

      As suggested, we have reorganized the lifespan data in Figure S4 to directly compare wild-type and mutant/RNAi conditions within the same panels. This presentation clarifies the autonomous effects of these genes. The data show that loss of acr-6 or gar-2 (via RNAi or mutation) has minimal effect on lifespan. Notably, acr-8 RNAi shortens lifespan, whereas the acr-8 mutation does not, supporting our hypothesis of tissue-specific or compensatory roles for this receptor, as detailed in our response to point (5) below.

      The reviewer's key question regarding when these receptors are necessary and sufficient is central to our model. We agree that complementary loss-of-function experiments with temporal precision, such as time-specific knockdown of the two receptors, would provide even stronger evidence. To this end, we attempted to generate endogenous degron-tagged alleles of acr-6 and gar-3 to apply the AID system for precise, stage-specific degradation. Unfortunately, despite multiple design attempts and screening efforts, we were unable to obtain homozygous strains carrying the desired genomic edits, using either the gRNA we used to knock in mCherry or other gRNAs. Consequently, we are currently unable to perform the temporally controlled loss-of-function experiments suggested by the reviewer.

      (5) “Why does RNAi but not mutation of acr-8 and gar-2 suppress the lifespan shortening effect of Pacr-2::syntaxin(T254I)?”

      Thanks for this important question regarding the differential effects of feeding RNAi versus mutation of acr-8 and gar-2. The discrepancy likely arises from potential off-target effects of RNAi. RNAi is not strictly specific and may also knock down related genes, producing non-specific effects that precise mutations in acr-8 or gar-2 alone would not reproduce.

      (6) “sid-1(-); Ex[Pacr-2::tetx lives longer than sid-1(-); in daf-16(+) worms in Figure 3G; so it is very hard to interpret the lack of effect of Pacr-2::tetx in daf-16(-) worms, since this transgene behaves differently in sid-1 mutants than in wild type worms. This would be clear if the two plots were combined (appropriately, since it is the same experiment). It looks like daf-16 RNAi has a shortening effect in the sid-1 mutant, but not in in sid-1 mutants expressing Pacr-2::text.”

      Thanks for this helpful suggestion. As suggested, we have now merged Figures 3G and 3H into a single plot, presented as Figure S5F. This combined presentation clarifies the comparison and shows that intestinal daf-16 RNAi shortens lifespan in both sid-1 mutants and sid-1 mutants expressing Pacr-2::TeTx.

      Reviewer #4 (Recommendations for The Authors):

      (1) “Lines 50-52: I would replace "leading to increased incidents in age-related diseases and probability of death" with "leading to the onset of age-related diseases and increased probability of death". Instead of "such an aging process" I would use "the aging process".”

      This has now been fixed.

      (2) “Figure 2E-F: By rescuing the expression of ACR-6 in neurons or intestinal cells alone, the authors show that the release of ACh from cholinergic neurons has effects on the intestine to shorten lifespan. Is ACR-6 expressed in other tissues (e.g. muscle?) It might be interesting to assess whether ACh also regulates lifespan through activating the ACR-6 receptor in other tissues or specifically targets the intestine. This question is partially answered with the tissue-specific RNAi experiments for DAF-16, but it is possible that ACR-6 also modulates other pathways beyond the tested transcription factors.”

      Analyzing the role of other tissues could also be applied to understand how GAR-3 influences lifespan. Along these lines, it would be interesting to expand the tissue-specific knockdown experiments for GAR-3 to other tissues. More importantly, these experiments can address whether activation of ACR-6 and GAR-3 can also have different effects on lifespan by regulating distinct tissues in addition to the intestine, and not only due to temporal expression patterns. For instance, whereas DAF-16 regulates lifespan primarily through its effects in the intestine, HSF1 could have effects on additional tissues. Although it would be interesting to perform these experiments, I understand that the authors main focus is the nervous system-gut axis.

      We thank the reviewer for the insightful suggestions regarding the potential tissue-specific functions of ACR-6 and GAR-3. As noted in our response to point (6), endogenous expression imaging indicates that ACR-6 and GAR-3 are primarily expressed in neurons and the intestine, with weak expression of GAR-3 in the muscle; we therefore tested the muscle. We found that muscle-specific RNAi of gar-2 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, whereas muscle-specific RNAi of gar-3 did not. This result further supports that GAR-3 primarily exerts this effect in the intestine.

      (3) “Can the authors specify in the corresponding figure legend at what age they tested sod-3 and mtl-1 expression in Pacr-2::TeTx worms (Figure 3F)? This is important to support the conclusions of the paper. Along these lines, can the authors also specify at what age they quantified the expression of HSF-1 targets (Figure 5F).”

      Thanks for the suggestion. As recommended, we now provide the worm age in the legends of Figure 3F (day 1 adult) and Figure 5F (day 10 adult).

      (4) “To further strengthen the authors' conclusions, it might be interesting to examine the intracellular localization of DAF-16 in the intestine of Pacr-2::TeTx and syntaxin(T254I) worms compared to controls.”

      We thank the reviewer for this valuable suggestion, which was also raised by another reviewer. In response, we examined the subcellular localization of DAF-16 in the intestine. Direct imaging in the Pacr-2::TeTx or Pacr-2::syntaxin(T254I) backgrounds was technically challenging because their fluorescent protein tags (YFP or mCherry) would interfere with the detection of DAF-16::GFP. We therefore adopted an alternative approach by modulating the activity of acr-6, the intestinal acetylcholine receptor that transmits cholinergic signals from motor neurons to DAF-16. We found that acr-6 RNAi promotes the nuclear translocation of DAF-16. These new data are presented in Figure S5E and described in the text (page 11): “To obtain further evidence, we assessed the subcellular localization pattern of DAF-16::GFP fusion and found that acr-6 RNAi notably promotes nuclear translocation of DAF-16, confirming that ACh signaling modulates DAF-16 activity (Figure S5B).”

      (5) “The results with gar-2 RNAi are fascinating. I am very curious (and I assume potential readers too) about what tissues mediate the mid-late life effects of GAR-2 in longevity. Perhaps the authors could add experiments in a couple of other tissues known to regulate organismal lifespan (e.g. muscle). However, I totally understand why the authors focused on GAR-3, especially because both GAR-3 and ACR-6 have effects on the intestine and this is sufficient for the main conclusions of the paper.”

      We sincerely thank the reviewer for the insightful suggestion and for highlighting the potential role of GAR-2. In response, we performed muscle-specific RNAi experiments. Together with our previously presented data, the results show that intestinal (but not neuronal or muscle) RNAi of gar-3 abolished the ability of cholinergic motor neurons to extend lifespan at mid-late life stages, whereas muscle-specific (but not neuronal or intestinal) RNAi of gar-2 suppressed this effect. This finding indicates that GAR-3 and GAR-2 mediate cholinergic signaling in distinct peripheral tissues, GAR-3 primarily in the intestine and GAR-2 primarily in the muscle, to produce their effects on longevity. Given our focus on neuron-gut signaling, the role of GAR-2 will be investigated in future studies. The new data are now presented in Figure S8, and we state (pages 13-14): “RNAi of gar-3 in the intestine (Figure 4D and 4E), but not in neurons or the muscle (Figure 4D-4F, and Figure S8A, S8D-S8E), abolished the ability of cholinergic motor neurons to extend lifespan at the mid-late life stage. Thus, GAR-3 may function in the intestine to regulate lifespan. Surprisingly, RNAi of gar-2 in the muscle (Figure S8A-S8C), but not in neurons or the intestine (Figure S7F-S7H), had an effect on the ability of cholinergic motor neurons to extend lifespan in mid-late life, indicating that GAR-2 acts in the muscle to regulate lifespan.”

      (6) “Figure 6: It seems that the genes are also expressed in the muscle. Can the authors include images of other tissues in supplementary figures?”

      Thanks for the suggestion. As suggested by the reviewer, we have now included, in Figure S10, images of whole worms expressing mCherry knocked into the endogenous locus of gar-3 or acr-6 by CRISPR. However, we did not detect strong expression of gar-3 or acr-6 in the muscle under the conditions examined, possibly because of the low endogenous protein levels of the two genes in the muscle, although CeNGEN data indicate that both are expressed there. Determining the precise spatiotemporal expression profiles of these receptors will likely require more sensitive methods, and we plan to address this important question in future studies using such refined approaches.

    1. Author response:

      General Statements

      We thank all three reviewers for taking the time to provide valuable feedback on our manuscript, and for appreciating the quality and usefulness of the data and results presented in our study. We have improved the manuscript based on their suggestions and provide a detailed point-by-point response below.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have a longstanding focus on, and reputation in, single-cell sequencing technology development and application. In the current study, the authors developed a novel single-cell multi-omic assay, termed "T-ChIC", to jointly profile histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing, and the conclusions are beneficial to the community.

      Thank you for your positive feedback.

      Several single-cell methodologies, such as CoTECH, Paired-Tag and others, claim to co-profile chromatin modifications and gene expression from the same individual cell. Although T-ChIC differs in using pA-MNase and IVT to obtain these modalities from single cells, could the authors provide direct comparisons among these technologies to show whether T-ChIC outperforms them?

      In a separate technical manuscript describing the application of T-ChIC in mouse cells (Zeller, Blotenburg et al., 2024), we have provided a direct comparison of data quality between T-ChIC and other single-cell methods for chromatin-RNA co-profiling (please refer to Fig. 1C,D and Fig. S1D,E of the preprint). We show that, compared with other methods, T-ChIC better preserves the expected biological relationship between histone modifications and gene expression in single cells.

      In the current study, T-ChIC profiled H3K27me3 and H3K4me1 modifications, and these data look great. What about other histone modifications (e.g., H3K9me3 and H3K36me3) and transcription factors?

      While we have not profiled these other modifications using T-ChIC in zebrafish, we have previously published high-quality data on these histone modifications using the sortChIC method, on which T-ChIC is based (Zeller, Yeung et al., 2022). In our comparison, we find that histone modification profiles from T-ChIC and sortChIC are very similar (Fig. S1C in Zeller, Blotenburg et al., 2024). We therefore expect the method to work equally well for other histone marks.

      T-ChIC can detect full-length transcripts from the same single cells, but in Fig. S3 the authors still used other published single-cell transcriptomics data to annotate the cell types. This seems unnecessary.

      We used the published scRNA-seq dataset, which has a larger number of cells, to harmonize our cell-type labels with these datasets, but we also cross-referenced our cluster-specific marker genes with ZFIN and harmonized the cell-type labels with the ZFIN ontology. This way, our annotation is in line with previous datasets but not biased by them. Due to the relatively small size of our dataset, we did not expect to identify unique, rare cell types, but our full-length total RNA assay helps us identify non-coding RNAs, such as miRNAs, previously undetected in scRNA-seq assays, which we now highlight in new Figure S1c.

      Throughout the manuscript, the authors found some interesting dynamics between chromatin state and gene expression during embryogenesis, independent approaches should be used to validate these findings, such as IHC staining or RNA ISH?

      We agree that ISH staining could be useful to validate the expression patterns of genes identified in this study. However, to validate the relationships between the histone marks and gene expression, such stainings would need to be combined with functional genomics experiments, such as PRC2-related knockouts. Due to their complexity, such experiments are beyond the scope of this manuscript (see also our reply to Reviewer #3, comment #4, for details).

      In Fig. 2 and Fig. S4, the authors showed H3K27me3 cis-spreading during development, which looks really interesting. Is this zebrafish-specific? H3K27me3 ChIP-seq or CUT&Tag data from mouse and/or human embryos should be reanalyzed and used for comparison. Could the authors speculate on possible mechanisms to explain this spreading pattern?

      Thanks for the suggestion. In this revision, we have reanalysed a mouse H3K27me3 ChIP-seq dataset covering embryonic development (Xiang et al., Nature Genetics 2019) and find similar evidence of H3K27me3 signal spreading from pre-marked promoter regions in the E5.5 epiblast upon differentiation (new Figure S4i). This observation, combined with the fact that the mechanism of promoter pre-marking by PRC1-PRC2 interaction seems to be conserved between the two species (see Hickey et al., 2022; Mei et al., 2021; Chen et al., 2021), suggests that the dynamics of H3K27me3 pattern establishment are conserved across vertebrates. However, we think that high-resolution profiling with a method like T-ChIC would be more useful to demonstrate the dynamics of signal spreading during mouse embryonic development in the future. We discuss this further in the revised manuscript.

      Reviewer #1 (Significance):

      The authors have a longstanding focus on, and reputation in, single-cell sequencing technology development and application. In the current study, the authors developed a novel single-cell multi-omic assay, termed "T-ChIC", to jointly profile histone modifications along with the full-length transcriptome from the same single cells, and analyzed the dynamic relationship between chromatin state and gene expression during zebrafish development and cell fate determination. In general, the assay works well, the data look convincing, and the conclusions are beneficial to the community.

      Thank you very much for your supportive remarks.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Joint analysis of multiple modalities in single cells will provide a comprehensive view of cell fate states. In this manuscript, Bhardwaj et al. developed a single-cell multi-omics assay, T-ChIC, to simultaneously capture histone modifications and the full-length transcriptome, and applied the method to early zebrafish embryos. The authors observed a decoupled relationship between chromatin modifications and gene expression at early developmental stages. The correlation becomes stronger as development proceeds, as genes are silenced by the cis-spreading of the repressive mark H3K27me3. Overall, the work is well performed, and the results are meaningful and interesting to readers in the epigenomics and embryonic development fields. There are some concerns to address before the manuscript is considered for publication.

      We thank the reviewer for appreciating the quality of our study.

      Major concerns:

      (1) A major point of this study is to understand embryo development, especially gastrulation, with the power of scMulti-Omics assay. However, the current analysis didn't focus on deciphering the biology of gastrulation, i.e., lineage-specific pioneer factors that help to reform the chromatin landscape. The majority of the data analysis is based on the temporal dimension, but not the cell-type-specific dimension, which reduces the value of the single-cell assay.

      We focused on lineage-specific transcription factor activity during gastrulation in Figures 4 and S8 of the manuscript and discovered several interesting regulators active at this stage. In our analysis of the temporal dimension in the rest of the manuscript, we also classified cells by germ layer and “latent” developmental time, taking full advantage of the single-cell nature of our data. Additionally, in response to your comment below, we have now added cell-type-specific H3K27me3 demethylation results for 24 hpf. We hope that these results, together with our openly available dataset, demonstrate the advantage of the single-cell aspect of our data.

      (2) The cis-spreading of H3K27me3 with developmental time is interesting. Considering that H3K27me3 can mark bivalent regions, especially in pluripotent cells, there must be some regions that lose H3K27me3 signal during development. Therefore, it is confusing that the authors did not find such regions (30% spreading, 70% stable). The authors should explain and discuss this issue.

      Indeed, we see that ~30% of the bins enriched at the pluripotent stage spread, while 70% do not seem to spread. In line with earlier observations (Hickey et al., 2022; Vastenhouw et al., 2010), we find that H3K27me3 is almost absent in the zygote and continues to accumulate until 24 hpf and beyond. Therefore, the majority of sites in the genome still seem to be in the process of gaining H3K27me3 until 24 hpf, explaining why we see mostly “spreading” and “stable” states. Because most of these sites are at promoters and show signs of bivalency, we think that they are marked for activation or silencing at later stages. We address this in the Discussion. However, in response to this and the earlier comment, we searched for genes that show H3K27me3 demethylation in the most mature cell types (at 24 hpf) in our data, and found a subset of genes that lose H3K27me3 after acquiring it earlier. Interestingly, most of the top genes in this list are well known to be developmentally important for their corresponding cell types. We have added this new result and discussed it further in the manuscript (Fig. 2d,e; Supplementary Table 3).

      Minors:

      (1) The authors cited two scMulti-omics studies in the introduction, but there have been lots of single-cell multi-omics studies published recently. The authors should cite and consider them.

      We have cited more single-cell chromatin and multiome studies focussed on early embryogenesis in the introduction now.

      (2) T-ChIC seems to have been presented in a previous paper (ref 15). Therefore, Fig. 1a seems unnecessary.

      Figure 1a shows a summary of our zebrafish T-ChIC workflow, which includes a unique sample multiplexing and sorting strategy to reduce batch effects that was not applied in the original T-ChIC workflow. We have now clarified this in the Results.

      (3) It's better to show the percentage of cell numbers (30% vs 70%) for each heatmap in Figure 2C.

      We have added the numbers to the corresponding legends.

      (4) Please double-check the citation of Fig. S4C, which may not relate to the conclusion of signal differences between lineages.

      The citation seems to be correct (Fig. S4C supplements Fig. 2C, but shows mesodermal lineage cells) but the description of the legend was a bit misleading. We have clarified this now.

      (5) Figure 4C has not been cited or mentioned in the main text. Please check.

      Thanks for pointing it out. We have cited it in Results now.

      Reviewer #2 (Significance):

      Strengths:

      This work utilized a new single-cell multi-omics method and generated abundant epigenomics and transcriptomics datasets for cells covering multiple key developmental stages of zebrafish.

      Limitations:

      The data analysis was superficial and mainly focused on the correspondence between the two modalities. The discussion of developmental biology was limited.

      Advance:

      The zebrafish single-cell datasets are valuable. The T-ChIC method is new and interesting.

      The audience will be specialized and from basic research fields, such as developmental biology, epigenomics, bioinformatics, etc.

      I'm more specialized in the direction of single-cell epigenomics, gene regulation, 3D genomics, etc.

      Thank you for your remarks.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This manuscript introduces T‑ChIC, a single‑cell multi‑omics workflow that jointly profiles full‑length transcripts and histone modifications (H3K27me3 and H3K4me1) and applies it to early zebrafish embryos (4-24 hpf). The study convincingly demonstrates that chromatin-transcription coupling strengthens during gastrulation and somitogenesis, that promoter‑anchored H3K27me3 spreads in cis to enforce developmental gene silencing, and that integrating TF chromatin status with expression can predict lineage‑specific activators and repressors.

      Major concerns

      (1) Independent biological replicates are absent, so the authors should process at least one additional clutch of embryos for key stages (e.g., 6 hpf and 12 hpf) with T‑ChIC and demonstrate that the resulting data match the current dataset.

      Thanks for pointing this out. We had, in fact, performed T-ChIC experiments in four rounds of biological replicates (independent clutches of embryos) and merged the data to create our resource. Although not all timepoints were profiled in each replicate, two timepoints (10 and 24 hpf) are present in all four, and the cell-type compositions of the replicates at these two timepoints are very similar. We have added new plots in Figure S2f and a new Supplementary Table 1 to highlight the biological replicates.
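      As an illustration of how replicate similarity in cell-type composition could be quantified, the sketch below compares per-replicate cell-type proportions using the Jensen-Shannon divergence. This is a hypothetical example: the function names, the toy labels, and the choice of metric are assumptions for illustration, not the analysis behind Figure S2f.

```python
import math
from collections import Counter

def composition(labels):
    """Cell-type proportions from a list of per-cell labels (hypothetical input)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two proportion dicts.

    0 means identical compositions; 1 means the replicates share no cell types.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # Kullback-Leibler divergence of a from the mixture m, skipping zero terms
        return sum(a[k] * math.log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Toy replicates with the same proportions but different cell numbers
rep1 = ["ectoderm"] * 50 + ["mesoderm"] * 30 + ["endoderm"] * 20
rep2 = ["ectoderm"] * 100 + ["mesoderm"] * 60 + ["endoderm"] * 40
jsd_same = jensen_shannon(composition(rep1), composition(rep2))
jsd_disjoint = jensen_shannon(composition(["a", "a"]), composition(["b"]))
```

      Because the divergence is computed on proportions, replicates with different cell numbers but matching compositions score close to 0.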

      (2) The TF-activity regression model uses an arbitrary R² ≥ 0.6 threshold; cross-validated R² distributions, permutation-based FDR control, and effect-size confidence intervals are needed to justify this cut-off.

      Thank you for this suggestion. We did use 10-fold cross-validation during training and obtained the R² values of TF motifs from the independent test set as an unbiased estimate. However, the cutoff of R² > 0.6 used to select TFs for classification was indeed arbitrary. In the revised version, we now report FDR-adjusted p-values for these R² estimates based on permutation tests, and select TFs with a cutoff of padj < 0.01. We have updated Supplementary Table 4 to include the p-values for all tested TFs. We find that our arbitrary cutoff of 0.6 was, in fact, too stringent, and that we can classify many more TFs based on the FDR cutoff. We have also updated the numbers reported in Fig. 4c accordingly. Moreover, Supplementary Table 4 contains the complete list of TFs used in the analysis, allowing others to choose their own cutoff.
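      The permutation-based FDR procedure described above can be sketched as follows, assuming a linear (here ridge) regression of a TF-associated response on a feature matrix, 10-fold cross-validation, a null distribution built by shuffling the response, and Benjamini-Hochberg adjustment across TFs. The model, the ridge penalty, the toy data, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cv_r2(X, y, n_folds=10, ridge=1e-3, seed=0):
    """Out-of-fold R^2 of a ridge regression evaluated with k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y))
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc = X[train] - x_mean                      # centre on training data only
        beta = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(X.shape[1]),
                               Xc.T @ (y[train] - y_mean))
        pred[test] = (X[test] - x_mean) @ beta + y_mean
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def permutation_pvalue(X, y, n_perm=200, seed=0):
    """Observed CV R^2 and an add-one-corrected permutation p-value (y shuffled)."""
    observed = cv_r2(X, y)
    rng = np.random.default_rng(seed)
    null = np.array([cv_r2(X, rng.permutation(y)) for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Toy example: one informative and one uninformative hypothetical "TF"
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y_signal = X @ np.array([1.0, 0.5, 0.0]) + 0.5 * rng.normal(size=200)
y_noise = rng.normal(size=200)
r2_sig, p_sig = permutation_pvalue(X, y_signal)
r2_noise, p_noise = permutation_pvalue(X, y_noise)
padj = benjamini_hochberg([p_sig, p_noise])
```

      With 200 permutations the smallest attainable p-value is 1/201, so in practice the number of permutations must be large enough for the chosen padj threshold to be reachable after correction.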

      (3) Predicted TF functions lack empirical support, making it essential to test representative activators (e.g., Tbx16) and repressors (e.g., Zbtb16a) via CRISPRi or morpholino knock‑down and to measure target‑gene expression and H3K4me1 changes.

      We agree that independent validation of the predicted TF functions on target gene activity would be important. During this revision, we analysed recently published scRNA-seq data from Saunders et al. (2023), which includes CRISPR-mediated F0 knockouts of a couple of our predicted TFs; however, their scRNA-seq was performed at later stages (24 hpf onward) than our H3K4me1 analysis (4-12 hpf). Consequently, we saw off-target genes being affected in lineages where these TFs are clearly not expressed (attached Fig. 1), and we therefore did not include these results in the manuscript. In the future, we aim to systematically test the TFs predicted in our study with CRISPRi or similar experiments.

      (4) The study does not prove that H3K27me3 spreading causes silencing; embryos treated with an Ezh2 inhibitor or prc2 mutants should be re‑profiled by T‑ChIC to show loss of spreading along with gene re‑expression.

      We agree that PRC2 disruption followed by T-ChIC, or other forms of validation, would be needed to confirm whether H3K27me3 spreading is causally linked to the silencing of the identified target genes. However, this validation is complicated for several reasons. First, because of the maternal contribution of EZH2 RNA and the contradictory effects of various zygotic EZH2 mutations (depending on where the mutation occurs), the only properly validated PRC2-related mutant appears to be the maternal-zygotic mutant MZezh2, which requires germ cell transplantation (see Rougeot et al., 2019, and San et al., 2019, for details). Second, the use of inhibitors has been described in other studies (den Broeder et al., 2020; Huang et al., 2021), but these studies do not validate the loss of H3K27me3 or report a phenotype similar to that of MZezh2 mutants, and inhibitors can cause unwanted side effects and toxicity at high doses, affecting gene expression results. Moreover, in our own trials with the EZH2 inhibitor (GSK123), we observed that this time window might be too short to see an effect by 24 hpf (attached Fig. 2). This validation is therefore a more complex endeavor, beyond the scope of this study. Nevertheless, our further analysis of H3K27me3 demethylation at developmentally important genes (new Fig. 2e-f, Supplementary Table 3) adds confidence that Polycomb repression plays an important role, and provides a basis for future follow-up studies.

      Minor concerns

      (1) Repressive chromatin coverage is limited, so profiling an additional silencing mark such as H3K9me3 or DNA methylation would clarify cooperation with H3K27me3 during development.

      We agree that H3K27me3 alone would not be sufficient to fully understand the repressive chromatin state. Extension to other chromatin marks and DNA methylation would be the focus of our follow up works.

      (2) Computational transparency is incomplete; a supplementary table listing all trimming, mapping, and peak‑calling parameters (cutadapt, STAR/hisat2, MACS2, histoneHMM, etc.) should be provided.

      As mentioned in the manuscript, we provide an open-source pre-processing pipeline, “scChICflow”, to perform all these steps (github.com/bhardwaj-lab/scChICflow). We have now also provided the configuration files in our Zenodo repository (see below), which can simply be plugged into this pipeline together with the FASTQ files from GEO to obtain the processed dataset that we describe in the manuscript. Additionally, we have now clarified the peak-calling and post-processing steps in the manuscript.

      (3) Data‑ and code‑availability statements lack detail; the exact GEO accession release date, loom‑file contents, and a DOI‑tagged Zenodo archive of analysis scripts should be added.

      We have now publicly released the .h5ad files with raw counts, normalized counts, and complete gene and cell-level metadata, along with signal tracks (bigwigs) and peaks on GEO. Additionally, we now also released the source datasets and notebooks (Rmarkdown format) on Zenodo that can be used to replicate the figures in the manuscript, and updated our statements on “Data and code availability”.

      (4) Minor editorial issues remain, such as replacing "critical" with "crucial" in the Abstract, adding software version numbers to figure legends, and correcting the SAMtools reference.

      Thank you for spotting them. We have fixed these issues.

      Reviewer #3 (Significance):

      The method is technically innovative and the biological insights are valuable; however, several issues, mainly concerning experimental design, statistical rigor, and functional validation, must be addressed to solidify the conclusions.

      Thank you for your comments. We hope to have addressed your concerns in this revised version of our manuscript.

      Author response image 1.

      (1) (top) expression of tbx16, which was one of the common TFs detected in our study and also targeted by Saunders et al by CRISPR. tbx16 expression is restricted to presomitic mesoderm lineage by 12hpf, and is mostly absent from 24hpf cell types. (bottom) shows DE genes detected in different cellular neighborhoods (circled) in tbx16 crispants from 24hpf subset of cells in Saunders et al. None of these DE genes were detected as “direct targets” in our analysis and therefore seem to be downstream effects. (2) Effect of 3 different concentrations of EZH2 inhibitor (GSK123) on global H3K27me3 quantified by flow cytometry using fluorescent coupled antibody (same as we used in T-ChIC) in two replicates. The cells were incubated between 3 and 10 hpf and collected afterwards for this analysis. We observed a small shift in H3K27me3 signal, but it was inconsistent between replicates.

      References

      Chen, Z., Djekidel, M. N., & Zhang, Y. (2021). Distinct dynamics and functions of H2AK119ub1 and H3K27me3 in mouse preimplantation embryos. Nature Genetics, 53(4), 551–563.

      den Broeder, M. J., Ballangby, J., Kamminga, L. M., Aleström, P., Legler, J., Lindeman, L. C., & Kamstra, J. H. (2020). Inhibition of methyltransferase activity of enhancer of zeste 2 leads to enhanced lipid accumulation and altered chromatin status in zebrafish. Epigenetics & Chromatin, 13(1), 5.

      Hickey, G. J., Wike, C. L., Nie, X., Guo, Y., Tan, M., Murphy, P. J., & Cairns, B. R. (2022). Establishment of developmental gene silencing by ordered polycomb complex recruitment in early zebrafish embryos. eLife, 11, e67738.

      Huang, Y., Yu, S.-H., Zhen, W.-X., Cheng, T., Wang, D., Lin, J.-B., Wu, Y.-H., Wang, Y.-F., Chen, Y., Shu, L.-P., Wang, Y., Sun, X.-J., Zhou, Y., Yang, F., Hsu, C.-H., & Xu, P.-F. (2021). Tanshinone I, a new EZH2 inhibitor restricts normal and malignant hematopoiesis through upregulation of MMP9 and ABCG2. Theranostics, 11(14), 6891–6904.

      Mei, H., Kozuka, C., Hayashi, R., Kumon, M., Koseki, H., & Inoue, A. (2021). H2AK119ub1 guides maternal inheritance and zygotic deposition of H3K27me3 in mouse embryos. Nature Genetics, 53(4), 539–550.

      Rougeot, J., Chrispijn, N. D., Aben, M., Elurbe, D. M., Andralojc, K. M., Murphy, P. J., Jansen, P. W. T. C., Vermeulen, M., Cairns, B. R., & Kamminga, L. M. (2019). Maintenance of spatial gene expression by Polycomb-mediated repression after formation of a vertebrate body plan. Development (Cambridge, England), 146(19), dev178590.

      San, B., Rougeot, J., Voeltzke, K., van Vegchel, G., Aben, M., Andralojc, K. M., Flik, G., & Kamminga, L. M. (2019). The ezh2(sa1199) mutant zebrafish display no distinct phenotype. PloS One, 14(1), e0210217.

      Saunders, L. M., Srivatsan, S. R., Duran, M., Dorrity, M. W., Ewing, B., Linbo, T. H., Shendure, J., Raible, D. W., Moens, C. B., Kimelman, D., & Trapnell, C. (2023). Embryo-scale reverse genetics at single-cell resolution. Nature, 623(7988), 782–791.

      Vastenhouw, N. L., Zhang, Y., Woods, I. G., Imam, F., Regev, A., Liu, X. S., Rinn, J., & Schier, A. F. (2010). Chromatin signature of embryonic pluripotency is established during genome activation. Nature, 464(7290), 922–926.

      Zeller, P., Blotenburg, M., Bhardwaj, V., de Barbanson, B. A., Salmén, F., & van Oudenaarden, A. (2024). T-ChIC: multi-omic detection of histone modifications and full-length transcriptomes in the same single cell. In bioRxiv (p. 2024.05.09.593364). https://doi.org/10.1101/2024.05.09.593364

      Zeller, P., Yeung, J., Viñas Gaza, H., de Barbanson, B. A., Bhardwaj, V., Florescu, M., van der Linden, R., & van Oudenaarden, A. (2022). Single-cell sortChIC identifies hierarchical chromatin dynamics during hematopoiesis. Nature Genetics. https://doi.org/10.1038/s41588-022-01260-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study builds upon a major theoretical account of value-based choice, the 'attentional drift diffusion model' (aDDM), and examines whether and how this might be implemented in the human brain using functional magnetic resonance imaging (fMRI). The aDDM states that the process of internal evidence accumulation across time should be weighted by the decision maker's gaze, with more weight being assigned to the currently fixated item. The present study aims to test whether there are (a) regions of the brain where signals related to the currently presented value are affected by the participant's gaze; (b) regions of the brain where previously accumulated information is weighted by gaze.
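      For readers less familiar with the aDDM, the sketch below simulates a single trial as summarized above: the relative decision value drifts toward the fixated item at rate d, the value of the unfixated item is discounted by θ (theta), and a choice is made when the decision value reaches ±1. Fixations are assumed to alternate every 400 ms, a simplification (fixation durations are normally variable), and all parameter values are illustrative assumptions rather than fits from this study.

```python
import numpy as np

def simulate_addm_trial(v_left, v_right, d=0.0002, theta=0.3, sigma=0.02,
                        fixation_ms=400, max_ms=20000, rng=None):
    """Simulate one aDDM trial in 1-ms steps.

    Returns (choice, rt_ms); choice is "left", "right", or None on timeout.
    theta < 1 discounts the unfixated item; theta = 1 removes the gaze effect.
    """
    rng = np.random.default_rng() if rng is None else rng
    rdv = 0.0                              # relative decision value (left minus right)
    looking_left = rng.random() < 0.5      # assumption: first fixation chosen at random
    for t in range(1, max_ms + 1):
        if looking_left:
            drift = d * (v_left - theta * v_right)
        else:
            drift = d * (theta * v_left - v_right)
        rdv += drift + rng.normal(0.0, sigma)
        if rdv >= 1.0:
            return "left", t
        if rdv <= -1.0:
            return "right", t
        if t % fixation_ms == 0:           # assumption: fixed-length alternating fixations
            looking_left = not looking_left
    return None, max_ms

rng = np.random.default_rng(0)
trials = [simulate_addm_trial(10, 0, rng=rng) for _ in range(200)]
p_left = sum(choice == "left" for choice, _ in trials) / len(trials)
```

      With a large value difference in favour of the left item, nearly all simulated choices go left; reducing the value difference makes the fixation pattern increasingly decisive.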

      To examine this, the authors developed a novel paradigm that allowed them to dissociate currently and previously presented evidence, at a timescale amenable to measuring neural responses with fMRI. They asked participants to choose between bundles or 'lotteries' of food items, which they revealed sequentially and slowly to the participant across time. This allowed modelling of the haemodynamic response to each new observation in the lottery, separately for previously accumulated and currently presented evidence.
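      The idea of modelling the haemodynamic response to each sample can be illustrated with a generic parametric-modulation sketch: each sample onset becomes a stick function scaled by its mean-centred value, which is convolved with a double-gamma haemodynamic response function (HRF) and sampled at scan times. The HRF shape (gamma densities with modes near 5 s and 15 s, undershoot ratio 1/6), the 2 s repetition time, and the function names are assumptions for illustration, not the study's GLM settings.

```python
import math
import numpy as np

def double_gamma_hrf(dt=0.1, duration=32.0):
    """Double-gamma HRF sampled every dt seconds, normalised to unit sum."""
    t = np.arange(0.0, duration, dt)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)          # gamma density, mode at 5 s
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)  # gamma density, mode at 15 s
    h = peak - undershoot / 6.0
    return h / h.sum()

def parametric_regressor(onsets, values, n_scans, tr=2.0, dt=0.1):
    """Value-modulated stick functions convolved with the HRF, one value per scan."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = np.zeros(n_fine)
    centred = np.asarray(values, dtype=float) - np.mean(values)  # mean-centre modulator
    for onset, v in zip(onsets, centred):
        sticks[int(round(onset / dt))] += v
    conv = np.convolve(sticks, double_gamma_hrf(dt))[:n_fine]
    return conv[::int(round(tr / dt))]                           # sample at scan times

hrf = double_gamma_hrf(0.1)
reg = parametric_regressor(onsets=[10.0, 30.0], values=[3.0, 1.0], n_scans=30)
```

      Separate regressors built this way from the currently sampled value and from the accumulated value can then enter the same GLM, which is what makes the dissociation testable with slow sample presentation.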

      Using this approach, they find that regions of the brain supporting valuation (vmPFC and ventral striatum) have responses reflecting gaze-weighted valuation of the currently presented item, whereas regions previously associated with evidence accumulation (preSMA and IPS) have responses reflecting gaze-weighted modulation of previously accumulated evidence.

      Strengths:

      A major strength of the current paper is the design of the task, nicely allowing the researchers to examine evidence accumulation across time despite using a technique with poor temporal resolution. The dissociation between currently presented and previously accumulated evidence in different brain regions in GLM1 (before gaze-weighting), as presented in Figure 5, is already compelling. The result that regions such as preSMA respond positively to |AV| (absolute difference in accumulated value) is particularly interesting, as it would seem that the 'decision conflict' account of this region's activity might predict the exact opposite result. Additionally, the behaviour has been well modelled at the end of the paper when examining temporal weighting functions across the multiple samples.

      Weaknesses:

      The results relating to gaze-weighting in the fMRI signal could do with some further explication to become more complete. A major concern with GLM2, which looks at the same effects as GLM1 but now with gaze-weighting, is that these gaze-weighted regressors may be (at least partially) correlated with their non-gaze-weighted counterparts (e.g., SVgaze will correlate with SV). But the non-gaze-weighted regressors have been excluded from this model. In other words, the authors are not testing for effects of gaze-weighting of value signals *over and above* the base effects of value in this model. In my mind, this means that the GLM2 results could simply be a replication of the findings from GLM1 at present. GLM3 is potentially a stronger test, as it includes the value signals and the interaction with gaze in the same model. But here, while the link to the currently attended item is quite clear (and a replication of Lim et al, 2011), the link to previously accumulated evidence is a bit contorted, depending upon the interpretation of a behavioural regression to interpret the fMRI evidence. The results from GLM3 are also, by the authors' own admission, marginal in places.
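      The reviewer's point about GLM2 can be illustrated with a toy simulation, assuming that gaze weighting is a simple multiplicative discount θ (theta) on samples that are not fixated. The simulated signal depends on value only, yet a model containing only the gaze-weighted regressor still returns a large positive coefficient for it; including both regressors assigns the effect to the correct term. All variable names and parameter values are hypothetical.

```python
import numpy as np

def ols_slopes(y, *regressors):
    """Ordinary least squares with an intercept; returns one slope per regressor."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(1)
n = 5000
sv = rng.normal(size=n)                       # sampled value (toy, mean-centred)
fixated = rng.random(n) < 0.5                 # hypothetical: was the item fixated?
theta = 0.3                                   # hypothetical discount for unfixated items
sv_gaze = np.where(fixated, sv, theta * sv)   # gaze-weighted sampled value

# Toy "BOLD" signal that tracks value only: there is NO true gaze effect.
signal = sv + rng.normal(size=n)

r_sv_gaze = np.corrcoef(sv, sv_gaze)[0, 1]    # about 0.88 in expectation here
b_gaze_only = ols_slopes(signal, sv_gaze)     # gaze-weighted term alone
b_joint = ols_slopes(signal, sv, sv_gaze)     # both terms: effect assigned to sv
```

      A positive gaze-only coefficient is therefore not, by itself, evidence of gaze weighting; only the joint model tests gaze weighting over and above value.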

      We have addressed this comment with new GLMs. The new GLM1 includes both non-gaze-weighted and gaze-weighted regressors and finds that the vmPFC and striatum reflect gaze-weighted sampled value, while the preSMA reflects gaze-weighted accumulated value. We have now dropped the old GLM3 and added two other GLMs: one that explicitly interacts accumulated value with accumulated dwell, and one that considers only partial gaze discounting. These analyses all support the preSMA as encoding gaze-weighted accumulated value.
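      Under an aDDM-style definition, gaze-weighted accumulated value with partial discounting, and an interaction between accumulated value and accumulated dwell, might be computed per sample as below. The definitions (the fraction of each sample's dwell time spent on the left option, the discount θ, and the dwell advantage 2f − 1) are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

def gaze_weighted_evidence(v_left, v_right, frac_left, theta):
    """aDDM-style evidence per sample: the fixated option counts fully, the other is scaled by theta.

    frac_left is the (hypothetical) fraction of the sample's dwell time on the left option.
    theta = 1 means no gaze effect; theta = 0 means the unfixated option is ignored.
    """
    v_left = np.asarray(v_left, dtype=float)
    v_right = np.asarray(v_right, dtype=float)
    f = np.asarray(frac_left, dtype=float)
    return f * (v_left - theta * v_right) + (1 - f) * (theta * v_left - v_right)

def accumulated_regressors(v_left, v_right, frac_left, theta=0.5):
    """Per-sample |AV|, gaze-weighted |AV|, and an AV x accumulated-dwell interaction."""
    av = np.cumsum(np.asarray(v_left, dtype=float) - np.asarray(v_right, dtype=float))
    av_gaze = np.cumsum(gaze_weighted_evidence(v_left, v_right, frac_left, theta))
    dwell_advantage = np.cumsum(2 * np.asarray(frac_left, dtype=float) - 1)  # > 0: left viewed longer
    return np.abs(av), np.abs(av_gaze), av * dwell_advantage

abs_av, abs_av_gaze, av_x_dwell = accumulated_regressors([3, 1], [1, 2], [1.0, 0.0], theta=0.5)
```

      Setting theta=1 makes the gaze-weighted regressor identical to |AV|, a useful sanity check; intermediate values correspond to partial discounting.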

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors seek to disentangle brain areas that encode the subjective value of individual stimuli/items (input regions) from those that accumulate those values into decision variables (integrators) for value-based choice. The authors used a novel task in which stimulus presentation was slowed down to ensure that such a dissociation was possible using fMRI despite its relatively low temporal resolution. In addition, the authors leveraged the fact that gaze increases item value, providing a means of distinguishing brain regions that encode decision variables from those that encode other quantities such as conflict or time-on-task. The authors adopt a region-of-interest approach based on an extensive previous literature and found that the ventral striatum and vmPFC correlated with the item values and not their accumulation, whereas the pre-SMA, IPS, and dlPFC correlated more strongly with their accumulation. Further analysis revealed that the preSMA was the only one of the three integrator regions to also exhibit gaze modulation.

      Strengths:

      The study uses a highly innovative design and addresses an important and timely topic. The manuscript is well-written and engaging, while the data analysis appears highly rigorous.

      Weaknesses:

      With 23 subjects, the study has relatively low statistical power for fMRI.

      We believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, rather than conducting an exploratory whole-brain analysis that would require larger sample sizes to correct for multiple comparisons, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space, and it leverages theoretical predictions about where effects should occur. Our novel finding, that gaze modulation of accumulated evidence signals is reflected in pre-SMA activity, builds naturally on these established findings. However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

      Recommendations for the authors:

      Editor Comments:

      Reviewer 1 in particular makes a number of suggestions for additional analyses that would help to strengthen the evidence supporting your conclusions.

      We thank the editor and the reviewers for the helpful suggestions for improving our manuscript. We discuss our efforts to address each point below.

      Reviewer #1 (Recommendations for the authors):

      (1) To address my concerns about GLM2, the first thing to do might be to simply show the correlation between the regressors used across the three different models (e.g., as a figure in the methods). Although the authors have done a good job to ensure that AV and SV are decorrelated when including them both in the same model, they haven't shown us whether the regressors used in, for example, GLM2 are correlated/similar to the regressors used in GLM1. This is important information for interpretation.

      Thank you for raising concerns about the overlap between different models. We agree that additional information regarding the correlation among sample-level regressors would aid readers in understanding the differences among the analyses. We now include this information in Figure 7 in the Methods section, as requested. While |SV| was uncorrelated with gaze-weighted |SV| (|SV<sub>Gaze</sub>|; Pearson’s r = 0.002, p = 0.848), lagged |AV| was significantly correlated with lagged, gaze-weighted |AV| (lagged |AV<sub>Gaze</sub>|; r = 0.365, p < 2.2 × 10<sup>-16</sup>).

      (2) The acid test for gaze-modulation of value signals would be to show that the gaze-modulated signals explain the fMRI results over and above the non-gaze-modulated signals. This could simply mean including SVgaze and SV (and equivalent terms for AV) within the same GLM. Following from point (1), the authors may point out that these terms are highly correlated - yes, but the GLM will then test for the effects of SVgaze *over and above* the effects of SV. (In fact, although I'd normally caution against orthogonalisation - it would here be totally legitimate to orthogonalise SVgaze w.r.t. SV).
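
      The orthogonalisation the reviewer mentions can be illustrated with a minimal NumPy sketch. The regressors and the BOLD signal below are simulated, hypothetical stand-ins for |SV| and |SV<sub>gaze</sub>|, not the study's data, and the function names are illustrative. The sketch also shows a standard property of this step (the Frisch-Waugh-Lovell theorem): orthogonalising SV<sub>gaze</sub> with respect to SV leaves the SV<sub>gaze</sub> coefficient unchanged and only reassigns the shared variance to the SV term.

```python
import numpy as np


def orthogonalize(target, reference):
    """Remove from `target` the part explained by `reference` plus an intercept."""
    design = np.column_stack([np.ones_like(reference), reference])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def ols(y, *regressors):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 500
sv = rng.normal(size=n)                               # hypothetical |dSV| regressor
sv_gaze = 0.6 * sv + rng.normal(size=n)               # hypothetical, correlated |dSV_gaze|
bold = 0.5 * sv + 0.3 * sv_gaze + rng.normal(size=n)  # hypothetical BOLD signal

sv_gaze_orth = orthogonalize(sv_gaze, sv)

b_raw = ols(bold, sv, sv_gaze)        # both regressors as they are
b_orth = ols(bold, sv, sv_gaze_orth)  # gaze term orthogonalised w.r.t. SV

print("corr(sv, sv_gaze)      =", round(np.corrcoef(sv, sv_gaze)[0, 1], 3))
print("corr(sv, sv_gaze_orth) =", round(np.corrcoef(sv, sv_gaze_orth)[0, 1], 3))
print("gaze coefficient, raw vs orthogonalised:", b_raw[2], b_orth[2])
print("SV coefficient, raw vs orthogonalised:  ", b_raw[1], b_orth[1])
```

      In this sketch, the orthogonalised model tests the same SV<sub>gaze</sub> effect as the joint model; what changes is the interpretation of the SV term.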

      We appreciate the reviewer’s suggestions for more robust tests of the presence of gaze-weighted signals. For reasons highlighted in our response above, we were initially hesitant to include both types of regressors in the same model due to their significant correlation. However, we now report the results of this analysis in the main text as the new GLM1. This model incorporates both gaze-weighted and non-gaze-weighted terms. For each contrast we used the same procedures as reported in the main text (family-wise error corrected at p < 0.05 and cluster-forming thresholds at p < 0.005).

      In the vmPFC, we found significant effects of both |∆SV| (peak voxel: x = -14, y = 44, z = -12; t = 3.90, p = 0.0190) and |∆SV<sub>Gaze</sub>| (peak voxel: x = 4, y = 38, z = -4; t = 5.21, p = 0.004), but no effects of |∆AV| or |∆AV<sub>Gaze</sub>|. The striatum also showed a significant correlation with |∆SV<sub>Gaze</sub>| (peak voxel: x = 22, y = 20, z = -10; t = 5.10, p = 0.014), but not with any other regressor.

      In the pre-SMA, we found a significantly positive relationship with both |∆AV| (peak voxel: x = 4, y = 14, z = 50; t = 4.75, p < 0.001) and |∆AV<sub>Gaze</sub>| (peak voxel: x = 4, y = 18, z = 50; t = 2.98, p = 0.032). In contrast, the dlPFC (x = 40, y = 34, z = 26; t = 6.83, p < 0.001) and IPS (x = 42, y = -50, z = 42; t = 5.16, p = 0.010) were only correlated with |∆AV|. No other significant contrasts emerged.

      These results provide direct support for the presence of gaze-modulated value signals in the brain, which we now describe in the main text Results section.

      (3) With regards to GLM3, it would help to provide a bit more detail on what the time series looks like for the gaze regressor in this model - is it the entire timeseries of gaze (which presumably shifts back/forth between options multiple times within each trial) which is being convolved with the HRF? This seems different from how gaze is being calculated in GLM2, where it is amalgamated into an 'average gaze difference' within a sample between left/right options, if I understand the text correctly?

      We apologize for the lack of details regarding how we operationalized the gaze regressors in our analyses. You are correct that the gaze regressor was calculated differently in GLM2 and GLM3.

      However, in response to the reviewer’s points above (Major Point 2) and below (Major Point 4, Minor Point 1), we have decided to drop the old GLM3 from the paper while incorporating a revised GLM1 (combining old GLM1 and GLM2) and two new GLMs (see responses to Major Point 4 and Minor Point 1) to provide clearer evidence for gaze modulation of accumulated value in the brain.

      (4) Also, is there a reason why it would not be more appropriate to interact AV with *previously deployed gaze difference* (accumulated across previous samples) in this model, rather than with the current gaze location? The latter seems to rely upon the indirect linkage via the behavioural modelling result, which seems to weaken the claim.

      We thank the reviewer for this suggestion. We agree that our original GLM3 approach was limited because it interacted AV with current binary gaze location, which relies on the indirect behavioral relationship we established (i.e., that current gaze is negatively correlated with accumulated past gaze).

      The original GLM2 (which is now incorporated into the new GLM1) implemented something similar to what the reviewer is suggesting as it used gaze-weighted values accumulated across all previous samples. Specifically, in GLM2, the gaze-weighted accumulated value (AV<sub>gaze</sub>) was calculated as the sum of all previous sampled values, each weighted by the proportion of gaze allocated to each option during that sampling period.

      However, to test more directly whether accumulated evidence signals are modulated by accumulated gaze allocation, we have now run an additional analysis (GLM2). This analysis revises the old GLM3 and includes the following regressors: ∆SV, lagged ∆AV, current gaze location, accumulated dwell advantage, ∆SV × current gaze location, and lagged ∆AV × accumulated dwell advantage.

      The two new regressors were defined as follows:

      Accumulated dwell advantage: For each sample t, accumulated dwell advantage represents the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      ∆AV × accumulated dwell advantage: The interaction between accumulated values and accumulated dwell advantage, which directly tests whether brain regions encoding accumulated value are modulated by the history of gaze allocation.
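
      The two definitions above can be sketched in a few lines of NumPy for a single trial. Several details are assumptions not stated in the response: per-sample dwell times are available as hypothetical arrays `dwell_left` and `dwell_right`, the first sample of a trial (which has no gaze history) is assigned an advantage of 0, and the lagged signed ∆AV (left minus right) has already been computed. The function name is illustrative.

```python
import numpy as np


def accumulated_dwell_advantage(dwell_left, dwell_right):
    """Dwell advantage over samples 0..t-1 for each sample t, in [-1, +1].

    +1 means all earlier gaze went left, -1 means all earlier gaze went right.
    The first sample has no gaze history and is set to 0 (an assumption).
    """
    dwell_left = np.asarray(dwell_left, dtype=float)
    dwell_right = np.asarray(dwell_right, dtype=float)
    # Exclusive cumulative sums: entry t sums samples 0..t-1 only.
    cum_left = np.concatenate([[0.0], np.cumsum(dwell_left)[:-1]])
    cum_right = np.concatenate([[0.0], np.cumsum(dwell_right)[:-1]])
    total = cum_left + cum_right
    advantage = np.zeros_like(total)
    # Divide only where some earlier gaze exists; elsewhere keep 0.
    np.divide(cum_left - cum_right, total, out=advantage, where=total > 0)
    return advantage


# Hypothetical four-sample trial (dwell times in seconds).
dwell_left = [1.0, 0.5, 2.0, 0.0]
dwell_right = [0.0, 1.5, 0.0, 2.0]
av_lag = np.array([0.0, 2.0, 1.0, 4.0])  # hypothetical lagged signed dAV (left - right)

adv = accumulated_dwell_advantage(dwell_left, dwell_right)
interaction = av_lag * adv  # dAV x accumulated dwell advantage regressor
print(adv)
print(interaction)
```

      At the second sample all earlier gaze went left, so the advantage is +1; after equal left and right dwell it returns to 0. The interaction is positive whenever the gaze history and the accumulated value favor the same side.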

      This approach is conceptually similar to old GLM2’s gaze-weighting method, but allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation.

      Here, we found that the pre-SMA showed a positive correlation with the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.10, p = 0.0258). Surprisingly, the striatum also showed a correlation with this term (peak: x = -16, y = 10, z = -6; t = 4.07, p = 0.0176). No other ROIs showed significant relationships.

      This analysis provides additional evidence that pre-SMA encodes accumulated value signals that are modulated by accumulated gaze allocation, without relying on indirect relationships between current and past gaze. We now report these results in the main text as GLM2 as follows:

      “To more directly test whether accumulated evidence signals were modulated by accumulated gaze allocation throughout a trial, we conducted additional, exploratory analyses. Specifically, we ran a GLM that incorporated the following two terms: accumulated dwell advantage and ∆AV × accumulated dwell advantage, in addition to ∆SV, the current gaze location, and ∆SV × current gaze location.

      We calculated accumulated dwell advantage as follows: For each sample t, accumulated dwell advantage is the cumulative difference in gaze allocation up to sample t-1, calculated as (total dwell left – total dwell right) / (total dwell left + total dwell right). This is a continuous measure from -1 (all previous gaze to right) to +1 (all previous gaze to left).

      We also included the interaction between accumulated dwell advantage and ∆AV (i.e., signed accumulated evidence). This interaction term is positive when gaze is primarily to the left and left has more value or when gaze is primarily to the right and right has more value. This interaction term directly tests whether brain regions encoding accumulated evidence are modulated by the history of gaze allocation. This approach allows us to examine the interaction effect more explicitly as a separate regressor rather than having it embedded within the value calculation itself.

      This GLM revealed a positive correlation between pre-SMA activity and the ∆AV × accumulated dwell advantage term (peak voxel: x = 8, y = 10, z = 58; t = 3.01, p = 0.026). Surprisingly, the striatum also showed a correlation with this term (peak voxel: x = -16, y = 10, z = -6; t = 4.07, p = 0.018). Additionally, activity in the dlPFC was positively correlated with ∆SV (peak voxel: x = -36, y = 34, z = 22; t = 3.96, p = 0.016). No other ROIs showed significant relations.

      This analysis provides additional evidence that the pre-SMA encodes accumulated value signals that are modulated by the history of gaze allocation.”

      Minor

      (1) "In Trial A, the subject looks left 30% of the time and right 70% of the time. In Trial B, the subject looks left 70% of the time and right 30% of the time. In Trial A, the net input value ("drift rate") would be |0.3 ∙ 7 − 0.7 ∙ 3| = 0. In Trial B, the drift rate would be |0.7 ∙ 7 − 0.3 ∙ 3| = 4." I may be missing something, but isn't this consistent with an aDDM with theta=0, rather than theta=0.3-0.5 as is typically found?
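
      The reviewer's arithmetic can be checked against the general aDDM drift. The plain-Python sketch below (the function name is illustrative) uses the same expression as the θ-weighted sampled value defined later in this response, with a left-minus-right sign convention: while fixating left, evidence moves by V<sub>Left</sub> − θV<sub>Right</sub>, and while fixating right, by θV<sub>Left</sub> − V<sub>Right</sub>. With θ = 0 it reproduces the values in the example (0 and 4); with θ = 0.3, Trial A's drift becomes 1.2 while Trial B's stays at 4.

```python
import math


def addm_drift(g_left, v_left, v_right, theta):
    """Mean aDDM drift (left minus right) when a proportion g_left of the trial is spent on the left.

    While fixating left, evidence moves by v_left - theta * v_right; while
    fixating right, by theta * v_left - v_right.
    """
    g_right = 1.0 - g_left
    return g_left * (v_left - theta * v_right) + g_right * (theta * v_left - v_right)


for theta in (0.0, 0.3):
    trial_a = addm_drift(0.3, 7, 3, theta)  # looks left 30% of the time
    trial_b = addm_drift(0.7, 7, 3, theta)  # looks left 70% of the time
    print(f"theta={theta}: Trial A drift={abs(trial_a):.2f}, Trial B drift={abs(trial_b):.2f}")
```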

      The reviewer raises an important point about our assumptions regarding attentional discounting. We agree that our approach could be problematic as it may assume stronger discounting than has been observed in the literature.

      To address this concern, we calculated drift on a sample-by-sample basis before aggregating to the trial level. Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent fixating left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. This approach preserves the fine-grained temporal dynamics of gaze-dependent value accumulation that would be lost by calculating gaze proportions only at the trial level.
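
      The β and γ terms connect to θ through one line of algebra. Expanding the θ-weighted value of a sample, G<sub>Left</sub>(V<sub>Left</sub> − θV<sub>Right</sub>) − G<sub>Right</sub>(V<sub>Right</sub> − θV<sub>Left</sub>), gives β + θγ, and because averaging over samples is linear, the same identity holds for the trial-level averages. If choice depends on k(β + θγ) for some scale k, a regression on β and γ returns coefficients k and kθ, so θ is the ratio of the γ coefficient to the β coefficient. The NumPy sketch below checks the identity on a hypothetical four-sample trial; the gaze proportions, values, θ, and function names are made up for illustration.

```python
import numpy as np


def beta_gamma(g_left, g_right, v_left, v_right):
    """Sample-level beta and gamma terms as defined in the response."""
    beta = g_left * v_left - g_right * v_right
    gamma = g_right * v_left - g_left * v_right
    return beta, gamma


def sv_theta(g_left, g_right, v_left, v_right, theta):
    """Theta-weighted sampled value: unattended values are discounted by theta."""
    return g_left * (v_left - theta * v_right) - g_right * (v_right - theta * v_left)


# Hypothetical trial with four samples.
g_left = np.array([0.8, 0.4, 0.6, 0.1])
g_right = 1.0 - g_left
v_left = np.array([6.0, 3.0, 7.0, 5.0])
v_right = np.array([4.0, 5.0, 2.0, 6.0])
theta = 0.5

beta, gamma = beta_gamma(g_left, g_right, v_left, v_right)
# Identity check: SV_theta equals beta + theta * gamma for every sample.
print(np.allclose(sv_theta(g_left, g_right, v_left, v_right, theta), beta + theta * gamma))

# Trial-level regressors: average the sample-level terms within the trial.
beta_trial, gamma_trial = beta.mean(), gamma.mean()
```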

      Using this sample-level method in a mixed-effects logistic regression predicting choice (left vs. right) from β and γ, we estimated subject-specific values of θ as the ratio of the γ coefficient to the β coefficient. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25). These estimates are somewhat higher than typical aDDM estimates of θ (0.3–0.5), indicating weaker attentional discounting in our task. This may reflect the drawn-out nature of this task relative to prior aDDM tasks.
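
      The θ estimate can be sketched for one simulated subject using NumPy only. Two simplifications are ours, not the authors': the mixed-effects logistic regression is replaced by a single-subject logistic regression fitted with Newton-Raphson, and the trial-level β and γ are generated from hypothetical uniform gaze proportions and values (one sample per trial) with a known θ of 0.7, so that the recovered value can be compared with the truth.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    return w


rng = np.random.default_rng(1)
n_trials = 5000
true_theta, scale = 0.7, 0.8  # hypothetical subject parameters

# Hypothetical trial-level gaze proportions and values (one sample per trial).
g_left = rng.uniform(0.0, 1.0, n_trials)
g_right = 1.0 - g_left
v_left = rng.uniform(0.0, 5.0, n_trials)
v_right = rng.uniform(0.0, 5.0, n_trials)
beta = g_left * v_left - g_right * v_right
gamma = g_right * v_left - g_left * v_right

# Choices generated from logit P(left) = scale * (beta + theta * gamma).
p_left = 1.0 / (1.0 + np.exp(-scale * (beta + true_theta * gamma)))
chose_left = (rng.uniform(size=n_trials) < p_left).astype(float)

X = np.column_stack([np.ones(n_trials), beta, gamma])  # intercept, beta, gamma
w = fit_logistic(X, chose_left)
theta_hat = w[2] / w[1]  # ratio of the gamma and beta coefficients
print(f"estimated theta = {theta_hat:.2f} (true value {true_theta})")
```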

      Next, we ran a new GLM that incorporated these θ estimates in the sampled value estimates. For this GLM3, we computed θ-weighted sampled value (|∆SV<sub>θ</sub>|) as:

      SV<sub>θ</sub> = (G<sub>Left</sub> × (V<sub>Left</sub> − θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> − θV<sub>Left</sub>)).

      Similar to GLM1, we computed an accumulated value signal based on the lagged sum of previous samples’ |∆SV<sub>θ</sub>| (i.e., |∆AV<sub>θ</sub>|).

      We found significant positive effects of |∆SV<sub>θ</sub>| in the vmPFC (peak voxel: x = -14, y = 44, z = -12; t = 3.57, p = 0.0270) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.0198), but in no other ROI.

      In contrast, we found significant positive relationships between |∆AV<sub>θ</sub>| and activity in the pre-SMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.0014), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.0040), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆AV<sub>θ</sub>| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.0410). No other significant contrasts emerged.
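
      The lagged accumulation of θ-weighted sampled values can be sketched as follows. Because the response does not spell these details out, the sketch assumes that samples are stored in temporal order with a trial label, that the accumulator restarts at zero at the beginning of each trial, and that the absolute value is taken after summing the signed (left minus right) sampled values. The data and function name are hypothetical.

```python
import numpy as np


def lagged_abs_av(delta_sv, trial_ids):
    """|dAV| per sample: |sum of signed dSV over earlier samples of the same trial|.

    The first sample of each trial has no history and gets 0.
    """
    delta_sv = np.asarray(delta_sv, dtype=float)
    trial_ids = np.asarray(trial_ids)
    av = np.zeros_like(delta_sv)
    for trial in np.unique(trial_ids):
        idx = np.flatnonzero(trial_ids == trial)  # sample positions of this trial
        running = np.cumsum(delta_sv[idx])        # includes the current sample
        av[idx[1:]] = running[:-1]                # shift by one sample (lag)
    return np.abs(av)


# Hypothetical signed theta-weighted sampled values for two trials.
delta_sv = [2.0, -3.0, 1.0, 4.0, -1.0]
trial_ids = [0, 0, 0, 1, 1]
abs_av = lagged_abs_av(delta_sv, trial_ids)
print(abs_av)
```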

      We now report this additional analysis as GLM3 in the main text, as follows:

      “In our first set of analyses, we implicitly assumed complete discounting of non-fixated information, in contrast with previous studies that have generally found only partial discounting (Krajbich et al., 2010; Sepulveda et al., 2020; Smith & Krajbich, 2019; Westbrook et al., 2020). To verify that our results are robust to inter-subject variability in attentional discounting, we estimated subject-level attentional discounting parameters and then re-estimated our original GLM with new, recalculated gaze-weighted value regressors.

      Following Smith, Krajbich, and Webb (2019), for each individual sample within a trial, we computed:

      β = (G<sub>Left</sub> × V<sub>Left</sub>) – (G<sub>Right</sub> × V<sub>Right</sub>)

      γ = (G<sub>Right</sub> × V<sub>Left</sub>) – (G<sub>Left</sub> × V<sub>Right</sub>),

      where G<sub>Left</sub> and G<sub>Right</sub> represent the proportion of time spent gazing left versus right within that specific sample, and V<sub>Left</sub> and V<sub>Right</sub> are the instantaneous values of the left and right options. We then averaged these sample-level β and γ values across all samples within each trial to obtain trial-level regressors. We then ran a mixed-effects logistic regression predicting choice (left vs. right) as a function of β and γ and calculated subject-specific values of θ as the ratio of the γ coefficient to the β coefficient. Across our sample (N=20), we found mean θ = 0.77 (SD = 0.21, range = 0.55–1.25).

      Next, for the GLM, we computed θ-weighted sampled-value (|∆SV<sub>θ</sub>|) as:

      SV<sub>θ</sub> = (G<sub>Left</sub> × (V<sub>Left</sub> − θV<sub>Right</sub>)) – (G<sub>Right</sub> × (V<sub>Right</sub> − θV<sub>Left</sub>))

      Similar to the original GLM, we computed an accumulated value signal, |∆AV<sub>θ</sub>|, based on the lagged sum of previous samples’ |∆SV<sub>θ</sub>|.

      We found significant positive effects of |∆SV<sub>θ</sub>| in the vmPFC (peak voxel: x = -14, y = 44, z = 12; t = 3.57, p = 0.027) and IPS (peak voxel: x = 30, y = -28, z = 40; t = 4.58, p = 0.020), but in no other ROI.

      In contrast, we found significant positive relationships between |∆AV<sub>θ</sub>| and activity in the preSMA (peak voxel: x = 0, y = 22, z = 52; t = 4.68, p = 0.001), dlPFC (peak voxel: x = 40, y = 32, z = 26; t = 4.32, p = 0.004), and IPS (peak voxel: x = 44, y = -48, z = 42; t = 6.26, p < 0.0001). Notably, we also observed a significant relationship between |∆AV<sub>θ</sub>| and activity in the vmPFC (x = 8, y = 38, z = 18; t = 3.89, p = 0.041). No other significant contrasts emerged.

      In summary, these analyses provide additional evidence that the vmPFC encodes gaze-weighted sampled value signals and the pre-SMA encodes gaze-weighted accumulated value signals, though other correlations also emerged.”

      (2) The reporting of statistical results in the fMRI could be sharpened - e.g. in the figure legends, don't just say "Voxels thresholded at p < .05.", but make clear whether you mean FWE whole-brain corrected (I think you do from the methods) or whether this is uncorrected for display; similarly, for the peak voxels, report the associated Z statistic at that voxel rather than just "negative beta".

      We agree that it is important to include additional details regarding how we reported the statistical results. We now clarify our procedures in the main text:

      “We report results using FWE-corrected statistical significance of p < 0.05 and a cluster-forming threshold of p < 0.005.”

      We now also report the t statistics for peak voxels.

      (3) A couple of the citations are slightly wrong - e.g., Kolling et al 2012 shouldn't be cited as arguing for decision conflict, as in fact it argues strongly against this account and in favour of a foraging account of ACC activity. Similarly, Hunt et al 2018 doesn't provide support for decision conflict; instead, it shows signals in ACC show evidence accumulation for left/right actions over time (although not whether these accumulator signals are gaze-weighted, in the same way as the present study).

      We thank the reviewer for pointing out these mistakes in our citations. We have revised the references throughout.

      Reviewer #2 (Recommendations for the authors):

      (1) In some places, the introduction would benefit from fleshing out certain points. For example, it is stated “For instance, decisions that are less predictable also tend to take more time (Konovalov & Krajbich, 2019) and can be influenced by attention manipulations (Pärnamets et al., 2015; Tavares et al., 2017; Gwinn et al., 2019; Bhatnagar & Orquin, 2022). The quantitative relations between these measures argue for an evidence-accumulation process.” It is not clear why the relations between them argue for an EA process, and the reader would benefit from some further explanation.

      We thank the reviewer for this helpful suggestion. We agree that the original text did not sufficiently explain why these relationships support evidence-accumulation models. We have revised the introduction to better articulate the mechanistic basis for this claim.

      This revision clarifies these points in the main text:

      “Decisions like this are thought to rely on a bounded, evidence-accumulation process that depends on factors such as the value of the sampled information and shifts in attention. According to this framework, when two options are similar in value, evidence accumulates more slowly towards the decision threshold, resulting in longer response times (RT) and more opportunity for shifts in attention to influence the choice outcome. In contrast, when one option is clearly superior, evidence accumulates more rapidly and the decision is made quickly with less of a relation between gaze and choice. This choice process produces reliable, quantitative patterns in choice, RT, and eye-tracking data (Ashby et al., 2016; Callaway et al., 2021; Gluth et al., 2018; Krajbich et al., 2010; Smith & Krajbich, 2018). For instance, decisions with similar values are more random (i.e., less predictable), tend to take more time (Konovalov & Krajbich, 2019), and can be experimentally manipulated by diverting attention towards one option more than the other (Bhatnagar & Orquin, 2022; Gwinn et al., 2019; Pärnamets et al., 2015; Pleskac et al., 2022; Tavares et al., 2017). Critically, these behavioral measures do not simply correlate; rather, they exhibit precise quantitative relationships consistent with evidence accumulation models (Konovalov & Krajbich, 2019).”

      (2) Some of the study hypotheses also need to be clarified. What are the hypotheses regarding how SV and AV should translate to BOLD in an input vs integrator region? Larger SV/AV = larger BOLD? What predictions would be made for a time-on-task or conflict region? Are the predictions the same or different? Clarifying this will help the reader to understand to what extent the gaze manipulation is pivotal in identifying integrator regions.

      We thank the reviewer for this excellent suggestion. We agree that it is useful to clearly articulate our hypotheses about BOLD signal predictions for different aspects of the model, and why gaze manipulation is critical for distinguishing between them. We have now expanded the introduction to clarify these predictions.

      For input regions, we predicted a straightforward positive relationship: larger sampled value (|ΔSV|) should produce larger BOLD activity. Input regions encode the momentary evidence being sampled (i.e., the relative value of currently presented stimuli). Consistent with prior work (Bartra et al., 2013), we expected such activity in the vmPFC and ventral striatum.

      Critically, we also predicted that these sampled value signals should be modulated by gaze location. The attentional drift-diffusion model (aDDM; Krajbich et al., 2010) posits that attended items receive full value weight while unattended items are discounted. Consistent with prior work (Lim et al., 2011), we expected stronger vmPFC/striatum activity when the higher-value item is fixated compared to when the lower-value item is fixated.

      For integrator regions, we predicted an analogous positive relationship: larger accumulated value (|ΔAV|) should produce more BOLD activity. Accumulator regions encode the summed evidence over the course of the decision. Consistent with prior work (Hare et al., 2011; Gluth et al., 2021; Pisauro et al., 2017), we expected such activity in the pre-SMA, dlPFC, and IPS.

      As with sampled value, we predicted that integrator activity should reflect gaze-weighted accumulated value. Just as inputs are modulated by current gaze, the accumulated evidence should be weighted by the history of gaze allocation over the entire trial.

      Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time.

      The conflict account predicts that BOLD activity should scale with inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long RT (Pisauro et al. 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoid this ambiguity – we analyze the effect of accumulated value at each point in time, not just at the time of decision. In this case, conflict should be inversely correlated with accumulated value. Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of values.

      A more serious concern is the potential link to putative time-on-task BOLD activity. Accumulated value inevitably increases with time, leading to a correlation between the two variables (Grinband et al. 2011; Holroyd et al., 2018; Mumford et al. 2024). This is where the gaze data become particularly important. Time-on-task regions should show no relation with gaze allocation. After accounting for non-gaze-weighted accumulated value, only accumulator, and not time-on-task, regions should show a relation with gaze-weighted accumulated value. The results of the revised GLMs provide exactly such evidence.

      We have edited the manuscript to make clear to readers why our gaze manipulation was not merely exploratory but rather a theoretically-motivated test to distinguish between competing models of decision-related neural activity.

      We have clarified our study hypotheses in the Introduction as follows:

      “We hypothesized that we would find (1) a positive correlation between gaze-weighted |SV| and activity in the reward network (the ventromedial prefrontal cortex (vmPFC) and ventral striatum), and (2) a positive correlation between gaze-weighted |AV| and activity in the pre-supplementary motor area (pre-SMA) (Aquino et al., 2023), dorsolateral prefrontal cortex (dlPFC), and intraparietal sulcus (IPS).”

      We have also added clarifying text about conflict and time-on-task to the Discussion as follows: “Conflict-based models make qualitatively different predictions. Regions implementing conflict monitoring should show increased activity when options are similar in value, regardless of time. The conflict account predicts that BOLD activity should scale with the inverse value difference: smaller |ΔV| → higher conflict → higher BOLD (Shenhav et al., 2014, 2016). In simple choice tasks, high conflict and high accumulated value are both associated with long response times (Pisauro et al., 2017), leading to ambiguity about how to interpret purported neural correlates of accumulated value. In our task we avoided this ambiguity by analyzing the effect of accumulated value at each point in time, not just at the moment of decision. Under this approach, conflict should be inversely correlated with accumulated value (as higher accumulated evidence indicates less similarity between options). Moreover, the conflict account makes no predictions about how BOLD activity should be modulated by gaze allocation for a given set of option values.

      A more serious concern is the potential confound with time-on-task BOLD activity. Accumulated value inevitably increases with time within a trial, leading to a correlation between the two variables (Grinband et al., 2011; Holroyd et al., 2018; Mumford et al., 2024). This is where the gaze data were particularly important. Time-on-task regions should show no relation with gaze allocation patterns. After accounting for non-gaze-weighted accumulated value, only accumulator regions, and not time-on-task regions, should show a relationship with gaze-weighted accumulated value. The results of our analyses provide exactly such evidence: pre-SMA activity was positively correlated with gaze-weighted accumulated value, even when accounting for previous gaze history and individual differences in attention discounting.”

      (3) The authors allude to there being a correlation between SV and AV on this task, but the correlation is never reported. Please report the correlation with and without the removal of T-1.

      We appreciate the reviewer pointing out this omission. We now report all correlations between SV and both the lagged and non-lagged versions of AV in the Methods section (Fig. 7). SV was significantly correlated with the full calculation of AV (Pearson’s r = 0.27). In contrast, this correlation, while still statistically significant, decreased when compared to lagged AV (Pearson’s r = 0.06).
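
      Part of the difference between these two correlations is mechanical: full AV contains the current SV, whereas lagged AV does not. The hypothetical simulation below illustrates this with independent, signed samples (a simplification; the reported regressors are absolute values, and real samples need not be independent). With independent samples and six samples per trial, the full correlation is about 1/√3.5 ≈ 0.53, while the lagged correlation is near zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples = 2000, 6

sv = rng.normal(size=(n_trials, n_samples))  # hypothetical independent sampled values
full_av = np.cumsum(sv, axis=1)              # includes the current sample
lagged_av = full_av - sv                     # earlier samples only (0 at the first sample)

r_full = np.corrcoef(sv.ravel(), full_av.ravel())[0, 1]
r_lagged = np.corrcoef(sv.ravel(), lagged_av.ravel())[0, 1]
print(f"r(SV, full AV) = {r_full:.2f}, r(SV, lagged AV) = {r_lagged:.2f}")
```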

      (4) When examining relationships between SV, AV, and choice probability, the authors note that a larger coefficient for SV compared to AV is an inevitable consequence of an SSM choice process. Please explain why this is the case.

      The reviewer is correct in observing that this point was not made sufficiently clear in the main text. We have now expanded the explanation in the behavioral results section.

      The key insight is that in sequential sampling models, choices occur when accumulated evidence reaches a decision threshold. Importantly, the perceived value of each sample consists of the true underlying value plus random noise. The final sample (SV) is what pushes the accumulated evidence over the threshold, which creates a selection bias: decisions tend to occur when the noise component of SV happens to be positive and large. This means that the perceived final SV systematically overestimates the true SV, biasing upward the regression coefficient for the effect of SV on choice. In contrast, AV represents the sum of all previous sampled evidence, samples that we know did not lead to a choice. These samples are thus more likely to have had a negative or small noise component, meaning that the perceived AV systematically underestimates the true AV. This biases downwards the regression coefficient for the effect of AV on choice.

      Overall, we expect that even when sample evidence is weighted equally over time in the true decision process, regression analyses will inevitably show a larger coefficient for the effect of SV than for that of AV. This is a statistical artefact of the threshold-crossing mechanism, and not a reflection of differential weighting. We have incorporated this explanation into the revised manuscript to make clear why this pattern is an expected consequence of the SSM framework:

      “The larger coefficient for ∆SV compared to ∆AV is an inevitable consequence of an SSM choice process. In SSMs, a choice occurs when accumulated evidence reaches a threshold. Critically, perceived value for any given sample consists of the true underlying value plus random noise. The final sample (∆SV) is what pushes the accumulated evidence over the threshold, which creates a selection effect: decisions tend to be made when the noise component of ∆SV is relatively large and aligned with the ultimate choice, causing the perceived final ∆SV to systematically overestimate the true ∆SV. As a result, the regression coefficient for the effect of final ∆SV on choice is overestimated. In contrast, ∆AV represents the sum of all previous evidence, which includes samples that were insufficient to trigger a choice and thus more likely to have noise components that favored the non-chosen option. This means that the perceived ∆AV systematically underestimates the true ∆AV. As a result, the regression coefficient for the effect of ∆AV on choice is underestimated. This creates an inherent asymmetry between ∆SV and ∆AV: even when the true decision process weights evidence equally over time, regression analyses will show larger coefficients for ∆SV than ∆AV. For any data generated by an SSM, regressing choice probability on final ∆SV and total ∆AV would produce a larger coefficient for ∆SV due to this threshold-crossing selection effect.”
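
      The final-sample part of this argument can be checked with a small simulation. The sketch below uses hypothetical parameters (drift 0.2, noise SD 1, bound ±3) and draws noisy samples until the running sum reaches a bound. It records the noise of the final sample and the average noise of the earlier samples, each signed so that positive values favor the chosen option. It only illustrates the selection effect on the final sample; it does not fit the choice regression or reproduce the authors' analysis.

```python
import numpy as np


def simulate_trial(rng, drift, noise_sd, bound, max_samples=1000):
    """Draw noisy samples until the running sum reaches +bound or -bound."""
    total = 0.0
    noises = []
    for _ in range(max_samples):
        eps = rng.normal(0.0, noise_sd)
        noises.append(eps)
        total += drift + eps
        if abs(total) >= bound:
            break
    choice = 1.0 if total > 0 else -1.0  # +1 = upper bound, -1 = lower bound
    return choice, np.array(noises)


rng = np.random.default_rng(3)
final_aligned, earlier_aligned = [], []
for _ in range(5000):
    choice, noises = simulate_trial(rng, drift=0.2, noise_sd=1.0, bound=3.0)
    final_aligned.append(choice * noises[-1])  # noise of the threshold-crossing sample
    if len(noises) > 1:
        earlier_aligned.append(np.mean(choice * noises[:-1]))  # per-sample earlier noise

mean_final = float(np.mean(final_aligned))
mean_earlier = float(np.mean(earlier_aligned))
print(f"choice-aligned noise: final sample {mean_final:.2f}, earlier samples {mean_earlier:.2f}")
```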

      (5) It is not clear to me why the authors single out the pre-SMA only in the abstract when IPS and dlPFC also show stronger correlations with AV and exhibit gaze modulation in the authors' final non-linear analysis. Further explanation is required in the Discussion and I would also suggest amending the Abstract because the 'Most importantly' claim will not be meaningful for the reader.

      We appreciate the reviewer’s point. In the revised manuscript, we have included several new GLMs, including the new GLM1 that looks at gaze-weighted AV, above and beyond the effect of non-gaze-weighted AV. That analysis only supports pre-SMA. We have now clarified this in the Abstract as follows:

      “Finally, we found gaze-modulated accumulated-value signals, above and beyond the non-gaze-modulated signals, in the pre-supplementary motor area (pre-SMA), providing novel evidence that visual attention has lasting effects on decision variables and suggesting that activity in the pre-SMA reflects accumulated evidence.”

      (6) Some discussion of statistical power would be warranted given that a sample of 23 is now considered small by current fMRI standards.

      We appreciate the reviewer raising this important issue. We acknowledge that our sample size of 23 subjects (with only 20 having useable eye-tracking data) is on the small side by current fMRI standards. However, we believe several features of our study design and analytic approach mitigate concerns regarding statistical power.

      First, our paradigm leveraged a within-subjects design with high total sample counts. Each participant completed approximately 60 choice trials across three 15-minute runs, with an average of 6.37 samples per trial. This yielded roughly 380 observations per participant, providing substantial statistical power at the individual level before aggregating across subjects. This within-subject power is particularly important for detecting parametric effects, as our regressors of interest (|∆SV| and |∆AV|) varied continuously across and within trials.

      Second, we did not conduct an exploratory whole-brain analysis, which would require a larger sample to survive correction for multiple comparisons. Instead, we employed a targeted ROI approach based on well-established regions from prior literature (e.g., Bartra et al., 2013; Hare et al., 2011). This ROI-driven approach substantially increases statistical power by reducing the search space, and it leverages theoretical predictions about where effects should occur. Our novel finding, that gaze modulation of accumulated-evidence signals is reflected in pre-SMA activity, builds naturally on these established findings.

      However, we acknowledge that a larger sample size would provide greater confidence in the null effects and would enable more detailed individual differences analyses.

      We have added a brief acknowledgement of the sample size limitation to the Discussion section of the main text:

      “While our sample size of 20 subjects is modest by current neuroimaging standards, the within-subject statistical power from our extended decision paradigm (~380 observations per subject), combined with hypothesis-driven ROI analyses and multiple comparisons correction, provides confidence in our core findings. Nevertheless, replication with larger samples would be valuable, particularly for more fully characterizing null effects and marginal findings.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors describe a method to probe both the proteins associated with genomic elements in cells and the 3D contacts between sites in chromatin. The approach is interesting and promising, and it is great to see a proximity labeling method like this that can map both proteins and 3D contacts. It utilizes DNA oligomers, which will likely make it a widely adopted method. However, the manuscript over-interprets its successes, likely because of the limited use of appropriate controls and the absence of validation experiments. I think the study requires better proteomic controls and some validation experiments for the "new" proteins and 3D contacts described. In addition, toning down the claims made in the paper would assist those looking to implement one of the various available proximity labeling methods and would make this manuscript more reliable to non-experts.

      Strengths:

      (1) The mapping of 3D contacts for 20 kb regions using proximity labeling is beautiful.

      (2) The use of in situ hybridization will probably improve background and specificity.

      (3) The use of fixed cells should prove enabling and is a strong alternative to similar, living cell methods.

      Weaknesses:

      (1) A major drawback to the experimental approach of this study is the "multiplexed comparisons". Using the mtDNA as a comparator is not a great comparison - there is no reason to think the telomeres/centromeres would look like mtDNA as a whole. The mito proteome is much less complex. It is going to provide a large number of false positives. The centromere/telomere comparison is ok, if one is interested in what's different between those two repetitive elements.

      We appreciate the reviewer's point here. In fact, we selected mitochondrial DNA as a target for precisely the reason the reviewer notes. Because mtDNA should be spatially distinct from the nuclear targets, it allows us to determine whether we are in fact seeing spatially distinct proteins at the interorganelle (mtDNA vs. telomeres/centromeres) and intraorganelle (telomeres vs. centromeres) levels.

      But the more realistic use case of this method would be "what is at a specific genomic element"? A purely nuclear-localized control would be needed for that. Or a genomic element that has nothing interesting at it (I do not know of one).

      We have now added two studies (Figures 4 and 5) detailing the use of OMAP to investigate specific genomic elements. The first targets the Hox clusters (HOXA and HOXB). The second is a haplotype-specific analysis of the X-inactivation center (XIC) in female murine (EY.T4) cells. The controls in these cases are more specific and in line with those suggested by the reviewer. We (1) compare HOXA and HOXB with or without EZH2 inhibition using the same sets of probes and (2) specifically compare the region surrounding the XIC on the inactive and active X chromosomes in female cells.

      You can see this in the label-free work: non-specific, nuclear GO terms are enriched, likely due to the random plus non-random labeling in the nucleus. What would a Telo vs general nucleus GSEA look like? (GSEA should be used for quantitative data, not GO). That would provide some specificity. Figures 2G and S4A are encouraging, but a) these proteins are largely sequestered in their respective locations, and b) no validation by an orthogonal method like ChIP or Cut and Run/Tag is used.

      We performed GSEA on the enrichment scores from the SAINT output for the label-free proteomics data (Figure 1D). We also note that several of these proteins (e.g., those highlighted in Figure 2A: TERF1, CENPN, TOM70) have already been extensively validated to localize to these locations.
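      For readers less familiar with GSEA, the core statistic can be sketched as follows. This is the classic weighted running-sum enrichment score, with proteins ranked by a quantitative score, here assumed to be SAINT enrichment scores. It is not the authors' implementation. Full GSEA tools also perform permutation testing, score normalization, and multiple-testing correction, all of which this sketch omits. The protein names in the tests are hypothetical placeholders.

```python
def enrichment_score(scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (ES) in the style of GSEA.

    scores: dict mapping protein -> enrichment score (higher = more enriched).
    gene_set: set of proteins belonging to the set being tested.
    weight: exponent applied to |score| for set members (1.0 = classic weighting).
    Returns the signed maximum deviation of the running sum from zero.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    hits = [name in gene_set for name, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked proteins")
    hit_norm = sum(abs(s) ** weight for (_, s), hit in zip(ranked, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a score of zero")
    running, extreme = 0.0, 0.0
    for (_, s), hit in zip(ranked, hits):
        # Step up by the member's weighted score, or down by a uniform miss penalty.
        running += abs(s) ** weight / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme
```

      A set concentrated at the top of the ranking gives a positive score near 1. A set concentrated at the bottom gives a negative score.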

      In response to the reviewer's request for additional validation, we analyzed ChIP-seq data for several proteins to determine whether they were enriched around specific loci. In the HOXA/B analysis, OMAP showed that HDAC3 and TCF12 were enriched at HOXB relative to HOXA. It also showed that SMARCB1 and ZC3H13 were enriched at HOXA relative to HOXB (Figure 4C). ChIP data for these four proteins agreed: HDAC3 and TCF12 had more peak calls at HOXB, and SMARCB1 and ZC3H13 had more peak calls at HOXA (Figure 4D).
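      The peak-call comparison amounts to counting ChIP-seq peaks that overlap each targeted locus. The sketch below assumes peaks and loci are given as BED-style half-open intervals. The coordinates in the tests are toy placeholders, not HOXA or HOXB coordinates or ENCODE data. Reporting a per-kilobase density as well as a raw count is an illustrative choice to allow for loci of different lengths. It is not necessarily what Figure 4D shows.

```python
from typing import Iterable, Tuple

# (chromosome, start, end) with half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peak_count_and_density(peaks: Iterable[Interval], locus: Interval) -> Tuple[int, float]:
    """Count peaks overlapping a locus and express the count per kilobase of locus."""
    count = sum(1 for peak in peaks if overlaps(peak, locus))
    length_kb = (locus[2] - locus[1]) / 1000
    return count, count / length_kb
```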

      You can also see this in the enormous number of "enriched" proteins in the supplemental volcano plots. The hypothesis-supporting ones are labeled, but do the authors really believe all of those proteins are specific to the loci being looked at? Maybe compared to mitochondria, but it's hard to believe there are not a lot of false positives in those blue clouds. I believe the authors are seeing mito vs. nucleus + Telo more than the stated comparison. For example, if you have no labeling in the nucleus in the control (Figures 1C and 2C), you cannot separate background labeling from specific labeling. Same with mito vs. nuc+Telo. It is not the proper control to say what is specifically at the Telo.

      We agree with the reviewer that, relative to mitochondrial targeting, some of the nuclear enrichment could be non-specific. We note again, though, that we purposefully avoided the word “specifically” when describing the proteomics work developed here, because we are not atlasing a large number of targets to define specificity. Instead, we highlight in Figure 2 the differences we observed between proteins associating with telomeres and with mitochondrial DNA. Those differences may partly reflect non-specific nuclear labeling, which is why we also included two nuclear targets to determine what might be specifically enriched. We therefore compared centromeric and telomeric protein enrichment as determined by OMAP. We observed consistent differential enrichment of shelterin proteins at telomeres (Figure 2I) and CENP-A complex members at centromeres (Figure 2J). We could have made relative comparisons to no-oligo controls, analogous to how CASPEX compared targeted analyses to no-sgRNA controls (PMID: 29735997). However, we found the mitochondrially targeted samples to be a better comparator, because (1) we have clear means to validate differences and (2) the local environment around DNA is being labeled.

      I would like to see a Telo vs nuclear control and a Centromere vs nuc control. One could then subtract the background from both experiments, then contrast Telo vs Cent for a proper, rigorous comparison. However, I realize that is a lot of work, so rewriting the manuscript to better and more accurately reflect what was accomplished here, and its limitations, would suffice.

      Assuming the nuclear control was the same, it is unclear how this ratio-of-ratios ([Telo/Ctrl]/[Cent/Ctrl]) experiment would be inherently different from the direct comparison between Telo and Cent. This again assumes that the backgrounds are derived from the same cellular samples. Adding the extra ratios would more than likely increase the artifactual variance in the estimates and reduce the power of the comparisons. This has been seen previously in proteomics data using ratio-of-ratio comparisons (Super-SILAC).
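      The variance argument can be made explicit under a simplifying assumption not stated in the original analysis. Assume each measured abundance $\hat{A}$ carries independent error on the log scale, with variance $\sigma^2$ for that measurement. The symbols below are introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}\!\left[\log\frac{\hat{A}_{\mathrm{telo}}}{\hat{A}_{\mathrm{cen}}}\right]
  &= \sigma_{\mathrm{telo}}^{2} + \sigma_{\mathrm{cen}}^{2}
  && \text{(direct comparison)} \\
\log\frac{\hat{A}_{\mathrm{telo}}/\hat{A}_{\mathrm{ctrl}}}{\hat{A}_{\mathrm{cen}}/\hat{A}_{\mathrm{ctrl}}}
  &= \log\frac{\hat{A}_{\mathrm{telo}}}{\hat{A}_{\mathrm{cen}}}
  && \text{(same control measurement cancels)} \\
\operatorname{Var}\!\left[\log\frac{\hat{A}_{\mathrm{telo}}/\hat{A}_{\mathrm{ctrl},1}}{\hat{A}_{\mathrm{cen}}/\hat{A}_{\mathrm{ctrl},2}}\right]
  &= \sigma_{\mathrm{telo}}^{2} + \sigma_{\mathrm{cen}}^{2} + 2\sigma_{\mathrm{ctrl}}^{2}
  && \text{(independent control measurements)}
\end{aligned}
```

      Under this model, a shared control leaves the comparison unchanged. Separately measured controls add $2\sigma_{\mathrm{ctrl}}^{2}$ of variance without changing the expected log ratio.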

      (2) A second major drawback is the lack of validation experiments. References to literature are helpful but do not make up for the lack of validation of a new method claiming new protein-DNA or DNA-DNA interactions. At least a handful of newly described proximal proteins need to be validated by an orthogonal method, like ChIP qPCR, other genomic methods, or gel shifts if they are likely to directly bind DNA. It is ok to have false positives in a challenging assay like this. But it needs to be well and clearly estimated and communicated.

      We appreciate the reviewer's point here. To be clear, we have not made any claims about new proteins at specific loci. Instead, we validated that known telomere- and centromere-associated proteins were consistently enriched by DNA OMAP (Figure 2). We also want to emphasize that, while valuable, an atlas defining the full and specific proteomes of two genomic loci is not the goal of the current paper. Rather, we show how this method can be used to observe quantitative differences in proteins enriched at certain loci (HOXA/B work, Figure 4) and even between haplotypes (Xi/Xa work, Figure 5).

      (3) The mapping of 3D contacts for 20 kb regions is beautiful. Some added discussion on this method's benefits over HiC-variants would be welcomed.

      We appreciate the reviewer's point here and have added the following text to the Discussion: “Additionally, we show that this method can also detect DNA-DNA contacts through biotinylation of loop anchors. Our approach functions similarly to 4C[86]. However, our biotin labeling of contacts does not rely on pairwise ligation events. Thus, DNA O-MAP will sample DNA-DNA contacts differently than ligation-based approaches.”

      (4) The study claims this method circumvents the need for transfectable cells. However, the authors go on to describe how they needed tons of cells, now in solution, to get it to work. The intro should be more in line with what was actually accomplished.

      We took the reviewer's point and have worked to scale down the DNA OMAP experiments while revising this manuscript. As shown in Figure 5, we have scaled the protocol down to plate-based experiments that use ~10x fewer cells than our initial experiments. The initial DNA OMAP work (Figures 1 and 2) and the additional work in Figure 4 used 30-60 million cells in solution, which is already 10x less material than previous work (PMID: 29735997). Thus, the newest DNA OMAP platform uses ~100x fewer cells than previous work.
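      The ~100x figure follows from chaining the two ~10x reductions stated in this response. In the expression below, $N$ denotes the number of input cells. The subscripts are introduced here for illustration: previous work, in-solution DNA OMAP, and plate-based DNA OMAP.

```latex
\frac{N_{\text{prev}}}{N_{\text{plate}}}
  = \underbrace{\frac{N_{\text{prev}}}{N_{\text{soln}}}}_{\approx 10}
    \times
    \underbrace{\frac{N_{\text{soln}}}{N_{\text{plate}}}}_{\approx 10}
  \approx 100,
\qquad
N_{\text{soln}} \approx (30\text{--}60)\times 10^{6}
\;\Rightarrow\;
N_{\text{plate}} \approx (3\text{--}6)\times 10^{6}
```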

      (5) Comments like "Compared to other repetitive elements in the human genome...." appear to circumvent the fact that this method is still (apparently) largely limited to repetitive elements. Other than Glopro, which did analyze non-repetitive promoter elements, most comparable methods looked at telomeres. So, this isn't quite the advancement you are implying. Plus, the overlap with telomeric proteins and other studies should be addressed. However, that will be challenging due to the controls used here, discussed above.

      As noted above, we have added Figures 4 and 5 to address the reviewer's concerns by targeting multiple non-repetitive loci: the HOXA and HOXB clusters, and a 4.5 Mb region straddling the X-inactivation center on both the active and inactive X homologs. Targeting the regions around the X-inactivation center shows the potential to perform haplotype-resolved proteome analysis of chromatin interactors.

      We addressed the telomeric protein overlap specifically in Figure 1F. We agree with the reviewer that the choice of controls dramatically changes which proteins are considered enriched. The goal of the network analysis was to show two things. First, that we identify proteins previously observed in telomere proteomic datasets. Second, that we gain a more complete view by capturing more known interacting proteins than many previous methods, as was noted for the RNA OMAP platform (PMID: 39468212). For example, we observed enrichment of PRPF40A in the telomeric DNA OMAP data. In the BioPlex interactome, PRPF40A interacts with TERF2IP and TERF2, suggesting that PRPF40A may colocalize at telomeres through these interactions. Similarly, we observed enrichment of SF3A1, SF3B1, and SF3B2. The SF3 proteins are known regulators of telomere maintenance (PMID: 27818134), but they had not been observed in telomeric proteomics datasets before DNA OMAP.

      We have added the following text to the Results to clarify these points:

      “To benchmark DNA O-MAP, we compared the full set of telomeric proteins to proteins observed in five established telomeric datasets (PICh, C-BERST, CAPLOCUS, CAPTURE, BioID)12,14,16,35,36 (Figure 1F). DNA O-MAP captured both previously observed telomere-interacting proteins (shelterins) and telomere-associated proteins (ribonucleoproteins). We identified multiple heterogeneous nuclear ribonucleoproteins (hnRNPs) previously annotated as telomere-associated, including HNRNPA1 and HNRNPU. HNRNPA1 has been demonstrated to displace replication protein A (RPA) and directly interact with single-stranded telomeric DNA to regulate telomerase activity37–39. HNRNPU belongs to the telomerase-associated proteome40, where it binds the telomeric G-quadruplex to prevent RPA from recognizing chromosome ends41. We mapped DNA O-MAP enriched telomeric proteins to the BioPlex protein interactome. In addition to capturing proteins from previously observed telomeric datasets (Figure 1F), DNA O-MAP enriched for interactors of previously observed telomeric proteins. Previous data found RBM17 and SNRPA1 at telomeres, and in BioPlex these proteins interact with three SF3 proteins (SF3A1, SF3B1, SF3B2). Though they were not identified in previous telomeric proteome datasets, all three of these SF3 proteins were enriched in the DNA O-MAP telomeric data. Furthermore, through interactions with G-quadruplex binding factors, these SF3 proteins are regulators of telomere maintenance (PMID: 27818134). Taken together, these data support the effectiveness of DNA O-MAP for sensitively and selectively isolating locus-specific proteomes.”
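      The BioPlex mapping step amounts to a one-hop neighborhood lookup in an interaction graph. It flags enriched proteins that are not themselves in earlier telomeric datasets but interact with proteins that are. The sketch below assumes the interactome is a plain list of undirected protein pairs. It is not the authors' pipeline. The only edges used in the tests are the PRPF40A interactions described in this response. "PROTEIN_X" is a hypothetical placeholder, and none of the inputs are BioPlex or O-MAP output.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn undirected (protein_a, protein_b) pairs into a neighbor map."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def interactors_of_known(enriched, known, edges):
    """Map each enriched protein outside `known` to its interaction partners in `known`.

    enriched: set of proteins enriched at the target.
    known: set of proteins reported in earlier datasets for the same target.
    Proteins with no partner in `known` are left out of the result.
    """
    adjacency = build_adjacency(edges)
    result = {}
    for protein in enriched - known:
        partners = adjacency.get(protein, set()) & known
        if partners:
            result[protein] = partners
    return result
```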

      Reviewer #2 (Public review):

      Summary

      Liu and MacGann et al. introduce the method DNA O-MAP that uses oligo-based ISH probes to recruit horseradish peroxidase for targeted proximity biotinylation at specific DNA loci. The method's specificity was tested by profiling the proteomic composition at repetitive DNA loci such as telomeres and pericentromeric alpha satellite repeats. In addition, the authors provide proof-of-principle for the capture and mapping of contact frequencies between individual DNA loop anchors.

      Strengths

      Identifying locus-specific proteomes still represents a major technical challenge and remains an outstanding issue (1). Theoretically, this method could benefit from the specificity of ISH probes and be applied to identify proteomes at non-repetitive DNA loci. This method also requires significantly fewer cells than other ISH- or dCas9-based locus-enrichment methods. Another potential advantage to be tested is the lack of cell line engineering that allows its application to primary cell lines or tissue.

      We thank the reviewer for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOXA and HOXB clusters and a 4.5 Mb section of the X chromosome on each homolog) in the revised manuscript (Figures 4 and 5).

      Weaknesses

      The authors indicate that DNA O-MAP is superior to other methods for identifying locus-specific proteomes. Still, no proof exists that this method could uncover proteomes at non-repetitive DNA loci. Also, there is very little validation of novel factors to confirm the superiority of the technique regarding specificity.

      Our primary claim for DNA OMAP is that it requires orders of magnitude fewer cells than previous studies. Based on comments along these lines from both reviewers, we performed DNA OMAP targeting non-repetitive DNA loci in the revised manuscript (Figures 4 and 5): the HOXA and HOXB clusters, and a 4.5 Mb section of the X chromosome on each homolog. For the X chromosome targeting, we used ~3 million cells per condition with methods that we optimized during revision. When targeting HOXA and HOXB, we identified HDAC3 and TCF12 enrichment at HOXB compared to HOXA, as well as ZC3H13 and SMARCB1 enrichment at HOXA compared to HOXB. These enrichments are consistent with ChIP-seq reads from ENCODE for these proteins (Figure 4C, D). Both the HOX and X chromosome work help address limitations noted in the Gauchier et al. paper cited by the reviewer. Both show progress towards overcoming “the major signal-to-noise ratio problem will need to be addressed before they can fully describe the specific composition of single-copy loci”.

      The authors first tested their method's specificity at repetitive telomeric regions, and like other approaches, expected low-abundant telomere-specific proteins were absent (for example, all subunits of the telomerase holoenzyme complex). Detecting known proteins while identifying noncanonical and unexpected protein factors with high confidence could indicate that DNA O-MAP does not fully capture biologically crucial proteins due to insufficient enrichment of locus-specific factors. The newly identified proteins in Figure 1E might still be relevant, but independent validation is missing entirely. In my opinion, the current data cannot be interpreted as successfully describing local protein composition.

      We analyzed ChIP-seq reads at HOXA and HOXB (Figure 4C,D), which recapitulate our findings for four of our differentially enriched proteins. With the addition of the non-repetitive loci (Figures 4 and 5), we have now performed DNA OMAP on seven different targets (telomeres, pericentromeres, mtDNA, HOXA, HOXB, Xi, and Xa), and we identified expected proteins at each of them. The consistency of these data mirrors that of the RNA implementation of OMAP (PMID: 39468212) and reinforces that we can successfully enrich local proteomes at genomic loci.

      Finally, the authors could have discussed the limitations of DNA O-MAP and made a fair comparison to other existing methods (2-5). Unlike targeted proximity biotinylation methods, DNA O-MAP requires paraformaldehyde crosslinking, which has several disadvantages. For instance, transient protein-protein interactions may not be efficiently retained on crosslinked chromatin. Similarly, some proteins may not be crosslinked by formaldehyde and thus will be lost during preparation (6).

      Based on this critique we have gone back through the manuscript to improve the fairness of our comparisons and expanded the limitations in our discussion section.

      Regarding fixation, Schmiedeberg et al., which the reviewer references, does describe crosslinking as requiring longer-lived interactions (~5 s). Yet, as featured in reviews, many additional studies have found that “it has been possible to perform ChIP on transcription factors whose interactions with chromatin are known from imaging studies to be highly transient” (Review PMID: 26354429). Subbotin and Chait report similar results from proteomics analyses. They state that the linkage of amine-reactive fixatives such as formaldehyde and “glutaraldehyde to reactive amines within the cellular milieu were sufficient to preserve even labile and transient interactions” (PMID: 25172955).

      (1) Gauchier M, van Mierlo G, Vermeulen M, Dejardin J. Purification and enrichment of specific chromatin loci. Nat Methods. 2020;17(4):380-9.

      (2) Dejardin J, Kingston RE. Purification of proteins associated with specific genomic Loci. Cell. 2009;136(1):175-86.

      (3) Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell. 2017;170(5):1028-43 e19.

      (4) Villasenor R, Pfaendler R, Ambrosi C, Butz S, Giuliani S, Bryan E, et al. ChromID identifies the protein interactome at chromatin marks. Nat Biotechnol. 2020;38(6):728-36.

      (5) Santos-Barriopedro I, van Mierlo G, Vermeulen M. Off-the-shelf proximity biotinylation for interaction proteomics. Nat Commun. 2021;12(1):5015.

      (6) Schmiedeberg L, Skene P, Deaton A, Bird A. A temporal threshold for formaldehyde crosslinking and fixation. PLoS One. 2009;4(2):e4636.

      Reviewer #3 (Public review):

      Significance of the Findings:

      The study by Liu et al. presents a novel method, DNA-O-MAP, which combines locus-specific hybridisation with proximity biotinylation to isolate specific genomic regions and their associated proteins. The potential significance of this approach lies in its purported ability to target genomic loci with heightened specificity by enabling extensive washing prior to the biotinylation reaction, theoretically improving the signal-to-noise ratio when compared with other methods such as dCas9-based techniques. Should the method prove successful, it could represent a notable advancement in the field of chromatin biology, particularly in establishing the proteomes of individual chromatin regions - an extremely challenging objective that has not yet been comprehensively addressed by existing methodologies.

      Strength of the Evidence:

      The evidence presented by the authors is somewhat mixed, and the robustness of the findings appears to be preliminary at this stage. While certain data indicate that DNA-O-MAP may function effectively for repetitive DNA regions, a number of the claims made in the manuscript are either unsupported or require further substantiation. There are significant concerns about the resolution of the method, with substantial biotinylation signals extending well beyond the intended target regions (megabases around the target), suggesting a lack of specificity and poor resolution, particularly for smaller loci.

      We thank the reviewer for their comments and note that we have followed up on the idea of targeting non-repetitive DNA loci (HOX clusters and part of the X chromosome) in the revised manuscript (Figures 4 and 5).

      Furthermore, comparisons with previous techniques are unfounded since the authors have not provided direct comparisons with the same mass spectrometry (MS) equipment and protocols. Additionally, although the authors assert an advantage in multiplexing, this claim appears overstated, as previous methods could achieve similar outcomes through TMT multiplexing. Therefore, while the method has potential, the evidence requires more rigorous support, comprehensive benchmarking, and further experimental validation to demonstrate the claimed improvements in specificity and practical applicability.

      We have made these comparisons as carefully as possible. In fact, we found it difficult to find recent implementations of many of these methods. Purchasing the exact mass spectrometers or performing every version of chromatin proteomics would be well beyond the scope of this work. On the other hand, OMAP has already generated data for three manuscripts. Our claim is that, using the instrumentation and methods available to us, we were able to reduce the number of cells required to analyze a given genomic locus. We then applied TMT multiplexing to further improve throughput and perform replicate analyses. Fully validating that a protein is present at one locus and no other would require exhaustive atlasing of protein-genomic interactions, which is well beyond the scope of a single paper. Similarly, performing ChIP for every identified target to estimate an empirical FDR would be well beyond the scope of this work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In summary, all three reviewers raised major concerns about the limitations of the method, many of which could be resolved by more precise and transparent language about these limitations. If you choose to resubmit a revised version, you should address questions like: What scale does "individual locus" refer to? At what scale can the method map protein-DNA interactions at individual targeted loci, rather than large repetitive domains? What is the estimated false discovery rate for a set of enriched proteins? The eLife assessment for this version of the manuscript is based on reviewer concerns. Note that this assessment can be updated after receiving a response to reviewer comments.

      Reviewer #1 (Recommendations for the authors):

      (1) The first couple of paragraphs make it sound like your method would exclusively benefit from sample multiplexing with MS-based proteomics. That is a bit misleading. The other stated methods use TMT. They don't use it to compare very different genomic (or compartmental) regions, but there is no reason cberst, glopro or CasID could not.

      This is a good point, and we have updated the manuscript to reflect it. Previous methods generally did not use TMT, but they could be adapted to do so. Like OMAP, they would then benefit from using more replicates in their analyses.

      (2) Please make the colors in 1F for the dataset overlap easier to read. 2 and 4+ are too similar.

      We appreciate the comment on making the colors easier to discern. Along these lines we’ve changed the color of “2” to make it easier to distinguish from “4+”.

      (3) Label as many dots as legible in your volcano plots.

      We’ve labeled a number of proteins that are relevant to the discussion in this paper as well as some additional proteins. We feel that additional labeling would detract from the points that we are trying to make in individual figure panels about groups of proteins, rather than general remodeling of all proteins.

      (4) Figure 2E needs a divergent color scheme since it crosses 0. And is it scaled, log-transformed, or both? And compared to what then?

      Figure 2E (heatmap) shows z-scaled relative protein abundance measurements based on TMTpro reporter ion signal to noise (“s/n”). We have added information to the legend to address the points the reviewer raises here. Regarding the color scheme, we are unsure what is being requested, as values above 0 are red and values below 0 are blue.
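      For clarity, per-protein z-scaling of TMT reporter-ion values can be sketched as follows. Each protein's values are centered on its mean across channels and divided by its standard deviation. The response does not state whether the values are log-transformed first, so that step is an optional, clearly labeled assumption and is off by default. The protein names and values in the tests are placeholders.

```python
import math
import statistics


def zscore_rows(sn_by_protein, log_transform=False):
    """Z-score each protein's reporter-ion values across TMT channels.

    sn_by_protein: dict mapping protein -> list of per-channel s/n values
    (at least two channels; all values positive if log_transform is True).
    log_transform: optionally log2-transform before scaling. This is an
    assumption for illustration, not a documented step of Figure 2E.
    """
    scaled = {}
    for protein, values in sn_by_protein.items():
        vals = [math.log2(v) for v in values] if log_transform else [float(v) for v in values]
        mu = statistics.mean(vals)
        sd = statistics.stdev(vals)
        # A protein with identical values in every channel carries no contrast.
        scaled[protein] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return scaled
```

      Because every row is centered on zero, values above and below 0 correspond to channels with higher and lower relative abundance for that protein. This is why a color scale centered at 0 suits such a heatmap.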

      (5) Unclear what you are implying with "...only 1-2 biological replicates." I would omit or clarify.

      Fair point, we have updated the manuscript to omit this section to simplify the introduction.

      (6) H2O2 and biotin phenols might be toxic to living organisms. But so are 4% PFA and ISH. I realize you are trying to justify your new approach, but you don't need to do it with exaggerated contrasts. O-MAP is a great approach, and people are probably more likely to adopt it because it is DNA ISH-based. Plus, with the crosslinking, you are likely not displacing proteins via Cas9 landing.

      We appreciate the reviewer’s comments about adoption and lack of protein displacement. We’ve scaled back on the claims and added more about limitations owing to crosslinking and ISH.

      (7) How much genome does the Cent regions take up? You state 500 kb for Telos.

      In the text we delineate how large of a region the PanAlpha probes target “The genome-wide binding profile of the pan-alpha probe closely overlaps with centromeres (Figure S1) and covers approximately 35 Mb of the genome according to in silico predictions.” Additionally, we’ve added Table S4 to summarize target locus sizes for all of the included targets.

      (8) You seem to be underestimating the lysine labeling. Is that after TMT labeling and analysis? If so, you're already ignoring what couldn't be seen. I don't think it's that important but you included it, so please describe clearly why it's an issue and how much of an issue it is. How does that relate to lit values? And it's not just TMTpro, it's any lysine labeler.

      We appreciate the reviewer's point about specifying the reasoning and about the lack of clarity around overall lysine labeling. The 1.38% figure is the percentage of peptides carrying residual modifications from formaldehyde crosslinking. For overall acylation of lysines with TMT labels, we generally expect (and achieve) >97% labeling of lysines with TMT reagents. The Kuster and Carr labs nicely demonstrated this across a range of labeling conditions (PMID: 30967486).

      Decrosslinking is generally a critical step in proteomics workflows on fixed or FFPE tissues. We therefore sought to determine whether we could achieve sufficiently low residual lysine alkylation to enable protein quantitation by TMTpro reagents (or any lysine labeler, as the reviewer notes). For TMTpro-based methods on peptides, this is generally less of a concern, because protease cleavage frees new primary amines at peptide N-termini that can be labeled for quantitation. Nevertheless, because we are describing a proteomics method for fixed samples, we wanted to share these data. Readers can then take potential residual fixation modifications into account when performing this method.

      Reviewer #3 (Recommendations for the authors):

      Liu et al. describe an original locus labelling approach that enables the isolation of specific genomic regions and their associated proteins. I have mixed views on this work, which, in my opinion, remains preliminary at this stage. Establishing the proteome of a single chromatin region is one of the most complex challenges in chromatin biology, as extensively discussed in Gauchier et al. (2020). Any breakthrough towards this goal is of significant interest to the community, making this manuscript potentially compelling. Indeed, some data suggest that the method works for repetitive DNA to some extent. However, much of the data is not very convincing, and in the case of small DNA targets, it argues against the use of DNA-O-MAP.

      In contrast to existing methods, DNA-O-MAP combines locus-specific hybridisation in situ (using affordable oligonucleotides) with proximity biotinylation. A major advantage of this strategy over other locus-specific biotinylation methods is the possibility of extensively washing excess or non-specifically hybridised probes before the biotinylation reaction, theoretically limiting biotinylation to the target region and thus significantly enhancing the signal-to-noise ratio. Other methods involving proximity biotinylation, such as targeted dCas9, do not have this capacity, meaning biotinylation occurs not only at the locus where a small fraction of dCas9 molecules is targeted but also around non-bound dCas9 molecules (representing the vast majority of dCas9 expressed in a given cell). This aspect potentially represents an interesting advance.

      We thank the reviewer for their thoughts and critiques, which we hope we have partly addressed, particularly the concern that the method is limited to repetitive elements. Regarding specificity, new analyses show that labeling is highly specific to each probe locus (Figure S3).

      Below, I outline the significant issues:

      The manuscript implies that DNA-O-MAP has better sensitivity than earlier techniques like CAPTURE, GLOPRO, or PICh. The authors state that PICh uses one trillion cells (which I doubt is accurate), and other methods require 300 million cells, whereas DNA-O-MAP uses only 60 million cells, suggesting the latter is more feasible. However, these earlier experiments were conducted almost 15 and 6 years ago, when mass spectrometry (MS) sensitivity was considerably lower than that of current instruments. The authors cannot know whether the proteome obtained by previous methods using 60 million cells, but analysed with current MS technology, would yield results inferior to those of DNA-O-MAP. Unless the authors directly compare these methods using the same number of cells and identical MS setups, I find their argument unjustified and misleading.

      Based on the instrumentation listed, we do in fact have a good idea of how changes in instrument sensitivity may have affected identifications. For example, the CASPEX data were collected on an Orbitrap Fusion Lumos, whereas our data were collected on an Orbitrap Fusion Eclipse. From our characterization of these two instruments during development of the Eclipse (PMID: 32250601), we know that its improved ion optics increased sensitivity by ~50% relative to the Lumos. This means that even if GLoPro were run on an Eclipse, it would still require roughly 200 million cells per replicate (~300 million cells divided by 1.5).
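      The scaling argument above can be made explicit with a back-of-envelope calculation. This is a minimal sketch that assumes signal scales linearly with cell input and that the Lumos-to-Eclipse gain is a single multiplicative factor of ~1.5; the function name is hypothetical, and the 300 million and 60 million cell figures are the ones quoted in this exchange.

```python
def equivalent_cell_input(original_cells: float, relative_sensitivity: float) -> float:
    """Cells needed on a more sensitive instrument to match the signal that
    `original_cells` gave on an older one.

    Assumes (hypothetically) that signal scales linearly with cell input and
    that the instrument gain is one multiplicative factor.
    """
    if relative_sensitivity <= 0:
        raise ValueError("relative_sensitivity must be positive")
    return original_cells / relative_sensitivity


GLOPRO_CELLS_PER_REPLICATE = 300e6   # ~300 million cells per replicate
ECLIPSE_VS_LUMOS_GAIN = 1.5          # ~50% sensitivity gain
DNA_OMAP_CELLS = 60e6                # ~60 million cells, as quoted by the reviewer

glopro_on_eclipse = equivalent_cell_input(GLOPRO_CELLS_PER_REPLICATE, ECLIPSE_VS_LUMOS_GAIN)
fold_vs_dna_omap = glopro_on_eclipse / DNA_OMAP_CELLS
print(f"~{glopro_on_eclipse / 1e6:.0f} million cells ({fold_vs_dna_omap:.1f}x the DNA-O-MAP input)")
```

      Under these assumptions, the sensitivity gain alone would leave a roughly threefold gap in cell input.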

      It is suggested that DNA-O-MAP is capable of 'multiplexing', whereas previous methods are not. This statement is also misleading. As I understand it, the targeted regions do not originate from a common pool of cells. Instead, TMT multiplexing only occurs after each group of cells has been independently labelled (Telo, Centro, Mito, control). Therefore, previous methods could also perform multiplexing with TMT. Moreover, it is unclear how each proteome was compared: one would expect many more proteins from centromeres than from telomeres (I am unsure about the number of mitochondria in these cells) since these regions are significantly larger than telomeres (possibly 10 to 100 times larger?). Have the authors attempted to normalise their proteomics data to the size (concatenated) of each target? This is particularly relevant when comparing histone enrichment at chromatin regions of differing sizes.

      We agree with the reviewer that this was overstated. Indeed, the GLoPro paper notes that a MYC analysis was performed with a previous generation of TMT that could multiplex 10 samples. We have amended the manuscript to be more specific in those contexts. As stated in the Methods, “Samples were column normalized for total protein concentration” to account for differences in the amount of protein and in the size of the different targets.
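      Column normalization of TMT data can be sketched as follows: each channel (column) is rescaled so that all channels have the same total reporter-ion intensity. This is a minimal sketch under stated assumptions. The common target is taken to be the mean of the column sums, which the response does not specify, and the function name and example intensities are hypothetical.

```python
from typing import Dict, List


def column_normalize(channels: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rescale each TMT channel (column) so that every channel has the same
    total reporter-ion intensity.

    The common target is the mean of the original column sums (an assumption;
    the median or maximum column sum are also used in practice).
    """
    sums = {name: sum(values) for name, values in channels.items()}
    if any(total <= 0 for total in sums.values()):
        raise ValueError("every channel needs a positive total intensity")
    target = sum(sums.values()) / len(sums)
    return {
        name: [value * target / sums[name] for value in values]
        for name, values in channels.items()
    }


# Hypothetical reporter intensities for two proteins in two channels
normalized = column_normalize({"telomere": [10.0, 30.0], "centromere": [100.0, 300.0]})
```

      Ratios between proteins within a channel are unchanged; only differences in the total amount of labeled protein between channels are removed.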

      Figure 1C shows streptavidin dots resembling telomeres. To substantiate this claim, simultaneous immunofluorescence with a telomere-specific protein (e.g., TRF1 or TRF2) is required. It is currently unknown whether all or only a subset of telomeres are targeted by DNA-O-MAP, and it is also unclear if some streptavidin foci are non-telomeric. Quantification is needed to indicate the reproducibility of the labelling (the same comment applies to the centromere probes later in the manuscript; an immunofluorescence assay with CENPB would be informative, alongside quantifications).

      We understand the reviewer’s concern about the specificity and reproducibility of DNA-O-MAP. To address it, we have added analyses of the efficiency and specificity of FISH and biotin labeling with the telomere-, PanAlpha-, and mitochondria-targeting oligos (Figure S3). Biotin deposition was highly specific to the intended targets, with an average specificity of 98% across the three probes.

      Perhaps more importantly, the authors suggest that it may be possible to enrich proteins that are not necessarily present at the target locus but are instead in spatial proximity (e.g., RNA polymerase I subunits enriched upon centromere targeting). Does this not undermine the purpose of retrieving locus-specific proteomes?

      The goal of DNA OMAP is to identify the local neighborhood of proteins around a specific genomic locus, similar to GLoPro. As the new work presented in Figures 4 and 5 shows, these neighborhoods are inherently interesting for comparing the quantitative changes that occur around a genomic locus.

      Possibly related to the previous issue, when DNA-O-MAP is used to assess DNA-DNA interactions, probes covering regions of 20-25 kb are employed. Therefore, one would expect these regions to be significantly biotinylated compared to flanking regions. However, Genome Browser screenshots indicate extensive biotinylation signals spanning several megabases around the 20-25 kb targets. If the method were highly resolutive, the target region would be primarily enriched, with possibly discrete lower enrichment at distant interacting regions. The lack of discrete enrichment suggests poor resolution, likely due to the large scale of proximity biotinylation. This compromises the effectiveness of DNA-O-MAP, especially if it is intended to target small loci with complex sequences. Could the authors quantify the absolute number of reads from the target region compared to those from elsewhere in the genome (both megabases around the locus and other chromosomes, where many co-enriched regions seem to exist)? This would provide insights into both enrichment and specificity.

      Thank you for this suggestion; we have added a new Figure S8 showing normalized read depth as a function of distance from the genomic target. Like all peroxidase-mediated proximity labeling methods, the resolution of DNA OMAP depends not on the sequence length of the targeted DNA region but on the ~30-40 nm of physical space around the HRP molecule that is targeted to the genomic locus.
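      One way to compute a read-depth-versus-distance profile like the one described above is to bin reads by their absolute distance from the target center and normalize by library size. This is a hypothetical sketch. The response does not describe how Figure S8 was computed, so the reads-per-million normalization, pooling of both flanks, and bin size here are assumptions, and only reads on the target's chromosome are considered.

```python
from collections import Counter
from typing import Dict, Iterable


def depth_by_distance(read_positions: Iterable[int], target_center: int,
                      bin_size: int, total_reads: int) -> Dict[int, float]:
    """Reads per million (RPM) in bins of absolute distance from a target.

    read_positions: read start coordinates on the target's chromosome only.
    Returns {bin start distance in bp: RPM}, with both flanks pooled.
    """
    if bin_size <= 0 or total_reads <= 0:
        raise ValueError("bin_size and total_reads must be positive")
    counts = Counter(abs(pos - target_center) // bin_size for pos in read_positions)
    return {b * bin_size: n * 1e6 / total_reads for b, n in sorted(counts.items())}


# Hypothetical reads around a target centred at position 1,000
profile = depth_by_distance([1_000, 1_500, 12_000, 250_000], target_center=1_000,
                            bin_size=10_000, total_reads=4)
```

      Comparing the first bin with distant bins (and with reads from other chromosomes) gives the kind of target-versus-background ratio the reviewer asks about.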

      Minor Issues:

      (1) Page 3, second paragraph: It is unclear why the ability of probes to produce a visible signal in situ necessarily translates into an ability to retrieve a specific proteome.

      We have revised the manuscript to de-emphasize the visible-signal aspect of probe targeting and to re-emphasize our initial point: the number of probes needed to properly target unique regions makes the use of locked nucleic acid probes cost-prohibitive. The underlying point, however, is that we and others have previously shown, with RNA OMAP (PMID: 39468212) and APEX-based proximity labeling strategies (PMID: 26866790), that the ability to deposit and visualize biotin generally translates directly into recovery of proximally labeled proteins.

      (2) Page 3, last paragraph: "to reach a higher degree of enrichment...": Has it been demonstrated that direct protein biotinylation provides higher enrichment of relevant proteins? Certainly, there is higher enrichment of proteins, but whether they are relevant is another matter.

      Our point here was that methods using direct protein biotinylation achieve higher levels of enrichment and thus require fewer cells than the previously mentioned PICh method, which is why we wrote the following: “In the case of GLoPro, APEX-based proximity labeling enhanced protein detection sensitivity, reducing the input required for each replicate analysis to ~300 million cells—a 10-fold reduction in cell input compared to PICh which used 3 billion cells.”

      Regarding whether these proteins are relevant, we show enrichment of known proteins that are critical to the function of the genomic regions they occupy at telomeres and centromeres. Additionally, we have added quantitative comparisons with ChIP data to assess relevance in our analyses of Hox and of our targeted region of the X chromosome. The improved enrichment established in our initial submission and in the revised version also means that we can further reduce the number of cells required.

      (3) Figure 2B is misleading; it appears as though all three regions are targeted in the same cell, suggesting true multiplexing, which, I believe, is not the case.

      To avoid any confusion about how the samples were derived, we have updated this panel to show three separate cells, each with a different region targeted.

      (3) If I understand correctly, the 'no probe' control should primarily retrieve endogenously biotinylated proteins (carboxylases), which are mainly found in mitochondria. Why does the Pearson clustering in Supplementary Figure 2 not place this control proteome closer to the mitochondrial proteome?

      Assuming that the ~10 endogenously biotinylated carboxylases are biotinylated at similar levels in all cells, their share of all enriched proteins is markedly reduced for any target whose labeling enriches many additional proteins. As we note in Figure S4, mitochondrial DNA OMAP enriches proteins beyond the carboxylases, so the carboxylases make up a much smaller proportion of that enriched proteome than of the ‘no probe’ sample. We believe this explains why the ‘no probe’ sample is clearly separated along PC2 in Figure 2D.
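      The dilution argument above can be illustrated with a toy calculation: a constant carboxylase signal becomes a smaller fraction of the enriched proteome as probe-directed labeling adds more signal. The signal values below are hypothetical, arbitrary units, and the function name is illustrative only.

```python
def carboxylase_fraction(carboxylase_signal: float, target_signal: float) -> float:
    """Share of the enriched proteome contributed by endogenously biotinylated
    carboxylases, assuming their signal is the same in every sample."""
    total = carboxylase_signal + target_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return carboxylase_signal / total


# Hypothetical, arbitrary units: same carboxylase signal, different target labeling
no_probe = carboxylase_fraction(100.0, 0.0)        # only endogenous biotin
mito_probe = carboxylase_fraction(100.0, 900.0)    # plus probe-directed labeling
```

      With these made-up numbers, the carboxylases account for the whole ‘no probe’ signal but only a tenth of the mitochondrial signal, so the two profiles can separate even though both contain the same carboxylases.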

      (4) Was CENPA enriched in the centromere DNA-O-MAP? If not, have the authors scaled up (e.g., with ten times more cells) to see if the local proteome becomes deeper and detects relevant low-abundance proteins like CENPA or HJURP? This would be very informative.

      We did not observe CENPA. We had originally considered the experiment the reviewer suggests, but noted that CENPA has only two tryptic peptides of suitable length (>7 and <35 amino acids), both of which lie in a commonly phosphorylated region of the protein. Rather than scale up these experiments, we chose to apply DNA OMAP to non-repetitive loci.
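      Counting candidate tryptic peptides, as in the CENPA argument above, amounts to an in-silico digest followed by a length filter. This is a minimal sketch, assuming fully cleaved peptides only (no missed cleavages) and the standard rule of cutting after K or R unless the next residue is P; the toy sequence is made up and is not CENPA.

```python
import re
from typing import List


def tryptic_peptides(sequence: str) -> List[str]:
    """Fully cleaved in-silico trypsin digest: cut after K or R unless the
    next residue is P (no missed cleavages considered)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def peptides_in_range(sequence: str, min_len: int = 8, max_len: int = 34) -> List[str]:
    """Keep peptides longer than 7 and shorter than 35 residues."""
    return [pep for pep in tryptic_peptides(sequence) if min_len <= len(pep) <= max_len]


# Toy sequence (not CENPA), chosen to show the K-before-P exception
toy = "AKPLLSTEVNGARWQKDLSTAAEGIYNPFLK"
```

      Real search engines also allow missed cleavages and modifications, so counts from this simple digest are a lower bound on what a database search considers.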

      (5) Using a few million cells, I do not see how the starting chromatin amount could range from 0.5 to 7 mg, as shown in Figures 2 and 3. How were these figures calculated? One diploid cell contains approximately 6 pg of DNA/chromatin, which means one billion cells represent about 6 mg of DNA/chromatin (a typical measurement for these methods).

      We thank the reviewer for catching this; these values should have been labeled as total lysate amount, not chromatin mass. We have corrected Figures 2 and 3.
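      The reviewer's estimate above (about 6 pg of DNA per diploid cell) shows why milligram amounts of chromatin are implausible for a few million cells. The sketch below simply multiplies cell number by DNA mass per cell; the constant is the reviewer's approximate figure, and the function name is illustrative.

```python
PG_DNA_PER_DIPLOID_CELL = 6.0  # approximate figure quoted by the reviewer
PG_PER_MG = 1e9


def dna_mass_mg(n_cells: float, pg_per_cell: float = PG_DNA_PER_DIPLOID_CELL) -> float:
    """Approximate total DNA mass (mg) contained in n_cells diploid cells."""
    return n_cells * pg_per_cell / PG_PER_MG


billion_cells = dna_mass_mg(1e9)      # about 6 mg
sixty_million = dna_mass_mg(60e6)     # about 0.36 mg
```

      Even 60 million cells correspond to well under 1 mg of DNA, consistent with the reported 0.5-7 mg values being total lysate rather than chromatin.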

      (6) Figure S1: There is no indication of the metrics used for the shades of red.

      We have added a gradient legend to depict this.

      (7) What is the purpose of HCl in the experiment?

      HCl treatment was done to reduce autofluorescence for imaging (PMID: 39548245).

      (8) I could not find the MS dataset on the server using the provided accession number (PDX054080).

      Thank you for pointing this out. We have confirmed that the dataset is now public and have added the new datasets for the Xi/Xa and Hox studies. We also note that the correct accession is “PXD054080”.

      (9) Why desthiobiotin instead of biotin?

      We have tested both; desthiobiotin helped reduce adsorption to surfaces. However, either biotin or desthiobiotin can be used for OMAP.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Del Rosario et al characterized the extent and cell types of sibling chimerism in marmosets. To do so, they took advantage of the thousands of SNPs that are transcribed in single-nucleus RNA-seq (snRNA-seq) data to identify the sibling genotype of origin for all sequenced cells across 4 tissues (blood, liver, kidney, and brain) from many marmosets. They found that chimerism is prevalent and widespread across tissues in marmosets, which has previously been shown. However, their snRNA-seq approach allowed them to identify precisely which cells were of sibling origin, and which were not. In doing so they definitively show that sibling chimerism across tissues is limited to cells of myeloid and lymphoid lineages. The authors then focus on a large sample of microglia sequenced across many brain regions to quantify: (1) variation in chimerism across brain regions in the same individual, and (2) the relative importance of genetic vs. environmental context on microglia function/identity.

      (1) Much like across different tissues in the same individual, they found that the proportion of chimeric microglia varies across brain regions collected from the same individuals (as well as differing from the proportion of sibling cells found in the blood of the same animals), suggesting that cells from different genetic backgrounds may differ in their recruitment and/or proliferation across regions and local tissue contexts, or that this may be linked to stochastic bottleneck effects during brain development.

      (2) Their (admittedly smaller sample size) analyses of host-sibling gene expression showed that the local environment dominates genotype.

      All told, this thoughtful and thorough manuscript accomplishes two important goals. First, it all but closes a previously open question on the extent and cell origins of sibling chimerism. Second, it sets the stage for using this unique model system to examine, in a natural context, how genetic variation in microglia may impact brain development, function, and disease.

      The conclusions of this paper are well supported by the data, and the authors exert appropriate care when extrapolating their results that come from smaller samples. However, there are a few concerns that should be addressed.

      The "modest correlation" mentioned in lines 170-172 does not take into account the uncertainty in estimates of each chimeric cell proportion (although the plot shows those estimates nicely). This is particularly important for the macrophages, which are far less abundant. Perhaps a more appropriate way to model this would be in a binomial framework (with a random effect for individuals of origin). Here, you could model the sibling identity of each macrophage as a function of the proportion of sibling-origin microglia and then directly estimate the percent variance explained.

      We appreciate this good suggestion. We performed an analysis along these lines and found that it supported the conclusion that there is no strong relationship between microglial and macrophage chimerism. In particular (and as we have now added to the Methods):

      “To perform an analysis of Fig. 2D that takes into account the uncertainty in the estimate of the chimeric cell proportion, we fit a binomial generalized linear mixed-effects model in R using the command glmer(y ~ (1|indiv) + chimerism_micro, family=binomial), where y is a vector (of length 1,333) containing the genomic identity of each macrophage (either host or twin), (1|indiv) models a random effect for the identity of each animal, and chimerism_micro is the microglia chimerism fraction of the animal’s brain. The P-value for the fixed effect of chimerism_micro was 0.795, indicating that microglial chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction. The estimate for the intercept was -0.8115 and the estimate for chimerism_micro was 0.3106, which corresponds to a fitted probability for y = 1 of plogis(-0.8115) ≈ 0.31 at chimerism_micro = 0 and only plogis(-0.8115 + 0.3106) ≈ 0.38 at chimerism_micro = 1.”
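      The fixed-effect estimates of a binomial GLMM are on the log-odds scale; R's plogis (the inverse logit) converts them to probabilities. The sketch below reproduces that conversion in Python for the two reported coefficients, with the animal random effect set to zero (a population-level prediction). The function names are illustrative and do not correspond to any code released with the paper.

```python
import math

# Fixed-effect estimates reported for glmer(y ~ (1|indiv) + chimerism_micro, family=binomial)
INTERCEPT = -0.8115
BETA_CHIMERISM_MICRO = 0.3106


def plogis(x: float) -> float:
    """Logistic CDF, i.e. the inverse of the logit link (same as R's plogis)."""
    return 1.0 / (1.0 + math.exp(-x))


def fitted_probability(chimerism_micro: float) -> float:
    """Population-level probability that y = 1, with the animal random effect set to 0."""
    return plogis(INTERCEPT + BETA_CHIMERISM_MICRO * chimerism_micro)


low = fitted_probability(0.0)
high = fitted_probability(1.0)
```

      Across the full range of chimerism_micro, the fitted probability moves only from about 0.31 to about 0.38, in line with the non-significant fixed effect.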

      We have added the following in the main text:

      “We investigated further by performing a statistical test that takes into account the uncertainty in the estimates of the chimeric cell proportion using a binomial framework (Methods); in this analysis, microglia chimerism fraction was not a statistically significant predictor of macrophage chimerism fraction (Methods). This suggests that in addition to the cell’s genome, other factors such as local host environment play a role in differential recruitment, proliferation or survival of the sibling cells. (We note that macrophages often transit the fluid-filled perivascular space, with a substantially different migration history and arrival dynamics than microglia.)”

      Given this new analysis, and our original observation that the Pearson correlation was only 0.31, we believe that other factors in addition to the cell’s genome play a role in differential recruitment or survival of sibling cells.

      A similar (albeit more complicated because of the number of regions being compared) approach could be applied to more rigorously quantify the variation in chimerism across brain regions (L198-215; Figure 4). This would also help to answer the question of whether specific brain regions are more "amenable" to microglia chimerism than others.

      We performed the analysis along these lines and added the following in the Methods section:

      “We used the same framework to further analyze Fig. 4. We included brain region as a covariate in the binomial framework: glmer(y ~ (1|indiv) + brain_reg + assay, family=binomial), where y is a vector (of length 48,439) containing the genomic identity of each microglial cell, and assay is either “Drop-seq” or “10X”. The brain regions assayed in Fig. 4 are the cortex, hippocampus, hypothalamus, striatum, thalamus, and basal forebrain. All these brain regions were statistically significant predictors for microglia chimerism fraction (all P-values < 2 × 10⁻¹⁶), supporting the conclusion that chimerism varies across brain regions. We also re-analyzed Supplementary Fig. 4 (Fig. 4B in the original manuscript) using the same framework and found that 18 out of 27 brain substructures were statistically significant predictors for microglia chimerism fraction.”

      We have added the following sentences in the main text:

      “We used the binomial generalized linear mixed-model framework and found that all brain regions were statistically significant predictors for microglia chimerism fraction, supporting the conclusion that chimerism varies across brain regions (Methods).

      Analysis of finer brain substructures showed a similar result (Supplementary Fig. 4; the binomial generalized linear mixed-model framework determined that 18 out of 27 brain substructures were statistically significant as predictors for microglia chimerism fraction, Methods).”

      While the sample size is small, it would be exciting to see if any microglia eQTL are driven by sibling chimerism across the marmosets.

      We like this idea, but our study is underpowered for eQTL analysis since we only have 14 data points in the correlation analysis (eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses).

      L290-292: The authors should propose ways in which they could test the two different explanations proposed in this paragraph. For instance, a simulation-based modeling approach could potentially differentiate more stochastic bottleneck effects from recruitment-like effects.

      While intriguing, the gene expression comparison (Figure 5) is extremely underpowered. It would be helpful to clarify this and note the statistical thresholds used for identifying DEGs (the black points in the figure).

      We agree; to help clarify this for readers, we added the following sentence at the end of the paragraph discussing Fig. 5A-C.

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the numbers of microglia in most animals, especially of guest microglia, were modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due, for example, to genetic differences between siblings. We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”

      And in the caption of Fig. 5A-C, we have included the statistical threshold for identifying DEGs:

      “In (A) to (C), each point represents a gene; its location on the plot represents the level of expression of that gene among microglia with two different genomes in the same animal. x- and y-axes: normalized gene expression levels (number of transcripts per 100,000 transcripts). FC: fold-change of gene expression, female/male for XIST. Fold-change and P-values were calculated using the binomTest method from the edgeR package (Robinson et al., 2010). Differentially expressed genes (black dots) were defined as genes with FDR Q-value < 0.05, fold-change > 1.5 (in either direction), and expression in at least 10% of cells in at least one of the two sets of microglia being compared.”
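      The three DEG criteria in the caption above can be applied as a simple filter once per-gene P-values and fold-changes are available. This is a hypothetical sketch. It assumes Benjamini-Hochberg adjustment for the FDR Q-value (the caption does not name the procedure) and takes P-values as given rather than reimplementing edgeR's binomTest; the gene records are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneTest:
    gene: str
    p_value: float      # e.g. from a per-gene binomial test
    fold_change: float  # expression ratio, group A / group B
    frac_a: float       # fraction of group-A cells expressing the gene
    frac_b: float       # fraction of group-B cells expressing the gene


def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest p
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def call_degs(tests: List[GeneTest], fdr: float = 0.05, min_fc: float = 1.5,
              min_frac: float = 0.10) -> List[str]:
    """Genes passing the FDR, fold-change (either direction) and detection filters."""
    qvals = bh_qvalues([t.p_value for t in tests])
    return [
        t.gene
        for t, q in zip(tests, qvals)
        if q < fdr
        and (t.fold_change > min_fc or t.fold_change < 1.0 / min_fc)
        and max(t.frac_a, t.frac_b) >= min_frac
    ]


# Hypothetical test results for five genes
example = [
    GeneTest("G1", 0.001, 2.0, 0.50, 0.20),   # passes all filters
    GeneTest("G2", 0.010, 0.5, 0.30, 0.40),   # passes (lower in group A)
    GeneTest("G3", 0.020, 1.2, 0.50, 0.50),   # fold change too small
    GeneTest("G4", 0.030, 3.0, 0.05, 0.02),   # detected in too few cells
    GeneTest("G5", 0.900, 5.0, 0.90, 0.90),   # not significant after BH
]
```

      Treating "in either direction" as fold-change > 1.5 or < 1/1.5 keeps the filter symmetric on the log scale.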

      Reviewer #2 (Public review):

      Summary:

      This manuscript reports a novel and quite important study of chimerism among common marmosets. As the authors discuss, it has been known for years that marmosets display chimerism across a number of tissues. However, as the authors also recognize, the scope and details of this chimerism have been controversial. Some prior publications have suggested that the chimerism only involves cells derived from hematopoietic stem cells, while other publications have suggested more cell types can also be chimeric, including a wide range of cell types present in multiple organs. The present authors address this question and several other important issues by using snRNA-seq to track the expression of host and sibling-derived mRNAs across multiple tissues and cell types. The results are clear and provide strong evidence that all chimeric cells are derived from hematopoietic cell lineages.

      This work will have an impact on studies using marmosets to investigate various biological questions but will have the biggest impact on neuroscience and studies of cellular function within the brain. The demonstration that microglia and macrophages from different siblings from a single pregnancy, with different genomes expressing different transcriptomes, are commonly present within specific brain structures of a single individual opens a number of new opportunities to study microglia and macrophage function as well as interactions between microglia, macrophages, and other cell types.

      Strengths:

      The paper has a number of important strengths. This analysis employs the first unambiguous approach providing a clear answer to the question of whether sibling-derived chimeric cells arise only from hematopoietic lineages or from a wider array of embryonic sources. That is a long-standing open question and these snRNA-seq data seem to provide a clear answer, at least for the brain, liver, and kidney. In addition, the present authors investigate quantitative variation in chimeric cell proportions across several dimensions, comparing the proportion of chimeric cells across individual marmosets, across organs within an individual, and across brain regions within an individual. All these are significant questions, and the answers have important implications for multiple research areas. Marmosets are increasingly being used for a range of neuroscience studies, and a better understanding of the process that leads to the chimerism of microglia and macrophages in the marmoset brain is a valuable and timely contribution. But this work also has implications for other lines of study. Third, the snRNA-seq data will be made available through the Brain Initiative NeMO portal and the software used to quantify host vs. sibling cell proportions in different biosamples will be available through GitHub.

      Weaknesses:

      I find no major weaknesses, but several minor ones. First, the main text of the manuscript provides no information about the specific animals used in this study, other than sex. Some basic information about the sources of animals and their ages at the time of study would be useful within the main paper, even though more information will be available in the supplementary material.

      We moved the table containing animal information (age at time of study, sex, source, tissues analyzed) from Supplementary Table 1 into the main text as Table 1. We also added the following sentences starting on line 140:

      “Brain snRNA-seq was performed on 11 animals (6 adults, 3 neonates, and 1 six-month-old; Table 1). All were unrelated except for CJ006 and CJ007, which are birth siblings, and CJ025 and CJ026, which are (non-birth) siblings. All animals came from the three main marmoset colonies from which the animals in our facilities are drawn: the New England Primate Research Center (NEPRC), CLEA Japan, and a non-clinical contract research organization in Massachusetts. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single-cell atlas of the marmoset brain. The three neonates had died shortly after birth of unknown causes and were subsequently selected for snRNA-seq analysis.”

      Second, it is not clear why only 14 pairs of animals were used for estimating the correlation of chimerism levels in microglia and macrophages. Is this lower than the total number of pairwise comparisons possible in order to avoid using non-independent samples? Some explanation would be helpful.

      Only birth siblings (twins and triplets) can be meaningfully included in this analysis. The 14 pairs of animals we used to estimate the correlation of chimerism levels in microglia and macrophages included all pairs that we could use for this analysis: eight cases in which an animal’s brain hosted microglia derived from a single sibling, plus three cases in which an animal’s brain hosted microglia derived from two siblings, collectively allowing 8 + (2*3)=14 pairwise analyses.

      Finally, I think more analysis of the consistency and variability of gene expression in microglia across different regions of the brain would be valuable. Are there genetic pathways expressed similarly in host and sibling microglia, regardless of region of the brain? Are there pathways that are consistently expressed differently in host vs sibling microglia regardless of brain region?

      For brain-region differences in microglial gene expression, we are under-powered and would only be scratching the surface of a question (interesting but beyond the focus and scope of this paper) that needs deeper experimental sampling.

      For the questions about sibling-sibling differences (regardless of which sibling is host) and recurring host-sibling differences, we can do a stronger analysis, because these analyses have similar power to each other. We describe this analysis in the revised manuscript as follows:

      “In all eleven individual marmosets, analysis identified genes whose differential expression distinguished microglia with the two sibling genomes (hundreds of genes in total), documenting a substantial effect of sibling genetic differences on microglial gene expression. However, we did not find any gene whose expression level recurrently distinguished “host” microglia (microglia with the same genome as neural cell types) from “guest” microglia (microglia with the sibling genome), aside from the XIST gene (a proxy for sibling sex differences, which were of course common) (Supplementary Fig. 5, Fig. 5A-C). In other words, although there were always gene-expression differences between sibling microglia, none of them consistently distinguished between host and guest microglia, suggesting that they were instead due to sibling genetic differences. We note that both analyses are power-limited, as the numbers of microglia in most animals, especially of guest microglia, were modest (Supplementary Fig. 5); thus, we cannot rule out the possibility that there may be one or more genes whose expression levels reflect developmental histories (host vs. guest origin), just as there are likely far more genes (than the hundreds we identified) that can have sibling expression differences due, for example, to genetic differences between siblings.”

      As suggested, we also tried to go beyond single-gene analyses to the expression of programs/pathways by performing latent factor analysis on the single-cell gene expression measurements:

      “Following the method described in (Ling et al., 2024), we performed latent factor analysis using probabilistic estimation of expression residuals (PEER; Stegle et al., 2010) on the gene-by-donor expression matrix of microglia. We started by creating a gene-by-cell matrix of microglial gene expression from all animals, and we normalized the matrix using SCTransform version 2 (Choudhary and Satija, 2022) with 3000 variable features. We obtained the Pearson residuals from SCT normalization and summed the residuals across cells with the same genome to obtain a gene-by-donor matrix of microglial expression measurements. We used this matrix as input to PEER and ran the tool with the number of factors set to values from 9 to 12. For each gene-expression latent factor, to evaluate whether host/sibling identity had a consistent effect on expression levels, we performed a linear regression on host/sibling identity using glm(peer_factor_k ~ host_or_twin). For all factors, the P-values for the effect of host_or_twin were greater than 0.1 (not significant), indicating that no PEER factor was associated with host-vs-twin identity. Thus, we found no large-scale gene expression program that was consistently expressed differently between hosts and twins.”

      We have added the text above to the Methods section, and we added the following at the end of the section on Gene-expression comparisons of host- to sibling-derived microglia (lines 264-267):

      “We sought to increase power (beyond single-gene analysis) by using latent factor analysis (Ling et al., 2024) to identify and quantify the expression of microglial gene-expression programs; however, even this analysis did not find any gene expression programs that exhibited consistent host-twin differences in expression levels (Methods).”
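      Two steps of the PEER-based analysis described in the Methods text above can be sketched without PEER itself: summing cell-level residuals over cells that share a genome (a pseudo-bulk "donor" profile), and regressing a latent factor on a binary host indicator. This is a minimal sketch, assuming cells are stored as rows (the Methods describe a gene-by-cell matrix, i.e. the transpose) and a Gaussian glm, for which the coefficient of a single binary predictor equals the difference in group means; its sign depends on which level is the reference. Names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def pseudobulk(residuals: Sequence[Sequence[float]],
               genome_of_cell: Sequence[str]) -> Dict[str, List[float]]:
    """Sum per-cell residuals (rows = cells, columns = genes) over cells that
    share a genome, giving one gene vector per genome ("donor")."""
    if not residuals:
        return {}
    n_genes = len(residuals[0])
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_genes)
    for row, genome in zip(residuals, genome_of_cell):
        acc = totals[genome]
        for g, value in enumerate(row):
            acc[g] += value
    return dict(totals)


def host_effect(factor_values: Sequence[float], is_host: Sequence[bool]) -> float:
    """OLS coefficient of factor ~ host indicator (Gaussian GLM): with a single
    binary predictor this equals mean(host) - mean(sibling)."""
    host = [v for v, h in zip(factor_values, is_host) if h]
    sibling = [v for v, h in zip(factor_values, is_host) if not h]
    return sum(host) / len(host) - sum(sibling) / len(sibling)


# Hypothetical: three cells, two genes, two genomes within one animal
donor_matrix = pseudobulk([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                          ["animal1_host", "animal1_host", "animal1_sibling"])
effect = host_effect([1.0, 2.0, 4.0, 6.0], [True, True, False, False])
```

      The P-value for such a coefficient would come from a t-test on the regression, which R's glm reports and which is omitted here.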

      Gene-expression pathways/factors did show host-twin differences in expression levels within some animals, but without a consistent direction of effect shared across the many host-twin comparisons. In particular, using the PEER analysis described above, we calculated the host-sibling difference in expression level for each latent factor. Many factors differed in expression in individual cases, though none did so in all cases or with a consistent sign:

      Author response image 1.

      Difference between host and sibling expression of gene-expression latent factors for each of the 12 factors computed (using PEER) from the single-cell dataset. For a given factor, the factor expression value of the sibling-genome cells is subtracted from that of the host-genome cells and the difference is divided by the maximum of the absolute value of all elements in that factor.
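      The normalization in the legend can be written out as follows. This is a minimal sketch under two assumptions the legend does not state explicitly: that "all elements in that factor" means the factor's values across every donor genome, and that each animal contributes one host and one sibling value. Labels and values are hypothetical.

```python
def normalized_host_sibling_diffs(factor_values, pairs):
    """Host-minus-sibling difference of one latent factor, per animal.

    factor_values: dict mapping donor-genome label -> factor value.
    pairs: list of (host_label, sibling_label) tuples, one per animal.
    Each difference is divided by the largest absolute value of the factor,
    so every result lies in [-2, 2].
    """
    scale = max(abs(v) for v in factor_values.values())
    return [(factor_values[h] - factor_values[s]) / scale for h, s in pairs]


values = {"A_host": 0.5, "A_sib": -0.25, "B_host": 0.25, "B_sib": 1.0}
print(normalized_host_sibling_diffs(values, [("A_host", "A_sib"), ("B_host", "B_sib")]))
# [0.75, -0.75]
```

      Dividing by a per-factor maximum lets factors on very different scales be shown on a common axis.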

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the introduction (line 62), the authors mention that chimerism might have shaped behavior in marmosets (and perhaps been selected for). It would be helpful to see this revisited in the discussion. Is it possible that additional genetic variation in immune cells (resident and circulating) provides adaptive benefits and/or disease resistance? In the case of microglia, could the proportion of sibling cells be related (either positively or negatively) to local/regional pathology?

      We liked this suggestion and have added the following in the Discussion:

      “Chimerism could also enable interesting future analyses of whether there are adaptive benefits of chimerism in marmoset immune cells, in which chimerism could in principle allow presentation of a wider variety of antigens for adaptive immunity. In a recent outbreak of yellow fever in Brazil in 2016-2018, marmosets were found to be less susceptible than other primates that lack immune system chimerism, including howler monkeys (Alouatta), robust capuchins (Sapajus), and titi monkeys (Callicebus) (de Azevedo Fernandes et al., 2021). In studying future outbreaks in marmosets, one could use single-cell RNA-seq and the methods described here to study whether genetically distinct immune cells (in the same animal) have differentially migrated to affected tissues and/or assumed "activated" immune cell states. Recent innovations in spatial transcriptomics with sequencing readouts (which can detect SNP alleles) may also make it possible to identify any differential recruitment of genetically distinct immune cells to focal infection sites.”

      Minor comments:

      L300: delete "temporal".

      We have revised the text accordingly.

      L305: "more-restricted" should not be hyphenated.

      We have revised the text accordingly.

      L309: in "from the non-cell", delete "the".

      We have revised the text accordingly.

      L367: Louvain, not Louvaine.

      We have revised the text accordingly.

      Figure 2B can be removed - it does not add much information and takes up a lot of space.

      We have moved Figure 2B to panel J of Supplementary Fig. 1 (it is now displayed together with all other animals).

      The same can be said for Figure 4B, which is too tiny. There might be more effective ways to show this variation across animals.

      We have moved Figure 4B to Supplementary Fig. 4 and we have increased the font sizes to make the text in the figures more readable.

      Reviewer #2 (Recommendations for the authors):

      I would suggest providing some basic information about the sources of study animals within the main text. At a minimum, it would be useful to state which colonies are represented in the data, and if there is anything significant about the individual animal histories (e.g. prior exposure to surgical intervention or infectious disease). I believe this basic information should be in the main text, despite the inclusion of a broader range of information in the supplements.

      We appreciate this suggestion and revised lines 143 to 149 of the main text as follows:

      “All animals come from the three main marmoset colonies represented in our facilities: the New England Primate Research Center (NEPRC), CLEA Japan, and a non-clinical contract research organization. All adult marmosets had no known previous disease and were selected as part of a larger project to create a single-cell atlas of the marmoset brain (Krienen et al., 2020; Krienen et al., 2023). The three neonates died shortly after birth of unknown causes and were subsequently selected for snRNA-seq analysis.”

      I would include the species name (Callithrix jacchus) in line 48.

      On lines 47-48, we now name the genera: “Chimerism is common, however, in the Callitrichidae family that consists of the marmosets (Callithrix) and their close relatives the tamarins (Saguinus)...”

      Then on line 65, we now indicate the species name: “Here, we analyze chimerism in the common marmoset (Callithrix jacchus) brain, liver, kidney and blood,...”

      The word "organisms" in line 59 should be "organs".

      We have modified the text accordingly.

      Lines 100-101: I would suggest this would be clearer to readers if it read: "The relative likelihoods of the original source of each cell could be strongly...".

      We have modified the text accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important methodological issue - the fragility of meta-analytic findings - by extending fragility concepts beyond trial-level analysis. The proposed EOIMETA framework provides a generalizable and analytically tractable approach that complements existing methods such as the traditional Fragility Index and Atal et al.'s algorithm. The findings are significant in showing that even large meta-analyses can be highly fragile, with results overturned by very small numbers of event recodings or additions. The evidence is clearly presented, supported by applications to vitamin D supplementation trials, and contributes meaningfully to ongoing debates about the robustness of meta-analytic evidence. Overall, the strength of evidence is moderate to strong, though some clarifications would further enhance interpretability.

      Strengths:

      (1) The manuscript tackles a highly relevant methodological question on the robustness of meta-analytic evidence.

      (2) EOIMETA represents an innovative extension of fragility concepts from single trials to meta-analyses.

      (3) The applications are clearly presented and highlight the potential importance of fragility considerations for evidence synthesis.

      Weaknesses:

      (1) The rationale and mathematical details behind the proposed EOI and ROAR methods are insufficiently explained. Readers are asked to rely on external sources (Grimes, 2022; 2024b) without adequate exposition here. At a minimum, the definitions, intuition, and key formulas should be summarized in the manuscript to ensure comprehensibility.

      (2) EOIMETA is described as being applicable when heterogeneity is low, but guidance is missing on how to interpret results when heterogeneity is high (e.g., large I²). Clarification in the Results/Discussion is needed, and ideally, a simulation or illustrative example could be added.

      (3) The manuscript would benefit from side-by-side comparisons between the traditional FI at the trial level and EOIMETA at the meta-analytic level. This would contextualize the proposed approach and underscore the added value of EOIMETA.

      (4) Scope of FI: The statement that FI applies only to binary outcomes is inaccurate. While originally developed for dichotomous endpoints, extensions exist (e.g., Continuous Fragility Index, CFI). The manuscript should clarify that EOIMETA focuses on binary outcomes, but FI, as a concept, has been generalized.

      Reviewer #2 (Public review):

      Summary:

      The study expands existing analytical tools originally developed for randomized controlled trials with dichotomous outcomes to assess the potential impact of missing data, adapting them for meta-analytical contexts. These tools evaluate how missing data may influence meta-analyses where p-value distributions cluster around significance thresholds, often leading to conflicting meta-analyses addressing the same research question. The approach quantifies the number of recodings (adding events to the experimental group and/or removing events from the control group) required for a meta-analysis to lose or gain statistical significance. The author developed an R package to perform fragility and redaction analyses and to compare these methods with a previously established approach by Atal et al. (2019), also integrated into the package. Overall, the study provides valuable insights by applying existing analytical tools from randomized controlled trials to meta-analytical contexts.

      Strengths:

      The author's results support his claims. Analyzing the fragility of a given meta-analysis could be a valuable approach for identifying early signs of fragility within a specific topic or body of evidence. If fragility is detected alongside results that hover around the significance threshold, adjusting the significance cutoff as a function of sample size should be considered before making any binary decision regarding statistical significance for that body of evidence. Although the primary goal of meta-analysis is effect estimation, conclusions often still rely on threshold-based interpretations, which is understandable. In some of the examples presented by Atal et al. (2019), the event recoding required to shift a meta-analysis from significant to non-significant (or vice versa) produced only minimal changes in the effect size estimation. Therefore, in bodies of evidence where meta-analyses are fragile or where results cluster near the null, it may be appropriate to adjust the cutoff. Conducting such analyses (identifying fragility early and adapting thresholds accordingly) could help flag fragile bodies of evidence and prevent future conflicting meta-analyses on the same question, thereby reducing research waste and improving reproducibility.

      Weaknesses:

      It would be valuable to include additional bodies of conflicting literature in which meta-analyses have demonstrated fragility. This would allow for a more thorough assessment of the consistency of these analytical tools, their differences, and whether this particular body of literature favored one methodology over another. The method proposed by Atal et al. was applied to numerous meta-analyses and demonstrated consistent performance. I believe there is room for improvement, as both the EOI and ROAR appear to be very promising tools for identifying fragility in meta-analytical contexts.

      I believe the manuscript should be improved in terms of reporting, with clearer statements of the study's and methods' limitations, and by incorporating additional bodies of evidence to strengthen its claims.

      Reviewer #3 (Public review):

      Summary and strengths:

      In this manuscript, Grimes presents an extension of the Ellipse of Insignificance (EOI) and Region of Attainable Redaction (ROAR) metrics to the meta-analysis setting as metrics for evaluating the fragility and robustness of meta-analyses. The author applies these metrics to three meta-analyses of vitamin D and cancer mortality, finding substantial fragility in their conclusions. Overall, I think this extension/adaptation is a conceptually valuable addition to meta-analysis evaluation, and the manuscript is generally well-written.

      Specific comments:

      (1) The manuscript would benefit from a clearer explanation of in what sense EOIMETA is generalizable. The author mentions this several times, but without a clear explanation of what they mean here.

      (2) The author mentions that the proposed tools assume low between-study heterogeneity. Could the author illustrate mathematically in the paper how between-study heterogeneity would influence the proposed measures? Moreover, between-study heterogeneity is high in Zhang et al.'s 2022 study. This would be a good place to comment on the influence of such high heterogeneity on the results, and specifying a practical heterogeneity cutoff would better guide future users.

      (3) I think clarifying the concepts of "small effect", "fragile result", and "unreliable result" would help prevent misinterpretation by future users. I am concerned that readers may confuse these concepts. A small effect may be related to a fragile meta-analysis result. A fragile meta-analysis result does not necessarily mean the result is wrong or untrustworthy. A fragile but precise estimate can still reflect a true effect, but whether an effect of that size is clinically meaningful is another question. Clarifying effect magnitude, fragility, and reliability in the Discussion would be helpful.

      I am very appreciative of the insightful comments you all shared, and in light of them have made several clarifications and revisions. Thank you again; I am grateful to have received such considered feedback and I hope I have addressed any outstanding issues. I have replied to each reviewer’s recommendations sequentially in this document for ease of scanning, and am most grateful for the summaries of strengths and weaknesses, which I have also incorporated into these replies. Thank you again!

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript makes the important argument that many meta-analyses are inherently fragile, which aligns with prior work (e.g., PMID: 40999337). Please add the reference to the statements.

      Excellent point, thank you – I’ve expanded the discussion of fragility analysis, and its application to meta-analysis, including this reference.

      (2) The rationale and mathematical underpinnings of the proposed EOI and ROAR methods are not sufficiently explained. While the authors cite Grimes (2022, 2024b), readers are expected to rely heavily on these external sources without adequate exposition in the current paper. This limits the ability to fully evaluate the reasonableness of the methods or to reproduce the approach. I strongly recommend expanding the description of EOI and ROAR within the manuscript.

      I agree fully – I was a little remiss in this respect, as I was worried about overwhelming the reader. However, I was too sparse with detail and have now extended the text to describe the methods as intuitively as possible (see Discussion, subsection “Ellipse of Insignificance and Region of Attainable Redaction”).
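      To convey the intuition without the geometry, the sketch below asks by brute force the question that EOI formalizes: for a single 2x2 table, how many subjects in one arm must be reclassified between "event" and "non-event" before the result crosses the p = 0.05 threshold? It is not the analytic EOI of Grimes (2022), which works with continuous distances to an ellipse and reports non-integer values; the uncorrected Pearson chi-squared criterion and all function names here are assumptions for illustration.

```python
CHI2_CRIT_1DF = 3.841458820694124  # 95th percentile of chi-squared with 1 df


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]].

    Rows are arms (experimental, control); columns are (events, non-events).
    Returns 0.0 when a margin is empty and the statistic is undefined.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def significant(a, b, c, d):
    return chi2_2x2(a, b, c, d) > CHI2_CRIT_1DF


def min_arm_recoding(a, b, c, d, arm):
    """Fewest reclassifications within one arm that flip significance.

    arm="experimental" moves subjects between a and b; arm="control"
    moves subjects between c and d. Arm sizes stay fixed. Returns None
    if no reclassification within that arm changes the conclusion.
    """
    start = significant(a, b, c, d)
    events, non_events = (a, b) if arm == "experimental" else (c, d)
    for k in range(1, events + non_events + 1):
        for shift in (k, -k):  # +k: non-events become events; -k: the reverse
            e, ne = events + shift, non_events - shift
            if e < 0 or ne < 0:
                continue
            table = (e, ne, c, d) if arm == "experimental" else (a, b, e, ne)
            if significant(*table) != start:
                return k
    return None
```

      For a toy table with 10/100 events in the experimental arm and 25/100 in the control arm, `min_arm_recoding(10, 90, 25, 75, "experimental")` returns 5 and the control-arm equivalent returns 6; these integer counts are only loosely analogous to the xi and yi quantities reported by eoifunc.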

      (3) In the Methods, the authors note that EOIMETA is applicable when between-study heterogeneity is low. However, the manuscript provides little guidance on how to interpret results when heterogeneity is high (e.g., larger I² values). I recommend clarifying this issue in the Results or Discussion sections, emphasizing the limitations of EOIMETA under high heterogeneity. Ideally, the authors could include either a small simulation study or an illustrative example to demonstrate the performance of the method in such settings.

      This is an excellent question, and I was remiss in not considering it more carefully in the manuscript. Originally, the simple idea was just to pool the results for EOI, in which case heterogeneity would be an issue. However, I subsequently added inverse-variance weighting to account for situations with increased heterogeneity, so my initial comment was not strictly correct. I have changed the text in several places, notably in the Methods and the Discussion (see reply to point 5).
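      For readers less familiar with inverse-variance weighting, the sketch below shows the standard textbook form for log odds ratios: fixed-effect pooling, Cochran's Q, I², and the DerSimonian-Laird estimate of the between-study variance τ², which evens out the weights when trials disagree. It is a generic illustration, not the EOIMETA code; the 0.5 continuity correction for zero cells and the function names are assumptions.

```python
from math import exp, log


def log_or(a, b, c, d, cc=0.5):
    """Log odds ratio and its variance for one trial.

    a, b: experimental events and non-events; c, d: control events and non-events.
    If any cell is zero, cc is added to every cell (a common convention).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool(trials):
    """Inverse-variance pooling of log odds ratios across trials.

    trials: list of (a, b, c, d) tuples. Returns the fixed-effect and
    DerSimonian-Laird random-effects pooled odds ratios, Cochran's Q,
    I^2 (as a fraction) and tau^2.
    """
    ys, vs = zip(*(log_or(*t) for t in trials))
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(trials) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    y_random = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return {"or_fixed": exp(y_fixed), "or_random": exp(y_random),
            "q": q, "i2": i2, "tau2": tau2}
```

      When τ² is zero the two pooled estimates coincide; as τ² grows, the random-effects weights become more equal across trials, so smaller trials gain relative influence.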

      (4) While EOIMETA is introduced as a generalizable fragility metric for meta-analyses, the illustrative examples would benefit from clearer comparisons with the traditional Fragility Index (FI). Because FI is well established in the RCT literature and familiar to many readers, presenting side-by-side results (e.g., FI at the trial level versus EOIMETA at the meta-analytic level) would provide important context. Such comparisons would also highlight the added value of EOIMETA, underscoring that even when individual trials appear robust under FI, the pooled meta-analysis may remain fragile.

      This is an excellent idea! The new table is given below. Note that the traditional FI is not defined for non-significant results, and the EOI is ambiguous for counts <2.
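      As a reminder of how the trial-level Fragility Index is usually defined, the sketch below converts non-events to events, one at a time, in the arm that starts with fewer events until the two-sided Fisher exact P-value reaches 0.05. It also shows why the FI is undefined for a non-significant starting result. This is a generic pure-Python illustration with hypothetical function names, not the code used to build the table.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):  # probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    p = 0.0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        px = prob(x)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return p


def fragility_index(a, b, c, d, alpha=0.05):
    """Trial-level Fragility Index for a significant 2x2 result.

    Rows are arms: (a events, b non-events) and (c events, d non-events).
    Returns None when the starting result is not significant (the FI is
    undefined there) or if the chosen arm runs out of non-events.
    """
    if fisher_two_sided(a, b, c, d) >= alpha:
        return None
    use_first_arm = a <= c  # the arm with fewer events at the start
    flips = 0
    while fisher_two_sided(a, b, c, d) < alpha:
        if use_first_arm:
            if b == 0:
                return None
            a, b = a + 1, b - 1
        else:
            if d == 0:
                return None
            c, d = c + 1, d - 1
        flips += 1
    return flips
```

      Because the FI works on one trial's 2x2 table, a set of individually robust trials can still pool into a fragile meta-analysis, which is the contrast the side-by-side table is meant to show.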

      (5) The Discussion currently states that the Fragility Index (FI) applies only to binary outcomes. This is not entirely accurate. While the original FI was indeed developed for dichotomous endpoints, subsequent methodological work has extended the concept to other data types, including continuous outcomes (continuous fragility index, CFI). The manuscript should acknowledge this distinction: EOIMETA presently focuses on binary outcomes at the meta-analytic level, but FI more broadly is not restricted to binary data. Adding this clarification, with appropriate citations, would improve accuracy and place EOIMETA more clearly within the broader fragility literature.

      Thank you for this catch – clarified now in the discussion:

      Reviewer #2 (Recommendations for the authors):

      (1) Typos/inconsistencies/writing clarifications: All table and figure legends and titles are missing a period at the end of each sentence. In the sentence "to be estimated by bootstrap methods. Initially, we ran...", there should be a space between "methods" and "Initially" (line 113).

      Apologies, these are now remedied.

      (2) In Table 2, the total number of patients in the meta-analysis of all 12 studies is reported as 133,262, whereas the text states 133,475 patients. Based on my calculations from Figure 2, the total appears to be 133,262. Could you please clarify this discrepancy?

      Certainly – your calculations are correct. The figure in the text was a typo from a very early draft in which the summation function was run incorrectly and double-counted some cases. This was fixed for the figure but not the text. The text should now match; thank you for spotting this. There are some issues with Figure 2, which I will address in the next few points.

      (3) Regarding this point, the meta-analysis by Zhang et al. (2019) shows some inconsistencies in the reported number of patients in the paper. According to the data provided on GitHub the total number of patients is 37671. However, Table 1 of the paper lists 38538 patients, and the main text states "5 RCTs involving 39168 patients." Similarly, for Guo et al. (2023), the main text reports that the meta-analysis included 11 RCTs with 112165 patients, whereas the table lists 111952, which appears consistent with the data available on GitHub. There is also a discrepancy in Zhang et al. (2022), which cites 61853 patients in the introduction but 61223 patients in Table 1. These inconsistencies should be clarified, as even small discrepancies in reported sample sizes can undermine the credibility of the analyses presented.

      Well-spotted – the incorrect figures are artefacts of an early draft with a double-counting summation function, and I should have spotted them and removed them prior to submission. To clarify, the correct figures from each study (which agree with github data) are given in the corrected table 1.

      Thus, there are 38,538 subjects in the Zhang et al. (2019) analysis, which matches the first sheet of the GitHub listing. The confusion comes from sheet 2, which was included only for this analysis and breaks the subjects down into events and non-events (hence the non-event total of 37,671) but keeps the old labels. This is needlessly confusing, and accordingly I have re-uploaded the data with correct headers for sheet 2. This summation problem also affected the total in Figure 2, which has now been replaced with a corrected version. Thank you for spotting this!
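      The gap between the two figures is itself a useful check: 38,538 - 37,671 = 867, which on this reading would be the number of events, so the two sheets agree once their labels are correct. A minimal sketch of such a consistency check, assuming each study stores (events, non-events) per arm; the counts below are toy values, not the Zhang et al. (2019) data.

```python
def totals(studies):
    """Sum events, non-events and subjects over all studies and both arms.

    studies: list of dicts with keys "exp" and "ctl", each mapping to an
    (events, non_events) tuple for that arm.
    Returns (events, non_events, subjects).
    """
    events = sum(s[arm][0] for s in studies for arm in ("exp", "ctl"))
    non_events = sum(s[arm][1] for s in studies for arm in ("exp", "ctl"))
    return events, non_events, events + non_events


studies = [
    {"exp": (12, 488), "ctl": (20, 480)},
    {"exp": (5, 995), "ctl": (9, 991)},
]
events, non_events, subjects = totals(studies)
print(events, non_events, subjects)  # 46 2954 3000
# Reporting non_events as the sample size would undercount by exactly `events`.
assert subjects - non_events == events
```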

      (4) In line 158, who does "He" refer to? Please clarify this in more detail.

      Apologies, this was a typo and should have read “the” – now corrected.

      (5) The discrepant results of the RCT by Scragg et al. (2018) between the meta-analysis by Zhang et al. and that by Guo et al. could be presented in a table. This could be included as supplementary material or, preferably, in the main text (Results section).

      To avoid confusion, I will add a version of this to the github files for interested users to explore.

      (6) In the legend of Figure 2, a period is missing at the end of the sentence. Additionally, although it is generally understood, it would be helpful to specify that the numbers in parentheses represent the confidence intervals. Please confirm whether these are 95%, 89%, or 99% confidence intervals.

      Apologies, these are 95% CIs. Clarified now in updated legends.

      (7) Regarding the statement "The more recent and robust methods for fragility analysis (EOI) and redaction (ROAR) have potential applications beyond fragile-by-design RCTs, extending to cohort studies, preclinical work, and even ecological studies, as stated by the author" in line 163: could you please provide references supporting these claims? I believe the relevant references may be included in the EOI paper, but it would be helpful to cite them here as well.

      These methods have recently been used in new analyses, now cited in the Introduction together with a fuller description of the methods for context. Please see the response to Reviewer 1, point 2.

      (8) Since the study was previously published as a preprint (https://www.medrxiv.org/content/10.1101/2025.08.15.25333793v1.full-text), this should be mentioned in the manuscript.

      Added as a note now.

      (9) It would also be valuable to include a figure illustrating ROAR for the same meta-analyses presented in Figure 1 for EOI, possibly as supplementary material.

      See reply to point 10.

      (10) Finally, it would be interesting to provide plots of both EOI and ROAR for the meta-analyses of all 12 included studies. These graphs could be replicated using the code examples provided by the author in the original EOI and ROAR publications.

      These have now been added to the github repository as supplementary material.

      (11a) Replications of EOI fragility: eoicfunc.R (github): - In the code provided on GitHub, an error occurred in the "EllipseFromEquation" function within eoifunc. This was due to the PlaneGeometry package not being available for the latest version of R. I attempted several installation methods (using devtools, remotes, and GitHub, as well as direct installation from a URL). However, after adjusting the code, I was able to run the analyses. For the full cohort, including all 12 studies using the EOI approach, I obtained a Minimal Experimental Arm only recoding (xi) = 14 and a Minimal Control Arm only recoding (yi) = 15, whereas the authors reported that 5 recodings were sufficient. It appears that differences in code versions or functions might have slightly affected the results. After downgrading R and running the eoic function with PlaneGeometry successfully installed, the fragility index for the EOI approach was 15 rather than 5.

      Apologies for the issue with PlaneGeometry; I will try to fix this in future iterations. The difference you see is an artefact of running EOIFUNC on pooled data rather than the dedicated EOIMETA function, the chief difference being that EOIFUNC does not apply the weighted inverse-variance (WIV) correction. If we simply pool events, this is the output:

      Author response image 1.

      If the reviewer uses the EOIMETA function, which employs inverse-variance weighting, each trial is defined by a vector of events and non-events in each arm. For all 12 studies, this would be (in R syntax, or imported from the GitHub file):

      Author response image 2.

      Then they will obtain:

      Author response image 3.

      If the reviewer runs a simple pooled analysis with the inverse-variance weighting turned off, they should obtain an answer similar to that of a simple eoifunc call, apart from the difference in zero-count correction. EOIMETA, however, weights by study, and its output is what is reported in the main paper.
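      A toy example shows why a call on pooled counts and a study-weighted call can disagree sharply. Below, each of two invented trials has an odds ratio of exactly 1, but the arms are allocated very unequally and baseline risk differs between trials, so collapsing everything into one 2x2 table produces a spurious effect, while inverse-variance weighting of per-trial log odds ratios does not. This is a generic illustration of pooled-count versus study-level analysis, not the EOIMETA implementation; names and counts are hypothetical.

```python
from math import exp, log


def or_from_summed_counts(trials):
    """Odds ratio after collapsing all trials into a single 2x2 table."""
    a, b, c, d = (sum(t[i] for t in trials) for i in range(4))
    return (a * d) / (b * c)


def or_inverse_variance(trials):
    """Fixed-effect inverse-variance pooled odds ratio (assumes no zero cells)."""
    num = den = 0.0
    for a, b, c, d in trials:
        y = log((a * d) / (b * c))          # per-trial log odds ratio
        w = 1 / (1 / a + 1 / b + 1 / c + 1 / d)  # inverse of its variance
        num += w * y
        den += w
    return exp(num / den)


# Each tuple: (exp events, exp non-events, ctl events, ctl non-events).
trials = [(1, 9, 10, 90), (50, 50, 5, 5)]
print(round(or_from_summed_counts(trials), 2))  # 5.47
print(or_inverse_variance(trials))              # 1.0
```

      The same mechanism, usually in much milder form, is one way fragility counts computed on collapsed counts can differ from study-weighted ones.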

      (12) I recalculated the eoic function for Zhang et al. (2019) and found a fragility index (dmin) of 1. FECKUP Vector Length: 0.5722. Minimal Experimental Arm Recoding (xi): 0.7738. Minimal Control Arm Recoding (yi): 0.8499.

      This again appears to be an artefact of using eoifunc rather than eoimeta; with eoimeta, which uses WIV to adjust the studies for heterogeneity effects, this is the reported output:

      Author response image 4.

      (13) Using the previous code (before downgrading R and loading PlaneGeometry), I recalculated the EOI for Zhang et al. (2022) and found Minimal Experimental Arm only recoding (xi) = 55 and Minimal Control Arm only recoding (yi) = 59, results slightly closer to those reported by the authors. After properly loading PlaneGeometry, I recalculated and obtained for Zhang et al. (2022): Fragility index (dmin) = 57; FECKUP Vector Length = 39.948; Minimal Experimental Arm Recoding (xi) = 54.5436; Minimal Control Arm Recoding (yi) = 58.635.

      Again, this appears to reflect the difference between calling eoifunc and eoimeta; I can replicate this result using EOIFUNC:

      Author response image 5.

      But adjusting for study weighting with eoimeta:

      Author response image 6.

      (14) For Guo et al. (2022), the EOI fragility index was 17 [dmin = 17]. FECKUP Vector Length: 11.3721. Minimal Experimental Arm Recoding (xi): -15.6825. Minimal Control Arm Recoding (yi): -16.5167. However, the authors report an EOI fragility of 38. Since I was able to load PlaneGeometry properly and run eoicfunc.R (from GitHub) without errors, the discrepancies likely reflect minor coding or version inconsistencies rather than software limitations.

      These again stem from using eoifunc on simple pooled data versus eoimeta, which adjusts by study.

      (15) Replications of ROAR fragility: roarfunc.R (github): - For Guo et al. (2022), the ROAR fragility calculated using roarfunc.R was 16 [rmin (Redaction Fragility Index) = 16]. FOCK Vector Length: 15.942. Minimal Experimental Arm Redaction (xc): 15.9442. Minimal Control Arm Redaction (yc): 978.8906. In the main text, the author reports a redaction fragility of 37. What might explain these discrepancies?

      Again, this stems from EOIMETA versus EOIFUNC (and roarfunc calls without weighted adjustment). As the reviewer has observed, the fragility increases when there is no study-level adjustment, which we have now added to the discussion text.

      (16) In generic_run.R, line 6 contains a bug - it is missing a forward slash (/) between the directory path and the filename. The correct line of code should be: pathload = paste0(pathname, "/", filename, exname). The same issue occurs in generalcode.R.

      Apologies, I will correct this in the upload!

      (17) Theoretical framework: Is there any other method available for comparison besides the one proposed by Atal et al.? Could you include a brief literature review describing alternative approaches?

      To my knowledge, there is not – Xing et al. (now referenced) covered this earlier in the year, and I have included an expanded background for this purpose. Please see the reply to Reviewer 1, point 1.

      (18a) There appears to be no heterogeneity in the meta-analysis in terms of effect sizes and I², likely because most values are quite large, yet the included studies address very different populations (e.g., patients with COPD, NSCLC survivors, older adults, women, and GI cancer survivors). This could have been explained more clearly, including how such diverse literature might influence fragility indices or whether there is a logical rationale for combining these studies. Could you perform a sensitivity analysis or provide a conceptual explanation of how the heterogeneity - or lack thereof - across these trials may affect the fragility indices? Although I² values are small, the conceptual heterogeneity among studies suggests that the pooled results may be comparing fundamentally different clinical contexts, which requires clarification.

      I think this is a very pertinent point. I am unsure why these authors combined such diverse populations without considering whether they were comparable, but this is a common problem in meta-analysis. I have added the following to the Discussion to address this problem:

      “The vitamin D meta-analyses in this work were chosen as illustrative rather than specific examples, but it is worth noting that there are methodological concerns with much vitamin D research (Grimes et al., 2024). The three studies cited in this work report relatively low heterogeneity in both effect sizes and I² values, but the included studies addressed very different populations, including patients with chronic obstructive pulmonary disease, non-small cell lung cancer survivors, women-only cohorts, older adults, and gastrointestinal cancer survivors. These groups presumably have different risk factors for cancer death, and it is unclear why the authors of these studies combined cohorts with fundamentally different clinical contexts. Why heterogeneity appeared so low across such different groups is also curious. This goes beyond the scope of the current work, but it illustrates that a meta-analysis is only as strong as its underlying data and its methodological rigor in comparing like with like, and its conclusions must always be seen in context.”

      Reviewer #3 (Recommendations for the authors):

      (1) Line 156, acronym FI not defined.

      Apologies; this is now defined at the outset as “fragility index”.

      (2) Line 158, typo "He"?

      Apologies again, this was a typo and was supposed to read “the”, fixed now.

      (3) Across the manuscript, I think the "re-coding" phrasing may confuse clinical readers. Maybe rephrasing to "flipping event classification" or "flipping group" would be better.

      Excellent point – this has now been modified at the outset.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

      We now demonstrate in Fig. S9 efficient degron-mediated depletion of both NUF2 and SPC24 in cell-cycle-arrested cells by Western blotting. We show similar data for siRNA knockdowns. Our siRNA knockdown experiments include a “siDEATH” control that induces cytotoxicity by targeting several essential genes. In Fig. S6a we now show that siDEATH transfection results in strong cytotoxicity and cell death in cycling as well as cell-cycle-arrested G1/S and G2/M populations, indicating efficient protein depletion. Additionally, in Fig. S6b we now show depletion of NCAPH2 protein by siRNA knockdown in cycling as well as cell-cycle-arrested cell populations by Western blot analysis. We mention these results on page 11 and page 13.

      Reviewer #2 (Public review):

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation, such as components of the condensin and cohesin complexes, are typically essential for viability. Similarly, known effectors of centromere behaviour (e.g., work by the Fachinetti lab) often lead to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implication of this selection criterion should be clearly discussed, as it fundamentally shapes the interpretation of the study's findings.

      We discussed our hit selection criteria on page 8 and in the Methods section. Some of the concerns regarding a bias towards non-essential genes are alleviated by the fact that our screen is limited to a relatively short duration of 72 hours, rather than the longer time points generally used to assess essentiality in pooled CRISPR-KO screens, allowing us to identify genes that may be essential if eliminated permanently. In support of this notion, we identify subunits of the essential condensin and cohesin complexes as hits with only limited effects on cell viability. For example, the Z-score for the change in cell number upon NCAPH2 knockout was -0.26, indicating only a mild reduction compared to the average cell number across all targets.
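      For readers outside the screening field, a Z-score of -0.26 means the target's cell count sits roughly a quarter of a standard deviation below the screen-wide mean. A minimal sketch, assuming plain mean/standard-deviation scaling across all targets; the screen may instead normalize per plate or use robust (median/MAD) scaling, and the gene labels and counts below are hypothetical.

```python
from statistics import mean, stdev


def z_scores(cell_counts):
    """Z-score of each target's cell count relative to all targets.

    cell_counts: dict mapping target gene -> cell count.
    Uses the sample standard deviation across targets.
    """
    values = list(cell_counts.values())
    mu, sd = mean(values), stdev(values)
    return {gene: (n - mu) / sd for gene, n in cell_counts.items()}


print(z_scores({"geneA": 100, "geneB": 110, "geneC": 90}))
# {'geneA': 0.0, 'geneB': 1.0, 'geneC': -1.0}
```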

      Other potential confounders of hit selection, such as micronucleus formation and cell cycle effects, are minimized because we closely monitor micronucleus formation and cell viability in our screen. Finally, aneuploidy is similarly not a confounding factor in hit identification since, as we previously demonstrated, the Ripley’s K-based clustering score is robust to changes in spot number (Keikhosravi, A., et al. 2025).

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering - and by extension, interphase genome organisation - is not biologically significant?

      We appreciate these points. Given the presence of one centromere on each chromosome, we used centromeres as surrogate landmarks of higher-order nuclear genome organization and considered centromere patterns as a general indicator of overall genome organization. While the relationship of centromere patterns to other genome features is poorly understood in mammalian cells, a link is suggested by observations in other organisms. For example, in yeast, the clustering of centromeres reflects the overall Rabl configuration of chromosomes. Having said that, we agree that our extrapolation to overall genome organization is somewhat speculative, and we have toned down these conclusions throughout the manuscript.

      We agree that one of the most interesting questions emerging from our study is whether centromere clustering has a functional role. In follow-up studies we will use some of the key regulators identified in these screens to perturb the native centromere distribution and assay various cellular responses, including changes in gene expression and genome integrity. These studies will be the subject of future publications.

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-H2 is a subunit of the condensin II complex that promotes CENP-A deposition during G1. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide more mechanistic insight.

      We thank the reviewer for this comment. We now clarify the relationship of these proteins to centromeres in more detail on page 12. While they all have some relationship to centromeres, as would be expected if they contributed to centromere clustering, they represent multiple distinct pathways and processes.

      The observed effects on clustering are unlikely to be due to aneuploidy, as only very limited aneuploidy is observed in our cells and the Ripley’s K measurement of centromere clustering is robust to changes in chromosome copy number. Follow-up studies using live-cell imaging approaches are currently in progress to address some of these mechanistic questions.

      Finally, the additive effects observed may reflect mild mis-segregation effects that are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

      We rephrased the text on page 14 based on the reviewer’s recommendations.

      Reviewer #3 (Public review):

      Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      We now include in Fig. S11 examples of disordered mitotic nuclei observed in the absence of NUF2 or SPC24.

      I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

      We have modified this description on page 4.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Proper characterisation of the cell lines used in the manuscript. Tagging is known to alter protein levels relative to the parental cell line, and whether or not this is the case here needs to be shown transparently in the manuscript.

      The cell lines used to conditionally deplete NCAPH2 and KI67 have been published previously and were characterized to show normal expression levels of the tagged protein (Takagi et al., 2018). We also show quantification of Western blots comparing the levels of tagged SPC24 and NUF2 to those of the untagged proteins in the parental cell line (Fig. S8e-f) and discuss these results on pages 11 and 12.

      (2) Demonstration of protein depletion in the degron cell lines.

      We showed efficient protein depletion in the degron cell lines (Fig. S8c and S8d). In addition, we now show in Fig. S9 depletion of SPC24 and NUF2 in cells arrested at G1/S and G2/M.

      (3) The study examines centromere clustering, but not genome architecture. While it is understood that a complete investigation of genome architecture is beyond the scope of the current study, the interpretation does not match the data. The authors are suggested to pay attention to this point throughout the manuscript and consider their findings in terms of centromere clustering rather than genome architecture, including changing the title accordingly.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and a link to overall genome organization has been suggested in some organisms such as yeast, we have retained the wording in a few select instances, including the title. We also make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      Reviewer #1 (Recommendations for the authors):

      (1) Controls of depletion by western blot in synchronized cells (siRNAs and degrons) are lacking.

      We now show Western blots demonstrating efficient depletion of the target proteins in degron (Fig. S9) and siRNA treated cell-cycle arrested cells (Fig. S6b).

      It would have been very nice to discuss the implications of these findings further. For example, do changes in centromere clustering alter the expression or repression of pericentromeric heterochromatin? Is centromere clustering associated with specific diseases? How does global chromatin organization affect gene expression and genome stability? Although some of these aspects are unknown, a discussion of them would have been nice.

      We appreciate these interesting points. These questions are the subject of our ongoing follow-up studies. We now discuss possible consequences of centromere re-organization for gene expression and genome stability on page 18.

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) Clarify Scope and Avoid Overinterpretation

      (a) The study exclusively investigates centromere positioning, without addressing broader aspects of genome architecture.

      (b) There is no established link presented between centromere positioning and higher-order genome organisation.

      We have toned down our statements regarding overall genome organization throughout the manuscript. Since centromeres are a natural fiducial marker for overall genome organization and observations in yeast suggest such a link, we have retained the wording in a few select instances. We make it clear that we do not intend to draw conclusions regarding TADs or even compartments but consider centromere patterns an indicator of overall genome organization.

      (c) The exclusion criteria used in the screen should be clearly explained, including the implications of selecting only non-essential or redundant genes.

      We discuss on page 8 and in the Methods section the exclusion criteria used in the screen, including the implications for identifying essential genes.

      (d) The authors should discuss why the identified proteins significantly affect centromere clustering but do not impact cell cycle progression.

      We now discuss this topic briefly on page 9. While some hits are expected to affect both cell-cycle progression and centromere clustering (Fig. S4c), it is not a priori expected that all hits would affect both.

      (2) Supplementary Figure 1

      This figure appears unnecessary. The co-localisation between CENP-C and CENP-A is well established in the literature, and the scoring provided does not add essential new information.

      These data were included in response to repeated questions from a centromere expert. We prefer to retain them for completeness.

      (3) Differential Hits between Cell Lines 

      For hits that behave differently across cell lines, expression data should be provided. Are the genes equally expressed in both cell types? What is the level of depletion achieved?

      It is possible that cell-type specific hits arise from differences in expression. Cell-type specific hits may also arise for multiple other reasons, including cancer vs. non-cancer origin, hTERT immortalization, cell growth properties, variation in the underlying DNA sequences of the Cas9 target loci, and the initial state of centromere clustering, to name a few. Each of these possibilities requires additional experiments to identify the exact reason for the cell-type specificity of a given factor. A full analysis of the reasons for cell-type specificity is, however, beyond the scope of the current study.

      (4) Efficiency of Cell Cycle-Specific Degradation

      Degradation efficiency likely varies across cell cycle stages. The authors should provide Western blots showing the extent of protein depletion at each cell cycle block.

      We provide Western blot data in Fig. S9 to demonstrate efficient knockdown of proteins in G1/S and G2/M arrested cells.

      (5) Figure S6 - Validation of New Cell Lines

      Genotyping data for the newly generated cell lines should be included, along with Western blots using protein-specific antibodies (not just the tag), compared to the parental cell line.

      We provide in Fig. S7c-d genotyping data and in Fig. S8e-f Western blot data to compare levels of tagged and untagged proteins.

      (6) Figure S7 - G2/M Block Efficiency

      The G2/M block appears suboptimal after 20 hours in RO-3306, with only ~50% of cells in G2/M and just 21-27% for Ki-67, where most cells remain in S phase. This raises concerns about the interpretation of mitotic depletion effects. It is possible that cells never progressed from G1 or completed S phase without Ki-67. Prior studies (van Schaik et al., 2022; Stamatiou et al., 2024) have shown delayed and uneven replication of centromeric/pericentromeric regions upon Ki-67 depletion during S phase, which could affect the readout. Live-cell imaging would be a more robust approach to confirm mitotic status.

      For KI67 after RO-3306 treatment, 73% and 67% of cells were arrested at the G2/M boundary in the presence or absence of KI67, respectively (Fig. S10a-b). Upon release from G2/M arrest, the proportion of G1 cells increased from 6-13% to 28-60% for all four factors tested (Fig. S10b and d). Please note that our results do not depend directly on release efficiency, since we use single-cell staging (Fig. 3b) and selectively analyze only G1 populations (Fig. 5c).

      We are currently working towards live cell imaging, but this requires development and characterization of additional cell lines which is beyond the scope of this study.

      Statistical analyses of cell cycle phase distributions should also be included.

      We include statistical analyses of cell cycle phase distributions in Fig. S4c and Fig. S10c-d, using t-tests with FDR correction to compare the percentage of cells in G1, S, or G2 in the presence and absence of each factor tested.

      (7) Aneuploidy Assessment

      Aneuploidy scores for the four key proteins should be provided, ideally using centromere-specific FISH probes.

      While an aneuploidy score for each hit would be an interesting piece of information, we showed in a previous publication that the Ripley’s K-based Clustering Score method used here is robust to aneuploidy (Keikhosravi et al., 2025); aneuploidy would thus not lead to spurious identification of these proteins in our screen.

      (8) Add-Back Experiment (Page 14)

      While the add-back experiment is conceptually strong, its execution could be improved. It should be performed on synchronised cells: deplete the protein in G2/M, arrest in thymidine, then release into G1 without the protein to observe the unclustering phenotype.

      Re-expression should occur during the block, followed by release and analysis in the next G1 phase. This would better demonstrate whether clustering defects from the previous division can be rescued.

      We have attempted these types of long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (9) Statistical Analyses

      Several figures lack statistical analysis, which is essential for data interpretation:

      (a) Figure 1B-E

      (b) Figure 3I

      (c) Figure 4B

      (d) Figure 5B, C, G

      (e) Supplementary Figures S4B and S7

      Statistical analyses were performed for (a) Fig. 1b-e, (b) Fig. 3i, (c) Fig. 4b, and (d) Fig. 5b-c; details of the tests are given in the corresponding figure legends. We also include statistical tests for Fig. 5g, S5b and S7c-d.

      Minor Comments:

      (1) Page 9: "Reassuringly, in line with known centromere-nucleoli association (Bury, Moodie et al. 2020, van Schaik, Manzo et al. 2022)..."

      The citation "van Schaik, Manzo et al. 2022" is incorrect and should be revised.

      We have removed this reference.

      (2) Page 10:

      "...were grouped into six categories: regulators of chromatin structure, kinetochore proteins, nucleolar proteins, nuclear pore complex components..."

      The authors should note that NUP160, listed as a nuclear pore complex hit, is also a kinetochore component during mitosis and may be linked to mitotic defects.

      We now mention this on page 10.

      (3) Page 12:

      "Progression through S phase was equally efficient in the presence or absence of KI67."

      While bulk S phase progression may appear unaffected, refined analyses (e.g., Repli-seq, EdU patterning) have shown delayed replication of centromeric/pericentromeric regions upon Ki-67 depletion. This should be acknowledged, especially given the study's focus on centromeres (see van Schaik et al., 2022; Stamatiou et al., 2024).

      Our statement was meant to describe the results we observed in this study. We indicate that overall progression is not affected, but subtle effects may persist, and we cite the relevant references on page 13.

      (4) Page 12:

      "KI67 is a well-known marker of cell proliferation..."

      The first study demonstrating the dependency of chromosome periphery on Ki-67 was Booth et al., 2014, which should be cited.

      This citation has been added.

      Reviewer #3 (Recommendations for the authors):

      (1) On page 14, paragraph 1, the authors suggest that NCAPH2 and SPC24 act independently on centromere clustering. I'm not convinced that this is the right interpretation of the data. Rather, the lack of an additive phenotype following NCAPH2 and SPC24 dual depletion suggests to me that these two proteins are acting in the same pathway.

      We show that knockdown of NCAPH2 and SPC24 has opposite effects on centromere clustering. However, knockdown of SPC24 in NCAPH2-AID cells produces an intermediate level of clustering compared with NCAPH2 depletion or SPC24 knockdown alone. This indicates additive effects. We have modified our description of these results on p. 14.

      (2) The analysis and experimental design in Figure 5g could be improved. For one, I would add statistical comparisons like the other figure panels. Second, the authors would ideally perform AID depletion in a synchronized G2 population before washout during the subsequent G1. This design might make some of the more subtle changes (e.g., KI67-AID) more obvious.

      We now include statistical analysis in Fig. 5g. We have attempted long-term depletion experiments in cell-cycle arrested cells, but have observed significant viability defects, making results uninterpretable.

      (3) In the discussion, the authors allude to centromere clustering data from the NDC80 complex, HMGA1, and other HMGs but fail to direct the reader to where they may find the data. If these data are in Tables S4 and S5, perhaps the authors could make these tables more reader-friendly?

      For each target, the mean Clustering Score Z-score of two biological replicates is listed in column H of Tables S4 and S5.

      (4) In my opinion, the term 'clustering score' comes across a bit ambiguous. In most cases, this term appears to refer to the distance between centromeric foci but is used occasionally to refer to the number of centromeric spots. For example, on page 9, paragraph 1, line 3, cluster/clustering is used three times but with slightly different meanings. Perhaps the authors can consider using the word 'clustering' to indicate the number of spots, 'dispersion' to indicate distance between centromeres, and 'radial distribution' to indicate distance from the nuclear center? Or other ways to improve the consistency of the descriptive terms.

      We apologize for not being clear. The Clustering Score is a very specific parameter derived from use of a Ripley’s K clustering algorithm as described in Materials and Methods. We now ensure that the term is used correctly throughout and that the other terms are also used consistently.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses

      As presented, the manuscript has limitations that weaken support for the central conclusions drawn by the authors. Many of the findings align with prior work on this topic, but do not extend those findings substantially.

      An overarching limitation is the lack of temporal resolution in the manipulations relative to the behavioral assays. This is particularly important for anxiety-like behaviors, as antecedent exposures can alter performance. In the open field and elevated zero maze assays, testing occurred 30 minutes after CNO injection. During much of this interval, the targeted neurons were likely active, making it difficult to determine whether observed behavioral changes were primary - resulting directly from SuM neuronal activity - or secondary, reflecting a stress-like state induced by prolonged activation of SuM and related circuits. This concern also applies to the chronic inhibition of ventral subiculum (vSub) neurons during 10 days of CSDS.

      We appreciate the reviewer's concern regarding the timing of CNO administration relative to behavioral testing. The 30-minute interval was selected based on previous studies [1, 2]. This window ensures stable and specific neuronal manipulation while minimizing off-target effects, and it was applied consistently across all experiments. We acknowledge that a shorter interval (~15 min) can be sufficient to produce biological effects in vivo [3, 4]. We repeated the chemogenetic tests 2-3 times to ensure reliable data for statistical analysis. However, we cannot exclude potential side effects of prolonged chemogenetic activation of the SuM, given the poor temporal resolution of chemogenetic compared with optogenetic manipulation. We agree that techniques with higher temporal resolution, such as optogenetics, would provide an excellent complement to these findings in future studies.

      The combination of stressors (foot shock and CSDS) and behavioral assays further complicates interpretation. The precise role of SuM neurons, including SANs, remains unclear. Both vSub and dSub neurons responded to foot shock, but only vSub neurons showed activity differences associated with open-arm transitions in the EZM.

      We agree that the use of multiple stressors (foot shock and CSDS) adds complexity to the interpretation. Our rationale was to test the generality of the SuM response and the role of SANs across different stress modalities (acute vs. chronic). The key finding is that while both vSub and dSub projections to the SuM were activated by the acute stressor of foot shock (Figure 5N-R), only the vSub-SuM pathway showed a significant increase in calcium activity specifically during the anxiety-provoking transition from the closed to the open arms of the EZM (Figure 5I-M). This dissociation suggests a selective role for the vSub-SuM circuit in encoding anxiety-related information, beyond a general response to stress.

      In light of prior studies linking SuM to locomotion (Farrell et al., Science 2021; Escobedo et al., eLife 2024), the absence of analyses connecting subpopulations to locomotor changes weakens the claim that vSub neurons selectively encode anxiety. Because open- and closed-arm transitions are inherently tied to locomotor activity, locomotion must be carefully controlled to avoid confounding interpretations.

      We thank the reviewer for highlighting the important studies linking the SuM to locomotion. We acknowledge this known function and carefully considered it in our analyses. Non-selective activation of the entire SuM did not affect the total distance traveled in the open field or elevated zero maze (Supplemental Figure 2 B-C). Although locomotion in the OF and EZM was affected when SANs were targeted, we also compared the distance traveled in the central area of the OF (Figure 4 I), which partially reduces the influence of locomotion on our estimate of anxiety-driven avoidance of the central area. We agree that future work delineating the specific subpopulations within the SuM that regulate locomotion versus anxiety would be highly valuable.

      Another limitation is the narrow behavioral scope. Beyond open field and EZM, no additional assays were used to assess how SAN reactivation affects other behaviors. Without richer behavioral analyses, interpretations about fear engrams, freezing, or broader stress-related functions of SuM remain incomplete.

      In addition, small n values across several datasets reduce confidence in the strength of the conclusions.

      We acknowledge that the primary focus on OF and EZM tests is a limitation in fully characterizing the behavioral profile of SAN manipulation. These tests were selected because they are well-validated, standard assays for anxiety-like behavior in rodents [5–10]. However, we also included a reward-seeking test, in which activation of SANs significantly suppressed sucrose consumption (Figure 4L), suggesting a broader impact on motivational state that is often linked to anxiety. We fully agree with the reviewer that employing a richer behavioral battery (such as tests for social avoidance, conditioned place aversion, or Pavlovian fear conditioning) in future studies will be essential to comprehensively define the functional scope of SuM SANs and to conclusively distinguish their role from that of fear memory engrams.

      Figure level concerns:

      (1) Figure 1: In Figure 1, the acute recruitment of SuM neurons by foot shock is paired with changes in neural activity induced by social defeat stress. Although interesting, the connection between changes induced by a chronic stressor and Fos induction following acute foot shock is unclear and does not establish a baseline for the studies in Figure 3 on activation of SANs by social stressors.

      Thank you for this important comment. We agree that directly linking acute foot shock-induced cFos expression with chronic social defeat stress (CSDS) electrophysiological changes may create an interpretive gap. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute, in vivo recording for chronic). We did not intend to imply that the same neuronal population responds identically to both stressors.

      To address this, we have clarified in the text that the purpose of Figure 1 is to show that SuM is responsive to diverse stressors, rather than to establish a direct mechanistic link between acute and chronic activation patterns. The baseline for SAN studies in Figure 3 is established through the TRAP2 tagging protocol following foot shock, independent of the CSDS model. We acknowledge that future studies should compare SAN recruitment across acute vs. chronic stressors to better define their functional overlap.

      (2) Figure 2: The chemogenetic experiments using AAV-hSyn-Gq-DREADDs lack data, images, or hit maps showing viral spread across animals. This omission is critical given the small size of SuM, where viral spread directly determines which neurons are manipulated. Without this, it is difficult to interpret findings in the context of prior studies on SuM circuits involved in threats and rewards.

      Please see Supplemental Figure 2 for the infection area of AAV.

      (3) Figure 3: The TRAP experiments show that the number of labeled neurons following foot shock (Figure 3F) is approximately double that of baseline home-cage animals, though y-axis scaling complicates interpretation. It is unclear whether this reflects true Fos induction, low TRAP efficiency, or baseline recombination.

      We thank the reviewer for pointing out the axis scaling issue. We have modified the y-axis to start at 0. Because the SuM has been reported to play a role in wakefulness in rodents, some basal neuronal activation after i.p. injection of 4-OHT is to be expected.

      Overlap analyses are also limited. For example, it is not shown what proportion of foot shock SANs are reactivated by subsequent foot shock. Comparisons of Fos induction after sucrose reward are also weakened by the very low Fos signal observed. If sucrose reward does not robustly induce Fos in SuM, its utility in distinguishing reward- versus stress-activated neurons is questionable. Thus, conclusions about overlap between SANs and socially stressed neurons remain uncertain due to the missing quantification of Fos+ populations.

      Thank you for the question. We have replaced the reactivation chance graph with a new reactivation percentage graph showing the proportion of SANs that were reactivated by subsequent sucrose reward or stress. We used social stress rather than foot shock for reactivation in order to test the potential generality of foot-shock-tagged neurons. The low cFos expression after sucrose exposure suggests, first, that the SuM may not be involved in reward regulation, which we agree with; and second, that these SANs are more likely to modulate anxiety-like behavior than reward.

      (4) Supplemental Figure 3: The claim that "SANs in the SuM encode anxiety but not fear memory" is not well supported. Inhibition of SANs (Gi-DREADDs) did not alter freezing behavior, but the absence of change could reflect technical issues (e.g., insufficient TRAP efficiency, low expression of Gi-DREADDs). Moreover, the manuscript does not provide a positive control showing that SuM SANs inhibition alters anxiety-like behavior, making it difficult to interpret the negative result. Prior work (Escobedo et al., eLife 2024) suggests SuM neurons drive active responses, not freezing, raising further interpretive questions.

      We agree that we did not provide enough data to conclude that SuM SANs have no regulatory effect on fear memory. The relevant statement has been removed to avoid further misunderstanding.

      (5) Figure 4: The statement that corticosterone concentration is "usually used to estimate whether an individual is anxious" (line 236) is an overstatement. Corticosterone fluctuates dynamically across the day and responds to a broad range of stimuli beyond anxiety.

      Thank you for your kind reminder. Corticosterone/cortisol, the primary stress hormone, is a well-established biomarker whose levels are elevated in response to stress and in anxiety states [11, 12]. Several studies have also reported that corticosterone administration can produce anxiety-like behaviors in rodents [13–16]. We collected all blood samples at the same time point (Figure 4 C-D). We agree that the statement in line 236 was an overstatement and have modified it.

      (6) Figures 5-6: The conclusion that vSub neurons encode anxiety-like behavior is not firmly supported. Data from photo-activating terminals in SuM is shown for ex vivo recording, but not in vivo behavior, which would strengthen support for this conclusion. Both vSub and dSub neurons responded to foot shock. The key evidence comes from apparent differential recruitment during open-arm exploration. However, the timing appears to lag arm entry, no data are provided for closed-arm entry, and there is heterogeneity across animals. These limitations reduce confidence in the authors' central claim regarding vSub-specific encoding of anxiety.

      We thank the reviewer for this important point. To address the concern regarding the in vivo behavioral encoding specificity of the vSub-SuM pathway, we further analyzed the in vivo fiber photometry data. The new analysis revealed that calcium activity in vSub-SuM projection neurons exhibited bidirectional, instantaneous, and specific changes during transitions between the open and closed arms of the elevated plus maze: their activity significantly and immediately decreased when mice moved from the open arm to the closed arm (new results shown in Supplemental Figure 5) and, conversely, significantly and immediately increased upon transitioning from the closed to the open arm. Under the same behavioral events, however, dSub-SuM projection neurons showed no significant change in activity. We hope this finding strengthens the case for a role of the vSub-SuM pathway in encoding anxiety-like behavior.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      (1) From the data presented, the authors conclude that "the SuM is the critical brain region that regulates anxiety" (line 190). This interpretation appears overstated, as it downplays well-established contributions of other brain regions and does not place SuM's role within a broader network context. The data support that SuM neurons are recruited by foot shock and, to a lesser extent, by acute social stress. However, the alterations in activity of SuM subpopulations following chronic stress reported in Figure 1 remain largely unexplored, limiting insight into their functional relevance.

      Thank you for the suggestion. We have revised line 190 to a more cautious statement: “In this study, we combined multiple methods to determine whether the SuM is a brain region involved in modulating anxiety.”

      (2) The limited temporal resolution of DREADD-based manipulations leaves alternative explanations untested. For example, if SANs encode signals of threat, generalized stress, or nociception, then prolonged activation could indirectly alter behavior in the open field and EZM assays, rather than reflecting direct anxiety regulation.

      We address the DREADD approach in the first part of our response.

      (3) The conclusion that "SuM store information about stress but not memory" (line 240) is not fully supported, particularly with respect to possible roles in memory. The lack of a role in memory of events, as opposed to the output of threat or stress memory, may be true, but is functionally untested in presented experiments. The data do indicate activation of the SuM neuron by foot shock, which has been previously reported (Escobedo et al eLife 2024). The changes in SuM activity following chronic stress (Figure 1) are intriguing, but their relationship to "stress information storage" is not clearly established.

      Thank you for your valuable comments. Foot-shock-activated neurons may play a role in modulating anxiety-like behaviors, emotional memory (e.g., fear memory), or both. We recognize that we did not fully test all aspects of anxiety and memory, which led to some overstatements in the manuscript. Given the reduced open-arm exploration in the EZM/EPM, it is more appropriate to focus on “anxiety-related avoidance”.

      Reviewer #2 (Public review):

      This manuscript investigates the neural mechanisms of anxiety and identifies the supramammillary nucleus (SuM) as a critical hub in mediating anxiety-related behaviors. The authors describe a population of neurons in the SuM that are activated by acute and chronic stress. While their activity is not required for fear memory recall, reactivation of these neurons after chronic stress robustly increases anxiety-like behaviors as well as physiological stress markers. Circuit analysis further shows that these stress-activated neurons are driven by inputs from the ventral, but not dorsal, subiculum, and inhibition of this pathway exerts an anxiolytic effect.

      The study provides an elegant integration of techniques to link stress, neuronal ensembles, and circuit function, thereby advancing our understanding of the neural substrates of anxiety. A particularly notable point is the selective role of these stress-activated neurons in anxiety, but not in associative fear memory, which highlights functional distinctions between neural circuits underlying anxiety and fear.

      Some aspects would benefit from clarification. For example, how selective is the recruitment of this population to stress compared with other aversive states, and how should one best interpret their definition as "stress-activated neurons" given the relatively modest overlap across stress exposures? In addition, the use of the term "engram" in this context raises conceptual questions. Is it appropriate to describe a neuronal ensemble encoding an emotional state as an engram, a term usually tied to specific memory recall?

      Overall, this work makes a valuable contribution by identifying SuM stress-activated neurons and their ventral subiculum inputs as central elements of the circuitry underlying anxiety. These findings provide a valuable framework for future studies investigating anxiety circuitry and may inform the development of targeted interventions for stress-related disorders.

      We thank the reviewer for raising these important points. We agree that further clarification is warranted. In our study, we compared SAN reactivation across different stimuli: foot shock (acute physical stress), social stress (chronic psychosocial stress), and sucrose reward (non-aversive positive stimulus). As shown in Figure 3, SANs in the supramammillary nucleus (SuM) were significantly reactivated by social stress but not by sucrose reward. Moreover, the c-Fos response in SuM was markedly higher after foot shock compared to home cage controls (Figure 1). While we did not test all possible aversive states (e.g., pain, sickness), our data support that SuM SANs are preferentially recruited by stressors rather than by reward or neutral conditions. We acknowledge that the overlap across stress modalities is not complete, which may reflect differences in stress intensity, duration, or circuit engagement. Future work will systematically compare SAN recruitment across diverse aversive and non-aversive states to further define their selectivity.

      The term “stress-activated neurons” (SANs) here refers to neurons that are reliably activated by at least one type of stressor and can be reactivated by subsequent stress exposure. The partial overlap across stressors likely reflects the diversity of stress responses and the possibility that distinct subpopulations within SuM may encode different aspects of aversive experience. Importantly, chemogenetic activation of SANs was sufficient to induce anxiety-like behavior and elevate corticosterone (Figure 4), supporting their functional role in stress-related behavioral and physiological outputs. We have revised the manuscript to clarify that SANs represent a stress-responsive ensemble rather than a uniform population activated identically by all stressors.

      We appreciate the reviewer’s conceptual caution. In the revised manuscript, we intentionally avoided using the term “engram” to describe SANs. Our focus is on a stress-activated neuronal ensemble that drives anxiety-like behavior, not on memory recall per se. We refer to SANs as an “ensemble” or “population” rather than an engram, consistent with the TRAP-based labeling approach used to capture neurons activated during a specific experience. We agree that “engram” is best reserved for memory-encoding cells and will ensure this distinction remains clear throughout the text.

      Reviewer #3 (Public review):

      Weaknesses:

      The strength of some of the evidence is judged to be incomplete. The paper provides good evidence that SuM contains stress-responsive neurons, and the activity of these neurons increases some measure of anxiety-like behavior. However, the evidence that the vSub-SuM projection "encodes anxiety" and that the SuM is a key regulator of anxiety is judged to be incomplete. The claim that SuM generates an "anxiety engram" is also judged to be incompletely supported by the evidence. Namely, what is unclear is whether these cells/regions encode anxiety per se versus modulate behaviors (like exploration) that tend to correlate with anxiety. Since many brain regions respond to footshock and other stressors, the response of SuM to these stimuli is not strong evidence for a role in anxiety. I am not convinced that the identified SuM cells have a specific anxiety function. As the authors mention in the introduction, SuM regulates exploration and theta activity. Since theta potently regulates hippocampal function, there is the concern that SuM manipulations could have broad effects. As shown in Supplementary Figure 2, stimulating stress-responsive cells in SuM potently reduces general locomotor exploration. This raises concerns that the manipulation could have broader effects that go beyond just changes in anxiety-like behavior. Furthermore, the meaning of an "anxiety engram" is unclear. Would this engram encode stress, the sense of a potential threat, or the behavioral response? A more developed analysis of the behavioral correlates of SuM activity and the behavioral effects of SuM manipulations could give insight into these questions.

      We appreciate the reviewer’s thoughtful critique regarding the specificity of SuM’s role in anxiety and the interpretation of our findings. We acknowledge that SuM has broad functions, including the regulation of exploration and hippocampal theta. However, our data show that general SuM activation increases anxiety-like measures (reduced open-arm time in the EZM, decreased center exploration in the OF) without altering total locomotion (Fig. 2, Suppl. Fig. 2). The locomotor reduction in the SAN activation experiments (Suppl. Fig. 2F–G) was accompanied by clear anxiety-like behavioral changes (e.g., suppressed reward seeking), suggesting that the effects are not due solely to motor suppression. We agree, however, that the assays we used to estimate anxiety-like behavior rely on the animals’ movement during testing, which is a limitation of this study when linking the data to anxiety. It is therefore more appropriate to interpret the results as modulation of anxiety-like behavior (anxiety-related avoidance) rather than of anxiety itself. We have revised the manuscript to describe the results more precisely and avoid overstatement.

      Our fiber photometry data (Fig. 5) show that vSub–SuM projection neurons increase activity specifically when mice enter open arms of the EZM—a behavioral transition associated with anxiety—whereas dSub–SuM projections do not. This activity correlates with anxiety-related behavior, not merely with movement or stress per se.

      We also agree that the term “engram” may be misleading in this context. In the manuscript, we refer to SANs as a “stress-activated neuronal ensemble” rather than an anxiety engram. Our data indicate that these neurons are recruited by stress and that their reactivation increases anxiety-related avoidance of the open arms. We have revised the text to avoid conceptual overreach and to clarify that SuM SANs likely contribute to a state of sustained anxiety/avoidance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting, including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from noting that the subjects were male in the abstract and discussion of the limitations of the exclusion of females.

      Thank you for the suggestion. We have included the full statistical details in a separate sheet as Table 1. We have also modified the title of the manuscript to reflect the sex of the mice.

      Reviewer #1 (Recommendations for the authors):

      (1) In line 211, the authors state, "we recorded neuronal action potentials via multichannel extracellular recording while the mice were moving in the EPM, a traditional type of maze used to test anxiety in rodents,". However, it is unclear what data is presented in the paper, that is, extracellular recordings from SuM in mice on the elevated plus maze.

      We have deleted the description of the multichannel recordings in the EPM, as these data were removed earlier.

      Minor corrections to the text and figures.

      (2) For bar plots, perhaps clarify how the data is presented. For example, in Figure 4, "The data in B, D, E and I-L are presented as the means ± SEMs," but this does not appear to be plotted as a mean with SEM error bars because the error bars cover all the values.

      Corrected.

      (3) In Figure 5, the white text for EGFP in panel B is very difficult to see.

      Corrected.

      (4) For Figure 5D, it would be helpful to more clearly specify which neurons in SuM were recorded from. Was it SANs or all SuM neurons?

      We performed whole-cell recordings from SuM neurons in general, not only from SANs.

      (5) Fos2A-iCreERT2 is mislabeled as "Fos2A-iCreERT" in the methods.

      Corrected.

      (6) The sentence at line 139 "To make sure foot shock induced anxiety won't last until manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7," is unclear as written.

      Thank you for pointing this out. We have revised the sentence for clarity: “To make sure mice were at a similar baseline condition before chemogenetic manipulation, we subjected mice to an acute stress protocol involving foot shocks and then performed the elevated plus maze (EPM) and elevated zero maze (EZM) tests to evaluate anxiety on days 2 and 7 (Figure 4A). Mice that experienced foot shocks showed decreased exploration time in the open arms on day 2. However, acute stress-induced anxiety was not detected on day 7 (Figure 4B), which allowed us to compare anxiety-like behavior produced by SAN reactivation between groups starting from the same baseline.”

      (7) The details of the viral injections used for ex vivo electrophysiology are not sufficient to understand the experiment and the implications of the data. Which neurons (SANs?) are recorded from, what percent of those had inputs, were the sub-neurons globally labeled or just SANs?

      We performed whole-cell recordings from unselected SuM neurons to test whether the projection arises from glutamatergic neurons in the Sub; as shown in Figure 5B, the Sub projection neurons exclusively express vGlut1. Given this aim, we did not retain neurons that did not respond to light stimulation, so we cannot calculate the percentage of neurons receiving input. We have clarified in the Methods that recordings were made from unselected SuM neurons.

      (8) The scale used in Figure 6C renders that data unreadable. 120 to 40% changes in body weight are well beyond the variability in the data.

      We have changed the axis range to 90 to 110% to show the body weight change more clearly.

      (9) The dose of CNO used, 5 mg/kg, is high, and using lower doses or other DREADD ligands is worth considering.

      Thank you for your valuable comment. We are aware that many groups now use lower doses of CNO or alternative DREADD ligands that are reported to have much higher affinity and fewer side effects. The 5 mg/kg dose was adapted from earlier DREADD studies in mice [17] and produced no obvious side effects in our experiments, e.g., on locomotion (Supplementary Figure 2B), so we kept this dose throughout the project for consistency across experimental cohorts. In follow-up experiments building on this project, we are switching to DCZ to avoid any potential side effects of CNO.

      Reviewer #2 (Recommendations for the authors):

      This is a strong manuscript that provides important insights into the role of the supramammillary nucleus (SuM) and its inputs from the ventral subiculum in regulating anxiety. The combination of behavioral, imaging, electrophysiological, and circuit manipulation approaches is impressive, and the distinction the authors propose between anxiety-related and fear-related circuits is conceptually important.

      There are, however, some points that I think need clarification. The authors emphasize that the hippocampus is essential for fear memory recall, yet they do not directly evaluate whether the SuM-hippocampal pathway might contribute differentially to anxiety versus fear memory. Addressing this would help to explain where the dissociation between the two processes arises.

      Thank you for the suggestion. We recognize that we did not collect enough data to exclude a role for these SANs in memory, especially fear memory, which, as noted above, is formed through strongly emotional training. The data and the related discussion have been removed to avoid misunderstanding and overstatement.

      I am also not fully convinced about the definition of the "stress-activated neurons" (SANs). The overlap across repeated stress exposures is quite modest (around 20%), which suggests that this population may not be strictly stress-specific but rather a dynamic subset that is preferentially, though not exclusively, engaged by stress. Related to this, the use of the term "engram" raises conceptual questions. Since the classic engram refers to an ensemble encoding and recalling a specific memory, it is not obvious whether it is appropriate to apply the term to a neuronal population that appears to represent a persistent emotional state. The authors should consider justifying this choice of terminology more carefully or adopting a different term.

      Thank you for your important comments. We agree that the SANs in this manuscript are more likely a dynamic subset than an exclusively stress-engaged “engram”. That is why we use “stress-activated neurons” rather than “engram” to describe this neuronal ensemble. To avoid further confusion, we have reduced the use of “engram” throughout the manuscript.

      Some parts of the text also need more precision. For example, the statement in lines 63-65 that "few studies have explored emotion-related engram cells" is potentially misleading, as most engram studies focus on memories with a strong emotional component. The rationale for this claim should be clarified.

      We have deleted this sentence, as it was misleading and not needed for the flow of the text.

      In Figure 1, the choice of methods is also puzzling: cFos immunostaining is used after shock delivery, while electrophysiology is used for the CSDS paradigm. It would be helpful to explain why different readouts were chosen for different stress models, and whether this may affect the comparability of the results.

      Thank you for this important comment. In Figure 1, we aimed to demonstrate that both acute (foot shock) and chronic (CSDS) stressors can activate SuM neurons, using complementary methods (cFos for acute stress, in vivo recording for chronic stress). We chose different methods because acute stress produces a transient effect, whereas chronic stress produces a long-lasting effect. cFos is a well-established marker of strong neuronal activation, but it has a short lifespan (~4-6 hours) and is therefore better suited to the acute paradigm. In vivo recording allows within-subject comparison of neuronal activity before and after the chronic stress protocol and can reveal cumulative effects that cFos cannot. We have clarified the purpose of Figure 1 in Lines 112-113: “To investigate if SuM would be responsive to diverse stressors, we next examined whether chronic stress, which different mechanism underlying…”

      Finally, some additional details would strengthen the presentation. The discussion of corticosterone and other physiological markers could be expanded to indicate whether these effects were robust across stress paradigms. Similarly, the relatively modest overlap between SANs activated by different stressors could be framed more explicitly as part of a broader principle of flexible ensemble recruitment in anxiety-related circuits.

      Thank you for your suggestion. We have added more discussion of corticosterone and of the flexibility of SANs to the manuscript. See Lines 267-270: “The serum corticosterone concentration can be used as a marker of stress-induced change in the peripheral blood. Previous studies showed that serum corticosterone can be increased by various stressors [39–42]; meanwhile, intentionally supplementing the diet with corticosterone can induce anxiety-like behaviors in rodents [43].” and Lines 275-281: “However, the reactivation rate of SANs caused by different stressors was relatively lower than the initial activation rate caused by foot shock (Figure 3). This suggests that stress-activated neuronal clusters may follow more flexible recruitment principles, with only a small number of neurons potentially encoding emotional information while most other neurons remain involved in encoding other neural activities. Studies in other fields, particularly studies of memory engrams, have shown that the sets of neurons activated during learning are dynamic and highly flexible [44, 45].”

      Overall, the work is of high quality and provides a valuable contribution to the field, but addressing these points would help sharpen the mechanistic claims and ensure that the conceptual framework is as clear and precise as the experimental data.

      Reviewer #3 (Recommendations for the authors):

      (1) Since increased SuM activity is hypothesized to mediate the effects of stress on anxiety-like behavior, a logical step would be to test for necessity by silencing the stress-activated SuM cells.

      We agree this is a logical and valuable experiment. While our current study focused primarily on the sufficiency of SuM/SAN activation to induce anxiety-like behavior, we acknowledge that inhibition experiments would provide critical complementary evidence for necessity. We have added a statement in the Discussion noting that “future studies should examine whether silencing SuM SANs, either during stress exposure or during anxiety testing, can prevent or reduce stress-induced anxiety”. This will help establish a more complete causal role.

      (2) Discuss what is meant by "anxiety engram" and what features of anxiety the labeled cells might encode.

      We concur that “stress-activated neuron (SAN)” is a more precise descriptor than “engram” in this context. We have revised the text to avoid the potentially misleading term “engram” and instead refer to a “stress-activated neuron”. The labeled cells are preferentially reactivated by stress (not reward), and their activation promotes both behavioral avoidance and physiological stress markers (corticosterone). They likely contribute to the maintenance of an anxious state under perceived threat, rather than encoding discrete threat cues or memories.

      (3) A more nuanced analysis of behavioral correlates of SuM activity and/or the behavioral effects of SuM manipulations would strengthen this paper.

      To provide a more nuanced understanding of the behavioral correlates, we have performed additional analyses of our fiber photometry data (now presented in Supplemental Figure 6) and have planned additional experiments for future studies to deepen this understanding.

      References:

      (1) Jendryka M, Palchaudhuri M, Ursu D, van der Veen B, Liss B, Kätzel D, et al. Pharmacokinetic and pharmacodynamic actions of clozapine-N-oxide, clozapine, and compound 21 in DREADD-based chemogenetics in mice. Sci Rep. 2019;9.

      (2) Koike H, Demars MP, Short JA, Nabel EM, Akbarian S, Baxter MG, et al. Chemogenetic Inactivation of Dorsal Anterior Cingulate Cortex Neurons Disrupts Attentional Behavior in Mouse. Neuropsychopharmacology. 2016;41:1014–1023.

      (3) Guettier J-M, Gautam D, Scarselli M, Ruiz De Azua I, Li JH, Rosemond E, et al. A chemical-genetic approach to study G protein regulation of cell function in vivo. Proceedings of the National Academy of Sciences. 2009;106:19197–19202.

      (4) Wess J, Nakajima K, Jain S. Novel designer receptors to probe GPCR signaling and physiology. Trends Pharmacol Sci. 2013;34:385–392.

      (5) Kraeuter AK, Guest PC, Sarnyai Z. The Elevated Plus Maze Test for Measuring Anxiety-Like Behavior in Rodents. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 69–74.

      (6) Kraeuter AK, Guest PC, Sarnyai Z. The Open Field Test for Measuring Locomotor Activity and Anxiety-Like Behavior. Methods in Molecular Biology, vol. 1916, Humana Press Inc.; 2019. p. 99–103.

      (7) Wall PM, Messier C. Methodological and conceptual issues in the use of the elevated plus-maze as a psychological measurement instrument of animal anxiety-like behavior. Neurosci Biobehav Rev. 2001;25:275–286.

      (8) Carobrez AP, Bertoglio LJ. Ethological and temporal analyses of anxiety-like behavior: The elevated plus-maze model 20 years on. Neurosci Biobehav Rev. 2005;29:1193–1205.

      (9) Seibenhener ML, Wooten MC. Use of the open field maze to measure locomotor and anxiety-like behavior in mice. Journal of Visualized Experiments. 2015. https://doi.org/10.3791/52434.

      (10) Prut L, Belzung C. The open field as a paradigm to measure the effects of drugs on anxiety-like behaviors: A review. Eur J Pharmacol. 2003;463:3–33.

      (11) Chen Y, Zhou X, Chu B, Xie Q, Liu Z, Luo D, et al. Restraint Stress, Foot Shock and Corticosterone Differentially Alter Autophagy in the Rat Hippocampus, Basolateral Amygdala and Prefrontal Cortex. Neurochem Res. 2024;49:492–506.

      (12) Hassell JE, Nguyen KT, Gates CA, Lowry CA. The Impact of Stressor Exposure and Glucocorticoids on Anxiety and Fear. Curr. Top. Behav. Neurosci., vol. 43, Springer; 2019. p. 271–321.

      (13) Peng B, Xu Q, Liu J, Guo S, Borgland SL, Liu S. Corticosterone attenuates reward-seeking behavior and increases anxiety via D2 receptor signaling in ventral tegmental area dopamine neurons. Journal of Neuroscience. 2021;41:1566–1581.

      (14) Myers B, Greenwood-Van Meerveld B. Elevated corticosterone in the amygdala leads to persistant increases in anxiety-like behavior and pain sensitivity. Behavioural Brain Research. 2010;214:465–469.

      (15) Demuyser T, Deneyer L, Bentea E, Albertini G, Van Liefferinge J, Merckx E, et al. In-depth behavioral characterization of the corticosterone mouse model and the critical involvement of housing conditions. Physiol Behav. 2016;156:199–207.

      (16) Shoji H, Maeda Y, Miyakawa T. Chronic corticosterone exposure causes anxiety- and depression-related behaviors with altered gut microbial and brain metabolomic profiles in adult male C57BL/6J mice. Molecular Brain. 2024;17.

      (17) Manvich DF, Webster KA, Foster SL, Farrell MS, Ritchie JC, Porter JH, et al. The DREADD agonist clozapine N-oxide (CNO) is reverse-metabolized to clozapine and produces clozapine-like interoceptive stimulus effects in rats and mice. Sci Rep. 2018;8.

    1. Abstract:

      Vector-borne diseases pose a persistent and increasing challenge to human, animal, and agricultural systems globally. Mathematical modeling frameworks incorporating vector trait responses are powerful tools to assess risk and predict vector-borne disease impacts. Developing these frameworks and the reliability of their predictions hinge on the availability of experimentally derived vector trait data for model parameterization and inference of the biological mechanisms underpinning transmission. Trait experiments have generated data for many known and potential vector species, but the terminology used across studies is inconsistent, and accompanying publications may share data with insufficient detail for reuse or synthesis. The lack of data standardization can lead to information loss and prohibits analytical comprehensiveness. Here, we present MIReVTD, a Minimum Information standard for Reporting Vector Trait Data. Our reporting checklist balances completeness and labor-intensiveness with the goal of making these important experimental data easier to find and reuse, without onerous effort for scientists generating the data. To illustrate the standard, we provide an example reproducing results from an Aedes aegypti mosquito study.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag020), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2:

      I read with interest the manuscript as I wholeheartedly agree there is a strong need for harmonization on reporting quantitative measurements of vector traits, especially for the subsequent development of mathematical models. The paper is well written, and examples are very helpful, particularly the one shown in Figure 1, advocating for the need for the sharing of individual (possibly raw) observations. I have some very minor comments and suggestions. Given the broad readership of the journal, I feel the Introduction would benefit from some definitions of what the authors mean by vector and vector-borne diseases, with some examples (WNV, DENV, … up to you). It's not very clear to me how the authors' current proposal aligns with what was already proposed in Wu et al. 2022 (ref 21). It seems like some sort of extension? Could you please further elaborate on this? Regarding latitude and longitude, I think the coordinate reference system should also be standardized (WGS, no UTM or others). You might provide some examples of online repositories (line 187). Some (like GitHub) might not be perpetually available, unlike (hopefully) others such as Zenodo or the Supplementary Materials accompanying the paper. The latter might be preferable in my opinion. Figure 1. Please provide the equation of the TPC. Please note that Figure 2 currently does not seem to be cited in the main text (perhaps it should be on line 248?). What does "Dataset: 572" mean? As VecTraits currently seems the best (and only?) example of what the authors are proposing, perhaps it should be mentioned in the Abstract as well.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Henning et al. examine the impact of GABAergic feedback inhibition on the motion-sensitive pathway of flies. Based on a previous behavioral screen, the authors determined that C2 and C3, two GABAergic inhibitory feedback neurons in the optic lobes of the fly, are required for the optomotor response. Through a series of calcium imaging and disruption experiments, connectomics analysis, and follow-up behavioral assays, the authors concluded that C2 and C3 play a role in temporally sharpening visual motion responses. While this study employs a comprehensive array of experimental approaches, I have some reservations about the interpretation of the results in their current form. I strongly encourage the authors to provide additional data to solidify their conclusions. This is particularly relevant in determining whether this is a general phenomenon affecting vision or a specific effect on motion vision. Knowing this is also important for any speculation on the mechanisms of the observed temporal deficiencies.

      Strengths:

      This study uses a variety of experiments to provide a functional, anatomical, and behavioral description of the role of GABAergic inhibition in the visual system. This comprehensive data is relevant for anyone interested in understanding the intricacies of visual processing in the fly.

      Weaknesses:

      (1) The most fundamental criticism of this study is that the authors present a skewed view of the motion vision pathway in their results. While this issue is discussed, it is important to demonstrate that there are no temporal deficiencies in the lamina, which could be the case since C2 and C3, as noted in the connectomics analysis, project strongly to laminar interneurons. If the input dynamics are indeed disrupted, then the disruption seen in the motion vision pathway would reflect disruptions in temporal processing in general and suggest that these deficiencies are inherited downstream. A simple experiment could test this. Block C2, C3, and both together using Kir2.1 and Shibire independently, then record the ERG. Alternatively, one could image any other downstream neuron from the lamina that does not receive C2 or C3 input.

      Given the prominent connectivity of C2 and C3 to lamina neurons, we actually expected lamina processing to be affected as well. We silenced C2 while recording from the lamina neuron L2 and found no significant difference in its response profile (Author response image 1).

      Author response image 1.

      Calcium responses of L2 axon terminals to full-field ON and OFF flashes for controls (grey, N=8 flies, 59 cells) or while genetically silencing C2 using shibire^ts (magenta, N=4 flies, 26 cells). Traces show mean ± SEM.

      We could include these data in the main manuscript, but we do not feel comfortable claiming that C2 and C3 have a role specific to motion processing, even if their effect were predominantly on medulla neurons. To our knowledge, how peripheral visual circuitry contributes to other visual behaviors, such as object detection (including the pursuit of mating partners) or escape behaviors, is not well understood. Instead, we added a sentence to the discussion stating that our work does not exclude that, given their wide connectivity, C2 and C3 are also involved in other visual computations.

      (2) Figure 6c. More analysis is required here, since the authors claim to have found a loss of inhibition (ND). However, the difference in excitation appears similar, at least in absolute magnitude (see panel 6c), for the PD for the T4 C2 and C3 blocks. Also, I would predict that the C2 & C3 block is statistically different from the C3-only block. Why is this? In any case, it would be good to discuss the clear trend in the PD by showing the distribution of responses as violin plots to better understand the data. It would also be good to show some raw traces, not only polar plots and averages, so that the differences can be seen more clearly.

      We apologize: the plots in the manuscript showed the mean across all cells, but the statistics were computed more conservatively, across flies. We corrected this mismatch, and the figure now shows the mean ± SEM across flies after first averaging across cells within each fly. Thank you for pointing this out. Since we recorded n = 6-8 flies per genotype, we did not include violin plots, which would indeed make sense if we showed data for each cell.
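      To make the difference between the two averaging schemes concrete, here is a minimal Python sketch of fly-level statistics: responses are first averaged across cells within each fly, and the mean ± SEM is then taken across flies. The function name `fly_level_mean_sem` and its inputs (one scalar response per cell plus a fly label per cell) are hypothetical; the sketch illustrates the general approach and is not the authors' analysis code.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def fly_level_mean_sem(cell_values, fly_ids):
    """Average cells within each fly, then return (mean, SEM, n_flies) across flies.

    cell_values: one response value per recorded cell (hypothetical input)
    fly_ids: fly label for each cell, same length as cell_values
    """
    per_fly = defaultdict(list)
    for value, fly in zip(cell_values, fly_ids):
        per_fly[fly].append(value)
    # Each fly contributes exactly one sample, regardless of how many cells it has.
    fly_means = [mean(values) for values in per_fly.values()]
    n_flies = len(fly_means)
    sem = stdev(fly_means) / sqrt(n_flies) if n_flies > 1 else float("nan")
    return mean(fly_means), sem, n_flies


# Three cells from fly "a" and one cell from fly "b".
grand_mean, sem, n = fly_level_mean_sem([1.0, 1.0, 1.0, 4.0], ["a", "a", "a", "b"])
```

      In this toy case the pooled mean across cells would be 1.75, whereas the fly-level mean is 2.5 with n = 2: each fly, rather than each cell, counts as one independent sample.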

      (3) The behavioral experiments are done with a different disruptor than the physiological ones. One blocks chemical synapses, the other shunts the cells. While one would expect similar results in both, this is not a given. It would be great if the authors could repeat the behavioral experiments with Kir2.1, too.

      We have tried this experiment, but unfortunately, flies were not walking well on the ball, and we were not able to obtain data of sufficient quality.

      Reviewer #2 (Public review):

      Summary:

      The work by Henning et al. explores the role of feedback inhibition in motion vision circuits, providing the first identification of inhibitory inheritance in motion-selective T4 and T5 cells of Drosophila. This work advances our current knowledge of Drosophila motion vision and paves the way for further exploring the intricate details of direction-selective computations.

      Strengths:

      Among the strengths of this work is the verification of the GABAergic nature of C2 and C3 with genetic and immunohistochemical approaches. In addition, double-silencing C2&C3 experiments help to establish a functional role for these cells. The authors holistically use the Drosophila toolbox to identify neural morphologies, synaptic locations, network connectivity, neuronal functions, and the behavioral output.

      Weaknesses:

      The authors claim that C2 and C3 neurons are required for direction selectivity, as per the publication's title; however, even with their double silencing, the directional T4 & T5 responses are not completely abolished. Therefore, the contribution of this inherited feedback to direction-selective computations is not a prerequisite for their emergence, and the title could be adjusted.

      We adjusted the title to “are involved in motion detection.”

      Connectivity is assessed in one out of the two available connectome datasets; therefore, it would make the study stronger if the same connectivity patterns were identified in both datasets.

      We did not expect large differences between the datasets because Nern et al. 2025 described no major sexual dimorphism. To verify this, we plotted C2 and C3 connectivity from the three major EM datasets that include C2/C3 connectivity: the female FAFB dataset (Zheng et al. 2018, Dorkenwald et al. 2024, Schlegel et al. 2024), the male visual system (Nern et al. 2025), and the 7-column dataset (Takemura et al. 2015). We see no major differences (Author response image 2 and Author response image 3).

      Author response image 2.

      Relative pre- and postsynaptic counts for C3 from 3 different datasets. Shown are up to ten postsynaptic or presynaptic partner neurons.

      Author response image 3.

      Relative pre- and postsynaptic counts for C2 from 3 different datasets. Shown are up to ten postsynaptic or presynaptic partner neurons.

      The neural correlates mediating the effect of C2 & C3 on T4 & T5 are not clarified; only Mi1 is identified as one of them. The study could be improved if the silencing experiments performed for C2-Mi1 were extended to C2 & C3-Tm1 or Tm4 to find the T5 neural mediators of this feedback inhibition loop. Stating the potential T5 mediators from the connectomic analysis more clearly would be equally beneficial. Future experiments might also disentangle the parallel or separate functions of C2 and C3 neurons.

      We fully agree that one could go down this route. Given the widespread connectivity of C2 and C3, and the fact that these are time-consuming experiments with often complex genetics, we had decided to instead study the “compound effect” of C2 and C3 silencing by analyzing T4/T5 physiological properties and motion-guided behavior. We now explicitly explain this logic by saying, “To understand the compound effect of C2 and C3 on motion processing, we focused on the direction-selective T4/T5 neurons, which are downstream of many of the neurons that C2 and C3 directly connect to.”

      Finally, the authors' conclusions follow logically from the set of experiments they performed. Nonetheless, the Discussion could have benefited from a more extensive explanation of the following matters: why the ON-selective C2 and C3 neurons control OFF-generated behaviors, why the T4 & T5 responses after C2 & C3 silencing differ between stationary and moving stimuli, and finally why C2 and not C3 had an effect on T5 DS responses, as the connectivity suggests that C3 outputs to two of the four major cholinergic T5 inputs.

      Apart from the behavioral screen results, we only tested ON edges in our more detailed behavioral characterizations. And while we show phenotypes for the OFF-DS cell T5, it is well established that inhibitory cells that respond to one contrast polarity can function in the pathway with the opposite contrast polarity (e.g., the OFF-selective Mi9 in the ON pathway). We realized that our narrative in the results section was misleading in this regard (we had given the ON selectivity of C2/C3 as one argument why we first focused on the ON pathway) and eliminated this argument.

      Regarding the differential involvement of C2/C3 in T4/T5 responses to stationary and moving stimuli (C2 and C3 silencing affects both T4 and T5 DS responses, but mostly T4 flash responses): we mostly took the disinhibition of flash responses in T4 as a motivation to look more specifically at a potential role in motion computation. We now added a sentence about the potential emergence of these flash responses to the already extensive discussion paragraph “How could inhibitory feedback neurons affect motion detection in the ON pathway?”

      Last, we added a discussion point about the relationship between C2 and C3 connectivity and the functional consequences, and discussed the fact that C3 connectivity alone does not correlate with a functional role of C3 (alone) in DS computation.

      Reviewer #3 (Public review):

      Summary:

      This article is about the neural circuitry underlying motion vision in the fruit fly. Specifically, it regards the roles of two identified neurons, called C2 and C3, that form columnar connections between neurons in the lamina and medulla, including neurons that are presynaptic to the elementary motion detectors T4 and T5. The approach takes advantage of specific fly lines in which one can disable the synaptic outputs of either or both of the C2/3 cell types. This is combined with optical recording from various neurons in the circuit, and with behavioral measurements of the turning reaction to moving stimuli.

      The experiments are planned logically. The effects of silencing the C2/C3 neurons are substantial in size. The dominant effect is to make the responses of downstream neurons more sustained, consistent with a circuit role in feedback or feedforward inhibition. Silencing C2/C3 also makes the motion-sensitive neurons T4/T5 less direction-selective. However, the turning response of the fly is affected only in subtle ways. Detection of motion appears unaffected. But the response fails to discriminate between two motion pulses that happen in close succession. One can conclude that C2/C3 are involved in the motion vision circuit, by sharpening responses in time, though they are not essential for its basic function of motion detection.

      Strengths:

      The combination of cutting-edge methods available in fruit fly neuroscience. Well-planned experiments carried out to a high standard. Convincing effects documenting the role of these neurons in neural processing and behavior.

      Weaknesses:

      The report could benefit from a mechanistic argument linking the effects at the level of single neurons, the resulting neural computations in elementary motion detectors, and the altered behavioral response to visual motion.

      We agree that we cannot fully draw this mechanistic argument, but we also do not think that this is a realistic goal of this study. Even if one measured the temporal and spatial properties of “all” neurons connected to C2 and C3, this would likely not reveal the full mechanisms linking single neurons to DS computation; that would require silencing specific connections, or specific molecular components of those connections, possibly complemented by models. A beautiful example where such a mechanistic understanding was achieved, recently published in Nature, essentially focused on a single synaptic connection (between Mi9 and T4) (Groschner et al. 2024), and built on extensive work that had already highlighted the importance of these neurons. We would further argue that the field does not have a good understanding of how T4/T5 responses are translated into behavior. Although possible pathways emerge from connectomes, it is, for example, not understood why the temporal frequency tuning of T4/T5 differs substantially from that of the optomotor response.

      We therefore would like to highlight that the focus of our study was not to connect all those pieces, but rather to highlight the hitherto unknown overall importance of inhibitory feedback neurons for visual computations along the visual hierarchy, from individual neuron properties, via DS computation, to the temporal precision of the optomotor response.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood."

      This is incorrect, not only because it is phrased as a general statement, but also because many studies have examined inhibition in flies. It may not be solely GABAergic inhibition, but that is just one type. While the discussion later addresses feedback from horizontal cells in the retina, etc., there is no mention of work on color vision, which requires feedback. Please rephrase.

      We now say “visual motion processing” in this sentence, and added a sentence on color vision: “... color-opponent signalling requires reciprocal inhibition between photoreceptors as well as feedback inhibition from distal medulla (Dm) neurons (Schnaitmann et al., 2018; Heath et al., 2020; Schnaitmann et al., 2024).”

      (2) Line 197: "Because a previous studies" One or many? More importantly, please cite them.

      We corrected this to “a previous study” and now cite Tuthill et al. 2013.

      (3) Line 172: I noticed a few minor grammatical errors and wording issues, such as the use of "we next" twice in one sentence. "To next identify potential GABAergic neurons that are important for motion computation in the ON pathway, we next intersected 12 InSITE-Gal4." I am bad at picking them out, but since I noticed them, I would strongly suggest looking at the text carefully again.

      We deleted one occurrence of ‘next’, thank you for catching that.

      (4) Question to the authors: why did you use independently flickering bars and not checkerboards for the white noise analysis in Figure 3e?

      We used flickering bars because many visual system neurons tested in our lab respond to them with a better signal-to-noise ratio than to checkerboards. Flickering bars also appear better suited to isolate the spatial surround of neurons. This type of stimulus has been successfully used in previous studies to extract receptive fields of neurons in the fly visual system (Arenz et al., 2017; Leong et al., 2016; Salazar-Gatzimas et al., 2016; Fisher et al., 2015; …).
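      As an illustration of how flickering-bar white noise yields spatiotemporal receptive fields, the following minimal Python sketch estimates a filter by reverse correlation: the mean-subtracted response is cross-correlated with each bar's contrast at a range of time lags. The function `reverse_correlation`, the binary ±1 bar contrasts, and the simulated cell are assumptions for illustration only; the stimulus parameters and kernel analysis used in the manuscript may differ.

```python
import random


def reverse_correlation(stimulus, response, n_lags):
    """Estimate a spatiotemporal filter kernel[lag][bar] by cross-correlating
    the mean-subtracted response with the stimulus at each time lag.

    stimulus: list of frames, each a list of bar contrasts (here +1 or -1)
    response: one response sample per stimulus frame
    """
    n_frames = len(stimulus)
    n_bars = len(stimulus[0])
    r_mean = sum(response) / n_frames
    r = [x - r_mean for x in response]  # remove the baseline response
    kernel = [[0.0] * n_bars for _ in range(n_lags)]
    for lag in range(n_lags):
        n_pairs = n_frames - lag  # frames that have a stimulus `lag` frames earlier
        for bar in range(n_bars):
            acc = sum(r[t] * stimulus[t - lag][bar] for t in range(lag, n_frames))
            kernel[lag][bar] = acc / n_pairs
    return kernel


# Hypothetical simulated cell: it copies the contrast of bar 2 with a one-frame delay.
random.seed(0)
n_frames, n_bars, n_lags = 5000, 6, 4
stim = [[random.choice((-1.0, 1.0)) for _ in range(n_bars)] for _ in range(n_frames)]
resp = [0.0] + [stim[t - 1][2] for t in range(1, n_frames)]

kernel = reverse_correlation(stim, resp, n_lags)
peak = max(
    ((lag, bar) for lag in range(n_lags) for bar in range(n_bars)),
    key=lambda lb: kernel[lb[0]][lb[1]],
)
print("kernel peak (lag, bar):", peak)
```

      With enough frames, the estimated kernel should peak at a lag of one frame on bar 2, recovering the simulated filter. Bars reduce the stimulus to one spatial dimension, so fewer kernel entries have to be estimated from the same recording than with a two-dimensional checkerboard.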

      (5) Line 248: "Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects..." Please state how here. I would need to go to the methods.

      We added a sentence “C2 was silenced by expression of UAS-shibire<sup>ts</sup> (UAS-shi<sup>ts</sup>) for temporal control of the inhibition of synaptic activity.”

      (6) Much of the work in the blowfly uses picrotoxinin to block GABAergic inhibition in the visual motion pathway. It would be useful to mention some of this early work and its results, particularly that of Single et al. (1997). It might be interesting to reinterpret their results.

      Thank you for pointing this out. We added this paragraph to the discussion: ‘Work in blowflies has found a severe impact of GABAergic signaling on DS in LPTCs downstream of T4 and T5 cells, using application of picrotoxin to the whole brain (Single et al. 1997; Schmid and Bülthoff 1988). Although the loss of DS in LPTCs could originate from direct inhibitory synapses onto LPTCs (Mauss et al. 2015; Ammer et al. 2023), the disruption of GABAergic signaling in upstream circuitry, which reduces DS in T4 and T5, may also contribute to the phenotype seen in LPTCs.’

      Reviewer #2 (Recommendations for the authors):

      The following set of corrections aims to improve the scientific and presentation aspects of this work.

      (1) The title of the work implies that C2 and C3 neurons are required for motion processing, whereas the study shows their participation in motion computations, which persist after their silencing. Therefore, "Inhibitory columnar feedback neurons contribute to Drosophila motion processing" would be a more appropriate title.

      We rephrased the title to say that inhibitory feedback neurons “are involved in” motion processing.

      (2) The morphology of C2 and C3 neurons, i.e., ramifications in medulla & cell body in medulla and axonal targeting to lamina, implies their feedback role. It would be important to mention the specific feedback loop they participate in and the role of Mi1 more extensively in lines 36, 120.

      Given their widespread input and output connectivity, we find it hard to speculate on the specific feedback loops that C2 and C3 are involved in. Any such claim would need to be supported by functional measurements of that specific loop, which was not the goal of this study.

      (3) In lines 55-89, the authors explore the instances of feedback inhibition within and across species and modalities. For the Drosophila visual example (lines 76-89), given that it also addresses motion circuits, the following studies should be included:

      Ammer, G., Serbe-Kamp, E., Mauss, A.S., et al. Multilevel visual motion opponency in Drosophila. Nat Neurosci 26, 1894-1905 (2023). https://doi.org/10.1038/s41593-023-01443-z. Mabuchi Y, Cui X, Xie L, Kim H, Jiang T, Yapici N. Visual feedback neurons fine-tune Drosophila male courtship via GABA-mediated inhibition. Curr Biol. 2023 Sep 25;33(18):3896-3910.e7. doi: 10.1016/j.cub.2023.08.034.

      We added a sentence on the Ammer et al. finding to the introduction. Since the introduction paragraph focuses on known physiological effects within the visual system, we did not find a good fit for the Mabuchi et al. study, which focuses on serotonergic feedback neurons with a role far downstream in courtship behavior.

      (4) In lines 102-103, the following work should be referenced: Groschner LN, Malis JG, Zuidinga B, Borst A. A biophysical account of multiplication by a single neuron. Nature. 2022 Mar;603(7899):119-123. doi: 10.1038/s41586-022-04428-3.

      We cited a few of the many papers that used “modeling frameworks” and selected the ones focusing on the entire feedforward circuitry. To also give credit to the Borst lab, we instead added Serbe et al. 2016 here.

      (5) In lines 107-108, the Braun et al. (2023) study did not perform Rdl knockdown experiments in T4 cells; hence, this needs to be clarified in the text.

      We corrected this in the text.

      (6) Even though the dataset was previously published, a summary plot of the different phenotypes would be very helpful to the reader. Moreover, in line 131, as the study focuses on motion vision, it would be better to use "early motion visual processing" rather than "early visual processing.”

      We added a summary plot of the behavioral screen data to Supplementary Figure 1, and rephrased the sentence previously at line 131.

      (7) The first result section title excludes C3 neurons, even though they are addressed in lines 172-179; therefore, including C3 is suggested, as in "GABAergic C2 and C3 neurons control behavioral responses to motion cues". The term "required" should be excluded from the title, as the other neuronal types encountered in the InSITE drivers were never quantified; thus, the "behavioral requirement" might come from these other neurons as well.

      From the experiments shown in this paragraph alone, we cannot make conclusive claims about C3, as it was also weakly visible in one of our genetic controls for the intersectional strategy (we had written: “This strategy also revealed other GABAergic cell types, including the columnar neuron C3 and the large amacrine cell CT1 which were however also weakly present in the gad1-p65AD control.”).

      We changed the title of this paragraph to: A forward genetic behavioral screen identifies GABAergic C2 neurons to be involved in motion detection.

      (8) In line 142, it should be clearly stated that the MultiColor FlpOut technique was used and should also be cited: Nern A, Pfeiffer BD, Rubin GM. Optimized tools for multicolor stochastic labeling reveal diverse stereotyped cell arrangements in the fly visual system. Proc Natl Acad Sci U S A. 2015 Jun 2;112(22):E2967-76. doi: 10.1073/pnas.1506763112.

      We did not use MCFO clones but simple Flp-out clones, and the genotype and reference for these were given in the methods: UAS-FRT-CD2y+-FRT-mCD8::GFP; UAS-Flp (Wong et al. 2002). To make this clearer, we now also cite Wong et al. 2002 in the results section.

      (9) In Figure 1c, a description of RFP should be written as it is already in Supplementary Figure 1c.

      We added this to the Figure caption.

      (10) In line 172, "next" is redundant as it was previously used at the beginning of the sentence.

      Removed

      (11) In line 175, based on both figures that the authors refer to, instead of C2, C3 should be written.

      We do indeed see C3 labeled in the images, but also in a gad1-p65AD control. We thus cannot be sure if C3 indeed reflects the intersection pattern. However, the three lines shown in Figure 1d clearly also label C2, which is not seen in the control condition.

      (12) In line 184, a split-C2 line is used (and a split-C3 line, as in Supplementary Figure 2). It would enhance the credibility of the work, and would even make it appropriate to use the word "requirement", if this split-C2 line were used for behavioral experiments, as in the Gohl et al., 2011, and Silies et al., 2013 studies.

      We are indeed using the same split-C2 line for imaging and for the behavioral experiments in Figure 7. We see Figure 1 (and with that, Silies et al. 2013) as a first-pass screen, from which we obtained candidates that we then tested more thoroughly throughout the remaining manuscript with more specific lines. We no longer use the word “requirement”.

      (13) In lines 186-188, is DenMark used as a postsynaptic marker? If yes, an additional control would be the use of Discs-large (DLG) as a postsynaptic marker, as DenMark would not be restricted to postsynaptic densities.

      Yes, we used DenMark as written in the sentence “we expressed GFP-tagged Synaptotagmin (Syt::GFP) to label pre-synapses together with the dendritic marker DenMark (Nicolai et al., 2010)”. Since our claims about widespread C2 and C3 connectivity are further supported by connectomics, we did not use another postsynaptic marker.

      (14) In line 191, L2 is mentioned as presynaptic, whereas in Figure 2b it is clearly postsynaptic.

      We write “This revealed that C2 forms several presynaptic contacts with the lamina neurons L5, L1, and L2”. L5, L1, and L2 are hence postsynaptic to C2, which is what is plotted in Figure 2b.

      (15) In line 197, the "a" in "because a previous studies" should be removed, and these studies should be cited as the authors do in line 514.

      Done as suggested.

      (16) In line 1191, the figure title uses the term "required", whereas the plotted data suggest that T4 and T5 responses remain DS after C2 & C3 silencing. Rephrasing to "C2 and C3 affect direction-selective..." would be better suited.

      We replaced “required” with “contribute to”.

      (17) In the legend of Figure 2b, the "Counts of synapses" is misleading. The number plotted refers to the percentage of synapse counts from the target neuron.

      Corrected.

      (18) A general question about the C2 and C3 ON selectivity: How would the authors explain the OFF deficits from the published behavioral screening in Supplementary Figure 1a? Do the other InSITE neurons contribute to it? This needs to be further elaborated in the discussion.

      A neuron being ON selective does not imply that it is functionally required in the ON pathway only. In fact, Mi9, a major component of the ON pathway (even if not “required” under many stimulus conditions), is OFF selective.

      Furthermore, both we (Ramos-Traslosheros and Silies, 2021) and others (Salazar-Gatzimas et al. 2019) have shown that both ON and OFF signals are combined in ON and OFF pathways, which is further supported by connectomics data. We clarified the transition from physiology to function in the results section, as already explained above.

      (19) In line 216, the authors image from layer M1, but the reasoning behind this choice is missing. This gap widens when the authors go on to examine layer-specific responses in Supplementary Figure 2. Is this because C2 and C3 receive their inputs in M1, as is implied in line 219?

      As Supplementary Figure 2 shows, we initially imaged from all layers of the medulla, where C2 arborizes. Because the response properties, including kinetics, weren’t different, we had no reason to believe that C2 is highly compartmentalized. We thus subsequently focused on layer M1, where amplitudes were highest. We clarified this in the text.

      (20) In line 229, it should be clear whether the STRFs come from M1 measurements. STRF analysis in M5, M8, and M9/10 verifying the multicolumnar span of C2 and C3 would further strengthen the results. Given the focus of the work on Mi1 and T4/T5, it should be clarified in which medulla layer the Mi1-C2 connections form. Additionally, the reasoning behind showing STRFs from M1 measurements in Figure 3 should be explained, given that Supplementary Figure 2b implies equal responses in M9/10, where Tm1 and Tm4 also receive output from C3.

      We never recorded STRFs in the silenced condition and make no claims about C2 changing spatial properties of Mi1. We added the information that STRFs were recorded in layer M1 to the figure caption. We checked the specific connectivity of C2 and Mi1 and they indeed connect in M1 (Author response image 4), but regardless of this result, there is no evidence for compartmentalization in these columnar neurons.

      Author response image 4.

      Image of a C2 (blue) and Mi1 (yellow) neuron from EM Data (FAFB). Circles depict synapses from C2 to Mi1 in layer M1 of the medulla.

      (21) In Figure 3e, the statistical significance, or lack thereof, is not visible in the bar plot.

      Consistently throughout the manuscript, we now just indicate if a comparison is significant. If nothing is shown, it means that it is not.

      To clarify this, we added a sentence to the statistics section in the methods now saying: We show significant differences in figures using asterisks (p < 0.05 *, p < 0.01 **, p < 0.001 ***). Non-significant differences are not further indicated.
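      The asterisk convention stated in the methods can be written as a small helper. This is a hypothetical Python sketch that assumes strict inequalities, so that p = 0.05 exactly counts as non-significant; `significance_stars` is an illustrative name, not part of the authors' code.

```python
def significance_stars(p):
    """Map a p-value to the asterisk convention used in the figures.

    Returns an empty string for non-significant results, which are not marked.
    Assumes strict thresholds (p < 0.05, p < 0.01, p < 0.001).
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


labels = [significance_stars(p) for p in (0.0005, 0.005, 0.03, 0.2)]
```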

      Please note that, based on another reviewer's comment, we also adapted the analysis of the kernels. As a result, the difference in the timing of the ON peak response is now significant (Figure 3e).

      (22) In line 249, it is mentioned that the strongest C2 connection is Mi1; this does not derive from the data shown in Figure 2b.

      We intended to look at medulla neurons, and Mi1 is the most connected medulla neuron to C2. We clarified that in the text, which now reads: “Because C2 emerged as a prominent candidate from the behavioral screen, we focused on C2 and asked how silencing C2 affects temporal and spatial filter properties of the medulla neurons that provide direct input to T4 neurons. We chose to test Mi1 as it is the medulla neuron most strongly connected to C2.”

      (23) The result section title "C2 & C3 neurons shape response properties of the ON pathway medulla neuron Mi1" does not include C3 results. This would be fundamental to have. As previously mentioned, the neural correlates of this inhibitory feedback loop should be clearly defined, and the current version of this work evades doing so.

      We corrected the title. As discussed elsewhere, it was not the goal of this study to work out the specific contributions of C2 (and C3) to all neurons they connect to, but rather to focus on the compound effect on motion detection.

      (24) In line 276, the following work should be cited: Maisak MS, Haag J, Ammer G, Serbe E, Meier M, Leonhardt A, Schilling T, Bahl A, Rubin GM, Nern A, Dickson BJ, Reiff DF, Hopp E, Borst A. A directional tuning map of Drosophila elementary motion detectors. Nature. 2013 Aug 8;500(7461):212-6. doi: 10.1038/nature12320.

      We added the citation.

      (25) In line 273, the title implies the investigation of the spatial filtering of T4 and T5 cells. This does not take place in the respective result section.

      We changed the title to: “C2 and C3 shape temporal and spatial response properties of T4 and T5 neurons.”

      (26) In line 280, Kir2.1 is used, whereas previously thermogenetic silencing with Shibirets was preferred; could the authors elaborate on this choice in the text, for example, genetic reasons?

      We generally prefer shibire[ts] because of its inducible nature. However, our T4/T5 recordings included more stimuli (motion stimuli) than the Mi1 recordings, and shi[ts]-mediated silencing by pre-heating the flies (as established by Joesch et al. 2010) did not last long enough for these experiments, which is why we used Kir2.1. In a previous set of experiments, we had tried incubating flies while imaging, but this induced brain movements that were too large, and T4/T5 recordings were not stable enough.

      (27) In lines 290-291, T5 ON suppression is found to be affected by C2 silencing, but the bar plot in Figure 5b uses the OFF-step data. It would be best if the ON-step data for T5 cells were also plotted.

      ON-step data for T5 are plotted in Supplementary Fig. 3e.

      (28) In line 288, "when C2 was also blocked", "also" should be included, as you are referring to double silencing.

      Sorry for the confusion; we referred to the wrong figure in that sentence. Here, we wanted to point to the increased response of T4 to the ON-step upon C2 silencing, which was quantified in Supplementary Fig. 3e.

      (29) In line 312, it is important to mention in the discussion why C2 and not C3 had an effect on T5 DS responses. Based on Figure 2b, C2 outputs to Tm1, whereas C3 outputs to Tm1 and Tm4, two of the four major cholinergic T5 inputs. Hence, it would be natural to think that C3, and not C2, would affect T5 responses.

      We addressed this in the discussion.

      (30) In lines 326-328, it is crucial to mention the neural correlates that connect C2 and C3 to T4 and T5. Additionally, the Shinomiya et al. (2019) study shows C3 to T4 connections, which are mentioned in the discussion and should be cited in line 429.

      We do not think that mentioning neural correlates at this point is crucial, as these sentences conclude a paragraph in which we link C2/C3 silencing to T4/T5 responses. We also do not know the neural correlates (except for Mi1), so this would not be accurate.

      We mention the C3-to-T4 connection in both the results and the discussion, and our analysis (Figure 2) stems from the FAFB dataset. We added citations to both the results and the discussion.

      (31) In Figure 6a, compared to Figure 3b, the term compass plots is used instead of polar plots. It would be best to use one consistent term. Additionally, in Figure 6c, it is not mentioned if the responses across genotypes are the outcome of averaging across subtype responses.

      These two plots are not the same; a compass plot is a sub-category of polar plots. Polar plots, as in Figure 3, show the response amplitude of the neurons to the different directions of motion. Instead, compass plots, as in Figure 6, show vectors that depict the tuning direction and the strength of tuning of individual neurons.

      We added the following sentence to clarify the calculation in Figure 6c: ‘To average responses of all neurons, the PD of each neuron was determined by its maximal response to one of 8 directions shown.’
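      The distinction between the two plot types, and the PD alignment described for Figure 6c, can be sketched in Python. The example assumes eight evenly spaced directions in 45° steps and computes each neuron's compass vector as the normalized vector sum of its responses, a common choice that may differ from the exact measure used in the manuscript; all function names are hypothetical.

```python
import math

# Assumed: 8 motion directions, evenly spaced in 45 degree steps.
DIRECTIONS_DEG = [i * 45 for i in range(8)]


def tuning_vector(responses):
    """Return (angle_deg, length) of the normalized vector sum of responses.

    This is what a compass plot would draw for one neuron: the angle gives the
    tuning direction, the length (0 to 1) the strength of tuning.
    """
    x = sum(r * math.cos(math.radians(d)) for r, d in zip(responses, DIRECTIONS_DEG))
    y = sum(r * math.sin(math.radians(d)) for r, d in zip(responses, DIRECTIONS_DEG))
    total = sum(abs(r) for r in responses)
    length = math.hypot(x, y) / total if total else 0.0
    angle = math.degrees(math.atan2(y, x)) % 360
    return angle, length


def align_to_pd(responses):
    """Rotate a tuning curve so that the direction with the maximal response comes first."""
    pd_index = max(range(len(responses)), key=lambda i: responses[i])
    return responses[pd_index:] + responses[:pd_index]


def average_aligned(cells):
    """Average PD-aligned tuning curves across neurons, direction by direction."""
    aligned = [align_to_pd(c) for c in cells]
    return [sum(column) / len(column) for column in zip(*aligned)]
```

      Averaging PD-aligned tuning curves preserves the difference between preferred and null directions across neurons with different PDs; averaging unaligned curves from, for example, neurons tuned to opposite directions would largely cancel it.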

      (32) In line 344, the title could be adjusted to "C2 is controlling the temporal dynamics of ON behavior", under the same reasoning of 'requirements' explained before.

      We think that “is controlling” is a stronger claim than “being required”. For a geneticist, the word “required” simply means that there is a(ny) loss of function phenotype, i.e., a reduction in DS when C2 and C3 are silenced/blocked. Many neurons are sufficient but not required to induce a certain behavior (i.e., they can induce a behavior when ectopically activated, but show no significant loss of function phenotype). We therefore consider it remarkable that C2 and C3 silencing indeed shows a significant reduction in DS.

      However, we do not want to overclaim anything, and the title now reads: “T4 tunes the temporal dynamics of ON behavior”

      (33) In Figure 7c, the plot legend should be "deceleration".

      Corrected

      (34) In line 424, the Braun et al. (2023) experiments were performed in T5 cells as previously mentioned.

      Corrected

      (35) In line 435, the authors mention that both ON-selective C2 and C3 neurons act partially in parallel pathways. In Figure 2b, the upstream circuitry between C2 and C3 is identical. How would they explain the functional-connectivity contradiction?

      In terms of acting in parallel pathways, downstream, not upstream, connectivity of C2 and C3 will matter, which is not identical. C2 for example connects to Mi1, L1, and L4, whereas C3 does not. On the other hand, C3 connects to Mi9 and Tm4, which C2 does not.

      (36) In lines 445-447, the authors address C2 and C3 neurons as columnar, whereas they previously showed in Figure 3 that they are multicolumnar.

      Here, we refer to the nomenclature of Nern et al., who use the term “columnar” whenever something is present in each column. We specifically define this by saying “only 15 cells are truly columnar in the sense that they are present once per column and present in each column”. In the results section, we instead talk about “functionally multicolumnar” and changed a sentence in the discussion to say “The spatial receptive fields of C2 and C3 are consistent with the multicolumnar branching of their projections in the medulla” to avoid any such confusion.

      (37) In line 448, "thus" is repetitive, and the extracted view in line 449 does not contribute to the essence of the study.

      Fixed.

      (38) In line 459, the authors refer to inhibition inheritance; this term should be used frequently in the text in case the neural correlates between C2 & C3 and T4 & T5 are not deciphered.

      We think this point is very clear throughout the manuscript now. As one prominent example, we added a sentence to the first paragraph of the discussion saying “Given the widespread connectivity of C2 and C3 to neurons upstream of T4/T5, this effect [on DS tuning] is likely inherited from upstream neurons of T4/T5.”

      (39) In line 521, the transition between sentences is problematic.

      Corrected

      (40) For Supplementary Figure 1, why were the ON-motion deficits not addressed with the antibody approach used for Supplementary Figure 1a?

      The approach using anti-GABA stainings turned out to be largely redundant with the intersectional strategy. Furthermore, the intersectional strategy provided the full morphology of the cell and, hence, led to easier identification of the cell types involved.

      (41) In line 1169, C2 is mentioned, whereas C3 is annotated in the figure.

      Corrected

      (42) A general comment is that Tm1 inputs could be a good candidate for assessing T5 inputs, as performed for Mi1-T4 in Fig.4. Such experiments would enhance the understanding of inhibitory inheritance to T5 responses.

      We fully agree.

      (43) Do the authors have any indication or experiments done regarding the C2&C3 role in T4&T5 velocity tuning? This would be complementary to the direction of this study.

      This is a good idea, and one that we had tried. However, we did not see a difference between control and C2 silencing in the temporal frequency tuning of T4/T5. As velocity is closely related to temporal frequency tuning, we would not expect to see a difference there either.

      While it would have been nice to be able to draw such a link, we would also note that our behavioral data are somewhat different: we did not look at temporal frequency tuning per se, and overall, it is not well understood how responses in T4/T5 relate to behavior, as they, for example, have different frequency tuning (T4/T5 physiology: Maisak et al., 2013; Arenz et al., 2017; optomotor behaviour: Strother et al., 2017; Clark et al., 2013).

      (44) As a suggestion, Figure 7 would be better positioned as Figure 4, right after the ON-selectivity finding of C2 neurons.

      We preferred to keep the current order.

      Reviewer #3 (Recommendations for the authors):

      Main recommendation:

      It would be useful to propose a neural circuit model that connects the various observations. One can draw here on the many circuit models for motion vision in the prior literature.

      (1) How might the extended response in upstream neurons Mi1 lead to the inappropriate null-direction responses in T4/T5?

      This is a good question, and we can only speculate. Mi1 responses are enhanced upon C2 silencing, and T4 responses to full-field flashes are also enhanced. Likely, these motion-independent responses are also present when the edge travels in the non-preferred direction, whereas in the preferred direction they would likely be masked by the motion response. The phenotype seen in T5 is likely inherited from medulla neurons, e.g., Tm1, to which C2 connects. How the delay of the Mi1 response upon C2 silencing may specifically affect ND responses, we do not know.

      (2) How is the loss of DS in T4/T5 compatible with the continued sensitivity to motion in the turning response? Perhaps the signal from 180-degree oppositely tuned T-cells gets subtracted, so as to remove the baseline activity?

      This is a great question that we cannot answer. Overall, perturbations that affect T4/T5 physiology do not necessarily manifest in equivalent phenotypes when looking at behavioral turning responses. Prominent examples come from silencing core neurons of motion-detection circuits, such as Mi1 and Tm3 (see Figure 4, Strother et al. 2017).

      (3) How do the altered dynamics in upstream neurons relate to the loss of high-frequency discrimination in the behavior? One would want to explain why the normal fly has a pronounced decay in the response even though the motion is still ongoing (Figure 7b left, starting at 0.4 s). That decay is missing in the mutant response.

      That is an excellent question that we unfortunately do not have an answer for. Please note that our visual stimulus is a single edge that sweeps across the eye and might not elicit equally strong responses at each position of the eye, or at each time point during the stimulus presentation.

      In terms of linking the dynamics of upstream neurons to behavior, we already pointed out above that it is not well understood how responses in T4/T5 relate to behavior, as they, for example, have different frequency tuning, with T4/T5 neurons being tuned to lower temporal frequencies than the turning behavior of a fly walking on a ball (T4/T5 physiology: Maisak et al., 2013; Arenz et al., 2017; optomotor behaviour: Strother et al., 2017; Clark et al., 2013).

      Other recommendations:

      (1) Abstract line 37 "At the behavioral level, feedback inhibition temporally sharpens responses to ON stimuli, enhancing the fly's ability to discriminate visual stimuli that occur in quick succession." It may be worth specifying *moving* stimuli.

      Done as suggested

      (2) Line 52: "The functional significance of feedback neurons, particularly inhibitory feedback mechanisms, in early visual processing is not understood." This seems overly negative. Subsequent text mentions a number of such instances that are understood, and one could add more from the retina.

      We agree. We rephrased to say ‘motion vision’ and added more examples of known roles of feedback inhibition.

      (3) Line 69: "inhibitory feedback signals from horizontal cells and amacrine cells to photoreceptors and bipolar cells, respectively, are involved in multiple mechanisms of retinal processing, including global light adaptation, spatial frequency tuning, or the center-surround organization (Diamond 2017)." Maybe add the proven role in temporal sharpening of responses, which is of relevance to the present report.

      We added temporal sharpening to that introduction point.

      (4) Figure 1: The text for this figure talks about behavioral motion detection deficits in various lines. Maybe add an example of the behavioral effects to this figure.

      We added a summary plot of the behavioral screen data to Supplementary figure 1.

      (5) Line 325: "the timing of the ON peak tended to be slower for C3 compared to C2 for both the vertical and the horizontal STRF": It's hard to see evidence for that in the data.

      Based on your next comment, we reanalyzed the kernels of C2 and C3. This resulted in a significant difference in peak timing between C2 and C3.

      (6) When presenting kernels as in Figure 3d and Figure 4b, extend the time axis to positive times until the kernel goes to zero. This "prediction of future stimuli" allows the reader to see the degree of correlation within the stimulus, which affects how one interprets the shape of the kernel. Also, plotting the entire peak gives a better assessment of whether there are any shape differences between conditions. An alternative is to compute the kernel via deconvolution, which gets closer to the actual causal kernel, but that procedure tends to highlight high-frequency noise in the measurement.

      We replotted the kernels in Figure 3d and 4b to show positive times. The kernels of C2 and C3 stayed at a positive level. Going back through the data, we found a severe decrease in GCaMP signal in the first 2 seconds of the recording, so we reanalyzed the kernels excluding this initial period. All kernels now go back to zero. The shape of the kernels did not change, but we now find a significant difference in peak timing between C2 and C3. Thank you for pointing this out.
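      To illustrate the point about plotting kernels at positive times, here is a minimal reverse-correlation sketch; it is not the analysis code used in the manuscript. It assumes a white-noise stimulus sampled at the imaging rate and defines the kernel at lag τ as the average of s(t + τ)·r(t), so negative lags correspond to stimuli preceding the response and positive lags to "future" stimuli. The `discard` parameter (hypothetical) drops an initial stretch of samples, analogous to excluding the first seconds affected by the GCaMP signal decrease.

```python
import numpy as np


def reverse_correlation_kernel(stimulus, response, max_lag, discard=0):
    """Stimulus-response cross-covariance at lags -max_lag..max_lag (samples).

    kernel[tau] = mean_t s(t + tau) * r(t). tau < 0: the stimulus came before
    the response (causal part). tau > 0: "prediction of future stimuli", which
    reveals correlations within the stimulus itself.
    """
    # Drop the initial samples (e.g., a period of signal decay), then center.
    s = np.asarray(stimulus, dtype=float)[discard:]
    r = np.asarray(response, dtype=float)[discard:]
    s = s - s.mean()
    r = r - r.mean()
    n = len(s)
    lags = np.arange(-max_lag, max_lag + 1)
    kernel = np.empty(len(lags))
    for i, tau in enumerate(lags):
        if tau >= 0:
            # Pair s[t + tau] with r[t] for t = 0 .. n - tau - 1.
            kernel[i] = np.dot(s[tau:], r[:n - tau]) / (n - tau)
        else:
            # Pair s[t + tau] with r[t] for t = -tau .. n - 1.
            kernel[i] = np.dot(s[:n + tau], r[-tau:]) / (n + tau)
    return lags, kernel
```

      With a hypothetical sampling rate `fs`, the lags convert to seconds as `lags / fs`, and excluding the first 2 seconds corresponds to `discard=int(2 * fs)`. If the stimulus were temporally correlated, the positive-lag side would not return to zero, which is why showing it helps when interpreting the kernel shape.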

      (7) Line 280 "simultaneously blocked C2 and C3 using Kir2.1": First use of that acronym. Please explain what the method is.

      We now explain “we simultaneously blocked C2 and C3 by overexpression of the inward-rectifying potassium channel Kir2.1”

      (8) Line 350 "temporal dynamics for C2 silencing": suggests "dynamics of silencing"; maybe better "response dynamics during C2 silencing".

      Edited as suggested

      (9) Figure 7: Explain the details of the stimulus containing two subsequent on edges. What happens between one edge and the next? Does the screen switch back to black? Or does the second edge ride on top of the final level of the first edge? This matters for interpreting the response.

      Yes, the screen turns dark between subsequent edge presentations. We added a sentence to the methods to clarify that. 

      (10) Line 402 "novel, critical components of motion computation.": This seems exaggerated. At the behavioral level, motion computation is mostly unaffected, except for some details of time resolution. Whether those matter for the fly's life is unclear.

      We deleted the word ‘critical.’

      (11) Line 413 "GABAergic inhibition required for motion detection is mediated by C2 and C3": Again, this seems exaggerated. Motion *detection* appears to work fine, but the *discrimination* of two closely successive motion stimuli is affected. The rest of the text does properly distinguish "discrimination" from "detection".

      We changed the title to say: ‘GABAergic inhibition in motion detection is mediated by C2 and C3.’

      (12) Line 489 "Whereas the role of C2 and C3 for the OFF pathway may be more generally to suppress neuronal activity,": Unclear to what this refers. The present report emphasizes that there is no effect on OFF activity (Figure 5).

      We did not see an effect on T5 responses to OFF flashes, as shown in Figure 5, but we found a significant reduction of DS when silencing C2, as well as slightly increased overall responses to all directions for C2 and C3 silencing, which was significant for null directions when silencing C2. This is shown in Figure 6.

      Typos:

      (1) Line 521.

      Fixed

      (2) Line 1170: context of the citation unclear.

      Fixed

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This report provides useful evidence that EABR mRNA is at least as effective as standard S mRNA vaccines for the SARS-CoV-2 booster vaccine. Although the methodology and the experimental approaches are solid, the inconsistent statistical significance throughout the study presents limitations in interpreting the results. Also, the absence of results showing possible mechanisms underlying the lack of benefit with EABR in the pre-immune setting makes the findings mostly observational.

      Thank you for your assessment of our study. Respectfully, we do not agree that our study shows a lack of benefit of using the EABR approach. For the monovalent boosters, the S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the regular S mRNA booster, which is consistent with the findings from our prior study in naïve mice. In addition, the bivalent S-EABR booster consistently elicited the highest neutralizing titers against all tested variants, including significantly higher titers against BA.5 and BQ.1.1 than the monovalent S booster. The bivalent S-EABR booster also induced detectable neutralization activity in a larger number of mice than all other boosters.

      Consistent with this analysis, please note that reviewers 1 and 2 commented that “the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant” (reviewer 1) and “the authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting” (reviewer 2).

      We agree with the reviewers’ assessment that the EABR booster-mediated improvements were mostly modest, in particular against the BQ.1.1 and XBB.1 strains. We also acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field.

      Finally, we also wish to point out that we did include experiments that addressed potential mechanistic differences between booster groups. For example, we conducted deep mutational scanning studies to determine polyclonal antibody epitope mapping profiles, showing that bivalent S-EABR boosters induced more balanced targeting of multiple RBD epitopes, which likely contributed to the observed improvements in neutralization. Our work also included cryo-EM studies demonstrating that bivalent S mRNA boosters promote heterotrimer formation, which could potentially drive preferential stimulation of cross-reactive B cells via intra-spike crosslinking. This represents a potential mechanism explaining how bivalent boosters outperformed monovalent boosters in our and many prior studies, which warrants further investigation. In addition, we performed serum depletion assays, showing that the BA.5 neutralizing activity elicited by the bivalent Wu1/BA.5 S and S-EABR mRNA boosters was primarily driven by cross-neutralizing Abs induced by the primary vaccination series.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigated the immunogenicity of a novel bivalent EABR mRNA vaccine for SARS-CoV-2 that expresses enveloped virus-like particles in pre-immune mice as a model for boosting the population that is already pre-immune to SARS-CoV-2. The study builds on promising data showing a monovalent EABR mRNA vaccine induced substantially higher antibody responses than a standard S mRNA vaccine in naïve mice. In pre-immune mice, the EABR booster increased the breadth and magnitude of the antibody response, but the effects were modest and often not statistically significant.

      We thank the reviewer for their accurate summary of our study. Please see our comments to the reviewer’s individual points below, as well as our responses to the editor’s assessment above.

      Strengths:

      Evaluating, in pre-immune mice as a model for its potential in the pre-immune population, a novel SARS-CoV-2 vaccine that was substantially superior in naïve mice.

      Weaknesses:

      (1) Overall, immune responses against Omicron variants were substantially lower than against the ancestral Wu-1 strain that the mice were primed with. The authors speculate this is evidence of immune imprinting, but don't have the appropriate controls (mice immunized 3 times with just the bivalent EABR vaccine) to discern this. Without this control, it's not clear if the lower immune responses to Omicron are due to immune imprinting (or original antigenic sin) or because the Omicron S immunogen is just inherently more poorly immunogenic than the S protein from the ancestral Wu-1 strain.

      The reviewer raises an important point, and we agree that including additional groups receiving three immunizations with the bivalent spike and/or spike-EABR mRNA vaccines would have improved the experimental design. However, we believe that several prior studies have already demonstrated that Omicron S immunogens are not inherently poorly immunogenic compared to the ancestral S; e.g., Scheaffer et al., Nat Med (2022); Ying et al., Cell (2022); Muik et al., Sci Immunol (2022). Based on these prior reports, we conclude that the lower neutralizing titers against Omicron variants in our study are most likely driven by immune imprinting as a result of the initial vaccination series with the ancestral S immunogen.

      (2) The authors reported a statistically significant increase in antibody responses with the bivalent EABR vaccine booster when compared to the monovalent S mRNA vaccine, but consistently failed to show significantly higher responses when compared to the bivalent S mRNA vaccine, suggesting that in pre-immune mice, the EABR vaccine has no apparent advantage over the bivalent S mRNA vaccine which is the current standard. There were, however, some trends indicating the group sizes were insufficiently powered to see a difference. This is mostly glossed over throughout the manuscript. The discussion section needs to better acknowledge these limitations of their studies and the limited benefits of the EABR strategy in pre-immune mice vs the standard bivalent mRNA vaccine.

      We acknowledge that the improvements in titers did not reach statistical significance in many cases, which we believe could have been addressed by adding more animals to our cohorts. Unfortunately, that would have been prohibitively expensive and time-consuming given that we already included 10 mice per group, which is standard practice in the vaccine field. We added a “Limitations of the study” section at the end of the discussion to address all of these points in detail (lines 570-598 in the revised version).

      (3) The discussion would benefit from additional explanation about why they think the EABR S mRNA vaccine was substantially superior in naïve mice vs the standard S mRNA vaccine in their previously published work, but here, there is not much difference in pre-immune mice.

      As we pointed out in our response to the editor’s assessment above, the monovalent S-EABR mRNA booster improved neutralizing antibody titers by 3.4-fold against BA.1 (p = 0.03; Fig. S5) and 4.8-fold against BA.5 (failed to reach statistical significance; Fig. 3B) compared to the conventional monovalent S mRNA booster, which is largely consistent with the findings from our prior study in naïve mice. Although the bivalent S-EABR mRNA booster consistently elicited higher neutralizing titers than the conventional bivalent S mRNA booster, we agree with the reviewer that these improvements were modest and not statistically significant. Overall, neutralizing activity against later Omicron variants, such as BQ.1.1 and XBB.1, was low. We attributed this finding to immune imprinting (see response to point (1) above) and acknowledged that the EABR approach was not able to effectively overcome this effect (see discussion section of the paper, lines 537-558; and “Limitations of the study” section, lines 570-598 in the revised version).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Fan, Cohen, and Dam et al. conducted a follow-up study to their prior work on the ESCRT- and ALIX-binding region (EABR) mRNA vaccine platform that they developed. They tested in mice whether vaccines made in this format will have improved binding/neutralization antibody capacity over conventional antigens when used as a booster. The authors tested this in both monovalent (Wu1 only) or bivalent (Wu1 + BA.5) designs. The authors found that across both monovalent and bivalent designs, the EABR antigens had improved antibody titers than conventional antigens, although they observed dampened titers against Omicron variants, likely due to immune imprinting. Deep mutational scanning experiments suggested that the improvement of the EABR format may be due to a more diversified antibody response. Finally, the authors demonstrate that co-expression of multiple spike proteins within a single cell can result in the formation of heterotrimers, which may have potential further usage as an antigen.

      We thank the reviewer for their support and for the accurate summary and evaluation of our study.

      Strengths:

      (1) The experiments are conducted well and are appropriate to address the questions at hand. Given the significant time that is needed for testing of pre-existing immunity, due to the requirement of pre-vaccinated animals, it is a strength that the authors have conducted a thorough experiment with appropriate groups.

      (2) The improvement in titers associated with EABR antigens bodes well for its potential use as a vaccine platform.

      Weaknesses:

      As noted above, this type of study requires quite a bit of initial time, so the authors cannot be blamed for this, but unfortunately, the vaccine designs that were tested are quite outdated. BA.5 has long been replaced by other variants, and importantly, bivalent vaccines are no longer used. Testing of contemporaneous strains as well as monovalent variant vaccines would be desirable to support the study.

      We thank the reviewer for bringing up this important point. We agree that the variants used for this study are now outdated, and it would have been informative to evaluate conventional and EABR boosters against contemporaneous strains. However, as the reviewer correctly pointed out, this type of study requires a substantial amount of time to conduct and will therefore likely always be outdated by the time the data are analyzed and prepared for publication. To accurately assess immune responses against recent or current strains in mice, multiple boosters would have been needed to mimic the pre-existing immune context in the human population in 2025. Assuming intervals of 6-7 months between boosters (as used in this study to mimic booster intervals in the human population as closely as possible), this type of study would have been challenging to conduct, especially given the limited lifespan of mice. Thus, we performed this proof-of-concept study using outdated variants to assess the potential of EABR-modified boosters. We greatly appreciate the reviewer’s understanding and acknowledge this limitation of our study, which is highlighted in the added “Limitations of the study” section in the revised version of the manuscript (lines 570-598).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym RBD in the title should be spelled out.

      We thank the reviewer for raising this point. We made this change in the revised version of the paper.

      (2) Lines 167-168 describe no differences between the cohorts at day 244. It should also be stated that for all timepoints, there are no significant differences.

      We modified the revised manuscript according to the reviewer’s suggestion (line 170).

      Reviewer #2 (Recommendations for the authors):

      (1) Given the focus on developing broad vaccines for future coronavirus outbreaks, it would be particularly informative to test whether the EABR antigens elicit broadened/heightened responses against other (beta)coronaviruses. If enough serum is left, it would seem straightforward to conduct neutralization assays against non-SARS-CoV-2 coronaviruses.

      We thank the reviewer for this valid suggestion. Unfortunately, the extensive analysis of the serum samples, including spike and RBD ELISAs and neutralization assays against multiple variants, deep mutational scanning, and depletion assays, used up the serum samples for most mice. We agree that it would be interesting to investigate whether bivalent EABR boosters elicit pan-sarbecovirus responses in future studies.

      (2) In the bar plots for antibody titer changes, shown as log10 fold change, it is quite hard to interpret the difference between bars (e.g., what is the fold change difference between each bar in the same time point?). A table of mean ± SD values would be helpful.

      That’s a great suggestion. We added a table (Table S1) presenting all the geometric mean neutralization titers for all timepoints and variants in the revised version of the manuscript.
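      As a brief aid to reading log10 fold-change plots alongside a table of geometric mean titers, the sketch below computes geometric mean titers and the fold change between two groups. It uses made-up titer values for two hypothetical booster groups (not data from this study) and deliberately leaves out the handling of titers below the limit of detection, which varies between laboratories.

```python
import math
from statistics import fmean


def geometric_mean_titer(titers):
    """Geometric mean of positive titers: mean on the log10 scale, back-transformed."""
    return 10 ** fmean(math.log10(t) for t in titers)


def fold_change(group_a, group_b):
    """Ratio of geometric mean titers, group_a relative to group_b."""
    return geometric_mean_titer(group_a) / geometric_mean_titer(group_b)


# Illustrative (made-up) titers for two hypothetical booster groups.
booster_a = [40, 160, 640]
booster_b = [10, 40, 160]
fc = fold_change(booster_a, booster_b)
print(round(fc, 2), round(math.log10(fc), 3))
```

      On the log10 scale, a difference of 1 between two bars corresponds to a 10-fold difference in geometric mean titer, and a difference of about 0.3 corresponds to roughly 2-fold.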

      (3) The development of heterotrimers as potential antigens is very interesting, but it seems out of place in the current manuscript. This should likely be in a separate, standalone manuscript.

      We thank the reviewer for commenting on the heterotrimer part of our manuscript. The presented work was not intended to advance the development of heterotrimers as potential antigens. Instead, our findings demonstrate that bivalent spike mRNA vaccines readily generate heterotrimers, which could promote intra-spike crosslinking and potentially impact antibody epitope targeting profiles as suggested by the deep mutational scanning data for the bivalent S-EABR mRNA booster (Fig. 4; Fig. S7-8). We think this is an important consideration that warrants further investigation with regards to the development of future bivalent or multivalent vaccines.

      (4) As a minor note, the sequences of the variants used or accession numbers should be provided in the Methods, since different groups have used different mutations for variants.

      We added the accession numbers for the vaccine strains used in this study (lines 604-605).

    1. Reviewer #3 (Public review):

      Summary:

      In the paper "Deep mutational scanning reveals pharmacologically relevant insights into TYK2 signaling and disease", the authors perform a comprehensive deep mutational scan of the kinase TYK2, a protein of pharmacological interest due to its central role in multiple immune-related phenotypes. The study assesses two key functional phenotypes: protein abundance and IFN-α-dependent signaling. The signaling assays were conducted across a dose-response range under various inhibitor conditions, allowing for an in-depth characterization of TYK2 activity and regulation. Both the experimental design and data analysis were executed with rigor and transparency, yielding a dataset that appears highly reliable. The authors provide strong evidence and a scientifically grounded interpretation of their results.

      The paper presents the results of a deep mutational scan based on two assays: an IFN-α-stimulated signaling assay and a protein abundance assay. These measurements are further supported by variant classifications from AlphaMissense and ClinVar, providing a framework for functional interpretation. Building on these data, the authors propose four potential pharmacological applications of their screening system at the end of the first results section.

      First, they demonstrate that the combined analysis of abundance and IFN-α signaling identifies potential allosteric sites, focusing on variants with normal protein stability but reduced signaling activity. Through this approach, they detect two previously uncharacterized allosteric regions (Results Section 2).

      Second, they explore how the screen can be used to predict variant-specific drug responses or resistance mechanisms (Results Section 3). This is achieved through assays involving two different inhibitors, which reveal both resistance- and potentiation-associated variants.

      Third, they assess the relative functional consequences of ligand and inhibitor dosing by performing IFN-α and inhibitor dose-response experiments (1, 10, and 100 U/mL IFN-α; IC99 and IC75 inhibitor concentrations; Results Section 3).

      Finally, the authors investigate how specific human variants, such as P1104A and I684S, may inform therapeutic modality selection (Results Section 4). Although these variants exhibit no detectable effect on IFN-α signaling within this experimental system, they substantially impact protein abundance. By integrating data from the UK Biobank, the authors further demonstrate that protective effects against autoimmune disease are associated with altered protein abundance rather than differences in IFN-α signaling, highlighting the distinct mechanistic basis of TYK2's clinical relevance.

      Strengths:

      Overall, we found this paper rigorous, well-written, and easy to follow. As such, we think this is an exceptional example of a deep mutational scanning manuscript, and this dataset will be invaluable to the field. We particularly appreciate that the authors could explore sensitivity to inhibitor concentration across multiple doses of the inhibitor.

      Weaknesses:

      Despite the authors' rigorous experimentation and thoughtful interpretation, the study leaves several important mechanistic questions unresolved, as is common in any study. While the data provide clear functional patterns, the underlying biophysical and biochemical explanations remain insufficiently explored. For instance, in point 1, the identification of two novel allosteric sites is intriguing, yet the paper does not elaborate on the structural basis or mechanistic rationale for their regulatory effects. In point 2, resistance and potentiation variants are described for two distinct inhibitors, but it remains unclear why certain variants respond specifically to one compound and not the other. In point 3, higher inhibitor concentrations appear to diminish allosteric interactions, though the reasons why some sites are affected while others are not are left unexplained. Finally, in point 4, the observation that protein abundance, but not IFN-α signaling, correlates with autoimmune protection is compelling but mechanistically ambiguous. These gaps do not detract from the technical excellence of the work; rather, they highlight opportunities for future studies to clarify the molecular and pharmacological mechanisms underlying TYK2 regulation and to deepen the translational insights drawn from this comprehensive mutational scan. We hope that the authors could provide more direction and mechanistic context in the discussion section to guide readers toward these next steps.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The reviewers’ main comments converged on a potential alternative interpretation: that top-down modulation of the visual cortex may contribute to the NC connectivity we observed. In this revision, we address that point with new analyses in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the pupil-modulated portion from the neural activity, the cross-stimulus stability of the NC was preserved (Fig. S8D).
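      The variance-partitioning logic in point (1) can be illustrated with ordinary least squares. The sketch below, on synthetic data, regresses each neuron's activity on a regressor (pupil diameter or stimulus), reports the fraction of variance explained, and returns the residual activity with the regressor-modulated portion removed. It assumes a purely linear, instantaneous relationship, which is a simplification and not necessarily the model used for Fig. S8; the function name `regress_out` is hypothetical.

```python
import numpy as np


def regress_out(activity, regressors):
    """Linear regression of each neuron's activity on the given regressors.

    activity: (time, neurons); regressors: (time,) or (time, k).
    Returns (r2, residual): per-neuron fraction of variance explained and the
    activity with the fitted (regressor-modulated) portion subtracted.
    """
    Y = np.asarray(activity, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    X = np.asarray(regressors, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(Y)), X])  # intercept + regressors
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    residual = Y - design @ beta
    r2 = 1.0 - residual.var(axis=0) / Y.var(axis=0)
    return r2, residual


# Synthetic example: activity driven strongly by the stimulus, weakly by pupil.
rng = np.random.default_rng(1)
T = 2000
stim = rng.standard_normal(T)
pupil = rng.standard_normal(T)
activity = np.column_stack(
    [2.0 * stim + 0.3 * pupil + 0.5 * rng.standard_normal(T) for _ in range(3)]
)
r2_pupil, residual = regress_out(activity, pupil)
r2_stim, _ = regress_out(activity, stim)
```

      Noise correlations could then be recomputed on `residual` (after also subtracting the mean stimulus-evoked response), e.g. with `np.corrcoef(residual, rowvar=False)`, to check whether their cross-stimulus structure survives removal of the pupil-related component.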

      We note that the contribution of pupil dynamics to neural activity in this study is smaller than that observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, whereas they were spontaneously locomoting in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network were necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the top-down contribution was increased to more than 1/3 of the stimulus contribution could top-down modulation generate cross-stimulus stable NC connectivity (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation did not play a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicate the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.
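      To make this gain-modulation concern concrete, the toy simulation below (a hypothetical sketch, not the authors' analysis) gives three unconnected neurons a single trial-to-trial gain that multiplies every neuron's mean response, adds independent noise, and computes NCs by z-scoring each neuron's responses within each stimulus and pooling across stimuli. The tuning values, gain SD, and function names are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscore(v):
    """Standardize a list to zero mean and unit (population) variance."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / sd for a in v]


def simulate(tuning, n_trials=2000, gain_sd=0.3, noise_sd=1.0, seed=0):
    """Responses under a trial-wise gain shared by all neurons.

    tuning[i][s] is neuron i's mean response to stimulus s. On each trial,
    every neuron's mean is multiplied by the same gain g ~ N(1, gain_sd^2),
    then independent Gaussian noise is added. Returns resp[i][s] = trials.
    """
    rng = random.Random(seed)
    n_neurons, n_stim = len(tuning), len(tuning[0])
    resp = [[[] for _ in range(n_stim)] for _ in range(n_neurons)]
    for s in range(n_stim):
        for _ in range(n_trials):
            g = rng.gauss(1.0, gain_sd)  # shared across neurons on this trial
            for i in range(n_neurons):
                resp[i][s].append(g * tuning[i][s] + rng.gauss(0.0, noise_sd))
    return resp


def noise_correlation(resp, i, j):
    """z-score each neuron's trials within each stimulus, pool, and correlate."""
    xi, xj = [], []
    for s in range(len(resp[i])):
        xi += zscore(resp[i][s])
        xj += zscore(resp[j][s])
    return pearson(xi, xj)


# Neurons A and B prefer stimulus 0; neuron C prefers stimulus 1.
tuning = [[10.0, 0.5], [10.0, 0.5], [0.5, 10.0]]
resp = simulate(tuning)
nc_ab = noise_correlation(resp, 0, 1)  # co-tuned pair
nc_ac = noise_correlation(resp, 0, 2)  # differently tuned pair
print(f"NC(A,B) = {nc_ab:.2f}, NC(A,C) = {nc_ac:.2f}")
```

      Analytically, the per-stimulus NC in this model is the gain variance times the product of the two mean responses, divided by the product of the response standard deviations. With these parameters NC(A,B) should come out near 0.46 and NC(A,C) near 0.14, even though the model contains no connections between neurons.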

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and stated a concern that the noise correlation can be explained by top-down modulation. We have addressed this concern carefully in the revision, please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem to be addressed (or perhaps it is addressed indirectly; I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not respond significantly to the same set of stimuli, if they respond at all. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the visual responses explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavioral studies are out of the scope of the current manuscript, we have included additional analysis and discussion on whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al. harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model, they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscripts present novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NCs as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al., Stringer et al., among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would produce more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not (see Lin et al., Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture, and it should be addressed to support the main claims.

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.
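      For readers unfamiliar with the approach, the modulated Poisson model of Goris et al. (2014) treats spike counts as N ~ Poisson(G·f(s)), where G is a trial-to-trial gain with mean 1 and variance σ_G², which implies Var[N] = μ + σ_G²μ². The sketch below is a method-of-moments illustration of that relationship, not the authors' implementation; the gamma-distributed gain, the rate, and the function names are assumptions.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson count with mean lam (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(rate, gain_var, n_trials, seed=0):
    """Counts N ~ Poisson(G * rate) with G ~ Gamma(mean 1, variance gain_var)."""
    rng = random.Random(seed)
    shape, scale = 1.0 / gain_var, gain_var  # mean = shape*scale = 1, var = shape*scale^2
    return [poisson(rng, rng.gammavariate(shape, scale) * rate) for _ in range(n_trials)]


def estimate_gain_var(counts):
    """Method-of-moments estimate of the gain variance from Var[N] = mu + gain_var * mu^2."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    return (var - mu) / mu ** 2


counts = simulate_counts(rate=5.0, gain_var=0.25, n_trials=20000)
print(f"estimated gain variance: {estimate_gain_var(counts):.3f}")
```

      Because the gain term scales with μ², the same number of trials constrains σ_G² far less tightly when mean counts are low, which is one way to see why sparse layer 2/3 activity makes this model harder to apply.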

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels', the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e., are the results better explained using discrete groups than when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context, as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. To support the authors' claims, it would be important to establish whether membership in the same group, rather than tuning similarity, is the defining feature of pairs with large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints on network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion, we now clearly state the study was carried out in mice in the Abstract and Introduction.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion, we have clarified this in the Fig. 4 legend.

      Each HVA has an SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; de Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs for that V1-HVA pair (summed over all SFTF groups). The trend is that HVAs with a bias towards a particular SFTF group also have more high NC pairs in that SFTF group, which is consistent with the model on the right side. This is not circular, because an HVA could have an SFTF bias and uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.
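      The quantities behind Fig. 4C and 4D, as described here, can be sketched for a single V1-HVA pair as follows. We assume that a "high NC pair in an SFTF group" is a pair in which both neurons belong to that group and whose NC exceeds a threshold, and that the Fig. 4D value is the mean NC over all within-group pairs; both assumptions, the threshold, the group labels, and the function name are hypothetical.

```python
from collections import defaultdict


def group_summary(pairs, nc_threshold):
    """Summarize within-group NC pairs for one V1-HVA area pair.

    pairs: iterable of (group_v1, group_hva, nc) for every V1-HVA neuron pair.
    Returns {group: (fraction_of_high_nc_pairs, mean_nc)} where
      fraction = this group's within-group pairs with nc > threshold, divided by
                 within-group high-NC pairs summed over all groups;
      mean_nc  = average NC over all within-group pairs of this group.
    """
    high = defaultdict(int)
    nc_values = defaultdict(list)
    for g1, g2, nc in pairs:
        if g1 != g2:
            continue  # only same-group pairs are counted
        nc_values[g1].append(nc)
        if nc > nc_threshold:
            high[g1] += 1
    total_high = sum(high.values())
    return {
        g: (high[g] / total_high if total_high else 0.0, sum(v) / len(v))
        for g, v in nc_values.items()
    }


pairs = [("A", "A", 0.5), ("A", "A", 0.4), ("B", "B", 0.6), ("A", "B", 0.7), ("B", "B", 0.1)]
summary = group_summary(pairs, nc_threshold=0.3)
print(summary)
```

      As Reviewer #1 notes, the first element grows with the number of neurons in a group even if high NCs are placed at random; the second element, the average NC, is what separates a "more pairs" account from a "stronger pairs" account.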

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the Introduction and Discussion sections to make the take-home message clearer.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility: "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependence of correlations on similarity of tuning is not an artifact of how the data were acquired are, in my mind, missing, and if that is the case, it is crucial that this be addressed.

      At each step of data analysis, we performed control analysis to assess the fidelity of the conclusion. For example, on the spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experiments, rather than neurons, as units. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.
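      A minimal sketch of the hierarchical bootstrap that Reviewer #2 recommends: on each iteration, animals are resampled with replacement and then neurons are resampled within each selected animal, so that between-animal variability enters the uncertainty estimate. The data and function names are hypothetical. In the toy example, neurons from two animals differ systematically, so the hierarchical standard error is much larger than that of a flat bootstrap that treats all neurons as independent.

```python
import random
import statistics


def hierarchical_bootstrap(data, n_boot=2000, seed=0):
    """Bootstrap the grand mean, resampling animals, then neurons within animals.

    data: dict mapping animal id -> list of per-neuron values.
    Returns a list of n_boot bootstrap means.
    """
    rng = random.Random(seed)
    animals = list(data)
    means = []
    for _ in range(n_boot):
        pooled = []
        for animal in rng.choices(animals, k=len(animals)):
            neurons = data[animal]
            pooled.extend(rng.choices(neurons, k=len(neurons)))
        means.append(sum(pooled) / len(pooled))
    return means


def flat_bootstrap(data, n_boot=2000, seed=0):
    """Bootstrap that pools all neurons and ignores which animal they came from."""
    rng = random.Random(seed)
    pooled = [v for values in data.values() for v in values]
    return [sum(rng.choices(pooled, k=len(pooled))) / len(pooled) for _ in range(n_boot)]


# Hypothetical data: two animals whose neurons differ systematically.
data = {"mouse1": [0.0] * 50, "mouse2": [10.0] * 50}
se_hier = statistics.stdev(hierarchical_bootstrap(data))
se_flat = statistics.stdev(flat_bootstrap(data))
print(f"SE hierarchical = {se_hier:.2f}, SE flat = {se_flat:.2f}")
```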

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand why correlation is the best way to quantify the consistency of a class center with a subset of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class of different GMMs was identified by matching the center of the class.
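      The center-matching and consistency measure described above can be sketched as follows. Each center is treated as a vector (45-dimensional in PC space in the authors' analysis; 3-dimensional in the toy example), each class from one GMM is paired with the class from an independently fitted GMM whose center is most correlated with it, and the matched correlations serve as the consistency score. The greedy matching rule and the function names are assumptions. Unlike mean squared error, Pearson correlation ignores differences in overall offset and scale between two centers, which bears on the reviewer's question about the choice of metric.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def match_centers(centers_a, centers_b):
    """Match each cluster center in A to the most correlated center in B.

    Returns a list of (index_in_b, r) for each center in A. This is greedy:
    two A centers may map to the same B center, which a stricter version
    would prevent with a one-to-one assignment.
    """
    matches = []
    for ca in centers_a:
        scores = [pearson(ca, cb) for cb in centers_b]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Hypothetical centers from two GMMs fitted to different random subsets.
centers_a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
centers_b = [[3.1, 2.0, 0.9], [1.0, 2.1, 3.0]]
matches = match_centers(centers_a, centers_b)
print(matches)
```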

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.
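      A common way to show that NCs differ from shuffled controls is to permute one neuron's trial order independently within each stimulus, which preserves each neuron's tuning while destroying trial-by-trial co-variation. The sketch below, with hypothetical simulated data and function names, builds such a null distribution and computes a one-sided permutation p-value; it is not necessarily the exact control used for Fig. 2A.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscore(v):
    """Standardize a list to zero mean and unit (population) variance."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / sd for a in v]


def noise_corr(resp_i, resp_j):
    """z-score each neuron's trials within each stimulus, pool, and correlate."""
    xi, xj = [], []
    for ti, tj in zip(resp_i, resp_j):
        xi += zscore(ti)
        xj += zscore(tj)
    return pearson(xi, xj)


def shuffle_null(resp_i, resp_j, n_shuffles=200, seed=0):
    """NCs after shuffling neuron j's trial order within each stimulus."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        shuffled = []
        for trials in resp_j:
            t = list(trials)
            rng.shuffle(t)
            shuffled.append(t)
        null.append(noise_corr(resp_i, shuffled))
    return null


# Hypothetical pair sharing trial-to-trial noise across three stimuli.
rng = random.Random(42)
resp_i, resp_j = [], []
for mean in (1.0, 5.0, 3.0):
    shared = [rng.gauss(0.0, 1.0) for _ in range(100)]
    resp_i.append([mean + s + rng.gauss(0.0, 1.0) for s in shared])
    resp_j.append([mean + s + rng.gauss(0.0, 1.0) for s in shared])

observed = noise_corr(resp_i, resp_j)
null = shuffle_null(resp_i, resp_j)
p = (1 + sum(v >= observed for v in null)) / (1 + len(null))
print(f"observed NC = {observed:.2f}, permutation p = {p:.3f}")
```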

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections' which I presume mean correlations is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion, we have clarified this part in figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion, we have improved our method section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
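      For reference, a conventional way to compute signal correlation is sketched below: average each neuron's responses over trials for every stimulus, then take the Pearson correlation of the two resulting tuning vectors across stimuli. Noise correlation, by contrast, is computed from trial-to-trial deviations around those averages. This is a generic sketch with hypothetical data; the definition added to the Methods may differ in detail (for example, in how responses are normalized).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signal_correlation(resp_i, resp_j):
    """Correlation of trial-averaged responses across stimuli.

    resp_i[s] is the list of single-trial responses of neuron i to stimulus s.
    """
    mean_i = [sum(trials) / len(trials) for trials in resp_i]
    mean_j = [sum(trials) / len(trials) for trials in resp_j]
    return pearson(mean_i, mean_j)


# Hypothetical responses: 3 stimuli x 2 trials per neuron.
resp1 = [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0]]     # trial means 1, 2, 3
resp2 = [[4.0, 8.0], [8.0, 8.0], [10.0, 14.0]]   # trial means 6, 8, 12
resp3 = [[3.0, 3.0], [2.0, 2.0], [1.0, 1.0]]     # trial means 3, 2, 1
sc_12 = signal_correlation(resp1, resp2)
sc_13 = signal_correlation(resp1, resp3)
print(f"SC(1,2) = {sc_12:.3f}, SC(1,3) = {sc_13:.3f}")
```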

      The number of visual stimulation trials is not stated in Methods. It is only stated in a figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging can be used to delineate visual cortical areas.

      It is true that areas can be mapped using the GCaMP signals, but that is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well-refined routine in our lab for over a decade and is part of our standard protocol. One advantage is that the data (from intrinsic signals) are of the same nature every time. This enables us to use the same mapping procedure regardless of which reporters the mice express (and of the expression pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that the larger intra-group NCs observed simply reflect a multiplicative gain on co-tuned neurons could be addressed using pupil and/or face recordings: does pupil size or facial motion predict NCs, and if it is factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in the general response.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.
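      As background for readers, the modulated Poisson model of Goris et al. (2014) treats spike counts as Poisson with a rate scaled by a trial-varying gain, so the count variance exceeds the Poisson variance by a term proportional to the squared mean. The sketch below illustrates only that single-neuron variance relation under the model's usual description (a gamma-distributed gain with mean 1); it is not the fitting procedure applied to the present dataset, and the function and parameter names are hypothetical.

```python
import numpy as np

def simulate_modulated_poisson(mean_count, gain_var, n_trials, seed=0):
    """Spike counts N ~ Poisson(G * mean_count) with a trial-varying gain G."""
    rng = np.random.default_rng(seed)
    # Gamma gain with mean 1 and variance gain_var: shape = 1/var, scale = var.
    gains = rng.gamma(shape=1.0 / gain_var, scale=gain_var, size=n_trials)
    return rng.poisson(gains * mean_count)

def predicted_variance(mean_count, gain_var):
    # Law of total variance: E[Var(N|G)] + Var(E[N|G]) = mu + gain_var * mu**2.
    return mean_count + gain_var * mean_count ** 2

counts = simulate_modulated_poisson(mean_count=5.0, gain_var=0.2, n_trials=200_000)
print(counts.mean(), counts.var(), predicted_variance(5.0, 0.2))
```

The empirical variance of the simulated counts should approach the predicted value as the number of trials grows.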

      Similarly further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.
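      The SC-matched comparison suggested by the reviewer could be sketched as follows: pairs are binned by signal correlation, and mean NC is compared between same-class and different-class pairs within each bin. This is an illustration only, assuming one entry per neuron pair and a boolean flag for shared GMM class membership; the bin edges and function name are hypothetical, and this is not an analysis reported in the manuscript. Pairs with SC outside the bin edges are ignored.

```python
import numpy as np

def nc_by_sc_bin(sc, nc, same_group, bin_edges):
    """Mean NC of same-group and different-group pairs within each SC bin.

    sc, nc: 1-D arrays with one entry per neuron pair.
    same_group: boolean array, True if both neurons share a GMM class.
    bin_edges: increasing SC bin edges; bin b covers [edges[b], edges[b+1]).
    Returns (same, different) arrays with NaN for bins lacking such pairs."""
    sc = np.asarray(sc, dtype=float)
    nc = np.asarray(nc, dtype=float)
    same_group = np.asarray(same_group, dtype=bool)
    bins = np.digitize(sc, bin_edges) - 1  # -1 or n_bins means out of range
    n_bins = len(bin_edges) - 1
    same = np.full(n_bins, np.nan)
    diff = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bins == b
        if np.any(in_bin & same_group):
            same[b] = nc[in_bin & same_group].mean()
        if np.any(in_bin & ~same_group):
            diff[b] = nc[in_bin & ~same_group].mean()
    return same, diff
```

If NC reflected only tuning similarity, same-class and different-class pairs with matched SC would show similar mean NC.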

      I also found many places where the manuscript needs clarification and/or more methodological details:

      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised Figure 1E (now Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive and precise comments, which have helped us improve the consistency and clarity of our manuscript. Below, we provide a point-by-point response to each comment. In summary, the main changes introduced in the revised version are as follows:

      (1) We replaced all the statistical analyses with their non-parametric equivalents to ensure compliance with test assumptions and consistency of the results;

      (2) We compared the participants’ reaction times before and during connected practice, revealing a significant reduction in the reaction times of both partners when connected;

      (3) We added, in the supplementary materials, a table reporting the vigor scores of each participant in each experimental condition, facilitating the assessment of individual and dyadic behaviors;

      (4) We have reviewed and refined the terminology throughout the manuscript and reduced the number of abbreviations to improve clarity.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel investigation of the movement vigor of individuals completing a synchronous extension-flexion task. Participants were placed into groups of two (so-called "dyads") and asked to complete shared movements (connected via a virtual loaded spring) to targets placed at varying amplitudes. The authors attempted to quantify what, if any, adjustments in movement vigor individual participants made during the dyadic movements, given the combined or co-dependent nature of the task. This is a novel, timely question of interest within the broader field of human sensorimotor control.

      Participants from each dyad were labeled as "slow" (low vigor) or "fast" (high vigor), and their respective contributions to the combined movement metrics were assessed. The authors presented four candidate models for dyad interactions: (a) independent motor plans (i.e., co-activity hypothesis), (b) individual-led motor plans (i.e., leader-follower hypothesis), (c) generalization to a weighted average motor plan (i.e., weighted adaptation hypothesis), and (d) an uncertainty-based model of dynamic partner-partner interaction (i.e., interactive adaptation hypothesis). The final model allowed for dynamic changes in individual motor plans (and therefore, movement vigor) based on partner-partner interactions and observations. After detailed observations of interaction torque and movement duration (or vigor), the authors concluded that the interactive adaptation model provided the best explanation of human-human interaction during self-paced dyadic movements.

      Strengths:

      The experimental setup (simultaneous wrist extension-flexion movements) has been thoroughly vetted. The task was designed particularly well, with adequate block pseudo-randomization to ensure general validity of the results. The analyses of torque interaction, movement kinematics, and vigor are sound, as are the statistical measures used to assess significance. The authors structured the work via a helpful comparison of several candidate models of human-human interaction dynamics, and how well said models explained variance in the vigor of solo and combined movements. The research question is timely and extends current neuroscientific understanding of sensorimotor control, particularly in social contexts.

      We thank the reviewer for their in-depth analysis and constructive assessment of our manuscript.

      Weaknesses:

      (1) My chief concern about the study as it currently stands is the relatively low number of data points (n=10). The authors recruited 20 participants, but the primary conclusions are based on dyad-specific interactions (i.e., analyses of "fast" vs "slow" participants in each pair). Some of these analyses would benefit greatly, in terms of power, from the addition of more data points.

      We understand and appreciate the reviewer’s concern regarding the effective sample size at the dyad level (n=10). While our primary analyses focus on dyad-specific interactions, we note that the reported effects are consistent across multiple dynamic conditions and are associated with large effect sizes. To provide a conservative assessment, the Cohen’s D values reported correspond to the smallest effect size observed across the relevant statistical tests, thereby limiting the risk of false positives or overinterpretation. In addition, to ensure robustness given the sample size and distribution properties of the data, we have replaced all parametric tests with their non-parametric counterparts, as some analyses violated ANOVA assumptions. Friedman and Kruskal-Wallis tests are now used for paired and unpaired main effects, respectively, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hoc comparisons, respectively. Note that these changes did not alter the conclusions of the study.
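      To make the rank-based main-effect test concrete, the sketch below computes the Friedman statistic for paired data (dyads or participants × conditions) together with Kendall's W, a commonly reported effect size for the Friedman test. This is a minimal pure-Python illustration that omits the tie correction; in practice a library routine such as `scipy.stats.friedmanchisquare`, which applies a tie correction, would normally be used. Function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..k, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def friedman_and_kendall_w(data):
    """data: n rows (subjects or dyads), each with k paired condition values.
    Returns the Friedman chi-square (without tie correction) and Kendall's W."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for c, r in enumerate(average_ranks(row)):
            rank_sums[c] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    w = chi2 / (n * (k - 1))  # W ranges from 0 (no agreement) to 1 (identical rankings)
    return chi2, w

chi2, w = friedman_and_kendall_w([[1.0, 2.0, 3.0]] * 4)
```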

      (a) The distribution of delta-vigor (Fast group vs Slow group) is highly skewed (see Figures 3D, S6D), with over half of the dyads exhibiting delta-vigor less than 0.2 (i.e., less than 20% of unit vigor). Given the relatively low number of dyads, it would be helpful for the authors to provide explicit listings of VigorFast, VigorSlow, and VigorCombined for each of the 10 separate dyads or pairings.

      We agree with this comment. However, we note that the distribution of vigor scores within a population is typically centered around 1, with large deviations observed only for the fastest and slowest participants [1]. As a result, the distribution of ∆-vigor is inherently skewed. Correcting for this skewness would (i) require pairing participants based on their vigor, which is logistically difficult, and (ii) lead to an atypical sampling of dyads, with an over-representation of pairs exhibiting very large vigor differences. The distributions of vigor scores for the fast and slow groups before and after the interaction are reported in Supplementary Fig. S21. In addition, as suggested by the reviewer, we have now included Table S.1 in the supplementary materials, listing the values VigorFast, VigorSlow, and VigorCombined for each of the 10 dyads. This table provides a complete view of the evolution of participants’ vigor throughout the experiment.

      (b) The authors concluded that the interactive adaptation hypothesis provided the best summary of the combined movement dynamics in the study. If this is indeed the case, then the relative degree of difference in vigor between the fast and slow participants in a dyad should matter. How well did the interactive adaptation model explain variance in the dyads with relatively low delta-vigor (e.g., less than 0.2) vs relatively high delta-vigor?

      We initially expected the magnitude of the difference in individual vigor within a dyad to play a significant role. However, our analysis did not reveal any systematic effect of ∆-vigor on either the interaction force or the resulting dyadic vigor, as shown by the LMM analysis. Importantly, the interactive adaptation hypothesis does not per se imply that the magnitude of vigor differences between the two partners should matter, only that their respective roles in selecting the adapted behavior are different. Although the model includes several free parameters, we did not attempt to fit it to individual dyads, even though this would in principle be possible. Instead, we performed a sensitivity analysis to assess how variations in the difference in vigor between the partners influence model predictions. For this purpose, we simulated increasing values of µ and variations in the fast partner’s cost of time. In addition, we demonstrated that uncertainty in the estimated behavior of the slow partner, which is a priori specific to each individual, has a substantial impact on the optimal movement duration of the dyad. Overall, this analysis shows that the model captures the full range of qualitative trends observed in the experimental data. When applied to predict the behavior of the average dyad, the resulting movement time prediction errors remain small, as detailed in the Results section.

      (2) The authors shared the results of one analysis of reaction time, showing that the reaction times of the slow partners and the fast partners did not differ during the initial passive block. Did the authors observe any changes in RT of either the slow or fast partner during the combined (primary task) blocks (KL, KH, etc.)? If the pairs of participants did indeed employ a form of interactive adaptation, then it is certainly plausible that this interaction would manifest in the initial movement planning phase (i.e., RT) in addition to the vigor and smoothness of the movements themselves.

      We thank the reviewer for this interesting question, which prompted us to extend our analysis of reaction times to the connected conditions. This additional analysis revealed a significant main effect of the condition on the reaction time for both the fast and slow groups (in both cases: W₂ > 0.39, p < 0.02). Post-hoc comparisons showed a significant reduction in reaction time between the initial null-field block (NF1) and the KH condition for the slow group (p = 0.03, D = 1.46), and a similar trend for the fast group (p = 0.06, D = 1.03). However, the reaction times remained comparable between the two groups, with no significant difference between them. We have incorporated these observations in the Results section (p.4, l.100–109) and expanded the Discussion (p.11, l.341–348) to address their implications for interactive adaptation in human-human and human-robot physical interactions.

      Reviewer #2 (Public review):

      Summary:

      This study examines how individual movement vigor is integrated into a shared, dyadic vigor when two individuals are physically coupled. Participants performed wrist-reaching movements toward targets at different distances while mechanically linked via a virtual elastic band, and dyads were formed by pairing participants with different baseline vigor profiles. Under interaction conditions, movements converged to coordinated patterns that could not be explained by simple averaging, indicating that each dyad behaved as a single functional unit. Notably, under coupling, movement durations for both partners were shorter than in the solo condition, arguing against the view that each individual simply executed an independent movement plan. Furthermore, dyadic vigor was primarily predicted by the slower partner’s vigor rather than by the faster partner’s, suggesting that neither a leader-follower strategy nor a weighted averaging account fully explains the observed behavior. The authors propose a computational model in which both partners adapt to the emerging interaction dynamics ("interactive adaptation strategy"), providing a coherent explanation of the behavioral observations.

      Strengths:

      The study is carefully designed and addresses an important question about how individual movement vigor is integrated during joint action. The experimental paradigm allows systematic manipulation of interaction strength and partner asymmetry. The behavioral results show clear and robust patterns, particularly the shortening of movement durations under elastic coupling (KL and KH conditions) and the asymmetrical contribution of the slower partner’s vigor to dyadic vigor. The computational model captures the main behavioral patterns well and provides a principled framework for interpreting dyadic vigor not as a simple combination of two independent motor plans, but as an emergent property arising from mutual adaptation. Conceptually, the study is notable in extending the notion of vigor from an individual attribute to a dyad-level construct, opening a new perspective on coordinated movement and motor decision-making.

      We thank the reviewer for their thorough analysis of our manuscript and their constructive feedback.

      Weaknesses:

      (1) A key conceptual issue concerns the apparent asymmetry between partners in the computational framework. While dyadic vigor is empirically better predicted by the slower partner’s vigor, the model formulation appears to emphasize the faster partner’s time-related cost and interaction forces. Although the cost function includes an uncertainty-related component associated with the slower partner, it remains unclear from the current formulation and description how dyadic vigor is formally derived from the slower partner’s control policy within the same modeling framework. This raises an important question regarding whether the model offers a symmetric account of dyadic vigor formation for both partners or whether it is effectively anchored to the faster partner’s control architecture.

      We have modified our phrasing to clarify the principles according to which the computational framework was designed (p.7, l.226–231 and p.9, l.260–264). As stated in the Results section, the model is indeed asymmetric by design, which corresponds to the different roles of the fast and slow partners exhibited in the data. In that context, the uncertainty term associated with the slow partner should be understood as an overarching constraint that conditions the strategy of the dyad, while the fast partner’s cost of time acts as a contributor to the expected dyad strategy. Conceptually, and numerically as reported in the sensitivity analysis, this asymmetry corresponds to the role of the slow partners in setting the vigor ranking among the dyads and the role of the fast partners in setting the average dyadic behavior.

      (2) A second conceptual issue concerns the interpretation of the term "motor plan." It remains unclear whether this term refers primarily to movement-related characteristics such as speed or duration, or more broadly to the underlying optimization structure that governs these variables. This distinction is theoretically important, as it determines whether the reported interaction effects should be understood as adjustments in movement characteristics or as changes in the structure of the control policy itself.

      We agree with the reviewer that this terminology required clarification. In this paper, the term “motor plan” refers to the time series of control inputs planned by the CNS, rather than solely to kinematic descriptors such as speed or duration. These planned control signals are a direct consequence of the underlying optimization structure and cost functions that govern trajectory generation. We have clarified this definition in the Introduction (p.1, l.23–24).

      Reviewer #3 (Public review):

      Strengths:

      This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.

      Thank you for this analysis and the insightful feedback.

      Major Comments:

      (1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figure 2.B and 3.E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners’ individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?

      We thank the reviewer for this thoughtful comment. We deliberately used different statistical approaches throughout the paper in order to address different types of questions. Note that the statistical tests were converted to their nonparametric equivalent for consistency (see answer to Reviewer 1).

      - Friedman tests were used in a limited number of cases to assess population- or group-level effects, such as differences in movement time, smoothness, or accuracy across the solo, connected, and after-effects conditions. Such tests provide a straightforward framework for these descriptive, condition-level comparisons.

      - The stability of individual and dyadic vigor scores across conditions was assessed using Pearson correlations across all condition pairs, which we consider the most direct and interpretable approach for evaluating consistency across sessions.

      - LMMs were employed to examine how dyadic vigor relates to the partners’ individual vigor measured in the solo conditions, which revealed the critical contribution of the slow partner.

      Rather than applying a single statistical framework throughout, we selected the method best suited to each question. While LMMs are well suited for modeling participant-specific variability when linking individual and dyadic measures, their systematic use in all analyses would be less intuitive and would not directly address several of the population-level comparisons central to this study.
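      The stability analysis mentioned in the list above (Pearson correlations of vigor scores across all condition pairs) can be sketched as follows. This is a minimal illustration, assuming one vigor score per participant in each condition, listed in the same participant order; the condition labels reuse those of the manuscript, but the values in the example and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_condition_correlations(scores):
    """scores: dict mapping condition name -> list of vigor scores, one per
    participant (same participant order in every list).
    Returns {(cond_a, cond_b): r} for every pair of conditions."""
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for a, b in combinations(sorted(scores), 2)
    }

# Hypothetical scores for three participants.
r = pairwise_condition_correlations(
    {"NF1": [0.8, 1.0, 1.2], "VL": [0.9, 1.0, 1.1], "VH": [0.7, 1.0, 1.3]}
)
```

High positive r across every pair would indicate that participants keep their relative vigor ranking across conditions.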

      (2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by interindividual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.

      We thank the reviewer for this interesting question. We first note that the notion of reward in motor control is quite broad. Although our task did not include explicit external (e.g. monetary) rewards, we assumed that participants attribute an implicit value to completing the task in accordance with the experimenter’s instructions. This assumption has been shown to be appropriate for characterising baseline behavior in previous studies [2–5].

      As discussed in the Introduction, vigor is generally understood to emerge from a tradeoff between effort, accuracy, and time. The reviewer is correct in noting that inter-individual differences in vigor may reflect differences in reward sensitivity or in its discounting [3,6], given that time and reward are intrinsically coupled. Differences in vigor may also arise from inter-individual variability in sensitivity to effort or perceived task difficulty. Because these factors are intertwined (for example, increasing accuracy through co-contraction typically incurs greater effort [7]), it is challenging to disentangle their respective contributions based solely on behavioral data.

      In the present study, our inverse optimal control procedure to identify the cost of time (and thus predict individuals’ vigor) relies on a predefined effort-accuracy tradeoff under fixed final time across multiple movement amplitudes [8]. As a result, the model does not allow us to independently estimate individual sensitivities to effort, accuracy, and time. Such characterization of computational "phenotypes" would likely require experimental paradigms in which each of these factors is systematically manipulated while the others are held constant, which is beyond the scope of the current dataset. In practice, the main value of behavioral modeling lies in revealing the relative weighting of these criteria by the CNS during motor planning [5]. We have expanded the Discussion to clarify these limitations and considerations (see Discussion p.12, l.396–401 & l.407–412).
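      To illustrate in the simplest terms how a cost of time sets movement duration (and hence vigor), consider a toy model that is not the inverse optimal control procedure used in the manuscript. We assume a unit point mass moving a distance A from rest to rest, an effort equal to the minimal integral of squared force (12A²/T³ for this system), and a linear cost of time c·T; both choices are assumptions made only for illustration, and the function names are hypothetical. Minimizing 12A²/T³ + c·T gives T* = (36A²/c)^(1/4).

```python
def effort(amplitude, duration):
    # Minimal integral of u**2 for x'' = u (unit mass) covering `amplitude`
    # in time `duration`, starting and ending at rest.
    return 12.0 * amplitude ** 2 / duration ** 3

def optimal_duration(amplitude, time_cost_rate):
    # Closed form: d/dT [12 A^2 / T^3 + c T] = -36 A^2 / T^4 + c = 0.
    return (36.0 * amplitude ** 2 / time_cost_rate) ** 0.25

def optimal_duration_grid(amplitude, time_cost_rate, t_min=0.05, t_max=5.0, n=100_000):
    # Brute-force check of the closed form on a uniform grid of durations.
    best_t, best_cost = None, float("inf")
    for i in range(n + 1):
        t = t_min + (t_max - t_min) * i / n
        cost = effort(amplitude, t) + time_cost_rate * t
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

t_low_cost = optimal_duration(1.0, 36.0)
t_high_cost = optimal_duration(1.0, 72.0)
```

In this toy setting, a larger cost of time c yields shorter movements (higher vigor), and duration grows with the square root of amplitude. Effort, accuracy, and time sensitivities cannot be separated from duration alone here either: doubling the effort weight has the same effect on T* as halving c.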

      Finally, we chose not to emphasize these broader issues in the present manuscript because (i) they are peripheral to our primary research question on how individual vigor influences human-human interaction, and (ii) although we do not yet have definitive and consensual answers, they have been addressed in multiple studies reviewed elsewhere [9,10].

      (3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.

      We also find that the preservation of accuracy and smoothness despite changes in vigor is an interesting result, and we therefore chose to report these measures in the Supplementary Materials. However, we believe it is preferable not to include them in the main figures for the following reasons:

      - We avoid framing our results in terms of a speed-accuracy trade-off, as Fitts’ work was initially designed to study fast movements [11], whereas our work focuses on self-paced movements. As outlined in the Introduction, vigor is more appropriately interpreted as reflecting a tradeoff between effort (related to movement speed), accuracy, and time. From this perspective, the reported changes in vigor already capture a shift in the underlying trade-off selected by the CNS, using a framework better suited to our experimental paradigm.

      - The manuscript is technically dense and reports multiple analyses that are essential to establish (i) the existence and definition of dyadic vigor, and (ii) how it emerges from interaction between partners. Although the observed preservation of accuracy and improvements in smoothness are informative, they are not central to these two primary questions and would risk diverting attention from the core contributions of the paper. In addition, accuracy is not a feature predicted by our deterministic modeling, and extensions would be needed to capture this aspect. Here we only attempted to replicate average behaviors.

      (4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.

      We thank the reviewer for this comment, which prompted us to verify the assumptions underlying our ANOVAs. We found that a few distributions in the original analysis, as well as in some of the new tests, did not meet these assumptions. To ensure consistency, all statistical analyses have now been replaced with non-parametric tests: Friedman and Kruskal-Wallis tests for paired and unpaired main effects, and Wilcoxon and Mann-Whitney tests for paired and unpaired post-hocs. The updated results do not change any of the conclusions. The only minor change concerns accuracy, which appeared slightly improved in a restricted number of connected conditions and now appears largely unaffected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) Lines 146-147. The authors state, "Whereas the fast partners maintained a similar duration". Figures S6H,I suggest that fast partners made slower movements during the paired task relative to the solo task, not movements with a similar duration.

      We agree that Figs. S.6H,I suggest slightly slower movements for the fast partners, although the difference is not significant. We have modified the sentence to be less assertive than in the previous version (see p.6, l.155).

      (2) In the Discussion (Lines 318-319), the authors state that their findings confirm and extend the "benefits of dyadic control in collaborative actions". What benefits are they referring to here, relative to individual control? It would be helpful if the authors would elaborate on this claim.

      We have modified this sentence to clarify that the benefits of dyadic control refer to previously reported advantages over individual control, namely reduced movement time (Reed and Peshkin, 2008 [12]) and improved tracking accuracy [13,14] (see p.11, l.336–337).

      (3) On Lines 87-89, the authors reference a decomposition of variance of vigor scores across the NF1, VL, and VH conditions; however, I did not see an explanation of how this decomposition was performed. The method used to estimate variance explained by inter-individual vs intra-individual differences in vigor should be outlined for the reader.

      Thank you for pointing out this missing information. We now explain in the statistical analysis section (see p.14, l.504–507), that the percentage of inter-individual variability in vigor is estimated using sum-square values as an estimation of inter- and intra-individual variability.
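      A sketch of one common way to compute such a percentage is given below, assuming a participants × conditions layout (e.g., NF1, VL, VH) and a one-way sum-of-squares decomposition in which the between-participant sum of squares is expressed as a share of the total. The manuscript's exact procedure is described in its statistical analysis section and may differ in detail; the function name is hypothetical.

```python
def inter_individual_percentage(vigor):
    """vigor: list of rows, one per participant, each holding that participant's
    vigor score in every condition.
    Returns 100 * SS_between / SS_total (the inter-individual share)."""
    all_values = [v for row in vigor for v in row]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-participant SS: squared deviation of each participant's mean from
    # the grand mean, weighted by that participant's number of conditions.
    ss_between = sum(
        len(row) * (sum(row) / len(row) - grand_mean) ** 2 for row in vigor
    )
    # The remainder (ss_total - ss_between) is the intra-individual SS.
    return 100.0 * ss_between / ss_total

share = inter_individual_percentage([[1.0, 3.0], [3.0, 5.0]])
```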

      (4) How was the absolute interaction torque for a paired movement calculated? Was it an integral of the temporal profile of torque for some portion of the combined movement? The method for calculating the absolute interaction torque needs to be specified.

      We have now clarified in the Methods (see p.14, l.490–491) that the reported average interaction effort was computed as the absolute value of the interaction torque as a function of time averaged over the entire movement.
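      As a minimal illustration of the measure described above (the absolute interaction torque averaged over the movement), assuming torque samples for a single movement and, optionally, their timestamps; the function and argument names are hypothetical.

```python
def mean_absolute_torque(torque, time=None):
    """torque: samples of interaction torque over one movement.
    time: optional sample timestamps. With uniform sampling, the plain mean of
    |tau| is the time average; otherwise a trapezoidal time average is used."""
    if time is None:
        return sum(abs(t) for t in torque) / len(torque)
    area = 0.0
    for k in range(1, len(torque)):
        dt = time[k] - time[k - 1]
        # Trapezoid on |tau|; slightly overestimates across zero crossings,
        # which is negligible at typical sampling rates.
        area += 0.5 * (abs(torque[k]) + abs(torque[k - 1])) * dt
    return area / (time[-1] - time[0])

avg = mean_absolute_torque([1.0, -1.0, 1.0, -1.0], [0.0, 1.0, 2.0, 3.0])
```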

      (5) Lines 123-124: "... interaction torque showed no significant correlation with differences in individual vigor within dyads." This statement should be supported by appropriate statistical measures.

      This result is now supported by reporting the corresponding Pearson correlation analyses. No significant correlations were found between interaction torque and differences in individual vigor within dyads (KL conditions: |r| < 0.43, p > 0.22; KH conditions: |r| < 0.18, p > 0.61; see p.5, l.132–133).

      (6) For the analysis, presented in Figure 3C, and specified on lines 116-123, the text mentions the main effects of both condition and target. There doesn’t appear to be much of an effect of the target for the KH data. Should these results not be reported as an interaction effect between the two factors instead?

      We agree with the reviewer and have corrected our presentation of these results (see p.4, l.126–128). Consistent with the reviewer’s observation, no significant effect of the target is found in the KH condition.

      (7) Figures 3E and S6B. What is the purpose of including the averaged data for each pair in addition to both individuals’ data from each pair? It would be useful to distinguish the individual data from the average data for each pair. Frankly, the number of data points shown on this sub-figure is excessive.

      There may have been a misunderstanding. Because the partners of a dyad are connected by a virtual elastic band (rather than a rigid bar), they do not execute identical movements. Therefore, Figs. 3E and S6B display the movement time of all individual participants, together with the corresponding 20 individual regression lines, as in Fig. 2B. The solid black line represents the average across all individuals; the averaged behaviors of dyads are not shown. We have clarified this point by revising the caption of Fig. 3E (see p.5).

      Noted mis-spellings:

      Figure S.3A caption: "trials towards this target."

      Page 10 Line 313: "Importantly, these findings show ...".

      These mis-spellings have been corrected at supplementary p.2 and main text p.11, l.331. Thank you!

      Reviewer #2 (Recommendations for the authors):

      (1) To illustrate the contribution of the three components used to calibrate the overall cost function, it would be informative to include simulation analyses in which each component is selectively removed (i.e., ablation analyses).

      We did not perform ablation analyses, as selectively removing components of the model can lead to instability or ill-suited control inputs, making the resulting simulations difficult to interpret. Instead, we conducted a sensitivity analysis of the key parameters shaping the overall cost function, including the estimated mean and deviation of the slow partner’s movement duration, the weight associated with uncertain torque minimization (Figs. S.18 and S.19), and the fast partner’s cost of time (Fig. S.20). This analysis shows that the estimated movement patterns of the slow partner play the predominant role in determining the model predictions, in agreement with our experimental observations.

      (2) Although the authors refer to the motor-off condition as "passive," participants actively generated the movements in the absence of external forces. Thus, this condition corresponds to active, unassisted movement. A different term may therefore reduce potential confusion for readers.

      We agree that the term “passive” was poorly chosen given the context of the paper; we have therefore replaced it with “null-field” condition. Consequently, the P1 and P2 blocks are now referred to as NF1 and NF2.

      (3) Please clarify the instructions given to participants. Were they informed in advance that their movements would physically interact with those of their partner?

      Thank you for pointing out this missing clarification. We have now specified in the Methods (p.14, l.465–469) that participants were not informed prior to any condition that they would interact with a human partner; they were only told that the robot would provide assistance. When debriefed at the end of the experiment, only one out of the 20 participants reported having realized that they were connected to another human. Most participants believed they were interacting either with a version of themselves or with a robot with some randomness.

      (4) Line 475. Should "Fig. 2D" be "Fig. 2B"?

      Thank you for catching this error. The reference has been corrected to Fig. 2B (see p.15, l.522).

      Reviewer #3 (Recommendations for the authors):

      (1) The analysis of reaction times shows no difference between groups in the passive block, which challenges the assumption that movement vigor covaries with decision speed or action initiation speed. It may be worth discussing this in the context of recent literature.

      We agree that the initial analysis and discussion of reaction times were too superficial. In the revised manuscript, we now report that dyadic interaction leads to significantly shorter reaction times (p.4, l.100–109), together with increased movement velocity. We have also expanded the Discussion of the relationship between decision and action speeds/durations (p.11, l.340–348).

      (2) Many abbreviations are unusual for a non-expert. I would recommend using the full terms instead. At least initially, I found it difficult to follow the results because the abbreviations were not immediately clear (at least to me).

      We agree that the paper had too many abbreviations. Therefore, we have removed the abbreviated names of the models and, where possible without impairing readability, used the full names of the conditions.

      (3) Relatedly, the notation in Figure 1 may be confusing. The labels "S" and "F" (slow and fast) correspond to different concepts than "F" and "L" (follower and leader), so the same participant could be labeled "F" as fast but not "F" as a leader.

      Thank you for pointing out this potential source of confusion. We have modified Fig. 1A (p.2) to use the full model names rather than abbreviations. In the remainder of the manuscript, "S" and "F" exclusively denote the slower and faster partners within a dyad, and we do not use abbreviations for "leader" or "follower" in the text.

      (4) In figures like 2.C and 3.I, keeping the same scales on the x and y axes and adding a diagonal reference line would make it easier to see shifts across conditions.

      As explained in the Methods, vigor scores in the low- and high-viscosity conditions were computed using the average movement durations from the NF1 condition as a reference. Consequently, because movements are slower in these conditions, the corresponding vigor values are lower than those in NF1. For this reason, using identical scales on the x- and y-axes and adding a 45° reference line could mislead readers into thinking that the vigor scores are expected to be identical, and would reduce the readability of the figure.
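The sketch below illustrates why the two axes would not share a natural scale, under an assumed definition of the vigor score that is not necessarily the one in the Methods: for each participant, the target-averaged ratio of the group-mean NF1 duration to that participant's duration. The function name and toy durations are hypothetical.

```python
import numpy as np

def vigor_scores(durations, reference_durations):
    """Per-participant vigor relative to a reference condition (assumed definition).

    durations: array (n_participants, n_targets) of mean movement durations (s)
               in the condition being scored
    reference_durations: array (n_participants, n_targets) of mean durations in NF1
    Score = mean over targets of (group-mean NF1 duration / participant duration);
    values above 1 mean faster than the NF1 group average.
    """
    durations = np.asarray(durations, dtype=float)
    group_ref = np.asarray(reference_durations, dtype=float).mean(axis=0)  # per target
    return (group_ref / durations).mean(axis=1)

nf1 = np.array([[0.8, 1.0], [1.2, 1.0]])  # toy NF1 durations, two participants
viscous = 1.5 * nf1                       # every movement 50% slower (toy assumption)
print(vigor_scores(nf1, nf1))      # scores scattered around 1
print(vigor_scores(viscous, nf1))  # same ranking, but every score lower
```

Because the viscous-condition scores are computed against the faster NF1 reference, they sit systematically below the NF1 scores even when the ranking of participants is unchanged.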

      (5) Multiple hypotheses about dyadic regulation of vigor are nicely explained; it could help to indicate if any of these were a priori favored based on prior literature.

      Previous literature provides mixed evidence regarding how vigor might be regulated in dyadic interaction. For instance, Takagi et al. (2016) [15] reported that mechanically connected partners may rely on independent motor plans, which corresponds to the co-activity hypothesis considered here. However, in that study, movement duration was prescribed. We therefore expected that removing this constraint on movement duration could allow coordination strategies to emerge, particularly in view of findings on haptic communication during tracking of random targets while connected via an elastic band [13,14].

      At the same time, a large body of work on human–human and human–robot interaction has interpreted coordination through a leader–follower framework. In our context, vigor is understood as the outcome of a tradeoff between effort and elapsed time, with time being associated with a decaying reward. Based on this framework, we hypothesized a priori that a leader–follower scheme would emerge, in which the fast partner—being more sensitive to time costs and/or less sensitive to effort—would tend to drive the interaction, even at the expense of increased effort. For these reasons, the leader–follower hypothesis was formulated as the expected outcome throughout the manuscript.
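To make the effort-time tradeoff concrete, the toy sketch below uses simplifying assumptions and is not the optimal-control model of the manuscript. It takes a cost J(T) = a/T³ + c·T for a movement of duration T, where the effort term a/T³ is assumed to mirror the integrated squared acceleration of a minimum-jerk movement of fixed amplitude and c is a linear cost of time. Setting dJ/dT = 0 gives T* = (3a/c)^(1/4). The names effort_weight and time_cost are hypothetical.

```python
import numpy as np

def optimal_duration(effort_weight, time_cost):
    """Closed-form minimizer of J(T) = effort_weight / T**3 + time_cost * T (toy model)."""
    return (3.0 * effort_weight / time_cost) ** 0.25

def cost(T, effort_weight, time_cost):
    """Toy effort-plus-time cost of a movement lasting T seconds."""
    return effort_weight / T**3 + time_cost * T

# Numerical check of the closed form on a fine grid of candidate durations
T_grid = np.linspace(0.2, 3.0, 28001)
T_numeric = T_grid[np.argmin(cost(T_grid, 1.0, 2.0))]
print(T_numeric, optimal_duration(1.0, 2.0))

# A partner with a larger time cost (or a smaller effort weight) prefers faster movements
print(optimal_duration(1.0, 4.0) < optimal_duration(1.0, 2.0))
```

In this toy setting, the "fast" partner of a dyad corresponds to a larger time_cost or a smaller effort_weight, which is the asymmetry the leader–follower hypothesis builds on.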

      (6) In the introduction, statements such as "relative vigor of an individual is remarkably stable" appear true only in the solo condition. The same is true in the discussion, where it is said that vigor is a stable trait. The whole study shows that an individual can shift his/her vigor toward that of another individual, so in such conditions vigor does not appear stable to me, but adaptable.

      Let us first clarify that when we describe vigor as “remarkably stable”, we do not imply that individuals do not adjust their movement timing in response to changes in external dynamics. For example, movement durations increase in visco-resistive conditions even during solo performance; nevertheless, individuals who move faster in the absence of resistance will remain faster relative to others when resistance is introduced. In this sense, stability refers to the preservation of relative rankings across conditions, rather than invariance of absolute movement timing. Because interaction with another individual constitutes a substantial change in task dynamics, an effect on individual pace is therefore expected.
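Stability in this ranking sense can be quantified with a rank correlation. The sketch below uses toy durations for five hypothetical participants (not data from the study): everyone slows down under resistance, yet the ordering is unchanged, so Spearman's rank correlation between the two conditions is 1. For simplicity it assumes no tied values.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation, assuming no tied values (ranks via double argsort)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rank_x, rank_y)[0, 1]

# Toy movement durations (s): resistance lengthens every movement...
solo_null_field = np.array([0.62, 0.71, 0.80, 0.95, 1.10])
solo_viscous = np.array([0.90, 1.02, 1.15, 1.30, 1.52])
# ...but the fastest participant stays fastest, the slowest stays slowest, and so on
print(spearman_rho(solo_null_field, solo_viscous))
```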

      That said, and as pointed out by the reviewer, our results show that (i) dyadic interactions lead to the emergence of a dyadic vigor characterized by average movement durations close to those of the fast partners, while the ranking across dyads is largely imposed by the slow partners; and (ii) these adaptations persist after the interaction phase. Importantly, the observed vigor adaptations appear to last longer in our physical interaction task than in previous attempts to manipulate vigor using visual feedback [16]. To account for this adaptability of vigor, we have (i) clarified claims in the Introduction regarding the stability of vigor (see p.1, l.18–20), and (ii) expanded the Discussion to more explicitly address vigor adaptability and its possible consequences for the concept of vigor (see p.12, l.407–412).

      References

      (1) O. Labaune, T. Deroche, C. Teulier, and B. Berret, “Vigor of reaching, walking, and gazing movements: on the consistency of interindividual differences,” Journal of Neurophysiology, vol. 123, pp. 234–242, Jan. 2020.

      (2) L. Rigoux and E. Guigon, “A model of reward- and effort-based optimal decision making and motor control,” PLoS Computational Biology, vol. 8, pp. 1–13, Jan. 2012.

      (3) R. Shadmehr, J. J. O. de Xivry, M. Xu-Wilson, and T.-Y. Shih, “Temporal discounting of reward and the cost of time in motor control,” Journal of Neuroscience, vol. 30, pp. 10507–10516, Aug. 2010.

      (4) B. Berret and G. Baud-Bovy, “Evidence for a cost of time in the invigoration of isometric reaching movements,” Journal of Neurophysiology, vol. 127, pp. 689–701, Feb. 2022.

      (5) D. Verdel, O. Bruneau, G. Sahm, N. Vignais, and B. Berret, “The value of time in the invigoration of human movements when interacting with a robotic exoskeleton,” Science Advances, vol. 9, Sep. 2023.

      (6) K. Jimura, J. Myerson, J. Hilgard, T. S. Braver, and L. Green, “Are people really more patient than other animals? Evidence from human discounting of real liquid rewards,” Psychonomic Bulletin & Review, vol. 16, pp. 1071–1075, Dec. 2009.

      (7) P. L. Gribble, L. I. Mullin, N. Cothros, and A. Mattar, “Role of cocontraction in arm movement accuracy,” Journal of Neurophysiology, vol. 89, pp. 2396–2405, May 2003.

      (8) B. Berret and F. Jean, “Why Don’t We Move Slower? The Value of Time in the Neural Control of Action,” Journal of Neuroscience, vol. 36, pp. 1056–1070, Jan. 2016.

      (9) R. Shadmehr and A. A. Ahmed, Vigor: Neuroeconomics of Movement Control. The MIT Press, 2020.

      (10) D. Thura, A. M. Haith, G. Derosiere, and J. Duque, “The integrated control of decision and movement vigor,” Trends in Cognitive Sciences, vol. 29, pp. 1146–1157, Dec. 2025.

      (11) P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement,” Journal of Experimental Psychology, vol. 47, pp. 381–391, June 1954.

      (12) K. B. Reed and M. A. Peshkin, “Physical collaboration of human-human and human-robot teams,” IEEE Transactions on Haptics, vol. 1, pp. 108–120, July 2008.

      (13) G. Ganesh, A. Takagi, R. Osu, T. Yoshioka, M. Kawato, and E. Burdet, “Two is better than one: physical interactions improve motor performance in humans,” Scientific Reports, vol. 4, Jan. 2014.

      (14) A. Takagi, G. Ganesh, T. Yoshioka, M. Kawato, and E. Burdet, “Physically interacting individuals estimate the partner’s goal to enhance their movements,” Nature Human Behaviour, vol. 1, pp. 1–6, Mar. 2017.

      (15) A. Takagi, N. Beckers, and E. Burdet, “Motion plan changes predictably in dyadic reaching,” PLOS ONE, vol. 11, p. e0167314, Dec. 2016.

      (16) P. Mazzoni, B. Shabbott, and J. C. Cortes, “Motor control abnormalities in Parkinson’s disease,” Cold Spring Harbor Perspectives in Medicine, vol. 2, p. a009282, Mar. 2012.

    1. Author response:

      Common responses:

      We thank the editors for considering our paper and the reviewers for their thoughtful and detailed feedback. Based on the comments, we will revise our manuscript to better describe how our approach differs from modeling strategies that are common in the field. We also aim to elaborate on the advantages of fastFMM and what scientific questions it is designed to answer. Finally, we will provide more background on our example analyses and the interpretation of the results.

      Within this response, “within-trial timepoints”, “time-varying predictors/behaviors”, and “signal magnitude” are used as specific examples of the general concepts of “functional domain”, “functional covariates”, and “functional outcome”, respectively. To make statements or examples more concrete, we may use the former, neuroscience-specific terms when making general claims about functional models.

      - ncFLMM, cFLMM: non-concurrent or concurrent functional linear mixed models.

      - FUI: fast univariate inference, an approximation strategy to perform FLMM (Cui et al., 2022).

      - fastFMM: the R package that implements FUI.

      - CI: confidence interval.

      Before specific line-by-line responses, we provide a brief comparison between cFLMM and fixed effects encoding models. All three reviewers suggested that fixed effects models could be an existing alternative to cFLMM (Reviewer 1 (1B), Reviewer 2 (2C), Reviewer 3 (3A)). Their shared comments highlight that our revision should articulate the advantages and applications of cFLMM relative to existing analysis strategies.

      Functional regression methods like cFLMM produce functional coefficient estimates that quantify how the magnitude of predictor-signal associations evolves across an ordered functional domain such as within-trial timepoints. Standard scalar outcome regression methods, like the GLMs specified in Engelhard et al. (2019), model these associations and their corresponding coefficients as fixed across the functional domain. While GLM encoding models may include time-varying predictors, these analysis strategies do not model the predictor–signal association as changing over the functional domain.

      Moreover, encoding models are less suited to hypothesis testing in clustered or longitudinal settings (e.g., repeated-measures datasets) and yield regression coefficient estimates that are only interpretable with respect to the units of the basis functions. In contrast, cFLMM provides time-varying coefficient estimates that are interpretable as statistical contrasts in terms of the original variables and produces hypothesis tests in clustered settings. cFLMM can also be applied with the same flexible covariate representations used in encoding models; this is a modeling choice rather than a methodological characteristic.

      The remainder of this provisional author response will respond to reviewers’ concerns line-by-line, approximately in the order they appear.

      Reviewer #1 (Public review):

      We thank Reviewer 1 for their comments, especially their efforts to provide first-hand experience with loading and applying fastFMM. We hope that recent improvements to fastFMM’s public release and vignettes address Reviewer 1’s concerns about ease-of-use.

      (1A) Overall, while they make a compelling case that this approach is less biased and more insightful, the implementation for many experimentalists remains challenging enough and may limit widespread adoption by the community.

      We believe the reviewer may have experimented with an old version of fastFMM, so their experience may not reflect recent rewrites and improvements. fastFMM v1.0.0+ is now stable, validated on CRAN, and contains new example data and step-by-step tutorials. We designed fastFMM’s model-fitting code to be similar to common GLM packages in R to reduce the learning curve for new users.

      (1B) …a clearer presentation of how common implementations in the field are performed (i.e. GLM) and how one could alternatively use the cFLMM approach would help.

      We will provide a clearer description of existing methods in the revised manuscript. Briefly, inference with fastFMM can accommodate large datasets that contain clustered data, repeated measures, or complex hierarchical effects, e.g., experiments with multiple animals and multiple trials per animal. When encoding models are fit to each cluster (e.g., animal, neuron) separately, we are not aware of a principled method to pool these cluster-specific models together to quantify uncertainty or yield an appropriate global hypothesis test.

      Reviewer #2 (Public review):

      Reviewer 2’s thoughtful feedback helped structure our points in the common response above, which we will refer to when applicable. In our response, we aim to clarify the problems that cFLMM solves and characterize the advantages in interpretability.

      (2A) The aim of incorporating variables that change within trial into this framework is interesting, and the technical implementation appears to be rigorous. However, I have some reservations as to whether the way in which variables that change within trial have been integrated into the analysis framework is likely to be widely useful, and hence how impactful the additional functionality of cFLMM relative to the previously published FLMM will be.

      We hope that the common response addresses these concerns. We were motivated to provide a concurrent extension of fastFMM based on our experience with statistical consulting in neuroscience research. Questions that benefit from a functional approach are common and often not adequately modeled with a non-concurrent approach, such as the variable trial length analysis we describe below.

      (2B) It is less clear that this approach makes sense for variables that change within trial…This partitioning of variance in the predictor into a between-trial component whose effect on the signal is modeled, and a within-trial component whose effect on the signal is not, is artificial in many experiment designs, and may yield hard to interpret results.

      We thank Reviewer 2 for highlighting a point that we did not adequately explain and that we will address further in the revision. The pointwise and joint CIs estimated by fastFMM account for uncertainty in the coefficient estimates due to variation in the predictors across within-trial timepoints. cFLMM targets a statistical quantity, or estimand, that is defined by trial-timepoint-specific effects, so the first step of our estimation strategy fits separate pointwise mixed models. However, the models from every within-trial timepoint are then combined to calculate uncertainty and smooth the coefficient estimates. Thus, the widths of the pointwise and joint CIs depend on the estimated between-timepoint covariance and a smoothing penalty. Loewinger et al. (2025a) provide further details in Appendices 2 and 3, describing the covariance structure and detailing the power improvements of FUI compared to multiple-comparisons corrections.
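As a rough illustration of the pointwise-then-combine idea, the sketch below fits a separate least-squares regression at each within-trial timepoint with a timepoint-specific design (the concurrent case) and then smooths each coefficient curve across timepoints. It is a simplified stand-in, not the fastFMM implementation: it uses ordinary least squares without random effects, replaces penalized spline smoothing with an edge-padded moving average, and omits the covariance estimation needed for pointwise and joint CIs. All names are hypothetical.

```python
import numpy as np

def pointwise_concurrent_fit(X, Y, window=5):
    """Two-step sketch: pointwise least squares, then smoothing across timepoints.

    X: array (n_regressors, n_trials, n_timepoints); X[:, :, s] is the design at timepoint s
    Y: array (n_trials, n_timepoints) of signal values
    window: odd moving-average width, a stand-in for penalized smoothing
    Returns (beta_raw, beta_smooth), each of shape (n_regressors, n_timepoints).
    """
    n_reg, _, n_time = X.shape
    beta_raw = np.empty((n_reg, n_time))
    for s in range(n_time):
        # Step 1: a separate regression at each within-trial timepoint s
        beta_raw[:, s] = np.linalg.lstsq(X[:, :, s].T, Y[:, s], rcond=None)[0]
    # Step 2: smooth each coefficient curve, padding with edge values so the ends are not shrunk
    half = window // 2
    kernel = np.ones(window) / window
    beta_smooth = np.array([
        np.convolve(np.pad(b, half, mode="edge"), kernel, mode="valid") for b in beta_raw
    ])
    return beta_raw, beta_smooth
```

In fastFMM, the combination step is what ties the timepoints together, so the smoothed curve and its CIs borrow strength across the functional domain rather than treating each timepoint as an isolated test.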

      Other functional regression estimation strategies jointly fit the entire model with a single regression, e.g., functional generalized estimating equations (Loewinger et al., 2025b). These methods use basis expansions of the coefficients. In contrast, the encoding models mentioned in 2C below and by Reviewer 3 (3A) apply basis expansions of the covariates, and the resulting model does not capture how signal–covariate associations evolve across a functional domain. Although the first stage in the fastFMM approach fits pointwise linear models, this is only one of three steps in the estimation strategy. fastFMM yields coefficient estimates comparable to those that would be obtained from functional regression estimation strategies that jointly estimate the functional coefficients in a single regression. We mention this to distinguish between the target statistical quantity (functional coefficients) and the estimation strategy (pointwise vs. joint).

      (2C) …an alternative approach would be to run a single regression analysis across all timepoints, and capture the extended temporal responses to discrete behavioural events by using temporal basis functions convolved with the event timeseries. This provides a very flexible framework for capturing covariation of neural activity both with variables that change continuously such as position, and discrete behavioural events such as choices or outcomes, while also handling variable event timing from trial-to-trial.

      Our understanding is that the suggested approach aims to quantify the association between the outcome and within-trial patterns in covariates. This is a great question and we will incorporate a discussion of this into the revision. However, temporal basis functions convolved with the covariate time series cannot directly characterize these relationships. Encoding models can detect the contribution of predictors to neural signals while remaining agnostic to the precise relationship, but this flexibility can come at the cost of interpretability. The coefficients of the convolutions may not be translatable into a clear statistical contrast in terms of the original covariates.
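To make the contrast concrete, here is a minimal sketch of the kind of encoding model described by the reviewer, under stated assumptions: a single binary event series, a hypothetical Gaussian-bump temporal basis, a causal convolution, and no random effects. The fitted weights are in units of the basis functions; a per-lag response kernel has to be reconstructed as basis @ weights, and neither quantity is by itself a contrast between trial conditions. All names are illustrative.

```python
import numpy as np

def gaussian_basis(n_lags, n_basis, width):
    """Hypothetical temporal basis: Gaussian bumps spread over n_lags samples."""
    centers = np.linspace(0, n_lags - 1, n_basis)
    lags = np.arange(n_lags)[:, None]
    return np.exp(-0.5 * ((lags - centers[None, :]) / width) ** 2)  # (n_lags, n_basis)

def encoding_design(events, basis):
    """Convolve a binary event series with each basis function (causal, same length)."""
    n = len(events)
    return np.column_stack([np.convolve(events, basis[:, k])[:n] for k in range(basis.shape[1])])

# Toy data: sparse events and a signal generated from known basis weights (noise-free)
rng = np.random.default_rng(1)
events = (rng.random(2000) < 0.02).astype(float)
basis = gaussian_basis(n_lags=30, n_basis=6, width=3.0)
design = encoding_design(events, basis)
w_true = np.array([0.5, 1.0, -0.3, 0.2, 0.0, -0.1])
signal = design @ w_true

# One regression across all timepoints; weights refer to basis functions, not to lags
w_hat = np.linalg.lstsq(design, signal, rcond=None)[0]
kernel = basis @ w_hat  # reconstructed response to one event, per lag
```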

      In our paper, we provide examples of cFLMM models with simple signal–covariate relationships. The coefficient estimates quantify the expected change in signal given a one-unit change in the original predictors. Let $Y(s)$ be the outcome and $X(s)$ be a covariate at within-trial timepoint $s$. For brevity, we suppress subject/trial indices and random effects in the following notation. For a binary covariate, the coefficient at timepoint $s$ captures the contrast

      $$\mathbb{E}[Y(s) \mid X(s) = 1] - \mathbb{E}[Y(s) \mid X(s) = 0].$$

      In contrast, the change in signal associated with patterns in within-trial covariates can be written as

      $$\mathbb{E}[Y(s_1) \mid X(s_2) = 1] - \mathbb{E}[Y(s_1) \mid X(s_2) = 0]$$

      for all pairs of timepoints $(s_1, s_2)$. While simple lagged or offset outcome–predictor associations can be incorporated as covariates in cFLMM, the approach does not capture associations across all pairs of within-trial timepoints $(s_1, s_2)$. Encoding models also do not target this estimand. Instead, a full function-on-function regression could estimate it. This topic can be incorporated into our revision and may be a future line of inquiry.
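The difference between the concurrent estimand and the all-pairs estimand can be seen in a small sketch. Assuming a binary covariate and no random effects, the marginal quantity E[Y(s1) | X(s2) = 1] - E[Y(s1) | X(s2) = 0] can be estimated by a difference in means for every pair (s1, s2); the concurrent estimand is the diagonal of that matrix. This is a naive, unsmoothed illustration, not a function-on-function regression, and the names and toy data are hypothetical.

```python
import numpy as np

def pairwise_contrasts(X, Y):
    """Difference-in-means estimate of E[Y(s1) | X(s2)=1] - E[Y(s1) | X(s2)=0].

    X: binary covariate, array (n_trials, n_timepoints); each column must contain
       both values
    Y: outcome, array (n_trials, n_timepoints)
    Returns an (n_timepoints, n_timepoints) array indexed [s1, s2];
    its diagonal is the concurrent (s1 == s2) contrast.
    """
    X = np.asarray(X, dtype=bool)
    Y = np.asarray(Y, dtype=float)
    n_time = Y.shape[1]
    out = np.empty((n_time, n_time))
    for s2 in range(n_time):
        on = X[:, s2]
        out[:, s2] = Y[on].mean(axis=0) - Y[~on].mean(axis=0)
    return out

# Toy case: the signal at timepoint 1 depends only on the covariate at timepoint 0
X_toy = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_toy = np.zeros((4, 2))
Y_toy[:, 1] = 2 * X_toy[:, 0]
C = pairwise_contrasts(X_toy, Y_toy)
# C[1, 0] is 2.0 (lagged effect); both diagonal (concurrent) entries are 0.0
print(C)
```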

      (2D) In the Machen et al. data…From the resulting beta coefficient timeseries (Figure 3C) it is not straightforward to understand how neural activity changed as the subject approached and then received the reward. A simpler approach to quantify this, which I think would have yielded more interpretable coefficient timeseries would have been to align activity across trials on when the subject obtained the reward. More broadly, handling variable trial timing in analyses like FLMM which use trial aligned data, can be achieved either by separately aligning the data to different trial events of interest or by time warping the signal to align multiple important timepoints across trials.

      In this experiment, mice waited in a trigger zone, ran through a linear corridor, and then received a reward (either water or strawberry milkshake) in the reward delivery zone (Machen et al., 2026). Mice received different rewards across sessions but the same reward in all trials of a given session. This design complicated the analysis, as reward type produced prominent differences in average latency (water: 3.3 seconds; milkshake: 2.0 seconds). The authors wanted to disentangle whether mean differences in the signal across reward types reflected differences in motivation to obtain the reward or differences in reaction to reward receipt.

      We agree that performing a reward-aligned analysis would be an intuitive approach to visualize the differences in average signal for mice that received milkshake compared to water. In fact, we provide a ncFLMM reward-aligned analysis in Figure S1 of Machen et al. (2025). We will add this analysis to the revision and thank the reviewer for the suggestion. We emphasize, however, that this method answers a different question. It does not identify how the signal change associated with receiving the milkshake evolves with respect to latency, especially if the relationship is non-linear. Time warping faces similar obstacles in this setting, especially since sufficiently flexible curve registration can induce similarity due purely to noise. Generally, time warping does not lend itself to hypothesis testing as it is unclear how to propagate uncertainty from the time warping model into final hypothesis tests.

      We believe cFLMM is an appropriate choice for the specific question, and we will revise the manuscript to better reflect its advantages. The functional coefficient estimates in Figures 3C-iii and 3C-iv provide insights that are not possible to derive from the proposed alternatives. For example, we can infer that for short latencies, we do not see a significant difference in signal magnitude for mice receiving water and mice receiving the milkshake. However, for latencies longer than around 2 seconds, receiving the milkshake is associated with an additional positive change in signal. We agree that we should make Figure 3C and the accompanying discussion more clear and thank Reviewer 2 for their feedback on interpretation.

      Reviewer #3 (Public review):

      (3A) …it is not clear what the conceptual or methodological advance of this work is. As it is written, the manuscript focuses on showing how concurrent regressors offer interpretation advantages over non-concurrent regressors. While the benefit of such time-varying regressors is supported by previous literature (e.g., Engelhard et al., 2020), it is not clear whether the examples provided in the current study clearly support the advantage of one over the other…

      We assume Reviewer 3 is referencing “Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons” (Engelhard et al., 2019). We hope that the Common responses sufficiently contrast the settings where each approach can be applied. Because these models have different goals and assumptions, they are appropriate for answering different questions.

      (3B) In this specific example, if the question is about speed and reward type, why variables such as latency to reward or a binary “reward zone vs corridor” (RZ) regressors are used instead of concurrent velocity (or peak velocity - in the case of the non-concurrent model)? Furthermore, if timing from trial start to reward collection is variable, why not align to reward collection, which would help in the interpretation of the signal and comparison between methods? Furthermore, while for the non-concurrent method, the regressors' coefficients are shown, for the concurrent one, what seems to be plotted are contrasts rather than the coefficients. The authors further acknowledge the interpretational difficulties of their analysis.

      Thank you for pointing out that we were not clear. This was mentioned by multiple reviewers and highlights the need to elaborate on our motivation in the revision. In this example, we wanted to investigate the change in signal-reward association as a function of within-trial timepoints, not the association between instantaneous velocity and the signal. “Slow” and “fast” refer to mice with above- and below-average latency, respectively. We ask you to please refer to Reviewer 2 (2D), where we discuss why event alignment is an insufficient correction.

      The functional coefficient estimates in Figure 3C are interpreted as contrasts because the fixed effect coefficients capture the difference in expected signal between strawberry milkshake and water along the functional domain. An advantage of cFLMM is that it is easy to specify models in which the coefficients correspond to interpretable contrasts of the signal across conditions. The coefficient estimate shown in Figure 3B-ii also corresponds to a contrast because the estimates capture the difference in mean signal between strawberry milkshake and water. Equations (7) and (8) in the section “Materials and methods”, sub-section “Variable trial length analysis”, provide additional details on the fixed effect coefficients. To avoid this confusion, we will convert the two 1 × 4 sub-plots of 3B and 3C into two 2 × 2 sub-plots so that they do not invite unintended direct comparisons.
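The sketch below shows, in the simplest fixed-effects setting, why an indicator-coded coefficient is a contrast. Regressing the signal at each timepoint on an intercept and a milkshake indicator (1 = milkshake, 0 = water) yields a slope equal to the difference in mean signal between the two reward types. It omits random effects and the latency terms of the manuscript's model, so it does not reproduce Equations (7) and (8); the function name and toy data are hypothetical.

```python
import numpy as np

def reward_contrast(signal, milkshake):
    """Pointwise OLS of signal on [1, milkshake]; returns the milkshake coefficient curve.

    signal: array (n_trials, n_timepoints)
    milkshake: array (n_trials,) indicator, 1 = milkshake, 0 = water
    """
    design = np.column_stack([np.ones(len(milkshake)), milkshake])
    # lstsq with a 2-D right-hand side fits every timepoint (column) at once
    coef = np.linalg.lstsq(design, signal, rcond=None)[0]  # shape (2, n_timepoints)
    return coef[1]

# Toy data (not study data): 30 trials, 10 timepoints, alternating reward types
rng = np.random.default_rng(2)
sig = rng.normal(size=(30, 10))
milkshake = np.array([0, 1] * 15)
slope = reward_contrast(sig, milkshake)
mean_difference = sig[milkshake == 1].mean(axis=0) - sig[milkshake == 0].mean(axis=0)
print(np.allclose(slope, mean_difference))
```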

      To contextualize how we “acknowledge the interpretational difficulties of [our] analysis”, we stated that a non-concurrent FLMM attempting to control for a time-based covariate is difficult to interpret. The concurrent FLMM provides a straightforward interpretation directly related to the question of interest, which we discuss above in Reviewer 2 (2D).

      (3C) Because the relation between behavioral variables and neuronal signal is not instantaneous, previous literature using fixed effects uses, for example, different temporal lags, splines, and convolutional kernels; however, these are not discussed in the manuscript.

      Thank you for this suggestion. All three reviewers raised this topic (see Reviewer 1 (1B), Reviewer 2 (2C), and the Common responses), and we will incorporate our response in the revision.

      (3D) From the methods, it seems that in the concurrent version of fastFMM, both concurrent and non-concurrent regressors can be included, but this is not discussed in the manuscript.

      This is an important point that we mentioned only implicitly. In our cFLMM specification of the Jeong et al. (2022) model, “we incorporated trial-specific covariates for trial number and session, modeling these as increasing numerical values rather than identical categorical variables”, which are also plotted in Appendix 3. In Box 1, we noted that “if the functional covariate of interest is a scalar constant across the domain, the models fit by the concurrent and non-concurrent procedure are identical”. We will explicitly point out that cFLMM can perform inference on combinations of functional and constant covariates.
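A minimal sketch of how constant and functional covariates can coexist in a concurrent design is given below, assuming a regressors × trials × timepoints array and ordinary least squares without random effects (the names are hypothetical and are not the fastFMM interface). A trial-level covariate such as trial number is simply repeated across timepoints. When every covariate is constant across the domain, the pointwise concurrent fits coincide with a single non-concurrent fit, consistent with the statement quoted from Box 1.

```python
import numpy as np

def broadcast_scalar(covariate, n_timepoints):
    """Repeat a trial-level (scalar) covariate across all within-trial timepoints."""
    return np.repeat(np.asarray(covariate, dtype=float)[:, None], n_timepoints, axis=1)

def fit_concurrent(X, Y):
    """Pointwise least squares with a timepoint-specific design X[:, :, s].

    X: (n_regressors, n_trials, n_timepoints); Y: (n_trials, n_timepoints)
    """
    return np.stack([np.linalg.lstsq(X[:, :, s].T, Y[:, s], rcond=None)[0]
                     for s in range(Y.shape[1])], axis=1)

def fit_nonconcurrent(X2, Y):
    """Least squares with one trial-level design X2 (n_regressors, n_trials) for all timepoints."""
    return np.linalg.lstsq(X2.T, Y, rcond=None)[0]

# Toy data: 20 trials, 15 timepoints (not study data)
rng = np.random.default_rng(3)
n_trials, n_time = 20, 15
trial_number = np.arange(n_trials, dtype=float)
Y = rng.normal(size=(n_trials, n_time))
velocity = rng.normal(size=(n_trials, n_time))  # hypothetical functional covariate

# Mixed design: intercept, constant covariate (trial number), functional covariate
X_mixed = np.stack([np.ones((n_trials, n_time)),
                    broadcast_scalar(trial_number, n_time),
                    velocity])
beta_mixed = fit_concurrent(X_mixed, Y)  # shape (3, n_time)

# With only constant covariates, concurrent and non-concurrent fits agree
X_const = np.stack([np.ones((n_trials, n_time)), broadcast_scalar(trial_number, n_time)])
X_2d = np.stack([np.ones(n_trials), trial_number])
print(np.allclose(fit_concurrent(X_const, Y), fit_nonconcurrent(X_2d, Y)))
```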

      (3E) The methodological advance is not clearly stated, apart from inputting into fastFMM a 3D matrix of regressors x trial x timepoint, instead of a 2D matrix of regressors x trial.

      Prior to our work described in this Research Advance, it was not obvious that the existing approximation approach in fastFMM could be generalized to cFLMM. During the writing of the article, a fastFMM user reached out for help with producing pseudo-concurrent FLMMs by duplicating rows in a nonconcurrent model, which both underscores the unmet need for cFLMMs and the difficulty in fitting them with available tools.

      The “under-the-hood” differences are described in Appendix 4. Concurrent FLMM with fast univariate inference was theoretically possible as early as Cui et al. (2022). The univariate step was straightforward, but guaranteeing “fast” and “inference” was not. We needed to verify, for example, that the method-of-moments estimation of the random effects covariance matrix generalized to cFLMM, which is not a trivial step. Characterizing whether the method achieved asymptotic coverage required extensive simulation studies (Figure 4, Appendix 2). Future work may focus on fully characterizing the asymptotic convergence in high noise or high complexity regimes.
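
      The following Python sketch makes the input-shape distinction in (3E) concrete. It shows only the pointwise step of a concurrent fit: one ordinary least-squares regression per timepoint, reading a 3D (trial × timepoint × regressor) array at the same timepoint as the outcome. It is a simplification under stated assumptions, not fastFMM code. Random effects, smoothing, and the fast univariate inference procedure are all omitted, and the function and variable names are hypothetical.

```python
import numpy as np

def pointwise_ols(y, X):
    """Least-squares fit at each within-trial timepoint (simplified; no random effects).

    y: outcome array, shape (n_trials, n_timepoints).
    X: design array, shape (n_trials, n_timepoints, n_regressors) for a
       concurrent model, or (n_trials, n_regressors) for a non-concurrent one.
    Returns coefficient curves, shape (n_timepoints, n_regressors).
    """
    n_time = y.shape[1]
    if X.ndim == 2:
        # Non-concurrent: reuse each trial's regressor row at every timepoint.
        X = np.repeat(X[:, None, :], n_time, axis=1)
    coefs = np.empty((n_time, X.shape[2]))
    for s in range(n_time):
        # Concurrent: regressors are read at the same timepoint s as the outcome.
        coefs[s] = np.linalg.lstsq(X[:, s, :], y[:, s], rcond=None)[0]
    return coefs

# Simulated example: a time-varying covariate plus a trial-number covariate.
rng = np.random.default_rng(0)
n_trials, n_time = 200, 50
t = np.linspace(0, 1, n_time)
x = rng.normal(size=(n_trials, n_time))  # varies within trial
trial = np.repeat((np.arange(n_trials) / n_trials)[:, None], n_time, axis=1)  # constant within trial
beta1 = np.sin(2 * np.pi * t)  # true concurrent effect of x
y = 0.5 + beta1 * x + 0.2 * trial + rng.normal(scale=0.1, size=(n_trials, n_time))

X3 = np.stack([np.ones((n_trials, n_time)), x, trial], axis=2)
coefs = pointwise_ols(y, X3)  # columns: intercept, x, trial
```

      Passing a 2D (trial × regressor) design reuses each trial's row at every timepoint, which is the non-concurrent case. A 3D array whose slices are identical across timepoints therefore yields the same coefficients.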

      (3F) This manuscript is neither a clear demonstration of the need for concurrent variables, nor a 'tutorial' of how to use fastFMM with the added extension.

      We hope that the Common responses clarify how cFLMM compares to existing approaches and how it fills a gap in the data analysis landscape for neuroscience. The fastFMM R package vignettes contain example analyses, and we intend for these files to work in tandem with the manuscript. To provide more guidance for interested analysts, we can explicitly reference these tutorials in the revision.

      Planned revisions

      The following summary is not exhaustive.

      Writing additions:

      Per 1B, 2C and 3A, the Common responses will be incorporated in the revision.

      Per 2B, we will discuss function-on-function regression and explore how to estimate statistical contrasts for complex within-trial relationships. Relatedly, we will clarify that the CIs in fastFMM are constructed using an estimate of the within-trial covariance of the predictors, and clarify the definition of pointwise and joint CIs.

      Per 3D, we will explicitly state that concurrent FLMMs can include covariates that are constant over within-trial timepoints.

      Though we cannot prescribe a universally correct model selection procedure, we will mention that AIC, BIC, and other summary statistics can inform the specification of the random effects.

      Analysis modifications:

      Parts of Appendix 3 may be included in Figure 2 to directly address the question investigated by Jeong et al. (2022) and Loewinger et al. (2024).

      When discussing Machen et al. (2025) data, the supplementary analysis with reward-aligned ncFLMM models might be added to clarify the ncFLMM/cFLMM difference.

      Per Reviewer 2's comment on encoding, the additional analysis aimed at disentangling latency and reward in Machen et al.’s variable trial length data may be incorporated as an additional sub-figure in Figure 3.

      Aesthetic changes:

      Figure 3 will be reorganized to avoid unintended direct comparisons between the coefficients of the non-concurrent and concurrent model.

      Citations for Machen et al. (2026) will be updated to reflect publication of the preprint.

      The version number for fastFMM will be updated.

      References

      Cui E, Leroux A, Smirnova E, Crainiceanu CM. Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics. 2022; 31(1):219–230. doi: 10.1080/10618600.2021.1950006. PMID: 35712524.

      Engelhard B, Finkelstein J, Cox J, Fleming W, Jang HJ, Ornelas S, Koay SA, Thiberge SY, Daw ND, Tank DW, Witten IB. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature. 2019 Jun; 570(7762):509–513. doi: 10.1038/s41586-019-1261-9.

      Jeong H, Taylor A, Floeder JR, Lohmann M, Mihalas S, Wu B, Zhou M, Burke DA, Namboodiri VMK. Mesolimbic dopamine release conveys causal associations. Science. 2022; 378(6626):eabq6740. doi: 10.1126/science.abq6740.

      Loewinger G, Cui E, Lovinger D, Pereira F. A statistical framework for analysis of trial-level temporal dynamics in fiber photometry experiments. eLife. 2025 Mar; 13:RP95802. doi: 10.7554/eLife.95802.

      Loewinger G, Levis AW, Cui E, Pereira F. Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets. arXiv. 2025 Jun; arXiv:2506.20437v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC12306803/.

      Machen B, Miller SN, Xin A, Lampert C, Assaf L, Tucker J, Herrell S, Pereira F, Loewinger G, Beas S. The encoding of interoceptive-based predictions by the paraventricular nucleus of the thalamus D2R+ neurons. iScience. 2026 Jan; 29(1):114390. doi: 10.1016/j.isci.2025.114390.

    1. Author response:

      Reviewer 1 (Public review):

      Summary:

      This study aims to test whether human mate choice is influenced by HLA similarity while accounting for genome-wide relatedness, using the Himba as an evolutionarily relevant small-scale society population, unique among most HLA-mate choice studies. By comparing self-chosen ("love") and arranged marriages and using NGS-based 8-locus HLA class I and II sequences and genome-wide SNP data, the authors ask whether partners who freely choose each other are more HLA-dissimilar than those paired through social arrangements or random pairs. They further extend their work by examining functional differences in peptide-binding divergence among pairs and predicted pathogen recognition in potential offspring.

      Strengths:

      This study has many strengths. The most obvious is their ability to test for HLA-based mate choice in the Himba, a non-European, non-admixed, small-scale society population, the type of population that has been missing, in my opinion, from the majority of HLA mate choice studies. While Hedrick and Black (1997) used a similarly evolutionarily relevant remote tribe of native South Americans, they only considered 2 class I loci (HLA-A and HLA-B) at the first typing field (serological allele group) and did not have data for genome-wide relatedness. The Himba are also unique among previously studied populations because they have both socially arranged and self-chosen partnerships, so the authors could test if freely-chosen partners had lower MHC-similarity than assigned or randomly chosen partners.

      Another key strength of the study was the relatively large sample size (HLA allele calls from 366 individuals, 102 unrelated) and 219 individuals with HLA data, whole genome SNP data, and involved in a partnership.

      The study was also unique among HLA-mate choice studies for comparing peptide binding region protein divergence (calculated as the Grantham distance between amino acid sequences) among partner types and randomly generated pairs. This was also the first time I have seen a study use peptide binding prediction analysis of relevant human pathogens for potential offspring among partners to test if there would be a pathogen-relevant fitness benefit of partner selection.

      Weaknesses:

      My main concerns relate to the reliance on imputed HLA haplotypes and on IBD-based metrics in a region of the genome where both approaches are known to be problematic.

      First, several key results depend on HLA haplotypes inferred through imputation rather than directly observed sequence data. The authors trained HIBAG imputation models on Himba SNP data across the full 5 Mb HLA region using paired HLA allele calls from target capture sequencing (L251-253). However, the underlying SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, meaning that both SNP discovery and subsequent imputation depend on the haplotypes represented in that reference panel. As a result, the imputation framework is likely biased toward common haplotypes shared between the Himba and Yoruba populations, while rare or Himba-specific HLA alleles are less likely to be imputed accurately or at all. This limitation has been noted previously for HLA imputation, particularly for novel or low-frequency variants and for populations that are poorly represented in reference panels. While the authors compare (first-field) imputed alleles to sequenced alleles to assess imputation accuracy, this validation step itself may be biased toward the same common haplotypes that are easiest to impute. This becomes especially problematic if IBD is inferred using imputed haplotypes, because haplotype sharing would then primarily reflect common, reference-supported haplotypes, while true population-specific variation would be effectively invisible. In this scenario, downstream estimates of IBD sharing may be inflated for common haplotypes and deflated for rare ones, potentially biasing conclusions about haplotype sharing, selection, and mate choice at the HLA region.

      We appreciate the reviewer's concern, but would like to clarify two important misunderstandings in this assessment.

      First, the reviewer suggests that our SNP data were generated by mapping reads to a 1000 Genomes Yoruba reference, and that IBD inference may therefore be biased toward haplotypes common between the Himba and Yoruba. This is not the case. Our SNP genotype data were generated from the H3Africa and MEGAex genotyping arrays, which incorporated diverse reference variation to minimize ascertainment bias in non-European ancestries. No read mapping to a Yoruba reference genome was involved in SNP discovery or genotyping. The Yoruba 1000 Genomes data were used solely to provide an ancestry-matched recombination map for phasing and IBD calling; this would not bias IBD inference toward common Yoruba haplotypes. The reviewer's concern about imputation-driven inflation of IBD sharing for common haplotypes therefore does not apply in our case.

      Second, regarding HLA haplotype resolution: we trained a bespoke HIBAG model directly on the Himba SNP array genotype data paired with ground-truth HLA allele calls from our own targeted HLA capture sequencing. This Himba-specific model was then used to impute HLA alleles from pseudo-homozygous genotypes, which we derived by extracting phased SNP-based haplotypes across the HLA region for the same individuals. In this way we resolved the phase of the HLA allele calls. To our knowledge, this paired-data approach to individual-level HLA haplotype resolution is novel; existing HLA haplotype resolution tools generally provide only population-level haplotype frequency estimates rather than individual-level phase assignments. We are confident in the reliability of the haplotypes we report. Resolved haplotypes were required to match the known targeted-sequencing HLA allele calls at a minimum of the first field for at least one allele, and both haplotypes could not be assigned to the same allele unless the individual's HLA allele calls were homozygous. Of 722 total haplotypes, 698 were successfully resolved under these criteria. We report results only on these confidently resolved haplotypes.
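
      The Python sketch below applies one literal reading of the two acceptance criteria above to a single locus of one individual. The helper names are hypothetical. We assume alleles are written in standard nomenclature (e.g. A*02:01), so that the first field is the text before the first colon. We also compare "same allele" on the full allele name. The exact comparison rules used in the manuscript may differ from this reading.

```python
def first_field(allele):
    """First field of an HLA allele name, e.g. 'A*02:01:01' -> 'A*02' (assumed nomenclature)."""
    return allele.split(":")[0]

def haplotypes_accepted(sequenced, imputed):
    """Hypothetical check of one locus: two sequenced calls vs. two phased haplotype alleles."""
    seq_fields = {first_field(a) for a in sequenced}
    # Criterion 1: at least one haplotype allele matches a sequenced call at the first field.
    matches = any(first_field(h) in seq_fields for h in imputed)
    # Criterion 2: both haplotypes may carry the same allele only if the
    # individual's sequenced calls are themselves homozygous.
    homozygous = sequenced[0] == sequenced[1]
    distinct_or_homozygous = imputed[0] != imputed[1] or homozygous
    return matches and distinct_or_homozygous

# Hypothetical examples for a heterozygous HLA-A individual.
ok_swapped = haplotypes_accepted(("A*02:01", "A*30:01"), ("A*30:02", "A*02:05"))
same_allele_twice = haplotypes_accepted(("A*02:01", "A*30:01"), ("A*02:01", "A*02:01"))
no_match = haplotypes_accepted(("A*02:01", "A*30:01"), ("A*68:01", "A*11:01"))
```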

      Second, the interpretation of excess identity-by-descent (IBD) sharing in the HLA region is difficult given the well-documented genomic properties of this locus. The classical HLA region is highly gene-dense, structurally complex, and characterized by extreme heterogeneity in recombination rates, with pronounced hot- and cold-spots (Miretti et al. 2005; de Bakker et al. 2006, reviewed in Radwan et al. 2020). Elevated IBD in such regions can arise from low recombination, background selection, or demographic processes such as bottlenecks, all of which can mimic signals of recent positive selection. While the authors suggest fluctuating or directional selection, extensive haplotype sharing is also consistent with long-term balancing selection at the MHC (Albrechtsen et al. 2010) or recent demographic history in this population.

      We thank the reviewer for highlighting the difficulty of modeling selection at the HLA, a problem that deserves considerable attention. We acknowledge that demographic processes such as the documented Himba population bottleneck can result in elevated IBD sharing (Swinford et al. 2023, PNAS). However, our comparison of HLA IBD sharing rates against a genome-wide baseline is designed to address this. Demographic processes affect all regions of the genome, so if the HLA region maintains IBD sharing significantly above the genome-wide threshold, this provides meaningful evidence for a locus-specific effect beyond demographic history alone.
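
      The logic of a genome-wide baseline can be sketched in Python under simplifying assumptions of our own. The genome is split into windows, and each pair of individuals is scored for whether it shares an IBD segment covering each window. The HLA window is then compared with the empirical distribution of per-window sharing rates. The window layout, the 99th-percentile cutoff, and the simulated rates are hypothetical. The sketch also does not adjust for local recombination rate.

```python
import numpy as np

def ibd_sharing_rates(shared):
    """Per-window IBD sharing rate.

    shared: boolean array of shape (n_pairs, n_windows); True where a pair
    shares an IBD segment covering that window.
    """
    return np.asarray(shared, dtype=bool).mean(axis=0)

def upper_tail_fraction(rates, focal):
    """Fraction of windows whose sharing rate is at least that of the focal window."""
    rates = np.asarray(rates, dtype=float)
    return float(np.mean(rates >= rates[focal]))

# Simulated example: 1,000 pairs, 500 windows, one elevated window standing in for the HLA.
rng = np.random.default_rng(1)
n_pairs, n_windows, hla = 1000, 500, 250
p = np.full(n_windows, 0.02)  # hypothetical background sharing probability
p[hla] = 0.15                 # hypothetical elevated sharing at the focal window
shared = rng.random((n_pairs, n_windows)) < p
rates = ibd_sharing_rates(shared)
threshold = np.quantile(rates, 0.99)  # assumed genome-wide cutoff
```

      Because every window experiences the same demographic history, a focal window that exceeds the genome-wide cutoff points to a locus-specific process. It does not, by itself, distinguish selection from locally low recombination.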

      We agree with the reviewer that the recombination landscape of the HLA region is complex, but this complexity itself is consistent with the region being a frequent target of selection. Previous HLA analyses have found that at the allele level, frequencies are consistent with balancing selection, while multi-locus haplotype frequencies are consistent with purifying selection and positive frequency-dependent selection (Alter et al., 2017), patterns that contribute to the complex recombination rate heterogeneity observed in the region. Recombination rate can be both a cause of extended haplotypes but also the consequence of selection against combinations of alleles.

      As Alter et al. note, the high levels of linkage disequilibrium observed among HLA alleles serve to limit the amount of diversity within HLA haplotypes, but balancing selection at the allelic level maintains multiple HLA haplotypes at high frequency across populations over long periods of time — so-called "conserved extended haplotypes" as we observe (Supplementary Figures 1 and 9). Regarding the specific selective mechanism, our results are not equally consistent with all forms of balancing selection. Albrechtsen et al. (2010) explicitly modeled overdominant balancing selection and demonstrated that equilibrium overdominance does not produce elevated IBD sharing as we observe — our results are therefore inconsistent with this mechanism. Instead, Albrechtsen et al. conclude that allele frequency change is required to generate elevated IBD, consistent with bouts of directional selection such as negative frequency-dependent or fluctuating positive selection. We will make explicit that while our findings do not support overdominance, they are consistent with these temporally dynamic forms of selection driving periodic allele frequency change at the HLA locus. We will also incorporate local recombination rate into Figure 4 to provide a comparison of local recombination rate across chromosome 6 with the observed areas of elevated IBD sharing.

      Alter, I., Gragert, L., Fingerson, S., Maiers, M., & Louzoun, Y. (2017). HLA class I haplotype diversity is consistent with selection for frequent existing haplotypes. PLoS Computational Biology, 13(8), e1005693.

      Beyond these main issues, there are several additional concerns that affect interpretation. Sample sizes and partnership counts are sometimes unclear; some figures would benefit from clearer scaling (Figure 1) and annotation (Figures S6 and S7), and key methodological choices (e.g., treatment of DRB copy number variation, no recombination correction in IBD calling) require further explanation. Finally, some conclusions, particularly those invoking optimality or specific selective mechanisms, are not directly tested by the analyses presented and would benefit from more cautious framing.

      We will clarify the presentation of partnership counts and sample sizes throughout the manuscript and improve the scaling and annotation of the flagged figures. Regarding DRB copy number variation, we will add explicit discussion of our analytical choices and their potential limitations. As described in our responses to the main concerns above, we will also provide more nuanced framing of the selective mechanisms consistent with our IBD results, avoiding conclusions that go beyond what our analyses directly support.

      Reviewer #2 (Public review):

      Summary:

      Evidence for the influence of MHC on mate choice in humans is challenging, as social structures and norms often confound the power of studying populations. This study uses an unusual, diverse, but relatively isolated population that allows a direct comparison of arranged and chosen partners to determine if MHC diversity is increased when choice drives mate choice. Overall, the authors use a range of genetic analyses to determine individual relationships alongside different measures of MHC diversity and potential selection pressures. The overall finding that there is no heterozygous dissimilarity difference between arranged and chosen partners. There is evidence of positive selection that may be a stronger driver, or at least it may mask other selection forces.

      Strengths:

      A rare opportunity to study human mate choice and genetic diversity. An excellent range of data and analysis that is well applied, and all results point to the same conclusion.

      Overall, this is a very well-written and concise paper when considering the significant amount of data and excellent analysis that has been undertaken.

      Weaknesses:

      (1) For the type of samples and data available, none are obvious.

      (2) Although this paper is clearly focused on humans, I was expecting more discussion around the studies that have been undertaken in animals. It is likely that between populations and species, there are different pressures that have driven the MHC evolution, but also mate choice.

      We will improve the framing of our project within the broader non-human MHC mate choice literature in our discussion.

      (3) The peptide presentation based on pathogen genomes is interesting but usually not significant. I wondered if another measure of MHC haplotype diversity to complement this would be the overall repertoire of peptides that could be presented, pathogen-based or otherwise. There is usually significant overlap in the peptides that can be presented, for example, between HLA-A and HLA-B, and this may reveal more significant differences between the alleles and haplotype frequencies.

      We would like to clarify that we did assess the unique pathogen peptides bound across all HLA class I and class II genes by each population's common haplotypes (Figures S12–S13). We acknowledge the reviewer's point that non-pathogenic peptides are also important — for example, binding with self-produced proteins. However, binding with self-produced proteins is more relevant to autoimmune risk, and the selective pressures involved are outside the scope of our current work, which focuses on pathogen-induced fluctuating directional selection and heterozygote advantage. Furthermore, selection on non-pathogenic peptide binding repertoires likely operates in the opposite direction to pathogen repertoire; whereas broader pathogen peptide binding is advantageous, broader self-peptide binding risks excessive immune activation.
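
      The following small Python example shows why we count unique peptides. It pools hypothetical per-allele predicted binders over the alleles of one haplotype, so that a peptide predicted to bind both HLA-A and HLA-B is counted once. The allele-to-peptide sets are invented placeholders, not predictions from our analysis.

```python
def haplotype_repertoire(haplotype, binders):
    """Union of predicted pathogen-peptide binders over the alleles on one haplotype."""
    repertoire = set()
    for allele in haplotype:
        repertoire |= binders.get(allele, set())
    return repertoire

# Hypothetical predicted binders per allele (placeholders, not real predictions).
binders = {
    "A*02:01": {"pep1", "pep2", "pep3"},
    "B*07:02": {"pep3", "pep4"},
    "DRB1*03:01": {"pep5"},
}
haplotype = ["A*02:01", "B*07:02", "DRB1*03:01"]

unique = haplotype_repertoire(haplotype, binders)
total = sum(len(binders[a]) for a in haplotype)  # counts the shared peptide twice
shared_a_b = binders["A*02:01"] & binders["B*07:02"]
```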

      Reviewer #3 (Public review):

      The study investigates MHC-related mate choice in humans using a sample of couples from a small-scale sub-Saharan society. This is an important endeavour, as the vast majority of previous studies have been based on samples from complex, highly structured societies that are unlikely to reflect most of human evolutionary history. Moreover, the study controls for genome-wide diversity, allowing for a test of the specificity of the MHC region, as theoretically predicted. Finally, the authors examine potential fitness benefits by analysing predicted pathogen-binding affinities. Across all analyses, no deviations from random pairing are detected, suggesting a limited role for MHC-related mate choice in a relatively homogeneous society. Overall, I find the study to be carefully executed, and the paper clearly written. Nevertheless, I believe the paper would benefit if the following points were considered:

      (1) The authors claim (p. 2, l. 85) that their study is the first to employ a non-European small-scale society. I believe this claim is incorrect, as Hedrick and Black (1997) investigated MHC similarity among couples from South American indigenous populations.

      We thank the reviewer for this important clarification. Our claim was intended to be more specific: to our knowledge, this is the first study to investigate HLA-based mate preferences in a non-European small-scale society while explicitly controlling for genome-wide relatedness. Hedrick and Black (1997) did not include genome-wide relatedness controls, which is a critical distinction given that ancestry-assortative mating can produce spurious patterns of HLA similarity or dissimilarity in the absence of such correction. We will make this qualification explicit in the revised manuscript.

      (2) Regarding the argument that in complex societies, mating with a random individual would already result in sufficient MHC dissimilarity (p. 2, 78), see the paper from Croy et al. 2020, which used the largest sample to date in this research area.

      We thank the reviewer for this reference. In our revision, we will incorporate Croy et al. (2020) into our discussion and use it as a reference for comparing the Himba’s probability of highly homozygous offspring given population allele frequencies. This comparison will help support our claim that background HLA diversity in the Himba is sufficiently high so that any unrelated partner is already likely to yield adequately dissimilar offspring—a scenario that would reduce the selective benefit of active HLA-based mate choice and could mask any such preference even if it exists.

      (3) Dataset. As some relationships are parallel, I assume that certain individuals entered the dataset multiple times. This should be explicitly reported in the Methods. If I understand the analyses correctly, this non-independence was addressed by including individual identity as a random effect in the model - the authors should confirm whether this is the case. I am also wondering to what extent so-called "discovered partnerships" may affect the results. Shared offspring may be the outcome of short or transient affairs and could have a different social status compared with other informal relationships. Would the observed patterns change if these partnerships were excluded from the analyses?

      The reviewer is correct that individuals appear multiple times in the dataset. Some individuals are members of multiple known partnerships, and all individuals are additionally included many times across the full set of possible random heterosexual pairings that meet our age and relatedness criteria. This non-independence is explicitly addressed in our dyadic linear mixed models by including female ID and male ID as random effects, which account for each individual's unique contribution to their similarity scores across all pairings, both real and random. We explain this in Methods section (n), Statistical Models.

      Regarding discovered partnerships: we grouped these with reported informal partnerships in the current analyses due to modest sample sizes. We agree this is worth examining more carefully and will test, in our revision, whether treating discovered partnerships as a separate category, or excluding them entirely, meaningfully affects our results. We will report these analyses as a sensitivity check.

      (4) How many pairs were due to relatedness closer than 3rd degree? In addition, why was 4th degree relatedness used as a threshold in some of the other analyses?

      This information is reported in Methods section (n), ‘Statistical Models’. No pairs were related more closely than 3rd degree. No arranged marriages were related at 3rd degree or closer; one love-match marriage and two informal partnerships discovered through pedigree analysis involved 3rd-degree relatives.

      Regarding the difference in relatedness thresholds: we used a 4th degree cutoff to define the unrelated set of individuals for allele and haplotype frequency analyses (n=102), as even 3rd degree relatives would inflate allele frequency estimates. In contrast, we permitted 3rd degree relatives in the background distribution for the partnership analyses to reflect the stated cultural preference for cousin marriages in arranged unions—excluding them would have made the background distribution less representative of the actual mating pool. We explain both decisions in Methods sections (d) and (n).

      (5) I was surprised by the exclusion of HIV, given that Namibia has a very high prevalence of HIV in the general population (e.g., Low et al. 2021).

      While HIV prevalence is indeed high in Namibia generally, the Himba are a relatively isolated population and, based on personal communication with Dr. Ashley Hazel—who has extensive field experience studying sexually transmitted infections in the Himba (see references 36, 52, 53, and 54)—there is no evidence of HIV transmission within this population. Dr. Hazel's expertise on this question was the basis for our exclusion of HIV from the pathogen list.

      (6) It appears that age criteria were applied when generating random pairs (p. 8, l. 350). Could the authors please specify what they consider a realistic age gap, and on what basis this threshold was chosen? As these are virtual couples used solely to estimate random variation within the population, it is not entirely clear why age constraints are necessary. Would the observed patterns change if no age criteria were applied?

      We will clarify this in our revision. We restricted random couples to age gaps within the range observed in actual, known partnerships: the woman is at most 16 years older, and at most 53 years younger, than the man. We included this criterion so that random couples represent the best approximation of realistic background partners. Because the range observed in actual pairs is large, the age-gap criterion was quite permissive, and we do not expect it to have substantially affected our results.
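
      A minimal Python sketch of how the background pairs could be enumerated under these criteria follows. It assumes ages in whole years, inclusive bounds, and relatedness stored as an inferred degree per pair, where an absent entry means unrelated. The data structures and function name are hypothetical. As described in our response to point (4), 3rd-degree pairs are retained.

```python
from itertools import product

MAX_WOMAN_OLDER = 16    # years; range observed in known partnerships
MAX_WOMAN_YOUNGER = 53  # years; range observed in known partnerships

def candidate_pairs(women, men, degree):
    """Yield (woman_id, man_id) pairs eligible for the random background (hypothetical layout).

    women, men: dicts mapping individual ID -> age in years.
    degree: dict mapping frozenset({woman_id, man_id}) -> inferred relatedness
            degree (1 = first degree, ...); pairs absent from it are treated as unrelated.
    """
    for (w, age_w), (m, age_m) in product(women.items(), men.items()):
        gap = age_w - age_m  # positive when the woman is older
        if not (-MAX_WOMAN_YOUNGER <= gap <= MAX_WOMAN_OLDER):
            continue
        d = degree.get(frozenset((w, m)))
        if d is not None and d < 3:  # exclude pairs closer than 3rd degree
            continue
        yield w, m

# Hypothetical toy population.
women = {"F1": 30, "F2": 20, "F3": 70}
men = {"M1": 25, "M2": 80}
degree = {frozenset(("F1", "M1")): 2, frozenset(("F2", "M1")): 3}
pairs = sorted(candidate_pairs(women, men, degree))
```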

      (7) I think it would be helpful for readers if the Results section explicitly stated that real couples did not differ from randomly generated pairs. At present, only the comparison between chosen and arranged pairs is reported.

      We would like to clarify that for each analysis we explicitly report both the effects of chosen and arranged partnerships relative to the background distribution intercept, and the pairwise contrast between chosen and arranged partnerships. The intercept of each model is derived from the full background distribution of random opposite-sex pairings meeting our age and relatedness criteria, providing a null expectation under random mating. A non-significant effect for both partnership types therefore indicates that neither arranged nor chosen partnerships differ from random mating with respect to the metric in question. We describe this explicitly in the Statistical Models section of the Methods, but we will ensure this interpretation is stated more prominently in the Results section of the revised manuscript to avoid any confusion.
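
      In our own notation, a sketch of this dyadic model for female $f$ and male $m$ is given below. Here $y_{fm}$ is the similarity metric, and $A_{fm}$ and $C_{fm}$ indicate an arranged or chosen partnership; both indicators are zero for background pairs. Any additional covariates in the fitted models are omitted.

```latex
y_{fm} = \beta_0 + \beta_{\mathrm{arr}}\,A_{fm} + \beta_{\mathrm{ch}}\,C_{fm} + u_f + v_m + \varepsilon_{fm},
\qquad u_f \sim \mathcal{N}(0, \sigma_u^2),\quad v_m \sim \mathcal{N}(0, \sigma_v^2),\quad \varepsilon_{fm} \sim \mathcal{N}(0, \sigma^2)
```

      The intercept $\beta_0$ is the expected value for background pairs. The coefficients $\beta_{\mathrm{arr}}$ and $\beta_{\mathrm{ch}}$ are the departures of each partnership type from that expectation, and the chosen-versus-arranged contrast is $\beta_{\mathrm{ch}} - \beta_{\mathrm{arr}}$.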

      (8) I appreciate the separate analyses of pathogen-binding properties for MHC class I and class II, given their functional distinctiveness. For the same reason, I would welcome a parallel analysis of MHC sharing conducted separately for class I and class II loci.

      We can incorporate separate analyses of HLA similarity and of the log odds of homozygous offspring for class I and class II loci in our revision.
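
      The Python sketch below shows the Mendelian calculation behind offspring homozygosity for one couple. Running it once over class I loci and once over class II loci gives the kind of split we propose. It works locus by locus and treats alleles as identical only when their names match exactly. It does not reproduce the log-odds transformation or the model used in the manuscript. The genotypes and function names are hypothetical.

```python
from itertools import product

def p_homozygous(mother, father):
    """Mendelian probability that a child is homozygous at one locus.

    Each parent transmits either of its two alleles with probability 1/2, so
    each of the four maternal-paternal combinations has probability 1/4.
    """
    return sum(a == b for a, b in product(mother, father)) / 4

def mean_p_homozygous(mother, father, loci):
    """Average of the per-locus probabilities over a set of loci."""
    return sum(p_homozygous(mother[locus], father[locus]) for locus in loci) / len(loci)

# Hypothetical genotypes keyed by locus.
mother = {"A": ("A*01", "A*02"), "B": ("B*07", "B*08"), "DRB1": ("DRB1*03", "DRB1*04")}
father = {"A": ("A*01", "A*03"), "B": ("B*07", "B*07"), "DRB1": ("DRB1*13", "DRB1*15")}

class_i = mean_p_homozygous(mother, father, ("A", "B"))  # class I loci in this toy example
class_ii = mean_p_homozygous(mother, father, ("DRB1",))  # class II loci in this toy example
```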

      (9) I think the Discussion would benefit from a more detailed comparison with previous studies. In addition, the manuscript does not explicitly address limitations of the current study, including the relatively limited sample size given the extensive polymorphism in the MHC region.

      We will expand our discussion in the revision to provide a more detailed comparison with previous studies, including Croy et al. (2020). We will also add an explicit limitations section that incorporates suggestions from multiple reviewers on more careful framing of optimality and specific selective mechanisms. Regarding sample size, we acknowledge this as a genuine limitation given the extensive polymorphism of the MHC region. However, the unrelated sample we used for allelic diversity estimates is comparable in size to those of previous studies in African populations (Figure 1). Our dataset is also uniquely comprehensive in combining HLA class I, class II, genome-wide SNP data, and partnership data within the same individuals. This combination enables the genome-wide relatedness correction that distinguishes our study from much of the prior literature.

      References

      Hedrick, P. W., & Black, F. L. (1997). HLA and mate selection: no evidence in South Amerindians. The American Journal of Human Genetics, 61(3), 505-511.

      Croy, I., Ritschel, G., Kreßner-Kiel, D., Schäfer, L., Hummel, T., Havlíček, J., ... & Schmidt, A. H. (2020). Marriage does not relate to major histocompatibility complex: A genetic analysis based on 3691 couples. Proceedings of the Royal Society B, 287(1936), 20201800.

      Low, A., Sachathep, K., Rutherford, G., Nitschke, A. M., Wolkon, A., Banda, K., ... & Mutenda, N. (2021). Migration in Namibia and its association with HIV acquisition and treatment outcomes. PLoS One, 16(9), e0256865.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

      Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

      Thanks to the referees' input and additional work, we believe our revised manuscript now meets the high standard of eLife.

      Recommendations for the authors:

      The importance of the circular swimming chirality for the observed phenomenon could be further emphasized by actually using the word "chiral" or "chirality" in the text. Also, indicating what would change if swimming were counterclockwise rather than clockwise would help the reader understand the key significance of chirality.

      We thank the reviewer for this insightful suggestion. We agree that the chirality of the surface interaction is central to the observed phenomenon and should be explicitly highlighted to improve the reader's understanding.

      In response, we have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming. We clarify that in such a case, the hydrodynamic interaction would cause cells to veer left, resulting in up-gradient accumulation along the left sidewall rather than the right. We believe these additions significantly improve the clarity of the underlying physical mechanism.

      Reviewer #1 (Recommendations for the authors):

      I still have several comments that the authors may want to consider for the last version.

      - The run and tumble behavior of the cells at the surface remains puzzling and would need some more explanation in the text. Tumbles with no significant reorientation angle amount largely to smooth swimmers. How can a model based on run-and-tumbles be used to explain the difference between LSW and RSW?

      We apologize for the lack of clarity regarding the surface run-and-tumble behavior. While it is true that surface tumbles often result in smaller reorientation angles compared to bulk swimming, they are not negligible and play a critical role in the observed asymmetry. As shown in the tumble angle distributions (Fig. 2E and 2F), the probability of a tumble angle exceeding π/2 is approximately 9% for sidewall trajectories and 30% for the middle area. This tumbling behavior leads to differences between the left sidewall (LSW) and right sidewall (RSW) in two key ways:

      First, as detailed in our geometric analysis (Fig. 6), running cells following stable clockwise circular paths are geometrically favored to reach the RSW. Because cells moving up-gradient (towards the RSW) experience suppressed tumbling, they maintain these stable circular trajectories and accumulate effectively. Conversely, cells moving down-gradient (towards the LSW) experience enhanced tumbling. These frequent interruptions distort the circular trajectories required to reach the LSW, resulting in fewer bacteria entering the LSW compared to the RSW.

      Second, once at the wall, the difference in tumbling frequency dictates retention. The majority of LSW cells swim down-gradient (LSW-DG) and thus tumble more frequently, increasing their probability of escaping the wall. The majority of RSW cells swim up-gradient (RSW-UG), which suppresses tumbling and increases their residence time at the wall.

      The relevant clarifications have been included in the last paragraph of “Results” in the manuscript.
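
      The two mechanisms above can be made concrete with a deliberately crude toy model. This is a hypothetical sketch, not the authors' simulation: the function name simulate_lane, the wall rule (a cell that hits a wall is clamped to it and aligned with it, keeping the sign of its up/down-gradient heading), the uniform tumble angles, the linear tumble bias, and all parameter values (speed 20 μm/s, angular velocity 5 rad/s, giving an orbit radius of about 4 μm) are illustrative assumptions.

```python
import math
import random


def simulate_lane(width=8.0, speed=20.0, omega=5.0, base_tumble=1.0,
                  bias=0.5, dt=0.01, steps=5000, n_cells=100,
                  wall_band=0.5, seed=1):
    """Hypothetical toy model of chiral surface swimmers in one lane.

    The lane axis is y (up-gradient = +y); x runs from 0 to `width` (um).
    Seen by a cell facing up-gradient, x = width is the right sidewall (RSW)
    and x = 0 the left sidewall (LSW). omega > 0 means clockwise circling.
    All parameter values are illustrative assumptions.
    Returns (fraction of time in RSW band, fraction in LSW band, mean v_y).
    """
    rng = random.Random(seed)
    rsw = lsw = 0
    vy_sum = 0.0
    for _ in range(n_cells):
        x = rng.uniform(0.0, width)
        phi = rng.uniform(-math.pi, math.pi)  # heading, measured from +x axis
        for _ in range(steps):
            # Assumed chemotactic bias: tumbles suppressed when heading
            # up-gradient (sin(phi) > 0), enhanced when heading down-gradient.
            rate = base_tumble * (1.0 - bias * math.sin(phi))
            if rng.random() < rate * dt:
                phi = rng.uniform(-math.pi, math.pi)  # assumed uniform tumble
            phi -= omega * dt  # constant chiral rotation of the heading
            x += speed * math.cos(phi) * dt
            # Assumed wall rule: clamp position and align heading with the
            # wall, keeping the sign of the heading's y-component.
            if x < 0.0 or x > width:
                x = min(max(x, 0.0), width)
                phi = math.pi / 2 if math.sin(phi) >= 0 else -math.pi / 2
            if x >= width - wall_band:
                rsw += 1
            elif x <= wall_band:
                lsw += 1
            vy_sum += speed * math.sin(phi)
    total = n_cells * steps
    return rsw / total, lsw / total, vy_sum / total


rsw, lsw, drift = simulate_lane()
print(f"clockwise:         RSW={rsw:.2f}  LSW={lsw:.2f}  drift={drift:.2f} um/s")
rsw_ccw, lsw_ccw, drift_ccw = simulate_lane(omega=-5.0)
print(f"counter-clockwise: RSW={rsw_ccw:.2f}  LSW={lsw_ccw:.2f}  drift={drift_ccw:.2f} um/s")
```

      In this sketch, clockwise swimmers become trapped at the right wall only while heading up-gradient, where tumbling is suppressed, so they should spend more time in the RSW band than in the LSW band, with a positive mean drift. Flipping the sign of omega moves the trapped up-gradient population to the left wall, mirroring the counter-clockwise scenario described earlier. The exact numbers depend on the arbitrary parameters and carry no quantitative meaning.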

      - Figure 5B would need more explanation. I still don't understand the different behaviors for the right and left side walls at small widths. Is it really noise, or a more complex behavior? Since most of these calculations are based precisely on the shape of these curves, it would be useful to discuss them in more detail.

      We apologize for the lack of clarity. The behavior observed at small widths in Figure 5B is not noise; rather, it reflects the idealized nature of our simulation model.

      In the simulation, bacteria were modeled as active particles without explicit steric exclusion for the flagella and cell body. Consequently, simulated cells retain the ability to reorient and turn freely even in very narrow lanes (w ≤ 6 μm), allowing the geometric sorting mechanism (which favors the RSW) to function efficiently even at small widths. This is why the simulation shows a distinct difference between LSW and RSW proportions in this regime.

      In experiments, however, the finite size of the bacterial body and flagella creates steric hindrance. In narrow channels, this physical constraint restricts the cells' ability to turn, thereby disrupting the circular swimming mechanism required to sort cells into the RSW. As a result, the experimental data show that the proportions of LSW and RSW cells tend to equalize in narrow channels (e.g., w = 6 μm in Fig. 4B), leading to a lower chemotactic drift velocity than predicted by the simulation.

      We have added a discussion regarding these steric effects and the deviation at narrow widths to the Results section (the penultimate paragraph of subsection "Simulation of E. coli chemotaxis within lane confinement") in the revised manuscript.

      - The importance of the chirality of the circular trajectories, although essential, remains insufficiently mentioned in the text.

      We have incorporated the terms "chiral" and "chirality" throughout the manuscript (Abstract, Introduction, Results, and Discussion) to emphasize this aspect. Furthermore, we have added a specific explanation in the Results section (the last paragraph of subsection “The cells in the right sidewall region dominated the chemotaxis of E. coli with lane confinements”) detailing the hypothetical scenario of counter-clockwise swimming.

      - It would be useful to color-code the trajectories of Figure 1B and alike with time.

      Thank you for the suggestion. The trajectories in Fig. 1B have been redrawn: distinct colors denote individual trajectories, and color intensity darkens to indicate time progression.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Lenz and colleagues describes a detailed examination of the epigenetic changes and alterations in subnuclear arrangement associated with the activation of a unique var gene associated with placental malaria in the human malaria parasite Plasmodium falciparum. The var gene family has been heavily studied over the last couple of decades due to its importance in the pathogenesis of malaria, its role in immune avoidance, and the unique transcriptional regulation that it displays. Aspects of how mutually exclusive expression is regulated have been described by several groups and are now known to include histone modifications, subnuclear chromosomal arrangement, and in the case of var2csa, regulation at the level of translation. Here the authors apply several methods to confirm previous observations and to consider a possible role for DNA methylation. They demonstrate that the histone mark H3K9me3 is found at the promoters of silent genes, that var2csa moves away from other var gene clusters when activated, and that, while DNA methylation is detectable at var genes, it does not seem to correlate with transcriptional activation/silencing. Overall, the data and approach appear sound.

      Strengths:

      The authors employ the latest methods for epigenetic analysis of histone marks, transcriptomic analysis, DNA methylation, and chromosome conformation. They also use strong selection pressure to be able to examine the gene var2csa in its active and silent state. This is likely the only paper that has used all these methods in parallel to examine var gene regulation. Thus, the paper provides readers with confidence in the interpretation of independent methods that address a similar subject.

      We thank the reviewer for this positive assessment. We appreciate the recognition that our study combines complementary approaches including histone mark profiling, transcriptomic analysis, DNA methylation mapping, and chromosome conformation capture in parallel to the use of strong population selection that enables a controlled comparison of var2csa in active versus silent states. We agree that the convergence of independent methods strengthens confidence in the interpretation.

      Weaknesses:

      The primary weakness of the paper is that none of the conclusions are novel and the overall conclusions do not shed much new light on the topic of var gene regulation or antigenic variation in malaria parasites. The paper is largely confirmatory. The roles of H3K9me3 and subnuclear localization in var gene regulation are well established by many groups (including for var2csa), albeit in some cases using alternative methods. The only truly unique aspect of the manuscript is the description of 5mC at var2csa when the gene is transcriptionally active or silent. Here the authors demonstrate that the mark has no clear role in transcriptional activation or silencing, however, this will not be surprising to many in the field who have previously cast doubt on a regulatory role for this modification.

      While we agree that some individual features of var gene regulation, including H3K9me3 enrichment, have been described previously, our study integrates, for the first time, several layers of gene regulation at the clinically important var2csa locus using phenotypically homogeneous placental-binding parasite populations. As expected, var2csa activation coincided with a loss of H3K9me3 at the locus. However, using high-resolution chromatin conformation capture (to our knowledge, not previously applied to phenotypically homogeneous parasite populations), we quantified the repositioning of var2csa relative to heterochromatic telomeric clusters. We further assessed DNA methylation in this framework and show that 5-methylcytosine is broadly present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Reviewer #2 (Public Review):

      Summary:

      Dr Lenz and colleagues report on their in vitro studies comparing gene transcription and epigenetic modifications in Plasmodium falciparum NF54 parasites selected or not selected for adhesion of the infected erythrocytes (IEs) to the placental IE adhesion receptor chondroitin sulfate A (CSA).

      The authors report that selection led to preferential transcription of var2csa, the gene that encodes the VAR2CSA-type PfEMP1 well-established as the PfEMP1 mediating IE adhesion to CSA. They confirm that transcriptional activation of var2csa is associated with distinct depletion of H3K9me3 marks and that transcriptional activation is linked to repositioning of var2csa. Finally, they provide preliminary evidence potentially implicating 5mC in the transcriptional regulation of var2csa.

      Strengths:

      The study confirms previously reported features of gene transcription and epigenetic modifications in Plasmodium falciparum.

      As stated in our response to Reviewer 1, our study combines, for the first time, complementary approaches, including transcriptomic analysis, histone mark profiling, DNA methylation mapping, and chromosome conformation capture, together with strong population selection to enable a controlled comparison of var2csa in active versus silent states.

      Weaknesses:

      No major new finding is reported. The strength of the evidence presented is mostly solid, although certain elements, e.g., the role of 5mC in transcriptional regulation of var2csa, appear preliminary and incomplete.

      While we agree that no major new finding is reported, we used, for the first time, a high-resolution chromatin conformation capture method to quantify the repositioning of var2csa relative to heterochromatic telomeric clusters. We also showed that 5-methylcytosine is present at var genes and may correlate with transcript level, but is uncoupled from transcriptional activation, repression, and switching. Together, these findings integrate, for the first time, transcriptional state, chromatin marks, and 3D genome organization at var2csa and argue against models in which 5mC acts as a primary regulatory switch for var gene expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      (1) In the second paragraph of the introduction, the authors state "....such as the shielding of the parasite antigens expressed on pRBC surfaces by other cells and the evasion of splenic clearance (8)." What does "other cells" mean here?

      We thank the reviewer for this comment. We have clarified the cell type in the text.

      (2) In their interpretation of the Hi-C data, the authors conclude that the var2csa expressing parasites display "tighter heterochromatin control of var gene regions" and "interactions around other silent var genes were increased" and "an overall compaction of telomere ends and var gene-containing intrachromosomal regions". While the data appear to show that this is true when they compare the two parasite populations, I am concerned that the authors might be misinterpreting the data. It is important to note that the NF54CSAh line is heavily selected to be nearly entirely homogeneous for var gene expression while the NF54 line is exceptionally heterogeneous. This is shown in Figure 1G. Thus, any chromosomal arrangement specific for var gene expression in the unselected NF54 population will be similarly heterogeneous and therefore could appear less tight. In other words, interactions around silent var genes and overall compaction of telomere ends might be identical between individual parasites within these populations, but appear tighter or more compact in the var2csa expressing line simply because it is a homogeneous population. Perhaps this is what the authors meant to convey, however as currently written, it seems that they conclude the expression of var2csa results in a unique change in chromosome organization. A better comparison would be two populations homogeneously expressing different var genes, one expressing var2csa and one expressing an alternative var gene. Such lines can be generated through clonal isolation or selection for binding to a different host receptor.

      We thank the reviewer for this comment. The reviewer is correct, and we have revised the Discussion section of the manuscript to clarify this issue.

      (3) The title of the last section of the Results is "Distribution of DNA methylation influences gene expression overall but does not mediate transcriptional activation and switching in antigenic variation". This is an overstatement. The authors show that DNA methylation is absent at var gene promoter regions and enriched in coding regions, but they provide no evidence that it "influences gene expression overall". This is speculation. Lastly, when the authors examined 5mC occupancy across genes, did they normalize for GC content of the DNA sequences? GC content is known to increase dramatically in coding regions (particularly in var genes) and thus could explain the distribution of this mark. If the authors corrected for this, they should directly state this in the Results section. If they did not, they should explain why they don't think this property of the P. falciparum genome explains the distribution of 5mC.

      There is often a misconception in the field that DNA methylation is primarily confined to CpG islands in promoter regions and functions mainly as a repressor of transcription. However, in contrast to promoter methylation, methylation within gene bodies is generally associated with higher levels of gene expression, suggesting a role in facilitating transcription elongation. Gene-body methylation can also repress internal promoters, thereby preventing spurious transcription initiation within the gene. In addition, it has been shown to influence alternative splicing by affecting RNA polymerase II elongation kinetics.

      We propose that, in Plasmodium, DNA methylation may be associated with priming genes for transcriptional activity rather than repressing transcription. Specifically, higher methylation levels may facilitate recruitment of the RNA polymerase II transcriptional machinery to enable transcription. In Figure 4B, we observe higher levels of DNA methylation in the first exon of highly expressed genes in both the NF54 and NF54CSAh lines. Interestingly, we also detect high levels of methylation across most introns of the var genes, introns that must be transcribed, cannot be degraded, and are essential for var gene regulation, suggesting a possible sequence-recognition function. We have edited the manuscript to improve clarity.
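
      The reviewer's GC-content question can be made concrete with a minimal sketch. It is not the authors' pipeline, and the response above does not state whether such a normalization was applied; the function names, the input format (a set of + strand positions called as 5mC), and the restriction to CpG sites are illustrative assumptions. Reporting the fraction of methylated sites per region, rather than a raw count of 5mC calls, removes the dependence on how many candidate sites the region contains.

```python
def cpg_sites(seq):
    """Return 0-based positions of the C in each CpG on the + strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylation_summary(seq, methylated_positions):
    """Summarize 5mC calls for one region (hypothetical input format).

    methylated_positions: set of 0-based + strand C positions called as 5mC.
    Only CpG sites are considered here, for simplicity.
    """
    sites = cpg_sites(seq)
    raw = sum(1 for i in sites if i in methylated_positions)
    fraction = raw / len(sites) if sites else float("nan")
    return {"raw_count": raw, "cpg_sites": len(sites), "fraction": fraction}


# Two toy regions methylated at the same per-site rate but with different
# CpG density: raw counts differ, per-site fractions do not.
at_rich = methylation_summary("ATCGATATATCGATAT", {2})
gc_rich = methylation_summary("CGCGCGCG", {0, 4})
print(at_rich)  # {'raw_count': 1, 'cpg_sites': 2, 'fraction': 0.5}
print(gc_rich)  # {'raw_count': 2, 'cpg_sites': 4, 'fraction': 0.5}
```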

      (4) In the legend to Figure 3D, the authors state that the centromeres are shown in blue, however in the figure they appear to be grey while var2csa is blue.

      We have revised the figure legend accordingly.

      Reviewer #2 (Recommendations For The Authors):

      I recommend using the term "transcription" rather than "expression" when discussing events at the gene level.

      We have revised the manuscript accordingly.

      I also recommend using the term "adhesion" to describe the physical interaction between infected erythrocytes and adhesion receptors rather than "adherence", which should be reserved to describe non-physical affinity (e.g., beliefs, faith).

      We have revised the manuscript accordingly.

      Important new evidence regarding transcriptional regulation of var genes in general and var2csa in particular should be discussed and cited.

      We have revised the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The manuscript by Shukla et al. provides important mechanistic insights into kinesin-1 autoinhibition and cargo-mediated activation. Using a convincing combination of protein engineering, computational modeling, biophysical assays, HDX-MS, and electron microscopy, the authors reveal how cargo binding induces an allosteric transition that propagates to the motor domains and enhances MAP7 binding. Despite limitations arising from conformational heterogeneity and structural resolution, the study presents a unified mechanism for kinesin-1 activation that will be of broad interest to the motor protein, structural biology, and cell biology communities.

      We are grateful for the time and effort from the reviewers and editors in providing fair and constructive comments that have helped to improve the manuscript. Our point-by-point response is provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to interrogate the sets of intramolecular interactions that cause kinesin-1 hetero-tetramer autoinhibition and the mechanism by which cargo interactions via the light chain tetratricopeptide repeat domains can initiate motor activation. The molecular mechanisms of kinesin regulation remain an important question with respect to intracellular transport. It has implications for the accuracy and efficiency of motor transport by different motor families, for example, the direction of cargos towards one or other microtubules.

      Strengths:

      The authors focus on the response of inactivated kinesin-1 to peptides found in cargos and the cascade of conformational changes that occur. They also test the effects of the known activator of kinesin-1 - MAP7 - in the context of their model. The study benefits from multiple complementary methods - structural prediction using AlphaFold3, 2D and 3D analysis of (mainly negative stain) TEM images of several engineered kinesin constructs, biophysical characterisation of the complexes, peptide design, hydrogen/deuterium-exchange mass spectrometry, and simple cell-based imaging. Each set of experiments is thoughtfully designed, and the intrinsic limitations of each method are offset by other approaches such that the assembled data convincingly support the authors' conclusions. This study benefits from prior work by the authors on this system and the tools and constructs they previously accrued, as well as from other recent contributions to the field.

      Weaknesses:

      It is not always straightforward to follow the design logic of a particular set of experiments, with the result that the internal consistency of the data appears unconvincing in places.

      For example, i) the Figure 1 AlphaFold3 models do not include motor domains, whereas nearly all of the rest of the data involve constructs with the motor domains;

      We appreciate the reviewer’s comment regarding the absence of the motor domains in the AlphaFold3 models shown in Figure 1. These domains were intentionally excluded to improve visual clarity and to better highlight the interaction between the TPR domains and CC1 in the inhibited kinesin-1 conformation. We felt that this simplified presentation in the main figure helps readers focus on the key mechanistic advance introduced in this work at the outset of the paper. For completeness, we have provided full-length kinesin-1 AlphaFold3 models that include the motor domains in the Supplementary Information (Fig. S1), and they are described in detail in the main text. In addition, we have added a note to the Figure 1 legend to explicitly direct readers to these full-length models.

      ii) the kinesin constructs are chemically cross-linked prior to TEM sample preparation - this is clear in the Methods but should be included in the Results text, together with some discussion of how this might influence consistency with other methods where crosslinking was not used.

      Thank you. Chemical crosslinking is typically important for obtaining high-quality negative-stain TEM grids of kinesin-1 complexes and has been employed in all prior EM studies by our group and others. While this was described in the Methods, we agree that it should also be stated explicitly in the Results. Accordingly, we have added a sentence to the Results section noting that the proteins were stabilized using the amine-to-amine crosslinker BS3 (“Proteins were also stabilised using the amine-to-amine crosslinker BS3 that was important for achieving reproducibly high-quality samples for imaging.”).

      Please see the point below for an acknowledgement of the risks of using a crosslinker.

      Can those cross-links themselves be used to probe the intramolecular interactions in the molecular populations by mass spec?

      We had considered this; however, cross-linking mass spectrometry (XL-MS) has been applied extensively to essentially identical kinesin-1 complexes by Tan et al. (eLife 2023). That work provided important insights into the overall architecture of the complex, including the new head–CC1 interactions. However, as fully acknowledged by the authors, significant ambiguity remained with respect to the positioning of the TPR domains, with many cross-links that could not be straightforwardly rationalized in a single model. These unresolved aspects provided part of the motivation for the present study, as highlighted in the Introduction.

      We believe that this ambiguity likely reflects an underlying conformational equilibrium of the kinesin-1 complex (e.g. opening/closing transitions) and/or dynamic docking and undocking of the TPR domains. In addition, lysine-rich features of the TPR domains (most notably the loops that connect the TPR alpha helices) may make them prone to being locked in non-native states. Together, these factors limit the interpretability of static cross-linking data in this system. In this context, we feel that XL-MS has already been thoroughly explored for kinesin-1 and that its practical limits in resolving these TPR interactions have been reached.

      This consideration was a primary motivation for pursuing cross-linker-free, solution-based approaches, particularly HDX-MS, which we argue provide the most relevant new insights into the assembly and conformational dynamics of the complex. To make this rationale clearer, we have added an explicit note in the HDX-MS section emphasizing that this is a cross-linker-free method. The added text reads:

      “To determine how the local structural changes from adaptor binding and shoulder dislocation affected the dynamics of kinesin-1 complexes in solution, as directly and least invasively as possible, and without the risk of cross-linker artefacts.”

      In general, the information content of some of the figure panels can also be improved with more annotations (e.g. angular relationship between views in Figure 1B, approximate interpretations of the various blobs in Fig 3F), and more thought could be given to what the reader should extract from the representative micrographs in several figures: inclusion of the raw data is welcome, but extraction and magnification of exemplar particles (as is done more effectively in Fig S5) could convey more useful information elsewhere.

      We appreciate these suggestions. We have modified the figures throughout the manuscript in line with the reviewer’s points. Raw data is now provided at higher magnification throughout so the reader can better distinguish individual particles, angular relationships have been added and further annotations provided on 2D class averages. We do not want the reader to draw too many conclusions from images of single closed particles (with the exception of open vs closed in Fig S7) as these require averaging and 2D classification to obtain meaningful insights, and so we have not added zoom panels in these cases. Figure 3F has been annotated as requested.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Shukla, Cross, Kish, and colleagues investigate how binding of a cargo-adaptor mimic (KinTag) to the TPR domains of the kinesin-1 light chain, or disruption of the TPR docking site (TDS) on the kinesin-1 heavy chain, triggers release of the TPR domains from the holoenzyme. This dislocation provides a plausible mechanism for transition out of the autoinhibited lambda-particle toward the open and active conformation of kinesin-1. Using a combination of negative-stain electron microscopy, AlphaFold modeling, biochemical assays, hydrogen-deuterium exchange mass spectrometry (HDX-MS), and other methods, the authors show how TPR undocking propagates conformational changes through the coiled-coil stalk to the motor domains, increasing their mobility and enhancing interactions with the microtubule-bound cofactor MAP7. Together, they propose a model in which the TDS on CC1 of the heavy chain forms a "shoulder" in the compact, autoinhibited state. Cargo-adaptor binding, mimicked here by KinTag, dislodges this shoulder, liberating the motor domains and promoting MAP7 association, driving kinesin-1 activation.

      Strengths:

      Throughout the study, the authors use clever construct designs - e.g., delta-Elbow, ElbowLock, CC-Di, and the high-affinity KinTag - to test specific mechanisms by directly perturbing structural contacts or affecting interactions. The proposed mechanism of releasing autoinhibition via adaptor-induced TPR undocking is also interrogated with a number of complementary techniques that converge on a convincing model for activation that can be further tested in future studies. The paper is well-written and easy to follow, though some more attention to figure labels and legends would improve the manuscript (detailed in recommendations for the authors).

      Weaknesses:

      These reflect limits of what the current data can establish rather than flaws in execution. It remains to be tested if the open state of kinesin-1 initiated by TPR undocking is indeed an active state of kinesin-1 capable of processive movement and/or cargo transport. It also remains to be determined what the mechanism of motor domain undocking from the autoinhibited conformation is, and perhaps this could have been explored more here. The authors have shown by HDX-MS that the motor domains become more mobile on KinTag binding, but perhaps molecular dynamics would also be useful for modelling how that might occur.

      We are grateful for the reviewer’s comments. We agree that the weaknesses the reviewer has outlined define the limitations of the study and establish important priorities for future work, including molecular dynamics simulations. An important prerequisite for the latter is a starting model in which one has confidence. We think that our study and earlier work now provide a good, experimentally supported foundation for using AF3-generated assemblies for this purpose, by ourselves and others.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Shukla and colleagues presents a comprehensive study that addresses a central question in kinesin-1 regulation - how cargo binding to the kinesin light chain (KLC) tetratricopeptide repeat (TPR) domains triggers activation of full-length kinesin-1 (KHC). The authors combine AlphaFold3 modeling, biophysical analysis (fluorescence polarization, hydrogen-deuterium exchange), and electron microscopy to derive a mechanistic model in which the KLC-TPR domains dock onto coiled-coil 1 (CC1) of the KHC to form the "TPR shoulder," stabilizing the autoinhibited (λ-particle) conformation. Binding of a W/Y-acidic cargo motif (KinTag) or deletion of the CC1 docking site (TDS) dislocates this shoulder, liberating the motor domains and enhancing accessibility to cofactors such as MAP7. The results link cargo recognition to allosteric structural transitions and present a unified model of kinesin-1 activation.

      Strengths:

      (1) The study addresses a fundamental and long-standing question in kinesin-1 regulation using a multidisciplinary approach that combines structural modeling, quantitative biophysics, and electron microscopy.

      (2) The mechanistic model linking cargo-induced dislocation of the TPR shoulder to activation of the motor complex is well supported by both structural and biochemical evidence.

      (3) The authors employ elegant protein-engineering strategies (e.g., ElbowLock and ΔTDS constructs) that enable direct testing of model predictions, providing clear mechanistic insight rather than purely correlative data.

      (4) The data are internally consistent and align well with previous studies on kinesin-1 regulation and MAP7-mediated activation, strengthening the overall conclusion.

      Weaknesses:

      (1) While the EM and HDX-MS analyses are informative, the conformational heterogeneity of the complex limits structural resolution, making some aspects of the model (e.g., stoichiometry or symmetry of TPR docking) indirect rather than directly visualized.

      We agree with the reviewer’s point. Conformational heterogeneity is a significant challenge, and the model has been developed from multiple complementary approaches. A higher-resolution cryoEM study remains a priority but is challenging because of the size, shape and flexibility of the particle; we hope that some of the approaches used here (e.g. nanobody TPR stabilisation, ElbowLock) will provide a path to achieve this.

      (2) The dynamics of KLC-TPR docking and undocking remain incompletely defined; it is unclear whether both TPR domains engage CC1 simultaneously or in an alternating fashion.

      We agree that this is a limitation. We strongly suspect that the TPR domains are dynamic, and we are working to overcome the experimental challenges in resolving this important outstanding question. We have expanded the Discussion section to better highlight this priority.

      (3) The interplay between cargo adaptors and MAP7 is discussed but not experimentally explored, leaving open questions about the sequence and exclusivity of their interactions with CC1.

      We agree that this is a limitation but will be an important priority for future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a number of places where the text could be more precise or clear, or the figures could be designed to be more informative:

      (1) The word "unitarily" is used in several places, and I don't know what it means in this context.

      We have changed the phrasing throughout the manuscript to remove this term. We were attempting to contrast with presumed cooperative multivalent interactions in the context of the kinesin-1 tetramer but agree that this choice of word doesn’t quite achieve that.

      (2) On page 5 the phrase "We focused on the ElbowLock background" is introduced and needs to be explained more clearly.

      Thank you. We have amended the text to read “This KIF5C construct contains a short 5 amino acid deletion that restricts flexibility around the elbow and helps maintain particles in their lambda conformation, providing homogeneous samples, and facilitating subsequent analysis (34).”

      (3) On page 6, the phrase "To improve the resolution of our images, we turned to single-particle cryoEM analysis" is imprecise - what do the authors mean by the resolution of the images? Cryo-EM data does not always guarantee a higher resolution structure, but it offers the possibility of visualising finer structural features. This is probably what is meant here, but needs to be stated more precisely.

      We have amended the text to ‘visualise finer structural details’ as suggested.

      (4) Page 7 - "suggesting that TPR domains had loosely dissociated from the core" - I don't think the evidence points to dissociation of KLCs from the complex, but the phrase "loosely dissociated" implies this - would benefit from rephrasing.

      We have changed this to ‘undocked’ for consistency with other descriptions in the manuscript.

      (5) Was the effect of the CC-Di insertion (ΔTDS) detectable by AlphaFold prediction? It would be interesting to include this, partly for completeness and partly because a slightly imperfect and maybe a more dynamic coiled-coil in this region of the molecule may be important in supporting the conformational changes required for activation.

      Thank you for this suggestion. Modelling of the ΔTDS complex indeed shows displacement of the TPR domains. In the standard 5 output models, the TPR domains now occupy a variety of different positions, all with essentially zero confidence (high position error). Consistent with the biochemical data, the CCDi insertion is modelled with no overall disruption to the architecture or length of CC1, as expected. We think that this is a valuable addition to the study and have included it as a new supplementary figure (Fig S5), with the main text reading:

      … “Supporting this, models of ΔTDS complexes using AF3 showed the expected seamless insertion of CCDi into CC1, with displacement of the TPR domains to a variety of different positions in 5 models, all with high position error with respect to KHC (Fig S5).”

      (6) Figure S1 has two sections designated (C) in the legend.

      Corrected.

      (7) Figure S3 - given the resolution and level of interpretation of the 3D reconstructions, it is not relevant to include an FSC curve, but other standard information, such as angular distribution and any evidence of variability from 3D classifications (and how many particles per 3D class) should be included for all structures.

      Thank you. A complete workflow for all complexes has now been provided in Figure S8 with the information requested. In each case there were typically two ‘good’ classes. For ElbowLock, this included one without a prominent shoulder, consistent with 2D classification and quantification. We assume this may reflect a docking/undocking equilibrium. For the ΔTDS and KinTag particles, neither class showed the shoulder feature. The main text has been modified to reflect this and reads “For ElbowLock complexes, this resulted in classes with and without a prominent shoulder, in agreement with 2D classification. For ElbowLock-ΔTDS and ElbowLock-KinTag complexes, no prominent shoulder-containing classes were observed.”

      Reviewer #2 (Recommendations for the authors):

      Overall, the figures would benefit from more labels for clarity, some examples and suggestions below:

      (1) Figure 1A - Connect the motors to the rest of the structure (e.g., with wiggly lines).

      Corrected.

      (2) Figure 1B - Add arrows and angles to indicate different views of the model.

      Corrected.

      (3) Figure 1B - Label TPR1-6 (e.g., inset zoom in).

      Corrected.

      (4) Figure 2D and 3D - Label the lack of a shoulder in all averages (perhaps with an arrow instead of a circle to not obscure density), include an example average which shows prominent shoulder density.

      Corrected. Full sets of classes showing shoulder-like features for ΔTDS and KinTag complexes are now shown in Figure S4.

      (5) Figure 3D: Label motor domains and elbow as in other figures.

      Corrected.

      (6) Methods: Include more information on how EM classes were compared to AF projections (e.g., Figure 1D). Was this done visually or computationally? Likewise, more information is needed on how classes were judged to have prominent/weak shoulder density (Figure 2D). In the figure legend, there is a statement that "Full sets of classes are provided in Fig. S4" but this is absent in the supplement.

      Thank you. This information has been added to the methods.

      “For comparison to the AF3 model, simulated density was generated using the molmap command in ChimeraX (73), filtered to 15 Å, and projections were generated/selected automatically using the Reference Based Auto Selected 2D function in CryoSPARC.”

      Full sets of classes are now provided in Figure S4.

      (7) Figure 1-3 - Raw micrographs are a very useful inclusion but would benefit from being a more zoomed-in view (e.g., Figure S5 scale). Particularly useful for 3C, where the mixture of open and closed would be good to see.

      Higher zoom micrographs have been provided throughout.

      (8) Figure 5D: Panels too small to see the result, suggest making full width and moving E below.

      Thank you. We have expanded the panel and moved the model to a new Figure 6.

      (9) Figure S1: PAE plot convincing, but pLDDT colour models needed.

      A representative model coloured for pLDDT has been added to Figure S1. Most of the structure sits within the light blue confident range (90 > pLDDT > 70) with the exception of the disordered regions and neck coil.

      (10) Figure 5B: Reason for the variable inputs?

      The reviewer raises an interesting point. The slightly reduced expression of deltaElbow and slightly increased expression of ElbowLock is a consistent feature of these experiments. We note that this effect is in the ‘opposite direction’ to the impact on binding to MAP7 and so does not affect our conclusions from the experiment. However, we wonder whether opening and closing of the complex may impact on turnover of kinesin proteins, which could have implications for their normal homeostasis and possible degradation after transport in polarised cells. We are considering how to explore this going forwards. We have added a note to the results section to highlight this interesting observation to the reader.

      “We also noted slightly elevated expression of ElbowLock complexes and slightly lower expression of DeltaElbow complexes, suggesting that opening/closing of the complex could impact on kinesin-1 turnover”

      (11) Figure legend 5B: Insufficient detail, the end result is stated, but the three separate gels are not described.

      Legend has been expanded.

      (12) Figure 3F: Currently somewhat problematic. It is unclear if the models are in the same view, and so comparison is difficult. Figure 1C (bottom right) shows class averages with a clear, separate CC density, so the relatively featureless model in this region is puzzling. A statement on how the three model views are related to each other, if aligned with each other, would be useful.

      We appreciate the reviewer's point. Models were aligned in Chimera using the fit in map command. Because of the limited features of the models, presumably due to flexibility, achieving a good alignment for all three models was challenging, but we think that showing the 180-degree rotations is probably the best we can achieve here.

      (13) The following statement is too strong: "Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length 'side' views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features which enabled us to identify CC1 confidently (Fig. 1D)". Given that the negative-stain EM data were collected primarily to validate the AlphaFold model, the assignment of CC1 should be described as consistent with rather than confidently identified from the class averages. The resolution of the EM data does not independently support such an assignment, and the wording needs to be softened.

      We appreciate the reviewer’s point; we have softened the wording as suggested. The paragraph now reads:

      “To visualise finer structural details, we turned to single-particle cryoEM analysis of frozen-hydrated samples. We were unable to obtain optimal samples suitable for determining the complete structure. Nonetheless, we obtained reference-free 2D class averages that appeared to show full-length ‘side’ views of the complex with clear definition of the elbow, hinge 2, and KHC-KLC (coiled-coil) interface features (Fig. 1D). The motor domains were poorly resolved in these classes, suggesting that the head assembly is somewhat flexible relative to the coiled coil/TPR body. A comparison to low-pass filtered back-projections from the AF3 model (without motor domains) revealed density at a position concurrent with the docked TPR domains (Fig. 1D).”

      (14) There is a typo in the figure legend of Figure 3 - (E) and (F) should be (F) and (G).

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      I recommend the following additions:

      (1) Figure 1 labeling - In panel A, please label the "linker domain" and the "KLC subunits" explicitly to help orient the reader. In panel B, please mark the "TPR shoulder" corresponding to the docked TPR domains on CC1; this will help the reader connect parts B and C.

      Thank you, we have modified Figure 1A with this additional information.

      (2) The TPR docking site (TDS) is a central structural element, and its sequence boundaries are provided in the Methods. It would help to visualize this directly in Figure 2A or in an inset.

      We hope that the reviewer agrees that the zoomed in model in Figure 5A (alongside MAP7) provides a sufficiently detailed view of the structural interface to highlight the orientation of TPR1 with respect to CC1. The side chain contacts in the model are very plausible and confidently predicted (and can be straightforwardly reproduced in AF3 using the sequence information provided in the methods), but as our study has not explored this interaction at the single residue level, we would prefer not to imply this to the reader at this stage.

      (3) The authors' model of cargo-induced TPR dislocation is convincing. However, the Discussion could benefit from a clarification on whether both KLC-TPR domains are expected to be bound simultaneously or if a dynamic exchange occurs, as the EM data suggest potential asymmetry.

      Thank you, please see point 5 below where we have modified the discussion to reflect the reviewer’s thoughtful comments.

      (4) The HDX-MS analysis is comprehensive, but the authors may want to briefly comment on the coverage of low-signal regions (especially within CC2-CC3) to enhance clarity.

      We have added an additional supplementary figure (S10) showing sequence coverage. Overall, this is 88% but with some lower coverage around KHC-CC0 (neck) and the acidic linker that connects the KLC coiled-coil to the TPR. We have added a note to the main text to reflect this.

      “Sequence coverage was high (overall 88%) with the exception of KHC-CC0 (neck coil) and the acidic-linker region that connects the KLC coiled-coil to the TPR domains where coverage was lower”

      (5) In the Discussion, the proposed interplay between MAP7 and cargo adaptors is intriguing, especially considering the results from Anna Akhmanova's lab showing that MAP7 activates kinesin-1 processivity. Do the authors suggest that competition for CC1 is mutually exclusive or sequential? The answer has mechanistic implications.

      We have been considering these questions for some time, and the short answer is that we don’t fully understand the dynamics yet. However, we appreciate the reviewer’s prompt to clarify our thinking on this. We have attempted to do this in a revised Discussion section, where we more explicitly outline these outstanding questions.

    1. Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is quantifying how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find the interpretation that vigilance plays a major role in this process not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (eg: Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water-deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100cm. Presumably, the 20cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no reward condition the r parameter is exactly 0. Was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether any differences in parameters are statistically significant.

      Comments on the revised manuscript:

      The manuscript has been revised and improved significantly by the addition of methodological details and new analysis. I remain, however, unconvinced by the argument that increased vigilance in the presence of reward leads to heightened escape behaviour.

      In response to my criticism that the work does not measure vigilance directly, the authors have included measures of foraging interval and foraging speed, which they state are "two direct behavioral analyses of vigilance". I disagree: like reaction time, foraging speed and foraging interval can be modulated, for example, by changes in threat sensitivity. Increased threat sensitivity comes with diverse behavioral changes that may well include increased vigilance, but foraging interval and foraging speed can certainly change without the animal expressing increased vigilance behaviors. A bigger issue I still have, though, is with the conclusion that the presence of reward increases "direct escape behaviors". Comparing the no reward, water and sucrose groups indeed shows a difference (which is now clear after the split into early and late phases), but the issue is that these are different mice. As the text is written, it sounds like introducing reward will acutely increase escape. But if we look at the raw data shown in Figure 2C, what I think is happening is that the presence of reward is decreasing habituation to the stimulus. The data for trials 1 and 10 in the three conditions show this: there is habituation with no reward (reaction times are all shifting to the right), a bit less with water and very little with sucrose. This is interesting in its own right and we can speculate why it might be happening, but I think this is conceptually different from what the authors are proposing.

    2. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study by Li and colleagues examines how defensive responses to visual threats during foraging are modulated by both reward level and social hierarchy. Using a naturalistic paradigm, the authors test how the availability of water or sucrose, with sucrose being more rewarding than water, shapes escape behavior in mice exposed to looming stimuli of different intensities, which are used to probe perceived threat level and defensive responses. In parallel, the study compares dominant and subordinate animals to assess how social rank biases the trade-off between reward seeking and threat avoidance. By combining detailed behavioral analyses with computational modeling, the work addresses how reward level and social context jointly influence escape decisions in an ethologically relevant setting.

      Across the different experimental conditions, perceived threat level is the main determinant of behavior. The authors show that looming stimuli associated with higher threat (contrast) consistently elicit faster and more robust escape responses than lower threat stimuli. This effect is particularly evident during early exposures, when animals are highly vigilant and have not yet habituated to the looming stimulus (i.e., learned that it is not dangerous). They then describe that, as animals gain experience and habituate, behavior becomes more flexible, and reward level begins to exert a graded modulation of the escape response. Importantly, the authors show that under high threat conditions, increasing reward value leads to more frequent and faster escape rather than greater reward pursuit. This finding is particularly relevant, as it suggests that highly valued rewards can heighten vigilance and thereby enhance responsiveness to threat, highlighting that reward does not simply compete with defensive behavior but can also reshape it depending on the perceived level of danger, in contrast to low threat conditions, where threat can be more easily outweighed by reward. Thus, an important conceptual contribution of the study is the introduction of vigilance as a useful framework to interpret these effects. Vigilance is treated as a behavioral state reflecting heightened attention to potential danger. In line with what is known from natural foraging, mice initially maintain high vigilance when confronted with an innate threat. This perspective helps clarify a finding that might otherwise appear counterintuitive. One might expect higher rewards to motivate animals to tolerate risk, explore more, and habituate faster in any scenario. Instead, the data suggest that highly rewarding outcomes can elevate vigilance, making animals more responsive to threat and leading to faster or more frequent escape under high threat conditions. In this sense, reward does not simply compete with threat but can also amplify sensitivity to it, depending on the internal state of the animal.

      The social results are particularly interesting in this context as well. Dominant mice consistently prioritize avoidance over reward, showing stronger escape responses and slower habituation than subordinates. This behavior is well captured by the vigilance framework proposed by the authors: dominant animals appear to maintain higher vigilance, which biases decisions toward threat avoidance. The authors further suggest that stable social relationships sustain high vigilance and slow habituation, framing this as an evolutionarily conserved strategy that may enhance survival. This interpretation provides a valuable perspective on how social structure shapes defensive behavior beyond immediate physical interactions. At the same time, there are important limitations to this interpretation. All experiments were conducted in male mice, and it is possible that the relationship between social hierarchy, vigilance, and defensive behavior would differ substantially in females. In addition, the idea that stable social relationships maintain elevated vigilance does not straightforwardly align with broader views of social stability as protective for mental health and as a buffer against anxiety and stress. These points do not undermine the findings but suggest that the social effects described here should be interpreted with caution and within the specific context of the task and sex studied.

      We thank the reviewer for raising this important point. In the context of repeated looming exposure, slower habituation reflects more sustained vigilance over time. Compared to individually housed mice, group-housed mice exhibit slower habituation (Lenz et al., 2022), and pair-housed mice showed even slower habituation in our current work. Importantly, this pattern does not indicate that pair-housed mice have higher overall vigilance than individually housed animals. Although individually housed mice habituate more quickly, they display higher initial vigilance, as reflected by their increased probability of escaping in response to looming stimuli (Lenz et al., 2022). Thus, pair-housed mice exhibited reduced defensive responses compared to individually housed animals, consistent with a social buffering effect.

      Furthermore, in a separate study (Rank- and Threat-Dependent Social Modulation of Innate Defensive Behaviors; Li, Gao, Li, 2026, eLife 15:RP109571), we directly compared responses to looming stimuli when mice were tested alone versus in the presence of a social partner and observed clear evidence of social buffering.

      Another important limitation is that the neural mechanisms underlying these effects remain speculative. The manuscript includes an extensive discussion of candidate circuits, particularly involving the superior colliculus and downstream structures, but this section is necessarily based on prior literature rather than on data presented in the study. Given the complexity of the circuits involved in integrating internal state, reward, social context, and vigilance, the current work should be viewed as providing a strong behavioral and conceptual framework rather than direct insight into underlying neural mechanisms.

      We fully agree that the proposed neural mechanisms remain speculative and that the circuits involved in integrating internal state, reward, and social context are likely far more complex. We have revised the manuscript to acknowledge this limitation.

      Methodologically, the behavioral paradigm is well suited for studying escape decisions in socially housed animals, and the machine-learning-based classification of defensive responses is a clear strength. The computational model provides a useful formalization of how threat level, reward level, and vigilance interact and may be valuable for other laboratories studying escape, approach-avoidance, or conflict situations, particularly as a way to classify behavioral outcomes after pose estimation. More generally, the work will be of interest to the neuroethology community for its detailed characterization of escape behavior under naturalistic conditions.

      Given the ethological nature of the study and the high inter-individual variability reported by the authors, clarity and precision in the methods are especially important for reproducibility. While the revised manuscript addresses many earlier concerns, some aspects remain slightly difficult to follow. For example, the main text states that animals were not water deprived to avoid differences in internal state, whereas parts of the methods describe conditions in which animals were water deprived, suggesting that internal state manipulation may differ across experiments. Clearer separation and explanation of these conditions would further strengthen confidence in the work.

      To improve clarity, we have revised the Methods section to clearly distinguish between experimental conditions that involved water deprivation and those that did not.

      Overall, this study provides a rich and thoughtful analysis of how reward level and social hierarchy modulate defensive behavior through changes in vigilance. It offers a useful conceptual advance for thinking about escape behavior in naturalistic settings and lays a solid foundation for future work aimed at linking these behavioral states to underlying neural circuits.

      Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is quantifying how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find the interpretation that vigilance plays a major role in this process not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      We agree that reaction time can be influenced by multiple factors, including stimulus strength. Consistent with this, reaction times (i.e. latencies to flee) were substantially shorter under high-contrast conditions (Figure 3E). However, even under the same high-contrast condition, reaction times were significantly shorter in the water condition compared to the no-reward condition, suggesting that other factors such as vigilance may contribute.

      Upward-directed attention includes rearing, up-stretching, and upward head orientation, which will be clarified in the Methods section. To address concerns about statistical validity, we will quantify these behaviors across the first 10 trials rather than limiting the analysis to the first two.

      As for the dominance-related results, we interpret them as reflecting both enhanced vigilance and reduced reward-seeking behavior. Time spent in the reward zone is not a measure of vigilance but an indicator of reward-seeking motivation. We will clarify this in the revised manuscript.

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (e.g., Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water-deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      In Figure 3B, the difference between water and sucrose conditions did not reach statistical significance (p = 0.08). We plan to collect additional data to determine whether this is due to limited statistical power. It is also possible that some behavioral readouts are more sensitive to the differences between water and sucrose conditions. For example, Figure 3F shows that escape speed was significantly higher in the sucrose than in the water condition under high-contrast stimulation.

      Thank you for pointing this out. To control for the potential confounds related to internal state, mice were not water-deprived under any of the three conditions in Figures 3A-3H. We will clarify this in the main text and Methods. For Figures 3I-3M, which compare decision-making under no-reward and water conditions, we will conduct additional experiments using non-deprived mice in the water condition.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100 cm. Presumably, the 20 cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      Hiding latency was defined as the time from stimulus onset to the animal’s arrival at the safe zone. Reaction time was quantified as the latency to flee, measured from stimulus onset to the initiation of the first flight state. The flight state was defined as locomotion exceeding 10 cm at a speed greater than 10 cm/s. Distance fled was defined as the distance covered between stimulus onset and offset for all trials. However, in trials classified as no reaction or freezing, this measure does not accurately reflect escape behavior. We will therefore rename it as distance under threat to better capture its meaning. The reward zone was defined as the region within 15 cm of the reward port at the end of the arena. Duration in the reward zone was measured as the time spent within this region during the 20 seconds following stimulus onset. In Figure 4E, the percentage of time spent in the reward zone was calculated relative to the total time the mouse remained in the arena during the 2-hour social session.
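      The flight-state rule above (locomotion of more than 10 cm at a speed above 10 cm/s) can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: it assumes a tracked x/y position trace in centimetres sampled at a fixed frame rate, and the function name `latency_to_flee` and its arguments are hypothetical. Flight onset is taken as the first frame of the first qualifying fast run at or after stimulus onset.

```python
import numpy as np


def latency_to_flee(x_cm, y_cm, fps, stim_onset_s,
                    min_speed_cm_s=10.0, min_distance_cm=10.0):
    """Return the latency (s) from stimulus onset to flight onset, or None.

    Hypothetical implementation: a flight state is assumed to be a
    contiguous run of frame-to-frame steps faster than min_speed_cm_s
    whose summed path length exceeds min_distance_cm.
    """
    x = np.asarray(x_cm, dtype=float)
    y = np.asarray(y_cm, dtype=float)
    step = np.hypot(np.diff(x), np.diff(y))  # cm moved between consecutive frames
    fast = step * fps > min_speed_cm_s       # does each step exceed the speed threshold?
    i = int(np.ceil(stim_onset_s * fps))     # first frame at or after stimulus onset
    n = len(fast)
    while i < n:
        if not fast[i]:
            i += 1
            continue
        j = i
        while j < n and fast[j]:             # extend the current fast run
            j += 1
        if step[i:j].sum() > min_distance_cm:
            return i / fps - stim_onset_s    # run starts at frame i
        i = j                                # run too short; keep searching
    return None
```

      Hiding latency would follow the same pattern, with the flight criterion replaced by the first frame in which the animal is inside the safe zone.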

      All definitions and additional details on behavioral quantification will be included in the revised Methods section.

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no reward condition, the r parameter is exactly 0. Was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether there are actually any differences in parameters that are statistically significant.

      We appreciate the comment and agree that further clarification is needed. We will provide a more detailed description of the model fitting procedure in the revised Methods section. Specifically, the drift rate parameter (r), which reflects the perceived reward value, was constrained to zero in the no-reward condition. To enable statistical comparison across conditions, we will report uncertainty measures for all fit parameters.
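      To make the effect of fixing r concrete, the Python sketch below simulates a generic leaky drift-diffusion process with Euler-Maruyama steps. It is an illustration under stated assumptions, not the study's model: the drift `threat - r` (including its sign convention), the escape bound, and all function and parameter names are hypothetical, since the exact drift equation is not given here. Setting `r = 0.0` for the no-reward condition simply removes the reward term, so only the threat drive, leak, noise, and bound would remain free in a fit.

```python
import numpy as np


def simulate_leaky_ddm(threat, r, leak, noise, bound, dt=0.01, t_max=10.0,
                       n_trials=1000, seed=0):
    """Simulate a leaky drift-diffusion process for escape decisions.

    Hypothetical drift v = threat - r (reward value r assumed to oppose escape).
    Each step: x += (v - leak * x) * dt + noise * sqrt(dt) * N(0, 1).
    Escape is triggered when x first reaches +bound.
    Returns (probability of escape, mean RT of escape trials or nan).
    """
    rng = np.random.default_rng(seed)
    v = threat - r
    n_steps = int(round(t_max / dt))
    x = np.zeros(n_trials)
    rt = np.full(n_trials, np.nan)          # nan marks trials with no escape
    active = np.ones(n_trials, dtype=bool)  # trials that have not yet escaped
    for k in range(1, n_steps + 1):
        dW = rng.standard_normal(n_trials) * np.sqrt(dt)
        x[active] += (v - leak * x[active]) * dt + noise * dW[active]
        crossed = active & (x >= bound)
        rt[crossed] = k * dt
        active &= ~crossed
        if not active.any():
            break
    p_escape = float(np.mean(~np.isnan(rt)))
    mean_rt = float(np.nanmean(rt)) if p_escape > 0 else float("nan")
    return p_escape, mean_rt


# No-reward condition: r is fixed at 0 rather than fitted.
p_none, rt_none = simulate_leaky_ddm(threat=1.0, r=0.0, leak=0.5, noise=1.0, bound=1.0)
```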

      Comments on the revised manuscript:

      The manuscript has been revised and improved significantly by the addition of methodological details and new analysis. I remain, however, unconvinced by the argument that increased vigilance in the presence of reward leads to heightened escape behaviour.

      In response to my criticism that the work does not measure vigilance directly, the authors have included measures of foraging interval and foraging speed, which they state are "two direct behavioral analyses of vigilance". I disagree - like reaction time, foraging speed and foraging interval can be modulated, for example, by changes in threat sensitivity. Increased threat sensitivity comes with diverse behavioral changes that may well include increased vigilance, but foraging interval and foraging speed can certainly change without the animal expressing increased vigilance behaviors. A bigger issue I still have, though, is with the conclusion that the presence of reward increases "direct escape behaviors". Comparing the no reward, water and sucrose groups indeed shows a difference (which is now clear after the split into early and late phases), but the issue is that these are different mice. As the text is written, it sounds like introducing reward will acutely increase escape. But if we look at the raw data shown in Figure 2C, what I think is happening is that the presence of reward is decreasing habituation to the stimulus. The data for trials 1 and 10 in the three conditions show this - there is habituation with no reward (reaction times are all shifting to the right), a bit less with water and very little with sucrose. This is interesting in its own right and we can speculate why it might be happening, but I think this is conceptually different from what the authors are proposing.

      We agree that vigilance is not directly observable as a single variable. Our intent was not to claim that foraging speed and foraging interval provide a direct measure of vigilance, but rather to suggest that they may serve as indirect behavioral correlates.

      We also considered an alternative interpretation: these two measures could reflect perceived reward value under high-threat conditions across distinct reward types. If that were the case, animals would be expected to exhibit progressively shorter intervals and faster speeds from the no-reward to the water to the sucrose condition. However, our data do not support this interpretation (Figures 3L and 3M), suggesting that these measures are more likely correlated with vigilance.

      Furthermore, it is unlikely that changes in foraging interval and speed are driven by altered threat sensitivity, as animals could not see the threat during most of the foraging bout and only encountered it at the end.

      Regarding the conclusion that the presence of reward increases direct escape behaviors, our interpretation is that increased reward value reduces habituation, thereby maintaining higher vigilance during the late phase. This was discussed in the second-to-last paragraph of the "Economic and social modulations of innate decision-making under threat" subsection in the Discussion.

      Reviewer #3 (Public review):

      Male mice were tested in a classic behavioral "flee the looming stimulus" paradigm. This is a purely behavioral study; no neural analyses were done. Mice were housed socially, but faced the looming stimulus individually, using an elegant automated tunnel (see videos for clarity).

      The additional changes made to the paper clarify the work done. While there are some limitations (male mice, weird stimulus), the general results are interesting and a valuable addition to the experimental literature. The main claim of the paper is that the different rewards (none, water, sucrose) did not change the escape properties early in learning, but did late, particularly that in the late (already experienced) conditions, reward value (assuming sucrose > water > no reward) interacted with the salience of the looming stimulus (light gray, dark gray). (Panels 3D, 3G, 3K, 3N).

      For readers, I want to note that one of the most interesting results is actually in Figure S2, where they find that a looming stimulus behind the mouse still makes a mouse run to the nest. In these conditions, the mouse runs past the looming stimulus to get to safety! (I also do love the video of the mouse running around the barriers like a snake to get home.)

      I have a few minor clarification questions and a few notes that I think would be useful additions for authors and readers to think about.

      Dominance: What does the mouse social science literature say about the "test tube" test? What can we conclude from this test? This would be useful when trying to understand what is causing the dominance/submissive difference in responses. Figure 4 shows that the dominant mice are more risk-averse than the submissive mice. Is "dominance" in the test-tube actually a measure of risk-seeking? Is the issue that the submissive mice don't think they can get back to the food-site easily, so they are less willing to sacrifice the current (if dangerous) foraging opportunity? Is the issue that the submissive mice can't get back to the nest? As I understand it, the nest was always available to all the mice, so I suspect inability to get to the nest is an unlikely hypothesis. Is the issue that the submissive mice also don't feel safe in the nest?

      The tube test is a widely used assay in the rodent social behavior literature to assess dominance hierarchies, operationally defined by the ability of one animal to force its opponent to retreat from a narrow tube. Importantly, this assay does not directly measure risk-seeking or anxiety-related traits, but rather competitive outcomes during social conflict. Furthermore, our data indicate that the behavioral responses of subordinate mice to looming stimuli are primarily driven by the visual threat itself rather than by social avoidance. This point was elaborated in the second paragraph of the “Social modulation of innate decision-making” subsection in the Results section.

      Limitations of the study: There is an acknowledged limitation to male mice, and the limitations of the small data sets that are typical of such experiments. In addition, however, it is also worth noting the strangeness of the looming stimulus, which is revealed clearly in the videos. The stimulus is a repeating growing circle, growing in a single location within the environment. The stimulus repeats 10 times, once per second. This is not what an attacking hawk or owl would look like. (I now have this image of an owl diving down, and then teleporting up and diving down again.) Note - I am fine with this stimulus. It produces an interesting experiment and interesting results. I do not think the authors need to change anything in their paper, but readers need to recognize that this is not a "looming predator".

      These "limitations" are better seen as "caveats" when folding these results in with the rest of the literature that has gone before and the literature to come. (Generally, I do not believe that science works by studies making discoveries that change how we think about problems - instead, science works by studies adding to the literature that we integrate in with the rest of the literature.) Thus, these caveats should not be taken as problems with the study or as fixes that need to be done. Instead, they are notes for future researchers to notice if differences are found in any future studies.

      Thus, my only suggestion is that I think the authors could write a more careful paper by using the past and subjunctive tenses appropriately. Experimental observations should be in past tense, as in "the influence of reward was context-dependent and emerged in the late phase" instead of "the influence of reward is context-dependent and emerges in the late phase" - it emerged in the late phase this once - it might not in future experiments, not due to any fault in this experiment nor due to replicability problems, but rather due to unexpected differences between this and those future experiments. At that point, it will be up to those future experiments to determine the difference. Similarly, large conclusions should be in the subjunctive tense, as in "these data suggest that threat intensity is likely to be the primary determinant of decision making" rather than "threat intensity is the primary determinant of decision making", because those are hypotheses, not facts.

      We thank the reviewer for the helpful suggestions and have revised the Abstract accordingly.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how mice make defensive decisions when exposed to visual threats and how those decisions are influenced by reward value and social hierarchy. Using a naturalistic foraging setup and looming stimuli, the authors show that higher threat leads to faster escape, while lower threat allows mice to weigh reward value. Dominant mice behave more cautiously, showing higher vigilance. The behavioral findings are further supported by a computational model aimed at capturing how different factors shape decisions.

      Strengths:

      (1) The behavioral paradigm is well-designed and ethologically relevant, capturing instinctive responses in a controlled setting.

      (2) The paper addresses an important question: how defensive behaviors are influenced by social and value-based factors.

      (3) The classification of behavioral responses using machine learning is a solid methodological choice that improves reproducibility.

      Weaknesses:

      (1) Key parts of the methods are hard to follow, especially how trials are selected and whether learning across trials is fully controlled for. For example, it is unclear whether animals are in the nest during the looming stimulus presentations. The main text and methods should clarify whether multiple mice are in the nest simultaneously and whether only one mouse is in the arena during looming exposure. From the description, it seems that all mice may be freely exploring during some phases, but only one is allowed in the arena at a time during stimulus presentation. This point is important for understanding the social context and potential interactions, and should be clearly explained in both the main text and methods.

      We agree that these details are essential and have clarified them in the Methods. When the door system operated normally, only one mouse was allowed in the arena during looming exposure. Specifically, when all mice were in the nest, the nest-tunnel door was open and the tunnel-arena door was closed. Once a single mouse entered the tunnel, as detected by an OpenMV camera, the nest-tunnel door closed and the tunnel-arena door opened, ensuring that only that mouse could enter the arena.

      Habituation was conducted over two days. On day 1, five mice were placed together in the nest for 30 minutes with all doors closed. Each mouse was then placed individually in the nest and allowed to freely explore the arena for 10 minutes under normal door operation. Finally, all mice were returned to the nest with all doors open and allowed to explore freely for 2 hours. On day 2, each mouse was placed individually in the nest and given an additional 1 hour of exploration under normal door operation.

      (2) It is often unclear whether the data shown (especially in the main summary figures) come from the first trial or are averages across several exposures. When is the cut-off for trials of each animal? How do we know how many trial presentations were considered, and how learning at different rates between individuals is taken into account when plotting all animals together? This is important because the looming stimulus is learned to be harmless very quickly, so the trial number strongly affects interpretation.

      We observed substantial inter-individual variability in habituation to looming stimuli, with a sharp decline in defensive responses over the first few trials followed by more gradual changes. To account for this, we segmented trials for each animal into two phases: an early rapid-habituation phase and a later stable phase. Analyzing these phases separately revealed that threat intensity dominates behavior in the early phase, whereas both threat and reward significantly influence behavior in the late phase. These results are now presented in revised Figures 2 and 3. Analyses restricted to first trials are included in Figure S5.
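      The criterion used to segment each animal's trials is not stated in this response. One simple way to split a habituation curve into an early and a late phase, assumed here purely for illustration, is a two-segment piecewise-linear fit: try every breakpoint and keep the one that minimises the summed squared error of independent straight-line fits to the two segments. The function name `split_phases` and its `min_len` argument are hypothetical.

```python
import numpy as np


def split_phases(response, min_len=2):
    """Return breakpoint k: trials [0, k) form the early phase, [k, n) the late phase.

    Hypothetical rule: fit an independent straight line to each segment and
    choose the k that minimises the total squared error. Returns None when
    there are too few trials for two segments of at least min_len trials.
    """
    y = np.asarray(response, dtype=float)
    t = np.arange(len(y), dtype=float)

    def sse(ts, ys):
        coef = np.polyfit(ts, ys, 1)  # least-squares line for this segment
        return float(np.sum((ys - np.polyval(coef, ts)) ** 2))

    best_k, best_err = None, np.inf
    for k in range(min_len, len(y) - min_len + 1):
        err = sse(t[:k], y[:k]) + sse(t[k:], y[k:])
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

      A breakpoint found this way is per animal, which matches the idea of segmenting trials by individual habituation rate; other change-point criteria would be equally defensible.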

      (3) The reward-related effects are difficult to interpret without a clearer separation of learning vs first responses.

      As noted above, we have re-analyzed our data to account for learning effects.

      (4) The model reproduces observed patterns but adds limited explanatory or predictive power. It does not integrate major findings like social hierarchy. Its impact would be greatly improved if the authors used it to predict outcomes under novel or intermediate conditions.

      We have substantially revised the modeling analysis. The model is now fitted to behavioral data from the late phase and used to predict outcomes across additional conditions, including the early phase behavior and rank-dependent behavioral differences. The model successfully captures behavioral patterns across these conditions, supporting its predictive value beyond descriptive fitting.

      (5) Some conclusions (e.g., about vigilance increasing with reward) are counterintuitive and need stronger support or alternative explanations. Regarding the interpretation of social differences in area coverage, it's also possible that the observed behavioral differences reflect access to the nesting space. Dominant mice may control the nest, forcing subordinates to remain in the open arena even during or after looming stimuli. In this case, subordinates may be choosing between the threat of the dominant mouse and the external visual threat. The current data do not distinguish between these possibilities, and the authors do not provide evidence to support one interpretation over the other. Including this alternative explanation or providing data that addresses it would strengthen the conclusions.

      To support the interpretation of increased vigilance with reward under high-threat conditions, we analyzed additional behavioral measures beyond latency to flee. Rewarded mice showed longer foraging intervals and slower foraging speeds, both consistent with elevated vigilance (Figures 3L and 3M).

      To address the alternative explanation that subordinate mice may remain in the arena due to restricted nest access, we compared arena occupancy before, during, and after looming exposure. Although subordinates spent more time in the arena before looming, this difference disappeared during and after looming exposure (Figure 4C). Moreover, dominant and subordinate mice were equally likely to flee to the nest during escape trials. These findings rule out nest access restrictions as an explanation for the observed rank-dependent differences in defensive behaviors.

      (6) While potential neural circuits are mentioned in the discussion, an earlier introduction of candidate brain regions and their relevance to threat and value processing would help ground the study in existing systems neuroscience.

      We have revised the Introduction to incorporate relevant brain regions and neural circuits.

      (7) Some figures are difficult to interpret without clearer trial/mouse labeling, and a few claims in the text are stronger than what the data fully support. Figure 3H is done for low contrast, but it would be more informative to do this experiment with high contrast. Figure 4H - I don't understand this part. If the amount of time in the center after the loom changes for subordinate mice, how does this lead to the conclusion that they spend most of their time in the reward zone? Figure 3A - The example shown does not seem representative of the claim that high contrast stimuli are more likely to trigger escape. In particular, the 10% sucrose condition appears to show more arena visits under low contrast than high contrast, which seems to contradict that interpretation. Also, the plot currently uses trials on the Y-axis, but it would be more informative to show one line per animal, using only the first trial for each. This would help separate initial threat responses from learning effects and clarify individual variability.

      We have substantially revised the figures. Results from trial segmentation based on individual habituation are now explicitly presented in Figures 2 and 3, and analyses using only the first trials are provided in Figure S5 to separate initial responses from learning effects.

      Regarding the original Figure 4H, we are not entirely certain about the concern. In this panel, we measured time spent in the reward zone, which is defined as the region within 10 cm of the reward port at the end of the arena, not the center of the arena, during looming exposure. Subordinate mice spent significantly more time in the reward zone than dominant mice. We have further clarified this in the revised manuscript.

      (8) The analysis does not explore individual variability in behavior, which could be an important source of structure in the data. Without this, it is difficult to know whether social hierarchy alone explains behavioral differences or if other stable traits (e.g., anxiety level, prior experiences) also contribute.

      We observed substantial individual variability in both dominant and subordinate mice, even on the first trial (Figure S7). Paired dominant–subordinate comparisons were used to isolate rank-dependent effects.

      (9) The study shows robust looming responses in group-housed animals, which contrasts with other studies that often require single housing to elicit reliable defensive responses. It would be valuable for the authors to discuss why their results differ in this regard and whether housing conditions might interact with social rank or habituation.

      Robust looming-evoked defensive responses have been reported in both group- and single-housed mice (Yilmaz and Meister, 2013; Lenzi et al., 2022), although single-housed mice habituate more rapidly. We have now discussed the potential interactions between housing conditions, social rank, and habituation in defensive behaviors in the revised manuscript.

      Reviewer #2 (Public review):

      Zhe Li and colleagues investigate how mice exposed to visual threats and rewards balance their decisions in favour of consuming rewards or engaging in defensive actions. By varying threat intensity and reward value, they first confirm previous findings showing that defensive responses increase with threat intensity and that there is habituation to the threat stimulus. They then find that water-deprived mice have a reduced probability of escaping from low contrast visual looming stimuli when water or sucrose are offered in the environment, but that when the stimulus contrast is high, the presence of sucrose or water increases the probability of escape. By analysing behaviour metrics such as the latency to flee from the threat stimulus, they suggest that this increase in threat sensitivity is due to increased vigilance. Analysis of this behaviour as a function of social hierarchy shows that dominant mice have higher threat sensitivity, which is also interpreted as being due to increased vigilance. These results are captured by a drift diffusion model variant that incorporates threat intensity and reward value.

      The main contribution of this work is to quantify how the presence of water or sucrose in water-deprived mice affects escape behaviour. The differential effects of reward between the low and high contrast conditions are intriguing, but I find that the interpretation that vigilance plays a major role in this process is not supported by the data. The idea that reward value exerts some form of graded modulation of the escape response is also not supported by the data. In addition, there is very limited methodological information, which makes assessing the quality of some of the analyses difficult, and there is no quantification of the quality of the model fits.

      (1) The main measure of vigilance in this work is reaction time. While reaction time can indeed be affected by vigilance, reaction times can vary as a function of many variables, and be different for the same level of vigilance. For example, a primate performing the random dot motion task exhibits differences in reaction times that can be explained entirely by the stimulus strength. Reaction time is therefore not a sound measure of vigilance, and if a goal of this work is to investigate this parameter, then it should be measured. There is some attempt at doing this for a subset of the data in Figure 3H, by looking at differences in the action of monitoring the visual field (presumably a rearing motion, though this is not described) between the first and second trials in the presence of sucrose. I find this an extremely contrived measure. What is the rationale for analysing only the difference between the first and second trials? Also, the results are only statistically significant because the first trial in the sucrose condition happens to have zero up action bouts, in contrast to all other conditions. I am afraid that the statistics are not solid here. When analysing the effects of dominance, a vigilance metric is the time spent in the reward zone. Why is this a measure of vigilance? More generally, measuring vigilance of threats in mice requires monitoring the position of the eyes, which previous work has shown is biased to the upper visual field, consistent with the threat ecology of rodents.

      We agree that reaction time can be influenced by multiple factors, including stimulus strength. Consistent with this, reaction times (i.e. latencies to flee) were substantially shorter under high-contrast conditions. However, even under the same high-contrast condition, reaction times were significantly shorter in the reward conditions compared to the no-reward condition, suggesting that other factors such as vigilance may contribute.

      Regarding the measurement of vigilance, in addition to the latency to flee, we analyzed two additional behavioral measures related to vigilance. First, we examined the foraging interval. Our hypothesis was that more vigilant animals would wait longer before re-entering the reward zone following threat exposure. Consistent with this prediction, mice under sucrose and water reward conditions showed significantly longer foraging intervals than those under no-reward conditions (Figure 3L). Second, we analyzed the foraging speed as mice approached the reward. Increased vigilance should lead to more cautious and therefore slower movements. Our results support this, as mice moved more slowly towards the reward under sucrose conditions (Figure 3M). Taken together, these three measures consistently indicate that mice exhibit increased vigilance under sucrose reward in high-threat conditions.
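      Both additional measures could be computed from a one-dimensional trace of the animal's distance to the reward port. The Python sketch below is illustrative only: it assumes the foraging interval runs from stimulus offset to the next entry into the reward zone (taken as within 10 cm of the port, as in the reward-zone definition given elsewhere in this response), and that foraging speed is the mean rate of approach over a fixed window before that entry. The window length, the use of stimulus offset as the reference time, the function name `foraging_metrics`, and its arguments are hypothetical choices, not the authors' definitions.

```python
import numpy as np


def foraging_metrics(dist_to_port_cm, fps, stim_offset_s,
                     zone_cm=10.0, window_s=1.0):
    """Return (foraging_interval_s, approach_speed_cm_s), or (None, None).

    foraging interval: time from stimulus offset to the next entry into the
    reward zone (distance to port <= zone_cm).
    approach speed: mean rate at which distance to the port closes over the
    window_s seconds before that entry (hypothetical window).
    """
    d = np.asarray(dist_to_port_cm, dtype=float)
    start = int(np.ceil(stim_offset_s * fps))
    in_zone = d <= zone_cm
    # An entry is a frame inside the zone whose previous frame was outside it.
    entries = np.flatnonzero(in_zone[1:] & ~in_zone[:-1]) + 1
    entries = entries[entries >= start]
    if entries.size == 0:
        return None, None
    entry = int(entries[0])
    interval = entry / fps - stim_offset_s
    w = max(1, int(round(window_s * fps)))
    first = max(start, entry - w)
    if entry == first:
        return interval, None  # no frames available to estimate speed
    speed = (d[first] - d[entry]) / ((entry - first) / fps)
    return interval, speed
```

      Under these assumptions, a more vigilant animal would show a larger first value and a smaller second value.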

      (2) In both low and high contrast conditions, there are differences in escape behaviour between no reward and water or sucrose presence, but no statistically significant differences between water and sucrose (eg, Figure 3B). I therefore find that statements about reward value are not supported by the data, which only show differences between the presence or absence of reward. Furthermore, there is a confound in these experiments, because according to the methods, mice in the no-reward condition were not water deprived. It is thus possible that the differences in behaviour arise from differences in the underlying state.

      Our new analysis, which segments behavior into an early adaptive phase and a late stable phase, reveals a statistically significant difference between water and sucrose rewards in the late phase (Figure 3H), supporting a graded effect of reward value.

      To control for the potential confounds related to internal state, mice were not water-deprived in any reward condition. We have clarified this in the revised manuscript.

      (3) There is very little methodological information on behavioural quantification. For example, what is hiding latency? Is this the same as reaction time? Time to reach the safe zone? What exactly is distance fled? I don't understand how this can vary between 20 and 100 cm. Presumably, the 20 cm flights don't reach the safe place, since the threat is roughly at the same location for each trial? How is the end of a flight determined? How is duration measured in reward zone measures, e.g., from when to when? How is fleeing onset determined?

      Hiding latency was defined as the time from stimulus onset to the animal’s arrival at the safe zone. Reaction time was quantified as the latency to flee, measured from stimulus onset to the initiation of the first flight state. The flight state was defined as locomotion exceeding 10 cm at a speed greater than 10 cm/s. Distance fled was defined as the distance covered between stimulus onset and offset for all trials. However, in trials classified as no reaction or freezing, this measure does not accurately reflect escape behavior. We will therefore rename it as distance under threat to better capture its meaning. The reward zone was defined as the region within 10 cm of the reward port at the end of the arena. Duration in the reward zone was measured as the time spent within this region during the 20 seconds following stimulus onset. In Figure 4E, the percentage of time spent in the reward zone was calculated relative to the total time the mouse remained in the arena during the 2-hour social session.
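      The two reward-zone measures in the paragraph above reduce to counting video frames. The Python sketch below assumes a per-frame distance-to-port trace in centimetres and, for the session-level percentage, a per-frame boolean marking whether the mouse is in the arena; the function names and arguments are hypothetical, and the defaults mirror the 10 cm zone and 20 s window stated above.

```python
import numpy as np


def reward_zone_time(dist_to_port_cm, fps, stim_onset_s,
                     zone_cm=10.0, window_s=20.0):
    """Seconds spent within zone_cm of the port in the window_s after onset."""
    d = np.asarray(dist_to_port_cm, dtype=float)
    start = int(np.ceil(stim_onset_s * fps))
    stop = min(len(d), start + int(round(window_s * fps)))
    return np.count_nonzero(d[start:stop] <= zone_cm) / fps


def percent_time_in_zone(dist_to_port_cm, in_arena, zone_cm=10.0):
    """Percentage of in-arena frames spent in the reward zone."""
    d = np.asarray(dist_to_port_cm, dtype=float)
    arena = np.asarray(in_arena, dtype=bool)
    n_arena = np.count_nonzero(arena)
    if n_arena == 0:
        return 0.0  # mouse never entered the arena
    return 100.0 * np.count_nonzero(arena & (d <= zone_cm)) / n_arena
```

      Normalising by in-arena frames rather than total session frames keeps the percentage comparable between mice that spend different amounts of time in the nest.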

      All definitions and additional details on behavioral quantification have been included in the revised Methods section.

      (4) There is little methodological information on how the model was fit (for example, it is surprising that in the no reward condition, the r parameter is exactly 0. Was this constrained in any way?), and none of the fit parameters have uncertainty measures, so it is not possible to assess whether there are actually any differences in parameters that are statistically significant.

      We have provided a detailed description of the model fitting procedure in the revised Methods section. Specifically, the reward-value parameter (r) was constrained to zero in the no-reward condition. We have plotted how the overall loss varies with different parameters (Figure S9).

      Reviewer #3 (Public review):

      Male mice were tested in a classic behavioral "flee the looming stimulus" paradigm. This is a purely behavioral study; no neural analyses were done. Mice were housed socially, but faced the looming stimulus individually. Drift-diffusion modeling found that reward-level interacted with threat level such that at low-threat levels, reward contrasted with threat as classically expected (high reward overwhelms low threat, low threat overwhelms low reward), but that reward aligned with threat at higher threat levels.

      Note that they define threat level by the darkness of the looming stimulus. I am not sure that darker stimuli are more threatening to mice. But maybe. Figure 3 shows that mice react more quickly to high-contrast looming stimuli, but can the authors distinguish between the ability to detect the visual signal and considering it a more dangerous threat? (The fact that vigilance makes a difference in the high-contrast condition, not the low-contrast condition, actually supports the authors' hypotheses here.)

      Regarding the interpretation of stimulus contrast as a proxy for threat level, we agree it is crucial to distinguish improved detection from heightened threat perception. To address this, we examined not only latency to flee but also escape distance and peak escape speed, two measures that reflect the intensity of the defensive response. If contrast only influenced detection, we would expect differences in latency but not in escape distance or speed. All three measures differed significantly across contrast conditions, supporting the interpretation that high-contrast stimuli are perceived as more threatening rather than simply more detectable. Furthermore, manual review of "no response" trials confirmed reliable detection in both conditions, with only three potential "missed" trials out of 117 under low contrast (Figure S3B). We have included this discussion in the revised manuscript.

      The drift-diffusion model (DDM) is fine. I note that the authors included a "leakage rate", which is not a standard DDM parameter (although I like including it). I would have liked to see more about the parameters. What were the distributions? What did the parameters correlate with behaviorally? I would have liked to see distributions of the parameters under the different conditions and different animals. Figure 2C shows the progression of learning. How do the fit parameters change over time as mice shift from choice to choice? How do the parameters change over mice? How do the parameters change over distance to the threat/distance to safety (as per Fanselow and Lester 1988)? They did a supplemental experiment where the threat arrived halfway along the corridor - we could get a lot more detail about that experiment - how did it change the modeling?

      Because our model is fit to the variance of latency distributions, it cannot be applied to single-trial data. Instead, we analyzed how decisions and latencies vary as functions of the fitted threat gain and reward value parameters (Figures 5G and 5H). We have also introduced a simplified deterministic model to further elucidate the decision-making process.
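
      Because the exchange above turns on the leakage term and on why the model is fit to distributions rather than single trials, a generic leaky drift-diffusion process may help readers. A minimal sketch, not the authors' model (whose drift combines threat gain, reward value r, and other terms): evidence `x` drifts upward at rate `drift`, decays toward zero at rate `leak`, receives Gaussian noise of scale `noise`, and the first crossing of `threshold` is read out as an escape latency. All parameter names and the single-bound readout are illustrative assumptions.

```python
import numpy as np


def simulate_leaky_ddm(drift, leak, noise, threshold, dt=0.001, t_max=10.0, rng=None):
    """Euler-Maruyama simulation of dx = (drift - leak * x) dt + noise dW, x(0) = 0.

    Returns the first time at which x reaches `threshold`, or None if it does
    not do so within t_max (read here as "no escape").
    """
    rng = np.random.default_rng() if rng is None else rng
    x = 0.0
    sqrt_dt = np.sqrt(dt)
    for k in range(1, int(round(t_max / dt)) + 1):
        x += (drift - leak * x) * dt + noise * sqrt_dt * rng.standard_normal()
        if x >= threshold:
            return k * dt
    return None
```

With `noise=0`, the process settles at `drift / leak`, so it only reaches the bound when `drift / leak > threshold`; with noise, repeated trials produce a latency distribution (plus a fraction of `None` trials). That distribution, rather than any single simulated trial, is what a distribution-level fit compares against data.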

      Regarding the influence of distance to the threat, we conducted additional experiments, presenting the looming stimulus at the end of the arena when the mouse was at different distances from it (Figures S2C–G). We found that as the prey-threat distance increased, mice showed less direct escape behavior, with longer latencies to flee and slower escape speeds. This is consistent with the predatory imminence continuum theory (Fanselow and Lester, 1988), which describes graded defensive behaviors tuned to perceived threat level.

      Regarding the influence of distance to safety, our data indicate that it did not significantly affect defensive responses (Figures S2H and S2I). To test this further, we introduced barriers that lengthened the return path to the safe zone. We found that defensive decisions were not correlated with the distance to the safe zone (Figures S2J and S2K), suggesting that once a threat is detected, animals prioritize escape initiation over evaluating the exact path to safety.

      Overall, this is a reasonable study showing mostly unsurprising results. I think the authors could do more to connect the vigilance question to their results (which seems somewhat new to me).

      We have expanded our analysis of vigilance. In addition to escape latency, we examined the foraging interval and foraging speed. We hypothesized that more vigilant animals would wait longer before re-entering the reward zone following a threat and would approach the reward more slowly. Consistent with this prediction, mice in the sucrose- and water-reward conditions exhibited significantly longer foraging intervals and slower foraging speeds compared to those in the no-reward condition (Figures 3M and 3N). Together, these three measures consistently demonstrate that mice display heightened vigilance under high-threat, high-reward conditions.

      Although the data appear generally fine and the modeling reasonable, the authors do not do the necessary work to set themselves within the extensive literature on decision-making in mice retreating from threats.

      First of all, this is not a new paradigm; variants of this paradigm have been used since at least the 1980s. There is an *extensive* literature on this, including extensive theoretical work on the relation of fear and other motivational factors. I recommend starting with the classic Fanselow and Lester 1988 paper (which they cite, but only in passing), and the reviews by Dean Mobbs and Jeansok Kim, and by Denis Paré and Greg Quirk, which have explicit theoretical proposals that the authors can compare their results to. I would also recommend that the authors look into the "active avoidance" literature. Moreover, to talk about a mouse running from a looming stimulus without addressing the other "flee the predator" tasks is to miss a huge space for understanding their results. Again, I would start with the reviews above, but also strongly urge the authors to look at the Robogator task (work by June-Seek Choi and Jeansok Kim, work by Denis Paré, and others).

      Similarly, in their anatomical review, they do not mention the amygdala. Given the extensive literature on the role of the amygdala in retreating from danger, both in terms of active avoidance and in terms of encoding the danger itself, it would surprise me greatly if this behavior does not involve amygdala processing. (If there is evidence that the amygdala does not play a role here, but that the superior colliculus does, then that would be a *very* important result that needs to be folded into our understanding of decision-making systems and neural computational processing.)

      Second, there is an extensive economic literature on non-human animals in general and on rodents in particular. Again, the authors seem unaware of this work, which would provide them with important data and theories to broaden the impact of their results (by placing them within the literature). First, there are explicit economic literatures in terms of positively-valenced conflicts (e.g., neuroeconomics within the primate literature, sequential foraging and delay-discounting tasks within the rodent literature), but also there is a long history within the rodent conditioning world, such as the classic work by Len Green and Peter Shizgal. I would strongly urge the authors to explore the motivational conflict literature by people like Gavin McNally, Greg Quirk, and Mark Andermann. Again, putting their results into this literature will increase the impact of their experiment and modeling.

      We have substantially revised the manuscript to contextualize our findings within the extensive literature on defensive behavior and decision-making. The revised Introduction and Discussion now integrate key theoretical frameworks, such as the predatory imminence continuum, and cite relevant work on active avoidance and other "flee the predator" paradigms (e.g., the Robogator task).

      We have also incorporated perspectives from neuroeconomics and motivational conflict, including literature on sequential foraging, delay-discounting tasks, and relevant rodent studies. Furthermore, we now discuss the potential contributions of specific brain regions, including the superior colliculus and the amygdala, to the economic and social modulation of innate defensive decisions in response to visual threats.

      Recommendations for the authors:

      Reviewing Editor Comments:

      These additional recommendations are generally consistent and overlapping across reviewers, particularly Reviewer #1 and 2, so it is advisable to undertake these changes/additions.

      Reviewer #1 (Recommendations for the authors):

      (1) Experimental methods and trial structure need clarification: It is often unclear how many trials were included per condition, per mouse, and whether the key behavioral effects (especially reward-related changes) were observed early in the session or after repeated stimulus exposure. For example, in several reward-related plots (e.g., Figure 3), it is not specified whether results are driven by early or later trials. Since the authors themselves report rapid learning of the looming stimulus (habituation), it is critical to state how many trials were included in each comparison, and to analyze whether effects hold on the first exposure and not the rest. Otherwise, conclusions about value-based behavior are hard to separate from learning effects, which may also differ between individuals. Specifically, the methods section is vague and hard to follow.

      We have substantially expanded the Methods section with additional details to improve clarity.

      To account for individual variability in habituation to the looming stimulus, we segmented trials for each animal into early and late phases. We demonstrate that threat level is the dominant factor driving behavioral responses in the early phase, while both threat level and reward condition shape behavior in the late phase. We have substantially revised Figures 2 and 3 to reflect these changes.

      (2) Add a summary of experimental design: A table or schematic summarizing the trial structure, experimental groups, reward/threat conditions, and the timeline of exposures would greatly improve clarity.

      We have added a schematic to Figure 2 summarizing the trial structure, experimental groups, reward and threat conditions, and the overall timeline.

      (3) Replot key results using only the first trial per mouse: This would allow readers to assess the first (not learned) responses and help control for habituation/suppression.

      We have replotted behavioral results using only the first trial from each mouse and included these analyses in Figure S5. These results confirm that threat level is the dominant factor driving the initial response to looming stimuli.

      (4) The model needs stronger justification and predictive value: As it stands, the model primarily fits the existing data and does not offer new insights beyond what is already evident from the behavioral results.

      Important findings, such as social hierarchy effects and habituation dynamics, are not captured in the model, reducing its relevance to the full dataset.

      The drift-diffusion framework is widely used, and in this implementation appears to have been adjusted post hoc to fit the observed data rather than generating new conceptual advances. No comparison with simpler models is included. Without testing simpler or alternative models, it is not clear whether the added complexity is necessary or justified.

      Use the model to generate and test predictions: to increase the model's contribution, the authors could simulate new conditions. Suggested experiments include:

      a) Predicting escape probability and latency at intermediate threat intensities to test whether behavior shifts gradually or abruptly.

      b) Using the model's habituation parameters to predict changes in escape behavior over repeated exposures.

      c) Adjusting vigilance or threat gain parameters to simulate dominant versus subordinate animals, and comparing model predictions to actual behavioral differences based on social rank.

      We have substantially revised the modeling section to address these concerns. The updated model is now fitted to behavioral data from the late phase of the reward–threat experiments and used to generate predictions for the early phase and for rank-dependent behavioral differences.

      The model accurately captures behavioral patterns across these conditions, demonstrating predictive power beyond descriptive fitting. Because the model is now fitted to late-phase data, we have removed the habituation component. Furthermore, we have introduced a simplified deterministic model in the revised manuscript to further clarify the decision-making process.

      (5) Clarify housing and arena access conditions: It is unclear from the text whether all mice are in the nest during looming presentations and whether only one mouse is in the arena during the stimulus. This is important for understanding the social context of each trial and should be explained in the main text and methods.

      We have clarified this point in the Methods section. Under normal door operation, only one mouse was allowed in the arena during looming exposure. Specifically, when all mice were in the nest, the nest-tunnel door was open and the tunnel-arena door was closed. Once a single mouse entered the tunnel, as detected by an OpenMV camera, the nest-tunnel door closed and the tunnel-arena door opened, ensuring that only that mouse could enter the arena.
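
      The gating rule above amounts to a two-state controller. A sketch, assuming the camera-based detection is abstracted into a count of mice currently in the tunnel; `DoorController` and its method names are hypothetical, and resetting once all mice are back in the nest is an assumption about how the cycle restarts, since the text does not describe the return phase.

```python
from dataclasses import dataclass


@dataclass
class DoorController:
    """Hypothetical two-door gate between nest, tunnel, and arena."""

    nest_tunnel_open: bool = True     # resting state: mice can enter the tunnel
    tunnel_arena_open: bool = False   # resting state: arena is closed

    def on_tunnel_occupancy(self, n_mice_in_tunnel: int) -> None:
        # Exactly one mouse in the tunnel: shut it off from the nest, then admit it.
        # With two or more mice detected, nothing changes, so the arena stays closed.
        if n_mice_in_tunnel == 1 and self.nest_tunnel_open:
            self.nest_tunnel_open = False
            self.tunnel_arena_open = True

    def on_all_mice_in_nest(self) -> None:
        # Assumed reset to the resting configuration.
        self.nest_tunnel_open = True
        self.tunnel_arena_open = False
```

In this sketch, the two doors are never open at the same time, which is what guarantees that a single mouse faces the looming stimulus.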

      (6) Alternative interpretation of subordinate behavior: differences in area coverage and time in the reward zone may not reflect reduced vigilance, but rather avoidance of dominant mice. Subordinates may remain in the open arena to avoid conflict. The authors do not provide evidence distinguishing between these interpretations, and this should be addressed.

      To address the alternative explanation that subordinate mice may remain in the arena due to restricted nest access, we compared arena occupancy before, during, and after looming exposure (Figure 4C). Before looming exposure, subordinate mice spent significantly more time in the arena, consistent with the idea that they may perceive a social threat from the dominant mouse in the absence of any external threat. However, this difference disappeared during and after looming exposure. This shift suggests that the presence of an external threat alters the social dynamic, reducing the influence of dominance on nest access.

      To further assess whether dominant mice blocked subordinate access to the nest during threat-driven escapes, we analyzed the fraction of escape trials in which mice returned to the nest (Figure 4D). We found no significant difference between dominant and subordinate mice, indicating that dominant mice did not restrict nest access during these trials. Importantly, rank differences in reward-zone occupancy cannot be explained by nest exclusion, as mice do not need to return to the nest when escaping the threat; they can flee directly to the safe zone. Thus, nest access limitations do not account for the observed rank-dependent patterns.

      We agree with the reviewer that reward-zone occupancy should not be interpreted as reduced vigilance in subordinate mice; instead, it likely reflects higher perceived reward value. The manuscript has been revised accordingly.

      (7) Address why robust looming responses were observed in group-housed mice: previous studies often require single housing to elicit strong defensive responses. The authors should explain why their setup yields robust results in group-housed animals and whether housing conditions may interact with dominance or habituation.

      Looming exposure elicits robust defensive behaviors in both group- and single-housed mice (Yilmaz and Meister, 2013; Lenzi et al., 2022), with single-housed animals habituating more quickly to the stimulus (Lenzi et al., 2022). In the revised manuscript, we now discuss how housing conditions may interact with social rank and habituation to shape defensive behaviors.

      For the social-rank experiments, we intentionally co-housed dominant and subordinate mice to maintain a stable hierarchy. This choice was motivated by two considerations. First, our goal was to investigate how social rank modulates defensive responses under ethologically relevant conditions, where mice naturally live in groups. Single housing would remove this social context. Second, singly housing mice can destabilize or eliminate rank relationships, making it difficult to interpret rank-dependent behavioral differences.

      (8) Add analysis of individual variability: trial-by-trial variability or stable behavioral tendencies in individual animals are not explored. This could explain part of the variation currently attributed to social rank.

      We have analyzed individual variability in both dominant and subordinate mice. We observed substantial variability across all behavioral measurements for each group (Figure S7). To attribute the observed behavioral differences to social hierarchy rather than to other individual traits, we conducted paired comparisons between dominant and subordinate mice (Figure 4).

      (9) Improve figure labeling and readability: some plots are ambiguous as to whether rows represent trials or animals. Overlapping points obscure the data in several figures (for example, Figure 3H: is sucrose n=4?); consider using jittered scatter plots, boxplots, or individual traces to improve clarity. Also, the y-axis label in the same figure is missing an 'e'.

      We have revised figures to improve clarity and corrected the typos.

      (10) Avoid overinterpretation of causal explanations: Statements such as "reward increases vigilance due to evolutionary pressure" or that "subordinates are less vigilant" go beyond what the current data can demonstrate and should be rephrased more cautiously.

      We have revised the manuscript to tone down these statements.

      Reviewer #2 (Recommendations for the authors):

      (1) Provide much more extensive methodological details on analyses and model fitting

      We have thoroughly revised the Methods section to provide extensive detail on both behavioral analyses and computational modeling, as outlined in our responses to points (3) and (4) of the Public Review.

      (2) Perform experiments or analyses that directly measure vigilance, if vigilance is to remain as a key explanation for the data.

      As detailed in our response to point (1) of the Public Review, we have supplemented the escape latency measure with two direct behavioral analyses of vigilance: foraging interval and foraging speed. This multi-metric approach robustly supports the interpretation of heightened vigilance.

      (3) Provide extra evidence for an effect of reward value, as opposed to the presence or absence of reward. Control for differences arising from the water deprivation state by performing the no reward condition experiments in water-deprived mice.

      All behavioral data in the reward–threat experiment were collected from normal (non-deprived) mice (Figures 2 and 3), as we now state in the revised manuscript. We have reanalyzed the data by segmenting trials into early and late phases for each animal. In the late phase, under low-threat conditions, the effect of reward value is reflected in significant differences between water and sucrose in escape distance and time spent in the reward zone (Figures 3I and 3J). Under high-threat conditions, the reward value effect is reflected in significant differences in latency to flee and peak escape speed (Figures 3K and 3N).

      (4) Using drift rate to describe the "r" variable is confusing because the drift rate of the drift diffusion process is also determined by terms alpha, beta, and h-terms.

      We now refer to “r” as the reward value in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) I would tone down some of the extreme statements about the problems of previous experiments (such as that most decision-making is on 2AFC). Lots of people do decision-making in serial foraging, fleeing, and other behavioral tasks. The classic Morris water maze or Barnes maze are decision-making tasks that aren't 2AFC. Serial foraging tasks, such as the Restaurant Row task, aren't 2AFC. And, actually, lots of mouse behavior tasks are deciding when to stop on a treadmill for a reward. And, for that matter, your task isn't all that "realistic": mice aren't evolved to flee looming disks, they are evolved to flee hawks and owls. This doesn't invalidate your task at all. I just recommend making it about your work in a positive way rather than others in a negative way.

      We have revised the manuscript to adopt a more positive framing of our work.

      (2) I also don't think there's much use in bringing in crayfish in a mouse task. Spend your time connecting to the other rodent data (mice and rats) instead.

      We agree and have revised the manuscript accordingly, focusing our discussion on relevant rodent literature to provide a more appropriate context for our findings.

      Minor concerns:

      (1) The authors use the term "cognitive control" without making clear what they mean. In general, the authors seem to have a view on decision-making as either being "reflexes" or "cognitive control". This is a very outdated perspective. Modern perspectives include multiple decision-making systems competing, separating these based on their computational properties, such as planning, procedural, instinctual, and, yes, reflexive. Current views on the kinds of behaviors they are discussing generally see fleeing as a transition from reflexive (tonic immobility, freezing) and instinctual responses (freezing, fleeing) to deliberative (anxiety) and procedural (habit). The authors might take a look at the recent Calvin and Redish (2025) paper for some ideas on this.

      We appreciate the reviewer’s insight regarding the term “cognitive control.” In our study, we used this term to emphasize that defensive responses to looming threats are not purely reflexive. Mice exhibit four distinct types of defensive decisions within a short time window, and these decisions are systematically modulated by reward value and social rank. Notably, reward modulation is bidirectional: high reward suppresses defensive responses under low-threat conditions but enhances them under high-threat conditions, indicating that animals integrate multiple sources of information rather than relying solely on instinctive mechanisms.

      We did not observe mid-trajectory aborts in mice, as reported in rats by Calvin & Redish (2025). This difference may reflect species-specific behavior or the nature of the threat: our looming stimulus is purely visual and non-harmful, whereas the robotic predator in their study presents a physical threat. We have revised the Discussion to clarify our use of “cognitive control” and to incorporate these perspectives.

      (2) Only male mice were used. This limits the conclusions that can be drawn.

      We acknowledge the limitation of using only male mice and have discussed this limitation in the revised manuscript.

      (3) Did the authors observe darting behavior? (Gruene...Shansky 2015).

      We did not observe darting behavior, characterized by rapid movement, as reported during inescapable fear conditioning. In our experiments, mice consistently escaped towards the nest and, in most trials, ran directly to it without stopping. Occasionally, under low-contrast conditions, mice paused once or twice but never moved towards the reward.

      (4) How was only one mouse allowed into the linear arena at a time?

      When all mice were in the nest, the nest-tunnel door was open while the tunnel-arena door remained closed. When a single mouse entered the tunnel, as detected by the RFID and OpenMV camera system, the nest-tunnel door closed and the tunnel-arena door opened, allowing only that mouse to enter the arena. We have clarified this protocol in the Methods section.

      (5) I would like to see more extensive analyses of the animal's responses as a function of distance to the threat (as per Fanselow and Lester 1988).

      As detailed in our response to the public review, we conducted new experiments analyzing behavior as a function of prey–threat distance. The finding that defensive responsiveness decreases with increasing prey–threat distance is now presented in Figures S2C–G and discussed in the context of the predatory imminence continuum.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to assess the variability in the expression of surface protein multigene families between amastigote and trypomastigote Trypanosoma cruzi, as well as between individuals within each population. The analysis presented shows higher expression of multigene family transcripts in trypomastigotes compared to amastigotes and that there is variation in which copies are expressed between individual parasites. Notably, they find no clear subpopulations expressing previously characterised trans-sialidase groups. The mapping accuracy to these multicopy genes requires demonstration to confirm this, and the analysis could be extended further to probe the features of the top expressed genes and the other multigene families also identified as variable.

      Strengths:

      The authors successfully process methanol-fixed parasites with the 10x Genomics platform. This approach is valuable for other studies where using live parasites for these methods is logistically challenging.

      Weaknesses:

      The authors describe a single experiment, which lacks controls or complementation with other approaches and the investigation is limited to the trans-sialidase transcripts.

      It would be more convincing to show either bioinformatically or by carrying out a controlled experiment, that the sequencing generated has been mapped accurately to different members of multigene families to distinguish their expression. If mapping to the multigene families is inaccurate, this will impact the transcript counts and downstream analysis.

      We thank the reviewer for raising these important points.

      We agree that the analysis of multigene families at the single-cell level is an important question, particularly given the heterogeneity observed across several of them. However, the aim of this short report is not to provide a comprehensive analysis of the entire experiment, but rather to focus on what we consider an important biological phenomenon observed in TcTS genes.

      Regarding the mapping accuracy of the reads, we acknowledge that this can limit the disambiguation of highly similar multicopy transcripts. This is, in fact, a common challenge when analyzing transcriptomic data from T. cruzi.

      To address this issue, we analyzed the sequence identity of the 3′ ends of TcS transcripts (defined as the 3′UTR plus 20% of the CDS region). As shown in Author response image 1, these regions display a median sequence identity of approximately 25%, indicating that sufficient sequence divergence exists for mapping algorithms to use during read assignment.
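
      The region definition and the identity summary above can be sketched as follows. The text does not state which aligner or scoring scheme was used, so this assumes a plain Needleman-Wunsch global alignment with unit scores and defines identity as identical aligned columns divided by alignment length; CDS coordinates are assumed to be 0-based and end-exclusive. All function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def three_prime_region(transcript, cds_start, cds_end, cds_fraction=0.2):
    """Last `cds_fraction` of the CDS plus everything after it (the 3'UTR)."""
    cds_len = cds_end - cds_start
    start = cds_end - int(round(cds_fraction * cds_len))
    return transcript[start:]


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of a Needleman-Wunsch global alignment of a and b."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Traceback: count alignment columns and identical columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            matches += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return 100.0 * matches / length if length else 0.0


def median_pairwise_identity(seqs):
    """Median identity over all unordered pairs of sequences."""
    return median(global_identity(a, b) for a, b in combinations(seqs, 2))
```

Because this is quadratic in sequence length and in the number of transcripts, a real analysis over a whole multigene family would more likely use a dedicated aligner; the sketch only fixes the definitions.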

      In addition, it is important to note that kallisto, the software used in our analysis, was specifically designed to address multimapping reads through pseudoalignment combined with an expectation-maximization algorithm that probabilistically assigns reads across compatible transcripts.
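
      The expectation-maximization step mentioned above can be illustrated on equivalence classes, that is, sets of transcripts that a read is compatible with. A simplified sketch: kallisto's actual implementation also corrects for effective transcript length and differs in other details, which are omitted here. Each read in a multi-transcript class is split across its transcripts in proportion to their current abundance estimates, and the abundances are then re-estimated from those fractional assignments. `em_abundances` is a hypothetical name.

```python
def em_abundances(eq_counts, transcripts, n_iter=1000, tol=1e-10):
    """Estimate per-transcript read counts from equivalence-class counts by EM.

    eq_counts maps a frozenset of compatible transcripts to the number of reads
    compatible with exactly that set. Effective-length correction is omitted.
    """
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    total = sum(eq_counts.values())
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        for ec, count in eq_counts.items():
            denom = sum(alpha[t] for t in ec)
            for t in ec:  # E-step: split reads by current relative abundance
                expected[t] += count * alpha[t] / denom
        new_alpha = {t: expected[t] / total for t in transcripts}  # M-step
        delta = max(abs(new_alpha[t] - alpha[t]) for t in transcripts)
        alpha = new_alpha
        if delta < tol:
            break
    return {t: alpha[t] * total for t in transcripts}
```

For example, with 30 reads unique to transcript A, 10 unique to B, and 40 compatible with both, the shared reads are not split evenly. Instead, the estimate converges to 60 reads for A and 20 for B, following the evidence from the unique reads. This is why sequence divergence at the 3′ ends matters for resolving similar family members.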

      To directly assess performance, we simulated reads from the T. cruzi transcriptome used in this study (3′UTRs plus 20% of the CDS regions) and compared two mapping/counting strategies: (a) transcriptome pseudoalignment using kallisto, and (b) genome alignment followed by counting using STAR + featureCounts. The latter approximates the strategy implemented in Cell Ranger, the standard pipeline for quantifying expression levels from 10x Genomics single-cell RNA-seq data. We found that kallisto recovered the simulated “true” counts with substantially higher accuracy than STAR + featureCounts (Pearson correlation: all genes, 0.991 vs 0.595; surface protein genes, 0.9996 vs 0.827; trans-sialidase (TcS) genes, 0.9998 vs 0.773). These results indicate that pseudoalignment is currently the optimal strategy for recovering the relative expression of highly similar gene family members (Author response image 1 C).
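
      The accuracy comparison above reduces to a Pearson correlation between simulated and estimated counts, computed over all genes or over a gene subset. A minimal sketch of that computation: the text does not say whether raw or transformed counts were correlated, or how genes absent from one output were handled, so using raw counts and treating missing genes as zero are assumptions. `recovery_correlation` is a hypothetical name.

```python
import numpy as np


def recovery_correlation(true_counts, estimated_counts, genes=None):
    """Pearson r between simulated ("true") and estimated counts.

    Both count arguments are dicts gene_id -> count. `genes` optionally
    restricts the comparison to a subset (e.g. TcS genes). Genes missing from
    the estimate are counted as 0.
    """
    ids = sorted(true_counts if genes is None else genes)
    x = np.array([true_counts[g] for g in ids], dtype=float)
    y = np.array([estimated_counts.get(g, 0.0) for g in ids], dtype=float)
    return float(np.corrcoef(x, y)[0, 1])
```

Running the same function once per pipeline output (kallisto, STAR + featureCounts) and once per gene subset yields the kind of side-by-side comparison reported above.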

      Author response image 1

      (A) Distribution of pairwise sequence identity values calculated among the 3′-end regions of all transcripts (defined as the 3′UTR plus 20% of the coding sequence). (B) Distribution of read mapping coordinates over all multigene family transcripts, normalized as a percentage of gene length. (C) Scatter plots showing the correlation between estimated transcript counts obtained using kallisto (red) and STAR + featureCounts (grey) versus the corresponding simulated ground-truth values.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a valuable single-cell RNA-seq study on Trypanosoma cruzi, an important human parasite. It investigates the expression heterogeneity of surface proteins, particularly those from the trans-sialidase-like (TcS) superfamily, within amastigote and trypomastigote populations. The findings suggest a previously underappreciated level of diversity in TcS expression, which could have implications for understanding parasite-host interactions and immune evasion strategies. The use of single-cell approaches to delve into population heterogeneity is strong. However, the study does have some limitations that need to be addressed.

      The focus on single-cell transcriptional heterogeneity in surface proteins, especially the TcS family, in T. cruzi is novel. Given the important role of these proteins in parasite biology and host interaction, the findings have potential significance.

      Strengths:

      The key finding of heterogeneous TcS expression in trypomastigotes is well-supported. The analysis comparing multigene families, single-copy genes, and ribosomal proteins highlights the unusual nature of the variation in surface protein-coding genes.

      Weaknesses:

      While the manuscript identifies TcS heterogeneity, the functional implications of the different expression profiles remain speculative. The authors state it may reflect differences in infectivity, but no direct experimental evidence supports this.

      The manuscript lacks any functional validation of the single-cell findings. For instance, do the trypomastigote subpopulations identified based on TcS expression exhibit differences in infectivity, host cell tropism, or immune evasion? Such experiments would greatly strengthen the study.

      We thank the reviewer for their careful reading of the manuscript. We agree that obtaining experimental evidence on the influence of multiple multigene families would represent a significant advancement in the field. However, we would like to emphasize that this study is presented as a short communication centered on a specific and biologically relevant observation within a single multigene family. The aim of the manuscript is to highlight what we consider an important biological phenomenon that raises hypotheses to be tested in future work.

      The influence of phenotypic heterogeneity and its possible advantages under environmental pressures has been previously proposed for Trypanosoma cruzi, related trypanosomatids, and other biological systems ranging from bacteria to tumors (for comprehensive reviews of this topic, see Seco-Hidalgo 2015, doi: 10.1098/rsob.150190; Luzak 2021, doi: 10.1146/annurev-micro-040821-012953). While the reviewer is correct in noting that our model does not demonstrate a functional role for TcTS heterogeneity, the experimental approaches required to address this question in a large multigene family are highly complex. This is particularly challenging in T. cruzi, where the study of multigene families is limited by the restricted set of available molecular biology tools (such as RNAi). Therefore, further experimental validation of these observations falls outside the scope of this short report.

      In this revised version, we have included additional validation and clarification of the results, as well as a more explicit discussion of their limitations. In addition, we present a preliminary analysis exploring potential mechanisms that could coordinate the observed expression patterns of the TcS family.

      The authors identify a subpopulation of TcS genes that are highly expressed in many cells. However, it is unclear if these correspond to previously characterized TcS members with specific functions.

      The TcS subgroup with a high frequency of detection comprises 31 genes, none of which belong to the catalytically active Group I trans-sialidases. Instead, this subgroup includes members of Groups II, III, IV, V, VI, and VIII. This information has been added to Supplementary Table 3 and is now stated in the revised manuscript.

      The authors hypothesize that observed heterogeneity may relate to chromatin regulation. However, the study does not directly address these mechanisms. There are interesting connections to be made with what they identify as the colocalization of genes within chromatin folding domains, but the authors do not fully explore this. It would be insightful to address these mechanisms in future work.

      In response to the reviewer’s and editorial team’s request for additional mechanistic insight into the regulatory processes that may be involved in the observed patterns, we have expanded the revised manuscript to discuss how the genomic context of TcS loci could contribute to the observed heterogeneity in TcS expression. As noted in the original version of the manuscript, TcS genes and other surface-protein gene families are largely partitioned into discrete genomic compartments, whose expression has been reported to be regulated by epigenetic control of chromatin-folding domains (doi.org/10.1038/s41564-023-01483-y). However, we previously showed that TcS genes detected in a high proportion of cells are, in most cases, dispersed throughout the genome, arguing against a model in which their preferential expression results from colocalization within a small number of ubiquitously activated chromatin domains. In response to the reviewer’s suggestion, we performed a more detailed analysis of the genomic locations of these TcS genes. We found that many of them are localized within the core compartment (new Figure 5). Because the core compartment is enriched for conserved, housekeeping genes that typically display more constitutive expression (doi.org/10.1038/s41564-023-01483-y), whereas the disruptive compartment is enriched for lineage-specific multigene families associated with variable, stage-specific, and recently reported stochastic expression (doi.org/10.1038/s41467-025-64900-2), our results are consistent with a model in which compartment-specific regulatory mechanisms (in addition to post-transcriptional regulation) influence the differential cellular expression of core- versus disruptive-located TcS genes. We have incorporated these results and discussion in the revised manuscript.

      The merging of technical replicates needs further justification and explanation as they were not processed through separate experimental conditions. While barcodes were retained, it would be informative to know how well each technical replicate corresponds with the other. If both datasets were sequenced on the same lane, the inclusion of technical replicates adds noise to the analysis.

      Regarding technical details, we now include the total number of mapped reads and the average number of reads mapped per cell (new paragraph in the Methods section).

      The technical replicates consist of a single Illumina library that was sequenced in two separate runs. As this approach is expected to be highly reproducible, we merged both runs into a single count table. To support this decision, we assessed the concordance between the two sequencing runs and observed an almost perfect correlation between them (Author response image 2).

      Author response image 2.

      Correlation of the number of reads assigned to each cell between technical replicate 1 and technical replicate 2.
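      The concordance check behind Author response image 2 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each sequencing run has been summarized as a mapping from cell barcode to assigned read count, and the function names, barcodes, and counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (>= 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_concordance(run1, run2):
    """Correlate per-cell read counts between two sequencing runs of one library.

    run1, run2: dicts mapping cell barcode -> reads assigned to that cell.
    Only barcodes present in both runs are compared.
    """
    shared = sorted(set(run1) & set(run2))
    return pearson([run1[b] for b in shared], [run2[b] for b in shared])


# Toy data (hypothetical barcodes): run 2 has roughly half the depth of run 1.
run1 = {"AAAC": 1000, "AAAG": 2500, "AACT": 400, "ACGT": 5200}
run2 = {"AAAC": 510, "AAAG": 1240, "AACT": 205, "ACGT": 2590, "TTTT": 30}
print(round(replicate_concordance(run1, run2), 4))
```

      Because both runs come from the same library, per-cell counts should differ mainly by a depth scaling factor, which the Pearson coefficient ignores; a value close to 1 is therefore the expected outcome when the runs are concordant.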

      While the number of cells sequenced (3192) seems reasonable, it's not clear how much the conclusions are affected by the depth of sequencing. A more detailed description of the sequencing depth and its impact on gene detection would be valuable.

      We detected a mean of 1088 genes per cell. Based on the 15,319 annotated protein-coding genes in the reference genome, this represents 7.1% of the T. cruzi protein-coding gene complement detected in each cell.

      Across the entire dataset, a total of 14,321 genes were detected in at least one cell, representing 93.5% of all annotated protein-coding genes. This suggests that our experiment captured a broad representation of the parasite's transcriptome.

      This per-cell detection rate is characteristic of droplet-based scRNA-seq and is consistent with other trypanosomatid studies. For example, the T. brucei single-cell atlas (Hutchinson et al., 2021) reported a median detection of 1052 genes per cell. In the case of T. cruzi, the recently published pre-print of the T. cruzi single cell atlas from Laidlaw & García-Sánchez et al. reported a mean between 298 and 928 genes detected per cell (depending on the sample).

      This information is now included in Methods.
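      The percentages above follow directly from the reported counts. A short check, using only the figures quoted in this response (the constant names are ours, for illustration):

```python
# Per-cell and dataset-wide gene detection rates from the figures quoted above.
ANNOTATED_PROTEIN_CODING = 15_319  # annotated protein-coding genes in the reference
MEAN_GENES_PER_CELL = 1_088        # mean number of genes detected per cell
GENES_DETECTED_ANY_CELL = 14_321   # genes detected in at least one cell

per_cell_pct = 100 * MEAN_GENES_PER_CELL / ANNOTATED_PROTEIN_CODING
dataset_pct = 100 * GENES_DETECTED_ANY_CELL / ANNOTATED_PROTEIN_CODING
# Share of annotated genes NOT detected in an average cell (a rough dropout figure,
# since not every annotated gene is expected to be transcribed in every cell).
not_detected_pct = 100 - per_cell_pct

print(f"per cell: {per_cell_pct:.1f}%")          # per cell: 7.1%
print(f"any cell: {dataset_pct:.1f}%")           # any cell: 93.5%
print(f"not detected: {not_detected_pct:.1f}%")  # not detected: 92.9%
```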

      While most of the methods are clear, the way in which the subsampled gene lists were generated could be more thoroughly described, as some details are not clear for the subsampling of single-copy genes.

      The subsampling method was originally described in the Figure 2 legend; to better highlight this approach, we have now moved its description to the Methods section.

      Some of the figures are difficult to interpret. For example, the color scaling in the heatmap of Supplementary Figure 3B is not self-explanatory and it is hard to extract meaningful conclusions from the graph.

      We agree with the reviewer's assessment and have modified the figures to be more self-explanatory and to better reflect the conclusions.

      Reviewer #3 (Public review):

      The study aimed to address a fundamental question in T. cruzi and Chagas disease biology - how much variation is there in gene expression between individual parasites? This is particularly important with respect to the surface protein-encoding genes, which are mainly from massive repetitive gene families with 100s to 1000s of variant sequences in the genome. There is very little direct evidence for how the expression of these genes is controlled. The authors conducted a single-cell RNAseq experiment of in vitro cultured parasites with a mixture of amastigotes and trypomastigotes. Most of the analysis focused on the heterogeneity of gene expression patterns amongst trypomastigotes. They show that heterogeneity was very high for all gene classes, but surface-protein encoding genes were the most variable. In the case of the trans-sialidase gene family, many sequence variants were only detected in a small minority of parasites. The biology of the parasite (e.g. extensive post-transcriptional regulation) and potential technical caveats (e.g. high dropout rates across the genome) make it difficult to infer what this might mean for actual protein expression on the parasite surface.

      We thank the reviewer for this important comment, highlighting a central challenge when studying trypanosomatid biology. We acknowledge that in most eukaryotes, and particularly in T. cruzi, where post-transcriptional regulation predominates, mRNA levels do not always correlate directly with protein abundance, as previously reported by us and others (10.1186/s12864-015-1563-8, 10.1128/msphere.00366-21, 10.1590/S0074-02762011000300002, 10.1042/bse0510031). Nevertheless, steady-state transcript levels obtained by RNA-seq remain informative for assessing differential gene expression, and this approach has been widely used as a proxy for the study of gene expression profiles in T. cruzi (10.7717/peerj.3017, 10.1371/journal.ppat.1005511, 10.1016/j.jbc.2023.104623, 10.3389/fcimb.2023.1138456, 10.1186/s13071-023-05775-4).

      Notably, recent proteomic analyses (10.1038/s41467-025-64900-2) have revealed substantial heterogeneity in the expression of surface proteins, including trans-sialidases, supporting the idea that the transcriptional heterogeneity we observe reflects a genuine biological feature that propagates to the protein level.

      We have now added a sentence to the Discussion acknowledging this limitation and discussing the results from Cruz-Saavedra et al. in the revised manuscript.

      (1) Limit of detection and gene dropouts

      An average of ~1100 genes are detected per parasite which indicates a dropout rate of over 90%. It appears that RNA for the "average" single copy 'core' gene is only detected in around 3% of the parasites sampled (Figure 2c: ~100 / 3192). This may be comparable with some other trypanosome scRNAseq studies, but this still seems to be a major caveat to the interpretation that high cell-to-cell variability in gene expression is explained by biological rather than technical factors. The argument would be more convincing if the dropout rates and expression heterogeneity were minimal for well-known highly expressed genes e.g. tubulin, GAPDH, and ribosomal RNAs. Admittedly, in their Final Remarks, the authors are very cautious in their interpretation, but it would be good to see a more thorough discussion of technical factors that might explain the low detection rates and how these could be tested or overcome in future work.

      (2) Heterogeneity across the board

      The authors focus on the relative heterogeneity in RNA abundance for surface proteins from the multicopy gene families vs core genes. While multicopy gene sequences do show more cell-to-cell variability, the differences (Figure 2D) are roughly average Gini values of 0.99 vs 0.97 (single copy) or 0.95 (ribosomal). Other studies that have applied similar approaches in other systems describe Gini values of < 0.2-0.25 for evenly expressed "housekeeping" genes (PMIDs 29428416, 31784565). Values observed here of >0.9 indicate that the distribution for all gene classes is extremely skewed and so the biological relevance of the comparison is uncertain.
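      The link between detection sparsity and Gini values can be made concrete. If a gene is detected in k of n cells and every detected cell carries the same count, the Gini coefficient reduces exactly to 1 - k/n, so a gene detected in about 100 of 3,192 cells has G of about 0.97 under this simplifying assumption, whatever its biology. The sketch below uses the standard sorted-values formula without a small-sample correction; the example genes are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even).

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i running from 1 to n (no n/(n-1) correction).
    """
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * v for i, v in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


n_cells = 3192
# Hypothetical gene detected (1 UMI each) in 100 of 3192 cells, zero elsewhere.
sparse_gene = [1] * 100 + [0] * (n_cells - 100)
print(round(gini(sparse_gene), 3))  # 0.969
# Hypothetical gene detected at the same level in every cell.
even_gene = [1] * n_cells
print(gini(even_gene))
```

      In other words, with per-gene detection rates of a few percent, Gini values above 0.9 are largely set by dropout, which is why comparisons between gene classes in this range carry limited information on their own.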

      We recognize the limitations imposed by gene dropout in our data, as highlighted by the reviewer. Unfortunately, gene dropout is an inherent limitation of 10x Genomics data. Trypanosomatids are no exception in this regard, and the general metrics of single-cell RNA-seq data in other reports are equivalent to those obtained in our experiment.

      Despite this important limitation, we believe that our comparative analyses (the contrast between TcS and ribosomal protein expression) provide valuable insights into a biological phenomenon with potential functional relevance for the parasite. Furthermore, we are actively working on generating single-cell RNA-seq data using alternative methodologies with lower gene dropout rates. We anticipate that these future studies will help clarify the extent of the phenomenon described in this work.

      Our results reveal a small subset of TcS genes that are frequently detected across cells, a pattern that is not compatible with random detection unless these genes were highly expressed and therefore preferentially captured by random sampling. However, as shown in Figure 4b, many genes expressed at comparable levels are not detected at high frequencies. In line with this, Figure 4c shows that, within individual cells, the detected TcS genes exhibit similar expression levels. Finally, we confirmed that this frequently detected subset shows high read counts in bulk RNA-seq data (Figure 4 - Figure Supplement 1), consistent with these TcS being frequent in the population even though they are not especially highly expressed within each cell. Taken together, these findings argue against purely random sampling of TcS genes and support the interpretation that this pattern reflects an underlying biological feature. We agree that further validation will be required. Accordingly, since the initial submission we have framed our conclusions conservatively, explicitly noting that dropout remains a limitation of these data that could influence the observed patterns. Our interpretation is presented as a working hypothesis that is fully compatible with the observations reported here and may be informative for the field. To better reflect this reasoning, we have revised Figure 4b, expanded the discussion, and added an explicit statement of this limitation to the final remarks of the revised manuscript.
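      The random-sampling argument above can be illustrated with a simple null model. Assuming (as a textbook simplification, not the authors' analysis) that UMIs for a gene are captured as Poisson counts, a gene with mean expression mu per cell is detected in a fraction 1 - exp(-mu) of cells. If the same total expression is instead concentrated in a fraction f of cells, detection drops to f * (1 - exp(-mu / f)). Genes with similar bulk levels but very different detection frequencies are therefore what one would expect from cell-to-cell heterogeneity rather than from capture alone. The function name and numbers below are hypothetical.

```python
import math


def detection_fraction(mean_umi_per_cell, expressing_fraction=1.0):
    """Expected fraction of cells with >= 1 UMI for a gene under Poisson capture.

    mean_umi_per_cell: population-average UMIs per cell (what bulk data reflect).
    expressing_fraction: fraction f of cells that express the gene at all; the same
        total is concentrated in those cells, whose mean is mean_umi_per_cell / f.
    """
    f = expressing_fraction
    if not 0 < f <= 1:
        raise ValueError("expressing_fraction must be in (0, 1]")
    per_expressing_cell = mean_umi_per_cell / f
    return f * (1 - math.exp(-per_expressing_cell))


# Two hypothetical genes with the same average expression (0.5 UMI per cell):
print(round(detection_fraction(0.5), 3))                            # 0.393
print(round(detection_fraction(0.5, expressing_fraction=0.05), 3))  # 0.05
```

      Under homogeneous expression, detection frequency rises monotonically with expression level; departures from that relationship are the signal the argument above relies on.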

      Nevertheless, this study does provide some tantalising evidence that the expression of surface genes may vary substantially between individual parasites in a single clonal population. The study is also amongst the very first to apply scRNAseq to T. cruzi, so the broader data set will be an important resource for researchers in the field.

      We thank the reviewer for highlighting the relevance of our study and for their positive assessment of the potential significance of these observations. We also agree that the dataset generated here may represent a useful resource for the community.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figures 1c and 1d, it would be useful to include the genes as the plot titles.

      We agree with the reviewer that including gene names in the plot makes the panels more self-explanatory. We have added gene names to the updated version of Figure 1.

      (2) Can you include the read lengths of the sequencing and whether this is sufficient to map accurately to very similar genes of the same multigene family? As stated in the public summary, this would make the data far more convincing, as standard 10x Chromium cannot distinguish similar gene copies unless a longer read 2 is used. Given that only the 3' end is targeted, is this enough to distinguish the TcS and other multigene family transcripts?

      We thank the reviewer for raising this important point. We agree that short, 3′-biased reads can limit the disambiguation of highly similar multicopy transcripts. This is, in fact, a common challenge when analyzing transcriptomic data from T. cruzi.

      To address this issue, we analyzed the sequence identity of the 3′ ends of TcS transcripts (defined as the 3′UTR plus 20% of the CDS region). As shown in Author response image 1, these regions display a median sequence identity of approximately 25%, indicating sufficient sequence divergence for mapping algorithms to exploit during read assignment.
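      One way to compute such an identity is sketched below. It assumes that "3′ end" means the last 20% of the CDS followed by the 3′UTR, and that identity is the number of identical aligned positions divided by the length of a Needleman-Wunsch global alignment with a linear gap penalty. The scoring values and function names are illustrative assumptions; the response does not state which aligner or identity definition was used.

```python
def three_prime_region(cds, utr3, cds_fraction=0.2):
    """Last `cds_fraction` of the CDS followed by the 3' UTR."""
    k = int(round(len(cds) * cds_fraction))
    return cds[len(cds) - k:] + utr3


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity from a Needleman-Wunsch global alignment (linear gaps).

    Identity = identical aligned positions / alignment length (gap columns included).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


# Hypothetical toy transcripts (real TcS 3' regions are hundreds of bases long).
seq_a = three_prime_region("ATGAAACCCGGGTTTAAA", "TTGACAGT")
seq_b = three_prime_region("ATGAAACCTGGGTATAGA", "TTCACGGT")
print(round(global_identity(seq_a, seq_b), 1))
```

      The dynamic-programming table grows with the product of the two lengths, which is fine for a few hundred bases; for all-versus-all comparisons across a large gene family, a dedicated aligner would be the practical choice.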

      In addition, it is important to note that kallisto, the software used in our analysis, was specifically designed to address multimapping reads through pseudoalignment combined with an expectation-maximization algorithm that probabilistically assigns reads across compatible transcripts.
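      The expectation-maximization idea can be illustrated with a simplified sketch. Reads are grouped into equivalence classes (sets of transcripts they are compatible with), and each class's reads are repeatedly split in proportion to the current abundance estimates. This is not kallisto's implementation: it assumes equal effective transcript lengths (kallisto also normalizes by effective length) and uses a plain convergence rule; the transcript IDs are hypothetical.

```python
def em_abundance(eq_classes, transcripts, n_iter=500, tol=1e-10):
    """Estimate reads per transcript from equivalence classes by EM.

    eq_classes: dict mapping a tuple of compatible transcript IDs -> read count.
    Assumes equal effective lengths for all transcripts.
    Returns dict transcript -> estimated number of reads.
    """
    total = sum(eq_classes.values())
    alpha = {t: total / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        new = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads in proportion to current abundances.
        for compat, count in eq_classes.items():
            denom = sum(alpha[t] for t in compat)
            if denom == 0:
                continue  # skip classes whose transcripts all have zero abundance
            for t in compat:
                new[t] += count * alpha[t] / denom
        # M-step: the expected read counts become the new abundance estimates.
        converged = max(abs(new[t] - alpha[t]) for t in transcripts) < tol
        alpha = new
        if converged:
            break
    return alpha


# 30 reads unique to TcS_A, 10 unique to TcS_B, 40 compatible with both.
classes = {("TcS_A",): 30, ("TcS_B",): 10, ("TcS_A", "TcS_B"): 40}
print(em_abundance(classes, ["TcS_A", "TcS_B"]))
```

      In this toy case the shared reads are not split 50/50; EM converges to about 60 reads for TcS_A and 20 for TcS_B, matching the 3:1 ratio implied by the uniquely assigned reads.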

      To directly assess performance, we simulated reads from the T. cruzi transcriptome used in this study (3′UTRs plus 20% of the CDS regions) and compared two mapping/counting strategies: (a) transcriptome pseudoalignment using kallisto, and (b) genome alignment followed by counting using STAR + featureCounts. The latter approximates the strategy implemented in CellRanger, the standard pipeline for quantifying expression levels from 10X Genomics single cell RNA-seq data. We found that kallisto recovered the simulated “true” counts with substantially higher accuracy than STAR + featureCounts (Pearson correlation: all genes, 0.991 vs 0.595; surface protein genes, 0.9996 vs 0.827; trans-sialidase (TcS) genes, 0.9998 vs 0.773). These results indicate that, of the two strategies tested, pseudoalignment is better suited to recovering the relative expression of highly similar gene family members (Author response image 1C).

      The length of the R2 read (91 bp) is now stated in Methods (line 411).

      (3) It is stated that 'single copy' genes also include 'low copy number genes'. What does this include exactly? Would it be more accurate to say non-surface protein genes?

      The distinction we aim to make is between multigene families and the rest of the genome. Most multigene families encode surface proteins, but not all surface protein genes belong to multigene families. To clarify this point, we added a sentence to the Methods specifying that “surface proteins” refers to surface proteins encoded by multigene families (line 453). In addition, long-read genomic DNA sequencing and assembly have revealed that many genes previously believed to be single-copy are actually duplicated at low copy numbers (doi.org/10.1099/mgen.0.000177). For this reason, we extend the concept of “single-copy” genes to include those with only a few duplicates.

      (4) It is stated in line 127 that TcS have particularly high heterogeneity; this does not look that way by eye compared with the other multigene families. Can statistics be used to demonstrate this, or should it simply be stated that the decision was made to focus on the TcS?

      As the reviewer notes, all multigene families show significantly higher heterogeneity than single-copy genes, as stated in the text and shown in the legends of Figure 2 and Supplementary Figure 1 and in the new Supplementary Table 2.

      That said, it was not the statistical results that guided our decision to focus on TcS, but rather their well-established biological relevance in T. cruzi. As suggested, we have now emphasized this rationale more clearly in the revised text (lines 160-167).

      In addition, recent work has shown that TcS genes exhibit a bimodal distribution of expression levels using bulk RNA-seq data, in contrast to core genes and other multigene families (doi.org/10.1038/s41467-025-64900-2, doi.org/10.1038/s41564-023-01483-y). This distinct regulatory behavior further justifies our decision to examine TcS separately.

      (5) Expression of different TcS has previously been investigated between life cycle stages for a few individual genes (Freitas et al.). Could the authors extend this investigation to all the genes detected by scRNA-seq here, to identify those with higher or lower expression in amastigotes vs trypomastigotes, building on Figure 2A? Are particular groups linked to either stage?

      We performed this analysis and did not observe any correlation between TcS groups and life cycle stage. In all cases TcS were more frequently detected in trypomastigotes. This difference was statistically significant for all groups except group VII, likely due to the low number of genes analyzed in this group (Author response image 3).

      Author response image 3.

      Per-gene number of expressing cells by TcS group and life-cycle stage. Boxplots show, for each TcS group (I–VIII), the distribution across genes of the number of cells in which each gene is detected. Each point represents a single TcS gene; amastigote cells: green points/boxes; trypomastigote cells: salmon points/boxes. The y-axis is on a log10 scale. Asterisks indicate statistically significant differences between amastigotes and trypomastigotes within each TcS group, assessed using a paired two-sided Wilcoxon signed-rank test: * p < 0.05, ** p < 0.01, *** p < 0.001.
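      The paired test named in this legend can be sketched in plain Python. Each TcS gene contributes one pair (cells detected among amastigotes, cells detected among trypomastigotes). This sketch uses the large-sample normal approximation with tie-corrected variance and no continuity correction; SciPy's scipy.stats.wilcoxon provides exact and approximate variants and is the more practical choice in real analyses. The input counts below are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(x, y):
    """Paired two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are discarded; tied |differences| receive average ranks and
    the variance is tie-corrected. No continuity correction is applied.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| in ascending order, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, min(p, 1.0)


# Hypothetical per-gene cell counts for one TcS group.
amastigote = [3, 0, 5, 1, 2, 4, 0, 1]
trypomastigote = [20, 8, 31, 9, 15, 22, 6, 12]
print(wilcoxon_signed_rank(amastigote, trypomastigote))
```

      With only a handful of genes in a group, the normal approximation is coarse and exact p-values are preferable, which is consistent with the weaker result reported for the small group VII.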

      (6) What exactly is the Z-score shown in Figure 2B?

      In this analysis, num_multigene is the number of multigene family genes detected in each individual cell: for every cell, we counted how many genes from our predefined multigene family list had detectable expression (more than zero UMI counts). In the UMAP plot, this value is reflected by the size of each point. In contrast, z_multigene captures the relative expression level of multigene family genes within each cell. It is calculated by summing the UMI counts of all multigene family genes per cell and then standardizing this value across the dataset with a z-score transformation, so that positive values reflect above-average multigene family expression and negative values reflect below-average levels. In the UMAP plot, this metric determines the color of each point. Taken together, num_multigene and z_multigene allow us to distinguish cells that express multigene family genes broadly (high gene counts), strongly (high relative expression), both, or neither, and to relate these patterns to the identified cell populations.

      We have included a short description in the legend of the revised Figure 2 (lines 176-180).
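      The two per-cell metrics described above can be sketched from a cell-by-gene UMI table. This is a minimal illustration, not the authors' code: the dictionary layout and gene IDs are hypothetical, raw UMI counts are summed as described, and the sample standard deviation is an assumption, since the response does not say which standard deviation the z-score uses.

```python
from statistics import mean, stdev


def multigene_metrics(counts, multigene_ids):
    """Per-cell num_multigene and z_multigene from a UMI count table.

    counts: dict cell -> dict gene -> UMI count (genes absent from a cell count as 0).
    multigene_ids: set of gene IDs belonging to multigene families.
    Returns dict cell -> (num_multigene, z_multigene).
    """
    num, total = {}, {}
    for cell, genes in counts.items():
        mg = {g: c for g, c in genes.items() if g in multigene_ids}
        num[cell] = sum(1 for c in mg.values() if c > 0)  # genes with > 0 UMIs
        total[cell] = sum(mg.values())                     # summed multigene UMIs
    values = list(total.values())
    mu = mean(values)
    sd = stdev(values)  # sample SD (assumption); requires at least two cells
    return {cell: (num[cell], (total[cell] - mu) / sd) for cell in counts}


# Hypothetical three-cell table.
counts = {
    "c1": {"TcS1": 3, "TcS2": 0, "GAPDH": 10},
    "c2": {"TcS1": 1, "TcS2": 1},
    "c3": {"GAPDH": 5},
}
print(multigene_metrics(counts, {"TcS1", "TcS2"}))
```

      Here cell c2 expresses multigene family genes more broadly (two genes) while c1 expresses them more strongly (higher summed UMIs), the kind of distinction the two metrics are meant to separate.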

      (7) For the reclustering of trypomastigotes based on TcS genes alone, please show the UMAP and discuss why the resolution giving two clusters was chosen. I assume that increasing the resolution does not reveal clusters of cells expressing one of the 8 groups of TcS, for example?

      We appreciate the reviewer’s suggestion. In this analysis, our goal was to test whether the phenotypic heterogeneity previously reported in trypomastigotes could be recapitulated using TcS genes alone, as prior studies described two major transcriptomic phenotypes within this stage.

      Increasing the clustering resolution did not reveal subclusters corresponding to the eight TcS sequence groups. This might reflect the fact that these groups are defined based on sequence similarity rather than on expression patterns, as noted by Freitas et al. (doi:10.1371/journal.pone.0025914).

      (8) In Figure 4B, there may be an upward trend in the level of expression and the number of cells a transcript is detected in? It would be worth showing this is or is not the case with statistics if possible.

      The number of genes detected in a high proportion of cells is low, which limits the statistical power of this analysis. In addition, substantial dispersion is observed within the 0-5% interval. Nevertheless, this figure is presented primarily to highlight that a considerable number of highly expressed genes are detected in only a small fraction of cells. If expression level were the main determinant of detection frequency across cells, one would expect very few highly expressed genes to fall within the 0-5% interval. Contrary to this expectation, 62% of the 50 most highly expressed TcS genes are detected in fewer than 5% of cells, and even among the 10 most highly expressed TcS genes, 40% fall within this lowest detection group. To facilitate this interpretation, we modified the figure (new Figure 4b) to explicitly highlight the 50 most highly expressed TcS genes and incorporated this discussion into the main text of the revised manuscript (lines 244-251), making the conclusion clearer to the reader.
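      The 62% and 40% figures are proportions of this kind: among the N most highly expressed genes, the share detected in fewer than 5% of cells (62% of 50 is 31 genes; 40% of 10 is 4 genes). A minimal sketch, assuming each gene has already been summarized as an expression level and a detection fraction; the expression metric actually used for ranking is not specified here, and the gene names and values are hypothetical.

```python
def low_detection_share(gene_stats, top_n=50, max_cell_fraction=0.05):
    """Share of the top_n most expressed genes detected in < max_cell_fraction of cells.

    gene_stats: dict gene -> (expression_level, fraction_of_cells_detected).
    """
    ranked = sorted(gene_stats, key=lambda g: gene_stats[g][0], reverse=True)[:top_n]
    if not ranked:
        raise ValueError("gene_stats is empty")
    low = [g for g in ranked if gene_stats[g][1] < max_cell_fraction]
    return len(low) / len(ranked)


# Hypothetical genes: (expression level, fraction of cells in which detected).
stats = {
    "TcS_1": (100, 0.01),
    "TcS_2": (90, 0.20),
    "TcS_3": (80, 0.03),
    "TcS_4": (70, 0.50),
    "TcS_5": (10, 0.01),
}
print(low_detection_share(stats, top_n=4))  # 0.5
```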

      (9) Do the cells group instead by expression of any of the other multigene families not investigated in detail?

      It is possible that additional transcriptional substructure among trypomastigotes is driven by the expression of other multigene families beyond TcS. In this short report, which is limited in the number of figures and words, we focused specifically on the trans-sialidase family, as discussed above. A more comprehensive analysis including other large surface gene families (MASPs, mucins, GP63) is planned as part of ongoing work and will be presented in future reports.

      Reviewer #2 (Recommendations for the authors):

      This reviewer suggests conducting functional experiments in follow-up studies to establish links between TcS expression profiles and parasite behavior, and investigating potential regulatory mechanisms responsible for the observed TcS heterogeneity, particularly epigenetic modifications. It would be interesting to correlate the highly expressed TcS members identified here with previously characterized TcS isoforms and to describe in more detail which particular groups and TcS members are driving the findings. The manuscript would also benefit from further clarification regarding sequencing depth, merging of technical replicates, subsampling, and specific parameters for the alignment methods, as well as more information on the specific statistical tests and their applicability to the data.

      This is a promising single-cell study with potentially high significance. The manuscript is well-written, and the analyses are reasonably well-executed. However, the current manuscript is limited by a lack of functional validation and mechanistic insights. The addition of further analyses and experiments, as suggested, will strengthen the conclusions and increase the impact of the work.

      We thank the reviewer for their careful reading of the manuscript. As suggested, we have performed additional validation and clarification of the results and provided a more explicit discussion of their limitations. In addition, we have included a preliminary analysis exploring potential mechanisms that could coordinate the observed expression patterns of the TcS family (see below). Although we consider experimental validation of these results relevant and interesting, the inherent difficulties of studying multigene families in T. cruzi, an organism with a very limited set of molecular biology tools (such as RNAi), place such validation outside the scope of this short report.

      Regarding the reviewer’s question, we examined whether any TcS group could be driving our observations, but found no correlation indicating that a particular group was associated with any of our findings. TcS group information is now included in Supplementary Table 3.

      Regarding technical details, we now include the total number of mapped reads (line 422) and the average number of reads mapped per cell (new paragraph in the Methods section, lines 432-436).

      The technical replicates consist of a single Illumina library that was sequenced in two separate runs. As this approach is expected to be highly reproducible, we merged both runs into a single count table, as stated in line 424. To support this decision, we assessed the concordance between the two sequencing runs and observed an almost perfect correlation between them (Author response image 2).

      The subsampling method was originally described in the Figure 2 legend; to better highlight this approach, we have now moved its description to the Methods section (line 456).

      The specific kallisto parameters used are stated in Methods (lines 418-419), and we now state that default options were used unless otherwise specified (lines 419-420).

      In response to the reviewer’s and editorial team’s request for additional mechanistic insight into the regulatory processes that may be involved in the observed patterns, we have expanded the revised manuscript to discuss how the genomic context of TcS loci could contribute to the observed heterogeneity in TcS expression. As noted in the original version of the manuscript, TcS genes and other surface-protein gene families are largely partitioned into discrete genomic compartments, whose expression has been reported to be regulated by epigenetic control of chromatin-folding domains (doi.org/10.1038/s41564-023-01483-y). However, we previously showed that TcS genes detected in a high proportion of cells are, in most cases, dispersed throughout the genome, arguing against a model in which their preferential expression results from colocalization within a small number of ubiquitously activated chromatin domains. In response to the reviewer’s suggestion, we performed a more detailed analysis of the genomic locations of these TcS genes. We found that many of them are localized within the core compartment (new Figure 5). Because the core compartment is enriched for conserved, housekeeping genes that typically display more constitutive expression (doi.org/10.1038/s41564-023-01483-y), whereas the disruptive compartment is enriched for lineage-specific multigene families associated with variable, stage-specific, and recently reported stochastic expression (doi.org/10.1038/s41467-025-64900-2), our results are consistent with a model in which compartment-specific regulatory mechanisms (in addition to post-transcriptional regulation) influence the differential cellular expression of core- versus disruptive-located TcS genes. We have incorporated these results and discussion in line 301-313 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors consistently refer to gene "expression" but somewhere they should acknowledge that in trypanosomes RNA abundance is less predictive of protein than in most other organisms.

      We thank the reviewer for this important comment, highlighting a central challenge when studying trypanosomatid biology. We acknowledge that in most eukaryotes, and particularly in T. cruzi, where post-transcriptional regulation predominates, mRNA levels do not always correlate directly with protein abundance, as previously reported by us and others (10.1186/s12864-015-1563-8, 10.1128/msphere.00366-21, 10.1590/S0074-02762011000300002, 10.1042/bse0510031). Nevertheless, steady-state transcript levels obtained by RNA-seq remain informative for assessing differential gene expression, and this approach has been widely used as a proxy for the study of gene expression profiles in T. cruzi (10.7717/peerj.3017, 10.1371/journal.ppat.1005511, 10.1016/j.jbc.2023.104623, 10.3389/fcimb.2023.1138456, 10.1186/s13071-023-05775-4).

      Notably, recent proteomic analyses (10.1038/s41467-025-64900-2) have revealed substantial heterogeneity in the expression of surface proteins, including trans-sialidases, supporting the idea that the transcriptional heterogeneity we observe reflects a genuine biological feature that propagates to the protein level.

      We have now added a sentence to the Discussion acknowledging this limitation and discussing the results from Cruz-Saavedra et al. (lines 266-271 of the revised manuscript).

      (2) Line 29, in the abstract there is a strong statement that T. cruzi "does not employ antigenic variation". I don't think there is much evidence either way if we are thinking about antigenic variation in the broad sense rather than the extreme model of T. brucei VSG switching. Later in the abstract they state that "no recurrent combinations of TcS genes were observed between individual cells in the population", which sounds very much like a form of antigenic variation.

      We agree with the reviewer. We meant to state that T. cruzi does not employ an antigenic variation mechanism like that of T. brucei, and we have revised this statement accordingly (lines 28-32).

      (3) Line 29, "relies on a diverse array of cell-surface-associated proteins encoded by large multi-copy gene families (multigene families) essential for infectivity and immune evasion" and lines 55-58 "T. cruzi infection relies on a heterogeneous set of membrane proteins, encoded mainly by large multigene families ... most of which are involved in infection, tropism, and immune evasion". It would be worth adding a bit more detail on the nature and strength of the evidence that Tc "relies on" these various genes or that they are "essential" for infectivity, tropism, and immune evasion.

      Because the journal’s short format imposes word limits, we strengthened the original statement by adding specific references that document genomic, transcriptomic, and functional evidence linking the major multigene families to infectivity, tropism, and immune evasion (doi.org/10.1371/journal.pone.0025914; doi.org/10.1038/nrmicro1351; doi.org/10.1128/iai.05329-11; doi.org/10.1093/nar/gkp172; doi.org/10.1371/journal.ppat.1006767) in line 77.

      (4) Line 89, 1088 genes detected per cell - what is this as a % of genes in the genome?

      We detected a mean of 1088 genes per cell. Based on the 15,319 annotated protein-coding genes in the reference genome, this represents 7.1% of the T. cruzi protein-coding gene complement detected in each cell.

      Across the entire dataset, a total of 14,321 genes were detected in at least one cell, representing 93.5% of all annotated protein-coding genes. This suggests that our experiment captured a broad representation of the parasite's transcriptome.
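      Both percentages use the same denominator, the 15,319 annotated protein-coding genes. The short Python sketch below reproduces that arithmetic from the counts quoted in this response. The variable and function names are illustrative only and are not taken from the authors' pipeline.

```python
# Counts quoted in this response; the denominator is the 15,319 annotated
# protein-coding genes of the reference genome.
TOTAL_CODING_GENES = 15_319
MEAN_GENES_PER_CELL = 1_088
GENES_DETECTED_IN_ANY_CELL = 14_321


def percent_of_genome(n_genes: int, total: int = TOTAL_CODING_GENES) -> float:
    """Express a gene count as a percentage of the annotated protein-coding genes."""
    return 100.0 * n_genes / total


per_cell = percent_of_genome(MEAN_GENES_PER_CELL)
any_cell = percent_of_genome(GENES_DETECTED_IN_ANY_CELL)
print(f"mean per cell: {per_cell:.1f}%")  # mean per cell: 7.1%
print(f"in >=1 cell: {any_cell:.1f}%")    # in >=1 cell: 93.5%
```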

      This per-cell detection rate is characteristic of droplet-based scRNA-seq and is consistent with other trypanosomatid studies. For example, the T. brucei single-cell atlas (Hutchinson et al., 2021) reported a median detection of 1052 genes per cell. In the case of T. cruzi, the recently published pre-print of the T. cruzi single cell atlas from Laidlaw & García-Sánchez et al. reported a mean between 298 and 928 genes detected per cell (depending on the sample).

      This information is now included in Methods (line 435).

      (5) Line 93-94, how many cells were assigned to clusters 0 and 1?

      Cluster 0 had 2201 cells and cluster 1 had 824 cells assigned. We have now included these numbers in the new version of the manuscript (line 114).

      (6) Line 96, cluster 2 ama-trypo transitioning parasites - were these observable by microscopy?

      We did not perform microscopy specifically to observe or quantify the putative ama/trypo transitioning subpopulation; microscopy was used only as a pre-experiment quality check to verify cell morphology and viability. The inference that cluster 2 reflects ama/trypo transitioning parasites is drawn from the transcriptomic profile (particularly from the pattern of stage-associated marker expression observed in that cluster) and should be considered a data-driven hypothesis that merits further analysis, as stated in the manuscript.

      (7) Line 106-107, "As expected, single-copy gene expression is high in both amastigotes and trypomastigotes and similar on average between both cell types".

      (8) Why as expected? For a broad journal it would be useful to explain this. Amastigotes are replicative and trypomastigotes are not, so would we not expect to see some differences that reflect this?

      (9) What do you mean by the expression being "high"? High compared to what?

      (10) "Similar on average between both cell types". This does not seem concordant with Figure 1a showing a highly significant difference between ama and trypo.

      We thank the reviewer for this helpful request to clarify these points for a broader readership, and for the observations regarding the global expression of single-copy and multigene family genes.

      Figure 2a is intended as an experimental control showing that our 10X Genomics data recapitulate the previously reported upregulation of surface protein genes in trypomastigotes. We have now modified the text to highlight this (line 129). In turn, Supplementary Figure 1a serves as a control showing that this upregulation is not a general feature of trypomastigote cells.

      Regarding comment 9, we meant that single-copy genes display relatively high expression in both amastigotes and trypomastigotes compared with surface protein-coding genes (see expression values in Figure 2a and Supplementary Figure 1a).

      Finally, previous transcriptomic studies of differential expression between amastigotes and trypomastigotes have shown that most single-copy genes do not vary between stages, which explains the overall pattern in Supplementary Figure 1a, where average expression is similar between stages (mean fold change = 1.1). This is likely because these genes are related to basic cellular functions. Genes related to stage-specific functions, such as replication in amastigotes, or normalization effects may underlie the slight but statistically significant increase in overall expression observed in amastigotes. This contrasts with the pattern observed for multigene families, which are clearly overexpressed in trypomastigotes (mean fold change = 1.5).

      Because the observations raised in questions 9 and 10 have been described in previous studies and are neither novel nor central to our results, we chose not to focus on them and modified the text accordingly (lines 129-135).

      (11) Line 110, "with high variation". What does "high variation" mean here? Compared to what? For the two metrics (n cells +ve for each gene and total expression level) can they give an average and the SD? It would be useful to know how many parasites the "average" surface (and core) gene is expressed in, or more precisely for which the RNA is above the limit of detection.

      We refer to the comparison with the expression profile observed for single-copy genes. This point has now been clarified in the text, and we have included the mean and standard deviation for both TcS multigene family genes and single-copy genes in trypomastigotes for both metrics in the Figure 2 legend. The average and distribution of the number of cells in which each gene is detected are shown in Figure 2c and Supplementary Figure 1a. We also added a reference to this panel at the point in the text where the phenomenon is first described.

      (12) Line 134, Figure 2b legend needs more detail - what are num_multigene and z_multigene?

      Please see our response to Reviewer 1, Question 6. We have now added a clarification to the legends of Figure 1 and Supplementary Figure 1.

      (13) Figure 2c, correct the y-axis legend because it implies your values are log10 transformed. Also, it would be useful to have more markers on the y axis so the reader can better estimate the data ranges.

      We thank the reviewer for this observation. We have now corrected the y-axis label and markers.

      (14) If the y-axis of Figure 2D started at 0 instead of 0.8 and if Lorenz curves were provided then the reader would probably get a fuller sense of the expression heterogeneity in the dataset. The legend states the differences are statistically significant but the actual p-values are not shown.

      (15) Line 142-3, more precision is needed on the p-values.

      We thank the reviewer for this helpful suggestion. We agree that Lorenz curves provide a clearer representation of expression heterogeneity than the previous plot. Accordingly, we have replaced the original panel (Figure 2d) with Lorenz curves for the groups under comparison, and have made the same change in Supplementary Figure 1d. In addition, we have included Gini index values and p-values for all comparisons in Supplementary Table 2.
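      A Lorenz curve sorts genes from lowest to highest expression and plots the cumulative share of genes against the cumulative share of total expression. The Gini index is one minus twice the area under that curve: 0 for perfectly even expression, and approaching 1 when expression is concentrated in a few genes. The Python sketch below is a minimal illustration under stated assumptions. The response does not specify the unit (genes or cells) or the estimator used, so it assumes non-negative per-gene expression totals and the plain trapezoid-rule estimator. Both input vectors are made-up toy values, not data from this study.

```python
from typing import List, Sequence, Tuple


def lorenz_curve(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return Lorenz-curve points (cumulative share of genes, cumulative share
    of expression), starting at (0, 0), with genes sorted low to high."""
    xs = sorted(values)
    if not xs or xs[0] < 0:
        raise ValueError("need a non-empty list of non-negative values")
    total = sum(xs)
    if total == 0:
        raise ValueError("total expression must be positive")
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(xs, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini_index(values: Sequence[float]) -> float:
    """Gini index = 1 - 2 * (area under the Lorenz curve), via the trapezoid rule."""
    pts = lorenz_curve(values)
    area = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )
    return 1.0 - 2.0 * area


# Hypothetical toy vectors (not data from the manuscript).
single_copy_like = [5, 6, 5, 7, 6, 5]  # expression spread evenly across genes
tcs_like = [0, 0, 0, 1, 2, 20]         # expression concentrated in one gene
print(f"even:         Gini = {gini_index(single_copy_like):.3f}")  # 0.069
print(f"concentrated: Gini = {gini_index(tcs_like):.3f}")          # 0.775
```

      The sketch covers only the descriptive curve and index. The statistical test behind the p-values in Supplementary Table 2 is not described here, so none is assumed.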

      (16) Figure 3, as in Figure 1a it would be useful to add another UMAP plot to show the two trypo subpopulations.

      We thank the reviewer for this suggestion. We have now updated Figure 3 to include a UMAP plot showing the two trypomastigote subpopulations.

      (17) What is the observed proportion of broad vs slender trypomastigote morphologies for Dm28c? To be consistent with the speculation at line 162 then wouldn't it need to be approximately 50-50?

      The proportions of each trypomastigote subpopulation in the Dm28c strain are currently unknown. The only available relevant data come from Brener, 1965 (doi.org/10.1080/00034983.1965.11686277), in which this strain was not included. In the strains analyzed in that study, the relative proportions of broad and slender trypomastigote morphologies were highly variable: across seven strains, broad forms ranged from 18.0% to 77.3%, while slender forms ranged from 2.3% to 71.6%. Given this wide variability and the lack of Dm28c-specific data, we cannot assume any expected proportion for this strain.

      (18) Line 170, please state how many genes are in the TcS subgroup mentioned here. This is an interesting finding - does this include mostly catalytically active trans-sialidase genes or is it a mixture from across all the subfamilies?

      The TcS subgroup with a high frequency of detection comprises 31 genes, none of which belong to the catalytically active Group I trans-sialidases. Instead, this subgroup includes members of Groups II, III, IV, V, VI, and VIII. This information has been added to Supplementary Table 3 and is now stated in the revised manuscript (lines 227 - 228).

      (19) Line 175-176, "Gene dropouts might favor random patterns of gene family's detection in scRNA-seq experiments, particularly affecting genes with low expression" - I'm not sure if the authors mean the detection of a gene (or not) in an individual parasite is truly random (pure luck) or whether the term stochastic would be more appropriate because they seem to be referring to randomness around a certain threshold of RNA abundance/stability? They go on to rule this out, at least for TcS genes, essentially arguing that they have something resembling an ON or OFF pattern rather than a spectrum of expression levels. This is potentially very important and could advance the field in a major way, but the fact that so many core and ribosomal genes, which 'should' be always ON, cannot be detected in most cells is a concern. A version of Figure 4B for core and ribosomal genes could be informative - do they show a different pattern to TcS?

      Our results reveal a small subset of TcS genes that are frequently detected across cells, a pattern that is not compatible with random detection unless these genes were highly expressed and therefore preferentially captured by random sampling. However, as shown in Figure 4b, many genes expressed at comparable levels are not detected at high frequencies. In line with this, Figure 4c shows that, within individual cells, the detected TcS genes exhibit similar expression levels. Finally, we confirmed that this frequently detected subset shows high read counts at the bulk RNA-seq level (Supplementary Figure 2), consistent with these TcS genes being frequent in the population even though they are not especially highly expressed within each cell. Taken together, these findings argue against purely random sampling of TcS genes and support the interpretation that this pattern reflects an underlying biological feature. We agree that further validation will be required. Since the initial submission, we have been careful to frame our conclusions conservatively, explicitly noting that dropout remains a limitation of these data that could influence the observed patterns. Our interpretation is presented as a working hypothesis that is fully compatible with the observations reported here and may be informative for the field. To better reflect this reasoning, we have revised Figure 4b, expanded the discussion, and added an explicit statement of this limitation to the final remarks of the revised manuscript.
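      The argument that detection frequency is not simply a by-product of expression level can be made concrete with a null model. One common choice for droplet data is Poisson sampling. This is assumed here and is not stated to be the authors' analysis. If a gene's UMI count in each cell is an independent Poisson draw with mean λ, the gene is detected in a fraction 1 - exp(-λ) of cells. Genes with similar mean expression should therefore be detected in similar fractions of cells. The Python sketch below computes that expectation. The gene names and numbers are hypothetical.

```python
import math


def expected_detection_fraction(mean_umis_per_cell: float) -> float:
    """Fraction of cells with >= 1 UMI expected if each cell's count is an
    independent Poisson draw with the given mean: P(X > 0) = 1 - exp(-mean)."""
    if mean_umis_per_cell < 0:
        raise ValueError("mean must be non-negative")
    return 1.0 - math.exp(-mean_umis_per_cell)


def detection_excess(observed_fraction: float, mean_umis_per_cell: float) -> float:
    """Observed minus Poisson-expected detection fraction. Values near 0 are
    consistent with random sampling; large positive or negative values are not."""
    return observed_fraction - expected_detection_fraction(mean_umis_per_cell)


# Two hypothetical genes with the same mean count (1 UMI per cell) but very
# different detection frequencies; illustrative numbers only.
for name, observed in [("gene_A", 0.95), ("gene_B", 0.20)]:
    excess = detection_excess(observed, mean_umis_per_cell=1.0)
    print(f"{name}: observed={observed:.2f} "
          f"expected={expected_detection_fraction(1.0):.2f} excess={excess:+.2f}")
```

      Under this null, two genes with the same mean count cannot differ much in detection frequency. A gap like the one between the two toy genes is the kind of departure that would argue against purely random dropout. Overdispersion or technical biases could also produce such gaps, so the sketch illustrates the logic and does not constitute a test.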

      (20) Line 238-9, Add details of removing extracellular epimastigotes after cell infections.

      Only cellular trypomastigotes collected from the supernatant on day 6 were used for the secondary infection, at a 10:1 parasite-to-cell ratio. After 24 hours, the cultures were washed twice with PBS to remove any remaining extracellular parasites. Under these conditions, i.e. using exclusively trypomastigotes, at this infection ratio, and maintaining the cultures in mammalian medium, we do not expect the presence or survival of extracellular epimastigotes. We have included a sentence in the Methods section clarifying this information in the revised version of the manuscript, line 382.

      (21) Line 260, was methanol used to directly resuspend the parasite pellet, or was it resuspended first e.g. in a small volume of PBS?

      As described in lines 250-257 of the original manuscript, parasites were washed and resuspended in DPBS before methanol fixation. Methanol fixation was then carried out according to the 10X Genomics Methanol Fixation Protocol. We have now emphasized this more clearly in the revised text in line 400.

      (22) What was the doublet rate?

      We identified and removed 41 doublets, all belonging to cluster 2, and retained 3,151 singlets for downstream analysis (total cells before removal = 3,192). The resulting doublet rate was 1.28%. We have included a sentence in the Methods section clarifying this information in the revised version of the manuscript (lines 439-440).
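      As a quick consistency check, the reported figures fit together when the doublet rate is taken relative to the 3,192 cells present before removal. The minimal Python sketch below shows that arithmetic. The variable names are illustrative, and the doublet-detection tool itself is not specified in this response.

```python
# Counts quoted in the response (cells before doublet removal, doublets found).
cells_before_removal = 3_192
doublets_removed = 41

singlets_retained = cells_before_removal - doublets_removed
doublet_rate_pct = 100.0 * doublets_removed / cells_before_removal

print(singlets_retained)           # 3151
print(f"{doublet_rate_pct:.2f}%")  # 1.28%
```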

      (23) What was the frequency of rRNA and kDNA-derived reads?

      Approximately 4.02% of the reads were derived from kDNA sequences, while 1.10% corresponded to rRNA-derived reads (Author response image 4).

      Author response image 4.

      Percentage of mitochondrial (kDNA)- and ribosomal RNA (rRNA)-derived reads.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the Reviewers for their comments on our manuscript “Structural insights into mitotic-centrosome assembly”. As described below, we have substantially revised the manuscript in response to their comments and hope you will consider the revised manuscript, “Phosphorylation relieves autoinhibition to drive Cnn centrosome scaffold assembly”, for publication in The EMBO Journal. Our specific responses (black text) to the Reviewers’ comments (blue text) are detailed below.

      Reviewer #1

      Main Points:

      1) From previous studies, it seems to me that for the residues potentially relevant for the hairpin regulation there is direct evidence of phosphorylation only for S567 (mass spec, phospho-antibody). Have the authors tested single site mutants (S567A and E)? Also, have they tested D mutations? If so, this should be commented on and shown. If not, it should be tested, in particular since the 2E phospho-mimetic is not functioning properly in vivo. If S571 is indeed crucial, it should be demonstrated that it is also phosphorylated. Otherwise it is possible that the mutation of this residue simply impairs important interactions (e.g. PReM-CM2, others), independent of phosphorylation.

      As requested, we have now tested individual S567A and S571A mutations and found that they both perturb Cnn scaffold assembly, but to a lesser extent than the 2A double mutant (New Fig.S3A). We also now confirm by MS that recombinant Polo can phosphorylate both S567 and S571 in vitro, and we have examined the behaviour of a 2D mutant and find that it behaves very similarly to the 2E mutant (New Fig.S3B).

      2) It is unclear why in vitro only A mutations have been tested and not phospho-mimetics. This should be tested for the interaction between PReM and CM2. This would allow to probe the model that phosphorylation opens the hairpin to allow interaction. Currently, such proof is missing in the study. Alternatively, the authors could phosphorylate the recombinant protein in vitro. The in vivo data is harder to interpret due to the complexity of the model and the authors should take advantage of the in vitro system.

      As requested, we now show in New Fig.S5 that whereas in vitro WT Cnn490-608 and Cnn-2A490-608 behave as dimers, Cnn-2E490-608 elutes in two major fractions—a tetramer species and a much larger species that elutes in the void volume (meaning that 2E can form very large species even in the absence of CM2) (Figure S5A). In the presence of CM2, Cnn-2E490-608 forms a tetramer (that eluted slightly later than the Cnn-2E490-608 tetramer) and larger complexes that contained CM2 and eluted in the void volume with a profile similar to Cnn-2E490-608 on its own (Figure S5B). These results are consistent with the possibility that the 2E substitutions open the helical hairpin to allow self-interactions that drive homo-tetramer and larger complex assembly in vitro.

      3) Regarding the worm PReM and CM2 domains, the authors mention that they have tested in vitro phosphorylation by PLK-1, but I could not find any data showing this. They should demonstrate successful phosphorylation or test candidate site by phospho-mimetic mutation. It is possible that the worm proteins depend more strongly on phosphorylation to relieve autoinhibition compared to the fly proteins.

      This is a good point, and we apologise for this omission. We now state that we confirmed by MS analysis that the recombinant worm PLK-1 we used in these in vitro experiments phosphorylates the putative SPD-5 PReM domain on the three sites (S627, S653 and S658) known to be important for promoting SPD-5 scaffold assembly in vivo (Figure Legend, Figure 6). Thus, the lack of detectable binding between these proteins is not due to the lack of phosphorylation.

      Minor Point:

      4). Fig. 6C, D: the labeling of the chimeric constructs using "+" symbols is confusing, since it suggests that separate proteins were expressed. If I understand this correctly, with the current labeling, deltaCM2+DmCM2 means WT? The authors should write the full name of the wildtype or chimeric construct in each case and use a more standard/less confusing nomenclature. Also, I suggest to start the panels and graphs with the WT sample.

      We thank the Reviewer for this suggestion and have re-labelled this Figure to clarify this point. We understand the point about putting the WT panels first in Figure 6C,D (now Figure 5C,D) but think that this is not the correct comparison to emphasise. We are testing the ability of the various CM2 domains to “rescue” the lack of a CM2 domain, so we feel Drosophila Cnn lacking CM2 is the correct baseline for this comparison.

      Reviewer #2

      Main Comments:

      1. The title is too vague. Any number of existing papers could be said to provide "structural insights into mitotic centrosome assembly". The authors need to narrow down to a defined conclusion and state this as the title.
      2. I think the strongest and most novel aspects of this study relate to the mechanism of Cnn assembly via relief of the auto-inhibited PReM. The efforts to elucidate assembly mechanisms of SPD-5 and CDK5RAP2 are comparatively light and there are no accompanying experiments in worms or human cells. Without the in vivo experiments, it's hard to know if the in vitro experiments are valid. It's speculative for the authors to say they found the true PReM for CDK5RAP2; they do not demonstrate that PLK-1 phosphorylation potentiates assembly in Figure 8. Thus, I suggest re-writing the paper to focus on Cnn. Experiments in Figure 6 are still valid if reframed. For example, substituting Cnn's CM2 with the CM2 from CDK5RAP2 vs. the C-term of SPD-5 illustrates that a simple coiled-coil with open ends (H.s.CM2) is sufficient to interact with PReM whereas a coiled-coil with a closed end (SPD-5 C-term, predicted by Figure 6A) cannot.

      We thank the Reviewer for these helpful comments and have re-written and re-organised the manuscript in accord with these suggestions, most importantly providing a more specific title and re-ordering the data to better focus the paper on the relief of Cnn autoinhibition.

      The purpose of Figure 1 is unclear. None of the other figures examine SPD-5 and CNN in the condensate form, which required using 4% PEG in this paper. The other assays look at the network form, which could behave differently and have different dependence on specific domains. I think they should perform the condensate assay for all other figures, otherwise leave it out. Furthermore, CDK5RAP2 is mentioned, yet not examined in Figure 1. It must be noted that CDK5RAP2 will also condense into droplets under crowding conditions or with a synthetic nucleator (Rios et al., 2025 J Cell Sci). Thus, it seems that condensation potential is a universal feature of known PCM scaffold proteins.

      The original Figure 1 has been moved to the end of the paper (now Figure 8), and we now explain the logic of these experiments more thoroughly. Briefly, given that the PReM and CM2 domains in flies and worms seem to function in different ways in vivo, we sought to test whether this was also the case in vitro, where the behaviour of full-length SPD-5 and of these domains of Cnn has been extensively studied but never directly compared. We believe such a direct comparison will be of some interest to the field (the Woodruff et al., 2017 paper describing these in vitro SPD-5 condensates has been cited >700 times). We now also cite the Rios et al., 2025 paper but note that, despite extensive efforts, we were unable to purify enough well-behaved CDK5RAP2 for our experiments and so could not include it in this analysis. We think Rios et al. used an MBP-fusion of CDK5RAP2 in their experiments, which may explain this difference.

      The study uses different species without doing the same types of experiments on each. Sometimes human CDK5RAP2 is thrown in, sometimes not. They solve crystal structures of PReM from Cnn but not from the other proteins. This gets confusing, especially since the authors state that they seek to test if fly Cnn and worm SPD-5 assemble through different mechanisms (see last sentence of the intro). Also, if the focus is on worm vs. fly PCM assembly mechanisms, why include the human protein, especially Figure 8?

      On re-reading our original manuscript we appreciate this confusion. We hope that in re-writing the manuscript along the lines suggested by the Reviewer the logical flow of our experiments will be clearer.

      The conclusion that SPD-5's narrow PReM and "CM2" domains don't interact is consistent with the cross-linking mass spectrometry data from Rios et al. 2024. They showed only one X-link with low occurrence (1 out of 6 samples) between these two regions, even in the phosphorylated state (Fig. 1G). However, Nakajo et al (2022) claimed the opposite, showing that a larger PReM-containing construct (a.a. 272-732) interacts with a C-terminal construct (a.a. 1061-1198) after PLK-1 phosphorylation. Can the authors comment on this? Perhaps there is another site in SPD-5, outside of a.a. 541-677, that acts like the Cnn PReM?

      These are good points and we now mention this last possibility in the Discussion. We also now mention the supporting cross-linking Mass Spec data from Rios et al., 2024.

      I have serious doubts that the C-terminus of SPD-5 has a CM2 domain. To me, there is no real sequence homology with the traditional CM2s from humans and flies, and the AF3 predictions support this. Ohta et al. (2021) called this region "CM2-like" based on very poor homology, which is a questionable practice. Any coiled-coil region will appear somewhat homologous due to the heptad repeat pattern that defines them (e.g., leucines line up quite nicely). Thus, is it fair to say that SPD-5 doesn't assemble through a PReM-CM2 interaction? There may be a different region in SPD-5 that looks more like the canonical CM2. I think the authors have compelling evidence to give the C-terminal coiled-coil region in SPD-5 its own name rather than calling it CM2.

      This is a fair point, although the literature is already quite confusing on the nomenclature for the C-terminal region of SPD-5 (e.g., Ohta et al., JCB, 2021; Nakajo et al., JCS, 2022), so we are reluctant to add another name to the mix. Given that we draw comparisons with the fly and human CM2 domains (that are clearly related by sequence), we think it is easiest for readers if we use the “CM2” nomenclature throughout, although making clear our conclusion that SPD-5 “CM2” does not appear to function in the same way as fly/human CM2.

      Figure 3E. Would measuring scaffold mass be more appropriate? The PReM(deltaH1,NTH2) leads to more compact scaffolds, but maybe they assemble just as well as the deltaH1 mutant. As it stands, there is a discrepancy between panel E and F in terms of what is measured (area vs. intensity) and the outcome.

      In several previous papers we used fluorescence intensity to measure the “amount” of protein at centrosomes in vivo but, in our original paper (Feng et al., Cell, 2017), we quantified PReM::CM2 scaffold assembly in vitro by measuring the area of scaffold assembly. Thus, we prefer to present the current data in this way for consistency across publications, and we believe either measure is valid. We could measure the area and intensity of the PReM∆H1 and PReM∆H1∆NTH2 scaffolds to compare scaffold density, but we think this would unnecessarily complicate these data. The main point is not how much scaffold forms or how dense each scaffold is, but rather that the PReM∆H1∆NTH2 protein does not really make a scaffold at all; instead, it makes smaller “blobs” that tend to bunch together (further characterised in Fig.S2).

      Minor Comments:

      1. In one version of the PDF there are images missing in Fig 1F, 4C, 4D. I opened another version (source version) and the images were there. Just FYI.
      2. Figure 4A. The blue coloration makes it difficult to read the black letters.
      3. Figure 4A. Why is part of the protein colored in green? This coloration isn't defined, nor does it show up again in panel B.
      4. The layout of Figure 4 is confusing. It took me a few minutes to realize that the big red box inset belonged to panel B and not panel A.
      5. Figure 4C,D. The sample size is not mentioned in the legend.
      6. The title for Figure 4 seems too speculative. How can the authors say that phosphorylation relieves the autoinhibition without structural data?
      7. Figure 5B. The sample size is not mentioned in the legend.
      8. Figure 6B,D. The sample size is not mentioned in the legend.
      9. The text in Figure 7B is hard to read because it is too small. Please make this bigger.
      10. Figure 8C. What is colored in magenta? Is there an additional labeled protein besides mNG-CM2?
      11. Figure 8C. What is the sample size? How many images were taken? Also, why are there data points off to the right of the last column?
      12. The wording of these sections needs improving. I found them complicated and difficult to understand.

      We thank the Reviewer for taking the time to make these helpful comments. We have addressed all these points in the revised manuscript. On point 10, the magenta objects were fiducial beads that were inadvertently included in this panel (and are no longer shown).

      Reviewer #3

      Major Comments:

      1. The title, "Structural Insights into Mitotic-Centrosome Assembly," is overly broad. The study primarily focuses on CM2-PReM intramolecular interactions in D. melanogaster Cnn and does not comprehensively address mitotic centrosome assembly across species. A more specific title reflecting the fly-centric and structural focus would better align with the manuscript's scope and conclusions.

      As described at the start of our response to Reviewer #2, the title and focus of the manuscript have been extensively revised along these lines.

      The authors analyze condensate formation by Cnn and SPD-5 but overlook condensate formation by CDK5RAP2, which was recently reported by Rios et al. (2025, PMID: 40454523). Including CDK5RAP2 would enable a more balanced and informative comparison across fly, worm, and human homologs.

      As described in point 3 of our response to Reviewer #2, we now cite Rios et al., 2025 but note that, despite extensive efforts, we were unable to purify enough well-behaved CDK5RAP2 for our experiments and so could not include it in this analysis. We believe Rios et al. used a full-length MBP-fusion of CDK5RAP2 in their experiments, which may explain this difference, as MBP is very good at keeping proteins soluble (but would not be appropriate in our experiments, where we compare full-length untagged proteins).

      In Figure 3, reconstitution of Cnn scaffolds using purified CM2 and PReM fragments yields "macromolecular scaffolds," but their physical properties are not defined. It remains unclear whether these assemblies are ordered or amorphous, and whether they exhibit solid- or gel-like behavior. Moreover, the heterogeneous, scattering particles observed by negative-stain EM (Figure S3B), likely corresponding to the Cnn490-608-CM2 complex, raise the possibility of nonspecific aggregation rather than organized scaffold formation. Appropriate controls lacking CM2 are needed to exclude spontaneous aggregation of PReM fragments. In addition, testing shorter truncations of the PReM H2 helix could help define the minimal requirements for scaffold assembly. Finally, the rationale for including the CnnΔExPReM construct only in vivo (Figure 3F), but not in the in vitro assays (Figure 3A-E), should be clarified.

      We apologise, as our presentation of this data has clearly led to some confusion on these points.

      First, as we now clarify, the amorphous solid-like physical properties of the PReM::CM2 scaffolds were described in our previous paper where we also showed that these scaffolds are not simply non-specific aggregates—as several single point mutations that disrupt the LZ::CM2 tetramer also prevent PReM::CM2 scaffold assembly in vitro as well as Cnn scaffold assembly in vivo (see Fig.5, Feng et al., Cell, 2017). Also, in all in vitro scaffolding experiments we always perform a negative control (-CM2) to confirm that none of the scaffolds are aggregates of the PReM domain being tested. We don’t usually show this control now as there would be lots of empty black boxes on the Figures. We do, however, show this control for the human putative PReM domain (Figure 7C), as we are testing this here for the first time.

      Second, the request to test shorter truncations of the PReM H2 helix to define the minimal requirements for scaffold assembly is unnecessary as PReM∆H1∆NTH2 already cuts H2 at the start of the LZ, and we previously showed the LZ is required for PReM::CM2 scaffold assembly in vitro (Feng et al., Cell, 2017). Thus, any further truncation of H2 will start to remove the LZ, which we already know is essential. We have now made this point more clearly.

      Finally, the Cnn∆ExPReM construct the Reviewer mentions was tested in both the in vitro (now Figure 2B) and in vivo (now Figure 2F) assays, but the labelling was confusing so this was not clear. We have now clarified this point.

      The coarse-grained (CG) simulation methodology is insufficiently described. Given that CG approaches sacrifice atomic detail and may oversimplify interactions, readers require more information to evaluate the model's reliability and limitations. A comparison with the framework used by Ramirez et al. (2024, PMID: 38356260) would be informative. It is also unclear why available crystal structures of WT and 2A Cnn (Figure 2C; Figure S4) were not used as simulation inputs, or why the structure of Cnn490-579 2E was not determined to complete the structural comparison. Furthermore, mutation of Ser567 and Ser571 to alanine markedly stabilizes the PReM domain (Figure 5C, D), implying that these residues maintain domain flexibility. Back-mapping CG models to atomic resolution could reveal the interactions altered by these mutations. The exclusive focus on double mutants (2A and 2E) is also limiting; analysis of single-point mutants at S567 or S571 would clarify whether both residues contribute equally or play distinct roles.

      We performed coarse-grained simulations because, although they simplify atomic interactions, they capture overall conformational dynamics, which is what we are trying to assess here (Fig.4C,D). We now clarify this point and provide more detail of our simulation methodology in the main text and Materials and Methods. We used the predicted structure of the full helical hairpin (i.e., H2+H3+H4) in these simulations, rather than the crystal structure of the partial helical hairpin (i.e., H2+most of H3), as we reasoned that the presence of the full H3 and H4 might influence breathing, and the full helical hairpin (see Video S1) seems likely to be the relevant biological fold. As we now show (new Figure S5), and as discussed above, the 2E mutants do not behave well in vitro, so we were unable to solve their structure. We agree that atomic-resolution simulations could help explain how the 2A/E and single A/E mutations might suppress or enhance breathing, but we believe such an analysis is beyond the scope of the current manuscript and would distract from our main conclusions.

      The discussion lacks sufficient integration with prior studies and often presents conclusions without adequate citation. For example, the claim that flies and humans rely on related PReM-CM2 interactions whereas worms use distinct phosphorylation-regulated mechanisms is not supported by appropriate references. In addition, limited cross-referencing to the manuscript's own data weakens the connection between results and conclusions. Expanding and better grounding the discussion in existing literature would significantly enhance its depth and clarity.

      We thank the Reviewer for this general point and have tried to better integrate our results with prior studies, particularly in the Discussion section.

      Minor Comments: 1. In Figure 1B, the molecular weight units for the protein marker are missing and should be included. Fixed.

      In Figures 1E and 1F, readability would be improved by including x-axis labels on all graphs, rather than only on the bottom panels. Fixed.

      The protein structures shown in Figures 2C and 2D should be explicitly labeled as dimers to avoid confusion. Fixed.

      In Figures 3A-D, using fluorescently labeled CM2 would help validate both the interaction with the PReM domain and its localization within the scaffold. We have previously tried fluorescently tagging the CM2 domain, but scaffold formation is much less robust. We do not think this invalidates this assay, as the evidence supporting the PReM::CM2 interaction is very strong, including assessments of the physiological influence of multiple point mutations in both domains at residues at the heart of the interaction interface identified by crystallography (e.g., see Fig.4, Feng et al., Cell, 2017).

      In Figure 3E, no statistical comparisons are presented between the original PReM construct and other samples. In addition, information regarding sample size and the number of experimental replicates is missing from the figure legend. Fixed.

      In Figure 3F, the absence of a pixel intensity scale bar makes the data difficult to interpret, as color values corresponding to high and low signal intensities are unclear. Moreover, no additional centrosome marker is included, nor is there evidence that PReM fragment expression levels are comparable across samples. These concerns also apply to Figures 4C and 4D. We now include pixel intensity scales in all relevant Figures. We do not think we need to show additional centrosome markers in our images, as centrosomes exhibit very reproducible behaviour in these embryos, so we can be very confident that the objects we show here are genuine centrosomes. Regarding expression levels, the images in Fig.4C,D (now 3C,D) are derived from stable transgenic lines, so we can measure protein expression levels and show that the 2A and 2E mutants are expressed at similar levels to WT (new Figure S6). The images in 2F are from mRNA injections, so cannot be quantified in this way. However, we have extensive experience with this assay (used in >15 publications since 2014) and can tell when, very occasionally, an injected mRNA is not expressed well (as this leads to a lack of general fluorescence in the cytoplasm). In addition, we know that deletions in Cnn do not generally destabilise the protein, as we have analysed many such transgenic lines (see, for example, Reviewer Figure 1). Thus, the differences in centrosomal levels observed and quantified in 2F are almost certainly not caused by differences in the stability of the proteins generated from the injected mRNAs.

      In Figure 4A, the interacting residues of PReM and CM2 shown in the red inset would be clearer if residue annotations for each domain were displayed in distinct colors. Additionally, the legends for Figures 4C and 4D do not specify the scale bar length. Fixed.

      The authors state that interactions between CM2 and PReM-2A462-608 could not be detected in vitro based on SEC chromatograms (Figure 5A), yet the figure does not clearly show this result. The accompanying SDS-PAGE images are too small and lack lane labels, making interpretation difficult (a similar issue applies to Figure 7B). Furthermore, the SEC chromatogram x-axis lacks volume annotations, hindering correlation between chromatographic peaks and SDS-PAGE results (in contrast to Figure 7B, which provides an appropriate example). We thank the reviewer for these points, all of which have now been fixed/adjusted.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Mohamed et al. set out to compare the assembly mechanisms of pericentriolar material (PCM) in flies and nematodes. They reveal that the main PCM scaffold proteins (Cnn in flies, SPD-5 in nematodes) are sufficient to form supramolecular droplets (with a crowding agent) or networks (without a crowding agent). However, they diverge in one key aspect: Cnn scaffold assembly relies on the interaction between a C-terminal CM2 domain and a central phospho-regulated domain (PReM), whereas SPD-5 does not. The authors solve the crystal structure of a region within Cnn's PReM. With the help of modeling, they speculate that this region is auto-inhibited through backfolding of alpha helices, thus preventing its interaction with the CM2 domain. This auto-inhibition would be relieved by phosphorylation, which modeling suggests would increase "breathing" of the backfolded structure. The authors end by presenting evidence to suggest that the human PCM scaffold protein CDK5RAP2 may assemble through a PReM-CM2 interaction.

      Major Comments:

      1. The title is too vague. Any number of existing papers could be said to provide "structural insights into mitotic centrosome assembly". The authors need to narrow down to a defined conclusion and state this as the title.
      2. I think the strongest and most novel aspects of this study relate to the mechanism of Cnn assembly via relief of the auto-inhibited PReM. The effort to elucidate assembly mechanisms of SPD-5 and CDK5RAP2 are comparatively light and there are no accompanying experiments in worms or human cells. Without the in vivo experiments, it's hard to know if the in vitro experiments are valid. It's speculative for the authors to say they found the true PReM for CDK5RAP2; they do not demonstrate that PLK-1 phosphorylation potentiates assembly in Figure 8. Thus, I suggest re-writing the paper to focus on Cnn. Experiments in Figure 6 are still valid if reframed. For example, substituting Cnn's CM2 with the CM2 from CDK5RAP2 vs. the C-term of SPD-5 illustrates that a simple coiled-coil with open ends (H.s.CM2) is sufficient to interact with PReM whereas a coiled-coil with a closed end (SPD-5 C-term, predicted by Figure 6A) cannot.
      3. The purpose of Figure 1 is unclear. None of the other figures examine SPD-5 and CNN in the condensate form, which required using 4% PEG in this paper. The other assays look at the network form, which could behave differently and have different dependence on specific domains. I think they should perform the condensate assay for all other figures, otherwise leave it out. Furthermore, CDK5RAP2 is mentioned, yet not examined in Figure 1. It must be noted that CDK5RAP2 will also condense into droplets under crowding conditions or with a synthetic nucleator (Rios et al., 2025 J Cell Sci). Thus, it seems that condensation potential is a universal feature of known PCM scaffold proteins.
      4. The study uses different species without doing the same types of experiments on each. Sometimes human CDK5RAP2 is thrown in, sometimes not. They solve crystal structures of PReM from Cnn but not from the other proteins. This gets confusing, especially since the authors state that they seek to test if fly Cnn and worm SPD-5 assemble through different mechanisms (see last sentence of the intro). Also, if the focus is on worm vs. fly PCM assembly mechanisms, why include the human protein, especially Figure 8?
      5. The conclusion that SPD-5's narrow PReM and "CM2" domains don't interact is consistent with the cross-linking mass spectrometry data from Rios et al. 2024. They showed only one X-link with low occurrence (1 out of 6 samples) between these two regions, even in the phosphorylated state (Fig. 1G). However, Nakajo et al. (2022) claimed the opposite, showing that a larger PReM-containing construct (a.a. 272-732) interacts with a C-terminal construct (a.a. 1061-1198) after PLK-1 phosphorylation. Can the authors comment on this? Perhaps there is another site in SPD-5, outside of a.a. 541-677, that acts like the Cnn PReM?
      6. I have serious doubts that the C-terminus of SPD-5 has a CM2 domain. To me, there is no real sequence homology with the traditional CM2s from humans and flies, and the AF3 predictions support this. Ohta et al. (2021) called this region "CM2-like" based on very poor homology, which is a questionable practice. Any coiled-coil region will appear somewhat homologous due to the heptad repeat pattern that defines them (e.g., leucines line up quite nicely). Thus, is it fair to say that SPD-5 doesn't assemble through a PReM-CM2 interaction? There may be a different region in SPD-5 that looks more like the canonical CM2. I think the authors have compelling evidence to give the C-terminal coiled-coil region in SPD-5 its own name rather than calling it CM2.
      7. Figure 3E. Would measuring scaffold mass be more appropriate? The PReM(deltaH1,NTH2) leads to more compact scaffolds, but maybe they assemble just as well as the deltaH1 mutant. As it stands, there is a discrepancy between panel E and F in terms of what is measured (area vs. intensity) and the outcome.

      Minor Comments

      1. In one version of the PDF there are images missing in Fig 1F, 4C, 4D. I opened another version (source version) and the images were there. Just FYI.
      2. Figure 4A. The blue coloration makes it difficult to read the black letters.
      3. Figure 4A. Why is part of the protein colored in green? This coloration isn't defined, nor does it show up again in panel B.
      4. The layout of Figure 4 is confusing. It took me a few minutes to realize that the big red box inset belonged to panel B and not panel A.
      5. Figure 4C,D. The sample size is not mentioned in the legend.
      6. The title for Figure 4 seems too speculative. How can the authors say that phosphorylation relieves the autoinhibition without structural data?
      7. Figure 5B. The sample size is not mentioned in the legend.
      8. Figure 6B,D. The sample size is not mentioned in the legend.
      9. The text in Figure 7B is hard to read because it is too small. Please make this bigger.
      10. Figure 8C. What is colored in magenta? Is there an additional labeled protein besides mNG-CM2?
      11. Figure 8C. What is the sample size? How many images were taken? Also, why are there data points off to the right of the last column?
      12. The wording of these sections needs improving. I found them complicated and difficult to understand.

      "Fly and worm Spd-2/SPD-2 and Polo/PLK-1 are clear homologues, but Cnn and SPD-5 share little sequence homology-although they are both predicted to be large coiled-coil-rich proteins. Thus, it remains unclear whether these two, largely unrelated, molecules form mitotic-PCM scaffolds that assemble and function in a similar manner"

      "We first focused on Drosophila Cnn as, although the full structure of the original PReM domain (Cnn403-608) is unknown, this domain contains an internal leucine-zipper (LZ) dimer (Cnn490-544) whose crystal structure, in a tetrameric complex with a CM2 dimer, had been solved (Figure 2A) (Feng et al., 2017)."

      "When the full PReM and CM2 domains are mixed in vitro, they form large micron-scale assemblies and point mutations that perturb the LZ::CM2 tetramer perturb PReM::CM2 scaffold assembly in vitro and Cnn scaffold assembly in vivo."

      Significance

      Overall Assessment:

      While I find the premise of this study to be interesting, its execution and presentation are not fully convincing. The study is a collection of experiments connected by a thread that can be difficult to follow. One concern is the lack of focus and a clearly stated conclusion, which is ultimately embodied by the vague title. For example, the research question at the beginning doesn't match the outcome at the end. At the end of the introduction, the authors state they wish to compare assembly mechanisms of Cnn and SPD-5. However, at the end of the results, they present data on CDK5RAP2 and speculate on its assembly. Why introduce the human protein here? Another concern is the lack of symmetry in the experiments. There is much more in vitro characterization of Cnn than SPD-5 or CDK5RAP2, and all in vivo work is performed in flies. Finally, this study does not address whether the best-established model for SPD-5 assembly (multimerization via specific, multivalent coiled-coil interactions) applies to fly Cnn. Thus, to me, this study is a deeper dive into the mechanism of Cnn assembly, not necessarily a fair cross-species comparison. I do not have major issues with the results, but I recommend that this paper undergo significant re-writing before being re-reviewed. There are also issues with data display and reporting of experimental details (e.g., sample sizes) that should be easily fixed.

      Advance: this study provides new insight into how two specific domains interact within PCM scaffold proteins to promote scaffold assembly. It provides some new structural insight into the mechanism of Cnn auto-inhibition. However, there is limited conceptual advance, as the bigger ideas (e.g., auto-inhibition as a regulatory control, PCM scaffold assembly through condensation of coiled-coil proteins) were already established.

      Audience: this study will be of interest to cell biologists studying centrosome assembly, mitosis, and evolution.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors provide extensive immunoreactivity and expression data to map monoaminergic neurotransmitter production sites in Pristionchus pacificus. This nematode is relatively distantly related to the popular model nematode Caenorhabditis elegans, for which such information is already available. They find that dopamine, tyramine, and octopamine are present in the same neurons in both species, but differences are observed for serotonin. This forms the basis for a comparison of serotonergic neurons across 22 nematode species. In addition, they evaluate monoaminergic effects on egg-laying, head movement during reversals, and nictation behavior, to find that monoaminergic control over the latter differs between C. elegans and P. pacificus. This shows that some anatomical flexibility supports similar outcomes, whereas in other cases it is the basis of evolved regulatory differences.

      Strengths:

      The comparative efforts are laudable and valuable, including a thorough revisiting of old data and corrections of what is judged as a historic misannotation. The expected continued value of this work is also appreciated: because nematodes have similar anatomies and behaviors, cellular-resolution data from different species permit the study of functional evolution of neurotransmitter usage in homologous neurons.

      Despite the strong experimental approach, there are some points that require addressing:

      (1) Not all the concepts of the introduction ('feeding behaviors', to a lesser extent also 'evolution of neurotransmitter usage in homologous neurons') are followed up upon in the results or discussion sections.

      We will address the relative treatment of particular topics in the introduction and discussion in a revised version of the article.

      (2) The choice of nematodes ('only' 13 species) may affect what is perceived as ancestral.

      See above regarding ‘13 species’ (actually 22). Most species and genera were specifically selected previously (Loer and Rivard, 2007; Rivard et al., 2010) for broad phylogenetic coverage, representing different species and genera in 4 major clades within ‘clade V’ (Kiontke et al., 2007; Sudhaus, 2011): Anarhabditis (Caenorhabditis, including both the Elegans and Drosophilae species groups), Synrhabditis (Oscheius, Metarhabditis, Reiterina and Rhabditella), Pleiorhabditis (Teratorhabditis, Mesorhabditis, Rhomborhabditis and Pelodera), and Diplogastrids represented by P. pacificus. Among the outgroups to clade V, there are 3 distinct clades represented, each with at least two species and/or genera represented. Therefore, we believe that the determination of an ancestral condition is well-founded. We plan to add this rationale to the revised version to make this clearer.

      (2, continued) Also, identifying their cells based on comparisons with Ce or Ppa identifications only is understandable but mildly risky: there are many cells in the head, and mistakes would go unnoticed until detailed analysis in each species can provide conclusive evidence.

      We agree that there is a mild risk of incorrect identification but believe that appropriate caveats are noted in the text. Furthermore, the recent head EM reconstruction and complete embryonic cell lineage of the P. pacificus (Cook et al., 2025) shows a nearly 1-1 homology correspondence between head neurons (e.g., only a single head neuron is missing in the Ppa head relative to Cel due to altered apoptosis), and a quite high level of conservation of neurite morphology and soma position between Cel and Ppa suggests that identifications are likely correct when examining related nematodes. In cases for which a serotonin-immunoreactive cell is found in the predicted location (and often having apparent associated neurites), its homology to the matching Cel and Ppa cell is the most parsimonious interpretation: otherwise, one cell would have to lose expression and another nearby cell gain it.  

      (3) It is not reported whether the nictation-defective mutants have general locomotion defects; therefore, whether the reported problem is specific to this host-finding behavior or not.

      None of the mutants we tested for nictation behavior, including those that show severe defects in nictation (Ppa-cat-1, Ppa-tph-1, Ppa-tdc-1, Ppa-tbh-1), exhibited noticeable general locomotion defects either as dauers or non-dauers. Further clarification will be provided in a revised version of the article.

      (4) The section on RIP neurons makes sense for Ppa, but not for Ce (dauers in fact have weakened IL2-to-RIP connections) and should be revised. The nictation data also do not support the breadth of the conclusions, which should either be toned down or rephrased as hypothetical.

      We plan to address these concerns in a revised version of the article.

      (5) The discussion mostly reiterates the results, leaving little room for the author's interpretations and opinions. I would suggest reworking in favor of conceptual discussion.

      As noted above, we agree to address the relative treatment of matters in discussion in a revised version of the article.

      Reviewer #2 (Public review):

      Summary:

      This paper makes important contributions to our understanding of how nervous systems evolve, with a particular focus on whether changes in neurotransmitter usage within homologous neurons represent a mechanism for evolutionary adaptation without large-scale changes to circuitry. Comparing the predatory nematode P. pacificus with C. elegans, this study systematically examines monoamine-producing neurons, assesses how their neurotransmitter identities differ between homologous neural types, and determines how these differences relate to behavior.

      Strengths:

      The major strength of this work is its breadth, rigor, and data quality. It combines multiple, independent lines of evidence to assign neurotransmitter identity for neurons with homology grounded in lineage, morphology, and connectomics, which is essential for meaningful cross-species comparisons. Additionally, by extending the analysis beyond P. pacificus and C. elegans to other nematodes, the authors convincingly argue that features observed in P. pacificus likely reflect an ancestral state. This depth greatly enhances the significance of the conclusions.

      This work is likely to have a significant impact on the fields of comparative neurobiology and nervous system evolution. It demonstrates a powerful system and approach for linking molecular identity, cell-type homology, circuit context, and behavior across species. The data generated here will be a valuable resource for the community and provide a strong foundation for future mechanistic studies.

      More broadly, the study reinforces the idea that evolutionary change in nervous systems can occur through modulation of chemical signaling within conserved circuits, rather than through complete rewiring. This conceptual framework is likely to influence how researchers think about neural evolution in other systems.

      Weaknesses:

      Given the availability of detailed connectivity information for both species, a more explicit comparison of the local circuit context of key neurons would further strengthen the link between molecular identity and circuit function.

      We plan to address these concerns in a revised version of the article.

      Reviewer #3 (Public review):

      Summary:

      The study by Hong, Loer, Hobert, and colleagues is a comprehensive description of monoaminergic neurons in the nematode Pristionchus pacificus. The work used multiple, complementary approaches, including immunostaining and expression of genes involved in neurotransmitter synthesis or transport, to identify neurons that express a monoamine neurotransmitter. Moreover, this study characterized the phenotypes of various mutants to study their organismal function. Extensive comparisons are made to C. elegans, the nematode model that, in a way, anchors the model studied here, and new outgroup species were examined for some features so that the polarity of their evolution could be inferred. Although there is no simple or groundbreaking punchline to distill from the manuscript (i.e., other than some things are the same as in C. elegans, and some things are different), and while the study is basically descriptive in nature, the scope of the project warrants broad attention.

      Strengths:

      This manuscript offers a tremendous resource for those who use this species as a model, which, based on the author list alone, includes many labs. This study sets the bar for what can be done in a "satellite" model system.

      Given the complementarity of approaches used, such as the position of cell bodies, the connectivity and morphology of dendrites, and a previously published atlas of the connectome for this species, the identification of specific neurons (which, as the authors point out, can be easily mistaken) is convincing throughout. Likewise, appropriate caution is observed where neuron identities are ambiguous, e.g., unlabeled cells in Figure 5, or ambiguous identities in other species, as shown in Figure 10. There was a lot of data to unpack in this manuscript, but I could not find any obvious flaws in neuron identification.

      Also, the phenotypic assays were straightforward and informative.

      Weaknesses:

      No serious weaknesses were noted. One minor comment is that in general, I think the Methods could use some additional text to describe what the goal of any given technique was. For example, although there is a description of the HCR protocol in the methods, nowhere does it say what genes this method would be used for. In addition to what is shown in Figure 4, this information should be given in the Methods.

      More detailed methods will be provided in a revised version of the article.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) I found the bigger picture analysis to be lacking. Let us take stock: in other work, during active cognition, including at least one study from the Authors, TDLM shows significance sequenceness. But the evidence provided here suggests that even very strong localizer patterns injected into the data cannot be detected as replay except at implausible speeds. How can both of these things be true? Assuming these analyses are cogent, do these findings not imply something more destructive about all studies that found positive results with TDLM?

      Our focus here is on advancing methodology. Given the diversity of tasks and cognitive states in the TDLM literature, replay could exceed detection thresholds under specific conditions—especially when true event durations align with short analysis windows. While a comprehensive re-analysis of prior datasets is beyond our scope, we agree a concise synthesis can strengthen the paper.

      The previous TDLM literature uses a diverse set of tasks and addresses a broad spectrum of cognitive constructs/processes. As we acknowledge, it is perfectly possible that replay bursts in short time windows are well detectable by TDLM. However, we acknowledge that some commentary on this is warranted and have added the following paragraph to the discussion that addresses “improving TDLMs sensitivity”:

      “Finally, what do our simulations imply for the broader MEG replay literature? Our implementation successfully detects replay when boundary conditions are met, as shown in the simulation. But sensitivity depends critically on high fidelity between the analysis window and the density of replay events. A systematic evaluation of these conditions as they apply to prior studies remains beyond the scope of the current paper. Instead, our focus is on delineating boundary conditions that we hope will motivate conduct of power analyses in future work as well as inclusion of simulations that approximate realistic experimental conditions.”

      (2) All things considered, TDLM seems like a fairly 'vanilla' and low-assumption algorithm for finding event sequences. It is hard to see intuitively what the breaking factor might be; why do the authors think ground truth patterns cannot be detected by this GLM-based framework at reasonable densities?

      We agree with the overall sentiment of the referee. Our intuition is that one of the principal shortcomings of the method relates to spurious sequenceness induced by unknown factors at baseline, and to poor transfer of the decoder to other modalities. While we are aware of these confounders and have a rough understanding of how they occur, we are currently not in a position to identify their nature. Note that we believe these confounders are not exclusive to TDLM but potentially threaten all kinds of sequenceness analyses of longer time series that rely on decoders. Indeed, we suspect that classifier training is another bottleneck, as we don't know the exact nature of the representations that are replayed, including how much they overlap with those evoked by a commonly used visual localizer. That said, this is not relevant for the simulation, insofar as we insert patterns that exceed the pattern strength in the localizer.

      Finally, a potential major drawback is the permutation test used for significance testing. As the original authors of TDLM have noted, the current test, which permutes states, is overly conservative. It measures fixed effects and, as it only considers the group-level mean, it is easily biased by individual outliers. We have tried to account for this by z-scoring sequenceness scores. We have also conferred on this with some of the authors of TDLM and discussed a yet unpublished method that aims to address this exact issue. The proposed new method uses a sign-flip permutation test at the group level and therefore implements a random-effects model of the data. This significance test has markedly increased power while still controlling the family-wise error rate (FWER). However, while we show in our power analysis that the new method is indeed more sensitive, it does not materially change the interpretation of the data. We have included this novel method in the paper and added it to the main analysis and most of the simulations.
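      For readers unfamiliar with group-level sign-flip testing, the following is a minimal Python/NumPy sketch of the general idea, not the unpublished method referred to above. It assumes each participant contributes one (possibly z-scored) sequenceness value per time lag and that, under the null hypothesis, these values are symmetric around zero. Each participant's whole lag profile is sign-flipped at random, and the maximum absolute group mean across lags forms the null distribution, which controls the FWER across lags. The function name `sign_flip_test` and its parameters are hypothetical.

```python
import numpy as np


def sign_flip_test(scores, n_perm=5000, seed=0):
    """Group-level sign-flip permutation test (illustrative sketch only).

    scores: array of shape (n_subjects, n_lags), one sequenceness value per
    participant and time lag (hypothetical input format).
    Returns the observed group mean per lag and max-statistic (FWER) p-values.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_subjects = scores.shape[0]
    observed = scores.mean(axis=0)  # group mean for every lag

    # Null distribution: flip the sign of each participant's entire lag
    # profile at random and keep the largest absolute group mean over lags.
    max_null = np.empty(n_perm)
    for k in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
        max_null[k] = np.abs((signs * scores).mean(axis=0)).max()

    # Compare each lag's observed statistic with the null maxima.
    exceed = (max_null[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    p_values = (exceed + 1) / (n_perm + 1)
    return observed, p_values
```

      Because participants, not states, are the exchangeable units here, the test treats between-participant variability as a random effect, which is the property described in the response.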

      (3) Can the authors sketch any directions for alternative methods? It seems we need an algorithm that outperforms TDLM, but not many clues or speculations are given as to what that might look like. Relatedly, no technical or "internal" critique is provided. What is it about TDLM that causes it to be so weak?

      We believe there are several shortcomings and bottlenecks within TDLM that need to be evaluated and improved. While we highlight these issues in the discussion section titled “Improving TDLMs sensitivity,” we agree that we should provide a clearer outline of its current shortcomings. We have now expanded the discussion on what we think needs improvement (‘fixed time lag’) and added a summary statement at the end of the relevant paragraph to recap the main issues to be addressed by an improved successor method. The new paragraphs read:

      “Lastly, there are certain assumptions that TDLM makes that might not hold (see Methods Study II): current implementations look for a fixed time lag that is the same across all participants and between all reactivation events. If time lags differ across participants, TDLM will fail to find them. Similarly, TDLM assumes a fixed sequence order and is not robust against slight within-sequence permutations or missing in-sequence reactivation events. However, from other data sources, such as hippocampal place cell recordings, it is known that such permutations can occur, where some states are skipped or fail to decode during replay. Similarly, it is assumed that each reactivation event lasts between 10 and 30 milliseconds, but the true temporal evolution of reactivation measured by TDLM is currently unknown. Future method development might focus on improving invariance to these assumptions.

      […]

      In summary, there are several areas where TDLM might be improved, including a restriction in its search space, improvement in classifiers, a validation of localizer representation transfer to other domains (e.g. memory representations), and the extension of TDLM to render it more robust against violations of its core assumptions.”
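      To make the fixed-lag assumption concrete, here is a simplified Python/NumPy sketch of a TDLM-style two-level regression, written from our general understanding of the approach rather than from any reference implementation. It assumes decoded state probabilities in an array `X` (time × states) and a hypothesised sequence encoded as a binary transition matrix `T`. The first level regresses each state at t + lag on all states at t; the second level projects the resulting empirical transition matrix onto forward, backward, self-transition and constant regressors. The function name `sequenceness` is hypothetical, and the sketch omits refinements found in published implementations, such as controls for oscillatory autocorrelation and permutation-based significance testing.

```python
import numpy as np


def sequenceness(X, T, max_lag):
    """Simplified TDLM-style sequenceness (illustrative sketch only).

    X: (n_time, n_states) decoded reactivation probabilities.
    T: (n_states, n_states) hypothesised transitions, T[i, j] = 1 for i -> j.
    Returns forward-minus-backward sequenceness for lags 1..max_lag.
    """
    n_time, n_states = X.shape
    # Second-level design: forward, backward, self-transitions, constant.
    design = np.column_stack([
        T.ravel(),
        T.T.ravel(),
        np.eye(n_states).ravel(),
        np.ones(n_states * n_states),
    ])
    result = np.zeros(max_lag)
    for lag in range(1, max_lag + 1):
        # First level: predict every state at t + lag from all states at t
        # (plus an intercept). Only ONE fixed lag is tested per iteration.
        predictors = np.column_stack([X[:-lag], np.ones(n_time - lag)])
        coefs, *_ = np.linalg.lstsq(predictors, X[lag:], rcond=None)
        beta = coefs[:n_states]  # beta[i, j]: state i at t -> state j at t+lag
        # Second level: how well beta matches the hypothesised transitions.
        weights, *_ = np.linalg.lstsq(design, beta.ravel(), rcond=None)
        result[lag - 1] = weights[0] - weights[1]  # forward minus backward
    return result
```

      Because each lag is evaluated separately and pooled over all transitions, a sequence whose events are separated by a consistent lag yields a single peak in the returned profile. In this sketch, events separated by variable lags, or sequences with skipped states, would instead spread the same total effect across several lags or outside the hypothesised transitions, which illustrates why violations of these assumptions reduce sensitivity.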

      Reviewer #2 (Public review):

      Weaknesses:

      The sample size is small (n=21, after exclusions), even for TDLM studies (which typically have somewhere between 25 and 40 participants). The authors address this somewhat through a power analysis of the relationship between replay and behavioural performance in their simulations, but this is very dependent on the assumptions of the simulation. Further, according to their own power analysis, the replay-behaviour correlations are seriously underpowered (~10% power according to Figure 7C), and so, if this is to be taken at face value, their own null findings on this point (Figure 3C) could just reflect undersampling as opposed to methodological failure. I think this point needs to be made more clearly earlier in the manuscript.

      We agree with the referee that our sample is smaller than those of previous studies due to participant exclusion criteria. However, the take-away message from our behavioural simulation and bootstrapping is that, even with larger sample sizes, it is difficult to overcome baseline fluctuations of sequenceness, even if very strong replay patterns were detectable and sample sizes were similar to those of previous studies. Therefore, we are not convinced that our null findings are fully explained by the smaller sample size compared to that of previous studies. Additionally, we show that even within the range of other studies, similar power would have been expected (Supplement Figure 11). However, it is true that null findings can in general be explained by under-sampling, under the assumption that an effect is present. To amplify this point, we have added the following to Figure 3C:

      “[…]. NB, however, as our simulation shows, correlations of sequenceness with behavioural markers are likely to be underpowered and occur only with very high replay rates or much larger sample sizes. See our simulation discussion for a more detailed explanation of how correlations may be inherently biased, as fluctuations in baseline sequenceness overshadow individual scaling with behavioural markers.”

      Furthermore, we have added the following paragraph to the discussion to highlight this point and refer to a power analysis we have now added to the supplement (see next answer):

      “Sample sizes in previous TDLM literature usually range from 20 to 40 participants. A bootstrap power analysis shows that even at those sample sizes, power would remain low unless unrealistically high replay rates are assumed (Supplement Figure 11). Our bootstrap simulation shows that a correlation analysis between sequenceness and behaviour would in these cases be drastically underpowered, even under an assumption of high replay densities.”

      Finally, we have added a remark about the sample size to the limitations section, as naturally, an increase in sample size would yield higher power:

      “Finally, while we initially planned for thirty participants, due to exclusion criteria our study featured fewer participants than most previous studies using TDLM (usually 25-40, but 21 in our study). While we are confident that our simulation results hold under these sample sizes, as the sample sizes of other studies show comparable power to ours (Supplement Figure 11), we cannot fully rule out the possibility that our null findings are explained by a lack of power alone.”

      Relatedly, it would be very useful if one of the recommendations that come out of the simulations in this paper was a power analysis for detecting sequenceness in general, as I suspect that the small sample size impacts this as well, given that sequenceness effects reported in other work are often small with larger sample sizes. Further, I believe that the authors' simulations of basic sequenceness effects would themselves still suffer from having a small number of subjects, thereby impacting statistical power. Perhaps the authors can perform a similar sort of bootstrapping analysis as they perform for the correlation between replay and performance, but over sequenceness itself?

      We agree with the referee that this is, in principle, a great idea. However, the way that significance thresholds are calculated poses a conceptual problem for such an analysis: as the significance threshold, we define the maximum sequenceness value across all participants, all time lags and all permutations. This threshold is compared against the mean sequenceness of all participants, disregarding the standard deviation. This maximum threshold would not change if we bootstrapped some of our samples, and the 95th-percentile threshold would also not change significantly. To illustrate this point, we have added this analysis to the supplement as Supplement Figure 10. However, the new sign-flip permutation test we now include allows for such a comparison, as it also takes variance between participants into account! We have included all three variants of the power analysis, and the figure description now reads:

      “Supplement Figure 11 Power analysis of sequenceness significance for bootstrapped sample sizes. A) Power map for state-permutation thresholds. Here, the bootstrap approach suffers from a conceptual problem: significance thresholds are defined by the permutation maximum and/or the 95th percentile of the maximums across all sequence permutations across participants. If we resample bootstrap participants from our existing pool, the maximum thresholds computed will remain relatively stable across resampled participants, as the test only compares against the mean and disregards the standard deviation. B) The newly presented statistical approach is significantly more sensitive at higher sample sizes. Note that even then, 80% power is only reached with replay densities higher than 50 min⁻¹ at a sample size of 60 participants. Additionally, the sign-flip permutation test assumes that the mean is zero under the null hypothesis. As we observed a non-zero mean due to spurious oscillations, we subtracted the mean sequenceness of the baseline condition from each participant before permuting to obtain a null distribution with mean zero; otherwise, we would have found significant replay effects in the baseline condition at increasing sample sizes. Nevertheless, due to its higher sensitivity, the new sign-flip test is recommended over the previous sequence-permutation-based test. Colours indicate power from 0 to 1 for different bootstrapped sample sizes and densities. 80% power thresholds are outlined in black.”
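      A minimal Python sketch of the sign-flip test with baseline correction described in the caption, under stated assumptions: each participant contributes one sequenceness value (e.g. at a single lag), the group-mean baseline sequenceness is subtracted from every participant before testing, and the test is two-sided. The function names are hypothetical; this is not the manuscript's implementation.

```python
import random
import statistics


def baseline_correct(sequenceness, baseline_sequenceness):
    """Subtract the group-mean baseline sequenceness from every participant's value (assumed)."""
    offset = statistics.fmean(baseline_sequenceness)
    return [s - offset for s in sequenceness]


def sign_flip_p_value(values, n_perm=5000, seed=0):
    """Two-sided one-sample sign-flip permutation test of mean(values) == 0.

    Under the null hypothesis each participant's value is symmetric around zero,
    so randomly flipping signs yields the null distribution of the group mean.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(values))
    exceed = 0
    for _ in range(n_perm):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if abs(statistics.fmean(flipped)) >= observed:
            exceed += 1
    # +1 correction so that the p-value is never exactly zero
    return (exceed + 1) / (n_perm + 1)
```

      Because the null distribution is built from the participants' own values, its spread reflects between-participant variance and narrows as the sample grows, in line with the caption's observation that this test gains sensitivity at larger sample sizes.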

      The task paradigm may introduce issues in detecting replay that are separate from TDLM. First, the localizer task involves a match/mismatch judgment and a button press during the stimulus presentation, which could add noise to classifier training separate from the semantic/visual processing of the stimulus. This localizer is similar to others that have been used in TDLM studies, but notably in other studies (e.g., Liu, Mattar et al., 2021), the stimulus is presented prior to the match/mismatch judgment. A discussion of variations in different localizers and what seems to work best for decoding would be useful to include in the recommendations section of the discussion.

      We agree and thank the referee for raising this issue. We acknowledge that we omitted to mention that oddball trials were excluded from classifier training. Our rationale for presenting the oddball during stimulus presentation, and not thereafter, was an assumption that by first presenting the audio and then the visual cue we would create more generalized representations that would be less modality-dependent. Importantly, however, we excluded all oddball trials from localizer training. We therefore assume that this particular design choice will not greatly affect decoder training. If some motor-preparation activity is present during stimulus presentation, it should be present equally across all trials and hence be ignored by the classifier, as we balanced the transitions between images. We have now added this information to the main text:

      “In each trial, a word describing the stimulus was played auditorily, after which the corresponding stimulus was shown. In ~11% of cases, there was a mismatch between word and image (oddball trials), and these trials were excluded from the localizer training.” Additionally in the methods section: “These oddball-trials were excluded from all further analysis and decoder training.”

      Nevertheless, we agree that the extant variety in localizer designs is under-discussed and that many assumptions of classifier training are not, as yet, fully validated. We have added a sentence highlighting different oddball paradigms to the section discussing localizers, and we have also added a summary statement with recommendations. The passage now reads:

      “Additionally, a wide variety of oddballs has been used (e.g. upside-down, scrambled, or mismatched images, cues presented visually, as words, auditorily, etc.), and at this time it is unclear whether these affect the representations that the classifier learns [...] In summary, we would expect a multimodal categorical localizer, and a classifier that isn’t trained on a specific timepoint, to generalize best.”

      Second, and more seriously, I believe that the task design for training participants about the expected sequences may complicate sequence decoding. Specifically, this is because two images (a "tuple") are shown together and used for prediction, which may encourage participants to develop a single bound representation of the tuple that then predicts a third image (AB -> C rather than A -> B, B -> C). This would obviously make it difficult to i) use a classifier trained on individual images to detect sequences and ii) find evidence for the intended transition matrix using TDLM. Can the authors rule out this possibility?

      We thank the reviewer for raising a possibility we had not considered! While there is some evidence that a single bound representation would overlap with its constituents (especially before long-term consolidation) and therefore be detectable by the classifiers, we acknowledge the possibility that individual classifiers would fail to be sensitive to such a compound representation. In fact, in the retrieval data we find some evidence for combined replay of representations (where representations are replayed seemingly at the same time; see Kern 2024). We have added this possibility to the interim discussion of Study 1 as a qualification. However, this does not change the results or interpretation of our simulation, which we consider a key message of the paper.

      The relevant segment in the discussion section now reads:

      “Additionally, given that the stimuli were presented in combined triplets, participants may have formed a singular representation of associated items and subsequently replayed these (e.g., AB→C), instead of replaying item-by-item transitions (A→B→C). Under such a scenario, a classifier trained on individual items may fail to detect these newly formed bound representations, particularly if they diverge strongly from the single-item patterns. In our previous study addressing retrieval (Kern et al., 2024), we found that states were co-reactivated to varying extents, yet classifiers trained on single items retained sensitivity to detect these combined reactivation events. Consistent with this, prior work suggests that unified representations retain overlap with their constituent item representations (Dennis et al., 2024; Liang et al., 2020); however, there is also evidence that different brain regions are involved if representational unitization occurs (Staresina & Davachi, 2010), potentially confusing classifiers. Therefore, we cannot exclude that rest-related consolidation replay engendered unitized representations that were insufficiently captured by our single-item classifiers.”

      Participants only modestly improved (from 76% to 82% accuracy) following the rest period (which the authors refer to as a consolidation period). If the authors assume that replay leads to improved performance, then this suggests there is little reason to see much task-related replay during rest in the first place. This limitation is touched on (lines 228-229), but I think it makes the lack of replay finding here less surprising. However, note that in the supplement, it is shown that the amount of forward sequenceness is marginally related to the performance difference between the last block of training and retrieval, and this is the effect I would probably predict would be most likely to appear. Obviously, my sample size concerns still hold, and this is not a significant effect based on the null hypothesis testing framework the authors employ, but I think this set of results should at least be reported in the main text.

      We disagree that the absence or presence of replay can be inferred from an absolute memory enhancement. While consolidation can lead to an absolute improvement in performance in, for example, motor memory domains, one formulation is that in declarative learning tasks replay stabilizes latent memory traces, and in such a scenario would not necessarily boost performance. While many declarative consolidation studies report increased performance compared to a control condition (i.e. without a consolidation window), this does not necessarily entail an absolute performance increase, as replay might act merely to protect against the loss of memory traces. Therefore, absent a proper control condition, the modest increase we observe does not permit inferences about the presence or absence of replay.

      We did expect to find a correlation between replay and individual behavioural performance. Indeed, a weak correlation between performance and sequenceness can be detected. However, as we also show, any such correlation is overshadowed by baseline fluctuations in sequenceness, such that its overall validity is questionable, even under very high replay rates. We would therefore be circumspect about this correlation even had it been significant, and in the discussion we chose to refrain from putting much focus on it. Nevertheless, we have added a short statement to the corresponding figure caption discussing this precise issue. The segment now reads:

      “While we found a non-significant relation between memory performance enhancement and post-learning forward sequenceness, we are cautious not to overinterpret these results. As shown in the section “Correlation with behaviour only present at high replay speeds”, this correlational measure oscillates heavily with baseline sequenceness fluctuations, and any true replay effect is likely to be overshadowed by such fluctuations.”

      I was also wondering whether the authors could clarify how the criterion over six blocks was 80% but then the performance baseline they use from the last block is 76%? Is it just that participants must reach 80% within the six blocks *at some point* during training, but that they could dip below that again later?

      We thank the reviewer for highlighting this point. The first block in which participants reached >80% ended the learning blocks. After a maximum of six blocks, the learning session was ended regardless of performance. Therefore, some participants completed six learning blocks without reaching a performance of 80%. While we described this in the Methods section, it was missing from the Results Study I section, which now contains:

      “[...] Participants then learned triplets of associated items according to a graph structure. Within the learning session, participants performed a maximum of six learning blocks, but the session was stopped once participants reached a memory performance criterion of 80% (criterion learning; see Methods for details).”

      The Figure 2 description now contains

      “[...] Participants completed up to six blocks of learning trials. After reaching 80% in any block, no more learning blocks were performed (criterion learning) [...]”
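      The stopping rule can be written out as a short Python sketch. It shows why the last completed block can fall below 80%: a participant who never reaches criterion stops after six blocks with whatever accuracy the sixth block yielded. Whether the criterion is `>=` or `>` 80% is an assumption here (the text above uses both phrasings), and the function name and accuracies are hypothetical.

```python
def criterion_learning(block_accuracies, criterion=0.80, max_blocks=6):
    """Return the accuracies of the learning blocks a participant actually completes.

    Learning ends after the first block at or above `criterion` (assumed >=),
    or after `max_blocks` blocks regardless of performance.
    """
    completed = []
    for accuracy in block_accuracies[:max_blocks]:
        completed.append(accuracy)
        if accuracy >= criterion:
            break
    return completed


# Hypothetical participants.
fast_learner = criterion_learning([0.60, 0.85, 0.95])
never_reached = criterion_learning([0.50, 0.60, 0.70, 0.72, 0.75, 0.76, 0.90])
print(fast_learner)    # [0.6, 0.85]
print(never_reached)   # six blocks, ending at 0.76
```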

      Lastly, there was a mistake in the Behavioural results section, which stated “All thirty participants, except one, [...] to criterion of 80%.” This is an error. In our preregistration, we specified that we would only include participants who successfully learned above chance. Here, we meant that only one participant failed to reach a criterion that we defined as “successful learning”. We have fixed this, and it now reads:

      “with an accuracy above 50% (which we preregistered beforehand as an exclusion criterion for “successful learning above chance”).”

      Additionally, we have noted this in the Methods section for clarity, and we apologize for this mistake:

      “Additionally, as successful above-chance learning was necessary for the paradigm, we ensured all remaining participants had a retrieval performance of at least 50% (one participant had to be excluded, but was already excluded due to low decoding performance).”

      Because most of the conclusions come from the simulation study, there are a few decisions about the simulations that I would like the authors to expand upon before I can fully support their interpretations. First, the authors use a state-to-state lag of 80ms and do not appear to vary this throughout the simulations - can the authors provide context for this choice? Does varying this lag matter at all for the results (i.e., does the noise structure of the data interact with this lag in any way?)

      This was a deliberate choice, but we acknowledge that the reasoning behind it was not detailed in our initial submission. We chose a lag of 80 milliseconds for several reasons. First, it is distant from the 9-11 Hz alpha oscillations we observed in our participants and does not share a harmonic with the alpha rhythm. Second, we wanted a clear picture of the effect of simulated replay, as isolated as possible from spurious sequenceness confounders present in the baseline condition; thus, we chose a lag at which the sequenceness score was close to zero in the baseline condition. Third, in this revision, we subtracted the mean sequenceness value of the baseline such that any simulation effects would start, on average, at zero sequenceness. In this way, we could attribute any increase in sequenceness to the experimentally inserted replay, independent of spurious oscillations. Finally (but less importantly), as we observed that the correlation of sequenceness with behaviour fluctuated strongly, for the reasons detailed above, we chose a lag at which this correlation was as close as possible to zero. Had we not chosen a lag that met these conditions, we would have risked measuring simulated replay plus spurious sequenceness confounders.

      We have added a sentence to the main text detailing this justification:

      “We chose this timepoint (80 ms state-to-state lag) as its sequenceness value was close to zero in the baseline condition and it was distant from the observed alpha rhythms of the participants (which varied between ~9-11 Hz). Additionally, we subtracted the mean sequenceness value of the baseline at 80 milliseconds lag such that any simulation effects would, on average, start at zero sequenceness.”

      Additionally, we now add a more detailed explanation to the methods section.

      “This time lag (80 ms) was chosen in order to isolate precisely the effect of the experimentally inserted sequenceness. Thus, we chose a lag at which the mean baseline sequenceness was close to zero and where the correlation with behaviour was low. Additionally, we subtracted the mean baseline sequenceness value (at 80 milliseconds) from the value at that lag for each participant, such that simulation effects would be initialized at zero sequenceness on average, enabling any effects to be attributed purely to inserted replay. Additionally, we excluded time lags too close to the alpha rhythms of participants (which varied between ~9-11 Hz) and lags that would share a harmonic with this rhythm.”
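      The lag-exclusion rule can be sketched as a small Python helper that flags lags near the alpha period, near integer multiples of it, or near its harmonics. The alpha band (9-11 Hz) comes from the text; the 5 ms margin, the number of multiples and harmonics checked, and the 10 ms candidate grid are illustrative assumptions, since the manuscript does not state them here.

```python
def near_alpha(lag_ms, alpha_band_hz=(9.0, 11.0), n_multiples=3, margin_ms=5.0):
    """True if a state-to-state lag (ms) lies near the alpha period, an integer
    multiple of it, or an integer fraction of it (i.e. a harmonic of alpha)."""
    low_hz, high_hz = alpha_band_hz
    shortest, longest = 1000.0 / high_hz, 1000.0 / low_hz   # ~90.9 to ~111.1 ms
    for k in range(1, n_multiples + 1):
        windows = ((k * shortest, k * longest),   # k-th multiple of the period
                   (shortest / k, longest / k))   # period of the k-th harmonic
        for start, stop in windows:
            if start - margin_ms <= lag_ms <= stop + margin_ms:
                return True
    return False


# Candidate lags on a hypothetical 10 ms grid that survive the screen.
candidate_lags = [lag for lag in range(10, 310, 10) if not near_alpha(lag)]
print(candidate_lags)
```

      Under these assumed settings, 80 ms passes the screen, whereas 50 ms (second harmonic), 100 ms (the alpha period) and 200 ms (twice the period) are excluded.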

      Second, it seems that the approach to scaling simulated replays with performance is rather coarse. I think a more sensitive measure would be to scale sequence replays based on the participants' responses to *that* specific sequence rather than altering the frequency of all replays by overall memory performance. I think this would help to deliver on the authors' goal of simulating an "increase of replay for less stable memories" (line 246).

      The referee makes an excellent point and our simulations could be rendered more realistic by inserting the actual tuples that participants answered correctly. If we understand the point correctly, there are two different ways replay might be impacted by performance: First, we can conjecture that there is greater replay if memory performance is not saturated. Second, replay only occurs for content that has actually been encoded!

      Our main reasons for simulating replay of the entire sequence for each participant are as follows. TDLM is implemented such that only the amount of replay is relevant, and the actual transitions do not affect the results beyond noise. Under the assumption that class-specific classifiers perform equally well, simulating A->B, B->C or simulating A->B, A->B yields equivalent results. However, results can differ if this assumption is violated. By drawing from the entire space of classes when inserting replay, we minimize the risk that some classifiers perform worse than others for some participants. For example, if we simulated only A->B for a participant instead of the whole sequence, and by chance classifier A performed suboptimally, we would introduce additional unwanted variance into our results.

      Secondly, from our reading of the literature we infer that replay is increased generally (i.e. density of learning-specific replay is increased) for less stable memories. However, we do not have indicators of memory strength, but only a binary “remembered or not”. As TDLM is invariant to the actual transitions being replayed and only indexes the number of transitions, we chose to ignore which transitions we insert and only scaled the amount of replay.

      We have added an analysis to the Appendix that discusses this specific aspect of our study where we show that results are equivalent if we simulate replay of “A->B B->C C->D” or only “A->B A->B A->B A->B”. As we do not know how replay density interacts with memory trace stability, we opted to leave the current simulation as is. The corresponding paragraph and figure description now read:

      “From the literature, we know that replay is increased after learning and that less stable memories are replayed more often. We simulated this effect by scaling our replay density inversely with performance. However, for simplicity, in our simulation we inserted transitions sampled from all valid transitions given by the graph structure. This meant that some participants would have transitions inserted that they did not actually remember. To show that this would not change results, we simulated two scenarios. In the full sequence scenario, all valid graph transitions are inserted (i.e. all participants’ replay is sampled from 'A->B, B->C, C->D, D->E, E->F, F->G, G->E, E->H, H->I, I->B, B->J, J->A'). In the second scenario (memorized transitions), we only replayed transitions that the participant actually retrieved correctly during the post-resting-state testing sessions (i.e. a participant’s replay would have been sampled from 'A->B, B->C, G->E, E->H, H->I', if those were the ones they remembered). In both scenarios, the number of events is kept constant. The results are equivalent, as can be seen in Appendix A Figure 3. NB: this only holds under the assumption that classifiers are equally good at decoding each class.”

      […]

      “TDLM is insensitive towards which transitions are replayed and only sensitive to how many transitions are detected in total. Here we simulate transitions either sampled from the full graph (light orange/green) or participant-specific transitions of trials that participants correctly remembered (dark orange/green). Shaded areas denote the standard error across participants.”
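      To illustrate the invariance claim, a deliberately simplified Python sketch follows. It replaces TDLM's first-level GLM with a lagged cross-product between noise-free state time series and takes forward sequenceness as the mean over graph transitions minus the mean over all other off-diagonal pairs; this stand-in, the 8-sample lag, the 30-sample event spacing, and the helper names are assumptions. With perfect (and therefore equally good) "classifiers" and the same number of events, replaying all graph transitions and repeating A->B give identical scores.

```python
STATES = "ABCDEFGHIJ"
# Graph transitions as listed in the simulation description.
GRAPH = ["AB", "BC", "CD", "DE", "EF", "FG", "GE", "EH", "HI", "IB", "BJ", "JA"]


def reactivation_series(transitions, starts, n_samples, lag):
    """Noise-free reactivations: first state at t, second state at t + lag."""
    x = [[0.0] * len(STATES) for _ in range(n_samples)]
    for (a, b), t in zip(transitions, starts):
        x[t][STATES.index(a)] += 1.0
        x[t + lag][STATES.index(b)] += 1.0
    return x


def simplified_sequenceness(x, lag):
    """Mean lagged cross-product over graph transitions minus that over other pairs
    (a simplified stand-in for TDLM's first- and second-level GLMs)."""
    n = len(STATES)
    beta = [[0.0] * n for _ in range(n)]
    for t in range(len(x) - lag):
        for i in range(n):
            for j in range(n):
                beta[i][j] += x[t][i] * x[t + lag][j]
    forward = [(STATES.index(a), STATES.index(b)) for a, b in GRAPH]
    forward_set = set(forward)
    on_graph = [beta[i][j] for i, j in forward]
    off_graph = [beta[i][j] for i in range(n) for j in range(n)
                 if i != j and (i, j) not in forward_set]
    return sum(on_graph) / len(on_graph) - sum(off_graph) / len(off_graph)


lag, gap, n_events = 8, 30, 24                 # events spaced beyond the lag (refractory)
starts = [k * gap for k in range(n_events)]
n_samples = n_events * gap + lag + 1
full_graph = [GRAPH[k % len(GRAPH)] for k in range(n_events)]   # every transition twice
single = ["AB"] * n_events                                      # A->B repeated
s_full = simplified_sequenceness(reactivation_series(full_graph, starts, n_samples, lag), lag)
s_single = simplified_sequenceness(reactivation_series(single, starts, n_samples, lag), lag)
print(s_full, s_single)  # 2.0 2.0
```

      With noisy classifier outputs, or classifiers of unequal quality, the two scores would only agree on average, which is the caveat stated in the quoted paragraph.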

      On the other hand, I was also wondering whether it is actually necessary to use the real memory performance for each participant in these simulations - couldn't similar goals (with a better/more full sampling of the space of performance) be achieved with simulated memory performance as well, taking only the MEG data from the participant?

      The decision to use real memory performance is indeed arbitrary; we could also have used randomly sampled values. However, as we wanted to understand our null results better, we opted to use real performance to adhere as closely as possible to the findings we previously reported. Using uniformly sampled memory performance would be less explanatory with respect to our actual results on the resting-state data reported in the first study of the manuscript (Study I).

      Nevertheless, our current implementation already samples the entire performance range in the sub-analysis focusing on the correlation with behaviour. Here, in the “best-case” scenario, the replay scale factor spans from 1 to 0 (i.e., a participant with 100% performance gets a replay scale factor of 0 and hence no simulated replay, whereas the worst-performing participant, with 50% performance, has a replay rate multiplied by 1). We scale the amount of replay by this factor. As a correlation is invariant to linear scaling, this is statistically equivalent to stretching the performance distribution from 0 to 100%. We have added a sentence to the methods to provide further focus on this point:

      “To assess how performance might affect replay in our specific dataset, we chose to use the original participants’ performance values instead of uniformly sampling the performance space (which ranged from 50 to 100%). However, for the correlation analysis, we additionally added a “best-case” scenario, in which we scale replay from 0 to 1, an approach that is statistically equivalent to scaling values to the full space of possible performance (0 to 100%) (see Results Study II: Simulation).”

      Finally, Figure 7D shows that 70ms was used on the y-axis. Why was this the case, or is this a typo?

      Thank you; this is indeed a typo, and we have fixed it.

      Because this is a re-analysis of a previous dataset combined with a new simulation study on that data aimed at making recommendations about how to best employ TDLM, I think the usefulness of the paper to the field could be improved in a few places. Specifically, in the discussion/recommendation section, the authors state that "yet unknown confounders" (line 295) lead to non-random fluctuations in the simulated correlations between replay detection and performance at different time lags. Because it is a particularly strong claim that there is the potential to detect sequenceness in the baseline condition where there are no ground-truth sequences, the manuscript could benefit from a more thorough exploration of the cause(s) of this bias in addition to the speculation provided in the current version.

      We are currently working on a theoretical basis to explain these spurious sequenceness confounders in the baseline condition. Indeed, in our preliminary work, we can in certain contexts induce significant sequenceness during baseline in the absence of any replay signal. However, this work is at an early stage, and we still have some conceptual problems to solve before we are confident in these data. We believe it would be premature to add them to the current manuscript. Nevertheless, we now mention these spurious sequenceness confounders to raise awareness in the field and add greater context to the discussion, highlighting one of the issues that we think is important:

      “[…] For example, if two classifiers’ probabilities oscillate at 10 Hz but at a different phase, a spurious time lag can be found reflecting this phase shift. We speculate that more complex interactions between classifiers oscillating at different phases are also conceivable.”

      In addition, to really show that a realistic simulation is necessary (one of the primary conclusions of the paper), it would be useful to provide a comparison to a fully synthetic simulation performed on this exact task and transition structure (in addition to the recreation of the original simulation code from the TDLM methods paper).

      Thank you for this suggestion! We have now added a synthetic simulation, trying to keep as close as possible to the original simulation code in Liu et al. (2021), while also incorporating our current means of simulating the data (i.e. scaling by performance). We think this synthetic simulation greatly improves the paper and gives weight to our suggestion about the superiority of a hybrid approach. Additionally, it prompted us to look closer at patterns that are inserted in the synthetic simulation and perform a comparative analysis. We have now added the simulation to the main text, together with a methodological explanation of how we simulated the data in the methods section. We also added a discussion on the results and why we think a hybrid approach is currently superior to synthetic approach. The whole new section is too long to paste here – it is found after the main simulation section in the manuscript. We have also added another sentence to the abstract referring to this new inclusion.

      Finally, I think the authors could do further work to determine whether some of their recommendations for improving the sensitivity of TDLM pan out in the current data - for example, they could report focusing not just on the peak decoding timepoint but incorporating other moments into classifier training.

      While we understand the desire to test further refinements of TDLM directly on the data, we intentionally do not include such analyses in the current paper. Our experience informs us that there is an enormous branching factor of parameters when applying TDLM, with implications for the significance of results in one direction or another. However, there are currently only limited ways to know whether parameter changes actually improve sensitivity to replay or instead exacerbate underlying confounders that induce spurious sequenceness (e.g., with some parameter changes we can obtain significant replay in the control condition). To exclude such false-positive findings, we opt for relatively strict adherence to previously published approaches. Thus, in the current paper, we limit ourselves to assessing the reliability and robustness of previous approaches.

      Furthermore, while training on a later timepoint might increase a classifier's sensitivity when transferring between modalities (e.g. from visual to memory representations), this approach is not applicable to our simulations, as the inserted patterns come from the same modality. We consider that other, more bespoke studies are better suited to improving classifier training. NB: see also our recently started Kaggle challenge tackling this problem: https://www.kaggle.com/competitions/the-imagine-decoding-challenge

      However, we have added a note about this dilemma to the improvement section. The section now includes:

      “Nevertheless, as the considerable branching factor poses a threat of increased false-positive findings, we opt to focus the current simulations on previously published pipelines and parameters. Future studies should systematically evaluate parameter choices on TDLM under different conditions, something that is beyond the remit of the current study.”

      Lastly, I would like the authors to address a point that was raised in a separate public forum by an author of the TDLM method, which is that when replays "happen during rest, they are not uniform or close." Because the simulations in this work assume regularly occurring replay events, I agree that this is an important limitation that should be incorporated into alternative simulations to ensure the lack of findings is not because of this assumption.

      The temporal distribution of replay throughout the resting state should not matter, as TDLM is invariant with respect to how replay events are distributed within the analysis window. Specifically, it does not matter whether replay events occur in bursts or are uniformly distributed. Only the number of transitions is relevant; where events occur, or whether they are close to each other, does not affect the numerical results, as long as the refractory window is respected (distances that are too short lead to interactions between events and reduce sensitivity). To emphasize this point, we have added another simulation, shown in Appendix A.1 and Appendix A Figure 1. We have referenced it in the text and added the following paragraph to the Methods section:

      Additionally, the timepoints at which replay is inserted within the resting state are sampled from a uniform distribution. Even though TDLM tracks reactivation events over time, at a macro-scale the algorithm is invariant to their temporal distribution. At each time step, the GLM regresses onto a future time step up to the maximum time lag of interest, yielding a predictor per lag. However, these predictors within the GLM are assessed independently, and hence TDLM is, beyond the time lag window, relatively invariant to the temporal distribution of replay. To demonstrate our claim, we simulated uniform replay and “bursty” replay that only occurs in some parts of the resting state; both yield equivalent sequenceness results (see Appendix A.1).
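      Using a simplified stand-in for TDLM (a lagged cross-product on noise-free state time series, an assumption rather than the actual GLM), the following Python sketch places the same 20 replay events either uniformly across 60 s of simulated rest (an assumed 100 Hz sampling, so an 80 ms lag is 8 samples) or only within its first fifth. As long as events respect the refractory spacing, the two scores are identical. Helper names and all parameters are hypothetical.

```python
import random

STATES = "ABCDEFGHIJ"
# Graph transitions as listed in the simulation description.
GRAPH = ["AB", "BC", "CD", "DE", "EF", "FG", "GE", "EH", "HI", "IB", "BJ", "JA"]


def reactivation_series(transitions, starts, n_samples, lag):
    """Noise-free reactivations: first state at t, second state at t + lag."""
    x = [[0.0] * len(STATES) for _ in range(n_samples)]
    for (a, b), t in zip(transitions, starts):
        x[t][STATES.index(a)] += 1.0
        x[t + lag][STATES.index(b)] += 1.0
    return x


def simplified_sequenceness(x, lag):
    """Mean lagged cross-product over graph transitions minus that over other pairs."""
    n = len(STATES)
    beta = [[0.0] * n for _ in range(n)]
    for t in range(len(x) - lag):
        for i in range(n):
            for j in range(n):
                beta[i][j] += x[t][i] * x[t + lag][j]
    forward = [(STATES.index(a), STATES.index(b)) for a, b in GRAPH]
    forward_set = set(forward)
    on_graph = [beta[i][j] for i, j in forward]
    off_graph = [beta[i][j] for i in range(n) for j in range(n)
                 if i != j and (i, j) not in forward_set]
    return sum(on_graph) / len(on_graph) - sum(off_graph) / len(off_graph)


rng = random.Random(0)
lag, gap = 8, 30                    # 80 ms lag, 300 ms refractory spacing at 100 Hz
n_samples, n_events = 6000, 20      # 60 s of simulated rest
transitions = [rng.choice(GRAPH) for _ in range(n_events)]
n_slots = n_samples // gap - 1      # keeps every second state inside the recording
uniform_starts = sorted(gap * s for s in rng.sample(range(n_slots), n_events))
bursty_starts = sorted(gap * s for s in rng.sample(range(n_slots // 5), n_events))

s_uniform = simplified_sequenceness(
    reactivation_series(transitions, uniform_starts, n_samples, lag), lag)
s_bursty = simplified_sequenceness(
    reactivation_series(transitions, bursty_starts, n_samples, lag), lag)
print(s_uniform, s_bursty)  # identical values
```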

      Reviewer #3 (Public review):

      (1) I am still left wondering why other studies were able to detect replay using this method. My takeaway from this paper is that large time windows lead to high significance thresholds/required replay density, making it extremely challenging to detect replay at physiological levels during resting periods. While it is true that some previous studies applying TDLM used smaller time windows (e.g., Kern's previous paper detected replay in 1500ms windows), others, including Liu et al. (2019), successfully detected replay during a 5-minute resting period. Why do the authors believe others have nevertheless been able to detect replay during multi-minute time windows?

      (Due to similarity, we combined our responses with the first question of Reviewer 1)

      We are reluctant to make sweeping judgments about the previous literature, as we wanted to prioritize advancing methodology instead. The previous TDLM literature uses a diverse set of tasks and cognitive processes. As we state ourselves, it is possible that replay bursts in short time windows are readily detectable by TDLM. We were intentionally cautious about directly critiquing previous studies without a detailed re-analysis of their work and wanted to leave such a conclusion to the reader. However, we realize that such a “thought-starter” might be warranted and improve the paper. Therefore, we have added the following paragraph to the discussion about “improving TDLM’s sensitivity”:

      “Finally, what do our simulations imply for the broader MEG replay literature? Our implementation successfully detects replay when boundary conditions are met, as shown in the simulation. However, sensitivity depends critically on a close match between the length of the analysis window and the number of replay events. A systematic evaluation of these conditions across prior studies is beyond the scope of this paper, so we do not adjudicate earlier findings and leave this assessment to the reader. Instead, we delineate the boundary conditions and urge future work to conduct power analyses where possible and to include simulations that approximate realistic experimental conditions.”

      For example, some studies using TDLM report evidence of sequenceness as a contrast between evidence of forwards (f) versus backwards (b) sequenceness; sequenceness was defined as $Z_{f}^{\Delta t} - Z_{b}^{\Delta t}$ (where $Z$ refers to the sequence alignment coefficient for a transition matrix at a specific time lag $\Delta t$). This use case is not discussed in the present paper, despite its prevalence in the literature. If the same logic were applied to the data in this study, would significant sequenceness have been uncovered? Whether it would or not, I believe this point is important for understanding methodological differences between this paper and others.

      This approach was first introduced as part of a TDLM predecessor that utilized cross-correlations (Kurth-Nelson et al., 2016), where this step is a necessity to extract any sequenceness signal at all, by subtracting signals that are present in both directions (akin to an EEG reference). However, its validity is less clear when forward and backward sequenceness are estimated separately, as is the case in the GLM. The rationale behind subtracting here is the same as for autocorrelations: oscillatory confounds present in the data introduce spurious sequenceness in both directions alike, i.e. at the same time lag, which can simply be removed by subtraction. However, this assumption only holds if the sole confounder is autocorrelation caused by a global signal that oscillates at all sensors with the same phase. In our own experience, and as mentioned in the Discussion, we do not think this assumption holds. Arguably, more complex interactions are at play that cannot be removed by such a subtraction; for example, subtraction increases false positives if confounds point in opposite directions at a specific time lag. This violation of the assumption can be seen in our baseline condition, where spurious sequenceness diverges in opposite directions at some time lags (e.g. at ~90 ms, where forward sequenceness is negative and backward sequenceness is positive). We reasoned that oscillatory confounds are more stable when comparing pre vs post for the same direction than when comparing forward minus backward within a session.
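      The argument can be illustrated with a toy calculation on made-up sequenceness curves (hypothetical numbers, not our data): an in-phase confound that adds the same value to forward and backward sequenceness cancels in fwd-bkw, whereas a confound that pushes the two directions apart at one lag (as at ~90 ms in the baseline described above) is doubled by the subtraction and ends up as large as the genuine effect. The 10 Hz rhythm and the effect sizes are assumptions chosen only for readability.

```python
import numpy as np

lags_ms = np.arange(10, 210, 10)               # 10 ... 200 ms
true_fwd = np.where(lags_ms == 50, 0.2, 0.0)   # hypothetical genuine forward effect at 50 ms
true_bkw = np.zeros_like(true_fwd)             # no genuine backward effect

# Case 1: an in-phase confound adds the same value to both directions at every lag,
# so forward minus backward removes it.
common = 0.1 * np.cos(2 * np.pi * 10 * lags_ms / 1000)   # hypothetical 10 Hz rhythm
contrast1 = (true_fwd + common) - (true_bkw + common)

# Case 2: at 90 ms the confound pushes the directions apart (negative forward,
# positive backward), so the subtraction doubles it instead of cancelling it.
opposite = np.where(lags_ms == 90, 0.1, 0.0)
contrast2 = (true_fwd - opposite) - (true_bkw + opposite)

for lag, c1, c2 in zip(lags_ms, contrast1, contrast2):
    if lag in (50, 90):
        print(f"{lag} ms: in-phase confound {c1:+.2f}, opposite confound {c2:+.2f}")
```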

      Finally, we note issues introduced by the various ways that sequenceness has been analysed in previous papers: normalization of sequenceness (z-scoring across time lags, across participants, or not at all), normalization of probabilities (taking raw decision scores, z-scoring, softmax, dividing by the mean, subtracting the mean), taking a windowed approach and summing sequenceness scores, not to mention the various classifier choices that can be made; all of this can be applied before or after subtracting conditions from each other. In our experience, insufficient attention is paid to controlling for multiple comparisons when running all these analyses, which risks selective reporting.
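      A back-of-the-envelope count shows how quickly these choices multiply. The sketch below enumerates only the options listed in the paragraph above (classifier choices are left out) and reports the family-wise error rate that would result if every pipeline were tested at α = 0.05. It assumes the tests are independent, which they are not, since all pipelines run on the same data; the real inflation is therefore smaller, and how much smaller would have to be established by simulation.

```python
from itertools import product

# Analysis choices listed in the text (classifier choices are left out).
seq_norm = ["z-score across lags", "z-score across participants", "none"]
prob_norm = ["raw decision scores", "z-score", "softmax",
             "divide by mean", "subtract mean"]
windowing = ["single lag", "windowed sum"]
contrast_order = ["normalize before contrast", "normalize after contrast"]

pipelines = list(product(seq_norm, prob_norm, windowing, contrast_order))
k = len(pipelines)                       # 3 * 5 * 2 * 2 = 60 pipelines
alpha = 0.05
fwer_if_independent = 1 - (1 - alpha) ** k
bonferroni_alpha = alpha / k             # per-pipeline level that would keep FWER <= 0.05
print(k, round(fwer_if_independent, 3), bonferroni_alpha)
```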

      Nevertheless, subtracting backward from forward sequenceness is probably as valid as post minus pre. Therefore, we have added fwd-bkw plots to the supplement and explained some of our reasoning for not reporting them in the main text in the figure legend. The figure legend and reference now read:

      “Finally, we report forward minus backward sequenceness, and our motivation for using an across-session post-pre comparison instead of a within-session forward-backward comparison, in Supplement Figure 10.”

      […]

      “Forward minus backward sequenceness within each resting state session. Previous papers often report subtraction of backward from forward sequenceness (fwd-bkw) as a means to remove oscillatory confounds that impact both sequenceness directions in synchrony. While required in early cross-correlation approaches (Kurth-Nelson et al., 2016), its validity in GLM-based frameworks depends on the assumption that confounds are global and in-phase across sensors. We observed that this assumption is violated in our baseline data, where spurious sequenceness occasionally diverges in opposite directions at specific time lags (e.g., ~90 ms). In such instances, subtraction would increase the false-positive rate rather than suppress noise. In Figure 3B, we prioritized the comparison of pre-task versus post-task sequenceness within the same direction, as oscillatory confounds appeared more stable across time within a single direction than across directions within a single session. However, we consider both approaches valid. We now provide the fwd-bkw plots for completeness and comparison with previous literature. A) Forward minus backward sequenceness for Control (left) and Post-Learning resting-state (right). B) T-value distribution of the sign-flip permutation test for Control (left) and Post-Learning resting-state (right).”

      (2) Relatedly, while the authors note that smaller time windows are necessary for TDLM to succeed, a more precise description of the appropriate window size would greatly improve the utility of this paper. As it stands, the discussion feels incomplete without this information, as providing explicit guidance on optimal window sizes would help future researchers apply TDLM effectively. Under what window size range can physiological levels of replay actually be detected using TDLM? Or, is there some scaling factor that should be considered, in terms of window size and significance threshold/replay density? If the authors are unable to provide a concrete recommendation, they could add information about time windows used in previous studies (perhaps, is 1500ms as used in their previous paper a good recommendation?).

      We currently do not have an empirical estimate of which window sizes are appropriate. While we used 1500 ms in our previous paper, this choice was dictated solely by the experimental design, which had a 1.5 s wait period before the next stimulus. For guidance on this matter, we recommend consulting the related intracranial literature on SWR rate increases under similar experimental conditions. We have added the following paragraph to the discussion:

      “At this stage we cannot offer a general recommendation for window sizes, as they are likely to depend on details of the research paradigm. However, intracranial recordings can be used as a proxy to estimate the duration of replay bursts; for example, Norman et al. (2019) reported increased SWRs up to 1500 ms after retrieval cue onset.”

      (3) In their simulation, the authors define a replay event as a single transition from one item to another (example: A to B). However, in rodents, replay often traverses more than a single transition (example: A to B to C, even to D and E). Observing multistep sequences increases confidence that true replay is present. How does sequence length impact the authors' conclusions? Similarly, can the authors comment on how the length of the inserted events impacts TDLM sensitivity, if at all?

      Good point! So far, most papers do not seem to include multi-step TDLM, and in our experience rightly so, as it is conceptually difficult to define clear significance thresholds given that shorter sub-sequences are contained within a longer sequence (e.g. ABC contains both AB and BC, as well as a longer dependency AC); this makes it difficult to define the correct way to create a null distribution for the permutation test. Therefore, we tried to stay as close as possible to previous approaches and only looked for single-step transitions. Nevertheless, we have added an analysis to the supplement comparing how TDLM behaves if we simulate A->B->C or A->B and a separate B->C. It shows that TDLM is only sensitive to the number of transitions present in the data, and it does not matter whether they are chained or chunked. The segment reads:

      “We intentionally designed our study to encourage replay of triplets. However, this raises the question of whether it matters if triplets or individual chunks of a sequence are replayed at different time points. Here, we simulated two scenarios. In one, we inserted replay of single transitions alone with a refractory period, e.g. A->B and separate B->C transitions. In a second scenario, we simulated replay of chained triplets, e.g. A->B->C, with a distance of 80 milliseconds between reactivations. Importantly, we kept the number of transitions constant (i.e., separate A->B and B->C events and a single A->B->C event both comprise two transitions). This creates a context wherein a four-minute resting state would have ~100 events of A->B->C inserted and ~200 events of A->B or B->C, such that both cases result in the same number of single-step transitions. We found both to be equivalent, with TDLM agnostic to the length of sequence trains, i.e., it does not matter if replay is chunked or chained under the assumption that the number of transitions remains fixed, as can be seen in Appendix A Figure 2.”
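      The bookkeeping behind the matched design can be sketched in a few lines: an event that replays k states contributes k - 1 single-step transitions, so ~100 chained A->B->C events and ~200 separate A->B or B->C events yield identical transition counts. This covers only the counting, not TDLM itself, and the helper `count_transitions` is hypothetical.

```python
from collections import Counter


def count_transitions(events):
    """Count single-step transitions in a list of replay events.

    Each event is a tuple of states replayed in order; an event of k states
    contributes k - 1 transitions, e.g. ("A", "B", "C") gives A->B and B->C.
    """
    counts = Counter()
    for event in events:
        counts.update(zip(event[:-1], event[1:]))
    return counts


chained = [("A", "B", "C")] * 100                   # ~100 triplet events
chunked = [("A", "B")] * 100 + [("B", "C")] * 100   # ~200 separate pair events

chained_counts = count_transitions(chained)
chunked_counts = count_transitions(chunked)
print(chained_counts == chunked_counts, sum(chained_counts.values()))
```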

      And the reference Figure description reads:

      “TDLM is invariant to the length of sequence replay trains under the assumption that the number of target transitions (e.g. single steps) is fixed. We simulated replay either as two temporally separate A->B, B->C events (light orange/green) or as a single A->B->C event (dark orange/green), both yielding equivalent sequenceness. Shaded areas denote the standard error across participants.”

      For example, regarding sequence length, is it possible that TDLM would detect multiple parts of a longer sequence independently, meaning that the high density needed to detect replay is actually not quite so dense? (example: if 20 four-step sequences (A to B to C to D to E) were sampled by TDLM such that it recorded each transition separately, that would lead to a density of 80 events/min).

      Indeed, this is an interesting proposal. We intentionally kept our simulation close to the way previous simulations were set up (i.e., Liu & Dolan et al., 2021; Liu & Mattar, 2021) by simulating one-step transitions, spaced such that there is no overlap between separate events (e.g. by defining a refractory period). If the duration of replay is increased, then we would also need to increase the length of the refractory period, resulting in a lower upper limit on how much replay can occur in a 1-minute time window. This in turn would yield roughly the same number of transitions that can be inserted into the resting state and, as detailed above, the same results. Nevertheless, as we chose to use replay density and not transition density as a marker, the density would be reduced, even if the number of transitions stays the same. We have added an analysis using multi-step replay to the supplement and discuss its implications and caveats. In the main discussion we have added the following segment:
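      The distinction between replay density and transition density reduces to simple arithmetic, sketched below with hypothetical helper names. In the reviewer's example, 20 events per minute of A->B->C->D->E (five states, four transitions each) give 80 transitions per minute; conversely, the same 80 transitions per minute correspond to a replay density of 80 events per minute for two-state events but only 20 for five-state events.

```python
def transition_density(events_per_min, states_per_event):
    """Single-step transitions per minute; an event of k states holds k - 1 transitions."""
    return events_per_min * (states_per_event - 1)


def replay_density(transitions_per_min, states_per_event):
    """Events per minute needed to deliver a given number of transitions per minute."""
    return transitions_per_min / (states_per_event - 1)


# Reviewer's example: 20 events/min, each A->B->C->D->E (5 states, 4 transitions).
print(transition_density(20, 5))                   # -> 80
# The same 80 transitions/min as replay density for two-state vs five-state events.
print(replay_density(80, 2), replay_density(80, 5))  # -> 80.0 20.0
```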

      “Similarly, in our simulation, for simplicity and to keep consistency with previous simulations, we restricted replay events to span two reactivation events. While the characteristics of replay as measured by TDLM are unknown, it is conceivable that several steps can be replayed within one replay event. We show that the vanilla version of TDLM is fundamentally sensitive to the number of single-step transitions alone, and is agnostic to whether these are replayed chained or chunked (Appendix A.2 and Appendix A Figure 2). Nevertheless, if the number of reactivation events chained within a replay event increases, TDLM’s sensitivity increases relative to the replay density and thresholds are reached earlier (see Appendix A Figure 4). See Appendix A.4 for a simulation of multi-step replay events and our discussion of the caveats.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please label the various significance thresholds in the legend of Figure 3.

      We have labelled all the thresholds in the figure legends.

      Reviewer #2 (Recommendations for the authors):

      I think that clarity is hampered by a bit too much reliance on explanations from the previous paper using this task. For example, Figure 1 is not particularly useful for understanding the study in its current form; I found myself relying almost exclusively on Supplementary Figure 1 (which is from the previous paper). I'd recommend presenting some version of SF1 in the main text instead. Another example of this overreliance on the previous paper is that, as far as I can tell, the present paper never explicitly states which transitions are being tested in TDLM. In the prior work, it states "all allowable graph transitions", and so I assumed this was the same here, but the paper should stand alone without having to go back to the other study. I'd recommend that the authors revise the paper in these and other places where the previous paper is mentioned.

      Thanks for raising this point! We were uncertain ourselves how to deal with the overlap in content and did not want to bloat the paper or plagiarize ourselves too much. On the advice of the referee, we have implemented the following changes to improve the manuscript and reduce its reliance on the previous paper:

      Supplement Figure 1 is indeed crucial to understanding the experiment. We have moved it to the Methods section under Methods: Procedure.

      We have added more stimulus descriptions to the Methods: Localizer section.

      We have included more details about the localizer and graph learning that were missing before.

      We have added the note about which transitions we were looking for in the Methods section. Additionally, we have added this information to the Results section of Study 1.

      There are also a few typos I noticed:

      (1) Line 73: "during in the context of."

      (2) Line 287: " to exploring the."

      We fixed the typos.

      Reviewer #3 (Recommendations for the authors):

      (1) Why did the authors choose an 80ms state-to-state time lag for their simulation? I believe they should make the reason for this decision clear in the main text.

      Indeed, this point was also raised by the other reviewer. We have added a sentence to the main text about the rationale behind this decision:

      “We chose this timepoint (80 millisecond state-to-state lag) as its sequenceness value was close to zero in the baseline condition and it was distant from the observed alpha rhythms of the participants (which varied between ~9-11 Hz). Additionally, we subtracted the mean sequenceness value of the baseline at 80 millisecond lag such that any simulation effects would, on average, start at zero sequenceness.”

      Additionally, we have added some further explanation to the Methods section.

      “This time lag (80 msec) was chosen in order to precisely isolate the effect of the experimentally inserted sequenceness. Thus, we chose a lag at which the mean baseline sequenceness was close to zero and where the correlation with behaviour was low. Additionally, we subtracted the mean baseline sequenceness value (at 80 milliseconds) from the value at that lag recorded for each participant, such that simulation effects would be initialized at zero sequenceness on average, enabling any effects to be attributed purely to inserted replay. We also excluded time lags too close to the alpha rhythms of participants (which varied between ~9-11 Hz) or lags that would be harmonically related to the rhythm.”
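      A minimal sketch of the lag-selection logic described above, under stated assumptions: "harmonically related" is read as lying within a tolerance of an integer multiple of the 9-11 Hz alpha period, the 10 ms tolerance is hypothetical, the behavioural-correlation criterion is omitted, and the baseline values are invented so that the outcome is easy to check. The helper names are illustrative only.

```python
import numpy as np


def allowed_lags(lags_ms, alpha_hz=(9.0, 11.0), tol_ms=10.0, max_multiple=3):
    """Keep lags that are not within tol_ms of the alpha period or its multiples."""
    periods = 1000.0 / np.asarray(alpha_hz)         # about 111 ms and 91 ms
    lo, hi = periods.min(), periods.max()
    keep = [lag for lag in lags_ms
            if not any(m * lo - tol_ms <= lag <= m * hi + tol_ms
                       for m in range(1, max_multiple + 1))]
    return np.asarray(keep)


def pick_lag(lags_ms, baseline_mean, **kwargs):
    """Among allowed lags, return the lag (and its index) whose mean baseline
    sequenceness is closest to zero."""
    candidates = np.flatnonzero(np.isin(lags_ms, allowed_lags(lags_ms, **kwargs)))
    best = candidates[np.argmin(np.abs(baseline_mean[candidates]))]
    return lags_ms[best], best


def zero_baseline(seq_at_lag, baseline_mean, lag_index):
    """Subtract the group-mean baseline value at the chosen lag from each participant."""
    return seq_at_lag - baseline_mean[lag_index]


lags_ms = np.arange(10, 210, 10)
baseline_mean = np.full(lags_ms.size, 0.05)   # hypothetical group-mean baseline
baseline_mean[lags_ms == 80] = 0.002          # close to zero and outside the alpha band
baseline_mean[lags_ms == 100] = 0.0           # exactly zero, but inside the alpha band
lag, idx = pick_lag(lags_ms, baseline_mean)
print(lag, zero_baseline(np.array([0.5, 0.7]), baseline_mean, idx))
```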

      (2) Line 168: Can the authors define what these conservative and liberal criteria are in the text?

      We have added definitions of the criteria in the text. The text now reads:

      “[…] significance thresholds (conservative, i.e. the maximum sequenceness across all permutations and timepoints, or liberal, i.e. the 95th percentile of the aforementioned sequenceness).”
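      The two criteria can be sketched as follows, assuming the null distribution is stored as one row of sequenceness values per permutation and one column per time lag, and assuming the liberal percentile is taken over the per-permutation peaks across lags (one common reading of the wording above). The test is one-sided here; for a two-sided version, absolute values would be taken first. The null values are simulated and purely hypothetical.

```python
import numpy as np


def permutation_thresholds(null_seq, percentile=95):
    """null_seq: (n_permutations, n_lags) sequenceness obtained with shuffled
    transition matrices. Returns (conservative, liberal) thresholds."""
    peaks = null_seq.max(axis=1)                 # peak over lags, one per permutation
    conservative = peaks.max()                   # maximum over permutations and lags
    liberal = np.percentile(peaks, percentile)   # e.g. 95th percentile of the peaks
    return conservative, liberal


rng = np.random.default_rng(0)
null_seq = rng.normal(0.0, 0.02, size=(1000, 20))   # hypothetical null distribution
conservative, liberal = permutation_thresholds(null_seq)
print(f"conservative: {conservative:.4f}, liberal: {liberal:.4f}")
```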

      (3) Line 478: "calculate" instead of "calculated".

      (4) Figure 7 D: y-axis is labeled "70 ms" I believe it should be labeled 80 ms.

      Thanks, we fixed the two typos.

      (5) With replay defined as sequential reactivation at a compressed temporal timescale, many of the iEEG citations (lines 54-55) do not demonstrate replay (they show stimulus reinstatement or ripple activity, but not sequential replay). Replay studies in humans using intracranial methods have been mostly limited to those measuring single-unit activity, a good example being Vaz et al., 2020 (https://www.science.org/doi/10.1126/science.aba0672).

      We agree that, under a strict definition articulated by Genzel et al. that defines replay as sequential reactivation, many prior human iEEG studies are better described as stimulus reinstatement or ripple-related activity rather than true sequence replay. We have revised the text accordingly and now highlight the few intracranial microelectrode studies that demonstrate replay of firing sequences at the cellular/ensemble level in humans (Eichenlaub et al., 2020; Vaz et al., 2020), distinguishing these from macro-scale iEEG work providing indirect evidence alone.

      The revised paragraph now reads:

      “Replay has been shown using cellular recordings across a variety of mammalian model organisms (Hoffman & McNaughton, 2002; Lee & Wilson, 2002; Pavlides & Winson, 1989). Replay studies in humans using intracranial recordings are few, but include work demonstrating compressed replay of firing-pattern sequences in motor cortex during rest (Eichenlaub et al., 2020) as well as single-unit replay of trial-specific cortical spiking sequences during episodic retrieval (Vaz et al., 2020). By contrast, most iEEG studies report stimulus-specific reinstatement or ripple-locked activity changes without explicit demonstration of temporally compressed sequential replay (Axmacher et al., 2008; Staresina et al., 2015). As these methods are only applied under restricted clinical circumstances, such as during pre-operative neurosurgical assessments, opportunities to investigate human replay are limited. This lends urgency to efforts aimed at developing novel methods to investigate human replay non-invasively.”

      (6) The expectations about replay frequency are grounded in literature on hippocampal replay sequences. However, MEG captures signals from across the entire brain, and the hippocampal contribution is likely relatively weak compared to all other signals. This raises an important question: is TDLM genuinely unable to detect replay at physiological (i.e., hippocampal) levels, or is it instead detecting a different form of sequential reactivation - possibly involving cortex or other regions - that may occur more frequently? More broadly, when we have evidence of replay from TDLM, do we believe it is the same thing as replay of CA1 place cell spiking sequences, as detected in rodents? Commenting on this distinction would help further develop theories of replay and what TDLM is measuring.

      This is indeed an important point that has garnered relatively little attention. While there is some evidence of a relation to hippocampal replay in the form of a high-frequency power increase in the hippocampus, ultimately it is not possible to know without intracranial recordings, as signal strength from those regions is rather poor in MEG.

      We have added the following segment to the manuscript that discusses these issues:

      “However, while we are using indices of SWRs as a proxy for replay density estimation, the relationship between hippocampal replay and replay detected by TDLM remains uncertain. While current decoding approaches measure replay-like phenomena at cortical sites, previous papers have reported a power increase in hippocampal areas coinciding with replay episodes as detected by TDLM. Nevertheless, it is conceivable that cortical replay found by TDLM could occur independently of hippocampal replay and SWRs and be generated by different mechanisms. Some TDLM studies find a state-to-state time lag of above 100 ms, much slower than, e.g., previously reported place cell replay. Future studies should employ simultaneous intracranial and cortical surface recordings to establish the relationship between hippocampal replay and replay found by TDLM.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zeng et al. have investigated the impact of inhibiting lactate dehydrogenase (LDH) on glycolysis and the tricarboxylic acid cycle. LDH is the terminal enzyme of aerobic glycolysis, or fermentation, converting pyruvate and NADH to lactate and NAD+; it is essential for the fermentation pathway as it recycles the NAD+ needed by upstream glyceraldehyde-3-phosphate dehydrogenase. As the authors point out in the introduction, multiple published reports have shown that inhibition of LDH in cancer cells typically leads to a switch from fermentative ATP production to respiratory ATP production (i.e., glucose uptake and lactate secretion are decreased, and oxygen consumption is increased). The presumed logic of this metabolic rearrangement is that when glycolytic ATP production is inhibited due to LDH inhibition, the cell switches to producing more ATP using respiration. This observation is similar to the well-established Crabtree and Pasteur effects, where cells switch between fermentation and respiration depending on the availability of glucose and oxygen. Unexpectedly, the authors observed that inhibition of LDH led to inhibition of respiration, not activation as previously observed. The authors perform rigorous measurements of glycolysis and TCA cycle activity, demonstrating that under their experimental conditions, respiration is indeed inhibited. Given the large body of work reporting the opposite result, it is difficult to reconcile the reasons for the discrepancy. In this reviewer's opinion, a reason for the discrepancy may be that the authors performed their measurements 6 hours after inhibiting LDH. Six hours is a very long time for assessing the direct impact of a perturbation on metabolic pathway activity, which is regulated on a timescale of seconds to minutes. The observed effects are likely the result of a combination of many downstream responses that occur within 6 hours of inhibiting LDH, which causes a large decrease in ATP production, inhibition of cell proliferation, and likely a range of stress responses, including gene expression changes.

      Strengths:

      The regulation of metabolic pathways is incompletely understood, and more research is needed, such as the one conducted here. The authors performed an impressive set of measurements of metabolite levels in response to inhibition of LDH using a combination of rigorous approaches.

      Weaknesses:

      Glycolysis, TCA cycle, and respiration are regulated on a timescale of seconds to minutes. The main weakness of this study is the long drug treatment time of 6 hours, which was chosen for all the experiments. In this reviewer's opinion, if the goal was to investigate the direct impact of LDH inhibition on glycolysis and the TCA cycle, most of the experiments should have been performed immediately after or within minutes of LDH inhibition. After 6 hours of inhibiting LDH and ATP production, cells undergo a whole range of responses, and most of the observed effects are likely indirect due to the many downstream effects of LDH and ATP production inhibition, such as decreased cell proliferation, decreased energy demand, activation of stress response pathways, etc.

      We thank the reviewer for the careful reading of our manuscript, the accurate summary of the prevailing model, and the positive assessment of the rigor of our measurements. We agree that much prior literature reports increased oxygen consumption following LDH inhibition, and we recognize that our finding (coordinated suppression of glycolysis, the TCA cycle, and OXPHOS) differs from this prevailing interpretation. We address below the reviewer’s main concern regarding the 6-hour time point and clarify the conceptual scope of our study.

      (1) Scope: steady-state metabolic regulation versus immediate transient effects

      The reviewer raises an important point that many metabolic perturbations can trigger rapid, transient responses within seconds to minutes, whereas our measurements were performed after sustained LDH inhibition. We agree that very early time points would be required if the primary goal were to isolate the most immediate, proximal consequence of LDH inhibition before downstream propagation. However, the objective of our study is different: we aim to characterize the metabolic steady state re-established after sustained inhibition of LDH activity, because this adapted steady state is more relevant for understanding long-term metabolic consequences and therapeutic outcomes of LDH inhibition in cancer cells.

      (2) Genetic LDHA/LDHB knockout: comparison of two steady states

      A related point applies to the LDHA/LDHB knockout models. We fully agree that the knockout process necessarily involves a temporal perturbation during cell line generation and adaptation. Nevertheless, the experimental comparison in our study is explicitly between two steady states: the baseline steady state of control cells and the steady state achieved after stable genetic disruption of LDHA or LDHB. The observation that LDHA or LDHB knockout alone had minimal effects on glycolysis and respiration indicates that partial reduction of LDH activity can be compensated in a steady-state manner, consistent with the exceptionally high catalytic capacity of LDH in cancer cells relative to upstream rate-limiting enzymes.

      (3) LDH-activity-dependent quantitative relationships support stable metabolic states

      Importantly, our conclusions do not rely on a single inhibitor condition at a single time point. Rather, we established quantitative steady-state relationships between residual LDH activity and pathway behavior across a wide range of LDH inhibition. These LDH-activity-dependent data strongly support that the system resides in stable metabolic states at different degrees of LDH activity, rather than reflecting non-specific collapse due to prolonged stress.

      Specifically, we observed that when LDH activity was reduced from 100% to approximately 9% (e.g., by genetic perturbation and partial pharmacologic inhibition), glucose consumption and lactate production remained essentially unchanged, indicating maintenance of a steady-state glycolytic flux despite substantial LDH inhibition. Only when LDH activity was further reduced below this threshold did glycolytic flux decrease in a graded manner, consistent with a nonlinear control structure (Figure 8A and B).
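      Why a large excess of LDH capacity produces such a threshold can be seen in a deliberately crude toy model, which is not the analysis used in the manuscript: treat the upstream rate-limiting step and LDH as two steps in series whose steady-state flux is capped by the lower capacity. With LDH assumed to have an 11-fold excess capacity (a hypothetical number chosen so the breakpoint falls near 1/11 ≈ 9%), flux is flat until LDH activity drops below that point and then declines in proportion to residual activity. Real pathways are smoother, because redox state and thermodynamics also shift.

```python
import numpy as np


def toy_glycolytic_flux(ldh_fraction, upstream_capacity=1.0, ldh_capacity=11.0):
    """Toy steady-state flux through two steps in series: whichever step has the
    lower capacity caps the flux. Capacities are hypothetical, in units of the
    upstream rate-limiting step, with an 11-fold LDH excess at full activity."""
    return np.minimum(upstream_capacity,
                      ldh_capacity * np.asarray(ldh_fraction, dtype=float))


ldh_fraction = np.array([1.0, 0.5, 0.2, 0.1, 0.05, 0.02])
for f, flux in zip(ldh_fraction, toy_glycolytic_flux(ldh_fraction)):
    print(f"LDH activity {f:>5.0%}: relative flux {flux:.2f}")
```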

      Likewise, the isotope tracing results showed distinct LDH-activity-dependent transitions in TCA cycle labeling patterns. Over the range in which LDH activity decreased from 100% to ~9%, the [<sup>13</sup>C<sub>6</sub>]glucose-derived labeling pattern of citrate remained largely unchanged, whereas deeper inhibition led to a decrease in m2 citrate with a compensatory rise in higher-order citrate isotopologues, consistent with altered flux entry versus cycling/retention in the TCA cycle (Figure 8C). Similarly, [<sup>13</sup>C<sub>5</sub>]glutamine tracing revealed that deeper LDH inhibition reduced the direct m5 contribution, accompanied by corresponding shifts in other isotopologues (Figure 8D). These graded, quantitative transitions—rather than an abrupt global failure—support the interpretation of distinct metabolic steady states across LDH activity levels, linking LDH inhibition to changes in both glycolysis and mitochondrial metabolism.
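      For readers less familiar with isotope tracing, the quantities discussed above can be computed from a citrate mass-isotopologue distribution as sketched below: each isotopologue's share of the total pool, and the fractional contribution, i.e. the average fraction of the six citrate carbons that carry label. The peak areas are hypothetical, chosen only to show a drop in m2 with a rise in higher-order isotopologues, and are assumed to be already corrected for natural 13C abundance.

```python
import numpy as np


def isotopologue_fractions(abundances):
    """Normalize M+0 ... M+n peak areas (assumed already natural-abundance corrected)."""
    a = np.asarray(abundances, dtype=float)
    return a / a.sum()


def fractional_contribution(abundances):
    """Average labelled fraction: sum_i i * f_i / n, with n the number of carbons."""
    f = isotopologue_fractions(abundances)
    n_carbons = len(f) - 1
    return float(np.dot(np.arange(len(f)), f) / n_carbons)


# Hypothetical citrate M+0 ... M+6 peak areas from [13C6]glucose tracing.
control = [40, 5, 35, 5, 10, 3, 2]
inhibited = [40, 5, 25, 7, 15, 5, 3]
for name, mid in (("control", control), ("deep LDH inhibition", inhibited)):
    f = isotopologue_fractions(mid)
    print(f"{name}: m2 = {f[2]:.2f}, m3-m6 = {f[3:].sum():.2f}, "
          f"fractional contribution = {fractional_contribution(mid):.3f}")
```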

      (4) Reconciling discrepancies with prior studies

      We agree that multiple prior studies have reported increased oxygen consumption or enhanced oxidative metabolism following LDH inhibition in cancer cells. However, we note that this prevailing notion often persists because LDH inhibition is frequently discussed by analogy to the classical Pasteur and Crabtree effects, in which cells toggle between fermentation and respiration depending on oxygen and glucose availability. We believe this analogy can be misleading.

      In the Pasteur effect, the metabolic shift is primarily driven by oxygen limitation, i.e., restriction of the terminal electron acceptor for the mitochondrial electron transport chain, which enforces reliance on fermentation. In the Crabtree effect, high glucose availability suppresses respiration through regulatory mechanisms while glycolysis is strongly activated. Both phenomena are fundamentally controlled by oxygen availability and respiratory capacity, rather than by inhibition of a specific cytosolic enzyme.

      By contrast, LDH inhibition is mechanistically distinct: it directly perturbs cytosolic redox recycling by limiting NADH-to-NAD<sup>+</sup> regeneration and can therefore constrain upstream glycolytic flux (particularly at GAPDH) and reshape pathway thermodynamics. Under conditions where LDH inhibition sufficiently limits effective NAD<sup>+</sup> availability and reduces glycolytic flux into pyruvate, the downstream consequence is reduced carbon input into the TCA cycle and suppressed OXPHOS—consistent with our experimental measurements. We therefore suggest that divergent outcomes reported across studies likely reflect differences in residual LDH activity, cell-type–specific metabolic wiring, and the extent to which glycolytic flux remains sustained versus becoming redox-limited upstream, rather than a universal Pasteur/Crabtree-like “switch” from fermentation to respiration. Accordingly, interpreting LDH inhibition as a Pasteur/Crabtree-like toggle may oversimplify the biochemical consequences of disrupting cytosolic NAD<sup>+</sup> regeneration.

      We have revised the Discussion to clarify this conceptual distinction and to avoid relying on comparisons that are not mechanistically equivalent to LDH inhibition.

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al. investigated the role of LDH in determining the metabolic fate of pyruvate in HeLa and 4T1 cells. To do this, three broad perturbations were applied: knockout of two LDH isoforms (LDH-A and LDH-B), titration with a non-competitive LDH inhibitor (GNE-140), and exposure to either normoxic (21% O2) or hypoxic (1% O2) conditions. They show that knockout of either LDH isoform alone, though reducing both protein level and enzyme activity, has virtually no effect on either the incorporation of a stable 13C-label from a 13C6-glucose into any glycolytic or TCA cycle intermediate, nor on the measured intracellular concentrations of any glycolytic intermediate (Figure 2). The only apparent exception to this was the NADH/NAD+ ratio, measured as the ratio of F420/F480 emitted from a fluorescent tag (SoNar).

      The addition of a chemical inhibitor, on the other hand, did lead to changes in glycolytic flux, the concentrations of glycolytic intermediates, and the NADH/NAD+ ratio (Figure 3). Notably, this was most evident in the LDH-B knockout, in agreement with the greater sensitivity of LDH-A to GNE-140 (Figure 2). In the LDH-B knockout, increasing concentrations of GNE-140 increased the NADH/NAD+ ratio, reduced glucose uptake and lactate production, and led to an accumulation of glycolytic intermediates immediately upstream of GAPDH (GA3P, DHAP, and FBP) and a decrease in the product of GAPDH (3PG). They go on to show that this effect is even stronger in cells exposed to hypoxic conditions (Figure 4). They propose that a shift to thermodynamic unfavourability, initiated by an increased NADH/NAD+ ratio inhibiting GAPDH, explains the cascade, calculating ΔG values that become progressively more endergonic at increasing inhibitor concentrations.

      Then, in two separate experiments, the authors track the incorporation of 13C into the intermediates of the TCA cycle from a 13C6-glucose and a 13C5-glutamine. They use the proportion of labelled intermediates as a proxy for how much pyruvate enters the TCA cycle (Figure 5). They conclude that LDH inhibition decreases not only fermentation but also TCA cycle and OXPHOS flux, and hence the flux of pyruvate to all of those pathways. Finally, they characterise the production of ATP from respiratory or fermentative routes, the concentration of a number of cofactors (ATP, ADP, AMP, NAD(P)H, NAD(P)+, and GSH/GSSG), the cell count, and cell viability under four conditions: with and without the highest inhibitor concentration, each under normoxia and hypoxia. From this, they conclude that LDH inhibition suppresses glycolysis, the TCA cycle, and OXPHOS simultaneously (Figure 7).

      Strengths:

      The authors present an impressively detailed set of measurements under a variety of conditions. It is clear that a huge effort was made to characterise the steady-state properties (metabolite concentrations, fluxes) as well as the partitioning of pyruvate between fermentation as opposed to the TCA cycle and OXPHOS.

      A couple of intermediary conclusions are well supported, with the hypothesis underlying the next measurement clearly following. For instance, the authors refer to literature reports that LDH activity is highly redundant in cancer cells (lines 108 - 144). They prove this point convincingly in Figure 1, showing that both the A- and B-isoforms of LDH can be knocked out without any noticeable changes in specific glucose consumption or lactate production flux, or, for that matter, in the rate at which any of the pathway intermediates are produced. Pyruvate incorporation into the TCA cycle and the oxygen consumption rate are also shown to be unaffected.

      They checked the specificity of the inhibitor and found good agreement between the inhibitory capacity of GNE-140 on the two isoforms of LDH and the glycolytic flux (lines 229 - 243). The authors also provide a logical interpretation of the first couple of consequences following LDH inhibition: an increased NADH/NAD+ ratio leading to the inhibition of GAPDH, causing upstream accumulations and downstream metabolite decreases (lines 348 - 355).

      Weaknesses:

      Despite the inarguable comprehensiveness of the data set, a number of conceptual shortcomings afflict the manuscript. First and foremost, reasoning is often not pursued to a logical conclusion. For instance, the accumulation of intermediates upstream of GAPDH is proffered as an explanation for the decreased flux through glycolysis. However, in Figure 3C it is clear that there is no accumulation of the intermediates upstream of PFK. It is unclear, therefore, how this traffic jam is propagated back to a decrease in glucose uptake. A possible explanation might lie with hexokinase and the decrease in ATP (and constant ADP) demonstrated in Figure 6B, but this link is not made.

      We appreciate the reviewer's critical comment. In Figure 3C, there is no accumulation of F6P or G6P, which are upstream of PFK1. This is because the PFK1-catalyzed reaction is strongly exergonic and operates far from equilibrium, so it acts as a thermodynamic barrier that prevents the accumulation of downstream intermediates from propagating back to F6P and G6P. Even with 30 μM GNE-140, ∆G<sub>PFK1</sub> (the Gibbs free energy change of the PFK1-catalyzed reaction) remains -9.455 kJ/mol (Figure 3D), indicating that the reaction is still far from thermodynamic equilibrium.

      We agree with the reviewer that hexokinase inhibition may play a role; this requires further investigation.
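      The statement that PFK1 operates far from equilibrium can be made concrete with the standard relation ΔG = ΔG°' + RT ln Q. The Python sketch below is illustrative only: it assumes a textbook ΔG°' of about -14.2 kJ/mol for the PFK1 reaction (F6P + ATP → FBP + ADP), uses hypothetical concentrations rather than the measured values behind Figure 3D, treats concentrations as activities, and ignores pH and Mg<sup>2+</sup> effects. It shows that even a tenfold accumulation of the product FBP leaves ΔG clearly negative, so a downstream build-up need not propagate back to F6P and G6P.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol*K)
T_K = 310.15               # 37 °C


def reaction_delta_g(delta_g0_prime, substrates, products, temp_k=T_K):
    """Return ΔG = ΔG°' + RT ln Q (kJ/mol) for a reaction in which every
    species has a stoichiometric coefficient of 1.

    Concentrations are in mol/L and are used directly as activities
    (a simplifying assumption).
    """
    q = math.prod(products) / math.prod(substrates)
    return delta_g0_prime + R_KJ_PER_MOL_K * temp_k * math.log(q)


# Hypothetical concentrations (mol/L), NOT the measured values from the study.
F6P, ATP, ADP = 0.05e-3, 3.0e-3, 0.5e-3
DG0_PFK1 = -14.2  # assumed textbook standard transformed ΔG°' for PFK1

for fbp in (0.5e-3, 5.0e-3):  # baseline vs. tenfold FBP accumulation
    dg = reaction_delta_g(DG0_PFK1, [F6P, ATP], [fbp, ADP])
    print(f"FBP = {fbp * 1e3:.1f} mM -> ΔG(PFK1) = {dg:.2f} kJ/mol")
```

Because RT ln 10 is only about 5.9 kJ/mol at 37 °C, a strongly exergonic step needs a large change in its mass-action ratio before its ΔG approaches zero.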

      The obvious link between the NADH/NAD+ ratio and pyruvate dehydrogenase (PDH) is also never addressed, a mechanism that might explain how the pyruvate incorporation into the TCA cycle is impaired by the inhibition of LDH (the observation with which they start their discussion, lines 511 - 514).

      We agree with the reviewer’s comment. In this study, we did not explore how LDH inhibition affects pyruvate incorporation into the TCA cycle. Because this mechanism was not investigated, we have titled the study:

      "Elucidating the Kinetic and Thermodynamic Insights into the Regulation of Glycolysis by Lactate Dehydrogenase and Its Impact on the Tricarboxylic Acid Cycle and Oxidative Phosphorylation in Cancer Cells."

      It was furthermore puzzling how the ΔG, calculated with intracellular metabolite concentrations (Figures 3 and 4) could be endergonic (positive) for PGAM at all conditions (also normoxic and without inhibitor). This would mean that under the conditions assayed, glycolysis would never flow completely forward. How any lactate or pyruvate is produced from glucose, is then unexplained.

      This issue also concerned us during the study. However, given the high reproducibility of the data, we consider the result genuine, although it requires explanation. The PGAM-catalyzed reaction is tightly linked to both upstream and downstream reactions in the glycolytic pathway. In glycolysis, three key reactions catalyzed by HK2, PFK1, and PK are highly exergonic, providing the driving force for the conversion of glucose to pyruvate. The other reactions, including the one catalyzed by PGAM, operate near thermodynamic equilibrium and primarily serve to equilibrate glycolytic intermediates rather than control the overall direction of glycolysis, as we previously described (J Biol Chem. 2024 Aug 8;300(9):107648).

      The endergonic nature of the PGAM-catalyzed reaction does not prevent it from proceeding in the forward direction. Instead, the directionality of the pathway is dictated by the exergonic reaction of PFK1 upstream, which pushes the flux forward, and by PK downstream, which pulls the flux through the pathway. The combined effects of PFK1 and PK may account for the observed endergonic state of the PGAM reaction.

      However, if the PGAM-catalyzed reaction were isolated from the glycolytic pathway, it would tend toward equilibrium and never surpass it, as there would be no driving force to move the reaction forward.

      Finally, the interpretation of the label incorporation data is rather unconvincing. The authors observe an increasing labelled fraction of TCA cycle intermediates as a function of increasing inhibitor concentration. Strangely, they conclude that less labelled pyruvate enters the TCA cycle while simultaneously less labelled intermediates exit the TCA cycle pool, leading to increased labelling of this pool. The reasoning that they present for this (decreased m2 fraction as a function of GNE-140 concentration) is by no means a consistent or striking feature of their titration data and comes across as rather unconvincing. Yet they treat this anomaly as resolved in the discussion that follows.

      Because GNE-140 treatment increased the labeling of TCA cycle intermediates by [<sup>13</sup>C<sub>6</sub>]glucose but decreased the OXPHOS rate, we regarded these apparently conflicting results as an 'anomaly' that warranted further explanation. To address this, we analyzed the labeling pattern of TCA cycle intermediates using both [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine. Tracing the incorporation of glucose- and glutamine-derived carbons into the TCA cycle suggests that LDH inhibition reduces the flux of glucose-derived acetyl-CoA into the TCA cycle, together with a decreased flux of glutamine-derived α-KG and a reduced efflux of intermediates from the cycle. These results align with theoretical expectations: at steady state, the reactions that withdraw TCA cycle intermediates for other pathways must be balanced by those that replenish them. In the GNE-140 treatment group, the entry of glutamine-derived carbon into the TCA cycle was reduced, implying that glucose-derived carbon (as acetyl-CoA) entering the TCA cycle must also be reduced, or vice versa.

      This step-by-step investigation is detailed under the subheading "The Effect of LDHB KO and GNE-140 on the Contribution of Glucose Carbon to the TCA Cycle and OXPHOS" in the Results section in the manuscript.

      In the Discussion, we emphasize that caution should be exercised when interpreting isotope tracing data. In this study, treatment of cells with GNE-140 increased the labeling percentage of TCA cycle intermediates by [<sup>13</sup>C<sub>6</sub>]glucose (Figure 5A-E). However, this does not necessarily imply an increase in glucose carbon flux into the TCA cycle; rather, it reflects a reduction in both the flux of glucose carbon into the TCA cycle and the flux of intermediates leaving the TCA cycle. When interpreting the data, multiple factors must be considered, including the carbon-13 labeling pattern of the intermediates (m1, m2, m3, ...) (Figure 5G-K), replenishment of intermediates by glutamine (Figure 5M-V), and the mitochondrial oxygen consumption rate (Figure 5W). Only by considering all of these factors together can the data be interpreted properly.
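      The argument that the labeled fraction of a pool can rise while the labeled influx falls follows from a simple steady-state mass balance. The Python sketch below assumes a single well-mixed pool at isotopic steady state, fed only by a fully labeled source (glucose-derived carbon in a [<sup>13</sup>C<sub>6</sub>]glucose experiment) and an unlabeled source (here, glutamine-derived carbon); the flux values are hypothetical, and isotopologue scrambling over multiple turns of the cycle is ignored. When the unlabeled influx falls proportionally more than the labeled influx, the labeled fraction rises even though both the labeled influx and the total efflux decline.

```python
from dataclasses import dataclass


@dataclass
class PoolInputs:
    labeled_influx: float    # e.g. glucose-derived carbon in a [13C6]glucose run
    unlabeled_influx: float  # e.g. glutamine-derived carbon (unlabeled in that run)

    @property
    def total_efflux(self) -> float:
        # Steady state: whatever enters the pool must also leave it.
        return self.labeled_influx + self.unlabeled_influx

    @property
    def labeled_fraction(self) -> float:
        # Well-mixed pool at isotopic steady state: enrichment equals
        # the labeled share of the total influx.
        return self.labeled_influx / self.total_efflux


# Hypothetical relative fluxes, not values from the study.
control = PoolInputs(labeled_influx=1.0, unlabeled_influx=3.0)
inhibited = PoolInputs(labeled_influx=0.6, unlabeled_influx=1.0)

for name, pool in [("control", control), ("inhibited", inhibited)]:
    print(f"{name}: labeled influx={pool.labeled_influx}, "
          f"efflux={pool.total_efflux}, labeled fraction={pool.labeled_fraction:.3f}")
```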

      Reviewer #3 (Public Review):

      Hu et al in their manuscript attempt to interrogate the interplay between glycolysis, TCA activity, and OXPHOS using LDHA/B knockouts as well as LDH-specific inhibitors. Before I discuss the specifics, I have a few issues with the overall manuscript. First of all, based on numerous previous studies it is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle (studies with PDK inhibitors) leads to upregulation of TCA cycle activity and OXPHOS, activation of glutaminolysis, etc. (in this work, the authors claim that lowered glycolysis leads to lower levels of TCA activity/OXPHOS). The authors in the current work completely ignore recent studies that suggest that lactate itself is an important signaling metabolite that can modulate metabolism (actual mechanistic insights were recently presented by at least two groups, the Thompson and Chouchani labs). In addition, extensive effort was dedicated to understanding the crosstalk between glycolysis/TCA cycle/OXPHOS using metabolic models (Titov, Rabinowitz labs). I have several comments on how experiments were performed. In the Methods section, it is stated that both HeLa and 4T1 cells were grown in RPMI-1640 medium with regular serum, but under these conditions pyruvate is certainly present in the medium; this can easily complicate or invalidate some findings presented in this manuscript. In the LDH enzymatic assays with cell homogenates, controls were not explained or presented (a lot of enzymes in the homogenate can react with NADH!). One of the major issues I have is that glycolytic intermediates were measured in multiple enzyme-coupled assays. Although one might think it is a good approach to have quantitative numbers for each metabolite, the way it was done is that cell homogenates (potentially still with traces of activity of multiple glycolytic enzymes) were incubated with various combinations of the SAME enzymes and substrates they were supposed to measure as part of the enzyme-based cycling reaction. I would prefer to see a comparison between numbers obtained in enzyme-based assays and GC-MS/LC-MS experiments (using calibration curves for the respective metabolites, of course). Correct measurement of these metabolites is crucial, especially when thermodynamic parameters for the respective reactions are calculated. Concentrations in multiple graphs (Figure 1g etc.) are in "mM"; I do not think that this is correct.

      We thank the reviewer for these comments. Below, we clarify the conceptual framework, the quantitative methodology, and the experimental basis supporting our conclusions.

      (1) “It is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle… leads to upregulation of TCA/OXPHOS… (authors claim lowered glycolysis leads to lower TCA/OXPHOS)”

      This framing is not accurate in the context of our study. PDK inhibition and LDH inhibition are fundamentally different perturbations. PDK inhibition directly promotes mitochondrial pyruvate oxidation by enabling PDH flux, whereas LDH inhibition primarily perturbs cytosolic redox balance (free NADH/NAD<sup>+</sup>) and thereby constrains upstream glycolytic reactions, particularly the GAPDH step. Therefore, the metabolic outcomes of these interventions are not expected to be identical and should not be treated as interchangeable.

      Importantly, we do not “ignore” prior studies proposing increased OXPHOS after LDH inhibition; we explicitly cite and summarize this prevailing interpretation in the Introduction. Our study was motivated precisely because this interpretation does not resolve key quantitative inconsistencies, including (i) the large mismatch between glycolytic flux and mitochondrial oxidative capacity, and (ii) the exceptionally high catalytic capacity of LDH relative to upstream rate-limiting glycolytic enzymes. These constraints raise a mechanistic question: how does LDH inhibition actually suppress glycolytic flux in intact cancer cells, and what are the consequences for TCA cycle and OXPHOS?

      Our central contribution is the identification of a biochemical mechanism supported by integrated measurements of fluxes, metabolite concentrations, redox state, and reaction thermodynamics: LDH inhibition increases free NADH/NAD<sup>+</sup>, decreases free NAD<sup>+</sup> availability, inhibits GAPDH, drives accumulation/depletion patterns in glycolytic intermediates, shifts Gibbs free energies of near-equilibrium reactions (PFK1–PGAM segment), suppresses pyruvate production, and consequently reduces carbon input into TCA cycle and OXPHOS. These analyses are not provided by most prior work and directly address the mechanistic gap.

      (2) Lactate signaling (Thompson/Chouchani) and metabolic modeling (Titov/Rabinowitz)

      These research directions are valuable, but they address questions that are different from the one investigated here. Our manuscript focuses on steady-state biochemical control of metabolic flux by LDH inhibition through redox-linked kinetics and pathway thermodynamics.

      (3) Pyruvate in RPMI

      Pyruvate in standard medium does not invalidate our conclusions. All experimental comparisons were performed under identical conditions across groups, and the major conclusions rely on orthogonal measurements including glycolytic flux (glucose consumption/lactate production), OCR profiling, and isotope tracing with [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine, which directly quantify carbon entry into lactate and TCA cycle intermediates. These tracer-based results are not confounded by unlabeled extracellular pyruvate in a way that would reverse the mechanistic conclusions.

      (4) LDH activity assay in homogenates and “many enzymes can react with NADH”

      This concern is overstated. In the LDH assay, substrates are pyruvate + NADH, and the measured signal reflects NADH oxidation coupled to pyruvate reduction. In cell lysates, LDH is uniquely abundant and catalytically efficient for this reaction pair, and the inhibitor-response behavior matches the known LDHA/LDHB selectivity of GNE-140 and the cellular phenotypes. Thus, the assay is mechanistically specific in this context.

      (5) Enzyme-coupled metabolite assays and request for LC–MS validation

      The reviewer’s implication that enzyme-coupled assays are intrinsically unreliable is incorrect. Enzymatic cycling assays are a widely used quantitative approach when performed with proper specificity and calibration, and they are particularly useful for labile glycolytic intermediates that are challenging to quantify reproducibly by MS without specialized quenching, derivatization, and isotope dilution standards.

      We agree that MS-based quantification is valuable, and we have developed LC–MS methods for selected metabolites. However, absolute quantification of these intermediates remains technically difficult because of inherent limitations of the method and, in our hands, did not provide uniformly robust performance for all intermediates required for thermodynamic analysis.

      (6) Units (“mM”)

      The metabolite concentration units are correct.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      If the goal is to investigate the direct impact of LDH inhibition, then in my opinion, most of these experiments need to be repeated at a very early time point immediately after or a few minutes after LDH inhibition. I understand that this is a tremendous amount of work that the authors might not want to pursue. I do want to highlight that the quality of the experiments performed in this work is impressive. I hope the authors continue investigating this subject and look forward to reading their future manuscripts on this topic.

      We thank the reviewer for this thoughtful and constructive comment and for the positive assessment of the experimental quality of our work.

      We fully agree that measurements at very early time points after LDH inhibition would be required if the goal were to isolate an immediate, proximal molecular event occurring before downstream propagation. However, the primary objective of our study is not to dissect a single instantaneous biochemical consequence of LDH inhibition, but rather to characterize the metabolic steady state that is re-established after sustained suppression of LDH activity, which we believe is more relevant for understanding the long-term metabolic and therapeutic consequences of LDH inhibition in cancer cells.

      (1) Scope: steady-state metabolic regulation versus immediate transient effects

      The reviewer raises an important point: many metabolic perturbations trigger rapid, transient responses within seconds to minutes, whereas our measurements were performed after sustained LDH inhibition. Our data therefore describe the adapted steady state rather than the immediate response, and it is this steady state that we consider most relevant to the long-term metabolic and therapeutic consequences of LDH inhibition in cancer cells.

      (2) Genetic LDHA/LDHB knockout: comparison of two steady states

      A related point applies to the LDHA/LDHB knockout models. We fully agree that the knockout process necessarily involves a temporal perturbation during cell line generation and adaptation. Nevertheless, the experimental comparison in our study is explicitly between two steady states: the baseline steady state of control cells and the steady state achieved after stable genetic disruption of LDHA or LDHB. The observation that LDHA or LDHB knockout alone had minimal effects on glycolysis and respiration indicates that partial reduction of LDH activity can be compensated in a steady-state manner, consistent with the exceptionally high catalytic capacity of LDH in cancer cells relative to upstream rate-limiting enzymes.

      (3) LDH-activity-dependent quantitative relationships support stable metabolic states

      Importantly, our conclusions do not rely on a single inhibitor condition at a single time point. Rather, we established quantitative steady-state relationships between residual LDH activity and pathway behavior across a wide range of LDH inhibition. These LDH-activity-dependent data strongly support that the system resides in stable metabolic states at different degrees of LDH activity, rather than reflecting non-specific collapse due to prolonged stress.

      Specifically, we observed that when LDH activity was reduced from 100% to approximately 9% (e.g., by genetic perturbation and partial pharmacologic inhibition), glucose consumption and lactate production remained essentially unchanged, indicating maintenance of a steady-state glycolytic flux despite substantial LDH inhibition. Only when LDH activity was reduced further, below this threshold, did glycolytic flux decrease in a graded manner, consistent with a nonlinear control structure.

      Likewise, the isotope tracing results showed distinct LDH-activity-dependent transitions in TCA cycle labeling patterns. Over the range in which LDH activity decreased from 100% to ~9%, the [<sup>13</sup>C<sub>6</sub>]glucose-derived labeling pattern of citrate remained largely unchanged, whereas deeper inhibition led to a decrease in m2 citrate with a compensatory rise in higher-order citrate isotopologues, consistent with altered flux entry versus cycling/retention in the TCA cycle. Similarly, [<sup>13</sup>C<sub>5</sub>]glutamine tracing revealed that deeper LDH inhibition reduced the direct m5 contribution, accompanied by corresponding shifts in other isotopologues. These graded, quantitative transitions—rather than an abrupt global failure—support the interpretation of distinct metabolic steady states across LDH activity levels, linking LDH inhibition to changes in both glycolysis and mitochondrial metabolism.
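      The graded shift from m2 citrate toward higher-order isotopologues can be summarized with the mean <sup>13</sup>C enrichment, Σ(i × m<sub>i</sub>)/n for a metabolite with n carbons. The Python sketch below uses two hypothetical, already natural-abundance-corrected mass isotopomer distributions for citrate (six carbons); the numbers are illustrative and are not taken from Figure 5. It shows that a lower m2 fraction can coexist with a higher overall enrichment when labeling moves into m3 to m6.

```python
def mean_enrichment(mid):
    """Mean 13C enrichment of a metabolite from its mass isotopomer
    distribution mid = [m0, m1, ..., mn], where n is the carbon count."""
    if abs(sum(mid) - 1.0) > 1e-6:
        raise ValueError("MID fractions must sum to 1")
    n_carbons = len(mid) - 1
    return sum(i * frac for i, frac in enumerate(mid)) / n_carbons


def higher_order_fraction(mid, start=3):
    """Summed fraction of isotopologues m{start} and above."""
    return sum(mid[start:])


# Hypothetical citrate MIDs (m0..m6), not measured values.
citrate_control = [0.50, 0.05, 0.35, 0.04, 0.04, 0.01, 0.01]
citrate_inhibited = [0.45, 0.05, 0.28, 0.08, 0.10, 0.02, 0.02]

for label, mid in [("control", citrate_control), ("inhibited", citrate_inhibited)]:
    print(f"{label}: m2 = {mid[2]:.2f}, m3+ = {higher_order_fraction(mid):.2f}, "
          f"mean enrichment = {mean_enrichment(mid):.3f}")
```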

      Reviewer #2 (Recommendations For The Authors):

      All in all, the authors would benefit from collaboration with a group more well-versed in quantitative aspects of metabolism (such as Metabolic Control Analysis) and modelling methods (such as flux analysis) to boost the interpretation and impact of their really nice data set.

      We sincerely thank the reviewer for this insightful and constructive suggestion. We fully agree that collaboration with groups specializing in quantitative metabolic analysis, such as Metabolic Control Analysis and flux modeling, would further expand the interpretative depth and broader impact of this work.

      The primary objective of the present work, however, was not to construct a global mathematical model, but to experimentally dissect the biochemical mechanism by which LDH inhibition coordinately suppresses glycolysis, the TCA cycle, and OXPHOS, integrating enzyme kinetics with thermodynamic constraints at steady state. Within this scope, we focused on experimentally demonstrable relationships between LDH activity, redox balance, GAPDH perturbation, thermodynamic shifts in near-equilibrium reactions, and emergent flux suppression.

      We fully recognize the power of MCA and related modeling approaches in formalizing control coefficients and system-level sensitivities, and we view our dataset as particularly well suited to support such future analyses. We therefore see this work as providing a robust experimental platform upon which more comprehensive quantitative modeling can be built, either in future studies or through collaboration with specialists in metabolic modeling.

      Reviewer #3 (Recommendations For The Authors):

      We sincerely thank the reviewer for the important suggestions.

      (1) I strongly disagree that "regulation of glycolytic flux".. "remained largely unexplored.”

      Our original wording was meant to emphasize not the absence of prior work on glycolytic flux regulation, but rather that the specific biochemical mechanism by which LDH regulates glycolytic flux—particularly through the integrated effects of enzyme kinetics, redox balance, and thermodynamic constraints within the pathway—has not been fully elucidated.

      To avoid any ambiguity or overstatement, we have revised the relevant text to more precisely reflect this intent. The revised wording now reads:

      “This study elucidates a biochemical mechanism by which lactate dehydrogenase influences glycolytic flux in cancer cells, revealing a kinetic–thermodynamic interplay that contributes to metabolic regulation.”

      We believe this revised phrasing more accurately acknowledges prior work while clearly defining the specific mechanistic contribution of the present study.

      (2) Very confusing in the Introduction section: "If LDH is inhibited at the LDH step..”

      We sincerely thank the reviewer for pointing out the potential confusion caused by the phrase “If LDH is inhibited at the LDH step” in the Introduction.

      Our intention was to contrast two conceptual models of LDH inhibition. The first is the conventional view, in which the effect of LDH inhibition is assumed to be confined to the LDH-catalyzed reaction itself, leading primarily to local accumulation of pyruvate and its redirection toward mitochondrial metabolism. The second, which is supported by our data, is that LDH inhibition initiates a system-wide biochemical response, perturbing redox balance, upstream enzyme kinetics, and the thermodynamic state of the glycolytic pathway, ultimately resulting in coordinated suppression of glycolysis, the TCA cycle, and OXPHOS.

      We agree that the original phrasing was ambiguous and potentially misleading. To improve clarity, we have revised the text as follows:

      “If the effect of LDH inhibition were confined solely to its catalytic step…”

      (3) The entire introduction part when the authors attempt to explain how decreased glycolysis will lead to decreased mitochondrial respiration is confusing.

      We would like to clarify that the Introduction does not attempt to explain how decreased glycolysis leads to decreased mitochondrial respiration. Rather, the final paragraph of the Introduction is intended to highlight an unresolved conceptual inconsistency in the existing literature and to motivate the central question addressed in this study.

      Specifically, we summarize the prevailing view that LDH inhibition redirects pyruvate toward mitochondrial metabolism and enhances oxidative phosphorylation, and then point out that this interpretation is difficult to reconcile with quantitative considerations, such as the large disparity between glycolytic and mitochondrial flux capacities and the excess catalytic activity of LDH relative to upstream glycolytic enzymes. These observations are presented to emphasize that the biochemical mechanism linking LDH inhibition to changes in glycolysis and mitochondrial respiration has not been fully resolved.

      Importantly, the Introduction does not propose a mechanistic explanation for the observed suppression of mitochondrial respiration; rather, it poses this as an open question, which is then systematically addressed through experimental analysis in the Results section.

      (4) Line 144: "which is 81(HeLa-LDHAKO) -297(HeLa-Ctrl) times"- here and in many other places wording is confusing to the reader.

      Our intention was to emphasize the significant redundancy of LDH activity relative to hexokinase (HK), the first rate-limiting enzyme in the glycolysis pathway, in cancer cells.

      Specifically, we wanted to express that in HeLa-Ctrl cells, total LDH activity is 297 times HK activity, while in HeLa-LDHAKO cells, although total LDH activity decreased, it was still 81 times HK activity. These data come from Supplementary Table 1 and provide quantitative evidence for why knocking out LDHA or LDHB alone is insufficient to significantly affect glycolytic flux: the remaining LDH activity is still far higher than the HK activity at the pathway entrance and is sufficient to maintain flux.

      Based on your suggestion, we have rewritten it in the revised manuscript as a more specific statement: "...the total activity of LDH in HeLa cells is very high, which is 297-fold higher than the first rate-limiting enzyme HK activity in HeLa-Ctrl cells and 81-fold higher in HeLa-LDHAKO cells.”

      (5) Line 153: "in the following four aspects:"- but what are these aspects, the text below has no corresponding subtitles, etc.

      Our intention was to indicate that, after LDHA or LDHB knockout alone failed to affect the glycolytic rate, we further explored its potential impact on the glycolytic pathway from four perspectives: (i) the flux of glucose carbon to pyruvate and lactate; (ii) the flux of glucose carbon into subsidiary branches of glycolysis; (iii) the concentrations of glycolytic intermediates and the thermodynamic state of the pathway; and (iv) the cytosolic free NADH/NAD<sup>+</sup> redox state.

      Following your valuable suggestion, we have now added the aforementioned clear subtitles to these four aspects in the revised manuscript.

      (6) Lines 193, another example of the very confusing statement: "The results suggested that the loss of total LDH concentration was compensated.."

      The actual catalytic rate of LDH is determined by both its enzyme concentration and the concentrations of its substrates (pyruvate and NADH). When the total LDH protein concentration (enzyme amount) in the cell is reduced through gene knockout, the maximal rate (V<sub>max</sub>) of the reaction decreases. To maintain sufficient lactate production flux to support a high glycolytic rate, the cell compensates by increasing the concentration of one of the substrates, free NADH (as shown in Figure 1I). The higher substrate concentration thus offsets the reduction in enzyme amount and at least partially maintains the overall reaction rate.

      We have revised the original statement to describe this kinetic compensation more accurately: "The decrease in total LDH concentration was counterbalanced by a concomitant increase in the concentration of its substrate, free NADH, thereby maintaining the reaction velocity.”
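      The compensation argument can be illustrated with Michaelis–Menten kinetics, v = V<sub>max</sub>[S]/(K<sub>m</sub> + [S]), solved for the substrate level that restores a target rate. The Python sketch below is a deliberate simplification: it treats LDH as a single-substrate enzyme in NADH with pyruvate held constant, uses a hypothetical K<sub>m</sub> in arbitrary units, and assumes that HK activity is the same in HeLa-Ctrl and HeLa-LDHAKO cells, so that the 297- and 81-fold LDH/HK ratios quoted in point (4) can serve as relative V<sub>max</sub> values with the target flux set by HK (1 unit).

```python
def required_substrate(v_target, vmax, km):
    """Substrate concentration giving rate v_target when
    v = vmax * S / (km + S).

    Returns None if v_target >= vmax: no finite substrate level can reach it.
    """
    if v_target >= vmax:
        return None
    return km * v_target / (vmax - v_target)


def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


KM = 1.0        # hypothetical Km for NADH, arbitrary units
V_TARGET = 1.0  # flux set by HK (relative units)

s_ctrl = required_substrate(V_TARGET, vmax=297.0, km=KM)
s_ko = required_substrate(V_TARGET, vmax=81.0, km=KM)
print(f"relative substrate needed: Ctrl {s_ctrl:.4f}, LDHA-KO {s_ko:.4f} "
      f"({s_ko / s_ctrl:.1f}-fold higher)")
```

In this toy model the required substrate level rises steeply as V<sub>max</sub> approaches the target flux, and no finite value suffices once V<sub>max</sub> drops below it; the model does not include the redox feedback on GAPDH discussed elsewhere in this response.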

      (7) Line 222-223: "did not or marginally significantly affect....”

      Our intention was to reflect the complexity of the data in Figure 1. “Did not affect” means that most key parameters, such as glycolytic flux (glucose consumption and lactate production rates), showed no statistically significant differences. “Marginally significantly affected” means that for a few indicators the differences reached statistical significance (P < 0.05), but their absolute magnitude was very small and of limited biological significance.

      To clarify this, we have rewritten the sentence as: “...did not significantly affect the entry of glucose-derived pyruvate into the TCA cycle or mitochondrial respiration, although statistically significant but minimal changes were observed in a few specific parameters (e.g., m+3 pyruvate % in the medium).”

      (8) It is very confusing to use the same colors for three GNE-140 drug concentrations (Figure 2a-b) and for 3 different cell lines right next to each other (Figure 2c-d).

      The figures have been revised accordingly.

      (9) Lines 263–273: nothing is new here, as oxidized NAD<sup>+</sup> is required to run glycolysis and LDH inhibition/KO leads to a high NADH/NAD<sup>+</sup> ratio. Also, below, it is well known that reductive stress blocks serine biosynthesis.

      It is well established that oxidized NAD<sup>+</sup> is required for glycolysis, that LDH inhibition or knockout increases the NADH/NAD<sup>+</sup> ratio, and that reductive stress can suppress serine biosynthesis. We did not intend to present these observations as novel.

      The key point of this section is not the qualitative requirement of NAD<sup>+</sup> for GAPDH, but rather the mechanistic link between LDH inhibition, changes in free NAD<sup>+</sup> availability, and the emergence of GAPDH as a flux-controlling step within the glycolytic pathway under steady-state conditions. Previous studies have largely treated the increase in NADH/NAD<sup>+</sup> following LDH inhibition as a correlative or downstream effect, without directly demonstrating how this redox shift quantitatively propagates upstream to reorganize glycolytic flux distribution and thermodynamic driving forces.

      In our study, we explicitly link LDH inhibition to (i) an increase in free NADH/NAD<sup>+</sup> ratio, (ii) inhibition of GAPDH activity in intact cells, (iii) accumulation of upstream glycolytic intermediates, (iv) suppression of serine biosynthesis from 3-phosphoglycerate, and critically, (v) coordinated shifts in the Gibbs free energies of reactions between PFK1 and PGAM. This integrated kinetic–thermodynamic framework goes beyond the established qualitative understanding of NAD<sup>+</sup> dependence and provides a pathway-level mechanism by which LDH activity controls glycolytic flux.
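      For readers less familiar with the thermodynamic part of this argument, the standard relation ΔG' = ΔG'° + RT ln Q links a reaction's driving force to metabolite concentrations. The sketch below applies it to the GAPDH step (glyceraldehyde 3-phosphate + NAD<sup>+</sup> + P<sub>i</sub> ⇌ 1,3-bisphosphoglycerate + NADH) under stated assumptions: all stoichiometric coefficients are 1, H<sup>+</sup> is folded into the transformed standard Gibbs energy, concentrations stand in for activities, the concentrations are hypothetical, and ΔG'° is a placeholder rather than a literature value. It shows only the general point that, with other metabolite levels fixed, a k-fold rise in the free NADH/NAD<sup>+</sup> ratio makes ΔG' less negative by RT ln k.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def reaction_gibbs(dg0_prime, products, substrates, temp_k=310.15):
    """Return dG' = dG'0 + R*T*ln(Q) in kJ/mol.

    products / substrates: concentrations (M) of each species, each with a
    stoichiometric coefficient of 1. Concentrations approximate activities.
    """
    q = math.prod(products) / math.prod(substrates)
    return dg0_prime + R_KJ * temp_k * math.log(q)


# Hypothetical concentrations (M) for the GAPDH reaction:
#   GAP + NAD+ + Pi <=> 1,3-BPG + NADH   (H+ folded into dG'0)
gap, nad, pi, bpg = 50e-6, 500e-6, 2e-3, 1e-6
nadh_low, nadh_high = 1e-6, 4e-6  # a four-fold rise in free NADH

dg0 = 0.0  # placeholder; only the difference between the two cases is meaningful
dg_low = reaction_gibbs(dg0, [bpg, nadh_low], [gap, nad, pi])
dg_high = reaction_gibbs(dg0, [bpg, nadh_high], [gap, nad, pi])

shift = dg_high - dg_low  # equals R*T*ln(4), about +3.57 kJ/mol at 310.15 K
print(f"dG' shift from a 4-fold NADH rise: {shift:+.2f} kJ/mol")
```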

      (10) Lines 368–370: “...we reached an alternative interpretation of the data...” does not inspire much confidence.

      Our intention was to emphasize, cautiously, that our detailed data support an interpretation that differs from the conventional view. This interpretation rests on consistent evidence from dual isotope tracing experiments using [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine. In the [<sup>13</sup>C<sub>6</sub>]glucose tracing data, the labeling pattern of citrate, the first intermediate formed in the TCA cycle, showed a significant decrease in m+2 %. This directly reflects a reduced flux of newly generated, glucose-derived acetyl-CoA into the TCA cycle. At the same time, the summed percentage of the other isotopologues (m+1, m+3, m+4, m+5, m+6) increased, indicating longer retention of labeled carbon within the cycle and thus a concurrent decrease in the efflux of cycle intermediates for biosynthesis. In the [<sup>13</sup>C<sub>5</sub>]glutamine tracing data, the labeling pattern of α-ketoglutarate showed a decrease in m+5 %, indicating reduced glutamine-derived replenishment flux. The change in the summed percentage of the other isotopologues (m+1, m+2, m+3, m+4) also supports the conclusion that intermediate efflux is reduced.

      These two data sets corroborate each other and point to a unified conclusion: LDH inhibition not only reduces carbon inflow into the TCA cycle but also decreases the efflux of intermediates, lowering overall cycle activity. Our "alternative interpretation" is therefore well supported and more consistent with our overall experimental results. We have revised the original wording to: "Integrated analysis of dual isotope tracing data demonstrates that LDH inhibition reduces both influx and efflux of the TCA cycle..."
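      To make the reading of these labeling patterns concrete, the sketch below converts raw isotopologue abundances into the kind of percentages discussed above (m+2 % of citrate and the summed percentage of its other labeled isotopologues). It is a simplified illustration under stated assumptions: the peak areas are hypothetical, and no correction for natural <sup>13</sup>C abundance is applied, which a real analysis would normally include.

```python
from typing import List, Sequence, Tuple


def isotopologue_percent(abundances: Sequence[float]) -> List[float]:
    """Convert abundances of m+0, m+1, ..., m+n into percent of the total pool.

    Note: no natural-abundance correction is applied in this sketch.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return [100.0 * a / total for a in abundances]


def citrate_summary(abundances: Sequence[float]) -> Tuple[float, float]:
    """Return (m+2 %, summed % of m+1 and m+3..m+6) for a 6-carbon citrate pool."""
    if len(abundances) != 7:
        raise ValueError("citrate has 6 carbons, so expect m+0..m+6")
    pct = isotopologue_percent(abundances)
    m2 = pct[2]
    # Every labeled isotopologue except m+2 (m+0 is unlabeled).
    other_labeled = sum(p for i, p in enumerate(pct) if i not in (0, 2))
    return m2, other_labeled


# Hypothetical peak areas for m+0..m+6 (arbitrary units).
areas = [50.0, 5.0, 30.0, 5.0, 5.0, 3.0, 2.0]
m2_pct, other_pct = citrate_summary(areas)
print(f"m+2 = {m2_pct:.1f}%, other labeled = {other_pct:.1f}%")
```

      Under the interpretation given above, a lower m+2 % together with a higher summed percentage of the other labeled isotopologues would be read as reduced entry of glucose-derived acetyl-CoA combined with longer retention of labeled carbon in the cycle.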

      (11) Lines 418-421: This entire discussion on how TCA cycle activity is decreased upon LDH inhibition is very confusing. I also would like to see these tracer studies when ETC is inhibited with different inhibitors.

      We would like to clarify that the mitochondrial respiration data presented in Figure 5W were obtained using different ETC inhibitors, and that the cell treatment conditions (including culture time) for these oxygen consumption measurements matched those of the [<sup>13</sup>C<sub>6</sub>]glucose and [<sup>13</sup>C<sub>5</sub>]glutamine isotope tracing experiments (Figure 5A–V). Therefore, the changes in TCA cycle flux revealed by the tracing data and the inhibition of OXPHOS shown by the respiration measurements provide mutually corroborating evidence obtained under the same experimental conditions.

      (12) Figure 6F, G - very limited representation of growth curves, why not perform these experiments with all corresponding cell lines and over multiple days. Especially since proliferation arrest vs cell death was implicated.

      We have provided the growth curves of the HeLa-Ctrl and HeLa-LDHAKO cell lines under the corresponding treatments in Figure 6—figure supplement 1, as a supplement to Figure 6F, G (HeLa-LDHBKO cells). The 48-hour cutoff was chosen on clear biological grounds: under hypoxia (1% O<sub>2</sub>) combined with GNE-140 treatment, HeLa-LDHBKO cells underwent substantial cell death within 24 to 48 hours, by which point the differences between the growth curves were already highly significant.

      (13) Move most of the Supplementary tables into an Excel file - so values can be easily accessed.

      We have compiled the tables into an Excel file and submitted it along with the revised manuscript as supplementary material.

      (14) Consider changing colors to more appealing- especially jarring is a bright blue, red, black combination on many bar graphs.

      We have adjusted the color scheme of the figures (especially the bar graphs) in the paper, and have submitted them with the revised manuscript.

      (15) Double check y-axis on multiple graphs it says "mM".

      We have checked the y-axes; the unit (mM) is correct.

      (16) Instead TCA cycle use the TCA cycle.

      In the revised manuscript, we now use “the TCA cycle”.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34<sup>+</sup>Sca-1<sup>+</sup> dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns or comments.

      We sincerely thank the reviewer for the positive evaluation of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The Authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) of different colors into the portal vein, heart left ventricle, inferior vena cava and the extrahepatic bile duct, respectively. The method involves: trans-cardiac perfusion with 4% PFA, the injection of MCNPs with different colors, clearing with the modified CUBIC method, cutting 200 micrometer thick slices by vibratome, and then microscopic imaging. The Authors also perform various immunostaining (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the Authors report dense and very fine vascular branches along the portal vein. The authors name them as 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLCs proliferate in liver fibrosis (CCl4 model) and act as a scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.

      Strengths:

      The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.

      Weaknesses:

      This reviewer has some concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results, and suggests that the conclusions of the paper may be critically viewed. Namely, at this point, it is still not fully clear that the 'periportal lamellar complex (PLC)' that the Authors describe really exists as a distinct anatomical or functional unit or these are fine portal branches that connect the larger portal veins into the adjacent sinusoid. Also, in my opinion, to identify the molecular characteristics of such small and spatially highly organized structures like those fine radial portal branches, the only way is to perform high-resolution spatial transcriptomics (instead of data mining in existing liver single cell database and performing Venn diagram intersection analysis in hepatic endothelial subpopulations). Yet, the existence of such structures with a distinct molecular profile cannot be excluded. Further research with advanced imaging and omics techniques (such as high resolution volume imaging, and spatial transcriptomics/proteomics) are needed to reproduce these initial findings.

      We thank the reviewer for the thoughtful and constructive comments. In response to the reviewer’s concerns regarding the anatomical and molecular definition of the periportal lamellar complex (PLC), we have further clarified the scope and methodological boundaries of the present study in the revised manuscript.

      Regarding the key question raised by the reviewer—namely, whether the PLC represents an independent anatomical or functional unit, or merely small portal venous branches connecting larger portal veins to adjacent sinusoids—we provide below a more detailed explanation of the criteria used to define the PLC in this study. The identification of the PLC is primarily based on periportal structures that can be reproducibly recognized by three-dimensional imaging across multiple mice, exhibiting a relatively consistent spatial distribution within the periportal region. The PLC could be stably observed across different MCNP dye color assignments and independent experimental batches. In addition, three-dimensional CD31 immunofluorescence consistently revealed vascular-associated signal distributions in the same periportal region, indirectly supporting its spatial association with the periportal vascular system.

      At the morphological level, the PLC appears as a periportal vasculature-associated structure distributed around the main portal vein trunk and maintains a relatively consistent spatial proximity to portal veins, bile ducts, and neural components in three-dimensional space. This highly conserved spatial organization across multiple tissue systems supports the anatomical positioning of the PLC as a relatively distinct structural tissue unit within the periportal region.

      The present study primarily focuses on a descriptive characterization of the three-dimensional anatomical organization and spatial relationships of the PLC based on volumetric imaging and vascular labeling strategies. As a complementary exploratory analysis, we reanalyzed endothelial cell populations potentially associated with the PLC using existing liver single-cell transcriptomic datasets. This analysis was intended to provide molecular-level information consistent with the structural observations and to offer preliminary clues to its potential biological functions, rather than to independently define the PLC at the spatial level or to functionally validate it.

      We fully acknowledge the value of spatial transcriptomic and spatial proteomic technologies in revealing molecular heterogeneity within tissue architecture. However, under current technical conditions, these approaches are largely dependent on thin tissue sections and are limited by spatial resolution and signal mixing effects, which still pose challenges for resolving periportal structures with pronounced three-dimensional continuity, such as the PLC. In the future, further integration of high-resolution volumetric imaging with spatial omics technologies may enable a more refined understanding of the molecular features and potential functions of the PLC at higher spatial resolution.

      Reviewer #3 (Public review):

      Summary:

      In the revised version of the manuscript, the authors addressed multiple comments, clarifying especially the methodological part of their work and the identification of the PLC as a novel morphological feature of adult liver portal veins. The text is now also much clearer and has better flow.

      The additional assessment of the Smart-seq2 data from Pietilä et al., 2025 strengthens the transcriptomic profiling of the CD34+Sca1+ cells and the discussion of the possible implications for liver homeostasis and the injury response. While it may suffer from a bias similar to that of other scRNA-seq datasets (multiple cell-fate signatures arising from mRNA contamination from proximal cells during dissociation), it is less likely that such contamination would yield such similar results.

      Nevertheless, a more thorough assessment by functional experimental approaches is needed to decipher the functional molecules and definite protein markers before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems.

      The work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of the Elife readers.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture at unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell subpopulation for PLC formation and function was not tested and warrants further validation.

      We thank the reviewer for the careful and constructive comments regarding the functional validation of cell populations associated with the PLC. The central aim of this study is to establish and validate a novel volumetric imaging and vascular labeling strategy and to apply it to the periportal region of the liver, thereby revealing previously underappreciated structural organizational patterns at the three-dimensional level, rather than to perform a systematic functional validation of specific cellular subpopulations.

      We agree that the precise roles of the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cell subpopulation in the formation and function of the periportal lamellar complex (PLC) have not been directly addressed through functional intervention experiments in the present study. Our conclusions are primarily based on three-dimensional imaging and spatial distribution analyses, which reveal a stable and consistent spatial association between this cell population and the PLC structure, but are not intended to independently support causal or functional inferences. The underlying functional mechanisms remain to be elucidated in future studies using genetic or functional perturbation approaches.

      In light of these considerations, we have further refined the relevant statements in the revised manuscript to more clearly define the functional scope and limitations of the current study in the Discussion section, and to avoid functional interpretations that extend beyond the direct support of the data. At the same time, we consider functional validation of the PLC to be an important and promising direction for future investigation.

      It should be emphasized that the present study is not primarily designed to provide direct functional validation, but rather to systematically characterize the three-dimensional structural features of the periportal lamellar complex (PLC) and its cellular associations using volumetric imaging and vascular labeling approaches. At this stage, we mainly provide spatial and histological evidence for the organizational relationship between the PLC structure and the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cell population, while their specific roles in PLC formation and functional regulation await further investigation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I highly appreciate the Authors' efforts to improve the manuscript. Below I list those points (from my original review) on which I still have further comments.

      (2) I would suggest this sentence:

      "...the liver has evolved a highly complex and densely organized ductal vascular-neuronal network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7]."

      We thank the reviewer for the valuable suggestion. We have revised the relevant sentence accordingly, and the revised wording is as follows:

      “The liver has evolved a highly complex and densely organized vascular–biliary–neural network, primarily composed of the portal venous system, central venous system, hepatic arterial system, biliary system, and the intrahepatic autonomic neural network.”

      (3) I suggest renaming 'clearing efficiency' to 'clearing time', and revise the last sentence like:

      '...The results showed that the average transmittance increased by 20.12% in 1mm-thick cleared tissue slices.'

      We thank the reviewer for this helpful suggestion. Accordingly, we have replaced the term “clearing efficiency” with “clearing time” and revised the final sentence to reflect this change. The revised wording is as follows:

      “The results showed that the average transmittance increased by 20.12% in cleared tissue slices with a thickness of 1 mm.”

      (4) While the dye perfusion was indeed on full lobe, FigS1F also seems to be rather a thick section instead of a full 3d reconstruction. This is OK, but please, be clear and specific about this in the respective part of the ms.

      We thank the reviewer for the careful review and detailed comments. We would like to clarify that Fig. S1F shows whole-lobe imaging of the mouse left liver lobe obtained after dye perfusion at the whole-liver scale, rather than an image derived from a thick tissue section. Although this image does not represent a three-dimensional reconstruction, it does reflect imaging of the entire left liver lobe at the macroscopic level.

      In addition, for the reviewer’s reference, we have provided in this response a representative image of a 200 μm-thick liver tissue section to directly illustrate the morphological differences between thick-section imaging and whole-lobe imaging. We note that the third and fourth panels in Fig. 1G of the main text already show local imaging results from 200 μm-thick sections; in contrast, the comparative image provided here presents a larger field of view and overall morphology. To avoid redundancy, this additional image is included solely for clarification in the present response and has not been incorporated into the revised manuscript or the supplementary materials.

      (11) Regarding the 'transmission quantification':

      'Regarding the comparative quantification of different clearing methods, as the reviewer noted, nearly all aqueous or organic solvent based clearing techniques can achieve relatively uniform transparency in 1 mm thick tissue sections, so differences at this thickness are limited.'

      So, based on all these, I think, measuring/comparisons of clearing efficacy in the present form are kind of pointless --- one may consider omitting this part.

      We thank the reviewer for the valuable comments. The purpose of the transmittance quantification in this study was not to provide a comprehensive comparison among different tissue-clearing methods, but rather to serve as a quantitative reference supporting the optimization of the Liver-CUBIC protocol. Accordingly, we have narrowed and clarified the relevant statements in the revised manuscript to define their scope and avoid overinterpretation.

      The revised text now reads as follows:

      “Importantly, Liver-CUBIC treatment did not induce significant tissue expansion (Figure 1B–D). In addition, quantitative transmittance measurements in 1-mm-thick cleared tissue slices showed an average increase of 20.12% (P < 0.0001; 95% CI: 19.14–21.09; Figure 1E).”

      Author response image 1.

      (16) It is OK, but please, indicate this clearly in the Methods/Results because in its present form it may be confusing for the reader: which color means what.

      We thank the reviewer for this helpful request for clarification. We agree that the previous wording may have caused confusion regarding the meaning of different MCNP colors. Accordingly, we have revised the Methods section and the relevant figure legends to clearly state that the color assignment of MCNP dyes is not fixed across different experiments or figures. The use of different colors serves solely for visualization and presentation purposes, facilitating the distinction of anatomical structures in multichannel and three-dimensional imaging, and does not indicate any fixed or intrinsic correspondence between a specific color and a particular vascular or ductal system. We believe that this clarification will help prevent misinterpretation and improve the overall clarity of the manuscript.

      (17) Still I think the hepatic artery is extremely shrunk, while the portal vein is extremely dilated. Please, note that in the referring figure (from Adori et al), hepatic artery and portal vein are ca 50 micrometers and 250 micrometers in diameter, respectively. In your figure, as I see, ca. 9-10 micrometers and 125 micrometers, respectively. This means 5x (Adori) vs. 13-14x differences (you). I would not say that this is necessarily problematic --- but may reflect some perfusion issues that may be good to consider.

      We thank the reviewer for the careful comparison and acknowledge the quantitative differences pointed out. Compared with the study by Adori et al., the diameter ratio between the hepatic artery and the portal vein in our images does indeed differ to some extent. We believe that this discrepancy primarily arises from methodological differences in imaging and analysis strategies between the two studies.

      In the work by Adori et al., periportal vasculature identification and three-dimensional segmentation were mainly based on 488 nm autofluorescence signals acquired from inverted tissues. This signal predominantly reflects the overall outline of periportal tissue regions rather than direct imaging of the vascular lumen itself. Consequently, the measured “vessel diameter” largely represents a spatial domain delineated by surrounding periportal structures, and does not necessarily correspond to the actual or functional luminal diameter of the vessel.

      In contrast, the present study employed fluorescent MCNP dye perfusion under low perfusion pressure, combined with tissue clearing and three-dimensional optical imaging. Under these experimental conditions, the measured vessel diameters more closely reflect the perfusable luminal space of vessels in a fixed state, rather than their maximally dilated diameter, and are not defined by the morphology of surrounding tissues. This distinction is particularly relevant for the hepatic artery: as a high-resistance, smooth muscle–rich vessel, its diameter is highly sensitive to perfusion pressure and post-excision changes in vascular tone. In comparison, the portal vein exhibits greater compliance and is relatively less affected by these factors.

      Based on these methodological differences, the observation of relatively smaller apparent hepatic arterial diameters, and consequently a higher portal vein-to-hepatic artery diameter ratio, under dye perfusion-based optical imaging conditions is an expected outcome. Importantly, the primary focus of the present study is the identification and characterization of the periportal lamellar complex (PLC) as a three-dimensional lamellar tissue structure that can be stably and reproducibly recognized across different samples and imaging conditions, rather than absolute comparisons of vascular diameters.

      (21) After the presented documentation, I still have some concerns that the 'periportal lamellar complex (PLC)' that the Authors describe is really a distinct anatomical or functional unit. The confocal panel in Fig. 4F is nice and high quality. However, as far as I see, it shows that CD34+/Sca-1+ immunostaining is not specific for the presumptive PLCs in the peri-portal region. Instead, Sca-1 immunoreactivity is highly abundant also in the midzone --- to which the supposed PLCs do not extend, according to the cartoon shown in panel D, same figure. Notably, this questions also the specificity of the single cell analysis.

      We thank the reviewer for this detailed and important comment regarding the specificity of CD34<sup>+</sup>/Sca-1<sup>+</sup> markers and the definition of the periportal lamellar complex (PLC).

      It should be emphasized that the PLC is not defined on the basis of any single molecular marker, but rather by a reproducible periportal lamellar anatomical structure consistently revealed by three-dimensional imaging across multiple samples. The co-expression of CD34 and Sca-1 is interpreted within this clearly defined anatomical context and is used to characterize the molecular features of endothelial cells associated with the PLC structure.

      As shown in Fig. 4F, the co-expression of CD34 and Sca-1 delineates a continuous, lamellar endothelial structure surrounding the portal vein. In contrast, outside the periportal region—including the midlobular areas—Sca-1 or CD34 expression can also be detected, but these signals appear scattered and discontinuous, lacking an organized lamellar topology.

      In the single-cell transcriptomic analysis, we treated CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial cells as an operational population to explore molecular features that may be enriched in the microenvironment of the periportal lamellar complex (PLC). Importantly, this analysis was intended to provide molecular clues associated with the PLC, rather than to precisely assign spatial locations or identities to individual cells.

      Occasional isolated Sca-1<sup>+</sup> signals detected outside the periportal region do not affect the anatomical definition of the PLC, nor do they alter the interpretation of the single-cell analysis. These analyses serve to provide supportive and exploratory molecular information for the structural identification of the PLC, rather than constituting decisive spatial evidence.

      (23) '....In the manuscript, we have carefully stated that this analysis is exploratory in nature and have avoided overinterpretation. In future studies, high-resolution spatial omics approaches will be invaluable for more precisely delineating the molecular characteristics of these fine structures.'

      I do not find these statements either in the Discussion or in the Results. I must reiterate my opinion that the applied methodical approach in the single cell transcriptomics part has severe limitations, and the readers must be aware of this.

      We thank the reviewer for this further comment. We understand and acknowledge the reviewer’s concerns regarding the methodological limitations of single-cell transcriptomic analyses, and we agree that these limitations should be clearly communicated to readers in the main text.

      We acknowledge that in the previous version of the manuscript, the exploratory nature of the single-cell transcriptomic analysis and its methodological boundaries were discussed only in the response to reviewers and were not explicitly stated in the manuscript itself. We thank the reviewer for pointing out this omission. In the revised manuscript, we have now added explicit clarifications in the main text to prevent potential overinterpretation of these results.

      In the present study, our primary effort is focused on the descriptive characterization of the three-dimensional anatomical organization and spatial relationships of the PLC using volumetric imaging and vascular labeling strategies. As a complementary exploratory analysis, we reanalyzed existing liver single-cell transcriptomic datasets to examine endothelial cell populations exhibiting PLC-associated features, and performed differential gene expression and Gene Ontology enrichment analyses. Importantly, these results are intended to provide molecular-level support for the structural identification of the PLC and to offer preliminary insights into its potential biological functions. Accordingly, we have narrowed the presentation and interpretation of the single-cell analysis in both the Results and Discussion sections of the revised manuscript.

      In addition, we have expanded the Discussion to address the limitations of current spatial transcriptomic approaches in validating a continuous three-dimensional structure such as the PLC. Most existing spatial transcriptomic methods rely on two-dimensional tissue sections of 8–10 μm thickness, whereas identification of the PLC depends on three-dimensional imaging of tissue volumes with thicknesses of ≥200 μm, making reliable reconstruction of its spatial continuity from single sections challenging. Furthermore, because each spatial transcriptomic capture spot often encompasses multiple adjacent cells, signal mixing effects further limit precise resolution of specific periportal microstructures.

      Overall, we agree with the reviewer’s central point that the limitations of single-cell transcriptomic analyses should be clearly understood by readers. By explicitly clarifying the methodological boundaries and refining the related statements in the main text, we believe this concern has now been adequately addressed in the revised manuscript. We thank the reviewer for identifying this omission, which has helped to improve the rigor and clarity of the study.

      Reviewer #3 (Recommendations for the authors):

      (1) While these are interesting observations suitable for discussion, the following sections are speculative, given that no functional characterization of PLC importance has been performed yet. This is most apparent in the comments on a role in hematopoiesis, which transiently takes place in the liver during embryogenesis (Khan et al., 2016) but ceases after ligation of the umbilical inlet. Adult liver hematopoiesis remains controversial, and more solid evidence would need to be presented to support its existence in PLC regions.

      265 - These findings suggest that the Periportal Lamellar Complex (PLC) is not only a morphologically and spatially distinct, low-permeability vascular unit surrounding the portal vein, but also likely serves as a critical nexus connecting the portal vein, hepatic artery, and liver sinusoids. Thus, the PLC constitutes a key node within the interactive vascular network of the mouse liver.

      We thank the reviewer for the comments and suggestions regarding the potential functional interpretation of the periportal lamellar complex (PLC), particularly its possible association with hematopoietic function. We would like to clarify that the statement at line 265 was intended solely to describe the structural characteristics and spatial organization of the PLC within the periportal vascular network. Specifically, the original wording aimed to summarize the morphological features of the PLC and its spatial relationships with the portal vein, hepatic artery, and hepatic sinusoids.

      Nevertheless, to minimize potential misunderstanding, we have revised this section to avoid unnecessary functional implications. The revised text now reads:

      “These results suggest that the periportal lamellar complex (PLC) is a morphologically and spatially distinct vascular structure that surrounds the portal vein and may serve as a key organizational node coordinating the spatial relationships among the portal vein, hepatic artery, and hepatic sinusoids. Accordingly, the PLC represents an important structural element within the interactive vascular network of the mouse liver.”

      This revision preserves the structural significance of the PLC while avoiding overinterpretation of its functional roles.

      (2) The same is true for this section, following Figure 3: no functional experiment tested this. For example, diphtheria toxin could be expressed in the CD34+Sca1+ population, or at least a careful mapping of the developing liver could be performed, which would indicate whether the PLC precedes or follows bile duct (BD) development.

      356 as a spatial positional cue guiding bile duct growth and branching but also as a regulatory node involved in coordinating bile drainage from the hepatic lobule into the biliary network.

      To avoid potential misunderstanding, we have further refined and revised the statements in the manuscript regarding the functional interpretation of the periportal lamellar complex (PLC) and its relationship to bile duct development. We agree that cell ablation strategies are of great importance for functional validation studies. However, it should be noted that CD34 and Sca-1 are relatively broadly expressed markers during liver development, labeling multiple endothelial, mesenchymal, and progenitor cell populations, and their expression is not restricted to the PLC. Owing to this broad expression pattern, ablation of CD34<sup>+</sup>Sca-1<sup>+</sup> cell populations would likely exert widespread effects on vascular and stromal structures, thereby complicating the distinction between direct PLC-specific effects and secondary developmental alterations. As such, this strategy may present technical limitations for specifically dissecting the role of the PLC in bile duct development. At the same time, given that the primary objective of this study is the systematic characterization of the three-dimensional anatomical features and spatial organization of the PLC, we have correspondingly revised the manuscript to restrict statements regarding the relationship between the PLC and bile ducts to spatial associations supported by the current data. Specifically, our results show that primary bile ducts run along the main portal vein trunk, secondary bile ducts exhibit directed branching toward the PLC region, and terminal bile duct branches tend to spatially cluster in the vicinity of the PLC, thereby forming a reproducible periportal spatial arrangement. Based on these observations, the PLC delineates a relatively conserved anatomical microenvironment within the portal region, whose spatial position is closely associated with the organization and terminal distribution of the intrahepatic bile duct network.

      We believe that these revisions more accurately reflect the experimental evidence and the defined scope of the present study.

      (3) The following statement ought to be rephrased or skipped, considering that CD34 and Sca1 (Ly6a) are markers of periportal endothelial cells (Pietilä et al., 2025, Gómez-Salinero et al., 2022) and as shown by the authors in their own Fig. 6D. In this context and the context of the CCL4 experiments, a "simple" proliferative progenitor portal vein endothelial cell phenotype, suggested also by the presence of DLL4 (Fig5A) and JAG1 (Pietilä et al., 2025) (Benedito et al., 2009) ought to be considered.

      409 Notably, CD34 and Sca-1 (Ly6a) were co-expressed exclusively within PLC structures surrounding the portal vein, but absent from central vein ECs and midzonal LSECs (Figure 4F).

      We thank the reviewer for pointing out the potential imprecision in this wording. We agree that both CD34 and Sca-1 (Ly6a) are well-established markers of periportal endothelial cells, as previously reported (Pietilä et al., 2025; Gómez-Salinero et al., 2022), and as also illustrated in Fig. 4F of our study.

      Accordingly, the original statement suggesting that CD34 and Sca-1 are co-expressed exclusively within the PLC structure may indeed represent an overinterpretation. Following the reviewer’s suggestion, we have revised the relevant text at line 409 by removing the exclusive phrasing (“exclusively within”) and by emphasizing instead that CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cells are enriched in periportal regions associated with the PLC, rather than being specific to or confined within the PLC.

      In addition, in the context of the CCl<sub>4</sub>-induced liver fibrosis model, we agree with the reviewer that the observed expression of DLL4 and JAG1 under fibrotic conditions is more appropriately interpreted as reflecting an activated or proliferative periportal endothelial progenitor–like phenotype, rather than defining a novel endothelial lineage. The corresponding statements in the revised manuscript have been adjusted accordingly.

      (4) Again, these concluding sentences are based on correlative evidence of mRNA expression and literature but not experimental evidence.

      436 These findings suggest that this unique endothelial cell subset in the periportal region may possess dual regulatory functions in both metabolic and hematopoietic modulation

      441 results suggest that PLC endothelial cells may not only regulate periportal microcirculatory blood flow but also help establish a specialized microenvironment that potentially supports periportal hematopoietic regulation, contributing to stem cell recruitment, vascular homeostasis, and tissue repair.

      We thank the reviewer for this thoughtful comment. We agree that these statements are primarily based on transcriptomic correlation analyses and support from previous literature, rather than direct functional experimental evidence.

      Accordingly, in the revised manuscript, we have appropriately toned down and adjusted the relevant concluding statements to more accurately reflect their inferential nature. The revised wording emphasizes associations and potential involvement, rather than definitive functional roles. These changes preserve the overall scientific interpretation while aligning the level of inference more closely with the available evidence.

      The revised text now reads:

      “Finally, we found that the main trunk of the PLC is primarily composed of CD34<sup>+</sup>Sca-1<sup>+</sup>CD31<sup>+</sup> endothelial cells (Fig. 4J). These CD34<sup>+</sup>Sca-1<sup>+</sup> double-positive cells are mainly distributed in the basal region of the PLC structure and exhibit molecular features associated with hematopoiesis. Taken together, these results suggest that PLC endothelial cells may contribute to the establishment of a local microenvironment related to periportal hematopoietic regulation and may play potential roles in stem cell recruitment and maintenance of vascular homeostasis.”

      (5) The following part is speculative: it is based on re-analysis of a dataset gathered after 6 more weeks of CCl4 treatment (12 weeks; Su et al., 2021) than in the linked experiments of the manuscript, and should be moved to the Discussion or removed.

      504 Moreover, single-cell transcriptomic re-analysis revealed significant upregulation of bile duct-related genes in the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelium of PLC in fibrotic liver, with notably high expression of Lgals1 (Galectin-1) and Hgf (Figure 5G). Previous studies have shown that Galectin-1 is absent in normal liver parenchyma but highly expressed in intrahepatic cholangiocarcinoma (ICC), correlating with tumor dedifferentiation and invasion (Bacigalupo, Manzi, Rabinovich, & Troncoso, 2013; Shimonishi et al., 2001). Additionally, hepatocyte growth factor (HGF), particularly in combination with epidermal growth factor (EGF) in 3D cultures, promotes hepatic progenitor cells to form bile duct-polarized cystic structures (N. Tanimizu, Miyajima, & Mostov, 2007). Together, these findings suggest the PLC endothelium may act as a key regulator of bile duct branching and fibrotic microenvironment remodeling in liver fibrosis.

      Collectively, our results demonstrate that the PLC, situated between the portal vein and periportal sinusoidal endothelium, constitutes a critical vascular microenvironmental unit. It may not only colocalize with bile duct branches under normal physiological conditions, but also through its basal CD34<sup>+</sup>Sca-1<sup>+</sup> double-positive endothelial cells, potentially orchestrate bile duct epithelial proliferation, branching morphogenesis, and bile acid transport homeostasis via multiple signaling pathways. Particularly during liver fibrosis progression, the PLC exhibits dynamic structural extension, serving as a spatial scaffold facilitating terminal bile duct migration and expansion into the hepatic parenchyma (Figure 5H). These findings highlight the PLC endothelial cell population and the vascular-bile duct interface as key regulatory hubs in bile duct regeneration, tissue repair, and pathological remodeling, providing novel cellular and molecular insights for understanding bile duct-related diseases such as ductular reaction, cholangiocarcinoma, and cholestatic disorders, and offering potential targets for therapeutic intervention.

      We thank the reviewer for this careful and thought-provoking comment. We understand and agree with the reviewer’s assessment that this section involves a degree of inference, as the analysis is based on a re-analysis of a previously published single-cell transcriptomic dataset from a CCl<sub>4</sub>-induced liver fibrosis model (Su et al., 2021), rather than on experimental data directly generated in the present study.

      In response to the reviewer’s suggestion, we have carefully re-examined and revised the relevant paragraphs. Without altering the overall structure of the manuscript, we have appropriately moderated the wording to clarify that these results primarily describe the transcriptional features of PLC-associated CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial cells under fibrotic conditions, and their associations with bile duct–related gene expression, rather than providing direct functional evidence for their roles in bile duct branching or microenvironmental remodeling.

      In addition, we have explicitly clarified in the main text the data source and methodological limitations of the single-cell transcriptomic analysis, and emphasized that these findings should be interpreted in conjunction with the spatial information revealed by three-dimensional imaging. Through these revisions, we aim to retain the value of this analysis in providing complementary molecular insight into PLC characteristics, while avoiding potential over-interpretation of its functional implications.

      Formal suggestions:

      (6) The following sentence would benefit from being more clearly written.

      263 - The formation of PLC structures in the adventitial layer may participate in local blood flow regulation, maintenance of microenvironmental homeostasis.

      We thank the reviewer for this helpful suggestion. The sentence has been revised to improve clarity by correcting the parallel structure and refining the wording.

      The formation of PLC structures in the adventitial layer may participate in local blood flow regulation and the maintenance of microenvironmental homeostasis.

      (7) The following sentence is misleading as it implies cell sorting, and "subsetted" rather than "sorted" should be used.

      414 Based on this, we sorted CD34<sup>+</sup>Sca-1<sup>+</sup> endothelial populations from the total liver EC pool (Figure 4G).

      Thank you for your comment.

      We have replaced “sorted” with “subsetted” as suggested. This avoids the misleading implication of physical cell sorting, as our operation was an analytical subsetting of the target subpopulation.

      We appreciate your careful review.

      (8) Correct typos, especially in the results section related to Fig. 6. and formatting issues in the discussion.

      730 Morphologically, the PLC shares features with previously described telocytes (TCs)- a recently identified class of interstitial cells in the liver observed via transmission electron

      We thank the reviewer for pointing out this textual error. In the submitted version, the sentence describing the morphological similarity between the PLC and previously reported telocytes was inadvertently interrupted due to a punctuation issue. This has now been corrected to ensure sentence integrity and consistent formatting.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Xu et al. focuses on the impact of clathrin-independent endocytosis in cancer cells on T cell activation. In particular, by using a combination of biochemical approaches and imaging, the authors identify ICAM1, the ligand for T cell-expressed integrin LFA-1, as a novel cargo for EndoA3-mediated endocytosis. Subsequently, the authors aim to identify functional implications for T cell activation, using a combination of cytokine assays and imaging experiments.

      They find that the absence of EndoA3 leads to a reduction in T cell-produced cytokine levels. Additionally, they observe slightly reduced levels of ICAM1 at the immunological synapse and an enlarged contact area between T cells and cancer cells. Taken together, the authors propose a mechanism where EndoA3-mediated endocytosis of ICAM1, followed by retrograde transport, supplies the immunological synapse with ICAM1. In the absence of EndoA3, T cells attempt to compensate for suboptimal ICAM1 levels at the synapse by enlarging their contact area, which proves insufficient and leads to lower levels of T cell activation.

      Strengths:

      The authors utilize a rigorous and innovative experimental approach that convincingly identifies ICAM1 as a novel cargo for EndoA3-mediated endocytosis.

      Weaknesses:

      The characterization of the effects of EndoA3 absence on T cell activation appears incomplete. Key aspects, such as surface marker upregulation, T cell proliferation, integrin signalling and, most importantly, the killing of cancer cells, are not comprehensively investigated.

      We agree with the reviewer that the effects of EndoA3 depletion on T cell activation were not sufficiently characterized. In new data presented in Fig. S4G-J, we explored additional activation markers and proliferation parameters. We did not observe any difference in the surface markers PD-1, CD137 and Tim-3 between conditions in which LB33-MEL EndoA3+ cells were treated with control or EndoA3 siRNAs. Regarding proliferation (Fig. S4J), although the proliferation index seems slightly lower upon EndoA3 depletion, the difference was not significant. Degranulation was also monitored (Fig. S4K), again without significant differences. In the new Fig. 3F, however, we performed chromium release assays to assess the killing of cancer cells. Interestingly, we observed ~15% higher lysis of LB33-MEL EndoA3+ cells after EndoA3 depletion compared with the control condition at a 3:1 T cell:target cell ratio (where the maximal effect is observed). These data are further discussed in the Discussion (new §6-9).
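      For readers less familiar with chromium release assays, specific lysis is conventionally computed from three counts per condition: experimental release (targets plus T cells), spontaneous release (targets alone) and maximum release (detergent-lysed targets). The sketch below assumes that standard formula; the function name and the counts in the usage example are hypothetical and do not reproduce the authors' analysis.

```python
def percent_specific_lysis(experimental_cpm, spontaneous_cpm, maximum_cpm):
    """Standard 51Cr release formula: 100 * (E - S) / (M - S).

    experimental_cpm: counts released by targets co-cultured with T cells
    spontaneous_cpm:  counts released by targets alone
    maximum_cpm:      counts released by fully lysed targets
    """
    if maximum_cpm <= spontaneous_cpm:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (experimental_cpm - spontaneous_cpm) / (maximum_cpm - spontaneous_cpm)


# Hypothetical counts for one effector:target ratio
print(percent_specific_lysis(3000, 1000, 5000))  # 50.0
```

Because each value is already a percentage, a difference between two conditions can be reported either in percentage points or as a relative change, and the two readings differ numerically.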

      As Endo- and exocytosis are intricately linked with the biophysical properties of the cellular membrane (e.g. membrane tension), which can significantly impact T-cell activation and cytotoxicity, the authors should address this possibility and ideally address it experimentally to some degree.

      Evaluating changes in the biophysical properties of cancer cell plasma membrane upon EndoA3 depletion is not trivial. An indirect way to address this question is by observing the area and shape of cells after siRNA treatment. In the new data added in the new Fig. S4B-D, we compared the area, aspect ratio and roundness of LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs. While we observed a slight cell area reduction upon EndoA3 depletion, no significant changes were observed regarding the aspect ratio and the roundness. Hence, we think that the biophysical properties of cancer cells are not drastically modified by EndoA3 depletion.
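      Aspect ratio and roundness are commonly computed from an ellipse fitted to each segmented cell. The sketch below assumes the ImageJ/Fiji shape-descriptor definitions (aspect ratio = major/minor axis; roundness = 4 × area / (π × major axis²)); whether the authors used exactly these definitions is an assumption, and the example dimensions are hypothetical.

```python
import math


def shape_descriptors(area, major_axis, minor_axis):
    """Return (aspect_ratio, roundness) using ImageJ-style definitions.

    Assumed definitions: aspect ratio = major / minor axis of the fitted
    ellipse; roundness = 4 * area / (pi * major_axis**2).
    """
    aspect_ratio = major_axis / minor_axis
    roundness = 4.0 * area / (math.pi * major_axis ** 2)
    return aspect_ratio, roundness


# Hypothetical ellipse with full axes 40 and 20 (area = pi * a * b / 4)
area = math.pi * 40 * 20 / 4
print(shape_descriptors(area, 40, 20))  # aspect ratio 2.0, roundness ~0.5
```

For a perfect ellipse the two descriptors are reciprocal, so unchanged aspect ratio and roundness together with a smaller area indicate a cell that shrinks without elongating.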

      Crucially, key literature relevant to this research, addressing the role of ICAM1 endocytosis in antigen-presenting cells, has not been taken into consideration.

      We thank the reviewer for this important point. We have now considered and cited the relevant literature (Discussion, page 9).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Xu et al. studies the relevance of endophilin A3-dependent endocytosis and retrograde transport of immune synapse components in the activation of cytotoxic CD8 T cells. First, the authors show that ICAM1 and ALCAM, known components of immune synapses, are endocytosed via EndoA3-dependent endocytosis and retrogradely transported to the Golgi. The authors then show that blocking internalization or retrograde trafficking reduces the activation of CD8 T cells. Moreover, this diminished CD8 T cell activation resulted in the formation of an enlarged immune synapse with reduced ICAM1 recruitment.

      Strengths:

      The authors show a novel EndoA3-dependent endocytic cargo and provide strong evidence linking EndoA3 endocytosis to the retrograde transport of ALCAM and ICAM1.

      Weaknesses:

      The role of EndoA3 in the process of T cell activation is shown in a cell that requires exogenous expression of this gene. Moreover, the authors claim that their findings are important for polarized redistribution of cargoes, but failed to show convincingly that the cargoes they are studying are polarized in their experimental system. The statistics of the manuscript also require some refinement.

      We fully acknowledge that the requirement for exogenous expression of EndoA3 in our immunological model represents a limitation of our study. Unfortunately, it remains challenging to identify cancer cell lines for which autologous CD8 T cells are available and that endogenously express all molecular players investigated (in particular EndoA3). At this stage, we do not have access to any other cancer cell line/autologous CD8⁺ T cell pairs that are sufficiently well characterized. In future studies, it would be valuable to investigate tumor types with high endogenous EndoA3 expression (such as glioblastomas, gliomas, and head and neck cancers) for which autologous CD8 T cells could be obtained, but this remains technically challenging.

      To address the reviewer’s second point regarding polarized redistribution of cargoes, we have added new data in the new Figure 4 and Movies S8-9. Using high-speed spinning-disk live-cell confocal microscopy, we captured the movement of ICAM1-positive tubulovesicular carriers in cancer cells at the moment of contact with CD8 T cells. Capturing such events is technically challenging, as T cell–cancer cell contacts form randomly and transiently. Successful imaging requires that the cancer cell be well spread and express ICAM1–GFP at an optimal level (as it is transiently expressed as a GFP-tagged construct), while acquisition must occur precisely at the moment when the T cell initiates contact. Despite these technical constraints, we successfully imaged early stages of immune synapse formation, enabling visualization of ICAM1 vesicular transport.

      The data reveal a flux of ICAM1-positive carriers emerging from the perinuclear region (corresponding to the Golgi area) and moving toward the contact site with the CD8 T cell, with fusion events of vesicles occurring at the developing immune synapse. AI-based segmentation and tracking analyses showed that ICAM1-positive carrier trajectories were predominantly oriented toward the forming immune synapse, whereas carriers moving toward other cellular regions were markedly less frequent. These results provide direct evidence for polarized ICAM1 transport via vesicular trafficking toward the immune synapse.
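      One simple way to quantify whether carrier trajectories are oriented toward a contact site is to compare each track's net displacement with the direction from its starting point to the synapse. The sketch below is a hypothetical illustration of that idea, not the AI-based pipeline used by the authors; the function name, the 45° threshold and the example tracks are all assumptions.

```python
import math


def fraction_toward_target(tracks, target, max_angle_deg=45.0):
    """Fraction of tracks whose net displacement points toward `target`.

    tracks: list of tracks, each a list of (x, y) positions over time
    target: (x, y) position of the contact site (assumed fixed)
    A track counts as "toward" if the angle between its net displacement
    and the start-to-target vector is <= max_angle_deg.
    """
    toward = 0
    counted = 0
    for track in tracks:
        (x0, y0), (x1, y1) = track[0], track[-1]
        dx, dy = x1 - x0, y1 - y0
        tx, ty = target[0] - x0, target[1] - y0
        disp, dist = math.hypot(dx, dy), math.hypot(tx, ty)
        if disp == 0 or dist == 0:
            continue  # no defined direction: stationary track or already at target
        cos_angle = max(-1.0, min(1.0, (dx * tx + dy * ty) / (disp * dist)))
        counted += 1
        if math.degrees(math.acos(cos_angle)) <= max_angle_deg:
            toward += 1
    return toward / counted if counted else float("nan")


# Hypothetical tracks relative to a contact site at (10, 0)
tracks = [[(0, 0), (5, 0)], [(0, 0), (0, 5)], [(0, 0), (-5, 0)], [(0, 0), (4, 1)]]
print(fraction_toward_target(tracks, (10, 0)))  # 0.5
```

With a 45° threshold, randomly oriented tracks would give an expected fraction of 0.25, which provides a natural baseline against which a predominant orientation toward the synapse can be tested.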

      Reviewer #3 (Public review):

      Summary:

      Shiqiang Xu and colleagues have examined the importance of ICAM-1 and ALCAM internalization and retrograde transport in cancer cells on the formation of a polarized immunological synapse with cytotoxic CD8+ T cells. They find that internalization is mediated by Endophilin A3 (EndoA3) while retrograde transport to the Golgi apparatus is mediated by the retromer complex. The paper is building on previous findings from corresponding author Henri-François Renard showing that ALCAM is an EndoA3-dependent cargo in clathrin-independent endocytosis.

      Strengths:

      The work is interesting as it describes a novel mechanism by which cancer cells might influence CD8+ T cell activation and immunological synapse formation, and the authors have used a variety of cell biology and immunology methods to study this. However, there are some aspects of the paper that should be addressed more thoroughly to substantiate the conclusions made by the authors.

      Weaknesses:

      In Figure 2A-B, the authors show micrographs from live TIRF movies of HeLa and LB33-MEL cells stably expressing EndoA3-GFP and transiently expressing ICAM-1-mScarlet. The ICAM-1 signal appears diffuse across the plasma membrane while the EndoA3 signal is partially punctate and partially lining the edge of membrane patches. Previous studies of EndoA3-mediated endocytosis have indicated that this can be observed as transient cargo-enriched puncta on the cell surface. In the present study, there is only one example of such an ICAM-1- and EndoA3-positive punctate event. Other examples of overlapping signals between ICAM-1 and EndoA3 are shown, but these either show retracting ICAM-1-positive membrane protrusions or large membrane patches encircled by EndoA3. While these might represent different modes of EndoA3-mediated ICAM-1 internalization, any conclusion on this would require further investigation.

      We agree with the reviewer that the pattern of cargoes during endocytosis (puncta vs large patches) as observed by live-cell TIRF microscopy may be confusing. In fact, a punctate pattern has been observed almost systematically when we monitored the uptake of endogenous cargoes via antibody uptake assays (whatever the imaging approach: TIRF, spinning-disk, classical confocal or lattice light-sheet microscopy). For example:

      - ALCAM: Fig.1e-h, Supplementary Figure 5 and Supplementary Movies 1-3 and 6 in Renard et al. 2020, https://doi.org/10.1038/s41467-020-15303-y; Fig.1D and Movie 2 in Tyckaert et al. 2022, https://doi.org/10.1242/jcs.259623.

      - L1CAM: Fig.2 and 3D, Movies S1-4 in Lemaigre et al. 2023, https://doi.org/10.1111/tra.12883.

      In rare cases, larger antibody clusters were observed; EndoA3 surrounded and delineated them in a “lasso-like” pattern, and the clusters were progressively taken up:

      - ALCAM: Supplementary Movie 4 in Renard et al. 2020, https://doi.org/10.1038/s41467-020-15303-y.

      However, larger patches of cargoes were more often observed when uptake was monitored using transient expression of GFP-/mCherry-tagged versions of cargoes. In these cases, EndoA3 was predominantly observed to delineate cargo patches in a “lasso-like” pattern, progressively trimming those patches and leading to endocytosis. For example:

      - L1CAM: Fig.3E, Movie S5-7 in Lemaigre et al. 2023, https://doi.org/10.1111/tra.12883.

      - We also observed this pattern with CD166-GFP (unpublished).

      That we observed patches rather than punctate patterns upon transient expression of fluorescently tagged cargo constructs is likely due to the elevated expression level of the cargoes.

      Therefore, the patchy pattern observed for ICAM1 and ALCAM, transiently expressed in fusion with fluorescent proteins, and surrounded by EndoA3 in Fig.2A-B and old Movies S1-3, is not surprising. Of note, upon anti-ALCAM antibody uptake, we observed a more punctate pattern (Fig.2C), as previously described. Unfortunately, the lower quality of the commercial anti-ICAM1 antibodies did not allow us to perform uptake assays as for ALCAM.

      Regarding Fig.S2 and old Movies S4-5, we agree with the reviewer that these data may be misleading, as they represent phenomena happening at protrusions and contact zones between two adjacent cells. We have now replaced these images with other examples where we avoid contact zones (Fig.S2 and new Movies S5-7).

      These different patterns (patches vs dots) are still unexplained at the current stage, and may indeed represent different modes of endocytosis. We think these various patterns may depend on the abundance/expression level of cargoes and their degree of clustering. This will be investigated in future studies. Still, whatever the pattern, these data demonstrate and confirm the association between EndoA3 and cargoes (such as ICAM1 or ALCAM), even in the absence of antibodies.

      Moreover, in Figure 2C-E, uptake of the previously established EndoA3 endocytic cargo ALCAM is analyzed by quantifying total internal fluorescence of antibody-labelled ALCAM in LB33-MEL cells following both overexpression and siRNA-mediated knockdown of EndoA3, showing increased and decreased uptake respectively. Why has the same quantification not been done for the proposed novel EndoA3 endocytic cargo ICAM-1? Furthermore, if endocytosis of ICAM-1 and ALCAM is diminished following EndoA3 knockdown, the expression level on the cell surface would presumably increase accordingly. This has been shown for ALCAM previously and should also be quantified for ICAM-1.

      As correctly pointed out by the reviewer, anti-ICAM1 antibody uptake assays would have been ideal. We tried them many times. Unfortunately, none of the commercial antibodies we tested yielded satisfactory results in uptake experiments. Either the labeling was too weak/non-specific, or the antibody was not effectively stripped from the cell surface by acid washes, i.e. the acid-wash conditions required for efficient stripping were too harsh for the cells to tolerate. We tried other approaches using the same commercial antibody that do not require acid washes (loss-of-surface assays by FACS, or uptake assays using surface protein biotinylation), as well as an approach based on insertion of an Alfa-tag into the extracellular part of ICAM1 by CRISPR-Cas9 and detection of ICAM1 with an anti-Alfa-tag nanobody (unpublished approach; collaboration with the lab of Prof. Leonardo Almeida-Souza, University of Helsinki, who developed it), but without success. However, we were more successful with the SNAP-tag-based approach to follow retrograde transport, for which the commercial anti-ICAM1 antibody worked properly. In Fig. 1F, we show that retrograde transport of ICAM1 (and thus most likely its endocytosis step) was significantly decreased upon EndoA3 depletion in HeLa cells, indirectly demonstrating that ICAM1 is indeed an EndoA3-dependent cargo.

      Regarding the point that the surface level of ICAM1 should increase upon perturbation of EndoA3-mediated endocytosis, we agree with the reviewer that this could be an expected result. However, this is not necessarily systematic, as the surface level of a protein cargo always results from a balance between its endocytosis, recycling to the plasma membrane, and lysosomal degradation. The flux of newly synthesized protein must also be taken into account. In addition, multiple endocytic mechanisms exist in parallel, and the perturbation of one mechanism (here, EndoA3-mediated CIE) may be partially compensated by others, as cargoes can often be taken up via multiple endocytic routes. Hence, an increased abundance at the cell surface is not guaranteed upon endocytosis perturbation. Nevertheless, we measured the cell surface level of both ICAM1 and ALCAM in LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs (Fig. S4E-F). Only minor differences were observed.

      In Figure 4A the authors show micrographs from a live-cell Airyscan movie (Movie S6) of a CD8+ T cell incubated with HeLa cells stably expressing HLA-A*68012 and transiently expressing ICAM1-EGFP. From the movie, it seems that some ICAM-1 positive vesicles in one of the HeLa cells are moving towards the T cell. However, it does not appear like the T cell has formed a stable immunological synapse but rather perhaps a motile kinapse. Furthermore, to conclude that the ICAM-1 positive vesicles are transported toward the T cell in a polarized manner, vesicles from multiple cells should be tracked and their overall directionality should be analyzed. It would also strengthen the paper if the authors could show additional evidence for polarization of the cancer cells in response to T-cell interaction.

      A similar point was raised by reviewer #2. We have revised this section accordingly. In the new Fig. 4 and Movies S8-9, we replaced the live-cell Airyscan confocal data with high-speed spinning-disk confocal imaging data, enabling a more accurate analysis of polarized cargo redistribution at a higher time resolution.

      Using this approach, we captured the movement of ICAM1-positive tubulo-vesicular carriers in cancer cells at the moment of contact with CD8 T cells. Capturing such events is technically challenging, as T cell–cancer cell contacts form randomly and transiently. Successful imaging requires that the cancer cell be well spread and express ICAM1–GFP at an optimal level (as it is transiently expressed as a GFP-tagged construct), while acquisition must occur precisely at the moment when the T cell initiates contact. Despite these technical constraints, we successfully imaged early stages of immune synapse formation, enabling visualization of ICAM1 vesicular transport.

      The data reveal a flux of ICAM1-positive carriers emerging from the perinuclear region (corresponding to the Golgi area) and moving toward the contact site with the CD8 T cell, with carrier fusion events occurring at the developing immune synapse.

      AI-based segmentation and tracking analyses showed that ICAM1-positive carrier trajectories were predominantly oriented toward the forming immune synapse, whereas carriers moving toward other cellular regions were markedly less frequent. These results provide direct evidence for polarized ICAM1 transport via vesicular trafficking toward the immune synapse.

      Finally, in Figures 4D-G, the authors show that the contact area between CD8+ T cells and LB33-MEL cells is increased in response to siRNA-mediated knockdown of EndoA3 and VPS26A. While this could be caused by reduced polarized delivery of ICAM-1 and ALCAM to the interface between the cells, it could also be caused by other factors such as increased cell surface expression of these proteins due to diminished endocytosis, and/or morphological changes in the cancer cells resulting from disrupted membrane traffic. More experimental evidence is needed to support the working model in Figure 4H.

      Regarding the cell surface expression of both ICAM1 and ALCAM, as already explained above, only minor differences were observed (Fig. S4E-F). Regarding morphological changes of cancer cells upon EndoA3 depletion (Fig. S4B-D), we compared the area, aspect ratio and roundness of LB33-MEL EndoA3+ cells treated with negative control or EndoA3 siRNAs. While we observed a slight cell area reduction upon EndoA3 depletion, no significant changes were observed regarding the aspect ratio and the roundness. Cancer cell morphology is thus not drastically modified by EndoA3 depletion. All these new data are now discussed in the manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers discussed the paper and all agreed it was incomplete in supporting the conclusions. Additional data needed to support the conclusions were:

      (1) Better characterisation of EndoA3-expressing and knock-down cells such as morphology, ICAM-1, and ALCAM surface levels to name two parameters.

      As discussed above, we have now added new data addressing these points:

      - Morphology: Fig. S4B-D

      - ICAM1 and ALCAM surface levels: Fig. S4E-F

      These new data are discussed in the main text.

      (2) Better characterisation of the ICAM-1 polarisation process. Does this require interaction with LFA-1, or can ICAM-1 be delivered to the synapse without it?

      As discussed above, we have now added new data that better characterize the polarized trafficking of ICAM1 to the immune synapse, shown in the new Fig. 4 (high-speed spinning-disk confocal imaging of ICAM1 trafficking upon conjugate formation between a CD8 T cell and a cancer cell). The text has been modified accordingly. The dependency on LFA-1 has not been addressed directly, but we expect it to be important because (i) it has been demonstrated in other cellular systems by previous studies (Jo et al. 2010), and (ii) we observed a denser flux of ICAM1-positive carriers in the cancer cell toward regions engaged in immune synapses with CD8 T cells than toward other regions. As we did not address this question directly in our study, we briefly mention this point in the Discussion section.

      (3) Better characterisation of T cell response- activation markers, cytotoxicity assays.

      As discussed above, we have now added new data addressing these points:

      - Cell surface activation markers: Fig. S4G-I

      - Proliferation: Fig. S4J

      - Degranulation: Fig. S4K

      - Cytotoxic activity: Fig. 3F

      These new data are discussed in the main text.

      (4) Citing relevant literature.

      The relevant literature (in particular the paper by Jo et al. 2010) is now cited and discussed.

      (5) Number of donors evaluated - is it true there was only one blood donor? For human studies better to have key results on >4 donors.

      Our immunological working model indeed originates from a single patient (Baurain et al., 2000), from whom both a cancer cell line (LB33-MEL) and autologous CD8 T cells were derived. These CD8 T cells specifically recognize an HLA molecule presenting a defined antigenic peptide (MUM-3) on the surface of the cancer cells. This provides us with a unique and fully natural experimental system that allows us to faithfully reconstitute cytotoxic T lymphocyte (CTL)-mediated killing of cancer cells in vitro.

      Using CD8 T cells from other donors would not be meaningful in this context, as they would not recognize the LB33-MEL cells. Conversely, testing the same CD8 T cells on other cancer cell lines requires engineering these lines to express the appropriate HLA molecule and to be exogenously pulsed with the correct antigenic peptide – which is precisely what we did with the HeLa cell line.

      Therefore, increasing the number of donors would require obtaining both cancer cell lines and CD8 T cells from each donor, ideally with evidence that the donor’s T cells recognize their own tumor cells. This is technically challenging, although it would indeed be highly valuable to diversify immunological models in future studies.

      Importantly, the high specificity of our autologous co-culture system, where cancer cells interact with their naturally matched CD8 T cells, offers clear advantages over commonly used in vitro models such as Jurkat (T) and Raji (B) cell lines, which rely on artificial stimulation with a superantigen to enforce immunological synapse formation and T cell activation.

      (6) How does the binding of antibodies to ICAM-1 and ALCAM impact their trafficking?

      As IgG antibodies are bivalent and can bind two target antigens, they may induce clustering, which could in turn affect endocytosis. To address this concern, we performed an uptake assay based on surface protein biotinylation using a cleavable biotin reagent (with a reducible linker). Briefly, after allowing endocytosis for different time intervals, cell surface–exposed biotins were removed by treatment with the cell-impermeable reducing agent MESNA, while internalized (endocytosed) biotinylated proteins remained protected. These internalized proteins were then recovered by affinity purification on streptavidin resin and analyzed by Western blot to detect the protein of interest.

      Importantly, this uptake assay can be performed in the absence or presence of an anti-cargo antibody, allowing assessment of its potential influence on endocytosis. Author response image 1 shows the results for ALCAM uptake in HeLa cells, with and without anti-ALCAM antibody:

      Author response image 1.

      Antibody binding to an extracellular epitope of ALCAM increases its endocytosis. HeLa cell-surface proteins were biotinylated on ice using EZ-Link Sulfo-NHS-SS-Biotin (Pierce) and then incubated at 37 °C for the indicated times to allow endocytosis. Internalization was assessed in the absence or presence of an anti-ALCAM antibody (Ab) added to the extracellular medium. Endocytosis was stopped by returning the cells to ice, and surface-exposed biotin was removed by treatment with the cell-impermeable reducing agent MESNA. Internalized, MESNA-resistant biotinylated proteins were affinity-purified on streptavidin resin and analyzed by Western blot to detect ALCAM. The “unstripped” condition shows the total amount of ALCAM at the cell surface at the beginning of the experiment (signal at ~95 kDa). Quantification of the time course (normalized to the no-antibody condition) shows increased ALCAM endocytosis in the presence of antibody at 15 and 30 min. Blot is representative of two independent experiments; quantifications include data from both experiments.

      We observed that the anti-ALCAM antibody slightly enhanced ALCAM uptake. A similar experiment was attempted for ICAM1, but we were unable to detect the protein by Western blot using the available commercial antibody.

      Although this outcome was expected, it highlights a potential caveat in using antibodies to monitor endocytosis. Alternative tools such as nanobodies, while monovalent and theoretically less perturbing, are not yet available for many cargo proteins and may still influence cargo conformation or dynamics. Therefore, antibodies remain the current gold standard in endocytosis studies. Nevertheless, data obtained with antibodies should always be validated by complementary approaches that do not rely on antibody binding, as we have done in this study (e.g. live-cell imaging of fluorescently tagged proteins).

      The work is of interest and we look forward to your response/revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for submitting your manuscript which I had the pleasure to review. While I enjoyed your work, I feel that it would strongly benefit by addressing the following points:

      (1) In-depth characterization of T cell responses upon EndoA3 depletion: The characterization should be expanded to include surface marker upregulation, T cell proliferation, and, most importantly, tumor cell cytotoxicity. I was wondering if the incomplete characterization of T-cell responses is due to limited supplies of antigen-specific T-cells? My understanding is that these cells have been derived from a single patient. This also raises concerns in terms of reproducibility as all data are practically from a single biological replicate. My suggestion would be to use an additional system of specific cell-cell contacts to complement the current findings. For instance, HeLa cells could be transfected to express CD19 or EpCAM, for both of which bispecific T cell engagers (Invivogen) exist that would allow specific contact formation, thereby allowing the study of the effect of EndoA3 depletion across T cells from different donors and through a more complete set of assays.

      We refer the reviewer to our responses above, where these points have been addressed in detail. We sincerely thank the reviewer for the excellent suggestion of transfecting HeLa cells with CD19 or EpCAM and using bispecific T-cell engagers. However, after careful consideration, we concluded that this approach falls outside the scope of the present study, which was specifically designed to investigate the most natural system, cancer cells and their autologous CD8 T cells. We nevertheless appreciate this insightful suggestion and will certainly consider it for future studies.

      (2) Alterations in membrane tension as an alternative explanation: Endo- and exocytosis have been found to influence the biophysical properties of cells, such as membrane tension (e.g., Djakbaravo et al., 2021, PMID: 33788963), which in turn influences their susceptibility to cytotoxic T cells with lower tension corresponding to reduced cytotoxicity (e.g., Basu & Whitlock, 2016, PMID: 26924577). Thus, interference with endocytic pathways could arguably lead to changes in membrane tension that could contribute to the observed effects. These possible effects should be discussed and addressed experimentally to a degree. While measuring membrane tension directly requires specialized expertise (e.g., tether pulling experiments) and is not within the scope of this study, membrane tension affects cell spreading and actin organization. Thus, I would suggest conducting a thorough comparative phenotypical and morphological characterization of the EndoA3+ and EndoA3- cancer cells to estimate the possible effect of changes in membrane tension (if any) on the results.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (3) Citation and consideration of earlier work: Jo & Kwon et al., 2010 (PMID: 20681010) have previously shown that ICAM1 undergoes clathrin-independent recycling and repolarization to the immunological synapse in APCs. Furthermore, they provided evidence that actin-based transport, but not lateral diffusion, together with recycling is crucial for the repolarization of ICAM1 to the immunological synapse. This important earlier work has to be cited. Actin-based transport on the cell surface has not been considered in the current manuscript. In light of these earlier findings, it is unclear in Figure 4A if ICAM1 is delivered to the T cell from within- or from the surface of the cancer cell. I would suggest changing the imaging modalities in this experiment to be able to differentiate cell surface from internal ICAM1, e.g., by detaching the cancer cells from the surface as has been done in Fig. 4B, E, and F.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      Reviewer #2 (Recommendations for the authors):

      Major comments:

      (1) The authors should be more careful with their claims about the importance of their results for cell polarity as their evidence for this is scarce (i.e. The live-cell imaging in Figure 4A is not quantified and the ICAM1 polarization effect shown in figure 4B-C is, albeit significant, small and not very convincing).

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (2) The absence (or very low expression) of EndoA3 on the LB33-MEL cell suggests that EndoA3-mediated recycling of immune synaptic components is not required for T-cell activation. The fact that EndoA3 exogenous expression in LB33-MEL cells leads to increased cytokine production in T cells is, however, interesting.

      We fully agree with the reviewer’s observation. Although EndoA3 is not expressed in some cellular contexts, its cargoes may still be present. It is therefore reasonable to assume that alternative endocytic mechanisms can compensate for its absence. It is now widely accepted that many cargoes can be internalized through multiple endocytic routes, and that the relative contribution of each pathway depends strongly on the cellular and physiological context.

      For example, we have shown that ALCAM and L1CAM, although primarily internalized via clathrin-independent pathways, present a minor fraction (< 25%) undergoing clathrin-mediated endocytosis (Renard et al., 2020; Lemaigre et al., 2023). Moreover, we observed that inhibition of macropinocytosis enhances EndoA3-mediated endocytosis of ALCAM, indicating a crosstalk between specific EndoA3-mediated clathrin-independent endocytosis (CIE) and non-specific macropinocytosis (Tyckaert et al., 2022).

      Thus, even in the absence of EndoA3, its cargoes are likely internalized through alternative endocytic routes. Nonetheless, our data clearly demonstrate that EndoA3 expression markedly enhances the endocytosis and intracellular trafficking of its cargoes, ultimately leading to modified CD8 T cell responses.

      (3) For the statistics in bar graphs (graphs 1C, D, E &F; 3E, 3F, S1C-I, and S3C), one cannot have all values for controls simply normalized to 1. This procedure hides the variance for the controls between each replicate and makes any statistics meaningless.

      We thank the reviewer for this important remark. Figures 1C–F, S1C–I, and S3C correspond to quantifications of Western blots, for which it is standard practice to normalize each quantification to a control condition set to 1 (or 100%). Absolute signal intensities cannot be directly compared across different blots because of the variability inherent to this semi-quantitative technique. For this reason, we chose to keep the data presented in normalized form. However, we agree that such data require an appropriate statistical test. We therefore used one-sample t-tests, which test the hypothesis that each siRNA condition differs from 100% (the normalized value of the siCtrl condition). We adapted the statistical analysis accordingly in the figures mentioned.
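      The one-sample approach described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' analysis code: each value is assumed to be one independent experiment already expressed as a percentage of the siCtrl signal on the same blot, the test is two-sided, and the p-value is obtained by numerically integrating the Student's t density (Simpson's rule) rather than by calling a statistics library. All function names are hypothetical; in practice, `scipy.stats.ttest_1samp(values, popmean=100)` performs the same test.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    # lgamma avoids overflow of math.gamma for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * integral of the t density from 0 to |t|.

    The integral uses composite Simpson's rule (`steps` must be even).
    """
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * area)


def one_sample_t_test(values, reference=100.0):
    """Test whether the mean of normalized replicates differs from `reference`.

    The siCtrl condition itself is not passed in: it equals `reference` by
    construction and carries no variance.
    Returns (t statistic, degrees of freedom, two-sided p-value).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    if var == 0.0:
        raise ValueError("zero variance: t statistic is undefined")
    t = (mean - reference) / math.sqrt(var / n)
    df = n - 1
    return t, df, two_sided_p(t, df)
```

      The zero-variance guard makes the reviewer's concern explicit: a control column normalized to exactly 100 in every replicate cannot itself enter a variance-based test, which is why the comparison is made against the fixed reference value instead.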

      Regarding old Figures 3E–F (now Fig. 3E and 3G), which correspond to IFNγ secretion assays, we agree that representing IFNγ secretion as a fold change relative to a control condition may obscure inter-experimental variability. However, this format was intentionally chosen to facilitate data interpretation, as IFNγ secretion was quantified by ELISA and also displayed inter-experimental variability. For completeness, we provide below the corresponding graphs showing absolute IFNγ concentrations, which retain the information on inter-experimental variability (Author response image 2). The overall conclusions remain unchanged.

      Author response image 2.

      IFNγ secretion data corresponding to Fig. 3E and 3G, expressed in absolute values (pg/mL).

      Minor comments:

      (1) What happens to surface and total levels of ICAM1 and ALCAM in the retromer or EndoA3 knockdown/overexpression conditions? This information would put the effects described into context.

      We refer the reviewer to our responses above, where these points have been addressed in detail. New data have been added and the text of our manuscript has been modified accordingly.

      (2) The authors should clearly indicate that BFA means bafilomycin A in the figure legend or methods.

      BFA corresponds to Brefeldin A. We have now clarified this information in legends and methods.

      (3) In the sentence: "These data demonstrate that retromer-mediated retrograde transport is critical for trafficking ALCAM and ICAM1 to the Golgi and that this process requires the full secretory capacity of the TGN." What do the authors mean by full secretory capacity?

      We have modified the sentence to: “Together, these data demonstrate that retromer-mediated retrograde transport is critical for trafficking ALCAM and ICAM1 to the Golgi and that this process requires efficient secretion from the TGN (as evidenced by the involvement of Rab6).”

      (4) The method used for retrograde transport seems to be a variation of the original protocol (reference 43). The manuscript would benefit from a thorough explanation of this assay, rather than citing the original protocol.

      We did not modify the original SNAP-tag–based protocol used to monitor retrograde transport. A comprehensive methodological paper has been published (ref. 44), and we have followed it strictly. Additionally, we briefly summarized the rationale of the approach in Figure 1A and in the first paragraph of the Results section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper by Karimian et al proposes an oscillator model tuned to implement binding by synchrony (BBS*) principles in a visual task. The authors set out to show how well these BBS principles explain human behavior in figure-ground segregation tasks. The model is inspired by electrophysiological findings in non-human primates, suggesting that gamma oscillations in early visual cortex implement feature-binding through a synchronization of feature-selective neurons. The psychophysics experiment involves the identification of a figure consisting of Gabor annuli, presented on a background of Gabor annuli. The participants' task is to identify the orientation of the figure. The task difficulty is varied based on the contrast and density of the Gabor annuli that make up the figure. The same figures (without the background) are used as inputs to the oscillator model. The authors report that both the discrimination accuracy in the psychophysics experiment and the synchrony of the oscillators in the proposed model follow a similar "Arnold Tongue" relationship when depicted as a function of the texture-defining features of the figure. This finding is interpreted as evidence for BBS/gamma synchrony being the underlying mechanism of the figure-ground segregation.

      Note that I chose to use "BBS" over gamma synchrony (used by the authors) in this review, as I am not convinced that the authors show evidence for synchronization in the gamma-band.

      We thank the reviewer for their careful assessment of our manuscript and useful comments that we believe have served to strengthen our work.

      Strengths:

      The design of the proposed model is well-informed by electrophysiological findings, and the idea of using computational modeling to bridge between intracranial recordings in non-human primates and behavioral results in human participants is interesting. Previous work has criticized the BBS synchrony theory based on the observation that synchronization in the gamma-band is highly localized and the frequency of the oscillation depends on the visual features of the stimulus. I appreciate how the authors demonstrate that frequency-dependence and local synchronization can be features of BBS, and not contradictory to the theory. As such, I feel that this work has the potential to contribute meaningfully to the debate on whether BBS is a biophysically realistic model of feature-binding in visual cortex.

      Weaknesses:

      I have several concerns regarding the presented claims, assessment of meaning and size of the presented effects, particularly with regard to the absence of a priori defined effect sizes.

      Firstly, the paper makes strong claims about the frequency-specificity (i.e., gamma synchrony) and anatomical correlates (early visual cortex) of the observed effects. These claims are informed by previous electrophysiological work in non-human primates but are not directly supported by the paper itself. For instance, the title contains the word "gamma synchrony", but the authors do not demonstrate any EEG/MEG or intracranial data from their human subjects supporting such claims, nor do they demonstrate that the frequencies in the oscillator model are within the gamma band. I think that the paper should more clearly distinguish between statements that are directly supported by the paper (such as: "an oscillator model based on BBS principles accounts for variance in human behavior") and abstract inferences based on the literature (such as "these effects could be attributed to gamma oscillations in early visual cortex, as the model was designed based on those principles").

      We thank the reviewer for this helpful comment and agree that the scope of our claims should be clearly delineated between what is directly supported by our data and what is theoretically inferred from prior literature.

      We revised the Abstract, Introduction, and early Discussion to moderate the strength of our statements and make the distinction explicit. The revised title now emphasizes that our study tests principles derived from prior work on gamma synchrony rather than directly demonstrating gamma activity in humans. Throughout the text, we use more cautious phrasing that highlights potential mechanisms and theoretical predictions. The intention of our study was not to position synchrony as the only viable mechanism of figure–ground perception. Rather, our goal was to reinvigorate it as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We updated phrasing throughout the manuscript to make this clearer and avoid overstating the study’s contribution.

      Importantly, our model is not agnostic with respect to frequency band. Oscillator frequencies exhibited by model units are within the gamma range by design. Frequency emerges directly from the contrast within each oscillator’s receptive field, following an empirically established relationship between stimulus contrast and gamma frequency. To our knowledge, such a robust, quantitative relationship between stimulus features and exact oscillation frequency has not been consistently demonstrated for other frequency bands. This relationship yields gamma-band frequencies for all contrasts used in our simulations. The model is thus indeed a gamma oscillator model of V1, not a generic instantiation of Binding by Synchrony (BBS) principles.

      That said, we fully agree with the reviewer that our study cannot demonstrate a direct link between gamma synchrony in visual cortex and human behavior. Our behavioral and modeling results instead show that synchronization principles derived from gamma-band physiology in V1 can predict perceptual performance patterns. We now make this distinction explicit throughout the revised manuscript.

      Secondly, unlike the human participants, the model strictly does not perform figure-ground segregation, as it only receives the figure as an input.

      We thank the reviewer for the opportunity to clarify our modeling approach. We chose not to model the background to reduce computational cost, since including it requires a substantially larger number of oscillators without changing the model’s predictions. The model thus indeed only receives the figure region as input. We aimed to test the local grouping mechanism predicted by TWCO, rather than to simulate a full figure–ground segregation process including a read-out stage. Our model therefore isolates the conditions under which local synchrony emerges within the figure region, assuming that a downstream read-out mechanism (not explicitly modeled here) would detect regions of coherent activity. The exact nature of such a read-out mechanism was beyond the scope of our work.

      To confirm that our simplified model is a valid proxy, we ran additional simulations including the background and found that a coherent figure assembly reliably emerges, as can be seen in the phase-locking patterns relative to a reference oscillator at the center of the figure. This validates that the principles of local grouping we studied in isolation hold even when the figure is embedded in a noisy surround. We have added an explicit note in the Results (paragraph 2) that we only simulate the figure and added Supplementary Figure S1 showing the additional simulations.

      Finally, it is unclear what effect sizes the authors would have expected a priori, making it difficult to assess whether their oscillator model represents the data well or poorly. I consider this a major concern, as the relationship between the synchrony of the oscillatory model and the performance of the human participants is confounded by the visual features of the figure. Specifically, the authors use the BBS literature to motivate the hypothesis that perception of the texture-defined figure is related to the density and contrast heterogeneity of the texture elements (Gabor annuli) of the figure. This hypothesis has to be true regardless of synchrony, as the figure will be easier to spot if it consists of a higher number of high-contrast Gabors than the background. As the frequency and phase of the oscillators and coupling strength between oscillators in the grid change as a function of these visual features, I wonder how much of the correlation between model synchrony and human performance is mediated by the features of the figure. To interpret to what extent the similarity between model and human behavior relies on the oscillatory nature of the model, the authors should find a way to estimate an empirical threshold that accounts for these confounding effects. Alternatively, it would be interesting to understand whether a model based on competing theories (e.g., Binding by Enhanced Firing, Roelfsema, 2023) would perform better or worse at explaining the data.

      We thank the reviewer for these insightful and constructive comments, which have prompted additional analyses that we believe substantially strengthen our work. The reviewer raises two main points: (1) the need for a benchmark to assess our model’s performance, and (2) the concern that the relationship between model synchrony and behavior might be a non-causal “confound” of the visual features. We address each point below.

      (1) Benchmarking model performance

      We agree that it is important to assess how well our model performs relative to the data, and we included this in the original manuscript. We did not predefine an absolute good-fit threshold because absolute agreement depends on irreducible noise and inter-subject variability, making a universal cutoff arbitrary. Instead, we benchmarked model performance in two complementary ways. First, the noise ceiling shown in Figure 5 provides an empirical benchmark for the maximum fit any model could achieve on our data. Simulated Arnold tongues (based on synchrony) approach this ceiling, achieving 89% and 79% of the possible similarity for correlation and weighted Jaccard similarity, respectively. Second, the parameter sweep (Figure 3) situates our model’s performance within the broader parameter space. It shows that the model, whose key parameters were fixed a priori from independent macaque neurophysiological data, lies close to the optimal regime for explaining the human data. It also provides an estimate of the lower bound (worst-performing point) on the fit that a misspecified model implementing the identical mechanism would achieve. Our model with fixed a priori parameters performs 1.41 times better than a misspecified model for the correlation fit metric and 3 times better for weighted Jaccard similarity.
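      The two fit metrics and the benchmark ratios can be made concrete with a short sketch. It assumes that each Arnold tongue (behavioral or simulated) is a grid of non-negative values flattened into a list in the same cell order, and that "percentage of possible similarity" means the model's score divided by the noise-ceiling score; how the ceiling itself is estimated is not shown. Function names are hypothetical and the code is an illustration, not the authors' pipeline.

```python
import math


def weighted_jaccard(x, y):
    """Weighted Jaccard similarity: sum of element-wise minima / sum of maxima.

    `x` and `y` are two maps (e.g. accuracy per stimulus condition) flattened
    into lists in the same cell order. Values must be non-negative.
    """
    if len(x) != len(y):
        raise ValueError("maps must have the same number of cells")
    if min(x) < 0 or min(y) < 0:
        raise ValueError("weighted Jaccard requires non-negative values")
    denom = sum(max(a, b) for a, b in zip(x, y))
    if denom == 0:
        return 1.0  # two all-zero maps are treated as identical
    return sum(min(a, b) for a, b in zip(x, y)) / denom


def pearson(x, y):
    """Pearson correlation between two flattened maps."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_score(score, benchmark):
    """Ratio of a model's score to a benchmark score.

    With the noise ceiling as benchmark this is the share of attainable
    similarity; with the worst point of a parameter sweep it is a fold
    improvement over a misspecified model.
    """
    return score / benchmark
```

      Weighted Jaccard, unlike correlation, is sensitive to absolute levels as well as to shape, which is one reason to report both metrics side by side.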

      (2) Synchrony as mechanism vs. potential confound

      We appreciate the reviewer’s suggestion to test whether synchrony explains behavior beyond stimulus features. In our framework, synchrony is a near-deterministic function of the manipulated stimulus features given fixed model parameters. As a result, synchrony and the stimulus features are collinear (R² ≈ 0.8), leaving no independent variance for synchrony to explain once stimulus features are included. Adding both into one statistical model yields unstable coefficients and no out-of-sample improvement.

      Mechanistically, we believe the relevant question is not whether synchrony explains behavior beyond stimulus features but whether synchrony is the correct transformation of the stimulus features to reproduce the behavioral pattern. Please note that in our design we ensured that mean contrast and luminance are identical in the figure and the background such that there are not more high-contrast Gabors in the figure than in the background. We did this with the aim to render mean contrast not a relevant feature. However, there are more high-contrast Gabors in the background, and it is conceivable that the absence of such high contrasts in the figure drives the detection/discrimination of the figure. We therefore agree that testing alternative models would further clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations of our model from which we derived synchrony. First, average firing rates inside the figure and second, the difference between average firing rates inside the figure and average firing rates in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison based on out-of-sample predictions. While rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison. We added a new subsection comparing synchrony to rate-based alternatives in the Results (paragraphs 7-9), including additional Bayesian analyses and LOO-CV model comparison. Please note that the model comparison we added to the manuscript provides an additional benchmark beyond the map-level ceiling analysis. It indicates that the mapping from stimulus features to behavior via synchrony generalizes best without requiring an a priori good-fit threshold.
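      The logic of ranking candidate readouts by out-of-sample prediction can be illustrated with exact leave-one-out cross-validation. The comparison described above used Bayesian models; the sketch below swaps in ordinary least-squares fits of a single predictor and mean squared prediction error, purely to show how each readout (e.g. synchrony, mean rate, rate difference) is scored on held-out observations. The linear form, the loss, and all names are assumptions; a real analysis of bounded accuracy data would typically use a link function such as the logit.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance in the training set")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def loo_mse(xs, ys):
    """Mean squared error of predicting each point from a fit to all others."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(errors) / len(errors)


def rank_predictors(predictors, ys):
    """Return (name, LOO error) pairs sorted from best to worst readout."""
    scores = {name: loo_mse(xs, ys) for name, xs in predictors.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

      Because every point is predicted by a model that never saw it, a readout that merely overfits the training data gains nothing, which is the property that makes out-of-sample comparison a benchmark in its own right.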

      We agree that formally comparing our model to a sophisticated rate-based alternative, such as an instantiation of the Binding by Enhanced Firing model, is an important direction for future work. However, it remains an open and non-trivial question whether such a model could quantitatively reproduce the precise shape of the behavioral Arnold tongue that emerges from the systematic manipulation of our stimulus parameters. Implementing and parameterizing such a model in a comparable, biologically grounded framework is a substantial undertaking that lies beyond the scope of the current study. Therefore, our goal here was not to claim exclusivity for synchrony-based mechanisms, but rather to re-evaluate their plausibility by showing that features often seen as limitations (stimulus dependence and frequency heterogeneity) are, in fact, essential characteristics of the TWCO framework that can predict complex behavioral outcomes.

      We would also like to clarify that our stimulus features were derived from theory rather than psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). Demonstrating that these features predict behavior is therefore not trivial but constitutes a first empirical confirmation that the core TWCO variables match perception.

      Apart from adding analyses of additional rate-based readouts of our model, we also refined our discussion of the relationship between these and a synchrony-based mechanism.

      Reviewer #2 (Public review):

      The authors aimed to investigate whether gamma synchrony serves a functional role in figure-ground perception. They specifically sought to test whether the stimulus-dependence of gamma synchrony, often considered a limitation, actually facilitates perceptual grouping. Using the theory of weakly coupled oscillators (TWCO), they developed a framework wherein synchronization depends on both frequency detuning (related to contrast heterogeneity) and coupling strength (related to proximity between visual elements). Through psychophysical experiments with texture discrimination tasks and computational modeling, they tested whether human performance follows patterns predicted by TWCO and whether perceptual learning enhances synchrony-based grouping.

      We thank the reviewer for their thoughtful and constructive review. We believe the comments have served to improve our work.

      Strengths:

      (1) The theoretical framework connecting TWCO to visual perception is innovative and well-articulated, providing a potential mechanistic explanation for how gamma synchrony might contribute to both feature binding and separation.

      (2) The methodology combines psychophysical measurements with computational modeling, with a solid quantitative agreement between model predictions and human performance.

      (3) In particular, the demonstration that coupling strengths can be modified through experience is remarkable and suggests gamma synchrony could be an adaptable mechanism that improves with visual learning.

      (4) The cross-validation approach, wherein model parameters derived from macaque neurophysiology successfully predict human performance, strengthens the biological plausibility of the framework.

      Weaknesses:

      (1) The highly controlled stimuli are far removed from natural scenes, raising questions about generalisability. But, of course, control (almost) excludes ecological validity. The study does not address the challenges of natural vision or leverage the rich statistical structure afforded by natural scenes.

      We agree with the reviewer that the insights of the present study are limited to texture stimuli and have made adjustments in the Discussion (final two paragraphs) to avoid claiming generalizability to natural stimuli. We have also adjusted the title to specifically limit our results to texture stimuli. To establish the principles of TWCO, we needed tight control over the stimulus, but we are intrigued by the idea of investigating natural scenes. We have added to our Discussion (paragraph 9) that future work should evaluate to what extent the principles we investigate here apply to natural scenes. Synchrony-based mechanisms have been successfully used for image segmentation tasks in machine vision, showing that the proposed mechanism can in principle work for natural scenes.

      (2) The experimental design appears primarily confirmatory rather than attempting to challenge the TWCO framework or test boundary conditions where it might fail.

      We thank the reviewer for this important point. Our primary motivation was to address the neurophysiological properties of gamma synchrony that have been suggested to severely challenge the binding-by-synchrony mechanism, in particular the strong dependence of gamma oscillations and synchrony on stimulus features. Our goal was to show that, from the perspective of TWCO, these challenges become expected components of the mechanism. In essence, we wanted to promote a conceptual shift that turns what pushes a theory to its limit into its central tenet. To facilitate this shift, we designed the experiment to directly test this core tenet.

      While our approach was designed to test a central prediction of TWCO rather than explicitly challenge its boundaries, we respectfully argue that it was far from a simple confirmatory experiment. The design incorporated high-risk elements that provided considerable room for both the theory and our model to fail. First, the core prediction itself was non-obvious and highly specific. We did not simply test whether contrast heterogeneity and grid coarseness affect perception. We tested the stronger hypothesis that they would reflect a specific, interactive trade-off (the behavioral Arnold tongue) as specified by TWCO. Second, our modeling approach was deliberately constrained to provide a further stringent test. We did not post-hoc optimize the model's key parameters to fit our behavioral data. Instead, we fixed them a priori based on independent neurophysiological data from macaques. This was a high-risk choice, as a mismatch between a priori model predictions and the human data would have seriously challenged the framework's generalizability.

      We agree that future research should further challenge TWCO, for instance by using stimuli that require segregating several objects simultaneously or objects that cover more extensive regions of the visual field.

      (3) Alternative explanations for the observed behavioral effects are not thoroughly explored. While the model provides a good fit to the data, this does not conclusively prove that gamma synchrony is the actual mechanism underlying the observed effects.

      We agree that our results do not conclusively show that gamma synchrony is the actual mechanism underlying figure-ground segregation. We admit that the original phrasing used throughout the manuscript was too strong and gave the impression that we wanted to establish exactly that. However, the goal of our work was only to reinvigorate gamma synchrony as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We have revised the title and made adjustments throughout the manuscript to better reflect this more moderate goal.

      Additionally, we added tests of alternatives (Results, paragraphs 7–9) to clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations of our model. First, we extracted average firing rates inside the figure. Second, we computed the difference between average firing rates inside the figure and average firing rates in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison between these two and synchrony based on out-of-sample predictions. While the rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison.
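
      The three readouts can be sketched from a single simulated trial. The sketch assumes that the simulation returns per-oscillator phases and firing rates over time, plus a boolean mask marking oscillators inside the figure. These names and shapes are hypothetical. It uses the time-averaged Kuramoto order parameter across figure oscillators as the synchrony measure, which is an assumption rather than the manuscript's exact definition.

```python
import numpy as np

def readouts(phases, rates, figure_mask):
    """Hypothetical rate-based and synchrony readouts from one simulated trial.

    phases, rates: arrays of shape (n_oscillators, n_timepoints)
    figure_mask:   boolean array of shape (n_oscillators,), True inside the figure
    """
    fig_rate = rates[figure_mask].mean()
    bg_rate = rates[~figure_mask].mean()
    # Kuramoto order parameter across figure oscillators at each time point,
    # then averaged over time (assumed synchrony measure).
    r_t = np.abs(np.exp(1j * phases[figure_mask]).mean(axis=0))
    return {
        "mean_figure_rate": fig_rate,
        "rate_difference": fig_rate - bg_rate,
        "synchrony": r_t.mean(),
    }

# Toy example: 4 figure and 2 background oscillators, all phase-locked at 40 Hz.
t = np.linspace(0, 1, 1000)
n = 6
mask = np.array([True] * 4 + [False] * 2)
phases = np.tile(2 * np.pi * 40 * t, (n, 1))
rates = np.where(mask[:, None], 10.0, 8.0) * np.ones((n, t.size))
out = readouts(phases, rates, mask)
```

      In the toy example the figure is perfectly synchronized (synchrony of 1) and fires 2 units above the background. The two rate readouts can change without any change in phase relations, and vice versa, which is what allows them to be compared as distinct candidate mappings.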

      (4) Direct neurophysiological evidence linking the observed behavioral effects to gamma synchrony in humans is absent, creating a gap between the model and the neural mechanism.

      We agree that the model only provides a how-possibly account linking stimulus features to performance. Showing that the brain actually relies on this mechanism would require showing that cortical synchrony mediates the effect of stimulus features on behavior beyond firing rates. Collecting such data would constitute a major effort that would go beyond the scope of this study. We acknowledge the need for electrophysiological data and the mediation analysis in the updated Discussion.

      Achievement of Aims and Support for Conclusions:

      The authors largely achieved their primary aim of demonstrating that human figure-ground perception follows patterns predicted by TWCO principles. Their psychophysical results reveal a behavioral "Arnold tongue" that matches the synchronization patterns predicted by their model, and their learning experiment shows that perceptual improvements correlate with predicted increases in synchrony.

      The evidence supports their conclusion that gamma synchrony could serve as a viable neural grouping mechanism for figure-ground segregation. However, the conclusion that "stimulus-dependence of gamma synchrony is adaptable to the statistics of visual experiences" is only partially supported, as the study uses highly controlled artificial stimuli rather than naturalistic visual statistics and does not show a sensitivity to the structure of experience.

      Likely Impact and Utility:

      This work offers a fresh perspective on the functional role of gamma oscillations in visual perception. The integration of TWCO with perceptual learning provides a novel theoretical framework that could influence future research on neural synchrony.

      The computational model, with parameters derived from neurophysiological data, offers a useful tool for predicting perceptual performance based on synchronization principles. This approach might be extended to study other perceptual phenomena and could inspire designs for artificial vision systems.

      The learning component of the study may have a particular impact, as it suggests a mechanism by which perceptual expertise develops through modified coupling between neural assemblies. This could influence thinking about perceptual learning more broadly, but also raises questions about the underlying mechanism that the paper does not address.

      Additional Context:

      Historically, the functional significance of gamma oscillations has been debated, with early theories of temporal binding giving way to skepticism based on gamma's stimulus-dependence. This study reframes this debate by suggesting that stimulus-dependence is exactly what makes gamma useful for perceptual grouping.

      The successful combination of computational neuroscience and psychophysics is a significant strength of this study.

      The field would benefit from future work extending (if possible) these findings to more naturalistic stimuli and directly measuring neural activity during perceptual tasks. Additionally, studies comparing predictions from synchrony-based models against alternative mechanisms would help establish the specificity of the proposed framework.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In a joint discussion to integrate the peer reviews and agree on the eLife recommendations, both reviewers agreed that the work is valuable, but they were on the fence about whether the strength of evidence was incomplete or solid, eventually settling on incomplete. The reviewers make several recommendations for improving these ratings, which I (Reviewing Editor) have organised into 3 points below, with point 1 of particular importance. Underneath the summary, please see the individual recommendations of the reviewers.

      (1) Strengthen evidence for the unique role of gamma synchrony in explaining the data, and ensuring claims are directly supported by relevant data:

      Reviewers 2 and 3 both note the lack of direct evidence for gamma involvement, and reviewer 2 observes that the fit with behaviour may trivially be explained by a relationship between contrast heterogeneity and grid coarseness without need for oscillation. The reviewers felt that the approach of fitting the model to human data could be strengthened to help address this issue - and they offer various solutions, e.g., more principled a-priori criteria around good vs bad fit of the model to both main task and training data, and comparison to alternative binding models (Reviewer 2), identifying and testing boundary conditions of the model (Reviewer 3). There is also the possibility of collecting direct human neurophysiological evidence linking the behavioural data to neural mechanisms. Our discussion also highlighted the need to weaken claims (including in the title) where links are not directly demonstrated by methods from the present study, e.g., resting on indirect comparisons to primate literature.

      We agree with the editor and reviewers that this was a critical point. To address it, we have made several major revisions.

      As suggested, we have weakened claims where the links are not directly demonstrated by our data. The title has been revised to be more specific, and we have carefully edited the abstract, introduction, and discussion to distinguish between our model's predictions and direct neurophysiological evidence.

      To address the concern that our model's fit might be trivially explained by visual features, we have performed a new analysis comparing the synchrony-based readout to two alternative rate-based readouts from the same V1 simulations. This new comparison shows that the synchrony readout provides a superior out-of-sample prediction of human behavior.

      While a full implementation of a competing theory like "Binding by Enhanced Firing" would be a valuable next step, we note that parameterizing such a model in a comparably grounded framework is a substantial undertaking beyond the scope of the present study. Our new analysis provides an important first step in this direction.

      (2) Make explicit and address the limitations of the stimuli:

      State explicitly that the model is not extracting the figure from the background, and that the controlled stimuli may limit generalizability.

      To address the concern that our model was not performing true figure-ground extraction, we performed a new set of simulations that included both the figure and the immediate background. The results confirm that synchrony dynamics within the figure region are not affected by the presence of the background. We added these validation results as supplementary materials. We have additionally made the modeling choice and its justification more explicit in the Results and Methods sections.

      We have revised the Discussion to be more explicit about the limitations of using highly controlled texture stimuli. We now clearly state that our findings are specific to this context and that further research is required to determine if these principles generalize to the segregation of objects in natural scenes.

      (3) Some clarifications to make more accessible:

      Include the figure explaining the framework (Reviewers 1&2), and also the model details (Reviewer 2).

      We have revised Figure 1 and its caption to more clearly illustrate the links from TWCO principles to their neural implementation in V1 and the resulting behavioral predictions.

      We have expanded the Methods section to provide a more detailed and accessible description of the model's construction. We now clarify precisely how the oscillator grid was defined in visual space, how eccentricity-dependent receptive field sizes were implemented, and how these were mapped onto a retinotopic cortical surface to determine coupling strengths.

      Reviewer #1 (Recommendations for the authors):

      (A) Major concerns:

      (1) My main concern:

      My main concern is the repeated claims that the observed findings can be attributed to gamma synchrony in the early visual cortex. I find this claim misleading as the authors do not report any electrophysiological data that directly supports such claims. As stated in my public review, I feel that the authors should be clear about direct evidence versus more abstract inferences based on the literature.

      In particular, I recommend changing claims about "gamma synchrony" to "Binding by Synchrony". That being said, the authors can outline that the model was built under the assumption that this synchrony is mediated by gamma in early visual cortex, but I don't think it should be part of their main conclusions.

      We appreciate that TWCO’s general principles are frequency-agnostic and can be viewed as binding by synchrony in a broad sense. Our work, however, specifically instantiates these principles in V1 gamma: the model reflects TWCO dynamics together with V1 anatomy/physiology and the well-established contrast–frequency relationship in the gamma range (which, to our knowledge, has not been demonstrated with comparable specificity for other bands). In that sense, it is a gamma oscillator model of V1, rather than a generic BBS instantiation. Moreover, stimulus dependencies often cited as challenges to BBS have been used in particular to argue against gamma; showing that these very dependencies are integral to the TWCO mechanism is central to our contribution, and we therefore keep our conclusions focused on the gamma-specific instantiation tested here.

      (2) Mediation of the observed effects by the visual features of the figure:

      The authors motivate the hypothesis that BBS predicts that the perception of texture-defined objects depends on the density of texture elements and their contrast heterogeneity. This hypothesis seems trivial as those are the features that distinguish figure from ground. I think it would be important to clarify how this hypothesis is unique to BBS and not explained by competing theories, such as Binding by Enhanced Firing (Roelfsema, 2023). The authors should be clear about what part of the hypothesis is not trivial based on the task and clearly attributable to oscillators and synchrony.

      Our stimulus features were derived from theory rather than from the psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). We agree that grid coarseness (element distance) is an established facilitator of figure–ground perception. By contrast, contrast heterogeneity (feature variance) is less commonly emphasized as a figure–ground cue than mean-based cues, but it follows directly from TWCO’s frequency detuning. Importantly, mean contrast and luminance were matched exactly between figure and background in our stimuli. It is therefore non-obvious that contrast heterogeneity and grid coarseness not only independently affect figure–ground perception but also trade off against each other in the way TWCO specifies, with higher heterogeneity needing to be counteracted by reduced grid coarseness. Demonstrating this trade-off provides an initial empirical indication that the core TWCO variables might shape perception. We also agree that alternative models would further clarify the unique explanatory value of synchrony. In the revised manuscript, we compare rate-based readouts (mean figure rate; figure–background rate difference) with the synchrony readout from the same simulations. The rate difference is indeed a predictor of performance, but the synchrony readout showed a larger effect and was preferred by out-of-sample model comparison.
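
      The trade-off follows from the locking condition for two symmetrically coupled phase oscillators. With dθ_k/dt = ω_k + K sin(θ_j − θ_k), the phase difference φ = θ_2 − θ_1 obeys dφ/dt = Δω − 2K sin φ. This has a stable fixed point (phase locking) only when |Δω| ≤ 2K. A larger contrast difference increases Δω, so locking requires stronger coupling, i.e. a finer grid. For example, a contrast difference of 4 gives Δν = 1 Hz, Δω = 2π rad/s, and locking requires K ≥ π s⁻¹. The sketch below checks this numerically. It uses the contrast-to-frequency mapping ν = 25 + 0.25C quoted later in this response. It treats K as a free parameter instead of deriving it from grid coarseness, and it reduces the network to a single oscillator pair, so it illustrates the principle rather than reproducing the study's model.

```python
import numpy as np

def gamma_freq_hz(contrast):
    """Contrast-to-frequency mapping quoted in this response: nu = 25 + 0.25*C."""
    return 25.0 + 0.25 * np.asarray(contrast, dtype=float)

def phase_locking(c1, c2, coupling, t_total=20.0, dt=1e-3):
    """Phase-locking value (PLV) of two symmetrically coupled phase oscillators.

    d(theta_k)/dt = omega_k + K*sin(theta_j - theta_k), so the phase difference
    phi = theta_2 - theta_1 obeys d(phi)/dt = d_omega - 2*K*sin(phi).
    Arguments may be arrays that broadcast together, to scan a parameter grid.
    The first half of the run is discarded as transient.
    """
    d_omega = 2 * np.pi * (gamma_freq_hz(c2) - gamma_freq_hz(c1))
    coupling = np.asarray(coupling, dtype=float)
    phi = np.zeros(np.broadcast(d_omega, coupling).shape)
    acc = np.zeros(phi.shape, dtype=complex)
    n_steps = round(t_total / dt)
    for step in range(n_steps):
        phi = phi + dt * (d_omega - 2 * coupling * np.sin(phi))   # Euler step
        if step >= n_steps // 2:
            acc += np.exp(1j * phi)
    return np.abs(acc) / (n_steps - n_steps // 2)

# Scan contrast differences (0 to 20) against coupling strengths (0 to 40 per s).
delta_c = np.linspace(0, 20, 11)[:, None]
k_values = np.linspace(0, 40, 11)[None, :]
tongue = phase_locking(40.0, 40.0 + delta_c, k_values)   # shape (11, 11)
```

      Plotted over `delta_c` and `k_values`, `tongue` shows a V-shaped locking region: under this mapping, the coupling needed for locking grows linearly with the contrast difference (K ≥ π × 0.25 × ΔC).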

      Using a linear model, the authors assess the relationship between discrimination accuracy and synchrony. Did the authors also include the factors grid coarseness and contrast heterogeneity in this model? Again, as both the task performance (as shown by the GEE analysis) and oscillatory synchrony depend on these features, the relationship between model and behavioral performance will be mediated by the visual features.

      Thank you for raising this. In our framework, detuning (via contrast heterogeneity) and coupling (via grid coarseness) are the inputs, synchrony is the proposed mechanistic mediator, and behavior is the output. Because synchrony in our model is a (near-)deterministic function of the manipulated features under fixed parameters, a joint features+synchrony regression is statistically ill-posed (perfect multicollinearity up to numerical error) and cannot add information. A proper mediation test would require trial-wise neural measurements of synchrony in the same task, which we do not have and acknowledge as a limitation in the Discussion. Accordingly, we show that both the features themselves (reflecting TWCO principles) and model-derived synchrony (realizing the proposed pathway) account for behavior.

      We agree this does not establish a unique contribution of synchrony. To probe alternatives, we added rate-based readouts and a model comparison to the revised manuscript. These additional analyses indicate that synchrony outperforms simple rate-based mappings. We do not claim this rules out more sophisticated rate-based mechanisms. Our aim is to demonstrate that synchrony is a viable, behaviorally informative readout for downstream processing. We do not assert it is the only mechanism the brain uses. Synchrony had been discounted due to its stimulus dependence; our results are intended to rule it back in. We have made changes throughout the manuscript to better reflect this more modest aim.

      (3) Goodness-of-fit measures are not established a priori:

      I have described this concern in my public review. It is hard to assess what the authors would have interpreted as a good or a bad fit, especially without accounting for the confound in the relationship between oscillator synchrony and behavior. Similarly, when assessing the similarity between the behavioral and dynamic Arnold Tongues across different coupling parameters, the authors found that the chosen parameters (based on macaque data) were not optimal. They offer the explanation that the human cortex has a lower coupling decay than the macaque cortex, and the similarity is higher for lower values of coupling decay. While this explanation is not entirely implausible, it is unclear where an oscillator model with human values would be in the presented plot, as the authors didn't estimate those values from the human studies. Moreover, the task used in the Lowet et al., 2017 paper is very different from the task presented here, which could also account for differences. Overall, the explanation appears hand-wavy considering the lack of empirically defined goodness of fit measures.

      Thank you for these concerns.

      We did indeed not provide a priori thresholds for what would be considered a good fit. Instead, we used two complementary benchmarks, namely noise ceilings and parameter exploration. The former provides an upper bound on what any model (not just ours, but also models based on completely different mechanisms) could achieve given our data. The parameter sweep indicates how well our specific model can fit the data at best, and how poorly it fits across the range of possible parameters. These benchmarks are more informative than a fixed a priori cutoff, which would depend on unknown noise and inter-subject variability. Both the noise ceiling and the parameter exploration indicate that our model, using a priori fixed parameters, performs well. Additionally, we redid all our statistical analyses after z-normalizing every predictor to ease interpretation of effect sizes.
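
      One common way to compute such a noise ceiling is a split-half estimate. Participants are repeatedly split into two halves, the two halves' mean accuracy profiles across conditions are correlated, and the Spearman-Brown correction r_SB = 2r/(1 + r) projects the reliability to the full sample. The sketch assumes a participants × conditions accuracy matrix and uses synthetic data. It is not necessarily the exact ceiling computation used in the manuscript.

```python
import numpy as np

def split_half_ceiling(data, n_splits=1000, seed=0):
    """Mean Spearman-Brown-corrected split-half reliability of the group-mean
    accuracy profile. data has shape (n_participants, n_conditions)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    corrected = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half_a = data[perm[: n // 2]].mean(axis=0)
        half_b = data[perm[n // 2 :]].mean(axis=0)
        r = np.corrcoef(half_a, half_b)[0, 1]
        corrected.append(2 * r / (1 + r))   # Spearman-Brown for doubling sample size
    return float(np.mean(corrected))

# Synthetic data: 20 participants x 36 conditions.
rng = np.random.default_rng(1)
profile = np.linspace(0.5, 0.95, 36)                          # shared condition effect
reliable = profile + rng.normal(0, 0.05, size=(20, 36))
unreliable = rng.normal(0.7, 0.05, size=(20, 36))             # no shared structure
ceiling_reliable = split_half_ceiling(reliable)
ceiling_unreliable = split_half_ceiling(unreliable)
```

      A model's correlation with the full-sample mean profile can then be judged relative to this ceiling, since no model can be expected to track the group data more closely than the data track themselves.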

      Regarding the reason that key model parameters were not optimal, we believe our interpretation to be plausible. We agree that we currently do not have data to estimate the exact human decay factor and hence cannot establish how much model fit would be affected. However, the parameter exploration in Figure 3 shows that small to modest reductions in decay would improve model fit. We discuss this now in the revised manuscript.

      The reviewer’s suggestion is intriguing. While Lowet et al. (2017) used a different task, the parameters we took from their work (decay rate and maximum coupling) are intended to reflect anatomical properties and thus should not be task-dependent. That said, Lowet et al.’s data carry uncertainty, so our estimates may not be exact; we note this explicitly in the revised Discussion. Whether a different task would have yielded better parameter estimates is difficult to determine, but we considered Lowet’s paradigm appropriate because it was designed to target the same V1 anatomical and physiological properties that map onto TWCO.

      I have concerns about a similar confound in the training effects. If I'm not mistaken, the Hebbian Learning rule encourages synchronization between the oscillators in the grid. As such, it causes synchronization to increase over several simulations. Clearly, the task performance of the participants also improves over the sessions. Again, an empirical threshold would be required to assess whether the similarity in learning between model and performance goes beyond what is expected based on learning alone. How much of these effects can be attributed to the model being oscillatory?

      The reviewer is correct that, in our framework, learning operates via changes in coupling that increase synchrony. Enhanced synchrony is the proposed (and in our model also the actual) pathway by which learning impacts behavior. We agree that learning could, in principle, act through pathways other than synchrony. Demonstrating this would not be achieved by a mediation analysis here, because that requires independent, trial-level neural measurements of the candidate pathways (synchrony and alternatives). In the absence of such data, the appropriate approach would be model comparison between competing mechanistic readouts. We have added such a model comparison for a synchrony readout versus two rate-based readouts derived from the same simulations for the first session, i.e., focusing on the pathway from stimulus features to behavior. However, a similar model comparison is not possible for learning. As we show in the supplementary materials, rate-based readouts of our V1 model are not at all affected by coupling strength. As such, they are insensitive to changes in coupling and are thus not viable as alternative mechanisms to explain performance changes due to learning. A fair test of rate-based alternatives would require building a detailed rate-based figure–ground segregation model that predicts session-wise changes. We agree that this is an important next step, but it is also a substantial undertaking beyond the scope of the present study.
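
      The pathway described above (learning strengthens coupling, which raises synchrony) can be illustrated with a two-oscillator toy model. The update rule in this sketch is hypothetical and is a stand-in for the Hebbian rule in the manuscript. After each "session", coupling grows in proportion to the pair's phase-locking value and is capped at `k_max`. The detuning, learning rate, and cap are placeholders.

```python
import cmath
import math

def pair_plv(d_omega, coupling, t_total=40.0, dt=1e-3):
    """PLV of two oscillators whose phase difference obeys
    d(phi)/dt = d_omega - 2*coupling*sin(phi); the first half is discarded."""
    n_steps = round(t_total / dt)
    phi, acc = 0.0, 0j
    for step in range(n_steps):
        phi += dt * (d_omega - 2.0 * coupling * math.sin(phi))   # Euler step
        if step >= n_steps // 2:
            acc += cmath.exp(1j * phi)
    return abs(acc) / (n_steps - n_steps // 2)

def train(d_omega, k0=1.0, eta=2.0, k_max=10.0, n_sessions=10):
    """Hypothetical Hebbian-style rule: after each session, coupling grows in
    proportion to how phase-locked the pair was, capped at k_max."""
    k, history = k0, []
    for _ in range(n_sessions):
        plv = pair_plv(d_omega, k)
        history.append((k, plv))
        k = min(k + eta * plv, k_max)
    return history

history = train(d_omega=2 * math.pi * 1.0)   # 1 Hz detuning between the pair
```

      Early sessions start below the locking threshold (K < π for this detuning), so the PLV is low. As coupling accumulates, the pair crosses the threshold and locks. Because firing rates are not modeled here, the sketch isolates the synchrony pathway only.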

      (4) Similarly, for the comparison of the Arnold Tongue in the transfer session and the early session:

      In the first part of the Results section, it says: "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli. We evaluated whether this assumption holds for our human participants using the transfer session following the main training period. [...] If learning is indeed local, participants' performance in the transfer session should resemble that of early training sessions, indicating a reset in performance for the new retinal location."

      The authors find that a model fit to session 3 explains the data in the transfer session best and consider this as evidence for the above-stated expectation. Again, it is unclear where the cutoff would have been for a session to be declared as early or late. For instance, had the participants only performed 4 sessions, would the performance be best explained by session 3 or session 1?

      A high number of statistical tests are used, which, firstly, need to be corrected for multiple comparisons (did the authors do this?). Secondly, I feel that the regression models could be improved. For instance, the authors fit one model per session and then assess how well each model explains the variance in the transfer session. I think the authors might want to opt for one model with the regressors contrast heterogeneity, grid coarseness, and session (and their interaction). Using this approach, the authors would still be able to assess which session predicts the data best. Similarly, interindividual variability could be accounted for by adding participant-specific random effects to the model (and using a mixed model), instead of fitting individual models per participant.

      We agree the “early vs late” cutoff was underspecified. In the revision, we predefine Session 2 as the early-learning reference, excluding Session 1 to avoid familiarization/response-mapping effects. We then fit a single Bayesian hierarchical model with contrast heterogeneity, grid coarseness, and session as predictors, plus a transfer indicator and participant-level random effects. This allows us to place the transfer session on the same scale as training and to test (a) whether the transfer session precedes the state in Session 2, via the posterior contrast P(β_transfer < β_Sess2), and (b) whether it is indistinguishable from the state in Session 2, using an equivalence test derived from the fitted model. We find that the transfer session is equivalent to Session 2. We added this updated analysis of the transfer session in the Results (paragraph 15).
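
      Given matched MCMC draws of the two coefficients from the same fitted model, both tests reduce to summaries of the posterior difference δ = β_transfer − β_Sess2. The sketch below assumes a placeholder region of practical equivalence (ROPE) of ±0.1 on the predictor scale. It declares equivalence when the 95% equal-tailed credible interval lies entirely inside the ROPE, a variant of the HDI+ROPE decision rule. The draws are synthetic.

```python
import numpy as np

def compare_sessions(draws_transfer, draws_ref, rope=0.1):
    """Summaries of delta = beta_transfer - beta_ref from matched posterior draws
    (same MCMC iterations, so posterior correlations are preserved)."""
    delta = np.asarray(draws_transfer) - np.asarray(draws_ref)
    lo, hi = np.percentile(delta, [2.5, 97.5])
    return {
        "p_transfer_below_ref": float(np.mean(delta < 0)),   # P(beta_transfer < beta_ref)
        "p_in_rope": float(np.mean(np.abs(delta) < rope)),
        "ci95": (float(lo), float(hi)),
        "equivalent": bool(-rope < lo and hi < rope),        # 95% interval inside ROPE
    }

# Synthetic posterior draws (hypothetical values).
rng = np.random.default_rng(2)
beta_s2 = rng.normal(0.50, 0.02, 4000)
beta_transfer_eq = beta_s2 + rng.normal(0.0, 0.02, 4000)     # same state as Session 2
beta_transfer_gain = beta_s2 + rng.normal(0.30, 0.02, 4000)  # clearly beyond Session 2
eq = compare_sessions(beta_transfer_eq, beta_s2)
gain = compare_sessions(beta_transfer_gain, beta_s2)
```

      The width of the ROPE carries the substantive judgment of what counts as "practically the same"; it should be fixed before looking at the posterior.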

      In response to the suggestion to use a hierarchical regression model for analyzing the transfer session, we have decided to use such a model for all our analyses in a Bayesian framework. In this Bayesian framework, inference is based on the joint posterior (credible intervals/equivalence) of all predictors in a model and additional post-hoc multiplicity corrections are not required.

      (5) Questions regarding the model:

      What does it mean that the grid was "defined in visual space"? How biologically plausible with regard to the retinotopy and organization of the oscillators do the authors claim the model to be?

      We are happy to clarify this point. We have a total of 400 oscillators reflecting neural assemblies in V1. We start by defining a regular 20×20 grid of the receptive field (RF) centers of these oscillators inside the figure region. Each oscillator is then assigned an RF size based on the eccentricity of its RF center, using the threshold-linear relationship between RF eccentricity and RF size reported in [1]. Each oscillator thus has an individual, eccentricity-dependent RF size.
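
      The grid construction can be sketched as follows. The location and width of the figure region, and the slope and floor of the size function, are placeholders rather than the study's values. The form max(floor, slope × eccentricity) is one simple threshold-linear choice and is assumed here for illustration.

```python
import numpy as np

def rf_grid(center=(5.0, 0.0), width=4.0, n=20, slope=0.25, min_size=0.5):
    """RF centers on a regular n x n grid covering a square figure region
    (degrees of visual angle) and eccentricity-dependent RF sizes.

    All default values are placeholders, not the parameters used in the study.
    """
    half = width / 2.0
    xs = np.linspace(center[0] - half, center[0] + half, n)
    ys = np.linspace(center[1] - half, center[1] + half, n)
    x, y = np.meshgrid(xs, ys)
    centers = np.column_stack([x.ravel(), y.ravel()])   # shape (n*n, 2)
    ecc = np.hypot(centers[:, 0], centers[:, 1])        # eccentricity in degrees
    sizes = np.maximum(min_size, slope * ecc)           # threshold-linear in eccentricity
    return centers, ecc, sizes

centers, ecc, sizes = rf_grid()
```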

      For the coupling between oscillators, we need to know their cortical distances. We obtain these by first determining the cortical location of each oscillator through a complex-logarithmic topographic mapping of neuronal receptive field coordinates onto the cortical surface [2,3]. For this mapping, we use human parameter values estimated by [4]. From these cortical locations, we then compute pairwise Euclidean distances.
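
      The mapping and distance steps can be sketched with the simpler monopole form of the complex-logarithmic map, w = k·log(z + a), rather than the full model with human parameters from [4]. The values of k and a are placeholders. The exponential fall-off of coupling with cortical distance, and its parameters, are likewise hypothetical stand-ins for the decay and maximum coupling used in the study. The sketch assumes RF centers in the right visual hemifield (x > 0), so the logarithm stays away from its branch cut.

```python
import numpy as np

def cortical_positions(centers_deg, k=15.0, a=0.7):
    """Monopole complex-log map w = k*log(z + a) from visual field (deg) to
    cortex (mm). k and a are placeholder values."""
    z = centers_deg[:, 0] + 1j * centers_deg[:, 1]
    return k * np.log(z + a)

def coupling_matrix(w, k_max=1.0, decay_mm=1.0):
    """Pairwise cortical distances and a hypothetical exponential coupling."""
    d = np.abs(w[:, None] - w[None, :])     # pairwise Euclidean distance in mm
    K = k_max * np.exp(-d / decay_mm)
    np.fill_diagonal(K, 0.0)                # no self-coupling
    return d, K

# Two pairs of points, each 1 degree apart, near 2 and near 8 degrees eccentricity.
pts = np.array([[2.0, 0.0], [3.0, 0.0], [8.0, 0.0], [9.0, 0.0]])
w = cortical_positions(pts)
d, K = coupling_matrix(w)
```

      Under these placeholder values, the two points near 2° map about 4.7 mm apart, whereas the two near 8° map about 1.6 mm apart. This is the cortical magnification that makes coupling, and hence synchrony, depend on where the figure falls in the visual field.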

      The model thus captures realistic retinotopy, eccentricity-dependent RF sizes, and distance-dependent coupling on the cortical surface. We have adjusted our Methods to make these steps clearer.

      (1) Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature neuroscience, 14(9), 1195-1201.

      (2) Balasubramanian, M., & Schwartz, E. L. (2002). The isomap algorithm and topological stability. Science, 295(5552), 7. https://doi.org/10.1126/science.1066234

      (3) Schwartz, E. L. (1980). Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Research, 20(8), 645–669. http://www.sciencedirect.com/science/article/pii/0042698980900905

      (4) Polimeni, J. R., Hinds, O. P., Balasubramanian, M., van der Kouwe, A. J. W., Wald, L. L., Dale, A. M., & Schwartz, E. L. (2005). Two-dimensional mathematical structure of the human visuotopic map complex in V1, V2, and V3 measured via fMRI at 3 and 7 Tesla. Journal of Vision, 5(8), 898. https://doi.org/10.1167/5.8.898

      Similarly, do the authors claim that each gabor annuli stimulates a single receptive field in V1?

      We hope that with the additional explanation above, it is clearer that there is not a one-to-one mapping. Each oscillator samples the local image by pooling over all Gabor annuli that overlap its receptive field (partially or fully) and computes the average contrast within its RF. Conversely, a single annulus typically overlaps multiple RFs and contributes to each in proportion to the overlap.
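      The overlap-weighted pooling can be sketched numerically. This is a minimal illustration under simplifying assumptions: each Gabor annulus is treated as a uniform-contrast disk, RFs are circles, parts of an RF not covered by any annulus count as zero contrast, and the RF is sampled on a regular lattice so that each annulus contributes in proportion to the RF area it covers.

```python
import math


def contrast_at(px, py, annuli):
    """Contrast at a point: that of the first annulus covering it, else 0.
    Each annulus is (cx, cy, radius, contrast), treated as a uniform disk."""
    for cx, cy, r, c in annuli:
        if math.hypot(px - cx, py - cy) <= r:
            return c
    return 0.0


def pooled_contrast(rf_x, rf_y, rf_radius, annuli, n=41):
    """Mean contrast over a circular RF, sampled on an n x n lattice.

    Every annulus overlapping the RF contributes in proportion to the area it
    covers; uncovered parts of the RF count as zero contrast (assumption).
    """
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            px = rf_x - rf_radius + 2 * rf_radius * i / (n - 1)
            py = rf_y - rf_radius + 2 * rf_radius * j / (n - 1)
            if math.hypot(px - rf_x, py - rf_y) <= rf_radius:
                total += contrast_at(px, py, annuli)
                count += 1
    return total / count
```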

      I am unsure how the oscillators were organized, if not retinotopically. How is the retinotopic input fed into the non-retinotopically arranged oscillators?

      We hope that with the additional explanation above, it is clearer that the network is strictly retinotopic.

      The frequency of each oscillator changes according to ω = 2πν with ν = 25 + 0.25C. How were the values for the linear regression in ν chosen? Reference?

      The slope and intercept parameters for this equation were first reported in [5]. We added the reference to the Methods.
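      The contrast-to-frequency mapping is simple enough to state directly in code. The sketch below assumes C is expressed in percent contrast (the units are not stated here), in which case a 20-point contrast difference between neighbouring regions translates into a 5 Hz frequency detuning, the quantity that competes with coupling strength in determining whether oscillators synchronize.

```python
import math


def intrinsic_frequency(contrast):
    """nu = 25 + 0.25 * C in Hz. C is assumed to be percent contrast."""
    return 25.0 + 0.25 * contrast


def angular_frequency(contrast):
    """omega = 2 * pi * nu, in rad/s."""
    return 2.0 * math.pi * intrinsic_frequency(contrast)


# Two neighbouring patches at 30% and 50% contrast differ by 5 Hz
detuning_hz = intrinsic_frequency(50.0) - intrinsic_frequency(30.0)
```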

      (5) Lowet, E., Roberts, M., Hadjipapas, A., Peter, A., van der Eerden, J., & De Weerd, P. (2015). Input-dependent frequency modulation of cortical gamma oscillations shapes spatial synchronization and enables phase coding. PLoS computational biology, 11(2), e1004072.

      (6) Hebbian Learning Rule:

      I am confused about how the effective learning rate E = εt is calculated. It is said that it is estimated based on the similarity between the second experimental session and the distribution of synchrony after letting the model learn. How can the model learn without knowing epsilon and t?

      We agree with the reviewer that our procedure to estimate the effective learning rate requires further clarification. We performed a nested grid search. Essentially, we let the model learn between session 1 and 2 with each of 25 candidate effective learning rates and evaluate how well each of them allow the model to fit performance in session 2. We then select the best effective learning rate and create a new, smaller, grid around this value and repeat that procedure. In total we perform 5 nested grids to arrive at the final effective learning rate. We expanded the explanation in the Methods.
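      The nested grid search can be sketched as a coarse-to-fine loop. In this minimal Python version, `toy_misfit` is a hypothetical stand-in for "let the model learn from Session 1 to Session 2 with this effective learning rate and measure the misfit to Session 2 performance"; the log spacing of candidates and the rule that each new grid spans the two neighbours of the current best are assumptions, since the response specifies only 25 candidates per grid and 5 nested grids.

```python
import math


def nested_grid_search(loss, lo, hi, n_candidates=25, n_levels=5):
    """Coarse-to-fine search for the value minimizing `loss` on [lo, hi].

    Candidates are log-spaced (assumption); after each level the next grid
    spans the two neighbours of the current best candidate (assumption).
    """
    best = lo
    for _ in range(n_levels):
        log_lo, log_hi = math.log10(lo), math.log10(hi)
        step = (log_hi - log_lo) / (n_candidates - 1)
        candidates = [10 ** (log_lo + i * step) for i in range(n_candidates)]
        losses = [loss(c) for c in candidates]
        i_best = min(range(n_candidates), key=losses.__getitem__)
        best = candidates[i_best]
        # Zoom in: the next grid covers only the neighbourhood of the best value
        lo = candidates[max(i_best - 1, 0)]
        hi = candidates[min(i_best + 1, n_candidates - 1)]
    return best


def toy_misfit(effective_rate):
    """Hypothetical stand-in for the model-versus-Session-2 misfit."""
    return (math.log10(effective_rate) - math.log10(0.0123)) ** 2
```

      Each level narrows the search window by roughly a factor of 12 in log space, so five levels of 25 evaluations (125 model runs in total) reach a much finer resolution than a single flat grid of the same cost.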

      (B) Minor concerns:

      (1) Small N: 2/3 of the studies that were cited to justify the small sample were notably different from the current experiment, i.e., Intoy 2020 is an eye movement task, Lange 2020 is a memory task (Tesileanu 2020 is more similar). I think a power analysis would be great to support this, as the sample size seems quite low.

      Our study uses a within-subject design with ~750 trials per session (≈6,000 total) per participant, analyzed with a hierarchical model that pools information across trials and participants. To assess adequacy, we ran a simulation-based design analysis using the fitted hierarchical model (i.e., post hoc, based on the observed variance components). This analysis indicated a detection probability >90% for all key effects. We now report the results of this design analysis in Supplementary Table 1 and note this in the Results (paragraph 1).
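      The logic of a simulation-based design analysis can be illustrated with a deliberately simplified, frequentist stand-in; this is not the Bayesian design analysis reported above. Under the assumptions of this sketch, each participant's estimated effect is a fixed effect plus a participant-level deviation plus trial noise averaged over the trials, and detection means an exact sign-flip test across participants reaches p < 0.05. All parameter values are placeholders.

```python
import itertools
import random
import statistics


def sign_flip_p(effects):
    """Exact two-sided sign-flip test of H0: mean participant effect = 0."""
    observed = abs(statistics.fmean(effects))
    n_extreme = n_total = 0
    for signs in itertools.product((1.0, -1.0), repeat=len(effects)):
        flipped = abs(statistics.fmean([s * e for s, e in zip(signs, effects)]))
        n_extreme += flipped >= observed - 1e-12
        n_total += 1
    return n_extreme / n_total


def simulated_power(beta, tau, sigma_trial, n_participants, n_trials,
                    n_sims=200, alpha=0.05, seed=0):
    """Share of simulated experiments in which the effect is detected.

    Participant effect = beta + deviation (SD tau) + trial noise averaged
    over n_trials (SD sigma_trial per trial).
    """
    rng = random.Random(seed)
    se_within = sigma_trial / n_trials ** 0.5
    hits = 0
    for _ in range(n_sims):
        effects = [rng.gauss(beta, tau) + rng.gauss(0.0, se_within)
                   for _ in range(n_participants)]
        hits += sign_flip_p(effects) < alpha
    return hits / n_sims
```

      One design consequence shows up immediately: with five or fewer participants the smallest attainable two-sided p of an exact sign-flip test is 2/32 = 0.0625, so power is zero however many trials are collected. Many trials shrink within-participant noise but cannot substitute for participants when the test operates at the participant level, which is one reason a hierarchical model that pools across trials and participants is attractive.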

      Regarding the literature context, we agree the cited studies are not identical to ours; we referenced them to illustrate a common practice (small N with many trials) when targeting low-level, early-visual mechanisms. Intoy (pattern/contrast sensitivity) and Lange (perceptual learning in early vision) share that focus, while Tesileanu is methodologically closest.

      (2) Figure 1 could be more informative and better described in the text. The authors often don't refer to the panels in Figure 1. Maybe it would help to swap a and b to describe the Arnold tongue first? It might also be a good idea to add the coupling strength and frequency detuning axes.

      We have swapped panels a and b and now refer to each panel in the main text to enhance clarity.
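      For readers less familiar with the Arnold tongue, a textbook two-oscillator case shows why detuning and coupling strength are natural axes for such a panel. The sketch below assumes a symmetric pair of Kuramoto phase oscillators with coupling K, not the 400-oscillator network of the model: the phase difference φ then obeys dφ/dt = Δω - 2K sin φ, which has a stable fixed point (phase locking) exactly when |Δω| ≤ 2K.

```python
import math


def phase_locks(detuning, coupling, dt=1e-3, t_max=60.0, tol=0.05):
    """Integrate dphi/dt = detuning - 2 * coupling * sin(phi) with forward
    Euler and report whether phi has stopped drifting (phase locking)."""
    phi = 0.0
    steps = int(t_max / dt)
    half = steps // 2
    phi_half = None
    for k in range(steps):
        phi += dt * (detuning - 2.0 * coupling * math.sin(phi))
        if k == half:
            phi_half = phi
    # Locked: no net drift of the phase difference over the second half
    return abs(phi - phi_half) < tol


def inside_tongue(detuning, coupling):
    """Analytical locking condition for the symmetric pair."""
    return abs(detuning) <= 2.0 * coupling
```

      Sweeping detuning and coupling over a grid and marking where `phase_locks` returns True traces out the V-shaped tongue bounded by |Δω| = 2K.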

      (3) Values of rho (distance - is this degrees visual angle)? Do the authors assume that the size of the stimuli corresponds to receptive fields in V1? If so, how is this justified?

      The center-to-center distance between any pair of neighboring annuli is indeed expressed in degrees of visual angle. Rho is a scaling factor for this distance. With rho=1, the center-to-center distance corresponds to the diameter of the annuli; i.e., they touch but do not overlap each other. We do not assume any relation between the size of receptive fields and the size of the annuli. Receptive field sizes in our model are determined purely by their eccentricity, and each oscillator can have several annuli within its receptive field, while each annulus can fall within several overlapping receptive fields of different oscillators. We believe that the schematic illustration in Figure 1 may have given the impression that each oscillator sees exactly one annulus; we have added a note clarifying that this is not the case and that the schematic is merely a simplification to illustrate the relationship between contrast and intrinsic frequency.

      (4) Some equations are embedded in the text, and some are not. It might be easier to find the respective equation if they all have an index. For instance, the authors mention the psychometric function that relates model synchrony and performance in the results section. It would be easier to find if it had an index that the authors could refer to.

      We moved this equation, as well as the contrast-to-intrinsic-frequency mapping, from inline to displayed equations and numbered them.
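      The functional form of the psychometric function is not spelled out in this exchange, so the following is only a generic sketch: it assumes a logistic link from model synchrony s to proportion correct, with a guess rate of 0.5 (a two-alternative task, assumed) and a small lapse rate. The parameter names alpha, beta, guess, and lapse are hypothetical.

```python
import math


def psychometric(s, alpha, beta, guess=0.5, lapse=0.02):
    """Proportion correct as a logistic function of model synchrony s.

    alpha: synchrony at the curve midpoint; beta: slope. guess = 0.5 assumes
    a two-alternative task; all parameter values are placeholders.
    """
    return guess + (1.0 - guess - lapse) / (1.0 + math.exp(-beta * (s - alpha)))
```

      At s = alpha the predicted accuracy sits halfway between the guess rate and 1 - lapse (0.74 with the defaults), and it saturates at 0.98 for high synchrony.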

      (5) Is there a reference for "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli"? (If so, it should be cited.)

      We added references supporting this assumption.

      (6) Figure 2b: colorbar missing label.

      We added the label.

      Reviewer #2 (Recommendations for the authors):

      Cool work!

      (1) The reader would benefit from (a single) comprehensive figure that visually explains the entire conceptual framework, from TWCO principles to neural implementation to behavioural predictions, accessible to readers without specialised knowledge of oscillatory dynamics. This will give the paper a greater impact.

      We have adjusted Figure 1 in accordance with suggestions made by reviewer 1 and added further explanations to the caption and the Introduction to enhance clarity on how the principles of TWCO relate to neural implementation.

      (2) I think this paper would benefit from the audience eLife provides, but the paper could move closer to the audience.

      (3) Pride comes before the fall, but I am not the most uninformed reader, and it took me some effort to process everything.

      Thank you, we took this to heart. In the Introduction, we now state more explicitly how each variable is operationalized and how these map onto TWCO, with improved reference to the relevant panels in the schematic figure. We agree the framework is conceptually dense. TWCO principles reach the stimuli through specific V1 anatomy and physiology, so there are several links to keep in mind. Our goal with the revised Introduction and figure is to make those links more visible.

      (4) You could consider discussing potential implications for understanding perceptual disorders characterized by altered neural synchrony (e.g., schizophrenia, autism) and how your learning paradigm might inform perceptual training interventions.

      Thank you for this suggestion. We have added to the Discussion that TWCO might provide a new lens for studying perceptual disorders, and we provide a concrete example of the relation between grouping, gamma synchrony (in light of TWCO), and lateral connectivity in schizophrenia.

      (5) I think this paper has real strength, but rather than dispersing limitations throughout the discussion, create a dedicated section that systematically addresses ecological validity, alternative explanations, and generalisability concerns. This will also preempt criticism.

      We appreciate the suggestion. Our preference is to discuss limitations in context, next to the specific results they qualify, so readers see why each limitation matters and how it affects interpretation. Nevertheless, paragraph 7 on page 20 summarizes most limitations in a single paragraph.

    1. It’s important for educators to have a sense of what race and ethnicity are due to our potential for subconscious racial biases as teachers of MLs.1 While some MLs and their educators may share a common racial or ethnic identity, many do not. As white educators ourselves who have been granted many unearned privileges, we (the book authors) must become aware of and reflect on what these biases and privileges might mean for our practice as teachers. No matter what our racial identity and ethnicity, all of us need to approach this work with humility.

      This passage emphasizes the need for educators to reflect on their own biases and identities. I think this connects strongly to culturally responsive teaching because educators must be aware of how their perspectives influence their teaching practices. Reflection and humility allow teachers to create more equitable learning environments for multilingual learners. This makes me think about how ongoing professional development could support teachers in recognizing and addressing these biases.

    1. For an example of public shaming, we can look at late-night TV host Jimmy Kimmel’s annual Halloween prank, where he has parents film their children as the parents tell the children that they ate all the kids’ Halloween candy. Parents post these videos online, where viewers are intended to laugh at the distress, despair, and sense of betrayal the children express. I will not link to these videos, which I find horrible, but instead link you to these articles:

      I think that children often feel distress about many things that don't warrant it, and it may be humorous to see them worry about things that aren't serious, but I personally don't like this Jimmy Kimmel prank. The intention of the adults here is to cause distress to kids for laughter alone, and that's not fair or kind; it shouldn't be okay just because they are children. Posting this sort of content online could also have negative mental and social effects on a kid.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the recruitment order and assembly of the Cdv proteins during Sulfolobus acidocaldarius archaeal cell division using a bottom-up reconstitution approach. They employed liposome-binding assays, EM, and fluorescence microscopy with in vitro reconstitution in dumbbell-shaped liposomes to explore how CdvA, CdvB, and the homologues of ESCRT-III proteins (CdvB, CdvB1, and CdvB2) interact to form membrane remodeling complexes.

      The study sought to reconstitute the Cdv machinery by first analyzing their assembly as two subcomplexes: CdvA:CdvB and CdvB1:CdvB2ΔC. The authors report that CdvA binds lipid membranes only in the presence of CdvB and localizes preferentially to membrane necks. Similarly, the findings on CdvB1:CdvB2ΔC indicate that truncation of CdvB2 facilitates filament formation and enhances curvature sensitivity in interaction with CdvB1. Finally, while the authors reconstitute a quaternary CdvA:CdvB:CdvB1:CdvB2 complex and demonstrate its enrichment at membrane necks, the mechanistic details of how these complexes drive membrane remodeling through removal of subcomplexes by the proteasome and/or CdvC remain speculative.

      Although the work highlights intriguing similarities with eukaryotic ESCRT-III systems and explores unique archaeal adaptations, the conclusions drawn would benefit from stronger experimental validation and a more comprehensive mechanistic framework.

      Strengths:

      The study of machinery assembly and its involvement in membrane remodeling, particularly using bottom-up reconstituted in vitro systems, presents significant challenges. This is particularly true for systems like the ESCRT-III complex, which localizes uniquely at the lumen of membrane necks prior to scission. The use of dumbbell-shaped liposomes in this study provides a promising experimental model to investigate ESCRT-III and ESCRT-III-like protein activity at membrane necks.

      The authors present intriguing evidence regarding the sequential recruitment of ESCRT-III proteins in crenarchaea, close relatives of eukaryotes. This finding suggests that the hierarchical recruitment characteristic of eukaryotic systems may predate eukaryogenesis, which is a significant and exciting contribution. However, the broader implications of these findings for membrane remodeling mechanisms remain speculative, and the study would benefit from stronger experimental validation and expanded contextualization within the field.

      We thank the Referee for his/her appreciation of our work.

      Weaknesses:

      This manuscript presents several methodological inconsistencies and lacks key controls to validate its claims. Additionally, there is insufficient information about the number of experimental repetitions, statistical analyses, and a broader discussion of the major findings in the context of open questions in the field.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #2 (Public review):

      Summary:

      The Crenarchaeal Cdv division system represents a reduced form of the universal and ubiquitous ESCRT membrane reverse-topology scission machinery, and therefore a prime candidate for synthetic and reconstitution studies. The work here represents a solid extension of previous work in the field, clarifying the order of recruitment of Cdv proteins to curved membranes.

      Strengths:

      The use of a recently developed approach to produce dumbbell-shaped liposomes (De Franceschi et al. 2022), which allowed the authors to assess recruitment of various Cdv assemblies to curved membranes or membrane necks; reconstitution of a quaternary Cdv complex at a membrane neck.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      The manuscript is a bit light on quantitative detail, across the various figures, and several key controls are missing (CdvA, B alone to better interpret the co-polymerisation phenotypes and establish the true order of recruitment, for example) - addressing this would make the paper much stronger. The authors could also include in the discussion a short paragraph on implications for our understanding of ESCRT function in other contexts and/or in archaeal evolution, as well as a brief exploration of the possible reasons for the discrepancy between the foci observed in their liposome assays and the large rings observed in cells - to better serve the interests of a broad audience.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #3 (Public review):

      Summary:

      In this report, De Franceschi et al. purify components of the Cdv machinery from the archaeon M. sedula and probe their interactions with the membrane and with one another in vitro using two main assays: liposome flotation and fluorescent imaging of encapsulated proteins. This has the potential to add to the field by showing how the order of protein recruitment seen in cells is related to the differential capacity of individual proteins to bind membranes when alone or when combined.

      Strengths:

      Using the flotation assay, they demonstrate that CdvA and CdvB bind liposomes when combined. While CdvB1 also binds liposomes under these conditions in the flotation assay, CdvB2 lacking its C-terminus is not efficiently recruited to membranes unless CdvAB or CdvB1 are present. The authors then employ a clever liposome assay that generates chained spherical liposomes connected by thin membrane necks, which allows them to accurately control the buffer composition inside and outside of the liposome. With this, they show that all four proteins accumulate in necks of dumbbell-shaped liposomes that mimic the shape of constricting necks in cell division. Taken altogether, these data lead them to propose that Cdv proteins are sequentially recruited to the membrane as has also been suggested by in vivo studies of ESCRT-III dependent cell division in crenarchaea.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      These experiments provide a good starting point for the in vitro study of the interaction of Cdv system components with the membrane and their consecutive recruitment. However, several experimental controls are missing, which limits the ability to draw strong conclusions. Moreover, some results are inconsistent across the two main assays, which makes the findings difficult to interpret:

      (1) Missing controls.

      Various protein mixtures are assessed for their membrane-binding properties in different ways. However, it is difficult to interpret the effect of any specific protein combination, when the same experiment is not presented in a way that includes separate tests for all individual components. In this sense, the paper lacks important controls. For example, Fig 1C is missing the CdvB-only control. The authors remark that CdvB did not polymerise (data not shown) but do not comment on whether it binds membrane in their assays. In the introduction, Samson et al., 2011 is cited as a reference to show that CdvB does not bind membrane. However, here the authors are working with protein from a different organism in a different buffer, using a different membrane composition and a different assay. Given that so many variables are changing, it would be good to present how M. sedula CdvB behaves under these conditions.

      We thank the referee for raising this point. We have now added these data in Figure 1C. Indeed it turns out that CdvB from M. sedula exhibits clear membrane binding on its own in a flotation assay.

      Similarly, there is no data showing how CdvB alone or CdvA alone behave in the dumbbell liposome assay.

      Without these controls, it's impossible to say whether CdvA recruits CdvB or the other way around. The manuscript would be much stronger if such data could be added.

      We have now added these data in Figure 1E, 1F and 1G. Overall, we can confirm that CdvA binds the membrane better in the presence of CdvB (although both proteins can bind the membrane on their own). Both proteins appear to recognize the curved region of the membrane neck.

      (2) Some of the discrepancies in the data generated using different assays are not discussed.

      The authors show that CdvB2∆C binds membrane and localizes to membrane necks in the dumbbell liposome assay, but no membrane binding is detected in the flotation assay. The discrepancy between these results further highlights the need for CdvB-only and CdvA-only controls.

      We have now added these controls in Figure 1. In addition, we would like to clarify that the flotation assay and the SMS dumbbell assay serve different purposes and are not directly comparable in quantitative terms. In the flotation assay, all the protein present as input is eventually recovered and visualized. Thus, the fraction of total protein bound to lipids can be inferred quantitatively from this assay. The SMS assay, in contrast, provides a very different kind of information. Because of the particular protocol required to generate dumbbells (De Franceschi, 2022), the total amount of protein in the inner buffer of dumbbells is not accurately defined: protein that is not correctly reconstituted (e.g., protein that aggregates while still in the droplet phase) interferes with vesicle generation, so dumbbells containing such aggregates generally do not form in the first place. This makes it impossible to draw quantitative conclusions about the proportion of the sample bound to lipids. The SMS assay is therefore not directly comparable to the flotation assay but rather complementary to it: its purpose is to provide information about the curvature selectivity of the protein.
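      Because all input protein is recovered across the flotation gradient, the bound fraction reduces to a ratio of band signals. The sketch below assumes background-subtracted, integrated band intensities ordered from the top (floated liposomes) to the bottom of the gradient; the function name and the number of fractions are hypothetical.

```python
def fraction_bound(band_intensities, background=0.0, n_top=1):
    """Fraction of recovered protein found in the lipid-associated fraction(s).

    band_intensities: integrated band signal per gradient fraction, ordered
    from top (floated liposomes) to bottom; n_top: how many top fractions
    count as membrane-bound. Assumes signal is linear in protein amount.
    """
    signal = [max(b - background, 0.0) for b in band_intensities]
    total = sum(signal)
    if total <= 0.0:
        raise ValueError("no signal above background")
    return sum(signal[:n_top]) / total
```

      The ratio is only as good as the linearity assumption: a stain whose signal saturates or responds non-linearly to protein amount distorts it, which is why a linear readout such as fluorescent labelling is preferable for this kind of quantification.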

      (3) Validation of the liposome assay.

      The experimental setup to create dumbbell-shaped liposomes seems great and is a clever novel approach pioneered by the team. Not only can the authors manipulate liposome shape, they also state that this allows them to accurately control the species present on the inside and outside of the liposome. Interpreting the results of the liposome assay, however, depends on the geometry being correct. To make this clearer, it would seem important to include controls to prove that all the protein imaged at membrane necks lie on the inside of liposomes. In the images in SFig3 there appears to be protein outside of the liposome. It would also be helpful to present data to show test whether the necks are open, as suggested in the paper, by using FRAP or some other related technique.

      We thank the Referee for his/her appreciation. The proteins are encapsulated inside the liposomes, not outside of them. While Figure S3 might give the appearance that there is some protein outside, this is actually just an imaging artifact. Author response image 1 (below) explains this: when the membrane and protein channels are shown separately, it is clear that the protein cluster that appeared to be ‘outside’ actually colocalizes with an extra small dumbbell lobe (yellow arrowhead). The protein appeared to be outside of it because (1) the protein fluorescent signal is stronger than the signal from the membrane, and (2) there is a certain time delay in the acquisition of the two channels (0.5-1 second), so the membrane may have slightly shifted out of focus when the protein fluorescence was being acquired. We are confident that the protein is inside these dumbbells because the procedure for preparing the dumbbells requires extensive emulsification by pipetting, which takes ≈ 1 minute. This time is more than sufficient for proteins with high affinity for the membrane, like ESCRT and Cdv, to bind the membrane. For an example of how fast binding under confinement can be, please see movie 2 from this paper: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.

      Moreover, in many instances we could directly confirm that the protein is inside: increasing the gain of the images post-acquisition reveals a clear protein signal in the lumen (see Author response image 2).

      Author response image 1.

      Separate channels showing colocalization of protein and lipids (adapted from Figure S3). The zoom-in shows separate channels, highlighting that the CdvB2 cluster that seems to be ‘outside the dumbbell’ actually colocalizes with the small terminal lobe of the dumbbell, indicating that the protein is encapsulated within that lobe.

      Author response image 2.

      Residual protein present inside the lumen of dumbbells, as visualized by increasing the brightness post-acquisition.

      We are not sure what the referee means by “test whether the necks are open, as suggested in the paper”. We are confident that the lobes of dumbbells originated from a single floppy vesicle, and were therefore mutually connected through an open neck (at least at the onset of the experiment). We have performed extensive FRAP assays on dumbbells in previous papers (De Franceschi et al., ACS Nano 2022 and De Franceschi et al., Nature Nanotech 2024), which unequivocally proved that these chains of dumbbells are connected by open necks. We now also performed a few FRAP assays with reconstituted Cdv proteins, which confirmed this point. We have added a movie of such an experiment to the manuscript (Movie 1).
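      For readers unfamiliar with FRAP, a recovery curve of the membrane signal in a bleached lobe can be quantified roughly as follows; recovery into the bleached lobe implies lipid exchange through an open neck. This is a sketch under assumptions, not the analysis used in the cited papers: the recovery is taken to be single-exponential, acquisition photobleaching is not corrected, and the grid of candidate time constants is chosen by hand.

```python
import math


def frap_normalise(trace, n_pre, i_bleach):
    """Scale a FRAP trace so the pre-bleach mean is 1 and the first
    post-bleach frame (index i_bleach) is 0."""
    f_pre = sum(trace[:n_pre]) / n_pre
    f0 = trace[i_bleach]
    return [(f - f0) / (f_pre - f0) for f in trace]


def fit_recovery(times, norm, taus):
    """Fit norm(t) = A * (1 - exp(-t / tau)), t measured from the bleach.

    For each candidate tau the least-squares amplitude A has a closed form;
    the (A, tau) pair with the smallest squared error is returned.
    """
    best = None
    for tau in taus:
        g = [1.0 - math.exp(-t / tau) for t in times]
        a = sum(gi * yi for gi, yi in zip(g, norm)) / sum(gi * gi for gi in g)
        sse = sum((yi - a * gi) ** 2 for gi, yi in zip(g, norm))
        if best is None or sse < best[0]:
            best = (sse, a, tau)
    return best[1], best[2]
```

      A is the mobile fraction and t½ = τ·ln 2 is the recovery half-time; an A close to 1 with a finite τ is the signature of a neck that is still open.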

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, which could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is, however, beyond the scope of this manuscript, as the current work addresses only the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (4) Quantification of results from the liposome assay.

      The paper would be strengthened by the inclusion of more quantitative data relating to the liposome assay. Firstly, only a single field of view is shown for each condition. Because of this, the reader cannot know whether this is a representative image or an outlier. Can the authors do some quantification of the data to demonstrate this? The line scan profiles in the supplemental figures would be an example of this, but again in these Figures only a single image is analyzed.

      The images that we showed are indeed representative. The dumbbells that are generated by the SMS approach contain an “internal control”: in each dumbbell, the protein has the option of localizing at the neck or localizing elsewhere in the region of flat membrane. We see consistently that Cdv proteins have a strong preference for localizing at the neck.
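      The "internal control" described above can be turned into one number per dumbbell. This is a minimal sketch, assuming a background-subtracted protein intensity profile traced along the membrane contour with a known neck position; the function names, the window size, and the enrichment threshold of 2 are hypothetical choices.

```python
def neck_enrichment(profile, neck_index, half_window=2, background=0.0):
    """Mean protein intensity around the neck divided by the mean intensity on
    the rest of the membrane contour, from a line scan along the contour."""
    signal = [max(v - background, 0.0) for v in profile]
    lo = max(neck_index - half_window, 0)
    hi = min(neck_index + half_window + 1, len(signal))
    neck = signal[lo:hi]
    flat = signal[:lo] + signal[hi:]
    if not flat or sum(flat) == 0.0:
        raise ValueError("need non-zero signal away from the neck")
    return (sum(neck) / len(neck)) / (sum(flat) / len(flat))


def fraction_enriched(ratios, threshold=2.0):
    """Share of necks whose enrichment ratio reaches the (assumed) threshold."""
    return sum(r >= threshold for r in ratios) / len(ratios)
```

      A ratio above 1 indicates neck preference. Reporting the distribution of ratios across dumbbells, or the fraction above a threshold, would also capture necks where a protein is absent, rather than a single example image.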

      We would recommend that the authors present quantitative data to show the extent of co-localization at the necks in each case. They also need a metric to report instances in which protein is not seen at the neck, e.g. CdvB2 but not CdvB1 in Fig2I, which rules out a simple curvature preference for CdvB2 as stated in line 182.

      While the request for better quantitation is reasonable, it would require very significant new experiments at the microscope, which is near-impossible since both first authors have left the lab for new positions.

      Secondly, the authors state that they see CdvB2∆C recruited to the membrane by CdvB1 (lines 184-187, Fig 2I). However, this simple conclusion is not borne out in the data. Inspecting the CdvB2∆C panels of Fig 2I, Fig3C, and Fig3D, CdvB2∆C signal can be seen at positions which don't colocalize with other proteins. The authors also observe CdvB2∆C localizing to membrane necks by itself (Fig 2E). Therefore, while CdvB1 and CdvB2∆C colocalize in the flotation assay, there is no strong evidence for CdvB2∆C recruitment by CdvB1 in dumbbells. This is further underscored by the observation that in the presented data, all Cdv proteins always appear to localize at dumbbell necks, irrespective of what other components are present inside the liposome. Although one nice control is presented (ZipA), this suggests that more work is required to be sure that the proteins are behaving properly in this assay. For example, if membrane binding surfaces of Cdv proteins are mutated, does this lead to the accumulation of proteins in the bulk of the liposome as expected?

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (indicated in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that contain both proteins (yellow arrowheads) are localized at necks, while those that only contain CdvB2ΔC (red arrowheads) are not. This is no coincidence. The clusters indicated by the red arrowheads do contain CdvB1. However, these clusters rapidly diffuse in the membrane plane because they are not fixed at the neck; therefore, they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why, then, don't all the clusters localize at necks? We believe the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize elsewhere.

      Author response image 3.

      Current Figure 2H, where clusters that are double-positive for both CdvB1 and CdvB2ΔC are indicated by yellow arrowheads, while clusters that apparently only contain CdvB2ΔC are indicated by red arrowheads. All the double-positive clusters are localized at necks.

      (5) Rings.

      The authors should comment on why they never observe large Cdv rings in their experiments. In crenarchaeal cell division, CdvA and CdvB have been observed to form large rings in the middle of the 1 micron cell, before constriction. Only in the later stages of division are the ESCRTs localized to the constricting neck, at a time when CdvA is no longer present in the ring. Therefore, if the in vitro assay used by the authors really recapitulated the biology, one would expect to see large CdvAB rings in Figs 1EF. This is ignored in the model. In the proposed model of ring assembly (line 252), CdvAB ring formation is mentioned, but authors do not discuss the fact that they do not observe CdvAB rings - only foci at membrane necks. The discussion section would benefit from the authors commenting on this.

      The referee is correct: it is intriguing that we don’t see micron-sized rings for CdvA and CdvB. We do note that our EM data (Fig. S1) show that CdvA on its own can form rings of about 100-200 nm in diameter, well below the diffraction limit, which could well correspond to the foci that we optically resolve in Figure 1. We have now added a brief comment on this to the manuscript on lines 256-264.

      (6) Stoichiometry

      It is not clear why 100% of the visible CdvA and 100% of the visible CdvB are shifted to the lipid fraction in 1C. Perhaps this is a matter of quantification. Can the authors comment on the stoichiometry here?

      We agree that this was unclear. Since that particular gel was stained with Coomassie, the quantitative signals might be unreliable; hence we have repeated this experiment using fluorescently labelled proteins, which indeed show a less extreme distribution. This was also done to make the data more uniform, as requested by the referees.

      (7) Significance of quantification of MBP-tagged filaments.

      Authors use tagging and removal of MBP as a convenient, controllable system to trigger polymerisation of various Cdv proteins. However, it is unclear what is the value and significance of reporting the width and length of the short linear filaments that are formed by the MBP-tagged proteins. Presumably they are artefactual assemblies generated by the presence of the tag?

      Providing a measure of the changes induced by MBP removal validates that the removal actually has an effect. However, we agree that this perhaps places too much emphasis on the short filaments. As a compromise, we have removed the quantification of the width and length of the short filaments formed by MBP-tagged protein from the text, but kept the supplementary figure showing their distribution compared to the other filaments (Figure S2E, S2F).

      Similarly, Figure 2C doesn't seem a useful addition to the paper.

      We removed panel 2C, and now merely report these values in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest that the authors discuss their findings in more depth, for example their evolutionary implications and how lipids from these archaea may affect the recruitment process.

      Because there is no exact homology between archaeal Cdv proteins and eukaryotic ESCRT-III proteins, we do not feel that our work has new evolutionary implications beyond those we already state in the manuscript. We also did not perform experiments using archaeal lipids, so we would rather not speculate on how they might affect the recruitment of Cdv proteins.

      In general, the manuscript lacks information regarding some scale bars, number of experimental repetitions (n or N), statistical analysis when needed, information about protein concentrations used in their assays.

      We have now added this information in the manuscript.

      Below, I provide a list of comments that I think the authors should address to improve the manuscript:

      (1) Line 113-114: The authors test protein-membrane interactions using flotation assays with positively curved SUV membranes but encapsulate proteins in dumbbell-shaped liposomes with negative curvature at the connecting necks. Might the use of membranes with opposite curvatures affect the recruitment process? Since the proteins are fluorescently labeled, I suggest testing recruitment using flat giant unilamellar vesicles or supported lipid bilayers (with zero curvature) to validate their findings.

      We thank the referee for this suggestion. Please do note that we are not claiming in our paper that Cdv proteins recognize negative curvature. We merely observe that they localize at necks. The neck of a dumbbell exhibits the so-called “catenoid” geometry, which is characterized by having both positive and negative curvature.

      Regarding the SUVs, we realized that there was a mistake in the Methods section: in the flotation assay we in fact used multilamellar vesicles, not SUVs, precisely for the reason mentioned by the referee. We apologize for the oversight and have now corrected this in the Methods. Unlike SUVs, multilamellar vesicles do not have a strong positive curvature, although we agree that they likely do not have negative curvature either. Because of their heterogeneous nature, multilamellar vesicles provide a binding assay that is largely independent of curvature. Complementary to the flotation assay, the SMS approach was employed to reveal the curvature preference of the proteins.

      Finally, we performed the experiment on large GUVs suggested by the referee using CdvB as an example, but this turned out to be inconclusive because the protein forms clusters: these clusters may be creating local curvature at the nanometer scale, which cannot be resolved by optical microscopy (Author response image 4). This is quite typical for proteins that recognize curvature (cf. for instance: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.)

      Author response image 4.

      Fluorescently labelled CdvB bound to giant unilamellar vesicle. The protein was added in the outer buffer. CdvB forms distinct clusters, which may generate a local region of high membrane curvature.

      (2) Line 138-139: How is His-ZipA binding the membrane? Wouldn't Ni2+-NTA lipids be required? If not, how is the binding achieved?

      Indeed, NTA-lipids were present. This is now stated both in the legend and in the methods.

      (3) In the encapsulated protein assays, why does the luminal fluorescence intensity of the encapsulated protein sometimes appear similar to the bulk fluorescence signal? Since only a small fraction of the protein assembles at membrane necks, shouldn't the luminal pool of unbound protein show higher fluorescence intensity inside the liposomes?

      We thank the referee for raising this point and giving us the opportunity to explain it. Cdv proteins have a very high affinity for the neck, and when they cluster there, the fluorescence intensity of the cluster is many times higher than the background fluorescence. Because we were interested in imaging the clusters without overexposing them, we adjusted the imaging conditions accordingly, with the result that the fluorescence from both the lumen and the bulk is at a very low level.

      With different imaging conditions, however, it can actually be seen that the signal inside the lumen is clearly higher than in the bulk: see for instance Author response image 2, where the brightness has been adjusted accordingly.

      (4) Line 184-185: In Fig. 2I, some CdvB2ΔC puncta seem independent of CdvB1 and are not localized at membrane necks. How many such puncta exist? For example, in the provided micrograph, 2 out of 5 clusters are independent of CdvB1. This proportion is significant. Could the authors quantify the prevalence of these structures and discuss why they form?

      We thank the referee for giving us the opportunity to explain this apparent discrepancy. We would like to stress that CdvB2ΔC and CdvB1 form an obligate heterodimer: in all our experiments, without exception, we find that they form a strong complex when the two proteins are mixed. This is true both in dumbbells and in flotation assays.

      In the particular example of Figure 2I, it indeed appears that some clusters of CdvB2ΔC do not contain CdvB1 (indicated by red arrowheads in Author response image 3), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters containing both proteins (yellow arrowheads) are localized at necks, while those that appear to contain only CdvB2ΔC (red arrowheads) are not. This is no coincidence. The clusters indicated by the red arrowheads do in fact contain CdvB1. However, because they are not anchored at a neck, these clusters diffuse rapidly in the membrane plane and constantly move in and out of focus. Because the channels are acquired sequentially with a delay of 0.5 to 1 second, these clusters were in focus when the CdvB2ΔC signal was acquired but had shifted out of focus when the CdvB1 signal was acquired. Conversely, the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then?

      (5) Figure 1E and 1F: Why do lipids accumulate and colocalize with the proteins? How can the authors confirm lumen connectivity between vesicles? Performing FRAP assays could validate protein localization and enrichment at the lumen of the membrane necks.

      At first sight, indeed some lipid enrichment seems to be observed at the neck between lobes of dumbbells.

      This is, however, an imaging artifact due to the fact that the neck is diffraction limited. As shown in Author response image 5, at the neck region we acquire the membrane signal from both lobes, so the signal is roughly doubled, hence the apparent lipid enrichment.

      Author response image 5.

      Schematic illustrating that the neck between two lobes is smaller than the diffraction limit of optical microscopy (the size of a typical pixel is indicated by the green square). Because of this technical limitation, the fluorescence intensity of the membrane at the neck is twice that of a single membrane.

      The referee is correct in pointing out that these images do not prove that the lobes are connected, and that a FRAP assay is the only way to prove this point. However, in previous papers we have extensively confirmed that the lobes in chains of dumbbells are connected:

      - De Franceschi N, Pezeshkian W, Fragasso A, Bruininks BMH, Tsai S, Marrink SJ, Dekker C. Synthetic Membrane Shaper for Controlled Liposome Deformation. ACS Nano. 2022 Nov 28;17(2):966–78. doi: 10.1021/acsnano.2c06125.

      - De Franceschi N, Barth R, Meindlhumer S, Fragasso A, Dekker C. Dynamin A as a one-component division machinery for synthetic cells. Nat Nanotechnol. 2024 Jan;19(1):70-76. doi: 10.1038/s41565-023-01510-3.

      Random sticking of liposomes would generate irregular clusters of vesicles, not linear chains. We now also provide a movie (Movie 1) supporting this point.

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, which could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is, however, beyond the scope of this manuscript, as the current work addresses only the hierarchical recruitment of Cdv proteins to the membrane. We plan to examine this in future work.

      (6) Why didn't the authors use the same lipid composition, particularly the same proportion of negatively charged lipids, on the SUVs of the flotation assays and on the dumbbell-shaped liposomes?

      In flotation assays, it is typical to use a relatively large proportion of negatively charged lipids to promote protein binding, because the aim is to maximize membrane coverage by the protein. The SMS procedure to generate dumbbell-shaped GUVs is completely different, however. Rather than covering the membrane with protein, the idea is to reduce the amount of protein to a minimum, so that any curvature preference can be best visualized. This is routinely done, for example, in tube-pulling experiments, for the same reason (see for instance Prévost C, Zhao H, Manzi J, Lemichez E, Lappalainen P, Callan-Jones A, Bassereau P. IRSp53 senses negative membrane curvature and phase separates along membrane tubules. Nat Commun. 2015 Oct 15;6:8529. doi: 10.1038/ncomms9529).

      (7) Line 117-119: The suggestion that polymer formation between CdvA and CdvB facilitates membrane recruitment is intriguing. However, fluorescence microscopy experiments could better elucidate whether there is sequential recruitment of CdvB followed by CdvA, or whether these proteins form a heteropolymer composite for membrane binding. Can CdvB bind membranes independently, or does this require synergy between CdvA and CdvB?

      We thank the referee for prompting us to perform this experiment. As we now show in Figure 1C, CdvB indeed is able to bind the membrane independently of CdvA. Whether this happens sequentially or simultaneously is an interesting question, but one that is impossible to address with either the SMS or the flotation assay, because in both cases we can only observe the endpoint of the recruitment.

      We would also like to clarify one specific experimental detail. Perhaps unsurprisingly, the results from the flotation assay depend on the way the assay is performed. In particular, we observed that the same protein can exhibit a different binding profile depending on whether it is loaded at the top or at the bottom of the gradient, as can be seen in Author response image 6. This is counterintuitive, since once equilibrium is reached, the result should depend only on the density of the sample. We performed an overnight centrifugation (> 16 hours) on a short tube (< 3 cm tall), so equilibrium is reached (as corroborated by the fact that CdvB1 and CdvB2 can float to the top of the gradient within this timespan, as shown in Figure 2C, 2E, 2G). We ascribe the difference between top and bottom loading to the fact that, when the sample is loaded at the bottom, it has to be mixed with a concentrated sucrose solution, whereas this is not done when loading from the top.

      In literature, both loading from top and from bottom have been used:

      - Lata S, Schoehn G, Jain A, Pires R, Piehler J, Gottlinger HG, Weissenhorn W. Helical structures of ESCRT-III are disassembled by VPS4. Science. 2008 Sep 5;321(5894):1354-7. doi: 10.1126/science.1161070

      - Moriscot C, Gribaldo S, Jault JM, Krupovic M, Arnaud J, Jamin M, Schoehn G, Forterre P, Weissenhorn W, Renesto P. Crenarchaeal CdvA forms double-helical filaments containing DNA and interacts with ESCRT-III-like CdvB. PLoS One. 2011;6(7):e21921. doi: 10.1371/journal.pone.0021921.

      - Senju Y, Lappalainen P, Zhao H. Liposome Co-sedimentation and Co-flotation Assays to Study Lipid-Protein Interactions. Methods Mol Biol. 2021;2251:195-204. doi: 10.1007/978-1-0716-1142-5_14.

      In performing the flotation assay for CdvB1 and CdvB2ΔC, or when using all 4 proteins together, we loaded the sample at the bottom, and we could detect reproducible binding to liposomes (Figures 2D, 2F, 2H, 3A). However, CdvB does not bind the membrane when loaded at the bottom. Thus, for the experiments shown in Figure 1C, we loaded the proteins at the top. This experimental setup allowed us to show that CdvB indeed induces a stronger interaction between CdvA and the membrane.

      Author response image 6.

      CdvB binding to multilamellar vesicles in a flotation assay. In the left panel, the sample was loaded at the top of the sucrose gradient; in the right panel it was loaded at the bottom.

      (8) Line 165-173: The authors claim that filament curvature differs between CdvB2ΔC alone and the CdvB1:CdvB2ΔC complex. Are these differences statistically significant? What is the sample size (N)? Furthermore, how do the authors confirm interactions between these proteins in the absence of membranes based solely on EM micrographs?

      We can confirm that the filaments are composed of both proteins, because the filaments have a different curvature when both proteins are present. However, as requested by Referee 3, point (7), we removed the quantification of curvature (panel 2C). We now report N in the text.

      (9) Line 121-123: Are the authors referring to positive or negative membrane curvatures? The cited literature suggests ESCRT-III proteins either lack curvature preferences (e.g., Snf7, CHMP4B) or prefer high positive curvature (e.g., late ESCRT-III subunits). This is confusing since the authors later test recruitment to negatively curved necks.

      We do not claim that Cdv proteins prefer positive or negative curvature, because the necks present in dumbbells have a catenoid geometry, which includes both positive and negative curvature. We have now clarified this in the Discussion.

      (10) Since the conclusions rely on the oligomeric state of the proteins, providing SEC-MALS spectra to show the protein oligomeric state right after the purification would strengthen the claims.

      While such SEC-MALS experiments may be interesting, they are not possible in practice, since both first authors have left the lab for new positions.

      (11) Line 157-160: Suppl. Fig. 2 shows only a single EM micrograph of a small filament. Could the authors provide lower magnification images showing more filaments?

      As requested by Referee 3, point (7), we have toned down the importance of these short filaments.

      Also, why are the sample sizes for filament length (N=161) and width (N=129) different?

      Cdv protein filaments tend to stick to each other side by side, so for some filaments the width could not be accurately assessed; these filaments were therefore excluded from the width analysis.

      (12) The introduction states that CdvA binds membranes while CdvB does not. However, the results suggest CdvB facilitates membrane binding, helping CdvA attach. This discrepancy needs further explanation.

      We thank the referee for raising this point. We have now performed additional experiments (both SMS assay and flotation assays) showing that indeed CdvB from M. sedula is (unlike CdvB from Sulfolobus) able to bind the membrane on its own (Figure 1C, 1F).

      Reviewer #2 (Recommendations for the authors):

      Best practice would be to show single fluorescence channels in grayscale or inverted grayscale, retaining pseudocolouring only for the merged multichannel image.

      We decided to retain and standardize the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. We believe this improves readability, and this was also a request from Referee 3. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      It would be great to include a quantification of liposome curvature vs focal intensity of the various Cdv components - across figures.

      Quantification of liposome curvature at the neck can be done (De Franceschi et al., Nature Nanotech. 2024). In practice, however, this requires transferring the sample after preparation into a new chamber in order to increase the signal-to-noise ratio of the encapsulated dye, a procedure that drastically reduces the yield of dumbbells. Given the very sizeable amount of work required to obtain reliable measurements for all the proteins and protein combinations used in this study, this would represent a project in itself, well beyond the scope of this manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) We would encourage the authors to consider including the length of the scale bar next to the scale bar in each image and not in the figure description. This would greatly aid in clarity and interpretation of figures.

      We have now written the length of the scale bar in the figures.

      (2) In a similar vein, could the authors consider labeling panels throughout the manuscript, stating which sample is being presented? This goes mainly for the negative stain and the dumbbell fluorescence images, as having to continuously consult the figure legend hinders clarity.

      We have now labelled the EM images as requested by the referee.

      (3) Lines 254-256: would the statement hold not only for CdvB2∆C, but for all imaged proteins? They all seem to localize to membrane necks, presumably favoring membrane binding to a specific membrane topology.

      We agree with the referee, and changed the phrasing accordingly.

      (4) CdvB2∆C construct - presumably this was a truncation of helix 5 of the ESCRT-III domain? Figure 1A shows that the ESCRT-III domain spans residues 34-170 and therefore implies that all five ESCRT-III helices (which make up the ESCRT-III domain) are present in the C-terminal truncation. Could the authors clarify?

      Indeed, the truncation was done at residue 170.

      (5) Results of the liposome flotation assays are presented inconsistently across the three figures (Figs 1C, 2DFH, and 3A). This makes it more difficult than it needs to be to interpret and compare results. Could the authors consider presenting the three gels in a more similar, standardized way across the three figures?

      To improve readability, we now standardized the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      (6) From the data presented in Fig 1EF, it cannot be concluded whether CdvB and CdvA colocalize, as only one protein is labelled. Is there a technical reason for this?

      We have now repeated the same experiment by having both proteins labelled, confirming that there is co-localization at the neck (Figure 1G).

      (7) Fig 2C: is the difference between the two samples significant?

      As requested by Referee 3, we have removed Figure 2C.

      (8) Fig 2I is missing a 'merged' panel.

      We have now added the merged panel.

      (9) The fluorescence intensity plots in Supp Figs 1C and 3C would be easier to interpret if the lipid and protein signals were plotted on the same plot (say, with normalized fluorescence intensity).

      It is not immediately obvious to us what the signal should be normalized to. What we wished to convey with these plots was that the intensity of proteins spikes at the neck region. In an attempt to improve clarity, we have now aligned the plots vertically, and highlighted the position of the neck.

      (10) CdvA should have a capital "A" in Figure 3A, panel 3.

      We have now corrected this.

      (11) The discussion doesn't comment on the need to truncate CdvB2.

      This is explained in the Results section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage-gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a newly proposed inhibition mechanism, with the strength of support being mostly convincing but incomplete in some aspects. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in the development of selective inhibitors for protein targets.

      We are encouraged by this favorable assessment and thank editors and reviewers for their constructive feedback and recommendations. We trust that the revisions made to the manuscript will clarify the aspects that had been perceived to be incomplete.

      Reviewer #1 (Public review):

      Summary: 

      This study seeks to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, it sought to explain some of the functional differences that RY785 exhibits in electrophysiology experiments compared with other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). This study used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under an applied membrane voltage with and without RY785 or TEA present. While TEA directly blocks K+ permeation by occluding the ion permeation pathway, RY785 binds to multiple nonpolar residues near the hydrophobic gate of the channel, driving it to a semi-closed non-conductive state. This mechanism was confirmed using an additional set of simulations and used to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The study develops forcefield parameters for the RY785 molecule based on extensive QM-based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single-channel conductance. The study performed extensive simulations with the apo channel as well as with both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The conclusion is that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits the K+ current. This conclusion is plausible given that RY785 makes stable contact with multiple hydrophobic residues in the S6 helix. This further provides a possible mechanism for the experimental observation, from a previous electrophysiology study, that RY785 speeds up the deactivation kinetics of Kv2 channels.

      Weaknesses:

      The study, however, did not produce this semi-closed channel conformation and acknowledges that more direct simulation evidence would require extensive enhanced-sampling simulations. The study has not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the study quantified K+ permeation, it does not make any estimates of the ligand binding affinities or rates, which could have been potentially compared to the experiment and used to validate the models. 

      As stated in the original manuscript, we concur that the mechanism we propose remains hypothetical until further studies of the complete conformational cycle of the channel are conducted. The recently determined structure of a Kv2.1 channel in the closed state (Mandala and MacKinnon, PNAS 2025) presents an excellent opportunity to do so. Indeed, a cursory analysis of that structure shows that a Pro-Ile-Pro motif in helix S6 marks the position of the intracellular gate, where the pore domain constricts maximally (aside from the selectivity filter). As illustrated in Fig. 5, this motif is precisely where the benzimidazole and thiazole moieties of RY785 bind in our simulations. The mechanism we outline in Fig. 7 thus seems very plausible, in our view: RY785 occludes the K+ permeation pathway before the pore domain reaches the closed conformation, explaining the observed electrophysiological effects (see Discussion). The Discussion has been revised to note the recent determination of this structure, its implications for the mechanism we propose, and the opportunities for further research that are now open.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Zhang et al. investigate the conductivity and inhibition mechanisms of the Kv2.1 channel, focusing on the distinct effects of TEA and RY785 on Kv2 potassium channels. The study employs microsecond-scale molecular dynamics simulations to characterize K+ ion permeation and compound binding inhibition in the central pore. 

      Strengths:

      The findings reveal a unique inhibition mechanism for RY785, which binds to the channel walls in the open structure while allowing reduced K+ flow. The study also proposes a long-range allosteric coupling between RY785 binding in the central pore and its effects on voltage-sensing domain dynamics. Overall, this well-organized paper presents a high-quality study with robust simulation and analysis methods, offering novel insights into voltage-gated ion channel inhibition that could prove valuable for future drug design efforts.

      Weaknesses:

      (1) The study neglects to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, there is potential for allosteric binding sites in the voltage-sensing domain (VSD), as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019).

      As noted in the manuscript, we designed our simulations to explore the possibility that RY785 binds within the pore domain, because TEA and RY785 are competitive and TEA is known to bind within the pore. That RY785 did in fact spontaneously and reproducibly bind within the pore was however not a predetermined outcome; if the site of interaction for the inhibitor was elsewhere in the channel, the simulation would not have shown a stable associated state, which would have prompted us to examine other possible sites, including the voltage sensors. It was also not predetermined or foreseeable a priori that the mode of interaction we observed in simulation provides a straightforward rationale for the electrophysiological effects of RY785. Based on our results, therefore, we believe that RY785 binds within the pore of Kv2. As stated by the reviewer, other allosteric modulators are known to bind instead to the sensors; to our knowledge, however, there is no precedent of a small-molecule inhibitor that simultaneously acts on the sensors and the pore domain. We therefore believe that future studies should focus on corroborating or refuting the mechanism we propose, through additional experimental and computational work; if, contrary to our claim, RY785 is found not to bind to the pore domain, it would be logical to explore other possible sites of interaction, as the reviewer suggests. The Discussion has been modified to address this point.

      (2) The study describes RY785 as a selective inhibitor of Kv2 channels and characterizes its binding residues through MD simulations. However, it is not clear whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      To clarify this question, we have included a multiple sequence alignment as Supplementary Figure 1; the revised manuscript refers to this figure in the Discussion section. The alignment reveals that the cluster of residues forming contacts with RY785 (Val409, Pro406, Ile405, Ile401, and Val398) is indeed specific to Kv2.1. Among Kv channels, Kv3.1 and Kv4.1 exhibit the greatest similarity to Kv2.1 at these positions, but they differ in a crucial substitution: Ile405 in Kv2.1 is replaced by Val. This replacement shortens the sidechain, undoubtedly reducing the magnitude of the hydrophobic interaction between inhibitor and channel (Val is approximately 6 kcal/mol, i.e. 1,000 times, more hydrophilic than Ile). Kv5.1 differs from Kv2.1 at two positions: Pro406 is replaced by His, and Val409 by Ile. The introduction of His abolishes the hydrophobic interaction at that position, and the need for hydration likely perturbs all adjacent contacts with RY785. Lastly, Kv6-Kv10 and Cav channels feature entirely different residues at these positions. Consistent with these findings, a recent study by the Sack lab (https://elifesciences.org/articles/99410) has demonstrated that Kv5, Kv6, Kv8, and Kv9 pore subunits confer resistance to RY785, while a high-throughput electrophysiological study carried out by Merck (Herrington et al., 2011) reported that RY785 shows no significant activity against Cav channels. The sequence alignment offers a simple interpretation for these experimental observations, namely that RY785 is recognized by Kv2 channels through the abovementioned hydrophobic cluster within the pore domain.

      (3) The study does not clarify the details, rationale, and ramifications of a biasing potential to dihedral angles.

      We refer the reviewer to published work, for example Stix et al, 2023 and Tan et al, 2022. We provide additional comments below.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing, yet it was not revealed whether polar groups of RY785 always interact with K+ ions.

      We detected no persistent specific interactions between RY785 and the permeant K+ ions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript describes atomistic molecular dynamics (MD) simulations of a voltage-gated potassium channel Kv2.1 using its cryo-EM structure in the open activated state and its inhibition by a classical non-specific cationic blocker tetraethylammonium (TEA) as well as a novel selective inhibitor RY785. Using multi-microsecond-long all-atom MD runs under an applied membrane voltage of 100 mV, the authors were able to confirm that the channel structure represents an open conducting state, with the computed single-channel conductance lower than experimental values but still of the same order of magnitude. They also determined that both TEA and RY785 bind in the channel pore between the cytoplasmic hydrophobic gate and the narrow selectivity filter (SF) region near the extracellular side. However, while TEA directly blocks knock-on K+ conduction by physically obstructing ion access to the SF, the mechanism of action of RY785 is different. It does not directly prevent K+ access to the SF but rather binds to multiple residues in the hydrophobic gate region, which effectively narrows the pore and drives the channel toward a semi-closed nonconductive conformation, which might be distinct from the one with deactivated voltage sensors and a closed pore observed at hyperpolarized membrane potentials. However, additional studies beyond the scope of this work might be needed to fully establish this mechanism, as suggested by the authors.
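      The comparison between simulated and experimental single-channel conductance rests on a simple conversion: the current is the number of K+ permeation events times the elementary charge, divided by the simulated time, and the conductance is that current divided by the applied voltage. The Python sketch below illustrates this arithmetic under stated assumptions (monovalent ions, one elementary charge per event, a linear current-voltage relation); the function name and the example numbers are hypothetical and are not the authors' data or analysis code.

```python
# Elementary charge in coulombs (exact SI value).
E_CHARGE_C = 1.602176634e-19


def conductance_from_events(n_events: int, sim_time_ns: float, voltage_mv: float) -> float:
    """Estimate single-channel conductance in picosiemens (hypothetical helper).

    Assumes each permeation event moves one monovalent ion (charge e) across
    the channel and that the current-voltage relation is ohmic (I = g * V).
    """
    if sim_time_ns <= 0:
        raise ValueError("simulation time must be positive")
    if voltage_mv == 0:
        raise ValueError("voltage must be non-zero")
    current_a = n_events * E_CHARGE_C / (sim_time_ns * 1e-9)  # amperes
    conductance_s = current_a / (voltage_mv * 1e-3)           # siemens
    return conductance_s * 1e12                               # picosiemens


# Illustrative numbers only: 100 events in 1 microsecond at 100 mV.
print(f"{conductance_from_events(100, 1000.0, 100.0):.1f} pS")
```

      The ohmic assumption is what allows a single applied voltage to be converted into a conductance; if the simulated current-voltage relation is non-linear, such an estimate is only indicative.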

      The manuscript is very well written and represents a significant advance in the field of ion channel research. I do not have any major issues that need to be addressed. However, I have several suggestions.

      For the apo-channel K+ conduction MD simulation under applied voltage, the authors seem to observe mostly a direct or Coulomb knock-on mechanism across the SF, with almost no water co-permeation. This is in line with computational electrophysiology studies using a dual-membrane setup by B. de Groot and others, but in disagreement with multiple previous studies by B. Roux and others that also used an applied electric field and CHARMM force fields, as in the present study. I wonder why the outcomes are so different. Is it related to the Kv2.1 channel itself, to the relatively small applied electric field (corresponding to a membrane potential of 100 mV vs. the 500-750 mV used in many previous simulations), to the ion force field (e.g., LJ parameters), or to some other factors? Could weak dihedral restraints on the protein backbone and side chains contribute to this mechanism? I also wonder whether the authors considered different initial SF ion configurations. Related to that, did the authors observe any SF distortions in their simulations, including the frequently observed backbone carbonyl flipping and/or dilation/contraction?

      We are aware of these discrepancies between published simulation studies, but cannot offer a satisfactory explanation, beyond speculation. The reviewer is correct that the mechanism of ion permeation we observe is comparable to that reported by de Groot, as we noted in Tan et al, 2022 and Stix et al, 2023. Neither in this nor in those previous studies did we observe any persistent distortions of the selectivity filter – but that outcome was expected by construction. The weak biasing potentials acting on the mainchain dihedral angles allow for local fluctuations but not a persistent deformation, relative to the conductive form determined experimentally.

      For MD simulations with the ligand present, I wonder if the authors can comment on the effect of the ligand especially RY785 on the pore size or more importantly size of the hydrophobic gate. The presence of the ligand itself would definitely result in a narrower pore, but I also wonder if this would also lead to a rearrangement of pore sidechain and/or backbone residues, which would lead to a narrower pore from a protein itself thus confirming the proposed mechanism of driving the channel towards a semi-closed state. It is easy to compute but I wonder if the presence of weak dihedral restraints may preclude this analysis.

      Yes, while the simulation design used in this study allows for local fluctuations in the mainchain structure and nearly unrestricted sidechain dynamics, changes in either the secondary or tertiary structure of the channel are strongly disfavored. This approach is thus sufficient to examine ligand binding or ion flow on the microsecond timescale, but not channel gating. In the revised version of the Discussion, we outline a roadmap for future computational studies of that gating process, on the basis of the open-channel structure we used and the recently determined structure of the closed state.

      The authors state that RY785 does not block K+ ions, but it does significantly slow the rate of K+ access to the Scav site of the pore. Is this not part of the mechanism of inhibition? The authors seem to focus on RY785 promoting channel closing as the primary mechanism of inhibition, but would RY785 not also reduce K+ current in the open state by slowing the rate of K+ entry into the cavity and selectivity filter? The authors should address this point in the text. I am also somewhat confused that, in the authors' MD simulations, there is still some K+ conduction with RY785 in the pore, which is not in full agreement with electrophysiology experiments. Does this mean that the channel in the simulations has not yet reached the semi-closed state, or that reduced K+ conduction is not observed experimentally?

      The salient experimental observation is that RY785 abrogates K+ currents through Kv2 channels (Herrington et al, 2011; Marquis et al, 2022). In our view, that observation can be explained in one of two ways: either RY785 completely blocks the flow of K+ ions across the channel while the pore domain remains in the conductive, open state (as TEA does), or RY785 induces or facilitates the closing of the channel, thereby abrogating K+ flow. The fact that we observe K+ flow while RY785 is bound to the channel is therefore not in disagreement with the electrophysiological measurements, but it does rule out the first of those two possible interpretations of the existing experiments. As it happens, the second explanation, i.e. that RY785 facilitates the closing of the pore domain, also provides a rationale for another puzzling experimental observation, namely that RY785 shifts the voltage dependence of the currents produced by the voltage sensors as they reconfigure to open or close the intracellular gate.

      Also, I wonder whether the authors considered that, since there are four potentially equivalent (although overlapping) sites in the pore, more than one RY785 molecule might be needed to prevent K+ conduction, even though the experimental Hill coefficient of ~1 does not indicate cooperativity.

      Admittedly, our simulation design was based on the premise that only one RY785 molecule can bind within the pore. Based on the outcome of the simulations, we are confident that this assumption was valid, as the binding pose we identified rules out multiple occupancy, which is indeed consistent with a Hill coefficient of ~1.
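      To make the link between single occupancy and a Hill coefficient of ~1 concrete, the sketch below evaluates the standard Hill equation for fractional inhibition. It is a generic illustration, not an analysis from the study; the function names are hypothetical and no IC50 value from the source is assumed. It shows that with n = 1 the inhibitor concentration must rise 81-fold to go from 10% to 90% inhibition, whereas cooperative binding of two molecules (n = 2) would compress that range to 9-fold.

```python
def fractional_inhibition(conc: float, ic50: float, n_hill: float = 1.0) -> float:
    """Hill equation: fraction of channels inhibited at ligand concentration `conc`.

    `conc` and `ic50` must share the same units; `n_hill` is the Hill coefficient.
    """
    if conc < 0 or ic50 <= 0 or n_hill <= 0:
        raise ValueError("need conc >= 0, ic50 > 0 and n_hill > 0")
    return conc**n_hill / (ic50**n_hill + conc**n_hill)


def conc_for_fraction(fraction: float, ic50: float, n_hill: float = 1.0) -> float:
    """Invert the Hill equation: concentration giving the requested inhibited fraction."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return ic50 * (fraction / (1.0 - fraction)) ** (1.0 / n_hill)


def dynamic_range(n_hill: float) -> float:
    """Fold change in concentration needed to go from 10% to 90% inhibition."""
    return conc_for_fraction(0.9, 1.0, n_hill) / conc_for_fraction(0.1, 1.0, n_hill)


for n in (1.0, 2.0):
    print(f"n = {n:.0f}: 10%->90% inhibition spans {dynamic_range(n):.0f}-fold")
```

      A shallow, 81-fold transition is what a single, non-cooperative binding event predicts, which is why a Hill coefficient of ~1 sits comfortably with one RY785 molecule per pore.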

      I also wonder whether the authors considered estimating ligand binding affinities and/or on-rates from their simulations, for a more direct comparison with experiments and to test the accuracy of their models. Multiple enhanced sampling techniques allow this, although it could be a study on its own.

      We thank the reviewer for this suggestion, which we will consider for future studies.

      The authors also note that they could not study Kv2.1 deactivation in a reasonable simulation time. This is indeed very challenging, but they should cite previous studies on this subject, e.g. the 2012 Jensen et al paper (PMID: 22499946). Structures of Kv channels with deactivated voltage-sensing domains (VSDs) are available, e.g. of the EAG1 channel (PDB 8EP1), although they do not have a domain-swapped architecture. Structural modeling approaches, including AlphaFold, could potentially be used to obtain a Kv2.1 structure with deactivated VSDs, and targeted MD, the string method, etc. could be used to study transitions between different states with and without bound ligands.

      As noted, a structure of a Kv2 channel with a closed pore has now been determined experimentally. In the revised Discussion, we comment on what this structure tells us about the mechanism of inhibition we propose, and how it could be leveraged in future studies.

      The authors should be commended for their thorough QM-based force field parameterization of RY785. However, a validation of the resulting force field parameters is lacking. For QM validation, the gas-phase dipole moment can be compared in terms of direction and magnitude (it is normally overestimated, to implicitly reflect solvent-induced polarization). If any experimental data are available for this compound, they can be tested as well.

      We agree with the reviewer that forcefield validation is important, but to our knowledge no experimental data, such as hydration free energies, exist for RY785 to compare with. We did, however, compare the gas-phase dipole moments computed with QM and with the MM forcefield we developed, which is based on atomic charges optimized to reproduce QM interactions with water. The MM model yields a gas-phase dipole moment of 3.94 D, roughly 20% greater than the QM dipole moment of 3.23 D. That deviation is within the typical range for electroneutral molecules (Vanommeslaeghe et al, 2010) and, as the reviewer notes, reflects the solvent-induced polarization implicit in the derivation of atomic charges. As shown in Author response image 1, the orientation of the dipole moment calculated with MM (right, blue arrow) is also in good agreement with that predicted with QM (left).

      Author response image 1.
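      The relative deviation between the MM and QM dipole magnitudes quoted in the response can be checked directly. The short sketch below does only that arithmetic; the constant and function names are illustrative, and the comparison of dipole directions shown in Author response image 1 is not reproduced.

```python
QM_DIPOLE_DEBYE = 3.23  # gas-phase dipole moment from QM, as reported in the response
MM_DIPOLE_DEBYE = 3.94  # gas-phase dipole moment from the MM forcefield, as reported


def percent_deviation(value: float, reference: float) -> float:
    """Signed deviation of `value` from `reference`, as a percentage of `reference`."""
    return 100.0 * (value - reference) / reference


deviation = percent_deviation(MM_DIPOLE_DEBYE, QM_DIPOLE_DEBYE)
print(f"MM exceeds QM by {deviation:.1f}%")  # MM exceeds QM by 22.0%
```

      The exact figure, about 22%, is what the response rounds to 20%.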

      (1) p. 3 "the last two helices in each subunit" -> "the last two transmembrane helices in each subunit".

      Thanks. Corrected.

      (2) p. 5 "and therefore do not cause large density variations e.g. 100-fold or greater." I would be more specific here and indicate the actual variations in density or free energy encountered, and how they compare with, e.g., thermal fluctuations (~kT).

      Thanks. The exact variations in K+ density had been included in the original manuscript, in Fig. 2C, but we failed to refer to this figure at this point in the description of the results. The ion density is plotted in a log scale to facilitate conversion to free-energy units. Corrected.
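      A log-scale density plot maps onto free-energy units through the Boltzmann relation ΔG = -kBT ln(ρ/ρref), where ρref is a reference (e.g. bulk) density. The sketch below is a generic illustration of that conversion, not the authors' analysis code; the function name and the choice of reference density are assumptions. It shows that each decade on a log10 density axis corresponds to ln(10) ≈ 2.3 kBT, so the 100-fold variation mentioned by the reviewer amounts to about 4.6 kBT.

```python
import math


def free_energy_kT(density: float, reference_density: float) -> float:
    """Relative free energy, in units of kBT, from the Boltzmann relation.

    A site twice as populated as the reference is lower in free energy by ln 2 kBT.
    """
    if density <= 0 or reference_density <= 0:
        raise ValueError("densities must be positive")
    return -math.log(density / reference_density)


# One decade on a log10 density axis corresponds to ln(10) kBT.
KT_PER_DECADE = math.log(10.0)

print(f"{KT_PER_DECADE:.2f} kBT per decade")  # 2.30 kBT per decade
print(f"100-fold depletion: {free_energy_kT(0.01, 1.0):.2f} kBT")  # 4.61 kBT
```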

      (3) p. 6 Figure 1 caption "and along the perpendicular to the membrane" -> "perpendicular to the membrane normal"?. "The channel is an assembly of four distinct subunits (in colors);" -> "The channel is an assembly of four identical subunits (distinct by colors);". I would use the same protein coloring method in panels B and C as was used in panel A.

      Thanks. Corrected as needed.

      (4) p. 6 Figure 2 In panel B I would appreciate a representative complete ion permeation event trace. In panel C caption I would indicate corresponding sites "S0-S4, Scav" for each residue mentioned. I also would not use gray color for site names in the figure.

      We appreciate the suggestion, but believe the figure is clear as is. Panel B is meant to focus on the mechanism of knock-on, and Panel A includes numerous complete permeation events.

      (5) p. 7 Figure 3 caption. Please indicate which atoms of residues T373 and P406 were used to define SF and gate positions. Chemical structures of both TEA and RY785 would be useful. In panels C and F channel interacting residues (if any) would be helpful to show.

      The revised caption clarifies that the positions of T373 and P406 are represented by their carbon-alpha atoms. A close-up view of the structures of TEA and RY785 is included in the Supplementary Information section.

      (6) p. 8 Figure 4 caption. Please indicate whether N atoms were used for the density maps in panels B and C, and which density value was used to show the meshes. In panel A, please indicate the units of the density shown by the color maps.

      The caption has been revised to clarify these questions.

      (7) p. 9 "inside the protein" -> "inside the channel pore".

      Thanks. Corrected.

      (8) p. 10 "which lines the cavity" -> "which lines the water-filled cavity"

      We appreciate the suggestion but believe the wording is clear as is.

      (9) p. 10 Fig. 5. It would be helpful to distinguish residues from different chains, e.g. by different colors, rather than using different colors for different residues. The S atom in RY785 is hard to recognize due to the yellow color used for C atoms. Figure 5B is very confusing; it is not clear what this plot represents. For instance, what does it mean that Pro405 has ~10 contacts in 20% of simulation snapshots? Does it mean 10 C···C/S interactions within 4.5 Å? I am not sure what the value of this is. I think a bar or radar chart showing the % of contacts with one, two, or more residues of each type would be more helpful.

      Thanks. The revised caption ought to clarify how to interpret the plot.

      (10) p. 12 "Due to its 2-fold molecular symmetry". TEA has a tetrahedral point group or Td symmetry. It has several two-fold rotational axes though. 

      Thanks. Corrected.

      (11) p. 12 "it prevents K+ ions in the cytoplasmic space from destabilizing the K+ ions that reside in the selectivity filter". I am not sure this statement is entirely accurate, as it might be a multi-ion SF configuration that is destabilized, not individual ions per se.

      We believe this statement is clear as is.

      (12) p. 13 Fig. 7 caption "includes non-conductive or transiently inactivated states" - I am not sure what "transiently inactivated state" is as inactivation is a specific term used in ion channel research and it does not seem to be explicitly considered in this study.

      A reference has been included in the caption for readers interested in the process of inactivation.

      (13) p. 14 "the net charge of these constructs is thus zero". This would depend on the number of basic and acidic residues in the protein. 

      Yes, it does – and as a result the construct we model has a net zero charge.

      (14) p. 14 I wonder if the protein was constrained or heavily restrained during MARTINI membrane building and equilibration procedure. Otherwise, C-alpha mapping would be problematic and clashes with lipid membrane atoms might take place as well.

      It was indeed. When a protein is simulated using the MARTINI coarse-grained forcefield, its fold must be preserved through a network of strong ‘virtual’ bonds between adjacent carbon-alpha atoms. This is standard practice so we do not believe it requires further explanation.

      (15) p. 15 PME - please spell out and provide reference.

      Corrected.

      (16) p. 15 "with a smooth switching function" - is it a special or standard switching function? Also, was it used for energy or forces? 

      The switching function brings both forces and energies to a value of zero at the cut-off value, smoothly. We refer the reviewer to the NAMD manual for further details.

      (17) p. 15 'k = 1 kBT.' Please confirm that there is a factor of "1" there; if so, it can be omitted.

      The value of k = 1 kBT is correct.

      (18) p. 15. Please cite PMID: 22001851 for the transmembrane electric field application technique.

      Corrected.

      (19) p. 15 "and CHARMM36m" -> "and CHARMM36m force field". 

      Corrected.

      (20) p. 16 "the four proteins subunits" -> "the four protein subunits". 

      Corrected.

      (21) p. 16. Please provide the reference for CGenFF. It's reference 49. 

      Corrected.

      Supporting Information (SI): CGenFF is misspelled in multiple figure captions in the SI. All potential energy scans are labeled "angle", but some are bond angles while others are dihedral angles. Using subscripts for atom numbers is confusing and does not match the numbering scheme used in Fig. S1, so please use the same numbering style throughout, e.g. C46-C42-N43 (without subscripts). Please label the X and Y axes in Figures S2-S19 and S21. In Figure S22, please perform a linear regression analysis and/or compute Pearson correlation coefficients and indicate trend lines. Table S1: it would be good to compute RMS or mean unsigned errors to give an idea of the accuracy. Also, please indicate whether reference QM values were scaled by 1.16 for energies or offset for distances.

      The Supplementary Information has been corrected. We thank the reviewer for their detailed feedback. 

      Reviewer #3 (Recommendations for the authors):

      (1) The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Molecular docking and/or MD simulations could quickly test this hypothesis. If this hypothesis is not true, a comprehensive search can exclude such a possibility, which can also confirm the long-range allosteric coupling between RY785 binding in the central pore and voltage-sensing domain dynamics. 

      Please see our response above.

      (2) The authors describe RY785 as a selective inhibitor of Kv2 channels and characterize its binding residues through MD simulations. To support this claim, Figure 5 needs to include a multiple sequence alignment with other Kv channels. This would help demonstrate whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      Please see our response above.

      (3) The study applies a biasing potential to the φ, ψ, and χ1 dihedral angles. Please clarify:

      (a) Is this potential solely to prevent selectivity filter collapse/degradation, as mentioned in a previous D. E. Shaw Research publication (Jensen et al., 2012)?

      Yes, that is correct.

      (b) If it applies to all amino acids, can this potential prevent other changes, such as in the voltage-sensing domain?

      Yes, that is correct.

      (c) What specific "large-scale structural changes" does this potential preclude? 

      For example, it would preclude the spontaneous degradation of the secondary or tertiary structure of the protein. We have revised the Methods section to make these points clearer. 

      (d) Given that such biasing potentials on backbone dihedral angles can decrease conformational flexibility, and considering that Kv channel permeability/conductivity could be highly sensitive to filter flexibility, what insights can you provide about the impact of the force constant k on channel conductivity?

      In previous studies based on an identical methodology (Stix et al, 2023; Tan et al, 2022), we have observed good agreement between calculated and experimental conductance values, at least as good as can be hoped for when all approximations are considered. Based on the data presented in those studies, we have no reason to believe our methodology inhibits the permeability of the channel, which is logical, as the local structural fluctuations required for K+ flow across the selectivity filter are, by definition, not impaired. On the contrary, the fact that these weak biasing potentials make the conductive form of the filter the most favorable state in simulation enables a clear-cut analysis of conductance under plausible simulation conditions, in terms of both applied voltage and K+ concentration. We refer the reviewer to the abovementioned studies for further details and a discussion of this subject.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing. Given the compact nature of the central cavity when RY785 is bound, it would be valuable to investigate whether polar groups of RY785 (e.g., nitrogens from the amide, benzimidazole, and thiazole moieties) always interact with K+ ions. Characterizing these interactions could inform the design of similar compounds with differential modulation effects.

      We examined this possibility and detected no convincing interaction patterns between RY785 and K+ ions – logically, inhibitor and ions are in close proximity while residing concurrently within the pore, but we detected no evidence of specific interactions.

      Minor points:

      It is strongly recommended that the refined force field parameters for RY785 be shared as a separate supplementary file in CHARMM force field format. This addition would be valuable for the scientific community, allowing other researchers to use or compare these parameters in future studies.

      We agree entirely. Upon publication of the VOR for this article the forcefield parameters for RY785 will be made freely available for download at https://github.com/Faraldo-Gomez-Lab-atNIH/Download.

      The study uses a KCl concentration of 300 mM, which exceeds typical intracellular K+ levels. While this may be intentional to enhance K+ permeation probability, a brief justification for this choice should be included in the Methods section.

      Yes. What motivated this choice, in this and in our previous studies of K+ channels, was the expectation of a greater number of permeation events for a given simulation length, and therefore greater confidence (i.e. statistical significance) in the observed ion conductance, or in the degree to which it might be inhibited by a blocker. It is worth noting that 300 mM KCl, while atypical of the intracellular environment, is often used in electrophysiological studies. The Methods section has been amended to clarify this point.
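      The statistical argument can be made concrete with a minimal sketch, under two simplifying assumptions that are ours rather than the authors': each permeation event transfers one elementary charge across the full applied voltage, and events are independent, so their count follows Poisson statistics. Conductance is then G = N·e/(t·V), and its relative uncertainty scales as 1/√N. The function names are hypothetical, and the inputs below (100 events in 10 µs at 100 mV) are illustrative numbers, not results from this study.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value, in coulombs


def conductance_pS(n_events: int, sim_time_s: float, voltage_V: float) -> float:
    """Single-channel conductance (pS) from a count of unidirectional K+ crossings."""
    if n_events < 0 or sim_time_s <= 0 or voltage_V == 0:
        raise ValueError("need n_events >= 0, sim_time_s > 0 and nonzero voltage")
    current_A = n_events * ELEMENTARY_CHARGE_C / sim_time_s
    return current_A / voltage_V * 1e12


def relative_error(n_events: int) -> float:
    """Poisson relative uncertainty of an event count, 1/sqrt(N)."""
    if n_events <= 0:
        raise ValueError("need at least one event")
    return 1.0 / math.sqrt(n_events)


# Hypothetical inputs: 100 events in 10 microseconds at 100 mV.
g = conductance_pS(100, 10e-6, 0.100)
print(f"G = {g:.2f} pS +/- {100 * relative_error(100):.0f}%")  # G = 16.02 pS +/- 10%
# Quadrupling the number of events halves the relative error.
print(f"{100 * relative_error(400):.0f}%")  # 5%
```

      Because the uncertainty falls only as 1/√N, raising the number of crossings per unit of simulation time, for instance through a higher K+ concentration, tightens the estimate without lengthening the trajectories.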

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Persistence is a phenomenon by which genetically susceptible cells are able to survive exposure to high concentrations of antibiotics. This is especially a major problem when treating infections caused by slow growing mycobacteria such as M. tuberculosis and M. abscessus. Studies on the mechanisms adopted by the persisting bacteria to survive and evade antibiotic killing can potentially lead to faster and more effective treatment strategies.

      To address this, the authors used a transposon-mutagenesis-based sequencing approach to identify the genetic determinants of antibiotic persistence in M. abscessus. To enrich for persisters and facilitate genetic screening for this phenotype, they employed a condition previously reported to increase persister frequency: nutrient starvation. An M. abscessus transposon library was grown in nutrient-rich or nutrient-depleted conditions and exposed to TIG/LZD for 6 days, after which Tn-seq was carried out to identify genes involved in spontaneous (nutrient-rich) or starvation-induced persistence. About 60% of the persistence hits were required in both conditions. Pathway analysis revealed enrichment for genes involved in detoxification of nitrosative, oxidative, DNA damage, and proteostasis stress. The authors then validated the findings by constructing deletions of 5 different targets (pafA, katG, recR, blaR, Mab_1456c) and testing the persistence phenotype of these strains. Rather surprisingly, only 2 of the 5 hits (katG and pafA) exhibited a significant persistence defect compared to wild type upon exposure to TIG/LZD, and this was complemented using an integrative construct. The authors then investigated the specificity of ΔkatG susceptibility to different antibiotic classes and demonstrated increased killing by rifabutin. The katG phenotype was shown to be mediated through the production of oxidative stress, and was reversed when the bacterial cells were cultured under hypoxic conditions. Interestingly, when the role of katG was tested in other clinical strains of M. abscessus, the phenotype was observed in only one of them, suggesting that alternative anti-oxidative stress defense mechanisms might operate in some clinical strains.

      Strengths:

      While the role of ROS in antibiotic-mediated killing of mycobacterial cells has been studied to some extent, this paper presents some new findings regarding the genetics of M. abscessus susceptibility, especially to clinically used antibiotics, which makes it useful. The attempts to validate the observations in clinical isolates are also appreciated.

      Weaknesses:

      Amongst the 5 shortlisted candidates from the screen, only 2 showed marginal phenotypes which limits the impact of the screening approach.

      We appreciate the reviewer’s comments, but we note that 4 out of 5 genes displayed phenotypes concordant with findings of the Tn-Seq data, with katG and pafA, as well as MAB_1456c (during starvation only) and blaR (in rich media only) having decreased survival as shown in Figure 3A-D. We do agree that some of the phenotypes were more modest in a single-mutant context than in the pooled Tn-Seq screen. In addition, several mutants that had modest changes in survival also showed profound defects in resuming growth after removal of antibiotics, with the pafA mutants particularly impaired. (Figure 3 - figure supplement 1).

      While KatG-mediated detoxification of ROS and the involvement of ROS in antibiotic killing were well demonstrated, the failure to replicate this phenotype in some of the clinical isolates limits the significance of these findings.

      While the role of katG varied among strains, the antibiotic-induced accumulation of ROS was seen in all three strains (Figure 6A). This suggests that in some strains other ROS-detoxification pathways are able to compensate for the loss of katG.

      (Figure 2—figure supplements 1–3)

      Figure 1—figure supplement 1.

      Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved by a more comprehensive treatment of prior work, especially comparison of E. coli work with mycobacterial studies.

      Moreover, some technical issues remain regarding the description of the methods, the supplementary material, and reference formatting.

      See detailed responses below.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

      Comments on revisions:

      The authors have moved this paper along nicely. I have a few general thoughts.

      It would be helpful to have more references to specific figures and panels listed in the text to make reading easier.

      Text modified to add more figure references.

      (1) I would suggest adding a statement about the importance of the work. From my perspective, the work shows the general nature of many statements derived from work with E. coli. This is important. The abstract says this overall, but a final sentence in the abstract would make it clear to all readers.

      We appreciate the suggestion and have added a line to the abstract.

      (2) The paper describes properties that may be peculiar to mycobacteria. If the authors agree, I would suggest some stress on the differences from E. coli. Also, I would place more stress on novel findings. This might be done in a section called Concluding Remarks. The paper by Shee 2022 AAC could be helpful in phrasing general properties.

      We have added mention of this in the discussion (lines 354-356).

      (3) Several aspects still need work to be of publication quality. Examples are the materials table and the presentation of supplementary material. Reference formatting also needs attention.

      We respond to the specific details below.

      Reviewer #3 (Public review):

      Summary:

      The manuscript demonstrates that starvation induces persister formation in M. abscessus.

      They also utilized Tn-Seq for the identification of genes involved in persistence. They identified the role of catalase-peroxidase KatG in preventing death from translation inhibitors Tigecycline and Linezolid. They further demonstrated that a combination of these translation inhibitors leads to the generation of ROS in PBS-starved cells.

      Strengths:

      The authors used high-throughput genomics-based methods for identification of genes playing a role in persistence.

      Weaknesses:

      The findings could not be validated in clinical strains.

      Comments on revisions: No more comments for the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are strongly encouraged to check the references. There is a systematic error in the citation of references; I started to list the errors, but there were too many.

      For example Ln 51, Ref #11 cited, should be #10. Ln 59, #18 is wrongly cited. Should be - Ln 104. Ref #27 wrongly cited.

      Ref #26 and #28 identical.

      Even in the Discussion section, many references are mis-cited.

      We very much appreciate the reviewer catching this issue with the import of our references and we have corrected this.

      Reviewer #2 (Recommendations for the authors):

      Below I have listed comments on specific issues that I hope are useful during revision.

      Line 21 population is singular

      Text modified

      Line 21 comma after antibiotic (subordinate clause)

      Text modified

      Line 25 is how singular?

      Text modified

      Impression of abstract: the work seems to confirm and therefore generalize concepts derived from studies with E. coli. If the authors agree, such a statement would be appropriate as a final sentence. I would also look for novel features to stress in the abstract.

      Line 41 this challenge is vague

      Text modified

      Line 43 comma such as (also comma at the end of the parenthetical statement). This type of comma error is common throughout the manuscript and slows reading.

      Text modified

      Line 60 paradoxically. Is this the best concept? Or is it the natural effect of evolution (assuming that mycobacteria or their ancestors were exposed to environmental antibiotics)?

      It is certainly problematic for clearing infection.

      Text not modified.

      Line 63 highlighted uncertainties ... meaning is unclear especially since you may have changed what "model" is referring to.

      Text modified

      Line 66 models.... Do you really mean systems? Models of what?

      This refers to mechanistic models. Text not modified.

      Line 67 arrest cell division. This is written as if it were true. Does the evidence point specifically to cell division or perhaps more accurately suppression of metabolism (see Ye et al 2025 mBio).

      Both have been postulated as important. Text modified to add concept of metabolism

      ... targeted by antibiotics non-essential... Do you think that antibiotics work by inactivating essential targets? That seems overly simplistic, as lethal action is more likely the metabolic response to the damage caused. By the end of the paragraph you come around to this view, but you have already misdirected the reader. The reader is not sure what to believe. Line 70 note that there are many inhibitors of transcription and translation that only block growth, they do not rapidly kill cells

      There can be both direct, and indirect secondary killing mechanisms. We devote a significant portion of the Discussion section to this topic.

      Line 71 debate. There was indeed a debate, but reference 22 is not a valid citation for this. I think you mislead the reader by not accurately describing the debate. It was basically about the inability of Kim Lewis and James Imlay to reproduce the work of ref. 22. A great deal of prior work and then subsequent work showed that the challenge to ref. 22 lacked substance.

      (1) Text modified to fix an error in the citation number related to direct β-lactam-mediated lysis.

      (2) We agree that there is a great deal of data supporting antibiotic-induced ROS as important for bactericidal activity in many circumstances and do not argue otherwise. This sentence points out that over the years the paradigm for how antibiotics kill bacteria has evolved.

      Line 80. It seems you are starting a new topic here. What about beginning a new paragraph?

      The paragraph introduces mycobacteria, of which Mabs is one. Text not modified.

      Line 85 delete the comma: it implies a compound sentence that is not delivered.

      Text modified.

      Line 109 screen singular

      Text modified.

      Line 156 these conditions is imprecise and vague

      Conditions are described in the preceding paragraph of the manuscript. Text not modified.

      Fig 2 it would be helpful to more clearly define the meaning of the coordinates

      Text modified.

      Line 230 and throughout please indicate the location of the data being cited for rapid reader reference

      Text modified.

      Lines 315-323 You could use this paragraph as the first of the Discussion. Some readers prefer to read the Discussion before the results. For them, a summary at the beginning of the Discussion is useful.

      Text modified.

      Line 328 without underlying mechanism... for E. coli refer to Zeng PNAS 2022. Depending on when the final version of this paper happens, there should be a figure in a Zhao Zhu mLife paper on purA that will have been published. Since it is not yet available, it cannot be cited.

      We agree that the Zeng et al study is interesting and have added this reference to our discussion. However, these findings related to broad Crp-regulated tolerance actually underscore the point that we are making: that there are multiple factors (Crp, RelA, Lon, TisB, MazE, others) that mediate antibiotic tolerance.

      Line 339 where are the data?

      These data are in Figure 5, panels C, D. We have clarified the text to indicate that only a single agent from each of these classes was tested.

      Line 346 here you are summarizing evidence for ROS in killing mycobacteria. You should include the moxifloxacin study by Shee et al 2022 AAC.

      Reference added.

      Line 348 refer to James Collins' work with E. coli in which his lab examined agents with a variety of mechanisms. There seems to be a fundamental difference between E. coli and mycobacteria with respect to rifampicin, a strictly static agent in E. coli but clearly lethal in mycobacteria. Note that chloramphenicol is static in E. coli and blocks ROS production. What does it do in mycobacteria? A brief discussion of this difference might be relevant at line 362.

      Text modified.

      Lines 364-368 Here the idea might be simply that there are two modes of killing, one that is a direct extension of class-specific damage (chromosome fragmentation with fluoroquinolones, for example, or cell lysis by beta-lactams) and a second that is a metabolic response to the antibiotic damage (ROS accumulation). The second type is not class specific. Within this context, the mycobacterial killing by rifampicin might be a class-specific extension of inhibition of transcription that does not occur in E. coli.

      Agreed; text modified to include this.

      Line 400 The Key Resource table is not of publication quality. Precision and repeatability can be improved by spelling out the name of the vendor and its location (City, Country). In the present case, use of BD is lab jargon.

      We appreciate the reviewer’s precision. However, this is actually not lab jargon. Becton, Dickinson and Company now refers to itself as BD (see https://www.bd.com/en-us), and the American Type Culture Collection now refers to itself as ATCC (see https://www.atcc.org/about-us/who-we-are).

      Line 639 It would be good to have experienced colleagues critically review the manuscript, especially for English usage. Listing those persons here adds to the credibility of the work

      Text not changed.

      References: please refer to the journal style. Here you use italic for titles and scientific names, thereby obscuring the scientific names. Normally article titles are not italic and scientific names are ALWAYS italic unless prohibited by journal style.

      Our reference format is concordant with eLife submission guidelines, and all references are reformatted by the journal at the time of final publication (see https://elifesciences.org/insideelife/a43f95ca/elife-references-yes-we-take-any-format-no-we-re-not-rekeying).

      Supplemental Material: Please refer to journal style. Normally this is a stand-alone document that includes a title page and carefully crafted figure legends. Supplemental figures would be numbered as 1, 2, ... A professional appearing Supplemental Material section shows author publication experience not obvious in other parts of the paper. The text indicated MIC determinations. I would like to see a table of MIC values.

      (1) MIC table added as Supplemental Table 5.

      (2) The Supplemental figures are submitted and named in accordance with eLife instructions. Please note that for eLife, there is not a stand-alone supplementary figure section with a title page as you are requesting, but instead the figure supplements for each figure are provided as online files linked to each figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      We appreciate the Reviewer’s consideration of the strengths of our study supporting the identification of adult endothelial to hematopoietic transition (EHT) in the mouse bone marrow.

      Weaknesses:

      We believe that the work of ruling out alternative hypotheses, though initiated, was left incomplete. We specifically think that the authors need to properly consider whether there is specific, sparse labeling of HSPCs (in their native, non-transplant, model, in young animals). Polylox experiments, though an exciting addition, are also incomplete without additional controls. Some additional killer experiments are suggested.

      Recognizing the importance of the weaknesses pointed out by the Reviewer, we provide below our responses to the thoughtful recommendations.

      Reviewer #1 (Recommendations for the authors):

      The main model is to label cells using Cdh5 (VE-cadherin) CreERT2 genetic tracing. Cdh5 is a typical marker of endothelial cells. The data shows that, when treating adults with tamoxifen, the model labels PBMCs after ~10 days, and the labeling kinetics plateau by day 14... The authors reach the main conclusion: that adult ECs are making hematopoietic cells.

      We agree that the main tool used in this study is to label endothelial cells (ECs) using Cdh5 (VE-Cadherin) CreERT2 genetic tracing in mice. Indeed, Cdh5 is recognized as a good marker of ECs. As a minor point, we wish to clarify that the results from treating adult Cdh5-CreERT2 mice with tamoxifen (Figure 1F) show that the ZsGreen labeling kinetics plateau by day 28 (not by day 14).

      Important controls should be shown to rule out alternative possibilities: namely, that the CreERT2 reporter is being sparsely expressed in HSPCs. Many markers, specific as they may seem to be, can show expression in non-specific lineages - particularly in the cases of BAC and PAC transgenic models, in which the transgene can be present in multiple tandem copies and subject to genome location-specific effects. As the authors remind readers, the Cdh5 gene is partly transcribed (though at low levels) in HSPCs, and even more clearly expressed in specific subpopulations such as CLPs, DCs, pDCs, B cells, etc. Some options would be to: i) check if the Cdh5-CreERT2 transgene (not endogenous Cdh5, but the BAC/PAC transgene) is expressed in LSKs (at least by qPCR), ii) verify if any CreERT2 protein levels are present in LSKs (e.g., by western blot), and iii) check if tamoxifen is labeling any HSPCs freshly after induction (e.g., flow cytometry data of ZsGreen LSKs at 24-48h post tamoxifen injection).

      We fully agree with the Reviewer that many markers presumed to be specific to a certain cell type can show expression in other cell lineages. We also agree that excluding sparse or ectopic CreERT2 expression in hematopoietic stem and progenitor cells (HSPCs) is essential for interpreting lineage-tracing results. As suggested by the Reviewer, we have now examined whether the Cdh5-CreERT2 transgene is expressed in bone marrow LSKs. To this end, we analyzed the Polylox single-cell RNAseq dataset presented in this study, containing ZsGreen<sup>+</sup> ECs and enriched ZsGreen<sup>+</sup> LSKs. As shown in the revised Figure S4D, CreERT2 transcripts were detected in Cdh5-expressing endothelial populations and were absent from Ptprc/CD45-expressing hematopoietic cells, with the sole exception of plasmacytoid dendritic cells (pDCs; Figure S4E). These results are consistent with RNAseq data from adult mouse bone marrow[1] showing that the Cdh5 gene is not expressed in HSPCs, CLPs, DCs, or B cells. Rather, among hematopoietic CD45<sup>+</sup> cells, Cdh5 is expressed only in a small subset of pDCs, which are terminally differentiated cells. These published results are described in the text.

      To further support this conclusion, we provide additional single-cell RNAseq analyses from our unpublished dataset of LSKs isolated from Cdh5-CreERT2/ZsGreen mice and not enriched for ZsGreen expression. These new analyses were performed after integrating the single-cell data from ECs and ZsGreen<sup>+</sup> hematopoietic cells from the Polylox dataset (current study). As shown in Author response images 1 and 2, CreERT2 expression closely matches the expression patterns of Cdh5, Pecam1, and Emcn and is not detected in Ptprc/CD45-expressing hematopoietic cells.

      Author response image 1.

      Expression of CreERT2, Cdh5, Ptprc and ZsGreen in BM cell populations enriched with ECs and hematopoietic cells. The single-cell RNAseq data were obtained from ZsGreen-enriched BM ECs and ZsGreen-enriched BM hematopoietic cells from Polylox lineage-tracing experiments (data shown in Fig. 5; 37,667 ECs and 48,065 BM hematopoietic cells), and from LSKs (23,017 cells) independently isolated from tamoxifen-treated Cdh5-CreERT2/ZsGreen mice without ZsGreen enrichment (unpublished data).

      Author response image 2.

      Expression of CreERT2, Cdh5, Ptprc, Pecam1, Emcn, ZsGreen1, Col1a2, Cd19, Cd3e, Itgam (CD11b), Ly6a (Sca-1), Kit (cKit), Cd34, Cd48, Slamf1 (CD150), and Siglech in enriched BM ECs and LSKs from Cdh5-CreERT2/ZsGreen mice treated with tamoxifen 4 weeks prior to harvest (same cell source as indicated in Author response image 1).
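      The per-cluster check described above (is CreERT2 detected in endothelial clusters but not in Ptprc-expressing hematopoietic clusters?) reduces to the fraction of cells in each annotated cluster with non-zero counts for the transgene transcript. The sketch below is illustrative only: the input format (a cluster label plus a gene-to-UMI-count mapping per cell) and the function name are hypothetical, and it is not the pipeline used for Figure S4D or the Author response images.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Mapping


def fraction_detected(
    cells: Iterable[tuple[str, Mapping[str, int]]], gene: str
) -> dict[str, float]:
    """Fraction of cells per cluster with at least one UMI for `gene`.

    Hypothetical input: each cell is (cluster_label, {gene_name: umi_count}).
    """
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for cluster, counts in cells:
        totals[cluster] += 1
        if counts.get(gene, 0) > 0:
            positives[cluster] += 1
    # Clusters with no positive cells map to 0.0 via the defaultdict.
    return {cluster: positives[cluster] / n for cluster, n in totals.items()}
```

      An endothelial-restricted pattern would show non-zero fractions in EC clusters and values at or near zero in hematopoietic clusters. Near-zero fractions should still be read with some care, because transcripts expressed at low levels can drop out in droplet-based single-cell data.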

      Additionally, we functionally tested whether hematopoietic progenitors could acquire ZsGreen labeling following tamoxifen administration using transplantation assays (Figure 4A-D). ZsGreen<sup>-</sup> LSKs (purity 99%), sorted from Cdh5-CreERT2/ZsGreen donors that had never been exposed to tamoxifen (to exclude background Cre leakiness), were transplanted into lethally irradiated wild-type recipients. After stable hematopoietic reconstitution, recipients were treated with tamoxifen. If transplanted HSPCs or their progeny expressed CreERT2, tamoxifen administration would be expected to induce ZsGreen labeling. However, no ZsGreen<sup>+</sup> hematopoietic cells were detected in these recipients, demonstrating that hematopoietic progenitors from Cdh5-CreERT2/ZsGreen mice and their descendants do not undergo tamoxifen-induced recombination.

      Together, the single-cell transcriptional and transplantation data demonstrate that CreERT2 expression and tamoxifen-induced recombination are restricted to Cdh5-expressing ECs (except for pDCs). These findings support the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise from adult bone marrow ECs rather than from contaminating hematopoietic progenitors.

      One important missing experiment is to trace how ECs actually do this hematopoietic conversion: meaning, which populations of HSPCs are being produced by adult ECs in the first instance? LT-HSCs? ST-HSCs? MPPs? GMPs? All of the above? What are the kinetics? Differentiation is likely to follow a hierarchical path, but this is unclear at the moment.

      We agree that defining the earliest EC-derived hematopoietic cell progenitors and the kinetics by which these progenitors appear (LT-HSC vs ST-HSC/MPP vs lineage-restricted progenitors) would provide important insights into adult EHT.

      In the current genetic labeling system, a rigorous kinetic analysis of the hematopoietic cells first generated by ECs in vivo is not straightforward. Specifically, low-level baseline ZsGreen<sup>+</sup> reporter fluorescence in hematopoietic cells (attributable to EHT occurring prenatally, perinatally or in young mice, or to other causes; Figure 1A-D and Figure S1D-I) impairs the identification of newly generated ZsGreen<sup>+</sup> progenitors at early time points and makes it difficult to distinguish them from baseline fluorescence. A potential solution might be serial harvests at multiple time points in large mouse cohorts to capture rare transitional events with statistical significance.

      We wish to emphasize that the primary objective of this study was to establish whether adult bone marrow ECs have a hemogenic potential. Our data demonstrate adult EC-derived hematopoietic cell output that includes progenitor-containing fractions and multilineage mature progeny, under both steady-state conditions. We acknowledge that the current work does not resolve the order and kinetics of hematopoietic cell emergence following EHT. Therefore, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      One warning sign is how rare the reported phenomenon is. Even when labeling almost 90% of the BM ECs, these make at most ~3% of blood (less than 1% in the transplants in Figure 4F, less than 0.5% in the col1a2 tracing in Figure 7). This means this is a very rare and/or transient phenomenon... The most major warning sign is the fast kinetics of labeling and the fast plateau. We know that: a) differentiation typically follows some hierarchy, b) in situ dynamics of blood production are slow (work by Rodewald and Höfer). Considering how fast these populations need to be replaced to reach a steady state so rapidly (as reported here, 2-4 weeks), the presumably specialized ECs would need to be steadily dividing and producing hematopoietic cells at a fast pace (as a side prediction, the adult "EHT" cluster would likely be highly Mki67+). More importantly, the ZsGreen LSKs produced by the ECs would have to undergo VERY rapid differentiation (much faster than normal LSKs) or otherwise, if 3% of them are produced by a top compartment (the BM ECs) every 4 weeks, then the labeled population would continue to grow with time. The authors could try to challenge this by testing if the ZsGreen LSKs undergo much faster differentiation kinetics or lower self-renewal (which does not seem to be the case, at least in their own transplantation data). We believe a more likely explanation is that the label is being acquired more or less non-specifically, directly across a bunch of HSPC populations.

      The Reviewer correctly notes that the population of hemogenic ECs in the adult mouse bone marrow is small and that the output of hematopoietic cells from these hemogenic ECs accounts for at most 3% of blood cells. We agree that delineating the kinetics by which hematopoietic cells are generated from adult ECs is important, as this information would provide key insights into adult EHT.

      Nonetheless, we believe that the rapid appearance and early plateau of labeled blood cells in our experiments may not derive from a sustained, high-rate generation of labeled blood cells from self-renewing top-tier hematopoietic cell compartments, such as LT-HSCs. Rather, our data are more consistent with a predominantly lineage-restricted, biased hematopoietic progenitor population being the source of labeled blood cells. Supporting this interpretation, longitudinal analysis of peripheral blood shows that EGFP<sup>+</sup> PBMCs are consistently enriched in myeloid cells, whereas EGFP<sup>-</sup> PBMCs are predominantly B cells (Figure 4G and H). This myeloid skewing is stable over time and contrasts with what would be expected if labeling were acquired broadly and nonspecifically across the hematopoietic hierarchy. Therefore, our results are more consistent with myeloid-biased progenitors being among the first populations that EHT generates.

      We acknowledge that our studies do not identify the earliest endothelial-derived hematopoietic cells produced in vivo and do not define their differentiation kinetics. Rigorously addressing these questions would require temporally resolved lineage tracing with sufficiently powered cohorts at early time points to distinguish newly labeled cells statistically from the baseline reporter background. These important experiments were beyond the scope of the present study. As noted above, we state this limitation explicitly under “Limitations of the study” and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.
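      The kinetic argument in this exchange can be made concrete with a deliberately simple one-compartment model. This is an illustrative assumption, not a model used in the manuscript: labeled blood cells are produced at a constant rate s from a labeled source that is not depleted, and are lost at a first-order rate d, so that dL/dt = s - d·L with L(0) = 0, giving L(t) = (s/d)(1 - e<sup>-dt</sup>). The plateau s/d is approached on a timescale of 1/d, so the time to plateau depends only on turnover. The sketch below uses the ~3% plateau and the day-28 plateau mentioned above as inputs; it ignores hierarchical differentiation, proliferation and source depletion.

```python
import math


def labeled_fraction(t_days: float, plateau: float, turnover_per_day: float) -> float:
    """Labeled fraction at time t under dL/dt = s - d*L with L(0) = 0.

    Closed form: L(t) = (s/d) * (1 - exp(-d*t)), where `plateau` is the
    steady-state value s/d and `turnover_per_day` is the loss rate d.
    """
    return plateau * (1.0 - math.exp(-turnover_per_day * t_days))


def turnover_for_plateau(days_to_95pct: float) -> float:
    """Loss rate d at which L(t) reaches 95% of its plateau at the given day."""
    # 1 - exp(-d*t) = 0.95  =>  exp(-d*t) = 0.05  =>  d = ln(20) / t
    return math.log(20.0) / days_to_95pct


# Illustrative inputs from the discussion: ~3% labeled cells, plateau by ~day 28.
d = turnover_for_plateau(28.0)      # roughly 0.107 per day
half_life = math.log(2.0) / d       # roughly 6.5 days
for day in (7, 14, 28, 56):
    print(day, round(labeled_fraction(day, 0.03, d), 4))
```

      Under this toy model, a plateau within about four weeks implies that the labeled cells turn over with a half-life on the order of a week. Long-lived labeled progeny (small d) would instead make the labeled fraction keep rising for months, which is the behavior the Reviewer expected from a self-renewing top-tier source; short-lived progeny, such as the myeloid cells enriched among labeled PBMCs, would be compatible with a rapid plateau.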

      Transplant experiments in Figure 4 do offer a crucial experiment in support of the main conclusion of the manuscript. These experiments show that transplanted LSKs bearing the Cdh5-CreERT2 and ZsGreen reporter cannot acquire the tamoxifen-induced label post-transplantation - suggesting that the label is coming from ECs. However, it is also possible that the LSK Cdh5-CreERT expression is partly during the transplantation process... Indeed, we know through the aging data that the labeling is less active in aged mice. In any case, this would be verified by qPCR/western-blot (comparing native vs post-transplant LSKs).

      We agree with the Reviewer that the experiments in Figure 4A-D “offer a crucial experiment in support of the main conclusion of the manuscript.” The results of these experiments show that ZsGreen-negative LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not acquire tamoxifen-induced ZsGreen fluorescence post transplantation, supporting the endothelial cell origin of blood ZsGreen<sup>+</sup> cells.

      The Reviewer raises the possibility “that the LSK Cdh5-CreERT expression is partly during the transplantation process...” and that this Cdh5-CreERT expression may occur slowly, as learned “through the aging data that the labeling is less active in aged mice.” As we show in Figure 3F, tamoxifen administration induced a similar percentage of ZsGreen<sup>+</sup> ECs in the bone marrow of Cdh5-Cre<sup>ERT2</sup>(BAC)/ZsGreen mice, whether tamoxifen was administered to 6-week-old, 16-week-old, 26-week-old or 36-week-old mice. Similar results with Cdh5-CreERT2 (BAC) mice have been reported in the literature[2]. Since the mice transplanted with ZsGreen<sup>-</sup> LSKs were followed for 25 weeks after tamoxifen administration, we believe that the results in Figure 4A-D address the concern raised by the Reviewer.

      Supporting the conclusion that LSKs from Cdh5-CreERT2-ZsGreen reporter mice do not express Cdh5-CreERT2 in a native (non-transplant) setting, we now provide transcriptomic data from non-transplanted Cdh5-CreERT2/ZsGreen mice showing that CreERT2 expression closely tracks the expression of canonical endothelial markers (Cdh5, Pecam1, Emcn) and is not detectable in Ptprc/CD45-expressing hematopoietic cells (Author response images 1 and 2). These data were obtained from mice treated with tamoxifen at ~12 weeks of age and analyzed four weeks later. Together, these results indicate that CreERT2 expression is restricted to ECs in Cdh5-CreERT2-ZsGreen reporter mice.

      Figure 5 presents PolyLox experiments to challenge whether adult ECs produce hematopoietic cells through in situ barcoding. Several important details of the experiment are missing in the main text (how many cells were labeled, at which time point, how long after induction were the cells sampled, how many bones/BM-cells were used for the sample preparation, what was the sampling rate per population after sorting, how many total barcodes were detected per population, how many were discarded/kept, what was the clone-size/abundance per compartment). As presented, the authors imply that 31 out of ~200 EC barcodes are shared with hematopoietic cells... This would suggest that ~15% of endothelial cells are producing hematopoietic cells at steady state. This does not align well with the rarity of the behavior and the steady state kinetics (unless any BM EC could stochastically produce hematopoietic cells every couple of weeks, or if the clonality of the BM EC compartment would be drastically reduced during the pulse-chase overlap with mesenchymal cells. Important controls are missing, such as what would be the overlap with a population that is known to be phylogenetically unrelated (e.g., how many of these barcodes would be found by random chance at this same Pgen cut-off in a second induced mouse). Also, the Pgen value could be plotted directly to see whether the clones with more overlapping populations/cells (3HG, 127, 125, CBA) also have a higher Pgen. We posit that there are large numbers of hematopoietic clones that contribute to adult hematopoiesis (anywhere from 2,000-20,000 clones would be producing granulocytes after 16 weeks post chase), and it would be easy to find clones that overlap with granulocytes (the most abundant and easily sampled population) - HSPCs would be the more stringent metric.

      We thank the Reviewer for highlighting the need for a more detailed description of the Polylox experiments. To address this deficiency, we have compiled a document (Additional Supplementary Information file) containing all the specifics of the Polylox experimental and analytical parameters in one location. This includes: (i) the number of cells analyzed per population, (ii) the time points of induction and sample collection, (iii) the number of bones and total bone marrow cells used for preparation, (iv) the sampling rate following cell sorting, (v) the total number of detected barcodes per population, (vi) barcode filtering criteria and numbers retained or discarded, and (vii) clone-size and barcode number across cell compartments. We have updated the manuscript to refer readers to this Supplementary file.

      The Reviewer concluded from our results (Figure 5, Figure S5) that 31 out of ~200 endothelial cell (EC) barcodes were shared with hematopoietic cells (HCs), implying that ~15% of ECs produce hematopoietic progeny at steady state. This interpretation is inconsistent with our data showing the rare nature of adult EHT and would require either that a large fraction of bone marrow ECs can generate hematopoietic cells within short time windows, or that ECs clonally expand rapidly during the pulse-chase period, as noted by the Reviewer. The explanation for this apparent discrepancy is technical. Briefly, the ~200 EC barcodes recovered do not represent all barcoded ECs. During Polylox barcode library construction, a mandatory size-selection step is applied prior to PacBio sequencing, retaining fragments of approximately 800–1500 bp, whereas the full Polylox cassette spans ~2800 bp. This step is needed mainly because optimal PacBio sequencing requires libraries of either 800–1500 bp or over 2500 bp. As described in the original Polylox publications[3,4], this size selection eliminates most (approximately 75%) of the longer barcodes, together with ~85% of the shorter barcodes. Thus, ECs harboring very long or very short recombined barcodes are under-represented or excluded from sequencing. As a result, the 22 true barcodes linking ECs and HCs recovered from sequencing do not indicate that ~10–15% of ECs generate hematopoietic progeny. Rather, these barcodes represent a highly selected subset of ECs with barcode configurations compatible with library recovery and sequencing. The observed EC–HC barcode sharing thus reflects qualitative lineage connectivity, not the quantitative frequency of endothelial-derived hematopoiesis at steady state.

      The Reviewer correctly notes that true Polylox barcodes are shared by ECs and mesenchymal-type cells and asks that we examine whether this overlap could occur by chance alone. The Polylox filtering threshold (pGen < 1 × 10<sup>-6</sup>), which we have made more stringent (from pGen < 1 × 10<sup>-4</sup>) without altering the essential results (new Figure S4 and revised Figure 5C-F), renders such overlap exceedingly unlikely. At this threshold, the expected number of random recombination events among 4,069 barcoded cells is approximately 0.004. Consequently, among the 87 mesenchymal cells identified here, fewer than 0.4 cells would be expected to share a barcode with another cell by chance alone. Thus, the probability of recovering identical barcodes across unrelated lineages due to random recombination is vanishingly small, and the observed EC–mesenchymal barcode sharing substantially exceeds random expectation.
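      The chance-overlap arithmetic in the preceding paragraph can be reproduced as follows. This is a sketch under a simplifying assumption that is ours rather than the response's: each cell recombines independently, and a filtered barcode has a generation probability of at most pGen, so the number of other cells independently producing the same barcode is approximately Poisson with mean N × pGen. The function names are hypothetical.

```python
import math


def expected_chance_recurrences(n_cells: int, p_gen: float) -> float:
    """Expected number of other cells that independently generate one given barcode.

    Assumes independent recombination in every cell and a per-cell generation
    probability of at most `p_gen` (Poisson mean n_cells * p_gen).
    """
    return n_cells * p_gen


def prob_at_least_one(expected: float) -> float:
    """Poisson probability of one or more chance recurrences."""
    return 1.0 - math.exp(-expected)


per_barcode = expected_chance_recurrences(4069, 1e-6)  # about 0.004
across_mesenchymal = 87 * per_barcode                   # about 0.35, below 0.4
print(per_barcode, across_mesenchymal, prob_at_least_one(per_barcode))
```

      The scaling by 87 follows the reasoning in the response (one expected-chance term per mesenchymal cell); the Poisson form simply confirms that, for such small means, the probability of any chance recurrence is essentially equal to the mean itself.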

      Related to this observation, the Reviewer correctly notes that the endothelial and mesenchymal cell lineages are phylogenetically unrelated. However, endothelial-to-mesenchymal cell transition (EndMT), the process by which normal ECs completely or partially lose their endothelial identity and acquire expression of mesenchymal markers, is a well-established process that occurs physiologically and in disease states (Simons M, Curr Opin Physiol 2023). In the bone marrow, EndMT has been documented in patients with myelofibrosis, where it affects the bone marrow microvasculature (Erba BG et al., Am J Pathol 2017). Single-cell RNAseq of non-hematopoietic bone marrow cells has shown the existence of a rare population of ECs that co-expresses endothelial cell markers (Cdh5, Kdr, Emcn and others) and mesenchymal cell markers, as shown in Figure 6E and F.

      We fully agree with the Reviewer that, given the large number of hematopoietic clones contributing to adult hematopoiesis (particularly granulocyte-producing clones), it may be relatively easy to detect barcode overlap with abundant mature populations, whereas overlap with HSPCs would represent a more stringent and informative metric of lineage relationships. The Polylox results presented here show the sharing of true barcodes between individual ECs and HSPCs.
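      The null control proposed by the Reviewer (how many barcodes would be shared by chance at the same pGen cut-off in a second induced mouse) amounts to comparing barcode overlap within one mouse against overlap between populations from two independently induced mice, after applying the same filter. The sketch below assumes a hypothetical representation of each sorted population as a mapping from barcode string to its pGen value; it is not the Polylox analysis software or the pipeline used in this study.

```python
from __future__ import annotations


def passing(barcodes: dict[str, float], max_pgen: float) -> set[str]:
    """Barcodes whose generation probability is below the filtering cut-off."""
    return {bc for bc, pgen in barcodes.items() if pgen < max_pgen}


def shared_barcodes(
    pop_a: dict[str, float], pop_b: dict[str, float], max_pgen: float
) -> set[str]:
    """Barcodes that pass the cut-off and occur in both populations."""
    return passing(pop_a, max_pgen) & passing(pop_b, max_pgen)


def overlap_vs_null(
    ec_mouse1: dict[str, float],
    hspc_mouse1: dict[str, float],
    hspc_mouse2: dict[str, float],
    max_pgen: float = 1e-6,
) -> tuple[int, int]:
    """Return (within-mouse overlap, cross-mouse overlap) counts.

    Barcodes shared across independently induced mice cannot reflect common
    ancestry, so the cross-mouse count estimates chance overlap at this cut-off.
    """
    within = len(shared_barcodes(ec_mouse1, hspc_mouse1, max_pgen))
    null = len(shared_barcodes(ec_mouse1, hspc_mouse2, max_pgen))
    return within, null
```

      Restricting the comparison to HSPCs rather than granulocytes, as discussed above, would make such a test more stringent, because abundant mature populations offer more opportunities for overlap.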

      Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of the endothelial cells in the adult mouse bone marrow. Using Cdh5-CreERT2 in vivo inducible system, though rare, they characterized a subset of endothelial cells expressing hematopoietic markers that were transplantable. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable and contributed to hematopoiesis with ca. 1% chimerism in a stress hematopoiesis condition (5-FU) and recruited to the peritoneal cavity upon Thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells in a single cell fashion, suggesting a predominant HSPCs-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells with blood-forming potential, corresponding to their high Runx1 expressing property.

      The conclusion regarding the characterization of hematopoietic-related endothelial cells in adult bone marrow is well supported by data. However, the paper would be more convincing, if the function of the endothelial cells were characterized more rigorously.

      We thank the Reviewer for the supportive comments about our study.

      (1) Ex vivo culture of CD45-VE-Cadherin+ZsGreen EC cells generated CD45+ZsGreen+ hematopoietic cells. However, given that FACS sorting can never achieve 100% purity, there is a concern that hematopoietic cells might arise from the ones that got contaminated into the culture at the time of sorting. The sorting purity and time course analysis of ex vivo culture should be shown to exclude the possibility.

      We agree that FACS sorting can never achieve 100% cell purity and that sorting purity is critical for interpreting the ex vivo culture experiments presented in our study. As requested by the Reviewer, we have now documented the purity of the sorted endothelial cell (EC) population used in the ex vivo culture experiments. The post-sort purity of CD45<sup>-</sup>VE-cadherin<sup>+</sup>ZsGreen<sup>+</sup> ECs was 96.5%; these data are now shown in the revised Figure 2B (Post Sort Purity panel). This purity level is comparable to that of the sorted ECs shown in Figure S2I (94.5%).

      While we agree that a detailed time-course analysis of hematopoietic cell output from EC cultures could further strengthen the conclusion that bone marrow ECs can produce hematopoietic cells ex vivo, we wish to call attention to an additional critical control in the experiment shown in Figure 2B-D. In this experiment, we co-cultured CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells from Cdh5-CreERT2/ZsGreen mice, rather than ECs, and examined whether these hematopoietic cells could produce ZsGreen<sup>+</sup> progeny after 8 weeks of culture under the same conditions used in EC co-cultures (conditions not designed to support hematopoietic cells long-term). Unlike ECs, the CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells did not generate ZsGreen<sup>+</sup> hematopoietic cells by the end of the 8-week culture, indicating that the culture conditions are not permissive for the maintenance, proliferation and differentiation of hematopoietic cells. This provides strong evidence that even if a few hematopoietic cells contaminated the sorted ECs, they would not contribute to the EC-derived production of hematopoietic cells at the 8-week time point. We have revised the Results text describing Figure 2B-D.

      (2) Although it was mentioned in the text that the experimental mice survived up to 12 weeks after lethal irradiation and transplantation, the time-course kinetics of donor cell repopulation (>12 weeks) would add a precise and convincing evaluation. This would be absolutely needed as the chimerism kinetics can allow us to guess what repopulation they were (HSC versus progenitors). Moreover, data on either bone marrow chimerism assessing phenotypic LT-HSC and/or secondary transplantation would dramatically strengthen the manuscript.

      The original manuscript reported survival and engraftment up to 12 weeks post-transplantation. The recipient mice have now been monitored for up to 10 months post-transplantation. These extended survival and engraftment data are now included in the revised Figure 2I and J, replacing the previous 10-week analyses.

      We agree with the Reviewer that the time-course kinetics of donor cell repopulation would help define adult endothelial-to-hematopoietic transition (EHT) and the hematopoietic cell types produced by adult EHT. We did not perform serial time-course sampling of peripheral blood other than at the 10-week and 10-month time points. Because the recipient mice were lethally irradiated and therefore more susceptible to infection, we sought to minimize repeated interventions that could compromise animal health and survival. We therefore prioritized long-term survival and endpoint analysis over repeated longitudinal sampling. Nonetheless, the long-term survival (10 months) and multilineage hematopoietic reconstitution after lethal irradiation provide functional evidence that adult EHT produced at least some LT-HSCs.

      We acknowledge that phenotypic assessment of bone marrow LT-HSC chimerism and/or secondary transplantation would further strengthen the manuscript. We have clarified these limitations in the revised manuscript under “Limitations of the study”.

      (3) The conclusion by the authors, which says "Adult EHT is independent of pre-existing hematopoietic cell progenitors", is not fully supported by the experimental evidence provided (Figure 4 and Figure S3). More recipients with ZsGreen+ LSK must be tested.

      We agree with the Reviewer that, in most cases, a larger number of experimental data points helps strengthen the conclusions, and that additional mice transplanted with ZsGreen<sup>+</sup>-enriched LSKs would be desirable. However, we do not believe that additional mice transplanted with ZsGreen<sup>+</sup> LSKs would strengthen the conclusions drawn from the experimental results shown in Figure 4D, in which we used 6 mice transplanted with ZsGreen-depleted (ZsGreen<sup>-</sup>) LSKs and 2 mice transplanted with ZsGreen<sup>+</sup>-enriched (ZsGreen<sup>+</sup>) LSKs. The independence of adult EHT from “pre-existing hematopoietic cell progenitors” rests on the following experimental results and the conclusions drawn from them.

      First, ZsGreen<sup>-</sup> LSKs (purity 99%) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 6). These ZsGreen<sup>-</sup> LSKs robustly reconstituted hematopoiesis, demonstrating successful engraftment. Importantly, tamoxifen administration to the recipients of ZsGreen<sup>-</sup> LSKs produced no detectable ZsGreen<sup>+</sup> cells in the blood for up to 6 months post transplantation (Figure 4D, blue line encompassing the results of the 6 mice). This result demonstrates that the transplanted ZsGreen<sup>-</sup> hematopoietic progenitors and their progeny do not acquire ZsGreen labeling in vivo following tamoxifen treatment, indicating that they lack the Cre-recombinase. This result is consistent with the endothelial specificity of Cdh5 expression.

      Second, ZsGreen<sup>+</sup> LSKs (accounting for ~50% of the LSKs) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 2). This arm of the experiment was performed in part as a technical control to confirm successful engraftment and detection of ZsGreen<sup>+</sup> hematopoietic cells in the transplant setting. Importantly, in the two tamoxifen-treated recipients of ZsGreen<sup>+</sup> LSKs (Figure 4D, two green lines, one per mouse), the level of ZsGreen<sup>+</sup> blood cells stabilized between weeks 10 and 24, reflecting an equilibrium between the proportions of ZsGreen<sup>+</sup> and ZsGreen<sup>-</sup> cells in the blood. This indicates that pre-existing ZsGreen<sup>+</sup> LSKs are not responsible for tamoxifen-induced increases in ZsGreen<sup>+</sup> hematopoietic cells in the blood.

      Together, the results from this experiment demonstrate that, in the setting of transplantation, tamoxifen does not induce ZsGreen labeling of ZsGreen<sup>-</sup> hematopoietic progenitors or their progeny. This result strongly supports the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise independently of pre-existing or inducible hematopoietic progenitors. We have revised the text to clarify these experiments and to present the results in a simplified manner.

      Strengths:

      The authors used multiple methods to characterize the blood-forming capacity of genetically and phenotypically defined endothelial cells from several reporter mouse systems. The use of Polylox barcoding to trace the contribution of adult bone marrow endothelial cells to hematopoiesis provides strong insight into their lineage contribution.

      Weaknesses:

      It is unclear what the biological significance of the blood cells de novo generated from the adult bone marrow endothelial cells is. Moreover, since the frequency is very rare (<1% bone marrow and peripheral blood CD45+), more data regarding its identity (function, morphology, and markers) are needed to clearly exclude the possibility of contamination/mosaicism of the reporter mice system used.

      We agree that the biological significance and functional roles of hematopoietic cells generated de novo from adult bone marrow ECs remain important open questions. We also agree that the output of hematopoietic cells from adult EHT is low, but rare events can be important, particularly as they pertain to stem/progenitor cell biology. Both points are described under “Limitations of the study”. The primary goal of the present study was to address the question whether adult bone marrow ECs can undergo EHT. We believe that the combination of various mouse transgenic lines, different Cre-ER, different reporters (ZsGreen and mTmG), including the s.c. barcoding reporter (PolyloxExpress), different approaches to evaluate hematopoiesis in vivo and ex vivo, makes it rather unlikely that our conclusions are driven by an artifact related to a specific leaky reporter, contamination, or problems with one of the Cre-lines. The experiment where we find no tamoxifen-induced labeling of transplanted ZsGreen<sup>-</sup> LSKs derived from the Cdh5-CreERT2/ZsGreen mice is strongly supportive of the existence of adult EHT, virtually excluding a contribution of contaminant hematopoietic cells.

      Reviewer #2 (Recommendations for the authors):

      (1) There is a discrepancy in peripheral blood composition between the two reporters (mTmG and ZsGreen) (Figure 1G and Figure S1K), especially the contrasting B cell proportions between the two models. Additional comments on these data should be added.

      In the revised Results section, we now note that the mTmG and ZsGreen reporters show slightly different labeling efficiencies or kinetics. These differences have been reported previously [5] and have been attributed to relative reporter leakiness, sensitivity to tamoxifen, or different kinetics of Cre recombination. As suggested, these comments have been added to the text following the description of Figure S2A.

      (2) Experimental methods concerning cell transplantation/transfer need more information, such as: a) using or not using rescue cells and how many cells are they if using, b) single or split dose of irradiation, c) when were cells transplanted following irradiation, etc. Otherwise, the data are uninterpretable.

      We have ensured that the Materials and Methods section under “Bone marrow ablation and transplantation” contains all the information requested by the Reviewer.

      (3) Some of the grouped data haven't been statistically analyzed.

      We have reviewed all data and performed appropriate statistical analyses where comparisons were made. In the revised figures and legends, all grouped datasets now include statistical tests, and p-values are indicated (added to Figures 3H, 3I and 4G).

      (4) Some flow cytometry plots show quantitative values and others do not. Quantitative information is absolutely needed in all flow cytometry plots.

      We have updated the flow cytometry figures to include quantitative values (percentages or absolute counts) in all relevant plots (Figure 2B [new panel, bottom left], Figure 2C, and Figures S1G and S1H).

      (5) It is more relevant to present the Emcn/VE-Cadherin plot from gated CD45+/ZsGreen+ cells, not from the CD45-/ZsGreen+ fraction (Figure 2C), as the latter are not EHT-derived offspring but rather common phenotypic endothelial cells.

      As requested, we have added the suggested flow cytometry plot. The revised Figure 2C now includes an Emcn vs. VE-Cadherin plot from the gated CD45<sup>+</sup>ZsGreen<sup>+</sup> population. This complements the existing panel, which confirms that CD45<sup>-</sup>ZsGreen<sup>+</sup> cells retain endothelial cell markers after culture, whereas the CD45<sup>+</sup>ZsGreen<sup>+</sup> cells do not express endothelial markers. The figure legend has been updated to explain the new panel. We agree that this plot more directly highlights the phenotype of the presumed EHT-derived cells.

      (6) To show the effect of the ex vivo culture, the authors should present the absolute number of CD45+ZsGreen+ cells in the pre-/post-culture; otherwise, the data are uninterpretable (Figure 2D).

      Our interpretation of the Reviewer’s comment above (relative to the experiment shown in Figure 2B-D) is that the Reviewer would like us to provide the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells introduced into the co-culture (supplemented with unsorted BM cells, ZsGreen<sup>+</sup> hematopoietic cells or ZsGreen<sup>+</sup> ECs) and the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. Currently, Figure 2D shows the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. The average input of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells was 2.93 × 10<sup>6</sup> for unsorted BM cells and 1.68 × 10<sup>6</sup> for ZsGreen<sup>+</sup> hematopoietic cells; for sorted ZsGreen<sup>+</sup> ECs, it was estimated at up to 100.

      (7) It is confusing to see Figures 2F and 2G, which apparently show the data from the middle of the experimental procedure (Figure 2E). Those data should be labelled clearly regarding which procedures of the whole experiment protocol.

      As correctly noted by the Reviewer, Figures 2F and 2G provide data that relate to the middle of the graphical representation of the experiment shown in Figure 2E. We see how this may be confusing.

      Therefore, we have updated both the figure labeling and legend to explicitly indicate that Figure 2F and 2G provide the FACS sorting results for the cells used for transplantation. The revised legend now reads: “Representative flow cytometry plots of the non-adherent cell fraction after 8 weeks of co-culture (cells used for transplantation).”

      References

      (1) Kucinski, I., Campos, J., Barile, M., Severi, F., Bohin, N., Moreira, P.N., Allen, L., Lawson, H., Haltalli, M.L.R., Kinston, S.J., et al. (2024). A time- and single-cell-resolved model of murine bone marrow hematopoiesis. Cell Stem Cell 31, 244-259.e10. https://doi.org/10.1016/j.stem.2023.12.001.

      (2) Identification of a clonally expanding haematopoietic compartment in bone marrow. The EMBO Journal. https://link.springer.com/article/10.1038/emboj.2012.308.

      (3) Pei, W., Shang, F., Wang, X., Fanti, A.-K., Greco, A., Busch, K., Klapproth, K., Zhang, Q., Quedenau, C., Sauer, S., et al. (2020). Resolving Fates and Single-Cell Transcriptomes of Hematopoietic Stem Cell Clones by PolyloxExpress Barcoding. Cell Stem Cell 27, 383-395.e8. https://doi.org/10.1016/j.stem.2020.07.018.

      (4) Pei, W., Feyerabend, T.B., Rössler, J., Wang, X., Postrach, D., Busch, K., Rode, I., Klapproth, K., Dietlein, N., Quedenau, C., et al. (2017). Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548, 456–460. https://doi.org/10.1038/nature23653.

      (5) Álvarez-Aznar, A., Martínez-Corral, I., Daubel, N., Betsholtz, C., Mäkinen, T., and Gaengel, K. (2020). Tamoxifen-independent recombination of reporter genes limits lineage tracing and mosaic analysis using CreERT2 lines. Transgenic Res 29, 53–68. https://doi.org/10.1007/s11248-019-00177-8.

    1. 13.6. Design Analysis: Mental Health

      We want to provide you, the reader, a chance to explore mental health more. We want you to consider potential benefits and harms to the mental health of different people (benefits like reducing stress, feeling part of a community, finding purpose, etc., and harms like unnecessary anxiety or depression, opportunities and encouragement of self-bullying, etc.). As you do this, you might consider personality differences (such as introverts and extroverts) and neurodiversity, the ways people’s brains work and process information differently (e.g., ADHD, Autism, Dyslexia, Face blindness, depression, anxiety). But be careful when generalizing about different neurotypes (such as Autism), especially if you don’t know them well. Instead, try to focus on specific traits (that may or may not be part of a specific group) and the impacts on them (e.g., someone easily distracted by motion might…, or someone sensitive to loud sounds might…, or someone already feeling anxious might…). We will be doing a modified version of the five-step CIDER method (Critique, Imagine, Design, Expand, Repeat). While the CIDER method normally assumes that making a tool accessible to more people is morally good, if that tool is potentially harmful to people (e.g., it gives people unnecessary anxiety), then making the tool accessible to more people might be morally bad. So instead of just looking at the assumptions made about people and groups using a social media site, we will also be looking at potential harms to different people and groups using a social media site. So open a social media site on your device. Then do the following (preferably on paper or in a blank computer document):

      This section asks readers to think about how social media affects mental health in both good and bad ways. It suggests considering different types of people, like introverts, extroverts, and people with ADHD or anxiety, instead of making broad generalizations. The goal is to look at specific traits and how social media might help or harm people with those traits, especially when using the CIDER method to evaluate design choices.

    1. You aren’t likely to end up in a situation as dramatic as this. If you find yourself making a stand for ethical tech work, it would probably look more like arguing about what restrictions to put on a name field (e.g., minimum length), prioritizing accessibility, or arguing that a small piece of data about users is not really needed and shouldn’t be tracked. But regardless, if you end up in a position to have an influence in tech, we want you to be able to think through the ethical implications of what you are asked to do and how you choose to respond.

      Although in this case the engineer was able to successfully stand up against the unethical aspects of what they were doing, I think in other circumstances it may not be so easy. Engineers who don't comply could simply be fired, or the company could find other workarounds if not everyone is on the same page, as they were in this case.

    1. Reviewer #1 (Public review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (CRY) in promoting wakefulness under short photoperiods. This research is potentially important, as hypersomnolence is often seen in patients suffering from seasonal affective disorder (SAD) during winter. The mechanisms underlying these sleep effects are poorly understood.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of CRY action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that CRY acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this CRY action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      While the authors have made improvements in this resubmitted manuscript, there are still multiple concerns about the paper. I think the authors provide enough evidence suggesting that CRY plays a role in sleep under short photoperiods. The data also support that CRY acts in GABAergic neurons. However, there are still major issues with the quality of the confocal images presented throughout the paper. In many cases the images appear oversaturated and of poor resolution, making it hard to understand what is going on. In addition, none of the drivers used in this study are specific to the neurons the authors aim to manipulate. Therefore, the identity of the GABAergic neurons involved in this CRY-dependent sleep mechanism remains unclear. Similarly, whether the l-LNvs are the target of this GABA-mediated sleep regulation under short photoperiods is not fully demonstrated. The data presented suggest this but do not prove it.

      Major concerns:

      (1) While the authors provided sleep parameters such as consolidation or waking activity for some experiments, these measurements are still not shown for several others (for example, Figures 2E, 3, 4, 5, and 6). These data are essential; these metrics must be reported for all sleep experiments.

      (2) Line 144 "We fed flies with agonists of GABA-A (THIP) and GABA-B receptor (SKF-97541) (Ki and Lim, 2019; Matsuda et al., 1996; Mezler et al., 2001). Both drugs enhance sleep in WT," The proper citation is needed here: Dissel et al., 2015 (PMID: 25913403). Both THIP and SKF-97541 were used in that paper.

      (3) Figure 2C and 2F: it appears that the control data is the same in both panels. That is not acceptable.

      (4) Figure 4A: With the quality of the images, it is impossible to assess whether GABA levels are increased at the l-LNvs soma.

      (5) Fig 4 S1A shows colabeling of l-LNvs and Gad1-Gal4-expressing neurons. The signals overlap almost completely. This would indicate that the l-LNvs are themselves GABAergic, or that there is a problem with this experiment.

      (6) Fig 4 S1B: Again, I can see colabelling of the GFP and PDF staining, suggesting that Gad1-Gal4 expresses in l-LNvs.

      (7) Line 184: "Consistently, knocking down Rdl in the l-LNvs rescues the long sleep phenotype of cry mutants (Figure 4-figure supplement 1D)." This statement is incorrect as the driver used for this experiment, 78G01-GAL4 is not specific to the l-LNvs, so it is possible that the phenotypes observed are not coming from these neurons.

      (8) Figure 4G-K: None of these manipulations are specific to the l-LNvs. The authors describe 10H10-GAL4 and 78G01-GAL4 as l-LNvs specific tools, but this is not the case. Why not use the SS00681 Split-GAL4 line described in Liang et al., 2017 PMID: 28552314? It is possible that some of the effects reported in this manuscript are not caused by manipulating the l-LNvs.

      (9) Similarly, for the manipulation of s-LNvs, the authors cannot rule out effects coming from other cells, as R6-GAL4 is not specific to s-LNvs.

      (10) The staining presented in Fig 5 S1 is not very convincing. It is difficult to see whether Gad1-GAL4 expresses only in the s-LNvs.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the authors' efforts in addressing the concerns raised, particularly the inclusion of a variance partitioning approach to analyse their data. Detailed feedback on the revised manuscript is below, and we include a brief list of comments that we think the authors could address in the text:

      (1) Justify metric selection - Could you please include in the text an explanation for why only five behavioural metrics were highlighted out of the many you calculated?

      We have added explanations throughout the manuscript clarifying the rationale for selecting these behavioral parameters, including in lines 467ff. and 531ff. In short, the five highlighted metrics were chosen because they capture key aspects of the behavioral repertoire and, importantly, can be consistently measured across all experimental conditions. Other parameters were excluded as they were only applicable under specific contexts and thus not suitable for cross-condition comparisons.

      (2) Discuss ICC variation - We note that there is variation among the ICC scores for the different metrics you've studied. While this is expected, we ask that you acknowledge in the text that some traits show high repeatability and others low, and reflect this variation in the conclusions.

      We have added an additional paragraph in the Discussion (lines 743ff.) addressing the variation in ICC values among behavioral traits. This new section highlights that some metrics show high repeatability while others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions about individual behavioral stability across contexts.

      (3) Tone down general claims - Because of the above point, we recommend that you avoid overstating that individuality persists across all behaviours. Please clarify in the Abstract and main text that this applies to some traits more than others.

      We carefully reviewed the entire manuscript and revised the phrasing wherever necessary to avoid overgeneralization. Statements about individuality have been adjusted, both in the Abstract and throughout the main text, to clarify that consistent individuality is measured more strongly in some behavioral traits than in others.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts. 

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavioral assays, and the sharing of information on how to build them, is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated with r = 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
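      For a simple linear relation between two measures, the coefficient of determination equals the square of Pearson's r; that is the step behind the 7% and 96% figures above. The minimal Python check below (the helper name `variance_explained` is hypothetical) squares the two coefficients quoted in this comment.

```python
def variance_explained(r):
    """Share of variance in one measure explained by a linear fit on the other: R^2 = r^2."""
    return r ** 2

# Correlation coefficients quoted in the review.
quoted = {
    "% time walked, 23 °C vs 32 °C": 0.263,
    "vector strength, Buridan vs Y-maze": -0.197,
}

for label, r in quoted.items():
    r2 = variance_explained(r)
    # Print r, R^2 and the share of variance left unexplained.
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, not explained = {1 - r2:.1%}")
```

      This prints R² values of 0.069 and 0.039, leaving 93.1% and 96.1% of the variance unexplained, respectively. The sign of r does not matter for R².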

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
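      One way to implement the in-group rank transformation suggested above is to rank each context's values separately and then correlate the ranks, which is Spearman's rank correlation. The dependency-free Python sketch below is a hypothetical illustration with made-up values (all function names are assumptions), not the analysis used in the manuscript. Because ranks are unchanged by any monotone shift or rescaling within a context, differences in group means or scale between contexts do not affect the result.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def within_context_rank_correlation(x, y):
    """Rank each context's values separately, then correlate the ranks (Spearman's rho)."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical walking speeds of five flies in two contexts; context B is a
# monotone, non-linear and shifted transform of context A.
context_a = [1.0, 2.0, 3.0, 4.0, 5.0]
context_b = [v ** 3 + 10 for v in context_a]
print(pearson(context_a, context_b))                          # below 1
print(within_context_rank_correlation(context_a, context_b))  # 1.0
```

      In practice a library routine such as SciPy's `spearmanr` computes the same quantity.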

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of the open hardware and open software, a report would ideally demonstrate transfer through collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible. 

      We thank the reviewer for the careful and thoughtful assessment of our work.

      We have added an additional paragraph in the Discussion (lines 743ff.) explicitly addressing the variation in ICC values among behavioral traits. This section emphasizes that while some metrics show high repeatability, others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions regarding individual behavioral stability across contexts.

      Regarding the reviewer’s concern about the analytical approach, we would like to clarify that the hierarchical linear mixed model (LMM) was applied in a univariate framework—each behavioral metric was analyzed separately to estimate its individual ICC value. This approach allows us to quantify repeatability for each trait across environmental contexts while accounting for individual identity as a random effect. Although this is not a multivariate model in the strict sense, it represents an improvement over the prior pairwise correlation approach because it explicitly partitions within- and between-individual variance.
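      To make the variance-partitioning step concrete: with context as a fixed effect and fly identity as a random intercept, the ICC is the between-fly variance divided by the sum of the between-fly and residual variances. The authors fitted this model per trait with Matlab's `fitlme`; the Python sketch below swaps in a simpler method-of-moments (two-way ANOVA) estimator and assumes a complete, balanced fly × context table. The function name and example values are hypothetical, and the result is not guaranteed to reproduce the authors' fitted values, which depend on the fitting method (ML or REML) chosen.

```python
def icc_context_fixed(y):
    """Repeatability (ICC) for a complete fly-by-context table y[fly][context].

    Assumed model: y = grand mean + context effect (fixed) + fly effect (random) + residual.
    Variance components come from two-way ANOVA mean squares (method of moments).
    """
    n, k = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (n * k)
    fly_means = [sum(row) / k for row in y]
    context_means = [sum(y[i][j] for i in range(n)) / n for j in range(k)]
    # Mean square between flies; in a balanced design every fly mean contains
    # the same average context effect, so context does not inflate this term.
    ms_fly = k * sum((m - grand) ** 2 for m in fly_means) / (n - 1)
    # Residual mean square after removing both fly and context effects.
    ss_resid = sum(
        (y[i][j] - fly_means[i] - context_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    var_fly = max(0.0, (ms_fly - ms_resid) / k)  # truncate negative estimates at zero
    total = var_fly + ms_resid
    return var_fly / total if total > 0 else float("nan")

# Hypothetical trait values: 3 flies (rows) measured in 2 contexts (columns).
example = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
print(icc_context_fixed(example))  # 0.5
```

      Because each trait is passed in separately, this is a univariate calculation, matching the per-trait framework described above; values near 1 mean that most variation is between flies rather than within a fly across contexts.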

      As for the selection of behavioral metrics, the five parameters highlighted (% time walked, walking speed, vector strength, angular velocity, and centrophobicity) were chosen because they represent key, biologically interpretable dimensions of locomotor and spatial behavior and, importantly, could be measured reliably across all tested conditions. Several other parameters that we routinely analyze (e.g., Linneweber et al., 2020) could not be calculated in all contexts—for instance, under darkness or when visual cues were absent—and therefore were excluded to maintain consistency across assays.

      We agree that a truly holistic multivariate comparison across all extracted parameters would be valuable; however, given the contextual limitations of some metrics, such an analysis was not feasible in the present framework. We have clarified these points in the revised manuscript to avoid potential misunderstandings.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      We thank the reviewer for this helpful comment and agree that not all behavioral traits exhibit the same degree of inter-context consistency. We have clarified this point in the revised Abstract and ensured that it is also reflected in the main text. The Abstract now reads: 

      “We find that individuality is highly context-dependent, but even under the most extreme environmental alterations tested, consistency of behavioral individuality always persisted in at least one of the traits. Furthermore, our quantification reveals a hierarchical order of environmental features influencing individuality. We confirmed this hierarchy using a generalized linear model and a hierarchical linear mixed model. In summary, our work demonstrates that, similar to humans, fly individuality persists across different contexts (albeit worse than across time), and individual differences shape behavior across variable environments. The presence of consistency across situations in flies makes the underlying developmental and functional mechanisms amenable to genetic dissection.” 

      This revision clarifies that individuality is not uniformly expressed across all behavioral metrics, but rather in a subset of traits with higher repeatability, which are the most promising targets for future genetic analyses.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

      We thank the reviewer for drawing attention to this inconsistency in terminology. We apologize for the oversight and have corrected it throughout the manuscript to ensure uniform usage.

      Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anticonservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and, my guess is, much more streamlined if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g., Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined, and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".)
So were different models run for each scenario? for different behaviors? Across scenarios? what exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper, as they'll be using methods that are standard in the study of individual behavioral variation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      I am delighted to see the authors have included hierarchical models in their analysis. I really think this strengthens the paper and their conclusions while simultaneously making it more accessible to folks that typically use these types of methods to investigate these patterns of individual behavior. It's also cool, and completely jibes with my own experience measuring individual behavior, that the activity metrics show the highest repeatability compared to the more flexible behaviors (such as "exploration"). I think it's quite striking and interesting to see such moderate repeatability estimates in these behaviors across what could be very different environmental scenarios. I think this is a very strong and meaty paper with a lot of information to digest, producing, however, a very elegant and convincing take-home message: individuals are unique in their behavior even across very different environments.

      We sincerely thank the reviewer for the positive and encouraging feedback, as well as for their valuable input throughout the review process. We are very pleased that the inclusion of hierarchical models and the resulting interpretations resonated with the reviewer’s own experience and perspective.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5⁺-0⁻-1⁻-2⁺; Sequence #2: 3⁺-0⁻-1⁻-4⁺), forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trials/positions involving an overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4), indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small - can't fully rule out idiosyncratic, animal-specific effects.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      We agree that the emergence of sequence-dependent OFC activity at overlapping positions (e.g., P3) implies knowledge of the broader task structure and therefore must depend on learning. Although we did not record during early acquisition in the current study, we can outline a learning-stage framework consistent with both prior work and the comparative analyses included here, and we have added it to the Discussion.

      We think the development of OFC representations is a multi-stage process. Early in learning, before animals have acquired the sequential structure of the task, OFC activity is likely dominated by local sensory features and immediate reinforcement history, with little differentiation between sequences at overlapping positions. As animals learn that odors are embedded within extended sequences that have utility for predicting future outcomes, OFC representations would begin to differentiate identical sensory cues based on their sequence context, giving rise to sequence-dependent activity at positions such as P3. This stage reflects acquisition of the broader task structure and the recognition that current cues carry information about future states.

      With continued training, however, OFC representations normally undergo a further refinement: positions that differ in sensory identity but are functionally equivalent become compressed, while distinctions that are irrelevant for guiding behavior are suppressed. Evidence for this later stage comes from our over-trained control animals, in which discrimination between overlapping positions is near chance across most trial epochs, and from prior work using the same task in less-trained animals, where sequence-dependent discrimination is more strongly preserved. Thus, sequence differentiation appears to emerge during structure learning but is subsequently down-weighted as animals learn which distinctions are behaviorally irrelevant.

      Within this framework, prior cocaine exposure appears to interfere specifically with this later refinement stage. Cocaine-experienced rats exhibit OFC representations resembling those seen earlier in learning—retaining sequence-dependent discrimination at overlapping and functionally equivalent positions—despite extensive training. This suggests not a failure to acquire task structure per se, but rather an impairment in the ability to collapse across states that share common underlying causes.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      Thanks for your suggestion; we have removed this supplemental figure.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      Thanks for your suggestion; we have added the % correct comparison to Figure 1 as suggested.

      (4) Description of the TCA component

      Line 220: authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      Thanks for your suggestion. We appreciate this point and agree that clearer guidance on how to interpret the temporal and scaling properties of the tensor components would help readers. In the TCA framework, each component is defined by three separable factors: a neuron factor, a temporal factor, and a trial (position) factor. The temporal factor reflects the shape of the activity pattern within a trial, indicating when during the trial that component is expressed, whereas the trial factor reflects how strongly that temporal pattern is expressed at each position and across trials.

      Importantly, the absolute scaling of these factors is not independently meaningful. Because TCA components are scale-indeterminate, the magnitude of the temporal factor and the trial factor should be interpreted relative to one another within a component, not across components. Thus, a large value in the trial factor does not imply stronger neural activity per se, but rather greater expression of that component’s characteristic temporal pattern at that position or trial.
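      The scale indeterminacy described above can be checked numerically: a rank-one TCA component is the outer product of its neuron, temporal, and trial factors, so multiplying one factor by any constant and dividing another by the same constant reproduces exactly the same tensor. The sketch below uses hypothetical factor values (three neurons, four time bins, positions P1 to P4) and plain Python rather than any particular TCA library; the helper names cp_component and max_abs_diff are illustrative.

```python
def cp_component(neuron, time, trial):
    """Rank-one tensor X[n][t][r] = neuron[n] * time[t] * trial[r]."""
    return [[[a * b * c for c in trial] for b in time] for a in neuron]


def max_abs_diff(x, y):
    """Largest element-wise difference between two equally shaped 3-D lists."""
    return max(
        abs(p - q)
        for xn, yn in zip(x, y)
        for xt, yt in zip(xn, yn)
        for p, q in zip(xt, yt)
    )


# Hypothetical factors for one component.
neuron = [1.0, 0.5, 2.0]          # loading of each of 3 neurons
time = [0.1, 0.8, 0.3, 0.05]      # within-trial temporal profile (4 bins)
trial = [1.0, 0.4, 0.6, 1.2]      # expression at positions P1..P4

original = cp_component(neuron, time, trial)

# Shrink the trial factor 4x and enlarge the temporal factor 4x.
rescaled = cp_component(neuron, [4 * b for b in time], [c / 4 for c in trial])

# Same tensor (up to floating-point rounding), different factor magnitudes.
print(max_abs_diff(original, rescaled) < 1e-12)  # True
```

      What survives rescaling is the pattern within each factor, e.g., the ratio of the trial factor at P4 to that at P2, which is why the trial factor is read relative to itself within a component. A common convention in CP/TCA decompositions is to normalize each factor to unit norm and collect the overall magnitude in a separate per-component weight.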

      Accordingly, when a component shows similar temporal dynamics across groups but differs in its trial factor structure—as observed here—the interpretation is that the same within-trial dynamics are being differentially recruited across task positions, rather than that the timing of neural responses has changed.

      We have added a brief discussion of this in this section of the results in the manuscript.

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      We agree that sucrose self-administration is not a perfect neutral manipulation and that this experience could, in principle, influence OFC representations. In particular, sucrose self-administration involves instrumental responding for the same primary reinforcer used in the odor task, and thus may promote additional learning about reward predictability, action–outcome contingencies, or contextual structure that could facilitate generalization.

      Several considerations, however, suggest that the generalization observed in control animals primarily reflects learning-dependent refinement of task representations rather than a specific consequence of sucrose self-administration per se. First, the amount of sucrose administered during this phase was minimal (50 µl × 60 presses at most per session for 14 sessions) compared with the total sucrose reward obtained during task recording (100 µl × 160 trials per session for several dozen sessions). Second, all rats were extensively trained on the odor sequence task prior to any self-administration, and the key signatures of compression and generalization we report—near-chance discrimination between functionally equivalent positions—are consistent with prior studies using the same task in animals that did not undergo sucrose self-administration. Finally, comparisons to less-trained animals in earlier work show that OFC representations evolve toward greater abstraction with increasing task experience, indicating that generalization is a property of advanced learning rather than a unique outcome of sucrose exposure.
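      The volume comparison above can be made explicit with a back-of-the-envelope calculation that uses only the figures stated in the response (and assumes the upper bound of 60 presses was reached in every one of the 14 self-administration sessions):

```latex
\begin{aligned}
V_{\text{SA}} &\le 50\,\mu\text{l} \times 60 \times 14 = 42{,}000\,\mu\text{l} = 42\ \text{ml} \\
V_{\text{task}} &= 100\,\mu\text{l} \times 160 = 16{,}000\,\mu\text{l} = 16\ \text{ml per session} \\
V_{\text{SA}} / V_{\text{task}} &\le 42 / 16 \approx 2.6\ \text{task sessions}
\end{aligned}
```

      In other words, the entire self-administration phase delivered at most about as much sucrose as two to three recording sessions, and a single recording session (16 ml) delivers more than five times the maximum daily self-administration volume (3 ml).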

      Importantly, even if sucrose self-administration were to enhance generalization in OFC, this would not account for the primary finding that cocaine-experienced rats fail to show these signatures despite identical task training and parallel instrumental experience. Thus, the critical comparison is not between sucrose-trained animals and naive controls, but between two groups matched for self-administration experience, differing only in the pharmacological consequences of the reinforcer. Within this framework, the absence of position-general representations in cocaine-experienced rats reflects a disruption of normal learning-dependent abstraction rather than an artifact of the control condition.

      We have added a brief discussion acknowledging that sucrose self-administration may bias OFC toward abstraction, while emphasizing that cocaine exposure prevents the emergence or maintenance of these representations under otherwise comparable experiential conditions.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      We acknowledge that the number of animals per group is relatively small and therefore cannot fully rule out animal-specific effects. However, the key neural and behavioral signatures reported here were consistent across individual animals within each group and across multiple levels of analysis, and no outliers were observed. In addition, sample sizes of this scale are common in cocaine self-administration studies due to their technical and logistical constraints. We did not attempt to obscure this limitation and have now explicitly acknowledged it in the manuscript discussion.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

      Thank you for pointing this out. We agree that the ordering of task positions in Figures 3E–F should be consistent with the rest of the manuscript. We have reordered the positions to match the standard sequence order used elsewhere in the paper (P1, P2, P3, P4) to improve clarity and avoid confusion.

      Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggests that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      We appreciate this suggestion and have tried to expand the Introduction to more explicitly situate the study within the existing literature on cocaine-induced changes in OFC function. In particular, prior work has shown that cocaine self-administration alters OFC firing properties and disrupts behavioral flexibility across species, including impairments in reversal learning, outcome devaluation, and sensory preconditioning. We have revised the Introduction to expand this literature review and more clearly articulate how these established findings motivated our focus on OFC representations of hidden task structure and generalization.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%- can the y axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3- is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      We agree that the current 0–100% scale can make small differences difficult to discern. We will adjust the y-axis to a narrower range to better highlight group differences and will state the adjusted range in the figure caption. The reviewer is correct that, at P3, cocaine-experienced rats were more accurate than controls.

      We appreciate the suggestion to expand the discussion. We have revised the concluding section to acknowledge key limitations, including the use of only male rats, the number of subjects, and to note that alternative explanations—such as differences in motivational state or attention—could also contribute to the observed effects. These revisions provide a more balanced interpretation while retaining the focus on OFC-mediated generalization as a potential mechanism for persistent, context-specific drug-seeking.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      We thank the reviewer for this point. While neuronal encoding of individual positions (specific odors) in control animals was comparatively lower, this does not indicate that the rats were using a simpler strategy based solely on reward patterns. First, rats were extensively trained on the odor sequence task prior to recordings, demonstrating accurate discrimination across all positions, and their trial-by-trial behavior reflects sensitivity to specific odors rather than only reward alternation. Second, the task design—with overlapping sequences and positions that differ in reward contingency across sequences—requires tracking odor-specific context to maximize reward; a purely “two rewarded, two non-rewarded” strategy would fail at overlapping positions and would not account for the compression of functionally equivalent positions observed in the OFC. Third, in the less-trained rats shown in Figure 3C, decoding accuracy was higher than in the sucrose group, indicating that these animals still differentiated negative positions. With additional training, decoding patterns suggested improved generalization across positions. Thus, the near-chance neural selectivity in controls reflects representation of latent task states rather than external sensory cues, consistent with the idea that OFC abstracts task-relevant structure and ignores irrelevant sensory differences.

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

      At present, the basis of these response-time differences remains unclear, in part because motivation is difficult to define operationally. If motivation is indexed solely by reaction time or poke latency, then the data are consistent with increased response vigor in cocaine-experienced rats. Indeed, RT and poke-latency measures indicate that cocaine-experienced rats responded more quickly on some rewarded trials, including after P3. However, overall task performance was high in both groups, suggesting that these differences cannot be attributed simply to superior learning or engagement. Faster responses may also reflect differences in deliberation or strategy, with cocaine-experienced rats relying more on rapid, stimulus-driven responding and sucrose-trained rats engaging in more careful evaluation. In addition, altered reward sensitivity or persistent effects of cocaine exposure may contribute to these behavioral differences. Thus, the faster responses observed in cocaine-experienced rats likely reflect a combination of heightened reward responsivity and altered encoding of task structure, rather than a straightforward increase in motivation alone.

      Recommendations for the authors:

      The reviewers were very positive about the manuscript and emphasized the rigor and state of the art analyses. Two points that came up were the very small n (6 total and 3 per condition) and the exclusive use of males. Adding more subjects is not recommended. However, more discussion and acknowledgement of this issue is recommended. The main concern is that idiosyncratic differences between individuals (not differences in cocaine history) are responsible for the differences observed in OFC encoding.

      We acknowledge that the sample size (n = 3 per group) and use of only male rats limit generalizability and do not fully rule out idiosyncratic, individual-specific effects. However, the key neural and behavioral signatures we report were consistent across all animals within each group and across multiple analyses (single-unit, ensemble decoding, and TCA). We now explicitly note these limitations in the Discussion, emphasizing that while individual variability cannot be fully excluded, the convergence of results across multiple levels of analysis supports the interpretation that the observed differences reflect effects of prior cocaine exposure rather than idiosyncratic differences.

      Reviewer #2 (Recommendations for the authors):

      In the legend to figure 2, the authors state "Notably, rats could discriminate between the two sequences (S1 vs. S2) based solely on current sensory information at two task epochs ["Odor" at P3 and P4; black bars]. At all other task epochs, indicated by gray bars, the discrimination relied on an internal memory of events". I'm confused by this statement- how does the odor at P3 help to discriminate the sequences? Surely P1 and P4 are the times when the odor sampling indicates which sequence they are in?

      We thank the reviewer for pointing out this source of confusion. The statement in the original figure legend was imprecise, and we have removed the figure and revised the figure legends because the results in the left panel substantially overlapped with those shown in the right panel. In this task, odors at positions P1 and P4 are the only cues that directly signal sequence identity, whereas the odors presented at P2 and P3 are identical across sequences. Accordingly, discrimination observed during the “Odor” epoch at P3 does not reflect sensory differences but instead depends on the animal’s use of internal memory or sequence context to infer sequence identity.

    1. What children do not see in their books also teaches them about who matters and who doesn’t in our society.

      This quote stood out to me because it shows how important representation is in children’s books. When certain groups of people are not shown in books, children may think those people are not important or do not belong. Books should include different cultures, families, and experiences so that all students can see themselves and others represented. As future teachers, we need to choose books carefully so our classroom libraries reflect the diversity of the real world.

    1. Author response:

      General Statements

      We thank the reviewers for their thoughtful and constructive comments on our manuscript. We have thoroughly considered all points raised and have made extensive revisions to address them. These revisions have significantly strengthened the manuscript.

      In summary, the key revisions and clarifications include:

      (1) Developmental Time-Course: To address the need for earlier phenotypic analysis, we have performed new immunofluorescence experiments at 30 days after hatching (dah). These new data (Fig. S7) pinpoint the onset of the Leydig cell differentiation defect in dhh<sup>-/-</sup> mutants, establishing ~30 dah as the critical window for Dhh action.

      (2) Role of Ptch1 and Ptch2: We have qualified our conclusions regarding receptor specificity throughout the text to accurately reflect our findings and the limitation posed by the early lethality of ptch1 mutants. The in vivo genetic evidence for Ptch2 (the rescue of dhh<sup>-/-</sup> by ptch2<sup>-/-</sup>) is emphasized, while we now explicitly state that a role for Ptch1 cannot be ruled out without future conditional knockout models.

      (3) Mechanism between Gli1 and Sf1: In direct response to the reviewers' request for stronger evidence, we have performed a new cold probe competition assay. This experiment provides dose-dependent, biochemical evidence for the specificity of Gli1 binding to the sf1 promoter (New Fig. 5E). Furthermore, we have revised the text throughout the manuscript to use more precise language (e.g., "Gli1 activates sf1 expression") and removed overstated claims of "direct" regulation.

      (4) Methodological Rigor and Controls: We have added crucial negative controls for all RNA-FISH experiments using sense probes (New Fig. S9), provided detailed quantification methods for immunofluorescence, clarified the number of biological replicates for transcriptomic analyses, and corrected statistical tests as recommended.

      (5) Clarity and Presentation: We have revised the text for clarity, expanded the description of the TSL cell line's validation in the Introduction, added missing details to figure legends and methods, and incorporated suggested key references.

      We believe that our detailed responses and the significant new data and textual revisions have fully addressed the reviewers' concerns and have substantially improved the quality and impact of our manuscript.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate Dhh and each downstream component, from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1, which increases Sf1 expression, leading to the onset of 11-ketotestosterone synthesis that heralds the differentiation of Leydig cells in the developing male tilapia.

      Major comments:

      (1) Are the key conclusions convincing?

      Most results as reported are convincing; however, some conclusions are premature, as additional experiments are required to support the claims. For example, the phenotype of the dhh-/- testis is convincing in that Cyp11c1 cells are missing, and the addition of ptch2-/- rescues the phenotype, indicating a direct path. The link from gli to sf1, however, requires additional study to validate the direct relationship (see item 3 below).

      We thank the reviewer for the positive assessment that our principal findings are convincing. Regarding the connection between Gli1 and Sf1, we agree that additional validation was important. We have now performed new experiments and revised our text. As detailed in our response to item 3 below, we have incorporated a cold probe competition assay (new Fig. 5E) which provides dose-dependent evidence for the specificity of Gli1 binding to the sf1 promoter. Furthermore, we have toned down our conclusions in the manuscript.

      (2) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Major: Most significant premature claim is the statement that gli1 directly controls sf1 activity. Additional experiments are required to make this claim (see next statement).

      We agree with the reviewer that the claim of "direct" control was premature. We have therefore revised the manuscript accordingly. All statements claiming "direct" regulation of sf1 by Gli1 have been removed or replaced with more accurate descriptions, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1." These changes, coupled with the new functional data from the cold probe competition experiment (Fig. 5E) described in our response to item 3, now provide a robust and appropriately qualified account of our findings.

      Minor: As addressed in the Discussion section, ptch1 mutant animals fail to survive, limiting the ability to validate the roles of both ptch1 and ptch2. Thus, the conclusion that only ptch2 is required should be qualified.

      We thank the reviewer for this rigorous comment. We fully acknowledge the limitation imposed by the early lethality of ptch1 mutants, which precludes a definitive in vivo assessment of its potential role in postnatal testis development. In direct response to this point, we have revised the text throughout the manuscript to more accurately reflect the strength of our conclusions. Specifically, in the Results section, we now state that “This differential receptor requirement implies that Ptch2 likely acts as the functional receptor for transducing Dhh signals in TSL cells” (lines 174–176). Furthermore, we have strengthened the Discussion by explicitly stating: “Therefore, while our findings strongly nominate Ptch2 as the principal receptor for Dhh in SLCs, a definitive exclusion of a role for Ptch1 will require future studies employing Leydig cell–specific conditional knockout models” (lines 265–268). We believe these revisions provide an appropriately qualified interpretation of our data while maintaining the compelling narrative of Ptch2's primary role.

      Major: A couple of key references are missing; please consider including:

      - Kothandapani A, Lewis SR, Noel JL, Zacharski A, Krellwitz K, Baines A, Winske S, Vezina CM, Kaftanovskaya EM, Agoulnik AI, Merton EM, Cohn MJ, Jorgensen JS. PLoS Genet. 2020 Jun 4;16(6):e1008810. doi: 10.1371/journal.pgen.1008810. eCollection 2020 Jun. PMID: 32497091.

      - Park SY, Tong M, Jameson JL. Endocrinology. 2007 Aug;148(8):3704-10. doi: 10.1210/en.2006-1731. Epub 2007 May 10. PMID: 17495005.

      We have included the key references: Kothandapani A, et al. (2020). PLoS Genet. and Park SY, et al. (2007). Endocrinology.

      (3) Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Additional experiments are suggested to strengthen the direct connection between gli1 and sf1:

      Major: Figure 5F shows evidence for increased sf1-luc activity upon co-transfection of OnGli1 in TSL cells. These data would be strengthened with evaluation of the same sf1 promoter that has each/both putative GLI binding sites mutated.

      We thank the reviewer for this insightful suggestion. To further strengthen the evidence for the functional connection between Gli1 and the sf1 promoter, we have performed a new cold probe competition experiment. Given the potential presence of other unpredicted Gli-binding motifs within the 5-kb sf1 promoter region and the practical constraints, we employed an alternative, robust biochemical approach. This assay used a wild-type oligonucleotide containing the canonical Gli-binding motif (GACCACCCA) as a specific competitor. As shown in the new Fig. 5E, this cold probe caused a significant, dose-dependent reduction in Gli1-induced sf1-luc activity, while a mutated control probe (TTAATTAAA) had no effect. This result provides strong evidence that Gli1-mediated transactivation of the sf1 promoter is dependent on its specific binding to this consensus motif.

      Furthermore, in response to the reviewer's comment, we have revised the manuscript text to use more precise language, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1," toning down any overstated claims of direct regulation. Together with the existing data, which include the original luciferase assay, the new competition experiment, and key loss-of-function/gain-of-function genetic evidence from SLC transplantation, we believe our study now provides a compelling and multi-faceted case for Gli1 being the key regulator of sf1 within this pathway. We are confident that these revisions have satisfactorily addressed the point raised.

      Major: All 8xGli-luciferase assays should include evaluation of the mutant 8xGli-luciferase plasmid as a negative control.

      We thank the reviewer for highlighting the importance of reporter assay controls. In our study, we included the empty vector pGL4.23, which lacks any Gli-binding sites, as the fundamental negative control. As shown in Fig. 4C, this vector showed minimal background activity that was unresponsive to Dhh, confirming that the strong luciferase induction in the 8xGli-reporter is entirely dependent on functional Gli-binding sites. While a mutated 8xGli construct is one valid approach, we think that the use of an empty vector is functionally equivalent and equally rigorous for establishing specificity. We are confident that our current data unambiguously demonstrate Gli-dependent activation. For clarity, we have explicitly stated in the figure legend and methods that pGL4.23 served as the negative control.

      Minor: Figure 5D experiment that includes TSL-gli1(also 2,3) +/- OnDhh; please examine whether the absence of Gli affects expression of sf1 in each condition. In other words, provide a loss-of-function of Gli connection to regulation of sf1.

      We measured the mRNA expression levels of sf1 in TSL-WT, TSL-gli1<sup>-/-</sup>, TSL-gli2<sup>-/-</sup>, and TSL-gli3<sup>-/-</sup> cells using qRT-PCR. The results are presented in the new Supplementary Figure S8A. The results show that the loss of gli1 leads to a significant reduction in the expression of sf1. In contrast, the knockout of gli2 or gli3 had no significant effect on sf1 expression levels.

      (4) Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Given the expertise, it is not anticipated that the suggested experiments would be a significant burden to this group.

      We appreciate the reviewer's considerations. We have now performed the additional key experiments, which have been incorporated into the revised manuscript. We believe these new data have fully addressed the points raised.

      (5) Are the data and the methods presented in such a way that they can be reproduced?

      Most methods are adequately described or referenced to previous detailed description. There were, however, some methods that could benefit from additional details:

      Major: IF quantification data: please provide details on how the number of positive cells were quantified and presented, for example, how many cells from how many sections for each genotype were included for the analysis?

      We have added relevant information in the "Materials and Methods" section (lines 369-373): “For each biological replicate (n = 5-6 fish per genotype), three non-serial, non-adjacent testis sections were analyzed. From each section, three representative fields of view were captured to ensure non-overlapping sampling. The numbers of Vasa-, Sycp3- and Cyp11c1-positive cells were quantified using Image J Pro 1.51 software with default parameters.”

      Major: FISH: No controls are present, for example, scrambled RNA probes. Further, please clarify or address the significant presence of message in the nucleus.

      As suggested, we have now included negative control experiments using sense RNA probes for all genes (ptch1, ptch2, gli1, gli2, gli3). These controls showed no specific signal, confirming the specificity of our antisense probe hybridization. These data are now presented in the new Supplementary Figure S9.

      Major: TSL cells: TSL-onDhh, -onSf1: provide evidence for increase in expression

      We measured the mRNA expression levels of dhh in TSL-WT and TSL-OnDhh, and sf1 in TSL-WT and TSL-OnSf1 using qRT-PCR. The results are presented in the new Supplementary Figure S8B. The results show that overexpression of Dhh and Sf1 significantly increased the mRNA expression levels of dhh and sf1, respectively.

      Major: TSL + SAG cells and other treatments in general: how long were they treated before transplantation?

      We have added relevant information in the "Materials and Methods" section (lines 398-399): “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before transplantation.”

      Major: Transcriptome analyses: how many replicates were used for each cell line? Please clarify-the results presented in Fig 5E: how was this plot generated, it is interpreted that all three cell lines were combined and compared to the WT line. It is not clear how this was achieved.

      We have added relevant information in the "Materials and Methods" section (lines 445-447): “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before collection. For each genotype, cells from three independent culture wells were pooled.”

      We also added relevant information in the "Results" section (lines 198-202): “…we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions.”

      (6) Are the experiments adequately replicated and statistical analysis adequate?

      Most are adequate and appropriate, some questions remain:

      - Transcriptomes-how many replicates (see above)?

      - IF quantification-how were cells identified/how many sections (see above)?

      Minor: Statistics: the Methods indicate that Student's t-test was used, but ANOVAs are also used, which is appropriate. Some data should be reevaluated via ANOVA: Figure 4D, 4N-R; Figure 5G (no statistics indicated in the figure legend).

      We sincerely thank the reviewer for highlighting the inappropriate use of statistical tests in our original submission. We have re-analyzed the indicated data using ANOVA-based methods, as suggested. These changes do not alter the overall interpretation of our results but provide a more robust and statistically sound foundation for our conclusions. We changed “Differences were determined by two-tailed independent Student's t-test” to “Statistical significance was determined by one-way ANOVA followed by Tukey's test (C, Q-U, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (D) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”

      In lines 719-721 and 745-747, we added “Statistical significance was determined by one-way ANOVA followed by Tukey's test (E, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (B, H) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”

      Reviewer #1 (Significance):

      The data presented in this manuscript provides important context towards the connection between the DHH pathway, Sf1, and steroidogenesis.

      The audience would likely include developmental biologists, including those related to differentiation of any hormone producing cell type and especially those focused on steroidogenesis onset. Clinical interests will be related to sex determination and differentiation, especially related to male sex phenotype differentiation. Basic scientists will be especially interested.

      Expertise: mouse fetal testis differentiation and maturation, steroidogenesis, hedgehog, sf1. Good fit except for the animal model, but they are surprisingly similar.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this work, Zhao et al. investigated the role of the Dhh signaling pathway in the proliferation and differentiation of Leydig lineage cells in the testes of Nile tilapia, an economically important farmed fish. By generating dhh mutants, the authors showed that loss of Dhh in tilapia recapitulated mammalian phenotypes, characterized by testicular hypoplasia and androgen insufficiency. A previously established TSL line was used to rescue the deficits in dhh-/- testes, which demonstrated that Dhh regulates the differentiation of SLCs rather than their survival. By generating mutant TSL lines, the authors aimed to identify the downstream players of Dhh in tilapia. Based on the data, the authors propose that a dhh-ptch2-gli1-sf1 axis exists in Leydig cell lineage development.

      How Dhh secreted from Sertoli cells affects Leydig cells remains elusive. While previous studies have revealed the paracrine role of Sertoli cell-secreted Dhh in the regulation of Leydig cell development and maturation, the authors provide some new insights into the issue using tilapia as a model. Unfortunately, this work is not well performed, and the conclusions are not well supported by the current data. To reach logical conclusions, more meaningful experiments should be performed, and more convincing data should be provided.

      Strength:

      The authors used genetic mutants, TSL lines, and cell transplantation techniques to address the questions. The manuscript is technically sound, and overall is well-written.

      Limitations:

      Experimental design should be optimized, and more convincing data should be provided to reach solid conclusion.

      (1) The SLCs (stem Leydig cells) used in this work. The SLC line was established from 3-month-old immature XY tilapia. The authors claimed that this line is an SLC line only because it expresses a few Leydig markers such as pdgfra and nestin. However, in my opinion, the identity of the cell line is not clear. It is suggested to perform more experiments, including flow cytometry or single-cell RNA sequencing analysis, to further characterize this line and demonstrate that it consists of genuine SLCs equivalent to the SLCs in 3-month-old tilapia testes. In the previous publication (2020), the information about the line was not well presented.

      We thank the reviewer for this comment regarding the characterization of the TSL cell line. The identity of TSL as a stem Leydig cell line was rigorously established in our previous publication (Huang et al., 2020), which provided comprehensive molecular, in vitro, and in vivo functional evidence that meets the definitive criteria for an SLC. This includes its stable expression of established SLC markers (pdgfrα, nestin, coup-tfii), its capacity to differentiate into steroidogenic cells producing 11-KT in vitro, and most critically, its ability to colonize the testicular interstitium, differentiate into Leydig cells, and restore androgen production upon transplantation in vivo.

      In direct response to the reviewer's point, we have revised the Introduction of our manuscript to provide a more detailed and clear description of the TSL line's origin and validation (lines 95-105) as follows: “Furthermore, a stem Leydig cell line (TSL) has been established from the testis of a 3-month-old Nile tilapia. TSL expresses platelet-derived growth factor receptor α (pdgfrα), nestin, and chicken ovalbumin upstream promoter transcription factor II (coup-tfii), which are usually considered SLC-related markers in several other species. Notably, this cell line exhibits the capacity to differentiate into 11-ketotestosterone (11-KT)-producing Leydig cells both in vitro and in vivo. When cultured in a defined induction medium, TSL cells differentiate into a steroidogenic phenotype, expressing key steroidogenic genes including star1, star2, and cyp11c1, and producing 11-KT; upon transplantation into recipient testes, TSL cells successfully colonize the interstitial compartment, activate the expression of steroidogenic genes, and restore 11-KT production.” This ensures that readers can fully appreciate its well-founded identity as an SLC model without needing to consult the original publication. We are confident that the existing body of evidence solidly supports all conclusions drawn from its use in this study.

      (2) How loss of dhh affects testicular and Leydig cell lineage development has not been clearly investigated. In the current manuscript, the characterization of the dhh mutant was insufficient and lacked in-depth investigation. The authors primarily looked at testes at 90 dph, when the Leydig cell lineage is well developed. In my opinion, this time point is too late. To investigate the earlier events that are affected by loss of dhh, I suggest performing experiments at earlier time points, in particular around the initiation of sex differentiation and Leydig cell specification/maturation.

      We thank the reviewer for this insightful comment. We agree that a thorough developmental analysis is crucial. In response to this point, we have now performed an in-depth investigation at earlier stages to precisely define the phenotype onset.

      Our revised manuscript includes new data from a developmental time-course analysis. While our initial characterization included 5, 10, and 20 dah, we have now identified 30 dah as the critical window for the onset of Leydig cell differentiation, which is also supported by prior work (Zheng et al.). Our new immunofluorescence data at 30 dah clearly show that Cyp11c1-positive cells are present in wild-type testes but are entirely absent in dhh<sup>-/-</sup> mutants (Fig. S7). This finding pinpoints the initial failure of SLC differentiation.

      We have integrated this key finding into the Discussion (lines 234-239) as “To define the onset of Leydig cell differentiation, we performed a developmental time-course analysis. This revealed that Cyp11c1-positive steroidogenic cells first appear in wild-type testes at 30 dah, while being conspicuously absent in dhh<sup>-/-</sup> mutants at this same stage (Fig. S7). This clear temporal pattern establishes ~30 dah as the developmental window when SLCs initiate their differentiation program in the Nile tilapia.”

      Concurrently, our analysis of the 90 dah timepoint remains vital, as it represents a mature stage with robust spermatogenesis and a stabilized somatic niche. This allows for a comprehensive assessment of the ultimate functional consequences of the early differentiation block, including its impact on germ cell support and overall testicular architecture.

      Thus, our study now provides a complete developmental perspective: the 30 dah timepoint identifies the initiation of the Dhh-dependent defect, while the 90-dah analysis reveals the mature, functional outcomes within the intact testicular niche.

      (3) The authors claimed that there was a ptch2-gli1-sf1 axis. The conclusion was drawn largely based on data that generated from the in vitro cultured TSL line. More data from genetic mutant tilapia are required to support the conclusion.

      We thank the reviewer for the insightful comments regarding the need for robust in vivo validation. Our conclusion of a Dhh-Ptch2-Gli1-Sf1 axis is supported by an integrated experimental strategy, combining key in vivo evidence with targeted in vitro analyses to build a coherent model.

      (1) Evidence for Ptch2 as the key receptor: The role of Ptch2 is supported by a pivotal in vivo genetic experiment. The observation that the dhh<sup>-/-</sup> testicular phenotype is fully rescued in dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants provides compelling genetic evidence that Ptch2 is the essential receptor for Dhh in vivo (Fig. 4E-U). We acknowledge that the early embryonic lethality of global ptch1 mutation precludes its functional analysis in postnatal testis development. Therefore, while our data strongly nominate Ptch2 as the principal receptor, we have qualified our conclusions in the revised manuscript to reflect that a role for Ptch1 cannot be definitively excluded without Leydig cell-specific conditional knockout models.

      (2) Evidence for Gli1 and its regulation of Sf1: The role of Gli1 as the key transcriptional effector was efficiently identified using our well-characterized TSL system, a valid approach for dissecting this highly conserved signaling cascade. The functional connection between Gli1 and Sf1 is supported by multiple lines of evidence: transcriptomic profiling, promoter analysis, luciferase reporter assays (including a new cold probe competition experiment), and most importantly, in vivo functional validation via SLC transplantation. The latter demonstrated that Sf1 is both necessary and sufficient for SLC differentiation within the testicular niche (Fig. 5).

      In direct response to the reviewer's points, we have thoroughly revised the manuscript text to ensure all claims are accurately stated, particularly regarding the receptor specificity and the nature of the Gli1-Sf1 regulatory relationship. We believe our study provides a solid foundation for the proposed signaling axis.

      Overall, better experimental design should be planned, including the rescue experiments. Some key information was missed. For instance, the identity of the stem Leydig cells was not clearly presented.

      We have explained it in point #1.

      Figures:

      Figure 1: The authors described the phenotypes at 90 dph. Loss of dhh led to severe phenotypes in testicular formation, as evidenced by defective formation of Vasa, a germline stem cell marker; loss of expression of cyp11c1, a leydig cell marker; and loss of sycp3, a marker of meiosis of spermatogonia.

      However, in my opinion, 90 dph is too late. To investigate the role of dhh in the Leydig cell lineage, the authors should focus on earlier developmental stages, when sex differentiation and Leydig cell maturation occur. This work is essentially a developmental biology study that investigates how dhh loss in Sertoli cells affects the development of Leydig cells. Careful characterization of the earliest testicular phenotypes of the dhh mutant is therefore very important.

      We have explained it in point #2.

      Figure 2: Please clarify the logic for performing rescue experiments using 11-KT. Given the critical role of 11-KT in testis development and spermatogenesis, it is not unexpected that 11-KT treatment can rescue most of the cell types in testes. If dhh is absolutely required for LC lineage development and maturation, adding 11-KT at 30 dph will not have an effect. Why not perform rescue experiments using Dhh protein?

      We thank the reviewer for this insightful comment, which allows us to clarify the logical progression of our experimental design, a process central to genetic discovery.

      When we first characterized the dhh<sup>-/-</sup> mutant, we observed a complex suite of phenotypes: testicular hypoplasia, arrested germ cell development, a profound deficiency of Leydig cells, and drastically low androgen levels. A primary challenge was to distinguish which defects were direct consequences of losing Dhh signaling and which were secondary effects of the overall testicular failure.

      We therefore employed a classic genetic strategy: phenotypic dissection through targeted rescue. The 11-KT rescue experiment was designed to test a foundational hypothesis: Are the severe testicular defects in dhh<sup>-/-</sup> mutants primarily a consequence of the systemic androgen deficiency? The results provided a pivotal and clear answer: while 11-KT treatment partially rescued germ cell development and testicular structure, it completely failed to restore the population of Cyp11c1-positive Leydig cells. This critical finding allowed us to dissociate the phenotypes, demonstrating that the Leydig cell defect is a primary, cell-autonomous consequence of Dhh loss, not a secondary effect of low androgen.

      This conclusion logically propelled the next phase of our research: to shift focus from systemic hormone action to the local, niche role of Dhh in regulating the Leydig lineage directly. This led directly to the TSL transplantation experiments and the mechanistic dissection of the Ptch2-Gli1-Sf1 axis within SLCs.

      Regarding the use of Dhh protein, we agree it is a complementary approach. However, producing biologically active, recombinant Hedgehog ligand is challenging due to its essential dual lipid modification, which is required for solubility and activity. Our transplantation experiments with TSL-OnDhh cells (Fig. 3) functionally demonstrate that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, thereby directly addressing the core question without the need for recombinant protein.

      Figure 3. The authors showed that in dhh-/- testes, TSLs engrafted equivalently but failed to express Cyp11c1. This result is strange and raises a question about the identity of the TSLs, as I have mentioned above. The authors claimed that the TSLs are stem Leydig cells, which I doubt. Additional data should be provided to support this statement.

      In the testicular environment, the transplanted TSLs should be able to colonize and differentiate into more mature Leydig cells. Only a small portion of the PKH26-labeled TSLs became Cyp11c1 positive after transplantation; can the authors comment on this observation?

      To address "Mutation of dhh blocks SLC differentiation", the authors should first carefully examine TSL lineage development in the dhh mutant, and then investigate how loss of dhh disrupts the crosstalk between Sertoli cells and Leydig cells. Why bother performing TSL transplantation? Please clarify. Why not perform rescue experiments using Dhh protein at appropriate developmental stages?

      We thank the reviewer for these comments, which allow us to clarify the rationale and interpretation of our key experiments.

      (1) We have provided comprehensive evidence establishing the TSL line as an SLC line (Response to Point #1). The observation that WT TSL cells engraft but fail to differentiate in the dhh<sup>-/-</sup> testicular environment is not strange; it is, in fact, the core and most crucial finding of this experiment. It provides direct functional evidence that the dhh<sup>-/-</sup> niche lacks the essential signals required to initiate SLC differentiation, consistent with the severe deficiency of endogenous Cyp11c1<sup>+</sup> cells in these mutants (Fig. 1I-J', N).

      (2) The reviewer's concern about "only a small portion" of cells differentiating is based on a misunderstanding. Our quantitative data (Fig. 3F) show that approximately 78% of the transplanted PKH26+ TSL cells successfully differentiated into Cyp11c1<sup>+</sup> cells in WT hosts. This high efficiency robustly demonstrates the differentiation potential of TSL cells and the permissiveness of the WT niche. The near-zero differentiation rate in the dhh<sup>-/-</sup> host (Fig. 3F) starkly highlights the specific and severe defect in the mutant microenvironment.

      (3) The TSL transplantation experiment was the most direct strategy to test why Cyp11c1<sup>+</sup> cells are absent in dhh<sup>-/-</sup> testes. It allowed us to distinguish between a failure in SLC differentiation and other possibilities (e.g., cell death). The finding that functional SLCs cannot differentiate in the mutant niche logically directed our subsequent focus onto the cell-intrinsic molecular mechanism (the Ptch2-Gli1-Sf1 axis) within the Leydig lineage. While Sertoli-Leydig crosstalk is an important area, it was beyond the scope of this study aimed at defining the intrinsic differentiation pathway.

      (4) Regarding Dhh protein rescue, generating bioactive, lipid-modified recombinant Hh protein is technically challenging. Our transplantation of TSL-OnDhh cells (Fig. 3) functionally demonstrates that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, effectively addressing this question without the need for recombinant protein.

      Figure S3. “To assess whether dhh mutation affects androgen-producing cells outside Leydig cells, 11-KT levels were analyzed during early testicular development before SLC differentiation. IF analyses revealed that no Cyp11c1 positive cells were present in the testes of XY WT fish at 5, 10, and 20 dah, indicating that SLCs had not yet differentiated at these stages (Fig. S3A-C). Tissue fluid 11-KT levels showed no significant differences between WT and dhh-/- XY fish at 5, 10, and 20 dah (Fig. S3D)”. These observations suggest that loss of dhh does not affect the specification of SLCs but affects their differentiation into mature LCs. Differentiation of Cyp11c1-positive cells should therefore occur later than 20 dah. When is the earliest time point at which Cyp11c1-positive cells form, and how does loss of dhh affect this? These are important questions to answer.

      We agree with the reviewer's interpretation that our data suggest dhh loss affects SLC differentiation rather than initial specification. In direct response to the need for earlier timepoints, we have now performed and included an analysis at 30 dah, which we identified as the critical window for Leydig cell differentiation onset. Our new data (Fig. S7) show that Cyp11c1+ cells are present in WT testes but are entirely absent in dhh<sup>-/-</sup> mutants at this stage. This precisely pinpoints the initiation of the phenotypic divergence and establishes ~30 dah as the developmental window when Dhh signaling is required to drive SLC differentiation. Our study therefore now provides a complete developmental perspective, from the initial failure at 30 dah to the mature functional outcomes at 90 dah.

      Figure 4. The authors generated ptch1/2 mutant TSL lines and performed a luciferase assay, and based on the results, concluded that Ptch2, but not Ptch1, is specifically required for transducing Dhh signals in TSLs. The conclusion was based only on the luciferase assay using TSLs. Whether this is also the case in testes at the animal level is not clear. More genetic experiments using ptch mutants should be performed to substantiate this.

      The authors stated “Ptch2 acts as the obligate receptor for Dhh signaling during testis development”. If ptch2 is required for the TSL lineage, why did ptch2-/- testes exhibit no significant differences in testicular histology, Leydig cell (Cyp11c1+) populations, and serum 11-KT levels? This contradictory statement needs to be addressed.

      We thank the reviewer for these critical comments, which allow us to clarify the logic underlying our conclusions regarding Ptch2.

      (1) In Vivo Genetic Evidence for Ptch2: Our conclusion that Ptch2 is the primary receptor for Dhh is not based solely on the TSL luciferase assays. It is definitively supported by a key in vivo genetic experiment: the complete phenotypic rescue in the dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants (Fig. 4F-R). In genetic terms, the loss of the receptor (ptch2) suppressing the phenotype caused by the loss of the ligand (dhh) is classic evidence for a ligand-receptor relationship within a linear pathway. This in vivo evidence strongly substantiates Ptch2's role at the animal level. The early embryonic lethality of ptch1 mutants precludes a similar in vivo test for Ptch1 in postnatal testis development.

      (2) Addressing the Apparent Contradiction of the ptch2<sup>-/-</sup> Phenotype: The reviewer raises an excellent point, which stems from the fundamental biology of the Hh pathway as shown in Author response image 1. Ptch receptors are inhibitory. In the absence of ligand, Ptch suppresses pathway activity.

      Author response image 1.

      The canonical Hh signaling pathway. In the dhh<sup>-/-</sup> mutant, the pathway is suppressed due to unopposed Ptch activity, leading to a failure in SLC differentiation. In the ptch2<sup>-/-</sup> mutant, this key inhibitory brake is removed, leading to constitutive activation of the pathway. The fact that ptch2<sup>-/-</sup> testes are normal indicates that this level of pathway activation is not detrimental and, crucially, is sufficient to support wild-type levels of Leydig cell development and steroidogenesis. This lack of a phenotype in the receptor mutant, contrasted with the severe ligand mutant phenotype, is a common and expected observation in signaling pathways where the receptor acts as a tonic inhibitor.

      In summary, the normal development of ptch2<sup>-/-</sup> testes is not contradictory but is entirely consistent with its role as the inhibitory receptor for Dhh. The severe phenotype in dhh<sup>-/-</sup> mutants and its specific rescue by removing ptch2 provides compelling genetic evidence for their functional relationship. We have revised the text throughout the manuscript to ensure these conclusions are accurately stated.
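      The genetic logic of this argument can be sketched as a toy Boolean model. It is an illustration only, not the authors' model: it assumes Ptch2 is the only receptor that matters here (the very claim under test), ignores Ptch1, Gli dosage and signal strength, and collapses the downstream readout into a single ON/OFF state, so it cannot represent graded differences such as elevated basal reporter activity. Under those assumptions it reproduces the pattern described: dhh<sup>-/-</sup> is OFF, while ptch2<sup>-/-</sup> and the dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutant are ON, which is why removing the receptor suppresses the ligand-mutant phenotype.

```python
def pathway_on(dhh_present: bool, ptch2_present: bool) -> bool:
    """Toy rule: signaling is ON unless Ptch2 is present and not relieved by Dhh."""
    # Ptch2 represses the pathway only when it is present and no Dhh ligand is available.
    ptch2_repressing = ptch2_present and not dhh_present
    return not ptch2_repressing


# Genotype -> (Dhh present, Ptch2 present); labels mirror the text, states are toy values.
GENOTYPES = {
    "WT": (True, True),
    "dhh-/-": (False, True),
    "ptch2-/-": (True, False),
    "dhh-/-;ptch2-/-": (False, False),
}

for name, (dhh, ptch2) in GENOTYPES.items():
    print(f"{name:<16} {'ON' if pathway_on(dhh, ptch2) else 'OFF'}")
```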

      Figure 5. The authors generated gli1/2/3 mutant TSL lines and performed a luciferase assay, and based on the results, concluded that Gli1, but not Gli2/3, was specifically required for transducing Dhh signals in TSL cells. This conclusion is drawn only from the luciferase assay using TSLs. Whether this is also the case in testes at the animal level is not clear. More genetic experiments using gli mutant fish should be performed to substantiate this.

      To identify Gli1-dependent targets in SLCs, the authors compared transcriptomes of TSLWT, Dhh-overexpressing (TSL-OnDhh), Gli1-overexpressing (TSL-OnGli1), and SAG-treated (TSL+ SAG) TSL cells. While this experiment can be used to identify dhh target genes, it would be better to use gli mutant cell lines. Since the authors have generated gli1/2/3 mutants, why not use these mutant fish to identify/confirm the Gli targets?

      We thank the reviewer for these comments.

      (1) We acknowledge that our identification of Gli1 as the key transcriptional effector is primarily based on in vitro evidence from the TSL cell line. We have revised the manuscript accordingly to state this precisely and avoid overstatement.

      (2) Concerning the transcriptomic analysis, the reviewer suggests using gli mutant cell lines. While this is a valid approach, our strategy of profiling pathway activation (via Dhh/Gli1 overexpression or SAG treatment) was deliberately chosen to provide a high signal-to-noise ratio for identifying genes that are positively upregulated during the differentiation process. Analyzing loss-of-function mutants under basal conditions can be confounded by potential compensatory mechanisms among the Gli family members, potentially masking the specific transcriptional signature of pathway activation we sought to capture.

      We note that we generated gli1/2/3 mutant TSL cell lines for the functional luciferase assays, but we have not generated the corresponding gli mutant fish lines, which would represent a substantial new line of investigation.

      Reviewer #2 (Significance):

      While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors investigate the Dhh signaling pathway in Leydig cell differentiation in the tilapia model. They generated multiple mutant lines in different hedgehog pathway components and utilized a Leydig stem cell line to interrogate Leydig cell differentiation. Through this analysis, the authors demonstrate that Dhh regulates Leydig differentiation rather than survival. They also found that Ptch2 is the specific receptor that mediates signaling to promote Leydig differentiation and that Gli1 is the primary Gli involved. Furthermore, they show that a known regulator of Leydig cell development and function, SF1, is a downstream transcriptional target. Overall, the study identifies previously unknown information as to how Dhh signaling regulates Leydig cell development, which is necessary for testosterone production by the testis.

      Major Comments

      (1) In the RNA-seq analysis, it is not clear exactly how the 33 "up-regulated" genes were identified. What was the methodology for identifying these genes? Some of the genes were down-regulated or not different in the OnGli condition, and some in the OnDhh condition were not differentially expressed, as shown in Fig S8B. Therefore, it is unclear why all 33 genes are classified as upregulated "across all three conditions".

      We have clarified this methodology in the Materials and Methods section (lines 452-454): “Differentially expressed genes (DEGs) were identified for each condition (TSL-OnDhh, TSL-OnGli1, TSL+SAG) compared to TSL-WT controls using edgeR (threshold: FDR < 0.05, |log2(fold change)| ≥ 1.5).” We also added relevant information to the Results section (lines 198-202): “We performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions (Fig. 5C, S6A).”

      We have also updated Fig. S8B to include clearly labeled values and to better visualize the FPKM levels of these 33 genes across the conditions.
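      To make the selection rule concrete, the sketch below applies the stated thresholds (FDR < 0.05, |log2(fold change)| ≥ 1.5) to each condition separately, keeps only genes passing in the up direction, and intersects the three sets. Gene names and values are invented for illustration, and reading "consistently upregulated" as a strict intersection of per-condition calls is an assumption, not a description of the authors' pipeline. Under that rule, a gene that is up in two conditions but below threshold or down in the third is excluded, which is the distinction the reviewer raised.

```python
FDR_MAX = 0.05
MIN_LOG2FC = 1.5  # |log2(fold change)| >= 1.5, applied here in the "up" direction only


def upregulated(table):
    """Genes with FDR < 0.05 and log2FC >= 1.5 versus TSL-WT in one condition."""
    return {
        gene
        for gene, (log2fc, fdr) in table.items()
        if fdr < FDR_MAX and log2fc >= MIN_LOG2FC
    }


# Invented edgeR-style results per condition: gene -> (log2FC vs TSL-WT, FDR).
results = {
    "TSL-OnDhh": {"gene_a": (2.1, 0.001), "gene_b": (1.7, 0.03), "gene_c": (0.4, 0.60)},
    "TSL-OnGli1": {"gene_a": (3.0, 0.0001), "gene_b": (1.6, 0.01), "gene_c": (2.2, 0.002)},
    "TSL+SAG": {"gene_a": (1.9, 0.004), "gene_b": (-1.8, 0.02), "gene_c": (1.5, 0.04)},
}

# A gene enters the core set only if it passes the threshold in every condition.
core = set.intersection(*(upregulated(table) for table in results.values()))
print(sorted(core))  # ['gene_a']
```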

      (2) In figure 4A (and possibly B), it appears that ptch RNA is in the nucleus of the cell. Why would the RNA be primarily in the nucleus? Is the RNA detection accurate? Were controls done? The methods state that sense probes were made but not how they compared to the antisense probes. This comment can also be applied to the gli FISH, particularly gli3 (Figure 5).

      This is an excellent observation. We speculate that the apparent nuclear signal may be due to strong transcriptional activity in the nucleus. To confirm the specificity of our FISH experiment, we performed FISH with sense RNA probes as negative controls for all genes (ptch1, ptch2, gli1, gli2, gli3), and no specific signals were observed (see New Fig. S9).

      Minor comments

      (1) In the introduction, please include information as to when tilapia reach sexual maturity.

      We have added this information to the Introduction (lines 91-92): early sexual maturity (approximately 3 months after hatching for males and 6 months after hatching for females).

      (2) When first mentioning experiments that use the PKH26 dye, please give a brief description of the dye in the text of the results. This is described in the methods but it would be helpful to have some information about what PKH26 is in the results to more easily understand the figure and experimental design.

      We have added a brief description in the Results section (lines 151-152): “To dissect Leydig cell lineage impairment in dhh<sup>-/-</sup> testes, we transplanted TSL cells labeled with PKH26 (a fluorescent red hydrophobic membrane dye that enables tracking of transplanted cells) into WT and dhh<sup>-/-</sup> testes (Fig. 3A).”

      (3) In the statistical analysis section of the methods, the authors state that two-tailed t-tests were performed however in the figure legends it states that ANOVA was done for some of the statistical analysis. Please clarify this.

      We have updated the Statistical Analyses section in Methods to clarify this (lines 472-476): “A two-tailed independent Student’s t-test was used to determine the differences between two groups. One-way ANOVA, followed by Tukey’s multiple comparison test, was used to determine the significance of differences among more than two groups. P < 0.05 was used as the threshold for statistically significant differences.”

      (4) Figures - in figures that have charts with the Y-axis labeled as "relative positive cells", or similar, please explain what exactly is meant by "relative". What is it relative to?

      We have revised all relevant Y-axis labels and figure legends to explicitly state the quantification method. For example, we now use "Vasa<sup>+</sup> / DAPI<sup>+</sup> (%)", "Sycp3<sup>+</sup> / DAPI<sup>+</sup> (%)", or "Cyp11c1<sup>+</sup> / DAPI<sup>+</sup> (%)".

      (5) Figure 1: please point out the testes in panels A and B

      We have indicated the position of the testes with arrows in Figures 1A and B.

      (6) In figure 4, it would be helpful for the WT images from S7 moved to fig 4.

      We have moved representative WT images from Fig. S7 into Fig. 4 for easier comparison with the mutant phenotypes.

      (7) Figure 4E: Are the yellow bars comparable to each other? Is there any significance to the increased luciferase with 8xGli in ptch2-/- as compared to the other genotypes?

      We thank the reviewer for this astute observation. Yes, the yellow bars are directly comparable, and the elevated basal luciferase activity of the 8xGli reporter in the ptch2<sup>-/-</sup> TSL cells is indeed significant and expected. Because Ptch2 suppresses pathway activity in the absence of ligand, genetic ablation of ptch2 removes this inhibition, leading to ligand-independent, constitutive activation of the downstream signaling cascade. The observed increase in basal reporter activity in the ptch2<sup>-/-</sup> cells is a classic manifestation of this mechanism.

      The primary objective of this experiment was to test the cells' responsiveness to Dhh stimulation across genotypes. The key finding is that while wild-type and ptch1<sup>-/-</sup> cells showed a significant response to Dhh, the ptch2<sup>-/-</sup> cells, which already exhibited high basal activity, were completely unresponsive. This combination of constitutive activation and ligand insensitivity in the ptch2<sup>-/-</sup> genotype provides particularly strong genetic evidence that Ptch2 is the essential receptor mediating Dhh signal transduction in this system.

      (8) Figure 5G: please state in the figure legend exactly what each construct name stands for

      We have expanded the legend for Fig. 5G to define each construct.

      (9) Figure S8B: please state what the values in the table represent (e.g., are these significance values?)

      We have updated the caption for Figure S8B (now Figure S6B): “The FPKM value for each gene in each sample is indicated within the squares. The color gradient from blue to red reflects low to high expression levels per row (gene).”

      Reviewer #3 (Significance):

      Strengths and limitations:

      The genetics of the tilapia system and the availability of the tilapia Leydig stem cell lines were particular strengths of this study. The study utilizes fish genetics to genetically interrogate the Dhh signaling pathway in Leydig cell development through generation and analysis of mutant lines. The tilapia Leydig stem cell line was an integral part of this study as it allowed for genetic and chemical manipulation of Dhh signaling in undifferentiated Leydig cells and, through transplantation into testes, allowed for analysis of how Leydig cell differentiation was affected.

      Advance:

      The study makes significant advances as to how Dhh signaling instructs Leydig cell differentiation, including identification of the Ptch receptor and Gli transcription factor that function downstream of Dhh in this process. Furthermore, they identify a direct link between Dhh signaling and Sf1 expression, which is known to be important for Leydig cell function.

      Audience:

      This study will be of particular interest to reproductive biologists, endocrinologists, and developmental biologists. The study may also be of interest to researchers and physicians investigating cancers that are promoted by androgens produced by Leydig cells of the testis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      We thank the reviewer for their kind words, and have endeavored to address all of their concerns as to the structure and style of the manuscript.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to an intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail.

      Thank you for pointing this out. We have rearranged the methods in order to make the presentation more linear, and to reduce duplication with the introduction.

      Specifically, we moved the affinity definition to the start, removed the redundant bullet point list, and moved the parameter value table to the end.

      This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      This is a great point, we have either removed or replaced all references to "above" or "below" with more specific citations.

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference.

      We have clarified where various parameter values come from:

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities from those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However, if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      We thought of two different interpretations for this comment, so have worked to address both.

      First, the comment could have been that the distribution of loss function values on the training sample does not appear to be informative of performance on data-like samples. This is true, and in our revision we have emphasized the distinction between the two types of simulation sample: those for training, where each simulated GC has different (sampled) parameter values; vs the "data mimic" samples where all GCs have identical parameters. Since each GC in the training sample has different parameter values, we can only plot many inferred curves together for the data mimic samples. We also would like to emphasize that the inference problem for one GC will have much more uncertainty than will that for an ensemble of GCs (as in the full replay experiment).

      “After building and training our neural network, we evaluate its performance on subsets of the training sample. While this evaluation provides an important baseline and sanity check, it is important to note that the training sample differs dramatically from real data (and the “data mimic” simulation sample that mimics real data). While real data consists of 119 GCs with identical parameters and thus response functions, we need the GCs in our training sample to span the space of all plausible parameter values. This means that while we must evaluate performance on individual GCs in the training and testing samples, in real data (and data mimic simulation) we combine results from 119 curves into a central (medoid) curve. Inference on the training sample will thus appear vastly noisier than on real data and data mimic simulation, and also cannot be plotted with all true and inferred curves together.”
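      The medoid mentioned here is, unlike a pointwise mean, always one of the inferred curves, and it is robust to a few outlying fits. A minimal sketch, assuming each curve is sampled on a shared affinity grid and using mean absolute difference as the distance; both choices, and the toy curves, are assumptions for illustration, since the manuscript's exact metric is not stated in this passage.

```python
def curve_distance(f, g):
    """Mean absolute difference between two curves sampled on the same grid."""
    return sum(abs(a - b) for a, b in zip(f, g)) / len(f)


def medoid(curves):
    """Index of the curve with the smallest total distance to all the others."""
    totals = [sum(curve_distance(c, other) for other in curves) for c in curves]
    return min(range(len(curves)), key=totals.__getitem__)


# Toy inferred response curves (birth rate at three affinity grid points).
curves = [
    [0.0, 0.40, 0.80],
    [0.0, 0.45, 0.90],
    [0.0, 0.50, 1.00],
    [0.0, 0.60, 1.20],
    [0.0, 1.50, 3.00],  # one outlying fit
]
print(medoid(curves))
```

A pointwise mean of these curves would be pulled upward by the outlier, whereas the medoid stays at a curve that was actually inferred.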

      A second interpretation was that the reviewer did not have an intuitive sense of what a loss function value of, say, 1.0 actually means. To address this second interpretation, we have also added a supplement to Figure 2 with several example true and inferred response functions from the training sample, with representative loss values spanning 0.17 to 2.18. We have also added the following clarification to the caption of Figure 1-figure supplement 2:

      “The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.”
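      Written out, the loss described in this caption is the area between the true and inferred response functions divided by the area under the true one. A minimal numerical sketch, assuming both curves are evaluated on a user-chosen finite affinity interval and integrated with the trapezoid rule; the interval, grid size and function names are illustrative assumptions, not taken from the manuscript.

```python
def trapezoid(ys, xs):
    """Trapezoid-rule integral of sampled values ys over grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def response_loss(true_f, inferred_f, x_min, x_max, n=1001):
    """Area between true and inferred curves, divided by the area under the true curve.

    Assumes the true curve has positive area on [x_min, x_max] (birth rates are
    non-negative); otherwise the ratio is undefined.
    """
    xs = [x_min + (x_max - x_min) * i / (n - 1) for i in range(n)]
    area_between = trapezoid([abs(true_f(x) - inferred_f(x)) for x in xs], xs)
    area_true = trapezoid([true_f(x) for x in xs], xs)
    return area_between / area_true


# A constant true rate of 1.0 inferred as 1.5 gives a loss of about 0.5 on [0, 2].
print(response_loss(lambda x: 1.0, lambda x: 1.5, 0.0, 2.0))
```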

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

      We have expanded this section of the manuscript, and added a new plot directly comparing the methods.

      “In order to compare more directly to DeWitt et al. 2025, we remade their Fig. S6D, truncating to values at which affinities are actually observed in the bulk data, and using only three of the seven timepoints (11, 20, and 70; Figure 8, left). We then simulated 25 GCs with central data mimic parameters out to 70 days. For each such GC, we found the time point with mean affinity over living cells closest to each of three specific “target” affinity values (0.1, 1.0, 2.0) corresponding to the mean affinity of the bulk data at timepoints 11, 20, and 70. We then plot the effective birth rates of all living cells vs relative affinity (subtracting mean affinity) at the resulting GC-specific timepoints for all 25 GCs together (Figure 8, right). Note that because each GC evolves at very different and time-dependent rates, we could not simply use the timepoints from the bulk data, since each GC slice from our simulation would then have very different mean affinity. The mean over GCs of these GC-specific chosen times is 10.9, 24.5, 44.4 (compared to the original bulk data time points 11, 20, 70). It is important to note that while the first two target affinities (0.1 and 1.0) are within the affinity ranges encountered in the extracted GC data, the third value (2.0) is far beyond them, and thus represents extrapolation to an affinity regime informed more by our underlying model than by the real data on which we fit it.”
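      The per-GC time-point matching described in this passage can be sketched as follows. The trajectories are invented and the function name is hypothetical; the point is that each GC contributes a different time for the same target affinity, so averaging the chosen times need not reproduce the bulk time points. Note also that the second toy GC never reaches the 2.0 target, so its "closest" time is still far off in affinity, echoing the extrapolation caveat.

```python
def closest_timepoints(mean_affinity_by_time, targets):
    """Map each target affinity to the time whose mean affinity is nearest to it."""
    return {
        target: min(
            mean_affinity_by_time,
            key=lambda t: abs(mean_affinity_by_time[t] - target),
        )
        for target in targets
    }


targets = (0.1, 1.0, 2.0)
# Hypothetical mean affinity over living cells, keyed by day, for two GCs.
gcs = [
    {5: 0.05, 10: 0.4, 15: 1.1, 30: 2.1},  # fast-evolving GC
    {5: 0.0, 10: 0.12, 20: 0.5, 40: 0.95, 70: 1.6},  # slow GC, never reaches 2.0
]
chosen = [closest_timepoints(gc, targets) for gc in gcs]
mean_times = {t: sum(c[t] for c in chosen) / len(chosen) for t in targets}
print(chosen)
print(mean_times)
```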

      Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question. (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model. (3) Code and data are publicly available and well documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      Right, whoops, good point. We've rearranged the discussion to separate the concepts, for instance:

      “While affinity and fitness ceilings are separate concepts, they are closely related. An affinity ceiling is a limit to affinity for a given antigen: there are no mutations that can improve affinity beyond this level. This would result in a truncated response function, undefined beyond the affinity ceiling. A fitness ceiling, on the other hand, is an upper asymptote on the response function. Such a ceiling would result in a limit on affinity for a germinal center reaction, since once cells are well into the upper asymptote of fitness they are no longer subject to selective pressure.”
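      A hypothetical four-parameter sigmoid makes the distinction concrete. The parameterization below (midpoint `x0`, slope `k`, lower and upper asymptotes `y_min` and `y_max`) is an assumption for illustration and may differ from the manuscript's: `y_max` plays the role of a fitness ceiling, while the optional `affinity_ceiling` truncates the domain so the function is undefined (returns `None`) beyond it.

```python
import math


def response(affinity, x0, k, y_min, y_max, affinity_ceiling=None):
    """Hypothetical sigmoid mapping affinity to birth rate.

    y_max is an upper asymptote (a fitness ceiling). If affinity_ceiling is given,
    the function is undefined beyond it (an affinity ceiling) and returns None.
    """
    if affinity_ceiling is not None and affinity > affinity_ceiling:
        return None
    return y_min + (y_max - y_min) / (1.0 + math.exp(-k * (affinity - x0)))


params = dict(x0=0.0, k=1.0, y_min=0.1, y_max=2.0)
# Near the midpoint, a unit gain in affinity changes the birth rate substantially...
print(response(1.0, **params) - response(0.0, **params))
# ...but deep in the upper asymptote the gain is tiny, so selection weakens.
print(response(6.0, **params) - response(5.0, **params))
# With an affinity ceiling at 2.0, affinities above it are simply not defined.
print(response(3.0, affinity_ceiling=2.0, **params))
```

The shrinking difference between neighboring affinities is what lets a fitness ceiling limit affinity in a germinal center even when no physical affinity ceiling exists.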

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      This is a great point, we've added a mention of this where we introduce the replay experiment in the Methods:

      “It is important to note that this is a much lower level than typical BCR repertoires, which average roughly 5-10% nucleotide shm.”

      And expanded on the explanation in the Discussion:

      “Some aspects of behavior in the low-shm/early times regime of the extracted GC data are also potentially different from those at the higher shm levels and longer times found in typical repertoires. This is especially relevant to affinity or fitness ceilings, to which we likely have little sensitivity with the current data.”

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies. (Eg: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

      Yes, good point; we've added these citations in a new paragraph on between-lineage competition:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive, because our run time is primarily determined by the total population size: at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
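      The scaling argument here follows from how a first-reaction (Gillespie-type) simulation picks the next event: every (cell, event type) pair draws an exponential waiting time and the smallest one wins, so each step costs time proportional to the population size. The sketch below is a generic illustration under simplifying assumptions (cells represented only by an affinity value, no mutation, no carrying capacity, constant death rate); it is not the authors' simulation code, and `simulate` and `first_reaction_step` are hypothetical names.

```python
import random


def first_reaction_step(population, rates, rng):
    """Return (waiting_time, cell_index, event) for the earliest event, or None.

    Every (cell, event type) pair draws its own exponential waiting time, so one
    step costs O(len(population) * len(rates)).
    """
    best = None
    for i, affinity in enumerate(population):
        for event, rate_fn in rates.items():
            rate = rate_fn(affinity)
            if rate <= 0:
                continue  # this event cannot happen for this cell
            wait = rng.expovariate(rate)
            if best is None or wait < best[0]:
                best = (wait, i, event)
    return best


def simulate(initial_affinities, birth_rate, death_rate, t_max, seed=0):
    """Run birth/death events until t_max or extinction; return (time, population)."""
    rng = random.Random(seed)
    population = list(initial_affinities)
    rates = {"birth": birth_rate, "death": lambda affinity: death_rate}
    t = 0.0
    while population:
        step = first_reaction_step(population, rates, rng)
        if step is None or t + step[0] > t_max:
            break
        wait, i, event = step
        t += wait
        if event == "birth":
            population.append(population[i])  # daughter copies parent affinity (no mutation)
        else:
            population.pop(i)
    return t, population


t_end, cells = simulate([0.0] * 10, birth_rate=lambda a: 1.0, death_rate=0.5, t_max=1.0)
print(round(t_end, 3), len(cells))
```

Because exponential waiting times are memoryless, redrawing them for every pair at each step is valid, but it is exactly this per-step loop over all cells that makes simulating many interacting GCs expensive.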

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to follow the suggestions of manuscript re-organization by Reviewer 1, in order to improve readability. We would also like to suggest improving the discussion of the traveling wave model to explain it in a more self-contained way. In passing, please clarify what is meant by 'steady-state' in that model. A superficial understanding would suggest that the only steady state in that model would be a homogeneous population of antibodies with maximum affinity/fitness.

      These are great suggestions. We have substantially rearranged the text according to Reviewer 1's suggestions, especially the Methods, and expanded on and rearranged the traveling wave discussion. We've also clarified throughout that the traveling wave model is assuming steady state with respect to population. In the public response to reviewer 1 above we describe these changes in more detail.

      Reviewer #1 (Recommendations for the authors):

      I suggest that the organization of the paper be reconsidered. The current methods section is long and at times repetitive, making it impossible to parse in a single reading. Moving some technical details from the main text to an appendix could improve readability. Despite the length of the methods section, many important points, such as justification of choices in model specification or values of parameters, are treated only briefly.

      We have rearranged the methods section, particularly the discussion of our model, and have more clearly justified choices of parameter values as described in the public response.

      Discussion of similarities and differences with reference to Dewitt et al. 2025 should be revised, as it's currently unclear whether the method presented here has any advantages.

      We have expanded this comparison, and emphasized the main disadvantage of the traveling wave approach: there is no way of knowing whether by abstracting away so much biological detail it misses important effects. We have also emphasized that the two approaches use different types of data (time series vs endpoint) which are typically not simultaneously available:

      “The clear advantage of the traveling wave model is its simplicity: if its high level view is accurate enough to effectively model the relevant GC dynamics, it is far more tractable. But reproducing low-level biological detail, and making high-dimensional real data comparisons (e.g. Figure 5) to iteratively improve model fidelity, are also useful, providing direct evidence that we are correctly modeling the underlying biological processes. The two approaches also utilize different types of data: we use a single time point, and thus must reconstruct evolutionary history; whereas the traveling wave requires a series of timepoints. The availability of both types of data is a unique feature of the replay experiment, and provides us with the opportunity to directly compare the approaches.”

      The results obtained from the same data should be directly compared (can the response function be directly compared to the result in Figure S6D in Dewitt et al., 2025? If yes, it should be re-plotted here and compared/superimposed with Figures 6 and 7). The text mentions the results differ, but it remains ambiguous whether the differences are significant and what their implications are.

      We've added a new Figure 8, comparing a modified version of the traveling wave Fig S6D to a new plot derived from our results using the data mimic parameters. While the two plots represent fundamentally different quantities, they do put the results of the two methods on an approximately equal footing and we see nice concordance between them in regions with significant data (they disagree substantially for larger negative affinities). We have also added emphasis to the point that the traveling wave model uses an entirely separate dataset to what we use here.

      Other comments:

      (1) l. 80: "[in] around 10 days"?

      Text rearranged so this phrase no longer appears.

      (2) l. 96: "an intrinsic rate [given by?] the response function above".

      Text rearranged so this phrase no longer appears.

      (3) Figure 1: The “specific model” part could be expanded and improved to help make sense of model parameters and the order of different processes in the population model. Example values of parameters can be plotted rather than loosely described (e.g., the upper asymptotes y_h+y_c can be plotted in place of the “yscale determines upper asymptotes” label).

      Great suggestion, we've changed the labels.

      (4) The cartoons in the other parts are somewhat cryptic or illegible due to small sizes.

      We have added text to the caption identifying where each plot that is drawn in schematic form in this figure appears in full elsewhere in the manuscript:

      “Plots from elsewhere in the manuscript are rendered in schematic form: those in “infer on data” refer to Figure 4-figure supplement 1, and those in “simulate with inferred parameters” to Figure 5.”

      (5) L. 137: It's not helpful to give numerical values before the definition of affinity. (and these numbers are repeated later).

      Good point; we've moved the affinity definition to the previous section and removed the duplicate range information.

      (6): Table 1: A number of notations are unclear, such as “#seqs/GC” or “mutability multiplier”. The double notation for crucial parameters doesn't help. At the moment the table is introduced, the columns make little sense to the reader, and it's not well specified what dictates the choice or changes of parameter values or ranges.

      We've moved the table further down until after the parameters have been introduced, and clarified the indicated names.

      (7) l. 147: Choices of model are not justified and appear arbitrary (e.g., why death events happen at one of two rate).

      We have clarified the reasoning behind having two death rates.

      (8) l.151: “happened on the edges of developing phylogenetic tree” - ambiguous: do they accumulate at cell divisions? What is a “developing tree”?

      We have removed this ambiguous phrasing.

      (9) l.161: This paragraph is particularly dense.

      We have rearranged this section of the methods, and split up this paragraph.

      (10) l. 164: All the different response functions for different event types? Or only the one for birth, as stated before?

      Yes. This has been clarified.

      (11) l.167: Does the statement in the bracket refer to a unit?

      This has been clarified.

      (12) l. 169: Discussion of the implementation seems too detailed.

      Hopefully the rearranged description is clearer, but we worry that removing the details of event selection would leave some readers confused.

      (13) l. 186: Why describe the methods that, in the end, were not used? Similarly, the mention of a “variety of response functions” seems out of place if only one choice is used throughout the paper. Eq. (2) is just m^{-1} from eq. (1); having the two equations use the same notation is confusing.

      We've moved the mention of alternatives to the Discussion, where it is an important source of uncontrolled systematic uncertainty, and removed the extra equation.

      (14) l. 206: Unclear what “thus” refers to.

      Removed.

      (15) l.211: What does “neglecting y_h” mean?

      This has been clarified.

      (16) l. 242: Unclear what “this” refers to.

      Clarified.

      (17) l. 261: What does “model independence” refer to in this context?

      Independence from the sigmoid model. Clarified.

      (18) l. 306: What values for which parameters? References?

      We have clarified and updated this statement - it was out of date, corresponding to the analysis before we started fitting non-sigmoid parameters.

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities from those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However, if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”
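      One simple way to turn "matching of these distributions" into a fitting criterion is to score each candidate parameter value by a distance between its simulated summary-statistic distribution and the observed one, then keep the best-scoring candidate. The sketch below assumes a two-sample Kolmogorov-Smirnov distance and precomputed simulated samples keyed by hypothetical labels; the quoted text does not state which distance or search procedure the authors used, and the example numbers are made up.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n_a, n_b = len(a_sorted), len(b_sorted)
    # Both empirical CDFs are right-continuous step functions, so the
    # largest gap is attained at one of the pooled sample points.
    return max(
        abs(bisect_right(a_sorted, x) / n_a - bisect_right(b_sorted, x) / n_b)
        for x in a_sorted + b_sorted
    )


def best_candidate(observed: Sequence[float],
                   simulated: Dict[str, Sequence[float]]) -> str:
    """Label of the simulated sample whose distribution is closest to `observed`."""
    return min(simulated, key=lambda label: ks_distance(observed, simulated[label]))


if __name__ == "__main__":
    observed_shm = [3, 5, 6, 6, 7, 9, 10, 12]  # hypothetical per-cell mutation counts
    simulated = {  # hypothetical simulation outputs for three carrying capacities
        "carrying_capacity=500": [1, 2, 2, 3, 3, 4, 5, 5],
        "carrying_capacity=1000": [3, 4, 6, 6, 8, 9, 11, 12],
        "carrying_capacity=2000": [8, 10, 11, 13, 14, 15, 17, 20],
    }
    print(best_candidate(observed_shm, simulated))
```

      In practice several summary statistics would be combined, and, as the text notes for the functional death rate, a flat distance surface would signal that a parameter is not uniquely identifiable.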

      (19) l. 326: "is interpreted as having" or "corresponds to"?

      Changed.

      (20) l. 340: Not sure what "encompassing" means in this context.

      Clarified.

      (21) l. 341: "We do this..." -- I think this sentence is not grammatical.

      Fixed.

      (22) l. 348: "on simulation" -- "from simulated data"?

      Indeed.

      (23) l. 351: "top rows", the figures only have one row.

      Fixed.

      (24) Figure 2: It's difficult to tell from the loss function itself whether inference on simulated data works well. Why not report the simulated and inferred response functions? The equivalent plots in Figure 5 would also be informative. Has inference been tested for different "sigmoid parameters" values?

      This is an important point that was not clear, thanks for bringing it up. We have expanded on and emphasized the differences between these samples and the reasoning behind their different evaluation choices. Briefly, we can't display true vs inferred response functions on the training samples since the curves for each GC are different -- the plot would be entirely filled in with very different response function shapes. This is why we do actual performance evaluation on the "data mimic" samples, where all GCs have the same parameters. Summary stats (like Fig 5) for the training sample are in Fig 5 Supplement 2.

      (25) l. 354: Unclear what "this" refers to.

      Removed.

      (26) l. 355: We assume the parameters are the same?

      Yes, we assume all data GCs have the same parameters. We have added emphasis of this point.

      (27) Figure 4: Is "lambda" the fitness? Should be typeset as \lambda_i?

      Our convention is to add the subscript when evaluating fitness on individual cells, but to omit it, as here, when plotting the response function as a whole.

      (28) l. 412: "[a] carrying capacity constraint".

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) In 2 places, you state that observed affinity ranged from -37 to 3, but I assume that the lower bound should be -3.7.

      The -37 was actually correct, but we had mistakenly missed updating it when we switched to the latest (current) version of the affinity model. We have updated the values, although these don't really have any effect on the model since we only infer within bounds in which we have a lot of points:

      “Affinity is 0 for the initial unmutated sequence, and ranges from -12.2 to 3.5 in observed sequences, with a mean (median) of -0.3 (0.3).”

      (2) I had to look up the Voznica paper to understand the tree encoding. It would be nice to spend another sentence or two on it here for those who aren't familiar.

      Great point, we have added the following:

      “We encode each tree with an approach similar to Lambert et al. (2023) and Thompson et al. (2024), most closely following the compact bijective ladderized vector (CBLV) approach from Voznica et al. (2022). The CBLV method first ladderizes the tree by rotating each subtree such that, roughly speaking, longer branches end up toward the left. This does not modify the tree, but rather allows iteration over nodes in a defined, repeatable way, called inorder iteration. To generate the matrix, we traverse the ladderized tree in order, calculating a distance to associate with each node. For internal nodes, this is the distance to root, whereas for leaf nodes it is the distance to the most-recently-visited internal node (Voznica et al., 2022, Fig. 2). Distances corresponding to leaf nodes are arranged in the first row of the matrix, while those from internal nodes form the second row.”
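      The traversal described above can be sketched in a few dozen lines. This follows the quoted description only and makes several labeled assumptions: the ladderization rule is simplified to "the child with the deepest subtree goes left", the tree is strictly binary, the very first leaf (which has no previously visited internal node) is encoded by its distance to the root, and any padding or rescaling applied before feeding the matrix to a network is omitted. `Node` and `encode_cblv` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    branch_length: float = 0.0  # length of the edge above this node
    children: List["Node"] = field(default_factory=list)


def height(node: Node) -> float:
    """Largest distance from `node` down to any leaf in its subtree."""
    if not node.children:
        return 0.0
    return max(child.branch_length + height(child) for child in node.children)


def ladderize(node: Node) -> None:
    """Simplified ladderization: the child with the deepest subtree goes left."""
    node.children.sort(key=lambda c: c.branch_length + height(c), reverse=True)
    for child in node.children:
        ladderize(child)


def encode_cblv(root: Node) -> List[List[float]]:
    """Two-row CBLV-style encoding: leaf values in row 0, internal-node values in row 1."""
    ladderize(root)
    leaf_row: List[float] = []
    internal_row: List[float] = []
    last_internal_depth: Optional[float] = None

    def visit(node: Node, depth: float) -> None:
        nonlocal last_internal_depth
        if not node.children:
            # Leaf: distance to the most recently visited internal node. The first
            # leaf has none yet, so we use its distance to the root (an assumption).
            reference = 0.0 if last_internal_depth is None else last_internal_depth
            leaf_row.append(depth - reference)
            return
        left, right = node.children  # assumes a strictly binary tree
        visit(left, depth + left.branch_length)
        internal_row.append(depth)  # internal node: distance to root
        last_internal_depth = depth
        visit(right, depth + right.branch_length)

    visit(root, 0.0)
    return [leaf_row, internal_row]


if __name__ == "__main__":
    tree = Node("root", children=[
        Node("n1", 1.0, [Node("A", 1.0), Node("B", 2.0)]),
        Node("D", 1.0),
    ])
    print(encode_cblv(tree))
```

      Because in-order traversal always enters a right subtree immediately after visiting its parent, each leaf after the first is a descendant of the most recently visited internal node, so its value is simply its depth minus that node's depth.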

      (3) On line 351, you refer to the "top rows of Figure 2 and Figure 3," but each only has one row in the current version. I think it should now be "left panel.".

      Fixed.

      (4) How many vertical dashed lines are in the left panel of the bottom row of Figure 7? I think it's more than one, but can't tell if it is two or three...

      Nice catch! There were actually three. We've shortened them and added a white outline to clarify overlapping lines.

      (5) Would the model be applicable to GCs with multiple naive founders of different affinities? Or would more/different parameters be needed to account for that?

      The model would be applicable, but since the time required for our simulation scales roughly with the total simulated population size, we could probably only handle competition among at most a couple of GCs. Some sort of "migration strength" parameter would be required for competition among GCs (or within one GC if we don't want to assume it's well-mixed), but that doesn't seem a terrible impediment. We've added the following:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because the time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”

    1. Why do social media platforms make decisions that harm users? And why do social media platforms sometimes go down paths of self-destruction and alienating their users? Sometimes these questions can be answered by looking at the economic forces that drive decision-making on social media platforms, in particular capitalism. So let’s start by defining capitalism.

      19.1.1. Definition of Capitalism

      Capitalism is “an economic system characterized by private or corporate ownership of capital goods, by investments that are determined by private decision, and by prices, production, and the distribution of goods that are determined mainly by competition in a free market” (Merriam-Webster Dictionary).

      In other words, capitalism is a system where:
      (a) Individuals or corporations own businesses.
      (b) These business owners make what they want and set their own prices. They compete with other businesses to convince customers to buy their products.
      (c) These business owners then hire wage laborers at predetermined rates for their work, while the owners get the excess business profits or losses.

      Related Terms

      Here are a few more terms relevant to capitalism that we need to understand in order to get to the details of decision-making and strategies employed by social media companies.

      Shares / Stocks: Shares or stocks are ownership of a percentage of a business, normally coming with a percentage of the profits and a percentage of power in making business decisions. Companies then have a board of directors who represent these shareholders. The board is in charge of choosing who runs the company (the CEO) and has the power to hire and fire CEOs. For example, in 1985 the board of directors of Apple Computers denied Steve Jobs (who co-founded Apple) the position of CEO and then fired him completely. CEOs of companies (like Mark Zuckerberg of Meta) are often both wage laborers (they get a salary; Zuckerberg gets a tiny symbolic $1/year) and shareholders (they get a share of the profits; Zuckerberg owns 16.8%).

      Free Market: Businesses set their own prices and customers decide what they are willing to pay, so prices go up or down as each side decides what they are willing to charge or spend, with no government intervention (see supply and demand). What gets made is theoretically determined by what customers want to spend their money on, especially the people with the most money (both business owners and customers), with businesses competing for customers by offering better products and better prices.

      Monopoly: “a situation where a specific person or enterprise is the only supplier of a particular thing.” Monopolies are considered anti-competitive (though not necessarily anti-capitalist): businesses can lower quality and raise prices, and customers will have to accept those prices since there are no alternatives. Cornering a market is being close enough to a monopoly to mostly set the rules (e.g., Amazon and online shopping).

      19.1.2. Socialism

      Let’s contrast capitalism with socialism. Socialism, in contrast, is a system where:
      (a) A government owns the businesses (sometimes called “government services”).
      (b) A government decides what to make and what the price is; the price might be free, like with public schools, public streets and highways, public playgrounds, etc.
      (c) A government then may hire wage laborers at predetermined rates for their work, and the excess business profits or losses are handled by the government. For example, losses are covered by taxes, and excess may pay for other government services or go directly to the people (e.g., Alaska uses its oil profits to pay people to live there).

      As an example, there is one Seattle City Sewer system, which is run by the Seattle government. Having many competing sewer systems could actually make a big mess of the underground pipe system.

      19.1.3. Accountability in Capitalism and other systems

      Let’s look at who the leaders of businesses (or services) are accountable to in capitalism and other systems.

      Democratic Socialism (i.e., “Socialists”)

      With socialism in a representative democracy (i.e., “democratic socialism”), the government leaders are chosen by the people through voting. And so, while the governmental leaders are in charge of what gets made, how much it costs, and who gets it, those leaders are accountable to the voters. So, in a democratic socialist government, theoretically, every voter has an equal say in business (or government service) decisions. Note that there are limitations to the government leaders being accountable to the people their decisions affect, such as government leaders ignoring voters’ wishes, or people who can’t vote (e.g., the young, non-citizens, oppressed minorities) and therefore don’t get a say.

      I thought this assignment was interesting because it connected programming with a real-world scenario. It helped me understand how the way we design an algorithm can affect fairness and outcomes for different people. I also liked that it made us think not only about writing correct code, but also about the social impact of algorithms.

    1. As a social media user, we hope you are informed about things like: how social media works, how they influence your emotions and mental state, how your data gets used or abused, strategies in how people use social media, and how harassment and spam bots operate. We hope with this you can be a more informed user of social media, better able to participate, protect yourself, and make it a valuable experience for you and others you interact with. For example, you can hopefully recognize when someone is intentionally posting something bad or offensive (like the bad cooking videos we mentioned in the Virality chapter, or an intentionally offensive statement) in an attempt to get people to respond and spread their content. Then you can decide how you want to engage (if at all) given how they are trying to spread their content.

      I genuinely think this class overall will help me with how I engage with social media in the future. I notice faster when I am doomscrolling, and notice more if the content I am watching is trying to get a response out of me. While I don't think I can fully quit social media (namely Instagram and Twitter), I do think I can be more cognizant. However, I may go into the settings for both apps now and go through it very deeply to make sure I am not being tracked as much as usual, and turn off things like targeted ads.

    1. Author response:

      The following is the authors’ response to the current reviews.

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated." 

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- condition (i.e., visual target presented alone) has lower accuracy and a longer reaction time than the visual+ condition (i.e., visual target presented with distractor). In fact, the Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2%; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory+: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms; SD = 58.52; p < .001; t(25) = 5.23)."
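      As a quick consistency check on the legend values, a paired t statistic can be recomputed from the mean difference M, the standard deviation of the differences SD, and the number of participants n as t = M / (SD / √n). The sketch assumes df = n − 1, so t(25) implies n = 26; it only checks the reported arithmetic, and the sign of each M (for example, negative for visual- minus visual+ accuracy) is what carries the direction of the effect the reviewer is asking about.

```python
import math


def paired_t(mean_diff: float, sd_diff: float, n: int) -> float:
    """Paired t statistic from summary statistics: t = M / (SD / sqrt(n))."""
    return mean_diff / (sd_diff / math.sqrt(n))


# (M, SD, reported t) copied from the Suppl. Fig. 1B legend.
# n = df + 1 = 26 is an assumption based on the reported t(25).
REPORTED = {
    "accuracy, auditory- minus auditory+": (7.2, 7.5, 4.9),
    "accuracy, visual- minus visual+": (-7.6, 10.80, -3.59),
    "RT, auditory- minus auditory+": (-20.64, 57.6, -1.83),
    "RT, visual- minus visual+": (60.1, 58.52, 5.23),
}

if __name__ == "__main__":
    for label, (m, sd, t_reported) in REPORTED.items():
        print(f"{label}: recomputed t = {paired_t(m, sd, 26):.2f}, reported t = {t_reported}")
```

      All four recomputed values agree with the reported ones to within rounding of M and SD.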

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of the MEG experiment is matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      We apologize for mixing up the visual and auditory distractor cost in our rebuttal. The reviewer is right in that our two statements contradict each other.

      To clarify: In the EEG experiment, we see a significant distractor cost for auditory distractors in accuracy (see Suppl. Fig. 1A). We also see faster reaction times with auditory distractors, which may speak to intersensory facilitation. As we used the same distractors for both experiments, it can be assumed that they were distracting in both experiments.

      In our follow-up MEG-experiment, as the reviewer stated, performance in block 2 was higher than in block 1, even though there were distractors present. In this experiment, distractor cost and learning effects are difficult to disentangle. It is possible that participants improved over time for the visual discrimination task in Block 1, as performance at the beginning was quite low. To illustrate this, we divided the trials of each condition into bins of 10 and plotted the mean accuracy in these bins over time (see Author response image 1). Here it can be seen that in Block 2, there is a more or less stable performance over time with a variation < 10 %. In Block 1, both for visual as well as auditory trials, an improvement over time can be seen. This is especially strong for visual trials, which span a difference of > 20%. Note that the mean performance for the 80-90 trial bin was higher than any mean performance observed in Block 2. 

      Additionally, the same paradigm has been applied in previous investigations, which also found distractor costs for the here-used auditory stimuli in blocked and non-blocked designs. See:

      Mazaheri, A., van Schouwenburg, M. R., Dimitrijevic, A., Denys, D., Cools, R., & Jensen, O. (2014). Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage, 87, 356–362. https://doi.org/10.1016/j.neuroimage.2013.10.052

      Van Diepen, R., & Mazaheri, A. (2017). Cross-sensory modulation of alpha oscillatory activity: Suppression, idling and default resource allocation. European Journal of Neuroscience, 45(11), 1431–1438. https://doi.org/10.1111/ejn.13570

      Author response image 1.

      Accuracy development over time in the MEG experiment. During block 1, a performance increase over time can be observed for visual as well as for auditory stimuli. During Block 2, performance is stable over time. Data are presented as mean ± SEM. N = 27 (one participant was excluded from this analysis, as their trial count in at least one condition was below 90 trials).
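      The binning described in this response can be sketched as follows: each participant's trial-by-trial correctness for one condition is split into consecutive bins of 10 trials, participants with fewer than 90 trials are excluded, and each bin is summarized as mean ± SEM across participants. The chronological trial ordering, truncating every participant to the first 90 trials, dropping an incomplete final bin, and the function names are assumptions for illustration, not the authors' analysis code.

```python
import math
from typing import List, Sequence, Tuple


def binned_accuracy(correct: Sequence[int], bin_size: int = 10) -> List[float]:
    """Percent correct in consecutive bins of `bin_size` trials (incomplete last bin dropped)."""
    n_bins = len(correct) // bin_size
    return [100.0 * sum(correct[k * bin_size:(k + 1) * bin_size]) / bin_size
            for k in range(n_bins)]


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD with n - 1; needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd / math.sqrt(n)


def group_curve(participants: Sequence[Sequence[int]], bin_size: int = 10,
                min_trials: int = 90) -> List[Tuple[float, float]]:
    """Mean ± SEM per bin across participants with at least `min_trials` trials.

    Each participant is truncated to the first `min_trials` trials so that all
    curves have the same number of bins (an assumption).
    """
    kept = [binned_accuracy(p[:min_trials], bin_size)
            for p in participants if len(p) >= min_trials]
    n_bins = min_trials // bin_size
    return [mean_sem([curve[k] for curve in kept]) for k in range(n_bins)]
```

      A stable curve across bins (as reported for Block 2) versus a rising one (as in Block 1) is then read directly from the per-bin means.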


      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results disprove the alpha inhibition hypothesis, and instead implies that alpha "regulates downstream information transfer." However, as I detail below, I do not think the presented data irrefutably disproves the alpha inhibition hypothesis. Moreover, the evidence for the alternative hypothesis of alpha as an orchestrator for downstream signal transmission is weak. Their data serves to refute only the most extreme and physiologically implausible version of the alpha inhibition hypothesis, which assumes that alpha completely disengages the entire brain area, inhibiting all neuronal activity.

      We thank the reviewer for taking the time to provide additional feedback and suggestions and we improved our manuscript accordingly.

      (1) Authors assign specific meanings to specific frequencies (8-12 Hz alpha, 4 Hz intermodulation frequency, 36 Hz visual tagging activity, 40 Hz auditory tagging activity), but the results show that spectral power increases in all of these frequencies towards the end of the cue-to-target interval. This result is consistent with a broadband increase, which could simply be due to additional attention required when anticipating auditory target (since behavioral performance was lower with auditory targets, we can say auditory discrimination was more difficult). To rule this out, authors will need to show a power spectral density curve with specific increases around each frequency band of interest. In addition, it would be more convincing if there was a bump in the alpha band, and distinct bumps for 4 vs 36 vs 40 Hz band.

      This is an interesting point with several aspects, which we will address separately.

      Broadband Increase vs. Frequency-Specific Effects:

      The suggestion that the observed spectral power increases may reflect a broadband effect rather than frequency-specific tagging is important. However, Supplementary Figure 11 shows no difference between expecting an auditory or visual target at 44 Hz. This demonstrates that (1) there is no uniform increase across all frequencies, and (2) the separation between our stimulation frequencies was sufficient to allow differentiation using our method.

      Task Difficulty and Performance Differences:

      The reviewer suggests that the observed effects may be due to differences in task difficulty, citing lower performance when anticipating auditory targets in the EEG study. This issue was explicitly addressed in our follow-up MEG study, where stimulus difficulty was calibrated. In the second block—used for analysis—accuracy between auditory and visual targets was matched (see Fig. 7B). The replication of our findings under these controlled conditions directly rules out task difficulty as the sole explanation. This point is clearly presented in the manuscript.

      Power Spectrum Analysis:

      The reviewer’s suggestion that our analysis lacks evidence of frequency-specific effects is addressed directly in the manuscript. While we initially used the Hilbert method to track the time course of power fluctuations, we also included spectral analyses to confirm distinct peaks at the stimulation frequencies. Specifically, when averaging over the alpha cluster, we observed a significant difference at 10 Hz between auditory and visual target expectation, with no significant differences at 36 or 40 Hz in that cluster. Conversely, in the sensor cluster showing significant 36 Hz activity, alpha power did not differ, but both 36 Hz and 40 Hz tagging frequencies showed significant effects. These findings clearly demonstrate frequency-specific modulation and are already presented in the manuscript.
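      The logic of checking for separate spectral peaks, rather than a broadband rise, can be illustrated on a synthetic signal. With a 1 s window the frequency resolution is 1 Hz, so 36 Hz, 40 Hz and the 44 Hz control frequency fall in distinct bins, and a 4 Hz intermodulation component appears only when the two tagged inputs combine nonlinearly, since sin a · sin b = ½[cos(a − b) − cos(a + b)]. This is a toy single-bin DFT in plain Python, not the authors' Hilbert or spectral pipeline; the sampling rate, window length and amplitudes are arbitrary assumptions.

```python
import cmath
import math
from typing import List


def power_at(signal: List[float], fs: float, freq: float) -> float:
    """Power of a single DFT bin at `freq` Hz for a signal sampled at `fs` Hz."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return abs(coeff) ** 2 / n


def tone(freq: float, fs: float, n: int) -> List[float]:
    """Unit-amplitude sine at `freq` Hz, `n` samples long."""
    return [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]


FS, N = 1000.0, 1000  # 1 s window -> 1 Hz frequency resolution
visual_tag = tone(36.0, FS, N)
auditory_tag = tone(40.0, FS, N)

# Linear mixture: energy only at the two tagging frequencies.
linear = [v + a for v, a in zip(visual_tag, auditory_tag)]
# Nonlinear interaction: the product term adds energy at 40 - 36 = 4 Hz (and at 76 Hz).
nonlinear = [v + a + 0.5 * v * a for v, a in zip(visual_tag, auditory_tag)]

if __name__ == "__main__":
    for f in (4.0, 36.0, 40.0, 44.0):
        print(f"{f:>4.0f} Hz  linear={power_at(linear, FS, f):8.3f}  "
              f"nonlinear={power_at(nonlinear, FS, f):8.3f}")
```

      A genuinely broadband increase would raise the 44 Hz control bin along with the tagged bins, whereas frequency-specific modulation leaves it unchanged.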

      (2) For visual target discrimination, behavioral performance with and without the distractor is not statistically different. Moreover, the reaction time is faster with distractor. Is there any evidence that the added auditory signal was actually distracting?

      We appreciate the reviewer’s observation regarding the lack of a statistically significant difference in behavioral performance for visual target discrimination with and without the auditory distractor. While this was indeed the case in our EEG experiment, we believe the absence of an accuracy effect may be attributable to a ceiling effect, as overall visual performance approached 100%. This high baseline likely masked any subtle influence of the distractor.

      To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated.

      Regarding the faster reaction times observed in the presence of the auditory distractor, this phenomenon is consistent with prior findings on intersensory facilitation. Auditory stimuli, which are processed more rapidly than visual stimuli, can enhance response speed to visual targets—even when the auditory input is non-informative or nominally distracting (Nickerson, 1973; Diederich & Colonius, 2008; Salagovic & Leonard, 2021). Thus, while the auditory signal may facilitate motor responses, it can simultaneously impair perceptual accuracy, depending on task demands and baseline performance levels.

      Taken together, our data suggest that the auditory signal does exert a distracting influence, particularly under conditions where visual performance is not at ceiling. The dual effect—facilitated reaction time but reduced accuracy—highlights the complexity of multisensory interactions and underscores the importance of considering both behavioral and neurophysiological measures.

      (3) It is possible that alpha does suppress task-irrelevant stimuli, but only when it is distracting. In other words, perhaps alpha only suppresses distractors that are presented simultaneously with the target. Since the authors did not test this, they cannot irrefutably reject the alpha inhibition hypothesis.

      The reviewer’s claim that we did not test whether alpha suppresses distractors presented simultaneously with the target is incorrect. As stated in the manuscript and supported by our data (see point 2), auditory distractors were indeed presented concurrently with visual targets, and they were demonstrably distracting. Therefore, the scenario the reviewer suggests was not only tested—it forms a core part of our design.

      Furthermore, it was never our intention to irrefutably reject the alpha inhibition hypothesis. Rather, our aim was to revise and expand it. If our phrasing implied otherwise, we have now clarified this in the manuscript. Specifically, we propose that alpha oscillations:

      (a) Exhibit cyclic inhibitory and excitatory dynamics;

      (b) Regulate processing by modulating transfer pathways, which can result in either inhibition or facilitation depending on the network context.

      In our study, we did not observe suppression of distractor transfer, likely due to the engagement of a supramodal system that enhances both auditory and visual excitability. This interpretation is supported by prior findings (e.g., Jacoby et al., 2012), which show increased visual SSEPs under auditory task load, and by Zhigalov and Jensen (2020), who found no trial-by-trial correlation between alpha power and visual tagging in early visual areas, despite a general association with attention.

      Recent evidence (Clausner et al., 2024; Yang et al., 2024) further supports the notion that alpha oscillations serve multiple functional roles depending on the network involved. These roles include intra- and inter-cortical signal transmission, distractor inhibition, and enhancement of downstream processing (Scheeringa et al., 2012; Bastos et al., 2015; Zumer et al., 2014). We believe the most plausible account is that alpha oscillations support both functions, depending on context.

      To reflect this more clearly, we have updated Figure 1 to present a broader signal-transfer framework for alpha oscillations, beyond the specific scenario tested in this study.

      We have now revised Figure 1 and several sentences in the introduction and discussion to clarify this argument.

      L35-37: Previous research gave rise to the prominent alpha inhibition hypothesis, which suggests that oscillatory activity in the alpha range (~10 Hz) plays a mechanistic role in selective attention through functional inhibition of irrelevant cortical areas (see Fig. 1; Foxe et al., 1998; Jensen & Mazaheri, 2010; Klimesch et al., 2007).

      L60-65: In contrast, we propose that functional and inhibitory effects of alpha modulation, such as distractor inhibition, are exhibited through blocking or facilitating signal transmission to higher order areas (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      L482-485: This suggests that responsiveness of the visual stream was not inhibited when attention was directed to auditory processing and was not inhibited by occipital alpha activity, which directly contradicts the proposed mechanism behind the alpha inhibition hypothesis.

      L517-519: Top-down cued changes in alpha power are now widely viewed as playing a functional role in directing attention: the processing of irrelevant information is attenuated by increasing alpha power in areas involved with processing this information (Foxe, Simpson, & Ahlfors, 1998; Hanslmayr et al., 2007; Jensen & Mazaheri, 2010).

      L566-569: As such, it is conceivable that alpha oscillations can in some cases inhibit local transmission, while in other cases, depending on network location, connectivity and demand, alpha oscillations can facilitate signal transmission. This mechanism makes it possible to increase the transmission of relevant information and to block the transmission of distractors.

      (4) In the abstract and Figure 1, the authors claim an alternative function for alpha oscillations: that alpha "orchestrates signal transmission to later stages of the processing stream." In support, the authors cite their result showing that increased alpha activity originating from early visual cortex is related to enhanced visual processing in higher visual areas and association areas. This does not constitute strong support for the alternative hypothesis. The correlation between posterior alpha power and frequency-tagged activity was not specific in any way; Fig. 10 shows that the correlation 1) appeared on both anticipating-auditory and anticipating-visual trials, 2) appeared for both the visual tagged frequency and the auditory tagged activity, and 3) was not specific to the visual processing stream. Thus, the data are more parsimoniously explained by a correlation than by a causal relationship between posterior alpha and visual processing.

      Again, the reviewer raises important points, which we address below.

      The correlation between posterior alpha power and frequency-tagged activity was not specific, as it is present both when auditory and visual targets are expected:

      If there is a connection between posterior alpha activity and higher-order visual information transfer, then this relationship can be expected to hold across conditions, with higher alpha activity accompanied by higher frequency-tagged activity, both over trials and over conditions. However, when alpha activity is lower, such as when a visual target is expected, the signal-to-noise ratio may be reduced, which can make a correlation harder to detect with non-invasive measurements.

      The connection between alpha activity and frequency-tagged activity appears for both auditory and visual stimuli, and the correlation is not specific to the visual processing stream:

      While we do see differences between conditions (e.g. in the EEG-analysis, mostly 36 Hz correlated with alpha activity and only in one condition 40 Hz showed a correlation as well), it is true that in our MEG analysis, we found correlations both between alpha activity and 36 Hz as well as alpha activity and 40 Hz.  

      We acknowledge that when analysing frequency-tagged activity on a trial-by-trial basis, where non-time-locked activity cannot be removed by averaging (as we did when testing for condition differences in Figs. 4 and 9), there is uncertainty in the data. Baseline correction can alleviate this issue, but it cannot rule out non-specific effects. We therefore repeated the analysis using power computed with a fast Fourier transform (FFT) instead of Hilbert power, because the FFT provides a higher and stricter frequency resolution; since we averaged over a time window, temporal resolution was not relevant for this analysis. In this more conservative analysis, only 36 Hz tagged activity when expecting an auditory target correlated with early visual alpha activity.
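      The trade-off between FFT and Hilbert power can be illustrated with a minimal Python sketch that reads out FFT power at the two tagging frequencies (36 and 40 Hz) for one trial. The sampling rate, the 1 s window, the Hann taper, and the helper name `tagging_power` are assumptions for illustration only, not parameters of the analysis pipeline. With a window of T seconds the FFT bin spacing is 1/T Hz, so a 1 s window places 36 and 40 Hz in clearly separated bins, whereas a Hilbert envelope depends on the width of the band-pass filter applied beforehand.

```python
import numpy as np

def tagging_power(signal, fs, freqs):
    """Hypothetical helper: FFT power at each requested frequency.

    signal: 1-D array (one trial, one sensor), assumed baseline-corrected.
    fs: sampling rate in Hz.
    freqs: frequencies of interest in Hz, e.g. the 36 and 40 Hz tags.
    """
    n = signal.size
    # Hann taper reduces leakage from one tagging frequency into the other.
    spectrum = np.fft.rfft(signal * np.hanning(n))
    bin_freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # bin spacing = fs / n = 1 / T Hz
    power = np.abs(spectrum) ** 2 / n
    # Read out the bin closest to each tagging frequency.
    return {f: power[np.argmin(np.abs(bin_freqs - f))] for f in freqs}

fs = 1000.0                      # assumed sampling rate (Hz)
t = np.arange(1000) / fs         # assumed 1 s window -> 1 Hz resolution
rng = np.random.default_rng(0)
# Synthetic trial: a 36 Hz tagged response plus white noise.
trial = np.sin(2 * np.pi * 36 * t) + 0.5 * rng.standard_normal(t.size)
p = tagging_power(trial, fs, [36.0, 40.0])
print(p[36.0] > p[40.0])
```

      Averaging such per-trial values over a time window discards the time course, which is acceptable when only one value per trial enters a trial-by-trial correlation.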

      Additionally, we added correlation analyses between alpha activity and frequency-tagged activity within early visual areas, using the sensor cluster that showed significant condition differences in alpha activity. Here, no correlations between frequency-tagged activity and alpha activity were found (apart from a small correlation with 40 Hz, which could not be confirmed by a median split; see SUPPL Fig. 14 C). The absence of a significant correlation between early visual alpha and frequency-tagged activity has previously been described by others (Zhigalov & Jensen, 2020), and a Bayes factor below 1 also indicated that the data favour the null over the alternative hypothesis.
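      The logic of pairing a trial-by-trial correlation with a median-split check can be sketched in a few lines of Python. This is an illustrative sketch on synthetic data, not the analysis actually run: the helper name `trialwise_alpha_ssep`, the use of a Pearson correlation, and the simulated distributions are assumptions, and the Bayes factor computation is not reproduced. The median split asks whether trials with above-median alpha also show higher tagged power on average, which guards against a correlation driven by a few extreme trials.

```python
import numpy as np

def trialwise_alpha_ssep(alpha, ssep):
    """Hypothetical helper: correlate alpha power with tagged (SSEP) power
    across trials and run a median-split check.

    alpha, ssep: 1-D arrays with one value per trial.
    Returns (Pearson r, mean SSEP in high-alpha trials minus low-alpha trials).
    """
    alpha = np.asarray(alpha, dtype=float)
    ssep = np.asarray(ssep, dtype=float)
    r = np.corrcoef(alpha, ssep)[0, 1]               # Pearson r over trials
    high = alpha > np.median(alpha)                  # above-median alpha trials
    diff = ssep[high].mean() - ssep[~high].mean()    # high minus low alpha
    return r, diff

rng = np.random.default_rng(1)
alpha = rng.gamma(2.0, 1.0, size=200)                    # synthetic alpha power
ssep_coupled = 0.5 * alpha + rng.standard_normal(200)    # covaries with alpha
ssep_independent = rng.standard_normal(200)              # unrelated to alpha

r_c, d_c = trialwise_alpha_ssep(alpha, ssep_coupled)
r_i, d_i = trialwise_alpha_ssep(alpha, ssep_independent)
print(round(r_c, 2), round(d_c, 2))
print(round(r_i, 2), round(d_i, 2))
```

      A correlation that is not accompanied by a consistent median-split difference is treated as unconfirmed under this logic.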

      Nonetheless, a correlation with the auditory tagged activity is possible and could be explained in different ways. For example, it could be that very early auditory feedback in early visual cortex (see, for example, Brang et al., 2022) is transmitted alongside visual information to higher-order areas. Several studies have shown that alpha activity and visual as well as auditory processing are closely linked (Bauer et al., 2020; Popov et al., 2023). Determining whether or how this link plays out in the present data goes beyond the scope of this study.

      To summarize, we believe the fact that 36 Hz activity within early visual areas does not correlate with alpha activity on a trial-by-trial basis, whereas 36 Hz activity in other areas does, provides strong evidence that alpha activity affects downstream signal processing.

      We mention this analysis now in our discussion:

      L533-536: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity does not covary over trials with SSEP magnitude in early visual areas, but covaries instead over trials with SSEP magnitude in higher order sensory areas (see also SUPPL. Fig. 14).

      Reviewer #1 (Recommendations for the authors):

      The evidence for the alternative hypothesis, that alpha in early sensory areas orchestrates downstream signal transmission, is not strong enough to be described up front in the abstract and Figure 1. I would leave it in the Discussion section, but advise against mentioning it in the abstract and Figure 1.

      We appreciate the reviewer’s concern regarding the inclusion of the alternative hypothesis—that alpha activity in early sensory areas orchestrates downstream signal transmission—in the abstract and Figure 1. While we agree that this interpretation is still developing, recent studies (Keitel et al., 2025; Clausner et al., 2024; Yang et al., 2024) provide growing support for this framework.

      In response, we have revised the introduction, discussion, and Figure 1 to clarify that our intention is not to outright dismiss the alpha inhibition hypothesis, but to refine and expand it in light of new data. This revision does not invalidate the prior literature on alpha timing and inhibition; rather, it proposes an updated mechanism that may better account for observed effects.

      We have, however, retained Figure 1, as it visually contextualizes the broader theoretical landscape, and we have added further analyses to strengthen the empirical support for this emerging view.

      References:

      Bastos, A. M., Litvak, V., Moran, R., Bosman, C. A., Fries, P., & Friston, K. J. (2015). A DCM study of spectral asymmetries in feedforward and feedback connections between visual areas V1 and V4 in the monkey. NeuroImage, 108, 460–475. https://doi.org/10.1016/j.neuroimage.2014.12.081

      Bauer, A. R., Debener, S., & Nobre, A. C. (2020). Synchronisation of Neural Oscillations and Cross-modal Influences. Trends in cognitive sciences, 24(6), 481–495. https://doi.org/10.1016/j.tics.2020.03.003

      Brang, D., Plass, J., Sherman, A., Stacey, W. C., Wasade, V. S., Grabowecky, M., Ahn, E., Towle, V. L., Tao, J. X., Wu, S., Issa, N. P., & Suzuki, S. (2022). Visual cortex responds to sound onset and offset during passive listening. Journal of neurophysiology, 127(6), 1547–1563. https://doi.org/10.1152/jn.00164.2021

      Clausner, T., Marques, J., Scheeringa, R., & Bonnefond, M. (2024). Feature specific neuronal oscillations in cortical layers. bioRxiv, 2024.07.31.605816. https://doi.org/10.1101/2024.07.31.605816

      Diederich, A., & Colonius, H. (2008). When a high-intensity "distractor" is better then a low-intensity one: modeling the effect of an auditory or tactile nontarget stimulus on visual saccadic reaction time. Brain research, 1242, 219–230. https://doi.org/10.1016/j.brainres.2008.05.081

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences of the United States of America, 108(48), 19377–19382. https://doi.org/10.1073/pnas.1117190108

      Jacoby, O., Hall, S. E., & Mattingley, J. B. (2012). A crossmodal crossover: opposite effects of visual and auditory perceptual load on steady-state evoked potentials to irrelevant visual stimuli. NeuroImage, 61(4), 1050–1058. https://doi.org/10.1016/j.neuroimage.2012.03.040

      Keitel, A., Keitel, C., Alavash, M., Bakardjian, K., Benwell, C. S. Y., Bouton, S., Busch, N. A., Criscuolo, A., Doelling, K. B., Dugue, L., Grabot, L., Gross, J., Hanslmayr, S., Klatt, L.-I., Kluger, D. S., Learmonth, G., London, R. E., Lubinus, C., Martin, A. E., … Kotz, S. A. (2025). Brain rhythms in cognition – controversies and future directions. ArXiv. https://doi.org/10.48550/arXiv.2507.15639

      Nickerson, R. S. (1973). Intersensory facilitation of reaction time: energy summation or preparation enhancement? Psychological review, 80(6), 489–509. https://doi.org/10.1037/h0035437

      Popov, T., Gips, B., Weisz, N., & Jensen, O. (2023). Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cerebral cortex (New York, N.Y. : 1991), 33(7), 3478–3489. https://doi.org/10.1093/cercor/bhac285

      Salagovic, C. A., & Leonard, C. J. (2021). A nonspatial sound modulates processing of visual distractors in a flanker task. Attention, perception & psychophysics, 83(2), 800–809. https://doi.org/10.3758/s13414-020-02161-5

      Scheeringa, R., Petersson, K. M., Kleinschmidt, A., Jensen, O., & Bastiaansen, M. C. (2012). EEG α power modulation of fMRI resting-state connectivity. Brain connectivity, 2(5), 254–264. https://doi.org/10.1089/brain.2012.0088

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-specific entrainment of γ-band neural activity by the α rhythm in monkey visual cortex. Current biology : CB, 22(24), 2313–2318. https://doi.org/10.1016/j.cub.2012.10.020

      Yang, X., Fiebelkorn, I. C., Jensen, O., Knight, R. T., & Kastner, S. (2024). Differential neural mechanisms underlie cortical gating of visual spatial attention mediated by alpha-band oscillations. Proceedings of the National Academy of Sciences of the United States of America, 121(45), e2313304121. https://doi.org/10.1073/pnas.2313304121

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human brain mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Zumer, J. M., Scheeringa, R., Schoffelen, J. M., Norris, D. G., & Jensen, O. (2014). Occipital alpha activity during stimulus processing gates the information flow to object-selective cortex. PLoS biology, 12(10), e1001965. https://doi.org/10.1371/journal.pbio.1001965

    Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections, and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, the EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by the authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and accompanying appendix is lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g., A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables, in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, the first version of our manuscript included material only on this section and not the other. We agree with the reviewer that readers would benefit from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formulas placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods and to provide more of the technical details in one place, which will be easier to understand and follow.
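      For reference, the standard first- and second-moment quantities that a two-moment approximation works with can be written out explicitly. A and S follow the reviewer's usage (inter-arrival times and service times); the symbols λ, ρ, c_A^2 and c_S^2 are introduced here for illustration, and the approximation formula of Choi et al. itself is not reproduced. Under the mapping used in this work, arrivals correspond to new infections and service times to infection durations, so 1/E[A] plays the role of the force of infection per host.

```latex
\lambda = \frac{1}{\mathbb{E}[A]}, \qquad
\mathbb{E}[A^2] = \operatorname{Var}(A) + \mathbb{E}[A]^2, \qquad
c_A^2 = \frac{\operatorname{Var}(A)}{\mathbb{E}[A]^2}, \qquad
c_S^2 = \frac{\operatorname{Var}(S)}{\mathbb{E}[S]^2}, \qquad
\rho = \lambda\,\mathbb{E}[S] = \frac{\mathbb{E}[S]}{\mathbb{E}[A]}.
```

      Specifying the mean and variance of A is therefore equivalent to specifying λ and c_A^2; for Poisson arrivals, A is exponential and c_A^2 = 1.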

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or just the distribution of the mean FOI taken over multiple simulations, and it also seems that they are estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g., the proportion of simulations where the truth is captured in the 95% CI) is important. In many cases, it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g., Figure 10, and Figure 1 under the mid-IRS panel), but it is not possible to conclude one way or the other based on this visualization. This is a major issue, since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion about what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not explicitly consider any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained by dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, provides an estimate of the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
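      The pooling step and the simulation ground truth described above can be sketched in Python as follows. The helper names `population_moi_distribution` and `true_foi_per_host` and the example numbers are hypothetical and serve only to make the two definitions concrete: one MOI estimate per sampled host is pooled into a single empirical distribution over queue lengths 0 to c, and the true per-host FOI is the population total divided by the number of hosts.

```python
import numpy as np

def population_moi_distribution(moi_estimates, c):
    """Hypothetical helper: pool one MOI estimate per sampled host into an
    empirical distribution over queue lengths 0..c (c = carrying capacity).
    This pooled distribution stands in for each individual's steady-state
    queue-length distribution, ignoring individual heterogeneity."""
    moi = np.asarray(moi_estimates, dtype=int)
    if moi.min() < 0 or moi.max() > c:
        raise ValueError("MOI estimates must lie in 0..c")
    counts = np.bincount(moi, minlength=c + 1)
    return counts / counts.sum()

def true_foi_per_host(total_infections_per_year, n_hosts):
    """Hypothetical helper: simulation ground truth, i.e. the total FOI of all
    hosts per year divided by the number of hosts."""
    return total_infections_per_year / n_hosts

# Illustrative values only, not data from the study.
moi = [1, 2, 2, 3, 5, 4, 4, 4]
dist = population_moi_distribution(moi, c=30)
print(dist[4])                        # share of hosts with MOI = 4
print(true_foi_per_host(5000, 1000))  # infections per host per year
```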

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites whereas the remaining 1/3 receives the rest of the bites), our two methods perform similarly to how they perform in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
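      The two proposed summaries can be computed directly from the simulation outputs, as in the following Python sketch. The helper name and the sign convention for the relative deviation (bootstrap median minus truth, divided by truth, so negative values indicate underestimation) are assumptions; replacing the min/max range with `np.percentile(boot, [2.5, 97.5], axis=1)` would give coverage of the 95% interval instead of the entire bootstrap distribution.

```python
import numpy as np

def coverage_and_deviation(true_foi, bootstrap_samples):
    """Hypothetical helper for assessing recovery of the true FOI.

    true_foi: array of shape (n_sims,), true FOI per host per year.
    bootstrap_samples: array of shape (n_sims, n_boot), bootstrap estimates.
    Returns (fraction of simulations whose truth lies within the range of
    its bootstrap distribution, relative deviation of each bootstrap median).
    """
    truth = np.asarray(true_foi, dtype=float)
    boot = np.asarray(bootstrap_samples, dtype=float)
    covered = (boot.min(axis=1) <= truth) & (truth <= boot.max(axis=1))
    rel_dev = (np.median(boot, axis=1) - truth) / truth  # < 0: underestimate
    return covered.mean(), rel_dev

# Two illustrative simulations with three bootstrap replicates each.
truth = [5.0, 10.0]
boot = [[4.0, 4.5, 5.5],
        [7.0, 8.0, 9.0]]
cov, rel = coverage_and_deviation(truth, boot)
print(cov, rel)
```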

      d. Furthermore, the authors state in the methods that the mean and variance (and thus second moment) parameters for inter-arrival times are varied widely; however, it is not clear what those ranges are. There needs to be a clear table or figure caption showing which combinations of values were tested and which results are produced from them. This is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear, since no likelihoods have been written down in the methods section of the main text: which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length distribution? At other places, the authors refer to these quantities as maximum likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood and of how fine the grid of values tested for the mean and variance is, since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times, as well as the grid of values tested, in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady-state queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.
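      The structure of such a likelihood can be sketched as follows: each sampled host contributes log P(queue length = observed MOI), and the parameter grid point with the largest summed log-likelihood is retained. Because the two-moment approximation of Choi et al. is not reproduced here, this Python sketch swaps in a simpler, clearly different stand-in: with Poisson arrivals, the M/G/c/c loss system has a truncated-Poisson (Erlang) stationary distribution with offered load ρ = λE[S], whatever the service-time distribution. The grid is therefore one-dimensional (ρ) rather than over the mean and variance of inter-arrival times. The mean infection duration, grid spacing, and sample size are hypothetical values.

```python
import numpy as np
from math import lgamma, log

def erlang_loss_pmf(rho, c):
    """Stationary queue-length pmf of an M/G/c/c loss system: a Poisson pmf
    with mean rho truncated to 0..c. Stand-in only; this is NOT the
    two-moment approximation for non-Poisson arrivals."""
    n = np.arange(c + 1)
    log_w = n * log(rho) - np.array([lgamma(k + 1) for k in n])
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return w / w.sum()

def log_likelihood(moi, rho, c):
    """Sum over sampled hosts of log P(queue length = observed MOI)."""
    pmf = erlang_loss_pmf(rho, c)
    return np.sum(np.log(pmf[np.asarray(moi, dtype=int)]))

def grid_mle(moi, rho_grid, c):
    """Grid point maximizing the log-likelihood, plus the whole profile,
    which can be plotted to inspect the shape of the likelihood."""
    ll = np.array([log_likelihood(moi, r, c) for r in rho_grid])
    return rho_grid[np.argmax(ll)], ll

# Synthetic check: simulate MOIs from a known offered load and recover it.
rng = np.random.default_rng(2)
c = 30
true_rho = 5.0
moi = rng.choice(c + 1, size=2000, p=erlang_loss_pmf(true_rho, c))
rho_grid = np.round(np.arange(0.5, 20.01, 0.1), 2)
rho_hat, ll = grid_mle(moi, rho_grid, c)

mean_duration_years = 0.5                  # hypothetical E[S] in years
foi_hat = rho_hat / mean_duration_years    # lambda = rho / E[S], per host per year
print(rho_hat, foi_hat)
```

      Refining `rho_grid` and checking that `rho_hat` stays stable is the one-dimensional analogue of the grid-robustness check described above.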

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. For example, it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3), it seems unlikely that 4-year-olds would be immune naïve, and the approach would not necessarily generalize well to a school-aged child population or an adult population.

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is being used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g., in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need to prioritize, in future data collection and sampling efforts, the subpopulation that has experienced no infections or only a minimal number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation. 

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution obtained from the two-moment approximation, and more specifically the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c impose a lower cap on the maximum possible queue length. The system is then more easily “overflowed”, in which case customers (or infections) often find no space available in the queuing system/individual host upon arrival. These customers (or infections) do not increment the queue length. Parameter c therefore has little impact on the part of the grid resulting in low flows of customers/infections, for which the system is unlikely to overflow. The empirical MOI distribution centers around 4 or 5, with most values well below 10 and only a small fraction of higher values between 15 and 20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail at large MOI values that are not supported by the empirical distribution. We therefore do not expect substantially higher values of parameter c to change either the relative shape of the likelihood or the MLE.
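      To illustrate why c matters mainly for high-flow parts of the grid, consider the simplest special case, which is not the two-moment approximation used in the manuscript: infections arrive as a Poisson process, durations follow a general distribution with mean E[D], and at most c infections can be held at once (an M/G/c/c loss system). In that case the stationary MOI distribution is a Poisson distribution with mean ρ = λE[D] truncated at c (the Erlang loss result, which depends on the duration distribution only through its mean). The function names and the two load values below are illustrative assumptions.

```python
import math

def truncated_poisson_pmf(rho, c):
    """Stationary P(MOI = n), n = 0..c, for an M/G/c/c loss system with offered load rho > 0."""
    # Work in log space so rho**n / n! cannot overflow for large c.
    log_terms = [n * math.log(rho) - math.lgamma(n + 1) for n in range(c + 1)]
    m = max(log_terms)
    weights = [math.exp(t - m) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

def blocking_probability(rho, c):
    """Erlang B: probability that an arriving infection finds all c slots occupied."""
    return truncated_poisson_pmf(rho, c)[c]

def max_pmf_difference(rho, c_small, c_large):
    """Largest absolute difference in P(MOI = n) between two capacities c_small < c_large."""
    p_small = truncated_poisson_pmf(rho, c_small)
    p_large = truncated_poisson_pmf(rho, c_large)
    diffs = [abs(p_small[n] - p_large[n]) for n in range(c_small + 1)]
    # For n > c_small the smaller system has zero probability by construction.
    diffs += p_large[c_small + 1:]
    return max(diffs)

# Low load (mean MOI near 4) versus an overloaded host (offered load 40).
for rho in (4.0, 40.0):
    print(rho, blocking_probability(rho, 25), max_pmf_difference(rho, 25, 30))
```

At a load comparable to the empirical MOI, moving c from 25 to 30 changes the distribution only negligibly, whereas at very high loads the cap binds and the distribution shifts substantially, consistent with the argument above.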

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI and therefore depends on the information contained in MOI. FOI reflects the risk of infection, is associated with the risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. MOI may be as informative as FOI when one regresses the risk of clinical episodes, or local variation in malaria burden, on MOI. But MOI is, by definition, a count rather than a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because the FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This would, however, require either cohort studies or more densely sampled cross-sectional surveys, because infection duration is heterogeneous across many factors. Such studies have not been, and will not be, widely available across geographical locations and time. This work aims to use more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios that bound reality, well summarized by the reviewer. First, if drug treatment has no impact on measurement, the MOI of the drug-treated 1-5-year-olds reflects their true underlying MOI, and we can use it directly for FOI inference. Second, if drug treatment has a significant impact on measurement, i.e., if it completely changes the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we need either to exclude those individuals’ MOI or to impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. Had those 1-5-year-olds not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add information and can potentially deflate the variability of MOI distributions, compared with simply excluding the drug-treated 1-5-year-olds from the analysis. We will therefore include in our revision FOI estimates obtained with the drug-treated 1-5-year-olds excluded.
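      A minimal sketch of the two options discussed here, assuming the imputation draws with replacement from the non-treated children's MOI estimates: either impute each drug-treated child's MOI by resampling, or exclude the drug-treated children. The function names and MOI values are hypothetical and only illustrate the mechanics.

```python
import random

def impute_treated(untreated_moi, n_treated, rng):
    """Replace each drug-treated child's MOI with a draw (with replacement) from untreated children."""
    imputed = [rng.choice(untreated_moi) for _ in range(n_treated)]
    return untreated_moi + imputed

def exclude_treated(untreated_moi):
    """Alternative: drop drug-treated children and keep untreated children only."""
    return list(untreated_moi)

rng = random.Random(1)
untreated = [1, 2, 2, 3, 4, 4, 5, 6, 8, 12]  # hypothetical MOI estimates
with_imputation = impute_treated(untreated, n_treated=5, rng=rng)
excluded = exclude_treated(untreated)
print(len(with_imputation), len(excluded))
```

Every imputed value is a copy of an existing observation, so imputation enlarges the sample without adding new information; downstream uncertainty estimates can therefore look tighter than under exclusion, which is the reviewer's concern.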

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the MOI estimates of microscopy-positive 1-5-year-olds, effectively assuming that both groups have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or the number of var genes detected, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity between PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153 ) and I feel it would be appropriate to differentiate this in the discussion. 

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that, for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others it is generated by purely descriptive ones. All these models make their own assumptions, which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission used in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity that are not represented in simpler models.

      It is not clear to us in what sense this ABM makes a set of biological and structural assumptions that are “probably similar” to those of the queuing methods we present. We agree that relying on models whose structural assumptions differ from those of the method or model being tested is the best approach. Our proposed methods for FOI inference based on queuing theory rely on the distribution of infection duration and the distribution of MOI among sampled individuals, both of which can be obtained directly as outputs of the ABM. But these methods are agnostic to the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what the “true” FOI value would be against which to validate our methods. Empirical MOI-FOI pairs, with FOI measured directly in cohort studies, are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot fully differentiate hyper-diverse antigenic strains (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of an estimate free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled or repeated cross-sectional surveys for FOI inference. In this case, FOI is not measured directly, so there is no independent value against which to benchmark the fitted FOI. These models are instead typically evaluated on how well they capture other epidemiological quantities that are more easily sampled or measured, such as prevalence or incidence. This is similar to what is done in this work: we selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another, independently measured quantity, the EIR (Entomological Inoculation Rate), which is typically used in the field as a measure of transmission intensity. We checked whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and with the relationship between these two quantities reported in previous studies. We acknowledge that, as for model-fitting approaches to FOI inference, our validation with the field data is also indirect.
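      The selection step mentioned above (choosing the FOI that maximizes the likelihood of the observed MOI distribution) can be sketched as a grid search. This is a simplified stand-in, not the manuscript's two-moment approximation: it assumes Poisson arrivals and a capacity c, so that MOI follows a Poisson distribution with mean ρ = FOI × E[D] truncated to 1..c (conditioning on MOI ≥ 1 because sampled individuals are infected). The mean duration, grid, and MOI values are hypothetical.

```python
import math

def log_likelihood(moi_values, rho, c):
    """Log-likelihood of MOI data under Poisson(rho) truncated to the support 1..c."""
    log_terms = {k: k * math.log(rho) - math.lgamma(k + 1) for k in range(1, c + 1)}
    m = max(log_terms.values())
    log_norm = m + math.log(sum(math.exp(t - m) for t in log_terms.values()))
    return sum(log_terms[n] - log_norm for n in moi_values)

def foi_mle_grid(moi_values, mean_duration, c, grid):
    """Return the FOI on `grid` (infections per host per day) with the highest likelihood."""
    best_foi, best_ll = None, -math.inf
    for foi in grid:
        rho = foi * mean_duration  # mean number of concurrent infections (Little's law)
        ll = log_likelihood(moi_values, rho, c)
        if ll > best_ll:
            best_foi, best_ll = foi, ll
    return best_foi

moi = [2, 3, 4, 4, 5, 6]              # hypothetical MOI estimates, mean 4
grid = [i / 1000 for i in range(1, 201)]  # 0.001 to 0.2 infections per day
foi_hat = foi_mle_grid(moi, mean_duration=100.0, c=30, grid=grid)
print(foi_hat)
```

A grid search is used here because it mirrors the idea of evaluating the likelihood over a grid of candidate FOI values; with a one-parameter model, a numerical optimizer would work equally well.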

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios that bound reality, instead of simply assuming that drug treatment does not affect infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ, but not substantially, across the two extreme scenarios, partly because the MOI distribution of drug-treated individuals is similar to that of non-treated individuals (that is, the apparent lack of an effect of drug treatment on MOI, as pointed out by the referee). We will consider adding a formal test to quantify the difference between the two MOI distributions and its significance. Given the result of this test, we will discuss which of the two extreme scenarios reality is closer to. We will also discuss in our revision possible reasons/hypotheses for the impact of drug treatment on MOI from the perspective of the nature, efficacy, and duration of action of the drugs administered.

      Regarding the reviewer’s last point, on understanding the relationship between MOI and FOI, we are not entirely sure what was meant. We are also unsure about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions or between the moments of their distributions, perhaps by fitting models such as simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It would be equally difficult to evaluate the performance of this alternative approach, given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to narrower or more concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.
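      The two qualitative statements above can be made explicit under strong simplifying assumptions that are not those of the manuscript: each host acquires infections as a Poisson process with a host-specific rate Λ that is constant over the relevant time window, infection durations D are independent with mean E[D], and the capacity c is never reached (an M/G/∞ queue). Then MOI given Λ is Poisson with mean ΛE[D], and the law of total variance gives:

```latex
\begin{aligned}
\mathbb{E}[\mathrm{MOI}] &= \mathbb{E}[\Lambda]\,\mathbb{E}[D],\\
\operatorname{Var}[\mathrm{MOI}] &= \mathbb{E}[\Lambda]\,\mathbb{E}[D] + \operatorname{Var}[\Lambda]\,\mathbb{E}[D]^{2}.
\end{aligned}
```

The first line is Little's law; for example, a hypothetical mean FOI of 0.04 infections per day and a mean duration of 100 days give a mean MOI of 4. The second line shows that, for a fixed mean FOI, more variable FOI across hosts produces a more spread-out MOI distribution.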

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment, as it is fundamental that there be no confusion about the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models of the population dynamics of infectious diseases. (For diseases simpler than malaria, with no superinfection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of new blood-stage infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree, however, with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the absent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The varcoding method and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined above and in both the abstract and introduction, consistent with the epidemiological literature. We will clarify this point in the introduction and discussion of the revision.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1)   Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2)   Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3)   Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4)   Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7, 11810 (2017).

      (5)   Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have adequately responded to all comments.

      We thank Reviewer 1 for their positive assessment of our previous round of revisions.

      Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weakness:

      The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      We thank reviewer 2 for their comments on our previous round of revisions. The statement here that “it remains somewhat dubious to use the exact estimated values as inputs to other models” suggests that we may not have been sufficiently clear on how infection duration is represented in our agent-based model (ABM) of malaria population dynamics. Because our analysis uses simulated outputs from the ABM to validate the performance of the two queuing-theory methods, we believe this point warrants clarification, which we provide below.

      When simulating with the ABM, we do not use empirical estimates of infection duration in immunologically naïve individuals from the historical clinical data as direct inputs. Instead, infection duration emerges from the within-host dynamics modeled in the ABM (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision). Briefly, each Plasmodium falciparum parasite carries approximately 50-60 var genes, each encoding a distinct variant surface antigen expressed during the blood stage of infection. Empirical evidence[1,2] indicates that these var genes are expressed largely sequentially. If a host has previously encountered the antigenic product of a given var gene and retains immunity to it, subject to waning at empirically estimated rates[3,4], the corresponding parasite subpopulation is rapidly cleared. Conversely, if the host is naïve to that gene, it takes approximately seven days for the immune system to mount an effective antibody response, resulting in a rapid decline or elimination of the expressed variant[5]. This seven-day timescale aligns with the duration of each successive parasitemia peak observed in Plasmodium falciparum infections[6,7], each arising primarily from the expression of a single var gene and occasionally from a small number of var genes.

      In our previous analyses, we therefore modeled an average expression duration of seven days per gene in naïve hosts. Specifically, the switching time to the next gene was drawn from an exponential distribution with a mean of seven days. Each var gene is represented as a linear combination of two epitopes (alleles), based on the empirical characterization of two hypervariable regions in the var tag region[8], and immunity is acquired against these alleles. Immunity to one allele of a given gene reduces its average expression duration by approximately half, whereas immunity to both alleles results in an immediate switch to another var gene within the infection. Consequently, the total duration of infection is proportional to the number of alleles, across all var genes expressed during that infection, that the host has not previously encountered (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision).
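      The within-host rules above can be sketched as follows. This is an illustrative reimplementation, not the ABM's code: it assumes a repertoire of 55 var genes (within the stated range of 50-60), treats "approximately half" as exactly half, and represents the host's immunity per gene as the number of that gene's two alleles already seen (0, 1, or 2). The function and constant names are hypothetical.

```python
import random

NAIVE_MEAN_DAYS = 7.0  # mean expression time per var gene in a naive host, as stated in the text

def infection_duration(alleles_seen, rng, naive_mean=NAIVE_MEAN_DAYS):
    """Total duration (days) of one infection under sequential var-gene expression.

    alleles_seen: one entry per var gene in the parasite's repertoire, giving how many
    of that gene's two alleles the host is already immune to (0, 1 or 2).
    """
    total = 0.0
    for seen in alleles_seen:
        if seen == 2:
            continue  # immunity to both alleles: immediate switch to the next gene
        # No allele seen: exponential with mean naive_mean; one allele seen: half the mean.
        mean = naive_mean if seen == 0 else naive_mean / 2
        total += rng.expovariate(1.0 / mean)
    return total

def expected_duration(alleles_seen, naive_mean=NAIVE_MEAN_DAYS):
    """Expected duration: naive_mean / 2 days for every allele the host has not yet seen."""
    return (naive_mean / 2) * sum(2 - seen for seen in alleles_seen)

rng = random.Random(0)
naive_host = [0] * 55  # hypothetical 55-gene repertoire, host naive to every allele
sims = [infection_duration(naive_host, rng) for _ in range(2000)]
print(sum(sims) / len(sims), expected_duration(naive_host))
```

The helper `expected_duration` makes the proportionality stated in the text explicit: each unseen allele contributes, on average, half of the naïve per-gene expression time.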

      Prompted by the reviewer’s comments, in this revision we additionally tested mean expression durations of 7.5 and 8 days per var gene, applied in combination with an extension of the within-host rules (see the next paragraph for motivation and details). Although differences among the three mean expression durations are modest at the per-gene level, when aggregated across all var genes expressed within an individual parasite, the resulting total infection durations can differ by on the order of several months. The resulting distributions of infection duration across immunologically naïve individuals and those aged 1-5 years, together with those generated under our previous simulation settings, span a range of means and variances that extends both above and below, and thus encompasses, the values of the historical clinical data from naïve neurosyphilis patients treated with P. falciparum malaria. We have provided example supplementary figures illustrating that the distributions of infection duration from the simulated outputs overlap with, and closely resemble, the empirical distribution from the historical clinical data (Appendix 1-Figure 27-32).

      We considered the following modification of the within-host rules. In our previous ABM simulations, we had assumed that an infection would clear only once the parasite had exhausted its entire var gene repertoire, that is, after every var gene had been expressed and recognized. However, biological evidence indicates that clearance can occur earlier for several reasons, including stochastic extinction before full repertoire exhaustion. Even if some var genes remain unexpressed, an infection can terminate due to demographic stochasticity once parasite densities fall to very low levels. This decline in parasite densities may result from non-variant-specific immune mechanisms or from cross-immunity among var genes that share sequence similarity or alleles[9,10,11], both of which can substantially reduce parasite numbers. To model the possibility of clearance before full repertoire exhaustion, we implemented a simple scenario in which there is a small probability of clearing the current infection while a given var gene, whether non-final or final, is being expressed. This probability is a function of the host’s pre-existing immunity to the two epitopes (alleles) of that gene, thereby capturing in a parsimonious manner the effects of cross-immunity among sequence- or allele-sharing var genes in reducing parasitemia. Specifically, it is modeled as a Bernoulli draw whose success probability equals the immunity level against the gene (0 for no immunity to either epitope, 0.5 for immunity to one epitope, and 1 for immunity to both epitopes) multiplied by a constant factor of 0.025. Thus, the probability scales with pre-existing variant-specific immunity to the gene but remains small overall, while introducing additional variance into the emergent distribution of total infection duration across hosts.
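      A minimal sketch of the early-clearance rule, written as an illustration rather than the ABM's code. The per-gene expression rules (exponential with mean 7 days for a naïve gene, half that with one allele seen, immediate switch with both) and the factor 0.025 come from the text; two details are assumptions labeled in the comments: one clearance draw per expressed gene, and a cleared infection ending after that gene's expression window. The 55-gene repertoire and function names are hypothetical.

```python
import random

NAIVE_MEAN_DAYS = 7.0     # per-gene mean expression time in a naive host (from the text)
CLEARANCE_FACTOR = 0.025  # constant multiplying the immunity level (from the text)

def infection_with_early_clearance(alleles_seen, rng, naive_mean=NAIVE_MEAN_DAYS,
                                   factor=CLEARANCE_FACTOR):
    """Simulate one infection; return (duration_days, number_of_genes_expressed).

    alleles_seen: per var gene, how many of its two alleles the host already knows (0, 1, 2).
    Immunity level for a gene is alleles_seen / 2, i.e. 0, 0.5 or 1.
    """
    total = 0.0
    expressed = 0
    for seen in alleles_seen:
        expressed += 1
        if seen < 2:  # both alleles known -> immediate switch, no expression time
            mean = naive_mean if seen == 0 else naive_mean / 2
            total += rng.expovariate(1.0 / mean)
        # Assumption: one Bernoulli clearance draw per expressed gene, and a cleared
        # infection ends after that gene's expression window.
        if rng.random() < (seen / 2) * factor:
            break
    return total, expressed

rng = random.Random(3)
partly_immune = [1] * 55  # hypothetical host immune to one allele of every gene
runs = [infection_with_early_clearance(partly_immune, rng)[1] for _ in range(2000)]
print(sum(r < 55 for r in runs), "of 2000 infections cleared before exhausting the repertoire")
```

Because the clearance probability is zero for genes the host has never seen, fully naïve hosts always express the whole repertoire in this sketch, while hosts with partial immunity occasionally clear early, which adds variance to the total duration.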

      We acknowledge that the ABM used to simulate malaria population dynamics cannot capture all mechanisms and complexities underlying within-host processes, many of which remain poorly understood. However, we emphasize that the resulting distributions of infection duration generated by the ABM span a broad range of means, variances, and shapes, including distributions that closely match those observed in the clinical historical data. Because the queueing-theory methods rely on only the mean and variance of infection duration to estimate the force of infection (FOI), these scenarios, which collectively span and encompass values comparable to the empirical ones, provide an appropriate basis for evaluating the performance of the methods using simulated outputs. We have added supplementary figures (see Appendix 1-Figure 16-22) illustrating the corresponding FOI inference results when we allow for clearance before the complete expression of the var repertoire, and the accuracy of FOI estimation remains comparable across all the scenarios examined.

      Finally, we emphasize that the application of the queuing-theory methods to the simulated outputs and to the Ghana field survey data involves two self-contained steps. For the simulations, FOI is inferred directly from the emergent distributions of infection duration generated by the ABM. For the Ghana surveys, FOI is inferred using the historical clinical data, which remain one of the few credible and widely used empirical sources for infection duration in immunologically naïve individuals[6]. By exploring different mean expression durations and within-host rules in the ABM, which generate distributions of infection duration that span and encompass those comparable to the empirical distribution, we demonstrate that the queueing-theory methods perform comparably across diverse scenarios and are well suited for application to the Ghana field surveys.

      We expanded the section on within-host dynamics in Appendix 1 to elaborate on this point (Lines 817-854).

      Reviewer #3 (Public review):

      I think the authors gave a robust but thorough response to our reviews and made some important changes to the manuscript which certainly clarify things for me.

      We thank Reviewer 3 for their positive feedback on our previous round of revisions.

      References

      (1) Zhang, X. & Deitsch, K. W. The mystery of persistent, asymptomatic Plasmodium falciparum infections. Curr. Opin. Microbiol. 70, 102231 (2022).

      (2) Deitsch, K. W. & Dzikowski, R. Variant gene expression and antigenic variation by malaria parasites. Annu. Rev. Microbiol. 71, 625–641 (2017).

      (3) Collins, W. E., Skinner, J. C. & Jeffery, G. M. Studies on the persistence of malarial antibody response. American Journal of Epidemiology, 87(3), 592–598 (1968).

      (4) Collins, W. E., Jeffery, G. M. & Skinner, J. C. Fluorescent Antibody Studies in Human Malaria. II. Development and Persistence of Antibodies to Plasmodium falciparum. The American Journal of Tropical Medicine and Hygiene, 13, 256–260 (1964).

      (5) Gatton, M. L., & Cheng, Q. Investigating antigenic variation and other parasite-host interactions in Plasmodium falciparum infections in naïve hosts. Parasitology, 128(Pt 4), 367–376 (2004).

      (6) Maire, N., Smith, T., Ross, A., Owusu-Agyei, S., Dietz, K., & Molineaux, L. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. The American journal of tropical medicine and hygiene, 75(2 Suppl), 19–31 (2006).

      (7) Chen D. S., Barry A. E., Leliwa-Sytek A., Smith T-A., Peterson I., Brown S. M., et al. A Molecular Epidemiological Study of var Gene Diversity to Characterize the Reservoir of Plasmodium falciparum in Humans in Africa. PLoS ONE 6(2): e16629 (2011).

      (8) Larremore D. B., Clauset A., & Buckee C. O. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes. PLoS Comput Biol 9(10): e1003268 (2013).

      (9) Holding T. & Recker M. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum. J. R. Soc. Interface.1220150848 (2015).

      (10) Crompton, P. D., Moebius, J., Portugal, S., Waisberg, M., Hart, G., Garver, L. S., Miller, L. H., Barillas-Mury, C., & Pierce, S. K. Malaria immunity in man and mosquito: insights into unsolved mysteries of a deadly infectious disease. Annual review of immunology, 32, 157–187 (2014).

      (11) Langhorne, J., Ndungu, F., Sponaas, AM. et al. Immunity to malaria: more questions than answers. Nat Immunol 9, 725–732 (2008).

    1. Author response:

      We thank the three reviewers for their critical and in-depth assessment of our study. Below you find our comments to the public reviews and our revision plans.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript adds to the recent, exciting developments in our understanding of the MmpL/S transporters from mycobacteria. This work provides solid support for the trimeric/hexameric arrangement of subunits in the complex, and reveals a possible pathway for substrate translocation. Overall, I think this manuscript is a solid body of work that adds to several recent studies from this team and others on the structure and mechanism of the MmpL/S transporter family, particularly MmpL4/S4. The combination of AF, disulfide engineering, and experimental structure is good, though it is a bit puzzling that the experimental structure based on disulfide stabilization of the AF prediction does not recapitulate key elements (MmpS periplasmic domain docking to MmpL, and altered CCD configuration).

      I have no major concerns about this manuscript.

      We thank reviewer#1 for this positive assessment of our work. In our view, the deviation of the AF prediction from the experimental structure is not puzzling. AF does not take the physical properties of proteins into account, but predicts structures based on strong sequence alignments. It therefore has no “knowledge” of the general flexibility of domains such as the CCD, which is also observed in the corresponding MmpL5 structures, nor of preferred conformational states. Rather than “failing” to confirm the AF predictions, our cryo-EM structure revealed an unexpected tilted conformation of the CCD. As we outline in comments below, the physiological relevance of the tilted CCD is unclear. Its flexibility might be required to interact with (still elusive) outer membrane protein components to form the fully assembled efflux machinery.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes the structure of the Mycobacterium tuberculosis (MmpS4)3-(MmpL4)3 hetero-hexameric transporter complex. The structure was obtained by cryogenic electron microscopy using an engineered construct that cross-links MmpS4 to MmpL4 via a disulfide bond. The position of the disulfide bond was determined using an Alphafold2 model of the hetero-hexamer. Although Alphafold2 predicts a symmetric hetero-hexamer, the authors found that the structure of the coiled-coil domain (CCD) is asymmetric, tilted at about 60° relative to the membrane domains, and only contains two of the three alpha helical hairpins, with the third being disordered.

      Strengths:

      The strategy of using Alphafold2 models to guide construct design for experimental structure determination is state-of-the-art, and this work provides a great example of its applications and limitations. I.e., the experimental structure does not fully recapitulate the prediction but provides unexpected results.

      The comparisons between the authors' structures and the previously published structures of the MmpL4 monomer and MmpL5 trimers strengthen the authors' findings.

      We thank reviewer#2 for this positive assessment of our work and agree that it is interesting that the experimental structures do not fully agree with the AF predictions (see also comment to reviewer#1).

      Weaknesses:

      A more detailed description of the current mechanistic hypothesis would strengthen the manuscript. The authors state that the two periplasmic domains "are expected to undergo rigid body movements that allow substrate transport through these periplasmic domains similar to the conformational changes observed in the E. coli multidrug efflux pump AcrB". A schematic of the proposed transport cycle, as a supplemental figure that shows the current hypothesis regarding transport, would be beneficial for understanding the previous structures and putting the current structure in context. Outside of "the mechanistic basis of how these conformational changes are coupled to protonation of the DY-pairs", what are the major controversies/open questions regarding the mechanism?

      We thank the reviewer for this valuable comment. We will add a new figure with the model of the MmpL4 transport cycle based on our new data and discuss the proposed molecular transport mechanism in more detail in the main text.

      The authors provide evidence that the cysteine-depleted S4L4 construct is functional, but do not show that the construct with the introduced disulfide bond #5 (D39C MmpS4 and S434C MmpL4) is also functional. Demonstrating this would allow the authors to better interpret their resulting structures.

      In the revised version, we will include additional data to assess the functional consequences of cross-linking.

      The analysis presented in Figure 5 and Supplementary Figure 7 seems to suggest that the authors are proposing that the CCD central cavity acts as a transport pathway for the transported substrate, but I am not sure that this hypothesis is explicitly stated. This makes the reasoning behind the analysis presented unclear. Clarity could be improved by stating that the hypothesis of direct transport of substrate through the CCD central channel is being examined using the structure prediction, and what the implications are for the structure solved with the incompletely formed CCD.

      We state clearly in the discussion that the channel through the CCD seems too narrow to let large molecules like mycobactin and bedaquiline pass:

      Line 318ff: “The channel radius of the MmpL4 CCD is very narrow, with a minimum of 1.1 Å according to the AlphaFold3 prediction (Fig. 5). This is much smaller than the smallest axis of the mycobactin molecule (?? nm, as determined from a molecular model of iron-free mycobactin). In addition, the cryo-EM structure of MSMEG_1382 revealed a constriction in the CCD channel [21]. Even though the methionine side chains lining the channel wall are considered to be flexible (Aledo, 2019), large conformational changes of the α-helical hairpins relative to each other would be required to allow passage of molecules as large as mycobactin and bedaquiline. The AcrAB-TolC efflux machinery provides an example of such large conformational changes enabling transport of large molecules, through iris-like opening and closing movements of the outer membrane channel-tunnel TolC [33]. Similar helical twisting may widen the channel of the CCD. Alternatively, it is conceivable that the substrates of MmpL4/MmpL5 are transported along the CCD surface, potentially requiring further protein partners. It is interesting to note that siderophore secretion and drug efflux by MmpL4/MmpL5 systems involve at least two additional proteins, namely the periplasmic protein Rv0455, which was shown to be essential for mycobactin efflux [34], and an outer membrane channel, whose identity remains elusive. A complete molecular understanding of the transport mechanism through the MmpL4/MmpL5 systems hence requires the identification of the missing components and structural information about their interactions.”

      The channel radius of the MmpL4 CCD is very narrow (minimum of 1.1 Å) according to the AlphaFold3 prediction (Fig. 5), and the cryo-EM structure of MSMEG_1382 revealed a further constriction in the CCD channel [21]. We therefore consider direct substrate transport through the CCD central channel to be physically implausible for molecules of the size of mycobactin and bedaquiline. Even accounting for the flexibility of the methionine side chains lining the channel wall, large conformational changes of the α-helical hairpins relative to each other would be required to accommodate such large substrates. While iris-like opening movements have been described for TolC in the AcrAB-TolC system [33], those movements widen an already substantially larger channel, and even such dramatic conformational changes would be insufficient to open a channel as narrow as that of the MmpL4 CCD to a diameter permissive for substrate passage. We instead favor a model in which substrates are transported along the outer surface of the CCD, potentially with the assistance of additional protein partners. This is consistent with the observation that MmpL4/MmpL5-mediated siderophore secretion and drug efflux involve at least two further proteins: the periplasmic protein Rv0455, shown to be essential for mycobactin efflux [34], and an as-yet-unidentified outer membrane channel. In this context, the overall flexibility of the CCD, illustrated here by the tilted, incompletely formed conformation, may reflect the conformational dynamics required for interaction with these partner proteins, rather than being directly involved in forming a transport conduit. A complete mechanistic understanding will require identification of the missing components and structural characterization of the fully assembled efflux machinery.

      We do not think that the incompletely formed CCD represents a conformation that is relevant for transport. It does, however, demonstrate the overall flexibility of the CCD, which may be required to further open the channel if the substrates are transported within the CCD tube. Further in-depth experiments, which are beyond the scope of this paper, will be needed to clarify this interesting question.

      Given that the results emphasize the flexibility of the CCD, the manuscript would be strengthened by 3D variability analysis either in cryoSPARC or using cryoDRGN (or both). This would allow the authors to better quantify the degree of motion in the CCD and how it may correlate to flexibility in other regions. Further 3D flex reconstruction in cryoSPARC may improve the map quality of the CCD.

      This is a great suggestion. We will include a 3D variability analysis in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Earp et al. reports cryo-EM structures of the hexameric (MmpS4)3-(MmpL4)3 complex from Mycobacterium tuberculosis, which belongs to the RND family of transporters and is known to have a role in the export of siderophores and contribute to drug resistance. The experimental workflow showcased involves the design of disulfide pairs using distance constraints obtained from the AlphaFold predicted structure of the hexameric complex. One such disulfide pair was used to determine the ~3.0 Å structures. The structure reveals density for the previously unresolved coiled-coil domain (CCD), a tilted CCD arrangement, and a cavity within the periplasmic domain, which the authors assert is occupied by detergent. Comparison of this complex with the monomer structure of MmpL4 shows conformational variations interpreted to implicate different domains and conserved residues involved in proton coupling, which might be related to the transport mechanism. While the methodological aspects of the manuscript are solid, enthusiasm for the overall advance/significance is less so, with doubts about the relevance of the tilted CCD structure, considering disulfide trapping and an incomplete validation of the claim that the tilted CCD represents a stable intermediate conformation. A clear, updated transport mechanism is largely missing from the manuscript.

      We thank reviewer#3 for these useful comments, which we will address during the revision of the manuscript. In particular, we plan to include a scheme of an updated transport model.

      Strengths:

      Beautiful structures, AF prediction-experimental validation nexus that could be fine-tuned for different systems/difficult to target complexes.

      Weaknesses:

      Physiological relevance of the tilted CCD conformation. No clear mechanistic model for the transport. While the CCD may indeed be a stable intermediate, the fact that the rest of the trimeric arrangement is unaffected does not fully rule out disulfide trapping as a factor in promoting this. The findings would be strengthened if the same tilted conformation is seen using a different set of disulfides. The significance of the detergent molecule and the new cavity observed could also be better discussed in terms of an updated transport model.

      We believe that there was a misunderstanding about our interpretation of the tilted CCD. As a matter of fact, it must be a stable intermediate; otherwise no density would have been observed for it in the cryo-EM maps. Despite being a stable intermediate, it is indeed unlikely that it represents a conformational state that is relevant or required for transport. Firstly, only the upright, complete CCD can bridge the periplasm. Secondly, the structure was determined in detergent and lacks additional protein binding partners, which might stabilize the upright conformation of the CCD. It is also conceivable, as the reviewer pointed out, that disulfide cross-linking may have caused the tilt. However, as we wrote in the manuscript, we do not think that cross-linking caused this striking asymmetry of the CCD, because the three MmpL4 and MmpS4 chains are essentially symmetrical in the C1-processed data (see also Figure 2E):

      Line 182 ff: “To assess whether there are asymmetries in other parts of the structure, we superimposed the individual protomers of the (MmpS4)3-(MmpL4)3 complex analyzed using C1 symmetry (Fig. 2E). Apart from the two resolved α-helical hairpins, the MmpL4 core domains and the resolved parts of MmpS4 differ by an RMSD of less than 0.6 Å and are therefore structurally identical considering the map resolution of around 3 Å. The fact that the core domains of MmpS4 and MmpL4 do not deviate between the protomers argues against the possibility that the cross-links established between them cause the (asymmetric) tilt of the CCD.”

      Regarding the DDM binding site, we will indeed include an updated transport model. That said, we wish to be cautious, because we lack experimental proof that MmpL4 can in fact transport DDM.

    1. Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits are regulating behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting and in fact more fundamental than showing if it is serotonin that does it or not.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how replication was accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      Comments on the latest version:

      The changes to the manuscript sufficiently addressed my few comments. I do not have anything else substantial to add to my review and I am comfortable with my initial assessment.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      We have added further discussion of this to the discussion section.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We have added a figure (Fig. 1F) to better visualize the changes in handedness over days. We have also pointed out the connection between the power spectrum and the autoregressive model given by the Wiener-Khinchin theorem, which states that the power spectrum of a wide-sense stationary process is the Fourier transform of its autocorrelation function.
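      To illustrate the Wiener-Khinchin link between an autoregressive description and a power spectrum, here is a minimal Python sketch; it is not the authors' analysis code. It assumes a zero-mean AR(1) model of day-to-day turn bias, x_t = phi * x_{t-1} + eps_t, with hypothetical values for the lag-1 coefficient phi and the innovation variance sigma2. Fourier-transforming the AR(1) autocovariance reproduces the closed-form AR(1) spectrum, so under this assumption the autoregressive coefficient directly sets the spectral shape.

```python
import numpy as np


def ar1_autocovariance(phi: float, sigma2: float, max_lag: int) -> tuple[np.ndarray, np.ndarray]:
    """Autocovariance gamma(k) = sigma2 * phi**|k| / (1 - phi**2) of a stationary
    AR(1) process x_t = phi * x_{t-1} + eps_t, for lags k = -max_lag..max_lag."""
    lags = np.arange(-max_lag, max_lag + 1)
    gamma = sigma2 * phi ** np.abs(lags) / (1.0 - phi ** 2)
    return gamma, lags


def spectrum_from_autocovariance(gamma: np.ndarray, lags: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Wiener-Khinchin: S(f) = sum_k gamma(k) * exp(-i 2 pi f k).
    Frequencies are in cycles per sample (e.g. per day); lags are truncated at max_lag."""
    phases = np.exp(-2j * np.pi * np.outer(freqs, lags))
    return (phases @ gamma).real  # imaginary parts cancel because gamma is symmetric in k


def ar1_spectrum_closed_form(phi: float, sigma2: float, freqs: np.ndarray) -> np.ndarray:
    """Closed-form AR(1) spectrum: S(f) = sigma2 / (1 - 2 phi cos(2 pi f) + phi**2)."""
    return sigma2 / (1.0 - 2.0 * phi * np.cos(2.0 * np.pi * freqs) + phi ** 2)


if __name__ == "__main__":
    phi, sigma2 = 0.8, 1.0  # hypothetical illustration values, not fitted to fly data
    freqs = np.linspace(0.0, 0.5, 6)
    gamma, lags = ar1_autocovariance(phi, sigma2, max_lag=200)
    s_wk = spectrum_from_autocovariance(gamma, lags, freqs)
    s_cf = ar1_spectrum_closed_form(phi, sigma2, freqs)
    print(np.allclose(s_wk, s_cf))  # True
```

      Raising phi toward 1 concentrates power near f = 0, corresponding to slow, persistent changes in turn bias, whereas phi = 0 gives a flat, white-noise spectrum equal to sigma2.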

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      We have discussed this further in the discussion.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We have adjusted our wording and contextualized our claims based on previous literature.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how replication was accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      We have reanalyzed the behavioral data in a hierarchical model to account for batch effects. Accounting for batch effects (Fig 1G, S1G), we still observe differences between genotypes and for pharmaceutical manipulations of serotonin, though our data provide more equivocal evidence for the effects of trh^n on drift.

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We have added text indicating that these two behavioral responses have previously been shown to be correlated to each other and that the spectral power analysis and autoregressive model are conceptually linked.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We have added a table in the supplement clarifying all of the modeling parameters for each figure.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Highlights of the Consultation Session of 3 Reviewers

      In the consultation session, the reviewers identified the relative contribution of genotype and variable environment as particularly important. Further analyses of the replicates of the genotypes were suggested to exclude the environment as the source of difference in the extent of drift between genotypes. If the difference in the extent of drift between replicates is greater than the difference in the extent of drift between genotypes, then one cannot really say that there is a genetic control over drift and that it would evolve (which is still an interesting result, but would be less exciting for a follow-up evolution experiment). If replicates differ, testing whether the relative difference in the extent of drift between genotypes is maintained across environments would also be strong evidence that the extent of behavioral drift is a property of a genotype and not a mere result of a fluctuating/variable environment. The authors already present two behavioral paradigms that can serve to compare the relative extent of drift between genotypes across paradigms. The authors might consider whether experimental data could be brought closer to theory by including an experiment in a variable environment (e.g., temperature or diet changes).

      Reviewers also agreed in the consultation session that methods and definitions were somewhat cryptic, and it would be very helpful if they were more detailed. For example, linking the free-walking analysis to the Y-maze, and then the first model to the second, was not straightforward.

      We have added text to make more explicit the theoretical connection between the free-walking analysis, the Y-maze analysis, and the model. We have added text and a supplemental table to clarify the methods.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 161: The authors state in the supplement that they used DGRP strains, which are inbred and not isogenic. According to the original authors, they possess 99.3% genetic identity. The isoD1 strain has no known crossing scheme, so complete chromosome isogeneity remains questionable, especially after 12 or more years since its creation. The authors should refer to the strains as "near-isogenic" or a similar term.

      We have adjusted the language as suggested to be more accurate.

      (2) Lines 276, 338: The manuscript contains some unfinished sentences or remnants from the drafting process (e.g., "REFREF"). A thorough editorial review is recommended to eliminate such errors.

      We have cleaned up all references and made additional passes to adjust text.

      Reviewer #2 (Recommendations for the authors):

      (1) If the authors want to claim that serotonin is a regulator of drift, they should provide a negative control experiment, using equivalent perturbations of another neuromodulator and non-modulator. Alternatively, they could simply soften the claims revolving around serotonin and its putative direct role in modulating drift.

      We have softened the claims as suggested to avoid claiming our results show a specific role for serotonin.

      (2) I would suggest always using "behavioral drift" when referring to drift, especially in the context of modeling, because it can easily be confused with genetic drift.

      We have adjusted the language throughout the manuscript per this suggestion.

      (3) It would be good to see in the methods whether the 2-hour assays were always done at the same time of the fly's subjective day, and at what time (e.g., how many hours after lights-on).

      We have clarified this.

      (4) I understand that many experiments use methodology replicated from the group's previous work, but I would recommend elaborating the experimental methods section in the supplementary such that the reader can understand and reproduce the methods without having to sift through and look for them in previous papers.

      We have expanded on our discussion of the methodology in the methods section.

      Reviewer #3 (Recommendations for the authors):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that flies exhibit an individual turning bias (when averaged over time), yet their preferences fluctuate over slow timescales. However, it's unclear why the authors chose to switch to a different assay to compare strains. In particular, it's ambiguous whether the behavioral measure in one setup is comparable to that in the other; specifically, whether a bias in one setup reflects the same type of bias in the other. The behavior is also sampled differently across setups (though the details are unclear; see comments below) and analyzed using different methods. Consequently, it remains uncertain whether the slow fluctuations observed in the arena setup are also present in the Y maze. It appears that the analysis of the Y maze data only addresses individual behavioral variance or, at most, day-to-day changes, without accounting for longer-term correlations in bias, which I understood to be the primary interest in the arena setup. Some clarification is needed here (see specific comments below).

      In Figure 2, the authors attempt to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous. This approach is well-conceived, and the findings are convincing, though the model would benefit from further clarification and additional explanation in the text.

      Here are some more specific comments:

      PART 1:

      (1) L 223: One probably cannot see a circadian peak at 24 h if the data were low-pass filtered at 24 h. Did the authors look with another low-pass cutoff?

      We clarified in the text that the power spectrum analysis was performed on unfiltered data.

      (2) L 243: The spread in standard deviation is said to be consistent with drifting bias; however, I do not agree with this. The variation could be stochastic but independent across days, and show no temporal correlation. As done with the circular arena, drift should be estimated as a temporal correlation in the behavior.

      It is consistent insofar as seeing a non-zero standard deviation is a necessary condition for drift. While it does not show that there is any consistency over time, this can be inferred from the autoregressive model (as well as previous work). We have added text to make this clearer.
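      The distinction between spread and drift can be illustrated with a short simulation: day-to-day values drawn independently have a non-zero standard deviation but no lag-1 autocorrelation, whereas an AR(1) series with coefficient ϕ has autocorrelation close to ϕ. The sketch below assumes a zero-mean AR(1) with Gaussian noise; the helper names and parameter values are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def simulate_ar1(n_days, phi, sigma, rng):
    """Zero-mean AR(1): b_t = phi * b_{t-1} + eps_t, eps_t ~ N(0, sigma^2), |phi| < 1."""
    b = np.empty(n_days)
    b[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # start at the stationary spread
    eps = rng.normal(0.0, sigma, size=n_days)
    for t in range(1, n_days):
        b[t] = phi * b[t - 1] + eps[t]
    return b

rng = np.random.default_rng(0)
independent = rng.normal(0.0, 1.0, size=100_000)              # non-zero SD, no memory
drifting = simulate_ar1(100_000, phi=0.7, sigma=1.0, rng=rng)  # non-zero SD and memory
print(independent.std(), lag1_autocorr(independent))  # expected near 1 and near 0
print(drifting.std(), lag1_autocorr(drifting))         # expected near 1.4 and near 0.7
```

      Both series have a comparable, non-zero spread, so the standard deviation alone cannot tell them apart; only the temporal correlation (here the fitted ϕ) does.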

      (3) In the autoregressive model this temporal aspect seems to be incorporated only to the first order (from day to day). Therefore, from what I understand, the drift term is not correlated over time. This seems very different from the spectral analysis done in the circular assay, and I wonder if it fits at all the initial definition of drift. For example, is the model compatible with a fixed mean and a similar power spectrum as in Figure 1C? The text should clarify that.

      In an AR(1) process, datapoints are correlated from day to day as long as ϕ ≠ 0. This can be made clear in the case of σ = 0 and ϕ = 1, where values would be perfectly correlated with each other across time. The AR(1) model and the PSD of circling can be related via the Wiener-Khinchin theorem. We have added text to make this connection clear.
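      The Wiener-Khinchin link can be checked numerically. For a stationary AR(1) with |ϕ| < 1 and noise standard deviation σ, the autocovariance is γ(k) = σ²ϕ^|k| / (1 − ϕ²), and its Fourier transform is the spectrum S(f) = σ² / (1 − 2ϕ cos(2πf) + ϕ²), with f in cycles per day. The sketch below sums the autocovariance and compares it with the closed form; ϕ = 0.9 and σ = 1 are illustrative values, not fitted parameters from the manuscript. The ϕ = 1 case in the text is the non-stationary limit, where S(0) diverges.

```python
import numpy as np

def ar1_autocov(phi, sigma, max_lag):
    """Autocovariance of a stationary AR(1) (|phi| < 1): gamma(k) = sigma^2 phi^|k| / (1 - phi^2)."""
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, sigma**2 * phi ** np.abs(lags) / (1.0 - phi**2)

def psd_from_autocov(lags, gamma, freqs):
    """Wiener-Khinchin: S(f) = sum_k gamma(k) exp(-2 pi i f k); real because gamma is symmetric."""
    return np.array([np.sum(gamma * np.cos(2.0 * np.pi * f * lags)) for f in freqs])

def ar1_psd(phi, sigma, freqs):
    """Closed-form AR(1) spectrum: S(f) = sigma^2 / (1 - 2 phi cos(2 pi f) + phi^2)."""
    freqs = np.asarray(freqs, dtype=float)
    return sigma**2 / (1.0 - 2.0 * phi * np.cos(2.0 * np.pi * freqs) + phi**2)

freqs = np.linspace(0.0, 0.5, 6)  # cycles per day, up to the Nyquist frequency
lags, gamma = ar1_autocov(phi=0.9, sigma=1.0, max_lag=500)
numeric = psd_from_autocov(lags, gamma, freqs)
closed = ar1_psd(0.9, 1.0, freqs)
```

      The spectrum is highest at f = 0, where it equals σ²/(1 − ϕ)², and falls off monotonically, so larger ϕ concentrates power in slow fluctuations, qualitatively like a low-frequency-dominated PSD.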

      (4) Did serotonin have no role in turning bias? My understanding of previous work was that serotonin should affect the bet-hedging variance as well. The authors should discuss what is or is not expected, especially given that the pharmacological and genetic approaches do not have the same effect on bet-hedging (Figure 1H-I).

      As the pharmacological methods were only applied after eclosion, we do not find it surprising that we do not measure differences in the initially measured distribution of handedness in that case. We do see more evidence of it in the mutations, though the trh<sup>n</sup> experiments provide a less clear effect after our adjustments to account for batch effects.

      (5) Methods: It is unclear how flies were handled across days; e.g. in Y mazes: 2h each day for how many days? In the arena flies were imaged either twice daily for 2h per session, or continuously for 24h (L138) - but which data are used where?

      We will make this clearer; all data in Figure 1 are from the continuous 24 h recordings.

      This part of the methods is not well explained and I think it should be described in more detail.

      (6) How many flies per genotype were tested in fig 1E?

      Information was added to the caption to duplicate information in the table.

      PART 2:

      (7) In Figure 2B I do not understand the formulation N(50−ϕ: 50, σ), N(ϕ−e_t: e_t, σ), or in general N(x: m, s): does this mean that the variable x has a normal distribution with mean m and variance s? Usually this would be written as N(x|m, s) or N(x; m, s).

      If so then: N(50−ϕ: 50, σ) = N(ϕ: 0, σ) which has mean=0 while the figure caption says "from a normal distribution centred on the long term environmental mean" - what is the long term environmental mean?

      If this is correct, and, therefore, we are just centering the mean, what about N(e_t−ϕ: e_t, σ)?

      e_t is the environment at time t, not the mean of the environment (which is 50). We have added more detail in the supplementary methods to address this.

      (8) Should ϕ vary between 1-100? And is the environmental parameter in Figure 2C also varying between 1-100? These ranges should be written somewhere.

      While implied in the sigma notation, we have added more detail in supplementary methods to explain the situation.

      (9) As far as I understand the bounding envelope in Figure 2B is necessary to contain the drift model. In Figure 1F, a bounding effect was generated by the "tendency to revert to no bias." It is unclear to me whether these two formulations are equivalent. Moreover, none of these two models might be able to recapitulate the correlations observed in the circular arena and analyzed spectrally in Figure 1C. It would be necessary that the author make an effort to relate these models/quantifications one to another. My understanding of Figure 1B is that there are slow fluctuations around the mean. Is the bounded drift model in 2B not returning to the same mean? And do these models generate slow fluctuations? Further explanation could help clarify these points.

      We have added text explaining the connection between the power spectrum and the two ways of establishing stationarity (ϕ and the bounding envelope).
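      The two ways of keeping drift bounded behave differently. A minimal sketch, assuming bias values on a 0-100 scale centred at 50: a hypothetical "envelope" model takes unbiased daily steps and is simply clipped at the walls, so there is no pull toward the centre until a wall is reached, whereas an AR(1)-style model pulls the bias back toward 50 every day with strength 1 − ϕ. Neither function is the manuscript's implementation; the names, clipping rule, and parameters are illustrative assumptions.

```python
import numpy as np

def clipped_random_walk(n_days, sigma, rng, start=50.0, lo=0.0, hi=100.0):
    """Hypothetical bounding-envelope drift: unbiased daily steps, clipped to [lo, hi]."""
    x = np.empty(n_days)
    x[0] = start
    steps = rng.normal(0.0, sigma, size=n_days)
    for t in range(1, n_days):
        x[t] = min(max(x[t - 1] + steps[t], lo), hi)
    return x

def mean_reverting_drift(n_days, phi, sigma, rng, mean=50.0, start=50.0):
    """AR(1)-style drift: each day the deviation from `mean` shrinks by phi, then noise is added."""
    x = np.empty(n_days)
    x[0] = start
    steps = rng.normal(0.0, sigma, size=n_days)
    for t in range(1, n_days):
        x[t] = mean + phi * (x[t - 1] - mean) + steps[t]
    return x

rng = np.random.default_rng(42)
walled = clipped_random_walk(10_000, sigma=5.0, rng=rng)
reverting = mean_reverting_drift(100_000, phi=0.9, sigma=5.0, rng=rng)
```

      Both keep the bias in a finite range, but only the AR(1) version has a fixed long-run mean and spread, σ/√(1 − ϕ²) (about 11.5 here). That difference is why each formulation needs its own link to the power spectrum.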

      (10) Expanding on the above: I thought that the definition of individuality is based on some degree of stability over days. However, both models assume drift to occur from day to day (and the analysis of the DGRP lines also assumes so). Some clarification here could help: is the initial bet-hedging variation maintained in the population? And is the mean individual bias still meaningful, or is it just drifting away all the time?

      The initial bet-hedging is maintained to some degree, based on the parameter of phi and the bounding envelope. We have added text to make this clearer.

      (11) In both Figures 2C and 2E the populations are always shrinking, is that correct? And if so, is it expected? Does the model allow growth in a constant environment?

      As the plotted values are the log, the optimal environments do allow growth (visible more clearly in 2D). We have added some text to make this clearer.

      (12) Growth is quantified only across 100 days (Figure 2D), but at day 100 there is nothing like a steady state. How was 100 chosen? Would it make sense to check longer times to see if the system eventually takes off? And if not, why?

      (13) Related to the above: what is the growth range achieved in Figure 3A-B? Is the heatmap normalized to the same value across conditions? I think it would be important to consider the absolute range of variation of growth or at least the upper value across conditions.

      Moreover: is growth quantified at day 100? What happens at longer times? Does the temporal profile of the growth curve differ across environmental conditions? (I'm referring to a figure like 2D.)

      As we are plotting the log change, we are ultimately showing the growth rate. While a more realistic model would involve carrying capacity, we believe a simplified model showing growth or no growth captures the difference in growth rate between different strategies. We have added some text to make this clearer.
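      The relation between the plotted log change and the growth rate can be written out: for a trajectory N(t), ln(N_T/N_0) divided by T is the average per-day growth rate, and its sign alone says whether the population grew or shrank. A minimal sketch with hypothetical exponential trajectories (not simulated output from the model):

```python
import numpy as np

def total_log_change(pop_sizes):
    """ln(N_T / N_0): positive means net growth, negative means net decline."""
    return float(np.log(pop_sizes[-1] / pop_sizes[0]))

def mean_growth_rate(pop_sizes, dt=1.0):
    """Per-day growth rate as the least-squares slope of ln N(t) against t."""
    t = np.arange(len(pop_sizes)) * dt
    slope, _intercept = np.polyfit(t, np.log(pop_sizes), 1)
    return float(slope)

days = np.arange(101)                    # day 0 to day 100
growing = 10.0 * np.exp(0.05 * days)     # hypothetical trajectory, r = +0.05 per day
shrinking = 10.0 * np.exp(-0.02 * days)  # hypothetical trajectory, r = -0.02 per day
```

      For a purely exponential trajectory the two summaries agree (a log change of 5.0 over 100 days is 0.05 per day). For real model output the fitted slope is less sensitive to fluctuations near the endpoints than the endpoint ratio.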

      (14) Suddenly at line 502, sexual maturity is introduced as a parameter, which was never mentioned before, called a_min in the figure legend of panel 3a, but it is unclear where this is in the model. And please also clarify if sex maturity is the same as generation time.

      Sexual maturity is the same as generation time; we have standardized the terminology throughout the paper.

      (15) Regarding lines 505-508, could one simply conclude that in this model formulation, the generation time has the effect of a low pass filter on environmental fluctuation? The question is: is this filtering effect the only effect of generation time?

      While this seems to capture the high-frequency effect we see, it does not explain the shift from bet-hedging to drift that we see at lower-frequency environmental fluctuations.
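      The reviewer's low-pass intuition can be made concrete under a simplifying assumption, labelled here as illustrative only: if generation time acted like a moving average of length g days over the environment, its amplitude gain at frequency f (cycles per day) would be |sin(πfg) / (g sin(πf))|. Fluctuations much slower than g pass almost unchanged, and fluctuations with period g or shorter are strongly damped. As the response notes, this captures only the high-frequency attenuation and not the shift from bet-hedging to drift.

```python
import numpy as np

def moving_average_gain(freq, window):
    """Amplitude gain |H(f)| of a `window`-day moving average at `freq` cycles/day.

    |H(f)| = |sin(pi f w) / (w sin(pi f))|, defined as 1 at f = 0.
    """
    freq = np.asarray(freq, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        gain = np.abs(np.sin(np.pi * freq * window) / (window * np.sin(np.pi * freq)))
    return np.where(np.isclose(freq, 0.0), 1.0, gain)

generation_time = 10                          # hypothetical generation time in days
periods = np.array([100.0, 20.0, 10.0, 4.0])  # environmental fluctuation periods in days
gains = moving_average_gain(1.0 / periods, generation_time)
print(gains)  # near 1 for the 100-day period, near 0 for the 10-day period
```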

      (16) What reproductive rate is used for the PCA analysis? Is the variance associated with the drift so low because of choosing a fast reproductive rate? A comment in the main text would be helpful.

      We have clarified that these plots were done at 10 days.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Most importantly, in accordance with questions raised by Reviewer 1, we now include a detailed comparison of the cell type frequencies between the two examined time points as well as comparison of the pseudotimes along those lineages. This is detailed in the new section “Many cell types are shared between day 8 and day 16 EBs” and illustrated in Supplementary Figure 6c and Supplementary Figures 7-8.

      Besides this new section and its accompanying methods part, we mainly edited the language to clarify methods and assumptions according to the Reviewers' suggestions.

      The main concern of Reviewer 2 was our use of the Liftoff gene annotation. We explained our reasoning for this choice extensively in our public response to the Reviewer, but did not incorporate this into our manuscript because, even though this is an important subject, it is not within the main scope of our paper.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al. examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and differentiated into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA-seq studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a substantial imbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine) and individual. Unfortunately, for the orangutan we have only one individual and one sample origin, and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample. Because of this high variability from multiple sources, obtaining enough cell types with appreciable overlap between species was a limiting factor for our analyses. In order to derive meaningful conclusions from intra-species analyses and the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: For the analysis of cell type specificity and conservation the comparison is relative for the different specificity degrees (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we deliberately did not include an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of quality. All major lineages were already present at day 8, and even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by celltype. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point. If we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed, and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluate this require integration across species, which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have very transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well-integrated cells is lowest for cardiac fibroblasts (Author response image 3A). On the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is somewhat worse than for cardiac fibroblasts (Supplementary Figure 9). Rank-biased overlap scores calculated per cell type, which we use to measure marker gene conservation, show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset, we do not see any indication that the low marker gene conservation results from too broadly defined cell types.

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
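      Rank-biased overlap (RBO) compares two ranked lists, such as marker gene rankings from two species, by averaging the overlap of their top-d prefixes with weights that decay geometrically with depth (persistence parameter p), so agreement near the top counts most. The sketch below implements the extrapolated form from Webber et al. (2010) for two rankings of equal depth without ties. The persistence value and ranking depth used in the response are not stated, so p = 0.9 and the gene labels are placeholders.

```python
def rbo_ext(s, t, p=0.9):
    """Extrapolated rank-biased overlap (Webber et al. 2010) for two rankings of equal depth k.

    RBO_ext = (X_k / k) * p^k + ((1 - p) / p) * sum_{d=1..k} (X_d / d) * p^d,
    where X_d is the number of items shared by the top-d prefixes of s and t.
    Assumes no duplicate items within a ranking.
    """
    if len(s) != len(t):
        raise ValueError("this sketch assumes rankings of equal depth")
    k = len(s)
    seen_s, seen_t = set(), set()
    overlap = 0
    weighted_sum = 0.0
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        # Update the size of the prefix intersection incrementally
        if a == b:
            overlap += 1
        else:
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        weighted_sum += (overlap / d) * p**d
    return (overlap / k) * p**k + ((1 - p) / p) * weighted_sum

# Hypothetical marker rankings for one cell type in two species
human = ["geneA", "geneB", "geneC"]
macaque = ["geneB", "geneA", "geneC"]
score = rbo_ext(human, macaque)  # 0.9 for these rankings with p = 0.9
```

      Identical rankings score 1 and disjoint rankings score 0. Swapping the top two genes costs 10% here because with p = 0.9 the first ranks carry most of the weight.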

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species (humans, orangutans, cynomolgus macaques, and rhesus macaques), providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that the annotations of non-human primates often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used transcriptomes from human and cynomolgus iPSC bulk RNA-seq (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, using Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed genes, with apparently many more up-regulated genes in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE-genes and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the excess DE-genes are artifacts due to mismatched annotations in human and cynomolgus macaques. We illustrate this for the case of the transcription factor SALL4 in Author response image 4C, D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. The truncation of the 3’UTR in this case leads to underestimation of the expression of SALL4 in macaques, and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC = 1.34, p-adj < 2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC = 0.33, p-adj = 0.20).

      Author response image 4.

      (A) UMI-counts/ gene for the same cynomolgus macaque iPSC samples. On the x-axis the gtf file from Ensembl Macaca_fascicularis_6.0.111 was used to count and on the y-axis we used our filtered Liftoff annotation that transferred the human gene models from Gencode v32. (B) The # of DE-genes between human and cynomolgus iPSCs detected with DESeq2. In Liftoff, we counted human samples using Gencode v32 and compared it to the Liftoff annotation of the same human gene models to macFas6. In Ensembl, we use Gencode v32 for the human and Ensembl Macaca_fascicularis_6.0.111 for the Macaque. For both comparisons we subset the genes to only contain one-to-one orthologs as annotated in biomart. Up and down regulation is relative to human expression. C) Read counts for one example gene SALL4. Here we used in addition to the Liftoff and Ensembl annotation also transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. D) Gene models for SALL4 in the space of MacFas6 and a coverage for iPSC-Prime-seq bulk RNA-sequencing.
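      Why a truncated 3’UTR matters specifically for 3’ UMI data can be seen in a toy example. Reads from 3’-tag protocols pile up near the true transcript end, so if the annotated last exon stops short, most of those reads fall outside the gene model and go uncounted. The coordinates below are hypothetical, and the counting rule (a read counts if its position lies in any annotated exon) is a deliberate simplification of what real counting tools do.

```python
def count_tags(read_positions, exons):
    """Count reads whose 3'-tag position falls in any annotated exon (half-open [start, end))."""
    return sum(any(start <= pos < end for start, end in exons) for pos in read_positions)

# Hypothetical coordinates: most 3' UMI reads pile up near the true transcript end (~1200)
reads = [450, 980, 1150, 1170, 1185, 1190, 1195]
full_model = [(100, 500), (800, 1200)]       # last exon includes the full 3' UTR
truncated_model = [(100, 500), (800, 1000)]  # same gene model with the 3' UTR cut short

print(count_tags(reads, full_model))       # 7
print(count_tags(reads, truncated_model))  # 2
```

      If only one species' annotation is truncated, this undercounting appears as an apparent expression difference between species, which is the pattern described for SALL4.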

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from analysis in both species, because they present a convolution of sequence-annotation differences and expression. The focus of our study is on regulatory evolution, and we knowingly omit marker differences that are due to a marker being mutated away. We will make this clearer in the text of a revised version.
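      The exclusion rule described by the reviewer can be written as a simple per-transcript predicate. A minimal sketch, assuming that each lifted-over transcript has been summarised by its mapped fraction, its sequence identity, and its lengths in the reference and target genomes. The record type and field names are hypothetical, and reading the length criterion as "exclude only if both the difference exceeds 100 bp and the ratio exceeds 2" is an interpretation of the phrasing above.

```python
from dataclasses import dataclass

@dataclass
class LiftedTranscript:
    """Hypothetical per-transcript summary of a lifted-over gene model."""
    transcript_id: str
    coverage: float     # fraction of the reference transcript that mapped (0-1)
    identity: float     # sequence identity of the mapped part (0-1)
    ref_length: int     # transcript length in the reference genome (bp), > 0
    target_length: int  # transcript length in the target genome (bp), > 0

def passes_filter(tx, min_cov=0.5, min_id=0.5, max_diff_bp=100, max_ratio=2.0):
    """Keep a transcript unless it maps poorly or its length changed too much."""
    if tx.coverage < min_cov or tx.identity < min_id:
        return False
    diff = abs(tx.ref_length - tx.target_length)
    ratio = max(tx.ref_length, tx.target_length) / min(tx.ref_length, tx.target_length)
    # Exclude only when BOTH the absolute and the relative length change are large
    return not (diff > max_diff_bp and ratio > max_ratio)

candidates = [
    LiftedTranscript("tx_ok", 0.95, 0.98, 2000, 2050),
    LiftedTranscript("tx_low_cov", 0.40, 0.99, 2000, 2000),
    LiftedTranscript("tx_long_change", 0.90, 0.95, 1000, 2500),
    LiftedTranscript("tx_short_gene", 0.90, 0.95, 50, 120),
]
kept = [tx.transcript_id for tx in candidates if passes_filter(tx)]
```

      Requiring both conditions protects short transcripts, where a small absolute change (here 70 bp) can still produce a large ratio.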

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony-integrated data for visualisation in Figure 1 and for the rough germ-layer check in Supplementary Figure S3. We will add a better description in the revised version.

      Reference

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1B: the orangutan tubulin stain looks a bit unusual - just confirming that this is indeed the right image the authors want to include here.

      We agree. Unfortunately, this image also reflects the findings from the scRNA-seq analysis, in which we found hardly any cells that we would classify as proper neurons.

      (2) Typo on line 90: 'loosing' should be 'losing'.

      Fixed

      (3) Line 118: why do the authors believe that using singleR will give better results than MetaNeighbour? This certainly seems supported by the data in S4 and S5, but the reasoning is not clear.

      We think that this might depend on the signal-to-noise ratio, which is specific to each dataset. Here we only wanted to state that our approach seems to work better for our developmental data; we did not test other datasets and thus cannot generalize.

      (4) Figure 2B: there are some coloured lines on the first filled black bar from the left - do they mean anything? I couldn't work it out from looking at the figure.

      Indeed, this is a bit misleading. The colors on the left represent species identity, to illustrate the mixing of species within each cell type. The legend now reads: “Each line represents a cell which are colored by their species of origin on the left and by their current cell type assignment during the annotation procedure on the right.”

      (5) Figure 3: I did not understand how the seven bins of the cell type specificity metric were derived until much later - it is just the number of cell types in which a gene is expressed, yes? Might be worth making this clearer earlier in the text.

      We made this more explicit in the legend. “Boxplot of expression conservation of genes according to the number of different cell types in which a gene is expressed in humans (cell type specificity).”

      (6) It would be great to provide a bit more thorough documentation for the shiny app, so it can serve as a stand-alone resource and not require going back and forth with the paper to make sure one knows what one is doing at every point.

      We agree that this would be a good idea, and we are working on it.

      (7) Line 477: I think this is unclear - the authors retain over 11000 cells per species but then set the maximum number of cells in a cluster for pairwise comparison to 250... which is a lot fewer. What happens to all the other cells? This probably needs some rewriting to clarify it.

      We did this to minimize the power differences due to cell numbers and thus make the results more comparable across species. We added this explanation to the methods section for Marker gene detection.
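
      One way to implement such a cap is random downsampling of every cluster to at most 250 cells. The sketch below assumes random sampling with a fixed seed, applied separately within each species; the source does not state how the retained cells were chosen, so the function is hypothetical.

```python
import random
from collections import defaultdict


def cap_cells_per_cluster(cell_to_cluster, max_cells=250, seed=0):
    """Return cell IDs with at most max_cells randomly kept per cluster."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    by_cluster = defaultdict(list)
    for cell, cluster in cell_to_cluster.items():
        by_cluster[cluster].append(cell)
    kept = []
    for cluster, cells in sorted(by_cluster.items()):
        if len(cells) > max_cells:
            cells = rng.sample(cells, max_cells)  # sample without replacement
        kept.extend(cells)
    return kept
```

      Smaller clusters are kept whole, so only large clusters lose cells; this equalizes statistical power in pairwise marker tests without discarding rare cell types.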

      Reviewer #2 (Recommendations for the authors):

      How was the clustering resolution (0.1) determined?

      This resolution was used only for the initial rough check of the germ layers reported in Figure 1 and Supplementary Figure S3. We chose this resolution because it yielded roughly the same number of clusters as the number of cell types obtained by classification with the Rhodes et al. data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides evidence that cerebellar projections to the thalamus are required for learning and execution of motor skills in the accelerating rotarod task. This important study adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The data presentation is generally sound, especially for the main observations, although the description of the statistical methods is limited and support for two separate cerebello-thalamic pathways is lacking, which leaves the overall claim incompletely supported.

      We completed the manuscript by adding a double retrograde labelling study showing that the two pathways have limited overlap, and by addressing the other concerns.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critiques.

      (1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      We thank the reviewer for pointing out this weakness of description. The description of the Methods has thus been expanded and better justified in the “Quantification and statistical analysis” section.

      We agree with the reviewer that comparisons between Deming regressions would be fragile because these regressions are weak in the treatment groups (while they are quite robust for the control groups); such comparisons are therefore not included in the manuscript, although Deming regression coefficients with their confidence intervals are now provided for all groups in the statistical tables. As now more clearly explained in the Methods, comparisons between groups are based on the distribution of residuals around the control regression lines. If we understand the reviewer’s request correctly, the control groups are all included.
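
      For readers unfamiliar with the approach, the following is a rough Python sketch of fitting a Deming regression to control data and then comparing residuals of two groups around that fixed control line. It assumes equal error variances (delta = 1), vertical residuals, and a permutation test on mean residuals; these are illustrative choices, not necessarily the test described in the Methods.

```python
import math
import random
from statistics import mean


def deming_fit(x, y, delta=1.0):
    """Deming regression. delta is the ratio of the y-error variance to the
    x-error variance (delta = 1 gives orthogonal regression).
    Returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("zero covariance: Deming slope is undefined")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return ybar - slope * xbar, slope


def residuals(x, y, intercept, slope):
    """Vertical residuals of (x, y) points around a fixed reference line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of mean residuals."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      In use, one would fit deming_fit on the control animals only, compute residuals for control and treated animals around that line, and pass the two residual lists to permutation_p. Because the treated groups never need their own regression, a weak fit in those groups does not enter the comparison.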

      (2) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex is functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is, however, broader than a confusion about citations.

      The references are not cited in the context of collaterals from the DCN but for the output channels of the basal ganglia and cerebellum: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in the Proville 2014, we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). Hintzen et al. have indeed performed an extensive review indicating the limited overlap between cerebellar- and basal ganglia-recipient territories. The sentence has been corrected to clarify what the “They” referred to.

      The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei? How has that been rigorously established? Is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but, as the reviewer says, it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments supporting this, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We are thus confident that, while some CN neurons may project to both structures, retrograde infection strategies appear to differentially infect CN populations.

      (3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      The recordings were not extended to the wash period, but examination of the firing rate before CNO on successive days did not evidence major changes in the population firing rate (this is now shown in a new supplementary figure 6).

      (4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range " Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems like it is unlikely uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      Since reference to these time windows is repeatedly used in the text we have shifted to “Early” and “Late” phase terminology.

      (5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task." I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This has been corrected to: “suggesting the cerebellar contribution to the consolidation of the task is critical early in the learning process and cannot be easily reinstated later”

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and the insightful critiques below.

      Weaknesses:

      While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the fixed-speed rotarod test, which provides the conditions closest to those of the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation in the accelerating version). Indeed, small but measurable deficits are found at the highest speed of the fixed-speed rotarod in the CN-VAL group, while there was no measurable effect in the CN-CL group, which nevertheless shows lower performance on the accelerating rotarod from the second day of learning; we believe this supports our claim that CN-CL inhibition impacted the learning process more than motor coordination. In contrast, the CN-VAL group showed significantly lower performance only on day 4, consistent with intact learning abilities. Yet, under CNO, CN-VAL mice could stay for more than a minute and a half at 20 rpm, while on average they fell from the accelerating rotarod as soon as it reached a speed of ~19 rpm (130 s). Overall, we focused our argument on the first days of learning, where the differences between the groups are more pronounced. We clarified the discussion (section “A specific impact on learning of CL-projecting CN neurons.”).

      Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel. The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added experiments supporting this, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We are thus confident that, while some CN neurons may project to both structures, retrograde infection strategies appear to differentially infect CN populations.

      While we agree that after 3-4 days of learning the difference between the groups becomes elusive, we respectfully disagree with the reviewer that in the early stages these differences are negligible.

      Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) Cerebellothalamic connections are important for learning motor skills

      (2) Cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) Cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) That once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and the insightful critiques below.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is now better acknowledged in the Discussion, in the section “A specific impact on learning of CL-projecting CN neurons.” However, we want to underline that the strongest learning deficit is found in animals with CN->CL inhibition, whose latency to fall saturates at about 100 s on the rotarod; this indicates that these mice fall as soon as the accelerating rotarod reaches about 16 rpm. On the fixed-speed rotarod, inhibition of CN->CL neurons does not produce even a trend toward a difference from control mice at 15 rpm, and the animals run for 2 minutes without falling at this speed. This makes us confident that inhibiting the CN->CL pathway interferes more with learning than with the actual locomotor function on the rotarod.

      (2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      This issue is treated in the Discussion (see also replies to Reviewers 1 and 2 above). We added experiments with simultaneous retro-AAV infections in CL and VAL; the data are presented in Supplementary Figure 5. We found that retrograde infection targeted different populations of CN neurons; although collaterals in both CL and VAL may be present for (some of) these two populations, they are likely strongly biased toward one or the other thalamic region, explaining the differential retrograde labelling in the CN. We hope these experiments address the reviewer’s concern.
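
      Limited overlap in a dual retrograde experiment is usually summarized as the fraction of double-labelled cells. The sketch below is hypothetical: it assumes cells have already been classified as red-only, green-only or double-labelled, and which color corresponds to the CL or VAL injection is left to the reader. The counts in the usage line are invented for illustration, not data from the study.

```python
def overlap_fractions(n_red_only, n_green_only, n_double):
    """Fraction of each retrogradely labelled population that is double-labelled."""
    red_total = n_red_only + n_double      # all cells carrying the red reporter
    green_total = n_green_only + n_double  # all cells carrying the green reporter
    union = n_red_only + n_green_only + n_double
    return {
        "of_red": n_double / red_total if red_total else 0.0,
        "of_green": n_double / green_total if green_total else 0.0,
        "of_all_labelled": n_double / union if union else 0.0,
    }


# Invented counts, for illustration only.
fractions = overlap_fractions(n_red_only=80, n_green_only=60, n_double=20)
```

      Reporting the double-labelled fraction relative to each population separately matters because injections of different size or efficiency label different numbers of cells, so a single pooled percentage can hide an asymmetric overlap.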

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Multiple studies have reported on the effect of cerebellar nuclei (CN) manipulation on locomotion. Here the authors perform several controls and careful analysis to rule out gross motor deficits caused by DREADD-mediated CN silencing. As the authors point out in the discussion, part of the difference from prior studies could be the mild degree of inhibition here. However, it is possible that the CN inhibition here induces a subtle motor deficit and the accelerating rotarod task is challenging and more readily reveals this motor deficit, rather than a deficit in motor learning per se. Two pieces of data seem to suggest this:

      (a) under CN inhibition during the task (Figure 1i), mice could never achieve the level of performance as mice under CN inhibition after the task, even after several days of training, which suggests the CN inhibition is interfering with task performance;

      (b) in highly trained mice (after learning), applying CN inhibition impaired performance to a similar extent as in the mice of Figure 1i (Figure 4).

      Can the authors rule out the possibility that CN inhibition during the task is impairing motor execution rather than motor learning?

      We do not rule out a contribution of impaired motor coordination at the highest speeds (last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”). Indeed, our argument for a learning deficit rests primarily on the first days (Early phase), particularly for the CN->CL CNO group (Fig 3h). A crucial control in our work is the fixed-speed rotarod, on which no deficit is observed. The difference between the fixed-speed and accelerating rotarod is rather minimal, since the acceleration of the rotarod is small (0.12 rpm/s, for speeds up to >20 rpm).
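
      Latencies to fall and rod speeds are related by simple arithmetic, which helps compare accelerating and fixed-speed results. The sketch below assumes a 4 rpm start speed and a 40 rpm ceiling (a common 4 to 40 rpm over 300 s protocol, which gives exactly 0.12 rpm/s); only the 0.12 rpm/s acceleration is stated in the source.

```python
def rotarod_speed(t_s, start_rpm=4.0, accel_rpm_per_s=0.12, max_rpm=40.0):
    """Rod speed (rpm) t_s seconds into a linearly accelerating trial."""
    return min(start_rpm + accel_rpm_per_s * t_s, max_rpm)


def time_to_reach(speed_rpm, start_rpm=4.0, accel_rpm_per_s=0.12):
    """Inverse relation: seconds needed to reach speed_rpm from start_rpm."""
    return (speed_rpm - start_rpm) / accel_rpm_per_s


print(round(rotarod_speed(100), 1))  # 16.0
print(round(rotarod_speed(130), 1))  # 19.6
```

      Under these assumptions, a latency of about 100 s corresponds to about 16 rpm and 130 s to roughly 19 to 20 rpm, matching the speeds quoted in the earlier responses.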

      Interpreting the effect of treatment reversal is challenging. If the only effect of CNO were a motor deficit, the animals that learned under CNO should rapidly regain higher performance under saline, which is not observed. When switching from CNO to saline after 7 days of training, it is difficult to disentangle which part is due to a crude motor deficit (which would not show on the fixed-speed rotarod) and which part is due to an inability to resume motor learning after the task has been (mis-)consolidated.

      (2) The separation of the cerebellar pathways to the intralaminar thalamus (IL) and ventral thalamus (VAL) is not clear to me. It is not clear the CN neurons projecting to these nuclei are distinct. In addition, although IL projects to the striatum and VAL does not, both IL and VAL project to motor cortex. It is unclear to what extent these pathways can be separated. The argument for distinct pathways (as laid out in the discussion) is the distinct behavior deficits when manipulating these two pathways, but this difference seems subtle (point 3).

      We now show with retrograde labelling experiments (new Suppl Fig 5) that the CN populations are different. The differences between IL and VAL projections are now discussed in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.” Briefly, we argue that, despite some overlap of their targets, the projection profiles of CL and VAL differ substantially.

      (3) The pattern of behavioral deficits induced by CN->CL and CN->VAL neurons appear similar in Figure 3b-c and e-f. I have difficulty seeing how these data lead to the differences in the regression fits in panels 3g-k, which seem to show distinct patterns of performance change within and across sessions. One notable difference in Figure 3b-c and e-f seems to be that CN->VAL CNO treated mice exhibit lower performance on the very first trial for most days. Somehow, this pattern is present even after the CNO treatment is switched to saline (Figure 3f). I wonder if this data point is driving the difference. One control analysis the authors could do is to exclude the 1st trial and test if the effects are preserved.

      Since learning is cumulative and involves varying degrees of consolidation, it is indeed difficult to substantiate the difference from the average performance: a performance on day 3 may be limited by slow learning and perfect consolidation, or by good learning and imperfect consolidation. That is why we designed an analysis that takes into account the observed relationships between initial performance, within-session gain of performance, and across-session carry-over of this gain (Fig 2). This analysis focuses on the first days of learning, before the performance plateau is reached in the CNO groups. While a clear deficit in consolidation is observed with full CN inhibition, this is not the case for the CN→CL CNO group, despite its weaker performance after 3 days, similar to that seen with full CN inhibition. In contrast, normal learning is observed in the CN→VAL CNO group during these three days. The consolidation deficit in the CN→VAL CNO group is more subtle than in the CN CNO group and is indeed largely driven by the first data point. This is consistent with the idea that CN→VAL inhibition only partially impairs consolidation (compared to full CN inhibition), leaving some “savings” that allow rapid reacquisition.

      (4) The quantification of locomotion in Figure S2 needs more information. What is linear movement? What is sigma? What is the alternation coefficient? These are not defined in the legends or the Methods as far as I can tell. Related to point 1 above, the authors should provide some analysis of the stride length and hindlimb to forelimb distance as measures of locomotion execution.

      These measures were taken from Simon (J Neurosci 2004, 24(8):1987-1995), which is now cited, and their descriptions are now provided in the Methods.

      Minor:

      (5) To help readers follow the logic of experimental design, please explain why CNO was switched to saline after day 4 in Figures 1j, 3c, and f. Specifically, is the saline manipulation meant to test something as opposed to applying CNO throughout the entire course of the behavioral test?

      Since we had no difference between the groups at the end of the Early phase, we decided to test whether the skill consolidated under CNO remained available when the CNO was removed (and it indeed was). This is now more clearly stated in the Results.

      (6) I have difficulty understanding what is plotted in Figure 4b and d. The legend says the change in performance is calculated the same way as in Figure 2a, so the changes are presumably the regression slopes. But how are the regression slopes calculated for daily start (1st trial) and daily end (last trial)?

      Skill levels at the beginning and end of each session correspond to the values of the regression line at abscissae of trial 1 and trial 7 (green points). This has been added to the figure legend.

      (7) Do CN-CL and CN-VAL neurons also project to other brain regions besides the thalamus? Might these pathways also contribute to learning and consolidation of the accelerating rotarod task? Please discuss.

      This is now discussed in more detail in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”

      Reviewer #3 (Recommendations for the authors):

      (1) Please check the anatomic evidence for the strict dichotomy between intralaminar (specifically central lateral nucleus) nuclei projecting to the striatum and the ventral anterior-lateral (VAL) complex projecting to the cortex. For example, while the Chen et al paper shows that there are cerebellar-intralaminar-striatal projections, it does not exclude intralaminar-cortex projections, which have at least been demonstrated in rats. Similarly, VAL has projections to the striatum (see, e.g., Smith et al, "The thalamostriatal system in normal and diseased states", Frontiers in Systems Neuroscience, 2014). It may be that some of these projections are stronger, but I don't think it's true that these pathways are as well-separated as the authors suggest. I also don't think this changes the fundamental conclusions, but it is important for potential mechanisms by which differential learning could occur and may necessitate modification of Figure 5.

      We have toned down the interpretation of CL and VAL relaying specifically to different brain structures and mostly put forward the duality of the pathways. The connections with the cortex are now discussed at the end of the section “A specific impact on learning of CL-projecting CN neurons.”

      (2) Please provide more details on the spike sorting. By what metrics were single units declared to be well-separated? How many units were identified under each condition? What was the distribution of firing rates with and without CNO treatment? Are the units shown in panel 1f from before and after CNO as in panel E, or are they just 2 examples of isolated units? The units by themselves are not very helpful to the reader. Showing sample auto- and/or cross-correlograms for units recorded on the same electrode would be more helpful to show how well-isolated the units are.

      Single units were considered well-isolated based on quantitative quality metrics computed after MountainSort 4 spike sorting (Python 3.8). Units were required to have a signal-to-noise ratio (SNR) greater than 5, inter-spike interval (ISI) violations below 1%, an amplitude cutoff below 0.1, a presence ratio above 0.9, a firing rate greater than 0.1 Hz, and at least 50 detected spikes. In addition, units were assessed for temporal stability across the recording using autocorrelograms and presence over the recording, ensuring there were no prolonged periods of total inactivity. Units meeting these criteria were deemed well-separated and reliable for further analysis. This has been added to the Methods.
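
      The numeric thresholds above translate directly into a filter. In the sketch below, the metric values are assumed to have been computed beforehand; the field names are hypothetical and are not the output format of MountainSort or any particular metrics package. The visual stability checks (autocorrelograms, presence over time) are not covered.

```python
from dataclasses import dataclass


@dataclass
class UnitMetrics:
    # Hypothetical container for precomputed quality metrics of one unit.
    unit_id: int
    snr: float
    isi_violation_rate: float  # fraction, e.g. 0.005 for 0.5 %
    amplitude_cutoff: float
    presence_ratio: float
    firing_rate_hz: float
    n_spikes: int


def passes_qc(m):
    """Apply the stated single-unit inclusion thresholds."""
    return (m.snr > 5
            and m.isi_violation_rate < 0.01
            and m.amplitude_cutoff < 0.1
            and m.presence_ratio > 0.9
            and m.firing_rate_hz > 0.1
            and m.n_spikes >= 50)  # "at least 50 detected spikes"
```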

      Cell numbers are provided with the statistics in the supplementary table for Figure panel 1g. The panels show the same unit before and after CNO. Examples of auto- and cross-correlograms are provided in the new Supplementary Figure 6.

      (3) Panel 2g - "firing rate modulation" is unclear. I think the authors are showing the mean firing rate with DREADD+CNO treatment divided by the mean firing rate in the pre-CNO condition for the same group (I couldn't find that in the Methods, my apologies if I missed it)? However, firing rate modulation to me means variability in firing rate within a recording. Perhaps "relative firing rate" or "% pre-CNO firing rate" would be clearer?

      The definition has been added to the Methods, and the axis label has been changed to ‘Change in FR induced by SAL/CNO’.

      (4) Figure 3f - why does consolidation appear to be impaired after the transition from CNO to saline between sessions, when in panel 1j suppressing the CN does not have a similar effect once CNO is switched to saline? Could this be driven by a small number of mice? Since a central conclusion of the paper is that CN-VAL connections are uniquely important for posttraining consolidation, this discrepancy is important to explain - if the results post-saline are spurious, how do we know that the results post-CNO aren't also spurious? Panels similar to Figure 4b and d showing all the data from the last/first trial of each session I think would be convincing.

      Overall, our results indicate that overnight consolidation of the performance improvement seems to be effective only in the early phase (as summarized in Figure 5). We therefore do not believe that the saline results are spurious.

      This can indeed be seen in the control groups of Figure 1; to make it more visible, we plot in Author response image 1 the difference between trial 7 and trial 1 of the next day. An overnight drop in performance becomes visible in the late phase.

      Author response image 1.
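      The overnight comparison described above amounts to pairing the last trial of each day with the first trial of the following day. A minimal sketch, assuming 7 trials per session and assuming the sign convention "trial 1 of day d+1 minus trial 7 of day d" (so that a negative value is an overnight drop); the scores are hypothetical.

```python
def overnight_changes(sessions):
    """Overnight change in performance for each pair of consecutive days.

    sessions: list of daily sessions, each a list of per-trial scores in
    trial order (7 trials per session assumed). Each change is trial 1 of
    day d+1 minus the last trial (trial 7) of day d; negative = overnight drop.
    """
    return [nxt[0] - prev[-1] for prev, nxt in zip(sessions, sessions[1:])]


# Hypothetical scores for one mouse (e.g., latency to fall, in s), 3 days x 7 trials.
sessions = [
    [10, 12, 14, 15, 17, 18, 20],
    [16, 18, 20, 21, 23, 24, 25],
    [24, 25, 26, 27, 27, 28, 28],
]
print(overnight_changes(sessions))  # [-4, -1]
```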

      The decrement on the first trial over the first 3 days is visible for most mice. The plot requested by the reviewer is shown in Author response image 2.

      Author response image 2.

      Minor points:

      (5) In panel 1a, the solid yellow line obscures a lot of the image and I don't think adds anything.

      We assume this referred to a line in Figure 1d, which has been removed.

      (6) Panel 2a - color selection could present problems for those with red-green color blindness.

      This has been fixed.

      (7) Supplementary Figure 3 - what are the arrows and arrowheads indicating?

      These have been removed.

      (8) In the Discussion: "Studies of cerebellar synaptic plasticity provide clearly support the involvement of cerebellum in rotarod learning..." Delete the word "provide"

      This has been fixed.

      (9) "This indicates that either the distinct functional roles of VAL-projecting or CL-projecting." The second "of" should be "or", I think.

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewer for the thorough and constructive evaluation of our manuscript. We greatly appreciate the recognition of our work's strengths, particularly the integration of experiments and mathematical modeling, the stochastic framework for describing sloughing events, and the insights into pressure-driven detachment dynamics.

      We have carefully considered each point raised and provide detailed responses below. In response to the reviewer's comments, we have revised the Methods section to better clarify our approach to three-dimensional assessment. We believe these revisions have improved the clarity of the manuscript.

      Below, we address each of the specific concerns raised by the reviewer:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses: The study achieves its primary goal of integrating experiments and modeling to understand the coupling between flow and biofilm growth and detachment in a microfluidic channel, but it should have highlighted the weaknesses of the methods. I list what, in my opinion, are the main ones:

      The study does not consider biofilm porosity, which could significantly affect the flow and forces exerted on the biofilm. Porosity could impact the boundary conditions, such as the no-slip condition, which should be validated experimentally.

      Porosity is indeed a key component of biofilm structures, resulting from the polymeric nature of the EPS matrix, mechanical forces, and biological processes such as cell death or predation. When considering flow-biofilm interactions, this porosity may allow fluid flow through the biofilm, with reported permeability values spanning an extremely broad range, from 10⁻¹⁵ to 10⁻⁷ m² (Kurz et al., 2023).

      However, we argue that biofilm permeability is not the primary driver in our system:

      (1) In microscopy images, our biofilms form dense structures, and flow around the biofilm through narrow channels dominates over flow through the porous biofilm matrix.

      (2) We performed microrheology experiments in these biofilms by imaging the Brownian motion of nanoparticles in the biofilm. Their trajectories indicate that, in our conditions, the viscoelastic flow of the biofilm itself largely dominates over the flow of culture medium through the biofilm matrix.

      (3) We argue that the extreme variability in reported permeability values (spanning several orders of magnitude, Kurz et al., 2023) reflects not only differences in experimental systems, but also fundamental challenges in defining and measuring permeability for viscoelastoplastic biofilms (the biofilm itself is actually flowing). Given this uncertainty, incorporating permeability into our model would introduce parameters that cannot be reliably constrained from literature or independently measured in our setup. Our approach (i.e. treating the biofilm as impermeable and focusing on flow obstruction) avoids this parametrization complexity while successfully capturing the observed dynamics.

      (4) Our model successfully predicts the observed scaling laws (φmax ∝ Q^(1/2), Fig. 7f) and hydraulic resistance dynamics (Fig. 3) without invoking permeability, suggesting that flow obstruction rather than flow penetration is the dominant mechanism.

      Reference: Kurz, D. L.; Secchi, E.; Stocker, R.; Jimenez-Martinez, J. Morphogenesis of biofilms in porous media and control on hydrodynamics. Environ. Sci. Technol. 2023, 57 (14), 5666−5677.

      The research suggests EPS development as a stage in biofilm growth but does not probe it using lectin staining. This makes it impossible to accurately assess the role of EPS in biofilm development and detachment processes.

      We respectfully disagree that lectin staining is necessary to assess the role of EPS in our system, and we argue that our approach using genetic mutants is superior for the following reasons. Lectin staining has significant limitations. While widely used, lectin staining (e.g., concanavalin A) is non-specific (binding not only to EPS polysaccharides but also to bacterial cell surfaces) and is non-quantitative. It can confirm the presence of polysaccharides but cannot establish causal relationships between specific EPS components and mechanical properties or detachment dynamics. We performed preliminary experiments with ConA-rhodamine (data not shown), which showed widespread presence of polysaccharides. However, this provided limited insight beyond confirming EPS production, which is well-established for P. aeruginosa PAO1 biofilms. We employed a more rigorous genetic approach to directly assess the role of EPS composition. We used Δpel and Δpsl mutants (strains lacking key exopolysaccharides that are the primary structural components of the PAO1 matrix). Our results demonstrate that both mutants show significantly reduced maximum clogging compared to wild-type. The Δpsl mutant is particularly affected, with near-complete detachment at certain flow rates. These differences directly link EPS composition to mechanical stability and detachment dynamics. This genetic approach provides causal, quantitative evidence for the role of specific EPS components in biofilm development and detachment, information that lectin staining cannot provide. We believe this addresses the reviewer's concern more rigorously than lectin staining would.

      While the force and flow are three-dimensional, the images are taken in two dimensions. The paper does not clearly explain how the 2D images are extrapolated to make 3D assessments, which could lead to inaccuracies.

      We thank the reviewer for this important observation. We would like to clarify our methodological approach. Our primary three-dimensional measurement is the hydraulic resistance R(t), obtained from pressure drop measurements across the biofilm-containing channel section. This pressure-based measurement inherently captures the three-dimensional flow obstruction caused by the biofilm. We then employ a geometric model (uniform biofilm layer on all channel walls) to convert R(t) into volume fraction φ(t).
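      The geometric conversion from R(t) to φ(t) can be sketched as follows. This derivation is illustrative and rests on stated assumptions: a square cross-section of side a, laminar (Poiseuille) flow, and a uniform biofilm layer of thickness e(t) on all four walls, so that the open lumen stays square. Here C ≈ 28.4 is the laminar shape factor of a square duct, μ the medium viscosity, and L the length of the colonized section:

```latex
\begin{aligned}
R_0 &= \frac{C\,\mu L}{a^{4}}, \qquad
R(t) = \frac{C\,\mu L}{\bigl(a - 2e(t)\bigr)^{4}}, \\
\frac{R_0}{R(t)} &= \left(\frac{a - 2e(t)}{a}\right)^{4}, \\
\phi(t) &= 1 - \left(\frac{a - 2e(t)}{a}\right)^{2} = 1 - \sqrt{\frac{R_0}{R(t)}}.
\end{aligned}
```

      Because C, μ and L cancel in the ratio, φ(t) depends only on the measured R(t)/R₀. This matches the conversion formula quoted later in this response, and it shows why the uniform-layer assumption is the only geometric input needed.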

      The two-dimensional fluorescence imaging serves to validate this model-based approach rather than being the basis for three-dimensional extrapolation. The uniform layer assumption is supported by three independent lines of evidence: (i) the excellent quantitative agreement between predicted and measured scaling laws (φmax ∝ Q^(1/2), Fig. 7f), obtained without adjustable parameters; (ii) the high reproducibility of φmax values across different flow rates and replicates; and (iii) the strong correlation between model-derived φ(t) from pressure measurements and integrated fluorescence intensity (Fig. 3b-d).

      We have added clarifying text in the Methods section (subsection "Data analysis for the calculation of the hydraulic resistance and volume fraction") to better explain this approach and emphasize that pressure measurements provide the three-dimensional information, with the geometric model serving as the link to volume fraction.

      Although the findings are tested using polysaccharide-deficient mutants, the results could have been analyzed in greater detail. A more thorough analysis would help to better understand the role of matrix composition on the stochastic model of detachment.

      We thank the reviewer for this suggestion. Our mutant analysis demonstrates that Δpsl and Δpel strains have significantly reduced φmax and altered detachment dynamics compared to wild-type (Fig. 8), directly linking EPS composition to mechanical stability as predicted by our model. A rigorous quantitative connection between matrix composition and the stochastic parameters (interevent times, jump amplitudes) would require: (i) substantially more sloughing events for statistical power, (ii) independent mechanical characterization of each mutant, and (iii) a mechanistic model linking EPS composition to detachment parameters. We are currently developing microrheology approaches to characterize mutant mechanical properties, which could enable such refinement in future work.

      However, this represents a substantial study beyond the scope of the current manuscript, which establishes the self-sustained sloughing-regrowth cycle and its stochastic nature. The mutant results serve their intended purpose: demonstrating that EPS composition affects detachment, consistent with our model's framework.

      Reviewer #2 (Public review):

      This manuscript develops well-controlled microfluidic experiments and mathematical modelling to resolve how the temporal development of P. aeruginosa biofilms is shaped by ambient flow. The experiment considers a simple rectangular channel in which a constant flow rate is applied, and UV LEDs are used to confine the biofilm to a relatively short length of the device. While there is often considerable geometrical complexity in confined environments and feedback between biofilm and flow (e.g. in porous media), these simplified conditions are much more amenable to analysis. A non-dimensional mathematical model that considers nutrient transport, biofilm growth and detachment is developed and used to interpret experimental data. Regimes with both gradual detachment and catastrophic sloughing are considered. The concentration of nutrients in the medium is altered to resolve the effect of nutrient limitation. In addition, the roles of two major polysaccharide EPS components are explored with mutants, which leads to results in line with previous studies.

      There has been a vast amount of experimental and modelling work on biofilms, but the two are rarely linked together as tightly as in this paper. Predictions of the influence of the non-dimensional Damköhler number on the longitudinal distribution of biofilm, and of the functional dependence of the maximum amount of biofilm (𝜙max) on flow, are demonstrated. The study reconfirms a number of previous works that showed that the gradual detachment rate of biofilms scales with the square root of the shear stress. More challenging are the rapid detachment events in which a large amount of biofilm is detached at once. These events are identified experimentally using an automated analysis pipeline and are fitted with probability distributions. The time between detachment events was fitted with a Gamma distribution and the amplitude of the detachment events was fitted with a log-normal distribution; however, it is not clear how good these fits are. Experimental data were then used as an input for a stochastic differential equation, but the output of this model is compared only qualitatively to that of the experiments. Overall, this paper does an admirable job of developing well-constrained experiments and a tightly integrated mathematical framework through which to interpret them. However, the new insights this provides into the underlying physical/biological mechanisms are relatively limited.

      We thank the reviewer for the thorough evaluation of our work and for highlighting the tight integration between experiments and modeling. We appreciate the constructive feedback regarding the goodness-of-fit for the probability distributions.

      To address the concern that "it is not clear how good these fits are," we have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes.
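      To make the fitting step concrete, here is a minimal sketch of how Gamma and lognormal parameters, together with the skewness and kurtosis used to motivate the lognormal choice, can be estimated. It assumes a method-of-moments Gamma fit and a maximum-likelihood lognormal fit; the authors' actual estimators are not stated here, and the data values are hypothetical.

```python
import math
import statistics


def fit_gamma_moments(x):
    """Method-of-moments Gamma fit for inter-event times.

    Returns (shape k, scale theta) with k = mean^2 / var and theta = var / mean,
    using the sample variance (n - 1 denominator).
    """
    m = statistics.fmean(x)
    v = statistics.variance(x)
    return m * m / v, v / m


def fit_lognormal(x):
    """Maximum-likelihood lognormal fit: mean and population std of log(x)."""
    logs = [math.log(v) for v in x]
    return statistics.fmean(logs), statistics.pstdev(logs)


def skew_excess_kurtosis(x):
    """Moment-based sample skewness and excess kurtosis (biased estimators)."""
    n = len(x)
    m = statistics.fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical inter-event times (h) and detachment amplitudes (drop in phi).
inter_event_h = [3.1, 7.4, 5.0, 12.2, 4.8, 9.6, 6.3]
amplitudes = [0.02, 0.05, 0.03, 0.21, 0.04, 0.08, 0.03]
print(fit_gamma_moments(inter_event_h))
print(fit_lognormal(amplitudes))
print(skew_excess_kurtosis(amplitudes))
```

      Positive skewness and positive excess kurtosis of the raw amplitudes, with a roughly symmetric distribution of their logarithms, are the pattern that points towards a lognormal description.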

      Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.

      We respectfully disagree that the new insights into the underlying physical/biological mechanisms are relatively limited. Beyond confirming previous findings (e.g., scaling for gradual detachment), we believe our work provides several novel mechanistic insights. First, the Pe/Da criterion enables quantitative prediction of nutrient limitation regimes, allowing systematic decoupling of nutrient effects from other phenomena in biofilm studies. Second, we demonstrate that pressure, not shear, drives sloughing detachment events, a mechanism overlooked in previous studies where the notion of “shear-induced detachment” clearly dominates. Third, we show that sloughing-regrowth cycles occur even in single channels, establishing pressure-driven fluctuations as a signature of confined biofilm growth, independent of geometric complexity. Finally, the stochastic description of sloughing demonstrates that, while instantaneous biofilm states are irreproducible, the underlying randomness is predictable, therefore addressing a fundamental challenge in biofilm research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the abstract, I suggest clarifying the term "bacteria development." It is unclear if it refers to bacterial growth, biofilm formation, or biofilm detachment. The concept is expressed more clearly at the end of the Introduction.

      We have modified the entire abstract to make it clearer. The abstract now explicitly establishes the key processes - growth ('nutrients necessary for growth', 'growing bacteria obstruct flow paths') and detachment ('mechanical stresses that cause detachment', 'flow-induced detachment', 'sloughing') - before using 'bacterial development' as a collective term to refer to these coupled spatiotemporal dynamics. We believe the abstract is now clear as written.

      (2) Findings from Sanfilippo et al. (2019) were slightly questioned by Padron et al. (PNAS, 2023), who discovered that H₂O₂ transport is responsible for fro operon upregulation.

      Thank you for this clarification, which is indeed significant. The new sentence now reads: Pseudomonas aeruginosa has been found to regulate the fro operon in response to flow-modulated H₂O₂ concentrations (Sanfilippo et al. 2019, Padron et al. 2023).

      (3) Additionally, Kurz et al. (2022) account for pressure buildup as the mechanism controlling sloughing.

      We respectfully disagree and note that Kurz et al. (2022) identify shear stress, not pressure buildup, as the primary mechanism controlling sloughing. Besides the title, key sentences include “opening was driven by a physical process and specifically by the shear forces associated with flow through the biofilm”, “The opening of the PFPs is driven by flow-induced shear stress, which increases as a PFP becomes narrower due to microbial growth, causing biofilm compression and rupture.” While pressure differences are measured as indicators of system state and do contribute to normal compression stresses, their mechanistic explanation emphasizes that narrowing PFPs experience increased shear rates that eventually exceed the biofilm's yield stress, triggering viscoplastic deformation and detachment. The pressure buildup is a hydraulic consequence of narrowing rather than the direct cause of sloughing. In contrast, our work demonstrates that in confined geometries, pressure differences generate tangential stresses at the biofilm-solid interface that directly drive detachment.

      (4) The flow control strategy represented in Fig. 1 is not explained and should be detailed in the Methods section.

      The Methods section reads as follows. Inoculation and flow experiments: BHI suspensions were adjusted to an optical density OD640nm = 0.2 (10⁸ CFU/mL) and inoculated into the microchannels from the outlet, up to approximately ¾ of the channel length in order to keep a clean inlet. The system was left at room temperature (25°C) for 3 h under static conditions. Flow experiments were then performed at constant flow rates of 0.02, 0.2, 2, 20 and 200 μL/min for 72 h in the microchannels at room temperature. For the experiments at 0.2, 2, 20 and 200 μL/min, the fluidic system was based on a sterile culture medium reservoir pressurized by a pressure controller (Fluigent FlowEZ) and connected to a flow rate controller (Fluigent Flow unit). The flow rate was kept constant by a feedback loop adjusting the pressure in the liquid reservoir. The reservoir was connected to the chip using Tygon tubing (Saint Gobain Life Sciences Tygon™ ND 100-80) of 0.52 mm internal diameter and 1.52 mm external diameter, along with PEEK tubing (Cytiva Akta pure) with 0.25 mm inner diameter adapters for the flow rate controller. The waste container was also pressurized by another independent pressure controller to reduce air bubble formation in the inlet part. For the experiments at 0.02 μL/min, we used a Harvard Phd2000 syringe pump.

      (5) Including images of the actual biofilms formed in a portion of the channel would aid in understanding the analysis presented in Fig. 2.

      Images are introduced later (e.g., Figure 5). Supplementary videos are also provided.

      (6) The boundary conditions used to calculate the stress in the developed model should be discussed. The authors should specify why biofilm porosity is neglected.

      We have added a detailed discussion in the supplementary (Section I.2).

      (7) In the first section of the Results, the authors hypothesize that heterogeneity in biofilm development could be due to oxygen limitation. However, given the high oxygen permeability of PDMS, this hypothesis is later denied by their data. It would be prudent to avoid this hypothesis initially to streamline the presentation. Additionally, the authors should specify how oxygen levels at the inlet and outlet are measured.

      We appreciate this comment and agree that streamlining would simplify the presentation. However, after careful consideration, we have chosen to retain the oxygen limitation hypothesis for the following reasons: (1) oxygen limitation is a frequently invoked mechanism in biofilm systems and deserves explicit consideration, (2) it is not immediately obvious that oxygen remains non-limiting in larger microchannels where transverse gradients could develop, and (3) systematically eliminating this plausible alternative hypothesis strengthens our mechanistic conclusion that BHI drives the observed heterogeneity. Regarding oxygen measurements: we did not directly measure dissolved oxygen concentrations, so our assessment of oxygen limitation is only indirect.

      (8) What is the standard deviation of the doubling time measured at different flows (page 9)?

      We have indicated the standard deviation in the text. Note that the graph shows the SEM.

      (9) What is the "zone of interest" in the channel mentioned on page 9?

      We have added the following sentence to clarify: To further understand this effect, let us consider the mass balance of biofilm in the zone of interest in the channel (the zone where biofilm grows between the two UVC irradiation zones).

      (10) Minor and major detachment events should be classified based on a defined threshold or criteria, and their frequency should be measured.

      We appreciate the reviewer's concern about quantitative rigor. However, we respectfully disagree that imposing arbitrary thresholds to classify 'minor' vs. 'major' events would improve our analysis. Detachment events in our system span a continuum of magnitudes, and any threshold would be artificial and potentially misleading. Our quantitative characterization of detachment dynamics is provided through the statistical analysis of interevent times, which we show follow a gamma distribution. This stochastic framework captures the full spectrum of detachment behavior without requiring arbitrary binning. The terms 'minor' and 'major' in our manuscript are used qualitatively to illustrate the range of observed phenomena, not as formal classifications.
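      To illustrate how Gamma-distributed inter-event times and lognormal amplitudes can generate the full continuum of detachment sizes without any minor/major threshold, here is a toy simulation. It is a hypothetical sketch and not the authors' stochastic model: logistic growth between events, an absolute drop in φ at each event, a residual floor standing in for the layer that regrows, and all parameter values are assumptions.

```python
import random


def simulate_growth_sloughing(t_end_h, dt_h, growth_rate, gamma_shape, gamma_scale_h,
                              amp_mu, amp_sigma, phi0=0.05, phi_floor=0.01, seed=0):
    """Toy growth-sloughing model (hypothetical; not the authors' stochastic model).

    Between events, phi grows logistically (explicit Euler). Events are separated
    by Gamma(shape, scale) intervals; at each event, phi drops by a lognormal
    amplitude but never below a residual floor that can regrow.
    """
    rng = random.Random(seed)
    t, phi = 0.0, phi0
    next_event = rng.gammavariate(gamma_shape, gamma_scale_h)
    times, phis, n_events = [t], [phi], 0
    while t < t_end_h:
        # Logistic growth step; phi stays <= 1 as long as growth_rate * dt_h <= 1.
        phi += growth_rate * phi * (1.0 - phi) * dt_h
        t += dt_h
        if t >= next_event:
            drop = rng.lognormvariate(amp_mu, amp_sigma)
            phi = max(phi - drop, phi_floor)
            n_events += 1
            next_event = t + rng.gammavariate(gamma_shape, gamma_scale_h)
        times.append(t)
        phis.append(phi)
    return times, phis, n_events


# Hypothetical parameters: 72 h run, mean inter-event time 2 * 5 h = 10 h,
# median drop exp(-1.6) ~ 0.2 in volume fraction.
times, phis, n = simulate_growth_sloughing(
    t_end_h=72, dt_h=0.01, growth_rate=0.5, gamma_shape=2.0, gamma_scale_h=5.0,
    amp_mu=-1.6, amp_sigma=0.5)
print(n, min(phis), max(phis))
```

      Because the amplitudes come from a continuous distribution, every run produces a spread of event sizes; any cut-off separating "minor" from "major" events would be imposed on the output rather than being a property of the process.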

      (11) Have the authors identified a reason for the peaks in the volume fraction in the Δpsl mutants at the highest flow rate?

      The biofilm thickness following these sloughing events is below our detection limit, consistent with a residual layer of cells. However, these cells grow, leading to a time window in which the volume fraction is measurable before a new detachment event occurs. Our understanding is that the Δpsl mutant forms a weaker matrix with a much lower threshold for sloughing.

      (12) The fit of the probability density function for the relative density function does not match the data well. The authors should comment on this.

      We have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes. Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.
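      The Kolmogorov-Smirnov test mentioned above compares the empirical CDF with the fitted one. A minimal sketch of the one-sample KS statistic for a lognormal fit, using only the Python standard library; the amplitude values are hypothetical, and this is not necessarily the implementation used for the manuscript.

```python
import math
from statistics import NormalDist


def ks_statistic_lognormal(x, mu, sigma):
    """One-sample Kolmogorov-Smirnov statistic D of data x against LogNormal(mu, sigma).

    D is the largest vertical gap between the empirical CDF and the model CDF,
    checked on both sides of each step of the empirical CDF.
    """
    dist = NormalDist(mu, sigma)  # log(X) ~ Normal(mu, sigma)
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f = dist.cdf(math.log(v))
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical amplitudes; parameters estimated from the same data (see caveat below).
amplitudes = [0.02, 0.05, 0.03, 0.21, 0.04, 0.08, 0.03]
logs = [math.log(a) for a in amplitudes]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
d = ks_statistic_lognormal(amplitudes, mu, sigma)
approx_crit = 1.36 / math.sqrt(len(amplitudes))  # large-sample 5% value, known parameters
print(d, approx_crit)
```

      One caveat helps explain why formal tests can behave inconsistently across datasets: when μ and σ are estimated from the same data, the standard KS critical values are too lenient (a Lilliefors-type correction or a parametric bootstrap is needed), and the 1.36/√n approximation is only asymptotic, which is poor for small numbers of sloughing events.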

      (13) Additionally, the simulated fraction appears very flat, with limited detachments compared to experiments. Why?

      The model captures the essential dynamics of growth-detachment cycles, including the characteristic timescales and volume fraction ranges. Some event-to-event variability in the experimental data likely reflects biological stochasticity not captured by our current approach—for example, variations in local biofilm mechanical properties or matrix composition that affect the precise stress at which sloughing occurs. While incorporating such biological variability as a stochastic parameter would improve detailed agreement, it would require extensive additional characterization beyond the scope of this study. The current model successfully reproduces the key qualitative and semi-quantitative features of the system.

      (14) The methods section should include a more detailed explanation of how the model was validated against experimental data.

      Model validation was performed by comparing predicted biofilm volume fraction time series and sloughing event statistics against experimental observations across multiple flow rates. The model reproduces the characteristic growth-sloughing cycles, timescales, and steady-state volume fractions without additional parameter fitting beyond the experimentally measured distributions.

      (15) It would be useful to include information on the reproducibility of the experiments and any variations observed between replicates.

      Experiments were performed in N=3 biological replicates. Individual time series for all replicates are shown in Supplementary Figures, demonstrating consistent behavior across replicates.

      (16) A discussion of the limitations of the study, particularly regarding the assumptions made in the modeling and their potential impact on the results, would strengthen the paper.

      We have added a discussion on why we chose to neglect the porosity of the biofilm, and strengthened parts on the uniform biofilm layer assumption.

      Reviewer #2 (Recommendations For The Authors):

      Page 2: "A vast" —> "The vast"

      Changed.

      The text and line widths on many of the figures are far too small. I printed it out at normal size, but had to look at a PDF and magnify to actually see what the graphs are showing. Fig. 9c is particularly illegible.

      Changed.

      Fig. 1 caption "photonic" —> "optical"?

      Changed.

      Can you spell out the actual mathematical definition of 𝜙 on page 5 when it is introduced? Currently it just says the "cross section volume fraction of the biofilm", but that seems potentially ambiguous. It is valid to say that this is "fraction of the cross section occupied by the biofilm"?

      Changed.

      Bottom of page 5: can you state the physical interpretation of the assumption that M is bounded between 0 and 1. i.e. that growth is larger than detachment?

      There is a comment on this in the paper. It reads “In assuming that M ∈ ]0, 1] and eliminating cases where M > 1, we have not considered situations of systematic detachment 𝜙equ = 0 for any value of the concentration, since this is not a situation that we encountered experimentally.” This comment comes just after the expression for the only non-trivial steady state, because the consequences of the initial choice are easier to explain at that point.

      Currently the choice of detachment term initially used in the model is a bit confusing. You say that you are going to assume a (1-𝜙)^-1 model for simplicity (bottom of page 5), but then later you find that the (1-𝜙)^-3/4 model is more accurate (page 16). Since the latter has already been confirmed in numerous other studies, why not start with that one from the beginning?

      We thank the reviewer for this important question, which highlights an area where our presentation could be clearer. We did not find that the (1-φ)^-3/4 model is "more accurate." Rather, we deliberately chose the (1-φ)^-1 scaling because it captures pressure-induced detachment, which we hypothesized would dominate in confined flows where biofilms clog a large portion of the channel. The (1-φ)^-3/4 scaling, widely used in previous studies, describes shear stress at the biofilm/fluid interface and was developed primarily for reactor systems where pressure effects are negligible. Our analysis on page 16 validates this choice by demonstrating that pressure stress indeed exceeds shear stress when the volume fraction is large, which corresponds to late Stage I and all of Stage II, precisely where our model is applied. The excellent quantitative agreement between predicted and measured φmax values across flow rates (Fig. 7f, Table 1) further supports the (1-φ)^-1 scaling. We recognize that our initial presentation may have suggested that the (1-φ)^-1 choice was merely for "simplicity." We have revised this section to emphasize that this scaling was chosen specifically to capture pressure-driven detachment in confined geometries, with the physical justification provided by the stress analysis that follows. We have also revised page 16 to state explicitly that (1-φ)^-3/4 is never used. We could alternatively use a multi-modal detachment function combining both scalings, but the data do not require this additional complexity.

      In general, the models you derived in this study could be better contrasted with that from previous works. e.g. can you compare your Eqn (4) with the steady-state solutions obtained by other previous studies? Is this consistent with previous works or different? (aside from framing the biofilm thickness in terms of 𝜙)

      We are currently working on a paper dedicated to modeling biofilm development in confined flows, which will compare these approaches in more detail.

      Top of page 6 - you assume K* = 0.1 - Does this assume that cells grow at half the rate in 0.1X BHI as they do in 1X BHI? Has this been confirmed experimentally or is this just a guess?

      This value was estimated rather than measured directly. Model predictions were much more sensitive to the Damköhler number than to the value of K.

      "radial" is used widely in this paper, but you are using a square geometry. Is "transverse" a better choice?

      Yes, it clearly is. We have changed it throughout.

      Fig 3. Are panels (a) and (b) showing different bioreps of the same condition? If so, please spell that out in the caption.

      There was an error in the caption of panel (a), which has now been corrected. The correspondence is between panels (a) and (c), and these show exactly the same data; they are not biological replicates.

      In multiple places, it is noted that the change in hydraulic resistance is correlated with the "change in biofilm colonization." Why not demonstrate this directly using a cross-correlation analysis? How is the latter connected to the 𝜙 parameter? (e.g. is this d(𝜙)/dt?)

      We thank the reviewer for this suggestion. To clarify: φ(t) represents the volume fraction of biofilm in the channel. We measure this in two independent ways: (1) φ(t) from hydraulic resistance (black line in Fig. 3), i.e. calculated from pressure measurements using φ = 1 - √(R₀/R(t)), assuming uniform layer growth (see Methods section "Data analysis for the calculation of hydraulic resistance and volume fraction"), and (2) φ(t) from fluorescence (green squares in Fig. 3), i.e. estimated from integrated GFP intensity or image segmentation of the glass/liquid interface. The reviewer is correct that we should quantify this relationship directly. We have now added a correlation analysis between these two independent measurements of φ (new Supplementary Figure S21). The analysis shows a strong positive correlation, with r values ranging from 0.68 to 0.77 across all flow rates. This validates two key aspects of our approach: (1) the uniform layer assumption used to convert R(t) to φ(t) is reasonable, and (2) the pressure-based measurements accurately capture the dynamics visible in fluorescence imaging, including both growth phases and sloughing events. The strong agreement is particularly notable given that these measurements probe different aspects of the biofilm: hydraulic resistance is sensitive to the three-dimensional obstruction of flow, while fluorescence captures primarily the biofilm attached to the glass surface within our focal plane. Their correlation supports the model assumptions. We have revised the manuscript to clarify this relationship and present the correlation analysis.

      Top of page 9 - a doubling time of 110 mins is reported in liquid culture - is this in shaken or static conditions? Can you provide some data on how this was calculated? (e.g. on a plate reader?) Do you think your measurements in the microfluidics could be affected by attachment/detachment of cells, rather than being solely driven by division? It is curious that your apparent growth rate varies by a factor of two across the different flow rates and there is not a monotonic dependency. Both attachment and detachment would depend on the flow rate (with some non-trivial dependencies), e.g. https://www.pnas.org/doi/10.1073/pnas.2307718120 and https://doi.org/10.1016/j.bpj.2010.11.078

      Given that your doubling time in the microfluidics is based solely on changes in cell number (rather than directly tracking cell divisions), it seems possible your results here are measuring the combined effect of growth, attachment and detachment, rather than just growth.

      We agree with those comments regarding the doubling time measurement. We have added a description of how we performed the doubling time measurement in the Methods section.

      Page 9 - you discuss the role of EPS here, but the effect of EPS is not demonstrated here and this is muddled with a discussion about the non-linearity of the putative dependency. Maybe this would be on a firmer footing if you save the discussion of EPS for the section on the Psl and Pel mutants?

      Changed.

      Middle of page 9: Please define what "smooth detachment" means and contrast it with catastrophic sloughing. Also, please define what you mean by "flow, seeding, and erosion" detachment are and how these three things differ from one another.

      We have clearly defined each term in the revised version.

      The results from the wavelet scalograms seem to be underutilised and not well described. Can you clearly state in the caption which time series this analysis was calculated on (e.g. hydraulic resistance)? Other than simply pointing out the "blue stripes", what can be gained from this analysis that could not be obtained with another method? It would be great if the basic features of this plot could be more fully discussed (e.g. is the curved envelope at the bottom caused by edge effects?)

      We have improved the text, captions and method section following the reviewer’s comment.

      Fig. 5 a and b - please list the time at which each of these images were taken. Do these have the same dt between the two sets of images?

      Yes, dt is the same (30 minutes) for both sets. This is now indicated in the caption.

      Fig. 6: you have significant 2D variation in the biofilm width along the length of the channel. The relative contribution of pressure- and shear-based detachment will be different at different positions along the length. However, this variation is ignored in your model. Can you please comment on this in your manuscript and on how it might affect the interpretation of your results? e.g. would the longitudinally averaged description yield the same result as one that takes the geometry into account (on average)?

      Our model indeed assumes longitudinally averaged properties. A more detailed spatially resolved model would be valuable for capturing heterogeneities and will be explored in future work.

      Bottom of page 11: you say standard deviations are in the range of 10-3. How does this jibe with the error bars on the middle flow rate in Fig. 7e?

      This extremely low standard deviation applies only to the maximum value of 𝜙, which is a completely different measurement from the box-and-whisker plots presented in Fig. 7e.

      Fig. 7: You are calculating the "Fraction" here. Is this "𝜙"? If so, can you put that on the y-axis instead? You calculate the volume fraction two different ways e.g. with hydraulic resistance and with imaging. Is only one of these shown in (e)? Is the same powerlaw dependence shown in (f) conserved when the other measurement of the "fraction" is used? Can you include both in Fig. 7e?

      We have modified the axis and indicated 𝜙.

      (e) is calculated only from hydraulic resistance. This is the most precise measurement to evaluate 𝜙 quantitatively.

      Related to the previous comment: Some of the estimates of 𝜙max in Table 1 are obtained by fitting the model to integrated fluorescence data (Fig. 2b), while others are estimated from measurements of the hydraulic resistance. The former yields non-unique sets of parameters. Can the biofilm fraction instead actually be estimated directly from fluorescent imaging by segmenting biofilm and directly calculating how much of the cross section is occupied by cells on average across the length? This seems like a more direct measure of this quantity. Given there are multiple ways of estimating the same parameter, it would be better consistency checking to make sure that different methods actually yield the same result.

      We have now added in Fig S21 a direct comparison of these two measurement methods. These are strongly correlated. Microscopy is more direct but only provides 2D pictures. Hydraulic resistance provides a 3D measurement, but relies on a model of biofilm distribution. Both are imperfect, but correlate well. In particular, we see that the 2D measurement does capture sloughing.

      You cite a large number of supplemental figures (e.g. Fig. S21 on page 12), but the figures in your SI only go up to 11.

      We have revised references to supplementary figures.

      Bottom of page 11: Your data from liquid culture suggests that your psl mutant grows at half the rate of WT cells. Is that consistent with your microfluidic data (e.g. Fig. 8)? If not, might this be a sign that your growth rate analyses from the microfluidics might be affected by attachment/detachment? (see comment above) Psl cells should detach much more easily.

      The approach taken to measure doubling times in the microfluidic system does not rely on the macroscopic measurements presented in figure 8, but rather on the approach presented in fig 4. These measurements require specific imaging (different magnification and time stepping) and we did not perform such experiments for the mutants.

      In analyses of sloughing, you fit the times between the jumps and the relative amplitude. Are these two random variables correlated with one another? Might that influence your results? Your methods say that "jumps were identified through the selection of local maxima" of the derivative. Do you mean to say "minima" here? Did you keep all local maxima/minima or did you have a threshold?

      We treated these as two uncorrelated random variables. This is an assumption, and it would be interesting to analyze whether they are in fact correlated. To perform this analysis, we believe that we would first need to acquire more data and more replicates to improve the statistical analysis.

      Yes, it was minima (in the code we make everything positive, hence the confusion).

      Yes, there is a threshold on the value of the jump itself. This value is extremely low and essentially filters out noise.
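      The jump-detection step described above can be sketched as follows: take a finite-difference derivative of φ(t), flip its sign so that drops become positive (matching the "everything positive" convention mentioned above), keep local maxima above a small noise threshold, and then extract the waiting times between jumps and their relative amplitudes. The function name, the forward-difference scheme, and the definition of relative amplitude as (φ_before - φ_after)/φ_before are assumptions for illustration.

```python
def detect_sloughing(times: list[float], phi: list[float], threshold: float):
    """Return (jump_times, relative_amplitudes, waiting_times) for a phi(t) series.

    A jump is a local maximum of -dphi/dt (i.e. a local minimum of dphi/dt)
    whose magnitude exceeds `threshold`, which only filters out noise.
    """
    if len(times) != len(phi) or len(phi) < 4:
        raise ValueError("need at least four paired samples")
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    # Forward-difference derivative; deriv[i] spans samples i and i + 1.
    deriv = [(phi[i + 1] - phi[i]) / (times[i + 1] - times[i]) for i in range(len(phi) - 1)]
    drop = [-d for d in deriv]  # sign flip: sloughing drops become positive peaks

    jump_idx = []
    # Interior intervals only; a drop in the first or last interval is not detected.
    for i in range(1, len(drop) - 1):
        if drop[i] > threshold and drop[i] >= drop[i - 1] and drop[i] > drop[i + 1]:
            jump_idx.append(i)

    jump_times = [times[i] for i in jump_idx]
    # phi[i] > phi[i + 1] >= 0 at a detected jump, so phi[i] is never zero here.
    rel_amps = [(phi[i] - phi[i + 1]) / phi[i] for i in jump_idx]
    waiting = [b - a for a, b in zip(jump_times, jump_times[1:])]
    return jump_times, rel_amps, waiting


if __name__ == "__main__":
    # Toy trace with two sloughing events, for illustration only.
    times = list(range(10))
    phi = [0.1, 0.2, 0.3, 0.4, 0.2, 0.3, 0.4, 0.5, 0.25, 0.35]
    print(detect_sloughing(times, phi, threshold=0.05))
```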

      Fig. 9 - can you make it clearer in the caption what timeseries you are analysing here? I understand from the methods that this is the "volume fraction." The data/fits are difficult to see in Fig. 9b and impossible to see in Fig. 9c because the green bars get in the way of the other two data sets. Can this visualisation be improved? It is not clear to me how good of a job the Gamma and log-normal fits are actually doing.

      We have clarified that histograms are calculated from all experiments/replicates.

      We have slightly modified the graph to make it clearer. This comparison is intrinsically hard, partly because it compares discrete data with continuous PDFs.
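      As a sketch of how such fits can be obtained without a numerical optimizer: the log-normal maximum-likelihood estimates have closed forms (mean and standard deviation of ln x), and a gamma distribution can be fitted by the method of moments (shape = mean²/variance, scale = variance/mean). The resulting PDFs can then be overlaid on a histogram normalized to unit area, which is what makes the discrete-versus-continuous comparison meaningful. The use of the method of moments for the gamma fit is an assumption for illustration; the manuscript's own fitting procedure may differ (e.g. gamma maximum likelihood), and all function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_lognormal(x: list[float]) -> tuple[float, float]:
    """Maximum-likelihood (mu, sigma) of a log-normal distribution."""
    if any(v <= 0 for v in x):
        raise ValueError("log-normal data must be positive")
    logs = [math.log(v) for v in x]
    mu = fmean(logs)
    sigma = math.sqrt(pvariance(logs, mu))  # the MLE uses the 1/n variance
    if sigma == 0:
        raise ValueError("data have zero spread")
    return mu, sigma


def fit_gamma_moments(x: list[float]) -> tuple[float, float]:
    """Method-of-moments (shape k, scale theta) of a gamma distribution."""
    m = fmean(x)
    v = pvariance(x, m)
    if m <= 0 or v <= 0:
        raise ValueError("need a positive mean and non-zero variance")
    return m * m / v, v / m


def lognormal_pdf(t: float, mu: float, sigma: float) -> float:
    """Log-normal density at t > 0."""
    return math.exp(-((math.log(t) - mu) ** 2) / (2 * sigma**2)) / (t * sigma * math.sqrt(2 * math.pi))


def gamma_pdf(t: float, k: float, theta: float) -> float:
    """Gamma density at t > 0 with shape k and scale theta."""
    return t ** (k - 1) * math.exp(-t / theta) / (math.gamma(k) * theta**k)
```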

      The results from the stochastic sloughing model are described as 'strikingly similar to experimental data', which seems to be based on a qualitative analysis of the lines in Fig. 7d, e, and f. However, the experimental data are not plotted in the same graph, nor are the experimental data that we should be comparing against cited in the text or caption.

      We have added a note in the caption to indicate which figure it can be compared to.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2025-03242R

      Corresponding author(s): Shinya Kuroda

      1. General Statements

      We thank the reviewers for their critical review of the manuscript and their valuable comments. We have carefully considered the reviewers' comments and have revised our manuscript accordingly.

      The reviewers' comments in this letter are in Bold and Italics.

      2. Point-by-point description of the revisions

      Response to Reviewer #1's Comments

      Evidence, reproducibility and clarity:

      Major comments

      1. This study leaves out lipid metabolism, a major energy metabolism pathway relevant to AD. The authors themselves cite the significance of acylcarnitines and CPT1A in AD (pg. 3, lines 32-33; pg. 4, lines 1-2). Lipid metabolism and homeostasis are known to be disrupted in AD [1]. Fatty acid oxidation is a known energy source in the prefrontal cortex [2] and also generates acetyl-CoA, which this study identifies as a significantly decreased metabolite in AD. Furthermore, sphingomyelin emerges as one of the major decreased DEMs as well. Thus, lipid metabolism should be highlighted in Figure 3 and discussed throughout the manuscript; otherwise, its omission should be clearly stated and justified.

      We appreciate the reviewer's insightful comment regarding a critical role of lipid metabolism in AD. We recognize that lipid metabolism is a metabolic pathway deeply involved in AD pathology (Baloni et al., 2022, 2020; Varma et al., 2021). Accordingly, we have revised the Limitations section to more strongly emphasize its role as a vital energy source (pg. 13, lines 15-17). Regarding the visualization of lipid metabolism, we extracted lipid-related pathways from the trans-omic network but found that the regulatory relationships among DEPs and DEMs were excessively complex and interconnected. Thus, interpreting this regulatory network seemed to be more challenging compared to the other energy production pathways presented in our manuscript. Therefore, we have concluded that the pathway analysis in our trans-omic network may not be suitable for deeply elucidating the lipid dysregulation in AD. We have added a statement acknowledging this as a limitation of our current methodology in the revised manuscript (pg. 13, lines 13-22).

      The covariates used for differential analysis should be discussed and justified. Notably, age is used as a covariate for transcriptomic analysis but not proteomic and metabolomic analysis, with no justification. Additionally, given the known importance of lipid metabolism in AD and the putative role of APOE in lipid homeostasis [3], APOE genetic status should be considered as a covariate, or its omission should be justified.

      We appreciate the reviewer's comment regarding the included covariates in differential analyses of our study. The reason we did not include other variables, such as age at death and RIN, is that these data were not available for each sample. Thus, we referred to the original research articles from which proteomic or metabolomic datasets used in our study were derived. Regarding the metabolomic dataset, in the original article (Batra et al., 2023), only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age. In addition, no metabolites were significantly associated with sex, BMI, and years of education. Regarding the proteomic dataset, in the original article (Johnson et al., 2020), age at death, PMI, and sex were included as covariates in the analyses, though these variables were not found to strongly influence the data (Extended Data Fig.2 in (Johnson et al., 2020)).

      The authors make a conclusion statement that suggests intervention: "Collectively, our data suggests that preserving or improving the ability to produce ATP and early intervention in the process of nitrogen metabolism are candidates for the prevention and treatment of dementia" (pg. 12, lines 12-14). This claim is not well-supported by the evidence provided in the study. There are a few limitations: (a) This was an observational, not interventional study; (b) The study did not establish whether the metabolic disruptions are causes or effects in AD; and (c) ATP or other bioenergetic indicators were not directly measured. Therefore, any statements about potential interventions should be removed or qualified as highly speculative.

      We agree with the reviewer that the statement regarding potential interventions was not sufficiently supported by our analyses. Accordingly, we have removed the sentence regarding prevention and treatment from the revised manuscript (specifically, we have deleted the final paragraph of the previous manuscript).

      In conjunction with the last point, the main conclusion of the study is that energy production is down in AD. The data presented in Figure 3 are consistent with this conclusion, but it is far from definitive due to limitations stated above in comments 3a and 3b. The authors should offer additional support for this conclusion: experimental follow-up, flux modeling, analysis of alternative datasets with ATP measurement, causal inference.

      We sincerely thank the reviewer for this valuable and constructive suggestion. Regarding flux modeling, we agree that metabolic flux analysis could provide important mechanistic insight. Indeed, previous studies have applied flux modeling in the context of lipid metabolism in Alzheimer's disease (Baloni et al., 2022). We also attempted to perform flux modeling focusing on energy metabolism. However, we found it difficult to obtain biologically meaningful and robust results and therefore decided not to include these analyses in the current manuscript.

      With respect to ATP measurements, we fully agree that direct evidence of altered ATP levels would further strengthen our conclusion. However, to the best of our knowledge, there are currently no publicly available large-scale datasets that directly measure ATP levels in human postmortem brain tissues. This limitation makes it challenging to incorporate validation in the present study.

      Regarding experimental follow-up, we agree that functional validation is essential to confirm the mechanistic implications of our findings. We are actively considering follow-up experimental studies. However, we consider the present work to be a multi-omic integrative analysis aimed at identifying key molecular alterations and generating biologically important hypotheses. We have revised the Limitations section to more clearly position this manuscript as an observational systems-level analysis (pg. 13, lines 20-22).

      The validation analysis did not sufficiently show the generalizability of this study's results. The authors demonstrated a correlation of 0.53 to the MSBB transcriptomics data and 0.60 to the AMP-AD DiverseCohorts proteomics data. Beyond these correlation coefficients, no meaningful comparison between the datasets is offered. How concordant are the differentially expressed features (or pathways) between the datasets? How robust would the trans-omic network be if incorporating the alternate datasets? Is the main conclusion (energy metabolism is down in AD) supported by the validation datasets? We think this analysis should be expanded and described in the main text. Although the results for external metabolomics datasets are reported in Fig S2C, correlation coefficients with the external data are not reported. The authors state, "Note that each study used different definitions for AD and CT groups, had variations in measurement methods and brain regions analyzed." We appreciate these limitations. However, the external data should be re-analyzed using the same definitions of AD and CT, if possible. The limitations and results (which DEMs are shared between datasets) should be discussed in the main text.

      We thank the reviewer for this important comment regarding the generalizability of our findings. In the revised manuscript, we have expanded the validation analyses and summarized the results in Figure S2. First, at the transcriptomic level, Figure S2B and S2C show the overlap between up- and downregulated genes in AD identified in our ROSMAP-derived analyses and those reported in a previously published large-scale meta-analysis of 2,114 postmortem samples across seven brain regions (Wan et al., 2020). A substantial proportion of DEGs were shared, supporting cross-cohort and cross-region robustness to some extent. At the proteomic level, Figure S2E shows a comparison between the ROSMAP and the AMP-AD DiverseCohorts datasets. We highlighted the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3 and calculated a separate correlation coefficient for this subset (Pearson coefficient = 0.86, p-value = 1.5e-7), further supporting our main conclusion. To assess the concordance between the two datasets in a threshold-independent manner, we also performed Rank-Rank Hypergeometric Overlap (RRHO) analysis (Figure S2E). RRHO analysis (Cahill et al., 2018; Plaisier et al., 2010) enables the comparison of ranked protein lists without relying on arbitrary differential expression cutoffs and has been used for cross-dataset comparison in several previous studies (Fröhlich et al., 2024; Maitra et al., 2023). The RRHO heatmaps demonstrated significant enrichment in the concordant quadrants, confirming systematic agreement between datasets beyond simple correlation coefficients. For metabolomics, Figure S2G shows RRHO analyses comparing the ROSMAP metabolomic data with other datasets measured by the same UPLC-MS/MS platform (Batra et al., 2024; Novotny et al., 2023), demonstrating significant concordance in ranked metabolite changes in AD.
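      For readers unfamiliar with RRHO, the core computation can be sketched in a few lines: both feature lists are ranked (e.g. by signed fold change, most up-regulated first), and for each pair of cut-offs (i, j) the overlap between the top i features of one list and the top j of the other is scored with a one-sided hypergeometric test. This is a simplified, hypothetical illustration, not the implementation used for Figure S2; published RRHO tools additionally scan the discordant quadrants and handle multiple testing.

```python
import math


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = math.comb(N, n)
    tail = sum(math.comb(K, x) * math.comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / total


def rrho_map(ranked_a: list[str], ranked_b: list[str], step: int) -> list[list[float]]:
    """Grid of -log10 p for overlaps of the top i of list A and the top j of list B.

    Both lists must rank the same features, most up-regulated first.
    Only the concordant "up-up" quadrant is scanned; the "down-down"
    quadrant can be obtained by passing both lists reversed.
    """
    if len(set(ranked_a)) != len(ranked_a) or sorted(ranked_a) != sorted(ranked_b):
        raise ValueError("lists must contain the same unique features")
    if step < 1:
        raise ValueError("step must be a positive integer")
    N = len(ranked_a)
    grid = []
    for i in range(step, N + 1, step):
        top_a = set(ranked_a[:i])
        row = []
        for j in range(step, N + 1, step):
            overlap = len(top_a.intersection(ranked_b[:j]))
            p = hypergeom_sf(overlap, N, i, j)
            row.append(-math.log10(p) if p > 0 else math.inf)
        grid.append(row)
    return grid
```

      Identical rankings give large values near the top-left corner of the grid, whereas opposite rankings give values near zero there, which is the pattern the concordant and discordant quadrants of an RRHO heatmap summarize.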

      The glycolysis analysis and discussion need more development. Glycolysis and gluconeogenesis share many of the same enzymes, but they are not the same pathway and should not be discussed as such. To make a claim about the overall influence of enzyme and metabolite levels on glycolysis, the authors should focus on the energetically committing steps of glycolysis (hexokinase, phosphofructokinase, pyruvate kinase) in Figure 3A, and include the full/current version of the figure in the supplement. Gluconeogenesis-specific enzymes (pyruvate carboxylase, PEPCK) are not mentioned at all - are they among the DEPs/DEGs?

      We appreciate the reviewer's comment regarding the distinction between glycolysis and gluconeogenesis pathway. Among the gluconeogenesis-specific enzyme proteins, G6PC1, FBP1, PC, and PCK2 were measured in our dataset, but none of them were identified as DEPs. In addition, gluconeogenesis is a process that occurs primarily in the liver and kidney rather than the brain. Given this biological context and the lack of significant changes in relevant enzymes, we have revised the terminology throughout the manuscript, replacing "glycolysis/gluconeogenesis pathway" with "glycolysis pathway" in the revised version.

      Given that there wasn't good concordance between the DEGs and DEPs, did including the mRNA and transcription factor layers in the network really add anything useful? It seems like the main conclusions of the manuscript were driven by the protein and metabolite layers only. How many of the DE metabolic enzymes were coregulated at the transcript and protein level? It would be useful to include the 5-layer trans-omic network in the supplement to display these results. Given your network, at what level does it appear that energy metabolism is regulated?

      It is true that our primary conclusion regarding the regulation of energy metabolism is driven by the changes in protein and metabolite abundance. However, we consider the low concordance between mRNA and protein expression itself to be an important feature of AD pathology, as also reported in previous studies (Johnson et al., 2022; Tasaki et al., 2022). Although we did not perform a further analysis of this discordance, we believe that including the TF and mRNA layers into the metabolic trans-omic network strengthens a system-wide view of metabolic dysregulation in AD.

      Regarding the mRNA changes corresponding to the DEP enzymes, please refer to Figure S7A.

      Comment further on the results from Figure 2D. What can be learned from identifying metabolites with the greatest degree centrality? What pathways other than energy metabolism are highlighted by the trans-omic network?

      We assume that some energetic indicators, including AMP and acetyl-CoA, and nitrogen metabolism-related metabolites, Glu, 2-oxoglutarate, and urea, can be potential key regulators of dysregulated metabolism in AD.
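      Degree centrality itself is simple to compute: a node's number of distinct neighbours divided by n - 1, where n is the number of nodes (the normalization used, for example, by NetworkX). The Python sketch below treats the trans-omic network as an undirected edge list; the node names are hypothetical placeholders, and a directed network could instead use in- or out-degree.

```python
def degree_centrality(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Normalized degree centrality for an undirected simple graph.

    Duplicate edges are counted once and self-loops are ignored.
    """
    neighbours: dict[str, set[str]] = {}
    for u, v in edges:
        neighbours.setdefault(u, set())
        neighbours.setdefault(v, set())
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def top_nodes(centrality: dict[str, float], k: int) -> list[tuple[str, float]]:
    """The k most central nodes, ties broken alphabetically."""
    return sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    # Hypothetical metabolite-reaction edges, for illustration only.
    edges = [("met_A", "rxn_1"), ("met_A", "rxn_2"), ("met_A", "rxn_3"), ("met_B", "rxn_1")]
    print(top_nodes(degree_centrality(edges), 2))
```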

      (Suggestion) We suggest the authors leverage their trans-omic network in additional ways beyond giving a snapshot of a few energy metabolism pathways. The analysis of top DEMs could go further. What pathways are impacted beyond energy metabolism? Among the metabolic reactions allosterically regulated by top DEMs, what metabolic pathways are enriched?

      We identified the enriched metabolic pathways that were allosterically regulated by DEMs in AD using Fisher's exact test. Alanine, aspartate, and glutamate metabolism pathways were significantly enriched in 2-oxoglutarate, glutarate, alanine, and glutamate-regulating metabolic reactions. Arginine and proline metabolism pathway was enriched in N-methyl-L-arginine and putrescine-regulating metabolic reactions. Arginine biosynthesis pathway was enriched in arginine-regulating metabolic reactions. Glycerophospholipid metabolism pathway was enriched in CDP-ethanolamine-regulating metabolic reactions. Glycine, serine, and threonine metabolism pathway was enriched in serine-regulating metabolic reactions. Purine metabolism pathway was enriched in AMP-regulating metabolic reactions. Pyrimidine metabolism pathway was enriched in deoxyuridine and thymidine-regulating metabolic reactions. Sphingolipid metabolism pathway was enriched in sphingosine-regulating metabolic reactions. However, this analysis did not yield sufficiently valuable insights into the regulatory relationships among biomolecules in AD. Thus, we did not include these results in the revised manuscript.
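      A one-sided Fisher's exact test for over-representation reduces to a hypergeometric upper tail. The Python sketch below, with hypothetical function and argument names, spells out the 2×2 logic and takes the background set explicitly, so that it can be restricted to measured features as discussed under minor comment 12. The resulting p-values would still need correction for multiple testing.

```python
import math


def fisher_enrichment(hits: set[str], pathway: set[str], background: set[str]) -> float:
    """One-sided Fisher's exact p-value for over-representation of `pathway` in `hits`.

    Equivalent to P(X >= a) with X ~ Hypergeometric(N, K, n), where
    N = |background|, K = |pathway & background|, n = |hits & background|,
    and a = |hits & pathway & background|.
    """
    hits = hits & background      # features outside the background are ignored
    pathway = pathway & background
    N, K, n = len(background), len(pathway), len(hits)
    if N == 0 or K == 0 or n == 0:
        return 1.0
    a = len(hits & pathway)
    total = math.comb(N, n)
    # math.comb(n, k) returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(math.comb(K, x) * math.comb(N - K, n - x) for x in range(a, min(K, n) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical reaction identifiers, for illustration only.
    background = {f"r{i}" for i in range(20)}
    pathway = {f"r{i}" for i in range(5)}
    hits = {"r0", "r1", "r2", "r10"}
    print(fisher_enrichment(hits, pathway, background))
```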

      (Suggestion) Figure 3 shows that most differential signal in AD points to lower energy production due to the combination of differentially expressed metabolites and enzymes, but we are not given much context about the strength of these among all the differential signals. We would suggest including volcano plots where the features of interest, i.e. DE enzymes and metabolites, are colored differently (or a similar figure).

      We thank the reviewer for this constructive suggestion. To provide better context regarding the importance of the differential signals, we have added volcano plots for mRNAs, proteins, and metabolites in Figure S4A, B, and C.

      (Suggestion) The PPI network could be better leveraged to understand metabolic changes in AD. If nodes are grouped into subnetworks (e.g. by Louvain / Leiden clustering) and tested for pathway enrichment, could you find functional subnetworks of coordinately up- and down- regulated metabolic enzymes? This could yield some pathways of interest beyond the energy metabolism pathways already highlighted.

      We appreciate the reviewer's suggestion to utilize the PPI network for subnetwork analysis. However, it is important to note that the proteomic dataset analyzed in this study is derived from the original work of Johnson et al. (2020). In that paper, the authors already performed a Weighted Gene Co-expression Network Analysis (WGCNA) across several datasets to identify co-expressed modules and functional pathways.

      Given this, we assumed that applying additional clustering methods to the same dataset would be unlikely to yield significant biological insights beyond the established findings.

      Minor comments

      12. "All genes" and "all metabolites" should not be the background for the proteomic and metabolic pathway enrichment analysis by Metascape and MetaboAnalyst. The background should be limited to the proteins and metabolites that were measured.

      We fully agree with the reviewer that using "all genes" or "all metabolites" as a background is not suitable for enrichment analyses. As suggested, we have revised the enrichment analyses using the measured proteins and metabolites as a background in both Metascape and MetaboAnalyst (Fig. S4D).

      Highlight the metabolic enzymes in Fig S2B. Calculate a separate correlation coefficient for the enzymes extracted in the energy metabolism analysis from Fig 3.

      We appreciate the reviewer's suggestion to refine the correlation analysis. As requested, we have revised Fig. S2D to explicitly highlight the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3. We calculated a separate correlation coefficient for the subset (Pearson coefficient = 0.86, p-value = 1.5e-7).

      Use a multiple hypothesis adjusted p-value or q-value in Figure S3.

      We agree with the reviewer regarding the necessity of correcting for multiple comparisons. Accordingly, we have revised Fig. S4D using q-values.
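      For reference, the Benjamini-Hochberg procedure, whose adjusted p-values are often reported as q-values, can be sketched as below. Whether the tools used for Fig. S4D apply exactly this adjustment (rather than, for example, Storey's q-value) is an assumption here and should be checked against each tool's documentation; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    if any(not 0.0 <= p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```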

      Describe the methods used to calculate the logFC values from the validation dataset.

      We have revised the Methods to include a detailed description of the procedure used to calculate the log2FC values for the validation datasets (pg. 21, lines 13-15).

      It is difficult to read Figure 3. We would recommend really emphasizing to the reader to refer to Fig S7B as a "key" to this figure. The description of the red/blue arrows and nodes in the methods section (pg. 24, lines 21-36, pg 25, lines 1-4) were also helpful, but very lengthy. We recommend putting an abridged version of this description into the Fig S7 figure legend.

      We appreciate the feedback regarding the readability of Fig. 3. As recommended, we have revised the manuscript to explicitly direct readers to Fig. S8B as an essential "key" for interpreting the network visualization (pg. 8, line 28). Furthermore, we have added an abridged description of the network elements to the legend of Fig. S8B.

      The S7 figure legend should refer to panels A and B, not E and F.

      We apologize for this oversight. We have corrected the legend of Fig. S8.

      (Suggestion) Are any of the differentially expressed metabolites allosteric regulators of the DE transcription factors? This could be interesting to discuss.

      We appreciate the reviewer's insightful suggestion about the potential allosteric regulation of the DETFs by DEMs. We conducted an extensive literature search to identify any reports related to this perspective. However, to the best of our knowledge, no such direct interactions have been reported to date.

      Significance:

      The study's strength lies in leveraging three omics modalities across large patient cohorts (n ~ 150-240) to identify coherent signals between transcriptomics, proteomics, and metabolomics in postmortem DLPFC tissue. It was encouraging to see that the main result, showing downregulation of TCA, oxidative phosphorylation, and ketone body metabolism, emerged from consistent signals across both proteomics and metabolomics. This result was consistent with previous findings in other models cited by the authors [4,5] and other studies [6,7] demonstrating deficiencies in energy-producing pathways in AD. Another strength of the study is the application of thoughtful methodology to connect differentially expressed proteins and metabolites via an intermediate data layer of metabolic reactions. The authors leverage the KEGG and BRENDA databases and apply sound logic to estimate the effects of enzyme level and metabolite level on pathway activity, with metabolites serving as substrate, product, or allosteric regulator for reactions. This trans-omic network methodology was developed in previous studies cited by the authors [8,9]. However, as written, this study is limited in its contribution of new knowledge to the AD research field. The main conclusion (energy production is down in AD, due to regulatory disruption of energy metabolism) is not strongly supported (see comments 1, 3, and 4 for elaboration). The evidence could be improved by orthogonal approaches: further experimentation, further integration of external datasets, causal modeling, or flux modeling. Alternatively, even in the absence of new experimental and computational approaches, the story could be made more complete by further leveraging the trans-omic network to provide insights into (a) the regulation of energy metabolism; and (b) the impacts of key disrupted metabolites (see comments 7-9). The study is also limited in demonstrating the power of these methodologies to provide integrative insights.
As mentioned above, the integration of enzyme levels and metabolite levels is clearly useful (Figure 3). In contrast, the utility of the mRNA and transcription factor layers was not evident. The study did not appear to improve or expand upon trans-omic network methodology described in the previous works. Finally, the various analyses (analyzing the trans-omic network for nodes with the highest degree centrality, the PPI analysis, and viewing the energy metabolism pathways in the network) provided disparate results that were only tenuously connected in the discussion section.


      Response to Reviewer #2's Comments

      Evidence, reproducibility and clarity: Summary

      This manuscript integrates public transcriptomic, proteomic, and metabolomic datasets from ROSMAP DLPFC samples to construct a multi-layer metabolic trans-omic network in Alzheimer's disease. By linking transcription factors, enzyme mRNAs, proteins, metabolic reactions, and metabolites, the authors report coordinated downregulation of the TCA cycle, oxidative phosphorylation, and ketone body metabolism, along with mixed regulatory signals in glycolysis/gluconeogenesis. They interpret these patterns as indicative of broad energetic dysfunction and alterations in amino-acid/nitrogen metabolism in AD. While the framework is conceptually appealing, much of the analysis remains descriptive, and several biological interpretations extend beyond what the data can robustly support. The reliance on bulk tissue without accounting for cell-type composition, limited covariate adjustment, and the absence of validation or sensitivity analyses reduce confidence in the mechanistic conclusions. Overall, the study provides a preliminary systems-level overview, but additional rigor is needed before the proposed trans-omic regulatory insights can be considered convincing.

      Major Comments

      1. Interpretation requires more cautious phrasing, and validation is essential. The manuscript frequently asserts that specific pathways are "inhibited" or that energetic deficits are "compensated," but these conclusions extend beyond what the descriptive, bulk-level data can support. Because no metabolic flux, causality, or direct functional measurements are included, the results should be framed as putative regulatory shifts, not confirmed impairments. Critically, key claims about pathway inhibition would require flux modeling, perturbation analyses, or experimental validation to be convincing. Without such validation, the mechanistic interpretations remain speculative.

      We thank the reviewer for this crucial comment. We fully agree that, given the descriptive and bulk-level nature of our analysis, mechanistic interpretations must be made with caution. In the absence of direct metabolic flux measurements or experimental validation, our findings should be interpreted as putative regulatory shifts rather than confirmed functional impairments. Accordingly, we have revised the manuscript to temper mechanistic claims. We have replaced definitive statements with more speculative phrasing (e.g., "Our analysis revealed a putative coordinated downregulation ..." instead of "Our analysis revealed a coordinated downregulation ..." in Abstract section; "we demonstrate the systems-level view of the potential dysregulated energy production ..." instead of "we demonstrate the systems-level view of the dysregulated energy production ..." in pg. 10, lines 25-26).

      Although the authors acknowledge this in the limitations, bulk-level differences may primarily reflect altered proportions of neurons, astrocytes, microglia, and oligodendrocytes rather than true within-cell-type regulation. Incorporating a cell-type deconvolution or performing a sensitivity analysis would substantially improve interpretability. This issue also impacts the trans-omic network: if the molecules included originate from different cell types, the inferred regulatory relationships may not reflect true intracellular processes.

      We appreciate the reviewer's point that bulk-level differences can reflect altered proportions of different brain cell types, which would in turn affect the inferred trans-omic network. To assess changes in cell type proportions in the samples used in our study, we additionally used public single-cell transcriptomic datasets obtained from the DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, we confirmed that the proportion of inhibitory neurons was smaller, and the proportion of oligodendrocytes larger, in the AD group than in the CT group. In the proteomic data, we did not observe any statistically significant differences in cell type proportions between the two groups. In the metabolomic data, we found that the proportion of inhibitory neurons was smaller in the AD group than in the CT group (pg. 6, lines 8-11).
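      The response does not state which statistical test was used to compare cell type proportions. A minimal sketch, assuming per-subject cell counts from the single-cell data and a two-sided permutation test on the difference in group means (both assumptions, with hypothetical values), might look like this:

```python
import random
from statistics import mean


def proportions(counts):
    """Convert one subject's {cell_type: cell_count} into fractions of all cells."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Assumed test for illustration; the response does not name the one used.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-subject inhibitory-neuron fractions for the CT and AD groups.
ct = [0.20, 0.21, 0.22, 0.19, 0.20]
ad = [0.10, 0.11, 0.09, 0.12, 0.10]
print(permutation_pvalue(ct, ad))
```

With several cell types tested per omic dataset, the resulting p-values would also need a multiple-testing correction.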

      Differential analysis covariates. For the differential expression analyses, only gender and PMI were included as covariates. Additional variables, such as age at death, RIN, neuropathological measures, and comorbidities, can strongly influence molecular profiles and should be considered to ensure that the observed differences reflect AD-related biology rather than confounding pathological or technical factors.

      We appreciate the reviewer's comment regarding the covariates included in the differential analyses of our study. We did not include other variables, such as age at death and RIN, because these data were not available for each sample. Instead, we consulted the original research articles from which the proteomic and metabolomic datasets used in our study were derived. For the metabolomic dataset, the original article (Batra et al., 2023) reported that only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age, and that no metabolites were significantly associated with sex, BMI, or education. For the proteomic dataset, the original article included age at death, PMI, and sex as covariates, although these variables were not found to strongly influence the data (Extended Data Fig. 2 in Johnson et al., 2020).

      Network stability and sample non-overlap. The proteomic, transcriptomic, and metabolomic data come from partially overlapping sets of individuals. The authors should test whether the reconstructed network is robust to different significance thresholds, to restricting the analyses to overlapping samples, and to alternative definitions of AD vs. control.

      We appreciate the reviewer's comment on the stability of the trans-omic network. In our study, the number of individuals for whom all omic modalities were measured was relatively small (n=25 in CT and n=35 in AD). This limited overlap reduces statistical power and can affect the downstream network construction. We have acknowledged this limitation in the revised manuscript and clarified that the reconstructed networks should be interpreted with caution regarding reproducibility and generalizability (pg. 13, lines 13-23).
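      One way to quantify the threshold sensitivity the reviewer asks about is to rebuild the network at several significance cutoffs and compare the resulting edge sets with the Jaccard index. The sketch below assumes a hypothetical rule that keeps an edge only when both endpoints pass the cutoff; the p-values are made up for illustration.

```python
def significant(pvals, alpha):
    """Names of nodes whose p-value is below the cutoff."""
    return {name for name, p in pvals.items() if p < alpha}


def network_edges(edges, pvals, alpha):
    """Hypothetical rule: keep an edge only if both endpoints are significant."""
    keep = significant(pvals, alpha)
    return {(u, v) for u, v in edges if u in keep and v in keep}


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up p-values for illustration only.
pvals = {"CS": 0.001, "IDH3A": 0.02, "OGDH": 0.04, "SDHA": 0.2}
edges = [("CS", "IDH3A"), ("IDH3A", "OGDH"), ("OGDH", "SDHA")]
print(jaccard(network_edges(edges, pvals, 0.05), network_edges(edges, pvals, 0.03)))
```

The same comparison could be repeated with networks built only from the subjects measured in all three modalities.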

      Minor Comments

      1. Some TF enrichment and regulatory inferences lack explicit mention of multiple-testing correction.

      We apologize for the lack of clarity in our original description. We have applied multiple-testing correction to the TF inference, and we have revised the Methods section to describe explicitly the correction method used and the threshold applied (pg. 23, lines 23-24).
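      The correction method is not named in this response. Assuming the Benjamini-Hochberg false discovery rate procedure (a common choice, but not confirmed here), the adjustment of TF enrichment p-values can be sketched as follows:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order.

    Assumed procedure for illustration; the authors' method may differ.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

TFs whose adjusted value falls below the chosen threshold (for example 0.05) would then be retained.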

      The limitations section is strong but should explicitly discuss the influence of postmortem interval on metabolite levels.

      We appreciate the reviewer's comment about the effect of postmortem interval on metabolite levels. Accordingly, we have added a discussion of this point to the revised manuscript (pg. 13, lines 1-5).

      Reviewer #2 (Significance (Required)):

      Significance

      The study extends a trans-omic integration framework, originally applied to metabolic disease, into the context of Alzheimer's pathology. Although the biological findings largely confirm known alterations in mitochondrial and energy metabolism, the network-based approach offers a structured way to view cross-layer regulatory changes. Its main advance is conceptual rather than biological, providing a unified framework rather than uncovering fundamentally new mechanisms. This work will primarily interest researchers in neurodegeneration and systems biology, as well as computational groups developing multi-omics integration methods.

      Response to Reviewer #3's Comments


      Evidence, reproducibility and clarity

      This study leverages existing transcriptomic, metabolomic and proteomic datasets from the prefrontal cortex (PFC) to assess metabolic dysregulation in Alzheimer's disease (AD). They found a downregulation of multiple metabolic pathways, including the TCA cycle, oxidative phosphorylation, and ketone metabolism, that may explain bioenergetic alterations in AD. The study used matched ROSMAP omics datasets from the DLPFC, which allowed more robust data integration. However, the datasets were all generated from bulk tissue, which makes data interpretation difficult. For example, the AD changes they observed may be due to shifts in cell type proportions with disease (e.g. cell death, neuron inflammation). Did the authors account for any potential shifts in cell type proportion in their analysis?

      If the assumption is that the changes in AD are cell intrinsic, which cell types are likely to be impacted? Can the authors integrate any existing single-cell analysis to infer which cell types may be driving the signals they detect, and whether this accounts for some of the antagonistic regulatory effects that were detected?

      We thank the reviewer for their insightful comments. We agree that the use of bulk tissue datasets cannot account for cell-type heterogeneity. As noted in our Limitations section (pg. 12, lines 24-27), we recognize that previous studies have found that the Braak stage is correlated positively with microglia and astrocyte proportions and negatively with oligodendrocyte proportion (Hannon et al., 2024; Shireby et al., 2022). Regarding the integration of single-cell analysis, we have referenced recent snRNA-seq findings (Mathys et al., 2024) in our Limitations section (pg. 12, lines 28-32) to deconvolve our bulk signatures.

      Furthermore, in our revised manuscript, we additionally used public single-cell transcriptomic datasets obtained from the DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, we confirmed that the proportion of inhibitory neurons was smaller, and the proportion of oligodendrocytes larger, in the AD group than in the CT group. In the proteomic data, we did not observe any statistically significant differences in cell type proportions between the two groups. In the metabolomic data, we found that the proportion of inhibitory neurons was smaller in the AD group than in the CT group (pg. 6, lines 8-11).

      Significance

      The manuscript provides multimodal insight into metabolic dysregulation in AD in the PFC. Given that metabolic dysfunction is likely to play a major role in disease pathogenesis, this is a study of importance. However, the findings lack granularity at the cell type level, which limits the impact of the study.

      Reference

      1. Baloni, P., Arnold, M., Buitrago, L., Nho, K., Moreno, H., Huynh, K., Brauner, B., Louie, G., Kueider-Paisley, A., Suhre, K., Saykin, A. J., Ekroos, K., Meikle, P. J., Hood, L., Price, N. D., Alzheimer's Disease Metabolomics Consortium, Doraiswamy, P. M., Funk, C. C., Hernández, A. I., ... Kaddurah-Daouk, R. (2022). Multi-Omic analyses characterize the ceramide/sphingomyelin pathway as a therapeutic target in Alzheimer's disease. Communications Biology, 5(1), 1074.
      2. Baloni, P., Funk, C. C., Yan, J., Yurkovich, J. T., Kueider-Paisley, A., Nho, K., Heinken, A., Jia, W., Mahmoudiandehkordi, S., Louie, G., Saykin, A. J., Arnold, M., Kastenmüller, G., Griffiths, W. J., Thiele, I., Alzheimer's Disease Metabolomics Consortium, Kaddurah-Daouk, R., & Price, N. D. (2020). Metabolic Network Analysis Reveals Altered Bile Acid Synthesis and Metabolism in Alzheimer's Disease. Cell Reports. Medicine, 1(8), 100138.
      3. Batra, R., Arnold, M., Wörheide, M. A., Allen, M., Wang, X., Blach, C., Levey, A. I., Seyfried, N. T., Ertekin-Taner, N., Bennett, D. A., Kastenmüller, G., Kaddurah-Daouk, R. F., Krumsiek, J., & Alzheimer's Disease Metabolomics Consortium (ADMC). (2023). The landscape of metabolic brain alterations in Alzheimer's disease. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 19(3), 980-998.
      4. Batra, R., Krumsiek, J., Wang, X., Allen, M., Blach, C., Kastenmüller, G., Arnold, M., Ertekin-Taner, N., Kaddurah-Daouk, R., & Alzheimer's Disease Metabolomics Consortium (ADMC). (2024). Comparative brain metabolomics reveals shared and distinct metabolic alterations in Alzheimer's disease and progressive supranuclear palsy. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 20(12), 8294-8307.
      5. Cahill, K. M., Huo, Z., Tseng, G. C., Logan, R. W., & Seney, M. L. (2018). Improved identification of concordant and discordant gene expression signatures using an updated rank-rank hypergeometric overlap approach. Scientific Reports, 8(1), 9588.
      6. Fröhlich, A. S., Gerstner, N., Gagliardi, M., Ködel, M., Yusupov, N., Matosin, N., Czamara, D., Sauer, S., Roeh, S., Murek, V., Chatzinakos, C., Daskalakis, N. P., Knauer-Arloth, J., Ziller, M. J., & Binder, E. B. (2024). Single-nucleus transcriptomic profiling of human orbitofrontal cortex reveals convergent effects of aging and psychiatric disease. Nature Neuroscience, 27(10), 2021-2032.
      7. Green, G. S., Fujita, M., Yang, H.-S., Taga, M., Cain, A., McCabe, C., Comandante-Lou, N., White, C. C., Schmidtner, A. K., Zeng, L., Sigalov, A., Wang, Y., Regev, A., Klein, H.-U., Menon, V., Bennett, D. A., Habib, N., & De Jager, P. L. (2024). Cellular communities reveal trajectories of brain ageing and Alzheimer's disease. Nature, 633(8030), 634-645.
      8. Hannon, E., Dempster, E. L., Davies, J. P., Chioza, B., Blake, G. E. T., Burrage, J., Policicchio, S., Franklin, A., Walker, E. M., Bamford, R. A., Schalkwyk, L. C., & Mill, J. (2024). Quantifying the proportion of different cell types in the human cortex using DNA methylation profiles. BMC Biology, 22(1), 17.
      9. Johnson, E. C. B., Carter, E. K., Dammer, E. B., Duong, D. M., Gerasimov, E. S., Liu, Y., Liu, J., Betarbet, R., Ping, L., Yin, L., Serrano, G. E., Beach, T. G., Peng, J., De Jager, P. L., Haroutunian, V., Zhang, B., Gaiteri, C., Bennett, D. A., Gearing, M., ... Seyfried, N. T. (2022). Large-scale deep multi-layer analysis of Alzheimer's disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nature Neuroscience, 25(2), 213-225.
      10. Johnson, E. C. B., Dammer, E. B., Duong, D. M., Ping, L., Zhou, M., Yin, L., Higginbotham, L. A., Guajardo, A., White, B., Troncoso, J. C., Thambisetty, M., Montine, T. J., Lee, E. B., Trojanowski, J. Q., Beach, T. G., Reiman, E. M., Haroutunian, V., Wang, M., Schadt, E., ... Seyfried, N. T. (2020). Large-scale proteomic analysis of Alzheimer's disease brain and cerebrospinal fluid reveals early changes in energy metabolism associated with microglia and astrocyte activation. Nature Medicine, 26(5), 769-780.
      11. Maitra, M., Mitsuhashi, H., Rahimian, R., Chawla, A., Yang, J., Fiori, L. M., Davoli, M. A., Perlman, K., Aouabed, Z., Mash, D. C., Suderman, M., Mechawar, N., Turecki, G., & Nagy, C. (2023). Cell type specific transcriptomic differences in depression show similar patterns between males and females but implicate distinct cell types and genes. Nature Communications, 14(1), 2912.
      12. Mathys, H., Boix, C. A., Akay, L. A., Xia, Z., Davila-Velderrain, J., Ng, A. P., Jiang, X., Abdelhady, G., Galani, K., Mantero, J., Band, N., James, B. T., Babu, S., Galiana-Melendez, F., Louderback, K., Prokopenko, D., Tanzi, R. E., Bennett, D. A., Tsai, L.-H., & Kellis, M. (2024). Single-cell multiregion dissection of Alzheimer's disease. Nature, 632(8026), 858-868.
      13. Novotny, B. C., Fernandez, M. V., Wang, C., Budde, J. P., Bergmann, K., Eteleeb, A. M., Bradley, J., Webster, C., Ebl, C., Norton, J., Gentsch, J., Dube, U., Wang, F., Morris, J. C., Bateman, R. J., Perrin, R. J., McDade, E., Xiong, C., Chhatwal, J., ... Harari, O. (2023). Metabolomic and lipidomic signatures in autosomal dominant and late-onset Alzheimer's disease brains. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 19(5), 1785-1799.
      14. Plaisier, S. B., Taschereau, R., Wong, J. A., & Graeber, T. G. (2010). Rank-rank hypergeometric overlap: identification of statistically significant overlap between gene-expression signatures. Nucleic Acids Research, 38(17), e169.
      15. Shireby, G., Dempster, E. L., Policicchio, S., Smith, R. G., Pishva, E., Chioza, B., Davies, J. P., Burrage, J., Lunnon, K., Seiler Vellame, D., Love, S., Thomas, A., Brookes, K., Morgan, K., Francis, P., Hannon, E., & Mill, J. (2022). DNA methylation signatures of Alzheimer's disease neuropathology in the cortex are primarily driven by variation in non-neuronal cell-types. Nature Communications, 13(1), 5620.
      16. Tasaki, S., Xu, J., Avey, D. R., Johnson, L., Petyuk, V. A., Dawe, R. J., Bennett, D. A., Wang, Y., & Gaiteri, C. (2022). Inferring protein expression changes from mRNA in Alzheimer's dementia using deep neural networks. Nature Communications, 13(1), 655.
      17. Varma, V. R., Wang, Y., An, Y., Varma, S., Bilgel, M., Doshi, J., Legido-Quigley, C., Delgado, J. C., Oommen, A. M., Roberts, J. A., Wong, D. F., Davatzikos, C., Resnick, S. M., Troncoso, J. C., Pletnikova, O., O'Brien, R., Hak, E., Baak, B. N., Pfeiffer, R., ... Thambisetty, M. (2021). Bile acid synthesis, modulation, and dementia: A metabolomic, transcriptomic, and pharmacoepidemiologic study. PLoS Medicine, 18(5), e1003615.
      18. Wan, Y.-W., Al-Ouran, R., Mangleburg, C. G., Perumal, T. M., Lee, T. V., Allison, K., Swarup, V., Funk, C. C., Gaiteri, C., Allen, M., Wang, M., Neuner, S. M., Kaczorowski, C. C., Philip, V. M., Howell, G. R., Martini-Stoica, H., Zheng, H., Mei, H., Zhong, X., ... Logsdon, B. A. (2020). Meta-Analysis of the Alzheimer's Disease Human Brain Transcriptome and Functional Dissection in Mouse Models. Cell Reports, 32(2), 107908.
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost.

      They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.
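      The mutation-selection factorization summarized above can be illustrated at a single site. This is a minimal sketch, assuming that the probability of substituting to amino acid a equals the neutral (mutation-model) probability multiplied by a selection factor, with the remaining probability mass assigned to the parent amino acid. The input values are hypothetical and this is not the authors' DASM code.

```python
import math


def site_child_probs(neutral, selection, parent_aa):
    """Per-site distribution over child amino acids (assumed factorization).

    neutral: {aa: probability of substituting to aa under the mutation model}
    selection: {aa: nonnegative selection factor}; missing entries default to 1.0
    """
    probs = {aa: p * selection.get(aa, 1.0)
             for aa, p in neutral.items() if aa != parent_aa}
    total = sum(probs.values())
    if total > 1.0:
        # Rescale so the substitution probabilities never exceed 1 in total.
        probs = {aa: q / total for aa, q in probs.items()}
        total = 1.0
    probs[parent_aa] = 1.0 - total
    return probs


def pair_log_likelihood(site_probs, child_seq):
    """Log-likelihood of an observed child sequence given per-site distributions.

    math.log raises ValueError if a child amino acid has probability zero.
    """
    return sum(math.log(probs[aa]) for probs, aa in zip(site_probs, child_seq))


# Hypothetical single site whose parent amino acid is A.
neutral = {"S": 0.05, "T": 0.03, "G": 0.01}
selection = {"S": 2.0, "T": 0.5, "G": 0.0}
probs = site_child_probs(neutral, selection, "A")
print(probs)
print(pair_log_likelihood([probs], "S"))
```

Because the neutral term is supplied by a fixed mutation model, only the selection factors need to be learned, which is what lets the selection model ignore codon-table and hypermutation biases.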

      Strengths:

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Thank you for your kind words.

      Weaknesses:

      Some claims made in the paper are weakly or indirectly supported by the data. In particular, the claim that learning the codon table contributes to biased functional effect predictions may be true, but requires more justification.

      Thank you for this comment, which made us realize that we had not adequately explained the key insight of Figure S3. We have expanded the caption of Figure S3 to clarify:

      “DASM selection factors match the pattern seen in experimental measurements, while masked language models show artifacts from the codon table.

      The experimental data (left two panels) show a slight decrease in median scores for amino acids requiring multiple nucleotide mutations (“multiple”) versus single mutations (“single”).

      DASM captures this pattern, showing similar distributions for both categories.

      In contrast, AbLang and ESM assign radically lower scores to multinucleotide amino acid substitutions, consistent with the masked language modeling objective learning codon-level mutation probabilities as described in the main text (Figure 1a).”

      This figure directly supports our claim: the experimental fitness data show similar distributions for single-mutation vs multiple-mutation amino acids, yet AbLang2 and ESM assign dramatically different scores to these groups, while DASM does not.
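      To see how the codon table separates the "single" and "multiple" categories, the sketch below classifies a target amino acid by whether any single-nucleotide change to a given parent codon encodes it. It assumes the standard genetic code and classification relative to the parent codon; how the Figure S3 categories were actually computed is not described in this response.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_nt_amino_acids(codon):
    """Amino acids (excluding stop and the parent's own) one nucleotide change away."""
    parent_aa = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if aa not in ("*", parent_aa):
                reachable.add(aa)
    return reachable


def substitution_class(codon, target_aa):
    """'single' if one nucleotide change reaches target_aa, else 'multiple'."""
    if target_aa == CODON_TABLE[codon]:
        raise ValueError("target_aa must differ from the parent amino acid")
    return "single" if target_aa in single_nt_amino_acids(codon) else "multiple"


print(substitution_class("TGG", "F"))
```

A masked language model trained on observed sequences can absorb exactly this distinction, because "multiple" substitutions are rarer in natural data regardless of their effect on fitness.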

      Additionally, the paper could benefit from additional benchmarking and comparison to enhanced versions of existing methods, such as AbLang plus a multi-hit correction.

      It's an interesting idea to consider enhancing existing models. However, this approach faces some challenges. Most fundamentally, it is difficult to recast AbLang and other such models in an evolutionary framework: the masked language objective is simply not an evolutionary one. We have written a whole paper working to do this (https://doi.org/10.1371/journal.pcbi.1013758), and the results were middling despite our best efforts. Regarding multihit specifically, its effects are minor compared to the codon table effects, and those require the structure of a codon-based evolutionary model.

      Further descriptions of model components and validation metrics could help make the manuscript more readable.

      We have clarified several aspects of the model in the revision: we now describe the Thrifty neutral model in the introduction, clarify the transformer architecture and wiggle activation function in the Methods, and explain the joint branch-length optimization procedure.

      In the introduction we now describe Thrifty:

      “This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F 5-mer model.”

      In the Methods we clarify the architecture:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component of this architecture is a custom “wiggle” activation function applied to the output layer that prevents extreme selection factors, as previously described.

      This function asymptotes to zero for highly deleterious mutations and grows sub-linearly for beneficial ones.”
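      The exact form of the wiggle function is given in the authors' earlier work and is not reproduced in this response. Purely to illustrate the two stated properties, here is a hypothetical function (not the authors' definition) that tends to zero for very negative inputs and grows only logarithmically for positive ones:

```python
import math


def wiggle_like(x):
    """Hypothetical output activation, NOT the authors' wiggle function.

    x is a raw (log-scale) network output.
    For x <= 0 it returns exp(x), which tends to 0 as x -> -infinity.
    For x > 0 it returns 1 + log(1 + x), which grows sub-linearly.
    """
    if x <= 0:
        return math.exp(x)
    return 1.0 + math.log1p(x)


for x in (-20.0, -1.0, 0.0, 1.0, 100.0):
    print(x, wiggle_like(x))
```

The two pieces meet at x = 0 with value 1 and slope 1, so the function is continuous with a continuous first derivative, and a large positive output maps to a modest selection factor.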

      And the joint optimization:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”
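      The branch-length step of this cyclic optimization can be sketched for a single parent-child pair. This toy version assumes fixed per-site neutral rates, a per-site substitution probability of 1 - exp(-rate * t), and a golden-section search over t. The authors' actual likelihood also involves the selection model, which this sketch omits. Because each pair's branch length is optimized independently, the per-pair calls can run in parallel.

```python
import math


def branch_log_likelihood(t, rates, mutated):
    """Toy log-likelihood of one parent-child pair at branch length t.

    rates: fixed per-site neutral rates; mutated: True where child differs from parent.
    Each site is assumed to change with probability 1 - exp(-rate * t).
    """
    ll = 0.0
    for rate, changed in zip(rates, mutated):
        if changed:
            ll += math.log(1.0 - math.exp(-rate * t))
        else:
            ll += -rate * t
    return ll


def optimize_branch_length(rates, mutated, lo=1e-6, hi=5.0, tol=1e-8):
    """Golden-section search for the maximizing t (the toy likelihood is concave)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if branch_log_likelihood(c, rates, mutated) > branch_log_likelihood(d, rates, mutated):
            b, d = d, c  # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0


# Hypothetical pair: 10 sites with equal rates, 3 of which changed.
rates = [1.0] * 10
mutated = [True] * 3 + [False] * 7
t_hat = optimize_branch_length(rates, mutated)
print(t_hat, -math.log(1 - 3 / 10))  # numerical vs. closed-form estimate

# Pairs are independent, so this loop could be distributed across workers.
pairs = [(rates, mutated), ([0.5] * 8, [True] + [False] * 7)]
branch_lengths = [optimize_branch_length(r, m) for r, m in pairs]
```

With equal rates the maximum-likelihood branch length has the closed form -ln(1 - m/n) for m changed sites out of n, which gives a simple check on the search.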

      Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with the ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve breakthrough success, which large language models have achieved for the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows for disentangling these processes through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Thank you for your kind words.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument.

      This is an interesting idea! However, it seems to us that this approach has some fundamental limitations. Existing models operate on amino acid sequences with no nucleotide representation, so while they can be implicitly biased by the codon table, they have no signal to separate selection from effects related to the codon table and SHM rates.

      We interpret this comment as proposing that we could use fine-tuning on functional data to pull out the selection components (that would only affect the functional data) versus the mutation component. That sounds like an interesting research project. We would be concerned that there are correlations between mutability and selective effects (e.g., CDRs are both more mutable and under different selection), creating identifiability problems unless separate data sources are used as we do here.

      Additionally, the fine-tuning approaches we are aware of are task-specific: they require labeled data from a specific assay (binding to antigen X, expression in system Y) that may or may not relate to the general evolutionary selection signal. Also, such approaches are limited to the specific data used and may not do a good job of guiding the model to a signal that is not present in the training data.

      By structuring the model as we do, we obtain the evolutionary interpretation directly from phylogenetic signal without requiring task-specific supervision.

      In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average, but not those which improve binding to specific epitopes.

      We agree, and this is fundamental to any general-purpose model. Predicting binding patterns for a specific target requires information about that target to be included in the training data. We look forward to developing such task-specific models in the future.

      We have added a paragraph to the Discussion clarifying this limitation:

      “The current generation of DASM models does not use any antigen-labeled training data.

      The signal that it leverages to infer some limited ability to predict binding comes from natural affinity maturation.

      This affinity maturation comes through natural repertoires and so represents a mix of all of the antigens to which the sampled individuals have been exposed.”

      Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.

      Thank you.

      Weaknesses:

      The gain in interpretability is only mentioned but not really elaborated upon or leveraged for gaining insight.

      We are also excited about the ability of these models to provide interpretable predictions. We have dedicated an entire paper to this direction: “A Sitewise Model of Natural Selection on Individual Antibodies via a Transformer-Encoder" in MBE (https://doi.org/10.1093/molbev/msaf186). The interpretations offered by that paper overturn some of the oversimplified dogma about how natural selection works in antibodies (purifying in FWR and diversifying in CDR), giving a more nuanced sitewise perspective. The paper also highlights the importance of specific structural features of the antibodies.

      This eLife paper, on the other hand, is focused on comparison to antibody language models and benchmarking zero-shot prediction on functional tasks.

      We have better highlighted this new paper in our revision with:

      “We have dedicated a companion paper to leveraging this interpretability to provide new perspectives on the operating rules of affinity maturation (Matsen et al., MBE 2025): that work provides a nuanced sitewise perspective on natural selection in antibodies that challenges classical oversimplified views of selection patterns.”

      The following aspects could have been better documented: the hyperparametric search to establish the optimal model; the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.

      We appreciate the concern and the desire to reveal all the factors that lead to a strong performance result. For this particular paper, we feel that this is less of a concern because we optimize according to an evolutionary objective function and then evaluate according to a functional one. We now state that, other than model size, hyperparameters were the same as in our previous paper (Matsen et al., MBE 2025).

      Regarding baseline approaches, our previous paper includes comparisons to simpler models for the evolutionary objective. Here we focus on comparison to antibody language models for functional prediction. Comparing between state-of-the-art models is the standard practice for papers in this field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      We recommend modest amounts of revision, discussed below:

      Major comments:

      (1) In the first section of the results, there is extensive discussion of shortcomings of existing antibody language models such as AbLang2 that seems to attribute the entire performance gap to their inability to distinguish non-synonymous mutations requiring one nucleotide substitution from those requiring two or more.

      In reality, some of the lower likelihoods in the 2+ substitution case could actually reflect real fitness deficits (while others could indeed be rarer occurrences in the training data). The authors should either moderate these claims or do an analysis that leverages antibody deep mutational scanning data to show that, conditioned on the fitness of the antibody (probably expression) being the same (either all high or all low), AbLang2 still artefactually considers rarer-training/less-codon-accessible variants to be less fit.

      As described above, we believe that this is addressed by Figure S3, but if not please correct us.

      (2) Some in the machine learning for antibody community might view the set of benchmarked datasets to be incomplete and somewhat arbitrarily selected, though we do think this is a good start, and the results are promising. A dataset commonly used in this field that is missing from this paper is from Shehata et al. (https://pubmed.ncbi.nlm.nih.gov/31553901/). A binding affinity experiment that is also commonly used in the field is from Phillips et al. (https://elifesciences.org/articles/71393) - this dataset measures combinatorial changes of framework regions on binding, which may be especially relevant here.

      We're glad to have the opportunity to clarify this, thanks.

      We based our evaluations on the April 2024 version of the FLAb benchmarking project (https://doi.org/10.1101/2024.01.13.575504) which preceded our work and thus was not subject to selection bias by us. We took the largest data sets in that repository. After this we became aware of the rich data sets offered by the Whitehead lab that provided binding measurements for many variants for a number of antigens, and added that to the evaluation set.

      We have clarified this in the manuscript:

      “We based our evaluations on the April 2024 version of the FLAb benchmarking project, which preceded our work and thus was not subject to selection bias by us.

      We also benchmarked high-throughput binding data (more recent than FLAb) from the Whitehead lab that provided affinity measurements across many variants and antigens.”

      The Shehata dataset is interesting but doesn't fit so much in the DASM mold: it is a survey of biophysical properties across many independent antibodies rather than a deep investigation of point mutants of a smaller collection of focal antibodies.

      FLAb has grown to include the Phillips dataset. We are working full-tilt on the next version of DASM and will be including many other datasets in our paper on DASM2. Thanks for the tip!

      (3) Similar to the above comment, we were also extremely curious as to why the authors did not test data from DeWitt et al. (https://pubmed.ncbi.nlm.nih.gov/40661619/). Instead, the authors only make a cryptic reference to this study on lines 201-6, but we could not even find a figure describing the results discussed on these lines. It would be great to actually include this data.

      We agree; however, our model is trained on human rather than mouse data. We would like to train a mouse model in the future but have not yet lined up the appropriate data.

      (4) The authors should comment on potential data leakage if the SHM trajectories used in training have a similar sequence or antigen similarity to the benchmark expression/binding datasets.

      This is a good question that we should clarify. Our model is trained only on evolutionary trajectories and not on functional data, and evaluation is then done on functional data without fine-tuning. These evaluation data are categorically different from the training data, and thus data leakage is not a problem. Recall that our model is zero-shot: it only considers evolutionary trajectories and not functional data as such. Similarly, other self-supervised models such as MLMs are not prevented from having seen an antibody in their training data when they are used for functional prediction.

      We have clarified this in the manuscript with

      “Because the DASM is trained exclusively on evolutionary trajectories rather than functional measurements, evaluation on expression and binding benchmarks is strictly zero-shot with no risk of data leakage.”

      Relatedly, what happens if this approach is applied to completely de novo antibodies?

      We direct this reviewer to the Shanehsazzadeh dataset that involves antibodies that were suggested by an AI algorithm rather than observed in nature.

      If the reviewer is referring to completely synthetic antibody molecules, such as those generated by inverse folding, we have not attempted this.

      (5) It makes sense that you included the multihit correction as a response to your earlier instantiation (without this correction) underestimating the probabilities of multiple mutations in a codon associated with a single amino acid substitution (lines 476-477).

      However, this could potentially make for a somewhat unfair comparison to existing methods: if, say, we took AbLang (or another comparator) and also applied a multi-hit correction (even in some naive way at inference time), how would that compare to DASM? If this comparison favors DASM, it would show that models need more than just such a correction on top of existing methods to do good sequence scoring--which would only amplify the impact of the results.

      Thank you for this suggestion. We believe that we have addressed it in the response to the public reviews, but please let us know if not.

      Minor comments:

      (1) It would be worth explicitly defining/summarizing the mutation model used in the study, e.g. giving an overview of Thrifty in the introduction or where it first appears.

      Thanks, we have done this:

      “Our approach separates mutation and selection processes by encoding functional effects in a Deep Amino acid Selection Model (DASM) while explicitly modeling mutation using a separate fixed model trained on neutrally evolving data.

      This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F (Yaari et al., 2013) 5-mer model.”

      (2) Paragraph starting on line 58: it sounds like you're suggesting that masked deep learning models will learn certain features of genomes in a certain order. We suggest that you weaken the language, giving examples of various things the model could learn, not implying that such models will necessarily learn the most useful features after the less useful ones.

      We have fixed this by removing the "First... Second... Third... Finally" ordering:

      “It could memorize the germline genes and learn about the probabilities of V(D)J recombination.

      It could learn the codon table, since under this table some amino-acid mutations are much more likely than others. It could learn rates of somatic hypermutation...

      It could also learn about the impact of amino acid mutations on antibody function through natural selection in the course of affinity maturation, which is the desired signal.

      However, this desired signal is confounded by the preceding factors.”
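      To make the codon-table point concrete, the sketch below (our own illustration, not code from the DASM project) builds the standard genetic code and asks which amino acids are reachable from a given codon by a single nucleotide substitution. The function names `one_step_neighbours` and `min_substitutions` are hypothetical. From the glycine codon GGT, aspartate is one substitution away, while tryptophan (encoded only by TGG) needs two: this is the kind of asymmetry a masked model can absorb without learning anything about antibody function.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def one_step_neighbours(codon):
    """Amino acids reachable by one substitution, excluding synonymous changes and stops."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa not in ("*", CODON_TABLE[codon]):
                reachable.add(aa)
    return reachable


def min_substitutions(codon, target_aa):
    """Fewest nucleotide changes needed to turn `codon` into any codon for `target_aa`."""
    return min(
        sum(a != b for a, b in zip(codon, other))
        for other, aa in CODON_TABLE.items()
        if aa == target_aa
    )


print(sorted(one_step_neighbours("GGT")))
print(min_substitutions("GGT", "D"), min_substitutions("GGT", "W"))
```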

      (3) Line 72: You make a strong claim that existing models conflate mutation and selection without knowing for sure that they didn't successfully learn these components separately (it seems this would require a lot of mechanistic interpretability). The language could be softened here.

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (4) Line 79: Say a bit more about the separate fixed mutation model here. Why shouldn't we worry about this choice (especially the word "fixed") biasing your results? Does the empirical performance of your method suggest this doesn't really matter?

      We have added to the description of the fixed mutation model, as described above.

      As described in the public response, training SHM models on out-of-frame sequences is an established methodology for characterizing mutation in the absence of selection. In principle one could jointly train a model of SHM and selection, but one could run into identifiability problems, as there is a correlation between more mutable sites (e.g. in the CDRs) and those under relaxed selection. Using out-of-frame sequences gives a clean and independent description of the SHM process.

      (5) Line 81: on what benchmarks does it outperform? State briefly.

      Great suggestion. Done:

      “The DASM, trained on substantially less data, outperforms AbLang2 and general protein language models including ESM2 and ProGen2-small. This outperformance holds on the largest benchmark datasets of the FLAb collection and on recent high-throughput binding assays.”

      (6) Paragraph starting on line 90: The topic sentence reads a bit vague to us. Do you mean that you want to learn the extent to which models are regurgitating nucleotide similarity of AAs in determining the scores associated with AAs at masked sites?

      Thank you. We have updated to

      "We first sought to understand the extent to which processes such as neutral mutation rate and the codon table influence antibody language model prediction at masked sites."

      (7) Paragraph starting on line 108: feels speculative and maybe better for the discussion...

      We appreciate this comment, but we have decided to keep the content where it is. Although this would make sense as a Discussion item we feel like it fits well here right next to the evidence, and the structure of our Discussion doesn't really have a place for it.

      (8) Paragraph starting on line 116: don't say "sequences from [12]" or "method of [15]." Explain what these are before giving the citation.

      Whoops! Thanks. We have fixed these.

      (9) Line 134: Consider giving a brief definition of perplexity?

      Thanks. We added our favorite definition:

      “Perplexity (as defined in the Methods) is the standard way of evaluating the plausibility of a sequence according to a model: it is the across-site geometric mean of the inverse probability of the observed amino acid.”
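      A minimal sketch of that definition, assuming the model has already produced the probability of the observed amino acid at each site (the function name and input list are ours): a perplexity of 1 means the model was certain of every observed residue, while a uniform guess over the 20 amino acids gives a perplexity of 20.

```python
import math


def perplexity(site_probs):
    """Across-site geometric mean of 1/p, where site_probs[i] is the model's
    probability of the observed amino acid at site i."""
    mean_log_p = sum(math.log(p) for p in site_probs) / len(site_probs)
    return math.exp(-mean_log_p)


print(perplexity([0.5, 0.5, 0.5]))  # probability 1/2 at every site
print(perplexity([0.25, 1.0]))      # geometric mean of 4 and 1
```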

      (10) Line 154: A citation here could be useful to support the claim that these models are learning phylogeny.

      We have replaced with the more clearly established "codon table":

      “We implemented a model to learn amino-acid preferences of antibodies without being influenced by germline genes, the codon table, or SHM biases.”

      (11) Lines 161-162: Given that phylogenetic inference methods can be tough to scale, how did you manage to get 2 million PCPs from the data? Did you construct a bunch of different phylogenies (in parallel)?

      Indeed! We now clarify in the methods section that these trees were run in parallel across clonal families:

      “As in our previous work, tree inference and ancestral sequence reconstruction were performed per clonal family with the K80 substitution model...

      Because these clonal families are independent, these phylogenetic inferences were run in parallel.”
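      For readers less familiar with K80: besides branch length, it has a single transition/transversion rate ratio κ. The textbook single-site transition probabilities can be written as below. This is a generic sketch (the function name and the parameterization by expected substitutions per site d are our choices), not the phylogenetics software used for the paper's tree inference. Setting κ = 1 recovers the Jukes-Cantor model.

```python
import math


def k80_probs(d, kappa):
    """K80 (Kimura 1980) probabilities for one site after branch length d
    (expected substitutions per site), with transition/transversion ratio kappa.
    Returns (P_same, P_transition, P_each_transversion)."""
    e_tv = math.exp(-4.0 * d / (kappa + 2.0))
    e_ts = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e_tv + 0.5 * e_ts
    p_transition = 0.25 + 0.25 * e_tv - 0.5 * e_ts
    p_transversion = 0.25 - 0.25 * e_tv  # each of the two transversion targets
    return p_same, p_transition, p_transversion


same, ts, tv = k80_probs(0.1, kappa=2.0)
print(f"same={same:.4f} transition={ts:.4f} each transversion={tv:.4f}")
```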

      (12) Line 173-174: Can you say more about the joint optimization of the branch lengths? Are you conditioning on a phylogenetic tree topology only, and leaving the branch lengths unknown? Do you account for the fact that these branch lengths in the same phylogenetic tree aren't independent?

      Thanks for pointing out the need to clarify these points. We have done so in the methods section and provided a pointer to the methods section in the main text.

      In the main text we now say:

      “We trained DASMs of several sizes (~1M, ~4M, ~7M) using joint optimization of branch length t and parameters of the DASM (see Methods for details).”

      And in the Methods:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”
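      To illustrate the structure of this cyclic scheme (not the DASM itself), here is a toy version in which a single scalar s stands in for the neural network and a Poisson model of substitution counts stands in for the DASM likelihood; all names and the model are our own assumptions. Each cycle first updates s with branch lengths fixed, then updates every branch length with s fixed. Because each branch-length update uses only its own parent-child pair, that step is embarrassingly parallel. The "neutral" sites play the role of the fixed mutation model: without them, s and the branch lengths would be confounded in this toy.

```python
def fit_cyclic(counts, n_a, n_b, n_cycles=100):
    """Toy cyclic joint optimization (hypothetical model, for illustration).

    counts: list of (k_a, k_b) substitution counts per parent-child pair, where
    k_a falls on n_a neutral sites (rate 1 per unit branch length) and k_b on
    n_b sites whose rate is scaled by a shared selection factor s.
    Returns maximum-likelihood estimates (s, branch_lengths) under a Poisson model.
    """
    t = [0.1] * len(counts)  # initial branch lengths
    s = 1.0                  # initial shared parameter (stand-in for the network)
    total_k_b = sum(k_b for _, k_b in counts)
    for _ in range(n_cycles):
        # Step 1: optimize the shared parameter with all branch lengths fixed.
        s = total_k_b / (n_b * sum(t))
        # Step 2: optimize each branch length with s fixed. Each update uses only
        # its own pair's counts, so these could run in parallel.
        t = [(k_a + k_b) / (n_a + n_b * s) for k_a, k_b in counts]
    return s, t


# Synthetic check: counts set to their expected values under s = 0.5.
true_s, true_t = 0.5, [0.05, 0.1, 0.2]
n_a = n_b = 100
counts = [(n_a * t, n_b * true_s * t) for t in true_t]
s_hat, t_hat = fit_cyclic(counts, n_a, n_b)
print(round(s_hat, 6), [round(t, 6) for t in t_hat])
```

      In this toy the error in s shrinks by a constant factor each cycle, so the alternation converges quickly; the real DASM step is a gradient-based update of network weights rather than a closed form.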

      (13) Line 358: Yes, in a trivial sense, separating mutation and selection means that we know exactly how each of those two components has been learned. We would be curious if you could say anything about mechanistic interpretability within the deep learning selection model. If not, could this be a future research direction?

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (14) Lines 384-386--indeed. Do you have any proposals for how a phylogeny could be constructed at this scale?

      As above this is not one big phylogeny but many, which invites parallelization.

      Reviewer #2 (Recommendations for the authors):

      (1) I agree that a full study of fine-tuning strategies for all possible alternative models is beyond the scope of the paper. However, a little bit of fine-tuning would go a long way to demonstrate how easy (or hard) it is to extract the relevant signal from a general protein language model embedding.

      As described in our response to the public reviews, we appreciate this point but have decided to focus on the core novelty of the paper and leave fine-tuning experiments to future work.

      (2) The authors might want to add some discussion about what signals their models capture with regard to binding affinity (averages), and how this limitation might be addressed in future work.

      As described in our response to the public reviews, we have added a paragraph to the Discussion clarifying this limitation.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I think more references have to be provided re: Antibody "foundation" language models, e.g. adding AntiBERTy and the two versions of AntiBERTa.

      We have added citations to those two models, although we were not sure what the second version of AntiBERTa was. There are very many antibody language models. If we could use numeric citation ranges we would cite a dozen or more, but we are reluctant to add many of them in the eLife format, which uses parenthetical citations. If there are others that you consider essential, please don't hesitate to suggest them.

      (2) A key point of the approach is the disentanglement of “mutation” and “selection”, as mentioned in the introduction. However, the explanation of what the authors mean by mutation and selection comes only later. I would anticipate it in the introduction for clarity.

      This is a great point. The revised intro has this in the second sentence:

      “Natural antibodies are generated through V(D)J recombination, and refined by somatic hypermutation and affinity-based selection in germinal centers.”

      and the "While the masked..." paragraph now more clearly calls out selection.

      (3) Line 133: expression of what? Could the authors also explain mechanistically why expression should be impacted by a mutation? In what conditions do these data sample expression?

      We have clarified that it is expression in a phage display library:

      “To do so, we used the largest dataset of the FLAb collection of benchmarks, which measures the effect of single mutations on expression in a phage display library.”

      (4) Line 142: Clarify that 0.49 and 0.3 are correlation coefficients. Also, what type of correlation coefficient is this?

      Thanks for the catch! They are Pearson correlations as we now describe.

      (5) Line 173: The hyperparametric search should have been more documented (with a description of how it was carried out and plots).

      As described in our response to the public reviews, we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. Other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      (6) Line 358: The authors say that 'DASMs provide direct interpretability'. However, this is not really inspected. A valuable addition would be to show how such interpretability is made possible, how it can recapitulate existing biological knowledge or provide hints for antibody engineering.

      As described above, this is addressed in detail in our previous paper.

      (7) Line 398: 'Inferred insertions or deletions were reversed, so that all sequences align to the naive sequence without gaps.' Could the authors comment on whether this is a limitation of the approach, why it wasn't dealt with and whether it could be the direction of future work?

      Funny you should mention this! We have been planning out such an extension in detail recently. We have added a sentence in the discussion:

      “We also have plans to extend the DASM framework to estimate the effect of natural selection on insertion and deletion events.”

      (8) Line 430-431: Could the authors clarify 'shared' over what? Also, I believe these two lines really describe the DASM architecture. This should be spelt out more clearly and tied to the description provided in lines 173-175. A diagram of the architecture would be a valuable addition to provide a full picture of the model (this could be added to the general diagram of the modelling approach of Figure S8).

      We have clarified in the text that this is indeed a description of the DASM architecture -- thanks for the catch:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component of this architecture is a custom “wiggle” activation function, applied to the output layer, that prevents extreme selection factors, as previously described.”

      The architecture is very “stock” (just the default torch TransformerEncoder), so we don't think that it merits a diagram. We have expanded our discussion of the simple architecture in the revision. This stands in contrast to the setup for the loss function, which is quite custom and is the subject of Figure 2 and Figure S8.

      (9) Another general remark is that, to fully showcase the predictive advantage offered by DASM with all the modelling choices entailed, one could show the performance of simpler models, like the mutation model alone (with no selection factors), or models where selection factors are just learnt independently for each site, or are learnt with a simple linear layer instead of a transformer (these are just ideas of some simpler approach that can set baselines over which DASM improvement can be shown).

      This is a great suggestion. The primary focus of this paper is in comparing to alternate antibody language models in terms of functional prediction.

      These simpler models could be used for comparing the evolutionary objective, which we did in our previous paper (https://doi.org/10.1093/molbev/msaf186). We note that a sitewise model with fixed sites cannot really be appropriately formulated due to sequences being of different lengths.

      Additional changes

      In addition to the reviewer-requested changes, we added a comparison of ESM2 model sizes (650M vs 3B parameters) on the Koenig benchmark. We found that scaling ESM2 from 650M to 3B parameters did not improve performance. Indeed, the larger model showed slightly degraded correlations, particularly for light chain predictions. This is consistent with recent observations that medium-sized protein language models can outperform larger ones on transfer learning tasks (Vieira et al., Sci. Rep. 2025). We added Table S2 documenting these results and cite this finding in the main text to justify our use of the 650M model throughout the analyses. In doing so, we realized that for the Shanehsazzadeh evaluation we had accidentally used ESM2-3B instead of ESM2-650M. The corrected ESM2-650M values are slightly lower (0.191 and 0.308 for sequence lengths 119 and 120, respectively, compared to the previous values of 0.248 and 0.337). This correction does not affect our conclusions, as DASM substantially outperforms ESM2 on this benchmark both before and after the change.

      We also realized in the course of revision that we had been scoring AbLang2 using the masked-marginals pseudo-perplexity approach for the single-mutant Koenig dataset (Figure 1c), rather than the standard per-sequence pseudo-perplexity used elsewhere in the paper. For masked-marginals, probabilities are computed using only wild-type context, whereas standard pseudo-perplexity uses each variant's own context.

      The masked-marginals approach has a simple interpretation: for single-mutation variants, it is a linear transformation of the log ratio of the variant amino acid probability to the wild-type amino acid probability, both evaluated under wild-type context. This log-odds ratio directly measures how much the model prefers the mutation over the original residue.

      We found that masked-marginals performed better for AbLang2 on this dataset, so we continued using it for Figure 1c. However, for the benchmarking table (Table 1), we switched to per-sequence pseudo-perplexity as for the other comparisons in the paper, following the standard benchmarking protocol defined in FLAb (Chungyoun et al., 2024). We document both approaches in the Methods section:

      “An alternative “masked-marginals” approach scores variants using only wild-type context.

      For a wild-type sequence w, masked-marginals computes $p(a \mid w_{\setminus i})$ for all amino acids a at each position i once, then uses these wild-type-derived probabilities to compute pseudo-perplexity for any variant x...

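      For a single-mutation variant x that differs from wild-type w only at position j, all terms except position j cancel when comparing to wild-type, giving $\log p(x_j \mid w_{\setminus j}) - \log p(w_j \mid w_{\setminus j}) = -n \log \mathrm{PPL}_{\mathrm{MM}}(x) + n \log \mathrm{PPL}_{\mathrm{MM}}(w)$, where $\mathrm{PPL}_{\mathrm{MM}}$ denotes the masked-marginals pseudo-perplexity. Thus, the log-probability difference between variant and wild-type amino acids equals, up to an additive constant that depends only on the wild-type sequence, negative n times the log pseudo-perplexity of the variant.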

      For Figure 1c on the single-mutant Koenig dataset, we found that this approach gave a higher correlation for AbLang2 and so used it in that figure.

      For benchmarking comparisons (Table 1), we followed standard practice and used per-sequence pseudo-perplexity.”
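      The single-mutant identity in the quoted Methods text can be checked numerically. The sketch below uses a tiny hypothetical `toy_conditional` function in place of AbLang2 (over a three-letter alphabet, it simply favours whichever residue sits at the neighbouring position) and computes both scoring schemes. It is illustrative only: it does not call AbLang2 or reproduce the paper's scoring code.

```python
import math

ALPHABET = "ACD"


def toy_conditional(seq, i):
    """Hypothetical stand-in for a masked language model: returns p(a | seq with
    position i masked) for each a in ALPHABET. It ignores seq[i] and doubles the
    weight of whichever residue sits at the neighbouring position (len(seq) >= 2)."""
    neighbour = seq[i - 1] if i > 0 else seq[i + 1]
    weights = {a: (2.0 if a == neighbour else 1.0) for a in ALPHABET}
    total = sum(weights.values())
    return {a: weight / total for a, weight in weights.items()}


def pseudo_perplexity(x):
    """Per-sequence pseudo-perplexity: every site is scored in x's own context."""
    log_sum = sum(math.log(toy_conditional(x, i)[x[i]]) for i in range(len(x)))
    return math.exp(-log_sum / len(x))


def masked_marginals_table(wt):
    """Wild-type-context probabilities, computed once per position."""
    return [toy_conditional(wt, i) for i in range(len(wt))]


def masked_marginals_perplexity(x, table):
    """Pseudo-perplexity of variant x using only the wild-type table."""
    log_sum = sum(math.log(table[i][x[i]]) for i in range(len(x)))
    return math.exp(-log_sum / len(x))


wt, variant, j = "AACC", "AAAC", 2  # single mutation C -> A at position j
n = len(wt)
table = masked_marginals_table(wt)

log_odds = math.log(table[j][variant[j]]) - math.log(table[j][wt[j]])
identity_rhs = (-n * math.log(masked_marginals_perplexity(variant, table))
                + n * math.log(masked_marginals_perplexity(wt, table)))
print(log_odds, identity_rhs)
print(pseudo_perplexity(variant), masked_marginals_perplexity(variant, table))
```

      Here the two schemes disagree for the variant (2^1.25, about 2.38, versus 2) because the mutation at position 2 changes the context in which position 3 is scored; masked-marginals ignores that change by construction.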

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments and constructive suggestions. We describe how we have addressed each point below and are grateful for the guidance on areas where our work could be clarified or expanded. In particular, we note the following:

      Selection scan summary statistics: In our revised manuscript, we have included summary statistics from the selection scans. We believe this addition will enhance transparency and provide additional context for readers.

      Reporting of outliers: As highlighted by the editor, the reviewers expressed differing views on the most appropriate way to report outliers. To provide a comprehensive and balanced presentation, we now report both the empirical selection statistics and the corresponding converted p-values in either the main text or supplement, and both outputs are also provided in the full summary files. This dual approach will allow readers to fully interpret the results under both perspectives.

      Expanded discussion of admixture timing and population structure: We have carefully considered the reviewers' suggestions to incorporate additional descriptions of population structure or demographic analyses, and have done so in our revisions where possible. These changes strengthen the rigor and clarity of the analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that the Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality-controlled and appear to be of high quality. Some aspects of the Faroese population history are characterized, in particular, by the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared; however, that does substantially reduce the value of the paper. Minimally, I think the authors should share summary statistics for the selection scans, in line with the standard of the field.

      We agree with the reviewer that sharing the selection scan results is important, so we have now made the selection scan summary statistics publicly available, and clearly lay out the guidelines and research questions for which the data can be accessed in our Data Availability statement.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e., they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example, that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE, and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, what are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      We clarify that we did quantify the proportions of various ancestry components as estimated by HaploNet in main text Figure 5 and supplemental figures S6 and S7. To better highlight this result, we now also include the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      We agree that more fine-scale demographic analyses would be informative. We now additionally provide an estimate of the admixture date in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes and in the Discussion, using the DATES software, which is optimized for ancient genomes.

      We have encountered problems with using different standard date estimation software, including DATES, which give very inconsistent and unstable results. As we note in our text, we suspect this might be due to the strong bottleneck experienced in the history of the Faroe Islands, low LD differentiation between the source populations, or multiple pulses of admixture, which may be breaking one or more of the assumptions of these methods. Assessing the limitations of these methods is beyond the scope of this current manuscript; however, we will continue working on this problem for future studies, possibly using simulations to assess where the problem might be. We recognize that our relatively small sample size places limits on the fine-scale demographic analyses that can be performed. We are addressing this in ongoing work by generating a larger cohort, which we hope will enable more detailed inference in the future.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands (unpublished data), however, this has not been confirmed by follow-up epidemiological studies". So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies?

      We agree that no published studies have confirmed a higher prevalence of ankylosing spondylitis (AS) in the Faroe Islands. Our recruitment data suggest that AS might be more common than in other European populations, but we understand that this is only based on limited, unpublished observations and what we are hearing from the community. We emphasized in our original manuscript that this is based on observational evidence from the FarGen project. However, as this reviewer pointed out, we can be more clear that this prevalence has not been formally studied.

      In revision, we clarify in the Main Text - Results - HLA-B Allele Frequencies and Discussion that our recruitment data suggest a higher prevalence of AS may be possible, but more formal epidemiological studies are needed to confirm this observation. The reason we study HLA-B allele frequencies is to see if the genetic background of the Faroese population could help explain this possible difference, since HLA-B27 is already known to play a strong role in AS.

      Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follow a state-of-the-art protocol, and I could not find any evidence in the results that would point to bioinformatic issues having substantially biased them. Even these preliminary results led to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare, disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      We note that although the ancestry proportions were not originally specified in the main text, we did quantify ancestry proportions in the modern Faroese individuals and other ancient samples, and we visualized these proportions in Figure 5 and Supplementary Figures S6 and S7. As stated in our response to Reviewer #1, in our revisions, we now more clearly state the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization:

      As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroese should have signatures that are on the order of ~1-2cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      See response to point #3

      (2) ROH:

      Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1 Mb is short enough that I would expect most/many 1 Mb ROH tracts to come from pedigree loops predating the colonization of the Faroes (i.e., I am actually quite surprised that there isn't much more long ROH, which makes me wonder whether that would be impacted by the sampling scheme).

      The sampling scheme was designed to choose 40 Faroese individuals that were representative of the different regions and were minimally related. There were no pairs of third-degree relatives or closer (pi-hat > 0.125) in either the Faroese cohort or the reference populations. It is possible that this sampling scheme would reduce the amount of longer ROHs in the population, but we should still be able to see overall patterns of ROH reflective of bottlenecks in the past tens of generations. Additionally, based on this reviewer's earlier comment, 1 Mb ROHs would still be relevant to demographic events in the last 40-60 generations given that on average 1 cM corresponds to 1 Mb in humans, though we recognize that is not an exact conversion.

      That said, the “sum total amount of the genome contained in long ROH” as we described in the manuscript includes all ROHs greater than 1 Mb. Although we group all ROHs longer than 1 Mb into one category in Main Text Figure 2, we now additionally provide the distribution of ROH lengths across all individuals for each cohort in a new Supplemental Figure S3. As this plot shows, there are certainly ROHs longer than 1 Mb in the Faroese cohort, and on average there is a higher proportion of long ROH, particularly in the 5-15 Mb range, in the Faroese cohort relative to the other cohorts. As the reviewer points out, these longer ROHs possibly indicate a more recent or stronger bottleneck in the Faroes relative to the comparison cohorts. We highlight this result in Main Text - Results - Population Structure and Relatedness.
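      As a rough check on the scale argument in this exchange, a standard rule of thumb from identity-by-descent theory can be sketched. Assuming an idealized model in which crossovers occur as a Poisson process along the chromosome, a homozygous tract created by a pedigree loop whose common ancestor lived g generations ago has passed through about 2g meioses, so its length is roughly exponential with mean 100/(2g) cM. The sketch also assumes the approximate 1 cM ≈ 1 Mb conversion mentioned above; the function names are hypothetical and are not part of the manuscript's analysis.

```python
# Rule-of-thumb ROH length vs. loop age under an idealized model (hypothetical helpers).

CM_PER_MB = 1.0  # assumed genome-wide average; local recombination rates vary widely


def expected_roh_cm(generations: float) -> float:
    """Mean ROH length (cM) for a loop whose common ancestor lived `generations` ago.

    The two copies are separated by 2 * generations meioses; with crossovers
    arising as a Poisson process, tract length is roughly exponential with
    mean 1 / (2 * generations) Morgans, i.e. 100 / (2 * generations) cM.
    """
    return 100.0 / (2.0 * generations)


def generations_for_roh(length_cm: float) -> float:
    """Invert the rule of thumb: approximate loop age for a given mean tract length (cM)."""
    return 100.0 / (2.0 * length_cm)


for g in (40, 50, 60):
    cm = expected_roh_cm(g)
    print(f"{g} generations -> mean ROH ~{cm:.2f} cM (~{cm / CM_PER_MB:.2f} Mb)")

print(f"1 Mb mean ROH -> ~{generations_for_roh(1.0 * CM_PER_MB):.0f} generations")
```

      Under these assumptions, loops 40-60 generations deep give mean tract lengths of roughly 0.8-1.25 cM, so a 1 Mb threshold sits near that window; because individual tract lengths are exponentially distributed, however, any single 1 Mb tract is compatible with a wide range of ages.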

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      The reviewer brings up good points about the utility of classical selection statistics in populations that are admixed or bottlenecked, and whether the timescale at which these statistics detect selection is relevant for understanding the selective history of the Faroese population. We break down these concerns separately.

      (1) Bottlenecks: Recent bottlenecks result in higher LD within a population. However, demographic events such as bottlenecks affect global genomic patterns while positive selection is expected to affect local genomic patterns. For this reason, iHS and XP-EHH statistics are standardized against the genome-wide background, to account for population-specific demographic history.

      (2) Admixture: The term “admixture” has different interpretations depending on the line of inquiry and the populations being studied. Across various time and geographic scales, all human populations are admixed to some degree, as gene flow between groups is a common fixture throughout our history. For example, even the modern British population has “admixed” ancestry from North and West European sources, dating to as recently as the Medieval and Viking periods (Gretzinger et al. 2022, Leslie et al. 2015), yet we do not commonly consider it an “admixed” population, and we are not typically concerned about applying haplotype-based statistics in this population. This is due to the low divergence between the source populations. In the case of the Faroe Islands, we believe admixture likely occurred on a similar timescale or even earlier, based on the DATES estimates. We see low variance in ancestry proportions estimated by HaploNet, both in the historical Faroese individuals (dated to 260 years BP) and in the modern samples. This indicates admixture predating the settlement of the Faroe Islands, such that recombination has had time to break up long ancestry tracts and the global ancestry proportions have reached an equilibrium. That is, these ancestry patterns suggest that the modern Faroese are most likely descended from already admixed founders. In the original manuscript, we mentioned this as a likely possibility in the Main Text - Discussion: “This could have occurred either via a mixture of the original “West Europe” ancestry with individuals of predominantly “North Europe” ancestry, or a by replacement with individuals that were already of mixed ancestry at the time of arrival in the islands (the latter are not uncommon in Viking Age mainland Europe).” In our revisions, we further include the DATES estimates of the timing of admixture in the modern and historical Faroese samples, which predate the settlement in both cases. We highlight these points in the Discussion. And, as with the British population, the closely related ancestral sources for the Faroese founders were likely not so diverged as to have differences in allele frequencies and long-range haplotypes that would disrupt signals of selection from iHS or XP-EHH.

      (3) Time scale: It is certainly possible, and in fact likely, that iHS measures selection older than the settlement of the Faroe Islands. In our manuscript, we calculated iHS in both the Faroese and the closely related British cohort, and we highlight in the Main Text that the top signals, with the exception of LCT, are shared between the two cohorts, indicative of selection that began prior to the population split (Discussion and Results - Signals of Positive Selection). iHS is a commonly calculated statistic, and it is often calculated in a single population without comparison to others, so we feel it is important to show our result demonstrating these shared selection signals. In our revisions, we now clarify in the Discussion the limitations of the iHS statistic and the time scale at which it may detect selection. XP-EHH, by contrast, is designed to identify differentiated variants that are fixed or approaching fixation in one population but not others. The time scale of selection that XP-EHH can detect therefore depends on the populations used for comparison. Because XP-EHH has the greatest power to identify such near-fixed alleles, it is less likely to detect older selection events or incomplete sweeps from the source populations. We highlight this point in the Discussion.
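      To illustrate what standardizing “against the genome-wide background” in point (1) involves, the sketch below follows the standard iHS convention: the unstandardized score ln(iHH_ancestral / iHH_derived) is centred and scaled within bins of derived-allele frequency, so that genome-wide shifts in haplotype length, such as those produced by a bottleneck, are absorbed by each bin's mean and standard deviation. The input arrays, the 20 equal-width bins, and the function name are illustrative assumptions; in practice this step is usually performed by the scanning software (for example, the R package rehh), and the sketch is not claimed to reproduce the manuscript's settings.

```python
import numpy as np


def standardize_ihs(ihh_ancestral, ihh_derived, derived_freq, n_bins=20):
    """Standardize log iHH ratios within derived-allele-frequency bins (hypothetical helper).

    Returns (score - bin mean) / bin SD for every SNP, so each SNP is compared
    only with genome-wide SNPs of similar frequency.
    """
    ihh_ancestral = np.asarray(ihh_ancestral, dtype=float)
    ihh_derived = np.asarray(ihh_derived, dtype=float)
    derived_freq = np.asarray(derived_freq, dtype=float)

    unstd = np.log(ihh_ancestral / ihh_derived)  # unstandardized iHS
    # Assign each SNP to one of n_bins equal-width frequency bins on [0, 1].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(derived_freq, edges) - 1, 0, n_bins - 1)

    std = np.full_like(unstd, np.nan)
    for b in np.unique(bins):
        mask = bins == b
        mu, sd = unstd[mask].mean(), unstd[mask].std()
        if sd > 0:  # leave NaN where a bin has no spread (e.g. a single SNP)
            std[mask] = (unstd[mask] - mu) / sd
    return std


# Usage on simulated, neutral-looking inputs (placeholder data, not real scans).
rng = np.random.default_rng(0)
n = 5000
z = standardize_ihs(rng.lognormal(0.0, 0.5, n), rng.lognormal(0.0, 0.5, n),
                    rng.uniform(0.05, 0.95, n))
print(np.nanmean(z), np.nanstd(z))  # ~0 and ~1 by construction
```

      XP-EHH is typically standardized in the same way but genome-wide, without frequency bins, because it contrasts haplotype lengths between two populations rather than between the two alleles at a SNP.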

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      We agree that more investigation into the LCT allele frequency in the other ancient samples may provide some insight into the selection history, particularly in light of ancient admixture. Please note, we did look at the allele frequency of the LCT allele rs4988235 and stated in the main text that it was present at high frequencies in the historical (250BP) Faroese samples. The frequency of this allele in the imputed historical Faroese samples is 82% while the allele is present at ~74% frequency in modern samples. We originally did not report the exact percentage in the main text because the sample size of the historical samples (11 individuals) is small and coverage of ancient samples is low, leading to potential errors in imputation.

      However, given the reviewer’s comment, we have now included the frequencies as well as these caveats in the Discussion. We additionally calculated the LCT allele frequency in other ancient samples, and assuming that we had good proxies for the sources at the time of admixture, we calculated the expected allele frequency in the admixed ancestors of the Faroese founders (Discussion), but again note the limitations in using such a calculation in this context.
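      The expected-frequency calculation mentioned in this response can be sketched under the simplest model, assuming a single instantaneous admixture event with no subsequent drift or selection: the expected frequency in the admixed group is the ancestry-weighted average of the source frequencies. The function name and the numbers in the example are hypothetical placeholders, not values from the manuscript.

```python
def expected_admixed_frequency(proportions, source_freqs):
    """Ancestry-weighted mean of source allele frequencies (hypothetical helper).

    proportions: admixture proportion contributed by each source (must sum to 1).
    source_freqs: allele frequency in each corresponding source population.
    """
    if len(proportions) != len(source_freqs):
        raise ValueError("need one allele frequency per source")
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("admixture proportions must sum to 1")
    return sum(m * p for m, p in zip(proportions, source_freqs))


# Hypothetical example: two sources contributing 60% and 40% of ancestry,
# with allele frequencies of 0.80 and 0.50, respectively.
print(round(expected_admixed_frequency([0.6, 0.4], [0.80, 0.50]), 4))
```

      In this placeholder example the expectation is 0.6 × 0.80 + 0.4 × 0.50 = 0.68. A gap between such an expectation and an observed frequency could reflect drift in a small founder population, selection, or imperfect proxies for the true sources, and the calculation alone cannot distinguish among these.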

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      Given that the reviewers disagreed on the best approach to reporting selection scan results, in our revision we now supply both the standardized iHS / XP-EHH values in Supplementary Fig. S10 and these values transformed to p-values in Main Text Fig. 3. Both outputs are also provided in the publicly available selection scan results files. We describe the method for obtaining p-values in the subsection “Selection scan” of the Methods section; we used a method developed earlier by Fariello et al.
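      For readers weighing the two reporting options, a minimal sketch of both is given below. The first function assumes the standardized scores are approximately standard normal under neutrality and returns the two-sided −log10 p-value, −log10[2(1 − Φ(|z|))]; it is a generic Gaussian conversion and is not claimed to reproduce the exact procedure of Fariello et al. cited above. The second is the distribution-free empirical alternative the reviewer suggests: the fraction of genome-wide scores at least as extreme as the focal one, which does not rely on the tails being Gaussian. Function names and the example scores are hypothetical.

```python
import math


def gaussian_log10p(z):
    """-log10 two-sided p-value, assuming z ~ N(0, 1) under neutrality."""
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return -math.log10(p)


def empirical_pvalue(z, genome_scores):
    """Fraction of genome-wide |scores| at least as extreme as |z|.

    genome_scores should include the focal SNP itself, so the smallest
    attainable value is 1 / len(genome_scores). No distributional assumption
    is made, so heavy tails do not distort the ranking.
    """
    if not genome_scores:
        raise ValueError("genome_scores must not be empty")
    n_extreme = sum(1 for s in genome_scores if abs(s) >= abs(z))
    return n_extreme / len(genome_scores)


print(round(gaussian_log10p(1.96), 3))  # about 1.301, i.e. p of about 0.05
scores = [0.1, -0.5, 2.5, -3.0, 1.0]  # hypothetical genome-wide scores
print(empirical_pvalue(2.5, scores))  # 2 of 5 scores have |s| >= 2.5
```

      The Gaussian conversion gives comparable values across scans only if the normal approximation holds in the tails; the empirical version avoids that assumption but can only rank SNPs relative to the genome-wide background.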

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?

      We agree with the reviewer that this is a plausible explanation, and in our revisions, we have updated the Main Text - Discussion to acknowledge this possibility.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Please note that there was disagreement among the reviewers regarding the reporting of outliers.

      As stated in our response to the public reviews, given the disagreement, we include both the empirical selection statistics as well as the converted p-values in the main text, supplement and selection scan files.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2:

      Define labels / explain why they differ from 1000 Genomes populations / make them consistent throughout the manuscript.

      We apologize for the error in labels for Figure 2. These are the same populations used in other figures and analyses. We have fixed this in our revisions so that the labels are consistent with the rest of the manuscript.

      (2) Figure S2 label:

      "The matrix is rescaled after subsetting the individuals, so although the scales are different, the overall structure remains the same." I do not understand this sentence. The samples are different, the scale is different, the apparent pattern is different - what overall structure is supposed to be the same?

      We apologize that the language in the figure label was not clear. The scales in panels A and B differ because popkin rescales the kinship values after subsetting so that the minimum kinship is zero. This is necessary when subsetting individuals from an already estimated kinship matrix, particularly when subsetting from global populations to a single region. From the popkin documentation: “This rescaling is required when subsetting results in a more recent Most Recent Common Ancestor (MRCA) population compared to the original dataset (for example, if the original data had individuals from across the world but the subset only contains individuals from a single continent)” (https://rdrr.io/cran/popkin/man/rescale_popkin.html).

      We also described this in the Methods - Population Genetics - Kinship and runs of homozygosity section: “When calculating the kinship matrix for the Faroese WGS cohort only, we used the rescale_kinship() function, which will change the most recent common ancestor and give different absolute values, but the overall relationship structure in the subpopulation remains the same.”

      That is, the relative kinship within the Faroese cohort remains consistent, despite the different scale.

      It is difficult to see the kinship of Faroese individuals in the larger plot with all cohorts, which is why we subset and visualize the Faroese cohort alone. We have updated the Fig. S2 label language to make this more clear.
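      The rescaling discussed above can be illustrated with a small sketch, assuming it takes the form φ' = (φ − φ_min) / (1 − φ_min), with the minimum kinship in the subset as the new zero (our reading of the popkin documentation; the package reference gives the exact definition). Because this is an increasing affine map, every pairwise kinship keeps its rank and relative spacing, which is the sense in which the overall relationship structure is unchanged. The matrix values and the function name below are hypothetical.

```python
import numpy as np


def rescale_to_subset_min(kinship, min_kinship=None):
    """Hypothetical helper: shift a kinship matrix so its minimum becomes zero.

    Applies phi' = (phi - phi_min) / (1 - phi_min). For phi_min < 1 this is an
    increasing affine map, so the ordering of all pairwise values is preserved.
    """
    kinship = np.asarray(kinship, dtype=float)
    if min_kinship is None:
        min_kinship = kinship.min()
    return (kinship - min_kinship) / (1.0 - min_kinship)


# Hypothetical 3-individual kinship matrix (diagonal = self-kinship).
phi = np.array([[0.55, 0.10, 0.08],
                [0.10, 0.56, 0.12],
                [0.08, 0.12, 0.54]])
print(np.round(rescale_to_subset_min(phi), 3))
```

      Absolute values change after rescaling, so kinship values from panels A and B should not be compared directly, but comparisons within the subset (which pairs are more or less related) are unaffected.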

      (3) "Iron Age Wet Europe"

      We have corrected this typo to “Iron Age West Europe.”

      I'm confused about whether the ancient Faroese were part of the imputation panel: the Figure 5 legend implies they are, whereas the Methods imply they are not.

      The ancient samples are not imputed with the modern Faroese and reference samples, but they are the imputed data downloaded from Allentoft et al. and merged with the modern Faroese cohort. We specify that we downloaded imputed ancient samples in both the Methods - Fine-scale structure estimation using ancient genomes and in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes. The description of the imputation panel in the Methods - Bioinformatics - Variant calling and imputation refers only to the modern samples.

      (4) Kinship:

      The kinship of the Faroes is useful (and nice) as a QC analysis showing the genetic data match the expectations from the pedigree. I don't know what I should learn from the kinship of the 1000 Genomes samples (I'd assume one could learn something about bottleneck strength from this), but it's not developed/discussed.

      The global kinship matrix provides complementary information to PCA and ROH, as another way to quantify and visualize the relationships within and between populations. Additionally, as the reviewer mentioned, bottlenecks increase kinship within populations. Given that popkin estimates kinship measured from a Most Recent Common Ancestor, we can best observe this increase in kinship when comparing to other global populations. We more clearly delineate what can be observed from Fig. S2A versus Fig. S2B in the Results - Population Structure and Relatedness.

      Reference

      (1) Gretzinger, J. et al. The Anglo-Saxon migration and the formation of the early English gene pool. Nature 610, 112–119 (2022)

      (2) Leslie, S. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015).

    1. Romeo. Give me that mattock and the wrenching iron. Hold, take this letter; early in the morning See thou deliver it to my lord and father. Give me the light: upon thy life, I charge thee, Whate'er thou hear'st or seest, stand all aloof, And do not interrupt me in my course. Why I descend into this bed of death, Is partly to behold my lady's face; But chiefly to take thence from her dead finger A precious ring, a ring that I must use In dear employment: therefore hence, be gone: But if thou, jealous, dost return to pry In what I further shall intend to do, By heaven, I will tear thee joint by joint And strew this hungry churchyard with thy limbs: The time and my intents are savage-wild, More fierce and more inexorable far Than empty tigers or the roaring sea. Balthasar. I will be gone, sir, and not trouble you. Romeo. So shalt thou show me friendship. Take thou that: Live, and be prosperous: and farewell, good fellow. Balthasar. [Aside] For all this same, I'll hide me hereabout: His looks I fear, and his intents I doubt. [Retires] Romeo. Thou detestable maw, thou womb of death, Gorged with the dearest morsel of the earth, Thus I enforce thy rotten jaws to open, And, in despite, I'll cram thee with more food! [Opens the tomb] Paris. This is that banish'd haughty Montague, That murder'd my love's cousin, with which grief, It is supposed, the fair creature died; And here is come to do some villanous shame To the dead bodies: I will apprehend him. [Comes forward] Stop thy unhallow'd toil, vile Montague! Can vengeance be pursued further than death? Condemned villain, I do apprehend thee: Obey, and go with me; for thou must die. Romeo. I must indeed; and therefore came I hither. Good gentle youth, tempt not a desperate man; Fly hence, and leave me: think upon these gone; Let them affright thee. I beseech thee, youth, Put not another sin upon my head, By urging me to fury: O, be gone!
By heaven, I love thee better than myself; For I come hither arm'd against myself: Stay not, be gone; live, and hereafter say, A madman's mercy bade thee run away. Paris. I do defy thy conjurations, And apprehend thee for a felon here. Romeo. Wilt thou provoke me? then have at thee, boy! [They fight] Page. O Lord, they fight! I will go call the watch. [Exit] Paris. O, I am slain! [Falls] If thou be merciful, Open the tomb, lay me with Juliet. [Dies] Romeo. In faith, I will. Let me peruse this face. Mercutio's kinsman, noble County Paris! What said my man, when my betossed soul Did not attend him as we rode? I think He told me Paris should have married Juliet: Said he not so? or did I dream it so? Or am I mad, hearing him talk of Juliet, To think it was so? O, give me thy hand, One writ with me in sour misfortune's book! I'll bury thee in a triumphant grave; A grave? O no! a lantern, slaughter'd youth, For here lies Juliet, and her beauty makes This vault a feasting presence full of light. Death, lie thou there, by a dead man interr'd. [Laying PARIS in the tomb] How oft when men are at the point of death Have they been merry! which their keepers call A lightning before death: O, how may I Call this a lightning? O my love! my wife! Death, that hath suck'd the honey of thy breath, Hath had no power yet upon thy beauty: Thou art not conquer'd; beauty's ensign yet Is crimson in thy lips and in thy cheeks, And death's pale flag is not advanced there. Tybalt, liest thou there in thy bloody sheet? O, what more favour can I do to thee, Than with that hand that cut thy youth in twain To sunder his that was thine enemy? Forgive me, cousin! Ah, dear Juliet, Why art thou yet so fair? shall I believe That unsubstantial death is amorous, And that the lean abhorred monster keeps Thee here in dark to be his paramour?
For fear of that, I still will stay with thee; And never from this palace of dim night Depart again: here, here will I remain With worms that are thy chamber-maids; O, here Will I set up my everlasting rest, And shake the yoke of inauspicious stars From this world-wearied flesh. Eyes, look your last! Arms, take your last embrace! and, lips, O you The doors of breath, seal with a righteous kiss A dateless bargain to engrossing death! Come, bitter conduct, come, unsavoury guide! Thou desperate pilot, now at once run on The dashing rocks thy sea-sick weary bark! Here's to my love! [Drinks] O true apothecary! Thy drugs are quick. Thus with a kiss I die. [Dies] [Enter, at the other end of the churchyard, FRIAR LAURENCE, with a lantern, crow, and spade] Friar Laurence. Saint Francis be my speed! how oft to-night Have my old feet stumbled at graves! Who's there? Balthasar. Here's one, a friend, and one that knows you well. Friar Laurence. Bliss be upon you! Tell me, good my friend, What torch is yond, that vainly lends his light To grubs and eyeless skulls? as I discern, It burneth in the Capel's monument. Balthasar. It doth so, holy sir; and there's my master, One that you love. Friar Laurence. Who is it? Balthasar. Romeo. Friar Laurence. How long hath he been there? Balthasar. Full half an hour. Friar Laurence. Go with me to the vault. Balthasar. I dare not, sir. My master knows not but I am gone hence; And fearfully did menace me with death, If I did stay to look on his intents. Friar Laurence. Stay, then; I'll go alone. Fear comes upon me: O, much I fear some ill unlucky thing. Balthasar. As I did sleep under this yew-tree here, I dreamt my master and another fought, And that my master slew him. Friar Laurence. Romeo! [Advances] Alack, alack, what blood is this, which stains The stony entrance of this sepulchre? What mean these masterless and gory swords To lie discolour'd by this place of peace? [Enters the tomb] Romeo!
O, pale! Who else? what, Paris too? And steep'd in blood? Ah, what an unkind hour Is guilty of this lamentable chance! The lady stirs. [JULIET wakes] Juliet. O comfortable friar! where is my lord? I do remember well where I should be, And there I am. Where is my Romeo? [Noise within] Friar Laurence. I hear some noise. Lady, come from that nest Of death, contagion, and unnatural sleep: A greater power than we can contradict Hath thwarted our intents. Come, come away. Thy husband in thy bosom there lies dead; And Paris too. Come, I'll dispose of thee Among a sisterhood of holy nuns: Stay not to question, for the watch is coming; Come, go, good Juliet, [Noise again] I dare no longer stay. Juliet. Go, get thee hence, for I will not away. [Exit FRIAR LAURENCE] What's here? a cup, closed in my true love's hand? Poison, I see, hath been his timeless end: O churl! drunk all, and left no friendly drop To help me after? I will kiss thy lips; Haply some poison yet doth hang on them, To make me die with a restorative. [Kisses him] Thy lips are warm. First Watchman. [Within] Lead, boy: which way? Juliet. Yea, noise? then I'll be brief. O happy dagger! [Snatching ROMEO's dagger] This is thy sheath; [Stabs herself] there rust, and let me die. [Falls on ROMEO's body, and dies] [Enter Watch, with the Page of PARIS] Page. This is the place; there, where the torch doth burn. First Watchman. The ground is bloody; search about the churchyard: Go, some of you, whoe'er you find attach. Pitiful sight! here lies the county slain, And Juliet bleeding, warm, and newly dead, Who here hath lain these two days buried. Go, tell the prince: run to the Capulets: Raise up the Montagues: some others search: We see the ground whereon these woes do lie; But the true ground of all these piteous woes We cannot without circumstance descry. [Re-enter some of the Watch, with BALTHASAR] Second Watchman. Here's Romeo's man; we found him in the churchyard. First Watchman.
Hold him in safety, till the prince come hither. [Re-enter others of the Watch, with FRIAR LAURENCE] Third Watchman. Here is a friar, that trembles, sighs and weeps: We took this mattock and this spade from him, As he was coming from this churchyard side. First Watchman. A great suspicion: stay the friar too. [Enter the PRINCE and Attendants] Prince Escalus. What misadventure is so early up, That calls our person from our morning's rest? [Enter CAPULET, LADY CAPULET, and others] Capulet. What should it be, that they so shriek abroad? Lady Capulet. The people in the street cry Romeo, Some Juliet, and some Paris; and all run, With open outcry toward our monument. Prince Escalus. What fear is this which startles in our ears? First Watchman. Sovereign, here lies the County Paris slain; And Romeo dead; and Juliet, dead before, Warm and new kill'd. Prince Escalus. Search, seek, and know how this foul murder comes. First Watchman. Here is a friar, and slaughter'd Romeo's man; With instruments upon them, fit to open These dead men's tombs. Capulet. O heavens! O wife, look how our daughter bleeds! This dagger hath mista'en—for, lo, his house Is empty on the back of Montague,— And it mis-sheathed in my daughter's bosom! Lady Capulet. O me! this sight of death is as a bell, That warns my old age to a sepulchre.

      Romeo kills Paris and places his body inside Juliet's tomb. Believing Juliet is dead, he drinks the poison and dies beside her. Friar Laurence arrives just as Juliet is awakening, but Romeo is already dead. When the friar leaves, she sees Romeo's body and decides to stab herself with the dagger.

    1. 18.4. Repair and Reconciliation

      The idea of repair (or reconciliation) has shown up a couple of times already, both in the role of shame in child development, and in the Enforcing Social Norms: The Morality of Public Shaming paper. Let’s look more at what a repair might or might not look like.

      18.4.1. Limits of Reconciliation

      When we think about repair and reconciliation, many of us might wonder where there are limits. Are there wounds too big to be repaired? Are there evils too great to be forgiven? Is anyone ever totally beyond the pale of possible reconciliation? Is there a point of no return?

      One way to approach questions of this kind is to start from limit cases. That is, go to the farthest limit and see what we find there by way of a template, then work our way back toward the everyday. Let’s look at two contrasting limit cases: one where philosophers and cultural leaders declared that repairs were possible even after extreme wrongdoing, and one where the wrongdoers were declared unforgivable.[1]

      Nuremberg Trials

      After the defeat of Nazi Germany, prominent Nazi figures were put on trial in the Nuremberg Trials. These trials were a way of gathering and presenting evidence of the great evils done by the Nazis, and a way of publicly punishing them. We could consider this as, in part, a large-scale public shaming of these specific Nazis and the larger Nazi movement. Some argued that no reconciliation or forgiveness was possible given the crimes committed by the Nazis. Hannah Arendt argued that no possible punishment could ever be sufficient:

      “The Nazi crimes, it seems to me, explode the limits of the law; and that is precisely what constitutes their monstrousness. For these crimes, no punishment is severe enough. It may well be essential to hang Göring, but it is totally inadequate.” (Hannah Arendt/Karl Jaspers correspondence, 1926-1969)

      See also: Eichmann in Jerusalem: A Report on the Banality of Evil by Hannah Arendt

      Truth and Reconciliation Commission

      In South Africa, when the oppressive and violent racist apartheid system ended, Nelson Mandela and Desmond Tutu set up the Truth and Reconciliation Commission. The commission gathered testimony from both victims and perpetrators of the violence and oppression of apartheid. We could also consider this, in part, a large-scale public shaming of apartheid and those who hurt others through it. Unlike the Nuremberg Trials, the Truth and Reconciliation Commission gave a path for forgiveness and amnesty to the perpetrators of violence who provided their testimony.

      See also: What Archbishop Tutu’s ubuntu credo teaches the world about justice and harmony

      18.4.2. Steps for Repentance

      When reconciliation is possible, what would it look like? In the article “Famous abusers seek easy forgiveness. Rosh Hashanah teaches us repentance is hard.” Rabbi Danya Ruttenberg outlines a set of steps for “repentance” needed for someone to have their relationship with others repaired:

      (1) “The bad actor must own the harm perpetrated, ideally publicly”
      (2) “They must do the hard internal work to become the kind of person who does not harm in this way — which is a massive undertaking, demanding tremendous introspection and confrontation of unpleasant aspects of the self”
      (3) “They must make restitution for harm done, in whatever way that might be possible”
      (4) “Then — and only then — they must apologize sincerely to the victim”
      (5) “Lastly, the next time they are confronted with the opportunity to commit a similar misdeed, they must make a different, better choice”

      18.4.3. Repair Example

      On February 6, 2022, Jeremy Schneider became the Twitter “main character of the day” for posting the following Tweet, which was widely condemned as being mean and not understanding other people’s experiences:

      [Fig. 18.1: Jeremy Schneider’s Tweet]

      In what was an unusual turn of events for a Twitter “main character of the day,” Jeremy Schneider later made an apology that was mostly accepted by the Twitter users who had criticized his Tweet:

      [Fig. 18.2: Part 1 of Jeremy Schneider’s apology]
      [Fig. 18.3: Part 2 of Jeremy Schneider’s apology]

      18.4.4. Reflection questions

      Do you think there are situations where reconciliation is not possible?
      What would reconciliation look like (if possible) when a social media platform is used in a genocide? (See: Meta urged to pay reparations for Facebook’s role in Rohingya genocide.)
      Does Jeremy Schneider’s apology cover the five steps of repentance listed by Rabbi Danya Ruttenberg?
      Pick a situation where someone is being publicly shamed. Who is responsible for accepting or rejecting their apology/repentance?
      Pick a social media platform and a situation where someone is being publicly shamed. What might that person do to try to repair or reconcile after the public shaming?
      Pick a social media platform. In what ways does that platform make it difficult to repair or reconcile after public shaming?

      [1] We give these two examples to illustrate how important it is to appreciate the breadth of views on this incredibly difficult question, not to imply that one view or the other is preferable. The Nuremberg Trials and the Truth and Reconciliation Commission are both attempts at responding to great evils, and we believe it is important to understand different views of people who suffered. So take your time to think through your intuitions about these limit cases, and research different perspectives on these events (and other atrocities), and then work your way back to the everyday context of social media posting.

      I think reconciliation is possible in some situations, but it requires real effort from the person who caused harm. They need to admit what they did, reflect on why it was wrong, and sincerely apologize. Jeremy Schneider’s apology seems to follow many of these steps because he admitted his tweet was mean, explained how he reflected on it, and promised to think more carefully before posting in the future. However, on social media it is often difficult to repair harm because posts spread quickly and large numbers of people may continue criticizing someone even after they apologize.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors appear to be excluding a significant fraction of the TCRlow gamma delta T cells from their analysis in Figure 1A. Since this population is generally enriched in CD25+ gamma delta T cells, this gating strategy could significantly impact their analysis due to the exclusion of progenitor gamma delta T cell populations.

      We were cautious in our gating strategy since the TCR𝛿+ CD3ε+ subset is rather small, so a low signal-to-background ratio can be an issue if the gates used are too broad or generous. There is some inevitable low-level background staining with the TCR𝛿 antibody that sits just above the bulk of the negative population and is CD3ε-negative. Although this background represents a tiny fraction of total cells, we were wary of it contaminating our TCR𝛿+ CD3ε<sup>+</sup> gate, and we wanted a gating strategy that could also be applied across other organs. We do not, however, believe this conservative strategy affects our measurements of progenitor numbers across strains or our conclusions, since the size of this progenitor population was never affected by the mutations in the various IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains. To reassure the reviewer, we compare our conservative gate with a very broad TCR𝛿 gate and show that we are not missing a substantial population of CD25+ cells just below our gate. This comparison also illustrates how close the background from the CD27<sup>int</sup>-expressing αβ thymocytes (right column) comes to the TCR𝛿+ CD3+ gate, and why tight lineage gating is important.

      Author response image 1.

      (2) The overall phenotype of the IKKΔT<sup>CD2</sup> mice is not described in any great detail. For example, it is not clear if these mice possess altered thymocyte or peripheral T cell populations beyond that of gamma delta T cells.

      Given that gamma delta T cell development has been demonstrated to be influenced by αβ-lineage T cells (i.e., trans-conditioning), this information could have aided in the interpretation of the data.

      Apologies for not being clearer on this point. We have studied conventional αβ T cell development in these strains in considerable detail; these studies are published and are discussed in the Introduction (paragraph 3, pages 3-4) and in the cited references (Schmidt-Supprian et al. 2004, Silva et al. 2014, Xing et al. 2016, Webb et al. 2019, Carty et al. 2023). They detail how IKK expression is critical for the thymic development of αβ T cells and for their peripheral survival, and dissect the roles of NF-κB activation and cell death regulation by IKK. However, we now add a new discussion (pages 11-12) that considers the potential impact of altered αβ T cell development in the strains used for this study.

      We agree that trans-conditioning is also an important consideration, since CD4 TH17 T cells can enhance type 17 𝛾𝛿 T cell development (10.1038/icb.2011.50). This is relevant to the limited conclusions we draw concerning type 17 𝛾𝛿 T cells. The REL- and IKK-deficient strains do lack effector populations, including type 17 αβ T cells, so it is possible that the absence of type 17 αβ T cells in these strains contributes to the modest impact of IKK deletion on the type 17 𝛾𝛿 subset. We now highlight this information and discuss it in the manuscript (pages 11-12).

      Related to this, it would have been helpful if the authors provided a comparison of the frequencies of each of the relevant subsets, in addition to the numbers.

      We now provide both the overall frequencies of the different 𝛾𝛿 subsets and their frequencies relative to one another, as Supplementary Figure 2. We still believe that absolute numbers are the gold standard: gene deletions affect the αβ T cell compartments differently in different strains, determining whether or not αβ T cells are present, so the overall representation of 𝛾𝛿 T cells can vary considerably between strains. Hence, absolute numbers are a more reliable measure of cell abundance.
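To make the arithmetic behind this choice concrete, the sketch below converts a subset's percentage among all cells into an absolute count. The function name and the cellularity and percentage values are hypothetical, chosen only to illustrate the point and not taken from the study: when a strain loses its αβ T cells and B cells, total cellularity shrinks, so 𝛾𝛿 T cells can make up a larger percentage of the organ even while their absolute number falls.

```python
def absolute_count(total_cellularity: int, percent_of_total: float) -> float:
    """Absolute number of cells in a subset = organ cellularity x subset frequency."""
    return total_cellularity * percent_of_total / 100.0


# Hypothetical lymph node values, for illustration only.
wild_type = absolute_count(total_cellularity=10_000_000, percent_of_total=1.0)
# A strain lacking αβ T cells and B cells: far fewer cells overall,
# so the gamma-delta percentage rises even though fewer gamma-delta cells remain.
mutant = absolute_count(total_cellularity=500_000, percent_of_total=5.0)

print(f"wild type: {wild_type:,.0f} gamma-delta T cells")
print(f"mutant:    {mutant:,.0f} gamma-delta T cells")
```

Here the mutant shows a five-fold higher percentage but a four-fold lower absolute number, which is why percentages alone can mislead when the other lineages differ between strains.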

      (3) The manner in which the peripheral gamma delta T cell compartment was analyzed is somewhat unclear. The authors appear to have assessed both spleen and lymph node separately. The authors show representative data from only one of these organs (usually the lymph node) and show one analysis of peripheral gamma delta T cell numbers, where they appear to have summed up the individual spleen and lymph node gamma delta T cell counts. Since gamma deltaT17 and gamma deltaT1 are distributed somewhat differently in these compartments (lymph node is enriched in gamma deltaT17, while spleen is enriched in gamma deltaT1), combining these data does not seem warranted. The authors should have provided representative plots for both organs and calculated and analyzed the gamma delta T cell numbers for both organs separately in each of these analyses.

      We did, of course, process and calculate numbers of the different subsets in both lymph nodes and spleen. Where we saw loss or rescue of peripheral 𝛾𝛿 subsets, this was reflected in separate analyses of both organs, and we did not see any organ-specific effects in the mouse strains analysed. We therefore initially took the view that presenting aggregate data was the most efficient and least repetitive representation. However, we recognise the reviewer's concern and interest in seeing these data, so we have now included representative plots for both organs in Figure 1D and show cell numbers for lymph nodes and spleen separately, as well as combined, for Figures 1, 2, 4 and 7; these plots reflect the differences observed in the combined data. We did not break down the data for all figures (e.g. Figures 3 and 5), as this was more cumbersome for the more complex multi-strain comparisons, and so we attempted to balance clarity and transparency against unnecessarily repetitive data presentation.

      (4) The authors make extensive use of surrogate markers in their analysis. While the markers that they choose are widely used, there is a possibility that the expression of some of these markers may be altered in some of their genetic mutants. This could skew their analysis and conclusions. A better approach would have been to employ either nuclear stains (Tbx21, RORgammaT) or intracellular cytokine staining to definitively identify functional gamma deltaT1 or gamma deltaT17 subsets.

      We did share a similar concern, but we do not think this is an issue where subsets are almost completely absent, as in the IKK1/2 KO and Casp8 KO settings. Where we saw rescue with RIPK1<sup>D138N</sup> in Casp8ΔT<sup>CD2</sup> strains, we were keen to demonstrate that the restored populations exhibited their expected function, and so confirmed this in Figure 5C by intracellular cytokine staining after a short 4 h restimulation in vitro. This also served to validate our gating strategy, since the cells we designated as type 1 (CD27+ CD122+ CD44<sup>int</sup>) were the only source of IFN-gamma, while CD27– CD44<sup>hi</sup> CD122<sup>lo</sup> cells were the only source of IL-17. Adaptive/naive cells made neither cytokine. So, while we did not include nuclear stains, we were satisfied that the cytokine assays validated the gating strategy.

      (5) The analysis and conclusion of the data in Figure 3A are not convincing. Because the data are graphed on a log scale, the magnitude of the rescue by kinase dead RIPK1 appears somewhat overstated. A rough calculation suggests that in type 1 gamma delta T cells, there is ~ 99% decrease in gamma delta T cells in the Cre+WT strain and a ~90% decrease in the Cre+KD+ strain. Similarly, it looks as if the numbers for adaptive gamma delta T cells are a 95% decrease and an 85% decrease, respectively. Comparing these data to the data in Figure 5, which clearly show that kinase dead RIPK1 can completely rescue the Caspase 8 phenotype, the conclusion that gamma delta T cells require IKK activity to repress RIPK1-dependent pathways does not appear to be well-supported. In fact, the data seem more in line with a conclusion that IKK has a significant impact on gamma delta T cell survival in the periphery that cannot be fully explained by invoking Caspase8-dependent apoptosis or necroptosis. Indeed, while the authors seem to ultimately come to this latter conclusion in the Discussion, they clearly state in the Abstract that "IKK repression of RIPK1 is required for survival of peripheral but not thymic gamma delta T cells." Clarification of these conclusions and seeming inconsistencies would greatly strengthen the manuscript. With respect to the actual analysis in Figure 3A, it appears that the authors used a succession of non-parametric t-tests here without any correction. It may be helpful to determine if another analysis, such as ANOVA, may be more appropriate.

      Yes, we completely agree with this assessment and conclusion. While kinase-dead RIPK1 does provide some rescue, this appears relatively modest, and instead supports the view, validated in Figure 7, that the dominant function of IKK in 𝛾𝛿 T cells may be to activate NF-κB-dependent survival signals. Nevertheless, RIPK1<sup>D138N</sup> does provide significant rescue, which allows some peripheral cells to repopulate and demonstrates that IKK represses RIPK1-mediated cell death. It is not trivial to assess the relative importance of the IKK-RIPK1 and IKK-NF-κB functions. In IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice, we prevent RIPK1-induced death, but the cells still lack the NF-κB-dependent survival signal. Consistent with this, the ~1 log reduction in 𝛾𝛿 numbers between WT and IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice is similar to what we observe in the absence of REL subunits (Fig. 7), which is a smaller reduction than we observe in IKKΔT<sup>CD2</sup> mice. Ideally, we would have a scenario in which IKK regulation of RIPK1 was defective but NF-κB survival signalling was intact. This would reveal the full impact of losing IKK-dependent regulation of RIPK1 alone, which we suspect would result in substantial cell death that could not be blocked by NF-κB. Unfortunately, we do not have, or know of, suitable mouse mutants to test this. This is quite a nuanced discussion, and we now clarify the scope and extent of the conclusions we can draw (pp. 7, 11).
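The log-scale point can be made concrete with a little arithmetic. The sketch below is an illustration using the approximate percentages quoted in the review, not the underlying data, and the function names are hypothetical. A 99% loss is a 100-fold (2 log10 unit) reduction and a 90% loss is a 10-fold (1 log10 unit) reduction, so on a log axis the kinase-dead strain sits halfway back to wild type even though, on a linear scale, it still retains only about 10% of wild-type numbers. The same conversion links the "~1 log reduction" in the response to a decrease of roughly 90%.

```python
import math


def percent_decrease_to_fold(percent_decrease: float) -> float:
    """Fold reduction implied by a percent decrease (e.g. 90% -> 10-fold)."""
    remaining_fraction = 1.0 - percent_decrease / 100.0
    return 1.0 / remaining_fraction


def fold_to_log10(fold: float) -> float:
    """Number of log10 units spanned by a fold change."""
    return math.log10(fold)


# Approximate decreases quoted in the review for type 1 gamma-delta T cells.
for label, pct in [("Cre+WT", 99.0), ("Cre+KD+", 90.0)]:
    fold = percent_decrease_to_fold(pct)
    print(f"{label}: {pct:.0f}% decrease = {fold:.0f}-fold = {fold_to_log10(fold):.1f} log10 units")
```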

      (6) The conclusion that the alternative pathway is redundant for the development and persistence of the major gamma delta T cell subsets is at odds with a previous report demonstrating that Relb is required for gamma delta T17 development (Powolny-Budnicka, I., et al., Immunity 34: 364-374, 2011). This paper also reported the involvement of RelA in gamma delta T17 development. The present manuscript would be greatly improved by the inclusion of a discussion of these results.

      Thank you - we include a discussion of these papers now (p12).

      (7) The data in Figures 1C and 3A are somewhat confusing in that while both are from the lymph nodes of IKKdeltaTCD2 mice, the data appear to be quite different (In Figure 3A, the frequency of gamma delta T cells increases and there is a near complete loss of the CD27+ subset. In Figure 1A, the frequency of gamma delta T cells is drastically decreased, and there is only a slight loss of the CD27+ subset.)

      Yes, we agree these do look quite different and could be confusing. The lymph nodes from IKKΔT<sup>CD2</sup> mice lack αβ T cells and B cells, so their cellularity is much lower than normal. Consequently, the percentage representation of the remaining cells can be noisier, while total cellularity calculations are more consistent. This is not an issue in the other strains, which all have more cells in their lymph nodes. We now show plots from the spleen of the same mice, which appear better aligned with the additional splenic data shown in Figure 1.

      Reviewer #2 (Public review):

      (1) All approaches used confer changes to the entire T cell compartment. Therefore, the authors are unable to resolve whether the observations are mediated by direct and/or indirect effects (e.g., disorganized lymphoid architecture impacting maintenance/survival/homing).

      We address this important point in the Discussion (pages 11-12). The impacts of gene deletions upon αβ and 𝛾𝛿 T cells operate independently of one another (as also discussed in response to Reviewer 1). For instance, the phenotype of αβ T cells is identical in IKKΔT<sup>CD2</sup> and IKKΔT<sup>CD4</sup> mice, whereas 𝛾𝛿 T cells are only targeted in IKKΔT<sup>CD2</sup> mice. Similarly, the phenotype of 𝛾𝛿 T cells is similar in IKKΔT<sup>CD2</sup> and Casp8.IKKΔT<sup>CD2</sup> strains, yet αβ T cells are absent from IKKΔT<sup>CD2</sup> mice but present in near-normal numbers in Casp8.IKKΔT<sup>CD2</sup> mice. Others have also noted that 𝛾𝛿 T cell development is normal in Rag-deficient mice (10.1126/science.1604321). In any case, an absence of αβ T cells would be expected to promote 𝛾𝛿 T cell survival, owing to the lack of competition for commonly utilised cytokines such as IL-7 and IL-15, though we do not see much evidence for this when comparing strains with and without αβ T cells, such as IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup>. We now also discuss the potential contribution of trans-conditioning to type 17 𝛾𝛿 T cell development (page 12).

      (2) Assessment of factors that impact T cell numbers in the periphery is necessary. Are there observable changes to the proliferation, survival, and migration of gd T cell subsets?

      In the IKKΔT<sup>CD2</sup> and Casp8.IKKΔT<sup>CD2</sup> strains, we infer a defect in survival, since they lack peripheral 𝛾𝛿 T cells despite normal thymic development. Their absence made it hard to assess proliferation and migration, though 𝛾𝛿 T cells were absent from all lymphoid organs. The conclusion that defective survival is responsible for the absence of 𝛾𝛿 T cells in the different strains is also supported by the rescue of the IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains by kinase-dead RIPK1<sup>D138N</sup>. Furthermore, the presence of small residual populations in the lymph nodes and spleen of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains demonstrates that migration patterns were normal. Were the cells unable to recirculate, they might be expected either to fail to leave the thymus or to accumulate in the spleen. We saw no evidence of either scenario.

      (3) TCRd chain usage, especially among type 3 gd T cells, should be assessed.

      Unfortunately, we did not assess chain usage, choosing instead to rely on the phenotypic identity of specific subsets, which, as we show in Figure 5C, was extremely robust. IL-17 was only secreted by CD27– CD44<sup>hi</sup> 𝛾𝛿 T cells, while IFN-gamma was only secreted by CD27+ CD44<sup>hi</sup> 𝛾𝛿 T cells. We argue that the production of these key effector cytokines is the most direct test of a subset's functional identity, and that the phenotypic designation is therefore robust.

      (4) The functional consequences of IKK signaling on gd T cells were largely unaddressed. Cytokine analyses were performed only in the RIPK1D138N Casp8∆TCD2 model, leaving open the question of how canonical NF-κB-dependent signaling impacts the long-term functionality of gd T cells.

      Yes, we agree that the transcriptional mechanisms by which NF-κB signalling promotes cell survival remain an open question, one best addressed in future studies. We did not perform cytokine staining more widely because the cytokine assay relies on short-term restimulation of T cells with PMA and ionomycin. PMA activates PKC, which in turn activates NF-κB signalling to elicit the cytokine response measured in this assay. As such, the results of such assays would be hard to interpret. We agree it would be interesting to investigate the functional consequences of REL deficiency in future studies, although this may need a more nuanced setting in which 𝛾𝛿 T cells are not lost as a result of their defective survival.

      (5) The authors suggest that Caspase 8 is required for the development and maintenance of type 3 gd T cells. While the authors discussed the limitations of assessing adult mice in interpreting the data, it seems like a relatively straightforward experiment to perform.

      We did attempt these experiments with collaborators by analysing type 17 𝛾𝛿 T cell development in fetal thymic organ culture (FTOC). However, the GM mice are not so easy to breed and generating the large numbers of embryos required to set up the FTOCs proved too challenging and we were unable to generate these data.

      (6) While analyses of Casp8∆TCD2 RIPK1D138N mice suggest that loss of adaptive and type 1 gamma delta T cells in Casp8∆TCD2 animals is due to necroptosis, the contribution of RIPK3 kinase activity remains unexamined. RIPK3 activity determines whether cells die via necroptosis or apoptosis in RIPK1/Caspase8-dependent signaling, and inclusion of this analysis would strengthen mechanistic insights.

      Given time and resources, it would have been ideal to confirm necroptotic cell death by alternative knockouts, such as RIPK3 or MLKL. However, formation of the necrosome is dependent on kinase active RIPK1, since autophosphorylation of RIPK1 changes its conformation to allow recruitment of RIPK3 and MLKL and formation of the necrosome. Therefore, the rescue of CASPASE8 deficient T cells from cell death by kinase dead RIPK1 is very solid genetic evidence of necroptosis.

      (7) Canonical NF-κB signaling through cRel alone was not evaluated, leaving a gap in the understanding of transcriptional pathways required for gd T cell subsets.

      This was assessed in the p105/RelA knockout strain, which expresses only cREL. What we lacked was an assessment of what RelA/p50 dimers can support in the absence of cREL. We do, however, show the impact of RelA single deficiency and of combined RelA/p50 deficiency.

      In truth, we had many REL deficient strains and it was challenging to make all the combinations we wanted. However, we try to compensate for this by discussing what cREL:cREL dimers and cREL:P50 dimers are capable of doing by analysing 𝛾𝛿 T cell development in p105/RELA DKO and RELA KO mice - these do show that cREL:P50 can compensate in the absence of RELA, but cREL:cREL cannot.

      Reviewer #3 (Public review):

      Weaknesses:

      The paper would benefit greatly from a graphical abstract that could summarize the key findings, making the key findings accessible to the general immunology or biochemistry reader. Ideally, this graphic would distinguish the requirements for NF-κB signals sustaining thymic γδ T cell differentiation from peripheral maintenance, taking into account the various subsets and signaling pathways required. In addition, the authors should consider adding further literature comparing the requirements for NF-κB /necroptosis pathways in regulating other non-conventional T cell populations, such as iNKT, MAIT, or FOXP3+ Treg cells. These data might help position the requirements described here for γδ T cells compared to other subsets, with respect to homeostatic cues and transcriptional states.

      Thank you - we have added such discussions. We are happy to add a graphical abstract if journal constraints permit this.

      Last and least, there are multiple grammatical errors throughout the manuscript, and it would benefit from further editing. Likewise, there are some minor errors in figures (e.g., Figure 3A, add percentage for plot from IKKDT.RIPK1D138N mouse; Figure 7, “Adative").

      Thank you!

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary of findings and key conclusions

      This manuscript asks how pharmacologic targeting of the outer mitochondrial membrane protein MIRO1 (RHOT1) with a MIRO1-binding compound (MR3) reshapes immunosuppressive programs in the glioma tumor microenvironment (TME). The core of the paper is a cross-species transcriptomic comparison that combines an in vivo mouse dataset with an ex vivo human perturbation dataset.

      Model systems and approach (as described):

      • Mouse in vivo: GL261-Luc intracranial glioma in C57BL/6J mice; MR3 is administered intracranially at the implantation site (10 µM in 5 µL DMSO) on days 11 and 18, and tumors are harvested on day 22 for single-nucleus RNA-seq (snRNA-seq).
      • Mouse snRNA-seq: NeuN-based nuclei sorting, 10x Genomics v3.1; alignment to mm10; Seurat-based integration and annotation. Tumor-cell calling is supported by CNV inference (SCEVAN/CopyKAT). One MR3-treated sample is excluded after QC, leaving 3 control vs 2 MR3-treated samples (11,940 NeuN− nuclei).
      • Human ex vivo: freshly resected glioma cores from 3 patients are cultured with 10 µM MR3 or DMSO for 24 h, followed by bulk RNA-seq (STAR alignment to hg19; DESeq2 for differential expression).
      • Cross-species integration: the analysis is restricted to 1:1 orthologs and protein-coding genes shared across datasets; inferred cell-cell signaling is explored with CellChat.

      Main findings (as presented):

      • MR3 shifts expression of a subset of glioma-associated genes toward a non-tumor-like direction ("rescued genes") and is associated with large changes in inferred cell-type composition in the mouse snRNA-seq dataset (including a marked drop in the fraction of nuclei annotated as tumor: 44.5% to 4.3%; Fig. 1E).
      • Across TCGA-vs-GTEx (glioma-upregulated genes) and three MR3 response analyses (mouse snRNA-seq, mouse pseudo-bulk, and human bulk RNA-seq), PARP11/Parp11 is reported as the only gene that is consistently upregulated in glioma and consistently downregulated by MR3 (Fig. 2B).
      • Within the mouse myeloid compartment, Parp11 is most enriched in MAC4 and MAC1, while MAC1 shows high Cd274 (Pdl1/PD-L1). MR3 reduces Parp11 in MAC4/MAC1 and reduces Cd274 in MAC1 (Fig. 2H).
      • CellChat analysis suggests that in controls MAC1 is the dominant sender of PD-L1/PD-1 signaling to CD8+ T cells (Fig. 3C), and that this PD-L1/PD-1 interaction is strongly diminished after MR3 (Fig. 3E).
      • The authors propose a paracrine model in which MAC4-derived PGE2 (via Ptges3) sustains Parp11 expression in MAC1 through cAMP/PKA/CREB, promoting PD-L1-mediated T-cell suppression; MR3 disrupts this circuitry (Fig. 4).

      Major comments

      1. Strength of the conclusions

      Two parts of the story felt well supported by the data as shown. First, the cross-species convergence on PARP11/Parp11 is a clear and potentially useful result (Fig. 2B). Second, the myeloid subclustering plus CellChat analysis makes a coherent case that PD-L1/PD-1 signaling in this model is dominated by a specific macrophage subset (MAC1) and changes after MR3 (Fig. 2H, Fig. 3). Where I was less convinced is when the manuscript moves from "transcriptomic and modeling evidence" to causal statements such as "MIRO1-mediated axis driving immunosuppression" and "MR3 reduces tumor burden by reactivating immunity." At the moment, several central inferences remain indirect:

      • Causality is inferred primarily from transcriptomic shifts and ligand-receptor inference rather than functional immune readouts.
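The cross-species filter summarized above (glioma-upregulated genes that MR3 downregulates in every response analysis, restricted to 1:1 orthologs) amounts to a chain of set intersections. The sketch below is a hypothetical illustration, not the authors' pipeline: the function name, input layout, and placeholder gene symbols are assumptions, and it keys only on the sign of each log2 fold change, whereas the real analyses would also apply significance thresholds.

```python
from typing import Dict, Set


def consistent_candidates(
    glioma_vs_normal: Dict[str, float],
    mr3_human: Dict[str, float],
    mr3_mouse: Dict[str, Dict[str, float]],
    one_to_one_orthologs: Dict[str, str],
) -> Set[str]:
    """Human genes up in glioma and down after MR3 in every response analysis.

    glioma_vs_normal: human gene -> log2FC (tumor vs normal tissue)
    mr3_human: human gene -> log2FC (MR3 vs DMSO)
    mr3_mouse: analysis name -> {mouse gene -> log2FC (MR3 vs control)}
    one_to_one_orthologs: mouse gene -> human gene, 1:1 pairs only
    """
    # Start from genes whose expression is higher in glioma than in normal tissue.
    candidates = {g for g, lfc in glioma_vs_normal.items() if lfc > 0}
    # Keep only genes that MR3 lowers in the human ex vivo data.
    candidates &= {g for g, lfc in mr3_human.items() if lfc < 0}
    # Translate each mouse analysis into human symbols and intersect again;
    # genes without a 1:1 ortholog drop out at this step.
    for results in mr3_mouse.values():
        candidates &= {
            one_to_one_orthologs[m]
            for m, lfc in results.items()
            if lfc < 0 and m in one_to_one_orthologs
        }
    return candidates


# Toy inputs with placeholder gene names (not data from the manuscript).
hits = consistent_candidates(
    glioma_vs_normal={"GENEA": 1.5, "GENEB": 2.0, "GENEC": 1.1},
    mr3_human={"GENEA": -1.2, "GENEB": 0.4, "GENEC": -0.9},
    mr3_mouse={
        "snRNA-seq": {"Genea": -0.8, "Geneb": -0.5},
        "pseudo-bulk": {"Genea": -1.0, "Geneb": -0.3},
    },
    one_to_one_orthologs={"Genea": "GENEA", "Geneb": "GENEB"},
)
print(sorted(hits))  # ['GENEA']
```

In the toy input, GENEB fails because MR3 raises it in human tissue, and GENEC fails because it has no 1:1 mouse ortholog in the mapping.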

      -We thank the Reviewer for the constructive evaluation. We have toned down the claims throughout the manuscript, with the changes tracked.

      • On-target attribution to MIRO1 hinges on MR3 being a MIRO1 binder; the study does not include a genetic MIRO1 perturbation or a target-engagement/epistasis test in the relevant immune compartments (and the authors acknowledge this limitation in the Discussion).

      -We have examined on-target activity of MR3 in our other papers. For example, we found that depleting Miro1 with CRISPRi in glioma cells (Miro1 KD cells) phenocopied the effect of MR3. We also expressed Miro1-7A, a drug-resistant Miro1 mutant predicted to be unable to bind MR3 (1), in Miro1 KD glioma cells, which rendered them insensitive to MR3 treatment. These data demonstrate that, in cellular glioma models, Miro1 is the target of MR3 and MR3 exerts its functions by directly binding Miro1.

      We have also excluded off-target effect of MR3 by examining other mitochondrial GTPases (1, 2) including Miro2.

      We agree that these experiments were not performed specifically in immune compartments; we have acknowledged this in the Discussion and added further explanation, citing our published papers, in the Introduction.

      • The very large reduction in "tumor cell proportion" (Fig. 1E) is striking but is still a composition measure of recovered nuclei; it is not, on its own, a direct measurement of tumor size/burden and could be sensitive to differential nuclei recovery or cell loss during processing.

      -We agree that the "tumor cell proportion" in Fig. 1E represents the composition of recovered nuclei and is not, by itself, a direct measurement of tumor size or burden. We have removed "tumor burden" throughout the manuscript to avoid confusion.

      To determine whether the observed reduction might reflect technical bias, we examined the quality control metrics across all samples. Of the six initial samples (three control and three treated), one treated sample (TN1) showed clear quality concerns and was therefore excluded from downstream analysis.

      For the remaining samples, the distributions of detected genes per nucleus and total RNA counts per nucleus were similar between groups. The percentage of mitochondrial reads was consistently low, and only a small fraction of nuclei was removed during filtering, indicating overall comparable nuclei quality. Notably, the treated samples yielded similar or even higher total numbers of recovered nuclei, despite showing a lower tumor cell proportion. Please refer to new Fig. S1A for these results.

      Together, these observations suggest that the decrease in tumor cell proportion is unlikely to be explained simply by differential nuclei recovery, sequencing depth, or filtering effects. That said, we recognize that compositional differences in single-nucleus RNA sequencing data do not provide a direct measurement of tumor burden. We have revised the manuscript to clarify this point and to indicate that independent future approaches would be required for definitive assessment.
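The QC comparison described above (detected genes per nucleus, total RNA counts, percentage of mitochondrial reads, and the fraction of nuclei removed) can be sketched per sample as follows. This is a minimal illustration assuming each nucleus is a {gene: UMI count} mapping; the function names and thresholds are hypothetical placeholders, not the study's cut-offs. In a Seurat workflow, the mitochondrial percentage for mouse data is typically obtained with `PercentageFeatureSet(obj, pattern = "^mt-")`.

```python
from typing import Dict


def nucleus_qc(counts: Dict[str, int]) -> Dict[str, float]:
    """Per-nucleus QC metrics from a {gene: UMI count} mapping."""
    total = sum(counts.values())
    detected = sum(1 for c in counts.values() if c > 0)
    # Mouse mitochondrial genes carry the "mt-" prefix (e.g. mt-Co1).
    mito = sum(c for gene, c in counts.items() if gene.startswith("mt-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"n_genes": detected, "n_counts": total, "pct_mito": pct_mito}


def passes_qc(metrics: Dict[str, float], min_genes: int = 200, max_pct_mito: float = 5.0) -> bool:
    """Hypothetical thresholds for illustration; not the study's actual cut-offs."""
    return metrics["n_genes"] >= min_genes and metrics["pct_mito"] <= max_pct_mito


def fraction_removed(nuclei: Dict[str, Dict[str, int]], **thresholds) -> float:
    """Share of one sample's nuclei that the QC filter would discard."""
    if not nuclei:
        return 0.0
    failed = sum(not passes_qc(nucleus_qc(c), **thresholds) for c in nuclei.values())
    return failed / len(nuclei)


# Toy sample of two nuclei (placeholder values).
sample = {
    "n1": {"Actb": 10, "Gfap": 9, "mt-Co1": 1},
    "n2": {"Actb": 1, "mt-Co1": 9},
}
print(nucleus_qc(sample["n1"]))
print(fraction_removed(sample, min_genes=2, max_pct_mito=5.0))
```

Comparing these per-sample summaries between control and MR3-treated animals is what lets one argue that a composition shift is not simply a filtering artefact.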

      I think the paper can go forward in its current scope, but the strength of the claims should match the level of evidence. If the authors want to keep strong, causal language in the title/abstract ("driving immunosuppression," "reduces tumor burden"), then I consider one or two targeted validation experiments essential (see below). Alternatively, the authors can temper the language and position the mechanistic model more explicitly as a hypothesis generated from the transcriptomic analysis.

      -We thank the Reviewer! We have toned down the claims throughout the manuscript to make the data consistent with the conclusion.

      Statements that should be labeled as preliminary/speculative (unless additional validation is added)

      • MAC4-derived PGE2 as the upstream driver of MAC1 Parp11/PD-L1: plausible and nicely consistent with Ptges3 being MAC4-high in controls and reduced with MR3 (Fig. 4A), but not demonstrated.

      -We have changed the conclusion of this part to:

      Together, these bioinformatic findings suggest that MAC4 may produce PGE₂, which could act on nearby MAC1 cells in a paracrine manner to increase Parp11 expression, although this model needs to be functionally validated.

      • MIRO1 → mtDNA → cGAS/STING → Ptges3 as a mechanistic chain: interesting, but currently framed largely by pathway knowledge plus modest expression changes (Supplementary Fig. S5).

      -We have added: "which requires future functional investigation."

      • "MR3 reactivates anti-tumor immunity to reduce tumor burden": the gene set enrichment and CellChat shifts are consistent with immune activation, but immune-mediated tumor control is not directly tested.

      -We have toned down these claims on tumor burden and now conclude only that MR3 may enhance anti-tumor immune responses.

      Replication and statistics

      Mouse snRNA-seq replication is limited after QC (3 control vs 2 MR3-treated animals). With n=2 treated, it is hard to know whether some of the biggest composition and cluster-level changes are robust to animal-to-animal variability.

      -As also explained to Reviewer 2, we originally planned 3 mice per group. Despite the loss of one mouse after QC, a sample-level pseudobulk PCA (treating each mouse as one replicate) shows clear separation of treated from untreated animals (new Fig. S2C), supporting technical reproducibility despite the small n. The two MR3-treated samples clustered together and were clearly separated from controls, indicating that the transcriptional effect of MR3 exceeds inter-animal variability. The reduction in tumor cell proportion was also observed in both treated animals. We have added this description to the Results (page 5, lines 116-118) and included a new figure showing the tumor cell proportion for each animal (new Fig. S2F).
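A sample-level PCA of this kind can be sketched as follows, with each mouse's summed (pseudo-bulk) counts forming one row. This is a minimal NumPy illustration under stated assumptions: the log-CPM normalisation, the function name, and the toy matrix are choices made here for illustration, since the response does not specify the normalisation used.

```python
import numpy as np


def pseudobulk_pca(counts: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA scores for a samples x genes matrix of summed (pseudo-bulk) counts.

    Each row is one animal, so every mouse contributes exactly one replicate.
    """
    # Library-size normalise to counts per million, then log-transform.
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    logcpm = np.log1p(cpm)
    # Centre each gene across samples before PCA.
    centred = logcpm - logcpm.mean(axis=0)
    # SVD of the centred matrix: u * s gives the sample scores on each component.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy matrix: three control and two treated animals, four genes (placeholder values).
counts = np.array([
    [100, 100, 10, 10],
    [110, 95, 12, 9],
    [105, 98, 11, 10],
    [10, 12, 100, 105],
    [11, 10, 98, 110],
], dtype=float)
print(np.round(pseudobulk_pca(counts)[:, 0], 2))
```

Separation of groups along PC1 in such a plot is evidence at the level of animals, which is the unit that matters when n is small.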

      We acknowledge this is a limitation. However, as the Reviewer also pointed out, the significance of our paper lies in transcriptomically linking Miro1 to well-known immune suppression factors in the glioma TME and in integrating 3 glioma databases, which will help researchers in the field advance their own research. Thus, our methods and resource should still be valid and useful to the community.

      Relatedly, the snRNA-seq differential expression is performed with Seurat FindMarkers (Wilcoxon rank-sum). Per-cell testing can inflate significance if biological replicate structure is not accounted for (pseudoreplication). I suggest the authors clarify exactly how they handled sample-level replication for the key DE results and, where possible, re-run the main DE comparisons using a sample-aware approach (e.g., pseudo-bulk within cell types/subclusters).

      -We thank the reviewer for raising this important point. In the original analysis, differential expression was performed using Seurat's FindMarkers function which performs per-cell testing. We acknowledge that this approach can overestimate significance if biological replicate structure is not explicitly accounted for.

      To address this, we re-ran the key differential expression analyses using a pseudo-bulk approach: counts were aggregated per cell type/subcluster per sample, and DE testing was performed across samples rather than individual cells. The main results and conclusions remain consistent with the original analysis, while this approach ensures that statistical significance properly reflects biological replication (new Fig. S3D-F).
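The aggregation step described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' code: the record layout, function name, and sample IDs are assumptions, and the gene and cluster labels are reused from the manuscript only as placeholders. Each resulting (sample, cell type) profile would then become one column for a count-based DE method such as DESeq2 or edgeR, so the test compares animals rather than nuclei.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# One record per nucleus: (sample ID, cell-type label, {gene: raw UMI count}).
Nucleus = Tuple[str, str, Dict[str, int]]


def pseudobulk(nuclei: Iterable[Nucleus]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Sum raw counts over all nuclei that share a (sample, cell type) pair."""
    profiles: DefaultDict[Tuple[str, str], DefaultDict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for sample, cell_type, counts in nuclei:
        for gene, count in counts.items():
            profiles[(sample, cell_type)][gene] += count
    # Return plain dicts so missing keys raise instead of silently appearing.
    return {key: dict(genes) for key, genes in profiles.items()}


# Placeholder nuclei from one control and one MR3-treated animal.
nuclei = [
    ("ctrl_1", "MAC1", {"Cd274": 3, "Parp11": 1}),
    ("ctrl_1", "MAC1", {"Cd274": 2}),
    ("mr3_1", "MAC1", {"Parp11": 1}),
]
print(pseudobulk(nuclei))
```

Raw counts are summed rather than averaged because count-based DE tools model the counts directly and apply their own library-size normalisation.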

      For the human bulk RNA-seq, the methods indicate 3 patient tissues split across MR3 vs DMSO for 24 h. In DESeq2, a paired design (including patient as a blocking factor) would be important to avoid patient-to-patient variability dominating the treatment signal; the manuscript should confirm whether the design formula accounted for this.
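In DESeq2 (an R package), the paired analysis requested here is usually written as the design formula `~ patient + treatment`, with the variable of interest last. The Python sketch below only illustrates the model matrix that formula implies under R's default treatment coding; it is not DESeq2 itself, and the function and variable names are hypothetical. Because each patient has its own indicator column, the MR3 coefficient is estimated from within-patient DMSO-versus-MR3 differences; with six samples and four coefficients, two residual degrees of freedom remain.

```python
from typing import List, Sequence


def paired_design(patients: Sequence[str], treated: Sequence[bool]) -> List[List[int]]:
    """Model matrix for ~ patient + treatment with treatment (reference) coding.

    Columns: intercept, one indicator per non-reference patient, MR3 indicator.
    """
    levels = sorted(set(patients))  # first level in sorted order is the reference
    design = []
    for patient, is_treated in zip(patients, treated):
        # Patient indicators absorb each donor's baseline expression level.
        patient_cols = [int(patient == level) for level in levels[1:]]
        # The last column carries the within-patient MR3 effect that is tested.
        design.append([1] + patient_cols + [int(is_treated)])
    return design


patients = ["P1", "P1", "P2", "P2", "P3", "P3"]
treated = [False, True, False, True, False, True]  # DMSO, MR3 for each patient
for row in paired_design(patients, treated):
    print(row)
```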

      -In the revised manuscript, we re-ran the DESeq2 analysis using a paired design with patient as a blocking factor and compared DMSO and MR3 within each patient (P1-P3). The results are consistent with our previous analysis. PARP11 remains significantly downregulated (raw p-value

      Finally, several places in the Methods define significance using p-value cutoffs (e.g., GEPIA3 TCGA/GTEx analysis uses p 1; human DE uses p = 1). Because multiple testing is substantial in all of these analyses, I recommend reporting FDR-adjusted values consistently (and being explicit about whether figures/tables show raw or adjusted p-values).

      -We have now used FDR-adjusted values for the TCGA/GTEx analysis and have updated Fig. 1C (top left), Results, and Methods accordingly. PARP11 remains significant after FDR correction.
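      For readers comparing raw and adjusted values, the Benjamini-Hochberg (BH) step-up adjustment is sketched below. The response does not name the correction method, so BH is an assumption here; it is the default behind `padj` in DESeq2's `results()` and what R's `p.adjust(method = "BH")` returns. The function name is hypothetical and the p-values are toy numbers.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # toy values, not study results
    print([round(q, 4) for q in benjamini_hochberg(raw)])
    # [0.04, 0.0533, 0.0533, 0.2]
```

      Note how the two middle genes receive the same adjusted value: BH enforces monotonicity, so a gene's adjusted p can never be smaller than that of a gene with a smaller raw p.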

      For the human bulk RNA-seq, very few genes pass an adjusted p […]

      -[…] |log2FC| > 1 across all four differential expression analyses and updated the corresponding description in Methods.

      __Do the data support the macrophage-to-CD8 suppression claim?__
      __The CellChat PD-L1/PD-1 network figures are suggestive (Fig. 3C/E), but ligand-receptor inference is not the same as demonstrating functional T-cell inhibition. At minimum, I would like to see one orthogonal readout (flow or immunostaining) showing that PD-L1 protein on myeloid cells and PD-1 on CD8 T cells change in the expected directions after MR3, and that CD8 T cells show an activation/effector signature at the protein level.__

      -We agree that this would clearly be the next step in functional studies, but the current manuscript is focused on transcriptomic analysis and method building, so we have toned down any claims at the functional level.

      In addition, we observed that after MR3 treatment T cells upregulate cytotoxicity- and IFN-response-related genes, consistent with enhanced effector function at the transcriptional level. We have added a new Fig. S6A and an explanation in the Results.

      __PARP11: mediator vs marker__
      __The cross-species PARP11 result is the most convincing and potentially generalizable finding in the manuscript (Fig. 2B). However, in the specific context of this study, PARP11 is still best supported as a conserved MR3-responsive candidate rather than a demonstrated causal driver of PD-L1-mediated suppression. If the authors want to argue PARP11 is an effector of the pathway (rather than a marker), they should either soften the language or add a minimal functional linkage experiment within the existing scope (see "Optional" experiments below).__

      -We have softened the language throughout the manuscript to emphasize correlation, to present PARP11 as a marker, and to reflect the bioinformatic nature of the study. As the main goal of this paper is method development and resource building, and it already contains 11 figures, we think functional experiments are better suited to a separate paper.

      __Reproducibility and clarity of methods__
      __I appreciate that the authors provide a code/data portal (MiroScape) and a GitHub link. To make the study as reproducible as possible, I recommend:__
      • __Deposit raw sequencing reads for both mouse and human datasets (GEO/SRA) and include accession numbers in the manuscript.__

      -We have just deposited all raw data. Accession numbers will be provided once the data are public.

      • __Provide a short, consolidated "computational reproducibility" note with software versions and key parameters (Seurat, CellChat, STAR, DESeq2, etc.).__ -Added.

      • __Clarify pseudo-bulk construction (what is aggregated, at what level, and how many biological replicates contribute to each pseudo-bulk comparison).__ -Added.

      • __Add a brief summary of MR3 provenance/validation and what "MIRO1-binding" means operationally in the context of these experiments (especially for readers outside the MIRO1 field).__ -We have added this to the Introduction.

      Experiments requested (kept within the existing claims)
      I am intentionally not suggesting new lines of experimentation. The experiments below are aimed only at supporting the paper's current central claims. I separate them into items I consider essential vs optional, depending on how strongly the authors want to phrase mechanistic conclusions.

      -We thank the Reviewer. We have toned down the claims to reflect the bioinformatic nature of the paper and will perform the suggested experiments below in a separate paper.

      Essential if the title/abstract continue to use strong causal language:
      • Protein-level validation of the PD-L1/PD-1 axis and CD8 activation in the GL261 model. A focused flow cytometry panel (myeloid PD-L1; CD8 PD-1 plus one or two effector markers such as GZMB/IFNG/Ki67) or multiplex IF/IHC on tumor sections would substantially strengthen the central MAC1 → CD8 claim.
      • An orthogonal measure of tumor burden in the same treatment paradigm. The manuscript currently treats the drop in the fraction of nuclei annotated as tumor (Fig. 1E) as a reduction in tumor burden; I recommend including IVIS longitudinal data and/or histologic tumor area/volume at harvest to support this statement.
      • If feasible, modestly increase in vivo biological replication (the snRNA-seq analysis currently has n=2 treated after QC). Even adding one additional treated animal that passes QC would help. Feasibility (rough guidance only; core pricing varies widely by institution): a repeat GL261 cohort to harvest tumors for flow and/or histology typically takes ~3-6 weeks end-to-end. A small flow panel plus core time is often on the order of a few thousand USD (antibodies and cytometry), while basic histology/IF quantification might be in the hundreds to low-thousands. If the authors already have stored tissue from the existing cohort, some of this could be faster/cheaper.

      Optional (only if the authors want the MAC4 → PGE2 → Parp11 mechanism to be more than a model):
      • Measure PGE2 (ELISA or targeted lipidomics) in tumor lysates/conditioned media from control vs MR3-treated samples, or provide a closer proxy for PGE2 pathway engagement in the relevant clusters.

      Optional (only if the authors want to argue PARP11 is an effector):
      • A minimal functional linkage experiment (in vitro) testing whether PARP11 perturbation phenocopies the relevant aspect of MR3 in macrophages (e.g., PD-L1 levels and/or the ability to suppress CD8 activation in a co-culture). This could be done with a PARP11 inhibitor or knockdown. I do not think in vivo genetics are required for this manuscript, but some functional tie would prevent overinterpretation.

      __Minor comments__

      __A. Analysis/experimental clarifications that seem straightforward__
      • __Human DESeq2: please clarify whether the DESeq2 design was paired by patient (i.e., patient as a blocking factor).__

      -See above. We re-ran the human differential expression analysis using a paired design with patient as a blocking factor and describe this in the Methods.

      • __snRNA-seq DE: please clarify whether any sample-aware method was used for the key DE conclusions (especially Parp11/Cd274 changes) rather than per-cell statistics alone.__ -See above. The key DE results are based on sample-level pseudobulk (each mouse as one replicate). The two MR3-treated samples cluster together in pseudobulk PCA (new Fig. S2C), and the tumor reduction is seen in both animals (new Fig. S2F), supporting robustness to inter-animal variability.

      • __CellChat: because min.cells filtering is used (min.cells = 20), please note this explicitly in figure legends where subclusters appear only in one condition, so readers understand why certain labels are missing.__ -We have edited the Fig. 3 legend accordingly.
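      Why a subcluster can vanish from one condition's CellChat plots is illustrated below with hypothetical nucleus counts. This is not CellChat code: it only mimics the idea behind a `min.cells` filter (presumably applied via CellChat's `filterCommunication(min.cells = 20)`), and the boundary convention used here (a group is kept with at least `min_cells` cells) is an assumption. The function names are hypothetical.

```python
def groups_kept(cell_counts, min_cells=20):
    """Cell groups with at least `min_cells` cells in one condition."""
    return {group for group, n in cell_counts.items() if n >= min_cells}


def missing_by_condition(counts_by_condition, min_cells=20):
    """For each condition, list groups kept elsewhere but filtered out here."""
    kept = {cond: groups_kept(c, min_cells) for cond, c in counts_by_condition.items()}
    shown_anywhere = set().union(*kept.values())
    return {cond: sorted(shown_anywhere - k) for cond, k in kept.items()}


if __name__ == "__main__":
    # Hypothetical per-condition nucleus counts, not taken from the study.
    counts = {
        "MR3-": {"MAC1": 150, "MAC4": 35, "CD8": 60},
        "MR3+": {"MAC1": 120, "MAC4": 12, "CD8": 80},
    }
    print(missing_by_condition(counts))  # {'MR3-': [], 'MR3+': ['MAC4']}
```

      A group that drops below the threshold in only one condition therefore appears in one chord plot and not the other, which is exactly the situation the requested legend note would explain.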

      __Figure and text consistency issues__
      __I noticed several figure/legend/citation issues that look like simple fixes:__
      • __Fig. 3 legend panel labeling: the legend text refers to the PD-L1/PD-1 chord plot as (C) MR3− and (D) MR3+, but (D) is the heatmap panel; the chord plots are (C) and (E). This should likely read (C) MR3− and (E) MR3+.__

      -Yes, and corrected.

      • __Fig. 5 panel reference: the Results text refers to the Cross Species module as Fig. 5F, but the Fig. 5 legend defines panels (A-E) and labels (E) as "Cross Species module." Please reconcile (either change the text to Fig. 5E or add a panel F).__ -Changed to "E".

      • __Discussion figure citation: the Discussion cites Ptges3/PGE2 evidence as "(Figure 3)," but Ptges3 is shown in Fig. 4A and the model is in Fig. 4B.__ -Added "Figure 4A-B" there.

      • __Fig. 1D numbers: the Results text states 509/1,602 (mouse) and 15/106 (human) "rescued" genes (Fig. 1D), but the Fig. 1D pie charts are labeled with different totals (mouse total 3490; human total 104). Please reconcile the denominators and ensure the figure matches the text and analysis choice (bulk vs snRNA vs filtered gene sets).__ -For the cross-species analysis, we only counted genes with human-mouse orthologs so that the two datasets were compared in the same gene space. This avoids inflation from species-specific genes. We have added a clarification in the figure legend.

      • __Fig. 2 legend: there is a stray quote in "lymphoid subclusters" (appears as subclusters").__ -Removed.

      __Presentation and framing__
      • __Tone down or carefully qualify statements equating snRNA-seq composition shifts with reduced tumor burden (or add an orthogonal tumor-burden measurement as suggested above).__

      -We have removed "tumor burden" throughout the manuscript.

      • __Where possible, tie mechanistic language explicitly to the level of evidence ("consistent with," "suggests," "model proposes") so readers do not over-interpret the transcriptomic inference.__ -Done.

      • __Consider adding a small schematic in the Results or a short "interpretation" sentence in the figure legends explaining what the CellChat plots do and do not show, since non-specialists can misread these as direct interaction measurements.__ -We have added explanations of the CellChat plots to the Fig. 3 legend and emphasized the transcriptomic nature of the data.

      __Prior literature__
      __The PARP11 immunotherapy literature is cited appropriately. For the PGE2 angle, it may help readers if the authors add one or two glioma-focused references on PGE2-mediated myeloid/T-cell suppression (if not already in the full reference list).__

      -We have added two more references showing that PGE2 may induce MDSCs and immunosuppression in glioma (3, 4).

      Significance

      Nature and significance of the advance
      The advance here is primarily conceptual and resource-oriented. Conceptually, the work connects a mitochondrial regulator (MIRO1) to a specific, testable immunosuppressive circuit in the glioma TME. Technically, the cross-species perturbation framework and the accompanying MiroScape portal should be useful to groups looking for conserved, drug-responsive immune programs.

      Context within the existing literature
      Immunosuppression in glioma and the importance of tumor-associated myeloid populations are well established, as is the limited success of checkpoint blockade in GBM. The manuscript's proposed MAC4/MAC1 paracrine model and its emphasis on PD-L1/PD-1 signaling add a focused, hypothesis-generating view of how particular macrophage states might sustain CD8 dysfunction. The identification of PARP11 as a conserved MR3-responsive gene also fits with emerging work implicating PARP11 in immunoregulatory programs and response to immunotherapy.

      Audience
      • Neuro-oncology and glioma TME researchers (myeloid heterogeneity, immune suppression).
      • Tumor immunology groups interested in myeloid-driven checkpoint resistance.
      • Researchers working on mitochondrial stress signaling and immunometabolism.
      • Computational biologists building cross-species or multi-modal integration frameworks.

      Reviewer expertise and limitations
      Keywords: glioma microenvironment; macrophage/microglia biology; tumor immunology; single-cell/nucleus transcriptomics; computational ligand-receptor inference.
      Limitations: I am not a medicinal chemist, so I cannot deeply evaluate MR3 chemistry, PK/PD, or specificity beyond what is presented. I also did not evaluate the full web-portal implementation beyond the manuscript description.

      Reviewer #2

      Evidence, reproducibility and clarity

      The authors study responses to MIRO1 inhibition in a mouse model of GL261 GBM and in human tissue pieces treated ex vivo. They provide an interesting link between mitochondrial function and potential therapeutic outcomes in a tumor type that is typically challenging to treat. The manuscript is clearly written in correct English, and the figures are well structured and easy to interpret.

      -We thank the Reviewer for the positive comments. We want to clarify that the compound binds to Miro1 and does not inhibit Miro1's GTPase activity (1). We have now added an explanation to the Introduction.

      __Major critique:__

      1. __However, I need to stress that the study is based on few experiments with low robustness. The predominant experiment is single-nucleus RNA-seq analysis of GL261 tumors implanted into mice, comprising 3 CTRL and 2 treated mice, owing to the removal of the 3rd animal after sequencing (low recovery of high-quality nuclei). Therefore, the sample group is small. This is understandable for a snRNA-seq experiment (although 3 animals in the treated group are somewhat necessary), but the efficacy of treatment with MR3 should be better documented in a larger cohort of animals. Crucial changes in the distribution of cell types or the polarisation of myeloid cells should be confirmed with flow cytometry, which is more feasible in a larger cohort.__

      -We agree. As explained to Reviewer 3, the current paper focuses on conceptual and methodological advances and on providing a resource to the community, and it already contains 11 figures. As Reviewer 1 noted, the significance of our paper is to link Miro1 transcriptomically to well-known immunosuppressive factors in the glioma TME and to integrate 3 glioma databases, which will help researchers in the field advance their own work. Importantly, animal-level PCA showed clear separation of treated from untreated mice, and the reduction in tumor cell proportion was observed in both treated animals (new Fig. S2C, F), supporting technical reproducibility despite the small n. Thus, our methods and resource should still be valid and useful to the community. Exploring the tumor-reducing efficacy of MR3 or combined treatments (e.g., with anti-PD-L1 or a PARP11 inhibitor) in larger cohorts is an exciting next step.

      2. __The human model does not seem robust (also, only 3 patients). Very few genes are affected by treatment (incomparably fewer than in mice), which raises the question of whether the model is sufficient to study the effect of the treatment. This should at least be discussed, with arguments stating why such a model is suitable.__

      -We agree. The observed variability in treatment response is expected and consistent with the well-established molecular and phenotypic heterogeneity of human glioma. Importantly, despite this diversity, we identified one gene (PARP11) that was consistently altered across all patients' samples and in the mouse model. This cross-species reproducibility supports the biological and translational relevance of PARP11. We have now added this to the Discussion.

      In addition, we reanalyzed the human bulk RNA-seq using a paired design with patient as a blocking factor as suggested by another reviewer, which increased the number of DE genes (new Fig. 1C).
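      Why blocking on patient matters can be shown with a toy example. The sketch below is illustrative only: DESeq2 fits a negative binomial GLM, and a paired design there would typically be written as `design = ~ patient + condition` (the column names are assumptions). Here a plain t statistic on made-up log-expression values for one gene in three hypothetical patients stands in for that model. In a balanced design like this one, the estimated treatment effect is the same either way (a mean difference of -0.9); what changes is the error term, because between-patient differences are absorbed by the patient terms instead of being counted as noise.

```python
import math
from statistics import mean, stdev


def design_matrix(patients, treated):
    """Rows of an additive ~ patient + treatment design (reference coding).

    Column 0 is the intercept, then one indicator per non-reference patient,
    then a 0/1 treatment indicator.
    """
    levels = sorted(set(patients))
    return [
        [1] + [int(p == lvl) for lvl in levels[1:]] + [int(t)]
        for p, t in zip(patients, treated)
    ]


def paired_t(values, patients, treated):
    """t statistic on within-patient differences (treated minus control)."""
    per_patient = {}
    for y, p, t in zip(values, patients, treated):
        per_patient.setdefault(p, {})[bool(t)] = y
    diffs = [pair[True] - pair[False] for pair in per_patient.values()]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(values, treated):
    """Welch t statistic that ignores which patient each sample came from."""
    a = [y for y, t in zip(values, treated) if t]
    b = [y for y, t in zip(values, treated) if not t]
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


if __name__ == "__main__":
    # Toy log-expression of one gene; NOT study data.
    patients = ["P1", "P1", "P2", "P2", "P3", "P3"]
    treated = [False, True] * 3  # DMSO, then MR3, for each patient
    values = [5.0, 4.0, 8.0, 7.2, 6.0, 5.1]
    print(design_matrix(patients, treated)[1])  # [1, 0, 0, 1]
    print(round(paired_t(values, patients, treated), 1))  # -15.6
    print(round(unpaired_t(values, treated), 2))  # -0.7
```

      The consistent within-patient drop is obvious once each patient serves as its own control, but is swamped by baseline differences between patients when samples are pooled.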

      3. __Fig. S1E shows that few genes are actually commonly affected between the human and mouse experiments, so the conclusion about "conserved" modulation by MR3 seems an overstatement.__

      -We meant that Parp11 is conserved. To avoid confusion, we have deleted "conserved" throughout the manuscript wherever it did not refer specifically to Parp11.

      4. __Mechanistic conclusions about PARP11, PGE, PD-L1, etc. are not documented by any wet-lab experiments, only by bioinformatic modelling.__ -We have revised the Main Text throughout to make this explicit.

      Minor:

      1. The authors should discuss the choice of the GL261 model. It is immunogenic and does not ideally resemble human GBM, so the choice should be explained.

      -Although the GL261 model is more immunogenic than human GBM, this feature enables evaluation of immune-modulating therapies and mechanisms in an immunocompetent setting. The model still preserves critical aspects of glioma biology, including an immunosuppressive TME, invasive behavior, and intracranial growth (5). Thus, it provides a suitable platform for our mechanistic investigation of immune cells in the TME. We have now added this to the Methods.

      2. __In clustering of mouse snRNA-seq data, T cells seem underclustered; e.g., the Treg cluster clearly comprises both Il2ra-positive and -negative cells, the latter probably being conventional CD4+ T cells (CD4+ T cells in GL261 are usually 50:50 Treg and conventional). This can affect further conclusions on cell:cell interactions.__

      -We thank the reviewer for this important observation. We agree that in the former annotation, it was improper to annotate all the CD4+ T cells as Treg cells, given the limited expression of Foxp3, Il2ra and other Treg marker genes. Consequently, the previously annotated "Treg cluster" likely includes both regulatory-like and conventional CD4+ T cells.

      We subclustered the CD4+ T cell population and found that dividing it into conventional CD4+ T cells and Treg cells yielded too few Treg cells (~50) for downstream analysis. This would compromise the robustness and reliability of our subsequent analyses (CellChat, DEA, etc.).

      To address this, we have revised our annotation and now refer to this population more conservatively as "regulatory-like CD4+ T cells" rather than bona fide Tregs. Importantly, this subset still exhibits elevated expression of immunoregulatory molecules and is associated with CD8+ T cell dysfunction, preserving the main conclusions regarding immune suppression within the tumor microenvironment. We have updated the Results, Figures, and Discussion accordingly to clarify this revised annotation and its implications for cell-cell interactions.

      Please refer to the following new figures for the updated annotation and associated results:

      Fig. 2G-H, Fig. 3A-G, Fig. S4C-D, G, Fig. S5B-G, Fig. S6A.

      Significance

      The study provides an interesting conclusion and a potentially relevant discovery. However, in the opinion of this reviewer, the performed experiments do not support it sufficiently, especially in terms of mechanistic insight and given the weak data on human samples. In the context of the general literature on new GBM treatments and their testing in mouse models, this study lacks mechanistic insight and solid data on therapeutic efficacy.

      -As mentioned above, the goal of this paper is to provide novel methods for integrating datasets, to build resources, and to identify markers in the glioma TME. It will serve as a useful resource for the community and form the foundation for future therapeutic validation in larger cohorts. We have acknowledged the limitations in the revised manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors of "Cross-Species Transcriptomic Integration Reveals a Conserved, MIRO1-Mediated Macrophage-to-T-Cell Signaling Axis Driving Immunosuppression in Glioma" present transcriptomic data, both bulk RNA-seq and single-nucleus RNA-seq, from GL261 murine gliomas treated with the Miro1-targeting compound MR3. RNA-seq data from human tumor explants treated with MR3 are also presented. The authors compared DEGs from their treated tissues with publicly available RNA-seq data sets comparing normal tissue with glioma tumors, with the goal of identifying genes modulated by MR3 that may underlie glioma growth, TME changes, and immunosuppression. A significant amount of data is presented, with in-depth analysis of the sequencing data sets. The manuscript lacks mechanistic depth, and this reviewer feels that the results are over-interpreted, especially without any additional confirmatory assays to support the interpretation of the sequencing data. There were many bold statements (lines 109-110, 117, 130-131, 142-144, 163-165) that I felt did not have enough evidence to back up their claims.

      -We have toned down these places mentioned above:

      Line 109-110: Deleted now

      Line 117: Deleted now

      Line 130-131: Deleted now

      Line 142-144: Deleted "highly differentially expressed"; the rest of the sentence is supported by our data.

      Line 163-165: Deleted now

      As explained later, our paper is focused on bioinformatic analysis and resource and method building. In-depth functional studies will be performed in another paper.

      __A significant concern is the lack of confirmation that MR3 targets Miro1 in these models.__

      -We have done this in another manuscript, in which we show that in cellular glioma models Miro1 is the target of MR3 and that MR3 exerts its functions by binding directly to Miro1.

      __Previous publications from the authors have shown evidence that MR3 reduces Miro1 expression in cell and fly models, sometimes requiring the co-application of FCCP or antimycin A. Thus, the results reported here cannot be attributed specifically to Miro1 changes; they could reflect any on- or off-target effect of MR3.__

      -We originally discovered MR3 by ligand-based in silico modeling and a thermal shift direct-binding assay (1, 2). Thus, MR3 is a Miro1 binder (as stated in the Abstract and Introduction; we have now added more background to the Introduction). Indeed, we sometimes observed that MR3 reduced Miro1 protein levels under certain conditions, for example, in vivo in flies after days of feeding (1, 2), or in PD cells upon Antimycin A or CCCP treatment (1, 2, 6, 7). MR3 most likely exerts its function by altering Miro1 protein-protein interactions (8), after which Miro1 protein is degraded by the proteasome following complex dissociation or posttranslational modification (1, 2, 8). We have stated this hypothesis in the Results section (page 10, possible model).

      In our other papers, we have excluded off-target effects of MR3 by examining other mitochondrial GTPases, including Miro2 (1, 2), by showing that Miro1 KD glioma cells phenocopy the effects of MR3, and by showing that a drug-resistant Miro1 mutant renders glioma cells insensitive to MR3. These data show that Miro1 is the main target of MR3.

      We have added more explanations to the Introduction.

      __Understanding that mouse studies are expensive and time-consuming, and that the acquisition of human tissue is not trivial, the sample sets are still small. Further confirmation of findings in cell models, organoids, etc. would strengthen the findings and justify the smaller sample size of mice and human tissue.__

      -We agree, and we have a separate in-depth study in progress. However, the current paper focuses on conceptual and methodological advances and on providing a resource to the community, and it already contains 11 figures. As Reviewer 1 noted, the significance of our paper is to link Miro1 transcriptomically to well-known immunosuppressive factors in the glioma TME and to integrate 3 glioma databases, which will help researchers in the field advance their own work. Thoroughly understanding Miro1's role in the glioma TME is our next goal, as stated in the Discussion, and is beyond the scope of the current study.

      __The website MiroScape will be a very useful tool in the proper hands.__

      1. __Confirm activity of MR3 on Miro1 in relevant samples. Direct downregulation? Modulation of other targets known to be altered by MR3?__

      -As mentioned above, we have shown that in tumor cells MR3 disrupts pathogenic Miro1-protein interactions without the need to reduce Miro1 protein levels. No other target of MR3 is currently known; as demonstrated previously, MR3 does not alter even Miro2 (1, 2). We have added more explanations to the Main Text.

      2. __Conduct further mechanistic work to validate claims inferred from differentially expressed genes.__

      -As mentioned above, our current paper is focused on bioinformatic methods and resource building. Further mechanistic work will be performed in another paper.

      3. __Significantly temper claims related to cell targeting, direct communication between cells, and overarching responses inferred from sequencing data.__ -Done. See above and the Main Text.

      Reviewer #3 (Significance (Required)):

      My laboratory's expertise lies in signaling related to mitochondrial structure and function. We have investigated the Miro1 protein and the effects of Miro1 expression on cellular responses. We have tested the MR3 compound in our own systems with limited success. Therefore, my major concerns lie in validating the on-target activity of the compound in their models.

      -As explained above, in our other papers we have thoroughly examined the on-target activity of MR3 by counter-screening other Miro1-related or similar proteins (1, 2, 6, 7) and by using Miro1 KD cells. We have now added more explanations to the Main Text.

      __With additional mechanistic validation, this could be a very significant study. Using advanced model systems as the authors do allows for a comprehensive understanding of tissue responses. This is far more advanced than simple single-cell-line culture studies but also adds significant complexity to the interpretation of the data. I am a strong believer that sequencing data must be validated with functional assays.__

      -We agree and are actively conducting those studies. However, bioinformatic analysis and method and resource building are sometimes too extensive to combine with functional data, which may take years to obtain. We believe that our methods, the markers identified in the TME, and our resources will be very useful to the community.

      References

      1. Hsieh CH, Li L, Vanhauwaert R, Nguyen KT, Davis MD, Bu G, Wszolek ZK, Wang X. Miro1 Marks Parkinson's Disease Subset and Miro1 Reducer Rescues Neuron Loss in Parkinson's Models. Cell Metab. 2019;30(6):1131-40 e7. Epub 2019/10/01. doi: 10.1016/j.cmet.2019.08.023. PubMed PMID: 31564441; PMCID: PMC6893131.
      2. Li L, Conradson DM, Bharat V, Kim MJ, Hsieh CH, Minhas PS, Papakyrikos AM, Durairaj AS, Ludlam A, Andreasson KI, Partridge L, Cianfrocco MA, Wang X. A mitochondrial membrane-bridging machinery mediates signal transduction of intramitochondrial oxidation. Nat Metab. 2021. Epub 2021/09/11. doi: 10.1038/s42255-021-00443-2. PubMed PMID: 34504353.
      3. Mi Y, Guo N, Luan J, Cheng J, Hu Z, Jiang P, Jin W, Gao X. The Emerging Role of Myeloid-Derived Suppressor Cells in the Glioma Immune Suppressive Microenvironment. Front Immunol. 2020;11:737. Epub 2020/05/12. doi: 10.3389/fimmu.2020.00737. PubMed PMID: 32391020; PMCID: PMC7193311.
      4. Dean PT, Hooks SB. Pleiotropic effects of the COX-2/PGE2 axis in the glioblastoma tumor microenvironment. Front Oncol. 2022;12:1116014. Epub 2023/01/26. doi: 10.3389/fonc.2022.1116014. PubMed PMID: 36776369; PMCID: PMC9909545.
      5. Mathios D, Kim JE, Mangraviti A, Phallen J, Park CK, Jackson CM, Garzon-Muvdi T, Kim E, Theodros D, Polanczyk M, Martin AM, Suk I, Ye X, Tyler B, Bettegowda C, Brem H, Pardoll DM, Lim M. Anti-PD-1 antitumor immunity is enhanced by local and abrogated by systemic chemotherapy in GBM. Sci Transl Med. 2016;8(370):370ra180. Epub 2016/12/23. doi: 10.1126/scitranslmed.aag2942. PubMed PMID: 28003545; PMCID: PMC5724383.
      6. Bharat V, Durairaj AS, Vanhauwaert R, Li L, Muir CM, Chandra S, Kwak CS, Le Guen Y, Nandakishore P, Hsieh CH, Rensi SE, Altman RB, Greicius MD, Feng L, Wang X. A mitochondrial inside-out iron-calcium signal reveals drug targets for Parkinson's disease. Cell Rep. 2023;42(12):113544. Epub 2023/12/07. doi: 10.1016/j.celrep.2023.113544. PubMed PMID: 38060381.
      7. Bharat V, Hsieh CH, Wang X. Mitochondrial Defects in Fibroblasts of Pathogenic MAPT Patients. Front Cell Dev Biol. 2021;9:765408. Epub 2021/11/23. doi: 10.3389/fcell.2021.765408. PubMed PMID: 34805172; PMCID: PMC8595217.
      8. Kwak CS, Du Z, Creery JS, Wilkerson EM, Major MB, Elias JE, Wang X. Optogenetic Proximity Labeling Maps Spatially Resolved Mitochondrial Surface Proteomes and a Locally Regulated Ribosome Pool. bioRxiv. 2025. Epub 2026/01/07. doi: 10.64898/2025.12.21.693523. PubMed PMID: 41497653; PMCID: PMC12767525.
    1. Author response:

      General Statements

      Our study provides important mechanistic insights into how the perinuclear actomyosin network (PANEM) facilitates the interaction of unfavorably positioned chromosomes, i.e., peripheral and polar chromosomes, with the mitotic spindle in early mitosis to ensure their correct segregation in the subsequent anaphase. All reviewers agree that our study makes an important contribution to the field of mitosis and chromosome segregation. They make positive comments on our manuscript, for example, ‘The work highlights the PANEM as a key spatial and temporal element of chromosome congression’, ‘The work is an excellent addition to the field’, and ‘the concept of PANEM could be integrated into textbooks and models of chromosome congression’. All three reviewers also acknowledge the high quality of the data, the rigorous and accurate analyses, and the convincing quantification in our study. Reviewers 1 and 3 give several comments and suggestions for revising our manuscript. Our point-by-point revision plan begins on page 3.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Figure 4I: This panel is currently unclear and should be drastically simplified.

      We will follow this suggestion and simplify this figure. For example, we plan to remove the “Start” column because it is self-evident and does not provide much new information.

      I recommend reorganizing the figures as follows:

      Figure 1: Keep as a single figure but simplify. Figures 1D and 1E could be combined; move the unnormalized SCV to supplementary materials. The same goes for 1F.

      We will follow this suggestion and reorganize Figure 1 accordingly.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      C. Expansion of PANEM functional analysis

      To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]

      Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.

      As suggested, we will study the effect of PANEM contraction in one or two additional cell lines that form PANEM during prophase. For example, we plan to inhibit the PANEM contraction and study the outcome, focusing on the generation of polar chromosomes, which is the major defect after the inhibition of PANEM contraction in U2OS cells.

      Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.

      As suggested, we will investigate the outcome (e.g. generation of polar chromosomes) of reduced PANEM contraction in unsynchronized U2OS cells, and address whether the two subsets of cells, in which centrosome separation occurs before or after NEBD, differ in this outcome.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Minor Comments

      Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.

      As suggested, we will move the final paragraph of the Discussion to make a new final section in the Results. Moreover, as suggested, we will study the outcome of inhibiting PANEM contraction in cell lines other than U2OS, and add the results to the new final section in the Results.

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression at this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      The primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is. So, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression; in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, which precedes the segregation process.

      Major points

      (4) The work has high quality manual tracking of objects in early mitosis- if this would be made available to the field, it can help build AI models for tracking. The authors could consider depositing the tracking data and increasing the impact of their work.

      As suggested, we will include kinetochore tracking data as supplemental data in the revised manuscript.

      Minor points

      (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.

      The same question was raised in Reviewer #1’s major point. We will perform new experiments to directly address this question in the revised manuscript. If we do not obtain interpretable results, we will discuss this issue further in the Discussion, as suggested.

      (3) Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of the centrosome directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?

      This issue is closely related to point 2 above. As discussed above, we will first address this issue experimentally. If we do not obtain interpretable results, we will discuss this issue further in the Discussion.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).

      As suggested, we have removed the statement for the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statement for the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one the authors used arise from differences in cell type.

      We did experiments along this line, using a dominant-negative LINC construct, in our previous study (Booth et al eLife 2019). LINC-DN should more specifically remove/reduce PANEM than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); i.e. in this regard, the outcome was similar to azBB treatment in the current study. One can expect that global actin inhibitors would also inhibit the PANEM formation and show effects similar to LINC-DN. By contrast, the indicated references reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as pointed out by the reviewer. Such a difference may have arisen due to different cell types (e.g. some cells form the PANEM and others do not: Figure S7), a different extent in the inhibition of PANEM formation, and/or the inhibition of cell rounding and cytokinesis (e.g. if cytokinesis is more sensitive to inhibitors than is the PANEM formation, we may not observe the possible effects on early chromosome movements due to PANEM inhibition while cytokinesis is still affected). As suggested, we discussed this topic in the Discussion (page 15, second paragraph). 

      Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. The Myosin-10 and actin were also observed close to centrosomes during mitosis in X.laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on possibility that interference with a pool of Myo10 at the centrosomes is important for effects on congression.

      As the reviewer implies, we cannot rule out that our failure to detect actin associated with the spindle or centrosomes reflects differences in methods or cell lines between the current study and the literature mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 13, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 13, second paragraph). Regarding Myosin-10, azBB (a blebbistatin variant) should have negligible effects on class-X myosins, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB observed in the current study are due to the inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 13, second paragraph).

      C. Expansion of PANEM functional analysis

      Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.

      It is difficult to count chromosomes because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. signals outside the nucleus in prophase). We therefore quantified the chromosome volume at polar regions in azBB-treated cells (Figure 6C).

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      Since the publication of Kapoor et al. (Science 2006), it has been a widely accepted view in the field that chromosome congression precedes biorientation. Very recently, this view has been challenged by a new publication (Vukušić & Tolić, Nat Commun 2025), as indicated by this reviewer. We have mentioned this new model and discussed the new interpretation of our results based on it in the Discussion (page 14; ‘It has been a widely accepted view…’).

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).

      We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.

      Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.

      Following this suggestion, we discussed three possible mechanisms that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression, based on previous literature (Page 16): 1) an enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently, promoting congression towards the spindle midplane, and 3) the balance between CENP-E, Dynein and chromokinesin activities may be shifted towards greater chromosome-arm ejection forces directed towards the spindle midplane.

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Introduction

      Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.

      As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of Results to Introduction (Page 4) and cited Figure 1A and 1B in Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers’ understanding of our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.

      Results (by subheading)

      First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).

      As suggested, we cited these references at the indicated part of the first section of the Results (page 5).

      Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).

      Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) were already cited in the second section of the Results in the original manuscript. We agree that all the other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).

      Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.

      We interpret that these are background signals in the cytoplasm, which do not come from kinetochores, because 1) before NEBD, they were outside of the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions such as those towards a spindle pole (Phase 2) and the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.

      Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.

      These two references were already cited and discussed in this Results section of our original manuscript. However, considering this suggestion, we have expanded our discussion of the polar chromosome movements reported by Koprivec et al. (page 11). The reviewer is also correct about Figure 5F, and we have clarified this point in the Figure 5F legend.

      Discussion

      When discussing cortical actin, cite key reviews on its presence and function during mitosis:

      Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).

      As suggested, we have cited all these review papers in the Discussion (page 15), and mentioned the role of the cortical actin on the spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring on cytokinesis (Pollard & O'Shaughnessy, 2019).

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression at this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      The primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is. So, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression; in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, which precedes the segregation process.

      Major points

      (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (eg Phase3-like). It is unclear if all kinetochores go through phase1, 2, 3 and 4 in a sequential or a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.

      To respond to this comment, we would like to first clarify how we selected kinetochores for our analysis. We selected kinetochores that can be individually tracked. If kinetochore tracking was difficult (before the start of Phase 4 in control and azBB-treated cells or before observing the extended Phase 3 in azBB-treated cells) because of kinetochore crowding, we did not choose such kinetochores. We also did not include kinetochores close to spindle poles (within 4 µm) at NEBD in our analysis for the following two reasons: First, these kinetochores often did not show clear and rapid movements towards a spindle pole, which we used to define Phase 2. Second, although we referred to kinetochore co-localization with a microtubule signal for the start of Phase 2, this was difficult for kinetochores close to spindle poles because of a high density of microtubules. As requested, we have added this comment to the Method section (page 23).

      With the above selection, all selected kinetochores without azBB treatment (control) showed the poleward motion (Phase 2) and congression (Phase 4) in this order, though their extents varied among kinetochores. All selected kinetochores with azBB treatment also showed the poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Phase 1 and Phase 3 were then defined as the intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we judged that Phase 3 continued until the end of tracking. We have added this comment to the Method section (pages 23-24).

      (2) Would peripheral kinetochore close to poles behave differently compared to peripheral kinetochore close to the midplane (figure S4)? In figure 3D, are they separated? If not, would it look different?

      Because we excluded kinetochores close to spindle poles at NEBD from our analysis (for these kinetochores, Phase 2 was difficult to define; see our response to major point 1 above), the suggested comparison is not feasible.

      (3) Uncongressed polar chromosomes (e.g., in CENP-E-inhibited cells) are known to promote spindle tumbling. In Figure 5B, which shows polar chromosomes, it would be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.

      In contrast to CENP-E-inhibited cells, azBB-treated cells did not show much spindle tumbling, although both showed uncongressed polar chromosomes. This difference may reflect the smaller number of uncongressed polar chromosomes in azBB-treated cells. Modest spindle motions still occurred in azBB-treated cells. However, because kinetochore motions in our study were measured relative to a spindle pole (and other reference points on the spindle) (Figure 2A, C), these modest spindle motions were offset in our analyses of kinetochore motion. We have clarified this point in the Methods section (page 22).
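Measuring kinetochore positions relative to the spindle removes any rigid motion of the whole spindle from the trajectories. The sketch below illustrates this under stated assumptions: positions are 3D coordinates in µm per time point, and the two reference quantities (distance to one pole, and distance to the plane through the pole-pole midpoint perpendicular to the spindle axis) are illustrative choices, not necessarily the exact reference points used in Figure 2A, C. The function name is hypothetical.

```python
import numpy as np

def spindle_frame_distances(kt, pole_a, pole_b):
    """kt, pole_a, pole_b: arrays of shape (T, 3), positions in µm per time point.

    Returns the kinetochore-to-pole_a distance and the unsigned distance to the
    spindle mid-plane (through the pole-pole midpoint, normal to the spindle axis).
    Both are unchanged if the whole spindle is translated or rotated rigidly.
    """
    kt, pole_a, pole_b = (np.asarray(x, dtype=float) for x in (kt, pole_a, pole_b))
    axis = pole_b - pole_a
    axis_unit = axis / np.linalg.norm(axis, axis=1, keepdims=True)
    midpoint = 0.5 * (pole_a + pole_b)
    dist_to_pole = np.linalg.norm(kt - pole_a, axis=1)
    dist_to_midplane = np.abs(np.sum((kt - midpoint) * axis_unit, axis=1))
    return dist_to_pole, dist_to_midplane
```

Adding the same displacement to the kinetochore and both poles at a given time point leaves both outputs unchanged, which is why modest spindle drift does not enter the kinetochore measurements.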

      Minor points

      (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.

      As suggested, we have now mentioned the number of cells, where the kinetochore motions were analyzed, in the legends for Figures 3, 4, 5, S4 and S5.

      (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines?

      We checked the sensitivity of the cell lines in Figure S7B to blebbistatin (the original form of azBB) on DepMap. There was no appreciable difference between PANEM+ and PANEM- cell lines, although blebbistatin sensitivity data were available for only 4 of the cell lines in Figure S7B (HCT116, MCF7, U2OS and HT29). Nonetheless, because blebbistatin may kill cells by inhibiting cytokinesis, blebbistatin sensitivity does not necessarily reflect how essential PANEM contraction is for chromosome congression.

      (5) Are congression times delayed in lines that naturally lack PANEM?

      For example, HeLa cells (which lack PANEM) take 10-20 min to complete chromosome congression after NEBD (Bancroft et al., 2015: https://doi.org/10.1242/jcs.163659). This is not substantially different from the 8-18 min we observed for chromosome congression in U2OS cells (which form PANEM). We assume that cells lacking PANEM have developed compensatory mechanisms for efficient chromosome congression; we now discuss possible compensatory mechanisms in the last paragraph of the Discussion (page 16).

      (6) Page 23, "we first identified the end of congression": how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?

      We defined the end of Phase 4 as the start of kinetochore oscillation whenever we could track the kinetochore to that point. In some cases, when the kinetochore came close to the mid-plane (< 2.5 µm), it could not be tracked further because of kinetochore crowding around the spindle mid-plane; in such cases, the end of Phase 4 was set to the end of tracking. The original manuscript did not make clear that the end of Phase 4 was defined in the same way for non-polar and polar kinetochores, whereas the start of Phase 4 was defined differently for the two groups, and this caused confusion. We have now clarified these points in the Methods section (page 23).

      (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?

      In Figure S2E, we did not observe a significant difference in spindle-pole distance (spindle size) between control and azBB-treated cells at any individual time point; the smallest p-value was 0.094, at 6.0 min. As suggested, we have explained this in the legend for Figure S2E.

      Significance:

      The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.

      Description of analyses that authors prefer not to carry out

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Move methodological and descriptive details (e.g., especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.

      In these parts, we define four phases of kinetochore motion in early mitosis. Without such a description in the main text, readers would be confused about subsequent analyses. Figure 2 is also important to show examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.

      New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules which increases speed of congression initiation.

      If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new Figure would mostly consist of only graphs. Without examples of images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4, as they are (except for making Figure 4I simpler; see above).

      New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.

      If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figure 5 and 6 as they are, although we respect this suggestion from the reviewer.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.

      This is an interesting suggestion, but such a study would take considerable time and goes beyond the scope of this paper.

      Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.

      This is an interesting suggestion, but such a study would take considerable time and goes beyond the scope of this paper.

      Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).

      In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not Phase 2 or Phase 4. Mild microtubule perturbation itself could affect chromosome motions in all four Phases. We do not think it would be so informative to study what additional effects the reduced PANEM contraction shows when combined with mild microtubule perturbation.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Results (by subheading)

      Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).

      We respect this comment. However, if biorientation were established more rapidly for centrally located kinetochores, it would advance the initiation of congression, but would not necessarily change congression speed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable work investigates the role of protein N-glycosylation in regulating T-cell activation and function and suggests that B4GALT1 is a potential target for tumor immunotherapy. The strength of evidence is solid, and further mechanistic validation could be provided.

      We sincerely thank the editor and reviewers for their time and constructive feedback. Your recognition of our work is much appreciated. We clarify our mechanistic studies as stated below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Yu et al., which investigates the role of protein N-glycosylation in regulating T-cell activation and function, is interesting work. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could activate PD-1 expression and enhance the function of CD8+ T cells both in vitro and in vivo, suggesting important roles for protein N-glycosylation in regulating CD8+ T-cell function and indicating that B4GALT1 is a potential target for tumor immunotherapy.

      Strengths:

      The strength of this study is the finding of a novel function of B4GALT1 deficiency in CD8 T cells.

      Weaknesses:

      However, the authors did not directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, or the functional outcomes of this interaction, such as enhanced TCR signaling.

      We apologize for not sufficiently highlighting our results in Fig. 5f-h. In those panels, FRET assays showed that the interaction between TCR and CD8 increased significantly in B4GALT1-deficient T cells. To confirm the important role of the TCR-CD8 interaction in mediating the effects of B4GALT1 on T-cell functions, such as in vitro killing of target cells, we artificially tethered TCR and CD8 with a CD8β-CD3ε fusion protein and tested its function in both WT and B4GALT1-knockout CD8+ T cells. Our results show that this fusion protein could bypass the effect of B4GALT1 knockout in CD8+ T cells (Fig. 5g-h). Together with the finding that B4GALT1 directly regulates the galactosylation of TCR and CD8, these results strongly support a model in which B4GALT1 modulates T-cell function mainly through galactosylation of TCR and CD8, which interferes with their interaction.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNAseq, glycoproteomics) to hone their model on how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      We thank the reviewer for this reminder. We verified the efficiency of some gRNAs by T7E1 assay and will add these data to the Supplementary Results in the revised version.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      Please see response to reviewer #1.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified, and would be helpful for site-specific mutational analysis to further the authors' model.

      We thank the reviewer for the suggestion. Unfortunately, TCR and CD8 contain multiple N-glycosylation sites (https://glycosmos.org/glycomeatlas). We are concerned that mutating all of these sites might affect not only the glycosylation of TCR and CD8 but also other essential functions of these proteins.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

      We thank the reviewer for this great suggestion to extend the study of B4GALT1 to autoimmune and infectious diseases. However, because the current manuscript focuses mainly on tumor immunology, we think these studies should be left for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study by Yu et al., which investigates the role of protein N-glycosylation in regulating T-cell activation and function, is interesting work. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could activate PD-1 expression and enhance the function of CD8+ T cells both in vitro and in vivo, suggesting important roles for protein N-glycosylation in regulating CD8+ T-cell function and indicating that B4GALT1 is a potential target for tumor immunotherapy. However, the authors need to directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, as well as the functional outcomes of this interaction, such as enhanced TCR signaling. In addition, blocking PD-1 has been shown to enhance antitumor effects, whereas the data presented in this study suggest that activated PD-1 expression in B4GALT1-deficient T cells enhances the antitumor effect. How can this discrepancy be reconciled? Finally, several minor questions need to be addressed to strengthen the conclusions of this manuscript.

      (1) We used a FRET (Fluorescence Resonance Energy Transfer) assay to measure the interaction between TCR and CD8. FRET signals between TCR and CD8 increased significantly in B4GALT1-deficient T cells compared with control cells (Fig. 5f). As a functional outcome of this interaction, we observed enhanced killing activity in B4GALT1-deficient CD8+ T cells (Fig. 3f and Fig. 5h).

      To confirm whether reduced TCR-CD8 interaction is the major cause of the TCR activation phenotypes in B4GALT1-knockout CD8+ T cells, we generated a construct in which the CD8β ectodomain (ECD) was fused with CD3ε to artificially tether TCR to CD8 (Fig. 5g). Overexpression of this CD8β-CD3ε fusion enhanced in vitro killing activity in control wild-type CD8+ T cells. In B4GALT1-deficient CD8+ T cells, by contrast, the enhancement of killing activity by the fusion construct was significantly diminished (Fig. 5h), suggesting that the fusion bypasses regulation by B4GALT1.

      (2) PD-1 is both an early T-cell activation marker upon TCR activation and a T-exhausted marker under consecutive or repeated stimulations. In our screenings, PD-1 was used as an early activation marker for T-cells.

      We have clarified this in the new Discussion section.

      (1) Apart from the bioinformatics analyses, all data are presented as statistical graphs (e.g., bar and line charts). Including primary data such as flow cytometry plots, photomicrographs, or immunohistochemistry images would provide more direct support for the conclusions.

      We thank the reviewer for these valuable suggestions. In the revised version, we have added the original flow cytometry gating strategies for the Cas9 screening sort (Fig. S1a), TIL analysis (Fig. S5), and FRET assay (Fig. S8) to provide more direct support for our conclusions.

      (2) To further validate the enhanced tumor infiltration phenotype resulting from B4GALT1 knockout, the following data would strengthen the manuscript:

      (a) Flow cytometric analysis of TILs or immunofluorescence data from tumor sections.

      We thank the reviewer for this valuable suggestion. We have added the original flow cytometry gating strategy for TILs in Fig. S5 of the revised version.

      (b) Assessment of in vivo T cell proliferation, for example, by tracking changes in the proportion of CD8+ T cells in the peripheral blood over time.

      We analyzed in vivo T-cell proliferation within tumors by CFSE (carboxyfluorescein succinimidyl ester) analysis. As shown in Fig. S6b, 6 days after infusion, B4GALT1-knockout OT-I T cells showed increased proliferation within tumors compared with wild-type control OT-I cells.

      (c) Evaluation of the proliferation and activation status of OT-1 CD8+ T cells specifically in the draining lymph nodes of the mouse model.

      We thank the reviewer for this valuable suggestion. We plan to perform this experiment in the future.

      (3) The authors provide evidence that B4GALT1 knockout enhances CD8+ T cell function in both mouse models and human TCR-T cells (in vitro). Definitive support for the translational potential of this strategy would come from showing that B4GALT1-knockout human TCR-T cells also mediate potent in vivo function (NSG tumor-bearing model may be a better choice).

      We thank the reviewer for this valuable suggestion and plan to perform these experiments in the future. However, we do not expect the in vivo (NSG mouse) experiments to give results very different from the in vitro experiments, so they may add little to the current manuscript.

      (4) It would be preferable to include data on T cell activation and effector function (e.g., flow cytometry for IL-2, TNF-α, and IFN-γ, or ELISPOT) following stimulation with an OVA-specific peptide or co-culturing of OVA-expressing tumor cells with B4GALT1-knockout OT-1 CD8 T cells, especially the changes in the TILs compared with the non-targeting control group.

      Following co-culture of B16-OVA tumor cells with B4GALT1-knockout or wild-type OT-I CD8+ T cells, the RNA and secreted protein levels of TNFα and IFNγ were measured by RT-qPCR and ELISA, respectively (Fig. 3c). B4GALT1-deficient OT-I T cells showed increased expression of the T-cell activation and cytotoxicity markers IFNγ and TNFα.

      (5) What is the correlation between the expression of B4GALT1, PD-1, and TCR activation markers at various time points during a long-term T cell co-culture with tumor cells?

      We thank the reviewer for this valuable suggestion. We do not currently have these data. While we agree that exploring this question would be interesting, we think it falls outside the scope of the current study.

      (6) In line 136: Regarding the genetic targeting of B4GALT1 in T cells, it is unclear whether single or multiple gRNAs were used and if potential off-target effects were assessed. To fully validate the model, it would be important to clarify these strategies, and it is essential to include data on the knockout efficiency at both the protein (e.g., Western blot) and mRNA levels.

      We apologize for the unclear description of the gene knockout strategy. In the current study, single sgRNAs were used for gene knockout in all experiments: B4galt1 sg2 was used in Fig. 3a, and both B4galt1 sg1 and sg2 were used in Fig. S1d. We have clarified this in each figure legend of the revised version.

      The phenotypes of B4galt1-knockout T cells could be rescued by overexpressing either the short or the long isoform of mouse B4galt1 cDNA (Fig. 3b), indicating that potential off-target effects can be excluded.

      The knockout efficiencies of the sgRNAs were confirmed by T7E1 assay in the revised version (Fig. S2). Regrettably, the anti-mouse B4galt1 antibody did not work in Western blots.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study compares four models - VALOR (dynamic visual-text alignment), CLIP (static visual-text alignment), AlexNet (vision-only), and WordNet (text-only) - in their ability to predict human brain responses using voxel-wise encoding modeling. The results show that VALOR not only achieves the highest accuracy in predicting neural responses but also generalizes more effectively to novel datasets. In addition, VALOR captures meaningful semantic dimensions across the cortical surface and demonstrates impressive predictive power for brain responses elicited by future events.

      Strengths:

      The study leverages a multimodal machine learning model to investigate how the human brain aligns visual and textual information. Overall, the manuscript is logically organized, clearly written, and easy to follow. The results well support the main conclusions of the paper.

      (1) My primary concern is that the performance difference between VALOR and CLIP is not sufficiently explained. Both models are trained using contrastive learning on visual and textual inputs, yet CLIP performs significantly worse. The authors suggest that this may be due to VALOR being trained on dynamic movie data while CLIP is trained on static images. However, this explanation remains speculative. More in-depth discussion is needed on the architectural and inductive biases of the two models, and how these may contribute to their differences in modeling brain responses.

      Thank you for this thoughtful comment. We agree that attributing VALOR’s advantage over CLIP solely to ‘dynamic (video) versus static (image) pretraining’ would be incomplete, and that the architectural and inductive biases of the two models are central to understanding the observed performance gap.

      Both VALOR and CLIP use contrastive learning to align visual and textual representations, but they differ in several key inductive biases that are particularly relevant for modeling brain responses during continuous movie viewing. First, VALOR is trained to align temporally extended video segments with text, introducing an explicit temporal integration window that aggregates information across consecutive frames. This encourages representations that maintain context, stabilize semantics across time, and encode event-level structure. Second, VALOR’s alignment operates at the level of multi-second narrative units, rather than isolated visual snapshots, biasing the model toward representations that are sensitive to unfolding events and cross-frame consistency.

      In contrast, CLIP processes frames independently and aligns single static images with text. As a result, it lacks an intrinsic mechanism for temporal binding, context accumulation, or event-level representation. While CLIP can capture rich visual–semantic associations at the image level, it is less well suited to represent higher-order temporal structure, which is known to strongly drive responses in association cortex during naturalistic narrative perception.

      We therefore interpret VALOR’s superior encoding performance as reflecting not only exposure to dynamic audiovisual data, but also inductive biases—temporal integration and event-level alignment—that more closely match how the brain integrates information over time during movie watching. We have revised the Discussion (p. 16) to articulate these architectural and representational differences explicitly, rather than attributing the effect solely to training data modality.

      (On page 16) “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity.”

      (2) The methods section lacks clarity regarding which layers of VALOR and CLIP were used to extract features for voxel-wise encoding modeling. A more detailed methodological description is necessary to ensure reproducibility and interpretability. Furthermore, discussion of the inductive biases inherent in these models-and their implications for brain alignment - is crucial.

      Thank you for this comment. We agree that reproducibility and interpretability require precise specification of which model representations were used for voxel-wise encoding, as well as clearer discussion of the inductive biases inherent in these models and their implications for brain alignment.

      In the revised Methods, we now explicitly specify the feature sources for both models. For CLIP (ViT-B/32), we use the final pooled image embedding after projection into the shared image–text space, extracted frame-by-frame; one representative frame is sampled per TR, and its projected embedding serves as the regressor. For VALOR, we use the final joint video–text projection head, yielding a 512-dimensional embedding computed at the segment/TR level that integrates information across consecutive frames and aligns each multi-second video segment with its associated text. These procedures are now described step-by-step in the Methods (p. 21).
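The two feature-extraction schemes differ mainly in how video frames are mapped onto TRs. The sketch below contrasts them, assuming per-frame embeddings are already available as an array with one row per frame and that every TR contains at least one frame. The CLIP-style variant takes one frame per TR (here the middle frame, an assumption, since the text specifies only "one representative frame"); the pooled variant averages all frames within a TR and L2-normalises the result. Mean pooling is only a toy stand-in for temporal integration and is not VALOR's actual temporal encoder; all function names are hypothetical.

```python
import numpy as np

def frames_in_tr(k, tr_seconds, fps, n_frames):
    """Indices of frames whose timestamps i / fps fall in [k*TR, (k+1)*TR)."""
    start = int(np.ceil(k * tr_seconds * fps))
    stop = min(int(np.ceil((k + 1) * tr_seconds * fps)), n_frames)
    return np.arange(start, stop)

def clip_style_features(frame_emb, n_trs, tr_seconds, fps):
    """Framewise scheme: one representative (middle) frame per TR."""
    out = []
    for k in range(n_trs):
        idx = frames_in_tr(k, tr_seconds, fps, len(frame_emb))
        out.append(frame_emb[idx[len(idx) // 2]])
    return np.stack(out)

def pooled_features(frame_emb, n_trs, tr_seconds, fps):
    """Toy temporal integration: mean over all frames in the TR, then L2-normalise."""
    out = []
    for k in range(n_trs):
        idx = frames_in_tr(k, tr_seconds, fps, len(frame_emb))
        v = frame_emb[idx].mean(axis=0)
        out.append(v / np.linalg.norm(v))
    return np.stack(out)
```

Either output has one row per TR, so it can be used directly as a TR-aligned regressor matrix; the difference is whether each row reflects a single snapshot or the whole TR window.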

      In addition, we expanded the Discussion (p. 16) to explicitly articulate the models’ inductive biases and their relevance for brain alignment. In particular, we contrast CLIP’s image-level, framewise alignment—which lacks intrinsic temporal integration—with VALOR’s event-level, temporally extended video–text alignment, which biases representations toward context maintenance and narrative continuity. This distinction helps explain why the two models differ in their ability to predict neural responses during continuous movie viewing.

      (Methods, On page 21)

      “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model [24]. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features [23,51,52]. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.”
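For readers less familiar with voxel-wise encoding, TR-aligned feature vectors such as these are commonly regressed onto each voxel's BOLD time course, with prediction accuracy scored on held-out data. A minimal sketch, assuming closed-form ridge regression with a fixed penalty, FIR-style delays of 1-4 TRs to absorb the hemodynamic lag, and Pearson correlation as the score; these choices and the function names are illustrative and not necessarily the authors' exact pipeline (which may, for example, select the ridge penalty by cross-validation).

```python
import numpy as np

def add_delays(X, delays=(1, 2, 3, 4)):
    """Stack time-shifted copies of X (T x D), zero-padded, to model hemodynamic lag."""
    T, D = X.shape
    out = np.zeros((T, D * len(delays)))
    for j, d in enumerate(delays):
        out[d:, j * D:(j + 1) * D] = X[:T - d]
    return out

def fit_ridge(X, Y, alpha):
    """Closed-form ridge: W = (X^T X + alpha I)^-1 X^T Y, one column per voxel."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(D), X.T @ Y)

def voxelwise_corr(Y_true, Y_pred):
    """Pearson r between measured and predicted time courses, per voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den
```

In practice the delays and ridge penalty would be tuned on training data only, and the same procedure would be repeated for each feature space (VALOR, CLIP, AlexNet, WordNet) so that their held-out prediction accuracies can be compared voxel by voxel.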

      (Discussion, On page 16)

      “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity. More broadly, this difference reflects distinct inductive biases in how the two models represent visual–linguistic information. CLIP is optimized for framewise image–text correspondence, encouraging representations that emphasize instantaneous visual semantics but remain agnostic to temporal structure. In contrast, VALOR is explicitly biased toward aggregating information over multiple consecutive frames and aligning representations at the level of temporally extended events. These inductive biases favor context maintenance, semantic stabilization, and narrative coherence over time, which are known to be critical for driving responses in association cortex during continuous movie perception.”

      (3) A broader question remains insufficiently addressed: what is the purpose of visual-text alignment in the human brain? One hypothesis is that it supports the formation of abstract semantic representations that rely on no specific input modality. While VALOR performs well in voxel-wise encoding, it is unclear whether this necessarily indicates the emergence of such abstract semantics. The authors are encouraged to discuss how the computational architecture of VALOR may reflect this alignment mechanism and what implications it has for understanding brain function.

      Thank you for this important conceptual question. We agree that improved voxel-wise encoding performance does not, by itself, imply the emergence of fully amodal or modality-independent semantic representations in the brain. In the revision, we therefore avoid framing our findings as evidence for abstract amodal semantics and instead clarify a more constrained interpretation.

      Specifically, we suggest that visual–text alignment may support the stabilization and coordination of scene-level meaning across modalities and over time, rather than the formation of modality-free semantic codes. From this perspective, VALOR’s advantage reflects inductive biases that promote (i) integration of visual information over multi-second windows and (ii) alignment of temporally extended visual events with linguistic descriptions, yielding representations that are more temporally stable, context-sensitive, and constrained by language.

      We therefore interpret VALOR’s superior encoding performance as identifying cortical regions whose responses are better captured by temporally stabilized, cross-modal representations, rather than as evidence that these regions encode fully abstract semantics independent of input modality. We have expanded the Discussion (p. 16) to articulate this interpretation and to clarify the implications of video–text alignment for understanding how the brain integrates perception and language during naturalistic cognition.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time. At the same time, these findings do not imply that visual–text alignment in the brain gives rise to fully amodal, modality-independent semantic representations. Instead, we suggest that alignment between visual and linguistic signals may serve to stabilize and coordinate scene-level meaning across modalities and over time. From this perspective, VALOR’s architecture—by integrating visual information over multi-second windows and aligning temporally extended video segments with language—provides a computational proxy for how the brain may use linguistic constraints to organize, disambiguate, and maintain coherent representations of unfolding events. The observed encoding gains therefore highlight regions engaged in temporally stabilized, cross-modal integration during naturalistic perception, rather than providing evidence for abstract semantic codes divorced from sensory input.”

      (4) The current methods section does not provide enough details about the network architectures, parameter settings, or whether pretrained models were used. If so, please provide links to the pretrained models to facilitate reproducible science.

      We appreciate this comment and agree that our original description of model sources and implementation details was not sufficiently explicit. These details are essential for both reproducibility and interpretability. We have now made these specifications explicit in the revised Methods.

      In particular, we now state for each model:

      VALOR. We use the publicly released pretrained VALOR-large checkpoint. For each movie segment, we extract the joint video–text projection head output (512-D) that encodes the aligned segment-level audiovisual semantics. We report the checkpoint source, the segment duration (in frames/seconds), and how these segment-level embeddings are temporally aligned to TRs for voxel-wise encoding.

      CLIP (ViT-B/32). We use the standard pretrained CLIP weights. For each video frame, we extract the final pooled image representation after projection into CLIP’s shared image–text embedding space (512-D). We also clarify that one representative frame is sampled and aligned to each TR, and that these projected embeddings are used as regressors in the encoding model.

      AlexNet. We use the ImageNet-pretrained AlexNet. We take activations from conv5 and apply PCA to reduce them to 512 dimensions before mapping them onto the fMRI time series.

      For each model, the revised Methods now specify: the pretrained source/checkpoint, the layer or head from which features were taken, output dimensionality, any preprocessing or dimensionality reduction, and the temporal alignment procedure used to generate TR-level regressors. These revisions appear in the updated Methods (page 21).
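
      To make the temporal alignment step concrete, the sketch below converts frame-level features into TR-level regressors in two ways: taking the frame closest to each TR's midpoint (one frame per TR, as described for CLIP) or averaging all frames that fall within the TR. This is a minimal illustration, not the pipeline used in the manuscript; the function name frames_to_tr, the midpoint rule, and the fps and tr arguments are assumptions chosen for the example.

```python
import numpy as np

def frames_to_tr(frame_feats, fps, tr, n_trs, mode="midpoint"):
    """Map frame-level features (n_frames, d) onto n_trs TR-level regressors.

    Hypothetical helper for illustration.
    mode="midpoint": use the single frame closest to each TR's midpoint.
    mode="mean": average every frame whose onset falls inside the TR window.
    """
    n_frames, d = frame_feats.shape
    frame_times = np.arange(n_frames) / fps  # frame onset times in seconds
    out = np.empty((n_trs, d))
    for t in range(n_trs):
        start, stop = t * tr, (t + 1) * tr
        if mode == "midpoint":
            idx = int(np.argmin(np.abs(frame_times - (start + stop) / 2)))
            out[t] = frame_feats[idx]
        elif mode == "mean":
            in_tr = (frame_times >= start) & (frame_times < stop)
            if not in_tr.any():
                raise ValueError(f"no frames fall inside TR {t}")
            out[t] = frame_feats[in_tr].mean(axis=0)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out
```

      For example, with hypothetical 24 fps video and a 1 s TR, the midpoint rule picks frames 12, 36, 60, and so on. The averaging option is shown only for contrast: it smooths over within-TR changes that single-frame sampling ignores.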

      (On page 21) “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model 24. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features 23,51,52. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we used CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit 53.

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”
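
      The preprocessing described for the AlexNet features (per-image z-scoring followed by PCA to 512 components), together with the note that PCA was fit within each training fold, can be sketched with plain NumPy as below. This is an illustrative stand-in, not the DNNBrain implementation, and the helper names are hypothetical. The key point is that the PCA mean and axes are estimated from training TRs only and then reused, unchanged, for held-out TRs.

```python
import numpy as np

def zscore_rows(X, eps=1e-8):
    """Intra-image z-scoring: standardize each row (one frame's flattened activations)."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / (sd + eps)

def fit_pca(X_train, n_components):
    """Fit PCA on training rows only; returns (mean, components)."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def apply_pca(X, mean, components):
    """Project rows (train or held-out) using the training-fold mean and axes."""
    return (X - mean) @ components.T
```

      With real features one would call fit_pca(X[train_idx], 512) inside each cross-validation fold; the number of components cannot exceed the smaller of the number of training samples and the number of features.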

      Reviewer #2 (Public review):

      Fu and colleagues have shown that VALOR, a model of multimodal and dynamic stimulus features, predicts brain responses better than unimodal or static models such as AlexNet, WordNet, or CLIP. The authors demonstrated the robustness of their findings by generalizing encoding results to an external dataset. They demonstrated the model's practical benefit by showing that its semantic mappings were comparable to those of another model that required labor-intensive manual annotation. Finally, the authors showed that the model reveals predictive coding mechanisms in the brain, which were meaningfully related to individuals' fluid intelligence scores.

      Strengths:

      Recent advances in neural network models that extract visual, linguistic, and semantic features from real-world stimuli have enabled neuroscientists to build encoding models that predict brain responses from these features. Higher prediction accuracy indicates greater explained variance in neural activity, and therefore a better model of brain function. Commonly used models include AlexNet for visual features, WordNet for audio-semantic features, and CLIP for visuo-semantic features; these served as comparison models in the study. Building on this line of work, the authors developed an encoding model using VALOR, which captures the multimodal and dynamic nature of real-world stimuli. VALOR outperformed the comparison models in predicting brain responses. It also recapitulated known semantic mappings and revealed evidence of predictive processing in the brain. These findings support VALOR as a strong candidate model of brain function.

      (1) The authors argue that this modeling contributes to a better understanding of how the brain works. However, upon reading, I am less convinced about how VALOR's superior performance over other models tells us more about the brain. VALOR is a better model of the audiovisual stimulus because it processes multimodal and dynamic stimuli compared to other unimodal or static models. If the model better captures real-world stimuli, then I almost feel that it has to better capture brain responses, assuming that the brain is a system that is optimized to process multimodal and dynamic inputs from the real world. The authors could strengthen the manuscript if the significance of their encoding model findings were better explained.

      We thank the reviewer for this thoughtful comment and agree with the premise that a model preserving multimodal and temporal structure might a priori be expected to better predict brain responses to naturalistic stimuli. Our intent is not to claim that higher accuracy alone explains brain function, but rather that where and how VALOR improves prediction provides diagnostic insight into cortical processing. We have revised the Discussion to make this distinction explicit.

      Specifically, we clarify three ways in which VALOR’s gains are scientifically informative rather than merely unsurprising:

      (1) Anatomical specificity of improvement. VALOR’s advantage is not uniform across the cortex; gains are largest in regions implicated in multi-second, cross-modal integration. This spatial pattern constrains where the brain accumulates information over time and stabilizes visual representations using linguistic context.

      (2) Model as a computational probe. Beyond prediction accuracy, VALOR’s feature space recovers large-scale semantic organization without manual annotation and enables targeted tests of predictive processing. Features reflecting upcoming content selectively improve fits in specific regions, consistent with anticipatory coding during continuous narrative perception.

      (3) Link to individual differences. Individuals whose neural responses are better captured by anticipatory features show higher fluid intelligence, suggesting that VALOR indexes meaningful variability in forward-looking representations rather than merely tracking stimulus complexity.
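
      One common way to operationalize point (2), features reflecting upcoming content, is to pair the response at TR t with model features from TR t + k and ask which forward shift k gives the best held-out fit in each voxel. The sketch below illustrates that idea under simplifying assumptions (closed-form ridge regression with a fixed penalty, a single chronological train/test split, shifts measured in TRs). It is a hypothetical illustration, not necessarily the prediction-horizon procedure used in the manuscript, and all function names are placeholders.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights W minimizing ||Y - XW||^2 + alpha * ||W||^2."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def columnwise_r(A, B):
    """Pearson r between matching columns (voxels) of A and B."""
    A = (A - A.mean(0)) / A.std(0)
    B = (B - B.mean(0)) / B.std(0)
    return (A * B).mean(0)

def score_shift(X, Y, k, alpha=1.0, train_frac=0.8):
    """Held-out r per voxel when the response at TR t is predicted from features at TR t + k."""
    Xs, Ys = X[k:], Y[: len(Y) - k]  # pairs Y[t] with X[t + k]
    n_train = int(train_frac * len(Ys))
    W = ridge_fit(Xs[:n_train], Ys[:n_train], alpha)
    return columnwise_r(Xs[n_train:] @ W, Ys[n_train:])

def prediction_horizon(X, Y, max_shift, **kw):
    """For each voxel, the forward shift (in TRs) with the highest held-out r."""
    scores = np.stack([score_shift(X, Y, k, **kw) for k in range(max_shift + 1)])
    return scores.argmax(axis=0), scores
```

      Because the usable series shortens by k TRs, the held-out window shifts slightly with k; a fuller analysis would fix the test TRs across shifts and choose the ridge penalty by cross-validation.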

      Accordingly, we have revised the Discussion (p. 16) to frame VALOR as a tool for mapping cortical integration profiles, probing semantic and predictive structure, and linking representational dynamics to cognition, rather than asserting that higher encoding accuracy alone explains brain function.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time.”

      (2) In Study 3, the authors show high alignment between WordNet and VALOR feature PCs. Upon reading the Methods together with Figure 3, I suspect that the alignment almost has to be high, given that the authors projected VALOR features onto Huth et al.'s PC space. Could the authors conduct non-parametric permutation tests, such as shuffling the VALOR features prior to mapping onto Huth et al.'s PC space, and then calculating the Jaccard scores? I imagine that the null distribution would be positively shifted. Still, I would be convinced if the alignment were higher than this shifted null distribution for each PC. If my understanding of this is incorrect, I suggest editing the relevant Methods section (line 508) because this analysis was not easy to understand.

      Thank you for this helpful comment and for pointing out a potential source of confusion. We apologize that the original Methods description was not sufficiently clear. Importantly, VALOR features were never projected into the Huth et al. PC space, and no optimization or rotation toward the WordNet basis occurred at any stage.

      The analysis proceeded as follows:

      (1) VALOR PCs. We first fit voxel-wise encoding models using VALOR features on the Huth et al. dataset. We then applied PCA to the resulting cortical weight maps, yielding spatial components (‘VALOR PCs’) that summarize shared patterns of VALOR feature weights across the cortex.

      (2) WordNet PCs. We used the semantic principal components reported by Huth et al. (2012) directly as published, with no refitting, projection, or modification using VALOR.

      (3) Correspondence analysis. Only after obtaining these two independent sets of cortical maps did we threshold each map to its top-loading vertices and compute the Jaccard overlap between VALOR PCs and WordNet PCs.
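
      Step (3) can be illustrated with a short sketch: for each PC map, keep the voxels with the largest absolute loadings, then compare two such sets with the Jaccard index, |A ∩ B| / |A ∪ B|. The 10% default threshold and the use of absolute loadings (which sidesteps the arbitrary sign of principal components) are assumptions for illustration, not necessarily the thresholds used in the revised Methods; the function names are hypothetical.

```python
import numpy as np

def top_voxels(pc_map, frac=0.1):
    """Indices of the voxels with the largest absolute loadings on one PC map."""
    k = max(1, int(round(frac * pc_map.size)))
    return set(np.argsort(-np.abs(pc_map))[:k].tolist())

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two voxel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pc_overlap(maps_a, maps_b, frac=0.1):
    """Pairwise Jaccard between rows of two (n_pcs, n_voxels) arrays of PC maps."""
    tops_a = [top_voxels(m, frac) for m in maps_a]
    tops_b = [top_voxels(m, frac) for m in maps_b]
    return np.array([[jaccard(a, b) for b in tops_b] for a in tops_a])
```

      Because the two sets of maps are computed independently, the overlap matrix is only meaningful if both are expressed in the same voxel space; that registration step is assumed here.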

      Although a permutation that shuffles VALOR features prior to projection addresses a scenario that does not apply here, we agree that the Methods description should more clearly convey the independence of the two decompositions. We have therefore revised the Methods (p. 24) to describe the procedure step-by-step and explicitly state that no projection, refitting, or optimization toward the WordNet basis was performed.

      (On page 24) “We first fit voxel-wise encoding models using VALOR features for each of the five participants in the Huth et al. dataset. For each participant, this yielded a weight map linking each VALOR feature to each voxel. We then stacked these weight maps across participants to form a single voxel-by-feature weight matrix and applied principal component analysis (PCA). The top four principal components from this analysis (“VALOR PCs”) captured shared spatial patterns of VALOR feature weights across cortex. To interpret these components, we projected VALOR feature vectors from >20,000 video segments in the VALOR training set onto each VALOR PC, which revealed dominant semantic axes (e.g., mobility, sociality, civilization). For comparison, we used the semantic principal components reported by Huth et al. (2012) from their WordNet-based encoding model; these “WordNet PCs” were taken directly from the published work and were not refit or reweighted using VALOR.”
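
      The decomposition described above can be sketched as follows. Stacking participants' voxel-by-feature weight matrices and running PCA yields principal axes in VALOR's feature space. Projecting each voxel's weights onto an axis gives that component's cortical map, and projecting segment embeddings onto the same axis ranks video segments along it, which is one way dimensions such as mobility or sociality could be read off. The function names and the SVD-based PCA are assumptions for illustration.

```python
import numpy as np

def group_weight_pcs(weight_maps, n_pcs=4):
    """PCA over voxel-by-feature weight matrices stacked across participants.

    weight_maps: list of (n_voxels_i, n_features) arrays, one per participant.
    Returns (pcs, voxel_maps): pcs is (n_pcs, n_features), ordered by variance
    explained; voxel_maps is (total_voxels, n_pcs), each voxel's score on each PC.
    """
    W = np.vstack(weight_maps)          # voxels from all participants as rows
    W_centered = W - W.mean(axis=0)     # center each feature column
    _, _, Vt = np.linalg.svd(W_centered, full_matrices=False)
    pcs = Vt[:n_pcs]
    return pcs, W_centered @ pcs.T

def rank_segments(segment_feats, pc, n=5):
    """Project segment embeddings onto one PC; return (n lowest, n highest) indices."""
    scores = segment_feats @ pc
    order = np.argsort(scores)
    return order[:n], order[::-1][:n]
```

      Inspecting the segments at the two extremes of each axis is what gives a component its semantic label; the sign of each axis is arbitrary, so "low" and "high" carry no meaning on their own.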

      (3) In Study 4, the authors show that individuals whose superior parietal gyrus (SPG) exhibited high prediction distance had high fluid cognitive scores (Figure 4C). I had a hard time believing that this was a hypothesis-driven analysis. The authors motivate the analysis by noting that "SPG and PCu have been strongly linked to fluid intelligence (line 304)". Did the authors conduct only two analyses (SPG–fluid intelligence and PCu–fluid intelligence), without relating other brain regions to other individual-differences measures? Even if so, the authors should also have reported the r-value and p-value for PCu–fluid intelligence. If the SPG–fluid intelligence result is indeed specific in terms of statistical significance compared with all scenarios that were tested, is this a theoretically expected result, and could the authors explain the specificity? Also, the authors should explain why they considered fluid intelligence to be a proxy for one's ability to anticipate upcoming scenes during movie watching. I would have understood the rationale better if the authors had at least aggregated predictive scores across all brain regions that reached significance into one summary statistic and found a significant correlation with the fluid intelligence measure.

      We thank the reviewer for this careful and constructive comment and agree that greater transparency about analytic intent, specificity, and rationale is needed. We have revised the manuscript accordingly.

      (1) Analytic scope and a priori restriction. The analysis in Fig. 4C was hypothesis-driven and restricted a priori to two regions — superior parietal gyrus (SPG) and precuneus (PCu) — based on convergent evidence linking frontoparietal and medial parietal systems to fluid reasoning, relational integration, and domain-general cognitive control. Importantly, we did not conduct a whole-brain search across regions or behaviors to identify the strongest correlation post hoc.

      (2) Specificity and reporting. In response to the reviewer’s request, we now report the full results for both hypothesized regions. Prediction horizon in SPG showed a statistically reliable association with fluid intelligence, whereas PCu showed a positive but weaker trend that did not survive correction. Reporting both results makes the regional specificity explicit rather than implicit.

      (3) Why SPG over PCu? Although both regions are implicated in fluid cognition, SPG has been more consistently linked to active maintenance and manipulation of relational structure and top-down attentional control, whereas PCu is more often associated with internally oriented and mnemonic processes. We therefore interpret the stronger SPG association as consistent with a role for sustained, externally driven predictive processing during continuous perception, rather than as evidence of exclusivity.

      (4) Why fluid intelligence? We do not equate fluid intelligence with “anticipation” per se. Rather, we used gF as an a priori proxy for domain-general capacities — maintaining and updating relational context over multi-second windows, integrating multiple constraints, and exerting flexible control — that are plausibly recruited when anticipating upcoming events during naturalistic narratives. The reported relationship is associative and hypothesis-consistent, not causal.

      (5) Why not aggregate across regions? We agree that aggregation could reveal more global relationships; however, our goal in this analysis was to test whether predictive timescales in theoretically motivated control regions relate to individual differences, rather than to maximize correlation by pooling heterogeneous regions. We now clarify this rationale in the Results.

      These clarifications and additional statistics have been incorporated in the revised Results section (p. 14).

      (On page 14) “Finally, we examined whether prediction horizons were linked to individual differences in cognition. We focused on fluid intelligence (gF) because gF is widely taken to index domain-general capacities such as maintaining and updating relational context over several seconds, integrating multiple constraints, and exerting flexible top-down control — functions that should support anticipating what will happen next in a continuous narrative. We targeted two parietal regions, the SPG and the PCu, which have both been repeatedly linked to gF and high-level cognitive control in the individual-differences literature 36,37. For each participant, we correlated fluid cognition scores with that participant’s average prediction horizon in each region. As shown in Fig. 4c, individuals with longer prediction horizons in SPG showed higher fluid cognition scores (SPG: r = 0.172, FDR-corrected p = 0.047). PCu showed a similar positive trend (PCu: r = 0.111, FDR-corrected p = 0.146) but did not reach significance. These associations suggest that the ability to sustain a longer predictive timescale during naturalistic perception co-varies with broader fluid cognitive capacity. No additional brain regions or behavioral measures were examined in this analysis.”
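
      The a priori two-region analysis reported above can be expressed compactly: correlate each participant's mean prediction horizon in each ROI with gF, then apply Benjamini–Hochberg FDR correction across the two tests. The sketch below assumes Pearson correlation via scipy.stats.pearsonr and a hand-written Benjamini–Hochberg step; it illustrates the logic rather than reproducing the authors' code, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

def roi_behavior_correlations(horizons, gf):
    """Correlate each ROI's per-participant prediction horizon with gF, then FDR-correct.

    horizons: dict mapping ROI name -> (n_participants,) array of mean horizons
    gf: (n_participants,) array of fluid-intelligence scores
    Returns {roi: (r, fdr_corrected_p)}.
    """
    names = list(horizons)
    stats = [pearsonr(horizons[name], gf) for name in names]
    rs = [float(s[0]) for s in stats]
    p_fdr = bh_fdr([float(s[1]) for s in stats])
    return {name: (r, float(p)) for name, r, p in zip(names, rs, p_fdr)}
```

      With only two tests the correction is mild: the smaller p-value is at most doubled, and the larger one is unchanged.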

      Reviewer #3 (Public review):

      In this work, the authors aim to improve neural encoding models for naturalistic video stimuli by integrating temporally aligned multimodal features derived from a deep learning model (VALOR) to predict fMRI responses during movie viewing.

      Strengths:

      The major strength of the study lies in its systematic comparison across unimodal and multimodal models using large-scale, high-resolution fMRI datasets. The VALOR model demonstrates improved predictive accuracy and cross-dataset generalization. The model also reveals inherent semantic dimensions of cortical organization and can be used to evaluate the integration timescale of predictive coding.

      This study demonstrates the utility of modern multimodal pretrained models for improving brain encoding in naturalistic contexts. While not conceptually novel, the application is technically sound, and the data and modeling pipeline may serve as a valuable benchmark for future studies.

      (1) Lines 95-96: The authors claim that "cortical areas share a common space," citing references [22-24]. However, these references primarily support the notion that different modalities or representations can be aligned in a common embedding space from a modeling perspective, rather than providing direct evidence that cortical areas themselves are aligned in a shared neural representational space.

      We thank the reviewer for this important clarification. We agree that the cited works do not provide direct evidence that cortical areas themselves are aligned in a single neural representational space. Rather, they demonstrate that representations derived from different modalities can be mapped into a shared embedding space from a modeling and computational perspective.

      We have therefore revised the text to avoid overstatement and to more precisely reflect what these studies support. In the revised manuscript (p. 4), we now frame the claim in terms of a shared representational framework or feature space used for modeling, rather than implying that cortical areas themselves intrinsically share a unified neural space. This clarification aligns the conceptual claim with the scope of the cited literature.

      (On page 4) “As a result, researchers are turning to multimodal deep learning, which learns from visual, linguistic, and auditory streams to model complex brain functions. This trend is supported by neuroscience evidence that cortical responses across regions can be jointly modeled within a common representational space.”

      (2) The authors discuss semantic annotation as if it is still a critical component of encoding models. However, recent advances in AI-based encoding methods rely on features derived from large-scale pretrained models (e.g., CLIP, GPT), which automatically capture semantic structure without requiring explicit annotation. While the manuscript does not systematically address this transition, it is important to clarify that the use of such pretrained models is now standard in the field and should not be positioned as an innovation of the present work. Additionally, the citation of Huth et al. (2012, Neuron) to justify the use of WordNet-based annotation omits the important methodological shift in Huth et al. (2016, Nature), which moved away from manual semantic labeling altogether. Since the 2012 dataset is used primarily to enable comparison in study 3, the emphasis should not be placed on reiterating the disadvantages of semantic annotation, which have already been addressed in prior work. Instead, the manuscript's strength lies in its direct comparison between data-driven feature representations and semantic annotation based on WordNet categories. The authors should place greater emphasis on analyzing and discussing the differences revealed by these two approaches, rather than focusing mainly on the general advantage of automated semantic mapping.

      Thank you for this thoughtful and constructive comment. We agree with the reviewer that the field has largely transitioned away from manual semantic annotation toward features derived from large-scale pretrained models (e.g., CLIP, GPT-style architectures), and that this shift is now standard rather than a novelty of the present work.

      We have revised the manuscript to clarify this positioning. Our goal is not to claim automated semantic extraction as an innovation, but rather to demonstrate how a multimodal, temporally informed video–text model can be used as a direct feature space for voxel-wise encoding of naturalistic movie fMRI data. VALOR is used as a representative example of this broader class of pretrained models, and our emphasis is on the general modeling approach rather than on promoting a specific architecture.

      We also agree that our original discussion underemphasized the important methodological shift introduced in Huth et al. (2016, Nature), which moved away from manual semantic labeling in the context of continuous spoken narratives. We now explicitly acknowledge this work and clarify that our use of WordNet-based annotations from Huth et al. (2012) serves a different purpose: it provides an interpretable, historically grounded benchmark for comparison in Study 3, rather than a claim that semantic annotation remains necessary or state-of-the-art.

      In response to the reviewer’s suggestion, we have revised the Results (p. 10) and Discussion (p. 18) to place greater emphasis on what is revealed by directly comparing data-driven multimodal features with category-based semantic annotation under matched conditions. Specifically, we focus on how these two approaches converge at the level of large-scale semantic organization while differing in their flexibility, temporal resolution, and dependence on human-defined categories. These revisions better reflect the current state of the field and sharpen the manuscript’s central contribution as a principled comparison between modeling approaches, rather than a general argument for automated semantic mapping.

      (On page 10) “Study 3: Comparing data-driven multimodal representations with category-based semantic annotation

      A central question in naturalistic encoding is how data-driven feature representations derived from pretrained models relate to more interpretable, category-based semantic annotations that have historically been used to study cortical semantic organization. Although recent work has shown that pretrained language and vision–language models can capture semantic structure without explicit annotation, category-based approaches such as WordNet remain valuable as interpretable reference frameworks. Here, we leverage the WordNet-based semantic components reported by Huth et al. (2012) 5 not as a state-of-the-art alternative, but as a historically grounded benchmark, allowing a controlled comparison between data-driven multimodal representations and manually defined semantic categories under matched naturalistic movie stimuli.”

      (On page 18) “Study 3 demonstrates the utility of video–text alignment models for probing higher-order semantic representations during naturalistic perception. Our comparison between VALOR-derived representations and WordNet-based semantic components highlights an important distinction between data-driven and category-based approaches to modeling meaning in the brain. While multimodal pretrained models offer flexible, high-dimensional representations that capture semantic structure without explicit annotation, category-based frameworks provide interpretability and theoretical anchoring 4,48. Using WordNet-based labeling from prior work as an interpretable reference point, we show that VALOR automatically extracts semantic dimensions—including mobility, sociality, and civilization—that closely mirror those identified using manual semantic categories (Fig. 3). The observed alignment between VALOR PCs and WordNet semantic components suggests that large-scale semantic organization emerges consistently across these approaches, even though they differ in how semantic structure is defined and learned. This convergence supports the use of pretrained multimodal models as practical encoding tools for naturalistic stimuli, while also underscoring the continued value of interpretable semantic benchmarks for understanding which aspects of meaning are represented across cortex. We do not argue that semantic annotation is required for modern encoding models; rather, WordNet-based features serve here as a historically grounded and interpretable reference for contextualizing data-driven multimodal representations.”

      (3) The authors use subject-specific encoding models trained on the HCP dataset to predict group-level mean responses in an independent in-house dataset. While this analysis is framed as testing model generalization, it is important to clarify that it is not assessing traditional out-of-distribution (OOD) generalization, where the same subject is tested on novel stimuli, but rather evaluating which encoding model's feature space contains more stimulus-specific and cross-subject-consistent information that can transfer across datasets.

      We thank the reviewer for this helpful clarification and agree that the type of generalization tested here should be described more precisely. Our analysis does not assess classical within-subject out-of-distribution (OOD) generalization, in which the same individual is tested on novel stimuli.

      Instead, for each HCP participant we train a subject-specific encoding model and transfer it to predict group-mean responses in an independent in-house dataset collected at a different site, with different participants, different movies, and different acquisition conditions. This design evaluates which encoding model’s feature space contains stimulus-locked representations that are consistent across individuals and robust to changes in dataset and experimental context, rather than within-subject stimulus novelty per se.
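
      The transfer design can be sketched as follows: fit one encoding model per source-dataset participant, apply each model to the features of the new stimuli, and correlate the predictions with the group-mean response of the new cohort, voxel by voxel. The sketch assumes both datasets have already been brought into a common voxel or surface space and that features share the same dimensionality; it uses closed-form ridge regression with a fixed penalty, and the function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights W minimizing ||Y - XW||^2 + alpha * ||W||^2."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def columnwise_r(A, B):
    """Pearson r between matching columns (voxels) of A and B."""
    A = (A - A.mean(0)) / A.std(0)
    B = (B - B.mean(0)) / B.std(0)
    return (A * B).mean(0)

def transfer_scores(train_sets, X_new, Y_new_group, alpha=1.0):
    """Fit one model per source participant and score it on a new cohort's group mean.

    train_sets: list of (X_s, Y_s) pairs (TR-by-feature, TR-by-voxel) from the source dataset
    X_new: TR-by-feature regressors for the new stimuli (same feature space)
    Y_new_group: TR-by-voxel group-mean responses to the new stimuli (same voxel space)
    Returns an (n_participants, n_voxels) array of Pearson r values.
    """
    return np.stack([
        columnwise_r(X_new @ ridge_fit(X_s, Y_s, alpha), Y_new_group)
        for X_s, Y_s in train_sets
    ])
```

      Averaging these scores over source participants gives one transfer map per feature space, which can then be compared across VALOR, CLIP, AlexNet, and WordNet.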

      We have revised the Results (p. 10) and Discussion section (p. 17) to explicitly describe this analysis as a test of cross-subject and cross-dataset transferability of stimulus representations, and to clarify the distinction from traditional OOD generalization.

      (On Page 10) “Although this analysis is not a classical within-subject out-of-distribution generalization test, it evaluates the extent to which different feature spaces capture stimulus-locked representations that are consistent across subjects and transferable across datasets, stimuli, and acquisition environments.”

      (On Page 17) “By contrast, VALOR exhibited stronger generalization in a cross-cohort, cross-stimulus, and cross-site transfer evaluation.”

      (4) Within this setup, the finding that VALOR outperforms CLIP, AlexNet, and WordNet is somewhat expected. VALOR encodes rich spatiotemporal information from videos, making it more aligned with movie-based neural responses. CLIP and AlexNet are static image-based models and thus lack temporal context, while WordNet only provides coarse categorical labels with no stimulus-specific detail. Therefore, the results primarily reflect the advantage of temporally-aware features in capturing shared neural dynamics, rather than revealing surprising model generalization. A direct comparison to pure video-based models, such as Video Swin Transformers or other more recent video models, would help strengthen the argument.

      We thank the reviewer for this baseline-focused comment and agree that, in naturalistic movie paradigms, a temporally structured audiovisual model would be expected to outperform static or unimodal feature spaces. Our intent in this comparison is therefore not to claim a surprising advantage, but to isolate which inductive biases matter for cross-dataset transfer of movie-evoked neural responses.

      The baseline models were chosen deliberately to span feature spaces that are widely used and interpretable in cognitive neuroscience: AlexNet (vision-only, frame-based), WordNet (human-defined semantic categories without learned visual features), and CLIP (static image–text alignment without temporal context). Comparing VALOR against these established baselines under matched preprocessing, TR alignment, and dimensionality control allows us to attribute performance differences specifically to temporal integration and audiovisual alignment, rather than to generic model capacity.

      We agree that a direct comparison with purely visual spatiotemporal encoders (e.g., Video Swin or TimeSformer-style models) would further dissociate the contribution of temporal visual processing from cross-modal video–text alignment. We now explicitly note this as an important direction for future work and frame VALOR as one representative of a broader class of multimodal video models, rather than as a uniquely optimal solution (Discussion, p. 16).

      (On page 16) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (5) Moreover, while WordNet-based encoding models perform reasonably well within-subject in the HCP dataset, their generalization to group-level responses in the Short Fun Movies (SFM) dataset is markedly poorer. This could indicate that these models capture a considerable amount of subject-specific variance, which fails to translate to consistent group-level activity. This observation highlights the importance of distinguishing between encoding models that capture stimulus-driven representations and those that overfit to individual heterogeneities.

      Thank you for this thoughtful observation. We agree with the reviewer’s interpretation. In our analyses, WordNet-based models perform reasonably well when fit and evaluated within individual HCP participants, but their performance degrades substantially when transferred to predict group-averaged responses in the independent SFM dataset. This dissociation suggests that, while WordNet annotations capture meaningful variance at the individual level, a larger fraction of that variance may be subject-specific or idiosyncratic, and therefore does not translate into consistent, stimulus-locked responses at the group level.

      One motivation for our cross-dataset, cross-subject evaluation is precisely to distinguish encoding models that primarily capture shared stimulus-driven structure from those whose apparent performance depends more strongly on individual heterogeneity. In this context, the reduced transferability of WordNet-based models highlights a potential limitation of category-based semantic features for capturing population-consistent neural dynamics during naturalistic viewing.

      We note that this effect likely reflects multiple factors rather than a single failure mode, including differences in annotation schemes, labeling granularity, and semantic coverage across datasets. By contrast, video–text models provide time-aligned linguistic features directly from the stimulus itself, reducing reliance on dataset-specific human annotation and exhibiting stronger transfer across cohorts. We have clarified this interpretation in the revised Discussion (p. 17).

      (Page 17) “Together, these findings underscore the importance of distinguishing encoding models that primarily capture shared, stimulus-driven neural structure from those whose performance relies more heavily on subject-specific heterogeneity, particularly when evaluating generalization across participants and datasets.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Methods section, please clarify which specific layer of VALOR the 512-dimensional feature vector was extracted from.

      Thank you for this suggestion. We have revised the Methods to state explicitly that the 512-dimensional feature vector is extracted from VALOR’s joint video–text projection head, i.e., the final projection layer of the contrastive alignment module that maps video and text representations into a shared embedding space. We also clarify that these 512-D embeddings are computed at the segment/TR level and then time-aligned to the BOLD signal (Methods, p. 21).

      (On page 21) “We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.”
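      As an illustration of TR-level segmentation, the sketch below assigns movie frames to TR windows. It assumes a constant frame rate and that frame i is displayed at time i/fps. The function name `tr_segments` and the example frame rates are hypothetical. The response does not say how the VALOR pipeline handles a trailing partial TR, and this sketch simply drops it.

```python
import math


def tr_segments(n_frames: int, fps: float, tr: float) -> list:
    """Return the frame indices that fall inside each TR window.

    Frame i is shown at time i / fps and belongs to TR k when
    k * tr <= i / fps < (k + 1) * tr. Frames after the last complete
    TR are dropped (an assumption of this sketch).
    """
    n_trs = math.floor(n_frames / fps / tr)
    segments = []
    for k in range(n_trs):
        start = math.ceil(k * tr * fps)      # first frame at or after k * tr
        stop = math.ceil((k + 1) * tr * fps)  # first frame of the next TR
        segments.append(range(start, min(stop, n_frames)))
    return segments


# Hypothetical example: a 2-second clip at 24 fps with a 1-s TR.
segs = tr_segments(n_frames=48, fps=24.0, tr=1.0)
```

      Each returned range would then be passed to the video encoder as one clip, producing one 512-dimensional vector per TR.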

      (2) It would be helpful to include more detailed descriptions of the network architectures and parameters for all models used.

      Thank you for the suggestion. We have revised the Methods to include model-specific subsections for all feature spaces used (VALOR, CLIP, AlexNet, and WordNet). For each model, we now explicitly report (i) the backbone architecture and training objective, (ii) the exact feature source (layer or projection head) and output dimensionality, and (iii) how features were temporally aligned to the BOLD signal. All models were used with their publicly released pretrained parameters, without additional fine-tuning. These additions are intended to improve transparency and reproducibility (Methods, p. 21).

      (On page 21) “Movie Feature Extraction

      (1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model [24]. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features [23, 51, 52]. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit [53].

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models. Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”
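      To make the construction of the binary WordNet matrix concrete, the sketch below builds a time-by-category matrix from per-second labels. The toy hierarchy `PARENT` and the helper names are hypothetical. The sketch assumes that the expansion walks from each labeled category up to its more general WordNet ancestors (hypernyms), as in Huth et al. (2012); if the expansion instead adds descendants, the traversal would run in the opposite direction. The real feature space has 859 categories rather than eight.

```python
import numpy as np

# Hypothetical toy hierarchy: category -> its parent (hypernym) in WordNet.
PARENT = {
    "dog": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "run": "move",
}


def with_ancestors(label, parent):
    """Return the label together with every ancestor reachable via `parent`."""
    expanded = {label}
    while label in parent:
        label = parent[label]
        expanded.add(label)
    return expanded


def build_feature_matrix(annotations, categories, parent):
    """annotations: one list of labels per 1-s time point.

    Returns a binary (time x category) matrix where 1 marks that the
    category, or any of its descendants, was labeled at that time point.
    """
    col = {c: j for j, c in enumerate(categories)}
    X = np.zeros((len(annotations), len(categories)), dtype=np.int8)
    for t, labels in enumerate(annotations):
        for label in labels:
            for c in with_ancestors(label, parent):
                X[t, col[c]] = 1
    return X


annotations = [["dog", "run"], ["cat"], []]  # three 1-s time points
categories = sorted(set(PARENT) | set(PARENT.values()))
X = build_feature_matrix(annotations, categories, PARENT)
```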

      (3) In Figure 3, consider following Huth et al.'s approach by using 3-4 distinct colors to visualize semantic representations across the cortical surface more clearly.

      Thank you for this excellent suggestion. We have generated an alternative visualization using a discrete 3–4 color scheme following Huth et al. to display the semantic components on the cortical surface. This version makes the spatial correspondence between components and the boundaries between cortical territories easier to see. We now include this visualization in the Supplement (Fig. S3).

      (4) In Figure 2, the brain renderings are too small. Please consider creating a separate, enlarged figure with clearer delineation of relevant ROIs.

      We appreciate this suggestion and agree that clear delineation of ROIs is important. We evaluated larger brain renderings; however, within the multi-panel layout of Fig. 2, enlarging them compressed accompanying plots/legends and introduced visual crowding, which reduced overall readability. To preserve a balanced layout and consistent typography across panels, we have kept the current rendering size in the main text and added Fig. S4 with enlarged brain renderings showing clearer ROI boundaries for the same ROIs.

      Reviewer #2 (Recommendations for the authors):

      (1) From the introduction, I feel like naïve readers would have a hard time understanding what semantic models (e.g., WordNet) are, which the authors write are based on "labor-intensive and subjective manual annotation of semantic content". It would be straightforward to explain the process: how scientists have written descriptions or denoted categories of what's happening within a TR and transformed these into embedding vectors based on language models. This description would explain what the authors mean by "labor-intensive, time-consuming, and subjective". Related to this point, the authors seem to be using the words "semantic model/feature" and "linguistic model/feature" interchangeably, which may exacerbate the confusion.

      Thank you for this helpful suggestion. We agree that naïve readers would benefit from a clearer explanation of how “semantic” models such as WordNet are constructed and from a more precise distinction between semantic and linguistic features.

      In response, we expanded the Introduction (p. 3) to explicitly describe the process by which semantic features are generated via dense human annotation (i.e., raters label objects, actions, and events within each TR and map these labels onto a predefined ontology to form feature vectors), clarifying why this approach is labor-intensive, time-consuming, and subject to rater variability.

      To avoid disrupting the conceptual flow of the Introduction, we placed the explicit terminology clarification in the Methods section (p. 22), where feature extraction is described. There, we now define semantic features as human-annotated, category-based representations of scene content, and linguistic features as continuous language embeddings derived automatically from pretrained language or vision–language models. These revisions are intended to improve clarity and consistency for both expert and non-expert readers.

      (On page 3) “Critically, semantic models often rely on dense human annotation. In early naturalistic encoding studies, trained raters watched the stimulus and labeled what was happening within each TR or short time window—for example, identifying objects, actions, or events present in the scene. These labels were then mapped onto a predefined semantic ontology (such as WordNet), yielding high-dimensional categorical feature vectors that served as regressors in encoding models. While this approach provides interpretable semantic features, it is labor-intensive, time-consuming, and inherently subjective, as annotations depend on rater judgment, labeling guidelines, and dataset-specific conventions, limiting scalability and reproducibility.”

      (On page 22) “Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models.”

      (2) Figure 1A does not look like an accurate schematic of the encoding method. For example, shouldn't the "Train" give rise to weight matrices, and Movies come from moments at Test? I would appreciate it if this schematic figure would explain what the encoding model is to naïve readers.

      (3) Figure 1B emphasizes that VALOR is utilizing multimodal features, but does not emphasize that the model is trained on dynamic video. The current figure looks like the model extracted visual and linguistic features from a screenshot of the video, much like the CLIP model.

      Thank you for this helpful comment. We agree that the original Fig. 1A did not sufficiently clarify what is learned during training versus what is applied during testing, and that this distinction is particularly important for naïve readers unfamiliar with encoding models. We also agree that the original Fig. 1B did not sufficiently emphasize that VALOR is trained on dynamic video segments, and that the schematic could be misinterpreted as aligning a single video frame with text, similar to CLIP-style image–text models.

      We have revised Fig. 1A (p. 6) to make the encoding procedure explicit and pedagogical. Specifically, we now clearly depict that, during the training phase (HCP dataset), voxel-wise encoding models learn feature-to-voxel weight matrices from stimulus features and BOLD responses. These learned weights are explicitly labeled as voxel-wise weight matrices and visually associated with the training stage. In the testing/generalization phase (SFM dataset), we now indicate that these learned weights are held fixed and applied to features extracted from novel movies to generate predicted BOLD responses. Additional labels were added to distinguish “Training (learn weights)” from “Testing/Transfer (apply fixed weights)” and to clarify that the encoding model implements a linear mapping from stimulus features to voxel responses. We have also rewritten the Fig. 1 legend (p. 6) to explicitly explain the encoding workflow in words, including (i) the learning of voxel-specific weights during training, (ii) their reuse during cross-dataset transfer, and (iii) how generalization performance is evaluated. These changes are intended to ensure that Fig. 1A accurately reflects the encoding methodology and is understandable to readers without prior experience with encoding models.
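      The train/transfer logic shown in the revised Fig. 1A can be summarized in a few lines. This is a minimal sketch on synthetic data. It assumes closed-form ridge regression with a single fixed penalty `alpha`, and it does not reproduce the manuscript's regularization selection, feature delays, or normalization. In the cross-dataset setting, `Y_test` would correspond to group-mean SFM responses rather than a held-out run from the same participant.

```python
import numpy as np


def fit_ridge(X_train, Y_train, alpha):
    """Assumed closed-form ridge: W = (X^T X + alpha I)^-1 X^T Y (one column per voxel)."""
    A = X_train.T @ X_train + alpha * np.eye(X_train.shape[1])
    return np.linalg.solve(A, X_train.T @ Y_train)


def voxelwise_r(Y_true, Y_pred):
    """Pearson r between observed and predicted time series, separately per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))


rng = np.random.default_rng(0)
n_feat, n_vox = 20, 5
W_true = rng.normal(size=(n_feat, n_vox))

# Training phase: stimulus features and BOLD from the training movies (synthetic).
X_train = rng.normal(size=(300, n_feat))
Y_train = X_train @ W_true + 0.5 * rng.normal(size=(300, n_vox))
W = fit_ridge(X_train, Y_train, alpha=10.0)  # learned voxel-wise weight matrix

# Transfer phase: W stays fixed; only features of the new movies are needed.
X_test = rng.normal(size=(100, n_feat))
Y_test = X_test @ W_true + 0.5 * rng.normal(size=(100, n_vox))
r = voxelwise_r(Y_test, X_test @ W)  # generalization score per voxel
```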

      We have revised Fig. 1B (p. 6) to explicitly highlight the temporal nature of the video input used by VALOR. In the updated schematic, the visual stream is depicted as a sequence of consecutive frames spanning multiple seconds, grouped into a video segment, rather than as a single static image. Additional labels indicate that VALOR encodes temporally extended video clips and aligns them with corresponding textual descriptions in a shared embedding space via contrastive learning. We have also updated the figure legend (p. 6) to clarify that VALOR operates on multi-frame video segments and explicitly models temporal structure, distinguishing it from static image–text models such as CLIP. These changes are intended to make clear that VALOR’s advantage derives not only from multimodality, but also from learning representations over time.

      (4) Regarding Figure 2, why were paired t-tests conducted in one-sided comparisons? Shouldn't this be two-sided, given that there is no reason to assume one is higher or lower than another?

      Thank you for raising this point. We agree that, in the absence of a preregistered directional hypothesis, paired comparisons should be evaluated using two-sided statistical tests.

      In response, we have re-run all paired comparisons reported in Figure 2 (p. 9) using two-sided paired t-tests, recomputed the corresponding p-values and false discovery rate (FDR) corrections, and updated the significance markers in the figure and captions accordingly. Importantly, this change does not alter the qualitative pattern of results or the main conclusions reported in the manuscript.
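      For readers who want to reproduce this type of analysis, the sketch below runs two-sided paired t-tests with SciPy's `ttest_rel` for several model pairs and then applies a Benjamini-Hochberg FDR correction. The per-subject scores are synthetic. The choice of Benjamini-Hochberg is our assumption, because the response states only that FDR correction was applied.

```python
import numpy as np
from scipy import stats


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D sequence of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps q-values monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


# Synthetic per-subject encoding accuracies (not the study's data).
rng = np.random.default_rng(1)
valor = rng.normal(0.30, 0.05, size=20)
baselines = {
    "CLIP": valor - rng.normal(0.05, 0.05, size=20),
    "AlexNet": valor - rng.normal(0.08, 0.05, size=20),
    "WordNet": valor - rng.normal(0.02, 0.05, size=20),
}
# Two-sided paired test: no direction is assumed in advance.
pvals = [stats.ttest_rel(valor, b, alternative="two-sided").pvalue
         for b in baselines.values()]
qvals = bh_fdr(pvals)
```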

      (5) Regarding Study 4, I am curious whether the results are specific to forward-looking representations (predictive coding) or whether the results broadly reveal regions that are sensitive to contexts. For example, if the authors were to incorporate nearby past scenes in the analysis rather than the nearby future scenes, would different brain regions light up?

      Thank you for this thoughtful question. We agree that it is important to distinguish forward-looking (predictive) representations from more general sensitivity to temporal context. In Study 4, we deliberately operationalized prediction using future-aligned features, such that only information from upcoming scenes was incorporated into the encoding model. Accordingly, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic context sensitivity.

      To make this interpretive scope explicit, we have added a clarifying sentence at the beginning of the Study 4 paragraph in the Discussion (p. 18), noting that our analysis incorporates only future-aligned features and that directly contrasting past- and future-aligned features will be an important direction for future work. This clarification is intended to clearly bound our claims while addressing the reviewer’s conceptual distinction.

      (On page 18) “In Study 4, we used a video-text alignment model to investigate predictive coding mechanisms. Because our analysis incorporates only future-aligned features, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic sensitivity to temporal context; directly contrasting past- and future-aligned features will be an important direction for future work.”
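      The distinction between future- and past-aligned features comes down to the direction of a time shift applied to the feature matrix. The sketch below is illustrative only. The shift size `k` (in TRs), the zero-padding at the edges, and the function names are assumptions. `past_aligned` corresponds to the contrast proposed as future work and is not an analysis reported here.

```python
import numpy as np


def future_aligned(features, k):
    """Row t holds the features of TR t + k; the last k rows are zero-padded."""
    if k < 1:
        raise ValueError("k must be a positive number of TRs")
    out = np.zeros_like(features)
    out[:-k] = features[k:]
    return out


def past_aligned(features, k):
    """Row t holds the features of TR t - k; the first k rows are zero-padded."""
    if k < 1:
        raise ValueError("k must be a positive number of TRs")
    out = np.zeros_like(features)
    out[k:] = features[:-k]
    return out


F = np.arange(10).reshape(5, 2)  # 5 TRs x 2 hypothetical feature dimensions
```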

      (6) In the paragraph starting in line 447, were WordNet feature time series also reduced to 512 dimensions like the rest of the model features?

      Thank you for the question. In the main analyses, WordNet feature time series were not reduced to 512 dimensions and were instead used at their full dimensionality (859 features).

      For comparability with the other feature spaces, we additionally conducted a control analysis in which WordNet features were reduced to 512 dimensions using PCA. The PCA was fit within each training fold to avoid information leakage, and the resulting 512-D features were evaluated using the same encoding pipeline. This PCA-reduced version performed slightly worse than the full 859-D WordNet representation. Accordingly, we report results from the full 859-D WordNet features in the main text. We have clarified this point in the Methods section (p. 22).

      (On page 22) “We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text.”
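      The leakage-safe PCA control can be sketched as follows. Principal axes are estimated from the training time points of each fold and then applied unchanged to the held-out time points. The sketch uses a synthetic binary matrix, five contiguous folds, and NumPy's SVD, and these choices are assumptions: the response does not specify the number of folds or how they were structured.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means and the top principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto axes estimated elsewhere, centering with the same mean."""
    return (X - mean) @ components.T


# Synthetic binary stand-in for the 859-column WordNet matrix.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 859)).astype(float)
n_keep = 512

# Assumed: 5 contiguous folds, so neighbouring TRs stay in the same fold.
for test_idx in np.array_split(np.arange(len(X)), 5):
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    mean, comps = pca_fit(X[train_idx], n_keep)       # training TRs only
    Z_train = pca_transform(X[train_idx], mean, comps)
    Z_test = pca_transform(X[test_idx], mean, comps)  # held-out TRs never shape the axes
    # ...fit the encoding model on Z_train and evaluate it on Z_test...
```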

      (7) I don't think authors have written what VALOR stands for.

      Thank you for the reminder. We now define the VALOR acronym at its first mention in the Abstract and Introduction and use the abbreviation thereafter.

      (On page 2) “Using a state-of-the-art deep learning model (VALOR; Vision-Audio-Language Omni-peRception)”

      (On page 5) “To answer this, we apply a video-text alignment encoding framework, using VALOR (Vision-Audio-Language Omni-peRception)—a high-performing, open-source model that aligns visual and linguistic features over time—to predict brain responses during movie watching.”

      (8) When calculating equation (3), please make sure that the correlation values are Fisher's r-to-z transformed.

      Thank you for this reminder. We confirm that all correlation coefficients used in Equation (3) are now Fisher r-to-z transformed prior to any averaging, contrasts, or statistical testing, and this procedure is now explicitly stated in the Methods. We have also updated Fig. 4a (p. 15) to reflect this transformation. Importantly, applying the r-to-z transformation does not change the qualitative pattern of results or their statistical significance.
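      The sketch below shows why the transform matters before averaging. It averages correlations in Fisher z space (z = arctanh r) and maps the mean back to r. Equation (3) is not reproduced here, so the sketch covers only the transform step. The clipping bound is an assumption added to avoid infinite z at |r| = 1.

```python
import numpy as np


def fisher_mean(r, axis=None):
    """Average correlations in Fisher z space and map the mean back to r."""
    # Assumed clipping: arctanh(+/-1) is infinite.
    r = np.clip(np.asarray(r, dtype=float), -0.999999, 0.999999)
    z = np.arctanh(r)
    return np.tanh(z.mean(axis=axis))
```

      Because arctanh stretches large correlations, the z-space mean of 0.2 and 0.8 is about 0.57 rather than 0.5.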

      (9) I wasn't able to check the OSF data/codes because it required permission.

      Thank you for flagging this, and we apologize for the inconvenience. We have removed the permission restriction and set the OSF repository to public read-only access, which should resolve the issue.

      Reviewer #3 (Recommendations for the authors):

      (1) The current approach extracts features from a single "best" layer of each model, which may be suboptimal for predicting neural responses. Prior work has shown that combining features across multiple layers through optimized fusion strategies (e.g., St-Yves et al., 2023) or using model ensembles (e.g., Li et al., 2024) can substantially improve encoding performance. The authors may consider these more comprehensive approaches either as additional baselines or as alternative directions to enhance model accuracy.

      Thank you for this constructive suggestion. We agree that combining features across multiple layers or using optimized fusion and ensemble strategies, as demonstrated in recent work (e.g., St-Yves et al., 2023; Li et al., 2024), can substantially improve absolute encoding performance.

      In the present study, however, we intentionally evaluated each model using its single best-performing layer within a matched encoding pipeline. This design choice was made to maintain model-agnostic comparability and interpretability, and to ensure that performance differences could be attributed primarily to the type of representation (e.g., temporally informed video–text features versus static or unimodal features), rather than to differences in model complexity, parameter count, or fusion strategy. Importantly, this constraint was applied uniformly across all models and therefore does not favor VALOR over the baselines.

      We now explicitly note in the Discussion (p. 19) that multilayer fusion and ensemble approaches represent a natural and promising extension of our framework and are likely to further improve absolute prediction accuracy. Our goal in the current work was to establish the practical utility and generalizability of temporally aligned video–text features for naturalistic movie fMRI under a controlled and comparable evaluation setting.

      (On page 19) “Third, for comparability across models we evaluated each model using its single best-performing layer within a matched encoding pipeline rather than using multilayer fusion or ensembling, which allowed us to attribute performance differences to representational format but likely underestimates the absolute performance ceiling.”

      (2) Given the naturalistic video-based task, the manuscript would benefit from including state-of-the-art video-only models (e.g., Video Swin Transformer, VideoMAE, and other more recent architectures) as explicit baselines. These models are designed to capture spatiotemporal structure without relying on language input and would provide a more targeted comparison to assess the specific contribution of temporal visual processing.

      Thank you for this thoughtful suggestion. We agree that state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE) are highly relevant baselines for naturalistic movie paradigms and would provide a more targeted comparison for isolating the contribution of temporal visual processing independent of language input.

      In the present study, our primary goal was not to exhaustively benchmark all possible video architectures, but to evaluate whether temporally informed video–text features can serve as a practical and general-purpose encoding framework that improves upon the models most commonly used in cognitive neuroscience for naturalistic fMRI (e.g., AlexNet for vision, WordNet for semantic annotation, and CLIP for static multimodal alignment). Using these established baselines allowed us to place our results in direct continuity with prior neuroimaging work and to attribute performance differences to representational format under a controlled encoding pipeline.

      We agree that incorporating modern video-only spatiotemporal encoders is an important next step, particularly for disentangling the relative contributions of temporal visual structure and cross-modal video–text alignment. We now explicitly note this point in the Discussion (p. 19) as a limitation and future direction, and view such comparisons as a natural extension of the current framework within the same TR-aligned encoding setup.

      (On page 19) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (3) An additional consideration is the scale of the AI models used for feature extraction. Previous studies (e.g., Matsuyama et al., 2023) have indicated that model size - particularly the number of parameters - can influence neural prediction performance, independently of architecture. A discussion or analysis of how model size contributes to the observed encoding gains would help clarify whether improvements are due to the representational quality of the model or simply its scale.

      Thank you for this important point. We agree that model scale—particularly parameter count—can influence neural prediction performance independently of architecture, as noted in prior work (e.g., Matsuyama et al., 2023).

      In the present study, our primary goal was to evaluate whether temporally informed video–text representations provide practical advantages over unimodal and static multimodal baselines that are widely used in cognitive neuroscience for naturalistic movie fMRI, under a matched encoding pipeline. We did not perform a systematic scale-controlled analysis in this revision because doing so would require training or evaluating multiple size-matched variants across video-only and video–text architectures, which is beyond the scope of the current work.

      We therefore agree that part of the observed performance gains may reflect model capacity in addition to representational format, and we caution against attributing all improvements solely to cross-modal alignment or temporal structure. We now explicitly acknowledge this limitation in the Discussion and note that comparing size-matched video-only and video–text models within the same pipeline is an important next step for disentangling model scale from representational content.

      (On page 19) “Finally, part of VALOR’s advantage may reflect model capacity: larger pretrained models often yield higher encoding accuracy, so repeating these analyses with size-matched image-only and image–text models will be critical for disentangling model scale from representational content.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the current study, Huang et al. examined ACC responses during a novel discrimination-avoid task. The authors concluded that ACC neurons primarily encode post-action variables over extended periods, reflecting the animal's preceding actions rather than the outcomes or values of those actions. Specifically, they identified two subgroups of ACC neurons that responded to different aspects of the actions. This work represents admirable efforts to investigate the role of ACC in task-performing mice. However, in my opinion, alternative explanations of the data were not sufficiently explored, and some key findings were not well supported.

      Strengths:

      The development of the new discrimination-avoid task is applauded. Single-unit electrophysiology in task-performing animals represents admirable efforts and the datasets are valuable. The identification of different groups of encoding neurons in ACC can be potentially important.

      Weaknesses:

      One major conclusion is that ACC primarily encodes the so-called post-action variables (specifically shuttle crossing). However, only a single example session was included in Figure 2, while in Supplementary Figure 2 a considerable fraction of ACC neurons appears to respond to either the onset of movement or ramp up their activity prior to movement onset. How did the authors reach the conclusion that ACC neurons preferentially respond to shuttle crossing?

      We now include more example sessions and the main results from individual animals (Fig. 3; Figs. S2–S3; Fig. 8). Overall, the results are consistent across recording sessions and animals.

      While shuttle crossings were the primary reference for most analyses, using shuttle initiation as a reference led to similar conclusions (Fig. 4). Namely, we found that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig. 4), while only a subset exhibits ramping pre-shuttle activity (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.
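      Re-referencing the analysis from shuttle crossing to shuttle initiation amounts to computing the same peri-event time histogram (PETH) with a different list of event times. The sketch below is illustrative. The window, bin width, and function name are assumptions and are not the parameters used in the study.

```python
import numpy as np


def peth(spike_times, event_times, window=(-2.0, 4.0), bin_size=0.1):
    """Mean firing rate (spikes/s) in bins around each event time."""
    edges = np.arange(window[0], window[1] + bin_size / 2, bin_size)
    counts = np.zeros(len(edges) - 1)
    for t0 in event_times:
        counts += np.histogram(spike_times - t0, bins=edges)[0]
    return edges[:-1], counts / (len(event_times) * bin_size)


# Toy data: one spike 50 ms after each of two hypothetical events.
spikes = np.array([10.05, 20.05])
events = np.array([10.0, 20.0])
t, rate = peth(spikes, events)
```

      Comparing a neuron's PETH aligned to a list of crossing times with its PETH aligned to a list of initiation times (both lists hypothetical here) separates post-shuttle responses from pre-shuttle ramping.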

      In Figure 4, it was concluded that ACC neurons respond to action independent of outcome. Since these neurons are active on both correct and incorrect shuttle but not stay trials, they seem to primarily respond to overt movement. If so, the rationale for linking ACC activity and adaptive behavior/associative learning is not very clear to me. Further analyses are needed to test whether their firing rates correlated with locomotion speed or acceleration/deceleration. On a similar note, to what extent are the action state neurons actually responding to locomotion-related signals? And can ACC activity actually differentiate correct vs. incorrect stays?

      In this study, we highlight two distinct groups of ACC neurons: action-state and action-content neurons. Both groups of neurons tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors, suggesting that their activity is not directly driven by locomotion. Furthermore, action-content neurons are selectively engaged in only one of the two shuttle categories, either rooms A→B or B→A shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns. Finally, we analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of neurons (<15%) show speed-correlated activity (Fig.5), suggesting that most ACC neurons do not encode movement-related information. Taken together, these findings support the distinction between ACC activity and locomotion encoding.
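
      The speed analysis described above can be illustrated with a short sketch. This is not the authors' Methods: it assumes firing rate and locomotion speed have already been binned on a common time grid, and the circular-shift null with a 95th-percentile cutoff, as well as the function names `pearson_r` and `is_speed_correlated`, are hypothetical choices made for illustration.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def is_speed_correlated(rate, speed, n_shifts=500, seed=0):
    """Flag a neuron whose |r| with speed exceeds the 95th percentile of a
    circular-shift null (hypothetical criterion, for illustration only)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(rate, speed))
    n = len(rate)
    null = []
    for _ in range(n_shifts):
        k = rng.randrange(1, n)           # shift by at least one bin
        shifted = rate[k:] + rate[:k]     # preserves the rate's own autocorrelation
        null.append(abs(pearson_r(shifted, speed)))
    null.sort()
    threshold = null[(95 * n_shifts) // 100 - 1]
    return observed > threshold, observed
```

      The circular shift keeps each spike train's temporal structure intact while breaking its alignment with speed, which is why it is a common choice of null for slowly varying signals.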

      As for the small subset of speed-related neurons, it remains unclear whether they represent a distinct subpopulation within the ACC or reflect recordings from the nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex (Fig. S2). Therefore, it is possible that these speed-related neurons originated from the motor cortex.

      Lastly, given that the ACC neurons display no or limited activity during stay trials, their activity generally does not differentiate correct vs. incorrect stays (Fig.S7). However, ACC activity does show moderate differentiation between room-A vs. room-B stays (Fig.S7).

      Given that a considerable amount of ACC neurons encode 'action content', it is not surprising that by including all neurons the model is able to make accurate predictions in Figure 6. How would the model performance change by removing the content neurons?

      We thank the reviewer for this thoughtful analysis idea. Excluding action-content neurons drastically reduces decoding accuracy (Fig.8), suggesting that they are the main drivers for differentiating rooms A→B vs. B→A shuttles.

      Moving on to Figure 7. Since Figure 4 showed that ACC neurons respond to movement regardless of outcome, it is somewhat puzzling how ACC activity can be linked to future performance.

      As discussed earlier (point #2), ACC activity does not simply reflect locomotion itself. We interpret the post-shuttle ACC activity as encoding both the preceding shuttle state (shuttle or stay) and shuttle content (rooms A→B or B→A). Regardless of the outcome (safety or shock), such encoding is essential for cue–action–outcome associative learning, because both positive and negative feedback can drive learning. The level of post-shuttle ACC activity may reflect task engagement, with greater engagement facilitating learning and improving future performance.

      Two mice contributed about 50% of all the recorded cells. How robust are the results when analyzing mouse by mouse?

      We have added further analyses highlighting the results from each mouse. Although the total number of recorded neurons varied across mice, the major findings were consistent. In every mouse, we observed sustained post-shuttle ACC activity (Fig.S2), and population-level ACC activity reliably decoded shuttle content (rooms A→B vs. B→A; Fig.8).

      Lastly, the development of the new discrimination-avoid task is applauded. However, a major missing piece here is to show the importance of ACC in this task and what aspects of this behavior require ACC.

      We appreciate this feedback. We are currently conducting additional experiments to determine whether inhibiting ACC activity during distinct time windows disrupts task learning. We hope to publish a follow-up paper on these findings in the near future.

      Reviewer #2 (Public review):

      Summary:

      The current dataset utilized a 2x2 factorial shuttle-escape task in combination with extracellular single-unit recording in the anterior cingulate cortex (ACC) of mice to determine ACC action coding. The contributions of neocortical signaling to action-outcome learning as assessed by behavioral tasks outside of the prototypical reward versus non-reward or punished vs non-punished is an important and relevant research topic, given that ACC plays a clear role in several human neurological and psychiatric conditions. The authors present useful findings regarding the role of ACC in action monitoring and learning. The core methods themselves - electrophysiology and behavior - are adequate; however, the analyses are incomplete since ruling out alternative explanations for neural activity, such as movement itself, requires substantial control analyses, and details on statistical methods are not clear.

      Strengths:

      (1) The factorial design nicely controls for sensory coding and value coding, since the same stimulus can signal different actions and values.

      (2) The figures are mostly well-presented, labeled, and easy to read.

      (3) Additional analyses, such as the 2.5/7.5s windows and place-field analysis, are nice to see and indicate that the authors were careful in their neural analyses.

      (4) The n-trial + 1 analysis where ACC activity was higher on trials that preceded correct responses is a nice addition, since it shows that ACC activity predicts future behavior, well before it happens.

      (5) The authors identified ACC neurons that fire to shuttle crossings in one direction or to crossings in both directions. This is very clear in the spike rasters and population-scaled color images. While other factors such as place fields, sensory input, and their integration can account for this activity, the authors discuss this and provide additional supplemental analyses.

      Weaknesses:

      (1) The behavioral data could use slightly more characterization, such as separating stay versus shuttle trials.

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1). Additionally, we provide new data from extended training sessions (Fig.S2).

      (2) Some of the neural analyses could use the necessary and sufficient comparisons to strengthen the authors' claims.

      We have now used the necessary and sufficient comparisons where applicable. In the SVM decoding analysis, we show that population ACC activity is sufficient to decode A→B or B→A shuttles. We also show that excluding action-content neurons, but not other ACC neurons, drastically reduces decoding accuracy, suggesting that these neurons are necessary for the decoding (Fig.8).
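
      The exclusion test can be sketched without external libraries. The authors used an SVM; purely for illustration, this sketch substitutes a nearest-centroid decoder with leave-one-out validation. The toy firing rates (neuron 0 standing in for an action-content neuron) and the helper names are invented.

```python
def centroid(rows):
    """Mean feature vector of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(trials, labels, keep):
    """Leave-one-out accuracy of a nearest-centroid decoder that only sees
    the neurons whose indices are listed in `keep`."""
    X = [[trial[j] for j in keep] for trial in trials]
    correct = 0
    for i in range(len(X)):
        train = [(x, y) for k, (x, y) in enumerate(zip(X, labels)) if k != i]
        classes = sorted({y for _, y in train})
        cents = {c: centroid([x for x, y in train if y == c]) for c in classes}
        predicted = min(classes, key=lambda c: sq_dist(X[i], cents[c]))
        correct += predicted == labels[i]
    return correct / len(X)


# Toy post-shuttle rates (spikes/s) for three neurons; neuron 0 is
# direction-selective, neurons 1 and 2 are not.
trials = [[10, 5, 5], [11, 6, 4], [9, 5, 6], [10, 4, 5],
          [2, 5, 5], [1, 6, 4], [3, 4, 6], [2, 5, 5]]
labels = ["A->B"] * 4 + ["B->A"] * 4

all_neurons = loo_accuracy(trials, labels, keep=[0, 1, 2])
without_content = loo_accuracy(trials, labels, keep=[1, 2])
print(all_neurons, without_content)
```

      Dropping the direction-selective neuron leaves only uninformative features, so accuracy falls; comparing accuracy with and without a neuron class is the logic behind the "necessary" part of the argument.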

      (3) Many of the neural analyses seem to utilize long time windows, not leveraging the very real strength of recording spike times. Specifics on the exact neural activity binning/averaging, tests, classifier validation, and methods for quantification are difficult to find.

      We chose to perform our neural analyses on a longer time scale, given the sustained activity we see in the data. To further justify that decision, we now provide additional results highlighting the sustained activity of ACC neurons in our task (Fig.2; Fig.S2). Additionally, we now provide more specifics of the neural analyses in the Methods section.

      (4) The neural analyses seem to suggest that ACC neurons encode one variable or the other, but are there any that multiplex? Given the overwhelming evidence of multiplexing in the ACC a bit more discussion of its presence or absence is warranted.

      This is an interesting point of discussion, and we thank the reviewer for pointing this out. Overall, our results suggest that individual ACC neurons preferentially engage in only one of the proposed functions, rather than multiplexing across them. For example, action-state and action-content ACC neurons primarily engage in action monitoring, but not in decision-making, planning, or outcome tracking. Nevertheless, we cannot rule out the possibility that other ACC neurons, through their distinct connectivity or location in different ACC subregions, engage in other proposed functions. Thus, when considering the ACC as a whole, its function may still be multiplexed.

      Another possible reason we do not see clear multiplexing is the dynamic nature of our task. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation. Since values are not fixed and change based on context, value-related responses may not be reflected in the ACC in our task.

      We have now incorporated the above discussions into our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors record from the ACC during a task in which animals must switch contexts to avoid shock as instructed by a cue. As expected, they find neurons that encode context, with some encoding of actions prior to the context, and encoding of neurons post-action. The primary novelty of the task seems to be dynamically encoding action-outcome in a discrimination-avoidance domain, while this is traditionally done using operant methods. While I'm not sure that this task is all that novel, I can't recall this being applied to the frontal cortex before, and this extends the well-known action/context/post-context encoding of ACC to the discrimination-avoidance domain.

      While the analysis is well done, there are several points that I believe should be elaborated upon. First, I had questions about several details (see point 3 below). Second, I wonder why the authors downplayed the clear action coding of ACC ensembles. Third, I wonder if the purported 'novelty' of the task (which I'm not sure of) and pseudo-debate on ACC's role undermines the real novelty - action/context/outcome encoding of ACC in discrimination-avoidance and early learning.

      Strengths:

      Recording frontal cortical ensembles during this task is particularly novel, and the analyses are sophisticated. The task has the potential to generate elegant comparisons of action and outcome, and the analyses are sophisticated.

      Weaknesses:

      I had some questions that might help me understand this work better.

      (1) I wonder if the field would agree that there is a true 'debate' and 'controversy' about the ACC and conflict monitoring, or if this is a pseudodebate (Line 34). They cite 2 very old papers to support this point. I might reframe this in terms of the frontal cortex studying action-outcome associations in discrimination-avoidance, as the bulk of evidence in rodents comes from overtrained operant behavior, and in humans comes from high-level tasks, and humans are unlikely to get aversive stimuli such as shocks.

      We appreciate this feedback. We have revised the Introduction and Discussion.

      (2) Does the purported novelty of the task undermine the argument? While I don't have an exhaustive knowledge of this behavior, the novelty involves applying this ACC. There are many paradigms where a shock triggers some action that could be antecedents to this task.

      We argue that our newly designed discrimination–avoidance task is unique for several reasons. First, it requires animals to discriminate both sensory cues and environmental contexts. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation, which reflects a conceptual advance over previous techniques. Furthermore, by removing valence from the cues, this design helps disentangle the ACC’s potential role in value encoding from other cognitive functions.

      Second, this task involves robust, ethologically relevant actions (i.e., shuttles), unlike many established paradigms that rely on less naturalistic behaviors such as saccades or lever presses. We view this as a key distinction from prior approaches, as even previous paradigms that use shuttling or other naturalistic responses fail to incorporate dynamic integration of cues and contexts.

      Finally, the clear temporal separation between actions and outcomes further helps disentangle the ACC’s roles in action monitoring vs. outcome tracking.

      (3) The lack of details was confusing to me:

      (a) How many total mice? Are the same mice in all analyses? Are the same neurons? Which training day? Is it 4 mice in Figure 3? Five mice in line 382? An accounting of mice should be in the methods. All data points and figures should have the number of neurons and mice clearly indicated, along with a table. Without these details, it is challenging to interpret the findings.

      We are sorry for the confusion. We now provide additional details and clear N numbers for each analysis to improve clarity.

      (b) How many neurons are from which stage of training? In some figures, I see 325, in some ~350, and in S5/S2B, 370. The number of neurons should be clearly indicated in each figure, and perhaps a table.

      All data were obtained from well-trained mice. For some analyses, the N is smaller because certain task sessions contained very few incorrect trials (≤3), which prevented us from examining ACC activity during those trials. We have modified the figure legends so that the neuron counts are clear.

      (c) Were the tetrodes driven deeper each day? The depth should be used as a regressor in all analyses?

      Yes, the tetrodes were driven slightly deeper across task sessions (~80 µm per step; 2–4 depths per mouse). Given limited depth changes, preliminary analyses indicate no clear differences in ACC activity across these recording depths. However, we cannot rule out potential dorsal–ventral subregion differences if recordings were to span larger depth ranges.

      (d) Was it really ACC (Figure 2A)? Some shanks are in M2? All electrodes from all mice need to be plotted as a main figure with the drive length indicated.

      We have now included a supplementary figure showing all recording sites (Fig.S2). It is likely that a small subset of neurons was recorded at the ACC/M2 border area. Unfortunately, we are unable to separate them out due to the blind recording design of our tetrode arrays.

      (e) It's not clear which sessions and how many go into which analysis

      We have now specified the number of task sessions for each analysis (see Methods).

      (f) How many correct and incorrect trials (<7?) are there per session?

      We have now specified the number of correct and incorrect trials per session (see Methods).

      (g) Why 'up to 10 shocks' on line 358? What amplitudes were tried? What does scrambled mean?

      We decided to use up to 10 mild shocks per trial because mice do not necessarily shuttle to the safe room after one or even a few shocks during the early stages of training. This design allows mice to efficiently learn the concept of the task (i.e., one room is safe while the other delivers shocks). The shock parameters (0.5 mA, 0.1 s per shock) are specified in the Methods section. A “scrambled shock” refers to an electric shock delivered through multiple floor bars in a randomized pattern, effectively preventing the animal from avoiding the stimulus.

      (4) Why do the authors downplay pre-action encoding? It is clearly evident in the PETHs, and the classifiers are above chance. It's not surprising that post-shuttle classification is so high because the behavior has occurred. This is most evident in Figure S2B, which likely should be a main figure.

      We did not intend to downplay pre-action encoding. Our analysis shows that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig.4). Although a subset of ACC neurons exhibits ramping pre-shuttle activity, they represent a much smaller fraction (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.

      (5) The statistics seem inappropriate. A linear mixed effects model accounting for between-mouse variance seems most appropriate. Statistical power or effect size is needed to interpret these results. This is important in analyses like Figure 7C or 6B.

      We appreciate this feedback. We now use appropriate statistics and report effect size.

      (6) Better behavioral details might help readers understand the task. These can be pulled from Figures S2 and S5. This is particularly important in a 'novel' task.

      We now provide more details to help better understand the task and have added new figures (Fig.1; Figs. S1&S2).

      (7) Can the authors put post-action encoding on the same classification accuracy axes as Figure 6B? It'd be useful to compare.

      We appreciate the comment, but we are unsure what clarification is being requested.

      (8) What limitations are there? I can think of several - number of animals, lack of causal manipulations, ACC in rodents and humans.

      We now include a discussion of the limitations of our study. One caveat is that the discrimination–avoidance task requires weeks of training in mice. By the time they master the task, ACC activity may reflect modified neural circuits. Investigating ACC activity during the early phase of learning, such as by introducing a new pair of cues or contexts, could provide further insights into the ACC’s role in learning and cognitive processes. Additionally, the current study lacks evidence for a causal role of post-action ACC activity in complex associative learning. Future investigations using closed-loop strategies to selectively disrupt ACC activity during the post-action phase could help address this question.

      Minor:

      (1) Each PCA analysis needs a scree plot to understand the variance explained.

      We have added a scree plot for each PCA analysis.

      (2) Figure 4C - y and x-axes have the same label?

      We have corrected the y-axis label.

      (3) What bin size do the authors use for machine learning (Not clear from line 416)?

      The bin sizes used were 2.5, 5, 7.5, or 10 s; these are now described in the Methods section.
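
      As a rough sketch of what a window-averaged rate means here, the code below counts spikes in a window following each event and divides by the window length. It assumes each window starts at the shuttle crossing and is half-open; the window placement, the `offset` parameter, and the function names are illustrative assumptions rather than details taken from the Methods.

```python
def window_rate(spike_times, event_time, width, offset=0.0):
    """Mean firing rate (spikes/s) in [event_time + offset, event_time + offset + width)."""
    start = event_time + offset
    stop = start + width
    n_spikes = sum(1 for t in spike_times if start <= t < stop)
    return n_spikes / width


def rates_per_window(spike_times, event_times, widths=(2.5, 5.0, 7.5, 10.0)):
    """Post-event rate for every event, for each window width (seconds)."""
    return {w: [window_rate(spike_times, e, w) for e in event_times] for w in widths}


spikes = [0.5, 1.0, 1.5, 3.0, 6.0, 9.9, 10.0]   # toy spike times (s)
print(rates_per_window(spikes, event_times=[0.0]))
```

      Longer windows average over more spikes and smooth out fast dynamics, which suits sustained responses but can hide transient ones.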

      (4) Why not just use PCA instead of 'dimension reduction' (of which there are many?)

      We have adjusted the phrasing where appropriate.

      (5) Would a video enhance understanding of the behavior?

      We appreciate this feedback. We now include a few videos to accompany our paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Is Figure 1C sufficiently powered?

      We have now included data from additional mice and updated the figure accordingly.

      (2) Task performance was not plateaued after 10 sessions in Figure 1B. How variable is task performance in the datasets with ephys recordings (session to session, mouse to mouse).

      We have now included additional data from extended training (15 sessions; Fig.S2). Moderate variation is observed across both sessions and mice. Specifically, the total numbers of correct/incorrect shuttles used for ephys analysis were 19/5, 19/4, 21/5, 20/4 (mouse #1; 4 sessions); 20/7, 23/7, 20/7 (mouse #2; 3 sessions); 19/4, 16/2 (mouse #3; 2 sessions); 26/4, 23/4, 17/6, 25/5 (mouse #4; 4 sessions); 20/5, and 17/4 (mouse #5; 2 sessions), respectively.
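
      The per-session counts listed above can be tallied per mouse in a few lines. The numbers are copied from this response; the only thing computed is the fraction of shuttles that were correct (stay trials are not part of these counts).

```python
# (correct, incorrect) shuttles per recording session, as listed above
sessions = {
    "mouse 1": [(19, 5), (19, 4), (21, 5), (20, 4)],
    "mouse 2": [(20, 7), (23, 7), (20, 7)],
    "mouse 3": [(19, 4), (16, 2)],
    "mouse 4": [(26, 4), (23, 4), (17, 6), (25, 5)],
    "mouse 5": [(20, 5), (17, 4)],
}


def summarize(sessions):
    """Per-mouse session count, total correct, total incorrect, fraction correct."""
    summary = {}
    for mouse, rows in sessions.items():
        correct = sum(c for c, _ in rows)
        incorrect = sum(i for _, i in rows)
        summary[mouse] = (len(rows), correct, incorrect, correct / (correct + incorrect))
    return summary


for mouse, (n, c, i, frac) in summarize(sessions).items():
    print(f"{mouse}: {n} sessions, {c} correct / {i} incorrect shuttles ({frac:.0%} correct)")
```

      Across the 15 listed sessions this gives 305 correct and 73 incorrect shuttles.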

      (3) Please quantify the results in Figure 3, for both within individual mice and across mice.

      We have calculated maximum trajectory length within the 3-D space (Fig. 3C).
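
      The response does not define "maximum trajectory length" further, so the sketch below shows two plausible readings for a trajectory of 3-D points (for example, scores on the first three principal components over time): the cumulative path length and the largest displacement from the starting point. Both definitions, and the function names, are assumptions.

```python
import math


def path_length(traj):
    """Cumulative Euclidean length along consecutive 3-D points."""
    return sum(math.dist(p, q) for p, q in zip(traj, traj[1:]))


def max_displacement(traj):
    """Largest Euclidean distance of any point from the first (baseline) point."""
    return max(math.dist(traj[0], p) for p in traj)


# Toy trajectory in (PC1, PC2, PC3) coordinates.
traj = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(path_length(traj), max_displacement(traj))
```

      Path length grows with any wandering, including noise, whereas maximum displacement measures how far the population state moves away from baseline, so the two can rank conditions differently.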

      (4) What is the effect size in Figure 7C?

      We now report the effect size.
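
      The effect-size measure is not named here. As one common option for comparing two independent groups (for example, ACC activity preceding correct vs. incorrect next trials), Cohen's d with a pooled standard deviation can be computed as below. This choice is an assumption; paired comparisons or mixed-effects designs would call for a different measure.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


print(cohens_d([3, 4, 5], [1, 2, 3]))  # toy numbers: means differ by 2, pooled SD is 1
```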

      (5) Please provide more details for spike sorting.

      We have now included more details in the Methods section.

      (6) More detailed cell type or correlation analysis in Figures 4 and 5 may be helpful. For example, if putative regular and fast-spiking neurons were simultaneously recorded, did the FS directly inhibit the RS to give rise to the apparent encoding properties?

      We recorded a small number of putative interneurons (n = 13) from only three mice, which precludes drawing meaningful conclusions, particularly given their heterogeneous responses during the discrimination–avoidance task. Accordingly, we include only an example interneuron demonstrating discrimination between A→B vs. B→A shuttles (Fig. S5). Nevertheless, it is evident that there are reciprocal monosynaptic connections between putative interneurons and certain pyramidal neurons, as indicated by short-latency (~2 ms) excitatory or inhibitory interactions (Fig. S5). That said, follow-up studies with larger Ns are needed to parse out these details.
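
      Short-latency interactions of this kind are usually read from a spike-train cross-correlogram. The sketch below only builds the histogram of lags (target spike minus reference spike) and reports the lag of the largest bin; a real monosynaptic test would also compare that peak (or trough, for inhibition) against a jittered or shuffled baseline. Function names, the 10 ms window, and the 1 ms bins are illustrative assumptions.

```python
import math


def cross_correlogram(ref_ms, target_ms, window_ms=10.0, bin_ms=1.0):
    """Counts of target spikes at each lag (target minus reference) within
    +/- window_ms, in bins of bin_ms. Spike times are in milliseconds."""
    n_bins = int(round(2 * window_ms / bin_ms))
    counts = [0] * n_bins
    for r in ref_ms:
        for t in target_ms:
            lag = t - r
            if -window_ms <= lag < window_ms:
                idx = math.floor((lag + window_ms) / bin_ms)
                counts[min(idx, n_bins - 1)] += 1  # guard against rounding at the edge
    return counts


def peak_lag_ms(counts, window_ms=10.0, bin_ms=1.0):
    """Left edge (ms) of the bin with the most coincidences."""
    k = max(range(len(counts)), key=counts.__getitem__)
    return -window_ms + k * bin_ms


# Toy spike trains: the target fires 2 ms after every reference spike,
# plus one unrelated spike far from any reference spike.
ref = [100.0, 200.0, 300.0, 400.0]
target = [102.0, 202.0, 302.0, 402.0, 250.0]
ccg = cross_correlogram(ref, target)
print(peak_lag_ms(ccg))
```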

      Reviewer #2 (Recommendations for the authors):

      (1) While I appreciate displaying the success rate for the sake of simplifying behavioral data in Figure 1B, it would be nice to also see these data broken out as correct vs incorrect for stay vs shuttle trials, since it is difficult to determine whether the performance increases are primarily driven by mice improving at stay vs shuttle responses.

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1; Fig.S2).

      (2) In Figure 2 the comparison between shuttle and stay is not particularly convincing, since the comparison is also essentially movement vs no movement and place1-->place2 vs place1-->place1. A more appropriate comparison might be action state neurons vs action content neurons during A-->B, B-->A, or both crossings. If it is true that these populations contain this information, then action state neurons should traverse a large component space in both directions, action content neurons only one direction, and so on.

      We agree that the comparison is not ideal due to differences in locomotion. However, it provides valuable information suggesting that the ACC plays a limited role during stay trials, even though these trials involve mental and cognitive processes comparable to those of shuttle trials. While we appreciate the reviewer’s suggestion, the proposed analysis would not be particularly reliable given the relatively small number of simultaneously recorded action-state or action-content neurons.

      (3) I would say the above point applies to Figure 3 as well. I would also note that this reviewer greatly appreciates the rigor of showing ensemble activity in each subject.

      We appreciate this comment. See our response above.

      (4) In Figure 5 do these neurons show the same A-->B vs B-->A firing patterns during correct vs incorrect shuttles? The text describing the data in Figure 4 suggests this should be the case but even from a quick glance it sort of seems like the population dynamics during correct vs incorrect shuttles are not the same. My concern is that averaging neural activity over 5s windows washes out all these dynamics.

      Preliminary analysis suggests that these firing patterns apply to both correct and incorrect shuttles. However, the main reason we did not compare correct and incorrect trials is the limited amount of data. In many sessions, there are only a few (≤5) incorrect shuttles, split between A→B and B→A shuttles (Fig.1C; Fig.S2), so there is insufficient statistical power for a meaningful comparison.

      (5) Some information on classifier validation is required - was this leave-out validation and if so how many trials were left-out vs tested? K-fold, and if so, how many folds? Was the trial order shuffled for each simulation? Classifiers will pick up within-session temporal information. In addition to this classifier accuracy during the different time points should be compared by a non-parametric test, and compared to the 95th percentile of the label-shuffled distribution.

      Yes, we use standard 10-fold cross-validation. We appreciate the suggestion on trial-order shuffling, and implementing this procedure does not change our original conclusion. Additionally, we have applied a non-parametric test.
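
      The validation scheme discussed here (10-fold cross-validation with the trial order shuffled, compared against the 95th percentile of a label-shuffled null) can be sketched with the standard library alone. The authors used an SVM; this sketch substitutes a nearest-centroid decoder, and the toy data, permutation count, and function names are assumptions for illustration.

```python
import random


def centroid_predict(train_X, train_y, x):
    """Nearest-centroid prediction (stand-in for the authors' SVM)."""
    cents = {}
    for c in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(sorted(cents),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def kfold_accuracy(X, y, k=10, seed=0):
    """k-fold cross-validated accuracy with the trial order shuffled first."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(centroid_predict(train_X, train_y, X[i]) == y[i]
                       for i in fold)
    return correct / len(X)


def shuffled_null_95(X, y, k=10, n_perm=200, seed=1):
    """95th percentile of k-fold accuracies obtained after shuffling labels."""
    rng = random.Random(seed)
    null = []
    for p in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        null.append(kfold_accuracy(X, y_perm, k, seed=p))
    null.sort()
    return null[(95 * n_perm) // 100 - 1]


# Toy data: 20 trials, one informative feature and one constant feature.
X = [[10 + i % 3, 5] for i in range(10)] + [[i % 3, 5] for i in range(10)]
y = ["A->B"] * 10 + ["B->A"] * 10
observed = kfold_accuracy(X, y)
null95 = shuffled_null_95(X, y)
print(observed, null95, observed > null95)
```

      Shuffling trial order before splitting keeps folds from aligning with within-session drift, and the label-shuffled distribution gives an empirical chance level rather than assuming 50%.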

      (6) How exactly were neurons classified as content vs state? Was it the average activity during the 5s following the shuttle? If this is stated I could not really find it easily so I might suggest clarifying.

      We now use a new method for classification of the two neuron types (Fig.7). We have included detailed methods in the revised manuscript.

      (7) Movement drives cortical neuron activity more than anything else I have ever seen. Really, more than anything else, it would be nice to demonstrate that it is not movement alone or movement multiplexed with place/sensory information/direction driving these responses.

      We have analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of ACC neurons (<15%) show speed-correlated activity (Fig.5). It remains unclear whether these speed-related neurons represent a distinct subpopulation within the ACC or reflect recordings from nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex. Therefore, it is possible that the small fraction of speed-related neurons originated from the motor cortex.

      Furthermore, we identify two distinct groups of ACC neurons: action-state and action-content neurons, both of which tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors. This prolonged activation in the absence of movement suggests that their activity is not directly driven by locomotion. Moreover, action-content neurons are selectively engaged in only one of the two shuttle categories, either rooms A→B or B→A shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns.

      (8) In addition to the above, the place-field analysis in Supplemental Figure 5 only shows 4 neurons. Was the whole population analyzed? Is it possible to decode place from the population during the ITI? The data in this figure sort of look exactly like place fields - many cortical neurons and also some hippocampal neurons have more than 1 place field.

      We have now provided additional place-field analysis. A comparison with hippocampal CA1 neurons (recorded during the same task) suggests that ACC neurons encode limited spatial information.

      (9) "a simple Pavlovian association strategy is unlikely to be sufficient for learning the task" ... is Pavlovian occasion setting not a simple association? Tones and contexts both readily act as Pavlovian occasion setters. Similarly positive/negative patterning might also explain how the task is learned.

      We appreciate this comment and have revised the sentence accordingly. It is possible that animals use multiple strategies to learn and perform the task effectively. In the early stages, animals may rely more heavily on sensory–spatial integration, whereas in later stages, sensory- or location-related Pavlovian associative strategies may contribute to performance, particularly when animals begin to show place preferences during inter-trial intervals.

      (10) I might suggest softening this language and others like it. For example, 2x2 factorial designs are not really novel.

      We have revised the language used to describe the task.

      (11) Some of the color-scale bars and figures do not have labels. For example, Supplementary Figure 3, Supplementary Figure 5. Please add labels.

      We have added the missing labels to all color bars.

      Reviewer #3 (Recommendations for the authors):

      (1) Some relevant papers that should be cited:

      https://doi.org/10.1523/JNEUROSCI.4450-08.2008

      10.1016/j.neuron.2018.11.016

      https://doi.org/10.1016/j.jphysparis.2014.12.001

      We appreciate these suggestions.

      (2) Where can we download the data and code?

      We will upload the essential data and MATLAB code to GitHub to accompany the publication of the final version of this paper.

  2. Feb 2026
    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public review:

      Weaknesses:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects are due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc.); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have followed the referee's advice, repeating the experiments with the dominant negative UAS-cyc<sup>DN</sup>. The results nicely confirm our conclusions: abolishing the cellular clock in LNd neurons abolishes the rhythmicity of oviposition. The results are presented in Fig. 3 of the new manuscript, panels H to N. We thank the reviewer for this suggestion, which has clearly improved our paper, since it allows us to confirm our result using both a different driver and a different UAS sequence. In addition, we included the required GAL4 controls, which can be found in Panels E and L of the figure, as well as average egg-laying profiles for all genotypes involved (Panels B, D, F, I, K and M). The MB122Bsplit-Gal4>UAS-per<sup>RNAi</sup> experiment has been moved to a supplementary figure (Figure 3S1). The paragraph discussing the new Figure 3 has been modified accordingly.

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artifacts introduced by the 24h moving average used.

      The method used for the assessment of rhythmicity is now more fully explained and tested in the Supplementary Material. In particular, the issue of trend removal is treated in the second section of the SM, and the absence of "artifacts" (interpreted as the possibility of deciding that a signal is rhythmic when it is not, or vice versa) is shown in Figs. S3 to S5.
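
      The response does not state how a 24 h moving average handles an even number of 4-h samples per day. One standard choice is a 2x6 centred average (seven samples, half weight at both ends), sketched below with hypothetical function names. On synthetic data it removes a linear decline exactly while a pure 24 h cycle passes through unchanged, because six consecutive samples of a 24 h cycle sum to zero. A cycle with any other period is not fully cancelled inside the window, so part of it leaks into the trend estimate and its amplitude in the residual changes, which is why such detrending is worth testing on synthetic data.

```python
import math


def centred_ma_24h(x):
    """24 h moving average for samples taken every 4 h.

    Six samples span 24 h; because 6 is even, a 2x6 centred average
    (weights 1/12, 1/6, 1/6, 1/6, 1/6, 1/6, 1/12) keeps the window centred.
    Positions where the 7-sample window does not fit are left as None.
    """
    weights = [1 / 12] + [1 / 6] * 5 + [1 / 12]
    trend = [None] * len(x)
    for t in range(3, len(x) - 3):
        trend[t] = sum(w * x[t - 3 + j] for j, w in enumerate(weights))
    return trend


def detrend(x):
    """Subtract the moving-average trend (None at the edges)."""
    return [None if m is None else xi - m
            for xi, m in zip(x, centred_ma_24h(x))]


# Synthetic check: declining egg counts plus a pure 24 h cycle.
t = list(range(36))  # 36 samples x 4 h = 6 days
cycle = [5 * math.sin(2 * math.pi * k / 6) for k in t]
eggs = [50 - 0.8 * k + c for k, c in zip(t, cycle)]
residual = detrend(eggs)
```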

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      The choice of sampling every 4 hours is not due to a limitation imposed by the device used. In fact, the device can be programmed to move at whatever times are desired. As mentioned in the Material and Methods section, "more frequent sampling gives rise to less consistent rhythmic patterns", because the number of eggs sampled at each time slot becomes too small. In particular, we have tested sampling at intervals of 2 hours and observed that this doubles the work performed by the experimenter but does not improve the assessment of rhythmicity.
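      The trade-off can be illustrated under a simplifying assumption that is not made in the text: that the number of eggs collected in each slot is roughly Poisson-distributed. Halving the sampling interval doubles the number of points per 24 h cycle but halves the expected count per slot, so the relative noise of each count (its coefficient of variation, 1/sqrt(mean)) grows by a factor of sqrt(2). The laying rate used below is hypothetical.

```python
import math


def poisson_cv(eggs_per_hour: float, interval_h: float) -> float:
    """Coefficient of variation of one slot's egg count.

    Assumption (illustrative only): counts are Poisson with mean
    eggs_per_hour * interval_h, so the CV is 1 / sqrt(mean).
    """
    expected = eggs_per_hour * interval_h
    if expected <= 0:
        raise ValueError("expected count per slot must be positive")
    return 1.0 / math.sqrt(expected)


def points_per_cycle(interval_h: float, period_h: float = 24.0) -> float:
    """Number of samples that fall within one cycle of the given period."""
    return period_h / interval_h


# Hypothetical laying rate of 1 egg per hour, compared at 4 h and 2 h sampling.
for interval in (4.0, 2.0):
    print(interval, points_per_cycle(interval), round(poisson_cv(1.0, interval), 3))
```

      With 1 egg per hour, this gives 6 points per cycle and a coefficient of variation of 0.5 at 4 h, versus 12 points and about 0.71 at 2 h.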

      (5) Although the device reduces the interference caused by manual egg collection, it does not improve signal quality enough for a sufficient number of individually rhythmic flies to be included in the analysis. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32 h "circadian" rhythm. This approach loses valuable information about the phase and period of the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies among them. This assumption has not yet been tested rigorously, and the evidence suggests much more inter-fly variability in the period of the egg-laying rhythm.

      As stressed in the paper, and in the new Supplementary Material, the individual egg records are very noisy, which in general precludes the extraction of any information about the underlying period and phase. The workaround we (and others, e.g. Howlader et al. 2006) have used is analyzing average egg records for each genotype. Even though this implies assuming the same period and phase for all individuals, we have observed, using experiments with synthetic data, that small variations in individual periods (of the same amount as those present in real experiments where the period of some flies can be assessed individually) still allow us to use our method to decide if the genotype is rhythmic or not. This issue is discussed at length in the new Supplementary Material. There we also discuss an experiment with real flies, showing the individual records, and the corresponding periodograms, for each fly, for a rhythmic (Fig. S14) and an arrhythmic genotype (Fig. S17).
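      A minimal sketch of the averaging approach, assuming each record is a vector of egg counts at 4 h intervals: all flies of a genotype are averaged, and the average is scanned for its highest periodogram peak between 16 and 32 h. The periodogram here is a floating-mean Lomb-Scargle fit with the "standard" normalization (power between 0 and 1); whether this matches the normalization behind the 0.2 threshold mentioned above is an assumption, and the significance test (p < 0.05) is not included. The function names are hypothetical.

```python
import numpy as np


def ls_power(t, y, periods):
    """Floating-mean Lomb-Scargle power with 'standard' normalization.

    For each trial period, fits y ~ c + a*cos(wt) + b*sin(wt) by least
    squares and returns 1 - RSS_fit / RSS_mean, a value between 0 and 1.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    rss0 = np.sum((y - y.mean()) ** 2)
    if rss0 == 0:
        raise ValueError("a constant series has no periodogram")
    power = np.empty(len(periods))
    for k, period in enumerate(periods):
        w = 2.0 * np.pi / period
        X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        power[k] = 1.0 - np.sum((y - X @ beta) ** 2) / rss0
    return power


def genotype_periodogram(records, dt_h=4.0, pmin=16.0, pmax=32.0, step=0.1):
    """Average all flies of one genotype, then scan the pmin-pmax window."""
    records = np.asarray(records, dtype=float)  # shape (n_flies, n_slots)
    average = records.mean(axis=0)              # every fly counts, rhythmic or not
    t = np.arange(average.size) * dt_h          # slot times in hours
    n = int(round((pmax - pmin) / step)) + 1
    periods = np.linspace(pmin, pmax, n)
    power = ls_power(t, average, periods)
    best = int(np.argmax(power))
    return periods[best], power[best], periods, power
```

      For example, `genotype_periodogram(records)` with `records` of shape (flies, slots) returns the best period, its power, and the full periodogram; a detrending step can be applied to the records or to their average before the scan.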

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe for their Canton-S and yw wild-type controls, whose egg-laying profiles show clearly different dynamics. Interestingly, these differences are not distinguishable in the averaged records but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors do not provide comparable data on the averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped readers decide for themselves why some of these experimental lines lose power in the LS periodogram.

      We have added the individual periodograms of the arrhythmic lines to the Supplementary material (Figs. 3S2, 3S5 and panel G of Fig. 3S1), where they can be compared with their respective controls (Figs 3S3, 3S4, 3S6, 3S7 and panel F of Fig. 3S1).

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      We agree that the results may be biased toward 'the best egg layers'. We remark, however, that the flies that were left out lay very few eggs, some of them laying no eggs during a whole day. For these flies it is difficult to see how one can even speak of egg-laying rhythmicity (let alone assess it experimentally). Thus, we think it might be misleading to speak of results as "representative of the whole population". Furthermore, the very concept of egg-laying rhythmicity may make little sense for flies that do not lay enough eggs.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      In general, we have checked that there are no "outliers", in the sense of flies that lay many more eggs than the others in the experiment. However, the reviewer may be referring to the possibility that a few rhythmic flies make the average rhythmic. This issue is addressed in the Supplementary Material, at the end of section "Example of rhythmicity assessment for a synthetic experiment". In short, we found that eliminating some of the most rhythmic flies from a rhythmic population makes the average slightly less rhythmic, but still significantly so. Conversely, if these flies are transferred to an arrhythmic population, the average remains non-rhythmic.
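      The kind of synthetic check described here can be sketched as follows. Everything in it is hypothetical rather than taken from the Supplementary Material: Poisson counts around a cosine with a mean of 3 eggs per 4 h slot, individual periods drawn around 24 h with a 0.5 h standard deviation, the amplitude, and the number k of most rhythmic flies that are moved. The function reports the peak Lomb-Scargle power (floating-mean fit, standard normalization, 16-32 h) of each average so that the four scenarios can be compared; it does not compute significance.

```python
import numpy as np


def ls_peak(t, y, periods):
    """Highest floating-mean Lomb-Scargle power (standard normalization)."""
    rss0 = np.sum((y - y.mean()) ** 2)
    if rss0 == 0:
        return 0.0
    best = 0.0
    for period in periods:
        w = 2.0 * np.pi / period
        X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        best = max(best, 1.0 - np.sum((y - X @ beta) ** 2) / rss0)
    return float(best)


def simulate(rng, n_flies, amplitude, n_slots=30, dt_h=4.0,
             mean_eggs=3.0, period_sd_h=0.5):
    """Hypothetical flies: Poisson counts around a cosine with jittered periods."""
    t = np.arange(n_slots) * dt_h
    periods = rng.normal(24.0, period_sd_h, size=n_flies)
    rate = mean_eggs * (1.0 + amplitude * np.cos(2.0 * np.pi * t[None, :] / periods[:, None]))
    return t, rng.poisson(rate).astype(float)


def mixing_experiment(seed=0, n_flies=20, k=3, amplitude=0.6):
    """Peak power of averages before and after moving the k most rhythmic flies."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(16.0, 32.0, 161)
    t, rhythmic = simulate(rng, n_flies, amplitude)
    _, arrhythmic = simulate(rng, n_flies, 0.0)
    # Rank rhythmic flies by their individual peak power, most rhythmic first.
    order = np.argsort([ls_peak(t, r, grid) for r in rhythmic])[::-1]
    top, rest = rhythmic[order[:k]], rhythmic[order[k:]]
    return {
        "rhythmic_all": ls_peak(t, rhythmic.mean(axis=0), grid),
        "rhythmic_without_top_k": ls_peak(t, rest.mean(axis=0), grid),
        "arrhythmic_all": ls_peak(t, arrhythmic.mean(axis=0), grid),
        "arrhythmic_plus_top_k": ls_peak(t, np.vstack([arrhythmic, top]).mean(axis=0), grid),
    }


print(mixing_experiment())
```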

      Regarding "the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity", we stress that we have not performed a selection of flies for the averages. All of the flies tested are included in the average, independently of their individual rhythmicity, provided only that they lay enough eggs.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I, might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We are aware that in studies of the rhythmicity of locomotor activity, the presence of two significant peaks is usually interpreted as a "complex rhythm", i.e. as evidence of two different mechanisms producing two different rhythms in the same individual. In our case, since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two non-significant peaks could also correspond to the periods of two different subpopulations of individuals. However, a close examination of the individual periodograms, now provided as Supplementary Figures 3S2 to 3S9, does not show convincing evidence for either of these two possibilities.

      Another possibility is that such peaks are simply an artifact of the method when analyzing time series that consist of very few cycles and few points per cycle. In the Supplementary Material we show that this can indeed happen. Consider, for example, periodograms 2 and 4 in Fig. S12 of the SM: even though both display two non-significant peaks, they correspond to two synthetic time series that are completely arrhythmic.
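      The possibility that short, coarsely sampled records produce periodogram peaks without any underlying rhythm can be probed with a sketch like the one below. It generates purely arrhythmic Poisson records (30 slots of 4 h, i.e. 5 days, with a hypothetical mean of 3 eggs per slot) and reports the fraction whose highest 16-32 h peak exceeds a power of 0.2. The floating-mean Lomb-Scargle power is an assumption, and the sketch does not count double peaks.

```python
import numpy as np


def ls_power_columns(t, Y, periods):
    """Floating-mean Lomb-Scargle power for every column of Y (one series per column)."""
    rss0 = np.sum((Y - Y.mean(axis=0)) ** 2, axis=0)
    out = np.empty((len(periods), Y.shape[1]))
    for k, period in enumerate(periods):
        w = 2.0 * np.pi / period
        X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # fits all columns at once
        out[k] = 1.0 - np.sum((Y - X @ beta) ** 2, axis=0) / rss0
    return out


def arrhythmic_peak_rate(n_series=2000, n_slots=30, dt_h=4.0,
                         mean_eggs=3.0, threshold=0.2, seed=0):
    """Fraction of arrhythmic Poisson records whose 16-32 h peak exceeds threshold."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_slots) * dt_h
    Y = rng.poisson(mean_eggs, size=(n_slots, n_series)).astype(float)
    Y = Y[:, Y.std(axis=0) > 0]  # drop (very unlikely) constant records
    power = ls_power_columns(t, Y, np.linspace(16.0, 32.0, 161))
    return float(np.mean(power.max(axis=0) > threshold))


print(arrhythmic_peak_rate())
```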

      We have added to the manuscript a paragraph discussing the issue of possible bimodality (next to last paragraph in subsection "The molecular clock in Cry+ LNd neurons is necessary for rhythmic egg-laying").

      Wider context:

      The study of the neural basis of oviposition rhythms in Drosophila melanogaster can serve as a model for the analogous mechanisms in other animals. In particular, research in this area can have wider implications for the management of insects with societal impact such as pests, disease vectors, and pollinators. One key aspect of D. melanogaster oviposition that is not addressed here is its strong social modulation (see Bailly et al., Curr Biol 33:2865-2877.e4. doi:10.1016/j.cub.2023.05.074). It is plausible that most natural oviposition events do not involve isolated individuals, but rather groups of flies. As oviposition is encouraged by aggregation pheromones (e.g., Dumenil et al., J Chem Ecol 2016 https://link.springer.com/article/10.1007/s10886-016-0681-3), its propensity changes upon pre-conditioning of the oviposition substrates, which complicates assays of oviposition rhythms that periodically move the flies to fresh substrate.

      We agree that social modulation can be important for oviposition, as has been shown in the paper cited by the reviewer. But we think that, in order to understand the contribution of social modulation to oviposition, it is important to know, as a reference for comparisons, what the flies do when they are isolated. Our aim in this work has been to provide such a reference.

      Recommendations for the authors:

      (1) The weaknesses identified in the Public review could be addressed as follows: etc.

      We have followed the suggestions of the editor and addressed each of the weaknesses mentioned (see details above).

      (2) Could the authors comment on their choice of using individual flies for their assay rather than (small) groups of flies? Is it possible that their assay would produce less noisy results with the latter?

      First, we want to emphasize that our aim here was to assess the presence of individual rhythmicity, free from any external influences, whether arising from external environmental cues (such as light or temperature changes) or from social interactions (with other females or males). However, we were also curious about the behavior when males were placed in the same chamber with each female. We performed a few tests, and the results were very similar to those obtained with single females.

      (3) Minor points:

      (a) Line 57-58 - "around 24 h and a peak near night onset (Manjunatha et al., 2008). Egglaying rhythmicity is temperature-compensated and remains invariant despite the nutritional state": Rephrase to something simpler like temperature and nutrition compensated.

      Corrected.

      (b) Line 56-57 - "The circadian nature of this behavior was revealed by its persistence under DD with a period around 24 h and a peak near night onset (Manjunatha et al., 2008)." A better reference here would be to Sheeba et al, 2001 for preliminary investigations into the egg-laying rhythms of individual flies and McCabe and Birley, 1998 for groups of flies under LD12:12 and DD.

      Suggestion accepted.

      (c) Line 65-67 - "We determined..... molecular clock in the entire clock network reduced the LNv did not." This suggests that it was unknown until now that LNv does not have a role, whereas Howlader et al 2006 already suggested that. The reader becomes aware of this at a later part of the manuscript. Please revise.

      This has been revised, and the citation to Howlader et al 2006 added to the new sentence.

      (d) Line 67 - "impairing the molecular clock in the entire clock network reduced the circadian rhythm of.."; saying "Reduced the power of the circadian rhythm" might be better phrasing."

      Suggestion accepted.

      (e) Line 72 - using the Janelia hemibrain dataset.

      Corrected

      (f) Line 72 typo "ussing", should be 'using'.

      Corrected.

      (g) Line 94: why is the periodic signal the same for all on the first day of DD?

      It is well known that in LD conditions activity is driven by the environmental light-dark cycle, which entrains the endogenous circadian clock of all flies. Even after the transition to DD, the effects of this entrainment persist for a few days, allowing the individual rhythmic patterns set by the light-dark cycle to remain synchronized for at least a few cycles. We are assuming that the same happens with oviposition. A sentence has been added explaining this (beginning of third paragraph of subsection "Egg-laying is rhythmic when registered with a semiautomated egg collection device").

      (h) Figure 1A-D, Were all flies included or only rhythmic flies? Please make this clear. How do you distinguish rhythmic and arrhythmic flies in Figure 1E? Their representative individual plots of egg number graphs are required. Why was the number of flies under DD decreased from 20 to 18?

      Throughout the paper, the analysis of average rhythmicity has been performed including all flies, since we postulate that even flies that individually can be classified as non-rhythmic have a rhythm that is corrupted by noise, and that this noise can be partially removed by averaging. The characterization of rhythmic and arrhythmic individuals is explained in the Methods section, under the Data Analysis subsection. This is now fully developed in the Supplementary Material, where the individual plots for some of the genotypes are included.

      Regarding the question of the number of flies having "decreased from 20 to 18?", there is a misunderstanding here. The results depicted in Figure 1, and in particular in panel E, correspond to two different experiments: one performed only in LD (7 days, n=20), and a second one performed for 5 days in DD, with one previous day in LD (n=18).

      (i) Figure E and K, Are n=20, 18, and n=30, 22 the total numbers of flies including both rhythmic and nonrhythmic? If so, it would be better to put them in the column, not in the rhythmic column.

      The figure has been corrected.

      (j) Line 107-108, please provide a citation for this statement.

      We have added two references: Shindey et al. 2016, and Deppisch et al. 2022.

      (k) Figure 1, 2, etc., please write a peak value inside the periodogram graph. This makes comparison easier.

      The peak values have been added in all Figures.

      (l) Line 184-185, Figure 2F, tau appears shorter in Clk4.1>perRNAi flies than in control, which suggests that DNp1 may play a role?

      As explained in the Supplementary Material, the particularities of oviposition records (discrete values, noise, few samples per period, etc.) preclude an accurate determination of the period even when the record is classified as rhythmic. In particular, Fig. S4 shows that differences of 1 hour between the real and the estimated periods are not unusual.

      (m) Figure 4. Why are 2 controls shown? Please explain. Are they the same strains?

      The two controls shown are the UAS control and the GAL4 control. This information has now been added to the figure.

      (n) Line 314 'that' should be 'than'?

      Corrected.

      (o) Line 73-74 - Phrasing is not clear in: "LNds and oviposition neurons, consisting with, the essential role of LNds neurons in the control of this behavior.""

      Corrected.

      (p) Line 81-84 - "the experiments particularly demanding and labor-intensive. In this approach, eggs are typically collected every 4 hours (sometimes also every 2 hours), which usually implies transferring the fly to a new vial or extracting the food with the eggs and replacing it with fresh food in the same vial (McCabe and Birley, 1998; Menon et al., 2014)." McCabe and Birley had an automated egg collection device designed for groups of flies, which sampled eggs laid every hour for 6 days. Please remove this reference in this context

      Reference removed.

      (q) Line 91-92 - "The assessment of oviposition rhythmicity is challenging because the decision of laying an egg relies on many different internal and external factors making this behavior very noisy." This sentence makes it appear that 'assessment' is the limitation. Even locomotor activity is governed by many internal and external factors, yet we can obtain very robust rhythms. The sentence that follows is also not easy to digest. Can the authors frame the idea better?

      We have rewritten the corresponding paragraph in order to make it more clear (second paragraph of the Results section). Additionally, the Supplementary Material contains now a more detailed explanation and analysis of the method used.

      (r) Line 104-107 - rhythmic (with a period close to 24 h, Figure 1F) although the average egg record is strongly rhythmic with a period around 24 h (Figure 1B). Under DD condition, individual rhythmicity percentages are the same as in LD (Figure 1E) and their average record is also very rhythmic with a period of 24 h (Figure 1D). 'Strongly rhythmic' and 'very rhythmic' are less indicative of what is happening with the oviposition rhythm and can be phrased as robust instead, with a focus on their power measured.

      We have accepted the suggestion.

      (s) Line 108-110 - "Thus, egg-laying displays a much larger variability than locomotor activity, compounding the difficulty of observing the influence of the circadian clock on this behavior." The section discussed here does not illustrate the variability in egg-laying as much as the lack of robustness of the rhythm. The variation in rhythmicity going from CS flies (~70% rhythmic) to yw flies (~50% rhythmic) showcases the variability in this rhythm and how it is difficult to observe when compared to locomotor rhythms, which are usually consistently >90% rhythmic across multiple genotypes. These lines can be placed after the discussion about yw and perS flies. Moreover, previous studies using individual flies have reported that egg-laying rhythm is more variable than others Figure 1, Sheeba et al 2001.

      We have accepted the suggestion, replacing "Thus, egg-laying displays a much larger variability than locomotor activity..." by "This shows that, at the individual level, egg-laying is much less robust than locomotor activity ..."

      (t) Figure 1. Genotype notation within the figure panels is not consistent with the accepted / conventional notation or with the main text or legend notations throughout the manuscript.

      We are sorry for this mistake. We have corrected the genotype names in Figures and text in order to make notation consistent across the paper.

      (u) Supplementary Figure 1 Legend. Error in upper right corner? Not left corner? The photo does not clearly show the apparatus. The authors may wish to consider clearer images and more details about the apparatus including details of the 3D printing of the device and perhaps even include a short video where the motor moves the flies to a new chamber (This is only a suggestion to advertise the apparatus, not related to the review of the manuscript). They could also provide information about what fraction of females survived till the end of each trial when 21 flies were examined with 4-hour sampling across 4-5 cycles.

      In general, more than 80% of the females are alive at the end of a one-week oviposition experiment. We have added this information at the end of the corresponding subsection of the Methods ("Automated egg collection device"). Regarding the egg-collection device, we have replaced the photographs in what is now Supplementary Figure 1S1 and added a short supplementary movie showing its operation.

      (v) The results depicted in Figure 2B are that of averaged time series. Hence the reader does not know 'the fact' that knocked-down animals are not completely rhythmic. Is the "not completely arrhythmic" in reference to flies with a power > 0.2 (weakly rhythmic) in their egg-laying rhythm or to the presence of ~40% of male flies (Supplementary Table 1) with a locomotor rhythm after perRNAi silencing of most of their clock neurons? This is confusing because no intermediate category of flies is discussed in Figure 2. Please edit for clarity.

      We were referring to the rhythmicity of the genotype, not of the individuals. We have rewritten the corresponding paragraph in order to make it clearer (last paragraph of the first subsection of the Results section).

      (w) Line 173 - ablation or electrically silencing all PDF+ neurons (Howlader et al., 2006). There were no experiments carried out using electrical silencing of PDF+ neurons in the referenced paper.

      We are sorry for this mistake. This has been corrected (we have deleted the mention of electrical silencing).

      (x) Line 173 - Shortening of period by nearly 3 hours cannot be considered minor.

      We agree, and we have deleted the word "minor".

      (y) Line 332-333 - "We also disrupted the molecular clock (or electrically silenced) in PDFexpressing neurons as well as in the DN1p group with no apparent effect on egg-laying rhythms". There was period shortening observed for pdf GAL4 > perRNAi manipulation so there was an effect on the egg-laying rhythm. Additionally, perRNAi based silencing does not electrically silence PDF neurons as the kir 2.1 was expressed only using Clk4.1 GAL4 in the Dn1ps. This line should be rewritten.

      We have rewritten the paragraph mentioned (third paragraph of the Discussion) in order to make it more accurate.

      (4) Page 22 - Data Analysis

      Since the number of eggs laid by a mated female tend to show a downward trend, we proceeded as follows, in order to detrend the data (see the Supplementary Material for further details). First, a moving average of the data is performed, with a 6 point window, and a new time series T is obtained. In principle, T is a good approximation to the trend of the data. Then, a new, detrended, time series D is generated by pointwise dividing the two series (i.e. D(i)=E(i)/T(i), where i indexes the points of each series)." Can the authors provide a reference for this method of detrending? Smoothing can frequently introduce artifacts in the data and give incorrect period estimates. Additionally, the trend visible in the data, especially in Figure 1, suggests a linear decay that can be easily subtracted. Also, there is no discussion of detrending in the Supplementary material attached.

      We are sorry for the confusion with the Supplementary materials. The method used for subtracting both noise and trend from the data is now fully explained in the new Supplementary Material. All the issues raised by the reviewer in this comment have been addressed there.
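      The detrending step quoted by the reviewer (a 6-point moving average T, followed by D(i) = E(i)/T(i)) can be sketched as follows. With 4 h sampling, 6 points span 24 h, so a purely 24 h oscillation averages out of T. How the authors align an even-length window and handle the first and last points is not stated; here, as an assumption, the mean of E[j:j+6] is assigned to index j + 3, and points without a full window are dropped.

```python
import numpy as np


def ratio_detrend(eggs, window=6):
    """Moving-average trend T and ratio-detrended series D = E / T.

    Assumption: the mean of eggs[j:j + window] is aligned with index
    j + window // 2, and points without a full window are dropped.
    """
    e = np.asarray(eggs, dtype=float)
    if e.size < window:
        raise ValueError("record is shorter than the smoothing window")
    trend = np.convolve(e, np.ones(window) / window, mode="valid")
    start = window // 2
    aligned = e[start:start + trend.size]
    detrended = np.full(trend.size, np.nan)  # NaN where the trend is zero
    positive = trend > 0
    detrended[positive] = aligned[positive] / trend[positive]
    return trend, detrended


# Hypothetical record: a 24 h oscillation on a linearly declining baseline.
t = np.arange(30) * 4.0
eggs = (10 - 0.05 * t) * (1 + 0.5 * np.cos(2 * np.pi * t / 24))
T, D = ratio_detrend(eggs)
```

      Because the trend is estimated from the data themselves, any oscillation that leaks into T is transferred into D; comparing periodograms of D and of the raw counts on synthetic records is one way to check for this kind of artifact.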

      (5) Figure by figure

      Page - Type (Figure or text) - Comment

      (a) Page 6 Figure 1C There is remarkable phase coherence seen in the average egg laying time series for CS flies 5 days into DD and as the authors note in Lines 94-95 in the text "Under light-dark (LD) conditions, or in the first days of DD, it can be that the periodic signal is the same for all flies". Since this observation is crucial to constructing the figures seen later in the paper, a note should be made about why this rhythm could persist across flies, so deep into DD.

      As mentioned above, we have added a couple of lines explaining why we think that the assumption of a synchronized periodic signal is reasonable, at least during the first cycles (second paragraph of the first subsection of section Results).

      (b) Figure 1 G The effect of period/phase decoherence seems to be showing up here in the average profile for yw flies as they seem to completely dampen out after 2 days in DD and yet have a 24-hour rhythm in the averaged periodogram. The authors should make a note here if the LS periodogram is over-representing the periodicity of the first few days in DD or if comparing the first 3 vs. the last 3 days in DD gives different results.

      The dampening observed in average oviposition records reflects the dampening of the individual oviposition records, a well-known phenomenon probably caused by the depletion of sperm stored in the female's spermatheca. One of the aims of the method used in the paper was to avoid the bias introduced by this dampening by means of a detrending procedure. This is explained in the Materials and Methods, and full details are now given in the new Supplementary Material.

      (c) Figure 1E, K Is this data pooled across 2-3 experiments, as discussed in lines 500-01 under 'Statistical Analysis'? Also, what test is being performed to check for differences between proportions here, seeing as there are no error bars to denote error around a mean value and no other viable tests mentioned in Statistical Analysis?

      We are sorry for this omission. For the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.
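      For reference, the 'N-1' chi-squared test for a 2x2 table of rhythmic versus arrhythmic counts in two genotypes scales the Pearson statistic by (N-1)/N, giving chi2 = (ad - bc)^2 (N - 1) / [(a+b)(c+d)(a+c)(b+d)] with one degree of freedom. The sketch below implements that formula using only the standard library. The counts in the example are hypothetical and are not taken from the figures.

```python
import math


def n_minus_1_chi_squared(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """'N-1' chi-squared test for the 2x2 table [[a, b], [c, d]].

    Rows are genotypes, columns are rhythmic / arrhythmic counts.
    Returns the statistic and its p-value (chi-squared, 1 degree of freedom).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = (a * d - b * c) ** 2 * (n - 1) / denom
    # Survival function of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 14/20 rhythmic flies in one genotype, 11/22 in another.
print(n_minus_1_chi_squared(14, 6, 11, 11))
```

      For these hypothetical counts the statistic is about 1.70, slightly smaller than the uncorrected Pearson value, which is the intended effect of the (N-1)/N factor for small samples.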

      (d) Figure 1 F, L Can the total number of weakly and strongly rhythmic values be indicated in the scatter plot?

      Corrected.

      (e) Figure 1F, L (legend) Is the Chi-squared test being performed on the proportion values of Figure 1(E, K) or for Figure 1(F, L)?"

      The chi-squared test mentioned was used for Fig. 1F, L. As explained above, for the comparison of proportions we used the 'N-1' Chi-squared test. This has now been added to the figure legend.

      (f) Page 8 Figure 2B Seeing as individual flies with a LS periodogram power < 0.2 are considered weakly rhythmic in Figure 1 F, L can Clk856 > perRNAi flies on average also be considered weakly rhythmic, as the peak in the periodogram is above 0.3?

      We prefer to use the weakly rhythmic class only for individual flies. Nevertheless, we agree that this periodogram shows that the genotype analyzed is not completely arrhythmic, and that this might be due to some remaining individual rhythmicity. As mentioned above, we have rewritten the last paragraph of the first subsection of section Results in order to discuss this.

      (g) Figure 2D Can the authors comment on why there is a shorter period rhythm when PDF neurons have a dysfunctional clock, whereas previous evidence (Howlader et al., 2004) suggested that these neurons play no role in egg-laying rhythm? They should also refer to McCabe and Birley, 1998 to see if their results (where they observed a shorter period of ~19h with groups of per0 flies), might be of interest in their interpretations.

      We have added a line commenting on this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion in the third paragraph of the Discussion. In a nutshell, even though Howlader et al. did not find a period shortening when PDF neurons were ablated, they did find it in pdf<sup>01</sup> flies.

      (h) Figure 2 F, H As the authors mention in their Discussion on Page 16, lines 340-45, the manipulation of DN1p neurons might abolish the circadian rhythm in oogenesis as reported by Zhang et al, which is why they looked at this circuit driven by Clk4.1 neurons and comment that "The persistence of the rhythm of oviposition implies that it is not based on the availability of eggs but is instead an intrinsic property of the motor program". However, no change in fecundity is reported for either kir2.1 or perRNAi-based manipulations of these neurons, to help the reader understand if egg availability (at the level of egg formation) is playing any role in the downstream (and seemingly independent) act of egg laying. The authors should report if they see any change in total fecundity for either set of flies w.r.t their respective controls. Also, is the reduction in power seen with electrical silencing vs perRNAi expression of any relevance? Does the percentage of rhythmic flies change between these two manipulations?

      In the line mentioned by the reviewer, what we meant is that our results show that the rhythm of oviposition does not seem to be based on the rhythmic production of oocytes, which is not necessarily connected with the total number of eggs produced. We have modified the corresponding line in the paper in order to avoid this misunderstanding. Regarding the "reduction in power" mentioned, it must be stressed that, in general, the height of the peak is correlated with the fraction of rhythmic individuals. The problem is that this fraction is a much noisier measure, which is why we have chosen to work with periodograms of averages.

      (i) Figure 2 E and G, a loss of rhythmicity could also be due to a decrease in fecundity in the experimental lines. Since the number of eggs laid for each genotype is already known, can the authors show statistically relevant comparisons between the experimental lines and their respective controls? In this vein, can the averaged time series profiles also be provided for all the genotypes tested (as seen previously in Figure 1 A, C, G, I), perhaps in the supplementary?

      We did not focus on fecundity in the present work. However, our observations do not seem to show any definite relationship with rhythmicity. We plan to address the issue of fecundity more systematically in a future work. The averaged time series profiles have now been added to the figure.

      (j) Scatter plots showing the average period and SEM as seen in Figure 1 (F, L) would help in understanding if these manipulations have any effect on variation in the period of the egg-laying rhythm across flies. Particularly for pdf GAL4 > perRNAi flies which have a net shorter period, (but this might vary across the 34 flies tested).

      We have added a Supplementary Figure (2S1) that shows that the shortening of oviposition period can be also observed at the individual level. We have also added a line commenting this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion of this in the third paragraph of the Discussion.

      (k) Page 11 Figure 3B Does the presence of two peaks in the LS periodogram at a power > 0.2 indicate the presence of weakly rhythmic flies with both a short (20 h) and a long (~27 h) period component, or either one? The short-period peak is nearly at the p < 0.05 level of significance. So then, do most of the flies in the MB122B GAL4 > perRNAi line show a weakly rhythmic shorter period?

      (l) Figure 3D A similar peak is observed again at 20h (LS power > 0.2 and nearly at p < 0.05 significance level again) and a different longer one at (~30h) though this one is almost near 0.2 on the power scale. Given the consistency of this feature in both LNd manipulations, the authors should comment on whether this is driven by variation in periods detected or the presence of complex rhythms (splitting or change in period) in the oviposition time series for these lines.

      (m) Figure 3 General: scatter plots showing average period ± SEM could help explain the bimodality seen in the periodograms. Additionally, indicating how many flies are weakly vs. strongly rhythmic could help illustrate how important the CRY+ LNds are to the stability of the oviposition rhythm.

      For these three comments (k, l and m), we note that the issue of bimodality has been addressed above, in our response to Weakness 9.

      (o) Figure 4B Same as comments under Figure 1, what is the statistical test done to compare the proportions for these three genotypes?

      As mentioned above, for the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.

      (p) Figure 4C Are all flies significantly rhythmic? The authors should also provide an averaged LS periodogram measure for each genotype, to help illustrate the difference in power between activity-rest and egg-laying rhythms.

      Yes, the points represent periods of (significantly) rhythmic flies. This has been added to the caption to avoid misunderstandings. The differences that arise when assessing rhythmicity in activity records vs. egg-laying records are addressed at length in the Supplementary Material (see e.g. Fig. S1).

      (q) Page 15, Figure 5, general: As the authors discuss the possible contribution of DN1ps to evening activity and to control of the oogenesis rhythm, investigating the connections (or lack thereof) between the few DN1ps characterized in the connectome and the oviposition neurons could help illustrate the distinct role they play in the female Drosophila's reproductive rhythm.

      This information was in the text and the Supplementary Tables. Lines 273-275 of the old manuscript read: "The full results are displayed in Supplementary Tables 2 and Table 3, but in short, we found that whereas there are no connections between LNv or DN1 neurons and oviposition neurons..."

      (r) Minor: The dark shading of the circles depicting some of the clusters makes their names difficult to read. Consider changing the colors or moving the names outside the circles.

      Figure corrected.

      (s) Line 38: The estimated number of clock neurons has been revised recently (https://www.biorxiv.org/content/10.1101/2023.09.11.557222v2.article-info).

      Thank you for the reference. We have corrected the number of clock neurons in the Introduction of the new manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using single-unit recording in 4 regions of non-human primate brains, the authors tested whether these regions encode computational variables related to model-based and model-free reinforcement learning strategies. While some of the variables seem to be encoded by all regions, there is clear evidence for stronger encoding of model-based information in the anterior cingulate cortex and caudate.

      Strengths:

      The analyses are thorough, the writing is clear, and the work is well-motivated by prior theory and empirical studies.

      Weaknesses:

      My comments here are quite minor.

      The correlation between transition and reward coefficients is interesting, but I'm a little worried that this might be an artifact. I suspect that reward probability is higher after common transitions, due to the fact that animals are choosing actions they think will lead to higher reward. This suggests that the coefficients might be inevitably correlated by virtue of the task design and the fact that all regions are sensitive to reward. Can the authors rule out this possibility (e.g., by simulation)?

      We fully agree with the reviewer that the task design has in-built correlations between transition and reward, and thus the correlation between neural selectivity for feedback and transition (Figure 3E) may be due to the different reward expectation after common or rare transitions. We did try to make this point in the manuscript:

      This suggests that the brain treats being diverted away from your current objective equivalent to losing reward, which is sensible as the subject would normally expect lower rewards on rare trials if their reward-seeking behaviour was efficient.

      We have now updated the wording of this statement to make this point more clearly and to avoid implying that any non-reward-related encoding is involved:

      “As the reward expectation will be higher on common compared to rare trials, this demonstrates that the brain encodes being diverted to an area with a lower reward expectation equivalent to actually receiving a low reward (and vice versa).”

      We have also adjusted the significance test of this correlation to use a circular permutation test that accounts for correlations between the regressors. This test still found a significant correlation in all areas.

      We have described this new permutation test in Methods:

      “For comparing correlations between weights for different features (i.e., between transition and reward coding, Figure 3E), the null distribution of correlations observed in circularly shifted data was compared to the correlation seen in the actual data. This accounts for any correlations between features that existed in the task by preserving the structure of the design matrices.”

      And updated the text in Results accordingly:

      “All regions, but particularly ACC, encoded a common transition (at the time of transition) similar to a high reward (at the time of feedback), as there was a positive correlation between the coefficients for reward and transition (the transition parameter was signed such that common and rare transitions were equivalent to high and low rewards, respectively) (ACC r=0.4963, DLPFC r=0.3273, caudate r=0.4712, putamen r=0.5052; all p<0.002 except DLPFC where p=0.006, circular permutation test; Figure 3E, S5).”
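      The circular permutation test described in Methods can be sketched as follows, in the spirit of the simulation the reviewer suggested. This is a minimal numpy illustration under stated assumptions, not the authors' code: a single shared design matrix, ordinary least-squares weights, an independent random shift per neuron, and hypothetical function names. Because the design matrix is never shifted, the null distribution retains whatever reward-transition correlation the task itself builds in; the synthetic data at the end deliberately include such a built-in correlation.

```python
import numpy as np


def glm_weights(X, Y):
    """Ordinary least-squares weights; X: trials x regressors, Y: trials x neurons."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta  # regressors x neurons


def weight_correlation(X, Y, i_a, i_b):
    """Across-neuron Pearson correlation between the weights of regressors i_a and i_b."""
    beta = glm_weights(X, Y)
    return np.corrcoef(beta[i_a], beta[i_b])[0, 1]


def circular_shift_test(X, Y, i_a, i_b, n_perm=500, seed=0):
    """Two-sided p-value for the weight correlation against a circular-shift null.

    Each neuron's trial series is rolled by its own random non-zero offset while
    X is left intact, so correlations built into the task design survive in the null.
    """
    rng = np.random.default_rng(seed)
    n_trials, n_neurons = Y.shape
    observed = weight_correlation(X, Y, i_a, i_b)
    null = np.empty(n_perm)
    for k in range(n_perm):
        shifts = rng.integers(1, n_trials, size=n_neurons)
        Y_shifted = np.column_stack(
            [np.roll(Y[:, j], shifts[j]) for j in range(n_neurons)])
        null[k] = weight_correlation(X, Y_shifted, i_a, i_b)
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p


# Hypothetical synthetic session: reward is higher after common transitions,
# and neurons genuinely code reward and transition with similar weights.
rng = np.random.default_rng(1)
n_trials, n_neurons = 400, 60
transition = np.where(rng.random(n_trials) < 0.7, 1.0, -1.0)  # common = +1, rare = -1
reward = 0.5 * transition + rng.normal(size=n_trials)          # task-induced correlation
X = np.column_stack([np.ones(n_trials), reward, transition])
w_reward = rng.normal(size=n_neurons)
w_transition = w_reward + 0.3 * rng.normal(size=n_neurons)
B = np.vstack([np.zeros(n_neurons), w_reward, w_transition])
Y = X @ B + rng.normal(size=(n_trials, n_neurons))
r, p = circular_shift_test(X, Y, i_a=1, i_b=2, n_perm=200)
```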

      The explore/exploit section seems somewhat randomly tacked on. Is this really relevant? If yes, then I think it needs to be integrated more coherently.

      We thank the reviewer for this comment. We agree that the motivation for the explore/exploit analysis was not sufficiently clear in the original version.

      Our aim was not to introduce this as a separate or tangential effect, but rather to highlight how the task’s reward structure (with outcome levels stable for 5–9 trials) naturally created alternating periods favoring exploitation of a known high-value option versus exploration when outcomes changed. This feature of the task is tightly linked to MB-RL computations, as it requires integration of state-transition knowledge and updating across trials.

      Importantly, we showed earlier in the manuscript that ACC encoded the state-transition structure (i.e., common versus rare transitions) and MB-value estimates (at the choice epoch). Here, however, we aimed to highlight that the same region also modulated its choice encoding according to whether the subject was in an exploratory or exploitative regime, which requires knowledge of a further task feature that depends on both state transitions and outcomes. We have revised this section to better integrate it into the main logic of the paper:

      “In our task, the outcome level (high, medium, low) of each second-stage stimulus remained the same for 5-9 trials before potentially changing. This design naturally created periods where subjects could ‘exploit’ the same Choice 1 to maximize reward for several trials; and other periods where they had to ‘explore’ different second-stage stimuli to optimize reward (as contingencies shifted). In classical MB-RL, the transition between reward states can be learned by keeping counts of observed transitions from a current state-action pair to a subsequent state, yielding a maximum-likelihood estimate of the environment’s dynamics [42]. In fact, knowledge about the reward contingency schedule could support decision-making in both exploitation – by enabling efficient choice when rewards are stable; and exploration – by guiding alternative behaviour most likely to yield improved outcomes (this is different from MF learning, where exploration is more random since the agent lacks explicit state-transition knowledge).

      We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those where they had not received a high reward for the previous two trials (‘explore’ trials), and those where the previous two rewards had been the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch for both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in the former condition compared to the latter (p<0.002 for at least 140 ms in all cases, permutation test on differences observed; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentive) resources to features – i.e., choice 1 stimulus identity – that are most certain predictors of important outcomes. As a control, we also decoded the direction of the Choice 1 (where choice was indicated via joystick movement), which was randomised each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F). Therefore, exploit behaviour specifically upregulated relevant task parameters that were worth remembering across trials.”
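      The quoted passage describes learning the transition structure by counting observed transitions. A minimal Python sketch of that count-based maximum-likelihood estimate follows, together with the standard MB first-stage value of the two-step task, Q_MB(s, a) = sum over s' of T(s' | s, a) * max over a' of Q(s', a'), as in Daw et al. (2011). It assumes a small tabular task. The class and function names are hypothetical, and the uniform fallback for never-visited state-action pairs is an illustrative choice, not the authors' model.

```python
import numpy as np


class TransitionCounter:
    """Maximum-likelihood transition model learned by counting (s, a) -> s' events."""

    def __init__(self, n_states, n_actions, n_next_states):
        self.counts = np.zeros((n_states, n_actions, n_next_states))

    def observe(self, s, a, s_next):
        self.counts[s, a, s_next] += 1

    def probabilities(self):
        """Relative frequencies T[s, a, s']; unvisited (s, a) pairs fall back to uniform."""
        totals = self.counts.sum(axis=2, keepdims=True)
        n_next = self.counts.shape[2]
        return np.where(totals > 0, self.counts / np.maximum(totals, 1), 1.0 / n_next)


def model_based_values(T, q_second_stage):
    """Q_MB[s, a] = sum_s' T[s, a, s'] * max_a' Q2[s', a']."""
    return T @ q_second_stage.max(axis=1)
```

For instance, after seven common and three rare transitions following action 0, that action's estimated transition probabilities are 0.7 and 0.3, and its MB value weights the best option in each second-stage state accordingly.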

      Reviewer #2 (Public review):

      Summary:

      The authors investigate single-neuron activity in rhesus macaques during model-based (MB) and model-free (MF) reinforcement learning (RL). Using a well-established two-step choice task, they analyze neural correlates of MB and MF learning across four brain regions: the anterior cingulate cortex (ACC), dorsolateral PFC (DLPFC), caudate, and putamen. The study provides strong evidence that these regions encode distinct RL-related signals, with ACC playing a dominant role in MB learning and caudate updating value representations after rare transitions. The authors apply rigorous statistical analyses to characterize neural encoding at both population and single-neuron levels.

      Strengths:

      (1) The research fills a gap in the literature, which has been limited in directly dissociating MB vs. MF learning at the single unit level and across brain areas known to be involved in reinforcement learning. This study advances our understanding of how different brain regions are involved in RL computations.

      (2) The study used the two-step choice task of Miranda et al. (2020), which was previously established for distinguishing MB and MF reinforcement learning strategies.

      (3) The use of multiple brain regions (ACC, DLPFC, caudate, and putamen) in the study enabled comparisons across cortical and subcortical structures.

      (4) The study used multiple GLMs, population-level encoding analyses, and decoding approaches. With each analysis, they conducted the appropriate controls for multiple comparisons and described their methods clearly.

      (5) They implemented control regressors to account for neural drift and temporal autocorrelation.

      (6) The authors showed evidence for three main findings:

      (a) ACC was the strongest encoder of MB variables among the four areas, which emphasizes its role in tracking transition structures and reward-based learning. The ACC also showed a sustained representation of feedback that carried over into the next trial.

      (b) ACC was the only area to represent both MB and MF values.

      (c) The caudate selectively updates value representations when rare transitions occur, supporting its role in MB updating.

      (7) The findings support the idea that MB and MF reinforcement learning operate in parallel rather than strictly competing.

      (8) The paper also discusses how MB computations could be an extension of sophisticated MF strategies.

      Weaknesses:

      (1) There is limited evidence for a causal relationship between neural activity and behavior. The authors cite previous lesion studies, but causality between neural encoding in ACC, caudate, and putamen and behavioral reliance on MB or MF learning is not established.

      We agree with the reviewer that the present study does not establish causal relationships, and we do not claim otherwise in the manuscript. Our work was designed as a comprehensive characterization of neural activity across ACC, DLPFC, caudate, and putamen during reward-seeking decision-making. By systematically comparing MB- and MF-RL signals across these regions, we provide new insights into the division of labor and cooperative interactions within cortico-striatal networks.

      While causal manipulations (e.g., lesions, inactivations, stimulation) are indeed required to directly establish necessity or sufficiency, correlational studies such as ours play a crucial role in identifying where and how computationally relevant signals are represented. Importantly, our findings align with and extend prior causal work, for example showing that ACC and striatal lesions disrupt MB control. Thus, our study contributes a detailed functional mapping of MB and MF RL encoding across multiple nodes of this circuit, which serves as an important foundation for future causal investigations (e.g., using transcranial ultrasound stimulation).

      (2) There is a heavy emphasis on ACC versus other areas, but it is unclear how much of this signal drives behavior relative to the caudate.

      We appreciate the reviewer's observation regarding this matter. Our intention was not to place a heavy emphasis on ACC; rather, this emphasis arose naturally from the data. The ACC demonstrated considerably more robust and enduring neural activity than the other brain regions: for instance, reward-related signals in the ACC continued well beyond individual trials (Fig. 2A-B), and encoding of state transitions remained active from the initial transition through to the feedback phase (Fig. 3A-B). By comparison, distinctions among the other regions were less pronounced, which naturally resulted in the ACC receiving greater attention in our analyses.

      We acknowledge that the caudate plays an essential and complementary role in driving behavior, and we believe that this is emphasized in the two key subsections of our “Results”. First, caudate neurons encoded model-based choice values (Fig. 4A, 4C) and uniquely remapped these values following rare transitions (Fig. 5), reflecting flexible adjustment of action values. Second, decoding analyses showed that both ACC and caudate populations predicted first-stage choices (Fig. 6C), linking their activity directly to behavioral decisions. In the Discussion section, we also highlight that “the distinctive caudate signal of updating (flipping) the value estimates of the currently experienced option on rare trials” goes beyond a “general temporal-difference RPE” and rather supports “the role of caudate in MB valuation”.

      (3) The role of the putamen is somewhat underexplored here.

      Our analyses were conducted in an identical manner across all four recorded regions (ACC, DLPFC, caudate, and putamen), and we consistently reported the results for putamen alongside the others. For example, in the Results section we describe how “both caudate and putamen encoded the reward from the previous trial negatively during the feedback period of the current trial” (Fig. 2F-G), and that “all regions had a significant population of neurons that encoded MB-, but not MF-, derived value” including putamen (Fig. 4F). Similarly, we show that putamen, like caudate, encoded a dopamine-like RPE signal at feedback (“both caudate and putamen neurons clearly responded at feedback with the parametric features of a dopamine-like RPE”; Discussion). These findings align with previous work linking the putamen to MF learning and are discussed explicitly in the context of MF-MB dissociations. We therefore believe that the putamen was not underexplored, but rather that its contribution was more circumscribed relative to ACC and caudate because the signals observed were quantitatively weaker and less distinctive for MB computations.

      (4) The authors mention the monkeys were overtrained before recording, which might have led to a bias in the MB versus MF strategy.

      We agree that extensive training can influence the balance between MB and MF in choice behaviour and neuronal responses.

      In a previous comprehensive behavioral analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology - ref. 36, Figure S6B) we showed that both MB and MF strategies contributed to behavior, with MB dominance stable across weeks of testing – supporting that overtraining did not eliminate MF influences (but rather stabilized a mixed strategy with robust MB contributions).

      In the same manuscript, we have also: i) cautioned the readers when comparing our results to data from the original human studies; ii) acknowledged that our extensive training cannot address earlier phases of learning in which sensitivity to the task structure is first acquired; and iii) also provided task-related reasons for such MB dominance – as training made the transition structure well learned (making MB computationally less costly and faster to implement) and the non-stationary outcomes favored the flexibility of MB strategies.

      In the present manuscript, we have also acknowledged that overtraining may have shifted neural signals toward stronger MB representations, or alternatively enabled more sophisticated task representations:

      “On the other hand, MF-based estimates were neither as striking nor as specific to striatal regions as expected and observed in previous studies [18]. The monkeys were extensively trained on the task before recordings commenced, which may have caused a shift towards both MB behaviour and MB value representation within the striatum. Alternatively, this training may have allowed more sophisticated representations to occur, such as using latent states to expand the task space [54].”

      Importantly, we strongly believe that this possibility does not detract from our main finding that both MB and MF signals were present across regions, with ACC showing the strongest multiplexing of the two.

      (5) The GLM3 model combines MB and MF value estimates but does not clearly mention how hyperparameters were optimized to prevent overfitting. While the hybrid model explains behavior well, it does not clarify whether MB/MF weighting changes dynamically over time.

      We appreciate this comment and would like to note that, for completeness, we have on several occasions directed the reader to our prior behavioural analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology, ref 36). In that work, we provide a full and detailed description of both the task and the computational modeling approach (see particularly the “Model fitting procedures” section). Furthermore, our model-fitting was grounded in the MF/MB RL framework used in the original human two-step study (Daw et al., 2011); and the fitting procedures also followed previous studies (Huys et al., 2011).

      Hyperparameters, including the MB/MF weighting parameter (ω), were estimated using maximum likelihood under two complementary approaches, with priors providing regularization across sessions. First, we performed a fixed-effects analysis, in which parameters were estimated independently for each session by maximizing the likelihood separately; second, we conducted a mixed-effects analysis, treating parameters as random effects across sessions within each subject. Using priors reduces the risk of overfitting by constraining parameters based on their empirical distributions, rather than allowing unconstrained session-by-session estimates. Finally, all model-fitting procedures were verified on surrogate (simulated) data.

      With regard to dynamic weighting, our approach – consistent with most two-step studies – assumed ω to be constant across trials within each session. This was a deliberate choice, both for comparability with prior work and because our subjects were extensively trained, making session-level stability of strategy weights a reasonable assumption. Indeed, our analyses showed no systematic drift in ω across sessions, suggesting that MB/MF balance was stable over sessions. While approaches that allow dynamic ω estimation are possible, we believe such extensions would likely have minimal impact in the current dataset.
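      To make the role of ω concrete, here is a minimal numpy sketch of a hybrid-model likelihood with a single, session-constant ω, fitted by grid search with an optional Beta prior (a MAP estimate; a flat prior gives plain maximum likelihood). It is an illustration under stated assumptions, not the authors' procedure (see ref. 36 for that): first-stage MB and MF values are taken as precomputed, the softmax inverse temperature β is held fixed rather than fitted jointly, the mixed-effects step is omitted, and all names are hypothetical.

```python
import numpy as np


def hybrid_log_likelihood(omega, beta, q_mb, q_mf, choices):
    """Log-likelihood of first-stage choices under Q_net = omega*Q_MB + (1-omega)*Q_MF.

    q_mb, q_mf: trials x 2 arrays of first-stage values; choices: 0/1 per trial.
    """
    q_net = omega * q_mb + (1.0 - omega) * q_mf
    logits = beta * q_net
    log_norm = np.logaddexp(logits[:, 0], logits[:, 1])  # stable log-sum-exp
    chosen = logits[np.arange(len(choices)), choices]
    return float(np.sum(chosen - log_norm))


def fit_omega(q_mb, q_mf, choices, beta, prior_a=1.0, prior_b=1.0):
    """Grid-search MAP estimate of omega under a Beta(prior_a, prior_b) prior."""
    omegas = np.linspace(0.005, 0.995, 199)  # open interval keeps log-prior finite
    log_prior = (prior_a - 1.0) * np.log(omegas) + (prior_b - 1.0) * np.log(1.0 - omegas)
    log_lik = np.array([hybrid_log_likelihood(w, beta, q_mb, q_mf, choices)
                        for w in omegas])
    return float(omegas[np.argmax(log_lik + log_prior)])
```

Simulating choices from known parameters and checking that `fit_omega` recovers ω is a small-scale version of the surrogate-data verification described above.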

      (6) It was unclear from the task description whether the images used changed periodically or how the transition effect (e.g., in Figure 3) could be disambiguated from a visual response to the pair of cues.

      All images were kept constant across sessions. Common/Rare transitions themselves were not explicitly cued, but rather each second-stage state was associated with a specific background colour, followed ~1s later by the presentation of two specific second-stage choice cues (Figure 1B). Hence the subject could infer whether they were transitioned down a Rare or Common path by the background colour, which can be disambiguated in time from the visual responses to the second-stage cues. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which could be inferred by a change in background colour immediately after choice indicating which second stage state they had just entered, Figure 1A).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 7 appears to be missing.

      We thank the reviewer for pointing this out. Figure 7 was inadvertently omitted in the previous version and has now been included in the revised manuscript.

      (2) No stats reported in the section on explore/exploit.

      We apologise for this oversight. This section now also reports the relevant statistics:

      “We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those where they had not received a high reward for the previous two trials (‘explore’ trials), and those where the previous two rewards had been the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch for both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in the former condition compared to the latter (p<0.002 for at least 140 ms in all cases, permutation test on differences observed; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentive) resources to features – i.e., choice 1 stimulus identity – that are most certain predictors of important outcomes. As a control, we also decoded the direction of the Choice 1 (where choice was indicated via joystick movement), which was randomised each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F).”
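      The trial selection in this passage can be sketched as follows. This is a minimal illustration, assuming rewards are coded as the strings "low", "medium" and "high", and reading "had not received a high reward for the previous two trials" as neither of the two preceding rewards being high; the function name is hypothetical. Trials matching neither rule, and the first two trials of a session, are left unlabelled (None).

```python
from typing import List, Optional


def label_regime(rewards: List[str]) -> List[Optional[str]]:
    """Label each trial 'exploit', 'explore' or None from the two preceding rewards."""
    labels: List[Optional[str]] = []
    for i in range(len(rewards)):
        if i < 2:
            labels.append(None)  # not enough history yet
            continue
        previous = rewards[i - 2:i]
        if all(r == "high" for r in previous):
            labels.append("exploit")  # two consecutive high rewards
        elif not any(r == "high" for r in previous):
            labels.append("explore")  # no high reward in the last two trials
        else:
            labels.append(None)  # mixed history: excluded from both analyses
    return labels
```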

      (3) Make sure that error bars are explained in all figure captions where appropriate.

      We apologise that this information was absent. Error bars always represent the standard error of the mean. This has now been added to all relevant figure legends.

      Reviewer #2 (Recommendations for the authors):

      Overall, I think this is a great manuscript and was presented clearly and succinctly. I have some minor suggestions:

      (1) Typo: Abstract "ACC, DLPFC, caudate and striatum" I think should be "caudate and putamen".

      We have amended this incorrect reference in the introduction:

      “One such task that does enable the dissociation of MB and MF computations is Daw et al. (2011)’s ‘two-step’ task [18]. It contains a probabilistic transition between task states to uncouple MF learners (who would assign credit to which state was rewarded regardless of the transition) from MB learners (who would appropriately assign credit based on the reward and transition that occurred). Rodents [19], monkeys [36], and humans [18] all use MB-like behaviour to solve the task. Evidence in rodents suggests dorsal anterior cingulate cortex (ACC) tracks rewards, states, and the probabilistic transition structure, and that ACC is essential in implementing a MB-strategy [37]. Here, we compare primate single neuron activity of 4 different subregions implicated in reward-based learning and choice (ACC, dorsolateral PFC (DLPFC), caudate, and putamen) during performance of the classic two-step task, and demonstrate signatures of MB-RL primarily in ACC, and MF-RL signatures most notably in putamen.”

      (2) Could the authors provide a rationale for why they did the single-level encoding the way they did, instead of running an ANOVA?

      We thank the reviewer for this point. We are not entirely certain which specific ANOVA approach is being suggested, but our rationale for using a GLM-based encoding analysis is that such an approach allows us to model continuous, trial-by-trial variables (e.g., value signals, prediction errors, transitions) while simultaneously controlling for multiple correlated predictors. This approach is widely used in systems neuroscience (particularly in decision-making research), offering analytical flexibility and comparability with prior work.

      (3) How were the 20 iterations for decoding decided? That seems low.

      We do not agree that 20 repetitions of 5-fold cross-validation is low. The error bars in panels 6C-E show how little variance there was across these 20 repetitions. We performed statistics on the average of these low-variance repetitions, using a permutation test in which the 20 repetitions were repeated a further 500 times.
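      A minimal numpy sketch of the scheme described here: decoding accuracy averaged over 20 repetitions of 5-fold cross-validation, compared against a null built by re-running the same repeated cross-validation on label-shuffled data. The classifier (nearest centroid), the use of label shuffling, the unstratified folds and all function names are assumptions for illustration; the manuscript's own decoder may differ.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Classify test trials by the closest class-mean population vector."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return np.mean(predicted == y_test)


def repeated_kfold_accuracy(X, y, rng, n_splits=5, n_repeats=20):
    """Mean accuracy over n_repeats random (unstratified) k-fold partitions."""
    n = len(y)
    scores = []
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(n), n_splits)
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            scores.append(nearest_centroid_accuracy(
                X[train_idx], y[train_idx], X[test_idx], y[test_idx]))
    return float(np.mean(scores))


def permutation_p_value(X, y, n_permutations=500, seed=0, **cv_kwargs):
    """Observed accuracy and one-sided p-value against label-shuffled repeats."""
    rng = np.random.default_rng(seed)
    observed = repeated_kfold_accuracy(X, y, rng, **cv_kwargs)
    null = np.array([repeated_kfold_accuracy(X, rng.permutation(y), rng, **cv_kwargs)
                     for _ in range(n_permutations)])
    p = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, p
```

Averaging within each run before comparing against the null is what keeps the repetition-to-repetition variance small relative to the observed-versus-null difference.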

      (4) It was unclear to me how the authors reached the conclusion "Thus, caudate activity appeared to represent the value of the state the subject was currently in." when the state value wasn't computed directly. I don't see how encoding the chosen and unchosen option is the same as the state the animal is in, which should also incorporate where the animal is in a block of trials or session, and the knowledge regarding the chosen and unchosen option.

      We agree with this point and have tempered this statement:

      “Thus, caudate’s encoding of an option’s value also reflected the availability of the option.”

      (5) Figures 1C, D, and E were not legible to me even at 200% zoom.

      We apologise for this oversight. We have now updated panels 1C-E to a more readable size.

      (6) There is a Figure 2H in the figure legend, but the panel appears to be missing from Figure 2.

      This text has been removed.

      (7) Figure 2: It would've been nice to see F and G for all areas.

      We have now added this data as additional panels in Figure 2.

      (8) Figure 3: How is the transition disambiguated from a visual response to the set of images?

      This was indicated by the background changing colour to that of the learned second stage state before the actual choices were presented. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which was indicated by a change in background colour before the second stage choices were presented, Figure 1A).”

      (9) Figure 4F: Is this collapsed across time points? So neurons that were significant at any time? I'm confused how Figure 4A relates to 4F, as 4A shows much lower percentages of significant neurons.

      Figure 4F counts the total number of neurons that had a significant period of encoding at any time point over the epoch (as assessed with a length-based permutation test), whereas Figure 4A shows the number of significantly encoding neurons at each individual time point. Investigating this further, we found that the encoding was dynamic, with different neurons encoding different parts of the epoch. We have now added a new supplementary figure to highlight this and refer to it in the Results:

      “Examination of the strongest signal observed, ACC’s encoding of MB Q-values, showed a dynamic pattern with different neurons encoding the signal at different parts of the epoch (Figure S6). When aggregating the number of significant coders throughout the epoch, and examining the specificity of MB versus MF coding, we found that all regions had a significant population of neurons that encoded MB-, but not MF-, derived value (30, 18.72, 23 and 24% of neurons in ACC, DLPFC, caudate and putamen respectively; all p<0.0014 binomial test against 10% (as the strongest response to either of the two options was used); Figure 4F).”
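      The binomial test in this passage asks whether the count of significant neurons exceeds what a 10% chance rate would produce. A minimal Python sketch of an exact one-sided version follows; the one-sided form and the function name are assumptions, and because the response reports percentages rather than neuron counts, the numbers in the usage line are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """Exact one-sided binomial test: P(X >= k) for X ~ Binomial(n, p0)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 30 of 100 neurons significant against a 10% chance rate.
p_value = binomial_upper_tail(30, 100, 0.10)
```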

      (10) Data/ code could be made publicly available instead of upon request.

      All data and code to reproduce figures are now available at https://github.com/jamesbutler01/TwoStepExperiment. The manuscript has been updated to reflect this:

      Data and materials availability:

      All data and code to reproduce figures are available at https://github.com/jamesbutler01/TwoStepExperiment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth. Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled the screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex and suggests a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge of interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some of the targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are some inconsistencies in the patterns that underscore the complexity of metabolic systems.

      We agree with reviewer #1 that metabolic fingerprints are complex to interpret and we did try to approach this problem by including mock treatment and non-metabolic inhibitors as controls. We address specific concerns below.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. The authors claim that MMV1028806 targets the bc1 complex of the mitochondrial electron transport chain of the parasite, although the evidence for this is indirect and speculative. Nevertheless, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also uses metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors, although it falls short of identifying the actual targets.

      Weaknesses:

      (1) The authors claim to have identified a compound in their screen (MMV1028806) that targets the bc1 complex of the mitochondrial electron transport chain (ETC). The evidence they present for this claim is indirect (metabolomic signatures and changes in mitochondrial membrane potential) and could be explained by the compound targeting other components of the ETC or affecting mitochondrial biology or metabolism in other ways. In order to make the conclusion that MMV1028806 targets the bc1 complex, the authors should test specifically whether MMV1028806 inhibits bc1-complex activity (i.e. in a direct enzymatic assay for bc1 complex activity). Testing the activity of MMV1028806 against other mitochondrial dehydrogenases (e.g. dihydroorotate dehydrogenase) that feed electrons into the ETC might also provide valuable insights. The experiments the authors perform also do not directly measure whether MMV1028806 impairs ETC activity, and the authors could also test whether this compound inhibits mitochondrial O2 consumption (as would be expected for a bc1 inhibitor).

      We thank the reviewer for highlighting this important aspect. To further investigate the effect of MMV1028806 on the mETC, we adapted a commercial oxygen consumption assay and demonstrated that MMV1028806, like Atovaquone and Buparvaquone, inhibits the ETC, leading to reduced oxygen consumption similar to Antimycin A, which inhibits the bc1-complex. These results are now included in the revised manuscript (Methods, lines 210–233; Results, lines 460–468).

      (2) The authors claim that compounds targeting bradyzoites have greater lipophilicity than other compounds in the library (and imply that these compounds also have greater gastrointestinal absorbability and permeability across the blood-brain barrier). While it is an attractive idea that lipophilicity influences drug targeting against bradyzoites, the effect seems pretty small and is complicated by the fact that the comparison is being made to compounds that are not active against parasites. If the authors are correct in their assertion that lipophilicity is a major determinant of bradyzoicidal compounds compared to compounds that target tachyzoites alone, you would expect that compounds that target tachyzoites alone would have lower lipophilicity than those that target bradyzoites. It would therefore make more sense to (statistically) compare the bradyzoicidal and dual-acting compounds to those that are only active in tachyzoites (visually the differences seem small in Figure S2B). This hypothesis would be better tested through a structure-activity relationship study of select compounds (which is beyond the scope of the study). Overall, the evidence the authors present that high lipophilicity is a determinant of bradyzoite targeting is not very convincing, and the authors should present their conclusions in a more cautious manner.

      Thank you for raising this excellent point. We statistically compared the compounds active only against tachyzoites with the bradyzoicidal and dually active compounds and indeed found no significant difference (P = 0.06). We altered the results text (lines 367–368) and the Figure S2B caption to state this explicitly.

      (3) Page 11 and Figure 7. The authors claim that their data indicate that ATP is produced by the mitochondria of bradyzoites "independently of exogenous glucose and HDQ-target enzymes." The authors cite their previous study (Christiansen et al, 2022) as evidence that HDQ can enter bradyzoites, since HDQ causes a decrease in mitochondrial membrane potential. Membrane potential is linked to the synthesis of ATP via oxidative phosphorylation. If HDQ is really causing a depletion of membrane potential, is it surprising that the authors observe no decrease in ATP levels in these parasites? Testing the importance of HDQ-target enzymes using genetic approaches (e.g. gene knockout approaches) would provide better insights than the ATP measurements presented in the manuscript, although would require considerable extra work that may be beyond the scope of the study. Given that the authors' assay can't distinguish between ATP synthesized in the mitochondrion vs glycolysis, they may wish to interpret their data with greater caution.

      We thank the reviewer for addressing this important point. The enzymatic assay used in our study cannot distinguish whether ATP is produced via glycolysis or mitochondrial respiration. However, we minimized glycolytic ATP production in bradyzoites by starving them for one week without glucose. After this period, amylopectin stores are depleted, forcing the parasites to utilize glutamine via the GABA shunt to fuel the TCA cycle and generate ATP predominantly through respiration. While minor ATP production via gluconeogenic fluxes cannot be excluded, the main ATP supply under these conditions is expected to originate from the mitochondrial electron transport chain. Indeed, ATP levels are lower in HDQ-treated bradyzoites, which we attribute to the compound’s impact on electron-supplying enzymes upstream of the bc1 complex, although this inhibition is not sufficient to fully abolish ATP production as observed with Atovaquone treatment.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an exciting screen of 400 compounds from the MMV Pathogen Box to select compounds that effectively target the bradyzoite stage of the medically important parasite Toxoplasma. This work uses a bradyzoite culture technique recently published by the same group. The authors focused on compounds that directly affect the bc1 complex of the mitochondrial electron transport chain (mETC) and compared them with other bc1 inhibitors described in the literature, such as atovaquone and HDQs. They further provide a metabolomic analysis of inhibited parasites, which supports the proposed target and characterises the outcome of the different inhibitors.

      Strengths:

      This work is important because, to date, there are no effective drugs that clear cysts during T. gondii infection. The discovery of new inhibitors that are effective against this parasite stage in culture, and thus have the potential to combat chronic infection, is therefore needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and the mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, which is shown to have a metabolic outcome similar to that of buparvaquone. Furthermore, the authors evaluated the importance of ATP production in the tachyzoite and bradyzoite stages and under atovaquone and HDQ treatment.

      Weaknesses:

      Although the authors performed experiments to identify the metabolomic profiles of the compounds and suggested the bc1 complex as the main target of MMV1028806, they did not provide experimental validation for this.

      In our updated manuscript we performed additional experiments, such as an oxygen consumption assay, to further qualify the bc1 complex as the target. We also toned down some of our statements to make sure that no false claims are made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Introduction: It would be helpful to briefly describe what the Pathogen Box is, what compounds are in it, and the rationale for using a drug screen to better understand mitochondrial function in cysts.

      Thank you for this suggestion. We added an introduction to the MMV Pathogen Box and outlined the rationale for our experimental approach (lines 90–99).

      Please explain why dual-active drugs were useful for understanding differences, rather than just seeking drugs that might target bradyzoites alone.

      We focused on dually active compounds for two reasons. First, these are the most promising and potent starting points for drug development: because both stages might occur simultaneously, dually active drugs may eliminate the need for treatment with a drug combination. Second, we speculated that monitoring the responses to inhibition of the same process in both parasite stages would reveal its functional consequences, and dually active compounds enable this direct comparison. Bradyzoite-specific compounds may be interesting from a developmental perspective but may require a reverse-genetic follow-up to compare differences between stages. The lack of a well-established inducible expression system in bradyzoites that allows short-term and synchronized knock-down makes metabolomic approaches difficult. We briefly added these two points to the results section (lines 378–381).

      Figure 4: this is a very important figure in understanding the significance of the work, but it is not well described in the legend. Even if these graphics have been used in other manuscripts, it would be helpful to provide better annotation in the figure legend.

      Thank you for pointing this out. We expanded the figure legend to explain the isotopologue data in more detail (lines 793–802).

      B,D: Explain what the three columns for each drug category represent.

      Addressed

      C,E: Explain what isotopologues are, what the M+ notation means, and what the pie charts represent. Other main figures have suitable legends.

      Addressed

      Discussion: there are several places where the reasoning is a bit hard to follow, and rearrangement to provide a clear logical flow would be helpful. In particular, the reasoning for why HDQ impairs active but non-essential processes could be laid out more clearly.

      We added additional clarifications to the discussion section and re-wrote the HDQ paragraph. We hope that our reasoning is now easier to follow.

      Abbreviations: A list of abbreviations for the entire manuscript would be helpful.

      This is a good idea and we now provide an abbreviations list.

      Minor typos:

      P12, 2nd paragraph: sentence beginning with "Consistent with this hypothesis...": "cysts" is used twice

      Corrected

      P15, top of the second paragraph: "nano" and "molar" should be one word

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Major comments (not already covered in the weaknesses section of the public review)

      (1) Figure 2 and the related description of these experiments in the methods section (page 3). The approach for calculating IC50 values for the compounds against tachyzoites is unclear. How did the authors determine the time point for calculating IC50 values? Was this when the DMSO control wells reached maximum fluorescence? This could be described more clearly. A concern with calculating IC50 values on different days is that parasites will have undergone more lytic cycles after 7 days than after 4 days, which means that the IC50 values for fast- vs slow-acting compounds might differ considerably between these days. As a more minor comment on these experiments, the methods section does not state whether the test compound was removed after 7 days, as the experimental scheme in Figure S1A seems to imply. Please clarify in the methods section.

      This is a very good point, and we clarified this in the methods section (lines 157–160). In brief, we chose the latest time point at which exponential growth could be observed in the fastest-growing cultures; generally these were the mock-treated cultures at day 4 post infection. We also clarified that we changed the medium and removed the treatment after 7 days.

      Minor Comments

      (2) Page 2. "we employed a recently developed human myotube-based culture system to generate mature T. gondii drug-tolerant bradyzoites". What makes these bradyzoites 'drug-tolerant' or to which drugs are they tolerant? This isn't clear from the description.

      We added these details to the introduction (lines 94–96), stating that these cysts develop resistance to antifolates, bumped kinase inhibitors and HDQ, a coenzyme Q analog.

      (3) Figure 1E. The number of compounds in this pie chart adds up to 384, whereas the methods describe that 371 compounds were tested. What explains this discrepancy in numbers?

      We understand the confusion. We have now updated the pie chart to reflect only the compounds that were included in the primary screen (371), as listed in Supplementary Table S1. We separately analysed 29 compounds that had previously been tested against tachyzoites by Spalenka et al. and found an additional 13 compounds, which were originally included in the pie chart. In a secondary test, the activity of 10 of these 13 compounds could be confirmed. In total, we found the 16 compounds shown in Fig. 2E–G.

      (4) Page 3. The resazurin assays for measuring host cell viability could be explained in a clearer manner. What host cells were used? Were the host cells confluent when the drug was added (and the assay conducted) or was the drug added when the host cells were first seeded? How long were the host cells cultured in the candidate inhibitors before the assays were performed? What concentration (or concentration range) were the compounds tested? The host inhibition data are not easily accessible to the reader - the authors might consider including these data as part of Table S2D.

      The necessary information was added to the methods section (lines 145–153). We tested for host toxicity in both HFFs and KD3 myotubes during the primary screen at 10 µM in triplicate. The colorimetric assay was performed after the tachyzoite growth assays in HFFs 7 days post infection, and after completion of the 4-week regrowth phase of bradyzoites in myotubes. The resulting data are already part of Supplementary File 1. In addition, we performed concentration-dependent resazurin assays after the secondary concentration-dependent growth inhibition assays and also included these data in Supplementary File 1. For the bradyzoite growth assay, we visually inspected the cultures after one week of drug exposure and before tachyzoite regrowth to detect missing or damaged monolayers; these data are also included in Supplementary File 1. As suggested, we also included the cytotoxicity data in Table S2D.

      (5) Page 7. "Except for four compounds (MMV021013, MMV022478, MMV658988, MMV659004), minimal lethal concentrations were higher in bradyzoites". The variation in these data seems quite large to be making this claim. Consider a statistical analysis of these data to compare potencies in tachyzoites vs bradyzoites.

      With this sentence we aimed to describe the results, not to make a claim. We toned the sentence down to "… minimal lethal concentrations appear generally higher in bradyzoites…" (lines 344–347). We also added a line at 1 µM to the charts to facilitate comparison of compound efficacies.

      (6) It would be helpful to readers to include the structures of hit compounds in the figures (perhaps as part of Figure 3).

      This is a good idea and would improve the manuscript. To avoid overburdening Figure 3, we added the structures to Fig. S3.

      (7) Page 8. "Infected monolayers were treated for three hours with a 3-fold of respective IC50 concentrations". 3-fold higher than IC50 concentrations? This isn't clear.

      Thank you for noticing this. We clarified the sentence and also corrected the concentration, which corresponds to five times the IC50, as stated in the methods section: "Infected monolayers were treated for three hours with compound concentrations five times their respective IC50 values or the solvent DMSO." (lines 374–376)

      (8) Page 9. "buparvaquone, which we found to be dually active against T. gondii tachyzoites and bradyzoites, targets the bc1-complex in Theileria annulata (McHardy et al. 1985) and Neospora caninum (Müller et al. 2015) and was recently found active against T. gondii tachyzoites (Hayward et al. 2023)." The latter paper showed that buparvaquone targets the bc1 complex in T. gondii tachyzoites as well.

      Yes, buparvaquone was shown to inhibit the O2 consumption rate in tachyzoites. We changed the sentence accordingly (lines 407–411).

      (9) Page 9. "Anaplerotic substrates were also affected by all three treatments, most notably a strong accumulation of aspartic acid." It is interesting that the M+3 isotopologue of aspartate (presumably synthesised from pyruvate) is the predominant form (rather than the M+2 and M+4 isotopologues that would derive from the TCA cycle, and as the diagram in Figure 4A seems to suggest). Given that aspartate is a precursor of pyrimidine biosynthesis that is upstream of the DHODH reaction, it is conceivable that its accumulation is related to the depletion of pyrimidine biosynthesis (so would tie into the point about the accumulation of DHO and CarbAsp noted earlier in the paragraph).

      Yes, we assume the same. We altered the text and summarized the changes in Asp as a consequence of DHODH inhibition, as we already do in the next paragraph using 15N-glutamine labelling (lines 416–418).

      (10) Figure 6 and Page 10. Regarding the metabolomic experiments that show increased levels of acyl-carnitines. The authors note that "Since [beta-oxidation] is thought to be absent in T. gondii, we attribute these changes to inhibition of host mitochondria". This is conceivable, although the T. gondii genome does encode homologs of the proteins necessary for beta-oxidation (e.g. see PMID 35298557). If the carnitine is coming from host mitochondria, is host contamination a concern for interpreting the metabolomic data? Or do the authors think that parasites are scavenging carnitine from host cells? It is curious that the carnitine accumulation is observed in parasites treated with buparvaquone (and MMV1028806) but not atovaquone, even though buparvaquone and atovaquone (and possibly MMV1028806) target the same enzyme. Do the authors have any thoughts on why that might be the case?

      Yes, thank you for raising this point. We revised the discussion to elaborate on this and to address the debated presence of beta-oxidation (line 640): "We also detect elevated levels of acyl-carnitines in BPQ- and MMV1028806-treated bradyzoites. These molecules act as shuttles for the mitochondrial import of fatty acids for β-oxidation. However, this pathway has not been shown to be active and is deemed absent in T. gondii (35298557, 18775675). The presence of acyl-carnitines in bradyzoites might reflect import from the host. It is conceivable that their elevation in response to buparvaquone and MMV1028806 indicates compromised functionality of the host bc1-complex and consequently accumulating β-oxidation substrates. Indeed, BPQ has a very broad activity across Apicomplexa (Hudson et al. 1985) and kinetoplastids (Croft et al. 1992)." Regarding the existence of beta-oxidation: some candidate enzymes may be conserved, but these could partly function in branched-chain amino acid degradation pathways. On a separate note, we examined beta-oxidation extensively using stable isotope labelling and became convinced that any activity occurred only in the host cell and not in the parasite (unpublished).

      (11) Page 11. "the mitochondrial [electron] transport chain in bradyzoites".

      Corrected.

      (12) Figure S6B. Were these optimization experiments performed in tachyzoites or bradyzoites? If the former, and given that bradyzoites have apparently smaller amounts of ATP per parasite (Figure 7C), are these values in the linear range for 10^5 bradyzoites?

      Yes, we think that the assay remains linear at these lower ATP levels. Tachyzoites give a linear response starting from 10^3 parasites per sample. In the actual experiment we used 10^5 parasites, both tachyzoites and bradyzoites. Under the tested conditions, bradyzoites maintain 10% of the ATP pool of tachyzoites, which should be well within the linear range of the assay. Even in atovaquone-treated bradyzoites, the ATP concentration could drop to 10% of that level and still remain within the linear range of the assay. For practical reasons, we simply acknowledge this limitation and consider it acceptable within the scope of this study.

      Reviewer #3 (Recommendations for the authors):

      Major comments

      (1) The authors should provide a negative control for the experiment on Figure 5. I would suggest doing the same experiment with an inhibitor that has no effect on mitochondrial potential.

      We addressed this criticism by repeating the assay on tachyzoites and additionally including inhibitors that do not have the mitochondrial electron transport chain as their primary target (pyrimethamine, clindamycin, 6-diazo-5-oxo-L-norleucine). The results are summarized in supplementary Fig. S5 (lines 445–449) and show that these inhibitors have no effect on the mitochondrial membrane potential. This supports the specificity of the assay and suggests that MMV1028806 and BPQ indeed target a mitochondrial process in this stage. In this repetition, ATQ, BPQ and MMV1028806 again significantly depleted the MitoTracker signal.

      (2) Figure 5 - Did the authors perform this experiment in 3 biological replicates? This requires clarification of the figure legend.

      No, we did not perform the experiment in 3 biological replicates. After establishing the assay thoroughly, we performed it once on tachyzoites and bradyzoites. The sampling was done on every vacuole we encountered during microscopy going through the slide from left to right. That is the reason the sample size varies from treatment to treatment. The sample size is mentioned in the caption of figure 5. However, we repeated the experiment with additional controls (see Fig. S5), which showed that the Mitotracker signals were significantly depleted in a very similar manner in ATQ, BPQ and MMV1028806 treated parasites.

      (3) The authors identify the bc1 complex as the main target of MMV1028806. I suggest that they perform a complex III activity assay to confirm this. It would also be good to test whether other mETC complexes are affected by this compound to demonstrate its specificity. There is only one paper showing complex III activity in tachyzoites (PMID:37471441) and none in bradyzoites. So if the authors cannot perform this assay, I suggest that they change the text to indicate that the bc1 complex could be the main target of the compound but that more experimental validation is needed.

      We hope to have satisfied the reviewer's request by performing an oxygen consumption assay on tachyzoites. Together with the metabolic profiling and labelling data, this shows that both upstream and downstream processes are impacted by MMV1028806 and strongly suggests the bc1 complex as the target (Fig. 5E).

      (4) Figure S5 - Are the differences shown in the EM experiment statistically supported?

      We analyzed 28 images and measured the areas in 12 to 26 images. We replaced the table of means in Fig. S6B with a graph showing individual values. These areas are indeed statistically different between DMSO-treated and ATQ- or MMV1028806-treated parasites. We changed the wording in the results section accordingly: "Analysis by thin section electron microscopy revealed a largely unaffected sub-mitochondrial ultrastructure but the areas of mitochondrial profiles were changed in comparison to control after exposure with ATQ and MMV1028806 but not with BPQ (Fig. S6)". The description of Fig. S6B was changed to "(B) Measured areas of mitochondrial profiles from 21, 12, 15 and 26 images showing DMSO, ATQ, BPQ and MMV1028806 treated parasites (* denotes p < 0.05 in Mann-Whitney tests)".

      Minor comments:

      (1) What was the criteria to choose the example compounds in Figure 1B and 1D? The authors should clarify this in the text.

      These graphs are shown for illustrative purposes and were chosen based on their display of different drug efficacies. We considered this helpful for interpreting the screening data.

      (2) Figure 2G - add statistical analysis.

      We added Mann-Whitney tests and updated the figure legend and results text accordingly (lines 344–347).

      (3) The authors should provide more insight in the discussion into why this new compound is the next step in drug discovery compared with atovaquone or buparvaquone; for example, do you expect better availability in the brain?

      We used MMV1028806 and the other hits ATQ and BPQ to make the point that the bc1-complex is a good target in bradyzoites that allows curative treatment. We do not suggest that the compound itself is a good starting point. We point to other actively developed candidates such as ELQ series in the discussion, line 719.

      (4) Scale bars in Figure 5 should be aligned and have equal thickness.

      We re-formatted the scale bars and aligned them when not obscuring parasites.

      (5) The authors should be consistent with font sizes and styles in all the figures.

      We adjusted the font styles to match each other.

    1. These developments make the evidentiary gap salient: funders, editors, and policymakers need to know when AI evaluation outputs are trustworthy enough to use, and when they are unstable, biased, or manipulable. Recent work highlights all three concerns. First, reproducibility can be “jagged”: repeated runs of the same models on the same corpus over time can be highly consistent for some tasks and models, but much less so for others (Thomas, Romasanta, and Pujol Priego 2026); robustness may require separating scientific judgment from computational execution (Xu and Yang 2026); and even without overt adversarial intent, subtle reframings of the same task can induce systematic shifts in outputs—a form of LLM “specification search”—raising concerns about frame-sensitive biases when models serve as measurement instruments (Asher et al. 2026). Second, adversarial manipulation is not hypothetical: invisible-text “prompt injection” can substantially inflate LLM-assigned review scores and acceptance recommendations in simulated peer review (Choi et al. 2026), and prompt-injection vulnerabilities are also documented in other high-stakes advice settings (Lee et al. 2025). Third, even when outputs look fluent and plausible, it remains unclear whether AI models approximate expert judgment: AI-generated reviews tend to cover more surface-level sections while being less thematically diverse and less focused on interpretation, originality, and applicability than human reviews (Rajakumar et al. 2026); LLMs used as manuscript quality checkers identify only a small fraction of confirmed critical errors even with the strongest reasoning models (Zhang and Abernethy 2025); and LLM scoring exhibits systematic range restriction and halo effects that can distort agreement metrics (Wang et al. 2025).

      This seems too long. This isn't really coming from us, so we might mention some of these things, but I would tend to make this a lot shorter. Perhaps some things can be put in footnotes. Obviously we need to check these carefully to see if we agree with them.

      I think I mentioned before that I'm not sure our work really speaks to the prompt injection issue. The set of work we're having the LLMs and humans evaluate seems rather unlikely to contain such prompt injection, so we can't really test that (unless we modified the work being fed in, but I don't think that's in our wheelhouse right now).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity (Required)

      Ali et al. investigate the composition of putative kinetochore subcomplexes in the unicellular eukaryote Tetrahymena thermophila. Up to the point of this study, only a CENP-A ortholog and two subunits of the microtubule-binding Ndc80 complex had been clearly identified. This left open the question of whether Tetrahymena kinetochores follow the conventional organization found in common model systems such as yeast or human cells, or contain many unconventional proteins. The authors combine proximity biotinylation coupled to mass spectrometry with deep homology searches and structure predictions.

      Extensive bioinformatic analysis of the T. thermophila genome allows the authors to annotate 16 genes as kinetochore genes (KiTT). Using sequence comparisons with known kinetochore proteins, they were able to relate their novel KiTT proteins to the conserved kinetochore components CENP-A, CENP-C and the KMN network, as well as to auxiliary proteins. In particular, the authors were able to complete the organization of the Ndc80 complex and identify subunits of the Mtw1/Mis12 complex and a Knl1 ortholog. This establishes a KMN network as the centerpiece of the Tetrahymena kinetochore architecture.

      The CCAN seems to be represented solely by CENP-C, with key binding interfaces to KMN and CENP-A being preserved. An interesting aspect is that neither a Dam1-, nor a Ska homolog seems to support the Ndc80 complex. Instead, the authors identify a Kinesin-6 homolog that may potentially compensate for the absence of these factors.

      The study is well-designed, the results are thoughtfully discussed and the expertly conducted experiments highlight the power of combining experimental identification (BioID) with bioinformatic analyses.

      We appreciate the favorable assessment of our manuscript and would like to extend our thanks for the reviewers’ constructive criticisms and insightful comments. Where possible we aim to incorporate them (see below).

      Major comments

      The functional validation of the newly identified subunits using RNAi feels somewhat limited in this study. I understand there are technical limitations in this system, but whenever possible, I would at least expect the authors to explore differential effects on different parts of the kinetochore using the reagents they have at hand. In particular, the authors show the effects of depleting KiTT12 (the kinesin-6 homolog) on Ndc80 kinetochore localization. It would be important to also check effects on CENP-A (using the anti-CNA1 antibody) or on other subunits. Given the available reagents, this should be readily possible.

      We agree that examining the effect of KiTT12 depletion on inner kinetochore components will strengthen the functional interpretation. While we do not expect, based on KiTT12’s relative location, a direct impact of KiTT12 RNAi on CNA1 (CENP-A) or CENP-C, we will perform immunofluorescence analyses using anti-CNA1 and anti-CENP-C antibodies in KiTT12 RNAi cells (and KiTT2 (NUF2) RNAi as control). These experiments will allow us to determine whether KiTT12 depletion specifically affects outer kinetochore integrity (as suggested by Ndc80 mislocalization) or more broadly perturbs kinetochore architecture (CNA1/CENP-C). We will include quantitative analyses of signal intensity and kinetochore organization to clarify potential hierarchical dependencies.

      The organization of the Knl1 ortholog and the question of whether a mitotic checkpoint is present, deserves some additional discussion. Interestingly, the positional organization of a PP1 binding motif at the N-terminus of a long disordered domain seems conserved. On the other hand, MELT motifs appear to be absent. The authors should discuss the implications of this some more. Is there an Mps1 homolog? What about the error correction machinery including Aurora B and the CPC? The putative MadBub homolog does not seem to localize to kinetochores, but maybe this is not detectable, unless the respective conditions (unattached kinetochores) are generated. Is it known, how the system reacts to spindle depolymerization?

      Tetrahymena does not appear to have a spindle checkpoint, given prior reports that chromosome segregation is not halted by microtubule depolymerization [Kaczanowski et al. 1985]. In line with this, the SAC protein orthologs that are present lack the motifs needed to mount a sufficient response and halt the cell cycle. We thus agree that the architecture of the Tetrahymena KNL1 ortholog and possibly other SAC-related proteins raises important evolutionary questions. We will expand the discussion to address:

      • The absence of canonical MELT motifs in the Tetrahymena KNL1 ortholog.
      • The absence of a detectable Mps1 ortholog in our homology searches.
      • The divergence of the Tetrahymena MadBub protein and its lack of conserved KEN–ABBA–KEN motifs typically required for APC/C inhibition.
      • The absence of Mad2 and of the Mad2-binding motif in Cdc20.

      Relevant REFs:

      • Kaczanowski et al. 1985, Experimental Cell Research

      • Loidl et al. 2009, Molecular Biology of the Cell

      • Kops et al. 2020, Current Biology

      Minor comments

      • Introduction: When introducing the Tetrahymena kinetochore, please add some sentences on microtubule/spindle organization in the MIC. What is known about the kinetochore-microtubule attachment site in Tetrahymena?

      We will expand the introduction to include a concise description of spindle organization in the micronucleus (MIC), including known features of centromere clustering, spindle assembly, and microtubule attachment sites during MIC mitosis.

      Relevant REFs:

      • Davidson et al. 1975, Biosystems

      • Lafountain Jr et al. 1979, Chromosoma

      • Lafountain Jr et al. 1980, Cell Motility

      • Line 128: putative homology to Spc24 (E = 13). Please comment on why this was considered, what cutoffs were applied, etc.

      We will clarify the homology detection criteria, including E-value thresholds, domain architecture considerations, reciprocal searches, and structure-based validation. We will explain why this candidate was retained despite weak sequence similarity and how structural prediction strengthened confidence. In short, we used the ‘top hit’ principle: the candidate with E = 13 was simply the top hit for Spc24, and inspection of its AlphaFold-predicted structure showed that the protein was clearly similar to Spc24.

      • Line 135: briefly mention and discuss conservation of the RWD folds in the Spc24-25 orthologs.

      We will expand this section to explicitly describe conservation of the RWD fold and how structural modeling supports ortholog assignment despite sequence divergence. For instance, the E-values mentioned in line 128 refer to the RWD domain only, not the full-length protein; we will state this explicitly in the text.

      • Line 194: Maybe replace "show" with "suggest", given there is no experimental data behind the CENP-C identification

      We agree and will revise the wording to “suggest” to avoid overstatement. However, we note that CENP-C/KiTT8 was also identified experimentally through the BioID pipeline, and an antibody raised against KiTT8 places this protein at the inner kinetochore.

      • Figure 7B: please add the information for the RNAi target directly to the Figure

      We will add the requested information directly to the figure.

      • Figures in the combined pdf: please add the respective Figure number or Supplementary Figure number directly on the Figure.

      We will add the figure numbers to the supplementary figure files.

      Significance (Required)

      While functional studies are often conducted in very few model organisms, exploring the evolutionary variations of kinetochore architecture can help to understand the design principles of kinetochores. It also helps to assign functions to specific subcomplexes and can reveal how adaptations of a core machinery occur. Tetrahymena is historically an important experimental system that has had a great impact on the understanding of multiple aspects of nuclear biology. Deciphering the organization of the chromosome segregation machinery in this organism is therefore of great interest to researchers interested in mitosis and genome stability.


      Reviewer #2

      Evidence, reproducibility and clarity (Required)

      Summary Ali, Raas et al. provide a comprehensive molecular characterization of the kinetochore in the ciliate Tetrahymena thermophila. By integrating proximity proteomics (TurboID) with structure-based "deep" homology detection, they identify 16 kinetochore proteins (KiTT1-16), including nine highly diverged "cryptic" orthologs of conserved LECA components and four lineage-specific proteins. Their results demonstrate that while the Tetrahymena kinetochore lacks a conventional CCAN complex, it maintains a recognizable outer kinetochore structure supplemented by novel proteins essential for faithful chromosome segregation.

      Major comments

      1. Representation of known kinetochore diversity

      • Since this manuscript wants to highlight that it is important to characterize kinetochore components in different eukaryotic clades, it would be good to highlight the known diversity from the literature in Figure 1, e.g., indicating species/clades for which components have been experimentally validated vs. only computationally inferred.
      • It would be good to specifically highlight this on the figure for the clade closest to Tetrahymena in which KT components have been experimentally validated (Apicomplexa?).
      • L58-64: the sentences 'we have a limited understanding about kinetochore composition and function from other branches of the eukaryotic tree of life' and 'these surveys also uncovered a surprisingly extensive diversity of kinetochore composition across eukaryotes' seem to contradict each other. Instead of, or in addition to, the literature described in the introduction, having the known diversity indicated on a figure, as suggested above, would therefore be helpful. This could be done quite roughly, just mentioning the number of verified KT components and the number of species for which this was done.

      We will add a more elaborate version of Figure 1a (or include an extended version in the supplement) summarizing the information requested in the three points above. Indeed, our mention of diversity across lineages is inferred, not directly tested. We will amend the text to clarify this.

      • L46-L56: when explaining the structure of the KT, it would be good to already refer to a figure, like the diagram of a human KT in 1B. As it is now, the introduction first explains the general structure, and then goes into diversity. This is fine, but it would be easier to understand if the figure panels followed this order.

      We will include additional references to figure 1b at the appropriate places in the introduction.

      The data can sometimes be represented in a more straightforward manner:

      • L120-...: After reading through the whole text, I understand why the authors choose to talk about Spc24 and Spc25 first (since Spc25 is also used in the TurboID experiment). However, the presented pipeline for these two proteins is much less convincing than for the other proteins. Spc24/25: 'Some homology > slight structure similarity > right localization in immunostaining' vs. the pipeline for the other proteins: 'TurboID > confirmation using homology + immunostaining' (what is depicted in Fig. 2C). The latter is very convincing, but by starting off with the less convincing pipeline, the reader starts off on the wrong track. Since Spc24 is not used in the end for the first TurboID results, is Spc25 necessary at this point or can this come later?

      We used this storyline because it reflects the order in which the work was actually done. It felt wrong to us to pretend that we had not already found Spc24 and Spc25 by bioinformatic means before doing the TurboID, and doing so might also have raised concerns among some readers about our ability to detect orthologs for these and other proteins. Of note: a re-analysis of the Spc24-BioID experiment revealed that it had previously been wrongly considered unusable, hence we now include it in our NDC80-C based TurboID discovery pipeline in Figure 2. Where possible, we will revise the narrative structure to more clearly explain the logic of the discovery pipeline, while maintaining transparency about the historical order in which candidates were identified. We will streamline the Spc24/25 section and more prominently introduce the TurboID-driven identification pipeline (Figure 2C) to guide the reader.

      • It is very good and thorough that the authors noticed that some of the KT proteins were simply missed because they were not part of the original predicted proteome. However, why weren't the TurboID analyses simply redone with the new proteome? The authors could still note that it was important to use the most recent version, but it would be much more straightforward for readers to immediately have the most up to date analysis.

      We thank the reviewer for pointing this out. We agree that remapping to the most recent proteome annotation will improve clarity. We will remap the TurboID datasets to the updated Tetrahymena proteome, which includes Nnf1 and Csm1, and report whether additional components are identified. Of note: in a preliminary analysis with the newest version of the proteome we do not find any new proteins in the NDC80-C-TurboID experiments. We will also clarify in the manuscript what “not in original proteome” refers to and revise Figure 2C accordingly.

      Figure 4 and accompanying paragraph: this is an interesting analysis, but impossible to interpret without comparing with the branch length of other Tetrahymena proteins or Tetrahymena as a species (if I interpreted the analysis correctly). L251: 'this underscores the high rates of evolution of kinetochore proteins'. This could be true, but this isn't proven here because there is no comparison with the evolutionary rate of other proteins in Tetrahymena.

      The reviewer is correct that, without comparisons to other proteins, the statement that kinetochore proteins in Tetrahymena evolved at high rates is not supported by the present data. What we meant to say is that they evolved at higher rates than kinetochore proteins of other species. In our view, this explains why we missed them in past searches, regardless of whether this is specific to the kinetochore in Tetrahymena or to Tetrahymena proteins in general. We will amend the text to reflect this more clearly, explicitly acknowledge the analytical limitations, and remove claims regarding lineage-specific acceleration.

      Figure 5: For further validation and to better show the layered structure of the Tetrahymena kinetochore it would be nice to have a couple of images here with increased resolution by using expansion microscopy.

      We agree that improved spatial resolution would strengthen the layered organization model. We will attempt to perform expansion microscopy (ExM) on selected tagged kinetochore components and incorporate representative images into the revised manuscript (main or supplementary figures).

      Minor comments

      • Abstract: if you are going to call out individual components, maybe also point out the few that were already known (KiTT1-2 and 14). Otherwise the reader might be confused about the missing numbers.

      We will revise this in the abstract.

      • L37: is 'cryptic ortholog' an official term? Doesn't this just depend on the starting point of the homology search and the number of experimentally verified hits you have in certain parts of the tree? Just wondering.

      This is a valid question. Indeed, ‘cryptic’ refers to the starting point of our study (based on our previous analyses) and the process towards identifying them as being canonical. We chose this term because we feel it signifies to the reader that identifying these orthologs required approaches beyond conventional orthology searches.

      • For future submissions, it would be useful to have the figure numbers indicated on the figures, because now it was sometimes difficult to keep track.

      As mentioned above, we will add the figure numbers to the revision.

      • L51: mentioning the SAC might make it a bit too complicated for people not 100% familiar with all the complexes. Either leave it out until later, or have a short sentence explaining what the SAC is.

      We will leave out the spindle assembly checkpoint (SAC) in the beginning and will bring it up at a later point, also explaining its explicit function.

      • Figure 1A: the identity of the black 'nuclei' is not explained for the Ciliophora and Apicomplexa in the figure or figure legend.

      We apologise for the confusing black organelles in apicomplexans; these are the micronemes and apical complex, characteristic features of these parasites. We will change their color to that of the clade so that it is clear that only ciliates have two types of nuclei (nuclear dimorphism).

      • In Figure 1B, instead of saying 'absent', wouldn't it be more correct to say something like 'not found/detected/identified'?

      We agree and will replace ‘absent’ by ‘not detected’.

      • Figure 1C. During interphase, sometimes homologous chromosomes seem to cluster at the centromeres (5 foci - example on the left), but sometimes they don't (10 foci - example on the right). Is this something you observe a lot? Is it strain-dependent?

      We thank the reviewer for making a very good point. In principle, we take the cells showing 5 foci to be interphase cells. We interpret the cells with 10 foci as cells just prior to mitosis, i.e., G2 cells in which the homologous chromosomes have been replicated and the sister pairs are still seen together. However, if this were the case, one would expect to see 20 centromeres/kinetochores in metaphase, and this is not always observed. To prevent confusion on this point, we will replace the right panel in 1C with one that contains 5 foci and will make it clearer that these foci indeed represent homologous chromosomes. In addition, we will add panels that clearly show the behaviour of chromosomes over the different stages of mitosis.

      • Figure 1C (and later in Fig. 5): centromeres don't seem to align during metaphase. Is this true, or are these examples showing late metaphase/early anaphase?

      Indeed, a true metaphase similar to classic textbook images does not seem to be present. In 3D reconstructions we do see that kinetochores sit close to the nuclear envelope, forming a sphere on the outside of the spindle, but almost never exactly in the same plane. Whether this means we simply have not caught a true metaphase state, or that there is none (as, for instance, in apicomplexans, which also do not appear to have a spindle checkpoint), is unclear at this point. We will further review our images, use consistent stages for these images, and revise the terminology regarding the metaphase state if warranted.

      • Why was STU2 included in the kinetochore? Wouldn't it be better classified as a MAP as in Fig. 3A? I saw this is actually discussed in the discussion, but maybe this explanation should come earlier.

      We thank the reviewer for pointing this out and will add a short sentence about the MAP function of STU2, and kinetochore localization in other lineages in the introduction.

      • Figure 2A: 'strong similarity'. For a TM score of 0.4 and 0.54, I am not sure I would say 'strong similarity'. Visually, they also look different. TM is also not explained in the legend.

      What we meant by ‘strong similarity’ is that a domain is predicted with a set of secondary structure elements matching the RWD domains in yeast Spc24/Spc25. As for the TM score, a score of ≥ 0.5 has been shown to be a robust indicator of significant fold similarity, which is the case for the comparison of the putative T. thermophila Spc25 ortholog with the yeast ortholog. However, we acknowledge that the T. thermophila Spc24 ortholog shows additional beta sheets compared to its yeast counterpart and has a TM score below 0.5, so we will tone down this statement and remove ‘strong similarity’. We nonetheless maintain that this protein is a Spc24 ortholog with derived properties in its RWD domain.

      Relevant reference on TM score interpretation:

      Xu & Yang 2010 Bioinformatics (https://pmc.ncbi.nlm.nih.gov/articles/PMC2913670/)

      • Fig. 2D: why not PC2? Please explain this somewhere.

      We thank the reviewer for this question. We will add a detailed explanation of the PC selection to the Methods section. In short, PC2 (together with PC1 or PC3) did not reveal any separate cluster/cloud of points surrounding the NDC80-C components (KiTT1-4). Since PC3 did reveal such a cluster, we opted to select PC3.

      • Fig. 3C-D: 'striking similarity', again, it is hard to evaluate whether this is true from the figures and TM values alone (all are >0.5). Either change the phrasing, or explain how much similarity one would expect between homologs.

      Please see our response to the previous question regarding the significance of a TM score of ≥ 0.5.

      • How certain are you that these are all diverged homologs? For example, for KNL1, could another RWD domain-containing protein have evolved to become a kinetochore protein?

      In most cases we consider multiple lines of evidence: AF2/3, HHsearch, and overall protein topology (for RWD kinetochore proteins, a coiled-coil followed by a single or double RWD domain). In the case of SPC24, SPC25 and CSM1, we have clear best hits in both structure and sequence searches. For KNL1 (double RWD), a newer version of our eukaryote-wide ortholog alignment, now usable for HHsearch, also reveals KiTT7 (KNL1) as the best hit. As such, the RWD domain proteins that we uncover are not just any RWD proteins, but specifically those of the kinetochore that are found in other lineages. In addition, very few double RWD proteins are present among eukaryotes, which makes the proposed scenario of homolog replacement for KNL1 unlikely.

      • Fig. 5: why wasn't CNA1 used as a marker of the inner kinetochore or tested?

      The CNA1 antibody gave considerable background (see Figure 1C); we therefore favored the use of the CENP-C/KiTT8 antibody.

      • Fig. 8: There is a time axis below, but I'm not sure what is indicated on this axis. Are the events above mapped on this axis?

      We agree this axis may be confusing. The idea was to show a number of ancestral nodes relevant for the evolutionary events noted in this figure. We will add clear references in the figure to each of these ancestors.

      • L347-349: 'convergent evolution'. Is the loss of the CCAN convergent evolution, or was it already lost in the SAR common ancestor?

      This was indeed convergent evolution. Among Stramenopila, most CCAN subunits can be detected (see for instance van Hooff et al. 2017). In addition, the alveolate ancestor already had the CCAN, as we can clearly detect orthologs in Colponemida. We will add this information to the presence/absence plot in either Figure 1 or the supplementary material (see comment above to Reviewer 1).

      Significance (Required)

      General Assessment: The study is robust, thorough, and well-written. The analyses are technically sound, and the authors avoid overstating their conclusions. Key strengths include the successful identification of diverged components using a "deep homology" pipeline and the functional validation of novel subunits. To improve the study, the data representation could be made more straightforward, and the manuscript structure could be condensed to better highlight the most convincing results. Finally, the claims on the speed of evolution of the kinetochore components need to be better supported.

      Advance: The study provides the first molecular map of a ciliate kinetochore. By uncovering "cryptic" orthologs that escaped previous detection, the work demonstrates that many "missing" complexes in diverse eukaryotes are likely present but highly diverged.

      Audience: This work will interest evolutionary cell biologists studying mitosis and kinetochores (especially those interested in eukaryotic diversity), as well as the ciliate research community. It also serves as a methodological roadmap for researchers using structural homology to identify divergent proteins in other non-model organisms.

      Expertise: My field of expertise includes evolutionary cell biology, kinetochores, centromeres, microbiology, microscopy and phylogenetics.


      Reviewer #3

      Evidence, reproducibility and clarity (Required)

      Kinetochores are protein complexes essential for chromosome segregation in all eukaryotes. Unexpectedly, despite their crucial function, many kinetochore components evolve rapidly, which can hinder their identification based solely on sequence comparisons. In this study, the authors combine experimental and computational analyses to provide insights into the composition of the kinetochore protein complex in the ciliate Tetrahymena thermophila. This study makes an important contribution because kinetochore components in Tetrahymena have not previously been investigated experimentally, and the composition of the Tetrahymena complex was largely unknown.

      Starting with previously identified orthologs of the outer kinetochore proteins Ndc80 and Nuf2, the authors computationally identified the two additional members of the Ndc80 complex, Spc24 and Spc25. All four components were subjected to BioID analyses, leading to the identification of 23 additional candidates, some of which are factors known to be associated with centromeric chromatin in other eukaryotes (condensin, etc.). Focusing on a subset of unknown components, the authors provide experimental support for their kinetochore participation using microscopy and confirm distant homology with several known kinetochore components in other eukaryotes. Four components referred to as KiTT10-13, however, lack detectable orthology to known kinetochore components.

      Relative localization analyses using super-resolution microscopy revealed that KiTT10, 11, and 13 are more proximal to the inner kinetochore component CENP-C, while KiTT12 localizes closer to outer kinetochore components. Remote homology and phylogenetic analyses identify divergent WD40 or SANT domains in KiTT10 and 11, as well as a kinesin motor domain for the outer-kinetochore proximal KiTT12. Finally, RNAi-mediated depletion of KiTT12 demonstrated its requirement for accurate chromosome segregation and Ndc80 localization.

      Overall, I think this manuscript is interesting and makes an important contribution to the field of kinetochore biology. The results of this study, particularly regarding the novel kinetochore components identified, will likely also spark follow-up studies. My major comment concerns the discussion and presentation of the data:

      Major Comments

      At times, the explanation of the homology search appears very technical and would not be accessible to non-experts.

      We thank the reviewer for raising this point. Given that the homology detection approach is an important part of the message of our manuscript, we think it is warranted to keep some technical detail in the Results section. However, we agree that a good deal of detail could easily be moved to a dedicated supplementary section on our homology detection approach. We will rewrite the Results section to make it more accessible to non-experts.

      Moreover, the authors could include more details about their analysis of TurboID data to improve clarity.

      I was initially confused about what "not in original proteome" means in the figure, before understanding that two different proteome versions were used. I think it would be less confusing for the reader if the authors simply mapped their BioID data to the most recent version of the Tetrahymena proteome, which includes both Nnf1 and Csm1. Is it possible that this might also reveal the presence of other components in addition to the two that were specifically targeted?

      We agree that mapping to the most recent proteome annotation will eliminate confusion. We will remap all TurboID datasets to the updated proteome and report whether additional candidates are detected. We will revise the figure legends to clearly explain enrichment categories and annotation differences between proteome versions (or in a supplementary section). So far we have not detected any new proteins in a re-analysis of the MS data for components of the NDC80-C.

      The data presentation in Figure 2 is confusing and requires clarification of the analyses performed. The Figure legend for panel 2C is incomplete. For example, there is no mapping for the character "*" in the legend. The legend can be revised for better clarity. Also, more than 23 proteins are shown in the 2D inset; were those not enriched in the other BioID experiments? It would be helpful to include a legend for these hits as well.

      We will revise Figure 2 and its legend to:

      • Clearly define all symbols (including “*”).
      • Provide a complete legend for enriched hits.
      • Clarify PCA interpretation.
      • Explicitly state how many proteins are included and how they were categorized.

      I would be cautious about using the word comprehensive, as the identification depends on many aspects, including the completeness of the annotated proteome against which the mass spectra are mapped. Even if their BioID experiments always converge on the same set of proteins, factors can still be missing due to annotation issues. In addition, certain components might be refractory to detection by mass spectrometry due to their amino acid composition. Digestion with proteases other than trypsin could, however, identify those.

      We agree that this term overstates completeness. We will revise wording to reflect that our identification is extensive but dependent on proteome annotation and mass spectrometry detectability.

      Figure 4: I guess the result is somewhat expected given the previous inability to identify these components computationally. The distribution of the non-Tetrahymena components might be skewed towards lower sequence divergence, since it does not include orthologs that required experimental approaches for identification. If the authors agree, this could be added to the discussion.

      We agree with the reviewer that Figure 4 merely showcases why we could not detect these kinetochore orthologs in the first place. In our present analysis we did not include orthologs of species with previously shown ‘difficult-to-detect’ orthologs. We will add discussion acknowledging that detectable homologs in other species may be biased toward less divergent sequences and that experimental identification may reveal additional highly diverged components elsewhere.

      The telophase-specific localization of TTHERM_00932010 is interesting. Although the paper focuses on the structural composition of kinetochores, it would be useful if the authors included more details about this protein.

      We will expand the description of TTHERM_00932010 to provide additional contextual information regarding domain architecture, expression timing, and potential functional implications. Of note, we cannot detect any orthologs of this protein outside Tetrahymena spp.

      What is the function of kinesin-6, known roles with respect to chromosome segregation in other species?

      We already discuss the role of kinesin-6 in chromosome segregation in other species in the discussion section at L355-356 (bioRxiv v1). We will expand this section and add two more sentences on diverse functions of this family in eukaryotes.

      Perhaps MadBub localization is more apparent in the presence of unattached kinetochores? In that scenario, it would be useful if the authors knock down KiTT12 and test whether they can localize MadBub.

      We agree this is an interesting possibility. However, systematic spindle perturbation experiments fall outside the primary scope of this structural study. We will clarify this limitation and discuss it as a direction for future work.

      Minor comments

      It would be useful if the authors added either the expression of all genes or of known constitutive genes as a background profile to Figure 2E, so that the reader can evaluate the G2/M-specific increase in expression of BioID hits.

      The data have been taken from [Bertagna et al. 2025, Bioinformatics], and the expression profiles of all other proteins are provided for inspection by the reviewer/reader (see Table S3). The representation requested by the reviewer can thus be found in Bertagna et al. 2025. To provide a further overview, we will add a supplementary figure showing the expression profiles of proteins peaking in each cell cycle phase, including an overview of those peaking in G2/M.

      What is TTHERM_0046753? One of the identified unknown hits? It is also not part of Figure 2E unless this is a typo and the correct identifier should be 00467535?

      The reviewer is correct that this is a typo on our end, for which we apologise. The correct identifier should be 00467535.

      Why are 29 expression profiles shown in 2E, but only 27 proteins mentioned in the text (23 BioID hits plus the four Ndc80 complex components)? Or did the authors instead identify 25 specific BioID hits that were further classified into the different categories? Rewriting this section would likely help the reader better understand the analyses of the PCA data.

      We agree that this section could be improved. We will clarify the number of proteins included in the PCA and expression analyses and revise the relevant section accordingly.

      Significance (Required)

      This study highlights the importance of non-model organisms, such as ciliates, in understanding the evolution of the chromosome segregation machinery. Studies on such organisms would shed light on the evolutionary aspects of kinetochore biology.

    1. When looking at who contributes in crowdsourcing systems, or on social media in general, we almost always find that we can split the users into a small group of power users who make the majority of the contributions, and a very large group of lurkers who contribute little to nothing. For example, Nearly All of Wikipedia Is Written By Just 1 Percent of Its Editors, and on StackOverflow “A 2013 study has found that 75% of users only ask one question, 65% only answer one question, and only 8% of users answer more than 5 questions.” We see the same phenomenon on Twitter:

      [Fig. 16.3: Summary of Twitter use by Pew Research Center]

      This small percentage of people doing most of the work in some areas is not a new phenomenon. In many aspects of our lives, some tasks have been done by a small group of people with specialization or resources. Their work is then shared with others. This goes back many thousands of years with activities such as collecting obsidian and making jewelry, to more modern activities like writing books, building cars, reporting on news, and making movies.

      When it comes to lurkers, I think the best approach is not to credit them when a project or other work requires crediting the people who contributed to it. It may not be very effective, but for now I think it is the best way to avoid lurkers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This interesting manuscript from the Carrington lab investigates the behavior of single vs. double GPI-anchored nutrient receptors in bloodstream form (BSF) T. brucei. These include the transferrin receptor (TfR), the HpHb receptor (HpHbR), and the factor H receptor (FHR). The central question is why these critical proteins are not targeted by host-acquired immunity. It has generally been thought that they are sequestered in the flagellar pocket (FP), where they are subject to rapid endocytosis, so any Ab:receptor complexes would be rapidly removed from the cell surface. This manuscript challenges that assumption by showing that these receptors can be found all over the outer cell body and flagellar surfaces if one looks in an appropriate manner (rapid direct fixation in culture media).

      The main part of the manuscript focuses on TfR, typically a GPI1 heterodimer of very similar E6 (GPI anchored) and E7 (truncated, no GPI) subunits. These are expressed coordinately from 15 telomeric expression sites (BES), of which only one can be transcribed at a time. The authors identify a native E6:E7 pair in BES7 in which E7 is not truncated and therefore forms a GPI2 heterodimer. By in situ genetic manipulation, they generate two different sets of GPI1:GPI2 TfR combinations expressed from two different BESs (BES1 and BES7). Comparative analyses of these receptors form the bulk of the data.

      The main findings are:

      (1) Both GPI1 and GPI2 TfR can be found on the cell body/flagellar surface.

      (2) Both are functional for Tf binding and uptake.

      (3) GPI2 TfR is expressed at ~1.5x relative to GPI1 TfR.

      (4) Ultimate TfR expression level (protein) is dependent on the BES from which it is expressed.

      Most of these results are quite reasonably explained in light of the hydrodynamic flow model of the Engstler lab and the GPI valence model of the Bangs lab. Additional experiments, again by rapid fixation, with HpHbR and FHR, show that these GPI1 receptors can also be seen on the cell surface, in contrast to published localizations.

      It is quite interesting that the authors have identified a native GPI2 TfR. However, essentially all of the data with GPI2 TfR are confirmatory for the prior, more detailed studies of Tiengwe et al. (2017). That said, the suggestion that GPI2 was the ancestral state makes good evolutionary sense, and raises the question of why trypanosomes prefer GPI1 TfR in 14 of 15 ESs (i.e., what is the selection pressure?).

      Strengths and weaknesses:

      (1) BES7 TfR subunit genes (BES7_Tb427v10): There are actually three (in 5'→3' order): E7gpi, E6.1, and E6.2. E6.1 and E6.2 have a single nucleotide difference. This raises the issue of coordinate expression. If overall levels of E6 (2 genes) are not down-regulated to match E7 (1 gene), this will result in a 2x excess of E6 subunits. The most likely fate of these is the formation of non-functional GPI2 homodimers on the cell surface, as shown in Tiengwe et al. (2017), which will contribute to the elevated TfR expression seen in BES7.

      We would like to thank the reviewer for pointing out that there are two ESAG6 genes in BES7; we had relied on the publicly available annotation and should have known better.

      For transferrin expression levels, see the discussion in response to Reviewer #1, point 3, below.

      (2) Surface binding studies: This is the most puzzling aspect of the entire manuscript. That surface GPI2 TfR should be functional for Tf binding and uptake is not surprising, as this has already been shown by Tiengwe et al. (2017), but the methodology for this assay raises important questions. First, labeled Tf is added at 500 nM to live cells in complete media containing 2.5 µM unlabeled Tf, a 5-fold excess. It is difficult to see how significant binding of labeled Tf could occur in as little as 15 seconds under these conditions.

      The k_on for transferrin is very rapid (BES1 TfR/bovine transferrin at pH 7.4: 4.5 × 10⁵ M⁻¹ s⁻¹; Trevor et al., 2019), and binding to unoccupied receptors would occur within 15 s. The k_off is also fast (BES1 TfR/bovine transferrin at pH 7.4: 3.6 × 10⁻² s⁻¹; Trevor et al., 2019), so transferrin would exchange within the time taken for endocytosis. These values were measured in vitro with purified proteins; the in vivo values may be affected by the VSG coat.
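      As a rough check of the binding-time argument, the in vitro constants from Trevor et al. (2019) give K_D = k_off/k_on and, assuming pseudo-first-order conditions (transferrin, about 3 µM in total, in large excess over surface receptors), an approach to equilibrium with k_obs = k_on[Tf] + k_off. This minimal sketch rests on those assumptions and ignores the VSG coat, endocytosis, and ligand depletion; the function names are illustrative only.

```python
import math

# In vitro constants for BES1 TfR / bovine transferrin at pH 7.4 (Trevor et al., 2019)
K_ON = 4.5e5    # association rate constant, M^-1 s^-1
K_OFF = 3.6e-2  # dissociation rate constant, s^-1


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D = k_off / k_on, in molar."""
    return k_off / k_on


def observed_rate(k_on, k_off, ligand_molar):
    """Pseudo-first-order relaxation rate towards equilibrium: k_on*[L] + k_off."""
    return k_on * ligand_molar + k_off


def half_time(rate):
    """Half-time of a first-order process with the given rate constant (s^-1)."""
    return math.log(2) / rate


# Assumed total transferrin: 2.5 uM unlabelled (medium) + 0.5 uM labelled probe
total_tf = 3.0e-6

kd = dissociation_constant(K_ON, K_OFF)
k_obs = observed_rate(K_ON, K_OFF, total_tf)

print(f"K_D = {kd * 1e9:.0f} nM")
print(f"binding half-time at 3 uM Tf = {half_time(k_obs):.2f} s")
print(f"fraction of equilibrium reached after 15 s = {1 - math.exp(-k_obs * 15):.6f}")
print(f"dissociation half-life = {half_time(K_OFF):.1f} s")
```

      Under these assumptions, binding to a free receptor has a half-time well under a second, while a bound transferrin has a half-life of roughly 20 s, which is consistent with both rapid labelling within 15 s and exchange on the timescale of endocytosis.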

      The failure to bind canine transferrin (Supp. Figure 4B) acts as a control for specificity of the interaction.

      We have now performed a competition experiment as an additional control: cells in culture were supplemented with A, 0.5 µM labelled transferrin; B, 0.5 µM labelled and 2.5 µM unlabelled transferrin; or C, 0.5 µM labelled and 5 µM unlabelled transferrin, then fixed after 60 s and visualised by fluorescence microscopy (Figure S4C). Competition was effective: binding of labelled transferrin was greatly reduced in the presence of a 10-fold excess of unlabelled transferrin. We would like to thank the reviewer for suggesting this experiment.

      Second, Tiengwe et al. (2017) found that trypanosomes taken directly from culture could not bind labeled Tf in direct surface labelling experiments. To achieve binding, it was necessary to first culture cells in serum-free media for a sufficient time to allow new unligated TfR to be synthesized and transported to the surface. This result suggests that essentially all surface TfR is normally ligated and unavailable to the added probe.

      As part of the preliminary experiments for this paper, we found that centrifugation followed by resuspension in either complete or serum-free (but 1% BSA) medium resulted in a reduction in total cellular TfR, as determined by western blotting. We have now included this experiment (Figure S4D). The inference from this experiment is that centrifugation and subsequent incubation affect receptor detection and endocytosis rates for a discrete time period.

      The amount of labelled transferrin that binds to cells in culture will depend on the specific activity of the labelled transferrin. This reasoning lay behind the use of 0.5 µM labelled transferrin: at this concentration, roughly 1 in 6 transferrin molecules in the culture medium are labelled, and the overall transferrin concentration changes only slightly.
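      The specific-activity argument can be checked with simple arithmetic. This sketch assumes the 2.5 µM unlabelled transferrin in complete medium quoted by the reviewer and an equilibrium K_D of about 80 nM, derived as k_off/k_on from the in vitro constants of Trevor et al. (2019); the helper names are hypothetical and the occupancy formula is the standard single-site equilibrium expression.

```python
def labelled_fraction(labelled_uM, unlabelled_uM):
    """Fraction of transferrin molecules in the medium that carry the label."""
    return labelled_uM / (labelled_uM + unlabelled_uM)


def occupancy(ligand_uM, kd_uM):
    """Equilibrium fraction of receptors bound for a single-site model: [L] / ([L] + K_D)."""
    return ligand_uM / (ligand_uM + kd_uM)


KD_UM = 0.08      # assumed K_D = 3.6e-2 / 4.5e5 M = 80 nM (in vitro value)
medium_tf = 2.5   # uM unlabelled transferrin in complete medium (assumed)
probe = 0.5       # uM labelled transferrin added

frac = labelled_fraction(probe, medium_tf)        # 0.5 / 3.0, i.e. 1 in 6
occ_before = occupancy(medium_tf, KD_UM)          # receptor occupancy before adding probe
occ_after = occupancy(medium_tf + probe, KD_UM)   # occupancy after adding probe

print(f"labelled fraction = {frac:.3f}")
print(f"total Tf increase = {100 * probe / medium_tf:.0f}%")
print(f"occupancy {occ_before:.3f} -> {occ_after:.3f}")
```

      Because both concentrations are far above K_D, a 20% rise in total transferrin barely changes equilibrium receptor occupancy; the main effect of the probe is that about one receptor-binding event in six involves a labelled molecule.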

      Third, the authors have themselves argued previously, based on binding affinities, that all surface-exposed TfR is likely ligated in a natural setting (DOI:10.1002/bies.202400053). Could the observed binding actually be non-specific due to the high levels of fixative used?

      The absence of binding/uptake of canine transferrin argues against a non-specific interaction. In our previous publication, we did not pay enough attention to the on and off rates, which allow a degree of exchange; here, TfR newly appearing on the cell surface has a 1 in 6 chance of binding a labelled transferrin.

      (3) Variable TfR expression in different BESs: It appears that native TfR is expressed at higher levels from BES7 compared to BES1, and even more so when compared to BES3. This raises the possibility that the anti-TfR used in these experiments has differential reactivity with the three sets of TfRs. The authors discount this possibility due to the overall high sequence similarities of E6s and E7s from the various ESs. However, their own analyses show that the BES1, BES3, and BES7 TfRs are relatively distal to each other in the phylogenetic trees, and this Reviewer strongly suspects that the apparent difference in expression is due to differential reactivity with the anti-TfR used in this work. In the grand scheme, this is a minor issue that does not impact the other major conclusions concerning TfR localization and function, nor the behavior of HpHbR and FHR. However, the authors make very strong conclusions about the role of BESs in TfR expression levels, even claiming that it is the 'dominant determinant' (line 189).

      This point is valid but exceptionally difficult to address at the protein level. As an orthogonal approach, we performed RNAseq analysis of the ‘wild type’ BES1, BES3, and BES7 cell lines to determine whether differences in receptor mRNA levels were consistent with the proposed differences in protein levels (Table S1). Total ESAG6/7 mRNA levels varied in the same order as the protein estimates (BES3 < BES1 < BES7), supporting the differences in protein levels.

      The strongest evidence for the expression site determining the TfR level is the comparison of the cell lines in which the VSGs were exchanged. This had no effect on TfR levels, so there is no evidence that the identity of the VSG alters TfR expression.

      (4) Surface immuno-localization of receptors: These experiments are compelling and useful to the field. To explain the difference with essentially all prior studies, the authors suggest that typical fixation procedures allow for clearance of receptor:ligand complexes by hydrodynamic flow due to extended manipulation prior to fixation (washing steps). Despite the fact that these protocols typically involve ice-cold physiological buffers that minimize membrane mobility, this is a reasonable possibility. Have the authors challenged their hypothesis by testing more typical protocols themselves? Other contributing factors that could play a role are the use of deconvolution, which tends to minimize weak signals, and also the fact that investigators tend to discount weak surface signals as background relative to stronger internal signals.

      We have added preliminary experiments that compared fixation protocols in two parts: first, the effect of washing and resuspending cells on TfR levels, discussed above (Figure S4D); and second, how different fixation protocols alter apparent TfR immunolocalisation (Figure S5A-B). The comparison shows that both the absence of glutaraldehyde and the use of washing steps alter the outcome.

      (5) Shedding: A central aspect of the GPI valence model (Schwartz et al., 2005, Tiengwe et al., 2017) is that GPI1 reporters that reach the cell body surface are shed into the media because a single dimyristoylglycerol-containing GPI anchor does not stably associate with biological membranes. As the authors point out, this is a major factor contributing to higher steady-state levels of cell-associated GPI2 TfR relative to GPI1 TfR. Those studies also found that the size/complexity of the attached protein correlated inversely with shedding, suggesting exit from the flagellar pocket as a restricting factor in cell body surface localization. The amount of newly synthesized TfR shed into the media was ~5%, indicating that very little actually exits the FP to the outer surface. In this regard, is it possible to know the overall ratio of cell surface:FP:endosomal localized receptors? Could these data not be 'harvested' from the 3D structural illumination imaging?

      A ratio could be determined, but we did not do this, as it would only be valid if the antibody had equal access to the internal TfR in a diluted VSG environment and to the external TfR embedded in a densely packed and cross-linked VSG layer. As such, we would have no confidence in the accuracy of any estimate.

      Reviewer #2 (Public review):

      The work has significant implications for understanding immune evasion and nutrient uptake mechanisms in trypanosomes.

      While the experimental rigor is commendable, revisions are needed to clarify methodological limitations and to broaden the discussion of functional consequences.

      The authors argue that prior studies missed surface-localized TfR due to harsh washing/fixation (e.g., methanol). While this is plausible, additional evidence would strengthen the claim.

      Preliminary experiments that compared fixation protocols are now included to show that the fixation method affects the outcome.

      It remains unclear how centrifugation steps of various lengths (as in previous publications) can equally and quantitatively redistribute TfR into the flagellar pocket. If this were the case, it should be straightforward for the authors to test this experimentally.

      We are not aware of previous studies that demonstrate equal and quantitative redistribution to the flagellar pocket. In previous reports, cell surface/flagellar pocket localisation varies with expression level (e.g. Mussmann et al., 2003, 2004); notably, the increase in TfR expression in these papers is similar to the difference between the cell lines used here. In addition, most reports show TfR in endosomal compartments. In the experiments here, there are cells in which most of the signal from labelled transferrin is in the flagellar pocket; our argument is that this is one stage of a continuous process in which the receptor picks up a transferrin on the cell surface and is swept towards the pocket.

      If TfR is distributed over the cell surface, live-cell imaging with fluorescent transferrin should be performed as a control. Modern detection limits now reach the single-molecule level, and transient immobilization of live trypanosomes has been established, which would exclude hydrodynamic surface clearance as a confounding factor.

      This is non-trivial and is a longer-term aim. The immobilisation involves significant manipulation of the cells prior to restraining.

      In most images, TfR is not evenly distributed on the surface but rather appears punctate. Could this reflect localization to membrane domains? Immuno-EM with high-pressure frozen parasites could resolve this question and is relatively straightforward.

      There is a non-uniform appearance in the super-resolution images for both TfR and FHR. We cannot distinguish whether this represents random variation in receptor density over the cell surface or results from a biological phenomenon. Whatever the cause, the experiments showed unambiguous cell surface localisation.

      The authors might consider discussing whether differences in parasite life cycle stages (procyclic versus bloodstream forms) or culture conditions (e.g., cell density) affect localization. The developmentally regulated retention of GPI-anchored procyclin in the flagellar pocket might be worth mentioning.

      The aim of this paper was to determine the localisation of receptors in proliferating bloodstream form trypanosomes in culture. TfR and HpHbR are not expressed in insect stages in culture. FHR is expressed in insect stages and is present all over the cell surface (Macleod et al., 2020). A procyclin-based reporter was distributed over the whole cell surface in one report (Schwartz et al., 2005). The retention of procyclin in the flagellar pocket of proliferating bloodstream forms seen in other reports probably depends on its structure/sequence, as other single GPI-anchored proteins, such as FHR (Macleod et al., 2020) and GPI-anchored sfGFP (Martos-Esteban et al., 2022), can access the surface.

      References:

      MacGregor, P., Gonzalez-Munoz, A. L., Jobe, F., Taylor, M. C., Rust, S., Sandercock, A. M., Macleod, O. J. S., Van Bocxlaer, K., Francisco, A. F., D’Hooge, F., Tiberghien, A., Barry, C. S., Howard, P., Higgins, M. K., Vaughan, T. J., Minter, R., & Carrington, M. (2019). A single dose of antibody-drug conjugate cures a stage 1 model of African trypanosomiasis. PLoS Neglected Tropical Diseases, 13(5), e0007373. https://doi.org/10.1371/journal.pntd.0007373

      Macleod, O. J. S., Bart, J.-M., MacGregor, P., Peacock, L., Savill, N. J., Hester, S., Ravel, S., Sunter, J. D., Trevor, C., Rust, S., Vaughan, T. J., Minter, R., Mohammed, S., Gibson, W., Taylor, M. C., Higgins, M. K., & Carrington, M. (2020). A receptor for the complement regulator factor H increases transmission of trypanosomes to tsetse flies. Nature Communications, 11(1), 1326. https://doi.org/10.1038/s41467-020-15125-y

      Martos-Esteban, A., Macleod, O. J. S., Maudlin, I., Kalogeropoulos, K., Jürgensen, J. A., Carrington, M., & Laustsen, A. H. (2022). Black-necked spitting cobra (Naja nigricollis) phospholipases A2 may cause Trypanosoma brucei death by blocking endocytosis through the flagellar pocket. Scientific Reports, 12(1), 6394. https://doi.org/10.1038/s41598-022-10091-5

      Mussmann, R., Engstler, M., Gerrits, H., Kieft, R., Toaldo, C. B., Onderwater, J., Koerten, H., van Luenen, H. G. A. M., & Borst, P. (2004). Factors affecting the level and localization of the transferrin receptor in Trypanosoma brucei. The Journal of Biological Chemistry, 279(39), 40690–40698. https://doi.org/10.1074/jbc.M404697200

      Mussmann, R., Janssen, H., Calafat, J., Engstler, M., Ansorge, I., Clayton, C., & Borst, P. (2003). The expression level determines the surface distribution of the transferrin receptor in Trypanosoma brucei. Molecular Microbiology, 47(1), 23–35. https://doi.org/10.1046/j.1365-2958.2003.03245.x

      Schwartz, K. J., Peck, R. F., Tazeh, N. N., & Bangs, J. D. (2005). GPI valence and the fate of secretory membrane proteins in African trypanosomes. Journal of Cell Science, 118(Pt 23), 5499–5511. https://doi.org/10.1242/jcs.02667

      Trevor, C. E., Gonzalez-Munoz, A. L., Macleod, O. J. S., Woodcock, P. G., Rust, S., Vaughan, T. J., Garman, E. F., Minter, R., Carrington, M., & Higgins, M. K. (2019). Structure of the trypanosome transferrin receptor reveals mechanisms of ligand recognition and immune evasion. Nature Microbiology, 4(12), 2074–2081. https://doi.org/10.1038/s41564-019-0589-0

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Recommendations:

      (1) Two E6 genes in BES7: This does not affect the overall conclusions, but the text should be modified to reflect the existence of the second gene, and to discuss the ramifications.

      This has been corrected

      (2) Surface binding studies: To clarify this issue, two experimental approaches are strongly recommended. First: additional excess unlabelled Tf should be added. If binding is truly receptor-mediated, it must by definition be saturable at some experimentally achievable level. Second: TfR expression should be abrogated by RNAi silencing to show that binding is TfR-dependent. Without some validation of specific binding by one or both of these approaches, these counter-intuitive results must be questioned.

      The excess unlabelled transferrin experiment is now included (we would like to thank the reviewer for this suggestion). The absence of binding of canine transferrin provides strong evidence for the specificity.

      (3) Variable TfR expression in different BESs: To make such claims, quantitative RT-PCR should be performed with conserved primers to assess the actual relative expression at the transcriptional level. Absent this, the claims should be eliminated, or at the very least greatly tempered.

      This has been done using an RNAseq analysis.

      (4) Surface immuno-localization of receptors: An example of discounting weak signals as background can be seen in Figure 8 of Duncan et al. (2024). It has also been shown that at least one other GPI1 reporter (procyclin) is readily detected on the outer cell surface under ectopic expression in BSF trypanosomes (Schwartz et al., 2005) using typical fixation procedures. This could be cited, and the authors could discuss the fact that procyclin is not a receptor and may not be susceptible to hydrodynamic drag.

      Yes

      Minor issues:

      (1) Fully appreciating the data presented requires an understanding of the hydrodynamic flow and GPI valence models of the Engstler and Bangs labs, respectively. For the uninitiated, it might be useful to include brief summaries of each in the Introduction.

      Added to the introduction

      (2) Lines 110-112: ISG65 and ISG75 both have strong localizations in endosomal compartments. This should be noted with citation of any of the work from the Field lab.

      Added

      (3) Lines 121-132: This passage presents the role of GPI anchors (1 vs 2) in a rather digital manner (in or out). Schwartz et al. (2005) present a much more nuanced view of what is likely taking place. This is one reason summaries of hydrodynamic flow and GPI valence would be helpful.

      Modified

      (4) Lines 182-184: The increased size of GPI-anchored E7 is in part due to the presence of the GPI itself, as the authors state, but there are also 24 additional amino acid residues in this protein that contribute.

      Modified

      (5) Lines 212-214: Do p>0.95 and p>0.99 indicate statistical significance? This must be a typo.

      Thank you, corrected

      (6) Lines 218-219: The better references documenting GPI number in regard to turnover/shedding are Schwartz et al. 2005 and Tiengwe et al. 2017.

      Changed

      (7) Line 241 and Figures 3, 4, and 6: The transverse sections add little to the presentation. That there is signal variation in all dimensions is readily apparent from the images themselves, and similar profiles would be obtained regardless of the transect. Was there some process/rationale in the selection of the individual transects intended to make a broader point? If so, a description of the process should be provided.

      The point was to show that the signal had a pattern consistent with plasma membrane (two distal peaks) as opposed to cytoplasm (single central peak). As such, we think it is important.
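      The authors' criterion (two distal peaks for a plasma-membrane signal vs a single central peak for a cytoplasmic one) can be stated as an explicit rule. The sketch below is hypothetical and is not the authors' analysis: the function names, the thresholds (peaks must reach 50% of the maximum; "distal" means within the outer 30% of the transect), and the synthetic profiles are all assumptions chosen for illustration.

```python
def local_maxima(profile, min_height):
    """Indices of interior points that rise above the left neighbour,
    are not below the right neighbour, and reach min_height."""
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
        and profile[i] >= min_height
    ]


def classify_transect(profile, rel_threshold=0.5, edge_fraction=0.3):
    """Label an intensity profile across a cell as membrane-like, cytoplasm-like or ambiguous."""
    peaks = local_maxima(profile, rel_threshold * max(profile))
    last = len(profile) - 1
    edge = edge_fraction * last  # width of each "distal" zone, in samples
    if len(peaks) == 2 and peaks[0] <= edge and peaks[1] >= last - edge:
        return "membrane-like"   # two peaks near opposite edges of the cell
    if len(peaks) == 1 and edge < peaks[0] < last - edge:
        return "cytoplasm-like"  # one peak in the middle of the cell
    return "ambiguous"


# Synthetic transects (arbitrary units), for illustration only
membrane = [0, 2, 8, 3, 2, 2, 2, 3, 8, 2, 0]
cytoplasm = [0, 1, 2, 4, 6, 8, 6, 4, 2, 1, 0]

for name, profile in [("membrane", membrane), ("cytoplasm", cytoplasm)]:
    print(name, classify_transect(profile))
```

      Real profiles are noisy, so in practice some smoothing and a more careful peak criterion would likely be needed; the point here is only to make the two-peak versus one-peak distinction concrete.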

      (8) Lines 582-596: Methodology for quantitation of cellular fluorescent signals should be provided.

      Has been expanded

      Reviewer #2 (Recommendations for the authors):

      (1) As a less critical but still useful control, antibody accessibility assays on live versus fixed parasites could test whether VSG coats limit detection.

      This could only be quantified by using a range of monoclonal antibodies which are not available.

      (2) The rapid transferrin uptake (15-60 seconds) could reflect fast endocytic recycling rather than stable surface residency. A pulse-chase experiment tracking receptor movement would clarify this (though I acknowledge that this is technically challenging).

      We agree that endocytic recycling is probably the main source of unoccupied TfR on the cell surface. It is hard to see how the pulse chase experiment could be performed without centrifugation which will affect the outcome – see above.

      (3) Statistical and quantitative reporting

      Added as Tables S2-S4

      (4) Report confidence intervals (e.g., for fluorescence intensity comparisons in Figure 3B) to contextualize claims of "no significant difference."

      We do not claim ‘no significant difference’; the SDs overlap because of the high level of variation within the population.
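      The exchange turns on the difference between a standard deviation, which describes cell-to-cell spread, and a confidence interval for the mean, which shrinks as more cells are measured. The following sketch illustrates that distinction using Python's standard `statistics` module. The intensity values are invented for illustration and are not data from the study. The interval uses a normal approximation; for small numbers of cells, a Student's t quantile would be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def summarise(values, level=0.95):
    """Return the mean, the sample SD and a normal-approximation CI for the mean."""
    m = mean(values)
    sd = stdev(values)                           # spread of individual cells
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    half_width = z * sd / math.sqrt(len(values)) # uncertainty of the mean itself
    return m, sd, (m - half_width, m + half_width)


# Hypothetical per-cell fluorescence intensities (arbitrary units), illustration only
cells = [80, 120, 95, 130, 70, 110, 100, 90, 125, 85]
m, sd, (ci_low, ci_high) = summarise(cells)
print(f"mean={m:.1f}  SD={sd:.1f}  95% CI=({ci_low:.1f}, {ci_high:.1f})")
```

      Because the CI half-width scales with 1/sqrt(n), two populations can have overlapping SDs while their means are still estimated precisely. For this reason, SD overlap alone is not evidence of "no difference".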

      (5) Specify the number of biological replicates and cells analyzed per condition in the figure legends.

      Added

      (6) The study notes that surface-exposed receptors avoid antibody detection, but does not explore how.

      We don’t claim that receptors avoid detection and have published evidence to the contrary. The cell has evolved mechanisms to reduce/minimise the effect of antibody binding.

      (7) Comparing antibody binding to TfR in VSG221 versus VSG224 coats.

      This is already present in Figure 3D

      (8) Testing whether receptor shedding or conformational masking contributes to immune evasion.

      A lifetime’s work

      (9) Evolutionary trade-offs: Discuss why T. brucei maintains ~15 TfR variants if the GPI-anchor number has minimal impact on function (Figure 3).

      The possible reason for the evolution of ~15 TfR variants was discussed in a previous publication.

      (10) How do their findings align with recent studies on ISG75 surface exposure?

      If this refers to the finding that ISG75 is an Ig Fc receptor, this has been included

      (11) Add scale bars to 3D reconstructions (Figure 5).

      Added

      (12) Include a schematic summarizing key findings in the main text.

      Chosen not to do

      (13) Explicitly state where raw microscopy images, flow cytometry data, and analysis scripts are deposited.

      Microscope images have been deposited in the BioImage Archive repository at EMBL-EBI. No flow cytometry was used.

      (14) Correct inconsistent GPI-anchor terminology (e.g., "glycosylphosphoinositol" to "glycosylphosphatidylinositol").

      Our typo, corrected

      (15) Clarify ambiguous phrases (e.g., "subtle mechanisms" in the Discussion).

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely appreciate your constructive feedback. Based on the comments from the three reviewers, we were able to substantially improve the manuscript. Below, we provide our point-by-point responses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examined the functional organization of the mouse posterior parietal cortex (PPC) using meso-scale two-photon calcium imaging during visually-guided and history-guided tasks. The researchers found distinct functional modules within the medial PPC: area A, which integrates somatosensory and choice information, and area AM, which integrates visual and choice information. Area A also showed a robust representation of choice history and posture. The study further revealed distinct patterns of inter-area correlations for A and AM, suggesting different roles in cortical communication. These findings shed light on the functional architecture of the mouse PPC and its involvement in various sensorimotor and cognitive functions.

      Strengths:

      Overall, I find this manuscript excellent. It is very clearly written and built up logically. The subject is important, and the data support the conclusions without overstating implications. Where the manuscript shines the most is the exceptionally thorough analysis of the data. The authors set a high bar for identifying the boundaries of the PPC subareas, where they combine both somatosensory and visual intrinsic imaging. There are many things to compliment the authors on, but one thing that should be applauded in particular is the analysis of the body movements of the mice in the tube. Anyone working with head-fixed mice knows that mice don't sit still, but this almost invariably remains unanalyzed. Here the authors show that this indeed explained some of the variance in the data.

      Weaknesses:

      I see no major weaknesses and I only have minor comments.

      Reviewer #2 (Public review):

      Summary:

      The posterior parietal cortex (PPC) has been identified as an integrator of multiple sensory streams and guides decision-making. Hira et al. observe that dissection of the functional specialization of PPC subregions requires simultaneous measurement of neuronal activity throughout these areas. To this end, they use wide-field calcium imaging to capture the activity of thousands of neurons across the PPC and surrounding areas. They begin by delineating the boundaries between the primary sensory and higher visual areas using intrinsic imaging and validate their mapping using calcium imaging. They then conduct imaging during a visually guided task to identify neurons that respond selectively to visual stimuli or choices. They find that vision and choice neurons intermingle primarily in the anterior medial (AM) area, and that AM uniquely encodes information regarding both the visual stimulus and the previous choice, positioning AM as the main site of integration of behavioral and visual information for this task.

      Strengths:

      There is an enormous amount of data and results reveal very interesting relationships between stimulus and choice coding across areas and how network dynamics relate to task coding.

      Weaknesses:

      The enormity of the data and the complexity of the analysis make the manuscript hard to follow. Sometimes it reads like a laundry list of results as opposed to a cohesive story.

      Reviewer #3 (Public review):

      Summary:

      This work from Hira et al. leverages mesoscopic 2-photon imaging to study large neural populations in different higher visual areas, in particular areas A and AM of the parietal cortex. The focus of the study is to obtain a better understanding of the representation of different task-related parameters, such as choice formation and short-term history, as well as visual responses in large neural populations across different cortical regions to obtain a better understanding of the functional specialization of neural populations in each region as well as the interaction of neural populations across regions. The authors image a large number of neurons in animals that either perform visual discrimination or a history-dependent task to test how task demands affect neural responses and population dynamics. Furthermore, by including a behavioral perturbation of animal posture they aim to dissociate the neural representation of history signals from body posture. Lastly, they relate their functional findings to anatomical data from the Allen connectivity atlas and show a strong relation between functional correlations and anatomical connectivity patterns.

      Strengths:

      Overall, the study is very well done and tackles a problem that should be of high interest to the field by aiming to obtain a better understanding of the function and spatial structure of different regions in the parietal cortex. The experimental approach and analyses are sound and of high quality and the main conclusions are well supported by the results. Aside from the detailed analyses, a particular strength is the additional experimental perturbation of posture to isolate history-related activity which supports the conclusion that both posture and history signals are represented in different neurons within the same region.

      Weaknesses:

      The main point that I found hard to understand was the fairly strong language on functional clusters of neurons while also stating that neurons encoded combinations of different types of information and leveraging the encoding model to dissociate these contributions. Do the authors find mixed selectivity or rather functional segregation of neural tuning in their data? More details on this and some other points are below.

      We thank the three reviewers for their accurate and expert evaluations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It wasn't clear to me why the authors focused on areas A and AM, but not RL. After all, at the beginning of the results, the authors ask: "PPC has been reported to have functions including visually guided decision-making and working memory. Do these functions differ among RL, A, and AM?".

      Thank you for the comment. The manuscript first characterizes AM as a region involved in visually guided decision-making and A as a region related to history and/or working memory. Subsequently, when discussing correlation structure, we stated the following:

      “In particular, based on the critical functional differences between A and AM that we found, A and AM may belong to distinct cortical networks that consist of different sets of densely interacting cortical areas.”

      Thus, the logical flow of our analysis is to first reveal the functional contrast between A and AM through comparative functional analyses across RL, A, and AM, and then to focus on this contrast. We speculate that RL may exhibit more distinctive functional properties in tasks that rely on whisker-based processing or related modalities. We have therefore revised the text as described below to avoid the impression that the manuscript places disproportionate emphasis on RL.

      Line 137: “PPC has been reported to have functions including visually guided decision-making and working memory. Do these functions differ among A, AM, and RL?”

      (2) Figures 2 E, F, and Figure 3A, could the authors indicate the trial structure better on these plots?

      Thank you for the comment. We have added explanations of the bar meanings to the figure legends.

      Figure 2:

      “(E) Representative vision neurons (ROI 1-4 in I). The red bars indicate sampling periods during video presentation, and the brown bars indicate sampling periods without video stimulation. Vertical black lines mark the onset of the sampling period. F. Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9). Light blue lines indicate the response periods in trials with left choices, and purple lines indicate the response periods in trials with right choices. Vertical black lines mark the onset of the response period.”

      Figure 3:

      “(A) Representative history neurons. Numbers correspond to those in panels B and C. Light blue lines indicate rewards delivered from the left lick port, and purple lines indicate rewards delivered from the right lick port. Vertical white lines mark the onset of the sampling period.”

      (3) There are several typos that need correcting. Also, lowercase and uppercase letters marking the panel names in the legends are mixed.

      Thank you for the comment. We have corrected the panel labels as described below.

      Figure 2 legend:

      “Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9)”

      Figure 3 legend:

      “..than the next choice. I. The decoding accuracy of the next choice …”

      Figure 3 legend:

      “Error bars, mean ± s.e.m. in I, 95% confidence interval in G, M, and O.”

      Supplementary Figure 6:

      “…neurons with r<sub>t</sub> ≥ 0.3 (blue) were shown. B. Trial-to-trial activity fluctuation … (r<sub>t</sub> ≥ 0.3, panel B) was color coded…”

      We thoroughly checked the manuscript for typographical errors and corrected the issues.

      (4) Many in the field still use the Paxinos nomenclature for PPC subfields, could the authors write something short about how these two nomenclatures correspond?

      We have described the relationship between our area definitions and those of Paxinos in the main text as follows.

      Line 702: “In addition to our definition, previous studies have also defined posterior parietal cortex (PPC) to include the higher visual areas A, AM, and RL (Glickfeld and Olsen, 2017; Wang et al., 2011). These areas partially overlap with the parietal association regions defined in the Paxinos atlas, including MPtA, LPtA, PtPD, and PtPR. For a detailed discussion of the correspondence and variability among these regional definitions, see Lyamzin and Benucci (2019).”

      (5) Analyzing choice history may be affected by the long fluorescence Ca transients and will depend on excellent event deconvolution. Could the authors show some more zoomed-in examples of how well their deconvolution works?

      We provide enlarged, trial-by-trial activity traces of the four example neurons shown in Figure 3A in Supplementary Figure 3G. In all neurons, multiple small calcium transients occur repeatedly throughout the delay period, which lasts longer than 10 s. If the sustained activity during the delay were simply due to a long decay time constant, one would expect a large calcium transient in the preceding trial that slowly decays over the delay period. However, such a pattern is not observed in the actual data. Moreover, because the decay time constant of GCaMP6s is on the order of 1 s, signals persisting for ~10 s cannot be explained by slow decay alone.
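
      To make the decay argument concrete, the short Python sketch below assumes a single-exponential decay with the ~1 s time constant cited above. This is an idealization: actual GCaMP6s kinetics depend on expression level and on how many spikes contribute to a transient. The sketch prints the fraction of a transient's peak that would remain at several points within a 10 s delay; the function name and checkpoints are illustrative only.

```python
import math


def residual_fraction(t_s: float, tau_s: float = 1.0) -> float:
    """Fraction of a calcium transient's peak remaining t_s seconds after the peak,
    assuming a single-exponential decay with time constant tau_s (an idealization)."""
    return math.exp(-t_s / tau_s)


# Illustrative checkpoints within a delay period of about 10 s.
for t in (1.0, 3.0, 5.0, 10.0):
    print(f"t = {t:4.1f} s -> {residual_fraction(t):.1e} of peak")
```

      Under this assumption, only about 0.005% of the peak (e<sup>−10</sup> ≈ 4.5 × 10<sup>−5</sup>) would remain after 10 s, so a single slowly decaying transient cannot sustain a measurable signal across the delay.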

      (6) The authors write: "the history neurons exhibited properties of working memory." However, note that this is not a working memory task since the mice don't need to keep evidence in memory, the direction to lick can be made at the very beginning of a trial.

      Behaviorally, demonstrating that an animal maintains working memory requires showing that its behavior changes based on retained information when new information is introduced, as in delayed match-to-sample tasks. In the present task, however, the correct action for the next trial is determined at the moment the action in the previous trial is completed, such that animals can simply switch to motor preparation at that point. Thus, from a strictly behavioral perspective, working memory is not required.

      On the other hand, during the inter-trial interval (ITI), information from the previous trial dominates over information from the upcoming trial (Fig. 3H), which is more consistent with retention of past information than with motor preparation. Moreover, trials in which neural activity maintained information about the previous trial’s action were associated with a higher probability of correct performance in the subsequent trial. In other words, retaining past information contributes to guiding correct behavior in the next trial.

      Based on these neural analyses, we interpret that mice retain information about their previous trial’s action history in working memory and use it to determine behavior in the subsequent trial. Accordingly, we consider ITI activity in PPC to reflect working memory rather than motor preparation. Nevertheless, we acknowledge that your concern is valid, and we have therefore revised the text as follows:

      Line 234: “These results suggest that the history neurons exhibited properties of working memory.”

      (7) In the section about the Choice History Task, the authors write: "Since the visual stimuli were randomly presented during the sampling period, the mice had to ignore the visual stimuli." Why continue to present the visual stimuli?

      Thank you for the suggestion. By designing the vision task and the history task to have identical structures, we can apply the same encoding and decoding models to both tasks, which facilitates direct comparison between them. This design makes it easier to examine how neuronal activity patterns change depending on task demands.

      Reviewer #2 (Recommendations for the authors):

      (1) I don't understand the logic of Figure S7 and the neuropil analysis in general. Neuropil activity is purported to represent input, so it seems unsurprising that nearby neurons would exhibit similar dynamics.

      Thank you for your comment. Your argument is correct, and it is not at all surprising that neuropil signals correlate with the activity of surrounding neurons. Here, we quantitatively examined the relationship between neuropil activity and the average activity of nearby neurons. In addition, in a separate analysis, we clarified the relationship between connectome information and neuropil activity. Taken together, these analyses reveal the relationship between connectome information and the local average of neuronal activity. We describe this point as follows:

      “Indeed, the trial-to-trial variation of neuropil activity could be approximated by the average of 1,000–10,000 neurons within several hundred micrometers of the center (Figure S7).”
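
      The comparison in Figure S7 can be illustrated with a toy Python sketch on synthetic data. This is not the authors' pipeline: the shared-signal model, the radii, and all names are hypothetical. Each simulated neuron carries a weak shared trial-to-trial fluctuation plus private noise, and a simulated neuropil trace carries the same shared fluctuation. Averaging more neurons (a larger radius) suppresses the private noise, so the local average tracks the neuropil more closely.

```python
import numpy as np


def local_average_correlation(positions_um, activity, neuropil, center_um, radius_um):
    """Pearson r between a neuropil trace and the mean activity of all neurons
    whose somata lie within radius_um of center_um.

    positions_um: (n_neurons, 2) soma coordinates in micrometres
    activity:     (n_neurons, n_trials) trial-wise activity of each neuron
    neuropil:     (n_trials,) trial-wise neuropil signal
    """
    dist = np.linalg.norm(positions_um - np.asarray(center_um, dtype=float), axis=1)
    inside = dist <= radius_um
    if inside.sum() < 2:
        return float("nan")
    local_mean = activity[inside].mean(axis=0)
    return float(np.corrcoef(local_mean, neuropil)[0, 1])


# Synthetic example: 5,000 neurons in a 1 mm x 1 mm field, 200 trials.
rng = np.random.default_rng(0)
n_neurons, n_trials = 5000, 200
positions = rng.uniform(0.0, 1000.0, size=(n_neurons, 2))
shared = rng.standard_normal(n_trials)  # shared trial-to-trial fluctuation
activity = 0.2 * shared + rng.standard_normal((n_neurons, n_trials))
neuropil = 0.2 * shared + 0.05 * rng.standard_normal(n_trials)

r_by_radius = {
    r: local_average_correlation(positions, activity, neuropil, (500.0, 500.0), r)
    for r in (50.0, 150.0, 300.0)
}
print(r_by_radius)
```

      The weights and noise levels are arbitrary; the sketch only shows that the correlation rises with the number of neurons averaged.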

      Although we analyzed this phenomenon in areas A and AM, this finding should not be considered specific to these areas; it has broader significance. Accordingly, we added a new Results subsection and revised the manuscript as follows.

      Line 448: “Constraints and limits of anatomical connectivity on neuronal population activity

      Although we have so far focused on the differences between A and AM, our data provide broader insights into the relationship between anatomical connectivity and neuronal population activity. First, based on Figure S7 and the considerations above, anatomical input correlations strongly constrain the correlations between local averages of activity across thousands of neurons. We then asked whether this anatomical constraint extends beyond mean activity, and how anatomical input correlations relate to correlations between neuronal population activity patterns (population vectors).

      The correlation between CC<sub>t</sub> and r<sub>anatomy</sub> was moderate (r = 0.60, Figure 6L). This moderate correlation did not change when the coupling neurons were eliminated (r = 0.61). Interestingly, the largest canonical component was the least predictable from the anatomical data (Figure 6M). Thus, while inter-area correlations based on the mean activity of neuronal populations are largely determined by anatomical input correlations, correlations between population vectors contain additional structure that cannot be captured by anatomical input correlations alone.
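
      One way to compute a pairwise comparison like the one in Figure 6L is sketched below in Python. It assumes (this is not stated in the text above) that r<sub>anatomy</sub> for two areas is the Pearson correlation between their input profiles across injection sites, and that CC<sub>t</sub> and r<sub>anatomy</sub> are compared over all distinct area pairs. The connectivity matrix and CC<sub>t</sub> values are synthetic, and the function names are hypothetical.

```python
import numpy as np


def anatomical_input_correlation(conn):
    """conn: (n_sources, n_areas) projection strength from each injection site (rows)
    into each target area (columns). Returns the (n_areas, n_areas) matrix of Pearson
    correlations between the areas' input profiles (an assumed definition of r_anatomy)."""
    return np.corrcoef(conn.T)  # rows of conn.T are areas


def pairwise_matrix_correlation(a, b):
    """Pearson r between the off-diagonal upper-triangle entries of two symmetric
    area-by-area matrices, i.e. one value per distinct area pair."""
    iu = np.triu_indices_from(a, k=1)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


# Synthetic example with 8 areas (28 distinct pairs) and 30 injection sites.
rng = np.random.default_rng(1)
conn = rng.gamma(shape=2.0, scale=1.0, size=(30, 8))
r_anatomy = anatomical_input_correlation(conn)
noise = rng.normal(scale=0.05, size=(8, 8))
cc_t = 0.3 + 0.5 * r_anatomy + (noise + noise.T) / 2  # partly anatomy-driven, partly not
print(round(pairwise_matrix_correlation(cc_t, r_anatomy), 2))
```

      Using only the upper triangle avoids counting each area pair twice and excludes the trivial diagonal.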

      One possible source of this additional structure is globally shared activity, which may reflect behavior, brain state, or levels of neuromodulators. To evaluate the contribution of global activity to the canonical correlation between areas, we first compared the canonical coefficient vectors (CCVs). We found that the first CCV had a similar orientation regardless of the paired area (Figure 6N). This indicates that the largest components of correlated activity in the CCA are globally shared fluctuations. We also directly evaluated the correlated activity components across all 8 areas with generalized canonical correlation analysis. The first CCV also had a similar orientation to the first generalized canonical coefficient vector (GCCV) (Figure 6O). These results indicate that the largest canonical component reflects a global correlation across all imaged cortical areas. Such global correlations may be driven by factors beyond cortico-cortical or thalamo-cortical inputs, such as the animal’s behavioral state, as we recently characterized (H. Imamura et al., 2025; F. Imamura et al., 2025). We also confirmed the robustness of these results by repeating the analyses using only the 40% most active neurons after denoising with non-negative deconvolution (36,828 of 91,397 neurons; Figure S9).”
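
      A minimal NumPy sketch of the CCV comparison follows. It implements standard CCA by whitening each area's covariance and taking the SVD of the whitened cross-covariance; the authors' implementation is not described here, and quantifying "similar orientation" as the absolute cosine similarity between first CCVs is an assumption (the sign of a canonical vector is arbitrary). The synthetic data contain one latent fluctuation shared by three areas, standing in for a global signal; the generalized CCA step is not shown, and all names are hypothetical.

```python
import numpy as np


def cca(x, y, eps=1e-8):
    """Canonical correlation analysis of x (n_trials, p) and y (n_trials, q).
    Returns canonical correlations (descending) and coefficient vectors as columns."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + eps * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + eps * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx = np.linalg.inv(np.linalg.cholesky(cxx))  # wx @ cxx @ wx.T == I
    wy = np.linalg.inv(np.linalg.cholesky(cyy))
    u, s, vt = np.linalg.svd(wx @ cxy @ wy.T, full_matrices=False)
    return s, wx.T @ u, wy.T @ vt.T


def abs_cosine(u, v):
    """Absolute cosine similarity, ignoring the arbitrary sign of canonical vectors."""
    return float(abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic data: three areas (10 dimensions each) share one latent fluctuation.
rng = np.random.default_rng(2)
n_trials, dim = 1000, 10
g = rng.standard_normal(n_trials)
areas = {name: np.outer(g, rng.standard_normal(dim)) + rng.standard_normal((n_trials, dim))
         for name in ("A", "B", "C")}

_, ccv_ab, _ = cca(areas["A"], areas["B"])
_, ccv_ac, _ = cca(areas["A"], areas["C"])
# First CCV of area A when paired with B versus when paired with C.
print(round(abs_cosine(ccv_ab[:, 0], ccv_ac[:, 0]), 3))
```

      With a dominant shared latent, the first CCV of area A points in nearly the same direction regardless of its partner area; pair-specific interactions could instead make the first CCVs diverge.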

      (2) Furthermore, the neuropil signal likely contains signals from out-of-focus neurons that are presumably functioning similarly to the in-focus cells. Wouldn't the interesting question be to what extent the local neuropil signal in, for example, area A resembled that of neuronal activity in S1t?

      Thank you very much for your comment. We agree with your point. Based on the evaluation in Figure S7, the neuropil signal likely contains the average activity of several thousand local neurons, including out-of-focus contributions. The neuropil signal in area A may also partially reflect neuronal activity from the neighboring S1t area. In particular, neurons that show little correlation with the local population average (i.e., the neuropil signal) within the same area are sometimes referred to as “soloists” (M. Okun et al., 2015). If such soloist neurons were found to exhibit strong correlations with the neuropil signal of an adjacent area, this would be a highly interesting result. However, such an analysis would go beyond the scope of the present manuscript and would require a new line of discussion; therefore, we plan to address this issue in future work.

      (3) I generally found the final Results section (Relationship between mesoscale functional correlation and anatomical connections) to be hard to follow. The motivation for this analysis should be better explained.

      We fully incorporated your suggestion and rewrote the final section of the Results accordingly. Please refer to our responses to the two comments above.

      (4) The question of brain state/neuromodulation as a driver of the globally shared activity may be addressable by considering its correlation with pupillometry data.

      We fully agree with your suggestion. In our experiments, visual stimuli change continuously, and thus pupil diameter changes are most likely driven primarily by changes in visual input. Although state-dependent fluctuations of brain activity may also be present, they are likely masked by the larger effects induced by visual stimulation. Therefore, pupil-linked signals as a source of globally shared activity would be more appropriately analyzed in experiments without visual stimulation. We plan to investigate this issue in future studies. Here, we have added the following description of pupil-related neurons and of the mechanisms by which pupil dynamics may influence neuronal activity.

      Line 292: “We found that the neurons related to the tail and forepaws were similarly distributed around the parietal cortex including S1 and A, while the pupil-size related neurons were mapped around visual areas (Figure 4C). Changes in pupil diameter may influence neuronal activity through multiple mechanisms, including behavioral state or noradrenergic level [REF], nonlinear interactions with visual stimulation, and changes in the amount of light reaching the retina.”

      Minor issues

      (1) The authors deploy sophisticated mathematical techniques with essentially no explanation outside the Methods section. A brief introduction of jPCA and CCA in the main text would help the reader understand the value of these analyses.

      Thank you for the comment. We added the following explanation.

      Line 238: “In this task, left and right choices alternate, so the activity of history neurons forms a sequence that repeats every two trials. We used jPCA<sup>49</sup> to visualize and quantify this activity pattern (Figure 3K). jPCA identifies low-dimensional projections of population activity that maximize rotational dynamics across time.”
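
      The core fitting step of jPCA can be sketched in NumPy as follows: find the skew-symmetric matrix M that best predicts the time derivative of the population state from the state itself (dX/dt ≈ XM), then take the plane of M's fastest rotation. This sketch omits the preprocessing of the full jPCA procedure (normalization, condition-mean subtraction, and the initial PCA projection) and uses a synthetic rotation embedded in four dimensions; the function names are hypothetical.

```python
import numpy as np


def fit_skew_symmetric(x, dx):
    """Least-squares skew-symmetric M (M.T == -M) minimizing ||dx - x @ M||_F.
    x, dx: (n_samples, k) population states and their time derivatives."""
    k = x.shape[1]
    iu = np.triu_indices(k, k=1)
    columns = []
    for i, j in zip(*iu):
        basis = np.zeros((k, k))
        basis[i, j], basis[j, i] = 1.0, -1.0  # one free parameter per (i, j) pair
        columns.append((x @ basis).ravel())
    design = np.stack(columns, axis=1)
    params, *_ = np.linalg.lstsq(design, dx.ravel(), rcond=None)
    m = np.zeros((k, k))
    m[iu] = params
    return m - m.T


def jpc_plane(m):
    """Orthonormal basis (k, 2) of the plane with the fastest rotation in M,
    plus that rotation's angular frequency (radians per time unit)."""
    vals, vecs = np.linalg.eig(m)
    idx = int(np.argmax(np.abs(vals.imag)))
    v = vecs[:, idx]
    q, _ = np.linalg.qr(np.stack([v.real, v.imag], axis=1))
    return q, float(abs(vals[idx].imag))


# Synthetic example: a 2-D rotation (period 2 s) embedded in 4 dimensions.
rng = np.random.default_rng(3)
dt, omega = 0.01, np.pi
t = np.arange(0.0, 4.0, dt)
embed, _ = np.linalg.qr(rng.standard_normal((4, 2)))  # (4, 2), orthonormal columns
states = np.column_stack([np.cos(omega * t), np.sin(omega * t)]) @ embed.T
derivs = np.gradient(states, dt, axis=0)
m = fit_skew_symmetric(states, derivs)
plane, freq = jpc_plane(m)
print(round(freq, 3), round(omega, 3))
```

      Restricting M to be skew-symmetric means the fitted dynamics can only rotate, not expand or contract, which is what makes the recovered plane a "rotational" projection.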

      Line 374: “Next, to investigate r<sub>t</sub> of the population activity (r<sub>t_population</sub>), we first reduced the dimensionality of population activity in each area to 10 by using PCA (principal component analysis) (Figure S6B,C). Then, “fluctuation activity” was recalculated for each dimension and trial type, analogous to the single-neuron analysis described above, but here representing noise in population-level activation patterns. We applied CCA (canonical correlation analysis) to each pair of areas and took the average of the 10 canonical correlations (CC<sub>t</sub>) as r<sub>t_population</sub>. CCA identifies pairs of linear combinations of population activity from two areas that maximize their correlation across trials, thereby capturing shared population-level fluctuations. The CC<sub>t</sub> structure between areas was similar across task types (Figure 5H), indicating that this structure reflects the underlying functional connectivity independent of the task. The CC<sub>t</sub> between A and S1t was the largest among all pairs (Figure 5H), whereas when CC<sub>t</sub> was averaged across all connections for each area, A and AM had the largest and second-largest CC<sub>t</sub>, respectively (Figure 5I). The dominance of A and AM in CC<sub>t</sub> disappeared when the neurons with r<sub>t_single</sub> > 0.3 were removed. Notably, the CC<sub>t</sub> between AM and the other areas was uniform regardless of the paired area across all 10 canonical components (Figure 5J). Thus, area AM is an integration hub of interareal communication, whereas A is coupled primarily with S1t, and this population-level correlation structure critically depends on this subset of neurons.”
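
      The r<sub>t_population</sub> pipeline described above can be sketched in NumPy under stated assumptions: each area's activity is already averaged over the sampling period (trials × neurons), PCA is computed by SVD of the mean-centered data, fluctuations are obtained by subtracting each trial type's mean from each principal component, and CCA is implemented by whitening plus SVD. The synthetic data and all function names are hypothetical, and the authors' exact implementation may differ.

```python
import numpy as np


def pca_reduce(activity, n_components=10):
    """Project (n_trials, n_neurons) activity onto its top principal components."""
    centered = activity - activity.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def fluctuations(scores, trial_types):
    """Subtract the mean of each trial type from each dimension."""
    out = scores.astype(float).copy()
    for label in np.unique(trial_types):
        sel = trial_types == label
        out[sel] -= out[sel].mean(axis=0)
    return out


def canonical_correlations(x, y, eps=1e-8):
    """Canonical correlations between x (n, p) and y (n, q), in descending order."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    wx = np.linalg.inv(np.linalg.cholesky(x.T @ x / (n - 1) + eps * np.eye(x.shape[1])))
    wy = np.linalg.inv(np.linalg.cholesky(y.T @ y / (n - 1) + eps * np.eye(y.shape[1])))
    return np.linalg.svd(wx @ (x.T @ y / (n - 1)) @ wy.T, compute_uv=False)


def cc_t(act_a, act_b, trial_types, n_components=10):
    """Mean of the canonical correlations between two areas' PC fluctuations."""
    xa = fluctuations(pca_reduce(act_a, n_components), trial_types)
    xb = fluctuations(pca_reduce(act_b, n_components), trial_types)
    return float(canonical_correlations(xa, xb).mean())


# Synthetic example: areas a and b share a 3-D fluctuation; area c does not.
rng = np.random.default_rng(4)
n_trials = 400
types = rng.integers(0, 4, n_trials)  # four stimulus x choice trial types
shared = rng.standard_normal((n_trials, 3))


def simulate(n_neurons, shared_weight):
    type_means = rng.standard_normal((4, n_neurons))
    loadings = rng.standard_normal((3, n_neurons)) * shared_weight
    return type_means[types] + shared @ loadings + rng.standard_normal((n_trials, n_neurons))


act_a, act_b, act_c = simulate(60, 1.5), simulate(80, 1.5), simulate(70, 0.0)
print(round(cc_t(act_a, act_b, types), 2), round(cc_t(act_a, act_c, types), 2))
```

      Even unrelated areas yield nonzero canonical correlations by chance with finite trials, so CC<sub>t</sub> values are most informative when compared across area pairs, as in Figures 5H–J.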

      (2) The manuscript contains numerous typos ("hoice"), spelling errors ("parameters", "costom"), abbreviations that are not defined (ex: RL/rostrolateral), and minor grammatical issues that should be addressed by a round of copy editing.

      We thank the reviewer for pointing this out. We have thoroughly corrected these typographical and grammatical errors, and have described the revisions in detail in our response to Reviewer 1, comment (3). In addition, we have clarified the abbreviations in the manuscript as follows.

      Line 94: “rostrolateral area (RL)”

      Figure 1 legend: “Abbreviations: RL, rostrolateral HVA; PM, posteromedial HVA; RSC, retrosplenial cortex.”

      (3) Figure 3K unlabeled axes.

      Thank you for the comment. We have added the axis labels.

      (4) Figure 3K caption, first "(right)" should be "(left)".

      Thank you very much for your careful attention to detail. We have made the requested correction.

      (5) Figure 6 is hard to read. Panel A is too small, and the interpretation of G is difficult.

      - For panel A, we added an enlarged view with images from a larger number of trials in Figure S7A.

      - G represents the connectivity matrix. The sources correspond to the injection sites, and the targets correspond to voxels in the cerebral cortex. Because the latter may not be immediately clear, we explicitly indicated in the figure that the targets are cortical voxels.

      (6) Figure S4C has a double compass.

      Thank you for the comment. We have revised the manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      While I have some questions and additional suggestions to further improve the clarity of the manuscript, I already found it to be highly interesting and well done in its current form.

      Major points:

      (1) The t-SNE comes up rather abruptly and is not well-explained in the main text or the figure caption. It would be good to provide some more information on the rationale of this analysis and how to interpret it. In particular, I don't see clear clusters in Figure 2H although the description of the authors seems to indicate that they observe clear functional classes such as choice, stimulus, and history neurons. Similarly, in Figure 3B, I don't see a clear separation between history and choice neurons in the t-SNE map. The example cells in Figure 3A appear to be delayed or long-tailed choice neurons rather than a dedicated group of 'history neurons'. It would be helpful for the interpretation of the t-SNE plots to show different PSTHs for different regions of the t-SNE map to better illustrate what different regions within the t-SNE projection represent and what distinguishes these cells.

      Thank you for the comment. The absence of clearly defined clusters in the t-SNE map suggests that neuronal activity forms a continuum rather than discrete classes. Importantly, the purpose of the t-SNE map here is not to identify sharp clusters, but to demonstrate that the functional categorization provided by our encoding model broadly and comprehensively spans the major structures present in the unsupervised t-SNE map. We have revised the relevant text in the manuscript accordingly as follows.

      Line 158: “To examine whether the neuron groups labeled by this model broadly capture the diversity of neuronal activity, we performed unsupervised clustering of neuronal activity using t-SNE. The functional labels revealed by this encoding model were consistent with the t-SNE clusters, indicating the validity of the encoding model (Figure 2H; Figure S4B; materials and methods).”

      The issue regarding History neurons was also raised in Reviewer #1’s comment (5). We provide an enlarged view of Figure 3A in Figure S3A. Each History neuron exhibits multiple calcium transients repeatedly and asynchronously following the previous reward acquisition. Therefore, rather than being “choice neurons with a long tail,” these neurons are better interpreted as neurons whose activity is sustained during this delay period.

      (2) Although the authors mention that neurons represent a mixture of features, they then use the encoding model to isolate clusters, such as vision or choice neurons. In general, the language throughout the manuscript suggests that there are various clusters of functionally segregated neurons (vision, choice, history, or coupling neurons). However, it is not clear to me to what extent this is supported by the data. Couldn't a choice neuron also be a vision neuron if both variables make significant contributions to the model? Similarly, are 'history' and 'choice' separate labels from the encoding model, or could a cell be given multiple labels? If a cell could be given multiple labels how did the authors create the colored plots on the right-hand side of Figures 2H and 3B? The example history cells in Figure 3J also appear to be highly selective for the contralateral choice, so again this seems to argue against a clear separation of choice and history neurons.

      Each label is assigned based on whether the corresponding coefficient is significant in the encoding model; therefore, neurons that are both vision- and choice-selective do exist. The presence of mixed-selectivity neurons in PPC is well established (e.g., Goard et al., 2016, eLife). In this manuscript, however, we focus not on functional overlap at the single-neuron level but on the spatial distribution of functional classes, and thus do not explicitly address mixed selectivity. Although the colors in Figure 2H and Figure 3B overlap, the underlying data for each class are presented separately in Figures S4B and S4D, respectively. As shown there, each color generally occupies distinct regions of the t-SNE map.

      (3) The decoding analysis in Figure 3F also suggests that a potential reason why there are more choice history signals in areas S1 and A is that neural activity is simply larger rather than due to the activity of a dedicated group of history neurons. Are the authors interpreting this differently? Could the duration of stored choice information also be affected by the dynamics of the calcium indicator?

      Thank you for the comment. Simply having larger neural activity in S1t or A would not result in calcium transients with a ~1-s time constant persisting throughout a delay period lasting up to 10 seconds. As also noted in comment (1), History neurons exhibit sustained and repeated calcium transients, and therefore their activity cannot be explained merely by elevated neural activity levels. One could argue that all cortical areas carry history-related information but that the signal-to-noise ratio is higher in S1t or A, which might make such signals more detectable there. If this were the case, however, differences across areas in all forms of selectivity should similarly depend on signal-to-noise ratio. This is not what we observe in our data.

      (4) I'm confused as to why the decoding accuracy is so high for areas A and S1t at time -3 relative to the choice in Figure 3F. Shouldn't this be the same as predicting the next choice in Figure 3H? Why is the decoding accuracy lower in this case?

      Thank you for the comment. The analysis shown in Figure 3F includes only trials in which the choice was correct, which explains why decoding accuracy is lower in Figure 3H. We have added this clarification to the main text.

      Figure 3F: “Decoding accuracy of choice, outcome, and visual stimuli by the activity of 20 neurons from each area using only correct trials, before and after the choice onset, reward delivery, and the end of the visual stimuli, respectively. Line colors correspond to the areas shown in panel G.”

      (5) In general, the text is not very detailed about the statistics. While test scores and p-values are mentioned, it would be good to also state what is actually compared and what the n is (e.g. how many neurons, neuron pairs, areas, sessions, or animals) for each case. How do the authors account for the nested experiment design where many neurons are coming from a low number of animals?

      Thank you for the comment. In our decoding analyses, we generally treat animals as the independent samples (i.e., n is the number of animals). In contrast, for the encoding model analyses, we treat individual neurons as the independent samples. As you correctly pointed out, because we recorded activity from a large number of neurons, statistical tests that treat individual neurons as independent samples can readily yield significant p-values even with a small number of animals. We have therefore confirmed that our conclusions are not driven by a large effect from a single animal. When making qualitative claims, we rely not only on statistical significance (p-values) but also require clear differences in effect size. We have added the following clarification to the Statistics section accordingly.

      Line 1049: “For the decoding analyses, animals were treated as the independent samples (n = number of animals), whereas for the encoding model analyses, individual neurons were treated as the independent samples. To ensure that the results were not driven by a single animal, we repeated the statistical tests while systematically excluding data from one animal at a time and confirmed that statistical significance was preserved in all cases. Furthermore, qualitative interpretations were made only when differences in effect size were clearly observed.”
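
      The leave-one-animal-out check can be sketched in NumPy as below. Because the specific statistical tests are not named here, a two-sample permutation test on the difference of means stands in for them; the per-neuron metric, the two groups being compared, the animal labels, and all function names are hypothetical, and the data are synthetic.

```python
import numpy as np


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means between samples a and b."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        exceed += int(abs(perm[: len(a)].mean() - perm[len(a):].mean()) >= observed)
    return (exceed + 1) / (n_perm + 1)


def leave_one_animal_out(values, in_group1, animal, n_perm=2000, seed=0):
    """Repeat the group-1 vs group-2 test with each animal excluded in turn.
    values: per-neuron metric; in_group1: boolean mask; animal: animal ID per neuron."""
    pvalues = {}
    for excluded in np.unique(animal):
        keep = animal != excluded
        pvalues[int(excluded)] = permutation_pvalue(
            values[keep & in_group1], values[keep & ~in_group1], n_perm, seed
        )
    return pvalues


# Synthetic example: 5 animals x 200 neurons; group 1 is shifted by 0.3 SD.
rng = np.random.default_rng(5)
animal = np.repeat(np.arange(5), 200)
in_group1 = rng.random(animal.size) < 0.5
values = rng.standard_normal(animal.size) + 0.3 * in_group1
pvalues = leave_one_animal_out(values, in_group1, animal)
print(pvalues, all(p < 0.05 for p in pvalues.values()))
```

      Reporting the largest p-value across exclusions gives a single conservative summary of whether significance survives the removal of any one animal.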

      (6) How was the grouping in Figure 2O done? Specifically, how were the thresholds for the dashed lines selected to separate PM and V1 from AM and RL as association areas? It seems to me like this grouping was done rather arbitrarily as the difference in choice decoding accuracy is not particularly large between these areas.

      This line does not have a specific quantitative basis, but we consider it useful as an illustrative aid. We have added this clarification to the figure legend.

      Figure 2O: “Decoding accuracies of time in video presentation and choice direction indicate that AM would be the best position for associating these two signals. The background color and dashed lines are provided as visual aids for illustrative purposes.”

      (7) The fact that neurons with high rt_single tend to share the same function might also indicate the approach is insufficient to remove all effects of tuning to trial types from the neural data. Since the authors subtract the average of each trial type, the average trial-type related information is removed but type-specific variations that are not equally presented in the average might remain. For choice neurons for example, attentive vs in-attentive choices could be represented differently and thus remain in the data since the average would be a mixture of both. The same goes for other factors that would drive a particular modulation in the choice - or stimulus - related part of the trial which could still tie these neurons together. One way to circumvent this concern could be to first compute the mean activity for all time points in each trial and then compute the trial-to-trial variability across all trials of the same type. Alternatively, I would be curious how the results play out when using data when the animal is not actively performing the task to compute rt_single.

      Thank you for the comment. The concern raised by the reviewer applies to all noise-correlation analyses and highlights an important limitation of this approach, namely that factors other than the observed variables are treated as noise. By subtracting the trial-averaged activity, information related to sensory input and the direction of the first lick at choice can be removed. However, other factors cannot be eliminated if they are not observed. For example, if right hindlimb movements tend to occur only in trials with visual stimulation combined with left choice, such effects cannot be removed because they are not measured. The same issue remains even when restricting the analysis to a single trial type. Based on these considerations, we have added the following text to the manuscript.

      Line 932: “The correlation of trial-to-trial variation in activity between a pair of single neurons was defined as r<sub>t_single</sub>. To calculate r<sub>t_single</sub>, we averaged the activity of individual neurons over the sampling period and subtracted the average for each trial type from this value. The trial types consisted of the four combinations of stimulus and response: video stimulation and left choice, video stimulation and right choice, black screen and left choice, and black screen and right choice. By this operation, we extracted the fluctuating components of single-neuron activity that are independent of the trial types. The finding that neurons with high r<sub>t_single</sub> tend to share the functional properties we propose is not a trivial consequence of the analysis. At the same time, it remains possible that high r<sub>t_single</sub> reflects the degree to which neurons share unobserved features, and that such features are correlated with our functional classification. Thus, while this analysis suggests that correlated fluctuations across cortical areas may contribute to the determination of functional types, a definitive conclusion will require more fine-grained behavioral measurements, tighter control of internal states, and causal identification through targeted interventions.”
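
      The r<sub>t_single</sub> computation described above can be sketched in NumPy, assuming activity has already been averaged over the sampling period (neurons × trials) and trial types are coded as integers for the four stimulus/choice combinations. In the synthetic example, neurons 0 and 1 share only trial-type tuning, whereas neurons 2 and 3 share a trial-to-trial fluctuation; subtracting the trial-type means removes the first kind of correlation but not the second. Any unobserved factor that co-varies across trials would survive the subtraction in the same way as neurons 2 and 3. The names are hypothetical.

```python
import numpy as np


def r_t_single(sampling_activity, trial_types):
    """Pairwise correlations of trial-to-trial fluctuations.

    sampling_activity: (n_neurons, n_trials) activity averaged over the sampling period
    trial_types:       (n_trials,) integer labels for the stimulus x choice combinations
    Returns an (n_neurons, n_neurons) matrix of Pearson correlations after subtracting
    each neuron's mean response within each trial type.
    """
    resid = sampling_activity.astype(float).copy()
    for label in np.unique(trial_types):
        sel = trial_types == label
        resid[:, sel] -= resid[:, sel].mean(axis=1, keepdims=True)
    return np.corrcoef(resid)


rng = np.random.default_rng(6)
n_trials = 800
types = rng.integers(0, 4, n_trials)
tuning = 2.0 * types                    # identical trial-type tuning for neurons 0 and 1
shared = rng.standard_normal(n_trials)  # fluctuation shared by neurons 2 and 3
activity = np.vstack([
    tuning + rng.standard_normal(n_trials),
    tuning + rng.standard_normal(n_trials),
    shared + 0.5 * rng.standard_normal(n_trials),
    shared + 0.5 * rng.standard_normal(n_trials),
])
raw = np.corrcoef(activity)
rt = r_t_single(activity, types)
print(round(raw[0, 1], 2), round(rt[0, 1], 2), round(rt[2, 3], 2))
```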

      Minor points:

      (1) Why did the authors use the activity of 50 neurons for the decoder analysis in Figure 2K? Didn't they have many more neurons available? How were these selected?

      We found that the conclusions were identical when using datasets consisting of either 50 neurons or 20 neurons across all analyses. Because the total number of recorded PM neurons did not reach 100 in at least one mouse, we standardized the analyses to 50 neurons in order to match the number of neurons across all cortical areas and animals.

      (2) The authors mention that some PPC neurons showed complex dynamics rather than encoding a specific feature such as visual or choice information but do not mention actual numbers on this point. It would be good to quantify to what extent neurons in different regions represent such mixed selectivity and whether there are clear differences in selectivity. This would also be interesting to discuss in context to earlier work on mixed selectivity in the parietal cortex, such as Raposo et al 2015.

      Thank you for the comment. Your point is entirely valid. However, as explained in our response to your major comment, our analyses focus not on how individual neurons are classified, but rather on the spatial distribution of these functional categories.

      (3) I have a hard time understanding what the length of the bars in the right panel of Figure 2k indicates. Does this plot show more than the decoder accuracy before and after the choice? Is the bar length related to the standard deviation? The same question for the visualization in panel 2n. It looks nice but I'm confused about what it shows exactly.

      These bars represent confidence intervals. Although this is stated at the end of the Figure 2 legend, we agree that it may not be sufficiently clear, and we have therefore added this information to the Statistics section.

      Line 1046: “In Figure 2K and N, and Figure 3G, L, M, and O, the bars indicate the 95% confidence intervals. All other bars denote s.e.m., unless otherwise noted.”

      (4) Is Figure 3D showing the same association index as in Figure 2j, thus showing the same result as in the vision task or is this meant to show something new? It was not clear to me from the wording, so it would be good to clarify.

      You are correct that the magenta trace in Fig. 3D is the same as in Fig. 2J. This panel was included to explicitly illustrate that, in areas A and AM, the separation between History and Association approximately overlaps. We have added the following clarification to the figure legend accordingly.

      Figure 3D: “The percentage of history neurons and the association index (as defined in Fig. 2J) were overlaid for comparison.”

      (5) When computing the Pseudo R2 for regressor contribution, how was the null model computed? From shuffling all regressors in the model? I think this is fine but it's not fully clear what the intended effect of this procedure is. For the description of Figure 4C it would be good to add a sentence explaining how to interpret the pseudo R^2.

      The null model predicts a fixed value that is independent of the explanatory variables, i.e., it predicts only the intercept. This provides a useful correction term when performing cross-validation, particularly in cases where baseline values differ across folds. In Figure 4C, the analysis shows the contribution of adding body part positions and pupil diameter to the model for predicting neural activity. We have added the following text to the Methods section.

      Line 881: “To estimate the contribution of parameters for the left forelimb, the right forelimb, the tail, and the pupil, we repeated the same analysis with a reduced model in which each set of predictors was eliminated from the full model (Figure 4B). Then, the pseudo-R<sup>2</sup> was obtained for each set of predictors as (MSE<sub>reduced</sub> − MSE<sub>full</sub>) / MSE<sub>null</sub>, where MSE is the mean squared error, MSE<sub>reduced</sub> is the MSE of the reduced model, MSE<sub>full</sub> is the MSE of the full model, and MSE<sub>null</sub> is the MSE of the null model. The null model predicts a fixed value that is independent of the explanatory variables; specifically, it simply outputs the mean of the training data. For example, we constructed a regression model without the parameters for the left forelimb (green shading in Figure 4B), obtained MSE<sub>reduced</sub> for the left forelimb, and calculated the pseudo-R<sup>2</sup> as above using the MSEs of the full and null models. This value reflects the extent to which the position of the left forelimb contributes to the prediction of neuronal activity.”
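
      The pseudo-R<sup>2</sup> defined above can be sketched in NumPy as follows. Ordinary least squares with an intercept and a single train/test split stand in for the encoding model and the cross-validation scheme, which are not specified here; the predictor grouping, synthetic data, and function names are hypothetical.

```python
import numpy as np


def fit_predict(x_train, y_train, x_test):
    """Ordinary least squares with an intercept; returns predictions for x_test."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef


def group_pseudo_r2(x, y, group_cols, train, test):
    """(MSE_reduced - MSE_full) / MSE_null on held-out trials, where the reduced
    model drops the columns in group_cols and the null model predicts the
    training-set mean."""
    def mse(pred):
        return float(np.mean((y[test] - pred) ** 2))

    drop = set(group_cols)
    keep = [c for c in range(x.shape[1]) if c not in drop]
    mse_full = mse(fit_predict(x[train], y[train], x[test]))
    mse_reduced = mse(fit_predict(x[train][:, keep], y[train], x[test][:, keep]))
    mse_null = mse(np.full(int(test.sum()), y[train].mean()))
    return (mse_reduced - mse_full) / mse_null


# Synthetic example: columns 0-1 ("left forelimb") drive the neuron; 2-3 ("pupil") do not.
rng = np.random.default_rng(7)
n = 1000
x = rng.standard_normal((n, 4))
y = 1.5 * x[:, 0] - 1.0 * x[:, 1] + 0.5 * rng.standard_normal(n)
train = np.arange(n) < 800
test = ~train
print(round(group_pseudo_r2(x, y, [0, 1], train, test), 2),
      round(group_pseudo_r2(x, y, [2, 3], train, test), 3))
```

      Dividing by MSE<sub>null</sub> rather than by MSE<sub>full</sub> expresses each group's unique contribution as a fraction of the total held-out variance, so values are comparable across neurons with different baseline activity.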

      (6) It seems surprising that the pupil-size-related neurons were mapped around visual areas although the pupil should carry clear luminance information. Is this because the luminance-related information in the pupil can also be explained by the stimulus variable in the model?

      Pupil size changed markedly before and after visual stimulus presentation (Figure S5C), dilating during the black stimulus and constricting during the video stimulus. This likely reflects changes relative to the luminance of the gray screen presented in the absence of visual stimuli. In our encoding model, visual stimuli are included as independent regressors for each corresponding time window. Therefore, pupil fluctuations that are temporally locked to visual stimulation are explained by these visual regressors. Neuronal activity that is better explained by pupil size changes not accounted for by the visual regressors is classified as pupil-related. At least three mechanisms may underlie the influence of pupil size on neuronal activity. First, fluctuations in pupil diameter have been linked to behavioral state or noradrenergic level [REF], which can act as variables independent of visual stimulation. Second, pupil fluctuations may be amplified in a stimulus-dependent manner, reflecting nonlinear interactions between visual input and brain state. Third, changes in pupil diameter alter the amount of light reaching the retina, which can modulate activity in visual cortical areas. The latter two mechanisms are therefore expected to predominantly affect visual areas and may explain why pupil-related neurons are more frequently observed there. The first mechanism is likely related to global brain state, and its association with behavior may account for the presence of pupil-related neurons in S1. However, these interpretations require confirmation through more refined causal manipulations. Accordingly, we limited the addition to the manuscript to the following statement.

      Line 292: “We found that the neurons related to the tail and forepaws were similarly distributed around the parietal cortex including S1 and A, while the pupil-size related neurons were mapped around visual areas (Figure 4C). Changes in pupil diameter may influence neuronal activity through multiple mechanisms, including behavioral state or noradrenergic level [REF], nonlinear interactions with visual stimulation, and changes in the amount of light reaching the retina.”

      (7) What is meant by 'external control parameters such as a video frame' when explaining the encoding model?

      Thank you for the comment. We added the following explanation.

      Line 151: “In the encoding model, the activity of each neuron was fitted by a weighted sum of external control parameters, such as video frames, and behavioral parameters, such as choice and reward direction. Because the visual stimulus changes continuously over time, sliding time windows were placed during the visual stimulus period.”
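      To make the idea of sliding time windows concrete, here is a minimal sketch of such a design matrix. It assumes non-overlapping boxcar windows that tile the stimulus period, a single choice regressor coded +1/−1 during the response period, an intercept, and an ordinary least-squares fit; the function names, frame counts, and coding scheme are all hypothetical, and the actual model includes video frames and further behavioral parameters.

```python
import numpy as np


def window_regressors(n_frames, stim_onset, stim_len, win_len):
    """Boxcar regressors, one per window tiling [stim_onset, stim_onset + stim_len)."""
    stim_end = stim_onset + stim_len
    cols = []
    for start in range(stim_onset, stim_end, win_len):
        r = np.zeros(n_frames)
        r[start:min(start + win_len, stim_end)] = 1.0
        cols.append(r)
    return np.column_stack(cols)


def build_design(choices, n_frames, stim_onset, stim_len, win_len, resp_onset, resp_len):
    """Stack per-trial design matrices: intercept + stimulus windows + one choice regressor."""
    stim = window_regressors(n_frames, stim_onset, stim_len, win_len)
    blocks = []
    for c in choices:  # c = +1 (right) or -1 (left), an assumed coding
        choice = np.zeros((n_frames, 1))
        choice[resp_onset:resp_onset + resp_len, 0] = c
        blocks.append(np.hstack([np.ones((n_frames, 1)), stim, choice]))
    return np.vstack(blocks)


# Simulate one neuron as a weighted sum of the regressors, then recover the weights.
rng = np.random.default_rng(0)
choices = np.tile([1, -1], 10)  # 20 trials
X = build_design(choices, n_frames=60, stim_onset=10, stim_len=30,
                 win_len=10, resp_onset=45, resp_len=10)
w_true = np.array([0.0, 1.0, 2.0, -0.5, 0.8])  # intercept, 3 windows, choice
y = X @ w_true + rng.normal(scale=0.01, size=len(X))
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w_hat, 2))
```

      Giving each window its own weight lets the model capture a response that changes as the video unfolds, rather than forcing a single stimulus weight for the whole presentation.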

      (8) What does the trace in Figure 2G show? Is this a single-cell example? What are the axes here?

      We added an explanation to the figure legend.

      Figure 2G: “Schematic of our encoding model. The bottom right panel shows an example of single-neuron activity with an overlay of the fitting obtained by the encoding model.”

      (9) There seems to be a word missing in the sentence that describes the results for Figure 3O in the main text.

      Thank you for the comment. We added the following description related to Fig. 3O.

      Line 247: “resulting in the decoding accuracy of time after a specific choice being lower than in A (Figure 3O).”

      (10) The abbreviation RP is used when describing Figure S5A. It should be mentioned that this refers to the response period.

      Thank you for the comment. We added the following description related to Figure S5A.

      Line 283: “We found that the angle of the tail was significantly different from the baseline values several seconds after the response period (RP) (Figure S5A).”

      (11) I can't see the color difference between the traces in Figure 2E. There are probably red and green but this is hard to see for readers with red-green color blindness. Does the black indicate the time of visual stimulation? Is the line in Figure 2F the time when the spouts move in?

      Thank you for the comment. In Fig. 2E, we improved visibility by changing the line opacity. In addition, the vertical line in Fig. 2E indicates the onset of the visual stimulus, and the vertical line in Fig. 2F indicates the onset of the response period. We have added the following explanations to the figure legend.

      Figure 2: E. “Representative vision neurons (ROI 1-4 in I). The red bars indicate sampling periods during video presentation, and the brown bars indicate sampling periods without video stimulation. Vertical black lines mark the onset of the sampling period. F. Representative choice neuron (ROI 5-8 in I) and a non-selective neuron (ROI 9). Light blue lines indicate the response periods in trials with left choices, and purple lines indicate the response periods in trials with right choices. Vertical black lines mark the onset of the response period.”

      (12) It might be useful to provide a short explanation in the results or methods of why the harmonic mean was used for the computation of the association index. I think it makes sense but since it is not commonly used this could be helpful for the reader to understand the approach.

      Thank you for the comment. We added the following explanation to the main text.

      Line 869: “The association index was determined by the harmonic mean of the rates of vision neurons and choice neurons. The harmonic mean approaches the arithmetic mean when the two values are similar, but becomes closer to the smaller value when the two values differ substantially. Therefore, the association index takes a large value when both vision neurons and choice neurons are abundant.”
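      The behavior of the harmonic mean described above, 2ab/(a + b) for rates a and b, can be checked with a few lines of Python. This sketch assumes both rates are given as fractions of the neurons in an area; the function name is hypothetical.

```python
def association_index(frac_vision, frac_choice):
    """Harmonic mean of the fractions of vision and choice neurons in an area."""
    total = frac_vision + frac_choice
    if total == 0:
        return 0.0
    return 2.0 * frac_vision * frac_choice / total


print(association_index(0.3, 0.3))  # ~0.3, same as the arithmetic mean
print(association_index(0.5, 0.1))  # ~0.167, whereas the arithmetic mean is 0.3
```

      An area rich in only one neuron type therefore scores low, which is the intended property.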

      (13) I don't fully understand how coupling diversity is computed. If there are six preference vectors, what is meant by taking the average of angles between all pairs of the two vectors? Which two are meant here?

      Thank you for the comment. We revised the explanation as follows.

      Line 950: “To quantify the diversity of coupling patterns across clusters, we computed the angle between every pair of preference vectors. We then averaged these pairwise angles and defined this quantity as the ‘coupling diversity.’”
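      For six preference vectors, "every pair" means all 15 unordered pairs. A minimal sketch, assuming each vector is a row of an array and that angles are reported in degrees (both assumptions; the function name is hypothetical):

```python
import itertools

import numpy as np


def coupling_diversity(pref_vectors):
    """Mean angle (in degrees) over all unordered pairs of preference vectors."""
    v = np.asarray(pref_vectors, dtype=float)
    angles = []
    for a, b in itertools.combinations(v, 2):
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        # Clip guards against rounding slightly outside [-1, 1].
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return float(np.mean(angles))


# Six hypothetical preference vectors (one per cluster) give 15 pairwise angles.
rng = np.random.default_rng(0)
prefs = rng.random((6, 4))
print(coupling_diversity(prefs))
```

      Identical vectors give 0, mutually orthogonal vectors give 90, so larger values indicate more dissimilar coupling patterns across clusters.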

      (14) The results text states that the high correlation between r_anatomy and r_neuropil (Figure 6I) is evidence for the functional correlations being driven by cortico-cortical connectivity. However, Figure 6J shows that correlations for either cortico-cortical or thalamo-cortical connectivity are below 0.94 and generally higher for thalamo-cortical connectivity. This doesn't negate the general point of the authors, but it would be good to clarify this section so it is easier to understand whether r_anatomy includes both cortico-cortical and thalamo-cortical data and how the results in Figures 6I and 6J fit with the description in the Results section.

      You are correct. We have revised the text to clarify that the analysis reflects the combined effects of both cortico-cortical and thalamo-cortical inputs.

      Line 436: “This correspondence suggests that the mesoscale interarea correlation is determined by the cortico-cortical and thalamo-cortical common input at mesoscale. Figure S8: A. Using Allen connectivity atlas, the axonal density of cortico-cortical and thalamo-cortical projection was analyzed.”

      (15) I'm not very familiar with canonical correlation analysis and found this part hard to follow. Some additional explainer sentences would be helpful here. For example, what does it mean to take the average of the top 10 canonical correlations as rt_population? What exactly are the canonical correlation vectors? It was also not clear to me what exactly the results in Figure 5J signify.

      Thank you for the comment. We have clarified the description in the main text related to CCA and the associated analyses as follows.

      Line 374: “Next, to investigate r<sub>t</sub> of the population activity (r<sub>t_population</sub>), we first reduced the dimension of population activity in each area to 10 by using PCA (principal component analysis) (Figure S6B,C). Then, ‘fluctuation activity’ was recalculated for each dimension and trial type, analogous to the single-neuron analysis described above, but here representing noise in population-level activation patterns. We applied CCA (canonical correlation analysis) to each pair of areas and took the average of the 10 canonical correlations (CC<sub>t</sub>) as r<sub>t_population</sub>. CCA identifies pairs of linear combinations of population activity from two areas that maximize their correlation across trials, thereby capturing shared population-level fluctuations. The CC<sub>t</sub> structure between areas was similar across task types (Figure 5H), indicating that this structure reflects the underlying functional connectivity independent of the task. The CC<sub>t</sub> between A and S1t was the largest among all the pairs (Figure 5H), whereas when the CC<sub>t</sub> was averaged across all connections for each area, A and AM had the largest and second largest CC<sub>t</sub>, respectively (Figure 5I). The dominance of A and AM in CC<sub>t</sub> disappeared when the neurons with r<sub>t,single</sub> > 0.3 were removed. Notably, the CC<sub>t</sub> between AM and the other areas was uniform regardless of the paired area across all 10 canonical components (Figure 5J). Thus, area AM is an integration hub of interareal communication, whereas A is simply coupled with S1t, and such a correlation structure at the population level critically depends on this subset of neurons.”
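      The PCA-then-CCA pipeline can be sketched as follows for a single time bin t, with each area given as a trials × neurons matrix. Assumptions, all hypothetical: "fluctuation activity" is taken to be the deviation from the mean of the same trial type, and canonical correlations are computed with the standard QR/SVD formulation (the singular values of Q<sub>x</sub><sup>T</sup>Q<sub>y</sub>) rather than whatever library the authors used. Function names are illustrative only.

```python
import numpy as np


def pca_reduce(X, k):
    """Project a trials x neurons matrix onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def fluctuations(Z, trial_types):
    """Remove the mean of each trial type, leaving trial-to-trial fluctuations."""
    trial_types = np.asarray(trial_types)
    out = np.array(Z, dtype=float)
    for t in np.unique(trial_types):
        mask = trial_types == t
        out[mask] -= out[mask].mean(axis=0)
    return out


def canonical_correlations(X, Y):
    """Singular values of Qx^T Qy, where Qx, Qy are orthonormal bases of centered X, Y."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)


def rt_population(area_a, area_b, trial_types, k=10):
    """Average of the k canonical correlations between PCA-reduced fluctuations."""
    za = fluctuations(pca_reduce(area_a, k), trial_types)
    zb = fluctuations(pca_reduce(area_b, k), trial_types)
    return float(canonical_correlations(za, zb).mean())


# Two simulated areas driven by a shared 10-D latent signal, and one unrelated area.
rng = np.random.default_rng(0)
n = 500
types = rng.integers(0, 2, size=n)
shared = rng.normal(size=(n, 10))
a = shared @ rng.normal(size=(10, 30)) + 0.1 * rng.normal(size=(n, 30))
b = shared @ rng.normal(size=(10, 25)) + 0.1 * rng.normal(size=(n, 25))
c = rng.normal(size=(n, 25))
print(rt_population(a, b, types), rt_population(a, c, types))
```

      Each canonical correlation is the trial-to-trial correlation of one pair of linear combinations (canonical vectors) of the two areas' components, so averaging all 10 summarizes how much population-level noise the two areas share.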

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Rühling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype for genetic knock out is a major weakness.

      We agree with the reviewer that an S. aureus invasion phenotype in ASM K.O. cells would unequivocally demonstrate the importance of ASM for the process. In the revised manuscript, we report an invasion phenotype in ASM K.O. cells. The absence of an invasion phenotype in ASM K.O. cells in our original experiments was likely caused by FBS-derived SM accumulation in ASM-depleted cells (see Figure 2I in the revised manuscript).

      We thus cultured cells for up to three days in 2% FBS and then reduced the concentration to 1% FBS one day prior to experimentation. Under these conditions reduced S. aureus invasion in ASM K.O.s was observed when compared to wildtype cells.

      This was not detected when we cultured the cells in medium containing the common concentration of 10% FBS. Our new data supports the results we acquired with three different ASM inhibitors.

      The invasion defect in ASM K.O.s cultured in low FBS was more pronounced at 10 min p.i. than at the 30-min time point (Figure 2K), further corroborating that the ASM-dependent invasion pathway is relevant early in infection. This is consistent with the invasion dynamics we observed upon interference with lysosomal Ca<sup>2+</sup> signaling [TPC1 K.O. (Figure 1C), BAPTA-AM (Figure 3D)], lysosomal exocytosis [Syt7 K.O. (Figure 2F), Ionomycin (Figure 3D)] and ASM activity by inhibitor treatment (Figure 3D).

      Originally, we had hypothesized that changes in the sphingolipidome induced by absence of ASM may have caused the lack of an S. aureus invasion phenotype. We thus compared the sphingolipidome of ASM K.O.s cultured in 1% and 10% FBS. Indeed, SM accumulation was less severe when we cultured the cells in 1% FBS (Figure 2M and Supp. Figure 3). Hence, we think that strong SM accumulations in ASM K.O. cells cultured in 10% FBS may facilitate ASM-independent invasion mechanisms and thus, the absence of ASM-dependent invasion could not be detected by analyzing the number of invaded bacteria. This is supported by experiments, where we treated ASM K.O.s with the ASM inhibitor ARC39, which only slightly affected S. aureus invasion, whereas we detected a strong reduction of internalized bacteria by ARC39 treatment of WT cells (Figure 2 J). We think that this experiment and the reduced invasion in ASM K.O.s rule out an ASM/SM-independent effect of the inhibitors.

      - While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      We agree with the reviewer that we do not show formation of ceramide-enriched platforms, and we have thus changed the manuscript accordingly (see below).

      - The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We share the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but we are limited by cell biology and by the simultaneous occurrence of processes, here the uptake of bacteria by multiple pathways.

      However, we were able to address ASM-dependency of our rapid uptake mechanism by observing a genetic phenotype in SMPD1 knockout-cells.

      We make no assumptions here about the centrality of the pathway or its importance in vivo. As scientists, we were interested in the fact that such an ASM-dependent pathway exists. In other, as yet unidentified, cell lines, such a pathway may constitute the main entry point for bacteria. Alternatively, it may represent an ASM-dependent mode of receptor uptake that we identified because the bacteria piggy-back on it into the cells.

      - I acknowledge the authors' argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of the ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We are convinced that our new genetic evidence of an S. aureus invasion phenotype in ASM K.O.s will eliminate the reviewer’s concerns about the role of ASM during the bacterial invasion.

      The new lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2, M, Supp. Figure 3) and inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype.

      We agree with the reviewer, however, that the reason why changes in sphingolipidome increase ASM-independent S. aureus internalization by host cells remains elusive. One possible explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to appear in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus (3, 4). Characterization of the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      Host cells possess mechanisms to prevent infection, while pathogens have developed strategies to circumvent these defenses. In the present scenario, a physiological membrane composition of the host cell represents such a defense mechanism (as shown, e.g., for caveolin-1, which restricts invasion of S. aureus in healthy cells). If a defense mechanism is disabled (as we speculate is the case upon strong SM accumulation in ASM K.O.s cultured in 10% FBS), infection is facilitated. In healthy WT cells, these mechanisms (e.g. caveolin-1) are functional and, hence, we would not expect a “compensation” of ASM-dependent invasion. Here we analyze invasion events that are not prevented by host defense mechanisms, since they occur in untreated WT cells, and that are absent upon interference with the ASM-dependent invasion pathway (by inhibitors and genetic K.O.). Thus, we think the ASM-dependent pathway, which mediates 50-70% of bacteria internalized by healthy WT cells 10 min p.i., is central to the infection.

      - The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We measured phagosomal escape of S. aureus JE2 in ASM K.O. cells cultured in 1% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3h p.i. However, the results are similar to escape rates determined with 10% FBS (Author response image 1).

      Escape rates of S. aureus were significantly decreased in absence of ASM regardless of the FBS concentration in the medium. We therefore think that prolonged absence of ASM has other side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt for the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      Author response image 1.

      As it is unclear how prolonged absence of ASM can affect cellular processes, we think other experiments investigating the role of ASM-dependent invasion for phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus predominantly via the “rapid” ASM-dependent pathway) have lower phagosomal escape rates than bacteria that enter host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4, C and D). We further demonstrate that sphingomyelin on the plasma membrane during invasion influences phagosomal escape, whereas sphingomyelin levels in the phagosomal membrane do not change phagosomal escape (Figure 5, A and B). This is summarized in Figure 5F.

      - Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape the phagosome in macrophages, could they provide a system to potentially decouple the effects of ASM (or its inhibitor treatment) on uptake and on intracellular trafficking?

      Inducible knock-downs in our laboratory are based on the vector pLVTHM in cells co-expressing the repressor TetR fused to a KRAB domain. Note, however, that optimal knock-down requires induction by doxycycline supplementation of the medium for 7 days. Over these several days of growth, the cells can adapt their lipid metabolism, which would reproduce the situation we encounter in the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (5). However, the course of infection in macrophages differs from that in non-professional phagocytes (6). For example, in macrophages S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM may therefore influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      - The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms. We thus changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M)

      (ii) increased invasion in TPC1 and Syt7 K.O. (Figure 2, P) in presence of exogenously added SMase.

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (7).”

      - The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensors such as GCaMP3-ML1 might provide better ways to directly visualize this during inhibitor treatment or S. aureus infection.

      We thank the reviewer for this suggestion. We included the following section in our discussion (line 593):

      “Since fluorescent calcium reporters allow this process to be monitored microscopically (8, 9), future experiments may visualize it in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References

      (1) J. Rappaport, C. Garnacho, S. Muro, Clathrin-mediated endocytosis is impaired in type A-B Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm 11, 2887-2895 (2014).

      (2) J. Rappaport, R. L. Manthe, C. Garnacho, S. Muro, Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm 12, 1366-1376 (2015).

      (3) C. Hoffmann et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci 123, 4280-4291 (2010).

      (4) L.-P. Tricou et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Scientific Reports 14, 28643 (2024).

      (5) C. Li et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (6) A. Moldovan, M. J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (7) M. Rühling, F. Schmelz, A. Kempf, K. Paprotka, M. J. Fraunholz, Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio 0, e03654-03624 (2025).

      (8) D. Shen et al., Lipid storage disorders block lysosomal trafficking by inhibiting a TRP channel and lysosomal calcium release. Nat Commun 3, 731 (2012).

      (9) L. C. Davis, A. J. Morgan, A. Galione, NAADP-regulated two-pore channels drive phagocytosis through endo-lysosomal Ca(2+) nanodomains, calcineurin and dynamin. EMBO J 39, e104058 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      General assessment of the work:

      In this manuscript, Mohr and Kelly show that the C1 component of the human VEP is correlated with binary choices in a contrast discrimination task, even when the stimulus is kept constant and confounding variables are considered in the analysis. They interpret this as evidence for the role V1 plays during perceptual decision formation. Choice-related signals in single sensory cells are enlightening because they speak to the spatial (and temporal) scale of the brain computations underlying perceptual decision-making. However, similar signals in aggregate measures of neural activity offer a less direct window and thus less insight into these computations. For example, although I am not a VEP specialist, it seems doubtful that the measurements are exclusively picking up (an unbiased selection of) V1 spikes. Moreover, although this is not widely known, there is in fact a long history to this line of work. In 1972, Campbell and Kulikowski ("The Visual Evoked Potential as a function of contrast of a grating pattern" - Journal of Physiology) already showed a similar effect in a contrast detection task (this finding inspired the original Choice Probability analyses in the monkey physiology studies conducted in the early 1990's). Finally, it is not clear to me that there is an interesting alternative hypothesis that is somehow ruled out by these results. Should we really consider that simple visual signals such as spatial contrast are *not* mediated by V1? This seems to fly in the face of well-established anatomy and function of visual circuits. Or should we be open to the idea that VEP measurements are almost completely divorced from task-relevant neural signals? Why would this be an interesting technique then? 
In sum, while this work reports results in line with several single-cell and VEP studies and perhaps is technically superior in its domain, I find it hard to see how these findings would meaningfully impact our thinking about the neural and computational basis of spatial contrast discrimination.

      We agree that single cell measurements allow for a spatially more detailed analysis, but they are not feasible in humans. Assuming we value insights into the relationship between neural activity and decision making in the human as well as non-human brain, we are restricted to non-invasive measurements such as EEG, which inevitably showcase the neural underpinnings of decision making at a coarser level of analysis. This was the challenge we met with our paradigm design. For example, we chose contrast as the task-relevant stimulus feature in this study because monotonic contrast response functions exist for sensory neurons throughout the visual system, and the aggregated measures that we could attain with EEG would reflect that contrast-sensitivity and hence provide a window onto the encoding of the main decision-relevant quantity. We were specifically interested in initial afferent, contrast-dependent V1 activity reflected in the C1 component (80-90 ms). As we point out in the Introduction, the C1 is unusual among EEG signals in the extent to which it is dominated by a single visual area, V1 (Jeffreys & Axford, 1972; Clark et al., 1994; Di Russo et al., 2002; Ales et al., 2010; Mohr et al., 2024), and even if other downstream areas also make a minor contribution in the C1 time period, it still represents a very low-level sensory response early in the sensory analysis pipeline, appropriate for addressing our primary question of whether such a low-level signal is used in the formation of perceptual decisions. The alternative hypothesis, that early responses are passed over in decision readout, relates to a fundamental debate about whether early sensory responses are separated from cognition. 
The possibility that late, but not early, representations are correlated with choices does not imply that the later sensory representations are divorced from the earlier ones, only that there is a noise component that is not shared between the two, such as that produced by the ensuing computations that generate the later representations. Instead, a lack of choice probability in early representations would imply that decision readout is selective in where it sources sensory evidence from, with some possible reasons being to maintain high quality standards for sensory evidence or to impose a layer of separation between cognition and sensation.

      As the reviewer points out, the animal literature is highly mixed on the topic of choice probability in V1. Even for orientation discrimination tasks where V1 is ostensibly highly suited given the existence of orientation columns in V1, and even when measurements are taken from V1 neurons with good neurometric performance and/or aggregated across a V1 population (Jasper et al 2019), some studies have reported little to no V1 choice probability. If our alternative hypothesis of no EEG-indexed V1 choice probability flies in the face of well-established anatomy and function of visual circuits, then so also do these empirical findings in the animal neurophysiology literature. 

      Although there are important aspects of choice probability that are accessible in single cell studies but not in EEG (e.g. noise correlations, details of circuit physiology), our EEG measurements tap into the same phenomenon, just at a different level of analysis, i.e. the neural population level. At this level, we have been able to address whether the full body of sensory responses at a particular stage of visual analysis is systematically related to perceptual decision outcomes. Very similar questions are in fact sometimes addressed in the animal neurophysiology literature; for example, Kang and Maunsell (2020) aggregated single-cell choice probability measurements within visual areas to investigate whether choice probability strength at the level of an entire visual area was sensitive to task demands. The global vantage point of EEG comes with the additional benefit of picking up signatures of other potentially mediating processes such as attention and being able to control for them in our analysis. Our human study thus provides a valuable complementary viewpoint alongside animal neurophysiology work in this area.
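      Since the response frames the EEG result as the same phenomenon as single-cell choice probability, it may help to recall how that quantity is conventionally computed: as the area under an ROC curve comparing responses to an identical stimulus, grouped by the subject's choice. The sketch below is a generic illustration, not the manuscript's analysis, which (per the reviewer's summary) also accounts for confounding variables; the function name and the simulated amplitudes are hypothetical.

```python
import numpy as np


def choice_probability(resp_choice_a, resp_choice_b):
    """ROC area for discriminating choice A from choice B using single-trial responses.

    Computed as P(r_A > r_B) + 0.5 * P(r_A == r_B) over all trial pairs
    (the Mann-Whitney form of the area under the ROC curve). Responses should
    come from trials with the same stimulus so that only the choice differs.
    """
    a = np.asarray(resp_choice_a, dtype=float)[:, None]
    b = np.asarray(resp_choice_b, dtype=float)[None, :]
    return float(np.mean(a > b) + 0.5 * np.mean(a == b))


# Hypothetical single-trial amplitudes at one fixed contrast, split by choice.
rng = np.random.default_rng(0)
higher = rng.normal(loc=1.1, size=300)  # trials where "higher contrast" was chosen
lower = rng.normal(loc=1.0, size=300)   # trials where "lower contrast" was chosen
print(choice_probability(higher, lower))  # 0.5 would mean no relation to choice
```

      Aggregating such values across a population, as in the area-level analyses mentioned above, asks the same question at a coarser scale.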

      Summary of substantive concerns:

      (1) The study of choice probability in V1 cells is more extensive than portrayed in the paper's introduction. In recent years, choice-related activity in V1 has also been studied by Nienborg & Cumming (2014), Goris et al (2017), Jasper et al (2019), Lange et al (2023), and Boundy-Singer et al (2025). These studies paint a complex picture (a mixture of positive, absent, and negative results), but should be mentioned in the paper's introduction.

      We thank the reviewer for highlighting these papers bearing on choice-related activity in V1, only two of which we had cited. The three additional studies do indeed lend further support to our description of the complex picture around V1-CP effects in the literature and we have now included them.

      (2) The very first study to conduct an analysis of stimulus-conditioned neural activity during a perceptual decision-making task was, in fact, a VEP study: Campbell and Kulikowski (1972). This study never gained the fame it perhaps deserves. But it would be appropriate to weave it into the introduction and motivation of this paper.

      We are aware of this paper, and indeed we ourselves have shown steady-state VEP (SSVEP) correlations with timing and selection of decision reports (O'Connell et al 2012; Grogan et al 2023), but SSVEPs do not provide an index of initial afferent V1 activity in the way that the C1 of the transient VEP does. SSVEPs are evoked by a rapid sequence of stimulus onsets, so activity cannot be attributed to a particular stimulus onset, nor can its bottom-up latency be resolved; and, being a response to an ongoing stimulus, the SSVEP combines top-down and bottom-up influences from striate and extrastriate areas (Di Russo et al 2007). Indeed, in Campbell and Kulikowski (1972) the SSVEP was almost entirely eliminated when the stimulus was undetected. This is in keeping with robust modulations of the SSVEP by spatial attention (Muller and Hillyard 2000). Cognitive influences of this magnitude are never observed in the C1, and in fact are often not observed at all even when later VEP components show robust modulations (Luck et al 2000), which motivated a recent meta-analysis to address the issue (Qin et al 2022). This highlights the important distinction between the earliest transient VEP activity, reflecting mainly the initial afferent response in V1, and steady-state sensory activity, reflecting a mix of bottom-up and top-down influences across visual cortex. Because of the importance of this distinction, we have added references to the above SSVEP papers to the third paragraph of the Introduction, along with a statement about the distinction.

      (3) What are interesting alternative hypotheses to be considered here? I don't understand the (somewhat implicit) suggestion here that contrast representations late in the system can somehow be divorced from early representations. If they were, they would not be correlated with stimulus contrast.

      This same conundrum applies to single-cell studies of choice probability. Do studies showing choice probability in V4 but not V1, for example, demonstrate that V4 is divorced from V1? In such studies, measurements are typically taken from large representative samples of neurons from both areas with good neurometric performance in both cases, and the task often (though not always) involves a target stimulus feature that is encoded in V1, such as orientation. Why then should V4 but not V1 show choice probability when we know the vast majority of input to the visual cortex passes through V1? It must be that feature representation and choice formation are different things, with one not implying the other. This is true for an EEG study as much as it is for a single-cell study.

      The alternative hypothesis in our study is that the early sensory responses indexed by the C1 are not directly used in the formation of the perceptual decision at hand. As outlined in our comments above, this does not imply that those early responses are divorced from later responses. Of course, both are correlated with stimulus contrast and so would correlate with each other across changing contrast but this does not necessitate that their noise is correlated when contrast is held constant because new instantiations of noise can be generated by the computations performed at each stage of visual processing. Thus, the interesting alternative hypothesis is that information contained in the sensory representation generated during initial afferent V1 activity is not used directly to form decisions, and instead, decisions are read out from the outputs of computations performed further downstream. Such an outcome, if it had arisen in our data, would have been consistent with a separation between cognition and early visual processing. Instead, our results suggest a certain level of cognitive interfacing at the lowest and earliest cortical levels of visual processing. We have now added text to the Introduction to highlight the distinction between sensory representation and decision readout in order to make the alternative hypothesis clearer.

      (4) I find the arguments about the timing of the VEP signals somewhat complex and not very compelling, to be honest. It might help if you added a simulation of a process model that illustrated the temporal flow of the neural computations involved in the task. When are sensory signals manifested in V1 activity informing the decision-making process, in your view? And how is your measure of neural activity related to this latent variable? Can you show in a simulation that the combination of this process and linking hypothesis gives rise to inverted U-shaped relationships, as is the case for your data?

      We thank the reviewer for this suggestion of a simulation, which we carried out using the Matlab code. We have also included new Figure 1-Figure Supplement 1 in the revised manuscript.

      In our view, sensory signals in V1 are informing the decision-making process in this task from at least as early as the initial afferent response. The main point about C1 latency in relation to the response-time contingency of the choice probability effect is that the more time that elapses without a decision made (and therefore the more additional sensory processing that contributes to the decision), the more diluted is the contribution of the C1 to the decision by contributions from later representations, and thus choice probability reduces. Likewise, when response times are too quick for C1 evidence to contribute, choice probability is also absent, hence the inverted-U-shaped curve. Moreover, if the C1-choice correlation is mediated by a top-down factor such as attention rather than readout, the inverted-U-shaped curve is not expected because in such a case the relative timing of the C1 and choice commitment would not be relevant.
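      To make this reasoning concrete, the following is a minimal toy sketch in Python. It is not the Matlab simulation reported in the revised manuscript, and its assumptions are hypothetical: decision commitment time is drawn independently of the evidence, time is counted in arbitrary steps (e.g. 10 ms), the C1 sample enters the decision variable only once its latency has elapsed, and each later step adds an independent noise sample generated by downstream stages. The function names and parameter values (c1_latency, decision_noise) are illustrative only.

```python
import random


def auc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic.

    Assumes no tied values, which holds almost surely for continuous samples.
    """
    if not pos or not neg:
        raise ValueError("both groups need at least one value")
    combined = sorted([(v, 1) for v in pos] + [(v, 0) for v in neg])
    rank_sum = sum(rank for rank, (_, is_pos) in enumerate(combined, start=1) if is_pos)
    n1, n0 = len(pos), len(neg)
    u = rank_sum - n1 * (n1 + 1) / 2
    return u / (n1 * n0)


def simulate_cp_by_time(commit_times, n_trials=4000, c1_latency=6,
                        decision_noise=1.0, seed=1):
    """Toy model (hypothetical): C1 choice probability vs. commitment time.

    Times are in arbitrary steps; stimulus contrast is held fixed, so the
    C1 varies only through trial-to-trial noise.
    """
    rng = random.Random(seed)
    cps = {}
    for t in commit_times:
        c1_when_a, c1_when_b = [], []
        for _ in range(n_trials):
            # Trial-to-trial C1 difference at fixed stimulus contrast.
            c1 = rng.gauss(0.0, 1.0)
            # Decision noise unrelated to the C1.
            dv = rng.gauss(0.0, decision_noise)
            if t >= c1_latency:
                # The C1 evidence arrives before commitment and is read out...
                dv += c1
                # ...then each later step adds an independent downstream sample.
                dv += sum(rng.gauss(0.0, 1.0) for _ in range(t - c1_latency))
            # Choose "A" if the decision variable is positive, otherwise "B".
            (c1_when_a if dv > 0 else c1_when_b).append(c1)
        cps[t] = auc(c1_when_a, c1_when_b)
    return cps


if __name__ == "__main__":
    for t, cp in simulate_cp_by_time([2, 4, 6, 8, 11, 16, 26]).items():
        print(f"commit step {t:2d}: CP = {cp:.3f}")
```

      In this toy model, choice probability sits near 0.5 when commitment precedes the C1 latency, peaks once the C1 can contribute, and then declines as later samples dilute its contribution, giving an inverted-U across commitment times (and hence across RTs, which add a motor delay).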

      Reviewer #2 (Public review):

      Summary:

      Mohr and Kelly report a high-density EEG study in healthy human volunteers in which they test whether correlations between neural activity in the primary visual cortex and choice behavior can be measured non-invasively. Participants performed a contrast discrimination task on large arrays of Gabor gratings presented in the upper left and lower right quadrants of the visual field. The results indicate that single-trial amplitudes of C1, the earliest cortical component of the visual evoked potential in humans, predict forced-choice behavior over and above other behavioral and electrophysiological choice-related signals. These results constitute an important advance for our understanding of the nature and flexibility of early visual processing.

      Strengths:

      (1) The findings suggest a previously unsuspected role for aggregate early visual cortex activity in shaping behavioral choices.

      (2) The authors extend well-established methods for assessing covariation between neural signals and behavioral output to non-invasive EEG recordings.

      (3) The effects of initial afferent information in the primary visual cortex on choice behavior are carefully assessed by accounting for a wide range of potential behavioral and electrophysiological confounds.

      (4) Caveats and limitations are transparently addressed and discussed.

      We would like to thank the reviewer for these positive remarks.

      Weaknesses:

      (1) It is not clear whether integration of contrast information across relatively large arrays is a good test case for decision-related information in C1. The authors raise this issue in the Discussion, and I agree that it is all the more striking that they do find C1 choice probability. Nevertheless, I think the choice of task and stimuli should be explained in more detail.

      We thank the reviewer for raising this point about the large stimulus arrays. As we said in our Discussion, it would seem that aggregation across a large stimulus region would be better suited to a downstream visual area with larger receptive fields, yet our setting of a strict deadline would put the emphasis back on earlier sensory representations. We now elaborate on this matter in the discussion, to say that although the small receptive fields and short, slow horizontal connections in V1 mean that the aggregation necessary for performing the task is unlikely to happen within V1 during the C1 timeframe, the aggregation would be readily achieved simply by convergence of the outputs of all relevant V1 neurons for a given stimulus array on the same decision process. In this sense, the design of our paradigm was such that the globally-measured C1 component on the scalp reflected the same aggregated evidence input as the summed V1 readout that we suppose would be entering the decision process.  

      We have also added further rationale in the Methods section on a practical benefit of the stimulus design that the reviewer anticipates in their subsequent point: it yields robust C1 signals. This concern was paramount in the design of this study because we expected the C1 difference metric of interest to be very small. We also needed a robust C1 to be measured in both the upper and lower visual fields in as many individuals as possible, and, in our experience, this is achieved less often with smaller stimuli, even with a pre-mapping procedure.

      It also helped to homogenize C1 topography across individuals and ensure that topographies from the upper and lower visual field had sufficient overlap that there were electrodes with strong loading from both topographies where the C1 difference as a function of which array was brighter would be maximal.

      We have updated the methods section to provide these rationales while we describe the stimulus design.

      (2) In a similar vein, while C1 has canonical topographical properties at the grand-average level, these may differ substantially depending on individual anatomy (which the authors did not assess). This means that task-relevant information will be represented to different degrees in individuals' single-trial data. My guess is that this confound was mitigated precisely by choosing relatively extended stimulus arrays. But given the authors' impressive track record on C1 mapping and modeling, I was surprised that the underlying rationale is only roughly outlined. For example, given the topographies shown and the electrode selection procedure employed, I assume that the differences between upper and lower targets are mainly driven by stimulus arms on the main diagonal. Did the authors run pilot experiments with more restricted stimulus arrays? I do not mean to imply that such additional information needs to be detailed in the main article, but it would be worth mentioning.

      We thank the reviewer for their thoughtful consideration of this issue about individual variability in C1 retinotopy. Indeed, as the reviewer anticipated, we expected the large stimulus coverage to mitigate this issue, and we think that our response to the point above, and the changes we made to the manuscript in response, address this point also. Although we did not show this in the manuscript, we did in fact find that C1 topography was much more similar across individuals than it has been in previous C1 experiments we have carried out with smaller stimuli.

      However, we acknowledge the reviewer’s point that the signal measured at a specific electrode likely has a variable loading strength from the various gratings in the stimulus array and that the gratings of maximal loading may indeed vary from subject to subject. Such inter-subject variability cannot confound the choice probability effects because the latter are measured within-subject. Nevertheless, it could be a source of noise. We believe the impact of this is unlikely to be substantial for the following reasons:

      i) We designed the spatial spread of contrasts in such a way as to encourage participants to aggregate across the full array. In essence, to match the property of the C1 as an aggregate measure of V1 activity, we designed a task that involved aggregating across stimulus elements. Therefore, the decision weighting applied to any particular grating should be representative of the weighting applied to all gratings and, as such, the specific gratings that contribute most to the C1 signal for a particular participant should be relatively inconsequential.

      ii) By avoiding the horizontal and vertical meridians we avoided the regions of space where the shifts in C1 topography are largest.

      (3) Also, the stimulus arrangement disregards known differences in conduction velocity between the upper and lower visual fields. While no such differences are evident from the maximal-electrode averages shown in Figure 1B, it is difficult to assess this issue without single-stimulus VEPs and/or a dedicated latency analysis. The authors touch upon this issue when discussing potential pre-C1 signals emanating from the magnocellular pathway.

      Indeed, there are important differences in V1 properties between the upper and lower visual fields, visual acuity being another example in addition to conduction velocity, as the reviewer points out. However, these differences appeared to be quite minimal in this case (Figure 1B does in fact include a single-stimulus VEP: the “1-stim” entry in the legend). Perhaps this is also due to the large stimulus array, which may include a range of conduction velocities within it and thereby blur overall differences between the upper and lower visual fields. The variability of contrast within each array was also quite high (+/-20% from the midpoint), which would have further increased within-array conduction velocity variability and blurred differences between arrays.

      Our staircasing procedure may also have helped in this regard to some extent, as it included a bias parameter between the arrays to account for any behavioural response biases. Although the small contrast changes it usually incurred are likely much too small to change conduction velocities, it would have corrected for any effect that conduction velocity differences had on behaviour.

      (4) I suspect that most of these issues are at least partly related to a lack of clarity regarding levels of description: the authors often refer to 'information' contained in C1 or, apparently interchangeably, to 'visual representations' before, during, or following C1. However, if I understand correctly, the signal predicting (or predicted by) behavioral choice is much cruder than what an RSA-primed readership may expect, and also cruder than the other choice-predictive signals entered as control variables: namely, a univariate difference score on single-trial data integrated over a 10 ms window determined on the basis of grand-averaged data. I think it is worth clarifying and emphasizing the nature of this signal as the difference of aggregate contrast responses that *can* only be read out at higher levels of the visual system due to the limited extent of horizontal connectivity in V1. I do not think that this diminishes the importance of the findings - if anything, it makes them more remarkable.

      It is true that a univariate measure may stand out in a field increasingly favouring multivariate analyses with the spread of machine learning, and so we have added a short qualifier in the Methods section where we describe the C1 measurement to state explicitly that it is a scalar variable. In using this univariate measure, we have leveraged the rich prior knowledge about V1 anatomy and neurophysiology rather than relying on data-driven classifiers; interestingly, we found that such a classifier trained on all electrodes discriminates choices less well than our informed univariate measure during the C1 time frame.

      We also thank the reviewer for raising an interesting point about the nature of aggregation and readout in the context of our stimulus. We agree that it is not feasible that V1 activity would be aggregated locally in V1 across such large regions of space prior to being read out within the C1 time period. As we say above, the aggregation may instead be carried out through convergent transmission of the parallel, spatially local V1 information to the decision process.

      (5) Arguably even more remarkable is the finding that C1 amplitudes themselves appear to be influenced by choice history. The authors address this issue in the Discussion; however, I'm afraid I could not follow their argument regarding preparatory (and differential?) weighting of read-outs across the visual hierarchy. I believe this point is worth developing further, as it bears on the issue of whether C1 modulations are present and ecologically relevant when looking (before and) beyond stimulus-locked averages.

      We thank the reviewer for their positive appraisal of this additional finding, which we also found remarkable. We agree that our description of our interpretation was too brief and lacked clarity. We have reworded it and expressed it in terms of the speed-accuracy trade-off; the new explanation is given below. However, it is important to remember that this account is speculative and serves only to explain the response-time contingency of the bias. The presence of the bias, and the fact that it constitutes a modulation of the C1, do not rest on this argument:

      […] “to explain the RT contingency for the C1 bias, we speculate that the speed-accuracy trade-off could fluctuate from trial to trial and that the corresponding decision bound fluctuations (Heitz and Schall 2012) could be implemented by pre-determining decision weights across visual areas. For example, to achieve faster decisions, the sensory evidence requirement could be reduced by placing greater emphasis on initial afferent V1 evidence. In such a case, the RT contingency of the above choice history bias could be explained if the C1 bias is exerted in proportion with the planned emphasis of C1 evidence for the upcoming decision.”

      Recommendations to the Authors:

      Reviewer #2 (Recommendations for the authors):

      (1) As someone whose first language is not English, I am somewhat hesitant to bring this up, but I found the use of 'readout' as both noun and verb somewhat confusing. I thought read-out was defined as 'that which is read out'.

      We agree that this dual use of the word readout may cause confusion. To avoid this, we have edited the manuscript to replace verbal forms of the word “readout” with “read out”.

      (2) I found it difficult to follow the reasoning for why intermediate RTs should be the ones most affected by C1-related information. Perhaps this could be described in more detail for the uninitiated reader.

      We appreciate that our reasoning for why intermediate RTs should be the ones most affected by C1-related information was difficult to follow. We have now added a simulation to showcase this rationale more clearly (see our response to Reviewer 1 and the new figure supplement to Figure 1).

      (3) It would be interesting to compare the effect sizes observed here to those seen in single-cell studies and to discuss this comparison with regard to differences in the nature of EEG signals and single-cell firing rates.

      While we agree that such a comparison would be interesting if feasible, it would have to be made under the same task settings, which have not been used in a single-cell study. Moreover, the very different nature and extent of noise in the two recording modalities (e.g. background noise in EEG from ongoing processes unrelated to the task) would make such a comparison difficult to interpret.

      (4) Figure 1: It may be worth mentioning in the legend that only parts of the peripheral stimulus grid are shown for better visibility, as the Methods speak of 9 x 9 grids. Also, in panel B, it should be mentioned that waveshapes are calculated using individually selected maximal-difference electrodes.

      We thank the reviewer for spotting these. We have updated the caption for this figure to reflect these two observations.

      (5) Figure 4: The different shades of green may be difficult to distinguish when printed.

      Although this may be true, we chose shades of green that differ in luminance so they should still be distinguishable. Different colours may in fact be less distinguishable if they had the same luminance and the print was black-and-white. We chose different shades of the same colour to reflect the fact that we were plotting the same signals at different difficulty levels. In our opinion, this takes precedence since eLife is an online journal so the majority of readers will likely read it digitally.

      (6) Methods/Task: While the ITI of 780 ms is substantial, I was wondering why the authors decided against jittering this interval? It would be helpful to briefly discuss whether contrast adaptation for slow periodic stimulation may have affected the findings.

      We opted against jittering the ITI to avoid an additional source of inter-trial variability. While a fixed ITI may allow for adaptation effects, these would be approximately constant across trials and therefore less of a concern for our design. We have added text to the Methods section to state this rationale.

      (7) Methods/Stimuli: The authors convincingly argue that focusing on single arms of the stimuli is an unlikely strategy, but did they ask for participants' strategies during debriefing?

      We are glad that the reviewer found our argument about whether participants may have focused on a single arm of the stimuli convincing. We did not ask participants about their strategies, but even with such a debriefing, there would remain the possibility that a participant used that strategy without being aware of doing so. In any case, if participants were doing this, it would have dampened the strength of our choice probability result.

      (8) Methods/Procedure, Difficulty Titration: Why did the authors opt for manually adapting the difficulty level in a separate session rather than constantly and automatically titrating difficulty?

      We did this because calculating choice probability requires a comparison of trials with different choice outcomes but the same stimulus, so continuously staircasing the difficulty level during the experiment would have created a confound. Although this could have been corrected for in our regression, doing so would have introduced additional noise, which we avoided by staircasing in advance.
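      For readers less familiar with the measure, the conventional single-cell definition of choice probability is the area under the ROC curve comparing responses on trials with the same stimulus but different choices. The minimal Python sketch below assumes that standard definition and computes it within each stimulus condition before pooling, which is why trials must share a stimulus. The function names are hypothetical, and this illustrates the definition rather than reproducing the analysis pipeline in the manuscript.

```python
from collections import defaultdict


def roc_area(pref, nonpref):
    """P(random choice-1 response > random choice-0 response); ties count half."""
    if not pref or not nonpref:
        raise ValueError("both choice groups need at least one trial")
    score = 0.0
    for x in pref:
        for y in nonpref:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(pref) * len(nonpref))


def pooled_choice_probability(trials):
    """trials: iterable of (stimulus_id, choice, response), with choice 0 or 1.

    CP is computed only within each stimulus condition, so stimulus-driven
    response differences cannot masquerade as choice-related ones.
    Conditions lacking either choice are skipped; the rest are averaged,
    weighted by their number of trial pairs.
    """
    # stimulus -> (choice-0 responses, choice-1 responses)
    by_stim = defaultdict(lambda: ([], []))
    for stim, choice, response in trials:
        by_stim[stim][choice].append(response)
    weighted_sum, total_pairs = 0.0, 0
    for resp0, resp1 in by_stim.values():
        if resp0 and resp1:
            pairs = len(resp0) * len(resp1)
            weighted_sum += pairs * roc_area(resp1, resp0)
            total_pairs += pairs
    if total_pairs == 0:
        raise ValueError("no stimulus condition contains both choices")
    return weighted_sum / total_pairs


if __name__ == "__main__":
    example = [("low", 1, 3.0), ("low", 1, 4.0), ("low", 1, 5.0),
               ("low", 0, 1.0), ("low", 0, 2.0), ("low", 0, 3.0),
               ("high", 1, 10.0), ("high", 0, 10.0)]
    print(f"{pooled_choice_probability(example):.2f}")  # prints 0.90
```

      A continuously adapting staircase would spread trials across many stimulus levels, leaving few same-stimulus pairs per condition and making this within-condition comparison noisier.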

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The manuscript by Ma et al. provides robust and novel evidence that the noctuid moth Spodoptera frugiperda (Fall Armyworm) possesses a complex compass mechanism for seasonal migration that integrates visual horizon cues with Earth's magnetic field (likely its horizontal component). This is an important and timely study: apart from the Bogong moth, no other nocturnal Lepidoptera has yet been shown to rely on such a dual-compass system. The research therefore expands our understanding of magnetic orientation in insects with both theoretical (evolution and sensory biology) and applied (agricultural pest management, a new model of magnetoreception) significance.

      The study uses state-of-the-art methods and presents convincing behavioural evidence for a multimodal compass. It also establishes the Fall Armyworm as a tractable new insect model for exploring the sensory mechanisms of magnetoreception, given the experimental challenges of working with migratory birds. Overall, the experiments are well-designed, the analyses are appropriate, and the conclusions are generally well supported by the data.

      Strengths

      (1) Novelty and significance: First strong demonstration of a magnetic-visual compass in a globally relevant migratory moth species, extending previous findings from the Bogong moth and opening new research avenues in comparative magnetoreception.

      (2) Methodological robustness: Use of validated and sophisticated behavioural paradigms and magnetic manipulations consistent with best practices in the field. The use of 5-minute bins to study the dynamic nature of the magnetic compass, which is anchored to a visual cue but updated with a latency of several minutes, is an important finding and a new methodological aspect in insect orientation studies.

      (3) Clarity of experimental logic: The cue-conflict and visual cue manipulations are conceptually sound and capable of addressing clear mechanistic questions.

      (4) Ecological and applied relevance: Results have implications for understanding migration in an invasive agricultural pest with an expanding global range.

      (5) Potential model system: Provides a new, experimentally accessible species for dissecting the sensory and neural bases of magnetic orientation.

      Weaknesses

      While the study is strong overall, several recommendations should be addressed to improve clarity, contextualisation, and reproducibility:

      We thank Reviewer #1 for the positive and encouraging evaluation of our study. We appreciate the recognition of our work’s strengths and are grateful for the constructive feedback on the remaining weaknesses, which will guide and strengthen our revisions.

      Structure and presentation of results

      Requires reordering the visual-cue experiments to move from simpler (no cues) to more complex (cue-conflict) conditions, improving narrative logic and accessibility for non-specialists.

      Thank you for this thoughtful suggestion. While we appreciate the rationale for presenting results from simpler to more complex conditions, we kept the original sequence because it aligns with the logic of our study. Our initial aim was to determine whether fall armyworms use a magnetic compass integrated with visual cues, as shown in the Bogong moth. After establishing this phenotype, we then examined whether visual cues are required for maintaining magnetic orientation. We have also clarified in the Introduction that magnetic orientation in the Bogong moth relies on integration with visual cues, which provides readers with clearer context and improves the overall narrative flow.

      Ecological interpretation

      (a) The authors should discuss how their highly simplified, static cue setup translates to natural migratory conditions where landmarks are dynamic, transient or absent.

      Thank you for raising this important point. We agree that natural migratory environments provide visual information that is often dynamic, transient, or intermittently absent, in contrast to the simplified and static cue used in our indoor experiments. Our intention in using a minimal, static cue was to isolate and test the fundamental presence of magnetic–visual integration in fall armyworms under fully controlled conditions. To address the reviewer’s concern, we have added a brief note in the Discussion indicating that fall armyworms may encounter both static and dynamic luminance-based visual cues in nature, such as light–dark gradients created by terrain features or more stable celestial patterns. Although these natural cues differ from our simplified laboratory stimulus, they may similarly provide asymmetric visual structure that can be integrated with magnetic information. We also note that determining which natural visual cues support the magnetic–visual compass will be an important direction for future work.

      (b) Further consideration is required regarding how the compass might function when landmarks shift position, are obscured, or are replaced by celestial cues. Also, more consolidated (one section) and concrete suggestions for future experiments are needed, with transient, multiple, or more naturalistic visual cues to address this.

      Thank you for this constructive suggestion. We appreciate the reviewer’s point that additional consideration of how the compass might function under shifting, obscured, or celestial visual cues would strengthen the manuscript. Given the limited evidence currently available for this species, we have incorporated a concise and appropriately cautious discussion addressing these possibilities.

      Methodological details and reproducibility

      (a) It would be better to move critical information (e.g., electromagnetic noise measurements) from the supplementary material into the main Methods.

      Thank you for this helpful suggestion. In the revised manuscript, we have added the key electromagnetic noise measurements information to the main Methods section.

      (b) Specifying luminance levels and spectral composition at the moth's eye is required for all visual treatments.

      Thank you for this helpful comment. We have clarified in the Methods as well as the legend of Fig. S3 that both luminance levels and spectral composition were measured at the position corresponding to the moth’s head.

      (c) Details are needed on the sex ratio/reproductive status of tested moths, and a map of the experimental site and migratory routes (spring vs. fall) should be included.

      Thanks. We have added the reproductive status of the tested moths in the Methods, specifying that all individuals used were unmated 2-day-old adults.

      (d) Expanding on activity-level analyses is required, replacing "fatigue" with "reduced flight activity," and clarifying if such analyses were performed.

      Thank you for this comment. In this context, the term “fatigue” referred to the possibility that moths might gradually lose motivation or attention to orient when flying for an extended period in a simplified, artificial environment with limited sensory cues. Such a decrease in orientation motivation over time could, in theory, lead to a loss of individual orientation and consequently to the observed loss of group orientation. To test this possibility, we analyzed the orientation performance of each individual moth across the different phases using the Rayleigh test. The r-value was used as a measure of individual directedness (higher r-values indicate stronger orientation). Mean r-values did not differ significantly among the experimental phases (multiple comparisons, Table S2), indicating that the 25-min measurement period itself was not responsible for the loss of orientation. We did not perform a quantitative activity-level analysis in this study. However, as described in the Methods, flight activity was continuously monitored during the experiments by observing fluctuations in the pointer values in the experimental software, which corresponded to the moth’s rotational movements. If the pointer values remained unchanged for more than 10 seconds, the experimenter listened for wing vibrations; if the moth had stopped flying, the arena wall was gently tapped to stimulate renewed flight. Only individuals that maintained active flight throughout the experiment, with fewer than four instances of wingbeat cessation, were included in the analysis. In the revised manuscript, we also state that an activity-level analysis was not performed due to technical difficulties.
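      The individual directedness measure referred to above, the Rayleigh r-value, is the mean resultant length of an individual's headings. As a minimal Python sketch, assuming headings recorded in degrees within a single phase, the statistic and an approximate p-value can be computed as follows. The function name is hypothetical, and the p-value uses a standard approximation (Zar's formula, as implemented in the CircStat toolbox), which may differ slightly from the software used in our analysis.

```python
import math


def rayleigh_test(headings_deg):
    """Rayleigh test for unimodal orientation of one individual's headings.

    Returns (r, z, p, mean_direction_deg), where r is the mean resultant
    length (0 = no directedness, 1 = all headings identical).
    """
    n = len(headings_deg)
    if n == 0:
        raise ValueError("need at least one heading")
    c = sum(math.cos(math.radians(h)) for h in headings_deg)
    s = sum(math.sin(math.radians(h)) for h in headings_deg)
    big_r = math.hypot(c, s)   # length of the summed unit vectors
    r = big_r / n              # mean resultant length
    z = big_r ** 2 / n         # Rayleigh's Z = n * r**2
    # Approximate p-value (Zar's formula, as in the CircStat toolbox).
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - big_r ** 2)) - (1 + 2 * n))
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    return r, z, min(p, 1.0), mean_dir
```

      Comparing directedness across phases (as in Table S2) then amounts to computing r per moth per phase and testing for differences in r between phases.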

      Figures and data presentation

      (a) The font sizes on circular plots should be increased; compass labels (magnetic North), sample sizes, and p-values should be included.

      Thank you for this helpful suggestion. Regarding the compass labels and statistical reporting, our analysis provides significance levels as ranges rather than exact p-values; therefore, we clarified in the figure legends that the two dashed circles correspond to thresholds for statistical significance p = 0.05 and p = 0.01, respectively. Sample sizes are already indicated within each panel. To avoid visual clutter caused by displaying both magnetic North and South, we show only the magnetic South direction (mS) consistently across panels, which can improve readability.

      (b) More clarity is required on what "no visual cue" conditions entail, and schematics or photos should be provided.

      Thank you for this comment. In our study, the “no visual cue” condition refers to the absence of the black triangular landmark inside the flight simulator. To improve clarity, we have updated the legend of Fig. 4 to explicitly state this and have referred readers to the schematic in Fig. 1, which illustrates the structure of the flight simulator. These additions clarify what the “no visual cue” condition entails without requiring additional schematics.

      (c) The figure legends should be adjusted for readability and consistency (e.g., replace "magnetic South" with magnetic North, and for box plots better to use asterisks for significance, report confidence intervals).

      Thank you. Regarding the choice of compass labeling, we intentionally used magnetic South (mS) rather than magnetic North (mN) because the main population tested in our experiments represents the autumn migratory generation. During autumn, fall armyworms orient southward when visual and magnetic cues are aligned. Using magnetic South in the plots therefore provides a clearer representation of cue alignment in this season and avoids potential confusion when interpreting the combined visual–magnetic information.

      Conceptual framing and discussion

      (a) Generalisations across species should be toned down, given the small number of systems tested by overlapping author groups.

      Thank you for this valuable comment. In the revised manuscript, we have softened such statements in both the Abstract and the main text.

      (b) It requires highlighting that, unlike some vertebrates, moths require both magnetic and visual cues for orientation.

      Thank you for this helpful suggestion. We have added a sentence to the Discussion explicitly highlighting that, unlike some vertebrates capable of using magnetic information in the absence of visual cues, moths require the integration of both magnetic and visual cues for accurate orientation. This clarification emphasizes the distinct multimodal nature of compass use in migratory moths.

      (c) It should be emphasised that this study addresses direction finding rather than full navigation.

      Thank you for this important clarification. We have now made it explicit in the manuscript that our experiments address direction finding (i.e., orientation) rather than full navigation. This distinction is stated in both the Introduction and Discussion to clearly define the scope of the study.

      (d) Future Directions should be integrated and consolidated into one coherent subsection proposing realistic next steps (e.g., more complex visual environments, temporal adaptation to cue-field relationships).

      Thank you for this constructive suggestion. We agree that outlining realistic next steps is valuable. However, given the limited scope of the current data, we have only slightly expanded the existing forward-looking statements in the Discussion.

      (e) The limitations should be better discussed, due to the artificiality of the visual cue earlier in the Discussion.

      Thank you for this comment. We agree that the artificiality of the visual cue is an important limitation of the present study. Rather than extending speculative discussion, we have clarified this limitation in the revised Discussion and highlighted the key questions that future work must address.

      Technical and open-science points

      Appropriate circular statistics should be used instead of t-tests for angular data shown in the supplementary material.

      Thank you for this comment. We have addressed this point (Fig. S1) in the revised supplementary material.

      Details should be provided on light intensities, power supplies, and improvements to the apparatus.

      Thank you. Light intensities are reported as spectral irradiance measurements in the Supplementary Materials; these provide full wavelength-resolved information on the illumination used, although total illuminance (lux) was not measured separately. We have also added the requested information on the power supplies.

      The derivation of individual r-values should be clarified.

      Thanks. We have clarified this in the revised manuscript.

      Share R code openly (e.g., GitHub).

      Thanks. We are in the process of organizing the relevant R code, but have not been able to upload it to GitHub before the current revision deadline. The code is available from the corresponding author upon request.

      Some highly relevant recent citations are missing and should be added, and some less relevant ones removed.

      Thanks. We added one recent relevant reference to the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This work provided experimental evidence on how geomagnetic and visual cues are integrated, showing that visual cues are indispensable for magnetic orientation in the nocturnal fall armyworm.

      Strengths:

      Although it has been demonstrated previously that the Australian Bogong moth could integrate global stellar cues with the geomagnetic field for long-distance navigation, the study presented in this manuscript is still fundamentally important to the field of magnetoreception and sensory biology. It clearly shows that the integration of geomagnetic and visual cues may represent a conserved navigational mechanism broadly employed across migratory insects. I find the research very important, and the results are presented very well.

      We thank Reviewer #2 for the positive and encouraging evaluation of our study. We appreciate the recognition of our work’s strengths.

      Weaknesses:

      The authors developed an indoor experimental system to study the influence of magnetic fields and visual cues on insect orientation, which is certainly a valuable approach for this field. However, the ecological relevance of the visual cue may be limited or unclear in the current version. The visual cues were provided "by a black isosceles triangle (10 cm high, 10 cm base) made from black wallpaper and fixed to the horizon at the bottom of the arena". It is difficult to conceive how such a stimulus (intended to represent a landmark like a mountain) could provide directional information for LONG-DISTANCE navigation in nocturnal fall armyworms, particularly given that these insects would have no prior memory of this specific landmark. A more detailed explanation of this point would be helpful.

      We appreciate the constructive feedback on these weaknesses, which will guide and strengthen our revisions. To address the reviewer’s concern, we have added a brief note in the Discussion indicating that fall armyworms may encounter both static and dynamic luminance-based visual cues in nature, such as light–dark gradients created by terrain features or more stable celestial patterns. Although such natural cues differ from our simplified laboratory stimulus, they may act as intermittently sampled visual inputs that can be optimally integrated with magnetic information, whether static or changing. Brief periods without such cues may still allow a stable long-distance orientation to be recovered afterwards.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major to Medium Suggestions

      (a) Reordering of Visual Cue Tests

      The manuscript currently presents cue-conflict experiments before the simpler "no visual cue" tests. For non-specialist readers, it would be more logical to start with the basic condition (no visual cues) and then move to progressively more complex ones. This provides a clearer and more logically sound narrative.

      For example, the results could first demonstrate that without visual cues, the moths fail to orient (both in darkness and uniform light), and then show that introducing a single salient cue (a triangle on the horizon) restores directed behaviour. This would help readers understand the logic of the progression and should be better integrated throughout the Results and Discussion.

      Thanks. We have responded to this comment in the Public Reviews response.

      (b) Translating Key Findings to Realistic Scenarios (LL 333-344 or wherever suitable in the Discussion; also mention early in the Introduction that you used a reductionist approach, clearly stating that it is highly simplified)

      The main text (e.g. the Discussion) should address how these findings translate to real-world conditions. The experimental design used a single, highly salient, static cue that was always aligned with the migratory direction. In nature, such a consistent landmark is unlikely; mountains or other features would shift position relative to the moth's trajectory as it flies.

      Key questions arise which need to be addressed:

      - How would the compass system adapt to changing landmark positions as the moth moves?

      - What happens when no landmarks are visible (e.g. over flat plains or cloudy nights)?

      - Would stellar or other cues take over in such cases? Your hypotheses, please.

      Addressing these points, and proposing specific future experiments (e.g. with transient or multiple visual cues), would strengthen the ecological relevance of the findings and show a clear way forward.

      Thanks for your kind comments. We now explicitly state in the Introduction that our study employs a reductionist approach using a simplified visual environment to isolate magnetic-visual interactions. As the ecological questions raised by the reviewer cannot be addressed with the current dataset, we avoid extended speculation but have added brief clarification in the Discussion and addressed these points in the Public Reviews response. We also indicate that future work will need to examine the types of visual cues that can support magnetic orientation and how such cues couple with geomagnetic information.

      Technical and Methodological Points

      (a) Incomplete Methods Section

      Critical technical information (e.g. electromagnetic noise measurements) currently appears only in supplementary figure legends. All such details should be included in the main Methods section if the word count allows (or include a short section in the main text with reference to more details in the supplementary material).

      Thanks for your kind comments. We have addressed this as suggested in the Public Reviews.

      (b) Lighting Conditions

      Specify luminance levels (the amount of light emitted and passing through in quanta per unit of surface, e.g. per m²) at the moth's eye and indicate whether spectral composition was consistent between treatments (with and without the visual cue).

      Thanks for your comments. We have responded to this point in the Public Reviews.

      (c) Figures

      - Increase font sizes on circular histograms.

      - Add compass labels (ideally magnetic North, mN, rather than South, as is usual in the pertinent literature), sample sizes, and p-values on each panel.

      - Replace "magnetic South" (mS) indicators with magnetic North (mN) to align with convention.

      Thanks for your comments. We have responded to this point in the Public Reviews.

      (d) Migratory Expectations

      Include expected compass bearings for spring and autumn migrations (with citations) to relevant figures (Figure 2, 4, S2).

      Thanks for your comments. We have added the following information to the Introduction: “We recently found that fall armyworms from the year-round range in Southwest China (Yunnan) exhibit seasonally appropriate migratory headings when flown outdoors in virtual flight simulators, heading northward in the spring and southward in the fall, and this seasonal reversal is controlled by photoperiod (Chen et al., 2023).” We therefore did not add expected seasonal compass bearings to the Results section.

      (e) Add a map showing the experimental site and known migratory routes, clearly labelling spring vs fall routes. It would help justify expected headings.

      Thank you for this suggestion. At present, there are no experimentally validated migratory routes (e.g., from mark-release-recapture or tracking approaches) for the specific fall armyworm population used in our study. Because these routes have not been biologically confirmed, we did not include a presumed migratory map, which might imply unwarranted certainty.

      (f) Composition of Test Groups

      Indicate sex ratios and reproductive status (mated/unmated) of tested moths, if known or comment if unknown, as both can affect migratory motivation and behaviour.

      Thank you for this suggestion. We have responded to this point in the Public Reviews.

      (g) Role and Nature of Visual Cues

      While the results clearly show that orientation disappears without visual cues, the triangle cue is highly artificial. Well-studied Bogong moths are known to rely on views of Australian mountain ranges during their nocturnal migrations, but there is no evidence that armyworms use a similar strategy. Even for bogongs, it is not just one salient mountain always in front of them on migration. Discuss whether Fall Armyworm would encounter comparable natural cues in the field along their migratory route, or whether the triangle might simply provide a frame of reference rather than a true landmark.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      (h) Future work could test:

      - More naturalistic sky cues (moonlight, star fields).

      - Varying the landmark's position relative to the magnetic field, e.g. with slowly moving or transient landmarks. Also, less salient landmarks and a more complex skyline, as natural skylines are usually more complex than a single salient peak.

      Thank you for this comment. We have responded to this point in the Public Reviews. A brief discussion, as suggested, has been added to the revised manuscript.

      Minor Comments and Line-by-Line Suggestions

      L70 - Check citation (possibly Mouritsen 2018). Missing in the list of references.

      Thanks. This point has been addressed.

      L75 - Consider citing the new and highly relevant preprint:

      Pakhomov, A., Shapoval, A., Shapoval, N., & Kishkinev, D. (2025). Not All Butterflies Are Monarchs: Compass Systems in the Red Admiral (Vanessa atalanta). bioRxiv.

      Thanks. We have cited this reference.

      LL81-82 - Clarify vague phrasing; specify criteria for "good" vs "poor" orientation ability. Or reword/leave out.

      Thanks for your comments.

      L85 - "but one," not "bar one." 

      Thanks. Corrected.

      L124 - The two genetic citations are only weakly linked to magnetoreception. We do not have a clear understanding of the insect magnetoreceptor and its underlying mechanism, so we cannot interpret genetic associations well enough to link them to magnetoreception. For example, does the noctuid magnetic sense require a magnetite-based receptor and genes involved in biomineralization? Consider removing or softening these claims.

      Thanks. Addressed.

      LL123-126 - Define what for YOU constitutes "strong evidence" for magnetoreception (e.g. adaptive directional behaviour consistent with migratory orientation?). Is there such a thing as strong evidence at all?

      Thanks for your comments. We agree that terms such as “confirmed” or “strong evidence” can overstate the certainty of magnetoreception findings, given the ongoing debates in the field. In the revised manuscript, we have toned down these statements.

      L153 - Indicate whether coils in NMF condition were powered or inactive.

      Thanks for your comments. Addressed.

      L163 - Justify use of multiple 5-min phases (e.g. temporal resolution of behaviour). It is confusing at the start, where first mentioned, and becomes clearer only towards the end, but it should be clearer at the start.

      Thanks for your comments. The assay was divided into these 5-min segments to provide the temporal resolution needed to detect changes in flight orientation as the relative alignment of magnetic and visual cues was systematically altered. We now clarify this earlier in the Results.

      LL167-171 - This is a good place where you can provide a map (main or supplementary with referencing) showing the study site and migration routes.

      Thanks for your suggestion. We have responded to this point in the Public Reviews.

      L174 - Avoid repetition of "expected."

      Thanks. Addressed.

      LL176-177 - Report 95% confidence intervals or equivalent and clarify which test (e.g. Moore's paired test) each p-value refers to.

      Thanks for your suggestion.

      LL189-191 - explain what fatigue means. I would remove fatigue and substitute it with "lowered flight activity". Also, the same statement comes later, so avoid repetitiveness and remove it in one place. The analysis of directedness is good throughout, but what about the analysis of activity level? Could you explain whether you did it or not, and if not, why, or if angular changes can serve as an activity proxy? Replace "fatigue" with "reduced flight activity." Avoid repetition. Clarify if activity level analysis was performed or if it was not, e.g. due to technical difficulties.

      Thanks for your comments. We have responded to this point in the Public Reviews.

      L196 - Note whether 95% CI overlaps with the expected direction. This is a crucial outcome.

      Thanks for your comments.

      LL203-205 - unclear, better to stick to "congruency", especially "initial congruency for the relationship between mN and visual cue" throughout.

      Thanks for your suggestions.

      L206 - Better to introduce a new subheading: "Laboratory-Reared Animals".

      Thanks for your suggestion. A new subheading has been added in the revised manuscript.

      LL207-208 - Clarify which cues were available in Chen et al. (2023) and how they differ here.

      Thanks for your comments. In Chen et al. (2023), the moths oriented under an artificial starry sky together with optic flow cues. In contrast, our experiments intentionally removed both the starry-sky pattern and optic flow to avoid introducing additional visual information when testing magnetic-visual integration for orientation. We have added further clarification regarding the conditions used in Chen et al. (2023) in the revised manuscript.

      L228 - Use "lab-reared" consistently throughout the entire MS. Do not mix with lab-raised.

      Thanks. Addressed by consistently using “lab-raised”.

      Figure 2 - Confusing in parts, especially for people coming from birds and other vertebrates orientation background. At 12 o'clock, you usually expect either mN / gN (magnetic or geographic North) or the animal's own initial directional response used as control to compare the same animal's direction post-treatment. Here, your 6 o'clock is magnetic South in the first place - non-conventional. At 12 o'clock, better use mN or gN. Avoid using non-conventional references such as magnetic south. Remind readers of seasonally appropriate headings and refer to the map.

      Thanks. We have responded to this point in the Public Reviews.

      LL232-234 - Emphasize that cue-magnetic congruency is key. Highlight the most important point that the congruency between the seasonal migratory direction and visual cues is key, not that in spring/fall, visual cues must be towards or opposite to the migratory goal. But the visual cue could be in the migratory direction or opposite, or at an angle - this is for future direction.

      Thanks. We have responded to this point in the Public Reviews.

      Figure 2 and associated main text - highlight that you only tested the designs when in all seasons the salient and single visual cue was in the migratory direction (in spring it coincided with mN but in fall it was towards the magnetic south). Other directions of visual cues have not been tested, but for simplicity and consistency, you chose to do these ones as the first step, perhaps.

      Thank you for this insightful comment. Yes, our experiments tested only the conditions in which the salient and single visual cue was aligned with the migratory direction. Other angular relationships between visual cues and the magnetic field were not examined in this study. For simplicity and consistency, we focused on this alignment as a first step toward understanding magnetic-visual cue integration in migratory orientation. We now highlight this in the Fig. 2 legend.

      Figure captions/legends - these are hard to distinguish from the main text at present; it would be better to italicize figure caption text and visually separate it from the main text.

      Thanks for your suggestions.

      LL 250-251 - For readers more familiar with lowercase r, state the expected range of the uppercase R* statistic. Unlike r, it is not bounded between 0 and 1. Could it be negative? How large can it be?

      Thanks for the comment. After revisiting Moore (1980), we conclude that R* cannot take negative values, because R* = R/N^(3/2), where R is the length of the rank-weighted resultant vector. However, R* is not bounded between 0 and 1, and we did not find any discussion of an upper bound in the paper (https://doi.org/10.2307/2335330).
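
      To make the range of R* concrete, the following minimal Python sketch follows Moore's (1980) definition: vectors are ranked by length (rank 1 = shortest), each direction is weighted by its rank i, and R* = R/N^(3/2), where R is the length of the rank-weighted resultant. It assumes, as is common in orientation studies, that each moth contributes one vector whose direction is its mean heading and whose length is its r-value; ties are broken by input order, which is a simplification. Because R can never exceed 1 + 2 + ... + N = N(N+1)/2, R* is never negative and, for a given N, its largest possible value is (N+1)/(2√N), reached when all vectors point the same way; this exceeds 1 whenever N > 1. Critical values for significance are not computed here and would still come from Moore's tables. The function names are hypothetical.

```python
import math
from typing import Sequence


def moore_r_star(angles_deg: Sequence[float], lengths: Sequence[float]) -> float:
    """Moore's (1980) modified Rayleigh statistic R* = R / N^(3/2)."""
    if len(angles_deg) != len(lengths) or len(angles_deg) == 0:
        raise ValueError("angles and lengths must be non-empty and of equal size")
    n = len(angles_deg)
    # Rank vectors by length (1 = shortest).
    # Assumption: ties are broken by input order (no averaged ranks).
    order = sorted(range(n), key=lambda k: lengths[k])
    x = y = 0.0
    for rank, k in enumerate(order, start=1):
        theta = math.radians(angles_deg[k])
        x += rank * math.cos(theta)  # rank-weighted unit vector, x component
        y += rank * math.sin(theta)  # rank-weighted unit vector, y component
    return math.hypot(x, y) / n ** 1.5


def moore_r_star_max(n: int) -> float:
    """Largest possible R* for sample size n (all vectors in one direction)."""
    return (n + 1) / (2 * math.sqrt(n))
```

      For example, with four perfectly aligned vectors R* reaches 1.25, which shows directly why the statistic is not confined to the 0 to 1 range of r.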

      Figure 3 - Consider adding a horizontal line indicating the 5% significance threshold.

      Thanks for your suggestions.

      L 261 - need to have some narrative after the subheading before you insert Figure 3.

      Thanks. Addressed.

      LL274-275 - highlight that the timeline of this congruency between mN and a landmark and the effect of this on directedness is not explored here, but worth doing in future. How long does a new congruency or a relationship between mN and a visual cue need to be exposed to the animal to regain its directional response? Clearly, it is just a question of time of exposure so that a new association is established. Suggest future work on time-dependent adaptation to new cue-field relationships.

      Thanks for your suggestion. We have now included this point as a future direction in the revised Discussion.

      Figure 4 & S4 - Replace letters with asterisks/brackets for significance. The use of the letter is confusing and unconventional.

      Thanks for your suggestion.

      Figure 4 caption - Clarify the main takeaway.

      Thanks for your suggestion.

      Figure 4 - bare minimum is confusing. I understand that you wanted to avoid "no visual cues" because, as long as the animal sees things, there are things to be used as visual cues, even if this is not the intention of the experimenter. However, it needs clarification and rewording. Better to be more specific, like "no black triangle and horizon were used, just the uniformly white cylinder", or something like that.

      Thanks for your comments. In our setup, the term “barely minimum” accurately describes the intentional removal of both the black triangle and the horizon, leaving only the uniformly white cylinder as the visual environment. This wording reflects the practical limitations of producing a perfectly symmetrical flight simulator under laboratory conditions, and we therefore prefer to retain the original phrasing.

      L328 - Remove Xu et al. (2021) citation (not relevant). This is an in vitro study with a protein which may not work exactly as it is claimed in the paper in vivo.

      Thanks. Citation removed.

      L349-350 - Clarify what "no visual cue" means (e.g., uniformly white cylinder, no horizon line). Include a photo or a schematic of the inner surface of the cylinder for this condition in the Supplementary Materials.

      Thanks. We have responded to this point in the Public Reviews.

      L380 & throughout - Replace "barely minimum visual cues" (BMVC) with "no visual cues", clarifying limitations in Methods, meaning that you can explain that absolutely no visual cues is practically impossible because, as long as there is light, animals can use some asymmetries as cues even if this is not the intention of the experimenter.

      Thank you for this comment. We have decided to retain the term “barely minimum visual cues (BMVC)” because it accurately describes our experimental condition, which is distinct from a true “no visual cues” environment. In the revised Figure legend, we now clarify that BMVC refers to conditions in which obvious visual cues (i.e., features such as the black triangle in Fig. 1) were removed, while acknowledging that complete elimination of all visual information is not possible under illuminated conditions.

      L396 - Be cautious when generalizing from two species tested by a research group that is not absolutely independent (some authors in bogong and armyworm works overlap). We saw examples in diurnal migratory butterflies (Monarchs), a more studied species than the armyworm, that the findings do not entirely translate to Red Admirals (Pakhomov et al. 2025 preprint mentioned). Suggestion to tone down any claims of broad generalisation throughout the manuscript.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      LL402-407 - Note that, unlike birds (e.g. European robins), moths appear to require both magnetic and visual cues for orientation, whereas birds, mole rats and some other animals can use magnetic cues alone.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      L410 - Specify that this is correct only in the Northern Hemisphere.

      Thank you for this comment. Addressed.

      LL415-416 - Acknowledge artificiality of single-cue setup (see the major comments above); integrate earlier in the Discussion.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      LL420-425 - Consolidate Future Directions into a single subsection; include more concrete experimental ideas, for example, using more naturalistic, numerous transient landmarks (could be done in a virtual maze with LEDs on the wall of the cylinder with cues moving with time). Multiple visual cues. Manipulating with salience of cues - less simplistic, less salient.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      L431 - Does this paper support this statement? I think it just tested the use of stellar cues in a zero magnetic field. It also dealt with direction finding, not navigation, which is a position-finding ability - a much more complex feat and might not be the ability of moths (requires further studies like with geographic and magnetic displacements, etc). Reword and check this. Show the distinction between direction finding and navigation.

      Thank you for this comment. We have reworded the relevant sentence to use “orientation” instead of “navigation”.

      L436-437 - Specify "global visual cues" (stellar, lunar, etc.) and merge all future directions into one coherent section.

      Thank you for this comment. Addressed.

      LL443-446 - A bit early to plan such studies because migratory direction could well be a complex multigenetic trait, so that you cannot approach it simply with the knock out of a single gene. The genetic basis of magnetic direction needs to be first demonstrated, which leads you to the Future Directions section.

      Thank you for this helpful comment. We fully agree that migratory direction is likely a complex multigenic trait, and our intention was not to imply that knocking out a single gene would be sufficient to explain magnetic or migratory orientation. Our statement aimed only to highlight that identifying candidate genes is an important first step toward understanding the genetic basis of magnetic orientation.

      Line 496 - Clarify whether optic flow was used (unlike previous studies).

      Thank you for pointing this out. Clarified.

      LL499-511 - Clarify the improvements done in Chen's system and their relevance.

      Thank you for pointing this out. We have reworded this sentence as follows: “The Flash flight simulator system was developed based on the early design of the Mouritsen-Frost flight simulator and adapted for our experiments in Yuanjiang”.

      Line 531 - Report and compare light intensities between indoor and outdoor experiments.

      Thanks for this comment. Unfortunately, due to the sensitivity limits of our current equipment, we were unable to reliably measure outdoor light intensities at night. However, we did not perform any open-top outdoor flight-simulator experiments; instead, we used field-captured moths but conducted all behavioral tests indoors.

      L549 - Add make/model of power supplies.

      Thanks. Addressed.

      LL582-585 - Specify whether R code will be shared; recommend open access (e.g., GitHub, other open repositories). Reiterate the importance of open science and sharing all scripts. Also here, add citations to some studies where MMRT has been used recently.

      Thank you for this comment. We have responded to this point in the Public Reviews.

      Line 592 - Explain how individual r-values were derived from optical encoder data.

      Thank you for this comment. Addressed.

      L842-843 - t-tests are inappropriate for angular data; use circular tests (Watson-Williams, Mardia-Watson-Wheeler, etc.).

      Thank you for this comment. Addressed.

      L865 - Reword to avoid repetition of "fall." Example: "In field captured armyworms during fall migration".

      Thank you for this comment. Addressed.

      LL882-885 - Improve phrasing and language here. Confirming that - no colon after. "Both the acrylic plate and diffusion paper." Confirm relevance of spectra to moth visual sensitivity - add relevant citation to original studies showing that.

      Thank you for this comment. Addressed.

      L886 - Reword "uniform" - does not look uniform to me.

      Thank you for this comment. Addressed.

      Reviewer #2 (Recommendations for the authors):

      The first two sentences of the abstract ("The navigational mechanisms employed by nocturnal insect migrants remain to be elucidated in most species. Nocturnal insect migrants are often considered to use the Earth's geomagnetic field for navigation, yet the underlying mechanisms of magnetoreception in insects remain elusive") are somewhat redundant. The authors may consider rewriting them.

      Thank you for pointing this out. We have rewritten this opening to provide a more concise and non-repetitive introduction.

    1. Seals Allers—and her fifteen-year-old son, Michael—are working on their own data-driven contribution to the maternal and infant health conversation: a platform and app called Irth—from birth, but with the b for bias removed (figure 1.8). One of the major contributing factors to poor birth outcomes, as well as maternal and infant mortality, is biased care. Hospitals, clinics, and caregivers routinely disregard Black women’s expressions of pain and wishes for treatment.81 As we saw, Serena Williams’s own story almost ended in this way, despite the fact that she is an international tennis star. To combat this, Irth operates like an intersectional Yelp for birth experiences. Users post ratings and reviews of their prenatal, postpartum, and birth experiences at specific hospitals and in the hands of specific caregivers. Their reviews include important details like their race, religion, sexuality, and gender identity, as well as whether they felt that those identities were respected in the care that they received. The app also has a taxonomy of bias and asks users to tick boxes to indicate whether and how they may have experienced different types of bias. Irth allows parents who are seeking care to search for a review from someone like them—from a racial, ethnic, socioeconomic, and/or gender perspective—to see how they experienced a certain doctor or hospital.

      Amanda Christopher

      "taxonomy of bias" love this term and didn't think about it as biased originally.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This paper formulates an individual-based model to understand the evolution of division of labor in vertebrates. The model considers a population subdivided in groups, each group has a single asexually-reproducing breeder, other group members (subordinates) can perform two types of tasks called "work" or "defense", individuals have different ages, individuals can disperse between groups, each individual has a dominance rank that increases with age, and upon death of the breeder a new breeder is chosen among group members depending on their dominance. "Workers" pay a reproduction cost by having their dominance decreased, and "defenders" pay a survival cost. Every group member receives a survival benefit with increasing group size. There are 6 genetic traits, each controlled by a single locus, that control propensities to help and disperse, and how task choice and dispersal relate to dominance. To study the effect of group augmentation without kin selection, the authors cross-foster individuals to eliminate relatedness. The paper allows for the evolution of the 6 genetic traits under some different parameter values to study the conditions under which division of labor evolves, defined as the occurrence of different subordinates performing "work" and "defense" tasks. The authors envision the model as one of vertebrate division of labor.

      The main conclusion of the paper is that group augmentation is the primary factor causing the evolution of vertebrate division of labor, rather than kin selection. This conclusion is drawn because, for the parameter values considered, when the benefit of group augmentation is set to zero, no division of labor evolves and all subordinates perform "work" tasks but no "defense" tasks.

      Strengths:

      The model incorporates various biologically realistic details, including the possibility to evolve age polyethism where individuals switch from "work" to "defense" tasks as they age or vice versa, as well as the possibility of comparing the action of group augmentation alone with that of kin selection alone.

      Weaknesses:

      The model and its analysis are limited, which in my view makes the results insufficient to reach the main conclusion that group augmentation and not kin selection is the primary cause of the evolution of vertebrate division of labor. There are several reasons.

      (1) First, although the main claim is that group augmentation drives the evolution of division of labor in vertebrates, the model is rather conceptual in that it doesn't use quantitative empirical data that apply to all or most vertebrates, and to vertebrates only. So, I think the approach has a conceptual reach rather than being able to support such a conclusion about a real taxon.

      We appreciate the reviewer’s point that our model does not incorporate quantitative empirical data across vertebrate taxa. This is indeed a limitation and reflects the current lack of fine-scale datasets on task division, the influence of life-history traits, and the fitness consequences of different cooperative activities in vertebrates. One of our aims, however, is precisely to stimulate such empirical work by highlighting the value of examining division of labor in species inhabiting harsh environments, considering age/size/dominance structure when evaluating variation in cooperative activities, and incorporating defense behaviors more consistently into analyses of helping, especially since defenders are often overlooked relative to the classic helpers-at-the-nest that provision offspring. The model therefore remains directly relevant to vertebrate systems because it departs from insect-inspired approaches that focus on fitness outcomes based solely on maximizing colony productivity. Instead, it incorporates direct fitness benefits to group members, an essential feature of vertebrate cooperative breeding and of other systems with fertile “workers,” as we clarified in the Discussion.

      (2) Second, I think that the model strongly restricts the possibility that kin selection is relevant. The two tasks considered essentially differ only by whether they are costly for reproduction or survival. "Work" tasks are those costly for reproduction and "defense" tasks are those costly for survival. The two tasks provide the same benefits for reproduction (eqs. 4, 5) and survival (through group augmentation, eq. 3.1). So, whether one, the other, or both helper types evolve presumably only depends on which task is less costly, not really on which benefits it provides. As the two tasks give the same benefits, there is no possibility that the two tasks act synergistically, where performing one task increases a benefit (e.g., increasing someone's survival) that is going to be compounded by someone else performing the other task (e.g., increasing that someone's reproduction). So, there is very little scope for kin selection to cause the evolution of division of labor in this model. Note that synergy between tasks is not something unusual in division of labor models, but is in fact a basic element in them, so excluding it from the start in the model and then making general claims about division of labor is unwarranted. In their reply, the authors point out that they only consider fertility benefits as this, according to them, is what happens in cooperative breeders with alloparental care; however, alloparental care entails that workers can increase others' survival *without group augmentation*, such as via workers feeding young or defenders reducing predator-caused mortality, as I mentioned in my previous review, but these potentially kin-selected benefits are not allowed here.

      We understand the reviewer’s concern that our model restricts the scope for kin-selected benefits by not including task-specific synergy effects—specifically, help that directly increases the survival of group members (e.g., load-lightening via feeding young, or predator defense that reduces mortality of breeders or offspring independently of group augmentation). We agree that such effects can occur in some cooperative breeders, and that they can, in principle, generate indirect fitness benefits. However, even when helpers increase the survival of breeders or reduce parental investment per offspring, these effects generally translate into higher breeder productivity—either via increased fecundity, increased survival to the next breeding attempt, or increased investment in subsequent broods. Thus, although we treat benefits in terms of enhanced breeder productivity, this formulation implicitly captures a range of help-related effects that ultimately improve the reproductive output of the breeders, including those mediated through increased survival. For this reason, we believe that the model remains relevant for vertebrate systems despite not representing each pathway separately.

      (3) Third, the parameter space is understandably little explored. This is necessarily an issue when trying to make general claims from an individual-based model where only a very narrow parameter region of a necessarily particular model can be feasibly explored. As in this model the two tasks ultimately only differ by their costs, the parameter values specifying their costs should be varied to determine their effects. In the main results, the model sets a very low survival cost for work (y<sub>h</sub> = 0.1) and a very high survival cost for defense (x<sub>h</sub> = 3), the latter of which can be compensated by the benefit of group augmentation (x<sub>n</sub> = 3). Some limited variation of x<sub>h</sub> and x<sub>n</sub> is explored, always for very high values, effectively making defense unevolvable except if there is group augmentation. In this revision, additional runs have been included varying y<sub>h</sub> while keeping x<sub>h</sub> and x<sub>n</sub> constant (Fig. S6), which does not address my comment because x<sub>n</sub> remains very high. Consequently, the main conclusion that "division of labor" needs group augmentation seems essentially enforced by the limited parameter exploration, in addition to the second reason above.

      As we have explained in previous revisions, the costs associated with work and defense are not directly comparable because they affect different fitness components: work costs reduce dominance, whereas defense costs reduce survival. Whether a particular cost is “high” or “low” can only be evaluated by examining the evolved reaction norms and identifying the ranges over which these norms change. For this reason, we focused on parameter ranges that actually generate shifts in reaction norms rather than presenting large regions of parameter space where nothing changes.

      We also reiterate that we did in fact explore broader parameter ranges than those shown in the main text. Additional analyses, including those specifically designed to identify conditions under which division of labor evolves under kin selection alone, are provided in the Supplementary Material. Specifically, Figure S1 addresses the point raised about the “need” for group augmentation benefits for defense to evolve, by increasing the baseline survival x<sub>0</sub>.

      We now include one additional figure in the Supplementary Material with a lower value for the benefit of group size (x<sub>n</sub> = 1 instead of x<sub>n</sub> = 3), and we extended the range of x<sub>h</sub> to include lower values (x<sub>h</sub> = 1). As we can see in Figure S7 and Table S8, group augmentation benefits are still the primary reason for individuals to group (see dispersal values). For low benefits of group augmentation, defense evolves in harsh environments in the absence of kin selection, and in benign environments when both direct and indirect fitness benefits take place. We have also now expanded the results section to include these last results. Note that we also checked even lower values for x<sub>h</sub> under the only kin selection implementation, with results being qualitatively similar, but chose not to include them in the manuscript since it is already a very long Supplementary Material. Here are the averages for two examples with x<sub>h</sub> = 0.1 and when we promote division of labor:

      Author response table 1.

      In short, the conclusion that division of labor requires group augmentation is not an artifact of limited parameter exploration. It arises because kin selection alone favors division of labor only under highly restrictive parameter combinations, whereas including direct fitness benefits substantially expands the conditions under which division of labor evolves. This pattern is consistent across the full set of parameter combinations we examined.

      (4) Fourth, my view is that what is called "division of labor" here is an overinterpretation. When the two helper types evolve, what exists in the model is some individuals that do reproduction-costly tasks (so-called "work") and survival-costly tasks (so-called "defense"). However, there are really no two tasks that are being completed, in the sense that completing both tasks (e.g., work and defense) is not necessary to achieve a goal (e.g., reproduction). In this model there is only one task (reproduction, equation 4,5) to which both helper types contribute equally and so one task doesn't need to be completed if completing the other task compensates for it; instead, it seems more fitting to say that there are two types of helpers, one that pays a fertility cost and another one a survival cost, for doing the same task. So, this model does not actually consider division of labor but the evolution of different helper types where both helper types are just as good at doing the single task but perhaps do it differently and so pay different types of costs. In this revision, the authors introduced a modified model where "work" and "defense" must be performed to a similar extent. Although I appreciate their effort, this model modification is rather unnatural and forces the evolution of different helper types if any help is to evolve.

      In previous models of division of labor in eusocial insects, the implicit benefit is also colony-level productivity (see Beshers & Fewell, 2001, for a review of division of labor in insects). Even in humans, division of labor functions as a means to increase efficiency toward achieving a shared goal. Our model adopts this same interpretation, as outlined in the Introduction, but extends it by considering that different tasks may impose different fitness costs, an aspect that has been largely overlooked in the existing literature. It is precisely because fitness outcomes are not fully shared among group members in vertebrates that distinguishing these cost structures matters. Unlike eusocial insects with sterile workers, vertebrate helpers can obtain direct fitness benefits, and the model explicitly accounts for these direct benefits—something absent from most insect-inspired approaches even when direct fitness benefits can also arise in some of those systems. Thus, our framework is not simply evolving “two types of helpers doing the same task,” but instead evolving specialization in different cooperative roles that carry different fitness consequences. It is therefore suitable for our model to treat contributions to breeder productivity as a common currency, while allowing individuals to specialize in different cost-distinct forms of help.

      Finally, regarding synergy: with the extension introduced in the previous revision, we now incorporate the requirement that multiple forms of help must be performed for the group to achieve maximal reproductive output. This directly addressed the reviewer’s concern about synergistic dependencies between tasks and aligns our framework with the kinds of complementarity highlighted in other models of division of labor.

      In summary, the structure of the model is consistent with both the theoretical literature on division of labor and the biological realities of vertebrate cooperative systems. We believe it is important for future models to explicitly consider the different fitness benefits and costs associated with distinct cooperative behaviors, and hope that our framework encourages more targeted empirical research on division of labor in vertebrates (e.g. inclusion of data on defense, life-history traits and environmental challenges) to better inform future modelling efforts.

      I should end by saying that these comments don't aim to discourage the authors, who have worked hard to put together a worthwhile model and have patiently attended to my reviews. My hope is that these comments can be helpful to build upon what has been done to address the question posed.

      We appreciate the reviewer’s thoughtful and constructive comments, as well as the time invested in evaluating our work. These insights have greatly helped us improve the clarity and overall quality of the manuscript. We hope that the revisions and additional clarifications we have provided adequately address all remaining concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors aimed to characterize neurocomputational signals underlying interpersonal guilt and responsibility. Across two studies, one behavioral and one fMRI, participants made risky economic decisions for themselves or for themselves and a partner; they also experienced a condition in which the partners made decisions for themselves and the participant. The authors also assessed momentary happiness intermittently between choices in the task. Briefly, results demonstrated that participants' self-reported happiness decreased after disadvantageous outcomes for themselves and when both they and their partner were affected; this effect was exacerbated when participants were responsible for their partner's low outcome, rather than the opposite, reflecting experienced guilt. Consistent with previous work, BOLD signals in the insula correlated with experienced guilt, and insula-right IFG connectivity was enhanced when participants made risky choices for themselves and safe choices for themselves and a partner.

      Strengths:

      This study implements an interesting approach to investigating guilt and responsibility; the paradigm in particular is well-suited to approach this question, offering participants the chance to make risky v. safe choices that affect both themselves and others. I appreciate the assessment of happiness as a metric for assessing guilt across the different task/outcome conditions, as well as the implementation of both computational models and fMRI.

      We thank Reviewer 1 for their positive assessment of our manuscript.

      Weaknesses:

      In spite of the overall strengths of the study, I think there are a few areas in which the paper fell a bit short and could be improved.

      We thank Reviewer 1 for their comments, which we have used to improve our manuscript. We hope that these changes address the issues raised by the Reviewer.

      (1) While the framing and goal of this study was to investigate guilt and felt responsibility, the task implemented - a risky choice task with social conditions - has been conducted in similar ways in past research that were not addressed here. The novelty of this study would appear to be the additional happiness assessments, but it would be helpful to consider the changes noted in risk-taking behavior in the context of additional studies that have investigated changes in risky economic choice in social contexts (e.g., Arioli et al., 2023 Cerebral Cortex; Fareri et al., 2022 Scientific Reports).

      We certainly agree that several previously published studies have relied on risky choice tasks with social conditions. In this revised version, we now mention these two studies in the substantially revised Introduction.

      (2) The authors note they assessed changes in risk preferences between social and solo conditions in two ways - by calculating a 'risk premium' and then by estimating rho from an expected utility model. I am curious why the authors took both approaches (this did not seem clearly justified, though I apologize if I missed it). Relatedly, in the expected utility approach, the authors report that since 'the number of these types of trials varied across participants', they 'only obtained reliable estimates for [gain and loss] trials in some participants' - in study 1, 22 participants had unreliable estimates and in study 2, 28 participants had unreliable estimates. Because of this, and because the task itself only had 20 gains, 20 losses, and 20 mixed gambles per condition, I wonder if the authors can comment on how interpretable these findings are in the Discussion. Other work investigating loss aversion has implemented larger numbers of trials to mitigate the potential for unreliable estimates (e.g., Sokol-Hessner et al., 2009).

      We agree that we have not clearly justified why we have taken two approaches to assess risk preferences. In short, while the expected utility approach is a more comprehensive method to model a participant’s choices, we had not sufficiently considered the need for the large number of trials required to fit such models when designing our experiment. Calculating the risk premium was the less comprehensive, simpler alternative that we could calculate for all participants. We have now mentioned this fact in the Results section. As the only difference in risk aversion across conditions was found in Study 1 using the expected utility method, which could only be successfully applied in a minority of participants, we believe that this difference should not be taken as a strong finding. We have now mentioned this fact in the revised Discussion.
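      To make the relationship between the two measures concrete, here is a minimal sketch under stated assumptions. It assumes a power utility u(x) = x^ρ over non-negative gains, the certainty equivalent CE = (Σ p<sub>i</sub> x<sub>i</sub>^ρ)^(1/ρ), and a risk premium defined as expected value minus CE. The response above does not state how the authors computed their risk premium from choices, nor how loss and mixed gambles entered their model, so the function names and the gains-only form are illustrative assumptions, not the authors' code.

```python
def expected_value(outcomes, probs):
    """Probability-weighted mean of the gamble's outcomes."""
    return sum(p * x for x, p in zip(outcomes, probs))


def certainty_equivalent(outcomes, probs, rho):
    """Sure amount with the same utility as the gamble.

    Assumes power utility u(x) = x**rho restricted to gains (x >= 0, rho > 0);
    this functional form is an illustrative assumption.
    """
    if rho <= 0:
        raise ValueError("rho must be positive for this gains-only sketch")
    if any(x < 0 for x in outcomes):
        raise ValueError("this sketch only handles non-negative outcomes")
    expected_utility = sum(p * x ** rho for x, p in zip(outcomes, probs))
    return expected_utility ** (1.0 / rho)


def risk_premium(outcomes, probs, rho):
    """EV minus CE: positive means risk-averse, negative means risk-seeking."""
    return expected_value(outcomes, probs) - certainty_equivalent(outcomes, probs, rho)


if __name__ == "__main__":
    # Hypothetical 50/50 gamble between 0 and 100 points.
    for rho in (0.5, 1.0, 2.0):
        print(rho, risk_premium([0, 100], [0.5, 0.5], rho))
```

      With ρ = 0.5 the CE of this hypothetical gamble is 25 and the premium is 25; with ρ = 1 the premium vanishes. This is why ρ and a premium carry the same kind of information, while ρ needs enough trials of each gamble type to be estimated reliably.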

      (3) One thing seemingly not addressed in the Discussion is the fact that the behavioral effect did not replicate significantly in study 2.

      We agree that we had not sufficiently discussed the fact that there were (slight but significant) differences in risk preferences between the Solo and Social conditions in Study 1 but not in Study 2. We now do so in the revised Discussion, and write the following:

      “Participants made slightly more risk-seeking choices when deciding for themselves than for both themselves and the partner in Study 1, but this difference disappeared in Study 2. The ρ parameter on which this finding in Study 1 is based could only be estimated in a minority of participants due to a relatively low number of trials, which suggests that this finding may not be very reliable. The simpler and more robust method (evaluation of a risk premium) showed no difference in risk aversion across conditions in either study. Overall, we believe that we do not have strong evidence of differences in risk preferences across conditions.”

      (4) Regarding the computational models, the authors suggest that the Responsibility and Responsibility Redux models provided the best fit, but they are claiming this based on separate metrics (e.g., in study 1, the redux model had the lowest AIC, but the responsibility only model had the highest R^2; additionally, the basic model had the lowest BIC). I am wondering if the authors considered conducting a direct model comparison to statistically compare model fits.

      We agree that we should run formal, direct model comparison tests. We have now run likelihood-ratio tests, which showed that the Responsibility model fitted best. We now report this in the Results section, just below Table 1:

      “A likelihood ratio test (Equation 9) revealed that the Responsibility model fitted better than all the other models, including the Responsibility Redux model (Study 1: all LR ≥ 47.36, p < 0.0001; Study 2: all LR ≥ 77.83, p < 0.0001).”
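      Equation 9 is not reproduced in this response. Assuming it is the standard likelihood-ratio statistic, LR = 2(ℓ<sub>full</sub> - ℓ<sub>restricted</sub>), referred to a chi-square distribution whose degrees of freedom equal the difference in free parameters, the computation can be sketched as below. The function names and the stdlib-only chi-square tail are illustrative assumptions (`scipy.stats.chi2.sf` would do the same job), and the chi-square reference is only valid when the restricted model is nested within the full one.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(df/2, y) = exp(-y) * sum_{i < df/2} y**i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and apply the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += y ** (j - 0.5) * math.exp(-y) / math.gamma(j + 0.5)
    return total


def likelihood_ratio_test(ll_restricted: float, ll_full: float, df_diff: int):
    """Return (LR statistic, p value) for two nested maximum-likelihood fits."""
    # Clamp at zero: a slightly negative value only reflects optimizer noise.
    lr = max(0.0, 2.0 * (ll_full - ll_restricted))
    return lr, chi2_sf(lr, df_diff)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for illustration only.
    print(likelihood_ratio_test(-120.5, -115.2, 2))
```

      For non-nested candidates, information criteria such as AIC and BIC remain the usual comparison, which is one reason the original table could rank the models differently.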

      (5) In the reporting of imaging results, the authors report in a univariate analysis that a small cluster in the left anterior insula showed a stronger response to low outcomes for the partner as a result of participant choice rather than from partner choice. It then seems as though the authors performed small volume correction on this cluster to see whether it survived. If that is accurate, then I would suggest that this result be removed because it is not recommended to perform SVC where the volume is defined based on a result from the same whole-brain analysis (i.e., it should be done a priori).

      As indicated in the manuscript, the small insula cluster centered at [-28 24 -4] and shown in Figure 4F survived corrections for multiple tests within the anatomically-defined anterior insula (based on the anatomical maximum probability map described in Faillenot et al., 2017), which is independent of the result of our analysis. Functionally defining the small volume based on the same data would indeed be circular and misleading “double-dipping”. We have most certainly NOT done this. The reason why we selected the anterior insula is because it is one of the regions most frequently associated with guilt (see the explanations in our Introduction, which refers for example to Bastin et al., 2016; Lamm & Singer, 2010; Piretti et al., 2023). Thus we feel that performing small-volume correction within the anatomically-defined anterior insula is a valid analysis. We fully acknowledge that, independently of any correction, the effect and the cluster are small. We now write:

      “We found a weak response in a small cluster within the left anterior insula (peak T = 3.95, d = 0.59, 22 voxels, peak intensity at [-28 24 -4]; Figure 4F). Given the documented association between anterior insula and guilt (see Introduction), we proceeded to test whether this result survived correction for family-wise errors due to multiple comparisons restricted to the left anterior insula gray matter [defined anatomically and thus independently from our findings, as the anterior short gyrus, middle short gyrus, and anterior inferior cortex in an anatomical maximum probability map (Faillenot et al., 2017)]. This correction resulted in a p value of 0.024. This result, although it is only a small effect in a small cluster, is consistent with the mixed model analysis reported earlier.”

      Reviewer #2 (Public review):

      Summary

      This manuscript focuses on the role of social responsibility and guilt in social decision-making by integrating neuroimaging and computational modeling methods. Across two studies, participants completed a lottery task in which they made decisions for themselves or for a social partner. By measuring momentary happiness throughout the task, the authors show that being responsible for a partner's bad lottery outcome leads to decreased happiness compared to trials in which the participant was not responsible for their partner's bad outcome. At the neural level, this guilt effect was reflected in increased neural activity in the anterior insula, and altered functional connectivity between the insula and the inferior frontal gyrus. Using computational modeling, the authors show that trial-by-trial fluctuations in happiness were successfully captured by a model including participant and partner rewards and prediction errors (a 'responsibility' model), and model-based neuroimaging analyses suggested that prediction errors for the partner were tracked by the superior temporal sulcus. Taken together, these findings suggest that responsibility and interpersonal guilt influence social decision-making.

      Strengths

      This manuscript investigates the concept of guilt in social decision-making through both statistical and computational modeling. It integrates behavioral and neural data, providing a more comprehensive understanding of the psychological mechanisms. For the behavioral results, data from two different studies is included, and although minor differences are found between the two studies, the main findings remain consistent. The authors share all their code and materials, leading to transparency and reproducibility of their methods.

      The manuscript is well-grounded in prior work. The task design is inspired by a large body of previous work on social decision-making and includes the necessary conditions to support their claims (i.e., Solo, Social, and Partner conditions). The computational models used in this study are inspired by previous work and build on well-established economic theories of decision-making. The research question and hypotheses clearly extend previous findings, and the more traditional univariate results align with prior work.

      The authors conducted extensive analyses, as supported by the inclusion of different linear models and computational models described in the supplemental materials. Psychological concepts like risk preferences are defined and tested in different ways, and different types of analyses (e.g., univariate and multivariate neuroimaging analyses) are used to try to answer the research questions. The inclusion and comparison of different computational models provide compelling support for the claim that partner prediction errors indeed influence task behavior, as illustrated by the multiple model comparison metrics and the good model recovery.

      We thank the reviewer very much for their comprehensive description of our study and the positive assessment of our study and approach.

      Weaknesses

      As the authors already note, they did not directly ask participants to report their feelings of guilt. The decrease in happiness reported after a bad choice for a partner might thus be something other than guilt, for example, empathy or feelings of failure (not necessarily related to guilt towards the other person). Although the patterns of neural activity evoked during the task match previously found patterns of guilt, there is no direct measure of guilt included in the task. This warrants caution in interpreting these findings as guilt per se.

      We fully agree that not directly asking participants about feelings of guilt is a clear limitation of our study. While we already mention this in our Discussion, we have expanded our discussion of the consequences on the interpretation of our results along the lines described by the reviewer in the revised manuscript. We would like to thank the reviewer for proposing these lines of thought, and have now made the following changes to the text:

      In the first paragraph of the Discussion, we now write: “Being responsible for choosing a lottery that yielded a low outcome for a partner made our participants feel worse than witnessing the same outcome resulting from their partner’s choice, which we interpret as interpersonal guilt; although we note that we have not asked participants specifically about which emotion they felt in these situations.”

      Later on, in the third paragraph focusing on the anterior insula, we now write: “This replicates a large body of evidence associating aIns with feelings of guilt evoked during social decisions (see Introduction). Because we have neither asked our participants specifically what they felt in these situations, nor specifically whether they experienced guilt, we cannot exclude the possibility that they have instead or in addition felt empathy for their partner, a feeling of failure or bad luck, or some other emotion.”

      As most comparisons contrast the social condition (making the decision for your partner) against either the partner condition (watching your partner make their decision) or the solo condition (making your own decision), an open question remains of how agency influences momentary happiness, independent of potential guilt. Other open questions relate to individual differences in interpersonal guilt, and how those might influence behavior.

      How agency influences momentary happiness or variations thereof during the course of an experiment such as ours is an interesting question in itself. We have now run linear mixed models assessing agency (i.e. we compared happiness in the Solo and Social conditions vs. the Partner condition), which revealed lower happiness in the Solo and Social conditions (i.e. when it was the participant’s turn to decide) in both studies. This is interesting in itself and may reflect the drive behind the responsibility aversion reported by Edelson et al. (2018): being assigned the role of the decider in a social setting may make people slightly unhappy, perhaps due to the “weight of the responsibility”. We now report these findings in the Results section, including this proposed explanation; because we were not specifically interested in responsibility aversion, we do not discuss this further in the Discussion. The edited text is under the new subsection entitled ‘Momentary happiness: effects of agency, responsibility and guilt’, on page 12:

      “Next, we assessed whether happiness varied depending on the participant’s agency (Social + Solo vs. Partner), and found happiness to be lower when the participant chose, independent of the outcome (Study 1: t(3600) = -3.92, p = 0.00009, β = -0.14, 95% CI = [-0.20 -0.07]; Study 2: t(2870) = -6.07, p = 0.000000001, β = -0.24, 95% CI = [-0.31 -0.16]). This is interesting in itself and may reflect the drive behind the responsibility aversion reported by Edelson et al. (2018): being assigned the role of the decider in a social setting may make people slightly unhappy, perhaps due to the “weight of the responsibility”. To specifically search for a sign of interpersonal guilt, [...]”

      Regarding individual differences: this is a very interesting topic that we have not addressed here due to the (relatively) small number of participants in our studies, but we might consider this for future follow-up studies, which we mention in the Discussion paragraph regarding open questions.

      This manuscript is an impressive combination of multiple approaches, but how these different approaches relate to each other and how they can aid in answering slightly different questions is not very clearly described. The authors could improve this by more clearly describing the different methods and their added value in the introduction, and/or by including a paragraph on implications, open questions, and future work in the discussion.

      We thank the reviewer for their appreciation of our complementary approach, and agree that we had not sufficiently explained the reasons why we used several methods. We have now added a paragraph explaining this at the end of the Introduction (page 5):

      “We analysed our behavioural data using several complementary methods: choices were modelled with mixed-effects regressions serving as manipulation checks; risk preferences expressed in choices were assessed using a comprehensive expected utility model as well as with a simpler, more robust “risk premium” approach; and happiness data were fitted, in addition to the computational models, with several linear mixed models to assess the impact of both the participant’s and their partner’s rewards, the impact of agency and their interactions. Inspired by findings reported in previous neuroimaging studies of social emotions, we also used several methods to analyse our fMRI data, including conventional methods (both region-of-interest and mass univariate); mixed-effects regression models; computational model-based analyses (inspired by e.g. Konovalov et al., 2021; Rutledge et al., 2014); and functional connectivity (e.g. Edelson et al., 2018; Konovalov et al., 2021). The behavioural modelling is thus complemented by neuroimaging analyses that offer insight into both the activity in regions associated with guilt and their place in a wider network, providing an in-depth, comprehensive analysis of the mechanisms behind guilt evoked by social responsibility.”

      In addition, as suggested we added the following paragraph on open questions and future work in the Discussion:

      “Several open questions remain at the end of this study. As discussed above, asking participants directly about which emotions they have felt during the different stages of this task would allow us to link subjective experience with our analytical measures. Testing more participants would allow us to assess the impact of inter-individual variations in personality traits on the experience as well as the behavioural and neural correlates of guilt and responsibility. Using more trials in the experiment would allow separate modelling of risk preferences in gain and loss trials in each experimental condition using expected utility models, and could allow testing whether changes in momentary happiness affect subsequent choices. Varying partner identities (friends, strangers, artificial agent) could reveal the impact of social discounting on guilt and responsibility. In sum, we believe that this experimental approach lends itself very well to the study of several aspects of social emotions.”

      However, taken together, this study provides useful insights into the neural and behavioral mechanisms of responsibility and guilt in social decision-making and how they influence behavior. 

      We thank the reviewer again for their appreciation of our work and hope that our revisions improved the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The majority of my suggestions are in the public review, so I will not repeat them here. But in general, I like the paper, and in addition to my other comments, I think that there should be more discussion of the potential limitations of the study and conclusions that can be drawn. I also thought parts of the results were a little hard to follow, particularly in the 'momentary happiness' section. Perhaps an additional subsection here might help with flow.

      We agree that we could have discussed further the limitations of our study and the conclusions that can be drawn from it, which we have now done in the last paragraphs of the Discussion in this revised version.

      To improve the structure of the section on ‘momentary happiness’, we separated this section into two, entitled ‘Momentary happiness: links to reward’ and ‘Momentary happiness: effects of agency, responsibility and guilt’, which should facilitate the reading of this long section. We proceeded in a similar manner for the Choices section, which is now subdivided into ‘Choices: manipulation check’ and ‘Choices: risk preferences’. We believe that these changes have indeed improved the readability of our manuscript.

      Reviewer #2 (Recommendations for the authors):

      Overall, I believe this manuscript was well-designed, consists of extensive analyses, and provides interesting new insights into the mechanisms underlying social decision-making. I mostly have some clarifying questions and minor comments, which are described below. 

      (1) Integration of prior findings in the first paragraphs of the Introduction. Although all the previous work described in paragraphs 2-5 of the Introduction is interesting, it felt a bit like an enumeration of findings rather than an integrated introduction leading to the current research question. At the end of paragraph 5, it becomes clear how these findings relate to the current research question, but I believe it would improve the flow and readability of the Introduction if this became clear earlier on.

      We agree that we could have integrated the cited previous work into the Introduction so that the text builds up to the research question. We have now extensively reworked several paragraphs in the Introduction (pages 3-5) and hope that these changes have made it easier to follow.

      (2) For the risk attitudes (Choices), you describe pooling the gains and losses and then comparing the social and solo conditions. I was wondering whether you also looked at potential differences between gains and losses (delta measure) for social versus the solo condition (so a comparison of the delta). Based on prior work, I can imagine that the difference in risk attitudes for gains and losses might differ when making decisions for yourself versus when you're doing it for a partner. In general, I was wondering how you explain these findings, as there is also a lot of work showing differences in risk-taking patterns for gains and losses.

      We agree that we could have compared delta measures between solo and social conditions. However, as we describe in the Results section and comment on in the Discussion, the relatively low number of trials made separate fitting of gain and loss trials across conditions difficult. While this question could thus be addressed in subsequent versions of our experiment with more trials, such a fine-grained analysis of the decisions was not the focus of our current study.

      (3) On page 11, you state: "in particular the partner's reward prediction errors resulting from the participants' decisions, i.e. those pRPE for which participants were responsible." From the results described in the paragraph above, this doesn't become clear (e.g., there's no distinction made between social_pRPE and partner_pRPE in the text), as it only discusses differences in weights between pRPE and sRPE. I would recommend including some more information in the main text on these main modeling findings, so one doesn't have to go to the Supplemental Materials to understand them.

      We did indeed fail to report these findings in the text! We thank the reviewer for pointing this out. We have now edited this passage as follows:

      “Crucially, we find here that the partner’s reward prediction errors (social_pRPE and partner_pRPE) contributed to explaining changes in participants’ momentary happiness: the Responsibility and ResponsibilityRedux models explained the data better than the models without these parameters (see Table 1). In particular, the partner’s reward prediction errors resulting from the participants’ decisions (social_pRPE), i.e. those pRPE for which participants were responsible, contributed to explaining our data (weights for social_pRPE were greater than 0: Responsibility model: Study 1: Z = 2.85, p = 0.004, Study 2: Z = 3.26, p = 0.001; ResponsibilityRedux model: Study 1: Z = 2.93, p = 0.003, Study 2: Z = 3.30, p = 0.001; weights for social_pRPE tended to be higher than weights for partner_pRPE: Responsibility model: Study 1: Z = 2.14, p = 0.033; Study 2: Z = 1.41, p = 0.16).”

      (4) The functional connectivity findings seem to come out of nowhere and are not introduced or described anywhere prior in the manuscript. It is therefore not completely clear why you conducted these analyses, or what they add above and beyond previous analyses. Already introducing this method earlier on would fix that.

      We agree that we could have introduced functional connectivity analyses earlier in the text, particularly given the many previous studies in our field using this technique. We have now done this at the end of a new last paragraph of the Introduction:

      “Inspired by findings reported in previous neuroimaging studies of social emotions, we also used several methods to analyse our fMRI data, including conventional methods (both region-of-interest and mass univariate); mixed-effects regression models; computational model-based analyses (inspired by e.g. Konovalov et al., 2021; Rutledge et al., 2014); and functional connectivity (e.g. Edelson et al., 2018; Konovalov et al., 2021). The behavioural modelling is thus complemented by neuroimaging analyses that offer insight into both the activity in regions associated with guilt and their place in a wider network, providing an in-depth, comprehensive analysis of the mechanisms behind guilt evoked by social responsibility.”

      (5) For the functional connectivity findings: I was wondering why you only looked at the choice phase, and not at the feedback phase. I understand that previous work focused on the choice phase, but for the purpose of this study (focus on guilt), I can imagine it is also interesting to see what happens with feedback. In the discussion, you also state "How we feel when we witness our decisions' consequences on others is an important signal to consider when attempting to make good social decisions." (p. 19), which is more focused on the feedback rather than choice, and also supports the idea that looking at the feedback moment might be relevant.

      We agree that we could also have looked at functional connectivity during the feedback phase. The main reason we had not originally done so was time constraints. We would also point out that the manuscript is already very long and contains many analyses of behavioural and fMRI data; adding this analysis would take additional time and further delay the publication of our manuscript, which we would prefer to avoid. However, these effects could of course be examined in subsequent analyses of the same data or in subsequent versions of this experiment. We have now mentioned this in the Discussion, in the paragraph on open questions.

      Minor comments:

      (1) For some of the Figures, it would be helpful if the subtitles were more informative. For Figure 2 and Figure 3 for example, it would be nice if Study 1 and Study 2 were not only mentioned in the figure description but also in the actual figure. For Figures 3 and 4, it would be helpful to have significance stars for the bar plots as well.

      We agree that these changes make the figures easier to understand and have implemented them all, except for adding stars to Figure 4: all bar plots in panels C and E would have been labeled with two or more stars, which would have made the figure difficult to read. We now state in the figure legend that all these coefficients were significant.

      (2) For some of the Supplementary Results, it would be very helpful if there was a legend or description. This is already the case for most of the SR, but not for all.

      We have now added a legend to all elements of the Supplementary Results.

      Some questions that came to mind while going through them:

      - Supplementary Table 1: which p-values correspond to the significance stars? This information is included for Supplementary Table 2, but not for ST1. 

      We have now added the missing information in ST1.

      - Supplementary Figure 1: do the colors correspond to different participants? 

      We have now specified that the colors do indeed correspond to different participants.

      - Supplementary Table 5 (final table): what do the - represent? As in, why is there no value for "run" for the MPFC? At first, I thought you only included the significant values, but then I noticed a few non-significant values as well, so it wasn't completely clear to me why some of the values were missing. This also applies to Supplementary Table 6.

      We had indeed forgotten to explain this. The dashes (‘-’) in Supplementary Tables 4 and 6 indicate that the linear mixed model without the factor ‘run’ was the better-fitting one. We have now added the following explanation to the text accompanying Supplementary Table 4:

      “We tested these models both with and without the factor Run and the associated interaction, and we report the best-fitting model in the table below: a dash (‘-’) in the rows displaying parameters for the run and socialVsSolo:run regressors indicates that the model without the factor Run fitted better for this ROI.”

      (3) I came across a few minor typos or sentences that were not completely clear to me.

      - On page 3: "Patients with damage to ventromedial prefrontal cortex (vmPFC) seem insensitive to guilt when playing social economic games (Krajbich et al., 2009)." This sentence felt a bit out of nowhere and doesn't logically follow from the previous sentences. 

      We have now revised the descriptions of this previous study as well as several others and how they fit into the research question.

      - On page 3: "In another study, participant errors in a difficult perception task lead to a partner feeling pain and evoked activations in left aIns and dlPFC (Koban et al., 2013)." This sentence doesn't really flow, and from the wording, it is not completely clear whether it's the errors or the partner pain that led to the aIns and dlPFC activation.

      We have now revised the description of this study as well, as follows:

      “In another study, partners received painful stimuli when participants made errors during a difficult perception task. These errors evoked activations in the left aIns and dlPFC in the participants (Koban et al., 2013).”

      - Supplementary Figure 1: there is a missing period after the sentence "We then compared these new estimated parameters to the actual parameters from which the synthetic data were generated"

      We have now added the missing period after “generated”.

      - On page 5: "We ran two experiments, Study 1 outside fMRI and Study 2 during fMRI, with separate groups of participants." I would change "outside fMRI" to outside the MRI scanner or something like that, as it's not completely correct to say "outside fMRI".

      We have changed the sentence to “outside the MRI scanner”.

      - On page 6: for the first result, there are currently two p-values reported (p < 2.5e-20 and p < 2e-16). I believe this is an error?

      This was indeed an error! We have re-run this analysis, noticed that the degrees of freedom had also been miscalculated, and have updated this result and the effect of condition (solo vs social). The results are almost identical to the previous ones and all conclusions hold. We have also checked the other analyses reported in this paragraph; all results replicate exactly.

      - On page 6: "Supplemental Table 1" should be "Supplementary Table 1" (for consistency).

      Done.

      - On page 8: "participants in both conditions of both studies", I would change "of both studies" to "for both studies".

      Done.

      - On page 8: for the "Momentary Happiness" paragraph, it would be helpful if you could briefly describe the Rutledge method here, for people who are unfamiliar with the approach.

      We now write the following at the beginning of this paragraph:

      “Following Rutledge and colleagues’ methodology, which considers that changes in momentary happiness in response to outcomes of a probabilistic reward task are explained by the combined influence of recent reward expectations and prediction errors arising from those expectations, we fitted computational models to each participant’s happiness data.”
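      For readers unfamiliar with this approach, the following is a minimal illustrative sketch (not the authors' code) of how such a model turns a trial history into predicted happiness. It assumes the basic form described by Rutledge et al. (2014), restricted to the expectation and prediction-error terms mentioned above: each past trial's expected value (EV) and reward prediction error (RPE) contributes with a weight that decays by a forgetting factor gamma per subsequent trial. The original model also includes a certain-reward term, and the Responsibility models described in the manuscript add further partner-related terms; neither is shown here. The function and parameter names are hypothetical.

```python
from typing import List, Sequence


def predicted_happiness(
    ev: Sequence[float],
    rpe: Sequence[float],
    w0: float,
    w_ev: float,
    w_rpe: float,
    gamma: float,
) -> List[float]:
    """Predicted momentary happiness after each trial (hypothetical sketch).

    Implements H(t) = w0 + w_ev * sum_j gamma**(t-j) * EV_j
                         + w_rpe * sum_j gamma**(t-j) * RPE_j,
    with 0 <= gamma <= 1 controlling how quickly past trials are forgotten.
    """
    if len(ev) != len(rpe):
        raise ValueError("ev and rpe must have the same length")
    predictions: List[float] = []
    ev_trace = 0.0   # exponentially discounted sum of past expected values
    rpe_trace = 0.0  # exponentially discounted sum of past prediction errors
    for ev_t, rpe_t in zip(ev, rpe):
        # Discount all earlier trials by gamma, then add the current trial
        ev_trace = gamma * ev_trace + ev_t
        rpe_trace = gamma * rpe_trace + rpe_t
        predictions.append(w0 + w_ev * ev_trace + w_rpe * rpe_trace)
    return predictions
```

      In a real analysis, w0, w_ev, w_rpe and gamma would typically be estimated separately for each participant, for example by minimising the squared difference between predicted and reported happiness ratings.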

      - On page 10: "Wilkoxon sign-rank tests", should be "Wilcoxon".

      Done.

      We thank the reviewer for their careful reading of our manuscript. We believe that these changes have indeed improved our manuscript.

    1. Summary

      The chromatin accessibility landscape is the basis of cell-specific gene expression. We generated a multiorgan, single-nucleus chromatin accessibility landscape from the model organism Rattus norvegicus. For this single-cell atlas, we constructed 25 libraries via snATAC-seq from nine organs in the rat, with a total of over 110,000 cells. Cell classification integrating gene activity scores with known marker genes identified 77 cell types, which were strongly correlated with those in published mouse single-cell transcriptome atlases. We further investigated the enrichment of cell type- and organ-specific transcription factors (TFs), the dynamics of T-cell developmental trajectories across organs, and the conservation and specificity of gene expression patterns across species. These findings provide a foundation for further investigations of the cell composition and gene regulatory networks throughout the rat body.

      Highlights

      - Generation of a single-cell atlas of chromatin accessibility in nine organs of the rat
      - Characterization of cell type- and organ-specific transcription factors (TFs)
      - Dynamics of chromatin accessibility in developing T cells revealed by cross-organ analysis
      - Conservation and specificity of gene expression patterns among humans, mice, and rats revealed by cross-species analysis

      Competing Interest Statement

      The authors have declared no competing interest.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag013), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1:

      Li et al. present a manuscript in which they generated a snATAC-seq atlas of 9 major organs in the adult rat and integrated the atlas with mouse and human scRNA-seq and scATAC-seq data, revealing that chromatin accessibility is largely conserved between cell types across species and that there is also tissue-specific regulation in some cell types, even when they are common across several tissues. Overall, this looks like a great, carefully analysed and annotated resource that would be useful for the community. I appreciate the amount of work that went into curating and analysing this dataset, and I thought that the manuscript was very well written and clear.

      I think the most interesting finding is in figure 3 where the authors found unique TFs regulating the same cell-types but in different organs. However, the analysis ends abruptly other than listing these TFs. Can the authors comment on what are the functional consequences/associations of these tissue-specific TFs, perhaps in the discussion?

      The raw data are deposited in a database and can be openly downloaded, but I find that the lack of processed data (e.g. processed and labelled expression matrices or objects) may prevent the adoption of these data by the community, as it is a lengthy process to reach the authors' conclusions. The authors might also want to consider incorporating an interactive platform for users to explore and navigate this dataset.

      While I appreciate that the authors have detailed in their manuscript how they performed the data analysis, I would still encourage them to upload their scripts/notebooks to an open code repository; otherwise, this would again hinder adoption by the community.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous non-quantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      2. Point-by-point description of the revisions

      Reviewer #1


      Significance

      Comment 1: At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons", "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered in the direct association between proteasome-dependent degradation and Leishmania differentiation.

      Response: We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis merits an independent, in-depth investigation. The current version has been substantially toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.


      Evidence, reproducibility and clarity

      Comment 1: The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      Response: We agree with the reviewer and toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Comment 1: Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      Response: We thank the reviewer for this observation. The SNPs exclusive to one or the other stage are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. less than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either 'lost' (filtered out) or 'gained' (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely because they went through a transient monosomy at some point in their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.


      Comment 2: Chr26 revealed a striking contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      Response: This analysis is based on normalized median read depth, taking somy variations into account; this is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than ±1.5-fold) significant. The reversal of the signal for chr 26 in sample H-1 eludes us (but again, these fluctuations are minor and not observed at the mRNA level).

      Comment 3: The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/ gene module/cluster would be more appropriate terms.

      Response: We thank the reviewer for this important comment. We replaced 'regulon' throughout the manuscript by 'co-regulated, functional gene clusters' (or similar).

      Comment 4: It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      Response: We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences in pseudouridine (Ψ) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'-O-modifications (Rajan et al., Nature Communications, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to reflect the identification of stage-regulated snoRNAs in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms 'ribosome biogenesis', 'rRNA processing' and 'RNA methylation' shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      Comment 5: The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      Response: We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      Comment 6: The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      Response: This control is now included in Fig S7 and we have added the corresponding description to the text.

      Comment 7: I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      Response: We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Response: Rectified, thanks for pointing this out.

      2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Response: Rectified, thanks for pointing this out.

      3) Fig.1D: "SNP frequency" is the preferred term in English.

      Response: Corrected.

      4) Fig.2A: not sure what "counts}1" mean.

      Response: This figure has been replaced.

      5) Ln 685: "Transcripts with FC 0.01 are represented by black dots" -> This sentence is inaccurate. The intended wording might be: "Transcripts with FC 0.01 are represented by black dots"

      Response: We thank the reviewer and corrected accordingly.

      6) Ln 698: Same as ln 685 mentioned above.

      Response: We thank the reviewer and corrected accordingly.

      7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10⁻⁷ in the other scale?

      Response: This was corrected in the figure and the legends were updated accordingly.

      8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Response: Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      Response: This figure was removed and the legend modified.

      10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Response: Thank you, this is now rectified. To ensure readability, we reduced the number of protein kinase examples shown.



      Reviewer #2

      Evidence, reproducibility and clarity

      Comment 1: In the absence of riboprofiling the authors return to the RNA-seq to assess the levels of pre-Sno RNA (the role of the could be more explicitly stated).

      Response: We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      Comment 2: The authors provide a clear and comprehensive description of the data at each stage of the results, and this is woven together in the discussion, allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented, the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      Response: We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated in our responses to reviewer 1, mechanistic studies will be conducted in the near future to validate some of the regulatory interactions; these will be the subject of separate publications. We have toned down the general discussion, suggested follow-up mechanistic analyses, and emphasized those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      Comment 1: As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. Minimum replicates required for inferential analysis is 3 bio-replicates and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      Response: We agree with the reviewer and conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome and the legal constraints associated with the use of animals, we limited the number of replicates to the necessary minimum. We thank the reviewer for this important comment: the new data not only confirm the previous results, providing a high level of robustness to our data, but also allowed us to increase the number of identified stage-regulated snoRNAs, further supporting a possible role of ribosome modification in Leishmania stage development.

      Comment 2: There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      Response: We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analyses of key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      Comment 3: I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      Response: We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin, which has previously been applied to study the Leishmania proteasome (PMID: 15234661), rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information to the Results section.

      Comment 4: If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      Response: See response to comment 1 above.


      Comment 5: As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Add treatment showing that it can differentiate cytostatic vs cytotoxic compound.

      Response: This control has now been added to Fig S7.

      If you have constructive further reaching suggestions that could significantly improve the study but would open new lines of investigations, please label them as "OPTIONAL". Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated time investment for substantial experiments.

      Comment 6: It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Response: Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      Are the data and the methods presented in such a way that they can be reproduced?

      Comment 7: All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      Response: We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      Are the experiments adequately replicated and statistical analysis adequate?

      Comment 8: Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      Response: See response to comment 1 above.

      Comment 9: Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      Response: We agree and have repeated the RNA-seq analysis using four independent biological replicates (see response to comment 1).

      Comment 10: It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Response: Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      Comment 11: Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that a detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      Response: We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike studies that use cultured parasites and thus have access to virtually unlimited material, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, animal use is tightly regulated: only the minimum number of animals required to obtain statistically significant results is permitted, and adhering to this minimum is a prerequisite for obtaining authorization to conduct animal experiments.

      Regarding the number of biological replicates, we would like to emphasize that four biological replicates are widely accepted in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC-MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (73.2% to 83.4% of the phosphopeptides identified in that condition).
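      The replicate-completeness figure (73.2% to 83.4%) is a per-condition statistic that can be computed from a peptide-by-replicate intensity matrix. A minimal Python sketch, assuming NaN marks a replicate in which a phosphopeptide was not quantified; the function name and data layout are illustrative and are not taken from our analysis pipeline:

```python
import numpy as np


def fraction_quantified(intensities: np.ndarray, min_reps: int = 2) -> float:
    """Share of identified phosphopeptides quantified in >= min_reps replicates.

    `intensities` is a (peptides x replicates) array for ONE condition in which
    np.nan marks a missing value. A peptide counts as identified in this
    condition if it has at least one non-missing value.
    """
    n_observed = np.sum(~np.isnan(intensities), axis=1)
    identified = n_observed >= 1
    if not identified.any():
        return 0.0
    return float(np.mean(n_observed[identified] >= min_reps))


# Toy condition with four replicates (values are arbitrary log2 intensities).
nan = np.nan
condition = np.array([
    [21.0, 21.3, 20.9, 21.1],  # quantified in 4 replicates
    [19.5, nan, nan, nan],     # quantified in 1 replicate
    [nan, 18.2, 18.4, nan],    # quantified in 2 replicates
    [nan, nan, nan, nan],      # not identified in this condition
])
print(fraction_quantified(condition))  # 2 of 3 identified peptides
```

      Peptides never seen in a condition are excluded from the denominator, so the statistic describes completeness among identified phosphopeptides only.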

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and have revised the Methods to avoid any misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Imputed values therefore lie in the neighborhood of the observed intensities, rather than being globally low, noise-like values.
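      The filtering and imputation rules described above can be sketched as follows. This is a deliberately simplified stand-in, not the MLE algorithm used in the study: within a condition, each missing value of a peptide with at least one observation is replaced by the mean of that peptide's observed values in the same condition, which is the maximum-likelihood estimate of its mean under a per-peptide normal model with values missing at random. The actual MLE imputation additionally borrows information across peptides. Function names are hypothetical.

```python
import numpy as np


def keep_identified(cond_a: np.ndarray, cond_b: np.ndarray):
    """Keep peptides (rows) observed in at least one replicate of either condition."""
    seen = (~np.isnan(cond_a)).any(axis=1) | (~np.isnan(cond_b)).any(axis=1)
    return cond_a[seen], cond_b[seen]


def impute_within_condition(log_int: np.ndarray) -> np.ndarray:
    """Fill NaNs row-wise with the row's observed mean (simplified stand-in for MLE).

    Only rows with at least one observed value in this condition are imputed;
    rows that are entirely missing in this condition are left as NaN.
    """
    out = log_int.copy()
    for row in out:  # each row is a view into `out`, so edits modify `out`
        missing = np.isnan(row)
        if missing.any() and not missing.all():
            row[missing] = row[~missing].mean()
    return out


nan = np.nan
amastigote = np.array([[20.0, nan, 21.0, nan],
                       [nan, nan, nan, nan],
                       [nan, nan, nan, nan]])
promastigote = np.array([[22.0, 22.0, nan, nan],
                         [19.0, nan, nan, nan],
                         [nan, nan, nan, nan]])
a, b = keep_identified(amastigote, promastigote)  # drops the last peptide
a_imp = impute_within_condition(a)  # row 0 -> [20.0, 20.5, 21.0, 20.5]; row 1 stays NaN
b_imp = impute_within_condition(b)  # row 1 -> [19.0, 19.0, 19.0, 19.0]
```

      Because imputed values are anchored on each peptide's own observations, they stay close to the measured intensities rather than being drawn from a low-intensity noise distribution.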

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.


      Comment 12: For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      Response: We thank the reviewer for noticing this point. It is an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.

      Minor comments:

      Specific experimental issues that are easily addressable. Are prior studies referenced appropriately?

      Comment 1: Yes

      Are the text and figures clear and accurate?

      Comment 2: The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      Response: We thank the reviewer for these positive comments. We now included the information on replicates in the overview figure (Figure S1).

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Comment 3: As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      Response: We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0-100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach; see doi:10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero on a log scale. This normalization approach is comparable to that described in the "Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome" section of a previous study (DOI: 10.1016/j.mcpro.2022.100428).

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term "phosphostoichiometry" is imprecise and not correct in this context.

      In response, we (i) replaced the term "phosphostoichiometry" throughout the manuscript with a more accurate description, such as "normalized phosphorylation level", or "relative phosphorylation change normalized to protein abundance", and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording improves conceptual accuracy without altering the validity or interpretation of the results.
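      To make the distinction concrete, the normalized quantity is a difference of log fold changes rather than an occupancy. A minimal sketch, assuming log2-transformed intensities and using simple condition means; the study itself tested this contrast with limma, which this sketch does not reproduce, and the function name is hypothetical:

```python
import numpy as np


def normalized_phospho_change(phospho_a, phospho_b, protein_a, protein_b) -> float:
    """log2 ratio-of-ratios: log2FC(phosphopeptide) - log2FC(parent protein).

    Each argument holds log2 intensities across the replicates of one condition
    (a = reference stage, b = compared stage).
    """
    lfc_phospho = np.mean(phospho_b) - np.mean(phospho_a)
    lfc_protein = np.mean(protein_b) - np.mean(protein_a)
    return float(lfc_phospho - lfc_protein)


# Phosphopeptide and protein both rise 4-fold: no change after normalization.
print(normalized_phospho_change([10, 10], [12, 12], [20, 20], [22, 22]))  # 0.0
# Phosphopeptide rises 8-fold while the protein rises 2-fold: +2, i.e. a 4-fold
# relative increase. The value is unbounded, so it is not an occupancy in [0, 1].
print(normalized_phospho_change([10, 10], [13, 13], [20, 20], [21, 21]))  # 2.0
```

      A value of zero means the phosphopeptide changed exactly in proportion to its parent protein, which is the null hypothesis being tested.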

      Comment 4: From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting p-values thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      Response: The references have been formatted.
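      For readers unfamiliar with the adaptive procedure mentioned in this Methods excerpt: it rescales Benjamini-Hochberg adjusted p-values by an estimate of the proportion of true null hypotheses (pi0). A minimal sketch, assuming the Pounds and Cheng estimator takes the form pi0 = min(1, 2 x mean p-value), as commonly stated for two-sided tests; this illustrates the idea and is not the cp4p implementation:

```python
import numpy as np


def pounds_cheng_pi0(pvalues) -> float:
    """Assumed Pounds-Cheng estimate of the proportion of true nulls: min(1, 2 * mean(p))."""
    return min(1.0, 2.0 * float(np.mean(pvalues)))


def adaptive_bh(pvalues) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values scaled by the estimated pi0."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    pi0 = pounds_cheng_pi0(p)
    order = np.argsort(p)
    scaled = pi0 * m * p[order] / np.arange(1, m + 1)
    # step-up monotonicity: running minimum from the largest p-value downward
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)  # restore the input order
    return adjusted


print(adaptive_bh([0.01, 0.02, 0.03, 0.5]))  # pi0 = 0.28 here, so values shrink relative to plain BH
```

      When most tests are truly null, pi0 approaches 1 and the procedure reduces to ordinary Benjamini-Hochberg; when many tests are non-null, it gains power.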

      Comment 5: Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour around the circle of each node to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      Response: We thank the reviewer for this comment but for readability we opted to keep the figure as is. We now refer to Tables 8, 9, and 12 that allow the reader to link gene IDs to protein name and annotation (if available).

      Comment 6: It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      Response: We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein-protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, co-expression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. The confidence score cutoff used has been added to the Methods section and the figure legends.

      We decided not to submit the Cytoscape files as they were generated with earlier versions of Cytoscape and the STRING plugin. Using the differential abundance data provided in the tables, these networks can readily be recreated with current versions for any follow-up study.
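      Why a single combined-score edge cannot be decomposed can be seen from how evidence channels are combined. A minimal sketch, based on our understanding of STRING's published scheme (the exact procedure should be checked against the STRING documentation): each channel score has a prior (assumed here to be 0.041) removed, channels are combined as independent probabilities, and the prior is added back once. The function name and the example cutoff are illustrative.

```python
PRIOR = 0.041  # assumed STRING prior probability of a random association


def combined_score(channel_scores):
    """Combine per-channel scores in [0, 1] into a single edge score (sketch)."""
    remaining = 1.0
    for s in channel_scores:
        # remove the prior (clipped at 0) so it is not counted once per channel
        s_no_prior = max(0.0, (s - PRIOR) / (1.0 - PRIOR))
        remaining *= 1.0 - s_no_prior
    combined_no_prior = 1.0 - remaining
    # add the prior back once
    return combined_no_prior + PRIOR * (1.0 - combined_no_prior)


coexpression_plus_textmining = combined_score([0.5, 0.5])  # about 0.739
experiments_only = combined_score([0.7393])                # about 0.7393 (a single channel is unchanged)
cutoff = 0.7  # e.g. STRING's "high confidence" threshold on the 0-1 scale
print(coexpression_plus_textmining >= cutoff, experiments_only >= cutoff)  # True True
```

      Two very different evidence profiles yield practically the same edge score and pass the same cutoff, so the network edge alone does not reveal which channel supports it.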

      Comment 7: The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      Response: We apologize and have translated the column titles into English.

      Comment 8: I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      Response: We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Comment 9: Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      Response: We have added the following statement to the legend: 'Confidence values were derived as described in Supplementary Methods'.

      Comment 10: Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      Response: We have added the following justification to the Results section (see also our response to comment 3 of reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408, as the latter's potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      Comment 11: The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow up is done to identify if those proteins are necessary and/or sufficient from driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      Response: We very much appreciate these interesting suggestions, which will be considered for ongoing follow-up studies.

      Comment 12: In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      Response: The links have now been corrected and validated. For data that are still under embargo, the reviewer access information is as follows:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d0

      RNAseq data:

      https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f-4c06-b4bd-bc10f2dc0b14

      Proteomic data: http://www.ebi.ac.uk/pride

      Username: reviewer_pxd035698@ebi.ac.uk

      Password: gOIcRx0g

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      Username: reviewer_pxd035697@ebi.ac.uk

      Password: 7GWtBmvx

      Significance Provide contextual information to readers (editors and researchers) about the novelty of the study, its value for the field and the communities that might be interested. The following aspects are important:

      General assessment: provide a summary of the strengths and limitations of the study. What are the strongest and most important aspects? What aspects of the study should be improved or could be developed?

      Strengths: Comment 1: The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      Response: We thank the reviewer for this positive assessment of our work.

      Comment 2: The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy, and the promastigote experiments are performed at a low passage number. This is a strength of the work as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      Response: We thank the reviewer for acknowledging the relevance of our hamster system, which poses many challenges (financial, ethical and administrative, as protocols need to be approved by the French government).

      Limitations: Comment 1: Potential lack of appropriate replication (see above).

      Response: See response to comment 1.

      Comment 2: Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      Response: See response to comment 2 above.

      Advance: compare the study to the closest related results in the literature or highlight results reported for the first time to your knowledge; does the study extend the knowledge in the field and in which way? Describe the nature of the advance and the resulting insights (for example: conceptual, technical, clinical, mechanistic, functional,...).

      Comment 3: The study applies well established techniques without any particular technical step-change. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      Response: We thank the reviewer for these positive comments.

      Audience: describe the type of audience ("specialized", "broad", "basic research", "translational/clinical", etc...) that will be interested or influenced by this research; how will this research be used by others; will it be of interest beyond the specific field?

      This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      Response: We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      Response: We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      Response: We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:


      Comment 1: Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      Response: We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships with more neutral expressions such as 'co-expression'.

      Comment 2: Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      Response: We agree with the reviewer and have replaced 'regulon' with 'co-regulated gene clusters' (or similar).

      Comment 3: LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value …"

      Response: We fully agree with the reviewer: an LFQ intensity of 0 may result from various causes. We realize that our wording may have been ambiguous. For clarity, we have modified the original text to: 'Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value …'

      Comment 4: L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical causes, such as proteins being just below the detection threshold or stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.
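      To make the division-by-zero point concrete, the minimal Python sketch below (hypothetical mean intensities, not data from the manuscript) shows how a zero mean LFQ intensity in one condition yields an infinite log2 fold change, and one way such proteins could be flagged as a separate "presence/absence" class instead of being plotted alongside finite estimates.

```python
import math


def log2_fold_change(mean_a: float, mean_b: float) -> float:
    """Return log2(mean_a / mean_b); +/-inf when exactly one mean is zero."""
    if mean_a == 0 and mean_b == 0:
        return float("nan")  # not detected in either condition
    if mean_b == 0:
        return math.inf  # detected only in condition A
    if mean_a == 0:
        return -math.inf  # detected only in condition B
    return math.log2(mean_a / mean_b)


def classify(mean_a: float, mean_b: float) -> str:
    """Label a protein by whether its fold change is finite (hypothetical categories)."""
    fc = log2_fold_change(mean_a, mean_b)
    if math.isnan(fc):
        return "not detected"
    if math.isinf(fc):
        return "presence/absence"
    return "quantified"


print(log2_fold_change(8.0, 2.0), log2_fold_change(5.0, 0.0))
```

      Keeping the presence/absence class apart avoids assigning an arbitrary magnitude to an undefined ratio; whether such proteins should instead be imputed is the question debated under Reviewer #4, Comment 9.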

      Response: We thank the reviewer for this comment. To clarify this section, we modified the text as follows: 'Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p …'

      Minor Comments:

      Methods L132: Typo: "A according" should be "according."

      Response: The 'A' refers to RNase A. We added a comma for clarification (...RNase A, according to...).

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      Response: We thank the reviewer for this comment. In addition to the already detailed description in the Methods and the reference there to the paper describing the pipeline, we have now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). It provides the following explanation: "The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the mean coverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics." The description then goes into further detail.
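      The normalization quoted above can be sketched in Python as follows. This is an illustration only, not the giptools implementation: it assumes, hypothetically, that the genome-wide median bin corresponds to a disomic state (so somy = 2 × normalized coverage) and that a chromosome's somy is the median of its bin scores; the actual conversion used by giptools may differ.

```python
from statistics import median


def somy_scores(bin_coverage: dict, baseline_somy: float = 2.0) -> dict:
    """Estimate per-chromosome somy from mean coverage per bin.

    bin_coverage maps chromosome name -> list of mean coverage values per bin.
    Assumption (hypothetical): the genome-wide median bin is disomic.
    """
    all_bins = [cov for bins in bin_coverage.values() for cov in bins]
    genome_median = median(all_bins)  # median coverage of all bins
    per_chrom = {}
    for chrom, bins in bin_coverage.items():
        normalized = [cov / genome_median for cov in bins]  # bin scores
        per_chrom[chrom] = baseline_somy * median(normalized)  # somy score
    return per_chrom


example = {"chr1": [10, 10, 10], "chr2": [10, 10, 10], "chr3": [15, 15, 15]}
print(somy_scores(example))
```

      Under this assumption, a chromosome whose bins carry 1.5 times the genome-wide median coverage would be reported as trisomic, which is the kind of check the reviewer's comment on chr36 normalization asks for.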

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      Response: We thank the reviewer for this comment. Unlike the paper cited above (using long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Response: corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Response: corrected to 'validated'

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      Response: As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant (Ritchie, 2015; Smyth, 2005).
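      limma computes moderated t-statistics by borrowing variance information across proteins, which is beyond a short sketch. The Python example below only illustrates the decision rule described in the response (significant adjusted p-value plus a minimum 2-fold change), assuming per-protein p-values are already available (for example from limma) and are adjusted with the Benjamini-Hochberg procedure. The alpha of 0.05 is a placeholder, not the cutoff used in the manuscript.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_abundant(log2fc, pvalues, alpha=0.05, min_fold_change=2.0):
    """Flag features passing both the adjusted p-value and fold-change cutoffs."""
    adjusted = benjamini_hochberg(pvalues)
    threshold = math.log2(min_fold_change)  # 2-fold change -> |log2FC| >= 1
    return [abs(fc) >= threshold and q < alpha for fc, q in zip(log2fc, adjusted)]


print(differentially_abundant([1.5, -2.0, 0.5, 3.0], [0.01, 0.04, 0.03, 0.2]))
```

      Requiring both criteria means a large fold change with a weak p-value, or a significant but small shift, is not called differentially abundant.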

      Results L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      Response: We thank the reviewer for these suggestions and have reformulated the sentence as: 'In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.'

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      Response: These lines have been added to Figure 1C (not 3A).
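      The trend line the reviewer requested can be sketched as a centered moving average over values ordered by genomic position. This minimal Python illustration uses a hypothetical window size and truncates the window at chromosome ends; it is not necessarily how the lines in Figure 1C were computed.

```python
def moving_average(values, window=5):
    """Centered moving average; positions near the ends use a truncated window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window=3))
```

      A wider window suppresses bin-to-bin noise more strongly but can also flatten a genuine undulating pattern, so the window size should be reported alongside the figure.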

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Response: corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Response: Thank you for the comment - we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Response: corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Response: corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      Response: This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      Response: We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      Response: This has been corrected to 'The discrepancies we observed in a sub-set of genes between....'.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Response: corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Response: corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      Response: We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      Response: We added this information to the text ('some of the most significantly enriched terms included ...').

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      Response: This statement was too speculative and has been removed. Instead, we added 'Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner'.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      Response: We deleted the term as suggested and reformulated to '....our results confirm the important role of protein degradation....'.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      Response: We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      Discussion L555: As noted in L494, reconsider using the word "unexpected."

      Response: removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      Response: We thank the reviewer for this comment and have followed the advice, reformulating the sentence according to the suggestion.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      Response: We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      Response: We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Response: Corrected to 'some of the most significantly enriched GO terms'.

      Signed: Bart Cuypers, PhD

      **Referee cross-commenting**

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      Response: We thank the reviewer for this cross-comparison, which served as a guide for revising our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance (Required)):


      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      Response: We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to incite mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al. aim to address the central question of how these parasites establish and maintain distinct life cycle stages largely in the absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From these parasites, they performed genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analyses to reveal the changes the parasites undergo between the two life stages.

      The main conclusions stated by the authors are:

      • The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.
      • RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and post-transcriptional regulation itself.

      • Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      Response: We thank the reviewer for the time and effort dedicated to our manuscript.

      Comments:

      Further details are organised in order of appearance in the text:

      Comment 1: Material and Methods: while the authors are indicating some key parameters, providing the code and scripts they used throughout the manuscript would improve reproducibility.

      Response: We thank the reviewer for this comment and added the URL for the code to the data availability section.

      Comment 2: Why only 2 biological replicates for RNA while the other layers have 3 or 4?

      Response: We agree with the other reviewers and have repeated this analysis to have statistically more robust results.

      Comment 3: Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable in longer-term culture-derived and sand fly-derived promastigotes?

      Response: No, as published in Barja et al. Nat Ecol Evol 2017 (PMID: 29109466) and Bussotti et al. PNAS 2023 (PMID: 36848551), these minor fluctuations are not predicting subsequent aneuploidies in long-term culture nor in sand fly-derived promastigotes. This information has been added to the text.

      Comment 4: Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      Response: Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      Comment 5: L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      Response: We corrected this sentence to 'Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2'.

      Comment 6: L349-355: have the mRNAs showing a change in abundance between stages been normalised by their relative DNA abundance? In other words, can the wave patterns observed at the genome level explain the respective mRNA levels? Can the authors plot the enrichment scores in a similar way with regard to their position on the genome, and indicate whether there is a positional enrichment in addition to the functional one they observe? This may affect the conclusion in L356-358.

      Response: As noted above, we did not see any significant read depth changes at DNA level when comparing amastigotes and promastigotes. Thus there is no need to normalize the RNA-seq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we have observed on DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some over-represented tandem gene arrays where all gene copies share the same location and GO term).

      Comment 8: L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not imply causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      Response: We thank the reviewer for this comment and have corrected the statement accordingly.

      Comment 9: Figure 3B: could the authors clarify what the "unique genes" in the infinite quadrants are? It seems these proteins are identified in one stage and not the other. This implies that the corresponding values are missing not at random (MNAR). Rather than removing proteins with MNAR values from the differential expression analysis, the authors should probably impute those missing values. Methods for imputing MNAR and MAR values can be found in the literature. Indeed, the level of expression of those proteins in one stage is currently missing, while it could strongly affect the conclusions the authors are drawing in Figure 4E regarding the proteins targeted for degradation and rescued in the presence of the proteasome inhibitor.

      Response: We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not-at-random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification. More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other. In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. 
These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions. To clarify this section, we modified the text as follows: 'Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p …'

      Comment 10: L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and out of context. It is too speculative and has little support from the data presented at this stage of the manuscript. It should be removed, together with Figure 3E, or moved to the discussion and supplementary information.

      Response: We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.

      Comment 10: The authors present an elegant way to show stage specific degradation through the comparison of stage specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial" the term unexpected is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation / autophagy during differentiation. Furthermore the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants as we discussed above, and should be revised accordingly.

      Response: We rephrased the conclusion to 'In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.' Please see the response to comment 9 regarding the unique proteins.

      Comment 11: L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      Response: We agree with the reviewer and have toned this statement down by adding the statement '....or simply be a consequence of culture adaptation'.

      Comment 12: The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct links and feedback without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatases and DUBs, need to be validated for their respective change of protein activity in the direction that fits the model of the authors. Those discussions should be toned down.

      Response: We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      Comment 13: A couple of typos:

      • In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      • L225 "...peptide match was disable." should be "...peptide match was disabled."

      Response: both corrected

      Reviewer #4 (Significance (Required)):

      While the emphasis on post-translational regulation of gene expression in kinetoplastid organisms is not especially novel, the scale of the work presented here, looking at 5 layers of potential regulation, is. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      Response: We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro promastigote differentiation systems potentially limiting the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall outside the scope of this manuscript.

      Response: We agree with the reviewer and have modified our manuscript throughout to remove any claims of causal relationships. Indeed, this work is setting the stage for future investigations on dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate how UVC-induced DNA damage alters the interaction between the mitochondrial transcription factor TFAM and mtDNA. Using live-cell imaging, qPCR, atomic force microscopy (AFM), fluorescence anisotropy, and high-throughput DNA-chip assays, they show that UVC irradiation reduces TFAM sequence specificity and increases mtDNA compaction without protecting mtDNA from lesion formation. From these findings, the authors suggest that TFAM acts as a "sensor" of damage rather than a protective or repair-promoting factor.

      Strengths:

      (1) The focus on UVC damage offers a clean system to study mtDNA damage sensing independently of more commonly studied repair pathways, such as oxidative DNA damage. The impact of UVC damage is not well understood in the mitochondria, and this study fills that gap in knowledge.

      (2) In particular, the custom mitochondrial genome DNA chip provides high-resolution mapping of TFAM binding and reveals a global loss of sequence specificity following UVC exposure.

      (3) The combination of in vitro TFAM DNA biophysical approaches, combined with cellular responses (gene expression, mtDNA turnover), provides a coherent multi-scale view.

      (4) The authors demonstrate that TFAM-induced compaction does not protect mtDNA from UVC lesions, an important contribution given assumptions about TFAM providing protection.

      Weaknesses:

      (1) The authors show a decrease in mtDNA levels and increased lysosomal colocalization but do not define the pathway responsible for degradation. Distinguishing between replication dilution, mitophagy, or targeted degradation would strengthen the interpretation.

      We thank the reviewer for their careful reading of our manuscript and thoughtful suggestions. We agree that distinguishing between replication dilution, mitophagy, and/or targeted degradation would strengthen our understanding of how UV-induced DNA damage is handled in the mitochondria. Currently we are undertaking experiments to tease this apart, but consider the scope of those experiments to be beyond this manuscript and expect to publish them in a subsequent paper rather than this one. We added text explicitly stating that these possibilities are not distinguished by our results in pages 8-9 in the Discussion under the subsection ‘Mitochondria respond to UVC-induced mtDNA damage in the absence of apparent mitochondrial dysfunction’.

      (2) The sudden induction of mtDNA replication genes and transcription at 24 h suggests that intermediate timepoints (e.g., 12 hours) could clarify the kinetics of the response and avoid the impression that the sampling coincidentally captured the peak.

      We agree and have added additional timepoints of 12 hours and 18 hours post exposure. We have updated Figure 2 to include the new data and have added text on page 4 to include these results.

      (3) The authors report no loss of mitochondrial membrane potential, but this single measure is limited. Complementary assays such as Seahorse analysis, ATP quantification, or reactive oxygen species measurement could more fully assess functional integrity.

      We focused on membrane potential because loss of membrane potential is such a well-understood mechanism for triggering mitophagy, but agree that these additional measurements are useful. We have added experiments to assess ATP levels, but did not see changes; we have added this data to Figure 2. We have also added text highlighting that we previously assessed mtROS following the same levels of UV exposure and observed no changes (in the results section on page 5 and in the discussion section on page 9). Given that we observe no changes in membrane potential or ATP, we have opted to not move forward with Seahorse analysis for the purposes of this paper.

      (4) The manuscript briefly notes enrichment of TFAM at certain regions of the mitochondrial genome but provides little interpretation of why these regions are favored. Discussion of whether high-occupancy sites correspond to regulatory or structural elements would add valuable context.

      We agree a discussion of these findings provides context and insight into where the field is currently in understanding TFAM sequence specificity. We have updated text in the discussion (pages 9-10) to include our thoughts on the drivers of TFAM sequence specificity with regard to the discrepancy with the anisotropy data and the lack of overlap with regulatory/structural elements.

      (5) It remains unclear whether the altered DNA topology promotes TFAM compaction or vice versa. Addressing this directionality, perhaps by including UVC-only controls for plasmid conformation, would help disentangle these effects if UVC is causing compaction alone.

      We have added an additional control making this comparison and updated the text on page 7 in the results section. UVC by itself (without TFAM being present) does not alter the plasmid compaction; see new supplemental Figure S16.

      (6) The authors provide a discrepancy between the anisotropy and binding array results. The reason for this is not clear, and one wonders if an orthogonal approach for the binding experiments would elucidate this difference (minor point).

      The discrepancy between anisotropy and the binding array results is certainly unusual and contrary to previous studies that have used these arrays. In addition to the anisotropy experiments, we selected a ‘high occupancy’ and ‘low occupancy’ sequence from the binding array and performed oligomerization experiments using atomic force microscopy, which allowed us to detect small changes in cooperativity (see supplemental Figure S15). We previously only discussed this briefly in the results section on page 6, but we have now updated the discussion section (pages 9-10) to highlight this finding and put forth ideas for the field as to why we think this might be the case. While we do see that the binding array data aligns with oligomerization and cooperativity of TFAM, we still do not know what it is about these sequences that would drive such differences in TFAM binding, but we speculate that it could have something to do with flexibility of the DNA sequences.

      Assessment of conclusions:

      The manuscript successfully meets its primary goal of testing whether TFAM protects mtDNA from UVC damage and the impact this has on the mtDNA. While their data points to an intriguing model that TFAM acts as a sensor of damaged mtDNA, the validation of this model requires further investigation to make the model more convincing. This is likely warranted for a follow-up study. Also, the biological impact of this compaction, such as altering transcription levels, is not clear in this study.

      We have updated wording in the Abstract, Introduction, and elsewhere in the text (as detailed in other portions of our response) to make as explicit and clear as possible which results are supported by the in vitro versus in vivo data, and which parts are conclusions supported by the data versus hypothesized models to be tested in future work.

      Impact and utility of the methods:

      This work advances our understanding of how mitochondria manage UVC genome damage and proposes a structural mechanism for damage "sensing" independent of canonical repair. The methodology, including the custom TFAM DNA chip, will be broadly useful to the scientific community.

      Context:

      The study supports a model in which mitochondrial genome integrity is maintained not only by repair factors, but also by selective sequestration or removal of damaged genomes. The demonstration that TFAM compaction correlates with damage rather than protection reframes an interesting role in mtDNA quality control.

      Reviewer #2 (Public review):

      Summary:

      King et al. present several sets of experiments aimed to address the potential impact of UV irradiation on human mitochondrial DNA as well as the possible role of mitochondrial TFAM protein in handling UV-irradiated mitochondrial genomes. The carefully worded conclusion derived from the results of experiments performed with human HeLa cells, in vitro small plasmid DNA, with PCR-generated human mitochondrial DNA, and with UV-irradiated small oligonucleotides is presented in the title of the manuscript: "UV irradiation alters TFAM binding to mitochondrial DNA". The authors also interpret results of somewhat unconnected experimental approaches to speculate that "TFAM is a potential DNA damage sensing protein in that it promotes UVC-dependent conformational changes in the [mitochondrial] nucleoids, making them more compact." They further propose that such a proposed compaction triggers the removal of UV-damaged mitochondrial genomes as well as facilitates replication of undamaged mitochondrial genomes.

      Strengths:

      (1) The authors presented convincing evidence that a very high dose (1500 J/m²) of UVC applied to oligonucleotides covering the entire mitochondrial DNA genome alleviates sequence specificity of TFAM binding (Figure 3). This high dose was sufficient to cause UV lesions in a large fraction of individual oligonucleotides. The method was developed in the lab of one of the corresponding authors (reference 74) and is technically well-refined. This result can be published as is or in combination with other data.

      (2) The manuscript also presents AFM evidence (Figure 4) that TFAM, which was long known to facilitate compaction of the mitochondrial genome (Alam et al., 2003; PMID 12626705 and follow-up citations), causes in vitro compaction of a small pUC19 plasmid and that approximately 3 UVC lesions per plasmid molecule result in a slight, albeit detectable, increase in TFAM compaction of the plasmid. Both results can be discussed in line with a possible extrapolation to in vivo phenomena, but such a discussion should include a clear statement that no in vivo support was provided within the set of experiments presented in the manuscript.

      We thank this reviewer for their careful reading and interpretation of the manuscript. We agree that discussion of in vivo implications and extrapolations need clear statements indicating where there is not currently in vivo support. We have updated the text throughout the paper to include this.

      Weaknesses:

      Besides the experiments presented in Figures 3 and 4, the other results neither support nor contradict the speculation that TFAM can play a protective role, eliminating mitochondrial genomes with bulky lesions by way of excessive compaction and removing damaged genomes from the in vivo pool.

      To specify these weaknesses:

      (1) Figure 1 - presents evidence that UVC causes a reduction in the number of mitochondrial spots in cells. The role of TFAM is not assessed.

      We are working to understand the role of TFAM in vivo following UV irradiation, but believe that work should be included in follow up studies rather than this publication.

      (2) Figure 2 - presents evidence that UVC causes lesions in mitochondrial genomes in vivo, detectable by qPCR. The role of TFAM in damage repair or mitochondrial DNA turnover is not directly assessed, despite the statements in the title of Figure 2 and in the associated text. An approximately 2-fold change in the expression of TFAM and of the three other genes does not provide reasonable support for the suggestion of increased mitochondrial DNA turnover over multiple other explanations related to mitochondrial DNA maintenance.

      We agree and have updated the title of Figure 2 to better reflect the findings outlined in the figure as well as the text.

      The new title is, “UVC causes mtDNA damage that decreases over time and is associated with upregulation of mtDNA replication genes, in the absence of apparent mitochondrial dysfunction.”

      We agree that there are numerous mechanistic hypotheses that could explain the decrease in mtDNA damage over time. In Figure 1, we show that there is an overall decrease in mtDNA spots, and an increase in mtDNA-lysosome colocalization, suggestive of mtDNA degradation, which could serve to remove damaged genomes. One possibility is that TFAM is playing a role in the damage removal (but not repair per se, as these lesions are not repaired). Another is changes in mtDNA turnover via increasing the replication machinery in order to synthesize non-damaged mtDNA molecules to dilute out damage. These and other possibilities are not mutually exclusive. We have added text (pages 8-9) to make explicit that additional work will be required to distinguish these possibilities. We note that we have also added an additional experiment showing that TFAM knockdown affects mtDNA damage at baseline, as well as after UVC exposure (Figure 5J).

      (3) Figure 5 shows that TFAM protects neither mitochondrial nucleoids formed in vitro nor mitochondrial DNA in vivo from UVC lesions, and that it has no effect on in vivo repair of UV lesions.

      We agree that Figure 5 shows that TFAM does not protect DNA from UVC-induced lesions, and that a roughly 2-fold increase in TFAM protein does not alter damage reduction over time. We have added new data showing that in vivo, knockdown of TFAM results in an increase in baseline (control conditions) mtDNA damage, and also alters the rate of decrease of mtDNA damage over time after UVC (Figure 5J).

      (4) Figure 6: Based on the above analysis, the model of the role of TFAM in sensing mtDNA damage and elimination of damaged genomes in vivo appears unsupported.

      We have updated the legend for Figure 6 in which we outline our hypothesized role of TFAM in sensing mtDNA damage to ensure that readers know this has yet to be fully tested in vivo. We have also updated the Figure legend title from “proposed model” to “hypothesized model,” and changed the wording in the conclusion section (page 11) to highlight more clearly that this is a working model.

      (5) Additional concern about Figure 3 and relevant discussion: It is not clear if more uniform TFAM binding to UV irradiated oligonucleotides with varying sequence as compared to non-irradiated oligonucleotides can be explained by just overall reduced binding eliminating sequence specific peaks.

      We do not believe this is the case given the similar KD values for the sequences tested. In our hands and in other publications (reviewed in PMID: 34440420), it has been well established that TFAM binds damaged DNA very well, essentially just as well as nondamaged DNA or better.

      Additionally, a reduction in overall binding on these DNA arrays tends to make sequence specific peaks more apparent. We ran our experiments at both 30 nM and 300 nM TFAM specifically to be able to assess this question. The 300 nM data can be found in supplemental Figure S7. In this figure, we notice that the peaks appear more uniform at the high concentration (comparing Figure 3A to Figure S7A). That is presumably because there is so much more binding happening across the array that the peaks associated with the strongest binders become less pronounced. For the sake of brevity, we have not added this reasoning to the text, but are willing to do so if the Reviewers and Editor feel that it is important to include.

      Reviewer #3 (Public review):

      Summary:

      The study is grounded in the observations that mitochondrial DNA (mtDNA) exhibits a degree of resistance to mutagenesis under genotoxic stress. The manuscript focuses on the effects of UVC-induced DNA damage on TFAM-DNA binding in vitro and in cells. The authors demonstrate increased TFAM-DNA compaction following UVC irradiation in vitro based on high-throughput protein-DNA binding and atomic force microscopy (AFM) experiments. They did not observe a similar trend in fluorescence polarization assays. In cells, the authors found that UVC exposure upregulated TFAM, POLG, and POLRMT mRNA levels without affecting the mitochondrial membrane potential. Overexpressing TFAM in cells or varying TFAM concentration in reconstituted nucleoids did not alter the accumulation or disappearance of mtDNA damage. Based on their data, the authors proposed a plausible model that, following UVC-induced DNA damage, TFAM facilitates nucleoid compaction, which may serve to signal damage in the mitochondrial genome.

      Strengths:

      The presented data are solid, technically rigorous, and consistent with established literature findings. The experiments are well-executed, providing reliable evidence on the change of TFAM-DNA interactions following UVC irradiation. The proposed model may inspire future follow-up studies to further study the role of TFAM in sensing UVC-induced damage.

      Weaknesses:

      The manuscript could be further improved by refining specific interpretations and ensuring terminology aligns precisely with the data presented.

      (1) In line 322, the claim of increased "nucleoid compaction" in cells should be removed, as there is a lack of direct cellular evidence. Given that non-DNA-bound TFAM is subject to protease digestion, it is uncertain to what extent the overexpressed TFAM actually integrates into and compacts mitochondrial nucleoids in the absence of supporting immunofluorescence data.

      We would like to thank this reviewer for their comments and suggestions. We feel these specific language changes have strengthened the interpretability of the text. The TFAM overexpression cells used in this experiment were given to us by Isaac et al., who demonstrated that when TFAM was overexpressed in this specific cell line, the nucleoids were indeed more compact, measured by Fiber-seq (Isaac et al., 2024; PMID: 38347148). We have removed the claim “increased compaction” from the section title, Figure 5 legend title, and from line 322 (now on page 8), and have also added an additional sentence to ensure the reader knows these cells have been shown to have presumed increased compaction by other groups.

      (2) In lines 405 and 406, the authors should avoid equating TFAM overexpression with compaction in the cellular context unless the compaction is directly visualized or measured.

      We have updated the text to ensure that it is clear that this was tested by other groups. We also changed the wording to “inaccessible (presumably compacted) nucleoids.” While we did not demonstrate altered compaction in our study, we think that based on the results from Isaac et al., it is likely that there was increased compaction. In addition, some readers might not have the context to make the connection between compaction and accessibility, so eliminating all reference to compaction could obscure the point.

      (3) In lines 304 and 305 (and several other places throughout the manuscript), the authors use the term "removal rates". A "removal rate" requires a direct comparison of accumulated lesion levels over a time course under different conditions. Given the complexity of UV-induced DNA damage, which involves both damage formation and potential removal via multiple pathways, a more accurate term that reflects the net result of these opposing processes is "accumulated DNA damage levels." This terminology better reflects the final state measured and avoids implying a single, active 'removal' pathway without sufficient kinetic data.

      We agree and have updated the language throughout the text as well as the results heading for this section.

      (4) In line 357, the authors refer to the decrease in the total DNA damage level as "The removal of damaged mtDNA". The decrease may be simply due to the turnover and resynthesis of non-damaged mtDNA molecules. The term "removal" may mislead the casual reader into interpreting the effect as an active repair/removal process.

      We agree and have restructured this sentence for clarity. We do believe there is some removal happening, given the increase in mtDNA colocalization in lysosomes alongside decrease of mtDNA spots in our live cell imaging. We have written it to reflect the inclusion of removal and resynthesis of nondamaged mtDNA molecules (see pages 8-9).

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers appreciate the quality of the presented data but concur that they do not support the primary claims in the title and abstract. The reviewers also realize that in vivo evidence for the model would require extensive new experimentation that goes beyond a reasonable revision. The recommendation is to change the title and significantly revise text, figure titles and legends for transparency, and conclusions within results and discussion sections.

      We thank the editor and all the reviewers for their feedback. We have added additional experiments, updated text throughout the entire paper to ensure our claims are supported, and revised our title. We feel that the changes we have made have indeed made the paper stronger, more transparent, and that the evidence put forth in this paper provides support for all claims made.

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify mitochondrial response kinetics by adding an intermediate (e.g., 12 hrs) recovery timepoint for transcriptional analysis to resolve when TFAM and replication genes are induced.

      We have added additional timepoints of 12 and 18 hours following exposure in Figure 2. These results strengthen our finding that the nuclear transcriptional program supporting mtDNA replication appears to be activated prior to the nuclear transcriptional program supporting mitochondrial transcription, in that POLG and TFAM come up before POLRMT and ND1.

      (2) Strengthen functional readouts by assessing additional parameters of mitochondrial function to substantiate the claim that UVC does not impair mitochondrial performance.

      We have referenced our previously-published data on mtROS and added a measurement of ATP following UVC exposure in Figure 2.

      (3) Consider exploring whether mtDNA degradation occurs via mitophagy, nucleoid-phagy, or another pathway, potentially by using inhibitors or markers of these processes.

      While we agree that this is an important follow up question and are currently working on experiments to address this, those experiments are outside the scope of this manuscript.

      (4) Provide additional details for the high occupancy TFAM sites. Provide brief annotation or discussion of genomic regions showing strong TFAM binding under non-irradiated conditions that are lost during UVC treatment. This would be helpful to the field as a whole.

      We have updated our discussion section to include this.

      (5) Include or discuss a control using UVC-irradiated pUC19 without TFAM to confirm that the observed compaction categories are TFAM-dependent rather than a UVC-induced DNA distortion.

      We have added in a supplemental figure (Figure S16) containing comparison of area analysis of control pUC19 and UV-irradiated pUC19 and we have added associated text in the results section of the paper.

      (6) It would be interesting to explore the link between compaction to transcriptional output. In the TFAM overexpression model, the authors could measure expression of mtDNA encoded transcripts (e.g., ND1, COX1) to connect increased compaction with altered mitochondrial transcription.

      While we agree that understanding how the compaction status alters mitochondrial transcription is worthwhile, we believe this is beyond the scope of this paper. Furthermore, this connection has previously been shown by Bruser et al., 2021 (PMID: 34818548), who showed that more compact nucleoids are not undergoing active transcription. It will be interesting to see in future work if mtDNA damage drives changes in both compaction and transcriptional activity.

      (7) Clarify quantitative presentation in figure 2F to explicitly note whether the observed increase in fluorescence intensity was statistically insignificant and confirm that the assay sensitivity is sufficient to detect small potential changes. As presented it is not clear if there is a change.

      We have changed the presentation of Figure 2F. There is a slight increase in membrane potential at the 24-hour time point, and we have made that clear in the text as well. We included FCCP as a (standard) positive control, for which we can detect the associated decrease in membrane potential. While it is always possible that a very small decrease occurred that we were unable to detect, we note that none of the six UVC-exposed groups that we tested even trended towards a decrease in MMP, making it less likely that there was an effect that we simply lacked the power or sensitivity to detect.

      (8) It would be interesting if the authors can comment on whether TFAM induced compaction after UVC might shield mtDNA from other, repairable lesions (e.g., oxidative or alkylation damage), offering a broader context for this mechanism beyond just UVC.

      In theory, we believe this is possible. It will also be interesting to see if the increased compaction following UVC also protects or shields the mtDNA from other enzymatic processes, such as repair proteins that may be searching for repairable lesions such as oxidative or alkylation damage. In this case, it seems as though the increased compaction would prevent the repair from happening at genomes harboring damage.

      In this study we show with our in vitro nucleoids that the increased compaction does not protect against UVC, but this is likely because UVC does not need physical access to the DNA in order to damage it, as the wavelengths of UVC (centered in this case at 254 nm) are readily absorbed by proteins and thus can go right through the proteins. Currently, we know that increased compaction by TFAM makes the DNA inaccessible to the enzymes required to methylate DNA used in Fiber-seq (PMID: 38347148), but we do not know if the compaction is tight enough to prevent ROS or alkylating agents from damaging the DNA. We have updated text in the discussion on page 10 to highlight some of these ideas.

      Reviewer #2 (Recommendations for the authors):

      Please, go over all display items and text and clarify details that can help readers to understand important specifics of the experiments. Examples are provided below:

      (1) Abstract and Introduction - indicate species and cell line

      We have updated the text to include this information.

      (2) Table 1 "TFAM KD measurements"- title and footnotes are entirely cryptic. Please, clarify the experimental design, question(s) addressed and conclusions drawn from data.

      We have updated the title of Table 1 to "Binding of TFAM to array sequences, measured using fluorescence anisotropy,” and clarified the footnotes to make sure it is clear which sequences were selected for AFM oligomerization experiments.

      (3) Figure 3 and Material and Methods - specify UVC dose.

      We have added this information to both the figure legend and the methods section.

      (4) Figure 4 - specify UVC dose.

      We have added this information to the figure legend.

      (5) Figure 5. Panel B indicate which band is TFAM and which is HA-tag; Indicate clearly which panel is showing in vivo or in vitro results.

      We have updated the figure to label the untagged TFAM and HA-tagged TFAM and changed the panel titles to specify if they are in vivo results.

    1. In what situations would impromptu speaking be used? Since we’ve already started thinking of the similarities between public speaking and conversations, we can clearly see that most of our day-to-day interactions involve impromptu speaking. When your roommate asks you what your plans for the weekend are, you don’t pull a few note cards out of your back pocket to prompt your response. This type of conversational impromptu speaking isn’t anxiety inducing because we’re talking about our lives, experiences, or something we’re familiar with. This is also usually the case when we are asked to speak publicly with little to no advance warning. For example, if you are at a meeting for work and you are representing the public relations department, a colleague may ask you to say a few words about a recent news story involving a public relations misstep of a competing company. In this case, you are being asked to speak on the spot because of your expertise. A competent communicator should anticipate instances like this when they might be called on to speak, so they won’t be so surprised. Of course, being caught completely off guard or being asked to comment on something unfamiliar to you creates more anxiety. In such cases, do not pretend to know something you don’t, as that may come back to hurt you later. You can usually mention that you do not have the necessary background information at that time but will follow up later with your comments.

      This reading explains that each delivery method—impromptu, manuscript, and memorized—has specific strengths and weaknesses depending on the speaking situation. I found it interesting that impromptu speaking, although anxiety-inducing, can actually strengthen public speaking skills because it forces speakers to think quickly and organize ideas on the spot. However, it also carries the risk of rambling or overstating knowledge. Manuscript delivery, on the other hand, offers precision and consistency, especially for complex information, but often reduces audience engagement because the speaker may sound like they are reading rather than speaking naturally.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work shows that resistance profiles to a variety of drugs are variable between different mycobacterial species and are not correlated with growth rate or intrabacterial compound concentration (at least for linezolid, bedaquiline, and Rifampicin). Note that intrabacterial compound concentration does not distinguish between cytosolic and periplasmic/cell wall-associated drugs. The susceptibility profiles for a wide range of mycobacteria tested under the same conditions against 15 commonly used antimycobacterial drugs provide the first recorded cross-species comparison, which will be a valuable resource for the scientific community. To understand the reasons for the high Rifampicin resistance seen in many mycobacteria, the authors confirm the presence of the arr gene, known to encode a Rif ribosyltransferase involved in Rif resistance in M. smegmatis, in the resistant mycobacteria after confirming the absence of on-target mutations in the RpoB RRDR. Metabolomic analyses confirm the presence of ribosylated Rif in some of the naturally resistant mycobacteria, which may not be entirely surprising but is an important confirmation. Presumably M. branderi is highly resistant despite lacking the arr homolog due to the rpoB S45N mutation. M. flavescens has an MIC similar to that of M. smegmatis, despite having both Arr-1 and Arr-X. Various Arr-1 and Arr-X proteins are expressed and characterized for catalytic activity, which shows that Arr-X is a faster enzyme, especially with respect to more hydrophobic rifamycins. M. flavescens has MIC values for Rifapentine and Rifabutin similar to those of M. smegmatis. Thus, the Arr-1 versus Arr-X comparison does not provide a complete explanation for the underlying reasons driving natural Rif resistance in mycobacteria. Downregulation of Arr-X expression in M. conceptionense confers increased sensitivity to Rifabutin, confirming its role as a rifamycin-inactivating enzyme.

      Overall, the comparison of cross-species susceptibility profiles is novel; the demonstration that MIC is not correlated with intracellular drug concentration is important but not sufficiently interrogated; the demonstration that Arr-X is also a Rif ADP-ribosyltransferase is a good confirmation, and the finding that it is more efficient than Arr-1 on hydrophobic rifamycins is interesting but maybe not entirely surprising. The manuscript seems to have two related parts, but the rifamycin modification aspect of the work is not strongly linked to the first part, since it interrogates the modification of one drug but not the common cause of natural resistance to other drugs.

      Reviewer #2 (Public review):

      Summary:

      The authors use a variety of methods to investigate the mechanisms of innate drug resistance in mycobacteria. They end up focusing on two primary determinants - drug accumulation, which correlates rather poorly with resistance for many species, and, for the rifamycins, ADP-ribosyltransferases. The latter enzymes do appear to account for a good deal of resistance, though it is difficult to extrapolate quantitatively what their relative contributions are.

      Overall, they make excellent use of biochemical methods to support their conclusions. Though they set out to draw very broad lessons, much of the focus ends up being on rifamycins. This is still a very interesting set of conclusions.

      Strengths:

      (1) A very interesting approach and set of questions.

      (2) Outstanding technical approaches to measuring intracellular drug concentrations and chemical modification of rifamycins.

      (3) Excellent characterization of variant rifamycin ADP-ribosyltransferases

      Weaknesses:

      (1) Figure 3c/d: These panels show the same experiment done twice, yet they display substantially different results in certain cases. For instance, M. smegmatis appears to show an order of magnitude lower RIF accumulation in panel d compared to M. flavescens, despite them displaying equal accumulation in panel c. The authors should provide justification for this variation, particularly as quantitative intra-species comparisons are central to the conclusions of this figure.

      The data in panels 3c and 3d are from different sets of experiments. The reviewer is correct with regard to M. smegmatis: the data indeed differ by ~1 order of magnitude. However, the data for the other species are very similar. The reviewer may also have noticed that the error bars are larger in 3d than in 3c, indicating greater variation between the independent experiments used in 3d. We do not have a good explanation for this, other than that the experiments shown in 3d were associated with greater biological variability.

      (2) There are several technical concerns with Figure 3 that affect how to interpret the work. According to the methods, the authors did not appear to normalize to an internal standard, only to an external antibiotic standard (which may account for some of the technical variation alluded to above).

      We agree that using a labeled drug as an internal standard (IS) would be ideal. However, the experiment initially followed an untargeted metabolomics approach, which later shifted to relative drug quantification. At that stage, normalizing with IS was impractical because proper implementation would require multiple IS across the chromatographic range. Therefore, we opted for total ion current (TIC) normalization, which accounts for variability in overall metabolite abundance—even though the experimental setup was already adjusted for each bacterial species’ growth rate. Additionally, we prepared external standard curves for each drug to enable quantification, and the amount of drug added to each plate was considered when reporting these values.
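      The two processing steps named above, TIC normalization and quantification against an external standard curve, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the authors' pipeline. The function names, the example intensities, and the use of an ordinary least-squares linear calibration are all hypothetical choices. The response also does not specify how the normalized values and the external curves were combined, so the two steps are shown separately.

```python
"""Hypothetical sketch: TIC normalization and external-standard quantification.

All names and numbers are illustrative placeholders, not data from the study.
"""
from typing import Dict, List, Tuple


def tic_normalize(sample: Dict[str, float], scale: float = 1.0) -> Dict[str, float]:
    """Divide each feature intensity by the sample's total ion current (TIC).

    With scale=1.0 each value becomes that feature's fraction of the TIC.
    A different scale (e.g. the mean TIC across samples) only rescales the result.
    """
    tic = sum(sample.values())
    if tic <= 0:
        raise ValueError("total ion current must be positive")
    return {feature: scale * value / tic for feature, value in sample.items()}


def fit_standard_curve(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit an ordinary least-squares line (slope, intercept) of response vs. concentration."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two standards")
    mean_x = sum(conc for conc, _ in points) / n
    mean_y = sum(resp for _, resp in points) / n
    sxx = sum((conc - mean_x) ** 2 for conc, _ in points)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((conc - mean_x) * (resp - mean_y) for conc, resp in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(response: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate a concentration from a measured response."""
    return (response - intercept) / slope


if __name__ == "__main__":
    # Illustrative sample: one drug feature plus two background features.
    print(tic_normalize({"drug": 200.0, "feature_a": 300.0, "feature_b": 500.0}))
    slope, intercept = fit_standard_curve([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (4.0, 8.0)])
    print(quantify(6.0, slope, intercept))
```

      TIC normalization makes intensities comparable across samples with different overall signal. A linear external calibration like this one is only appropriate within the concentration range where the detector response is linear.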

      Second, the authors used different concentrations of drug for each species to try to match the species' MICs. I appreciate the authors' thinking on this, but I think for an uptake experiment it would be more appropriate to treat with the same concentration of drug since uptake is likely saturable at higher drug concentrations. In the current setup, for the species with higher MIC, they have to be able to uptake substantially more antibiotics than the species with low MIC in order to end up with the same normalized uptake value in Figure 3d. It would be helpful to repeat this experiment with a single drug concentration in the media for all species and test whether that gives the same results seen here.

      We respectfully disagree with the reviewer. Experiments such as the one proposed by the reviewer work well when MIC values are a few fold apart, for strains of the same species, but have not been tested when MIC values are 100- to 1000-fold apart across different species. Furthermore, how would one interpret compound uptake at 1000-fold the MIC for one species and at the MIC for another? By using antibiotic concentrations at the respective MIC for each species, we are at least under conditions where we know the biological effect of the antibiotic is the same across species, based on its potency.

      (3) Figure 4f: This panel seems to argue against the idea that the efficacy of RIF ribosylation is what's driving drug susceptibility. M. flavescens is similarly resistant to RIF as M. smegmatis, yet M. flavescens has dramatically lower ribosylation of RIF. This is perhaps not surprising, as the authors appropriately highlight the number of different rif-modifying enzymes that have been identified that likely also contribute to drug resistance. However, I do think this means that the authors can't make the claim that the resistance they observe is caused by rifamycin modification, so those claims in the text and figure legend should be altered unless the authors can provide further evidence to support them. This experiment also has results that are inconsistent with what appears to be an identical experiment performed in Supplemental Figure 5b. The authors should provide context for why these results differ.

      In regard to enzyme efficiency, the apparent rates of the Arr-1 enzymes in converting RIF into ADP-ribosyl-RIF are relatively similar between species. However, Arr-X is much more efficient than Arr-1 in both M. flavescens and M. conceptionense, as indicated by the apparent rates measured and displayed in Figure 5c.

      Proteomics data shows that there is upregulation of Arr-1 and Arr-X upon rifampicin treatment in M. flavescens and M. conceptionense. However, the same experiment was not performed in Arr-1 KD. Therefore, we can’t verify through this approach if the activity observed in vivo directly correlates with a higher expression of Arr-X alone. Of note, likely both enzymes contribute to resistance to rifamycins, as per our results with the Arr-X KD and sensitization of M. conceptionense to RIF.

      Author response image 1.

      It is also worth mentioning that there are other enzymes in the pathway of RIF ribosylation whose efficiency is unknown (Author response image 2). Therefore, ADP-ribosyl-RIF is not an "end-metabolite" and may not be the sole determinant of RIF resistance via ADP-ribosylation. Downstream enzymes may also account for the difference observed between M. flavescens and M. smegmatis.

      Author response image 2.

      It is correct that the Rifampicin MIC for M. flavescens is the same as M. smegmatis.

      (4) Fig 4f/5c: M. flavescens has both Arr-1 and Arr-X, yet it appears to not have ribosylated RIF. This result seems to undermine the authors' reliance on the enzyme assay shown in Fig 5c - in that assay, M. flavescens Arr-X is very capable of modifying rifampicin, yet that doesn't appear to translate to the in vivo setting. This is of importance because the authors use this enzyme assay to argue that Arr-X is a fundamentally more powerful RIF resistance mechanism than Arr-1 and that it has specificity for rifabutin. However, the result in Figure 4f would argue that the enzyme assay results cannot be directly translated to in vivo contexts. For the authors to claim that Arr-X is most potent at modifying rifabutin, they could test their CRISPRi knockdowns of Arr-X and Arr-1 under treatment with each of the rifamycins they use in the enzyme assay. The authors mentioned that they didn't do this because all the strains are resistant to those compounds; however, if Arr-X is important for drug resistance, it would be reasonable to expect to see sensitization of the bacteria to those compounds upon knockdown.

      The reviewer is reading Fig. 4f incorrectly, probably because it is plotted in a linear scale instead of logarithmic scale. Ribosylated Rif is present in M. flavescens, just at lower levels than M. conceptionense and M. smegmatis. In species where there is no Arr-1 or Arr-3, ribosylated RIF is not detected at all (e.g. M. tuberculosis), i.e., concentration is zero. Therefore, any detection of ribosylated RIF can be considered significant. In addition, as mentioned before, ADP-ribosylation of RIF is not the final product of the reaction and further studies need to be undertaken to understand subsequent reactions.

      (5) Figure 5d: The authors use this CRISPRi experiment to claim that ArrX from M. conceptionense is more potent at inactivating rifabutin than Arr-1. This claim depends on there being equal degrees of knockdown of Arr-1 and Arr-X, so the authors should validate the degree of knockdown they get. This is particularly important because, to my knowledge, nobody has used this system in M. conceptionense before.

      We agree with the reviewer that qPCR should have been performed to define the extent of interference in the strains generated. Unfortunately, qPCR was not performed on the strains tested to confirm the extent of downregulation. Although validating the knockdown is best practice, there is no indication that the effect observed is due to nonspecific downregulation: the genetic environment in which Arr-X is positioned differs from that of Arr-1, and the targeting oligonucleotides are specific and would not bind promiscuously to Arr-1. That said, this is indeed a shortcoming of our setup.

      (6) The authors' arguments about Arr-X and Arr-1 would be strengthened by showing by LC/MS that Arr-X knockdown in M. conceptionense results in more loss of ribosyl-rifabutin than knockdown of Arr-1.

      We agree with the reviewer that performing the LC-MS analysis of the Arr-X knockdown would have strengthened the argument of our paper. Unfortunately, this experiment was not performed.

      Reviewer #3 (Public review):

      This manuscript presents a macroevolutionary approach to the identification of novel high-level antibiotic resistance determinants that takes advantage of the natural genetic diversity within a genus (mycobacteria, in this case) by comparing antibiotic resistance profiles across related bacterial species and then using computational, molecular, and cellular approaches to identify and characterize the distinguishing mechanisms of resistance. The approach is contrasted with "microevolutionary" approaches based on comparing resistant and susceptible strains of the same species and approaches based on ecological sampling that may not include clinically relevant pathogens or related species. The potential for new discoveries with the macroevolution-inspired approach is evident in the diversity of drug susceptibility profiles revealed amongst the selected mycobacterial species and the identification and characterization of a new group of rifamycin-modifying ADP-ribosyltransferase (Arr) orthologs of previously described mycobacterial Arr enzymes. Additional findings that intra-bacterial antibiotic accumulation does not always predict potency within this genus, that M. marinum is a better proxy for M. tuberculosis drug susceptibility than the commonly used saprophyte M. smegmatis, and that susceptibility to semi-synthetic antibiotic classes is generally less variable than susceptibility to antibiotics more directly derived from natural products strengthen the claim that the macroevolutionary lens is valuable for elucidating general principles of susceptibility within a genus.

      There are some limitations to the work. The argument for the novelty of the approach could be better articulated. While the opportunities for new discoveries presented by the identification of discrepant susceptibility results between related species are evident, it is less clear how the macroevolutionary approach is further leveraged for the discovery of truly novel resistance determinants. The example of the discovery of Arr-X enzymes presented here relied upon foundational knowledge of previously characterized Arr orthologs. There is little clarity on what the pipeline for identifying more novel resistance determinants would look like. In other words, what does the macroevolutionary perspective contribute to discovery from the point of finding interspecies differences in susceptibility? Does the framework still remain distinct from other discovery frameworks and approaches? If so, how?

      Thanks for pointing this out, as this is a critical feature of our study and method. Our approach relies on inter-species comparative genomics and phenotypes, and it is therefore distinct from inter-strain comparison. The difference is dramatic, and it becomes clearer when comparing the core genome of M. tuberculosis (a single species), about 92%, with the core genome of the genus, about 1%. While we focus on rifamycin in this manuscript, future manuscripts will investigate many of the other dozens of "inconsistencies" observed between the genetic makeup of different mycobacterial species and their actual performance in the presence of different antibiotics.

      While the experimentation and analyses performed appear well-designed and rigorous, there are a few instances in which broad claims are based on inferences from sample sets or data sets that are too limited to provide robust support. For example, the claim that rifampicin modification, and precisely ADP-ribosylation, is the dominant mechanism of resistance to rifampicin in mycobacteria may be a bit premature or an over-generalization, as other enzymatic modification mechanisms and other mechanisms such as helR-mediated dissociation of rifampicin-stalled RNA polymerases, efflux, etc were not examined nor were CRISPRi knockdown experiments conducted beyond an experiment to tease out the role of Arr-X and Arr-1 in one strain. The general claim that intra-bacterial antibiotic accumulation does not predict potency in mycobacteria may be another over-generalization based on the limited number of drugs and species studied, but perhaps the intended assertion was that antibiotic accumulation ALONE does not predict potency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) The metabolomics is done using mycobacteria grown on filters. Initially, mycobacterial cells are grown on the filters for 5 doublings before being transferred to drug-containing (or free) agar for one doubling. Is this based on calculated doubling time in liquid culture or a true determination of the fact that the biomass increases to what would amount to 5 doublings?

      The doubling time used is the one determined in liquid media. Although growth kinetics on solid media may differ slightly from those in liquid (±10%), this experimental design is well established for M. tuberculosis (since Proc Natl Acad Sci U S A. 2010 May 25;107(21):9819-24.) and M. smegmatis (unpublished). We therefore used the growth rate as a proxy for obtaining the same biomass of cells for each species tested. A maximum difference of 10% was observed between M. tuberculosis growth in liquid and on solid media; however, cells grow exponentially for much longer on filters. This makes filter-based experiments more reliable, as few growth-phase-derived differences are present.

      (2) The demonstration that intrabacterial drug concentrations vary between mycobacterial species in a manner not related to MIC for at least LZD and RIF, is an important finding. However, intrabacterial does not mean cytoplasmic since a considerable fraction could be present in the periplasmic/cell wall layers. Ideally, this would need to be determined but would of course be a massive undertaking since the method needs validation & optimization for each mycobacterial species. Nevertheless, this has to be mentioned. In addition, three drugs are limiting. Measuring additional drug concentrations in these 5 mycobacteria would at least establish some confirmation about the extent of this lack of correlation. Thus, could the authors measure concentrations of additional drugs with intracellular targets?

      Testing additional drugs could be beneficial and would be an expansion of our paper; it is definitely in our future plans for studies focusing on the other antibiotics described here. It would also provide new insights into other possible mechanisms of resistance in mycobacterial species. However, in this study we aimed first to determine the antibiotic response profile of different mycobacterial species, and once we identified interesting resistance phenotypes that could not be readily explained by known mechanisms of resistance, we narrowed our focus to the drugs and species most likely to provide insights into new mechanisms of antibiotic resistance. Finally, exploring drug concentrations across multiple bacterial compartments is a daunting task that has not been done extensively in any species, let alone in multiple species, many of which still lack any study of their cell envelope.

      (3) CRISPRi was used to reduce transcription in M. conceptionense. What was the level of gene downregulation?

      As mentioned previously, a limitation of our setup is that the level of knockdown was not measured in this instance.

      Minor comments:

      (1) The introduction mentions the fast and slow-growing mycobacteria which are classified based on the time that it takes to observe colonies on solid agar. However, in liquid medium, there is less correlation between the reported growth on agar and doubling time in liquid (Figure 1b, Figure 2d). This could be mentioned in the results section. In Figure 2d, the filled circles represent fast-growers but this does not hold well for liquid culture and it might make more sense to not distinguish between fast- and slow-growers in these graphs. A small complication would also be the fact that the doubling time represents growth in a liquid medium with Tyloxapol as a detergent whereas the MIC and metabolomics are done on solid agar with no detergent. The metabolomics is done after a doubling but for those where agar growth and liquid growth have large discrepancies in growth rate, there could be some differences.

      Apologies for this misunderstanding. Fast- and slow-growth phenotypes are determined in Lowenstein-Jensen (LJ) agar, not in 7H10 agar (used in our study and most studies of mycobacteria). Furthermore, this is a qualitative definition, not a quantitative one. Therefore, our measurements do not need to correlate with fast- and slow-growth phenotypes, unless we had used that one specific medium. Furthermore, in liquid medium, we determined growth rate directly, which is never done with LJ medium.

      In addition to adding the same amount of cells to each filter, we also perform TIC normalization, which should account for how rich the samples were – and therefore how much material we had. Therefore, we do not observe discrepancies due to differences in growth rate and the presence/absence of detergent in the media.

      It is also worth mentioning that this experimental set up has been well established in many M. tuberculosis labs that study metabolism. Importantly, the use of detergent drastically affects mass spectrometry, and therefore cannot be used.

      (2) Figure 1g in the text should be Figure 1f.

      Apologies, it has been fixed.

      (3) Figure S1 would be ideal to have in (supplementary) table format.

      This data is now being provided in a table format.

      (4) Table S1 - ethambutol misspelt.

      Spelling has been corrected.

      (5) MIC for species such as M. abscessus could depend on medium (7H9-based medium can give different MIC values than CAMH).

      Indeed, different media can significantly change MIC values, and this is true for many bacterial species, if not all. For this study, we used only species that could be grown in 7H9 broth containing 10% ADC, 0.05% glycerol and 0.05% tyloxapol, and on 7H10 plates containing 10% OADC and 0.05% glycerol. MIC99 was determined on the latter, as we found it more efficient and robust to perform our tests on solid media. The goal of our experiment was not to determine the "true" MIC for the antibiotics tested, as this value does not exist. It was to find a lack of correlation between relative values and the presence of genes that could account for it.

      (6) The statement "the experiment was performed at a concentration of antibiotic equal to its MIC" initially seems confusing. It was not equal to the MIC but performed at 6-fold the respective MIC of the species in question. Maybe re-phrasing this would help.

      Apologies for this oversight. It has been corrected.

      (7) Note that some mutations outside the RRDR (eg. V170F and I491F) can also cause Rif resistance.

      Author response image 3.

      (A) Rainbow diagram of the RpoB X-ray structure coloured according to sequence conservation. Dark purple indicates high conservation, whereas dark orange indicates low conservation. RIF (shown in magenta) is bound to RpoB. The zoomed view shows that the RIF-binding pocket is considerably conserved. (B) The rpoB gene has an 81 bp region, the Rifampicin Resistance Determining Region (RRDR), that is known to be important for RIF binding and is where most mutations occur in drug-resistant TB. The sequence alignment shows that the RRDR is conserved, with the exception of M. branderi, which has an Asn instead of a Ser residue at position 456 (numbering relative to the M. tuberculosis sequence), highlighted in bold.

      Attached is a structural alignment of RpoB from the species highlighted in this paper. Although there is variability within the sequences, as also shown by the conservation analysis in Author response image 3, the residues that have been implicated in resistance (including V170 and I491) are conserved. The alignment is provided as a .fasta file that can be opened in Jalview.

      (8) Discuss how the RpoB S450N mutation in M. branderi confers the observed level of resistance.

      That’s a great point, thank you. Now it reads as:

      “The rifampicin (RIF) binding pocket is generally conserved, but Mycobacterium branderi has an S450N mutation in the RRDR region. While this specific mutation hasn't been found in clinical isolates, it's located at the binding site and may confer resistance (273). Although both serine (S) and asparagine (N) have similar side chains, related mutations like S450Q have been linked to resistance (156). Thus, M. branderi may be RIF-resistant due to this mutation. In contrast, M. conceptionense, M. flavescens, and M. smegmatis show no target sequence differences that explain their resistance”

      (9) The statement that the three tested NTM are sensitive to rifabutin ("resistant to all rifamycins except for rifabutin") needs to be interpreted considering what sensitivity means. The MIC is still high (1.6-3.1 ug/mL) when compared to that of Mtb. The 2-fold differences in MIC between M. smegmatis and M. conceptionense do not really prove or disprove the role of Arr-X in rifabutin resistance.

      We revised the sentence to use more careful language. We agree, but it is worth mentioning that for most bacteria, susceptibility breakpoints are set by the CLSI: each bacterial species has a range that is considered sensitive or resistant, but these are not available for the species used in this study. In general, bacteria with MIC values above 8 µg/mL are considered resistant to rifampin (J Antibiot 2014 67:625).

      (10) Figure 1d: It's hard to quantify the sensitivity of the plates. Can this be done by MIC? Was only rifabutin tested or also rifampicin?

      The initial experiments described in the paper were all performed using rifampicin only. The MICs for the remaining rifamycins were then determined for M. smegmatis, M. flavescens and M. conceptionense, and can be found in Supplementary Table 4. Figure 5d illustrates the effect of the knockdown on M. conceptionense sensitivity to rifabutin.

      (11) Is there data to show the ADP-ribosylation of rifabutin in M. conceptionense and the CRISPRi strains?

      Unfortunately, we did not perform LC-MS analysis on M. conceptionense CRISPRi strains exposed to rifabutin to measure potential ADP-ribosylation.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be useful if the authors would complete Figure 1A by determining growth rates for the remaining 18 strains that they currently omitted.

      These growth rates were obtained using roller bottles in at least 3 independent experiments; unfortunately, the throughput is far from ideal. The goal of the experiment was to highlight differences in growth rate beyond fast and slow growth, which we did. Adding the remaining values would not change this conclusion: growth rate variation in 7H9 is significant, and the point is made in our figure.

      (2) The authors should justify their choice of species used in Figures 3-4. It would be useful to know, for instance, if the authors chose these species in an unbiased fashion, or if they were chosen because the authors had already determined that they possess rifamycin-modifying enzymes of interest. In that case, they wouldn't necessarily be a representative sample to use for the correlation analysis of antibiotic uptake and potency in Figure 3.

      They were chosen because of their resistance profile for BDQ, LZD and RIF. This has been addressed in the text, which now reads “Given the antibiotic response profiles observed, we selected BDQ, LZD and RIF to explore the molecular causes of these dramatic changes in antibiotic potency observed across the Mycobacterium genus.”

      (3) Figure 4b: The data in this panel appear inconsistent - for instance, M. houstonense appears to grow at 10X Mtb MIC, but fails to grow at 1X Mtb MIC. Repeating this experiment would better establish the validity of the authors' claims about the relative susceptibility of these strains to RIF.

      The figures were rotated when exported from Illustrator. The corrected figure has been uploaded, and the original plate photos have also been uploaded for clarity.

      (4) Figure 4e: Does Arr-X get upregulated in these proteomic datasets? The authors' argument that proteomic upregulation correlates with important drug resistance genes would imply that it might be, so that would be useful information to provide.

      Arr-X is slightly upregulated, but not statistically significant – this could be due to the native expression of Arr-1. Data is displayed in a previous answer.

      (5) I wasn't able to find the supplementary tables that the authors allude to - not sure if that was a file mixup, but those tables would be useful for interpreting the manuscript.

      We are sorry that you could not access the tables. This must have been a file corruption issue, as the other reviewers were able to access them. We will make sure that all tables are available and accessible.

      (6) For LC/MS, the authors use peak height instead of peak area, which they argue correlates better with the amount of drug in cells because of the poor peak shape they observed for linezolid. This is not standard practice, so the authors should provide evidence to support this claim by running an LC/MS standard curve, then showing the correlation between peak height and amount of compound added as well as the correlation between peak area and compound.

      Thank you for pointing that out; we have calculated and displayed the accuracy. Both peak area and peak height can be used, although area is indeed standard practice.
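      To make the comparison requested by the reviewer concrete, here is a sketch under stated assumptions: given a standard curve of known drug amounts measured as both peak height and peak area, fit each response with a straight line, then report R² and the back-calculated accuracy (percent of nominal) at each level. The helper names and the numbers below are hypothetical; they are not the authors' data or software.

```python
import numpy as np


def fit_standard_curve(amounts, responses):
    """Least-squares line through a standard curve; returns slope, intercept, R^2."""
    x = np.asarray(amounts, dtype=float)
    y = np.asarray(responses, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


def accuracy_percent(amounts, responses):
    """Back-calculated amount as a percentage of the nominal (known) amount."""
    slope, intercept, _ = fit_standard_curve(amounts, responses)
    back = (np.asarray(responses, dtype=float) - intercept) / slope
    return 100.0 * back / np.asarray(amounts, dtype=float)


# Hypothetical standards (nonzero amounts) measured as peak height and peak area.
amounts = [1.0, 2.0, 4.0, 8.0]
heights = [10.2, 19.8, 40.5, 79.6]
areas = [30.0, 66.0, 115.0, 250.0]

for label, resp in (("height", heights), ("area", areas)):
    _, _, r2 = fit_standard_curve(amounts, resp)
    print(label, round(r2, 4), np.round(accuracy_percent(amounts, resp), 1))
```

      A design note: R² alone can look excellent even when the lowest standards are back-calculated poorly, which is why per-level accuracy is the more informative check when choosing between height and area.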

      (7) The authors should provide methods information about the LC column and the gradient settings used for LC-MS, as well as the settings of the MS.

      The full method has been added to the paper.

      Reviewer #3 (Recommendations for the authors):

      I have only minor comments aside from the information in the Public Review:

      (1) Results, section on Intra-bacterial antibiotic accumulation, line 8: "experiment was performed at a concentration of antibiotic PROPORTIONAL to its MIC" would be more accurate?

      Agreed and adjusted according to Reviewer’s suggestion.

      (2) Results, section on A minor role for pre-existing target modification, last sentence: the mere presence of RIF-ribosylating enzymes does not, in and of itself indicate that "RIF modification, and precisely ADP-ribosylation, is the dominant mechanism of resistance to RIF in mycobacteria", as other mechanisms and other forms of modifying enzymes are known to confer rifamycin resistance, with redundancy (e.g., other rifampicin-modifying enzymes, or helR-mediated dissociation of rifampicin-stalled RNA polymerases from DNA). It would be more appropriate to suggest the results presented to this point indicate RIF modification is common among mycobacteria. The evidence from the CRISPRi knockdown of Arrs shown in Fig 5d is the kind of evidence that suggests ribosylation as a dominant mechanism, at least against rifabutin in this particular species.

      Absolutely, there are other possible modifying enzymes that could be encoded by these mycobacterial species. There is a possibility that M. flavescens and M. smegmatis encode for a putative helR (attached alignment) but further experiments would need to be carried out to confirm its ability to displace RIF in the RNAP. Interestingly, the presence of both Arr and HelR has been studied in M. abscessus and those mechanisms of resistance are independent from each other (Molecular Cell 2022 82(17):3166-3177.e5).

      (3) Discussion, 2nd sentence needs grammatical editing.

      Rephrased and it reads “Using our mycobacterial library, we identified for the first time high- and ultra-high-level intrinsic resistance (3) to many of the antibiotics tested. Of note, the resistant phenotype is naturally occurring and not a result of mutations due to exposure to the antibiotic in the clinic – which is the more traditional approach for probing mechanisms of antibiotic resistance. Our observations revealed that resistance profiles are highly variable across the genus and do not follow phylogeny, implicating HGT as the key mechanism for acquisition of resistance determinants and evolution of antibiotic resistance in mycobacteria (42).”

      (4) Discussion, page 7, first line: the inclusion of LZD and BDQ in this statement seems at odds with Figure 2c and the statements in the first paragraph of page 5 highlighting these as examples of drugs to which most mycobacteria are susceptible.

      Indeed, many of the species are susceptible; however, the MIC99 levels observed have never been reported before, and we therefore found them worth highlighting. From a treatment perspective, knowing which species are sensitive to which drugs is of course the most useful outcome of our study.

      (5) The next sentence..."We found that resistance to these antibiotics in mycobacteria cannot be explained by uptake/efflux mechanisms..." is a bit of an over-generalization and conflicts with the evidence presented earlier that efflux could be playing a role in BDQ resistance and the published evidence establishing a clinically significant role for efflux-mediated BDQ resistance in M. tuberculosis, M. avium complex and M. abscessus complex.

      We rephrased it to make it more specific to our findings. It reads "We found that resistance to these antibiotics in mycobacteria does not correlate with uptake/efflux in the species tested, nor does it correlate with growth rate. Identification of mycobacterial species highly resistant to BDQ and LZD is worrisome, as most of these species, if not all, have never been exposed to these drugs."

      (6) Methods, section on In vitro activity assay of Arr enzymes, line 1: reference(s) should be provided for previously reported methods.

      Reference now added.

      (7) Figure 2d: the low end of the susceptibility range is not well defined.

      In this figure, susceptibility is not defined as the lowest area of the graph, but the lower concentrations are indeed harder to define. We hope that Supplementary Figure 1 and the additional table containing the MICs address this comment.

      (8) Figures 3c,d: the presentation of the relative antibiotic concentrations could be harmonized between the graphs in 3c and those in 3d to enable a more ready comparison.

      We disagree. The goal of these different panels is exactly to illustrate two distinct points. C gives the relative concentration of antibiotic, while D correlates relative concentration with MIC99. The use of log scale in D further clarifies that there is no correlation between intracellular antibiotic concentration and potency (MIC). This information is not present in C.

      (9) Figure 4f and Supplementary Figure 5b: it is difficult to understand the limited amount of ribosyl-RIF in M. flavescens in Fig 4f relative to Supplementary Figure 5b (esp. when considering M. smeg as a common comparator); and, further, to understand the seeming lack of correlation between RIF susceptibility, ribosylation and Arr number and catalytic efficiency for these two strains without considering additional resistance mechanisms.

      In reality, the difference between Figure 4f and Supplementary Figure 5b is mainly due to M. smegmatis, which shows an apparently lower production of ribosyl-RIF in the experiment described in the supplementary figure. The values for M. flavescens are relatively similar. In addition, ADP-ribosyl-RIF is not the final metabolite of the pathway.

      Regarding the complete picture, it is true that we were unable to fully unravel and correlate the MIC, expression of Arr-1, expression of Arr-3, efficiency of each enzyme, production of ADP-ribosyl-RIF and the presence of other possible mechanisms of resistance. This is indeed a limitation of our study, as it is of most published studies, which usually focus on a single resistance determinant.

    1. Author response:

      The following is the authors’ response to the original reviews

      Many thanks for your helpful and constructive comments for our work examining the effect of inhibiting both the insulin receptor (IR) and IGF1 receptor (IGF1R) in the podocyte. We are pleased to submit an updated manuscript addressing your concerns.

      (1) A major concern was a lack of mechanistic insight into how deletion (or knock-down) of both receptors caused the spliceosomal phenotype (Reviewer 1 and Reviewer 3).

      We now think this is due to the lack of a network of insulin/IGF phospho-signalling events to a variety of spliceosomal proteins and kinases. The reasons for this are as follows:

      A. Since submitting our paper, Turewicz et al. have published a comprehensive phosphoproteomic study examining the effects of 100nM insulin on human primary myotubes (DOI: 10.1038/s41467-025-56335-6). They found that multiple post-translational phosphorylation events occur in a variety of spliceosomal proteins at different time points (1 minute to 60 minutes). Furthermore, they show that mRNA splicing is rapidly modified in response to insulin stimulation in their cells. This follows elegant work from Batista et al., who studied iPSC-derived human myocytes from diabetic and non-diabetic donors and also detected a spliceosome phosphorylation signature (DOI: 10.1016/j.cmet.2020.08.007).

      B. We have examined phosphoproteome changes in wild-type podocytes (expressing both the IR and IGF1R) compared with double (IR and IGF1R) knockout cells using phosphoproteomics. We did this 3 days after inducing receptor knockdown, before major cell loss, and stimulated the cells with either 10nM insulin or 100mg IGF1.

      Interestingly, we detected several post-translational modifications (PTM) in our data set that are also present in Turewicz’s studies. Of note, 100nM insulin (as used by Turewicz) will signal through both the insulin and IGF1 receptor (and hybrid Insulin/IGF1 receptors) which is relevant to our studies.

      Our work shows a cascade of phospho-signalling events affecting multiple components of the spliceosomal complex, along with evidence of kinase modulation (phosphorylation) (new Figure 7 and Supplementary Figure 5; see also the new Results section, lines 391-425 in the track-changes version). We acknowledge that we studied only a single time point after stimulation (10 minutes) and could have missed other PTMs in the spliceosomal complex and other kinases. This is mentioned in our new Limitations of the Study section (lines 595-606) and will be a focus of future work. We did not find major PTM differences between insulin and IGF1 stimulation, and we suspect that the doses of insulin (10nM) and IGF1 (100mg) used are still able to signal through both receptors.

      Furthermore, we have examined the relative contributions of the insulin and IGF1 receptor in detail in the model (addressed in point 13 below).

      (2) The phenotype of the mouse is only superficially addressed. The main issues are that the completeness of the mouse KO is never assessed nor is the completeness of the KO in cell lines. The absence of this data is a significant weakness. (Reviewer 1)

      We apologise for not making this clear, but we did assess the level of receptor knockdown in both the animal and cell models. The in vivo model showed variable and incomplete podocyte knockdown of the insulin receptor and IGF1 receptor (Supplementary Figure 1C). This is why we made the in vitro floxed podocyte cell lines, in which we could robustly knock down both the IR and IGF1R, as shown by Western blotting (Figure 2A). We agree that calling the models knockouts is misleading and have now changed all instances to knockdown (KD).

      (3) The mouse experiments would be improved if serum creatinine levels were measured to provide some idea of how severe the kidney injury is. (Reviewer 1)

      There is variability in creatinine levels, which is not uncommon in transgenic mouse models (probably partly due to variable receptor knockdown with the Cre-lox system). This is part of the rationale for developing the double receptor knockout cell models, in which we robustly knocked out both receptors by >80%. We have added creatinine levels measured in a subset of mice (new Supplementary Figure 1E) and mention this in the text (lines 285-286). As some mice died, we expect that they may have developed acute kidney injury, but we did not serially measure creatinine in every mouse over time. We could have assessed GFR more sensitively to look for differences. However, we consider that the highly significant albuminuria and histological damage observed in our models demonstrate a significant kidney phenotype.

      (4) An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful. If this didn't work, an explanation in the text would suffice. (Reviewer 1).

      We did consider doing this, but on reflection we think it is very unlikely to rescue the phenotype, as an array of different spliceosomal proteins changed quantitatively and were differentially phosphorylated or dephosphorylated throughout the complex (as we hope our revised work now illustrates). A single-protein rescue is therefore highly unlikely to work. We hope this is an appropriate explanation for not performing this experiment, and we now mention it in the Discussion (lines 601-602).

      (5) As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on. (Reviewer 1).

      Thank you for this suggestion. We did not extensively examine the metabolism of the mice; however, we did measure blood glucose and body weight, which are included in the paper (Figure 1A and Figure 1B).

      (6) The authors should caveat the cell experiments by discussing the ramifications of studying the 50% of the cells that survive vs the ones that died. (Reviewer 1).

      We appreciate this point. It was the rationale for studying cells after 3 days of differentiation for total and phospho-proteomics: this precedes significant cell loss and so avoids studying only the 50% of cells that survive (by 7 days). We have made this clearer in the manuscript and have also added data showing less cell death at 3 days in the cell model (new Supplementary Figure 2B).

      (7) It would be helpful to say that tissue scoring was performed by an investigator masked to sample identity. (Reviewer 2)

      Scoring was performed masked to sample identity, and we have added this to the manuscript (line 113).

      (8) Data are presented as mean/SEM. In general, mean/SD or median/IQR are preferred to allow the reader to evaluate the spread of the data. There may be exceptions where only SEM is reasonable. (Reviewer 2)

      All graphs have now been changed to SD rather than SEM.
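      For readers weighing the SD-versus-SEM point, the two are related by SEM = SD/√n: SD describes the spread of individual measurements, whereas SEM describes the precision of the mean and shrinks as n grows even when the spread does not. A minimal Python sketch using the standard-library statistics module; the values are hypothetical and are not data from the study.

```python
import math
import statistics


def summarize(values):
    """Return (mean, SD, SEM) for a list of replicate measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, (n - 1) denominator
    sem = sd / math.sqrt(n)        # SEM = SD / sqrt(n): shrinks as n grows
    return mean, sd, sem


# Hypothetical replicate values, for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd, sem = summarize(values)
print(f"mean={mean:.2f} SD={sd:.2f} SEM={sem:.2f}")  # mean=5.00 SD=2.14 SEM=0.76
```

      With eight values the SEM bar is under half the width of the SD bar, which is why SD gives the reader a better sense of spread.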

      (9) It would be useful for the reader to be told the number of overlapping genes (with similar expression between mouse groups) and the results of a statistical test comparing WT and KO mice. The overlap of intron retention events between experimental repeats was about 30% in both knockout podocyte lines. This seems low, and I am curious to know whether this is typical for this method; a reference could be helpful. (Reviewer 2)

      This is an excellent question. The overlap was 30% because the parameters used for the analysis were very stringent. We suspect we could obtain more than 30% overlap with less stringent parameters, and the additional events could still be considered similar, if requested. Our methods were based on FLAIR analysis (PMID: 32188845), and we have added this reference to the manuscript (lines 242 and 680).
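      The response does not state how overlap between repeats was defined. As a hypothetical illustration only, one common definition is the Jaccard index (events shared by both repeats divided by all events detected in either), assuming each intron retention event is keyed by gene and intron coordinates; this is not necessarily the calculation used with FLAIR. Under this definition, a stringent threshold that drops a borderline event from one repeat but not the other removes it from the intersection while leaving it in the union, which lowers the overlap.

```python
def replicate_overlap(rep1: set, rep2: set) -> float:
    """Jaccard overlap: shared events / events found in either repeat."""
    union = rep1 | rep2
    if not union:
        return 0.0
    return len(rep1 & rep2) / len(union)


# Hypothetical events keyed as (gene, chromosome, intron_start, intron_end).
rep1 = {("GeneA", "chr1", 100, 200), ("GeneB", "chr2", 300, 450),
        ("GeneC", "chr3", 50, 90)}
rep2 = {("GeneA", "chr1", 100, 200), ("GeneC", "chr3", 50, 90),
        ("GeneD", "chr5", 10, 80), ("GeneE", "chr7", 5, 60)}

print(replicate_overlap(rep1, rep2))  # 2 shared of 5 total events -> 0.4
```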

      (10) With GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that insulin and IGFR signaling play important roles in other cells of the kidney, so there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism; the major limitation is the lack of information regarding the completeness of the KOs. If, for example, they can determine that in the mice the KO is complete and the GFR is relatively normal, then the phenotype they describe is relatively mild. (Reviewer 1)

      Thank you. The receptor knockout (KO) in the mice is highly unlikely to be complete (please see comments above and Supplementary Figure 1C). Many “KO” animal models targeting other tissues show that complete KO of these receptors is difficult to achieve, particularly for the IGF1 receptor. In the brain, which also contains terminally differentiated cells, barely 50% IGF1R knockdown was achieved in the target cells (PMID: 28595357). In ovarian granulosa cells (PMID: 28407051), several tissue-specific drivers were tried, but none achieved better than 80% knockdown; that paper states that 10% of IGF1R is sufficient for function in these cells, so the authors conclude that their knockdown animals are probably still responding to IGF1. Finally, in our recent IGF1R podocyte knockdown model we found that Cre levels were important for excision of a single homozygous floxed gene (PMID: 38706850), so we were not surprised that excising two homozygous floxed genes (insulin receptor and IGF1 receptor) was challenging. This was the rationale for making the double receptor knockout cell lines, to understand the processes and biology in more detail. As stated earlier, we have changed our description of the mice and cell lines from knockout to knockdown throughout the revised manuscript, as this is more accurate.

      (11) For the in vivo studies, the only information given is for mice at 24 weeks of age. There needs to be a full time course of when the albuminuria was first seen and the rate of its development. Also, GFR was not measured. Since the podocin-Cre utilized was not inducible, there should be a determination of whether there was a developmental defect in glomeruli or podocytes. Were there any differences in either prenatal or postnatal development or in the number of glomeruli? (Reviewer 3)

      We have added further urinary albumin:creatinine ratio (uACR) data at 12, 16 and 20 weeks to the manuscript. We do not think there was a major developmental phenotype, as albuminuria did not become significantly different until several months of age (new Supplementary Figure 1B). We did consider using a doxycycline-inducible model, but its excision efficiency is much lower than that of the constitutive podocin-Cre-driven model (Author response image 1). This would likely give a very mild phenotype, if any, when attempting to knock out both receptors and would not reveal the biology adequately. We acknowledge the weaknesses of the animal model, and this was the rationale for generating the cell models.

      (12) Although the in vitro studies are of interest, there are no studies to determine if this is the underlying mechanism for the in vivo abnormalities seen in the mice. Cultured podocytes may not necessarily reflect what is occurring in podocytes in vivo. (Reviewer 3)

      This is a good point. We have now immunostained the DKD and WT mice for Sf3b4 (a spliceosomal protein that changed in our in vitro proteomics) and also find a significant reduction of this protein in podocytes of the DKD mice (new Figure 3F).

      (13) Given that both receptors are deleted in the podocyte cell line, it is not clear if the spliceosome defect requires deletion of both receptors or if there is redundancy in the effect. The studies need to be repeated in podocyte cell lines with either IR or IGFR single deletions. (Reviewer 3)

      We have now performed proteomics and phospho-proteomics in all 4 cell types (wild-type, insulin receptor knockdown, IGF1R knockdown and double knockdown) at 3 days (new Figure 8 and Supplementary Figure 6; see also the new Results section, lines 425-450). This shows that both receptors contribute to the pathways, and hence there is a high level of compensation built into the system. For total proteins, the spliceosomal tri-snRNP was reduced only when both receptors were lacking, whereas other proteins and pathways showed an incremental effect of losing the insulin or IGF1 receptor. Likewise, the spliceosomal phospho-signalling events can go predominantly through either the insulin or IGF1 receptor, or through both. We think this reflects the complexity of this system and how it has evolved in mammals to protect against its loss.

      Finally, in this revision we have rewritten the Discussion with a “Limitations of the Study” section, and hopefully in a form that is easier for readers to follow.

      Author response image 1.

      (A) mT/mG reporter mouse crossed to a constitutive podocin-Cre heterozygous mouse, illustrating podocyte specificity of the Cre driver and excision of the reporter. Cre expression switches on GFP, so GFP marks Cre-producing cells (top panel scale bar = 250 µm; bottom panel scale bar = 50 µm). (B) mT/mG reporter mouse crossed to a podocin-rtTA tet-O-Cre heterozygous mouse, showing podocyte specificity of the driver and approximately 60% excision (top and bottom panel scale bars = 250 µm; middle panel scale bar = 50 µm). Doxycycline is required for expression, showing that the system is not leaky.

    1. Capulet. When the sun sets, the air doth drizzle dew; But for the sunset of my brother's son It rains downright. How now! a conduit, girl? what, still in tears? Evermore showering? In one little body Thou counterfeit'st a bark, a sea, a wind; For still thy eyes, which I may call the sea, Do ebb and flow with tears; the bark thy body is, Sailing in this salt flood; the winds, thy sighs; Who, raging with thy tears, and they with them, Without a sudden calm, will overset Thy tempest-tossed body. How now, wife! Have you deliver'd to her our decree? Lady Capulet. Ay, sir; but she will none, she gives you thanks. I would the fool were married to her grave! Capulet. Soft! take me with you, take me with you, wife. How! will she none? doth she not give us thanks? Is she not proud? doth she not count her blest, Unworthy as she is, that we have wrought So worthy a gentleman to be her bridegroom? Juliet. Not proud, you have; but thankful, that you have: Proud can I never be of what I hate; But thankful even for hate, that is meant love. Capulet. How now, how now, chop-logic! What is this? 'Proud,' and 'I thank you,' and 'I thank you not;' And yet 'not proud,' mistress minion, you, Thank me no thankings, nor, proud me no prouds, But fettle your fine joints 'gainst Thursday next, To go with Paris to Saint Peter's Church, Or I will drag thee on a hurdle thither. Out, you green-sickness carrion! out, you baggage! You tallow-face! Lady Capulet. Fie, fie! what, are you mad? Juliet. Good father, I beseech you on my knees, Hear me with patience but to speak a word. Capulet. Hang thee, young baggage! disobedient wretch! I tell thee what: get thee to church o' Thursday, Or never after look me in the face: Speak not, reply not, do not answer me; My fingers itch. Wife, we scarce thought us blest That God had lent us but this only child; But now I see this one is one too much, And that we have a curse in having her: Out on her, hilding! Nurse. God in heaven bless her! You are to blame, my lord, to rate her so. Capulet. And why, my lady wisdom? hold your tongue, Good prudence; smatter with your gossips, go. Nurse. I speak no treason. Capulet. O, God ye god-den. Nurse. May not one speak? Capulet. Peace, you mumbling fool! Utter your gravity o'er a gossip's bowl; For here we need it not. Lady Capulet. You are too hot. Capulet. God's bread! it makes me mad: Day, night, hour, tide, time, work, play, Alone, in company, still my care hath been To have her match'd: and having now provided A gentleman of noble parentage, Of fair demesnes, youthful, and nobly train'd, Stuff'd, as they say, with honourable parts, Proportion'd as one's thought would wish a man; And then to have a wretched puling fool, A whining mammet, in her fortune's tender, To answer 'I'll not wed; I cannot love, I am too young; I pray you, pardon me.' But, as you will not wed, I'll pardon you: Graze where you will you shall not house with me: Look to't, think on't, I do not use to jest. Thursday is near; lay hand on heart, advise: An you be mine, I'll give you to my friend; And you be not, hang, beg, starve, die in the streets, For, by my soul, I'll ne'er acknowledge thee, Nor what is mine shall never do thee good: Trust to't, bethink you; I'll not be forsworn. [Exit]

      Lord Capulet enters and mocks Juliet's grief. However, after he learns that Juliet is rejecting the wedding, he becomes enraged, saying that he would drag her to the church himself. He then gives Juliet an ultimatum: if she does not marry Paris, he will disown her and leave her a beggar on the streets.

    2. Tybalt. Well, peace be with you, sir: here comes my man. Mercutio. But I'll be hanged, sir, if he wear your livery: Marry, go before to field, he'll be your follower; Your worship in that sense may call him 'man.' Tybalt. Romeo, the hate I bear thee can afford No better term than this,—thou art a villain. Romeo. Tybalt, the reason that I have to love thee Doth much excuse the appertaining rage To such a greeting: villain am I none; Therefore farewell; I see thou know'st me not. Tybalt. Boy, this shall not excuse the injuries That thou hast done me; therefore turn and draw. Romeo. I do protest, I never injured thee, But love thee better than thou canst devise, Till thou shalt know the reason of my love: And so, good Capulet,—which name I tender As dearly as my own,—be satisfied. Mercutio. O calm, dishonourable, vile submission! Alla stoccata carries it away. [Draws] Tybalt, you rat-catcher, will you walk? Tybalt. What wouldst thou have with me? Mercutio. Good king of cats, nothing but one of your nine lives; that I mean to make bold withal, and as you shall use me hereafter, drybeat the rest of the eight. Will you pluck your sword out of his pitcher by the ears? make haste, lest mine be about your ears ere it be out. Tybalt. I am for you.

      Tybalt spots Romeo and challenges him to a fight, but Romeo refuses and says that they are closer than Tybalt thinks. Mercutio sees Romeo as a coward and decides to draw his sword, challenging Tybalt in Romeo's place.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the role of E2 ubiquitin enzyme, Uev1a in tissue resistance to oncogenic RasV12 in Drosophila melanogaster polyploid germline cells and human cancer cell lines. The incomplete evidence suggests that Uev1a works with the E3 ligase APC/C to degrade Cyclin A, and the strength of evidence could be increased by addressing the expression of CycA in the ovaries and the uev1a loss of function in human cancer cells. This work would be of interest to researchers in germline biology and cancer.

      Thank you for your valuable assessment. The requested data on CycA expression (Figure 4E-G) and uev1a loss-of-function in human cancer cells (Figure 8 and Figure 8-figure supplement 2) have been added to the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uncovers a protective role of the ubiquitin-conjugating enzyme variant Uev1A in mitigating cell death caused by over-expressed oncogenic Ras in polyploid Drosophila nurse cells and by RasK12 in diploid human tumor cell lines. The authors previously showed that overexpression of oncogenic Ras induces death in nurse cells, and now they perform a deficiency screen for modifiers. They identified Uev1A as a suppressor of this Ras-induced cell death. Using genetics and biochemistry, the authors found that Uev1A collaborates with the APC/C E3 ubiquitin ligase complex to promote proteasomal degradation of Cyclin A. This function of Uev1A appears to extend to diploid cells, where its human homologs UBE2V1 and UBE2V2 suppress oncogenic Ras-dependent phenotypes in human colorectal cancer cells in vitro and in xenografts in mice.

      Strengths:

      (1) Most of the data is supported by a sufficient sample size and appropriate statistics.

      (2) Good mix of genetics and biochemistry.

      (3) Generation of new transgenes and Drosophila alleles that will be beneficial for the community.

      We greatly appreciate your comments.

      Weaknesses:

      (1) Phenotypes are based on artificial overexpression. It is not clear whether these results are relevant to normal physiology.

      Downregulation of Uev1A, Ben, and Cdc27 together significantly increased the incidence of dying nurse cells in normal ovaries (Figure 5-figure supplement 2), indicating that the mechanism we uncovered also protects nurse cells from death during normal oogenesis.

      (2) The phenotype of "degenerating ovaries" is very broad, and the study is not focused on phenotypes at the cellular level. Furthermore, no information is provided in the Materials and Methods on how degenerating ovaries are scored, despite this being the most important assay in the study.

      Thank you for pointing out this issue. We quantified the phenotype of nurse cell death using “degrading/total egg chambers per ovary”, not “degenerating ovaries”. Normal nurse cell nuclei exhibit a large, round morphology in DAPI staining (see the first panel in Figure 1D). During early death, they become disorganized and begin to condense and fragment (see the second panel in Figure 1D). In late-stage death, they are completely fragmented into small, spherical structures (see the third panel in Figure 1D), making cellular-level phenotypic quantification impossible. Since all nurse cells within the same egg chamber are interconnected, their death process is synchronous. Thus, quantifying the phenotype at the egg-chamber level is more practical than at the cellular level. We have added the description of this death phenotype and its quantification to the main text (Lines 104-108).
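      The per-ovary metric described above can be sketched as follows. This is a minimal illustration, assuming each egg chamber has already been scored by eye from DAPI staining as normal, early (mild) death or late (severe) death, and that both mild and severe chambers count as degrading, as in the current study; the class and field names and the counts are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class OvaryScore:
    """Hypothetical per-ovary counts of egg chambers by nurse cell phenotype."""
    normal: int  # large, round nurse cell nuclei
    mild: int    # early death: disorganized nuclei beginning to condense and fragment
    severe: int  # late death: nuclei fully fragmented into small spherical structures

    def degrading_fraction(self) -> float:
        """Degrading / total egg chambers; mild and severe both count as degrading."""
        total = self.normal + self.mild + self.severe
        if total == 0:
            raise ValueError("ovary has no scored egg chambers")
        return (self.mild + self.severe) / total


# Scoring is per egg chamber because nurse cells in one chamber die synchronously.
ovaries = [OvaryScore(normal=12, mild=3, severe=5),
           OvaryScore(normal=15, mild=2, severe=3)]
fractions = [o.degrading_fraction() for o in ovaries]
print(fractions, mean(fractions))
```

      Counting only the severe class would give a lower fraction for the same ovaries, which is one way a scoring rule can shift the reported frequency of degrading egg chambers.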

      (3) In Figure 5, the authors want to conclude that uev1a is a tumor suppressor, and so they overexpress UBE2V1/2 in human cancer cell lines that have RasK12 and find reduced proliferation, colony formation, and xenograft size. However, genes that act as tumor suppressors have loss-of-function phenotypes that allow for increased cell division. The Drosophila uev1a mutant is viable and fertile, suggesting that it is not a tumor suppressor in flies. Additionally, they do not deplete human UBE2V1/2 from human cancer cell lines and assess whether this increases cell division, colony formation, and xenograft growth.

      We apologize for any misleading description. We aimed to demonstrate that UBE2V1/2, like Uev1A in Drosophila “nos>Ras<sup>G12V</sup>+bam-RNAi” germline tumors, suppress oncogenic KRAS-driven overgrowth in diploid human cancer cells. Importantly, this function of Uev1A and UBE2V1/2 depends on the context of Ras-driven tumors; there is no evidence that they act as broad tumor suppressors in the absence of oncogenic Ras. Drosophila uev1a mutants were lethal, not viable (see Lines 135-137), and germline-specific knockdown of uev1a (nos>uev1a-RNAi) caused female sterility without inducing tumors. These findings suggest that Uev1A lacks tumor-suppressive activity in the Drosophila female germline in the absence of Ras-driven tumors. We have revised the manuscript to prevent misinterpretation. Furthermore, as suggested, we have added data demonstrating that the combined knockdown of UBE2V1 and UBE2V2 significantly promotes the growth of KRAS-mutant human cancer cells (Figure 8 and Figure 8-figure supplement 2).

      (4) A critical part of the model does not make sense. CycA is a key part of their model, but they do not show CycA protein expression in WT egg chambers or in their over-expression models (nos.RasV12 or bam>RasV12). Based on Lilly and Spradling 1996, Cyclin A is not expressed in germ cells in region 2-3 of the germarium; whether CycA is expressed in nurse cells in later egg chambers is not shown but is critical to document comprehensively.

      We appreciate your critical comment. CycA is a key cyclin that partners with Cdk1 to promote cell division (Edgar and Lehner, 1996). Notably, nurse cells are post-mitotic endocycling cells (Hammond and Laird, 1985) and typically do not express CycA (Lilly and Spradling, 1996) (see the last sentence, page 2518, paragraph 3 in this 1996 paper). However, their death induced by oncogenic Ras<sup>G12V</sup> is significantly suppressed by monoallelic deletion of either cycA or cdk1 (Zhang et al., 2024). Conversely, ectopic CycA expression in nurse cells triggers their death (Figure 4C, D). These findings suggest that polyploid nurse cells exhibit high sensitivity to aberrant division-promoting stress, which may represent a distinct form of cellular stress unique to polyploid cells. In the revised manuscript, we have provided the CycA-staining data, comparing its expression in normal nurse cells versus cells undergoing oncogenic Ras<sup>G12V</sup>-induced death (Figure 4E-G).

      (5) The authors should provide more information about the knowledge base of uev1a and its homologs in the introduction.

      Thank you for your suggestion. In the revised introduction, we have provided a more detailed description of Uev1A (Lines 72-79). Additionally, we have introduced its human homologs, UBE2V1 and UBE2V2, in the main text (Lines 143-145).

      Reviewer #2 (Public review):

      Summary:

      The authors performed a genetic screen using deficiency lines and identified Uev1a as a factor that protects nurse cells from RasG12V-induced cell death. According to a previous study from the same lab, this cell death is caused by aberrant mitotic stress due to CycA upregulation (Zhang et al.). This paper further reveals that Uev1a forms a complex with APC/C to promote proteasome-mediated degradation of CycA.

      In addition to polyploid nurse cells, the authors also examined the effect of RasG12V-overexpression in diploid germline cells, where RasG12V-overexpression triggers active proliferation, not cell death. Uev1a was found to suppress its overgrowth as well.

      Finally, the authors show that the overexpression of the human homologs, UBE2V1 and UBE2V2, suppresses tumor growth in human colorectal cancer xenografts and cell lines. Notably, the expression of these genes correlates with the survival of colorectal cancer patients carrying the Ras mutation.

      Strength:

      This paper presents a significant finding that UBE2V1/2 may serve as a potential therapy for cancers harboring Ras mutations. The authors propose a fascinating mechanism in which Uev1a forms a complex with APC/C to inhibit aberrant cell cycle progression.

      We greatly appreciate your comments.

      Weakness:

      The quantification of some crucial experiments lacks sufficient clarity.

      Thank you for highlighting this issue. We have provided more details regarding the quantification data in the revised manuscript.

      References

      Edgar, B.A., and Lehner, C.F. (1996). Developmental control of cell cycle regulators: a fly's perspective. Science 274, 1646-1652.

      Hammond, M.P., and Laird, C.D. (1985). Chromosome structure and DNA replication in nurse and follicle cells of Drosophila melanogaster. Chromosoma 91, 267-278.

      Lilly, M.A., and Spradling, A.C. (1996). The Drosophila endocycle is controlled by Cyclin E and lacks a checkpoint ensuring S-phase completion. Genes Dev 10, 2514-2526.

      Zhang, Q., Wang, Y., Bu, Z., Zhang, Y., Zhang, Q., Li, L., Yan, L., Wang, Y., and Zhao, S. (2024). Ras promotes germline stem cell division in Drosophila ovaries. Stem Cell Reports 19, 1205-1216.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The figure legends insufficiently describe the figures. One example is Figure 3, where there are no details in the figure legend about what conditions apply to each panel and each lane of the gels.

      For clarity and brevity, detailed experimental conditions are described in the Materials and Methods section. Figure legends therefore focus on summarizing the key findings. Thank you for your understanding!

      (2) The font size on the figure is too small.

      Thank you for your constructive suggestion. In response, we have enlarged all font sizes to improve readability.

      (3) There are places where the authors overstate their results, and there are issues with the clarity of the text:

      (3a) Line 170: "excessive" is not appropriate. Their prior study showed a mild increase in proliferation.

      “Excessive” has been removed in the revised manuscript (Lines 215-216).

      (3b) Line 187-8: The authors should restate this sentence. Here's a possibility. Over-expression of Uev1a suppressed the phenotypes caused by CycA over-expression.

      This sentence has been restated as “Notably, this cell death was suppressed by co-overexpression of CycA and Uev1A, indicating a genetic interaction between them”. (Lines 229-231).

      (3c) Lines 266-7: The properties of Uev1a (ie, lacking a conserved Cys) should be in the introduction.

      This information has been added to the revised introduction (Lines 74-76).

      (3d) Line 318: "markedly" is an overstatement of the prior results.

      Our quantification data revealed that “nos>Ras<sup>G12V</sup>; bam<sup>-/-</sup>” ovaries are three times larger than “nos>GFP; bam<sup>-/-</sup>” control ovaries (see Figure 4A-C in Zhang et al., Stem Cell Reports 19, 1205-1216). Given this substantial difference, we think that using "markedly" is not an overstatement.

      (4) Data not shown occurs in a few places in the text. Given the ability to supply supplemental information in eLife preprints, these data should be shown.

      Thanks for your suggestion. All “not shown” data have been added to the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Major Comments

      (1) Cyclin A (CycA) is a key player in this study, but the authors do not provide evidence showing the upregulation of CycA following Ras overexpression in either polyploid or diploid cells. Data on CycA expression should be included.

      Thank you for your constructive suggestion. These data have been added to the revised manuscript (Figure 4E-G).

      (2) DNA replication stress, cellular senescence, and cell death should be assessed under Ras overexpression (RasOE) and RasOE + Uev1A RNAi conditions to support the model proposed in Figure 4F.

      We apologize for any confusion caused by our initial model. We do not have evidence that DNA replication stress and cellular senescence occur under these conditions. Cell death can be readily detected through the presence of fragmented nuclei and condensed DNA (see Figure 1D). The model has been updated accordingly (Figure 9E).

      (3) Appropriate controls should be performed alongside the experimental sets. The same nos>Ras+GFPi data set was repeatedly used in Figures 1I, 2B, 2H, and Figures 2, S2B, which is not ideal.

      All these experiments were performed under identical conditions. Therefore, we deem it appropriate to use the same control data across these analyses.

      (4) Overall, the microscopic images are too small and hard to see.

      Thank you for raising this important point. In the revised manuscript, all images and the font size on figures have been enlarged for improved clarity.

      (5) Figure 1H

      Why is the frequency of egg chamber degradation so much lower in nos>RasG12V+GFP-RNAi (about 40%) than in nos>RasG12V (about 80%)? The authors do not show whether there is a significant difference between these two conditions, although there should be one. We would like the authors to explain this difference.

      These overexpression experiments were conducted using the GAL4/UAS system. While both “nos>Ras<sup>G12V</sup>+GFP-RNAi” and “nos>Ras<sup>G12V</sup>” contain a single nos-GAL4 driver, they differ in UAS copy number: the former incorporates two UAS elements compared to only one in the latter (see the detailed genotypes in Source data 2). These results demonstrate that UAS copy number impacts experimental outcomes in our system.

      In the previous paper (Zhang et al., 2024), Figure 7H shows that the frequency of degrading egg chambers in nos>RasG12V is 33%, whereas this paper shows about 80%. The flies' ages differ (previous paper: 7 d; this paper: 3 d), but these data raise the question of why nos>RasG12V shows more egg chamber degradation this time.

      We greatly appreciate your careful observation. The nurse-cell-death phenotype exhibits a spectrum from mild to severe manifestations [see Figure 1D and our response to weakness (2) in Reviewer #1’s public review]. Whereas our 2024 paper quantified only egg chambers with severe phenotypes as degrading, the current study included both mild and severe cases in this classification. We do not think fly age could account for this substantial phenotypic difference. A detailed description of the nurse-cell-death phenotype and its quantification has been added to the revised manuscript (Lines 104-108).

      In the following experiments, only nos>RasG12V+GFP-RNAi is used as a control (Figures 2B, H, S2B). I wonder if these results would give us a different conclusion if nos>RasG12V were used as a control.

      As explained above, UAS copy number matters in our analyses, so it is important to keep it identical across the genotypes being compared.

      (6) In the abstract, the authors state that uev1a is an intrinsic factor that protects cells from RasG12V-induced cell death. RasG12V driven by bam-gal4 does not induce much cystocyte death, whereas it induces substantial nurse cell death. Does this mean that the intrinsic expression level of uev1a is lower in nurse cells (or polyploid cells) than in cystocytes (or diploid cells)?

      Overexpression of RasG12V driven by bam-GAL4 caused only minimal nurse cell death (Figure 1D, E). Additionally, Uev1A showed low intrinsic expression levels in both cystocytes and nurse cells (Figure 3E and Figure 5-figure supplement 1).

      (7) Is uev1a-RNAi alone sufficient to induce egg chamber degradation? Or does it have any effect on ovarian development? (Related to question #1 in minor comments)

      While nos>uev1a-RNAi resulted in female sterility, it alone was insufficient to induce egg chamber degradation. However, simultaneous downregulation of Uev1A, Ben, and Cdc27 triggered significant egg chamber degradation (Figure 5-figure supplement 2).

      (8) Which stages of egg chambers get degraded with RasG12V induction?

      This is a good question. In our analyses, we noted that degrading egg chambers exhibited considerable size variability (Figure 1D). Because degradation disrupts normal morphological cues, precise staging of these egg chambers is nearly impossible.

      (9) Because the schematic in Figure 4 states that CycA degradation by the Uev1a-APC/C complex prevents RasG12V-induced cellular senescence, I suggest also testing cellular senescence markers (e.g., Dap/p21, SA-β-gal).

      As addressed in our response to your Major Comment (2), we lacked experimental evidence to support cellular senescence in this context. We have therefore revised the model accordingly (Figure 9E). While this study focuses specifically on cell death, investigating potential roles of cellular senescence remains an important direction for future research. Thank you for your suggestion!

      Minor Comments

      (1) Figure 1D: Df#7584

      Late-stage egg chambers appear to be missing in this condition. Why does this occur without egg chamber degradation? Is it possible that we do not see egg chamber degradation because this deficiency line lacks properly developed egg chambers that could degrade?

      While this image represents only a single sample, we have confirmed the presence of late-stage egg chambers in other samples. If “Df#7584/+” females were unable to support late-stage egg chamber development, complete sterility would be expected due to the lack of mature eggs. However, as shown in this image (Figure 1D), the ovary contains mature eggs, and the “Df#7584/+” fly strain remains fertile.

      (2) Given the results showing that DDR signaling protects egg chambers from degradation, the authors should consider checking DNA-damage markers (e.g., γ-H2AX) in nos>RasG12V and nos>RasG12V+uev1a.

      Thank you for your constructive recommendation. These data have been added to the revised manuscript (Figure 3C).

    1. Author response:

      eLife Assessment

      Using genome databases, the authors performed solid bioinformatic analyses to trace the genomic history of the clinically relevant Staphylococcus aureus tetracycline resistance plasmid pT181 over the last seven decades. They discovered that this element has transitioned from a multicopy plasmid to a chromosomally integrated element, and the work represents a valuable demonstration of the use of publicly available data to investigate plasmid biology and inform clinical epidemiology. This work will appeal to researchers interested in staphylococcal evolution and plasmid biology.

      Thank you; we agree with this overview. We also think this work will interest researchers working on antimicrobial resistance and bacterial genome structure.

      Public Reviews:

      Reviewer #1 (Public review):

      The study provides a robust bioinformatic characterization of the evolution of pT181. My main criticism of the work is the lack of experimental validation for the hypotheses proposed by the authors.

      Comments on the study:

      (1) One potential reason for the decline in pT181 copy number over time may be a high cost associated with the multicopy state. In this sense, it would be interesting if the authors could use (or construct) isogenic strains differing only in the state of the plasmid (multicopy/integrated). With this system, the authors could measure the fitness of the strains in the presence and absence of tetracycline and would be able to understand the benefit associated with the plasmid transition. The authors discuss these ideas, but it would be nice to test them.

      We agree that the relative fitness of integrated versus multicopy plasmids is interesting and a costly multicopy state could explain the transition of independent pT181 replicons to chromosomal integration. This is a project we are exploring for a future study. However, we think that this additional experimental work goes beyond the scope of the paper.

      (2) It would be interesting to know the transfer frequency of the multicopy mobilizable pT181 plasmid compared with that of the plasmid integrated into the SCCmec element (which can be co-transferred, integrated into conjugative plasmids, or moved by transduction).

      We agree with the reviewer that this is an interesting question. However, we think inferring these rates from natural sequence data is not feasible in this case given the low heterogeneity of the plasmid sequence. A laboratory-based experimental study could not address the real transfers we observe over the course of decades, as in vitro S. aureus transfer rates are often not good proxies for in vivo rates (McCarthy et al., 2014). In addition, we do not know what is moving the integrated plasmid. pT181 could be moved by a phage or a plasmid, so we are uncertain what the correct experiment would be to explore this.

      (3) One important limitation of the study that should be mentioned is that inferring pT181 PCN from whole genome data can be problematic. For example, some DNA extraction methods may underestimate the copy number of small plasmids because the small, circular plasmids are preferentially depleted during the process (see, for example, https://www.nature.com/articles/srep28063).

      We will investigate this issue further in the revisions. The kits used to extract DNA from the earlier-collected samples may yield, on average, more plasmid DNA relative to the chromosome than newer kits; however, we think this is not driving the decline we observe in multicopy pT181 copy number. Multiple BioProjects show the same result, with earlier samples having higher copy number than later samples. We expect extraction methods to be consistent within a BioProject, suggesting that this decline is genuine and not technical. In the revisions, we intend to evaluate the effect of sequencing date and additional metadata on copy number.

      Reviewer #2 (Public review):

      Summary:

      The authors performed bioinformatic analyses to trace the genomic history of the clinically relevant pT181 plasmid. Specifically, they:

      (1) Tracked the presence of pT181 across different S. aureus strain backgrounds through time. It was first found in one, later multiple strains, though this may reflect changes in sampling over time.

      (2) Estimated the mutation rate of the chromosome and plasmid.

      (3) Estimated the plasmid copy number of pT181, and found that it decreased over time. The latter was supported by two sets of statistical analyses, first showing that the number of single-copy isolates increased over time, and second, that the multicopy isolates demonstrated a lower PCN over time.

      (4) Reported the different integration sites at which pT181 integrated into the genome.

      As a caveat, they mentioned that identical plasmid sequences have variable plasmid copy numbers across different genomes in their dataset.

      Strengths:

      This is a very solid, well-considered bioinformatic study on publicly available data. I greatly appreciate the thoughtful approach the authors have taken to their subject matter, neither over- nor underselling their results. It is a strength that the authors focused on a single plasmid in a single bacterial species, as it allowed them to take into account unique knowledge about the biology of this system and really dive deep into the evolution of this specific plasmid. It makes for a compelling case study. At the same time, I think the introduction and discussion can be strengthened to demonstrate what lessons might be drawn from this case study for other plasmids.

      Weaknesses:

      The finding that the pT181 copy number declined over time is the most interesting claim of the paper to me, and not something that I have seen done before. While the authors have looked at some confounders in this analysis, I think this could be strengthened further in a revision.

      In the revisions, we will further explore how technical variation could contribute to copy number variation, and we will update our claims accordingly regarding both the decline in copy number of the independent replicon over time and the copy-number variation observed for identical plasmid sequences. Multiple BioProjects show earlier samples with higher copy number than later samples; because we expect extraction methods to be consistent within a BioProject, this supports our initial finding that the decline over time is not due to technical variation.

      For the flow of the storyline, I also think the estimation of mutation rates (starting L181) and integration into the chromosome (starting L255) could be moved to the supplement or a later position in the main text.

      We will revisit the text organization for flow and clarity of storyline.

      Clearly, the use of publicly available data prevents the authors from controlling the growth and sequencing conditions of the isolates. It is striking that they observe a clear signal in spite of this, but I would have loved to see more discussion of the metadata that came with the publicly available sequences and even more use of that metadata to control for confounding.

      In revisions, we will further investigate possible contributors to the observed decline in copy number of multicopy pT181 over time. We have incorporated the date of sample collection and BioProject in our analysis, but not the date of sequencing or extraction technique.

      References

      McCarthy, A. J., Loeffler, A., Witney, A. A., Gould, K. A., Lloyd, D. H., & Lindsay, J. A. (2014). Extensive horizontal gene transfer during Staphylococcus aureus co-colonization in vivo. Genome Biology and Evolution, 6(10), 2697–2708. https://doi.org/10.1093/gbe/evu214

    1. ord itself imilieu in which records are creattermined by all these factors: fustructures, as well as records-creaobservation I am not abandoninggrounding in the evidence, structuway. I am asserting, however, thatcircumstances of creation a

      When I think about this passage alongside the rise of artificial intelligence (AI), Cook’s emphasis on context feels even more urgent. Terry Cook argues that records are shaped by the functional and structural environments in which they are created. In an AI-driven world, where systems generate, sort, and analyze massive volumes of data automatically, understanding that broader context becomes essential. AI can process content at scale, but without contextual grounding, it risks misinterpreting records or reinforcing surface-level patterns. I see AI as both an opportunity and a challenge for appraisal theory. On one hand, AI tools can help identify patterns across enormous bureaucratic systems, making macro-level analysis more feasible. They can cluster records, detect trends, and even suggest appraisal priorities. This could strengthen Cook’s top-down approach by giving archivists analytical support in mapping institutional functions. On the other hand, AI systems are trained on existing data, which may already reflect institutional biases and power imbalances. If archivists rely too heavily on AI-driven selection, we risk automating those biases. Cook stresses that archivists must actively and consciously shape the archival record. AI does not remove that responsibility—it arguably heightens it. I cannot simply defer judgment to an algorithm.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Abdelmageed et al. investigate age-related changes in the subcellular localization of DNA polymerase kappa (POLK) in the brains of mice. Although POLK has been actively investigated for its role in translesion DNA synthesis and its involvement in other DNA repair pathways in proliferating cells, very little is known about POLK in a tissue-specific context, let alone in post-mitotic cells. The authors investigated POLK subcellular distribution in the brains of young, middle-aged, and old mice via immunoblotting of fractionated tissue extracts and immunofluorescence (IF). Immunoblotting revealed a progressive decrease in the abundance of nuclear POLK, while cytoplasmic POLK levels concomitantly increased. Similar findings were obtained when IF was performed on brain sections. Further, IF studies of the cingulate cortex (Cg1), the motor cortex (M1, M2), and the somatosensory (S1) cortical regions all showed an age-related decline in nuclear POLK. Nuclear POLK speckles decrease in each region; meanwhile, the number of cytoplasmic POLK granules decreases in all four regions while granule size increases. The authors report similar findings for REV1, another Y-family DNA polymerase.

      The authors then investigate the colocalization of POLK with other DNA damage response (DDR) proteins in either pyramidal neurons or inhibitory interneurons. At 18 months of age, the DNA damage marker γH2AX colocalized with nuclear POLK, while strong colocalization of POLK and 8-oxo-dG was present in geriatric mice. The authors find that cytoplasmic POLK granules colocalize with the stress granule marker G3BP1, suggesting that the accumulated POLK ends up in the lysosome.

      Brain regions were further stained to identify POLK patterns in NeuN+ neurons, GABAergic neurons, and other non-neuronal cell types present in the cortex. Microglia associated with pyramidal neurons or inhibitory interneurons were found to have a higher abundance of cytoplasmic POLK. The authors also report that POLK localization can be regulated by neuronal activity induced by Kainic acid treatment. Lastly, the authors suggest that POLK could serve as an aging clock for brain tissue, but POLK deserves further characterization and correlation to functional changes before being considered as a biomarker.

      Strengths:

      TLS polymerases are largely understudied in specific tissues and in post-mitotic cells. The potential changes in subcellular localization of POLK, and potentially of other TLS polymerases, open up many questions about DNA repair and damage tolerance in the brain and how they change with age.

      Weaknesses:

      The work is quite novel and interesting, and the authors do suggest some potentially interesting roles for POLK in the brain, but these are in and of themselves a bit speculative. The majority of the findings of this paper rely on the POLK antibody and its presumed specificity for POLK. However, this antibody has not been fully validated and needs further work. Further validation experiments using Polk-deficient or knocked-down cells to investigate antibody specificity for both immunoblotting and immunofluorescence should be performed. More mechanistic investigation is needed before POLK could be considered as a brain aging clock.

      We are thankful for the overall enthusiasm and positive comments.

      (a) Concern over POLK antibody characterization in mouse:

      We performed siRNA and shRNA knockdowns in mouse primary cortical neurons, as well as in efficiently transfectable murine lines such as 4T1 and Neuro-2A, showing knockdown of the 99 kDa and 120 kDa bands recognized by the sc-166667 anti-POLK antibody (Figures 1 and S1). In IF, sc-166667 and A12052 (Figure S1G) show similar immunostaining patterns; we used sc-166667 in all reported figures and western blots.

      (b) More mechanistic investigation is needed before POLK could be considered as a brain aging clock:

      We sincerely appreciate the valuable suggestion. We agree that, as a terminal assay, POLK nucleocytoplasmic status is not practical for longitudinal studies. However, we believe it may serve as an investigative/correlative endogenous signal of tissue age that could be useful to "date" brain sections, since few such cell biological markers exist. We have added clarifying text to address this.

      Reviewer #2 (Public review):

      Summary:

      Abdelmageed et al. demonstrate POLK expression in nervous tissue and focus mainly on neurons. They describe an exciting age-dependent change in POLK subcellular localization, from the nucleus in young tissue to the cytoplasm in old tissue. They argue that cytosolic POLK is associated with stress granules. They also investigate the cell-type-specific expression of POLK and quantify expression changes induced by cell-autonomous (activity) and cell-nonautonomous (microglia) factors.

      I think it is an interesting report, but it requires a few more experiments to support the findings in the latter half of the paper. Additionally, a more mechanistic understanding of the pathways regulating POLK dynamics between the nucleus and cytosol (what POLK is doing in the cytosol, and what it is interacting with) would greatly increase the impact of this report. However, additional mechanistic experiments are mostly not needed to support the currently presented results; they would simply increase the impact.

      (a) Concern regarding a more mechanistic understanding of the pathways regulating POLK dynamics between the nucleus and cytosol:

      We sincerely appreciate the reviewer’s enthusiasm and valuable guidance in helping us better understand the mechanism of nuclear-cytoplasmic POLK dynamics. Previously, we developed a modified aniPOND (accelerated native isolation of proteins on nascent DNA) protocol, which we termed iPoKD-MS (isolation of proteins on Pol kappa synthesized DNA followed by mass spectrometry), to capture proteins bound to nascent DNA synthesized by POLK in human cell lines (bioRxiv https://www.biorxiv.org/content/10.1101/2022.10.27.513845v3). In this dataset, we identified potential candidates that may regulate nuclear/cytoplasmic POLK dynamics. These candidates are currently undergoing validation in human cell lines, and we are preparing a manuscript on these findings. Among these, some candidates, including previously identified proteins such as exportin and importin (Temprine et al., 2020, PMID: 32345725), are being explored further as potential POLK nuclear/cytoplasmic shuttles. We are also conducting tests on these candidates in mouse cortical primary neurons to assess their role in POLK dynamics. In the revised version of the manuscript, we have included a discussion of our current understanding.

      (b) Question on "… what is POLK doing in the cytosol, and what is it interacting with …":

      Our data so far indicate that POLK accumulates in stress granules and lysosomes. We are very grateful for the reviewer's insightful suggestions and will make every effort to incorporate them in the revised manuscript. As recommended by the reviewer, we characterized POLK accumulation in the cytoplasm using six additional endo-lysosomal markers. These data now form an entirely new Figure 3.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors show that DNA polymerase kappa POLK relocalizes in the cytoplasm as granules with age in mice. The reduction of nuclear POLK in old brains is congruent with an increase in DNA damage markers. The cytoplasmic granules colocalize with stress granules and endo-lysosome. The study proposes that protein localization of POLK could be used to determine the biological age of brain tissue sections.

      Strengths:

      Very few studies focus on the POLK protein in the central nervous system (CNS). The microscopy approach used here is also very relevant: it allows the authors to highlight a radical change in POLK localization (nuclear versus cytoplasmic) depending on the age of the neurons.

      The conclusions of the study are strong. Several types of neurons are compared, the colocalization with several proteins from the NHEJ and BER repair pathways is tested, and microscopy images are systematically quantified.

      Weaknesses:

      The authors do not discuss the physical nature of POLK granules. There is a large field of research dedicated to the nature and function of condensates; in particular, numerous studies have shown that some, but not all, condensates exhibit liquid-like properties (https://www.nature.com/articles/nrm.2017.7, https://pubmed.ncbi.nlm.nih.gov/33510441/, https://www.mdpi.com/2073-4425/13/10/1846). Changes in the physical properties of condensates are particularly important in cells undergoing stress and during aging. The authors should discuss this literature.

      We highly appreciate the reviewer bringing up the context of biomolecular condensates. Our iPoKD-MS data referenced above suggest candidates from various biomolecular condensates that we are currently investigating. We thank the reviewer for providing this important literature; we now cite these articles in the text, and potential biomolecular condensates are discussed in the revised version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The work is quite novel and interesting, and the authors do suggest some potentially interesting roles for POLK in the brain, but these are in and of themselves a bit speculative. The majority of the findings of this paper rely upon the POLK antibody and its specificity for POLK, which is not fully characterized and needs further work (validation of antibodies using immunoblots of Polk KO cells or siRNA KD of POLK in murine cells) to provide confidence in the authors' findings.

      Points

      siRNA knockdown of Polk in primary neurons showed a dramatic reduction in signal by IF even though qPCR analysis showed a reduction of only ~35% at the transcript level. Typically, many DNA repair genes need to be knocked down by 80% or more to see discernible differences at the protein level. siRNA knockdown in a murine cell line (MEFs, neurons, or some other easily transfectable cell type) should be followed by immunoblotting of whole-cell and fractionated (nuclear/cytoplasmic) lysates to better validate the anti-POLK antibodies and to determine which of the bands visualized during immunoblotting are specific to POLK.

      We performed siRNA and shRNA knockdowns in mouse primary cortical neurons, as well as in efficiently transfectable murine lines such as 4T1 and Neuro-2A, showing knockdown of the 99 kDa and 120 kDa bands recognized by the sc-166667 anti-POLK antibody (Figures 1 and S1). In IF, sc-166667 and A12052 (Figure S1G) show similar immunostaining patterns; we used sc-166667 in all reported figures and western blots.

      In Figures 1B and C, it is not clear which antibody(ies) were used for immunoblotting of the nuclear and cytoplasmic fractions and of the whole-tissue lysates. Please place the antibody vendor or clone next to the corresponding blot or describe it in the figure legend. Bands of varying sizes are present in 1B (and Figure S1), but only a band at 99 kDa is shown in 1C. Because no bands of equivalent size are present in both the nuclear and cytoplasmic fractions in Figure 1B, please describe or denote which bands were used to quantify nuclear and cytoplasmic POLK.

      This has been clarified by using only one antibody, sc-166667, throughout the manuscript. In whole-cell lysates we observed an intense ~99 kDa band and a faint ~120 kDa band; the ~120 kDa band becomes intense in the nuclear fraction and is absent from the cytoplasmic fraction. We have noted this in multiple human cell lines and hiPSC-derived neurons, which is the subject of our ongoing work. We do not yet know whether the ~120 kDa band is a modified form or an isoform of POLK. Our proteomics data hint that it may carry SUMOylation, ubiquitination, or other post-translational modifications. We have added this to the Discussion.

      For Figure 1I, is there a quantification beyond the representative image? The green staining outside the cytoplasm that is present in all other images in the panel is absent from the 1-month-old M1 images.

      Fig 1I is now Fig S1G in the revised manuscript. Because REV1 and POLH were not central to this study, which focused on POLK, these panels were meant as exploratory data, and we did not quantify them beyond a qualitative evaluation, which broadly resembled POLK's disposition with age. We have noted some sample-to-sample variability in the background signal. In general, the signal outside the cytoplasm (as segmented by fluorescent Nissl staining) tends to vary by brain area and also to be higher in older brains.

      "Association with PRKDC further suggests POLK's role in the "gap-filling" step in the NHEJ repair pathway in neurons." There is no strong evidence in the literature for mammalian POLK playing a role in NHEJ. Some description of a role in HR has been described, however. The reference regarding the iPoKD-MS data set that provides evidence of POLK associating with BER and NHEJ factors is listed as Paul, 2022 but is in the reference list as Shilpi Paul 2022.

      We removed this speculative statement and fixed the citation.

      Figure 4A, what is the age of the mouse for the representative images?

      The mouse was 19 months old; this is now stated in the figure legend.

      Figure 4C, Could the data from the different ages be plotted side by side to better evaluate the differences for each cell type/region?

      The data are now plotted side by side.

      Why was the one-month time point chosen as this could still represent the developing and not mature murine brain? 

      The reviewer correctly notes that a 1-month-old brain is still developing, but mainly from the standpoint of behavioral and circuit maturation. From the perspective of cell division and neurogenesis, development is considered largely complete by the first postnatal month, with neuron production thereafter largely restricted to specialized adult niches in the dentate gyrus and the subventricular zone–olfactory bulb pathway; these adult neurogenic stem cells are embryonically derived and are regulated in ways distinct from the early, expansionary developmental waves of neurogenesis (Caviness et al., 1995, PMID: 7482802; Ansorg et al., 2012, PMID: 22564330; Ming & Song, 2011, PMID: 21609825; Bond et al., 2015, PMID: 26431181; Bond et al., 2021, PMID: 33706926; Bartkowska et al., 2022, PMID: 36078144). In our study, all measurements were performed in cortical areas only. Also, Figure 6A incorrectly stated that the young brains were 1 month old; on rechecking our metadata, we found that the young group comprised 1- and 2-month-old brains, and this has now been corrected.

      Furthermore, can the authors describe which sex of mice was used in these experiments and the justification if a single sex was used? If both sexes were used, were there any dimorphic differences in POLK localization patterns?

      This is an important aspect, but to keep mouse numbers within manageable limits at the outset, we focused on the age component. Although both male and female brains were assayed, the uneven distribution of samples between sexes prevented us from assessing whether there were statistically significant sexually dimorphic differences in INs, PNs, and NNs. Future studies will investigate sex as a variable as a function of age.

      The suggestion of POLK as a brain aging clock may be a bit premature as the functional and behavioral consequences of cytoplasmic POLK sequestration are not fully known. Furthermore, investigation of POLK levels in other genetic models of neurodegeneration or with gerotherapeutics would be needed to establish if the POLK brain clock is responsive to changes that shift brain aging. Lastly, this clock may be impractical and not useful for longitudinal studies due to the terminal nature of assessing POLK levels.

      We agree that, as a terminal assay, POLK nucleocytoplasmic status is not practical for longitudinal studies. However, we believe it may serve as an investigative/correlative endogenous signal of tissue age that could be useful to "date" brain sections, since few such cell biological markers exist. We have added clarifying text.

      Some discussion of the Polk-null mice is warranted, as they have only a slightly shortened lifespan and no disease phenotypes were reported. This stands in contrast to other DNA repair-deficient mice that mimic premature aging and show behavioral and motor deficits, and it calls into question the role of POLK in brain aging.

      A discussion of Polk-null mice has been added.

      Please correct the catalog number for the SCBT anti-POLK antibody to sc-166667

      Typographical error has been corrected

      Reviewer #2 (Recommendations for the authors):

      Results:

      Figure by figure 

      (1) A progressive age-associated shift in subcellular localization of POLK

      The authors state that POLK has not been studied in nervous tissue before, and they want to see whether it is expressed and whether its subcellular location changes as a function of age. The authors argue that aging acts as a stress similar to that seen in previous models using genotoxic agents and cancer cells. Indeed, POLK seems to convincingly change its subcellular location from the nucleus to larger cytosolic puncta.

      (2) Nuclear POLK co-localizes with DNA damage response and repair proteins

      This was a difficult dataset for me to decipher. To me, it appears as though POLK colocalizes with these examined proteins in the CYTOSOL, not the nucleus, especially in the oldest mice.

      We now note in the Discussion that DNA repair proteins have been observed in the cytoplasm and in biomolecular condensates, citing relevant reviews and primary references.

      (3) POLK in the cytoplasm is associated with stress granules and lysosomes in old brains

      LAMP1 has some issues as a lysosome marker; the authors even state that it can be on endosomes. It would be nice to use a marker for mature lysosomes, such as a fluorescent reporter that is activated only by lysosomal proteases or pH. It is also of interest whether POLK is localized to the membrane or the inside of these structures. The authors have access to an Airyscan, which is sufficient to examine luminal versus membrane localization on larger organelles like lysosomes.

      We thank the reviewer for pushing us to investigate the nature of cytoplasmic POLK in endo-lysosomal compartments. We have now added a full-page figure (Figure 3) presenting cell biological results with six different markers, a subset of which (cathepsins B and D) are known to be present in the lumen of endo-lysosomes. Higher-resolution analysis of membrane versus luminal localization was not pursued, as it is perhaps better suited to cultured neurons than to thick fixed tissue.

      (4) Differentially altered POLK subcellular expression amongst excitatory, inhibitory, and nonneuronal cells in the cortex.

      This seems fine. I don't see anything wrong with the authors' statement that there is more POLK in neurons than in non-neuronal cells.

      (5) Microglia associated with IN and PN have significantly higher levels of cytoplasmic POLK

      I don't really see convincing evidence for the authors' claim here. They find a difference at early-old age, but not at late-old or other ages. This is explained by "However, this effect is lost in late-old age (Figure 5D), likely due to the MG-mediated removal of the INs." But with no trend observed, no experiment to show sufficiency, and no experiment to establish a directional relationship, this is a tough claim to stand by.

      We have changed the text to reflect the speculative nature of this observation.

      (6) Subcellular localization of POLK is regulated by neuronal activity

      This is an interesting and fairly difficult experiment. Can the authors say more about what these values mean? I am confused as to why there is a decline in nuclear puncta at 80 min. Also, why are POLK counts in Figure 6C similar at baseline between young and early-old mice? In Figures 5 and 6, I also worry about the statistical analysis. Were the assumptions of the t-test checked? Why not always use a test with fewer assumptions?

      We now explain in the text that acute slice preparations, maintained for only a few hours, are artificial and inherently stressful, especially for old brains, and therefore differ substantially from the vascularly perfused, PFA-fixed brain tissues compared between young and old ages.

      We do not have a clear explanation for the initial dip in nuclear puncta at 80 min, which is of very similar magnitude in young and old brains. It could reflect a separate biological phenomenon occurring on much shorter time scales, which a fixed-tissue assay would not capture; it needs careful investigation using live-tissue fluorescence imaging, which is beyond the scope of this manuscript.

      We apologize for the typographical error in the figure legend. We rechecked our R code: all tests were two-sided, nonparametric Wilcoxon rank-sum (Mann–Whitney U) tests.

      Figures 6B and 6E had extremely small p values because of the large sample sizes. We therefore randomly sampled 100 cells, repeated this 200 times, presented the distributions of p values and Cohen's d in the supplement, and reported the median p value and median Cohen's d in the main plot.
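
      The repeated-subsampling scheme described above can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not the authors' R code. The inputs are hypothetical lists of per-cell values. The two-sided Mann–Whitney U p value uses the large-sample normal approximation with a tie correction but no continuity correction, whereas R's `wilcox.test` applies a continuity correction by default and may compute exact p values for small, tie-free samples. Cohen's d uses the pooled standard deviation.

```python
import math
import random
import statistics


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p value, normal approximation with tie correction
    (assumption: no continuity correction, unlike R's default)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    n = n1 + n2
    tie = sum(t ** 3 - t for t in counts.values())  # sum of (t^3 - t) over tie groups
    var_u = n1 * n2 / 12 * ((n + 1) - tie / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def cohens_d(x, y):
    """Cohen's d with the pooled sample standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


def subsample_test(x, y, n_cells=100, n_repeats=200, seed=0):
    """Draw n_cells per group without replacement, n_repeats times.
    Return median p, median d, and the full lists for supplementary plots."""
    rng = random.Random(seed)
    ps, ds = [], []
    for _ in range(n_repeats):
        xs = rng.sample(x, n_cells)
        ys = rng.sample(y, n_cells)
        ps.append(mann_whitney_p(xs, ys))
        ds.append(cohens_d(xs, ys))
    return statistics.median(ps), statistics.median(ds), ps, ds
```

      Subsampling reduces only the sample size, not the effect size. The median Cohen's d should therefore stay close to the full-sample value, while the p values become far less extreme.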

      (7) POLK as an endogenous "aging clock" for brain tissue

      The model is trainable. What are its criteria, and how does it work? The cutoffs it uses to classify each age group might be interesting, because the model may have identified a trait the researchers were unaware of. Otherwise, it is not especially useful, except perhaps as an independent 'blind' analysis of the data.

      We have added a clearer description of the models and their assumptions, and of how two different unsupervised approaches converge on the same set of features with high AUROCs.
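
      The response does not say how the AUROCs were computed. As a hedged illustration only, the AUROC of a single feature for separating two age groups is the probability that a randomly chosen cell from one group scores higher than a randomly chosen cell from the other, with ties counted as one half. This equals the Mann–Whitney U statistic divided by the product of the two group sizes. The function name and inputs below are hypothetical.

```python
def auroc(scores_pos, scores_neg):
    """Hypothetical helper: AUROC = P(pos > neg) + 0.5 * P(pos == neg) over all pairs."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(scores_pos) * len(scores_neg))
```

      A value near 0.5 means the feature does not separate the groups. Values near 1 or 0 mean it separates them well, in one direction or the other.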

      Minor questions:

      The cartoons (Figures 1A, 2A-B, 5A, 6A) help a lot. However, I still had to work a bit to understand some of the graphs (e.g., Figures 5D, 6B-E, and 7). Is there a simpler way to present them? Perhaps simply additional labelling? I'm not sure.

      I think a more thorough discussion of the statistical tests is warranted. It is not clear to me why some were chosen (t-test vs. a nonparametric test with fewer assumptions). Infinitesimally small p values also make me think that incorrect tests may have been used or that no power analysis was performed beforehand. This could be fixed simply by discussing what went into choosing the testing methods and why.

      We revised the statistical analyses to address the very small p values. Figure 2 now uses generalized estimating equations. Figure 6 now uses repeated random subsampling: the method is explained in the text, the figure legend has been updated, and supplementary data on the distributions of p values and Cohen's d have been added. The descriptions in the relevant text have been rewritten.

      In the absence of further mechanistic experiments, it would still be interesting to hear what the authors think is going on and what this altered subcellular localization means. How do the authors think it occurs? I think they are arguing that cytosolic localization of POLK is entirely detrimental to the neuron ("The reduction of nuclear POLK in old brains is congruent with an increase in DNA damage markers"). Do they have any idea what the 'bug' in the POLK system is, then?

      We have added statements addressing these points to the Discussion.

      Reviewer #3 (Recommendations for the authors):

      POLK is detected as small "speckles" inside the nucleus at a young age (1-2 months), and larger "granules" can be seen in the cytoplasm at progressively older time points (>9 months). In the nucleus, is POLK bound to DNA? In the cytoplasm, how are the POLK molecules organized: are they bound to a substrate, or are they simply organized as a protein condensate without DNA?

      In the human U2OS cell line, DNase I treatment leads to loss of POLK from the nucleus, as well as loss of its activity, as reported in Figure 5 of Paul, S. et al. 2023 bioRxiv. Although we have not reproduced these results in mouse primary neurons, we anticipate a similar situation, which we will test in the future. We have addressed some aspects of cytoplasmic POLK in the entirely new Figure 3, using six endo-lysosomal markers, and have added text accordingly.

      When POLK proteins accumulate in the cytoplasm in aging cells, do they also form repair condensates there? What is the function of cytoplasmic POLK granules? More generally, is it known whether other granules or foci, such as repair foci, are found in the cytoplasm of aging cells or of cells under stress?

      We tested six endo-lysosomal markers to characterize the cytoplasmic granules; the results are now shown in Figure 3.

      While the authors quantify the number and size of POLK signals, they do not discuss their physical nature. Some membrane-less condensates, such as stress granules, P-bodies, or some nuclear repair condensates, exhibit liquid-like properties. In some diseased tissues, condensates lose their liquid properties and become solid-like. Is it known whether POLK condensates behave like liquid condensates, or whether they are simply formed by molecules bound to DNA? Since they are larger and fewer in the cytoplasm, could several small puncta have fused to form a larger one? It would be worthwhile to discuss these points.

      We have added statements to the Discussion on the nature of condensates in the context of the cytoplasmic POLK signal.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Sleep-Wake Transitions Are Impaired in the AppNL-G-F Mouse Model of Early Onset Alzheimer's Disease" describes a study of sleep/wake phenomena in a knock-in mouse strain carrying "three mutations in the human App gene associated with elevated risk for early onset AD". Traditional, in-depth characterization of sleep/wake states, EEG parameters, and the response to sleep loss is employed to provide evidence "supporting the use of this strain as a model to investigate interventions that mitigate AD burden during early disease stages". The sleep/wake findings of earlier studies (especially Maezono et al., 2020, as noted by the authors) were extended by several important, genotype-related observations. These include age-related onset of hyperactivity, which is typically associated with increased arousal; a normal response to loss of sleep and to multiple sleep latency testing; and a stronger AD-like phenotype in females. The authors conclude that the AppNL-G-F mice demonstrate many of the human AD prodromal symptoms and suggest that this strain may serve as a model for prodromal AD in humans, confirming the earlier results and conclusions of Maezono et al. Finally, based on analyses of state bout frequency and duration, the authors suggest that AppNL-G-F mice may develop disruptions in mechanism(s) involved in state transition.

      Strengths:

      The study appears to have been rigorously conducted from a technical standpoint. It provides a high-quality, in-depth traditional assessment of both state and EEG characteristics, complemented by concordant measurements of activity and temperature. The major strengths of this study derive from observations that the AppNL-G-F mice: (1) are more hyperactive in association with decreased transitions between states; (2) maintain a normal response to sleep deprivation and have normal MSLT results; and (3) display a sex-specific, "stronger" insomnia-like effect of the knock-in in females.

      Weaknesses:

      The main weakness is that the study's impact is limited because it is largely confirmatory of the Maezono et al. study, with advances of importance to a potentially more focused field. Further, the authors conclude that AppNL-G-F mice have disrupted mechanism(s) responsible for state transition; however, these were not directly examined. The authors base this conclusion on the observations that bouts of both W and NREM tend to be longer in duration and decreased in frequency in AppNL-G-F mice. Although altered mechanism(s) of state transition cannot be ruled out (it is not clear which mechanisms are meant here), other explanations might be considered. For example, increased arousal in association with hyperactivity would be expected to result in increased duration of W bouts during the active phase. This would also predictably result in greater sleep pressure, which is typically associated with more consolidated NREM bouts, consistent with the observations of bout duration and frequency.

      Reviewer 1 succinctly summarizes the advances of this study beyond the ground-breaking Maezono et al. (2020) study of this “humanized” mouse model exhibiting amyloid deposition. Maezono et al. conducted sleep/wake studies on male App<sup>NL-G-F</sup> mice at 6 and 12 months of age. In contrast, we had the unusual opportunity to study both sexes of homozygous App<sup>NL-G-F</sup> mice and WT littermates at 14-18 months of age and to conduct a longitudinal assessment of many of the same individuals at 18-22 months. In addition to baseline sleep/wake and EEG spectral analyses, we (1) measured subcutaneous body temperature and activity to obtain a broader picture of the physiology and behavior of this strain at advanced ages; (2) assessed baseline sleepiness in this strain using the murine version of the clinically relevant Multiple Sleep Latency Test (MSLT); (3) evaluated the response of App<sup>NL-G-F</sup> mice and WT littermates to a perturbation of the sleep homeostat; (4) compared the sleep/wake characteristics of male vs. female App<sup>NL-G-F</sup> mice at 18-22 months; and (5) to assess the stability of the phenotypes, analyzed these data over a continuous 14-d recording rather than the conventional 24-h recordings typical of most sleep/wake studies, including Maezono et al. We found that a long wake/short sleep phenotype was characteristic of homozygous App<sup>NL-G-F</sup> mice at these advanced ages. This phenotype is also evident in the Maezono et al. (2020) study at 12 months of age (but not at 6 months), although those authors do not comment on it. They instead focus on the reduced REM sleep, which is particularly evident in female App<sup>NL-G-F</sup> mice in our study. Remarkably, despite being awake ~20% longer per day, App<sup>NL-G-F</sup> mice are no sleepier than WT mice as determined by the MSLT, and their sleep homeostat is intact when challenged by 6-h sleep deprivation.

      At both advanced ages, the long wake/short sleep phenotype is due primarily to longer Wake bouts and shorter bouts of both NREM and REM sleep during the dark phase. Moreover, hyperactivity develops in older App<sup>NL-G-F</sup> mice, particularly females, and contributes to this phenotype. We agree with Reviewer 1 that “hyperactivity would be expected to result in increased duration of W bouts during the active phase” and that this could result in more consolidated NREM bouts. We will modify the manuscript to discuss this alternative. However, the suggestion of greater sleep pressure is not borne out by the MSLT studies. We did not observe the shorter sleep latencies and increased sleep during the MSLT nap opportunities that we have observed in other mouse strains. Moreover, because of their short sleep phenotype, App<sup>NL-G-F</sup> mice would be entering the sleep deprivation study with a greater sleep debt than WT mice. Yet we did not observe greater EEG Slow Wave Activity in this strain during recovery from sleep deprivation. Thus, we have suggested that App<sup>NL-G-F</sup> mice are unable to transition from Wake to sleep as readily as their WT littermates. These observations set the stage for subsequent mechanistic studies in aged App<sup>NL-G-F</sup> mice, although, realistically, mice of this age and genotype are a rare commodity.

      Reviewer #2 (Public review):

      Summary:

      The authors have used a knock-in mouse model to explore late-in-life amyloid effects on sleep. This is an excellent model as the mutated genes are regulated by the endogenous promoter system. The sleep study techniques and statistical analyses are also first-rate.

      The group finds an age-dependent increase in motor activity in advanced age in the NLGF homozygous knock-in mice (NLGF), with a parallel age-dependent increase in body temperature; both effects predominate in the dark period. Interestingly, the sleep changes do not quite follow the activity and temperature changes. Wake time is increased in NLGF mice, and this increase does not progress over time. NREMS and REM sleep are both reduced, and there is no progression. Sleep-wake effects, however, show a robust light:dark effect, with larger effects in the dark period. These findings support distinct effects of this mutation on activity and temperature, on the one hand, and on sleep, on the other. This is the first description of the temporal pattern of these effects. NLGF mice show wake stability (longer wake bout durations in the dark period, their active period) and fewer brief arousals from sleep. Sleep homeostasis across the lights-on period is normal. Wake power spectral density is unaffected in NLGF mice at either age. Only REM power spectra are affected, with NLGF mice showing less theta and more delta. There are interesting sex differences: females show no genotype difference in wake bout number, while males show a genotype effect. Similarly, genotype effects on NREM bout number seem larger in males than in females. Although there was no difference in homeostatic response, sleep-wake activity was normalized after sleep deprivation.

      Strengths:

      The approach (the model and the extent of sleep phenotyping) and the analysis.

      Weaknesses:

      The weaknesses are summarized below and are viewed as "addressable".

      (1) The term insomnia. Insomnia is defined as a subjective dissatisfaction with sleep, which cannot be ascertained in a mouse model. The baseline sleep findings in NLGF mice support increased wake consolidation in the active period. The predominant sleep period (lights on) is largely unaffected, and the active period (lights off) shows increased activity and increased wake with longer bouts. There is a fantastic clue here: the NLGF effects are consistent with increased hypocretinergic (orexinergic) neuron activity in the dark period and/or increased drive to hypocretin neurons from the PVH.

      (2) Sleep-wake transitions are impaired: This should not be termed an impairment. It could actually be beneficial to have greater state stability, especially wake stability in the dark or active period. There is reduced sleep in the model that can be normalized by short-term sleep loss. It is fascinating that recovery sleep normalized sleep in the NLGF in the immediate lights-on and light-off period. This is a key finding.

      Reviewer 2 suggests a provocative hypothesis to test. A recent Science paper suggests that hyperexcitable hypocretin/orexin neurons in aging mice result in greater sleep/wake fragmentation. Curiously, hyperexcitability of this system could nonetheless result in hyperactivity and longer wake bouts in aged App<sup>NL-G-F</sup> mice.

      Reviewer #3 (Public review):

      Summary:

      In this study, Tisdale et al. examined sleep/wake patterns in a biological mouse model of Alzheimer's disease. These results, together with the established literature on the relationship between sleep and Alzheimer's disease progression, led the authors to propose this mouse model for a mechanistic understanding of sleep states that translates to Alzheimer's disease patients. However, the manuscript currently suffers from a disconnect between the physiological data and the mechanistic interpretations. Specifically, the claim of "impaired transitions" is logically at odds with the observed increase in wake-state stability or possible hyperactivity. Additionally, the description of the methods, the quantification, and the figure presentation could be substantially improved. I detail some of my concerns below.

      Strengths:

      The selection of the knock-in model is a notable strength as it avoids the artifacts associated with APP overexpression and more closely mimics human pathology. The study utilizes continuous 14-day EEG recordings, providing a unique dataset for assessing chronic changes in arousal states. The assessment of sex as a biological variable identifies a more severe "insomniac-like" phenotype in females, which aligns with the higher prevalence and severity of Alzheimer's disease in women.

      Weaknesses:

      The study seems to lack a clear hypothesis-driven approach and relies mostly on exploratory investigations. Moreover, the lack of quantitative analytical methods leaves room for major improvement, as do shaky conclusions that may not be supported by the data in their current form.

      Given that this paper studies sleep states, the Methods section is quite unclear about the specific criteria used to classify them. There is no quantitative description of how sleep was classified using clear, reproducible procedures. Many reasonably well-characterized sleep scoring systems are used in the rat electrophysiology literature and could be useful here. The authors are generally expected to describe the movement speed, EMG, and/or EEG (theta/delta/gamma) criteria used to classify epochs. The subjective (manual) nature of this procedure provides no verifiable validation of the accuracy and interpretability of the results.
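
      As a hedged illustration of the explicit, reproducible criteria requested here, the sketch below scores one epoch as Wake, NREM, or REM. It uses EMG amplitude, locomotor activity, and the EEG theta/delta power ratio. Every threshold, band limit, and field name is a hypothetical placeholder, not the authors' scoring rule. A published method would report the actual values and explain how they were calibrated.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    emg_rms: float      # EMG amplitude (hypothetical normalized units)
    delta_power: float  # EEG delta-band power (band limits are an assumption)
    theta_power: float  # EEG theta-band power (band limits are an assumption)
    activity: float     # locomotor activity counts in the epoch


# Hypothetical thresholds; real values would be calibrated per animal and reported.
EMG_HIGH = 1.0
ACTIVITY_HIGH = 5.0
THETA_DELTA_REM = 1.5


def score_epoch(e: Epoch) -> str:
    """Classify one epoch with simple, explicit rules (illustrative only)."""
    if e.emg_rms >= EMG_HIGH or e.activity >= ACTIVITY_HIGH:
        return "Wake"  # high muscle tone or movement
    ratio = e.theta_power / e.delta_power if e.delta_power > 0 else float("inf")
    if ratio >= THETA_DELTA_REM:
        return "REM"   # low EMG with theta-dominated EEG
    return "NREM"      # low EMG with delta-dominated EEG
```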

      One of the bigger claims is that "state transition mechanism(s)" are impaired. However, Figure 7 shows that model mice exhibit significantly more long wake bouts (>260 s) and fewer short wake bouts (<60 s). Logically, an "impaired switch" (the flip-flop model, Saper et al., 2010) results in state fragmentation. The data here show the opposite: the wake state has become too stable. This suggests that the primary defect is not in the transition mechanism itself. It may instead be a pathological increase in arousal drive (hyperarousal), likely linked to the dark-phase hyperactivity shown in Figures 4 and 5. Note also that this finding is not new.
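
      The bout-duration argument can be made concrete with a short sketch. It collapses an epoch-by-epoch hypnogram into bouts (runs of consecutive epochs in the same state) and counts wake bouts shorter than 60 s and longer than 260 s, the bins cited above. The 10-s epoch length, the state codes ("W", "N", "R"), and the absence of a minimum-bout-length rule are assumptions.

```python
from itertools import groupby

EPOCH_S = 10  # assumed epoch length in seconds


def bouts(hypnogram, epoch_s=EPOCH_S):
    """Collapse consecutive identical states into (state, duration_s) bouts."""
    return [(state, sum(1 for _ in run) * epoch_s) for state, run in groupby(hypnogram)]


def wake_bout_bins(hypnogram, short_s=60, long_s=260, epoch_s=EPOCH_S):
    """Count wake ("W") bouts shorter than short_s and longer than long_s."""
    wake = [d for s, d in bouts(hypnogram, epoch_s) if s == "W"]
    return {
        "short": sum(d < short_s for d in wake),
        "long": sum(d > long_s for d in wake),
        "n_bouts": len(wake),
        "mean_s": sum(wake) / len(wake) if wake else 0.0,
    }
```

      In this framing, greater wake stability appears as wake bouts shifting from the short bin to the long bin, with fewer bouts in total.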

      The Figure 3 heatmaps lack color bars and units. Spectral power must be quantitatively defined, and the methods should be well explained in the Methods section. Without these, the reader cannot discern whether the "reduced power" in females is a global suppression of signal or a frequency-specific shift. Additionally, the representative example used to claim shorter sleep bouts lacks the statistical weight required for a major physiological conclusion. How does a cooler color indicate shorter sleep bouts in female mice, given that neither the color range nor its interpretation is stated? The authors should clearly mark the frequency ranges that support their claims. In this figure, there is a question mark following the theta/delta range. The authors should avoid speculation and state their claims based on facts. They should also add the theta and delta ranges to the plot, so that readers can draw their own conclusions.

      Figure 8 and the MSLT results show that model mice are "no sleepier than WT mice" and have a functional homeostatic rebound. This presents a logical flaw in the "insomnia" narrative. True insomnia in AD patients typically involves a failure of the homeostatic process or a debilitating accumulation of sleep debt. If these mice do not show increased sleepiness (shorter latency) despite ~19% less sleep, the authors might be describing a "reduced need" for sleep or a "hyper-aroused" state, possibly not a clinical insomnia phenotype.

      In Figure 9, showing and comparing LFP power as percentages is problematic, because the LFP power distribution is known to be skewed (it follows a power law). This is particularly problematic here because all frequencies above ~20 Hz appear totally flattened or absent. This severely limits the comparison of power and biases it towards the highly skewed, very low-frequency portion of the LFP power spectrum, i.e., delta, theta, and possibly beta. It ignores low, mid, and high gamma as well as ripple-band frequencies. NREM sleep is known to have relatively greater ripple-band (100-250 Hz) power bursts in hippocampal regions, and REM sleep is known to have synchronous theta-gamma relationships.
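
      The concern about percentages can be shown numerically. Relative band power is a band's power divided by the total power over a reference range, so the same spectrum yields different percentages depending on which range forms the denominator. The sketch below assumes a PSD already sampled on a uniform frequency grid. The 1/f-shaped toy spectrum, the band limits, and the two reference ranges are hypothetical.

```python
def band_power(freqs, psd, lo, hi):
    """Sum PSD bins with lo <= f < hi (rectangle rule on a uniform grid)."""
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df


def relative_power(freqs, psd, bands, total_range):
    """Percent of the power within total_range that falls in each band."""
    total = band_power(freqs, psd, *total_range)
    return {name: 100.0 * band_power(freqs, psd, lo, hi) / total
            for name, (lo, hi) in bands.items()}


# Hypothetical toy 1/f spectrum from 0.5 to 250 Hz in 0.5-Hz steps.
freqs = [0.5 * k for k in range(1, 501)]
psd = [1.0 / f for f in freqs]
bands = {"delta": (0.5, 4), "theta": (6, 9)}  # assumed band limits

narrow = relative_power(freqs, psd, bands, (0.5, 25))  # denominator: 0.5-25 Hz
wide = relative_power(freqs, psd, bands, (0.5, 250))   # denominator: 0.5-250 Hz
```

      With the narrow denominator, the delta and theta percentages are inflated relative to the wide one, and power above 25 Hz (gamma, ripple) cannot enter the comparison at all.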

      We agree with the reviewer that the “Classification of arousal states” section was missing the key description of how we scored the recordings into arousal states based on EEG, EMG and locomotor activity. This was an oversight, as the corresponding text exists in all our previous sleep/wake studies published over several decades. Reviewer 1 also points out the alternative interpretation that “the wake state has become too stable.” However, I think we are using different words to say the same thing: that the transition from wake to sleep is impaired, whether this is due to hyperarousal or to a defect in the flip/flop switch that results in greater Wake stability. We will revise Fig. 3 (Reviewer 2 suggests combining it with Fig. 14). However, we note that the X-axis is labelled 0-25 Hz. This figure was intended to be descriptive, illustrating how unusual the female App<sup>NL-G-F</sup> mice are relative to WT, rather than a quantitative analysis of spectral power as in Fig. 14. Both Reviewer 2 and Reviewer 3 suggest that we are using “insomnia” incorrectly; we have simply used the term to describe less sleep per 24-h period. Reviewer 2 states that “Insomnia is defined as a subjective dissatisfaction with sleep” and Reviewer 3 suggests a narrow definition of insomnia as due only to “a failure of the homeostatic process or a debilitating accumulation of sleep debt.” In a revised manuscript, we will define “insomnia” as an operational term to succinctly mean “less sleep”. Regarding the problem of presenting spectral power in percentages, we completely agree with the reviewer. However, we intentionally presented spectral power density, a measure of relative power, as in Figures 3A and 3B of Maezono et al. (2020). At the risk of making Fig. 9 even busier, we will revise Fig. 9 to add labels for all Y-axes.

      In addition to a revised Fig. 9, in the revised manuscript, we will reformat Tables 1-3, Figs. S1 and S2 for legibility and correct an error in Fig. 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Wu et al. uses endogenous bruchpilot expression in a cell-type-specific manner to assess synaptic heterogeneity in adult Drosophila melanogaster mushroom body output neurons. The authors performed on-locus genomic tagging of the presynaptic scaffold protein bruchpilot (BRP) with one part of splitGFP (GFP11) using CRISPR/Cas9. They co-expressed the other part of splitGFP (GFP1-10) using the GAL4/UAS system. Upon expression of both parts of splitGFP, fluorescent GFP is assembled at the N-terminus of BRP, exactly where BRP is endogenously expressed in active zones. For manageable analysis, a high-throughput pipeline was developed using publicly available 3D analysis tools integrated in Fiji. This analysis evaluated parameters such as the location of BRP clusters, cluster volume, and cluster intensity, the latter serving as a direct measure of the relative amount of BRP on site. Analysis was conducted for different mushroom body cell types in different mushroom body lobes using various specific GAL4 drivers. To test this new method of synapse assessment, Wu et al. performed an associative learning experiment in which an odor was paired with an aversive stimulus. They found that, in a specific time frame after conditioning, the new analysis solidly revealed changes in BRP levels at specific synapses that are associated with aversive learning.

      Strengths:

      Expression of splitGFP bound to BRP enables intensity analysis of BRP expression levels as exactly one GFP molecule is expressed per BRP. This is a great tool for synapse assessment. This tool can be widely used for any synapse as long as driver lines are available to co-express the other part of splitGFP in a cell-type-specific manner. As neuropils and thus the BRP label can be extremely dense, the analysis pipeline developed here is very useful and important. The authors have chosen an exceptionally dense neuropil - the mushroom bodies - for their analysis and convincingly show that BRP assessment can be achieved with such densely packed active zones. The result that BRP levels change upon associative learning in an experiment with odor presentation paired with punishment is likewise convincing, and strongly suggests that the tool and pipeline developed here can be used in an in vivo context.

      Weaknesses:

      Although BRP is an important scaffold protein and its expression levels have been associated with function and plasticity, I am still somewhat reluctant to accept that synapse structure can be profiled by assessing only BRP expression levels and BRP cluster volume. Also, is it guaranteed that synaptic plasticity is not impaired by the large GFP fluorophore? Could the GFP11 construct, which is tagged to BRP in all BRP-expressing cells independently of GAL4, possibly hamper neuronal function? Is it certain that only active zones are labeled? I do see that this study makes plastic changes visible after an associative learning experiment, with BRP intensity and cluster volume as read-outs. However, I would be reassured by a direct measurement of synaptic plasticity with splitGFP directly connected to BRP, perhaps at a different, more accessible synapse.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that Brp is an important, but not the only, player in the active zone. We have included new data demonstrating that split-GFP tagging does not severely affect the localization and plasticity of Brp or the function of synapses. The new data show: (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs; (3) activity-dependent Brp remodeling in R8 photoreceptors; and (4) no defect in memory performance when Brp::rGFP is labeled in KCs. These four lines of additional evidence further corroborate our approach of characterizing endogenous Brp as a proxy for active zone structure.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a cell-type-specific fluorescence-tagging approach using a CRISPR/Cas9-induced split-GFP reconstitution system. It visualizes endogenous Bruchpilot (BRP) clusters as presynaptic active zones (AZ) in specific cell types of the mushroom body (MB) in the adult Drosophila brain. This AZ profiling approach was implemented in a high-throughput quantification process, allowing synapse profiles to be compared within single cells, cell types, and MB compartments, and between different individuals. The aim is to analyse neuronal connectivity and circuits in more detail in this centre of associative learning. These are notoriously difficult to investigate because of the density of cells and of structures within a cell. The authors detect and characterize cell-type-specific differences in BRP-dependent profiling of presynapses in different compartments of the MB, while intracellular AZ distribution was found to be stereotyped. In addition to the descriptive part characterizing various AZ profiles in the MB, the authors apply an associative learning assay and detect consequent AZ re-organisation.

      Strengths:

      The strength of this study lies in the outstanding resolution of synapse profiling in the extremely dense compartments of the MB. This detailed analysis will be the entry point for many future analyses of synapse diversity in connection with functional specificity to uncover the molecular mechanisms underlying learning and memory formation and neuronal network logics. Therefore, this approach is of high importance for the scientific community and a valuable tool to investigate and correlate AZ architecture and synapse function in the CNS.

      Weaknesses:

      The results and conclusions presented in this study are, in many aspects, well-supported by the data presented. To further support the key findings of the manuscript, additional controls, comments, and possibly broader functional analysis would be helpful. In particular:

      (1) All experiments in the study are based on split-GFP lines (BRP:GFP11 and UAS-GFP1-10). The Materials and Methods section does not describe any cloning strategy (gRNA, primers, PCR/sequencing validation, exact position of tag insertion, etc.) and only refers to a bioRxiv publication. It might be helpful to add a Materials and Methods section (at least for the BRP:GFP11 line). Additionally, as this is an on-locus insertion in the BRP ORF, the line needs general validation with controls (Western blot and correlative antibody staining against BRP). These should show that overall BRP expression is not compromised by the GFP insertion and that the tagged protein localizes as BRP does in wild-type flies. They should also show that flies are viable, have no defects in locomotion or in learning and memory formation, and have MB morphology unaffected compared with wild-type animals.

      We thank the reviewer for suggesting these important validations. We have added details of the construct design and insertion site to the Methods section, performed several new experiments to validate the split-GFP tagging of Brp, and present the data in the revision.

      First, to examine whether transcription of the brp gene is unaffected by the insertion of GFP<sub>11</sub>, we conducted qRT-PCR to compare brp mRNA levels between brp::GFP<sub>11</sub>, UAS-GFP1-10 flies and UAS-GFP1-10 flies, and found no difference (Figure 1 - figure supplement 1A).

      To further verify the effect of GFP<sub>11</sub> tagging at the protein level, we performed anti-Brp (nc82) immunohistochemistry on brains in which GFP is reconstituted pan-neuronally. We found unaltered neuropil localization of nc82 signals (Figure 1 - figure supplement 1C). In presynaptic terminals of the mushroom body calyx, Brp::rGFP was integrated into nc82 accumulations (Figure 1D). We used super-resolution microscopy to verify the configuration of Brp::rGFP and confirmed its donut-shaped arrangement in the terminals of motor neurons (see Wu, Eno et al., 2025 PLOS Biology). This corroborates the nanoscopic assembly of Brp::rGFP at active zones (Kittel et al., 2006 Science).

      Furthermore, co-expression of the RFP-tagged voltage-gated calcium channel alpha subunit Cacophony (Cac) and Brp::rGFP in PAM-γ5 dopaminergic neurons revealed strong presynaptic colocalization of their punctate clusters (Figure 1E). This suggests that rGFP tagging of Brp did not disrupt the assembly of key proteins at active zones (Kawasaki et al., 2004 J Neurosci; Kittel et al., 2006 Science).

      These lines of evidence suggest that the localization of endogenous Brp is largely unaffected by the C-terminal GFP<sub>11</sub> insertion or by GFP reconstitution with it. This is in line with a large body of studies showing that the N-terminal region and coiled-coil domains of Brp, but not its C-terminal region, are necessary and sufficient for active zone localization (Fouquet et al., 2009 J Cell Biol; Oswald et al., 2010 J Cell Biol; Mosca and Luo, 2014 eLife; Kiragasi et al., 2017 Cell Rep; Akbergenova et al., 2018 eLife; Nieratschker et al., 2009 PLoS Genet; Johnson et al., 2009 PLoS Biol; Hallermann et al., 2010 J Neurosci). We note, however, that the GFP<sub>11</sub> insertion is homozygous lethal and that flies carrying it show decreased immunoreactive signals (Figure 1 - figure supplement 1B).

      For these reasons, we used heterozygotes in all experiments. Heterozygous flies show no conspicuous locomotion defect, consistent with the original study (Wagh et al., 2005 Neuron). To functionally validate the heterozygotes, we measured aversive olfactory memory in flies in which GFP reconstitution was induced in Kenyon cells using R13F02-GAL4. None of these transgenes altered mushroom body morphology (Figure 7 - figure supplement 1) or memory performance compared with wild-type flies (Figure 7 - figure supplement 2). This suggests that the synaptic function required for short-term memory formation is not affected by split-GFP tagging of Brp.

      (2) Several aspects of image acquisition and high-throughput quantification data analysis would benefit from a more detailed clarification.

      (a) For BRP cluster segmentation, the Materials and Methods state that the intensity threshold and noise tolerance were "set". This setting has a large effect on the quantification, so the values and the criteria used to choose them should be specified and justified (if set manually, how and why; if set automatically, to what). Additionally, if Python was used for the "Nearest Neighbor" analysis, the code should be made available with this manuscript; otherwise, it is difficult to judge the quality of this quantification step.

      (b) To better evaluate the quality of both the image analysis and the image presentation, it would be important to state whether the presented and analysed images are deconvolved. If they are, at least one proof-of-principle comparison of an original and a deconvolved file should be shown and quantified, because the impact of deconvolution on output quality is central to this study.

      We thank the reviewer for suggesting these clarifications. We have added more description to the revised manuscript to clarify the segmentation settings, which were manually adjusted to optimize the F-score (previous Figure 1D, now moved to Figure 1 - figure supplement 5). We have included the code used for analyzing nearest neighbor distance, AZ density and local Brp density in the revised manuscript (Supplementary file 1), together with a pre-processed sample data sheet (Supplementary file 2).
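
      The following sketch illustrates two quantities mentioned above: the nearest-neighbor distance between cluster centroids, and an F-score of detected versus manually annotated clusters. It is a minimal NumPy sketch and is not the code provided in Supplementary file 1. The function names are hypothetical, and the sketch assumes that cluster centroids are already available as 3D coordinates.

```python
import numpy as np


def nearest_neighbor_distances(centroids):
    """Distance from each cluster centroid to its closest other centroid.

    centroids: (n, 3) array of x, y, z positions (same unit on all axes).
    Brute-force O(n^2) version, adequate for a few thousand clusters.
    """
    pts = np.asarray(centroids, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's zero distance to itself
    return dist.min(axis=1)


def f_score(n_true_pos, n_false_pos, n_false_neg):
    """F1 score of automatically detected clusters against a manual annotation."""
    precision = n_true_pos / (n_true_pos + n_false_pos)
    recall = n_true_pos / (n_true_pos + n_false_neg)
    return 2 * precision * recall / (precision + recall)
```

      Under this scheme, one would keep the segmentation setting that gives the highest F-score against a manual annotation. The full distance matrix keeps the example short, but it scales as n², so a k-d tree would be preferable for very large numbers of clusters.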

      Regarding image deconvolution, we have clarified in the revised manuscript where deconvolved and non-deconvolved images were used. We have also included a quantitative evaluation of Richardson-Lucy iterative deconvolution (Figure 1 - figure supplement 4). We used 20 iterations because the FWHM improved only marginally beyond this point.
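
      A one-dimensional toy example can illustrate the iteration-versus-FWHM criterion. This sketch is not the pipeline used in the study. It implements the basic Richardson-Lucy update with a known Gaussian PSF and leaves out the empirical PSF, the 3D case and any regularization. FWHM is measured by linear interpolation at half maximum, so the narrowing of a profile can be tracked as iterations are added.

```python
import numpy as np


def richardson_lucy_1d(observed, psf, n_iter=20, eps=1e-12):
    """Basic Richardson-Lucy deconvolution of a 1-D, non-negative profile."""
    observed = np.asarray(observed, dtype=float)
    psf = np.asarray(psf, dtype=float) / np.sum(psf)  # PSF must sum to 1
    psf_mirror = psf[::-1]
    estimate = np.full_like(observed, observed.mean())  # flat starting guess
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)  # eps avoids division by zero
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate


def fwhm(profile, spacing=1.0):
    """Full width at half maximum of a single-peaked profile (linear interpolation)."""
    y = np.asarray(profile, dtype=float)
    peak = int(np.argmax(y))
    half = y[peak] / 2.0
    left = peak
    while left > 0 and y[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(y) - 1 and y[right + 1] >= half:
        right += 1
    # Interpolate the exact half-maximum crossings between neighbouring samples
    x_left = left - (y[left] - half) / (y[left] - y[left - 1]) if left > 0 else float(left)
    x_right = (right + (y[right] - half) / (y[right] - y[right + 1])
               if right < len(y) - 1 else float(right))
    return (x_right - x_left) * spacing
```

      Evaluating `fwhm(richardson_lucy_1d(observed, psf, n_iter=n))` for increasing `n` produces the kind of saturating curve one could use to choose a stopping point.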

      (3) The major part of this study focuses on the description and comparison of the divergent synapse parameters across cell-types in MB compartments, which is highly relevant and interesting. Yet it would be very interesting to connect this new method with functional aspects of the heterogeneous synapses. This is done in Figure 7 with an associative learning approach, which is, in part, not trivial to follow for the reader and would profit from a more comprehensive analysis.

      (a) For understanding and validating the learning-induced changes, it would be important to present not (only) a ratio (of AZ density/local intensity) but also both values on their own. This would especially allow a comparison with the previous AZ remodelling analyses quantifying BRP intensities (refs. 17, 18). It should be explained in more detail why only the ratio was presented here.

      We thank the reviewer for the suggestion on the presentation of learning-induced Brp remodeling. The values reported in Figure 7C are the correlation coefficients between AZ density and local intensity in each compartment, not ratios. These results suggest that subcompartment-sized clusters of AZs with high Brp accumulation (Figure 6) undergo local structural remodeling upon associative learning (Figure 7). For clarity, we have added a schematic of this correlation and an example scatter plot to Figure 6. Unlike the previous studies (refs 17 and 18), we did not observe robust learning-dependent changes in Brp intensity. This may be due to confounding factors such as overall expression levels and conditioning protocols, as described in the previous and following points, respectively.
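
      The sketch below shows one way to obtain a per-compartment correlation between AZ density and local intensity. It is a hypothetical reconstruction. It assumes that AZ density is the number of Brp::rGFP clusters within a fixed radius of each AZ, and that local intensity is the mean cluster intensity within that radius. The radius and both definitions are illustrative and may differ from those in Supplementary file 1.

```python
import numpy as np


def local_density_and_intensity(centroids, intensities, radius):
    """Hypothetical definitions: for each AZ, count clusters within `radius`
    (including itself) and average their intensities."""
    pts = np.asarray(centroids, dtype=float)
    inten = np.asarray(intensities, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    within = np.sqrt((diff ** 2).sum(axis=-1)) <= radius  # (n, n) boolean
    density = within.sum(axis=1)
    local_intensity = (within * inten[None, :]).sum(axis=1) / density
    return density, local_intensity


def density_intensity_correlation(centroids, intensities, radius):
    """Pearson r between AZ density and local intensity within one compartment."""
    density, local_intensity = local_density_and_intensity(centroids, intensities, radius)
    return np.corrcoef(density, local_intensity)[0, 1]
```

      A positive r means that AZs in crowded neighborhoods also sit among brighter clusters, which is the kind of "hot spot" pattern described above. Comparing r between paired and unpaired groups then gives a single summary value per compartment.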

      (b) The reason why a single instead of a dual odour conditioning was performed could be clarified and discussed (would that have the same effects?).

      (c) Additionally, "controls" for the unpaired values, that is, flies receiving neither shock nor odour, would help to evaluate the unpaired control values in the different MB compartments.

      We used single-odor conditioning because it is the simplest way to examine the effect of odor-shock association by comparing the paired and unpaired groups. Standard differential conditioning with two odors includes an unpaired odor presentation (CS-) even in the ‘paired’ group. We now show that single-odor conditioning induces memory that lasts one day, as in differential conditioning (Figure 7B; Tully and Quinn, J Comp Phys A 1985).

      (d) The temporal resolution of the effect is very interesting (Figure 7D), and sampling more time points, especially between 90 and 270 min, might yield interesting results.

      The sampling time points after training were chosen at approximately logarithmic intervals, as memory decay is roughly exponential (Figure 7B). This transient remodeling is consistent with previous studies reporting that Brp plasticity is short-lived (Zhang et al., 2018 Neuron; Turrel et al., 2022 Current Biol).

      (e) Additionally, it would be very interesting and rewarding to have at least one additional assay, relating structure and function, e.g. on a molecular level by a correlative analysis of BRP and synaptic vesicles (by staining or co-expression of SV-protein markers) or calcium activity imaging or on a functional level by additional learning assays.

      We thank the reviewer for raising this important point. We have performed calcium imaging of KC presynaptic terminals to correlate the structure and function in another study (see Figure 2 in Wu, Eno et al., 2025 PLOS Biology for more detail). The basal presynaptic calcium pattern along the γ compartments is strikingly similar to the compartmental heterogeneity of Brp accumulation (see also Figure 2 in this study). Considering colocalization of other active-zone components, such as Cac (Figure 1E), we propose that the learning-induced remodeling of local Brp clusters should transiently modulate synaptic properties.

      In response to other reviewers’ interest, we used Brp::rGFP to measure different forms of Brp-based structural plasticity: upon constant light exposure in the photoreceptors, and upon silencing rab3 in KCs. These experiments reproduced the results of previous studies (Sugie et al., Neuron 2015; Graf et al., Neuron 2009), which indicates that Brp::rGFP faithfully reports Brp-based structural plasticity. We therefore believe that the learning-induced plasticity of Brp clustering in KCs is genuinely transient.

      Reviewer #3 (Public review):

      Summary:

      The authors develop a tool for marking presynaptic active zones in Drosophila brains. A GAL4 construct drives expression of a fragment of GFP, which reconstitutes with a genome-engineered partial GFP attached to the active zone protein Bruchpilot, so the signal is specific to the GAL4-expressing neuronal compartment. They then use various GAL4 lines to examine innervation of the mushroom bodies and dissect compartment-specific differences in the size and intensity of active zones. After describing these differences, they induce learning in flies with classic odour/electric shock pairing and observe changes after conditioning that are specific to the paired conditioning/learning paradigm.

      Strengths:

      The imaging and analysis appear strong. The tool is novel and exciting.

      Weaknesses:

      I feel that the tool could do with a little more characterisation. It is assumed that the puncta observed are AZs with no further definition or characterisation.

      We performed additional validation of the tool, including (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs (Figure 1D-E); and (3) activity-dependent active zone remodeling in R8 photoreceptors (Figure 1F). These are detailed in our point-by-point response below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors repeatedly state that they profile or assess synaptic structure by analyzing BRP localization, cluster volume, and intensity. However, I do not think that BRP cluster volume and intensity warrant an educated statement about presynaptic structure as a whole. I do not challenge the usefulness of BRP cluster analysis for synapse evaluation, but because so many other players are involved in synaptic function, BRP analysis certainly cannot explain it all. This should at least be discussed.

      It is correct that Brp is not the only player at the active zone. We have added more discussion of the specific role of Brp (lines 84-89) and of other synaptic markers (line 250), and we have edited potentially misleading text.

      (2) I do see that changes in BRP expression were observed following associative learning, but is it certain that synaptic plasticity is generally unaffected by the large GFP fluorophore? BRP binds other proteins via both its C- and N-termini. As the GFP is inserted right before the stop codon, it should be at the C-terminus. How far could BRP function be hampered by this? Is there still enough space for other proteins to interact?

      We thank the reviewer for sharing these concerns. Here we provide three lines of evidence that the Brp assembly at active zones required for synaptic plasticity is unaffected by split-GFP tagging.

      First, we assessed olfactory memory of flies that have Brp::rGFP labeled in Kenyon cells and found the performance comparable to wild-type (Figure 7 - figure supplement 2), suggesting the Brp function required for olfactory memory (Knapek et al., J Neurosci 2011) is unaffected by split-GFP tagging.

      Second, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the numbers of Brp::rGFP clusters in R8 terminals in the medulla, as compared to constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors. We have included this result into the revised manuscript (Figure 1F).

      Third, to validate the protein interactions of Brp::rGFP, we focused on Rab3, as it was previously shown to control Brp allocation at active zones (Graf et al., 2009 Neuron). To this end, we silenced rab3 expression in Kenyon cells using RNAi and measured the intensity of Brp::rGFP clusters in γ Kenyon cells. As previously reported at the neuromuscular junction, rab3 knock-down increased Brp::rGFP accumulation at active zones, suggesting that Brp::rGFP retains its functional interaction with Rab3. We have included all the new data in the revised manuscript (Figure 1 - figure supplement 3).

      (3) It may well be that not only active-zone-associated BRP is labeled but possibly also BRP molecules elsewhere in the neuron. I would like to see more validation, e.g., the percentage of tagged endogenous BRP associated with other presynaptic proteins.

      To determine to what extent Brp::rGFP clusters represent active zones, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters colocalized with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting that most Brp::rGFP clusters represent functional AZs.
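
      A colocalization percentage such as the 97% above can be computed in several ways. The sketch below assumes a centroid-distance criterion: a Brp::rGFP cluster counts as colocalized if the nearest Cac::tdTomato cluster lies within a hypothetical `max_distance`. The criterion actually used in the study is not specified here and may instead be based on object overlap.

```python
import numpy as np


def colocalized_fraction(brp_centroids, cac_centroids, max_distance):
    """Fraction of Brp clusters whose nearest Cac cluster lies within max_distance.

    Both inputs are (n, 3) centroid arrays in the same physical unit;
    max_distance is an illustrative, user-chosen tolerance.
    """
    brp = np.asarray(brp_centroids, dtype=float)
    cac = np.asarray(cac_centroids, dtype=float)
    diff = brp[:, None, :] - cac[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return float(np.mean(nearest <= max_distance))
```

      The tolerance should be on the order of the optical resolution (for example, the post-deconvolution FWHM). Otherwise, the fraction mainly reflects the chosen threshold rather than the biology.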

      (4) Z-size is ~200 nm, while x/y pixel size is ~75 nm during acquisition. How far down does the resolution go after deconvolution?

      The Z-step was 370 nm and the XY pixel size was 79 nm for image acquisition. We performed 20 iterations of Richardson-Lucy deconvolution using an empirical point spread function (PSF). The full width at half maximum (FWHM) of Brp::rGFP clusters improves only marginally beyond 20 iterations. At that point, the XY FWHM is around 200 nm and the XZ FWHM is around 450 nm (Figure 1 - figure supplement 4).

      (5) Figure Legend 7: What is a "cytoplasm membrane marker"? Does this mean membrane-bound tdTom is sticking into the cytoplasm?

      We apologize for the typo and have corrected it to “plasma membrane marker”.

      (6) At the end of the introduction: "characterizing multiple structural parameters..." - which were these parameters? I was under the assumption that BRP localization, cluster volume, and intensity were assessed. I do not see how these are structural parameters. Please define what exactly is meant by "structural parameters".

      We apologize for the confusion. By “structural parameters”, we referred to the volume, intensity and molecular density of Brp::rGFP clusters. We have revised the sentence to “Characterizing the distinct parameters and localization of Brp::rGFP clusters.”

      (7) Next to last sentence of the introduction: "Characterizing multiple structural parameters revealed a significant synaptic heterogeneity within single neurons and AZ distribution stereotypy across individuals." What do the authors mean by "significant synaptic heterogeneity"?

      By “synaptic heterogeneity”, we refer to the intracellular variability of active zone cytomatrices reported by Brp clusters. For instance, the intensities of Brp::rGFP clusters within Kenyon cell subtypes varied among compartments (Figure 2). Intracellular variability in the Brp concentration of individual active zones was higher in DPM and APL neurons than in Kenyon cells (Figure 3). These variabilities demonstrate intracellular synaptic heterogeneity. We have revised the sentence to be more specific about the different characteristics of Brp clusters.

      (8) I do not understand the last sentence of the introduction. "These cell-type-specific synapse profiles suggest that AZs are organized at multiple scales, ranging from neighboring synapses to across individuals." What do the authors mean by "ranging from neighboring synapses to across individuals"? Does this mean that even neighboring synapses in the same cell can be different?

      We have revised the sentence to “These cell-type-specific synapse profiles suggest that AZs are spatially organized at multiple scales, ranging from interindividual stereotypy to neighboring synapses in the same cells.”

      By “neighboring synapses”, we refer to the nearest-neighbor similarity in Brp levels in some cell types (Figure 6A-C), and also to the sub-compartmental dense AZ clusters with high Brp levels in Kenyon cells (Figure 6D-H). By “across individuals”, we refer to the active zone distribution patterns that are conserved across individuals in some neurons (Figure 5).

      (9) The title talks about cell-type-specific spatial configurations. I do not understand what is meant by "spatial configurations"? Do you mean BRP cluster volume? I think the title is a little misleading.

      By “spatial configuration”, we refer to the arrangement of Brp clusters within individual mushroom body neurons. This statement is based on our findings on the intracellular synaptic heterogeneity (see also response to comment #7). We have streamlined the text description in the revised manuscript for clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) For Figure 3A: two exemplary AZs are compared here. A histogram comparing more AZs would help make the point that, in general, AZs of similar size have different BRP levels (intensities), and would show how much variation exists.

      We have included histograms for Brp::rGFP intensity and cluster volumes to Figure 3 in the revised manuscript.

      (2) Line 52: "endogenous synapses" is a confusing term; presumably it means that the protein levels within the synapse are endogenous rather than overexpressed.

      We apologize for the confusion and have revised the term to “endogenous synaptic proteins.”

      (3) It is not clear from the Materials and Methods section, whether and where deconvolved or not-deconvolved images were used for the quantification pipeline. Please comment on this. 

      We have now revised the Methods section to clarify where deconvolved and non-deconvolved images were used in the pipeline.

      (4) Line 664 (C) not bold.

      We have corrected the error.

      (5) Line 725: "Files" should be "Flies".

      We have corrected the error.

      (6) Line 727: "first" appears twice.

      We have corrected the error.

      (7) Figure 7. All (A) etc., not bold - there should be consistent annotation. 

      We thank the reviewer for the careful proofreading and have corrected all the errors spotted.

      Reviewer #3 (Recommendations for the authors):

      (1) Has there been an expression of the construct in a non-neuronal cell? Astrocyte-like cell? Any glia? As some sort of control for background and activity?

      As the reviewer suggested, we verified the neuronal specificity of Brp::rGFP expression. Using R86E01-GAL4 and Amon-GAL4, we compared Brp::rGFP in astrocyte-like glia and in neuropeptide-releasing neurons. Unlike in neurons, we found no Brp::rGFP puncta in the neuropils when GFP was reconstituted in astrocyte-like glia, suggesting that Brp::rGFP is specific to neurons. We have added this new dataset to the revised manuscript (Figure 1 - figure supplement 2).

      (2) Similarly, expression of the construct co-expressed with a channelrhodopsin, and induction of a 'learning'-like regime of activity, similarly in a control type of experiment, expression of an inwardly rectifying channel (e.g. Kir2.1) to show that increases in size of the BRP puncta are truly activity dependent? The NMJ may be an optimal neuron to use to see the 'donut' structures of the AZs and their increase with activity. Also, are these truly AZs we are seeing here? Perhaps try co-expressing cacophony-dsRed? If the GFP Puncta are active zones, then they should be surrounded by cacophony.

      We would like to clarify that we did not find Brp::rGFP size increase upon learning. Instead, we demonstrated that associative training transiently remodelled sub-compartment-sized AZ “hot spots” in Kenyon cells, indicated by the correlation of local intensity and AZ density (Figure 6-7).

      To demonstrate split-GFP tagging does not affect activity-dependent plasticity associated with Brp, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the numbers of Brp::rGFP clusters in R8 terminals in the medulla, as compared to constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors (Figure 1F).

      As the reviewer suggested, we performed STED microscopy on larval motor neurons and confirmed the donut-shaped arrangement of Brp::rGFP (Wu, Eno et al., PLOS Biol 2025).

      Also following the reviewer’s suggestion, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters colocalized with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting that most Brp::rGFP clusters represent functional AZs.

      (3) Introduction: add a sentence about BRP as the central organiser of the active zone, and therefore a key regulator of activity.

      We have added a few more sentences about the role of Brp at active zones to the revised manuscript.

      (4) Figure 1 E, line 650 'cite the resource here'. 

      We thank the reviewer for pointing out the error and we have corrected it.

      (5) Many readers may not be MB aficionados, and to make the data more accessible, perhaps use a cartoon of an MB with the cell bodies of the neurons around the MB expressing the constructs highlighted so that the reader can have a wider idea of the anatomy in relation to the MB.

      We appreciate these comments and have appended cartoons of the MB to figures to help readers understand the anatomy.

    1. Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. Two types of learned associations are characterized, one being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify time-resolved SC and SR correlates, which are used to test properties of their dynamics.

      The conclusion is reached that SC and SR associations can independently and simultaneously guide behavior. This conclusion is based on results showing SC and SR correlates are: (1) not entirely overlapping in cross-decoding; (2) simultaneously observed on average over trials in overlapping time bins; (3) independently correlate with RT; and (4) have a positive within-trial correlation.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations.

      Nice idea to orthogonalize ISPC condition (MC/MI) from stimulus features.

      Weaknesses:

      I still have my concern from the first round that the decoders are overfit to temporally structured noise. As I wrote before, the SC and SR classes are highly confounded with phase (chunk of session). I do not see how the control analyses conducted in the revision adequately deal with this issue.

      In the figures, there are several hints that these decoders are biased. Unfortunately, the figures are also constructed in such a way that hides or diminishes the salience of the clues of bias. This bias and lack of transparency discourage trust in the methods and results.

      I have two main suggestions:

      (1) Run a new experiment with a design that properly supports this question.

      I don't make this suggestion lightly, and I understand that it may not be feasible to implement given constraints; but I feel that this suggestion is warranted. The desired inferences rely on successful identification of SC and SR representations. Solidly identifying SC and SR representations necessitates an experimental design wherein these variables are sufficiently orthogonalized, within-subject, from temporally structured noise. The experimental design reported in this paper unfortunately does not meet this bar, in my opinion (and the opinion of a colleague I solicited).

      An adequate design would have enough phases to properly support "cross-phase" cross-validation. Deconfounding temporal noise is a basic requirement for decoding analyses of EEG and fMRI data (see e.g., leave-one-run-out CV that is effectively necessary in fMRI; in my experience, EEG is not much different, when the decoded classes are blocked in time, as here). In a journal with a typical acceptance-based review process, this would be grounds for rejection.

      Please note that this issue of decoder bias would seem to weaken the rest of the downstream analyses that are based on the decoded values. For instance, if the decoders are biased, in the within-trial correlation analysis, how can we be sure that co-fluctuations along certain dimensions within their projected values are driven by signal or noise? A similar issue clouds the LMM decoding-RT correlations.

      (2) Increase transparency in the reporting of results throughout main text.

      Please do not truncate stimulus-aligned timecourses at time=0. Displaying the baseline period is very useful to identify bias, that is, to verify that stimulus-dependent conditions cannot be decoded pre-stimulus. Bias is most expected to be revealed in the baseline interval when the data are NOT baseline-corrected, which is why I previously asked to see the results omitting baseline correction. (But also note that if the decoders are biased, baseline-correcting would not remove this bias; instead, it would spread it across the rest of the epoch, while the baseline interval would, on average, be centered at zero.)

      Please use a more standard p-value correction threshold, rather than Bonferroni-corrected p<0.001. This threshold is unusually conservative for this type of study. And yet, despite this conservativeness, stimulus-evoked information can be decoded from nearly every time bin, including at t=0. This does not encourage trust in the accuracy of these p-values. Instead, I suggest using permutation-based cluster correction, with corrected p<0.05. This is much more standard and would therefore allow for better comparison to many other studies.
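
      The following is a minimal sketch of the suggested alternative: a one-sample cluster-based permutation test on a decoding timecourse (subjects × time points, with values expressed relative to chance). It makes several assumptions. The cluster-forming threshold of t = 2.0 is illustrative, clusters are tested in the positive direction only, and the null distribution is built by randomly sign-flipping each subject's whole timecourse. In practice, an established implementation such as MNE-Python's `mne.stats.permutation_cluster_1samp_test` would be preferable.

```python
import numpy as np


def _runs(mask):
    """(start, stop) index pairs of contiguous True runs in a 1-D boolean mask."""
    runs, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs


def cluster_permutation_test(data, t_threshold=2.0, n_perm=1000, seed=0):
    """Return (start, stop, p) for each observed positive cluster.

    data: (n_subjects, n_times), e.g. decoding accuracy minus chance.
    """
    data = np.asarray(data, dtype=float)
    n_subj = data.shape[0]
    rng = np.random.default_rng(seed)

    def t_values(x):
        # One-sample t statistic against zero at every time point
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n_subj))

    def masses(t):
        # Cluster mass = sum of t values inside each supra-threshold run
        return [(a, b, t[a:b].sum()) for a, b in _runs(t > t_threshold)]

    observed = masses(t_values(data))

    # Null distribution of the maximum cluster mass under random sign flips
    null_max = np.zeros(n_perm)
    for k in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_subj, 1))
        perm = masses(t_values(data * signs))
        null_max[k] = max((m for _, _, m in perm), default=0.0)

    return [(a, b, (np.sum(null_max >= m) + 1) / (n_perm + 1)) for a, b, m in observed]
```

      Because each cluster is compared with the maximum cluster mass across permutations, the family-wise error rate is controlled across time points without a per-bin Bonferroni correction. Including the pre-stimulus interval in `data` also makes any baseline-period decoding visible.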

      I don't think these things should be done as control analyses, tucked away in the supplemental materials, but instead should be done as a part of the figures in the main text -- including decoding, RSA, cross-trial correlations, and RT correlations.

      Other issues:

      Regarding the analysis of the within-trial correlation of RSA betas, and "Cai 2019" bias:

      The correction that authors perform in the revision -- estimating the correlation within the baseline time interval and subtracting this estimate from subsequent timepoints -- assumes that the "Cai 2019" bias is stationary. This is a fairly strong assumption, however, as this bias depends not only on the design matrix, but also on the structure of the noise (see the Cai paper), which can be non-stationary. No data were provided in support of stationarity. It seems safer and potentially more realistic to assume non-stationarity.

      This analysis was included in the supplemental material. However, given that the correlation analysis presented in the Results is subject to the "Cai 2019" bias, it would seem to be more appropriate to replace that analysis, rather than supplement it.

      Regardless, this seems to be a moot issue, given that the underlying decoders seem to be overfit to temporally structured noise (see point above regarding weakening of downstream analyses based on decoder bias).

      Outliers and t-values:

      More outliers with beta coefficients could be because the original SD estimates from the t-values are influenced more by extreme values. When you use a threshold on the median absolute deviation instead of mean +/-SD, do you still get more outliers with beta coefficients vs t-values?
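
      A small example illustrates the reviewer's point. In the sketch below, a threshold of k = 3 is assumed for both rules. A single extreme value inflates the sample SD enough to escape a mean ± 3 SD rule, whereas a rule based on the scaled median absolute deviation (MAD) still flags it.

```python
import numpy as np


def mad_outliers(values, k=3.0):
    """Flag values more than k scaled MADs from the median.

    1.4826 * MAD estimates the SD for normally distributed data, so k=3
    is the robust analogue of a mean +/- 3 SD rule.
    """
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    return np.abs(x - med) > k * mad


def sd_outliers(values, k=3.0):
    """Flag values more than k sample SDs from the mean (non-robust rule)."""
    x = np.asarray(values, dtype=float)
    return np.abs(x - x.mean()) > k * x.std(ddof=1)
```

      For the values 1 to 9 plus 100, the SD rule flags nothing because 100 inflates the SD itself, while the MAD rule flags only 100.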

      Random slopes:

      Were random slopes (by subject) for all within-subject variables included in the LMMs? If not, please include them, and report this in the Methods.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study uses creative scalp EEG decoding methods to attempt to demonstrate that two forms of learned associations in a Stroop task are dissociable, despite sharing similar temporal dynamics. However, the evidence supporting the conclusions is incomplete due to concerns with the experimental design and methodology. This paper would be of interest to researchers studying cognitive control and adaptive behavior, if the concerns raised in the reviews can be addressed satisfactorily.

      We thank the editors and the reviewers for their positive assessment of our work and for providing us with an opportunity to strengthen this manuscript. Please see below our responses to each comment raised in the reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. In particular, two types of learned associations are characterized. One being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify SC and SR correlates and to determine whether they have similar topographies and dynamics.

      The results suggest SC and SR associations are simultaneously coactivated and have shared topographies, with the inference being that these associations may share a common generator.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations. Nice idea to orthogonalize the ISPC condition (MC/MI) from stimulus features.

      Thank you for acknowledging the strengths of our EEG decoding approach and design. We have addressed all of your concerns point by point below.

      Weaknesses:

      (1a) I'm relatively concerned that these results may be spurious. I hope to be proven wrong, but I would suggest taking another look at a few things.

      While a nice idea in principle, the ISPC manipulation seems to be quite confounded with the trial number. E.g., color-red is MI only during phase 2, and is MC primarily only during Phase 3 (since phase 1 is so sparsely represented). In my experience, EEG noise is highly structured across a session and easily exploited by decoders. Plus, behavior seems quite different between Phase 2 and Phase 3. So, it seems likely that the classes you are asking the decoder to separate are highly confounded with temporally structured noise.

      I suggest thinking of how to handle this concern in a rigorous way. A compelling way to address this would be to perform "cross-phase" decoding, however I am not sure if that is possible given the design.

      Thank you for raising this important issue. To test whether decoding might be confounded by temporally structured noise, we performed a control decoding analysis. As the reviewer correctly pointed out, cross-phase decoding is not possible due to the experimental design. Instead, to maximize the temporal separation between training and test data, we divided the EEG data from phase 2 and from phases 1&3 chronologically into a first and a second half. Phases 1 and 3 were combined because they share the same MC and MI assignments. We then trained the decoders on one half and tested them on the other half. Finally, we averaged the decoding results across all possible assignments of training and test data. The similar patterns observed (Supplementary Fig. 1) suggest that the decoding results are unlikely to be driven by temporally structured noise in the EEG data. This clarification has been added to page 13 of the revised manuscript.
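
      The chronological split described above can be sketched as follows, under several simplifying assumptions. A nearest-class-mean classifier stands in for the study's decoder, features are taken from a single time point, and `group` is a hypothetical label that separates phase 2 from the combined phases 1 and 3. Each group is split into chronological halves, and accuracy is averaged over every assignment of halves to the training and test sets.

```python
import itertools

import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in decoder: assign each test trial to the closest class mean."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    predicted = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predicted == y_test))


def chronological_split_accuracy(X, y, trial_index, group):
    """Average accuracy over all train/test assignments of chronological halves.

    X: (n_trials, n_features); y: class labels; trial_index: position in the
    session; group: hypothetical phase label (e.g. "phase2" vs "phase1_3").
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    trial_index, group = np.asarray(trial_index), np.asarray(group)
    halves = []
    for g in np.unique(group):
        idx = np.flatnonzero(group == g)
        idx = idx[np.argsort(trial_index[idx], kind="stable")]
        halves.append(np.array_split(idx, 2))  # [earlier half, later half]
    accuracies = []
    for choice in itertools.product((0, 1), repeat=len(halves)):
        train = np.concatenate([h[c] for h, c in zip(halves, choice)])
        test = np.concatenate([h[1 - c] for h, c in zip(halves, choice)])
        accuracies.append(nearest_centroid_accuracy(X[train], y[train], X[test], y[test]))
    return float(np.mean(accuracies))
```

      This split separates training and test trials in time within each phase group. It does not separate the classes from phase itself, however, so it tests a narrower form of temporal confounding than true cross-phase decoding would.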

      (1b) The time courses also seem concerning. What are we to make of the SR and SC timecourses, which have aggregate decoding dynamics that look to be <1Hz?

      As detailed in the response to your next comment, some new results using data without baseline correction show a narrower time window of above-chance decoding. We speculate that the remaining results of long-lasting above-chance decoding could be attributed to trials with slow responses (some responses were made near the response deadline of 1500 ms). Additionally, as shown in Figure 6a, the long-lasting above-chance decoding seems to be driven by color and congruency representations. Thus, it is also possible that the binding of color and congruency contributes to decoding. This interpretation has been added to page 17 of the revised manuscript.

      (1c) Some sanity checks would be one place to start. Time courses were baselined, but this is often not necessary with decoding; it can cause bias (10.1016/j.jneumeth.2021.109080), and can mask deeper issues. What do things look like when not baselined? Can variables be decoded when they should not be decoded? What does cross-temporal decoding look like - everything stable across all times, etc.?

      As the reviewer mentioned, baseline correction may introduce bias into decoding results. We therefore cited van Driel et al. (2021) in the revised manuscript to justify using EEG data without baseline correction in the decoding analyses (page 27 of the revised manuscript), and re-ran all decoding analyses accordingly. The new analyses yielded largely similar results (Fig. 2, 4, 6 and 8 in the revised manuscript), with the following exceptions: a narrower time window for separable SC and SR subspaces (Fig. 4b), a narrower time window for concurrent representations of SC and SR (Fig. 6a-b), and a wider time window for the correlations of SC/SR representations with RTs (Fig. 8).

      (2) The nature of the shared features between SR and SC subspaces is unclear.

      The simulation is framed in terms of the amount of overlap, revealing the number of shared dimensions between subspaces. In reality, it seems like it's closer to 'proportion of volume shared', i.e., a small number of dominant dimensions could drive a large degree of alignment between subspaces.

      What features drive the similarity? What features drive the distinctions between SR and SC? Aside from the temporal confounds I mentioned above, is it possible that some low-dimensional feature, like the EEG congruency effect (e.g., low-D ERPs associated with conflict), or RT dynamics, drives discriminability among these classes? It seems plausible to me - all one would need is non-homogeneity in the size of the congruency effect across different items (subject-level idiosyncrasies could contribute: 10.1016/j.neuroimage.2013.03.039).

      Thank you for this question. To test which dimensions are shared between the SC and SR subspaces, we first identified the factors that could be shared across them. For SC, the eight conditions are the four colors × ISPC, so the candidate shared dimensions are color and ISPC. Additionally, because the four colors and words are divided into two groups (e.g., red-blue and green-yellow, counterbalanced across subjects; see Methods), group is a third candidate shared dimension. Similarly, for the SR decoders, the candidate shared dimensions are word, ISPC and group. Note that each class in the SC and SR decoders contains both congruent and incongruent trials. Congruency is therefore not decodable by the SC/SR decoders and is unlikely to be a shared dimension in our analysis. To test the contribution of each candidate dimension, we performed RSA on the decoding results of the SC decoder trained on the SR subspace (SR | SC) (Supplementary Fig. 4a) and the SR decoder trained on the SC subspace (SC | SR) (Supplementary Fig. 4b), whose decoding accuracy reflects the representations shared by SC and SR. In the SC classes of SR | SC, the words red and blue were mixed within the same class, as were the words yellow and green. The similarity matrix for “Group” of SR | SC (Supplementary Fig. 4a) compares the two word groups (red & blue vs. yellow & green). The similarity matrix for “Group” of SC | SR (Supplementary Fig. 4b) compares the two color groups (red & blue vs. yellow & green).
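      To make the candidate shared dimensions concrete, the sketch below builds binary model RDMs (0 = same, 1 = different) over the eight SC classes (4 colors × MC/MI) for color, ISPC and group. The red-blue / green-yellow grouping is one counterbalancing assignment taken from the example in the text; the 0/1 dissimilarity coding and all names are illustrative assumptions, not the authors' code. The SR analogue would use words in place of colors.

```python
COLORS = ("red", "blue", "green", "yellow")
# One counterbalancing assignment of colors to groups (assumed for illustration).
GROUP = {"red": "red-blue", "blue": "red-blue",
         "green": "green-yellow", "yellow": "green-yellow"}
# The 8 SC classes: 4 colors x ISPC (MC / MI).
SC_CLASSES = [(color, ispc) for color in COLORS for ispc in ("MC", "MI")]


def model_rdm(classes, feature):
    """Binary model RDM: 0 if two classes share the feature value, 1 otherwise."""
    return [[0 if feature(a) == feature(b) else 1 for b in classes] for a in classes]


def lower_triangle(rdm):
    """Vectorise the off-diagonal lower triangle, a common input to RSA regression."""
    return [rdm[i][j] for i in range(len(rdm)) for j in range(i)]


MODELS = {
    "Color": model_rdm(SC_CLASSES, lambda k: k[0]),
    "ISPC": model_rdm(SC_CLASSES, lambda k: k[1]),
    "Group": model_rdm(SC_CLASSES, lambda k: GROUP[k[0]]),
}
```

      Each vectorised model RDM would then serve as one regressor when explaining the decoders' class-by-class output.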

      The RSA results revealed significant contributions of group to both the SC decoder (Supplementary Fig. 5a) and the SR decoder (Supplementary Fig. 5b). In addition, color had a significant effect on the SC decoder over a wide time window (approximately 100 - 1100 ms post-stimulus onset, Supplementary Fig. 5a), and word had a significant effect on the SR decoder over a narrower window (approximately 100 - 500 ms post-stimulus onset, Supplementary Fig. 5b). However, we found no significant effect of ISPC on either the SC or the SR decoder. We also performed the same analyses on response-locked data from -800 to 200 ms. These results showed shared representation of color in the SC decoder (Supplementary Fig. 5c) and of group in both decoders (Supplementary Fig. 5c-d). Overall, these results indicate that color, word and group information are shared between the SC and SR subspaces.

      Lastly, we would like to stress that our main hypothesis for the cross-subspace decoding analysis is that the SR and SC subspaces are not identical. This hypothesis was supported by lower decoding accuracy for cross-subspace than for within-subspace decoders, which justifies the subsequent analyses that treat SC and SR as separate representations.

      We have added this interpretation to pages 13-14 of the revised manuscript.

      (3) The time-resolved within-trial correlation of RSA betas is a cool idea, but I am concerned it is biased. Estimating correlations among different coefficients from the same GLM design matrix is, in general, biased, i.e., when the regressors are non-orthogonal. This bias comes from the expected covariance of the betas and is discussed in detail here (10.1371/journal.pcbi.1006299). In short, correlations could be inflated due to a combination of the design matrix and the structure of the noise. The most established solution, to cross-validate across different GLM estimations, is unfortunately not available here. I would suggest that the authors think of ways to handle this issue.

      Thank you for raising this important issue. Because the bias arises from the covariance between the regressors, and the same GLM was applied at all time points, we assumed that the inflation would be similar across time points. We therefore used the correlation of SC and SR betas from -200 to 0 ms relative to stimulus onset as a baseline (no SC or SR representation is expected before stimulus onset) and compared the post-stimulus correlation coefficients against this baseline. We reasoned that if the positive within-trial correlation of SC and SR betas reflected simultaneous representation rather than inflation, the post-stimulus correlation should be significantly higher than the baseline. To test this, we first performed the linear discriminant analysis (Supplementary Fig. 7a) and RSA regression (Supplementary Fig. 7b) on the -200 to 0 ms window relative to stimulus onset. We then calculated the average baseline correlation, r_baseline, of SC and SR betas in that window for each participant (group results at each time point are shown in Supplementary Fig. 7c) and computed the relative correlation at each post-stimulus time point as fisher-z(r) - fisher-z(r_baseline). Finally, we performed a t test at the group level on the baseline-corrected correlation coefficients with Bonferroni correction. The results (Fig. 6c) showed a significantly more positive correlation than baseline from 100 - 500 ms post-stimulus onset, supporting our hypothesis that the positive within-trial correlation of SC and SR betas arises from simultaneous representation rather than inflation. The related interpretation was added to page 17 of the revised manuscript.
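      The baseline correction described above can be sketched for a single participant as follows. Assumptions: the trial-wise SC and SR coefficients are stored per time point as plain lists, the baseline window bounds are inclusive, and `math.atanh` serves as the Fisher z-transform. The group-level t test and Bonferroni correction are not shown, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def baseline_corrected_z(sc, sr, times_ms, baseline_ms=(-200, 0)):
    """SC-SR correlation at each time point relative to a pre-stimulus baseline.

    sc[t] and sr[t] hold the trial-wise SC and SR RSA coefficients at time index t.
    Returns fisher_z(r_t) - fisher_z(r_baseline) for every time point, where
    r_baseline is the mean r over the (inclusive) baseline window.
    """
    r = [pearson_r(a, b) for a, b in zip(sc, sr)]
    base = [r_t for r_t, t in zip(r, times_ms)
            if baseline_ms[0] <= t <= baseline_ms[1]]
    z_baseline = math.atanh(sum(base) / len(base))  # Fisher z of the mean baseline r
    return [math.atanh(r_t) - z_baseline for r_t in r]
```

      Averaging r before the z-transform follows the order stated in the response; a correlation of exactly ±1 would make `math.atanh` raise, which real EEG data are unlikely to produce.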

      (4) Are results robust to running response-locked analyses? Especially the EEG-behavior correlation. Could this be driven by different RTs across trials & trial-types? I.e., at 400 ms poststim onset, some trials would be near or at RT/action execution, while others may not be nearly as close, and so EEG features would differ & "predict" RT.

      Thanks for this question. We now pair each stimulus-locked EEG analysis in the manuscript with a response-locked analysis. To control for RT variations among trial types, we included a separate intercept for each of the eight SC or SR trial types when using the linear mixed model (LMM) to predict RTs from trial-wise RSA results. Furthermore, at each time point, we only included trials in which the response had not yet been made (stimulus-locked analysis) or the trial had already started (response-locked analysis). All results (Fig. 3, 5, 7, 9 in the revised manuscript) support our hypothesis. We added these details to page 31 of the revised manuscript.
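      A minimal sketch of the trial-inclusion rule, assuming RTs in milliseconds and time points expressed relative to stimulus onset (stimulus-locked) or to the response (response-locked, negative before the response), and reading "already started" as the stimulus having been presented. The function name is hypothetical, and the trial-type intercepts of the LMM are not modelled here.

```python
def included_trials(rts_ms, t_ms, locking):
    """Indices of trials that contribute at time t_ms.

    locking="stimulus": t_ms is relative to stimulus onset; keep trials whose
        response has not been made yet (t_ms < RT).
    locking="response": t_ms is relative to the response (negative = before it);
        keep trials whose stimulus had already appeared (t_ms >= -RT).
    """
    if locking == "stimulus":
        return [i for i, rt in enumerate(rts_ms) if t_ms < rt]
    if locking == "response":
        return [i for i, rt in enumerate(rts_ms) if t_ms >= -rt]
    raise ValueError(f"unknown locking: {locking!r}")
```

      For example, with RTs of 400, 800 and 1200 ms, the trial with the 400 ms response drops out of both the stimulus-locked analysis at 600 ms and the response-locked analysis at -600 ms.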

      (5) I suggest providing more explanation about the logic of the subspace decoding method - what trial types exactly constitute the different classes, why we would expect this method to capture something useful regarding ISPC, & what this something might be. I felt that the first paragraph of the results breezes by a lot of important logic.

      In general, this paper does not seem to be written for readers who are unfamiliar with this particular topic area. If authors think this is undesirable, I would suggest altering the text.

      To improve clarity, we revised the first paragraph of the SC and SR association subspace analysis to list the conditions for each of the SC and SR decoders and to explain how separability can be tested by cross-decoding between the SC and SR subspaces. The revised paragraph now reads:

      “Prior to testing whether controlled and non-controlled associations were represented simultaneously, we first tested whether the two representations were separable in the EEG data.

      In other words, we reorganized the 16 experimental conditions into 8 conditions for SC (4 colors × MC/MI, while collapsing across SR levels) and SR (4 words × 2 possible responses per word, while collapsing across SC levels) associations separately. If SC and SR associations are not separable, it follows that they encode the same information, such that both SC and SR associations can be represented in the same subspace (i.e., by the same information encoded in both associations). For example, because (1) the word can be determined by the color and congruency and (2) the most-likely response can be determined by color and ISPC, the SR association (i.e., association between word and most-likely response) can in theory be represented using the same information as the SC association. On the other hand, if SC and SR associations are separable, they are expected to be represented in different subspaces (i.e., the information used to encode the two associations is different). Notably, if some, but not all, information is shared between SC and SR associations, they are still separable by the unique information encoded. In this case, the SC and SR subspaces will partially overlap but still differ in some dimensions. To summarize, whether SC and SR associations are separable is operationalized as whether the associations are represented in the same subspace of EEG data. To test this, we leveraged the subspace created by the LDA (see Methods). Briefly, to capture the subspace that best distinguishes our experimental conditions, we trained SC and SR decoders using their respective aforementioned 8 experimental conditions. We then projected the EEG data onto the decoding weights of the LDA for each of the SC and SR decoders to obtain its respective subspace. We hypothesized that if SC and SR subspaces are identical (i.e., not separable), SC/SR decoding accuracy should not differ by which subspace (SC or SR) the decoder is trained on. For example, SC decoders trained in SC subspace should show similar decoding performance as SC decoders trained in SR subspace. On the other hand, if SC and SR association representations are in different subspaces, the SC/SR subspace will not encode all information for SR/SC associations. As a result, decoding accuracy should be higher using its own subspace (e.g., decoding SC using the SC subspace) than using the other subspace (e.g., decoding SC using the SR subspace). We used cross-validation to avoid artificially higher decoding accuracy for decoders using their own subspace (see Methods).” (Page 11-12).

      We also explicitly tested what information is shared between SC and SR representations (see response to comment #2). Lastly, to help readers navigate the EEG results, we added a section “Overview of EEG analysis” that summarizes the EEG analyses and how they relate to one another:

      “EEG analysis overview. We started by validating that the 16 experimental conditions (8 unique stimuli × MC/MI) were represented in the EEG data. Evidence of representation was provided by above-chance decoding of the experimental conditions (Fig. 2-3). We then examined whether the SC and SR associations were separable (i.e., whether SC and SR associations were different representations of equivalent information). As our results supported separable representations of SC and SR association (Fig. 4-5), we further estimated the temporal dynamics of each representation within a trial using RSA. This analysis revealed that the temporal dynamics of SC and SR association representations overlapped (Fig. 6a-b, Fig. 7a-b). To explore the potential reason behind the temporal overlap of the two representations, we investigated whether SC and SR associations were represented simultaneously as part of the task representation, independently from each other, or competitively/exclusively (e.g., on some trials only SC association was represented, while on other trials only SR association was represented). This was done by assessing the correlation between the strength of SC and SR representations across trials (Fig. 6c, Fig. 7c). Lastly, we tested how SC and SR representations facilitated performance (Fig. 8-9).” (Page 8-9).

      Minor suggestions:

      (6) I'd suggest using single-trial RSA beta coefficients, not t-values, as they can be more stable (it's a t-value based on 16 observations against 9 or so regressors.... the SE can be tiny).

      Thank you for your suggestion. To choose between betas and t-values, we calculated the proportion of outliers (defined as values beyond mean ± 5 SD) for each predictor of the design matrix and each subject. Outliers were less frequent for t-values than for beta coefficients (t-values: mean = 0.07%, SD = 0.009%; betas: mean = 0.19%, SD = 0.033%). We therefore decided to keep using t-values.
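      The outlier criterion can be sketched as below, assuming the sample standard deviation (`statistics.stdev`) and a strict inequality at mean ± 5 SD; the response does not specify either detail. It would be applied separately to each predictor and subject, and the function name is hypothetical.

```python
from statistics import mean, stdev


def outlier_proportion(values, k=5.0):
    """Fraction of values lying more than k sample SDs from the mean."""
    m, s = mean(values), stdev(values)
    return sum(abs(v - m) > k * s for v in values) / len(values)
```

      With few observations a 5-SD outlier cannot occur at all, because the largest attainable |z| grows with sample size, so the comparison is most informative over many trials.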

      (7) Instead of prewhitening the RTs before the HLM with drift terms, try putting those in the HLM itself, to avoid two-stage regression bias.

      Thank you for your suggestion. Because our current LMM included each of the eight trial types in SC or SR as separate predictors with their own intercepts (as mentioned above), adding regressors of trial number and mini blocks (1-100 blocks) introduced collinearity (as ISPC flipped during the experiment). We therefore excluded these regressors from the current LMM (Page 31).

      (8) The text says classical MDS was performed on decoding *accuracy* - is this accurate?

      We now clarify in the manuscript that classical MDS was performed on the decoders’ probabilistic classification outputs rather than on decoding accuracy (Page 28).

      (9) At a few points, it was claimed that a negative correlation between SC and SR would be expected within single trials, if the two were temporally dissociable. Wouldn't it also be possible that they are not correlated/orthogonal?

      We agree with the reviewer and revised the null hypothesis in the cross-trial correlation analysis to include no correlation, as SC and SR association representations may be independent of each other (Page 17, 22).

      Reviewer #2 (Public review):

      Summary:

      In this EEG study, Huang et al. investigated the relative contribution of two accounts to the process of conflict control, namely the stimulus-control association (SC), which refers to the phenomenon that the ratio of congruent vs. incongruent trials affects the overall control demands, and the stimulus-response association (SR), stating that the frequency of stimulus-response pairings can also impact the level of control. The authors extended the Stroop task with a novel manipulation of item congruencies across blocks in order to test whether both types of information are encoded and related to behaviour. Using decoding and RSA, they showed that the SC and SR representations were concurrently present in voltage signals, and that they positively co-varied. In addition, the variability in both of their strengths was predictive of reaction time. In general, the experiment has a solid design, but there are some confounding factors in the analyses that should be addressed to provide strong support for the conclusions.

      Strengths:

      (1) The authors used an interesting task design that extended the classic Stroop paradigm and is potentially effective in teasing apart the relative contribution of the two different accounts regarding item-specific proportion congruency effect, provided that some confounds are addressed.

      (2) Linking the strength of RSA scores with behavioural measures is critical to demonstrating the functional significance of the task representations in question.

      Thank you for your positive feedback. We hope our responses below address your concerns.

      Weakness:

      (1) While the use of RSA to model the decoding strength vector is a fitting choice, looking at the RDMs in Figure 7, it seems that SC, SR, ISPC, and Identity matrices are all somewhat correlated. I wouldn't be surprised if some correlations would be quite high if they were reported. Total orthogonality is, of course, impossible depending on the hypothesis, but from experience, having highly covaried predictors in a regression can lead to unexpected results, such as artificially boosting the significance of one predictor in one direction, and the other one to the opposite direction. Perhaps some efforts to address how stable the time-resolved RSA correlations for SC and SR are with and without the other highly correlated predictors would be valuable for raising confidence in the findings.

      Thank you for this important point. The proportion of variance explained (Author response table 1 below) indicated relatively high correlations of SC/SR with the Color and Identity predictors. We agree that it is impossible to fully orthogonalize the predictors. To address collinearity, we performed a control RSA after removing predictors that were highly correlated with the others. Specifically, we calculated the variance inflation factor (VIF) for each predictor. The Identity predictor had a high VIF of 5 and was removed from the RSA; all other predictors had VIFs < 4 and were kept. The results (Supplementary Fig. 6) showed patterns similar to those obtained with the Identity predictor, suggesting that the findings are not substantially driven by collinearity. We have added this interpretation to page 17 of the revised manuscript.

      Author response table 1.

      Proportion of variability explained (r²) of RSA predictors.
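      The VIF screening mentioned in the response can be sketched in plain Python: each predictor is regressed (with an intercept) on all the others and VIF_j = 1 / (1 - R_j²). Assumptions: predictors are vectorised RDMs of equal length, and ordinary least squares is solved through the normal equations; exactly collinear predictors make those equations singular and are not handled. The helper names are hypothetical.

```python
def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = _solve(xtx, xty)
    fitted = [sum(bk * col[i] for bk, col in zip(beta, cols)) for i in range(n)]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(predictors):
    """VIF for each named predictor, regressing it on all the others."""
    out = {}
    for name, column in predictors.items():
        others = [v for k, v in predictors.items() if k != name]
        out[name] = 1.0 / (1.0 - r_squared(column, others))
    return out
```

      With only two predictors the VIF reduces to 1 / (1 - r²) for their pairwise correlation r, which gives a quick sanity check on the implementation.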

      (2) In "task overview", SR is defined as the word-response pair; however, in the Methods, lines 495-496, the definition changed to "the pairing between word and ISPC" which is in accordance with the values in the RDMs (e.g., mccbb and mcirb have similarity of 1, but they are linked to different responses, so should they not be considered different in terms of SR?). This needs clarification as they have very different implications for the task design and interpretation of results, e.g., how correlated the SC and SR manipulations were.

      Thank you for pointing out this important issue with how our operationalization captures the concept in question. In the revised manuscript, we clarified that the stimulus-response (SR) association is the link between the word and the most-likely response (i.e., not necessarily the actual response on the current trial). This association is likely to be encoded through statistical learning over several trials. On each trial, the association is updated based on the stimulus and the actual response; over multiple trials, the accumulated association is driven towards the most common (i.e., most likely) response. In our ISPC manipulation, a color is presented mostly in congruent or mostly in incongruent (MC/MI) trials, which also pairs each word with a most-likely response. For example, if the color blue is MC, the color blue, which leads to the response blue, will co-occur with the word blue with high frequency. In other words, the SR association here is between the word blue and the response blue. Because the actual response is not part of the SR association, two trial types with different responses may share the same SR association in the RDM, as long as they share the same word and the same ISPC manipulation, which, by the logic above, produce the same most-likely response. These clarifications have been added to pages 4 and 29 of the revised manuscript.
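      The operationalization above can be expressed as two keys over trial types. In this sketch a trial type is encoded as a hypothetical (ISPC, color, word) tuple: two trial types share an SR association when word and ISPC match, regardless of the actual response, whereas they share an SC association when color and ISPC match. The example pair mirrors the reviewer's mccbb/mcirb case, on the assumption that those labels denote (MC, congruent, color blue, word blue) and (MC, incongruent, color red, word blue).

```python
def sr_key(trial):
    """SR association: word plus ISPC; the actual response is deliberately ignored."""
    ispc, _color, word = trial
    return (word, ispc)


def sc_key(trial):
    """SC association: color plus ISPC."""
    ispc, color, _word = trial
    return (color, ispc)


def model_rdm(trials, key):
    """Binary model RDM: 0 if two trial types share the association, 1 otherwise."""
    return [[0 if key(a) == key(b) else 1 for b in trials] for a in trials]


# Hypothetical encoding (ISPC, color, word) of the reviewer's example pair.
MC_BLUE_BLUE = ("MC", "blue", "blue")  # congruent: response "blue"
MC_RED_BLUE = ("MC", "red", "blue")    # incongruent: response "red"

TRIALS = [MC_BLUE_BLUE, MC_RED_BLUE, ("MI", "red", "blue")]
SR_RDM = model_rdm(TRIALS, sr_key)
SC_RDM = model_rdm(TRIALS, sc_key)
```

      The first two trial types share an SR association (dissimilarity 0) despite requiring different responses, while their SC associations differ.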

      In the revised manuscript (Page 17), we addressed how much the correlated SC and SR predictors in the RDM could affect the correlation between SC and SR association representation strengths. Specifically, we conducted the RSA using the same GLM on EEG data prior to stimulus onset (Supplementary Fig. 7a-b). As no SC or SR associations are expected before stimulus onset, the correlation between SC and SR representations in this window serves as a baseline for the inflation due to correlated predictors in the GLM (Supplementary Fig. 7c; see also comment #3 of R1). The SC-SR correlation coefficients following stimulus onset were then compared to this baseline to control for potential inflation (Fig. 6c). Significantly above-baseline correlation was still observed between ~100-500 ms post-stimulus onset, supporting the hypothesis that SC and SR are encoded in the same task representation.

      Minor suggestions:

      (3) Overall, I find calling the SC and SR representations controlled and uncontrolled, respectively, unwarranted. How is the level of controlledness defined? Both are essentially types of statistical expectation that provide contextual information for the block of tasks. Is one really more automatic, requiring less conscious processing than the other? More background/justification could be provided if the authors would like to use these terms.

      Following your advice, we have added more discussion on how controlledness is conceptualized in this work and in the literature, which reads:

      “We consider SC and SR as controlled and uncontrolled, respectively, based on the literature investigating the mechanism of the ISPC effect. The SC account posits that the ISPC effect results from conflict and involves conflict adaptation, which requires the regulation of attention or control (Bugg & Hutchison, 2013; Bugg et al., 2011; Schmidt, 2018; Schmidt & Besner, 2008). On the other hand, the SR account argues that the ISPC effect does not require conflict adaptation but instead reflects contingency learning. That is, the response can be directly retrieved from the association between the stimulus and the most-likely response without top-down regulation of attention or control. As more empirical evidence emerged, researchers advocating the control view began to acknowledge the role of associative learning in cognitive control regarding the ISPC effect (Abrahamse et al., 2016). SC association has been thought to include both automatic processes that are fast and resource-saving and controlled processes that are flexible and generalizable (Chiu, 2019). Overall, we do not intend to claim that SC is entirely controlled or that SR is completely automatic. We use the terms SC-controlled and SR-uncontrolled representations to align with the original theoretical motivation and to highlight the conceptual difference between SC and SR associations.” (Page 24-25)

      (4) Figures 3c and d: the figures could benefit from more explanation of what they try to show to the readers. Also for 3d, the dimensions were aligned with color sets and congruencies, but word identities were not linearly separable, at least for the first 3 axes. Shouldn't one expect that words can be decoded in the SR subspace if word-response pairs were decodable (e.g., Figure 3b)?

      Thank you for the insightful observation. We now clarify that Fig. 3c and d in the original manuscript (Fig. 4c and d in the current manuscript) aim to show how each of the 8 trial types is represented in the SC and SR subspaces. The MDS approach we used for visualization tries to preserve the dissimilarity between trial types when projecting data from a high-dimensional to a low-dimensional space. However, such a projection can make patterns that are linearly separable in the high-dimensional space not linearly separable in the low-dimensional space. For example, if the word blue has two points (-1, -1) and (1, 1) and the word red has two points (-1, 1) and (1, -1), they are not linearly separable in the 2D space. Yet, if they are projected from a 3D space with coordinates (-1, -1, -0.1), (1, 1, -0.1), (-1, 1, 0.1) and (1, -1, 0.1), the two words are linearly separable using the 3rd dimension. Thus, a better way to test whether words can be linearly separated in the SR subspace is to perform RSA in the original high-dimensional space. We performed the RSA with word (Supplementary Fig. 2) on the SR decoder trained on the SR subspace. Note that in Fig. 3c and d of the original manuscript (Fig. 4c and d in the current manuscript) there are two pairs of words that are not linearly separable: red-blue and yellow-green. We therefore specifically tested the separability within these two pairs using one predictor for each pair, as shown in Supplementary Fig. 2. The results showed that, within both word pairs, individual words were represented above chance level (Supplementary Fig. 3). Because the decoders are linear, this finding indicates that the word pairs are linearly separable in the original SR subspace. This clarification has been added to page 13 (the end of the second paragraph) of the revised manuscript.
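      The toy example in the response can be checked directly. The sketch below confirms that the third coordinate separates the two words, and that a brute-force search over integer-coefficient lines finds no separator once the third coordinate is dropped. The grid search is only an illustration; the XOR layout is in fact not separable by any line. All names are hypothetical.

```python
def separates(w, b, positive, negative):
    """True if the hyperplane w.x + b = 0 puts every positive point on the
    positive side and every negative point on the negative side."""
    def score(p):
        return sum(wi * pi for wi, pi in zip(w, p)) + b
    return all(score(p) > 0 for p in positive) and all(score(p) < 0 for p in negative)


# Toy points from the response: the word blue vs. the word red.
BLUE_3D = [(-1, -1, -0.1), (1, 1, -0.1)]
RED_3D = [(-1, 1, 0.1), (1, -1, 0.1)]


def drop_third_dimension(points):
    """Project onto the first two axes, as a low-dimensional plot would."""
    return [p[:2] for p in points]


def grid_has_2d_separator(positive, negative, values=range(-5, 6)):
    """Brute-force check over integer lines w1*x + w2*y + b = 0 (illustrative only)."""
    return any(separates((w1, w2), b, positive, negative)
               for w1 in values for w2 in values for b in values)
```

      The hyperplane with weights (0, 0, 1) and zero offset splits red from blue in 3D, whereas no line in the search grid does so after projection.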

      References

      Abrahamse, E., Braem, S., Notebaert, W., & Verguts, T. (2016). Grounding cognitive control in associative learning. Psychological Bulletin, 142(7), 693-728. doi:10.1037/bul0000047.

      Bugg, J. M., & Hutchison, K. A. (2013). Converging evidence for control of color-word Stroop interference at the item level. Journal of Experimental Psychology: Human Perception and Performance, 39(2), 433-449. doi:10.1037/a0029145.

      Bugg, J. M., Jacoby, L. L., & Chanani, S. (2011). Why it is too early to lose control in accounts of item-specific proportion congruency effects. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 844-859. doi:10.1037/a0019957.

      Chiu, Y.-C. (2019). Automating adaptive control with item-specific learning. In Psychology of Learning and Motivation (Vol. 71, pp. 1-37).

      Schmidt, J. R. (2018). Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin & Review, 26(3), 753-771. doi:10.3758/s13423-018-1520-z.

      Schmidt, J. R., & Besner, D. (2008). The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(3), 514-523. doi:10.1037/0278-7393.34.3.514.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      General Response to Review

      We would like to thank all three reviewers for their encouraging comments on our manuscript. We now submit our revised study after considerable efforts to address each of the reviewers' concerns. I will first describe a major change made in the revision that addresses a concern common to all three reviewers, followed by a point-by-point response to individual comments.

      Replacing LRRK2ARM data with a LRRK2-specific type II kinase inhibitor: The most critical issue for all 3 reviewers was the use of our new CRISPR-generated truncation mutant of LRRK2 that we called LRRK2ARM. We had not provided direct evidence of the protein product of this truncation, which was a significant limitation. To address this, we performed proteomics analysis of all clones and, to our surprise, identified 7 peptides that were C-terminal to the "predicted" stop codon we had engineered into the CRISPR design. Repeating the deep sequencing analysis in both directions then more clearly revealed site-specific mutations leading to 4 amino acid changes at the junction of exon 19, without introducing a stop codon. Given that we could not detect the protein by western blot (even though proteomics now indicated that the region of LRRK2 recognized by our antibodies was present), we decided to remove this clone from the manuscript. In the meantime, we had compared the ineffectiveness of MLi-2 in blocking Rab8 phosphorylation during iron overload in the LRRK2G2019S cells with the effect of a type II kinase inhibitor called rebastinib. The data showed very clearly that treatment with rebastinib reversed the iron-induced phospho-Rab8 at the plasma membrane (and by western blot, in new Fig 3). Since this inhibitor is very broad-spectrum, inhibiting ~30% of the kinome, we reached out to Sam Reck-Peterson and Andres Leschziner, experts in LRRK2 structure/function, who recently developed much more selective LRRK2-specific type II kinase inhibitors called RN341 and RN277 (developed with Stefan Knapp, PMID: 40465731). These compounds couple the MLi-2 compound through an indole ring to a rebastinib type II compound, conferring LRRK2 binding specificity on the efficient DYG "out" type II inhibitor. As with rebastinib, the new LRRK2-specific kinase inhibitors also effectively reversed the cell surface p-Rab8 seen in iron-loaded LRRK2G2019S cells.
      These new data provide the first biological paradigm in which the kinase activity of LRRK2 is resistant to the type I inhibitor MLi-2, yet remains highly sensitive to type II inhibitors. While the loss of our LRRK2ARM clone marks a significant change in the manuscript, we believe the main message is stronger with the addition of the new LRRK2-specific type II kinase inhibitors. Our data show that it is indeed the active kinase function of LRRK2G2019S that affects the iron phenotypes we observe, but highlight a conformational specificity upon iron overload such that MLi-2 is ineffective. The overall phenotypes we observe in LRRK2G2019S macrophages remain unchanged and are now expanded within the manuscript. We hope the reviewers will agree that our work provides important new insights into LRRK2 function in iron homeostasis while opening new avenues for future research.

      Given this new information we have changed the title from "LRRK2G2019S acts as a dominant interfering mutant in the context of iron overload" to the more accurate "LRRK2G2019S interferes with NCOA4 trafficking in response to iron overload leading to oxidative stress and ferroptotic cell death."

      Response to Reviewer 1

      Reviewer 1 (R1): There are two major concerns with the data in their present form. In brief, first, the G2019S cells express much less LRRK2 and more Rab8 than the WT cells, and this severely affects interpretability.

      Heidi McBride (HM): We agree that the LRRK2G2019S lines express lower levels of LRRK2 than wild type, which is a previously documented phenomenon, presumably because the cell attempts to downregulate the increased kinase activity by reducing protein expression. However, the levels of Rab8 across tens of experiments do not consistently show any differences between the wild type, G2019S and KO cells. We have provided more comprehensive quantifications of the blots in the revised version, and the Rab8 levels are consistent across all the blots presented in the manuscript (Figure 1A and 1B).

      R1: Second, the investigators used CRISPR to truncate the endogenous LRRK2 locus to produce a hypothetical truncated LRRK2-ARM polypeptide. This appears to have robust effects on NCOA4, in particular, which drives the overall interpretation of the data. However, the expression of this novel LRRK2 species is neither confirmed nor compared to WT or G2019S in these cells (although admittedly the investigators did seek to address this with subsequent KO in the ARM cells). It would be premature to account for the changes reported without evidence of protein expression. This latter issue may be more easily addressed and could provide very strong support for a novel function/finding; see more detailed comments below, most of which seek clarifications beyond the above.

      HM: As described in my common response above, we have removed the LRRK2ARM data from the manuscript.

      R1: The Results need to make clear whether the G2019S CRISPR mutant is heterozygous or homozygous (presumably homozygous, as for ARM).

      HM: The RAW cell line we generated is homozygous for the G2019S and the KO alleles. We added this to the beginning of the results section and methods.

      R1: The text of the results implies that MLi2 was used in both WT and G2019S Raw cells, but it's only shown for G2019S. Given the premise for the use of RAW cells, it's important to show that there is basal LRRK2 kinase activity in WT cells to go along with its high protein expression. This is particularly important as the G2019S blot suggests minor LRRK2-independent phosphorylation of Rab8a (and other detected pRabs). One would imagine that pRab8 levels in both WT and G2019S would reduce to the same base line or ratio of total Rab in the presence of MLi2, but WT untreated is similar to G2019S with MLi2. This suggests no basal LRRK2 activity in the Raw cells, but I don't think that is the case.

      HM: We have included the data from MLi-2 treatment of wild type cells in Fig 3C, quantified in Fig 3D. Again, the baseline levels of Rab8 are unchanged across the genotypes. However, the reviewer is correct that there is some baseline LRRK2 kinase activity in wild type cells that is sensitive to MLi-2. This is seen most clearly for the autophosphorylation of LRRK2 at S1292 in Fig 3C. The pRab8 blot is not as clear in wild type cells. It is likely that LRRK2 must be actively recruited to membranes (as others have shown with LLOMe, etc.) for pRabs to be easily visualized in wild type cells. Nevertheless, we do clearly see autophosphorylation activity in wild type cells. Therefore, while we understand the reviewer's point that there should be some Rab8 phosphorylation in wild type cells, we do not see a significant, or very convincing, amount of it in our RAW macrophages.

      R1: Also, in terms of these cells, the levels of LRRK2 are surprisingly unmatched (Fig 1A, 1D, 1H, S1D, etc.) as are total levels of Rab8 (but in opposite directions) between the WT and G2019S. This is not mentioned in the Results text and is clearly reproducible and significant. Why do the investigators think this is? If Rab8 plays a role in iron, how do these differences affect the interpretation of the G2019S cells (especially given that MLi2 does not rescue)? Are other LRRK2-related Rabs affected at the protein (not phosphorylation level)? Could reduced levels of LRRK2 or increase Rab 8 alone or together account for some of these differences? Substantial further characterization is required as this seriously affects the interpretability of the data. Since pRab8 is not normalized to total Rab8, this G2019S model may not reflect a total increase in LRRK2 kinase activity, and could in fact have both less LRRK2 protein and less cellular kinase activity than WT (in this case).

      HM: In our hands, the RAW cells with homozygous LRRK2G2019S mutations clearly show that the total protein levels of LRRK2 are reduced compared to wild type, which is likely a compensatory effect to reduce overall cellular kinase activity. We understand that some of our previous blots were not so clear on the total Rab8 levels across the different experiments. We have repeated many of these experiments and hope the reviewer can see in Figs 1A, 3C, 3E, 3J, and Supp Fig 3A that the total Rab8 levels are stable across the conditions. We also present quantifications from 3 independent experiments normalizing pRab8 to total Rab8 in all three genotypes in untreated and iron-loaded conditions (Supp Fig 3A and B), and upon MLi-2 treatment (Fig 3C). The data in Fig 3C and D show the effectiveness of MLi-2 in reducing pRab8 in control conditions, but the resistance to MLi-2 in FAS-treated cells.

      R1: Presumably, the blots in 1H are whole cell lysates and account for the pooled soluble and insoluble NCOA4 (increased in G2019S), as there is no difference in soluble NCOA4 (Fig 2H). I suspect the prior difference is nicely reflected in the insoluble fraction (Fig 2H). This should be better explained in the Results text. This is a very interesting finding and I wonder what the investigators believe is driving this phenotype? Is the NCOA4 partitioning into a detergent-inaccessible compartment? Does this replicate with other detergents, those perhaps better at solubilizing lipid rafts? Is this a phenotype reversible with MLi2? Very interesting data.

      HM: We apologize for not being clearer in the text describing the behavior of NCOA4. The reviewer is correct that the major change in G2019S is the increased Triton X-100-insoluble NCOA4. Previous work has established that NCOA4 segregates into detergent-insoluble foci upon iron overload as a way to release it from ferritin cages, and this fraction is then internalized into lysosomes through a microautophagy pathway (see Mizushima's work PMID: 36066504). In Fig 1I we show that the elevation in NCOA4 and ferritin heavy chain seen in untreated G2019S cells can be cleared upon iron chelation with DFO, indicating that the canonical NCOA4-mediated ferritinophagy (macroautophagy) pathway remains intact to recycle iron under conditions of iron starvation. However, in Figure 2 we show that under conditions of iron overload, when NCOA4 segregates from ferritin (to allow cytosolic storage of iron), this form of NCOA4 cannot be degraded within the lysosome through the microautophagy pathway and begins to accumulate. We see this with our live and fixed imaging compared to wild type cells (Fig 2A,D), and by the lack of clearance seen by western blot (Fig 2E). As for the impact of MLi-2, we observe some reversal of NCOA4 accumulation in untreated cells at 4 and 8 hrs after MLi-2 treatment (Supp Fig 2F). However, in iron-loaded conditions the high NCOA4 levels in G2019S cells are MLi-2 insensitive, while the elevated NCOA4 in wild type cells is reduced upon MLi-2 addition (Fig. 2F; compare lanes 3 vs 4 in wild type with lanes 7 vs 8 in G2019S). This is consistent with a block in the microautophagy-mediated degradation of phase-separated NCOA4 in G2019S cells.

      R1: Figure 2 describes the increased NCOA4-positive iron structures after iron load, but does not emphasize that the G2019S cells begin preloaded with more NCOA4. How do the investigators account for differential NCOA4 in this interpretation? Is this simply a reflection of more NCOA4 available in G2019S cells? This seems reasonable.

      HM: The reviewer is correct. We showed that there is some turnover of NCOA4 in untreated conditions through canonical ferritinophagy, but in iron overload this appears to be blocked: NCOA4 segregates from ferritin and remains within insoluble, phase-separated structures that cannot be degraded through microautophagy. We have rewritten the text to be clearer on these points.

      R1: These are very long exposures to iron, some as high as 48 hr which will then take into account novel transcriptomic and protein changes. Did the investigators evaluate cell death? Iron uptake would be trackable much quicker.

      HM: We agree that many things will change after our FAS treatments, and we now provide a full proteomics dataset on wild type and G2019S cells with and without iron overload, presented in Figure 4A-B. Indeed, Figure 4 is entirely new to this revised submission. The proteomics highlighted a series of cellular changes that reflect major cell stress responses, including the upregulation of HMOX1 (validated by western blot in Supp Fig 4A), an NRF2 transcriptional target, consistent with our observation that NRF2 is stabilized and translocated to the nucleus in G2019S iron-loaded cells (Supp Fig 4B,C). Among several interesting changes, we highlighted three major nodes: iron response proteins; lysosomal proteins, particularly a loss of catalytic enzymes such as lysozymes and granzymes, consistent with the loss of hydrolytic capacity we show in Fig. 4C,D; and cytoskeletal proteins, changes that we suspect are consistent with the pRab8-decorated "blebbing" of the plasma membrane we see in Fig 3. To test for lipid oxidation, likely resulting from the elevation in Fe2+ and the oxidation signatures, we employed the C11-BODIPY probe and observed a strong signal specific to the G2019S iron-loaded cells, particularly labelling endocytic compartments and the cell surface (Fig. 4E-G).

      Lastly, an analysis of SYTOX green uptake experiments was done to monitor the uptake of the dye into cells that have died of cell membrane rupture, commonly used to examine ferroptotic cell death. We now show the G2019S cells are very susceptible to this form of death (Fig 4H,I). These data add new functional evidence for the consequence of the G2019S mutation in an increased susceptibility to iron stress.

      R1: The legend for 2F is awkward (BSADQRED)

      HM: We have changed this to BSA-DQRed, which is a widely used probe to monitor the hydrolytic capacity of the lysosome.

      R1: Why are WT cells not included in Fig 2G?

      HM: We have now included new panels in Fig 3C,D showing wild type and G2019S +/- FAS and +/- MLi-2, with quantifications of pRab8/Rab8.

      R1: The biochemical characterization of NCOA4 in the LRRK2-arm cells is a great experiment and strength of the paper. The field would benefit by a bit further interrogation, other detergents, etc.

      HM: We have removed all of the LRRK2ARM data given our confusion over the impact of the 4 amino acid changes in exon 19 and our inability to monitor this protein by western blot. The concept that NCOA4 enters into TX100 insoluble, phase separated compartments has been well established, so we didn't explore other detergents at this point.

      R1: Have the investigators looked for aberrant Rab trafficking to lysosomes in the LRRK2-arm cells? Is pRab8 mislocalized compared to WT? Other pRabs?

      HM: We did initially show that pRab8 was also at the plasma membrane in the LRRK2ARM cells, and we still focus on this finding for the G2019S, seen in Fig 3A,B,F,H. We did try to look at other p-Rabs known to be targets of LRRK2 but none of them worked in immunofluorescence so we couldn't easily monitor specific traffic and/or localization changes for them.

      R1: The expression level, and therefore stability, of the ARM fragment is not shown. This is necessary for interpretation. While very intriguing, the data in Aim 3 rely on the assumption that the ARM fragment is expressed, and at levels comparable to G2019S, to account for the phenotypes. The generation of a second clone is admirable, but the expression of the protein must be characterized. This is especially true because of the different LRRK2 levels between WT and G2019S. One could easily conceive of exogenous expression of a tagged ARM fragment in LRRK2 KO cells, for example, as another proof-of-concept experiment. If it is truly dominant, does this effect require or benefit from some FL LRRK2? It seems easy enough to express the LRRK2-ARM in at least WT and KO RAW cells.

      HM: We agree and our attempts to understand this clone resulted in its removal from the manuscript. We did also express cDNA encoding our ARM domain (up to exon 19), but it didn't phenocopy the CRISPR clone, which of course made sense once we had better proteomics and repeated our deep sequencing.

      In our further efforts to understand why our phenotype was MLi-2 resistant upon iron overload, we expanded our work to examine the impact of pan-specific type II kinase inhibitors, and then reached out to the Reck-Peterson and Leschziner labs to obtain a newly developed LRRK2-selective type II kinase inhibitor. All of these very efficiently reversed the pRab8 signals seen at the plasma membrane of G2019S cells upon iron overload (Fig 3E-K). Therefore, G2019S is not dominant negative, as we had initially supposed; rather, there is a specific conformation of LRRK2 in high iron that potentially opens the ATP-binding pocket to bind the type II inhibitors, but not MLi-2. We do not understand exactly what this conformation is, but it likely involves new protein interactions specific to high iron, or perhaps LRRK2 somehow binds iron directly as a sensor, ultimately leading to the differential sensitivity we observe between type I and type II kinase inhibitors. Our data indicate that MLi-2 treatment in the clinic will not be protective against iron toxicity phenotypes that may contribute to PD, whereas these newer selective type II LRRK2 kinase inhibitors would be effective in this conformation-specific context of iron toxicity.

      R1: Does iron overload induce Rab8a phosphorylation in a LRRK2 KO cell? This would be a solid extension on the ARM data and support the important finding that an additional kinase(s) can phosphorylate Rab8a under these conditions, and while not unexpected, this may not have been demonstrated by others as clearly. It also addresses whether the ARM domain is important to this other putative kinase(s), which may add value to the authors' model.

      HM: Iron overload does not induce pRab8 in LRRK2 KO cells, as seen by immunofluorescence in Fig 3A,B, and by western blot in Supp Fig 3A,B. With our new type II kinase inhibitor data, we can confirm that the plasma membrane-localized Rab8 is indeed phosphorylated by LRRK2.

      R1: Minor concern: the abstract, but not the introduction, emphasizes a hypothesis that loss of neuromelanin may promote cell loss in PD (through loss of iron chelation). While post mortem studies are by definition only correlative, early works suggested that more highly melanized DA neurons were preferentially lost compared with poorly melanized neurons in PD. This speculation in the abstract is not necessary to the novel findings of the paper.

      HM: We appreciate that the links to iron in PD are correlative, we have maintained some of our discussion on this point within the manuscript given the lack of attention the field has paid to the cell biology of iron homeostasis in PD models. If there is a cell autonomous nature to the loss of DA neurons in PD, iron is very likely to be a part of this specificity in our opinion. Most of the newer MRI studies looking at iron levels in patient brains are showing higher free iron and working on this as potential biomarkers of disease. The precise timing of this relative to the stability/loss of neuromelanin is, I agree, not really clear.

      R1: (Significance (Required)): This study could shed light on both a novel and unexpected behavior of the LRRK2 protein, and open new insights into how pathogenic mutations may affect the cell. While it was studied in one cell line known for unusually high LRRK2 expression levels, data in this cell type have been broadly applicable elsewhere. Given the link to Parkinson's disease, Rab-dependent trafficking, and iron homeostasis, the findings could have import and relevance to a rather broad audience.

      HM: We are so very appreciative that reviewer 1 feels our work will be of interest to the PD and cell biology communities.

      Response to Reviewer 2

      Reviewer 2 (R2): Major: Please confirm that the observed phenotype is conserved within bone marrow-derived macrophages of LRRK2 G2019S mice. These mice are widely available within the community and frozen bone marrow could be sent to the labs. The main reason for this experiment is that CRISPR macrophage cell lines do sometimes acquire weird phenotypes (at least in our lab they sometimes do!) and it would strengthen the validity of the observations.

      HM: We did a series of experiments on primary BMDMs derived from 3 pairs of wild type, LRRK2G2019S and LRRK2KO mice. We examined levels of ferritin heavy and light chains at steady state and with FAS treatment. Unfortunately, the data did not phenocopy the RAW macrophage lines we present here, since FTL and FTH were mostly unchanged. We did observe an increase in NCOA4 levels, consistent with potential issues with microautophagy as observed in our RAW system.

      While we understand the danger that our phenotypes are nonspecific and linked to a CRISPR-based anomaly, there are a number of arguments we would make that these data and pathways are potentially very important to our understanding of LRRK2 mutant phenotypes and pathology. The first point is that we now include a LRRK2-specific type II kinase inhibitor that reverses the iron-overload pRab8 accumulation at the plasma membrane in LRRK2G2019S cells, showing that this is at least directly linked to LRRK2 kinase activity, even though it is resistant to MLi2.

      Second, Suzanne Pfeffer recently published single-cell RNAseq datasets from brains of untreated LRRK2G2019S mice (PMID: 39088390). She reported major changes in ferritin heavy chain (it is lost) in very specific brain cell types (astrocytes, microglia and oligodendrocytes), with no changes in other cell types at all (her Fig 6). This is consistent with a very context-specific impact of LRRK2 on iron homeostasis that we do not yet understand.

      Third, the Cookson, Mamais and LaVoie labs have been working on the impact of LRRK2 mutations on iron handling in a few different model systems, including iPSCs, and see changes in transferrin recycling and iron accumulation. Those studies did not go into much detail on ferritin, NCOA4 and other readouts of iron homeostasis, but are roughly in agreement with our work here. In the most recent bioRxiv study, submitted after we sent this work for review, they concluded that their phenotypes were reversed by MLi-2 treatment; however, they required 7 days of treatment for a ~20% restoration in iron levels. Given our work, it would seem that the impact of LRRK2G2019S in high-iron conditions is also very resistant to MLi-2 treatment. In all these studies, we do not yet know for sure whether iron overload in the brain may be a precursor to DA neuron cell death, which could be exacerbated in G2019S carriers. But we hope the reviewer will agree that our approach and findings will be useful for the field to expand on these concepts within different models of PD.

      R2: Minor comments: Supplementary Fig 1: I don't think one should normalize all controls to 1 and then do a statistical test as obviously the standard deviation of control is 0.

      HM: We agree with the reviewer that statistical testing is not appropriate when the WT control is fixed to a value of 1, as this necessarily eliminates variance in that group; accordingly, we have removed both statistical comparisons and standard deviation from the WT control while retaining variability measures for all experimental conditions. Raw densitometry values could not be pooled across independent experiments due to substantial inter-blot variability, and therefore normalization to the WT control was used solely to allow relative comparison within experiments, acknowledging the inherent quantitative limitations of Western blot densitometry. Ultimately the magnitude of the changes relative to the control lanes in each biological replicate was consistent across experiments, even if the absolute density of the bands between experiments was not always the same.
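      To make the normalization described above concrete, here is a minimal Python sketch; it is an illustration, not the authors' analysis pipeline. It assumes hypothetical per-blot densitometry pairs (phospho signal, total signal) for each genotype, computes the phospho/total ratio (for example pRab8/Rab8) within each blot, divides it by the WT ratio from the same blot, and then summarizes across independent blots. Because WT is fixed to 1 in every replicate, it is reported without a standard deviation, while variability is kept for the other conditions. The function name `normalize_to_wt`, the input layout and all numbers are illustrative assumptions.

```python
from statistics import mean, stdev


def normalize_to_wt(replicates, control="WT"):
    """Normalize phospho/total ratios to the control within each blot, then summarize.

    replicates: list of dicts, one per independent blot (hypothetical layout),
    mapping condition -> (phospho_density, total_density).
    Returns condition -> (mean, sd). The sd is None for the control, whose
    value is fixed to 1 by construction and therefore has no variance.
    """
    normalized = {}
    for blot in replicates:
        ctrl_phospho, ctrl_total = blot[control]
        ctrl_ratio = ctrl_phospho / ctrl_total
        for condition, (phospho, total) in blot.items():
            ratio = phospho / total  # e.g. pRab8 / total Rab8 from the same lane
            normalized.setdefault(condition, []).append(ratio / ctrl_ratio)

    summary = {}
    for condition, values in normalized.items():
        if condition == control:
            summary[condition] = (1.0, None)  # no SD and no test for the fixed control
        else:
            summary[condition] = (mean(values), stdev(values))
    return summary


# Hypothetical densitometry values from three blots with different overall intensities.
blots = [
    {"WT": (100, 200), "G2019S": (300, 300), "KO": (20, 200)},
    {"WT": (40, 100), "G2019S": (80, 100), "KO": (8, 100)},
    {"WT": (30, 50), "G2019S": (120, 100), "KO": (6, 50)},
]
summary = normalize_to_wt(blots)
```

      In this made-up example the absolute band intensities differ between blots, but the G2019S/WT ratio is the same in each one, so within-blot normalization recovers a consistent fold change; this is the reason given above for normalizing within experiments rather than pooling raw densitometry values.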

      R2: The raw data needs to be submitted to PRIDE or similar.

      HM: All of our data are being uploaded to the GEO databases, protocols to protocols.io, and raw data deposited on Zenodo, in compliance with the requirements of our ASAP funding and of the journal.

      R2: Some of the western blots could be improved. If these are the best shown, I am a little concerned about the reproducibility. How many times have they been done?

      HM: We now ensure there is quantification of all the blots for at least 3 independent experiments and have worked to improve the quality of them throughout the revision period.

      R2: (Significance (Required)): Considering the importance of LRRK2 biology in Parkinson's and the new biology shown, this paper will be of great interest to the community and wider research fields.

      HM: We are so very grateful that the reviewer appreciates that the LRRK2 and PD community will find our work of interest. We hope our revisions will prove satisfactory even in the absence of ferritin changes in primary G2019S BMDM.

      Response to Reviewer 3

      Reviewer 3 (R3): What is missing in the study is the physiological relevance of these findings, mainly whether this effect actually results in higher cell death during iron overload. Since iron overload is known to result in ferroptosis, it is surprising that the authors have not checked whether the LRRK2 G2019S and ARM cells undergo more ferroptosis relative to LRRK2 WT cells.

      HM: We thank the reviewer for pushing us to monitor the functional implications of the iron mishandling upon iron overload in the G2019S RAW cell system. We now add a completely new Figure 4 to address these functional points. We employed two tools to look at established aspects of ferroptosis. First, the C11-BODIPY probe labels oxidized lipids, and we see significant signals specific to the G2019S iron-loaded cells, where it labels endocytic membranes and the cell surface (Fig 4 E-G). This is consistent with the elevation of free Fe2+. Second, we used the SYTOX green death assay, in which the dye is internalized into cells when the cell surface is ruptured, and show that G2019S cells die upon iron overload, but not the LRRK2KO or wild type cells (Fig 4 H,I). Lastly, we performed full proteomics analysis of the wild type and G2019S RAW cells in iron overload conditions. These data provide a better view of the full stress response initiated in the G2019S cells, including the upregulation of HMOX1 (an NRF2 target gene), changes in lysosomal hydrolytic enzymes consistent with the reduction in BSA-DQRed signals, and changes in cytoskeletal proteins consistent with the plasma membrane blebbing phenotypes we see in G2019S (Fig. 4A-D and Supp. Fig 4). We hope these new data help to place the phenotype in a more physiological context.

      R3: Moreover, their conclusion of the findings as "resistant to LRRK2 kinase inhibitors" is not convincing, since in most of the studies, they have removed the kinase domain, and this description implies the use of pharmacological kinase inhibition which has not been done in this paper.

      HM: We took this comment to heart and, as explained in the general response, removed the LRRK2ARM clones from the study. To understand the kinase function in iron overload conditions, we first explored the pan-specific type II kinase inhibitor rebastinib, which has been shown to inhibit LRRK2. In contrast to MLi-2, this drug effectively blocked p-Rab8 in G2019S cells exposed to high iron. However, since it is not specific and likely inhibits about 30-40% of all kinases, we reached out to the Reck-Peterson and Leschziner labs, who have developed a LRRK2-specific type II kinase inhibitor (published in June 2025, PMID: 40465731). They provided it to us (along with a great deal of discussion), and both drugs blocked the effect of LRRK2G2019S on p-Rab8 at the plasma membrane. These data show that the phenotypes we observe are indeed linked to the increased kinase activity of LRRK2, even though they are fully resistant to MLi-2. This suggests that high iron results in an alteration in LRRK2 conformation that prevents MLi-2 from blocking kinase activity, while still allowing the type II kinase inhibitors, which bind deeper in the ATP-binding pocket, to functionally block activity. We believe that these new data remove a great deal of the confusion in our initial submission regarding the MLi-2 resistance.

      R3: There is lower LRRK2 expression in LRRK2 G2019S cells, have the authors checked Rab phosphorylation to validate the mutation?

      HM: We agree that the G2019S mutation leads to a reduction in total LRRK2 levels in the cell, which is likely a compensatory effect to lower kinase activity. We do show that the G2019S mutation clearly increases phosphorylation of both Rab8 and the LRRK2 autophosphorylation site S1292, as seen in Fig 1A and quantified in Fig 1B. In untreated conditions, these phosphorylation events are reversible upon treatment with MLi-2. We also provide the sequencing data in the supplement to confirm the presence of the G2019S mutation in this clone (Supp Fig. 1A).

      R3: The authors should specify if their cells are heterozygous or homozygous since they are discussing a dominant interfering mutant.

      HM: The G2019S and LRRK2 KO are both homozygous. We state this early in the results section and the methods.

      R3: The transferrin phenotype validated through proteomics and western blot is solid.

      HM: We agree, thank you very much!

      R3: Quantification in Figure 1F-G is problematic; it is not clear what is meant by "diffuse and lysosomal". Puncta either colocalize with lysosomes or they do not. This needs to be clarified and re-analysed.

      HM: We apologize for the confusion. In control cells the Cherry tagged FTL is efficiently cycling through the lysosomes and we don't see a strong cytosolic (diffuse) pool, which likely reflects the relatively iron-poor culture conditions. However, in G2019S cells, there is a highly elevated amount of FTL, with a strong cytosolic/diffuse stain in steady state, with some flux into lysosomes. In this experiment we chelated iron to test whether this cytosolic pool of FTL was capable of clearing through the lysosomes (ferritinophagy). While there is a cytosolic (diffuse) pool that remains, the pool that fluxes into the lysosome increases in G2019S chelated cells. This is also seen by the reduction in total FTL seen by western blot (endogenous FTL). Our conclusion here is that the general ferritinophagy machinery remains functional in G2019S cells. We have changed the term "diffuse" to "cytosolic" and improved our description of this experiment in the text.

      R3: The text in the first Results section, "LRRK2G2019S RAW macrophages have altered iron homeostasis", is very long. It could be divided into more sections to improve readability.

      HM: We have improved the text to be more descriptive of the conclusions and added new sections.

      R3: If the effect is armadillo-dependent, how is LRRK2 G2019S implicated, since there is no kinase domain in these cells?

      HM: Our new data employing the LRRK2-specific type II kinase inhibitors now confirm that the effects of G2019S under iron overload are indeed kinase dependent; they are simply insensitive to MLi-2.

      R3: The authors do not show any controls (PCR, sequencing) confirming knockout or truncation.

      HM: We performed higher-resolution proteomics and deep sequencing and learned that the "Arm" mutation was not a truncation but a series of 4 point mutations around exon 19. Therefore, we removed all data referring to this clone and replaced them with the type II kinase inhibitor experiments. We feel this removes a lot of confusion and provides much clearer conclusions on the role of kinase activity in iron overload. We may continue to explore why the 4 amino acid mutations created such strong phenotypes, as this could reflect a critical conformational change that impacts kinase activity, but that is for future work. We now include the sequencing files of the G2019S and KO lines as Supplementary Data Files 1 and 2.

      R3: The data is interesting and the image quality with the insets is very high.

      HM: We thank the reviewer for their positive comments!

      R3: The mutant is not clearly described in the text: did the authors remove just the kinase and ROC-COR domains, or all the domains downstream of the Armadillo domain? This is not clear.

      HM: We have removed the clone from the manuscript.

      R3: The authors cannot conclude that their phenotype is due to the independence of the kinase domain specifically, as they are also interfering with the GTPase activity by removing the ROC-COR domains.

      HM: We agree, and our new drugs allow us to confirm that the phenotypes are due to kinase activity, but that high iron induces a new conformation of LRRK2 that renders the kinase domain resistant to MLi-2 inhibition. We now discuss this in the manuscript.

      R3: In Figure 3E, is the difference between the "ARM CTRL" and the "ARM FAS" conditions significant? A trend appears to be there, but the p-value is not shown.

      HM: These data have now been removed.

      R3: In figure 4A, it would have been important to check if Rab8 phosphorylation is also observed in LRRK2 KO cells after administration of FAS to further evaluate the mechanism through which this Rab8 phosphorylation is occurring.

      HM: We show that the pRab8 is specific to the G2019S lines and not seen in LRRK2 KO (Fig 3A,B, Supp. Fig. 3A,B).

      R3: The vinculin bands in figure 4A are misaligned with the rest of the bands.

      HM: We now provide new blots for all of these experiments (in Fig 3) as we removed the LRRK2ARM data from the manuscript and the appropriate loading controls are all included.

      R3: The authors do not have any controls to validate the pRab8 staining in IF. This is an important caveat and needs to be addressed.

      HM: We now include siRNA validation of Rab8 (vs Rab10) to confirm the specificity of the pRab8 antibody in IF, where it labels the plasma membrane in G2019S iron-loaded cells.

      R3: The authors should have checked if FAS administration in the LRRK2 G2019S and the ARM cells is leading to ferroptotic cell death (or cell death in general). This is key to validate the link between the altered iron homeostasis in LRRK2 G2019S cells and increased cytotoxicity observed during neurodegeneration.

      HM: As mentioned above, we have added extensively to our new Fig 4 to include full proteomics analysis of the changes in iron loaded G2019S cells, we use C11-Bodipy probes to monitor lipid oxidation, and SYTOX green assays to monitor cell death through cell surface rupture (consistent with ferroptosis). We thank the reviewer for pushing us to do these experiments and provide further relevance to the potential for LRRK2 mutations to promote cell toxicity during neurodegeneration.

      R3: Regarding the literature, the authors are missing some important preprinted papers, and these studies need to be discussed. This includes a report with opposite findings (https://www.biorxiv.org/content/10.1101/2025.09.26.678370v1.full) and a report showing kinase-independent cell death in macrophages (https://www.biorxiv.org/content/10.1101/2023.09.27.559807v1.abstract).

      HM: We thank the reviewer for alerting us to the bioRxiv papers, one of which was submitted after we sent our manuscript for review. We are excited to see the growing interest in the impact of LRRK2 function on iron homeostasis and hope our work will contribute to this. The study from the LaVoie lab does show some sensitivity of the iron-loaded phenotype in G2019S cells; however, they see a ~20% reduction in lysosomal iron after 7 days of MLi-2 treatment in astrocytes (their Fig 2L). To us, this is very likely an indication of relatively high resistance to the drug. I'm sure that if they tried these new type II inhibitors, the iron load would be much more rapidly reversed. The specificity of their phenotype to Rab8 is also very interesting considering the cell surface localization we see for pRab8 in our iron-loaded system. Similar comments apply to the Gutierrez study in macrophages. We have included the findings of these papers within the manuscript and thank the reviewer for pointing them out.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this study, Lamberti et al. investigate how translation initiation and elongation are coordinated at the single-mRNA level in mammalian cells. The authors aim to uncover whether and how cells dynamically adjust initiation rates in response to elongation dynamics, with the overarching goal of understanding how translational homeostasis is maintained. To this end, the study combines single-molecule live-cell imaging using the SunTag system with a kinetic modeling framework grounded in the Totally Asymmetric Simple Exclusion Process (TASEP). By applying this approach to custom reporter constructs with different coding sequences, and under perturbations of the initiation/elongation factor eIF5A, the authors infer initiation and elongation rates from individual mRNAs and examine how these rates covary.

      The central finding is that initiation and elongation rates are strongly correlated across a range of coding sequences, resulting in consistently low ribosome density (≤12% of the coding sequence occupied). This coupling is preserved under partial pharmacological inhibition of eIF5A, which slows elongation but is matched by a proportional decrease in initiation, thereby maintaining ribosome density. However, a complete genetic knockout of eIF5A disrupts this coordination, leading to reduced ribosome density, potentially due to changes in ribosome stalling resolution or degradation.

      Strengths:

      A key strength of this work is its methodological innovation. The authors develop and validate a TASEP-based Hidden Markov Model (HMM) to infer translation kinetics at single-mRNA resolution. This approach provides a substantial advance over previous population-level or averaged models and enables dynamic reconstruction of ribosome behavior from experimental traces. The model is carefully benchmarked against simulated data and appropriately applied. The experimental design is also strong. The authors construct matched SunTag reporters differing only in codon composition in a defined region of the coding sequence, allowing them to isolate the effects of elongation-related features while controlling for other regulatory elements. The use of both pharmacological and genetic perturbations of eIF5A adds robustness and depth to the biological conclusions. The results are compelling: across all constructs and conditions, ribosome density remains low, and initiation and elongation appear tightly coordinated, suggesting an intrinsic feedback mechanism in translational regulation. These findings challenge the classical view of translation initiation as the sole rate-limiting step and provide new insights into how cells may dynamically maintain translation efficiency and avoid ribosome collisions.

      We thank the reviewer for their constructive assessment of our work, and for recognizing the methodological innovation and experimental rigor of our study.

      Weaknesses:

      A limitation of the study is its reliance on exogenous reporter mRNAs in HeLa cells, which may not fully capture the complexity of endogenous translation regulation. While the authors acknowledge this, it remains unclear how generalizable the observed coupling is to native mRNAs or in different cellular contexts.

      We agree that the use of exogenous reporters is a limitation inherent to the SunTag system, for which there is currently no simple alternative for single-mRNA translation imaging. However, we believe our findings are likely generalizable for several reasons.

      As discussed in our introduction and discussion, there is growing mechanistic evidence in the literature for coupling between elongation (ribosome collisions) and initiation via pathways such as the GIGYF2-4EHP axis (Amaya et al. 2018, Hickey et al. 2020, Juszkiewicz et al. 2020), which might operate on both exogenous and endogenous mRNAs.

      As already acknowledged in our limitations section, our exogenous reporters may not fully recapitulate certain aspects of endogenous translation (e.g., ER-coupled collagen processing), yet the observed initiation-elongation coupling was robust across all tested constructs and conditions.

      We have now expanded the Discussion (L393-395) to cite complementary evidence from Dufourt et al. (2021), who used a CRISPR-based approach in Drosophila embryos to measure translation of endogenous genes. We also added a reference to Choi et al. (2025), who used an ER-specific SunTag reporter to visualize translation at the ER (L395-397).

      Additionally, the model assumes homogeneous elongation rates and does not explicitly account for ribosome pausing or collisions, which could affect inference accuracy, particularly in constructs designed to induce stalling. While the model is validated under low-density assumptions, more work may be needed to understand how deviations from these assumptions affect parameter estimates in real data.

      We agree with the reviewer that the assumption of homogeneous elongation rates is a simplification, and that our work represents a first step towards rigorous single-trace analysis of translation dynamics. We have explicitly tested the robustness of our model to violations of the low-density assumption through simulations (Figure 2 - figure supplement 2). These show that while parameter inference remains accurate at low ribosome densities, accuracy slightly deteriorates at higher densities, as expected. In fact, our experimental data do provide evidence for heterogeneous elongation: the waiting times between termination events deviate significantly from an exponential distribution (Figure 3 - figure supplement 2C), indicating the presence of ribosome stalling and/or bursting, consistent with the reviewer's concern. We acknowledge in the Limitations section (L402-406) that extending the model to explicitly capture transcript-dependent elongation rates and ribosome interactions remains challenging. The TASEP is difficult to solve analytically under these conditions, but we note that simulation-based inference approaches, such as particle filters to replace HMMs, could provide a path forward for future work to capture this complexity at the single-trace level.
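      One simple way to see what "deviate from an exponential distribution" means in practice is the coefficient of variation (CV) of the waiting times between termination events: memoryless (Poisson) terminations give CV ≈ 1, while clustered terminations, as produced by bursting or by stalling followed by release, give CV > 1. The minimal sketch below is an illustration on synthetic data, not the analysis code from the authors' repository; the function name and all parameter values are hypothetical.

```python
import numpy as np

def waiting_time_cv(termination_times):
    """Coefficient of variation (CV) of the gaps between successive termination events.

    Exponential (memoryless) gaps give CV close to 1; clustered events, as expected
    from bursting or from stalling followed by release, push CV above 1.
    """
    t = np.sort(np.asarray(termination_times, dtype=float))
    gaps = np.diff(t)
    if gaps.size < 2:
        raise ValueError("need at least three termination events")
    return gaps.std(ddof=1) / gaps.mean()

# Illustrative synthetic data (not experimental): a Poisson process vs. a bursty one.
rng = np.random.default_rng(0)
poisson_times = np.cumsum(rng.exponential(3.0, 5000))
bursty_gaps = rng.permutation(
    np.concatenate([rng.exponential(1.0, 4500), rng.exponential(30.0, 500)])
)
bursty_times = np.cumsum(bursty_gaps)

cv_poisson = waiting_time_cv(poisson_times)   # close to 1
cv_bursty = waiting_time_cv(bursty_times)     # well above 1
```

      A CV summary is coarser than comparing full distributions (e.g. with a goodness-of-fit test), but it directly reports the direction of the deviation.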

      Furthermore, although the study observes translation "bursting" behavior, this is not explicitly modeled. Given the growing recognition of translational bursting as a regulatory feature, incorporating or quantifying this behavior more rigorously could strengthen the work's impact.

      While we do not explicitly model the bursting dynamics in the HMM framework, we have quantified bursting behavior directly from the data. Specifically, we measure the duration of translated (ON) and untranslated (OFF) periods across all reporters and conditions (Figure 1G for control conditions and Figure 4G-H for perturbed conditions), finding that active translation typically lasts 10-15 minutes interspersed with shorter silent periods of 5-10 minutes. This empirical characterization demonstrates that bursting is a consistent feature of translation across our experimental conditions. The average duration of silent periods is similar to what was inferred by Livingston et al. 2023 for a similar SunTag reporter; while the average duration of active periods is substantially shorter (~15 min instead of ~40 min), which is consistent with the shorter trace duration in our system compared to theirs (~15 min compared to ~80 min, on average). Incorporating an explicit two-state or multi-state bursting model into the TASEP-HMM framework would indeed be computationally intensive and represents an important direction for future work, as it would enable inference of switching rates alongside initiation and elongation parameters. We have added this point to the Discussion (L415-417).
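      The ON/OFF quantification described above amounts to measuring run lengths in a per-frame binary "translating / not translating" trace. A minimal sketch, assuming such a decoded boolean trace is available and assuming a 20 s frame interval (the value stated for the run-off acquisitions; the bursting movies may use a different one). Runs that touch either end of the trace are dropped because their true duration is censored. The function name is hypothetical.

```python
import numpy as np

FRAME_INTERVAL_S = 20.0  # assumed frame interval; adjust to the actual acquisition

def on_off_durations(translating, dt=FRAME_INTERVAL_S):
    """Return (ON durations, OFF durations) in minutes from a per-frame boolean trace.

    Runs touching the start or end of the trace are censored and therefore dropped.
    """
    x = np.asarray(translating, dtype=bool)
    change = np.flatnonzero(x[1:] != x[:-1]) + 1   # frames where the state switches
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [x.size]))
    on, off = [], []
    for s, e in zip(starts, ends):
        if s == 0 or e == x.size:
            continue  # censored run: true duration unknown
        (on if x[s] else off).append((e - s) * dt / 60.0)
    return np.array(on), np.array(off)

# Hypothetical decoded trace: OFF, ON for 3 frames, OFF for 2, ON for 2, OFF.
on, off = on_off_durations([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
```

      Dropping censored runs biases short traces toward shorter durations, which is one reason the trace length matters when comparing with Livingston et al. 2023.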

      Assessment of Goals and Conclusions:

      The authors successfully achieve their stated aims: they quantify translation initiation and elongation at the single-mRNA level and show that these processes are dynamically coupled to maintain low ribosome density. The modeling framework is well suited to this task, and the conclusions are supported by multiple lines of evidence, including inferred kinetic parameters, independent ribosome counts, and consistent behavior under perturbation.

      Impact and Utility:

      This work makes a significant conceptual and technical contribution to the field of translation biology. The modeling framework developed here opens the door to more detailed and quantitative studies of ribosome dynamics on single mRNAs and could be adapted to other imaging systems or perturbations. The discovery of initiation-elongation coupling as a general feature of translation in mammalian cells will likely influence how researchers think about translational regulation under homeostatic and stress conditions.

      The data, models, and tools developed in this study will be of broad utility to the community, particularly for researchers studying translation dynamics, ribosome behavior, or the effects of codon usage and mRNA structure on protein synthesis.

      Context and Interpretation:

      This study contributes to a growing body of evidence that translation is not merely controlled at initiation but involves feedback between elongation and initiation. It supports the emerging view that ribosome collisions, stalling, and quality control pathways play active roles in regulating initiation rates in cis. The findings are consistent with recent studies in yeast and metazoans showing translation initiation repression following stalling events. However, the mechanistic details of this feedback remain incompletely understood and merit further investigation, particularly in physiological or stress contexts. 

      In summary, this is a thoughtfully executed and timely study that provides valuable insights into the dynamic regulation of translation and introduces a modeling framework with broad applicability. It will be of interest to a wide audience in molecular biology, systems biology, and quantitative imaging.

      We appreciate the reviewer's thorough and positive assessment of our work, and that they recognize both the technical innovation of our modeling framework and its potential broad utility to the translation biology community. We agree that further mechanistic investigation of initiation-elongation feedback under various physiological contexts represents an important direction for future research.

      Reviewer #2 (Public review):

      Summary:

      This manuscript uses single-molecule run-off experiments and TASEP/HMM models to estimate biophysical parameters, i.e., ribosomal initiation and elongation rates. Combining inferred initiation and elongation rates, the authors quantify ribosomal density. TASEP modeling was used to simulate the mechanistic dynamics of ribosomal translation, and the HMM is used to link ribosomal dynamics to microscope intensity measurements. The authors' main conclusions and findings are:

      (1) Ribosomal elongation rates and initiation rates are strongly coordinated.

      (2) Elongation rates were estimated between 1-4.5 aa/sec. Initiation rates were estimated between 0.5-2.5 events/min. These values agree with previously reported values.

      (3) Ribosomal density was determined below 12% for all constructs and conditions.

      (4) eIF5A-perturbations (KO and GC7 inhibition) resulted in non-significant changes in translational bursting and ribosome density.

      (5) eIF5A perturbations resulted in increases in elongation and decreases in initiation rates.
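      The link between the inferred rates and ribosome density in point (3) can be made concrete with a first-order, low-density argument: in steady state the ribosome flux along the CDS, approximately λρ, must equal the initiation rate α, so the density per codon is ρ ≈ α/λ and the fraction of the CDS covered is about ℓα/λ for a footprint of ℓ codons. The sketch below assumes ℓ = 10 codons (the typical ~30 nt ribosome footprint) and uses illustrative rate values and a hypothetical CDS length, not values taken from the manuscript.

```python
def ribosome_occupancy(init_per_min, elong_codons_per_s, cds_codons, footprint_codons=10):
    """Low-density approximation: density per codon ~ alpha / lambda.

    Returns (mean number of ribosomes on the CDS, fraction of the CDS covered).
    footprint_codons=10 is an assumed typical ribosome footprint.
    """
    alpha = init_per_min / 60.0                    # initiations per second
    density_per_codon = alpha / elong_codons_per_s
    mean_ribosomes = density_per_codon * cds_codons
    occupied_fraction = density_per_codon * footprint_codons
    return mean_ribosomes, occupied_fraction

# Illustrative values: 1 initiation/min, 3 codons/s, hypothetical 1500-codon CDS.
n, f = ribosome_occupancy(1.0, 3.0, 1500)   # about 8.3 ribosomes, about 5.6% covered
```

      Because ρ scales with α/λ, a proportional change in initiation and elongation leaves this density unchanged, which is the sense in which coordinated rates maintain low occupancy.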

      Strengths:

      This manuscript presents an interesting scientific hypothesis to study ribosome initiation and elongation concurrently. This topic is highly relevant for the field. The manuscript presents a novel quantitative methodology to estimate ribosomal initiation rates from Harringtonine run-off assays. This is relevant because run-off assays have been used to estimate, exclusively, elongation rates.

      We thank the reviewer for their careful evaluation of our work and for recognizing the novelty of our quantitative methodology to extract both initiation and elongation rates from harringtonine run-off assays, extending beyond the traditional use of these experiments.

      Weaknesses:

      The conclusion of the strong coordination between initiation and elongation rates is interesting, but some results are unexpected, and further experimental validation is needed to ensure this coordination is valid. 

      We agree that some of our findings need further experimental investigation in future studies. However, we believe that the coordination between initiation and elongation is supported by multiple results in our current work: (1) the strong correlation observed across all reporters and conditions (Figure 3E), and (2) the consistent maintenance of low ribosome density despite varying elongation rates. While additional experimental validation would be valuable, we note that directly manipulating initiation or elongation independently in mammalian cells remains technically challenging. Nevertheless, our findings are consistent with emerging mechanistic understanding of collision-sensing pathways (GIGYF2-4EHP) that could mediate such coupling, as discussed in our manuscript.

      (1) eIF5a perturbations resulted in a non-significant effect on the fraction of translating mRNA, translation duration, and bursting periods. Given the central role of eIF5a, I would have expected a different outcome. I would recommend that the authors expand the discussion and review more literature to justify these findings.

      We appreciate this comment. This finding is indeed discussed in detail in our manuscript (Discussion, paragraphs 6-7). As we note there, while eIF5A plays a critical role in elongation, the maintenance of bursting dynamics and ribosome density upon perturbation can be explained by compensatory feedback mechanisms: specifically, a coordinated decrease in initiation rates counterbalances slower elongation and maintains homeostatic ribosome density. We also discuss several factors that complicate interpretation: (1) potential RQC-mediated degradation masking stronger effects in proline-rich constructs, (2) differences between GC7 treatment and genetic knockout suggesting altered stalling resolution kinetics, and (3) the limitations of using exogenous reporters that lack ER-coupled processing, which may be critical for eIF5A function in endogenous collagen translation (as suggested by Rossi et al., 2014; Mandal et al., 2016; Barba-Aliaga et al., 2021). The mechanistic complexity and tissue-specific nature of eIF5A function in mammals, which differs substantially from the better-characterized yeast system, likely contributes to the nuanced phenotype we observe. We believe our Discussion adequately addresses these points.

      (2) The AAG construct leading to slow elongation is very surprising. It is the opposite of the field consensus, where codon-optimized gene sequences are expected to elongate faster. More information about each construct should be provided. I would recommend more bioinformatic analysis on this, for example, calculating CAI for all constructs, or predicting the structures of the proteins.

      We agree that the slow elongation of the AAG construct is counterintuitive and indeed surprising. Following the reviewer's suggestion, we have now calculated the Codon Adaptation Index (CAI) for all constructs (Renilla 0.89, Col1a1 0.78, Col1a1 mutated 0.74). It is therefore unlikely that codon bias explains the slow translation, particularly since we designed the mutated Col1a1 construct with alanine codons selected to respect human codon usage bias, thereby minimizing changes in codon optimality. As we discuss in the manuscript, we hypothesize that the proline-to-alanine substitutions disrupted co-translational folding of the collagen-derived sequence. Prolines are critical for collagen triple-helix formation (Shoulders and Raines, 2009), and their replacement with alanines likely generates misfolded intermediates that cause ribosome stalling (Barba-Aliaga et al., 2021; Komar et al., 2024). This interpretation is supported by the high frequency (>30%) of incomplete run-off traces for AAG, suggesting persistent stalling events. Our findings thus illustrate an important potential caveat: "optimizing" a sequence based solely on codon usage can be detrimental when it disrupts functionally important structural features or co-translational folding pathways.

      This highlights that elongation rates depend not only on codon optimality but also on the interplay between nascent chain properties and ribosome progression.
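      For readers unfamiliar with the metric, the Codon Adaptation Index is the geometric mean, over the codons of a sequence, of each codon's relative adaptiveness w (its usage divided by that of the most-used synonymous codon in a reference set). The sketch below uses a hypothetical toy usage table restricted to Ala, Pro, and Gly; a real calculation needs a complete human reference table, and the values reported above were not computed with this code. It also illustrates the design point made in the response: replacing a preferred proline codon with a preferred alanine codon leaves the CAI unchanged.

```python
import math

# Hypothetical toy reference usage (illustration only); a real CAI calculation
# needs a complete reference table for all sense codons.
TOY_USAGE = {
    "GCT": 18.0, "GCC": 28.0, "GCA": 16.0, "GCG": 7.0,   # Ala
    "CCT": 17.0, "CCC": 20.0, "CCA": 17.0, "CCG": 7.0,   # Pro
    "GGT": 11.0, "GGC": 22.0, "GGA": 16.0, "GGG": 16.0,  # Gly
}
AMINO_ACID = {c: aa for aa, codons in {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
}.items() for c in codons}

def relative_adaptiveness(usage, amino_acid):
    """w_c = usage of codon c / usage of the most frequent synonymous codon."""
    best = {}
    for codon, n in usage.items():
        aa = amino_acid[codon]
        best[aa] = max(best.get(aa, 0.0), n)
    return {codon: n / best[amino_acid[codon]] for codon, n in usage.items()}

def cai(seq, w):
    """Geometric mean of w over the codons of seq that appear in the table."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

W = relative_adaptiveness(TOY_USAGE, AMINO_ACID)
# Swapping a preferred Pro codon (CCC) for a preferred Ala codon (GCC) leaves CAI unchanged.
print(cai("CCCGGCCCC", W), cai("GCCGGCGCC", W))   # both 1.0
```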

      (3) The authors should consider using their methodology to study the effects of modifying the 5'UTR, resulting in changes in initiation rate and bursting, such as previously shown in reference Livingston et al., 2023. This may be outside of the scope of this project, but the authors could add this as a future direction and discuss if this may corroborate their conclusions. 

      We thank the reviewer for this excellent suggestion. We agree that applying our methodology to 5'-UTR variants would provide a complementary test of initiation-elongation coupling, and we have now added this as a future direction in the Discussion (L417-420).

      (4) The mathematical model and parameter inference routines are central to the conclusions of this manuscript. In order to support reproducibility, the computational code should be made available and well-documented, with a requirements file indicating the dependencies and their versions. 

      We have added the Github link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Reviewer #3 (Public review):

      Disclaimer:

      My expertise is in live single-molecule imaging of RNA and transcription, as well as associated data analysis and modeling. While this aligns well with the technical aspects of the manuscript, my background in translation is more limited, and I am not best positioned to assess the novelty of the biological conclusions.

      Summary:

      This study combines live-cell imaging of nascent proteins on single mRNAs with time-series analysis to investigate the kinetics of mRNA translation.

      The authors (i) used a calibration method for estimating absolute ribosome counts, and (ii) developed a new Bayesian approach to infer ribosome counts over time from run-off experiments, enabling estimation of elongation rates and ribosome density across conditions.

      They report (i) translational bursting at the single-mRNA level, (ii) low ribosome density (~10% occupancy ± a few percent), (iii) that ribosome density is minimally affected by perturbations of elongation (using a drug and/or different coding sequences in the reporter), suggesting a homeostatic mechanism potentially involving a feedback of elongation onto initiation, although (iv) this coupling breaks down upon knockout of elongation factor eIF5A.

      Strengths:

      (1) The manuscript is well written, and the conclusions are, in general, appropriately cautious (besides the few improvements I suggest below).

      (2) The time-series inference method is interesting and promising for broader applications. 

      (3) Simulations provide convincing support for the modeling (though some improvements are possible). 

      (4) The reported homeostatic effect on ribosome density is surprising and carefully validated with multiple perturbations.

      (5) Imaging quality and corrections (e.g., flat-fielding, laser power measurements) are robust.

      (6) Mathematical modeling is clearly described and precise; a few clarifications could improve it further.

      We thank the reviewer for recognizing the novelty of the approach and its rigour, and for providing suggestions to improve it further.

      Weaknesses:

      (1) The absolute quantification of ribosome numbers (via the measurement of $i_{MP}$) should be improved. This only affects the finding that ribosome density is low, not that it appears to be under homeostatic control. However, if $i_{MP}$ turns out to be substantially overestimated (hence ribosome density underestimated), then "ribosomes queuing up to the initiation site and physically blocking initiation" could become a relevant hypothesis. In my detailed recommendations to the authors, I list points that need clarification in their quantifications and suggest an independent validation experiment (measuring the intensity of an object with a known number of GFP molecules, e.g., MS2-GFP-labeled RNAs, or individual GEMs).

      We agree with the reviewer that the estimation of the number of ribosomes is central to our finding that translation happens at low density on our reporters. This result derives from our measurement of the intensity of one mature protein (i<sub>MP</sub>), which we achieved by using a SunTag reporter with an RH1 domain at the C terminus of the mature protein, allowing us to stabilise mature proteins via actin tethering. In addition, as suggested by the reviewer, we already validated this result with an independent estimate of the mature protein intensity (Figure 5 - figure supplement 2B), which was obtained by adding the mature protein intensity directly as a free parameter of the HMM. The inferred value of mature protein intensity for each construct (10-15 a.u.) was remarkably close to the experimental calibration result (14 ± 2 a.u.). Therefore, we have confidence that our absolute quantification of ribosome numbers is accurate.

      (2) The proposed initiation-elongation coupling is plausible, but alternative explanations, such as changes in abortive elongation frequency, should be considered more carefully. The authors mention this possibility, but should test or rule it out quantitatively. 

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new perturbation experiments is beyond the scope of the present work.

      (3) The observation of translational bursting is presented as novel, but similar findings were reported by Livingston et al. (2023) using a similar SunTag-MS2 system. This prior work should be acknowledged, and the added value of the current approach clarified.

      We did cite Livingston et al. (2023) in several places, but we recognized that adding citations in a few key places would make clear that the observation of bursting is not novel but agrees with previous results. We have now done so in the Results and Discussion sections.

      (4) It is unclear what the single-mRNA nature of the inference method is bringing since it is only used here to report _average_ ribosome elongation rate and density (averaged across mRNAs and across time during the run-off experiments - although the method, in principle, has the power to resolve these two aspects).

      While decoding individual traces, our model infers shared (population-level) rates. Inferring transcript-specific parameters would be more informative, but it is highly challenging due to the uncertainty in the initial ribosome distribution on single transcripts. Pooling multiple transcripts allows us to make assumptions about the initial distribution and infer average elongation and initiation-rate parameters, while revealing substantial mRNA-to-mRNA variability in the posterior decoding (e.g. Figure 3 - figure supplement 2C). Indeed, the inference still informs on the single-trace run-off time distribution (Figure 3A) and the waiting time between termination events (Figure 3 - figure supplement 2C), suggesting the presence of stalling and bursting. In addition, the transcript-to-transcript heterogeneity is likely better accounted for by our model than by previous methods (linear fit of the average run-off intensity), as suggested by their comparison (Figure 3 - figure supplement 2A). In the future, the model could be refined by introducing transcript-specific parameters, possibly in a hierarchical way, alongside shared parameters.

      (5) I did not find any statement about data availability. The data should be made available. Their absence limits the ability to fully assess and reproduce the findings.

      We have added the Github link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Major Comments:

      (1) Lack of Explicit Bursting Model

      Although translation "bursts" are observed, the current framework does not explicitly model initiation as a stochastic ON/OFF process. This limits insight into regulatory mechanisms controlling burst frequency or duration. The authors should either incorporate a two-state/more-state (bursting) model of initiation or perform statistical analysis (e.g., dwell-time distributions) to quantify bursting dynamics. They should clarify how bursting influences the interpretation of initiation rate estimates.

      We agree with the reviewer that an explicit bursting model (e.g., a two-state telegraph model) would be the ideal theoretical framework. However, integrating such a model into the TASEP-HMM inference framework is computationally intensive and complex. As a robust first step, we have opted to quantify bursting empirically based on the decoded single-mRNA traces. As shown in Figure 1G (control) and Figure 4G (perturbed conditions), we explicitly measured the duration of "ON" (translated) and "OFF" (untranslated) periods. This statistical analysis provides a quantitative description of the bursting dynamics without relying on the specific assumptions of a telegraph model. We have clarified this in the text (L123-125) and, as suggested, added a discussion (L415-417) on the potential extensions of the model to include explicit switching kinetics in the Outlook section.

      (2) Assumption of Uniform Elongation Rates

      The model assumes homogeneous elongation across coding sequences, which may not hold for stalling-prone inserts (e.g., PPG). This simplification could bias inference, particularly in cases of sequence-specific pausing. The authors should add simulations or a sensitivity analysis to assess how non-uniform elongation affects the accuracy of inferred parameters. The authors should explicitly discuss how ribosome stalling, collisions, or heterogeneity might skew model outputs (see point 4).

      A strong stalling sequence that affects all ribosomes equally should not deteriorate the inference of the initiation rate, provided that the low-density assumption holds. The scenario where stalling events lead to higher density, and thus increased ribosome-ribosome interactions, is comparable to the conditions explored in Figure 2E. In those simulations, we tested the inference on data generated with varying initiation and elongation rates, resulting in ribosome densities ranging from low to high. We demonstrated that the inference remains robust at low ribosome densities (<10%). At higher densities, the accuracy of the initiation rate estimate decreases, whereas the elongation rate estimate remains comparatively robust. Additionally, the model tends to overestimate ribosome density under high-density conditions, likely because it neglects ribosome interference at the initiation site (Figure 2 - figure supplement 2C). We agree that a deeper investigation into the consequences of stochastic stalling and bursting would be beneficial, and we have explicitly acknowledged this in the Limitations section.

      (3) Interpretation of eIF5A Knockout Phenotype

      The observation that eIF5A KO reduces initiation more than elongation, leading to decreased ribosome density, is biologically intriguing. However, the explanation invoking altered RQC kinetics is speculative and not directly tested. The authors should consider validating the RQC hypothesis by monitoring reporter mRNA stability, ribosome collision markers, or translation termination intermediates.

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new experiments is beyond the scope of the present work.

      (4) To strengthen the manuscript, the authors should incorporate insights from three studies.

      - Livingston et al. (PMC10330622) found that translation occurs in bursts, influenced by mRNA features and initiation factors, supporting the coupling of initiation and elongation.

      - Madern et al. (PMID: 39892379) demonstrated that ribosome cooperativity enhances translational efficiency, highlighting coordinated ribosome behavior.

      - Dufourt et al. (PMID: 33927056) observed that high initiation rates correlate with high elongation rates, suggesting a conserved mechanism across cell cultures and organisms.

      Integrating these studies could enrich the manuscript's interpretation and stimulate new avenues of thought.

      We thank the reviewer for the valuable comment. We added citations of Livingston et al. in the context of translational bursting. We already cited Madern et al. in multiple places and, although its observations of ribosome cooperativity are very compelling, they cannot be linked with our observations of a feedback between initiation and elongation, and it would be very challenging to see a similar effect on our reporters. This is why we did not expressly discuss cooperativity. We also integrated Dufourt et al. in the Discussion about the possibility of designing genetically encoded reporters. We also added a sentence about the possibility of using an ER-specific SunTag reporter, as done recently in Choi et al., Nature (2025) (https://doi.org/10.1038/s41586-025-09718-0).

      Minor Comments:

      (1) Use consistent naming for SunTag reporters (e.g., "PPG" vs "proline-rich") throughout.

      Thank you for the comment. However, the term proline-rich always appears together with PPG, so we believe that the naming is clear and consistent.

      (2) Consider a schematic overview of the experimental design and modeling pipeline for accessibility.

      Thank you for the suggestion. We consider that the experimental design and modeling are now described clearly enough and do not justify an additional schematic.

      (3) Clarify how incomplete run-off traces are handled in the HMM inference.

      Incomplete run-off traces are treated identically to complete traces in our HMM inference. This is possible because our model relies on the probability of transitions occurring per time step to infer rates. It does not require observing the final "empty" state to estimate the kinetic parameters α and λ. The loss of signal (e.g., mRNA moving out of the focal volume or photobleaching) does not invalidate the kinetic information contained in the portion of the trace that was observed. We have clarified this in the Methods section.
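      The reason truncated traces remain usable can be seen in a deliberately simplified setting. The sketch below is not the HMM from the manuscript: it treats decoded ribosome counts as if they were observed directly and assumes the number of terminations per frame is Poisson with mean αΔt while ribosomes are still loaded. Under those assumptions, the maximum-likelihood rate is total terminations divided by total observed exposure time, so a truncated trace simply contributes fewer frame-to-frame steps. It also ignores the small bias from the final step, in which the last ribosome leaves partway through a frame. The function name, the 20 s frame interval default, and the example counts are hypothetical.

```python
import numpy as np

def termination_rate_mle(count_traces, dt_s=20.0):
    """Simplified MLE of a constant termination rate from decoded ribosome counts.

    Assumes non-increasing counts and Poisson(alpha * dt) terminations per frame
    while ribosomes remain. Traces of any length, including truncated ones,
    contribute the steps they contain.
    """
    decrements, exposure = 0, 0.0
    for counts in count_traces:
        c = np.asarray(counts)
        loaded = c[:-1] > 0                      # steps that start with ribosomes on the mRNA
        decrements += int(np.sum((c[:-1] - c[1:])[loaded]))
        exposure += loaded.sum() * dt_s
    return decrements / exposure                 # terminations per second

full = [5, 4, 4, 3, 2, 2, 1, 0, 0]   # run-off observed to completion
truncated = [6, 5, 5, 4]             # signal lost before the mRNA was empty
alpha_hat = termination_rate_mle([full, truncated])   # 7 terminations / 200 s
```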

      Reviewer #2 (Recommendations for the authors):

      (1) Reproducibility:

      (1.1) The authors should use a GitHub repository with a timestamp for the release version.

      The code is available on GitHub (https://github.com/naef-lab/suntag-analysis).

      (1.2) Make raw images and data available in a figure repository like Figshare.

      The raw images (.ome.tif) are now available on Zenodo (https://zenodo.org/records/17669332).

      (2) Paper reorganization and expansion of the intensity and ribosome quantification:

      (2.1) Given the relevance of the initiation and elongation rates for the conclusions of this study, and the fact that the authors inferred these rates from the spot intensities, I recommend that the authors move Figure 1 Supplement 2 to the main text and expand the description of the process relating spot intensity to the number of ribosomes. Please also expand the figure caption for this image.

      We agree with the importance of this validation. We have expanded the description of the calibration experiment in the main text and in the figure caption.

      (2.2) I suggest the authors explicitly mention the use of HMM in the abstract.

      We have now explicitly mentioned the TASEP-based HMM in the abstract.

      (2.3) In line 492, please add the frame rate used to acquire the images for the run-off assays.

      We have added the specific frame rate (one frame every 20 seconds) to the relevant section.

      (3) Figures and captions:

      (3.1) Figure 1, Supplement 2. Please add a description of the colors used in plots B, C. 

      We have expanded the caption and added the color description.

      (3.2) In the Figure 2 caption, it is not clear what the authors mean by "traceseLife". Please ensure it is not a typo.

      Thank you for spotting this. We have corrected the typo.

      (3.3) Figure 1 A, in the cartoon N(alpha)->N-1, shouldn't the transition also depend on lambda?

      The transition probability was explicitly derived in the “Bayesian modeling of run-off traces” section (Eqs. 17-18), and does not depend on λ, but only on the initiation rate under the low-density assumption.
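      One way to see why λ drops out, sketched under the low-density assumption only (this is not a reproduction of Eqs. 17-18, and the function names are hypothetical): if ribosomes were loaded at rate α and move at constant speed λ without interference, their positions when initiation is blocked form a Poisson process of density α/λ along the transcript. Each one terminates after (L − x)/λ, so terminations arrive at rate α; λ only sets how long run-off lasts (L/λ).

```python
import random


def runoff_exit_times(alpha, lam, L, rng):
    """Termination times after initiation is blocked at t = 0 (toy low-density model).

    Assumptions: ribosomes were loaded at rate alpha (1/s), move deterministically
    at lam codons/s without interference, so at t = 0 their positions form a Poisson
    process of density alpha / lam on [0, L]."""
    positions = []
    x = rng.expovariate(alpha / lam)       # start codon to first ribosome
    while x < L:
        positions.append(x)
        x += rng.expovariate(alpha / lam)  # exponential spacing between ribosomes
    # A ribosome at x reaches the stop codon after (L - x) / lam seconds.
    return sorted((L - x) / lam for x in positions)


def mean_exit_rate(alpha, lam, L, n_reps=2000, seed=0):
    """Average number of terminations per second during the run-off window L / lam."""
    rng = random.Random(seed)
    total = sum(len(runoff_exit_times(alpha, lam, L, rng)) for _ in range(n_reps))
    return total / n_reps / (L / lam)


# The termination rate stays close to alpha for two different elongation rates.
print(mean_exit_rate(alpha=0.05, lam=5.0, L=1000))
print(mean_exit_rate(alpha=0.05, lam=10.0, L=1000))
```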

      (3.4) Figure 3, Supplement 2. "presence of bursting and stalling.." has a typo.

      Corrected.

      (3.5) Figure 5, panel C, the y-axis label should be "run-off time (min)."

      Corrected.

      (3.6) For most figures, add significance bars.

      (3.7) In the figure captions, please add the total number of cells used for each condition.

      We have systematically indicated the number of traces (n<sub>t</sub>) and the number of independent experiments (n<sub>e</sub>) in the captions in this format (n<sub>t</sub>, n<sub>e</sub>).

      (4) Mathematical Methods:

      We greatly thank the reviewer for their detailed attention to the mathematical notation. We have addressed all points below.

      (4.1) In line 555 (Materials and Methods, subsection "Quantification of Intensity Traces"), multiple equations are not numbered. For example, no numbers are provided for the equations after Equation (4). Please keep the numbering consistent throughout the whole document.

      We have ensured that all equations are now consistently numbered throughout the document.

      (4.2) In line 588, the authors mention "$X$ is a standard normal random variable with mean $\mu$ and standard deviation $s_0$". Please ensure this is correct. A standard normal random variable has a 0 mean and std 1. 

      Thank you for the suggestion, we have corrected the text (L678).

      (4.3) Line 546, Equation 2. The authors use mu(x,y) to describe a 2d Gaussian function. But later in line 587, the authors reuse the same variable name in equation 5 to redefine the intensity as mu = b_0 + I.

      We have renamed the 2D Gaussian function to \mu_{2D}(x,y) in the spot tracking section.

      (4.4) For the complete document, it could be beneficial to the reader if the authors expand the definition of the relationship between the signal "y" and the spot intensity "I". Please note how the paragraph in lines 582-587 does not properly introduce "y".

      We have added an explicit definition of y and its relationship to the underlying spot intensity I in the text to improve readability and clarity.

      (4.5) Please ensure consistency in variable names. For example, "I" is used in line 587 for the experimental spot intensity, then line 763 redefines I(t) as the total intensity obtained from the TASEP model; please use "I_sim(t)" for simulated intensities. Please note that reusing the variable "I" for different contexts makes it hard for the reader to follow the text. 

      We agree that this was confusing. We have implemented the suggestion and now distinguish simulated intensities using the notation I<sub>S</sub>.

      (4.6) Line 555 states that "the prior on the total intensity I is an "uninformative" prior", I ~ half_normal(1000). Should this instead be I_0 ~ half_normal(1000)?

      We confirm that “I” is the correct variable representing the total intensity in this context; we do not use an “I<sub>0</sub>” variable here.

      (4.7) In line 595, Equation 6: please check that the equation is correct. Shouldn't it be $s_0^2 = \ln\left(1 + \sigma_{\text{meas}}^2 / \langle y \rangle^2\right)$? Please ensure that this is correct and that it does not affect the values calculated in line 598.

      Thank you for catching this typo. We have corrected the equation in the manuscript. We confirm that the calculations performed in the code used the correct formula, so the reported values remain unchanged.
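      For readers checking the corrected Equation 6, a minimal sketch of the log-normal moment matching it comes from. The companion location parameter μ = ln⟨y⟩ − s<sub>0</sub><sup>2</sup>/2 is the standard choice that makes the log-normal mean equal ⟨y⟩; whether the manuscript parametrizes it this way is an assumption, and the numbers below are hypothetical.

```python
import math


def lognormal_params(mean_y, sigma_meas):
    """Log-normal (mu, s0) whose mean is mean_y and standard deviation is sigma_meas.

    Moment matching gives s0^2 = ln(1 + sigma_meas^2 / <y>^2)
    and mu = ln(<y>) - s0^2 / 2."""
    s0_sq = math.log1p(sigma_meas**2 / mean_y**2)
    mu = math.log(mean_y) - 0.5 * s0_sq
    return mu, math.sqrt(s0_sq)


mu, s0 = lognormal_params(mean_y=100.0, sigma_meas=20.0)
# The log-normal mean and variance recover the inputs (100 and 20**2).
mean = math.exp(mu + s0**2 / 2)
var = (math.exp(s0**2) - 1) * math.exp(2 * mu + s0**2)
print(mean, var)
```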

      (4.8) In line 597, "the mean intensity square ⟨y⟩^2": please check whether this should read "the square of the temporal mean intensity."

      We have corrected the text to "the square of the temporal mean intensity."

      (4.9) In lines 602-619 (Bayesian modeling of run-off traces), please introduce the constant "\ell". Is it used to define the ribosomal footprint?

      We have added the explicit definition of 𝓁 as the ribosome footprint size (length of transcript occupied by one ribosome) in the "Bayesian modeling of run-off traces" section.

      (4.10) Line 687 has a minor typo "[...] ribosome distribution.. Then, [...]"

      We have corrected the punctuation.

      (4.11) In line 678, Equation 19 introduces the constant "L_S". Please ensure that it is defined in the text.

      We have added the explicit definition of L<sub>S</sub> (the length of the SunTag) to the text surrounding Equation 19.

      (4.12) In line 695, Equation 22, please consider using a subscript to differentiate the variance due to ribosome configuration. For example, instead of "sigma (...)^2" use something like "sigma_c ^2 (...)". Ensure that this change is correctly applied to Equation 24 and all other affected equations.

      Thank you, we have implemented the suggestions.

      (4.13) In line 696, please double-check equations 26 and 27. Specifically, the denominator ^2. Given the previous text, it is hard to follow the meaning of this variable. 

      We have revised the notation in Equations 26 and 27 to ensure the denominator is consistent with the definitions provided in the text.

      (4.14) In line 726, the authors write "[...], but for the purposes of this dissertation [...]"; it should be "[...], but for the purposes of this study [...]".

      Thank you for spotting this. We have replaced "dissertation" with "study."

      (4.15) Equations 5, 28, 37, and the unnumbered equation between Equations 16 and 17 are similar, but in some, "y" does not explicitly depend on time. Please ensure this is correct. 

      We have verified these equations and believe they are correct.

      (4.16) Please review the complete document and ensure that variables and constants used in the equations are defined in the text. Please ensure that the same variable names are not reused for different concepts. To improve readability and flow in the text, please review the complete Materials and Methods sections and evaluate if the modeling section can be written more clearly and concisely. For example, Equation 28 is repeated in the text.

      We have performed a comprehensive review of the Materials and Methods section. To improve conciseness and flow, we have merged the subsection “Observation model and estimation of observation parameters” with the “Bayesian modeling of run-off traces” section. This allowed us to remove redundant definitions and repeated equations (such as the previous Equation 28). We have also checked that all variables and constants are defined upon first use and that variable names remain consistent throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Data Presentation

      (1.1) In main Figures 1D and 4E, the traces appear to show frequent on-off-on transitions ("bursting"), but in supplementary figures (1-S1A and 4-S1A), this behavior is seen in only ~8 of 54 traces. Are the main figure examples truly representative?

      We acknowledge the reviewer's point. In Figure 1D, we selected some of the longest and most illustrative traces to highlight the bursting dynamics. We agree that the term "representative" might be misleading if interpreted as "average." We have updated the text to state "we show bursting traces" to more accurately reflect the selection.

      (1.2) There are 8 videos, but I could not identify which is which.

      Thank you for pointing this out. We have renamed the video files to clearly correspond to the figures and conditions they represent.

      (2) Data Availability:

      As noted above, the data should be shared. This is in accordance with eLife's policy: "Authors must make all original data used to support the claims of the paper, or that are required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). [...] eLife considers works to be published when they are posted as preprints, and expects preprints we review to meet the standards outlined here." Access to the time traces would have been helpful for reviewers.

      We have now added the GitHub link for the code (https://github.com/naef-lab/suntag-analysis) and deposited the raw data (.ome.tif files) on Zenodo (10.5281/zenodo.17669332).

      (3) Model Assumptions:

      (3.1) The broad range of run-off times (Figure 3A) suggests stalling, which may be incompatible with the 'low-density' assumption of the TASEP model, which essentially assumes that ribosomes do not bump into each other. This could undermine the assumptions that ribosomes behave independently, that they elongate at constant speed (necessary for the continuum-limit approximation), and that initiation is the rate-limiting step. How robust are the inferences to this assumption?

      We agree that the deviation of waiting times from an exponential distribution (Figure 3 - figure supplement 2C) suggests the presence of stalling, which challenges the strict low-density assumption and constant elongation speed. We explicitly explored the robustness of our model to higher ribosome densities in simulations. As shown in Figure 2 - figure supplement 2, while the model accuracy for single parameters deteriorates at very high densities (overestimating density due to neglected interference), it remains robust for estimating global rates in the regime relevant to our data. We have expanded the discussion on the limitations of the low density and homogeneous elongation rate assumptions in the text (L404-408).

      (3.2) Since all constructs share the same SunTag region, elongation rates should be identical there and diverge only in the variable region. This would affect $\gamma (t)$ and hence possibly affect the results. A brief discussion would be helpful.

      This is a valid point. Currently, our model infers a single average elongation rate that effectively averages the behavior over the SunTag and the variable CDS regions. Modeling distinct rates for these regions would be a valuable extension but adds significant complexity. While our current "effective rate" approach might underestimate the magnitude of differences between reporters, it captures the global kinetic trend. We have added a brief discussion acknowledging this simplification (L408-412).
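      A short worked example of what a single "effective rate" averages over, assuming constant speed within each segment so that transit times add: λ<sub>eff</sub> = (L<sub>S</sub> + L<sub>C</sub>) / (L<sub>S</sub>/λ<sub>S</sub> + L<sub>C</sub>/λ<sub>C</sub>). The segment lengths and rates below are hypothetical. Because the SunTag segment is shared, a twofold difference in the CDS rate shows up as a smaller difference in λ<sub>eff</sub>, which is consistent with the possible underestimation mentioned above.

```python
def effective_elongation_rate(len_suntag, rate_suntag, len_cds, rate_cds):
    """Single elongation rate reproducing the total transit time over two segments.

    Assumes constant speed within each segment, so transit times add:
    T = L_S / lambda_S + L_C / lambda_C and lambda_eff = (L_S + L_C) / T."""
    transit_time = len_suntag / rate_suntag + len_cds / rate_cds
    return (len_suntag + len_cds) / transit_time


# Hypothetical lengths (codons) and rates (codons/s): the CDS rate differs twofold,
# but the shared SunTag segment dilutes the difference in the effective rate.
fast = effective_elongation_rate(800, 5.0, 1000, 5.0)
slow = effective_elongation_rate(800, 5.0, 1000, 2.5)
print(fast / slow)  # smaller than the true twofold CDS difference
```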

      (3.3) A similar point applies to the Gillespie simulations: modeling the SunTag region with a shared elongation rate would be more accurate.

      We agree. Simulating distinct rates for the SunTag and CDS would increase realism, though our current homogeneous simulations serve primarily to benchmark the inference framework itself. We have noted this as a potential future improvement (L413-414).
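      A minimal sketch of such a simulation, not the authors' simulation code: a Gillespie run-off (no new initiation) of a TASEP with extended particles, where a hypothetical position-dependent rate function lets the SunTag segment share one elongation rate across constructs. Positions are leading-codon indices, the footprint ℓ enforces exclusion, and all numerical values are illustrative.

```python
import random


def gillespie_runoff(positions, L, ell, rate_at, rng):
    """Stochastic run-off (initiation blocked) of a TASEP with extended particles.

    positions: leading-codon positions of ribosomes, integers in 1..L.
    ell:       ribosome footprint in codons (>= 1); a ribosome can step only if the
               ribosome ahead is more than ell codons away.
    rate_at:   function x -> hopping rate (1/s) of a ribosome at codon x.
    Returns the time at which the last ribosome terminates."""
    pos = sorted(positions, reverse=True)  # pos[0] is closest to the stop codon
    t = 0.0
    while pos:
        rates = []
        for i, x in enumerate(pos):
            blocked = i > 0 and pos[i - 1] - x <= ell  # exclusion by the ribosome ahead
            rates.append(0.0 if blocked else rate_at(x))
        t += rng.expovariate(sum(rates))  # waiting time to the next event
        i = rng.choices(range(len(pos)), weights=rates)[0]  # which ribosome moves
        if pos[i] == L:
            pos.pop(i)  # stepping off the last codon = termination
        else:
            pos[i] += 1
    return t


def shared_suntag_rate(x, suntag_len=600):
    """Hypothetical rates: one shared rate over the SunTag, another over the CDS."""
    return 5.0 if x <= suntag_len else 3.0


t_end = gillespie_runoff([100, 400, 900, 1300], L=1500, ell=10,
                         rate_at=shared_suntag_rate, rng=random.Random(1))
print(t_end)
```

      Passing the rate as a function keeps the exclusion logic independent of how rates vary along the transcript.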

      (3.4) Equation (13) assumes that switching between bursting and non-bursting states is much slower than the elongation time. First, this should be made explicit. Second, this is not quite true (~5 min elongation time in Figure 3-s2A vs ~5-15 min switching times in Figure 1). It would be useful to show the intensity distribution at t=0 and compare it to the expected mixture distribution (i.e., a Poisson distribution + some extra 'N=0' cells). 

      We thank the reviewer for this insightful comment. We have added a sentence to the text explicitly stating the assumption that switching dynamics are slower than the translation time. While the timescales are indeed closer than ideal (5 min vs. 5-15 min), this assumption allows for a tractable approximation of the initial conditions for the run-off inference. Comparing the intensity distribution at t=0 to a zero-inflated Poisson distribution is an excellent suggestion for validation, which we will consider for future iterations of the model.
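      The reviewer's suggested check could start from the zero-inflated Poisson probabilities below, sketched under stated assumptions: t=0 spot intensities would first have to be converted to ribosome counts (approximate, since partially synthesized SunTags give fractional intensities), and the parameter values are hypothetical. An excess of N=0 transcripts over the plain Poisson prediction exp(−μ) would indicate a non-translating subpopulation.

```python
import math


def zip_pmf(k, mu, pi0):
    """Zero-inflated Poisson probability of k ribosomes on a transcript.

    With probability pi0 the transcript is in the non-translating state (N = 0);
    otherwise N ~ Poisson(mu)."""
    poisson = math.exp(-mu) * mu**k / math.factorial(k)
    return pi0 * (k == 0) + (1.0 - pi0) * poisson


# Hypothetical values: mean load of 5 ribosomes, 30% of transcripts not translating.
print([round(zip_pmf(k, mu=5.0, pi0=0.3), 3) for k in range(4)])
```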

      (4) Microscopy Quantifications:

      (4.1) Figure 1-S2A shows variable scFv-GFP expression across cells. Were cells selected for uniform expression in the analysis? Or is the SunTag assumed to be saturated? If so, this would need to be demonstrated. 

      All cell lines used are monoclonal, and cells were selected via FACS for consistent average cytoplasmic GFP signal. We assume the SunTag is saturated based on the established characterization of the system by Tanenbaum et al. (2014), where the high affinity of the scFv-GFP ensures saturation at expression levels similar to ours.

      (4.2) As translation proceeds, free scFv-GFP may become limiting due to the accumulation of mature SunTag-containing proteins. This would be difficult to detect (since mature proteins stay in the cytoplasm) and could affect intensity measurements (newly synthesized SunTag proteins getting dimmer over time).

      This effect can occur with very long induction times. To mitigate this, we optimized the Doxycycline (Dox) incubation time for our harringtonine experiments to prevent excessive accumulation of mature protein. We also monitor the cytoplasmic background for granularity, which would indicate aggregation or accumulation.

      (4.3) The statements "for some traces, the mRNA signal was lost before the run-off completion" (line 195) and "we observed relatively consistent fractions of translated transcripts and trace duration distributions across reporters" (line 340) should be supported by a supplementary figure.

      The first statement is supported by Figure 2 - figure supplement 1, which shows representative run-off traces for all constructs, including incomplete ones.

      The second statement regarding consistency is supported by the quantitative data in Figure 1E and G, which summarize the fraction of translated transcripts and trace durations across conditions.

      (4.4) Measurements of single mature protein intensity $i_{MP}$:

      (4.4.1) Since puromycin is used to disassemble elongating ribosomes, calibration may be biased by incomplete translation products (likely a substantial fraction, since the Dox induction is only 20 min and RNAs need several minutes to be transcribed, exported, and then fully translated).

      As mentioned in the “Live-cell imaging” paragraph, the imaging takes place 40 min after the end of Dox incubation. This provides ample time for mRNA export and full translation of the synthesized proteins. Consequently, the fraction of incomplete products generated by the final puromycin addition is negligible compared to the pool of fully synthesized mature proteins accumulated during the preceding hour.

      (4.4.2) Line 519: "The intensity of each spot is averaged over the 100 frames". Do I understand correctly that you are looking at immobile proteins? What immobilizes these proteins? Are these small aggregates? It would be surprising that these aggregates have really only 1, 2, or 3 proteins, as suggested by Figure 1-S2A.

      We are visualizing mature proteins that are specifically tethered to the actin cytoskeleton. This is achieved using a reporter where the RH1 domain is fused directly to the C-terminus of the Renilla protein (SunTag-Renilla-RH1). The RH1 domain recruits the endogenous Myosin Va motor, which anchors the protein to actin filaments, rendering it immobile. Since each Myosin Va motor interacts with one RH1 domain (and thus one mature protein), the resulting spots represent individual immobilized proteins rather than aggregates. We have now revised the text and Methods section to make this calibration strategy and the construct design clearer (L130-140).

      (4.4.3) Estimating the average intensity $i_{MP}$ of single proteins rests entirely on seeing discrete modes in the histogram of Figure 1-S2B, which is not very convincing. A complementary experiment measuring, *on the same microscope*, the intensity of an object with a known number of GFP molecules (e.g., MS2-GFP labeled RNAs, or individual GEMs https://doi.org/10.1016/j.cell.2018.05.042, which only require a single transfection) would reassure the reader that the estimate is not off by an order of magnitude.

      While a complementary calibration experiment would be valuable, we believe our current estimate is robust because it is independently validated by our model. When we inferred i<sub>MP</sub> as a free parameter in the HMM (Figure 5 - figure supplement 2B), the resulting value (10-15 a.u.) was remarkably consistent with our experimental calibration (14 ± 2 a.u.). We have clarified this independent validation in the text to strengthen the confidence in our quantification (L264-272).
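      To make the role of i<sub>MP</sub> concrete, a minimal sketch of how a single-protein intensity can enter a spot-intensity model. It assumes a continuum approximation in which SunTag epitopes are spread uniformly over the first L<sub>S</sub> codons (the real tag has discrete epitopes), and the SunTag length and ribosome positions are hypothetical; only the 14 a.u. value comes from the calibration above. Any relative error in i<sub>MP</sub> (here 2/14, about 14%) propagates directly to the inferred number of mature-protein equivalents.

```python
def nascent_intensity(x, suntag_len, i_mp):
    """Intensity of one ribosome whose leading codon is at x (continuum approximation).

    Assumes SunTag epitopes are spread uniformly over the first suntag_len codons and
    a fully synthesized SunTag carries the single-mature-protein intensity i_mp."""
    return i_mp * min(max(x, 0.0), suntag_len) / suntag_len


def spot_intensity(positions, suntag_len, i_mp):
    """Total spot intensity: sum over ribosome contributions."""
    return sum(nascent_intensity(x, suntag_len, i_mp) for x in positions)


def mature_protein_equivalents(intensity, i_mp):
    """Express a measured spot intensity in units of single mature proteins."""
    return intensity / i_mp


# Hypothetical SunTag length and positions; i_mp set to the calibrated 14 a.u.
I = spot_intensity([300, 600, 900], suntag_len=600, i_mp=14.0)
print(I, mature_protein_equivalents(I, 14.0))
```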

      (4.4.4) Further on the histogram in Figure 1-S2B:

      - The gap between the first two modes is unexpectedly sharp. Can you double-check? It means that we have a completely empty bin between two of the most populated bins.

      We have double-checked the data; the plot is correct, though the sharp gap is likely due to the small sample size (n=29).

      - I am surprised not to see 3 modes or more, given that panel A shows three levels of intensity (the three colors of the arrows).

      As noted below, brighter foci exist but fall outside the displayed range of the histogram.

      - It is unclear what the statistical test is and what it is supposed to demonstrate.

      The Student's t-test compares the means of the two identified populations to confirm they are statistically distinct intensity groups.

      - I count n = 29, not 31. (The sample is small enough that the bars of the histogram show clear discrete heights, proportional to 1, 2, 3, 4, and 5 --adding up all the counts, I get 29). Is there a mistake somewhere? Or are some points falling outside of the displayed x-range?

      You are correct. Two brighter data points fell outside the displayed range. The total number of foci in the histogram is 29. We have corrected the figure caption and the text accordingly.

      (5) Miscellaneous Points: 

      (5.1) Panel B in Figure 2-s1 appears to be missing.

      The figure contains only one panel.

      (5.2) In Equation (7), $l$ is not defined (presumably ribosome footprint length?). Instead, $J$ is defined right after eq (7), as if it were used in this equation.

      Thank you for pointing this out, we have corrected it.

      (5.3) Line 703, did you mean to write something else than "Equation 26" (since equation 26 is defined after)?

      Yes, this was a typo. We have corrected the cross-reference.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors aim to investigate the potential improvements of ANNs when used to explain brain data using top-down feedback connections found in the neocortex. To do so, they use a retinotopic and tonotopic organization to model each subregion of the ventral visual (V1, V2, V4, and IT) and ventral auditory (A1, Belt, A4) regions using Convolutional Gated Recurrent Units. The top-down feedback connections are inspired by the apical tree of pyramidal neurons, modeled either with a multiplicative effect (change of gain of the activation function) or a composite effect (change of gain and threshold of the activation function).
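      The difference between the two feedback modes described here can be illustrated with a hypothetical sigmoid unit; the exact functional form used in the paper is not given in this summary, so the sigmoid, the exponential gain, and the threshold shift below are all assumptions. The sketch shows one qualitative consequence: a pure gain change cannot move the output of a unit driven exactly at its threshold, whereas a composite gain-plus-threshold change can.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multiplicative(drive, feedback, gain0=1.0, theta0=0.0):
    """Feedback scales only the gain of the activation function."""
    gain = gain0 * math.exp(feedback)  # exp keeps the gain positive
    return sigmoid(gain * (drive - theta0))


def composite(drive, feedback, gain0=1.0, theta0=0.0):
    """Feedback scales the gain and also lowers the threshold."""
    gain = gain0 * math.exp(feedback)
    theta = theta0 - feedback
    return sigmoid(gain * (drive - theta))


# At a drive equal to the baseline threshold, gain changes alone leave the
# output at 0.5, whereas the threshold shift raises it.
print(multiplicative(0.0, 1.0), composite(0.0, 1.0))
```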

      To assess the functional impact of the top-down connections, the authors compare three architectures: a brain-like architecture derived directly from brain data analysis, a reversed architecture where all feedforward connections become feedback connections and vice versa, and a random connectivity architecture. More specifically, in the brain-like model the visual regions provide feedforward input to all auditory areas, whereas auditory areas provide feedback to visual regions.
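      The brain-like and reversed architectures can be expressed as labeled connection graphs. The sketch below is hypothetical and not the authors' code base: it assumes feedforward connections link adjacent areas within each stream, that each higher area sends feedback to all lower areas of its stream, and that every visual area drives, and receives feedback from, every auditory area. The reversed architecture then simply swaps the label of every connection.

```python
VISUAL = ["V1", "V2", "V4", "IT"]  # ordered from lower to higher areas
AUDITORY = ["A1", "Belt", "A4"]


def brain_like_edges():
    """Map (source, target) -> connection type for the brain-like architecture."""
    edges = {}
    for stream in (VISUAL, AUDITORY):
        for i, lower in enumerate(stream):
            for j, higher in enumerate(stream):
                if j == i + 1:
                    edges[(lower, higher)] = "feedforward"  # adjacent ascending drive
                if j > i:
                    edges[(higher, lower)] = "feedback"     # top-down to all lower areas
    for v in VISUAL:
        for a in AUDITORY:
            edges[(v, a)] = "feedforward"  # visual areas drive auditory areas
            edges[(a, v)] = "feedback"     # auditory areas modulate visual areas
    return edges


def reversed_edges(edges):
    """Swap every feedforward connection for feedback and vice versa."""
    flip = {"feedforward": "feedback", "feedback": "feedforward"}
    return {pair: flip[kind] for pair, kind in edges.items()}


brain_like = brain_like_edges()
print(brain_like[("V1", "A1")], reversed_edges(brain_like)[("V1", "A1")])
```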

      First, the authors found that top-down feedback influences audiovisual processing and that the brain-like model exhibits a visual bias in multimodal visual and auditory tasks. Second, they discovered that in the brain-like model, the composite integration of top-down feedback, similar to that found in the neocortex, leads to an inductive bias toward visual stimuli, which is not observed in the feedforward-only model. Furthermore, the authors found that the brain-like model learns to utilize relevant stimuli more quickly while ignoring distractors. Finally, by analyzing the activations of all hidden layers (brain regions), they found that the feedforward and feedback connectivity of a region could determine its functional specializations during the given tasks.

      Strengths:

      The study introduces a novel methodology for designing connectivity between regions in deep learning models. The authors also employ several tasks based on audiovisual stimuli to support their conclusions. Additionally, the model uses backpropagation of error as a learning algorithm, making it applicable across a range of tasks, from various supervised learning scenarios to reinforcement learning agents. Moreover, the presented framework offers a valuable tool for studying top-down feedback connections in cortical models. Thus, it is a very nice study that can also inspire other fields (e.g., machine learning) to explore new architectures.

      We thank the reviewer for their accurate summary of our work and their kind assessment of its strengths.

      Weaknesses:

      Although the study explores some novel ideas on how to study the feedback connections of the neocortex, the data presented here are not sufficient to propose a concrete theory of the role of top-down feedback inputs in such models of the brain.

      (1) The gap in the literature that the paper tries to fill is the limited ability of DL algorithms to predict behavior: "However, there are still significant gaps in most deep neural networks' ability to predict behavior, particularly when presented with ambiguous, challenging stimuli." and "[...] to accurately model the brain."

      It is unclear to me how the presented work addresses this gap, as the only facts provided are derived from a simple categorization task that could also be solved by the feedforward-only model (see Figures 4 and 5). In my opinion, this statement is somewhat far-fetched, and there is insufficient data throughout the manuscript to support this claim.

      We can see now that the way the introduction was initially written led to some confusion about our goal in this study. Our goal here was not to demonstrate that top-down feedback can enable superior matches to human behaviour. Rather, our goal was to determine if top-down feedback had any real implications for processing ambiguous stimuli. The sentence that the reviewer has highlighted was intended as an explanation for why top-down feedback, and its impact on ambiguous stimuli, might be something one would want to examine for deep neural networks. But, here, we simply wanted to (1) provide an overview of the code base we have created, and (2) demonstrate that top-down feedback does impact the processing of ambiguous stimuli.

      We agree with the reviewer that if our goal was to improve our ability to predict behaviour, then there was a big gap in the evidence we provided here. But, this was not our goal, and we believe that the data we provide here does convincingly show that top-down feedback has an impact on processing of ambiguous stimuli. We have updated the text in the introduction to make our goals more clear for the reader and avoid this misunderstanding of what we were trying to accomplish here. Specifically, the end of the introduction is changed to:

      “To study the effect of top-down feedback on such tasks, we built a freely available code base for creating deep neural networks with an algorithmic approximation of top-down feedback. Specifically, top-down feedback was designed to modulate ongoing activity in recurrent, convolutional neural networks. We explored different architectural configurations of connectivity, including a configuration based on the human brain, where all visual areas send feedforward inputs to, and receive top-down feedback from, the auditory areas. The human brain-based model performed well on all audiovisual tasks, but displayed a unique and persistent visual bias compared to models with only driving connectivity and models with different hierarchies. This qualitatively matches the reported visual bias of humans engaged in audio-visual tasks. Our results confirm that distinct configurations of feedforward/feedback connectivity have an important functional impact on a model's behavior. Therefore, top-down feedback captures behaviors and perceptual preferences that do not manifest reliably in feedforward-only networks. Further experiments are needed to clarify whether top-down feedback helps an ANN fit better to neural data, but the results show that top-down feedback affects the processing of stimuli and is thus a relevant feature that should be considered for deep ANN models in computational neuroscience more broadly.”

      (2) It is not clear what the advantages are between the brain-like model and a feedforward-only model in terms of performance in solving the task. Given Figures 4 and 5, it is evident that the feedforward-only model reaches almost the same performance as the brain-like model (when the latter uses the modulatory feedback with the composite function) on almost all tasks tested. The speed of learning is nearly the same: for some tested tasks the brain-like model learns faster, while for others it learns slower. Thus, it is hard to attribute a functional implication to the feedback connections given the presented figures and therefore the strong claims in the Discussion should be rephrased or toned down.

      Again, we believe that there has been a misunderstanding regarding the goals of this study, as we are not trying to claim here that there are performance advantages conferred by top-down feedback in this case. Indeed, we share the reviewer’s assessment that the feedforward-only model seems to be capable of solving this task well. To reiterate: our goal here was to demonstrate that top-down feedback alters the computations in the network and, thus, has distinct effects on behaviour that need to be considered by researchers who use deep networks to model the brain. But we make no claims of “superiority” of the brain-like model.

      In line with this, we’re not completely sure which claims in the discussion the reviewer is referring to. We note that we were quite careful in our claims. For example, in the first section of the discussion we say:

      “Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature.”

      And later on:

      “In summary, our study shows that modulatory top-down feedback and the architectural diversity enabled by it can have important functional implications for computational models of the brain. We believe that future work examining brain function with deep neural networks should therefore consider incorporating top-down modulatory feedback into model architectures when appropriate.”

      If we have missed a claim in the discussion that implies superiority of the brain-like model in terms of task performance we would be happy to change it.

      (3) The Methods section lacks sufficient detail. There is no explanation provided for the choice of hyperparameters nor for the structure of the networks (number of trainable parameters, number of nodes per layer, etc). Clarifying the rationale behind these decisions would enhance understanding. Moreover, since the authors draw conclusions based on the performance of the networks on specific tasks, it is unclear whether the comparisons are fair, particularly concerning the number of trainable parameters. Furthermore, it is not clear if the visual bias observed in the brain-like model is an emerging property of the network or has been created because of the asymmetries in the visual vs. auditory pathway (size of the layer, number of layers, etc).

      We thank the reviewer for raising this issue, and want to provide some clarifications. First, the number of trainable parameters is roughly equal across models, since we only switched the direction of connectivity (top-down versus bottom-up), not the number of connections. We confirmed that the biggest difference in size is between models with composite and multiplicative feedback; models with composite feedback have ~1K more parameters, and all models are in the 280K-parameter range. We now state this in the methods.

      Second, because superior performance was not the goal of this study, as stated above, we conducted limited hyperparameter tuning. Given the reviewer’s comment, we wondered whether this may have impacted our results. Therefore, we explored different hyperparameters for the model during the multimodal auditory tasks, which show the clearest example of the visual dominance in the brain-like model (Figure 3).

      We explored different hidden state sizes, learning rates and processing times, and examined whether the core results were different. We found that extremely high learning rates (0.1) destabilize all models and that some models perform poorly under different processing times. But overall, the core results, i.e., the different behaviors of models with different connectivities and the visual dominance observed in the brain-like model, are evident across all hyperparameters where the models learn. We now provide these results in supplementary figures (Fig. S2, showing larger models trained with different learning rates, and Fig. S3, which shows the effect of processing time on AS task performance).

      Reviewer #2 (Public review):

      Summary:

      This work addresses the question of whether artificial deep neural network models of the brain could be improved by incorporating top-down feedback, inspired by the architecture of the neocortex.

      In line with known biological features of cortical top-down feedback, the authors model such feedback connections with both a typical driving effect and a purely modulatory effect on the activation of units in the network.

      To assess the functional impact of these top-down connections, they compare different architectures of feedforward and feedback connections in a model that mimics the ventral visual and auditory pathways in the cortex on an audiovisual integration task.

      Notably, one architecture is inspired by human anatomical data, where higher visual and auditory layers possess modulatory top-down connections to all lower-level layers of the same modality, and visual areas provide feedforward input to auditory layers, whereas auditory areas provide modulatory feedback to visual areas.

      First, the authors find that this brain-like architecture imparts a slight visual bias to the models, similar to what is seen in human data, whereas a reversed architecture, in which auditory areas provide a feedforward drive to the visual areas, shows the opposite bias.

      Second, they find that, in their model, modulatory feedback should be complemented by a driving component to enable effective audiovisual integration, similar to what is observed in neural data.

      Last, they find that the brain-like architecture with modulatory feedback learns a bit faster in some audiovisual switching tasks compared to a feedforward-only model.

      Overall, the study shows some possible functional implications when adding feedback connections in a deep artificial neural network that mimics some functional aspects of visual perception in humans.

      Strengths:

      The study contains innovative ideas, such as incorporating an anatomically inspired architecture into a deep ANN, and comparing its impact on a relevant task to alternative architectures.

      Moreover, the simplicity of the model allows the authors to draw conclusions about how features of the architecture and functional aspects of the top-down feedback affect the performance of the network.

      This could be a helpful resource for future studies of the impact of top-down connections in deep artificial neural network models of the neocortex.

      We thank the reviewer for their summary and their recognition of the innovative components and helpful resources therein.

      Weaknesses:

      Overall, the study appears to be a bit premature, as several parts need to be worked out more to support the claims of the paper and to increase its impact.

      First, the functional implication of modulatory feedback is not really clear. The "only feedforward" model (is a drive-only model meant?) attains the same performance as the composite model (with modulatory feedback) on virtually all tasks tested, it just takes a bit longer to learn for some tasks, but then is also faster at others. It even reproduces the visual bias on the audiovisual switching task. Therefore, the claims "Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature." and "More broadly, our work supports the conclusion that both the cellular neurophysiology and structure of feed-back inputs have critical functional implications that need to be considered by computational models of brain function" are not sufficiently supported by the results of the study. Moreover, the latter points would require showing that this model describes neural data better, e.g., by comparing representations in the model with and without top-down feedback to recorded neural activity.

      To emphasize again our specific claims, we believe that our data shows that top-down feedback has functional implications for deep neural network behaviour, not increased performance or neural alignment. Indeed, our results demonstrate that top-down feedback alters the behaviour of the networks, as shown by the differences in responses to various combinations of ambiguous stimuli. We agree with the reviewer that if our goal was to claim either superior performance on these tasks, or better fit to neural data, we would need to actually provide data supporting that claim.

      Given the comments from the reviewer, we have tried to provide more clarity in the introduction and discussion regarding our claims. In particular, we now highlight that we are not trying to demonstrate that the models with top-down feedback exhibit superior performance or better fit to neural data.

      As one final note, yes, the reviewer understood correctly that the “only feedforward” model is a model with only driving inputs. We have renamed the feedforward-only models to drive-only models and added additional emphasis in the text to ensure that the distinction is clear for all readers.

      Second, the analyses are not supported by supplementary material, hence it is difficult to evaluate parts of the claims. For example, it would be helpful to investigate the impact of the process time after which the output is taken for evaluation of the model. This is especially important because in recurrent and feedback models the convergence should be checked, and if the network does not converge, then it should be discussed why at which point in time the network is evaluated.

      This is an excellent point, and we thank the reviewer for raising it. We allowed the network to process the stimuli for seven time-steps, which was enough for information from any one region to be transmitted to any other. We found in some initial investigations that if we shortened the processing time some seeds would fail to solve the task. But, based on the reviewer’s comment, we have now also run additional tests with longer processing times for the auditory tasks where we see the clearest visual bias (Figure 3). We find that different process times do not change the behavioral biases observed in our models, but may introduce difficulties ignoring visual stimuli for some models. Thus, while process time is an important hyperparameter for optimal performance of the model, the central claim of the paper remains. We include this new data in a supplementary figure S3.

      Third, the descriptions of the models in the methods are hard to understand, i.e., parameters are not described and equations are explained by referring to multiple other studies. Since the implications of the results heavily rely on the model, a more detailed description of the model seems necessary.

      We agree with the reviewer that the methods could have been more thorough. Therefore, we have greatly expanded the methods section. We hope the model details are now more clear.

      Lastly, the discussion and testable predictions are not very well worked out and need more details. For example, the point "This represents another testable prediction flowing from our study, which could be studied in humans by examining the optical flow (Pines et al., 2023) between auditory and visual regions during an audiovisual task" needs to be made more precise to be useful as a prediction. What does the model predict in terms of "optic flow", how can modulatory effects be distinguished from simple driving effects, etc.?

      We see that the original wording of this prediction was ambiguous; thank you for pointing this out. In the highlighted study (Pines et al., 2023), the authors use an analysis technique for measuring information flow between brain regions, which is related to optical flow analysis in images but applied to fMRI scans. This terminology is confusing in the context of the current study, however. Therefore, we have changed this sentence to make clear that we are referring to information flow here.

      Reviewer #3 (Public review):

      Summary:

      This study investigates the computational role of top-down feedback in artificial neural networks (ANNs), a feature that is prevalent in the brain but largely absent in standard ANN architectures. The authors construct hierarchical recurrent ANN models that incorporate key properties of top-down feedback in the neocortex. Using these models in an audiovisual integration task, they find that hierarchical structures introduce a mild visual bias, akin to that observed in human perception, without necessarily compromising task performance.

      Strengths:

      The study investigates a relevant and current topic of considering top-down feedback in deep neural networks. In designing their brain-like model, they use neurophysiological data, such as externopyramidisation and hierarchical connectivity. Their brain-like model exhibits a visual bias that qualitatively matches human perception.

      We thank the reviewer for their summary and evaluation of our paper’s strengths.

      Weaknesses:

      While the model is brain-inspired, it has limited bioplausibility. The model assumes a simplified and fixed hierarchy. In the brain with additional neuromodulation, the hierarchy could be more flexible and more task-dependent.

      We agree, there are still many facets of top-down feedback that we have not captured here, and the modulation of hierarchy is an interesting example. We have added some consideration of this point to the limitations section of the discussion.

      While the brain-like model showed an advantage in ignoring distracting auditory inputs, it struggled when visual information had to be ignored. This suggests that its rigid bias toward visual processing could make it less adaptive in tasks requiring flexible multimodal integration. It hence does not necessarily constitute an improvement over existing ANNs. It is unclear whether this aspect of the model also matches human data. In general, there is no direct comparison to human data. The study does not evaluate whether the top-down feedback architecture scales well to more complex problems or larger datasets. The model is not well enough specified in the methods and some definitions are missing.

      We agree with the reviewer that we have not demonstrated anything like superior performance (since the brain-like network is quite rigid, as noted) nor have we shown better match to human data with the brain-like network. This was not our intended claim. Rather, we demonstrated here simply that top-down feedback impacts behavior of the networks in response to ambiguous stimuli. We have now added statements to the introduction and discussion to make our specific claims (which are supported by our data, we believe) clear.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I believe that the work is very nice but not so mature at this stage. Below, you can find some comments that eventually could improve your manuscript.

      (1) Intro, last sentence: "Therefore, top-down feedback is a relevant feature that should be considered for deep ANN models in computational neuroscience more broadly." I don't understand what the authors refer to with this sentence. There are numerous models (deep ANNs) that have been used to model the neural activity and are much simpler than the one proposed here which contains very complex models and connectivity. Although I do agree that the top-down connections are very important there is no data to support their importance for modeling the brain.

      Respectfully, we disagree with the reviewer that we don’t provide data to demonstrate the importance of top-down feedback for modelling. Indeed, we provided a great deal of data to show that top-down feedback in the networks has real functional implications for behaviour, e.g., it can induce a human-like visual bias. Thus, top-down feedback is a factor that one should care about when modelling the brain. But, we agree with the reviewer that more demonstration of the utility of using top-down feedback for achieving better fits to neural data would be an important next step. 

      (2) I suggest adding some extra supplementary simulations where, for example, the amount of data for the visual and auditory pathways is equal (i.e., the same number of examples), the number of layers is identical (3 per pathway), and so is the number of parameters. Doing this would help strengthen the claims presented in the paper.

      In fact, all of the hyperparameters the reviewer mentions here were identical for the different networks, so the experiments the reviewer is requesting here were already part of the paper. We now clarify this in the text.

      (3) Results: I suggest adding Tables with quantifications of the presented results. For example, best performance, epochs to converge, etc. As it is now, it is very hard to follow the evidence shown in Figures.

      This is a good suggestion, we have now added this table to the start of the supplemental figures.

      (4) Figure 2e, 3e: Although VS3, and AS3 have been used only for testing, the plot shows alignments with respect to training epochs. The authors should clarify in the Methods if they tested the network with all intermediate weights during VS1/VS2 or AS1/AS2 training.

      Testing scenarios in this context meant that the model was never shown the scenario/task during training, but the models were indeed evaluated on the VS3 and AS3 after each training epoch. We have added clarifications to the figure legends.

      (5) Methods: It would be beneficial to discuss how specific hyperparameters were selected based on prior research, empirical testing, or theoretical considerations. Also, it is not clear how the alignment (visual or audio) is calculated. Do the authors use the examples that have been classified correctly for both stimuli or do they exclude those from the analysis (maybe I have missed it).

      As noted above, because superior performance was not the goal of this study, we conducted limited hyperparameter tuning. But we have extended the results with additional hyperparameter tuning in a supplementary figure, and describe the hyperparameter choices more thoroughly in the methods. As well, all data includes all model responses, regardless of whether they were correct or not. We now clarify this in the methods.

      (6) Code: The code repository lacks straightforward examples demonstrating how to utilize the modeling approach. Given that it is referred to as a "framework", one would expect it to facilitate easy integration into various models and tasks. Including detailed instructions or clear examples would significantly improve usability and help users effectively apply the proposed methodology.

      We agree with the reviewer that this would be beneficial. We have revised the README of the codebase to explain the model and its usage more clearly and included an interactive Jupyter notebook with an example of training on MNIST.

      Some minor comments are given below. Generally speaking, the Figures need to be more carefully checked for consistent labels, colors, etc.

      (1) Page 4, 1st paragraph - grammar correction: "a larger infragranular layer" or "larger infragranular layers"

      Thank you for catching this, we have fixed the text.

      (2) Page 4, 2nd para - rephrase: "In three additional control ANNs" → "In the third additional control ANN"

      In fact, we did mean three additional control ANNs, each one representing a different randomized connectivity profile. We now clarify this in the text and provide the connectivity of the two other random graphs in the supplemental figures.

      (3) Page 4, VAE acronym needs to be defined before its first use

      The variational autoencoder is introduced by its full name in the text now.

      (4) Page 4: Fig. 2c should be Fig. 2b, Fig. 2d should be Fig. 2c, Fig. 2b should be Fig. 2d, "VS4; Fig. 2b, bottom" should be "VS4; Fig. 2f", and Fig. 2f should be Fig. 2g. Please double-check the figure references in the text; this part is very confusing for the reader.

      We have now fixed this, thank you for catching it.

      (5) Page 5, 1st para: "Altogether, our results demonstrated both" → "Altogether, our results demonstrated that both"

      This has been updated.

      (6) Figure 2: In the e and g panels the x label is missing.

      This was actually because the x-axes were the same across the panels, but we see how this was unclear, so we have updated the figure.

      (7) Figure 3: There is no panel g (the title is missing); In panels b, c, e, and g the y label is missing, and in panels e and g the x label is missing. Also, the Feedforward model is shown in panel g but it is introduced later in the text. Please remove it from Figure 3. Also in legend: "AV Reverse graph" → "Reverse graph". Also, "Accuracy" and "Alignment" should be presented as percentages (as in Figure 2).

      This has been corrected.

      (8) Figure 4; x labels are missing.

      As with point (6), this was actually because the x-axes were the same across the panels, but we see how this was unclear, so we have updated the figure.

      (9) Page 7; I can’t find the cited Figure S1.

      Apologies, we have added the supplemental figure (now Fig. S4). It shows the results of models with multiplicative feedback on the task in Fig. 5 (as opposed to models with composite feedback shown in the main figure).

      Reviewer #2 (Recommendations for the authors):

      (1) Discussion Section 3.1 is only a literature review, and does not really add any value.

      Respectfully, we think it is important to relate our work to other computational work on the role of top-down feedback, and to make clear what our specific contribution is. But, we have updated the text to try to place additional emphasis on our study’s contribution, so that this section is more than just a literature review.

      “Our study adds to this previous work by incorporating modulatory top-down feedback into deep, convolutional, recurrent networks that can be matched to real brain anatomy. Importantly, using this framework we could demonstrate that the specific architecture of top-down feedback in a neural network has important computational implications, endowing networks with different inductive biases.”

      (2) Including ipython notebooks and some examples would be great to make it easier to use the code.

      We now provide a demo of how to use the code base in a jupyter notebook.

      (3) The description of the model is hard to comprehend. Please name and describe all parameters. Also, a figure would be great to understand the different model equations.

      We have added definitions of all model terms and parameters.

      (4) The terminology is not really clear to me. For example, "The results further suggest that different configurations of top-down feedback make otherwise identically connected models functionally distinct from each other and from traditional feedforward only recurrent models." The terms "feedforward only" and "recurrent" seem to contradict each other. Would "driving" and "modulatory" be better terms here? I also saw in the code that you differentiate between three types of inputs: modulatory, threshold offset and basal (like feedforward). How about classifying connections based only on these three types? I was also confused about the feedforward-only model, because I was unsure whether it still has feedback connections but with "basal" quality, or whether feedback connections between modalities and from higher- to lower-level layers were omitted altogether.

      We take the reviewer’s point here. To clarify this, we have updated the text to refer to “driving only” rather than “feedforward only”, to make it obvious that what we change in these models is simply whether the connection has any modulatory impact on the activity. 
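      To illustrate the general distinction between a driving and a modulatory input, the toy sketch below uses a hypothetical single unit. It is not the model from the manuscript: the function names, the weights `w_b` and `w_f`, the `1 + tanh` gain and the mixing factor `alpha` are all assumptions chosen for clarity. The key property shown is that driving feedback can activate a unit on its own, whereas purely modulatory feedback can only rescale activity that the basal input already produces.

      ```python
      import math


      def relu(x: float) -> float:
          """Rectified linear activation."""
          return max(0.0, x)


      def drive_only_unit(basal: float, feedback: float, w_b: float = 1.0, w_f: float = 1.0) -> float:
          # Feedback is summed with the basal input before the nonlinearity,
          # so feedback alone can activate the unit (a "drive-only" connection).
          return relu(w_b * basal + w_f * feedback)


      def modulatory_unit(basal: float, feedback: float, w_b: float = 1.0, w_f: float = 1.0) -> float:
          # Feedback only rescales the gain of the basal response; the gain
          # 1 + tanh(.) lies in (0, 2), and zero basal input gives zero output.
          gain = 1.0 + math.tanh(w_f * feedback)
          return relu(w_b * basal) * gain


      def composite_unit(basal: float, feedback: float, alpha: float = 0.5) -> float:
          # Hypothetical convex mix of the driving and modulatory pathways.
          return alpha * drive_only_unit(basal, feedback) + (1.0 - alpha) * modulatory_unit(basal, feedback)


      if __name__ == "__main__":
          for basal, fb in [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
              print(basal, fb,
                    drive_only_unit(basal, fb),
                    modulatory_unit(basal, fb),
                    composite_unit(basal, fb))
      ```

      With `basal = 0.0` and `feedback = 1.0`, the drive-only unit responds (1.0) while the modulatory unit stays silent (0.0), which is the qualitative difference the renaming to "drive-only" is meant to highlight.
      
      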

      (5) "incorporating it into ANNs can affect their behavior and help determine the solutions that the network can discover." -> Do you mean constrain? Overall, I did not really get this point.

      Yes, we mean that it constrains the solutions that the network is likely to discover.

      (6) "ignore the auditory inputs when they visual inputs were unambiguous" -> "the", not "they"

      This has been fixed. Thank you for catching it.

      (7) xlabel in Figure 4 is missing.

      This has been fixed, thank you for catching it.

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) How alignment is computed is not defined. In addition to a proper definition in the methods section, it would be nice to briefly define it when it first appears in the results section.

      We’ve added an explicit definition of how alignment is calculated in the methods and emphasized the calculation when it is first introduced in the results.

      (2) A connectivity matrix for the feedforward-only model is missing and could be added.

      We have added this to Figure 1.

      (3) The connectivity matrix for each random model should also be shown.

      We’ve shown each of the random model configurations in the new supplemental figure S1.

      (4) Initial parameters are not defined, such as W, b etc. A table with all model parameters would be great.

      We have added a table to the methods listing all of the parameters.

      (5) Would be nice to show the t-sne plots (not just the NH score) for each model and each task in the appendix.

      We can provide these figures on request. They massively increase the file size of the paper PDF, as there are 49 of them for each task and model combination, 980 in total. An example t-SNE plot is provided in Fig. 6.

      Minor:

      (1) Page 4:

      "we refer to this as Visual-dominant Stimulus case 1, or VS1; Fig. 1a, top)." This should be Fig. 2a.

      (2) "In stimulus condition VS1, all of the models were able to learn to use the auditory clues to disambiguate the images (Fig. 2c)."

      This should be Fig. 2b.

      (3) "In comparison, in VS2, we found that the brainlike model learned to ignore distracting audio inputs quickly and consistently compared to the random models, and a bit more rapidly than the auditory information (Fig 2d)."

      This should be Fig. 2c.

      (4) "VS3; Fig. 2b, top"

      This should be Fig. 2d

      (5) "while all other models had to learn to do so further along in training (Fig. 2e)."

      It is not stated explicitly, but this suggests that the image-aligned target was considered correct, and that weight updates were happening.

      (6) "VS4; Fig. 2b, bottom"

      This should be Fig. 2f

      (7) "adept at learning (Fig. 2f)."

      This should be Fig. 2g

      (8) Figure 3:b,c,e y-labels are missing

      3f: both x and y labels are missing

      (9) Figure labeling in the text is not consistent (Fig. 1A versus Fig. 2a)

      (10) Doubled "the" in "This shows that the inductive bias towards vision in the brainlike model depended on the presence of the multiplicative component of the the feedback"

      (11) Page 9 Figure 6: The caption says b shows the latent spaces for the VS2 task, whereas the main text refers to 6b as showing the latent space for the AS2 task. Please correct which task it is.

      (12) Methods 4.1 page 13

      "which is derived from the feedback input (h_{l−1})"

      This should be h_{l+1}

      (13) It is not defined which aspects of the model r_l, u_l, u, and c refer to.

      Even though this is based on a previous model, the methods section should completely describe the model.

      Equations 1,2,3: the notation [x;y] is unclear and should be defined.

      Equation 5: u should probably be u_l.

      (14) Page 14 typo: externopyrmidisation.

      (15) It is confusing to use different names for the same thing: the all-feedforward model, the all feedforward network, the feedforward network, and the feedforward-only model are probably all the same? Consistent naming would help here.

      Thank you for the detailed comments! We’ve fixed the minor errors and renamed the feedforward models to drive-only models.

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript presents findings on the adaptation mechanisms of Saccharomyces cerevisiae under extreme stress conditions. The authors try to generalize this to adaptation to stress tolerance. A major finding is that S. cerevisiae evolves a quiescence-like state with high trehalose to adapt to freeze-thaw tolerance independent of their genetic background. The manuscript is comprehensive, and each of the conclusions is well supported by careful experiments.

      Strengths:

      This is excellent interdisciplinary work.

      I have commented on the response of the authors, in-line, below. This is to maintain the conversation thread with the authors.

      Comment 1:

      Earlier papers have shown that loss of ribosomal proteins, which slows growth, leads to better stress tolerance in S. cerevisiae. Given this, isn't it expected that any adaptation that slows down growth would, overall, increase stress tolerance? Even for other systems, it has been shown that slowing down growth (by spore formation in yeast or bacteria/or dauer formation in C. elegans) is an effective strategy to combat stress and hence is a likely route to adaptation. The authors stress this as one of the primary findings. I would like the authors to explain their position, detailing how their findings are unexpected in the context of the literature.

      Response:

      We agree that the link between slower growth and higher stress tolerance has been well studied. What is distinctive here is that repeated, near-lethal freeze-thaw selected not only for a tolerant/quiescent-like state but also for a shorter lag on re-entry. In this regime of freeze-thaw-regrowth, cells that are tolerant but slow to restart would be outcompeted by naive fast growers. Our quiescence-based selection simulations reproduce exactly this constraint. We have added this explanation to the Results to make clear that the novelty is the co-evolution of a tolerant, trehalose-rich state together with rapid regrowth under an alternating regime.

      Comment to Response: I get the point. I believe that the outcome is highly dependent on how selection pressure is administered. So, generalizing this over all stresses (as done in the abstract) may not be accurate.

      Comment 2:

      Convergent evolution of traits: I find the results unsurprising. When selecting for a trait, if there is a major mode to adapt to that stress, most of the strains would adapt to that mode, independent of the route. In my view, finding out this major route was the objective of many of the previous reports on adaptive evolution. The surprising part in the previous papers (on adaptive evolution of bacteria or yeast) was the resampling of genes that acquired mutations in multiple replicates of an evolution experiment, providing a handle to understand the major genetic route or the molecular mechanism that guides the adaptation (for example, in this case it would be what guides the over-accumulation of trehalose). I fail to understand why the authors find the results surprising, and I would be happy to understand that from the authors. I may have missed something important.

      Response:

      Our surprise was precisely that we did not see the classical pattern of "phenotypic convergence + repeated mutations in the same locus/module." All independently evolved lines converged on a trehalose-rich, mechanically reinforced, quiescence-like phenotype, but population sequencing across lines did not reveal a single repeatedly hit gene or small shared pathway, even when we increased selection stringency (1-3 freeze-thaw cycles per round). We have now stated in the manuscript that this decoupling (strong phenotypic convergence, non-overlapping genetic routes) is the central inference: selection is acting on a physiologically defined state that multiple genotypes can reach.

      Comment to Response: You indeed saw a case of phenotypic convergence. Trehalose-rich, mechanically reinforced and quiescence-like are phenotypes that have converged. This is what prevented lysis. The same locus need not be mutated over and over again: if the trehalose pathway is controlled by many processes (it is, and many are still unknown, as I point out in the next comment), many different mutations at different loci can result in the same regulation! I do not see the decoupling between phenotypic convergence and the underlying genetic mutations as surprising or novel; molecular and cellular biology is replete with examples where deletion (mutation) of hundreds of different genes can have the same phenotypic outcome (yeast deletion library screening, indirect effects, etc.). If this were a specific unsolved question in evolutionary biology, the matter would be different.

      A minor point: Here I would also like to point out that the three phenotypes you measure may be linked to each other, so their independent evolution may just reflect a cause-effect relationship. For example, trehalose accumulation may drive the other two. This has not been deconvoluted in this manuscript.

      Comment 3:

      Adaptive evolution would work on phenotype, as all of selective evolution is supposed to. So, given that one of the phenotypes well known in the literature to allow freeze-tolerance is trehalose accumulation, I think it is not surprising that this trait is selected. For me, this is not a case of "non-genetic" adaptation as the authors point out: it is likely because perturbation of many genes can individually result in the same outcome, up-regulation of trehalose accumulation. Thereby, although the adaptation is genetic, it is not homogeneous across the evolving lines; the end result is. Do the authors check that the trait is actually a non-genetic adaptation, i.e., if they regrow the cells for a few generations without the stress, do the cells fall back to being only partially fit to freeze-thaw cycles? Additionally, the inability to identify a network that is conserved in the sequencing does not mean that there is no regulatory pathway. A large number of cryptic pathways may exist to alter cellular metabolic states. This is a point in continuation of point #2, and I would like to understand what I have missed.

      Response:

      We agree, and we have removed the wording "non-genetic adaptation." The evolved populations retain high survival even after regrowth for ≥25 generations without freeze-thaw, so the adaptation is clearly genetically maintained. What our data show is that there is no single genetic route to the shared phenotype; different mutations can all drive cells into the same trehalose-rich, quiescence-like, mechanochemically reinforced state. We now describe this as "genetic diversification with phenotypic convergence."

      Comment to Response: While the last term does explain what is going on, isn't it an outcome that is routine in cell biology (as pointed out in my previous comment to your response)? I apologize for not understanding the punchline that is provided in the last few sentences of the abstract.

      Comment 4:

      To propose the convergent nature, it would be important to check for independently evolved lines and most probably more than 2 lines. It is not clear from their results section if they have multiple lines that have evolved independently.

      Response:

      We indeed evolved four independent lines and maintained two independent controls. We have added this information at the start of the Results so that the level of replication is immediately clear.

      Comment to Response: Previous large-scale studies have done hundreds of sequencing runs to oversample the pathway and figure out reproducible loci. With pooled sequencing (as mentioned below) and only four evolved lines, I am not sure that you would have the power in your study to conclude whether the same loci are sampled repeatedly or not! If there were 10 gene LOFs that control trehalose levels (which you can find from the published deletion screening experiments), then each of the four lines is likely to go through one of these routes; what is the likelihood that you would identify the same route in two pools? It is low, and therefore sequencing of four pools cannot tell you whether the mutational path is repeatedly sampled or not.

      Comment 5:

      For the genomic studies, it is not clear if the authors sequenced a pool or a single colony from the evolved strains. This is an important point, since an average sequence will miss out on many mutations and only focus on the mutations inherited from a common ancestral cell. It is also not clear from the section.

      Response:

      We sequenced population samples from the evolved lines. Our specific question was whether independently evolved lines would show the same high-frequency genetic solution, as is often seen in parallel evolution. Pool sequencing may under-sample rare/private variants, but it is appropriate for detecting such shared, high-frequency routes - and we do not find any. We have clarified this rationale in the Methods/Results.

      Comment to Response: Please provide the average sequencing depth of each sequencing run. It is essential to understand the power of this study in identifying mutations. What coverage (in multiples of genome size, i.e., X) was used?
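      For context on why depth matters for pooled sequencing, the sketch below computes mean coverage with the standard formula C = N × L / G (reads × read length / genome size) and, under a simple binomial sampling model, the probability that a variant at a given population frequency is supported by at least a minimum number of reads. The function names, the read-count threshold and the example numbers are illustrative assumptions; they are not values from the manuscript, and real variant callers also model sequencing error and mapping quality.

      ```python
      import math


      def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
          # Mean depth C = N * L / G; for paired-end data, count both mates in n_reads.
          return n_reads * read_length_bp / genome_size_bp


      def detection_probability(depth: int, allele_freq: float, min_alt_reads: int) -> float:
          # P(at least min_alt_reads of `depth` reads carry the variant), assuming reads
          # are drawn independently from a pool where the variant has frequency allele_freq.
          p_below = sum(
              math.comb(depth, k) * allele_freq**k * (1.0 - allele_freq) ** (depth - k)
              for k in range(min_alt_reads)
          )
          return 1.0 - p_below


      if __name__ == "__main__":
          # Hypothetical run: 1 million 150-bp reads on a ~12 Mb yeast genome.
          print(mean_coverage(1_000_000, 150, 12_000_000))  # 12.5
          # A variant at 10% frequency, 20x depth, requiring at least 3 supporting reads.
          print(round(detection_probability(20, 0.10, 3), 2))  # 0.32
      ```

      Under these assumptions, a variant carried by 10% of a pool sequenced at 20x would pass a 3-read threshold only about a third of the time, which is why reporting the actual depth per run helps readers judge how much of each population's variation the pooled data can capture.
      
      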

    2. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and the reviewers for the detailed and constructive comments. In revising the manuscript we have: (i) clarified what is new relative to prior stress tolerance work, (ii) made explicit that we observe phenotypic convergence without a shared genetic route, (iii) stated upfront that we evolved four independent lines plus two controls, and (iv) corrected figure legends, statistics, and the missing citations. Below we respond point-by-point.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents findings on the adaptation mechanisms of Saccharomyces cerevisiae under extreme stress conditions. The authors try to generalize this to adaptation to stress tolerance. A major finding is that S. cerevisiae evolves a quiescence-like state with high trehalose to acquire freeze-thaw tolerance, independent of its genetic background. The manuscript is comprehensive, and each of the conclusions is well supported by careful experiments.

      Strengths:

      This is excellent interdisciplinary work.

      Weaknesses:

      I have questions regarding the overall novelty of the proposal, which I would like the authors to explain.

      (1) Earlier papers have shown that loss of ribosomal proteins, which slows growth, leads to better stress tolerance in S. cerevisiae. Given this, isn’t it expected that any adaptation that slows down growth would, overall, increase stress tolerance? In other systems as well, it has been shown that slowing down growth (by spore formation in yeast or bacteria, or dauer formation in C. elegans) is an effective strategy to combat stress and hence a likely route to adaptation. The authors stress this as one of the primary findings. I would like the authors to explain their position, detailing how their findings are unexpected in the context of the literature.

      We agree that the link between slower growth and higher stress tolerance has been well studied. What is distinctive here is that repeated, near-lethal freeze–thaw selected not only for a tolerant/quiescent-like state but also for a shorter lag on re-entry. In this regime of freeze–thaw–regrowth, cells that are tolerant but slow to restart would be outcompeted by naive fast growers. Our quiescence-based selection simulations reproduce exactly this constraint. We have added this explanation to the Results to make clear that the novelty is the co-evolution of a tolerant, trehalose-rich state together with rapid regrowth under an alternating regime.
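      The survival-versus-restart trade-off described in this response can be illustrated with a deliberately crude toy model. It is not the authors' quiescence-based simulation, and all parameter values are made up. Each cycle, a fraction of cells survives freeze-thaw, survivors sit in lag, and then they grow exponentially until a fixed regrowth window ends. In a mixed culture, relative frequencies shift by the exponential of the difference in per-cycle log fold change, so the ranking of these scores indicates which phenotype would take over.

```python
import math


def log_fold_per_cycle(survival: float, lag_h: float, rate_per_h: float, window_h: float) -> float:
    """Natural-log fold change of a subpopulation over one freeze-thaw + regrowth cycle.

    Toy assumptions: a fraction `survival` survives the freeze-thaw; survivors
    stay in lag for `lag_h` hours and then grow exponentially at `rate_per_h`
    until a fixed regrowth window ends. No carrying capacity is modelled.
    """
    growth_time = max(0.0, window_h - lag_h)
    return math.log(survival) + rate_per_h * growth_time


# Illustrative, made-up parameters (not measured values from the study).
PHENOTYPES = {
    "naive fast grower": dict(survival=0.02, lag_h=2.0, rate_per_h=0.5),
    "tolerant, slow restart": dict(survival=0.70, lag_h=12.0, rate_per_h=0.5),
    "tolerant, fast restart": dict(survival=0.70, lag_h=2.0, rate_per_h=0.5),
}

if __name__ == "__main__":
    window = 24.0
    scores = {name: log_fold_per_cycle(window_h=window, **p) for name, p in PHENOTYPES.items()}
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:>24s}: {score:6.2f}")
```

      With these invented numbers, the slow-restart tolerant type loses to the naive type despite 35-fold better survival, because 10 extra hours of lag at 0.5 per hour cost about e^5 (roughly 150-fold) in regrowth; only tolerance combined with a short lag wins.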

      (2) Convergent evolution of traits: I find the results unsurprising. When selecting for a trait, if there is a major mode of adapting to that stress, most of the strains would adapt via that mode, independent of the route. In my view, finding this major route was the objective of many of the previous reports on adaptive evolution. The surprising part in the previous papers (on adaptive evolution of bacteria or yeast) was the resampling of genes that acquired mutations in multiple replicates of an evolution experiment, providing a handle to understand the major genetic route or the molecular mechanism that guides the adaptation (in this case, for example, what drives the overaccumulation of trehalose). I fail to understand why the authors find the results surprising, and I would be happy to understand that from the authors. I may have missed something important.

      Our surprise was precisely that we did not see the classical pattern of “phenotypic convergence + repeated mutations in the same locus/module.” All independently evolved lines converged on a trehalose-rich, mechanically reinforced, quiescence-like phenotype, but population sequencing across lines did not reveal a single repeatedly hit gene or small shared pathway, even when we increased selection stringency (1–3 freeze–thaw cycles per round). We have now stated in the manuscript that this decoupling (strong phenotypic convergence, non-overlapping genetic routes) is the central inference: selection is acting on a physiologically defined state that multiple genotypes can reach.

      (3) Adaptive evolution works on phenotype, as all selective evolution is supposed to. So, given that trehalose accumulation is one of the phenotypes well known in the literature to allow freeze tolerance, I think it is not surprising that this trait is selected. For me, this is not a case of “non-genetic” adaptation as the authors point out: it is likely because perturbation of many genes can individually result in the same outcome, namely up-regulation of trehalose accumulation. Thus, although the adaptation is genetic, it is not homogeneous across the evolving lines; only the end result is. Did the authors check whether the trait is actually a non-genetic adaptation, i.e., whether cells regrown for a few generations without the stress fall back to being only partially fit to freeze-thaw cycles? Additionally, the inability to identify a network that is conserved in the sequencing does not mean that there is no regulatory pathway. A large number of cryptic pathways may exist to alter cellular metabolic states.

      This point follows on from point #2, and I would like to understand what I have missed.

      We agree, and we have removed the wording “non-genetic adaptation.” The evolved populations retain high survival even after regrowth for ≥25 generations without freeze–thaw, so the adaptation is clearly genetically maintained. What our data show is that there is no single genetic route to the shared phenotype; different mutations can all drive cells into the same trehalose-rich, quiescence-like, mechanochemically reinforced state. We now describe this as “genetic diversification with phenotypic convergence.”

      (4) To establish convergence, it would be important to examine independently evolved lines, and most probably more than two. It is not clear from the Results section whether there are multiple lines that evolved independently.

      We indeed evolved four independent lines and maintained two independent controls. We have added this information at the start of the Results so that the level of replication is immediately clear.

      (5) For the genomic studies, it is not clear whether the authors sequenced a pool or a single colony from the evolved strains. This is an important point, since population-averaged sequencing will miss many mutations and detect mainly those inherited from a common ancestral cell. The relevant section does not make this clear either.

      We sequenced population samples from the evolved lines. Our specific question was whether independently evolved lines would show the same high-frequency genetic solution, as is often seen in parallel evolution. Pool sequencing may under-sample rare/private variants, but it is appropriate for detecting such shared, high-frequency routes, and we do not find any such route. We have clarified this rationale in the Methods/Results.

      Reviewer #2 (Public review):

      Summary:

      The authors used experimental evolution, repeatedly subjecting Saccharomyces cerevisiae populations to rapid liquid-nitrogen freeze-thaw cycles while tracking survival, cellular biophysics, metabolite levels, and whole-genome sequence changes. Within 25 cycles, viability rose from ~2% to ~70% in all independent lines, demonstrating rapid and highly convergent adaptation despite distinct starting genotypes. Evolved cells accumulated about threefold more intracellular trehalose, adopted a quiescence-like phenotype (smaller, denser, non-budding cells), showed cytoplasmic stiffening and reduced membrane damage, and re-entered growth with a shorter lag; together, these traits protected them from ice-induced injury. Whole-genome sequencing indicated that multiple genetic routes can yield the same mechano-chemical survival strategy. A population model in which trehalose controls quiescence entry, growth rate, lag, and freeze-thaw survival reproduced the empirical dynamics, implicating physiological state transitions rather than specific mutations as the primary adaptive driver. The study therefore concludes that extreme-stress tolerance can evolve quickly through a convergent, trehalose-rich quiescence-like state that reinforces membrane integrity and cytoplasmic structure.

      Strengths:

      The strengths of the paper are the experimental design, data presentation and interpretation, and that it is well-written.

      (1) While the phenotyping is thorough, a few more growth curves would be quite revealing to determine the extent of cross-stress protection. For example, comparing growth rates in YPD vs. YPEG (EtOH/glycerol), and measuring growth at 37 °C or in the presence of 0.8 M KCl.

      We thank the referee for the interesting suggestions. However, growth rates alone may be difficult to interpret since WT strains also show different growth rates under these conditions. Therefore, comparing the relative fitness or survival of the evolved strains versus the WT under these stresses would be more informative. In the present study we limited growth/survival measurements to what was needed to parameterize the adaptation model in YPD under the freeze–thaw regime. We have now added a statement in the Discussion that, given the shared trehalose/mechanical mechanism, such cross-stress assays are an expected and straightforward follow-up.

      (2) Were GEMs integrated prior to evolution? Are the evolved cells transformable?

      Yes. GEMs were integrated prior to evolution, because the non-integrated evolved population showed low transformation efficiency, likely due to altered cell-wall properties.

      (3) From the table, it looks like strains either have mutations in Ras1/2 or Vac8. Given the known requirements of Ras/PKA signaling for the G1/S checkpoint (to make sure there are enough nutrients for S phase), this seems like a pathway worth mentioning and referencing. Regarding Vac8, its emerging roles at nucleus-vacuole junctions (NVJ) and in autophagy suggest another nutrient checkpoint, perhaps through TORC1. The common theme is rewired metabolism, which is probably influencing the carbon shuttling to trehalose synthesis.

      We appreciate the reviewer’s suggestion to consider pathways like Ras/PKA (linked to Ras1/2) and autophagy/TORC1 (linked to Vac8) as potential upstream modulators. While these pathways are involved in nutrient sensing and metabolic regulation, we choose not to emphasize them specifically. This is because (i) some evolved lines lack Ras1/2 or Vac8 variants, and (ii) none of the variants lies directly in trehalose synthesis/degradation pathways. Furthermore, direct links to trehalose accumulation are not well established for these specific variants in this context, and pathways like Ras are global regulators with broad effects. Together with the strongly convergent phenotype, this supports our main inference that multiple genetic/metabolic routes can feed into the same trehalose-rich, mechanochemically reinforced, quiescence-like state. We have added a note in the discussion regarding metabolic rewiring and trehalose.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Generally, the results sections should have more details. The figures should be corrected, and the legends should be checked for correctness. The manuscript seems to have been assembled in haste?

      We have expanded the relevant Results subsections with one-sentence motivations (why each measurement was performed) and we have corrected the figure legends for ordering and consistency.

      Figure 3: It would be good to have the correct p-values on the figure itself. P-values are typically less than 1, unless some special method is used (here the values presented are , etc.). Please explain how the p-values were obtained in the figure legend itself.

      Figure 3 now shows the actual p-values. The legend specifies the details and the sample sizes used.

      Figure 5: It is not clear what the error bars show in 5B, E (different evolved populations, clones, or cells?). All the figure legends are mixed up; please correct them. It is difficult to follow the paper.

      Figure 5 legends now state clearly what the error bars represent (biological replicates) and which panels are from single-cell measurements. We have checked the panel lettering and legend order for consistency with the flow of the main text.

      Reviewer #3 (Recommendations for the authors):

      Overall, the paper is outstanding, well-written, and insightful.

      A point to address is that there are missing citations on lines 60 and 91.

      We have added the missing citations at both locations. We apologize for the omission, which was due to a compilation error. This error has been fixed, and the bibliography has been corrected (now containing 74 references).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      Schafer et al. tested whether the hippocampus tracks social interactions as sequences of neural states within an abstract social space defined by dimensions of affiliation and power, using a task in which participants engaged in narrative-based social interactions. The findings of this study revealed that individual social relationships are represented by unique sequences of hippocampal activity patterns. These neural trajectories corresponded to the history of trial-to-trial affiliation and power dynamics between participants and each character, suggesting an extended role of the hippocampus in encoding sequences of events beyond spatial relationships.

      The current version provides limited detail on the decoding and clustering analyses, which could be improved in a future revision.

      Strengths:

      (1) Robust Analysis: The research combined representational similarity analysis with manifold analyses, enhancing the robustness of the findings and the interpretation of the hippocampus's role in social cognition.

      (2) Replicability: The study included two independent samples, which strengthens the generalizability and reliability of the results.

      Weaknesses:

      I appreciate the authors for utilizing contemporary machine-learning techniques to analyze neuroimaging data and examine the intricacies of human cognition. However, the manuscript would benefit from a more detailed explanation of the rationale behind the selection of each method and a thorough description of the validation procedures. Such clarifications are essential to understand the true impact of the research. Moreover, refining these areas will broaden the manuscript's accessibility to a diverse audience.

      We thank the reviewer for these comments and have addressed them in various ways.

      First, we removed the spline-based decoding and spectral clustering analyses. As we detail in our response to the recommendations, these approaches were complex and raised legitimate interpretational concerns, making it unclear how they supported our core claims. The revised manuscript now focuses on a set of representational similarity analyses to show representations consistent with social dimension similarity (affiliation vs. power decision trials) and social location similarity (trajectory/map-like coding based on participant choices).

      Second, we expanded the Methods and Results to more clearly explain the analyses, the questions they address, and associated controls and robustness tests. The dimension similarity analysis tests whether hippocampal patterns differentiate affiliation and power decisions in a way consistent with an abstract dimension representation. The location similarity RSAs test whether within-character neural pattern distances scale with Euclidean distance in social space (relationship-specific trajectories), and whether pattern distances across all characters scale with location distances when distances are globally standardized, consistent with a shared map-like coordinate system.

      Third, we emphasize new controls. For the dimension similarity RSA, we test for potential confounds such as word count, text sentiment, and reaction time differences between affiliation and power trials. For the location similarity RSA, we control for temporal distance between trials and show (in the Supplement) that the reported effects cannot be explained by temporal autocorrelation in the fMRI data or by the relationship between temporal distance and behavioral location distance.

      We believe that these changes address the reviewer’s request for clearer rationale and validation.

      Reviewer #2 (Public review):

      Summary:

      Using an innovative task design and analysis approach, the authors set out to show that the activity patterns in the hippocampus related to the development of social relationships with multiple partners in a virtual game. While I found the paper highly interesting (and would be thrilled if the claims made in the paper turned out to be true), I found many of the analyses presented either unconvincing or slightly unconnected to the claims that they were supposed to support. I very much hope the authors can alleviate these concerns in a revision of the paper.

      Strengths & Weaknesses:

      (1) The innovative task design and analyses, and the two independent samples of participants are clear strengths of the paper.

      We thank the reviewer for this comment.

      (2) The RSA analysis is not what I expected after I read the abstract and the title of the Results section, "The hippocampus represents abstract dimensions of affiliation and power". To me, the title suggests that the hippocampus has voxel patterns, which could be read out by a downstream area to infer the affiliation and power value, independent of the exact identity of the character in the current trial. The presented RSA analysis, however, shows something entirely different: namely, that the affiliation trials and power trials elicit different activity patterns in the area indicated in Figure 3. What is the meaning of this analysis? It is not clear to me what is being "decoded" here, and alternative explanations have not been considered. How do affiliation and power trials differ in terms of the length of sentences, complexity of the statements, and reaction time? Can the subsequent decision be decoded from these areas? I hope in the revision the authors can test these ideas, and also explain how the current RSA analysis relates to a representation of the "dimensions of affiliation and power".

      We agree that this analysis needed to be better justified and explained. We have revised the text to clarify that by “represents the interaction decision trials along abstract social dimensions” we mean that hippocampal multivoxel patterns differentiate affiliation and power decisions in a way consistent with the conceptual framework of underlying latent dimensions. The analysis tests one simple prediction of this view – that on average these trial types are separable in the neural patterns. We have added details to the Methods, showing how the affiliation and power trials do not differ in word count or in sentiment, but do differ in their semantics, as assessed by a Large Language Model, as we expect from our task assumptions. Thanks to the reviewer’s comment, we also tested for and found a reaction time difference between affiliation and power trials, that we now control for.

      (3) Overall, I found that the paper was missing some more fundamental and simpler RSA analyses that would provide a necessary backdrop for the more complicated analyses that followed. Can you decode character identity from the regions in question? If you trained a simple decoder for power and affiliation values (using the LLE, but without consideration of the sequential position as used in the spline analysis), could you predict left-out trials? Are affiliation and power represented in a way that is consistent across participants - i.e. could you train a model that predicts affiliation and power from N-1 subjects and then predict the Nth subject? Even if the answer to these questions is "no", I believe that they are important to report for the reader to get a full understanding of the nature of the neural representations in these areas. If the claim is that the hippocampus represents an "abstract" relationship space, then I think it is important to show that these representations hold across relationships. Otherwise, the claim needs to be adjusted to say that it is a representation of a relationship-specific trajectory, but not an abstract social space.

      We appreciate this comment and agree on the value of clear, conceptually simple analyses. To address this concern, we have simplified our main analysis significantly by removing the spline-based analysis and substituting it with a multiple regression representational similarity analysis approach. We test whether within-character neural pattern distances scale with distance in social space (relationship-specific trajectories), and whether pattern distances across all characters scale with location distances when distances are globally standardized. We find evidence for both, consistent with a shared map-like coordinate system.

      We agree that decoding character identity and an across-participant decoding approach could be informative. However, our current task is not well designed for such analyses and as such would complicate the paper. Although we agree that these questions are interesting, they would test questions that are outside the scope of this paper. 

      (4) To determine that the location of a specific character can be decoded from the hippocampal activity patterns, the authors use a sequential analysis in a low-dimensional space (using locally linear embedding). In essence, each trial is decoded by finding the pair of two temporally sequential trials that is closest to this pattern, and then interpolating the power/affiliation values linearly between these two points. The obvious problem with this analysis is that fMRI patterns will have temporal autocorrelation, and the power and affiliation values also have temporal autocorrelation. Successful decoding could just reflect this smoothness in both time series. The authors present a series of control analyses, but I found most of them not incisive or convincing, and I believe that they (and the explanation of their rationale) need to be improved. For example, the circular shifting of the patterns preserves some of the autocorrelation of the time series, but not all of it. In the shifted patterns, the first and last items are considered to be neighboring and used in the evaluation, which alone could explain the poor performance. The simplest fix that I can see is to also connect the first and last item in a circular fashion, even when evaluating the veridical ordering. The only really convincing control condition I found was the generation of new sequences for every character by shuffling the sequence of choices and re-creating new artificial trajectories with the same start and endpoint. This analysis performs much better than chance (circular shuffling), suggesting to me that a lot of the observed decoding accuracy is indeed simply caused by the temporal smoothness of both time series.

      We thank the reviewer for emphasizing this important concern; we agree that we did not sufficiently address this in the initial submission. This concern is one main reason we removed the spline-based analysis and now use regression-based representational similarity analyses in its place. In the revision, we report autocorrelation-related analyses in the supplement, and via controls and additional analysis show that temporal distance (or its square) cannot explain the location-like effects. This substantially improves our ability to interpret the findings.
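      The reviewer's wrap-around argument about the circular-shift control can be made concrete with a small illustration of our own. It is not the removed spline decoder, and the function names are hypothetical; it only tracks which trials count as temporal neighbours. Under linear evaluation, every non-zero shift swaps one genuine neighbour pair for the artificial pair formed by the original last and first trials. If the veridical sequence is also treated as a ring, as the reviewer proposed, the neighbour structure becomes identical across all shifts, so any difference in performance can no longer come from that single discontinuity.

```python
def circular_shift(seq, k):
    """Rotate a list left by k positions, as in a circular-shift null."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def neighbour_pairs(seq, circular):
    """Unordered pairs of items that are adjacent in `seq`.

    With circular=True, the last and first items also count as neighbours.
    """
    pairs = {frozenset(p) for p in zip(seq, seq[1:])}
    if circular:
        pairs.add(frozenset((seq[-1], seq[0])))
    return pairs


if __name__ == "__main__":
    trials = list(range(6))  # trial indices in their veridical temporal order
    for k in range(1, len(trials)):
        shifted = circular_shift(trials, k)
        artificial = neighbour_pairs(shifted, circular=False) - neighbour_pairs(trials, circular=False)
        identical_as_ring = neighbour_pairs(shifted, circular=True) == neighbour_pairs(trials, circular=True)
        print(k, [sorted(p) for p in artificial], identical_as_ring)
```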

      (5) Overall, I found the analysis of the brain-behavior correlation presented in Figure 5 unconvincing. First, the correlation is mostly driven by one individual with a large network size and a 6.5 cluster. I suspect that the exclusion of this individual would lead to the correlation losing significance. Second, the neural measure used for this analysis (determining the number of optimal clusters that maximize the overlap between neural clustering and behavioral clustering) is new, non-validated, and disconnected from all the analyses that had been reported previously. I hope the authors will forgive me for saying so, but at this point in the paper, would it not be much more obvious to use the decoding accuracy for power and affiliation from the main model used in the paper thus far? Does this correlate? Another obvious candidate would be the decoding accuracy for character identity or the size of the region that encodes affiliation and power. Given the plethora of candidate neural measures, I would appreciate it if the authors reported the other neural measures that were tried (and that did not correlate). One way to address this would have been to select the method on the initial sample and then test it on the validation sample; unfortunately, the measure was not pre-registered before the validation sample was collected. It seems that the correlation was only found and reported on the validation sample?

      We agree that this analysis was too complicated and underconstrained, and thus not convincing. We think that removing this cluster-based analysis is the most conservative response to the reviewer’s concerns and have removed it from the revised paper.
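      The reviewer's suspicion that a single participant drives the brain-behavior correlation is the kind of concern a leave-one-out (jackknife) check addresses. The sketch below uses made-up numbers, not the study's data, and the variable names are illustrative: five unrelated points plus one extreme participant give a high overall Pearson r, which collapses to zero when that participant is dropped.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation of two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def leave_one_out_r(x, y):
    """Recompute Pearson r with each participant dropped in turn (jackknife)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~np.eye(len(x), dtype=bool)  # row i masks out participant i only
    return np.array([pearson_r(x[m], y[m]) for m in keep])


if __name__ == "__main__":
    # Made-up values: five unrelated points plus one extreme participant.
    network_size = [1, 2, 3, 4, 5, 20]
    neural_measure = [2, 1, 2, 1, 2, 10]
    print("all participants:", round(pearson_r(network_size, neural_measure), 3))
    print("leave-one-out r:", np.round(leave_one_out_r(network_size, neural_measure), 3))
```

      A large swing in r for one left-out participant is a simple, reportable signal of undue influence; a rank-based correlation is another common robustness check.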

      Recommendations to the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript's description of the shuffling analysis performed during decoding is currently ambiguous, particularly concerning the control variables. The analysis is described only in the Figure 4 legend and requires a more detailed explanation within the Methods section. It is essential to clarify whether the permutation process was conducted within each character's data set or across multiple characters' data sets. If permutations were confined to within-character data, the conclusion would be that the hippocampus encodes context-specific information rather than providing a two-dimensional common space.

      We thank the reviewer for this comment. We have now removed the spline analysis due to these and other problems and have replaced it with representational similarity analyses that are both more rigorous and easier to interpret. We think these analyses allow us to make the claim that the characters are represented in a common space. 

      In the methods, we explain the analyses (pages 23-24, lines 475-500):

      “We also expected the hippocampus to represent the different characters’ changing social locations, which are implicit in the participant’s choices. We used multiple regression searchlight RSA to test whether hippocampal pattern dissimilarity increases with social location distance, based on participant-specific trial-wise beta images where boxcar regressors spanned each trial’s reaction time.”

      “We ran two complementary regression analyses to address two related questions. First, we asked whether the hippocampus represents how a specific relationship changes over time. For this analysis, for each participant and each searchlight, we computed character-specific (i.e., only for same-character trial pairs) correlation distances between trial-wise beta patterns and Euclidean distances between the social location behavioral coordinates. Distances were z-scored within character trial pairs to isolate character-specific changes. The second analysis asked whether there is a common map-like representation, where all trials, regardless of relationship, are represented in a shared coordinate system. Here, we included all trial pairs and z-scored the distances globally. For both regression analyses, we included control distances to control for possible confounds. To account for generic time-related changes, we controlled for absolute scan-time difference, as this correlated with location distance across participants (see Temporal autocorrelation of hippocampal beta patterns in the supplement). Although the square of this temporal distance did not explain any additional variance in behavioral distances, we ran a robustness analysis including both temporal distance and its square and saw qualitatively the same clusters with similar effect sizes. As such, we report the main analysis only. We included binary dimension difference (0 = trial pairs of different dimensions, 1 = trial pairs of the same dimension), to ensure effects could not be explained by dimension-related effects. In the group-level model, we controlled for sample and the average reaction time difference between affiliation and power decisions.”
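      The two regression RSAs in this Methods excerpt can be sketched for a single region as follows. This is a simplified illustration, not the authors' searchlight pipeline. It assumes (the excerpt does not say otherwise) that only the continuous distances are z-scored while the same-dimension indicator stays binary, it fits ordinary least squares with NumPy, and it omits the group-level model. The synthetic generator, in which voxels have Gaussian tuning to social location, is entirely hypothetical and exists only so the sketch runs end to end.

```python
from itertools import combinations

import numpy as np


def zscore(v):
    """Standardize a vector; constant vectors map to zeros."""
    v = np.asarray(v, dtype=float)
    sd = v.std()
    return (v - v.mean()) / sd if sd > 0 else np.zeros_like(v)


def location_rsa_beta(betas, coords, onsets, dims, chars, within_character):
    """Regression weight of social-location distance on neural pattern distance.

    betas: trials x voxels; coords: trials x 2 (affiliation, power);
    onsets: scan time per trial (s); dims: dimension label; chars: character label.
    """
    rows = []
    for i, j in combinations(range(len(betas)), 2):
        if within_character and chars[i] != chars[j]:
            continue  # trajectory analysis: same-character pairs only
        rows.append((
            chars[i],
            1.0 - np.corrcoef(betas[i], betas[j])[0, 1],  # correlation distance
            np.linalg.norm(coords[i] - coords[j]),        # Euclidean location distance
            abs(onsets[i] - onsets[j]),                   # temporal distance control
            float(dims[i] == dims[j]),                    # 1 = same dimension
        ))
    group = np.array([r[0] for r in rows])
    d = np.array([r[1:] for r in rows], dtype=float)
    # z-score continuous distances within character (trajectory) or globally (map)
    if within_character:
        blocks = [group == g for g in np.unique(group)]
    else:
        blocks = [np.ones(len(d), dtype=bool)]
    for m in blocks:
        for c in range(3):
            d[m, c] = zscore(d[m, c])
    X = np.column_stack([np.ones(len(d)), d[:, 1], d[:, 2], d[:, 3]])
    coef, *_ = np.linalg.lstsq(X, d[:, 0], rcond=None)
    return coef[1]  # location term, controlling for time and same-dimension


def simulate_trials(seed=0, n_trials=36, n_vox=200, n_chars=3):
    """Hypothetical data: voxels with Gaussian tuning to social location plus noise."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(-1, 1, size=(n_trials, 2))
    centres = rng.uniform(-1.5, 1.5, size=(n_vox, 2))
    sq_dist = ((coords[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    betas = np.exp(-sq_dist / (2 * 0.6**2)) + 0.05 * rng.standard_normal((n_trials, n_vox))
    onsets = np.arange(n_trials) * 20.0  # illustrative trial spacing
    dims = rng.integers(0, 2, size=n_trials)
    chars = np.arange(n_trials) % n_chars
    return betas, coords, onsets, dims, chars


if __name__ == "__main__":
    data = simulate_trials()
    print("trajectory (within character):", location_rsa_beta(*data, within_character=True))
    print("shared map (global):", location_rsa_beta(*data, within_character=False))
```

      The only difference between the two analyses is which trial pairs enter the regression and over which set the distances are standardized, which is what separates a relationship-specific trajectory from a shared coordinate system.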

      In the results, we describe the results and our interpretation (pages 11-12, lines 185-208):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation. Does it also represent the changing social coordinates of each character? To test this, we used a multiple-regression RSA searchlight to ask whether left hippocampus patterns represent the characters’ changing social locations across interactions (see Figure 3). We restricted the distances to those from trial pairs from the same character and standardized the distances within character (see Figure 3B-D). We controlled for temporal distance to ensure the effect was not explainable by the time between trials, and for whether the trials shared the same underlying dimension (affiliation or power; see Location similarity searchlight analyses for more details). At the group level, we controlled for sample and the average reaction time difference between affiliation and power trials. Using the same testing logic as the dimension similarity analysis, we first tested our hypothesis in the bilateral hippocampus and found widespread effects in both the left (peak voxel MNI x/y/z = -35/-22/-15, cluster extent = 1470 voxels) and right (peak voxel MNI x/y/z = 37/-19/-14, cluster extent = 1953 voxels) hemispheres. The whole-brain searchlight analysis revealed additional clusters in the left putamen (-27/-3/14, cluster extent = 131 voxels) and left posterior cingulate cortex (-10/-28/41, cluster extent = 304 voxels).”

      “We then asked a second, complementary question: does the hippocampus represent all interactions, across characters, within a shared map? To test for this map-like structure, we repeated the analysis but now included all trial pairs, z-scoring distances globally rather than within character (Figure 3E-F). The remainder of the procedure followed the same logic as the preceding analysis. The hippocampus analysis revealed an extensive right hippocampal cluster (27/27/-14, cluster extent = 1667 voxels). The whole-brain analysis did not show any significant clusters.”

      We also describe the results in the discussion (page 12, lines 220-226): 

      “Then, we show that the hippocampus tracks the changing social locations (affiliation and power coordinates), above and beyond the effects of dimension or time; the hippocampus seemed to reflect both the changing within-character locations, tracking their locations over time, and locations across characters, as if in a shared map. Thus, these results suggest that the hippocampus does not just encode static character-related representations but rather tracks relationship changes in terms of underlying affiliation and power.”

      The manuscript's description of the decoding analysis is unclear regarding the variability of the decoded positions. The authors appear to decode the position of a character along a spline, which raises the question of whether this position correlates with time, since characters are more likely to be located further from the center in later trials. There is a concern that the decoded position may not solely reflect the hippocampal encoding of spatial location, but could also be influenced by an inherent temporal association. Given that a character's position at time t is likely to be similar to its positions at t−1 and t+1, it is crucial that the authors clearly articulate their approach to separating spatial representation from temporal autocorrelation. While this issue may have been addressed in the construction of the test set, the manuscript does not seem to adequately explain how such biases were mitigated in the training set.

      We agree that temporal confounding needs to be better accounted for, as our claims depend on space-like signals being separable from time-like ones. We address this in several ways in the revised manuscript.

      First, we emphasize that this is a narrative-based task, where temporal structure is relevant. As such, our analyses aim to demonstrate that effects go beyond simple temporal confounds, like trial order or time elapsed.

      Despite the task's temporal structure, decisions for the same character are spaced in time and interleaved with other characters’ decisions, which reduces the chance that a simple temporal confound could explain trajectory-related effects. We now describe the task in more detail in the revised Methods (page 16, lines 314-318):

      “All six characters’ decision trials are interleaved with one another and with narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to that same character, such that each character’s choices are separated by an average of ~20 seconds (range 12 seconds to 10 min).”

      To address temporal autocorrelation in the fMRI time series, we used SPM’s FAST algorithm. Briefly, FAST models the serial correlations as a weighted combination of candidate correlation functions and uses the best-fitting estimate to whiten the data, removing autocorrelated signal.

      We now also report the temporal autocorrelation profile of the hippocampal beta series in the supplement (pages 29-31, lines 593-656):

      “The Social Navigation Task is a narrative-based task, where the relationships with characters evolve over time; trial pairs that are close in time may have more similar fMRI patterns for reasons unrelated to social mapping (e.g., slow drift). It is important to account for the role of time in our analyses, to ensure effects go beyond simple temporal confounds, such as the time between decision trials. To aid in this, we quantified how fMRI signals change over time using a pattern autocorrelation function across decision trial lags. We defined the left and right hippocampus and the left and right intracalcarine cortex using the Harvard-Oxford atlas and thresholded them at 50% probability. We chose intracalcarine cortex as an early visual control region that largely corresponds to primary visual cortex (V1), as it is likely to be driven by the visually presented narrative. We used the same trial-wise beta images as in the location similarity RSA (boxcar regressors spanning each decision trial’s reaction time). For each participant and region of interest (ROI), we extracted the decision trial-by-voxel beta matrix and quantified three kinds of temporal dependence: beta autocorrelation, multivoxel pattern correlation and multivoxel pattern correlation after regressing out temporal distance.”

      “To estimate the temporal autocorrelation of the trial-wise beta values, we treated each voxel’s beta values as a time series across trials and measured how much a voxel’s response on one trial correlated (Pearson) with its response on previous trials. We averaged these voxelwise autocorrelations within each ROI. At one trial apart (lag 1), both the hippocampus and V1 showed small positive autocorrelations, indicating modest trial-to-trial carryover in response amplitude (see Supplemental figure 1) that decayed to approximately 0 by three trials apart.”
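      A minimal sketch of the voxelwise beta autocorrelation described in this paragraph, assuming the trial-by-voxel beta matrix for one participant and ROI is already extracted with rows in trial order. The function name and the `max_lag` parameter are illustrative, not the authors' code:

```python
import numpy as np


def voxelwise_lag_autocorrelation(betas, max_lag=4):
    """Mean voxelwise Pearson autocorrelation of trial-wise betas at lags 1..max_lag.

    betas: array of shape (n_trials, n_voxels), rows ordered by trial.
    For each voxel, the betas on trial t are correlated with those on trial t - lag,
    and the per-voxel correlations are averaged within the ROI.
    """
    betas = np.asarray(betas, dtype=float)
    result = {}
    for lag in range(1, max_lag + 1):
        current, earlier = betas[lag:], betas[:-lag]
        # Center each voxel's two segments before correlating them
        current_c = current - current.mean(axis=0)
        earlier_c = earlier - earlier.mean(axis=0)
        num = (current_c * earlier_c).sum(axis=0)
        den = np.sqrt((current_c ** 2).sum(axis=0) * (earlier_c ** 2).sum(axis=0))
        r = num / den  # one Pearson r per voxel (nan for constant voxels)
        result[lag] = float(np.nanmean(r))
    return result
```

      For unstructured noise the lag values hover around 0; a profile that is small and positive at lag 1 and near 0 by lag 3 corresponds to the modest carryover reported above.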

      “Because our representational similarity analyses depend on trial-by-trial pattern similarity, we also estimated how multivoxel patterns were autocorrelated over time. For each lag, we computed the Pearson correlation between each trial’s voxelwise pattern and the pattern from the trial that many trials earlier, then averaged those correlations to obtain a single autocorrelation value for that lag. At one trial apart, both regions showed positive autocorrelation, with V1 showing greater autocorrelation than the hippocampus; pattern correlations between trials 3 or 4 apart decreased across participants, settling at low but positive values. Then, for each participant and ROI, we regressed out the effect of absolute trial onset differences from all pairwise pattern correlations, mirroring how these temporal distances are controlled for in the RSA regressions. After removing this temporal distance component, the short-lag pattern autocorrelation dropped substantially in both regions. The similarity in autocorrelation profiles between the two regions suggests that significant similarity effects in the hippocampus are unlikely to be driven by generic temporal autocorrelation.”
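      The pattern-level version can be sketched as follows. This is a minimal illustration, assuming one participant's trial-by-voxel betas and trial onsets in seconds; the function name is hypothetical, and the residualization is a plain least-squares regression of pairwise pattern correlations on absolute onset difference (with an intercept), which is our reading of the procedure above:

```python
import numpy as np


def pattern_autocorrelation(betas, onsets, max_lag=4):
    """Raw and temporal-distance-adjusted multivoxel pattern autocorrelation by lag.

    betas: (n_trials, n_voxels) trial-wise betas in trial order.
    onsets: (n_trials,) trial onsets in seconds.
    Returns two dicts mapping lag -> mean pattern correlation.
    """
    betas = np.asarray(betas, dtype=float)
    onsets = np.asarray(onsets, dtype=float)
    n = betas.shape[0]
    # Rows are treated as variables, so this is the trial-by-trial pattern correlation
    corr = np.corrcoef(betas)
    i, j = np.triu_indices(n, k=1)
    r = corr[i, j]
    lag = j - i
    # Regress every pairwise correlation on absolute onset difference plus an intercept
    dt = np.abs(onsets[j] - onsets[i])
    X = np.column_stack([np.ones_like(dt), dt])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    resid = r - X @ coef
    raw = {k: float(r[lag == k].mean()) for k in range(1, max_lag + 1)}
    adjusted = {k: float(resid[lag == k].mean()) for k in range(1, max_lag + 1)}
    return raw, adjusted
```

      Comparing `raw` and `adjusted` across the hippocampus and V1 reproduces the logic of the comparison described in the paragraph: a large drop at short lags indicates that much of the short-lag similarity is carried by temporal proximity.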

      “Relationship between behavioral location distance and temporal distance”

      “We also quantified how temporal distances between trials relate to their behavioral location distances, participant by participant. Our dimension similarity analysis controls for temporal distance between trials by design (see Social dimension similarity searchlight analysis), but our location similarity analysis does not. To decide on covariates to include in the analysis, we tested whether temporal distances can explain behavioral location distances. For each participant, we computed the correlations between trial pairs’ Euclidean distances in social locations and their linear temporal distances (“linear”) and the temporal distances squared (“quadratic”), to test for nonlinear effects. We then summarized the correlations using one-sample t-tests. The linear relationship was statistically significant (t<sub>49</sub> = 12.24, p < 0.001), whereas the quadratic relationship was not (t<sub>49</sub> = -0.55, p = 0.586). Similarly, in participant-specific regressions with both linear and quadratic temporal distances, the linear effect was significant (t<sub>49</sub> = 5.69, p < 0.001) whereas the quadratic effect was not (t<sub>49</sub> = 0.20, p = 0.84). Based on this, we included linear temporal distances as a covariate in our location similarity analyses (see Location similarity searchlight analyses), and verified that adding a quadratic temporal distance covariate does not alter the results. Thus, the reported location-related pattern similarity effects go beyond what can be explained by temporal distance alone.”
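      The per-participant correlation step and the group-level one-sample t-test can be sketched as below. This assumes each participant's trial-wise (affiliation, power) coordinates and onsets are available and that all supplied trial pairs are used; the function names are hypothetical, and whether pairs were restricted to the same character is not specified in the text:

```python
import numpy as np


def temporal_distance_correlations(locations, onsets):
    """Pearson r of pairwise location distance with linear and squared temporal distance."""
    locations = np.asarray(locations, dtype=float)  # (n_trials, 2): affiliation, power
    onsets = np.asarray(onsets, dtype=float)
    i, j = np.triu_indices(len(onsets), k=1)
    loc_dist = np.linalg.norm(locations[i] - locations[j], axis=1)  # Euclidean distance
    dt = np.abs(onsets[i] - onsets[j])
    r_linear = np.corrcoef(loc_dist, dt)[0, 1]
    r_quadratic = np.corrcoef(loc_dist, dt ** 2)[0, 1]
    return float(r_linear), float(r_quadratic)


def one_sample_t(values):
    """t statistic and degrees of freedom for H0: mean(values) == 0."""
    values = np.asarray(values, dtype=float)
    n = values.size
    t = values.mean() / (values.std(ddof=1) / np.sqrt(n))
    return float(t), n - 1
```

      Collecting `r_linear` across 50 participants and passing the list to `one_sample_t` yields a statistic with 49 degrees of freedom, matching the t<sub>49</sub> values reported above; `scipy.stats.ttest_1samp` returns the same t together with a p-value.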

      How was the free parameter of spectral clustering determined, if there is one?

      The interpretation of the number of hippocampal activity clusters is ambiguous. It is suggested that this number could fluctuate due to unique activity patterns or the fit to behaviorally defined trajectories. A lower number of clusters might indicate either a noisier or less distinct representation, raising the question of the necessity and interpretability of such a complex analysis. This concern is compounded by the potential sensitivity of the clustering to the variance in Euclidean distances of each trial's position relative to the center. If a character's position is consistently near the center, this could artificially reduce the perceived number of clusters. Furthermore, the manuscript should address whether there is any correlation between the number of clusters and behavioral performance. Specifically, what are the implications if participants are able to perform the task adequately with a smaller number of distinct hippocampal representation states?

      The rationale for conducting both cluster analysis and position decoding as separate analyses remains unclear. While cluster analysis can corroborate the findings of position decoding, it is not apparent why the authors chose to include trials across characters for cluster analysis but not for decoding analysis. An explanation of the reasoning behind this methodological divergence would help in understanding the distinct contributions of each analysis to the study's findings.

      The paper by Cohen et al. (1997), which provides the questionnaire for measuring the social network index, is not cited in the references. Upon reviewing the questionnaire that the author may have used, it appears that the term "social network size" does not refer to the actual size but to a score or index derived from the questionnaire responses. It may be more appropriate to replace the term "size" with a different term to more accurately reflect this distinction.

      Thank you for seeking these clarifications. Given the complexity of this analysis, we have decided to drop it to focus instead on our dimension and location representational similarity analysis results.

      Reviewer #2 (Recommendations for the authors):

      How did the participants' decisions on previous trials influence the future trials that the subjects saw? If different participants were faced with different decision trials, then how did you compare their decisions? If two participants made the same decisions, would they have seen exactly the same sequence of trials (see point X on how the trial sequence was randomized)?

      All participants experience the same narrative, with the same decisions (i.e., the same available options); their choices (i.e., the options they select) are what implicitly shape each character’s affiliation and power locations, and thus each character’s trajectory. In other words, the narrative is fixed; what changes are the social coordinates assigned to each trial’s outcome, depending on which of the two narrative options the participant chooses. This means that we can meaningfully compare participants' neural patterns, given that every participant received the same text and images throughout.

      We have now added details on the narrative structure, replacing more ambiguous statements with a clearer description (page 16, lines 309-318):

      “The sequence of trials, including both narrative and decision trials, was fixed across participants; only the participants’ choices differ. Narrative trials varied in duration, depending on the content (range 2-10 seconds), but were identical across participants. Decision trials always lasted 12 seconds, with two options presented until the participant made a choice, after which a blank screen was presented for the remainder of the duration. All six characters’ decision trials are interleaved with one another and with the narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to another decision with the same character, such that each character’s choices are separated by an average of ~20 seconds (ranging from 12 seconds to 10 min).”

      Figure 2B: I assume that "count" is "count of participants"? It would be good to indicate this on the axis/caption.

      Thank you for noting this. We have now removed this figure to improve the clarity of the figures.

      “We have shown that the hippocampus represents the interaction decision trials along abstract social dimensions, but does it track each relationship's unique sequence of abstract social coordinates?” Please clarify what you mean by “represents the interaction decision trials”.

      By “represents the interaction decision trials along abstract social dimensions”, we mean that when the participant makes a choice during the social interactions, the hippocampal patterns represent the current social dimension of the choice (affiliation vs power). In other words, the hippocampal BOLD patterns differentiate affiliation and power decisions, consistent with our hypothesis of abstract social dimension representation in the hippocampus. We have clarified this (page 11, lines 185-187):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation.”

      Page 8: "Hippocampal sequences are ordered like trajectories": It is not entirely clear to me what is meant by the split midpoint. Is this the midpoint of the piece-wise linear interpolation between two points, or simply the mean of all piecewise splines from one character? If the latter, is the null model the same as simply predicting the mean affiliation and power value for this character? If yes, please clarify and simplify this for the reader.

      Page 8: "Hippocampal sequences track relationship-specific paths". First, I was misled by the "relationship-specific". I first understood this to mean that you wanted to test whether two relationships (i.e. the identity of the partner) had different representations in Hippocampus, even if the power/affiliation trajectories are the same. I suggest changing the title of this section.

      The analysis in this section also breaks any temporal autocorrelation of measured patterns - so I am not sure if this is a strong analysis that should be interpreted at all. This analysis seems to not address the claim and conclusion that is drawn from it. I assume that the random trajectories have different choices and different affiliation/power values than the true trajectories. So the fact that the true trajectories can be better decoded simply shows that either choices or affiliation and power (or both) are represented in the neural code - but not necessarily anything beyond this.

      Page 9: "Neural trajectories reflect social locations, not just choices". The motivation of this analysis is not clear to me. As I understand this analysis, both social location and choices are changed from the real trajectories. How can it then show that it reflects social locations, not just the choices?

      Figure 4 caption: "on the -based approximation" Is there a missing "point"-[based] here?

      We agree with the reviewer that this analysis is hard to interpret and does not adequately address concerns regarding temporal autocorrelation, and as such we have removed it from the manuscript. We describe the new results that include controlling for temporal distance between trials (pages 11-12, lines 185-208):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation. Does it also represent the changing social coordinates of each character? To test this, we used a multiple-regression RSA searchlight to ask whether left hippocampus patterns represent the characters’ changing social locations across interactions (see Figure 3). We restricted the distances to those from trial pairs of the same character and standardized the distances within character (see Figure 3BD). We controlled for temporal distance to ensure the effect was not explainable by the time between trials, and for whether the trials shared the same underlying dimension (affiliation or power; see Location similarity searchlight analyses for more details). At the group level, we controlled for sample and the average reaction time difference between affiliation and power trials. Using the same testing logic as the dimension similarity analysis, we first tested our hypothesis in the bilateral hippocampus and found widespread effects in both the left (peak voxel MNI x/y/z = -35/-22/-15, cluster extent = 1470 voxels) and right (peak voxel MNI x/y/z = 37/-19/-14, cluster extent = 1953 voxels) hemispheres. The whole-brain searchlight analysis revealed additional clusters in the left putamen (-27/-3/14, cluster extent = 131 voxels) and left posterior cingulate cortex (-10/-28/41, cluster extent = 304 voxels).”
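      A minimal sketch of the pair-level regression for one searchlight sphere, assuming a precomputed neural dissimilarity matrix between decision trials and trial-wise coordinates, onsets, dimension labels and character labels. All names are hypothetical; z-scoring the neural, location and temporal distances while leaving the binary same-dimension regressor unscaled is our assumption, and the group-level covariates (sample, reaction time difference) are omitted. Setting `within_character=False` keeps all trial pairs and z-scores globally, as in the shared-map analysis:

```python
import numpy as np


def zscore_by_group(x, groups):
    """z-score x separately within each group label (constant groups map to 0)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for g in np.unique(groups):
        mask = groups == g
        sd = x[mask].std()
        if sd > 0:
            out[mask] = (x[mask] - x[mask].mean()) / sd
    return out


def location_rsa_betas(neural_rdm, locations, onsets, dimension, character,
                       within_character=True):
    """Multiple-regression RSA weights for one searchlight sphere or ROI.

    neural_rdm: (n, n) neural dissimilarity between decision trials
    locations:  (n, 2) affiliation/power coordinates at each decision
    onsets:     (n,) trial onsets in seconds
    dimension:  (n,) 'affiliation' / 'power' labels
    character:  (n,) character labels
    """
    neural_rdm = np.asarray(neural_rdm, dtype=float)
    locations = np.asarray(locations, dtype=float)
    onsets = np.asarray(onsets, dtype=float)
    dimension = np.asarray(dimension)
    character = np.asarray(character)

    i, j = np.triu_indices(len(onsets), k=1)
    if within_character:
        keep = character[i] == character[j]  # only same-character pairs
        i, j = i[keep], j[keep]
        groups = character[i]                # standardize within character
    else:
        groups = np.zeros(len(i))            # one group: standardize globally

    neural = zscore_by_group(neural_rdm[i, j], groups)
    location = zscore_by_group(np.linalg.norm(locations[i] - locations[j], axis=1), groups)
    temporal = zscore_by_group(np.abs(onsets[i] - onsets[j]), groups)
    same_dim = (dimension[i] == dimension[j]).astype(float)

    X = np.column_stack([np.ones_like(location), location, temporal, same_dim])
    coef, *_ = np.linalg.lstsq(X, neural, rcond=None)
    return dict(zip(["intercept", "location", "temporal", "same_dimension"], coef))
```

      In a searchlight, the `location` weight would be mapped back to the sphere's center voxel for each participant and then tested at the group level.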

      “We then asked a second, complementary question: does the hippocampus represent all interactions, across characters, within a shared map? To test for this map-like structure, we repeated the analysis but now included all trial pairs, z-scoring distances globally rather than within character (Figure 3E-F). The remainder of the procedure followed the same logic as the preceding analysis. The hippocampus analysis revealed an extensive right hippocampal cluster (27/27/-14, cluster extent = 1667 voxels). The whole-brain analysis did not show any significant clusters.”

      In the Methods, we emphasize that the results are robust to including squared temporal distance (pages 23-24, lines 493-496):

      “Although the square of this temporal distance did not explain any additional variance in behavioral distances, we ran a robustness analysis including both temporal distance and its square and saw qualitatively the same clusters with similar effect sizes.”

      Page 8, last paragraph: The text sounds like you have already shown that you can decode character identity from the patterns, but I do not believe you have at this point. I would consider this an interesting addition to the paper, though.

      This section has been removed, and we have been careful not to imply this in the current version of the manuscript. While we agree that a character identity decoding analysis would enrich our argument, we do not believe our task is well suited to capture a character identity effect. Each character only has 12 decision trials, and these trials are partially clustered in time; this is one problem of temporal autocorrelation that we thank the reviewers for pushing us to consider in more detail. Dimension and location patterns, on the other hand, are more natural to analyze in our task, especially in representational similarity analyses that test whether the relevant differences scale with neural distances.

      Page 14ff: Why is "Analysis section" not part of "Materials and Methods"? I believe adding the analysis after a careful description of the methods would improve the clarity of this section.

      We agree with the reviewer and have now consolidated these two sections.

      Two or three examples of Affiliation and Power decision trials should be provided, so the reader can form a more thorough understanding of how these dimensions were operationalized. For the RSA analysis, it is important to consider other differences between these two types of trials.

      We agree that adding examples will clarify the operationalization of these dimensions. We now include example affiliation and power trials in a table (pages 17-18).

      We thank the reviewer for noting the need to rule out alternative hypotheses; we have added several such tests. Affiliation and power trials did not differ in word count (page 17, lines 329-332):

      “To ensure that any observed neural or behavioral differences were not confounded by trivial features of the text, we tested for differences between the affiliation and power trials (where the two options are concatenated). There were no differences in word count (affiliation average = 26.6, power average = 25.6; t-test p = 0.56).”

      They also did not differ in sentiment, as assessed by a Large Language Model (LLM) analysis (page 17, lines 332-335):

      “The text’s sentiment also did not differ between these trial types (t-test p = 0.72), as quantified by comparing sentiment compound scores (from most negative, −1, to most positive, +1), using a Large Language Model (LLM) specialized for sentiment analysis [26].”

      The affiliation and power trials were different in terms of semantic content, consistent with our assumptions (page 17, lines 337-347):

      “Our framework assumes that affiliation and power trials differ in their semantic content, that is, in the conceptual meaning of the text, beyond word count or sentiment. To test this assumption, we used an LLM-based semantic embedding analysis. Each decision trial was embedded into a semantic vector. We then measured the cosine similarity between pairs of trials, calculated the difference between average within-dimension similarity (affiliation-affiliation and power-power comparisons) and average between-dimension similarity (affiliation-power comparisons), and assessed its statistical significance with permutation testing (1,000 shuffles of trial labels). As expected, decision trials of the same dimension were more similar to each other than trials of different dimensions, across multiple LLMs (OpenAI’s text-embedding-3-small [27]: similarity difference = 0.041, p < 0.001; all-MiniLM-L12-v2 [28]: similarity difference = 0.032, p < 0.001).”
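      The within-minus-between similarity statistic and its label-shuffling null can be sketched as follows, assuming the trial embeddings have already been obtained from an embedding model (obtaining them is not shown). The function names are hypothetical, and the add-one correction in the p-value is our choice rather than something stated in the text:

```python
import numpy as np


def within_minus_between(embeddings, labels):
    """Mean within-dimension minus mean between-dimension cosine similarity."""
    labels = np.asarray(labels)
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit vectors
    sim = emb @ emb.T                                        # cosine similarity matrix
    i, j = np.triu_indices(len(labels), k=1)
    same = labels[i] == labels[j]
    pair_sim = sim[i, j]
    return pair_sim[same].mean() - pair_sim[~same].mean()


def permutation_test(embeddings, labels, n_perm=1000, seed=0):
    """Observed statistic and one-sided permutation p-value from shuffled labels."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = within_minus_between(embeddings, labels)
    null = np.array([within_minus_between(embeddings, rng.permutation(labels))
                     for _ in range(n_perm)])
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return float(observed), float(p)
```

      With 1,000 shuffles, this add-one convention gives a smallest attainable p of about 0.001, which is why such results are typically reported as p < 0.001 rather than p = 0.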

      The affiliation and power trials differed in average reaction time. To control for this difference in the dimension RSA analysis, we added each participant’s absolute reaction time difference between the trial types as a covariate. The results were nearly identical to the previous ones. We updated the text to reflect this new control (page 23, lines 471-474):

      “However, there was a significant difference in the average reaction time between affiliation and power decisions across participants (t<sub>49</sub> = 6.92, p < 0.001; affiliation mean = 4.92 seconds (s), power mean = 4.51 s), so we controlled for this in the group-level analysis.”

      The exact implementation and timing of the behavioral tasks should be described better. How many narrative trials were intermixed with the decision trials? Which characters were they assigned to? How was the sequence of trials determined? Was it fixed across participants, or randomized?

      We agree that additional details are helpful. In the Methods, we now describe this with more detail (page 16, lines 301-318):

      “There are two types of trials: “narrative” trials where background information is provided or characters talk or take actions (a total of 154 trials), and “decision” trials where the participant makes decisions in one-on-one interactions with a character that can change the relationship with that character (a total of 63 trials). On each decision, participants used a button response box to select between the two options. The option positions (1 or 2, assigned to the index and middle fingers) and choice directions (+/-1 arbitrary unit on the current dimension) were counterbalanced.”
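      To make concrete how the same fixed narrative can yield participant-specific trajectories, here is a minimal sketch that accumulates the +/-1 choice directions into per-character coordinates. It assumes every character starts at the origin and that a choice moves only the current trial's dimension; the function name and data layout are hypothetical:

```python
import numpy as np


def character_trajectories(characters, dimensions, choices):
    """Cumulative (affiliation, power) coordinates after each decision trial.

    characters: character label for each decision, in task order
    dimensions: 'affiliation' or 'power' for each decision
    choices:    +1 or -1, the direction of the selected option on that dimension
    Returns a list of (character, coordinates) pairs, one per decision.
    """
    position = {}
    trajectory = []
    for char, dim, choice in zip(characters, dimensions, choices):
        pos = position.setdefault(char, np.zeros(2))  # assumed start at the origin
        axis = 0 if dim == "affiliation" else 1
        pos[axis] += choice                           # move along the current dimension only
        trajectory.append((char, pos.copy()))
    return trajectory
```

      Because only `choices` varies between participants, two participants who make identical choices end up with identical coordinate sequences, while the trial sequence itself stays the same for everyone.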

      “The sequence of trials, including both narrative and decision trials, was fixed across participants; only the participants’ choices differ. Narrative trials varied in duration, depending on the content (range 2-10 seconds), but were identical across participants. Decision trials always lasted 12 seconds, with two options presented until the participant made a choice, after which a blank screen was presented for the remainder of the duration. All six characters’ decision trials are interleaved with one another and with the narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to another decision with the same character, such that each character’s choices are separated by an average of ~20 seconds (ranging from 12 seconds to 10 min).”

      What is the exact timing of trials during fMRI acquisition - i.e. how long were the trials, what was the ITI, were there long phases of rest to determine the resting baseline? These are all important factors that will determine the covariance between regressors and should be reported carefully. Ideally, I would like to see the trial-by-trial temporal auto-correlation structure across beta-weights to be reported.

      We thank the reviewer for asking for this clarification. We have added the following text to clarify the trial timing (page 16, lines 314-318):

      “All six characters’ decision trials are interleaved with one another and with narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to that same character, such that each character’s choices are separated by an average of ~20 seconds (range 12 seconds to 10 min).”

      We now describe the temporal autocorrelation patterns in the supplement, including how we decided to control for temporal distance in the representational similarity analyses (pages 29-31, lines 593-656):

      “The Social Navigation Task is a narrative-based task, where the relationships with characters evolve over time; trial pairs that are close in time may have more similar fMRI patterns for reasons unrelated to social mapping (e.g., slow drift). It is important to account for the role of time in our analyses, to ensure effects go beyond simple temporal confounds, such as the time between decision trials. To aid in this, we quantified how fMRI signals change over time using a pattern autocorrelation function across decision trial lags. We defined the left and right hippocampus and the left and right intracalcarine cortex using the Harvard-Oxford atlas and thresholded them at 50% probability. We chose intracalcarine cortex as an early visual control region that largely corresponds to primary visual cortex (V1), as it is likely to be driven by the visually presented narrative. We used the same trial-wise beta images as in the location similarity RSA (boxcar regressors spanning each decision trial’s reaction time). For each participant and region of interest (ROI), we extracted the decision trial-by-voxel beta matrix and quantified three kinds of temporal dependence: beta autocorrelation, multivoxel pattern correlation and multivoxel pattern correlation after regressing out temporal distance.”

      “To estimate the temporal autocorrelation of the trial-wise beta values, we treated each voxel’s beta values as a time series across trials and measured how much a voxel’s response on one trial correlated (Pearson) with its response on previous trials. We averaged these voxelwise autocorrelations within each ROI. At one trial apart (lag 1), both the hippocampus and V1 showed small positive autocorrelations, indicating modest trial-to-trial carryover in response amplitude (see Supplemental figure 1) that decayed to approximately 0 by three trials apart.”

      “Because our representational similarity analyses depend on trial-by-trial pattern similarity, we also estimated how multivoxel patterns were autocorrelated over time. For each lag, we computed the Pearson correlation between each trial’s voxelwise pattern and the pattern from the trial that many trials earlier, then averaged those correlations to obtain a single autocorrelation value for that lag. At one trial apart, both regions showed positive autocorrelation, with V1 showing greater autocorrelation than the hippocampus; pattern correlations between trials 3 or 4 apart decreased across participants, settling at low but positive values. Then, for each participant and ROI, we regressed out the effect of absolute trial onset differences from all pairwise pattern correlations, mirroring how these temporal distances are controlled for in the RSA regressions. After removing this temporal distance component, the short-lag pattern autocorrelation dropped substantially in both regions. The similarity in autocorrelation profiles between the two regions suggests that significant similarity effects in the hippocampus are unlikely to be driven by generic temporal autocorrelation.”

      “Relationship between behavioral location distance and temporal distance”

      “We also quantified how temporal distances between trials relate to their behavioral location distances, participant by participant. Our dimension similarity analysis controls for temporal distance between trials by design (see Social dimension similarity searchlight analysis), but our location similarity analysis does not. To decide on covariates to include in the analysis, we tested whether temporal distances can explain behavioral location distances. For each participant, we computed the correlations between trial pairs’ Euclidean distances in social locations and their linear temporal distances (“linear”) and the temporal distances squared (“quadratic”), to test for nonlinear effects. We then summarized the correlations using one-sample t-tests. The linear relationship was statistically significant (t<sub>49</sub> = 12.24, p < 0.001), whereas the quadratic relationship was not (t<sub>49</sub> = -0.55, p = 0.586). Similarly, in participant-specific regressions with both linear and quadratic temporal distances, the linear effect was significant (t<sub>49</sub> = 5.69, p < 0.001) whereas the quadratic effect was not (t<sub>49</sub> = 0.20, p = 0.84). Based on this, we included linear temporal distances as a covariate in our location similarity analyses (see Location similarity searchlight analyses), and verified that adding a quadratic temporal distance covariate does not alter the results. Thus, the reported location-related pattern similarity effects go beyond what can be explained by temporal distance alone.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Howe and colleagues investigates the role of the posterolateral cortical amygdala (plCoA) in mediating innate responses to odors, specifically attraction and aversion. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. They show that specific glutamatergic neurons in the anterior and posterior regions of plCoA are responsible for driving attraction and avoidance, respectively, and that these neurons project to distinct downstream regions, including the medial amygdala and nucleus accumbens, to control these responses.

      Strengths:

      The major strength of the study is the thoroughness of the experimental approach, which combines advanced techniques in neural manipulation and mapping with high-resolution molecular profiling. The identification of a topographically organized circuit in plCoA and the connection between molecularly defined populations and distinct behaviors is a notable contribution to understanding the neural basis of innate motivational responses. Additionally, the use of functional manipulations adds depth to the findings, offering valuable insights into the functionality of specific neuronal populations.

      Weaknesses:

      There are some weaknesses in the study's methods and interpretation. The lack of clarity regarding the behavior of the mice during head-fixed imaging experiments raises the possibility that restricted behavior could explain the absence of valence encoding at the population level.

      We agree that head fixation may alter the state of the animal and the neural encoding of odor. To address this, we have provided further analysis of walking behavior during the imaging sessions (Figure S2). Overall, we could not identify any clear odor-specific patterns in locomotor behavior. Moreover, when neural activity was sorted by behavioral state (walking, pausing or fleeing), we did not observe any apparent patterns in odor-evoked neural activity. This is now discussed in the Results and Limitations sections of the manuscript.

      Furthermore, while the authors employ chemogenetic inhibition of specific pathways, the rationale for this choice over optogenetic inhibition is not fully addressed, and this could potentially affect the interpretation of the results.

      The rationale was logistical. First, optogenetic inhibition over a timescale of minutes is problematic because prolonged optical stimulation generates heat. Second, our behavioral apparatus has a narrow gap between the ceiling and floor, making tethering difficult. This is now explained in the Results section. The trade-off of using chemogenetics is that we are silencing neurons and not specific projections. However, because we find that NAc- and MeA-projecting neurons have little shared collateralization, we believe the conclusion of divergent pathways still stands. This is now discussed in the Limitations section.

      Additionally, the choice of the mplCoA for manipulation, rather than the more directly implicated anterior and posterior subregions, is not well-explained, which could undermine the conclusions drawn about the topographic organization of plCoA.

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra-plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have now clarified this in the text.

      Despite these concerns, the work provides significant insights into the neural circuits underlying innate behaviors and opens new avenues for further research. The findings are particularly relevant for understanding the neural basis of motivational behaviors in response to sensory stimuli, and the methods used could be valuable for researchers studying similar circuits in other brain regions. If the authors address the methodological issues raised, this work could have a substantial impact on the field, contributing to both basic neuroscience and translational research on the neural control of behavior.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by the Root laboratory and colleagues describes how the posterolateral cortical amygdala (plCoA) generates valenced behaviors. Using a suite of methods, the authors demonstrate that valence encoding is mediated by several factors, including spatial localization of neurons within the plCoA, glutamatergic markers, and projection. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

      Strengths:

      - For a first submission, the manuscript is well constructed and clearly presented, despite containing an abundance of data sets and experimental results.

      - The authors should be commended for their rigorous anatomical characterizations and post hoc analysis. In the field of circuit neuroscience, this is rarely done so carefully, and when it is, new insights are often gleaned, as is the case in the current manuscript.

      - The combination of molecular markers, behavioral readouts and projection mapping together substantially strengthen the results.

      - The focus on this relatively understudied brain region in the context of valence is well appreciated, exciting and novel.

      Weaknesses:

      - Interpretation of the calcium imaging data is very limited and requires additional analysis, and behavioral responses specific to odors should be considered. Where there are neural responses, the corresponding behavioral epochs and the behavioral responses accompanying those neuronal responses should be displayed and analyzed.

      We have now considered this, see response above.

      - The effect of odor habituation is not considered.

      We considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Optogenetic data in the two subregions rely on very careful viral spread and fiber placement. The anatomy results should state the spread of virus along the A-P and D-V axes, with coordinates, to assure readers that the specificity of each sub-zone is real.

      We were careful to exclude animals for improper targeting. The spread of virus is detailed in Figures S3, S8 & S9.

      - The choice of behavioral assays across the two regions doesn't seem balanced and would benefit from more congruency.

      We chose the 4-quadrant assay because this study builds on our prior experiments demonstrating a role for the plCoA in innate behavior. Notably, the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one would not predict a different result. Moreover, the approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. We also examined other non-olfactory behavioral readouts (Figures S3, S8) and did not observe any effect of manipulating these pathways.

      - Rationale for some of the choices of photo-stimulation experiment parameters isn't well defined.

      The parameters for photo-stimulation were based on those used in our past work (Root et al., 2014). We used a frequency gradient from 1 to 10 Hz because odor likely exists as a concentration gradient, and the stimulation gradient was meant to mimic such a potential gradient, though we do not know whether it exists. This range of stimulation frequencies appears to align with the actual firing rates of plCoA neurons (Iurilli et al., 2017).

      Reviewer #3 (Public review):

      Summary:

      Combining electrophysiological recording, circuit tracing, single cell RNAseq, and optogenetic and chemogenetic manipulation, Howe and colleagues have identified a graded division between anterior and posterior plCoA and determined the molecular characteristics that distinguish the neurons in this part of the amygdala. They demonstrate that the expression of slc17a6 is mostly restricted to the anterior plCoA, whereas slc17a7 is more broadly expressed. Through both anterograde and retrograde tracing experiments, they demonstrate that the anterior plCoA neurons preferentially project to the MEA, whereas those in the posterior plCoA preferentially innervate the nucleus accumbens. Interestingly, optogenetic activation of the aplCoA drives avoidance in a spatial preference assay, whereas activating the pplCoA leads to preference. The data support a model in which spatially segregated and molecularly defined populations of neurons and their projection targets carry valence-specific information for odors. The discoveries represent a conceptual advance in understanding plCoA function and innate valence coding in the olfactory system.

      Strengths:

      The strongest evidence supporting the model comes from single cell RNASeq, genetically facilitated anterograde and retrograde circuit tracing, and optogenetic stimulation. The evidence clearly demonstrates two molecularly defined cell populations with different projection targets. Stimulating the two populations produced opposite behavioral responses.

      Weaknesses:

      There are a couple of inconsistencies that may be addressed by additional experiments and careful interpretation of the data.

      Stimulating aplCoA or slc17a6 neurons results in spatial avoidance, and stimulating pplCoA or slc17a7 neurons drives approach behaviors. On the other hand, the authors and others in the field also show that odor-driven responses have no apparent spatial bias associated with odor valence. This discrepancy could be addressed better. One possibility is that odor-evoked responses are recorded from populations outside of those defined by slc17a6/a7, which could be tested by marking activated cells and identifying their molecular markers. A second possibility is that optogenetic stimulation activates a broad set of neurons and does not recapitulate the sparseness of odor responses. It is not known whether sparse optogenetic activation can still drive approach or avoidance behaviors.

      We agree that marking specific genetic or projection-defined neurons could help to clarify whether some neurons have more selective valence responses. However, we are not able to perform these experiments at the moment. We have included new data demonstrating that sparser optogenetic activation evokes behaviors similar in magnitude to those evoked by broader activation (see Figure S4).

      The authors show that inhibiting slc17a7 neurons blocks approach behavior toward 2-PE. Consistent with this result, inhibiting NAc-projecting neurons also inhibits approach responses. However, inhibiting aplCoA or slc17a6 neurons does not reduce the aversive response to TMT, but blocking MEA-projecting neurons does. These latter two results are not consistent with each other. One possibility is that the MEA-projecting neurons may not express slc17a6. The retrograde labeling experiments do not make clear what percentage of MEA- and NAc-projecting neurons express slc17a6 and slc17a7. It is also possible that neurons expressing neither VGluT1 nor VGluT2 could drive aversive or appetitive responses, which may also explain why silencing slc17a6 neurons does not block avoidance.

      We have now performed RNAscope staining on retrogradely labeled tissue to better define this relationship. Although VGluT2 and VGluT1 neurons have biased projections to the MeA and NAc, respectively, there is some nuance, detailed in Figure S10. Generally, MeA-projecting neurons are predominantly VGluT2+, whereas NAc-projecting neurons are primarily VGluT1+, with about 20% expressing both. Some retrogradely labeled neurons (less than 35%) were not detected as VGluT1+ or VGluT2+, suggesting that other populations could also contribute. We agree that the discrepancy between MeA-projection and VGluT2 silencing is likely due to incomplete targeting of the MeA-projecting population with the VGluT2-cre line. This is included in the Discussion section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main:

      (1) For the head-fixed imaging experiments, what is the behavior of the mice during odor exposure? Could the weak reliability of individual neurons be due to a lack of approach or avoidance behavior? Could restricted behavior also explain the lack of valence encoding at the population level?

      We agree that this is a limitation of head-fixed recordings. In the revised manuscript, we attempted to characterize the animals' behavioral responses and looked for correlations with odor representation. Although we did find different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). For example, one might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences in locomotion between odors, as all odors evoked a mixture of responses (Figure S2A-D, text lines 208-232). We then examined responses to odor depending on behavioral state (walking, pausing or fleeing) and did not observe any apparent patterns in odor responses (Figure S2E,F). Lastly, we acknowledge in the text that the lack of valence encoding may be an artifact of head-fixation (see lines 849-857).

      (2) For the optogenetic manipulations of Vglut1 and Vglut2 neurons, why was the injection and fiber targeted to the medial portion of the plCoA, if the hypothesis was that these glutamatergic neuron populations in different regions (anterior or posterior) are responsible for approach and avoidance? 

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra-plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have clarified this in the text (Lines 417-419).

      Could this explain the lack of necessity with the DREADD experiments? 

      For the loss-of-function experiments, a larger volume of virus was injected to cover a larger area, and we confirmed targeting of the appropriate areas. However, it remains possible that the lack of necessity is due to incomplete silencing.

      Further, why was an optogenetic inhibition approach not utilized? 

      Although optogenetic inhibition could plausibly have been used instead, we chose chemogenetic inhibition for two reasons. First, for minutes-long periods of inhibition, optical illumination poses the risk of introducing heat-related effects (Owen et al., 2019). In fact, we first tried optical inhibition, but controls exhibited unusually large variance. Second, chemogenetic inhibition is more feasible in our assay, which has a narrow height between the floor and lid that complicates tethering to an optic fiber. Past experiments overcame this with a motorized fiber retraction system (Root et al., 2014), but this approach is highly variable, with user-dependent effects, so we found chemogenetics to be a more practical strategy. We have added a sentence to explain the rationale (see lines 561-563).

      (3) The specific subregion of the nucleus accumbens that was targeted should be named, as distinct parts of the nucleus accumbens can have very different functions. 

      We attempted to define specific subregions of the nucleus accumbens and found that the plCoA projection is not specific to the shell or core, or to anterior or posterior NAc; rather, it broadly innervates the entire structure. We have added a note about this in the manuscript (see lines 470-471). Given that we did not find notable subregion-specific outputs within the NAc, targeting was directed to the middle region of the NAc, with coordinates stated in the Methods.

      (4) Why was an intersectional DREADD approach used to inhibit the projection pathways, as opposed to optogenetic inhibition? The DREADD approach could potentially affect all projection targets, and the authors might want to address how this could influence the interpretation of the results.

      This is partly addressed above in point 2. As for interpretation, we acknowledge that the intersectional approach silences the neurons projecting to a given target, not the specific projection. Although this may complicate the conclusion, we mapped the collaterals of NAc- and MeA-projecting neurons and found that these neurons do not appreciably project to both targets and have minimal projections to other targets. We have taken care throughout to state that we silence the neurons projecting to a structure, not the projection itself, and we acknowledge this caveat. However, since the MeA- and NAc-projecting neurons appear to be distinct from each other (largely not collateralizing to each other's targets), the conclusion that these divergent pathways are required still stands. We have added discussion of this in the Limitations section (see lines 859-863).

      Minor:

      (1) Line 402 needs a reference.

      We have added the missing reference (now line 441).

      (2) The Supplemental Figure labeling in the main text should be checked carefully.

      Thank you for pointing this out. We have fixed the prior errors.

      (3) Panel letter D is missing from Figure 2.

      This has been fixed.

      Reviewer #2 (Recommendations for the authors):

      Major Concerns, additional experiments:

      - In the calcium imaging experiments, mice were presented with the same odor many times. Overall, responses to odor presentations were quite variable and appear to habituate dramatically (Figure S1F). The general conclusion from these experiments is a lack of consistent valence-specific responses of individual neurons, but I wonder if this conclusion is slightly premature. A few potential explanatory factors may need additional attention. First, despite recording video of the mouse's face during experiments, no behavioral response to any odor is described. Is it possible that these odors do not have the same valence when presented in head-fixed conditions?

      Yes, we agree that this is a possibility. We have added a discussion in the Limitations section (see lines 849-857). We have also added additional behavioral analysis discussed below.

      On trials with neural responses, are there behavioral responses that could be quantified? 

      We have now added data in which we characterize the animals' behavioral responses to look for correlations with odor representation (see lines 208-228). Although we did observe different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). One might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences in locomotion between odors (Figure S2A-D). Next, we examined responses to odor depending on behavioral state (walking, pausing or fleeing) and did not observe any meaningful differences in odor responses (Figure S2E,F). Lastly, we acknowledge that the odor representation may be different in freely moving animals that exhibit dynamic responses to odor (see lines 849-857).

      - Habituation seems to play a prominent role in the neural signals, is there a larger contribution of valence if you look only at the first delivery (or some subset of the 20 presentations) of an odor type for a given trial? 

      Indeed, we considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Is it reasonable to exclude valence encoding as a possibility when largely neurons were unresponsive to the positive valence odors (2PE and peanut) chosen when looking at the average cluster response (Figure 1F)? 

      It is true that we see fewer neurons responding to the appetitive odors (Figure 1H) and smaller average responses within the cluster, but some neurons do respond robustly. If these were valence responses, we would predict that neural responses should be similarly selective, but we do not observe any such selectivity. The sparseness of responses to appetitive odors does cause the average cluster analysis (Figure 1F) to show muted responses to these odors, consistent with the decreased responsivity to appetitive odors. Moreover, single neuron response analysis reveals that a given neuron is not more likely to respond to appetitive or aversive odors with any selectivity greater than chance. For these reasons, we think it is reasonable to conclude an absence of valence responses, which is consistent with the conclusion from another report (Iurilli et al., 2017).

      - While the 4-corner preference and aversion assay is an interesting setup and provides a lot of data for this manuscript, it would be helpful to test additional behaviors to determine whether these circuits are more conserved. As it stands, the manuscript relies on very broad claims using a single behavioral readout. Some attempt to use head-fixed approaches with more defined odor delivery timelines and/or additional valenced behavioral readouts is warranted.

      We appreciate the suggestion but are not able to perform these experiments at the moment. We chose the 4-quadrant assay because it builds on our prior experiments demonstrating a role for the plCoA in innate behavior. Notably, the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one would not predict a different result. The approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. Moreover, we examined other non-olfactory behavioral readouts (Figures S3, S8) and did not observe any effect of manipulating these pathways. Lastly, we have tried to define parameters for head-fixed behavior that would permit correlation of neural responses with behavior, including longer stimulations and closed-loop locomotion control of odor concentration, but were unable to establish parameters that generated reliable behavioral responses. We acknowledge that one limitation of the study is that behavior was tested with only two odors, leaving open whether the circuits are more broadly necessary for responses to other odors.

      Minor comments:

      • Please define PID in the Results when it is first introduced.

      Done (see line 154)

      • Line 412 Figure S5C-N should be Figure S6C-N.

      Fixed. Now Figure S8C-N due to additional figures (see line 451).

      • Throughout the Discussion, it would be helpful if the authors referred to specific Figure panels that support their statements (e.g. lines 654-656: "[...] which is supported by other findings presented here showing that both VGluT2+ and VGluT1+ neurons project to MeA, while the projection to NAc is almost entirely composed of VGluT1+ neurons").

      Thank you for the suggestion. We have added figure references in the discussion.

      • Line 778 "producing" should be "produce".

      Corrected (see line 840)

      • The figures are very busy, especially for all the manipulations. The authors are commended for including each data point, but they might consider a more subtle design (translucent lines only for each animal, and one mean dot with SEM), just to reduce the overall clutter of an already overwhelming figure set. But this is ultimately left to the authors to resolve and style to their liking. 

      Thank you for the suggestion. We have tried some different styles but like the original best.

      Reviewer #3 (Recommendations for the authors):

      If within reach, I suggest that the authors determine the percentage of neurons retrogradely labeled from NAc or MEA that express VGluT1 and VGluT2. 

      We have done this for the middle region of the plCoA, which has the greatest mixture of cell types (see Figure S10, lines 504-517). We find that the MeA-projecting neurons are mostly VGluT2+, with a minority that express both VGluT1 and VGluT2. NAc-projecting neurons are primarily VGluT1+, with about 20% expressing VGluT2 as well.

      It would also be nice to sparsely label aplCoA and pplCoA neurons with ChR2 to see if sparse activation drives approach or avoidance. 

      We agree that it would be useful to vary the sparseness of ChR2 expression to see if it produces similar results. We examined this using sparsely labeled odor ensembles, as previously done (Root et al., 2014). Briefly, we used the Arc-CreER mouse to label TMT-responsive neurons with a cre-dependent ChR2 AAV vector targeted to the anterior or posterior regions, whereas previously we had broadly targeted the entire plCoA. We had established that this labeling method captures about half of the active cells detected by Arc expression, which is on the order of hundreds of neurons rather than the thousands labeled by broad cre-independent expression. Remarkably, the effects are similar in magnitude and not significantly different from those with broader activation of the anterior or posterior domains (see new Figure S4, lines 267-288). It remains possible that a threshold number of neurons is necessary to elicit behavior, but that is beyond the scope of the current study. Nevertheless, these data indicate that the effect of activating anterior and posterior domains is not an artifact of broad stimulation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      We appreciate the positive assessment. We recognize that since all of the work in this manuscript was done in vitro, there are reasonable concerns about the translatability of these data to clinical settings. These results should not directly inform malaria policy, but we hope that these data bring new considerations to the approach for choosing strategic antimalarial combinations. We have modified the manuscript to clarify this distinction.

      Public Reviews

      Reviewer #1 (Public Review):

      We thank the reviewer for their thoughtful summary of this manuscript. It is important to note that DHA-PPQ did show antagonism in RSAs. In this modified RSA, 200 nM PPQ alone inhibited growth of PPQ-sensitive parasites by approximately 20%. If DHA and PPQ were additive, we would expect the addition of 200 nM PPQ to shift the DHA dose-response curve to the left and result in a lower DHA IC50. Please refer to Figure 4a and b for examples of additive relationships in dose-response assays. We observed no significant shift in IC50 values between DHA alone and DHA + PPQ. This suggests antagonism, albeit not to the extent seen with CQ. We have modified the manuscript to emphasize this point. As the reviewer pointed out, it is fortunate that despite being antagonistic, clinically used artemisinin-4-aminoquinoline combinations are effective, provided that parasites are sensitive to the 4-aminoquinoline. It is possible that superantagonism is required to observe a noticeable effect on treatment efficacy (Sutherland et al. 2003 and Kofoed et al. 2003), but that classical antagonism may still have silent consequences. For example, if PPQ blocks some DHA activation, DHA-PPQ might act more like a pseudo-monotherapy. However, as the reviewer pointed out, while our data suggest that DHA-PPQ and AS-ADQ are "non-optimal" combinations, the clinical consequences of these interactions are unclear. We have modified the manuscript to emphasize the latter point.
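
      The additivity argument above can be made concrete with a short calculation. This is an illustrative sketch, not the authors' analysis: it assumes Hill-type single-drug dose-response curves and Loewe additivity, under which a fixed partner dose b satisfies a/IC50_A + b/IC50_B = 1 at the 50% effect level, so the expected IC50 of drug A becomes IC50_A x (1 - b/IC50_B). The Hill slopes and the 10 nM DHA IC50 are hypothetical placeholders; only the ~20% growth inhibition by 200 nM PPQ alone comes from the text.

```python
"""Expected IC50 shift of drug A (e.g., DHA) in the presence of a fixed dose of
drug B (e.g., PPQ) if the pair were Loewe-additive. Illustrative only."""


def fraction_inhibited(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Hill model: fraction of growth inhibited by one drug at concentration conc."""
    return 1.0 / (1.0 + (ic50 / conc) ** hill)


def dose_ratio_for_inhibition(fraction: float, hill: float = 1.0) -> float:
    """Invert the Hill model: conc / IC50 that gives the requested fractional inhibition."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return (fraction / (1.0 - fraction)) ** (1.0 / hill)


def expected_ic50_if_additive(ic50_a: float, partner_dose_ratio: float) -> float:
    """Loewe additivity at the 50% level: a/IC50_A + b/IC50_B = 1.

    partner_dose_ratio is b / IC50_B for the fixed partner dose; it must be < 1,
    otherwise the partner alone already inhibits at least 50%.
    """
    if not 0.0 <= partner_dose_ratio < 1.0:
        raise ValueError("partner_dose_ratio must be in [0, 1)")
    return ic50_a * (1.0 - partner_dose_ratio)


if __name__ == "__main__":
    dha_ic50_nm = 10.0  # hypothetical DHA IC50 (nM), not a value from the manuscript
    for hill in (1.0, 2.0):  # hypothetical Hill slopes for PPQ
        # 200 nM PPQ alone inhibits ~20% of growth (from the text)
        ratio = dose_ratio_for_inhibition(0.20, hill)
        shifted = expected_ic50_if_additive(dha_ic50_nm, ratio)
        print(f"hill={hill}: expected DHA IC50 with PPQ = {shifted:.2f} nM")
```

      Under either assumed slope, additivity predicts a measurable leftward shift (to 7.50 nM for a slope of 1 and 5.00 nM for a slope of 2, starting from the hypothetical 10 nM). The size of the expected shift depends on the partner's Hill slope, so an absent shift is informative only relative to such an expectation.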

      While the Ac-H-FluNox and ubiquitin data point to a likely mechanism for DHA-quinoline antagonism, we agree that there are other possible mechanisms to explain this interaction. We have addressed this limitation in the discussion section. Though we tried to measure DHA activation in parasites directly, these attempts were unsuccessful. We acknowledge that the chemistry of DHA and Ac-H-FluNox activation is not identical and that caution should be taken when interpreting these data. Nevertheless, we believe that Ac-H-FluNox is the best currently available tool to measure "active heme" in live parasites and the best available proxy to assess DHA activation in live parasites. These points are now addressed in the discussion section. Both in vitro and in-parasite studies point to a role for CQ in modulating heme, though an exact mechanism will require further examination. Like the reviewer, we were perplexed by the differences observed between in vitro and in-parasite assays with PPQ and MFQ. We proposed possible hypotheses to explain these discrepancies in the discussion section. Interestingly, our data correlate well with hemozoin inhibition assays, in which all three antimalarials inhibit hemozoin formation in solution, but only CQ and PPQ inhibit hemozoin formation in parasites. In both assays, in-parasite experiments are likely to be more informative for mechanistic assessment.

      It remains unclear why K13 genotype influences RSA values, but not early ring DHA IC50 values. In K13<sup>WT</sup> parasites, both RSA values and DHA IC50 values were increased 3-5 fold upon addition of CQ. This suggests that CQ-mediated resistance is more robust than that conferred by K13 genotype. However, this does not necessarily suggest a different resistance mechanism. We acknowledge that in addition to modulating heme, it is possible that CQ may enhance DHA survival by promoting parasite stress responses. Future studies will be needed to test this alternative hypothesis. This limitation has been acknowledged in the manuscript. We have also addressed the reviewer’s point that other factors, including poor pharmacokinetic exposure, contributed to OZ439-PPQ treatment failure.

      Reviewer #2 (Public Review):

      We appreciate the positive feedback. We agree that there have been previous studies, many of which we cited, assessing interactions of these antimalarials. We also acknowledge that previous work, including our own, has shown that parasite genetics can alter drug-drug interactions. We have added the reviewer's recommended citations to our reference list. Importantly, our work was unique not only for utilizing a pulsing format, but also for revealing a superantagonistic phenotype, assessing interactions in an RSA format, and investigating a mechanism to explain these interactions. We agree with the reviewer that implications from this in vitro work should be drawn cautiously, but hope that this work contributes another dimension to critical thinking about drug-drug interactions for future combination therapies. We have modified the manuscript to temper any unintended recommendations or implications.

      The reviewer notes that we conclude “artemisinins are predominantly activated in the cytoplasm”. We recognize that the site of artemisinin activation is contentious. We were very clear to state that our data combined with others suggest that artemisinins can be activated in the parasite cytoplasm. We did not state that this is the primary site of activation. We were clear to point out that technical limitations may prevent Ac-H-FluNox signal in the digestive vacuole, but determined that low pH alone could not explain the absence of a digestive vacuole signal.

      With regard to the “reproducibility” and “mechanistic definition” of superantagonism, we observed what we defined as a one-sided superantagonistic relationship for three different parasites (Dd2, Dd2 PfCRT<sup>Dd2</sup>, and Dd2 K13<sup>R539T</sup>) for a total of nine independent replicates. In the text, we define that these isoboles are unique in that they had mean ΣFIC50 values > 2.4 and peak ΣFIC50 values >4 with points extending upward instead of curving back to the axis. As further evidence of the reproducibility of this relationship, we show that CQ has a significant rescuing effect on parasite survival to DHA as assessed by RSAs and IC50 values in early rings.

      Reviewer #3 (Public Review):

      We thank the reviewer for their positive feedback. We acknowledge that no combinations tested in this manuscript were synergistic. However, two combinations, DHA-MFQ and DHA-LM, were additive, which provides a baseline for contextualizing the antagonistic relationships. We have previously reported synergistic and additive isobolograms for peroxide-proteasome inhibitor combinations using this same pulsing format (Rosenthal and Ng 2021). These published results are now cited in the manuscript.

      We believe that these findings are specific to 4-aminoquinoline-peroxide combinations and cannot be generalized to antimalarials with different mechanisms of action. Note that the aryl amino alcohols, MFQ and LM, were additive with DHA. Since the mechanisms of action of MFQ and LM are poorly understood, it is difficult to speculate on a mechanism underlying these interactions.

      We agree with the reviewer that while the heme probe may provide some mechanistic insight to explain DHA-quinoline interactions, there is much more to learn about CQ-heme chemistry, particularly within parasites.

      The focus of this manuscript was to add a new dimension to considerations about pairings for combination therapies. It is outside the scope of this manuscript to suggest alternative combinations. However, we agree that synergistic combinations would likely be more strategic clinically.

      An in vitro setup allows us to eliminate many confounding variables in order to directly assess the impact of partner drugs on DHA activity. However, we agree that in vivo conditions are far more complex, and we state this explicitly.

      We agree that in the future, modeling studies could provide insight into how antagonism may contribute to real-world efficacy. This is outside the scope of our studies.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      The key weaknesses identified in this manuscript are described in the 'weaknesses' section of the public review. The major one is the inconsistency around the H-FluNox response in the chemical vs biological experiments. I can't think of a simple experiment to resolve this issue, but it is good that this data is openly provided in the manuscript. I believe there could be more discussion to clarify this limitation with the current study, and the conclusions, and particularly the title, should be softened regarding the mechanism of antagonism being based on heme reactivity.

      We have softened the title and conclusions to take into account the limitations of our studies.

      (1) Please double-check the definitions for isobologram interpretation. In most antimicrobial interaction studies, I see the threshold for antagonism at sumFIC50 of 1.5, or even 2. 1.25 is often interpreted as additive in many studies.

      We acknowledge that different studies use various cutoff values. Our interpretations for additive versus antagonistic versus superantagonistic were based not only on mean ΣFIC50 values, but also isobologram shape. For example, the flat isoboles for MFQ-DHA were clearly distinct from the curved isoboles of PPQ-DHA. It is unclear what cutoff value(s) would be most clinically relevant.
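      To make the cutoff question concrete, the minimal sketch below computes ΣFIC50 for one fixed-ratio point as the sum of each drug's IC50 in the combination divided by its IC50 alone, then labels it against caller-supplied cutoffs. The function names, the IC50 values and the synergy cutoff of 0.8 are hypothetical and purely illustrative; the cutoffs are parameters precisely because different studies use different values, and a single ΣFIC50 number does not capture isobole shape.

```python
def fic50_sum(ic50_a_combo, ic50_a_alone, ic50_b_combo, ic50_b_alone):
    """Return the sum of fractional inhibitory concentrations, SumFIC50.

    Each FIC50 is a drug's IC50 measured in the combination divided by
    its IC50 measured alone.
    """
    fic_a = ic50_a_combo / ic50_a_alone
    fic_b = ic50_b_combo / ic50_b_alone
    return fic_a + fic_b


def classify(sum_fic, synergy_cutoff, antagonism_cutoff):
    """Label one SumFIC50 value using caller-chosen cutoffs (hypothetical helper)."""
    if sum_fic < synergy_cutoff:
        return "synergistic"
    if sum_fic > antagonism_cutoff:
        return "antagonistic"
    return "additive"


# Hypothetical IC50 values (arbitrary units) for one fixed-ratio point.
s = fic50_sum(ic50_a_combo=0.7, ic50_a_alone=1.0, ic50_b_combo=0.6, ic50_b_alone=1.0)
print(round(s, 2))                                                # 1.3
print(classify(s, synergy_cutoff=0.8, antagonism_cutoff=1.25))   # antagonistic
print(classify(s, synergy_cutoff=0.8, antagonism_cutoff=1.5))    # additive
```

      With these illustrative numbers, the same ΣFIC50 of about 1.3 counts as antagonistic under a 1.25 cutoff but additive under a 1.5 cutoff, which is the sensitivity the reviewer raises.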

      (2) For the MFQ-PPQ interaction study, please make it clear that these drugs have very long half-lives (weeks), so the 4 h pulse assay isn't really relevant to their overall activity. It probably shows a slower onset of action, but there is plenty of drug remaining for many days in the clinical scenario, so perhaps the data from the traditional 48h assay is more relevant. The same consideration applies to OZ439, which may impact the interpretation of that data.

      We have now included the half-lives of these compounds in the discussion section. Our intent was to use a pulsing format to make these isobolograms comparable with the other assays. It is important to note that pulses can reveal stronger phenotypes that might be missed with traditional methods. Thus, while 48 h assays may better mimic in vivo conditions, they could also mask important phenotypes.

      Reviewer #3 (Recommendations for the Authors):

      I have included most of my concerns in the public review. Below are some additional specific points for consideration:

      (1) It is expected to include a synergistic combination as a control (e.g., artemisinin + lumefantrine) to contextualize the degree of antagonism observed. The experimental design should show some synergistic profiles in comparison. Adding a few experiments by including a synergistic control is needed.

      Both MFQ-DHA and LM-DHA combinations were additive, which provides context for antagonistic combinations. This is now stated in the results section pertaining to Figure 1. We have also included a reference to our previous publication in which we demonstrated that proteasome inhibitor-peroxide combinations are synergistic to additive using this same pulsing format.

      (2) Consider in vivo validation or pharmacokinetic/pharmacodynamic modeling to strengthen the translational relevance of the findings when it comes to doses and the IC50 correlations.

      We agree that this would be useful to do in future, but it is outside the scope of the current study.

      (3) It would be beneficial to include a discussion section on how the findings are generalizable to different Plasmodium falciparum genotypes (3D7, Dd2, MRA-1284) and their relevance.

      Findings were consistent across three parasite backgrounds depending on PfCRT genotype. This point has been included in the discussion section. The background of these parasites is also provided in Table 1.

      (4) Potential evaluation criteria to understand where certain combinations should be reconsidered can be included as a suggestion for the wider audience.

      Our in vitro studies suggest that pulsing isobolograms would be a useful assay to include when evaluating combination therapies. While we believe that synergistic combinations would be more strategic than antagonistic combinations, we cannot provide evaluation criteria or make recommendations for reconsidering currently used combinations.

      (5) Further elaborate on the mechanistic basis of heme inactivation by quinolines. If data are available, please include more data on the specificity of the process.

      Despite our best efforts, we were unable to evaluate quinoline-heme interactions in parasites. Even in vitro, this interaction has remained elusive for decades. We agree that this would be an important future step towards supporting a specific mechanism for quinoline-DHA antagonism.

    1. The idea of change over time is perhaps the easiest of the C’s to grasp. Students readily acknowledge that we employ and struggle with technologies unavailable to our forebears, that we live by different laws, and that we enjoy different cultural pursuits. Moreover, students also note that some aspects of life remain the same across time.

      It is very important to understand that things differ across both time and culture. Recognizing cultural roots can also help us connect ideas and goals to the things we view as different. For example, we may not make an old recipe using the exact same methods, but rather with modern appliances. In some ways the recipe is different, but its root is the same.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2025-03280

      Corresponding author(s): Stephan Gruber

      1. General Statements [optional]

      First, we would like to thank the editor at Review Commons for the efficient handling of our manuscript. We also apologize for our delayed response.

      We are grateful to all three reviewers for their careful evaluation of our work and for their constructive feedback, which will provide a valuable basis for improving the figures and the text, as described below. We expect to be able to complete the revision following the plan described below quickly.

      We note that the reviewer reports (Rev. #1 and Rev. #3) made us realize that the manuscript text was misleading on the following point. Although we used the purified ATP hydrolysis–deficient Smc protein for sybody isolation, this does not restrict the selection to a specific conformation. As described in detail in Vazquez-Nunez et al. (Figure 5), this mutant displays the ATP-engaged conformation only in a minor fraction of complexes (~25% in the presence of ATP and DNA), consistent with prior in vivo observations reported by Diebold-Durand et al. (Figure 5). Rather than limiting the selection to a particular configuration, our aim was to reduce the prevalence of the predominant rod state in order to broaden the range of conformations represented during sybody selection. Consistent with this interpretation, only a small number of isolated sybodies show strong conformation-specific binding in the presence or absence of ATP/DNA, as observed by ELISA (now included in the manuscript). We will revise the manuscript text accordingly to clarify this point.

      2. Description of the planned revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Gosselin et al. develop a method to target protein activity using synthetic single-domain nanobodies (sybodies). They screen a library of sybodies, using ribosome/phage display, generated against the Bacillus Smc-ScpAB complex. Specifically, they use an ATP hydrolysis-deficient mutant of Smc so as to identify sybodies that will potentially disrupt Smc-ScpAB activity. They next screen their library in vivo, using growth defects in rich media as a read-out for perturbation of Smc activity. They identify 14 sybodies that mirror the smc deletion phenotype, including defective growth in fast-growth conditions as well as chromosome segregation defects. The authors use a clever approach, making chimeras between Bacillus and S. pneumoniae Smc to narrow down the specific regions within the Bacillus Smc coiled coil that are likely targets of the sybodies. Using ATPase assays, they find that the sybodies either impede DNA-stimulated ATP hydrolysis or hyperactivate ATP hydrolysis (even in the absence of DNA). The authors propose that the sybodies may lock Smc-ScpAB in the "closed" or "open" state via interaction with the specific coiled-coil region on Smc. I have a few comments that the authors should consider:

      Major comments:

      1. Lack of direct in vitro binding measurements: The authors do not provide measurements of sybody affinities, binding/unbinding kinetics, or stoichiometries with respect to Smc-ScpAB. Additionally, do the sybodies preferentially interact with Smc in the ATP/DNA-bound state? And do the sybodies affect the interaction of ScpAB with Smc? It is understandable that such measurements for 14 sybodies are challenging, and not essential for this study. Nonetheless, it would be informative to have a biochemical characterization of the sybody interaction with the Smc-ScpAB complex for at least 1-2 of the candidate sybodies described here.

      We agree with the reviewer that adding such data would be reassuring and that obtaining solid data using purified components is not easy even for a smaller selection of sybodies. We have data that show direct binding of Smc to sybodies by various methods including ELISA, pull-downs and by biophysical methods (GCI). Initially, we omitted these data from the manuscript as we are convinced that the mapping data obtained with chimeric SMC proteins is more definitive and relevant. During the revision we will incorporate the ELISA data showing direct binding and also indicating a lack of preference for a specific state of Smc.

      2. Many modes of sybody binding to Smc are plausible. The authors provide an elaborate discussion of sybodies locking the Smc-ScpAB complex in open/closed states. However, in the absence of structural support, the mechanistic inferences may need to be tempered. For example, is it not also possible for the sybodies to bind the inner interface of the coiled coil, resulting in steric hindrance to coiled-coil interactions? It is also possible that sybody interaction disrupts the ScpAB interaction (as data ruling this possibility out have not been provided). Thus, other potential mechanisms would be worth considering/discussing. In this direction, did AlphaFold reveal any potential insights into putative binding locations?

      We have attempted to map the binding by structure prediction; however, so far, even the latest versions of AlphaFold are not able to clearly delineate the binding interface. Indeed, many modes of binding are possible, including disruption of the ScpAB interaction. However, since the main binding site is located on the Smc coiled coils, the latter scenario would likely be an indirect consequence of an altered coiled-coil configuration, consistent with our current interpretation.

      3. Sybody expression in vivo: Have the authors estimated sybody expression in vivo? Are they all expressed to similar levels?

      We have tagged selected sybodies with gfp and performed live cell imaging. This showed that they are all roughly equally expressed and that they localize as foci in the cell presumably by binding to Smc complexes loaded onto the chromosome at ParB/parS sites. We will include this data in the revised version of the manuscript.

      4. Sybodies should phenocopy the ATP hydrolysis mutant of Smc: The sybodies were screened against an ATP hydrolysis-deficient mutant of Smc, with the rationale that these sybodies would interfere with this step of the Smc duty cycle. Does the expression of the sybodies in vivo phenocopy the ATP hydrolysis-deficient mutant of Smc? Could the authors consider any phenotypic read-outs that can indicate whether sybody action results in an smc-null effect or specifically an ATP hydrolysis-deficient effect?

      As alluded to above, we think that our selection gave rise to sybodies that bind various, possibly multiple, Smc conformations. Consistent with this idea, the phenotypes resemble those of the null mutant rather than the ATP hydrolysis-defective EQ mutant, which displays even more severe growth phenotypes. We will add the following notes to the text:

      “These conditions favour ATP-engaged particles alongside the typically predominant ATP-disengaged rod-shaped state (add Vazquez Nunez et al., 2021).”

      “ELISA data confirm that nearly all clones bind Smc-ScpAB; however, their binding shows little or no dependence on the presence of ATP or DNA.”

      Minor comments:

      1. It was surprising that no sybodies were found that could target both Bacillus and S. pneumoniae Smc, for example sybodies targeting the head regions of Smc that might work in a more universal manner. Could the authors comment on the coverage of the sybodies across the protein structure?

      It is rather common that sybodies (like antibodies and nanobodies) exhibit strong affinity differences between highly conserved proteins (> 90 % identity). The underlying reasons for such strong discrimination are i) the location of less conserved residues primarily at the target protein surface and ii) the large interaction interface between sybody and target, which offers multiple vulnerabilities to disturbance, in particular through bulky side chains resulting in steric clashes. Another frequently observed phenomenon is sybody binding to a dominant epitope, which also often applies to nanobodies and antibodies. A prominent example is the set of dominant epitopes on the SARS-CoV-2 RBD.

      2. Growth curves (Fig. S3) show a large jump in growth recovery under sybody induction conditions. Could the authors address this observation here and in the text?

      We suppose that this recovery represents suppressor mutants and/or (more likely) improved growth in the absence of functional Smc during nutrient limitation (see Gruber et al., 2013 and Wang et al., 2013). We will add this statement to the text.

      3. L41: Sentence correction: "Loop" can be removed.

      Ah, yes, sorry for this confusing error. Thank you.

      4. L525: bsuSmc 'E': the extra E can be removed.

      To do. Thank you.

      5. References need to be properly formatted.

      To do. Thank you.

      6. The authors should add to the figure legend of Fig. 1i details on the representation of the purple region, and explain the grey strokes indicating the orientation of the loop.

      To do.

      7. How many cells were analysed in the cell biological assays? Legends should include this information.

      To be included.

      Reviewer #1 (Significance (Required)):

      Overall, this is an impressive study that uses an elegant strategy to find inhibitors of protein activity in vivo. The manuscript is clearly written and the experiments are logical and well designed. The findings from the study will be significant to the broad fields of genome biology, synthetic biology and SMC biology. Specifically, the coiled-coil domain of SMC proteins has been proposed to be of high functional value. The authors have elegantly identified key coiled-coil regions that may be important for function and, in parallel, demonstrated the potential of synthetic sybodies/designed binders for inhibiting protein activity.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Review: "Single Domain Antibody Inhibitors Target the Coiled Coil Arms of the Bacillus subtilis SMC complex" by Ophélie Gosselin et al., Review Commons RC-2025-03280. Structural Maintenance of Chromosome proteins (SMCs), a family of proteins found in almost all organisms, are organizers of DNA. They accomplish this by a process known as loop extrusion, wherein double-stranded DNA is actively reeled in and extruded into loops. Although SMCs are known to have several DNA-binding regions, the exact mechanism by which they facilitate loop extrusion is not understood but is believed to entail large conformational changes. There are currently several models for loop extrusion, including one wherein the coiled coil (CC) arms open, but there is a lack of insightful experimentation and analysis to confirm any of these models. The work presented aims to provide much-needed new tools to investigate these questions: conformation-selective sybodies (synthetic nanobodies) that are likely to alter the CC opening and closing reactions. The authors produced, isolated, and expressed sybodies that specifically bound to Bacillus subtilis Smc-ScpAB. Using chimeric Smc constructs, where the coiled coils were partly replaced with the corresponding sequences from Streptococcus pneumoniae, the authors revealed that the isolated sybodies all targeted the same 4N CC element of the Smc arms. This region is likely disrupted by the sybodies either by stopping the arms from opening (correctly) or by forcing them to stay open (enough). Disrupting these functional elements is suggested to cause the lethal phenotype of defective Smc-dependent chromosome organization, implying that arm opening and closing is a key regulatory feature of bacterial Smc-ScpAB. In summary, the authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled-coil arms, and shown that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes.

      Some specific comments:

      Line 75: "likely stabilizing otherwise rare intermediates of the conformational cycle." Sorry, why is that being concluded? Why not stabilizing longer-lived conformations?

      We will clarify this statement!

      Line 89: Sorry, possibly our lack of understanding: why first ribosome and then phage display?

      Ribosome display allows around 10^12 sybodies to be screened per selection round (the library size is technically unrestricted), whereas for phage display the library size is restricted to around 10^9 sybodies, because producing a phage library requires transformation of the phagemid plasmid into E. coli, which introduces a diversity bottleneck. This is why the sybody platform starts with ribosome display. It switches to phage display from round 2 onwards because the output of the initial round of ribosome display is around 10^6 sybodies, which can easily be transferred into the phage display format. Phage display is used to minimize selection biases. For more information, please consult the original sybody paper (PMID: 29792401).

      Line 100: Why was only lethality selected? Less severe phenotypes not clear enough?

      Yes, colony size is more difficult to score robustly, as the sizes of individual transformant colonies can vary quite widely. Moreover, the number of isolated sybodies was already at the limit of what we could analyze further.

      Line 106: Could it be tested somehow if convex and concave library sybodies fold in Bs?

      We did not focus on the non-functional sybody candidates, and only sybodies of the loop library turned out to have functional consequences at the cellular level. Notably, we will include gfp imaging showing that non-lethal sybodies are expressed at levels similar to those of toxic sybodies. Given the identical scaffolds of concave and loop sybodies (they differ only in their CDR3 length), we expect that the concave sybodies fold in the cytoplasm of B. subtilis. For the convex sybodies, which have a different scaffold, this will be tested.

      Line 125: Could Pxyl be repressed by glucose?

      To our knowledge and experience, repression by glucose (catabolite repression) does not work well in this context in B. subtilis.

      Line 131: The SMC replacement strain is a cool experiment and removes a lot of doubts!

      Thank you! (we agree 😊)

      Line 141: The mapping is good and looks reliable, but looks and feels like a tour de force? Of course, some cryo-EM would have been lovely (lines 228-229 understood, it has been tried!).

      Yes, we have made several attempts at structural biology. Unfortunately, Smc-ScpAB is not well suited for cryo-EM in our hands and crystallography with Smc fragments and sybodies did not yield well-diffracting crystals.

      Line 179: Mmmh. Do we not assume DNA binding on top of the dimerised heads to open the CC (clamp)?

      We will clarify the text here.

      Line 187: Having sybodies that presumably keep the CC together (closing) and some that do not allow them to come together correctly (opening) is really cool and probably important going forward.

      Thank you!

      Figure 1 Ai is not very colour-blind friendly.

      We are sorry for this oversight. We will try to make the color scheme more inclusive. Thank you for the notification.

      Optional: did the authors see any spontaneous mutations emerge that bypass the lethal phenotype of sybody expression?

      No, we did not observe spontaneous mutations suppressing the phenotype, possibly due to the limited number of cell generations observed. We tried to avoid suppressors by limiting growth, but selecting for suppressors may indeed be a good future approach to further fine-map the binding sites and to obtain insights into the mechanism of inhibition.

      Optional: we think it would be nice to try some biochemical experiment with BMOE/cysteine-crosslinked B. subtilis Smc in the mid-region (4N or next to it) of the Smc coiled coils to try to further strengthen the story. Some of the authors are experts in this technique and strains might already exist?

      We have indeed tried to study the impact of sybody binding on Smc conformation by cysteine cross-linking. However, we were not convinced by the results and thus prefer not to draw any conclusions from them. We will add a corresponding note to the text.

      Reviewer #2 (Significance (Required)):

      The authors present a new method for trapping bacterial Smc's in certain conformations using synthetic antibodies. Using these antibodies, they have pinpointed the (previously suggested) 4N region of the coiled coils as an essential site for the opening and closing of the Smc coiled coil arms and that hindering these reactions blocks Smc-driven chromosomal organization. The work has important implications for how we might elucidate the mechanism of DNA loop extrusion by SMC complexes. Thank you!

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Gosselin et al. use the sybody technology to study the effects of in vivo inhibition of the Bacillus subtilis SMC complex. Smc proteins are central DNA-binding elements of several complexes that are vital for chromosome dynamics in almost all organisms. Sybodies are selected from three different libraries of single-domain antibodies, using the "transition state" mutant Smc. They identify 14 such sybodies that are lethal when expressed in vivo, because they prevent proper function of Smc. The authors present evidence suggesting that all obtained sybodies bind to a coiled-coil region close to the Smc "neck", and thereby interfere with the Smc activity cycle, as evidenced by defective ATPase activity when Smc is bound to DNA. The study is well done and well presented, and shows that the strategy is very potent for quickly turning off a protein's function in vivo, much more quickly than depleting the protein.

      The authors also draw conclusions on the molecular mode of action of the SMC complex. They provide a number of suggestive experiments but, in my view, mostly indirect evidence for such a mechanism.

      My main criticism is that the authors have used a single, catalytically trapped form of Smc. They speculate on why they only obtain sybodies from one library, and then only identify sybodies that bind to a rather small part of the large Smc protein. While the approach is definitely valuable, it seems to be biased towards sybodies that bind to Smc in a quite special way. Using wild-type Smc would be interesting, to make more robust statements about the action of sybodies potentially binding to different parts of Smc.

      As explained above, we are quite confident that the Smc ATPase mutation did not bias the selection in an obvious way. The surprising bias towards coiled-coil binding sites likely has other explanations; the coiled coils probably form a preferred epitope recognized by sybodies.

      Line 105: "Alternatively, the other libraries did not produce good binders or these sybodies were not stably expressed in B. subtilis." This could be tested using Western blotting; I am assuming sybody antibodies are commercially available. However, this test is not important for the overall study, it would just clarify a minor point.

      While there are antibody fragments available to augment the size of sybodies (PMID: 40108246), these recognize 3D-epitopes and are thus not suited for Western blotting. We did not follow up on the negative results much, but would like to point out again that there are several biases that likely emerge for the same reason (bias to library, bias to coiled coil binding site). If correct, then likely few other sybodies are effectively lethal in B. subtilis, with the exception of the ones isolated and characterized. We have added this notion to the manuscript. We have also tested the expression of non-lethal sybodies by gfp-tagging and imaging. These results will be included in the revision.

      Fig. 2B: it is odd to count Spo0J foci per cell, as it is clear from the images that several origins must be present within the fluorescent foci. I am fine with the "counting" method, as the images show a clear segregation defect when sybodies are expressed; I believe the authors should state, though, that this is not a replication block, but a failure to segregate origins.

      We agree that this is an important point and will add a corresponding comment to the text.

      Testing binding sites of sybodies to the SMC complex is done in an indirect manner, by using chimeric Smc constructs. I am surprised that the authors have not used in vitro crosslinking: the authors can purify Smc, and mass spectrometry analyses would identify sites where sybodies are crosslinked to Smc. Again, I am fine with the indirect method, but the authors make quite concrete statements on binding based on the non-inhibition of chimeric Smc; I can see alternative explanations why a chimera may not be targeted.

      We have made several attempts at testing direct binding, with mixed outcomes, and decided not to include those results in light of the stronger and more relevant in vivo mapping. However, we will add ELISA results and briefly discuss grating-coupled interferometry (GCI) data and pull-downs.

      Smc-disrupting sybodies affect the ATPase activity in one of two ways. Again, these are rather indirect experiments. This leads to the discussion section "Revealing Smc arm dynamics through synthetic binders". The authors are quite careful in stating that their experiments are suggestive of a certain mode of action of Smc, which is warranted.

      In line 245, they state: "More broadly, the study demonstrates how synthetic binders can trap, stabilize, or block transient conformations of active chromatin-associated machines, providing a powerful means to probe their mechanisms in living cells." This is of course a possible scenario for the use of sybodies, but the study does not really trap Smc in a transient conformation, or at least this is not clearly shown.

      We agree and will carefully rephrase this statement. Thank you.

      Overall, it is an interesting study, with a well-presented novel technology, and a limited gain of knowledge on SMC proteins.

      We respectfully disagree with the last point, since our unique results highlight the importance of the Smc coiled coils, which are otherwise largely neglected in the SMC literature, likely (at least in part) due to the mild effect of single point mutations on coiled-coil dynamics.

      Reviewer #3 (Significance (Required)):

      The work describes the generation and use of synthetic single-domain antibodies (sybodies) to interfere with the function of proteins in bacteria. Using this technology for the SMC complex, the authors demonstrate that they can obtain a significant number of binders that target a defined region in SMC and thereby interfere with the ATPase cycle.

      The study does not present a strong gain of knowledge of the mode of action of the SMC complex.

      As pointed out above, we respectfully disagree with this assertion.


      3. Description of the revisions that have already been incorporated in the transferred manuscript


      4. Description of analyses that authors prefer not to carry out


      As pointed out above, there are a few minor points that we prefer not to address experimentally. In particular, we do not consider it necessary to determine the expression levels of sybodies that were non-inhibitory. We also wish to note that we attempted to obtain structural and additional biochemical data and, to that end, performed cryo-EM, crystallography and cysteine cross-linking experiments. Unfortunately, we did not obtain sybody complex structures, and the cross-linking data were not conclusive. We also wish to note that the first author has finished her PhD and left the lab, which limits our capacity to add further experiments. However, as the reviewers also pointed out, the main conclusions are already well supported by the data.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewers

      We thank the Reviewers for their appreciative comments (Reviewer 1: “first time that a well-established existing mathematical model of signaling response extended and applied to heterogeneous ligand mixtures”) and constructive suggestions for improvement. In this extensive revision, we have not only addressed the suggestions comprehensively but also extended our analysis of signaling antagonism to all doses and to the single-cell level using novel computational workflows. This resulted in the discovery of several mechanisms of antagonism and synergy that are dose-dependent and depend on the cell-specific state of the signaling network, thereby manifesting in only a subset of cells.

      In response to the Reviewers' comments, we have made substantial revisions to improve clarity, rigor, and biological interpretation. Below we briefly summarize the main concerns raised by Reviewers 1-3 and how we have addressed them.

      • We have rewritten the Methods section to clarify our approaches. We have also added explanations of the methodology and its rationale to the main text to improve readability and comprehensiveness (addressing Reviewer #1’s comments). This includes explaining and justifying the signaling codon approach (Reviewer #1), our core-module parameter matching methodology and its discussion (Reviewer #1, point 11; Reviewer #2, point 1), and the model schematic (Reviewer #1, point 5).
      • For one of our major conclusions, that macrophages may distinguish stimuli in the context of ligand mixtures, we have validated the results with experiments, which increases confidence in this conclusion (Reviewer #2, point 3; Reviewer #3, point 2).
      • We have updated the model of CpG-pIC competition to use Michaelis–Menten kinetics for competing substrates rather than introducing new free parameters. This change removes parameter freedom for fitting combinatorial conditions, yielding a more constrained and mechanistically grounded model whose predictions align better with the experimental data (updated Figures 2 and S2; Reviewer #2, point 2).
      • We have also addressed all other editorial and clarification-related concerns, as detailed in our point-by-point response below. In addition, we have extended the scope of the manuscript by analyzing ligand combinations across a broad dose range, from non-responsive to saturating conditions. This led to several additional discoveries. For example, we show that ultrasensitive IKK activation can underlie synergy between ligands at low doses. Conversely, beyond the CpG-poly(I:C) antagonism, we find that competition for CD14 by LPS and Pam can generate antagonism between these ligands within specific dose ranges.
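      The low-dose synergy point can be illustrated with a toy calculation. The sketch below is not the fitted model: it assumes, hypothetically, that the signals from two receptor modules add upstream of IKK and that IKK activation is a Hill function of that summed signal, with an arbitrary EC50 of 1 and Hill coefficient n. With n > 1 (ultrasensitivity), two weak inputs together exceed the sum of their separate responses; at saturating inputs, or with n = 1, the combination is sub-additive.

```python
def hill(x, ec50=1.0, n=3.0):
    """Fractional IKK activation as a Hill function of upstream signal x (toy units)."""
    return x**n / (ec50**n + x**n)


def excess_over_additivity(a, b, ec50=1.0, n=3.0):
    """Response to the summed input a + b minus the sum of the two single-input responses.

    Positive values mean the pair is synergistic relative to response additivity;
    negative values mean it is sub-additive.
    """
    combined = hill(a + b, ec50, n)
    separate = hill(a, ec50, n) + hill(b, ec50, n)
    return combined - separate


low_dose = excess_over_additivity(0.3, 0.3)        # two weak inputs, ultrasensitive
high_dose = excess_over_additivity(2.0, 2.0)       # two saturating inputs
graded = excess_over_additivity(0.3, 0.3, n=1.0)   # non-ultrasensitive control (n = 1)
```

      Here synergy is measured against response additivity; other baselines, such as Bliss independence, would give different numbers.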

      Importantly, such antagonism or synergy is not evident in all cells of the population and may therefore be missed by studies of mean behavior. Using our new computational workflow at single-cell resolution, we identify the conditions that the signaling network state must meet for antagonism or synergy to occur.

      Further, we examine the hypothesis that such signaling pathway interactions affect stimulus-response specificity in combinatorial stimulus conditions. By comparing models with and without this antagonism, we demonstrate that antagonistic interactions can improve stimulus-response specificity in complex ligand mixtures.

      These additional analyses provide a new mechanistic understanding of cellular information processing and elucidate how synergy and antagonism can mechanistically shape signaling fidelity in response to complex ligand mixtures.

      Point-by-Point Response

      Reviewer #1

      Evidence, reproducibility and clarity

      The authors extend an existing mathematical model of NFkB signalling under stimulation of various single receptors to a model that describes responses to stimulation of multiple receptors simultaneously. They compare this model to experimental data derived from live-cell imaging of mouse macrophages, and modify the model to account for potential antagonism between TLR3 and TLR9 response due to competition for endosomal transport. Using this framework they show that, despite distinguishability decreasing with increasing numbers of heterogeneous stimuli, macrophages are still able in principle to distinguish these to a statistically significant degree. I congratulate the authors on an interesting approach that extends and validates an existing mathematical model, and also provides valuable information regarding macrophage response.

      Response: We thank the reviewer for this appreciative assessment and for the careful reading of our work. The constructive comments helped us substantially improve the rigor and clarity of the manuscript.

      In addition to revising the text for clarity, we have extended our analysis to systematically investigate dose-response behavior for each ligand pair. Using the experimentally validated model, we explored 10 ligand pairs across a range of doses from non-responsive to saturating. This allowed us to identify mechanistic regimes in which synergy and antagonism arise at the single-cell level. In particular, we found that low-dose synergy can be explained by ultrasensitive IKK activation (Figure 4 and corresponding supplementary figures), while antagonism can emerge from competition for shared components such as CD14 (Figure 5 and corresponding supplementary figures). We further show that antagonism can enhance condition distinguishability in ligand mixtures, thereby contributing to stimulus-response specificity (Figure 5 and corresponding supplementary figures).

      There are no major issues affecting the scientific conclusions of the paper; however, the lack of detail surrounding the mathematical model and the 'signaling codons' that are used throughout the paper makes it difficult to read. This is exacerbated by the fact that I was unable to find Ref 25, which apparently describes the model; however, I was able to piece together the essential components from the description in Ref 8 and the supplementary material.

      Response: This comment helped us to improve the writing. We apologize that the key Ref 25 was not yet publicly available; it has now been published in Nature Communications. In addition, we have added more details on the mathematical model and the signaling codons in the Results and Methods. Please see below for details.

      Lots of the minor comments below stem from this, however there are also a few other places that could benefit from some additional clarification and explanation.

      Significance: 1. '...it remains unclear complex...' -> '...it remains unclear whether complex...'

      Response: We have rewritten the Significance section (now the Synopsis).

      Introduction: 2. 'temporal dynamics of NFkB' - it would be good to be more concrete regarding the temporal dynamics of what aspect of this (expression, binding, conformation, etc), if possible.

      Response: It refers to the presence of NFκB in the nucleus, which represents active NFκB capable of activating gene expression. We have clarified this (Lines 59-61, Introduction paragraph 2): “Upon stimulation, NFκB translocates into the nucleus, … activating immune gene expression (10, 15–19).”

      'signaling codons' - the behaviour of these is key to the entire paper, so even if they are well described in the reference, it would be good to have a short description as early as possible so that the reader can get an idea in their mind what exactly is being discussed here. Later, it would be good to have concrete description of exactly what these capture.

      Response: We thank the reviewer for this comment. We have added a paragraph early in the Introduction describing the concept of Signaling Codons, which allow quantitative characterization of stimulus-response-specific NFκB dynamics (Lines 60-67). We have also added a more concrete description of Signaling Codons in the Results, together with an illustration (Lines 169-175, Figure S2B).

      'This challenge...population of macrophages' - this seems a bit out of place, and is a bit of a run on sentence, so I suggest moving this to the next paragraph and working it into the first sentence there '...regulatory mechanisms, and this challenge could be addressed with a model parameterised to account for heterogeneous...Early models ...', or something similar.

      Response: We thank the reviewer for this suggestion and have revised the text accordingly, which improves the logical flow (Lines 87-88).

      Ref 25: I can't find a paper with this title anywhere, so if it's an accepted preprint then it would be good to have this available as well. That said, I still think it would be difficult to grasp the work done in this paper without some description of the mathematical model here, at least schematically, if not the full set of ODEs. For example, there are numerous references to how this incorporates heterogeneous responses, the 'core module', etc, and the reader has no context of these if they aren't familiar with the structure of the model.

      Response: We apologize that Ref 25 was not yet on PubMed. It has now been published, and we have updated the reference. This comment also helped us improve the writing: we have added a description of the mathematical model to the Introduction (Lines 95-105) and the Results (Lines 129-141), and a detailed description to the Methods (Simulation of heterogenous NFκB dynamical responses).

      We have also added the schematic of the model topology in Figure S1 (adapted from previous publications Guo et al 2025, Adelaja et al 2021) to make sure the paper is self-contained.

      'A key challenge which is...' -> 'A key challenge is...'

      Response: We have revised the Introduction and removed this sentence.

      'With model simulation ...' -> a bit of a run on sentence, I suggest breaking after 'conditions'.

      Response: We have revised the Introduction and removed this sentence.

      Results:

      1. This section would benefit from a more in-depth description of the model and experimental setup. In particular, for the experiment, the reader never really knows what the workflow is, what the model ingests as input, or what the predictions are of.

      Response: This comment helped us improve clarity. We have added an in-depth description of the model and experimental setup to the Results (Lines 129-141), and we reproduce the corresponding revision below for the reviewer’s reference.

      “This mechanistic model was trained on single-ligand response experimental datasets, capturing the single-ligand stimulus-response specificity of the macrophage population while accounting for cellular heterogeneity. Specifically, quantitative NFκB dynamic trajectory data from hundreds of single macrophages responding to five single ligands (TNF, pIC, Pam, CpG, LPS) at 3-5 doses were obtained from live-cell imaging experiments. The mathematical model (Figure S1) consists of a 52-dimensional system of ordinary differential equations, comprising 52 intracellular species, 101 reactions, and 133 parameters, and is divided into five receptor modules, each responding to its cognate ligand, and the IKK-NFκB core module, which contains the prominent IκBα negative feedback loop. By fitting the single-cell experimental data set with a nonlinear mixed-effects statistical model (coupled with the 52-dimensional NFκB ODE model), the parameter distributions for the single-cell population were inferred. Analyzing the resulting simulated NFκB trajectories with information-theoretic and machine learning classification analyses confirmed that the virtual cell model simulations reproduced key SRS performance characteristics of live macrophages.”

      '..mechanistic model was trained...' - trained in this study, or in the previous referenced study?

      Response: The mechanistic model was trained in a previous study (Guo et al. 2025, Nature Communications), and we have clarified this in the revision (Lines 127-129).

      1. 'determined parameter distributions' - this is where it would be good to have more background on the model. What parameters are these, and what do they correspond to biologically? It would also be nice to see in the methods or supplementary material how this is done (maximum likelihood, etc).

      Response: This comment helped us clarify the predetermined parameter distributions. We have revised the Methods to include this information (Simulation of heterogenous NFκB dynamical responses, paragraph 3), and we reproduce the corresponding text below for the reviewer’s convenience.

      “The ODE model was then fitted to the population of single-cell trajectories to recapitulate the cell-to-cell heterogeneity in the experimental data (2). This is achieved by solving the nonlinear mixed-effects (NLME) model with the stochastic approximation expectation-maximization (SAEM) algorithm (3–6). Seventeen parameters were estimated. Within the core module, the estimated parameters included the rates governing TAK1 activation (k52, k65), the time delays of IκBα transcription regulated by NFκB (k99, k101), and the total cellular NFκB abundance (tot NFκB). Within the receptor module, receptor synthesis rates (k54 for TNF, k68 for Pam, k85 for CpG, k35 for LPS, k77 for pIC), degradation rates of the receptor–ligand complexes (k56, k61, k64 for TNF; k75 for Pam; k93 for CpG; k44 for LPS; k83 for pIC), and endosomal uptake rates (k87 for CpG; k36 and k40 for LPS; k79 for pIC) were fitted. All remaining parameters were fixed at literature-suggested values (1). The single-cell parameters inferred from experimental individual-cell trajectories then served as empirical distributions for generating the new dataset (see Supplementary Dataset 2).”
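      For readers less familiar with mixed-effects fitting, a generic NLME formulation of the kind SAEM is typically applied to is sketched below. This is an illustrative assumption, not the exact specification used here: log-normal random effects and an additive Gaussian error model are common defaults, and the error model actually used is not stated in this response.

```latex
\begin{aligned}
y_{ij} &= f\left(t_{ij};\,\boldsymbol{\theta}_i\right) + \varepsilon_{ij},
  &\qquad \varepsilon_{ij} &\sim \mathcal{N}\left(0,\sigma^{2}\right),\\
\log\theta_{ik} &= \log\theta_{\mathrm{pop},k} + \eta_{ik},
  &\qquad \boldsymbol{\eta}_{i} &\sim \mathcal{N}\left(\mathbf{0},\Omega\right),
\end{aligned}
```

      Here y_ij is the measured NFκB signal of cell i at time t_ij, f is the ODE model output, θ_ik is the k-th estimated parameter of cell i (for example k52 or tot NFκB), and SAEM maximizes the marginal likelihood over θ_pop, Ω, and σ². The per-cell estimates θ_i are what form the empirical parameter distributions described above.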

      'matching cells with similar core model...' - it's difficult to follow the logic as to why this is done, so I think this needs to be a little clearer. My guess would be that the assumption is that simulated cells with similar 'core' parameters have a similar downstream signalling response, and therefore the receptors can be 'transplanted'. So it would be nice to see exactly what these distributions are and what the effect of a bad match would be.

      Response: We thank the reviewer for this comment. In the revision, we explain the rationale for matching cells with similar core-module parameters (Lines 145-152).

      Previous work determined parameter distributions for only the cognate receptor module (and the core module) that provided the best fit for the relevant single-ligand experimental data (Figure 1A, Step 1), but the parameter values of the other receptor modules were not determined. To simulate stimulus responses to two or more ligands, we imputed the other ligand-receptor module parameters using shared core-module parameters as common variables and employing nearest-neighbor hot-deck imputation (35). In this setup, the core module functions as an “anchor” to harmonize two or more receptor-specific parameter distributions.

      This nearest-neighbor hot-deck imputation approach (the core module matching method) was shown to outperform other approaches, including random matching and rescaled-similarity matching (Guo et al. 2025, Supplementary Figure S11). For the reviewer’s convenience, we have also appended the corresponding figure below.

      Figure S11 from (Guo et al., 2025). Assessment of matching techniques for predicting single-cell responses to various ligand stimuli (a-d). Heatmaps illustrating the Wasserstein distance between the signaling codon distributions predicted by the model and those observed in experiments. The analysis employs four distinct matching methods to align the five ligand-receptor module parameters: (a) “Random Matching”, (b) “Similarity Matching” (the method used in our study), (c) “Rescaled-Similarity Matching”, and (d) “Sampling Approximated Distribution”. In the heatmaps, rows represent signaling codons, columns denote ligands, and the color intensity indicates the Wasserstein distance, providing a visual metric of similarity between model predictions and experimental data. e-f. Histogram of the average Wasserstein distance between the model-predicted and experimentally observed signaling codon distributions, summarized across signaling codons (e) and ligands (f).
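      The Wasserstein distance in this caption compares two one-dimensional distributions of a single signaling codon. As a minimal sketch (pure Python; the helper names `wasserstein_1d` and `mean_codon_distance` and the codon keys are our own, not code from the study), the first Wasserstein distance between two empirical samples equals the area between their empirical CDFs; `scipy.stats.wasserstein_distance` computes the same quantity. The averaging helper mirrors the summary in panels e-f only under the assumption that a plain mean over codons was used.

```python
def wasserstein_1d(u, v):
    """First Wasserstein (earth mover's) distance between two 1-D empirical samples.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v
    are the empirical CDFs; sample sizes may differ.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Count how many samples of each distribution lie at or below `left`.
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        # Between consecutive support points both CDFs are constant.
        total += abs(i / len(u) - j / len(v)) * (right - left)
    return total


def mean_codon_distance(predicted, observed):
    """Average W1 over codons; both arguments map codon name -> per-cell values."""
    return sum(wasserstein_1d(predicted[c], observed[c]) for c in predicted) / len(predicted)
```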

      Some explanation of how this relates to the experimental data the parameters are fit on would also be useful. (a) Is there a correspondence between individual simulated cells and the experimental data for the single ligand stimulation, and then the smallest set of these is taken? Is there also a matching from the simulated multi-receptor modules and the multi-receptor data, and if so, is this done in the same way?

      Response: This comment helped us clarify the correspondence between model simulations and experimental data.

      Yes, there is a correspondence between individual simulated cells and the previously published experimental data (Guo et al., 2025b) for single-ligand stimulation. We have revised the first paragraph of the Results (Lines 136–148) and the Methods (Lines 544-557) to clarify how the model simulations were fit to the previous experimental dataset; see the response to Reviewer 1, Comment 10, for the updates to the Methods. The revised Results text is pasted below for the reviewer’s reference.

      “By fitting the single-cell experimental data set with a nonlinear mixed-effects statistical model (coupled with the 52-dimensional NFκB ODE model), the parameter distributions for the single-cell population were inferred.”

      'six signaling codons' - here it would be good to recapitulate what these represent, but also what the 'strength' and 'activity' correspond to (total integrated value, maximum value, etc)

      Response: We thank the reviewer for the suggestion and have clarified this point (Lines 169-175, Figure S2B).

      'pre-defined thresholds' - no need to state these numerically in the text (although giving some sense of how/why these were chosen would give some context), but I couldn't find the values of these, nor values corresponding to the signaling codons.

      Response: We appreciate the reviewer’s comment. We have added this information to the figure legend (Figure 1B-C) and to the Methods section “Responder fraction” (Lines 666-672). Specifically, for the model simulation data, the integral thresholds are 0.4, 0.5, and 0.6 µM·h, and the peak thresholds are 0.12, 0.14, and 0.16 µM. For the experimental data, the integral thresholds are 0.2, 0.3, and 0.4 A.U.·h, and the peak thresholds are 0.14, 0.18, and 0.22 A.U. Thresholds were selected so that the middle threshold yields 50% responder cells under single-ligand conditions, while the responder fraction remains unsaturated under three-ligand stimulation.
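      A minimal sketch of how a responder fraction can be computed from simulated trajectories, using the model-simulation thresholds listed above. Several details are assumptions not stated in this response: time is in hours and NFκB in µM, the integral is taken with the trapezoidal rule over the whole trajectory, a cell counts as a responder when its feature strictly exceeds the threshold, and integral and peak are thresholded separately. The helper names and toy trajectories are hypothetical.

```python
# Model-simulation thresholds quoted in the response (low, middle, high).
INTEGRAL_THRESHOLDS_UM_H = (0.4, 0.5, 0.6)  # µM·h
PEAK_THRESHOLDS_UM = (0.12, 0.14, 0.16)     # µM


def trapezoid(times, values):
    """Trapezoidal-rule integral of a sampled trajectory."""
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:]))


def responder_fraction(trajectories, times, threshold, feature="integral"):
    """Fraction of cells whose NFκB integral (or peak) exceeds `threshold`."""
    if feature == "integral":
        scores = [trapezoid(times, traj) for traj in trajectories]
    elif feature == "peak":
        scores = [max(traj) for traj in trajectories]
    else:
        raise ValueError("feature must be 'integral' or 'peak'")
    return sum(score > threshold for score in scores) / len(scores)


# Toy example: three cells sampled at 0, 1 and 2 h (values are illustrative only).
times = [0.0, 1.0, 2.0]
cells = [[0.3, 0.3, 0.3], [0.1, 0.1, 0.1], [0.0, 0.8, 0.0]]
by_integral = responder_fraction(cells, times, INTEGRAL_THRESHOLDS_UM_H[1])
by_peak = responder_fraction(cells, times, PEAK_THRESHOLDS_UM[1], feature="peak")
```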

      'non-responder cells are likely a result of cellular heterogeneity in receptor modules rather than the core module' - is this the 'ill health' referenced earlier? If so make this clear.

      Response: Yes, this is the ‘ill health’ referenced earlier, and we have clarified this (Lines 198-199).

      It's also very difficult to follow this chain of logic, given that the reader at this point doesn't have any knowledge of what the 'core' module is, nor the significance of the thresholds on the signaling codons. I would suggest making this much clearer, with reference to each of these.

      Response: We apologize for the unclear explanation. We now explain in the Introduction (Lines 95-106) and the Results (Lines 129-141) how the model is structured into receptor-proximal modules that converge on a common core module, and we have added a schematic for clarity (Figure S1). To further clarify the mathematical model, we have substantially revised the Methods (Simulation of heterogenous NFκB dynamical responses). The thresholds are defined in the Methods section “Responder fraction”.

      '...but the model represented these as independent mass action reactions' - the significance of this may not be clear to someone not familiar with biophysical models, so probably better to make it explicit.

      Response: We thank the reviewer for this reminder and have added a description of the significance of this point (Lines 225-227).

      '...we trained a random forest classifier...' - is this trained on the 'raw' experimental time series data, or on the signaling codons?

      Response: It is trained on the signaling codons calculated from model-simulated NFκB trajectories. We have clarified this (Lines 260-261).

      'We also applied a Long Short-Term Memory (LSTM) machine learning model...' - it might be good to reference these three approaches at the beginning of this section, otherwise they seem to come out of the blue a little.

      Response: We now introduce these three approaches, with references, at the beginning of this section (Lines 242-246).

      'We then used machine learning classifiers...' - random forests, LSTMs, or a different model?

      Response: We have clarified that this is a random forest classifier (Line 276).

      Discussion:

      1. '...over statistical models...' - suggest maybe 'purely statistical models'

      Response: We thank the reviewer for this suggestion. We have rewritten the whole Discussion to include the new insights into antagonism and synergy and their roles in maintaining unexpectedly high SRS performance; this sentence was therefore removed.

      'We found that endosomal transport...' - A paper by Huang, et. al. (https://www.jneurosci.org/content/40/33/6428) observed a synergistic phagocytic response between CpG and pIC stimulation in microglia. This is still consistent with a saturation effect dependent on dose, but may be worth a mention.

      Response: We thank the reviewer for pointing us to this interesting paper; it helped us improve the Discussion of inflammatory signaling pathways beyond NFκB. This paper demonstrates synergistic effects between CpG and pIC in inhibiting tumor growth and promoting the production of cytokines (Huang et al., 2020) such as IFN-β and TNF-α, whose expression is also regulated by the IRF and MAPK signaling pathways (Luecke et al., 2021; Sheu et al., 2023). This finding does not contradict our finding that CpG and pIC act antagonistically in the NFκB signaling pathway, because several pathways act combinatorially on gene expression: CpG can activate the MAPK signaling pathway (Luecke et al., 2024) but not the IRF signaling pathway, whereas pIC activates the IRF signaling pathway (Akira and Takeda, 2004) but activates the MAPK pathway only weakly. Therefore, their combination can synergistically regulate inflammatory responses. We have added this to the Discussion (Lines 515-522).

      '...features termed...' -> 'features, termed'

      Response: We thank the reviewer for their careful reading; we have rewritten the Discussion.

      '...we applied a Long Short-Term Memory (LSTM) machine learning model..' - maybe make clear that this is on the time-series data (also LSTM has already been defined).

      Response: We thank the reviewer for their careful reading; we have rewritten the Discussion.

      Materials and methods:

      1. The descriptions in this section are quite vague, so I would suggest expanding this with more detail from the supplementary material, where things are quite well explained.

      Response: We thank the reviewer for this suggestion and have rewritten the whole Methods section as suggested.

      'sampling distribution' - not clear what this refers to in this context

      Response: We have clarified this in the revision (Methods, Simulation of heterogenous NFκB dynamical responses, paragraph 3). The single-cell signaling-pathway parameter values used for bootstrap sampling to generate the model simulations are given in Supplementary Dataset 2.
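      The bootstrap sampling mentioned above can be sketched as resampling whole per-cell parameter sets with replacement. This is a minimal illustration under assumptions: the parameter table is represented as a list of dicts, the keys reuse parameter names from the Methods excerpt, the values are made up, and `bootstrap_cells` is a hypothetical helper rather than the study's code.

```python
import random


def bootstrap_cells(cell_params, n_cells, seed=None):
    """Draw n_cells virtual cells by resampling whole per-cell parameter sets
    with replacement, so parameters inferred for the same cell stay together."""
    rng = random.Random(seed)
    # Copy each drawn row so later edits to one virtual cell do not alias others.
    return [dict(cell) for cell in rng.choices(cell_params, k=n_cells)]


# Toy table of two fitted cells (values are illustrative only).
fitted = [
    {"k52": 1.2, "k65": 0.8, "tot_NFkB": 0.08},
    {"k52": 0.9, "k65": 1.1, "tot_NFkB": 0.10},
]
virtual = bootstrap_cells(fitted, n_cells=1000, seed=0)
```

      Resampling whole rows, rather than each parameter independently, preserves any correlation between parameters inferred for the same cell.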

      'RelA-mVenus mouse strain' - it would be good to mention the relevance of the reporter for NFkB signaling

      Response: We have added the relevance of the reporter for NFκB signaling (Methods, Lines 624-626).

      '...A random forest classifier...' -> a random forest classifier

      Response: We have rewritten the methods.

      Significance

      This study provides mechanistically interpretable insight on the important question of how immune cells perform target recognition in realistic scenarios, and also provides validation of existing mathematical models by extending these beyond their original domain. The paper uses 'signaling codons' as a proxy for information processing, however in this instance it is cross-validated with an LSTM model that is applied directly to the time series data. Nevertheless, the scope of the paper is such that it does not deal with the question of how these signals are transmitted or used in a downstream immune response. To my knowledge, this is the first time that a well established existing mathematical model of signalling response has been extended and applied to heterogeneous ligand mixtures. These results will be of interest to those studying immune cell responses, and to those interested in basic research on mathematical models of signaling and cellular information processing more generally.

      My background is in biophysical models, machine learning, and signaling in cancer. I have a basic understanding of immunology, but no experience in experimental cell biology.

      Response: We thank the reviewer for highlighting the novelty of our study. We appreciate the reviewer’s recognition that our work advances the understanding of cellular information processing in the context of ligand mixtures, particularly as the first to extend computational models to investigate signaling fidelity under mixed-ligand conditions.

      We agree that this work will interest computational biologists focused on signaling network modeling and information processing. In addition, we believe it will also be valuable for all signaling biologists, as we provide fundamental insights. For experimental biologists in particular, our model provides an efficient, quantitative framework for exploring and generating testable hypotheses.

      We would also like to gently emphasize that evaluating specificity within signaling pathways is as essential as studying downstream functional responses. While immune function outcomes are certainly important, they rely on the upstream signaling pathways that first respond to environmental cues. Understanding how these signaling pathways achieve specificity and discriminability is therefore crucial. For example, this is particularly relevant for drug development targeting pathways such as NFκB, where assessing the direct signaling output—NFκB activation dynamics—can provide valuable insight into the effects of pharmacological interventions.

      Reviewer #2

      Evidence, reproducibility and clarity

      Guo et al. developed a heterogeneous, single-cell ODE model of NFκB signaling parameterized on five individual ligands (TNF, Pam, LPS, CpG, pIC) and extended it, via core-module parameter matching, to predict responses to all 31 combinations of up to five ligands. They found that simulated responder fractions and signaling codon features generally agreed with live-cell imaging data. A notable discrepancy emerged for the CpG (TLR9) + pIC (TLR3) pair: experiments exhibited non-integrative antagonism unpredicted by the original model. This issue was resolved by incorporating a Hill-type term for competitive, limited endosomal trafficking of these ligands. Finally, by decomposing NFκB trajectories into six "signaling codons" and applying Wasserstein distances plus random-forest and LSTM classifiers, the authors showed that stimulus-response specificity (SRS) declines with ligand complexity but remains statistically significant even for quintuple mixtures. This is a well written and scientifically sound manuscript about complexities of cellular signaling, especially considering the limitations of in vitro experiments in recapitulating in vivo dynamics.

      Response: We thank the reviewer for carefully reading the manuscript and for this endorsement. We have significantly improved the manuscript thanks to the reviewer’s insightful comments (see below for point-by-point responses).

      Besides addressing the reviewer’s questions, we have further extended our work to investigate how ligand pairs interact across all doses and how those interactions affect stimulus-response specificity. As the reviewer pointed out, experimental studies are limited in recapitulating the multitude of complex physiological contexts. The model is helpful for exploring more complex scenarios beyond the feasibility of in vitro experimental setups. Using computational simulations, we explored 360 conditions generated from 10 ligand pairs, with each ligand evaluated at 6 doses spanning non-responsive to saturating levels, and with 1,000 simulated cells per condition to capture population heterogeneity.
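      The condition counts quoted in this exchange follow from simple combinatorics: 31 non-empty mixtures of five ligands, and 360 pairwise conditions. A minimal sketch, assuming the 360 conditions are a full 6 × 6 dose grid for each of the 10 pairs (10 × 36 = 360); the dose indices 0 to 5 are placeholders, since the actual concentrations are not listed here.

```python
from itertools import combinations, product

LIGANDS = ["TNF", "Pam", "LPS", "CpG", "pIC"]
N_DOSES = 6                 # doses per ligand, non-responsive to saturating
CELLS_PER_CONDITION = 1000  # simulated cells per condition

# Every non-empty ligand subset: single ligands up to the five-ligand mixture.
all_mixtures = [combo
                for size in range(1, len(LIGANDS) + 1)
                for combo in combinations(LIGANDS, size)]

# Ligand pairs, each crossed with a full dose grid (placeholder dose indices).
pairs = list(combinations(LIGANDS, 2))
pair_conditions = [(pair, doses)
                   for pair in pairs
                   for doses in product(range(N_DOSES), repeat=2)]

total_cells = len(pair_conditions) * CELLS_PER_CONDITION
```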

      From this extended analysis, we identified the mechanistic bases for observations of both synergy and antagonism. Synergy for certain low-dose ligand combinations can be explained by ultrasensitive IKK activation (Figure 4), while antagonism between LPS and Pam arises from competition for the cofactor CD14 (Figure 5). We show that these phenomena are dependent on the signaling network state and therefore are not observed in all cells of the population. We define the network conditions that must be met for antagonism and synergy to occur. Importantly, we then show that antagonism can contribute to stimulus-response specificity in ligand mixtures (Figure 5).

      Here are a few comments and recommendations:

      1. The modeling approach used in this manuscript, while interesting, might need further validation. Inferring multi-ligand receptor parameters by matching single-ligand cells on core-module similarity may not capture true co-variation in receptor expression or adaptor availability. Single cell measurements of receptor expressions could be done (e.g. via flow cytometry) to ground this assumption in real data. If the authors think this is out of scope for this manuscript, they could fit core-matched single cell models with two receptor modules from scratch to the two-ligand experimental data. Would this fitted model produce similar receptor parameters compared to the presented approach? At least the authors should add a bit more explanation for why their modeling approach is better (or valid) than fitting the models with 2/3/4/5 receptor modules from scratch to the experimental data.

      Response: We thank the reviewer for this comment, which helped us improve the explanation of the methodology, its rationale, and its validation. The methodology is based on the well-established statistical method of nearest-neighbor hot-deck imputation (Andridge and Little, 2010). In this implementation, the core module functions as a stabilizing “anchor” (common variables) to harmonize the receptor-specific parameter distributions. Similar methodologies have been successfully applied to correct batch effects or integrate single-cell RNA-seq datasets using anchor cell types (Stuart et al., 2019). Our workflow has been validated on single-ligand stimulus conditions in a previous study (Guo et al., 2025; see the third paragraph below). Here, we used this method to generate predictions for ligand mixtures and validated them with experimental studies of dual-ligand stimuli, finding that our predictions align well with the experimental data. As the reviewer suggested in point 3, in the revision we also added experimental validation of the binary classification of whether macrophages can determine that specific stimuli are present in a ligand mixture. The question we address in this work is how macrophages process ligand-specific information in the context of ligand mixtures. For this question, the experimental results align with the model predictions, supporting consistent conclusions.

      In the revision, we explain the rationale for using nearest-neighbor hot-deck imputation to match cells with similar core-module parameters (Lines 143-150).

      Previous work determined parameter distributions for only the cognate receptor module (and the core module) that provided the best fit for the single-ligand experimental data (Figure 1A, Step 1), and parameters for the other receptor modules were not determined. To simulate stimulus responses to two or more ligands, we imputed the other ligand–receptor module parameters using shared core-module parameters as common variables and employing nearest-neighbor hot-deck imputation (35). In this setup, the core module functions as an “anchor” to harmonize two or more receptor-specific parameter distributions. This was achieved by minimizing the Euclidean distance between the core-module parameters associated with the independently parameterized single-ligand models (Figure 1A, Step 2).
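      The core-module matching step can be sketched as follows. This is a minimal illustration, not the study's implementation: each recipient cell (fitted to one ligand) receives the receptor-module parameters of the donor cell (fitted to another ligand) whose core-module parameters are closest in Euclidean distance. Assumptions: parameters are compared unscaled (the Guo et al. comparison lists rescaled-similarity matching as a separate alternative), donors may be reused, and the recipient keeps its own core-module values. The helper name and toy values are hypothetical; the parameter names k52, k65 (core) and k85, k87, k93 (CpG module) are taken from the Methods excerpt quoted earlier.

```python
import math


def impute_receptor_module(recipients, donors, core_keys, receptor_keys):
    """Nearest-neighbor hot-deck imputation anchored on core-module parameters."""
    imputed = []
    for cell in recipients:
        core = [cell[k] for k in core_keys]
        # Donor whose core-module parameters are nearest in Euclidean distance.
        donor = min(donors, key=lambda d: math.dist(core, [d[k] for k in core_keys]))
        merged = dict(cell)  # keep the recipient's own core and receptor parameters
        merged.update({k: donor[k] for k in receptor_keys})  # add the imputed module
        imputed.append(merged)
    return imputed


CORE = ["k52", "k65"]
CPG_MODULE = ["k85", "k87", "k93"]

# Toy cells (values are illustrative only).
tnf_cells = [{"k52": 1.0, "k65": 1.0, "k54": 0.5}]
cpg_cells = [
    {"k52": 1.1, "k65": 0.9, "k85": 2.0, "k87": 0.3, "k93": 0.1},
    {"k52": 3.0, "k65": 3.0, "k85": 9.0, "k87": 0.9, "k93": 0.5},
]
dual = impute_receptor_module(tnf_cells, cpg_cells, CORE, CPG_MODULE)
```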

      In Guo et al. (2025) (see Supplementary Figure S11), the nearest-neighbor hot-deck imputation approach (core module similarity matching method) was compared with other approaches, including random matching and rescaled-similarity matching. The results show that, after matching, the core module method best preserves the single-ligand stimulus signaling codon distributions. For the reviewer’s convenience, we have also appended the figure in the response to Reviewer 1, Comment 11.

      The advantage of our workflow is that it does not require fitting to new experimental data and still gives reliable predictions of signaling dynamics. For the reviewer’s interest, we attempted to fit core-matched single-cell models with two receptor modules. Parameter fitting requires sufficiently large, high-quality datasets: single-ligand stimulation data with more than 1,000 cells (approximately 1,400 to 2,000 cells per ligand) can be adequate to estimate 6 to 7 parameters (Guo et al., 2025). However, our current experimental dataset for combinatorial-ligand conditions contains only 500 to 1,000 cells, and fitting these data yielded a poor fit of the heterogeneous signaling dynamics, because there are too few cells to estimate 8 to 10 parameters. We estimate that at least ~1,500 cells would be needed for reliable parameter estimation under dual-ligand stimulation (and more may be needed for combinatorial stimuli involving more ligands), which is currently not feasible to obtain given the large number of combinatorial conditions.

      Overall, in this paper, the nearest-neighbor hot-deck imputation approach is presented as a feasible and acceptable approach that best reflects our current understanding of the signaling network. Importantly, it helps identify potential gaps by highlighting discrepancies between model predictions and experimental observations.

      (a) The refined model posits competitive, saturable endosomal transport for CpG and pIC, but no direct measurements of endosomal uptake rates or compartmental saturation thresholds are provided, leaving the Hill parameters under-constrained. The authors could produce dose-response curves for CpG and pIC individually and in combination across a range of concentrations to fit the Hill parameters for competitive uptake. (b) If this is out of scope for this paper, the authors should at least comment on why the endosome hypothesis is better than alternatives, e.g., crosstalk or activation of other parallel pathways, especially given that even the refined model simulations with Hill equations for CpG and pIC do not quite match the experimental data (Fig. 2B, E).

      Response: (a) The reviewer’s comments helped us improve our work: we now describe the substrate competition with Michaelis–Menten kinetics, which increases the mathematical rigor of the CpG-pIC competition model. The updated model has no free parameters to tune, because all Vmax and Kd values are constrained to be consistent with the single-ligand scenario, and the Hill coefficient equals 1, as in the single-ligand case.
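      For readers unfamiliar with competitive Michaelis–Menten kinetics, a minimal sketch of the rate law follows. It assumes CpG and pIC compete for a single saturable endosomal transport step and uses the standard two-substrate competitive form; the function name and numbers are illustrative, not fitted values, and the Michaelis constant is written `km` below on the assumption that it plays the role of the Kd mentioned in the response. With one ligand absent, each rate reduces to its single-ligand law, which is why no new parameters are needed.

```python
def competitive_uptake(s_cpg, s_pic, vmax_cpg, km_cpg, vmax_pic, km_pic):
    """Uptake rates of two ligands competing for one saturable transporter.

    Competitive Michaelis-Menten form with Hill coefficient 1:
        v_i = Vmax_i * (S_i / Km_i) / (1 + S_cpg / Km_cpg + S_pic / Km_pic)
    If one ligand concentration is zero, the other rate reduces to the
    single-ligand Michaelis-Menten law Vmax * S / (Km + S).
    """
    denominator = 1.0 + s_cpg / km_cpg + s_pic / km_pic
    v_cpg = vmax_cpg * (s_cpg / km_cpg) / denominator
    v_pic = vmax_pic * (s_pic / km_pic) / denominator
    return v_cpg, v_pic


# Illustrative values only: CpG uptake alone vs. in the presence of pIC.
params = dict(vmax_cpg=1.0, km_cpg=1.0, vmax_pic=1.0, km_pic=1.0)
alone, _ = competitive_uptake(1.0, 0.0, **params)
mixed, _ = competitive_uptake(1.0, 1.0, **params)
```

      Here `alone` is 0.5 and `mixed` is 1/3: adding pIC lowers CpG uptake without any extra parameter, which is the qualitative antagonism the model is meant to capture.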

      The suggestion to examine dose-response curves for CpG and pIC inspired us to extend the dose-response analysis to all ligand-pair combinations. This allowed us to identify synergy for low-dose ligand pairs and, in addition to CpG-pIC, antagonism for high-dose LPS-Pam (new Figures 4 and 5).

      (b) Regarding alternative hypotheses for antagonism, such as crosstalk or parallel-pathway activation: any antagonistic effect would have to arise from negative regulation acting within the first 30 min. However, IκBα-mediated feedback only becomes appreciable after ~30 min (Hoffmann et al., 2002), and A20-dependent attenuation requires ≥2 h (Werner et al., 2005). Beyond these delayed feedback mechanisms, NFκB activation depends primarily on phosphorylation and K63-linked ubiquitination, for which no mechanism produces true antagonism; at most, combinatorial inputs saturate the response at the level of the strongest single ligand. We have added this rationale to the Discussion to explain why we favor the endosome saturation hypothesis over other mechanisms (Lines 459-465). While this may not capture every nuance, it represents the simplest model extension capable of reproducing the observed antagonism.
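      The distinction between true antagonism and saturation at the level of the strongest single ligand can be made operational with a highest-single-agent (HSA) style comparison on a scalar response such as the Total codon. This is a swapped-in criterion for illustration, not necessarily the definition used in the manuscript, and the tolerance band `tol` is a hypothetical parameter.

```python
def classify_combination(combo, singles, tol=0.05):
    """Compare a combination response with the strongest single-ligand response.

    combo: response to the ligand mixture (e.g. a Total codon value).
    singles: responses to each ligand given alone (assumed positive).
    tol: hypothetical relative tolerance band around the reference.
    """
    reference = max(singles)  # strongest single ligand
    if combo > reference * (1 + tol):
        return "synergy"
    if combo < reference * (1 - tol):
        return "antagonism"
    return "saturating"  # mixture matches the strongest single ligand


examples = [classify_combination(1.5, [1.0, 0.8]),
            classify_combination(0.6, [1.0, 0.8]),
            classify_combination(1.0, [1.0, 0.8])]
```

      Under this criterion, a mixture response that falls below the stronger ligand alone is the signature that saturation by itself cannot explain, which is why it calls for a mechanism such as uptake competition.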

      The authors assess the distinguishability of single-ligand and combinatorial-ligand stimuli using simulations from the refined model. While this is informative, the simulated data could propagate deviations from the experimental data to the classifiers. How would the classifiers fare if the experimental data were used to assess single-stimulus distinguishability? The authors could use the experimental data they already have to confirm the main claim of the paper, that cells retain stimulus-response specificity even with multiple ligand exposure. In short, how would Fig. 3E look when trained/validated on the available experimental data?

      Response: We thank the reviewer for these valuable comments, which helped us strengthen the rigor of our analysis by incorporating cross-model testing. Specifically, we refined our analysis of ligand presence/absence classification by including ROC AUC and balanced accuracy metrics. This adjustment accounts for the fact that the experimental data did not cover all combinatorial conditions, thereby mitigating potential biases from data imbalance and threshold choice. The experimental results are qualitatively consistent with the simulations, though, as expected, they show somewhat lower ligand distinguishability than the noise-free simulated dataset. We have updated Figures 3E–F (previously Figure 3E), added Figure S8, and revised the manuscript accordingly (Lines 292–301). For the reviewer’s convenience, we have also pasted the revised manuscript text below.

      “Classifiers trained to distinguish TNF-present from TNF-absent conditions achieved a Receiver Operating Characteristic-Area Under the Curve (ROC AUC) of 0.96, significantly above the 0.5 baseline (Figure 3D, Figure S8A). Extending this analysis to other ligands, cells detected LPS (0.85), Pam (0.84), pIC (0.73), and CpG (0.63) in mixtures (Figure 3D, S8A). Using experimental data from double- and triple-ligand stimuli (Figure 1D), ROC AUC values were TNF 0.74, LPS 0.74, Pam 0.66, pIC 0.75, and CpG 0.66 (Figure 3E, S8B). Classifier accuracies yielded consistent results (Figure S8C-D). These results indicated a remarkable capability of preserving ligand-specific dynamic features within complex NFκB signal trajectories that enable nuclear detection of extracellular ligands even in complex stimulus mixtures.”
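      To illustrate why ROC AUC and balanced accuracy suit this imbalanced presence/absence setting, a minimal pure-Python sketch of both metrics follows. It is not the authors' pipeline: the scores would be, for example, a classifier's predicted probability that a given ligand is present, and the toy labels are invented solely to show that plain accuracy rewards always predicting the majority class, whereas balanced accuracy does not, and that ROC AUC is computed over all thresholds rather than one.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive (ligand present)
    scores higher than a random negative (ligand absent); ties count 1/2.
    This equals the normalised Mann-Whitney U statistic and is threshold-free."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity, so the majority class cannot dominate."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Imbalanced toy example: 9 "ligand present" conditions, 1 "absent".
labels = [1] * 9 + [0]
always_present = [1] * 10
plain_accuracy = sum(y == p for y, p in zip(labels, always_present)) / len(labels)
bal_acc = balanced_accuracy(labels, always_present)
```

      The trivial "always present" rule scores 0.9 plain accuracy but only 0.5 balanced accuracy, the same value as chance, mirroring the 0.5 ROC AUC baseline quoted above. In practice, scikit-learn's `roc_auc_score` and `balanced_accuracy_score` compute the same quantities.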

      While the approach presented here, with multiple simultaneous ligand exposures, is a major step toward in vivo-like conditions, the temporal aspect is still missing: temporal phasing, i.e., sequential exposure to multiple ligands, as one would expect in vivo, rather than exposure to all at once. This is probably out of scope for this paper, but the authors could comment on how their work could be taken forward in this direction and whether SRS would be better or worse under such conditions.

      Response: We thank the reviewer for this insightful comment. We have added “the temporal aspect of multiple ligand exposures” to the Discussion (Lines 503-510), and we have pasted the corresponding paragraph here for the reviewer’s reference (black font indicates the previous version, and blue font the revised text):

      Cells may be expected to interpret not only the combination of signals but also their timing and duration to mount appropriate transcriptional responses (58, 59). For example, acute inflammation integrates pathogen-derived cues with pro- and anti-inflammatory signals over a timeframe of hours to days (58) to coordinate pathogen removal and tissue repair. Investigating sequential stimulus combinations in our model is therefore crucial for understanding how cells process complex physiological inputs. Simulations that account for longer timescales may require additional feedback mechanisms, as described in some of our previous studies for NFκB (15, 60).

      There is no caption for Figure 3F in the figure legend, nor a reference to it in the main text.

      Response: In the revised manuscript, we have removed Figure 3F.

      Significance

      General assessment: This is a good manuscript in its present form, which could get better with revision. More supporting data and validation are needed to back the main claim presented in the manuscript.

      Significance/impact/readership: When revised, this manuscript could be of interest to a broad community working in single-cell biology, cell and immune signaling, and mathematical modeling. In particular, the models presented here could be used as a starting point for more complex and detailed modeling approaches.

      Response: We thank the reviewer for this endorsement. The reviewer’s constructive suggestion helped us significantly improve the clarity and rigor of our main conclusion.

      In summary, we have strengthened the computational framework in several ways. We improved the model’s fit to experimental single-ligand training data and reformulated the antagonistic CpG-pIC model using Michaelis–Menten kinetics, thereby reducing parameter arbitrariness and increasing mechanistic interpretability. These changes led to better agreement between model predictions and experimental observations for combinatorial ligand responses (Updated Figure 2 and Figure S2), which we hope will further increase experimentalists’ confidence in the modeling results. We have also validated one key conclusion (“cells retain stimulus-response specificity even with multiple ligand exposure”) using the experimental dataset, and it aligns with the model predictions.

      In addition, we have further extended our analysis and its scope. Inspired by the reviewer’s advice (and Reviewer 3’s comment 1b) on a dose-combination study for the CpG-pIC pair, we expanded our research to dose-response relationships for all dual-ligand combinations (Lines 302-406, Figures 4-5). This additional, comprehensive analysis allowed us to identify the mechanisms of synergistic and antagonistic effects in single-cell responses and to pinpoint the corresponding dose ranges for different ligand pairs.

      Interestingly, we found that ultrasensitive IKK activation may produce synergistic single-cell responses to low-dose ligand combinations. We also found that CD14 uptake competition between LPS and Pam may lead to an antagonistic/non-integrative combination. Our simulation-based finding of a non-integrative LPS-Pam combination aligns with a previous, independent experimental finding of non-integrative responses to combined LPS and Pam (Kellogg et al., 2017), which thus provides independent experimental support for our model prediction.
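      Why ultrasensitivity can produce low-dose synergy is easy to see with a toy Hill function. The sketch below is hypothetical: it assumes the two ligands' upstream signals simply add before a single ultrasensitive step, and the values of `k` and `n` are illustrative, not the model's IKK parameters. For n > 1 the pooled low-dose input yields more than the sum of the separate responses, while at high dose saturation makes the combination sub-additive; with n = 1 there is no low-dose synergy.

```python
def hill(x, k=1.0, n=3.0):
    """Hill-type activation; n > 1 gives an ultrasensitive (sigmoidal) response.
    k and n are illustrative values, not fitted IKK parameters."""
    return x ** n / (k ** n + x ** n)


def excess_over_additive(dose_a, dose_b, **kw):
    """Response to the pooled input minus the sum of the separate responses.

    Assumes (hypothetically) that the two upstream signals add linearly
    before the ultrasensitive step. Positive values indicate synergy.
    """
    return hill(dose_a + dose_b, **kw) - (hill(dose_a, **kw) + hill(dose_b, **kw))


low = excess_over_additive(0.3, 0.3)            # positive: superadditive at low dose
high = excess_over_additive(3.0, 3.0)           # negative: saturation at high dose
graded = excess_over_additive(0.3, 0.3, n=1.0)  # negative: no synergy without ultrasensitivity
```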

      We further analyzed stimulus-response specificity under conditions predicted to exhibit synergy or antagonism. Our results indicate that antagonistic combinations of ligands can increase stimulus-response specificity in the context of ligand mixtures.

      Reviewer #3

      Evidence, reproducibility and clarity

      The authors investigate experimentally single macrophages' NF-kB responses to five ligands, separately and to 3 pairs of ligands. Using the single ligand stimulations, they train an existing mathematical model to replicate single-cell NF-kB nuclear trajectories. From what I understand, for each single cell trajectory in response to a given ligand, the best fit parameters of the core module and the receptor module (specific for the given ligand) are found.

      Then (again, from what I understand), single ligand models are used to generate responses to combinations of ligands. The parametrizations of single ligand models (to be combined) are chosen to have the most similar core modules. It is not described how the responses to more than one ligand are calculated - I expect that respective receptor modules work in parallel, providing signals to the core module. After observing that the response to CpG+pIC is lower (in terms of duration and total) than for CpG alone, the model is modified to account for competition for endosomal transport required by both ligands.

      Using the trained model, the authors simulate responses to all 31 combinations of ligands, and each NF-κB trajectory is described by six signaling codons: Speed, Peak, Duration, Total, Early vs. Late, and Oscillations. Next, these codons are used to reconstruct (using a random forest model) the stimuli (which may be combinations of ligands). Single-ligand and even two-ligand stimuli are relatively well recognized, which is interpreted as the ability of macrophages to distinguish ligands even when present in combination.
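      The count of 31 conditions follows from taking every non-empty subset of the five ligands (2**5 - 1 = 31). A short enumeration, using the ligand abbreviations from the text, makes the breakdown by mixture size explicit:

```python
from itertools import combinations

LIGANDS = ["TNF", "LPS", "Pam", "CpG", "pIC"]

# Every non-empty subset of the five ligands: C(5,1) + ... + C(5,5) = 31.
conditions = [combo
              for r in range(1, len(LIGANDS) + 1)
              for combo in combinations(LIGANDS, r)]

# Number of stimulus conditions per mixture size (single, double, ...).
by_size = {r: sum(1 for c in conditions if len(c) == r)
           for r in range(1, len(LIGANDS) + 1)}
```

      The result is 5 single-ligand, 10 double, 10 triple, 5 quadruple, and 1 five-ligand condition.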

      We thank the reviewer for careful reading of the manuscript.

      Major comments

      1) The demonstrated ability to recognize stimuli is based on several key assumptions that can hardly be met in reality.

      Response: We thank the reviewer for this comment, which prompted us to carefully reflect on the rigor of our work, inspired us to extend our analysis to a broad range of ligand-dose combinations, and helped us clarify the limitations of our approach. Please see our detailed responses below.

      a) The cell knows the stimulation time, and then it can use speed as a codon. Look at Fig. S4A: the trajectories in response to pIC are similar to those in response to TNF, but just delayed.

      Response: We thank the reviewer for this comment. We updated the model parameterization to better fit the single-ligand pIC condition (Lines 557-559). In the updated model, the simulated responses to TNF and pIC are quite different (Fig. S2A-B, Fig. S5A-B). Specifically, the Peak, Duration, EarlyVsLate, and Total signaling codons have different values. In addition, the literature suggests that differences in the timing of NFκB activation are sufficient to elicit differences in downstream gene expression, especially for early response genes (ERG) and intermediate response genes (ING) (Figure 1 in Ando et al., 2021). For the reviewer’s convenience, we have also appended the figures. Specifically, within the first 60 minutes, the ctrl condition exhibits a higher Speed of NFκB activation, and the NFκB-regulated ERG and ING differ within the first 60 minutes (Fig. 1a,b of Ando et al., 2021, appended below). Ando et al. then identified a gene regulatory mechanism that is able to distinguish differences in the Speed codon. Importantly, this mechanism does not require knowledge of t=0, i.e., when the timer was started.

      The signaling codon Speed, which is based on derivatives, is one way to quantify such timing differences in activation. It was selected from a library of more than 900 different dynamic features using an information-maximizing algorithm (Adelaja et al., 2021). It is possible that other ways of measuring time, e.g., time to half-max, might not be distinguished as well by these regulatory mechanisms.
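      To make the contrast between a derivative-based timing feature and time to half-max concrete, here is a toy calculation. Both feature definitions are simplified, hypothetical stand-ins (the published Speed codon of Adelaja et al., 2021 is defined differently), and the two trajectories, sampled every 5 minutes, are invented: one rises quickly, the other slowly to the same plateau.

```python
def max_early_slope(times, values, window=60.0):
    """Largest finite-difference derivative within the first `window` minutes.
    A simplified, hypothetical derivative-based timing feature."""
    slopes = [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
              for i in range(len(times) - 1) if times[i + 1] <= window]
    return max(slopes)


def time_to_half_max(times, values):
    """First time the trajectory reaches half its maximum (linear interpolation)."""
    half = max(values) / 2.0
    if values[0] >= half:
        return times[0]
    for i in range(1, len(values)):
        if values[i] >= half:
            t0, t1, v0, v1 = times[i - 1], times[i], values[i - 1], values[i]
            return t0 + (t1 - t0) * (half - v0) / (v1 - v0)
    return None  # not reached (only possible for an all-zero trajectory)


times = [0, 5, 10, 15, 20, 25]        # minutes
fast = [0, 2, 4, 4, 4, 4]             # toy nuclear NFkB trajectory, fast rise
slow = [0, 1, 2, 3, 4, 4]             # same plateau, slower rise
features = {name: (max_early_slope(times, v), time_to_half_max(times, v))
            for name, v in {"fast": fast, "slow": slow}.items()}
```

      Both features separate the two toy trajectories (slopes 0.4 vs. 0.2 per minute; half-max at 5 vs. 10 minutes), but time to half-max is measured from t=0, whereas a slope is a local property of the trajectory itself.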

      b) The increase of stimulus concentration typically increases Peak, Duration, and Total, so a similar effect can be achieved by changing the ligand or concentration.

      Response: This (“the increase of stimulus concentration typically increases Peak, Duration, and Total”) is not an assumption of our work. What the reviewer describes (“a similar effect can be achieved by changing the ligand or concentration”) may or may not occur. The six informative signaling codons can vary with both ligand and dose. For example, with increasing doses of Pam, the NFκB response shows a higher peak, potentially making it appear more like LPS stimulation. However, as the Pam dose increases, the response duration decreases, which distinguishes it from LPS stimulation (see the experimental data in Figure 4A, second row, and Figure 3A, second row, of Luecke et al. (2024); we have also pasted the corresponding figures below for the reviewer’s convenience).

      Figure 4A and Figure 3A from Luecke et al. (2024). Figure 4A: NFκB activity dynamics in single cells in response to 0, 0.01, 0.1, 1, 10, and 100 ng/ml P3C4 stimulation, measured for eight hours by fluorescence microscopy of reporter hMPDMs. Each row of the heatmap represents the p38 or NFκB signaling trajectory of one cell. Trajectories are sorted by the maximum amplitude of p38 activity. Data from two pooled biological replicates are depicted. Total number of cells: 898, 834, 827, 787, 778, and 923. Figure 3A: NFκB activity dynamics in single cells in response to 100 ng/ml LPS stimulation, measured for eight hours by fluorescence microscopy of reporter hMPDMs. Each row of the heatmap represents the NFκB signaling trajectory of one cell (p38 measurements are shown in the original paper). Trajectories are sorted by the maximum amplitude of p38 activity. Data from two pooled biological replicates are depicted.

      Inspired by the reviewer’s comment (and also Reviewer 2’s comments), in the revision we expanded our research to dose-response relationships for all dual-ligand combinations (Lines 302-406, Figures 4-5). This additional, comprehensive analysis allowed us to identify the mechanisms of synergistic and antagonistic effects in single-cell responses and to pinpoint the corresponding dose ranges for different ligand pairs.

      Interestingly, we found that ultrasensitive IKK activation may lead to synergistic responses to low-dose ligand combinations, but only in a subset of single cells. We also found that CD14 uptake competition between LPS and Pam may lead to an antagonistic/non-integrative combination. Our simulation-based finding of a non-integrative LPS-Pam combination aligns with previous, independent experimental findings of non-integrative responses to combined LPS and Pam (Kellogg et al., 2017).

      c) Distinguishing a given ligand in the presence of others, even stronger ones, is based on the assumption that these ligands were given at the same time, which is hardly justified.

      Response: We agree with the reviewer that ligands could be given at different times. Considering time delays between ligands (their onset and also removal) dramatically adds to the combinatorial complexity. Some initial studies by the Tay lab are beginning to explore scenarios of time-shifted ligand pairs (Wang et al., 2025). Here we focus on a systematic exploration of all ligand combinations at 6 different doses. Not considering time delays is not an assumption but admittedly a limitation that may well be addressed in future studies. We have included a brief treatment of this issue in the Discussion (Lines 503-514), which we have appended here for the reviewer’s convenience.

      Cells may be expected to interpret not only the combination of signals but also their timing and duration to mount appropriate transcriptional responses (Kumar et al., 2004; Son et al., 2023). For example, acute inflammation integrates pathogen-derived cues with pro- and anti-inflammatory signals over a timeframe of hours to days (Kumar et al., 2004) to coordinate pathogen removal and tissue repair. Investigating sequential stimulus combinations in our model is therefore crucial for understanding how cells process complex physiological inputs. Simulations that account for longer timescales may require additional feedback mechanisms, as described in some of our previous studies for NFκB (Werner et al., 2008, 2005).

      We would like to suggest that despite (or perhaps because of) limiting our study to coincident stimuli, we made some noteworthy discoveries.

      2) For single ligands, it would be nice to see how the random forest classifier works on experimental data, not only on in silico data (even if generated by a fitted model).

      Response: This comment and Reviewer 2’s comment 3 have helped us strengthen the rigor of our analysis by incorporating cross-model testing. We have pasted the response below.

      Specifically, we refined our analysis of ligand presence/absence classification by including ROC AUC and balanced accuracy metrics. This adjustment accounts for the fact that the experimental data did not cover all combinatorial conditions, thereby mitigating potential biases from data imbalance and threshold choice. The experimental results are qualitatively consistent with the simulations, though, as expected, they show somewhat lower ligand distinguishability than the noise-free simulated dataset. We have updated Figures 3E–F (previously Figure 3E), added Figure S8, and revised the manuscript accordingly (Lines 292–301). For the reviewer’s convenience, we have also included the revised manuscript text below.

      “Classifiers trained to distinguish TNF-present from TNF-absent conditions achieved a Receiver Operating Characteristic-Area Under the Curve (ROC AUC) of 0.96, significantly above the 0.5 baseline (Figure 3D, Figure S8A). Extending this analysis to other ligands, cells detected LPS (0.85), Pam (0.84), pIC (0.73), and CpG (0.63) in mixtures (Figure 3D, S8A). Using experimental data from double- and triple-ligand stimuli (Figure 1D), ROC AUC values were TNF 0.74, LPS 0.74, Pam 0.66, pIC 0.75, and CpG 0.66 (Figure 3E, S8B). Classifier accuracies yielded consistent results (Figure S8C-D). These results indicated a remarkable capability of preserving ligand-specific dynamic features within complex NFκB signal trajectories that enable nuclear detection of extracellular ligands even in complex stimulus mixtures.”

      3) My understanding of ligand discrimination is that it is based on the combination of pathways triggered rather than solely on a single transcription factor's response trajectory, which varies with ligand concentration and with the ligand concentration time profile (there is no reason to assume it is OFF-ON-OFF). For example, some of the considered ligands (pIC and CpG) activate IRF3/IRF7 in addition to NF-kB, which leads to IFN production and activation of STATs. This should at least be discussed.

      Response: We thank the reviewer for this comment and fully agree. In the previous version, we discussed how different signaling pathways may combinatorially distinguish stimuli. In the revision, we have extended this discussion to include the example of pIC and CpG activation, as suggested (Lines 515-522). We have pasted the corresponding text below.

      Furthermore, innate immune responses do not solely rely on NFκB but also involve the critical functions of AP1, p38, and the IRF3-ISGF3 axis. These additional pathways are likely activated in a coordinated manner and provide additional information (Luecke et al., 2021). This is exemplified by studies demonstrating synergistic effects between CpG and pIC in inhibiting tumor growth and promoting cytokine production (Huang et al., 2020), such as IFNβ and TNFα, whose expression is also regulated by the IRF and MAPK signaling pathways (Luecke et al., 2021; Sheu et al., 2023). Therefore, including the parallel AP1 and MAPK pathways, as well as the type I interferon network (Cheng et al., 2015; Davies et al., 2020; Hanson and Batchelor, 2022; Luecke et al., 2024; Paek et al., 2016; Peterson et al., 2022), is a next step for expanding the mathematical models presented here.

      Technical comments

      1) Reference 25: X. Guo, A. Adelaja, A. Singh, W. Roy, A. Hoffmann, Modeling single-cell heterogeneity in signaling dynamics of macrophages reveals principles of information transmission. Nature Communications (2025) does not lead to any paper with the same or a similar title and author list. This reference is given for the model. Fortunately, Ref 8 is helpful. Nevertheless, the authors should include a schematic of the model.

      Response: We apologize that the paper was not accessible in time; it is now available. We have also added a schematic of the model as suggested (Figure S1) and a detailed description of the model and simulations in the Introduction (Lines 95-106), Results (Lines 129-141), and Methods (Simulation of heterogeneous NFκB dynamical responses).

      2) Also, Mendeley Data DOI:10.17632/bv957x6frk.1 and GitHub https://github.com/Xiaolu-Guo/Combinatorial_ligand_NFkB lead nowhere.

      Response: We thank the reviewer for this comment; we have made the GitHub code public. Mendeley Data DOI:10.17632/bv957x6frk.1 can be accessed via the shared link https://data.mendeley.com/preview/bv957x6frk?a=6d56e079-d7b0-482e-951f-8a8e06ee8797 and will be made public once the paper is accepted.

      3) Dataset 1 is not described. Possibly it contains sets of parameters for the receptor modules (different numbers of sets for each module, why?), but the names of the parameters never appear in the text, which makes it impossible to reproduce the data.

      Response: We thank the reviewer for this comment. We have added a description of the dataset (S3 SupplementaryDataset2_NFkB_network_single_cell_parameter_distribution.xlsx) and added the parameter names in the Methods (Simulation of heterogeneous NFκB dynamical responses).


      4) It is difficult to understand how the simulations in response to more than one ligand are performed.

      Response: We thank the reviewer for this comment, and we have improved the explanation of the methods (Results, Lines 145-152) and included a detailed description of the model and simulations for combinatorial ligands (Methods, Predicting heterogeneous single-cell responses to combinatorial-ligand stimulation).

      Significance

      A lot of work has been done, the methodology is interesting, but the biological conclusions are overstated.

      Response: We thank the reviewer for their interest in the methodology. We have revised the title and the abstract, and added discussion of our findings, to document more accurately what we have found. In the revision, we have increased the clarity and rigor of the work. For the key conclusion that macrophages maintain some level of NFκB signaling fidelity in response to ligand mixtures, we have validated the binary classifier results on experimental data, as the reviewer suggested.

      In the revision, we have also extended our methodology to explore dose-response curves for different dose combinations of ligand pairs. This further work allowed us to identify the synergistic and antagonistic regimes. By comparing stimulus-response specificity for the antagonistic model versus the non-antagonistic model, we demonstrated that signaling antagonism may increase the distinguishability of the presence or absence of specific ligands within complex ligand mixtures. This provides a mechanism for how signaling fidelity is maintained to the surprising degree we reported.

      REFERENCES

      Adelaja, A., Taylor, B., Sheu, K.M., Liu, Y., Luecke, S., Hoffmann, A., 2021. Six distinct NFκB signaling codons convey discrete information to distinguish stimuli and enable appropriate macrophage responses. Immunity 54, 916-930.e7. https://doi.org/10.1016/j.immuni.2021.04.011

      Akira, S., Takeda, K., 2004. Toll-like receptor signalling. Nat Rev Immunol 4, 499–511. https://doi.org/10.1038/nri1391

      Andridge, R.R., Little, R.J.A., 2010. A Review of Hot Deck Imputation for Survey Non-response. Int Stat Rev 78, 40–64. https://doi.org/10.1111/j.1751-5823.2010.00103.x

      Cheng, Z., Taylor, B., Ourthiague, D.R., Hoffmann, A., 2015. Distinct single-cell signaling characteristics are conferred by the MyD88 and TRIF pathways during TLR4 activation. Sci Signal 8, ra69. https://doi.org/10.1126/scisignal.aaa5208

      Davies, A.E., Pargett, M., Siebert, S., Gillies, T.E., Choi, Y., Tobin, S.J., Ram, A.R., Murthy, V., Juliano, C., Quon, G., Bissell, M.J., Albeck, J.G., 2020. Systems-Level Properties of EGFR-RAS-ERK Signaling Amplify Local Signals to Generate Dynamic Gene Expression Heterogeneity. Cell Systems 11, 161-175.e5. https://doi.org/10.1016/j.cels.2020.07.004

      Guo, X., Adelaja, A., Singh, A., Roy, W., Hoffmann, A., 2025a. Modeling single-cell heterogeneity in signaling dynamics of macrophages reveals principles of information transmission. Nature Communications.

      Guo, X., Adelaja, A., Singh, A., Wollman, R., Hoffmann, A., 2025b. Modeling heterogeneous signaling dynamics of macrophages reveals principles of information transmission in stimulus responses. Nat Commun 16, 5986. https://doi.org/10.1038/s41467-025-60901-3

      Hanson, R.L., Batchelor, E., 2022. Coordination of MAPK and p53 dynamics in the cellular responses to DNA damage and oxidative stress. Molecular Systems Biology 18, e11401. https://doi.org/10.15252/msb.202211401

      Huang, Y., Zhang, Q., Lubas, M., Yuan, Y., Yalcin, F., Efe, I.E., Xia, P., Motta, E., Buonfiglioli, A., Lehnardt, S., Dzaye, O., Flueh, C., Synowitz, M., Hu, F., Kettenmann, H., 2020. Synergistic Toll-like Receptor 3/9 Signaling Affects Properties and Impairs Glioma-Promoting Activity of Microglia. J. Neurosci. 40, 6428–6443. https://doi.org/10.1523/JNEUROSCI.0666-20.2020

      Kellogg, R.A., Tian, C., Etzrodt, M., Tay, S., 2017. Cellular Decision Making by Non-Integrative Processing of TLR Inputs. Cell Rep 19, 125–135. https://doi.org/10.1016/j.celrep.2017.03.027

      Kumar, R., Clermont, G., Vodovotz, Y., Chow, C.C., 2004. The dynamics of acute inflammation. Journal of Theoretical Biology 230, 145–155. https://doi.org/10.1016/j.jtbi.2004.04.044

      Luecke, S., Guo, X., Sheu, K.M., Singh, A., Lowe, S.C., Han, M., Diaz, J., Lopes, F., Wollman, R., Hoffmann, A., 2024. Dynamical and combinatorial coding by MAPK p38 and NFκB in the inflammatory response of macrophages. Molecular Systems Biology 20, 898–932. https://doi.org/10.1038/s44320-024-00047-4

      Luecke, S., Sheu, K.M., Hoffmann, A., 2021. Stimulus-specific responses in innate immunity: Multilayered regulatory circuits. Immunity 54, 1915–1932. https://doi.org/10.1016/j.immuni.2021.08.018

      Paek, A.L., Liu, J.C., Loewer, A., Forrester, W.C., Lahav, G., 2016. Cell-to-Cell Variation in p53 Dynamics Leads to Fractional Killing. Cell 165, 631–642. https://doi.org/10.1016/j.cell.2016.03.025

      Peterson, A.F., Ingram, K., Huang, E.J., Parksong, J., McKenney, C., Bever, G.S., Regot, S., 2022. Systematic analysis of the MAPK signaling network reveals MAP3K-driven control of cell fate. Cell Systems 13, 885-894.e4. https://doi.org/10.1016/j.cels.2022.10.003

      Sheu, K.M., Guru, A.A., Hoffmann, A., 2023. Quantifying stimulus-response specificity to probe the functional state of macrophages. Cell Systems 14, 180-195.e5. https://doi.org/10.1016/j.cels.2022.12.012

      Son, M., Wang, A.G., Keisham, B., Tay, S., 2023. Processing stimulus dynamics by the NF-κB network in single cells. Exp Mol Med 55, 2531–2540. https://doi.org/10.1038/s12276-023-01133-7

      Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W.M., Hao, Y., Stoeckius, M., Smibert, P., Satija, R., 2019. Comprehensive Integration of Single-Cell Data. Cell 177, 1888-1902.e21. https://doi.org/10.1016/j.cell.2019.05.031

      Werner, S.L., Barken, D., Hoffmann, A., 2005. Stimulus Specificity of Gene Expression Programs Determined by Temporal Control of IKK Activity. Science 309, 1857–1861. https://doi.org/10.1126/science.1113319

      Werner, S.L., Kearns, J.D., Zadorozhnaya, V., Lynch, C., O’Dea, E., Boldin, M.P., Ma, A., Baltimore, D., Hoffmann, A., 2008. Encoding NF-kappaB temporal control in response to TNF: distinct roles for the negative regulators IkappaBalpha and A20. Genes Dev 22, 2093–2101. https://doi.org/10.1101/gad.1680708

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The authors extend an existing mathematical model of NFkB signalling under stimulation of various single receptors to a model that describes responses to stimulation of multiple receptors simultaneously. They compare this model to experimental data derived from live-cell imaging of mouse macrophages, and modify the model to account for potential antagonism between TLR3 and TLR9 responses due to competition for endosomal transport. Using this framework, they show that, although distinguishability decreases with increasing numbers of heterogeneous stimuli, macrophages are still able in principle to distinguish these to a statistically significant degree. I congratulate the authors on an interesting approach that extends and validates an existing mathematical model, and also provides valuable information regarding macrophage responses.

      There are no major issues affecting the scientific conclusions of the paper; however, the lack of detail surrounding the mathematical model and the 'signaling codons' used throughout the paper makes it difficult to read. This is exacerbated by the fact that I was unable to find Ref 25, which apparently describes the model; however, I was able to piece together the essential components from the description in Ref 8 and the supplementary material.

      Many of the minor comments below stem from this; however, there are also a few other places that could benefit from additional clarification and explanation.

      Significance:

      '...it remains unclear complex...' -> '...it remains unclear whether complex...'

      Introduction: 'temporal dynamics of NFkB' - it would be good to be more concrete regarding the temporal dynamics of what aspect of this (expression, binding, conformation, etc), if possible.

      'signaling codons' - the behaviour of these is key to the entire paper, so even if they are well described in the reference, it would be good to give a short description as early as possible so that the reader has a clear idea of what exactly is being discussed. Later, it would be good to have a concrete description of exactly what these capture.

      'This challenge...population of macrophages' - this seems a bit out of place and is something of a run-on sentence, so I suggest moving it to the next paragraph and working it into the first sentence there: '...regulatory mechanisms, and this challenge could be addressed with a model parameterised to account for heterogeneous...Early models ...', or something similar.

      Ref 25: I can't find a paper with this title anywhere, so if it's an accepted preprint then it would be good to have this available as well. That said, I still think it would be difficult to grasp the work done in this paper without some description of the mathematical model here, at least schematically, if not the full set of ODEs. For example, there are numerous references to how this incorporates heterogeneous responses, the 'core module', etc., and the reader has no context for these if they aren't familiar with the structure of the model.

      'A key challenge which is...' -> 'A key challenge is...'

      'With model simulation ...' - a bit of a run-on sentence; I suggest breaking after 'conditions'.

      Results:

      This section would benefit from a more in-depth description of the model and experimental setup. In particular, for the experiment the reader never really knows what the workflow is, what the model takes as input, or what it predicts.

      '..mechanistic model was trained...' - trained in this study, or in the previous referenced study?

      'determined parameter distributions' - this is where it would be good to have more background on the model. What parameters are these, and what do they correspond to biologically? It would also be nice to see in the methods or supplementary material how this is done (maximum likelihood, etc).

      'matching cells with similar core model...' - it's difficult to follow the logic as to why this is done, so I think this needs to be a little clearer. My guess would be that the assumption is that simulated cells with similar 'core' parameters have a similar downstream signalling response, and therefore the receptors can be 'transplanted'. So it would be nice to see exactly what these distributions are and what the effect of a bad match would be.

      Some explanation of how this relates to the experimental data the parameters are fit on would also be useful. Is there a correspondence between individual simulated cells and the experimental data for the single ligand stimulation, and then the smallest set of these is taken? Is there also a matching from the simulated multi-receptor modules and the multi-receptor data, and if so, is this done in the same way?

      'six signaling codons' - here it would be good to recapitulate what these represent, but also what the 'strength' and 'activity' correspond to (total integrated value, maximum value, etc)

      'pre-defined thresholds' - no need to state these numerically in the text (although giving some sense of how/why these were chosen would give some context), but I couldn't find the values of these, nor values corresponding to the signaling codons.

      'non-responder cells are likely a result of cellular heterogeneity in receptor modules rather than the core module' - is this the 'ill health' referenced earlier? If so make this clear.

      It's also very difficult to follow this chain of logic, given that the reader at this point doesn't have any knowledge of what the 'core' module is, nor the significance of the thresholds on the signaling codons. I would suggest making this much clearer, with reference to each of these.

      '...but the model represented these as independent mass action reactions' - the significance of this may not be clear to someone not familiar with biophysical models, so probably better to make it explicit.

      '...we trained a random forest classifier...' - is this trained on the 'raw' experimental time series data, or on the signaling codons?

      'We also applied a Long Short-Term Memory (LSTM) machine learning model...' - it might be good to reference these three approaches at the beginning of this section, otherwise they seem to come out of the blue a little.

      'We then used machine learning classifiers...' - random forests, LSTMs, or a different model?

      Discussion:

      '...over statistical models...' - suggest maybe 'purely statistical models'

      'We found that endosomal transport...' - A paper by Huang et al. (https://www.jneurosci.org/content/40/33/6428) observed a synergistic phagocytic response to combined CpC and pIC stimulation in microglia. This is still consistent with a dose-dependent saturation effect, but may be worth a mention.

      '...features termed...' -> 'features, termed'

      '...we applied a Long Short-Term Memory (LSTM) machine learning model..' - maybe make clear that this is on the time-series data (also LSTM has already been defined).

      Materials and methods:

      The descriptions in this section are quite vague, so I would suggest expanding this with more detail from the supplementary material, where things are quite well explained.

      'sampling distribution' - not clear what this refers to in this context

      'RelA-mVenus mouse strain' - it would be good to mention the relevance of the reporter for NFkB signaling

      '...A random forest classifier...' -> a random forest classifier

      Significance

      This study provides mechanistically interpretable insight into the important question of how immune cells perform target recognition in realistic scenarios, and also validates existing mathematical models by extending them beyond their original domain. The paper uses 'signaling codons' as a proxy for information processing; however, in this instance they are cross-validated with an LSTM model that is applied directly to the time-series data. Nevertheless, the scope of the paper is such that it does not address how these signals are transmitted or used in a downstream immune response. To my knowledge, this is the first time that a well-established mathematical model of signalling response has been extended and applied to heterogeneous ligand mixtures. These results will be of interest to those studying immune cell responses, and to those interested in basic research on mathematical models of signaling and cellular information processing more generally.

      My background is in biophysical models, machine learning, and signaling in cancer. I have a basic understanding of immunology, but no experience in experimental cell biology.

    1. Arguments for Utilitarianism

      Introduction: Moral Methodology & Reflective Equilibrium

      You cannot prove a moral theory. Whatever arguments you come up with, it’s always possible for someone else to reject your premises—if they are willing to accept the costs of doing so. Different theories offer different advantages. This chapter will set out some of the major considerations that plausibly count in favor of utilitarianism. A complete view also needs to consider the costs of utilitarianism (or the advantages of its competitors), which are addressed in Chapter 8: Objections to Utilitarianism. You can then reach an all-things-considered judgment as to which moral theory strikes you as overall best or most plausible.

      To this end, moral philosophers typically use the methodology of reflective equilibrium.[1] This involves balancing two broad kinds of evidence as applied to moral theories:

      - Intuitions about specific cases (thought experiments).
      - General theoretical considerations, including the plausibility of the theory’s principles or systematic claims about what matters.

      General principles can be challenged by coming up with putative counterexamples, or cases in which they give an intuitively incorrect verdict. In response to such putative counterexamples, we must weigh the force of the case-based intuition against the inherent plausibility of the principle being challenged.
      This could lead you either to revise the principle to accommodate your intuitions about cases or to reconsider your verdict about the specific case, if you judge the general principle to be better supported (especially if you are able to “explain away” the opposing intuition as resting on some implicit mistake or confusion).

      As we will see, the arguments in favor of utilitarianism rest overwhelmingly on general theoretical considerations. Challenges to the view can take either form, but many of the most pressing objections involve thought experiments in which utilitarianism is held to yield counterintuitive verdicts.

      There is no neutral, non-question-begging answer to how one ought to resolve such conflicts.[2] It takes judgment, and different people may be disposed to react in different ways depending on their philosophical temperament. As a general rule, those of a temperament that favors systematic theorizing are more likely to be drawn to utilitarianism (and related views), whereas those who hew close to commonsense intuitions are less likely to be swayed by its theoretical virtues. Considering the arguments below may thus do more than just illuminate utilitarianism; it may also help you to discern your own philosophical temperament!

      While our presentation focuses on utilitarianism, it’s worth noting that many of the arguments below could also be taken to support other forms of welfarist consequentialism (just as many of the objections to utilitarianism also apply to these related views). This chapter explores arguments for utilitarianism and closely related views over non-consequentialist approaches to ethics.

      Arguments for Utilitarianism

      What Fundamentally Matters

      Moral theories serve to specify what fundamentally matters, and utilitarianism offers a particularly compelling answer to this question.

      Almost anyone would agree with utilitarianism that suffering is bad, and well-being is good. What could be more obvious?
      If anything matters morally, human well-being surely does. And it would be arbitrary to limit moral concern to our own species, so we should instead conclude that well-being generally is what matters. That is, we ought to want the lives of sentient beings to go as well as possible (whether that ultimately comes down to maximizing happiness, desire satisfaction, or other welfare goods).

      Could anything else be more important? Such a suggestion can seem puzzling. Consider: it is (usually) wrong to steal.[3] But that is plausibly because stealing tends to be harmful, reducing people’s well-being.[4] By contrast, most people are open to redistributive taxation, if it allows governments to provide benefits that reliably raise the overall level of well-being in society. So it’s not that individuals just have a natural right to not be interfered with no matter what. When judging institutional arrangements (such as property and tax law), we recognize that what matters is coming up with arrangements that tend to secure overall good results, and that the most important factor in what makes a result good is that it promotes well-being.[5]

      Such reasoning may justify viewing utilitarianism as the default starting point for moral theorizing.[6] If someone wants to claim that there is some other moral consideration that can override overall well-being (trumping the importance of saving lives, reducing suffering, and promoting flourishing), they face the challenge of explaining how that could possibly be so. Many common moral rules (like those that prohibit theft, lying, or breaking promises), while not explicitly utilitarian in content, nonetheless have a clear utilitarian rationale. If they did not generally promote well-being—but instead actively harmed people—it’s hard to see what reason we would have to still want people to follow them.
      To follow and enforce harmful moral rules (such as rules prohibiting same-sex relationships) would seem like a kind of “rule worship”, and not truly ethical at all.[7] Since the only moral rules that seem plausible are those that tend to promote well-being, that’s some reason to think that moral rules are, as utilitarianism suggests, purely instrumental to promoting well-being.

      Similar judgments apply to hypothetical cases in which you somehow know for sure that a typically reliable rule is, in this particular instance, counterproductive. In the extreme case, we all recognize that you ought to lie or break a promise if lives are on the line. In practice, of course, the best way to achieve good results over the long run is to respect commonsense moral rules and virtues while seeking opportunities to help others. (It’s important not to mistake the hypothetical verdicts utilitarianism offers in stylized thought experiments for the practical guidance it offers in real life.) The key point is just that utilitarianism offers a seemingly unbeatable answer to the question of what fundamentally matters: protecting and promoting the interests of all sentient beings to make the world as good as it can be.

      The Veil of Ignorance

      Humans are masters of self-deception and motivated reasoning. If something benefits us personally, it’s all too easy to convince ourselves that it must be okay. We are also more easily swayed by the interests of more salient or sympathetic individuals (favoring puppies over pigs, for example). To correct for such biases, it can be helpful to force impartiality by imagining that you are looking down on the world from behind a “veil of ignorance”. This veil reveals the facts about each individual’s circumstances in society—their income, happiness level, preferences, etc.—and the effects that each choice would have on each person, while hiding from you the knowledge of which of these individuals you are.[8] To more fairly determine what ideally ought to be done, we may ask what everyone would have most personal reason to prefer from behind this veil of ignorance. If you’re equally likely to end up being anyone in the world, it would seem prudent to maximize overall well-being, just as utilitarianism prescribes.[9]

      How much weight should we give to the verdicts that would be chosen, on self-interested grounds, from behind the veil? The veil thought experiment highlights how utilitarianism gives equal weight to everyone’s interests, without bias. That is, utilitarianism is just what we get when we are beneficent to all: extending to everyone the kind of careful concern that prudent people have for their own interests.[10] But it may seem question-begging to those who reject welfarism, and so deny that interests are all that matter. For example, the veil thought experiment clearly doesn’t speak to whether non-sentient life or natural beauty has intrinsic value. It’s restricted to that sub-domain of morality that concerns what we owe to each other, where this includes just those individuals over whom our veil-induced uncertainty about our identity extends: presently existing sentient beings, perhaps.[11] Accordingly, any verdicts reached via the veil of ignorance will still need to be weighed against what we might yet owe to any excluded others (such as future generations, or non-welfarist values).

      Still, in many contexts other factors will not be relevant, and the question of what we morally ought to do will reduce to the question of how we should treat each other. Many of the deepest disagreements between utilitarians and their critics concern precisely this question. And the veil of ignorance seems relevant here.
      The fact that some action is what everyone affected would personally prefer from behind the veil of ignorance seems to undermine critics’ claims that any individual has been mistreated by, or has grounds to complain about, that action.

      Ex Ante Pareto

      A Pareto improvement is better for some people, and worse for none. When outcomes are uncertain, we may instead assess the prospect associated with an action—the range of possible outcomes, weighted by their probabilities. A prospect can be assessed as better for you when it offers you greater well-being in expectation, or ex ante.[12] Putting these concepts together, we may formulate the following principle:

      Ex ante Pareto: in a choice between two prospects, one is morally preferable to the other if it offers a better prospect for some individuals and a worse prospect for none.

      This bridge between personal value (or well-being) and moral assessment is further developed in economist John Harsanyi’s aggregation theorem.[13] But the underlying idea, that reasonable beneficence requires us to wish well to all, and prefer prospects that are in everyone’s ex ante interests, has also been defended and developed in more intuitive terms by philosophers.[14]

      A powerful objection to most non-utilitarian views is that they sometimes violate ex ante Pareto, such as when choosing policies from behind the veil of ignorance. Many rival views imply, absurdly, that prospect Y could be morally preferable to prospect X, even when Y is worse in expectation for everyone involved.

      Caspar Hare illustrates the point with a Trolley case in which all six possible victims are stuffed inside suitcases: one is atop a footbridge, five are on the tracks below, and a train will hit and kill the five unless you topple the one on the footbridge (in which case the train will instead kill this one and then stop before reaching the others).[15] As the suitcases have recently been shuffled, nobody knows which position they are in. So, from each victim’s perspective, their prospects are best if you topple the one suitcase off the footbridge, increasing their chances of survival from 1/6 to 5/6. Given that this is in everyone’s ex ante interests, it’s deeply puzzling to think that it would be morally preferable to override this unanimous preference, shared by everyone involved, and instead let five of the six die; yet that is the implication of most non-utilitarian views.[16]

      Expanding the Moral Circle

      When we look back on past moral atrocities—like slavery or denying women equal rights—we recognize that they were often sanctioned by the dominant societal norms at the time. The perpetrators of these atrocities were grievously wrong to exclude their victims from their “circle” of moral concern.[17] That is, they were wrong to be indifferent towards (or even delight in) their victims’ suffering. But such exclusion seemed normal to people at the time. So we should question whether we might likewise be blindly accepting of some practices that future generations will see as evil but that seem “normal” to us.[18] The best protection against making such an error ourselves would be to deliberately expand our moral concern outward, to include all sentient beings—anyone who can suffer—and so recognize that we have strong moral reasons to reduce suffering and promote well-being wherever we can, no matter who it is that is experiencing it.

      While this conclusion is not yet all the way to full-blown utilitarianism, since it’s compatible with, for example, holding that there are side-constraints limiting one’s pursuit of the good, it is likely sufficient to secure agreement with the most important practical implications of utilitarianism (stemming from cosmopolitanism, anti-speciesism, and longtermism).

      The Poverty of the Alternatives

      We’ve seen that there is a strong presumptive case in favor of utilitarianism.
      If no competing view can be shown to be superior, then utilitarianism has a strong claim to be the “default” moral theory. In fact, one of the strongest considerations in favor of utilitarianism (and related consequentialist views) lies in the deficiencies of the alternatives. Deontological (or rule-based) theories, in particular, seem to rest on questionable foundations.[19]

      Deontological theories are explicitly non-consequentialist: instead of morally assessing actions by evaluating their consequences, these theories tend to take certain types of action (such as killing an innocent person) to be intrinsically wrong.[20] There are reasons to be dubious of this approach to ethics, however.

      The Paradox of Deontology

      Deontologists hold that there is a constraint against killing: that it’s wrong to kill an innocent person even if this would save five other innocent people from being killed. This verdict can seem puzzling on its face.[21] After all, given how terrible killing is, should we not want there to be less of it? Rational choice in general tends to be goal-directed, a conception which fits poorly with deontic constraints.[22] A deontologist might claim that their goal is simply to avoid violating moral constraints themselves, which they can best achieve by not killing anyone, even if that results in more individuals being killed. While this explanation can render deontological verdicts coherent, it does so at the cost of making them seem awfully narcissistic, as though the deontologist’s central concern was just to maintain their own moral purity or “clean hands”.

      Deontologists might push back against this characterization by instead insisting that moral action need not be goal-directed at all.[23] Rather than only seeking to promote value (or minimize harm), they claim that moral agents may sometimes be called upon to respect another’s value (by not harming them, even as a means to preventing greater harm to others), which would seem an appropriately outwardly directed, non-narcissistic motivation.

      The challenge remains that such a proposal makes moral norms puzzlingly divergent from other kinds of practical norms. If morality sometimes calls for respecting value rather than promoting it, why is the same not true of prudence? (Given that pain is bad for you, for example, it would not seem prudent to refuse a painful operation now if the refusal commits you to five comparably painful operations in the future.) Deontologists may offer various answers to this question, but insofar as we are inclined to think, pre-theoretically, that ethics ought to be continuous with other forms of rational choice, that gives us some reason to prefer consequentialist accounts.[24]

      Deontologists also face a tricky question about where to draw the line. Is it at least okay to kill one person to prevent a hundred killings? Or a million? Absolutists never permit killing, no matter the stakes. But such a view seems too extreme for many. Moderate deontologists allow that sufficiently high stakes can justify violations. But how high? Any answer they offer is apt to seem arbitrary and unprincipled. Between the principled options of consequentialism and absolutism, many will find consequentialism to be the more plausible of the two.

      The Hope Objection

      Impartial observers should want and hope for the best outcome. Non-consequentialists claim, nonetheless, that it’s sometimes wrong to bring about the best outcome. Putting the two claims together yields the striking result that you should sometimes hope that others act wrongly.

      Suppose it would be wrong for some stranger—call him Jack—to kill one innocent person to prevent five other (morally comparable) killings.
      Non-consequentialists may claim that Jack has a special responsibility to ensure that he does not kill anyone, even if this results in more killings by others. But you are not Jack. From your perspective as an impartial observer, Jack’s killing one innocent person is no more or less intrinsically bad than any of the five other killings that would thereby be prevented. You have most reason to hope that there is only one killing rather than five. So you have reason to hope that Jack acts “wrongly” (killing one to save five). But that seems odd.

      More than merely being odd, this might even be taken to undermine the claim that deontic constraints matter, or are genuinely important to abide by. After all, to be important just is to be worth caring about. For example, we should care if others are harmed, which validates the claim that others’ interests are morally important. But if we should not care more about Jack’s abiding by the moral constraint against killing than we should about his saving five lives, that would seem to suggest that the constraint against killing is not in fact more morally important than saving five lives.

      Finally, since our moral obligations ought to track what is genuinely morally important, if deontic constraints are not in fact important then we cannot be obligated to abide by them.[25] We cannot be obliged to prioritize deontic constraints over others’ lives, if we ought to care more about others’ lives than about deontic constraints. So deontic constraints must not accurately describe our obligations after all. Jack really ought to do whatever would do the most good overall, and so should we.

      Skepticism About the Distinction Between Doing and Allowing

      You might wonder: if respect for others requires not harming them (even to help others more), why does it not equally require not allowing them to be harmed?
      Deontological moral theories place great weight on distinctions such as those between doing and allowing harm, or killing and letting die, or intended versus merely foreseen harms. But why should these be treated so differently? If a victim ends up equally dead either way, whether they were killed or “merely” allowed to die would not seem to make much difference to them—surely what matters to them is just their death. Consequentialism accordingly denies any fundamental significance to these distinctions.[26]

      Indeed, it’s far from clear that there is any robust distinction between “doing” and “allowing”. Sometimes you might “do” something by remaining perfectly still.[27] Also, when a doctor unplugs a terminal patient from life support machines, this is typically thought of as “letting die”; but if a mafioso, worried about an informant’s potentially incriminating testimony, snuck into the hospital and unplugged the informant’s life support, we are more likely to judge it to constitute “killing”.[28] Jonathan Bennett argues at length that there is no satisfactory, fully general distinction between doing and allowing—at least, none that would vindicate the moral significance that deontologists want to attribute to such a distinction.[29] If Bennett is right, then that might force us towards some form of consequentialism (such as utilitarianism) instead.

      Status Quo Bias

      Opposition to utilitarian trade-offs—that is, benefiting some at a lesser cost to others—arguably amounts to a kind of status quo bias, prioritizing the preservation of privilege over promoting well-being more generally.

      Such conservatism might stem from the Just World fallacy: the mistake of assuming that the status quo is just, and that people naturally get what they deserve. Of course, reality offers no such guarantees of justice.
      What circumstances one is born into depends on sheer luck, including one’s endowment of physical and cognitive abilities, which may pave the way for future success or failure. Thus, even later in life we never manage to fully wrest back control from the whimsies of fortune and, consequently, some people are vastly better off than others despite being no more deserving. In such cases, why should we not be willing to benefit one person at a lesser cost to privileged others? They have no special entitlement to the extra well-being that fortune has granted them.[30] Clearly, it’s good for people to be well-off, and we certainly would not want to harm anyone unnecessarily.[31] However, if we can increase overall well-being by benefiting one person at a lesser cost to another, we should not refrain from doing so merely due to a prejudice in favor of the existing distribution.[32] It’s easy to see why traditional elites would want to promote a “morality” which favors their entrenched interests. It’s less clear why others should go along with such a distorted view of what (and who) matters.

      It can similarly be argued that there is no real distinction between imposing harms and withholding benefits. The only difference between the two cases concerns what we understand to be the status quo, which lacks moral significance. Suppose scenario A is better for someone than B. Then to shift from A to B would be a “harm”, while to prevent a shift from B to A would be to “withhold a benefit”. But this is merely a descriptive difference. If we deny that the historically given starting point provides a morally privileged baseline, then we must say that the cost in either case is the same, namely the difference in well-being between A and B. In principle, it should not matter where we start from.[33]

      Now suppose that scenario B is vastly better for someone else than A is: perhaps it will save their life, at the cost of the first person’s arm.
      Nobody would think it okay to kill a person just to save another’s arm (that is, to shift from B to A). So if we are to avoid status quo bias, we must similarly judge that it would be wrong to oppose the shift from A to B—that is, we should not object to saving someone’s life at the cost of another’s arm.[34] We should not care especially about preserving the privilege of whoever stood to benefit by default; such conservatism is not truly fair or just. Instead, our goal should be to bring about whatever outcome would be best overall, counting everyone equally, just as utilitarianism prescribes.

      Evolutionary Debunking Arguments

      Against these powerful theoretical objections, the main consideration that deontological theories have going for them is closer conformity with our intuitions about particular cases. But if these intuitions cannot be supported by independently plausible principles, that may undermine their force—or suggest that we should interpret these intuitions as good rules of thumb for practical guidance, rather than as indicating what fundamentally matters.

      The force of deontological intuitions may also be undermined if it can be demonstrated that they result from an unreliable process. For example, evolutionary processes may have endowed us with an emotional bias favoring those who look, speak, and behave like ourselves; this, however, offers no justification for discriminating against those unlike ourselves. Evolution is a blind, amoral process whose only “goal” is the propagation of genes, not the promotion of well-being or moral rightness. Our moral intuitions require scrutiny, especially in scenarios very different from our evolutionary environment. If we identify a moral intuition as stemming from our evolutionary ancestry, we may decide not to give much weight to it in our moral reasoning—the practice of evolutionary debunking.[35]

      Katarzyna de Lazari-Radek and Peter Singer argue that views permitting partiality are especially susceptible to evolutionary debunking, whereas impartial views like utilitarianism are more likely to result from undistorted reasoning.[36] Joshua Greene offers a different psychological debunking argument. He argues that deontological judgments—for instance, in response to trolley cases—tend to stem from unreliable and inconsistent emotional responses, including our favoritism toward identifiable over faceless victims and our aversion to harming someone up close rather than from afar. By contrast, utilitarian judgments involve the more deliberate application of widely respected moral principles.[37]

      Such debunking arguments raise worries about whether they “prove too much”: the foundational moral judgment that pain is bad would itself seem emotionally laden and susceptible to evolutionary explanation—physically vulnerable creatures would have powerful evolutionary reasons to want to avoid pain whether or not it was objectively bad, after all![38]

      However, debunking arguments may be most applicable in cases where we feel that a principled explanation for the truth of the judgment is lacking. We do not tend to feel any such lack regarding the badness of pain—that is surely an intrinsically plausible judgment if anything is. Some intuitions may be overdetermined: explicable both by evolutionary causes and by their rational merits. In such a case, we need not take the evolutionary explanation to undermine the judgment, because the judgment also results from a reliable process (namely, rationality). By contrast, deontological principles and partiality are far less self-evidently justified, and so may be considered more vulnerable to debunking.
Once we have an explanation for these psychological intuitions that can explain why we would have them even if they were rationally baseless, we may be more justified in concluding that they are indeed rationally baseless.

As such, debunking objections are unlikely to change the mind of one who is drawn to the target view (or regards it as independently justified and defensible). But they may help to confirm the doubts of those who already felt there were some grounds for skepticism regarding the intrinsic merits of the target view.

Conclusion

Utilitarianism can be supported by several theoretical arguments, the strongest perhaps being its ability to capture what fundamentally matters. Its main competitors, by contrast, seem to rely on dubious distinctions (like “doing” vs. “allowing”) and built-in status quo bias. At least, that is how things are apt to look to one who is broadly sympathetic to a utilitarian approach. Given the flexibility inherent in reflective equilibrium, these arguments are unlikely to sway a committed opponent of the view. For those readers who find a utilitarian approach to ethics deeply unappealing, we hope that this chapter may at least help you to better understand what appeal others might see in the view.

However strong you judge the arguments in favor of utilitarianism to be, your ultimate verdict on the theory will also depend upon how well the view is able to counter the influential objections that critics have raised against it.

The next chapter discusses theories of well-being, or what counts as being good for an individual.

How to Cite This Page

Chappell, R.Y. and Meissner, D. (2023). Arguments for Utilitarianism. In R.Y. Chappell, D. Meissner, and W. MacAskill (eds.), An Introduction to Utilitarianism, <https://www.utilitarianism.net/arguments-for-utilitarianism>, accessed 2/13/2026.
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.
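      The size normalization described in point (2) can be illustrated with a minimal Python/NumPy sketch. It is not the maRQup implementation. It assumes a boolean brightfield mask from which the tail has already been removed, and the function name `scale_to_reference_height` and its `reference_height` parameter are hypothetical. The sketch crops the image to the mouse's bounding box and rescales both axes by a single factor, so the mouse spans a fixed number of rows while its aspect ratio is preserved. Nearest-neighbour resampling is used only to keep the example dependency-free.

```python
import numpy as np


def scale_to_reference_height(image, mask, reference_height):
    """Hypothetical helper: crop to the mouse's bounding box, then rescale so
    the mouse spans `reference_height` rows. One factor is applied to both
    axes, so the aspect ratio is preserved. Returns (image, mask)."""
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("mask contains no mouse pixels")

    # Bounding box of the (assumed tail-free) mouse silhouette.
    img = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    msk = mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # A single scale factor keeps the width/height ratio unchanged.
    factor = reference_height / img.shape[0]
    new_w = max(1, int(round(img.shape[1] * factor)))

    # Nearest-neighbour resampling: map each output pixel back to a source pixel.
    src_r = np.minimum((np.arange(reference_height) / factor).astype(int),
                       img.shape[0] - 1)
    src_c = np.minimum((np.arange(new_w) / factor).astype(int),
                       img.shape[1] - 1)
    return img[np.ix_(src_r, src_c)], msk[np.ix_(src_r, src_c)]
```

      For example, a mouse occupying a 4 x 2 pixel box rescaled to a reference height of 8 rows comes out as 8 x 4, so the height-to-width ratio stays at 2:1. Fixed row ranges of the rescaled image line up with comparable anatomy across mice only insofar as pose and body proportions are consistent, which is the limitation raised under Weaknesses.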

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      Our answer to this comment is included in the Supplemental Methods. The standard deviation of the mouse pixels was calculated to ensure that the image processing steps did not alter the shape or size of the mice. Such consistency is particularly striking because our dataset was accrued by nine lab members over the last five years, before we conceived and carried out our analysis (cf. our answer to point #2). In fact, it is the very consistency of this IVIS measurement that led us to conceive our pipeline. As seen in Supplemental Figure 4G, there is minimal difference in the shape or size of the mice across 7,534 images. A total of 99 images were removed, either because they were too slanted (91/7,633, 1.2%) or because of processing errors (8/7,633, 0.1%). Also, the vertical scaling was conducted while keeping the aspect ratio unchanged to prevent any non-anatomical scaling. Hence, we did not record any nonlinear growth of the mice that would warrant more elaborate alignment and/or batch correction of our images.

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera, as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

      Reviewer #1 is correct that different mouse postures would be an issue when aligning the images and normalizing for size. However, all experiments were conducted for luminescence measurements in the IVIS system, which requires anesthesia and a long integration time for imaging. In our experience and in our 1000+ mouse dataset, the anesthetized mice were placed in a stretched/elongated position in all experiments (n=37). Of note, these experiments were conducted by nine different researchers who were not instructed on how to place the mice on the machine for ideal image processing, showing that the standard protocol for imaging mice on IVIS does not introduce large variations in animal pose during image acquisition. We therefore think the issue raised by Reviewer #1 is moot in the context of classical settings for mouse luminescence imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a method that automatically processes bioluminescent tumor images for quantitative analysis and used it to describe the spatiotemporal distribution of tumor cells in response to CD19-targeting CAR-T cells comprising CD28 or 4-1BB costimulatory domains. The conclusion highlights the dependence of tumor decay and relapse on the number of injected cells, the type of cells, and the initial growth rate of tumors (where “initial” refers to the first day of therapy). The authors also performed a spatiotemporal analysis of tumor response to CAR-T therapy in different regions of the mouse body in a model of acute lymphoblastic leukemia (ALL).

      Strengths:

      The analysis is based on a large number of images and accounts for many variables. The results of the analysis largely support their claims that the kinetics of tumor decay and relapse are dependent on the CAR T co-stimulatory domain and number of cells injected and tumor growth rates. 

      Weaknesses:

      The study does not specify how (a) differences in mouse positioning (and whether misaligned mice were excluded) and (b) tumor spread at the start of therapy influenced the data. The study does not take into account the potential heterogeneity of CAR-T cells in terms of CAR expression or T cell immunophenotype (differentiation, exhaustion, fitness, etc.).

      See answer #2 to Reviewer #1.

      Author response image 1.

      Author response image 1 shows the average tumor radiance on day zero (when CAR-T cell therapy was administered) for all mice. While there is some spread, most mice had tumor localized to the liver or bone marrow.

      Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights into preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification. 

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Weaknesses: 

      No weaknesses were identified by this Reviewer. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this paper, the authors propose a significant advancement in optical image data analysis by employing automation. They effectively demonstrate the valuable insights that can be gained from analyzing extensive datasets with a more unbiased methodology. At present, I do not have any specific suggestions for improvement.

      However, it is important to note that this work is limited in its operational scope. Specifically, it relies on predefined ROIs rather than aligning the signal site with anatomical systems. The scaling model and image cropping are simplistic, animal pose is not taken into account, and the data output needs to be called semi-quantitative or qualitative, and would have been stronger utilizing an AI agent. Nevertheless, this work underscores the potential of automated systems in preclinical image analysis, which is a crucial step towards developing more sophisticated approaches to optical image data analysis.

      While our analysis used predefined ROIs, the maRQup pipeline allows users to manually draw ROIs on the mouse image.

      Reviewer #2 (Recommendations for the authors):

      The writing and presentation of data are clear and accurate, but some additional information should be added regarding the imaging protocol used to acquire the original data. 

      The authors mention fluorescence in Figure 1. I expected all the data to be generated from bioluminescent NALM-6 tumors, since bioluminescence is indeed measured in average radiance and can be expressed per pixel (p/sec/cm²/sr/pixel). Fluorescence should be measured using radiance efficiency, (p/sec/cm²/sr)/(µW/cm²), a unit that compensates for the non-uniform excitation light pattern in the instrument. Would the authors find different results if fluorescence data were analyzed separately?

      Reviewer #2 is correct that the unit for fluorescence would be radiance efficiency. The word “fluorescent” was included in the label of Figure 1a to highlight that our workflow could be applied to other light-generating methods (i.e., fluorescence as well as bioluminescence). However, only measurements of bioluminescent tumors were analyzed in this study. If fluorescence measurements were to be analyzed, our methods of image acquisition and processing would be directly applicable.

      Did the author ever check the signal of the snout in mice with no tumor?

      In mice with no tumor, there is no detectable signal in the snout (or anywhere else, for that matter).

      The urine of mice contains phosphor, and might give a background signal, especially if longer exposure is used at the end of the study.

      For the mice with no tumor injection, the luminescence signal was below background (<10² p/sec/cm²/sr/pixel). In particular, we do not detect any signal in the bladder/urine. Additionally, as described in the Supplemental Methods and Figure 1b, only pixels that were on the mouse, as determined from the brightfield image, were used to calculate the tumor burden from the radiance of the luminescent image. This method ensures that any background signal (e.g., from phosphor in mouse urine) would be excluded from the radiance quantification and would not bias the results.

      Additionally, as described in the Methods, the exposure time was held constant at 30 seconds for each IVIS measurement across all 37 experiments.
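      The mask-restricted quantification described in this answer can be sketched as follows, again as a hypothetical illustration rather than the maRQup code. The sketch assumes the luminescent radiance image and the brightfield-derived mouse mask are already co-registered arrays of the same shape. The function names and the region-label encoding (-1 for off-mouse pixels, 0 to n_regions-1 for the anatomical regions) are assumptions.

```python
import numpy as np


def mean_radiance_on_mouse(radiance, mouse_mask):
    """Hypothetical helper: average radiance per pixel over mouse pixels only."""
    radiance = np.asarray(radiance, dtype=float)
    mouse_mask = np.asarray(mouse_mask, dtype=bool)
    if radiance.shape != mouse_mask.shape:
        raise ValueError("radiance and mask must have the same shape")
    if not mouse_mask.any():
        raise ValueError("mask contains no mouse pixels")
    # Off-mouse pixels never enter the average, whatever their value.
    return float(radiance[mouse_mask].mean())


def mean_radiance_by_region(radiance, region_labels, n_regions):
    """Hypothetical helper: per-region average radiance per pixel.

    region_labels holds 0..n_regions-1 for mouse pixels and -1 for off-mouse
    pixels (an assumed encoding). Regions without pixels get NaN.
    """
    radiance = np.asarray(radiance, dtype=float).ravel()
    labels = np.asarray(region_labels, dtype=int).ravel()
    on_mouse = labels >= 0
    if on_mouse.any() and labels[on_mouse].max() >= n_regions:
        raise ValueError("region label out of range")
    # Sum of radiance and pixel count per region label.
    sums = np.bincount(labels[on_mouse], weights=radiance[on_mouse],
                       minlength=n_regions)
    counts = np.bincount(labels[on_mouse], minlength=n_regions)
    return np.divide(sums, counts, out=np.full(n_regions, np.nan),
                     where=counts > 0)


if __name__ == "__main__":
    # Toy 3x3 image: the left column and bottom row are off the mouse and bright.
    radiance = np.array([[1e3, 100.0, 200.0],
                         [1e3, 300.0, 400.0],
                         [1e3, 1e3, 1e3]])
    labels = np.array([[-1, 0, 0],
                       [-1, 1, 1],
                       [-1, -1, -1]])
    print(mean_radiance_on_mouse(radiance, labels >= 0))  # 250.0
    print(mean_radiance_by_region(radiance, labels, 3))
```

      In the toy example the bright off-mouse pixels (1,000 units each) do not raise the on-mouse average, which stays at 250. Signal that falls on pixels inside the mask would still be counted, so the mask controls for off-animal background only.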

      The data using more than 2 million cells comes from only 10 mice, and maybe the biological relevance of this group is limited since it will not be achievable and translatable in humans (PMID: 33653113).

      We appreciate Reviewer #2’s attention to this issue. The effect observed in our study is large enough to reach statistical significance despite the small number of mice. Note that the dosing regimen used was optimized for the murine NSG model and would require appropriate scaling before clinical application. Nonetheless, NSG mice remain the gold standard for pre‑clinical in vivo evaluation and their use is generally required by regulatory agencies, such as the FDA, for assessing novel CAR‑T cell therapies; thus these findings are relevant for advancing such treatments.

    1. Mary Smith Cranch comments on politics, 1786-87

      In the aftermath of the Revolution, politics became a sport consumed by both men and women. In a series of letters sent to her sister, Mary Smith Cranch comments on a series of political events including the lack of support for diplomats, the circulation of paper or hard currency, legal reform, tariffs against imported tea tables, Shays’ Rebellion, and the role of women in supporting the nation’s interests.

      On foreign policy, pending legislation, and women’s political participation

      I began to write you last night but my eyes were so poor that I could not continue it. I am now risen with the sun to thank you for the charming budget you have sent me. Such frequent communications shortens the idea of distance by many miles. I believe there have been letters constantly upon the water for each other ever since you left us. The idea of your returning soon to your dear friends here would be a much more joyful one if this country would suffer you first to do all the good your inclinations lead you too, and what they really wish you to do though they put it out of your power to do it. I hope they will come to their senses before winter. The court is adjourned to next January. The House have been disputing half this session whether we should have paper money, any lawyers or any court of common pleas. They voted finally, against paper money, sent up to the Senate a curious bill with regards to lawyers and the inferior court. A committee of five from the Senate have it to consider till next term. Mr. Cranch is one of them. Thus do they spend their time in curtailing tea tables, while they are suffering thousand to be wrested from them for want of giving ampler powers to Congress. It is dreadful to those who see the necessity of different measures to stand by and see such pursued as they fear will ruin their country. Ask no excuse my dear sister for writing politics.
      It would be such a want of public spirit not to feel interested in the welfare of our country as the wives of ministers and Senators ought to be ashamed off. Let no one say that the ladies are of no importance in the affairs of the nation. Persuade them to renounce all their luxuries and it would be found that they are, and believe me there is not a more effectual way to do it, than to make them acquainted with the causes of the distresses of their country. We do not want spirit. We only want to have it properly directed.

      “Mary Smith Cranch to Abigail Adams, 10 July 1786,” Founders Online, National Archives. Available through the National Archives

      Her frustration with the Massachusetts state legislature, May 22, 1786

      “Not one word of politics have I written nor shall I have time to do it now. If I had I would tell you what wonderful things the House are doing with the lawyers, the court of common pleas, &c, but the newspapers will do it for me. I am thankful there is a Senate as well as a House. What has Congress done? Anything to detain you in Europe. I love my country too well to wish you to return yet, much as I wisht to see you. I did design to write to my dear niece by this vessel but fear I shall not have time. My sincere love and good wishes attend her and hers. Tis very late good night my ever dear Sister and believe me, yours affectionately.”

      “Mary Smith Cranch to Abigail Adams, 22 May 1786,” Founders Online, National Archives. Available through the National Archives

      Commenting on Shays’ Rebellion, November 26, 1786

      There is like to be a great disturbance in Cambridge at the sitting of the Court of Common Pleas this week. There is an express come to the governor to inform him that Shays, one of the heads of the incendiaries, (it is a many headed beast) is determined to come with eighteen hundred men to stop the court. There will be force sent to oppose them I suppose, and I wish there may not be blood shed.
      Are we not hastening fast to monarchy, to Anarchy? I am sure we are unless the people discover a better spirit soon. We are concerned for our children I assure you. The college company are wishing to be allowed to march out in defence of government but they will not be permitted. Mr Cranch will go tomorrow and take care of them, of our children I mean…

      “Mary Smith Cranch to Abigail Adams, 26 November 1786,” Founders Online, National Archives. Available through the National Archives

      Further thoughts on Shays’ Rebellion, February 9, 1787

      “If you have received our Letters by Captain Callahan, you will be in some measure prepared for the accounts which Captain Folger will bring you of the rebellion which exists in this state. It had arisen to such a height that it was necessary to oppose it by force of arms. We are always in this country to do things in an extraordinary manner. The militia were called for, but there was not a copper in the treasury to pay them or to support them upon their march. Town meetings were called in many places and promises were made them that if the would enlist, they would pay them and wait till the money could be collected from the public for their pay. And for their present support people contributed as they were able and in this manner in less than a week was collected an army of five thousand men who marched under the command of General Lincoln to Worcester to protect the court. The result you will see in the papers. The season has been stormy and severe our army have suffered greatly in some of their marches, especially last Saturday night. Many of them were badly froze, they marched thirty miles without stopping to refresh themselves in order to take Shays and his army by surprise. They took about 150 of them. Shays and a number with him scampered off and have gotten to New Hampshire. Shays and his party are a poor deluded people.
      They have given much trouble and put us and themselves to much expense and have greatly added to the difficulties they complain off. I think you must have been very uneasy about us. Shays has not a small party in Braintree but not many in this parish. They want paper money to cheat with. They called a town meeting about a week since to forbid collection. Thayers attending the general court but they could not get a vote.”

      “Mary Smith Cranch to Abigail Adams, 9 February 1787,” Founders Online, National Archives. Available through the National Archives

      Mary Smith Cranch’s letters show that women were deeply engaged in political debates even though they could not vote or hold office. Her discussions of currency policy, legal reforms, and Shays’s Rebellion reveal that she closely followed national issues and believed women had a responsibility to support the country through informed opinions and economic sacrifice.

    1. Author response:

      eLife Assessment

      This useful study examines whether the sugar trehalose coordinates energy supply with the gene programs that build muscle in the cotton bollworm (Helicoverpa armigera). The evidence for this is currently incomplete. The central claim, that trehalose specifically regulates an E2F/Dp-driven myogenic program, is not supported by the specificity of the data: perturbations and sequencing are systemic, alternative explanations such as general energy or amino-acid scarcity remain plausible, and mechanistic anchors are also limited. The work will interest researchers in insect metabolism and development; focused, tissue-resolved measurements together with stronger mechanistic controls would substantially strengthen the conclusions.

      We thank the reviewer for the thoughtful and constructive evaluation of our work and for recognizing its potential relevance to researchers working on insect metabolism and development. We fully agree that our current evidence is preliminary and that the mechanistic link between trehalose and the E2F/Dp‑driven myogenic program needs to be strengthened.

      Our intention was to present trehalose-E2F/Dp coupling as a working model emerging from our data, rather than as a fully established pathway. We agree that systemic manipulations of trehalose and whole‑larval RNA‑seq cannot fully differentiate global metabolic stress from specific effects on myogenic programs. In the revision, we plan to include additional metabolic readouts (e.g., ATP/AMP ratio, key amino acids where available) to better discuss the overall energetic and nutritional state. We will reanalyze our RNA‑seq data to more clearly distinguish broad stress/metabolic signatures from cell‑cycle/myogenic signatures. Furthermore, we will reframe our discussion to explicitly state that we cannot completely rule out a contribution of general energy or amino‑acid scarcity at this stage.

      We acknowledge that, with our current experiments, the specificity for an E2F/Dp‑driven program is inferred mainly from the enrichment of E2F targets among differentially expressed genes and from expression changes in canonical E2F partners and downstream cell‑cycle/myogenic regulators. To address this more rigorously, we are performing targeted qRT-PCR for a panel of well‑characterized E2F/Dp target genes and myogenic markers in larval muscle versus non‑muscle tissues following trehalose perturbation. Where technically feasible, we will also test whether partial knockdown of HaE2F or HaDp modifies the effect of trehalose manipulation on selected myogenic markers. These data, even if limited, will help provide a more direct functional link, and we will include them in the manuscript if they are completed in time. In parallel, we will soften statements that imply a fully established, trehalose‑specific regulation of E2F/Dp and instead present this as a strong candidate pathway suggested by the current data.

      We fully agree that tissue‑resolved analyses are essential to move from systemic correlations to causality in muscle. We are in the process of standardizing larval muscle dissections and isolating thoracic/abdominal body-wall muscle for trehalose, glycogen, and gene-expression assays. We will also compare the expression of key metabolic and myogenic genes in muscle versus fat body and midgut under trehalose manipulation. These tissue‑resolved data will directly address whether the transcriptional changes we report are preferentially localized to muscle.

      We are grateful for the reviewer’s critical but encouraging comments. We will moderate our central claims and explicitly consider and discuss alternative explanations. Further, we will add tissue‑resolved and more focused mechanistic data as far as possible within the current revision. We believe these changes will substantially strengthen the manuscript and better align our conclusions with the evidence we presently have.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Mohite et al. used transcriptomic and metabolic profiling of H. armigera and S. frugiperda to link trehalose energy metabolism and muscle development. They further used several bioinformatics tools for network analysis to converge upon transcriptional control as a potential mechanism by which metabolites regulate the transcriptional programs for muscle development. The authors have also performed rescue experiments in which trehalose was provided externally by feeding, which rescues the phenotype. Though the study is exciting, several concerns and gaps leave the current results purely speculative. It is difficult to perform genetic experiments in non-model insects; since the authors suggest that a similar mechanism could also apply in systems like Drosophila, it might be possible to perform experiments there to fill in some of the missing mechanistic details.

      A few specific comments below:

      The authors used N-(phenylthio) phthalimide (NPP), a trehalose-6-phosphate phosphatase (TPP) inhibitor. They also find that several genes change, including enzymes of trehalose metabolism. Further, several myogenic genes are downregulated in bulk RNA sequencing. The major caveat of this experiment is that NPP treatment leads to reduced muscle development, so the proportion of muscle-derived material in the bulk RNA-seq samples will be relatively lower, which might explain the results. A confirmatory experiment should therefore be performed in which the muscle tissues are dissected and sequenced, or some of the interesting targets could be validated by qRT-PCR. Further, to overcome possible off-target effects of NPP, trehalose rescue experiments could be useful.

      Thank you for this valuable comment. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. This will help determine whether the gene expression patterns observed in the RNA-seq data are muscle-specific or systemic.

      Even the reduction in the levels of ADP, NAD, NADH, and NMN, all of which are essential for efficient energy production and utilization, could be due to the loss of muscles, which perform predominantly metabolic functions due to their mitochondria-rich environment. It is therefore difficult to judge whether the reduction in these energy molecules is a cause or an effect.

      We thank the reviewer for this thoughtful comment and agree that reduced levels of ADP, NAD, NADH, and NMN could arise either from a disturbance of energy metabolism or from loss of mitochondria‑rich muscles. Our current data cannot fully separate these two possibilities. Still, several studies support the interpretation that perturbing trehalose metabolism causes a primary systemic energy deficit that is coupled to mitochondrial function, not merely a passive consequence of tissue loss.

      For example:

      (1) Our previous study in H. armigera showed that chemical inhibition of trehalose synthesis results in depletion of trehalose, glucose, glucose‑6‑phosphate, and suppression of the TCA cycle, indicating reduced energy levels and dysregulated fatty‑acid oxidation (Tellis et al., 2023).

      (2) Chang et al. (2022) showed that trehalose catabolism and mitochondrial ATP production are mechanistically linked. HaTreh1 localizes to mitochondria and physically interacts with ATP synthase subunit α. 20‑hydroxyecdysone increases HaTreh1 expression, enhances its binding to ATP synthase, and elevates ATP content, while knockdown of HaTreh1 or HaATPs‑α reduces ATP levels.

      (3) Similarly, our previous study showed that inhibition of Treh activity in H. armigera generates an “energy‑deficient condition” characterized by deregulation of carbohydrate, protein, fatty‑acid, and mitochondria‑related pathways, and a concomitant reduction in key energy metabolites (Tellis et al., 2024).

      (4) A starvation study in H. armigera showed that reduced hemolymph trehalose is associated with respiratory depression and large‑scale reprogramming of glycolysis and fatty‑acid metabolism (Jiang et al., 2019).

      These findings support a direct coupling between trehalose availability and systemic energy/redox state. Therefore, the coordinated decrease in ADP, NAD, NADH, and NMN following TPS/TPP silencing is consistent with a primary disturbance of systemic energy and mitochondrial metabolism rather than exclusively a secondary consequence of muscle loss. We agree, however, that the present whole‑larva metabolite measurements do not allow a quantitative partitioning between changes due to altered muscle mass and those due to intrinsic metabolic impairment at the cellular level. Thus, tissue-specific quantification of these metabolites would allow us to directly test whether altered energy metabolites are a cause or consequence of muscle loss.

      References:

      (1) Tellis, M. B., Mohite, S. D., Nair, V. S., Chaudhari, B. Y., Ahmed, S., Kotkar, H. M., & Joshi, R. S. (2024). Inhibition of Trehalose Synthesis in Lepidoptera Reduces Larval Fitness. Advanced Biology, 8(2), 2300404.

      (2) Chang, Y., Zhang, B., Du, M., Geng, Z., Wei, J., Guan, R., An, S., & Zhao, W. (2022). The vital hormone 20-hydroxyecdysone controls ATP production by upregulating the binding of trehalase 1 with ATP synthase subunit α in Helicoverpa armigera. Journal of Biological Chemistry, 298(2).

      (3) Tellis, M., Mohite, S., & Joshi, R. (2024). Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), 74.

      (4) Jiang, T., Ma, L., Liu, X. Y., Xiao, H. J., & Zhang, W. N. (2019). Effects of starvation on respiratory metabolism and energy metabolism in the cotton bollworm Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae). Journal of Insect Physiology, 119, 103951.

      The authors have used this transcriptomic data for pathway enrichment analysis, which led to the E2F family of transcription factors and a reduction in the level of when trehalose metabolism is perturbed. EMSA experiments, though, confirm a possibility of the E2F interaction with the HaTPS/TPP promoter, but it lacks proper controls and competition to test the actual specificity of this interaction. Several transcription factors have DNA-binding domains and could bind any given DNA weakly, and the specificity is ideally known only from competitive and non-competitive inhibition studies.

      We thank the reviewer for this important comment and fully agree that EMSA alone, without appropriate competition and control reactions, cannot establish the specificity or functional relevance of a transcription factor-DNA interaction. In our study, we identified the E2F family through GRN analysis of the RNA-seq data obtained upon HaTPS/TPP silencing, suggesting a potential regulatory connection. We then predicted E2F binding sites in the HaTPS/TPP promoter. The EMSA experiments were intended as preliminary evidence that E2F can associate with the HaTPS/TPP promoter in vitro. We will clarify this in the manuscript by softening our conclusion to indicate that our data support a “possible E2F-HaTPS/TPP interaction”. We will also perform EMSA with specific and non‑specific competitors to confirm E2F binding to the HaTPS/TPP promoter.

      The work connects trehalose metabolism with gene expression changes; although this is an interesting idea, none of the experiments in the current version of the manuscript is conclusive. The authors could search the E2F family of transcription factors for domains that can bind the metabolite; if none are found, ChIP-seq is essential to conclusively establish the role of E2F in regulating gene expression tuned by the metabolites.

      A previous study in D. melanogaster, Zappia et al. (2016), showed a vital role of E2F in skeletal muscle that is required for animal viability. They showed that Dp knockdown resulted in reduced expression of genes encoding structural and contractile proteins, such as Myosin heavy chain (Mhc), fln, Tropomyosin 1 (Tm1), Tropomyosin 2 (Tm2), Myosin light chain 2 (Mlc2), sarcomere length short (sals), and Act88F, and of myogenic regulators, such as held out wings (how), Limpet (Lmpt), Myocyte enhancer factor 2 (Mef2), and spalt major (salm). In addition, ChIP-qRT-PCR showed that upstream regions of myogenic genes, such as how, fln, Lmpt, sals, Tm1, and Mef2, were specifically enriched with E2f1, E2f2, and Dp antibodies compared with a nonspecific antibody. Further, Zappia et al. (2019) reported a ChIP-seq dataset suggesting that E2F/Dp directly activates the expression of glycolytic and mitochondrial genes during muscle development. Zappia et al. (2023) showed that E2F regulates one of the glycolytic genes, Phosphoglycerate kinase (Pgk), during Drosophila development.

      However, the regulation of trehalose metabolic genes by E2F/Dp, and vice versa, has not been studied previously. In this study, we therefore examined the relationship between trehalose metabolism and E2F/Dp in the muscle development of H. armigera.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Rogers, A., Islam, A.B. and Frolov, M.V., 2019. Rbf activates the myogenic transcriptional program to promote skeletal muscle differentiation. Cell Reports, 26(3), pp.702-719.

      (3) Zappia, M.P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H.M., Islam, A.B., Kim, J. and Frolov, M.V., 2023a. E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), p.e2220770120.

      Some of the above concerns are partially addressed by experiments in which silencing of E2F/Dp produces phenotypes similar to those caused by NPP and dsRNA. It is also notable that silencing any key transcription factor can have several indirect effects, so delayed pupation and lethality cannot be definitively linked to trehalose-dependent regulation.

      We agree that silencing any key transcription factor can have several indirect effects. Our intention was not to argue that delayed pupation and lethality are exclusively due to trehalose-dependent regulation, but rather that E2F/Dp and HaTPS/TPP silencing produced a consistent set of phenotypes and molecular changes: (i) transcriptomic enrichment of E2F targets upon trehalose perturbation, (ii) reduced HaTPS/TPP expression following E2F/Dp silencing, (iii) reduced myogenic gene expression that parallels the phenotypes observed with HaTPS/TPP silencing, and (iv) restoration of E2F and Dp expression in E2F/Dp‑silenced insects upon trehalose feeding in the rescue assay. Together, these findings support a functional association between E2F/Dp and trehalose homeostasis. At the same time, we fully acknowledge that these results do not exclude additional, trehalose‑independent roles of E2F/Dp in development.

      Trehalose rescue experiments that rescue phenotype and gene expression are interesting. But is it possible that the fed trehalose is metabolized in the gut and might not reach the target tissue? In which case, the role of trehalose in directly regulating transcription factors becomes questionable. So, a confirmatory experiment is needed to demonstrate that the fed trehalose reaches the target tissues. This could possibly be done by measuring the trehalose levels in muscles post-rescue feeding. Also, rescue experiments need to be done with appropriate control sugars.

      We agree that some trehalose may be metabolized in the gut. Although trehalase is present in the insect gut, some trehalose is likely absorbed via trehalose transporters in the gut lining. Insects fed the control diet (empty vector and dsHaTPP), which contains chickpea powder with ample amino acids and carbohydrates, were not rescued: rescue occurred only in insects fed the trehalose-containing diet, not in those fed the control diet containing other carbohydrates. We agree that direct measurement of trehalose in target tissues will provide important confirmation. For the revised manuscript, we will measure trehalose levels in muscle, gut, and haemolymph after trehalose feeding.

      No experiments were performed with a non-target control dsRNA; all experiments used an empty vector as the control. An appropriate control, however, would be a non-target dsRNA.

      We confirm that no experiment with non-target dsRNA was performed. We previously optimized a protocol for dsRNA delivery and target knockdown (concentration and time) and have published several research articles using a similar protocol:

      (1) Chaudhari, B.Y., Nichit, V.J., Barvkar, V.T. and Joshi, R.S., 2025. Mechanistic insights in the role of trehalose transporter in metabolic homeostasis in response to dietary trehalose. G3: Genes, Genomes, Genetics, p.jkaf303.

      (2) Barbole, R.S., Sharma, S., Patil, Y., Giri, A.P. and Joshi, R.S., 2024. Chitinase inhibition induces transcriptional dysregulation altering ecdysteroid-mediated control of Spodoptera frugiperda development. iScience, 27(3).

      (3) Patil, Y.P., Wagh, D.S., Barvkar, V.T., Gawari, S.K., Pisalwar, P.D., Ahmed, S. and Joshi, R.S., 2025. Altered Octopamine synthesis impairs tyrosine metabolism affecting Helicoverpa armigera vitality. Pesticide Biochemistry and Physiology, 208, p.106323.

      (4) Tellis, M.B., Chaudhari, B.Y., Deshpande, S.V., Nikam, S.V., Barvkar, V.T., Kotkar, H.M. and Joshi, R.S., 2023. Trehalose transporter-like gene diversity and dynamics enhances stress response and recovery in Helicoverpa armigera. Gene, 862, p.147259.

      (5) Joshi, K.S., Barvkar, V.T., Hadapad, A.B., Hire, R.S. and Joshi, R.S., 2025. LDH-dsRNA nanocarrier-mediated spray-induced silencing of juvenile hormone degradation pathway genes for targeted control of Helicoverpa armigera. International Journal of Biological Macromolecules, p.148673.

      The same vector backbone and preparation procedures were used for both control and experimental constructs, allowing us to compare the effects of the target dsRNA specifically. The phenotypes and gene expression changes we observed were specific to the target genes and were not seen in the empty vector controls, suggesting that the effects are not due to nonspecific responses to dsRNA delivery or vector components.

      We acknowledge your suggestion, and in future studies we will include non-target dsRNA as a control in silencing assays.

      Reviewer #2 (Public review):

      Summary:

      This study shows that the effects of TPS/TPP knockdown in Helicoverpa armigera and Spodoptera frugiperda can be rescued by trehalose treatment. This suggests that trehalose metabolism is necessary for development in the tissues that NPP and dsRNA can reach.

      Strengths:

      This study examines an important metabolic process beyond model organisms, providing a new perspective on our understanding of species-specific metabolic equilibria, whether conserved or divergent.

      Weaknesses:

      While the effects observed may be truly conserved across Lepidoptera and may be muscle-specific, the study largely relies on one species and on perturbation methods that are not muscle-specific. The technical limitations of investigations outside model systems, for which robust methods are available, limit the specificity of the inferences that can be drawn from the data.

      Thank you for pointing out this experimental weakness. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. We will also perform metabolite analysis on muscle samples. This will help determine whether the observed gene expression patterns and metabolite changes are muscle-specific or systemic.

      Reviewer #3 (Public review):

      The hypothesis is that trehalose metabolism regulates transcriptional control of muscle development in lepidopteran insects.

      The manuscript investigates the role of trehalose metabolism in muscle development. Through sequencing and subsequent bioinformatics analysis of insects with perturbed trehalose metabolism (knockdown of TPS/TPP), the authors identified the transcription factor E2F, which was validated by RT-PCR. Their hypothesis is that trehalose metabolism regulates E2F, which then controls the myogenic genes. Counter to this hypothesis, the investigators perform EMSAs with the E2F protein and the promoter of the TPP gene and show binding. Their knockdown experiments with Dp, the binding partner of E2F, show a direct effect on several trehalose metabolism genes. Similar results are demonstrated in the trehalose feeding experiment, where feeding trehalose leads to partial rescue of the phenotype caused by Dp knockdown. This seems contradictory to their hypothesis. Even more intriguing is a similar observation between paramyosin, a structural muscle protein, and E2F/Dp: they show that paramyosin regulates E2F/Dp and that E2F/Dp regulates paramyosin. The only plausible way to explain these results is the existence of a feed-forward loop between TPP-E2F/Dp and paramyosin-E2F/Dp, but the authors mention nothing along these lines. Additionally, I think trehalose metabolism affects amino acid content in insects, which would have a direct bearing on muscle development. The sequencing analysis and follow-up GSEA studies demonstrated enrichment of several amino acid biosynthetic genes, yet the authors make no effort to measure amino acid levels or correlate them with muscle development. Any study aiming to link trehalose metabolism and muscle development without considering the above points will be incomplete.

      We appreciate the reviewer’s careful evaluation of this manuscript and constructive comments. Our data, together with earlier work, are difficult to reconcile with a linear “trehalose → E2F → muscle” pathway; rather, they point to a regulatory module in which trehalose metabolism and E2F/Dp form an interdependent circuit controlling myogenic genes. E2F/Dp binds and activates trehalose metabolism genes (TPS/TPP, Treh1) and myogenic structural genes, consistent with the EMSA (TPS/TPP-E2F) and with predicted E2F binding sites in metabolic genes (Treh1, Pgk) and myogenic genes (Act88F, Prm, Tm1, Fln, etc.). At the same time, perturbing trehalose synthesis reduces E2F/Dp expression and myogenic gene expression, and trehalose feeding partially restores their expression. This bidirectional influence resembles the E2F‑dependent control of carbohydrate metabolism and systemic sugar homeostasis described in D. melanogaster, where E2F/Dp both regulates metabolic genes and is itself constrained by metabolic state (Zappia et al., 2023a; Zappia et al., 2021).

      The reciprocal regulation between Prm and E2F/Dp is indeed intriguing. Rather than a paradox, we interpret this as evidence that E2F/Dp couples metabolic genes and structural muscle genes within a shared module, and that key sarcomeric components (such as paramyosin) feed back on this transcriptional program. Similar cross‑talk between E2F‑controlled metabolic programs and tissue function has been documented in D. melanogaster muscle and fat body, where E2F loss in one tissue elicits systemic changes in the other (Zappia et al., 2021). To further confirm E2F regulation of Prm, we will perform EMSA on the Prm promoter with appropriate controls.

      We fully agree that amino‑acid metabolism is a critical missing piece. In the revised manuscript, we will include the quantified amino acid levels: “Amino acid levels differed upon trehalose metabolism perturbation: cysteine, leucine, histidine, valine, and proline showed significant reductions, while isoleucine and lysine showed non-significant reductions. These results are consistent with previous reports by Tellis et al. (2024) and Shi et al. (2016).” We will also reframe our conclusions more cautiously, as establishing an association between trehalose metabolism, E2F/Dp, and muscle development, while stating that “definitive causal links via amino‑acid metabolism remain to be demonstrated”.

      References:

      (1) Zappia, M.P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H.M., Islam, A.B., Kim, J. and Frolov, M.V., 2023a. E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), p.e2220770120.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. eLife, 10, p.e67753.

      (3) Tellis, M., Mohite, S. and Joshi, R., 2024. Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), p.74.

      (4) Shi, J.F., Xu, Q.Y., Sun, Q.K., Meng, Q.W., Mu, L.L., Guo, W.C. and Li, G.Q., 2016. Physiological roles of trehalose in Leptinotarsa larvae revealed by RNA interference of trehalose-6-phosphate synthase and trehalase genes. Insect Biochemistry and Molecular Biology, 77, pp.52-68.

      Author response image 1.

      In my view, the Results section of the manuscript is overly concise (especially the first few sections) and omits details that would help readers understand the paper better. While technical details of the methods belong in the Materials and Methods section, the overall strategy for each experiment should be explained in adequate detail in the Results section itself or in the figure legends. I request that the authors include more detail in the Results section. Relatedly, abbreviations are often used without being introduced; a thorough check of the manuscript is required in this regard.

      Thank you very much for pointing out this issue. We will revise the manuscript content according to these suggestions.

      The Spodoptera experiments appear ad hoc and are insufficient to support conservation beyond Helicoverpa. To substantiate this claim, please add a coherent, minimal set of Spodoptera experiments and present them in a dedicated subsection. Alternatively, consider removing these data and limiting the conclusions (and title) to H. armigera.

      We thank the reviewer for this helpful comment. We agree that, in the current version of the manuscript, the S. frugiperda experiments are not sufficiently systematic to support strong claims about conservation beyond H. armigera. Our primary focus in this study is indeed on H. armigera, and the S. frugiperda data were intended only as preliminary, supportive evidence rather than a central component of our conclusions. To avoid over‑interpretation and to keep the manuscript focused and coherent, we will remove all S. frugiperda data from the revised version, including the corresponding text and figures. We will also adjust the title, abstract, and conclusion to clearly state that our findings are limited to H. armigera.

      In order to check the effects of E2F/Dp, a dsRNA-mediated knockdown of Dp was performed. Why was the E2F protein, a primary target of the study, not chosen as a candidate? The authors should either provide justification for this or perform the suggested experiments to come to a conclusion. I would like to point out that such experiments were performed in Drosophila.

      Thank you for this thoughtful comment and the specific suggestion. We agree that directly targeting E2F would, in principle, be an informative complementary approach. In our study, however, we prioritized Dp knockdown for two main reasons. First, E2F is a large family, and E2F-Dp functions as an obligate heterodimer. Previous work in D. melanogaster has shown that depletion of Dp is sufficient to disrupt E2F-dependent transcription broadly, often with more efficient loss of complex activity than targeting individual E2F isoforms (Zappia et al., 2021; Zappia et al., 2016). Second, in preliminary trials, we performed a dsRNA feeding assay with dsHaE2F, dsHaDp, and combined dsHaE2F plus dsHaDp. In that assay, dsRNA targeting HaE2F (dsHaE2F) did not silence HaE2F; because E2F is a large family, other E2F isoforms may compensate for silencing of the targeted HaE2F. However, HaE2F expression was significantly reduced upon dsHaDp and combined dsHaE2F plus dsHaDp feeding (Figure A), whereas HaDp expression was significantly reduced in all three conditions (Figure B). Because combined feeding of dsHaE2F and dsHaDp reduced the expression of both HaE2F and HaDp, we further performed a rescue assay with exogenous trehalose feeding and observed significant upregulation of HaE2F, HaDp, trehalose metabolic genes (HaTPS/TPP and HaTreh1), and myogenic genes (HaPrm and HaTm2) (Figure C). For these reasons, we focused on Dp silencing as a more reliable way to impair E2F/Dp complex function in H. armigera.

      Author response image 2.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. eLife, 10, p.e67753.

      Silencing of HaDp resulted in a significant decrease in HaE2F expression. I find this observation intriguing. Dp is the cofactor of E2F; the two heterodimerise and bind the promoters of target genes to regulate them. I request that the authors revisit this result, as it contradicts the general understanding of how E2F/Dp functions in other organisms. If Dp indeed controls E2F expression, further experiments are needed to establish this convincingly. Additionally, these results would need thorough discussion, with citations of similar results observed for other transcription factor-cofactor complexes.

      Thank you for highlighting this point and for prompting us to examine these data more carefully. Reduced HaE2F mRNA upon HaDp silencing is indeed unexpected under the canonical view of E2F/Dp as a heterodimer that co-occupies target promoters without strongly regulating each other’s expression. However, several lines of work suggest that transcription factor-cofactor networks frequently include feedback loops in which cofactors influence the expression of their partner TFs. First, in multiple systems, transcription factors and their cofactors regulate each other’s transcription, forming positive or negative feedback loops. For example, in regulatory T cells, the transcription factor Foxp3 controls the expression of many of its own cofactors, and some of these cofactors in turn facilitate or stabilize Foxp3 expression, forming an interconnected regulatory network rather than a simple one‑way interaction (Rudra et al., 2012). Second, E2F/Dp complexes exhibit non‑canonical regulatory mechanisms and can regulate broad sets of targets, including other transcriptional regulators. Several studies show that E2F/Dp proteins not only control classical cell‑cycle genes but also participate in diverse processes such as DNA damage signaling, mitochondrial function, and differentiation (Guarner et al., 2017; Ambrus et al., 2013; Sánchez-Camargo et al., 2021). In D. melanogaster, complete loss of dDP alters the expression of direct E2F/DP targets, including dATM (Guarner et al., 2017).

      Together, these reports suggest that the E2F-Dp complex sits at the top of multi‑layer regulatory hierarchies. Such architectures make it plausible that Dp silencing in H. armigera could modulate HaE2F expression in a non-canonical way.

      References:

      (1) Rudra, D., DeRoos, P., Chaudhry, A., Niec, R.E., Arvey, A., Samstein, R.M., Leslie, C., Shaffer, S.A., Goodlett, D.R. and Rudensky, A.Y., 2012. Transcription factor Foxp3 and its protein partners form a complex regulatory network. Nature Immunology, 13(10), pp.1010-1019.

      (2) Guarner, A., Morris, R., Korenjak, M., Boukhali, M., Zappia, M.P., Van Rechem, C., Whetstine, J.R., Ramaswamy, S., Zou, L., Frolov, M.V. and Haas, W., 2017. E2F/DP prevents cell-cycle progression in endocycling fat body cells by suppressing dATM expression. Developmental Cell, 43(6), pp.689-703.

      (3) Ambrus, A.M., Islam, A.B., Holmes, K.B., Moon, N.S., Lopez-Bigas, N., Benevolenskaya, E.V. and Frolov, M.V., 2013. Loss of dE2F compromises mitochondrial function. Developmental Cell, 27(4), pp.438-451.

      (4) Sánchez-Camargo, V.A., Romero-Rodríguez, S. and Vázquez-Ramos, J.M., 2021. Non-canonical functions of the E2F/DP pathway with emphasis in plants. Phyton, 90(2), p.307.

      I consider the overall bioinformatics analysis to remain very poorly described. What is specifically lacking are clear statements about why particular dry-lab analyses were conducted.

      We again thank the reviewer for advising us to give a biological context and motivation for every bioinformatics analysis performed. The bioinformatics analyses were designed to characterize the systems-level perturbations caused by HaTPS/TPP silencing, to explain the observed phenotype, and to identify transcription factors that may mediate the gene regulatory changes induced by HaTPS/TPP silencing.

      (1) Gene set enrichment analyses:

      Differential gene expression analysis of the bulk RNA sequencing data, followed by qRT-PCR, confirmed the transcriptional changes in myogenic genes and the expression changes in metabolic and cell cycle-related genes. These results, however, only confirmed the expected effects of HaTPS/TPP silencing on obvious candidate genes. We therefore asked whether an “unbiased” systems-level statistical analysis, gene set enrichment analysis (GSEA), could reveal both expected and novel biological processes underlying HaTPS/TPP silencing. GSEA revealed large-scale transcriptional changes in 11 enriched processes, including amino acid metabolism, energy metabolism, developmental regulatory processes, and motor protein activity. Beyond identifying the transcriptionally enriched pathways, GSEA also identified the genes undergoing coordinated pathway-level transcriptional changes upon HaTPS/TPP silencing.
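      To make the GSEA step more concrete, its core statistic can be sketched in a few lines. This is a minimal illustrative sketch of the standard weighted running-sum enrichment score and its leading-edge subset, not the software or settings used in this study. It assumes that genes have already been ranked by a differential-expression metric (dsHaTPS/TPP versus control); the function name `enrichment_score`, the weight exponent `p`, and the toy gene names are hypothetical. Permutation-based significance testing and multiple-testing correction are omitted.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(
    ranked: Sequence[Tuple[str, float]],
    gene_set: Set[str],
    p: float = 1.0,
) -> Tuple[float, List[str]]:
    """Weighted running-sum enrichment score (hypothetical GSEA-style sketch).

    `ranked` holds (gene, metric) pairs sorted from most up- to most
    down-regulated. Returns the enrichment score and the leading-edge genes.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_total, n_hits = len(ranked), sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must partially overlap the ranked list")

    # Hits are weighted by |metric|**p; every miss carries the same penalty.
    hit_norm = sum(abs(m) ** p for (_, m), h in zip(ranked, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking metric")
    miss_step = 1.0 / (n_total - n_hits)

    running, es, peak = 0.0, 0.0, 0
    for i, ((_, metric), is_hit) in enumerate(zip(ranked, hits)):
        running += abs(metric) ** p / hit_norm if is_hit else -miss_step
        if abs(running) > abs(es):  # keep the maximum deviation from zero
            es, peak = running, i

    # Leading edge: set members ranked up to the peak (positive score)
    # or from the peak onwards (negative score).
    span = range(peak + 1) if es >= 0 else range(peak, n_total)
    leading = [ranked[i][0] for i in span if hits[i]]
    return es, leading


# Toy usage with hypothetical genes and metrics.
toy_ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0)]
score, leading_edge = enrichment_score(toy_ranked, {"a", "b"})
print(score, leading_edge)
```

      The leading-edge list corresponds to the subset of genes that drives a gene set's enrichment, which is how GSEA can single out the specific genes that change together at the pathway level.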

      (2) Gene regulatory network analysis:

      Although GSEA uncovered potential pathway-level changes, we were also interested in identifying the gene regulatory network associated with these large-scale, process-level transcriptional perturbations. Notably, the perturbed biological processes were heterogeneous (e.g., motor protein activity, energy metabolism, and amino acid metabolism). We hypothesized that inferring a causal gene regulatory network from the genes in the GSEA-enriched biological processes would predict core/master transcription factors that might coordinately regulate the metabolic and non-metabolic processes affected by HaTPS/TPP silencing, thereby providing a broad understanding of the perturbed phenotype. The gene regulatory network analysis statistically inferred an “active” gene regulatory network corresponding to the GSEA-enriched KEGG gene sets. When transcription factors (TFs) were ranked by their number of outgoing connections (outdegree centrality) within this active network, E2F family TFs emerged as the top-ranking, most highly connected TFs associated with the transcriptionally enriched processes. This suggests that E2F family TFs are central to controlling the flow of regulatory information within this network. Intriguingly, E2F has previously been implicated in muscle development in insects (Zappia et al., 2016). Extracting the targets of E2F family TFs within this network further revealed their mechanistic connection with the 11 enriched processes. This GRN analysis was crucial for discovering and prioritizing E2F TFs as central transcription factors mediating the effects of HaTPS/TPP silencing, which was not apparent from simpler analyses such as differential gene expression analysis.
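      The TF-prioritization step can be illustrated with a short sketch that ranks regulators by outdegree centrality and then extracts the targets of a chosen regulator group. It assumes the inferred network is available as a list of directed (regulator, target) edges; the network-inference step itself is not shown, and the function names and the toy edge list (including its gene labels) are hypothetical, not the network inferred in this study.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def rank_tfs_by_outdegree(edges: Iterable[Edge], tfs: Set[str]) -> List[Tuple[str, int]]:
    """Rank transcription factors by outdegree (number of distinct targets)."""
    unique_edges = set(edges)  # duplicated edges must not inflate the count
    outdegree = Counter(src for src, _ in unique_edges if src in tfs)
    # Highest outdegree first; ties broken alphabetically for reproducibility.
    return sorted(outdegree.items(), key=lambda item: (-item[1], item[0]))


def targets_of(edges: Iterable[Edge], regulators: Set[str]) -> List[str]:
    """Collect the distinct targets of a chosen group of regulators."""
    return sorted({tgt for src, tgt in edges if src in regulators})


# Hypothetical toy network, for illustration only.
toy_edges = [
    ("E2F1", "Prm"), ("E2F1", "Tm1"), ("E2F1", "TPS/TPP"),
    ("Mef2", "Prm"), ("Mef2", "Act88F"), ("E2F1", "Prm"),
]
ranking = rank_tfs_by_outdegree(toy_edges, {"E2F1", "Mef2"})
print(ranking)  # [('E2F1', 3), ('Mef2', 2)]
print(targets_of(toy_edges, {"E2F1"}))  # ['Prm', 'TPS/TPP', 'Tm1']
```

      Counting distinct edges keeps a repeated interaction from inflating a regulator's rank; a graph library could be used instead, but a plain counter keeps the ranking criterion explicit.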

      As per the reviewer’s suggestions, we will add these outlined points in the text of the manuscript (Results section) to further give context and clarity to the bioinformatics analyses conducted in this study.

      In my judgement, the EMSA analysis presented is technically poor. It lacks positive and negative controls and does not include mutation analysis or supershifts. It also lacks competition assays, which are important to prove binding beyond doubt. I am not sure why protein is not detected at all at lower concentrations. Overall, the EMSA assays need to be redone; I find the current results unacceptable.

      Thank you for pointing out this issue. We will repeat the EMSA with appropriate controls. Although the gel image was not clear, a faint protein band (indicated by the white square) was observed in well 8, which contained 8 μg of E2F protein and 75 ng of HaTPS/TPP promoter, after staining the gel with SYPRO Ruby protein stain, suggesting weak HaTPS/TPP-E2F complex formation.

      GSEA studies clearly indicate enrichment of amino acid synthesis genes in TPP knockdown samples. This supports the plausible theory that a lack of trehalose means a lack of sufficient nutrients, so less is converted to amino acids and muscle development is therefore compromised. Yet the authors make no effort to measure amino acid levels. While nutrients can be sensed through signalling pathways, leading to shutdown of myogenic genes, a simple and direct correlation between less raw material and deformed muscle is also possible.

      We quantified amino acid levels as per the suggestion, and we observed differential levels of amino acids upon trehalose metabolism perturbation.

      However, insects were not rescued when fed a control chickpea-based artificial diet containing the nutrients required for normal growth and development. Based on this observation, we conclude that trehalose deficiency, rather than a general shortage of nutrients, is the most likely cause of the defect in muscle development.

      The authors are encouraged to stick to one color palette while demonstrating sequencing results. Choosing a different color palette for representing results from the same sequencing analysis confuses readers.

      Thank you for the comment. We will revise the color palette as per the suggestion.

      Gene expression in the sequencing analysis shown in Figure 1D, Figure 2F, and Figure 3D appears to be binary. This result is extremely surprising given that qRT-PCR of these genes revealed varied, graded expression.

      Thank you for pointing out this issue. We will revise the scale range of these figures to better resolve differences in gene expression levels and will update the figures as suggested.

      In several graphs, non-significant results have been interpreted as significant in the results section. In a few other cases, the reported changes are minimal, and the statistical support is unclear; please recheck the analyses and include exact statistics. In the results section, fold changes observed should be discussed, as well as the statistical significance of the observed change.

      We will revise the analyses and include exact statistics as per the suggestion.

      Finally, I would add that claims that trehalose metabolism regulates cell cycle genes and muscle development genes must establish both correlation and causation. The authors should ensure that every claim they make is backed by evidence.

      We thank the reviewer for this insightful comment. Although direct evidence in insects is currently lacking, multiple independent studies in yeast, plants, and mammalian systems support a regulatory link between trehalose metabolism and the cell cycle. In the budding yeast Saccharomyces cerevisiae, neutral trehalase (Nth1) is directly phosphorylated and activated by the major cyclin‑dependent kinase Cdk1 at G1/S, routing stored trehalose into glycolysis to fuel DNA replication and mitosis (Ewald et al., 2016). CDK‑dependent regulation of trehalase activity has also been reported in plants, where CDC28‑mediated phosphorylation channels glucose into biosynthetic pathways necessary for cell proliferation (Lara-Núñez et al., 2025). Furthermore, budding yeast cells accumulate trehalose and glycogen upon entry into quiescence and subsequently mobilize these stores to generate a metabolic “finishing kick” that supports re‑entry into the cell cycle (Silljé et al., 1999; Shi et al., 2010). Exogenous trehalose that perturbs the trehalose cycle impairs glycolysis, reduces ATP, and delays cell cycle progression in S. cerevisiae, highlighting dose‑ and context‑dependent control of growth versus arrest (Zhang, Zhang and Li, 2020). In mammalian systems, trehalose similarly modulates proliferation-differentiation decisions. In rat airway smooth muscle cells, low trehalose concentrations promote autophagy, whereas higher doses induce S/G2–M arrest, downregulate Cyclin A1/B1, and trigger apoptosis, indicating a shift from controlled growth to cell elimination at higher exposure (Xiao et al., 2021). In human iPSC‑derived neural stem/progenitor cells, low‑dose trehalose enhances neuronal differentiation and VEGF secretion, while higher doses are cytotoxic, again highlighting a tunable impact on cell‑fate outcomes (Roose et al., 2025).

      In wheat, exogenous trehalose under heat stress reduces growth, lowers auxin, gibberellin, abscisic acid, and cytokinin levels, and represses CycD2 and CDC2 expression, suggesting that trehalose signalling integrates with hormone pathways and core cell‑cycle regulators to restrain proliferation during stress (Luo, Liu and Li, 2021). Together, these studies indicate that trehalose metabolism contributes to the cell‑cycle decisions that determine whether cells and tissues proliferate, differentiate, or remain quiescent.

      With respect to muscle development, previous work has implicated glycolytic metabolism in myogenesis and muscle growth. Tixier et al. (2013) showed that loss of key glycolytic genes results in abnormally thin muscles, while Bawa et al. (2020) demonstrated that loss of TRIM32 decreases glycolytic flux and reduces muscle tissue size. These findings indicate that carbohydrate and energy metabolism pathways are important determinants of muscle structure and growth. However, to our knowledge, no previous study has examined a role for trehalose metabolism in muscle development beyond its function as an energy source; here, we therefore set out specifically to establish the involvement of trehalose metabolism in muscle development.

      References:

      (1) Ewald, J.C. et al. (2016) “The yeast cyclin-dependent kinase routes carbon fluxes to fuel cell cycle progression,” Molecular cell, 62(4), pp. 532–545.

      (2) Lara-Núñez, A. et al. (2025) “The Cyclin-Dependent Kinase activity modulates the central carbon metabolism in maize during germination,” (January), pp. 1–16.

      (3) Silljé, H.H.W. et al. (1999) “Function of trehalose and glycogen in cell cycle progression and cell viability in Saccharomyces cerevisiae,” Journal of Bacteriology, 181(2), pp. 396–400.

      (4) Shi, L. et al. (2010) “Trehalose Is a Key Determinant of the Quiescent Metabolic State That Fuels Cell Cycle Progression upon Return to Growth,” 21, pp. 1982–1990.

      (5) Zhang, X., Zhang, Y. and Li, H. (2020) “Regulation of trehalose, a typical stress protectant, on central metabolisms, cell growth and division of Saccharomyces cerevisiae CEN.PK113-7D,” Food Microbiology, 89, p. 103459.

      (6) Xiao, B. et al. (2021) “Trehalose inhibits proliferation while activates apoptosis and autophagy in rat airway smooth muscle cells,” Acta Histochemica, 123(8), p. 151810.

      (7) Roose, S.K. et al. (2025) “Trehalose enhances neuronal differentiation with VEGF secretion in human iPSC-derived neural stem / progenitor cells,” Regenerative Therapy, 30, pp. 268–277.

      (8) Luo, Y., Liu, X. and Li, W. (2021) “Exogenously-supplied trehalose inhibits the growth of wheat seedlings under high temperature by affecting plant hormone levels and cell cycle processes,” Plant Signaling & Behavior, 16(6).

      (9) Tixier, V. et al. (2013) “Glycolysis supports embryonic muscle growth by promoting myoblast fusion,” Proceedings of the National Academy of Sciences, 110(47), pp. 18982–18987.

      (10) Bawa, S. et al. (2020) “Drosophila TRIM32 cooperates with glycolytic enzymes to promote cell growth,” eLife, 9, p. e52358.

      Finally, we appreciate the meticulous review of this manuscript and the constructive comments. We will perform the recommended experiments and data analyses and revise the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      The reviewer was under the impression that we did not perform biological replicates of our ChIP-seq experiments. All ChIP-seq (and ATAC-seq) experiments were performed with biological replicates, and the Pearson’s correlations between replicates (all >0.9) were provided in Supplementary Table 1. We had indicated this in the text and methods but will try to make this even clearer.

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      Histone modifications are difficult to alter genetically because of the high copy number of histone genes, and inhibition of HATs/HDACs generally leads to alterations in other histone modifications. Establishing causality is therefore an inherent challenge for histone modifications, especially histone acetylation marks.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      We have modified the text in response to this point. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) (Figure 1G). CGI promoters, on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

      We agree that the paper does not provide mechanistic details or establish causality for H3K115ac. We have only highlighted a potential role for H3K115ac in nucleosome fragility, based on our in vivo data and previously published in vitro experiments (Manohar et al., 2009; Chatterjee et al., 2015). We do provide evidence that H3K115ac is enriched on subnucleosomal particles, via sucrose gradient sedimentation of MNase-digested chromatin (Figure 3C-D).

      Reviewer #2 (Public review):

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region. It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      The ChIP-qPCR in Figure S1B includes competition from native chromatin and shows high specificity for its target. We have provided antibody validation in three ways:

      - Western blot with dot-blot of synthetic peptides (Figure S1A).

      - Western blots with Whole cell extracts (Figure 4D).

      - ChIP-qPCR on native chromatin spiked with a cocktail of synthetic mono-nucleosomes, each carrying a single acetylation and a specific barcode (SNAP-ChIP K-AcylStat Panel).

      We could not include H3K115ac-marked nucleosomes as they are not available in the panel. Figure S1B shows that the H3K115ac antibody exhibits negligible binding to known K-acyl marks, comparable to an unmodified nucleosome. Because no barcoded H3K115ac-modified nucleosome is available, we used the KLF4 promoter in mESCs as a positive control. In agreement with the ChIP-seq signal shown in the genome browser profile (Figure 1E), the KLF4 promoter shows a significantly higher signal than the gene body.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase-sensitivity and fragment length, which are indirect methods and can have technical bias. Experiments that support that the H3K115ac modified nucleosomes are indeed more fragile are missing.

      We have performed ChIP-seq on MNase digested mESC chromatin fractionated on sucrose gradients and this shows that H3K115ac is enriched in fractions containing sub-nucleosomal and fragile nucleosomes but depleted in fractions containing stable nucleosomes (Figure 3D).

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      H3K64ac and H3K122ac datasets were generated by us in a previous publication (Pradeepa et al., 2016) using the same native MNase ChIP protocol as used here. The ChIP-seq datasets for H3K122ac and H3K27ac were processed identically to the H3K115ac datasets generated in this paper, using the same computational pipelines.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      We agree with the reviewer’s comment, but we have not claimed causality.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      Due to broad target specificity, redundancies and crosstalk among different classes of HATs and HDACs, it is not tractable to answer this question in the current manuscript.

      Reviewer #3 (Public reviews):

      Reviewer 3 is mistaken in thinking that our ChIP experiments were performed under cross-linked conditions. As clearly stated in the main text and methods, all our ChIP-seq for histone modifications is done on native MNase-digested chromatin, with no cross-linking. This includes the spike-in experiment shown in Fig S1B to test H3K115ac antibody specificity against the bar-coded SNAP-ChIP® K-AcylStat Panel from Epicypher. We could not include H3K115ac bar-coded nucleosomes in that experiment since they are not available in the panel.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I have two primary concerns that resound through the entire paper:

      (a) Overall, the manuscript is making strong claims based on entirely correlative datasets. No quantitative analyses are performed to demonstrate co-occupancy/localization. Please see more detailed descriptions below.

      Our responses to specific points are provided against each comment below.

      (b) Lack of paired-end replicates for H3K115ac ChIP-seq. While the reviewer token for the deposited data was not made accessible to me, looking at Supplementary Table 1, it appears there are two H3K115ac ChIP-seq datasets. One is paired-end and the other is single-read. So are peaks called with only one replicate of PE? Or are inaccurate peaks called with SR datasets? Either way, this is not a rigorous way to evaluate H3K115ac localization.

      We are sorry that this reviewer was not able to access the data; the token for the GEO accession was provided for reviewers at the journal’s request. All ChIP-seq (and ATAC-seq) experiments (paired-end and single-end) were performed with two biological replicates, and the Pearson’s correlations between replicates (all >0.9) were provided in Supplementary Table 1. This was indicated in both the main text and the methods. In the revised manuscript we have tried to make this even clearer and have put the relevant Pearson’s correlation coefficients (r) into the text at the appropriate places. For the reviewer’s information, here is the complete list of data samples in the GEO accession:

      Author response image 1.

      While I agree that H3K115ac occupancy is high at +CGIs, the authors downplay that H3K122ac and H3K27ac are also more highly enriched at these locations (page 7, last sentence of first paragraph). I imagine this is all due to the more highly transcribed nature of these genes. Sub-stratifying the K27ac and K122ac by transcription (as in Figure 1G) would help to demonstrate a unique nature of H3K115ac. But even better would be to do an analysis that plots H3K115ac enrichment vs transcription for every individual gene rather than aggregate analyses that are biased by single locations. For example, make an XY scatterplot of RNAPII occupancy or 4SU-seq signal vs H3K115ac level, where each point represents a single gene. This is needed because the interpretation that it is CGI-based and not transcription-based is confounded by the fact that -CGI are more lowly transcribed. So, looking at Figure 1G, even the -CGI occupancy of H3K115ac is correlated with transcription, but it is just more lowly transcribed.

      We thank the reviewer for these suggestions but point out that Figure 1G shows H3K115ac signal for CGI+ and CGI– TSS that are matched for expression levels (quartiles of 4SU-seq). Fig 1F shows that H3K115ac discriminates much more strongly between CGI+ and CGI– promoters than H3K27ac or H3K122ac does.

      (2) H3K115ac, H3K27ac, and H3K122ac are all more enriched (in aggregate) at +CGI locations (Fig 1F); so do these locations just have more positioned nucleosomes? More H3.3? So that these PTMs are just more enriched due to the opportunity?

      Positioned nucleosomes are generally found downstream of the TSS of active CpG island promoters, so what the reviewer suggests may well account for the relative enrichment of H3K27ac and H3K122ac at CGI+ vs CGI- promoters in Fig. 1F. But H3K115ac localisation is distinct, with the peak at the nucleosome-depleted region, not the +1 nucleosome. This is also confirmed by the contour plots in Fig 3. Our observation is also not explained by an enrichment of H3.3 at CGI promoters, since we show that H3K115ac is not specific to H3.3 (Fig 4D).

      (3) The authors note in paragraph 2 of page 7 that "H3K115ac does not scale linearly with gene expression..." but the authors never show a quantification of this; stratification in four clusters is not able to make a linear correlation. Furthermore, in the second line of page 7, the authors state that the levels do generally correlate with transcription. To claim it is a specific CGI link and not transcription is tricky, but I encourage the authors to consider more quantifiable ways, rather than correlations, to demonstrate this point, if it is observed.

      We thank the reviewer for this comment, and taking it into consideration, we have decided to rephrase this paragraph. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) (Figure 1G). CGI promoters, on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) The authors claim on page 7 that "on average, transcription increased from TSS that also gained H3K115ac but to a modest extent, compared with the more substantial loss of H3K115ac from downregulated TSS". However, both upregulated and downregulated are significant; the difference in magnitude could simply be due to more highly or more lowly transcribed locations, meaning that fold change could be more robustly detected. I caution the authors to substantiate claims like this rather than stating a correlation.

      We thank the reviewer for this comment, which relates to the data in Fig 2A. Fig. 2B shows that the association of H3K115ac loss with downregulation is statistically stronger than that of H3K115ac gain with upregulation, but only for CGI promoters. We have reworded the text on the original page 7 that the reviewer refers to, so that it now reads: “Average levels of transcription increased from TSS that also gained H3K115ac, and there was loss of H3K115ac from downregulated TSS (Figure 2A).”

      (5) For Figure 2C, the authors argue that H3K115ac correlate with bivalent locations. So this is all qualitative and aggregate localization; please quantitatively demonstrate this claim.

      Figure S2D provides statistics for this (observed/expected and Fisher’s exact test).

      (6) The authors claim in Figure 2 that H3K115ac is dynamic during differentiation (title of Figure 2). However, there are locations that gain and lose, or maintain H3K115ac. In fact, the most discussed locations are H3K115ac with no change (2C); which means it is NOT dynamic during differentiation. So what is the message for the role during differentiation? From Supplemental Table 1, it appears there is a single ChIP experiment for H3K115ac in NPC, and it is a single read. So this is also a difficult claim with one replicate. Related to this, in S2A, the authors show K115ac where there is no change in transcription; so what is the role of H3K115ac at TSSs relevant to differentiation? It is at both locations changed and unchanged in transcription, but H3K115ac levels themselves do not change at these subsets. So, how is this dynamic? This is very confusing, and clearer analyses and descriptions are necessary to deconvolute these data.

      We apologise for the misleading title for Figure 2. This has now been amended to “Changes in H3K115ac during differentiation”. The message of this figure is that whilst changes in H3K115ac at TSS are small (panels A-C), at enhancers the changes are much more dramatic (panel D). The reviewer is incorrect about the number of replicates for NPCs – there are two biological replicates (see response to point 1b).

      (7) The authors go on to examine H3K115ac enrichment on fragile nucleosomes through sucrose gradient sedimentation. A control for H3K27ac or H3K122ac would be nice for comparison.

      We do not have the material available to perform these experiments.

      (8) When discussing Figures 3 and SF3, the authors mention performing a different MNase for a second ChIP. Showing the MNase distribution for both the more highly digested and the lowly digested would be nice. a) Related to the above, the authors show input in SF3E to argue that the difference in H3K115ac vs H3K27ac is not due to the library, but they do not show the MNase digestion patterns, which is more important for this argument.

      Input libraries (first two graphs of Fig S3E) are the MNase-digested chromatin. Comparing nucleotide frequencies from millions of reads is a more robust method than comparing fragment length patterns.

      (9) The authors move on to examine H3K115ac at enhancers. Just out of curiosity, given what was found at promoters, is H3K115ac enriched at +CGI enhancers? And what is the correlation with enhancer transcription?

      This is an interesting point, but the number of enhancers associated with CGI is not very high and so we did not focus on this. We have not analysed a correlation with eRNAs in this paper.

      (10) The authors state on page 14 that the most frequent changes in H3K115ac during differentiation are at these enhancers. So do these changes connect with differentiation-specific genes, and/or genes that have altered transcription during differentiation? Just trying to understand the functional role.

      Given the challenges of connecting enhancers with target genes, we have not addressed this question quantitatively. However, we draw the reviewer’s attention to the Genome Browser shots in Figures 2D and S2C, which show clear gain of H3K115ac (and ATAC-seq peaks) at intra- and intergenic regions close to genes whose transcription is activated during differentiation to NPCs.

      (11) Related, at the end of page 14, the authors state that the changes in H3K115ac correlate with changes in ATAC-seq; I imagine this dynamic is not unique for H3K115ac and this is observed for other PTMs (H3K27ac), so assessing and clarifying this, to again get to the specific interest of H3K115ac, would be ideal.

      We have not claimed that the link to chromatin accessibility is unique to H3K115ac. What is striking is the location of H3K115ac, which is found inside the ATAC-seq peak region, whereas H3K27ac is found only upstream or downstream of the ATAC peak. This is apparent in Fig 4C.

      (12) The authors examine levels of H3K115ac in H3.3 KO cell lines via western blot (Figure 4D), but no replicates and/or quantification are shown.

      We now provide a biological replicate for the Western blot (new Fig S4H), together with an image of the whole gel for the data in Fig 4D.

      (13) In Figure S4 and at the end of page 17, the authors are arguing that there is a link to pioneer TF complexes, based on Oct4 binding. First, while Oct4 has pioneering activity, not all Oct4 sites (or motifs) are pioneering; this has been established. So if you want to use Oct4, substratifying by pioneer vs no pioneer is necessary. Second, demonstrating this is unique to pioneer and not to non-pioneer TFs would be an important control.

      In response to the reviewer’s comment, we have removed the term “pioneer” from the manuscript.

      (14) Minor point: Figure 4 A and B, there are some formatting issues with the scale bars.

      We thank the reviewer for pointing this out, and the errors have been corrected in the revised figure.

      (15) Minor point is that it should be clear when single replicates of data are used and when PE/SR sequences are combined or which one is used in each analysis, as this was hard to discern when reading the paper and figure legends.

      We have clearly stated in the text that, after Figure 2, we repeated all experiments in paired-end mode. All processing steps are defined separately for single-end and paired-end datasets in the Methods section. Details of biological replicates are provided in Supplementary Table 1. These concerns are also addressed in our response to the reviewer’s public comment (1).

      (16) Minor point: it is surprising that different MNase and different units were used in the ChIP vs sucrose sedimentation. Could the authors clarify why?

      Chromatin preparations for sucrose gradients were made on a much larger scale than for ChIP-seq and required different setups to obtain the right level of MNase digestion.

      (17) The authors note that fragile nucleosomes contain H2A.Z and H3.3, but they never perform an analysis of available data to demonstrate a correlation (or better a quantifiable correlation) between H3K115ac occupancy and these marks at the locations they identify H3K115ac.

      Since we have shown (Fig. 4) that depletion of H3.3 does not affect overall levels of H3K115ac, we do not think there is value in further quantitative correlative analyses of H3K115ac and variant histones.

      (18) Minor point: What is the overlap in peaks for H3K115ac, H3K122ac, and H3K27ac (Figure 1C)?

      Nearly all H3K115ac peaks overlap with H3K122ac and/or H3K27ac. Its most distinct properties are its association with CGI promoters, fragile nucleosomes and its unique localisation within the NDRs, three points that the manuscript is focussed on.

      Reviewer #3 (Recommendations for the authors):

      (1) The western blot results in Figure 4D probing for H3, H3.3, and H3K115ac use Ponceau S staining, presumably of an area of the membrane where histones might be expected to migrate, as a measure of loading. However, the Ponceau S bands appear uniformly weaker in the H3.3KO lanes, yet despite this, blotting with H3.3 antibody detects a band in H3.3 knockout ESCs, suggesting that the antibody does not have a high degree of specificity. Again, a blocking experiment with appropriate peptides would instill more confidence in the specificity of these reagents, and/or the authors could provide independent validation of the knockout model to differentiate between a partial knockout or antibody cross-reactivity (e.g., by Sanger sequencing).

      In a revised Fig. S4H we now show the whole gel corresponding to this blot but including co-staining with an antibody for H4 to provide a better loading control. We also provide a biological replicate of this Western blot in the lower panel of Fig. S4H.

      (2) The manuscript would benefit from in vitro follow-up and validation, but if the authors intend to keep the manuscript primarily in silico, I suggest dedicating a few lines in each section to explain the plots, their axes, and their purpose, as well as to assist with interpretation, rather than directly discussing the results. This would make the manuscript more accessible and understandable for a broader audience in the field of epigenetics.

      In the revised version, we have tried to improve the text to make the data more accessible to a broad audience.

    1. Reviewer #4 (Public review):

      I maintain that the images in Figure 12 (new Figure 14) do not support the authors' interpretation that 2-cell embryos resulted from in vitro fertilization (IVF) of Armc2-/- rescued sperm. They are clearly not normal 2-cell embryos and instead look very much like fragmented eggs that can be seen occasionally following in vitro fertilization procedures even when that is done with wild type eggs and sperm. The only portion of current Figure 14 that has normal looking 2-cell embryos is in panel 14A4, where wild type B6D2 sperm were used. Even in that panel, there are some fragmented eggs that the authors identify as 2-cell embryos.

      The authors offer the explanation that CD1 eggs fertilized by B6D2F1 hybrid male sperm do not develop beyond the 2-cell stage, citing a 2008 paper published in Biology of Reproduction by Fernandez-Gonzalez et al. I read through that paper very carefully and even had a colleague read through it in case I missed something, but that paper says nothing at all about strain incompatibilities, much less 2-cell arrest due to them. The only crosses done in that paper are CD1 eggs x CD1 sperm and B6D2 eggs x B6D2 sperm, all by intracytoplasmic sperm injection and not standard in vitro fertilization. [Note that the paper does mention performing in vitro fertilization but says nothing about how it was done or what mouse strains were used.] I even searched the literature for information regarding incompatibility between these strains and could find nothing relevant. But even if the authors are correct and there happens to be a strain incompatibility and 2-cell arrest is expected, what the authors are calling 2-cell embryos are clearly not.

      A second explanation offered by the authors is that they used collagenase to remove the cumulus cells and that this may have affected the appearance of the embryos. This technique is actually used to remove both the cumulus cells and the zona pellucida and has been described as a gentler way to do so than other standard methods (hyaluronidase treatment followed by acid Tyrodes to remove the zona pellucida) (Yamatoya et al., Reprod Med Biol 2011, DOI 10.1007/s12522-011-0075-8). I think it is highly relevant to the current study that the method they used to remove cumulus cells also dissolves the zona, either partially or completely. Given that many of the eggs, fragmented eggs, and 2-cell embryos (from the WT sperm) in Figure 14A are lacking a zona pellucida, it seems very likely that many of the eggs were either zona-free or had partial zona dissolution from the start. In fact, the authors state in the Methods section that "Cumulus-free and zona-free eggs were collected..." for how IVF was done. Partial zona dissolution is standard in some protocols for performing IVF using frozen mouse sperm, which usually have much lower motility and overall efficacy than fresh sperm. In any case, it would improve transparency if the manuscript made clear somewhere other than buried in the Methods that the IVF procedure was done on eggs with partially or fully removed zonas, to allow proper interpretation.

      In the rebuttal, the authors go on to state: "To provide additional functional evidence, we complemented the IVF experiments with ICSI using rescued Armc2-/- sperm and B6D2 oocytes, which allowed embryos to develop to the blastocyst stage. In these experiments, 25% of injected oocytes reached the blastocyst stage with rescued sperm compared to 13% for untreated Armc2-/- sperm (Supplementary Fig. 9) These results support the functional competence of rescued sperm and demonstrate partial recovery of fertilization ability following Armc2 mRNA electroporation."

      Their conclusion that the data support partial recovery of fertilization ability following Armc2 mRNA electroporation in my opinion has no basis. This experiment was done only once, and no information is provided regarding how many eggs underwent ICSI or how many reached the blastocyst stage. The authors claim that the rescued sperm were better than the Armc2-/- sperm in producing blastocysts, but this is based on a simple percentage report of 25% vs 13% without any statistical analysis, even on the results from the single experiment presented.

      Overall, the paper shows rescue of some sperm motility by the new method they use, and the new title is therefore appropriate. The authors have also dealt reasonably with many of the original concerns regarding documenting that their methodology was effective in producing protein (at least the GFP marker) in spermatogenic cells. In my view the authors have, however, not shown any indication of functional recovery over what is already known for the knockout sperm, that ICSI can support blastocyst stage embryo development. They also have not, in my view, justified the claims at the end of the abstract "These motile sperm were able to produce embryos by IVF..." and that "...mRNA electroporation can restore...partially fertilizing ability..."

    1. Following acceptance, authors may pass their manuscript to the journal in any reasonable format (LaTeX or markdown preferred; Word and PDF acceptable). The document will be published in a “web-first” format, such as the Distill version of R Markdown. This allows reflowable text and mobile readability. We currently do not plan to support interactive content, as we do not think the large effort is worth the modest benefit.

      You don't have to host -- why not just evaluate and curate?

      Or you can have a compromise -- a 'traditional summary' in the journal, linking to the interactive version created by the author, the latter being the canonical one

      NB, I think interactive content is high value, but the authors can produce it themselves, especially given tools like Claude Code.

    1. Chapter 1: Introduction to College Writing at CNM

      This textbook was designed for English 1110 and 1120, Composition I and Composition II, respectively. If you are enrolled in one of these courses, you may be nearing the end of your studies at Central New Mexico Community College (CNM), you may be just starting your studies at CNM, or you may have already taken this class but didn’t finish. The reality is that every English 1110 and 1120 course at CNM contains a diverse range of students. If you are enrolled in English 1110 or 1120 at CNM, you are likely a resident of New Mexico (NM). You might have gone to an elementary or secondary school here. You might feel like a part of the unique culture here in NM. Wherever you started, we welcome you to CNM!

      The graphic below lists the outcomes for English 1110 and 1120, which will be introduced by your instructor and included in your syllabus.

      Course Outcomes: Composition I & II

      Composition I: English 1110
      - Analyze communication through reading and writing skills.
      - Employ writing processes such as planning, organizing, composing, and revising.
      - Express a primary purpose and organize supporting points logically.
      - Use and document research evidence appropriate for college-level writing.
      - Employ academic writing styles appropriate for different genres and audiences.
      - Identify and correct grammatical and mechanical errors in their writing.

      Composition II: English 1120
      - Analyze the rhetorical situation for purpose, main ideas, support, audience, and organizational strategies in a variety of genres.
      - Employ writing processes such as planning, organizing, composing, and revising.
      - Use a variety of research methods to gather appropriate, credible information.
      - Evaluate sources, claims, and evidence for their relevance, credibility, and purpose.
      - Quote, paraphrase, and summarize sources ethically, citing and documenting them appropriately.
      - Integrate information from sources to effectively support claims and for other purposes (to provide background information, evidence/examples, illustrate an alternative view, etc.).
      - Use an appropriate voice (including syntax and word choice).

      Did You Know

      Being a CNM student means that you are enrolled at the largest post-secondary institution in the state. CNM offers resources that can help you not only with your studies but also with managing your responsibilities. In this textbook, we’ll cover the conventions of writing, and we’ll also cover some of the resources available to you as a CNM student. And since this book is free and available on the internet, you can keep it…forever!

      This textbook is an Open Educational Resource (OER) text, which means it was created using free and available sources on the Internet, namely eight different open access books. Our compiled textbook will shift between free, outside writing resources and the first-person plural voice, or the we voice, signaling the English teachers who compiled and developed sections of the text. Throughout this text, the writers (all CNM English faculty, some of whom are still paying back student loans) are the we who compiled this textbook. We did so because we believe that a college education should be engaging, enlightening, informative, life-affirming, worldview-upturning, and affordable. We believe it shouldn’t cost money to learn how to write, and that is why we are making this book available to you. This project also would not have happened without the support of CNM’s OER initiative and Liberal Arts administration.

      This textbook will cover ways to communicate effectively as you develop insight into your own style, writing process, grammatical choices, and rhetorical situations. With these skills, you should be able to improve your writing regardless of the discipline you enter after completing this course.
Knowing your rhetorical situation (the circumstances under which you communicate) and knowing which tone, style, and genre will most effectively persuade your audience will help you whether you are enrolling in history, biology, theater, or music next semester, because in college you write in every discipline. To help launch our introduction, this chapter includes a section from the open access textbook Successful Writing.

As you begin this chapter, you may wonder why you need an introduction. After all, you have been writing and reading since elementary school. You completed numerous assessments of your reading and writing skills in high school and as part of your application process for college. You may write on the job, too. Why is a college writing course even necessary? It can be difficult to feel excited about an intro writing course when you are eager to begin the coursework in your major (and if you are an English major, let your teacher know so you can talk about your future education plans). Regardless of your field of study, honing your writing skills, plus your reading and critical-thinking skills, will help you build a solid academic foundation.

In college, academic expectations change from what you may have experienced in high school. The quantity of work you are expected to complete increases. When instructors expect you to read pages upon pages or study hours and hours for one particular course, managing your workload can be challenging. This chapter includes strategies for studying efficiently and managing your time. The quality of the work you do also changes. It is not enough to understand course material and summarize it on an exam. You will also be expected to seriously engage with new ideas by reflecting on them, analyzing them, critiquing them, making connections, drawing conclusions, or finding new ways of thinking about a given subject. Educationally, you are moving into deeper waters.
A good introductory writing course will help you swim.

Infographic comparing various aspects of high school and college, adapted from “Chapter One” of Successful Writing, 2012, used according to Creative Commons 3.0 cc-by-nc-sa.

Seeking Help

Meeting College Expectations
Depending on your education before coming to CNM, your writing experience may differ considerably from that of other students in your class. Some students might have earned a GED, some might be returning to school after a decades-long break, and still others might be about to graduate from high school or have just graduated. If the latter is the case, you might enter college with a wealth of experience writing five-paragraph essays, book reports, and lab reports. Even the best students, however, need to make big adjustments to learn the conventions of academic writing. College-level writing obeys different rules, and learning them will help you hone your writing skills. Think of it as ascending another step up the writing ladder.

Many students feel intimidated asking for help with academic writing; after all, it’s something you’ve been doing your entire life in school. However, asking for help is not a sign of a lack of ability; on the contrary, many of the strongest student writers regularly seek help and support with their writing (that’s why they’re so strong). College instructors are familiar with the ups and downs of writing, and most colleges have support systems in place to help students learn how to write for an academic audience. The following sections discuss common on-campus writing services, what to expect from them, and how they can help you.

Tutoring Center
CNM students have access to The Learning and Computer Center (TLCc), which is available on six campuses: Advanced Technology Center, Main, Montoya, Rio Rancho, South Valley, and Westside. At these centers, trained tutors help students meet college-level expectations.
The tutoring centers offer one-on-one, online, and group sessions for multiple disciplines. TLCc also offers workshops on citing sources and developing a writing process. CNM’s Ace Tutoring Lab provides students with resources and support for their academic needs.

Student-Led Workshops
Some courses encourage students to share their research and writing with each other, and some even offer workshops where students can present their own writing and offer constructive comments to their classmates. Independent paper-writing workshops provide a space for peers with varying interests, work styles, and areas of expertise to brainstorm. Writing in drafts makes academic work more manageable. Drafting gets your ideas onto paper, which gives you more to work with than the perfectionist’s daunting blank screen. You can always return later to fix the problems that bother you.

Communicating in a College Course
Communication courses teach students that communication involves two parties: the sender and the receiver of the communicated message. Sometimes there is more than one sender, and often there is more than one receiver. Whether it is an email, text, tweet, blog, discussion, presentation, written assignment, or speech, the main purpose of communication is always to help the receiver(s) understand the idea that the sender is trying to share. This section will focus on electronic communication in a college course.

Email or message
An email or message sent to your instructor is often the result of a question you may have. Many students think contacting their instructor shows that they weren’t paying attention or that they are the only student who did not understand something, so they often keep quiet and go on trying to do work that they do not understand.
Other students think that their teacher is their own private tutor, so they email or message the teacher several times a day to ask questions that likely have answers in the syllabus and in the learning module instructions. Both of these behaviors are unhelpful and frustrating to the students and the instructor. In particular, avoid monopolizing your teacher’s email inbox with dozens of emails and messages per week and expecting her to respond immediately. Nobody enjoys having their inbox flooded with messages from the same person. Try to remember that your instructor will likely have many other emails from administrators, staff, and other students. Avoid sending harsh or demanding emails or messages when you are panicked, frustrated, or angry. Walk away from your computer and return later when you feel calmer. Then reread the instructions, syllabus, or course materials you find confusing, and if you still cannot find the answer because it is not there, definitely email or message your instructor.

Tips for Emailing Your Instructor
Be polite: Address your professor formally, using the title “Professor” or “Instructor” with their last name. Depending on how formal your professor seems, use a salutation (“Dear” or “Hello”) followed by your professor’s title and name (Dr. XYZ, Professor XYZ, etc.).
Pose a question: Clearly introduce the purpose of your email and the information you are requesting. If you are not asking a specific question, be aware that you may not receive a response to your email.
Be concise: Instructors are busy people, and although they are typically more than happy to help you, get to your point quickly.
Sign off with your first and last name, the course number, and the class time. This will make it easy for your professor to identify you.
Do not ask, “When will you return our papers?” If you MUST ask, make the question specific and realistic (e.g., “Will we get our papers back by the end of next week?”).
Most instructors teach multiple classes and could have hundreds of assignments to grade.
Do not ask your instructor if you missed anything important when you were absent. Instructors work diligently to design their coursework, so asking if any of that content was important can be considered rude or dismissive of their hard work. Instead, ask if you missed anything that was not included on the course schedule.

Creating an appropriate tone can feel overwhelming. We know that all emails should be polite, and emails to your instructor may be more formal or professional. Not all instructors will expect formal emails, but it’s important to remember that your instructor is not your friend and that an email or message is not a text message. It is not appropriate to send an informal or colloquial message as though your instructor were a friend or acquaintance.

Sample Email to an Instructor
Subject: English 1110 Section 102: Absence

Dear/Hello Professor [Last name],

I was unable to attend class today, so I wanted to ask if there are any handouts or additional assignments I should complete before we meet on Thursday. I did review the syllabus and course outline, and I will complete the quiz and reading homework listed there.

Many thanks,
[First name] [Last name]

Communication on Public Discussion Boards
Whenever you are asked to communicate or post in a discussion forum or other communication mode, ask yourself whether there will be one recipient or several. In other words, who will your readers be? Is the forum private, so that only your instructor, a group of classmates, or a specific classmate can see it, or is it public, so that everyone (all of your classmates and your instructor) can see your post? Check the settings of the forum to which you are posting. A public discussion board can reach a broad audience, so tailor your post to its recipient(s).
It is nice to address a classmate by name if you are responding to a specific person in a discussion forum.

      post to the whole class

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum, an organism that unusually has an acristate mitochondrion during the asexual part of its life cycle, which then develops cristae as the parasite enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins, look at the timing of expression during the parasite life cycle, and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes, singly and in combination, and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using EM techniques, they show that the morphology of gametocyte mitochondria is abnormal in the knockout lines, although there is great variation.

      Major comments: The manuscript is interesting and is an intriguing use of a well studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail in areas around methodology and statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates and the statistical tests and data presentation are not clear enough to conclude the reduction in transmission that is claimed. Perhaps this could be improved with clearer text?

      We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.

      To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.

      Regarding mosquito experiments, while we indeed reported a reduction in transmission and oocyst numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence "with the transmission reduction of [numbers]...." and included the sentence "The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers".

      More specific comments to address: Line 101/Fig1E (and figure legend) - What is this heatmap showing? It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section, and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide information.

      We added the information "high molecular mass gels with lower acrylamide percentage" to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).

      Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)? Please clarify.

      We thank the reviewer for pointing this out - this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and MitoTracker in U-ExM. The figure annotation has now been corrected.

      Figure 2C - what is the statistical test being used (the methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution." but what test is this?)?

      The statistical test is now described in the Materials and Methods section with the sentence "The fitted model was used to obtain estimated means and contrasts and were evaluated using Wald Statistics". The test is now also mentioned in the figure legend.
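      For readers unfamiliar with the final step of such an analysis, the sketch below shows what a Wald test on a single contrast computes. It is a minimal illustration under stated assumptions, not a reproduction of the authors' fitted mixed model: it assumes a log link (usual for negative binomial models), takes a contrast estimate and its standard error as given, and the function name and numbers are hypothetical.

```python
import math


def wald_contrast(estimate, std_error):
    """Two-sided Wald z-test for one model contrast (hypothetical helper).

    estimate:  contrast on the link (log) scale, e.g. log(mean_KO) - log(mean_WT)
    std_error: standard error of that contrast from the fitted model
    Returns (z, p_value, ratio_of_means).
    """
    z = estimate / std_error
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Assuming a log link, exponentiating the contrast gives the ratio of estimated means
    ratio = math.exp(estimate)
    return z, p_value, ratio


# Made-up numbers, not taken from the manuscript
z, p, ratio = wald_contrast(-0.9, 0.35)
print(f"z = {z:.2f}, p = {p:.4f}, KO/WT mean ratio = {ratio:.2f}")
```

      The mixed-model fitting itself (random experiment effect, negative binomial dispersion) would be done in dedicated statistical software. Only the conversion from contrast to z statistic and p-value is shown here.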

      Also the choice of a log10 scale for oocyst intensity is an unusual choice - how are the mosquitoes with 0 oocysts being represented on this graph? It looks like they are being plotted at 10^-1 (which would be 0.1 oocysts in a mosquito which would be impossible).

      As the data span three orders of magnitude, with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we used a standard approach to handle 0s in log-transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log‑scaled y‑axis and relabelled the lowest tick as '0'. This ensures that mosquitoes with zero oocysts are shown along the x‑axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
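      The plotting approach described above keeps two versions of the data apart: floored values are used only to place points on the log axis, and the raw counts, including true zeros, go to the statistics. The sketch below illustrates this under stated assumptions. The 0.001 floor comes from the response, while the function name and example counts are hypothetical.

```python
import math

# Plotting floor for zero counts; the response reports substituting 0.001.
PLOT_FLOOR = 0.001


def log_plot_positions(counts, floor=PLOT_FLOOR):
    """Map raw counts to log10 y-positions for a log-scaled plot (hypothetical helper).

    Zeros are drawn at log10(floor), i.e. in a band below an axis break whose
    tick is relabelled "0". The input list is not modified, so statistics can
    still be computed on the true zeros.
    """
    return [math.log10(c) if c > 0 else math.log10(floor) for c in counts]


oocysts = [0, 3, 12, 150]  # hypothetical oocyst counts per midgut
positions = log_plot_positions(oocysts)
print(positions)  # y-positions used only for drawing
print(oocysts)    # unchanged values passed to the statistical model
```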

      Figure 2D - it is great that the data from all feeding replicates has been shared; however, it is difficult to conclude any meaningful impact on transmission with the knock-out lines when there is so much variation and so few mosquitoes dissected for some datapoints (10 mosquitoes is a very small sample size). For example, Exp1 shows a clear decrease in mic19- transmission, but Exp2 does not really show as great an effect. Similarly, why does the double knock out have better transmission than the single knockouts? Surely there would be a greater effect?

      We agree with the reviewer and hope that the new sentence added in response to the major comment clarifies this point. Note that the original Figure 2D has been moved to the supplementary information, as per a minor comment from another reviewer.

      Figure 3 legend - Please add which statistical test was used and the number of replicates.

      Done

      Figure 4 legend - Please add which statistical test was used and the number of replicates.

      Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.

      Figure 5C - the 3D reconstructions are very nice, but what do the red and yellow colors show?

      Indeed, the information was missing. We added it to the figure legend.

      Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages." How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?

      Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.

      Furthermore, even though we do not discuss this in the article, we are aware that mitochondria-targeting drugs are known to block mosquito transmission. We want to point out that it is difficult to distinguish the effect of ETC disruption on energy conversion from its effect on the essential pathway of pyrimidine synthesis, which is highly relevant in microgamete formation. Still, a recent paper by Sparkes et al. 2024 showed the essentiality of mitochondrial ATP synthesis during gametogenesis, so it is very likely that mitochondrial energy conversion is highly relevant for transmission to the mosquito.

      Reviewer #1 (Significance (Required)):

      This manuscript is a novel approach to studying mitochondrial biology and does open a lot of unanswered questions for further research directions. Currently there are limitations in the use of statistical tests and detail of methodology, but these could easily be addressed with a bit more analysis/better explanation in the text. This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research. My expertise is in Plasmodium cell biology.

      We thank the reviewer for the praise.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Major comments: 1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as it is the conformation of the dimers and rows that modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed, and likely dysfunctional (see below). Thus, I do not agree with the title that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.

      We thank the reviewer for taking the time to review our manuscript.

      Based on the reviewer's interpretation, we conclude the title does not come across as intended. We have changed the title to: "The role of MICOS in organizing mitochondrial cristae in malaria parasites"

      The Discussion section starting from line 373 also suffers from overinterpretation as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) compared to the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate by such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating neither is epistatic.

      We do agree with the reviewer's notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.

      The following paragraphs (line 387 to 422) continue with such unnecessary overinterpretation to the point that it is confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript and it should be shortened. Yes, maybe there are other putative MICOS subunits that may linger in the KOs that are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.

      We shortened this paragraph.

      2) While the authors went to impressive lengths to detect any effect on lifecycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration, or membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have and not resorting to complexome profiling (although this would be definitive if it is feasible):

      i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.

      Interesting suggestion. As our staining and imaging conditions are suitable for such analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset that we collected for Figure 3. We did not, however, detect any difference in MitoTracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary figure S6.

      ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.

      While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.

      iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.

      While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.

      To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.

      I will conclude by noting that complexome/shotgun proteomics may also be a useful tool for identifying other putative MICOS subunits, by determining whether proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.

      3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus have concluded that these are truly acristate. This can very well be true, or there can be immature cristae forms that evaded detection at the resolution they used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are pretty small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be even smaller still. Minute levels of sampled complex III and IV plus complex V dimers in ABS that were detected previously by the authors by complexome profiling would argue for the presence of minuscule and/or very few cristae.

      I think that the authors should hedge their claim that ABS is acristate by briefly stating that there is still a possibility that minuscule cristae may have been overlooked previously.

      We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer's point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing 'fully acristate' to 'acristate'.

      This brings me to the claim that Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag; a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed, but below the expression limits of the assay, as the protein exhibits low expression levels when mitochondrial activity is upregulated.

      We agree with the reviewer that the absence of a detectable epitope‑tag signal does not definitively exclude low‑level expression, and we have therefore replaced the term 'absent' with 'undetectable' throughout the manuscript. In the context of previous findings of low-level transcripts of the proteins in studies by Lopez-Barragan et al. and Otto et al., we also added the sentence "The apparent absence could indicate that transcripts are not translated in ABS or that the proteins' expression was below detection limits of western blot analysis." to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall within the

      To address this point, the authors should determine whether mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack either transcript. RT-qPCR using polyT primers can be employed to detect these transcripts. If the level of these mRNAs in WT ABS is equivalent to that in the dKO, the authors can make a pretty strong case for the absence of cristae in ABS.

      We appreciate the reviewer's suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.

      They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).

      Searching for the CX9C motifs is a valuable suggestion. In response to the reviewer's suggestion, we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1 F).
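      For readers who want to screen a sequence for this motif themselves, the sketch below scans a protein sequence for the Cx9C pattern (a cysteine, any nine residues, then a cysteine) and checks whether at least two non-overlapping copies are present. This is only an assumed, simplified illustration and not the authors' method: the conservation analysis in Figure 1 F is based on aligning MIC19 sequences, whereas this regex checks a single sequence for the pattern and ignores spacing between the motifs, structure, and conservation. The function names and the toy sequence are hypothetical.

```python
import re

# Cx9C: a cysteine, any nine residues, then a cysteine. Wrapping the pattern
# in a lookahead lets finditer report overlapping candidates as well.
CX9C = re.compile(r"(?=(C[A-Z]{9}C))")


def find_cx9c(seq):
    """Return (0-based start, motif) for every Cx9C match in a protein sequence."""
    return [(m.start(), m.group(1)) for m in CX9C.finditer(seq.upper())]


def has_twin_cx9c(seq):
    """True if the sequence contains at least two non-overlapping Cx9C motifs."""
    count, last_end = 0, -1
    for start, motif in find_cx9c(seq):
        if start > last_end:  # skip candidates that share residues with the previous pick
            count += 1
            last_end = start + len(motif) - 1
    return count >= 2


# Hypothetical toy sequence, not the PfMIC19 sequence.
toy = "MK" + "C" + "A" * 9 + "C" + "LLLL" + "C" + "G" * 9 + "C" + "K"
print(find_cx9c(toy))
print(has_twin_cx9c(toy))
```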

      Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.

      In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.

      5) Statistical significance. Sometimes my eyes see population differences that are considered insignificant by the statistical methods employed by the authors, e.g. Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods, such as Student's t-test, for pairwise comparisons?

      The graphs in figures 3, 4 and 5 got a makeover, such that they are now in linear scale and shown as violin plots (also following a suggestion from further down in the reviewer's comments). We believe that this improves interpretability. We kept ANOVA as the statistical test to ensure correction for multiple comparisons, which cannot be performed with a standard t-test. A full overview of statistics and exact p-values can also be found in the newly added supplementary information 2.

      Minor comments: Line 33. Anaerobes (e.g. Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria

      We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.

      Line 56: Unclear what authors mean by "canonical model of mitochondria"

      To clarify we changed this to "yeast or human" model of mitochondria.

      Lines 75-76: This applies to Mic10 only

      We removed the "high degree of conservation in other cristate eukaryotes" statement.

      Line 80: Cite DOI: 10.1016/j.cub.2020.02.053

      Done

      Fig 2D: I find this table difficult to read. If the authors keep the table format, they should at least get rid of the 'mean' column, as this data is better depicted in 2C. I suggest depicting this data as in 3B, showing the proportion of infected vs uninfected flies in all experiments, and then moving the modified table to the supplement. It is important to point out that experiment 5 appears to be an outlier, with reduced infectivity across all cell lines, including WT.

      To clarify: the mean reported in the table is the mean per replicate, while the mean reported in Figure 2C is the overall mean for a given genotype, which corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided not to include a graph of infected and non-infected mosquitoes, because it would be partly misleading: it would highlight a phenotype that we argue is influenced by the strong variability.

      Fig. 3C-G: I feel like these data repeatedly lead to the same conclusions. These are all different ways of showing what is depicted in Fig 2B: gross mitochondrial morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to the supplement and replaced by the beautiful images.

      Thank you for the nice comment on our images. We have now moved some of the graphs to Supplementary Figure 6 and kept only the relative frequency, sphericity and total mitochondrial volume per cell in the main figure.

      Line 180: Be more specific about which tubulin isoform is used as a male marker and state why this marker was used in Supplemental Fig S6.

      We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.

      Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.

      To clarify the biological effect that we can conclude from the measurement, we added an explanation in the respective section of the results, and we replaced the raw results of the plug-in readout with the deduced relative dispersion.

      Line 222: Report male/female crista measurements

      We added Supplementary Information 2, which contains the exact statistical tests and outcomes for all presented quantifications as well as a per-sex statistical analysis of the data from Figure 4. Correspondingly, we extended Supplementary Information 2 with a per-sex colour code for the thin-section TEM data.

      Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems that the data spread is wider in the double KO. This would also solve the problem of the standard deviation extending below 0%.

      We changed this accordingly.
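      The reviewer's remark about error bars crossing 0% reflects a general property of bounded, skewed data: mean ± SD can dip below zero even when no observation is negative, whereas quartiles (which violin or box plots convey) stay within the observed range. A minimal sketch with hypothetical coverage values, not the authors' data:

```python
import statistics

# Hypothetical per-cell crista coverage values (%): bounded below at 0 and right-skewed.
coverage = [0.0, 0.0, 0.0, 2.0, 5.0, 40.0]


def mean_sd_interval(values):
    """Return (mean - SD, mean + SD) using the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return m - s, m + s


def quartiles(values):
    """Return Q1, median, Q3; the 'inclusive' method never extrapolates beyond the data."""
    return statistics.quantiles(values, n=4, method="inclusive")


low, high = mean_sd_interval(coverage)
print(f"mean ± SD: {low:.1f} to {high:.1f}")  # the lower bound is negative
print("quartiles:", quartiles(coverage))
```

      Showing the full distribution avoids implying impossible negative coverage values.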

      Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).

      This has been changed accordingly.

      Line 320: incorrect citation. Related to point 1 above.

      Correct citation is now included in the text.

      Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.

      This has been changed accordingly.

      Lines 343-345: The sentence and citation 45 are strange. Regarding the former, it is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. As for the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Mitochondrial fragmentation is often a physiological response to stress. I would rewrite the sentence so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least present this as one of several possibilities.

      We replaced the sentence as the reviewer suggested, although we still include the data from human cells, as this has also been shown by Stephens et al. 2020.

      Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.

      Done

      Lines 376-377: 'deplete functionality' does not make sense, especially in the context of MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.

      Other suggestions for added value

      We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.

      1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)?

      While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead co-migrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial intermembrane space bridging (MIB) complex are not mentioned in the manuscript, we decided not to include this information in the figure.

      Reviewer #2 (Significance (Required)):

      The manuscript by Tassan-Lugrezin et al. is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask whether MICOS is essential for this process. Based on their data, they conclude that the answer is no, which the authors consider unprecedented. But even if their claim that ABS are acristate is true, this supposed advantage does not bring any meaningful insight into how MICOS works in Plasmodium.

      First, the positives of this manuscript. As has been the case with this research team, the experimental approaches in the manuscript are very sophisticated. The highlights are the beautiful and often conclusive microscopy performed by the authors. Unfortunately, the localization of Mic60 and Mic19 was inconclusive owing to their very low expression, but this was the only exception.

      The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Life cycle stage differentiation tolerates mitochondrial deformation, with only a modest but significant reduction in oocyst production observed.

      However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which, judging from the title and focus of the manuscript, is the authors' main goal. In its current form, the manuscript reports some potentially important findings:

      1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.

      2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.

      3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation.

      4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of Plasmodium itself than about the essentiality of Mic60, i.e., Plasmodium life cycle progression tolerates defects in mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth under the examined conditions differs from its essentiality in other eukaryotes. However, fitness costs were not assayed (e.g., by competition between mutants and WT in infection of mosquitoes).

      5) Decreased fitness of the mutants is implied by a reduction in oocyst formation.

      While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.

      This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS, and sets it apart from other iterations, is the apparent absence of the Mic10 subunit. Purification of Plasmodium MICOS via the epitope-tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in Plasmodium.

      Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium. This is a salient issue, as crista morphology may be decoupled from OXPHOS capacity in Plasmodium, which would link to the apparent tolerance of cell growth and differentiation to altered mitochondrial morphology. In section A, I suggested experiments to address this gap.

      Finally, the authors could assay the fitness costs of MICOS ablation and associated phenotypes by testing whether mosquito infectivity is reduced when the mutants compete directly with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass the population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation contributes to Plasmodium's remarkable ability to adapt.

      I realize that the authors put a lot of effort into their study and, again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are better ways to increase the impact of the study than overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.

      We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.

      With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for the initial formation of cristae (as opposed to their organization) confirms something the field has assumed without it being the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript. The finding that MICOS is not involved in the initial formation of cristae is, in our view, relevant to Plasmodium biology and beyond. As for insights into how MICOS works in Plasmodium, we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and maintaining cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by thin-section EM and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria in gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      We thank the reviewer for their time and compliment.

      Major comments:

      1) The authors should present their findings more clearly in the right context, in particular by:

      (i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      We extended the introduction to include this information.

      (ii) at the end to the introduction, the motivating hypothesis is formulated ad hoc "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      To clarify, we rephrased the sentence to: "Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated."

      2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum but this is not compared to the expected length or the size in S. cerevisiae.

      To resolve the reference issue, we added the UniProt IDs of the proteins we compared, which show that the annotated ORF is larger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast throughout the figure but then refer to human in this specific sentence.

      Regarding whether the true N-terminus is known: in short, not exactly.

      However, we do know that the Pf version is about double the size of the yeast protein.

      As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.

      To clarify, we now emphasize that we ran the Alphafold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chapell et al. now cited in the manuscript), and revised the wording to make clear what we are comparing in which sentence.

      3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins." The authors should explain which data they are referring to, as some of the data in Figs 3 and 4 look similar and all significance tests are against the wild type rather than between the different mutants, so it is not clear whether any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      In response to this and other reviewer comments, we added multiple-comparison testing among all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests.

      4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      We deleted this statement.

      5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      This sentence has been removed.

      6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      This sentence has been deleted in the revised version of the manuscript.

      Minor comments:

      7) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      Title is changed accordingly

      Minor comments:

      • Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.

      Done, the paper is now cited

      • Page 2, line 58: for a more complete picture, the authors should also cite work by others showing that complex III (a drug target) and ATP synthase do assemble, albeit at very low levels (Nina et al, 2011, JBC).

      Done

      • Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.

      The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.

      • Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.

      We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).

      • Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked a low/very low. So, its existence is unknown, not just its "function".

      We adapted the domain description to "a stack of two parallel beta-sheets" and replaced the statement on unknown function by the statement "Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown."

      Fig 1B: The authors show two AlphaFold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is, however, an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible.

      We appreciate the reviewer's suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full‑length proteins, we believe that including fragment‑based structures would be less informative in this context.

      Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods: how well was the antibody working in their hands, and how do they interpret the absence of a WB band compared to the transcriptomic data?

      The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall within the

      • Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.

      Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondrion), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases antibody accessibility. However, U-ExM is prone to dotty background signals, which can hide low-abundance, naturally dotty signals such as those of MICOS proteins, which localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we have neither experience with nor access to this technique.

      Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).

      The limitations of other methods are described in the respective results section.

      We added a clarifying sentence in the results section of Figure 4:

      "Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae."

      This statement refers to the length/width measurements of cristae.

      In the context of Figure 4D, we mention the following (see preprint lines 229 - 230): "We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19, PfMIC60, or both."

      For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 - 273): "Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range. "

      Line 404: perhaps undetected or similar would be a better description than "hidden"?

      The sentence does not exist in the revised manuscript

      Reviewer #3 (Significance (Required)):

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle, and defects in infection efficiency are observed. This work is an important first step towards deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism. The study's limitation stems from how much is already known, in great detail, about MICOS and its subunits in yeast and humans, with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight into MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is nevertheless important given the divergence found in mitochondrial biology, and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging, as the morphological defects are diverse and the fitness defects appear moderate/mild.

      As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest to specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.

      We have revised the paper for clarity at all levels: motivation, application, and parameterization. We clarify that there is a large unmet need for using RSA in a trial-wise manner, and that this approach offers benefits to any team interested in decoding trial-wise representational information linked to behavioral responses; as such, it is not a problem specific to a single memory study.

      (2) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (pattern component modelling; Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is closely related to RSA (Diedrichsen and Kriegeskorte 2017). The authors may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA, in which there is more than one repetition per stimulus.

      We appreciate the summary of relevant literature and have revised the Introduction to address this wealth of relevant work. While much is owed to these authors, new developments from a diverse array of researchers outside a single group can open new research questions and should always have a place in our research landscape. We owe much to the work of Kriegeskorte's group; in fact, Schütt et al., 2023 served as a very relevant touchpoint in the Discussion and helped to highlight specific needs not addressed by assessing the "representational geometry" of an entire presented stimulus set. Principal amongst these needs is trial-wise representational information that can be related to trial-wise behavioral responses and thus used to address specific questions about brain-behavior relationships. We invite the Reviewer to consider the utility of this shift in light of the following revisions to the Introduction.

      Page 3. “Recently, methodological advancements have addressed many known limitations in cRSA. For example, cross-validated distance measures (e.g., Euclidean distance) have improved the reliability of representational dissimilarities in the presence of noise and trial imbalance (Walther et al., 2016; Nili et al., 2014; Diedrichsen et al., 2021). Bayesian approaches such as pattern component modeling (Diedrichsen, Yokoi, & Arbuckle, 2018) have extended representational approaches to accommodate continuous stimulus features or temporal variation. Further, model comparison RSA strategies (Diedrichsen et al., 2021) and generalization techniques across stimuli (Schütt et al., 2023) have improved sensitivity and inference. Nevertheless, a common feature shared across most of these improvements is that they require stimulus repetition to examine the representational structure. This requirement limits their ability to probe brain-behavior questions at the level of individual events”.

      Page 8. “While several extensions of RSA have addressed key limitations in noise sensitivity, stimulus variance, and modeling (e.g., Diedrichsen et al., 2021; Schütt et al., 2023), our tRSA approach introduces a new methodological step by estimating representational strength at the trial level. This accounts for the multi-level variance structure in the data, affords generalizability beyond the fixed stimulus set, and allows one to test stimulus- or trial-level modulations of neural representations in a straightforward way”.

      Page 44. “Despite such prevalent appreciation for the neurocognitive relevance of stimulus properties, cRSA often does not account for the fact that the same stimulus (e.g., “basketball”) is seen by multiple subjects and produces statistically dependent data, an issue addressed by Schütt et al., 2023, who developed cross-validation and bootstrap methods that explicitly model dependence across both subjects and stimulus conditions”.

      (3) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. By this, the authors mean the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least 2 times, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs.
In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data come from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors spelled out the specific problem addressed by their method right at the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".
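
      The reviewer's repeated-measures definition of representational strength can be illustrated with a short numpy sketch. It assumes zero-mean, i.i.d. Gaussian signal and noise, two repetitions per stimulus, and hypothetical sizes and SDs; `signal_variance_split_half` is an illustrative name, not part of any toolbox. It uses a covariance-style version of "correlating patterns of the same stimuli across repeated measures": averaging the elementwise products of the two repetitions cancels the independent noise in expectation, whereas the variance of a single measurement mixes signal and noise.

```python
import numpy as np


def signal_variance_split_half(y1, y2):
    """Estimate sigma_u^2 from two independent repetitions.

    y1, y2: arrays of shape (J, P) with the measured patterns of J stimuli
    on repetitions n = 1 and n = 2. If the true patterns are zero-mean and the
    noise is zero-mean and independent across repetitions, E[y1 * y2] = E[u * u],
    so the mean elementwise product is an unbiased estimate of sigma_u^2.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    return float(np.mean(y1 * y2))


rng = np.random.default_rng(0)
J, P = 50, 200                # hypothetical numbers of stimuli and voxels
sigma_u, sigma_e = 1.0, 2.0   # hypothetical signal and noise SDs
u = rng.normal(0.0, sigma_u, size=(J, P))       # true patterns u_j
y1 = u + rng.normal(0.0, sigma_e, size=(J, P))  # y_{j,1} = u_j + e_{j,1}
y2 = u + rng.normal(0.0, sigma_e, size=(J, P))  # y_{j,2} = u_j + e_{j,2}

print(signal_variance_split_half(y1, y2))  # near sigma_u**2 = 1
print(np.mean(y1 ** 2))                    # near sigma_u**2 + sigma_e**2 = 5
```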

      At a general level, our approach rests on the premise that there is meaningful information present in a single presentation of a given stimulus. This assumption may have less utility when the research goals are more focused on estimating the fidelity of signal patterns for RSA, as in designs with multiple repetitions. But it is an exaggeration to state that such a trial-wise approach cannot address the difference between “true” stimulus patterns and noise. This trial-wise approach has explicit utility in relating trial-wise brain information to trial-wise behavior, across multiple cognitions (not only memory studies, as applied here). We have added substantial text to the Introduction distinguishing cRSA, which is widely employed, often in cases with a single repetition per stimulus, and model comparative methods that employ multiple repetitions. We clarify that we do not consider tRSA an alternative to the model comparative approach, and discuss that operational definitions of representational strength are constrained by the study design.

      Page 3. “In this paper, we present an advancement termed trial-level RSA, or tRSA, which addresses these limitations in cRSA (not model comparison approaches) and may be utilized in paradigms with or without repeated stimuli”.

      Page 4. “Representational geometry usually refers to the structure of similarities among repeated presentations of the same stimulus in the neural data (as captured in the brain RSM) and is often estimated utilizing a model comparison approach, whereas representational strength is a derived measure that quantifies how strongly this geometry aligns with a hypothesized model RSM. In other words, geometry characterizes the pattern space itself, while representational strength reflects the degree of correspondence between that space and the theoretical model under test”.

      Finally, we clarified that in our simulation methods we assume a true underlying activity pattern and a random error pattern. The model RSM is computed based on the true pattern, whereas the brain RSM comes from the noisy pattern, not the model RSM itself.

      Page 9. “Then, we generated two sets of noise patterns, which were controlled by parameters σ<sub>A</sub> and σ<sub>B</sub>, respectively, one for each condition”.
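
      The data-generating process described in this response can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the authors' R code: the trial and voxel counts, the noise SDs `s_a` and `s_b`, Pearson correlation as the similarity measure, and the function name `simulate_rsms` are all hypothetical. The model RSM is computed from the true patterns and the brain RSM from the noisy patterns, so condition B, which has more noise here, should agree less with the model.

```python
import numpy as np


def simulate_rsms(n_trials=40, n_voxels=100, s_a=0.5, s_b=1.5, seed=0):
    """Model RSM from true patterns, brain RSM from noisy patterns.

    The first half of the trials is labelled condition A and the second half
    condition B; s_a and s_b are hypothetical per-condition noise SDs.
    """
    rng = np.random.default_rng(seed)
    true = rng.standard_normal((n_trials, n_voxels))  # i.i.d. true patterns
    cond = np.repeat(np.array(["A", "B"]), n_trials // 2)
    noise_sd = np.where(cond == "A", s_a, s_b)[:, None]  # one SD per trial
    measured = true + noise_sd * rng.standard_normal((n_trials, n_voxels))
    model_rsm = np.corrcoef(true)      # trial x trial similarities of true patterns
    brain_rsm = np.corrcoef(measured)  # trial x trial similarities of noisy patterns
    return model_rsm, brain_rsm, cond


model_rsm, brain_rsm, cond = simulate_rsms()
off = ~np.eye(len(cond), dtype=bool)  # drop self-similarities on the diagonal
for c in ("A", "B"):
    rows = cond == c
    r = np.corrcoef(model_rsm[rows][off[rows]], brain_rsm[rows][off[rows]])[0, 1]
    print(c, round(float(r), 2))  # agreement is expected to be lower for B
```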

      (4) The notation in the paper is often conflicting and should be clarified. The actual true and measured activity patterns should receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that $\sigma_{ijk}$ denotes the noise variances (not standard deviations)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as seems to be done here).

      We have added notation for the true and measured activity patterns to distinguish them from our notation for variance. We agree that multilevel models are usually defined at the level of means rather than at the level of variances, and we include a figure (Fig 1D) that describes the model in terms of the means. We clarify that the σ terms used in the manuscript were not variances or standard deviations themselves; rather, they denote components of the actual (multilevel) variance parameter. Each component was sampled from a normal distribution, and the components summed to the final variance parameter for each trial. We have changed the notation for each component to the lowercase letter s to minimize confusion. We have also made our R code publicly available on our lab GitHub, which should clarify the exact simulation process.

      (5) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.
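
      The reviewer's point about positive semi-definiteness can be checked numerically. In this sketch, the matrix size, the N(0, 0.3²) cell distribution, and the helper name are assumptions, and the reviewer's bivariate normal draw is simplified to one independent normal draw per cell. A symmetric matrix whose off-diagonal cells are drawn independently typically has negative eigenvalues, which a correlation matrix computed from real patterns cannot have.

```python
import numpy as np


def min_eigenvalue(m):
    """Smallest eigenvalue of a symmetric matrix (negative means not PSD)."""
    return float(np.linalg.eigvalsh(m).min())


rng = np.random.default_rng(1)
n = 30                                   # hypothetical number of stimuli
iu = np.triu_indices(n, k=1)
rsm = np.eye(n)                          # unit diagonal, as in a correlation matrix
rsm[iu] = rng.normal(0.0, 0.3, size=iu[0].size)  # each cell drawn independently
rsm = rsm + np.triu(rsm, k=1).T          # mirror the upper triangle

patterns = rng.standard_normal((n, 100))  # 30 hypothetical patterns over 100 voxels
real_rsm = np.corrcoef(patterns)

print(min_eigenvalue(rsm))       # negative: not a valid correlation matrix
print(min_eigenvalue(real_rsm))  # non-negative (up to rounding)
```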

      We have added a justification of the mixed-effects model given the potential assumption violations. We caution readers to investigate the robustness of their models and to employ permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. Finally, we agree that the first simulation setting lacks several properties of realistic RDMs/RSMs; however, we believe there is utility in understanding the mathematical properties of correlations, an essential component of RSA, in a straightforward simulation where the ground truth is known, and we have therefore moved this simulation to Appendix 1.

      (6) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.

      The Reviewer is correct that the voxel values in the true pattern are drawn from i.i.d. standard normal distributions. We take the Reviewer’s suggestion of a “condition-specific pattern” to mean that there could be a condition-voxel interaction in two non-mutually exclusive ways. The first is additive: essentially some common underlying multi-voxel pattern, like [6, 34, -52, …, 8], for all condition A trials, and a different such pattern for condition B trials, etc. The second is multiplicative: essentially a vector of scaling factors, like [x1.5, x0.5, x0.8, …, x2.7], for all condition A trials, and a different such vector for condition B trials, etc. Both possibilities could indeed affect tRSA as much as they would cRSA.

      Importantly, if such a strong condition-specific pattern is expected, one can build a condition-specific model RDM using one-hot coding of conditions (see the example figure at https://www.newbi4fmri.com/tutorial-9-mvpa-rsa), either to capture this phenomenon or to remove it as a confounding factor. This practice has been applied in multiple regression cRSA approaches (e.g., Cichy et al., 2013) and can also be applied to tRSA.
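
      A condition model RDM built from one-hot coding can be sketched in a few lines. The trial labels below are hypothetical, and the 0/1 dissimilarity coding is one common convention, not necessarily the exact construction in the linked tutorial. Such a matrix could enter a multiple regression RSA either as a predictor of interest or as a nuisance regressor.

```python
import numpy as np


def condition_model_rdm(labels):
    """Condition model RDM from one-hot coding of trial labels.

    Dissimilarity is 0 for trial pairs from the same condition and 1 for
    pairs from different conditions.
    """
    labels = np.asarray(labels)
    levels = np.unique(labels)
    onehot = (labels[:, None] == levels[None, :]).astype(float)  # trials x conditions
    same = onehot @ onehot.T   # 1 where two trials share a condition, else 0
    return 1.0 - same


rdm = condition_model_rdm(["A", "A", "B", "C"])  # hypothetical condition labels
print(rdm)
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]
#  [1. 1. 0. 1.]
#  [1. 1. 1. 0.]]
```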

      (7) The trial-level Spearman correlations between the brain RDM and the model RDM were analyzed using a mixed effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, whereas independence is an assumption of the mixed effects model. This does not seem to induce an increase in Type I errors in the conditions studied, but the procedure currently lacks a clear justification, which needs to be provided.

      We appreciate this important warning and now caution readers to investigate the robustness of their models and to consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix.

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models. The multilevel structure of RSA data introduces potential dependencies across subjects, stimuli, and trials, which can violate assumptions of independence if not properly modeled. In the present study, we used a model that included random intercepts for both subjects and stimuli, which accounts for variance at these levels and improves the generalizability of fixed-effect estimates. Still, there is a potential for systematic dependence across trials within a subject. To ensure that the model assumptions were satisfied, we conducted a series of diagnostic checks on an exemplar ROI (right LOC; middle occipital gyrus) in the Object Perception dataset, including visual inspection of residual distributions and autocorrelation (Appendix 3, Figure 13). These diagnostics supported the assumptions of normality, homoscedasticity, and conditional independence of residuals. In addition, we conducted permutation-based inference, similar to prior improvements to cRSA (Nili et al., 2014), using a nested model comparison to test whether the mean similarity in this ROI was significantly greater than zero. The observed likelihood ratio test statistic fell in the extreme tail of the null distribution (Appendix 3, Figure 14), providing strong nonparametric evidence for the reliability of the observed effect. We emphasize that this type of model checking and permutation testing is not merely confirmatory but can help validate key assumptions in RSA modeling, especially when applying mixed-effects models to neural similarity data. Researchers are encouraged to adopt similar procedures to ensure the robustness and interpretability of their findings”.

      Exemplar Permutation Testing

      To test whether the mean representational strength in the ROI right LOC (middle occipital gyrus) was significantly greater than zero, we used a permutation-based likelihood ratio test implemented via the permlmer function. This test compares two nested linear mixed-effects models fit using the lmer function from the lme4 package, both including random intercepts for Participant and Stimulus ID to account for between-subject and between-item variability.

      The null model excluded a fixed intercept term, effectively constraining the mean similarity to zero after accounting for random effects:

      ROI ~ 0 + (1 | Participant) + (1 | Stimulus)

      The full model included the same random effects structure but allowed the intercept to be freely estimated:

      ROI ~ 1 + (1 | Participant) + (1 | Stimulus)

      By comparing the fit of these two models, we directly tested whether the average similarity in this ROI was significantly different from zero. Permutation testing (1,000 permutations) was used to generate a nonparametric p-value, providing inference without relying on normality assumptions. The full model, which estimated a nonzero mean similarity in the right LOC (middle occipital gyrus), showed a significantly better fit to the data than the null model that fixed the mean at zero (χ²(1) = 17.60, p = 2.72 × 10⁻⁵). The permutation-based p-value obtained from permlmer confirmed this effect as statistically significant (p = 0.0099), indicating that the mean similarity in this ROI was reliably greater than zero. These results support the conclusion that the right LOC contains representational structure consistent with the HMAXc2 RSM. A density plot of the permuted likelihood ratio statistics is shown, along with the observed likelihood ratio statistic, in Appendix 3, Figure 14.
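
      The logic of permutation-based inference on a mean similarity can be illustrated without the mixed model. The sketch below is a swapped-in, simpler technique, not `permlmer`: a one-sided sign-flip permutation test on hypothetical per-subject mean similarities, which ignores the stimulus random effect included in the models above. Under the null hypothesis of zero mean (and a distribution symmetric around zero), each subject's mean is equally likely to be positive or negative, so random sign flips generate the null distribution.

```python
import numpy as np


def sign_flip_test(subject_means, n_perm=1000, seed=0):
    """One-sided sign-flip permutation test of H0: mean similarity = 0.

    Returns the observed mean and a permutation p-value that counts the
    observed statistic as one member of the null distribution.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(subject_means, dtype=float)
    observed = x.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, x.size))  # random sign flips
    null = (signs * x).mean(axis=1)                         # null distribution of means
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return float(observed), float(p)


rng = np.random.default_rng(1)
subject_means = rng.normal(0.05, 0.03, size=24)  # hypothetical per-subject means
observed, p = sign_flip_test(subject_means)
print(round(observed, 3), round(p, 4))
```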

      (8) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.

      We agree that a comparison between mean row-wise Spearman correlations and the matrix-wise Spearman correlation is needed. We believe that simulations are the best approach for this comparison, since they are much more robust than the empirical dataset and have the advantage that the true pattern and noise levels are known. We expand on our comparison of mean tRSA values and matrix-wise Spearman correlations on page 42.

      Page 42. “Although tRSA and cRSA both aim to quantify representational strength, they differ in how they operationalize this concept. cRSA summarizes the correspondence between RSMs as a single measure, such as the matrix-wise Spearman correlation. In contrast, tRSA computes such correspondence for each trial, enabling estimates at the level of individual observations. This flexibility allows trial-level variability to be modeled directly, but also introduces subtle differences in what is being measured. Nonetheless, our simulations showed that, although numerical differences occasionally emerged—particularly when comparing between-condition tRSA estimates to within-condition cRSA estimates—the magnitude of divergence was small and did not affect the outcome of downstream statistical tests”.
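
      The difference between the two summaries discussed here can be made concrete. This numpy sketch assumes similarity matrices without ties (so ranks are unique), excludes the diagonal, and uses hypothetical helper names and data; it is an illustration, not the authors' implementation. `crsa` computes one Spearman correlation over the upper triangles, whereas `trsa` computes one per row.

```python
import numpy as np


def rank(x):
    """Ranks 0..n-1 of a 1-D array (assumes no ties)."""
    x = np.asarray(x)
    r = np.empty(x.size)
    r[np.argsort(x)] = np.arange(x.size)
    return r


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


def crsa(brain, model):
    """Matrix-wise estimate: one Spearman correlation over the upper triangles."""
    iu = np.triu_indices(brain.shape[0], k=1)
    return spearman(brain[iu], model[iu])


def trsa(brain, model):
    """Trial-wise estimates: one Spearman correlation per row, diagonal excluded."""
    n = brain.shape[0]
    off = ~np.eye(n, dtype=bool)
    return np.array([spearman(brain[i, off[i]], model[i, off[i]]) for i in range(n)])


rng = np.random.default_rng(2)
true = rng.standard_normal((30, 100))             # hypothetical true patterns
measured = true + rng.standard_normal((30, 100))  # equal noise for all trials
model, brain = np.corrcoef(true), np.corrcoef(measured)

per_trial = trsa(brain, model)
print(round(crsa(brain, model), 3))                         # single matrix-wise value
print(per_trial.shape, round(float(per_trial.mean()), 3))  # 30 values and their mean
```

      Because each row is re-ranked separately, the row-wise estimates are unaffected by row-specific monotone shifts in similarity that would change the matrix-wise estimate, and each off-diagonal cell of a symmetric matrix enters two rows; both properties bear on the comparability and dependence questions raised above.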

      (9) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.

      We recognize the potential for condition-specific patterns and chose to constrain the analyses to those most comparable with cRSA. However, depending on their hypotheses, researchers may consider testing condition RSMs and using a model comparison approach, or employing the z-scored approach used in the simulations above. Regarding potential run confounds, these are always a concern in RSA, which is why we exclude within-run comparisons. We have also added to the Discussion the suggestion that researchers include run as a covariate in their mixed-effects models. However, we do not include this covariate here, as we preferred the most parsimonious model for comparison with cRSA.

      Pages 46-47. “Further, while analyses here were largely employed to be comparable with cRSA, researchers should consider taking advantage of the flexibility of the mixed-effects models and include covariates of non-interest (run, trial order, etc.)”.
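
      Excluding within-run comparisons can be implemented as a boolean mask over trial pairs. The run labels and the helper name below are hypothetical. In a trial-wise analysis, row i's correlation would then be computed only over the cells where `mask[i]` is True, and the diagonal is excluded automatically.

```python
import numpy as np


def between_run_mask(runs):
    """Boolean matrix that is True where two trials come from different runs."""
    runs = np.asarray(runs)
    return runs[:, None] != runs[None, :]


runs = [1, 1, 2, 2, 3]           # hypothetical run label for each trial
mask = between_run_mask(runs)
print(mask.sum(axis=1))          # usable comparisons per trial: [3 3 3 3 4]
```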

      (10) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli.

      We agree that studies advancing designs with multiple repetitions of a given stimulus image are useful in estimating the reliability of concept representations. We would argue, however, that model comparison in RSA is not restricted to such data. Many extant studies do not in fact have the multiple repetitions per stimulus per subject that would allow for that type of model-comparative approach (Wang et al., 2018 https://doi.org/10.1088/1741-2552/abecc3, Gao et al., 2022 https://doi.org/10.1093/cercor/bhac058, Li et al., 2022 https://doi.org/10.1002/hbm.26195, Staples & Graves, 2020 https://doi.org/10.1162/nol_a_00018). While beneficial for noise estimation, multiple presentations are not a requirement for implementing cRSA (Kriegeskorte, 2008 https://doi.org/10.3389/neuro.06.004.2008). The aim of this manuscript is to introduce the tRSA approach to the broad community of researchers whose research questions and datasets can vary widely, including but not limited to the number of repeated presentations and the balance of trial counts across conditions.

      (11) Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here.

      We have added language on the value of cross-validation approaches to RSA in the Discussion:

      Page 47. “Additionally, while our proposed tRSA framework provides a flexible and statistically principled approach for modeling trial-level representational strength, we acknowledge that there are alternative methods for addressing trial-level variability in RSA. In particular, the use of cross-validated distance metrics (e.g., crossnobis distance) has become increasingly popular for controlling differences in measurement noise variance and accounting for possible covariance structures across trials (Walther et al., 2016). These metrics offer several advantages, including unbiased estimation of representational dissimilarities under Gaussian noise assumptions and improved generalization to unseen data. However, cross-validated distances are conceptually distinct from the approach taken here: whereas cross-validation aims to correct for noise-related biases in representational dissimilarity matrices, our trial-level RSA method focuses on estimating and modeling the variability in representational strength across individual trials using mixed-effects modeling. Rather than proposing a replacement for cross-validated RSA, tRSA adds a complementary tool to the methodological toolkit, one that supports hypothesis-driven inference about condition effects and trial-level covariates while leveraging the full structure of the data”.
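
      To illustrate what cross-validation buys, here is a sketch of a cross-validated squared Euclidean distance between two stimuli measured in two independent data partitions. It omits the noise prewhitening that would make it a crossnobis distance, and the data and function name are hypothetical. When two stimuli share the same true pattern, the naive distance is inflated by noise, while the cross-validated estimate is centered on zero (and can be negative).

```python
import numpy as np


def cv_euclidean(a1, b1, a2, b2):
    """Cross-validated squared Euclidean distance per voxel.

    a1, b1: patterns of stimuli a and b from partition 1; a2, b2: the same
    stimuli from an independent partition 2. With noise independent across
    partitions, the expected value equals the true squared distance per voxel.
    """
    d1 = np.asarray(a1, dtype=float) - np.asarray(b1, dtype=float)
    d2 = np.asarray(a2, dtype=float) - np.asarray(b2, dtype=float)
    return float(d1 @ d2 / d1.size)


rng = np.random.default_rng(3)
P = 5000                              # hypothetical number of voxels
u = rng.standard_normal(P)            # stimuli a and b share this true pattern
a1, b1, a2, b2 = (u + rng.standard_normal(P) for _ in range(4))
print(round(cv_euclidean(a1, b1, a2, b2), 3))    # near the true value, 0
print(round(float(np.mean((a1 - b1) ** 2)), 3))  # naive estimate, near 2
```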

      (12) One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.

      Indeed, the chosen model RSM may not be the true RSM, but as the noise level increases the correlation between RSMs practically becomes zero. In our simulations we assume this to be true as a straightforward way to manipulate the correspondence between the brain data and the model. However, just like cRSA, tRSA is constrained by the model selections the researchers employ. We encourage researchers to have carefully considered theoretically-motivated models and, if their research questions require, consider multiple and potentially competing models. Furthermore, the trial-wise estimates produced by tRSA encourage testing competing models within the multiple regression framework. We have added this language to the Discussion.

      Page 46. “…choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should consider if their models and model alternatives…”.

      Pages 45-46. “While a number of studies have addressed the validity of measuring representational geometry using designs with multiple repetitions, a conceptual benefit of the tRSA approach is the reliance on a regression framework that engenders the testing of competing conceptual models of stimulus representation (e.g., taxonomic vs. encyclopedic semantic features, as in Davis et al., 2021)”.
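
      Testing competing models for a single trial in a regression framework might look like the following. This is a hypothetical sketch: the two model rows stand in for, e.g., taxonomic and encyclopedic RSM rows, the rank-and-z-score preprocessing is an assumed choice that mirrors the rank-based approach, and the function name is illustrative. The per-trial weights could then be modeled across trials, for example in a mixed-effects model.

```python
import numpy as np


def trial_model_weights(brain_row, model_rows):
    """Regress one trial's brain similarities on several model RSM rows.

    All vectors are rank-transformed and z-scored first, so no intercept is
    needed; returns one standardized weight per model.
    """
    def z_rank(v):
        r = np.argsort(np.argsort(v)).astype(float)  # ranks 0..n-1
        return (r - r.mean()) / r.std()

    y = z_rank(brain_row)
    X = np.column_stack([z_rank(m) for m in model_rows])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(4)
n = 200                                   # hypothetical number of other trials
taxonomic = rng.standard_normal(n)        # hypothetical model RSM row 1
encyclopedic = rng.standard_normal(n)     # hypothetical model RSM row 2
brain_row = taxonomic + 0.1 * rng.standard_normal(n)
beta = trial_model_weights(brain_row, [taxonomic, encyclopedic])
print(beta)  # large weight for the taxonomic row, near zero for the other
```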

      Reviewer #2 (Public review):

      (1) While I generally welcome the contribution, I take some issue with the accusatory tone of the manuscript in the Introduction. The text there (using words such as 'ignored variances', 'erroneous inferences', 'one must', 'not well-suited', 'misleading') appears aimed at turning cRSA into a 'straw man' with many limitations that other researchers have not recognized but that the new proposed method supposedly resolves. This can be written in a more nuanced, constructive manner without accusing the numerous users of this popular method of ignorance.

      We apologize for the unintended accusatory tone. We have clarified the many robust approaches to RSA and have made our Introduction and Discussion more nuanced throughout (see also 3, 11, and 16).

      (2) The described limitations are also not entirely correct, in my view: for example, statistical inference in cRSA is not always done using classic parametric statistics such as t-tests (cf. Figure 1): the rsatoolbox paper by Nili et al. (2014) outlines non-parametric alternatives based on permutation tests, bootstrapping and sign tests, which are commonly used in the field. Nor is it the case that RSA has never been conducted at the row/column level (here referred to by the authors as 'trial level'; cf. King et al., 2018).

      We agree there are numerous methods that go beyond cRSA in addressing these limitations, and we have added a discussion of them to our manuscript, as well as an example analysis implementing permutation tests on tRSA data (see response to 7). We thank the reviewer for bringing King et al., 2014 and their temporal generalization method to our attention; we have added a reference acknowledging their decoding-based temporal generalization approach.

      Page 8. “It is also important to note that some prior work has examined similarly fine-grained representations in time-resolved neuroimaging data, such as the temporal generalization method introduced by King et al. (see King & Dehaene, 2014). Their approach trains classifiers at each time point and tests them across all others, resulting in a temporal generalization matrix that reflects decoding accuracy over time. While such matrices share some structural similarity with RSMs, they do not involve correlating trial-level pattern vectors with model RSMs nor do their second-level models include trial-wise, subject-wise, and item-wise variability simultaneously”.

      (3) One of the advantages of cRSA is its simplicity. Adding linear mixed effects modeling to RSA introduces a host of additional 'analysis parameters' pertaining to the choice of the model setup (random effects, fixed effects, interactions, what error terms to use) - how should future users of tRSA navigate this?

      We appreciate the opportunity to offer more specific prescriptions for those employing a tRSA technique, and have added them to the Discussion:

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models and choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should always consider whether their models match the goals of their analysis, including 1) constructing a random effects structure that will converge in their dataset, 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020), and 3) considering which effects should be treated as random or fixed depending on their research question”.

      (4) Here, only a single real fMRI dataset is used with a quite complicated experimental design for the memory part; it's not clear if there is any benefit of using tRSA on a simpler real dataset. What's the benefit of tRSA in classic RSA datasets (e.g., Kriegeskorte et al., 2008), with fixed stimulus conditions and no behavior?

      To clarify, our empirical approach uses two different tasks: an Object Perception task, more akin to the classic RSA datasets employing passive viewing, and a Conceptual Retrieval task that more directly addresses the benefits of the trial-wise approach. Our Object Perception dataset is a simpler empirical fMRI dataset without explicit task conditions or a dichotomous behavioral outcome, whereas the Retrieval dataset is more involved (though old/new recognition is the most common form of memory retrieval testing) and dependent on behavioral outcomes. However, we recognize the value of replication by other research groups and invite researchers to apply tRSA to their own datasets.

      (5) The cells of an RDM/RSM reflect pairwise comparisons between response patterns (typically a brain but can be any system; cf Sucholutsky et al., 2023). Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. Does this raise issues with the validity of the linear mixed effects model? Does it assume the observations are linearly independent?

      We recognize the potential danger of not meeting model assumptions. Though our simulation results and model checks suggest this is not a fatal flaw in the model design, we caution readers to investigate the robustness of their models and to consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. See response to R1 (7).

      (6) The manuscript assumes the reader is familiar with technical statistical terms such as Type I/II error, sensitivity, specificity, homoscedasticity assumptions, as well as linear mixed models (fixed effects, random effects, etc). I am concerned that this jargon makes the paper difficult to understand for a broad readership or even researchers currently using cRSA that might be interested in trying tRSA.

      We agree this jargon may make the paper difficult to understand. We have added or expanded definitions of these terms throughout the Methods and Results sections.

      Page 12. “Given data generated with s<sub>cond,A</sub> = s<sub>cond,B</sub>, the correct inference should be a failure to reject the null hypothesis of b<sub>B-A</sub> = 0; any significant (p < 0.05) result in either direction was considered a false positive (spurious effect, or Type I error). Given data generated with s<sub>cond,A</sub> < s<sub>cond,B</sub>, the inference was considered correct if it rejected the null hypothesis of b<sub>B-A</sub> = 0 and yielded the expected sign of the estimated contrast (b<sub>B-A</sub> < 0). A significant result with the reverse sign of the estimated contrast (b<sub>B-A</sub> > 0) was considered a Type I error, and a nonsignificant (p ≥ 0.05) result was considered a false negative (failure to detect a true effect, or Type II error)”.
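
      The outcome rules quoted above can be written as a small function for tallying simulation runs. This is an illustrative sketch: the function name and arguments are hypothetical, and `alpha = 0.05` follows the threshold in the text. Tallying these labels across simulated datasets gives the error rates from which specificity and sensitivity are computed.

```python
def classify_outcome(true_effect, p_value, estimate, alpha=0.05):
    """Label one simulated test of the contrast b_{B-A}.

    true_effect: False for data generated with equal noise in both conditions;
    True for data generated so that b_{B-A} < 0 is the expected sign.
    estimate: the estimated contrast b_{B-A}.
    """
    significant = p_value < alpha
    if not true_effect:
        # Any significant result, in either direction, is a false positive
        return "Type I error" if significant else "correct"
    if not significant:
        return "Type II error"  # failure to detect a true effect
    # Significant with the reverse sign is counted as a Type I error
    return "correct" if estimate < 0 else "Type I error"


print(classify_outcome(True, 0.01, -0.2))  # correct
```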

      Page 2. “Compared to cRSA, the multi-level framework of tRSA was both more theoretically appropriate and significantly more sensitive to true effects (i.e., better able to detect them)”.

      Page 25. “The performance of cRSA and tRSA was quantified by their specificity (the ability to avoid false positives, 1 - Type I error rate) and sensitivity (the ability to avoid false negatives, 1 - Type II error rate)”.
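      The outcome rules and the two performance measures quoted above can be sketched as follows. This is a hypothetical illustration, not the authors' simulation code; it assumes α = 0.05, that "true effect" runs are those generated so that condition B should show weaker representation than A, and that the Type I rate for specificity is computed from the null-generated runs only (one reading of the text).

```python
from collections import Counter

ALPHA = 0.05  # significance threshold used in the quoted text

def classify(true_effect, b_ba, p):
    """Label one simulated inference as 'correct', 'type1', or 'type2'.

    true_effect: False if data were generated with no condition difference,
        True if generated so that the contrast b_{B-A} should be negative.
    b_ba: estimated contrast b_{B-A}; p: its p-value.
    """
    significant = p < ALPHA
    if not true_effect:
        return "type1" if significant else "correct"
    if not significant:
        return "type2"  # failed to detect a true effect
    return "correct" if b_ba < 0 else "type1"  # significant, wrong sign

def specificity_sensitivity(null_runs, effect_runs):
    """Each argument is a list of (b_ba, p) pairs from simulated datasets."""
    null_labels = Counter(classify(False, b, p) for b, p in null_runs)
    effect_labels = Counter(classify(True, b, p) for b, p in effect_runs)
    specificity = 1 - null_labels["type1"] / len(null_runs)
    sensitivity = 1 - effect_labels["type2"] / len(effect_runs)
    return specificity, sensitivity
```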

      Page 6. “One of the fundamental assumptions of general linear models (step 4 of cRSA; see Figure 1D) is homoscedasticity, or homogeneity of variance: all residuals should have equal variance”.

      Page 11. “Specifically, a linear mixed-effects model with a fixed effect of condition (which estimates the average effect across the entire sample, capturing the overall effect of interest) and random effects of both subjects and stimuli (which model variation in responses due to differences between individual subjects and items, allowing generalization beyond the sample) was fitted to tRSA estimates via the `lme4 1.1-35.3` package in R (Bates et al., 2015), and p-values were estimated using Satterthwaite’s method via the `lmerTest 3.1-3` package (Kuznetsova et al., 2017)”.

      (7) I could not find any statement on data availability or code availability. Given that the manuscript reuses prior data and proposes a new method, making data and code/tutorials openly available would greatly enhance the potential impact and utility for the community.

      We thank the reviewer for raising our oversight here. We have added our code and data availability statements.

      Page 9. “Data are available upon request to the corresponding author, and our simulation and example tRSA code are available at https://github.com/electricdinolab”.

      Reviewer #1 (Recommendations for the authors):

      (13) Page 4: The limitations of cRSA seem to be based on the assumption that within each different experimental condition, there are different stimuli, which get combined into the condition. The framework of RSA, however, does not dictate whether you calculate a condition x condition RDM or a larger and more complete stimulus x stimulus RDM. Indeed, in practice we often do the latter? Or are you assuming that each stimulus is only shown once overall? It would be useful at this point to spell out these implicit assumptions.

      We agree that stimulus x stimulus RDMs can be constructed and are often used. However, as we mentioned in the Introduction, researchers are often interested in the difference between two (or more) conditions, such as “remembered” vs. “forgotten” (Davis et al., https://doi.org/10.1093/cercor/bhaa269) or “high cognitive load” vs. “low cognitive load” (Beynel et al., https://doi.org/10.1523/JNEUROSCI.0531-20.2020). In those cases, the most common practice with cRSA is to construct condition-specific RDMs, compute cRSA scores separately for each condition, and then compare the scores at the group level. The number of times each stimulus gets presented does not prevent one from creating a model RDM that has the same rows and columns as the brain RDM, either in the same condition (“high load”) or across different conditions.

      (14) Page 5: The difference between condition-level and stimulus-level is not clear. Indeed, this definition seems to be a function of the exact experimental design and is certainly up for interpretation. For example, if I conduct a study looking at the activity patterns for 4 different hand actions, each repeated multiple times, are these actions considered stimuli or conditions?

      We have added clarifying language about what is considered stimuli vs conditions. Indeed, this will depend on the specific research questions being asked and will affect how researchers construct their models. In this specific example, one would most likely consider each different hand action a condition, treating them as fixed effects rather than random effects, given their very limited number and the lack of need to generalize findings to the broader “hand actions” category.

      Page 5. “Critically, the distinction between condition-level and stimulus-level is not always clear, as researchers may manipulate stimulus-level features themselves. In these cases, what researchers ultimately consider condition-level and stimulus-level will depend on their specific research questions. For example, researchers intending to study generalized object representation may consider object category a stimulus-level feature, while researchers interested in whether and how object representation varies by category may consider the same category variable condition-level”.

      (15) Page 5: The fact that different numbers of trials / different levels of measurement noise / noise-covariance of different conditions biases non-cross-validated distances is well known and repeatedly expressed in the literature. We have shown that cross-validation of distances effectively removes such biases - of course, it does not remove the increased estimation variability of these distances (for a formal analysis of estimation noise on condition patterns and variance of the cross-nobis estimator, see (Diedrichsen et al. 2021)).

      We thank the reviewer for drawing our attention to this literature and have added discussions of these methods.
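      To illustrate the reviewer's point with a toy example (not taken from the manuscript): when two conditions share the same true pattern, the ordinary squared Euclidean distance between noisy estimates is positive on average (about 2σ² per voxel), whereas a cross-validated distance, the inner product of the A-B differences estimated in two independent partitions, averages to about zero. The i.i.d. Gaussian noise model and parameter values are assumptions, and the prewhitening step of the cross-nobis estimator is omitted.

```python
import random

def sq_distance(a, b):
    """Ordinary (non-cross-validated) squared Euclidean distance per voxel."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cv_distance(a1, b1, a2, b2):
    """Cross-validated squared distance: inner product of the A-B difference
    vectors estimated in two independent partitions (e.g., separate runs)."""
    pairs = zip(a1, b1, a2, b2)
    return sum((x1 - y1) * (x2 - y2) for x1, y1, x2, y2 in pairs) / len(a1)

def simulate(n_voxels=50, noise_sd=1.0, n_sims=2000, seed=0):
    """Average both distances when conditions A and B have identical true patterns."""
    rng = random.Random(seed)
    true_pattern = [rng.gauss(0, 1) for _ in range(n_voxels)]

    def noisy():
        return [t + rng.gauss(0, noise_sd) for t in true_pattern]

    plain_total, cv_total = 0.0, 0.0
    for _ in range(n_sims):
        a1, b1, a2, b2 = noisy(), noisy(), noisy(), noisy()
        plain_total += sq_distance(a1, b1)
        cv_total += cv_distance(a1, b1, a2, b2)
    return plain_total / n_sims, cv_total / n_sims
```

      As the reviewer notes, cross-validation removes the bias but not the extra estimation variability: individual `cv_distance` values still scatter around zero and can be negative.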

      (16) Page 5: "Most studies present subjects with a fixed set of stimuli, which are supposedly samples representative of some broader category". This may be the case for a certain type of RSA experiments in the visual domain, but it would be unfair to say that this is a feature of RSA studies in general. In most studies I have been involved in, we use a "stimulus" x "stimulus" RDM.

      We have edited this sentence to avoid the “most” characterization. We also added substantial text to the Introduction and Discussion distinguishing cRSA, which remains widely employed, especially in designs with a single presentation per stimulus (Macklin et al., 2023; Liu et al., 2024), from the model-comparative method, and explicitly stating that we do not consider tRSA an alternative to the model-comparative approach.

      (17) Page 5: I agree that "stimuli" should ideally be considered a random effect if "stimuli" can be thought of as sampled from a larger population and one wants to make inferences about that larger population. Sometimes stimuli/conditions are more appropriately considered a fixed effect (for example, when studying the response to stimulation of the 5 fingers of the right hand). Techniques to consider stimuli/conditions as a random effect have been published by the group of Niko Kriegeskorte (Schütt et al. 2023).

      Indeed, in some cases what may be thought of as “stimuli” would be more appropriately entered into the model as a fixed effect; such questions are increasingly relevant given the focus on item-wise stimulus properties (Bainbridge et al., Westfall & Yarkoni). We have added text on this issue to the Discussion and caution researchers to employ models that most directly answer their research questions.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing a random effects structure that will converge in their dataset, 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020), and 3) considering which effects should be treated as random or fixed depending on their research question. An effect is fixed when the levels represent the specific conditions of theoretical interest (e.g., task condition) and the goal is to estimate and interpret those differences directly. In contrast, an effect is random when the levels are sampled from a broader population (e.g., subjects) and the goal is to account for their variability while generalizing beyond the sample tested. Note that the same variable (e.g., stimuli) may be considered fixed or random depending on the research questions”.

      (18) Page 6: It is correct that the "classical" RSA depends on a categorical assignment of different trials to different stimuli/conditions, such that a stimulus x stimulus RDM can be computed. However, both Pattern Component Modelling (PCM) and Encoding models are ideally set up to deal with variables that vary continuously on a trial-by-trial or moment-by-moment basis. tRSA should be compared to these approaches, or - as it should be clarified - that the problem setting is actually quite a different one.

      We agree that PCM and encoding models offer a flexible approach and handle continuous trial-by-trial variables. We have clarified on page 6 that the problem setting of cRSA is distinct, and we have added a discussion of the strengths and limitations of encoding models to the Discussion.

      Page 6. “While other approaches such as Pattern Component Modeling (PCM) (Diedrichsen et al., 2018) and encoding models (Naselaris et al., 2011) are well-suited to analyzing variables that vary continuously on a trial-by-trial or moment-by-moment basis, these frameworks address different inferential goals. Specifically, PCM and encoding models focus on estimating variance components or predicting activation from features, while cRSA is designed to evaluate representational geometry. Thus, cRSA as well as our proposed approach address a problem setting distinct from PCM and encoding models”.

      (19) Page 8: "Then, we generated two noise patterns, which were controlled by parameters 𝜎<sub>𝐴</sub> and 𝜎<sub>𝐵</sub>, respectively, one for each condition." This makes little sense to me. The noise patterns should be unique to each trial - you should generate n_a + n_b noise patterns, no?

      We clarify that the “noise patterns” here are n_voxel x n_trial in size; in other words, all trial-level noise patterns are generated together and each trial has its own unique noise pattern. We have revised our description to “two sets of noise patterns” for clarity, starting on page 9.
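      A minimal sketch of what “two sets of noise patterns” means, using plain Python lists: each set is an n_voxel x n_trial array in which every trial (column) receives its own independent draw, with the spread set by that condition's noise parameter. The names `s_a` and `s_b`, the Gaussian noise model, and all sizes are assumptions for illustration; how subject- and stimulus-level components enter the composite standard deviation is not reproduced here.

```python
import random

def noise_set(n_voxels, n_trials, sd, rng):
    """Return an n_voxels x n_trials noise array; each column is a unique trial draw."""
    return [[rng.gauss(0.0, sd) for _ in range(n_trials)] for _ in range(n_voxels)]

rng = random.Random(42)
s_a, s_b = 1.0, 2.0                # hypothetical condition-level noise SDs
n_voxels, n_a, n_b = 100, 30, 30   # hypothetical sizes
noise_a = noise_set(n_voxels, n_a, s_a, rng)  # one set for the condition A trials
noise_b = noise_set(n_voxels, n_b, s_b, rng)  # one set for the condition B trials
```

      Generating each set in one call makes it explicit that the n_a + n_b trial-level patterns the reviewer asks about are all present: they are simply the columns of the two arrays.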

      (20) Page 9: First, I assume if this is supposed to be a hierarchical level model, the "noise parameters" here correspond to variances? Or do these \sigma values mean to signify standard deviations? The latter would make little sense. Or is it the noise pattern itself?

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (21) Page 10: your formula states "𝜎<sub>𝑠𝑢𝑏𝑗</sub> ~ 𝑁(0, 0.5<sup>2</sup>)". This conflicts with your previous mention that the \sigmas are noise "levels"; are they the noise patterns themselves now? Variances cannot be normally distributed, as they cannot be negative.

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (22) Page 13: What was the task of the subject in the Memory retrieval task? Old/new judgements relative to encoding of object perception?

      We apologize for the lack of clarity about the Memory Retrieval task and have added that information and clarified that the old/new judgements were relative to a separate encoding phase, the brain data for which has been reported elsewhere.

      Page 14. “Memory Retrieval took place one day after Memory Encoding and involved testing participants’ memory of the objects seen in the Encoding phase. Neural data from the Encoding phase have been reported elsewhere. In the main Memory Retrieval task, participants were presented with 144 labels of real-world objects, of which 114 were labels for previously seen objects and 30 were unrelated novel distractors. Participants made old/new judgements and rated their confidence in those judgements on a four-point scale (1 = Definitely New, 2 = Probably New, 3 = Probably Old, 4 = Definitely Old)”.

      (23) Page 13: If "Memory Retrieval consisted of three scanning runs", then some of the stimulus x stimulus correlations for the RSM must have been calculated within a run and some between runs, correct? Given that all within-run estimates share a common baseline, they share some dependence. Was there a systematic difference between the within-run and the between-run correlations?

      We have clarified in this portion of the methods that within run comparisons were excluded from our analyses. We also double-checked that the within-run exclusion was included in the description of the Neural RSMs.

      Page 14. “Retrieval consisted of three scanning runs, each with 38 trials, lasting approximately 9 minutes and 12 seconds (within-run comparisons were later excluded from RSA analyses)”.

      Page 18. “This was done by vectorizing the voxel-level activation values within each region and calculating their correlations using Pearson’s r, excluding all within-run comparisons.”
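      The within-run exclusion described above can be sketched as follows (a hypothetical helper, not the authors' pipeline): a trial-by-trial RSM of Pearson correlations in which every cell comparing two trials from the same run, including the diagonal, is left empty so that it cannot enter later RSA steps.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def neural_rsm(patterns, runs):
    """Trial-by-trial RSM of Pearson r between voxel patterns.

    patterns: one list of voxel values per trial; runs: run label per trial.
    Same-run cells (including the diagonal) are set to None and thereby excluded.
    """
    n = len(patterns)
    rsm = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if runs[i] != runs[j]:
                rsm[i][j] = pearson(patterns[i], patterns[j])
    return rsm
```

      Using `None` rather than zero keeps excluded cells distinguishable from genuine zero correlations in any downstream step.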

      (24) Page 20: It is not clear why the mean estimate of "representational strength" (i.e., model-brain RSM correlations) is important at all. This comes back to Major point #2, namely that you are trying to solve a very different problem from model-comparative RSA.

      We have clarified that our approach is not an alternative to model-comparative RSA, and that depending on the task constraints researchers may choose to compare models with tRSA or other approaches requiring stimulus repetition (see 3).

      (25) Page 21: I believe the problems of simulating correlation matrices directly in the way that the authors in their first simulation did should be well known and should be moved to an appendix at best. Better yet, the authors could start with the correct simulation right away.

      We agree the paper is more concise with these simulations moved to the appendix and discussed more briefly, and we have implemented these changes (Appendix 1). However, we are not certain that this problem is widely known: we have several anecdotes of researchers inquiring about this “alternative” approach in conversations with colleagues. Thus, we still discuss the issues with this method.

      (26) Page 26: Is the "underlying continuous noise variable 𝜎<sub>𝑡𝑟𝑖𝑎𝑙</sub> that was measured by 𝑣<sub>𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑</sub>" the variance of the noise pattern or the noise pattern itself? What does it mean it was "measured" - how?

      𝜎<sub>𝑡𝑟𝑖𝑎𝑙</sub> is a vector of standard deviations for different trials, and its i-th element is used to generate the noise pattern for trial i. 𝑣<sub>𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑</sub> is a hypothetical measurement of trial-level variability, such as “memorability” or “heartbeat variability”. We have revised our description to clarify our methods.

      Reviewer #2 (Recommendations for the authors):

      (8) It would be helpful to provide more clarity earlier on in the manuscript on what is a 'trial': in my experience, a row or column of the RDM is usually referred to as 'stimulus condition', which is typically estimated on multiple trials (instances or repeats) of that stimulus condition (or exemplars from that stimulus class) being presented to the subject. Here, a 'trial' is both one measurement (i.e., single, individual presentation of a stimulus) and also an entry in the RDM, but is this the most typical scenario for cRSA? There is a section in the Discussion that discusses repetitions, but I would welcome more clarity on this from the get-go.

      We have added discussion of stimulus repetition methods and datasets to the Introduction and clarified our use of the terms.

      Page 8. “Critically, in single-presentation designs, a “trial” refers to one stimulus presentation, and corresponds to a row or column in the RSM. In studies with repeated stimuli, these rows are often called “conditions” and may reflect aggregated patterns across trials. tRSA is compatible with both cases: whether rows represent individual trials or averaged trials that create “conditions”, tRSA estimates are computed at the row level”.
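      As a hedged sketch of what computing estimates “at the row level” can look like (an assumed formulation, not the authors' released code): for each trial i, the off-diagonal cells of row i in the neural RSM are correlated with the same cells of row i in the model RSM, skipping cells already excluded (e.g., within-run comparisons). Pearson correlation is used here for simplicity; the choice of correlation measure is an assumption. The resulting one-estimate-per-trial vector is what would be submitted to a mixed-effects model with subject and stimulus random effects.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def row_estimates(neural, model):
    """Return one representational-strength estimate per RSM row (trial).

    Row i is compared over columns j != i whose neural cell is not None.
    Each row needs at least two usable columns with non-zero variance.
    """
    n = len(neural)
    estimates = []
    for i in range(n):
        cols = [j for j in range(n) if j != i and neural[i][j] is not None]
        x = [neural[i][j] for j in cols]
        y = [model[i][j] for j in cols]
        estimates.append(pearson(x, y))
    return estimates

# Toy 4-trial model RSM with hypothetical values.
model = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
]
```

      In contrast to a single whole-matrix correlation, each trial contributes its own estimate, which is what allows trial-, subject-, and stimulus-level variance to be modeled separately.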

      (9) The quality of the results figures can be improved. For example, axes labels are hard to read in Figure 3A/B, panels 3C/D are hard to read in general. In Figure 7E, it's not possible to identify the 'dark red' brain regions in addition to the light red ones.

      We thank the reviewer for raising these and have edited the figures to be more readable in the manner suggested.

      (10) I would be interested to see a comparison between tRSA and cRSA in other fMRI (or other modality) datasets that have been extensively reported in the literature. These could be the original Kriegeskorte 96 stimulus monkey/fMRI datasets, commonly used open datasets in visual perception (e.g., THINGS, NSD), or the above-mentioned King et al. dataset, which has been analyzed in various papers.

      We recognize the great utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (11) On P39, the authors suggest 'researchers can confidently replace their existing cRSA analysis with tRSA': Please discuss/comment on how researchers should navigate the choice of modeling parameters in tRSA's linear mixed effects setting.

      We have added a discussion of the mixed-effects parameters and encourage researchers to follow best practices for their model selection.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing a random effects structure that will converge in their dataset, 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020), and 3) considering which effects should be treated as random or fixed depending on their research question”.

      (12) The final part of the Results section, demonstrating the tRSA results for the continuous memorability factor in the real fMRI data, could benefit from some substantiation/elaboration. It wasn't clear to me, for example, to what extent the observed significant association between representational strength and item memorability in this dataset is to be 'believed'; the Discussion section (p38). Was there any evidence in the original paper for this association? Or do we just assume this is likely true in the brain, based on prior literature by e.g. Bainbridge et al (who probably did not use tRSA but rather classic methods)?

      Indeed, memorability effects have been replicated in the literature, but not using the tRSA method. We have expanded our discussion to clarify how our findings relate to the relevant literature and to the methods it has employed.

      Page 38. “Critically, memorability is a robust stimulus property that is consistent across participants and paradigms (Bainbridge, 2022). Moreover, object memorability effects have been replicated using a variety of methods aside from tRSA, including univariate analyses and representational analyses of neural activity patterns where trial-level neural activity pattern estimates are correlated directly with object memorability (Slayton et al., 2025)”.

      (13) The abstract could benefit from more nuance; I'm not sure if RSA can indeed be said to be 'the principal method', and whether it's about assessing 'quality' of representations (more commonly, the term 'geometry' or 'structure' is used).

      We have edited the abstract to reflect the true nuance in the current approaches.

      Abstract. Neural representation refers to the brain activity that stands in for one’s cognitive experience, and in cognitive neuroscience, a prominent method of studying neural representations is representational similarity analysis (RSA). While there are several recent advances in RSA, the classic RSA (cRSA) approach examines the structure of representations across numerous items by assessing the correspondence between two representational similarity matrices (RSMs): usually one based on a theoretical model of stimulus similarity and the other based on similarity in measured neural data.

      (14) RSA is also not necessarily about models vs. neural data; it can also be between two neural systems (e.g., monkey vs. human as in Kriegeskorte et al., 2008) or model systems (see Sucholutsky et al., 2023). This statement is also repeated in the Introduction paragraph 1 (later on, it is correctly stated that comparing brain vs. model is most likely the 'most common' approach).

      We have added these examples in our introduction to RSA.

      Page 3. “One of the central approaches for evaluating information represented in the brain is representational similarity analysis (RSA), an analytical approach that queries the representational geometry of the brain in terms of its alignment with the representational geometry of some cognitive model (Kriegeskorte et al., 2008; Kriegeskorte & Kievit, 2013), or, in some cases, compares the representational geometry of two neural systems (e.g., Kriegeskorte et al., 2008) or two model systems (Sucholutsky et al., 2023)”.

      (15) 'theoretically appropriate' is an ambiguous statement, appropriate for what theory?

      We apologize for the ambiguous wording, and have corrected the text:

      Page 11. “Critically, tRSA estimates were submitted to a mixed-effects model which is statistically appropriate for modeling the hierarchical structure of the data, where observations are nested within both subjects and stimuli (Baayen et al., 2008; Chen et al., 2021)”.

      (16) I found the statement that cRSA "cannot model representation at the level of individual trials" confusing, as it made me think, what prohibits one from creating an RDM based on single-trial responses? Later on, I understood that what the authors are trying to say here (I think) is that cRSA cannot weigh the contributions of individual rows/columns to the overall representational strength differently.

      We thank the reviewer for their clarifying language and have added it to this section of the manuscript.

      Abstract. “However, because cRSA cannot weigh the contributions of individual trials (RSM rows/columns), it is fundamentally limited in its ability to assess subject-, stimulus-, and trial-level variances that all influence representation”.

      (17) Why use "RSM" instead of "RDM"? If the pairwise comparison metric is distance-based (e.g., 1-correlation as described by the authors), RDM is more appropriate.

      We apologize for the error, and have clarified the Methods text:

      Page 3-4. “First, brain activity responses to a series of N trials are compared against each other (typically using Pearson’s r) to form an N×N representational similarity matrix”.

      (18) Figure 2: please write 'Correlation estimate' in the y-axis label rather than 'Estimate'.

      We have edited the label in Figure 2.

      (19) Page 6 'leaving uncertain the directionality of any findings' - I do not follow this argument. Obviously one can generate an RDM or RSM from vector v or vector -v. How does that invalidate drawing conclusions where one e.g., partials out the (dis)similarity in e.g., pleasantness ratings out of another RDM/RSM of interest?

      We agree such an approach does not invalidate the partial method; we have clarified what we mean by “directionality”.

      Page 8. “For instance, even though a univariate random variable 𝑣, such as pleasantness ratings, can be conveniently converted to an RSM using pairwise distance metrics (Weaverdyck et al., 2020), the very same RSM would also be derived from the opposite random variable −𝑣, leaving the directionality of any findings with the RSM uncertain (i.e., whether representation is strongest for pleasant or unpleasant items; see also Bainbridge & Rissman, 2018)”.
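      The directionality point can be checked directly. In this minimal sketch (absolute difference is assumed as the pairwise distance, and the ratings are hypothetical), a variable and its negation yield the identical RSM, so nothing in the matrix records which end of the scale drives an effect.

```python
def distance_rsm(values):
    """Pairwise absolute-difference matrix for a univariate variable."""
    return [[abs(a - b) for b in values] for a in values]

pleasantness = [1.0, 3.5, 2.0, 4.5]   # hypothetical ratings
flipped = [-v for v in pleasantness]  # the opposite variable, -v
same = distance_rsm(pleasantness) == distance_rsm(flipped)  # True
```

      A trial-level approach can instead enter 𝑣 itself as a predictor, which retains its sign.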

      (20) P7 'sampled 19900 pairs of values from a bi-variate normal distribution', but the rows/columns in an RDM are not independent samples - shouldn't this be included in the simulation? I.e., shouldn't you simulate first the n=200 vectors, and then draw samples from those, as in the next analysis?

      This section has been moved to Appendix 1 (see responses to Reviewer 1.13).
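      For reference, the 19,900 figure matches the number of unique off-diagonal cells in a 200 × 200 RSM, all of which are derived from only 200 underlying vectors. This is the source of the non-independence the reviewer describes.

```latex
\binom{N}{2} = \frac{N(N-1)}{2} = \frac{200 \times 199}{2} = 19\,900
```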

      (21) Under data acquisition, please state explicitly that the paper is re-using data from prior experiments, rather than collecting data anew for validating tRSA.

      We have clarified this in the data acquisition section.

      Page 13. “A pre-existing dataset was analyzed to evaluate tRSA. Main study findings have been reported elsewhere (S. Huang, Bogdan, et al., 2024)”.

      (22) Figure 4 could benefit from some more explanation in-text. It wasn't clear to me, for example, how to interpret the asterisks depicted in the right part of the figure.

      We clarified the meaning of the asterisks in the main text in addition to the existent text in the figure caption.

      Page 26. “(see Figure 4, off-diagonal cells in blue; asterisks indicate where tRSA was statistically more sensitive than cRSA)”.

      (23) Page 38 "the outcome of tRSA's improved characterization can be seen in multiple empirical outcomes:" it seems there is one mention of 'outcomes' too many here.

      We have revised this sentence.

      Page 41. “tRSA's improved characterization can be seen in multiple empirical outcomes”.

      (24) Page 38 "model fits became the strongest" it's not clear what aspect of the reported results in the paragraph before this is referring to - the Appendix?

      Yes, the model fits are in the Appendix, we have added this in text citation.

      Moreover, model-fits became the strongest when the models also incorporated trial-level variables such as fMRI run and reaction time (Appendix 3, Table 6).

      References

      Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789

      Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

      Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.

      Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. eLife, 12. https://doi.org/10.7554/eLife.82566

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.

      King, M. L., Groen, I. I., Steel, A., Kravitz, D. J., & Baker, C. I. (2019). Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage, 197, 368-382.

      Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., ... & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.

      Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., ... & Griffiths, T. L. (2023). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease. Specifically, they utilize a large single-cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses. The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilize a much larger scRNA-seq dataset from 80 DO BMSC-OBs, infer co-expression-based and Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that colocalized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single-cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. 
      Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below. Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking. 

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3). One weakness involves the focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but reason that eQTLs can be shared across many tissues; this assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type-specific eQTLs. Furthermore, the mesenchymal cell type-specific co-expression analysis by iterative WGCNA identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per-cell read depth (400-6200 reads/cell) and dropouts, it's hard to believe that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting here and would expect that many/most of these identified modules have very few gene members, but the methods list a minimum module size of 20 genes. How do the numbers of modules identified in this study compare to other published scRNA-seq studies that use iterative WGCNA? 

      In the section "Identification of differentiation driver genes (DDGs)", the authors identified 408 significant DDGs and found that 49 (12%) were reported by the International Mouse Knockout [sic] Consortium (IMPC) as having a significant effect on whole-body BMD when knocked out in mice. Is this enrichment significant? E.g., what is the background percentage of IMPC gene knockouts that show an effect on whole-body BMD? Similarly, they found that 21 of the 408 DDGs were genes that have BMD GWAS associations that colocalize with GTEx eQTLs/sQTLs. Given that there are > 1,000 BMD GWAS associations, is this enrichment (21/408) significant? Recommend performing a hypergeometric test to provide statistical context to the reported overlaps here. 

      We thank the reviewer for their constructive feedback and thoughtful questions. Regarding iterativeWGCNA, a large number of modules is sometimes an outcome of the analysis, as reported in the iterativeWGCNA preprint (Greenfest-Allen et al., 2017). While we did not compare our results to other studies applying this tool to scRNA-seq data, it has been used broadly in other published studies (e.g., PMID: 39640571, 40075303, 33677398, 33653874). Although model overfitting, as you mention, may contribute to a larger number of modules, the Bayesian network analysis we perform after iterativeWGCNA focuses on smaller substructures within co-expression modules rather than on any module in its entirety.

      We did not perform enrichment or other statistical tests because our goal was simply to highlight attributes or unique features of these genes for additional context.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Farber and colleagues have performed single-cell RNAseq analysis on bone marrow-derived stem cells from DO mice. By performing network analysis, they look for driver genes that are associated with bone mineral density GWAS associations. They identify two genes as potential candidates to showcase the utility of this approach. 

      Strengths: 

      The study is very thorough and the approach is innovative and exciting. The manuscript contains some interesting data relating to how cell differentiation is occurring and the effects of genetics on this process. The section looking for genes with eQTLs that differ across the differentiation trajectory (Figure 4) was particularly exciting. 

      Weaknesses: 

      The manuscript is in parts hard to read due to the use of acronyms and there are some questions about data analysis that need to be addressed. 

      We thank the reviewer for their feedback and shared enthusiasm for our work. We tried to minimize the use of technical acronyms as much as we could without compromising readability. Additionally, we addressed questions regarding aspects of data analysis. 

      Reviewer #1 (Recommendations for the authors):

      (1) For increased transparency and to allow reproducibility, it would be necessary for the scripts used in the analysis to be shared along with the publication of the preprint. Also, where feasible, sharing the processed data in addition to the raw data would allow the community greater access to the results and be highly beneficial. 

      Thank you for this suggestion. The raw data will be available via the GEO accession codes listed in the data availability statement. We will make scripts for some analyses available on our GitHub (https://github.com/Farber-Lab/DO80_project) and the processed scRNA-seq data available as a Seurat object (.rds) on Zenodo (https://zenodo.org/records/15299631).

      (2) Lines 55-76: I think the summary of previous work here is too long. I understand that they would like to cover what has been done previously, but this seems like overkill. 

      Good suggestion. We have streamlined some of the summary of our previous work.

      (3) Did the authors try to map QTL for cell-type proportion differences in their BMSC-OBs? While 80 samples certainly limit mapping power, the data shown in Figs 4C/D suggest that you might identify a large-effect modifier of LMP/OB1 proportions. 

      We did try to map QTL for cell type proportion differences, but no significant associations were identified. 

      (4) Methods question: Does the read alignment method used in your analysis account for SNPs/indels that segregate among the DO/CC founder strains? If not, the authors may wish to include this in their discussion of study limitations and speculate on how unmapped reads could affect expression results. 

      The read alignment method we used does not account for SNPs/indels from the DO founder strains that fall in RNA transcripts captured in the scRNA-seq data. We have included this as a limitation in our discussion (lines 422-424). 

      (5) Much of the discussion reads as an overview of the methods, while a discussion of the results and their context to the existing BMD literature is relatively lacking in comparison.

      We have added further explanation of the results and context to the discussion (lines 381-382 and 396-407). 

      (6) Figure 1E and lines 146-149: Adjusted p values should be reported in the figure and accompanying text instead of switching between unadjusted and adjusted p values. 

      We updated Figure 1e to show adjusted p-values, listed the adjusted p-values in the legend of Figure 1e, and reported them in the main text (lines 153-154).

      (7) Why do the authors bring the IMPC KO gene list into the analysis so late? This seems like a highly relevant data resource (more so than the GTEx eQTLs/sQTLs) that could have been used much earlier to help identify DDGs. 

      Because our scRNA-seq data are also from mice, we chose to integrate information from the IMPC to highlight supplemental features of genes in networks (i.e., genes that have an experimentally tested and significant effect on BMD in mice). However, our primary goal was to inform human GWAS by leveraging our previous work, in which we identified colocalizations between human BMD GWAS associations and eQTLs/sQTLs in human GTEx tissues; this is why that information was used to guide our network analysis.

      (8) Does Fgfrl1 and/or Tpx2 have a cis-eQTL in your BMSC-OB scRNA-seq dataset? 

      We did not identify cis-eQTL effects for Fgfrl1 and Tpx2.

      (9) Figure 4B-C: These eQTLs may be real, but based on the diplotype patterns in Figure 4C, I suspect they are artifacts of low mapping power that are driven by rare genotype classes with one or two samples having outlier expression results. For example, if you look at the results in Fig 4C for S100a1 expression, the genotype classes with the highest/lowest expression have lower sample numbers. In the case of Pkm eQTL showing a PWK-low effect, the PWK genome has many SNPs that differ from the reference genome in the 3' UTR of this gene, and I wonder if reads overlapping these SNPs are not aligning correctly (see point 4 above) and resulting (falsely) in lower expression values for samples with a PWK haplotype. 
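      One quick diagnostic for the rare-genotype-class concern raised above is to tabulate how many mice fall into each diplotype class and check whether the apparent effect survives when sparsely populated classes are set aside. The sketch below is a hypothetical illustration, not part of the authors' pipeline: the two-letter diplotype labels follow the usual DO/CC founder letter codes (G = PWK/PhJ), the expression values are invented, and a real DO eQTL scan works from founder haplotype probabilities rather than hard diplotype calls, so this is only a sanity check.

```python
from collections import defaultdict
from statistics import mean


def summarize_genotype_classes(diplotypes, expression, min_n=3):
    """Group per-mouse expression values by diplotype class.

    Returns ({class: (n_mice, mean_expression)}, sorted list of classes with
    fewer than min_n mice).
    """
    if len(diplotypes) != len(expression):
        raise ValueError("need one diplotype per expression value")
    groups = defaultdict(list)
    for diplotype, value in zip(diplotypes, expression):
        groups[diplotype].append(value)
    summary = {d: (len(v), mean(v)) for d, v in groups.items()}
    rare = sorted(d for d, (n, _) in summary.items() if n < min_n)
    return summary, rare


def class_mean_range(summary, exclude=()):
    """Spread (max - min) of class means, optionally ignoring some classes."""
    means = [m for d, (_, m) in summary.items() if d not in exclude]
    return max(means) - min(means)


# Hypothetical per-mouse expression for one gene in eight mice.
diplotypes = ["AB", "AB", "AB", "AG", "AG", "AG", "AG", "GG"]
expression = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.0, 2.0]
summary, rare = summarize_genotype_classes(diplotypes, expression)
print("rare classes:", rare)
print("spread of class means, all classes:", class_mean_range(summary))
print("spread without rare classes:", class_mean_range(summary, exclude=rare))
```

      In this toy case the entire spread comes from a single GG mouse, which is the pattern the reviewer suspects for S100a1 and Pkm.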

      As mentioned above, our alignment method did not account for DO founder genetic variation located in the 3’ end of the RNA transcripts captured in the scRNA-seq data. We have included this as a limitation in our discussion (lines 422-424).

      In future studies, we intend to include larger populations of mice to potentially overcome, as you mention, any artifacts that may be attributable to low statistical power, rare genotype classes, or outlier expression.

      Reviewer #2 (Recommendations for the authors):

      Major Points 

      (1) The authors hypothesize "that many genes impacting BMD do so by influencing osteogenic differentiation or possibly bone marrow adipogenic differentiation". However, cell type itself does not correlate with any bone trait. Does this indicate that the hypothesis is not entirely correct, as genes that drive these phenotypes would not be enriched in one particular cell type? The authors have previously identified "high-priority target genes". So, are there any cell types that are enriched for these target genes? If not, this would indicate that all these genes are more ubiquitously expressed and this is probably why they would have a greater effect on the overall bone traits. Furthermore, are the 73 eGenes (so genes with eQTLs in a particular cell type that change around cell type boundaries) or the DDGs (Table 1) enriched for these high-priority target genes? 

      The bone traits measured in the DO mice are complex and impacted by many factors, including the differentiation propensity and abundance of certain cell types, both within and outside of bone. Though we did not identify correlations between cell type abundance and the bone traits we measured, we tailored our investigations to focus on cellular differentiation using the scRNA-seq data. However, future studies would need to be performed to investigate any connections between cellular differentiation, cell type abundance, and bone traits.

      We did not perform enrichment analyses of either the target genes identified from our other work or eGenes identified here, but instead used the target gene list to center our network analysis and the eGenes to showcase the utility of the DO mouse population.

      (2) The readability of the paper could be improved by minimising the use of acronyms and there are several instances of confusing wording throughout the paper. In many cases, this can be solved by re-organising sentences and adding a bit more detail. For example, it was unclear how you arrived at Fgfrl1 or Tpx2.

      One of the goals of our study was to identify genes that have (to our knowledge) little to no known connection to BMD. We chose to highlight Fgfrl1 and Tpx2 because there is minimal literature characterizing these genes in the context of bone, which we note in the results (lines 296-297). Additionally, we prioritized these genes in our previous work, and they were identified in this study through network analyses of the scRNA-seq data, as we mention in the results (lines 276-279).

      (3) Technical aspects of the assay. In Figure 1d you show that the cell populations vary considerably between different DO mice. It would be useful to give some sense of the technical variance of this assay given that the assay involves culturing the cells in an exogenous environment. This could take the form of tests between mice within the same inbred strain, or even between different legs of the same DO mice to show that results are technically very consistent. It might also be prudent to identify that this is a potential limitation of the approach as in vitro culturing has the potential to substantially change the cell populations that are present. 

      We agree that in vitro culturing and the preparation of single cells for scRNA-seq are unavoidable sources of technical variation in this study. However, the total number of cells contributed by each of the 80 DO mice after data processing does not appear to be skewed, and the distribution appears normal (see the added figures, now included as Supplemental Figure 3). Therefore, technical variation is at least consistent across all samples. Nevertheless, we have mentioned the potential for technical artifacts in the discussion (lines 414-416).

      (4) Need for permutation testing. "We identified 563 genes regulated by a significant eQTL in specific cell types. In total, 73 genes with eQTLs were also tradeSeq-identified genes in one or more cell type boundaries". These types of statements are fine but they need to be backed up with permutation testing to show that this level of enrichment is greater than one would expect by chance. 
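      The permutation test requested above can be sketched as follows: replace the eGene set with random gene sets of the same size drawn from all tested genes and count how often the overlap with the tradeSeq genes is at least as large as observed. This is a minimal sketch using hypothetical gene IDs, not the study's gene lists. Uniform sampling ignores the fact that both eQTL detection and tradeSeq significance likely favor highly expressed genes, so a stricter version would sample within expression-matched bins.

```python
import random


def overlap_permutation_test(set_a, set_b, universe, n_perm=10_000, seed=0):
    """Empirical P(overlap >= observed) when set_a is replaced by random gene
    sets of the same size drawn from the tested universe."""
    universe = sorted(set(universe))
    universe_set = set(universe)
    a, b = set(set_a), set(set_b)
    if not (a <= universe_set and b <= universe_set):
        raise ValueError("both gene sets must come from the tested universe")
    observed = len(a & b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        random_a = rng.sample(universe, len(a))
        if len(b.intersection(random_a)) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example with hypothetical gene IDs (not the study's gene lists).
universe = [f"gene{i}" for i in range(1, 2001)]
egenes = universe[:60]
tradeseq_genes = universe[40:300]
observed, p = overlap_permutation_test(egenes, tradeseq_genes, universe, n_perm=2000)
print(f"observed overlap = {observed}, permutation p = {p:.4f}")
```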

      We did not perform enrichment tests because our only goals were to (1) determine whether eQTLs could be resolved in the DO mouse population using our scRNA-seq data and (2) predict in which cell type each eQTL and its associated eGene may have an effect.

      (5) The main novelty of the paper seems to be that you have used single-cell RNA seq (given that you appear to have already detailed the candidates at the end). I don't think this makes the paper less interesting, but I think you need to reframe the paper more about the approach, and not the specific results. How you landed on these candidates is also not clear. So the paper might be improved by more robustly establishing the workflow and providing guidelines for how studies like this should be conducted in the future. 

      We sought to not only devise a rigorous approach to analyze our single cell data, but also showcase the utility of the approach in practice by highlighting targets for future research (i.e., Fgfrl1 and Tpx2).

      Our goal was to identify novel genes, and we landed on these candidate genes (Fgfrl1 and Tpx2) because they had substantial data supporting their causality and have yet to be fully characterized in the context of bone and BMD (lines 295-297).

      Regarding establishing the workflow, we have included the rationale for specific aspects of our approach throughout the paper. For example, Figure 2 itemizes each step of our network analysis, and we explain why each step is used in various parts of the results (e.g., lines 168-170, 179-181, 191-193, 202-203, 257-260, 276-277).

      We have added a statement advocating for large-scale scRNA-seq of genetically diverse samples combined with network analyses in future studies (lines 436-438).

      Minor Points 

      (1) In the summary you use the word "trajectory". Trajectories for what? I assume the transition between cell types, but this is not clear. 

      We added text to clarify the use of trajectory in the summary (line 34).

      (2) This sentence: "By identifying networks enriched for genes implicated in GWAS we predicted putatively causal genes for hundreds of BMD associations based on their membership in enriched modules." is also not clear. Do you mean: "we predicted putatively causal genes by identifying clusters of co-expressed genes that were enriched for GWAS genes"? It is not clear how you identify the causal gene in the network. Is this just based on the hub gene? 

      The aforementioned sentence has since been removed to streamline the introduction, as suggested by Reviewer 1.

      Regarding causal gene identification, prioritization is not based on whether a gene is a hub gene. We prioritized a DDG (and its associated networks) if it was a causal gene that we identified in our previous work as having an eQTL/sQTL in a GTEx tissue that colocalizes with a human BMD GWAS association.

      (3) Figure 3C. This is good but the labels are quite small. Would be good to make all the font sizes larger. 

      We have enlarged Figure 3C.

      (4) Line 341 in the Discussion should be "pseudotemporal". 

      We have changed “temporal” to “pseudotemporal”.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In their study, Haghighi et al. seek to build upon prior literature linking alterations in mitochondrial network distribution with various kinds of psychosis. Correlations between subcellular mitochondrial localization and different psychological states are an interesting and potentially fruitful frontier and should be explored; however, despite the authors' ambitious strategy to screen 168 skin fibroblasts from patients experiencing psychosis and to examine various online image databases, there is a concerning number of issues related to the image-analysis approach. The foremost of these is a lack of direct measures of mitochondrial distribution, which might serve to validate their proposed MITO-SLOPE protocol. There is also a worrisome lack of robust controls, which are critical in light of how admittedly subtle some of the distribution phenotypes may be. Overall, the aim to screen for differences in mitochondrial distribution is a laudable goal and, in the context of psychological disorders, could be helpful in identifying new therapeutic targets; but the methodology employed in this study does not seem to be sufficiently rigorous to leverage this approach for screening purposes.

      I have extensive experience investigating mitochondria with advanced imaging technologies, including super-resolution microscopy as well as high-throughput and 4D imaging modalities. I am also familiar with standard as well as machine-learning approaches for quantifying mitochondrial morphology, distribution, and trafficking. In my opinion, this study requires substantial revision, both of the indirect and often opaque image-analysis pipeline and through the inclusion of orthogonal experiments, which could lessen concerns regarding purported differences in mitochondrial distribution that are so difficult to discern as to be imperceptible. It is worth noting, too, that this study appears to be predicated, in many ways, upon a 2010 study (Cataldo et al.) of mitochondria in patients with bipolar disorder, which itself appears to lack critical controls for cell size.

      Major comments:

      The authors state, in the first paragraph of the results section: "By eye, we observed that samples from patients in the control and MDD categories show a more fine-grained, dispersed mitochondrial network extending to the edges of the cell, whereas patients in the categories experiencing psychosis tend to show an agglomerated, thicker network more concentrated around the nucleus. The pattern is subtle and heterogeneous across a cell population." The pattern is indeed subtle; I am concerned that it is so subtle as to be imperceptible. Firstly, it is important to note that the mitochondrial reticulum in BP, SZ, and SZA is more difficult to differentiate by eye because the signal appears to be saturated in places, such that the boundaries of individual mitochondria are indistinguishable, owing either to differences in contrast or possibly to the fluorescence intensity itself. Although the authors indicate in the legend that the intensity of the mitochondrial fluorescence was adjusted "for visual clarity," it appears that the contrast needs to be decreased in the BP, SZ, and SZA conditions. It is also important to note that MitoTrackers load into mitochondria in a membrane-potential-dependent fashion. Did the authors detect differences in membrane potential between these groups? While imaging, were the same laser power and gain used for every condition? With this being said, it is not clear that mitochondria in the control and MDD categories have different morphologies from those in the other conditions. It is also not clear what "fine-grained" means in this context. Is this a comment on aspect ratio? If so, it would be better to use standard terminology. (Why are there large red circular structures in the nucleus? These are likely not mitochondria, so why are they showing up in the MitoTracker channel?) It is also not evident that one condition has more dispersed mitochondria than another. 
Given that the authors appear to be making this a central claim of their manuscript, it would seem appropriate to highlight specifically the regions of the different cells that they believe exhibit meaningful differences. If I look at the merged image, which is important because it is really the only way that one can gauge the relative distance of the mitochondrial network from the edge of the cell, there would seem to be no obvious differences between the conditions. Another key point worth mentioning, given that it is frequently referenced in this manuscript: Cataldo et al., 2010 indicate that mitochondria in patient fibroblasts with bipolar disorder (BD) are more perinuclear than those in control. However, a cursory inspection of the images from that study (e.g., Figure 2A-B; Figure 4A-D; and Figure 6A-H) unambiguously demonstrates that the BD cells are smaller than the control cells. Of course, if the cells are smaller, the distance from the nucleus will tend to be shorter. In Cataldo et al., 2010, the authors state, "We also measured cell area, cell length, cell width, and cell perimeter of the fibroblasts used in this analysis to verify that the observed mitochondrial distributional differences were not simply a result of BD cells being smaller, shorter, or fatter. No significant differences in any of these measurements were seen based on diagnosis after two sample t tests." Notably, the data are not shown, so it is difficult to appreciate what the variance of the control and BD cell populations would look like, but it must be said, nevertheless, that the representative images in that paper all point to the BD cells being smaller. In light of this, it would be helpful if Haghighi et al. could add scale bars to all the images (e.g., in Figure 2), so readers can ascertain whether all the cells are portrayed at the same scale and are of similar areas.

      As the authors indicate, interpretable measures of mitochondrial morphology include values like size and shape. It is concerning, therefore, that Figure 3 purports to identify a number of significantly different mitochondrial "features" in the patient groups experiencing psychosis, yet the authors do not appear to make an effort to clarify how any of these features might reflect ground truths of mitochondrial architecture, which can be captured directly by values such as aspect ratio, circularity, area, number of organelles, number of nodes or branching points in a network, etc. Unless the authors can specifically tie their machine-learning classifications to standard mitochondrial shape descriptors, their classifications will remain opaque and therefore of limited credibility or value. One way to improve the validation of their machine-learning classification methods would be to use empirically sound methods for manipulating mitochondrial morphology and distribution, which could serve as positive or negative controls. For example, treatment of cells with the uncoupler FCCP induces mitochondrial fragmentation, treatment with cycloheximide induces stress-induced mitochondrial hyperfusion (SIMH), and treatment with nocodazole blocks mitochondrial trafficking. Treating control cells with these chemicals would help to establish baseline measurements for how far the patient cells deviate from untreated controls, in one direction or another. Such considerations, I think, are especially important when the mitochondrial phenotypes are so subtle. I agree with the authors' argument that, for the purposes of screening, it is best to focus on a single metric. Based on their apparent discernment of subtle differences in mitochondrial distribution in patients experiencing psychosis, they opted to examine possible differences in network density. To this end, they developed "MITO-SLOPE." 
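      For reference, the standard shape descriptors mentioned above are conventionally defined as follows (these follow ImageJ's shape-descriptor conventions; other tools may differ in detail), where A is the object area, P its perimeter, and L_major and L_minor the axis lengths of the best-fitting ellipse. Form factor, the reciprocal of circularity, is also common in the mitochondrial literature:

```latex
\mathrm{circularity} = \frac{4\pi A}{P^{2}}, \qquad
\mathrm{form\ factor} = \frac{P^{2}}{4\pi A}, \qquad
\mathrm{aspect\ ratio} = \frac{L_{\mathrm{major}}}{L_{\mathrm{minor}}}
```

      Circularity equals 1 for a perfect circle (A = πr², P = 2πr) and falls toward 0 for elongated or branched objects. For example, a 2 µm × 0.5 µm rectangle has A = 1 µm², P = 5 µm, and circularity 4π/25 ≈ 0.50.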
Out of multiple categories of features, they highlight the following as the most powerful for establishing differences in mitochondrial network density:

      "(a) A subset of texture measures in the nuclei and cytoplasm area of the mito channel. (b) A subset of features measuring the intensity of the mitochondria area across the cell."

      Within the concentric bins around the cell nuclei, they measure:

      • FracAtD: Fraction of total stain in an object at a given radius.
      • MeanFrac: Mean fractional intensity at a given radius, calculated as the fraction of total intensity normalized by the fraction of pixels at a given radius.
      • RadialCV: Coefficient of variation of intensity within a ring, calculated across 8 slices."
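      To make the three radial measures above concrete, here is a simplified sketch that computes them for one cell. It is not the authors' MITO-SLOPE code and not a re-implementation of CellProfiler's intensity-distribution module. It assumes each pixel already has a normalized radial coordinate (0 at the nuclear centre, 1 at the cell edge) and an angle, and it takes RadialCV as the coefficient of variation of per-wedge mean intensities. CellProfiler's own distance scaling and RadialCV computation may differ in detail.

```python
import math
from statistics import mean, pstdev


def radial_distribution(pixels, n_bins=4, n_wedges=8):
    """Per-bin (FracAtD, MeanFrac, RadialCV) for one cell.

    pixels: iterable of (r_norm, theta, intensity); r_norm in [0, 1] is the
    pixel's normalized radial position (0 = nuclear centre, 1 = cell edge)
    and theta its angle in radians around the nuclear centre.
    """
    pixels = list(pixels)
    total_intensity = sum(p[2] for p in pixels)
    bins = [[] for _ in range(n_bins)]
    for r, theta, intensity in pixels:
        bins[min(int(r * n_bins), n_bins - 1)].append((theta, intensity))

    results = []
    for ring in bins:
        if not ring:
            results.append((0.0, math.nan, math.nan))
            continue
        frac_at_d = sum(i for _, i in ring) / total_intensity
        pixel_frac = len(ring) / len(pixels)
        mean_frac = frac_at_d / pixel_frac  # > 1: intensity enriched in this ring
        wedges = [[] for _ in range(n_wedges)]
        for theta, intensity in ring:
            w = int((theta % (2 * math.pi)) / (2 * math.pi) * n_wedges) % n_wedges
            wedges[w].append(intensity)
        wedge_means = [mean(w) for w in wedges if w]
        centre = mean(wedge_means)
        radial_cv = pstdev(wedge_means) / centre if centre else math.nan
        results.append((frac_at_d, mean_frac, radial_cv))
    return results


def toy_cell(perinuclear_boost=1.0):
    """Hypothetical cell: 4 rings x 8 wedges, one pixel each; ring 0 scaled."""
    px = []
    for b in range(4):
        for w in range(8):
            r = (b + 0.5) / 4
            theta = (w + 0.5) * 2 * math.pi / 8
            px.append((r, theta, perinuclear_boost if b == 0 else 1.0))
    return px


for fr, mf, cv in radial_distribution(toy_cell(perinuclear_boost=2.0)):
    print(f"FracAtD={fr:.3f}  MeanFrac={mf:.3f}  RadialCV={cv:.3f}")
```

      All three quantities are computed from pixel intensities. This illustrates the reviewer's concern: dimmer, depolarized mitochondria lower FracAtD and MeanFrac in their ring even if the organelles have not moved.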

      While the authors have recommended the use of a single metric for purposes of screening, MITO-SLOPE appears to represent a bundle of metrics, which, in the end, do not amount to a clear readout of what is being measured. From my point of view, if one were interested in measuring mitochondrial distribution, then, in an ideal situation, one would measure the average distance of all the mitochondria from the center of the nucleus. And, since the size of the cell is critical for establishing relative distances to the boundaries or periphery of the cell, one would normalize this metric by cellular area. Thus, the readout would be: [average mitochondrial distance from the nuclear center (µm)]/[cellular area (µm²)]. An even simpler metric could be: [average mitochondrial distance from the nuclear center (µm)]/[average cytoplasmic radius (µm)]. When talking about mitochondrial distribution, we typically think in terms of where the mitochondrial network is, on average, in relation to the nucleus (perinuclear) or to the edge of the cell (peripheral). By quantifying the actual mean distance of the mitochondrial network in relation to both the nucleus and the bona fide cell extremities, via the metrics I described above, one can obtain direct measurements of the truly meaningful values related to mitochondrial distribution. Deviating from these approaches seems to introduce more and more opportunities for confounding variables.

      However, the MITO-SLOPE analysis does not seem to consider this metric. Is this, or a similar variation, not the most direct way to establish differences in the mitochondrial network distribution? I would, of course, at least want to see a discussion of why the authors have not chosen to use the most direct form of quantification for this purely spatial value. Why opt for a multifaceted measurement of a relatively straightforward quantity, when a simpler form of quantification would not only suffice but arguably be more likely to capture the ground truth? With this being said, it is not clear to me why, within MITO-SLOPE there seems to be a reliance on measuring the "intensity" of the mitochondria. (And what intensity is it? Mean intensity per ROI?) Of course, particularly if MitoTrackers were used for staining mitochondria, there will be heterogeneity in fluorescence intensity from organelle to organelle, which introduces potential confounders into the workflow. Furthermore, as indicated above, to know if the subcellular distribution of mitochondria is truly altered, it is essential to know if the cell size has likewise changed. Therefore, any unbiased measure of mitochondrial distribution must take into consideration the size of the cell; however, based on the information provided about MITO-SLOPE, it does not appear that the authors are accounting for possible variations in cell size that might account for alterations in mitochondrial network distribution - i.e., a smaller cell will have a more constrained area in which mitochondria will be able to disperse - thus, not accounting for cell size (area) will yield ambiguous results. For example, how can we know if mitochondrial motility is impaired or if the cell is simply smaller and there is less space in which to move? Another complexity, here, is if the cell boundaries were not accounted for via staining of actin, etc., then establishing a true cell boundary will be very challenging. 
How many bins are sufficient to capture the whole cell? Just 12? Furthermore, human fibroblasts have a tendency to be quite large (sometimes several hundred microns from end to end); how can the authors account for the whole cell, particularly in cases where part of the cell is beyond the field of view or cells are growing on top of each other, as is often the case?
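      The two normalized distance readouts proposed earlier in this review can be sketched directly. This is a minimal sketch with hypothetical function names. It assumes that segmented mitochondrial pixel coordinates, a nuclear centroid, and a cell outline (for example from an actin or membrane stain) are already available in µm, and that outline vertices are evenly spaced, since averaging vertex distances only approximates the mean cytoplasmic radius. As a reference point, ignoring the nucleus, mitochondria spread uniformly over a circular cell give a distance-over-radius value of about 2/3, because the mean distance from the centre of a uniform disk is 2R/3.

```python
import math


def polygon_area(boundary):
    """Shoelace area of a closed cell outline given as (x, y) vertices in µm."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(boundary, boundary[1:] + boundary[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def mito_distribution_metrics(nucleus_center, mito_pixels, cell_boundary):
    """Distance-based distribution readouts for one cell (coordinates in µm).

    mito_pixels:   centres of pixels segmented as mitochondria
    cell_boundary: evenly spaced vertices of the cell outline (a list)
    """
    cx, cy = nucleus_center
    mean_mito_dist = sum(math.hypot(x - cx, y - cy) for x, y in mito_pixels) / len(mito_pixels)
    mean_cyto_radius = sum(math.hypot(x - cx, y - cy) for x, y in cell_boundary) / len(cell_boundary)
    area = polygon_area(cell_boundary)
    return {
        "mean_distance_um": mean_mito_dist,
        "distance_per_area": mean_mito_dist / area,  # µm / µm², i.e. per µm
        "distance_over_radius": mean_mito_dist / mean_cyto_radius,  # unitless
    }


# Hypothetical circular cell of radius 20 µm with mitochondria 5 µm from the nucleus.
outline = [(20 * math.cos(2 * math.pi * k / 360), 20 * math.sin(2 * math.pi * k / 360))
           for k in range(360)]
perinuclear = [(5 * math.cos(a), 5 * math.sin(a)) for a in (0.0, 1.6, 3.1, 4.7)]
print(mito_distribution_metrics((0.0, 0.0), perinuclear, outline))
```

      Because both readouts divide by a cell-size term, a uniformly smaller cell does not by itself register as a more perinuclear network, which is the confound raised above for Cataldo et al. (2010).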

      In Figure 6, there is no control image that could be used as a frame of reference. I have extensive experience imaging A549 cells. The mitochondria in these images appear to be highly fragmented. The staining patterns, particularly of the cells treated with divalproex-sodium, are quite dim, indicating mitochondrial depolarization. Of course, depolarization affects the fluorescence intensity of mitochondria stained with vital dyes, such as MitoTrackers, which will, in turn, presumably affect the values obtained from MITO-SLOPE, which appear to rely on intensity gradients, rather than more concrete spatial coordinates. Also, as indicated above, it is unclear how the authors are establishing the edges of cells without a marker of the plasma membrane or cytoskeleton.

      The authors note that "Divalproex-sodium is a benzodiazepine receptor agonist and HDAC inhibitor (Rahman et al. 2025) used to manage a variety of seizure disorders (Willmore 2003) and bipolar disorder (Bond et al. 2010; Cipriani et al. 2013); it shows a positive MITO-SLOPE which is the direction expected to normalize the centralized mitochondrial localization associated with psychosis." Insofar as this recommends the drug for use in "normalizing" perinuclear mitochondria within neurons, it would seem only prudent to mention that this drug also appears to induce mitochondrial depolarization and fragmentation, which are both associated with a range of severe human pathologies. I would caution the authors not to highlight one potential benefit while omitting an obvious side effect involving what appears to be significant perturbation of mitochondrial structure and function. What is the point of normalizing mitochondrial distribution if the mitochondria being redistributed are dysfunctional?

      The authors note, in Figure 7, that their MITO-SLOPE analysis was unable to discern a statistically significant difference in cells with specific knockouts of genes associated with mitochondrial trafficking. If the MITO-SLOPE cannot discern a difference in the context of a substantial abrogation of mitochondrial transport capacity, how is it that it could detect meaningful differences where there is only a "subtle" change in distribution? This result would seem to militate strongly against the efficacy of this analysis pipeline and would not recommend its use for unbiased screening and discovery.

      Minor comments:

      For Figure 6 b and c, "µm" should be "µM."

      The introduction and discussion could be more concise.

      Significance

      This study attempts to fill an important gap in knowledge relating to mitochondrial distribution and psychological disorders. It aims to perform an initial screen to try to validate a novel analysis pipeline called MITO-SLOPE; however, the study appears to lack analytical rigor, both in the underlying cell biology and in the approach to quantification itself. Conceptually, this study has great promise, but the authors will need to improve their pipeline prior to publication, which will likely require fundamental revisions, including an array of orthogonal measures (largely lacking here) as well as detailed demonstrations of how the segmentation actually works and ultimately yields data reflecting demonstrable mitochondrial trafficking/distribution defects.

    1. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about the interpretation of group differences across neutral and negative conditions limit the interpretability of the results.

      We are grateful for this improved assessment. Below, we provide detailed responses that we believe address the noted concerns about interpreting group differences across conditions. If these clarifications resolve the interpretability concerns, we would be grateful if the editors would consider updating the eLife assessment accordingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) as described by Ratcliff and McKoon (2008), the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options in individuals with bulimia nervosa (BN) and healthy participants

      (2) The weighting of the perceived healthiness and tastiness of these options.
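      Both questions, when each attribute starts to count and how heavily it is weighted, correspond to parameters of a DDM in which tastiness and healthiness enter the drift rate at separate onset times. The following minimal simulation sketch shows one such extension of the standard constant-drift DDM. It is not the authors' fitted model: all parameter names and values are hypothetical, non-decision time is omitted, and the accumulator is integrated with a simple Euler-Maruyama step.

```python
import math
import random


def simulate_trial(d_taste, d_health, w_taste, w_health, t_taste, t_health,
                   bound=1.0, start=0.0, noise=1.0, dt=0.001, max_t=5.0, rng=None):
    """Simulate one accept/reject food choice with attribute-specific onsets.

    d_taste, d_health: attribute values of the food (e.g. rating minus reference)
    w_taste, w_health: drift weights on each attribute
    t_taste, t_health: time (s) at which each attribute starts to enter the drift
    Returns (choice, rt): 1 = upper bound (accept), 0 = lower bound (reject),
    None if neither bound is reached by max_t.
    """
    if rng is None:
        rng = random.Random()
    x, t = start, 0.0
    step_sd = noise * math.sqrt(dt)
    while t < max_t:
        drift = 0.0
        if t >= t_taste:
            drift += w_taste * d_taste
        if t >= t_health:
            drift += w_health * d_health
        x += drift * dt + rng.gauss(0.0, step_sd)  # Euler-Maruyama step
        t += dt
        if x >= bound:
            return 1, t
        if x <= -bound:
            return 0, t
    return None, max_t


# Hypothetical tasty-but-unhealthy food: taste counts from 0.2 s, health from 0.5 s.
rng = random.Random(1)
trials = [simulate_trial(1.0, -0.5, 1.5, 1.0, t_taste=0.2, t_health=0.5, rng=rng)
          for _ in range(300)]
decided = [(c, rt) for c, rt in trials if c is not None]
p_accept = sum(c for c, _ in decided) / len(decided)
mean_rt = sum(rt for _, rt in decided) / len(decided)
print(f"P(accept) = {p_accept:.2f}, mean decision time = {mean_rt:.2f} s")
```

      Delaying `t_health` or lowering `w_health` both push choices toward the tastier option. This is why the timing and weighting parameters have to be estimated jointly rather than read off choice proportions alone.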

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has potential to improve the understanding of pathological food choices.

      Weaknesses:

      I thank the authors for revising their manuscript.

      However, I still have major concerns.

      The authors say that they removed any causal claims in the revised version of the manuscript. The penultimate sentence of the abstract still says "bias for high-fat foods predicted more frequent subjective binge episodes over three months". This is a causal claim that I already highlighted in my previous review, specifically for that sentence (see the second sentence of major point 2 in my previous review).

      We appreciate the Reviewer's continued attention to causal language. We acknowledge that our use of the term 'predicted', though intended to refer to statistical prediction in a regression model, could be misinterpreted as implying causation. We have therefore revised this sentence to read: 'bias for high-fat foods was associated with more frequent subjective binge episodes over three months'.

      I also noticed that a comment I added was not sent to the authors. In this comment, I noted my uncertainty about whether, in Figure 2 of Galibri et al., the average negative rating after induction in the BN group differed between the neutral and negative inductions (i.e., comparing the negative rating after the negative induction in BN to the negative rating after the neutral induction in BN). To me, Figure 2 of Galibri et al. suggests that:

      (1) The BN participants were more negative before the induction when they came to the neutral session than when they came to the negative session.

      (2) The BN participants appeared similarly negative after the induction in both sessions (taking into account the reported error bars).

      These observations are highly important because they suggest that BN patients may have been in a similarly negative state when performing the food decision task in both conditions (negative and neutral). If so, the lack of difference in food choices in BN patients is unsurprising, and nothing could be concluded from the DDM analyses. Moreover, the strongly negative ratings of BN patients in the neutral condition compared with healthy participants, together with the almost identical negative ratings after the two inductions, contradict the last sentence of the authors' abstract.

      I appreciate that the authors reproduced an analysis from their initial paper regarding the negative ratings (i.e., Table S1). It partly answers my point above but does not address the possibility that BN patients were in a similarly negative state in both conditions (neutral and negative) when performing the food decision task: if BN patients were similarly negative after both inductions, nothing can be concluded from the differences between conditions in their DDM results. The authors note that "not all loss-of-control eating occurs in the context of negative state"; I would add that far from all negative states lead to loss-of-control eating in BN patients. This point underlies all of my remarks above, as well as those in my first review.

      One solution is to run a paired t-test in BN patients only, comparing the post-induction scores in the two conditions (neutral and negative) reported in Figure 2 of their initial article.

      We appreciate the reviewer’s concern. We understand how the visual representation in Figure 2, which displays between-subject error bars, might suggest similar post-induction affect levels. However, the within-subject paired comparison (which appropriately accounts for individual differences in baseline affect) reveals a significant difference, which we detail below.

      While BN participants did report higher baseline negative affect than the HC group prior to the mood inductions, this does not negate the effectiveness of the manipulation. The critical comparison is the within-subject change from pre- to post-induction (detailed below), which shows that negative affect was significantly higher after the negative induction than after the neutral induction.

      As we reported in the Supplementary Information (Table S1), our initial analyses of self-reported affect ratings used a linear mixed-effects model with group (HC = 0, BN = 1), condition (Neutral = 0, Negative = 1), and time (pre-induction = 0, post-induction = 1) as fixed effects, including all interactions, and random intercepts for participants. This approach accounts for individual differences in baseline affect.
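      Written out, the model described above takes the following form. The notation is ours: i indexes participants and j the four ratings per participant, and the Gaussian assumptions are the standard ones for a linear mixed model rather than details stated in the text.

$$
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1 G_i + \beta_2 C_{ij} + \beta_3 T_{ij}
          + \beta_4 G_i C_{ij} + \beta_5 G_i T_{ij} + \beta_6 C_{ij} T_{ij} \\
        &+ \beta_7 G_i C_{ij} T_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\end{aligned}
$$

      Here G_i is group (HC = 0, BN = 1), C_ij is condition (Neutral = 0, Negative = 1), T_ij is time (pre-induction = 0, post-induction = 1), and u_i is the participant-level random intercept. Under this coding, the pre-to-post change is β_3 (HC, Neutral), β_3 + β_6 (HC, Negative), β_3 + β_5 (BN, Neutral), and β_3 + β_5 + β_6 + β_7 (BN, Negative). So β_6 is the Negative-minus-Neutral difference in change for HC, and β_7 is how much that difference differs in BN.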

      However, to address the reviewer's concerns, we conducted two simple effects analyses using estimated marginal means. As the reviewer suggested, we directly compared post-induction affect between conditions within the BN group (described in the second analysis below). In the first analysis, we examined the diagnosis × time interaction within each condition separately. In the Negative condition, individuals with BN demonstrated a substantial increase in negative affect from pre- to post-induction (mean difference = 20.36, t = 4.84, p < 0.0001, Cohen’s d = 0.97). In the second analysis, we examined the condition × time interaction within each group separately. Among the BN group, we found that reported affect was significantly higher following the negative mood induction than after the neutral affect induction (mean difference = -17.40, t = -4.13, p = 0.0003, Cohen’s d = 0.83). This difference in post-induction negative affect between conditions within the BN group represents a meaningful and statistically robust difference in affective states. These within-group effects confirm that the negative mood induction was (1) effective in the BN group and (2) produced significantly greater negative affect than the neutral mood induction.
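      For readers comparing this with the reviewer's suggested paired t-test, the sketch below (function name and example arrays hypothetical) runs a paired t-test with `scipy.stats.ttest_rel` and computes Cohen's d_z, the mean of the paired differences divided by their standard deviation. Because t = d_z·√n for paired data, the d values reported above are numerically consistent with |t|/√n, assuming all 25 BN participants (the sample size given by Reviewer #2) contributed. Note that the response does not state how d was computed, and the reported t statistics come from mixed-model contrasts rather than a standalone paired test.

```python
import numpy as np
from scipy import stats


def paired_comparison(a, b):
    """Paired t-test of a vs. b plus Cohen's d_z.

    a, b : per-participant scores in the same participant order
    (e.g., post-induction negative affect after the negative and after the
    neutral induction). d_z = mean(a - b) / SD(a - b).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    res = stats.ttest_rel(a, b)
    d_z = diff.mean() / diff.std(ddof=1)
    return res.statistic, res.pvalue, d_z


# For paired data t = d_z * sqrt(n), so d_z can be recovered from t and n.
# Applied to the reported t values, assuming n = 25 BN participants:
for t_reported in (4.84, -4.13):
    print(round(abs(t_reported) / np.sqrt(25), 2))  # prints 0.97, then 0.83
```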

      These findings confirm that participants completed the food decision task under meaningfully different affective states, supporting the interpretability of the subsequent DDM analyses. We now report these analyses in the Supplementary Information.

      I appreciate the analysis that the authors added with the restrictive subscale of the EDE-Q.

      The fact that this analysis shows no association with the parameters of interest does not demonstrate that these parameters relate differently to self-reported restriction than to self-reported binges. Only such a difference would allow us to claim that the results the authors report may be specifically related to binges.

      We thank the reviewer for raising this important point about specificity. To address this concern, we examined the correlation between self-reported binge frequency (both subjective binge episodes and objective binge episodes over the past three months) and EDE-Q Restraint subscale in our BN sample.

      The correlations between these measures were modest and non-significant (subjective binge frequency: Spearman’s ρ = 0.21, p = 0.306; objective binge frequency: Spearman’s ρ = 0.05, p = 0.806), indicating that binge frequency and dietary restraint were relatively independent dimensions of eating pathology in our sample. This dissociation supports the specificity of our findings: the fact that our DDM parameters were associated with binge frequency but not with dietary restraint suggests that the affect-induced changes in decision-making we observed are specifically related to binge-eating behavior rather than being a correlate of dietary restraint. We now report this analysis in the Supplementary Information.
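      Spearman's ρ is the Pearson correlation computed on ranks (tied values receive their average rank), so it captures monotonic rather than strictly linear association and is less sensitive to extreme values. A minimal sketch, with a hypothetical helper name and made-up numbers (not study data), checked against `scipy.stats.spearmanr`:

```python
import numpy as np
from scipy import stats


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average-ranked data."""
    rx = stats.rankdata(x)  # ties receive the average of their ranks
    ry = stats.rankdata(y)
    return np.corrcoef(rx, ry)[0, 1]


# Hypothetical example values (not study data)
binge_counts = [0, 2, 2, 5, 12, 30, 4]
restraint_scores = [3.1, 2.4, 4.0, 1.8, 3.6, 2.9, 4.4]
rho = spearman_rho(binge_counts, restraint_scores)
```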

      I appreciate the wording of the authors' answer to my third point: "the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms". This sentence is crystal clear and summarizes very well the limits of the associations the authors report with binge eating frequency. However, I do not see this sentence in the manuscript. I think the manuscript would benefit substantially from adding it.

      We thank the reviewer for the suggestion. We have added the following sentences that convey this information to the end of the third paragraph of the discussion:

      “These results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic. However, our correlational design does not allow us to determine whether this reactivity causes the symptoms.”

      Statistical analyses:

      If I have understood the mixed models correctly, the analyses in Supplementary Tables S1 and S27 to S32 treat all measures as independent; that is, the scores for each condition (neutral vs. negative) and each time point (before vs. after induction), which were rated by the same participants, are treated as independent. Such analyses do not take into account the potential correlation between the four scores of a given participant. As a consequence, the results may include false positives, which a linear mixed model does not address. The appropriate analysis would be to run statistical tests that pair the data, without running any mixed model.

      We appreciate the reviewer's attention to the statistical approach. However, we respectfully note that mixed-effects models do account for within-subject correlations, contrary to the reviewer’s interpretation.

      The linear mixed-effects model we employed explicitly accounts for the correlation among repeated measures from the same participant through the random intercept term. This random effect structure models the non-independence of observations within participants, allowing for correlated errors within individuals while assuming independence between individuals. This is a standard and appropriate approach for analyzing repeated-measures data (Bates et al., 2015).
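      The role of the random intercept can be made concrete. In a random-intercept model, two ratings from the same participant share the same u_i, so their covariance is σ_u² and their correlation is σ_u²/(σ_u² + σ_ε²), while ratings from different participants are uncorrelated. The simulation below (variance values arbitrary, function name hypothetical, fixed effects omitted) illustrates this. One caveat: a random intercept alone implies the same correlation for every pair of a participant's repeated measures (compound symmetry); random slopes or other covariance structures relax that assumption.

```python
import numpy as np


def simulate_random_intercept(n_subjects, n_obs, sigma_u, sigma_e, rng):
    """Simulate y_ij = u_i + e_ij (fixed effects omitted for clarity).

    u_i is one random intercept per participant, shared by all of that
    participant's observations; e_ij is independent residual noise.
    """
    u = rng.normal(0.0, sigma_u, size=(n_subjects, 1))
    e = rng.normal(0.0, sigma_e, size=(n_subjects, n_obs))
    return u + e  # broadcasting adds each participant's u_i to every column


rng = np.random.default_rng(42)
sigma_u, sigma_e = 2.0, 1.0
y = simulate_random_intercept(100_000, 4, sigma_u, sigma_e, rng)

# Correlation implied by the model for two ratings from the same participant
implied = sigma_u**2 / (sigma_u**2 + sigma_e**2)  # 4 / 5 = 0.8
# Empirical correlation between the 1st and 2nd ratings of each participant
within = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
# Ratings from different participants: pair participant k with participant k+1
between = np.corrcoef(y[:-1, 0], y[1:, 0])[0, 1]
```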

      The mixed-effects model is, in fact, more appropriate than separate paired t-tests for our design because it:

      (1) Simultaneously models all fixed effects (group, condition, time) and their interactions in a single unified framework;

      (2) Properly partitions variance into within-subject and between-subject components;

      (3) Provides greater statistical power and more precise estimates by using all available data simultaneously; and

      (4) Allows for direct testing of three-way interactions that cannot be assessed through pairwise comparisons alone.

      Paired tests (e.g., t-tests), as the reviewer suggests, would require multiple separate analyses and would not allow us to test our primary hypotheses about group × condition × time interactions. The mixed-effects approach provides a more comprehensive and statistically rigorous analysis of our repeated-measures design. To clarify this even further in the manuscript, we have added the following in our methods when describing our model, “participant-level random intercepts were included to account for within-subject correlations across repeated measurements.”

      Notes:

      The fact that specific methods, such as correlating self-reported measures over long periods with almost instantaneous behaviors (like tasks), have been used extensively in previous studies does not mean that these methods are suited to answer a given scientific question. Measures aggregated over long periods miss the variations in instantaneous behaviors over these periods.

      We acknowledge the reviewer’s concern about the temporal mismatch between our session-level task measures and the 3-month aggregated symptom reports. This is a valid limitation of cross-sectional designs, and we agree that examining how task performance fluctuates in relation to real-time symptom variation would provide richer insights into the potential dynamics of these relationships.

      We agree that we cannot capture how daily changes in task performance relate to momentary symptom occurrence. In response to previous rounds of helpful reviews, we added this limitation to the Discussion section, noting that future research employing ecological momentary assessment (EMA) or daily diary methods could examine whether the decision-making processes we identified also fluctuate in relation to real-time symptom occurrence.

      We note that our finding that affect-induced changes in decision-making parameters were associated with subjective binge frequency suggests that this laboratory-measured reactivity may reflect a stable individual difference that manifests across contexts and time periods. While our current study provides initial evidence that individual differences in affect-related decision-making are associated with symptom severity, we acknowledge that longitudinal designs with repeated assessments would strengthen causal and temporal inferences.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food-choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant, and the methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      Sample size was relatively small, and participants were all women with BN, which limits generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. These limitations are adequately noted in the discussion.

      We are grateful to Reviewer #2 for their careful and supportive review of our manuscript. We appreciate their recognition that computational modeling can reveal nuanced alterations in decision-making processes that may not be apparent in overt behavioral choices. Their balanced assessment of both the strengths and limitations of our work has been helpful in contextualizing our findings appropriately. We have carefully considered their comments regarding sample size and the potential limitations of our mood induction procedure, both of which we discuss in detail in the manuscript's limitations section.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach, the diffusion decision model, to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding, that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness, offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We are grateful to Reviewer #3 for their thoughtful evaluation of our work. We appreciate their recognition that the diffusion decision model provides a novel analytical lens for understanding how negative affect influences the dynamics of food-related decision-making in bulimia nervosa. Their balanced assessment of both the methodological strengths of our design (counterbalancing, rigorous statistical corrections) and its limitations (sample size, mood induction efficacy) has been valuable in ensuring we appropriately contextualize our findings and their implications. Specifically, we have taken their comments regarding sample size and the relative efficacy of different mood induction methods seriously, and we address these important methodological considerations in our discussion of the study's limitations.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have addressed my previous comments, and I do not have any additional suggestions for improvement.

      We thank the reviewer for their time, effort, and insightful feedback.

      Reviewer #3 (Recommendations for the authors):

      The authors have adequately addressed my feedback. I have no further comments.

      We thank the reviewer for their time, effort, and insightful feedback.

    1. Author response:

      eLife Assessment

      Hoverflies are known for their sexually dimorphic visual systems and exquisite flight behaviors. This valuable study reports how two types of visual descending neurons differ between males and females in their motion- and speed-dependent responses, yet surprisingly, the behavior they control lacks any sexual dimorphism. The results convincingly support these findings, which will be of interest for studies of visuomotor transformations and network-level brain organization.

      This statement perfectly recapitulates our findings.

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary: 

      Hoverflies are known for a striking sexual dimorphism in eye morphology and early visual system physiology. Surprisingly, the male and female flight behaviors show only subtle differences. Nicholas et al. investigate the sensori-motor transformation of sexually dimorphic visual information to flight steering commands via descending neurons. The authors combined intra- and extracellular recordings, neuroanatomy, and behavioral analysis. They convincingly demonstrate that descending neurons show sexual dimorphisms, in particular at high optic flow velocities, while wing steering responses seem relatively monomorphic. The study highlights a very interesting discrepancy between neuronal and behavioral response properties.

      Thank you for this summary. Most of the statement perfectly recapitulates the main findings of our paper. However, we want to emphasize that some hoverfly flight behaviors are strongly sexually dimorphic, especially those related to courtship and mating. Indeed, only male hoverflies pursue targets at high speed, chase away territorial intruders, and pursue females for mating. However, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not sexually dimorphic. We will amend the Introduction to make the difference between flight behaviors clear.

      More specifically, the authors focused on two types of descending neurons that receive inputs from well-characterized wide-field sensitive tangential cells: OFS DN1, which receives inputs from so-called HS cells, and OFS DN2, which receives input from a set of VS cells. Their likely counterparts in Drosophila connect to the neck, wing, and haltere neuropils. The authors characterized the visual response properties of these two neuronal classes in both male and female hoverflies and identified several interesting differences. They then presented the same set of stimuli, tracked wing beat amplitude, and analyzed the sum and the difference of right and left wing beat amplitude as a readout of lift or thrust, and yaw turning, respectively. Behavioral responses showed little to no sexual dimorphism, despite the observed neuronal differences.
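      The two behavioral readouts described in this summary can be sketched as follows, assuming per-frame left and right wing beat amplitude (WBA) arrays. The function name, the left-minus-right sign convention, and the example values are assumptions for illustration, not the authors' analysis code.

```python
import numpy as np


def wing_steering_signals(left_wba, right_wba):
    """Combine left and right wing beat amplitudes (WBA) into two readouts.

    Returns (wba_sum, wba_diff): the sum tracks overall lift/thrust and the
    difference tracks yaw turning. Left-minus-right is an assumed sign
    convention; the opposite convention only flips the sign of wba_diff.
    """
    left = np.asarray(left_wba, dtype=float)
    right = np.asarray(right_wba, dtype=float)
    return left + right, left - right


# Hypothetical WBA traces (degrees), one sample per video frame
wba_sum, wba_diff = wing_steering_signals([50, 55, 60], [50, 52, 65])
```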

      Thank you for this very nice summary of our work. We want to clarify that LPTC input to DN1 and DN2 has not been shown directly in hoverflies using, e.g., dye coupling or dual recordings. Instead, the presumed HS and VS input is inferred from morphological and physiological DN evidence, and from comparisons to similar data in Drosophila and blowflies. We will amend the Introduction to clarify this. The rest of the paragraph perfectly recapitulates the main findings of our paper.

      Strengths:

      I find the question very interesting and the results both convincing and intriguing. A fundamental goal in neuroscience is to link neuronal responses and behavior. The current study highlights that the transformations, even at the level of descending neurons to motoneurons, are complex and less straightforward than one might expect.

      Thank you.

      Weaknesses:

      The authors investigated two types of descending neurons, but it was not clear to me how many other descending neurons are thought to be involved in wing steering responses to wide-field motion. I would suggest providing a more in-depth overview of what is known about hoverflies and Drosophila, since the conclusions drawn from the study would be different if these two types were the only descending neurons involved, as opposed to representing a subset of the neurons conveying visual information to the wing neuropil.

      This is a great point. There are around 1000 fly DNs, of which many could respond to widefield motion, without being specifically tuned to widefield motion. For example, many looming sensitive neurons also respond to widefield motion, and could therefore be involved in the WBA movements that we measured here. In addition, there are many multimodal neurons that could be involved in optomotor responses in free flight, but these may not have been stimulated when we only provided visual input. Furthermore, many visual neurons are modulated by proprioceptive feedback, which is lacking in immobilized physiology preps. Finally, in blowflies, up to 5 optic flow sensitive DNs have been identified morphologically, and in Drosophila 3 have been identified morphologically and physiologically. In summary, it is more than likely that other neurons project visual widefield motion information to the wing neuropil. We will amend our Introduction and Discussion to make this important point clear to the readers.

      Both neuronal classes have counterparts in Drosophila that also innervate neck motor regions. The authors filled the hoverfly DNs in intracellular recordings to characterize their arborization in the ventral nerve cord. In my opinion, these anatomical data could be further exploited and discussed a bit more: is the innervation in hoverflies also consistent with connecting to the neck and haltere motor regions? Are there any obvious differences and similarities to the Drosophila neurons mentioned by the authors? If the arborization also supports a role in neck movements, the authors could discuss whether they would expect any sexual dimorphism in head movements.

      These are all great points. We did not see any clear arborizations to the frontal nerve, where we would expect to find the neck motor neurons (NMNs). In addition, while we did see fine arborizations throughout the length of the thoracic ganglion, we saw no strong outputs projecting directly to the haltere nerve (HN). In the revised version of the MS we will modify figure 4 (morphological characterization) to clarify.

      There are important differences between the morphology of DN1 and DN2 in hoverflies and DNHS1 and DNOVS2 in Drosophila, in terms of their projections in the thoracic ganglion. For example, in Drosophila DNOVS2, there are several fine branches along the length of the neuron in the thoracic ganglia. Similarly, we found fine branches in Eristalis tenax DN2, but we also found a wide branch projecting to the area of the thoracic ganglion where the prothoracic and pterothoracic nerves likely get their inputs (Figure 4), suggesting that the neuron could contribute to controlling the wings and/or the forelegs (which is why we quantified the WBA). In Drosophila DNHS1, there is a similar thick branch to the prothoracic and pterothoracic nerves, which we also found in Eristalis tenax OFS DN1 (Figure 4). Indeed, while Drosophila DNHS1 and DNOVS2 have strikingly different morphology, DN1 and DN2 in Eristalis looked quite similar. We will modify the Results section to make this clear.

      In addition, to investigate this further, in the revised version of the MS we will include analysis of the movement of different body parts (including the head) to investigate the presence of any potential sexual dimorphism. Unfortunately, however, this will not include the halteres, as they cannot be seen well in the videos.

      Reviewer #2 (Public review):

      Summary:

      Many fly species exhibit male-specific visual behaviors during courtship, while little is known about the circuits underlying the dimorphic visuomotor transformations. Nicholas et al. focus on two types of visual descending neurons (DNs) in the hoverfly Eristalis tenax, a species in which only males exhibit high-speed pursuit of conspecifics. They combined electrophysiology and behavior analysis to identify these DNs and characterize their responses to a variety of visual stimuli in both male and female flies. The results show that the neurons in both sexes have similar receptive fields but exhibit speed-dependent dimorphic responses to different optic flow stimuli.

      This statement perfectly recapitulates the main findings of our paper. However, as mentioned above, while hoverfly flight behaviors related to courtship and mating are strongly sexually dimorphic, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not. We will amend the Introduction to make the difference between flight behaviors clear.

      Strengths:

      Hoverflies, though not a common model system, show very interesting dimorphic behaviors and provide a unique and valuable entry point to explore the brain organization behind sexual dimorphism. The findings here are not only interesting in their own right but will also likely inspire those working in other systems, particularly Drosophila.

      Thank you.

      The authors employed rigorous morphology, electrophysiology, and behavior methods to deliver a comprehensive characterization of the neurons in question. The precision of the measurements allowed for identifying a subtle and nuanced neuronal dimorphism and set a standard for future work in this area.

      Thank you.

      Weaknesses:

      Cell-typing using receptive field preferred directions (RFPDs): if I understood correctly, this classification method mostly relies on the local preferred directions (LPDs) near the center of the receptive field (median within the contour in Fig. 1). I have two concerns here. First, this method is great if we are certain there are only two types of visual DNs, as described in the manuscript. But how certain is this? Given the importance of vision in flight control, I would expect many DNs to transmit optic flow information to the motor center. I'd also like to point out that there are lobula plate tangential cells (LPTCs) other than HS and VS cells, which are much less studied and could potentially contribute to dimorphic behaviors.

      This is very true, and an important point. As mentioned above, up to 5 optic flow sensitive DNs have been identified morphologically in blowflies; however, whether these correspond to 5 different physiological types remains unclear. In both blowflies and Drosophila, 3 have been identified morphologically and physiologically (DNHS1, DNOVS1, DNOVS2). Importantly, in both blowflies and fruit flies, DNOVS1 gives graded responses and no action potentials, meaning that we would not be able to record from it using extracellular electrophysiology.

      We previously used clustering techniques to show that in Eristalis, we can reliably distinguish two types of optic flow sensitive DNs from extracellular electrophysiological data, based on a range of receptive field parameters, and we think that these correspond to DNHS1 and DNOVS2 in Drosophila (Nicholas et al, J Comp Physiol A, 2020, cited in paper). As mentioned above in response to Reviewer 1, this does not mean that there are no other neurons that could respond to widefield optic flow, and which might be involved in the WBA we recorded in the paper. However, the point of this paper was not to conclusively show that there are only two optic flow sensitive descending neurons. The point was to say that there are two quite distinct optic flow sensitive neurons that have similar receptive fields in males and females, while the responses to widefield motion show differences between males and females.

      We will modify the Introduction and Discussion to make these important points clear to the Reader, including the discussion of the 45-60 LPTCs that exist in the lobula plate, and what their role might be.

      Second, this method feels somewhat impoverished given the richness of the data. The authors have nicely mapped out the directional tuning for almost the entire visual field. Instead of reducing this measurement to 2 values (center and direction), I was wondering if there is a better method to fully utilize the data at hand to get a better characterization of these DNs. As the authors are aware, local features alone can be ambiguous in characterizing optic flows. What's more, taking into account more global features can be useful for discovering potentially new cell types.

      This is a great point, and we did an extensive analysis of other receptive field properties in this study (shown in supp fig 1). In addition, and as mentioned above, we have published a clustering analysis across receptive field properties of these neurons (Nicholas et al, J Comp Physiol A, 2020, cited in paper). The point that we attempted to make in this paper was that by using two strikingly simple metrics, we can reliably distinguish which of the two neuron types we are recording from (if we accept that there are two main types that we are likely to record from) simply based on location and overall directional preference. This makes automated analysis very easy and straightforward. Indeed, we now use this routinely to ID what neuron we are recording from, rather than making a human-based assumption.
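      The idea of reducing a receptive-field map to location plus overall directional preference can be sketched as below, assuming the map is stored as per-position local preferred directions (LPDs) and local motion sensitivities. This sketch uses a sensitivity-weighted circular mean rather than the median within a contour that the reviewer describes, and the 45° horizontal/vertical rule, the angle convention, and all names are hypothetical; it is not the authors' classification code.

```python
import numpy as np


def summarize_receptive_field(azimuth, elevation, lpd_deg, lms):
    """Reduce a receptive-field map to a center and an overall direction.

    azimuth, elevation : position of each local stimulus (degrees)
    lpd_deg            : local preferred direction at each position (degrees;
                         0 = rightward, 90 = upward is an assumed convention)
    lms                : local motion sensitivity at each position (used as
                         weights; negative values are clipped to 0, and at
                         least one weight must be positive)
    Returns (center_azimuth, center_elevation, overall_direction_deg).
    """
    az = np.asarray(azimuth, dtype=float)
    el = np.asarray(elevation, dtype=float)
    theta = np.deg2rad(np.asarray(lpd_deg, dtype=float))
    w = np.clip(np.asarray(lms, dtype=float), 0.0, None)

    # Receptive-field center: sensitivity-weighted mean position
    center_az = np.sum(w * az) / np.sum(w)
    center_el = np.sum(w * el) / np.sum(w)

    # Overall direction: sensitivity-weighted circular mean of the LPDs
    # (averaging unit vectors avoids the 359-vs-1 degree wrap-around problem)
    mean_dir = np.arctan2(np.sum(w * np.sin(theta)), np.sum(w * np.cos(theta)))
    return center_az, center_el, np.rad2deg(mean_dir) % 360.0


def dominant_axis(direction_deg):
    """Hypothetical rule: label a direction as mostly horizontal or vertical."""
    d = direction_deg % 180.0
    return "horizontal" if d < 45.0 or d >= 135.0 else "vertical"
```

      Averaging unit vectors rather than raw angles matters here because preferred directions are circular: a naive mean of 350° and 10° would give 180°, the opposite of the true preference.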

      However, we agree that further in depth analysis is warranted. Therefore, to address this, we will provide additional receptive field analysis and clustering in the revised version of the MS. In addition, we want to highlight that all data is uploaded to DataDryad for anyone interested in doing additional in-depth analyses.

      At line 131, it wasn't clear to me why full-screen stimuli were used for comparison here, instead of the full receptive field maps. Male flies exhibit sexually dimorphic behaviors only during courtship, which would suggest that small-sized visual stimuli (mimicking an intruder or female conspecific) would be better suited to elicit dimorphic neuronal responses. A similar comment applies to the later results. Based on the receptive field mapping in Figure 1, I'm under the impression that these 2 DN types are better suited to detect wide-field optic flow, such as that induced by self-motion, as mentioned in the manuscript. The results are still very interesting, but it's good to make this point clear early on to help set appropriate expectations. Conversely, this would also suggest that there are other visual DN types that are responsible for the courtship-related sexually dimorphic behaviors.

      Thank you for mentioning these important points. Our reasoning for using full-screen stimuli for the analysis on line 131 was that since we used the small sinusoidal gratings for mapping the receptive fields, and to subsequently classify the neurons, it would be unfair to use the same data to investigate potential sexual dimorphism. I.e., we selected neurons that fulfilled certain criteria, and then we cannot rightfully use the same criteria to determine differences. This was not explicitly mentioned in the paper, so we will modify the text to make this clear to the Reader.

      However, in Supp Figure 1d/e we show that there are no striking receptive field differences between males and females in terms of receptive field center nor directional preference. In Supp Figure 1f we show that there is no difference between male and female receptive field height and width. We will modify the text to draw the Reader’s attention to this figure, and also mention the additional analysis done in response to the comment above.

      As a side note, I personally expected at least DNHS1 to have a smaller receptive field in males, as the hoverfly HSN is strikingly sexually dimorphic (Nordström et al, Curr Biol 2008), and also very sensitive to small objects. However, while optic flow sensitive DNs do respond to small objects (see e.g. the J Comp Physiol paper mentioned above) we did not detect any obvious sexual dimorphism in receptive field properties. Indeed, we think that a different subset of DNs control target pursuit behavior (target selective DNs (TSDNs)). This will be addressed in the modified version of the paper.

    1. 12.7. Activity: Value statements in what goes viral
12.7.1. Choose three scenarios
When content goes viral there may be many people with a stake in its going viral, such as: The person (or people) whose content or actions are going viral, who might want attention, or get financial gain, or might be embarrassed or might get criticism or harassment, etc. Different people involved might have different interests. Some may not be aware of it happening at all (like a video of an infant). Different audiences might have interests such as curiosity, a desire to bring justice to a situation, a desire to get attention for themselves or their ideas by engaging with the viral content, or a desire to troll or harass others. Social networking platforms might have interests such as increased attention to their platform, increased advertising, or an increased or decreased reputation (in the view of different audiences). List at least three different scenarios of content going viral and list out the interests of different groups and people in the content going viral.
12.7.2. Create value statements
Social media platforms have some ability to influence what goes viral and how (e.g., recommendation algorithms, what actions are available, what data is displayed, etc.), though they only have partial control, since human interaction and organization also play a large role. Still, regardless of whether we can force any particular outcome, we can still consider what you think would be best for what content should go viral, how much, and in what ways. Create a set of value statements for when and how you ideally would want content to go viral. Try to come up with at least 10 value statements. We encourage you to consider different ethics frameworks as you try to come up with ideas.

      This section clearly shows that virality isn’t neutral and always involves tradeoffs between different groups. I liked how the examples highlight that what benefits platforms or audiences can still harm individuals, especially through misinformation or loss of privacy. It also made me think more about how recommendation systems should reflect ethical values, not just engagement metrics.

    1. 11.4.1. Filter Bubbles
One concern with recommendation algorithms is that they can create filter bubbles (or “epistemic bubbles” or “echo chambers”), where people get filtered into groups and the recommendation algorithm only gives people content that reinforces and doesn’t challenge their interests or beliefs. These echo chambers allow people in the groups to freely have conversations among themselves without external challenge. The filter bubbles can be good or bad, such as forming bubbles for: hate groups, where people’s hate and fear of others gets reinforced and never challenged; fan communities, where people’s appreciation of an artist, work of art, or something else is assumed, and then reinforced and never challenged; and marginalized communities, which can find safe spaces where they aren’t constantly challenged or harassed.
11.4.2. Amplifying Polarization and Negativity
There are concerns that echo chambers increase polarization, where groups lose common ground and the ability to communicate with each other. In some ways echo chambers are the opposite of context collapse: contexts are created and prevented from collapsing. Others, though, have argued that people do interact across these echo chambers, but that the contentious nature of their interactions increases polarization. Along those lines, if social media sites simply amplify content that gets strong reactions, they will often amplify the most negative and polarizing content. Recommendation algorithms can make this even worse. For example: At one point, Facebook counted the default “like” reaction less than the “anger” reaction, which amplified negative content.
On Twitter, one study found (full article on archive.org): “Whereas Google gave higher rankings to more reliable sites, we found that Twitter boosted the least reliable sources, regardless of their politics.” According to another study on Twitter: “An analysis […] suggested that when users swarm tweets to denounce them with quote tweets and replies, they might be cueing Twitter’s algorithm to see them as particularly engaging, which in turn might be prompting Twitter to amplify those tweets. The upshot is that when people enthusiastically gather to denounce the latest Bad Tweet of the Day, they may actually be ensuring more people see it than had they never decided to pile on in the first place. That possibility raises serious questions of what constitutes responsible civic behavior on Twitter and whether the platform is in yet another way incentivizing combative behavior.” Although this is a big concern for Internet-based social media, traditional media sources also play into it; see, for example, the study reported as “Cable news has a much bigger effect on America’s polarization than social media, study finds.” Note: polarization itself is not necessarily bad (do we want to make everyone believe the exact same thing?), and some argue that in some situations polarization is even a good thing.
11.4.3. Radicalization
Building off of the amplification of polarization and negativity, there are concerns (and real examples) of social media (and their recommendation algorithms) radicalizing people into conspiracy theories and into violence.
Rohingya Genocide in Myanmar
A genocide of the Rohingya people in Myanmar started in 2016, and in 2018 Facebook admitted it was used to ‘incite offline violence’ in Myanmar. In 2021, the Rohingya sued Facebook for £150bn over how Facebook amplified hate speech and didn’t take down inflammatory posts.
The Flat Earth Movement
The flat earth movement (an absurd conspiracy theory that the earth is actually flat, and not a globe) gained popularity in the 2010s. As YouTuber Dan Olson explains it in his (rather long) video In Search of a Flat Earth: “Modern Flat Earth [movement] was essentially created by content algorithms trying to maximize retention and engagement by serving users suggestions for things that are, effectively, incrementally more concentrated versions of the thing they were already looking at. Bizarre cranks peddling random theories are an aspect of civilization that has always been with us, so it was inevitable that they would end up on YouTube, but the algorithm made sure they found an audience. These systems were accidentally identifying people susceptible to conspiratorial and reactionary thinking and sending them increasingly deeper into Flat Earth evangelism.” Dan Olson then explained that by 2020, the flat earth content was getting fewer views: “The bottom line is that Flat Earth has been slowly bleeding support for the last several years. Because they’re all going to QAnon.” See also: YouTube aids flat earth conspiracy theorists, research suggests.
11.4.4. Discussion Questions
What responsibilities do you think social media platforms should have in regards to larger social trends? Consider impact vs. intent. For example, consequentialism only cares about the impact of an action. How do you feel about the importance of impact and intent in the design of recommendation algorithms? What strategies do you think might work to improve how social media platforms use recommendations?
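To make the amplification mechanism above concrete, here is a minimal Python sketch of an engagement-weighted feed ranking. It assumes posts are scored as a weighted sum of reaction counts; the reaction names, the weight values, and the functions `engagement_score` and `rank_feed` are hypothetical and do not describe any real platform's system. Only two ideas come from the text: that an "anger" reaction could count for more than a "like", and that denunciations such as quote tweets and replies still register as engagement.

```python
from dataclasses import dataclass, field

# Hypothetical weights: "anger" outweighs "like", and replies/quotes count as
# engagement whether they praise or denounce the post.
REACTION_WEIGHTS = {"like": 1.0, "anger": 5.0, "reply": 2.0, "quote": 2.0}


@dataclass
class Post:
    post_id: str
    reactions: dict[str, int] = field(default_factory=dict)  # reaction type -> count


def engagement_score(post: Post, weights: dict[str, float] = REACTION_WEIGHTS) -> float:
    """Weighted sum of reaction counts; reaction types without a weight count as zero."""
    return sum(weights.get(kind, 0.0) * count for kind, count in post.reactions.items())


def rank_feed(posts: list[Post]) -> list[Post]:
    """Order posts by engagement score, highest first (ties keep input order)."""
    return sorted(posts, key=engagement_score, reverse=True)


calm = Post("calm", {"like": 100})                       # score 100
outrage = Post("outrage", {"like": 10, "anger": 30})     # score 10 + 150 = 160
pile_on = Post("bad_tweet", {"quote": 30, "reply": 30})  # score 60 + 60 = 120
print([p.post_id for p in rank_feed([calm, outrage, pile_on])])
# ['outrage', 'bad_tweet', 'calm']
```

With every weight set to 1.0 the order becomes calm, bad_tweet, outrage, which is the kind of design lever the discussion questions ask about.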

      This section does a great job showing how recommendation algorithms can unintentionally amplify polarization and even contribute to radicalization. The examples (Facebook reactions, Twitter quote-tweet dynamics, and the flat earth → QAnon pipeline) clearly illustrate how engagement-based systems can reward negativity and extreme content. I also appreciate the nuance at the end that polarization itself isn’t always bad, which keeps the discussion balanced rather than alarmist. Overall, this is a clear, well-supported explanation of why algorithmic design choices have serious social consequences beyond individual user intent.

    1. The Examined Life is Wise Living: The Relationship Between Mindfulness, Wisdom, and the Moral Foundations. Published in: Journal of Adult Development, Dec 2020. By: Verhaeghen, Paul.
This correlational study of two independent samples (260 college students and 173 Mechanical Turk workers aged 21–74) examined whether and how mindfulness (broadly construed as a manifold of self-awareness, self-regulation, and self-transcendence) influences wisdom about the self (Adult Self-Transcendence Inventory and Self-Assessed Wisdom Scale) and wisdom about the (social) world (Three-Dimensional Wisdom Scale), and how mindfulness and wisdom impact ethical sensitivities (the five moral foundations). Mindfulness predicted wisdom about the self, and wisdom about the self was linked to an emphasis on the individualizing moral foundations of care/harm avoidance and fairness and, to a lesser degree, on the binding moral foundations of loyalty, authority, and purity. Wisdom about the (social) world was not associated with either mindfulness or the moral foundations. Age was a significant positive predictor for wisdom about the self once the self-awareness component of mindfulness was taken into account.
Keywords: Wisdom; Mindfulness; Moral foundations; Ethics
This paper investigates the links between trait mindfulness, wisdom, and ethical sensitivities (operationalized as sensitivity to the five moral foundations) in two independent samples, one of college students and one of adults spanning ages 21–74. Two principal ideas guided the study. The first idea is that wisdom, whether one conceptualizes it as a form of expertise or as a virtue or personality characteristic, might be well served by the specific quality or qualities of attention the individual brings to their experiences.
It makes sense to expect that a habitual mindful attitude (i.e., taking an open, non-judgmental, reflective, self-regulatory, and sometimes self-transcendent stance towards life) might be a good indicator or exemplifier of such qualities. The second idea is that most, if not all, current adult-developmental theories consider wisdom to be of practical consequence, in the sense that wise people are expected to generally display prosocial attitudes and behavior (for a review, see Bangen et al. [10]). Consequently, one might expect this wise stance to give rise to ethical sensitivities that are compatible with the characteristics of wisdom (as defined within these theories).
Wisdom
It is probably fair to say that within the field of psychology the study of wisdom started from an adult development perspective (e.g., Clayton and Birren [20]; Erikson [26]; Kramer [44]; Pascual-Leone [54]). Initial conceptualizations tended to view wisdom primarily from a cognitive angle, that is, as an advanced form of postformal thought. For instance, Baltes and Staudinger ([9]) define wisdom as 'expertise in the conduct and meaning of life' (p. 124). In this approach, wisdom is conceptualized as a form of crystallized intelligence, more specifically 'expert knowledge in the fundamental pragmatics of life that permits exceptional insight, judgment, and advice about complex and uncertain matters' (Pasupathi et al. [56], p. 351). Other approaches—Glück and Bluck ([31]) label these 'integrative views'—have supplemented this cognitive view by additionally emphasizing the reflective, affective, and conative qualities of the wise person, making wisdom more akin to a personality characteristic or a virtue (e.g., Ardelt [3]; Mitchell et al. [52])—wisdom as 'personal, concrete, applied, and involved' (Ardelt [3], p. 262). The different conceptualizations of wisdom do have a common core. From a review of 24 different key theories or definitions of wisdom, Bangen et al.
([10]) concluded that five subcomponents were present in at least half of the papers: (a) social decision making and pragmatic knowledge of life; (b) prosocial attitudes and values; (c) reflection and self-understanding (including a desire to learn); (d) acknowledgement of and coping with uncertainty; and (e) emotional homeostasis. Although there are qualitative, performance-based measures of wisdom, such as the Berlin wisdom paradigm (Baltes and Smith [8]), where participants describe how they would solve a particular life problem and answers are scored along a series of dimensions, self-report measures were used here, simply because quantitative measures allow for more efficient data collection and scoring, which in turn makes it possible to query a larger sample of respondents. Specifically, I used the three quantitative self-report measures for wisdom recommended by Glück ([30]), Glück et al. ([34]), and Staudinger and Glück ([64]): Ardelt's Three-Dimensional Wisdom Scale (3D-WS; [2]), Levenson's Adult Self-Transcendence Inventory (ASTI; Levenson et al. [47]), and Webster's Self-Assessed Wisdom Scale (SAWS; [71], [72]). These three scales have different emphases. The 3D-WS measures wisdom as the integration of cognitive, reflective, and affective/compassionate personal characteristics; the SAWS gauges five dimensions, namely critical life experience, emotional regulation, reminiscence and reflectiveness, humor, and openness; the ASTI taps into self-transcendent wisdom, defined as a self-expansive process entailing decreased self-concern and increased empathy, understanding, spirituality, and feelings of connectedness with past and future generations. Not all of these scales cover all five subcomponents mentioned above: Arguably, the 3D-WS does; the SAWS covers social decision making, self-reflection, and emotional homeostasis; and the ASTI includes items about prosocial attitudes, self-reflection, and emotional homeostasis. Glück et al.
([34]) and Staudinger and Glück ([64]) additionally make a distinction between personal and general wisdom. The former refers to a person's insight into themselves and their own lives; the latter to insights into life and the world in general. The assumption is that personal wisdom is obtained through actual personal experience, whereas general wisdom does not have personal experience as a necessary condition. In Glück's conceptualization, all three scales mentioned above measure personal wisdom; only performance-based measures tap into general wisdom. Glück et al. ([34]) also posit a third, often underappreciated facet of wisdom, namely other-related wisdom, which they define as 'an empathy-based caring concern for both concrete other people and humankind at large' (p. 5); it is most evident in two of the three 3D-WS scales, namely the cognitive and reflective scales, and is possibly a subcomponent of personal wisdom. In (partial) confirmation of this view, Glück et al. found that all three 3D-WS scales loaded on a different factor than the two other quantitative scales. Given that the cognitive scale of the 3D-WS contains items that are indeed about the other (e.g., 'People are either good or bad' and 'You can classify almost all people as either honest or crooked'—both items are reverse-scored), but also items that are often general and external (e.g., 'Ignorance is bliss' and 'It is better not to know too much about things that cannot be changed'—both items are reverse-scored), it seems to us that this dimension could be labeled more accurately as 'wisdom about the (social) world', in contrast with the 'wisdom about the self' tapped in personal-wisdom scales.
Mindfulness
Mindfulness is often defined as a particular way of paying attention—the ability or propensity to engage in "nonelaborative, non-judgmental, present-centered awareness in which each thought, feeling, or sensation that arises in the attentional field is acknowledged" (Bishop et al. [12], p.
232); this awareness requires cultivation (Nilsson and Kazemi [53]). One corollary is that "thought or events are observed as events in the mind without over-identifying with them and without reacting to them in an automatic, habitual pattern of reactivity", thus "introducing a 'space' between one's perception and response" and allowing one "to respond to situations more reflectively (as opposed to reflexively)" (Bishop et al. [12], p. 232). Mindfulness has been found to be broadly beneficial to the individual—mindfulness interventions lead to positive outcomes regarding stress, well-being, anxiety, depression, negative emotions, emotion regulation, rumination, self-compassion, and empathy (Eberth and Sedlmeier [25]; Verhaeghen [68]). These relationships are at least partially causal: changes in dispositional mindfulness after meditation training correlate with changes in self-perceived stress, anxiety, depressed mood, positive affect, negative affect, rumination, and general well-being (Gu et al. [40]; Khoury et al. [43]). Recent theoretical work within the field has converged on the conclusion that mindfulness is a complex concept, more akin to a manifold (or even a cascade of processes) than to a singular construct. The starting point of this work has been an examination of the reasons why mindfulness interventions lead to such a wide array of positive outcomes. Many models have been advanced to explain the translation of mindfulness into positive outcomes (e.g., Baer [5]; Brown et al. [16]; Chiesa et al. [19]; Creswell and Lindsay [21]; Grabovac et al. [35]; Hölzel et al. [42]; Segal et al. [59]; Shapiro et al. [60]; Vago and Silbersweig [67]), each with its own emphases and levels of complexity. Although details of the different proposed models vary, the list of proposed mechanisms generally contains three categories, as Vago and Silbersweig ([67]) point out. A first proposed mechanism is a change in self-awareness.
This involves recognizing automatic habits and automatic patterns of reactivity, as well as an increased awareness of momentary states of body and mind—what is typically meant by mindfulness. A second proposed mechanism is a change in self-regulation. This includes better regulation of emotions, heightened self-compassion, increased emotional and cognitive flexibility, decreased rumination and worry, and increased nonattachment and acceptance. A final proposed mechanism is increased self-transcendence. This implies increased decentering, a stronger awareness of interdependence between self and others, and heightened compassion. Vago and Silbersweig label this common-denominator model the S-ART model, after its three components: self-awareness, self-regulation, and self-transcendence. Our own empirical work on the subject (Verhaeghen [69]; Verhaeghen and Aikman [70]), based on exploratory and confirmatory factor analysis as well as structural equation modeling on three independent samples of about 300 subjects each, has indeed confirmed the plausibility of this S-ART mindfulness manifold, suggesting a flow of influence from self-awareness over self-regulation to self-transcendence, and then outward to well-being and other aspects of psychological health (for a schematic representation, see Fig. 1). Factor analysis showed that additional subdivisions were present within the components of self-awareness and self-regulation: self-awareness incorporated reflective awareness (the more active, deliberate, probing aspect of mindfulness) and controlled sense-of-self in the moment (the more passive, equanimous, non-judgmental aspect of mindfulness) (for more details on these components and how they are measured, see the "Methods" section below); self-regulation was tapped by (the opposite of) self-preoccupation and by self-compassion.
Fig. 1: The S-ART mindfulness manifold as obtained in Verhaeghen ([69]).
Mindfulness and Wisdom
There are obvious points of contact between this conceptualization of mindfulness and those of wisdom, suggesting they operate in the same nomological space. First, some of the common-core wisdom subcomponents align with the mindfulness manifold. Clearly, the reflection and self-understanding subcomponent of common-core wisdom has a natural affinity (if not identity) with the reflective awareness component in the mindfulness manifold. A few examples from specific theories illustrate this quite nicely. For instance, Ardelt ([3]) explicitly claims that '[t]he development of wisdom requires the transcendence of one's subjectivity and projections, which can be accomplished through self-examination, self-awareness, and a reflection on one's own behavior and one's interactions with others' (p. 269). Likewise, Glück and Bluck's ([32]) MORE (mastery, openness, reflectivity, and emotion regulation) model of wisdom posits that wisdom-related knowledge develops through an interaction of life experiences with the four MORE resources, and that therefore wisdom should manifest itself in how people reflect upon past experiences. As a third example, Brown and Greene's model of Wisdom Development ([14]) states that wisdom ripens when individuals go through a core 'learning-from-life' process, composed of reflection, integration, and application. Pascual-Leone ([55]), as a final example, considers meditation (one possible cultivator of mindfulness) as a path towards wisdom, through its fostering of insight, self-insight, and self-transcendence. Second, emotional homeostasis can be understood as an aspect or outcome of self-regulation. Third, some wisdom researchers explicitly view self-transcendence as a critical component of wisdom (see the Ardelt quote above; also Curnow [22]; Levenson [46]). There are a few empirical indications of a mindfulness-wisdom link as well.
One study (Brienza et al. [13]) used its own process-based measure of wisdom, and found correlations with mindfulness scales, especially observing and orienting. Two studies used a training approach to foster wisdom by incorporating mindfulness either explicitly (Sharma and Dewangan [61]) or implicitly (as reflective awareness through a self-reflection journal and a life experience journal; Bruya and Ardelt [17]). The former study did not find intervention effects on either mindfulness or wisdom, but did find significant correlations at pretest between mindfulness (measured by the Mindful Attention Awareness Scale, MAAS; Brown and Ryan [15]) and the affective and reflective components of wisdom. The latter study obtained an intervention effect of the reflective exercises over and beyond those of attending a cognitively oriented class on wisdom, but did not include a measure of mindfulness to verify the proximal cause of the effect. These intervention studies, then, are somewhat suggestive of (but far from definitive about) a positive relationship between mindfulness and wisdom.
Wisdom and Ethical Sensitivities
The psychological study of ethical sensitivities and attitudes (e.g., Greene [37]; Haidt [41]) has converged on the conclusion that ethical actions are not always the product of the careful application of rational thought, but instead tend to be largely (although not exclusively) based on intuitions—evolved, automatic responses, inaccessible to awareness, which sometimes operate in contradiction with logical constraints. Researchers in this field often consider the vessels for these intuitions to be innate—for instance, Haidt's Moral Foundations Theory (MFT; Graham et al. [36]) posits that ethical sensitivities ultimately boil down to the five dimensions of promoting care/avoiding harm, fairness, ingroup loyalty, (respect for) authority, and purity (or sanctity).
The former two are often combined into an 'individualizing' foundation, because they focus on the provision and protection of individual rights; the remaining three into a 'binding' foundation, because they focus on ingroup cohesion. The idea is that every individual is sensitive to these five aspects, but that the intuitions themselves are built through experience, and are thus open to individual and cultural differences through a tuning up or down of the emotional responses due to experiences that fit into these vessels (Flanagan and Williams [28]). In our previous study (Verhaeghen and Aikman [70]), where we adopted the Moral Foundations framework, we found clear links between the mindfulness manifold and ethical sensitivities, which might be mediated through wisdom. Specifically, we found that reflective awareness and self-transcendence were directly related to the individualizing aspects of morality (i.e., an emphasis on care and fairness); only self-transcendence was related to the binding aspects of morality (i.e., an emphasis on loyalty, authority, and sanctity). One reason to suspect that wisdom might play a role in the individualizing foundation stems from its very definition—prosocial attitudes and values are the second most cited key component in Bangen et al.'s ([10]) literature review (21 out of 24 theories or models incorporated this component). A key mechanism may be the self-transcendent character of wisdom, which it has in common with mindfulness. There are empirical reasons to suspect that wisdom is implicated in moral attitudes (for a review of empirical and theoretical links between wisdom and ethics, see Sternberg and Glück [65]). For instance, wisdom has been found to correlate positively with other-oriented values such as the well-being of friends, societal engagement, and ecological protection (Kunzmann and Baltes [45]; Webster [73]).
Implicit lay theories of wisdom also include value orientations that align, in Haidt's model, with care and fairness (Glück et al., submitted).
The Present Study
The literature reviewed suggests that mindfulness, wisdom, and ethical sensitivities are related, but the pieces of this puzzle have not yet been fit together. One wide-open question is how the different components of mindfulness, broadly defined as self-awareness, self-regulation, and self-transcendence, relate to wisdom; another is whether (or how) wisdom might be a mediator translating, and perhaps crystallizing, mindfully experienced events into ethical attitudes. From the literature reviewed above, I expect that all three aspects of mindfulness will be positively related to wisdom. To assess wisdom, I used the three scales most commonly used in quantitative research—the 3D-WS, the ASTI, and the SAWS. After Glück et al. ([34]), I expect that a factor analysis of these measures will yield two dimensions: wisdom about the self (ASTI and SAWS) and wisdom about the (social) world (3D-WS). Given that mindfulness is primarily associated with knowledge of the self, I would expect that the mindfulness-wisdom connection would be stronger for wisdom about the self than for wisdom about the (social) world. Extending our prior work on mindfulness and ethical sensitivities, as well as building on Glück et al. (submitted), I expect that wisdom will be positively connected to the individualizing moral foundations—care and fairness. For the binding foundations—authority, loyalty, and sanctity/purity—the connection is likely less strong. Because wisdom is very often considered an aspect of adult development, I included a group of adults sampled across a large sweep of the adult life span (Sample B, age 21–74), aside from the more usual sample of college students (Sample A).
Adding the adult sample (Sample B) allows me, first, to check whether the results from the student sample replicate, and second, to test whether any of the wisdom or ethical components are age-sensitive, as has sometimes been claimed (e.g., Ardelt [1]; Baltes and Kunzmann [7]; but see, e.g., Grossmann and Kross [39]; Mickler and Staudinger [51]).
Methods
Participants
Sample A consisted of 260 undergraduate students from the Georgia Institute of Technology, who received course credit in return for their participation. They were invited to participate in a study on 'mindfulness, acceptance, and psychology'. They were aged 18–26 (mean = 19.7, SD = 1.5); 54% were women. Sample B consisted of 173 participants recruited from Mechanical Turk. They were invited to participate in a study on 'mindfulness, acceptance, and psychology', and offered $4 in return for their time. Workers needed to be highly qualified in order to participate—more than 5000 Human Intelligence Tasks (HIT; i.e., surveys or other online tasks) completed to the requesters' satisfaction, and at least 98% of all lifetime HITs approved by the requester. They were aged 21–74 (mean = 39.8, SD = 11.7); 44% were women. The age distribution was as follows: age 21–30: 38 participants; age 31–40: 69 participants; age 41–50: 33 participants; age 51–60: 18 participants; age 61–74: 12 participants. On average, participants had completed 14.9 years of education (SD = 1.9). Although Mechanical Turk is generally considered to be a useful, valid, and reliable tool for behavioral researchers (e.g., Mason and Suri [49]), we found it prudent to assess potential differences in data quality between the two samples. We did this by comparing Cronbach's α values for all subscales (see the "Measures and Procedure" section below for all α values). Sample B (Mechanical Turk) tended to have higher reliability values (median = 0.84, ranging from 0.41 to 0.93) than Sample A (students) (median = 0.71, ranging from 0.48 to 0.90).
The correlation between the two samples' Fisher z-transformed reliability values was 0.78 (this transformation was applied to linearize the measurement scale), suggesting that both groups were about equally sensitive to differences in the item characteristics that drive reliability.
Measures and Procedure
Participants filled out all questionnaires online; they took about 45–60 min to complete. Below, questionnaires are grouped thematically; the mindfulness measures (i.e., self-awareness, self-regulation, and self-transcendence) are presented as they resulted from the set of factor analyses (an exploratory analysis on 488 participants, and a confirmatory analysis on an independent sample of 222 participants) in Verhaeghen ([69]); this structure was replicated in Verhaeghen and Aikman ([70]). All measures were collected from both samples. Cronbach's α values reported are the values obtained in the present study, reported separately for Samples A and B, respectively. Note that some scales (notably the subscales of the Self-Compassion Scale) contain a very small number of items, possibly depressing the α values.
Control Variables
The Mini-IPIP (Donnellan et al. [23]) is a 20-item measure of the Big Five personality factors, 4 items for each factor: Extraversion (sample item: 'I am the life of the party', Cronbach's α = 0.83 and 0.87), Agreeableness (sample item: 'I sympathize with others' feelings', Cronbach's α = 0.77 and 0.85), Conscientiousness (sample item: 'I get chores done right away', Cronbach's α = 0.68 and 0.78), Openness (which the IPIP labels Intellect/Imagination; sample item: 'I have a vivid imagination', Cronbach's α = 0.71 and 0.84), and Neuroticism (sample item: 'I have frequent mood swings', Cronbach's α = 0.74 and 0.78). Additionally, participants were asked for their age and gender.
Social Conservatism
Social conservatism was measured via the Social Conservatism subscale (6 items; sample item: 'Please indicate the extent to which you feel positive or negative towards each issue: ... Abortion'; Cronbach's α = 0.62 and 0.69) of the Social and Economic Conservatism Scale (SECS; Everett [27]).
Self-awareness
Two constructs were assessed within self-awareness. The first, reflective awareness, is the unit-weighted composite of the z-scores of three scales: (a) the Observing subscale of the Five Facets Mindfulness Questionnaire (FFMQ; Baer et al. [6]) (8 items; sample item: 'When I'm walking, I deliberately notice the sensations of my body moving', Cronbach's α = 0.73 and 0.87); (b) the Reflectiveness subscale of the Broad Rumination Scale (BRS; Trani et al. in preparation) (4 items; sample item: 'It is important for me to understand why I feel a certain way', Cronbach's α = 0.81 and 0.81); and (c) the Search for Insight/Wisdom subscale of the Aspects of Spirituality scale (ASP; Büssing et al. [18]) (7 items; sample item: 'I strive for insight and truth', Cronbach's α = 0.84 and 0.90). In both samples, the composite was normally distributed, as ascertained via a Kolmogorov–Smirnov test (p > 0.2). The second construct, controlled sense-of-self in the moment, is the unit-weighted composite of the z-scores of three scales: (a) the Acting with Awareness subscale of the FFMQ (8 items; sample item: the reverse of 'When I'm doing things, my mind wanders off and I'm easily distracted', Cronbach's α = 0.87 and 0.91); (b) the Sense-of-self Scale (SOSS; Flury and Ickes [29]) (12 items; sample item: 'I have a clear and definite sense of who I am and what I'm all about'; Cronbach's α = 0.86 and 0.88); and (c) the Non-judging of inner experience subscale of the FFMQ (8 items; sample item: the reverse of 'I criticize myself for having irrational or inappropriate emotions', Cronbach's α = 0.90 and 0.93).
In both samples, the composite was normally distributed, as ascertained via a Kolmogorov–Smirnov test (p > 0.2).
Self-regulation
Two constructs were assessed within self-regulation. The first, self-preoccupation, is the unit-weighted composite of the z-scores of two subscales from the BRS, namely Compulsivity (5 items; sample item: 'When I start to worry, it's very hard for me to stop', Cronbach's α = 0.79 and 0.87) and Worrying (3 items; sample item: 'Uncertainty about the future bothers me', Cronbach's α = 0.58 and 0.68), as well as two subscales from the Self-Compassion Scale, Short Form (SCS; Raes et al. [57]), namely Isolation (2 items; sample item: 'When I'm feeling down, I tend to feel like most other people are probably happier than I am', Cronbach's α = 0.56 and 0.63) and Over-Identified (2 items; sample item: 'When I fail at something important to me I become consumed by feelings of inadequacy', Cronbach's α = 0.66 and 0.58). In both samples, the composite was normally distributed, as ascertained via a Kolmogorov–Smirnov test (p > 0.2). In our previous work, as here, self-preoccupation correlated negatively with other aspects of mindfulness, as one would expect: better self-regulation implies lower, not higher, levels of self-preoccupation. Although this may confuse some readers, we kept the self-preoccupation label because the construct is measured by scales that tap explicitly into self-preoccupation itself, not its absence or opposite.
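Several constructs in this section are unit-weighted composites of z-scores. A minimal sketch of that scoring step follows. It assumes each scale is standardized within one sample using that sample's mean and sample (n − 1) standard deviation, and it averages the z-scores; summing them instead would only rescale the composite, leaving correlations and standardized coefficients unchanged. The function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize one scale within a sample (sample SD, n - 1 denominator)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def unit_weighted_composite(scales: list[list[float]]) -> list[float]:
    """Average of the z-scored scales, with every scale weighted equally.

    `scales` holds one list of scores per scale, respondents in the same order.
    """
    standardized = [z_scores(scale) for scale in scales]
    n = len(standardized[0])
    return [mean(z[i] for z in standardized) for i in range(n)]
```

For self-preoccupation, `scales` would hold the Compulsivity, Worrying, Isolation, and Over-Identified scores, each keyed so that higher values mean more self-preoccupation, as described above.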
The second, self-compassion, was measured as the unit-weighted composite of the z-scores of three subscales from the SCS, namely Self-Kindness (2 items; sample item: 'I try to be understanding and patient towards those aspects of my personality I don't like', Cronbach's α = 0.61 and 0.60), Common humanity (2 items; sample item: 'I try to see my failings as part of the human condition', Cronbach's α = 0.49 and 0.57), and Mindfulness (2 items; sample item: 'When something painful happens I try to take a balanced view of the situation', Cronbach's α = 0.66 and 0.68), as well as the Decentering subscale of the Experiences Questionnaire (EQ; Fresco et al. 2007) (13 items; sample item: 'I am better able to accept myself as I am'; Cronbach's α = 0.84 and 0.93). The composite was normally distributed in Sample A, Kolmogorov–Smirnov = 0.042, p > 0.2, but not in Sample B, Kolmogorov–Smirnov = 0.075, p = 0.034.
Self-transcendence
Self-transcendence was measured as the unit-weighted composite of the z-scores of two subscales from the Dispositional Positive Emotion Scale (DPES; Shiota et al. [62]), namely Joy (6 items; sample item: 'I am an intensely cheerful person', Cronbach's α = 0.84 and 0.90) and Love (6 items; sample item: 'I develop strong feelings of closeness to people easily', Cronbach's α = 0.82 and 0.90), and one subscale from the Resilience Scale (RS; Lundman et al. [48]), namely Meaningfulness (7 items; sample item: 'My life has meaning', Cronbach's α = 0.81 and 0.91). The composite was normally distributed in Sample A, Kolmogorov–Smirnov = 0.042, p > 0.2, but not in Sample B, Kolmogorov–Smirnov = 0.072, p = 0.046.
Moral Foundations
This construct was measured using the 5 subscales of the Moral Foundations Questionnaire (Graham et al. [36]): (a) Care/harm (6 items; sample item: 'When you decide whether something is right or wrong, to what extent are the following considerations relevant to your thinking? – Whether or not someone suffered emotionally'; Cronbach's α = 0.52 and 0.76); (b) Fairness (6 items; sample item: '... Whether or not some people were treated differently than others'; Cronbach's α = 0.56 and 0.64); (c) Ingroup loyalty (6 items; sample item: '... Whether or not someone's action showed love for his or her country'; Cronbach's α = 0.48 and 0.84); (d) Authority (6 items; sample item: '... Whether or not someone showed a lack of respect for authority'; Cronbach's α = 0.61 and 0.85); and (e) Purity (6 items; sample item: '... Whether or not someone violated standards of purity and decency'; Cronbach's α = 0.69 and 0.92).
Wisdom Scales
Participants filled out three self-report wisdom surveys. The Adult Self-Transcendence Inventory (ASTI; Levenson et al. [47]) measures, in the words of the authors, "a decreasing reliance on externals for definition of the self, increasing interiority and spirituality, and a greater sense of connectedness with past and future generations" (p. 127). After factor analysis, Levenson et al. derived a more focused self-transcendence scale, which is used here (Factor 1 of their Table 1; 10 items; sample item: 'My peace of mind is not so easily upset as it used to be'; Cronbach's α = 0.67 and 0.79). The Self-Assessed Wisdom Scale (SAWS; Webster [71]) measures 5 interrelated dimensions of wisdom: experience (8 items; sample item: 'I have experienced many painful events in my life'; Cronbach's α = 0.81 and 0.84), emotions (8 items; sample item: 'I am good at identifying subtle emotions within myself'; Cronbach's α = 0.83 and 0.86), reminiscence (8 items; sample item: 'Reviewing my past helps gain perspective on current concerns'; Cronbach's α = 0.86 and 0.91), openness (8 items; sample item: 'I like to read books which challenge me to think differently about issues'; Cronbach's α = 0.71 and 0.80), and humor (8 items; sample item: 'I can chuckle at personal embarrassments'; Cronbach's α = 0.86 and 0.91).
The Three-Dimensional Wisdom Scale (3D-WS; Ardelt [2]) consists of 3 subscales, tapping the cognitive (14 items; sample item: 'It is better not to know too much about things that cannot be changed'; Cronbach's α = 0.78 and 0.86), reflective (12 items; sample item: 'When I'm upset at someone, I usually try to "put myself in his or her shoes" for a while'; Cronbach's α = 0.55 and 0.54), and affective (13 items; sample item: 'I can be comfortable with all kinds of people'; Cronbach's α = 0.49 and 0.41) components of wisdom.
Table 1. Factor analysis of the nine wisdom scales in both samples (principal axis analysis with oblimin rotation)
                            Sample A            Sample B
                            Factor 1  Factor 2  Factor 1  Factor 2
ASTI (total)                .67                 .80
SAWS-emotion regulation     .72                 .78
SAWS-experience             .79                 .75
SAWS-humor                  .71                 .77
SAWS-openness               .65                 .74
SAWS-reminisce-reflect      .80                 .73
3D-WS-affective                       .71                 .80
3D-WS-cognitive                       .57                 .68
3D-WS-reflective                      .76                 .68
Factor 1 = wisdom about the self; Factor 2 = wisdom about the social world. N = 260 for Sample A and 173 for Sample B. For legibility, factor loadings below .30 are not shown.
Measures Collected but Not Included in the Analyses
Additionally, participants filled out the Nonattachment Scale (NAS; Sahdra et al. [58]), the Emotional Resilience Scale (ERS; Gross and John [38]), the QUEST scale (Batson and Schoenrade [11]), the Varieties of Inner Speech Questionnaire (VISQ; McCarthy-Jones and Fernyhough [50]), and the Self-Verbalization Scale (SVS; Duncan and Cheyne [24]). Some of these measures (NAS, ERS, and QUEST) were remnants of an earlier attempt (Verhaeghen [69]) at casting a wide net of mindfulness measures and failed to make the final cut after the factor analysis described in that paper; the others (VISQ and SVS) are not relevant to the present project.
Results
Factor Analysis of the Wisdom Scales
Two exploratory factor analyses (principal axis analysis with oblimin rotation), one for each sample, were conducted on the nine wisdom scales (i.e., the ASTI scale, the three 3D-WS scales, and the five SAWS scales). Scale or subscale scores (i.e., not item scores) were the unit of analysis. Eigenvalues and the scree plot suggested a 2-factor solution in both samples. This solution is presented in Table 1; it explains 55% of the variance in Sample A and 57% in Sample B. Both analyses converged on the same solution: the ASTI and all the SAWS scales loaded on one factor, and all three 3D-WS scales loaded on the other. As mentioned in the introduction, the ASTI and the SAWS have in common that they survey wisdom from an intrapersonal perspective, that is, they appear to tap self-knowledge and self-acceptance; the 3D-WS arguably captures skills and wisdom about how to deal with the social world and with external circumstances. Consequently, I will label the first factor wisdom about the self, and the second wisdom about the (social) world. The two factors are relatively independent: their intercorrelation was 0.18 in Sample A and 0.07 in Sample B.
Wisdom and the Mindfulness Manifold
To examine how the mindfulness manifold is related to self-assessed wisdom while controlling for the set of background variables (personality, age, and gender), hierarchical multiple regression analyses were run separately for each sample, with each of the two types of wisdom (wisdom about the self and wisdom about the [social] world) as the outcome. For these analyses, a unit-weighted composite was constructed from the z-scores of the ASTI and the different SAWS scales to represent wisdom about the self. The unit-weighted composite of the z-scores of the three 3D-WS scales represented wisdom about the (social) world.
Both unit-weighted wisdom composites were normally distributed in both samples; highest Kolmogorov–Smirnov = 0.057, p > 0.200. In the first step, the background variables (the five IPIP scales, age, and gender) were entered. The next step added the two self-awareness composites (reflective awareness and controlled sense-of-self in the moment), the step after that the two self-regulation composites (self-preoccupation and self-compassion), and the final step added self-transcendence. Pearson correlations between all variables are reported in Table 2; results from the regression analyses in Table 3. Note that in these analyses, self-preoccupation is scored as defined above, that is, higher values indicate higher levels of self-preoccupation, and thus a low level of self-regulation. Because of the potential conceptual overlap between the mindfulness concept of self-transcendence and wisdom as defined through the ASTI, the analyses were rerun after removing the ASTI from the wisdom-about-the-self composite. The original composite and the version without the ASTI were virtually identical (r = 0.98 in Sample A and 0.99 in Sample B), and the pattern of regression results was identical (i.e., variables that were significant remained significant and variables that were not remained non-significant).
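The blockwise procedure just described can be sketched as a sequence of nested ordinary least-squares fits, each reporting R² and the R² change over the previous step (the quantities in the R² rows of the regression tables). This is a plain-Python illustration under stated assumptions: the function names are hypothetical, it solves the normal equations directly (adequate for a handful of predictors, but not how statistical packages proceed numerically), and it reports neither the standardized coefficients nor the significance test for each R² change.

```python
def ols_r_squared(predictors: list[list[float]], y: list[float]) -> float:
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    # Design matrix: one row per respondent, intercept first.
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(blocks: list[list[list[float]]],
                    y: list[float]) -> list[tuple[float, float]]:
    """Enter predictor blocks cumulatively; return (R^2, R^2 change) per step."""
    entered: list[list[float]] = []
    steps = []
    previous = 0.0
    for block in blocks:
        entered.extend(block)
        r2 = ols_r_squared(entered, y)
        steps.append((r2, r2 - previous))
        previous = r2
    return steps
```

In the design above, `blocks` would be [background variables], [the two self-awareness composites], [the two self-regulation composites], [self-transcendence], each block a list of predictor columns.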
Table 2. Correlation matrix for the background variables, mindfulness variables, and wisdom factors; Sample A data presented above the diagonal, Sample B below
                                          1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17
 1 IPIP extraversion                      1.00    .29**  .01   −.12*   .13*   .09    .10    .03    .12    .22** −.22**  .13*   .40**  .31**  .19**  .06    .06
 2 IPIP agreeableness                      .25** 1.00    .17** −.02    .25**  .18**  .03    .28**  .36**  .19**  .00    .20**  .51**  .38**  .23**  .31**  .06
 3 IPIP conscientiousness                  .12    .30** 1.00   −.16**  .05    .18**  .03    .11    .09    .34** −.11    .18**  .27**  .10   −.02    .05    .19**
 4 IPIP neuroticism                       −.43** −.34** −.36** 1.00   −.09   −.04   −.03    .24**  .08   −.53**  .60** −.48** −.34** −.18** −.11    .06   −.04
 5 IPIP intellect/imagination              .29**  .18*  −.02   −.20** 1.00    .07    .04   −.15*   .35**  .08   −.08    .07    .20**  .36**  .03    .04   −.11
 6 Social conservatism                    −.04    .14    .23** −.19*  −.11   1.00   −.05    .07    .16*   .15*  −.02    .14*   .24**  .18*   .03    .11    .54**
 7 Age                                    −.05    .13    .07   −.08   −.08    .30** 1.00   −.07    .05    .03    .03   −.02   −.03    .03    .07   −.03    .08
 8 Gender                                  .05   −.31** −.17*  −.02    .03   −.07   −.21** 1.00    .04   −.03    .21** −.05    .13*   .05    .13*   .30**  .00
 9 Reflective awareness                    .22**  .34**  .26** −.18*   .43** −.02   −.12   −.14   1.00   −.08    .22**  .23**  .35**  .60**  .15*   .37**  .23**
10 Controlled sense-of-self in the moment  .33**  .40**  .37** −.62**  .21**  .05    .17*  −.10    .17*  1.00   −.54**  .42**  .43**  .22**  .14*  −.03    .01
11 Self-preoccupation                     −.37** −.22** −.23**  .57** −.19*  −.08   −.17*  −.08   −.02   −.56** 1.00   −.44** −.27** −.08   −.14*   .30**  .11
12 Self-compassion                         .06    .16*  −.07   −.20**  .03    .05    .04   −.04    .17*  −.01    .17*  1.00    .48**  .41**  .21**  .14*   .17**
13 Self-transcendence                      .52**  .59**  .34** −.66**  .16*   .26**  .04   −.12    .43**  .54** −.47**  .21** 1.00    .57**  .27**  .35**  .24**
14 Wisdom about the self                   .34**  .51**  .32** −.47**  .40**  .10    .11   −.14    .66**  .45** −.28**  .22**  .68** 1.00    .28**  .41**  .26**
15 Wisdom about the (social) world         .11    .06    .08   −.08    .08   −.05    .05   −.06    .10    .05   −.06    .00    .11    .10   1.00    .18**  .10
16 Individualizing foundation              .09    .38**  .09   −.13    .17*  −.08    .06   −.15    .31**  .13   −.02    .03    .29**  .43**  .11   1.00    .33**
17 Binding foundation                     −.04    .20**  .20*  −.12   −.20*   .77**  .13   −.10   −.01   −.02    .09    .07    .31**  .16*   .01    .07   1.00
N = 260 for Sample A and 173 for Sample B. IPIP = International Personality Item Pool (https://ipip.ori.org/). * p < .05, ** p < .01.
Table 3. Results from hierarchical regression analyses to predict the wisdom factors
                                        Step 1          Step 2          Step 3          Step 4
                                        A       B       A       B       A       B       A       B
Wisdom about the self
  IPIP extraversion                      0.19**  0.08    0.16**  0.02    0.17**  0.03    0.11*  −0.06
  IPIP agreeableness                     0.24**  0.26**  0.08    0.17**  0.06    0.17** −0.01    0.05
  IPIP conscientiousness                 0.01    0.07*  −0.06    0.01   −0.06    0.03   −0.08    0.02
  IPIP neuroticism                      −0.16** −0.21** −0.15** −0.19** −0.10   −0.17*  −0.06   −0.05
  IPIP intellect/imagination             0.28**  0.31**  0.13**  0.11    0.16**  0.11    0.14*   0.18**
  Age                                   −0.01    0.08   −0.02    0.13*  −0.01    0.12*   0.01    0.13*
  Gender                                 0.07   −0.06    0.08    0.01    0.07    0.02    0.05    0.02
  Reflective awareness                                   0.52**  0.50**  0.46**  0.49**  0.40**  0.38**
  Controlled sense-of-self in the moment                 0.15*   0.12    0.12    0.13    0.07    0.09
  Self-preoccupation                                                     0.04   −0.01    0.05    0.05
  Self-compassion                                                        0.19**  0.06    0.14*   0.03
  Self-transcendence                                                                     0.28**  0.41**
  R²                                     .296    .455    .506    .622    .526    .625    .561    .673
  R² change                              .296**  .455**  .210**  .167**  .020**  .003    .035**  .048**
Wisdom about the (social) world
  IPIP extraversion                      0.13    0.12    0.10    0.13    0.09    0.13    0.06    0.12
  IPIP agreeableness                     0.21** −0.01    0.16*   0.00    0.16*   0.00    0.16   −0.01
  IPIP conscientiousness                −0.09    0.03   −0.13    0.04   −0.12    0.04   −0.13*   0.04
  IPIP neuroticism                      −0.17** −0.02   −0.13   −0.08   −0.07   −0.09   −0.05   −0.08
  IPIP intellect/imagination            −0.05    0.06   −0.08    0.06   −0.08    0.06   −0.08    0.07
  Age                                    0.05    0.04    0.05    0.04    0.06    0.05    0.07    0.05
  Gender                                 0.11   −0.07    0.10   −0.07    0.11   −0.07    0.10   −0.07
  Reflective awareness                                   0.11    0.04    0.13    0.04    0.10    0.02
  Controlled sense-of-self in the moment                 0.12   −0.12    0.07   −0.11    0.05   −0.12
  Self-preoccupation                                                    −0.12    0.03   −0.11    0.04
  Self-compassion                                                        0.03   −0.00    0.01   −0.08
  Self-transcendence                                                                     0.13    0.06
  R²                                     .116    .033    .132    .043    .140    .043    .148    .044
  R² change                              .116*   .033    .016    .009    .008    .000    .008    .001
A = Sample A (N = 260); B = Sample B (N = 173). Coefficients are standardized (β). IPIP = International Personality Item Pool (ipip.ori.org). * p < .05, ** p < .01.
Ethical Sensitivity as Consequence of Mindfulness and Wisdom
Hierarchical regression was applied to investigate how wisdom and the mindfulness manifold potentially shape ethical sensitivity, operationalized here as the moral foundations. To keep the number of analyses manageable, the two individualizing foundations were collapsed into a single construct by averaging the z-scores of the Care/Harm and Fairness scales (the correlation between the two individualizing foundations was 0.50 in Sample A and 0.57 in Sample B); likewise, a unit-weighted z-score composite was built from the three binding foundations, namely Ingroup loyalty, Authority, and Purity (intercorrelations between the three binding foundations ranged from 0.59 to 0.64 in Sample A, and from 0.63 to 0.78 in Sample B). As is usual (because individuals generally tend to skew towards the ethical side of the distribution), these composites were not normally distributed: Kolmogorov–Smirnov = 0.109, 0.112, 0.139, and 0.073 for individualizing in Samples A and B and binding in Samples A and B, respectively, with p < 0.001 for the first three and p = 0.040 for the last. Pearson correlations are reported in Table 2; results from the regression analyses in Table 4. Rerunning the regression analyses with the alternate measure of wisdom about the self, that is, with the ASTI removed, yielded the same pattern as the original wisdom-about-the-self composite (i.e., variables that were significant remained significant and variables that were not remained non-significant).
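The normality checks reported throughout rely on the Kolmogorov–Smirnov statistic. As a rough sketch of what that number measures, the function below computes D, the largest vertical gap between a composite's empirical distribution and a normal curve with the composite's own mean and SD. It is illustrative only: the name is hypothetical, and it stops at D because, once the mean and SD are estimated from the same data, the classical KS p-values are no longer exact (the Lilliefors correction addresses this); the paper does not state which variant its software applied.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values: list[float]) -> float:
    """Kolmogorov-Smirnov D against a normal with the sample's own mean and SD."""
    xs = sorted(values)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = reference.cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d
```

A larger D means a larger departure from normality, so a composite piled up at the ethical end of the scale, as described above, would tend to yield a larger D than a symmetric one.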
Table 4. Results from hierarchical regression analyses to predict the moral foundations
                                        Step 1          Step 2          Step 3          Step 4          Step 5
                                        A       B       A       B       A       B       A       B       A       B
Individualizing foundation
  IPIP extraversion                     −0.06   −0.02   −0.04   −0.03   −0.01   −0.03   −0.06   −0.11   −0.10   −0.09
  IPIP agreeableness                     0.23**  0.34**  0.11    0.33**  0.10    0.34**  0.05    0.25*   0.03    0.23*
  IPIP conscientiousness                 0.06    0.01    0.01   −0.02   −0.00   −0.04   −0.03   −0.04    0.01   −0.05
  IPIP neuroticism                      −0.04   −0.03   −0.10   −0.10   −0.21*  −0.16   −0.17   −0.07   −0.17*  −0.05
  IPIP intellect/imagination             0.15*   0.08    0.04    0.02    0.07    0.02    0.04    0.08   −0.03    0.03
  Social conservatism                    0.01   −0.15   −0.00   −0.16   −0.01   −0.16   −0.03   −0.22*  −0.02   −0.20*
  Age                                   −0.06    0.05   −0.05    0.09   −0.08    0.11   −0.06    0.13   −0.07    0.07
  Gender                                 0.21** −0.06    0.25** −0.03    0.21** −0.03    0.18*  −0.02    0.17*  −0.02
  Reflective awareness                                   0.33**  0.19    0.22**  0.20*   0.17*   0.11    0.03   −0.05
  Controlled sense-of-self in the moment                −0.05   −0.12    0.05   −0.11    0.02   −0.15   −0.00   −0.17
  Self-preoccupation                                                     0.38**  0.10    0.39**  0.17    0.39**  0.13
  Self-compassion                                                        0.10   −0.11    0.04   −0.15   −0.01   −0.15
  Self-transcendence                                                                     0.27**  0.35*   0.16    0.17
  Wisdom about the self                                                                                  0.42**  0.41**
  Wisdom about the self (ASTI excluded)                                                                  (NA)    (NA)
  Wisdom about the (social) world                                                                        0.01    0.04
  R²                                     .158    .160    .233    .191    .300    .202    .329    .232    .404    .285
  R² stepwise change                     .158**  .160**  .075**  .033    .067**  .011    .029**  .031*   .075**  .053**
Binding foundation
  IPIP extraversion                     −0.02    0.03    0.00    0.04    0.03    0.05    0.00   −0.02   −0.01   −0.02
  IPIP agreeableness                    −0.08    0.09   −0.12    0.10   −0.13    0.11   −0.15*   0.04   −0.15*   0.03
  IPIP conscientiousness                 0.21**  0.03    0.22**  0.04    0.21**  0.02    0.20**  0.03    0.21**  0.02
  IPIP neuroticism                       0.07    0.07   −0.02    0.02   −0.05   −0.06   −0.03    0.00   −0.03    0.02
  IPIP intellect/imagination             0.02   −0.10   −0.03   −0.10   −0.01   −0.11   −0.02   −0.06   −0.06   −0.09
  Social conservatism                    0.54**  0.80**  0.54**  0.80**  0.54**  0.80**  0.53**  0.74**  0.53**  0.75**
  Age                                    0.02   −0.10    0.02   −0.11    0.00   −0.09    0.01   −0.06    0.01   −0.09
  Gender                                −0.13   −0.04   −0.10   −0.05   −0.13*  −0.03   −0.14*  −0.02   −0.14*  −0.02
  Reflective awareness                                   0.13    0.00    0.04    0.01    0.02   −0.06   −0.06   −0.13
  Controlled sense-of-self in the moment                −0.15*  −0.08   −0.12   −0.06   −0.13   −0.09   −0.15   −0.10
  Self-preoccupation                                                     0.21*   0.15*   0.22**  0.20**  0.21*   0.19**
  Self-compassion                                                        0.14   −0.09    0.12   −0.11*   0.09   −0.12*
  Self-transcendence                                                                     0.10    0.28**  0.05    0.22*
  Wisdom about the self                                                                                  0.23**  0.15
  Wisdom about the self (ASTI excluded)                                                                  (NA)    (NA)
  Wisdom about the (social) world                                                                       −0.04    0.04
  R²                                     .361    .651    .391    .655    .419    .668    .423    .690    .447    .698
  R² stepwise change                     .361**  .651**  .030*   .004    .029*   .013    .004    .024**  .023*   .008
A = Sample A (N = 260); B = Sample B (N = 173). Coefficients are standardized (β). IPIP = International Personality Item Pool (https://ipip.ori.org/). * p < .05, ** p < .01.
Discussion
In the present study, I investigated whether and how wisdom might be related to dispositional mindfulness, broadly construed as a manifold of self-awareness, self-regulation, and self-transcendence, and whether and how it might promote ethical sensitivities. Wisdom was measured using the three self-report surveys most often used in quantitative research on the topic: the 3D-WS, the ASTI, and the SAWS. Two independent samples were included: a sample of college students (Sample A), and one of adult workers on Mechanical Turk with a much wider age range (21–74; Sample B).
The Structure of Wisdom
A first expectation (after Glück et al. [34]) was that factor analysis on the subscales of the three surveys would reveal a bifurcation between wisdom about the self (ASTI and SAWS) and wisdom about the (social) world (3D-WS). Factor analysis indeed confirmed this divergence in both samples. The correlation between the two dimensions was small, 0.18 in Sample A and 0.07 in Sample B, underscoring the relative independence of these two aspects of wisdom. This result replicates that of Glück et al., who obtained a correlation of 0.11. The present study is the first to also show functional independence between the two constructs, in that the two types of wisdom have different correlates, as explicated in the next two sections.
Predicting Wisdom About the Self
From the literature reviewed in the Introduction, I expected that all three aspects of mindfulness (self-awareness, self-regulation, and self-transcendence) would be positively related to wisdom. Regression analysis suggested that this is partially true, but only for wisdom about the self. Before I detail these results, note that the background variables explained a fair amount of variance in wisdom about the self: it was negatively related to neuroticism and positively related to agreeableness and intellect/imagination in both samples, and additionally to extraversion in the college sample and to conscientiousness in the Mechanical Turk sample. After taking mindfulness into account, only the influence of intellect/imagination (in both groups) and extraversion (in the college sample) remained significant, and these coefficients were substantially reduced (with βs roughly half of those in Step 1). This suggests that the effects of agreeableness and neuroticism are wholly mediated through the effects of mindfulness, and those of extraversion and intellect/imagination are partially mediated. Levenson et al. ([47]) obtained a negative effect of neuroticism, and positive effects of openness (i.e., intellect/imagination in this sample), agreeableness, and conscientiousness on the ASTI, a measure of wisdom about the self; only the last of these (conscientiousness) was absent from the present results. Within the Berlin wisdom paradigm, openness to experience is likewise a strong predictor of wisdom scores (e.g., Pasupathi et al. [56]; Staudinger and Glück [64]). This makes sense: if wisdom is at least partially based on experience, openness to new experiences would be key for its development or flourishing. Crucially, the mindfulness manifold explained an additional 21% to 26% of the variance in wisdom about the self, over and beyond the variance explained by personality, age, and gender.
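The "additional 21% to 26%" is the gain in R² from Step 1 (background variables only) to the final step of the wisdom-about-the-self regressions. The short check below recomputes it from the R² values in Table 3; it is plain arithmetic on the reported, already rounded values, and the dictionary layout is purely illustrative.

```python
# R^2 for wisdom about the self, copied from Table 3 (Step 1 and Step 4).
r2_step1 = {"Sample A": 0.296, "Sample B": 0.455}
r2_step4 = {"Sample A": 0.561, "Sample B": 0.673}

# Variance explained by the mindfulness block beyond personality, age, and gender.
increment = {s: round(r2_step4[s] - r2_step1[s], 3) for s in r2_step1}
for sample, gain in increment.items():
    print(f"{sample}: +{gain:.1%} of variance")
# prints "Sample A: +26.5% of variance" and "Sample B: +21.8% of variance"
```

The same totals follow from summing the R² changes at Steps 2 to 4 in Table 3 (.210 + .020 + .035 and .167 + .003 + .048).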
In both samples, one aspect of self-awareness, reflective awareness, was a significant and strong predictor of wisdom about the self, with β values around 0.40 in the final step. The other aspect of self-awareness, controlled sense-of-self in the moment, was not a significant predictor (except in Step 2 in the college sample). It appears, then, that wisdom about the self is associated with a reflective stance toward one's experiences (i.e., reflective awareness), but not with the experience of being present in the moment (i.e., controlled sense-of-self in the moment). In other words, for this type of wisdom it is the examination of one's experiences, rather than the mere witnessing of them, that matters, as many models of wisdom (e.g., Ardelt [3]; Brown and Greene [14]; Glück and Bluck [31]) indeed explicitly predict. Interestingly, self-compassion (at least in the college sample) was an additional predictor of wisdom about the self. One reason might be that self-compassion allows one to step back from the immediacy of the experience and to consider oneself the way one would consider a friend; this friendly distancing, like the reflection/examination component, might help foster the transcendence that Ardelt ([3]) considers so necessary for the development of wisdom. Self-preoccupation was not related to wisdom in either sample. One additional link found here was that between self-transcendence and wisdom about the self (with β values on par with or a little lower than those for reflective awareness). This association is almost self-evident, given that quite a few theorists consider self-transcendence to be a critical component of wisdom (Ardelt [3]; Curnow [22]; Levenson [46]).
Note that this relationship remained unchanged when the ASTI, a measure of wisdom that conceptually relies on self-transcendence, was removed from the wisdom-about-the-self composite, suggesting that the relationship cannot be explained merely by conceptual overlap between the measure of self-transcendence and the ASTI. The role of reflective awareness and self-compassion in wisdom about the self, however, is not merely to foster self-transcendence: the final step in the regression analyses clearly shows that the effects of reflective awareness (both samples) and self-compassion (college sample) are far from completely mediated by self-transcendence. It is also important to stress that the background variables (personality, age, and gender) and the mindfulness manifold together provide a very good handle on individual differences in wisdom about the self: they explain from a little more than half to two thirds of the variance (56% and 67%, to be precise), indicating that these constructs should probably be important components of any realistic theory of wisdom about the self.
Predicting Wisdom About the (Social) World
Wisdom about the (social) world, in contrast, was not predicted by the mindfulness manifold at all. There is some indication that wisdom about the (social) world might instead be rooted in individual differences in personality: individuals scoring higher on agreeableness and lower on neuroticism scored higher on wisdom about the (social) world; however, this was only true in the student sample. As in wisdom about the self, the effects of agreeableness and neuroticism were wholly mediated through the effects of mindfulness, even though the latter effects did not rise to the level of significance. These personality correlates have some face validity.
That is, it makes sense that people who are (or want to appear) more friendly, warm, and helpful might be better at picking up on social cues, or more interested in understanding how the social world, and the world in general, works. Neuroticism, in general, is related to overreactivity, negative emotions, and feeling easily threatened by social situations; none of these qualities would likely be conducive to acquiring the type of equanimity associated with wisdom in general (see Wink and Staudinger [74], for a similar argument). Note that Ardelt et al. ([4]) found that openness and extraversion correlated with the 3D-WS (in a sample of 98 men who were approximately 80 years old); we found such correlations for wisdom about the self, but not for wisdom about the (social) world. The reason for the discrepancy is unclear. Why the influence of personality variables on wisdom about the (social) world is confined to the college group is likewise unclear. One potential reason could be adult development: perhaps as people grow older, personality loosens its grip on their outlook on the world. There is a hint of this in the present data: after a median split on age in the Mechanical Turk sample, the relevant correlations were nominally higher in the younger subsample (correlation of wisdom about the [social] world with agreeableness = 0.11, with neuroticism = −0.12) than in the older subsample (0.01 and −0.04, respectively). None of these correlations, however, reached significance, so this remains an area for further research. Note that the Mechanical Turk sample was highly educated (about 3 years of college, on average), so educational differences are unlikely to explain the cross-sample differences. Also note that the relationship with personality is much weaker than for wisdom about the self: the background variables (personality, age, and gender) explained 30–46% of the variance in wisdom about the self, versus only 3–12% in wisdom about the (social) world.
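The median-split check described above can be sketched as follows. The sketch assumes the split is on age and that respondents exactly at the median go into the younger half; the paper does not say how ties were handled, and all names are hypothetical.

```python
from statistics import median


def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_split_correlations(age: list[float], x: list[float],
                              y: list[float]) -> tuple[float, float]:
    """Correlate x with y separately in the younger and older halves of a sample."""
    cut = median(age)
    # Assumption: ties at the median are assigned to the younger half.
    younger = [i for i, a in enumerate(age) if a <= cut]
    older = [i for i, a in enumerate(age) if a > cut]

    def r_within(idx: list[int]) -> float:
        return pearson_r([x[i] for i in idx], [y[i] for i in idx])

    return r_within(younger), r_within(older)
```

Here `x` might be agreeableness and `y` the wisdom-about-the-(social)-world composite. With roughly 86 respondents per half, the two-tailed .05 threshold is about r = .21, so correlations near 0.1 fall well short of significance, consistent with the caution in the text.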
Wisdom about the (social) world is not only distinct from wisdom about the self; with the present measures, it also seems much harder to explain.
Wisdom and the Moral Foundations
Turning now to ethical sensitivity as a potential consequence of mindfulness and wisdom, I found, first, a conceptual (partial) replication of our earlier paper (Verhaeghen and Aikman [70]) on the effects of mindfulness on the moral foundations. In that paper, we found, in two independent samples, that reflective awareness, self-preoccupation, and self-transcendence were related to the individualizing aspects of morality (i.e., an emphasis on care and fairness); in the present study, the relationship with self-preoccupation was significant only in Sample A. Self-compassion and self-transcendence were positively related to the binding aspects of morality (i.e., an emphasis on loyalty, authority, and sanctity). In the present data, an additional effect of self-preoccupation on binding was obtained, and the effect of self-compassion on binding was not significantly different from zero in one sample and, surprisingly, negative in the other. Wisdom about the self turned out to be a strong predictor of the individualizing foundation, that is, of one's sensitivity to the ethical dimensions of care and fairness (β in the final step = 0.42 and 0.41, respectively). In contrast, wisdom about the (social) world had only a negligible and non-significant influence on the individualizing foundation (β = 0.01 and 0.04). While most theories of wisdom posit an effect on ethics, notably on "prosocial attitudes and behaviors, which include empathy, compassion, warmth, altruism, and a sense of fairness" (Bangen et al. [10], p. 1257), the present data suggest that this effect is restricted to wisdom about the self and does not extend to wisdom about the (social) world.
Within the group of mindfulness variables, the effects of self-awareness on the individualizing foundation were partially mediated through self-transcendence (i.e., the coefficients associated with self-awareness became smaller once self-transcendence entered the equation) and wholly mediated through wisdom about the self (i.e., the coefficients associated with self-awareness became non-significant once the wisdom variables entered the equation, and only wisdom about the self had a reliable effect). The effects of self-transcendence on individualizing, in turn, were fully mediated through wisdom, particularly wisdom about the self. One possible interpretation of the latter finding is that self-transcendence is a precursor of wisdom about the self; another is that self-transcendence as defined here is subsumed under, or perhaps even synonymous with, wisdom about the self. The latter interpretation is certainly compatible with views of wisdom as a form of self-transcendence (Ardelt [3]; Curnow [22]; Levenson [46]). Whatever the mechanism, wisdom about the self thus appears to foster an increased emphasis on the ethical dimensions of care and fairness, and this is partially due to the influence of reflective awareness and self-transcendence on wisdom about the self. The effects of wisdom on the binding foundation (i.e., an emphasis on authority, ingroup loyalty, and purity) were rather small. The strongest predictor of the binding foundation remained social conservatism, with more conservative people showing greater interest in the binding foundation (β in the final step = 0.53 and 0.75). Wisdom about the self had a much smaller effect (β in the final step = 0.23 and 0.15; the latter value was ns); the contribution of wisdom about the (social) world was essentially nil (β in the final step = −0.04 and 0.04, ns).
In the college sample, participants who were less agreeable, more conscientious, male, and more self-preoccupied showed greater interest in the binding foundation. The latter effect replicated in the Mechanical Turk sample, where lower levels of self-compassion and higher levels of self-transcendence were additionally related to greater interest in binding. Looking at the results that replicate across both samples, the take-away message is that interest in the binding foundation is determined mostly by social conservatism and perhaps, to a much smaller extent, by wisdom about the self. This implies a second amendment to the Bangen et al. [10] quotation above: wisdom's fostering of prosocial attitudes applies mostly to attitudes that make the rights and concerns of others visible (i.e., treating individuals with care and fairness), and less so to attitudes pertaining to ingroup cohesion (i.e., a focus on loyalty, authority, and purity).
    1. Philosophy for Children, Values Education and the Inquiring Society. Published in: Educational Philosophy & Theory, Oct 2014. By: Cam, Philip.
How can school education best bring about moral improvement? Socrates believed that the unexamined life was not worth living and that the philosophical examination of life required a collaborative inquiry. Today, our society relegates responsibility for values to the personal sphere rather than the social one. I will argue that, overall, we need to give more emphasis to collaboration and inquiry rather than pitting students against each other and focusing too much attention on 'teaching that' instead of 'teaching how'. I will also argue that we need to include philosophy in the curriculum throughout the school years, and teach it through a collaborative inquiry which enables children to participate in an open society subject to reason. Such collaborative inquiry integrates personal responsibility with social values more effectively than sectarian and didactic religious education.
Keywords: religion; ethics; community of inquiry; spiral curriculum
Introduction[1]
As Socrates would have it, the philosophical examination of life is a collaborative inquiry. The social nature of the enterprise goes with its spirit of inquiry to form his bifocal vision of the examined life. These days, insofar as our society teaches us to think about values, it tends to inculcate a private rather than a public conception of them. This makes reflection a personal and inward journey rather than a social and collaborative one, and a person's values a matter of parental guidance in childhood and individual decision in maturity. The relegation of responsibility for values to the personal sphere also militates against societal self-examination.
On the other hand, the traditional pontifical alternative is equally presumptuous and debilitating in ignoring the possibility of personal judgement. How can education steer a course between the tyranny of unquestionable moral codes and the bankruptcy of individualistic moral relativism? It remains to be seen whether education could teach children to engage productively across their differences rather than responding to difference with suspicion or prejudice. Gilbert Ryle (in Cahn, 1970) made a clear distinction between 'teaching how' and 'teaching that', arguing from a behaviourist perspective that teaching how had a much more lasting impact than simply teaching the facts. However, too much emphasis on 'teaching how' can result in conditioning, training, teaching to conform to habit, teaching obedience with the threat of hellfire if the rules are broken. There is a third way, the way of philosophy espoused by Matthew Lipman [8] in his Philosophy for Children, which involves giving more emphasis to collaboration and inquiry rather than pitting students against each other and focusing too much attention on 'teaching that' instead of 'teaching how'. Philosophy as it is traditionally taught may well involve teaching how to follow the rules of formal logic correctly, or learning facts about the life and death of Socrates, but it also requires a capacity for critical reflection, consideration of alternative possibilities, and a genuine concern for truth and clarity. I argue that we need to include philosophy in the curriculum throughout the school years, but it needs to be a philosophy taught in the spirit of Socrates, one which balances individual and social values. Religious instruction tends to inculcate values through adult imposition and denies space to critical judgement. Ryle's distinction between 'learning that' and 'learning how' implied that these were discrete and exclusive ways of learning.
However, learning how to do things is more than a matter of memorizing facts or following procedural instructions. Being able to cook is more than being able to follow a recipe book. Again, while some instruction is useful in learning to ride a bike, it is mostly a matter of trying to ride, and then, under guidance, trying again. It is a case of learning by doing, and of doing it under varied circumstances so that it can be applied in new ones. This is working out for oneself how to exercise individual judgement, rather than first learning a set of instructions and then carrying them out (Ryle, in Cahn, 1970, pp. 413–424). Whatever the rules are, they are heuristic and strategic, depending on context, rather than algorithmic and learnable by rote. 'Learning how' can be important in many areas of the curriculum where training in skills is an important feature, especially physical education and the arts. However, learning the art of inquiry requires a slightly different type of 'learning how' from training, rehearsal and repetition. A curriculum that is based on inquiry is one that is centred on thinking. There is a world of difference between the outcomes to be expected from an education that treats knowledge as material with which to think and from one that emphasizes memorization of knowledge. It is the difference between an inquiring society and one in which those few who have developed an inquiring mind have done so in spite of their education rather than because of it (Dewey, 1916/1966, chap. 12; Lipman [8]). The concept of a community of inquiry owes much to Dewey who, in Democracy and education (1916/1966), described the healthy relation between an individual and his or her environment as functional. Dewey insisted that because the relationship between the individual and his or her environment must be based on mutual adjustment, fitting into society might well involve radically changing it.
Dewey believed in the importance of preparing students for democratic citizenship. He stressed that consciously guided education aimed at developing the 'mental equipment' and moral character of students was essential to the development of civic character. Is this not what religious instruction tries to do? For Dewey, the relationship between the individual and society was far more important than the child's relationship with an abstract God. It was organic and continually evolving in mutual adaptation. Dewey's approach differs from religious instruction in that its aim is to develop a model of free inquiry, which requires tolerance of alternative viewpoints and free communication. He also believed that children's capacity for the exercise of deliberative, practical reason in moral situations could be cultivated not by ready-made knowledge but by 'a mode of associated living' characteristic of democracy. Lipman [7] was to elaborate on this idea of schools as models of participatory democracy, and his classroom community of inquiry provided close analogies with the democratic school, a microcosm of the wider society.
Thinking Together
When we move away from the traditional classroom to the inquiring one and the teacher becomes less occupied with conveying information (with teaching 'that'), it becomes educationally desirable for students to engage with one another. When human conduct stimulates moral inquiry it is usually because that conduct is controversial, which is to say that there are different points of view as to how it should be judged. If you and I have different opinions in regard to someone's character or conduct, then we are both in need of justification and our views are subject to each other's objections. When we make a proposal to solve a practical problem of any complexity, we rely upon others who are reasonably well placed for constructive criticism or a better suggestion.
If we want students to grow out of the habit of going with their own first thoughts, to become used to considering a range of possibilities, and to be on the lookout for better alternatives, then we could not do better than to have them learn by exploring issues, problems and ideas together. If we want them to become used to giving reasons for what they think, to expect the same of others, and to make productive use of criticism, then we could not go past giving them plenty of practice with their peers. And if we want them to grow up so that they consider other people's points of view, and not to be so closed minded as to think that those who disagree with them must be either ignorant or vicious, then the combination of intellectual and social engagement to be found in collaborative inquiry is just the thing. These are all good reasons for having our students learn to inquire together.
Philosophy for Children
More than any other discipline, philosophy is an inquiry into fundamental human problems and issues, where all the general conceptions that animate society come under scrutiny. Philosophy as a formal discipline earned its place as a matriculation subject in some Australian states because there were rigorous rules by which its standards could be maintained. This would involve, say, learning that ignoratio elenchi is an informal fallacy, or that modus tollens is a valid form of inference in deductive logic, or learning how to mount a reasoned argument in defence of a position. When, however, we are talking about philosophy for children, its subject matter needs to be adapted to the interests and experience of students of various ages and its tools and procedures adjusted to their stage of development.
There are models to work from, particularly the series of novels and manuals from Matthew Lipman, and in recent years we have begun to find our way forward.[2] If part of the difficulty is also that some philosophers think of philosophy as being above all that, it is salutary to remember that other disciplines have long since discovered how to recast themselves in educational form. Just as mathematics was forced, through the New Maths, to become more practical and relevant to the growing range of children who were staying on at school, so philosophy has been forced to become more real and relevant to children. The move away from discrete learning areas towards an integrated curriculum also required philosophy to make connections across and through disciplines, raising the larger questions of epistemology, ontology, aesthetics and, for the purpose of this article, the important area of axiology or values. For philosophy to have a formative influence, and thereby to significantly affect both the way people think and the character of their concerns, it needs to be part of the regular fare throughout the school years. Only by this means can it effectively supply its nutrients to the developing roots of thought (knowing that) and action (knowing how). We need to counter the view that philosophy is an advanced discipline, suitable only for the academically gifted and intellectually mature. Jerome Bruner made famous the startling claim that 'the foundations of any subject may be taught to anybody at any age in some form' (1960, p. 12), and he suggested that the prevailing view of certain disciplines as too difficult for younger students results in our missing important educational opportunities.
Bruner called such a structure a spiral curriculum: one that begins with the child's intuitive understanding of the fundamentals, and then returns to the same basic concepts, themes, issues and problems at increasingly elaborate and more abstract or formal levels over the years. A spiral curriculum is vital for developing the kind of deep understanding that belongs to philosophy and the humanities. What else is to be gained from building philosophy into the curriculum throughout the school years? It seems to me that an education in philosophical inquiry will help students achieve a rich understanding of a wide array of issues and ideas that inform life and society through an increasingly deep inquiry into them. It will help students to think more carefully about issues and problems that do not have a unique solution or a settled decision procedure, but where judgements and decisions can be better or worse in all kinds of ways. Since most of the problems that we face in life and in our society are of that character, the general-purpose tools that students acquire through philosophy will ensure that they are better prepared to face those problems. If philosophy is carried out in the collaborative style envisaged above, then its recipients will also be more likely to tackle such problems collaboratively, and thereby to be more constructive and accommodating with one another. Let me spell all this out a little under the headings of 'thinking', 'understanding' and 'community'.
Thinking
Philosophy is a discipline with a particular focus on thinking. It involves thinkers in the cognitive surveillance of their own thought. It is a reflective practice, in the sense that it involves not only careful thinking about some subject matter, but thinking about that thinking, in an effort to guide and improve it.
Since philosophical thinking tends to keep one eye on the thinking process, philosophy can supply the tools that assist the thinker in such tasks as asking probing questions, making needful distinctions, constructing fruitful connections, reasoning about complex problems, evaluating propositions, elaborating concepts, and honing the criteria that are used to make judgements and decisions. Dewey's (1910) five-step model (identifying the problem and placing it in context, making creative and testable hypotheses that move towards a possible solution, analysing the hypotheses in terms of past experience, considering alternative hypotheses that may be more suitable, and checking possible solutions against actual experience) was taken up as a model of individual thinking, especially in science and design work. But in a community of inquiry, at any age, each of these steps is carried out from the multiple perspectives of the group, so that any position's claim to truth, however conservative, is open not only to falsification but also to recognition of its complete contingency. The skills, abilities and habits of thinking (acquiring the habit of reflecting carefully upon your own thoughts, as well as what others think; developing the ability to imagine and evaluate new possibilities; developing the habit of changing your mind on the basis of good reasons; and acquiring skill in the establishment and use of appropriate criteria to form sound judgements) provide the methodology of Lipman's community of inquiry.
Understanding
Philosophy deals with ethical questions about how we should behave, social questions about the good community, epistemological questions about the justification of people's opinions, metaphysical questions about our spiritual lives, and logical questions about what we may reasonably infer, and is therefore a rich source of our cultural heritage and of contemporary thought and debate.
In terms of both its history and ways of thinking, philosophy also helps to deepen our understanding of the big ideas and key concepts that have helped to shape civilization and continue to inform the way we live. Our conceptions of what makes something right or wrong, of justice, freedom and responsibility, of our personal, cultural and national identity, of sources of knowledge, of the nature of truth, beauty and goodness, are all central to what we value and how we conduct our affairs. Since such concepts so deeply inform life and society, it is important for students to develop their understanding of them. While we may attempt to deal with these matters elsewhere in the curriculum, philosophical inquiry gives students the tools that they need in order to explore these ideas in depth.
Community
With regard to cooperative thinking and the importance of community, I would stress the virtues of dialogue. As we work to resolve differences in our understandings, or to subject our reasons to each other's judgement, or try to follow an argument where it leads, we are like detectives whose clues are the experience, inferences, judgements and other intellectual considerations that each thinker brings into the dialogue with others. On this view, philosophical inquiry provides a model of the inquiring community: one that is engaged in thoughtful deliberation and decision making, is driven by a desire to make progress through cooperation and dialogue, and values the kinds of regard and reciprocity that grow under its influence. Just because it has these characteristics, philosophical inquiry can provide a training-ground for people who are being brought up to live together in such a community. Dewey's five steps require the philosophical disposition to give reasons when that is appropriate and, generally, to cooperate with others and respect different points of view.
Values Education
The vital significance of educating for judgement in regard to values is nowhere more clearly recognized than in the writings of John Dewey: 'The formation of a cultivated and effectively operative good judgment or taste with respect to what is aesthetically admirable, intellectually acceptable and morally approvable is the supreme task set to human beings by the incidents of experience' (Dewey, 1929/1980, p. 262). This makes the cultivation of judgement the ultimate educational task and the development of good judgement central to values education in particular. Values education therefore cannot be simply a matter of instructing students as to what they should value (just so much 'teaching that'), as if students did not need to inquire into values or learn to exercise their judgement. In any case, it is an intellectual mistake to think that values constitute a subject matter to be learned by heart. They are not that kind of thing. Values are embodied in commitments and actions and not merely in propositions that are verbally affirmed. Nor can values education be reduced to an effort to directly mould the character of students so that they will make the right moral choices, as if in all the contingencies of life there was never really any doubt about what one ought to do, and having the right kind of character would ensure that one did it. Being what is conventionally called 'of good character' will not prevent you from acting out of ignorance, from being blind to the limitations of your own perspective, from being overly sure that you have right on your side, or even from committing atrocities with a good conscience in the name of such things as nation or faith. History is littered with barbarities committed by men reputedly of good character who acted out of self-righteous and bigoted certainty.
Far from being on solid moral ground, the ancient tradition that places emphasis upon being made of the right stuff has encouraged moral blindness towards those of different ethnicity, religion, politics, and the like. Whatever else we do by way of values education, we must make strenuous efforts to cultivate good judgement. When it comes to deciding what to do in a morally troubling situation, good judgement involves distinguishing more from less acceptable decisions and conduct. Such discernment needs to be made by comparing our options in the circumstances in which they occur. Any such comparison requires us to ensure that, insofar as possible, we have hold of all the relevant facts. It involves us doing our best to make sure that we have not overlooked any reasonable course of action. It requires us to think about the consequences of making one decision, or taking one course of action, by comparison with another, and to be mindful of the criteria against which we evaluate them. It requires us to monitor the consequences of our actions in order to adjust our subsequent thinking to actuality. In short, good moral judgement requires us to follow the ways of inquiry. Dewey (1920/1957, pp. 163–164) says: A moral situation is one in which judgment and choice are required antecedently to overt action. The practical meaning of the situation—that is to say the action needed to satisfy it—is not self-evident. It has to be searched for. There are conflicting desires and alternative apparent goods. What is needed is to find the right course of action, the right good. 
Hence, inquiry is exacted: observation of the detailed make-up of the situation; analysis into its diverse factors; clarification of what is obscure; discounting of the more insistent and vivid traits; tracing of the consequences of the various modes of action that suggest themselves; regarding the decision reached as hypothetical and tentative until the anticipated or supposed consequences which led to its adoption have been squared with the actual consequences. The lack of integration of our advanced empirical and scientific knowledge with the remnants of value systems of much earlier times is already a problem of considerable proportions. We should not be adding to this burden when we teach science and technology, or history, or about society and the environment. Instead, we need to introduce our students to ways of thinking that develop their values in conjunction with their other understandings. This approach to values education fits with the emphasis to be placed upon collaborative inquiry for several reasons. First, the idea that values are to be cultivated by student reflection rather than impressed upon the student from without by moral authority does not imply that the pursuit of values is a purely personal affair. That would be a pendulum swing to individualistic relativism. Collaborative inquiry supplies a middle road—a way forward between an unquestioningly traditional attitude towards values and an individualism that makes each person their own moral authority. The development of good judgement through collaborative inquiry is the path towards a truly social intelligence. Secondly, values inquiry depends upon different points of view. If something is uncontroversial and everyone is of the same opinion, then there is no motivation for inquiry. Inquiry arises in situations where something is uncertain, puzzling, contentious or in some way problematic. 
The collaborative inquiry is organic, synergistic and evolving, a kind of moral practice based on a principle of democracy. Consider such elementary aspects of philosophical practice as: learning to hear someone out when you disagree with what they are saying; learning to explore the source of your disagreement rather than engaging in personal attacks; developing the habit of giving reasons for what you say and expecting the same of others; being disposed to take other people's interests and concerns into account; and generally becoming more communicative and inclusive. To see values education as continuous with all of our other efforts to educate our young in the ways of inquiry is to place it firmly in the tradition of reflective education rather than traditional religious instruction. Religious instruction cannot take on the burden of a systematic exploration of the ethical issues involved in the various areas of the curriculum as they are presented throughout the rest of the week. If we are to cultivate good moral judgement we need to make it integral to the material that we teach and not something we attempt to establish in such a disconnected fashion. From a pedagogical perspective, while it would be possible for religious instructors to introduce students to values inquiry, they are under no obligation to do so and many of them come from traditions that are likely to use the occasion to moralize and engage in indoctrination instead. This is not to say that religious education is incompatible with values inquiry. It is rather to acknowledge the need for change. Much of traditional religious instruction is antithetical to the educational requirements of an inquiring society; and if we are to develop such a society, such an outdated approach should not retain its foothold in our schools. This still leaves it open as to whether the school takes a philosophical approach to values education, or insists upon indoctrination rather than education. 
We should not think of philosophy and religion as representing two incompatible options when it comes to values education. They are representative, however, of a deeper choice that must be made in relation to values education: the choice between appeal to reason and dogmatism as central to the way we teach.
Footnotes
1 Editor's Note: This article has been substantially edited and modified since it was delivered as a keynote address in December 2010. The context in which it was written reflects an ongoing tension between the didactic teaching of ethics through religious education and a more organic process of teaching ethics by modelling it and discussing it in philosophical discussion. In New South Wales (NSW) religious education was not compulsory, but Education Department policy forbade schools from offering alternative lessons to students who chose not to take part in scripture. The NSW government tasked St James Ethics Centre, under the guidance of Professor Cam, to develop and deliver ethics education classes in urban, regional and rural primary schools as an alternative to religious education. St James Ethics Centre promptly established Primary Ethics Limited, an independent not-for-profit organization, to develop an engaging, age-appropriate, interconnected curriculum that spans the primary years from Kindergarten to Year 6 and to then deliver ethics education free of charge via a network of specially trained and accredited volunteers. Despite protests from Church leaders in NSW that they should have sole responsibility for values education, on 1 December 2010 Parliament amended the NSW Education Act to give students who do not attend special religious education/scripture classes in NSW public schools the legal right to attend philosophical ethics classes as an alternative to supervised 'private study'.
Because of the popularity of secular ethics classes, pressure from Church leaders and a change to a conservative state government, it was legislated in 2012 that parents should be told of the availability of ethics classes in their school only after they have opted out of special religious education or scripture.
2 Since the early 1990s Lipman's followers have extended his work, and this general approach is now represented in schools in many countries around the world. For a selection of Australasian resources see http://www.fapsa.org.au/resources/catalogue
References
1 Bruner, J. S. (1960). The process of education. Cambridge, MA: Harvard University Press.
2 Cahn, S. M. (Ed.). (1970). The philosophical foundations of education. New York: Harper & Row.
3 Dewey, J. (1910). How we think. Chicago, IL: D. C. Heath & Co.
4 Dewey, J. (1957). Reconstruction in philosophy (enlarged ed.). Boston, MA: Beacon Press. (Original work published 1920).
5 Dewey, J. (1966). Democracy and education. London: Collier Macmillan. (Original work published 1916).
6 Dewey, J. (1980). The quest for certainty. New York: Perigee Books. (Original work published 1929).
7 Lipman, M. (1988). Philosophy goes to school. Philadelphia, PA: Temple University Press.
8 Lipman, M. (2002). Thinking in education (2nd ed.). New York: Cambridge University Press.
9 Ryle, G. (1970). Teaching and training. In S. M. Cahn (Ed.), The philosophical foundations of education (pp. 413–424). New York: Harper & Row.
    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al. attempt to examine the role of long non-coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non-coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As before, I appreciate the changes made in response to my comments, and I think everyone is approaching this in the spirit of arriving at the best possible manuscript, but we still have some deep disagreements on the nature of the relevant statistical approach and defining adequate controls. I highlight a couple of places that I think are particularly relevant, but note that given the authors disagree with my interpretation, they should feel free to not respond!

      (1) On the subject of the 0.034 threshold, I had previously stated: "I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below."

      In their reply to me, the authors state:

      "What we need is a gene number, which (a) indicates genes that effectively differentiate humans from chimpanzees, (b) can be used to set a DBS sequence distance cutoff. Since this study is the first to systematically examine DBSs in humans and chimpanzees, we must estimate this gene number based on studies that identify differentially expressed genes in humans and chimpanzees. We choose Song et al. 2021 (Song et al. Genetic studies of human-chimpanzee divergence using stem cell fusions. PNAS 2021), which identified 5984 differentially expressed genes, including 4377 genes whose differential expression is due to trans-acting differences between humans and chimpanzees. To the best of our knowledge, this is the only published data on trans-acting differences between humans and chimpanzees, and most HS lncRNAs and their DBSs/targets have trans-acting relationships (see Supplementary Table 2). Based on these numbers, we chose a DBS sequence distance cutoff of 0.034, which corresponds to 4248 genes (the top 20%), slightly fewer than 4377."

      I have some notes here. First, Agoglia et al. (Nature, 2021) also examined the nature of cis vs trans regulatory differences between humans and chimps using a set-up very similar to that of Song et al.; their Supplementary Table 4 enables the discovery of genes with cis vs trans effects, although admittedly this is less straightforward than with the Song et al. data. Second, I can't actually tell how the 4,377 number was arrived at. From Song et al.: "Of 4,671 genes with regulatory changes between human-only and chimpanzee-only iPSC lines, 44.4% (2,073 genes) were regulated primarily in cis, 31.4% (1,465 genes) were regulated primarily in trans, and the remaining 1,133 genes were regulated both in cis and in trans (Fig. 2C). This final category was further broken down into a cis+trans category (cis- and transregulatory changes acting in the same direction) and a cis-trans category (cis- and trans-regulatory changes acting in opposite directions)." Even combining the trans-only and cis-and-trans genes gives only 2,598 genes with evidence for some trans regulation. I cannot find 4,377 in the main text of the Song et al. paper.

      Elsewhere in their response, the authors respond to my comment that 0.034 is an arbitrary threshold by repeating the analyses using a cutoff of 0.035. I appreciate the sentiment here, but I would not expect this to make any great difference, given how similar those numbers are! A better approach, and what I had in mind when I mentioned this, would be to test multiple thresholds, ranging from, e.g., 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size.

      (1) We sincerely thank the reviewer for this critical point. Our initial premise, based on DBS distances from the human genome to the chimpanzee and archaic genomes, was that genes with large DBS distances may have contributed more to human evolution. However, our ORA (overrepresentation analysis) examined only genes with large DBS distances (the legend of old Figure 2 was “1256 target genes whose DBSs have the largest distances from modern humans to chimpanzees and Altai Neanderthals are enriched in different Biological Processes GO terms”), using a cutoff (threshold) of 0.034 to define a large distance. The cutoff is not entirely unreasonable (as our new results and the following sensitivity analysis indicate), but this approach was indirect and flawed.

      (2) We have now performed ORA using two methods. The first uses only DBS distances. Instead of using a cutoff, we now sort genes by DBS distance (human-chimpanzee and human-Altai Neanderthal distances, respectively; see Supplementary Table 5) and use the top 25% and bottom 25% of genes to perform ORA. This directly examines whether DBS distances alone indicate that genes with large DBS distances contribute more to human evolution than genes with small DBS distances. The second also uses the ASE genes (allele-specific expression; genes undergoing human- or chimpanzee-specific regulation in the tetraploid human–chimpanzee hybrid iPS cells) reported by Agoglia et al. 2021. We select the top 50% and bottom 50% of genes ranked by DBS distance, intersect them with the ASE genes from Agoglia et al. 2021 (their Supplementary Table 4), and apply ORA to the intersections. Both analyses show that (a) more GO terms are obtained from genes with large DBS distances, and (b) more human evolution-related GO terms are obtained from genes with large DBS distances (Supplementary Tables 5, 6, and 7; Figure 2; Supplementary Fig. 15). These results directly suggest that genes with large DBS distances contribute more to human evolution than genes with small DBS distances, which is a key theme of the study.

      (3) Regarding Song et al. 2021, the statement “we differentiated…allotetraploid (H1C1a, H1C1b, H2C2a, H2C2b) lines into ectoderm, mesoderm, and endoderm” led us to assume that their differentiated hybrid cell lines covered more tissue types than those of Agoglia et al. 2021. Upon re-examining Supplementary Table 5 of Song et al. and Supplementary Table 4 of Agoglia et al. 2021, we now find that the latter more clearly identifies significant ASE genes (p-adj < 0.01 and |LFC| > 0.5 in GRCh38 and PanTro5).

      (4) We have also performed two additional analyses in response to the suggestion to “test multiple thresholds, ranging from, eg, 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size”. First, we performed a multi-threshold sensitivity analysis using a spectrum of cutoffs (0.03, 0.034, 0.04, 0.05) and tracked the number of genes identified and the enrichment significance of key GO terms (e.g., "neuron projection development," "behavior") across these thresholds. The results confirm that while the absolute number of genes varies with the cutoff, the core biological conclusion (specifically, the significant enrichment of target genes in neurodevelopmental and cognitive functions) remains stable and significant. For instance, "behavior" maintains strong statistical significance (FDR<0.01) in both the human-chimpanzee and human-Altai Neanderthal comparisons across all tested cutoffs, and "neuron projection development" remains significant at three (0.03, 0.034, 0.04) of the four cutoffs in the Altai comparison. This pattern suggests that our core findings regarding neurodevelopmental functions are robust across a range of cutoffs. Nevertheless, we did not extend the analysis to smaller cutoffs (e.g., 0.01 or 0.02) because such values would identify an excessively large number of genes (>10000) for ORA, which would render the GO-term enrichment analysis less meaningful due to a loss of specificity.

      Second, we have performed an additional validation to directly evaluate whether the 0.034 cutoff itself represents a stringent and biologically meaningful value. We sought to empirically determine how often a DBS sequence distance of 0.034 or greater might occur by chance in promoter regions, thereby testing its significance as a marker of potential evolutionary divergence. We randomly sampled 10,000 windows from annotated promoter regions across the hg38 genome, each with a size matching the average length of DBSs (147 bp). We then calculated the per-base sequence distances for these random windows between modern humans and chimpanzees, as well as between modern humans and the three archaic humans (Altai, Denisovan, Vindija). The analysis reveals that a distance of ≥0.034 is a rare event in random promoter sequences: for Human-Chimp, Human-Altai, Human-Denisovan, and Human-Vindija, 5.49% (549/10000), 0.31% (31/10000), 4.47% (447/10000), and 0.03% (3/10000) of random windows reach this distance. This empirical evidence suggests that 0.034 is a sufficiently strong cutoff for defining a large DBS distance: such a distance is unlikely to occur in a random genomic background (P<0.1 for chimpanzee and P<0.05 for the archaic humans), and DBSs exceeding this cutoff are significantly enriched for sequences that have undergone substantial evolutionary change rather than random neutral variation.

      (5) We present new Figure 2, Supplementary Tables 5, 6, and 7, and Supplementary Fig. 15. We have substantially revised section 2.3, related sections in Results, Supplementary Note 3, and Supplementary Table 8. We have removed related descriptions and explanations in the main text and Supplementary Notes. The results of the above two analyses are presented here as Author response table 1 and Author response image 1.
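      The ranking-based design in point (2) can be illustrated with a minimal Python sketch. It assumes each gene is summarized by a single DBS distance; the gene names, distances, and ASE set below are hypothetical toy values, and the ORA step itself is not shown.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_distance(distances: Dict[str, float], frac: float) -> Tuple[List[str], List[str]]:
    """Return (top, bottom): the `frac` fraction of genes with the largest
    and with the smallest DBS distances."""
    ranked = sorted(distances, key=lambda g: distances[g], reverse=True)  # largest first
    k = int(len(ranked) * frac)
    if k == 0:
        return [], []
    return ranked[:k], ranked[-k:]


def intersect_with_ase(genes: Iterable[str], ase_genes: Iterable[str]) -> List[str]:
    """Genes present both in the distance-ranked set and in the ASE gene set."""
    return sorted(set(genes) & set(ase_genes))


# Hypothetical toy data: gene -> DBS distance, plus a hypothetical ASE gene set
dist = {"G1": 0.08, "G2": 0.05, "G3": 0.03, "G4": 0.01}
ase = {"G2", "G3"}

top25, bottom25 = split_by_distance(dist, 0.25)  # first method: distances only
top50, bottom50 = split_by_distance(dist, 0.50)  # second method: before the ASE filter
print(top25, bottom25)                                                    # ['G1'] ['G4']
print(intersect_with_ase(top50, ase), intersect_with_ase(bottom50, ase))  # ['G2'] ['G3']
```

      Each resulting list would then be passed to whatever ORA tool is in use; the helper only fixes how the top and bottom groups are defined.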

      Author response table 1.

      Sensitivity analysis of GO-term enrichment across different DBS sequence distance cutoffs. The table shows the numbers of target genes identified and the false discovery rates (FDR) for the enrichment of three selected GO terms at four different distance cutoffs. Note that, unlike in the old Figure 2, the results for chimpanzees and Altai Neanderthals are not directly comparable here, as the numbers of target genes used for the enrichment analysis differ between them at each cutoff.
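      The sensitivity analysis summarized in this table amounts to re-selecting the target-gene set at each cutoff before running ORA. The following is a minimal sketch, assuming one DBS distance per gene (hypothetical toy values); the `cutoff_grid` helper is a hypothetical illustration of the fixed-step grid the reviewer suggested, built from an integer step count so that floating-point error does not accumulate.

```python
from typing import Dict, List


def cutoff_grid(start: float, stop: float, step: float) -> List[float]:
    """Evenly spaced cutoffs from start to stop inclusive."""
    n = round((stop - start) / step)
    return [round(start + i * step, 6) for i in range(n + 1)]


def genes_above(distances: Dict[str, float], cutoff: float) -> List[str]:
    """Genes whose DBS distance is at or above the cutoff."""
    return sorted(g for g, d in distances.items() if d >= cutoff)


# Hypothetical toy distances; each selected gene set would then go to ORA
dist = {"G1": 0.061, "G2": 0.045, "G3": 0.036, "G4": 0.031, "G5": 0.012}
for c in [0.03, 0.034, 0.04, 0.05]:  # the cutoffs used in the response
    print(c, len(genes_above(dist, c)))
# 0.03 4 / 0.034 3 / 0.04 2 / 0.05 1
print(cutoff_grid(0.01, 0.05, 0.005))  # 0.01, 0.015, ..., 0.05
```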

      Author response image 1.

      Distribution of per-base sequence distances for DBS size-matched random genomic windows in Ensembl-annotated promoter regions, calculated between modern humans and (A) chimpanzee, (B) Altai Neanderthal, (C) Denisovan, and (D) Vindija Neanderthal genomes.
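      The random-window validation behind this figure can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the two genomes are already aligned base-to-base over each promoter, treats gaps and Ns as uncomparable, and picks promoters uniformly (length-weighting is an equally plausible choice). The demo uses a synthetic sequence and a copy with about 2% substitutions, so its output says nothing about the real data.

```python
import random
from typing import List, Sequence, Tuple

WINDOW = 147  # average DBS length quoted in the response (bp)


def p_distance(a: str, b: str) -> float:
    """Per-base distance: mismatches divided by the columns where both
    aligned sequences have an unambiguous base."""
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # skip gaps and ambiguous bases
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else float("nan")


def sample_windows(promoters: Sequence[Tuple[int, int]], n: int,
                   rng: random.Random) -> List[Tuple[int, int]]:
    """Draw n windows of length WINDOW lying entirely inside a promoter."""
    usable = [(s, e) for s, e in promoters if e - s >= WINDOW]
    windows = []
    for _ in range(n):
        s, e = rng.choice(usable)
        start = rng.randint(s, e - WINDOW)  # randint is inclusive at both ends
        windows.append((start, start + WINDOW))
    return windows


def fraction_at_least(values: Sequence[float], cutoff: float) -> float:
    """Empirical fraction of windows with distance >= cutoff (NaNs dropped)."""
    vals = [v for v in values if v == v]
    return sum(v >= cutoff for v in vals) / len(vals) if vals else float("nan")


# Synthetic demo: a random "human" sequence and a copy with ~2% substitutions
rng = random.Random(1)
human = "".join(rng.choice("ACGT") for _ in range(5000))
other = "".join(rng.choice("ACGT".replace(b, "")) if rng.random() < 0.02 else b
                for b in human)
promoters = [(0, 1000), (2000, 3500), (4000, 4100)]  # the last is too short to use
windows = sample_windows(promoters, 1000, rng)
dists = [p_distance(human[s:e], other[s:e]) for s, e in windows]
print(fraction_at_least(dists, 0.034))
```

      The reported fraction is directly the empirical tail probability used in the response (e.g. 549/10000 for Human-Chimp).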

      (2) The authors have introduced a new TFBS section, as a control for their lncRNAs - this is welcome, though again I would ask for caution when interpreting results. For instance, in their reply to me the authors state: "The number of HS TFs and HS lncRNAs (5 vs 66) <HS TF vs all HS lncRNAs> alone lends strong evidence suggesting that HS lncRNAs have contributed more significantly to human evolution than HS TFs (note that 5 is the union of three intersections between <many2zero + one2zero> and the three <human TF list>)."

      But this assumes the denominator is the same! There are 35899 lncRNAs according to the current GENCODE build; 66/35899 = 0.0018, so 0.18% of lncRNAs are HS. The authors compare this to 5 TFs. There are 19433 protein coding genes in the current GENCODE build, which naively (5/19433) gives a big depletion (0.026%) relative to the lnc number. However, this assumes all protein coding genes are TFs, which is not the case. A quick search suggests that ~2000 protein coding genes are TFs (see, eg, https://pubmed.ncbi.nlm.nih.gov/34755879/), which gives an enrichment (although I doubt it is a statistically significant one!) of HS TFs over HS lncRNAs (5/2000 = 0.0025). Hence my emphasis on needing to be sure the controls are robust and valid throughout!

      We thank the reviewer for this comment. While 5 vs 66 reveals a difference, a direct comparison is an oversimplification. The real take-home message of the new TFBS section is not the numbers but the distributions of HS TFs’ targets and HS lncRNAs’ targets across GTEx organs and tissues (Figure 3 and Supplementary Figures 24, 25): correlated HS lncRNA-target transcript pairs are highly enriched in brain regions, whereas correlated HS TF-target transcript pairs are distributed broadly across GTEx tissues and organs. We have now removed the simple comparison of “5 vs 66” and explained our comparison more carefully in section 2.6.
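      The reviewer's point about denominators can be made explicit with the proportions and a standard two-sided Fisher's exact test on the 2×2 table of HS versus non-HS genes (TFs versus lncRNAs). This is a sketch only: the counts are those quoted by the reviewer, the ~2000 TF total is approximate, and no test result appears in this exchange, so none is claimed here.

```python
from math import comb


def hs_rate(n_hs: int, n_total: int) -> float:
    """Proportion of genes in a class that are human-specific."""
    return n_hs / n_total


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]: sum the
    hypergeometric probabilities of all tables with the same margins that
    are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # exact integer arithmetic, divided once at the end
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(p for p in (prob(x) for x in range(lo, hi + 1))
                        if p <= p_obs * (1 + 1e-7)))


n_lnc, n_pc, n_tf = 35899, 19433, 2000  # reviewer's counts; the TF total is approximate
hs_lnc, hs_tf = 66, 5
print(f"{hs_rate(hs_lnc, n_lnc):.4%}")  # 0.1838%  HS lncRNAs among all lncRNAs
print(f"{hs_rate(hs_tf, n_pc):.4%}")    # 0.0257%  all protein-coding genes as denominator
print(f"{hs_rate(hs_tf, n_tf):.4%}")    # 0.2500%  ~2000 TFs as denominator
# Rows: TFs, lncRNAs; columns: HS, non-HS
p = fisher_two_sided(hs_tf, n_tf - hs_tf, hs_lnc, n_lnc - hs_lnc)
print(p)
```

      Python's arbitrary-precision integers keep the binomial coefficients exact even for tens of thousands of genes, so no log-space tricks are needed at this scale.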

      (3) In my original review I said: line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      In their reply to me, the authors state:

      Here, we actually made an analogy but not an inference; therefore, we used such words as "suggesting" and "similar" instead of using more confirmatory words. We have revised the latter half sentence, saying "raising the possibility that these sequences have evolved considerably during human evolution".

      Is the aim here to draw attention to the ~2.2% of DBSs that do not have a counterpart? In that case, it would be better to rewrite the sentence to emphasise those, not the ones that are shared between the two species. I do appreciate the revised wording, though.

      (1) Our original phrasing may have been misleading, and we agree entirely that “pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee”. As explained in that reply, we recognize that DBSs and HARs are two different classes of sequences and that identifying HARs and acceleration relies on a far more thorough methodology. Yet three factors prompted us to compare them. First, both point to the importance of sequences outside genes. Second, both are quite “old” sequences that have undergone considerable evolution recently (although the references are different). Third, both have contributed greatly to human brain evolution.

      (2) Here, our emphasis is on the 97.81%, not the 2.2%, and we have now drawn this analogy more clearly and cautiously. Relevant revisions have been made in the Results, Discussion, and Methods sections.

      (3) We have also examined whether the 2.2% of DBSs without chimpanzee counterparts are human-specific gains by analyzing them with the UCSC Multiz Alignments of 100 Vertebrates. The result confirms that all 2248 of these DBSs are present in the human genome but absent from the chimpanzee genome and all other aligned vertebrate genomes. We have added this result to the manuscript.
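      The presence/absence test in point (3) can be sketched as a check over one alignment block. This assumes the block has already been parsed into a mapping from species to gapped, equal-length sequences (MAF parsing is not shown); the species keys are illustrative labels, and the strict default of zero coverage in every other species is an assumption, not the authors' stated criterion.

```python
from typing import Dict

GAP = "-"


def aligned_fraction(seq: str, ref: str) -> float:
    """Fraction of the reference's non-gap columns at which `seq` has a base."""
    cols = [i for i, base in enumerate(ref) if base != GAP]
    if not cols:
        return 0.0
    return sum(seq[i] != GAP for i in cols) / len(cols)


def is_human_specific_gain(block: Dict[str, str], ref: str = "hg38",
                           max_other: float = 0.0) -> bool:
    """True if the human sequence is present but no other aligned species
    covers more than `max_other` of the human bases. Species missing from
    the block count as absent."""
    ref_seq = block.get(ref, "")
    if aligned_fraction(ref_seq, ref_seq) == 0.0:
        return False  # no human sequence in this block
    return all(aligned_fraction(seq, ref_seq) <= max_other
               for species, seq in block.items() if species != ref)


# Hypothetical blocks
gain = {"hg38": "ACGTAC", "panTro6": "------", "mm10": "------"}
shared = {"hg38": "ACGTAC", "panTro6": "ACG---", "mm10": "------"}
print(is_human_specific_gain(gain), is_human_specific_gain(shared))  # True False
```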

      (4) Finally, Line 408: "Ensembl-annotated transcripts (release 79)" Release 79 is dated to March 2015, which is quite a few releases and genome builds ago. Is this a typo? Both the human and the chimpanzee genome have been significantly improved since then!

      (1) We thank the reviewer for this comment, which prompts us to provide further explanation and additional data. First, we began predicting HS lncRNAs’ DBSs when Ensembl release 79 was available, but did not re-predict DBSs when new Ensembl releases were published because (a) these newer releases are also based on hg38, (b) we did not find any fault in the LongTarget program during our use, nor received any fault reports from users, and (c) predicting lncRNAs’ DBSs using the LongTarget program is highly time-consuming.

      (2) Second, to assess the influence of newer Ensembl releases, we compared the promoters annotated in release 79 and in release 115. We found that the vast majority (87.3%) of promoters newly annotated in release 115 belong to non-coding genes. Thus, using release 115 may predict more DBSs in non-coding genes, but downstream analyses based on protein-coding genes would be essentially the same (meaning that all figures and tables would be the same).

      (3) Third, a key element of this study is GTEx data analysis, and these data were also published years ago.  

      (4) Finally, some lncRNA genes have new gene symbols in new Ensembl releases. To allow researchers to use our data conveniently, we have added a new column titled "Gene symbol (Ensembl release115)" to Supplementary Tables 2A and 2B.  
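      The release 79 versus release 115 comparison in point (2) reduces to a set difference followed by a biotype tally. A minimal sketch, assuming promoters are matched between releases by a shared identifier (the response does not say whether IDs or coordinates were used); the IDs and biotypes below are hypothetical.

```python
from typing import Dict, Set


def new_promoters(old: Set[str], new: Set[str]) -> Set[str]:
    """Promoters annotated in the newer release but not in the older one."""
    return new - old


def noncoding_fraction(ids: Set[str], biotype: Dict[str, str]) -> float:
    """Fraction of promoters whose gene biotype is not protein_coding."""
    if not ids:
        return float("nan")
    return sum(biotype[i] != "protein_coding" for i in ids) / len(ids)


# Hypothetical promoter IDs and gene biotypes
r79 = {"P1", "P2", "P3"}
r115 = {"P1", "P2", "P3", "P4", "P5", "P6", "P7"}
biotype = {"P1": "protein_coding", "P2": "lncRNA", "P3": "protein_coding",
           "P4": "lncRNA", "P5": "lncRNA", "P6": "protein_coding", "P7": "lncRNA"}
added = new_promoters(r79, r115)
print(sorted(added), noncoding_fraction(added, biotype))  # ['P4', 'P5', 'P6', 'P7'] 0.75
```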

      Summary:

      Major changes based on Reviewer’s comments:

      (1) The following revisions are made to address the comment on “the 0.034 threshold”: (a) Section 2.3, section 2.4, Supplementary Note 3, and related contents in Discussion and Methods are revised, (b) new Figure 2, Supplementary Figure 15, and new Supplementary Tables 5, 6, and 7 are added, (c) Table 2 and Supplementary Table 8 are revised.

      (2) To address the comment on “new TFBS section”, section 2.6 and section 4.13 are revised.  

      (3) To address the comment on “97.81% and 2.2% of DBSs”, section 2.3 is revised.

      (4) The following revisions are made to address the comment on “release 79”: (a) the old Supplementary Tables 2 and 3 are merged into Supplementary Tables 2A and 2B, and the new column "Gene symbol (Ensembl release115)" is added to Supplementary Tables 2A and 2B, (b) accordingly, Supplementary Tables 4 and 5 are renamed Supplementary Tables 3 and 4.

      Additional revisions:

      (1) Section 2.5 “Young weak DBSs may have greatly promoted recent human evolution” is moved into Supplementary Note 3 (which now has the subtitle “Target genes with specific DBS features are enriched in specific functions”), because this section is short and lacking sufficient cross-validation.

      (2) Considerable minor revisions of sentences have been made.

      (3) Since there are many supplementary figures, the main text now cites only Supplementary Notes, as the reader can easily access supplementary figures in Supplementary Notes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This article deals with the chemotactic behavior of E coli bacteria in thin channels (a situation close to 2D). It combines experiments and simulations.

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis at a channel width of ~8µm, close to the average radius of the circular trajectories of unconfined bacteria in 2D. It is known that these circles are chiral, which causes the bacteria to swim preferentially along the right-side wall when there is no chemotactic gradient. In the presence of a chemotactic gradient, this larger proportion of bacteria swimming on the right wall yields chemotaxis. This effect is backed by numerical simulations and a geometrical analysis.

      Although the conclusions drawn from the experiments presented in this article seem clear and interesting, I find that the key elements of the mechanism of this wall-directed chemotaxis are not sufficiently emphasized. Moreover, the paper would be clearer with more details on the hypotheses and the essential ingredients of the analyses.

      We thank the reviewer for these constructive suggestions. We agree that emphasizing the underlying mechanism is crucial for the clarity of our findings. In the revised manuscript, we have now explicitly highlighted the critical roles of chiral circular motion and the alignment effect following side-wall collisions in both the Abstract (lines 25-27) and the Discussion (lines 391-393). Furthermore, we have added a new analysis of bacterial trajectories post-collision (Fig. S2), which demonstrates that cells predominantly align with and swim along the sidewalls. We have also clarified the assumptions in our numerical simulations, specifically how the radius of circular trajectories and the alignment effect are incorporated into the equations of motion. Please refer to our detailed responses in the "Recommendations for the authors" section for further specifics.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigated the chemotaxis of E. coli swimming close to the bottom surface in gradients of attractant in channels of increasingly smaller width but fixed height = 30 µm and length ~160 µm. In relatively large channels, they find that on average the cells drift in response to the gradient, despite cells close to the surface away from the walls being known to not be chemotactic because they swim in circles.

      They find that this average drift is due to the cell localization close to the side walls, where they slide along the wall. Whereas the bacteria away from the walls have no chemotaxis (as shown before), the ones on the left side wall go down-gradient on average, but the ones on the right-side wall go up-gradient faster, hence the average drift. They then study the effect of reducing channel width. They find that chemotaxis is higher in channels with a width of about 8 µm, which approximately corresponds to the radius of the circular swimming R. This higher chemotactic drift is concomitant to an increased density of cells on the RSW. They do simulations and modeling to suggest that the disruption of circular swimming upon collision with the wall increases the density of cells on the RSW, with a maximal effect at w = ~ 2/3 R, which is a good match for their experiments.

      Strengths:

      The overall result that confinement at the edge stabilises bacterial motion and allows chemotaxis is very interesting although not entirely unexpected. It is also important for understanding bacterial motility and chemotaxis under ecologically relevant conditions, where bacteria frequently swim under confinement (although its relevance for controlling infections could be questioned). The experimental part of the study is nicely supported by the model.

      Weaknesses:

      Several points of this study, in particular the interpretation of the width effect, need better clarification:

      (1) Context:

      There are a number of highly relevant previous publications that should have been acknowledged and discussed in relation to the current work:

      https://pubs.rsc.org/en/content/articlehtml/2023/sm/d3sm00286a

      https://link.springer.com/article/10.1140/epje/s10189-024-00450-7

      https://doi.org/10.1016/j.bpj.2022.04.008

      https://doi.org/10.1073/pnas.1816315116

      https://www.pnas.org/doi/full/10.1073/pnas.0907542106

      https://doi.org/10.1038/s41467-020-15711-0


      http://doi.org/10.1039/c5sm00939a

      We appreciate the reviewer bringing these important publications to our attention. We have now cited and discussed these works in the Introduction (lines 55-62 and 76-85) to better contextualize our study regarding bacterial motility and chemotaxis in confined geometries.

      (2) Experimental setup:

      a) The channels are built with asymmetric entrances (Figure 1), which could trigger a ratchet effect (because bacteria swim in circles) that could bias the rate at which cells enter into the channel, and which side they follow preferentially, especially for the narrow channel. Since the channel is short (160 µm), that would reflect on the statistics of cell distribution. Controls with straight entrances or with a reversed symmetry of the channel need to be performed to ensure that the reported results are not affected by this asymmetry.

      We appreciate the reviewer's insight regarding the potential ratchet effect caused by asymmetric entrances. To rule this out, we fabricated a control device with straight entrances and repeated the measurements. As shown in Figure S3, the chemotactic drift velocity follows the same trend as observed in the original setup, confirming an optimal width of ~9 µm. These results demonstrate that the entrance geometry does not bias the reported statistics. We have updated the manuscript text at lines 233-235.

      b) The authors say the motile bacteria accumulate mostly at the bottom surface. This is strange: for a small height of 30 µm, the bacteria should be more or less evenly spread between the top and bottom surfaces. How can this be explained?

      We apologize for not explaining this clearly in the text. As shown by Wei et al., Phys. Rev. Lett. 135, 188401 (2025), significant surface accumulation occurs in channels with heights exceeding 20 µm. In our specific experimental setup, we did not use Percoll to counteract gravity. Therefore, the bacteria accumulated mostly at the bottom surface under the combined influence of gravity and hydrodynamic attraction. This bottom-surface localization is supported by our observation that the bacterial trajectories were predominantly clockwise (characteristic of the bottom surface) rather than counter-clockwise (characteristic of the top surface). We have added this explanation to Line 141.

      c) At the edge, some of the bacteria could escape up in the third dimension (http://doi.org/10.1039/c5sm00939a). What is the magnitude of this phenomenon in the current setup? Does it have an effect?

      We thank the reviewer for raising this important point regarding 3D escape. We have quantified this phenomenon and found the escape rate from the edge into the third dimension to be 0.127 s⁻¹. This corresponds to a mean residence time of ~7.9 s, which allows a cell moving at 20 µm/s to travel approximately 157.5 µm along the edge. Since this distance is comparable to the full length of our lanes (~160 µm), most cells traverse the entire edge without escaping. Furthermore, our analysis is based on the average drift of the surface trajectories per unit of time; this metric is independent of the absolute number of cells present. Therefore, the escape phenomenon does not significantly impact our conclusions. We have added a statement clarifying this at line 154.
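      The residence-time estimate above follows from the escape rate alone if escape is treated as a constant-rate (Poisson) process, so that the mean time on the edge is the reciprocal of the rate. A short sketch of that arithmetic, using only the two numbers quoted in the response:

```python
escape_rate = 0.127  # s^-1, measured edge-escape rate quoted above
speed = 20.0         # um/s, swimming speed used in the estimate

mean_residence = 1.0 / escape_rate        # mean time on the edge, s
mean_run_length = speed * mean_residence  # mean distance along the edge, um
print(f"{mean_residence:.2f} s, {mean_run_length:.1f} um")  # 7.87 s, 157.5 um
```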

      d) What is the cell density in the device? Should we expect cell-cell interactions to play a role here? If not, I would suggest de-emphasizing the connection to chemotaxis in the swarming paper in the introduction and discussion, which doesn't feel very relevant here, and rather focusing on the other papers mentioned in point 1.

      The cell density in our experiments was approximately 1.3×10⁻³ μm⁻². Given this low density, we do not expect cell-cell interactions to play a role in the observed behaviors.
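      One way to make "low density" concrete is to convert the areal density into a typical spacing between cells. The sketch below assumes cells are scattered like a 2D Poisson point pattern, for which the mean nearest-neighbour distance is 1/(2√n); both lengths can then be compared with an E. coli body length of a few micrometres.

```python
import math

density = 1.3e-3                    # cells per um^2, quoted above
spacing = 1.0 / math.sqrt(density)  # characteristic spacing, um
nn_mean = 0.5 / math.sqrt(density)  # mean nearest-neighbour distance (2D Poisson), um
print(f"{spacing:.1f} um, {nn_mean:.1f} um")  # 27.7 um, 13.9 um
```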

      Regarding the connection to swarming chemotaxis: We agree that our low-density setup differs from a high-density swarm; however, we believe the comparison remains relevant for two reasons. First, it provides a necessary contrast to studies showing surface inhibition of chemotaxis. Second, while we eliminate cell-cell interactions, we isolate the geometric aspect of swarming. In a swarm, cells move within narrow lanes created by their neighbors. Our device mimics this specific physical confinement by replacing neighboring cells with PDMS sidewalls. This allows us to decouple the effects of physical confinement from cell-cell interactions. We have added the text (Line 370) to clarify this rationale and have incorporated the additional references in introduction as suggested in point 1.

      e) We are not entirely convinced by the interpretation of the results in narrow channels. What is the causal relationship between the increased density on the RSW and the higher chemotactic drift? The authors seem to attribute higher drift to this increased RSW density, which emerges due to the geometric reasons. But if there is no initial bias, the same geometric argument would induce the same increased density of down-gradient swimmers on the LSW, and so, no imbalance between RSW and LSW density. Could it be the opposite that the increased RSW density results from chemotaxis (and maybe reinforces it), not the other way around? Confinement could then deplete one wall due to the proximity of the other, and/or modify the swimming pattern - 8 µm is very close to the size of the body + flagellum. To clarify this point, we suggest measuring the bacterial distributions in the absence of a gradient for all channel widths as a control.

      We thank the reviewer for this insightful comment regarding the causal relationship between cell density and chemotactic drift. We apologize if the initial explanation was unclear.

      Regarding the no-gradient control: Without an attractant gradient (and no initial bias), there is no breaking of symmetry and the labels of "LSW" and "RSW" are arbitrary. Therefore, there will be no asymmetry in the bacterial distributions on both sides (within experimental fluctuations) in the absence of a gradient for any channel width.

      Regarding the causality and density imbalance: We agree that the increased RSW density is a result of chemotaxis, which is then reinforced by the lane geometry especially at narrow lane width. The mechanism relies on the coupling of chemotactic bias with surface circularity. The angle ranges that lead to RSW-UG accumulation (Fig. 6A-C) coincide with the up-gradient direction. Because these cells experience suppressed tumbling (longer runs), they can maintain the steady circular trajectories required to reach and align with the RSW. Conversely, while pure geometric analysis suggests a similar potential for LSW-DG accumulation, these trajectories coincide with the down-gradient direction. These cells experience enhanced tumbling, which distorts the circular trajectories. This prevents them from effectively reaching the LSW and also increases the probability of them leaving the wall. Therefore, the causality is indeed a positive feedback loop: the attractant gradient creates an initial bias that allows the RSW-UG fraction to form stable trajectories; the optimal lane width (matching the swimming radius) then maximizes this capture efficiency, further enriching the RSW fraction and enhancing the overall drift.

      We have added clarifications regarding these points in the revised manuscript (the last paragraph of “Results”).

      (3) Simulations:

      The simulations treat the wall interaction very crudely. We would suggest treating it as a mechanical object that exerts elastic or "hard sphere" forces and torques on the bacteria for more realistic modeling.

      We appreciate the reviewer's suggestion to incorporate more detailed mechanical interactions, such as elastic or hard-sphere forces, for the wall collisions. While we agree that a full hydrodynamic or mechanical model would offer higher fidelity, our experimental observations suggest that a simplified kinematic approach is sufficient for the specific phenomena studied here.

      As shown in the new Fig. S2, our analysis of cell trajectories in the 44-µm-wide channels reveals that cells colliding with the sidewalls tend to align with the surface almost instantaneously. The timescale required for this alignment is negligible compared to the typical wall residence time (see also Ref. 6). Consequently, to maintain computational efficiency without sacrificing the essential physics of the accumulation effect, we employed a coarse-grained phenomenological model where a bacterium immediately aligns parallel to the wall upon contact, similar to approaches used previously (Ref. 43). We have added relevant text to the manuscript on lines 168-171.

      Notably, the simulations have a constant (chemotaxis independent) rate of wall escape by tumbling. We would expect that reduced tumbling due to up-gradient motility induces a longer dwell time at the wall.

      We apologize for the confusion. The chemotaxis effect is indeed fully integrated into our simulation. Specifically, the simulated cells sense the chemical gradient and adjust their motor CW bias (B) accordingly. This adjustment directly modulates the tumble rate (k), calculated as k = B/0.31 s⁻¹. Consequently, the wall escape rate is not constant but varies with the chemotactic response. We also imposed a maximum detention time limit which, when combined with the variable tumble rate, results in an average wall residence time of approximately 2 s, consistent with our experimental observations (Fig. S6B). We have clarified these details in the final section of 'Materials and Methods'.
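      How a chemotaxis-dependent tumble rate and a detention-time cap combine can be sketched as below. This assumes a cell leaves the wall at its first tumble, so the residence time is an exponential draw with rate k = B/0.31 s⁻¹ truncated at a maximum time; the CW-bias values and the cap are hypothetical, and the signalling model that sets B is not reproduced.

```python
import math
import random


def tumble_rate(cw_bias: float) -> float:
    """Tumble rate k = B / 0.31 s^-1, the relation quoted in the response."""
    return cw_bias / 0.31


def sample_wall_residence(cw_bias: float, t_max: float, rng: random.Random) -> float:
    """Time to the first tumble (exponential with rate k), truncated at t_max."""
    return min(rng.expovariate(tumble_rate(cw_bias)), t_max)


def mean_wall_residence(cw_bias: float, t_max: float) -> float:
    """Analytic mean of min(Exp(k), t_max), i.e. (1 - exp(-k * t_max)) / k."""
    k = tumble_rate(cw_bias)
    return (1.0 - math.exp(-k * t_max)) / k


rng = random.Random(0)
t_max = 5.0  # hypothetical detention cap, s
for b in (0.1, 0.3):  # hypothetical CW biases: lower when swimming up-gradient
    mc = sum(sample_wall_residence(b, t_max, rng) for _ in range(100_000)) / 100_000
    print(f"B={b}: simulated {mc:.2f} s, analytic {mean_wall_residence(b, t_max):.2f} s")
```

      A lower CW bias (fewer tumbles while moving up-gradient) yields a longer mean residence on the wall, which is the chemotaxis dependence described above.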

      Reviewer #3 (Public review):

      This paper addresses through experiment and simulation the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      While the data analysis is reasonably convincing, I think that the authors could make much better use of what must be voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, I would like to see much more analysis of how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis. In essence, there needs to be a much clearer control analysis of trajectories without sidewalls to understand the mechanism in their presence.

      We thank the reviewer for this insightful suggestion. We agree that understanding how circular trajectories are interrupted by wall collisions is central to explaining the enhanced chemotaxis. While we did not explicitly formulate a Fokker-Planck equation, we have addressed the reviewer's core point by employing two complementary mathematical approaches that model the probability distribution of swimming directions and wall interactions:

      (1) Stochastic simulations (Langevin approach): As detailed in the "Simulation of E. coli chemotaxis within lane confinements" subsection of “Results” and Figure 5, we modeled cells as self-propelled particles performing random walks. This model explicitly accounts for the "interruption" of circular trajectories by incorporating a constant angular velocity (circular swimming) and an alignment effect upon collision with sidewalls. These simulations successfully reproduced the experimental trends, confirming that the interplay between circular radius and lane width determines the optimal drift velocity.

      (2) Geometric probability analysis: To provide the "intuitive understanding", we included a specific Geometrical Analysis section (the last subsection of “Results”) and Figure 6. This analysis mathematically formulates the problem by calculating the exact proportion of swimming angles that allow a cell to transition from a circular trajectory in the bulk to an up-gradient trajectory along the Right Sidewall (RSW). By integrating over the possible swimming directions, we derived the probability of wall interception as a function of lane width (w) and swimming radius (r). This analysis reveals that the interruption of circular paths is most favorable for chemotaxis when w ≈ (0.7-0.8)×r.

      (3) Control analysis: regarding the "control analysis of trajectories without sidewalls," we utilized the cells in the Middle Area (MA) of the wide lanes as an internal control. As shown in Fig. 2B and 4A, these cells exhibit typical surface-associated circular swimming (Fig. 3B) but generate zero net drift. This serves as the baseline "no sidewall" condition, demonstrating that the chemotactic enhancement is strictly driven by the rectification of circular swimming into wall-aligned motion at the boundaries.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. Yet, each of these would be characterized by significant heterogeneity in pore sizes and geometries, and thus it is very unclear whether or how the findings in this work would carry over to those situations.

      We thank the reviewer for this important observation regarding environmental heterogeneity. We agree that we should be cautious about directly extrapolating to complex ecological contexts without qualification. We have revised the last sentence of the abstract to adopt a more measured tone: "Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections."

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Key elements of the mechanism of wall-directed chemotaxis are not sufficiently emphasized:

      For instance, the chirality of the trajectories is an essential part of the analysis but is mentioned only briefly in the introduction. In the geometrical analysis, I understand that one of the critical parameters is the angle at which bacteria "collide" with the walls. But, again, this remains largely implicit in the discussion. This comes to the point that these ideas are not even mentioned in the abstract which doesn't provide any hint of a mechanism. An analysis of the actual trajectories of the cells after they hit the walls, as a function of their initial angle would be helpful in comparison with the simulations and the geometrical analysis.

      We appreciate the reviewer's insightful comment regarding the need to better emphasize the mechanism of wall-directed chemotaxis. We agree that the chirality of trajectories and the geometry of wall collisions are central to our analysis and were previously under-emphasized.

      To address this, we have made the following revisions:

      (1) We have revised the Abstract (lines 25-27) and the Discussion (lines 391-393) to explicitly highlight the crucial role of chiral circular motion and the alignment effect following sidewall collisions.

      (2) We further analyzed bacterial trajectories at different collision angles. Typical examples are shown in Supplementary Fig. S2. We observed that cells tend to align with and swim along the sidewalls regardless of their initial collision angles. This finding is now described in the main text at lines 168-171.

      The motion of the bacteria is modelled as run-and-tumble at several places in the manuscript, and in particular in the simulations. Yet, the trajectories of the bacteria seem to be smooth in this almost 2D geometry, except of course when they directly interact with the walls (I hardly see tumbles in the MA region in Figure 1B). Can the authors elaborate on the assumptions made in the numerical simulations? In particular, how is the radius of the trajectories included in these equations of motion (line 514)?

      We apologize for the lack of clarity regarding the bacterial motion model. It has been established that while bacteria do tumble near solid surfaces, they exhibit a smaller reorientation angle than in bulk fluid; in fact, the most probable reorientation angle on a surface is zero (Ref. 41). Consequently, tumbles are often difficult to distinguish from runs by eye. Additionally, the trajectories in Figure 1B are plotted on a 44 µm × 150 µm canvas with unequal coordinate scales, which may further obscure the visual distinctness of tumbling events.

      Regarding the equations of motion: We modeled the bacteria as self-propelled particles governed by the internal chemotaxis pathway, alternating between run and tumble states. As noted in the equations on lines 286 & 578, we incorporated the circular motion by introducing a constant angular velocity, −ν<sub>0</sub>/r, during the run state. Here, ν<sub>0</sub> represents the swimming speed, r denotes the radius of circular swimming, and the negative sign indicates clockwise chirality. Furthermore, to model the hydrodynamic interaction with the boundaries, we assumed that when a cell collides with a sidewall, its velocity vector instantly aligns parallel to that wall.
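The motion model described above can be illustrated with a short sketch. This is not the authors' simulation code; it is a minimal 2D example under explicit assumptions: the internal chemotaxis pathway is replaced by a constant run-to-tumble rate, the heading after a tumble is drawn uniformly (real near-surface tumbles have small reorientation angles, Ref. 41), positions are advanced with a simple Euler step, and all parameter values (w, r, v0, dt) and the function name `simulate_lane` are hypothetical. It keeps the two ingredients stated in the response: a constant angular velocity −v0/r during runs and instantaneous alignment parallel to a sidewall on collision.

```python
import math
import random


def simulate_lane(w=20.0, r=20.0, v0=20.0, y0=None, theta0=None,
                  k_run=0.5, k_tumble=5.0, dt=0.01, steps=1000, seed=0):
    """Hypothetical run-and-tumble sketch in a lane 0 <= y <= w.

    x runs along the lane (gradient axis), y across it; units are arbitrary
    (e.g. um and s). Assumption: k_run (run-to-tumble rate) is constant here
    instead of being set by the motor CW bias.
    """
    rng = random.Random(seed)
    x = 0.0
    y = w / 2 if y0 is None else y0
    theta = rng.uniform(-math.pi, math.pi) if theta0 is None else theta0
    omega = -v0 / r  # constant angular velocity during runs; minus sign = clockwise
    running = True
    xs, ys = [x], [y]
    for _ in range(steps):
        if running:
            theta += omega * dt
            x += v0 * math.cos(theta) * dt
            y += v0 * math.sin(theta) * dt
            if y <= 0.0 or y >= w:
                # Sidewall collision: keep the cell inside the lane and align its
                # velocity parallel to the wall, preserving its x-direction.
                y = min(max(y, 0.0), w)
                theta = 0.0 if math.cos(theta) >= 0.0 else math.pi
            if rng.random() < k_run * dt:
                running = False  # start a tumble; position is frozen while tumbling
        elif rng.random() < k_tumble * dt:
            # Assumption: uniformly random new heading after a tumble.
            running = True
            theta = rng.uniform(-math.pi, math.pi)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

For example, `simulate_lane(y0=0.0, theta0=0.0, k_run=0.0, steps=100)` keeps the cell at y = 0 while x grows by about v0·dt per step: the clockwise rotation keeps turning the cell back into that wall, where it is realigned. A cell placed at y = w heading along −x is held against the opposite wall in the same way. In this toy frame the y = 0 wall lies on the right when facing +x, so chirality alone sorts wall-aligned traffic by direction; any net chemotactic drift would have to come from the omitted CW-bias modulation.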

      The comparison of Figure 5B (simulations) with Figure 4B (experiments) does not strike me as so "similar". Why are the points at small widths so noisy (Figure 5AB)? Figure 5C is cut at these widths, it should be plotted over the entire scale.

      We acknowledge that the agreement between simulation and experiment is less robust in the narrowest channels. The discrepancy and "noise" at small widths in Figure 5 arise from the limitations of the self-propelled particle model in highly confined geometries. Specifically, our simulation treats bacteria as point particles and does not explicitly calculate the physical exclusion (steric effects) caused by the finite size of the flagella and cell body.

      In the experimental setup, steric constraints within narrow channels (comparable to the cell size) restrict the cells' ability to turn freely, effectively stabilizing their motion. However, because our model allows particles to reorient more freely than actual cells would in such confined spaces, it produces fluctuations and an overestimation of the drift velocity at small widths. If these confinement effects were fully incorporated, the cell density mismatch between the left and right sidewalls would be reduced, leading to lower drift velocities that match the experimental data more closely.

      Regarding Figure 5C: Since the "active particle" assumption loses physical validity in channels narrower than the scale of the bacterium, the simulation results in this regime are not representative of biological reality. Plotting these non-physical points would distort the analysis. Therefore, we have maintained the truncation of Figure 5C at 4 µm to ensure that the data presented are physically meaningful. We have added a clear discussion of these model limitations to the manuscript at lines 310-314.

      These important precisions should be added to the text or in a supplementary section. A validated mechanism describing in detail the impact of the walls on the cell trajectories would greatly improve the conclusions.

      We thank the reviewer for the suggestions. As noted in the responses above, we have incorporated the details concerning the simulation assumptions and the model limitations at narrow widths into the revised manuscript. We have performed further analysis of the collision trajectories between bacteria and the sidewalls. As illustrated in the new Fig. S2, the data confirms that cells tend to align with and swim along the sidewalls following a collision, regardless of the initial impact angle.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Related to swimming in 3D: The authors should specify the depth of field of the objective in their setup.

      We thank the reviewer for pointing this out. We have calculated the depth of field (DOF) of our objective to be approximately 3.7 µm. This estimate is based on the standard formula:

      $$d_{\mathrm{tot}} = \frac{\lambda n}{\mathrm{NA}^{2}} + \frac{n\,e}{M \cdot \mathrm{NA}}$$

      where λ = 610 nm (emission wavelength), n = 1.0 (refractive index), NA = 0.45 (numerical aperture), M = 20 (magnification), and e = 6.5 µm (camera pixel size). We have added this specification to the "Microscopy and Data Acquisition" section of “Materials and Methods”.
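As a quick numerical check of the ~3.7 µm value, the sketch below evaluates the commonly used depth-of-field expression d = λn/NA² + n·e/(M·NA), a diffraction term plus a detector term, taking e as the camera pixel size. The function name and the choice of µm as the working unit are illustrative.

```python
def depth_of_field(wavelength_um, n, na, magnification, pixel_um):
    """Total depth of field in um: diffraction term + detector (pixel) term."""
    diffraction = wavelength_um * n / na ** 2
    detector = n * pixel_um / (magnification * na)
    return diffraction + detector


d = depth_of_field(wavelength_um=0.610, n=1.0, na=0.45, magnification=20, pixel_um=6.5)
print(f"{d:.2f} um")  # 3.73 um
```

With these parameters the diffraction term contributes about 3.0 µm and the detector term about 0.7 µm, so the diffraction term dominates at this numerical aperture.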

      (2) Related to the interpretation of the width effect: We think plotting the cell enrichment, i.e., the probabilities P in Figure 4B normalized to the expected value if cells were homogeneously distributed (3 µm/w for each sidewall, (w − 6 µm)/w for the middle), would help understand the strength of the wall 'siphoning' effect.

      We thank the reviewer for the suggestion. We have calculated the cell enrichment by normalizing the observed probabilities against the expected values for a homogeneous distribution, as suggested. The resulting relationship between cell enrichment and lane width is presented in Figure S4.
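The normalization suggested by the reviewer can be written out explicitly. The sketch below assumes, as in the reviewer's expression, that each sidewall region (LSW, RSW) is a 3 µm band and the middle area (MA) is the remaining w − 6 µm; the function names and the example occupancy are hypothetical, not values from Figure S4.

```python
WALL_BAND_UM = 3.0  # assumed width of each sidewall region (LSW, RSW)


def expected_fraction(w_um, region):
    """Fraction of cells expected in a region if cells were spread uniformly."""
    if region in ("LSW", "RSW"):
        return WALL_BAND_UM / w_um
    if region == "MA":
        return (w_um - 2 * WALL_BAND_UM) / w_um
    raise ValueError(f"unknown region: {region!r}")


def enrichment(p_observed, w_um, region):
    """Observed probability divided by the homogeneous expectation (1 = no enrichment)."""
    return p_observed / expected_fraction(w_um, region)


# Hypothetical example: 30% of cells in the RSW band of a 30 um lane.
print(enrichment(0.3, 30.0, "RSW"))  # about 3, i.e. threefold enrichment
```

Because the three expected fractions sum to 1 for any w > 6 µm, an enrichment above 1 at the walls is necessarily balanced by a depletion of the middle area.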

      Related to simulations:

      (1) Showing vd for the 3 regions in Figure S5 would be helpful also to understand the underlying mechanism.

      We thank the reviewer for the suggestion. The V<sub>d</sub> values for the three regions are shown in Fig. S5.

      (2) Figure 5B vs 4B: There is a mismatch in the right vs left side density at w=6µm in the simulations that is not here in the experiments. What could explain this difference?

      We appreciate the reviewer pointing this out. The mismatch in the simulations is due to the simplified treatment of cells as self-propelled particles, which overlooks the physical volume of the cell body and flagella. In narrow channels (w = 6 µm), these physical constraints would restrict the cells' ability to change direction freely, a factor not fully captured in the simulation. Accounting for these steric effects would trap cells more effectively against the walls, reducing the density asymmetry between the LSW and RSW and lowering the drift velocity. This would bring the simulation results closer to the experimental observations. We have added a discussion of these limitations and effects to the revised manuscript (lines 310-314).

      (3) The simulations essentially assume that the density of motile cells is homogeneous and equal at both x=0 and x=L open ends of the channel. Is it the case in the experiments, even with the gradient, and the walls creating some cell transport?

      We thank the reviewer for pointing this out. The simulation assumption is consistent with our experimental observations. Our data were recorded within 160-μm-long lanes located in the center of the wider (400 μm) cell channel. In this central region, the cells maintain a continuous flux. Furthermore, experiments were performed within 8 min of flow, limiting the time available for significant cell density gradients to become established. As illustrated in Author response image 1, the inhomogeneity in the measured cell density distribution is insignificant across the length of the observation window, indicating that the walls and gradient do not create significant heterogeneity at the boundaries of the region of interest.

      Author response image 1.

      The cell density distribution along the gradient field from the data of 44-μm-wide lane.

      (4) Line 506: There is something strange with the definition of the bias. B cannot be the tumbling bias if k=B/0.31 s<sup>-1</sup> and the tumble-to-run rate is 5/s, because then the tumbling bias is B/0.31 / (B/0.31 + 5). Please clarify.

      We apologize for the confusion caused by the notation. In our model, B represents the CW bias of the individual flagellar motor, not the macroscopic tumbling bias of the cell. We assume the run-to-tumble rate is equivalent to the motor CCW-to-CW switching rate (k). Previous studies have shown that this rate increases linearly with the motor CW bias according to k = B/τ, where τ is a characteristic time (Ref. 50).

      Based on experimental data for wildtype cells, the average run time in the near-surface region is ~2.0 s (corresponding to a run-to-tumble rate of ~0.5 s<sup>-1</sup>) (Ref. 11), and the steady-state wildtype CW bias is ~0.15. Using these values, we determined τ ≈ 0.31 s. Consequently, the switching rate is defined as k = B/0.31 s<sup>-1</sup>. Since the tumble duration is constant (0.2 s) (Ref. 51), the tumble-to-run rate is fixed at 5 s<sup>-1</sup>. We have clarified these definitions and parameter values in lines 569-573.
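The distinction between the motor CW bias B and the cell's tumbling bias can be made concrete with the numbers quoted above. In this sketch (function and constant names are illustrative), the tumbling bias is the steady-state fraction of time spent tumbling in a two-state run/tumble process, which is exactly the reviewer's expression (B/0.31)/(B/0.31 + 5).

```python
TUMBLE_DURATION_S = 0.2   # constant tumble duration (Ref. 51)
WT_RUN_TO_TUMBLE = 0.5    # s^-1, from a mean near-surface run time of ~2.0 s (Ref. 11)
WT_CW_BIAS = 0.15         # steady-state wildtype motor CW bias

# k = B / tau  =>  tau = B / k; gives ~0.3 s (the response uses tau = 0.31 s).
tau = WT_CW_BIAS / WT_RUN_TO_TUMBLE


def run_to_tumble_rate(cw_bias, tau_s=0.31):
    """Motor CCW-to-CW switching rate, taken as the run-to-tumble rate k = B / tau."""
    return cw_bias / tau_s


def cell_tumble_bias(cw_bias, tau_s=0.31):
    """Steady-state fraction of time spent tumbling: k / (k + tumble-to-run rate)."""
    k = run_to_tumble_rate(cw_bias, tau_s)
    k_tumble_to_run = 1.0 / TUMBLE_DURATION_S  # 5 s^-1
    return k / (k + k_tumble_to_run)
```

For the wildtype CW bias of 0.15, this gives k ≈ 0.48 s<sup>-1</sup> but a cell tumbling bias of only about 0.09, which is why B should not be read as the tumbling bias.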

      Other minor comments:

      (1) Line 20 and lines 34-35: We think that the connection to infection is questionable here and should be toned down.

      Thank you for the suggestion. We have revised Line 20 to read: “Understanding bacterial behavior in confined environments is helpful for elucidating microbial ecology and developing strategies to manage bacterial infections.” Additionally, we modified lines 34-35 to state: “Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections.”

      (2) Line 49: Consider highlighting the change in the sense of rotation at the air-liquid interface.

      Thank you for the suggestion. We have now highlighted the difference in chirality between trajectories at the air-liquid interface and those at the liquid-solid interface. The text has been updated to read: “For example, E. coli swim clockwise when observed from above a solid surface, whereas Caulobacter crescentus move in tight, counter-clockwise circles when viewed from the liquid side.”

      (3) Lines 58-59: The sentence should be better formulated, explaining what is CheY-P and that its concentration changes because of a change in phosphorylation (P).

      Thank you for the suggestion. We have reformulated this section to explicitly define CheY-P and explain how its concentration is regulated through phosphorylation. The revised text reads: “The transmembrane chemoreceptors detect attractants or repellents and transmit signals into the cell by modulating the autophosphorylation of the histidine kinase CheA. Attractant binding suppresses CheA autophosphorylation, while repellent binding promotes it. This modulation alters the concentration of the phosphorylated response regulator protein, CheY-P.”

      (4) Lines 63-64: CheR CheB do a bit more than "facilitating" adaptation, they mediate it. The notation CheB(p) may be confusing, since "-P" was used above for CheY.

      Thank you for pointing this out. We have corrected the notation and strengthened the description of the enzymes' roles. The revised text is: “The adaptation enzymes CheR and CheB methylate and demethylate the receptors, respectively, mediating sensory adaptation.”

      (5) Line 130: there must be a typo in the formula.

      We have replaced the ambiguous lag time variable in Fig. 1C with nΔt to ensure mathematical consistency.

      (6) Additionally, \Delta t is both the time between the frame here and the lag time in Figure 1.

      Thank you for highlighting this ambiguity. We have updated the notation to distinguish these two values. The lag time in Figure 1 is now explicitly denoted as nΔt, while Δt remains the time interval between individual frames.

      (7) Line 162: "Consistent with previous reports," a reference to said reports is missing.

      Thank you for pointing this out. We have now added the reference (Ref. 41) to support this statement.

      (8) Figure 1B: Are these tracks in the presence of a gradient? Same as used in panel C? This needs to be explained.

      Thank you for this question. We confirm that the tracks shown in Figure 1B were indeed recorded in the presence of a gradient and represent a subset of the data used in Figure 1C. We have clarified this in the figure legend as follows: "Thirty bacterial trajectories selected from the data of the 44-µm-wide lane in gradient assays. These represent a subset of the trajectories analyzed in panel C."

      (9) Simulations: the equation for x(t) should also be given for completeness.

      Thank you for the suggestion. For completeness, we have added the position updating equations for the run state to the Materials and Methods section (lines 579-580). The equations are defined as:

      (10) Figure S2: For the swimming directions that are more unstable due to the surface friction torque, RSW-DG, and LSW-UG, one would have expected that the Up-gradient motion is more persistent than the down gradient one. It seems to be the opposite. Is it significant, and what could be the reason for this?

      We apologize for the lack of clarity in our original explanation. While we would generally expect up-gradient motion to be more persistent than down-gradient motion in bulk fluid, our measurements near the surface show a different trend due to the specific contributions of run and tumble states to the escape rate. Cells swimming up-gradient (UG) in the LSW have a higher probability of running. Consequently, they are subjected to the destabilizing surface friction torque for a greater proportion of time than cells swimming down-gradient (DG) in the RSW. This can be explained mathematically. The escape rates for RSW-DG and LSW-UG are the tumble-bias-weighted averages of the escape rates in the two states:

      $$k_{\text{RSW-DG}} = B^{-}k_{T} + \left(1 - B^{-}\right)k_{R}, \qquad k_{\text{LSW-UG}} = B^{+}k_{T} + \left(1 - B^{+}\right)k_{R}$$

      where B<sup>+</sup> and B<sup>−</sup> represent the tumble bias (probability of tumbling) when swimming up-gradient and down-gradient, respectively, and k<sub>T</sub> and k<sub>R</sub> denote the escape rates during a tumble and a run, respectively. Due to the chemotactic response, 0 ≤ B<sup>+</sup> < B<sup>−</sup> ≤ 1. Crucially, our system is characterized by k<sub>R</sub> > k<sub>T</sub> (the escape rate is higher during a run than during a tumble). Therefore, the lower tumble bias during up-gradient swimming (B<sup>+</sup> < B<sup>−</sup>) increases the weight of the run-state escape term, (1−B<sup>+</sup>)k<sub>R</sub>, leading to a higher overall escape rate for LSW-UG than for RSW-DG. We have added an intuitive explanation of why k<sub>R</sub> > k<sub>T</sub> to the Supplemental text.
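The sign of the effect follows directly from the weighted-average form of the escape rate. In the sketch below the numerical values are hypothetical and chosen only to satisfy the stated orderings (k<sub>R</sub> > k<sub>T</sub>, B<sup>+</sup> < B<sup>−</sup>); subtracting the two rates gives (B<sup>−</sup> − B<sup>+</sup>)(k<sub>R</sub> − k<sub>T</sub>), which is positive whenever both orderings hold.

```python
def escape_rate(tumble_bias, k_tumble, k_run):
    """Escape rate weighted by time in each state: B*k_T + (1 - B)*k_R."""
    return tumble_bias * k_tumble + (1.0 - tumble_bias) * k_run


# Hypothetical values satisfying k_R > k_T and B+ < B-.
k_T, k_R = 0.1, 1.0
B_plus, B_minus = 0.1, 0.2

lsw_ug = escape_rate(B_plus, k_T, k_R)    # up-gradient along the LSW, ~0.91
rsw_dg = escape_rate(B_minus, k_T, k_R)   # down-gradient along the RSW, ~0.82

# lsw_ug - rsw_dg == (B_minus - B_plus) * (k_R - k_T) > 0
```

If instead k<sub>R</sub> < k<sub>T</sub>, the same expression would reverse the ordering, so the observation hinges on the run state being the less stable one near the wall.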

      Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Authors should be commended for the availability of data/code and detailed methods. Clarity is good. Authors have clearly spent a lot of time thinking about the challenges of metabolomics data analysis.

      Significance

      Schmidt et al. present MetaProViz, a comprehensive and modular platform for metabolomics data analysis. The tool provides a full suite of processing capabilities spanning metabolite annotation, quality control, normalization, differential analysis, integration of prior knowledge, functional enrichment, and visualization. The authors also include example datasets, primarily from renal cancer studies, to demonstrate the functionality of the pipeline. The MetaProViz framework addresses several long-standing challenges in metabolomics data analysis, particularly issues of reproducibility, ambiguous metabolite annotation, and the integration of metabolite features with pathway knowledge. The platform is likely to be a valuable addition for the community, but the reviewer has some comments that need to be addressed prior to publication.

      We thank the reviewer for this positive feedback.

      Comments:

      (1) (Planned)

      The section "Improving the connection between prior knowledge and metabolomics features" could benefit from additional clarification. It is not entirely clear to the reader what specific steps were taken beyond using RaMP-DB to translate metabolite identifiers. For example, how exactly were ambiguous mappings ("different scenarios") handled in practice, and to what extent does this process "fix" or merely flag inconsistencies? A more explicit description or example of how MetaProViz resolves these cases would help readers better understand the improvements claimed.

      We thank the reviewer for pointing this out and we agree that this section requires extension to ensure clarity. Beyond using RaMP-DB, we characterise the mapping ambiguity (one-to-none, one-to-many, many-to-one, many-to-many) within and across metabolite-sets (i.e. pathways) and return this information to the user together with the translated identifiers. This is important for understanding the potential inflation or deflation of metabolite-sets caused by the translation. Moreover, we also offer a manually curated amino-acid collection to ensure that L-, D- and chirality-unspecified zwitterion IDs are assigned for amino acids (Fig. 2b). Ambiguous mappings are handled based on the measured data (Fig. 2e). Indeed, many translation cases that deflate (many-to-one mapping) or inflate (one-to-many mapping) the metabolite-sets are resolved when merging the prior knowledge with the actual measured data (e.g. Fig. 2e, one-to-many in scenario 1, which becomes obsolete as only one or none of the many potential metabolite IDs is detected). By sorting each mapping into one of these scenarios, we flag the remaining problematic cases rather than resolving them automatically. The reason for this decision is that in many cases multiple decisions are valid (e.g. Fig. 2e, scenario 5: here the values of the two detected metabolites could be summed, or the metabolite with the larger Log2FC could be kept), and the choice should be left to the user, based on their knowledge of the biological system and the analytical LC-MS method used.

      Since these points have not been clear enough, we will add a more explicit description to the results section by showcasing in more detail how exactly we tackled this problem in the ccRCC example data. This has also been suggested by Reviewer 3 (Minor Comments 7 and 8); see also our responses below.
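To illustrate the kind of bookkeeping described above, the sketch below labels each original ID by its mapping type. It is a hypothetical example, not MetaProViz's implementation or API: the function name, the dict-of-sets input, and the rule used for "many-to-many" (several targets, at least one of which is shared with another original ID) are assumptions. Applying it to the IDs of one pathway gives within-pathway labels; applying it to the pooled IDs of all pathways also exposes the across-pathway cases.

```python
from collections import defaultdict


def classify_mappings(mapping):
    """Label each original ID by how it maps onto the translated database.

    mapping: dict of original ID -> set of translated IDs (hypothetical format).
    """
    # Count how many original IDs point at each translated ID.
    sources_per_target = defaultdict(int)
    for targets in mapping.values():
        for target in targets:
            sources_per_target[target] += 1

    labels = {}
    for original, targets in mapping.items():
        if not targets:
            labels[original] = "one-to-none"      # deflates the metabolite-set
            continue
        many_targets = len(targets) > 1
        shared_target = any(sources_per_target[t] > 1 for t in targets)
        if many_targets and shared_target:
            labels[original] = "many-to-many"
        elif many_targets:
            labels[original] = "one-to-many"      # may inflate the metabolite-set
        elif shared_target:
            labels[original] = "many-to-one"      # may deflate the metabolite-set
        else:
            labels[original] = "one-to-one"
    return labels


# Toy example modeled on the L-/D-Alanine case; names are placeholders, not real IDs.
toy = {
    "L-Alanine": {"Alanine zwitterion"},
    "D-Alanine": {"Alanine zwitterion"},
    "unmapped_metabolite": set(),
    "ambiguous_metabolite": {"candidate_A", "candidate_B"},
}
labels = classify_mappings(toy)
```

Here both alanine entries are labeled many-to-one, reproducing the Pathway 1 case in Fig. 2d; whether such a label matters in practice depends on which translated IDs are actually detected, which is the step Fig. 2e addresses.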

      (2) (Planned)

      The introduction of MetSigDB is intriguing, but its construction and added value are not sufficiently described. It would be helpful to clarify what specific advantages MetSigDB provides over directly using existing pathway resources such as KEGG, Reactome, or WikiPathways. For example, how many features, interactions, or metabolite-set relationships are included, and in what way are these pathways improved or extended compared to those already available in public databases?

      We thank the reviewer for this valuable comment and we apologise that this was not described sufficiently. One of the major advantages is that all resources are available in one place in the same table format, without the need to visit the different original resources and perform data wrangling prior to enrichment analysis. In addition, where applicable, we have removed metabolites that are not detectable by LC-MS (e.g. ions, H2O, CO2) to avoid inflating pathways with features that never appear in the data and would therefore bias the statistical testing in enrichment analysis workflows.

      During the revision, we will compile an Extended Data Table listing all the resources present in MetSigDB, their number of features and interactions. We will also extend the methods section "Prior Knowledge access" about MetSigDB and how we removed metabolites.

      (3)

      Figure 1D/1E: The reviewer appreciates the inclusion of the visualizations illustrating the different mapping scenarios, as these effectively convey the complexity of metabolite ID translation. However, it took some time to interpret what each scenario represented. It would be helpful to include brief annotations or explanatory text directly on the figures to clarify what each scenario depicts and how it relates to the underlying issue being addressed.

      We think the reviewer refers to Fig. 2D/E and we acknowledge that this is a complex problem we are trying to convey. We received a similar comment from Reviewer 2 (Minor Comment 1), who asked us to extend the figure legend description of what the different scenarios display.

      We have extended the figure legend and specifically explained each displayed case and its meaning (lines 222-242):

      "d-e) Schematics of possible mapping cases between metabolite IDs (each circle corresponds to one ID) of a pathway-metabolite set (e.g. KEGG) and metabolite IDs of a different database (e.g. HMDB), with (d) showing many-to-many mappings that can occur within and across pathway-metabolite sets and (e) additionally showing the mapping to metabolite IDs that were assigned to the detected peaks within and across pathway-metabolite sets. (d) Translating the metabolite IDs of a pathway-metabolite set can lead to special cases such as many-to-one mappings (Pathway 1), where, for example, the original resource used the IDs for L-Alanine (Pathway 1, green) and D-Alanine (Pathway 1, yellow) in the amino-acid pathway, whilst the translated resource only has an entry for Alanine zwitterion (Pathway 1, blue). Additionally, many-to-one mappings can also occur across pathways (Pathways 2-4), where the mapping is only detected when mappings are analysed taking all pathways into account. Both of these cases deflate the pathways, which can also happen for one-to-none mappings (Pathway 1, white). There are also cases that inflate the pathway, such as one-to-many mappings (e.g. Pathways 2-4, orange mapping to pink and violet). (e) Showcasing the different scenarios when merging measured data (detected) based on the translated metabolites within pathways (scenarios 1-5) and across pathways (scenarios 6-8), highlighting problematic scenarios (4-7) that require further action. Unproblematic scenarios (1-3 and 8) can include special cases between original and translated IDs (e.g. one-to-many in scenario 1), which become obsolete as only one or none of the many potential metabolite IDs is detected. Yet, if multiple metabolites are detected, action is required (scenario 5), which can include summing the multiple detected features or keeping only the one with the highest Log2FC between two conditions. Other special cases between original and translated IDs (e.g. many-to-one in scenarios 4 and 6) also depend on what has been mapped to the measured features. If features have been measured in those scenarios, pathway deflation (i.e. only one original entry remains) or measured-feature duplication (the same measurement is mapped to many features in the prior knowledge) are the possible results within and across pathways. These scenarios should be addressed on a case-by-case basis, as they also require biological information to be taken into account."

      We have also rearranged the Scenarios in Fig. 2e. We hope that together with the extended figure legend this is now clear.

      (4) (Planned)

      "By assigning other potential metabolite IDs and by translating between the present ID types, we not only increase the number of features within all ID types but also increase the feature space with HMDB and KEGG IDs (Fig. 2a, right, SFig. 2 and Supplementary Table 1)". The reviewer would appreciate additional clarification on how this was done. It is not clear what specific steps or criteria were used to assign additional metabolite IDs or to translate between identifier types. The reviewer also appreciates the inclusion of the UpSet plots. However, simply having the plots side-by-side makes it difficult to determine the specific differences. An alternative visualization, such as stacked bar plots, scatter plots summarizing the changes in feature counts, or other representation that more clearly highlights the deltas, might make these results easier to interpret.

      The main Fig. 2a shows the original (left) metabolite ID availability per detected metabolite feature in the ccRCC data and the adapted (right) metabolite IDs. The individual steps taken to extend the metabolite ID coverage of the measured features and obtain Fig. 2a (right) are shown in SFig. 2 for HMDB (SFig. 2a) and KEGG (SFig. 2b). We did not include the plots for the PubChem IDs, as they follow the same principle. The individual steps shown in SFig. 2 are: (I) how many of the detected features (577) have an HMDB ID (341, red bar + grey bar); (II) how this distribution changed after equivalent amino-acid IDs were added, which does not change the number of features with an HMDB ID but does change the number of features with a single HMDB ID; and (III) how this distribution changed after translating from the other available ID types (KEGG and PubChem) to HMDB IDs using RaMP-DB's knowledge, which leads to 430 detected features with one or multiple HMDB IDs. The exact numbers can be extracted from Supplementary Table 1, sheet "Feature metadata", where, for example, N-methylglutamate had no HMDB ID assigned in the original publication (see column HMDB_Original), yet by translating to HMDB from KEGG (see column hmdb_from_kegg) and from PubChem (see column hmdb_from_pubchem) we obtain in both cases the same HMDB ID, "HMDB0062660". In order to clarify this in the manuscript, we have extended the figure legend of SFig. 2: "a-b) Bar graphs showing the frequency at which a certain number of metabolite IDs per integrated peak is available according to the ccRCC patient feature metadata provided in the original publication (left), after potential equivalent IDs for amino-acid and amino-acid-related features were assigned (middle), which increases the number of features with multiple IDs (middle: grey bars), and after IDs were translated from the other available ID types (right). a) Of 577 detected features, 341 had at least one HMDB ID assigned (left graph, red + grey bar) according to the original publication. Translating from KEGG-to-HMDB and from PubChem-to-HMDB increased the number of features with an HMDB ID from 341 to 430 (right). b) Of 577 detected features, 306 had at least one KEGG ID assigned (left graph, red + grey bar) according to the original publication. Translating from HMDB-to-KEGG and from PubChem-to-KEGG did not increase the total number of features with a KEGG ID (right)."

      We like the suggestion of the reviewer to provide representations of the deltas and will add additional plots to SFig. 2 as part of our planned revision.
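Step (III) amounts to counting, per feature, whether any ID is available once the translated columns are pooled with the original one. The sketch below assumes each feature is a dict whose keys reuse the column names of Supplementary Table 1 (HMDB_Original, hmdb_from_kegg, hmdb_from_pubchem) with values stored as sets of IDs; the storage format, the "Metabolite" key, and the two placeholder features are hypothetical, and only the N-methylglutamate entry reflects the example given above.

```python
ORIGINAL = ["HMDB_Original"]
WITH_TRANSLATION = ["HMDB_Original", "hmdb_from_kegg", "hmdb_from_pubchem"]


def has_any_id(feature, columns):
    """True if at least one of the given columns holds a non-empty ID set."""
    return any(feature.get(col) for col in columns)


def merged_ids(feature, columns):
    """Union of all IDs found in the given columns."""
    merged = set()
    for col in columns:
        merged |= feature.get(col, set())
    return merged


features = [
    {"Metabolite": "N-methylglutamate", "HMDB_Original": set(),
     "hmdb_from_kegg": {"HMDB0062660"}, "hmdb_from_pubchem": {"HMDB0062660"}},
    # Placeholder features for illustration only.
    {"Metabolite": "feature_with_id", "HMDB_Original": {"HMDB_placeholder"},
     "hmdb_from_kegg": set(), "hmdb_from_pubchem": set()},
    {"Metabolite": "feature_without_id", "HMDB_Original": set(),
     "hmdb_from_kegg": set(), "hmdb_from_pubchem": set()},
]

before = sum(has_any_id(f, ORIGINAL) for f in features)
after = sum(has_any_id(f, WITH_TRANSLATION) for f in features)
```

Taking the union rather than concatenating means that agreeing translations, as for N-methylglutamate, yield a single ID, whereas disagreeing ones would surface as a feature with multiple candidate IDs (the grey bars in SFig. 2).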

      (5) (Planned)

      MetaboAnalyst is mentioned several times in the manuscript. The reviewer is familiar with some of the limitations and practical challenges associated with using MetaboAnalyst and its R package. Given that MetaboAnalyst already offers some overlapping functionality with MetaProViz (and offers it in the form of an interactive website and a sometimes functional R package), a more explicit comparison between the two tools would help readers fully understand the unique advantages and improvements provided by MetaProViz.

      This is a good point the reviewer raises. As part of the revisions, we plan to create a supplementary data table that includes both tools and their respective features. We will refer to this table within the manuscript text.

      (6)

      Page 11: The authors state that they used limma for statistical testing, including for the analysis of exometabolomics data, where the values appear to represent log2-transformed distances or ratios rather than normally distributed intensities. Since limma assumes approximately normal residuals, please provide evidence or justification that this assumption holds for these data types. If the distributions deviate substantially from normality, a non-parametric alternative might be more appropriate.

      For exometabolomics data, we use data normalised to the media blank and growth factor (formula (1)). Limma is performed on these data, not on the log2-transformed distances. The Log2(Distance) is calculated separately from the statistical results using the normalised exometabolomics data. In addition, we always perform the Shapiro-Wilk test on each metabolite as part of the MetaProViz differential analysis function to assess the distribution. In this particular case, we obtain the following distributions:

      Cell line    Normally distributed metabolites [%]    Non-normally distributed metabolites [%]
      HK2          82.35                                   17.65
      786-O        95.71                                   4.29
      786-M1A      97.14                                   2.86
      786-M2A      88.57                                   11.43
      OSRC2        92.86                                   7.14
      OSLM1B       85.71                                   14.29
      RFX631       97.14                                   2.86

      If a user's data deviate substantially from normality, non-parametric alternatives are also available in MetaProViz (see the Methods section for all options).

      (7)

      Page 13: why were young and old defined this way? Authors should provide their reasoning and/or citations for this grouping.

      We thank the reviewer for pointing this out. Our choice of age groups is based purely on the literature:

      First, ccRCC can be sporadic (>96%) or familial (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308682/pdf/nihms362390.pdf). This was also observed in other cohorts: of 1233 patients, only 93 (7.5%) were under 40 years of age, whilst 1140 (92.5%) were older than 40 years (https://www.europeanurology.com/article/S0302-2838(06)01316-9/fulltext). Second, given the high frequency of sporadic cases, it is unsurprising that ccRCC incidence was found to peak in patients aged 60 to 79 years, with more male than female cases (https://journals.lww.com/md-journal/Fulltext/2019/08020/Frequency,_incidence_and_survival_outcomes_of.49.aspx). Third, it was shown that sex affects renal cancer-specific mortality and that this effect is modified by age, which serves as a proxy for hormonal status, with the premenopausal period below 42 years and the postmenopausal period above 58 years (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4361860/pdf/srep09160.pdf). Putting all of this information together, we defined our age groups as young (<42 years) and old (>58 years), following these hormonal periods in order to account for the impact of sex. Additionally, our young age group is representative of the age of familial ccRCC, whilst our old age group covers the ages at which incidence was found to peak.

      To make this clear in the manuscript, we have extended the Methods section (Lines 547-548):

      "For the patients' ccRCC data, we compared tumour versus normal tissue in two patient subsets, "young" (<42 years) and "old" (>58 years)."

      (8)

      Figure 4e: It may help with interpretation to have these Sankey-like graph edges be proportional to the number of metabolites.

      We thank the reviewer for this suggestion, which we also considered. When we tested this visualisation, the plot became convoluted and hard to interpret, and not all potential flows exist in the data. We have therefore opted to create an overview graph of all potential flows, with each edge representing a potentially existing flow. The number of times each flow occurs is shown in Fig. 4f.

      (9)

      Figure 4h: The values appear to be on an intensity scale (e.g., on the order of 3e10), yet some of them are negative, which would not be expected for raw or log-transformed mass spectrometry intensities. It is unclear whether these represent normalized abundance values, distances, or some other transformation. In addition, for the comparison of tumour versus normal tissue, it is not specified what statistical test was applied. Since mass spectrometry data are typically log2-transformed to approximate a log-normal distribution before performing t-tests or similar parametric methods, clarification is needed on how these data were processed.

      Thanks for pointing this out; it made us realize that the legend of Fig. 4h needed to be extended for clarity (Lines 343-345). In both cases we show normalised intensities following the workflow described in Fig. 3a. The left graph, labelled "CoRe", shows an exometabolomics experiment; these data were additionally normalised using both media blanks (samples in which no cells were cultured) and the growth factor (accounts for cell growth during the experiment), as the growth rate (accounts for variations in cell proliferation) was not available (see also formula (1) in the Methods section). A value is negative if the metabolite has been consumed from the media and positive if the metabolite has been released from the cell into the culture media.

      In addition, the reviewer refers to the comparison of tumour versus normal (Fig. 4a and 4d) and the missing description of the chosen statistical test. We have added the details to the figure legend (Lines 334 and 345).

      Adapted legend Fig. 4: "a) Differential metabolite analysis results for exometabolomics data comparing 786-O versus HK2 cells using ANOVA and false discovery rate (FDR) for p-value adjustment. b) Heatmap of mean consumption-release of the measured metabolites across cell lines. c) Heatmap of normalised ccRCC cell line exometabolomics data for the selected metabolites of amino acid metabolism for a sample subset. d) Differential metabolite analysis results for intracellular data comparing 786-O versus HK2 cells using ANOVA and false discovery rate (FDR) for p-value adjustment. e) Schematics of the bioRCM process to integrate exometabolomics with intracellular metabolomics and f) number of metabolites by their combined change patterns in intracellular and exometabolomics data in 786-M1A versus HK2. g) Heatmap of the metabolite abundances in the "Both_DOWN (Released/Consumed)" cluster. h) Bar graphs of normalised methionine intensity for exometabolomics (CoRe: negative value if the metabolite has been consumed from the media, positive value if the metabolite has been released from the cell into the culture media) and intracellular metabolomics (Intra)."


      (10)

      Figure 5: "Tukey's p.adj …"

      We thank the reviewer for pointing this out. We have used the TukeyHSD (Tukey's Honestly Significant Difference) test in R on the ANOVA results and have added more details to the figure legend (Line 384): "(Tukey's post-hoc test after ANOVA p.adj …"

      (11)

      The potential for multi-omics is mentioned. Please clarify how generalizable this framework is. Can it readily accommodate transcriptomics, proteomics, or fluxomics data, or does it require custom logic or formatting for each new data type?

      Thanks for raising this question. MetaProViz can readily accommodate transcriptomics and proteomics data for combined enrichment analysis using, for example, MetalinksDB metabolite-receptor pairs. However, MetaProViz does not support modelling fluxomics data into metabolic networks. We state in the Discussion that this could be a future development ("Beyond current capabilities, future developments could also incorporate mechanistic modeling to capture metabolic fluxes, subcellular compartmentalization, enzyme kinetics, regulatory feedback loops, and thermodynamic constraints to dissect metabolic response under perturbations."). To clarify the availability of multi-omics integration for combined enrichment analysis, we have added more details to the Discussion section.

      Line 467-469: "In addition, providing knowledge of receptor-, transporter- and enzyme-metabolite pairs, MetaProViz can readily accommodate transcriptomics and proteomics data for combined enrichment analysis."

      (12)

      Please clarify if/how enrichment analyses account for varying set sizes and redundant metabolite memberships across pathways, which can bias over-representation analysis results.

      This is a very relevant point, which we have already been working on. Indeed, we agree that results from enrichment analyses can be biased by varying set sizes and redundant metabolite memberships across pathways. MetaProViz explicitly accounts for varying set sizes when running over-representation analysis (functions standard_ora() and cluster_ora()), which computes p-values under a hypergeometric distribution. Thereby, larger pathways are penalized unless the overlap is proportionally large, while smaller pathways can reach significance with fewer overlapping metabolites. Hence, the test quantifies whether the observed overlap between the query set and a pathway is larger than would be expected under random sampling. In addition, we explicitly filter by set size using min_gssize/max_gssize, which further controls for extremely small or large sets. Thus, both the statistical test itself and the size filters account for variation in set size.
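      To make the size correction concrete, below is a minimal sketch in Python (not the R implementation behind standard_ora()) of a one-sided hypergeometric over-representation test with a set-size filter. The function names ora_pvalue and run_ora, the default size bounds and the fold-enrichment definition (observed hit fraction over background fraction) are illustrative assumptions.

```python
from math import comb


def ora_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: size of the universe (all measured metabolites)
    K: members of the metabolite set within the universe
    n: size of the query (e.g. significantly changed metabolites)
    k: observed overlap between query and set
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def run_ora(query, universe, sets, min_size=5, max_size=500):
    """Return {set_name: (overlap, fold_enrichment, p_value)}.

    Sets whose size inside the universe falls outside [min_size, max_size]
    are skipped (hypothetical defaults, illustrating a min/max size filter).
    """
    universe = set(universe)
    query = set(query) & universe
    N, n = len(universe), len(query)
    results = {}
    for name, members in sets.items():
        members = set(members) & universe
        K = len(members)
        if not min_size <= K <= max_size:
            continue  # too small or too large: excluded before testing
        k = len(query & members)
        # Fold enrichment: fraction of the query in the set vs. the set's share of the universe.
        fold = (k / n) / (K / N) if n and K else 0.0
        results[name] = (k, fold, ora_pvalue(k, n, K, N))
    return results
```

      With a universe of 100 metabolites and a query of 8, a 10-member set containing all 8 query metabolites gets a fold enrichment of 10 and a far smaller p-value than a 60-member set with the same 8 hits, while a 2-member set is dropped by the size filter.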

      Regarding redundant metabolite-set (i.e. pathway) memberships, we have now implemented a new function (cluster_pk()) to cluster metabolite sets such as pathways based on overlapping metabolites. This allows enrichment results to be investigated with regard to redundancy and similarity. For given metabolite sets, the function calculates pathway similarities via either overlap- or correlation-based metrics. After optional thresholding to remove weak similarities, one of three implemented clustering algorithms (connected-components clustering, Louvain community detection or hierarchical clustering) groups similar pathways. We then visualize the clustering results as a network graph using the new function viz_graph(), which is based on igraph. We have added all information to our Methods section "Metabolite-set clustering" (Lines 656-671). In addition, we have added the clustering results to Fig. 5f.

      New Fig. 5f: "f) Network graph of top enriched pathways (p.adjusted …
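      As an illustration of the overlap-based route only, the following Python sketch assumes the overlap coefficient as the similarity metric and connected components as the clustering step; the correlation-based metric, Louvain and hierarchical options are not shown. The name cluster_sets and the default threshold of 0.5 are hypothetical and do not reflect the cluster_pk() interface.

```python
from itertools import combinations


def overlap_coefficient(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_sets(sets: dict, threshold: float = 0.5) -> list:
    """Link metabolite sets whose overlap coefficient is >= threshold and
    return the connected components as sorted lists of set names."""
    names = sorted(sets)
    parent = {name: name for name in names}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Edges below the threshold are dropped (the optional thresholding step).
    for a, b in combinations(names, 2):
        if overlap_coefficient(set(sets[a]), set(sets[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

      Because connected components is transitive, a chain of moderately similar pathways ends up in one cluster even if its two ends share no metabolites; community detection methods such as Louvain are one way to split such chains.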

      Reviewer #2

      Evidence, reproducibility and clarity

      Schmidt et al report the development of MetaProViz, an integrated R package to process, analyze and visualize metabolomics data, including integration with prior knowledge. The authors then go on to demonstrate utility by analyzing several metabolomes of cell lines, media and patient samples from kidney cancer. The manuscript provides a concise description of key challenges in metabolomics that the authors identify and address in their software. The examples are helpful and illustrative, although I should point out that I lack the expertise to evaluate the R package itself. I only have a few very minor comments.

      Significance

      This is a very significant advance from one of the leading groups in the field that is likely to enhance metabolomics data analysis in the wider community.

      We thank the reviewer for this positive feedback on our package. We appreciate that there are no major comments from the reviewer.

      Minor comments:

      (1)

      Figure 2D, E: While the schematics are fairly intuitive, a brief figure legend description of what the different scenarios etc. represent would make this easier to grasp.

      We thank the reviewer for pointing this out, and we acknowledge that this is a complex problem to convey. We received a similar comment from Reviewer 1 (Comment 3), so please see the extensive response there. In brief, we have extended the figure legend, specifically explaining each displayed case and its meaning (Lines 222-242), and extended the figure itself by adding categories to Fig. 2e.

      Extended legend Fig. 2d-e: "d-e) Schematics of possible mapping cases between metabolite IDs (each circle corresponds to one ID) of a pathway-metabolite set (e.g. KEGG) and metabolite IDs of a different database (e.g. HMDB), with (d) showing many-to-many mappings that can occur within and across pathway-metabolite sets and (e) additionally showing the mapping to metabolite IDs that were assigned to the detected peaks within and across pathway-metabolite sets. (d) Translating the metabolite IDs of a pathway-metabolite set can lead to special cases such as many-to-one mappings (Pathway 1), where, for example, the original resource used the IDs for L-Alanine (Pathway 1, green) and D-Alanine (Pathway 1, yellow) in the amino acid pathway, whilst the translated resource only has an entry for Alanine zwitterion (Pathway 1, blue). Additionally, many-to-one mappings can also occur across pathways (Pathway 2-4); such mappings are only detected when all pathways are analysed together. Both of these cases deflate the pathways, which can also happen with one-to-none mappings (Pathway 1, white). Other cases inflate the pathway, such as one-to-many mappings (e.g. Pathway 2-4, orange mapping to pink and violet). (e) Scenarios when merging measured data (detected) based on the translated metabolites within pathways (scenarios 1-5) and across pathways (scenarios 6-8), highlighting problematic scenarios (4-7) that require further action. Unproblematic scenarios (1-3 and 8) can include special cases between original and translated IDs (i.e. one-to-many in scenario 1), which become unproblematic as only one or none of the many potential metabolite IDs is detected. Yet, if multiple metabolites are detected, action is required (scenario 5), which can include summing the multiple detected features or keeping only the one with the highest Log2FC between two conditions. Other special cases between original and translated IDs (i.e. many-to-one in scenarios 4 and 6) also depend on what has been mapped to the measured features. If features have been measured in those scenarios, pathway deflation (i.e. only one original entry remains) or measured-feature duplication (the same measurement is mapped to many features in the prior knowledge) are the possible results within and across pathways. These scenarios should be addressed on a case-by-case basis, as they also require biological information to be taken into account."

      (2) Fig. 4: The authors briefly state that they integrate prior knowledge to identify the changes in methionine metabolism in kidney cancer, but it is not clear how exactly they contribute to this conclusion. It could be helpful to expand a bit on this to better illustrate how MetaProViz can be used to integrate prior knowledge into the analysis workflow.

      We think the reviewer refers to this section in the text (Line 363-370):

      "Next, we focused on the cluster "Both_DOWN (Released-Consumed)" and found that several amino acids are consumed by the ccRCC cell line 786-M1A but released by healthy HK2 cells. At the same time, intracellular levels are significantly lower than in HK2 (Log2FC = -0.9, p.adj = 4.4e-5) (Fig. 4g). To explore the role of these metabolites in signaling, we queried the prior knowledge resource MetalinksDB, which includes metabolite-receptor, metabolite-transporter and metabolite-enzyme relationships, for their known upstream and downstream protein interactors for the measured metabolites (Supplementary Table 5). This approach is especially valuable for exometabolomics, as it allows us to generate hypotheses about cell-cell communication. Notably, we identified links involving methionine (Fig. 4h), enzymes such as BHMT, and transporters such as SLC43A2 that were previously shown to be important in ccRCC25,42 (Supplementary Table 5)."

      We have now extended this part to clearly state that MetalinksDB is the prior knowledge resource we used to identify the links for methionine (Lines 363-364). In addition, we have extended our summary statement to make clear to the reader that we combine the biological regulated clustering, which revealed the amino acid changes, with prior knowledge for mechanistic insight (Lines 380-381):

      "In summary, calculating consumption-release and combining it with intracellular metabolomics via biological regulated clustering reveals metabolites of interest. Further combining these results with prior knowledge using the MetaProViz toolkit facilitates biological interpretation of the data."

      (3)

      Given the functional diversity among metabolites -central to diverse pathways, are key signaling molecules, restricted functions, co-variation within a pathway - I wonder how informative approaches such as PCA or enrichment analyses are for identifying metabolic drivers of a (patho)physiological state. To some extent, this can be addressed by integrating prior knowledge, and it would be helpful if the authors could comment on (and if applicable explain) whether/how this is integrated into MetaProViz.

      The reviewer is correct about the functional diversity of metabolites, which is also why prior knowledge is needed to add mechanistic interpretation to the findings from the metadata analysis (as we showcased by focusing on the separation by age (Fig. 5c-d)). We think that approaches such as PCA or enrichment analysis can be helpful, even if admittedly limited. For example, in the metadata analysis presented in Fig. 5b and the subsequent enrichment analysis presented in Fig. 5, we used PCA to extract the eigenvectors and loadings, which act as weights indicating the contribution of each original metabolite to the separation along a given principal component. Hence, the PCA eigenvectors show which metabolites drive the separation. This does not necessarily mean that those metabolites are drivers of a (patho)physiological state; the (patho)physiological state can equally be the reason why those metabolites drive the separation along the eigenvectors. Thus, the metadata analysis presented in Fig. 5b enables us to extract the metadata variables (e.g. (patho)physiological states) that separate along a PC, together with the variance explained. This can also reveal co-variation, when multiple (patho)physiological states are separated on the same PC, as the reviewer correctly points out. Regarding the enrichment analysis, we provide different types of prior knowledge for classical mapping, as well as the prior knowledge we used to create the biological regulated clustering; together, these help to identify key metabolic groups, as we can first cluster the metabolites and then perform functional enrichment. Yet, this does not account for the technical issues of enrichment analysis. In this context, multi-omics integration building metabolic-centric networks could further elucidate the diversity of metabolic pathways, their connection to signalling and co-variation, yet this is beyond the scope of MetaProViz. To sum up, we are aware of the limitations of this analysis and the constraints on the downstream interpretation.

      To capture the functional diversity amongst metabolites, which leads to metabolites being present in multiple pathways of metabolite-pathway sets, we have implemented a new function to cluster metabolite sets such as pathways based on overlapping metabolites and to visualize redundant metabolite-set (i.e. pathway) memberships (Fig. 5f). For more details, see also our response to Reviewer 1, Comment 12. We hope this will help prevent misinterpretation and over-interpretation of the enrichment results.

      In addition, we have extended the text to explicitly include the analysis pitfalls (Lines 416-419): "Another variable explaining the same amount of variance in PC1 is the tumour stage, which could point to metabolic rewiring of adjacent normal tissue in relation to stage, and showcases that biological data harbour co-variations that cannot be disentangled by this method."

      Reviewer #3

      Evidence, reproducibility and clarity

      This manuscript introduces an R package, MetaProViz, for metabolomics data analysis (post annotation), aiming to solve a poor-analysis-choices problem and enable more people to do the analysis. MetaProViz not only guides people to select the best statistical method, but also makes it possible to solve previously unsolved problems: e.g. multiple and variable metabolite names in different databases and their connections to prior knowledge. They also created exometabolomics analysis and the needed steps to visualise intra-cell / media processes. The authors demonstrated their new package via a kidney cancer (clear-cell renal cell carcinoma) dataset, moving one step closer to improving the biological interpretability of omics data analysis.

      Significance

      This is a great tool and I can't wait to use it on many upcoming metabolomics projects! Authors tackle multiple ongoing issues within the field: from poor selection of statistical methods (they provide guidance or have default safer options) to the messiness of data annotation between databases and improving data interpretability. The field is still evolving quickly, and it's impossible to solve all problems with one package; thus some limitations within the package could be seen as a bit rigid. Nonetheless, this fully steps toward filling an existing methodological gap. All bioinformaticians doing metabolomic analysis, or those learning how to do it, will greatly benefit from this knowledge.

      I myself lead a team of 6 bioinformaticians, and we do analysis for researchers, clinicians, drug discovery, and various companies. We run internal metabolomics pipelines every day and fully sympathise with the problems addressed by the authors.

      Major comments affecting conclusions

      none.

      We thank the reviewer for this positive feedback on the evidence, reproducibility and clarity, as well as the significance of our work, given the reviewer's experience with metabolomics data analysis. We appreciate that there are no major comments from the reviewer.

      Minor comments

      Minor comments: important issues that could be addressed and could improve the clarity or general presentation of the tool. Please see all below.

      (1)

      1- You start with separating and talking about metabolomics and lipidomics, but lipidomics quickly disappears (especially beyond the abstract/intro), so there is no real need to discuss lipidomics.

      Thanks, that's a good note and we have removed it from the abstract and introduction.

      (2)

      2- You refer to the MetImp4 imputation web tool, but I cannot find an active website, manuscript, or R package for it, and the cited link does not load. This raises doubts about whether the tool is currently usable. Additionally, imputation choice should be guided by biological context and study design, not just by testing a few methods and selecting the one that performs best.

      We fully agree with the reviewer on imputation handling. The manuscript we cite from Wei et al. (https://doi.org/10.1038/s41598-017-19120-0) compared a multitude of missing value imputation methods and made this comparison strategy available as a web-based tool rather than as a code-based package such as an R package. Yet, the reviewer is right that the web tool is no longer reachable. Hence, we have adapted the statement in our introduction (Lines 61-62): "Moreover, there are tools that focus on specific steps of the pre-processing of feature intensities, which encompasses feature selection, missing value imputation (MVI)9 and data normalisation. For example, MetImp4 is a web-tool that includes and compares multiple MVI methods9."

      (3)

      3- The authors address key metabolomics issues such as ambiguous metabolite names and isoforms, and their focus on resolving mapping ambiguities and translating between database identifiers is highly valuable. However, the larger challenge of de novo identification and the "dark matter" of unannotated metabolites remains unresolved (initiatives as MassIVE might help in the future https://massive.ucsd.edu/ProteoSAFe/ ), and readers may benefit from clearer acknowledgement that MetaProViz does not operate on raw spectral data. The introduction currently emphasizes annotation, but since MetaProViz requires already annotated metabolite tables (and then deals with all the messiness), this space might be better used to frame the interpretability and pathway-analysis challenges that the tool directly addresses.

      We appreciate the comment and have highlighted this in the abstract and introduction: "MetaProViz operates on annotated intensity values..." (Line 29 and 88).

      Given the newest advancements in metabolite identification using AI-based methods, the MetaProViz toolkit, with its focus on connecting metabolite IDs to prior knowledge, becomes increasingly valuable. We added this to our Discussion (Lines 484-488): "Given the imminent shift in metabolite identification through AI-based approaches, including language model-guided48 methods and self-supervised learning49, the growing number of identified metabolites will make the MetaProViz toolkit increasingly valuable for the community to gain functional insights."

      Regarding the introduction, where we mention some tools for peak annotation: we included this paragraph to set the basis by (I) listing the different steps of metabolomics data analysis and (II) pointing to well-known tools for each of those steps. We also have a dedicated paragraph on pathway-analysis challenges.

      (4)

      4- I also really enjoyed you touching on the point of user-friendly but then inflexible and problem of reproducibility. We truly need well working packages for other bioinformaticians, rather than expecting wet-lab scientists to do all the analysis within the user interface.

      We thank the reviewer for this positive feedback.

      (5)

      5- It would be helpful to explain why the authors chose cancer/RCC samples for the demonstration. Was it because the dataset included both media and cell measurements? Does the tool perform best when multiple layers of information are available from the same experiment?

      We specifically chose the ccRCC cell line data as an example because both media (exometabolomics) and intracellular metabolomics measurements were available for a multitude of cell lines. The combination of both data types is only used in the biological regulated clustering (Fig. 4e-g); all other analyses do not require additional data modalities. We have not specifically tested how performance differs in this particular case, as this would require multiple paired datasets (exometabolomics and intracellular metabolomics) collected both at the same time and at different times.

      (6)

      6- Figure 2B: The upset plots effectively show increased overlap after adaptation, but it would be easier to compare changes if the order of the intersection bars in the "adapted" plot matched the original. For example, while total intersections increased (251→285), the PubChem+KEGG overlap decreased (24→5), likely due to reallocation to the full intersection.

      Thanks for raising this point. We initially ordered the bars by intersection size, but we agree with the reviewer that, for our point, it makes sense to fix the order in the adapted plot to match that of the original plot. We have done this (Fig. 2a) and also extended the figure legend text of SFig. 2, which shows the individually performed adaptations summarized in Fig. 2a.

      (7) (Planned)

      7- In your example of D-alanine and L-alanine - you mention how chirality is important biological feature, but up to this point it's not clear how do you do translation exactly and in which situations this would be treated just as "alanine" and when the more precise information would be retained? You mention RaMP-DB knowledge and one to X mappings as well as your general guidance in the "methods" part, but it would be useful to describe in this publication how you exactly tackled this problem in the ccRCC case.

      We thank the reviewer for this suggestion. Since this is a complex problem, we will add a more explicit description to the results section by showcasing more details on how we exactly tackled this problem in the ccRCC example data.

      Regarding D- and L-alanine, even though chirality is an important biological feature, in a standard experiment we cannot distinguish whether we detect the L- or the D-amino acid. This is why we try to assign all possible IDs to increase the overlap with the prior knowledge. In Fig. 2b we showcase that this can lead to the same measured feature being mapped to multiple pathways. For example, we may measure alanine, assign the PubChem IDs for L-Alanine, D-Alanine and Alanine, and try to map to metabolite sets that include both L-Alanine and D-Alanine. This could fall into Scenario 6 (Fig. 2e), where across pathways there is a D-Alanine-specific one (Pathway 1) and an L-Alanine-specific one (Pathway 2). We can then decide whether to allow both mappings (many-to-one) or to exclude D-Alanine because our biological system is human and should primarily contain L-Alanine.

      (8) (Planned)

      8- In one to many mappings, it would be interesting to see quantification how frequently it was happening within a pathway or across pathways. I.e. Would going into pathway analysis "solve" the issue of "lost in translation" or not really?

      We have quantified the frequency for the example of translating the KEGG metabolite-set into HMDB IDs (Fig. 2c, left panel). Yet, we are not showcasing the quantification across the KEGG metabolite-sets with this plot. During the revision we will add the full results available to the Extended Data Table 2, which currently only includes the results displayed in Fig.2c.
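      One way such frequencies could be tabulated is sketched below in Python: given a hypothetical table of (original ID, translated ID) pairs, each original ID is classified as one-to-none, one-to-one, one-to-many, many-to-one or many-to-many. Running it on the pairs of a single pathway gives within-pathway counts, while running it on the pairs of all pathways together exposes cross-pathway cases. The helper name and the placeholder IDs are illustrative assumptions, not MetaProViz functions.

```python
from collections import Counter, defaultdict


def classify_mappings(pairs, originals):
    """pairs: iterable of (original_id, translated_id);
    originals: all source IDs, including those without any translation.
    Returns a Counter of mapping categories, one count per original ID."""
    fwd = defaultdict(set)  # original ID -> translated IDs
    rev = defaultdict(set)  # translated ID -> original IDs
    for orig, trans in pairs:
        fwd[orig].add(trans)
        rev[trans].add(orig)

    counts = Counter()
    for orig in originals:
        targets = fwd.get(orig, set())
        if not targets:
            counts["one-to-none"] += 1  # deflates the set
            continue
        many_out = len(targets) > 1  # one original -> several translations
        many_in = any(len(rev[t]) > 1 for t in targets)  # shared translation
        if many_out and many_in:
            counts["many-to-many"] += 1
        elif many_out:
            counts["one-to-many"] += 1
        elif many_in:
            counts["many-to-one"] += 1
        else:
            counts["one-to-one"] += 1
    return counts
```

      For example, L- and D-alanine both translating to a single alanine entry count as two many-to-one cases, while one ID translating to two entries counts as one one-to-many case.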

      (9)

      9- QC: the coefficient of variation (CV) helps identify features with high variability and thus low detection accuracy. Here it's important to acknowledge that if the feature is very variable between groups it can be extremely important, but if the feature is very variable within the group - only then one would have low trust in the accuracy.

      Yes, we totally agree with the reviewer on this. For this reason, we have applied CV only in instances where this is not leading to any condition-driven CV differences, but is truly feature-focused: (1) Function pool_estimation performs CV on the pool samples only, which are a homogeneous mixture of all samples, and hence can be used to assess feature variability. (2) Function processing performs CV on exometabolomics media samples (=blanks), which are also not impacted by different conditions.
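      A minimal Python sketch of such a feature-focused CV filter on pool samples follows, assuming positive intensities and an illustrative 30% cutoff (an assumption, not a documented MetaProViz default); flag_variable_features is a hypothetical name, not the pool_estimation() interface.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Sample coefficient of variation in percent: 100 * sd / mean.
    Assumes positive intensities (mean > 0) and at least two values."""
    return 100.0 * stdev(values) / mean(values)


def flag_variable_features(pool_intensities: dict, max_cv: float = 30.0) -> dict:
    """pool_intensities maps feature -> intensities measured in the pool
    samples only. Returns {feature: cv} for features whose CV exceeds max_cv."""
    flagged = {}
    for feature, values in pool_intensities.items():
        cv = cv_percent(values)
        if cv > max_cv:
            flagged[feature] = cv
    return flagged
```

      Because pool samples are a homogeneous mixture of all samples, a high CV here reflects measurement variability rather than between-condition differences.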

      (10)

      10- Missing value imputation - while missing not at random is a great way to deal with missingness, it would be great to have options for others (not just MNAR), as missingness is of a complex nature. If a pretty strong decision has been made, it would be good to support this by some supplementary data (i.e. how results change while applying various combinations of missingness and why choosing MNAR seems to be the most robust).

      We have decided to only offer support for MNAR, since we would recommend MVI only if there is a biological basis for it.

      As mentioned in our response to your minor comment 2, Wei et al. (https://doi.org/10.1038/s41598-017-19120-0) compared a multitude of missing value imputation methods. They compared six imputation methods (i.e., QRILC, Half-minimum, Zero, RF, kNN, SVD) for MNAR and systematically measured their performance. They showed that QRILC and Half-minimum produced much smaller SOR values, showing consistently good performance on data with different numbers of missing variables. This is why we only provide Half-minimum.
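      For reference, half-minimum imputation can be sketched in Python as below, assuming missing values are encoded as None and the minimum is taken per feature across all samples; whether MetaProViz computes the minimum per feature, per condition or otherwise is not specified here, and half_min_impute is a hypothetical name.

```python
def half_min_impute(data: dict) -> dict:
    """data maps feature -> list of intensities, with None marking a missing
    value assumed to be missing not at random (below detection limit).
    Each None is replaced by half of that feature's smallest observed value."""
    imputed = {}
    for feature, values in data.items():
        observed = [v for v in values if v is not None]
        if not observed:
            imputed[feature] = list(values)  # nothing observed: leave as-is
            continue
        fill = min(observed) / 2
        imputed[feature] = [fill if v is None else v for v in values]
    return imputed
```

      Replacing missing values with a value below the lowest observation encodes the MNAR assumption that the metabolite was present but below the detection limit.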

      (11) (Planned)

      11- In the pre-processing and imputation stages - it would be interesting to see a summary table of how many features are left after each stage.

      This is a good suggestion and refers to the steps described in Fig. 3a. We will create an overview table for this, add it into the Extended Data Table and refer to it in the results section.

      (12)

      12- Is there a reason not to do UMAP or PLS-DA graphs for outlier detection? Doing more than PCA would help to have more confidence in removing or retaining outliers in the cases where biological relevance is borderline.

      We chose PCA because it is the standard companion to Hotelling's T² outlier testing. PCA is a linear dimensionality reduction technique that preserves the overall variance in the data and has a clear mathematical foundation linked to the covariance structure; it therefore fits the assumptions of Hotelling's T² test, which relies on the properties of the covariance matrix and assumes a multivariate Gaussian distribution. UMAP is a non-linear dimensionality reduction technique that prioritizes preserving local and global structure in a way that often yields good cluster visualizations, but it distorts distances between clusters and lacks the rigorous statistical underpinnings of PCA. As for PLS-DA, which maximizes the covariance between variables and class labels, one could, although this is not commonly done, take the optimal latent variables for discrimination and apply Hotelling's T² to them. Yet, PLS-DA is supervised and actively tries to separate data points in the latent space, which can be misleading for outlier detection; here, unbiased, unsupervised methods that preserve global variance, such as PCA, are advantageous.

      (13)

      13- Metadata vs metabolite features - can this be used beyond metabolomics (e.g. proteomics, transcriptomics, etc)? It can always be very useful when there are many metadata features and it's hard to pre-select beforehand which ones are the most biologically relevant.

      Yes, definitely. In fact, we have also applied the metadata analysis strategy to proteomics data, and it works equally well with any omics data type.

      (14)

      14- While authors discussed what KEGG pathways were significantly deregulated, it would be interesting to see all the pathways that were affected (e.g. aPEAR "bubble" graphs can show this (https://github.com/kerseviciute/aPEAR) , or something similar to NES scores). I appreciate the trickiness of it, but it would be quite interesting to see how authors e.g. Figure5e narrowed it down to the two pathways and how all the others looked like.

      We thank the reviewer for the suggestion of the aPEAR graphs. Following this suggestion, we have implemented a new function to enable clustering of the pathways based on overlapping metabolites (cluster_pk()). For more details regarding the method, see also our response to Reviewer 1 (Comment 12) and our extended method section "Metabolite-set clustering" (Lines 656-671). We visualize the clustering results as a network graph, which we have included in Fig. 5f.
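      The idea of grouping pathways by shared metabolites can be illustrated with a small sketch. It is not the cluster_pk() implementation, whose details are given in the Methods. Here we assume, purely for illustration, that two pathways are linked when the Jaccard similarity of their metabolite sets reaches a threshold, and that clusters are the connected components of the resulting network; the pathway names and metabolite IDs are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared metabolites divided by all metabolites in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_overlap(pathways: dict, threshold: float = 0.5):
    """Link pathways whose metabolite sets have Jaccard similarity >= threshold
    and return the connected components of that network plus its edges."""
    names = list(pathways)
    parent = {name: name for name in names}      # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    edges = []
    for a, b in combinations(names, 2):
        sim = jaccard(pathways[a], pathways[b])
        if sim >= threshold:
            edges.append((a, b, sim))
            parent[find(a)] = find(b)            # merge the two clusters

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values()), edges


# Hypothetical pathways and metabolite IDs, for illustration only
pathways = {
    "A": {"m1", "m2", "m3"},
    "B": {"m2", "m3", "m4"},
    "C": {"m9"},
}
clusters, edges = cluster_by_overlap(pathways)
print(clusters)  # [['A', 'B'], ['C']]
print(edges)     # [('A', 'B', 0.5)]
```

      The edge list is what a network graph would draw, with the similarity as edge weight.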

      The complete result of the KEGG enrichment can be found in Extended Data Table 1, Sheet 13 (Pathway enrichment analysis using KEGG on Young patient subset). The pathways are ranked by p.adjusted value and also include a score (FoldEnrichment) from the Fisher's exact test (similar to NES scores in GSEA). In total, seven pathways pass the p.adjusted threshold. For Fig. 5e, we narrowed these down to the two pathways based on our previous finding of dysregulated dipeptides (Fig. 5d), as we were searching for a potential explanation of this observation.
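      For readers less familiar with over-representation analysis, the sketch below shows how a one-sided Fisher's exact test and a fold-enrichment score can be computed for a single pathway. It assumes SciPy, and it assumes FoldEnrichment is defined as the fraction of query metabolites that fall in the pathway divided by the fraction of all background metabolites that fall in the pathway; the counts are hypothetical.

```python
from scipy import stats


def overrepresentation(k, n_query, m_pathway, n_universe):
    """Over-representation of one pathway among query metabolites.

    k          -- query metabolites that belong to the pathway
    n_query    -- total query (e.g. significantly changed) metabolites
    m_pathway  -- metabolites annotated to the pathway in the background
    n_universe -- all background metabolites
    """
    # Assumed definition: in-pathway fraction of the query divided by the
    # in-pathway fraction of the background
    fold_enrichment = (k / n_query) / (m_pathway / n_universe)
    table = [[k, n_query - k],
             [m_pathway - k, n_universe - n_query - m_pathway + k]]
    # One-sided Fisher's exact test for enrichment
    _, p_value = stats.fisher_exact(table, alternative="greater")
    return fold_enrichment, p_value


fe, p = overrepresentation(k=5, n_query=20, m_pathway=10, n_universe=200)
print(round(fe, 6), p)
```

      Across many pathways, the resulting p-values would then be adjusted for multiple testing (e.g. Benjamini-Hochberg) before ranking by p.adjusted.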

      (15)

      15- Could you comment on the runtime of the pipeline? In particular, do the additional translation steps and use of multiple databases substantially affect computational speed?

      Downloading and parsing databases takes significant time; large resources such as RaMP or HMDB can take minutes on a standard laptop. Our local cache speeds up the process by eliminating repeated downloads. In the future, database access will be even faster: according to our plans, all prior knowledge will be accessible in an already parsed format through our own API (omnipathdb.org). The ambiguity analysis, which is a complex data transformation pipeline, and plotting with ggplot2, another key component of MetaProViz, are the slowest parts, especially when an analysis is performed for the first time and no cache can be used. Thus, there are a few slow operations, each of which completes in at most a few dozen seconds. However, the implementation and speed of these solutions are comparable to what we commonly find in bioinformatics packages, and, most importantly, the speed of MetaProViz does not prevent its efficient use in analysis pipelines.

      (16)

      16- I applaud the authors for the automated checks of whether the selected methods are appropriate!

      Thank you, this is something we think is important to ensure correct analysis and avoid misinterpretation.

      (17)

      17- My suggestion would be to also look into power calculation or p-value histogram. In your example you saw some clear signal, but very frequently research studies are under-sampled and while effect can be clearly seen, there are just not enough samples to have statistically significant hits.

      We fully agree that power calculations are very important. Yet, they should ideally be performed before the user's experiment; MetaProViz analysis starts at a later time point, by which the power calculations should already have been done. Regarding the p-value histogram, we have implemented a similar measure, namely a density plot, which is generated as a quality control measure within the MetaProViz differential analysis function. The density plot is a smoothed version of a histogram that represents the distribution as a continuous probability density function and can be used to assess whether the p-values follow a uniform distribution.
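      A p-value distribution check of the kind the reviewer suggests can be sketched in a few lines. This is an illustration, not the MetaProViz density plot: it bins p-values on [0, 1] and adds a Kolmogorov-Smirnov test against the uniform distribution expected when no feature is truly changed. It assumes NumPy and SciPy, and both input vectors are synthetic.

```python
import numpy as np
from scipy import stats


def pvalue_diagnostics(pvals, bins=20):
    """Histogram counts of p-values on [0, 1] plus a Kolmogorov-Smirnov
    test of the null hypothesis that they are Uniform(0, 1)."""
    p = np.asarray(pvals, dtype=float)
    counts, edges = np.histogram(p, bins=bins, range=(0.0, 1.0))
    ks = stats.kstest(p, "uniform")
    return counts, edges, ks.statistic, ks.pvalue


flat = (np.arange(100) + 0.5) / 100                   # evenly spread p-values
spiked = np.concatenate([np.full(100, 0.001), flat])  # excess near zero
for name, p in [("flat", flat), ("spiked", spiked)]:
    counts, _, d, ks_p = pvalue_diagnostics(p)
    print(name, counts[:3], round(d, 3), ks_p)
```

      A spike near zero on a flat background suggests true signal, whereas a roughly flat histogram with little excess near zero is consistent either with no effect or with a study too small to detect one.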

      (18)

      18- Overall, the functional parts are novel and a next step in helping with data interpretability, but I still found it hard to extract functionally clear insights (regarding pathways/functional groupings of metabolites), especially as you have e.g. enzyme-metabolite databases. I think clarity there could be improved and would help to get your message more widely across.

      Regarding the clarity of the pathway enrichment and its functional insights, we have extended the legends of Fig. 4 and 5, now state clearly that MetalinkDB is the prior knowledge resource we used for the functional interpretation to identify the links for methionine (Lines 367-368), and have extended our summary statement to highlight that we combine the biological clustering with prior knowledge to gain mechanistic insight (Lines 380-381).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      This manuscript introduces an R package, MetaProViz, for metabolomics data analysis (post-annotation), aiming to solve the problem of poor analysis choices and to enable more people to do the analysis. MetaProViz not only guides people to select the best statistical method, but also solves previously unsolved problems, e.g. multiple and variable metabolite names in different databases and their connections to prior knowledge. The authors also implemented exometabolomics analysis and the steps needed to visualise intracellular/media processes. They demonstrated their new package on a kidney cancer (clear-cell renal cell carcinoma, ccRCC) dataset, moving one step closer to improving the biological interpretability of omics data analysis.

      Major comments affecting conclusions: none.

      Minor comments, important issues that could be addressed and possibly improve the clarity or generally presentation of the tool. Please see all below.

      1. You start by separating and discussing metabolomics and lipidomics, but lipidomics quickly disappears (especially beyond the abstract/introduction), so there is no real need to discuss lipidomics.
      2. You refer to the MetImp4 imputation web tool, but I cannot find an active website, manuscript, or R package for it, and the cited link does not load. This raises doubts about whether the tool is currently usable. Additionally, imputation choice should be guided by biological context and study design, not just by testing a few methods and selecting the one that performs best.
      3. The authors address key metabolomics issues such as ambiguous metabolite names and isoforms, and their focus on resolving mapping ambiguities and translating between database identifiers is highly valuable. However, the larger challenge of de novo identification and the "dark matter" of unannotated metabolites remains unresolved (initiatives as MassIVE might help in the future https://massive.ucsd.edu/ProteoSAFe/ ), and readers may benefit from clearer acknowledgement that MetaProViz does not operate on raw spectral data. The introduction currently emphasizes annotation, but since MetaProViz requires already annotated metabolite tables (and then deals with all the messiness), this space might be better used to frame the interpretability and pathway-analysis challenges that the tool directly addresses.
      4. I also really enjoyed that you touched on the trade-off between user-friendly but inflexible tools and the problem of reproducibility. We truly need well-working packages for other bioinformaticians, rather than expecting wet-lab scientists to do all the analysis within a user interface.
      5. It would be helpful to explain why the authors chose cancer/RCC samples for the demonstration. Was it because the dataset included both media and cell measurements? Does the tool perform best when multiple layers of information are available from the same experiment?
      6. Figure 2B: The upset plots effectively show increased overlap after adaptation, but it would be easier to compare changes if the order of the intersection bars in the "adapted" plot matched the original. For example, while total intersections increased (251→285), the PubChem+KEGG overlap decreased (24→5), likely due to reallocation to the full intersection.
      7. In your example of D-alanine and L-alanine, you mention that chirality is an important biological feature, but up to this point it is not clear how exactly the translation is done, and in which situations this would be treated simply as "alanine" and when the more precise information would be retained. You mention RaMP-DB knowledge and one-to-X mappings as well as your general guidance in the Methods, but it would be useful to describe in this publication exactly how you tackled this problem in the ccRCC case.
      8. In one to many mappings, it would be interesting to see quantification how frequently it was happening within a pathway or across pathways. I.e. Would going into pathway analysis "solve" the issue of "lost in translation" or not really?
      9. QC: the coefficient of variation (CV) helps identify features with high variability and thus low detection accuracy. Here it's important to acknowledge that if the feature is very variable between groups it can be extremely important, but if the feature is very variable within the group - only then one would have low trust in the accuracy.
      10. Missing value imputation - while missing not at random is a great way to deal with missingness, it would be great to have options for other types of missingness (not just MNAR), as missingness is complex in nature. If a strong decision has been made, it would be good to support it with supplementary data (i.e. how results change when applying various combinations of missingness and why choosing MNAR seems to be the most robust).
      11. In the pre-processing and imputation stages - it would be interesting to see a summary table of how many features are left after each stage.
      12. Is there a reason not to do UMAP or PLS-DA graphs for outlier detection? Doing more than PCA would help to have more confidence in removing or retaining outliers in the cases where biological relevance is borderline.
      13. Metadata vs metabolite features - can this be used beyond metabolomics (e.g. proteomics, transcriptomics, etc)? It can always be very useful when there are many metadata features and it's hard to pre-select beforehand which ones are the most biologically relevant.
      14. While authors discussed what KEGG pathways were significantly deregulated, it would be interesting to see all the pathways that were affected (e.g. aPEAR "bubble" graphs can show this (https://github.com/kerseviciute/aPEAR) , or something similar to NES scores). I appreciate the trickiness of it, but it would be quite interesting to see how authors e.g. Figure5e narrowed it down to the two pathways and how all the others looked like.
      15. Could you comment on the runtime of the pipeline? In particular, do the additional translation steps and use of multiple databases substantially affect computational speed?
      16. I applaud the authors for the automated checks of whether the selected methods are appropriate!
      17. My suggestion would be to also look into power calculation or p-value histogram. In your example you saw some clear signal, but very frequently research studies are under-sampled and while effect can be clearly seen, there are just not enough samples to have statistically significant hits.
      18. Overall, the functional parts are novel and a next step in helping with data interpretability, but I still found it hard to extract functionally clear insights (regarding pathways/functional groupings of metabolites), especially as you have e.g. enzyme-metabolite databases. I think clarity there could be improved and would help to get your message more widely across.
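      The distinction in point 9 can be made concrete with a small, hypothetical feature measured in two groups: pooling the groups inflates the coefficient of variation (CV), even though each group is measured tightly. The values below are invented for illustration.

```python
import numpy as np


def cv(values):
    """Coefficient of variation: sample SD divided by the mean."""
    v = np.asarray(values, dtype=float)
    return v.std(ddof=1) / v.mean()


# Hypothetical feature measured in two conditions
group_a = [10.0, 11.0, 9.0]
group_b = [20.0, 21.0, 19.0]

pooled_cv = cv(group_a + group_b)                   # inflated by the group difference
within_cvs = {"A": cv(group_a), "B": cv(group_b)}   # reflects measurement spread
print(round(pooled_cv, 3), within_cvs)
```

      Here the pooled CV is about 0.37, while the within-group CVs are 0.1 and 0.05, so only the within-group values speak to detection accuracy.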

      Significance

      This is a great tool and I can't wait to use it on many upcoming metabolomics projects! The authors tackle multiple ongoing issues within the field: from poor selection of statistical methods (they provide guidance or have safer default options) to the messiness of data annotation between databases and improving data interpretability. The field is still evolving quickly, and it's impossible to solve all problems with one package; thus some limitations within the package could be seen as a bit rigid. Nonetheless, this is a substantial step toward filling an existing methodological gap. All bioinformaticians doing metabolomic analysis, or those learning how to do it, will greatly benefit from this knowledge.

      I myself lead a team of 6 bioinformaticians, and we do analysis for researchers, clinicians, drug discovery, and various companies. We run internal metabolomics pipelines every day and fully sympathise with the problems addressed by the authors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Thank you so much for your comprehensive and insightful assessment of our manuscript. We appreciate your recognition of the novelty of our experimental design and the utility of our computational framework for interpreting visual remapping across the lifespan and in clinical populations. We are very grateful for your suggestions regarding the narrative flow, which have helped us to improve the manuscript's focus and coherence. Our responses to your specific concerns are detailed below.

      (1) Relevance of the figure-copy results (pp. 13-15). Is it necessary to include the figure-copy task results within the main text? The manuscript already presents a clear and coherent narrative without this section. The figure-copy task represents a substantial shift from the LOCUS paradigm to an entirely different task that does not measure the same construct. Moreover, the ROCF findings are not fully consistent with the LOCUS results, which introduces confusion and weakens the manuscript's coherence. While I understand the authors' intention to assess the ecological validity of their model, this section does not effectively strengthen the manuscript and may be better removed or placed in the Supplementary Materials.

      We thank the reviewer for their perspective regarding the narrative flow and the transition between the LOCUS paradigm and the ROCF results. However, we remain keen to retain these findings in the main text, as they provide critical ecological and clinical validation for the computational mechanisms identified in our study.

      We think these results strengthen the manuscript for the following main reasons:

      (1) The ROCF we used is a standard neuropsychological tool for identifying constructional apraxia. Our results bridge the gap between basic cognitive neuroscience and clinical application by demonstrating that specific remapping parameters—rather than general memory precision—predict real-world deficits in patients.

      (2) The finding that our winning model explains approximately 62% of the variance in ROCF copy scores across all diagnostic groups further indicates that these parameters from the LOCUS task represent core computational phenotypes that underpin complex, real-life visuospatial construction (copying drawings).

      (3) Previous research has often observed only a weak or indirect link between drawing ability and traditional working memory measures, such as digit span (Senese et al., 2020). This was previously attributed to “deictic” strategies—like frequent eye and hand movements—that minimise the need to hold large amounts of information in memory (Ballard et al., 1995; Cohen, 2005; Draschkow et al., 2021). While our study was not exclusively designed to catalogue all cognitive contributions to drawing, the findings provide significant and novel evidence indicating that transsaccadic integration is a critical driver of constructional (copying drawing) ability. By demonstrating this link, the results provide evidence to stimulate a new direction for future research, shifting the focus from general memory capacity toward the precision of spatial updating across eye movements.

      In summary, by including the ROCF results in the main text, we provide evidence for a functional role for spatial remapping that extends beyond perceptual stability into the domain of complex visuomotor control. We have expanded on these points throughout the revised manuscript:

      In the Introduction: p.2:

      “The clinical relevance of these spatial mechanisms is underscored by significant disruptions to visuospatial processing and constructional apraxia—a deficit in copying and drawing figures—observed in neurodegenerative conditions such as Alzheimer's disease (AD) and Parkinson's disease (PD).[20,21] This raises a crucial question: do clinical impairments in complex visuomotor tasks stem from specific failures in transsaccadic remapping? If so, the computational parameters that define normal spatial updating should also provide a mechanistic account of these clinical deficits, differentiating them from general age-related decline.”

      p.3: “Finally, by linking these mechanistic parameters to a standard clinical measure of constructional ability (the Rey-Osterrieth Complex Figure task), we demonstrate that transsaccadic updating represents a core computational phenotype underpinning real-world visuospatial construction in both health and neurodegeneration.”

      In the Results:

      “To assess whether the mechanistic parameters derived from the LOCUS task represent core phenotypes of real-world visuospatial abilities, we also instructed all participants to complete the Rey-Osterrieth Complex Figure copy task (ROCF; Figure 7A) on an Android tablet using a digital pen (see examples in Figure 7B; all Copy data are available in the open dataset: https://osf.io/95ecp/). The ROCF is a gold-standard neuropsychological tool for identifying constructional apraxia.[29] Historically, drawing performance has shown only weak or indirect correlations with traditional working memory measures.[30] This disconnect has been attributed to active visual-sampling strategies—frequent eye movements that treat the environment as an external memory buffer, minimising the necessity of holding large volumes of information in internal working memory.[3–5]

      We hypothesised that drawing accuracy is primarily constrained by the precision of spatial updating across frequent saccades rather than raw memory capacity. To evaluate the ecological validity of the identified saccade-updating mechanism, we modelled individual ROCF copy scores across all four groups using the estimated (maximum a posteriori) parameters from the winning “Dual (Saccade) + Interference” model (Model 7; Figure 8) as regressors in a Bayesian linear model. Prior to inclusion, each regressor was normalised by dividing by the square root of its variance.

      This model successfully explained 61.99% of the variance in ROCF copy scores, indicating that these computational parameters are strong predictors of real-world constructional ability (Figure 8A). … This highlights the critical role of accurate remapping based on saccadic information; even if the core saccadic update mechanism is preserved across groups (as shown in previous analyses), the precision of this updating process is crucial for complex visuospatial tasks. Moreover, worse ROCF copy performance is associated particularly with higher initial angular encoding error. This indicates that imprecision in the initial registration of angular spatial information contributes to difficulties in accurately reproducing complex visual stimuli.”
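      The regressor normalisation and the variance-explained figure can be sketched as follows. The text does not specify the priors or noise model of the Bayesian linear model, so this Python sketch assumes a zero-mean Gaussian prior on the weights with a known noise variance (for which the posterior mean has a closed form) and uses the sample variance (ddof=1) for scaling. The data are synthetic, not the ROCF scores or the fitted model parameters.

```python
import numpy as np


def scale_by_sd(X):
    """Divide each column by the square root of its variance (no centring)."""
    X = np.asarray(X, dtype=float)
    return X / np.sqrt(X.var(axis=0, ddof=1))


def bayes_linreg_posterior_mean(X, y, prior_var=1.0, noise_var=1.0):
    """Posterior mean of the weights under w ~ N(0, prior_var * I) and
    Gaussian noise with known variance; an intercept column is added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    A = X1.T @ X1 / noise_var + np.eye(X1.shape[1]) / prior_var
    return np.linalg.solve(A, X1.T @ y / noise_var), X1


def variance_explained(y, y_hat):
    """Proportion of variance in y accounted for by the predictions (R^2)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(50, 3))           # synthetic regressors
y = 2.0 + X @ np.array([1.0, -0.5, 0.3])        # synthetic, noise-free outcome
Xs = scale_by_sd(X)
w, X1 = bayes_linreg_posterior_mean(Xs, y, prior_var=1e6)
r2 = variance_explained(y, X1 @ w)
print(np.round(w, 3), round(r2, 4))
```

      Dividing by the standard deviation without centring leaves each regressor's mean in place, so an intercept is still needed; the scaling only puts the weights on a common scale so that their magnitudes can be compared.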

      In the Discussion:

      “Importantly, our computational framework establishes a direct mechanistic link between transsaccadic updating and real-world constructional ability. Specifically, higher saccade and angular encoding errors contribute to poorer ROCF copy scores. By mapping these mechanistic estimates onto clinical scores, we found that the parameters derived from our winning model explain approximately 62% of the variance in constructional performance across groups. These findings suggest that the computational parameters identified in the LOCUS task represent core phenotypes of visuospatial ability, providing a mechanistic bridge between basic cognitive theory and clinical presentation.

      This relationship provides novel insights into the cognitive processes underlying drawing, specifically highlighting the role of transsaccadic working memory. Previous research has primarily focused on the roles of fine motor control and eye-hand coordination in this skill.[4,50–55] This is partly because of a consistent failure to find a strong relation between traditional memory measures and copying ability.[4,31] For instance, common measures of working memory, such as digit span and Corsi block tasks, do not directly predict ROCF copying performance.[31,56] Furthermore, in patients with constructional apraxia, these memory performance measures often remain relatively preserved despite significant drawing impairments.[56–58] In the literature, this lack of association has often been attributed to “deictic” visual-sampling strategies, characterised by frequent eye movements that treat the environment as an external memory buffer, thereby minimising the need to maintain a detailed internal representation.[4,59] In a real-world copying task, the ROCF requires a high volume of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified here. Recent eye-tracking evidence confirms that patients with AD exhibit significantly more saccades and longer fixations during figure copying compared to controls, potentially as a compensatory response to transsaccadic working memory constraints.[56] This high-frequency sampling, averaging between 150 and 260 saccades for AD patients compared with approximately 100 for healthy controls, renders the task highly dependent on the precision of dynamic remapping signals.[56] To ensure this relationship was not driven by a general "g-factor" or non-spatial memory impairment, we further investigated the role of broader cognitive performance using the ACE-III Memory subscale. We found that the relationship between transsaccadic working memory and ROCF performance remains highly significant, even after controlling for age, education, and ACE-III Memory subscore. This suggests that transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.

      In other words, even when visual information is readily available in the world, the act of copying depends critically on working memory across saccades. This reveals a fundamental computational trade-off: while active sampling strategies (characterised by frequent eye-hand movements) effectively reduce the load on capacity-limited working memory, they simultaneously increase the demand for precise spatial updating across eye movements. By treating the external world as an "outside" memory buffer, the brain minimises the volume of information it must hold internally, but it becomes entirely dependent on the reliability with which that information is remapped after each eye movement. This perspective aligns with, rather than contradicts, the traditional view of active sampling, which posits that individuals adapt their gaze and memory strategies based on specific task demands.[3,60] Furthermore, this perspective provides a mechanistic framework for understanding constructional apraxia; in these clinical populations, the impairment may not lie in a reduced memory "span," but rather in the cumulative noise introduced by the constant spatial remapping required during the copying process.[58,61]

      Beyond constructional ability, these findings suggest that the primary evolutionary utility of high-resolution spatial remapping lies in the service of action rather than perception. While spatial remapping is often invoked to explain perceptual stability,[11–13,15] the necessity of high-resolution transsaccadic memory for basic visual perception is debated.[13,62–64] A prevailing view suggests that detailed internal models are unnecessary for perception, given the continuous availability of visual information in the external world.[13,44] Our findings support an alternative perspective, aligning with the proposal that high-resolution transsaccadic memory primarily serves action rather than perception.[13] This is consistent with the need for precise localisation in eye-hand coordination tasks such as pointing or grasping.[65] Even when unaware of intrasaccadic target displacements, individuals rapidly adjust their reaching movements, suggesting direct access of the motor system to remapping signals.[66] Further support comes from evidence that pointing to remembered locations is biased by changes in eye position,[67] and that remapping neurons reside within the dorsal “action” visual pathway, rather than the ventral “perception” visual pathway.[13,68,69] By demonstrating a strong link between transsaccadic working memory and drawing (a complex fine motor skill), our findings suggest that precise visual working memory across eye movements plays an important role in complex fine motor control.”

      (2) Model fitting across age groups (p. 9).

      It is unclear whether it is appropriate to fit healthy young and healthy elderly participants' data to the same model simultaneously. If the goal of the model fitting is to account for behavioral performance across all conditions, combining these groups may be problematic, as the groups differ significantly in overall performance despite showing similar remapping costs. This suggests that model performance might differ meaningfully between age groups. For example, in Figure 4A, participants 22-42 (presumably the elderly group) show the best fit for the Dual (Saccade) model, implying that the Interference component may contribute less to explaining elderly performance.

      Furthermore, although the most complex model emerges as the best-fitting model, the manuscript should explain how model complexity is penalized or balanced in the model comparison procedure. Additionally, are Fixation Decay and Saccade Update necessarily alternative mechanisms? Could both contribute simultaneously to spatial memory representation? A model that includes both mechanisms (e.g., Dual (Fixation) + Dual (Saccade) + Interference) could be tested to determine whether it outperforms Model 7, to rule out complexity alone as the reason for Model 7's success.

      We thank you for the opportunity to expand upon and clarify our modelling approach. Our decision to use a common generative model for both young and older adults was grounded in the empirical finding that there was no significant interaction between age group and saccade condition for either location or colour memory. While older adults demonstrated lower baseline precision, the specific "saccade cost" remained remarkably consistent across cohorts. This justified using a common model to assess quantitative differences in parameter estimates within a consistent mechanistic framework.

      Moreover, our winning model nests simpler models as special cases, providing the flexibility to naturally accommodate groups where certain components—such as interference—might play a reduced role. This ultimately confirms that the mechanisms for age-related memory deficits in this task reflect more general decline rather than a qualitative failure of the saccadic remapping process.

      This approach is further supported by the properties of the Bayesian model selection (BMS) procedure we used, which inherently penalises the inclusion of unnecessary parameters. Unlike maximum likelihood methods, BMS compares marginal likelihoods, representing the evidence for a model integrated over its entire parameter space. This follows the principle of Bayesian Occam’s Razor, where a model is only favoured if the improvement in fit justifies the additional parameter space; redundant parameters instead "dilute" the probability mass and lower the model evidence.

      Consequently, we contend that a hybrid model combining fixation and saccade mechanisms is unnecessary, as we have already adjudicated between alternative mechanisms of equal complexity. Specifically, Model 6 (Dual Fixation + Interference) and Model 7 (Dual Saccade + Interference) possess an identical number of parameters. The fact that Model 7 emerged as the clear winner—providing substantial evidence against Model 6 with a Bayes Factor of 6.11—demonstrates that our model selection is driven by the specific mechanistic account of the data rather than a simple preference for complexity.
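      The Bayesian Occam's razor argument can be illustrated with a toy comparison whose marginal likelihoods have closed forms: a coin with p fixed at 0.5 (no free parameters) against a coin whose p is drawn from a uniform prior (one free parameter). This is only an illustration of the principle; it does not reproduce the Variational Laplace approximation to the model evidence or the random-effects model selection used in the study.

```python
from math import exp, lgamma, log


def log_evidence_fixed(k, n, p=0.5):
    """Simple model: no free parameters, so the evidence is just the
    likelihood of the observed sequence of k heads in n flips."""
    return k * log(p) + (n - k) * log(1 - p)


def log_evidence_uniform(k, n):
    """Flexible model: p ~ Uniform(0, 1). Integrating the likelihood over
    the prior gives the Beta function B(k + 1, n - k + 1)."""
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)


def bayes_factor(log_z_a, log_z_b):
    """Ratio of marginal likelihoods, model a over model b."""
    return exp(log_z_a - log_z_b)


# 5 heads in 10 flips: the flexible model fits equally well at its best p,
# yet it spreads prior mass over all p, so the simple model is favoured.
print(bayes_factor(log_evidence_fixed(5, 10), log_evidence_uniform(5, 10)))
# 9 heads in 10 flips: the extra parameter now earns its keep.
print(bayes_factor(log_evidence_uniform(9, 10), log_evidence_fixed(9, 10)))
```

      The first ratio is about 2.71 in favour of the fixed model, and the second about 9.31 in favour of the flexible one. On the same scale, a Bayes factor of 6.11 corresponds to a log-evidence difference of ln(6.11) ≈ 1.81.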

      We have revised the Results and Discussion sections of the manuscript to state these points more explicitly for readers and have included references to established literature regarding the robustness of marginal likelihoods in guarding against overfitting.

      In the Results,

      “By fitting these models to the trial-by-trial response data from all healthy participants (N=42), we adjudicated between competing mechanisms to determine which best explained participant performance (Figure 4). We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[25–27] The analysis yielded a strong result: the “Dual (Saccade) + Interference” model (Model 7 in Table 1) emerged as the winning model, providing substantial evidence against the next best alternative with a Bayes Factor of 6.11.”

      In the Discussion:

      “Our framework employs Variational Laplace, a method used to recover computational phenotypes in clinical populations like those with substance use disorders,[34,35] and the models we fit using this procedure feature time-dependent parameterisation of variance—conceptually similar to the widely-used Hierarchical Gaussian Filter.[36–39] Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[25–27,40] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      Minor point: On p. 9, line 336, Figure 4A does not appear to include the red dashed vertical line that is mentioned as separating the age groups.

      Thank you for pointing out this inconsistency. We apologise for the oversight; upon further review, we concluded that the red dashed vertical line was unnecessary for the clear presentation of the data. We have therefore removed the line from Figure 4A and deleted the corresponding sentence in the figure caption.

      (3) Clarification of conceptual terminology.

      Some conceptual distinctions are unclear. For example, the relationship between "retinal memory" and "transsaccadic memory," as well as between "allocentric map" and "retinotopic representation," is not fully explained. Are these constructs related or distinct? Additionally, the manuscript uses terms such as "allocentric map," "retinotopic representation," and "reference frame" interchangeably, which creates ambiguity. It would be helpful for the authors to clarify the relationships among these terms and apply them consistently.

      Thank you for pointing this out. We have revised the manuscript to ensure that these terms are applied with greater precision and consistency. Our revisions standardise the terminology based on the following distinctions:

      Reference frames: We distinguish between the eye-centred reference frame (coordinate systems that shift with gaze) and the world-centred reference frame (coordinate systems anchored to the environment).

      Retinotopic representation vs. allocentric map: We clarify that retinotopic representations are encoded within an eye-centred reference frame and are updated with every ocular movement. Conversely, the allocentric map is anchored to stable environmental features, remaining invariant to the observer’s gaze direction or position.

      Retinotopic memory vs. transsaccadic memory: We have removed the term "retinal memory" to avoid ambiguity. We now consistently use retinotopic memory to describe the persistence of visual information in eye-centred coordinates within a single fixation. In contrast, transsaccadic memory refers to the higher-level integration of visual information across saccades, which involves the active updating or remapping of representations to maintain stability.

      To incorporate these clarifications, we have implemented the following changes:

      In the Introduction, the second paragraph has been entirely rewritten to establish these definitions at the outset, providing a clearer theoretical framework for the study.

      “Central to this enquiry is the nature of the coordinate system used for the brain's internal spatial representation. Does the brain maintain a single, world-centred (allocentric) map, or does it rely on a dynamic, eye-centred (retinotopic) representation?[11,13,15,16] In the latter system, retinotopic memory preserves spatial information within a fixation, whereas transsaccadic memory describes the active process of updating these representations across eye movements to achieve spatiotopic stability—the perception of a stable world despite eye movements.[11,16–18] If spatial stability is indeed reconstructed through such remapping, the mechanism remains unresolved: do we retain memories of absolute fixation locations, or do we reconstruct these positions from noisy memories of the intervening saccade vectors? We can test these hypotheses by analysing when and where memory errors occur. Assuming that memory precision declines over time,[19] the resulting error distributions should reveal the specific variables that are represented and updated across each saccade.”
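The saccade-vector account in the paragraph above makes a concrete prediction, which a toy simulation can illustrate. The sketch makes two assumptions for illustration only: a target is stored in eye-centred coordinates, and after each saccade the stored vector is updated by subtracting a noisy copy of the saccade vector. The Gaussian noise levels `encode_sd` and `efference_sd` are hypothetical and are not the fitted model parameters. Each update adds independent noise, so spatial error grows with the number of saccades. In this toy model, a feature that never undergoes a coordinate transformation (such as colour) would retain only its encoding noise.

```python
import numpy as np


def simulate_location_error(n_saccades, n_trials=20_000, encode_sd=0.5,
                            efference_sd=0.6, seed=0):
    """Mean Euclidean error (deg) of a remembered location after n saccades.

    The target is stored in eye-centred coordinates. After each saccade the
    stored vector is updated by subtracting a noisy estimate of the saccade
    vector (a hypothetical efference copy). All noise levels are illustrative.
    """
    rng = np.random.default_rng(seed)
    target_world = rng.uniform(-10, 10, size=(n_trials, 2))
    gaze = np.zeros((n_trials, 2))
    # Encoding: eye-centred position corrupted by encoding noise
    memory_eye = target_world - gaze + rng.normal(0, encode_sd, (n_trials, 2))
    for _ in range(n_saccades):
        saccade = rng.uniform(-8, 8, size=(n_trials, 2))
        gaze = gaze + saccade
        efference_copy = saccade + rng.normal(0, efference_sd, (n_trials, 2))
        memory_eye = memory_eye - efference_copy  # vector-subtraction remapping
    true_eye = target_world - gaze
    return np.linalg.norm(memory_eye - true_eye, axis=1).mean()


errors = [simulate_location_error(n) for n in (0, 1, 2)]
```

Under these assumptions, the error after n saccades has per-axis variance `encode_sd**2 + n * efference_sd**2`, so the mean error rises monotonically from the no-saccade to the two-saccade condition.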

      In the Results, the opening section has been reorganised to align with this terminology. We have ensured that the hypotheses and behavioural data, specifically the definition of "saccade cost", are introduced using this consistent conceptual vocabulary to improve the overall coherence of the narrative.

      (4) Rationale for the selective disruption hypothesis (p. 4, lines 153-154). The authors hypothesize that "saccades would selectively disrupt location memory while leaving colour memory intact." Providing theoretical or empirical justification for this prediction would strengthen the argument.

      We have revised the Results to state the hypothesis more explicitly and expanded the Discussion to provide a robust theoretical and empirical rationale:

      In the Results,

      “This design allowed us to isolate and quantify the unique impact of saccades on spatial memory, enabling us to test competing hypotheses regarding spatial representation. If spatial memory were solely underpinned by an allocentric mechanism, precision should remain comparable across all conditions as the representation would be world-centred and unaffected by eye movements. Thus, performance in the no-saccade condition should be comparable to the two-saccade condition. Conversely, if spatial memory relies on a retinotopic representation requiring active updating across eye movements, the two-saccade condition was anticipated to be the most challenging due to cumulative decay in the memory traces used for stimulus reconstruction after each saccade.[22] Critically, we hypothesised that this saccade cost would be specific to the spatial domain; while location requires active remapping via noisy oculomotor signals, non-spatial features like colour are not inherently tied to coordinate transformations and should therefore remain stable (see more in Discussion below).

      Meanwhile, the no-saccade condition was expected to yield the most accurate localisation, relying solely on retinotopic information (retinotopic working memory). These predictions were confirmed in young healthy adults (N = 21, mean age = 24.1 years, range 19 to 34). A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(2.2, 43.9) = 33.2, p < 0.001, partial η² = 0.62), indicating substantial impairment after eye movements (Figure 2A). In contrast, colour memory remained remarkably stable across all saccade conditions (Figure 2B; F(2.2, 44.7) = 0.68, p = 0.53, partial η² = 0.03).

      This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.

      Critically, our comparison between spatial and colour memory does not rely on the absolute magnitude of errors, which are measured in different units (degrees of visual angle vs. radians). Instead, we assessed the relative impact of the same saccadic demand on each feature within the same trial. While location recall showed a robust saccade cost, colour recall remained statistically unchanged. To ensure this null effect was not due to a lack of measurement sensitivity, we examined the recency effect; recall performance for the second item was predicted to be better than for the first stimulus in each condition.[23,24] As expected, colour memory for Item 2 was significantly more accurate than for Item 1 (F(1,20) = 6.52, p = 0.02, partial η² = 0.25), demonstrating that the task was sufficiently sensitive to detect standard working memory fluctuations despite the absence of a saccade-induced deficit.”

      In the Discussion, we now write that on p.18:

      “A clear finding was the specificity of the saccade cost to spatial features; it was not observed for non-spatial features like colour, even in neurodegenerative conditions. This discrepancy challenges notions of fixed visual working memory capacity unaffected by saccades.16,44–46 The differential impact on spatial versus non-spatial features in transsaccadic memory aligns with the established "what" and "where" pathways in visual processing.32,33 For objects to remain unified, object features must be bound to stable representations of location across saccades.19 One possibility is that remapping updates both features and location through a shared mechanism, predicting equal saccadic interference for both colour and location in the present study.

      However, our findings suggest otherwise. One potential concern is whether this dissociation simply reflects the inherent spatial noise introduced by fixational eye movements (FEMs), such as microsaccades and drifts.47 Because locations are stored in a retinotopic frame, fixational instability necessarily shifts retinal coordinates over time. However, the "saccade cost" here was defined as the error increase relative to a no-saccade baseline of equal duration; because both conditions are subject to the same fixational drift, any FEM-induced noise is effectively subtracted out. Thus, despite the ballistic and non-Gaussian nature of FEMs,48 they cannot account for the presence of a saccade cost in spatial memory alongside its complete absence in the colour domain. Another possibility is that this dissociation reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.

      The fact that identical eye movements—executed simultaneously and with identical vectors—systematically degraded spatial precision while sparing colour suggests a feature-specific susceptibility to transsaccadic remapping. This supports the view that the computational process of updating an object’s location involves a vector-subtraction mechanism—incorporating noisy oculomotor commands (efference copies)—that introduces specific spatial variance. Because this remapping is a coordinate transformation, the resulting sensorimotor noise does not functionally propagate to non-spatial feature representations. Consequently, features like colour may be preserved or automatically remapped without the precision loss associated with spatial updating.11,49 Our paradigm thus provides a refined tool to investigate the architecture of transsaccadic working memory across distinct object features.”
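The subtraction argument against a fixational-eye-movement explanation can be checked with a short simulation. This minimal sketch assumes, hypothetically, that drift noise and remapping noise are independent and additive along one spatial dimension. Their variances then add. The difference in error variance between a saccade condition and a no-saccade condition of equal duration therefore recovers the remapping component at any drift level. This holds even when drift is heavy-tailed; here it is Laplace-distributed as a stand-in for non-Gaussian FEMs. The exact cancellation applies to variances. Differences in mean absolute error cancel the drift contribution only approximately.

```python
import numpy as np


def saccade_cost_variance(drift_scale, saccade_sd=0.6, n_trials=1_000_000, seed=1):
    """Error-variance difference (saccade minus no-saccade) in a toy 1-D model.

    drift_scale: scale of the Laplace-distributed fixational drift, which is
    identical in distribution across the two equal-duration conditions.
    saccade_sd: SD of the extra (hypothetical) remapping noise in saccade trials.
    """
    rng = np.random.default_rng(seed)
    # Heavy-tailed drift present in both conditions
    err_no_saccade = rng.laplace(0.0, drift_scale, n_trials)
    drift_saccade = rng.laplace(0.0, drift_scale, n_trials)
    # Remapping noise only in the saccade condition
    remap_noise = rng.normal(0.0, saccade_sd, n_trials)
    err_saccade = drift_saccade + remap_noise
    return err_saccade.var() - err_no_saccade.var()


for scale in (0.3, 1.0):
    print(scale, saccade_cost_variance(scale))  # both close to 0.6**2 = 0.36
```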

      (5) Relationship between saccade cost and individual memory performance (p. 4, last paragraph).

      The authors report that larger saccades were associated with greater spatial memory disruption. It would be informative to examine whether individual differences in the magnitude of saccade cost correlate with participants' overall/baseline memory performance (e.g. their memory precision in the no-saccade condition). Such analyses might offer insights into how memory capacity/ability relates to resilience against saccade-induced updating.

      We have now conducted the correlation analysis to determine whether baseline memory capacity (no-saccade condition) predicts resilience to saccade-induced updating. The results indicate that these two factors are independent.

      To clarify the nature of the saccade-induced impairment, we have updated the text as follows:

      p.4: “This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.”

      p.5: “Further analysis examined whether individual differences in baseline memory precision (no-saccade condition) predicted resilience to saccadic disruption. Crucially, individual saccade costs (defined as the precision loss relative to baseline) did not correlate with baseline precision (rho = 0.20, p = 0.20). This suggests that the noise introduced by transsaccadic remapping acts as an independent, additive source of variance that is not modulated by an individual’s underlying memory capacity. These findings imply a functional dissociation between the mechanisms responsible for maintaining a representation and those involved in its coordinate transformation.”
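The individual-differences analysis can be sketched as below. The sketch reads saccade cost as the per-participant difference between mean location error in a saccade condition and in the no-saccade baseline; this is one interpretation of the text, which does not specify the exact formula. Spearman's rho is computed as the Pearson correlation of average ranks. Function names are hypothetical, and p-values are not computed here.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks of tied values by their average
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation = Pearson correlation of the (average) ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def saccade_cost(error_saccade, error_no_saccade):
    """Per-participant precision loss relative to the no-saccade baseline."""
    return np.asarray(error_saccade, dtype=float) - np.asarray(error_no_saccade, dtype=float)


# Hypothetical per-participant mean errors (deg)
baseline = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_saccade = baseline + np.array([0.1, 0.3, 0.2, 0.5, 0.4])
cost = saccade_cost(with_saccade, baseline)
rho = spearman_rho(baseline, cost)
```

With these toy numbers, the ranks of the costs are 1, 3, 2, 5, 4, giving rho = 1 - 6·4 / (5·24) = 0.8.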

      (6) Model fitting for the healthy elderly group to reveal memory-deficit factors (pp. 11-12). The manuscript discusses model-based insights into components that contribute to spatial memory deficits in AD and PD, but does not discuss components that contribute to spatial memory deficits in the healthy elderly group. Given that the EC group also shows impairments in certain parameters, explaining and discussing these outcomes of the EC group could provide additional insights into age-related memory decline, which would strengthen the study's broader conclusions.

      This is a very good point. We rewrote the corresponding results section (p.12-13):

      “Modelling reveals the sources of spatial memory deficits in healthy ageing and neurodegeneration - To understand the source of the observed deficits, we applied the winning ‘Dual (Saccade) + Interference’ model to the data from all participants (YC, EC, AD, and PD). By fitting the model to the entire dataset, we obtained estimates of the parameters for each individual, which then formed the basis for our group-level analysis. To formally test for group differences, we used Parametric Empirical Bayes (PEB), a hierarchical Bayesian approach that compares parameter estimates across groups while accounting for the uncertainty of each estimate [28]. This allowed us to identify which specific cognitive mechanisms, as formalised by the model parameters, were affected by age and disease.

      The Bayesian inversion used here allows us to quantify the posterior mode and variance of each parameter and the posterior covariance between parameters. From these, we can compute the probability that one parameter exceeds another, which we report as P(A>B), the posterior probability that the parameter for group A was greater than that for group B.

      We first examined the specific parameters differentiating healthy elderly (EC) from young controls (YC) to isolate the factors contributing to non-pathological, age-related decline. The analysis revealed that healthy ageing is primarily characterised by a significant increase in Radial Decay (P(EC > YC) = 0.995), a heightened susceptibility to Interference (P(EC > YC) = 1.000), and a reduction in initial Angular Encoding precision (P(YC < EC) = 0.002; Figure 6). These results suggest that normal ageing degrades the fidelity of the initial memory trace and its resilience over time, while the core computational process of updating information across saccades remains intact.

      Beyond these baseline ageing effects, our clinical cohorts exhibited more severe and condition-dependent impairments. Radial decay showed a clear, graded impairment: AD patients had a greater decay rate than PD patients (P(AD > PD) = 1.000), who in turn were more impaired than the EC group (P(PD > EC) = 0.996). Interference showed a partly graded pattern: AD patients were the most susceptible (P(AD > PD) = 0.999), whereas the PD and EC groups did not significantly differ (P(PD > EC) = 0.532).

      Patients with AD also showed a tendency towards greater angular decay than controls (P(AD > EC) = 0.772), although this fell below the 95% probability threshold. This effect was influenced by a lower decay rate in the PD group compared to the EC group (P(PD < EC) = 0.037). In contrast, group differences in encoding were less pronounced. While YC exhibited significantly higher precision than all other groups, AD patients showed significantly higher angular encoding error than PD patients (P(AD > PD) = 0.985), though neither group differed significantly from the EC group.

      Crucially, parameters related to the saccade itself—saccade encoding and saccade decay—did not differentiate the groups. This indicates that neither healthy ageing nor the early stages of AD and PD significantly impair the fundamental machinery for transsaccadic remapping. Instead, the visuospatial deficits in these conditions arise from specific mechanistic failures: a faster decay of radial position information and increased susceptibility to interference, both of which are present in healthy ageing but significantly amplified by neurodegeneration.”
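Posterior probabilities of the form P(A>B) follow directly from Gaussian posteriors. This minimal sketch assumes the Laplace (Gaussian) approximation to the posterior used by Variational Laplace and PEB; this excerpt does not spell out how the authors' pipeline computes these quantities. The difference A - B is then Gaussian, with mean mu_A - mu_B and variance var_A + var_B - 2 cov_AB, so P(A>B) is a standard normal CDF. The function name is hypothetical, and P(A<B) is simply 1 - P(A>B).

```python
import math


def prob_greater(mu_a, mu_b, var_a, var_b, cov_ab=0.0):
    """P(A > B) for two jointly Gaussian posterior parameters.

    A - B ~ N(mu_a - mu_b, var_a + var_b - 2 * cov_ab), so
    P(A > B) = Phi((mu_a - mu_b) / sd(A - B)).
    """
    diff_var = var_a + var_b - 2.0 * cov_ab
    if diff_var <= 0:
        raise ValueError("posterior covariance must leave A - B with positive variance")
    z = (mu_a - mu_b) / math.sqrt(diff_var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


p = prob_greater(1.0, 0.0, 0.5, 0.5)  # Phi(1), about 0.841
```

A positive posterior covariance between the two estimates shrinks the variance of their difference. For the same means and variances, P(A>B) therefore moves further from 0.5, which is why covariance terms matter when comparing groups.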

      In the Discussion, we added:

      “Although saccade updating was an essential component of the winning model, its two key parameters—initial encoding error and decay rate during maintenance—did not significantly differ across groups. This indicates that the core computational process of updating spatial information based on eye movements is largely preserved in healthy ageing and neurodegeneration.

      Instead, group differences were driven by the parameters governing angular encoding error (precision of the initial angle from fixation), angular decay, radial decay (decay in memory of distance from fixation), and interference susceptibility. This implies a functional and neuroanatomical dissociation: while the ventral stream (the “what” pathway) shows an age-related decline in the quality and stability of stored representations, the parietal-frontal circuits of the dorsal stream (the “where” pathway) responsible for coordinate transformations remain functionally robust.[31–34] These spatial updating mechanisms appear resilient to the normal ageing trajectory and only break down when challenged by the specific pathological processes seen in Alzheimer’s or Parkinson’s disease.”

      (7) Presentation of saccade conditions in Figure 5 (p. 11). In Figure 5, it may be clearer to group the four saccade conditions together within each patient group. Since the main point is that saccadic interference on spatial memory remains robust across patient groups, grouping conditions by patient type rather than intermixing conditions would emphasize this interpretation.

      There are several valid ways to present these plots, but we chose this format because it allows for a direct visual comparison of the post-hoc group differences within each specific task demand. This arrangement clearly illustrates the graded impairment from young controls through to patients with Alzheimer’s disease across every condition. This structure also directly mirrors our two-way ANOVA, which identified significant main effects for both Group and Condition, but crucially, no significant Group x Condition interaction. We felt that grouping the data by participant group would force readers to look across four separate clusters to compare the slopes, making the stability of the saccadic remapping mechanism much harder to grasp at a glance.

      Reviewer #1 (Recommendations for the authors):

      (1) Formatting of statistical parameters.

      The formatting of statistical symbols should be consistent throughout the manuscript. Some instances of F, p, and t are italicized, while others are not. All statistical symbols should be italicized.

      Thank you for pointing this out. We have audited the manuscript. While we have revised the text to address these instances throughout the Results and Methods sections, any remaining minor formatting inconsistencies will be corrected during the final typesetting stage.

      (2) Minor typographical issues.

      (a) Line 532: "are" should be "be."

      (b) Line 654: "cantered" should be "centered."

      (c) Line 213: In "(p(bonf) < 0.001, |t| {greater than or equal to} 5.94)," the t value should be reported with its degrees of freedom, and t should be reported before p. The same applies to line 215.

      Thank you for your careful reading. All corrected.

      Reviewer #2 (Public review):

      We thank you for your positive feedback regarding our eye-tracking methodology and computational approach. We appreciate your critical insights into the feature-specific disruption hypothesis and the task structure. We have substantially revised the Results and Discussion concerning saccadic interference with colour memory. Below, we respond to your suggestions point by point:

      Reviewer #2 (Recommendations for the authors):

      (1) The study treats colour and location errors as comparable when arguing that saccades selectively disrupt spatial but not colour memory. However, these measures are defined in entirely different units (degrees of visual angle vs radians on a colour wheel) and are not psychophysically or statistically calibrated. Baseline task difficulty, noise level, or dynamic range do not appear to be calibrated or matched across features. As a result, the null effect of saccades on colour could reflect lower sensitivity or ceiling effects rather than implicit feature-specific robustness.

      We agree that direct comparisons of absolute error magnitudes across different dimensions are not appropriate. Our argument for feature-specific disruption relies not on the scale of errors, but on the presence or absence of a saccade cost within identical trials. In our within-subject design, the same saccade vectors produced a systematic increase in location error while leaving colour error statistically unchanged. Regarding sensitivity, colour memory was precise enough to show a significant recency effect (p = 0.02). To further quantify the evidence for the null effect, we performed Bayesian repeated measures ANOVAs, which yielded BF10 = 0.22. This provides substantial evidence against a saccade-induced disruption of colour precision.
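For readers more used to quantifying evidence for the null, the reported value converts directly. This assumes that BF10 compares a model including the effect of saccade condition on colour error against the null model:

```latex
\mathrm{BF}_{01} = \frac{1}{\mathrm{BF}_{10}} = \frac{1}{0.22} \approx 4.5
```

That is, the colour data are roughly 4.5 times more likely under the null model than under the model with a saccade effect.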

      We have substantially revised this in Results, Methods and Discussion:

      In the Results:

      “This design allowed us to isolate and quantify the unique impact of saccades on spatial memory, enabling us to test competing hypotheses regarding spatial representation. If spatial memory were solely underpinned by an allocentric mechanism, precision should remain comparable across all conditions as the representation would be world-centred and unaffected by eye movements. Thus, performance in the no-saccade condition should be comparable to the two-saccade condition. Conversely, if spatial memory relies on a retinotopic representation requiring active updating across eye movements, the two-saccade condition was anticipated to be the most challenging due to cumulative decay in the memory traces used for stimulus reconstruction after each saccade.[22] Critically, we hypothesised that this saccade cost would be specific to the spatial domain; while location requires active remapping via noisy oculomotor signals, non-spatial features like colour are not inherently tied to coordinate transformations and should therefore remain stable (see more in Discussion below).

      Meanwhile, the no-saccade condition was expected to yield the most accurate localisation, relying solely on retinotopic information (retinotopic working memory). These predictions were confirmed in young healthy adults (N = 21, mean age = 24.1 years, range 19 to 34). A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(2.2, 43.9) = 33.2, p < 0.001, partial η² = 0.62), indicating substantial impairment after eye movements (Figure 2A). In contrast, colour memory remained remarkably stable across all saccade conditions (Figure 2B; F(2.2, 44.7) = 0.68, p = 0.53, partial η² = 0.03).

      This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.

      Critically, our comparison between spatial and colour memory does not rely on the absolute magnitude of errors, which are measured in different units (degrees of visual angle vs. radians). Instead, we assessed the relative impact of the same saccadic demand on each feature within the same trial. While location recall showed a robust saccade cost, colour recall remained statistically unchanged. To ensure this null effect was not due to a lack of measurement sensitivity, we examined the recency effect; recall performance for the second item was predicted to be better than for the first stimulus in each condition.[23,24] As expected, colour memory for Item 2 was significantly more accurate than for Item 1 (F(1,20) = 6.52, p = 0.02, partial η² = 0.25), demonstrating that the task was sufficiently sensitive to detect standard working memory fluctuations despite the absence of a saccade-induced deficit.”

      In the Methods, at the beginning of “Statistical Analysis”, we added

      “Because location and colour recall involve different scales and units, all analyses were performed independently for each feature to avoid cross-dimensional magnitude comparisons.” (p25)

      In the Discussion, we added:

      “A potential concern is whether the observed dissociation between colour and location reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.”

      (2) Colour and then location are probed serially, without a counterbalanced order. This fixed response order could introduce a systematic bias because location recall is consistently subject to longer memory retention intervals and cognitive interference from the colour decision. The observed dissociation (saccades impair location but not colour) may therefore reflect task structure rather than implicit feature-specific differences in trans-saccadic memory.

      Thank you for the insightful observation regarding our fixed response order. We acknowledge that a counterbalanced design is typically preferred to mitigate potential order effects. However, we chose this consistent sequence to ensure the task remained accessible for cognitively impaired patients (i.e., the Alzheimer’s disease (AD) and Parkinson’s disease (PD) cohorts). Conducting an eye-tracking memory task with cognitively impaired patients is challenging, as they may struggle with task engagement or forget complex instructions. During the design phase, we prioritised a consistent structure to reduce the cognitive load and task-switching demands that typically challenge these cohorts.

      Critically, because the saccade cost is a relative measure calculated by comparing conditions with identical timings, any bias from the fixed order is present in both the baseline and saccade trials. The disruption we report is therefore a specific effect of eye movements that goes beyond the noise introduced by the retention interval or the preceding colour report.

      We added the following text in the Methods – experimental procedure (p.22):

      “Recall was performed in a fixed order, with colour reported before location. This sequence was primarily chosen to minimise cognitive load and task-switching demands for the two neurological patient cohorts, ensuring the paradigm remained accessible for individuals with AD and PD. While this order results in a slightly longer retention interval for location recall, the saccade cost was identified by comparing location error across experimental conditions with similar timings but varying saccadic demands.”

      (3) Relatedly, because spatial representations are retinotopic, fixational eye movements (FEMs - microsaccades and drift) displace the retinal coordinates of encoded positions, increasing apparent spatial noise with time delays. Colour memory, however, is feature-based and unaffected by small retinal translations. Thus, any between-condition or between-group differences in FEMs could selectively inflate location error and the associated model parameters (encoding noise, decay, interference), while leaving colour error unchanged. Note that FEMs tend to be slightly ballistic [1,2], hence not well modelled with a Gaussian blur.

      This is a very insightful point. We have now addressed this in detail within the discussion:

      “However, our findings suggest otherwise. One potential concern is whether this dissociation simply reflects the inherent spatial noise introduced by fixational eye movements (FEMs), such as microsaccades and drifts.[46] Because locations are stored in a retinotopic frame, fixational instability necessarily shifts retinal coordinates over time. However, the "saccade cost" here was defined as the error increase relative to a no-saccade baseline of equal duration; because both conditions are subject to the same fixational drift, any FEM-induced noise is effectively subtracted out. Thus, despite the ballistic and non-Gaussian nature of FEMs,[47] they cannot account for the presence of a saccade cost in spatial memory alongside its complete absence in the colour domain. Another possibility is that this dissociation reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.”

      (4) There is no in silico demonstration that the modelling framework can recover the true generating model from synthetic data or recover accurate parameters under realistic noise levels, which can be challenging in generative models with a hierarchical structure (as per [3], for example). Figure 8b shows that the parameters possess substantial posterior covariance, which raises concerns as to whether they can be reliably disambiguated.

      Many thanks for this comment. We have added a simple recovery analysis as detailed below but are also keen to ensure we fully answer your question—which has more to do with empirical rather than simulated data—and make clear the rationale for this analysis in this instance.

      We added this in Supplementary Materials:

      “Model validation and recovery analysis

      The following section provides a detailed technical assessment of the model inversion scheme, focusing on the discriminability of the model space and the identifiability of individual parameters.

      Recovery analyses of this sort are typically used prior to collecting data to allow one to determine whether, in principle, the data are useful in disambiguating between hypotheses. In this sense, they have a role analogous to a classical power calculation. However, their utility is limited when used post-hoc when data have already been collected, as the question of whether the models can be disambiguated becomes one of whether non-trivial Bayes factors can be identified from those data.

      The reason for including a recovery analysis here is not to identify whether the model inversion scheme identifies a ‘true’ model. The concept of ‘true generative models’ commits to a strong philosophical position which is at odds with the ‘all models are wrong, but some are useful’ perspective held by many in statistics, e.g., (So, 2017). Of note, one can always confound a model recovery scheme by generating the same data in a simple way, and in (one of an infinite number of) more complex ways. A good model inversion scheme will always recover the simple model and therefore would appear to select the ‘wrong’ model in a recovery analysis. However, it is still the best explanation for the data. For these reasons, we do not necessarily expect ‘good’ recoverability in all parameter ranges. This is further confounded by the relationship between the models we have proposed—e.g., an interference model with very low interference will look almost identical to a model with no interference. The important question here is whether they can be disambiguated with real data.

      Instead, the value of a post-hoc recovery analysis here is to evaluate whether there was a sensible choice of model space, i.e., that it was not a priori guaranteed that a single model (and, specifically, the model we found to be the best explanation for the data) would explain the data generated by all others. To address this, for each model, we simulated 16 datasets, each of which relied upon parameters sampled from the model priors and included examples of each of the experimental conditions. We then fitted each of the 7 models to each of these datasets to construct the confusion matrix shown in the lower panel of Supplementary Figure 3, accumulating evidence over the 16 simulated participants generated according to each ‘true’ model (columns) for each of the possible explanatory models (rows). This shows that, for the parameter ranges sampled here, no single model explains all other datasets. Interestingly, our ‘winning’ model in the empirical analysis is not the best explanation for any of the datasets simulated (including its own). This is reassuring, in that it implies this model winning was not a foregone conclusion and was driven by the data, not just by the choice of model space.”
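The construction of the confusion matrix described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. Log evidences are stored in a hypothetical array indexed by (true model, simulated dataset, fitted model). Evidence is accumulated by summing log evidences over the 16 simulated participants, treating datasets as independent. Each column is then normalised to posterior model probabilities under a flat prior over models.

```python
import numpy as np


def confusion_matrix(log_evidence):
    """Model-recovery confusion matrix from simulated datasets.

    log_evidence[t, d, m]: log evidence of fitted model m for simulated
    dataset d generated by true model t (hypothetical layout).
    Returns C[m, t]: posterior probability of fitted model m (rows) given all
    datasets generated by true model t (columns), assuming a flat model prior.
    """
    summed = np.asarray(log_evidence, dtype=float).sum(axis=1)  # (true, fitted)
    summed -= summed.max(axis=1, keepdims=True)                 # numerical stability
    post = np.exp(summed)
    post /= post.sum(axis=1, keepdims=True)                     # normalise over fitted models
    return post.T                                               # rows = fitted, columns = true


# Toy example: 7 models, 16 datasets each, every dataset slightly favours its own model
lev = np.zeros((7, 16, 7))
for t in range(7):
    lev[t, :, t] = 1.0
C = confusion_matrix(lev)
```

In this idealised toy case, the matrix is dominated by its diagonal. A real recovery analysis such as the one described above can instead show off-diagonal winners, for example when a simpler model explains data generated by a more complex one.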

      Your point about the posterior covariance is well founded. As we describe in Supplementary Materials, this is an inherent feature of inverse problems (analogous to EEG source localisation). However, the fact that our posterior densities move significantly away from the prior expectations demonstrates that the data are indeed informative. By adopting a Bayesian framework, we are able to explicitly quantify this uncertainty rather than ignoring it, providing a more transparent account of parameter identifiability. We have added the following in the same section of Supplementary Materials:

      “This is an inverse problem: inferring parameters from a non-linear model. We therefore expect a degree of posterior covariance between parameters and, consequently, that they cannot be disambiguated with complete certainty. While some posterior covariance is inherent to inverse problems, including established methods such as EEG source localisation, the fact that many of the parameters are estimated with posterior densities that do not include their prior expectations implies that the data are informative about them.

      The advantage of the Bayesian approach adopted here is that we can explicitly quantify the posterior covariance between these parameters, and therefore the degree to which they can be disambiguated. While the posterior covariance matrices from the empirical data are the relevant measure here, the model recovery analysis reported in Supplementary Figure 3 helps us understand how the model inversion scheme behaves for the specific models used.

      The middle panel of the figure is key, along with the correlation coefficients reported in the figure caption. Here, we see at least a weak positive correlation (in some cases much stronger) for almost all parameters, and limited movement from prior expectations for those parameters that are less convincingly recovered. This reinforces that the ability of the scheme to recover parameters is best assessed by the degree to which posterior estimates move away from prior values after fitting to empirical data.”
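      The two diagnostics discussed above can be made concrete with a short sketch. The Python below computes (i) the Pearson correlation between generating and recovered parameter values, the kind of coefficient reported in the figure caption, and (ii) the Kullback–Leibler divergence of a Gaussian posterior from its Gaussian prior as one possible index of how far the posterior has moved. The Gaussian assumption and the choice of KL divergence as the movement index are ours for illustration, not necessarily the measures used by the authors.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between simulated (true) and recovered parameter values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kl_gaussian(post_mean: float, post_sd: float, prior_mean: float, prior_sd: float) -> float:
    """KL(posterior || prior) for univariate Gaussians, in nats.

    Zero when the posterior equals the prior (the data were uninformative);
    grows as the posterior mean shifts or the posterior variance shrinks.
    """
    return (
        math.log(prior_sd / post_sd)
        + (post_sd**2 + (post_mean - prior_mean) ** 2) / (2 * prior_sd**2)
        - 0.5
    )
```

      A parameter with a low recovery correlation but a near-zero KL divergence on empirical data is simply one the data say little about, which is the pattern described above.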

      (5) The authors employ Bayes factors (BFs) to disambiguate models, but BFs would also strengthen the claims that location, but not colour, is impacted by saccades. Despite colour being a circular variable, colour error is analysed using ANOVA on linearised differences (radians). The authors should also arguably use circular statistics, such as the von Mises distribution, for the analysis of colour.

      Regarding the use of circular statistics, you are correct that raw circular error distributions are not suitable for ANOVA and are better handled with circular statistics. However, for the present dataset, we analysed the mean absolute angular error per condition (ranging from 0 to π radians), where each trial's error is the shortest distance on the colour wheel between the target and the response.

      This measure removes the 2π wrap-around boundary and can therefore be treated as linear. Because the observed errors were relatively small and did not cluster near the π boundary, even in the patient cohorts (Figure 5B), the wrap-around effect of circular space is negligible. Moreover, by analysing the mean error across trials for each condition, rather than trial-wise data, we invoke the Central Limit Theorem: with a sufficient number of trials per condition, the distribution of these means is approximately normal, satisfying the assumptions of ANOVA. For these reasons, we adopted simpler linear models, and we confirmed that the data did not violate the assumptions of linear statistics. In this low-noise regime, linear and circular models converge on the same conclusions. We have revised the Methods accordingly:

      “For colour memory, we calculated the absolute angular error, defined as the shortest distance on the colour wheel between the target and the reported colour (range 0 to π radians). For the primary statistical analyses, we utilised the mean absolute error per condition for each participant. By analysing these condition-wise means rather than trial-wise raw data, we invoke the Central Limit Theorem, which ensures that the sampling distribution of these means approximates normality. Because the absolute errors in this paradigm were relatively small and did not approach the π boundary (Figure 5B) even in the clinical cohorts, the data were treated as a continuous measure in our linear ANOVAs and regression models. Moreover, because location and colour recall involve different scales and units, all analyses were performed independently for each feature to avoid cross-dimensional magnitude comparisons.”
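      A minimal sketch of the error measure described above, assuming hue angles are stored in radians. The function names and the (condition, target, response) trial layout are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def abs_angular_error(target: float, response: float) -> float:
    """Shortest distance on the colour wheel between two angles in radians, in [0, pi]."""
    diff = (response - target) % (2 * math.pi)  # wrap the signed difference into [0, 2*pi)
    return min(diff, 2 * math.pi - diff)        # take the shorter way round the wheel


def mean_error_by_condition(trials: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Average the absolute angular error within each condition for one participant.

    trials: (condition, target_angle, response_angle) tuples, angles in radians.
    """
    errors = defaultdict(list)
    for condition, target, response in trials:
        errors[condition].append(abs_angular_error(target, response))
    return {c: sum(e) / len(e) for c, e in errors.items()}


means = mean_error_by_condition(
    [("no_saccade", 0.1, 6.2), ("no_saccade", 1.0, 1.3), ("two_saccades", 3.0, 2.0)]
)
```

      Because the error is folded onto [0, π] before averaging, a response just past the wrap-around point (e.g. target 0.1, response 6.2) counts as a small error rather than an error of nearly 2π.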

      We have also now integrated Bayesian repeated measures ANOVA throughout the manuscript. The Results section for the young healthy adults now reads (p. 4):

      “A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(3, 20) = 51.52, p < 0.001, partial η² = 0.72), with Bayesian analysis providing decisive evidence for the inclusion of the saccade factor (BF<sub>incl</sub> = 3.52 × 10^13, P(incl|data) = 1.00). In contrast, colour memory remained remarkably stable across all saccade conditions (F(3, 20) = 0.57, p = 0.64, partial η² = 0.03). This null effect was supported by Bayesian analysis, which provided moderate evidence in favour of the null hypothesis (BF<sub>01</sub> = 8.46, P(excl|data) = 0.89), indicating that the data were more than eight times more likely under the null model than under a model including saccade-related impairment.”

      For elderly healthy adults:

      “In contrast, colour memory remained unaffected by saccade demands (F(3, 20) = 0.57, p = 0.65, partial η² = 0.03), again supported by the Bayesian analysis: BF<sub>01</sub> = 8.68, P(excl|data) = 0.90.”

      For patient cohorts:

      “Bayesian repeated measures ANOVAs further supported this dissociation, providing moderate evidence for the null hypothesis in the AD group (BF<sub>01</sub> = 3.35, P(excl|data) = 0.77) and weak evidence in the PD group (BF<sub>01</sub> = 2.23, P(excl|data) = 0.69). This indicates that even in populations with established neurodegeneration, the detrimental impact of eye movements is specific to the spatial domain.”

      The corresponding description in Methods (Statistical Analysis) has also been updated.
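      For readers less familiar with these quantities, each reported Bayes factor and posterior probability are linked through the prior model odds: posterior odds equal the Bayes factor multiplied by the prior odds. The sketch below assumes equal prior probabilities for the two models being compared. That is our assumption, not a detail stated above, although it reproduces every pair quoted (e.g. BF<sub>01</sub> = 8.46 gives 0.89). The function name is hypothetical.

```python
def posterior_probability(bayes_factor: float, prior_probability: float = 0.5) -> float:
    """Posterior probability of the hypothesis in the numerator of the Bayes factor.

    E.g. BF01 -> P(H0 | data), reported above as P(excl|data);
         BF_incl -> P(incl|data).
    """
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


# BF01 values reported for colour memory (BF10 would simply be 1 / BF01).
for label, bf01 in [("young", 8.46), ("elderly", 8.68), ("AD", 3.35), ("PD", 2.23)]:
    print(label, round(posterior_probability(bf01), 2))
```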

      Minor:

      (1) The modelling is described as computational but is arguably better characterised as a heuristic generative model at Marr's algorithmic level. It does not derive from normative computational principles or describe an implementation in neural circuits.

      We appreciate your perspective on the classification of our model within Marr’s hierarchy. We agree that our framework is best characterised as an algorithmic-level generative model. Our objective was to identify the mechanistic principles governing transsaccadic updating rather than to provide a normative derivation or a specific circuit-level implementation.

      To ensure readers do not over-interpret the term ‘computational’, we have added a clarifying statement in the Discussion acknowledging the algorithmic nature of the model. Interestingly, we note that a model predicated on this form of spatial diffusion implies a neural field representation with a spatial connectivity kernel whose limit approximates the second derivative of a Dirac delta function. While a formal neural field implementation is beyond the scope of the present work, our algorithmic results provide the necessary constraints for such future biophysical models.

      p.20: “While we describe the present framework as 'computational', it is more precisely characterised as an algorithmic-level generative model within Marr’s hierarchy. Our focus was on defining the rules of spatial integration and the sources of eye-movement-induced noise, rather than deriving these processes from normative principles or defining their specific neural implementation.”
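      To make the neural field remark above concrete, here is a brief derivation under simplifying assumptions of our own: a one-dimensional, linear field with no decay term or firing-rate nonlinearity. Let u(x, t) be the remembered activity profile over spatial position x, w the connectivity kernel, and D a diffusion coefficient.

```latex
\frac{\partial u(x,t)}{\partial t} = \int w(x-y)\,u(y,t)\,dy,
\qquad w(z) = D\,\delta''(z)
\quad\Longrightarrow\quad
\frac{\partial u(x,t)}{\partial t}
  = D\,\frac{\partial^{2}}{\partial x^{2}} \int \delta(x-y)\,u(y,t)\,dy
  = D\,\frac{\partial^{2} u(x,t)}{\partial x^{2}}
```

      The right-hand side is the diffusion equation. A smooth kernel such as w_ε(z) = D g_ε''(z), with g_ε a Gaussian of width ε, approaches this limit as ε → 0, which is the sense in which the kernel's limit approximates the second derivative of a Dirac delta.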

      (2) I did not find a description of the recruitment and characterization of the AD and PD patients.

      Apologies for this omission. We have now included a detailed description of participant recruitment and clinical characterisation in the Methods section and also updated Table 2:

      “A total of 87 participants completed the study: 21 young healthy adults (YC), 21 older healthy adults (EC), 23 patients with Parkinson’s disease (PD), and 22 patients with Alzheimer’s disease (AD). Their demographic and clinical details are summarised in Table 2. Initially, 90 participants were recruited (22 YC, 21 EC, 25 PD, 22 AD); however, three individuals (1 YC and 2 PD) were excluded from all analyses due to technical issues during data acquisition.

      All participants were recruited locally in Oxford, UK. None were professional artists, had a history of psychiatric illness, or were taking psychoactive medications (excluding standard dopamine replacement therapy for PD patients). Young participants were recruited via the University of Oxford Department of Experimental Psychology recruitment system. Older healthy volunteers (all >50 years of age) were recruited from the Oxford Dementia and Ageing Research (OxDARE) database.

      Patients with PD were recruited from specialist clinics in Oxfordshire. All had a clinical diagnosis of idiopathic Parkinson’s disease and no history of other major neurological or psychiatric conditions. All patients were tested while on their regular medication regimen (‘ON’ state); however, their specific pharmacological profiles, including the types of medication (e.g., levodopa, dopamine agonists, or combinations) and dosages (e.g., levodopa equivalent doses), were not systematically recorded. Disease duration and PD severity were also not recorded for this study.

      Patients with AD were recruited from the Cognitive Disorders Clinic at the John Radcliffe Hospital, Oxford, UK. All AD participants presented with a progressive, multidomain, predominantly amnestic cognitive impairment. Clinical diagnoses were supported by structural MRI and FDG-PET imaging consistent with a clinical diagnosis of AD dementia (e.g., temporo-parietal atrophy and hypometabolism).[69] All neuroimaging was reviewed independently by two senior neurologists (S.T. and M.H.).

      Global cognitive function was assessed using the Addenbrooke’s Cognitive Examination-III (ACE-III).[70] All healthy participants scored above the standard cut-off of 88, with the exception of one elderly participant who scored 85. In the PD group, two participants scored below the cut-off (85 and 79). In the AD group, six participants scored above 88; these individuals were included based on robust clinical and radiological evidence of AD pathology rather than their ACE-III score alone.”

      (3) YA and OA participants appear to differ in gender distribution.

      We acknowledge the difference in gender distribution between the young (71.4% female) and older adult (57.1% female) cohorts. However, we do not anticipate that gender influences the fundamental computational mechanisms of retinotopic maintenance or transsaccadic remapping. These processes represent low-level visuospatial functions for which there is no established evidence of gender-specific differences in precision or coordinate transformation. We have ensured that the gender distribution for each cohort is clearly listed in the demographics table (Table 2) for full transparency.

      Thank you very much for your insightful feedback!

      Reviewer #3 (Public review):

      Thank you for the positive feedback regarding our inclusion of clinical groups and the identification of computational phenotypes that differentiate these cohorts.

      To address your concerns about the model, we have clarified our use of Bayesian Model Selection, which inherently penalises model complexity to ensure that our results are not driven solely by the number of parameters. We also provide further evidence regarding model generalisability to address the concern of overfitting.

      Regarding the link with the ROCF, we have revised the manuscript to better highlight the specific relationship between our transsaccadic parameters and the ROCF data and better motivate the inclusion of these results in the main text.

      Below is our response to your suggestions point-by-point:

      (1) The models tested differ in terms of the number of parameters. In general, a larger number of parameters leads to a better goodness of fit. It is not clear how the difference in the number of parameters between the models was taken into account. It is not clear whether the modelling results could be influenced by overfitting (it is not clear how well the model can generalize to new observations).

      To ensure our results were not driven by the number of parameters, we used random-effects Bayesian Model Selection (BMS) to adjudicate between our candidate models. Unlike comparisons of maximum likelihood fits, BMS relies on the marginal likelihood (model evidence), which inherently balances model fit against parsimony, a principle known as Bayesian Occam’s razor (Rasmussen and Ghahramani, 2000). In this framework, a model is preferred only if the improvement in fit justifies the additional parameter space; redundant parameters lower the model evidence by spreading prior probability mass over parameter values that the data do not support. Several treatments discuss how marginal likelihood approximations can provide a more robust guard against overfitting than standard metrics such as BIC or AIC (MacKay, 2003; Murray and Ghahramani, 2005; Penny, 2012).

      The fact that the "Dual (Saccade) + Interference" model (Model 7) emerged as the winner, with a Bayes factor of 6.11 against the next best alternative, indicates that its complexity was statistically justified by its superior account of the trial-by-trial data.

      Furthermore, to address the risk of overfitting, we tested the generalisability of these parameters by using them to predict performance on an independent clinical task. These parameters explained ~62% of the variance in ROCF copy scores, a very different, real-world task, supporting the view that they represent robust computational phenotypes rather than idiosyncratic fits to the initial dataset.
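      To illustrate the Bayesian Occam’s razor with a toy example (not one of the manuscript's models), consider data y_i ~ N(μ, σ²) with known σ. The simpler model fixes μ = 0; the more complex model adds μ with prior N(0, τ²), which can be integrated out analytically. The complex model always achieves an equal or higher maximised likelihood, yet its marginal likelihood is lower when the data do not need the extra parameter. The model forms, function names and the values σ = τ = 1 are illustrative assumptions.

```python
import math
from typing import Sequence


def log_evidence_fixed_mean(y: Sequence[float], sigma: float) -> float:
    """log p(y | M0) for y_i ~ N(0, sigma^2); M0 has no free parameters."""
    n = len(y)
    ss = sum(v * v for v in y)
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - ss / (2 * sigma**2)


def log_evidence_free_mean(y: Sequence[float], sigma: float, tau: float) -> float:
    """log p(y | M1) for y_i ~ N(mu, sigma^2), mu ~ N(0, tau^2), with mu integrated out."""
    n = len(y)
    s, ss = sum(y), sum(v * v for v in y)
    quad = (ss - tau**2 * s**2 / (sigma**2 + n * tau**2)) / sigma**2
    return (
        -0.5 * n * math.log(2 * math.pi * sigma**2)
        - 0.5 * math.log(1 + n * tau**2 / sigma**2)  # grows as the prior on mu widens
        - 0.5 * quad
    )


def max_log_likelihood_free_mean(y: Sequence[float], sigma: float) -> float:
    """Log likelihood of M1 at its maximum likelihood estimate (the sample mean)."""
    n = len(y)
    m = sum(y) / n
    resid = sum((v - m) ** 2 for v in y)
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - resid / (2 * sigma**2)


near_zero = [0.5, -0.3, 0.2, -0.1]  # data that do not need a mean parameter
shifted = [2.1, 1.9, 2.3, 1.7]      # data that clearly do
for y in (near_zero, shifted):
    log_bf10 = log_evidence_free_mean(y, 1.0, 1.0) - log_evidence_fixed_mean(y, 1.0)
    fit_gain = max_log_likelihood_free_mean(y, 1.0) - log_evidence_fixed_mean(y, 1.0)
    print(f"log BF10 = {log_bf10:+.3f}, max-likelihood gain = {fit_gain:+.3f}")
```

      For the near-zero data the extra parameter improves the maximised fit slightly but lowers the evidence; for the shifted data the improvement in fit outweighs the penalty.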

      In the Results (p10):

      “We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[25–27]”

      In the Discussion (p17):

      “Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[25–27,42] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      (2) Results specificity: it is not clear how specific the modelling results are with respect to constructional ability (measured via the Rey-Osterrieth Complex Figure test). As with any cognitive test, performance can also be influenced by general, non-specific abilities that contribute broadly to test success.

      We agree that constructional performance is influenced by both specific mechanistic constraints and general cognitive abilities. To isolate the unique contribution of transsaccadic updating, we therefore performed a partial correlation analysis across the entire sample. We examined the relationship between location error in the two-saccades condition (our primary behavioural measure of transsaccadic memory) and ROCF copy scores. Even after partialling out the effects of global cognitive status (ACE-III total score), age, and years of education, the correlation remained highly significant (rho = -0.39, p < 0.001).

      This suggests that our transsaccadic measure captures a specific computational phenotype, the precision of spatial updating during active visual sampling, rather than acting as a proxy for non-specific cognitive decline. This mechanistic link may explain why traditional working memory measures (e.g., digit span or Corsi blocks) frequently fail to predict drawing performance: unlike those tasks, figure copying requires thousands of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified by our modelling framework.

      We added the following text in the Discussion (p19):

      “We also found that the relationship between transsaccadic working memory and ROCF performance remains highly significant (rho = -0.39, p < 0.001), even after controlling for age, education, and global cognitive status (ACE-III total score). Consequently, transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.[57]”
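      One common way to compute a partial Spearman correlation like the one reported above is to rank-transform every variable, regress the ranked covariates (here, ACE-III, age and education) out of both ranked variables of interest by ordinary least squares, and correlate the residuals. The sketch below follows that recipe in plain Python; it is an illustration under that assumption rather than the authors' analysis code, the function names are hypothetical, and it omits the p-value.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def residualise(y: Sequence[float], covariates: Sequence[Sequence[float]]) -> List[float]:
    """Residuals of an OLS fit of y on an intercept plus covariates (normal equations)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in covariates] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [
        [sum(X[k][a] * X[k][b] for k in range(n)) for b in range(p)]
        + [sum(X[k][a] * y[k] for k in range(n))]
        for a in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[a][p] / A[a][a] for a in range(p)]
    return [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def partial_spearman(
    x: Sequence[float], y: Sequence[float], covariates: Sequence[Sequence[float]]
) -> float:
    """Spearman correlation of x and y after removing ranked covariates from both."""
    rc = [average_ranks(c) for c in covariates]
    return pearson_r(residualise(average_ranks(x), rc), residualise(average_ranks(y), rc))
```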

      Reviewer #3 (Recommendations for the authors):

      (1) The authors mention in the introduction the following: "One key hypothesis is that we use working memory across visual fixations to update perception dynamically", citing the following manuscript:

      Harrison, W. J., Stead, I., Wallis, T. S. A., Bex, P. J. & Mattingley, J. B. A computational account of transsaccadic attentional allocation based on visual gain fields. Proc. Natl. Acad. Sci. U.S.A. 121, e2316608121 (2024).

      However, the manuscript above does not refer explicitly to the involvement of working memory in the transsaccadic integration of object location in space. Rather, it builds on recent evidence showing how the true location of a visual object is represented in the activity of neurons in primary visual cortex (A. P. Morris, B. Krekelberg, A stable visual world in primate primary visual cortex. Curr. Biol. 29, 1471–1480.e6 (2019)). The model hypothesizes that the true locations of objects are readily available, and then allocates attention in real-world coordinates, allowing efficient coordination of attention and saccadic eye movements.

      Thank you for this clarification. As suggested, we now cite Morris & Krekelberg (2019) to acknowledge the evidence for stable object locations within the primary visual cortex.

      (2) The authors in the introduction and the title use the terms 'transsaccadic memory' and 'spatial working memory'. However, it is not clear whether these can be used interchangeably or reflect different constructs.

      Classical measures of visuo-spatial working memory are derived from the Corsi task (or similar), where the location of multiple objects is displayed and subsequently remembered. In such tasks, eye movements and saccades are not generally considered, only memory performance, representing the visuo-spatial span.

      Transsaccadic memory tasks instead explicitly measure memory for object locations or features across explicit eye movements, usually using a very limited number of objects (1 or 2, as is the case for the current manuscript).

      While the two constructs share some features, it is not clear whether they represent the same underlying ability, especially because in transsaccadic tasks participants are required to perform one or more saccades, making them a dual-task case.

      I think the relationship between 'transsaccadic memory' and 'spatial working memory' should be clarified in the manuscript.

      Thank you. We have added a clarification to the Methods (Measurement of saccade cost) stating that spatial working memory is the broad cognitive construct responsible for short-term maintenance, whereas transsaccadic memory is the specific, dynamic process of remapping representations to maintain stability across eye movements.

      In Methods (p.22):

      “Within this framework, it is important to distinguish between the broad construct of spatial working memory and the specific process of transsaccadic memory. While spatial working memory refers to the general ability to maintain spatial information over short intervals, transsaccadic memory describes the dynamic updating of these representations—termed remapping—to ensure stability across eye movements. Unlike classical 'static' measures of spatial working memory, such as the Corsi block task which focuses on memory span, transsaccadic memory tasks explicitly require the integration of stored visual information with motor signals from intervening saccades. Our paradigm treats transsaccadic updating as a core computational process within spatial working memory, where eye-centred representations are actively reconstructed based on noisy memories of the intervening saccade vectors.”

      (3) In Figure 1, the second row indicates the presentation of item 2. Indeed, in the condition 'saccade-after-item-1', the target in the second row of Figure 1 is displaced, as expected. This clarifies the direction and amplitude of the first saccade requested. However, from Figure 1, it is hard to understand the amplitude and direction of the second requested saccade. I think the figure should be updated, giving a full description of the direction and amplitude of the second saccade as well ('saccade-after-item-2' and 'two-saccades' conditions).

      We agree that making the figure legend more self-contained is beneficial for the reader. While the specific physical parameters and the trial sequence for each condition are detailed in the Results and Methods sections, we have now updated the legend for Figure 1 to explicitly define these details. Specifically, we have clarified that the colour wheel itself served as the target for the second instructed saccade (i.e., the movement from the second fixation cross to the colour wheel location). We have also included the quantitative constraint that all saccade vectors were at least 8.5 degrees of visual angle in amplitude. Given the limited space within a figure legend, we hope these concise additions provide the transparency requested without interrupting the conceptual flow of the diagram.

      Updated Figure 1 legend:

      “Participants were asked to fixate a white cross wherever it appeared. They had to remember the colour and location of a sequence of two briefly presented coloured squares (Item 1 and Item 2), each appearing within a white square frame. They then fixated a colour wheel wherever it appeared on the screen, which served as the target for the second instructed saccade (i.e., a movement from the second fixation cross to the colour wheel location). The colour wheel cued recall of a specific square (Item 1 or Item 2, labelled within the colour wheel). Participants selected the remembered colour on the colour wheel, which made a square of that colour appear on the screen, and then dragged this square to its remembered location. Saccadic demands were manipulated by varying the locations of the second frame and the colour wheel, resulting in four conditions that differ in their reliance on retinotopic versus transsaccadic memory: (1) No-Saccade condition, providing a baseline measure of within-fixation precision, as no eye movements were required; (2) Saccade After Item 1; (3) Saccade After Item 2; and (4) saccades after both items (Two Saccades condition). In all conditions requiring eye movements, saccade vectors had a minimum amplitude of 8.5° of visual angle. While the No-Saccade condition isolates retinotopic working memory, conditions (2) to (4) collectively quantify the impact of varying saccadic demands and timings on the maintenance of spatial information, thereby assessing the efficacy of the transsaccadic updating process.”

      (4) The authors write: "Eye tracking analysis confirmed high compliance: participants correctly maintained fixation or executed saccades as instructed on the vast majority of trials (83% ± 14%). Non-compliant trials were excluded from further analysis." 14% of excluded trials are a substantial fraction of trials, given the task requirements. Is this proportion of excluded trials different between experimental groups, and are experimental groups contributing equally to this proportion?

      We thank the reviewer for pointing this out, and we apologise for the confusion. The 83% figure was the mean across all four cohorts and all conditions; compliance was above 90% in the YC, EC and even the AD group, but dropped to around 60% in the PD group.

      We have now conducted a full analysis of compliant trial counts using a mixed ANOVA (4 saccade conditions × 4 cohorts). This analysis revealed a main effect of group (F(3, 80) = 8.06, p < 0.001), driven by lower compliance in the PD cohort (mean approx. 25.4 trials per condition) compared with the AD, EC, and YC cohorts (means ranging from 35.8 to 38.9 trials per condition). Crucially, however, the interaction between group and condition was not statistically significant (p = 0.151). This suggests that the relative impact of saccade demands on trial retention was consistent across all four groups.

      Because our primary behavioural measure, the saccade cost, is a within-subject comparison of impairment across conditions, these differences in absolute trial numbers are unlikely to introduce a systematic bias into our findings. Furthermore, even with the higher attrition in the PD group, we retained a sufficient number of high-quality trials (a minimum mean of ~23 trials in the most demanding condition) to support robust trial-by-trial parameter estimation and valid statistical inference. We have updated the Results and Methods to reflect these details.

      In Results (p4):

      “To mitigate potential confounds, we monitored eye position throughout the experiment. Eye-tracking analysis confirmed high compliance in healthy adults, who followed instructions on the vast majority of trials (Younger Adults: 97.2 ± 5.2 %; Older Adults: 91.3 ± 20.4 %). The mean difference between these groups was negligible, representing just 1.25 trials per condition, and was not statistically significant (t(80) = 0.16, p = 1.000; see more in Methods – Eyetracking data analysis). Non-compliant trials were excluded from all further analyses.”

      In Methods (p27):

      “Eye-tracking analysis confirmed high compliance overall, with participants correctly maintaining fixation or executing saccades on the vast majority of trials (83% across all participants). A mixed ANOVA revealed a main effect of group on trial retention (F(3, 80) = 8.06, p < 0.001, partial η² = 0.23), primarily due to lower compliance in the PD cohort (YC: 97±4%; EC: 91±10%; AD: 95±5%; PD: 63±38%). Importantly, there was no significant interaction between group and saccade condition (F(3.36, 80) = 1.78, p = 0.15, partial η² = 0.008), suggesting that trial attrition was not disproportionately affected by specific task demands in any group.

      We acknowledge that the reduced trial count in the PD group is a limitation for across-cohort comparison. However, the absolute number of compliant trials in the PD group (mean approx. 25 per condition) remained sufficient for robust trial-by-trial parameter estimation. Furthermore, the lack of a significant group-by-condition interaction suggests that the results reported for this cohort remain valid and that our primary finding of a selective spatial memory deficit is robust to these differences in data retention.”

      (5) Modelling

      (a) Degrees of freedom, cross-validation, number of parameters.

      I appreciate the effort in introducing and testing different models. The models increase in complexity and are based on different assumptions about the main drivers and mechanisms underlying the dependent variable. The models differ in the number of parameters. How are the differences in the number of parameters between models taken into account in the modelling analysis? Is there a cost associated with the extra parameters included in the more complex models?

      (b) Cross-validation and overfitting.

      Overfitting can occur when a model learns the training data but cannot generalize to novel datasets. Cross-validation is one approach that can be used to avoid overfitting. Was cross-validation (or other approaches) implemented in the fitting procedure against overfitting? Otherwise, the inference that can be derived from the modelled parameters can be limited.

      To address your concerns regarding model complexity and overfitting, we would like to clarify our use of Bayesian Model Selection (BMS). Rather than using cross-validation to assess generalisability, we used random-effects BMS based on the marginal likelihood (model evidence). This approach implements a Bayesian Occam’s razor by integrating out the parameters. The marginal likelihood provides a safeguard closely related to cross-validation: by the chain rule of probability, the log evidence equals the sum of the log predictive densities of each observation given the preceding ones, so it scores each observation using only the data that precede it, rather than scoring a maximum likelihood fit to all of the training data. Models are therefore penalised not just for the absolute number of parameters but for their overall functional flexibility, and a more complex model is preferred only if the improvement in fit is large enough to outweigh this inherent penalty. The emergence of Model 7 as the winner (Bayes factor = 6.11 against the next best alternative) indicates that its additional complexity is statistically justified.

      Furthermore, in this study we provided an external validation of these recovered parameters by demonstrating that they explain 62% of the variance in an independent, real-world clinical task (ROCF copy). This empirical evidence supports the view that our model captures robust mechanistic phenotypes rather than idiosyncratic noise. We have updated the Results and Discussion to state this explicitly.
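      The link between model evidence and out-of-sample prediction can be checked directly. By the chain rule of probability, log p(y_1, ..., y_n) equals the sum over i of log p(y_i | y_1, ..., y_{i-1}), so the evidence scores each observation using only the data that precede it. The sketch below verifies this identity on a toy conjugate Gaussian model (y_i ~ N(μ, σ²) with prior μ ~ N(0, τ²)); the model, function names and data are our own illustrative choices, not the models in the manuscript.

```python
import math
from typing import Sequence


def normal_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_evidence_direct(y: Sequence[float], sigma: float, tau: float) -> float:
    """Closed-form log p(y) for y_i ~ N(mu, sigma^2) with mu ~ N(0, tau^2)."""
    n = len(y)
    s, ss = sum(y), sum(v * v for v in y)
    quad = (ss - tau**2 * s**2 / (sigma**2 + n * tau**2)) / sigma**2
    return -0.5 * (
        n * math.log(2 * math.pi * sigma**2) + math.log(1 + n * tau**2 / sigma**2) + quad
    )


def log_evidence_sequential(y: Sequence[float], sigma: float, tau: float) -> float:
    """The same quantity as a sum of one-step-ahead log predictive densities."""
    post_mean, post_prec = 0.0, 1.0 / tau**2  # start from the prior on mu
    total = 0.0
    for x in y:
        # Score the next observation using only the data already seen...
        total += normal_logpdf(x, post_mean, sigma**2 + 1.0 / post_prec)
        # ...then update the posterior on mu with that observation.
        new_prec = post_prec + 1.0 / sigma**2
        post_mean = (post_prec * post_mean + x / sigma**2) / new_prec
        post_prec = new_prec
    return total


data = [0.4, 1.2, -0.3, 0.8, 0.5]
print(log_evidence_direct(data, 1.0, 2.0), log_evidence_sequential(data, 1.0, 2.0))
```

      Because no observation is ever scored with parameters fitted to that observation, an over-flexible model cannot inflate this sum simply by fitting noise, which is the sense in which evidence-based comparison resembles cross-validation.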

      In Results: (p10)

      “We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[26–28]”

      In Discussion: (p17)

      “Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[26–28,43] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      (6) n. of participants.

      (a) The authors write the following: "A total of healthy volunteers (21 young adults, mean age = 24.1 years; 21 older adults, mean age = 72.4 years) participated in this study. Their demographics are shown in Table 1. All participants were recruited locally in Oxford." However, Table 1 reports the data from more than 80 participants, divided into 4 groups. Details about the PD and AD groups are missing. Please clarify.

      We apologise for this lack of clarity in the text. We have rewritten and expanded the “Participants” section and corrected Table 2 in the Methods section to reflect the correct number of participants.

      In Methods (p20):

      “A total of 87 participants completed the study: 21 young healthy adults (YC), 21 older healthy adults (EC), 23 patients with Parkinson’s disease (PD), and 22 patients with Alzheimer’s disease (AD). Their demographic and clinical details are summarised in Table 2. Initially, 90 participants were recruited (22 YC, 21 EC, 25 PD, 22 AD); however, three individuals (1 YC and 2 PD) were excluded from all analyses due to technical issues during data acquisition.

      All participants were recruited locally in Oxford, UK. None were professional artists, had a history of psychiatric illness, or were taking psychoactive medications (excluding standard dopamine replacement therapy for PD patients). Young participants were recruited via the University of Oxford Department of Experimental Psychology recruitment system. Older healthy volunteers (all >50 years of age) were recruited from the Oxford Dementia and Ageing Research (OxDARE) database.

      Patients with PD were recruited from specialist clinics in Oxfordshire. All had a clinical diagnosis of idiopathic Parkinson’s disease and no history of other major neurological or psychiatric illnesses. All patients were tested while on their regular medication regimen (‘ON’ state); however, the specific pharmacological profiles, including the exact types of medication (e.g., levodopa, dopamine agonists, or combinations) and dosages (e.g., levodopa equivalent doses), were not systematically recorded. Disease duration and PD severity were also not recorded for this study.

      Patients with AD were recruited from the Cognitive Disorders Clinic at the John Radcliffe Hospital, Oxford, UK. All AD participants presented with a progressive, multidomain, predominantly amnestic cognitive impairment. Clinical diagnoses were supported by structural MRI and FDG-PET imaging consistent with a clinical diagnosis of AD dementia (e.g., temporo-parietal atrophy and hypometabolism).[70] All neuroimaging was reviewed independently by two senior neurologists (S.T. and M.H.).

      Global cognitive function was assessed using the Addenbrooke’s Cognitive Examination-III (ACE-III).[71] All healthy participants scored above the standard cut-off of 88, with the exception of one elderly participant who scored 85. In the PD group, two participants scored below the cut-off (85 and 79). In the AD group, six participants scored above 88; these individuals were included based on robust clinical and radiological evidence of AD pathology rather than their ACE-III score alone.”

      (b) As modelling results rely heavily on the quality of eye movements and eye traces, I believe it is necessary to report details about eye movement calibration quality and eye traces quality for the 4 experimental groups, as noisier data could be expected from naïve and possibly older participants, especially in case of clinical conditions. Potential differences in quality between groups should be discussed in light of the results obtained and whether these could contribute to the observed patterns.

      Thank you for pointing this out. We have revised the Methods to describe the calibration procedure:

      (p27) “Prior to the experiment, a standard nine-point calibration and validation procedure was performed. Participants were instructed to fixate a small black circle with a white centre (0.5 degrees) as it appeared sequentially at nine points forming a 3 x 3 grid across the screen. Calibration was accepted only if the mean validation error was below 0.5 degrees and the maximum error at any single point was below 1.0 degree. If these criteria were not met, or if the experimenter noticed significant gaze drift between blocks, the calibration procedure was repeated. This calibration ensured high spatial accuracy across the entire display area, facilitating the precise monitoring of fixations on item frames and saccadic movements to the response colour wheel.”
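The acceptance rule described above can be written as a small check. This is a minimal sketch that assumes the tracker's validation step has already reported one angular error (in degrees of visual angle) for each of the nine grid points. The function name `calibration_accepted` is hypothetical. The thresholds are the ones stated in the text.

```python
from statistics import mean

def calibration_accepted(errors_deg, mean_limit=0.5, max_limit=1.0):
    """Return True if a 3 x 3 validation passes: mean error below mean_limit
    and every single-point error below max_limit (both in degrees)."""
    if len(errors_deg) != 9:
        raise ValueError("expected one validation error per point of the 3 x 3 grid")
    return mean(errors_deg) < mean_limit and max(errors_deg) < max_limit

# A run with one point at 1.2 degrees fails and would be repeated.
print(calibration_accepted([0.3] * 8 + [1.2]))  # False
```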

      Moreover, as detailed in our response to Point 4, while the PD group exhibited lower compliance, there was no interaction between group and saccade condition for compliance (p = 0.151). This confirms that any noise or trial attrition was distributed evenly across experimental conditions. Consequently, the observed "saccade cost" (the difference in error between conditions) is not an artefact of unequal noise but represents a genuine mechanistic impairment in spatial updating. We have updated the Methods to clarify this distinction.

      Furthermore, our Bayesian framework explicitly estimates precision (random noise) as a distinct parameter from updating cost (saccade cost). This allows the model to partition the variance: even if a clinical group is "noisier" overall, this is captured by the precision parameter, ensuring it does not inflate the specific estimate of saccade-driven memory impairment.

      (7) Figure 5. I suggest reporting these results using boxplots instead of barplots, as the former gives a better overview of the distributions.

      We appreciate the suggestion to use boxplots to better illustrate data distributions. However, we have chosen to retain the current bar plot format because of the visual and statistical complexity of our 4 x 4 x 2 experimental design. Figure 5 represents 16 distinct distributions (four groups x four conditions) for each of the location and colour measures; displaying this density of data as boxplots or violin plots would substantially increase visual clutter and make the figure difficult to parse.

      Furthermore, the primary objective of this figure is to reflect the statistical analysis: to illustrate group differences in overall performance and to highlight the specific finding that patients with AD were significantly more impaired across all conditions than the YC, EC, and PD groups. Our statistical focus remains on the mean effects, specifically the significant main effect of group (F(3, 318) = 59.71, p < 0.001) and the critical null interaction between group and condition (p = 0.90). The error measure most relevant to these comparisons is the standard error of the mean (SEM), rather than the interquartile range (IQR). We think that bar plots provide the most straightforward and easily scanned representation of these mean differences, and of the consistent pattern of decay across cohorts, for the final manuscript layout.

      To address the reviewer’s request for distributional transparency, we have provided a version of Figure 5 using grouped boxplots in the supplementary material (Supplementary figure 2). We note, however, that the spread of raw data points in these plots does not directly reflect the variance associated with our within-subject statistical comparisons.

      (8) Results specificity, trans-saccadic integration and ROCF. The authors demonstrate that the derived model parameters account for a significant amount of variability in ROCF performance across the experimental groups tested (Figure 8A). However, it remains unclear how specific the modelling results are with respect to the ROCF.

      The ROCF is generally interpreted as a measure of constructional ability. Nevertheless, as with any cognitive test, performance can also be influenced by more general, non-specific abilities that contribute broadly to test success. To more clearly link the specificity between modelling results and constructional ability, it would be helpful to include a test measure for which the model parameters would not be expected to explain performance, for example, a verbal working memory task.

      I am not necessarily suggesting that new data should be collected. However, I believe that the issue of specificity should be acknowledged and discussed as a potential limitation in the current context.

      We appreciate this important point regarding the discriminant validity of our findings. We agree that cognitive performance in clinical populations is often influenced by a general "g-factor" or non-specific executive decline. However, we chose the ROCF Copy task specifically because it is a hallmark clinical measure of constructional ability that effectively serves as a real-world transsaccadic task, requiring participants to integrate spatial information across hundreds of saccades between the model figure and the drawing surface.

      To address the reviewer’s concern regarding specificity, we leveraged the fact that all participants completed the ACE-III, which includes a dedicated verbal memory component (the ACE Memory subscale). We conducted a partial correlation analysis and found that the relationship between transsaccadic working memory and ROCF copy performance remains highly significant (rho = -0.46, p < 0.001), even after controlling for age, education, and the ACE-III Memory subscale score. This suggests that the link between transsaccadic updating and constructional ability is mechanistically specific rather than a byproduct of global cognitive impairment. We have substantially revised the Discussion to highlight this link and the supporting statistical evidence.
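One common way to compute a partial Spearman correlation is to rank-transform every variable, regress the ranked outcome and predictor on the ranked covariates, and correlate the residuals. The minimal sketch below follows that approach. The response does not state which procedure the authors used, and the sketch computes no p-value. All function names are hypothetical, and nothing here reproduces the reported rho = -0.46.

```python
from statistics import mean

def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def residuals(y, covariates):
    """Residuals of an ordinary least-squares fit of y on an intercept plus the
    covariates, solving the normal equations by Gauss-Jordan elimination.
    Assumes the covariates are not collinear."""
    n = len(y)
    X = [[1.0] + [c[r] for c in covariates] for r in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][a] * X[r][b] for r in range(n)) for b in range(p)]
         + [sum(X[r][a] * y[r] for r in range(n))] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * z for x, z in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return [y[r] - sum(b * x for b, x in zip(beta, X[r])) for r in range(n)]

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def partial_spearman(x, y, covariates):
    """Spearman correlation of x and y after removing the ranked covariates
    (e.g. age, education, ACE-III Memory subscore)."""
    rc = [rank(c) for c in covariates]
    return pearson(residuals(rank(x), rc), residuals(rank(y), rc))
```

With an empty covariate list, the function reduces to an ordinary Spearman correlation. That case makes a simple sanity check before adding covariates.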

      We first updated the last paragraph of the Introduction:

      “Finally, by linking these mechanistic parameters to a standard clinical measure of constructional ability (the Rey-Osterrieth Complex Figure task), we demonstrate that transsaccadic updating represents a core computational phenotype underpinning real-world visuospatial construction in both health and neurodegeneration.”

      The new section in the Discussion highlighting the link with ROCF copy performance:

      “Importantly, our computational framework establishes a direct mechanistic link between transsaccadic updating and real-world constructional ability. Specifically, higher saccade and angular encoding errors contribute to poorer ROCF copy scores. By mapping these mechanistic estimates onto clinical scores, we found that the parameters derived from our winning model explain approximately 62% of the variance in constructional performance across groups. These findings suggest that the computational parameters identified in the LOCUS task represent core phenotypes of visuospatial ability, providing a mechanistic bridge between basic cognitive theory and clinical presentation.

      This relationship provides novel insights into the cognitive processes underlying drawing, specifically highlighting the role of transsaccadic working memory. Previous research has primarily focused on the roles of fine motor control and eye-hand coordination in this skill.[4,50–55] This is partly because of a consistent failure to find a strong relation between traditional memory measures and copying ability.[4,31] For instance, common measures of working memory, such as digit span and Corsi block tasks, do not directly predict ROCF copying performance.[31,56] Furthermore, in patients with constructional apraxia, performance on these memory measures often remains relatively preserved despite significant drawing impairments.[56–58] In the literature, this lack of association has often been attributed to “deictic” visual-sampling strategies, characterised by frequent eye movements that treat the environment as an external memory buffer, thereby minimising the need to maintain a detailed internal representation.[4,59] As a real-world copying task, however, the ROCF requires a high volume of saccades. Recent eye-tracking evidence confirms that patients with AD make significantly more saccades and longer fixations during figure copying than controls, potentially as a compensatory response to transsaccadic working memory constraints.[56] This high-frequency sampling, averaging between 150 and 260 saccades for AD patients compared with approximately 100 for healthy controls, renders the task highly dependent on the precision of the dynamic remapping signals identified here.[56] We also found that the relationship between transsaccadic working memory and ROCF performance remains highly significant (rho = -0.46, p < 0.001), even after controlling for age, education, and ACE-III Memory subscore. Consequently, transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.[58]

      In other words, even when visual information is readily available in the world, drawing performance depends critically on working memory across saccades. This reveals a fundamental computational trade-off: while active sampling strategies (characterised by frequent eye and hand movements) effectively reduce the load on capacity-limited working memory, they simultaneously increase the demand for precise spatial updating across eye movements. By treating the external world as an "outside" memory buffer, the brain minimises the volume of information it must hold internally, but it becomes entirely dependent on the reliability with which that information is remapped after each eye movement. This perspective aligns with, rather than contradicts, the traditional view of active sampling, which posits that individuals adapt their gaze and memory strategies based on specific task demands.[3,60] Furthermore, this perspective provides a mechanistic framework for understanding constructional apraxia; in these clinical populations, the impairment may not lie in a reduced memory "span," but rather in the cumulative noise introduced by the constant spatial remapping required during the copying process.[58,61]

      Beyond constructional ability, these findings suggest that the primary evolutionary utility of high-resolution spatial remapping lies in the service of action rather than perception. While spatial remapping is often invoked to explain perceptual stability,[11–13,15] the necessity of high-resolution transsaccadic memory for basic visual perception is debated.[13,62–64] A prevailing view suggests that detailed internal models are unnecessary for perception, given the continuous availability of visual information in the external world.[13,44] Our findings support an alternative perspective, aligning with the proposal that high-resolution transsaccadic memory primarily serves action rather than perception.[13] This is consistent with the need for precise localisation in eye-hand coordination tasks such as pointing or grasping.[65] Even when unaware of intrasaccadic target displacements, individuals rapidly adjust their reaching movements, suggesting that the motor system has direct access to remapping signals.[66] Further support comes from evidence that pointing to remembered locations is biased by changes in eye position,[67] and that remapping neurons reside within the dorsal “action” visual pathway rather than the ventral “perception” visual pathway.[13,68,69] By demonstrating a strong link between transsaccadic working memory and drawing (a complex fine motor skill), our findings suggest that precise visual working memory across eye movements plays an important role in complex fine motor control.”

      We are deeply grateful to the reviewers for their meticulous reading of our manuscript and for the constructive feedback provided throughout this process. Your insights have significantly enhanced the clarity and rigour of our work.

      In addition to the changes requested by the reviewers, we wish to acknowledge a reporting error identified during the revision process. In the original Results section, the repeated measures ANOVA statistics for YC included Greenhouse-Geisser corrections, and the between-subjects degrees of freedom were incorrectly reported as within-subjects residuals. Upon re-evaluation of the data, we confirmed that the assumption of sphericity was not violated; therefore, we have removed the unnecessary Greenhouse-Geisser corrections and corrected the degrees of freedom throughout the Results and Methods sections. We have ensured that these statistical updates are reflected accurately in the revised manuscript and that they do not alter the significance or interpretation of any of our primary findings.

      We hope that these revisions address all the concerns raised and provide a more robust account of our findings. We look forward to your further assessment of our work.

    1. 10.5. Design Analysis: Accessibility
      We want to provide you, the reader, a chance to explore accessibility more. In this activity you will be looking at a social media site on your device (e.g., your phone or computer). We will again follow the five-step CIDER method (Critique, Imagine, Design, Expand, Repeat). So open a social media site on your device (the website or app may have additional accessibility settings, but don’t use those for now; just consider how it works as it is currently). Then do the following (preferably on paper or in a blank computer document):

      10.5.1. Critique (3-5 minutes, by yourself):
      What assumptions do the site and your device make about individuals or groups using social media, which might not be true or might cause problems? List as many as you can think of (bullet points encouraged).

      10.5.2. Imagine (2-3 minutes, by yourself):
      Select one of the above assumptions that you think is important to address. Then write a 1-2 sentence scenario where a user faces difficulties because of the assumption you selected. This represents one way the design could exclude certain users.

      10.5.3. Design (3-5 minutes, by yourself):
      Brainstorm ways to change the site or your device to avoid the scenario you wrote above. List as many different kinds of potential solutions as you can think of; aim for ten or more (bullet points encouraged).

      10.5.4. Expand (5-10 minutes, with others):
      Combine your list of critiques with someone else’s (or, if possible, have a whole class combine theirs).

      10.5.5. Repeat the Imagine and Design Tasks:
      Select another assumption from the list above that you think is important to address. Make sure to choose a different assumption than you used before. Choose one that you didn’t come up with yourself, if possible. Repeat the Imagine and Design steps.

      10.5.6. Explore accessibility settings
      Now, try to find the accessibility settings on the social media site and on your device. For each setting you see, try to come up with what disabilities that setting would be beneficial for (there may be multiple).

      This activity is a really effective way to make accessibility feel concrete instead of abstract. By starting with critique and assumptions, it highlights how many “default” design choices silently exclude users before accessibility settings are even considered. I especially like how the Imagine and Design steps force you to think through a specific user’s experience and then brainstorm multiple solutions, rather than jumping straight to a single fix. Ending with exploring existing accessibility settings also reinforces that accessibility is often an afterthought in design, even though it should be part of the core system from the beginning.

    1. 10.2. Accessible Design
      There are several ways of managing disabilities. All of these ways of managing disabilities might be appropriate at different times for different situations.

      10.2.1. Coping Strategies
      Those with disabilities often find ways to cope with their disability, that is, find ways to work around difficulties they encounter and seek out places and strategies that work for them (whether realizing they have a disability or not). Additionally, people with disabilities might change their behavior (whether intentionally or not) to hide the fact that they have a disability. This is called masking, and it may take a mental or physical toll on the person masking that others around them won’t realize.

      For example, kids who are nearsighted and don’t realize their ability to see is different from other kids will often seek out seats at the front of classrooms where they can see better. As for us two authors, we both have ADHD and were drawn to PhD programs where our tendency to hyperfocus on following our curiosity was rewarded (though executive dysfunction with finishing projects created challenges).[1]

      This way of managing disabilities puts the burden fully on disabled people to manage their disability in a world that was not designed for them, trying to fit in with “normal” people.

      10.2.2. Modifying the Person
      Another way of managing disabilities is assistive technology, which is something that helps a disabled person act as though they were not disabled. In other words, it is something that helps a disabled person become more “normal” (according to whatever a society’s assumptions are). For example:
      - Glasses help people with near-sightedness see in the same way that people with “normal” vision do.
      - Walkers and wheelchairs can help some disabled people move around closer to the way “normal” people can (though stairs can still be a problem).
      - A spoon might automatically balance itself when held by someone whose hands shake.
      - Stimulants (e.g., caffeine, Adderall) can increase executive function in people with ADHD, so they can plan and complete tasks more like how neurotypical people do.

      Assistive technologies give tools to disabled people to help them become more “normal.” So the disabled person becomes able to move through a world that was not designed for them. But there is still an expectation that disabled people must become more “normal,” and often these assistive technologies are very expensive. Additionally, attempts to make disabled people (or people with other differences) act “normal” can be abusive, such as Applied Behavior Analysis (ABA) therapy for autistic people, or “Gay Conversion Therapy.”

      10.2.3. Making an environment work for all
      Another strategy for managing disability is to use Universal Design, which originated in architecture. In universal design, the goal is to make environments and buildings have options so that there is a way for everyone to use them.[2] For example, a building with stairs might also have ramps and elevators, so people with different mobility needs (e.g., people with wheelchairs, baby strollers, or luggage) can access each area. In the elevators the buttons might be at a height that both short and tall people can reach. The elevator buttons might have labels both drawn (for people who can see them) and in braille (for people who cannot), and the ground floor button may be marked with a star, so that even those who cannot read can at least choose the ground floor.

      In this way of managing disabilities, the burden is put on the designers to make sure the environment works for everyone, though disabled people might need to go out of their way to access features of the environment.

      10.2.4. Making a tool adapt to users
      When creating computer programs, programmers can do things that aren’t possible with architecture (where Universal Design originated): programs can change how they work for each individual user. All people (including disabled people) have different abilities, and making a system that can modify how it runs to match the abilities a user has is called Ability-based design. For example, a phone might detect that the user has gone from a dark to a light environment, and might automatically change the phone brightness or color scheme to be easier to read. Or a computer program might detect that a user’s hands tremble when they are trying to select something on the screen, and the computer might change the text size, or try to guess the intended selection. In this way of managing disabilities, the burden is put on the computer programmers and designers to detect and adapt to the disabled person.

      10.2.5. Are things getting better?
      We could look at inventions of new accessible technologies and think the world is getting better for disabled people. But in reality, it is much more complicated. Some new technologies make improvements for some people with some disabilities, but other new technologies are continually being made in ways that are not accessible. And, in general, cultures shift in many ways all the time, making things better or worse for different disabled people.

      [1] We’ve also noticed many YouTube video essayists have mentioned having ADHD. This is perhaps another job that attracts those who tend to hyperfocus on whatever topic grabbed their attention, and then, after releasing their video, move on to something completely different.
      [2] Universal Design has taken some criticism. Some have updated it, such as in acknowledging that different people’s needs may be contradictory, and others have replaced it with frameworks like Inclusive Design.

      This section does a great job comparing different ways of managing disability and, more importantly, showing how each approach places responsibility on different people. Coping strategies and modifying the person often shift the burden onto disabled individuals, asking them to adapt or appear “normal” in environments that were not designed for them. In contrast, universal design and ability-based design move that responsibility to designers and programmers, emphasizing systems that work for a wider range of users. I also appreciated the final point that accessibility is not a linear story of progress—new technologies can improve access for some people while creating new barriers for others, making accessibility an ongoing design challenge rather than a solved problem.

    1. Living an Examined Life
      The Book Brigade talks to Jungian analyst James Hollis, Ph.D. Posted February 15, 2018.

      Source: Used with permission of author James Hollis.

      What life demands of us changes somewhere along the way. The second half of the journey is when we truly become grown up, and must own up to responsibility for the way things are turning out.

      What led you to write your book on wisdom for the second half of life? Don’t people in the second half of life have enough wisdom to guide their lives?

      The first half of life is characterized by either serving or running from the instructions, examples, and admonitions we acquire from family and culture during the formative days of our operational systems. So many of the messages from our environment are internalized and become unconscious, reflexive compliances or rejections that most of us live provisional lives, lives in service to what shaped us during our provisional conclusions about self and world. We have much information, even knowledge, but little wisdom regarding the power of these influences. And what we don’t know will in fact show up in our lives and hit us in the face.

      What is the demarcation line for the second half of the journey? How does one know one is on that part of the journey?

      The “second half” of our journey is not a chronological moment but a psychological stage of awareness. Usually one does not begin to become conscious of the magnitude of these internalized messages until one is stunned into reflection upon them. For some this occurs during a divorce, an inexplicable loss of energy for one’s tasks, in an anxiety that arrives in “the hour of the wolf,” a depression, a loss of job, or children, or one’s role in life. If one is not enquiring, “Who am I apart from my history and roles,” good or bad as they may be, then such a person is much more likely to be living on automatic pilot, serving archaic stimulus/response demands.
      What is an examined life? What needs to be examined, and why?

      The examined life, as Socrates articulated millennia ago, entails looking into the root causes of my behaviors, and the patterns and consequences I am piling up. If I am not doing that, then I am most likely living very unconsciously and very reflexively. I might therefore be living someone else’s life, someone else’s set of priorities, or running from them. Either way, I am living inauthentically, and the psyche will respond by intensifying the pathology.

      What becomes different in the second half? How do you define “growing up”?

      In the “second half,” I become aware that I am the only one present in that long-running soap opera I call my life and thus I may bear some accountability for how it is turning out. As long as I persist in blaming others, I continue to remain dependent and avoidant and a reluctant player in the unfolding of my journey.

      From your own experience and that of your clients, what do you find it takes to feel “grown up”?

      As we all know, there are many people in big bodies and big roles in life who are still governed by their unaddressed infantile fears, compensations, and avoidances. Growing up means full accountability above all things: “I alone am accountable for my choices and how my life is unfolding.” I have to ask more rigorously: “Where is this choice coming from in me? What pattern do I see in my responses? Where is fear making choices for me?” Growing up means attaining personal authority over received authority, and having the courage to live it with consistency.

      On what matters do most adults get stuck, in your experience?

      I am fond of saying of psychological dilemmas, “it is not about what it is about.” Why do we get stuck? How can it be that we so easily identify such marshy zones in our lives? We typically fault ourselves for lacking sufficient will power to get unstuck. But if we have sufficient will, what is the problem?
      The idea that stuckness is really about something else suggests that we have to ask what deep, deep anxiety or threat will arise from our getting unstuck. If we are ever to get unstuck, we have to ferret out what archaic anxiety we will have to take on to move forward. For example, is the deeply buried anxiety the fear of being alone, forsaken by others, or is it the fear of some potential conflict with others? Either has the power to shut down intentionality and resolve.

      What does your Jungian background contribute to a perspective on aging?

      Many decades ago, Jung differentiated the two major stages of life, with many sub-passages within each. The first is about ego building. What do I need to learn, do, risk to step into the world: the world of relationship, the world of work, the world of adult responsibilities? But somewhere else we have another appointment with ourselves, in which we ask other questions: What is my life about, really? What do I need to do to live in good faith with my own soul? In the first half of life, we are ego-bound to ask, What does the world want of me, and how do I meet that demand? In the second half of life, we have a different question: What does the soul ask of me? (“Soul” is, of course, a metaphor for what is most truly us, as opposed to those thousand, thousand adaptations the world asks of us.)

      Drawing on Jung, you hold that we rarely solve problems but can outgrow them; how does one do that?

      It is naïve to think we leave our history, with its primal promptings, behind. They never go away, but where they once dominated ego-consciousness and directed our choices, they later become only noisome advisors. We have to decide who these archaic counselors are, and ask ourselves what our relationship to our own soul also asks of us. And out of that engagement ego-consciousness has to make its most courageous choice.

      What do you mean by choosing enlargement?
      In life’s many junctures of choice we all have to decide this simple, challenging question: Does this path make me larger, or smaller? We almost always know the answer quickly. Then the summons is to choose the larger, however intimidating it may be, or we live shallow, fugitive lives.

      If you had one piece of advice for older adults, what would it be?

      I would say to them, as I say to myself as an old person: Whatever wishes to grow within you (a curiosity, a talent, an interest) is life seeking its expression through you. Our old desire for comfort, even happiness, may prove an impediment. We are here a very short time. Let us make it as luminous and as meaningful as we can. Time to stop being afraid, and time to show up as yourself.

      And what would you want to tell younger people so that they might approach all of life in a more seamless way?

      I am asked all the time by well-meaning parents how they might spare their children their parents’ heartaches. They can’t. We all have to walk into the gigantic necessary mistakes of the first half of life, fall on our faces, and then get up and begin to take life on in the light of what we need to learn for ourselves. We all have to find an internal source of guidance that we can trust and that always knows what is right for us, and to live it in the world with as much courage and fidelity as one can. That is not something a young person is yet ready, or capable, of doing.

      About THE AUTHOR SPEAKS: Selected authors, in their own words, reveal the story behind the story. Authors are featured thanks to promotional placement by their publishing houses. To purchase this book, visit: Living an Examined Life
    1. Okay, now that we've seen how some of the modern cryptographic techniques work, let's see how they work together to make our internet secure. Securing the internet involves making the HTTPS protocol and the Secure Sockets Layer (SSL) protocol secure. You're all familiar with HTTPS. This is the protocol you use when, for example, you want to give Amazon your credit card number so that you can buy a book or a movie. The Secure Sockets Layer is a transport-level protocol that is used when the client and server want to communicate through encrypted messages. So, both of these need to be made secure. And what does that mean? It means two things: first, that the messages can be sent securely, meaning encrypted, and second, that the identity of the server can be trusted. When we think we're communicating with Amazon, we want to make sure we're communicating with Amazon and not some rogue site. All browsers and web servers come with a suite of both symmetric and asymmetric (public-key) ciphers. They also use what are known as digital certificates, provided by certificate authorities, that enable them to confirm the identity of servers (trusted sites such as Google, Amazon, etc.) and other computers on the internet. We're going to see how all this works together.
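To make the two built-in ingredients mentioned here concrete (a suite of ciphers and a set of trusted CA certificates), the following is a minimal sketch using Python's standard-library `ssl` module. It assumes Python 3.7 or later with a system OpenSSL; the helper names are hypothetical, and the cipher list and CA count it reports vary from machine to machine. On an ordinary computer, the operating system's trust store plays the part of a browser's built-in certificate list.

```python
import socket
import ssl


def default_client_context() -> ssl.SSLContext:
    """Client-side context with the platform's trusted CA certificates loaded."""
    # create_default_context() loads the system trust store and turns on
    # certificate verification and host-name checking.
    return ssl.create_default_context()


def describe_context(ctx: ssl.SSLContext) -> dict:
    """Summarize what the context enforces and which cipher suites it offers."""
    return {
        "requires_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
        # Names of the cipher suites this client is willing to negotiate.
        "cipher_names": [c["name"] for c in ctx.get_ciphers()],
        # Counts of loaded certificates; may be 0 when CAs load lazily from a directory.
        "ca_store": ctx.cert_store_stats(),
    }


def inspect_server(host: str, port: int = 443) -> tuple:
    """Open a real TLS connection (needs network access) and report what was agreed."""
    ctx = default_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # version(): protocol; cipher(): (name, protocol, secret bits);
            # getpeercert()["subject"]: identity in the verified certificate.
            return tls.version(), tls.cipher(), tls.getpeercert()["subject"]


if __name__ == "__main__":
    info = describe_context(default_client_context())
    print(info["requires_certificate"], info["checks_hostname"])
    print(len(info["cipher_names"]), "cipher suites offered")
```

Calling `inspect_server("www.amazon.com")` on a machine with network access would perform a real handshake and return the negotiated protocol version, cipher, and certificate subject. If the server's certificate does not chain to a trusted CA or does not match the host name, `wrap_socket` raises `ssl.SSLCertVerificationError` instead of returning.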

      Let's begin with the handshake that takes place whenever your browser requests a secure session with a server. So, this is your browser on the left, running on your laptop or desktop computer (or even a mobile device such as a phone or tablet, since those are smaller, more confined computers). It makes a secure request to a server using the HTTPS protocol. The first thing the server does is respond to the client by sending an X.509 certificate, a standard certificate containing its public key. The client takes this certificate and uses one of the certificate-authority certificates built into it to authenticate that the server really is who it says it is, that the server is Amazon. It also uses the certificate authority's information to confirm that the public key that was sent does belong to Amazon. In other words, the client can be assured that when it sends an encrypted message back to the server, it is sending it to Amazon, and only Amazon can read the message. Once the client has authenticated the server's identity and public key, it uses that public key to encrypt a randomly generated symmetric key. The client generates this key internally, encrypts it with the server's public key, and sends it back to the server. The server then uses its private key to decrypt the symmetric key. At this point, both the client and server share a symmetric key, and from then on they can communicate with encrypted messages using that shared key. All the rest of the traffic between them during this session is encrypted using that symmetric key. Now, why do they use both public and symmetric keys in this handshake? Well, they use the public key for exchanging the symmetric key, and they use the symmetric key for the actual encryption of the data that they're sending back and forth. The reason for this is simply that symmetric-key cryptography is much more efficient than public-key cryptography, so this saves time in the traffic that goes back and forth between the client and the server.
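The division of labour in this handshake (public-key cryptography only to move a fresh symmetric key, symmetric cryptography for all traffic after that) can be sketched in Python. This is a toy under stated assumptions, not how a TLS library works: textbook RSA with a small, fixed, hypothetical key pair and no padding stands in for the server's certificate key, and an XOR keystream built from SHA-256 stands in for a real symmetric cipher such as AES. It follows the RSA key-transport flow the lecture describes, as used in older SSL/TLS versions; TLS 1.3 instead agrees on the session key with ephemeral Diffie-Hellman. It needs Python 3.8+ for the modular inverse `pow(E, -1, m)`.

```python
import hashlib
import secrets

# Hypothetical toy server key pair: textbook RSA, no padding, INSECURE.
# Both factors are Mersenne primes, used only so no key generation is needed.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus (about 150 bits)
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def rsa_encrypt(m: int) -> int:
    """Client side: encrypt an integer m < N with the server's public key (N, E)."""
    return pow(m, E, N)


def rsa_decrypt(c: int) -> int:
    """Server side: recover the integer with the private exponent D."""
    return pow(c, D, N)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with SHA-256(key || block counter).

    Stand-in for a real cipher such as AES-GCM; the same call encrypts and
    decrypts. A key must not be reused for a second message with this scheme.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ k for b, k in zip(data[start:start + 32], pad))
    return bytes(out)


def handshake_and_send(message: bytes) -> bytes:
    """Run the toy handshake, send one message, and return what the server reads."""
    # 1. The client generates a random 128-bit session key.
    session_key = secrets.token_bytes(16)
    # 2. It wraps the key with the server's public key and sends the result.
    wrapped = rsa_encrypt(int.from_bytes(session_key, "big"))
    # 3. The server unwraps it with its private key; both sides now share the key.
    server_key = rsa_decrypt(wrapped).to_bytes(16, "big")
    # 4. Application data travels under the cheap symmetric cipher only.
    ciphertext = keystream_xor(session_key, message)
    return keystream_xor(server_key, ciphertext)


print(handshake_and_send(b"4111 1111 1111 1111"))  # b'4111 1111 1111 1111'
```

The costly exponentiation with the large private exponent happens once per session, while every byte of application data goes through the cheap hash-and-XOR step, which mirrors the efficiency argument in the lecture.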

      Now, what role do the certificate authorities play? Well, first of all, a certificate authority is an entity, like a corporation or a foundation, that issues digital certificates. These certify the ownership of public keys, so certificate authorities need to do whatever it takes, possibly including visiting the organizations that create these public keys, to determine that a public key really is what it says it is: for example, that it is the public key of Google or the public key of Amazon. The fact that these authorities are trusted third parties is what enables browsers and servers to trust them. They don't have any stake in the game other than authenticating that these public keys really do belong to whom they say they belong to. Commercial certificate authorities charge organizations for this service, and browsers and similar software automatically ship with a set of these certificate-authority certificates built in. For example, Mozilla maintains a list of at least 57 different trusted certificate authorities, with their corresponding certificates built right into its software.
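The core of what a certificate authority does can be sketched as a signature: the CA signs a binding between a name and a public key, and the client checks that signature with the CA public key it already carries. The Python below is a hypothetical, deliberately simplified illustration (textbook RSA with a small fixed key, a SHA-256 hash of a `subject|key` string, no padding). Real certificates are DER-encoded X.509 structures signed with schemes such as RSA PKCS#1 v1.5, RSA-PSS, or ECDSA. It needs Python 3.8+.

```python
import hashlib

# Hypothetical toy CA key pair: textbook RSA, no padding, INSECURE.
CA_P = 2**61 - 1
CA_Q = 2**89 - 1
CA_N = CA_P * CA_Q
CA_E = 65537                                   # CA public exponent (shipped in the client)
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))  # CA private exponent (kept by the CA)


def binding_digest(subject: str, public_key: int) -> int:
    """Hash the (subject, public key) pair the CA vouches for, reduced mod CA_N."""
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def ca_sign(subject: str, public_key: int) -> int:
    """CA side: after checking who owns the key, sign the binding."""
    return pow(binding_digest(subject, public_key), CA_D, CA_N)


def client_verifies(subject: str, public_key: int, signature: int) -> bool:
    """Client side: check the signature using only the CA's public values."""
    return pow(signature, CA_E, CA_N) == binding_digest(subject, public_key)


# Hypothetical server public key (an arbitrary integer for the demo).
server_key = 123456789
signature = ca_sign("www.amazon.com", server_key)
print(client_verifies("www.amazon.com", server_key, signature))  # True
print(client_verifies("www.amazon.com", 987654321, signature))   # False: key swapped
```

A rogue site could copy Amazon's name into its own certificate, but without the CA's private exponent it cannot produce a signature that verifies over its own public key. That is why trusting a handful of built-in CA keys lets a client trust every site those CAs have signed.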

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We are grateful for the reviewers' constructive comments and suggestions, which contributed to improving our manuscript. We are pleased to see that our work was described as an "interesting manuscript in which a lot of work has been undertaken". We are also encouraged by the fact that the experiments were considered "on the whole well done, carefully documented, and support most of the conclusions drawn," and that our findings were viewed as providing "mechanistic insight into how HNRNPK modulates prion propagation" and potentially offering "new mechanical insight of hnRNPK function and its interaction with TFAP2C."

      We conducted several new experiments and revised specific sections of the manuscript, as detailed below in the point-by-point response in this letter.

      Referee #1

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The paper by Sellitto describes studies to determine the mechanism by which hnRNPK modulates the propagation of prions. The authors use cell models lacking HNRNPK, which is lethal, in a CRISPR screen to identify genes that suppress lethality. Based on this screen in 2 different cell lines, a gene termed Tfap2C emerged as a candidate for interaction with HNRNPK. They show that Tfap2C counteracts the actions of HNRNPK with respect to prion propagation. Cells lacking HNRNPK show increased PrPSc levels. Overexpression of Tfap2C suppresses PrPSc levels. These effects on PrPSc are independent of PrPC levels. By RNAseq analysis, the authors home in on metabolic pathways regulated by HNRNPK and Tfap2C, then follow the data to autophagy regulation by mTor. Ultimately, the authors show that short-term treatments of these cell models with mTor inhibitors cause increased accumulation of PrPSc. The authors conclude that the loss of HNRNPK leads to reduced energy metabolism causing mTor inhibition, which reduces translation by dephosphorylation of S6.

      Major comments:

      1) Fig. 2H and I, Supp. Fig. 3L. The interaction between Tfap2C and HNRNPK is pretty weak. The interaction may not be consequential. The experiment seems to be well controlled, yielding limited interaction. The co-IP was done in PBS with no detergent. The authors indicate that the cells were mechanically disrupted. Since both of these are DNA-binding proteins, is it possible that the observed interaction is due to proximity on DNA linking the 2 proteins? Including a DNase treatment would clarify this.

      Response: We agree that the observed co-IP between Tfap2c and hnRNP K is weak (previously Fig. 2H-I and Supp. Fig. 3L, now moved to Supp. Fig. 4C-E), and we have now highlighted this in the relevant section of the manuscript.

      Importantly, the co-IP was performed using endogenous proteins without overexpression or tagging, which can sometimes artificially enhance protein-protein interactions. However, we acknowledge that the use of a detergent-free lysis buffer and mechanical disruption alone may have limited nuclear protein extraction and solubilization, potentially contributing to the low co-IP signal.

      To address the reviewer's concerns and clarify whether the observed interaction could be DNA-mediated, we repeated the co-IP experiments under low-detergent conditions and included benzonase nuclease treatment to digest nucleic acids (Fig. 2H-I). DNA digestion was confirmed by agarose gel electrophoresis (Supp. Fig. 4F-G). Additionally, we performed the reciprocal IPs using both hnRNP K and Tfap2c antibodies (Fig. 2H-I). Although the level of co-immunoprecipitation remains modest, these updated experiments continue to demonstrate a specific co-immunoprecipitation between Tfap2c and hnRNP K, independent of DNA bridging. These additional controls and experimental refinements strengthen the validity of our findings. These results are also attached here for your convenience.

      2) Supplemental Fig 5B - The western blot images for pAMPK don't really look like a 2 fold increase in phosphorylation in HNRNPK deletion.

      Response: We thank the reviewer for raising this point. We re-examined the original pAMPK western blot (previously Supp. Fig. 5B; now presented as Supp. Fig. 6B) and confirmed the reported results. We note that the overall loading is not perfectly uniform across lanes (as suggested by the actin signal), which may affect the visual impression of band intensity. However, the phosphorylation change reported in the manuscript is based on the pAMPK/total AMPK ratio, which accounts for differences in AMPK expression and accurately reflects relative phosphorylation levels. To further address this concern, we performed three additional independent experiments. These new data reproduce the increase in pAMPK/AMPK upon HNRNPK deletion and are now included in the revised Supplementary Fig. 6B, together with the updated quantification. The new blot and the quantification are also attached here for your convenience.
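The normalization argument can be made concrete with a little arithmetic. The sketch below assumes densitometry values for each lane (the numbers and the function name are hypothetical, not taken from the blot) and divides each lane's pAMPK/total AMPK ratio by the mean ratio of the control lanes. It shows how a twofold change in the ratio can coexist with a phospho band that looks only modestly brighter when total AMPK differs between lanes.

```python
def phospho_fold_change(phospho, total, control_lanes):
    """Return each lane's pAMPK/AMPK ratio divided by the mean control-lane ratio."""
    ratios = [p / t for p, t in zip(phospho, total)]
    control_mean = sum(ratios[i] for i in control_lanes) / len(control_lanes)
    return [r / control_mean for r in ratios]


# Hypothetical band intensities (arbitrary units): lane 0 = control, lane 1 = knockout.
phospho_signal = [50.0, 60.0]   # pAMPK bands: knockout band only 1.2x brighter
total_signal = [100.0, 60.0]    # total AMPK bands: lower in the knockout lane

print(phospho_fold_change(phospho_signal, total_signal, control_lanes=[0]))
# -> [1.0, 2.0]
```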

      3) Fig. 5A - I don't think it is proper to do statistics on an n of 2.

      Response: We believe the reviewer's comment refers to Fig. 5B, as Fig. 5A already has sufficient replication. We have now added two additional replicates, bringing the total to four. The updated statistical analysis corroborates our initial results. The new quantification is provided in the revised manuscript (Fig. 5B) along with the new blot (Supp. Fig. 6C). Both data are also attached here for your convenience.

      4) Fig 6D. The data look a bit more complicated than described in the text. At 7 days, compared to 2 days, it looks like there is a decrease in % cells positive for 6D11. Is there clearance of PrPSc or proliferation of un-infected cells?

      Response: We have now reworded our text in the results paragraph as follows:

      "These data show that TFAP2C overexpression and HNRNPK downregulation bidirectionally regulate prion levels in cell culture."

      We have now also included the following comments in the discussion section:

      "However, prion propagation relies on a combination of intracellular PrPSc seeding and amplification, as well as intercellular spread, which together contribute to the maintenance and expansion of infected cells within the cultured population. In this study, we were limited in our ability to dissect which specific steps of the prion life cycle are affected by TFAP2C. We also cannot fully exclude the possibility that TFAP2C overexpression influenced the relative proliferation of prion-infected versus uninfected cells in the PG127-infected HovL culture, thereby contributing to the observed reduction in the percentage of 6D11+ cells and overall 6D11+ fluorescence. However, we did not observe any signs of cell death, growth impairment, or increased proliferation under TFAP2C overexpression in PG127-infected HovL cells compared to NBH controls (data not shown). This suggests that a negative selective pressure on infected cells or a proliferative advantage of uninfected cells is unlikely in this context".

      5) The authors might consider a different order of presenting the data. Fig 6 could follow Fig. 2 before the mechanistic studies in Figs 3-5.

      Response: We believe that the current order of presenting the data is more appropriate. The first part of the manuscript focuses on the genetic and functional interactions between hnRNP K and its partners, particularly TFAP2C, which is a critical point for understanding the broader context before delving into the mechanistic studies involving prion-infected cells.

      6) The authors use SEM throughout the paper and while this is often used, there has been some interest in using StdDev to show the full scope of variability.

      Response: We chose to use SEM as it reflects the precision of the mean, which is central to our statistical comparisons. As the reviewer notes, this is a common and appropriate practice. To address variability, almost all graphs already include individual data points, which provide a direct visual representation of data spread. To further enhance clarity, we have now included StdDev in the Supplementary Source Data table of the revised manuscript.
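The distinction behind this exchange fits in one line of arithmetic: SEM = SD / √n, so the SEM describes how precisely the mean is estimated and shrinks as replicates are added, while the SD describes the spread of the individual replicates. A minimal Python sketch, using hypothetical replicate values:

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    sd = statistics.stdev(values)       # spread of individual replicates (n - 1 denominator)
    sem = sd / math.sqrt(len(values))   # precision of the mean; shrinks as n grows
    return sd, sem


# Hypothetical replicate values, for illustration only.
replicates = [1.0, 2.0, 3.0, 4.0]
sd, sem = sd_and_sem(replicates)
print(round(sd, 3), round(sem, 3))  # 1.291 0.645
```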

      Discussion:

      The discrepancy between short-term and long-term treatments with mTor inhibitors is only briefly mentioned with a bit of a hand-waving explanation. The authors may need a better explanation.

      Response: We have now integrated a more detailed explanation in the discussion section of the revised manuscript as follows:

      "Previous studies showed that mTORC1/2 inhibition and autophagy activation generally reduce, rather than increase, PrPSc aggregation (79, 80). The reason for this discrepancy remains unclear and may be multifactorial. First, most prior studies were based on long-term mTOR inhibition, whereas our work examined acute inhibition, mimicking the time frame of HNRNPK and TFAP2C manipulation. Acute inhibition may trigger transient metabolic or signaling shifts that differ from the adaptive changes associated with chronic mTOR inhibition, potentially overriding autophagy's effects on prion propagation. Additionally, while previous work was primarily conducted in murine in vivo models, our study focused on a human cell system propagating ovine prions. Differences in species background, model complexity (e.g., interactions between different cell types), and prion strain (certain strains exhibit distinct responses to autophagy and mTOR modulation; https://doi.org/10.1371/journal.pone.0137958) likely contributed to the observed differences."

      Minor comments:

      Page 12 - no mention of chloroquine in the text or related data.

      Page 12 - Supp. Fig. E - should be 5E

      Response: We thank the reviewer for pointing this out. We have now better highlighted the use of chloroquine in Fig. 5B (see reviewer #1 - Point 3 - Major comments) and in the text as follows:

      "Furthermore, in the presence of chloroquine, LC3-II levels rose almost proportionally across all conditions (Fig. 5B), suggesting that the effects of HNRNPK and TFAP2C on autophagy occur at the level of autophagosome formation, rather than autophagosome-lysosome fusion and degradation."

      We have corrected the reference to Supp. Fig. 5E.

      Reviewer #1 (Significance (Required)):

      The study provides mechanistic insight into how HNRNPK modulates prion propagation. The paper is limited to cell models, and the authors note that long term treatment with mTor inhibitors reduced PrPSc levels in an in vivo model.

      The primary audience will be other prion researchers. There may be some broader interest in the mTor pathway and the role of HNRNPK in other neurodegenerative diseases.

      Referee #2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript "Prion propagation is controlled by a hierarchical network involving the nuclear Tfap2c and hnRNP K factors and the cytosolic mTORC1 complex" by Sellitto et al. aims to examine how heterogeneous nuclear ribonucleoprotein K (hnRNPK) limits prion propagation. They perform a synthetic-viability CRISPR-ablation screen to identify epistatic interactors of HNRNPK. They found that deletion of Transcription factor AP-2γ (TFAP2C) suppressed the death of hnRNP K-depleted LN-229 and U-251 MG cells, whereas its overexpression hypersensitized them to hnRNP K loss. Moreover, HNRNPK ablation decreased cellular ATP, downregulated genes related to lipid and glucose metabolism, and enhanced autophagy. Simultaneous deletion of TFAP2C reversed these effects, restored transcription, and alleviated energy deficiency. They state that HNRNPK and TFAP2C are linked to mTOR signalling and observe that HNRNPK ablation inhibits mTORC1 activity through downregulation of mTOR and Rptor, while TFAP2C overexpression enhances mTORC1 downstream functions. In prion-infected cells, TFAP2C activation reduced prion levels and countered the increased prion propagation due to HNRNPK suppression. Pharmacological inhibition of mTOR also elevated prion levels and partially mimicked the effects of HNRNPK silencing. They state that their study identifies TFAP2C as a genetic interactor of HNRNPK, implicates both factors in mTOR metabolic regulation, and establishes a causative link between these activities and prion propagation.

      This is an interesting manuscript in which a lot of work has been undertaken. The experiments are on the whole well done, carefully documented and support most of the conclusions drawn. However, there are places where it was quite difficult to read as some of the important results are in the supplementary Figures and it was necessary to go back and forth between the Figs in the main body of the paper and the supplementary Figs. There are also Figures in the supplementary which should have been presented in the main body of the paper. These are indicated in our comments below.

      We have the following questions /points:

      Major comments:

      1) A plasmid harbouring four guide RNAs driven by four distinct constitutive promoters is used for targeting HNRNPK. Is there a reason for using 4 guides? Is it simply to obtain maximal editing? In their experience, is this required for all genes or specific to HNRNPK?

      Response: The use of four guide RNAs driven by distinct promoters was chosen to maximize editing efficiency for HNRNPK. As previously demonstrated by J. A. Yin et al. (Ref. 32), this system provides better efficiency for gene knockout (or activation). For HNRNPK, achieving full knockout was crucial for observing a complete lethal phenotype, which made the four-guide-RNA approach essential. However, other knockout systems, while potentially less efficient, have been shown to work well in other circumstances. We have now included this explanation in the revised manuscript as follows:

      "We employed a plasmid harboring quadruple non-overlapping single-guide RNAs (qgRNAs), driven by four distinct constitutive promoters, to target the human HNRNPK gene and maximize editing efficiency in polyclonal LN-229 and U-251 MG cells stably expressing Cas9 (32)."

      2) Is there a minimal amount of Cas9 required for editing?

      Response: We did not observe a general correlation between Cas9 levels and activity, although the C3 clone had both the highest Cas9 expression and the highest activity (Supp. Fig. 1A-B). We agree that comments about the amount of Cas9 expression may be misleading here. Thus, in the first results paragraph of the revised manuscript, we have now modified the text "we isolated by limiting dilutions LN-229 clones expressing high Cas9 levels" to "we isolated by limiting dilutions LN-229 single-cell clones expressing Cas9".

      3) It is stated that cell death is delayed in U251-MG cells compared to LN-229-C3 cells- why? Also, why use glioblastoma cells other than that they have high levels of HNRNPK? Would neuroblastoma cells be more appropriate if they are aiming to test for prion propagation?

      Response: As shown in Fig. 1A, U251-MG cells reached complete cell death at day 13, while LN-229 C3 reached it already at day 10. The percentage of viable U251-MG cells is higher (statistically significant) than LN-229 C3 cells at all time points before day 13, when both lines show complete death. The underlying reasons for this partial and relative resistance are probably multiple, but we clearly showed in Fig. 2 that TFAP2C differential expression is one modulator of cell sensitivity to HNRNPK ablation.

      We selected glioblastoma cells because their high expression of HNRNPK was essential for developing our synthetic lethality screen strategy, and we have now clarified it in the revised manuscript as follows:

      "As model systems, we chose the human glioblastoma-derived LN-229 and U-251 MG cell lines, which express high levels of HNRNPK (2, 3), a key factor for optimizing our synthetic lethality screen."

      While neuroblastoma cells might be more relevant in terms of prion neurotoxicity, glial cells, despite their resistance to prion toxicity, are fully capable of propagating prions. Prion propagation in glial cells has been shown to play crucial roles in mediating prion-dependent neuronal loss in a non-cell-autonomous manner (see 10.1111/bpa.13056). This makes glioblastoma cells a valuable model for studying prion propagation (which is the focus of our study), despite the lack of direct toxicity (which is not). We have now added this explanation to the revised manuscript as follows:

      "Therefore, we continued our experiments using LN-229 cells, which provide a relevant model for studying prions, as glial cells can propagate prions and contribute to prion-induced neuronal loss through non-cell-autonomous mechanisms."

      4) Human CRISPR Brunello pooled library: does the Brunello library use constructs which have four independent guide RNAs as used for the silencing of HNRNPK?

      Response: No, the Human CRISPR Brunello pooled library does not use constructs with four independent guide RNAs (qgRNAs). Instead, each gene is targeted by 4 different single-guide RNAs (sgRNAs), each expressed on a separate plasmid. We have now clarified this in the main text of the revised manuscript as follows:

      "To identify functionally relevant epistatic interactors of HNRNPK, we conducted a whole-genome ablation screen in LN-229 C3 cells using the Human CRISPR Brunello pooled library (33), which targets 19,114 genes with an average of four distinct sgRNAs per gene, each expressed by a separate plasmid (total = 76,441 sgRNA plasmids)."

      5) To rank the 763 enriched genes, they multiply the -log10FDR with their effect size - is this a standard step that is normally undertaken?

      Response: The approach of ranking hits using the product of effect size and statistical significance is a well-established method in CRISPR screening studies. This strategy has been explicitly used in high-impact work by Martin Kampmann and others (see https://doi.org/10.1371/journal.pgen.1009103 and https://doi.org/10.1016/j.neuron.2019.07.014 as references). We have now added both references to the revised manuscript.
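As a sketch of the ranking rule discussed in point 5 (score = -log10(FDR) × effect size), the Python below orders hits by that product. The gene names and numbers are hypothetical, and it assumes effect sizes are positive enrichment values. It illustrates how the product down-weights hits that are either highly significant but small, or large but poorly supported.

```python
import math


def rank_hits(hits):
    """Sort (gene, effect_size, fdr) tuples by -log10(fdr) * effect_size, highest first."""
    scored = [(gene, -math.log10(fdr) * effect) for gene, effect, fdr in hits]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical hits: (gene, effect size, FDR). Not data from the screen.
example_hits = [
    ("GENE_A", 2.0, 1e-3),   # moderate effect, strong support      -> score about 6
    ("GENE_B", 4.0, 1e-1),   # large effect, weak support           -> score about 4
    ("GENE_C", 1.5, 1e-6),   # smaller effect, very strong support  -> score about 9
]

for gene, score in rank_hits(example_hits):
    print(gene, round(score, 2))
```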

      6) The 32 genes selected- they were ablated individually using constructs with one guide RNA or four guide RNAs?

      Response: The 32 genes selected were ablated individually using constructs with quadruple-guide RNAs (qgRNAs), as this approach was intended to maximize editing efficiency for each gene. We have now clarified this in the main text of the revised manuscript as follows:

      "We ablated each gene individually using qgRNAs and then deleted HNRNPK."

      7) The identified targets were also tested in U251-MG cells and nine were confirmed but the percent viability was variable - is the variability simply a reflection of the different cell line?

      Response: The variability in percent viability observed in U251-MG cells likely reflects the inherent differences between cell lines, which can contribute to varying levels of susceptibility to gene ablation, even for the same targets. We have now highlighted these small differences in the main text of the revised manuscript as follows:

      "We confirmed a total of 9 hits (Fig. 1H), including the ELPs gene IKBKAP and the transcription factor TFAP2C, the two strongest hits identified in LN-229 C3 cells. However, in U251-Cas9 cells the rescue effect did not always fall within the exact range observed in LN-229 C3 cells, likely due to intrinsic differences between the two cell lines."

      8) The two strongest hits were IKBKAP and TFAP2C. As TFAP2C is a transcription factor, is it known to modulate expression of any of the genes that were identified to be perturbed in the screen? Moreover, it is stated that it regulates expression of several lncRNAs. Have the authors looked at expression of these lncRNAs? Is the expression affected? Can modulation of expression of these lncRNAs modulate the observed phenotypic effects and also some of the targets they have identified in the screen?

      Response: While TFAP2C is a transcription factor known to regulate the expression of several genes and lncRNAs, we did not identify any of its known target genes among the hits of our screen. However, our RNA-seq data and RT-qPCR (data not shown) indicate that the expression of the lncRNAs MALAT1 and NEAT1 (reported to interact with both HNRNPK and TFAP2C; refs 37, 41, 47) is strongly affected by HNRNPK ablation and to a lesser extent by TFAP2C deletion. However, the double deletion does not appear to change these lncRNA levels beyond what is observed with HNRNPK ablation alone. Therefore, we concluded that these changes do not play a primary role in the phenotypic effects observed in our study. Thus, although these observations are interesting, we believe that describing them goes beyond the scope and focus of this manuscript.

      9) As both HNRNPK and TFAP2C modulate glucose metabolism, the authors have chosen to explore the epistatic interaction. This is most reasonable.

      Response: We do not have further comments on this point.

      10) The orthogonal assay to confirm that deletion of TFAP2C suppresses cell death upon removing HNRNPK: was this done using a single guide RNA or multiple guides? Is there a level of suppression required to observe rescue? Interestingly, ablation of HNRNPK increases TFAP2C expression in LN-229 C3 cells, whereas in U251-Cas9 cells HNRNPK ablation has the opposite effect: both RNA and protein levels of TFAP2C are decreased. Is this the cause of the smaller protective effect of TFAP2C deletion in this cell line?

      Response: TFAP2C deletion was performed using quadruple-guide RNAs (qgRNAs). We have clarified this point in our response to reviewer #2's point 6 in "Major comments".

      We did not directly test the threshold of TFAP2C inhibition required to suppress HNRNPK ablation-induced cell death. We do not exclude that other effectors contribute to the smaller protective effect of TFAP2C deletion in U251-Cas9 cells; however, multiple lines of evidence from our study suggest that TFAP2C expression levels influence cellular sensitivity to HNRNPK loss:

      1) Both LN-229 C3 and U251-Cas9 cells are less sensitive to HNRNPK ablation upon TFAP2C deletion (Fig. 1G-H, Fig. 2A-B, Supp. Fig.3A-B).

      2) We observed a correlation between endogenous TFAP2C levels and HNRNPK ablation sensitivity. U251-Cas9 cells, where TFAP2C expression is reduced upon HNRNPK ablation (in contrast to LN-229 C3 cells, where HNRNPK ablation leads to an increase in TFAP2C expression) (Fig. 2C-F), are a) less sensitive to HNRNPK deletion than LN-229 C3 (Fig. 1A, 2A-B) and b) the protective effect of TFAP2C deletion is less pronounced than in LN-229 C3 (Fig. 1G-H, Fig. 2A-B, Supp. Fig.3A-B).

      3) TFAP2C overexpression experiments (Fig. 2G) establish a causal basis for this correlation: TFAP2C overexpression increased U251-Cas9 sensitivity to HNRNPK ablation.

      As clearly mentioned in the manuscript, we believe that, taken together, these findings strongly demonstrate a causal role for TFAP2C in modulating sensitivity to HNRNPK loss. Thus, despite the differences in the expression, the proposed viability interaction between TFAP2C and HNRNPK is conserved across cell lines.

      To further strengthen our conclusions, we have now added LN-229 C3 TFAP2C overexpression in Fig. 2G (also attached below for your convenience). As with U251-Cas9 cells, LN-229 C3 cells show increased sensitivity to HNRNPK ablation upon TFAP2C overexpression.

      11) Nuclear localisation studies indicate that the HNRNPK and TFAP2C proteins colocalise in the nucleus however the co-IP data is not convincing- although appropriate controls are present, the level of interaction is very low - the amount of HNRNPK pulled down by TFAP2C is really very low in the LN-229C3 cells and even lower in the U251-Cas9 cells. Have they undertaken the reciprocal co-IP expt?

      Response: We rephrased our text to better highlight this, as also mentioned in our response to reviewer #1 (Point 1 - Major comments). However, as also noted by the reviewer, the experiments included all the relevant controls. Thus, the results are solid and confirm a degree of co-immunoprecipitation (although weak). As detailed in our response to reviewer #1 (Point 1 - Major comments), to strengthen our conclusion, we have now repeated the experiment in low-detergent conditions and used benzonase nuclease for DNA digestion. We have also performed the reciprocal experiment as suggested by the reviewer, confirming the initial results. In our opinion, these additional experiments support the conclusion that Tfap2c and hnRNP K co-immunoprecipitate through a weak, but direct, interaction.

      12) They state that LN-229 C3 ∆TFAP2C and U251-Cas9 ∆TFAP2C were only mildly resistant to the apoptotic action of staurosporine (Fig. 3E and F). I accept they have undertaken the stats which support their statement that at high concentrations of staurosporine the LN-229 C3 ∆TFAP2C cells are less sensitive, but the decreased sensitivity of U251-Cas9 ∆TFAP2C cells is hard to believe. Has this been replicated? I agree that HNRNPK deletion causes apoptosis in both LN-229 C3 and U251-Cas9 cells and that this is blocked by Z-VAD-FMK; however, the block is not complete. The max viability for HNRNPK deletion in LN-229 C3 cells is about 40%, whereas for U251-Cas9 cells it is about 30%. Does this suggest that cells are being lost by another pathway? Have they tested concentrations higher than 10nM?

      Response: The experiments in Fig. 3E-F have been replicated four times, as stated in the figure legend. We agree that TFAP2C plays a limited role in response to staurosporine-induced apoptosis, particularly in U251-Cas9 cells. To ensure clarity, we have now modified our previous sentence as follows:

      "LN-229 C3ΔTFAP2C cells were only mildly resistant to the apoptotic action of staurosporine, and U251-Cas9ΔTFAP2C showed even lower and minimal recovery (Fig. 3E-F). These results indicate that TFAP2C plays a limited role in apoptosis regulation and suggest that its suppressive effect on HNRNPK essentiality is not mediated through direct modulation of apoptosis but rather through upstream processes that eventually converge on it."

      The incomplete blockade of apoptosis by Z-VAD-FMK suggests that HNRNPK ablation may activate alternative, non-caspase-mediated cell death pathways. Regarding this point, we decided not to test Z-VAD-FMK above 10 nM, as we noted that the rescue effect at the lowest concentration (2 nM) did not increase proportionally at higher concentrations, suggesting we had already reached saturation. We have now added and clarified these observations in the revised manuscript as follows:

      "Z-VAD-FMK decreased cell death consistently and significantly in LN-229 C3 and U251-Cas9 cells transduced with HNRNPK ablation qgRNAs (Fig. 3C‑D), confirming that HNRNPK deletion promotes cell apoptosis. However, we observed that viability recovery plateaued already at the lowest concentration (2 nM) without further increase at higher doses, suggesting a saturation effect. This indicates that while caspase inhibition alleviates part of the cell death, HNRNPK loss triggers additional mechanisms beyond apoptosis".

      Following the reviewer's suggestion, we have now also tested two higher concentrations of Z-VAD-FMK (20 and 50 nM) in LN-229 cells. At these concentrations, we observed a slight decrease in cell viability in the NT condition, with a rescue effect in HNRNPK-ablated cells comparable to that observed at 2-10 nM Z-VAD-FMK. For this reason, we did not include these data in the revised manuscript, but we have attached them here for transparency.

      13) The RNA-seq comparisons: the authors use log2 FC [...]

      Response: We used a log2 FC threshold of >0.5 and [...] 0.25) is commonly used in RNA-seq studies to capture biologically relevant shifts (e.g., https://doi.org/10.1371/journal.ppat.1012552; https://doi.org/10.1371/journal.ppat.1008653; https://doi.org/10.1016/j.neuron.2025.03.008; https://doi.org/10.15252/embj.2022112338). We complemented this analysis with Gene Set Enrichment Analysis (GSEA) to assess coordinated changes in biological and genetic pathways, ensuring that our conclusions rest neither on isolated, minor expression changes nor on arbitrary thresholds. Finally, to enhance the robustness of our results, we applied false discovery rate (FDR) correction, which is more stringent than a raw p-value cutoff. We hope this clarification strengthens the reviewer's confidence in the significance of the observed changes.
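      As a minimal illustration of how a fold-change threshold and FDR control can be combined, the following Python sketch applies the Benjamini-Hochberg procedure to raw p-values and keeps only genes that pass both filters. It is not the pipeline used in this study: the FDR cutoff of 0.05, the function names, and the example genes are assumptions chosen for illustration, and the |log2 FC| cutoff of 0.5 simply mirrors the threshold quoted in the response. Differential-expression packages such as DESeq2 already report BH-adjusted p-values, so in practice only the final filtering step would be needed.

```python
"""Sketch: Benjamini-Hochberg FDR combined with a log2 fold-change cutoff.

Assumptions (not taken from the manuscript): an FDR cutoff of 0.05 and
illustrative gene names/values; the 0.5 |log2 FC| cutoff mirrors the text.
"""
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = min(running_min, 1.0)
    return adjusted


def select_degs(
    genes: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 0.5,
    fdr_cutoff: float = 0.05,
) -> List[str]:
    """Keep genes passing both the fold-change and the FDR filter.

    `genes` maps gene name -> (log2 fold change, raw p-value).
    """
    names = list(genes)
    qvals = benjamini_hochberg([genes[n][1] for n in names])
    return [
        name
        for name, q in zip(names, qvals)
        if abs(genes[name][0]) > lfc_cutoff and q < fdr_cutoff
    ]
```

      For example, a gene with a log2 FC of 0.3 is excluded however small its adjusted p-value, which is the intended effect of requiring both filters.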

      14) It is stated: "Accordingly, we observed increased AMPK phosphorylation (pAMPK) upon ablation of HNRNPK, which was consistently reduced in LN-229 C3ΔTFAP2C cells (Supp. Fig. 5B). LN-229 C3ΔTFAP2C; ΔHNRNPK cells also showed a partial reduction of pAMPK relative to LN-229 C3ΔHNRNPK cells (Supp. Fig. 5B). These results suggest that hnRNP K depletion causes an energy shortfall, leading to cell death."

      I am not totally convinced by the data presented in this figure. The authors have quantified the band intensity and present the ratio of pAMPK to AMPK. Please note that the actin levels are variable across the samples: did they normalise the data to the actin level before undertaking the comparisons? Also, if the authors think this is an important point that supports their conclusion, then it should be in the main body of the paper rather than in the supplementary material. If AMPK is being phosphorylated, this should lead to activation of the metabolic checkpoint, which involves p53 activation by phosphorylation. Activated p53 would turn on p21CIP1, which is a very sensitive indicator of p53 activation.

      Response: We also refer the reviewer to our response to reviewer #1 (Point 2 - Major comments). We understand the reviewer's point, as pAMPK/Actin (absolute AMPK phosphorylation) may provide additional context regarding the downstream effects of AMPK activation; however, this is not the primary scope of our experiment. We believe that in our specific case a) the pAMPK/AMPK ratio is the most appropriate metric, as it reflects the energy status of the cell (ATP/AMP levels), which was the main point we aimed to assess in this experiment, and b) normalizing a phospho-protein to its total protein is the standard approach for quantifying phosphorylation. For completeness, we have now included pAMPK/Actin quantifications in Supp. Fig. 6B of the revised manuscript (also attached below). pAMPK/Actin levels follow the same trend as pAMPK/AMPK in HNRNPK and TFAP2C single ablations. In contrast, the partial rescue of the pAMPK/AMPK ratio in HNRNPK;TFAP2C double-ablated cells relative to HNRNPK single-ablated cells is not observed at the pAMPK/Actin level. We have now added the pAMPK/Actin quantification and this observation to the revised manuscript as follows:

      "Accordingly, we observed increased AMPK phosphorylation (pAMPK/AMPK ratio and pAMPK/Actin) upon ablation of HNRNPK, with a trend toward reduction in LN-229 C3ΔTFAP2C cells (Supp. Fig. 6B). LN-229 C3ΔTFAP2C;ΔHNRNPK cells also showed a reduction of pAMPK/AMPK ratio relative to LN-229 C3ΔHNRNPK cells, although absolute AMPK phosphorylation (pAMPK/Actin) remained high (Supp. Fig. 6B)."

      We prefer to keep the AMPK blots in Supplementary Fig. 6B, as we believe the main take-home message of the manuscript should remain centered on mTORC1 activity.
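      To make the difference between the two metrics concrete, the following Python sketch computes both a phospho/total ratio (pAMPK/AMPK) and a phospho/loading-control value (pAMPK/Actin) from densitometry readings and expresses each relative to a control lane. It is illustrative only: the lane labels and band intensities are arbitrary numbers, not data from Supp. Fig. 6B, and the case shown (more total protein in one lane) is just one arithmetic way in which the ratio can fall while the absolute phospho-signal stays high.

```python
"""Illustrative sketch: phospho/total vs phospho/loading-control normalization.

All band intensities are hypothetical; lane labels are placeholders,
not conditions or values from the manuscript.
"""
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lane:
    phospho: float  # densitometry of the phospho-band (e.g. pAMPK)
    total: float    # densitometry of the total protein (e.g. AMPK)
    loading: float  # densitometry of the loading control (e.g. actin)


def phospho_over_total(lane: Lane) -> float:
    """Phosphorylation ratio, e.g. pAMPK/AMPK."""
    return lane.phospho / lane.total


def phospho_over_loading(lane: Lane) -> float:
    """Loading-corrected phospho-signal, e.g. pAMPK/Actin."""
    return lane.phospho / lane.loading


def relative_to_control(
    lanes: Dict[str, Lane],
    metric: Callable[[Lane], float],
    control: str,
) -> Dict[str, float]:
    """Express `metric` for every lane as a fold change over the control lane."""
    reference = metric(lanes[control])
    return {name: metric(lane) / reference for name, lane in lanes.items()}


if __name__ == "__main__":
    # Hypothetical lanes: "double" has the same phospho-signal as "single"
    # but twice as much total protein.
    lanes = {
        "control": Lane(phospho=1.0, total=1.0, loading=1.0),
        "single": Lane(phospho=3.0, total=1.0, loading=1.0),
        "double": Lane(phospho=3.0, total=2.0, loading=1.0),
    }
    print(relative_to_control(lanes, phospho_over_total, "control"))
    # {'control': 1.0, 'single': 3.0, 'double': 1.5}
    print(relative_to_control(lanes, phospho_over_loading, "control"))
    # {'control': 1.0, 'single': 3.0, 'double': 3.0}
```

      With these toy values, the "double" lane shows a halved phospho/total ratio relative to "single" but an unchanged phospho/loading signal, which is qualitatively the kind of divergence described in the revised text.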

      15) We also do not understand why the mTOR blot in Suppl. Fig. 5E is not in the main body of the paper. It's clear that RNA and protein levels of mTOR were downregulated in LN-229 C3ΔHNRNPK cells and were partially rebalanced by ΔTFAP2C; however, the levels in ΔTFAP2C;ΔHNRNPK double-deleted cells are only slightly higher than in ΔHNRNPK cells and do not reach the level of NT or even ΔTFAP2C cells (Fig. 4C, Supp. Fig. 5E).

      Response: We have moved the mTOR blot to Fig. 5D of the revised manuscript. Regarding the modest rescue, this is in line with all our other observations, in which the effects of HNRNPK ablation are only partially, and never fully, rescued. As suggested by reviewer #3 (Figure 5 - Point 2), we have now added RT-qPCR data in Fig. 5C, which corroborate these results.

      16) The authors state: "Deletion of HNRNPK diminished the highly phosphorylated forms of 4EBP1, which instead were preserved in both LN-229 C3ΔTFAP2C and LN-229 C3ΔTFAP2C;ΔHNRNPK cells (Fig. 5C). Similarly, the S6 phosphorylation ratio was reduced in LN-229 C3ΔHNRNPK cells and was restored in the ΔTFAP2C;ΔHNRNPK double-ablated cells (Fig. 5C)."

      We are not convinced that p4EBP1 is preserved in the LN-229 C3ΔTFAP2C cells: there is a very faint band, at a lower level than the band in the LN-229 C3ΔHNRNPK cells. However, when both HNRNPK and TFAP2C were ablated, the p4EBP1 band is clear-cut. I agree with the quantitation showing that deletion of either HNRNPK or TFAP2C reduces the level of 4EBP1; the reduction is greater with TFAP2C, but when both are deleted together the levels of 4EBP1 are higher and p4EBP1 is clearly present. In quantifying the S6 and pS6 levels, did the authors consider the actin levels? They present a ratio of pS6 to S6. I may be lacking some understanding, but why is the pS6/S6 ratio being calculated? Is the level of pS6 not what is important? Phosphorylation of S6 should lead to its activation, and thus it is the actual level of pS6 that matters, not its ratio to the non-phosphorylated protein.

      Response: In Fig. 5C, the three-band pattern of 4EBP1 is clearly visible in the NT+NT or WT condition, with the top band representing the highest phosphorylation state. Upon HNRNPK deletion, this top band almost completely disappears, mimicking the effect of our starvation control (Starv.). This top band remains clearly visible in both TFAP2C-ablated and double-ablated cells, supporting our conclusion. In our original text, we referred to the "highly phosphorylated forms" of 4EBP1, which might have caused some confusion, suggesting we were evaluating the two top bands. We are specifically referring only to the very top band (high p4EBP1), which represents the most highly phosphorylated form of 4EBP1. This is the relevant phosphorylated form to focus on, as it is the only one that disappears in the starvation control (Starv.) or upon mTORC1/2 inhibition with Torin-1 (Fig. 7B).

      To better clarify these points, we have now more clearly indicated the "high p4EBP1" band with an asterisk in Fig. 5E, added quantification of high p4EBP1/4EBP1, and rephrased the text as follows:

      "Deletion of HNRNPK diminished the most highly phosphorylated form of 4EBP1 (high p4EBP1, marked with an asterisk), mimicking the effect observed in starved cells (Starv.). This high p4EBP1 band was preserved in both LN-229 C3ΔTFAP2C and LN-229 C3ΔTFAP2C;ΔHNRNPK cells (Fig. 5C)."

      Regarding pS6 quantification, we added pS6/Actin quantification in Supp. Fig. 6E and F of the revised manuscript, also attached here for your convenience.

      17) When determining ATP levels, do they control for cell number? HNRNPK depletion results in lower ATP levels, and co-deletion of TFAP2C rescues this. But could this be because there is less cell death, so that more cells are producing ATP? Have they controlled for the relative numbers of cells?

      Response: As described in the Materials and Methods, we normalized ATP levels to total protein content, a standard approach for this type of quantification (see DOI:10.1038/nature19312) that accounts for differences in cell number between conditions.
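      The logic of this normalization can be written out under a simple, assumed model in which both the ATP signal and the protein yield of a well scale linearly with the number of viable cells $N$, with $a$ and $p$ (hypothetical symbols) denoting ATP and protein per viable cell:

$$\frac{\mathrm{ATP}_{\text{well}}}{\text{protein}_{\text{well}}} = \frac{N\,a}{N\,p} = \frac{a}{p}$$

      Under this idealization the ratio does not depend on $N$, so a condition with fewer surviving cells is not penalized merely for having fewer cells; the sketch ignores protein contributed by dying cells or debris, which the simple model does not capture.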

      18) Regarding the construction of the HovL cell line that propagates ovine prions, very few details are provided on the susceptibility of the cell line to PG127 prions.

      Response: As with other prion-infected cell lines, HovL cells do not exhibit any specific growth defects, susceptibilities, or phenotypes beyond their ability to propagate prions. This is consistent with established observations in prion research, where immortalized cell lines (and in general in vitro cultures) normally do not show cytotoxicity upon prion infection and, therefore, are used as models for prion propagation rather than for prion toxicity (see https://doi.org/10.1111/jnc.14956 for reference).

      We have now expanded the relevant section, including technical and conceptual details, in the main text of the revised manuscript as follows:

      "As reported for other ovinized cell models (66), HovL cells were susceptible to infection by the PG127 strain of ovine prions and capable of sustaining chronic prion propagation, as shown by proteinase K (PK)-digested western blot and by detection of PrPSc using the anti-PrP antibody 6D11, which selectively stains prion-infected cells after fixation and guanidinium treatment (67) (Supp. Fig. 7C-E). Consistent with most prion-propagating cell lines (68), HovL cells did not exhibit specific growth defects, susceptibilities, or overt phenotypes beyond their ability to propagate prions."

      19) It is stated that HNRNPK depletion from HovL cells increases PrPSc as determined by 6D11 fluorescence, but elsewhere in the manuscript HNRNPK depletion results in cell death. How do these observations fit together?

      Response: As explicitly stated in the main text and shown in Figs. 6-7, HNRNPK is downregulated (via siRNAs) in the prion experiments rather than fully deleted (via CRISPR) as in the first part of the manuscript. As shown in Supp. Fig. 8B, this downregulation does not affect cell viability within the experimental time window. Therefore, the observed increase in PrPSc levels upon HNRNPK downregulation, as determined by western blot and 6D11 staining, is independent of any potential cell death effects. Moreover, the same siRNA downregulation approach was used by M. Avar et al. (Ref. 26) in comparable experiments, yielding similar outcomes.

      20) They show that mTOR inhibition mimics the effect of HNRNPK deletion, why didn't they overexpress mTOR and see if that rescues this? This would indicate a causal relationship.

      Response: We appreciate the reviewer's suggestion. We agree that the proposed rescue strategy would be the best approach to indicate a causal relationship. However, we linked the activity of the mTORC1 complex (and not only that of mTOR) to prion propagation. Overexpression of mTOR alone would not restore full mTORC1 function, as Rptor would still be downregulated in the context of HNRNPK siRNA silencing (Fig. 7A and Supp. Fig. 8E). Moreover, our RNA-seq data from HNRNPK ablation (Supp. Table 5) indicate the downregulation of other mTORC1 components, namely Pras40 (AKT1S1) and mLST8. Therefore, rescuing mTORC1 activity through an overexpression strategy would be very challenging. Given these complexities, to infer causality we used mTORC1 inhibition (via rapamycin and Torin1) to mimic the effects of HNRNPK downregulation in reducing mTORC1 activity (Fig. 7B).

      For clarification, we have now highlighted in Fig. 4C that HNRNPK ablation also downregulates AKT1S1 and mLST8, in addition to mTOR and Rptor (also attached below), and we have discussed this in the main text as well. We have also clarified throughout the revised manuscript (where we sometimes inadvertently referred to it as just mTOR inhibition) that the observed effects are due to mTORC1 inhibition, and not simply mTOR inhibition.

      21) Flow cytometric data (supplementary figure for Fig. 6D): when looking at fixed cells, the gating strategy includes a lot of debris. The gate needs to be moved and made more specific to ensure that the results are interpreted properly. The same applies to the singlet gating: it is not tight enough and includes doublets, which will skew the data. The data need to be regated.

      Response: We have reanalyzed the flow cytometry data in Fig. 6D with a more stringent gating approach to better exclude debris and ensure proper singlet selection. We confirm that there is no change in the final interpretation of the results after applying the updated gating strategy.

      Reviewer #2 (Significance (Required)):

      The manuscript "Prion propagation is controlled by a hierarchical network involving the nuclear Tfap2c and hnRNP K factors and the cytosolic mTORC1 complex" by Sellitto et al. aims to examine how heterogeneous nuclear ribonucleoprotein K (hnRNPK) limits prion propagation. They perform a synthetic-viability CRISPR-ablation screen to identify epistatic interactors of HNRNPK. They found that deletion of transcription factor AP-2γ (TFAP2C) suppressed the death of hnRNP K-depleted LN-229 and U-251 MG cells, whereas its overexpression hypersensitized them to hnRNP K loss. Moreover, HNRNPK ablation decreased cellular ATP, downregulated genes related to lipid and glucose metabolism, and enhanced autophagy. Simultaneous deletion of TFAP2C reversed these effects, restored transcription, and alleviated energy deficiency.

      Referee #3

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: Using a CRISPR-based high-throughput ablation assay, Sellitto et al. identified a list of genes that improve cell viability when deleted in hnRNP K knockout cells. Tfap2c, a transcription factor, was identified as a candidate with potential functional overlap with hnRNP K, such as the modulation of glucose metabolism. The deletion of Tfap2c in the hnRNP K-deletion background prevented the caspase-dependent apoptosis observed in hnRNP K single-deletion cells. Further analysis of bulk RNA-seq in hnRNP K/TFAP2C single- and double-deletion cells revealed an impairment in cellular ATP levels. Accordingly, activation of AMPK led to perturbed autophagy in hnRNP K-deleted cells. Moreover, the reduction and/or inactivation of the downstream mTOR protein resulted in reduced phosphorylation of S6. Conversely, the phosphorylation of S6 and 4EBP1 can be increased by TFAP2C overexpression. Finally, pharmacological inhibition of the mTOR pathway increased the PrPSc level. This is an interesting paper potentially providing new mechanistic insight into hnRNP K function and its interaction with TFAP2C. However, inconsistencies in TFAP2C expression across cell lines and conflicting mechanistic interpretations complicate the conclusions. Co-IP experiments suggested that hnRNP K and Tfap2c may interact, though further validation is needed. Several figures require additional clarification, statistical analysis, or experimental validation to strengthen the conclusions.

      Major comments:

      1) The different responses of the TFAP2C expression level to deletion of hnRNPK in the two cell lines (LN-229 C3 and U251-Cas9) should be addressed more adequately. The manuscript focuses on the interaction between hnRNPK and TFAP2C, yet hnRNPK deletion causes different changes in TFAP2C level in the two lines. Furthermore, in the studies investigating the mechanistic link between hnRNPK and TFAP2C, only results from the LN-229 line are presented (Figures 4-7). Thus, it is not clear whether these mechanisms also apply to the other line, U251-Cas9, where hnRNPK deletion has the opposite effect on the TFAP2C level. Key experiments should therefore be performed in both lines.

      Response: The opposite effects of hnRNPK ablation on TFAP2C expression between LN-229 C3 and U251-Cas9 cells likely reflect intrinsic differences between the two cell lines. However, the viability interaction between hnRNPK and TFAP2C is conserved in both cell models (Fig. 1G-H, 2A-B, Supp. Fig. 3A-B), suggesting that shared molecular functions at the interface of this interaction exist across the lines. In fact, we believe that the opposite effect of hnRNPK ablation on TFAP2C expression in the two lines strengthens (rather than weakens) our model by highlighting how TFAP2C expression modulates cellular sensitivity to HNRNPK ablation, as detailed in our response to Reviewer #2 (Point 10 - Major comments).

      Regarding the mechanistic studies presented in Figs. 4-7, our initial goal in using two cell lines was to validate the functional viability interaction between HNRNPK and TFAP2C identified in our screen (performed in LN-229 C3 cells). After confirming this interaction, we chose to focus only on LN-229 C3 cells (beginning with the RNA-seq analysis, which then led to the subsequent mechanistic studies), as this provided the necessary foundation to investigate prion propagation in HovL cells (derived from LN-229). As no U251 model propagating prions exists, we are technically limited to performing prion experiments in HovL cells, and we do not believe that conducting additional experiments in U251 cells would add substantial value to our work or further our investigation.

      We hope this explanation clarifies our rationale and addresses the reviewer's concerns.

      2) Although a lot of data are presented, it is not clear how deletion of TFAP2C reverses the toxicity caused by deletion of hnRNPK. Specifically, the first half of the paper seems to suggest the opposite mechanism to the second half. In Figures 2-4, the authors suggest a model in which TFAP2C deletion has the opposite effect of hnRNPK deletion, thus rescuing toxicity. However, in Figures 5-6, it is suggested that TFAP2C overexpression has the opposite effect of hnRNPK deletion. These two opposite effects of TFAP2C make it difficult to understand the model that the authors are proposing. Please also see comment 2 for Figure 5 below.

      Response: We respectfully disagree with the notion that the first and second halves of the manuscript propose contradictory mechanisms.

      In Figs. 2-4, we describe the phenotypic rescue of cell viability upon TFAP2C deletion in hnRNPK-deficient cells. At this stage, we are not proposing a specific molecular mechanism but simply observing a rescue of viability and highlighting underlying transcriptional differences. There is no implication of an opposite molecular mechanism involving the individual activities of hnRNPK and TFAP2C; rather, we focused on the broader effect of TFAP2C deletion on the viability of HNRNPK-lacking cells. In Fig. 5, we isolated a partial mechanism underlying this interaction. There we state: "These data specify a role for TFAP2C in promoting mTORC1-mediated cell anabolism and suggest that its overexpression might hypersensitize cells to HNRNPK ablation by depleting the already limited ATP available, thus making its deletion advantageous". In the Discussion, we have now further refined our explanation: "HNRNPK deletion might cause a metabolic impairment leading to a nutritional crisis and a catabolic shift, whereas TFAP2C activation could promote mTORC1 anabolic functions. Thus, Tfap2c removal may rewire the bioenergetic needs of cells by modulating the mTORC1 signaling and augmenting their resilience to metabolic stress like the one induced by HNRNPK ablation". Therefore, we propose that TFAP2C expression might be particularly detrimental in hnRNPK-deficient cells, as it could push the cell into an anabolic biosynthetic state, further depleting energy stores that the cell is attempting to conserve in response to hnRNPK depletion. Removal of TFAP2C alleviates this metabolic strain. In our view, there is no contradiction between our observations.

      We hope this explanation clarifies our rationale and resolves any perceived inconsistency in our model. To further enhance the understanding of our interpretations, we have now also added (in substitution of Fig. 5E of the original manuscript) a graphical scheme (Fig. 5G of the revised manuscript) to visually explain and illustrate our model (attached below for your convenience).

      3) Similar to the point above, the first half of the paper focuses on hnRNPK deletion-induced toxicity (Fig. 1-5), while the second half focuses on the hnRNPK deletion-induced PrPSc level (Fig. 6-7). The mechanistic link between these two downstream effects of hnRNPK deletion is not clear, and thus it is difficult to understand why hnRNPK deletion-induced toxicity can be rescued by TFAP2C deletion, while the hnRNPK deletion-induced increase in PrPSc level can be rescued by TFAP2C overexpression.

      Response: Our study is not aimed at comparing viability and prion propagation as interconnected phenotypes, but rather at identifying molecular processes regulated by the HNRNPK-TFAP2C interaction. Our study identifies mTORC1 activity as a molecular process at the interface of the HNRNPK-TFAP2C interaction. HNRNPK knockout (or knockdown, which does not affect viability and is therefore used in the prion section of the manuscript) dampens mTORC1 activity, while TFAP2C overexpression enhances it. This finding suggested an explanation for the viability interaction we observed (see reply to reviewer #3 - Point 2 - Major comments), and it provided a partial mechanism (mTORC1 activity) to explain the effect of HNRNPK knockdown and TFAP2C overexpression on prions.

      We hope this clarification addresses the reviewer's concern.

      Abstract:

      1) Please rephrase and clarify "We linked HNRNPK and TFAP2C interaction to mTOR signaling..." by distinguishing functional, genetic, and direct (molecule-to-molecule) interactions.

      Response: 1) We have now clarified it in the text of the revised manuscript as follows:

      "We linked HNRNPK and TFAP2C functional and genetic interaction to mTOR signaling, observing that HNRNPK ablation inhibited mTORC1 activity through downregulation of mTOR and Rptor, while TFAP2C overexpression enhanced mTORC1 downstream functions."

      2) A sentence reads, "...HNRNPK ablation inhibited mTORC1 activity through downregulation of mTOR and Rptor," although the downregulation of Rptor is observed only at the RNA level. The change in Rptor protein expression level is not reported in the manuscript. Please consider adding an experiment to address this or rephrase the sentence.

      Response: 2) We have now added the experiment in Supp. Fig. 9A of the revised manuscript. The blot shows that hnRNP K depletion reduces both mTOR and Rptor protein levels, and the sentence now reads: "hnRNP K depletion inhibited mTORC1 activity through downregulation of mTOR and Rptor."

      Figure 2:

      1. H and I. Co-IP experiments were done using an anti-TFAP2C antibody bound to the beads. Although the TFAP2C bands show robust signals on the blots, indicating successful enrichment of the protein, the hnRNP K bands are very faint. Has the experiment been done by conjugating the hnRNP K antibody to the beads instead? Was the input lysate enriched in the nuclear fraction? Did the lysis buffer include nuclease (if so, please indicate this in the figure legend and the Methods section)? Addressing these points would strengthen the argument, "We also observed specific co-immunoprecipitation of hnRNP K and Tfap2c in LN-229 C3 and U251-Cas9 cells (Fig. 2H-I, Supp. Fig. 3L), suggesting that the two proteins form a complex inside the nucleus", by providing information on potential direct binding.

      Response: 1. We refer the reviewer to our response to reviewers #1 and #2 regarding the weak interaction, the nuclease treatment, and the HNRNPK IP (reviewer #1 Point 1 and reviewer #2 Point 11 - Major comments). As for the co-IP input, it was not enriched in the nuclear fraction, but as shown in Supp. Fig. 4A-B hnRNPK and Tfap2c are exclusively nuclear.

      Figure 3:

      1. C and D. Please add a sentence to the figure legend explaining which means were compared in the multiple comparisons (DMSO vs. each drug concentration?). Graphing individual data points instead of bars would also be helpful and more informative. Please discuss the lack of dose dependency.

      Response: 1. We have now added information about the comparison to the figure legend ("Multiple comparison was made between Z-VAD-FMK and DMSO treatments in ΔHNRNPK cells."), modified the graph to show the individual data points (attached below for your convenience), and expanded the discussion as detailed for reviewer #2 (Point 14 - Major comments). For completeness, we have also modified Supp. Fig. 5F to show individual data points and combined the graphs, as the DMSO control was shared across treatments.

      Supplemental Figure 4 (now Supplemental Figure 5):

      1. A. Although the trend can be observed, the deletion of hnRNP K does not significantly reduce the GPX4 protein level in LN-229 C3. Therefore, the following statement requires more data points and additional statistical analysis to be accurate: "In LN-229 C3 and U251-Cas9 cells, the deletion of HNRNPK reduced the protein level of GPX4, whereas TFAP2C deletion increased it (Supp. Fig. 4A-B)."

      2. A and B. The results are confusing, considering the previous report cited (ref 49) shows an increase in GPX4 with TFAP2C. It may be possible that the deletion of TFAP2C upregulates the expression of proteins with similar functions (e.g., Sp1). If this is the case, the changes in GPX4 expression observed here are a consequence of TFAP2C deletion and may not "suggest a role for HNRNPK and TFAP2C in balancing the protein levels of GPX4."

      Response: 1. We agree with the reviewer that in LN-229 C3 cells the reduction of GPX4 protein levels upon HNRNPK deletion did not reach statistical significance in our initial Western blot analysis. To address this concern, we performed six additional independent experiments and repeated the statistical analysis. Although the trend toward reduced GPX4 protein levels remained consistent, statistical significance was still not achieved (p > 0.05). Importantly, this trend is supported by our RNA-seq dataset (Supplementary Table 5), which shows decreased GPX4 expression upon HNRNPK deletion. We have now revised the text to more accurately reflect the experimental observations and to avoid overstating the effect in LN-229 C3 cells as follows:

      "In LN-229 C3 and U251-Cas9 cells, deletion of HNRNPK was associated with reduced glutathione peroxidase 4 (GPX4) protein abundance (although not statistically significant in LN-229 C3; p ≈ 0.08), whereas deletion of TFAP2C increased it (Supp. Fig. 5A-B)."

      The six new experimental replicates have been added to the uncropped western blot section.

      Response: 2. Concerning the potential role of TFAP2C deletion in upregulating proteins with similar functions, we recognize the reviewer's perspective. However, our primary focus is on the observed trends rather than on a definitive mechanistic conclusion. We have clarified our wording to acknowledge this possibility while maintaining the relevance of our findings within the broader context of hnRNPK and TFAP2C interactions.

      "This last result was interesting as a previous study reported that Tfap2c enhances GPX4 expression (51). Thus, the observed increase upon TFAP2C deletion suggests additional layers of regulation, potentially involving compensatory mechanisms."

      Supplemental Figure 5 (now Supplemental Figure 6):

      1. B. To obtain statistical significance and strengthen the conclusion, more repeated Western blot experiments can be done to quantify the pAMPK/AMPK ratio.

      Response: We included three more experiments as detailed in our response to reviewer #1 (Point 2 - Major comments) and reviewer #2 (Point 14 - Major comments).

      Figure 5:

      1. B. I believe statistical analysis with two replicates or less is not recommended. Although the assay is robust, and the blot is convincing, please consider adding more replicates if the blot is to be quantified and statistically analyzed.

      2. "Interestingly, RNA and protein levels of mTOR were downregulated in LN-229 C3ΔHNRNPK cells but were partially rebalanced by the ΔTFAP2C;ΔHNRNPK double deletion (Fig. 4C, Supp. Fig. E)." The statement is based on a slight difference at the protein level between the single and double deletions, as well as on the bulk RNA-seq data. mTOR (and Rptor) mRNA levels can be assessed by RT-qPCR to validate and further support the existing data. It is also curious that deletion of TFAP2C alone also induced a decrease in mTOR, whereas the double deletion slightly rescued mTOR levels compared to deletion of HNRNPK alone.

      3. C. The main text refers to changes in the level of phosphorylated 4EBP1, stating, "Deletion of HNRNPK diminished the highly phosphorylated forms of 4EBP1, which instead were preserved in both LN-229 C3ΔTFAP2C and LN-229 C3ΔTFAP2C;ΔHNRNPK cells (Fig. 5C)." However, the quantification was done on total 4EBP1, perhaps because separating p4EBP1 and 4EBP1 bands on a blot is challenging. Please consider using a phospho-4EBP1-specific antibody or rephrasing the sentence above. The current data suggest that single and double deletion of hnRNP K/TFAP2C affect the overall stability of 4EBP1, which may be a correlation and not due to mTOR activity, as claimed in "We conclude that HNRNPK and TFAP2C play an essential role in co-regulating cell metabolism homeostasis by influencing mTOR and AMPK activity and expression." How does cap-dependent translation (or total protein level) change in TFAP2C-deleted and TFAP2C-overexpressing cells?

      Response: 1. We added two additional experiments as detailed in our response to reviewer #1 (Point 3 - Major comment).

      Response: 2. Deletion of TFAP2C does not decrease mTOR levels, as shown by the quantification in Fig. 5D. To further support our results, we have now included RT-qPCR in Fig. 5C, as suggested by the reviewer. The data are also attached here for your convenience.

      Response: 3. Regarding the assessment of phosphorylated 4EBP1, we believe we achieved a clear separation of the differently phosphorylated forms of 4EBP1 in our blots, and we have now added the quantification of high p4EBP1/4EBP1 in Fig. 5E (see also our response to reviewer #2, Point 16 - Major comments). The quantification of total 4EBP1 represents an additional dataset, and we do not claim that 4EBP1 stability is affected by HNRNPK and TFAP2C directly through mTOR; this could, in fact, be correlative. We claim that HNRNPK and TFAP2C modulate mTORC1 and AMPK metabolic signaling, as shown by the changed phosphorylation of 4EBP1, S6, AMPK, and ULK1 (Fig. 5C-E, Supp. Fig. 6B, D) and by the regulation of autophagy (Fig. 5B, Supp. Fig. 6C); we did not directly assess cap-dependent translation.

      We have now rephrased our text to ensure clarity as follows:

      "We conclude that HNRNPK and TFAP2C play a role in co-regulating mTORC1 and AMPK expression, signaling, and activity."

      Figure 6:

      1. A. Did the sihnRNP K increase the TFAP2C level?

      2. A and C. Are the total PrP levels lower in TFAP2C overexpressing cells compared to mCherry cells when they are infected?

      3. D. Do the TFAP2C protein levels differ between 2-day+72-h and 7-day+96-h?

      Response: 1. Yes, it does. We have now provided the quantification in Fig. 6A, C, and Supp. Fig. 8A (also attached below for your convenience).

      Response: 2. We have now provided the quantification in Fig. 6A and Supp. Fig. 8A. The total PrP does not change in TFAP2C overexpressing cells. Total PrP consists of both PK-resistant PrP (PrPSc) and PK-sensitive PrP (PrPC plus potential other intermediate species), with PrPSc typically present at much lower levels. In our model, PrPC is exogenously expressed at high levels via a vector and remains constant across conditions (Fig. 6C and Supp. Fig. 8C). As a result, any changes in PrPSc may not necessarily reflect on total PrP levels.

      Response: 3. No, there is no statistically significant change. We have now added a representative western blot and the quantification of 3 independent replicates in Supp. Fig. 8D. The other two western blots are only shown in the uncropped western blots section. This dataset is also attached here for your convenience.

      Figure 7:

      1. I agree with the latter half of the statement: "These findings suggest that HNRNPK influences prion propagation at least in part through mTORC1 signaling, although additional mechanisms may be involved." The first half requires careful rephrasing since (A) independent of the background siRNA treatment, TFAP2C overexpression by itself can modulate the PrPSc level as seen in Fig. 6A and B, (B) although the increase in TFAP2C level is observed with the hnRNP K deletion (Fig. 1; LN-229 C3), sihnRNP K treatment may or may not influence the TFAP2C level (Fig. 6; quantified data not provided), and (C) in the sihnRNP K-treated cells, the 4EBP1 level is increased compared to the siNT-treated cells, which was not observed in hnRNP K-deleted cells. Discussion and additional experiments (e.g., mTOR knockdown) addressing these points would be helpful.

      Response: A, B) We respectfully disagree with the possibility that HNRNPK downregulation may increase prion propagation via TFAP2C upregulation. As shown in Fig. 6A-B, D and in Supp. Fig. 8A, TFAP2C overexpression reduces, rather than increases, prion levels. Therefore, it would be inconsistent to suggest that HNRNPK siRNA promotes prion propagation through TFAP2C upregulation (quantification is now provided, see reviewer #3 - Figure 6 - Point 1). C) Concerning 4EBP1 levels, we have quantified the total 4EBP1 (also attached below) and expanded the discussion on potential discrepancies between HNRNPK knockout and knockdown, as the former affects cell viability, while the latter does not. However, as explained also in the previous reply to reviewer #3 - Figure 5 - Point 3, our focus is on the highly phosphorylated band of 4EBP1 (High p4EBP1), which is the direct target of mTORC1 activity. In both the hnRNPK knockout LN-229 C3 (Fig. 5E) and knockdown HovL models (Fig. 7B), phosphorylation of 4EBP1, along with phosphorylation of S6, is clearly reduced (we have now included quantification for Fig. 7B), reinforcing our conclusion that mTORC1 activity is affected by hnRNPK depletion. As the reviewer noted, we do not claim that mTORC1 is the sole mediator of hnRNPK's effect on prion regulation. However, we think that our interpretation of a potential and partial role of mTORC1 inhibition in the effect of HNRNPK downregulation on prion propagation is in line with the data presented in Fig. 6-7 and Supp. Fig. 8-9. For further clarification, we expanded the text according to the new experiments and analysis, and we added mTOR and Raptor siRNA knockdown (Supp. Fig. 9C) to further support our conclusions (also attached below for your convenience).

      Minor comments:

      1. Please clarify "independent cultures." Does this mean technical replicates on the same cell culture plate but different wells or replicated experiments on different days?

      Response: We have now clarified in each figure legend. "Individually treated wells" means different parental cultures grown and treated separately on the same day. n represents independent experiments on different days.

      1. Fig 2G. Please explain how the sigmoidal curves were fitted to the data points under the materials and methods section.

      2. Fig 3E and F. Please refer to the comment on Fig 2G above.

      Response: We have now added the explanation in Materials and Methods as follows:

      "Curve Fitting

      For sigmoidal curve fitting, we used GraphPad Prism (version X, GraphPad Software). Data in Figure 2G were fitted using nonlinear regression with a least squares regression model. For Figures 3E and 3F, data fitting was performed using an asymmetric sigmoidal model with five parameters (5PL) and log-transformed X-values (log[concentration])."
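      The asymmetric 5PL model named above is not written out in the response. As an editorial illustration only, here is a minimal Python sketch of that curve, assuming the parameterization described in the GraphPad Prism curve-fitting guide for "Asymmetric Sigmoidal, 5PL, X is log(concentration)" (Bottom, Top, LogEC50, HillSlope and the asymmetry factor S). The function name `five_pl` and the example parameter values are hypothetical; this is not the authors' analysis, which was done in Prism.

```python
import math


def five_pl(x, bottom, top, log_ec50, hill_slope, s):
    """Asymmetric five-parameter logistic (5PL) curve.

    x is log10(concentration). Assumed to follow the GraphPad Prism
    "Asymmetric Sigmoidal, 5PL, X is log(concentration)" form; s is the
    asymmetry factor, and s = 1 reduces the curve to the symmetric 4PL.
    """
    # LogXb shifts the curve so that y is exactly halfway between bottom
    # and top at x = log_ec50, even when the curve is asymmetric (s != 1).
    log_xb = log_ec50 + (1.0 / hill_slope) * math.log10(2.0 ** (1.0 / s) - 1.0)
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_xb - x) * hill_slope)) ** s


# Hypothetical parameters, for illustration only (not data from the manuscript).
params = dict(bottom=10.0, top=100.0, log_ec50=-8.0, hill_slope=1.2, s=0.7)
for log_conc in (-10.0, -8.0, -6.0):
    print(log_conc, round(five_pl(log_conc, **params), 2))
```

      With S = 1 the expression collapses to the familiar four-parameter logistic; the extra LogXb term is what keeps LogEC50 at the half-maximal response when the curve is asymmetric.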

      3. Fig. S3F/H. Quantification of gel bands would be helpful when comparing protein expression changes after different treatments, as band intensities look different across treatments.

      Response: We have now added the quantifications in Supp. Fig. 3D-H (attached below for your convenience). They confirm that there are no significant differences in the means of the normalized values.

      1. Supp Fig 5C and F. These panels can be combined with the corresponding panels in main Figure 5 if space allows so that the readers do not have to flip pages between the main text and Supplemental material.

      Response: We have now combined the panels. Previous Supp. Fig. 5C and F are now shown in Fig. 6C and E, respectively.

      Reviewer #3 (Significance (Required)):

      This is an interesting paper potentially providing new mechanistic insight into hnRNPK function and its interaction with TFAP2C. It is also important to understand how hnRNPK deletion induces prion propagation and to develop methods to mitigate its spread. However, inconsistencies in TFAP2C expression across cell lines and conflicting mechanistic interpretations complicate the conclusions. I have expertise in RNA-binding proteins, cell biology, and prion disease.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript "Prion propagation is controlled by a hierarchical network involving the nuclear Tfap2c and hnRNP K factors and the cytosolic mTORC1 complex" by Sellitto et al. aims to examine how heterogeneous nuclear ribonucleoprotein K (hnRNPK) limits prion propagation. They perform a synthetic-viability CRISPR-ablation screen to identify epistatic interactors of HNRNPK. They found that deletion of Transcription factor AP-2 (TFAP2C) suppressed the death of hnRNP K-depleted LN-229 and U-251 MG cells, whereas its overexpression hypersensitized them to hnRNP K loss. Moreover, HNRNPK ablation decreased cellular ATP, downregulated genes related to lipid and glucose metabolism, and enhanced autophagy. Simultaneous deletion of TFAP2C reversed these effects, restored transcription, and alleviated energy deficiency.

      They state that HNRNPK and TFAP2C are linked to mTOR signalling and observe that HNRNPK ablation inhibits mTORC1 activity through downregulation of mTOR and Rptor while TFAP2C overexpression enhances mTORC1 downstream functions. In prion infected cells, TFAP2C activation reduced prion levels and countered the increased prion propagation due to HNRNPK suppression. Pharmacological inhibition of mTOR also elevated prion levels and partially mimicked the effects of HNRNPK silencing. They state their study identifies TFAP2C as a genetic interactor of HNRNPK and implicates their roles in mTOR metabolic regulation and establishes a causative link between these activities and prion propagation.

      This is an interesting manuscript in which a lot of work has been undertaken. The experiments are on the whole well done, carefully documented and support most of the conclusions drawn. However, there are places where it was quite difficult to read as some of the important results are in the supplementary Figures and it was necessary to go back and forth between the Figs in the main body of the paper and the supplementary Figs. There are also Figures in the supplementary which should have been presented in the main body of the paper. These are indicated in our comments below.

      We have the following questions /points:

      1. A plasmid harbouring four guide RNAs driven by four distinct constitutive promoters is used for targeting HNRNPK. Is there a reason for using four guides? Is it simply to obtain maximal editing? In their experience, is this required for all genes or specific to HNRNPK?
      2. Is there a minimal amount of Cas9 required for editing?
      3. It is stated that cell death is delayed in U251-MG cells compared to LN-229-C3 cells. Why? Also, why use glioblastoma cells, other than that they have high levels of HNRNPK? Would neuroblastoma cells be more appropriate if they are aiming to test for prion propagation?
      4. Human CRISPR Brunello pooled library: does the Brunello library use constructs which have four independent guide RNAs, as used for the silencing of HNRNPK?
      5. To rank the 763 enriched genes, they multiply the -log10FDR with their effect size - is this a standard step that is normally undertaken?
      6. The 32 genes selected- they were ablated individually using constructs with one guide RNA or four guide RNAs?
      7. The identified targets were also tested in U251-MG cells and nine were confirmed but the percent viability was variable - is the variability simply a reflection of the different cell line?
      8. The two strongest hits were IKBKAP and TFAP2C. As TFAP2C is a transcription factor, is it known to modulate expression of any of the genes that were identified to be perturbed in the screen? Moreover, it is stated that it regulates expression of several lncRNAs. Have the authors looked at expression of these lncRNAs? Is the expression affected? Can modulation of expression of these lncRNAs modulate the observed phenotypic effects and also some of the targets they have identified in the screen?
      9. As both HNRNPK and TFAP2C modulate glucose metabolism, the authors have chosen to explore the epistatic interaction. This is most reasonable.
      10. The orthogonal assay to confirm that deletion of TFAP2C suppresses cell death upon removing HNRNPK: was this done using a single guide RNA or multiple guides? Is there a level of suppression required to observe rescue? Interestingly, ablation of HNRNPK increases TFAP2C expression in LN-229-C3, whereas in U251-Cas9 cells HNRNPK ablation has the opposite effect: both RNA and protein levels of TFAP2C are decreased. Is this the cause of the smaller protective effect of TFAP2C deletion in this cell line?
      11. Nuclear localisation studies indicate that the HNRNPK and TFAP2C proteins colocalise in the nucleus; however, the co-IP data are not convincing. Although appropriate controls are present, the level of interaction is very low: the amount of HNRNPK pulled down by TFAP2C is really very low in the LN-229 C3 cells and even lower in the U251-Cas9 cells. Have they undertaken the reciprocal co-IP experiment?
      12. They state that LN-229 C3 TFAP2C and U251-Cas9TFAP2C were only mildly resistant to the apoptotic action of staurosporin Fig 3E and F - I accept they have undertaken the stats which support their statement that at high concentrations of staurosporin the LN-229 C3 TFAP2C cells are less sensitive but the U251-Cas9TFAP2C decreased sensitivity is hard to believe. Has this been replicated? I agree that HNRNPK deletion causes apoptosis in both LN-229 C3 and U251-Cas9 cells and this is blocked by Z-VAD-FMK - however the block is not complete- the max viability for HNRNPK deletion in LN-229 C3 cells is about 40% whereas for U251-Cas9 cells it is about 30% - does this suggest that cells are being lost by another pathway. Have they tested concentrations higher than 10nM?
      13. The RNA-seq comparisons- the authors use log2 FC <0.5 upregulated or genes downregulated by a similar amount- this is a very low cut off and would include essentially minimal changes in expression - not convinced of the significance of such low-level changes.
      14. It is stated: "Accordingly, we observed increased AMPK phosphorylation (pAMPK) upon ablation of HNRNPK, which was consistently reduced in LN-229 C3ΔTFAP2C cells (Supp. Fig. 5B). LN-229 C3ΔTFAP2C; ΔHNRNPK cells also showed a partial reduction of pAMPK relative to LN-229 C3ΔHNRNPK cells (Supp. Fig. 5B). These results suggest that hnRNP K depletion causes an energy shortfall, leading to cell death." I am not totally convinced by the data presented in this figure. The authors have quantified the band intensity and present the ratio of pAMPK to AMPK. Please note that the actin levels are variable across the samples; did they normalise the data using the actin level before undertaking the comparisons? Also, if the authors think this is an important point which supports their conclusion, then it should be in the main body of the paper rather than the supplementary. If AMPK is being phosphorylated, this should lead to activation of the metabolic checkpoint, which involves p53 activation by phosphorylation. Activated p53 would turn on p21CIP1, which is a very sensitive indicator of p53 activation.
      15. We also do not understand why the mTOR data in Supp. Fig. 5E are not in the main body of the paper. It's clear that RNA and protein levels of mTOR were downregulated in LN-229 C3ΔHNRNPK cells but were partially rebalanced by ΔTFAP2C; however, the ΔTFAP2C;ΔHNRNPK double-deletion levels are only slightly higher than those of ΔHNRNPK and do not reach the level of NT or even ΔTFAP2C (Fig. 4C, Supp. Fig. 5E).
      16. The authors state: "Deletion of HNRNPK diminished the highly phosphorylated forms of 4EBP1, which instead were preserved in both LN-229 C3ΔTFAP2C and LN-229 C3ΔTFAP2C;ΔHNRNPK cells (Fig. 5C). Similarly, the S6 phosphorylation ratio was reduced in LN-229 C3ΔHNRNPK cells and was restored in the ΔTFAP2C;ΔHNRNPK double-ablated cells (Fig. 5C)."

      We are not convinced that p4EBP1 is preserved in the LN-229 C3ΔTFAP2C cells; there is a very faint band, which is at a lower level than the band in the LN-229 C3ΔHNRNPK cells. However, when both HNRNPK and TFAP2C were ablated, the p4EBP1 band is clear-cut. I agree with the quantitation that deletion of HNRNPK and TFAP2C both reduce the level of 4EBP1; the reduction is greater with TFAP2C, but when both are deleted together the levels of 4EBP1 are higher and p4EBP1 is clearly present. In quantifying the S6 and pS6 levels, did the authors consider the actin levels? They present a ratio of pS6 to S6. I may be lacking some understanding, but why is the ratio of pS6/S6 being calculated? Is the level of pS6 not what is important? Phosphorylation of S6 should lead to its activation, and thus it is the actual level of pS6 that is important, not the ratio to the non-phosphorylated protein.

      17. When determining ATP levels, do they control for cell number? HNRNPK depletion results in lower ATP levels, and co-deletion of TFAP2C rescues this. But this could be because there is less cell death, so more cells contribute ATP. Have they controlled for relative numbers of cells?
      18. The construction of the HovL cell line that propagates ovine prions: very few details are provided on the susceptibility of the cell line to PG127 prions.
      19. It is stated that HNRNPK depletion from HovL cells increases PrPSc as determined by 6D11 fluorescence, but elsewhere in the manuscript HNRNPK depletion results in cell death. How does this come together?
      20. They show that mTOR inhibition mimics the effect of HNRNPK deletion; why didn't they overexpress mTOR and see if that rescues this? This would indicate a causal relationship.
      21. Flow cytometric data (supplementary figure for Fig. 6D): when they are looking at fixed cells, the gating strategy for cells results in the inclusion of a lot of debris. The gate needs to be moved and made more specific to ensure results are interpreted properly. The same applies to the singlet gating: it is not tight enough and includes doublets, which will skew the data. The data need to be regated.

      Significance

      The manuscript "Prion propagation is controlled by a hierarchical network involving the nuclear Tfap2c and hnRNP K factors and the cytosolic mTORC1 complex" by Sellitto et al. aims to examine how heterogeneous nuclear ribonucleoprotein K (hnRNPK) limits prion propagation. They perform a synthetic-viability CRISPR-ablation screen to identify epistatic interactors of HNRNPK. They found that deletion of Transcription factor AP-2 (TFAP2C) suppressed the death of hnRNP K-depleted LN-229 and U-251 MG cells, whereas its overexpression hypersensitized them to hnRNP K loss. Moreover, HNRNPK ablation decreased cellular ATP, downregulated genes related to lipid and glucose metabolism, and enhanced autophagy. Simultaneous deletion of TFAP2C reversed these effects, restored transcription, and alleviated energy deficiency.

    1. We want to provide you, the reader, a chance to explore mental health more. We want you to be considering potential benefits and harms to the mental health of different people (benefits like reducing stress, feeling part of a community, finding purpose, etc. and harms like unnecessary anxiety or depression, opportunities and encouragement of self-bullying, etc.). As you do this you might consider personality differences (such as introverts and extroverts), and neurodiversity, the ways people’s brains work and process information differently (e.g., ADHD, Autism, Dyslexia, Face blindness, depression, anxiety). But be careful generalizing about different neurotypes (such as Autism), especially if you don’t know them well. Instead try to focus on specific traits (that may or may not be part of a specific group) and the impacts on them (e.g., someone easily distracted by motion might…., or someone sensitive to loud sounds might…, or someone already feeling anxious might…). We will be doing a modified version of the five-step CIDER method (Critique, Imagine, Design, Expand, Repeat). While the CIDER method normally assumes that making a tool accessible to more people is morally good, if that tool is potentially harmful to people (e.g., give people unnecessary anxiety), then making the tool accessible to more people might be morally bad. So instead of just looking at the assumptions made about people and groups using a social media site, we will be also looking at potential harms to different people and groups using a social media site. So open a social media site on your device. Then do the following (preferably on paper or in a blank computer document):

      I like that this design analysis explicitly treats “accessibility to more people” as not automatically morally good if the underlying feature or platform dynamics can cause harm (e.g., unnecessary anxiety). That framing pushes us to evaluate both who benefits and who pays the costs, rather than assuming growth or engagement is neutral. It also made me think good mental-health-oriented design should be measured by outcomes like reduced harm and increased user agency—not just “time on site,” and that those metrics might differ across groups with different vulnerabilities.

    1. Capulet. But Montague is bound as well as I, In penalty alike; and 'tis not hard, I think, For men so old as we to keep the peace. Paris. Of honourable reckoning are you both; And pity 'tis you lived at odds so long. But now, my lord, what say you to my suit? Capulet. But saying o'er what I have said before: My child is yet a stranger in the world; She hath not seen the change of fourteen years, Let two more summers wither in their pride, Ere we may think her ripe to be a bride. Paris. Younger than she are happy mothers made. Capulet. And too soon marr'd are those so early made. The earth hath swallow'd all my hopes but she, She is the hopeful lady of my earth: But woo her, gentle Paris, get her heart, My will to her consent is but a part; An she agree, within her scope of choice Lies my consent and fair according voice. This night I hold an old accustom'd feast, Whereto I have invited many a guest, Such as I love; and you, among the store, One more, most welcome, makes my number more. At my poor house look to behold this night Earth-treading stars that make dark heaven light: Such comfort as do lusty young men feel When well-apparell'd April on the heel Of limping winter treads, even such delight Among fresh female buds shall you this night Inherit at my house; hear all, all see, And like her most whose merit most shall be: Which on more view, of many mine being one May stand in number, though in reckoning none, Come, go with me. [To Servant, giving a paper] Go, sirrah, trudge about Through fair Verona; find those persons out Whose names are written there, and to them say, My house and welcome on their pleasure stay. [Exeunt CAPULET and PARIS]

      Paris asks Capulet for permission to marry Juliet but is put off because Juliet is too young and her own consent matters too, so Capulet invites Paris to a feast where Juliet and other young women will be present.

    2. Benvolio. Good-morrow, cousin. Romeo. Is the day so young? Benvolio. But new struck nine. Romeo. Ay me! sad hours seem long. Was that my father that went hence so fast? Benvolio. It was. What sadness lengthens Romeo's hours? Romeo. Not having that, which, having, makes them short. Benvolio. In love? Romeo. Out— Benvolio. Of love? Romeo. Out of her favour, where I am in love. Benvolio. Alas, that love, so gentle in his view, Should be so tyrannous and rough in proof! Romeo. Alas, that love, whose view is muffled still, Should, without eyes, see pathways to his will! Where shall we dine? O me! What fray was here? Yet tell me not, for I have heard it all. Here's much to do with hate, but more with love. Why, then, O brawling love! O loving hate! O any thing, of nothing first create! O heavy lightness! serious vanity! Mis-shapen chaos of well-seeming forms! Feather of lead, bright smoke, cold fire, sick health! Still-waking sleep, that is not what it is! This love feel I, that feel no love in this. Dost thou not laugh? Benvolio. No, coz, I rather weep. Romeo. Good heart, at what? Benvolio. At thy good heart's oppression. Romeo. Why, such is love's transgression. Griefs of mine own lie heavy in my breast, Which thou wilt propagate, to have it prest With more of thine: this love that thou hast shown Doth add more grief to too much of mine own. Love is a smoke raised with the fume of sighs; Being purged, a fire sparkling in lovers' eyes; Being vex'd a sea nourish'd with lovers' tears: What is it else? a madness most discreet, A choking gall and a preserving sweet. Farewell, my coz. Benvolio. Soft! I will go along; An if you leave me so, you do me wrong. Romeo. Tut, I have lost myself; I am not here; This is not Romeo, he's some other where. Benvolio. Tell me in sadness, who is that you love. Romeo. What, shall I groan and tell thee? Benvolio. Groan! why, no. But sadly tell me who. Romeo. Bid a sick man in sadness make his will: Ah, word ill urged to one that is so ill! In sadness, cousin, I do love a woman. Benvolio. I aim'd so near, when I supposed you loved. Romeo. A right good mark-man! And she's fair I love. Benvolio. A right fair mark, fair coz, is soonest hit. Romeo. Well, in that hit you miss: she'll not be hit With Cupid's arrow; she hath Dian's wit; And, in strong proof of chastity well arm'd, From love's weak childish bow she lives unharm'd. She will not stay the siege of loving terms, Nor bide the encounter of assailing eyes, Nor ope her lap to saint-seducing gold: O, she is rich in beauty, only poor, That when she dies with beauty dies her store. Benvolio. Then she hath sworn that she will still live chaste? Romeo. She hath, and in that sparing makes huge waste, For beauty starved with her severity Cuts beauty off from all posterity. She is too fair, too wise, wisely too fair, To merit bliss by making me despair: She hath forsworn to love, and in that vow Do I live dead that live to tell it now. Benvolio. Be ruled by me, forget to think of her. Romeo. O, teach me how I should forget to think. Benvolio. By giving liberty unto thine eyes; Examine other beauties. Romeo. 'Tis the way To call hers exquisite, in question more: These happy masks that kiss fair ladies' brows Being black put us in mind they hide the fair; He that is strucken blind cannot forget The precious treasure of his eyesight lost: Show me a mistress that is passing fair, What doth her beauty serve, but as a note Where I may read who pass'd that passing fair? Farewell: thou canst not teach me to forget. Benvolio. I'll pay that doctrine, or else die in debt.

      Benvolio asks Romeo why he is so sad all the time, and Romeo reveals that it is because he is in love with a woman who doesn't love him back. Benvolio tries to cheer Romeo up by saying there are other fish in the sea, and Romeo says that the other fish only remind him of the woman.

    3. [Enter ROMEO] Benvolio. See, where he comes: so please you, step aside; I'll know his grievance, or be much denied. Montague. I would thou wert so happy by thy stay, To hear true shrift. Come, madam, let's away. [Exeunt MONTAGUE and LADY MONTAGUE] Benvolio. Good-morrow, cousin. Romeo. Is the day so young? Benvolio. But new struck nine. Romeo. Ay me! sad hours seem long. Was that my father that went hence so fast? Benvolio. It was. What sadness lengthens Romeo's hours? Romeo. Not having that, which, having, makes them short. Benvolio. In love? Romeo. Out— Benvolio. Of love? Romeo. Out of her favour, where I am in love. Benvolio. Alas, that love, so gentle in his view, Should be so tyrannous and rough in proof! Romeo. Alas, that love, whose view is muffled still, Should, without eyes, see pathways to his will! Where shall we dine? O me! What fray was here? Yet tell me not, for I have heard it all. Here's much to do with hate, but more with love. Why, then, O brawling love! O loving hate! O any thing, of nothing first create! O heavy lightness! serious vanity! Mis-shapen chaos of well-seeming forms! Feather of lead, bright smoke, cold fire, sick health! Still-waking sleep, that is not what it is! This love feel I, that feel no love in this. Dost thou not laugh? Benvolio. No, coz, I rather weep. Romeo. Good heart, at what? Benvolio. At thy good heart's oppression. Romeo. Why, such is love's transgression. Griefs of mine own lie heavy in my breast, Which thou wilt propagate, to have it prest With more of thine: this love that thou hast shown Doth add more grief to too much of mine own. Love is a smoke raised with the fume of sighs; Being purged, a fire sparkling in lovers' eyes; Being vex'd a sea nourish'd with lovers' tears: What is it else? a madness most discreet, A choking gall and a preserving sweet. Farewell, my coz. Benvolio. Soft! I will go along; An if you leave me so, you do me wrong. Romeo. Tut, I have lost myself; I am not here; This is not Romeo, he's some other where. Benvolio. Tell me in sadness, who is that you love. Romeo. What, shall I groan and tell thee? Benvolio. Groan! why, no. But sadly tell me who. Romeo. Bid a sick man in sadness make his will: Ah, word ill urged to one that is so ill! In sadness, cousin, I do love a woman. Benvolio. I aim'd so near, when I supposed you loved. Romeo. A right good mark-man! And she's fair I love. Benvolio. A right fair mark, fair coz, is soonest hit. Romeo. Well, in that hit you miss: she'll not be hit With Cupid's arrow; she hath Dian's wit; And, in strong proof of chastity well arm'd, From love's weak childish bow she lives unharm'd. She will not stay the siege of loving terms, Nor bide the encounter of assailing eyes, Nor ope her lap to saint-seducing gold: O, she is rich in beauty, only poor, That when she dies with beauty dies her store. Benvolio. Then she hath sworn that she will still live chaste? Romeo. She hath, and in that sparing makes huge waste, For beauty starved with her severity Cuts beauty off from all posterity. She is too fair, too wise, wisely too fair, To merit bliss by making me despair: She hath forsworn to love, and in that vow Do I live dead that live to tell it now. Benvolio. Be ruled by me, forget to think of her. Romeo. O, teach me how I should forget to think. Benvolio. By giving liberty unto thine eyes; Examine other beauties. Romeo. 'Tis the way To call hers exquisite, in question more: These happy masks that kiss fair ladies' brows Being black put us in mind they hide the fair; He that is strucken blind cannot forget The precious treasure of his eyesight lost: Show me a mistress that is passing fair, What doth her beauty serve, but as a note Where I may read who pass'd that passing fair? Farewell: thou canst not teach me to forget. Benvolio. I'll pay that doctrine, or else die in debt. [Exeunt]

      Before going any further, my hypothesis is that Romeo's reason for feeling down has something to do with love.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Polymers of orthophosphate of varying lengths are abundant in prokaryotes and some eukaryotes, where they regulate many cellular functions. Though they exist in metazoans, few tools exist to study their function. This study documents the development of tools to extract, measure, and deplete inorganic polyphosphates in *Drosophila*. Using these tools, the authors show:

      (1) That polyP levels are negligible in embryos and larvae of all stages while they are feeding. They remain high in pupae but their levels drop in adults.

      (2) That many cells in tissues such as the salivary glands, oocytes, haemocytes, imaginal discs, optic lobe, muscle, and crop, have polyP that is either cytoplasmic or nuclear (within the nucleolus).

      (3) That polyP is necessary in plasmatocytes for blood clotting in Drosophila.

      (4) That polyP controls the timing of eclosion.

      The tools developed in the study are innovative, well-designed, tested, and well-documented. I enjoyed reading about them and I appreciate that the authors have gone looking for the functional role of polyP in flies, which hasn't been demonstrated before. The documentation of polyP in cells is convincing, as is its role in plasmatocytes in clotting.

      We sincerely thank the reviewer for their encouraging assessment and for recognizing both the innovation of the FLYX toolkit and the functional insights it enables. Their remarks underscore the importance of establishing Drosophila as a tractable model for polyP biology, and we are grateful for their constructive feedback, which further strengthened the manuscript.

      Its control of eclosion timing, however, could result from non-specific effects of expressing an exogenous protein in all cells of an animal.

      We now explicitly state this limitation in the revised manuscript (p.16, l.347–349). The issue is that no catalytic-dead ScPpX1 is available as a control in the field. We plan to generate such mutants through systematic structural and functional studies and will update the FLYX toolkit once they are developed and validated. Importantly, the accelerated eclosion phenotype is reproducible and correlates with endogenous polyP dynamics.

      The RNAseq experiments and their associated analyses on polyP-depleted animals and controls have not been discussed in sufficient detail.  In its current form, the data look to be extremely variable between replicates and I'm therefore unsure of how the differentially regulated genes were identified.

      We thank the reviewer for pointing out the lack of clarity. We have expanded our RNAseq analysis in the revised manuscript (p.20, l.430–434). Because of inter-sample variation (PC2 = 19.10%, Fig. S7B), we employed Gene Set Enrichment Analysis (GSEA) rather than strict DEG cutoffs. This method is widely used when the goal is to capture pathway-level changes under variability (1). We now also highlight this limitation explicitly (p.20, l.430–432) and provide an additional table with gene-specific fold change (See Supplementary Table for RNA Sequencing Sheet 1). Please note that we have moved RNAseq data to Supplementary Fig. 7 and 8 as suggested in the review.

      It is interesting that no polyP kinases or phosphatases have been identified in flies. Is it possible that flies are utilising the polyP from their gut microbiota? It would be interesting to see if these signatures go away in axenic animals.

      This is an interesting possibility. Several observations argue that polyP is synthesized by fly tissues: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); and (iv) depleting polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major source of hemolymph polyP. Nevertheless, we agree that microbiota-derived polyP may contribute, and we plan systematic testing in axenic flies in future work.

      Reviewer #2 (Public review):

      Summary:

      The authors of this paper note that although polyphosphate (polyP) is found throughout biology, the biological roles of polyP have been under-explored, especially in multicellular organisms. The authors created transgenic Drosophila that expressed a yeast enzyme that degrades polyP, targeting the enzyme to different subcellular compartments (cytosol, mitochondria, ER, and nucleus, terming these altered flies Cyto-FLYX, Mito-FLYX, etc.). The authors show the localization of polyP in various wild-type fruit fly cell types and demonstrate that the targeting vectors did indeed result in the expression of the polyP degrading enzyme in the cells of the flies. They then go on to examine the effects of polyP depletion using just one of these targeting systems (the Cyto-FLYX). The primary findings from the depletion of cytosolic polyP levels in these flies are that it accelerates eclosion and also appears to participate in hemolymph clotting. Perhaps surprisingly, the flies seemed otherwise healthy and appeared to have little other noticeable defects. The authors use transcriptomics to try to identify pathways altered by the cyto-FLYX construct degrading cytosolic polyP, and it seems likely that their findings in this regard will provide avenues for future investigation. And finally, although the authors found that eclosion is accelerated in the pupae of Drosophila expressing the Cyto-FLYX construct, the reason why this happens remains unexplained.

      Strengths:

      The authors capitalize on the work of other investigators who had previously shown that expression of recombinant yeast exopolyphosphatase could be targeted to specific subcellular compartments to locally deplete polyP, and they also use a recombinant polyP-binding protein (PPBD) developed by others to localize polyP. They combine this with the considerable power of Drosophila genetics to explore the roles of polyP by depleting it in specific compartments and cell types to tease out novel biological roles for polyP in a whole organism. This is a substantial advance.

      We are grateful to the reviewer for their thorough and thoughtful evaluation. Their balanced summary of our work, recognition of the strengths of our genetic tools, and constructive suggestions have been invaluable in clarifying our experiments and strengthening the conclusions.

      Weaknesses:

      Page 4 of the Results (paragraph 1): I'm a bit concerned about the specificity of PPBD as a probe for polyP. The authors show that the fusion partner (GST) isn't responsible for the signal, but I don't think they directly demonstrate that PPBD is binding only to polyP. Could it also bind to other anionic substances? A useful control might be to digest the permeabilized cells and tissues with polyphosphatase prior to PPBD staining and show that the staining is lost.

      To address this concern, we have done two sets of experiments:

      (1) We generated a PPBD mutant (GST-PPBD<sup>Mut</sup>). Using Microscale Thermophoresis, we establish that GST-PPBD binds to polyP<sub>100</sub>-2X FITC, whereas GST-PPBD<sup>Mut</sup> and GST do not. Unlike the punctate staining pattern of wild-type GST-PPBD, GST-PPBD<sup>Mut</sup> does not stain hemocytes. These data have been added to the revised manuscript (Fig. 2B-D, p.8, l.151–165).

      (2) Quarles et al. performed a similar experiment to the one suggested by the reviewer in C. elegans: treating permeabilized tissues with polyphosphatase prior to PPBD staining decreased the PPBD-GFP signal from the tissues (2). We performed the same experiment on hemocytes, incubating fixed and permeabilised hemocytes with either ScPpX1 or heat-inactivated ScPpX1 before GST-PPBD staining. Both staining intensity and the number of puncta are higher in untreated hemocytes and in those treated with heat-inactivated ScPpX1, whereas hemocytes pre-treated with ScPpX1 show reduced staining intensity and fewer puncta. These data have been added to the revised manuscript (Fig. 2E-G, p.8, l.166-172).
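      The response does not spell out how staining intensity and puncta number were quantified; in practice this is often done in Fiji/ImageJ (for example, by thresholding followed by Analyze Particles). As a hedged illustration of the underlying logic only, the sketch below thresholds a small 2D intensity array, counts 4-connected bright regions as puncta, and reports their mean intensity. The threshold, minimum size, and connectivity are assumptions, not the authors' parameters.

```python
from collections import deque


def count_puncta(image, threshold, min_pixels=1):
    """Count 4-connected regions brighter than `threshold` (illustrative).

    image: 2D list of pixel intensities.
    Returns (number of puncta, mean intensity of pixels inside them).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    n_puncta, bright = 0, []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill of one connected bright region.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append(image[y][x])
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_pixels:   # ignore single-pixel noise if asked
                n_puncta += 1
                bright.extend(region)
    mean_signal = sum(bright) / len(bright) if bright else 0.0
    return n_puncta, mean_signal
```

Comparing untreated, ScPpX1-treated, and heat-inactivated ScPpX1-treated cells would then mean applying the same function, with the same threshold, to every image.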

      Further, Saito et al. reported that PPBD binds to polyP in vitro, as well as in yeast and mammalian cells, with a high affinity of ~45 µM for longer polyP chains (35-mer and above) (3). They also show that the affinity of PPBD for RNA and DNA is very low. Furthermore, PPBD could detect differences in polyP labeling in yeast grown under different physiological conditions that alter polyP levels (3). Taken together, published work and our results suggest that PPBD specifically labels polyP.

      In the hemolymph clotting experiments, the authors collected 2 µl of hemolymph and then added 1 µl of their test substance (water or a polyP solution). They state that they added either 0.8 or 1.6 nmol polyP in these experiments (the description in the Results differs from that of the Methods). I calculate this will give a polyP concentration of 0.3 or 0.6 mM. This is an extraordinarily high polyP concentration and is much in excess of the polyP concentrations used in most of the experiments testing the effects of polyP on clotting of mammalian plasma. Why did the authors choose this high polyP concentration? Did they try lower concentrations? It seems possible that too high a polyP concentration would actually have less clotting activity than the optimal polyP concentration.

      We repeated the assays using 125 µM polyP, consistent with concentrations employed in mammalian plasma studies (4,5). Even at this lower, physiologically relevant concentration, polyP significantly enhanced clot fibre formation (included as Fig. S5F–I, p.12, l.241–243). This reconfirms the conclusion that polyP promotes hemolymph clotting.

      Author response image 1.
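      The arithmetic behind this kind of estimate is c = n / V, because 1 nmol in 1 µl is 1 mM. The sketch below repeats it under stated assumptions: the final reaction volume is simply 2 µl hemolymph plus 1 µl test solution (3 µl), and the amounts are in the same polyP units as the 125 µM condition (whether these are polymer or phosphate-monomer concentrations is not stated here). Under these assumptions, 0.8 and 1.6 nmol give roughly 0.27 and 0.53 mM, and 125 µM in 3 µl corresponds to 0.375 nmol.

```python
# Back-of-envelope polyP concentrations for the hemolymph clotting assay.
# Assumption: final volume = 2 µl hemolymph + 1 µl test solution = 3 µl.
HEMOLYMPH_UL = 2.0
ADDITION_UL = 1.0
TOTAL_UL = HEMOLYMPH_UL + ADDITION_UL


def final_concentration_mM(polyp_nmol, total_ul=TOTAL_UL):
    """nmol per µl equals mmol per litre, so nmol / µl is already in mM."""
    return polyp_nmol / total_ul


def nmol_for_target_uM(target_uM, total_ul=TOTAL_UL):
    """µM x µl gives pmol; divide by 1000 to express the amount in nmol."""
    return target_uM * total_ul / 1000.0


if __name__ == "__main__":
    for dose_nmol in (0.8, 1.6):
        print(f"{dose_nmol} nmol in {TOTAL_UL:g} µl -> "
              f"{final_concentration_mM(dose_nmol):.2f} mM")
    print(f"125 µM in {TOTAL_UL:g} µl requires {nmol_for_target_uM(125):.3f} nmol")
```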

      Reviewer #3 (Public review):

      Summary:

      Sarkar, Bhandari, Jaiswal, and colleagues establish a suite of quantitative and genetic tools to use Drosophila melanogaster as a model metazoan organism to study polyphosphate (polyP) biology. By adapting biochemical approaches for use in D. melanogaster, they identify a window of increased polyP levels during development. Using genetic tools, they find that depleting polyP from the cytoplasm alters the timing of metamorphosis, accelerating eclosion. By adapting subcellular imaging approaches for D. melanogaster, they observe polyP in the nucleolus of several cell types. They further demonstrate that polyP localizes to cytoplasmic puncta in hemocytes, and further that depleting polyP from the cytoplasm of hemocytes impairs hemolymph clotting. Together, these findings establish D. melanogaster as a tractable system for advancing our understanding of polyP in metazoans.

      Strengths:

      (1) The FLYX system, combining cell type and compartment-specific expression of ScPpx1, provides a powerful tool for the polyP community.

      (2) The finding that cytoplasmic polyP levels change during development and affect the timing of metamorphosis is an exciting first step in understanding the role of polyP in metazoan development, and possible polyP-related diseases.

      (3) Given the significant existing body of work implicating polyP in the human blood clotting cascade, this study provides compelling evidence that polyP has an ancient role in clotting in metazoans.

      We sincerely thank the reviewer for their generous and insightful comments. Their recognition of both the technical strengths of the FLYX system and the broader biological implications reinforces our confidence that this work will serve as a useful foundation for the community.

      Limitations:

      (1) While the authors demonstrate that HA-ScPpx1 protein localizes to the target organelles in the various FLYX constructs, the capacity of these constructs to deplete polyP from the different cellular compartments is not shown. This is an important control both to demonstrate that the GST-PPBD labeling protocol works and to establish the efficacy of compartment-specific depletion. While it is not necessary to do this for all the constructs, it would be helpful to do this for the cyto-FLYX and nuc-FLYX.

      We confirmed polyP depletion in Cyto-FLYX using the malachite green assay (Fig. 3D, p.10, l.212–214). The efficacy of ScPpX1 has also been demonstrated previously in mammalian mitochondria (6). Our preliminary data from Mito-ScPpX1 expressed ubiquitously with Tubulin-Gal4 showed a reduction in polyP levels estimated from whole flies (see Author response image 2 below; ongoing investigation). In an independent study focusing on mitochondrial polyP depletion, we are characterizing these lines in detail and plan to use subcellular fractionation to determine how much mitochondria contribute to the cellular polyP pool. Direct phenotypic and polyP depletion analyses of Nuc-FLYX and ER-FLYX are also under way but are at a preliminary stage. Differences in polyP levels between tissues and the very small amount of material recovered in subcellular fractions have made these measurements challenging. Because this analysis requires detailed, independent, and careful work, we refrain from adding these data to the current manuscript.

      Author response image 2.
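      For readers less familiar with the malachite green assay: it is a colorimetric readout of free orthophosphate, so polyP is typically quantified by converting it to orthophosphate (for example, by digestion with ScPpX1) and reading absorbance against a phosphate standard curve. The sketch below shows only the standard-curve step. The standard values and sample absorbance are hypothetical numbers for illustration, not data from this study, and the assay is assumed to be linear over the range used.

```python
# Minimal standard-curve sketch for a colorimetric phosphate assay.
# All numbers in the example below are hypothetical, for illustration only.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def phosphate_nmol(absorbance, slope, intercept):
    """Invert the standard curve: absorbance -> nmol orthophosphate."""
    return (absorbance - intercept) / slope


def per_mg_protein(amount_nmol, protein_mg):
    """Normalise to protein content so samples of different size compare."""
    return amount_nmol / protein_mg


if __name__ == "__main__":
    standards_nmol = [0.0, 1.0, 2.0, 4.0]       # hypothetical Pi standards
    standards_abs = [0.05, 0.15, 0.25, 0.45]    # hypothetical readings
    slope, intercept = fit_line(standards_nmol, standards_abs)
    sample = phosphate_nmol(0.30, slope, intercept)
    print(f"sample: {sample:.2f} nmol Pi, "
          f"{per_mg_protein(sample, 0.5):.2f} nmol Pi/mg protein")
```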

      Regarding specificity, Saito et al. reported that PPBD binds to polyP in vitro, as well as in yeast and mammalian cells, with a high affinity of ~45 µM for longer polyP chains (35-mer and above) (3). They also show that the affinity of PPBD for RNA and DNA is very low. Further, PPBD could reveal differences in polyP labeling in yeast grown under physiological conditions that alter polyP levels. To demonstrate the specificity of PPBD in our system, we performed two sets of experiments and have included the data in the manuscript:

      We generated a PPBD mutant (GST-PPBD<sup>Mut</sup>). Using Microscale Thermophoresis, we establish that GST-PPBD binds to polyP<sub>100</sub>-2X-FITC, whereas GST-PPBD<sup>Mut</sup> and GST do not bind polyP<sub>100</sub>-2X-FITC at all. Unlike the punctate staining pattern of wild-type GST-PPBD, GST-PPBD<sup>Mut</sup> does not stain hemocytes. These data have been added to the revised manuscript (Fig. 2B-D, p.8, l.151–165).

      Quarles et al. performed a similar experiment to the one suggested by the reviewer in C. elegans: treating permeabilized tissues with polyphosphatase prior to PPBD staining decreased the PPBD-GFP signal from the tissues (2). We performed the same experiment on hemocytes, incubating fixed and permeabilised hemocytes with either ScPpX1 or heat-inactivated ScPpX1 before GST-PPBD staining. Both staining intensity and the number of puncta are higher in untreated hemocytes and in those treated with heat-inactivated ScPpX1, whereas hemocytes pre-treated with ScPpX1 show reduced staining intensity and fewer puncta. These data have been added to the revised manuscript (Fig. 2E-G, p.8, l.166-172).

      (2) The cell biological data in this study clearly indicates that polyP is enriched in the nucleolus in multiple cell types, consistent with recent findings from other labs, and also that polyP affects gene expression during development. Given that the authors also generate the Nuc-FLYX construct to deplete polyP from the nucleus, it is surprising that they test how depleting cytoplasmic but not nuclear polyP affects development. However, providing these tools is a service to the community, and testing the phenotypic consequences of all the FLYX constructs may arguably be beyond the scope of this first study.

      We agree this is an important avenue. In this first study, we focused on establishing the toolkit and reporting phenotypes with Cyto-FLYX. We are systematically assaying phenotypes from all FLYX constructs, including Nuc-FLYX, in ongoing studies.

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers appreciated the general quality of the rigour and work presented in this manuscript. We also had a few recommendations for the authors. These are listed here and the details related to them can be found in the individual reviews below.

      (1) We suggest including an appropriate control to show that PPBD binds polyP specifically.

      We have addressed this recommendation as follows:

      (a) We highlight previous literature showing the specificity of PPBD.

      (b) We show that the punctate staining observed with PPBD is absent with the mutant PPBD (PPBD<sup>Mut</sup>), in which the amino acids responsible for polyP binding are mutated.

      (c) We show that PPBD<sup>Mut</sup> does not bind to polyP using Microscale Thermophoresis.

      (d) We show that treating fixed and permeabilised hemocytes with ScPpX1 reduces PPBD staining intensity and the number of puncta compared with hemocytes left untreated or treated with heat-inactivated ScPpX1.

      We have included these data in the revised manuscript (Fig. 2B-G, p.8, l.151–157).

      (2) The high concentration of polyP in the clotting assay might be impeding clotting. The authors may want to consider lowering it in their assays.

      We have addressed this concern in our revised manuscript. We have performed the clotting assays at lower polyP concentrations, comparable to those previously used in clotting experiments with human blood and polyP. These data are included in Fig. S5F–I (p.12, l.241–243).

      (3) The RNAseq study: can the authors please describe this better and possibly mine it for the regulation of genes that affect eclosion?

      In our revised manuscript, we have included a broader discussion of the RNAseq analysis in both the Results and Discussion sections, where we have rewritten the narrative from the perspective of accelerated eclosion (p.15, l.310–335; p.20, l.431–446).

      (4) Have the authors considered the possibility that the gut microbiota might be contributing to some of their measurements and assays? It would be good to address this upfront - either experimentally, in the discussion, or (ideally) both.

      This is an exciting possibility. Several observations argue that fly tissues synthesize polyP: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); and (iv) depleting polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major source of hemolymph polyP. Nevertheless, microbiota-derived polyP may contribute, and we plan systematic testing in axenic flies in future work.

      Reviewer #1 (Recommendations for the authors):

      (1) While the authors have shown that the depletion tool results in a general reduction of polyP levels in Figure 3D, it would have been nice to show this via IHC. Particularly since the depletion depends on the strength of the Gal4, it is possible that the phenotypes are being under-estimated because the depletions are weak.

      We agree that different Gal4 lines have different strengths and will therefore affect polyP levels and the strength of the phenotype differently.

      We performed PPBD staining on hemocytes expressing ScPpX1; however, we observed very intense, uniform staining throughout the cells, which was unexpected. PPBD appears to recognize the overexpressed ScPpX1. Indeed, in an unpublished study by Manisha Mallick (Bhandari lab), His-ScPpX1 was found to interact specifically with GST-PPBD in a protein interaction assay (see Author response image 3). Because of these issues, we refrained from IHC/PPBD-based validation.

      Author response image 3.

      (2) The subcellular tools for depletion are neat! I wonder why the authors didn't test them, for example, in the salivary gland for nuclear depletion?

      We have addressed this question in the responses above. We are systematically assaying phenotypes from all FLYX constructs, including Mito-FLYX and Nuc-FLYX, in ongoing independent investigations. As discussed in our response to point (1), a possible interaction between ScPpX1 and PPBD makes these tests more challenging, so each construct requires a detailed investigation.

      (a) Does the absence of clotting defects using Lz-gal4 suggest that polyP is more crucial in the plasmatocytes and for the initial clotting process, and that it is dispensable or less important in the crystal cells and for the later clotting process? Or is it that the crystal cells just don't have as much polyP? The image (2E-H) certainly looks like it.

      In hemolymph, the primary clot forms from clotting factors secreted by the fat body and the plasmatocytes. Crystal cells release factors that help harden the soft clot that forms initially. Reports suggest that clotting and melanization of the clot are independent of each other (7). Since crystal cells do not contribute to clot fibre formation, the absence of clotting defects with LzGAL4-CytoFLYX is not surprising. Alternatively, polyP may be secreted from all hemocytes and contribute to clotting; however, crystal cells make up only about 5% of hemocytes, so polyP depletion in these cells may have a negligible effect on blood clotting.

      Crystal cells do show PPBD staining. Whether polyP levels are significantly lower in crystal cells than in plasmatocytes requires more systematic investigation. Image 2E-H is a representative image showing the presence of polyP in crystal cells and cannot be used to compare polyP levels between crystal cells and plasmatocytes.

      (b) The RNAseq analyses and data could be better presented. If the data are indeed variable and the differentially expressed genes of low confidence, I might remove that data entirely. I don't think it'll take away from the rest of the work.

      We understand this concern and, therefore, in the revised manuscript, we have included a broader discussion of the RNAseq analysis in both the Results and Discussion sections, where we have rewritten the narrative from the perspective of accelerated eclosion (p.15, l.310–335; p.20, l.431–446). We have also stated the limitations of such studies.

      (c) I would re-phrase the first sentence of the results section.

      We have re-phrased it in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors created several different versions of the FLYX system that would be targeted to different subcellular compartments. They mostly report on the effects of cytosolic targeting, but some of the constructs targeted the polyphosphatase to mitochondria or the nucleus.

      They report that the targeting worked, but I didn't see any results on the effects of those constructs on fly viability, development, etc.

      There is a growing literature of investigators targeting polyphosphatase to mitochondria and showing how depleting mitochondrial polyP alters mitochondrial function. What was the effect of the Nuc-FLYX and Mito-FLYX constructs on the flies?

      Also, the authors should probably cite the papers of others on the effects of depleting mitochondrial polyP in other eukaryotic cells in the context of discussing their findings in flies.

      We have addressed this question in the responses above. We did not see any obvious developmental or viability defects with any of the FLYX lines, and only after careful investigation did we come across the clotting defects in Cyto-FLYX. We are currently systematically assaying phenotypes from all FLYX constructs, including Mito-FLYX and Nuc-FLYX, in independent ongoing investigations.

      We have discussed the heterologous expression of mitochondrial polyphosphatase in mammalian cells to justify the need for developing Mito-FLYX (p. 10, l. 197-200). In the discussion section, we also discuss the presence and roles of polyP in the nucleus and how Nuc-FLYX can help study such phenomena (p. 19, l. 399-407).

      (2) The authors should number the pages of their manuscript to make it easier for reviewers to refer to specific pages.

      We have numbered our lines and pages in the revised manuscript.

      (3) Abstract: the abbreviation, "polyP", is not defined in the abstract. The first word in the abstract is "polyphosphate", so it should be defined there.

      We have corrected it in the revised version.

      (4) The authors repeatedly use the phrase, "orange hot", to describe one of the colors in their micrographs, but I don't know how this differs from "orange".

      ‘OrangeHot’ is the name of the ImageJ lookup table (LUT) used in our analysis, and we therefore used it to refer to the colour.

      (5) First page of the Introduction: the phrase, "feeding polyP to αβ expression Alzheimer's model of Caenorhabditis elegans" is awkward (it literally means feeding polyP to the model instead of the worms).

      We have revised it (p.3, l.55–57).

      (6) Page 2 of the Introduction: The authors should cite this paper when they state that NUDT3 is a polyphosphatase: https://pubmed.ncbi.nlm.nih.gov/34788624/

      We have cited the paper in the revised version of the manuscript (p.4, l.68–70).

      (7) Page 2 of Results: The authors report the polyP content in the third instar larva (misspelled as "larval") to five significant digits ("419.30"). Their data do not support more than three significant digits, though.

      We have corrected it in the revised manuscript.
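      Rounding to three significant figures keeps the first three significant digits, so 419.30 becomes 419. A small helper, included only to make the rule explicit, is sketched below; whether three figures is appropriate depends on the measurement's actual precision, which this sketch does not assess.

```python
import math


def round_sig(value, sig_figs):
    """Round a number to `sig_figs` significant figures."""
    if value == 0:
        return 0.0
    # Exponent of the leading digit, e.g. 2 for 419.30 (4.193 x 10**2).
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - exponent)


if __name__ == "__main__":
    print(round_sig(419.30, 3))      # 419.0
    print(round_sig(0.004567, 2))    # 0.0046
```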

      (8) Page 3 of Results (paragraph 1): When discussing the polyP levels in various larval stages, the authors are extracting total polyP from the larvae. It seems that at least some of the polyP may come from gut microbes. This should probably be mentioned.

      This is an interesting possibility. Several observations argue that polyP is synthesized by fly tissues: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); and (iv) depleting polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major source of hemolymph polyP. We mention this limitation in the revised manuscript (p.19–20, l.425–433).

      (9) Page 3 of Results (paragraph 2): stating that the 4% paraformaldehyde works "best" is imprecise. What do the authors mean by "best"?

      We have addressed this comment in the revised manuscript, which now states that 4% paraformaldehyde performed better than the other two fixatives we tested, methanol and Bouin’s fixative (p.8, l.152–154).

      (10) Page 4 of Results (paragraph 2, last line of the page): The scientific literature is vast, so one can never be sure that one knows of all the papers out there, even on a topic as relatively limited as polyP. Therefore, I would recommend qualifying the statement "...this is the first comprehensive tissue staining report...". It would be more accurate (and safer) to say something like, "to our knowledge, this is the first..." There is a similar statement with the word "first" on the next page regarding the FLYX library.

      We have addressed this concern and corrected it accordingly in the revised version of the manuscript (p.9, l.192–193).

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should include in their discussion a comparison of cell biological observations using the polyP binding domain of E. coli Ppx (GST-PPBD) to fluorescently label polyP in cells and tissues with recent work using a similar approach in C. elegans (Quarles et al., PMID:39413779).

      In the revised manuscript, we cite the work of Quarles et al. and compare their observations with ours (p.19, l.408–410). In the discussion, we also draw on other studies of polyP in different subcellular compartments, such as the nucleus, and describe how it can be assayed and studied with the tools developed in this study.

      (2) The gene expression studies of time-matched Cyto-FLYX vs WT larvae are very intriguing. Given the authors' findings that non-feeding third instar Cyto-FLYX larvae are developmentally ahead of WT larvae, can the observed trends be explained by known changes in gene expression that occur during eclosion? This is mentioned in the results section in the context of genes linked to neurons, but a broader discussion of which of the observed pathway changes can be explained by the developmental stage difference between the WT and FLYX larvae would be helpful in the discussion.

      We have included a broader discussion of the RNAseq analysis in both the Results and Discussion sections, where we have rewritten the narrative from the perspective of accelerated eclosion (p.15, l.310–335; p.20, l.431–446). We have also stated the limitations of such studies.

      (3) The sentence describing NUDT3 is not referenced.

      We have addressed this comment and have cited the NUDT3 paper in the revised version of the manuscript (p.4, l.68–70).

      (4) In the first sentence of the results section, the meaning/validity of the statement "The polyP levels have decreased as evolution progressed" is not clear. It might be more straightforward to give an estimate of the total pmoles polyP/mg protein difference between bacteria/yeast and metazoans.

      In the revised manuscript, we give estimates of polyP content across species to support the statement that polyP levels have decreased over the course of evolution (p.5, l.87–91).

      (5) The description of the malachite green assay in the results section describes it as "calorimetric" but this should read "colorimetric?"

      We have corrected it in the revised manuscript.

      References

      (1) Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348.

      (2) Quarles E, Petreanu L, Narain A, Jain A, Rai A, Wang J, et al. Cryosectioning and immunofluorescence of C. elegans reveals endogenous polyphosphate in intestinal endo-lysosomal organelles. Cell Rep Methods. 2024 Oct 8;100879.

      (3) Saito K, Ohtomo R, Kuga-Uetake Y, Aono T, Saito M. Direct labeling of polyphosphate at the ultrastructural level in Saccharomyces cerevisiae by using the affinity of the polyphosphate binding domain of Escherichia coli exopolyphosphatase. Appl Environ Microbiol. 2005 Oct;71(10):5692–701.

      (4) Smith SA, Mutch NJ, Baskar D, Rohloff P, Docampo R, Morrissey JH. Polyphosphate modulates blood coagulation and fibrinolysis. Proc Natl Acad Sci USA. 2006 Jan 24;103(4):903–8.

      (5) Smith SA, Choi SH, Davis-Harrison R, Huyck J, Boettcher J, Rienstra CM, et al. Polyphosphate exerts differential effects on blood clotting, depending on polymer size. Blood. 2010 Nov 18;116(20):4353–9.

      (6) Abramov AY, Fraley C, Diao CT, Winkfein R, Colicos MA, Duchen MR, et al. Targeted polyphosphatase expression alters mitochondrial metabolism and inhibits calcium-dependent cell death. Proc Natl Acad Sci USA. 2007 Nov 13;104(46):18091–6.

      (7) Schmid MR, Dziedziech A, Arefin B, Kienzle T, Wang Z, Akhter M, et al. Insect hemolymph coagulation: Kinetics of classically and non-classically secreted clotting factors. Insect Biochem Mol Biol. 2019 Jun;109:63–71.

      (8) Guan J, Hurto RL, Rai A, Azaldegui CA, Ortiz-Rodríguez LA, Biteen JS, et al. HP-Bodies – Ancestral Condensates that Regulate RNA Turnover and Protein Translation in Bacteria. bioRxiv. 2025; 2025.02.06.636932. doi: https://doi.org/10.1101/2025.02.06.636932.

      (9) Lonetti A, Szijgyarto Z, Bosch D, Loss O, Azevedo C, Saiardi A. Identification of an evolutionarily conserved family of inorganic polyphosphate endopolyphosphatases. J Biol Chem. 2011 Sep 16;286(37):31966–74.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Chen et al. engineered and characterized a suite of next-generation GECIs for the Drosophila NMJ that allow for the visualization of calcium dynamics within the presynaptic compartment, at presynaptic active zones, and in the postsynaptic compartment. These GECIs include ratiometric presynaptic Scar8m (targeted to synaptic vesicles), ratiometric active zone-localized Bar8f (targeted to the scaffold molecule BRP), and postsynaptic SynapGCaMP8m. The authors demonstrate that these new indicators are a large improvement on the widely used GCaMP6 and GCaMP7 series GECIs, with increased speed and sensitivity. They show that presynaptic Scar8m accurately captures presynaptic calcium dynamics with superior sensitivity to the GCaMP6 and GCaMP7 series and with similar kinetics to chemical dyes. The active-zone targeted Bar8f sensor was assessed for the ability to detect release-site-specific nanodomain changes, but the authors concluded that this sensor is still too slow to do so accurately. Next, postsynaptic SynapGCaMP8m was shown to enable the detection of quantal events with similar resolution to electrophysiological recordings. Finally, the authors developed a Python-based analysis tool, CaFire, that enables automated quantification of evoked and spontaneous calcium signals. These tools will greatly expand our ability to detect activity at individual synapses without the need for chemical dyes or electrophysiology.

      We thank this Reviewer for the overall positive assessment of our manuscript and for the incisive comments.

      (1) The role of Excel in the pipeline could be more clearly explained. Lines 182-187 could be better worded to indicate that CaFire provides analysis downstream of intensity detection in ImageJ. Moreover, the data type of the exported data, such as .csv or .xlsx, should be indicated instead of 'export to graphical program such as Microsoft Excel'.

      We thank the Reviewer for these comments, many of which were shared by the other reviewers. In response, we have now 1) more clearly explained the role of Excel in the CaFire pipeline (lines 677-681), 2) revised the wording in lines 676-679 to indicate that CaFire provides analysis downstream of intensity detection in ImageJ, and 3) clarified the data type exported to Excel (lines 677-681). These efforts have improved the clarity and readability of the CaFire analysis pipeline.

      (2) In Figure 2A, the 'Excel' step should either be deleted or included as 'data validation' as ImageJ exports don't require MS Excel or any specific software to be analysed. (Also, the graphic used to depict Excel software in Figure 2A is confusing.)

      We thank the reviewer for this helpful suggestion. In Fig. 2A, we have revised the Excel step and clarified the processing steps in the revised Methods. Specifically, we now indicate that ROIs are first selected in Fiji/ImageJ and analyzed to obtain time-series data containing both the time information and the corresponding mean fluorescence intensity values. These data are then exported to a spreadsheet file (e.g., Excel), which is used to organize the output before it is imported into CaFire for subsequent analysis. These changes can be found in Fig. 2A and the Methods (lines 676-681).

      (3) Figure 2B should include the 'Partition Specification' window (as shown on the GitHub) as well as the threshold selection to give the readers a better understanding of how the tool works.

      We absolutely agree with this comment and have made the suggested changes to Fig. 2B. In particular, we have replaced the software interface panels, which now include windows illustrating the Load File, Peak Detection, and Partition functions. These updated screenshots provide a clearer view of how CaFire is used to load data, detect events, and specify partitions for subsequent analysis. We agree these changes will give readers a better understanding of how the tool works, and we thank the reviewer for this comment.

      (4) The presentation of data is well organized throughout the paper. However, in Figure 6C, it is unclear how the heatmaps represent the spatiotemporal fluorescence dynamics of each indicator. Does the signal correspond to a line drawn across the ROI shown in Figure 6B? If so, this should be indicated.

      We apologize that the heatmaps in Fig. 6C (Fig. 7C in the current revision) were unclear. Each heatmap is derived from a one-pixel-wide vertical line within a miniature-event ROI. These heatmaps show the fluorescence change of the indicated SynapGCaMP variant during individual quantal events, whose traces are shown in Fig. 7C, with representative images of baseline and peak fluorescence shown in Fig. 7B. Specifically, we have added the following to the revised Fig. 7C legend:

      The corresponding heatmaps below were generated from a single vertical line extracted from a representative miniature-event ROI, and visualize the spatiotemporal fluorescence dynamics (ΔF/F) along that line over time.
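      The sketch below shows one way to build such a line-based heatmap. It takes one pixel column from a time-lapse stack and converts it to ΔF/F. It is a minimal illustration, not the pipeline used for Fig. 7C. The following are all assumptions:
      - the (frames, height, width) array layout;
      - the hypothetical function name `line_kymograph`;
      - using the mean of the first few frames as each pixel's F0.

```python
import numpy as np


def line_kymograph(stack, x, y_start, y_stop, baseline_frames):
    """Return a (pixels x frames) dF/F array along one vertical pixel line.

    stack: raw intensities with shape (n_frames, height, width) (assumed layout).
    x: column index of the one-pixel-wide vertical line.
    y_start, y_stop: row range of the line inside the event ROI (stop exclusive).
    baseline_frames: number of leading frames averaged as each pixel's F0 (assumption).
    """
    line = np.asarray(stack, dtype=float)[:, y_start:y_stop, x]  # shape (frames, pixels)
    f0 = line[:baseline_frames].mean(axis=0)                     # per-pixel baseline F0
    dff = (line - f0) / f0                                       # dF/F per pixel and frame
    return dff.T                                                 # rows = position, columns = time
```

      The returned array can be displayed with, for example, matplotlib's `imshow`, with position along the line on one axis and time on the other.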

      (5) In Figure 6D, the addition of non-matched electrophysiology recordings is confusing. Maybe add "at different time points" to the end of the 6D legend, or consider removing the electrophysiology trace from Figure 6D and referring the reader to the traces in Figure 7A for comparison (considering the same point is made more rigorously in Figure 7).

      This is a good point, one shared with another reviewer. We apologize this was not clear, and have now revised this part of the figure to remove the electrophysiological traces from what is now Fig. 7, while retaining the paired recordings in what is now Fig. 8A, as suggested by the reviewer. We agree this helps to clarify the quantal calcium transients.

      (6) In GitHub, an example ImageJ Script for analyzing the images and creating the inputs for CaFire would be helpful to ensure formatting compatibility, especially given potential variability when exporting intensity information for two channels. In the Usage Guide, more information would be helpful, such as how to select ∆R/R, ideally with screenshots of the application being used to analyze example data for both single-channel and two-channel images.

      We agree that additional details added to the GitHub would be helpful for users of CaFire. In response, we have now added the following improvements to the GitHub site: 

      - ImageJ operation screenshots

      Step-by-step illustrations of ROI drawing and Multi Measure extraction.

      - Example Excel file with time and intensity values

      Demonstrates the required data format for CaFire import, including proper headers.

      - CaFire loading screenshots for single-channel and dual-channel imaging

      Shows how to import GCaMP into Channel 1 and mScarlet into Channel 2.

      - Peak Detection and Partition setting screenshots

      Visual examples of automatic peak detection, manual correction, and trace partitioning.

      - Instructions for ROI Extraction and CaFire Analysis

      A written guide describing the full workflow from ROI selection to CaFire data export.

      These changes have improved the usability and accessibility of CaFire, and we thank the reviewer for these points.

      Reviewer #2

      Calcium ions play a key role in synaptic transmission and plasticity. To improve calcium measurements at synaptic terminals, previous studies have targeted genetically encoded calcium indicators (GECIs) to pre- and postsynaptic locations. Here, Chen et al. improve these constructs by incorporating the latest GCaMP8 sensors and a stable red fluorescent protein to enable ratiometric measurements. In addition, they develop a new analysis platform, 'CaFire', to facilitate automated quantification. Using these tools, the authors demonstrate favorable properties of their sensors relative to earlier constructs. Impressively, by positioning postsynaptic GCaMP8m near glutamate receptors, they show that their sensors can report miniature synaptic events with speed and sensitivity approaching that of intracellular electrophysiological recordings. These new sensors and the analysis platform provide a valuable tool for resolving synaptic events using all-optical methods.

      We thank the Reviewer for their overall positive evaluation and comments.

      Major comments:

      (1) While the authors rigorously compared the response amplitude, rise, and decay kinetics of several sensors, key parameters like brightness and photobleaching rates are not reported. I feel that including this information is important as synaptically tethered sensors, compared to freely diffusible cytosolic indicators, can be especially prone to photobleaching, particularly under the high-intensity illumination and high-magnification conditions required for synaptic imaging. Quantifying baseline brightness and photobleaching rates would add valuable information for researchers intending to adopt these tools, especially in the context of prolonged or high-speed imaging experiments.

      This is a good point, and we agree that researchers will benefit from this information. First, it is important to note that the photobleaching and brightness of the sensors will vary with the user’s imaging equipment, which can differ substantially between widefield microscopes (with various LED or halogen light sources for illumination), laser scanning systems (e.g., line scans with confocal systems), and area scanning systems using resonant scanners (as used in the current study). Under the same imaging settings, GCaMP8f and 8m exhibit comparable baseline fluorescence, whereas GCaMP6f and 6s are noticeably dimmer; because our aim is to assess each reagent’s potential under optimal conditions, we routinely adjust excitation/camera parameters before acquisition to place baseline fluorescence in an appropriate dynamic range. As an important addition to this study, motivated by the reviewer’s comments above, we now directly compare neuronal cytosolic GCaMP8m expression with our Scar8m sensor, showing higher sensitivity with Scar8m (now shown in the new Fig. 3F-H).

      Regarding photobleaching, GCaMP signals are generally stable, whereas mScarlet is more prone to bleaching: in presynaptic confocal area-scan recordings, the mScarlet channel drops by ~15% over 15 s, whereas GCaMP6s/8f/8m show no obvious bleaching over the same window (lines 549-553). In contrast, in presynaptic widefield imaging with an LED light source and a CCD camera, GCaMP8f shows ~8% loss over 15 s (lines 610-611). Similarly, for postsynaptic SynapGCaMP6f/8f/8m, confocal resonant area scans show no obvious bleaching over 60 s, while widefield imaging shows ~2–5% bleaching over 60 s (lines 634-638). Finally, in active-zone/BRP calcium imaging (confocal), mScarlet again bleaches by ~15% over 15 s, while GCaMP8f/8m show no obvious bleaching. The mScarlet-channel bleaching can be corrected in Huygens SVI (Bleaching correction, or via the Deconvolution Wizard). By contrast, we avoid applying bleaching correction to the green GCaMP channel when no clear decay is present, to prevent introducing artifacts. This information has now been added to the Methods (lines 548-553).
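      Bleaching percentages like those above can be estimated from a reference-channel trace by comparing its start and end. The minimal sketch below assumes a simple first-versus-last-frames comparison is adequate. The following are illustrative assumptions and may not match how Huygens SVI defines its correction:
      - the hypothetical function name;
      - the `n_edge` parameter;
      - using the start/end ratio as a "bleaching factor".

      Note that a 15% drop corresponds to a start/end ratio of about 1.18, so the two ways of reporting bleaching differ slightly.

```python
import numpy as np


def bleaching_metrics(trace, n_edge):
    """Estimate photobleaching from a fluorescence time course.

    Compares the mean of the first and last `n_edge` frames and returns
    (fractional_loss, start_over_end). Intended for a channel without
    stimulus-evoked changes, such as the mScarlet reference.
    """
    f = np.asarray(trace, dtype=float)
    start = f[:n_edge].mean()
    end = f[-n_edge:].mean()
    fractional_loss = 1.0 - end / start   # e.g. 0.15 for a 15% drop
    start_over_end = start / end          # ratio-style "bleaching factor" (assumed definition)
    return fractional_loss, start_over_end
```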

      (2) In several places, the authors compare the performance of their sensors with synthetic calcium dyes, but these comparisons are based on literature values rather than on side-by-side measurements in the same preparation. Given differences in imaging conditions across studies (e.g., illumination, camera sensitivity, and noise), parameters like indicator brightness, SNR, and photobleaching are difficult to compare meaningfully. Additionally, the limited frame rate used in the present study may preclude accurate assessment of rise times relative to fast chemical dyes. These issues weaken the claim made in the abstract that "...a ratiometric presynaptic GCaMP8m sensor accurately captures .. Ca²⁺ changes with superior sensitivity and similar kinetics compared to chemical dyes." The authors should clearly acknowledge these limitations and soften their conclusions. A direct comparison in the same system, if feasible, would greatly strengthen the manuscript.

      We absolutely agree with these points made by the reviewer, and have made a concerted effort to address them as follows:

      We have now directly compared presynaptic calcium responses on the same imaging system using the chemical dye Oregon Green BAPTA-1 (OGB-1), one of the primary synthetic calcium indicators used in our field. These experiments reveal that, compared to OGB-1, Scar8f exhibits faster decay kinetics, an improved signal-to-noise ratio and higher peak fluorescence responses (Scar8f: 0.32, OGB-1: 0.23). The rise time constants of the two indicators are comparable (both ~3 msecs), whereas the decay of Scar8f is faster than that of OGB-1 (Scar8f: ~40, OGB-1: ~60), indicating more rapid signal recovery. These results now directly demonstrate the superiority of the new GCaMP8 sensors we have engineered over conventional synthetic dyes, and are presented in the new Fig. 3A-E of the manuscript.

      We agree with the reviewer that, in the original submission, the relatively slow resonant area scans (~115 fps) limited the temporal resolution of our rise time measurements. To address this, we have re-measured rise times at higher frame rates, including kHz-rate line scans. For Scar8f, the rise time constant was 6.736 msec with resonant area scans at ~115 fps, but shortened to 2.893 msec when imaged at ~303 fps. This indicates that the slower original protocol overestimated the rise time constant (i.e., underestimated how quickly the indicator responds). Similarly, for Bar8m, area scans at ~118 fps yielded a rise time constant of 9.019 msec, whereas line scans at ~1085 fps reduced the rise time constant to 3.230 msec. These new measurements are now incorporated into the manuscript (Figs. 3, 4, and 6) to more accurately reflect the fast kinetics of these indicators.
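      A rough check of why frame rate matters here: if the true rise time constant is about 3 ms, a single-exponential rise reaches ~95% of its plateau within 3τ ≈ 9 ms. The sketch below counts how many frame intervals fit into that window at the frame rates quoted above. The 3τ criterion and the helper names are assumptions used only for illustration.

```python
def frame_interval_ms(fps):
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def samples_within(window_ms, fps):
    """Number of whole frame intervals that fit into a time window."""
    return int(window_ms // frame_interval_ms(fps))


tau_rise_ms = 3.0            # approximate rise time constant quoted for Scar8f and OGB-1
window_ms = 3 * tau_rise_ms  # a single-exponential rise reaches ~95% of its plateau by 3 tau
for fps in (115, 303, 1085):  # frame rates quoted in the response
    print(f"{fps} fps: {frame_interval_ms(fps):.2f} ms per frame, "
          f"{samples_within(window_ms, fps)} frame interval(s) across the rise")
```

      At ~115 fps only one frame interval spans the rise, so a fitted rise constant is constrained mainly by the sampling interval rather than by the indicator itself.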

      (3) The authors state that their indicators can now achieve measurements previously attainable with chemical dyes and electrophysiology. I encourage the authors to also consider how their tools might enable new measurements beyond what these traditional techniques allow. For example, while electrophysiology can detect summed mEPSPs across synapses, imaging could go a step further by spatially resolving the synaptic origin of individual mEPSP events. One could, for instance, image MN-Ib and MN-Is simultaneously without silencing either input, and detect mEPSP events specific to each synapse. This would enable synapse-specific mapping of quantal events - something electrophysiology alone cannot provide. Demonstrating even a proof-of-principle along these lines could highlight the unique advantages of the new tools by showing that they not only match previous methods but also enable new types of measurements.

      These are excellent points raised by the reviewer. In response, we have done the following: 

      We have now included a supplemental video as “proof-of-principle” data showing simultaneous imaging of SynapGCaMP8m quantal events at both MN-Is and -Ib, demonstrating that synapse-specific spatial mapping of quantal events can be obtained with this tool (see new Supplemental Video 1). 

      We have also included an additional discussion of the potential and limitations of these tools for new measurements beyond conventional approaches. This discussion is now presented in lines 419-421 in the manuscript.

      (4) For ratiometric measurements, it is important to estimate and subtract background signals in each channel. Without this correction, the computed ratio may be skewed, as background adds an offset to both channels and can distort the ratio. However, it is not clear from the Methods section whether, or how, background fluorescence was measured and subtracted.

      This is a good point, and we agree more clarification about how ratiometric measurements were made is needed. In response, we have now added the following to the Methods section (lines 548-568):

      Time-lapse videos were stabilized and, where needed, bleach-corrected prior to analysis, which visibly reduced frame-to-frame motion and intensity drift. In the presynaptic and active-zone mScarlet channel, a bleaching factor of ~1.15 was observed during the 15 sec recording. This bleaching can be corrected using the “Bleaching correction” tool in Huygens SVI. For presynaptic and active-zone GCaMP signals, there was minimal bleaching over these short imaging periods; therefore, the bleaching correction step was skipped for GCaMP. Both GCaMP and mScarlet channels were processed using the default settings in the Huygens SVI “Deconvolution Wizard” (with the exception of the bleaching correction option). Deconvolution used the CMLE algorithm with the Huygens default stopping criterion and a maximum of 30 iterations. The algorithm therefore either converged earlier or, if it did not converge, stopped at this 30-iteration limit. No other iteration settings were used across the GCaMP series. ROIs were drawn on the processed images using Fiji ImageJ software, and mean fluorescence time courses were extracted for the GCaMP and mScarlet channels, yielding F<sub>GCaMP</sub>(t) and F<sub>mScarlet</sub>(t). These F(t) traces were imported into CaFire with GCaMP assigned to Channel #1 (signal; required) and mScarlet to Channel #2 (baseline/reference; optional). If desired, the mScarlet signal could be smoothed in CaFire using a user-specified moving-average window to reduce high-frequency noise. In CaFire’s ΔR/R mode, the per-frame ratio was computed as R(t)=F<sub>GCaMP</sub>(t)/F<sub>mScarlet</sub>(t). A baseline ratio R<sub>0</sub> was estimated from the pre-stimulus period, and the final response was reported as ΔR/R(t)=[R(t)−R<sub>0</sub>]/R<sub>0</sub>. This normalizes GCaMP signals to the co-expressed mScarlet reference and thereby reduces variability arising from differences in sensor expression level or illumination across AZs.
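      The ratio computation described above can be sketched in a few lines of Python. This is a hypothetical re-implementation for illustration, not CaFire's source code. Its assumptions are:
      - the function name and arguments;
      - R0 taken as the mean ratio over a user-chosen number of pre-stimulus frames;
      - the optional moving average, applied to the mScarlet trace only;
      - per-channel background terms (defaulting to zero), included as an assumed step because additive background offsets distort ratios (comment 4).

```python
import numpy as np


def delta_r_over_r(f_gcamp, f_mscarlet, n_baseline, smooth_window=1,
                   bg_gcamp=0.0, bg_mscarlet=0.0):
    """Ratiometric response dR/R = (R - R0) / R0 with R = F_GCaMP / F_mScarlet.

    n_baseline: number of leading (pre-stimulus) frames used to estimate R0.
    smooth_window: moving-average width applied to the mScarlet trace (1 = off).
    bg_gcamp, bg_mscarlet: background intensities subtracted from each channel
        before the ratio is formed (hypothetical step; default 0 = no subtraction).
    """
    g = np.asarray(f_gcamp, dtype=float) - bg_gcamp
    s = np.asarray(f_mscarlet, dtype=float) - bg_mscarlet
    if smooth_window > 1:
        # Pad with edge values so the smoothed trace keeps its length
        # without zero-padding artifacts at the ends.
        left = smooth_window // 2
        right = smooth_window - 1 - left
        padded = np.pad(s, (left, right), mode="edge")
        s = np.convolve(padded, np.ones(smooth_window) / smooth_window, mode="valid")
    r = g / s                     # per-frame ratio R(t)
    r0 = r[:n_baseline].mean()    # baseline ratio from the pre-stimulus frames
    return (r - r0) / r0
```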

      (5) At line 212, the authors claim "... GCaMP8m showing 345.7% higher SNR over GCaMP6s....(Fig. 3D and E) ", yet the cited figure panels do not present any SNR quantification. Figures 3D and E only show response amplitudes and kinetics, which are distinct from SNR. The methods section also does not describe details for how SNR was defined or computed.

      This is another good point. We define SNR operationally as the fractional fluorescence change (ΔF/F). Traces were processed with CaFire, which estimates a per-frame baseline F<sub>0</sub>(t) with a user-configurable sliding window and percentile. In the Load File panel, users can specify both the size of the moving baseline window (as a half-width) and the desired percentile. The default settings are a half-width of 50 points and the 30th percentile. For each time point, F<sub>0</sub>(t) is therefore estimated from the lower 30% of values within a 101-point window centered on that point (the previous 50 to the next 50 samples). The signal was then computed as ΔF/F=[F(t)−F<sub>0</sub>(t)]/F<sub>0</sub>(t). This ΔF/F value is what we report as SNR throughout the manuscript, and this definition is now stated explicitly in the revised Methods (lines 686-693).
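      The sliding-percentile baseline can be sketched as follows. This illustrative implementation is not CaFire's code and rests on two assumptions:
      - "the lower 30% of values" is read as the 30th percentile of the window;
      - the window is truncated at the start and end of the recording (CaFire may handle the edges differently).

```python
import numpy as np


def sliding_percentile_baseline(trace, half_width=50, percentile=30):
    """Per-frame baseline F0(t): a percentile of a window centered on t.

    The window spans trace[t - half_width : t + half_width + 1], i.e. 101
    points for half_width=50, and is truncated at the ends of the recording.
    """
    f = np.asarray(trace, dtype=float)
    f0 = np.empty_like(f)
    for t in range(f.size):
        lo = max(0, t - half_width)
        hi = min(f.size, t + half_width + 1)
        f0[t] = np.percentile(f[lo:hi], percentile)
    return f0


def delta_f_over_f(trace, half_width=50, percentile=30):
    """dF/F = (F(t) - F0(t)) / F0(t) using the sliding-percentile baseline."""
    f = np.asarray(trace, dtype=float)
    f0 = sliding_percentile_baseline(f, half_width, percentile)
    return (f - f0) / f0
```

      Because the percentile is taken over a window much longer than a single transient, brief events barely shift F0(t), whereas slow drifts such as residual bleaching are tracked and removed.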

      (6) Lines 285-287 "As expected, summed ΔF values scaled strongly and positively with AZ size (Fig. 5F), reflecting a greater number of Cav2 channels at larger AZs". I am not sure about this conclusion. A positive correlation between summed ΔF values and AZ size could simply reflect more GCaMP molecules in larger AZs, which would give rise to larger total fluorescence change even at a given level of calcium increase.

      The reviewer makes a good point, one that we agree should be clarified. The reviewer is indeed correct that larger active zones should contain more BRP protein. This in turn will lead to a higher abundance of the Bar8f sensor and hence a larger GCaMP response simply because more sensor is present. However, the inclusion of the ratiometric mScarlet protein should accurately normalize the response and correct for this confound. The higher abundance of GCaMP is offset (normalized) by an equally (stoichiometrically) higher abundance of mScarlet. Therefore, when ΔR/R is calculated, the ratiometric analysis should correct for differences in GCaMP abundance at each AZ. We now use an improved BRP::mScarlet3::GCaMP8m (Bar8m) and compute ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). CaFire estimated R0 with a sliding 101-point window using the lowest 10% of values, and responses were reported as ΔR/R=[R−R0]/R0. Area-scan examples (118 fps) show robust ΔR/R transients (peaks ≈1.90 and 3.28; τ<sub>rise</sub> ≈9.0–9.3 ms; Fig. 6C, middle).
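      The normalization argument can be written out explicitly. Assume that both fluorophores scale with the same abundance factor k at a given AZ and that background has been subtracted from each channel. The per-molecule signals g(t) and s(t) are notation introduced here. Under these assumptions, k cancels from the ratio and therefore from ΔR/R:

```latex
F_{\mathrm{GCaMP}}(t) = k\,g(t), \quad F_{\mathrm{mScarlet}}(t) = k\,s(t)
\;\Longrightarrow\;
R(t) = \frac{k\,g(t)}{k\,s(t)} = \frac{g(t)}{s(t)},
\qquad
\frac{\Delta R}{R}(t) = \frac{R(t) - R_0}{R_0}
```

      If an additive background b remained in each channel, the ratio would instead be (k g + b<sub>G</sub>)/(k s + b<sub>S</sub>), which in general depends on k. This is why background handling (comment 4) matters for the AZ-size comparison.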

      We have now made these points more clearly in the manuscript (lines 700-704) and moved the Bar8f intensity vs. active zone size data to Table S1. Together, these revisions address the indicator-abundance confound (via mScarlet normalization).

      (7) Lines 313-314: "SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D)." This statement is quite confusing. In Figure 6D, the corresponding calcium and ephys traces look completely different and appear to reflect distinct sets of events. It was only after reading Figure 7 that I realized the traces shown in Figure 6D might not have been recorded simultaneously. The authors should clarify this point.

      Yes, we absolutely agree with this point, one shared by Reviewer 1. In response, we have removed the electrophysiological traces in Fig. 6 to clarify that just the calcium responses are shown, and save the direct comparison for the Fig. 7 data (now revised Fig. 8).

      (8) Lines 310-313: "SynapGCaMP8m .... striking an optimal balance between speed and sensitivity", and Lines 314-316: "We conclude that SynapGCaMP8m is an optimal indicator to measure quantal transmission events at the synapse." Statements like these are subjective. In the authors' own comparison, GCaMP8m is significantly slower than GCaMP8f (at least in terms of decay time), despite having a moderately higher response amplitude. It is therefore unclear why GCaMP8m is considered 'optimal'. The authors should clarify this point or explain their rationale for prioritizing response amplitude over speed in the context of their application.

      This is another good point that we agree with, as the “optimal” sensor will of course depend on the user’s objectives. We used the term “an optimal sensor” to indicate the sensor we judged best suited to our own uses; however, this point should be clarified and better discussed. In response, we have revised the relevant sections of the manuscript to explain why we chose the 8m sensors as striking an optimal balance of speed and sensitivity for our purposes. These sections now also discuss situations in which other sensor variants might be better suited. These changes are presented in lines 223-236 of the revised manuscript, and we thank the reviewer for these comments, which have improved our study.

      Minor comments

      (1) Please include the following information in the Methods section:

      (a) For Figures 3 and 4, specify how action potentials were evoked. What type of electrodes were used, where were they placed, and what amount of current or voltage was applied?

      We apologize for neglecting to include this information in the original submission. We have now added this information to the revised Methods section (lines 537-543).

      (b) For imaging experiments, provide information on the filter sets used for each imaging channel, and describe how acquisition was alternated or synchronized between the green and red channels in ratiometric measurements. Additionally, please report the typical illumination intensity (in mW/mm²) for each experimental condition.

      We thank the reviewer for this helpful comment. We have now added detailed information about the imaging configuration to the Methods (lines 512-528) with the following:

      Ca2+ imaging was conducted using a Nikon A1R resonant scanning confocal microscope equipped with a 60x/1.0 NA water-immersion objective (refractive index 1.33). GCaMP signals were acquired using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet/mCherry signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). ROIs focused on terminal boutons of MN-Ib or -Is motor neurons. For both channels, the confocal pinhole was set to a fixed diameter of 117.5 µm (approximately three Airy units under these conditions), which increases signal collection while maintaining adequate optical sectioning. Images were acquired as 256 × 64 pixel frames (two 12-bit channels) using bidirectional resonant scanning at a frame rate of ~118 frames/s; the scan zoom in NIS-Elements was adjusted so that this field of view encompassed the entire neuromuscular junction and was kept constant across experiments. In ratiometric recordings, the 488-nm (GCaMP) and 561-nm (mScarlet) channels were acquired in a sequential dual-channel mode using the same bidirectional resonant scan settings: for each time point, a frame was first collected in the green channel and then immediately in the red channel, introducing a small, fixed frame-to-frame temporal offset while preserving matched spatial sampling of the two channels.

      Directly measuring the absolute laser power at the specimen plane (and thus reporting illumination intensity in mW/mm²) is technically challenging on this resonant-scanning system. It would require inserting a power sensor into the beam path and perturbing the optical alignment; consequently, we are unable to provide reliable absolute mW/mm² values. Instead, we now report all relevant acquisition parameters (objective, numerical aperture, refractive index, pinhole size, scan format, frame rate, and fixed laser/detector settings). We also note that laser powers were kept constant within each experimental series and were chosen to minimize bleaching and phototoxicity while maintaining an adequate signal-to-noise ratio. These details, including the filter sets and acquisition settings, have now been added to the revised Methods section (lines 512-535).

      (2) Please clarify what the thin versus thick traces represent in Figures 3D, 3F, 4C, and 4E. Are the thin traces individual trials from the same experiment, or from different experiments/animals? Does the thick trace represent the mean/median across those trials, a fitted curve, or a representative example?

      We apologize this was not clearer in the original submission. Thin traces are individual stimulus-evoked trials (“sweeps”) acquired sequentially from the same muscle/NMJ in a single preparation; each panel is shown as a representative example of recordings collected across animals. The thick colored trace is the trial-averaged waveform (arithmetic mean) of those thin traces after alignment to stimulus onset and baseline subtraction (no additional smoothing beyond what is stated in the Methods). The thick black curve over the decay phase is a single-exponential fit used to estimate τ. Specifically, we fit the decay segment by linear regression on the natural-log–transformed, baseline-subtracted signal, which corresponds to fitting y = y<sub>peak</sub>·e<sup>−t/τdecay</sup> over the decay window (revised Fig. 4D and Fig. 5C legends).
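      The log-linear decay fit described above can be sketched as follows. This is a minimal illustration with hypothetical names. Least squares on ln(y) weights small, late samples differently from a nonlinear fit of the exponential itself, so the two approaches agree exactly only for noise-free data. Samples at or below zero are dropped because their logarithm is undefined.

```python
import numpy as np


def fit_decay_tau(t, y, peak_index, end_index):
    """Estimate (tau_decay, y_peak) from a baseline-subtracted transient.

    Fits ln(y) = ln(y_peak) - (t - t_peak) / tau by linear least squares
    over y[peak_index:end_index]. Non-positive samples are excluded.
    """
    t_seg = np.asarray(t, dtype=float)[peak_index:end_index]
    y_seg = np.asarray(y, dtype=float)[peak_index:end_index]
    keep = y_seg > 0
    t_rel = t_seg[keep] - t_seg[0]                       # time measured from the peak
    slope, intercept = np.polyfit(t_rel, np.log(y_seg[keep]), 1)
    return -1.0 / slope, np.exp(intercept)               # (tau_decay, y_peak)
```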

      (3) Please clarify what the reported sample size (n) represents. Does it indicate the number of experimental repeats, the number of boutons or PSDs, or the number of animals?

      Again, we apologize this was not clear. The sample size (n) refers to the number of animals (biological replicates), which is reported in Supplementary Table 1. All imaging was performed at muscle 6, abdominal segment A3. In each preparation, we imaged 1–2 NMJs. For each NMJ, we acquired 2–3 imaging stacks, each targeting a different set of 2–3 terminal boutons. For the standard stimulation protocol, we delivered 1-ms stimuli at 1 Hz and captured 14 stimuli in a 15 s time-series recording (lines 730-736).
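      Trial-averaged waveforms like those described in the response to minor comment (2) can be obtained by cutting each 15 s recording into stimulus-aligned sweeps. The following are hypothetical in the sketch below:
      - the stimulus onsets (assumed to fall at 1, 2, ..., 14 s for 1 Hz stimulation);
      - the pre/post window lengths;
      - the function name.

      In practice, onsets would come from a recorded stimulus trigger.

```python
import numpy as np


def stimulus_aligned_average(trace, fps, stim_times_s, pre_s, post_s):
    """Cut a trace into stimulus-aligned sweeps, baseline-subtract, and average.

    trace: 1-D fluorescence (or dF/F) time course sampled at `fps` frames/s.
    stim_times_s: stimulus onset times in seconds (assumed known).
    pre_s, post_s: window kept before and after each onset; pre_s must span
        at least one frame because its mean is used as the sweep baseline.
    Returns (sweeps, mean_sweep); sweeps has one row per complete stimulus.
    """
    x = np.asarray(trace, dtype=float)
    n_pre = int(round(pre_s * fps))
    n_post = int(round(post_s * fps))
    sweeps = []
    for onset in stim_times_s:
        i = int(round(onset * fps))
        if i - n_pre < 0 or i + n_post > x.size:
            continue  # skip stimuli whose window runs off the recording
        seg = x[i - n_pre:i + n_post]
        sweeps.append(seg - seg[:n_pre].mean())  # subtract the pre-stimulus mean
    sweeps = np.array(sweeps)
    return sweeps, sweeps.mean(axis=0)
```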

      Reviewer #3

      Genetically encoded calcium indicators (GECIs) are essential tools in neurobiology and physiology. Technological constraints in targeting and kinetics of previous versions of GECIs have limited their application at the subcellular level. Chen et al. present a set of novel tools that overcome many of these limitations. Through systematic testing in the Drosophila NMJ, they demonstrate improved targeting of GCaMP variants to synaptic compartments and report enhanced brightness and temporal fidelity using members of the GCaMP8 series. These advancements are likely to facilitate more precise investigation of synaptic physiology.

      This is a comprehensive and detailed manuscript that introduces and validates new GECI tools optimized for the study of neurotransmission and neuronal excitability. These tools are likely to be highly impactful across neuroscience subfields. The authors are commended for publicly sharing their imaging software.

      This manuscript could be improved by further testing the GECIs across physiologically relevant ranges of activity, including at high frequency and over long imaging sessions. The authors provide a custom software package (CaFire) for Ca2+ imaging analysis; however, to improve clarity and utility for future users, we recommend providing references to existing Ca2+ imaging tools for context and elaborating on some conceptual and methodological aspects, with more guidance for broader usability. These enhancements would strengthen this already strong manuscript.

      We thank the Reviewer for their overall positive evaluation and comments. 

      Major comments:

      (1) Evaluation of the performance of new GECI variants using physiologically relevant stimuli and frequency. The authors took initial steps towards this goal, but it would be helpful to determine the performance of the different GECIs at higher electrical stimulation frequencies (at least as high as 20 Hz) and for longer (10 seconds) (Newman et al., 2017). This will help scientists choose the right GECI for studies testing the reliability of synaptic transmission, which generally requires prolonged high-frequency stimulation.

      We appreciate this point by the reviewer and agree it would be of interest to evaluate sensor performance with higher-frequency stimulation and for a longer duration. In response, we tested a variety of stimulation protocols at higher frequencies and for longer durations. However, we found it difficult to separate individual responses given the decay kinetics of all calcium sensors. Hence, we elected not to include these data in the revised manuscript. Instead, we have now included an evaluation of the sensors with 20 Hz electrical stimulation for ~1 sec, using a direct comparison of Scar8f with OGB-1. These data are now presented in the new Fig. 3D,E and discussed in the manuscript (lines 396-403).

      (2) CaFire.

      The authors mention, in line 182: 'Current approaches to analyze synaptic Ca2+ imaging data either repurpose software designed to analyze electrophysiological data or use custom software developed by groups for their own specific needs.' References should be provided. CaImAn comes to mind (Giovannucci et al., 2019, eLife), but we think there are other software programs aimed at analyzing Ca2+ imaging data that would permit such analysis.

      Thank you for the thoughtful question. At this stage, we’re unable to provide a direct comparison with existing analysis workflows. In surveying prior studies that analyze Drosophila NMJ Ca²⁺ imaging traces, we found that most groups preprocess images in Fiji/ImageJ and then rely on their own custom-made MATLAB or Python scripts for downstream analysis (see Blum et al. 2021; Xing and Wu 2018). Because these pipelines vary widely across labs, a standardized head-to-head evaluation isn’t currently feasible. With CaFire, our goal is to offer a simple, accessible tool that does not require coding experience and minimizes variability introduced by custom scripts. We designed CaFire to lower the barrier to entry, promote reproducibility, and make quantal event analysis more consistent across users. We have added references to the sentence mentioned above.

      Regarding existing software that the reviewer mentioned – CaImAn (Giovannucci et al. 2019): We evaluated CaImAn, which is a powerful framework designed for large-scale, multicellular calcium imaging (e.g., motion correction, denoising, and automated cell/ROI extraction). However, it is not optimized for the per-event kinetics central to our project, such as extracting rise and decay times for individual quantal events at single synapses. Achieving this level of granularity would typically require additional custom Python scripting and parameter tuning within CaImAn’s code-centric interface. This runs counter to CaFire’s design goals of a no-code, task-focused workflow that enables users to analyze miniature events quickly and consistently without specialized programming expertise.

      Regarding Igor Pro (WaveMetrics), (Müller et al. 2012): Igor Pro is another platform that can be used to analyze calcium imaging signals. However, it is commercial (paid) software and generally requires substantial custom scripting to fit the specific analyses we need. In practice, it does not offer a simple, open-source, point-and-click path to per-event kinetic quantification, which is what CaFire is designed to provide.

      The authors should be commended for making their software publicly available, but there are some questions:

      How does CaFire compare to existing tools?

      As mentioned above, we have not been able to adapt the custom scripts used by various labs for our purposes, including software developed in MATLAB (Blum et al. 2021), Python (Xing and Wu 2018), and Igor (Müller et al. 2012). Some in the field use more widely available software, including Nikon Elements (Chen and Huang 2017) and CaImAn (Giovannucci et al. 2019). However, these platforms are not optimized for the per-event kinetics central to our project, such as extracting rise and decay times for individual quantal events at single synapses. We have added more details about CaFire, focusing mainly on the workflow and measurements, to highlight its advantages. CaFire provides a no-code, standardized pipeline with:
      - automated miniature-event detection;
      - per-event metrics (e.g., amplitude, rise time τ, decay time τ);
      - optional ΔR/R support;
      - an auto-partition feature.

      Collectively, these features make CaFire simpler to operate without programming expertise, more transparent and reproducible across users, and better aligned with the event-level kinetics required for this project.
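      To illustrate what automated miniature-event detection involves, the sketch below finds local maxima above a threshold in a ΔF/F trace. It then discards any peak that falls too close to a larger one. It is a generic, NumPy-only example, not CaFire's detection algorithm. The threshold and minimum-separation values are user choices, and the function name is hypothetical.

```python
import numpy as np


def detect_events(dff, threshold, min_separation):
    """Greedy peak picking on a dF/F trace.

    A frame is a candidate if it exceeds `threshold` and is a local maximum
    (>= its left neighbour and > its right neighbour). Candidates are accepted
    from largest to smallest; any candidate closer than `min_separation`
    frames to an already accepted peak is discarded.
    Returns (peak_indices, peak_amplitudes) in time order.
    """
    x = np.asarray(dff, dtype=float)
    interior = np.arange(1, x.size - 1)
    is_peak = ((x[interior] > threshold)
               & (x[interior] >= x[interior - 1])
               & (x[interior] > x[interior + 1]))
    candidates = interior[is_peak]
    accepted = []
    for i in candidates[np.argsort(x[candidates])[::-1]]:  # largest first
        if all(abs(i - j) >= min_separation for j in accepted):
            accepted.append(i)
    peaks = np.array(sorted(accepted), dtype=int)
    return peaks, x[peaks]
```

      Per-event rise and decay constants could then be fitted in a window around each returned index.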

      Very few details about the Huygens deconvolution algorithms and input settings were provided in the methods or text (outside of MLE algorithm used in STED images, which was not Ca2+ imaging). Was it blind deconvolution? Did the team distill the point-spread function for the fluorophores? Were both channels processed for ratiometric imaging? Were the same settings used for each channel? Importantly, please include SVI Huygens in the 'Software and Algorithms' Section of the methods.

      We thank the reviewer for raising this important point. We have now expanded the Methods to describe our use of Huygens in more detail and have added SVI Huygens Professional (Scientific Volume Imaging, Hilversum, The Netherlands) to the “Software and Algorithms” section. For Ca²⁺ imaging data, time-lapse stacks were processed in the Huygens Deconvolution Wizard using the standard estimation algorithm (CMLE). This is not a blind deconvolution procedure. Instead, Huygens computes a theoretical point-spread function (PSF) from the full acquisition metadata (objective NA, refractive index, voxel size/sampling, pinhole, excitation/emission wavelengths, etc.); if refractive index values are provided and there is a mismatch, the PSF is adjusted to account for spherical aberration. We did not experimentally distill PSFs from bead measurements, as Huygens’ theoretical PSFs are sufficient for our data.

      Both green (GCaMP) and red (mScarlet) channels were processed for ratiometric imaging using the same workflow (stabilization, optional bleaching correction, and deconvolution within Huygens). For each channel, the PSF, background, and SNR were estimated automatically by the same built-in algorithms, so the underlying procedures were identical even though the numerical values differ between channels because of their distinct wavelengths and noise characteristics. Importantly, Huygens normalizes each PSF to unit total intensity, such that the deconvolution itself does not add or remove signal and therefore preserves intensity ratios between channels; only background subtraction and bleaching correction can change absolute fluorescence values. For the mScarlet channel, where we observed modest bleaching (~1.10 over 15 sec), we applied Huygens’ bleaching correction and visually verified that similar structures maintained comparable intensities after correction. For presynaptic GCaMP signals, bleaching over these short recordings was negligible, so we omitted the bleaching-correction step to avoid introducing multiplicative artifacts. This workflow ensures that ratiometric ΔR/R measurements are based on consistently processed, intensity-conserving deconvolved images in both channels.

      The number of deconvolution iterations could have had an effect when comparing GCAMP series; please provide an average number of iterations used for at least one experiment. For example, Figure 3, Syt::GCAMP6s, Scar8f & Scar8m, and, if applicable, the maximum number of permissible iterations.

      We thank the reviewer for this comment. For all Ca²⁺ imaging datasets, deconvolution in Huygens was performed using the recommended default settings of the CMLE algorithm with a maximum of 30 iterations. The stopping criterion was left at the Huygens default, so the algorithm either converged earlier or, if convergence was not reached, terminated at this 30-iteration limit. No other iteration settings were used across the GCaMP series (lines 555-559).
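
      Huygens' CMLE implementation is proprietary, so the sketch below uses the classic Richardson–Lucy iteration instead (a maximum-likelihood deconvolution under a Poisson noise model) to illustrate two points made in this response: normalizing the PSF to unit total intensity means each update conserves total intensity, and the loop stops either when a convergence criterion is met or at a 30-iteration cap. The Gaussian stand-in PSF, the tolerance value, and the function names are illustrative assumptions, not the Huygens settings or API.

```python
import numpy as np
from scipy.signal import fftconvolve


def gaussian_psf(size=9, sigma=1.5):
    """Odd-sized 2-D Gaussian stand-in PSF, normalized to unit total intensity."""
    ax = np.arange(size) - size // 2
    psf = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()


def richardson_lucy(image, psf, max_iter=30, tol=1e-4, eps=1e-12):
    """Richardson-Lucy deconvolution capped at max_iter iterations.

    Stops early when the relative L1 change between successive estimates
    falls below tol; returns the estimate and the number of iterations run.
    """
    image = np.asarray(image, dtype=float)
    psf_mirror = psf[::-1, ::-1]
    estimate = np.full_like(image, image.mean())
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = image / np.maximum(blurred, eps)   # data / model prediction
        updated = estimate * fftconvolve(ratio, psf_mirror, mode="same")
        change = np.abs(updated - estimate).sum() / estimate.sum()
        estimate = updated
        if change < tol:
            break
    return estimate, n_iter
```

In this sketch, with an odd-sized, unit-sum PSF and matching "same"-mode convolutions, the sum of each new estimate equals the sum of the input image up to floating-point error. Deconvolution therefore redistributes intensity without adding or removing it, which is the property relied on above when comparing channels.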

      Please clarify if the 'Express' settings in Huygens changed algorithms or shifted input parameters.

      We appreciate the reviewer’s question regarding the Huygens “Express” settings. For clarity, we note that all Ca²⁺ imaging data reported in this manuscript were deconvolved using the “Deconvolution Wizard”, not the “Deconvolution Express” mode. In the Wizard, we explicitly selected the CMLE algorithm (or GMLE in a few STED-related cases, as recommended by SVI), with the recommended maximum of 30 iterations and other recommended settings, while allowing Huygens to auto-estimate background and SNR for each channel. Bleaching correction was toggled manually per channel (applied to mScarlet when bleaching was evident, omitted for GCaMP when bleaching was negligible), as described in the revised Methods (lines 553-559).

      By contrast, the Deconvolution Express tool in Huygens is a fully automated front-end that can internally adjust both the choice of deconvolution algorithm (e.g., CMLE vs. GMLE/QMLE) and key input parameters such as SNR, number of iterations, and quality threshold based on the selected “smart profile” and the image metadata. In preliminary tests on our datasets, Express sometimes produced results that were either overly smoothed or showed subtle artifacts, so we did not use it for any data included in this study. Instead, we relied exclusively on the Wizard with explicitly controlled settings to ensure consistency and transparency across all GCaMP series and ratiometric analyses.

      We suggest including a sample data set, perhaps in Excel, so that future users can beta test on and organize their data in a similar fashion.

      We agree that this would be useful, a point shared by R1 above. In response, we have added a sample data set to the GitHub site and included sample ImageJ data along with screenshots to explain the analysis in more detail. These improvements are discussed in the manuscript (lines 705-708).

      (3) While the challenges of AZ imaging are mentioned, it is not discussed how the authors tackled each one. What is defined as an active zone? Active zones are usually identified under electron microscopy. Arguably, the limitation of GCaMP-based sensors targeted to individual AZs, being unable to resolve local Ca2+ changes at individual boutons reliably, might be incorrect. This could be a limitation of the optical setup being used here. Please discuss further. What sensor performance do we need to achieve this performance level, and/or what optical setup would we need to resolve such signals?

      We appreciate the reviewer’s thoughtful comments and agree that the technical challenges of active zone (AZ) Ca²⁺ imaging merit further clarification. As is the convention in our field, we defined AZs as individual BRP puncta at NMJs. These BRP puncta colocalize with individual puncta of other AZ components, including CAC, RBP, Unc13, etc. ROIs were drawn tightly over individual BRP puncta, and only clearly separable spots were included.

      To tackle the specific obstacles of AZ imaging (small signal volume, high AZ density, and limited photon budget at high frame rates), we implemented both improved sensors and optimized analysis (Fig. 6). First, we introduced a ratiometric AZ-targeted indicator, BRP::mScarlet3::GCaMP8m (Bar8m), and computed ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). Under our standard resonant area-scan conditions (~118 fps), Bar8m produces robust ΔR/R transients at individual AZs (example peaks ≈ 3.28; τ<sub>rise</sub>≈9.0 ms; Fig. 6C, middle), indicating that single-AZ signals can be detected reproducibly when AZs are optically resolvable.
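
      For readers less familiar with ratiometric measurements, the sketch below computes ΔR/R from two per-frame ROI intensity traces, with R(t) = F_GCaMP(t)/F_mScarlet(t) and the baseline ratio R0 taken as the mean ratio over a pre-event window. The baseline window, the assumption that the inputs are already background-subtracted, and the function name `delta_r_over_r` are illustrative choices, not the exact CaFire procedure.

```python
import numpy as np


def delta_r_over_r(f_gcamp, f_mscarlet, n_baseline):
    """ΔR/R = (R(t) - R0) / R0 with R(t) = F_GCaMP(t) / F_mScarlet(t).

    Inputs are assumed to be background-subtracted ROI mean intensities per frame;
    R0 is the mean ratio over the first n_baseline frames (an assumption).
    """
    r = np.asarray(f_gcamp, dtype=float) / np.asarray(f_mscarlet, dtype=float)
    r0 = r[:n_baseline].mean()
    return (r - r0) / r0
```

Because R(t) is a ratio, any multiplicative change shared by both channels in a given frame (for example, slight focus drift) cancels out, which is the main motivation for pairing GCaMP with a calcium-insensitive red reference.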

      Second, we increased temporal resolution using high-speed Galvano line-scan imaging (~1058 fps), which markedly sharpened the apparent kinetics (τ<sub>rise</sub>≈3.23 ms) and revealed greater between-AZ variability (Fig. 6C, right; 6D–E). Population analyses show that line scans yield much faster rise times than area scans (Fig. 6D) and a dramatically higher fraction of significantly different AZ pairs (8.28% and 4.14% for Bar8f and Bar8m area scans vs. 78.62% for Bar8m line scans; lines 721-725), uncovering pronounced AZ-to-AZ heterogeneity in Ca²⁺ signals. Together, these revisions demonstrate that under our current confocal configuration, AZ-targeted GCaMP8m can indeed resolve local Ca²⁺ changes at individual, optically isolated boutons.
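
      The sampling intervals implied by the two acquisition rates quoted above make the difference concrete; the rise times are the example values from Fig. 6C:

$$
\begin{aligned}
\Delta t_{\text{area}} &= \frac{1}{118\ \text{s}^{-1}} \approx 8.47\ \text{ms}, &
\Delta t_{\text{line}} &= \frac{1}{1058\ \text{s}^{-1}} \approx 0.945\ \text{ms},\\
\frac{\tau_{\text{rise}}^{\text{area}}}{\Delta t_{\text{area}}} &\approx \frac{9.0\ \text{ms}}{8.47\ \text{ms}} \approx 1.1, &
\frac{\tau_{\text{rise}}^{\text{line}}}{\Delta t_{\text{line}}} &\approx \frac{3.23\ \text{ms}}{0.945\ \text{ms}} \approx 3.4.
\end{aligned}
$$

      The example area-scan rise time therefore spans only about one frame interval, which suggests that it is limited by the sampling rate, whereas the line scan samples the rising phase roughly three to four times.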

      We have revised the Discussion to clarify that our original statement about the limitations of AZ-targeted GCaMPs refers specifically to this combination of sensor and optical setup, rather than an absolute limitation of AZ-level Ca²⁺ imaging. In our view, further improvements in baseline brightness and dynamic range (ΔF/F or ΔR/R per action potential), combined with sub-millisecond kinetics and minimal buffering, together with optical configurations that provide smaller effective PSFs and higher photon collection (e.g., higher-NA objectives, optimized 2-photon or fast line-scan modalities, and potentially super-resolution approaches applied to AZ-localized indicators), are likely to be required to achieve routine, high-fidelity Ca²⁺ measurements at every individual AZ within a neuromuscular junction.

      (4) In Figure 5: Only GCAMP8f (Bar8f fusion protein) is tested here. Consider including testing with GCAMP8m. This is particularly relevant given that GCAMP8m was a more successful GECI for subcellular post-synaptic imaging in Figure 6.

      We appreciate this point and the request from Reviewer 3. The main limitation for detecting local calcium changes at AZs is the speed of the calcium sensor, and hence we used the fastest available sensor (GCaMP8f) to test Bar8f. Replacing GCaMP8f with GCaMP8m would indeed be predicted to enhance sensitivity (SNR); however, because GCaMP8m does not have faster kinetics than GCaMP8f, it is unlikely to be a more successful GECI for visualizing local calcium differences at AZs. 

      That being said, we agree that the Bar8m tool, including the improved mScarlet3 indicator, would likely be of interest and use to the field. Fortunately, we had engineered the Bar8m sensor while this manuscript was in review, and just recently received transgenic flies. We have evaluated this sensor, as requested by the reviewer, and included our findings in Fig. 1 and 6. In short, while the sensitivity is indeed enhanced in Bar8m compared to Bar8f, the kinetics remain insufficient to capture local AZ signals. These findings are discussed in the revised manuscript (lines 424-442, 719-730), and we appreciate the reviewer for raising these important points.

      In earlier experiments, Bar8f yielded relatively weak fluorescence, so we traded frame rate for image quality during resonant area scans (~60 fps). After switching to Bar8m, the signal was bright enough to restore our standard 118 fps area-scan setting. Nevertheless, even with dual-channel resonant area scans and ratiometric (GCaMP/mScarlet) analysis, AZ-to-AZ heterogeneity remained difficult to resolve. Because Ca²⁺ influx at individual active zones evolves on sub-millisecond timescales, we adopted a high-speed single-channel Galvano line-scan (~1 kHz) to capture these rapid transients. We first acquired a brief area image to localize AZ puncta, then positioned the line-scan ROI through the center of the selected AZ. This configuration provided the temporal resolution needed to uncover heterogeneity that was under-sampled in area-scan data. Consistent with this, Bar8m line-scan data showed markedly higher AZ heterogeneity (significant AZ-pair rate ~79%, vs. ~8% for Bar8f area scans and ~4% for Bar8m area scans), highlighting Bar8m’s suitability for quantifying AZ diversity. We have updated the text, Methods, and figure legend accordingly.

      (5) Figure 5D and associated datasets: Why was Interquartile Range (IQR) testing used instead of Z-scoring? Generally, IQR is used when the data is heavily skewed or is not normally distributed. Normality was tested using the D'Agostino & Pearson omnibus normality test, and normality was not violated. Please explain your reasoning for the approach in statistical testing. Correlation coefficients in Figures 5 E & F should also be reported on the graph, not just the table. In Supplementary Table 1, the sub-table between 4D-F and 5E-F, which describes the IQR, should be labeled as such and contain identifiers in the rows describing which quartile is described. The table description should be below. We would recommend a brief table description for each sub-table.

      Thank you for this helpful suggestion. We have updated the analysis in two complementary ways. First, we now perform paired two-tailed t-tests between every two AZs within the same preparation (pairwise AZ–AZ comparisons of peak responses). At α<0.05, the fraction of significant AZ pairs is ~79% for Bar8m line-scan data versus ~8% for Bar8f area-scan data, indicating markedly greater AZ-to-AZ diversity when measured at high temporal resolution. Second, to visually mark outlying AZs, we re-computed the IQR (Q1–Q3) from the individual values collected from each AZ (15 data points per AZ, 30 AZs per genotype) and marked AZs whose mean response falls above Q3 or below Q1; here the IQR serves solely as a robust dispersion reference rather than as a hypothesis test. Both analyses support the same observation: Bar8m line-scan data reveal substantially higher AZ heterogeneity than Bar8f and Bar8m area-scan data. We have revised the Methods, figure panels, and legends accordingly (t-test details; explicit “IQR (Q1–Q3)” labeling; significant AZ-pair rates reported on the plots) (lines 719-730).
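
      The two analyses described above can be sketched as follows. The sketch assumes that the peak responses of one preparation are arranged as an array of shape (n_AZ, n_events), in which column j holds the j-th recorded event for every AZ; this event-wise correspondence is what a paired test requires. The array layout and the function names are illustrative assumptions, not the authors' analysis code.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_rel


def significant_pair_fraction(peaks, alpha=0.05):
    """Fraction of AZ pairs whose peak responses differ (paired two-tailed t-test).

    peaks: array of shape (n_az, n_events); row i holds AZ i's peak responses and
    column j is assumed to be the same recorded event across AZs.
    """
    peaks = np.asarray(peaks, dtype=float)
    pairs = list(combinations(range(peaks.shape[0]), 2))
    n_sig = sum(ttest_rel(peaks[i], peaks[j]).pvalue < alpha for i, j in pairs)
    return n_sig / len(pairs)


def iqr_outlying_azs(peaks):
    """Indices of AZs whose mean response lies above Q3 or below Q1 of all pooled values."""
    peaks = np.asarray(peaks, dtype=float)
    q1, q3 = np.percentile(peaks.ravel(), [25, 75])
    means = peaks.mean(axis=1)
    return np.flatnonzero((means > q3) | (means < q1))
```

As in the response, only the t-tests are used for inference here; the IQR function simply flags AZs for display. With 30 AZs there are 30 × 29 / 2 = 435 pairs per genotype, so the reported percentages are fractions of that many comparisons.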

      (6) Figure 6 and associated data. The authors mention: ' SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D).' If that was the case, shouldn't the ephys and optical signal show some sort of correlation? The data presented in Figure 6D show no such correlation. Where do these signals come from? It is important to show the ROIs on a reference image.

      We apologize that this was not clear; similar points were raised by R1 and R2. We were simply showing separate (uncorrelated) sample traces of electrophysiological and calcium imaging data. Given how confusing this presentation turned out to be, and because we show the correlated ephys and calcium imaging events in Fig. 7, we have elected to remove the uncorrelated electrophysiological events from Fig. 6 and focus on the calcium imaging events (now Figures 7 and 8).

      Figure 7B: Were Ca2+ transients not associated with mEPSPs ever detected? What is the rate of such events?

      This is an astute question. Yes indeed, during simultaneous calcium imaging and current clamp electrophysiology recordings, we occasionally observed GCaMP transients without a detectable mEPSP in the electrophysiological trace. This may reflect the detection limit of electrophysiology for very small minis; with our noise level and the technical limitation of the recording rig, events < ~0.2 mV cannot be reliably detected, whereas the optical signal from the same quantal event might still be detected. The fraction of calcium-only events was ~1–10% of all optical miniature events, depending on genotype (higher in lines with smaller average minis). These calcium-only detections were low-amplitude and clustered near the optical threshold (lines 361-365).

      Minor comments

      (1) It should be mentioned in the text or figure legend whether images in Figure 1 were deconvolved, particularly since image pre-processing is only discussed in Figure 2 and after.

      We thank the reviewer for pointing this out. Yes, the confocal images shown in Figure 1 were also deconvolved in Huygens using the CMLE-based workflow described in the revised Methods. We applied deconvolution to improve contrast, reduce out-of-focus blur, and better resolve the morphology of presynaptic boutons, active zones, and postsynaptic structures, so that the localization of each sensor is more clearly visualized. We have now explicitly stated in the Fig. 1 legend and Methods (lines 575-577) that these images were deconvolved prior to display. 

      (2) The abbreviation, SNR, signal-to-noise ratio, is not defined in the text.

      We have corrected this error and thank the reviewer for pointing this out.

      (3) Please comment on the availability of fly stocks and molecular constructs.

      We have clarified that all fly stocks and molecular constructs will be shared upon request (lines 747-750). We are also in the process of depositing the new Scar8f/m, Bar8f/m, and SynapGCaMP sensors to the Bloomington Drosophila Stock Center for public dissemination.

      (4) Please add detection wavelengths and filter cube information for live imaging experiments for both confocal and widefield.

      We thank the reviewer for this helpful suggestion. We have now added the detection wavelengths and filter cube configurations for both confocal and widefield live imaging to the Methods.

      For confocal imaging, GCaMP signals were acquired on a Nikon A1R system using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). Both channels were detected with GaAsP detectors under the same pinhole and scan settings described above (lines 512-517).

      For widefield imaging, GCaMP was recorded using a GFP filter cube (LED excitation ~470/40 nm; emission ~525/50 nm), which is now explicitly described in the revised Methods section (lines 632-633).

      (5) Please include a mini frequency analysis in Supplemental Figure S1.

      We apologize for not including this information in the original submission. This is now included in the Supplemental Figure S1.

      (6) In Figure S1B, consider flipping the order of EPSP (currently middle) and mEPSP (currently left), to easily guide the reader through the quantification of Figure S1A (EPSPs, top traces & mEPSPs, bottom traces).

      We agree these modifications would improve readability and clarity. We have now re-ordered the electrophysiological quantifications in Fig. S1B as requested by the reviewer.

      (7) Figure 6C: Consider labeling with sensor name instead of GFP.

      We agree here as well, and have removed “GFP” and instead added the GCaMP variant to the heatmap in Fig. 7C.

      (8) Figure 6E, 7B, 7E: Main statistical differences highlighting sensor performance should be represented on the figures for clarity.

      We did not show these differences in the original submission in an effort to keep the figures “clean” and clear, and instead reported the detailed statistical significance in Table S1. However, we agree with the reviewer that these differences would be easier to see in the Fig. 6E and 7B,E graphs. This information has now been added to Figs. 7 and 8.

      (9) Please report if the significance tested between the ephys mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-) is the same as for Ca2+ mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-). These should also exhibit a very high correlation (mEPSP (mV) vs Ca2+ mini deltaF/F). These tests would significantly strengthen the final statement of "SynapGCaMP8m can capture physiologically relevant differences in quantal events with similar sensitivity as electrophysiology."

      We agree that adding the more detailed statistical analysis requested by the reviewer would strengthen the evidence for the resolution of quantal calcium imaging using SynapGCaMP8m. We have included the statistical significance between the ephys and calcium minis in Fig. 8 and included the following in the revised methods (lines 358-361), the Fig. 8 legend and Table S1:

      Using two-sample Kolmogorov–Smirnov (K–S) tests, we found that SynapGCaMP8m Ca²⁺ minis (ΔF/F, Fig. 8E) differ significantly across all genotype pairs (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>; all p < 0.0001). The genotype rank order of the group means (±SEM) is IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.967 ± 0.036; 0.713 ± 0.021; 0.427 ± 0.017; n=69, 65, 59). For electrophysiological minis (mEPSP amplitude, Fig. 8F), K–S tests likewise show significant differences for the same comparisons (all p < 0.0001) with D statistics of 0.1854, 0.3647, and 0.4043 (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>, respectively). Group means (±SEM) again follow IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.824 ± 0.017 mV; 0.636 ± 0.015 mV; 0.383 ± 0.007 mV; n=41 each). These K–S results demonstrate identical significance and rank order across modalities, supporting our conclusion that SynapGCaMP8m resolves physiologically relevant quantal differences with sensitivity comparable to electrophysiology.
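
      A minimal sketch of the pairwise two-sample K–S comparisons with SciPy's `ks_2samp`, assuming each genotype's per-event amplitudes are held in a dictionary keyed by genotype label. The dictionary layout and the helper names are illustrative assumptions, and the synthetic numbers used to exercise the sketch are not the data reported above.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ks_2samp


def pairwise_ks(samples):
    """Two-sample Kolmogorov-Smirnov test for every pair of groups.

    samples: dict mapping a genotype label to a 1-D array of event amplitudes.
    Returns {(label_a, label_b): (D statistic, p-value)}.
    """
    results = {}
    for a, b in combinations(samples, 2):
        res = ks_2samp(samples[a], samples[b])
        results[(a, b)] = (res.statistic, res.pvalue)
    return results


def rank_by_mean(samples):
    """Group labels ordered from largest to smallest mean amplitude."""
    return sorted(samples, key=lambda k: np.mean(samples[k]), reverse=True)
```

Because `ks_2samp` compares entire cumulative distributions, it is sensitive to differences in shape as well as location. This is why the rank order of group means is established separately, as in the response above.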

      References

      Blum, Ian D., Mehmet F. Keleş, El-Sayed Baz, Emily Han, Kristen Park, Skylar Luu, Habon Issa, Matt Brown, Margaret C. W. Ho, Masashi Tabuchi, Sha Liu, and Mark N. Wu. 2021. 'Astroglial Calcium Signaling Encodes Sleep Need in Drosophila', Current Biology, 31: 150-62.e7.

      Chen, Y., and L. M. Huang. 2017. 'A simple and fast method to image calcium activity of neurons from intact dorsal root ganglia using fluorescent chemical Ca(2+) indicators', Mol Pain, 13: 1744806917748051.

      Giovannucci, Andrea, Johannes Friedrich, Pat Gunn, Jérémie Kalfon, Brandon L. Brown, Sue Ann Koay, Jiannis Taxidis, Farzaneh Najafi, Jeffrey L. Gauthier, Pengcheng Zhou, Baljit S. Khakh, David W. Tank, Dmitri B. Chklovskii, and Eftychios A. Pnevmatikakis. 2019. 'CaImAn an open source tool for scalable calcium imaging data analysis', eLife, 8: e38173.

      Müller, M., K. S. Liu, S. J. Sigrist, and G. W. Davis. 2012. 'RIM controls homeostatic plasticity through modulation of the readily-releasable vesicle pool', J Neurosci, 32: 16574-85.

      Wu, Yifan, Keimpe Wierda, Katlijn Vints, Yu-Chun Huang, Valerie Uytterhoeven, Sahil Loomba, Fran Laenen, Marieke Hoekstra, Miranda C. Dyson, Sheng Huang, Chengji Piao, Jiawen Chen, Sambashiva Banala, Chien-Chun Chen, El-Sayed Baz, Luke Lavis, Dion Dickman, Natalia V. Gounko, Stephan Sigrist, Patrik Verstreken, and Sha Liu. 2025. 'Presynaptic Release Probability Determines the Need for Sleep', bioRxiv: 2025.10.16.682770.

      Xing, Xiaomin, and Chun-Fang Wu. 2018. 'Unraveling Synaptic GCaMP Signals: Differential Excitability and Clearance Mechanisms Underlying Distinct Ca<sup>2+</sup> Dynamics in Tonic and Phasic Excitatory, and Aminergic Modulatory Motor Terminals in Drosophila', eNeuro, 5: ENEURO.0362-17.2018.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Hao Jiang et al. describes a systematic approach to identify proline hydroxylation proteins. The authors implemented a proteomic strategy with HILIC-chromatographic separation and reported an identification of 4993 sites from HEK293 cells (4 replicates) and 3247 sites from RCC4 cells (3 replicates), with 1412 sites overlapping between the two cell lines. From the analysis, the authors identified 225 sites and 184 sites with the HyPro diagnostic ion from 293 and RCC4 cells, respectively. The identifications were validated by analyzing a few synthetic peptides, with a specific focus on Repo-man (CDCA2), through comparing MS/MS spectra, retention time, and diagnostic ions. With SILAC analysis and recombinant enzyme assay, the study showed that Repo-man HyPro604 is a target of the PHD1 enzyme.

      Strengths:

      The study involved extensive LC-MS analysis and was carefully implemented. The identification of over 4000 confident proline hydroxylation sites would be a valuable resource for the community. The characterization of Repo-man proline hydroxylation is a novel finding.

      Weaknesses:

      However, as a study mainly focused on methodology, the findings from the experimental data did not convincingly demonstrate the sensitivity and specificity of the workflow for site-specific identification of proline hydroxylation in global studies.

      Proline hydroxylation is an enzymatic post-translational protein modification, catalysed by prolyl hydroxylases (PHDs), which can have profound biological significance, e.g. altering protein half-life and/or the stability of protein-protein interactions. Furthermore, there has been controversy in the field as to the true number of protein targets for PHDs in cells. Thus, there is a clear need for methods to enable the robust identification of genuine PHD targets and to reliably map sites of PHD-catalysed proline hydroxylation in proteins. We believe, therefore, that our methodology, as reported here in Jiang et al., is an important contribution towards this goal. We note that our methodology has already been used successfully by others (https://doi.org/10.1016/j.mcpro.2025.100969). While further improvements in this methodology may of course be developed in future, we are not currently aware of any superior methods that have been reported previously in the literature. Notably, the reviewer's criticism does not cite any alternative published methodology that interested researchers could use to obtain better results than the approach we document in this study.

      Major concerns:

      (1) The study applied HILIC-based chromatographic separation with a goal of enriching and separating hydroxyproline-containing peptides. However, as the authors mentioned, such an approach is not specific to proline hydroxylation. In addition, many other chromatography techniques can achieve deep proteome fractionation, such as high-pH reverse-phase fractionation, strong cation exchange, etc. There were no data in this study to demonstrate that the strategy offered improved coverage of proline hydroxylation proteins, as the identifications of the HyPro sites could be achieved through deep fractionation and a highly sensitive LC-MS setup. The data in Figure 2A and S1A were somewhat confusing without a clear explanation of the heat map representations. 

      The data we present in this study demonstrate clearly that peptides with hydroxylated prolines are enriched in specific HILIC fractions (F10-F18), in comparison with total unfractionated peptides derived from cell extracts. We also refer the reviewer to our previously published study by Bensaddek et al. (International Journal of Mass Spectrometry; doi:10.1016/j.ijms.2015.07.029), which was reference 41 in this study, in which we directly compared the performance of HILIC and strong anion exchange (hSAX) chromatography. This showed that HILIC provided better enrichment of peptides containing hydroxylated proline residues than hSAX. To clarify this point for readers, we have now included a specific reference to our previous study at the start of the Results section in our current revision. Currently, we use HILIC to provide a degree of enrichment for proline-hydroxylated peptides because we are not aware of alternative chromatographic methods that, in our hands, provide better results.

      We have included descriptions of the information shown in the heatmaps in the associated figure legends and captions.

      (2) The study reported that the HyPro immonium ion is a diagnostic ion for HyPro identification. However, the data showed that only around 5% of the identifications had such a diagnostic ion. In comparison, acetyl-lysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and the strategy offered a sensitivity of 70% with a specificity of 98%. In this study, the sensitivity of HyPro immonium ion was quite low. The authors also clearly demonstrated that the presence of immonium ion varied significantly due to MS settings, peptide sequence, and abundance. With further complications from L/I immonium ions, it became very challenging to implement this strategy in a global LC-MS analysis to either validate or invalidate HyPro identifications.

      The reviewer appears to have misunderstood the point we make with regard to the identification of the immonium ion and its use as a diagnostic marker for proline hydroxylation in MS analyses. We do not claim that this immonium ion is an essential diagnostic marker for proline hydroxylation. As the reviewer notes with respect to the acetyl-lysine modification, the corresponding immonium ion is often used in MS studies as a diagnostic for identification of specific post-translational modifications. Previous studies have reported that the immonium ion for hydroxylated proline is detected when the transcription factor HIF is analysed, but is often absent with other putative PHD targets, which has been used as an argument that these targets are not genuine proline hydroxylation sites. We are not, therefore, introducing the idea in this study that the hydroxy-proline immonium ion is a required diagnostic marker for proline hydroxylation, but instead demonstrating that detection of this ion, at least in some peptide sequences, may require higher MS collision energies than are typically used for routine peptide identification. We believe that this is an interesting observation that can help to clear up discussions in the literature regarding the true prevalence of PHD-catalysed proline hydroxylation in different target proteins. Our data suggest that future MS studies analysing suspected PHD target proteins may need to use two different collision energies, i.e., a normal collision energy for routine identification of the peptide, combined with a higher collision energy if the hydroxy-proline immonium ion is not already detected.
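
      To illustrate why Leu/Ile immonium ions complicate detection of the hydroxyproline immonium ion, the sketch below computes both monoisotopic m/z values from standard elemental masses, using the relation immonium ion = residue mass − CO + H⁺. The two ions share the same nominal mass (86), and the separation of roughly 0.036 Da implies that a resolving power of a few thousand is needed to tell them apart. This calculation is an added illustration, not an analysis reported in the study, and the helper names are hypothetical.

```python
# Monoisotopic element masses (Da) and the proton mass.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466812


def mono_mass(formula):
    """Monoisotopic mass of a neutral formula given as {element: count}."""
    return sum(MASS[el] * n for el, n in formula.items())


def immonium_mz(residue_formula):
    """Singly charged immonium ion: residue mass - CO + H+."""
    co = MASS["C"] + MASS["O"]
    return mono_mass(residue_formula) - co + PROTON


hyp = immonium_mz({"C": 5, "H": 7, "N": 1, "O": 2})   # hydroxyproline residue
leu = immonium_mz({"C": 6, "H": 11, "N": 1, "O": 1})  # leucine / isoleucine residue
delta = leu - hyp
resolving_power = ((hyp + leu) / 2) / delta          # m / Δm needed to separate them

print(f"Hyp immonium m/z {hyp:.4f}, Leu/Ile immonium m/z {leu:.4f}")
print(f"difference {delta:.4f} Da, resolving power needed ~{resolving_power:.0f}")
```

Whether this separation is achieved in practice also depends on signal intensity in the low-mass region of the spectrum, which, as noted above, can vary with collision energy and peptide sequence.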

      (3) The study aimed to apply the HILIC-based proteomics workflow to identify HyPro proteins regulated by the PHD enzyme. However, the quantification strategy was not rigorous. The study just considered the HyPro proteins not identified by FG-4592 treatment as potential PHD targeted proteins. There are a few issues. First, such an analysis was not quantitative without reproducibility or statistical analysis. Second, it did not take into consideration that data-dependent LC-MS analysis was not comprehensive and some peptide ions may not be identified due to background interferences. Lastly, FG-4592 treatment for 24 hrs could lead to wide changes in gene expressions and protein abundances. Therefore, it is not informative to draw conclusions based on the data for bioinformatic analysis.

      We refer the reviewer to the data we present in this study using SILAC analysis, combined with our MS workflow, to achieve a more accurate quantitative picture of proline hydroxylation levels. We agree with the reviewer that our data-dependent LC-MS/MS analysis may not be comprehensive; however, this means that we are potentially underestimating, not overestimating, the true prevalence of proline-hydroxylated peptides. We also refer the reviewer to the accompanying study by Druker et al. (eLife 2025; doi.org/10.7554/eLife.108131.1), in which we present a detailed follow-on study demonstrating the functional significance of the novel proline hydroxylation site we detected in the protein RepoMan (CDCA2). Therefore, even if we have not achieved a fully comprehensive analysis of all proline-hydroxylated peptides catalysed by PHD enzymes, we believe that we have advanced the field by documenting a workflow that is able to identify and validate novel PHD targets.

      (4) The authors performed an in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. However, Figure 9 did not show quantitatively PHD1-induced increase in Repo-man HyPro abundance and it is difficult to assess its reaction efficiency to compare with HIF1a HyPro.

      The analysis shown in Figure 9 was not intended to quantify the efficiency of in vitro hydroxylation of RepoMan by PHD1, but rather to answer the question, ‘Can recombinant PHD1 alone hydroxylate P604 on RepoMan in vitro, yes or no?’. The data show that the answer here is ‘yes’. Clearly, the HIF peptide is a more efficient substrate in vitro for recombinant PHD1 than the RepoMan peptide and we have now included a statement in the Discussion that addresses the significance of this observation more directly.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Jiang et al. developed a robust workflow for identifying proline hydroxylation sites in proteins. They identified proline hydroxylation sites in HEK293 and RCC4 cells. The authors found that the more hydrophilic HILIC fractions were enriched in peptides containing hydroxylated proline residues. These peptides showed differences in charge and mass distribution compared to unmodified or oxidized peptides. The intensity of the diagnostic hydroxyproline immonium ion depended on parameters including MS collision energy, parent peptide concentration, and the sequence of amino acids adjacent to the modified proline residue. Additionally, they demonstrate that a combination of retention time in LC and optimized MS parameter settings reliably identifies proline hydroxylation sites in peptides, even when multiple proline residues are present.

      Strengths:

      Overall, the manuscript presents an advanced, standardized protocol for identifying proline hydroxylation. The experiments were well designed, and the developed protocol is straightforward, which may help resolve confusion in the field.

      Weaknesses:

      (1) The authors should provide a summary of the standard protocol for identifying proline hydroxylation sites in proteins that can easily be followed by others.

      This is a good suggestion and we have now included a figure (Figure 10) with a summary of our workflow in the current revision.

      (2) Cockman et al. proposed that HIF-α is the only physiologically relevant target for PHDs. Their approach is considered the gold standard for identifying PHD targets. Therefore, the authors should discuss the major progress they made in this manuscript that challenges Cockman's conclusion.

      While we had mentioned the Cockman et al. paper in the Introduction, we had not focussed on this somewhat controversial issue. However, in response to the Reviewer’s request, we have now added a comment to the Discussion section in the current revision on how our new data address the proposal previously made by Cockman et al. In brief, we believe that our findings are not consistent with a model in which PHDs have no protein targets other than HIFs.

      Reviewer #3 (Public review): 

      Summary:

      The authors present a new method for detecting and identifying proline hydroxylation sites within the proteome. This tool utilizes traditional LC-MS technology with optimized parameters, combined with HILIC-based separation techniques. The authors show that they pick up known hydroxy-proline sites and also validate a new site discovered through their pipeline.

      Strengths:

      The manuscript utilizes state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, which is an advance over previous similar approaches. The use of synthetic control peptides in the HILIC separation step clearly demonstrates the ability of the method to reliably distinguish hydroxyproline-containing peptides from oxidized methionine-containing peptides. Using this method, they identify a site on CDCA2, which they go on to validate in vitro, and whose role in the regulation of mitotic progression they study in an associated manuscript.

      Weaknesses:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, a substantial number of potential false positives are also picked up (owing to the limitations of MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a broader and more diverse cell-based validation approach would help substantiate the claims regarding the enrichment of peptides in the described pathway analyses.

      We of course agree that false positives may arise, as is true for essentially all PTM studies. There are two issues here: first, are the identified sites technically correct (i.e. not misidentifications from the MS data)? Second, are the identified modifications of biological significance? We have addressed this using the popular MaxQuant software suite to evaluate the modifications identified and to control the false discovery rate (FDR) at both the precursor and protein level, as described in the manuscript. We are aware that false positives could arise from confusing oxidation of methionine with hydroxylation of proline. Therefore, to address whether we could identify bona fide PHD protein targets outside of the HIF family, we adopted a conservative approach by simply filtering out peptides where there was a methionine residue within three amino acids of the predicted proline hydroxylation site. This was a pragmatic decision made to reduce the likelihood of false positives in our dataset, and we recognise that this likely results in us overlooking some genuine proline hydroxylation sites that occur near methionine residues. To address the potential biological relevance of the proline hydroxylation sites identified, we analysed extracts from cells treated with FG inhibitors. Of course, a detailed understanding of biological significance relies upon follow-on experimental analyses for each site, which we have performed for P604 on RepoMan in the accompanying study by Druker et al. (eLife 2025; doi.org/10.7554/eLife.108131.1).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The finding that the immonium ion intensities of L/I did not increase with increasing collision energy was surprising. Was this specific to this synthetic peptide?

      We agree this is an interesting and unexpected finding. We have no reason to believe that it is specific to synthetic peptides per se, but rather think this reflects an effect of amino acid composition in the peptides analysed. It will be interesting to explore this phenomenon in more detail in future.

      (2) The sequence logos in Figure 4 seemed to lack any amino acid enrichment in most positions except for collagen peptides. Have these findings been tested with statistical analysis?

      The results we show for sequence logo analysis were generated using WebLogo (10.1101/gr.849004) and correspond to an analysis of all proline hydroxylated peptides we detected across all cell lines and replicates analysed. The fact that collagens are highly abundant proteins with very high levels of proline hydroxylation likely explains why collagen peptides dominated the outcome of the sequence logo analysis. There is clearly scope for more detailed follow-up analysis in future of the sequence specificity of proline hydroxylation sites in non-collagen proteins that are validated PHD targets.
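      For readers less familiar with sequence logos, the stack height WebLogo draws at each position is the information content of the residue distribution at that position. The minimal Python sketch below illustrates that quantity only; it assumes sites are supplied as (protein sequence, zero-based index) pairs, uses a hypothetical window width, applies no small-sample correction, and is not the pipeline used to produce the logos in the manuscript.

```python
import math
from collections import Counter


def windows_around_sites(sites, flank=5, pad="-"):
    """Return fixed-width windows centred on each modified residue.

    `sites` is a list of (protein_sequence, zero_based_index) pairs; positions
    that fall outside the protein are filled with `pad`.
    """
    windows = []
    for seq, idx in sites:
        window = [
            seq[pos] if 0 <= pos < len(seq) else pad
            for pos in range(idx - flank, idx + flank + 1)
        ]
        windows.append("".join(window))
    return windows


def information_content(windows, pad="-"):
    """Per-position information content in bits, without small-sample correction."""
    max_bits = math.log2(20)  # maximum entropy for the 20 standard amino acids
    content = []
    for column in zip(*windows):
        counts = Counter(aa for aa in column if aa != pad)
        total = sum(counts.values())
        if total == 0:
            content.append(0.0)  # column contains only padding
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        content.append(max_bits - entropy)
    return content


# Hypothetical example: two collagen-like sites and one unrelated site.
sites = [("GPPGPPGPPGPP", 5), ("GAPGPPGEPGKP", 5), ("MSTLKAPVDEQ", 6)]
print(information_content(windows_around_sites(sites, flank=2)))
```

      Because abundant, repetitive collagen windows contribute many near-identical columns, they push the information content up at their characteristic positions, which is consistent with the dominance of collagen peptides described above.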

      (3) Overall figure quality was not ideal. The resolution and font sizes of figures should be carefully evaluated and adjusted. The figure legend should contain a title for the figure. Annotations of the figures were somewhat confusing. 

      We agree with the criticism of the figure resolution in the review copies; the lower resolution appears to have been generated after we had uploaded higher-resolution original images. We are again providing higher-resolution versions of all figures for the current revision.

      Reviewer #3 (Recommendations for the authors):

      Certain concerns regarding portions of the manuscript that need addressing:

      (1) " These data show that two different cell lines show unique profiles of proteins with hydroxylated peptides." - It is difficult to conclusively say this statement after profiling the prolyl hydroxy proteome from just two cell lines, especially since the amino acids with the highest frequency in the most enriched peptides are similar in both cell lines.

      We agree with this point and have changed the current revision to state instead, “This shows that each of the two cell lines analysed have distinct profiles.”

      (2) "We noted that there was a high frequency of a methionine residues appearing either at the first, second, or even third positions after the HyPro site.." - according to the authors' claim, the advantage of their method was that they were able to overcome the limitation of older methods that couldn't separate methionine oxidation from proline hydroxylation. However, in this statement, they say that the high frequency of methionine residues may be because of the similar mass shift. These statements are contradictory. The authors should either tone down the claim or prove that those are indeed hydroxyproline sites. Is it possible that, by excluding these high-frequency methionine-containing peptides in the filtering step, we are losing potential positive hits for hydroxyproline sites? What is the authors' take on this?

      We respectfully do not agree that our “statements are contradictory” with respect to the potential confusion between identification of methionine oxidation and proline hydroxylation, but acknowledge that we have not explained this issue clearly enough. It is a fact that the similar mass shift resulting from proline hydroxylation and methionine oxidation is a technical challenge that can potentially lead to misidentifications in MS studies, and that is what we state clearly in the manuscript. We have addressed this issue head-on experimentally in this study and show, using synthetic peptides, how specific proline hydroxylation sites in target proteins can be distinguished from methionine oxidation, based upon the differential chromatographic behaviour of peptides with either hydroxylated proline or oxidised methionine, as well as by detailed analysis of fragmentation spectra. However, in the case of our global analysis, as we were not able to perform synthetic peptide comparisons for every putative site identified, we took the pragmatic approach of filtering out peptides where a methionine residue was present within three residues of a potential proline hydroxylation site. This was done simply to reduce the possibility of misidentification in the set of novel proline hydroxylated peptides identified, and we accept that as a consequence we are likely filtering out peptides that include bona fide proline hydroxylation sites. We have clarified this point in the current revision and hope to be able to address this issue more comprehensively in future studies.

      (3) "Accordingly, a score cut-off of 40 for hydroxylated peptides and a localisation probability cut-off of more than 0.5 for hydroxylated peptides was performed." Could the authors shed more light and clarify what was the basis for this value of cut-off to be used in this filtering step? Is this sample dependent? What should be the criteria to determine this value?

      We used MaxQuant software (10.1016/j.cell.2006.09.026) for PTM analysis, in which a localization probability of 0.75 and a score cut-off of 40 are commonly used thresholds to define high confidence. The reason that we used 0.5 in the first step was to investigate how likely it was that misassignment of a +16 Da mass shift (oxidation) on methionine would affect the identification of hydroxylation on proline. However, we note that in the final results set used for analysis, all putative proline hydroxylated peptides that had a methionine residue near the hydroxylated proline were disregarded as a pragmatic step to reduce the probability of false identifications.
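      The two filtering steps described in this response (score and localisation cut-offs, then exclusion of sites with a nearby methionine) can be sketched as below. This is a minimal illustration under stated assumptions: the record fields are hypothetical and are not MaxQuant column names, the score cut-off is treated as inclusive (≥ 40), the localisation cut-off as strict (> 0.5), and the three-residue methionine window is assumed to extend on both sides of the proline.

```python
from dataclasses import dataclass


@dataclass
class HyProSite:
    """Hypothetical site record; field names are illustrative, not MaxQuant headers."""
    peptide: str      # peptide sequence
    site: int         # zero-based index of the putative hydroxyproline
    score: float      # identification score
    loc_prob: float   # localisation probability of the +16 Da modification


def has_nearby_met(peptide, site, window=3):
    """True if a methionine lies within `window` residues either side of `site`."""
    lo = max(0, site - window)
    hi = min(len(peptide), site + window + 1)
    return any(peptide[i] == "M" for i in range(lo, hi) if i != site)


def filter_sites(sites, min_score=40.0, min_loc_prob=0.5, window=3):
    """Keep proline sites that pass both cut-offs and have no nearby methionine."""
    kept = []
    for s in sites:
        if s.peptide[s.site] != "P":
            continue  # the +16 Da shift was not localised to a proline
        if s.score < min_score or s.loc_prob <= min_loc_prob:
            continue  # require score >= 40 and localisation probability > 0.5
        if has_nearby_met(s.peptide, s.site, window):
            continue  # conservative: could be confused with Met oxidation
        kept.append(s)
    return kept
```

      The methionine step deliberately trades sensitivity for specificity: any genuine hydroxyproline within the window is discarded along with the ambiguous cases, which is the trade-off acknowledged in the response.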

      (4) The authors are requested to kindly make the HPLC and MS traces more legible and use high-resolution images, with clearly labeled values on the peaks. Kindly extract coordinates from the underlying data files to plot the curves if needed to make it clearer.

      We have reviewed the clarity of all images and figures in the current revision.

      (5) There appear to be no error bars in Figure 3, Figure 7E, and the panels of Figure 8 with bar graphs. Are those single-replicate data?

      The data in these specific figures are from single replicates.

      (6) For Figure 9C, the control with only PHD1 (no peptide) is missing. 

      The ‘no peptide control’ was not included in the figure because it is simply a blank lane and there is nothing to see.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Damaris et al. perform what is effectively an eQTL analysis on microbial pangenomes of E. coli and P. aeruginosa. Specifically, they leverage a large dataset of paired DNA/RNA-seq information for hundreds of strains of these microbes to establish correlations between genetic variants and changes in gene expression. Ultimately, their claim is that this approach identifies non-coding variants that affect expression of genes in a predictable manner and explain differences in phenotypes. They attempt to reinforce these claims through use of a widely used promoter calculator to quantify promoter effects, as well as some validation studies in living cells. Lastly, they show that these non-coding variations can explain some cases of antibiotic resistance in these microbes.

      Major comments

      Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      The authors convincingly demonstrate that they can identify non-coding variation in pangenomes of bacteria and associate it with phenotypes of interest. What is unclear is the extent to which they account for covariation among genetic variants. Are the SNPs they implicate truly responsible for the changes in expression they observe, or are they merely genetically linked to the true causal variants? This has been addressed by other GWAS studies but isn't discussed here as far as I can tell.

      We thank the reviewer for their effective summary of our study. Regarding our ability to identify variants that are causal for gene expression changes versus those that only “tag” the causal ones, we again apologize for not spelling out the main limitation of GWAS approaches, namely the difficulty in separating associated variants from causal ones. This inherent difficulty is the main reason why we added the in-silico and in-vitro validation experiments; while each has its own limitations, we argue that together they point towards a causal link between some of our associations and the measured gene expression changes. We have amended the discussion section (e.g. at L548) to spell out our intention more clearly and to provide better context for readers who are not familiar with the pitfalls of (bacterial) GWAS.

      They need to justify why they consider the 30bp downstream of the start codon as non-coding. While this region certainly has regulatory impact, it is also definitely coding. To what extent could this confound results and how many significant associations to expression are in this region vs upstream?

      We agree with the reviewer that defining this region as “non-coding” is formally not correct, as it includes the first 10 codons of the focal gene. We have amended the text to change the definition to “cis regulatory region” and avoided using the term “non-coding” throughout the manuscript. Regarding the relevance of including the early coding region, we have looked at the distribution of associated hits in the cis regulatory regions we have defined; the results are shown in Supplementary Figure 3.

      We quantified the distribution of cis-associated variants and compared it to 2,000 permutations restricted to the -200bp and +30bp window in both E. coli (panel A) and P. aeruginosa (panel B). As can be seen, the associated variants that we identified are mostly located in the upstream 200bp region, and the +30bp region shows a mild depletion relative to the random expectation, which we derived through a variant position shuffling approach (2,000 replicates). Even so, the inclusion of the early coding region yields an appreciable number of associations, which in our opinion justifies its inclusion as part of a putative “cis regulatory region”.
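      A position-shuffling comparison of this kind could look like the minimal sketch below. It assumes positions are integer offsets from the first base of the start codon, spanning -200 to +29 (so the coding part covers the first 10 codons), and a null model that draws positions uniformly across that window; the shuffling procedure actually used in the study may differ in detail.

```python
import random


def coding_fraction(positions):
    """Fraction of variant positions in the coding part of the window (offset >= 0)."""
    return sum(p >= 0 for p in positions) / len(positions)


def permutation_null(n_variants, n_perm=2000, upstream=200, downstream=30, seed=0):
    """Coding fractions obtained when variant positions are drawn uniformly."""
    rng = random.Random(seed)
    window = range(-upstream, downstream)  # -200 .. +29 relative to the start codon
    return [
        coding_fraction([rng.choice(window) for _ in range(n_variants)])
        for _ in range(n_perm)
    ]


def depletion_p_value(observed_positions, **kwargs):
    """Observed coding fraction and a one-sided empirical p-value for depletion."""
    observed = coding_fraction(observed_positions)
    null = permutation_null(len(observed_positions), **kwargs)
    # Add-one correction keeps the empirical p-value strictly positive.
    p = (1 + sum(f <= observed for f in null)) / (1 + len(null))
    return observed, p
```

      Under a uniform null, about 30/230 of variants would fall in the coding part; an observed fraction below that, with a small p-value, corresponds to the "mild depletion" described above.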

      The claim that promoter variation correlates with changes in measured gene expression is not convincingly demonstrated (although, yes, very intuitive). Figure 3 is a convoluted way of demonstrating that predicted transcription rates correlate with measured gene expression. For each variant, can you do the basic analysis of just comparing differences in promoter calculator predictions and actual gene expression? I.e. correlation between (promoter activity variant X)-(promoter activity variant Y) vs (measured gene expression variant X)-(measured gene expression variant Y). You'll probably have to

      We realize that we may have failed to properly explain how we carried out this analysis, which we did exactly in the way the reviewer suggests here. We had in fact provided four example scatterplots of the kind the reviewer was requesting as part of Figure 4. We have added a mention of their presence to the caption of Figure 3.
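      The difference-versus-difference comparison requested by the reviewer can be expressed compactly. The minimal sketch below assumes that, for each variant, predicted transcription rates and measured expression values are already on a log scale and are listed for the same isolates; it takes the mean in carriers minus the mean in non-carriers for both quantities and correlates these deltas across variants with a Pearson coefficient. Names and inputs are illustrative.

```python
from statistics import mean


def group_difference(values, present):
    """Mean value in isolates carrying the variant minus mean in isolates without it."""
    carriers = [v for v, p in zip(values, present) if p]
    others = [v for v, p in zip(values, present) if not p]
    return mean(carriers) - mean(others)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def delta_correlation(variants):
    """Correlate predicted vs measured differences across variants.

    `variants` maps a variant id to (predicted_log_rate, measured_log_expr, present),
    three lists aligned over the same isolates.
    """
    d_pred, d_meas = [], []
    for predicted, measured, present in variants.values():
        d_pred.append(group_difference(predicted, present))
        d_meas.append(group_difference(measured, present))
    return pearson(d_pred, d_meas)
```

      Working with group means keeps one point per variant; per-isolate scatterplots, such as those in Figure 4, instead show the spread within each group.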

      Figure 7: it is unclear what this experiment was. How were they tested? Did you generate the data yourselves? Did you do RNA-seq (which is what is described in the methods) or just test and compare known genomic data?

      We apologize for the lack of clarity here; we have amended the figure’s caption and the corresponding section of the results (i.e. L411 and L418) to better highlight how the underlying drug susceptibility data and genomes came from previously published studies.

      Are the data and the methods presented in such a way that they can be reproduced?

      No, this is the biggest flaw of the work. The RNA-Seq experiment to start this project is not described at all as well as other key experiments. Descriptions of methods in the text are far too vague to understand the approach or rationale at many points in the text. The scripts are available on github but there is no description of what they correspond to outside of the file names and none of the data files are found to replicate the plots.

      We have taken this critique to heart, and have given more details about the experimental setup for the generation of the RNA-seq data in the methods as well as the results sections. We have also thoroughly reviewed every description of the methods we have employed to make sure they are presented more clearly to the readers. We have also updated our code repository to provide more information about the purpose of each script provided, although we would like to point out that the code was not written to be general-purpose, but rather as open documentation of how the data were analyzed.

      Figure 8B is intended to show that the WaaQ operon is connected to known Abx resistance genes but uses the STRING method. This requires a list of genes, but how did they build this list? Why look at these known Abx genes in particular? STRING does not really show evidence; these connections need to be substantiated, or the authors at least need to justify why this analysis was performed.

      We have amended the Methods section (“Gene interaction analysis”, L799) to better clarify how the network shown in this panel was obtained. In short, we have filtered the STRING database to identify genes connected to members of the waa operon with an interaction score of at least 0.4 (“moderate confidence”), excluding the “text mining” field. Antimicrobial resistance genes were identified according to the CARD database. We believe these changes will help the readers to better understand how we derived this interaction.
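      The network filter described above amounts to selecting interaction partners of the waa genes above a score cut-off and flagging those that are listed in CARD. The minimal sketch below assumes the edge scores have already been recomputed without the text-mining channel and are expressed on a 0-1 scale (STRING's downloadable files report scores multiplied by 1000); the non-waa gene names and scores in the test are hypothetical placeholders, not real STRING data.

```python
def seed_neighbourhood(edges, seeds, card_genes, min_score=0.4):
    """Map each non-seed partner of a seed gene to (best score, listed in CARD?).

    `edges` holds (gene_a, gene_b, score) tuples; scores are assumed to be on a
    0-1 scale and already recomputed without the text-mining channel.
    """
    neighbours = {}
    for a, b, score in edges:
        if score < min_score:
            continue  # below "moderate confidence"
        for seed, partner in ((a, b), (b, a)):
            if seed in seeds and partner not in seeds:
                neighbours[partner] = max(score, neighbours.get(partner, 0.0))
    return {gene: (s, gene in card_genes) for gene, s in neighbours.items()}
```

      Keeping only the best score per partner gives one entry per neighbouring gene, which is convenient for drawing a star-shaped network around the operon.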

      Are the experiments adequately replicated and statistical analysis adequate?

      An important claim on MIC of variants for supplementary table 8 has no raw data and no clear replicates available. Only figure 6, the in vitro testing of variant expression, mentions any replicates.

      We have expanded the relevant section in the Methods (“Antibiotic exposure and RNA extraction”, L778) to provide more information on how these assays were carried out. In short, we carried out three biological replicates, and the average MIC of the two replicates in closest agreement was taken as the representative MIC for the strain. We believe that we have followed standard practice in the field of microbiology, but we agree that more details needed to be provided for readers to appreciate this.
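      The rule for deriving a representative MIC from three replicates can be written down explicitly. This is a minimal sketch under assumptions not stated in the response: "closest agreement" is measured in two-fold dilution steps (log2 units), ties are broken by input order, and the chosen pair is combined by an arithmetic mean.

```python
import math
from itertools import combinations


def representative_mic(replicates):
    """Mean of the two replicate MICs that agree most closely.

    Agreement is measured in two-fold dilution steps (log2 units); ties go to
    the first closest pair in input order.
    """
    if len(replicates) != 3:
        raise ValueError("expected three biological replicates")
    pair = min(
        combinations(replicates, 2),
        key=lambda p: abs(math.log2(p[0]) - math.log2(p[1])),
    )
    return sum(pair) / 2


print(representative_mic([2, 4, 16]))  # prints 3.0
```

      Measuring agreement in dilution steps rather than raw concentration avoids favouring pairs at the low end of the dilution series simply because their absolute differences are smaller.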

      Minor comments

      Specific experimental issues that are easily addressable.

      Are prior studies referenced appropriately?

      There should be a discussion of eQTLs in this, although these have mostly been studied in eukaryotes: https://doi.org/10.1038/s41588-024-01769-9 ; https://doi.org/10.1038/nrg3891.

      We have added these two references, which provide a broader context to our study and methodology, in the introduction.

      Line 67. Missing important citation for Ireland et al. 2020 https://doi.org/10.7554/eLife.55308

      Line 69. Should mention Johns et al. 2018 (https://doi.org/10.1038/nmeth.4633) where they study promoter sequences outside of E. coli

      Line 90 - replace 'hypothesis-free' with unbiased

      We have implemented these changes.

      Line 102 - state % of DEGs relative to the entire pan-genome

      Given that the study is focused on identifying variants that were associated with changes in expression for reference genes (i.e. those present in the reference genome), we think that providing this percentage would give the false impression that our analysis includes accessory genes that are not encoded by the reference isolate, which is not what we have done.

      Figure 1A is not discussed in the text

      We have added an explicit mention of the panels in the relevant section of the results.

      Line 111: it is unclear which groups the enrichment compares. Figures 1C/D show 'Gene counts', but are these out of the total DEGs? How is the p-value derived, and what statistical test was performed? Is DEG enrichment compared against the pangenome? The K12 genome?

      We have amended the results and methods section, as well as Figure 1’s caption to provide more details on how this analysis was carried out.

      Line 122-123: State what letters correspond to these COG categories here

      We have implemented the clarifications and edits suggested above.

      Line 155: Need to clarify how you use k-mers in this and how they differ from SNPs. Are you looking at the k-mer content of these regions? K-mers up to hexamers, or what? How are these compared? You can't just say we used k-mers.

      We have amended that line in the results section to more explicitly refer to the actual encoding of the k-mer variants, which were presence/absence patterns for k-mers extracted from each target gene’s promoter region separately, using our own developed method, called panfeed. We note that more details were already given in the methods section, but we do recognize that it’s better to clarify things in the results section, so that more distracted readers get the proper information about this class of genetic variants.
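      As a rough illustration of presence/absence k-mer encoding for a single gene's cis-regulatory region, the sketch below builds a binary matrix across isolates and keeps only k-mers that are not shared by every isolate. It is not the panfeed implementation: the k-mer length is a hypothetical choice, reverse complements are ignored, and identical patterns are not collapsed.

```python
def kmers(seq, k):
    """Set of all k-mers in a sequence (reverse complements are not considered)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence(promoters, k=9):
    """Binary k-mer matrix for one gene's cis-regulatory region across isolates.

    `promoters` maps isolate id -> sequence. Returns (sorted variable k-mers,
    {isolate: [0/1, ...]}), dropping k-mers present in every isolate.
    """
    per_isolate = {iso: kmers(seq.upper(), k) for iso, seq in promoters.items()}
    all_kmers = set().union(*per_isolate.values())
    variable = sorted(
        km for km in all_kmers
        if not all(km in ks for ks in per_isolate.values())
    )
    matrix = {iso: [int(km in ks) for km in variable] for iso, ks in per_isolate.items()}
    return variable, matrix
```

      Each column of the resulting matrix is a binary genotype that can be tested for association with expression in the same way as a SNP, with the difference that a single k-mer can capture substitutions, small InDels, or combinations of nearby changes.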

      Line 172: It would be VERY helpful to have a supplementary figure describing these types of variants, perhaps a multiple-sequence alignment containing each example

      We thank the reviewer for this suggestion. We have now added Supplementary Figure 3, which shows the sequence alignments of the cis-regulatory regions underlying each class of the genetic marker for both E. coli and P. aeruginosa.

      Figure 4: This figure is too small. Why are WaaQ and UlaE being used as examples here when you are supposed to be explicitly showing variants with strong positive correlations?

      We rearranged the figure’s layout to improve its readability. We agree that the correlation for waaQ and ulaE is weaker than for yfgJ and kgtP, but our intention was to not simply cherry-pick strong examples, but also those for which the link between predicted promoter strength and recorded gene expression was less obvious.

      Figure 4: Why is there variation between variants present and variant absent? Is this due to other changes in the variant? Should mention this in the text somewhere

      Variability in the predicted transcription rate for isolates encoding for the same variant is due to the presence of other (different) variants in the region surrounding the target variant. PromoterCalculator uses nucleotide regions of variable length (78 to 83bp) to make its predictions, while the variants we are focusing on are typically shorter (as shown in Figure 4). This results in other variants being included in the calculation and therefore slightly different predicted transcription rates for each strain. We have amended the caption of Figure 4 to provide a succinct explanation of these differences.

      Line 359: Need to talk about each supplementary figure 4 to 9 and how they demonstrate your point.

      We have expanded this section to more explicitly mention the contents of these supplementary figures and why they are relevant for the findings of this section (L425).

      Are the text and figures clear and accurate?

      Figure 4 too small

      We have fixed the figure, as described above

      Acronyms are defined multiple times in the manuscript, sometimes not the first time they are used (e.g. SNP, InDel)

      Figure 8A - Remove red box, increase label size

      Figure 8B - Low resolution, grey text is unreadable and should be darker and higher resolution

      Line 35 - be more specific about types of carbon metabolism and catabolite repression

      Line 67 - include citation for ireland et al. 2020 https://doi.org/10.7554/eLife.55308

      Line 74 - You talk about looking in cis but don't specify how far away cis is

      Line 75 - we encoded genetic variants..... It is unclear what you mean here

      Line 104 - 'were apart of operons' should clarify you mean polycistronic or multi-gene operons. Single genes may be considered operonic units as well.

      We have addressed all the issues indicated above.

      Figure 2: There is no axis for the percents and the percents don't make sense relative to the bars they represent??

      We realize that this visualization might not have been the most clear for readers, and have made the following improvement: we have added the number of genes with at least one association before the percentage. We note that the x-axis is in log scale, which may make it seem like the light-colored bars are off. With the addition of the actual number of associated genes we think that this confusion has been removed.

      Figure 2: Figure 2B legend should clarify that these are individual examples of Differential expression between variants

      Line 198-199: This sentence doesn't make sense, 'encoded using kmers' is not descriptive enough

      Line 205: Should be upfront about that you're using the Promoter Calculator that models biophysical properties of promoter sequences to predict activity.

      Line 251: 'Scanned the non-coding sequences of the DEGs'. This is far too vague of a description of an approach. Need to clarify how you did this and I didn't see in the method. Is this an HMM? Perfect sequence match to consensus sequence? Some type of alignment?

      Line 257-259: This sentence lacks clarity

      We have implemented all the suggested changes and clarified the points that the reviewer has highlighted above.

      Line 346: How were the E. coli isolates tested? Was this an experiment you did? If so, this is a massive undertaking (1600 isolates × 12 conditions) and it should be clearly described.

      While we have indicated in the previous paragraph that the genomes and antimicrobial susceptibility data were obtained from previously published studies, we have now modified this paragraph (e.g. L411 and L418) slightly to make this point even clearer.

      Figure 6A: The tile plot on the right side is not clearly labeled and it is unclear what it is showing and how that relates to the bar plots.

      In the revised figure, we have clarified the labeling of the heatmap to now read “Log2(Fold Change) (measured expression)” to indicate that it represents each gene’s fold changes obtained from our initial transcriptomic analysis. We have also included this information in the caption of the figure, making the relationship between the measured gene expression (heatmap) and the reporter assay data (bar plots) clear to the reader.

      Figure 6B: typo in legend 'Downreglation'

      We thank the reviewer for pointing this out. The typo has been corrected to “Down regulation” in the revised figure.

      Line 398: Need to state rationale for why Waaq operon is being investigated here. Why did you look into an individual example?

      We thank the reviewer for asking for a clarification here. Our decision to investigate the waaQ gene was based on both biological relevance and empirical evidence. In our analysis associating non-coding variants with antimicrobial resistance using the Moradigaravand et al. dataset, we identified a T>C variant at position 3808241 that was associated with resistance to Tobramycin. We also observed this variant in our strain collection, where it was associated with expression changes of the gene, suggesting a possible functional impact. The waa operon is involved in LPS synthesis, a central determinant of the bacterial outer membrane integrity and a well-established virulence factor. This provided a plausible biological mechanism through which variation could influence antimicrobial susceptibility. As its role in resistance has not been extensively characterized, it represents a good candidate for experimental validation. We have now included this rationale in our revised manuscript (i.e. L476).

      Figure 8: Can get rid of red box

      We have now removed the red box from Figure 8 in the revised version.

      Line 463 - 'account for all kinds' is too informal

      Mix of font styles throughout document

      We have implemented all the suggestions and formatting changes indicated above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript "Cis non-coding genetic variation drives gene expression changes in the E. coli and P. aeruginosa pangenomes", Damaris and co-authors present an extensive meta-analysis, plus some useful follow up experiments, attempting to apply GWAS principles to identify the extent to which differences in gene expression between different strains within a given species can be directly assigned to cis-regulatory mutations. The overall principle, and the question raised by the study, is one of substantial interest, and the manuscript here represents a careful and fascinating effort at unravelling these important questions. I want to preface my review below (which may otherwise sound more harsh than I intend) with the acknowledgment that this is an EXTREMELY difficult and challenging problem that the authors are approaching, and they have clearly put in a substantial amount of high quality work in their efforts to address it. I applaud the work done here, I think it presents some very interesting findings, and I acknowledge fully that there is no one perfect approach to addressing these challenges, and while I will object to some of the decisions made by the authors below, I readily admit that others might challenge my own suggestions and approaches here. With that said, however, there is one fundamental decision that the authors made which I simply cannot agree with, and which in my view undermines much of the analysis and utility of the study: that decision is to treat both gene expression and the identification of cis-regulatory regions at the level of individual genes, rather than transcriptional units. Below I will expand on why I find this problematic, how it might be addressed, and what other areas for improvement I see in the manuscript:

      We thank the reviewer for their praise of our work. A careful set of replies to the major and minor critiques are reported below each point.

      In the entire discussion from roughly lines 100-130, the authors frequently dissect out apparently differentially expressed genes from non-differentially expressed genes within the same operons... I honestly wonder whether this is a useful distinction. I understand that by the criteria set forth by the authors it is technically correct, and yet, I wonder if this is more due to thresholding artifacts (i.e., some genes passing the authors' reasonable-yet-arbitrary thresholds whereas others in the same operon do not), and in the process causing a distraction from an operon that is in fact largely moving in the same direction. The authors might wish to either aggregate data in some way across known transcriptional units for the purposes of their analysis, and/or consider a more lenient 'rescue' set of significance thresholds for genes that are in the same operons as differentially expressed genes. I would favor the former approach, performing virtually all of their analysis at the level of transcriptional units rather than individual genes, as much of their analysis in any case relies upon proper assignment of genes to promoters, and this way they could focus on the most important signals rather than sometimes getting lost in the weeds of looking at every single gene when really what they seem to be looking at in this paper is a property OF THE PROMOTERS, not the genes. (Of course there are phenomena, such as rho-dependent termination specifically titrating expression of late genes in operons, but I think on balance the operon-level analysis might provide more insights and a cleaner analysis and discussion.)

      We agree with the reviewer that the peculiar nature of transcription in bacteria has to be taken into account in order to properly quantify the influence of cis variants on gene expression changes. We therefore added the exact analysis the reviewer suggested; that is, we tested associations between the variants in cis to the first gene of each operon and a phenotype defined as the weighted average of the fold-changes of all genes in the operon (see Methods for more details). As reported in the results section (L223), we found a similar trend as with the original analysis: the highest proportion of associations was found when encoding cis variants as k-mers (42% for E. coli and 45% for P. aeruginosa). More importantly, we found a high degree of overlap between this new “operon-level” association analysis and the original one (restricted to the first gene of each operon): between 90% and 94% of associations overlapped for E. coli and between 75% and 91% for P. aeruginosa, depending on the variant type. We note that operon definitions are less precise for P. aeruginosa, which might explain the higher variability in the level of overlap.
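      To make the operon-level phenotype concrete, the sketch below collapses per-gene log2 fold-changes into one weighted average per operon and computes the share of operon-level associations also recovered by the first-gene analysis. It is a hypothetical illustration, not the authors' code: the function names, the choice of weights (one non-negative value per gene, e.g. mean normalized counts) and the definition of overlap (denominator = operon-level hits) are all assumptions; the actual weighting is described in the manuscript's Methods.

```python
from __future__ import annotations


def operon_phenotype(lfc: dict[str, float], weight: dict[str, float],
                     operon: list[str]) -> float:
    """Weighted average log2 fold-change over the genes of one operon.

    Genes missing from either mapping are skipped. ASSUMPTION: `weight`
    holds one non-negative weight per gene (e.g. mean normalized counts);
    this is illustrative, not the weighting described in the Methods.
    """
    genes = [g for g in operon if g in lfc and g in weight]
    total = sum(weight[g] for g in genes)
    if total <= 0:
        raise ValueError("operon has no genes with positive weight")
    return sum(lfc[g] * weight[g] for g in genes) / total


def overlap_fraction(operon_hits: set, first_gene_hits: set) -> float:
    """Share of operon-level associations also found by the first-gene run.

    ASSUMPTION: overlap is measured relative to the operon-level hits.
    """
    if not operon_hits:
        return 0.0
    return len(operon_hits & first_gene_hits) / len(operon_hits)


# Toy values (hypothetical gene names and numbers, not from the study).
lfcs = {"g1": 2.0, "g2": 1.0, "g3": 0.5}
weights = {"g1": 3.0, "g2": 1.0}  # g3 has no weight, so it is skipped
print(operon_phenotype(lfcs, weights, ["g1", "g2", "g3"]))  # 1.75
```

      Weighting by expression level keeps a barely expressed downstream gene from dominating the operon phenotype; an unweighted mean is the obvious alternative if all genes should count equally.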

      This also leads to a more general point, however, which I think is potentially more deeply problematic. At the end of the day, all of the analysis being done here centers on the cis regulatory logic upstream of each individual open reading frame, even though in many cases (i.e., genes after the first one in multi-gene operons), this is not where the relevant promoter is. This problem, in turn, raises potential misattributions of causality running in both directions, where the causal impact of a bona fide promoter mutation on many genes in an operon may only be associated with the first gene, or on the other side, where a mutation that co-occurs with, but is causally independent from, an actual promoter mutation may be flagged as the one driving an expression change. This becomes an especially serious issue in cases like ulaE, for genes that are not the first gene in an operon (at least according to standard annotations, the UlaE transcript should be part of a polycistronic mRNA beginning from the ulaA promoter), and the role played by cis-regulatory logic immediately upstream of ulaE is uncertain and certainly merits deeper consideration. I suspect that many other similar cases likewise lurk in the dataset used here (perhaps even more so for the Pseudomonas data, where the operon definitions are likely less robust). Of course there are many possible explanations, such as a separate ulaE promoter only in some strains, but this should perhaps be carefully stated and explored, and seems likely to be the exception rather than the rule.

      While we again agree with the reviewer that some of our associations might not reflect a direct causal link, because the focal variant may not belong to an actual promoter element, we also want to point out that identifying the composition of transcriptional units in bacteria is far from a solved problem, even for a well-studied species such as E. coli (see the references below: two address the problem in general terms, and one characterizes a specific example). Therefore, even if restricting associations to the operon level (e.g. by focusing exclusively on variants in cis to the first gene of each operon) might be theoretically correct, some of the associations we find further down the putative operons might reflect a true biological signal.

      1. Conway, T., Creecy, J. P., Maddox, S. M., Grissom, J. E., Conkle, T. L., Shadid, T. M., Teramoto, J., San Miguel, P., Shimada, T., Ishihama, A., Mori, H., & Wanner, B. L. (2014). Unprecedented High-Resolution View of Bacterial Operon Architecture Revealed by RNA Sequencing. mBio, 5(4), 10.1128/mbio.01442-14. https://doi.org/10.1128/mbio.01442-14

      2. Sáenz-Lahoya, S., Bitarte, N., García, B., Burgui, S., Vergara-Irigaray, M., Valle, J., Solano, C., Toledo-Arana, A., & Lasa, I. (2019). Noncontiguous operon is a genetic organization for coordinating bacterial gene expression. Proceedings of the National Academy of Sciences, 116(5), 1733–1738. https://doi.org/10.1073/pnas.1812746116

      3. Zehentner, B., Scherer, S., & Neuhaus, K. (2023). Non-canonical transcriptional start sites in E. coli O157:H7 EDL933 are regulated and appear in surprisingly high numbers. BMC Microbiology, 23(1), 243. https://doi.org/10.1186/s12866-023-02988-6

      Another issue with the current definition of regulatory regions, which should perhaps also be accounted for, is that it is likely that for many operons, the 'regulatory regions' of one gene might overlap the ORF of the previous gene, and in some cases actual coding mutations in an upstream gene may contaminate the set of potential regulatory mutations identified in this dataset.

      We agree that defining regulatory regions might be challenging, and that those regions might overlap with coding regions, either of the focal gene or of the one immediately upstream. For these reasons we have defined a wide region to identify putative regulatory variants (-200 to +30 bp around the start codon of the focal gene). We believe this relatively wide region allows us to capture most of the cis genetic variation.
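      A strand-aware extraction of such a window can be sketched as follows. This is a hypothetical helper, not the authors' pipeline, and it rests on stated assumptions: coordinates are 0-based; `start` is the index of the first base of the start codon as read on the gene's own strand (for minus-strand genes, the highest forward-strand coordinate of the codon); the window spans 200 bp upstream plus the first 30 bp from the start codon (230 bp in total); and chromosome circularity is ignored, so windows are simply clipped at the sequence ends.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (IUPAC codes other than N not handled)."""
    return seq.translate(_COMPLEMENT)[::-1]


def cis_window(genome: str, start: int, strand: str,
               upstream: int = 200, downstream: int = 30) -> str:
    """Return the putative cis region of a gene, oriented 5'->3' on its strand.

    ASSUMPTION: `start` is 0-based and points at the first base of the
    start codon as read on `strand`; ends are clipped, not wrapped.
    """
    if strand == "+":
        lo = max(0, start - upstream)
        hi = min(len(genome), start + downstream)
        return genome[lo:hi]
    if strand == "-":
        # On the minus strand, "upstream" lies at higher forward coordinates.
        lo = max(0, start - downstream + 1)
        hi = min(len(genome), start + upstream + 1)
        return reverse_complement(genome[lo:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy genome: 200 bp of upstream sequence, an ATG, then coding sequence.
g = "C" * 200 + "ATG" + "A" * 27 + "T" * 10
print(len(cis_window(g, 200, "+")))  # 230
```

      A window this wide will inevitably include the 3' end of the upstream gene in tightly packed operons, which is exactly the coding/regulatory overlap the reviewer raises.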

      Taken together, I feel that all of the above concerns need to be addressed in some way. At the absolute barest minimum, the authors need to acknowledge the weaknesses that I have pointed out in the definition of cis-regulatory logic at a gene level. I think it would be far BETTER if they performed a re-analysis at the level of transcriptional units, which I think might substantially strengthen the work as a whole, but I recognize that this would also constitute a substantial amount of additional effort.

      As indicated above, we have added a section in the results to report on the analysis carried out with operons as individual units, with more details provided in the methods section. We believe these results, which largely overlap with the original analysis, are a good way to recognize the limitations of our approach and to acknowledge the importance of gaining better knowledge of the number and composition of transcriptional units in bacteria, which, as the references above indicate, are still incompletely understood.

      Having reached the end of the paper, and considering the evidence and arguments of the authors in their totality, I find myself wondering how much local x background interactions - that is, the effects of cis regulatory mutations (like those being considered here, with or without the modified definitions that I proposed above) IN THE CONTEXT OF A PARTICULAR STRAIN BACKGROUND, might matter more than the effects of the cis regulatory mutations per se. This is a particularly tricky problem to address because it would require a moderate number of targeted experiments with a moderate number of promoters in a moderate number of strains (which of course makes it maximally annoying since one can't simply scale up hugely on either axis individually and really expect to tease things out). I think that trying to address this question experimentally is FAR beyond the scope of the current paper, but I think perhaps the authors could at least begin to address it by acknowledging it as a challenge in their discussion section, and possibly even identify candidate promoters that might show the largest divergence of activities across strains when there IS no detectable cis regulatory mutation (which might be indicative of local x background interactions), or those with the largest divergences of effect for a given mutation across strains. A differential expression model incorporating shrinkage is essential in such analysis to avoid putting too much weight on low expression genes with a lot of Poisson noise.
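      The shrinkage the reviewer mentions can be illustrated with the simplest possible version: treat each observed log2 fold-change as Normal around the true value with its standard error, place a Normal(0, prior_sd²) prior on the true value, and report the posterior mean. This is a deliberately simplified sketch in the spirit of, but much simpler than, the estimators offered by DESeq2's `lfcShrink`; the fixed `prior_sd` is an assumption (real methods estimate the prior from all genes).

```python
def shrink_lfc(lfc: float, se: float, prior_sd: float = 1.0) -> float:
    """Posterior mean of the true LFC under a Normal(0, prior_sd**2) prior
    and a Normal(true LFC, se**2) likelihood for the observed estimate."""
    prior_var = prior_sd ** 2
    return lfc * prior_var / (prior_var + se ** 2)


# Same raw LFC, very different precision (hypothetical values).
well_measured = shrink_lfc(3.0, se=0.2)    # ~2.88: barely moved
noisy_low_count = shrink_lfc(3.0, se=2.0)  # 0.6: pulled strongly toward 0
```

      Because low-count genes carry large standard errors, their fold-changes are pulled toward zero, so a ranking of promoters by divergence across strains is driven by well-measured genes rather than by Poisson noise.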

      We again thank the reviewer for their thoughtful comments on the limitations of correlative studies in general, and microbial GWAS in particular. Regarding microbial GWAS, we feel we may have failed to properly explain how the implementation we used allows us to correct, at least partially, for population structure effects. That is, the linear mixed model we used relies on population structure to remove the part of the association signal that is due to the genetic background, thus focusing the analysis on the specific loci. Cases with strong epistatic interactions would obviously not be accounted for, but those would be extremely challenging to measure or predict at scale, as the reviewer rightly suggests. We have added a brief recap of the ability of microbial GWAS to account for population structure in the results section (“A large fraction of gene expression changes can be attributed to genetic variations in cis regulatory regions”, e.g. L195).

      I also have some more minor concerns and suggestions, which I outline below:

      It seems that the differential expression analysis treats the lab reference strains as the 'centerpoint' against which everything else is compared, and yet I wonder if this is the best approach... it might be interesting to see how the results differ if the authors instead take a more 'average' strain (either chosen based on genetics or transcriptomics) as a reference and compared everything else to that.

      While we don’t necessarily disagree with the reviewer that a “wild” strain might be a better comparator, we think that our choice of the reference isolates is still justified on two grounds. First, while it is true that comparing against a reference introduces biases in the analysis, this concern would not be removed by choosing another strain as the reference; which strain would then be the best reference to compare against? We think that the second point provides an answer to this question: the “traditional” reference isolates have a rich ecosystem of annotations, experimental data, and computational predictions. These can in turn be used for validation and hypothesis generation, which we have done extensively in the manuscript. Had we chosen a different reference isolate, we would still have had to map associations to the traditional reference, probably at the cost of precision. An example that will likely resonate with this reviewer is that we have used experimentally validated and high-quality computational operon predictions to look into likely associations between cis-variants and “operon DEGs”. This analysis would likely have been of worse quality had we used another strain as reference, for which operon definitions would have had to come from lower-quality predictions or be “lifted” from the traditional reference.

      Line 104 - the statement about the differentially expressed genes being "part of operons with diverse biological functions" seems unclear - it is not apparent whether the authors are referring to diversity of function within each operon, or between the different operons, and in any case one should consider whether the observation reflects any useful information or is just an apparently random collection of operons.

      We agree that this formulation could create confusion and we have elected to remove the expression “with diverse biological functions”, given that we discuss those functions immediately after that sentence.

      Line 292 - I find the argument here somewhat unconvincing, for two reasons. First, the fact that only half of the observed changes went in the same direction as the GWAS results would indicate, which is trivially a result that would be expected by random chance, does not lend much confidence to the overall premise of the study that there are meaningful cis regulatory changes being detected (in fact, it seems to argue that the background in which a variant occurs may matter a great deal, at least as much as the cis regulatory logic itself). Second, in order to even assess whether the GWAS is useful to "find the genetic determinants of gene expression changes" as the authors indicate, it would be necessary to compare to a reasonable, non-straw-man, null approach simply identifying common sequence variants that are predicted to cause major changes in sigma 70 binding at known promoters; such a test would be especially important given the lack of directional accuracy observed here. Along these same lines, it is perhaps worth noting, in the discussion beginning on line 329, that the comparison is perhaps biased in favor of the GWAS study, since the validation targets here were prioritized based on (presumably strong) GWAS data.

      We thank the reviewer for prompting us to reason further about the results of the in vitro validation experiments. We agree that the measured gene expression changes agree only partly with those measured with the reporter system, and that this discrepancy could likely be attributed to regulatory elements that are not in cis and thus were not present in the in vitro reporter system. We have noted this possibility in the discussion. Additionally, we have amended the results section to note that, even though the predicted direction of gene expression change was less accurate than might have been expected, the prediction of whether a change would be present (ignoring directionality) was much more accurate.
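      The distinction between directional agreement and presence-only agreement can be made explicit with a small sketch. Each fold change is discretized into up (+1), down (-1) or unchanged (0) using a threshold; directional agreement requires identical calls, while presence-only agreement only requires that both or neither show a change. The function names and the default threshold of 0 are hypothetical choices for illustration; the toy numbers are not the study's data.

```python
from __future__ import annotations


def call(x: float, threshold: float) -> int:
    """Discretize a fold change: +1 up, -1 down, 0 no change."""
    if x > threshold:
        return 1
    if x < -threshold:
        return -1
    return 0


def agreement(predicted: list[float], measured: list[float],
              threshold: float = 0.0) -> tuple[float, float]:
    """Return (directional, presence-only) agreement between two sets of fold changes."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    pairs = [(call(p, threshold), call(m, threshold))
             for p, m in zip(predicted, measured)]
    directional = sum(p == m for p, m in pairs) / len(pairs)
    presence = sum((p != 0) == (m != 0) for p, m in pairs) / len(pairs)
    return directional, presence


# Toy example: every change is detected, but only half in the right direction.
print(agreement([1.2, 0.8, -1.5, -0.9], [0.7, -1.1, 1.3, -0.6]))  # (0.5, 1.0)
```

      By construction presence-only agreement can never be lower than directional agreement, so the gap between the two numbers is a direct measure of how many changes were detected but with the wrong sign.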

      I don't find the Venn diagrams in Fig 7C-D useful or clear given the large number of zero-overlap regions, and would strongly advocate that the authors find another way to show these data.

      While we are aware of alternative ways to show overlaps between sets, such as UpSet plots, we don’t find them much easier to parse. We think that the simple and direct Venn diagrams we have drawn convey the clear message that overlaps exist only between certain drug classes in E. coli, and virtually none for P. aeruginosa. We have added a comment on the lack of overlap between all drug classes and the differences between the two species in the results section (i.e. L436 and L465).

      In the analysis of waa operon gene expression beginning on line 400, it is perhaps important to note that most of the waa operon doesn't do anything in laboratory K12 strains due to the lack of complete O-antigen... the same is not true, however, for many wild/clinical isolates. It would be interesting to see how those results compare, and also how the absolute TPMs (rather than just LFCs) of genes in this operon vary across the strains being investigated during TOB treatment.

      We thank the reviewer for this helpful suggestion. We examined the absolute expression (TPMs) of waa operon genes at baseline (A) and following exposure to tobramycin (B). The representative TPMs per strain were obtained by averaging across biological replicates. We observed constitutive expression of these genes in the reference strain (MG1655) and the other isolates containing the variant of interest (MC4100, BW25113). In contrast, strains lacking the variant of interest (IAI76 and IAI78) showed lower expression of these operon genes under both conditions. Strain IAI77, on the other hand, displayed increased expression of a subset of waa genes after tobramycin exposure, indicating strain-specific variation in transcriptional response. While the reference isolate might lack the O-antigen, it certainly expresses the waa operon, both constitutively and under TOB exposure.

      I don't think that the second conclusion on lines 479-480 is fully justified by the data, given both the disparity in available annotation information between the two species, AND the fact that only two species were considered.

      While we feel that the “Discussion” section of a research paper allows for speculative statements, we have to concede that we have perhaps overreached here. We have amended this sentence to be more cautious and not mislead readers.

      Line 118: "Double of DEGs"

      Line 288 - presumably these are LOG fold changes

      Fig 6b - legend contains typos

      Line 661 - please report the read count (more relevant for RNA-seq analysis) rather than Gb

      We thank the reviewer for pointing out the need to make these edits. We have implemented them all.

      Source code - I appreciate that the authors provide their source code on github, but it is very poorly documented - both a license and some top-level documentation about which code goes with each major operation/conclusion/figure should be provided. Also, ipython notebooks are in general a poor way in my view to distribute code, due to their encouragement of nonlinear development practices; while they are fine for software development, actual complete python programs along with accompanying source data would be preferable.

      We agree with the reviewer that a software license and some documentation about what each notebook is about is warranted, and we have added them both. While we agree that for “consumer-grade” software jupyter notebooks are not the most ergonomic format, we believe that as a documentation of how one-time analyses were carried out they are actually one of the best formats we could think of. They in fact allow for code and outputs to be presented alongside each other, which greatly helped us to iterate on our research and to ensure that what was presented in the manuscript matched the analyses we reported in the code. This is of course up for debate and ultimately specific to someone’s taste, and so we will keep the reviewer’s critique in mind for our next manuscript. And, if we ever decide to package the analyses presented in the manuscript as a “consumer-grade” application for others to use, we would follow higher standards of documentation and design.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Damaris et al. collected genome sequences and transcriptomes from isolates from two bacterial species. Data for E. coli were produced for this paper, while data for P. aeruginosa had been measured earlier. The authors integrated these data to detect genes with differential expression (DE) among isolates as well as cis-expression quantitative trait loci (cis-eQTLs). The authors used sample sizes that were adequate for an initial exploration of gene regulatory variation (n=117 for E. coli and n=413 for P. aeruginosa) and were able to discover cis eQTLs at about 39% of genes. In a creative addition, the authors compared their results to transcription rates predicted from a biophysical promoter model as well as to annotated transcription factor binding sites. They also attempted to validate some of their associations experimentally using GFP-reporter assays. Finally, the paper presents a mapping of antibiotic resistance traits. Many of the detected associations for this important trait group were in non-coding genome regions, suggesting a role of regulatory variation in antibiotic resistance.

      A major strength of the paper is that it covers an impressive range of distinct analyses, some of them in two different species. A weakness is that this breadth comes at the expense of depth and detail. Some sections are underdeveloped, not fully explained, or not sufficiently thought through. Important methodological details are missing, as detailed below.

      We thank the reviewer for highlighting the strengths of our study. We hope that our replies to their comments and the other two reviewers will address some of the limitations.

      Major comments:

      1. An interesting aspect of the paper is that genetic variation is represented in different ways (SNPs & indels, IRG presence/absence, and k-mers). However, it is not entirely clear how these three different encodings relate to each other. Specifically, more information should be given on these two points:

      • It is not clear how "presence/absence of intergenic regions" are different from larger indels.

      In order to better guide readers through the different kinds of genetic variants we considered, we have added a brief explanation about what “promoter switches” are in the introduction (“meaning that the entire promoter region may differ between isolates due to recombination events”, L56). We believe this clarifies how they are very different in character from a large deletion. We have kept the reference to the original study (10.1073/pnas.1413272111) describing how widespread these switches are in E. coli as a way for readers to discover more about them.

      • I recommend providing more narration on how the k-mers compare to the more traditional genetic variants (SNPs and indels). It seems like the k-mers include the SNPs and indels somehow? More explanation would be good here, as k-mer based mapping is not usually done in other species and is not standard practice in the field. Likewise, how is multiple testing handled for association mapping with k-mers, since presumably each gene region harbors a large number of k-mers, potentially hugely increasing the multiple testing burden?

      We agree with the reviewer that representing genetic variants as k-mers would encompass short variants (SNPs/InDels) as well as larger variants and promoter presence/absence patterns. We believe this is supported by the fact that we identify the highest proportion of DEGs with a significant association when using this representation of variants (Figure 2A, 39% for both species). We have added a reference to a recent review on the advantages of k-mer methods for population genetics (10.1093/molbev/msaf047) in the introduction. Regarding multiple testing correction, we have employed a commonly used approach that, unlike a crude Bonferroni correction based on the number of tested variants, allows for a realistic correction of association p-values: we counted the number of unique presence/absence patterns, which can be shared between multiple genetic variants, and applied a Bonferroni correction using this number rather than the number of variants tested. We have expanded the corresponding section in the methods (e.g. L697) to better explain this point for readers not familiar with this approach.
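      Both points (k-mers subsuming SNPs, and correcting by unique patterns rather than by variant count) can be seen in a toy example. The sketch below decomposes each isolate's cis region into k-mers, records each k-mer's presence/absence vector across isolates, and divides alpha by the number of unique variable patterns. All names and sequences are hypothetical, and dropping invariant patterns (k-mers present in every isolate) is an assumption that mirrors common frequency filtering; it is not necessarily the authors' exact procedure.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_patterns(seqs: dict[str, str], k: int) -> dict[str, tuple[int, ...]]:
    """Map every k-mer seen in any isolate to its presence/absence vector."""
    isolates = sorted(seqs)
    per_isolate = {iso: kmers(seqs[iso], k) for iso in isolates}
    universe = set().union(*per_isolate.values())
    return {km: tuple(int(km in per_isolate[iso]) for iso in isolates)
            for km in sorted(universe)}


def pattern_bonferroni_threshold(patterns: dict[str, tuple[int, ...]],
                                 alpha: float = 0.05) -> float:
    """Bonferroni threshold using the number of unique *variable* patterns."""
    variable = {p for p in patterns.values() if 0 < sum(p) < len(p)}
    if not variable:
        raise ValueError("no variable patterns to test")
    return alpha / len(variable)


# Hypothetical cis regions for three isolates; iso3 carries a T>A SNP.
seqs = {"iso1": "ACGTACGT", "iso2": "ACGTACGT", "iso3": "ACGAACGT"}
pats = presence_patterns(seqs, k=3)
print(len(pats))                           # 7 k-mers in total
print(pattern_bonferroni_threshold(pats))  # 0.025: only 2 unique variable patterns
```

      The single SNP shows up as several k-mers that share just two presence/absence patterns (lost in iso3, gained in iso3), so the test count, and therefore the correction, tracks the number of distinct signals rather than the number of redundant features.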

      2. What was the distribution of association effect sizes for the three types of variants? Did IRGs have larger effects than SNPs as may be expected if they are indeed larger events that involve more DNA differences? What were their relative allele frequencies?

      We appreciate the reviewer’s suggestion to look into the distribution of effect sizes by variant type. We have now evaluated the distribution of the effect sizes and allele frequencies for the genetic markers (SNPs/InDels, IGRs, and k-mers) for both species (Supplementary Figure 2). In E. coli, IGR variants showed somewhat larger median effect sizes (|β| = 4.5) than SNPs/InDels (|β| = 3.8), whereas k-mers displayed the widest distribution (median |β| = 5.2). In P. aeruginosa, the trend differed, with IGRs exhibiting smaller effects (median |β| = 3.2) than SNPs/InDels (median |β| = 5.1) and k-mers (median |β| = 6.2). With respect to allele frequencies, SNPs/InDels generally occurred at lower frequencies (median AF = 0.34 for E. coli, 0.33 for P. aeruginosa), whereas IGRs (median AF = 0.65 for E. coli and 0.75 for P. aeruginosa) and k-mers (median AF = 0.71 for E. coli and 0.65 for P. aeruginosa) more often occurred at intermediate to higher frequencies. We have added a visualization of the distribution of effect sizes (Supplementary Figure 2).

      3. The GFP-based experiments attempting to validate the promoter effects for 18 genes are laudable, and the fact that 16 of them showed differences is nice. However, the fact that half of the validation attempts yielded effects in the opposite direction of what was expected is quite alarming. I am not sure this really "further validates" the GWAS in the way the authors state in line 292 - in fact, quite the opposite in that the validations appear random with regards to what was predicted from the computational analyses. How do the authors interpret this result? Given the higher concordance between GWAS, promoter prediction, and DE, are the GFP assays just not relevant for what is going on in the genome? If not, what are these assays missing? Overall, more interpretation of this result would be helpful.

      We thank the reviewer for their comment, which is similar in nature to that raised by reviewer #2 above. As noted in our reply above, we have amended the results and discussion to indicate that, although the predicted direction of gene expression change was not highly accurate, focusing on whether there would be a change in gene expression at all (regardless of its direction) resulted in a higher accuracy. We postulate that the cases in which the direction of the change was not correctly identified could be due to the influence of other genetic elements acting in trans on the gene of interest.

      4. On the same note, it would be really interesting to expand the GFP experiments to promoters that did not show association in the GWAS. Based on Figure 6, effects of promoter differences on GFP reporters seem to be very common (all but three were significant). Is this a higher rate than for the average promoter with sequence variation but without detected association? A handful of extra reporter experiments might address this. My larger question here is: what is the null expectation for how much functional promoter variation there is?

      We thank the reviewer for this comment. We agree that estimating the null expectation for functional promoter variation would require testing promoter alleles with sequence variation that are not associated in the GWAS. Such experiments, which would directly address whether the effects observed in our study exceed the background, would have required us to prepare multiple additional constructs, which was unfortunately not possible due to staff constraints. We therefore elected to clarify the scope of our GFP reporter assays instead. These experiments were designed as a paired comparison of the wild-type and the GWAS-associated variant alleles of the same promoter in an identical reporter background, with the aim of testing allele-specific functional effects for GWAS hits (Supplementary Figure 6). We also compared GFP fluorescence between the promoterless vector (pOT2) and promoter-containing constructs; we observed higher GFP signals in all but four (yfgJ, fimI, agaI, and yfdQ) variant-containing promoter constructs, which indicates that for most of the constructs we cloned active promoter elements. We have revised the manuscript text accordingly to reflect this clarification and included the control in the supplementary information as Supplementary Figure 6.

      5. Were the fold-changes in the GFP experiments statistically significant? Based on Figure 6 it certainly looks like they are, but this should be spelled out, along with the test used.

      We thank the reviewer for pointing this out. We have revised Figure 6 to indicate significant differences between the test and control reporter constructs. We used a paired Student’s t-test on measurements matched by plate and time point, and corrected for multiple testing using the Benjamini-Hochberg procedure. As seen in the updated Figure 6A, 16 out of the 18 reporter constructs displayed significant differences (adjusted p-value
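      For readers less familiar with the Benjamini-Hochberg step, the sketch below adjusts a list of per-construct p-values (which could come, for example, from `scipy.stats.ttest_rel` on plate- and time-point-matched measurements). It is a plain-Python illustration of the standard step-up procedure, not the authors' code; the p-values in the example are invented.

```python
from __future__ import annotations


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value of rank r (1 = smallest) among m tests is scaled by m / r,
    then a running minimum from the largest rank down enforces monotonicity
    and caps values at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four reporter constructs.
print([round(q, 3) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
# [0.02, 0.04, 0.04, 0.02]
```

      Constructs whose adjusted value falls below the chosen FDR level (e.g. 0.05) are the ones reported as significant.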

      6. What was the overall correlation between GWAS-based fold changes and those from the GFP-based validation? What does Figure 6A look like as a scatter plot comparing these two sets of values?

      We thank the reviewer for this helpful suggestion, which allows us to look more closely into the results of our in vitro validation. We performed a direct comparison of RNA-seq fold changes from the GWAS (x-axis) with the GFP reporter measurements (y-axis), as depicted in the figure above. The overall correlation between the two was weak (Pearson r = 0.17), reflecting the limited agreement between the associations and the reporter constructs. We note, however, that in our opinion the two metrics are not directly comparable, since the x-axis measures changes in gene expression and the y-axis changes in fluorescence, which is downstream of it. As mentioned above and in reply to a comment from reviewer #2, the agreement between measured gene expression and all other in silico and in vitro techniques increases when ignoring the direction of the change. Overall, we believe that these results partly validate our associations and predictions, while indicating that other factors in trans with the regulatory region contribute to changes in gene expression, which is to be expected. The scatter plot has been included as a new supplementary figure (Supplementary Figure 7).
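      The correlation reported here is the ordinary Pearson coefficient between paired values, which the sketch below computes from first principles (Python 3.10+ also offers `statistics.correlation`). One assumption worth flagging: if the GFP readout is a raw ratio, it would need to be log2-transformed before being compared with RNA-seq log2 fold changes, otherwise the two axes are on different scales. The example values are hypothetical.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values: RNA-seq log2 FC vs log2 of GFP fold change.
rna_lfc = [1.0, 2.0, 3.0]
gfp_log2 = [math.log2(r) for r in (4.0, 16.0, 64.0)]  # 2, 4, 6
print(pearson_r(rna_lfc, gfp_log2))  # 1.0
```

      Because Pearson r is sensitive to a few extreme constructs, a rank-based coefficient such as Spearman's is a reasonable companion check when only 18 points are available.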

      7. Was the SNP analyzed in the last Results section significant in the gene expression GWAS? Did the DE results reported in this final section correspond to that GWAS in some way?

      The T>C SNP upstream of waaQ did not show significant association with gene expression in our cis GWAS analysis. Instead, this variant was associated with resistance to tobramycin when referencing data from Danesh et al, and we observed the variant in our strain collection. We subsequently investigated whether this variant also influenced expression of the waa operon under sub-inhibitory tobramycin exposure. The differential expression results shown in the final section therefore represent a functional follow-up experiment, and not a direct replication of the GWAS presented in the first part of the manuscript.

      8. Line 470: "Consistent with the differences in the genetic structure of the two species" It is not clear what differences in genetic structure this refers to. Population structure? Genome architecture? Differences in the biology of regulatory regions?

      The awkwardness of that sentence is perhaps a consequence of our assumption that readers would be aware of the differences in population genetics between the two species. We have, however, realized that little literature (if any!) is available about these differences, which we have observed during the course of this and other studies we have carried out. As a result, we agree that we cannot assume that readers are similarly familiar with these differences, and have changed that sentence (i.e. L548) to more directly address the differences between the two species, which presumably result in different population structures. We thank the reviewer for making us aware of a gap in the literature concerning the comparison of pangenome structures across relevant species.

      1. Line 480: the reference to "adaption" is not warranted, as the paper contains no analyses of evolutionary patterns or processes. Genetic variation is not the same as adaptation.

      We have amended this sentence to better reflect what we can conclude from our analyses.

      1. There is insufficient information on how the E. coli RNA-seq data was generated. How was RNA extracted? Which QC was done on the RNA; what was its quality? Which library kits were used? Which sequencing technology? How many reads? What QC was done on the RNA-seq data? For this section, the Methods are seriously deficient in their current form and need to be greatly expanded.

      We thank the reviewer for highlighting the need for clearer methodological detail. We have expanded this section (L608) to fully describe how the E. coli RNA-seq data were generated and quality-controlled, including RNA extraction and the sequencing platform.

      1. How were the DEG p-values adjusted for multiple testing?

      As indicated in the Methods section (“Differential gene expression and functional enrichment analysis”), we used DESeq2 for E. coli and LPEseq for P. aeruginosa. Both methods use the False Discovery Rate (FDR) framework to compute an adjusted p-value for each gene. We have added a brief statement to the Methods noting that we followed the standard practice recommended by both software packages.
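      For reference, the Python sketch below implements the Benjamini-Hochberg (BH) step-up adjustment. BH is the default multiple-testing correction in the DESeq2 `results()` function (`pAdjustMethod = "BH"`). We have not verified that LPEseq uses exactly the same procedure. This is an illustration of the technique, not the pipeline used in the manuscript, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down. The running minimum keeps the
    # adjusted values monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

      A gene would then be called differentially expressed if its adjusted value falls below the chosen FDR threshold, for example 0.05.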

      1. Were there replicates for the E. coli strains? The methods do not say, but there is a hint there might have been replicates given their absence was noted for the other species.

      In the context of providing more information about the transcriptomics experiments for E. coli, we have also more clearly indicated that we have two biological replicates for the E. coli dataset.

      1. There needs to be more information on the "pattern-based method" that was used to correct the GWAS for multiple tests. How does this method work? What genome-wide threshold did it end up producing? Was there adjustment for the number of genes tested in addition to the number of variants? Was the correction done per variant class or across all variant classes?

      In line with an earlier comment from this reviewer, we have expanded the section in the Methods (e.g. L689) that explains how this correction worked to include as many details as possible, in order to provide the readers with the full context under which our analyses were carried out.

      1. For a paper that, at its core, performs a cis-eQTL mapping, it is an oversight that there seems not to be a single reference to the rich literature in this space, comprising hundreds of papers, in other species ranging from humans, many other animals, to yeast and plants.

      We thank reviewers #1 and #3 for pointing out the lack of references to the extensive literature on the subject. In the Introduction, we have added a number of references on applications of eQTL studies. These focus specifically on microbial pangenomes, which we believe are most relevant to our study.

      Minor comments:

      1. I wasn't able to understand the top panels in Figure 4. For ulaE, most strains have the solid colors, and the corresponding bottom panel shows mostly red points. But for waaQ, most strains have solid color in the top panel, but only a few strains in the bottom panel are red. So solid color in the top does not indicate a variant allele? And why are there so many solid alleles; are these all indels? Even if so, for kgtP, the same colors (i.e., nucleotides) seem to seamlessly continue into the bottom, pale part of the top panel. How are these strains different genotypically? Are these blocks of solid color counted as one indel or several SNPs, or somehow as k-mer differences? As the authors can see, these figures are really hard to understand and should be reworked. The same comment applies to Figure 5, where it seems that all (!) strains have the "variant"?

      We thank the reviewer for pointing out some limitations with our visualizations, most importantly with the way we explained how to read those two figures. We have amended the captions to more explicitly explain what is shown. The solid colors in the “sequence pseudo-alignment” panels indicate the focal cis variant, which is indicated in red in the corresponding “predicted transcription rate” panels below. In the case of Figure 5, the solid color indicates instead the position of the TFBS in the reference.

      1. Figure 1A & B: It would be helpful to add the total number of analyzed genes somewhere so that the numbers denoted in the colored outer rings can be interpreted in comparison to the total.

      We have added the total number of genes being considered for either species in the legend.

      1. Figure 1C & D: It would be better to spell out the COG names in the figure, as it is cumbersome for the reader to have to look up what the letters stand for in a supplementary table in a separate file.

      We agree that having to consult a supplementary table for the full name of a COG category is awkward. However, the very long category names would clutter the figure to the point that it would be difficult to read. In early drafts of this manuscript we tried something similar to what the reviewer suggests, and the result was small, hard-to-read labels. We have therefore kept the full names of each COG category in Supplementary Table 3.

      1. Line 107: "Similarly," does not fit here as the following example (with one differentially expressed gene in an operon) is conceptually different from the one before, where all genes in the operon were differentially expressed.

      We agree and have amended the sentence accordingly.

      1. Figure 5 bottom panel: it is odd that on the left the swarm plots (i.e., the dots) are on the inside of the boxplots while on the right they are on the outside.

      We have fixed the position of the dots so that they are centered with respect to the underlying boxplots.

      1. It is not clear to me how only one or a few genes in an operon can show differential mRNA abundance. Aren't all genes in an operon encoded by the same mRNA? If so, shouldn't this mRNA be up- or downregulated in the same manner for all genes it encodes? As I am not closely familiar with bacterial systems, it is well possible that I am missing some critical fact about bacterial gene expression here. If this is not an analysis artifact, the authors could briefly explain how this observation is possible.

      We thank the reviewer for this comment, which echoes one of the main concerns of reviewer #2. As noted in our reply above, multiple studies (see the three cited in our reply to reviewer #2) have established that bacteria encode multiple “non-canonical” transcriptional units (i.e. operons), due to the presence of accessory terminators and promoters. Other biological effects add to this, such as mRNA molecules of different lengths arising from ongoing transcription and degradation. Technical noise introduced by RNA isolation and sequencing also contributes. Together, these factors can produce variability in the estimated abundance of each gene.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This work provides an important resource identifying 72 proteins as novel candidates for plasma membrane and/or cell wall damage repair in budding yeast, and describes the temporal coordination of exocytosis and endocytosis during the repair process. The data are convincing; however, additional experimental validation will better support the claim that repair proteins shuttle between the bud tip and the damage site.

      We thank the editors and reviewers for their positive assessment of our work and the constructive feedback to improve our manuscript. We agree with the assessment that additional validation of repair protein shuttling between the bud tip and the damage site is required to further support the model.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yamazaki et al. conducted multiple microscopy-based GFP localization screens and identified proteins associated with the PM/cell wall damage stress response. Specifically, the authors identified bud-localized TMD-containing proteins and endocytic proteins as associated with PM damage stress. The authors further demonstrated that polarized exocytosis and CME are temporally coupled in response to PM damage. They also showed that CME is required for polarized exocytosis and for targeting TMD-containing proteins to the damage site. From these results, the authors proposed a model in which CME delivers TMD-containing repair proteins between the bud tip and the damage site.

      Strengths:

      Overall, this is a well-written manuscript, and the experiments are well-conducted. The authors identified many repair proteins and revealed the temporal coordination of different categories of repair proteins. Furthermore, the authors demonstrated that CME is required for targeting of repair proteins to the damage site, as well as cellular survival in response to stress related to PM/cell wall damage. Although the roles of CME and bud-localized proteins in damage repair are not completely new to the field, this work does have conceptual advances by identifying novel repair proteins and proposing the intriguing model that the repairing cargoes are shuttled between the bud tip and the damaged site through coupled exocytosis and endocytosis.

      Weaknesses:

      While the results presented in this manuscript are convincing, they might not be sufficient to support some of the authors' claims. In the last two Results sections in particular, the authors claim that CME delivers TMD-containing repair proteins from the bud tip to the damage site. The model is highly plausible given the data, but caveats still exist. For example, the repair proteins might not be transported from one location to another, but instead degraded and resynthesized. The Gal-induced expression system supports the model to some extent. However, I think more direct verification would significantly improve the strength of evidence, such as FLIP or photo-convertible fluorescent tags that distinguish pre-existing from newly synthesized proteins.

      Major experiment suggestions:

      (1) The authors may want to provide more direct evidence for "protein shuttling" and exclude the possibility that proteins at the bud are degraded and then synthesized de novo near the damage site. For example, the authors could use FLIP to bleach the bud-localized fluorescent proteins. If the damage site then shows no fluorescent proteins upon laser damage, this would strongly support the authors' model. Alternatively, the authors could use photo-convertible tags (e.g., Dendra) to differentiate pre-existing repair proteins from newly synthesized proteins.

      We thank the reviewer for evaluating our work and giving us important feedback. We agree that FLIP and photo-convertible tag experiments would further confirm our model. Due to time and resource constraints, we decided not to perform these experiments here. Instead, we have discussed this limitation in lines 363-366. Our proposed model of repair protein shuttling should be tested further in future work.

      (2) In line with point 1, the authors used Gal-inducible expression, which supported their model. However, the authors may need to show protein abundance in galactose, in glucose, and upon PM damage. A Western blot would be ideal to show the level of full-length proteins, or whole-cell fluorescence quantification could roughly indicate protein abundance. Otherwise, we cannot assume that the tagged proteins are expressed only when cells are grown in galactose-containing media.

      Thank you very much for raising this concern and suggesting these important experiments. We agree that a Western blot confirming mNG-Snc1 expression in each medium would further strengthen our conclusion. As with point (1), further investigation of repair protein shuttling between the bud tip and the damage site, and of its underlying mechanisms, will be an important future direction. As described above, we have discussed this limitation in lines 363-366.

      (3) Similarly, for Myo2 and Exo70 localization in CME mutants (Figure 4), it might be worth doing a western or whole-cell fluorescence quantification to exclude the caveat that CME deficiency might affect protein abundance or synthesis.

      We thank the reviewer for this suggestion. We quantified the whole-cell fluorescence of WT and CME mutant cells. This verified that the CME deletions have minimal effect on the expression levels of Myo2-sfGFP and Exo70-mNG (Figure S6). We added this description in lines 211-212.

      (4) From the authors' model in Figure 7, it looks like the repair proteins contribute to bud growth. Does laser damage to the mother cell prevent bud growth due to the reduction of TMD-containing repair proteins at the bud? If the authors could provide evidence for that, it would further support the model.

      Thank you very much for raising the important point. We speculate that the reduction of TMD-containing proteins at the bud by CME is one of the causes of cell growth arrest after PM damage (1). This is because TMD-containing repair proteins at the bud tip, including phospholipid flippases (Dnf1/Dnf2), Snc1, and Dfg5, are involved in polarized cell growth (2-4). This will be an important future direction as well.

      (5) Is the PM repair cell-cycle-dependent? For example, would the recruitment of repair proteins to the damage site be impaired when the cells are under alpha-factor arrest?

      Thank you for raising this interesting point. The senior author, Kono, previously performed this experiment in David Pellman’s lab. The preliminary results suggest that Pkc1 is targeted to the damage site without impairment under alpha-factor arrest. A more comprehensive analysis will be needed to establish the relationship between PM repair and the cell cycle.

      Reviewer #2 (Public review):

      This paper remarkably identifies plasma membrane repair proteins, revealing spatiotemporal cellular responses to plasma membrane damage. The study combines sodium dodecyl sulfate (SDS) and laser damage to identify and characterize proteins involved in plasma membrane (PM) repair in Saccharomyces cerevisiae. Of the 80 PM repair proteins identified, 72 were novel. Combining proteomic and microscopy approaches revealed the spatiotemporal coordination of exocytosis and clathrin-mediated endocytosis (CME) during repair. Interestingly, the authors demonstrated that exocytosis dominates early and CME dominates later. CME also plays an essential role in trafficking transmembrane-domain (TMD)-containing repair proteins between the bud tip and the damage site.

      Weaknesses/limitations:

      (1) Why are the authors saying that Pkc1 is the best characterized repair protein? What is the evidence?

      We would like to thank the reviewer for taking his/her time to evaluate our work and for valuable suggestions. We described Pkc1 as “best characterized” because it was the first protein reported to accumulate at the laser damage site in budding yeast (5). However, as the reviewer suggested, we do not have enough evidence to describe Pkc1 as “best characterized”. We therefore used “one of the known repair proteins” to mention Pkc1 in the manuscript (lines 90-91).

      (2) It is unclear why the authors chose to use only the C-terminal GFP-tagged library for the laser damage assay. This choice could have missed N-terminal tag-dependent localizations and functions, and may have excluded functionally important repair proteins.

      Thank you very much for the comments. We used the C-terminal GFP-tagged library for the laser damage assay because we intended to evaluate proteins at their endogenous expression levels. The N-terminal sfGFP-tagged library is expressed from the NOP1 promoter, whereas the C-terminal GFP-tagged library is expressed from the endogenous promoters. We clarified these points in lines 114-118. We agree with the reviewer that this approach may have missed repair proteins with N-terminal tag-dependent localizations and functions. We therefore discussed these limitations in lines 281-289 of the manuscript.

      (3) The use of SDS and laser damage may bias the screen toward proteins that respond to these specific stresses. It could therefore miss proteins involved in other forms of plasma membrane injury, such as mechanical or osmotic injury. SDS stress is also known to indirectly induce oxidative stress and heat-shock responses.

      Thank you very much for raising this point. We agree that combining SDS and laser damage may bias which PM repair proteins are identified. We therefore discussed this point as a limitation of this work in lines 292-298 of the manuscript.

      (4) It is unclear what the scale bars of Figures 3, 5, and 6 are. These should be included in the figure legend.

      We apologize for the missing scale bars. We added them to the legends of the figures in the manuscript.

      (5) Figure 4 should be organized to compare WT vs. mutant, which would emphasize the magnitude of impairment.

      Thank you for raising this point. Following the suggestion, we updated Figure 4 to compare WT and mutant cells directly, and clarified this in the figure legend.

      (6) It would be interesting to expand on possible mechanisms for CME-mediated sorting and retargeting of TMD proteins, including a speculative model.

      Thank you very much for this important suggestion. We think it will be very important to characterize the mechanism of CME-mediated TMD protein trafficking between the bud tip and the damage site. In the manuscript, we discussed a possible mechanism for CME activation at the damage site in lines 328-333. We speculate that CME activation may facilitate the retargeting of TMD proteins from the damage site to the bud tip.

      We do not have a model for how CME is activated at the bud tip to sort and target TMD proteins to the damage site. One possibility is that the cell cycle arrest after PM damage (1) alters the localization of CME proteins, since the cell cycle affects the localization of some CME proteins (6). We will investigate the mechanism of repair protein sorting from the bud tip to the damage site in future work.

      Reviewer #3 (Public review):

      Summary:

      This work aims to understand how cells repair damage to the plasma membrane (PM). This is important, as failure to do so results in cell lysis and death. It is therefore a fundamental question with broad implications for all eukaryotic cells. Despite this importance, relatively few proteins are known to contribute to this repair process. This study expands the number of experimentally validated PM repair proteins from 8 to 80. The authors also use precise laser-induced damage of the PM/cell wall and live-cell imaging to track the recruitment of repair proteins to these damage sites. They focus on repair proteins involved in either exocytosis or clathrin-mediated endocytosis (CME) to understand how these membrane remodeling processes contribute to PM repair. These experiments show that exocytosis and CME both occur at sites of PM damage. Exocytosis predominates in the early stages of repair, while CME predominates in the later stages. Lastly, the authors propose that CME diverts repair proteins localized to the growing bud to the site of PM damage.

      Strengths:

      The manuscript is very well written, and the experiments presented flow logically. The use of laser-induced damage and live-cell imaging to validate the proteome-wide screen using SDS-induced damage strengthens the role of the identified candidates in PM/cell wall repair.

      Weaknesses:

      (1) Could the authors estimate the fraction of their candidates that are associated with cell wall repair versus plasma membrane repair? Understanding how many of these proteins may be associated with the repair of the cell wall or PM may be useful for thinking about how these results are relevant to systems that do or do not have a cell wall. Perhaps this is already in their GO analysis, but I don't see it mentioned in the manuscript.

      We would like to thank the reviewer for taking his/her time to evaluate our work and valuable suggestions. We agree that this is important information to include. Although it may be difficult to completely distinguish the PM repair and cell wall repair proteins, we have identified at least six proteins involved in cell wall synthesis (Flc1, Dfg5, Smi1, Skg1, Tos7, and Chs3). We included this information in lines 142-146 in the manuscript.

      (2) Do the authors identify actin cable-associated proteins or formin regulators associated with sites of PM damage? Prior work from the senior author (reference 26) shows that the formin Bnr1 relocalizes to sites of PM damage, so it would be interesting if Bnr1 and its regulators (e.g., Bud14, Smy1, etc) are recruited to these sites as well. These may play a role in directing PM repair proteins (see more below).

      Thank you for the suggestion. We identified several Bnr1-interacting proteins, including Bud6, Bil1, and Smy1 (Table S2), although Bnr1 itself was not identified in our screen. This could be because (1) the C-terminal GFP fusion impairs Bnr1 function, or (2) a single GFP fusion is not bright enough to visualize the weak signal at the damage site. Indeed, reference 26 used 3GFP-Bnr1 (an N-terminal 3xGFP fusion).

      (3) Do the authors suspect that actin cables play a role in the relocalization of material from the bud tip to PM damage sites? They mention that TMD proteins are secretory vesicle cargo (lines 134-143) and that Myo2 localizes to damage sites. Together, this suggests a possible role for cable-based transport of repair proteins. While this may be the focus of future work, some additional discussion of the role of cables would strengthen their proposed mechanism (steps 3 and 4 in Figure 7).

      Thank you very much for the suggestion. We agree that actin cables may play a role in the targeting of vesicles and repair proteins to the damage site. Following the reviewer’s suggestion, we discussed the roles of Bnr1 and actin cables for repair protein trafficking in lines 309-313 in the manuscript.

      (4) Lines 248-249: I find the rationale for using an inducible Gal promoter here unclear. Some clarification is needed.

      Thank you for raising this point. We clarified the rationale as far as possible in lines 249-255 of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The N-terminal GFP collection screen is interesting but seems unrelated to the rest of the results. The authors address this in the Discussion. It might still be worth showing how many hits from the laser damage screen (Figure 2) overlap with the N-terminal GFP screen hits.

      Thank you for the suggestion. We found that 48 of the 80 repair proteins are hits in the N-terminal GFP library (Tables S1 and S2). This result suggests that the N-terminal library is also a useful resource for identifying repair proteins. We discussed this in lines 288-289 of the manuscript.

      (2) SDS treatment seems a harsh stressor. As the authors mentioned, the overlapped hits from the N- and C-terminal GFP screen might be more general stress factors. Thus, I think Line 84 (the subtitle) might be overclaiming, and the authors might need to tone down the sentence.

      Thank you for the suggestion. Following the reviewer’s suggestion, we changed the sentence to “Proteome-scale identification of SDS-responsive proteins” in the manuscript. We believe that the new sentence describes our findings more precisely.

      (3) Lines 103-106: it does not seem obvious to me that the cytoplasmic protein puncta are due to endocytosis. The authors might need to provide more experimental evidence for this conclusion. Alternatively, they could at least provide more reasoning or references, e.g., showing that several specific hits in that group are known to be endocytosed.

      Thank you very much for raising this point. We agree with the reviewer and deleted the description that these puncta are due to endocytosis in the manuscript.

      (4) For Figure 1D and S1C, the authors annotated some of the localization changes clearly, but some are confusing to me. For example," from bud tip/neck" to where? And from where to "Puncta/foci"? A clearer annotation might help the readers to understand the categorization.

      Thank you very much for the suggestion. We defined these annotations because it is difficult to describe protein localization conclusively after SDS treatment. Convincingly identifying the destination of the GFP fusion proteins would require dual-color imaging with organelle markers or deep learning-based localization estimation. We feel this is beyond the scope of this work. As criteria, we therefore used protein localization under normal/non-stressed conditions, as reported in (7) and in the Saccharomyces Genome Database (SGD). We clarified this annotation definition in the manuscript (lines 413-436).

      (5) As I understand it, does the localization in Figure 2C refer to the "before damage/normal" localization? If so, it would be helpful to state in the text that these localizations are based on untreated/normal conditions.

      Yes, it refers to the “before damage/normal localization”. Following the reviewer’s suggestion, we stated that these localizations are based on these conditions in the manuscript (line 130).

      (6) The authors mentioned "four classes" in Line 120, but did not mention the "PM to cytoplasm" class in the text. It would be helpful to discuss/speculate why these transporters might contribute to PM damage repair.

      Thank you very much for this suggestion. We speculated that these transporters are endocytosed after PM damage because endocytosis of PM proteins contributes to cell adaptation to environmental stress (8). We mentioned it in the manuscript (lines 120-122).

      (7) Lines 175-180: My understanding is that the Exo70-mNG/Dnf1-mNG signals peak before the Ede1-mSc-I signal does. The signals occur simultaneously, but their dominant phases differ. This is clearer from the data, but I find the concluding sentences themselves confusing. The authors might consider rewriting them to make them more straightforward.

      Thank you very much for pointing this out. Following the reviewer’s suggestion, we revised the sentence (lines 177-182 in the manuscript).

      Reviewer #2 (Recommendations for the authors):

      It would be interesting to expand on the functional characterization of the 72 novel candidates and explore possible mechanisms for CME-mediated sorting and retargeting of TMD proteins by including a speculative model.

      Thank you very much for the comment. We agree that further characterization of the novel repair proteins is truly important. The same applies to exploring possible mechanisms of CME-mediated TMD protein sorting and retargeting. This will be an important direction for our future work.

      Reviewer #3 (Recommendations for the authors):

      The x-axis in Figure 1C is labeled 'Ratio' - what is this a ratio of?

      Thank you for raising this point. It is the ratio of the number of proteins associated with a GO term to the total number of proteins in the background. We clarified it in the legend of Figure 1C in the manuscript.

      References

      (1) K. Kono, A. Al-Zain, L. Schroeder, M. Nakanishi, A. E. Ikui, Plasma membrane/cell wall perturbation activates a novel cell cycle checkpoint during G1 in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 113, 6910-6915 (2016).

      (2) A. Das et al., Flippase-mediated phospholipid asymmetry promotes fast Cdc42 recycling in dynamic maintenance of cell polarity. Nat Cell Biol 14, 304-310 (2012).

      (3) M. Adnan et al., SNARE Protein Snc1 Is Essential for Vesicle Trafficking, Membrane Fusion and Protein Secretion in Fungi. Cells 12 (2023).

      (4) H.-U. Mösch, G. R. Fink, Dissection of Filamentous Growth by Transposon Mutagenesis in Saccharomyces cerevisiae. Genetics 145, 671-684 (1997).

      (5) K. Kono, Y. Saeki, S. Yoshida, K. Tanaka, D. Pellman, Proteasomal degradation resolves competition between cell polarization and cellular wound healing. Cell 150, 151-164 (2012).

      (6) A. Litsios et al., Proteome-scale movements and compartment connectivity during the eukaryotic cell cycle. Cell 187, 1490-1507.e1421 (2024).

      (7) W.-K. Huh et al., Global analysis of protein localization in budding yeast. Nature 425, 686-691 (2003).

      (8) T. López-Hernández, V. Haucke, T. Maritzen, Endocytosis in the adaptation to cellular stress. Cell Stress 4, 230-247 (2020).

    1. Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants overestimate the prior probability of regime change when it is low and underestimate it when it is high. Likewise, participants overestimate the strength of evidence when it is low and underestimate it when it is high. In other words, participants distinguish much less between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to the data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network that includes vmPFC. Both networks show individual differences effects. People who were more sensitive to strength of evidence showed more activity in the frontal-parietal network, and people more sensitive to prior probability showed more activity in the vmPFC-linked network.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies.

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies.

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).
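      The attraction-to-the-mean account can be checked against the three quoted values with a one-parameter sketch. It assumes, as a simplification of our own, that subjective = m + w(true - m), where m is the mean of the three instructed transition probabilities and w is a shrinkage weight (w = 1: no attraction; w = 0: every block treated as the mean). The functional form and the name fit_shrinkage_weight are illustrative assumptions, not a model from the paper.

```python
from statistics import mean
from typing import Sequence


def fit_shrinkage_weight(true_vals: Sequence[float], subjective_vals: Sequence[float]) -> float:
    """Least-squares w in: subjective = m + w * (true - m), with m = mean(true_vals).

    Hypothetical one-parameter 'attraction to the mean' model, for illustration only.
    """
    m = mean(true_vals)
    num = sum((t - m) * (s - m) for t, s in zip(true_vals, subjective_vals))
    den = sum((t - m) ** 2 for t in true_vals)
    return num / den


true_q = [0.01, 0.05, 0.10]           # instructed transition probabilities
subjective_q = [0.037, 0.052, 0.069]  # subjective values quoted in the review
w = fit_shrinkage_weight(true_q, subjective_q)
m = mean(true_q)
predicted = [m + w * (t - m) for t in true_q]
print(f"w = {w:.3f}, predicted = {[round(p, 3) for p in predicted]}")
```

      The fitted w is about 0.35, and the predictions lie within 0.001 of the quoted values. A close fit of this kind does not discriminate between attraction to the mean and system neglect, since both compress the environments toward each other; as noted above, a dataset with a different mean ground-truth value would be needed to separate them.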

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).
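      The third alternative, a reporting bias toward 50%, can be written as a transformation applied after an otherwise correct belief. The sketch assumes a simple linear compression with a hypothetical parameter k; it only shows how such a bias lifts low reported probabilities and lowers high ones, independently of how the prior and evidence were used.

```python
def compress_toward_half(belief: float, k: float) -> float:
    """Hypothetical report = 0.5 + k * (belief - 0.5), with 0 < k <= 1.

    k = 1 reports the belief unchanged; smaller k pulls every report toward 50%.
    """
    if not 0.0 <= belief <= 1.0:
        raise ValueError("belief must be a probability")
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]")
    return 0.5 + k * (belief - 0.5)


# With k = 0.6, a belief of 95% is reported as 77% and a belief of 5% as 23%.
for b in (0.05, 0.50, 0.95):
    print(b, round(compress_toward_half(b, k=0.6), 2))
```

      Because this bias acts only at the output stage, it would yield apparent under-use of the instructed parameters even for an observer whose internal posterior was exactly Bayesian, which is why it counts as a qualitatively distinct explanation.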

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt increase with sample number (since by the time of later samples there have been more opportunities for a regime shift)? To control for this, the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen; how can we even have an estimate of Pt from this task?
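      The temporal confound can be made concrete: even before any ball is seen, the probability that a switch has already happened grows with the sample number. Assuming a constant per-draw transition probability q and at most one switch (our simplification; the paper's 'intertemporal prior' regressor may be defined differently), this signal-independent component is 1 - (1 - q)^t. The function name is hypothetical.

```python
def intertemporal_prior(q: float, t: int) -> float:
    """P(at least one switch by draw t) under a constant per-draw switch probability q.

    Assumes a single absorbing switch, so P(no switch in t draws) = (1 - q) ** t.
    """
    if not 0.0 <= q <= 1.0 or t < 0:
        raise ValueError("need 0 <= q <= 1 and t >= 0")
    return 1.0 - (1.0 - q) ** t


for q in (0.01, 0.05, 0.10):
    print(q, [round(intertemporal_prior(q, t), 3) for t in range(1, 11)])
```

      For q = 0.05 this rises from 0.05 at the first draw to about 0.40 at the tenth, so a regressor built from Pt inherits a monotone trend with sample number unless that trend is modelled separately.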

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

    2. Author response:

      The following is the authors’ response to the current reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we will focus on addressing Reviewer 3’s concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results for subjects’ probability estimates (their subjective posterior probability of a switch) after controlling for the effect of time on the probability of a switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity (regions we found to correlate with Pt) with and without the intertemporal prior in the GLM. Third, because Reviewer 3 could not locate vmPFC and ventral striatum in the supplementary Tables of Activation, we will add slice-by-slice images of the whole-brain results on Pt to the Supplemental Information in addition to the Tables of Activation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experience. The authors discuss these differences comprehensively.

      The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we will focus on addressing your concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results for subjects’ probability estimates (Pt, their subjective posterior probability of a switch) after controlling for the effect of time on the probability of a switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity (regions we found to correlate with Pt) with and without the intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4).

      Finally, because you were not able to locate vmPFC and ventral striatum in the Tables of Activation, we will add slice-by-slice images of the whole-brain results on Pt to the supplement in addition to the Tables of Activation.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies.

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies.

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (e.g., having to enter probability estimates rather than make binary predictions).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. We do not disagree that there are alternative models that can describe over- and underreactions seen in the dataset. However, we do wish to point out that since we began with the normative Bayesian model, the natural progression in case the normative model fails to capture data is to modify the starting model. It is under this context that we developed the system-neglect model. It was a simple extension (a parameterized version) of the normative Bayesian model.

      Regarding the hyperprior idea, even if the participants have a hyperprior, there has to be some function that describes/implements attraction to the mean. Having a hyperprior itself does not imply attraction to this hyperprior. We therefore were not sure why the hyperprior itself can produce attraction to the mean.

      We looked further into the possibility of attraction to the mean. First, as suggested by the reviewer, we looked into another dataset with a different mean ground-truth value. In Massey and Wu (2005), the transition probabilities were [0.02 0.05 0.1 0.2], which differ from those in the current study [0.01 0.05 0.1], and over- and underreactions were found there as well. Second, we reason that for attraction to the mean to operate, subjects need to know the mean of the system parameters. This would take time to develop, because we did not tell subjects about the mean. If the biases were caused by attraction to the mean, subjects’ behavior should differ between the early stage of the experiment, when they had little idea about the mean, and the late stage, when they knew it. We will further analyze and compare participants’ data at the beginning of the experiment with data at the end of the experiment.

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      We thank the reviewer for pointing out these potential explanations. Again, we do not disagree that any model in which participants don’t fully use numerical information they were given would produce system neglect. It is hard to separate ‘not fully using numerical information’ from ‘lack of sensitivity to the numerical information’. We will respond in more detail to the four example reasons later.

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Again, we do not disagree with the reviewer on the modeling statement. However, we also wish to point out that our system-neglect model is a simple extension of the normative Bayesian model. Had we adopted a non-Bayesian framework, we would have faced the criticism of why we did not simply consider an extension of the starting model. In response, we will add a section to the Discussion summarizing our exchange on this matter.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt increase with sample number (since by the time of later samples there have been more opportunities for a regime shift)? To control for this, the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we will add a new figure, now Figure 4, showing the results for Pt and delta Pt from GLM-2, in which we added the intertemporal prior as a regressor to control for temporal confounds. We compared the Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2, and we will also show the effect of the intertemporal prior on vmPFC and ventral striatum under GLM-2.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen; how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were simply instructed to press buttons to enter two-digit numbers, would we observe the vmPFC, ventral striatum, and frontoparietal network effects that we found in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect probability estimates that the current regime is the blue regime, and therefore not be specific to change detection. In Experiment 2, we tested this idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1), except that no regime shifts were involved. In other words, the regions we identified might be generally associated with probability estimation rather than specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.
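      Why adding the intertemporal prior matters can be illustrated with a toy, noise-free regression. The sketch assumes a synthetic 'activity' trace that depends only on an intertemporal prior of the form 1 - (1 - q)^t, and a synthetic Pt equal to that prior plus an arbitrary evidence-driven deviation; both are invented for illustration and are not data from the study. Regressing on Pt alone (loosely analogous to GLM-1) yields a sizeable Pt weight, whereas adding the intertemporal prior as a covariate (loosely analogous to GLM-2) drives it to zero.

```python
import numpy as np

# Synthetic illustration only: ten draws in one block, q = 0.05.
t = np.arange(1, 11)
prior = 1.0 - (1.0 - 0.05) ** t   # signal-independent intertemporal prior
pt = prior + 0.1 * np.sin(t)      # hypothetical Pt: prior plus an evidence-driven deviation
activity = 2.0 * prior            # hypothetical signal driven purely by elapsed time

ones = np.ones_like(t, dtype=float)

# Pt as the only parametric regressor (analogous in spirit to GLM-1).
X1 = np.column_stack([ones, pt])
beta1, *_ = np.linalg.lstsq(X1, activity, rcond=None)

# Pt plus the intertemporal prior as a covariate (analogous in spirit to GLM-2).
X2 = np.column_stack([ones, pt, prior])
beta2, *_ = np.linalg.lstsq(X2, activity, rcond=None)

print("Pt weight without control:", round(float(beta1[1]), 3))
print("Pt weight with control:   ", round(float(beta2[1]), 3))
```

      The drop says nothing about the real data; it only shows that a Pt effect under a GLM-1-style model is compatible with a purely temporal signal, which is the confound the GLM-1 versus GLM-2 comparison is meant to test.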

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, Pt, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of Pt. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to the probability of regime change), the effect of Pt need not be specific to change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’s (1968) classic “bookbag-and-poker-chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of Pt can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as described on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
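      The action-handedness coding described above can be expressed as a small function. The digit-to-hand mapping below (1-5 on the left hand, 6-9 and 0 on the right) is our assumption, chosen only because it reproduces the three quoted examples ("23" gives -1, "75" gives 0, "90" gives 1); the actual response-device layout may differ, and the function name is hypothetical.

```python
# Assumed mapping of digits to hands (hypothetical; chosen to match "23", "75", "90").
LEFT_HAND_DIGITS = frozenset("12345")
RIGHT_HAND_DIGITS = frozenset("67890")


def action_handedness(entry: str) -> int:
    """Code a two-digit response as -1 (both left), 0 (one each) or 1 (both right)."""
    if len(entry) != 2 or any(ch not in LEFT_HAND_DIGITS | RIGHT_HAND_DIGITS for ch in entry):
        raise ValueError(f"expected a two-digit string, got {entry!r}")
    # +1 for each right-hand press, -1 for each left-hand press; halving maps -2/0/+2 to -1/0/1.
    total = sum(1 if ch in RIGHT_HAND_DIGITS else -1 for ch in entry)
    return total // 2


for response in ("23", "75", "90"):
    print(response, action_handedness(response))
```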

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of the Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environment and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices, in accordance with the computational roles they play in change detection. By introducing the framework of system neglect and providing evidence for its neural implementations, this study offers both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in GLM-2; this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for GLM-1, which has no effective control for the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that shows results from GLM-2. In this new figure, we show whole-brain results on Pt and delta Pt, as well as ROI results for vmPFC and ventral striatum on Pt, delta Pt, and the intertemporal prior.

      We kept the results from GLM-1 (Figure 3) primarily because we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all three experiments. In Experiments 1 and 2, Pt and delta Pt were, respectively, the probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt was simply the number subjects were instructed to enter, and delta Pt was the change in that number between successive periods.

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      The vmPFC and ventral striatum were part of the cluster labeled as Central Opercular cortex. In response, we will provide information about coordinates on the local maxima within the cluster. We will also add slice-by-slice images showing the effect of Pt.


      The following is the authors’ response to the original reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting distinct contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative task design, behavioral modeling, and model-based fMRI analyses provides a solid foundation for the conclusions; however, the neuroimaging results have several limitations, particularly a potential confound between the posterior probability of a switch and the passage of time that may not be fully controlled by including trial number as a regressor. The control experiments intended to address this issue also appear conceptually inconsistent. At the behavioral level, while informing participants of conditional probabilities rather than requiring learning is theoretically elegant, such information is difficult to apply accurately, as shown by well-documented challenges with conditional reasoning and base-rate neglect. Expressing these probabilities as natural frequencies rather than percentages may have improved comprehension. Overall, the study advances understanding of belief updating under uncertainty but would benefit from more intuitive probabilistic framing and stronger control of temporal confounds in future work.

      We thank the editors for the assessment and appreciate your efforts in reviewing the paper. The assessment adds several limitations based on comments from the new Reviewer 3 in this round, which we would like to clarify below.

      With regard to temporal confounds, we clarified in the main text and response to Reviewer 3 that we had already addressed the potential confound between posterior probability of a switch and passage of time in GLM-2 with the inclusion of intertemporal prior. After adding intertemporal prior in the GLM, we still observed the same fMRI results on probability estimates. In addition, we did two other robustness checks, which we mentioned in the manuscript.

      With regard to response mode (probability estimation rather than choice or indicating natural frequencies), we wish to point out that previous research by Massey and Wu (2005), on which the current study was based, already addressed the concern that participants show system-neglect tendencies because of the mode of information delivery, namely indicating beliefs by reporting probability estimates rather than through choice or another response mode. Massey and Wu (2005, Study 3) found the same biases when participants performed a choice task that did not require them to indicate probability estimates.

      With regard to the control experiments, they were in fact not intended to address the confound between posterior probability and the passage of time. Rather, they aimed to address whether the neural findings were unique to change detection (Experiment 2) and to address visual and motor confounds (Experiment 3). The rationale for and results of the control experiments are described on pages 18-19.

      We also wish to highlight that we performed detailed model comparisons following Reviewer 2’s suggestions. Although Reviewer 2 was unable to re-review the manuscript, we believe these comparisons provide insight into the literature on change detection. See “Incorporating signal dependency into system-neglect model led to better models for regime-shift detection” (pp. 27-30). The model comparison showed that system-neglect models that incorporate signal dependency describe participants’ probability estimates better than the original system-neglect model. This suggests that people respond differently to change-consistent and change-inconsistent signals when judging whether the regime has changed. This was not reported in previous behavioral studies and was largely inspired by the neural finding on signal dependency in the frontoparietal cortex. It indicates that neural findings can provide novel insights into computational modeling of behavior.

      To better highlight and summarize our key contributions, we added a paragraph at the beginning of Discussion:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”    

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      We thank the reviewer for the comments.

      Weaknesses:

      The authors have adequately addressed most of my prior concerns.

      We thank the reviewer for recognizing our effort in addressing your concerns.

      My only remaining comment concerns the z-test of the correlations. I agree with the non-parametric test based on bootstrapping at the subject level, providing evidence for significant differences in correlations within the left IFG and IPS.
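      As an illustration of the kind of subject-level bootstrap the reviewer refers to, the sketch below resamples subjects with replacement, keeps each subject's behavioral slope and two neural sensitivities together (so the dependency between the two correlations is preserved), and reports the fraction of resamples in which the first correlation does not exceed the second. It is a generic sketch under stated assumptions, not the authors' analysis code: the function names, the number of resamples, and the use of that fraction as a one-tailed p-value are illustrative choices.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # raises ZeroDivisionError if a variance is 0


def bootstrap_corr_difference(behavior, neural_1, neural_2, n_boot=10000, seed=0):
    """One-tailed bootstrap test of r(behavior, neural_1) > r(behavior, neural_2).

    Subjects (not observations) are resampled with replacement; each
    subject's three values stay together in every resample.
    Returns (observed difference, fraction of resampled differences <= 0).
    """
    rng = random.Random(seed)
    n = len(behavior)
    observed = pearson(behavior, neural_1) - pearson(behavior, neural_2)
    n_not_greater = 0
    n_valid = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b = [behavior[i] for i in idx]
        x1 = [neural_1[i] for i in idx]
        x2 = [neural_2[i] for i in idx]
        try:
            diff = pearson(b, x1) - pearson(b, x2)
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance; skip it
        n_valid += 1
        if diff <= 0:
            n_not_greater += 1
    return observed, n_not_greater / n_valid
```

Reporting the fraction of non-positive resampled differences is one common convention; percentile confidence intervals on the difference are an equivalent way to present the same resamples.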

      However, the parametric test seems inadequate to me. The equation presented is described as the Fisher z-test, but the numerator uses the raw correlation coefficients (r) rather than the Fisher-transformed values (z). To my understanding, the subtraction should involve the Fisher z-scores, not the raw correlations.

      More importantly, the Fisher z-test in its standard form assumes that the correlations come from independent samples, as reflected in the denominator (which uses the n of each independent sample). However, in my opinion, the two correlations are not independent but computed within-subject. In such cases, parametric tests should take into account the dependency. I believe one appropriate method for the current case (correlated correlation coefficients sharing a variable [behavioral slope]) is explained here:
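      For reference, the standard independent-samples Fisher z test that the reviewer describes subtracts the Fisher-transformed values (not the raw correlations) and uses each independent sample's own size in the standard error. The minimal sketch below (function names are illustrative) makes both points explicit; because the two correlations in this study share the same subjects, this form is shown only to contrast it with a dependent-correlation test.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transform: atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def independent_corr_z(r1, n1, r2, n2):
    """Fisher z test for two correlations from INDEPENDENT samples.

    The numerator uses the Fisher-transformed values; the standard error
    uses the size of each independent sample.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se


def one_tailed_p(z):
    """Upper-tail p-value of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Made-up example: r = 0.6 vs r = 0.2, 30 subjects in each independent group.
z = independent_corr_z(0.6, 30, 0.2, 30)
print(round(z, 3), round(one_tailed_p(z), 4))
```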

      Meng, X.-l., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111(1), 172-175. https://doi.org/10.1037/0033-2909.111.1.172

      It should be implemented here:

      Diedenhofen B, Musch J (2015) cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLoS ONE 10(4): e0121945. https://doi.org/10.1371/journal.pone.0121945

      My recommendation is to verify whether my assumptions hold, and if so, perform a test that takes correlated correlations into account. Or, to focus exclusively on the non-parametric test.

      In any case, I recommend a short discussion of these findings and how the authors interpret that some of the differences in correlations are not significant.

      Thank you for the careful check. Yes, this was indeed our mistake. We also agree that the two correlations are not independent. Therefore, we replaced the test with one that accounts for dependent correlations, following Meng et al. (1992) as suggested by the reviewer. We updated the Methods section on pp. 56-57:

      “In the parametric test, we adopted the approach of Meng et al. (1992) to statistically compare the two correlation coefficients. This approach specifically tests differences between dependent correlation coefficients according to the following equation

$$Z = \left(z_{r_1} - z_{r_2}\right)\sqrt{\frac{N-3}{2\left(1-r_x\right)h}}$$

      where N is the number of subjects, z<sub>ri</sub> is the Fisher z-transformed value of r<sub>i</sub> (r<sub>1</sub> = r<sub>blue</sub> and r<sub>2</sub> = r<sub>red</sub>), and r<sub>x</sub> is the correlation between the neural sensitivity at change-consistent signals and change-inconsistent signals. The computation of h is based on the following equations

$$h = \frac{1 - f\,\overline{r^2}}{1 - \overline{r^2}}, \qquad f = \frac{1 - r_x}{2\left(1 - \overline{r^2}\right)}$$

      where $\overline{r^2}$ is the mean of the $r_i^2$, and f should be set to 1 if f > 1.”
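      A minimal Python transcription of the Meng, Rosenthal and Rubin (1992) test might look like the following. It assumes the notation of the Methods passage (r<sub>1</sub>, r<sub>2</sub>, r<sub>x</sub>, N); the function names are hypothetical and the example inputs are made up, not the study's data. For published analyses, an established implementation such as the cocor package cited by the reviewer is preferable.

```python
import math


def meng_z(r1, r2, r_x, n):
    """Meng, Rosenthal & Rubin (1992) z test for two dependent correlations.

    r1, r2 : correlations of a shared variable (e.g. a behavioral slope)
             with two other variables measured on the same n subjects
    r_x    : correlation between those two other variables
    n      : number of subjects
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z-transform
    mean_r_sq = (r1 ** 2 + r2 ** 2) / 2.0    # mean of the squared r's
    f = min((1.0 - r_x) / (2.0 * (1.0 - mean_r_sq)), 1.0)  # f is capped at 1
    h = (1.0 - f * mean_r_sq) / (1.0 - mean_r_sq)
    return (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - r_x) * h))


def one_tailed_p(z):
    """Upper-tail p-value of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Made-up example: r1 = 0.5, r2 = 0.1, r_x = 0.3, N = 30 subjects.
z = meng_z(0.5, 0.1, 0.3, 30)
print(round(z, 3), round(one_tailed_p(z), 4))
```

Compared with the independent-samples form, the extra factor (1 - r<sub>x</sub>)h in the denominator is what accounts for the two correlations sharing subjects and a common variable.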

      We updated the Results section on p.29:

      “Since these correlation coefficients were not independent, we compared them using the test developed in Meng et al. (1992) (see Methods). Among the five ROIs in the frontoparietal network, the difference in correlation was significant for two of them, namely the left IFG and left IPS (one-tailed z test; left IFG: z = 1.8908, p = 0.0293; left IPS: z = 2.2584, p = 0.0049). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: z = 0.9522, p = 0.1705; right IFG: z = 0.9860, p = 0.1621; right IPS: z = 1.4833, p = 0.0690).”

      We added a discussion of these results on p.41:

      “Interestingly, such sensitivity to signal diagnosticity was only present in the frontoparietal network when participants encountered change-consistent signals. However, while most brain areas within this network responded in this fashion, only the left IPS and left IFG showed a significant difference in coding individual participants’ sensitivity to signal diagnosticity between change-consistent and change-inconsistent signals. Unlike the left IPS and left IFG, we observed in dmPFC a marginally significant correlation with behavioral sensitivity at change-inconsistent signals as well. Together, these results indicate that while different brain areas in the frontoparietal network responded similarly to change-consistent signals, there was a greater degree of heterogeneity in responding to change-inconsistent signals.”

      Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      We thank the reviewer for the overall descriptions of the manuscript.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Thank you for these assessments.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      We appreciate the reviewer’s concern on this issue. This concern was addressed in Massey and Wu (2005), where participants performed a choice task in which they were not asked to provide probability estimates (Study 3 in Massey and Wu, 2005). Instead, participants in Study 3 were asked to predict the color of the ball before seeing a signal, which is a more intuitive way of indicating one’s belief about a regime shift. The results from the choice task were identical to those found in the probability estimation task (Study 1 in Massey and Wu). We take this as evidence that the system-neglect behavior shown by participants was less likely to be due to the mode of information delivery.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. It is true that the system-neglect model is not entirely inconsistent with regression to the mean, regardless of whether the implementation has a hyper-prior or not. In fact, our behavioral measure of sensitivity to transition probability and signal diagnosticity, which we termed the behavioral slope, is based on linear regression analysis. In general, the modeling approach in this paper is to start from a generative model that defines ideal performance and to consider modifying the generative model when systematic deviations of actual performance from the ideal are observed. In this approach, a generative Bayesian model with hyper-priors would be more complex to begin with, and a regression-to-the-mean account by itself does not generate a priori predictions.
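      As the response acknowledges, a regression-slope measure is compatible with regression to the mean, and the reviewer's own numbers illustrate why. The sketch below computes the least-squares slope of the quoted subjective transition probabilities on the ground-truth values, and shows that a purely hypothetical hyper-prior that pulls each value linearly toward the grand mean with weight w yields a slope of exactly w. The function names are illustrative, and this does not reproduce how the manuscript computes its behavioral slope.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def shrink_toward_mean(values, weight):
    """Hypothetical hyper-prior: pull each value toward the grand mean.

    weight = 1 reproduces the true values; weight = 0 ignores them entirely.
    """
    grand_mean = sum(values) / len(values)
    return [weight * v + (1.0 - weight) * grand_mean for v in values]


true_q = [0.01, 0.05, 0.10]           # ground-truth transition probabilities
subjective_q = [0.037, 0.052, 0.069]  # subjective values quoted by the reviewer

print(round(ols_slope(true_q, subjective_q), 3))                     # 0.355
print(round(ols_slope(true_q, shrink_toward_mean(true_q, 0.4)), 3))  # 0.4
```

A slope below 1 therefore describes under-sensitivity without, on its own, identifying the mechanism that produced it.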

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Thank you for raising this point. The modeling principle we adopt is the following. We start from the normative model, the Bayesian model, which defines what normative behavior should look like. We compared participants’ behavior with the Bayesian model and found systematic deviations from it. To explain those systematic deviations, we considered modeling options within the confines of the same modeling framework. In other words, we considered a parameterized version of the Bayesian model, the system-neglect model, and examined through model comparison the best modeling choice. This modeling approach is not uncommon in economics and psychology. For example, Kahneman and Tversky adopted this approach when proposing prospect theory, a modification of expected utility theory, where expected utility theory can be seen as one specific model of how the utility of an option should be computed.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt always increase with sample number (as by the time of later samples, there have been more opportunities for a regime shift)? Unless this is completely linear, the effect won't be controlled by including trial number as a co-regressor (which was done).

      Thank you for raising this concern. Yes, the prior probability that a regime shift has occurred always increases with sample number regardless of evidence (seeing change-consistent or change-inconsistent signals). This is captured by the ‘intertemporal prior’ in the Bayesian model, which we included as a regressor in our GLM analysis (GLM-2), in addition to Pt. In short, GLM-1 had Pt and sample number. GLM-2 had Pt, intertemporal prior, and sample number, among other regressors. We found that, in both GLM-1 and GLM-2, both vmPFC and ventral striatum correlated with Pt.
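      To make the distinction between the time-driven prior and the evidence-driven posterior concrete, the sketch below implements a normative Bayesian filter for this kind of task under explicit assumptions: the sequence starts in the red regime, switches to blue at most once (permanently) with per-period probability q, and each ball matches the current regime's majority color with probability d. Under these assumptions the prior component grows every period, while a red (change-inconsistent) ball can still lower the posterior P<sub>t</sub>; with uninformative signals (d = 0.5), P<sub>t</sub> reduces to 1 - (1 - q)<sup>t</sup>. The function name and parameter values are illustrative, not the manuscript's code.

```python
def posterior_blue(signals, q, d):
    """Normative probability that the current regime is the blue regime.

    Illustrative assumptions (not taken from the manuscript's code):
      * the sequence starts in the red regime and can switch to blue at
        most once; the switch is permanent;
      * before each draw the regime switches with probability q;
      * d is the proportion of majority-color balls in each urn.
    signals: string of 'b' (blue, change-consistent) and 'r' (red) draws.
    Returns [P_1, ..., P_T].
    """
    p = 0.0  # probability of the blue regime before any draw
    history = []
    for s in signals:
        prior = p + (1.0 - p) * q             # a switch may occur before this draw
        like_blue = d if s == "b" else 1.0 - d  # P(signal | blue regime)
        like_red = 1.0 - d if s == "b" else d   # P(signal | red regime)
        p = prior * like_blue / (prior * like_blue + (1.0 - prior) * like_red)
        history.append(p)
    return history


if __name__ == "__main__":
    for t, p_t in enumerate(posterior_blue("rrbbrbbbbb", q=0.05, d=0.7), start=1):
        print(t, f"{p_t:.3f}")
```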

      To make this clearer, we updated the main text to further clarify this on p.18:

      “We examined the robustness of P<sub>t</sub> representations in these two regions in several follow-up analyses. First, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors (Fig. S7 in SI). Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, $\ln\left[\frac{1-(1-q)^{t}}{(1-q)^{t}}\right]$, where q is transition probability and t = 1,…,10 is the period (see Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> allowed us to test whether any effect of P<sub>t</sub> could instead be attributed to the intertemporal prior. Second, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln (P<sub>t</sub>/(1-P<sub>t</sub>)) (Fig. S8 in SI). Third, we implemented a GLM that examined P<sub>t</sub> separately for periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S9 in SI). Each of these analyses showed the same pattern of correlations between P<sub>t</sub> and activation in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.”
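      Assuming the intertemporal prior takes the log-odds form ln[(1 - (1 - q)<sup>t</sup>)/(1 - q)<sup>t</sup>], that is, the prior log odds that at least one shift has occurred by period t, a short sketch shows why it matters for the reviewer's linearity concern: it rises every period, but by shrinking increments, so a linear trial-number regressor alone would not absorb it. The function name is hypothetical.

```python
import math


def intertemporal_prior(q, t):
    """Prior log odds that at least one regime shift has occurred by period t.

    Assumes a per-period transition probability q, so that
    P(shift by t) = 1 - (1 - q) ** t.
    """
    stay = (1.0 - q) ** t  # probability of no shift in t periods
    return math.log((1.0 - stay) / stay)


for q in (0.01, 0.05, 0.10):
    values = [intertemporal_prior(q, t) for t in range(1, 11)]
    steps = [b - a for a, b in zip(values, values[1:])]
    print(q, [round(v, 2) for v in values])
    print("   increments:", [round(s, 2) for s in steps])  # shrink as t grows
```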

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were simply instructed to press buttons to enter two-digit numbers, would we observe activity in vmPFC, ventral striatum, and the frontoparietal network similar to what we observed in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect probability estimates that the current regime is the blue regime, and therefore not be specific to change detection. In Experiment 2, we tested whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1), except that no regime shifts were involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was the case.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to the probability of regime change), the effect of P<sub>t</sub> might reflect probability estimation in general rather than change detection in particular. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’s (1968) classic “bookbag-and-poker-chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1 we implemented an “action-handedness” regressor to directly address the motor-confound issue, namely that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
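      The action-handedness coding can be sketched as a small function. The digit-to-hand mapping (digits 1 to 5 on the left hand; 6 to 9 and 0 on the right) is an assumption inferred only from the three examples in the text (“23”, “75”, “90”); the actual keypad layout is not specified here and may differ.

```python
# Hypothetical key-to-hand mapping inferred from the examples "23" (both left),
# "75" (one each) and "90" (both right). The real button layout may differ.
LEFT_HAND_DIGITS = set("12345")


def action_handedness(estimate: str) -> int:
    """Code a two-digit probability estimate as -1, 0 or +1."""
    if len(estimate) != 2 or not estimate.isdigit():
        raise ValueError("expected a two-digit string such as '23'")
    right_presses = sum(1 for ch in estimate if ch not in LEFT_HAND_DIGITS)
    return right_presses - 1  # 0 right presses -> -1, 1 -> 0, 2 -> +1


print([action_handedness(e) for e in ("23", "75", "90")])  # [-1, 0, 1]
```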

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of the Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Many of the figures are too tiny - the writing is very small, as are the pictures of brains. I'd suggest adjusting these so they will be readable without enlarging.

      Thank you. We apologize for the poor readability of the figures. We have enlarged the figures (Fig. 5 in particular) and increased their font size to make them more readable.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      In our manuscript, we describe a role for the nuclear mRNA export factor UAP56 (a helicase) during metamorphic dendrite and presynapse pruning in flies. We characterize a UAP56 ATPase mutant and find that it rescues the pruning defects of a uap56 mutant. We identify the actin severing enzyme Mical as a potentially crucial UAP56 mRNA target during dendrite pruning and show alterations at both the mRNA and protein level. Finally, loss of UAP56 also causes presynapse pruning defects with actin abnormalities. Indeed, the actin disassembly factor cofilin is required for pruning specifically at the presynapse.

      We thank the reviewers for their constructive comments, which we have tried to address experimentally as much as possible. To summarize briefly, while all reviewers saw the results as interesting (e.g., Reviewer 3's significance assessment: "Understanding how post-transcriptional events are linked to key functions in neurons is important and would be of interest to a broad audience") and generally methodologically strong, they thought that our conclusions regarding the potential specificity of UAP56 for Mical mRNA were not fully supported by the data. To address this criticism, we added more RNAi analyses of other mRNA export factors and rephrased our conclusions towards a more careful interpretation, i.e., we now state that the pruning process is particularly sensitive to loss of UAP56. In addition, Reviewer 1 had technical comments regarding some of our protein and mRNA analyses. We added more explanations and an additional control for the MS2/MCP system. Reviewers 2 and 3 wanted to see a deeper characterization of the ATPase mutant. We generated an additional UAP56 mutant transgene, improved our analyses of UAP56 localization, and added a biochemical control experiment. We hope that our revisions make our manuscript suitable for publication.

      1. Point-by-point description of the revisions


      Comments by reviewer 1.

      Major comments

      1.

      For Figure 4, the MS2/MCP system is not quantitative. Using this technique, it is impossible to determine how many RNAs are located in each "dot". Each of these dots looks quite large and likely corresponds to some phase-separated RNP complex where multiple RNAs are stored and/or transported. Thus, these data do not support the conclusion that Mical mRNA levels are reduced upon UAP56 knockdown. A good quantitative microscopic assay would be something like smFISH. Additionally, the localization of Mical mRNA dots to dendrites is not convincing, as the dots appear in regions with dendritic swellings, where the background is generally brighter.

      Our response

      We indeed found evidence in the literature that mRNPs labeled with the MS2/MCP or similar systems form condensates (Smith et al., JCB 2015). Unfortunately, smFISH is not established for this developmental stage and would likely be difficult due to the presence of the pupal case. To address whether the Mical mRNPs in control and UAP56 KD neurons are comparable, we characterized the MCP dots in the respective neurons in more detail and found that their sizes did not differ significantly between control and UAP56 KD neurons. To facilitate interpretability, we also increased the individual panel sizes and included larger panels that show only the red (MCP::RFP) channel. We think these changes improved the figure. Thanks for the insight.

      Changes introduced: Figure 5 (former Fig. 4): Increased panel size for MCP::RFP images, left out GFP marker for better visibility. Added new analysis of MCP::RFP dot size (new Fig. 5 I).

      2.

      Alternatively, levels of Mical mRNA could be verified by qPCR in the larval brain following pan-neuronal UAP56 knockdown or in FACS-sorted fluorescently labeled da sensory neurons. Protein levels could be analyzed using a similar approach.

      Our response

      We thank the reviewer for this comment. Unfortunately, these experiments are not doable as neuron-wide UAP56 KD is lethal (see Flybase entry for UAP56). From our own experience, FACS-sorting of c4da neurons would be extremely difficult as the GFP marker fluorescence intensity of UAP56 KD neurons is weak - this would likely result in preferential sorting of subsets of neurons with weaker RNAi effects. In addition, FACS-sorting whole neurons would not discriminate between nuclear and cytoplasmic mRNA.

      The established way of measuring protein content in the Drosophila PNS system is immunofluorescence with strong internal controls. In our case, we also measured Mical fluorescence intensity of neighboring c1da neurons that do not express the RNAi and show expression levels as relative intensities compared to these internal controls. This procedure rules out the influence of staining variation between samples and is used by other labs as well.
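      The logic of this internal control can be sketched in a few lines, assuming that sample-to-sample staining variation acts as a multiplicative factor shared by neighboring neurons in the same preparation. Under that assumption the factor cancels in the c4da/c1da ratio; additive background differences would not cancel, which is why intensities would need background subtraction first. The function name and numbers are made up for illustration.

```python
def relative_intensity(c4da, c1da):
    """Background-subtracted Mical intensity of an RNAi-expressing c4da neuron
    divided by that of the neighboring, non-RNAi c1da neuron in the same sample."""
    if c1da <= 0:
        raise ValueError("internal-control intensity must be positive")
    return c4da / c1da


# Made-up numbers: the same pair of neurons with weaker and 1.5x stronger staining.
weak = relative_intensity(40.0, 80.0)
strong = relative_intensity(40.0 * 1.5, 80.0 * 1.5)
print(weak, strong)  # 0.5 0.5: a sample-wide multiplicative factor cancels
```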

      3.

      In Figure 5, the authors state that Mical expression could not be detected at 0 h APF. The data presented in Fig. 5C, D suggest the opposite as there clearly is some expression. Moreover, the data shown in Fig. 5D looks significantly brighter than the Orco dsRNA control and appears to localize to some type of cytoplasmic granule. So the expression of Mical does not look normal.

      Our response

      We thank the reviewer for this comment. In the original image in Fig. 5 C, the c4da neuron overlaps with the dendrite from a neighboring PNS neuron (likely c2da or c3da). The latter neuron shows strong Mical staining. We agree that this image is confusing and exchanged this image for another one from the same genotype.

      Changes introduced: Figure 5 L (former Fig. 5 C): Exchanged panel for image without overlap from other neuron.

      4.

      Sufficient data are not presented to conclude any specificity in mRNA export pathways. Data is presented for one export protein (UAP56) and one putative target (Mical). To adequately assess this, the authors would need to do RNA-seq in UAP56 mutants.

      Our response

      We thank the reviewer for this comment. To address this, we tested whether knockdown of three other mRNA export factors (NXF1, THO2, THOC5) causes dendrite pruning defects, which was not the case (new Fig. S1). While these data are consistent with specific mRNA export pathways, we agree that they are not proof. We therefore toned down our interpretation and removed the conclusion about specificity. Instead, we now use the more neutral term "increased sensitivity (to loss of UAP56)".

      Changes introduced: Added new Figure S1: RNAi analyses of NXF1, THO2 and THOC5 in dendrite pruning. Introduced concluding sentence at the end of first Results paragraph: We conclude that c4da neuron dendrite pruning is particularly sensitive to loss of UAP56. (p. 6)

      5.

      In summary, better quantitative assays should be used in Figures 4 and 5 in order to draw conclusions about the expression levels of either mRNA or protein. In its current form, this study demonstrates the novel finding that UAP56 regulates dendrite and presynaptic pruning, potentially via regulation of the actin cytoskeleton. However, these data do not convincingly demonstrate that UAP56 controls these processes by regulating Mical expression, and definitely not by controlling export from the nucleus.

      Our response

      We hope that the changes we introduced above help clarify this.

      1.

      While there are clearly dendrites shown in Fig. 1C', the cell body is not readily identifiable. This makes it difficult to assess attachment and suggests that the neuron may be dying. This should be replaced with an image that shows the soma.

      Our response

      We thank the reviewer for this comment. Changes introduced: we replaced the picture in the panel with one where the cell body is more clearly visible.

      1.

      The level of knockdown in the UAP56 RNAi and P element insertion lines should be determined. It would be useful to mention the nature of the RNAi lines (long/short hairpin). Some must be long hairpins, since Dcr has been co-expressed. This also raises the potential for off-target effects. shRNAi lines would be preferable because these effects are minimized.

      Our response

      We thank the reviewer for this comment. Assessment of knockdown efficiency is a control to make sure the manipulations work as intended. As mRNA isolation from Drosophila PNS neurons is extremely difficult, RNAi or mutant phenotypes in this system are controlled by performing several independent manipulations of the same gene. In our case, we used two independent RNAi lines (both long hairpins, from VDRC and Bloomington, plus an additional insertion of the VDRC line, see Table S1) as well as a mutant P element in a MARCM experiment, i.e., a total of three independent manipulations that all cause pruning defects. The VDRC RNAi lines have no predicted off-targets (this information is not available for the Bloomington line). Had any of these manipulations not matched, we would have generated sgRNA lines for CRISPR to confirm.

      Minor comments:

      1.

      The authors should explain what EB1:GFP is marking when introduced in the text.


      Our response

      We thank the reviewer for this comment. Changes introduced: we now explain the EB1::GFP assay in the corresponding Results section (p.8).

      1.

      The da neuron images throughout the figures could be a bit larger.

      Our response

      We thank the reviewer for this comment. Changes introduced: we reorganized the figures to allow larger panels:

      • the pruning analysis of the ATPase mutations (formerly Fig. 2) is now its own figure (Figure 3).

      • we increased the panel sizes of the MCP::RFP images (Figure 5 A - I, formerly Fig. 4).

      Reviewer #1 (Significance (Required)):

      Strengths:

      The methodology used to assess dendrite and presynaptic pruning is strong and the phenotypic analysis is conclusive.

      Our response

      We thank the reviewer for this comment.

      Weakness:

      The evidence demonstrating that UAP56 regulates the expression of Mical is unconvincing. Similarly, no data is presented to show that there is any specificity in mRNA export pathways. Thus, these major conclusions are not adequately supported by the data.

      Our response

      We hope the introduced changes address this comment.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this paper, the authors describe dendrite pruning defects in c4da neurons in mutants of the DExD-box ATPase UAP56 or upon its neuronal RNAi knockdown. Overexpression of UAP56::GFP or of UAP56::GFPE194Q, which lacks ATPase activity, can rescue the dendrite pruning defects in the UAP56 mutant. They further characterized the mislocalization of UAP56::GFPE194Q and its binding to nuclear export complexes. Both microtubules and the ubiquitin-proteasome system are intact in UAP56RNAi neurons. However, using the MS2-MCP system, they suggest a specific effect on MICAL mRNA nuclear export, resulting in delayed MICAL protein expression in pruned neurons. Furthermore, the authors show that UAP56 is also involved in presynaptic pruning of c4da neurons in the VNC and that Mical and cofilin are also required for actin disassembly in presynapses. They propose that UAP56 is required for dendrite and synapse pruning through actin regulation in Drosophila. My comments follow.

      Major comments

      1.

      The result that UAP56::GFPE194Q rescues the mutant phenotype while the protein is largely mislocalized suggests either a novel mechanism or, as the authors suggest, rescue through a combination of residual activities. The latter possibility requires further support, which is important to support the role of mRNA export in dendrite and presynapse pruning. One approach would be to examine whether other export components such as REF1 and NXF1 show similar mutant phenotypes. Alternatively, depleting residual activity, for example by using null mutant alleles or combining more copies of RNAi transgenes, could help.

      Our response

      We thank the reviewer for this comment. We agree that the mislocalization phenotype is interesting and could inform further studies on the mechanism of UAP56. To investigate this further and to exclude that it represents a gain-of-function caused by the introduced mutation, we made and characterized a new transgene, UAP56::GFP E194A. This mutant shows largely the same phenotypes as E194Q, with enhanced interactions with Ref1 and partial mislocalization to the cytoplasm. In addition, we tested whether knockdown of THO2, THOC5 or NXF1 causes pruning defects (it does not).

      Changes introduced:

      • added new Figure S1: RNAi analyses of NXF1, THO2 and THOC5 in dendrite pruning.

      • made and characterized a new transgene UAP56 E194A (new Fig. 2 B, E, E', 3 C, C', E, F).

      1.

      The localization of UAP56::GFP (and E194Q) should be analyzed in more detail. It is not clear whether the images in Fig. 2A and 2B are single confocal sections or projections of multiple sections. If the images are stacked, the localization of UAP56::GFP to the nuclear periphery is unclear, as is the presence of the E194Q derivative in both nucleus and cytosol (or whether some peripheral enrichment remains).

      Our response

      We thank the reviewer for this comment. It is correct that the profiles in the old Figure 2 were from single confocal sections from the displayed images. As it was difficult to create good average profiles with data from multiple neurons, we now introduce an alternative quantification based on categories (nuclear versus dispersed) which includes data from several neurons for each genotype, including the new E194A transgene (new Fig 3 G). Upon further inspection, the increase at the nuclear periphery was not always visible and may have been a misinterpretation. We therefore removed this statement.

      Changes introduced:

      • added new quantitative analysis of UAP56 wt and E/A, E/Q mutant localization (new Fig 3 G).

      1.

      The Ub-VV-GFP is a new reagent that reports active proteasomal degradation through the absence of GFP signal, which could also be due to a lack of expression. The use of Ub-QQ-GFP cannot confirm the expression of Ub-VV-GFP. The proteasomal subunit RPN7 has been shown to be a prominent component of the dendrite pruning pathway (Development 149, dev200536). Immunostaining with RPN7 antibodies to measure RPN7 expression levels could be a direct way to address whether the proteasomal pathway is affected.

      Our response

      We thank the reviewer for this comment. We agree that it is wise to not only introduce a positive control for the Ub-VV-GFP sensor (the VCP dominant-negative VCP QQ), but also an independent control. As mutants with defects in proteasomal degradation accumulate ubiquitinated proteins (see, e. g., Rumpf et al., Development 2011), we stained controls and UAP56 KD neurons with antibodies against ubiquitin and found that they had similar levels (new Fig. S3).

      Changes introduced:

      • added new ubiquitin immunofluorescence analysis (new Fig. S3).

      1.

      Using the MS2/MCP system to detect the export of MICAL mRNA is a nice approach to confirm UAP56 activity; loss of UAP56 by RNAi knockdown delays the nuclear export of MS2-MICAL mRNA. As the authors note, the rescue experiment with UAS transgenes could not be performed because of UAS gene dosage. However, this MS2-MICAL system is also a good assay for the requirement of UAP56 ATPase activity (absent in the E194Q mutant) in this process. Could the authors use MARCM (thus avoiding the UAS-RNAi transgene) for the rescue experiment? Also, the c4da neuronal marker UAS-CD8-GFP used in Fig. 4 could be replaced by a marker gene directly fused to the ppk promoter, which would save a copy of a UAS transgene. The results of the rescue experiment would test whether nuclear export of MICAL mRNA depends on ATPase activity.

      Our response

      We thank the reviewer for this comment. This is a great idea but unfortunately, this experiment was not feasible due to the (rare) constraints of Drosophila genetics. The MARCM system with rescue already occupies all available chromosomes (X: FLPase, 2nd: FRT, GAL80 + mutant, 3rd: GAL4 + rescue construct), and we would have needed to introduce three additional ones (MCP::RFP and two copies of unmarked genomic MICAL-MS2, all on the third chromosome) that would have needed to be introduced by recombination. Any Drosophilist will see that this is an extreme, likely undoable project :-(

      1.

      UAP56 is also involved in presynaptic pruning through regulating actin assembly, and the authors suggest that Mical and cofilin are involved in the process. However, direct observation of lifeact::GFP upon Mical or cofilin RNAi knockdown is important to support this conclusion.

      Our response

      We thank the reviewer for this comment. In response, we analyzed the lifeact::GFP patterns of control and cofilin knockdown neurons and found that loss of cofilin also leads to actin accumulation (new Fig. 7 I, J).

      Changes introduced:

      • new lifeact analysis (new Fig. 7 I, J).

      Minor comments:

      1.

      RNA localization is important for dendrite development in larval stages (Brechbiel JL, Gavis ER. Curr Biol. 2008;18(10):745-750). Yet, the role of UAP56 is relatively specific and shown only in later-stage pruning. This warrants thorough discussion.

      Our response

      We thank reviewer 2 for this comment. We added the following paragraph to the discussion: "UAP56 has also been shown to affect cytoplasmic mRNA localization in Drosophila oocytes (Meignin and Davis, 2008), opening up the possibility that nuclear mRNA export and cytoplasmic transport are linked. It remains to be seen whether this also applies to dendritic mRNA transport (Brechbiel and Gavis, 2008)." (p.13)

      1.

      Could the authors elaborate on the possible upstream regulators that might be involved, as described in "alternatively, several cofilin upstream regulators have been described (Rust, 2015) which might also be involved in presynapse pruning and subject to UAP56 regulation" in the Discussion?

      Our response

      We thank reviewer 2 for this comment. In the corresponding paragraph, we now cite, as examples, the Slingshot phosphatases and LIM kinase, which regulate cofilin (p.14).

      1.

      In the Discussion, the role of cofilin in pre- and postsynaptic processes is described. The role of Tsr/cofilin in regulating actin dynamics during dendrite branching, which has been described in c3da and c4da neurons (Nithianandam and Chien, 2018, and other references), should also be included in the Discussion.

      Our response

      We thank reviewer 2 for this comment. In response we tested whether cofilin is required for dendrite pruning and found that this, in contrast to Mical, is not the case (new Fig. S6). We cite the above paper in the corresponding results section (p.12).

      Changes introduced:

      • new cofilin dendrite pruning analysis (new Fig. S6).

      • added cofilin reference in Results.

      1.

      In the Discussion, the authors speculate that distinct actin structures have to be disassembled in dendrite and presynapse pruning. The possible actin structures at each site could be elaborated.

      Our response

      We thank reviewer 2 for this comment. In response, we specify in the Discussion: "As Mical is more effective in disassembling bundled F-actin than cofilin (Rajan et al., 2023), it is interesting to speculate that such bundles are more prevalent in dendrites than at presynapses." (p.14)

      Reviewer #2 (Significance (Required)):

      The study initiated a genetic screen for factors involved in dendrite pruning and reveals that nuclear mRNA export is an important event in this process. The authors further identified the mRNA of the actin disassembly factor MICAL as a candidate substrate of the export process. This is consistent with the previous finding that MICAL has to be transcribed and translated when pruning is initiated. As the presynapses of the model c4da neuron used in this study are also pruned, the dependence on nuclear export and local actin remodeling was also shown there. Thus, this study adds another layer of regulation (nuclear mRNA export) to c4da neuron pruning, which would be important for readers interested in neuronal pruning. The study is limited by the unclear result regarding whether the ATPase activity of the export factor is required.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In the manuscript by Frommeyer, Gigengack et al. entitled "The UAP56 mRNA Export Factor is Required for Dendrite and Synapse Pruning via Actin Regulation in Drosophila" the authors surveyed a number of RNA export/processing factors to identify any required for efficient dendrite and/or synapse pruning. They describe a requirement for a general poly(A) RNA export factor, UAP56, which functions as an RNA helicase. They also study links to aspects of actin regulation.

      Overall, while the results are interesting and the impact of loss of UAP56 on the pruning is intriguing, some of the data are overinterpreted as presented. The argument that UAP56 may be specific for the MICAL RNA is not sufficiently supported by the data presented. The two stories about poly(A) RNA export/processing and the actin regulation seem to not quite be connected by the data presented. The events are rather distal within the cell, making connecting the nuclear events with RNA to events at the dendrites/synapse challenging.

      Our response

      We thank reviewer 3 for this comment. To address this, we tested whether knockdown of three other mRNA export factors (NXF1, THO2, THOC5) causes dendrite pruning defects, which was not the case (new Fig. S1). While these data are consistent with specific mRNA export pathways, we agree that they are not proof. We therefore toned down our interpretation and removed the conclusion about specificity. Instead, we now use the more neutral term "increased sensitivity (to loss of UAP56)".

      We agree that it is a little hard to tie cofilin to UAP56, as we currently have no evidence that cofilin levels are affected by loss of UAP56, even though both seem to affect lifeact::GFP in a similar way (new Fig. 7 I, J). However, a dysregulation of cofilin can also occur through dysregulation of upstream cofilin regulators such as Slingshot and LIM kinase, making such a relationship possible.

      Changes introduced:

      • added new Figure S1: RNAi analyses of NXF1, THO2 and THOC5 in dendrite pruning.

      • introduced concluding sentence at the end of first Results paragraph: "We conclude that c4da neuron dendrite pruning is particularly sensitive to loss of UAP56." (p. 6)

      • added new lifeact::GFP analysis of cofilin KD (new Fig. 7 I, J).

      • identified other potential targets from the literature in the Discussion (Slingshot phosphatases and LIM kinase, p.14).

      There are a number of specific statements that are not supported by references. See, for example, these sentences within the Introduction- "Dysregulation of pruning pathways has been linked to various neurological disorders such as autism spectrum disorders and schizophrenia. The cell biological mechanisms underlying pruning can be studied in Drosophila." The Drosophila sentence is followed by some specific examples that do include references. The authors also provide no reference to support the variant that they create in UAP56 (E194Q) and whether this is a previously characterized fly variant or based on an orthologous protein in a different system. If so, has the surprising mis-localization been reported in another system?

      Our response

      We thank reviewer 3 for this comment. We added the following references on pruning and disease:

      1) Howes, O.D., Onwordi, E.C., 2023. The synaptic hypothesis of schizophrenia version III: a master mechanism. Mol. Psychiatry 28, 1843-1856.

      2) Tang, G., et al., 2014. Loss of mTOR-dependent macroautophagy causes autistic-like synaptic pruning deficits. Neuron 83, 1131-43.

      To better introduce the E194 mutations, we explain the position of the DECD motif in the Walker B domain, give the corresponding residues in the human and yeast homologues and cite papers demonstrating the importance of this residue for ATPase activity:

      3) Saguez, C., et al., 2013. Mutational analysis of the yeast RNA helicase Sub2p reveals conserved domains required for growth, mRNA export, and genomic stability. RNA 19:1363-71.

      4) Shen, J., et al., 2007. Biochemical Characterization of the ATPase and Helicase Activity of UAP56, an Essential Pre-mRNA Splicing and mRNA Export Factor. J. Biol. Chem. 282, 22544-22550.

      We are not aware of other studies looking at the relationship between the UAP56 ATPase and its localization. Thank you for pointing this out!

      Specific Comments:

      Specific Comment 1: Figure 1 shows the impact of loss of UAP56 on neuron dendrite pruning. The experiment employs both two distinct dsRNAs and a MARCM clone, providing confidence that there is a defect in pruning upon loss of UAP56. As the authors mention screening against 92 genes that caused splicing defects in S2 cells, inclusion of some examples of these genes that do not show such a defect would enhance the argument for specificity with regard to the role of UAP56. This control would be in addition to the more technical control that is shown, the mCherry dsRNA.

      Our response

      We thank reviewer 3 for this comment. To address this, we included the full list of screened genes with their phenotypic categorization regarding pruning (103 RNAi lines targeting 64 genes) as Table S1. In addition, we tested four RNAi lines targeting the nuclear mRNA export factors NXF1, THO2 and THOC5, which do not cause dendrite pruning defects (Fig. S1).

      Changes introduced:

      • added RNAi screen results as a list in Table S1.

      • added new Figure S1: RNAi analyses of NXF1, THO2 and THOC5 in dendrite pruning.

      Specific Comment 2: Later the authors demonstrate a delay in the accumulation of the Mical protein, so if they assayed these pruning events at later times, would the loss of UAP56 cause a delay in these events as well? Such a correlation would enhance the causality argument the authors make for Mical levels and these pruning events.

      Our response

      We thank reviewer 3 for this comment. Unfortunately, this is somewhat difficult to assess, as shortly after the 18 h APF timepoint, the epidermal cells that form the attachment substrate for c4da neuron dendrites undergo apoptosis. Where assessed (e.g., Wang et al., 2017, Development 144: 1851–1862), this process, together with the reduced GAL4 activity of our ppk-GAL4 during the pupal stage (our own observations), eventually leads to pruning, but the causality cannot be easily attributed anymore. We therefore use the 18 h APF timepoint essentially as an endpoint assay.

      Specific Comment 3: Figure 2 provides data designed to test the requirement for the ATPase/helicase activity of UAP56 in these pruning events. The first observation, which is surprising, is the mislocalization of the variant (E194Q) that the authors generate. The data do not indicate how many cells the results represent, as only a single image and trace are shown for the UAP56::GFP wild-type control and the E194Q variant.

      Our response

      We thank reviewer 3 for this comment. It is correct that the traces shown are from single confocal sections. To better display the phenotypic penetrance, we now added a categorical analysis that shows that the UAP56 E194Q mutant is completely mislocalized in the majority of cells assessed (and the newly added E194A mutant in a subset of cells).

      Changes introduced:

      • added categorical quantification of UAP56 variant localization (new Fig. 2 G).

      Specific Comment 4: Given the rather surprising finding that the ATPase activity is not required for the function of UAP56 characterized here, the authors do not provide sufficient references or rationale to support the ATPase mutant that they generate. The E194Q likely lies in the Walker B motif and is equivalent to human E218Q, which can prevent proper ATP hydrolysis in the yeast Sub2 protein. There is no reference to support the nature of the variant created here.

      Our response

      We thank reviewer 3 for this comment. To better introduce the E194 mutations, we explain the position of the DECD motif in the Walker B domain, give the corresponding residues in the human and yeast homologues (Sub2) and cite papers demonstrating the importance of this residue for ATPase activity:

      1) Saguez, C., et al., 2013. Mutational analysis of the yeast RNA helicase Sub2p reveals conserved domains required for growth, mRNA export, and genomic stability. RNA 19:1363-71.

      2) Shen, J., et al., 2007. Biochemical Characterization of the ATPase and Helicase Activity of UAP56, an Essential Pre-mRNA Splicing and mRNA Export Factor. J. Biol. Chem. 282, 22544-22550.

      Specific Comment 5: Given the surprising results, the authors could have included additional variants to ensure the change has the biochemical effect that the authors claim. Previous studies have defined missense mutations in the ATP-binding site. K129A (lysine to alanine): this mutation, in both yeast Sub2 and human UAP56, targets a conserved lysine residue that is critical for ATP binding; it prevents proper ATP binding and consequently impairs helicase function. There are also missense mutations in the DEAD-box motif (Asp-Glu-Ala-Asp) involved in ATP binding and hydrolysis. Mutations in this motif, such as D287A in yeast Sub2 (corresponding to D290A in human UAP56), can severely disrupt ATP hydrolysis, impairing helicase activity. In addition, mutations in the Walker A (GXXXXGKT) and Walker B motifs can impair ATP binding and hydrolysis in DEAD-box helicases. Missense mutations in these motifs, like G137A (in the Walker A motif), can block ATP binding, while E218Q (in the Walker B motif), which seems to be the basis for the variant employed here, can prevent proper ATP hydrolysis.

      Our response

      We thank reviewer 3 for this comment. Our cursory survey of the literature suggested that mutations in the Walker B motif are the most specific, as they still preserve ATP binding, and their effects have not been well characterized overall. In addition, these mutations can create strong dominant-negatives in related helicases (e.g., Rode et al., 2018 Cell Reports, our lab). To better characterize the role of the Walker B motif in UAP56, we generated and characterized an alternative mutant, UAP56 E194A. While the E194A variant does not show the same penetrance of localization phenotypes as E194Q, it is also partially mislocalized, shows stronger binding to Ref1 and also rescues the uap56 mutant phenotypes without an obvious dominant-negative effect, thus confirming our conclusions regarding E194Q.

      Changes introduced:

      • added biochemical, localization and phenotypic analysis of the newly generated UAP56 E194A variant (new Figs. 2 B, 2 E, E', 3 C, C'); added categorical quantification of UAP56 variant localization (new Fig. 2 G).

      Specific Comment 6: The co-IP results shown in Figure 2C would also seem to have multiple potential interpretations beyond what the authors suggest, an inability to disassemble a complex. The change in protein localization with the E194Q variant could impact the interacting proteins. There is no negative control to show that the UAP56-E194Q variant is not just associated with many, many proteins. Another myc-tagged protein that does not interact would be an ideal control.

      Our response

      We thank reviewer 3 for this comment. To address this comment, we tried to co-IP UAP56 wt or UAP56 E194Q with a THO complex subunit THOC7 (new Fig. S2). The results show that neither UAP56 variant can co-IP THOC7 under our conditions (likely because the UAP56/THO complex intermediate during mRNA export is disassembled in an ATPase-independent manner (Hohmann et al., Nature 2025)).

      Changes introduced:

      • added co-IP experiment between UAP56 variants and THOC7 (new Fig. S2).

      Specific Comment 7: With regard to Figure 3, the authors never define EB1::GFP in the text of the Results, so a reader unfamiliar with this system has no idea what they are seeing. Reading the Materials and Methods does not mitigate this concern as there is only a brief reference to a fly line and how the EB1::GFP is visualized by microscopy. This makes interpretation of the data presented in Figure 3A-C very challenging.

      Our response

      We thank reviewer 3 for pointing this out. We added a description of the EB1::GFP analysis in the corresponding Results section (p.8).

      Specific Comment 8: The data shown for MICAL MS2 reporter localization in Figure 4 are nice, but are also fully expected based on many former studies analyzing loss of UAP56 or UAP56 hypomorphs in different systems. While creating the reporter is admirable, to argue that MICAL localization is in some way preferentially impacted by loss of UAP56, the authors would need to examine several other transcripts. As presented, the authors can merely state that UAP56 seems to be required for the efficient export of an mRNA transcript, which is predicted by dozens of previous studies dating back to the early 2000s.

      Our response

      Firstly, thank you for commenting on the validity of the experimental approach! The primary purpose of this experiment was to test whether the mechanism of UAP56 during dendrite pruning conforms with what is known about UAP56's cellular role, which it apparently does. We also agree that our statements regarding the specificity of UAP56 for Mical over other transcripts are difficult to support. While our experiments would be consistent with such a model, they do not prove it. We therefore toned down the corresponding statements (e.g., the concluding sentence at the end of the first Results paragraph is now: "We conclude that c4da neuron dendrite pruning is particularly sensitive to loss of UAP56." (p. 6)).

      Minor (and really minor) points:

      In the second sentence of the Discussion, the word 'developing' seems to be mis-typed "While a general inhibition of mRNA export might be expected to cause broad defects in cellular processes, our data in develoing c4da neurons indicate that loss of UAP56 mainly affects pruning mechanisms related to actin remodeling."

      Sentence in the Results (lack of page numbers makes indicating where exactly a bit tricky)- "We therefore reasoned that Mical expression could be more challenging to c4da neurons." This is a complete sentence as presented, yet, if something is 'more something'- the thing must be 'more than' something else. Presumably, the authors mean that the length of the MICAL transcript could make the processing and export of this transcript more challenging than typical fly transcripts (raising the question of the average length of a mature transcript in flies?).

      Our response

      Thanks for pointing these out. The typo is fixed, page numbers are added. We changed the sentence to: "Because of the large size of its mRNA, we reasoned that MICAL gene expression could be particularly sensitive to loss of export factors such as UAP56." (p.9) We hope this is more precise language-wise.

      Reviewer #3 (Significance (Required)):

      Understanding how post-transcriptional events are linked to key functions in neurons is important and would be of interest to a broad audience.

    1. 3.4. Bots and Responsibility

      As we think about responsibility in ethical scenarios on social media, the existence of bots causes some complications.

      3.4.1. A Protesting Donkey?

      To get an idea of the type of complications we run into, let’s look at the use of donkeys in protests in Oman: “public expressions of discontent in the form of occasional student demonstrations, anonymous leaflets, and other rather creative forms of public communication. Only in Oman has the occasional donkey…been used as a mobile billboard to express anti-regime sentiments. There is no way in which police can maintain dignity in seizing and destroying a donkey on whose flank a political message has been inscribed.” From Kings and People: Information and Authority in Oman, Qatar, and the Persian Gulf by Dale F. Eickelman [1]

      In this example, some clever protesters have made a donkey perform the act of protest: walking through the streets displaying a political message. But, since the donkey does not understand the act of protest it is performing, it can’t rightly be punished for protesting. The protesters have managed to separate the intention of protest (the political message inscribed on the donkey) from the act of protest (the donkey wandering through the streets). This allows the protesters to remain anonymous and the donkey unaware of its political mission.

      3.4.2. Bots and responsibility

      Bots present a similar disconnect between intentions and actions. Bot programs are written by one or more people, potentially all with different intentions, and they are run by other people, or sometimes scheduled by people to be run by computers. This means we can analyze the ethics of the bot’s action, as well as the intentions of the various people involved, though these might all be disconnected.

      3.4.3. Reflection questions

      How are people’s expectations different for a bot and a “normal” user?
      Choose an example social media bot (find one on your own or look at Examples of Bots (or apps)). What does this bot do that a normal person wouldn’t be able to do, or wouldn’t be able to do as easily? Who is in charge of creating and running this bot? Does the fact that it is a bot change how you feel about its actions? Why do you think social media platforms allow bots to operate? Why would users want to be able to make bots? How does allowing bots influence social media sites’ profitability?

      [1] We haven’t been able to get the original chapter to load to see if it indeed says that, but I found it quoted here and here. We also don’t know if this is common or representative of protests in Oman, nor whether we fully understand the cultural importance of what is happening in this story. Still, we are using it at least as a thought experiment.

      I found the donkey protest example helpful for understanding how responsibility can be separated from action. Just like the donkey does not understand the protest it carries, bots can perform actions without intention or awareness. This makes it harder to assign responsibility, since the people who design, deploy, or benefit from a bot may all have different roles and intentions.

    1. Unclear Privacy Rules: Sometimes privacy rules aren’t made clear to the people using a system. For example, if you send “private” messages on a work system, your boss might be able to read them. When Elon Musk purchased Twitter, he also purchased access to all Twitter Direct Messages.
      Others Posting Without Permission: Someone may post something about another person without their permission. See in particular: The perils of ‘sharenting’: The parents who share too much.
      Metadata: Sometimes the metadata that comes with content might violate someone’s privacy. For example, in 2012, former tech CEO John McAfee was a suspect in a murder in Belize and went into hiding. But when Vice magazine wrote an article about him, the photos in the story contained metadata revealing his exact location in Guatemala.
      Deanonymizing Data: Sometimes companies or researchers release datasets that have been “anonymized,” meaning that things like names have been removed, so you can’t directly see who the data is about. But sometimes people can still deduce who the anonymized data is about. This happened when Netflix released anonymized movie ratings datasets: at least some users’ data could be traced back to them.
      Inferred Data: Sometimes information that doesn’t directly exist can be inferred through data mining (as we saw last chapter), and the creation of that new information could be a privacy violation. This includes the creation of Shadow Profiles, which contain information about the user that the user didn’t provide or consent to.
      Non-User Information: Social media sites might collect information about people who don’t have accounts, as Facebook does.
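
      The Netflix case above is an instance of a linkage attack. The attacker matches an “anonymized” record against a small amount of public information about a named person. The sketch below is a hypothetical toy. It assumes the attacker already knows a few of one person’s ratings, for example reviews posted under their real name elsewhere. The data, the IDs, and the `match_score`/`reidentify` helpers are invented for illustration. They do not reproduce how the Netflix data was actually analyzed.

```python
# Toy linkage attack: every record and name below is invented for illustration.
from typing import Dict, Optional

# "Anonymized" release: names replaced by opaque IDs, ratings kept intact.
anonymized: Dict[str, Dict[str, int]] = {
    "user_0412": {"Movie A": 5, "Movie B": 1, "Movie C": 4, "Movie D": 2},
    "user_0977": {"Movie A": 3, "Movie C": 4, "Movie E": 5},
    "user_1530": {"Movie B": 1, "Movie D": 2, "Movie F": 3},
}

# Auxiliary public data about a named person (e.g., reviews posted under her name).
public_ratings_of_alice: Dict[str, int] = {"Movie A": 5, "Movie B": 1, "Movie D": 2}


def match_score(known: Dict[str, int], candidate: Dict[str, int]) -> int:
    """Count movies that both profiles rated with exactly the same score."""
    return sum(1 for movie, rating in known.items() if candidate.get(movie) == rating)


def reidentify(known: Dict[str, int], released: Dict[str, Dict[str, int]],
               min_matches: int = 3) -> Optional[str]:
    """Return the anonymized ID that best matches `known`, or None if the match is weak."""
    best_id = max(released, key=lambda uid: match_score(known, released[uid]))
    if match_score(known, released[best_id]) >= min_matches:
        return best_id
    return None


print(reidentify(public_ratings_of_alice, anonymized))  # user_0412
```

      Realistic attacks usually tolerate approximate matches, such as nearby ratings or dates, and give extra weight to rarely rated items. The core idea is the same: removing names does not remove how distinctive a person’s behavior is.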

      This list shows how privacy risks often come less from a single bad action and more from how data travels and persists across systems. Even when users think they are acting safely or anonymously, metadata, inference, and platform ownership can quietly undermine consent and control, making privacy feel fragile and conditional rather than guaranteed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the reviewers’ insightful comments. In response, we conducted three new experiments, summarized in Author response table 1. After the table, we provide detailed responses to each comment.

      Author response table 1.

      Summary of new experiments and results.

      Reviewer #1 (Public review):

      The authors show that corticotropin-releasing factor (CRF) neurons in the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) monosynaptically target cholinergic interneurons (CINs) in the dorsal striatum of rodents. Functionally, activation of CRFR1 receptors increases CIN firing rate, and this modulation was reduced by pre-exposure to ethanol. This is an interesting finding, with potential significance for alcohol use disorders, but some conclusions could use additional support.

      Strengths:

      Well-conceived circuit mapping experiments identify a novel pathway by which the CeA and BNST can modulate dorsal striatal function by controlling cholinergic tone. Important insight into how CRF, a neuropeptide that is important in mediating aspects of stress, affective/motivational processes, and drug-seeking, modulates dorsal striatal function.

      Weaknesses:

      (1) Tracing and expression experiments were performed both in mice and rats (in a mostly nonoverlapping way). While these species are similar in many ways, some conclusions are based on assumptions of similarities that the presented data do not directly show. In most cases, this should be addressed in the text (but see point number 2).

      In the revised manuscript, we have clarified this limitation in the first paragraph of the Methods and the third paragraph of the Discussion. We now avoid cross-species claims by limiting our conclusions to the species in which each assay was performed. Specifically, we now state that while mice and rats share many conserved amygdalostriatal components, our tracing and expression studies were performed in a species-specific manner, and direct cross-species comparisons of CRF–CIN connectivity and CRFR1 expression were not assessed. We further note that future studies will be needed to determine the extent to which these observations are conserved across species as more tools become available.

      (2) Experiments in rats show that CRFR1 expression is largely confined to a subpopulation of striatal CINs. Is this true in mice, too? Since most electrophysiological experiments are done in various synaptic antagonists and/or TTX, it does not affect the interpretation of those data, but non-CIN expression of CRFR1 could potentially have a large impact on bath CRF-induced acetylcholine release.

      To address whether CRFR1 expression in striatal CINs is conserved across species, we performed new histological experiments using CRFR1-GFP mice. Striatal sections were immunostained with anti-ChAT, and we found that approximately 10% of CINs express CRFR1 (new Fig. 4D, 4E). This result indicates that, similar to rats, a subset of CINs in mice express CRFR1. However, the proportion of CRFR1⁺ CINs is lower than the proportion of CRF-responsive CINs observed during electrophysiology experiments, suggesting that CRF may also modulate CIN activity indirectly through network or synaptic mechanisms. We have also noted in the revised Discussion that while CRFR1 expression is confirmed in a subset of CINs, the broader distribution of CRFR1 among other striatal cell types remains to be determined (third paragraph of Discussion).

      In our study, bath application of CRF increased striatal ACh release. Because striatal ACh is released primarily from CINs, and CRFR1 is an excitatory receptor, this effect is most likely mediated by CRF activation of CRFR1 on CINs, leading to enhanced CIN activity and ACh release. Although CRFR1 may also be expressed on other striatal neurons, these cell types—medium spiny neurons and GABAergic interneurons—are inhibitory. If CRF were to activate CRFR1 on these GABAergic neurons, the resulting increase in GABA release would suppress CIN activity and consequently reduce, rather than enhance, ACh release. Given that most CINs responded functionally while only a small subset expressed CRFR1, these findings imply that indirect mechanisms, such as CRF modulation of local circuits influencing CIN excitability, may also contribute to the observed increase in ACh release. Together, these data support a model in which CRF primarily enhances ACh release via activation of CRFR1-expressing CINs, while indirect network effects may further amplify this response.

      (3) Experiments in rats show that about 30% of CINs express CRFR1. Did only a similar percentage of CINs in mice respond to bath application of CRF? The effect sizes and error bars in Figure 5 imply that the majority of recorded CINs likely responded. Were exclusion criteria used in these experiments?

      We thank the reviewer for this insightful question. In our mouse cell-attached recordings, ~80% of CINs increased firing during CRF bath application, and all recorded cells were included in the analysis. We made no exclusions based on response direction or magnitude. Cells were only required to meet standard recording-quality criteria, such as stable baseline firing and a stable seal.

      Using a CRFR1-GFP reporter mouse, we found that ~10% of striatal CINs are GFP+, suggesting that the high proportion of CRF-responsive CINs cannot be explained solely by somatic reporter-labeled CRFR1 expression. Importantly, the CRF-induced increase in CIN firing is blocked by the selective CRFR1 antagonist NBI 35695 (Fig. 5B–C), supporting a CRFR1-dependent mechanism at the circuit level. We now discuss several non-mutually exclusive explanations for this apparent discrepancy: (i) reporter lines (e.g., CRFR1-GFP) may underestimate functional CRFR1 expression, particularly for low-level or compartmentalized receptor pools; (ii) bath-applied CRF may act indirectly via CRFR1 on presynaptic afferents, thereby enhancing excitatory drive onto CINs; and (iii) electrical coupling among CINs could allow direct effects in a subset of CINs to propagate through the CIN network (Ren, Liu et al. 2021). We added this discussion to the revised manuscript (fourth paragraph of the Discussion).

      (4) The conclusion that prior acute alcohol exposure reduces the ability of subsequent alcohol exposure to suppress CIN activity in the presence of CRF may be a bit overstated. In Figure 6D (no ethanol preexposure), ethanol does not fully suppress CIN firing rate to baseline after CRF exposure. The attenuated effect of CRF on CIN firing rate after ethanol pre-treatment (6E) may just reduce the maximum potential effect that ethanol can have on firing rate after CRF, due to a lowered starting point. It is possible that the lack of significant effect of ethanol after CRF in pre-treated mice is an issue of experimental sensitivity. Related to this point, does pre-treatment with ethanol reduce the later CIN response to acute ethanol application (in the absence of CRF)?

      In the revised manuscript, we have tempered our interpretation in the final Results section and throughout the Discussion. We now emphasize that ethanol pre-exposure attenuates, rather than abolishes, the CRF-induced increase in CIN firing. We also note the reviewer’s important point that in Figure 6D, ethanol does not fully suppress firing to baseline after CRF exposure, consistent with a partial effect. Regarding the reviewer’s question, our experiments were specifically designed to test interactions between CRF and ethanol, so we did not assess whether ethanol pre-treatment alters subsequent responses to ethanol alone. We now explicitly acknowledge that disentangling CRF-dependent from CRF-independent effects of ethanol on CIN activity is an important goal for future studies (sixth paragraph of the Discussion). For example, comparing CIN responses to acute ethanol with and without prior ethanol pre-exposure, in the absence of CRF, could resolve this question.

      (5) More details about the area of the dorsal striatum being examined would be helpful (i.e., a-p axis).

      We now provide more detail regarding the anterior–posterior axis of the dorsal striatum examined. Most recordings and imaging were performed in the posterior dorsomedial striatum (pDMS). This region corresponds to coronal slices posterior to the crossing of the anterior commissure and anterior to the tail of the striatum (from approximately +0.62 mm to −1.3 mm relative to bregma). While our primary focus was on posterior slices, some anterior slices were included to increase the sample size. These details have been added to the Methods (last sentence of the ‘Histology and cell counting’ section and of the ‘Slice electrophysiology’ section).

      Reviewer #2 (Public review):

      Essoh and colleagues present a thorough and elegant study identifying the central amygdala and BNST as key sources of CRF input to the dorsal striatum. Using monosynaptic rabies tracing and electrophysiology, they show direct connections to cholinergic interneurons. The study builds on previous findings that CRF increases CIN firing, extending them by measuring acetylcholine levels in slices and applying optogenetic stimulation of CRF+ fibers. It also uncovers a novel interaction between alcohol and CRF signaling in the striatum, likely to spark significant interest and future research.

      Strengths:

      A key strength is the integration of anatomical and functional approaches to demonstrate these projections and assess their impact on target cells, striatal cholinergic interneurons.

      Weaknesses:

      (1) The nature of the interaction between alcohol and CRF actions on cholinergic neurons remains unclear. Also, further clarification of the ACh sensor used and others is required.

      We have clarified the nature of the interaction between alcohol and CRF signaling in CINs and have provided additional details regarding the acetylcholine sensor used. These issues are addressed in detail in our responses to the specific comments below.

      Reviewer #2 (Recommendations for the authors):

      (1) The interaction between the effects of alcohol and CRF is a novel and important part of this study. When considering possible mechanisms underlying the findings in the discussion, there is no mention of occlusion. Incubation with alcohol produced a similar increase in CIN firing as CRF did, so occlusion could be a parsimonious explanation for the observed interaction. Have the authors considered blocking the effects of alcohol on CINs with a CRF-R1 antagonist? Another experiment that could address occlusion would be to test whether alcohol also increases ACh levels, as CRF did.

      We thank the reviewer for proposing occlusion as a potential mechanism underlying the interaction between alcohol and CRF. We agree that, in principle, alcohol-induced endogenous CRF release could occlude subsequent exogenous CRF-mediated potentiation of CIN firing, and we carefully considered this possibility.

      However, several observations from our data argue against occlusion driven by acute alcohol exposure or withdrawal in this preparation. First, as shown in Fig. 6A, bath application of alcohol transiently reduced CIN firing, and firing recovered to baseline levels after washout without any rebound increase. Second, in Fig. 6D–E, the baseline firing rates under control conditions and following alcohol pretreatment were comparable, indicating that acute alcohol exposure and short-term withdrawal did not produce a sustained increase in CIN excitability. Together, these results suggest that acute withdrawal in slices is less likely to trigger substantial endogenous CRF release capable of occluding subsequent exogenous CRF effects.

      While we and others have previously reported increased spontaneous CIN firing following prolonged in vivo alcohol exposure and extended withdrawal periods (e.g., 21 days), short-term withdrawal (e.g., 1 day) does not robustly alter baseline CIN firing (Ma, Huang et al. 2021, Huang, Chen et al. 2024). Consistent with these prior findings, the absence of a rebound or elevated baseline firing in the present slice experiments discouraged further pursuit of an endogenous CRF occlusion mechanism under acute conditions.

      We also considered experimentally testing occlusion by blocking CRFR1 signaling during alcohol pre-treatment. However, this approach is technically challenging in slice recordings, as CRFR1 antagonists require prolonged incubation (~1 hour) during alcohol exposure. Because it is unclear whether endogenous CRF release is triggered by alcohol incubation itself or by withdrawal, the antagonist would need to remain present throughout both the incubation and withdrawal periods. This leaves insufficient time for complete washout of the CRFR1 antagonist prior to subsequent bath application of exogenous CRF to assess its effects on CIN firing. Consequently, residual antagonist presence would confound the interpretation of the exogenous CRF response.

      Finally, regarding the possibility that alcohol increases acetylcholine release, we did not observe alcohol-induced increases in CIN firing in slices, arguing against elevated ACh signaling under these conditions. Consistent with prior work (Ma, Huang et al. 2021, Huang, Chen et al. 2024), alcohol-induced increases in CIN excitability and cholinergic signaling appear to depend on prolonged in vivo exposure and extended withdrawal rather than acute slice-level manipulations.

      We have now incorporated discussion of occlusion as a potential mechanism (seventh paragraph) and clarified why our data and technical considerations argue against it in the present study. We thank the reviewer for this wonderful suggestion, which we will test in future in vivo studies.

      (2) Retrograde monosynaptic tracing of inputs to CIN. Results state the finding of labeling in all previously reported area..." Can the authors report these areas? A list in the text or a bar plot, if there is quantification, will suffice. This information will serve as important validation and replication of previous findings.

      We thank the reviewer for this constructive suggestion. We agree that summarizing the anatomical sources of CIN input provides important validation of our tracing results. In the revised Results, we now list the major input regions observed, including the striatum itself, cortex (e.g., cingulate cortex, motor cortex, somatosensory cortex), thalamus (e.g., parafascicular thalamic nucleus, centrolateral thalamic nucleus), globus pallidus, and midbrain (first paragraph of the Results). Quantitative analysis of relative input strength will be presented in a separate study that expands on these findings. Here, we limit the current manuscript to the functional characterization of CRF and alcohol modulation of CINs.

      (3) Given the difference in connectivity among striatal subregions, it would be important to describe the injection site in more detail in the results and figures. In the figure, for example, you might want to include the AP coordinates. Because it is such a zoomed-in image, it is hard to tell how anterior or posterior the site is. I imagine that the picture is a representative image of the injection site, but a side image overlaying the injection sites from all the animals used might help.

      The anterior–posterior (AP) coordinates for representative images have been included in the panels and reiterated more clearly in the revised Results section and figure legends. In the legend for Figure 3B, a list of AP coordinates for each animal used for Figure 3A-3E has been added.

      (4) Figure 1D inset, there seem to be some double-labeled cells in the zoomed in BNST images. The authors might want to comment on this. It seemed far from the injection site. Do D1-MSN so far away show connectivity to CINs?

      Upon closer inspection of the BNST images, we noted that a small number of double-labeled cells were indeed present. This is consistent with our lab’s previous report that a subset (~10%) of BNST neurons are D1R-expressing, whereas the majority are D2R-expressing (Lu, Cheng et al. 2021). Given the BNST’s anatomical proximity to the dorsal striatum, some D1R-expressing neurons in this region may provide monosynaptic input to CINs. This would point to a potential ventral-to-dorsal connection that merits further study.

      (5) Can the author provide quantification of the onset delay of the optogenetic evoked CRF+ axon responses onto CINs? The claim of monosynaptic connectivity is well supported by the TTX/4AP experiment but additional information on the timing will strengthen that conclusion.

      We thank the reviewer for this insightful suggestion. Quantifying the onset latency of optogenetically evoked CRF⁺ axon responses onto CINs provides valuable confirmation of monosynaptic connectivity. To address this, we performed new latency measurements under the same recording conditions as the TTX/4-AP experiments. The average onset latency from the start of the optical stimulation was 5.85 ± 0.37 ms (new Figure 3J), consistent with direct monosynaptic transmission.

      As an additional reference, we analyzed latency data from a separate project in which we optogenetically stimulated cholinergic interneurons and recorded synaptic responses in medium spiny neurons. This circuit is known to involve disynaptic transmission from CINs to MSNs via nAChR-expressing interneurons (Author response image 1) (English, Ibanez-Sandoval et al. 2011). It exhibited a significantly longer latency (18.34 ± 0.70 ms; t(29) = 10.3, p < 0.001) than CRF⁺ CeA/BNST inputs to CINs (5.85 ± 0.37 ms). Together, these results further support the conclusion that CRF⁺ axons form direct functional synapses onto CINs.

      Author response image 1.

      Latency of disynaptic transmission from CINs to MSNs via interneurons. A) Schematic illustrating optogenetic stimulation of Chrimson-expressing CINs, leading to excitation of nAChR-expressing interneurons that release GABA onto recorded MSNs. B) Sample trace of disynaptic transmission (left) and bar graph summarizing onset latency (right) from light stimulation to synaptic response onset (n = 23 neurons from 3 mice).

      (6) The ACh sensor reported is "AAV-GRABACh4m" but the reference is for GRAB-ACh3.0. Also, BrainVTA has GRAB-ACh4.3. Is this the vector? Could you please check the name of the construct and report the corresponding reference, as well as clarify the meaning of the additional "m". They have a mutant version of the GRAB-ACH that researchers use for control, and of course, you want to use it as a control, but not for the test experiment.

      GRAB-ACh4m is the correct acetylcholine sensor used in this study. The ACh4 series (including ACh4h, ACh4m, and ACh4l; personal communication with Dr. Yulong Li’s lab) represents an updated generation following GRAB-ACh3.0. Although the ACh4 family has not yet been formally published, these constructs are publicly available through BrainVTA (https://www.brainvta.tech/plus/view.php?aid=2680).

      The suffix “m” does not indicate a mutant control; rather, it denotes a medium-affinity variant within the ACh4 sensor family. Importantly, the mutant (non-responsive) control sensor is only available for GRAB-ACh3.0 (ACh3.0mut) and does not exist for the ACh4 series.

      Our laboratory has previously used GRAB-ACh4m in multiple peer-reviewed publications (Huang, Chen et al. 2024, Gangal, Iannucci et al. 2025, Purvines, Gangal et al. 2025), and its use has also been reported by independent groups in recent preprints (Potjer, Wu et al. 2025, Touponse, Pomrenze et al. 2025). We have now clarified the construct name and its relationship to GRAB-ACh3.0 in the Methods ‘Reagents’ section, and we have corrected the reference accordingly.

      (7) Are CRF-R1+ CINs equally abundant in the DMS and DLS? From the image in Figure 4, it seems that a larger percentage of CINs are CRFR1+ in the DLS than in DMS. Is this true? The authors probably already have this data, or it should be easy to get, and it could be additional information that was not studied before.

      We did not perform a quantitative comparison of CRFR1+ CIN abundance between the DMS and DLS in the present study. While the representative images in Figure 4 may appear to suggest regional differences, these panels were selected to illustrate labeling quality rather than relative density and should not be interpreted as evidence of unequal distribution. We have clarified this point in the revised Discussion (last sentence of the third paragraph) and note that future studies will be needed to systematically evaluate potential regional differences in CRFR1 expression, which could have important implications for dorsal striatal function.

      (8) The manuscript states several times that there are no CRF+ neurons in the dorsal striatum. At the same time, there are reports of CRF+ neurons in the ventral striatum and of their role in learning. Could the authors mention the studies by the Lemos group (10.1016/j.biopsych.2024.08.006)?

      We have revised the Discussion section to clarify that our findings pertain specifically to the dorsal striatum and now acknowledge the presence and functional relevance of CRF+ neurons in the ventral striatum, citing the Lemos group’s study (fifth paragraph of the Discussion).

      (9) For the histology analysis, please express cell counts as "density", not just number of cells, by providing an area (e.g., "number of cells/µm²").

      In the revised manuscript, all histological outcomes have been recalculated as cell density (cells/mm²) by normalizing raw cell counts to the measured area of each region of interest (ROI). Figures that previously displayed absolute counts now present densities (cells/mm²), with corresponding updates made to figure legends and text. We note one exception in Figure 4B. There, the comparison between the total number of CINs and CRFR1+ CINs is best represented as cell counts rather than normalized values, because both were counted within the same ROI of the dorsostriatal subregion.
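
      Normalizing a count to area is a single division, but the unit conversion is easy to get wrong, since 1 mm² = 10⁶ µm². The following is a minimal sketch of that normalization. It assumes ROI areas are measured in µm² and densities are reported in cells/mm². The function name and the example numbers are hypothetical and are not values from this study.

```python
# Hypothetical helper for illustration; the numbers below are not data from the study.
UM2_PER_MM2 = 1_000_000  # 1 mm = 1,000 µm, so 1 mm² = 1,000,000 µm²


def density_per_mm2(cell_count: int, roi_area_um2: float) -> float:
    """Convert a raw cell count within one ROI (area in µm²) to cells/mm²."""
    if roi_area_um2 <= 0:
        raise ValueError("ROI area must be positive")
    # count / (area_um2 / 1e6) is rewritten as count * 1e6 / area_um2
    return cell_count * UM2_PER_MM2 / roi_area_um2


# Example: 42 cells counted in a 600 µm x 500 µm ROI (300,000 µm², i.e. 0.3 mm²).
print(density_per_mm2(42, 600 * 500))  # 140.0
```

      When two populations are counted within the same ROI, as in the Figure 4B exception, the area term is identical for both. Their ratio is therefore the same whether it is expressed as counts or as densities.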

      (10) Figure 2C, we can see there are some labeled fibers in the striatum cut. Would it be possible to get a better confocal image?

      Figure 2C has been replaced with a higher-quality confocal image captured at the same magnification and scale. The updated image provides improved clarity and resolution, ensuring accurate visualization of labeled CRF+ fibers, but not cell bodies, within the striatum.

      (11) The ACh measurements in the slice are very informative and an important addition. I first thought that these experiments with the GRAB-ACh sensor were performed in ChAT-eGFP mice. After reading more carefully, I realized they were done in wild-type mice. Would you include the wild-type label in the figure as well? The ChAT-eGFP BAC transgenic line was reported to have enhanced ACh packaging and increased ACh release, which could have magnified the signals. So, it is important to highlight the experiments were done in wild-type mice.

      We now label the figure with ‘WT mice’ and note in the legend that all GRAB-ACh experiments were performed in wild-type mice, not ChAT-eGFP mice, to avoid confounds in ACh release. We thank the reviewer for this important suggestion.

      Reviewer #3 (Public review):

      The authors demonstrate that CRF neurons in the extended amygdala form GABAergic synapses onto cholinergic interneurons and that CRF can excite these neurons. The evidence is strong; however, the authors fail to make a compelling connection showing that CRF released from these extended amygdala neurons mediates any of these effects. Further, they show that acute alcohol appears to modulate this action, although the effect size is not particularly robust.

      Strengths:

      This is an exciting connection from the extended amygdala to the striatum that provides a new direction for how these regions can modulate behavior. The work is rigorous and well done.

      Weaknesses:

      (1) While the authors show that opto stim of these neurons can increase firing, this is not shown to be CRFR1 dependent. In addition, the effects of acute ethanol are not particularly robust or rigorously evaluated. Further, the opto stim experiments are conducted in an Ai32 mouse, so it is impossible to determine if that is from CEA and BNST, vs. another population of CRF-containing neurons. This is an important caveat.

      We added recordings with the CRFR1 antagonist antalarmin. Light-evoked increases in CIN firing were abolished under CRFR1 blockade, linking the effect to CRFR1 (Figure 5J, 5K). We also clarify that CRF-Cre;Ai32 does not isolate CeA versus BNST sources, so we temper regional claims and highlight this as a limitation. The acute ethanol effects are modest but consistent. We expanded the discussion of dose and preparation constraints in acute slice physiology and note that in vivo studies will be needed to define the network-level impact.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors could bring some of this data together by examining CRFR1 dependence of optical stimulation-induced increases in firing. Further, the authors have devoted significant effort to exploring how the BNST and CEA project to the CIN, yet their ephys does not explore site-specific infusion of ChR2 into either region. How are we to be sure it is not some other population of CRF neurons mediating this effect? The alcohol data does not appear particularly robust, but I think if the authors wanted to, they could explore other concentrations. Mostly I think it is important to discuss the limitations of acute alcohol on a brain slice.

      We thank the reviewer for these thoughtful comments, which helped us strengthen the mechanistic interpretation of the CRF-CIN interaction. In the revised manuscript, we have addressed each point as follows:

      - CRFR1 dependence of optogenetically evoked responses: We performed new recordings in which optogenetic stimulation of CRF⁺ terminals in the dorsal striatum was conducted in the presence of the CRFR1 antagonist antalarmin. The increase in CIN firing evoked by light stimulation was abolished under CRFR1 blockade, confirming that this effect is mediated through CRFR1 activation (new Figure 5J, 5K, third paragraph of the corresponding Results section). These results directly link the functional effects of CRF⁺ terminal activation to CRFR1 signaling on CINs.

      - CeA vs. BNST projection specificity: The reviewer is correct that CeA and BNST projections were not analyzed separately. Because these pathways had not been described previously, our experiments were designed first to establish monosynaptic connections from CeA/BNST CRF neurons to striatal CINs. Future studies will explore the specific contribution of each site. However, our data exclude a contribution from other CRF neuron populations, as we selectively infused Cre-dependent opsins into both the CeA and BNST of CRF-Cre mice (Figure 3G-3J).

      - Limitations of acute slice experiments: We have expanded the Discussion (sixth paragraph) to acknowledge that acute slice physiology cannot fully capture the dynamic and network-level effects of ethanol observed in vivo. While this preparation enables mechanistic precision, factors such as washout, diffusion constraints, and the absence of systemic feedback may underestimate ethanol’s impact on CINs. We now explicitly note this limitation and highlight the need for in vivo studies to examine behavioral and circuit-level implications of CRF–alcohol interactions.

      Collectively, these revisions clarify the CRFR1 dependence of CRF⁺ terminal effects and reaffirm that both CeA and BNST projections contribute to CIN modulation while addressing the methodological limitations of the slice preparation.

      Reviewer #4 (Public review):

      This manuscript presents a compelling and methodologically rigorous investigation into how corticotropin-releasing factor (CRF) modulates cholinergic interneurons (CINs) in the dorsal striatum, a brain region central to cognitive flexibility and action selection, and how this circuit is disrupted by alcohol exposure. Through an integrated series of anatomical, optogenetic, electrophysiological, and imaging experiments, the authors uncover a previously uncharacterized CRF⁺ projection from the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) to dorsal striatal CINs.

      Strengths:

      Key strengths of the study include the use of state-of-the-art monosynaptic rabies tracing, CRF-Cre transgenic models, CRFR1 reporter lines, and functional validation of synaptic connectivity and neurotransmitter release. The finding that CRF enhances CIN excitability and acetylcholine (ACh) release via CRFR1, and that this effect is attenuated by acute alcohol exposure and withdrawal, provides important mechanistic insight into how stress and alcohol interact to impair striatal function. These results position CRF signaling in CINs as a novel contributor to alcohol use disorder (AUD) pathophysiology, with implications for relapse vulnerability and cognitive inflexibility associated with chronic alcohol intake. The study is well-structured, with a clear rationale, thorough methodology, and logical progression of results. The discussion effectively contextualizes the findings within broader addiction neuroscience literature and suggests meaningful future directions, including therapeutic targeting of CRFR1 signaling in the dorsal striatum.

      Weaknesses:

      (1) Minor areas for improvement include occasional redundancy in phrasing, slightly overlong descriptions in the abstract and significance sections, and a need for more concise language in some places. Nevertheless, these do not detract from the manuscript's overall quality or impact. Overall, this is a highly valuable contribution to the fields of addiction neuroscience and striatal circuit function, offering novel insights into stress-alcohol interactions at the cellular and circuit level, which requires minor editorial revisions.

      We have streamlined the abstract and significance statement, reduced redundancy, and improved conciseness throughout the text. We appreciate the reviewer’s feedback, which has helped us further strengthen the clarity and readability of the manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Lines 29-30: Slightly verbose. Consider: "Alcohol relapse is associated with corticotropin-releasing factor (CRF) signaling and altered reward pathway function, though the precise mechanisms are unclear."

      The sentence has been revised as recommended to improve clarity and conciseness in the introductory section (Lines 31-32).

      (2) Lines 39-43: Good synthesis, but could better emphasize the novelty of identifying a CRF-CIN pathway.

      The abstract has been revised to more clearly emphasize the novelty of identifying a CRF-CIN pathway and its functional significance (Lines 42-43).

      (3) Lines 66-68: Consider integrating clinical relevance more directly, e.g., "AUD affects over 14 million adults in the U.S., with relapse often triggered by stress...".

      The introduction has been revised to more directly emphasize the clinical relevance of alcohol use disorder, including its high prevalence and the role of stress in relapse, thereby underscoring the translational significance of our findings (Lines 68-69).

      (4) Line 83: Repetition of "goal-directed learning, habit formation, and behavioral flexibility" appears multiple times; consider variety.

      We have varied the phrasing in the Introduction to avoid redundancy. Specifically, in place of repeating “goal-directed learning, habit formation, and behavioral flexibility,” we now use alternative terms such as “action selection,” “habitual responding,” and “cognitive flexibility,” depending on the context.

      (5) Lines 107-116: Clarify why both rats and mice were used-do they serve different experimental purposes?

      We now explain that each species was used for complementary experimental purposes. Rats were used for histological validation of CRFR1 expression using the CRFR1-Cre-tdTomato line, which has been extensively characterized in this species. Mice were used for the majority of electrophysiological, optogenetic, and GRAB-ACh sensor experiments due to the availability of well-established transgenic CRF-Cre-driver lines. This division allowed us to leverage the most appropriate tools in each species to address different aspects of the study. We have clarified this rationale in the Methods (first paragraph of the “Animals” section) and Discussion (third paragraph).

      (6) Electrophysiology section: The distinction between acute exposure vs. withdrawal could be further emphasized.

      To better highlight the distinction between acute alcohol exposure and withdrawal, we have clarified the timing and context of each condition within the Results section for Figure 6. Specifically, we now distinguish the immediate suppressive effects of alcohol observed during bath application (acute exposure) from the subsequent changes in CIN firing measured after washout (withdrawal). These revisions clarify the temporal dynamics and functional implications of CRF–alcohol interactions in our experimental design.

      (7) Lines 227-229: Reword for clarity: "Significantly more BNST neurons projected to CINs compared to the CeA...".

      The sentence has been reworded for clarity as recommended (Lines 247-248).

      (8) Lines 373-374: Consider connecting the CRF-CIN circuit to behavioral inflexibility in AUD more directly.

      We have modified the sentence (Lines 390-395) to more explicitly link alcohol-induced dysregulation of the CRF–CIN circuit to behavioral inflexibility in AUD, consistent with the established role of CINs in action selection and cognitive flexibility.

      (9) Lines 387-389: This is an excellent point about stress resilience; consider expanding with examples or potential implications.

      We thank the reviewer for this insightful suggestion. In the revised Discussion (sixth paragraph), we expanded this section to more directly connect alcohol-induced disruption of CRF–CIN signaling with impaired stress resilience and behavioral inflexibility. Specifically, we now note that such dysregulation may compromise stress resilience mechanisms mediated by CRF–cholinergic interactions in the striatum and related corticostriatal circuits. We further discuss how impaired CIN responsiveness could blunt adaptive behavioral adjustments under stress, biasing animals toward habitual or compulsive alcohol seeking. This addition highlights the broader implication that alcohol-induced alterations in CRF–CIN signaling may contribute to relapse vulnerability by undermining adaptive stress coping.

      References

      English, D. F., O. Ibanez-Sandoval, E. Stark, F. Tecuapetla, G. Buzsaki, K. Deisseroth, J. M. Tepper and T. Koos (2011). "GABAergic circuits mediate the reinforcement-related signals of striatal cholinergic interneurons." Nat Neurosci 15(1): 123–130.

      Gangal, H., J. Iannucci, Y. Huang, R. Chen, W. Purvines, W. T. Davis, A. Rivera, G. Johnson, X. Xie, S. Mukherjee, V. Vierkant, K. Mims, K. O'Neill, X. Wang, L. A. Shapiro and J. Wang (2025). "Traumatic brain injury exacerbates alcohol consumption and neuroinflammation with decline in cognition and cholinergic activity." Transl Psychiatry 15(1): 403.

      Huang, Z., R. Chen, M. Ho, X. Xie, H. Gangal, X. Wang and J. Wang (2024). "Dynamic responses of striatal cholinergic interneurons control behavioral flexibility." Sci Adv 10(51): eadn2446.

      Lu, J. Y., Y. F. Cheng, X. Y. Xie, K. Woodson, J. Bonifacio, E. Disney, B. Barbee, X. H. Wang, M. Zaidi and J. Wang (2021). "Whole-Brain Mapping of Direct Inputs to Dopamine D1 and D2 Receptor-Expressing Medium Spiny Neurons in the Posterior Dorsomedial Striatum." eNeuro 8(1).

      Ma, T., Z. Huang, X. Xie, Y. Cheng, X. Zhuang, M. J. Childs, H. Gangal, X. Wang, L. N. Smith, R. J. Smith, Y. Zhou and J. Wang (2021). "Chronic alcohol drinking persistently suppresses thalamostriatal excitation of cholinergic neurons to impair cognitive flexibility." J Clin Invest 132(4): e154969.

      Potjer, E. V., X. Wu, A. N. Kane and J. G. Parker (2025). "Parkinsonian striatal acetylcholine dynamics are refractory to L-DOPA treatment." bioRxiv.

      Purvines, W., H. Gangal, X. Xie, J. Ramos, X. Wang, R. Miranda and J. Wang (2025). "Perinatal and prenatal alcohol exposure impairs striatal cholinergic function and cognitive flexibility in adult offspring." Neuropharmacology 279: 110627.

      Ren, Y., Y. Liu and M. Luo (2021). "Gap Junctions Between Striatal D1 Neurons and Cholinergic Interneurons." Front Cell Neurosci 15: 674399.

      Touponse, G. C., M. B. Pomrenze, T. Yassine, V. Mehta, N. Denomme, Z. Zhang, R. C. Malenka and N. Eshel (2025). "Cholinergic modulation of dopamine release drives effortful behavior." bioRxiv.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      We appreciate this reviewer’s recognition of the significance of this research problem, and of the value of the approach taken by this paper.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      We added a brief discussion in the introduction highlighting the complementary advantages of prediction error and prediction uncertainty, and cited prior theoretical work that elaborates on this point. Specifically, we now note that prediction error can act as a reactive trigger, signaling when the current event model is no longer sufficient (Zacks et al., 2007). In contrast, prediction uncertainty is framed as proactive, allowing the system to prepare for upcoming changes even before they occur (Baldwin & Kosie, 2021; Kuperberg, 2021). Together, this makes clearer why these two signals could each provide complementary benefits for effective event model updating.

      "One potential signal to control event model updating is prediction error, the difference between the system’s prediction and what actually occurs. A transient increase in prediction error is a valid indicator that the current model no longer adequately captures the current activity. Event Segmentation Theory (EST; Zacks et al., 2007) proposes that event models are updated when prediction error increases beyond a threshold. A related but computationally distinct proposal is that prediction uncertainty (also termed "unpredictability") can serve as a control signal (Baldwin & Kosie, 2021). The advantage of relying on prediction uncertainty to detect event boundaries is that it is inherently proactive: the cognitive system can start looking for cues about what might come next before the next event starts (Baldwin & Kosie, 2021; Kuperberg, 2021)."

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      We addressed this concern by adding an analysis that explicitly tests the unique contributions of prediction error– and prediction uncertainty–driven boundaries to neural pattern shifts. In the revised manuscript, we describe how we fit a combined FIR model that included both boundary types as predictors and then compared this model against versions with only one predictor. This allowed us to identify the variance explained by each boundary type over and above the other. The results revealed two partially dissociable sets of brain regions sensitive to error- versus uncertainty-driven boundaries (see Figure S1), strengthening our argument that these signals make distinct contributions.

      "To account for the correlation between uncertainty-driven and error-driven boundaries, we also fitted an FIR model that predicted pattern dissimilarity from both types of boundaries (combined FIR) for each parcel. We then performed two likelihood ratio tests: combined FIR versus error FIR, which measures the unique contribution of uncertainty boundaries to pattern dissimilarity, and combined FIR versus uncertainty FIR, which measures the unique contribution of error boundaries to pattern dissimilarity. The analysis revealed two dissociable sets of brain regions associated with each boundary type (see Figure S1)."
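      As a hedged illustration of the nested-model logic described above (not the authors' analysis code), the sketch below builds FIR design matrices for one parcel, fits them by ordinary least squares, and computes the two likelihood ratio statistics. It assumes a pattern-dissimilarity series sampled once per TR, boundary onsets given as TR indices, Gaussian errors, and a hypothetical ±3-TR lag window; the function names (`fir_design`, `ols_fit`, `likelihood_ratio`) and the synthetic data are illustrative only. The intercept plays the role of the baseline, i.e., the expected dissimilarity far from any boundary.

```python
import math
import random


def fir_design(n_timepoints, onsets_by_type, lags):
    """Hypothetical FIR design: an intercept column plus one indicator column
    per (boundary type, lag). Row t has a 1 in the column for lag k if a
    boundary of that type occurred at t - k (negative k = pre-boundary)."""
    rows = []
    for t in range(n_timepoints):
        row = [1.0]  # intercept: baseline dissimilarity far from boundaries
        for onsets in onsets_by_type:
            for lag in lags:
                row.append(1.0 if (t - lag) in onsets else 0.0)
        rows.append(row)
    return rows


def ols_fit(X, y):
    """OLS via the normal equations, solved by Gaussian elimination with
    partial pivoting. Returns (coefficients, residual sum of squares)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, rss


def likelihood_ratio(rss_reduced, rss_full, n):
    """LR statistic for nested Gaussian linear models:
    2 * (logL_full - logL_reduced) = n * ln(RSS_reduced / RSS_full)."""
    return n * math.log(rss_reduced / rss_full)


# Synthetic (hypothetical) series: only error-driven boundaries change y.
rng = random.Random(1)
n = 300
lags = range(-3, 4)  # hypothetical +/-3-TR window
error_onsets = {30, 90, 150, 210, 270}
uncertainty_onsets = {35, 100, 160, 220, 265}
y = []
for t in range(n):
    bump = 0.8 if any(t - k in error_onsets for k in (-1, 0, 1)) else 0.0
    y.append(1.0 + bump + rng.gauss(0.0, 0.1))

beta_combined, rss_combined = ols_fit(
    fir_design(n, [error_onsets, uncertainty_onsets], lags), y)
_, rss_error_only = ols_fit(fir_design(n, [error_onsets], lags), y)
_, rss_uncert_only = ols_fit(fir_design(n, [uncertainty_onsets], lags), y)

# Unique contribution of each boundary type over and above the other.
lr_unique_error = likelihood_ratio(rss_uncert_only, rss_combined, n)
lr_unique_uncertainty = likelihood_ratio(rss_error_only, rss_combined, n)
```

      Because only the error-driven onsets carry an effect in this synthetic series, `lr_unique_error` comes out large and `lr_unique_uncertainty` small. Each statistic can be referred to a chi-square distribution with `len(lags)` degrees of freedom (for example, `scipy.stats.chi2.sf(lr, df)` if SciPy is available).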

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      We clarified how the FIR baseline is estimated in the methods section. Specifically, we now explain that the FIR coefficients should be interpreted relative to a reference level, which reflects the expected dissimilarity when timepoints are far from an event boundary. This makes it clear what serves as the comparison point for observed increases or decreases in dissimilarity.

      "The coefficients from the FIR model indicate changes relative to baseline, which can be conceptualized as the expected value when far from event boundaries."

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      This is related to Reviewer 2's comment and is addressed below.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

      We thank the reviewer for this advice on how to better set the context for the different potential outcomes of the study. We expanded both the introduction and discussion to better set up expectations for neural pattern shifts and to interpret what these shifts may reflect. In the introduction, we now describe prior findings showing that sensory regions tend to update more quickly than higher-order multimodal regions (Baldassano et al., 2017; Geerligs et al., 2021, 2022), and we highlight that it remains unclear whether higher-order updates precede or follow those in lower-order regions. We also note that our analytic approach is well-suited to address this open question. In the discussion, we then interpret our results in light of this framework. Specifically, we describe how we observed early shifts in higher-order areas such as anterior temporal and prefrontal cortex, followed by shifts in parietal and dorsal attention regions closer to event boundaries. This pattern runs counter to the traditional bottom-up temporal hierarchy view and instead supports a model of top-down updating, where high-level representations are updated first and subsequently influence lower-level processing (Friston, 2005; Kuperberg, 2021). To make this interpretation concrete, we added an example: in a narrative where a goal is reached midway (for instance, a mystery solved before the story formally ends), higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions. Finally, we note that the widespread stabilization of neural patterns after boundaries may signal the establishment of a new event model.

      Excerpt from Introduction:

      “More recently, multivariate approaches have provided insights into neural representations during event segmentation. One prominent approach uses hidden Markov models (HMMs) to detect moments when the brain switches from one stable activity pattern to another during movie viewing (Baldassano et al., 2017); these periods of relative stability were referred to as "neural states" to distinguish them from subjectively perceived events. Sensory regions like visual and auditory cortex showed faster transitions between neural states. Multi-modal regions like the posterior medial cortex, angular gyrus, and intraparietal sulcus showed slower neural state shifts, and these shifts aligned with subjectively reported event boundaries. Geerligs et al. (2021, 2022) employed a different analytical approach called Greedy State Boundary Search (GSBS) to identify neural state boundaries. Their findings echoed the HMM results: short-lived neural states were observed in early sensory areas (visual, auditory, and somatosensory cortex), while longer-lasting states appeared in multi-modal regions, including the angular gyrus, posterior middle/inferior temporal cortex, precuneus, anterior temporal pole, and anterior insula. Particularly prolonged states were found in higher-order regions such as lateral and medial prefrontal cortex.

      The previous evidence about evoked responses at event boundaries indicates that these are dynamic phenomena evolving over many seconds, with different brain areas showing different dynamics (Ben-Yakov & Henson, 2018; Burunat et al., 2024; Kurby & Zacks, 2018; Speer et al., 2007; Zacks, 2010). Less is known about the dynamics of pattern shifts at event boundaries (e.g., whether shifts observed in higher-order regions precede or follow shifts observed in lower-level regions), because the HMM and GSBS analysis methods do not directly provide moment-by-moment measures of pattern shifts. Both the spatial and temporal aspects of evoked responses and pattern shifts at event boundaries have the potential to provide evidence about two potential control processes (error-driven and uncertainty-driven) for event model updating.”

      Excerpt from Discussion:

      “We first characterized the neural signatures of human event segmentation by examining both univariate activity changes and multivariate pattern changes around subjectively identified event boundaries. Using multivariate pattern dissimilarity, we observed a structured progression of neural reconfiguration surrounding human-identified event boundaries. The largest pattern shifts were observed near event boundaries (~4.5 s before) in dorsal attention and parietal regions; these correspond with regions identified by Geerligs et al. (2022) as shifting their patterns on a fast-to-intermediate timescale. We also observed smaller pattern shifts roughly 12 seconds prior to event boundaries in higher-order regions within anterior temporal cortex and prefrontal cortex; these are slow-changing regions identified by Geerligs et al. (2022). This is puzzling. One prevalent proposal, based on the idea of a cortical hierarchy of increasing temporal receptive windows (TRWs), suggests that higher-order regions should update representations after lower-order regions do (Chang et al., 2021). In this view, areas with shorter TRWs (e.g., word-level processors) pass information upward, where it is integrated into progressively larger narrative units (phrases, sentences, events). This proposal predicts that neural shifts in higher-order regions follow those in lower-order regions. By contrast, our findings indicate the opposite sequence, suggesting that the brain might engage in top-down event representation updating, with changes in coarser-grain representations propagating downward to influence finer-grain representations (Friston, 2005; Kuperberg, 2021).
For example, in a narrative where the main goal is achieved midway (such as a detective solving a mystery before the story formally ends), higher-order regions might update the overarching event representation at that point, and this updated model could then cascade down to reconfigure how lower-level regions process the remaining sensory and contextual details. In the period after a boundary (around +12 seconds), we found widespread stabilization of neural patterns across the brain, suggesting the establishment of a new event model. Future work could focus on understanding the mechanisms behind the temporal progression of neural pattern changes around event boundaries.”

      Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors use their recent computational model to estimate event boundaries based on prediction error vs. uncertainty and use these boundaries to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      We thank the reviewer for their support for our use of open science practices, and for their appreciation of the importance of incorporating prediction uncertainty into models of event comprehension.

      Weaknesses:

      The data presented is limited to the cortex, and subcortical contributions would be interesting to explore. Further, the temporal window around event boundaries of 20 seconds is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively distal from event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two different types of boundaries, which would be valuable.

      The fact that we observed neural pattern shifts well before boundaries was indeed unexpected, and we now offer a more extensive interpretation in the discussion section. Specifically, we added text noting that shifts emerged in higher-order anterior temporal and prefrontal regions roughly 12 seconds before boundaries, whereas shifts occurred in lower-level dorsal attention and parietal regions closer to boundaries. This sequence contrasts with the traditional bottom-up temporal hierarchy view and instead suggests a possible top-down updating mechanism, in which higher-order representations reorganize first and propagate changes to lower-level areas (Friston, 2005; Kuperberg, 2021). (See excerpt for Reviewer 1’s comment #5.)

      With respect to univariate activity, we did not find strong differences between error-driven and uncertainty-driven boundaries. This makes the multivariate analyses particularly informative for detecting differences in neural pattern dynamics. To support further exploration, we have also shared the temporal progression of univariate BOLD responses on OpenNeuro (BOLD_coefficients_brain_animation_pe_SEM_bold.html and BOLD_coefficients_brain_animation_uncertainty_SEM_bold.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      We thank the reviewer for their thoughtful and supportive comments, particularly regarding the use of the computational model and the analysis approaches.

      Weaknesses:

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r = 0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      We appreciate this point. It is closely related to Reviewer 1's comment 2; please refer to our response above.

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      We also share the intuition that increases in uncertainty are early indicators of a boundary and that errors tend to occur later. If that were the case, we would expect prediction uncertainty to lead prediction error by some lag. We examined the lagged correlation between prediction uncertainty and prediction error, and the optimal lag was 0 for both the uncertainty-driven and error-driven models. This indicates that when prediction uncertainty rises, prediction error rises simultaneously.

      Author response image 1.
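      For readers who want to run this kind of check themselves, the sketch below is one minimal way (not necessarily the authors' procedure) to compute Pearson correlations between two time series across a range of lags and pick the lag with the highest correlation. The ±5-sample window, the function names, and the synthetic traces are hypothetical; the response does not specify the lag range or any preprocessing. By the convention used here, a positive optimal lag means the second series trails the first.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("constant input; correlation undefined")
    return cov / (va * vb) ** 0.5


def lagged_correlations(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag in [-max_lag, max_lag]."""
    n = len(x)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            out[lag] = pearson(x[:n - lag], y[lag:])
        else:
            out[lag] = pearson(x[-lag:], y[:n + lag])
    return out


def optimal_lag(x, y, max_lag):
    """Lag at which the correlation between x and y is highest."""
    corrs = lagged_correlations(x, y, max_lag)
    return max(corrs, key=corrs.get)


# Hypothetical traces: one co-varies with uncertainty at lag 0, one trails it by 2 samples.
rng = random.Random(0)
uncertainty = [rng.gauss(0.0, 1.0) for _ in range(200)]
error_same_time = [u + rng.gauss(0.0, 0.5) for u in uncertainty]
error_delayed = [0.0, 0.0] + uncertainty[:-2]

print(optimal_lag(uncertainty, error_same_time, 5))  # 0
print(optimal_lag(uncertainty, error_delayed, 5))    # 2
```

      An optimal lag of 0, as the authors report, is what the first synthetic case produces; a trace that genuinely trailed uncertainty would instead peak at a positive lag, as in the second case.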

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      We have added details about the distribution of event lengths. Specifically, we now report that the mean length of subjectively identified events was 21.4 seconds (median 22.2 s, SD 16.1 s). For model-derived boundaries, the average event lengths were 28.96 seconds for the uncertainty-driven model and 24.7 seconds for the error-driven model.

      "For each activity, a separate group of 30 participants had previously segmented each movie to identify fine-grained event boundaries (Bezdek et al., 2022). The mean event length was 21.4 s (median 22.2 s, SD 16.1 s). Mean event lengths for the uncertainty-driven and error-driven models were 28.96 s and 24.7 s, respectively (Nguyen et al., 2024)."

      (4) Given that there is an early onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase at 12 seconds before the boundaries occur? Or if there are other makers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase already 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty, which are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether these are general markers of error and uncertainty.

      We appreciate this point; it is similar to Reviewer 2’s comment 2. Please see our response to that comment above.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      Our analyses use ±20 s FIR windows, and the key effects we report include shifts ~12 s before boundaries in higher-order cortex and ~4.5 s before boundaries in dorsal attention/parietal areas. Region-dependent BOLD delays reported in the literature are much smaller (~1–2 s; Taylor et al., 2018) than the temporal structure we observe, making it unlikely that HRF lag alone explains our multi-second, region-specific progression.

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

      For page length purposes, we did not include all timepoints. We uploaded a brain animation of all timepoints and coefficients for each parcel to OpenNeuro (PATTERN_coefficients_brain_animation_human_fine_pattern.html and PATTERN_coefficients_lines_human_fine.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      References

      Taylor, A. J., Kim, J. H., & Ress, D. (2018). Characterization of the hemodynamic response function across the majority of human cerebral cortex. NeuroImage, 173, 322–331. https://doi.org/10.1016/j.neuroimage.2018.02.061

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Using genetics and microscopy approaches, Cabral et al. investigate how fission yeast regulates its length and width in response to osmotic, oxidative, or low glucose stress. Miller et al. have recently found that the cell cycle regulators Cdc25, Cdc13 and Cdr2 integrate information about cell volume, time and cell surface area into the cellular decision of when to divide. Cabral et al. now build on this work and test how disruption of these regulators affects cell size adaptation. They find that each stress condition shows a distinct dependence on the individual regulators, suggesting that the complex size control network enables optimized size adaptation for each condition. Overall, the manuscript is clear and the detailed methods ensure that the experiments can be replicated.

      Major comments:

      1.) It would be much easier to follow the authors' conclusions if, in addition to surface area to volume ratio, length and width, they would also plot cell volume at division in Figs. 1-4.

      AUTHOR RESPONSE: Due to space constraints in the main (and supplemental) figures, we focused on SA:Vol ratio together with cell length and width, which directly define cell geometry in rod-shaped fission yeast. Surface area and volume are derived from these measurements and can be misleading when considered alone, as similar surface area or volume values can arise from distinct combinations of length and width. The SA:Vol ratio therefore serves as a robust integrative metric for capturing coordinated changes in length and width that reshape cell geometry. We would be happy to include individual surface area and volume plots if requested.

      2.) To me, it seems that maybe even more than upon osmotic stress, the cdc13-2x strain differs qualitatively from WT in low glucose conditions, where the increased SA-V ratio is almost completely abolished.

      AUTHOR RESPONSE: We agree with the reviewer and have revised the manuscript text to point out this difference. The newly added text states: “Under low glucose, cdc13-2x cells also showed a WT-like response, decreasing length and increasing in SA:Vol ratio (Figures 3B-D). However, this SA:Vol increase was reduced compared to WT (1% vs 8.5%; Figures 1D and 3B), suggesting impaired geometric remodeling under glucose limitation.”

      3.) It is not entirely clear to me why two copies of Cdc13 would qualitatively affect the responses. Shouldn't the extra copy behave similarly to the endogenous one and therefore only lead to quantitative changes? Maybe the authors can discuss this more clearly or even test a strain in which Cdc13 function is qualitatively disrupted.

      AUTHOR RESPONSE: Increased Cdc13 protein concentration in cdc13-2x cells disrupts the typical time-scaling of Cdc13 protein. Consistent with this, cdc13-2x cells enter mitosis at a smaller cell size. We have modified the text to clarify this point. The new text states: “To assess the role of the Cdc13 time-sensing pathway, we disrupted Cdc13 protein abundance by creating a cdc13-2x strain carrying an additional copy of cdc13 integrated at an exogenous locus. cdc13-2x cells divided at a smaller size than WT, reflecting accelerated mitotic entry upon disruption of typical time-scaling of Cdc13 protein (Figure S1A).”

      4.) I don't see why the authors come to the conclusion that under osmotic stress cells would maximize cell volume. It leads to a decreased cell length, doesn't it?

      AUTHOR RESPONSE: WT cells under osmotic stress do decrease in length, but this is accompanied by an increase in cell width. Because width contributes disproportionately to cell volume in rod-shaped cells, this change results in a modest but reproducible reduction in the SA:Vol ratio relative to WT cells in control medium (Figure 1D). We note that the degree of this change under osmotic stress is small (−0.4%), although statistically significant (p …).

      Likewise, in Figure 2B, they interpret tiny changes in the SA/V. By my estimation, the difference between control and osmotic stress is only 2% (1.195/1.17), less than the wild-type case, which appears to be twice that (which is still pretty modest). The small amplitude of these changes is obscured by the fact that the graphs do not have a baseline at zero, which, as a matter of good data-presentation practice, they should.


      AUTHOR RESPONSE: We appreciate the reviewer’s distinction between statistical and biological significance and agree that this is an important point to clarify. We now note in the revised text that changes in SA:Vol ratio under osmotic stress are numerically small and should not be overinterpreted. Our revised text now states: “Under oxidative and osmotic stress, the SA:Vol ratio decreased, indicating greater cell volume expansion relative to surface area (Figure 1D). However, we note that the reduction in SA:Vol under osmotic stress, while statistically significant, was modest in magnitude (−0.4%).”

      Although small in absolute terms, even subtle geometric changes can be biologically meaningful in fission yeast due to the small size of these cells, where minor shifts in length or width translate into measurable differences in membrane area relative to cytoplasmic volume. Importantly, in Figure 2B, the key observation is not the magnitude of the change but its direction: cdc25-degron-DaMP cells exhibit a ~2% increase in SA:Vol ratio under osmotic stress, in contrast to the decrease observed in WT cells under the same condition. This opposite response reflects altered cell geometry and is supported by corresponding changes in cell length and width. We have revised the Results text to emphasize both the modest magnitude and the directional nature of these effects: “Under osmotic stress, cdc25-degron-DaMP cells exhibited a ~2% increase in SA:Vol ratio, opposite to the modest decrease observed in WT cells. This increase arose from increased cell length and reduced width (Figures 2B-D).”

      Regarding data presentation, because SA:Vol ratios vary over a narrow numerical range, setting the y-axis minimum to zero would compress the data and obscure all detectable differences. Instead, we have modified our SA:Vol ratio graphs in Figs. 1-4 to have consistent axis scaling across panels to accurately convey relative changes while maintaining visual clarity. We are happy to provide full data tables and statistical outputs upon request.

      I am also concerned about the use of manual measurement of width at a single point along the cell. This approach is very sensitive to the choice of width point and to non-cylindrical geometries, several of which are evident in the images presented. MATLAB will return the ??? as well as the length from a mask, but even better, one can more accurately calculate the surface area and volume by assuming rotational symmetry of the mask. Given that surface area and volume calculation need to be redone anyway, as discussed below, I encourage the authors to calculate them directly from the mask, instead of using the cylindrical assumption.

      AUTHOR RESPONSE: In initial experiments to calculate surface area and volume of fission yeast cells for prior work (Miller et al., 2023, Current Biology) we found that automated width measurements by MATLAB or ImageJ were inaccurate for a subset of cells leading to noisy cell surface area and volume values. Measuring cell width by hand and assuming that each cell in a given strain had the same cell radius (average of population) for calculation of cell surface area and volume gave more consistent results and recapitulated established conclusions regarding size control mechanisms.

      In this previous work and the current study, abnormally skinny or wide regions of a cell were avoided when drawing a line to measure the cell width by hand. For each strain and condition, an average cell width was determined per independent experiment and used for surface area and volume calculations. Additionally, previous analysis demonstrated that this approach yields results consistent with a rotation method derived directly from cell masks, which does not assume a cylindrical cell shape (Facchetti et al., 2019, Current Biology; Miller et al., 2023, Current Biology).

      To test the validity of our size measurements and confirm the robustness of our results in this study, we also calculated the surface area and volume of cells using this rotation method. We have added this information to our revised Methods section and added SA:Vol ratio graphs generated from the rotation size measurement to our revised Figure S1 E-J. Importantly, both approaches gave consistent results and supported the same conclusions.
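      The rotation approach can be illustrated as a solid of revolution. The following is a minimal sketch, not the implementation of Facchetti et al. (2019) or of this study. It assumes a binary mask of a single cell whose long axis runs along the image x-axis. Each column's foreground count is taken as the local diameter. Consecutive cross-sections are joined with conical frustums, and both ends are closed with flat discs. The function names and the pixel size are hypothetical.

```python
import math

def radius_profile_from_mask(mask, pixel_um):
    """Half-width (µm) of each non-empty column of a binary mask aligned along x."""
    radii = []
    for c in range(len(mask[0])):
        width_px = sum(1 for row in mask if row[c])
        if width_px:
            radii.append(0.5 * width_px * pixel_um)
    return radii

def solid_of_revolution(radii, dx):
    """Surface area and volume from rotating the radius profile about the long axis.

    Consecutive samples (spaced dx apart) are joined by conical frustums;
    the two ends are closed with flat discs.
    """
    area = math.pi * (radii[0] ** 2 + radii[-1] ** 2)      # end discs
    volume = 0.0
    for r1, r2 in zip(radii, radii[1:]):
        slant = math.hypot(r2 - r1, dx)
        area += math.pi * (r1 + r2) * slant                  # frustum side
        volume += math.pi * dx * (r1 * r1 + r1 * r2 + r2 * r2) / 3.0
    return area, volume
```

      Unlike the single-width approach, this sketch requires no cylindrical assumption. Its accuracy, however, depends on the mask being aligned with the image axis and segmented cleanly at the cell tips.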

      The authors also need to be more careful about their claims about size-dependent scaling. The concentrations of both Cdc13 and Cdc25 scale with size (perhaps indirectly, in the case of Cdc13), but Cdr2 does not. Cdr2 activity has been proposed to scale with size, and its density at cortical nodes has been reported to scale with size, although that claim has been challenged.

      AUTHOR RESPONSE: We have modified text in the Introduction and Results to address this point. Our revised text in the introduction states: “Recent work has shown that Cdk1 activation integrates size- and time-dependent inputs: the Wee1-inhibitory kinase Cdr2 cortical node density scales with cell surface area (Pan et al., 2014; Facchetti et al., 2019); Cdc25 nuclear accumulation scales with cell volume; and cyclin Cdc13 accumulates over time in the nucleus (Miller et al., 2023) (Figure 1B).” Our revised text in the results section states: “Cdr2 functions as a cortical scaffold that regulates Wee1 activity in relation to cell size, with Cdr2 nodal density reported to scale with cell surface area, enforcing a surface area threshold for mitotic entry (Pan et al., 2014; Allard et al., 2018; Facchetti et al., 2019; Sayyad and Pollard, 2022).”

      Even taking the authors approach at face value, there are observations that do not seem to make sense, which led me to realize that the wrong formulae were used to calculate surface area and volume.

      In Figure 1E,F, the KCl-treated cells get shorter and wider; surely, that should result in a lower SA/V ratio. However, as noted above, in Figure 1D, they are shown to have a similar ratio. As a sanity check, I eye-balled the numbers off of the figure (control: 14 µm x 3.6 µm and KCl: 11 µm x 3.8 µm) and calculated their surface area and volume using the formula for a capsule (i.e., a cylinder with hemispheric ends).

      SA = the surface area of the two hemispheres + the surface area of the cylinder in between = 4*pi*(width/2)^2 + pi*width*(length-width), where the length-width term converts the full length of the capsule (including the hemispheres) into the side length of the capsule (excluding the hemispheres).

      V = the volume of the two hemispheres + the volume of the cylinder in between = 4/3*pi*(width/2)^3 + pi*(width/2)^2*(length-width).

      I got SA/V ratios of around 2, which are way off from what is presented in Figure 1D, but my calculated ratio goes down in KCl, as expected, but not as reported.

      To make sure I was not doing something wrong, I was going to repeat my calculations with the formulae in Table 1, which made me realize both are incorrect. The stated formula for the cell surface area, 2*pi*RL, only represents the surface area of the cylindrical side of the cells, not their hemispherical ends. And it is not even the correct formula for the surface area of the side, because that calls for L to be the length of the side (without the hemispherical ends), not the length of the cell (which includes the hemispherical ends). L here is stated to be cell length (which is what is normally measured in the field, and which is consistent with the reported length of control cells in Figure 1E being 14 µm). The formula for the volume of a capsule in the form used in Table 1 (volume of a cylinder of length L minus the volume excluded from the hemispherical ends) is pi*R^2*L - (8-(4/3*pi))*R^3.

      Given these problems, I think I spent too much time thinking about the rest of the paper, because all of the calculations, and perhaps their interpretations, need to be redone.

      AUTHOR RESPONSE: The surface area and volume equations for a cylinder with hemispherical ends used in our study and listed in our table are correct and widely used in other work with fission yeast cells (Navarro and Nurse, 2012; Pan et al., 2014; Facchetti et al., 2019; BayBay et al., 2020; and Miller et al., 2023). We write our equations with variables for cell length and radius because these are biologically relevant and measured parameters for fission yeast cells. Cell length (L) refers to the total tip-to-tip length of the cell, including the hemispherical ends, and radius (R) refers to half the measured cell width. We have revised the Methods section to clarify this definition and avoid ambiguity (please see the Methods section “Cell geometry measurements”).

      Additionally, SA or Vol calculations were performed using the length of each individual cell and the average cell radius of the population. We did not use mean cell length of the population for our calculations like the reviewer assumed in their “sanity check” above. Please see methods section “Cell geometry measurements”. We hope that these clarifications and text revisions improve transparency and reproducibility.
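      The capsule formulas discussed in this exchange can be checked numerically. The following is a minimal sketch. It assumes, as stated in the response, that L is the total tip-to-tip length including the hemispherical ends and that R is half the measured width. Summing the two hemispheres and the straight side of length L − 2R gives the same totals as the compact forms SA = 2πRL and V = πR²L − (2/3)πR³. The last function follows the per-cell procedure described in the response: each cell's own length combined with the population-mean radius. Function names and inputs are illustrative. This is not the authors' analysis code, and Table 1's exact volume expression is not reproduced here.

```python
import math
from statistics import mean

def capsule_area(length, radius):
    """Surface area of a cylinder with hemispherical ends; length is tip to tip."""
    return 2.0 * math.pi * radius * length

def capsule_volume(length, radius):
    """pi*R^2*L for a cylinder of full length L, minus the (2/3)*pi*R^3 removed by rounding the ends."""
    return math.pi * radius ** 2 * length - (2.0 / 3.0) * math.pi * radius ** 3

def capsule_parts(length, radius):
    """Same quantities summed piecewise: two hemispheres plus the straight side."""
    side = length - 2.0 * radius
    area = 4.0 * math.pi * radius ** 2 + 2.0 * math.pi * radius * side
    volume = (4.0 / 3.0) * math.pi * radius ** 3 + math.pi * radius ** 2 * side
    return area, volume

def sa_vol_ratios(lengths_um, widths_um):
    """Per-cell SA:Vol using each cell's length and the population-mean radius."""
    r = mean(widths_um) / 2.0
    return [capsule_area(L, r) / capsule_volume(L, r) for L in lengths_um]
```

      With a fixed radius, the ratio reduces to 2 / (R − 2R²/(3L)), so longer cells of the same width have a slightly lower SA:Vol.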

      Minor Points:

      Strains should be identified by strain number in the text and figure legends.

      AUTHOR RESPONSE: For clarity and readability, we refer to strains by genotype in the main text and figure legends, which we believe is more informative for readers than strain numbers. All strain numbers corresponding to each genotype are provided in Table S1, ensuring traceability and reproducibility without compromising clarity in data presentation.

      In the Introduction, "Most cell control their size" should be "Most eukaryotic cell control their size".


      AUTHOR RESPONSE: The text has been corrected as suggested.

      Reviewer #2 (Significance (Required)):

      Nothing to add.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary

      This manuscript reports that fission yeast cells exhibit distinct cell size and geometry when exposed to osmotic, oxidative, or low-glucose stress. Based on quantitative measurements of cell length and width, the authors propose that different stress conditions trigger specific 'geometric adaptation' patterns, suggesting that cell size homeostasis is flexibly modulated depending on environmental cues. The study provides phenotypic evidence that multiple environmental stresses lead to distinct outcomes in the balance between cell surface area and volume, which the authors interpret as stress-specific modes of size control.

      Major comments

      1) The authors define the 48-hour time point as the 'long-term response', but no justification is provided for why 48 hours represents a physiologically relevant adaptation phase. It is unclear whether the size-control mode has stabilized by that time, or whether it may continue to change afterward. At minimum, the authors should provide a rationale (e.g., growth recovery dynamics, transcriptional adaptation plateau, or pilot time-course observations) to demonstrate that 48 hours corresponds to the steady-state adaptive phase rather than an arbitrarily selected time point.

      AUTHOR RESPONSE: We thank the reviewer for this important point and agree that the definition of the long-term response should be clarified. We have addressed this with new experiments and revised text. We now incorporate growth curve data and doubling time analyses for all yeast strains grown under control and stress conditions (See new Figure S3). These analyses show that following an initial transient stress-induced cell cycle delay, growth rates stabilize well before 48 hours. Notably, the slowest growth rate observed was in 1M KCl, with a doubling time of ~4 hours across all yeast strains tested. Thus, by 48 hours, cells in this condition have undergone more than 12 generations of growth, while cells in all other conditions with shorter doubling times have undergone even more divisions. So by allowing cells to grow for 48 hours prior to imaging, we are capturing cells that have resumed sustained cell cycle progression following transient stress-induced cell cycle delays. Because cell size control is tightly linked to the cell cycle, we define 48 hours as a physiologically relevant time point where cells have adapted to stress conditions.

      Our revised methods now state: “Cultures were incubated at 25°C while shaking at 180 rpm for 48 h prior to imaging. This time point was chosen to ensure that cells had progressed beyond the initial transient stress response and reached a stable, condition-specific growth state, as confirmed by growth curve and doubling time analyses showing stabilization well before 48 h (Figure S3), including in the slowest growing condition (1 M KCl; doubling time ~4 h).”
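      The generation count in this response follows from the doubling time: the number of generations is the elapsed time divided by the doubling time, so 48 h at ~4 h per doubling corresponds to about 12 generations. The following is a minimal sketch of one common way to estimate doubling time from a growth curve: a least-squares fit of ln(OD) against time over the exponential phase. The function names and OD readings are hypothetical, and this is not necessarily how Figure S3 was computed.

```python
import math

def doubling_time(times_h, od_values):
    """Doubling time (h) from an ordinary least-squares fit of ln(OD) versus time.

    Assumes every point lies in exponential phase; lag and saturation points
    should be excluded before calling this hypothetical helper.
    """
    y = [math.log(od) for od in od_values]
    n = len(times_h)
    t_bar = sum(times_h) / n
    y_bar = sum(y) / n
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(times_h, y))
    sxx = sum((t - t_bar) ** 2 for t in times_h)
    growth_rate = sxy / sxx                  # specific growth rate, per hour
    return math.log(2) / growth_rate

def generations(duration_h, td_h):
    """Number of doublings completed in duration_h at doubling time td_h."""
    return duration_h / td_h
```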

      2) Related to the above comment, the authors propose that different stresses lead to distinct cell size adaptations, yet the rationale for the chosen stress intensities and exposure times is insufficiently described. It remains unclear whether the osmotic, oxidative, and low-glucose conditions used here induce comparable levels of cellular stress. Dose-response and time-course analyses would greatly strengthen the conclusions. Without such analyses, it is difficult to support the interpretation that geometry modulation represents a direct adaptive response.

      AUTHOR RESPONSE: We selected the specific stress conditions based on previously published work showing that these doses elicit robust responses while preserving overall cell viability and the capacity for recovery. We note that osmotic, oxidative, and low glucose conditions perturb fundamentally different cellular systems (turgor pressure and cell wall mechanics, redox balance, and metabolism etc.) and therefore do not generate directly comparable levels of cellular stress in a quantitative sense. Our goal was not to equalize stress intensity across conditions, but to examine how cells change their geometry in response to distinct classes of stressors.

      We have clarified the rationale for specific stress conditions in the revised methods: “These stress intensities were selected based on prior studies demonstrating robust cellular responses while preserving cell viability and the capacity for recovery (Fantes and Nurse, 1977; Shiozaki and Russell, 1995; Degols et al., 1996; López-Avilés et al., 2008; Sansó et al., 2008; Saitoh et al., 2015; Salat-Canela et al., 2021; Bertaux et al., 2023).”

      3) The authors describe stress-induced size changes as an 'adaptive' response. While this is an appealing hypothesis, the presented data do not demonstrate that the change in cell size itself confers a fitness advantage. Evidence showing that blocking the size change reduces stress survival, or that the altered size improves growth recovery, would be required to support this claim. Without such data, the use of the term 'geometric adaptation' seems overstated.

      AUTHOR RESPONSE: We have revised the text to remove the term “adaptive” and now describe stress-induced size changes in descriptive terms. As discussed further in response to Comment 4, new growth curve and doubling time analyses show that defects in surface area or volume expansion do not uniformly impair growth or survival over the stress exposure examined here, reinforcing the decision to avoid fitness-based language.

      4) The authors conclude that mutants exhibit no major defects in growth or viability during 48-hour stress exposure based on comparable septation index values (Fig. S2). However, septation index alone does not fully capture growth performance or cell-cycle progression and is not sufficient to support claims regarding fitness or robustness of proliferation. If the authors intend to make statements about 'growth', 'viability', or 'cell-cycle progression', additional quantitative measures (e.g., growth curves, doubling time, colony-forming units, or microcolony growth measurements) would be necessary. Alternatively, the claims should be toned down to align with the measurements currently provided.

      AUTHOR RESPONSE: We have addressed this concern with new experiments and revised text. In addition to septation index measurements (now analyzed using chi-square tests of proportions; Figure S2), we performed growth curve experiments and doubling time analyses for all genotypes under control and stress conditions (new Figure S3). These additional data show that growth rates are largely comparable across genotypes in control, oxidative, and low-glucose conditions, with more pronounced genotype-dependent differences emerging under osmotic stress. Defects in surface area or volume expansion did not uniformly correspond to impaired population growth, indicating that geometric remodeling is not strictly required for proliferation over the 48-hour stress exposure examined here. We have refined our conclusion to emphasize that defects in surface area or volume expansion do not uniformly impair growth or survival. See revised Results text under the heading “Defects in surface area or volume expansion do not uniformly compromise growth or survival”.
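      A septation index is a proportion (septated cells divided by cells counted). Two conditions can therefore be compared with a chi-square test on the 2×2 table of septated versus non-septated counts. The following is a minimal sketch that assumes a Pearson test without continuity correction. The function name and counts are hypothetical, and the exact test settings used for Figure S2 are not stated here.

```python
import math

def chi_square_2x2(septated_a, total_a, septated_b, total_b):
    """Pearson chi-square test (1 df, no continuity correction) for two septation indices.

    Returns (statistic, p_value).
    """
    table = [[septated_a, total_a - septated_a],
             [septated_b, total_b - septated_b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

      Comparing many genotype and condition pairs would call for a multiple-comparison correction, or for a single larger contingency table per condition.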

      5) Related to the above comment, the manuscript does not adequately rule out the possibility that the decreased division size simply results from slower growth or delayed cell-cycle progression rather than a shift in the size-control mechanism. Measurements and normalizations of growth rate are required; without them, the interpretation remains speculative.

      AUTHOR RESPONSE: We agree that changes in growth rate or altered cell cycle timing are important to consider. We have revised our text: “Changes in growth rate or cell cycle progression under stress may influence division size by altering mitotic regulator accumulation. Future studies measuring mitotic regulator dynamics alongside growth rates will be needed to distinguish direct changes in size control mechanisms from growth- or timing-dependent effects.”

      6) Regarding the phenotypes of wee1-2x cells, it is interesting that they increase the SA:Vol ratio under all stress conditions and show phenotypes distinct from cdr2Δ cells. From these observations, the authors claim that Cdr2 and Wee1 function as a surface-area-sensing module that complements the volume-sensing and time-sensing pathways to maintain geometric homeostasis. To support this interpretation, the authors could consider additional experiments, such as analyzing cdr2Δ + wee1-2x cells under the same stress conditions. Such data would test whether increased Wee1 can rescue or modify the cdr2Δ phenotype, providing functional evidence for the proposed Cdr2-Wee1-Cdk1 regulatory relationship. Measurements of cell length, width, SA:Vol ratio, and, if feasible, Cdk1 activity markers in the strain would greatly strengthen the mechanistic claims.

      AUTHOR RESPONSE: We thank the reviewer for this insightful suggestion. While analysis of a cdr2Δ wee1-2x strain could provide additional mechanistic detail, such experiments address a distinct question beyond the scope of our current study, which focuses on how cell geometry changes under different stress conditions in cells with perturbed surface area-, volume-, or time-sensing pathways. Our conclusions regarding a surface area-sensing role for Cdr2-Wee1 signaling are based on previous studies (Pan et al., 2014; Facchetti et al., 2019; Miller et al., 2023) and the cell geometry phenotypes we observe of cdr2Δ and wee1-2x cells under stress conditions.

      Minor comments

      1) The manuscript focuses on adaptation through changes in the surface-to-volume ratio; however, only the ratio is shown. Presenting the underlying values of surface area and volume would clarify which geometric parameter primarily contributes to the observed changes.

      AUTHOR RESPONSE: Please see our response to Reviewer 1 major comment 1.

      2) Statistical analysis for Fig. S2 should be provided.

      AUTHOR RESPONSE: We have completed this. See revised Figure S2 and methods.

      3) The paper by Kellogg and Levin (2022) is missing from the reference list.

      AUTHOR RESPONSE: Thank you for catching this. This reference has now been added.

      Referees cross-commenting

      After reading the other reviewers' reports, I recognize that their focal points differ, but they appear sequential rather than contradictory.

      Reviewer 2 raises concerns regarding the surface area/volume calculations, which, if incorrect, would influence many of the quantitative conclusions. I agree that confirming the validity of these calculations (and recalculating if necessary) should be the top priority before evaluating the biological interpretations.

      Reviewer 1 raises more mechanistic biological questions. These are certainly important, but in my view they depend on the robustness of the quantitative analysis highlighted by Reviewer 2.

      Therefore, I regard the reports as complementary rather than conflicting. Once the analytical issue pointed out by Reviewer 2 is resolved, the field will be in a better position to assess the significance of the mechanistic points raised by Reviewer 1 (as well as those in my own report).

      Reviewer #3 (Significance (Required)):

      General assessment

      One of the major strengths of this manuscript is its quantitative, side-by-side comparison of multiple environmental stresses under a unified experimental and analytical framework. The authors provide well-controlled morphometric measurements, allowing direct comparison of geometry changes that would otherwise be difficult to evaluate across studies. The observation that different stress types generate distinct geometric outcomes is particularly intriguing and has the potential to stimulate new conceptual thinking in the field of size control. However, the strength of the conceptual conclusion is currently limited by several aspects of the experimental design and interpretation. In particular, it remains unclear whether the observed geometry changes represent active adaptive responses rather than non-specific consequences of prolonged or strong stress exposure. Demonstrating whether geometry remodeling provides a fitness advantage, clarifying whether the changes reach a steady state rather than reflecting slow drift over time, or identifying upstream stress pathways that govern the response would substantially strengthen the conceptual advance. Even if additional mechanistic or fitness-related data cannot be added, refining the interpretation so that it remains aligned with the present evidence will enhance the clarity and impact of the study.

      Advance

      Previous studies, including the 2023 publication by the James B. Moseley group, established that fission yeast integrates distinct size-control pathways related to surface area, volume, and time under normal growth conditions. The present manuscript extends this line of work to stressed environments and argues that each stress condition elicits a distinct size-control pattern. To our knowledge, a systematic comparison of cell geometry across multiple stress types in the context of size-control pathways has not been reported, and this represents a potentially valuable conceptual advance. The advance is primarily phenomenological and conceptual rather than mechanistic: the work presents new correlations between stress types and geometry but does not yet elucidate the pathways governing these responses or demonstrate a functional advantage. With additional evidence, or with qualifiers ensuring that claims match the current data, the study could make an important contribution to understanding how cells integrate environmental cues into size-control strategies.

      Audience

      Although the primary audience consists of researchers in the fields of cell growth, cell-cycle control, and stress responses in yeast, the conceptual contribution may interest broader fields such as growth homeostasis, metabolic adaptation, and pathological cell size changes in higher eukaryotes. Beyond yeast biology, the modular view of size regulation proposed here may inspire new investigations in stem cell biology, cancer research, and biotechnology where environmental adaptation and cell size are closely linked.

      Expertise: nuclear morphology; cell morphology; cell growth; cell cycle; cytoskeleton

    1. 8.1. Sources of Social Media Data

      Social media platforms collect various types of data on their users. Some data is directly provided to the platform by the users. Platforms may ask users for information like their email address, name, profile picture, interests, and friends.

      Platforms also collect information on how users interact with the site. They might collect information like the following (they don’t necessarily collect all of this, but they might): when users are logged on and logged off, who users interact with, what users click on, what posts users pause over, where users are located, and what users send in direct messages to each other.

      Online advertisers can see what pages their ads are being requested on, and track users across those sites. So, if an advertiser sees their ad is being displayed on an Amazon page for shoes, then the advertiser can start showing shoe ads to that same user when they go to another website. Additionally, social media might collect information about non-users, such as when a user posts a picture of themselves with a friend who doesn’t have an account, or when a user shares their phone contact list with a social media site, including contacts who don’t have accounts (Facebook does this). Social media platforms then use “data mining” to search through all this data to try to learn more about their users, find patterns of behavior, and in the end, make more money.

      This section made me realize how much data social media platforms collect, even beyond what we intentionally share. I used to think they only stored basic info like my name or email, but they also track behaviors like what I click on, how long I look at posts, and even where I go online. It feels a little uncomfortable because many of these things happen without us noticing. It shows that our online actions can reveal a lot about us, not just what we directly say. This makes me think we should be more careful about privacy and what platforms are allowed to collect.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer 1

      Minor

      The main substance of my previous comment I suppose targeted a deeper issue - namely whether such a result is reflecting a resolution to a 'neural prediction' puzzle or a 'perceptual prediction' puzzle. Of course, these results tell us a great deal about a potential resolution for how dampening and sharpening might co-exist in the brain - but in the absence of corresponding perceptual effects (or a lack of correlation between neural and perceptual variables - as outlined in this revision) I do wonder if any claims about implications for perception might need moderation or caveating. To be honest, I don't think the authors *need* to make any more changes along these lines for this paper to be acceptable - it is more an issue they might wish to consider themselves when contextualizing their findings.

      Thank you for the thoughtful comment. We have now added a caveat to the relevant section of the discussion to make it clearer that we are discussing neural results, not perceptual results (p.20, lines 378-379).

      I am also happy with the changes that the authors have made justifying which claims can and cannot be made based on a statistical decoding test against 'chance' in a single condition using t-tests. I was perhaps a little unclear when I spoke about 'comparisons against 0' in my original review, when the key issue (as the authors have intuited!) is about comparisons against 'chance' (where, e.g., 0% decoding above chance is the same thing as 'chance'!). The authors are of course correct in the amendment they have made on p.29 to make clear this is a 'fixed effects analysis', though I still worry this could be a little cryptic for the average reader. I am not suggesting that the authors run more analyses, or revise any conclusions, but I think it would be more transparent if a note was added along the lines of "while the fixed effects approach (one-sample t-test) enables us to establish whether some consistent informative patterns are detectable in these particular subjects, the results from our paired t-tests support inference to the wider population".

      This sentence has been added for increased transparency (p. 27, lines 544-547).

      Reviewer 3

      Major

      (1) In the previous round of comments, I noted that: "I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, more than a significant effect in one time bin and the absence of an effect in other bins. Instead, to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible". The authors responded: "we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focused our further analyses on bins 1-2". However, I do not see how this new analysis addresses the concern that the conclusion highlights differences in decoding performance between bins 1 and 2, yet no contrast between these bins is performed. While I appreciate the addition of the new model, in my current understanding it does not solve the problem I raised. I still believe that if the authors wish to conclude that an effect differs between two bins, they must contrast these directly and/or use a different appropriate analysis approach.

      Relatedly, the logarithmic model fitting and how it justifies the focus on analysis bin 1-2 needs to be explained better, especially the rationale of the analysis, the choice of parameters (e.g., why logarithmic, why change of logarithmic fit < 0.1% as criterion, etc), and why certain inferences follow from this analysis. Also, the reporting of the associated results seems rather sparse in the current iteration of the manuscript.

      We thank the reviewer for this important point. Following your suggestion, we conducted additional post-hoc tests directly comparing the first and second bins. We found significant differences between bins in the invalid trials, but not in the valid trials, suggesting that sharpening/dampening effects are condition-specific. This is discussed in the manuscript on p.14, lines 268-271; p.15, lines 280-284; p.20, lines 382-386.
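      As an illustration of the kind of direct contrast the reviewer requested, the sketch below computes a paired t-statistic on per-participant differences between bin 1 and bin 2, rather than setting a significant effect in one bin against a non-significant effect in the other. It is a minimal sketch under stated assumptions, not the authors' analysis code: the `paired_t` helper and the decoding-benefit values are hypothetical. A p-value would come from a Student's t distribution with n-1 degrees of freedom; `scipy.stats.ttest_rel` returns the same statistic together with its p-value.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t-statistic and degrees of freedom for matched samples x, y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length with at least 2 values")
    # Per-participant differences between the two bins.
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n)), evaluated against t with n - 1 df.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-participant decoding benefit in the invalid condition.
bin1 = [0.04, 0.06, 0.03, 0.05, 0.07, 0.02]
bin2 = [0.01, 0.02, 0.00, 0.03, 0.01, 0.01]

t, df = paired_t(bin1, bin2)
print(f"bin 1 vs bin 2: t({df}) = {t:.2f}")
```

      The point of the design is that the inference rests on the distribution of within-participant differences, so a claim that an effect "changes between bins" is tested directly instead of being inferred from two separate tests against chance.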

      A logarithmic analysis was chosen because learning is usually found to be a nonlinear process: learning effects occur rapidly before stabilising relatively early, as seen in Fig. 2D. This is consistent with other research that found that logarithmic fits efficiently describe learning curves in statistical learning (Kang et al., 2023; Siegelman et al., 2018; Choi et al., 2020). Using a change in the logarithmic fit of <0.1% as the criterion ensures that virtually no further learning took place after that point, allowing us to focus our analysis on learning effects as they developed and providing a more accurate model of representational change. This is explained in the manuscript on p.13, lines 250-251; p.27-28, lines 557-563.
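      The plateau criterion described above can be sketched as follows, assuming that the model is accuracy = a + b·ln(trial), fitted by ordinary least squares, and that "change of the logarithmic fit < 0.1%" means the fitted value changes by less than 0.1% of itself from one trial to the next. Both that reading of the criterion and the helper names (`fit_log`, `plateau_trial`) are assumptions for illustration; the manuscript's exact definition may differ. The data are synthetic and noise-free.

```python
import math


def fit_log(trials, values):
    """Ordinary least-squares fit of values = a + b * ln(trial), trials >= 1."""
    xs = [math.log(t) for t in trials]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def plateau_trial(a, b, max_trial, tol=0.001):
    """First trial t where the fit changes by less than tol (relative to its
    value at t) between t and t + 1; None if not reached before max_trial."""

    def fitted(t):
        return a + b * math.log(t)

    for t in range(1, max_trial):
        if abs(fitted(t + 1) - fitted(t)) < tol * abs(fitted(t)):
            return t
    return None


# Synthetic, noise-free learning curve (hypothetical values).
trials = list(range(1, 201))
accuracy = [0.55 + 0.02 * math.log(t) for t in trials]

a, b = fit_log(trials, accuracy)
t_star = plateau_trial(a, b, max_trial=len(trials))
print(f"a={a:.3f}, b={b:.3f}, plateau from trial {t_star}")
```

      Because the slope of a + b·ln(t) falls off as b/t, any fixed relative threshold is eventually met, which is what makes such a criterion usable as a cut-off for restricting analyses to the early bins.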

      (2) A critical point the authors raise is that they investigate the buildup of expectations during training. They go on to show that the dampening effect disappears quickly, concluding: "the decoding benefit of invalid predictions [...] disappeared after approximately 15 minutes (or 50 trials per condition)". Maybe the authors can correct me, but my best understanding is as follows: Each bin has 50 trials per condition. The 2:1 condition has 4 leading images; this would mean ~12 trials per leading stimulus, 25% of which are unexpected, so ~9 expected trials per pair. Bin 1 represents the first time the participants see the associations. Therefore, the conclusion is that participants learn the associations so rapidly that ~9 expected trials per pair suffice to not only learn the expectations (in a probabilistic context) but learn them sufficiently well such that they result in a significant decoding difference in that same bin. If so, this would seem surprisingly fast, given that participants learn by means of incidental statistical learning (i.e. they were not informed about the statistical regularities). I acknowledge that we do not know how quickly the dampening/sharpening effects develop; however, surprising results should be accompanied with a critical evaluation and exceptionally strong evidence (see point 1). Consider for example the following alternative account to explain these results. Category pairs were fixed across and within participants, i.e. the same leading image categories always predicted the same trailing image categories for all participants. Some category pairings will necessarily result in a larger representational overlap (i.e., visual similarity, etc.) and hence differences in decoding accuracy due to adaptation and related effects. For example, house → barn will result in a different decoding performance compared to coffee cup → barn, simply due to the larger visual and semantic similarity between house and barn compared to coffee cup and barn. These effects should occur upon first stimulus presentation, independent of statistical learning, and may attenuate over time, e.g., due to increasing familiarity with the categories (i.e., an overall attenuation leading to smaller between-condition differences) or pairs.

      We apologise for the confusion: there are 50 expected trials per bin per condition. The trial breakdown is as follows. Each participant completed 1728 trials, split equally across 3 mappings (two 2:1 maps and one 1:2 map), giving 1152 trials in the 2:1 mappings. Stimuli were expected in 75% of trials (864), leaving 216 per bin across the four bins, and 54 per leading image in each bin. We have clarified this in the manuscript (p.14, line 267; p.15, line 280). This is in line with similar studies in the field (e.g. Han et al., 2019).
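      The trial breakdown in this reply can be checked with a few lines of arithmetic. The sketch below assumes four bins (implied by 864 / 216) and four leading images across the 2:1 mappings (as stated by Reviewer 3); the constant names are illustrative only.

```python
# Trial counts stated in the reply. N_BINS and N_LEADING_IMAGES are inferred
# assumptions (864 / 216 = 4 bins; 4 leading images per Reviewer 3).
TOTAL_TRIALS = 1728
N_MAPPINGS = 3                     # two 2:1 maps and one 1:2 map
N_2TO1_MAPS = 2
EXPECTED_NUM, EXPECTED_DEN = 3, 4  # 75% expected trials, kept as integers
N_BINS = 4
N_LEADING_IMAGES = 4

per_mapping = TOTAL_TRIALS // N_MAPPINGS
two_to_one = per_mapping * N_2TO1_MAPS
expected = two_to_one * EXPECTED_NUM // EXPECTED_DEN
per_bin = expected // N_BINS
per_leading_per_bin = per_bin // N_LEADING_IMAGES

print(per_mapping, two_to_one, expected, per_bin, per_leading_per_bin)
```

      Keeping the 75% proportion as an integer ratio avoids floating-point rounding in the trial counts.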

      (3) In response to my previous comment asking why the authors think their study may have found different results compared to multiple previous studies (e.g. Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011), particularly the sharpening to dampening switch, the authors emphasize the use of non-repeated stimuli (no repetition suppression and no familiarity confound) in their design. However, I fail to see how familiarity or RS could account for the absence of sharpening/dampening inversion in previous studies.

      First, if the authors' argument is about stimulus novelty and familiarity as described by Feuerriegel et al., 2021, I believe this point does not apply to the cited studies. Feuerriegel et al., 2021 note: "Relative stimulus novelty can be an important confound in situations where expected stimulus identities are presented often within an experiment, but neutral or surprising stimuli are presented only rarely", which indeed is a critical confound. However, none of the studies (Han et al., 2019; Richter et al., 2018; Kumar et al., 2017; Meyer and Olson, 2011) contained this confound, because all stimuli served as expected and unexpected stimuli, with the expectation status solely determined by the preceding cue. Thus, participants were equally familiar with the images across expectation conditions.

      Second, for a similar reason, the authors' argument for RS accounting for the different results does not hold either, in my opinion. Again, as Feuerriegel et al. 2021 correctly point out: "Adaptation-related effects can mimic ES when the expected stimuli are a repetition of the last-seen stimulus or have been encountered more recently than stimuli in neutral expectation conditions." However, it is critical to consider the precise design of previous studies. Take again the example of Han et al., 2019, Kumar et al., 2017, and Meyer and Olson, 2011: to my knowledge, none of these studies contained manipulations that would result in a more frequent or recent repetition of any specific stimulus in the expected compared to the unexpected condition. The crucial manipulation in all these previous studies is not that a single stimulus or stimulus feature (which could be subject to familiarity or RS) determines the expectation status, but rather the transitional probability (i.e. cue-stimulus pairing) of a particular stimulus given the cue. Therefore, unless I am missing something critical, simple RS seems unlikely to differ between expectation conditions in the previous studies and hence seems an implausible account of the differences in results compared to the current study.

      Moreover, studies cited by the authors (e.g. Todorovic & de Lange, 2012) showed that RS and ES are separable in time, again making me wonder how avoiding stimulus repetition should account for the difference in the present study compared to previous ones. I am happy to be corrected in my understanding, but with the currently provided arguments by the authors I do not see how RS and familiarity can account for the discrepancy in results.

      The reviewer is correct in that the studies cited (Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011) ensure that participants are equally familiar with the images across expectation conditions. Where the present study differs is that participants are not familiar with individual exemplars at all. Han et al., 2019 used a pool of 30 individual images, and subjects underwent exposure sessions lasting two hours each daily for 34 days prior to testing. Kumar et al., 2017 used a pool of 12 images with subjects being exposed to each sequential pair 816 times over the course of the training period. Meyer & Olson, 2011 used pure tones at five different pitch levels. While familiarity of stimuli was controlled for in these studies in the sense that familiarity was constant across conditions, novelty was not controlled for. The present study uses a pool of ~3500 images, none of which are repeated across trials.

      Feuerriegel et al., 2021 also point out: “There are also effects of adaptation that are dependent on the recent stimulation history extending beyond the last encountered stimulus and long-lag repetition effects that occur when the first and second presentation of a stimulus is separated by tens or even hundreds of intervening images”. Bearing this in mind, and given the very small pools of stimuli used by Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011, it stands to reason that these studies may still have built-in but unaccounted-for effects relating to the repetition of exemplars. Thus, our avoidance of those possible confounds, in addition to forgoing any prior training, may elicit differing results. Furthermore, as pointed out by Walsh et al. 2020, methodological heterogeneity (such as subject training) can produce contrasting results, as PP makes divergent predictions regarding the properties of prediction error given different permutations of variables such as training, transitional probabilities, and conditional probabilities. In our case, the use of differing methodology was intentional. These issues have been discussed in more detail on p.5, lines 112-115; p.19, lines 368-377; p.20, lines 378-379.

      Minor

      (1) The authors note in their reply to my previous questions that: "As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021)". This might be a lack of understanding on my side, but how are concerns about the reliability of ES, as outlined by Feuerriegel et al. (2021), an argument for restricting analyses to 1 EEG channel (Oz)? Could one not argue equally well that precisely because of these concerns we should be less selective and instead average across multiple (occipital) channels to improve the reliability of results?

      The reviewer is correct in suggesting that a cluster of occipital electrodes may be more reliable than reporting one single electrode. We have amended the analysis to examine electrodes Oz, O1, and O2 (p.9, lines 187-188; p.11, lines 197-201).

      (2) The authors provide a github link for the dataset and code. However, I doubt that github is a suitable location to share EEG data (which at present I also cannot find linked in the github repo). Do the authors plan to share the EEG data and if so where?

      Thank you for bringing this to my attention. EEG data has now been uploaded at osf.io/x7ydf and linked to the github repository (p.28, lines 569-570).

      (3) The figure text could benefit from additional information; e.g. Fig.1C and Fig.3 do not clarify what the asterisk indicates; p < ? with or without multiple comparison correction?

      Thank you for pointing out this oversight, the figure texts have been amended (p. 9, line 168; p.16, line 289).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We sincerely appreciate the feedback, attention to detail and timeliness of the referees for our manuscript. Below, we provide a point-by-point response to all comments from the referees, detailing the changes we have already made, and those that are in progress. Referees' comments will appear in bold text, while our responses will be unbolded. Any text quoted directly from the manuscript will be italicised and contained within "quotation marks". Additionally, we have grouped all comments into four categories (structural changes, minor text changes, experimental changes, figure changes), and comments are numbered 1-n in each of these categories. Please note: this response to reviewers' comments included some images that cannot be embedded in this text-only section.

      1. General Statements

      We appreciate the overall highly positive and enthusiastic comments from all reviewers, who clearly appreciated the technical difficulty of this study, and noted amongst other things that this study represents "a major contribution to the future advancement of oocyst-sporozoite biology" and the development of the segmentation score for oocysts as a "major advance[ment]". We apologise for the omission of line numbers on the document sent to reviewers; we removed these for the bioRxiv submission without considering that this PDF would be transferred across to Review Commons.

      We have responded to all reviewers comments through a variety of text changes, experimental inclusions, or direct query response. Significant changes to the manuscript since initial submission are as follows:

      1. Refinement of rhoptry biogenesis model: Reviewers requested more detail around the content of the AORs, which we had previously suggested were a vehicle for rhoptry biogenesis as we saw they carried the rhoptry neck protein RON4. We first attempted to address this using antibodies against rhoptry bulb proteins but were unsuccessful. We then developed a *P. berghei* line where the rhoptry bulb protein RhopH3 was GFP-tagged. Using this parasite line, we observed that the earliest rhoptry-like structure, which we had previously interpreted as an AOR, contained RhopH3. By contrast, RhopH3 was absent from AORs. Reflecting these observations, we have renamed this initial structure the 'pre-rhoptry' and suggested a model for rhoptry biogenesis where rhoptry neck cargo are trafficked via the AOR but rhoptry bulb cargo are trafficked by small vesicles that move along the rootlet fibre (previously observed by EM).
      2. Measurement of rhoptry neck vs bulb: While not directly suggested by the reviewers, we have also included an analysis that estimates the proportion of the sporozoite rhoptry that represents the rhoptry neck. In contrast to merozoites, which we show are overwhelmingly represented by the rhoptry bulb, the vast majority of the sporozoite rhoptry represents the rhoptry neck.
      3. Measurement of subpellicular microtubules: One reviewer asked if we could measure the length of subpellicular microtubules where we had previously observed that they were longer on one side of the sporozoite than the other. We have now provided absolute and relative (% sporozoite length) length measurements for these subpellicular microtubules and also calculated the proportion of the microtubule that is polyglutamylated.
      4. More detailed analysis of RON11cKD rhoptries: Multiple comments suggested a more detailed analysis of the rhoptries that were formed/not formed in RON11cKD. We have included an updated analysis that shows the relative position of these rhoptries in sporozoites.

      2. Point-by-point description of the revisions

      Reviewer #1

      Minor text changes (Reviewer #1)

      1. __Text on page 12 could be condensed to highlight the new data of ron4 staining of the AOR. __

      We agree with the reviewer that this is a reasonable suggestion. After obtaining additional data on the contents of the AOR (as described in General Statements #1), this section has been significantly rewritten to highlight these findings.

      2. __Add reference on page 3 after 'disrupted parasites' __

      This sentence has been rewritten slightly with some references included and now reads: "Most data on these processes comes from electron microscopy studies 6-8, with relatively few functional reports on gene deleted or disrupted parasites9-11."

      3. __Change 'the basal complex at the leading edge' - this seems counterintuitive __

      This change has been made.

      4. __Change 'mechanisms underlying SG are poorly' - what mechanisms? of invasion or infection? __

      This was supposed to read "SG invasion" and has now been fixed.

      5. __On page 4: 'handful of proteins' __

      This error has been corrected.

      6. __What are the 'three microtubule spindle structures'? __

      The three microtubule spindle structures (hemispindle, mitotic spindle, and interpolar spindle) are now listed explicitly in the text.

      7. __On page 5: 'little is known' - please describe what is known, also in other stages. At the end of the paper I would like to know what is the key difference to rhoptry function in other stages? __

      The following sentence already detailed that we had recently used U-ExM to visualise rhoptry biogenesis in blood-stage parasites, but the following two sentences have been added to provide extra detail on these findings: "In that study, we defined the timing of rhoptry biogenesis, showing that it began prior to cytokinesis and was completed approximately coincident with the final round of mitosis. Additionally, we observed that rhoptry duplication and inheritance was coupled with centriolar plaque duplication and nuclear fission."

      8. __change 'rhoptries golgi-derived, made de novo' __

      This has been fixed.

      9. __change 'new understand to' __

      This change has been made.

      10. __'rhoptry malformations' seem to be similar in sporozoites and merozoites. Is that surprising/new? __

      We assume this is in reference to mention of "rhoptry malformations" in the abstract. In the RON11 merozoite study (PMID:39292724) the authors noted no gross rhoptry malformations, only that one was not formed/missing. The abstract sentence has been changed to the following to better reflect this nuance: "*We show that stage-specific disruption of RON11 leads to a formation of sporozoites that only contain half the number of rhoptries of controls like in merozoites, however unlike in merozoites the majority of rhoptries appear grossly malformed.*"

      11. __What is known about crossing the basal lamina? Were rhoptries thought to be involved in this process? Or is it proteins on the surface or in other secretory organelles? __

      We are unaware of any studies that specifically look at sporozoites crossing the SG basal lamina. A review, although now ~15 years old, stated that "No information is available as to how the sporozoites traverse the basal lamina" (PMID:19608457), and we are not aware of any further information since then. To try and better define our understanding of rhoptry secretion during SG invasion, we have added the following sentence:

      "It is currently unclear precisely when during these steps of SG invasion rhoptry proteins are required, but rhoptry secretion is thought to begin in the haemolymph, before SG invasion16."

      12. __On page change/specify: 'wide range of parasite structures' __

      The structures observed have been listed: centriolar plaque, rhoptry, apical polar rings, rootlet fibre, basal complex, apicoplast.

      13. __On page 7: is Airyscan2 a particular method or a specific microscope? __

      Airyscan2 is a detector setup on Zeiss LSM microscopes. This was already detailed in the Materials and Methods section, but figure legends have been clarified to read: "...imaged by an LSM900 microscope with an Airyscan2 detector".

      14. __how large is RON11? __

      RON11 is 112 kDa in *P. berghei*, as noted in the text.

      15. __There is no causal link between ookinete invasion and oocyst developmental asynchrony __

      We have deleted the sentence that implied that ookinete invasion was responsible for oocyst asynchrony. This section now simply states that "Development of each oocyst within a midgut is asynchronous..."

      16. __First sentence of page 24 appears to contradict what is written in results. I don't understand the first two sentences in the paragraph titled Comparison between Plasmodium spp. __

      This sentence was worded confusingly, making it appear contradictory when that was not the intention. The sentence has been changed to more clearly support what is written in the discussion and now reads: "Our extensive analysis only found one additional ultrastructural difference between Plasmodium spp."

      __On page 25 or before the vast number of electron microscopy studies should be discussed and compared with the authors new data. __

      It is not entirely clear which new data should be specifically discussed based on this comment. However, we have added a new paragraph that broadly compares MoTissU-ExM and our findings with other imaging methods previously used on mosquito-stage malaria parasites:

      "*Comparison of MoTissU-ExM and other imaging modalities*

      *Prior to the development of MoTissU-ExM, imaging of mosquito-stage malaria parasites in situ had been performed using electron microscopy7,8,11,28, conventional immunofluorescence assays (IFA)10, and live-cell microscopy25. MoTissU-ExM offers significant advantages over electron microscopy techniques, especially volume electron microscopy, in terms of accessibility, throughput, and detection of multiple targets. While we have benchmarked many of our observations against previous electron microscopy studies, the intracellular detail that can be observed by MoTissU-ExM is not as clear as that provided by electron microscopy. For example, previous electron microscopy studies have observed Golgi-derived vesicles trafficking along the rootlet fibre8 and distinguished the apical polar rings44; both of which we could not observe using MoTissU-ExM. Compared to conventional IFA, MoTissU-ExM dramatically improves the number and detail of parasite structures/organelles that can be visualised while maintaining the flexibility of target detection. By contrast, it can be difficult or impossible to reliably quantify fluorescence intensity in samples prepared by expansion microscopy, something that is routine for conventional IFA. For studying temporally complex processes, live-cell microscopy is the 'gold-standard' and there are some processes that fundamentally cannot be studied or observed in fixed cells. We attempt to increase the utility of MoTissU-ExM in discerning temporal relationships through the development of the segmentation score but note that this cannot be applied to the majority of oocyst development. Collectively, MoTissU-ExM offers some benefits over these previously applied techniques but does not replace them and instead serves as a novel and complementary tool in studying the cell biology of mosquito-stage malaria parasites.*"

      __First sentence on page 27: there are many studies on parasite proteins involved in salivary gland invasion that could be mentioned/discussed. __

      The sentence in question is "To the best of our knowledge, the ability of sporozoites to cross the basal lamina and accumulate in the SG intercellular space has never previously been reported."

      This sentence has now been changed to read as follows: "While numerous studies have characterized proteins whose disruption inhibited SG invasion9,10,15,59-63, to the best of our knowledge the ability of sporozoites to cross the basal lamina and accumulate in the SG intercellular space has never previously been reported."

      __On page 10 I suggest to qualify the statement 'oocyst development has typically been inferred by'. There seem to be a few studies that show that size doesn't reflect maturation. __

      In our opinion, this statement is already qualified in the following sentence which reads: "Recent studies have shown that while oocysts increase in size initially, their size eventually plateaus (11 days post infection (dpi) in P. falciparum4)."

      __On page 16 the authors state that different rhoptries might have different function. This is an interesting hypothesis/result that could be mentioned in the abstract. __

      The abstract already contains the following statement: "...and provide the first evidence that rhoptry pairs are specialised for different invasion events." We see this as an equivalent statement.


      Experimental changes (Reviewer #1)

      1. On page 19: do the parasites with the RON11 knockout only have the cytoplasmic or only the apical rhoptries?

      The answer to this is not completely clear. We have added the following data to Figures 6 and 8 where we quantify the proportion of rhoptries that are either apical or cytoplasmic: In both wildtype parasites and RON11ctrl parasites, oocyst spz rhoptries are roughly 50:50 apical:cytoplasmic (with a small but consistent majority apical), while almost all rhoptries are found at the apical end (>90%) in SG spz. Presumably, after the initial apical rhoptries are 'used up' during SG invasion, the rhoptries that were previously cytoplasmic take their place. In RON11cKD the ratio of apical:cytoplasmic rhoptries is fairly similar to control oocyst spz. In RON11cKD SG spz, the proportion of cytoplasmic rhoptries decreases but not to the same extent as in wildtype or RON11Ctrl. From this, we infer that the two rhoptries that are lost/not made in RON11cKD sporozoites are likely a combination of both the apical and cytoplasmic rhoptries we find in control sporozoites.

      __in panel G: Are the dense granules not micronemes? What are the dark lines? Rhoptries?? __

      We have labelled all of Figure 1 more clearly to point out that the 'dark lines' are indeed rhoptries. Additionally, we have renamed the 'protein-dense granules' to 'protein-rich granules', as the original name seemed to suggest that these structures are dense granules (the secretory organelle). At this stage we simply do not know what all of these granules are. The observation that some but not all of these granules contain CSP (Supplementary Figure 2) suggests that they may represent heterogeneous structures. It is indeed possible that some are micronemes; however, we think it is unlikely that they are all micronemes for a number of reasons: (1) micronemes are not nearly this protein dense in other Plasmodium lifecycle stages, (2) some of them carry CSP, which has not been demonstrated to be micronemal, (3) very few of these granules are present in SG sporozoites, which would be unexpected because microneme secretion is required for hepatocyte invasion.

      __Figure 2 seems to add little extra compared to the following figures and could in my view go to the supplement. __

      We agree that Figure 2b adds little and so have moved that to Supplementary Figure 2, but think that the relative ease with which it can be distinguished whether sporozoites are in the secretory cavity or SG epithelial cell is a key observation because of the difficulty in doing this by conventional IFA.

      __On page 8 the authors mention a second layer of CSP but do not further investigate it. It is likely hard to investigate this further but to just let it stand as it is seems unsatisfactory, considering that CSP is the malaria vaccine. What happens if you add anti-CSP antibodies? I would suggest to shorten the opening paragraphs of this paper and to focus on the rhoptries. This could be done by toning down the text on all aspects that are not rhoptries and pointing to the open questions some of the observations, such as the CSP layers, raise for future studies. __

      When writing the manuscript, we were unsure whether to include this data at all as it is a purely incidental finding. We had no intention of investigating CSP specifically, but anti-CSP antibodies were included in most of the salivary gland imaging experiments so we could more easily find sporozoites. Given the tremendous importance of CSP to the field, we figured that these observations were potentially important enough that they should be reported in the literature even though they are not something we have the intention or resources to investigate subsequently. Additionally, after consultation with other microscopists we think there is a reasonable chance that this double-layer effect could be a product of chemical fixation. To account for this, we have qualified the paragraph on CSP with this sentence:

      "We cannot determine if there is any functional significance of this second CSP layer and considering that it has not been observed previously it may well represent an artefact of chemical (paraformaldehyde) fixation."

      __Maybe include more detail of the differences between species on rhoptry structure into Figure 4. I would encourage to move the Data on rhoptries in Figure S6 to the main text ie to Figure 4. __

      We have moved the images of developing rhoptries in *P. falciparum* (previously Figure S6a and b) into Figure 4, which now looks as follows:

      Figure S8 (previously S6c) now consists only of the MG spz rhoptry quantification.

      Manuscript structural changes (Reviewer #1)

      1. Abstract: don't focus on technique but on the questions you tried to answer (ie rewrite or delete the 3rd and 4th sentence)

      2. 'range of cell biology processes' - I understand the paper that the key discovery concerns rhoptry biogenesis and function, so focus on that, all other aspects appear rather peripheral.

      3. 'Much of this study focuses on the secretory organelles': I would suggest to rewrite the intro to focus solely on those, which yield interesting findings.

      4. Page 11: I am tempted to suggest the authors start their study with Figure 3 and add panel A from Figure 2 to it. This leads directly to their nice work on rhoptries. Other features reported in Figures 1 and 2 are comparatively less exciting and could be moved to the supplement or reported in a separate study. Page 23: I suggest to delete the first sentence and focus on the functional aspects and the discoveries.

      5. __Maybe add a conclusion section rather than a future application section, which reads as if you want to promote the use of ultrastructure expansion microscopy. To my taste the technological advance is a bit overplayed considering the many applications of this technique over the last years, especially in parasitology, where it seems widely used. In any case, please delete 'extraordinarily' __

      Response to Reviewer #1 manuscript structural changes 1-5: This reviewer considers the findings related to rhoptry biology as the most significant aspect of the study and suggests rewriting the manuscript to emphasize these findings specifically. Doing so might make the key findings easier to interpret. However, in our view, this approach could misrepresent how the study originated and what we see as the most important outcomes. We did not develop MoTissU-ExM specifically to investigate rhoptry biology. Instead, this technique was created independently of any particular biological question, and once established, we asked what questions it could answer, using rhoptry biology as a proof of concept. Given our previous work and available resources, we chose to focus on rhoptry biology. Since this was driven by basic research rather than a specific hypothesis, it's important to acknowledge this in the manuscript. While we agree that the findings related to rhoptry biology are valuable, we believe that highlighting the technique's ability to observe organelles, structures, and phenotypes with unprecedented ease and detail is more important than emphasizing the rhoptry findings alone. For these reasons, we have decided not to restructure the manuscript as suggested.


      Reviewer #2

      Minor text changes (Reviewer #2)

      1. __The 'image Z-depth' value indicated in the figures is ambiguous. It is not clear whether this refers to the distance from the coverslip surface or the starting point of the z-stack image acquisition. A precise definition of this parameter would be beneficial. __

      In the legend of Figure 1, the image Z-depth has been clarified as "sum distance of Z-slices in max intensity projection".

      2. __Paragraph 3 of the introduction - line 7, "handful or proteins" should be handful of proteins __

      This has been corrected.

      3. __Paragraph 5 of the introduction - line 7, "also able to observed" should be observe __

      This has been changed.

      4. __In the final paragraph of the introduction - line 1, "leverage this new understand" should be understanding __

      This has been fixed.

      5. __The first paragraph of the discussion summary contains an incomplete sentence on line 7, "PbRON11ctrl-infected SGs." __

      This has been removed.

      6. __The second paragraph of the discussion - line 10, "until cytokinesis beings" should be begins __

      This mistake has been corrected.

      7. __One minor point: the authors suggest that oocyst diameter is not an appropriate readout of sporozoite development. This is not entirely true, as oocyst diameter distinguishes between cell division and cell growth, so it is an important parameter, especially where proliferation within the oocyst does not take place but growth of the oocyst does. __

      We agree that this was not highlighted enough in the text. The final sentence of the results section about this now reads:

      "While diameter is a useful readout for oocyst development in the early stages of its growth, this suggests that diameter is a poor readout for oocyst development once sporozoite formation has begun and highlights the usefulness of the segmentation score as an alternative." The final sentence of the discussion section about this now reads: "Considering that oocyst size does not plateau until cytokinesis begins⁴, measuring oocyst diameter may represent a useful biological clock specifically when investigating the early stages of oocyst development."

      8. __How does apical polarity differ from that of merozoites, given that some conoid genes are present in ookinetes and sporozoites but not in merozoites? __

      Our hypothesis is that apical polarity is established by the positioning and attachment of the centriolar plaque to the parasite plasma membrane in both forming merozoites and sporozoites. While the apical polar ring proteins are obviously present at the apical end and have important functions, we think that they themselves are unlikely to regulate polarity establishment directly. Additionally, the apical polar rings appear to be visible in forming sporozoites much earlier than at the comparable stages of merozoite formation. An important note here is that, at this point, these are largely inferences based on observational differences, and there is relatively little functional data on proteins that regulate polarity establishment at any stage of the Plasmodium life cycle.

      9. __Therefore, I think that electron microscopy remains essential for the observation of such ultra-fine structures __

      We have added a paragraph in the discussion that provides a clearer comparison between MoTissU-ExM and other imaging modalities previously applied to mosquito-stage parasites (see response to Reviewer #1 (Minor text changes) comment #17).

      10. __The authors have not mentioned that the stage of oocyst development sometimes also depends on the age of the mosquito and varies between mosquito midguts, even when the blood feed is done on the same day. __

      In our opinion this can be inferred through the more general statement that "development of each oocyst within a midgut is asynchronous..."


      Figure changes (Reviewer #2)

      1. __Fig 3B: stages 2 and 6 do not show the DNA (cyan); it would be good to show the state of the DNA at those particular stages, especially at stage 2 when the APR is visible. Also, please box the region in the parent image that is enlarged below it. __

      We completely agree with the reviewer that the stage 2 image would benefit from the addition of a DNA stain. Many of the images in Figure 3b were acquired from samples that did not have a DNA stain, and so in these *P. yoelii* samples we did not find examples of all segmentation scores with the DNA stain. Examples of segmentation scores 2 and 6 for *P. berghei*, and 6 for *P. falciparum*, with DNA stains can be found in Figure S8.

      2. __For clarity, it would be helpful to add indicators for the centriolar plaques in Figure 1b, as their locations are not immediately obvious. __

      The CPs in Figure 1a and 1b have been circled on the NHS ester-only panel for clarity.

      3. __Regarding Figure 1c, the authors state that 'the rootlet fiber is visible'. However, such a structure cannot be confirmed from the provided NHS ester image. Can the authors present a clearer image where the rootlet fibre is more distinct? Furthermore, please provide the basis for identifying this structure as a rootlet fiber based on the NHS ester observation alone. __

      The image in Figure 1c has been replaced with one that more clearly shows the rootlet fibre.

      Based on electron microscopy studies, the rootlet fibre has been defined as a protein dense structure that connects the centriolar plaque to the apical polar rings (PMID: 17908361). Through NHS ester and tubulin staining, we could identify the apical polar rings and centriolar plaque as sites on the apical end of the parasite and nucleus that microtubules are nucleated from. There is a protein dense fibre that connects these two structures. Based on the fact that the protein density of this structure was previously considered sufficient for its identification by electron microscopy, we consider its visualisation by NHS ester staining sufficient for its identification by U-ExM.

      4. __Fig 1B - could the tubulin image in the hemispindle panel be made brighter? __

      The tubulin staining in this panel was not saturated, and so this change has been made.

      5. __Fig 4A - the green text in the first image panel is not visible. Also, the cyan text in the 3rd image in Fig 1A is difficult to see. There are a few places where this is the case __

      We have made all microscopy labels legible, at least when printed at A4/Letter size.

      6. __Fig 6A - how do the authors know ron11 expression is reduced by 99%? Did they test this themselves or rely on data from the lab that gifted them the construct? Also, please state the number of oocysts and sporozoites observed. __

      The way Figure 6a was previously designed and described was an oversight that wrongly suggested we had quantified a >99% reduction in *ron11* expression. The 99% reduction has been removed from Figure 6a, and the corresponding part of the figure legend has been rewritten to emphasise that this was previously established:

      "(a) Schematic showing previously established Ron11Ctrl and Ron11cKD parasite lines where ron11 expression was reduced by >99%⁹."

      As to the second part of the question, we did not independently test either protein or RNA level expression of RON11, but we were gifted the clonal parasite lines established by Prof. Ishino's lab in PMID: 31247198 not just the genetic constructs.

      7. __Fig 6E - are the data point colours the wrong way round on this graph? Just looking at the graph it looks as though the RON11cKD has more rhoptries than the control which does not match what is said in the text. __

      Thank you for pointing out this mistake, the colours have now been corrected.

      8. __Fig S8C, PbRON11 ctrl, pie chart shows 89.7 % spz are present in the secretory cavity while the text shows 100 %, 35/35 __

      The text saying 100% (35/35) only considered salivary glands that were infected (i.e. uninfected SGs were removed from the count). The two sentences that report these data have been clarified to reflect this better:

      "Of *PbRON11ctrl* SGs that were infected (35/39), 100% (35/35) contained sporozoites in the secretory cavity (Figure S8c). Conversely, of infected *PbRON11cKD* SGs (59/82), only 24% (14/59) contained sporozoites within the secretory cavity (Figure S9d)."

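      The two sets of percentages differ only in their denominator. As a quick arithmetic check (a sketch, not part of the original analysis), the Python snippet below recomputes both from the counts given in the revised text. That the pie charts in Figures S8c and S9d use all examined SGs as the denominator is an inference from the matching values, not something stated above; the helper name `percent` is hypothetical.

```python
# Counts as reported in the revised text; "examined" = all SGs scored per line.
counts = {
    "PbRON11ctrl": {"examined": 39, "infected": 35, "in_cavity": 35},
    "PbRON11cKD": {"examined": 82, "infected": 59, "in_cavity": 14},
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


for line, c in counts.items():
    of_infected = percent(c["in_cavity"], c["infected"])  # denominator used in the text
    of_examined = percent(c["in_cavity"], c["examined"])  # denominator inferred for the pie charts
    print(f"{line}: {of_infected}% of infected SGs, {of_examined}% of all examined SGs")
```

      This prints 100.0% versus 89.7% for the control line and 23.7% (rounded to 24% in the text) versus 17.1% for the knockdown line, which are the two pairs of values the reviewer compared.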

      9. __Fig S9D shows that RON11 ckd contains 17.1% sporozoites in secretory cavity while the text says 24%. __

      Please see the response to Reviewer#2 Figure Changes Comment #8 where this was addressed.


      Experimental changes (Reviewer #2)

      1. __Why do the congruent rhoptries have similar lengths to each other, while the dimorphic rhoptries have different lengths? Is this morphological difference related to the function of these rhoptries? __

      We hypothesise that this morphological difference arises because the congruent rhoptries are 'used' during SG invasion, while the dimorphic rhoptries are utilized during hepatocyte invasion. It is not straightforward to test this functionally at this point, as no protein is known to have differential localization between the two. Additionally, RON11 is likely directly involved in both SG and hepatocyte invasion through a secreted portion of the protein (as seen in RBC invasion). Therefore, RON11cKD sporozoites may have combined defects, meaning we cannot assume any defect is solely due to the absence of two rhoptries. Determining this functionally is of high interest to our research groups and remains an area of ongoing study, but it is beyond the scope of this study.

      2. __Would it be possible to show whether RON11 localises to the dimorphic rhoptries, the congruent rhoptries, or both, by using expansion microscopy and a parasite line that expresses RON11 tagged with GFP or a peptide tag? __

      We do not have access to a parasite line that expresses a tagged copy of RON11, or to anti-PbRON11 antibodies. Based on previously published localisation data, however, it seems likely that RON11 localises to both sets of rhoptries. In Figure 1c of PMID: 31247198, RON11 (in green) seems to have a more basally extended localisation in midgut (MG) sporozoites than in salivary gland (SG) sporozoites. From this we infer that RON11 is present in both pairs of rhoptries in MG sporozoites, but only in the one remaining pair in SG sporozoites.


      3. __The knockdown of RON11 disrupts the rhoptry structure, making the dimorphic and congruent rhoptries indistinguishable. Does this suggest that RON11 is important for the formation of both types of rhoptries? I believe that it would be crucial to confirm whether RON11 localises to all rhoptries or is restricted to specific rhoptries for a more precise discussion of RON11's function. __

      Based on our analysis, RON11 does indeed seem to be important for both types of rhoptries, because when RON11 is not expressed, sporozoites still have both apical and cytoplasmic rhoptries (i.e. it is not just one pair that is lost; see Reviewer #1 Experimental changes comment #1).

      4. __The authors state that 64% of RON11cKD SG sporozoites contained no rhoptries at all. Does this mean RON11cKD SG sporozoites used up all rhoptries corresponding to the dimorphic and congruent pairs during SG invasion? If so, this contradicts your claims that sporozoites are 'leaving the dimorphic rhoptries for hepatocyte invasion' and that 'rhoptry pairs are specialized for different invasion events'. If that is not the case, does it mean that RON11cKD sporozoites failed to form the rhoptries corresponding to the dimorphic pair? A more detailed discussion would be needed on this point and, as I mentioned above, on the specific role of RON11 in the formation of each rhoptry pair. __

      We do not agree that this constitutes a contradiction; instead, more nuance is needed to fully explain the phenotype. As shown in the new graph added in response to Reviewer #1 Figure changes comment #1, 64% of all rhoptries in RON11cKD oocyst sporozoites are located at the apical end. Our hypothesis is that these rhoptries are used for SG invasion and, therefore, would not be present in RON11cKD SG sporozoites. Consequently, the finding that 64% of RON11cKD SG sporozoites lack rhoptries is exactly what we would expect. Essentially, we predict three slightly different 'pathways' for RON11cKD sporozoites: if they had two apical rhoptries in the oocyst, we predict they would have zero rhoptries in the SG; if they had two cytoplasmic rhoptries in the oocyst, we predict they would have two rhoptries in the SG; and if they had one apical and one cytoplasmic rhoptry in the oocyst, we predict they would have one rhoptry in the SG. In every case, we expect the apical rhoptries to be 'used up', which appears to be supported by the data.

      5. __Out of pure curiosity, is it possible to measure the length and number of subpellicular microtubules in the sporozoites observed in this study using expansion microscopy? __

      We have performed an analysis of subpellicular microtubules which is now included as Supplementary Figure 2. We could not always distinguish every SPMT from each other and so have not quantified SPMT number. We have, however, quantified their absolute length on both the 'long side' and 'short side', their relative length (as % sporozoite length) and the degree to which they are polyglutamylated.

      A description of this analysis is now found in the results section as follows: "We quantified the length and degree of polyglutamylation of SPMTs on the 'long side' and 'short side' of the sporozoite (Figure S2). 'Short side' SPMTs were on average 33% shorter (mean ± SD, 3.6 ± 1.0 µm) than 'long side' SPMTs (mean ± SD, 5.3 ± 1.5 µm) and extended 17.4% less of the total sporozoite length. While 'short side' SPMTs were significantly shorter, a greater proportion of their length (87.9% ± 11.2% SD) was polyglutamylated compared to 'long side' SPMTs (69.4% ± 13.8% SD)."

      Supplementary Figure 2: Analysis of sporozoite subpellicular microtubules. Isolated P. yoelii salivary gland sporozoites were prepared by U-ExM and stained with anti-tubulin (microtubules) and anti-PolyE (polyglutamylated SPMTs) antibodies. SPMTs were defined as being on either the 'long side' (nucleus distant from plasma membrane) or 'short side' (nucleus close to plasma membrane) of the sporozoite as depicted in Figure 1f. (a) SPMT length and (b) SPMT length as a proportion of sporozoite length were measured. (c) Additionally, the proportion of the SPMT that was polyglutamylated was measured. Analysis comprises 25 SPMTs (11 long side, 14 short side) from 6 SG sporozoites. ** = p

      The following section has also been added to the methods to describe this analysis: "Subpellicular microtubule measurement

      • To measure subpellicular microtubule length and polyglutamylation, maximum intensity projections were made of sporozoites stained with NHS Ester, anti-tubulin and anti-PolyE antibodies, and SYTOX Deep Red. The side where the nucleus was closest to the parasite plasma membrane was defined as the 'short side', while the side where the nucleus was furthest from the parasite plasma membrane was defined as the 'long side'. Subpellicular microtubules were then measured using a spline contour from the apical end of the sporozoite to the basal-most end of the microtubule with fluorescence intensity across the contour plotted (Zeiss ZEN 3.8). Sporozoite length was defined as the distance from the sporozoite apical polar rings to the basal complex, measuring through the centre of the cytoplasm. The percentage of the subpellicular microtubule that was polyglutamylated was determined by assessing when along the subpellicular microtubule contour the anti-PolyE fluorescence intensity last dropped below a pre-defined threshold."
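      The threshold rule in the methods above can be sketched in Python as follows. This is a minimal illustration, not the ZEN 3.8 workflow: it assumes the anti-PolyE intensity profile has been exported as samples ordered from the apical end of the contour with strictly increasing positions, and the function name `polyglutamylated_fraction`, the example threshold, and the choice of the first sub-threshold sample as the cut-off are all hypothetical.

```python
from typing import Sequence


def polyglutamylated_fraction(positions_um: Sequence[float],
                              polye: Sequence[float],
                              threshold: float) -> float:
    """Return the fraction of an SPMT contour counted as polyglutamylated.

    positions_um: distance along the spline contour, starting at the apical end.
    polye: anti-PolyE intensity sampled at each position.
    threshold: pre-defined intensity cut-off (hypothetical value chosen by the caller).
    """
    if len(positions_um) != len(polye) or len(positions_um) < 2:
        raise ValueError("need two matching profiles with at least two samples")

    start, end = positions_um[0], positions_um[-1]
    # If the signal is still at/above threshold at the basal end, count the whole contour.
    if polye[-1] >= threshold:
        return 1.0

    cutoff = start  # default: the signal never drops from above to below the threshold
    for i in range(1, len(polye)):
        # A drop from at/above threshold to below it; overwriting keeps the LAST drop.
        if polye[i - 1] >= threshold and polye[i] < threshold:
            cutoff = positions_um[i]
    return (cutoff - start) / (end - start)


# Illustrative, made-up profile (not data from this study): drops occur at 2.0 and 4.0 µm.
profile_positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
profile_polye = [120, 110, 40, 95, 30, 20]
print(polyglutamylated_fraction(profile_positions, profile_polye, threshold=50))  # prints 0.8
```

      Using the last downward crossing, rather than the first, means a brief dip in signal partway along the microtubule does not truncate the polyglutamylated stretch, which is one reading of "last dropped below" in the methods.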


      6. __In addition to the previous point, in the text accompanying Figure 7a, the authors claim that "64% of PbRON11cKD SG sporozoites contained no rhoptries at all, while 9% contained 1 rhoptry and 27% contained 2 rhoptries". Could this data be used to infer which rhoptry pair are missing from the RON11cKD oocyst sporozoites? Can it be inferred that the 64% of salivary gland sporozoites that had no rhoptries in fact had 2 congruent rhoptries in the oocyst sporozoite stage and that these have been discharged already? __

      Please see the response to Reviewer #2 Experimental Changes Comment #4.

      7. __Is it possible that the dimorphic rhoptries are simply precursors to the congruent rhoptries? Could it be that after the congruent rhoptries are used for SG invasion, new congruent rhoptries are formed from the dimorphic ones and are then used for the next invasion? Would it be possible to investigate this by isolating sporozoites some time after they have invaded the SG and performing expansion microscopy? This would allow you to confirm whether the dimorphic rhoptries truly remain in the same form, or if new congruent rhoptries have been formed, or if there have been any other changes to the morphology of the dimorphic rhoptries. __

      In theory, the dimorphic rhoptries could be precursors to the congruent rhoptries; in particular, the larger rhoptry of the dimorphic pair might be a precursor. The smaller one might be as well, but we have no evidence to suggest that this rhoptry lengthens after SG invasion. We are interested in isolating sporozoites from SGs to add a temporal perspective, but this is not currently feasible. When sporozoites are isolated from SGs, they are collected at all stages of invasion. Additionally, we do not know how long each step of SG invasion takes, so a time-based approach might not be effective either. We are developing an assay to better determine the timing of events during SG invasion with MoTissU-ExM, but this is beyond the scope of this study.

      8. __In the section titled "Presence of PbRON11cKD sporozoites in the SG intercellular space", the authors state that "the majority of PbRON11cKD-infected mosquitoes contained some sporozoites in their SGs, but these sporozoites were rarely inside either the SG epithelial cell or secretory cavity". - this is suggestive of an invasion defect as the authors suggest. Could the authors collect these sporozoites and see if liver hepatocyte infection can be established by the mutant sporozoites? They previously speculate that the two different types of rhoptries (congruent and dimorphic) may be specific to the two invasion events (salivary gland epithelial cell and liver cell infection). __

      It has already been shown that RON11cKD sporozoites fail to invade hepatocytes (PMID: 31247198), even when isolated from the haemolymph, so it seems very unlikely that they would be invasive following SG isolation. As mentioned in the discussion, RON11 in merozoites has a 'dual function': it is partially secreted during merozoite invasion in addition to its rhoptry biogenesis functions. Assuming this is also the case in sporozoites, using the RON11cKD parasite line we cannot differentiate these two functions and therefore cannot ascribe invasion defects purely to issues with rhoptry biogenesis. To answer this question functionally, we would need to identify a protein that only has roles in rhoptry biogenesis and not invasion directly.

      Reviewer #3

      Minor text changes (Reviewer #3)

      1. __Page 3 last paragraph: ...the molecular mechanisms underlying SG (invasion?) are poorly understood. __

      This has been corrected.

      2. __The term "APR" does not refer to a tubulin structure per se, but rather to the proteinaceous structure to which tubulin anchors. Are there any specific APR markers that can be used in Figure 1C? If not, I recommend avoiding the use of "APR" in this context. __

      The text does not state that the APR is a tubulin structure. Given that it is a proteinaceous structure, we visualise the APRs through protein density (NHS Ester). It has been standard for decades to define APRs by protein density using electron microscopy, and this has previously been sufficient in Plasmodium using expansion microscopy (PMIDs: 41542479, 33705377), so it is unclear why it should not be done in this study.

      3. __I politely disagree with the bold statements 'Little is known about cell biology of sporozoite formation ... from electron microscopy studies now decades old' (p.3, 2nd paragraph); 'To date, only a handful of (instead of "or") proteins have been implicated in SG invasion' (p. 4, 1st paragraph). These claims may overlook existing studies; a more thorough review of the literature is recommended. __

      This study includes at least 50 references from papers broadly related to sporozoite biology, covering publications from every decade since the 1970s. The most recent review that discusses salivary gland invasion cites 11 proteins involved in SG invasion. We have replaced "handful" with a more precise term, as it is not the best adjective, but it is hardly an exaggeration.


      Figure changes (Reviewer #3)

      1. __The hypothesis that Plasmodium utilizes two distinct rhoptry pairs for invading the salivary gland and liver cells is intriguing but remains clearly speculative. Are the "cytoplasmic pair" and "docked pair" composed of the same secretory proteins? Are the paired rhoptries identical? How does the parasite determine which pair to use for salivary gland versus liver cell invasion? Is there any experimental evidence showing that the second pair is activated upon successful liver cell invasion? Without such data this hypothesis seems rather premature. __

      We are unaware of any direct protein localisation evidence suggesting that the rhoptry pairs may carry different cargo. However, only a few proteins have been localised in a way that would allow us to determine if they are associated with distinct rhoptry pairs, so this possibility cannot be ruled out either. It seems unlikely that the parasite 'selects' a specific pair, as rhoptries are typically always found at the apical end. What appears more plausible is that the "docked pair" forms first and immediately occupies the apical docking site, preventing the cytoplasmic pair from docking there. Regarding any evidence that the second pair is activated during liver cell invasion, it has been well documented over decades that rhoptries are involved in hepatocyte invasion. If the dimorphic rhoptries are the only ones present in the parasite during hepatocyte invasion, then they must be used for this process.

      2. __The quality of the "rootlet fibre" image is not good and resembles background noise from PolyE staining. Additional or alternative images should be provided to convincingly demonstrate that PolyE staining indeed visualizes the rootlet fibre. It is puzzling that the structure is visible with PolyE staining but not with tubulin staining. __

      This is an understandable misinterpretation of the image provided in Figure 1c. Our intention was not to imply that PolyE staining enables us to see the rootlet fibre, but that PolyE and tubulin allow us to see the APR to which the rootlet fibre is connected. Some PolyE staining, likely corresponding to the early SPMTs, appears in Figure 1c to run along the rootlet fibre, but this is a product of the max-intensity projection. Please see Reviewer #2 Figure changes comment #3 for the updated Figure 1c.

      3. __More arrows should be added to Figures 6b and 6c to guide readers and improve clarity. __

      We have added arrows to Figures 6b and 6c that point out more clearly what we have defined as normal and aberrant rhoptries.

      4. __In Figure 2a, the zoomed image of the P. yoelii-infected SG differs from the highlighted square. __

      We agree that the highlighted square and the zoomed area appear different, but this is due to the differing amounts of light captured by the objectives used in these two panels. The entire SG panel was captured with a 5x objective, while the zoomed panel was captured with a 63x objective. Because of this difference, the plane of focus of the zoomed area is hard to distinguish in the whole SG image. The zoomed image is on the 'top' of the SG (closest to the coverslip), while most of the signal you see in the whole SG image comes from the 'middle' of the SG. To demonstrate this more clearly, we have provided the exact region of interest shown in the 63x image alongside a 5x image and an additional 20x image, all of which are clearly superimposable.

      5. __Figure 3 legend: "P. yoelii infected midguts harvested on day 15" should be corrected. More generally, yes, "...development of each oocyst within a single midgut is asynchronous.", but the dissection days still need to be provided. __

      We are unsure what change is being suggested here. We do not know what is wrong with the statement about day 15 post infection; that is when these midguts were dissected.


      Experimental changes (Reviewer #3)

      1. __The proposed role of AOR in rhoptry biogenesis appears highly speculative. It is unclear how the authors conclude that "AORs carry rhoptry cargo" solely based on the presence of RON4 within the structure. Inclusion of additional markers to characterize the content of AOR and rhoptries will be essential to substantiate the hypothesis that this enigmatic structure supports rhoptry biogenesis. __

      It is important to note that the hypothesis that AORs, or rhoptry anlagen, carry rhoptry cargo and serve as vehicles of rhoptry biogenesis was proposed long before this study (PMID: 17908361). In that study, it was assumed that structures now called AORs or rhoptry anlagen were developing rhoptries. Although often visualised by EM and presumed to carry rhoptry cargo (PMID: 33600048, 26565797, 25438048), AORs only recently became the subject of dedicated investigation (PMID: 31805442), where the authors stated that "...AORs could be immature rhoptr[ies]...". We therefore consider our observation that AORs contain the rhoptry protein RON4, which is not known to localise to any other organelle, sufficient to conclude that AORs carry rhoptry cargo and are thus vehicles for rhoptry biogenesis.

      2. __The study of RON11 appears to be a continuation of previous work by a collaborator in the same group. However, neither this study nor the previous one adequately addresses the evolutionary context or structural characteristics of RON11. Notably, the presence of an EF-hand motif is an important feature, especially considering the critical role of calcium signaling in parasite stage conversion. Given the absence of a clear ortholog, it would be interesting to know whether other Apicomplexan parasites harbor rhoptry proteins with transmembrane domains and EF-hand motifs, and if these proteins might respond similarly to calcium stimulation. Investigating mutations within the EF-hand domain could provide valuable functional insights into RON11. __

      We are unsure what suggests that RON11 lacks a clear orthologue. RON11 is conserved across all apicomplexans and is also present in Vitrella brassicaformis (OrthoMCL orthogroup: OG7_0028843). A phylogenetic comparison of RON11 across apicomplexans has previously been performed (PMID: 31247198), and this study provides a structural prediction of PbRON11 with the dual EF-hand domains annotated (Supplementary Figure 9).

      3. __The study cannot directly confirm that membrane fusion occurs between rhoptries and AORs. __

      This is already stated verbatim in the results: "Our data cannot directly confirm that membrane fusion occurs between rhoptries and AORs..."

      4. __It is unclear what leads to the formation of the aberrant rhoptries observed in RON11cKD sporozoites. Since mosquitoes were not screened for infection prior to salivary gland dissection, The defect reports and revisited of RON11 knockdown does not aid in interpreting rhoptry pair specialization, as there was no consistent trend as to which rhoptry pair was missing in RON11cKD oocyst sporozoites. The notion that RON11cKD parasites likely have 'combinatorial defects that affect both rhoptry biogenesis and invasion' poses challenges to understand the molecular role(s) of RON11 on biogenesis versus invasion. Of note, RON11 also plays a role in merozoite invasion. __

      We are unsure of the comment or suggestion here, as the claims that RON11cKD does not help interpret rhoptry pair specialization, and that these parasites have combined defects, are both directly stated in the manuscript.

      5. __Do all SG PbRON11cKD sporozoites lose their reduced number of rhoptries during SG invasion as in Figure 7a (no rhoptries)? __

      Not all RON11cKD SG sporozoites 'use up' their rhoptries during SG invasion. This is quantified in both Figure 7a and the text, which states: "64% of *PbRON11cKD* SG sporozoites contained no rhoptries at all, while 9% contained 1 rhoptry and 27% contained 2 rhoptries."

      6. __Different mosquito species/strains are used for P. yoelii, P. berghei, and P. falciparum. Does this affect oocyst sizes/stages? Is it OK to compare them? __

      We agree that a direct comparison between, for example, *P. yoelii* and *P. berghei* oocyst size would be inappropriate; however, Figure 3c and Supplementary Figure 4 are not direct comparisons between two species, but a summation of all oocysts measured in this study to indicate that the trends we observe transcend parasite/mosquito species differences. Our study was not set up with the experimental power to determine whether mosquito host species alter oocyst size.

      7. __While I acknowledge that UExM has significantly advanced resolution capabilities in parasite studies, the value of standard microscopy techniques should not be overlooked. Particularly, when discussing the function of RON11, relevant IFA and electron microscopy (EM) images should be included to support claims about RON11's role in rhoptry biogenesis. This would complement the UExM data and substantially strengthen the conclusions. Importantly, UExM can sometimes produce unexpected localization patterns due to the denaturation process, which warrants caution. __

      The purpose of this study is not to discredit, undermine, or supersede other imaging techniques. It is simply to use U-ExM to answer biological questions that cannot be, or have not been, answered using other techniques. Please refer to Reviewer #1 Minor text changes comment #17 for the new paragraph "Comparison of MoTissU-ExM and other imaging modalities" that addresses this.

      Both conventional IFA and immunoEM have already been performed on RON11 in sporozoites (PMID: 31247198). When assessing defects caused by RON11 knockdown, conventional IFA isn't especially helpful because it doesn't allow visualization of individual rhoptries. Thin-section TEM also doesn't provide the whole-cell view needed to draw these kinds of conclusions. Volume EM could likely support these observations, but we don't have access to or expertise in this technique, and we believe it is beyond the scope of this study. It's also important to note that for the defect we observe (missing or abnormal rhoptries), visualization with NHS ester isn't significantly different from what would be seen with EM-based techniques, where rhoptries are easily identified by their protein density.

      The statement that "UExM can sometimes produce unexpected localisation patterns due to the denaturation process..." is partially correct but lacks important nuance in this context. Based on our extensive experience with U-ExM, there are two main reasons why the localisation of a single protein may look different when comparing U-ExM and traditional IFA images. First, denaturation: in conventional IFAs, antibodies need to recognize conformational epitopes to bind to their target, whereas in U-ExM, antibodies must recognize linear epitopes. This doesn't mean the target protein's localisation changes, only that the antibody's ability to recognize it does. Second, antibody complexes seem unable to freely diffuse out of the gel, which can result in highly fluorescent signals not related to the target protein appearing in the image, as we have previously reported (PMID: 36993603). Importantly, neither of these factors applies to our phenotypic analysis of RON11 knockdown. All phenotypes described are based solely on NHS Ester (total protein) staining, so the considerations about changes in the localisation of individual proteins are not relevant.

    1. We are experiencing civil strife at this moment due to breakdowns in human-centered discourse and dialogue. Technology is, in part, to blame because, despite its marvelous achievements, it disconnects us from direct human interaction, eroding trust and squandering meaning. We have lost sympathy and absorbed indifference through online echo-chambers or fervent social media chains.

      The passage points out that while technology helps us stay connected, it can also weaken our social ties and make real conversation harder. When so many of our interactions happen through screens, we lose important habits like listening closely, disagreeing respectfully, and seeing each other as actual human beings. Online platforms usually strengthen our existing views instead of encouraging real discussion or empathy, so we end up talking past each other instead of truly connecting. As a result, people can become emotionally distant and engage with important topics only in a shallow way, since complex debates often get reduced to quick comments, likes, or shares.

      The passage also suggests that civility means more than just being polite. It is about creating a shared space where people can disagree without showing contempt. When technology encourages quick reactions and outrage, it becomes harder to slow down, ask honest questions, or admit mistakes. This can lead to more mistrust, and small misunderstandings may quickly turn into bigger social conflicts or even civil strife. It is easy for people to say whatever they want to whomever they want when they don't have to see the other person's face or fully interact with them. Written messages can also be misinterpreted because readers imagine a "tone of voice" that the writer never intended. I think that makes people quicker and more inclined to make their point, regardless of how it may make others feel. I believe it is important to be aware of the impact of words, even written ones, on how others feel, and I hope more people will start to take that into consideration when reacting and responding online.

  3. Jan 2026
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Reviewer 1 Point 1- The authors describe cortical neuronal counts across several mammalian species, which is quite impressive, but the information on the methods of counting is lacking: how representative are the data used / shown; how many individuals / brains / sections were used for each species considered? Much more detailed description of the quantifications should be provided to judge the validity of this first conclusion.

      Response: We sincerely thank the reviewer for this insightful and constructive suggestion. We agree that the methodological description of our comparative histological analysis, which is the fundamental basis of this study, was insufficient in the original manuscript. Following the reviewer’s advice, we have extensively revised the Materials and Methods section entitled “Nissl staining and neuronal cell number count” (Page 32, Line 15).

      Reviewer 1 Point 2- The authors use several markers of cortical neuron identity to confirm their neuron number measurements, but from the data shown in Figure 1D,E it seems that only some markers (Satb2) show species-differences while others do not (CTIP2 / Tbr1). How do the authors explain this discrepancy - does this mean that it is mainly Satb2 neurons that are increased in number? But if so how to explain the relative increase in subcortical projections shown in Figure S7?

      Response: We appreciate the reviewer’s insightful comments regarding the marker expression patterns. Upon re-evaluating our data in light of your feedback, we agree that the species differences in deep-layer (DL) markers such as Ctip2 and Tbr1 in the adult stage appear relatively modest compared to the robust differences observed in Satb2 and the projection data shown in Figure S8.

      To address this point, we have incorporated a comparison between the adult data (Figure 1) and our findings from P7 (Figure S2). As shown in the revised manuscript, the species differences for all markers are significantly more pronounced at P7 than in the adult. Notably, in the lower layers, rats exhibit a significantly higher number of marker-positive cells across all markers, including those newly added in this revision, compared to mice.

      We offer the following interpretation regarding these temporal differences:

      1. Developmental Relevance: The marker molecules analyzed are well-established regulators of neuronal subtype fate and projection identity during development. Their critical fate-determining functions are primarily exercised during the migration and maturation phases of nascent neurons.
      2. Postnatal Expression Shifts: Whether these molecules maintain functional roles in the fully matured adult brain remains less certain. It is plausible that marker expression may diminish in certain neuronal populations during late postnatal development, leading to the attenuated species differences observed in adults. Consequently, we believe the strong correlation between P7 quantitative data and projection fate provides a biologically sound validation of our hypothesis.

      While we have kept the discussion in the main text concise to maintain focus for the general reader, we have provided comprehensive data in Figure 1 and Figure S2. This ensures that the necessary evidence is readily available for specialists interested in these developmental dynamics.

      Reviewer 1 Point 3- The authors focus their study almost exclusively on somatosensory cortex, but can they comment on other areas (motor, visual for instance)? It would be nice to provide additional comparative data on other areas, at least for some of the parameters examined across mouse and rat. Alternatively the authors should be more explicit in the abstract and description of the study that it is limited to a single area.

      Response: We sincerely appreciate the reviewer’s insightful comment. As suggested, we have revised the Abstract to explicitly state that our current analysis is focused on the somatosensory cortex. Furthermore, based on the data in Figure 1B, we have added a discussion of the possibility that the species differences observed in the primary somatosensory cortex may be a general feature shared across the entire cerebral cortex, as follows: “This DL-biased thickening in rats was evident in the primary somatosensory area, but is consistently observed throughout the rostral-caudal cortical regions.” (Page 19, Lines 29-31)

      Reviewer 1 Point 4- The authors provide convincing evidence of increased Wnt signaling pathway in the rat. They should show more explicitly how other classical pathways of neurogenic balance / temporal patterning are expressed in their mouse and rat transcriptome data sets. These would include Notch, FGF, BMP, for which all the data should be available to provide meaningful species comparison.

      Response: We sincerely thank the reviewer for this insightful suggestion. Following your advice, we have newly included comparative data on key signaling pathways essential for cortical development—namely Wnt, FGF, NOTCH, mTOR, SHH, and BMP—across different species. These results are now presented in Figure S17. Rat progenitors show comparable patterns to other species for FGF, mTOR, and Notch signaling, but elevated Wnt and BMP expression, especially at early stages. A detailed heatmap of raw Wnt pathway gene expression across species is also included in the same supplementary figure. We believe these additions provide a more comprehensive evolutionary perspective and significantly strengthen our findings.
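One simple way to summarize pathway activity for a cross-species comparison of this kind is a per-cell score equal to the mean expression of the pathway's genes, which can then be averaged per species and stage. This is an assumed scoring scheme, not necessarily the one behind Figure S17, and the gene list and expression values below are hypothetical.

```python
import numpy as np


def pathway_score(expr, genes, gene_set):
    """Per-cell mean expression over the genes of gene_set present in `genes`.

    expr: cells x genes matrix of normalized expression (hypothetical input).
    genes: column names of expr. Genes of the set missing from the matrix are skipped.
    """
    idx = [genes.index(g) for g in gene_set if g in genes]
    if not idx:
        raise ValueError("none of the pathway genes are in the matrix")
    return expr[:, idx].mean(axis=1)


# Toy matrix: two cells, three genes (values are placeholders, not study data).
genes = ["Axin2", "Wnt7b", "Hes1"]
expr = np.array([[1.0, 3.0, 0.0],
                 [0.0, 1.0, 2.0]])

# Hypothetical Wnt gene set; Lef1 is absent from this toy matrix and is skipped.
wnt_score = pathway_score(expr, genes, ["Axin2", "Wnt7b", "Lef1"])
print(wnt_score)
```

A mean score is easy to interpret but is sensitive to highly expressed genes dominating the set; z-scoring each gene across cells before averaging is a common alternative.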

      Reviewer 1 Point 5- The alignment of mouse and rat trajectories very nicely shows a delay at early-mid corticogenesis. But there is also a heterochronic transcriptome at the latest stages (end of Day 5). How can this be interpreted? Could this indicate prolonged astrogliogenesis in the rat cortex?

      Response: We sincerely appreciate the reviewer’s insightful comment and the meticulous attention given to our data. Regarding the heterochronic shift observed at Day 5, we agree that this point was not sufficiently addressed in the original manuscript.

      We would like to clarify the two primary reasons for this omission, which are inherent to the current study’s design:

      1. Resolution of Stage Alignment at Temporal Extremes: In our developmental stage alignment analysis, corresponding stages are defined by pairs showing the highest transcriptomic similarity within the sampled range. By definition, the precision of this alignment tends to decrease at the earliest and latest time points of a dataset. Since the "true" biological equivalent might lie outside our sampling window, we must be cautious in interpreting shifts at these temporal boundaries.
      2. Difference in Validation Rigor: Our study prioritized the early stages of deep-layer (DL) neuron production. Consequently, we rigorously defined the onset of neurogenesis in rats (Day 1) using multiple independent methods, including clonal analysis, immunohistochemistry, and gene expression. In contrast, Day 5 was defined simply as five days post-initiation of neurogenesis, without equivalent multi-modal validation. Given that our primary focus is the early phase of neurogenesis, the precision of the transition from late neurogenesis to gliogenesis is relatively lower.

      For these reasons, we believe that an in-depth discussion of the heterochronic shift at Day 5 might lead to over-interpretation. To reflect this more accurately and avoid misleading the reader, we have revised Figure 6F to de-emphasize the Day 5 shift. In addition, we revised the manuscript as follows: “Importantly, while this analysis identified stage pairs with the highest similarity, the correspondence at the edges of the temporal sampling window is inherently less certain than at the center. Consequently, we focus on the notable reflection point at the center of our dataset.” (Page 13, Lines 37-39).

      We believe these changes more faithfully represent the biological scope of our data while maintaining the scientific integrity of our primary conclusions.

      Reviewer 1 Point 6- Figure 7: description implies that module 3 is a subset of module 4, but this is not obvious at all from the panels shown. Please clarify.

      Response: We sincerely appreciate the reviewer’s careful reading of our manuscript. As suggested, we have revised Figure 7 to clarify the hierarchical relationship between Module 3 and Module 4, ensuring that their inclusion is now explicitly presented.


      Reviewer #2

      Reviewer 2 Point 1. The introduction lacks sufficient background and fails to convey the significance of the study. Specifically, it does not explain why the research was undertaken, what knowledge gap it addresses, and how the findings could be applied. Addressing these questions already in the introduction would enhance the impact of the work and broaden its readership.

      Response: We sincerely appreciate the reviewer’s insightful comment on this point. Our study reports evolutionary insights gained through an unconventional approach: a single-cell level comparison between mice and rats. We agree that clarifying the necessity of this specific approach is crucial for the manuscript. Accordingly, we have added the following two points to the Introduction:

      1. At the end of the first paragraph, we emphasized the current lack of research on the evolutionary adaptation of cortical circuits, despite the established functional importance of evolutionarily conserved circuits (Page 3, Lines 7-10: “Paradoxically, despite the importance of these variations, research has predominantly focused on the conserved aspects of cortical architecture. Consequently, the degree of evolutionary plasticity inherent in these circuits and the cell-intrinsic mechanisms driving their modification remain profoundly enigmatic.”).
      2. At the end of the third paragraph, we revised and added text (Page 3, Lines 26-27: “This lack of comparative insight represents a significant gap in our understanding of how conserved developmental programs give rise to species-specific brain architectures.”).

      Reviewer 2 Point 2. In figure 5 the authors conclude that "differences in cell cycle kinetics and indirect neurogenesis are unlikely to be the primary factors driving the species-specific variation in DL neuron production. Instead, the temporal regulation of progenitor neurogenic competence, which determines the duration of the DL production phase, provides a more plausible explanation for the greater number of DL subtypes observed in rats". It is not clear to this reviewer how the authors come to this conclusion. Authors observe a significant proportion of mitotic cells in rat VZ from day 1, and a higher constant proportion of mitotic progenitors in SVZ rats compared to mouse (Figure 5C). This points to an early difference in mitotic progenitors that may also lead to increased IP numbers, and potentially an increased number in DL cells, even before day 1. In addition, the higher abundance of IPs in the G2/S phase (statistically significant in 4 of the 7 time points) (Figure 5F), would suggest that this difference might play a role in the species-specific variation of DL neuron production. The authors should estimate cell cycle length instead of just measuring proportions to conclude something about cell cycle kinetics. They can then model growth curves to predict the effect caused if there were differences in cell cycle length between equivalent cell types across species.

      Response: We sincerely thank the reviewer for their careful reading of our manuscript and for pointing out the overstatements in our original descriptions. We agree that a more nuanced interpretation of the data was necessary. In response to these constructive suggestions, we have made the following revisions:

      1. Refinement of Descriptions: We have revised the text to more accurately reflect our findings, specifically noting that the increase in RG division on Day 1 and IP proliferation throughout the neurogenic period showed a significant trend. These features are now described more fairly and cautiously in the revised manuscript (Page 11, Lines 42-46: “Remarkably, while the temporal dynamics of mitotic density were strikingly conserved between the two species, subtle yet discernible species-specific signatures emerged. Specifically, rats exhibited a higher ratio of mitotic cells in the VZ at the onset of neurogenesis, the precise period when DL subtypes are generated in both species. Further assessment of G2/S-phase cells via pulse-EdU labeling (Figure 5D, E)...”).
      2. Inclusion of Time-lapse Imaging Data: The reviewer is correct that measuring the proportions of M and G2/S phases provides only a limited snapshot of cell cycle dynamics. To gain a more precise insight, we performed primary cultures of neural progenitor cells (NPCs) from Day 1 and conducted live-cell time-lapse imaging. This allowed us to directly quantify the cell cycle duration of mouse and rat NPCs (Figure S9A-C).
      3. Comparative Analysis and Mathematical Modeling: Our new data revealed that the cell cycle lengths of the two species are remarkably similar, with no significant differences observed under these culture conditions. Furthermore, to validate the impact of these findings on overall brain development, we developed a mathematical model based on our experimental data. This model predicts the total number of cells produced over the five-day neurogenic period, providing a more robust theoretical framework for our conclusions (Figure S9D).

      We believe these additions significantly strengthen the manuscript and address the reviewer's concerns regarding the physiological relevance of our observations.
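To make the logic of such a growth model concrete, the sketch below counts neurons produced over a five-day window by a progenitor pool in which each division yields a fixed fraction of neuronal daughters. It is a simplified, hypothetical stand-in rather than the model behind Figure S9D: the pool size, neurogenic fraction, and cycle lengths are illustrative assumptions. It shows why equal cycle lengths predict equal output, so that a difference in DL neuron number would have to come from other factors, such as the duration of the DL-producing phase.

```python
def neurons_produced(cycle_length_h, days=5, initial_progenitors=1000.0,
                     neurogenic_fraction=0.3):
    """Neurons generated by a progenitor pool over a fixed neurogenic window.

    Hypothetical model: every progenitor divides once per cycle; a fraction
    `neurogenic_fraction` of the daughters exits as neurons and the rest stay
    progenitors. Only complete cycles within the window are counted.
    """
    n_cycles = int(days * 24 // cycle_length_h)
    progenitors = float(initial_progenitors)
    neurons = 0.0
    for _ in range(n_cycles):
        daughters = 2.0 * progenitors
        new_neurons = daughters * neurogenic_fraction
        neurons += new_neurons
        progenitors = daughters - new_neurons
    return neurons


# Illustrative only: equal cycle lengths give equal predicted output,
# whereas a shorter cycle fits more divisions into the same window.
print(neurons_produced(18.0), neurons_produced(18.0))
print(neurons_produced(16.0) > neurons_produced(20.0))
```

Because only complete cycles are counted, small cycle-length differences can yield the same number of divisions within the window; a continuous-time formulation avoids that discreteness at the cost of a less transparent model.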

      Reviewer 2 Point 3. In Figure 6 the authors focus only on the mouse and rat datasets. Given the availability of the primate datasets that the authors already used for Figure 7, it would give the reader a broader perspective if these datasets were also integrated into the analysis done for Figure 6; in particular, it would be interesting to integrate them into the pseudotime alignment of cortical progenitors. How would the early and late neurogenic phases of human and/or macaque compare to those of mouse and rat in this model?

      Response: We sincerely appreciate the reviewer’s insightful suggestion. In accordance with this comment, we have now incorporated pseudotime alignments of cortical progenitors between primates (human, macaque) and rodents (mouse, rat), presented as pairwise gene expression distance matrices with dynamic time warping in Figure S13. These heatmaps illustrate temporal compression or stretching in progenitor gene expression progression across species. Notably, macaque progenitors show no definitive deviations from rodents, whereas human progenitors exhibit distinct protraction relative to rats and even more so to mice. These additions provide a more comprehensive cross-species perspective without altering the study's core conclusions.
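Dynamic time warping aligns two ordered series while allowing local stretching or compression. The sketch below assumes stage-by-gene expression matrices and Euclidean distance (both assumptions, not necessarily the metric used for Figure S13), builds the pairwise distance matrix, and finds the minimum-cost monotonic path through it. A path segment that advances along one species' axis while staying on a single stage of the other is what the heatmaps display as temporal stretching.

```python
import numpy as np


def pairwise_distance(a, b):
    """Euclidean distance between every row of a and every row of b.

    a: stages x genes for species 1; b: stages x genes for species 2.
    """
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def dtw_path(cost):
    """Dynamic time warping over a precomputed cost matrix.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    from (0, 0) to (n - 1, m - 1) built from steps (1, 0), (0, 1), (1, 1).
    """
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = {
            (i - 1, j - 1): acc[i - 1, j - 1],
            (i - 1, j): acc[i - 1, j],
            (i, j - 1): acc[i, j - 1],
        }
        i, j = min(candidates, key=candidates.get)
        path.append((i - 1, j - 1))
    return float(acc[n, m]), path[::-1]


# Toy one-gene example (hypothetical): the "rat" series passes through the
# same states as the "mouse" series but lingers at the first two states.
mouse = np.array([[0.0], [1.0], [2.0]])
rat = np.array([[0.0], [0.0], [1.0], [1.0], [2.0]])
total, path = dtw_path(pairwise_distance(mouse, rat))
print(total, path)
```

In this toy case each mouse stage maps to two rat stages except the last, which is the pattern read as a protracted trajectory in the second species.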

      Reviewer 2 Point 4. In Figures 6C and 6D, the authors distinguish between cycling and non-cycling NECs and RGCs. Could the authors clarify the rationale behind making this distinction? Could the authors comment on how they interpret the impact of cycling versus non-cycling states on species-specific non-uniform scaling? Do they consider the observed non-linear correspondences to be driven by differences in cell cycle activity?

      Response: We are grateful to the reviewer for their insightful observation. We agree that our initial classification of neural progenitor cell (NPC) populations based on proliferation marker expression levels followed a convention used in other studies but was, in the context of this work, unnecessary and potentially misleading. To avoid further confusion and focus on the core biological question, we have re-organized the data by pooling these populations into a single group. Regarding the concern about species differences in cell cycle kinetics, we believe there is no significant divergence between mice and rats that could explain the observed developmental patterns in temporal progression of neurogenesis. This is supported by two lines of evidence:

      1. Quantitative analysis of pH3-positive cells (Figure 5).
      2. New time-lapse imaging data of primary cultured NPCs, which show no substantial difference in cell cycle length between the two species (Figure S9).

      These results indicate that the species-specific differences in deep-layer (DL) neuron production are not driven by cell division kinetics. Consequently, we conclude that the non-linear developmental progression of NPCs occurs independently of cell cycle regulation.

      Reviewer 2 Point 5. For the non-uniform scaling in Figure 6F, the authors identify critical inflection points and mention that "the largest delay in rat progenitors occurring where Day 1 and Day 3 progenitors overlapped". It would be good if the authors could discuss what they think all the inflection points represent. How much can be explained by the heterogeneity within progenitors per time point? There is a clearly larger spread of the histograms at days 3 and 5, and the histogram at day 5 almost overlaps with day 1. I wonder if the same conclusion about non-uniform scaling would be reached if the distance matrix was built separately for specific cell types, for example only looking at NECs or RGCs.

      Response: We sincerely appreciate the reviewer’s insightful perspective on this point. In alignment with the suggestions from both this reviewer and Reviewer 1 (Point 5), we have updated the manuscript to discuss all identified inflection points. Specifically, we have clarified why our discussion focuses on the correspondence between Mouse Day 1 and Rat Day 3.

      A recognized limitation of our current analytical approach is that it identifies the closest matching expression profiles within the specific timeframes sampled for each species. For stages at the beginning or end of our sampling window, the "true" corresponding stage in the other species may lie outside our sampled range, which naturally limits the strength of any conclusions regarding those boundary points. Consequently, while we can confidently confirm the correspondence between Mouse Day 1 and Rat Day 3—both of which sit centrally within our sampled window—we have intentionally avoided over-interpreting data near the temporal boundaries.
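The boundary effect described here can be seen in a minimal nearest-match sketch. It assumes, hypothetically, that expression changes smoothly and linearly with developmental time and that matching uses Euclidean distance. A query stage whose true counterpart lies inside the sampled window is matched correctly, whereas one whose counterpart lies beyond the last sampled stage is simply assigned to that last stage.

```python
import numpy as np


def closest_stage(query, sampled):
    """Index of the sampled stage (row) with the smallest Euclidean distance."""
    distances = np.linalg.norm(sampled - query, axis=1)
    return int(np.argmin(distances))


# Hypothetical profiles: two genes rising linearly over sampled stages 0..4.
sampled = np.array([[t, 2.0 * t] for t in range(5)], dtype=float)

inside = np.array([2.0, 4.0])    # true counterpart: stage 2 (sampled)
outside = np.array([6.0, 12.0])  # true counterpart: stage 6 (not sampled)

print(closest_stage(inside, sampled))   # matched to stage 2
print(closest_stage(outside, sampled))  # pinned to the edge, stage 4
```

The edge match for the second query is the best available answer, not evidence that the two stages are equivalent, which is why correspondences at the window boundaries carry less weight.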

      Regarding the cell types analyzed, this specific analysis was conducted exclusively on NECs and RGs (now shown in Figure 6F). Extensive prior research (Susan McConnell lab, Sally Temple lab, Fumio Matsuzaki lab, Dennis Jabaudon lab, and more) has established that the time-dependent mechanisms governing the fate determination of cortical excitatory neuron subtypes are encoded within RGs. Therefore, we focused our investigation on these lineages and did not include other cell types in this study. We believe this focused approach maintains the highest degree of biological relevance for our conclusions.

      Reviewer 2 Point 6. The authors conclude that the elevated and prolonged expression of Wnt-ligand genes in rat RGs extends the DL neurogenic window and contributes to the rat-specific expansion of deep cortical layers. In order to validate this finding, it would be good for the authors to perform a perturbation experiment and reduce Wnt signalling/Axin2 levels in rats or deplete the Lmx1a and Lhx2 double-positive population.

      Response: We thank the reviewer for this insightful suggestion. We agree that providing direct experimental evidence is crucial to demonstrating that elevated Wnt signaling in RG progenitors drives the production of DL subtype neurons in rats. To address this, we performed a functional intervention on Day 3, a stage when Wnt signaling (indicated by Axin2 expression) is significantly higher in rats than in mice (Figure 7C, D). We introduced a dominant-negative form of TCF7L2 (dnTCF7L2) to inhibit Wnt signaling specifically in RG progenitors and tracked the fate of the resulting neurons (Figure 7I, J). Our results showed a clear reduction in the proportion of DL neurons, accompanied by a reciprocal increase in upper-layer (UL) neurons. These findings demonstrate that maintained high levels of Wnt signaling are essential for the prolonged neurogenic capacity for DL neurons in rats. These new data have been incorporated into Figure 7.

      Reviewer 2 Point 7. The authors conclude that Wnt signaling is a rat-specific effect since they did not observe any clear temporal change in Wnt receptors in gyrencephalic species, and only a subset of RG in rats co-express Lmx1a and Lhx2. However, specific Wnt ligands and receptors (Wnt5a, Fzd and Lrp6) seem to be upregulated in human as well (Fig 7G), non-RG cells could act as Wnt ligand inducers in other species, and it has not been demonstrated that Lmx1a and Lhx2 are the source of Wnt ligand production. I wonder if the authors can completely rule out a role for Wnt in the protracted neurogenesis of other species.

      Response: We sincerely appreciate the reviewer’s insightful and broad perspective regarding Wnt signaling dynamics across diverse species. In this study, our primary focus was to elucidate the specific mechanisms underlying the differences between mice and rats. Consequently, we did not initially explore Wnt dynamics in other species or their roles in developmental timing in great depth in the original manuscript. We fully acknowledge that lineage-specific adaptations occur at the individual gene level; for instance, Silver and colleagues have reported that human-specific upregulation of the Wnt receptor gene FZD8 modulates neural progenitor behavior (Boyd et al., Current Biology 2008; Liu et al., Nature 2025). However, our comparative analysis of five mammalian species, carefully aligned by developmental stage, reveals a distinct global trend. While individual gene variations such as human FZD8 exist, the expression levels of multiple Wnt-related genes, particularly ligands, are markedly higher in rats than in the other four species.

      Following the reviewer’s insightful suggestion, we examined the potential role of Lmx1a in activating Wnt ligand transcription in rat cortical progenitors by analyzing their expression correlation at the single-cell level. Our analysis revealed that several Wnt ligand genes are co-expressed with Lmx1a with a remarkably strong positive correlation. While we have not yet experimentally demonstrated the direct transcriptional activation of Wnt ligands by Lmx1a in these cells, this robust correlation at single-cell resolution strongly suggests that Lmx1a regulates Wnt ligand expression. These new findings are now included in Figure 7 and Figure S16, and the corresponding results section (Page 15, Lines 42-44) has been revised accordingly.
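A single-cell co-expression check of this kind can be sketched as a rank (Spearman) correlation across cells. The sketch assumes per-cell normalized expression vectors for Lmx1a and one Wnt ligand gene; the values below are placeholders, not data from the study, and the statistic the authors used may differ. Average ranks are used because scRNA-seq matrices contain many tied zeros. As the response itself notes, such a correlation is compatible with, but does not demonstrate, direct transcriptional regulation.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 0, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="stable")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Placeholder per-cell values (hypothetical), including tied zeros.
lmx1a = [0.0, 0.0, 0.5, 1.2, 2.0, 3.1]
wnt_ligand = [0.0, 0.1, 0.4, 1.0, 2.5, 2.9]
print(round(spearman(lmx1a, wnt_ligand), 3))
```

If SciPy is available, `scipy.stats.spearmanr` computes the same statistic and also reports a p-value; with thousands of cells, even weak correlations become nominally significant, so the effect size deserves as much attention as the p-value.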

      __Reviewer 2 Point 8__ Minor comments: The RNAscope experiment is currently qualitative. Is the mRNA copy number per cell equal in both species, with more cells positive in rat, or are there also differences in the number of mRNA molecules? It is not indicated whether the RNAscope probes are the same for mouse and rat.

      Response: We sincerely thank the reviewer for this insightful suggestion. Following the comment, we performed RNAscope analysis for Axin2 in both mice and rats and quantified the results (now included in Figure 7D). The new data successfully validate the species differences initially observed in our scRNAseq analysis: specifically, the period of high-level Axin2 expression is significantly extended in rats compared to mice. These findings provide histological evidence that reinforces our conclusions regarding the distinct temporal dynamics between the two species.

      Regarding probe design, the Axin2 RNAscope probes target conserved and corresponding sequences between mouse and rat, with species-specific probes optimized for each organism to ensure maximal specificity and sensitivity. We have updated the Methods section ("Fluorescent in situ hybridization with RNAscope") to include these details.
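The reviewer's question (more positive cells versus more transcripts per cell) corresponds to two separate summaries of per-cell RNAscope dot counts. The sketch below assumes dot counts per segmented cell are already available; the positivity threshold and the counts are hypothetical and do not describe the quantification pipeline behind Figure 7D.

```python
def summarize_dots(dots_per_cell, threshold=1):
    """Return (fraction of positive cells, mean dots among positive cells).

    A cell counts as positive when it has at least `threshold` dots
    (a hypothetical cutoff).
    """
    positive = [d for d in dots_per_cell if d >= threshold]
    fraction_positive = len(positive) / len(dots_per_cell)
    mean_copies = sum(positive) / len(positive) if positive else 0.0
    return fraction_positive, float(mean_copies)


# Two hypothetical samples with the same total dot count (8) but
# different distributions across cells.
more_positive_cells = [2, 2, 2, 2]
more_copies_per_cell = [0, 0, 4, 4]
print(summarize_dots(more_positive_cells))   # (1.0, 2.0)
print(summarize_dots(more_copies_per_cell))  # (0.5, 4.0)
```

Reporting both numbers per species and stage separates a broader expression domain from higher per-cell expression, which a total signal measure alone cannot do.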

      Reviewer #3

      Reviewer 3 Point 1. Satb2 is also widely recognized as a deep layer marker. The authors need to perform analysis and quantification in Figs 1 and 4 with other layer II/III and IV markers such as Cux1 and Rorb.

      Response: We thank the reviewer for their insightful comments regarding the marker specificity. We fully agree that while Satb2 is a robust marker for callosal projection identity, its broad distribution across both deep and upper layers limits its utility as a layer-specific marker. As the reviewer suggested, Cux1 (Layers 2/3) and Rorb (Layer 4) are indeed superior markers for defining laminar identity.

      To address this, we have incorporated new immunohistochemical data for these markers in both the quantification of somatosensory cortical neurons (Figure S2) and the birth-dating analysis (Figure 4).

      Our new findings are as follows:

      1. Layer Quantification (Figure S2): By utilizing Cux1 and Rorb as more specific upper-layer (UL) markers, we confirmed that there are no significant differences in the number of these neurons between mice and rats.
      2. Birth-dating Analysis (Figure 4): These markers allowed us to more precisely define the timing of Cux1/Rorb-positive cell generation, revealing subtle but important differences between the two species.

      While these additions do not alter the fundamental narrative of the original manuscript, they have significantly enhanced the precision and rigor of our analysis. We are grateful to the reviewer for guiding us toward this more robust validation.

      Reviewer 3 Point 2. Rats have larger cortices. Therefore, quantification of neurons should also be normalized to cortical thickness in Fig 1E and also represented with individual data points.

      Response: We sincerely appreciate the reviewer’s constructive suggestion. We agree that normalizing the number of cortical neurons by thickness provides a more rigorous comparison. Accordingly, we have calculated the neuronal density (cell count per unit thickness) for Tbr1- and Ctip2-positive cells and included these data in Figure S2C. Our analysis confirms that these populations are distributed at a significantly higher density in mice compared to rats.

      Furthermore, we have updated the visualization in Figure 1E to display individual data points, ensuring full transparency of the underlying distribution. We believe these revisions, prompted by the reviewer’s insight, have substantially strengthened the clarity and persuasiveness of our manuscript.
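The thickness normalization described in this response is a per-thickness division. Using purely hypothetical numbers, the sketch shows how a larger raw count in the thicker rat cortex can coexist with a lower density per unit thickness, which is the kind of comparison Figure S2C is described as making.

```python
def density_per_thickness(cell_count, thickness_um):
    """Marker-positive cells per micrometre of cortical thickness."""
    if thickness_um <= 0:
        raise ValueError("thickness must be positive")
    return cell_count / thickness_um


# Hypothetical counts for one radial column per species (not study data).
mouse = density_per_thickness(300, 1200.0)  # 0.25 cells/um
rat = density_per_thickness(360, 1800.0)    # 0.2 cells/um
print(mouse > rat)  # higher raw count in rat, higher density in mouse
```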

      Reviewer 3 Point 3. The clonal analysis in Figs 2 and 3 quantifies GFP and RFP and reports these as neurons. However, without using cell-specific markers, it seems the authors cannot exclude that some progeny are also glia derived from a radial glial progeny. I don't expect all experiments to have this but they must have some measures of both populations to address this possibility. This needs to be addressed to build confidence in the conclusion that there is clonal production of neurons.

      Related to this, the relationship between position and fate is not always 1 to 1. The data summarized in Fig 2G are based on position and not using subtype markers. They should include assessment of markers as they do in Fig 4.

      Response: We sincerely thank the reviewer for this insightful comment. We agree that a clear definition of cell types is essential for the accuracy of clonal analysis.

      In this study, we primarily identified neurons based on their distinct morphological characteristics and performed measurements specifically on these cells. To validate this approach, we confirmed that the vast majority of cells identified as neurons were positive for NeuN and cortical excitatory neuron markers, while remaining negative for glial markers such as Olig2 and SOX9. (Notably, at postnatal day 7, most cells in the glial lineage exist as undifferentiated Olig2-positive progenitors). These observations support our conclusion that the cells analyzed based on morphology are indeed cortical excitatory neurons.

      As the reviewer rightly pointed out, evaluating cell composition using fate-specific marker expression is the ideal approach. However, our current experimental setup required multiple fluorescence channels for DAPI staining (to assess tissue architecture) and immunostaining for GFP and RFP (to identify labeled clones). Due to these technical constraints regarding available detection channels and host species compatibility, we relied on morphological criteria for the primary analysis.

      To address this concern and ensure the reliability of our findings, we performed additional analyses using a subset of samples. By co-staining retrovirally labeled neurons with cell-fate markers, we obtained results consistent with our other data (Figures 1 and 4) regarding laminar position and marker expression. Based on this consistency, we are confident that our classification based on morphology and laminar position does not alter the fundamental conclusions of this study.

      Reviewer 3 Point 4. In Fig 5, the authors use PH3 as well as EdU to measure differences in indirect neurogenesis. Using EdU and Tbr2 they report more dividing IPs. However, they need to measure this over the total number of Tbr2 cells, as it is not normalized to differences in Tbr2 cells between species. Are there total differences in Tbr2+ cells when normalized to DAPI as well? Moreover, little analysis is performed to measure any impact on radial glia. As no striking differences were observed in IPs, the cellular mechanism remains somewhat unclear, which raises the question of the impact on radial glia. Measuring PH3+ cells in the VZ and SVZ is not cell-specific, nor does it yield information to support the prolonged neurogenesis.

      Response: We sincerely thank the reviewer for this insightful suggestion. We agree that quantifying Tbr2+/EdU+ double-positive cells alone was insufficient to fully capture the IP dynamics. Following the reviewer’s advice, we have now quantified the total population of Tbr2+ cells, normalized to the number of DAPI-stained nuclei. This new analysis reveals that mice and rats exhibit nearly indistinguishable temporal dynamics (Figure S10). When integrated with the original Tbr2+/EdU+ data in Figure 5, these findings suggest that rats maintain a slightly higher IP pool throughout the neurogenic period. This implies that the increased neuronal production in rats is not restricted to a specific phase, but rather occurs consistently across all developmental stages. We believe these additional data significantly strengthen our conclusions.


      Reviewer 3 Point 5. The sc-seq is done in rat and compared to published mouse data from corresponding stages. They conclude species-specific differences in progenitor gene expression. I am unsure how appropriate this is. Are similar sequencing platforms used? Can they find similar results using multiple datasets? There are other datasets that may be used to validate these findings beyond DiBella et al.

      Response: We sincerely thank the reviewer for this insightful comment. We agree that establishing the validity of our analytical approach is crucial for the reader’s confidence in our findings. To address this, we have explicitly stated in the revised manuscript that both our rat scRNAseq data and the publicly available datasets were generated using consistent experimental platforms. This ensures that the integration process is technically sound.

      Revised text (Page 13, Lines 16-18): “After quality control, we integrated these profiles with previously published mouse cortical cell data from corresponding neurogenic stages, which is prepared using the consistent platform with ours (35) (Figure S11).”

      Furthermore, to ensure the robustness of our comparative analysis, we have incorporated an additional independent dataset (Ruan et al., PNAS 2021) in addition to the Di Bella et al. Nature 2021 data used in the original manuscript. We confirmed that the results obtained using this second dataset are highly consistent with our initial findings, further validating our conclusions across different studies (Figure S13A).

      Reviewer 3 Point 6. Wnt ligand analysis requires validation in situ across developmental stages to support their conclusions. Ideally, they might consider doing some manipulations to provide context for this observation.

      Response: We sincerely thank the reviewer for these insightful suggestions. We agree that validating the spatial expression patterns of Wnt ligands and confirming their expression in rat-specific RG, as suggested by our scRNAseq data, is crucial for strengthening our conclusions.

      Regarding the expression of Wnt3a, a key ligand in cortical development: although immunohistochemical analysis clearly identified Wnt3a expression in the cortical hem, the expression levels in RG within the cortical area were substantially lower than those in the hem, making definitive visualization challenging. To complement these findings and provide more robust evidence, we performed the following additional experiments:

      1. Validation of Wnt signaling levels: Using RNAscope-based in situ hybridization for Axin2, we successfully confirmed the elevated Wnt signaling levels in rat-specific RG (Figure 7C, D), consistent with our scRNAseq findings.
      2. Co-expression analysis: In our scRNAseq dataset, we found strikingly high correlation between the expression of Lmx1a and Wnt ligand genes in rat cortical progenitors (Figure S16B).
      3. Functional analysis: To test the functional significance of this signaling, we inhibited Wnt signaling by electroporating dominant-negative TCF7L2 into rat RG at E15.5. This manipulation resulted in a subtype shift of the generated neurons toward an upper-layer identity (Figure 7I, J). These new results demonstrate that the rat-specific extension of high Wnt signaling levels serves as a fundamental mechanism for the prolonged production of deep-layer (DL) neurons. We are grateful to the reviewer for these suggestions; these additional data have significantly strengthened our core argument that the heterochronic regulation of Wnt signaling states drives the evolution of cortical neuronal composition.

      Reviewer 3 Point 7. Minor concerns-1

      Please separate images in Fig 1D it is very strange to have them all on top of each other.

      Response: We sincerely thank the reviewer for this suggestion. As requested, we have provided individual channel images alongside the merged multicolor panels. We agree that this modification significantly enhances the clarity of our data and makes the results much easier to interpret.

      Reviewer 3 Point 8. Minor concerns-2

      Are data in Fig 4E Edu+Tbr1+EdU+? This should be clarified and would be most accurate.

      Response: We appreciate the reviewer’s suggestion. We have added Y-axis labels to the plots in Figure 4E-K. The cell-counting procedure for these analyses is documented in the caption of Figure 4E-K: “Normalized counts of neurons colabeled for EdU and projection-specific markers, relative to the peak of EdU+ and marker+ cells.”

      Reviewer 3 Point 9. Minor concerns-3

      Fig 4 graphs only have titles without Y axis. Please adjust location of title or repeat for clarity.

      Response: We thank the reviewer for this helpful suggestion. To clarify the definition of the Y-axis, we have now added a descriptive label to the axis in the revised figure.

      Reviewer 3 Point 10. Minor concerns-4

      Fig 4A implies cumulative incorporation which I don't think is being performed here. They should clarify this in the figure.

      Response: We appreciate the reviewer’s insightful comment. To avoid any potential misunderstanding regarding the additivity of the effect, we have revised the illustration in Figure 4A for greater clarity.

      Reviewer 3 Point 11. Minor concerns-5

      Fig 5 needs labels for the actual stages assayed, as illustrated in Fig 4A.

      Response: We thank the reviewer for this helpful suggestion. Following your comment, we have added the developmental stage information (expressed as embryonic days) for both mice and rats in the revised manuscript.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Yamauchi et al. performed a comparative anatomical analysis of the layer architecture in the primary somatosensory cortex across 8 mammalian species. Unlike primates, which show an expansion of upper layers (UL), rodents, especially rats, display a pronounced thickening of deep layers (DL). In this study they focus on comparing rats and mice, given the higher abundance of DL neuron subtypes in rats. Using histological analysis, they showed that rats possess significantly more DL neurons per cortical column than mice, while UL neuron counts remain similar. Clonal lineage tracing showed that rat radial glial (RG) progenitors generate more DL neurons, indicating species-specific differences in progenitor neurogenic activity. Birth dating assays confirmed an extended DL neurogenesis phase in rats, followed by a conserved UL generation phase. Single-cell RNA sequencing further revealed that rats maintain an early progenitor state longer than mice, marked by sustained expression of DL-associated genes. Specifically, rat RG progenitors exhibit prolonged and elevated expression of Wnt signaling genes, particularly Wnt ligands. Comparative analysis of published single-cell RNA-Seq across species highlighted that this extended Wnt-high period in rats is exceptional, suggesting a species-specific extension of a conserved neurogenic program.

      Major comments:

      This reviewer thinks the topic is exciting, and the experiments elegant, insightful and well described. The paper is well written and follows a very logical flow, the conclusion for each experiment is supported by the data and they are carefully stated. This reviewer really appreciated the summary illustration included as a panel in each figure, they think that this greatly enhanced the clarity and accessibility of the data presented, especially because species comparison can be difficult to follow.

      In this reviewer's opinion, there are some aspects of the findings that the authors need to clarify or address to explain the observed phenotype and to enhance the overall significance of this very well-made paper:

      1. The introduction lacks sufficient background and fails to convey the significance of the study: specifically, why the research was undertaken, what knowledge gap it addresses, and how the findings could be applied. Addressing these questions in the introduction would enhance the impact of the work and broaden its readership.
      2. In Figure 5 the authors conclude that "differences in cell cycle kinetics and indirect neurogenesis are unlikely to be the primary factors driving the species-specific variation in DL neuron production. Instead, the temporal regulation of progenitor neurogenic competence, which determines the duration of the DL production phase, provides a more plausible explanation for the greater number of DL subtypes observed in rats". It is not clear to this reviewer how the authors reach this conclusion. The authors observe a significant proportion of mitotic cells in the rat VZ from day 1, and a consistently higher proportion of mitotic progenitors in the rat SVZ compared to mouse (Figure 5C). This points to an early difference in mitotic progenitors that may also lead to increased IP numbers, and potentially an increased number of DL cells, even before day 1. In addition, the higher abundance of IPs in the G2/S phase (statistically significant at 4 of the 7 time points; Figure 5F) suggests that this difference might play a role in the species-specific variation in DL neuron production. The authors should estimate cell cycle length, rather than only measuring proportions, before drawing conclusions about cell cycle kinetics. They could then model growth curves to predict the effect of any differences in cell cycle length between equivalent cell types across species.
      3. In Figure 6 the authors focus only on the mouse and rat datasets. Given the availability of the primate datasets that the authors already used for Figure 7, integrating them into the analysis done for Figure 6, particularly the pseudotime alignment of cortical progenitors, would give the reader a broader perspective. How would the early and late neurogenic phases of human and/or macaque compare to those of mouse and rat in this model?
      4. In Figures 6C and 6D, the authors distinguish between cycling and non-cycling NECs and RGCs. Could the authors clarify the rationale behind making this distinction? Could the authors comment on how they interpret the impact of cycling versus non-cycling states on species-specific non-uniform scaling? Do they consider the observed non-linear correspondences to be driven by differences in cell cycle activity?
      5. For the non-uniform scaling in Figure 6F, the authors identify critical inflection points and mention that "the largest delay in rat progenitors occurring where Day 1 and Day 3 progenitors overlapped". It would be good if the authors could discuss what they think all the inflection points represent. How much can be explained by the heterogeneity within progenitors at each time point? There is a clearly higher spread of the histograms at days 3 and 5, and the histogram at day 5 almost overlaps with day 1. I wonder whether the same conclusion about non-uniform scaling would be reached if the distance matrix were built separately for specific cell types, for example only looking at NECs or RGCs.
      6. The authors conclude that the elevated and prolonged expression of Wnt-ligand genes in rat RGs extends the DL neurogenic window and contributes to the rat-specific expansion of the deep cortical layers. To validate this finding, it would be good for the authors to perform a perturbation experiment and reduce Wnt signaling/Axin2 levels in rats, or deplete the Lmx1a and Lhx2 double-positive population.
      7. The authors conclude that Wnt signaling is a rat-specific effect, since they did not observe any clear temporal change in Wnt receptors in gyrencephalic species, and only a subset of RG in rats co-express Lmx1a and Lhx2. However, specific Wnt ligands and receptors (Wnt5a, Fzd and Lrp6) seem to be upregulated in human as well (Fig 7G), non-RG cells could act as Wnt ligand inducers in other species, and it has not been demonstrated that Lmx1a and Lhx2 are the source of Wnt ligand production. I wonder if the authors can completely rule out a role for Wnt in the protracted neurogenesis of other species.

      Minor comments:

      The RNAscope experiment is currently qualitative. Is the mRNA copy number per cell equal in both species, with more cells positive in rat, or are there also differences in the number of mRNA molecules per cell? It is not indicated whether the RNAscope probes are the same for mouse and rat.

      Significance

      How different species achieve such remarkable differences in brain shape and size remains poorly understood. A critical aspect of this process is the duration of the neurogenic phase: the period during which neural progenitors generate neurons. This phase tends to be extended in species with larger brains and contains multiple neuronal stem cell types in varying proportions. It is thought that this accounts for their increased neuronal numbers. In their search for mechanisms that prolong neurogenesis across species, the authors propose a rat-specific role for Wnt ligands in expanding the neurogenic period in the rat brain. Importantly, they rule out that this mechanism operates in other species, such as primates or ferrets, to achieve similar extensions.

      The study is of high quality, incorporating rigorous lineage-tracing experiments in two species and single-cell RNA sequencing. Previous work established a role for Wnt signaling in regulating early neurogenesis in mice. Here, the authors characterize a novel population of radial glial cells (Lmx1a and Lhx2 double-positive) that may explain increased Wnt ligand secretion in rats. However, functional validation of this mechanism is still lacking. To strengthen its evolutionary relevance, it would be important to determine whether similar effects occur during earlier neural stages in other species (such as neuroepithelium thickening), or whether other cell types have co-opted the proposed Lmx1a-Lhx2 regulatory module in other species.

      From the perspective of a researcher with a stem cell and developmental background focused on neural evo-devo, this manuscript represents a solid and novel contribution. The proposed model of a rat-specific mechanism for extending the neurogenic phase contrasts with the prevailing concept of convergence in mechanisms underlying species-specific cortical development. This raises intriguing questions about how multiple molecular pathways have been co-opted to achieve similar developmental outcomes. Furthermore, we know very little about what determines the duration of specific developmental processes. This work suggests that extended Wnt signaling may account for prolonged neurogenesis in rats compared to mice. Future studies should aim to validate the proposed rat-specific co-option of an Lmx1a-Wnt ligand cascade in cortical radial glia, potentially through relief of Lhx2-mediated repression of Lmx1a.

    1. Calendar Planners and To-Do Lists

      Calendar planners and to-do lists are effective ways to organize your time. Many types of academic planners are commercially available (check your college bookstore), or you can make your own. Some people like a page for each day, and some like a week at a time. Some use computer calendars and planners. Almost any system will work well if you use it consistently.

      Some college students think they don’t need to actually write down their schedule and daily to-do lists. They’ve always kept it in their head before, so why write it down in a planner now? Some first-year students were talking about this one day in a study group, and one bragged that she had never had to write down her calendar because she never forgot dates. Another student reminded her how she’d forgotten a preregistration date and missed taking a course she really wanted because the class was full by the time she went online to register. “Well,” she said, “except for that time, I never forget anything!” Of course, none of us ever forgets anything—until we do.

      Calendars and planners help you look ahead and write in important dates and deadlines so you don’t forget. But it’s just as important to use the planner to schedule your own time, not just deadlines. For example, you’ll learn later that the most effective way to study for an exam is to study in several short periods over several days. You can easily do this by choosing time slots in your weekly planner over several days that you will commit to studying for this test. You don’t need to fill every time slot, or to schedule every single thing that you do, but the more carefully and consistently you use your planner, the more successfully you will manage your time.

      But a planner cannot contain every single thing that may occur in a day. We’d go crazy if we tried to schedule every telephone call, every e-mail, every bill to pay, every trip to the grocery store. For these items, we use a to-do list, which may be kept on a separate page in the planner.

      Check the example of a weekly planner form in Figure 2.5 “Weekly Planner”. (You can copy this page and use it to begin your schedule planning. By using this first, you will find out whether these time slots are big enough for you or whether you’d prefer a separate planner page for each day.)

      Fill in this planner form for next week. First write in all your class meeting times; your work or volunteer schedule; and your usual hours for sleep, family activities, and any other activities at fixed times. Don’t forget time needed for transportation, meals, and so on. Your first goal is to find all the blocks of “free time” that are left over. Remember that this is an academic planner. Don’t try to schedule in everything in your life—this is to plan ahead to use your study time most effectively.

      Next, check the syllabus for each of your courses and write important dates in the planner. If your planner has pages for the whole term, write in all exams and deadlines. Use red ink or a highlighter for these key dates. Write them in the hour slot for the class when the test occurs or when the paper is due, for example. (If you don’t yet have a planner large enough for the whole term, use Figure 2.5 “Weekly Planner” and write any deadlines for your second week in the margin to the right. You need to know what’s coming next week to help schedule how you’re studying this week.)

      Calendar planners and to-do lists help students organize their time and avoid forgetting important dates. Writing schedules down is more reliable than keeping everything in your head, because everyone forgets things sometimes. Planners are not only for deadlines but also for scheduling study time in advance so work is spread out and less stressful. To-do lists are useful for smaller daily tasks that don’t fit into a planner, helping you stay organized without feeling overwhelmed.

    2. Time Management Strategies for Success

      Following are some strategies you can begin using immediately to make the most of your time:

      Prepare to be successful. When planning ahead for studying, think yourself into the right mood. Focus on the positive. “When I get these chapters read tonight, I’ll be ahead in studying for the next test, and I’ll also have plenty of time tomorrow to do X.” Visualize yourself studying well!

      Use your best—and most appropriate—time of day. Different tasks require different mental skills. Some kinds of studying you may be able to start first thing in the morning as you wake, while others need your most alert moments at another time.

      Break up large projects into small pieces. Whether it’s writing a paper for class, studying for a final exam, or reading a long assignment or full book, students often feel daunted at the beginning of a large project. It’s easier to get going if you break it up into stages that you schedule at separate times—and then begin with the first section that requires only an hour or two.

      Do the most important studying first. When two or more things require your attention, do the more crucial one first. If something happens and you can’t complete everything, you’ll suffer less if the most crucial work is done.

      If you have trouble getting started, do an easier task first. Like large tasks, complex or difficult ones can be daunting. If you can’t get going, switch to an easier task you can accomplish quickly. That will give you momentum, and often you feel more confident tackling the difficult task after being successful in the first one.

      If you’re feeling overwhelmed and stressed because you have too much to do, revisit your time planner. Sometimes it’s hard to get started if you keep thinking about other things you need to get done. Review your schedule for the next few days and make sure everything important is scheduled, then relax and concentrate on the task at hand.

      If you’re really floundering, talk to someone. Maybe you just don’t understand what you should be doing. Talk with your instructor or another student in the class to get back on track.

      Take a break. We all need breaks to help us concentrate without becoming fatigued and burned out. As a general rule, a short break every hour or so is effective in helping recharge your study energy. Get up and move around to get your blood flowing, clear your thoughts, and work off stress.

      Use unscheduled times to work ahead. You’ve scheduled that hundred pages of reading for later today, but you have the textbook with you as you’re waiting for the bus. Start reading now, or flip through the chapter to get a sense of what you’ll be reading later. Either way, you’ll save time later. You may be amazed how much studying you can get done during downtimes throughout the day.

      Keep your momentum. Prevent distractions, such as multitasking, that will only slow you down. Check for messages, for example, only at scheduled break times.

      Reward yourself. It’s not easy to sit still for hours of studying. When you successfully complete the task, you should feel good and deserve a small reward. A healthy snack, a quick video game session, or social activity can help you feel even better about your successful use of time.

      Just say no. Always tell others nearby when you’re studying, to reduce the chances of being interrupted. Still, interruptions happen, and if you are in a situation where you are frequently interrupted by a family member, spouse, roommate, or friend, it helps to have your “no” prepared in advance: “No, I really have to be ready for this test” or “That’s a great idea, but let’s do it tomorrow—I just can’t today.” You shouldn’t feel bad about saying no—especially if you told that person in advance that you needed to study.

      Have a life. Never schedule your day or week so full of work and study that you have no time at all for yourself, your family and friends, and your larger life.

      Use a calendar planner and daily to-do list. We’ll look at these time management tools in the next section.

      The main idea of “Time Management Strategies for Success” is that managing your time well is about working smarter, not just harder. This section gives practical, realistic strategies students can use right away to stay productive, reduce stress, and avoid procrastination—while still having a life.

      In simple terms, it teaches you how to:

      Plan ahead with a positive mindset, so studying feels less stressful and more motivating.

      Use your energy wisely by doing tasks at the time of day when you focus best.

      Break big tasks into smaller, manageable pieces to avoid feeling overwhelmed.

      Set priorities, so the most important work gets done first.

      Build momentum by starting with easier tasks when motivation is low.

      Stay flexible by reviewing your schedule when things feel out of control.

      Ask for help when needed, instead of staying stuck and confused.

      Take regular breaks to avoid burnout and stay mentally fresh.

      Use small pockets of free time during the day to get work done early.

      Avoid distractions, especially multitasking, to keep your focus strong.

      Reward yourself after completing tasks to stay motivated.

      Learn to say no to interruptions without feeling guilty.

      Balance work and life, making time for rest, friends, and personal well-being.

      Use planners and to-do lists to stay organized and on track.

    1. Thinking helps in many situations, as we’ve discussed throughout this chapter. When we work out a problem or situation systematically, breaking the whole into its component parts for separate analysis, to come to a solution or a variety of possible solutions, we call that analytical thinking. Characteristics of analytical thinking include setting up the parts, using information literacy, and verifying the validity of any sources you reference. While the phrase analytical thinking may sound daunting, we actually do this sort of thinking in our everyday lives when we brainstorm, budget, detect patterns, plan, compare, work puzzles, and make decisions based on multiple sources of information. Think of all the thinking that goes into the logistics of a dinner-and-a-movie date—where to eat, what to watch, who to invite, what to wear, popcorn or candy—when choices and decisions are rapid-fire, but we do it relatively successfully all the time.

      I like that the reading shows analytical thinking isn’t just for school or work, it’s something we practice all the time in normal life.

    1. Author response:

      We thank all reviewers for their comments. We appreciate the acknowledgement that the paper is important and that results support the major conclusions. We are planning to address the specific concerns as noted by the reviewers in the following way:

      Public Reviews:

      Reviewer #2 (Public review):

      (1) The authors generate a new tool, a Gal4 knock-in of the jam2b locus, to track EGFP-expressing cells over time and follow the developmental trajectory of jam2b-expressing cells. Figure 1 characterizes the line. However, it lacks quantification, e.g., how many etv2-expressing cells also show EGFP expression or the contribution of EGFP-expressing cells to different types of blood vessels. This type of quantification would be useful, as it would also allow for comparison of their findings to their previous data examining the contribution of SVF cells to different types of blood vessels. The authors only state that at 30 hpf, EGFP-expressing cells can be seen in the vasculature (apparently the PCV).

      It is not clear why the authors do not use a nuclear marker for both ECs (as they did in their previous publication) and for jam2b-expressing cells. UAS:nEGFP and UAS:NLS-mcherry (e.g. pt424tg) transgenic lines are available. This would circumvent the problem the authors encounter with the strong fluorescence visible in the yolk extension. It would also facilitate quantifying the contribution of jam2b cells to different types of blood vessels.

      We agree with the importance of quantification. We had performed quantification of jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP contribution to different vascular beds, which was shown in Suppl. Fig. S3. We will clarify this in the revision. We also agree that nuclear GFP or mCherry would help to visualize and quantify cells. Unfortunately, we do not have a nuclear UAS:GFP or UAS:mCherry line in our possession, and importing one would take too long for the standard revision timeline. We are working on the construct and will attempt to establish the line; we therefore hope to clarify these results with the nuclear line in the revised manuscript.

      (2) The time-lapse movie in Figure 2 is not very informative, as it just provides a single example of a dividing cell contributing to the PCV. Also, quantifications are needed. As SVF cells appear to expand significantly after their initial specification, it would be informative to know how many cell divisions jam2b-expressing cells undergo and which types of blood vessels they contribute to. Can the authors observe cells that give rise to different types of blood vessels? Jam2b expression in LPM cells apparently precedes expression of etv2. Is etv2 needed for maintenance, or do Jam2b-expressing cells contribute to different types of tissues in etv2 mutant embryos? Comparing time-lapse analysis in wildtype and etv2 mutant embryos would address this question.

      The time-lapse was meant to serve as an illustration and confirmation of jam2b cell contribution to vasculature. As noted above, Suppl. Fig. S3 provides quantification of jam2b cell contribution to different vascular beds. We had previously performed detailed time-lapse analysis and quantification of SVF cell migration to PCV, SIA and SIV using etv2-2A-Venus line (Metikala et al 2022, Dev Cell), which has some of the same (or similar) information. It is very challenging to obtain this data using jam2b reporter line due to extensive and bright GFP expression in the mesothelial layer over the yolk and yolk extension; for that reason we can only trace some GFP cells but not all of them. Regarding etv2 requirement for jam2b maintenance, we intend to address this question by analyzing jam2b cell contribution in etv2 MO injected embryos, which recapitulates the phenotype in jam2b mutants.

      (3) In Figure 3, the authors generate UAS:Cre and UAS:Cre-ERT2 transgenic lines to lineage trace the jam2b-expressing cells. It is again not clear why the authors do not use a responder line containing nuclear-localized fluorescent proteins to circumvent the strong expression of fluorescent proteins in the yolk extension. It is also unclear why the two transgenic lines give very different results regarding the number of cells being labelled. The ERT2 fusions label around 3 cells in the SIA, while the Cre line labels only about 1.5 cells per embryo, with very little contribution of labelled cells to other blood vessels. One would expect the Cre line requiring tamoxifen induction to label fewer cells when compared to the constitutive Cre line. What is the reason for this discrepancy? Are the lines single integration? Is there silencing? This needs to be better characterized, also regarding the reproducibility of the experiments. If the Cre lines were to be multiple copy integrations, outcrossing the line might lead to lower expression levels in future generations. 

      It is also not clear how the authors conclude from these findings that "SVF cells show major contribution to the SIA and SIV" when only 1.5 or 3 cells of the SIA are labelled, with even fewer cells labelled in other blood vessels. They speculate that this might be due to low recombination efficiency, a question they then set out to answer using photoconversion of etv2:KAEDE expressing cells, an experiment that they also performed in their 2014 and 2022 publications. To check for low recombination efficiency, the authors could examine the expression of Cre mRNA in their transgenic embryos. Do many more jam2b expressing cells express Cre mRNA than they observe in their switch lines? They could also compare their experiments using Cre recombinase with those using EGFP expression in jam2b cells. EGFP is relatively stable, and the time frames the authors analyze are short. As no quantification of EGFP-expressing cells is provided in Figure 1, this comparison is currently not possible. Do these two different approaches answer different questions here? 

      The reviewer raises important points, and we appreciate them. Unfortunately, we do not have a nuclear switch line, and it is not possible to obtain one within the normal manuscript revision timeline. The UAS:Cre and UAS:CreERT2 lines show rather similar labeling, with most labeled cells present in the SIA. The difference in cell number (1.5 versus 3) is likely due to different levels of Cre expression, which may vary depending on the integration site. The lines are most likely multi-copy integrations, which can be helpful because this would result in higher Cre expression. We will address the silencing question by performing in situ hybridization or HCR analysis for Cre or CreERT2 and comparing it with endogenous jam2b expression, as the reviewer suggested. We have noticed that the switch line used, actb2:loxP-BFP-loxP-dsRed, exhibits a lower recombination frequency than other switch lines (we used it because it is compatible with the endothelial fli1:GFP line). We will attempt to answer this question by crossing to other switch lines, which may exhibit higher recombination frequency. In principle, UAS:GFP and switch lines should produce similar results, except that GFP decays over time; our initial expectation was therefore that the switch lines would produce a more accurate result. However, this may not be the case because of low recombination efficiency, which we will attempt to address in the revision.

      (4) Concerning the etv2:KAEDE photoconversion experiments: The percentages the authors report for SVF cells' contribution to the SIV and SIA differ from their previous study (Dev Cell, 2022). In that publication, SVF cells contributed 28% to the SIA and 48% to the SIV. In the present study, the numbers are close to 80% for both vessels. The difference is that the previous study analyzed 2dpf old embryos and the new one 4dpf old embryos. Do SVF-derived cells proliferate more than PCV-derived cells, or is there another explanation for this change in percentage contribution? 

      These numbers refer to different experiments; we apologize for the confusion. As reported in Metikala et al. 2022, 28% of SVF cells contributed to the SIA and 48% to the SIV by 3 dpf (not 2 dpf; only the PCV analysis was done at 2 dpf). The SIA and SIV analysis was based on time-lapse imaging of the etv2-2A-Venus line at 3 dpf, shown in Fig. 3C of Metikala et al. However, these values describe the fate of SVF cells only. They do not mean that 28% or 48% of cells in the SIA or SIV are derived from the SVF. The total fraction of SIA and SIV cells derived from the SVF was not quantified in the previous study, because that would require accurate tracking of all SVF cells, which is experimentally challenging. The etv2:Kaede experiment is slightly different, because it reports cells newly formed after 24 hpf. It cannot tell whether all new cells are derived from SVF cells, although we are not aware of any other source of new endothelial cells at these stages. In Metikala et al. 2022, we reported ~22 newly formed cells in the SIA and ~50 in the SIV by 3 dpf (Fig. 1 of Metikala et al. 2022), but the total number of cells was not quantified, so the percentage was not known. In the current study, we estimated the overall percentage of green-only Kaede cells, which was close to 80% in both the SIA and SIV at 4 dpf. Please note that this estimate was made in the posterior portion of the SIA and SIV that overlies the yolk extension, where SVF cells are observed. We did not quantify cells in the anterior SIV portion, which forms the basket over the yolk.

      (5) Single-cell sequencing data: Why do the authors not show jam2b expression in their single-cell sequencing data? They sorted for (presumably) jam2b-expressing cells and hypothesize that jam2b expression in ECs at this time point is important for the generation of intestinal vasculature. Do ECs in cluster 15 express jam2b? Why are no other top marker genes (tal1, etv2, egfl7, npas4l) included in the dot plot in Figure 5b?

      We appreciate the suggestion and will include additional marker genes as well as jam2b in the revised version of the manuscript.

      (6) Concerns about cell autonomy of mutant phenotypes: The authors need to perform in situ hybridization to characterize jam2a expression. Can it be seen in SVF cells? The double mutants show a clear phenotype in intestinal vessel development; however, it is unclear whether this is due to a cell-autonomous function of jam2a/b within SVF cells. The authors need to address this issue, as jam2b and potentially also jam2a are expressed within the tissue surrounding the forming SVF. For instance, do transplanted mutant cells contribute to the intestinal vasculature to the same extent as wild-type cells do?

      jam2a expression has been characterized in previous studies and is shown in Suppl. Fig. S4E; it is primarily enriched in skeletal muscle. However, our single-cell RNA-seq analysis shows that SVF cells also express jam2a. We will include additional data on jam2a expression in the revised manuscript. We agree that transplantation to address cell autonomy is an important experiment, yet it poses some practical challenges. The jam2a;jam2b mutant phenotype is only partially penetrant: SVF cell number is reduced by about 50%, and the SIA and SIV phenotypes are partial. Only a small number of transplanted cells may contribute to the intestinal vasculature, so given the partial penetrance, differences may be hard to detect. To address the cell-autonomy question, we will try a different approach: we will overexpress jam2b labeled with 2A-mCherry and test whether it can rescue the mutant phenotype in a cell-autonomous manner. Overexpression will be mosaic, with a higher number of labeled cells than in a typical transplantation experiment.

      (7) Finally, the authors analyze the phenotypes of hand2 mutants and their impact on the expression of jam2b and etv2. They observe a reduction in jam2b and etv2 expression in SVF cells. However, they do not show the vascular phenotypes of hand2 mutants. Is the formation of the SIA and SIV disturbed? Is hand2 cell autonomously needed in ECs? The authors suggest that hand2 controls SVF development through the regulation of jam2b. However, they also show that jam2b mutants do not have a phenotype on their own. Clearly, hand2, if it were to be required in ECs, regulates other genes important for SVF development. These might then regulate jam2b expression. The clear linear relationship, as the title suggests, is not convincingly shown by the data.

      As suggested, we will add an analysis of the SIA and SIV in hand2 mutants during the revision. We could not assess this easily because the line was not maintained in the vascular fli1:GFP background. We do not know whether hand2 is required cell-autonomously; this is an important question, but it may be better answered in a separate study. Regarding the hand2-jam2b axis, jam2b expression in the posterior lateral plate mesoderm is clearly and completely lost in hand2 mutants, except for its more anterior domain over the yolk. This supports the idea that hand2 functions upstream of jam2b. However, the relationship is not necessarily direct. We agree that hand2 may regulate additional genes involved in SVF cell development. We will attempt to clarify this relationship and test whether jam2b overexpression can rescue the hand2 mutant phenotype.

      Reviewer #3 (Public review):

      (1) Overall molecular mechanisms of Jam2 function are not fully uncovered in the study. How do the adhesion molecules Jam2a and Jam2b regulate SVF cell formation? Are they responsible for migration, adhesion or fate determination of these structures? The authors should provide a more in-depth study of the jam2a, jam2b mutations and assess the processes affected in these mutants. Combining these mutants with etv2:Kaede can also provide a stronger causative link between their functions and defects in SVF formation.

      Our data argue that the initial SVF cell specification (based on etv2 expression) is reduced in jam2a;jam2b mutants. We do not know whether the migration or fate determination of the remaining SVF cells is also affected, and this may be challenging to answer because only a few SVF cells remain. We agree that further mechanistic studies of jam2a/jam2b function are needed; however, we think these would be better addressed in a separate study. We are currently raising mutants crossed into the fli1:Kaede line, which should confirm that fewer new cells emerge after Kaede photoconversion in jam2a;jam2b mutants.

      (2) Have the authors tested the specificity of the jam2b knock-in reporter line? This is an important experiment, as many of the conclusions derive from lineage tracing and fluorescence reporting from this knock-in line. One suggestion is to cross the jam2b:GFP or jam2b:Gal4, UAS:GFP line to the generated jam2b mutants, and examine the expression pattern of these lines. Considering that the ISH experiment showed lack of jam2b expression, the reporter line should not be expressed in the jam2b mutants.

      We show in Suppl. Fig. 2 that the jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP knock-in line has an expression pattern similar to that of jam2b mRNA detected by in situ hybridization, which argues for its specificity. In the revision, we plan to use HCR analysis to confirm that jam2b mRNA is expressed in the same cells as jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP, as additional evidence for its specificity. Unfortunately, it is not feasible to cross the jam2b knock-in line into jam2b mutants, as the reviewer suggested. Because the knock-in targets the endogenous jam2b locus, which lies very close in the genome to the jam2b promoter deletion carried by the mutants, the recombination frequency would be very low, and we would not obtain the knock-in and the knock-out on the same chromosome.

      (3) The rationale behind the regeneration study is not clear, and the mechanisms underlying the phenotype are not well described. How do the authors explain the phenotype with the impaired regeneration, and what is the significance of this finding as it relates to SVF formation and function? 

      We apologize for this omission. This experiment was described more thoroughly in our previous study (Metikala et al. 2022). In that study, we showed that ablating endothelial cells by MTZ treatment from 6 to 45 hpf removes all vascular endothelial cells except SVF cells, because SVF cells originate later than other endothelial cells. We subsequently showed by time-lapse imaging that these SVF cells can partially form the PCV and intestinal vasculature, helping these vessels regenerate. In the current study, we tested whether jam2a;jam2b double mutants show defects in this vascular regeneration. Indeed, regeneration after cell ablation was reduced, which correlated with the reduction in SVF cell number. This argues that jam2a/b function is required for SVF cell emergence and vascular recovery after endothelial cell ablation. We will provide a better description of this experiment and discuss its interpretation in the revised manuscript.

      (4) The authors need to include representative images of jam2b>CreERT2 with 4-OH activation at different timepoints in Figure 3.

      Yes, thanks for noting this; these images will be included in the revised manuscript.

      (5) The etv2:Kaede photoconversion experiment to show that the majority of intestinal vasculature derives after 24 hours needs to be supplemented with additional data on photoconverted post-24-hour-old endothelial cells, with the expectation that the majority of intestinal endothelial cells at 4 days will then be labeled with red Kaede. In addition, there have been data that show the red Kaede protein is not stable past several days in vivo, and 3 days might be sufficient for the removal or degradation of this photoconverted protein. Thus, the statement that intestinal vasculature forms largely by new vasculogenesis might be too strong based on existing data.

      It is apparent from Fig. 4B that many other vessels, such as the dorsal aorta and many intersegmental vessels show robust red Kaede expression at 4 dpf, arguing that there is sufficient photoconverted Kaede present at this stage, and its degradation is unlikely to be the reason. However, we are planning to include additional control experiments, as suggested by the reviewer, to make this argument stronger.

      (6) To strengthen the claim that hand2 acts upstream of jam2b, the authors can perform combinatorial genetic epistatic analysis and examine whether jam2b mutations worsen hand2 homozygous or heterozygous effects on the SVF. Similarly, overexpressing jam2b might rescue the loss of SVF/etv2 expression in hand2 mutants. 

      We appreciate this suggestion. Epistatic analysis with double mutants, while informative, can be tricky here because of jam2a/jam2b redundancy and the maternal effect. Generating the different combinations of triple mutant lines (jam2a, jam2b, hand2) would take considerable effort, and it is unclear whether double or triple heterozygous embryos would show any defects that clarify their epistatic relationship. Instead, as suggested, we plan to overexpress jam2b in wild-type embryos and hand2 mutants to address this point.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make it obvious the gap they are testing and trying to explain. The work also includes social contexts that move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (Q1) I also have some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.

      We thank the reviewer for this suggestion. Following this comment, we added hierarchical Bayesian estimation. We built a hierarchical model with both group-level (adolescent group and adult group) and individual-level structures for the best-fitting model. Four Markov chains with 4,000 samples each were run, and the model converged well (see Figure supplement 7).

      We then analyzed the posterior parameters for adolescents and adults separately. The results were consistent with those from the MLE analysis (see Figure 2—figure supplement 5). These additional results have been included in the Appendix Analysis section (also see Figure supplement 5 and 7). In addition, we have updated the code and provided the link for reference. We appreciate the reviewer’s suggestion, which improved our analysis.

      (Q2) There was also little discussion of the structure of the Prisoner's Dilemma and the strategy of the game (that defection is always dominant). As a result, the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than because of a lower preference for cooperation if all else were equal.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma.

      However, our computational modeling explicitly addressed this possibility. Model 4 (inequality aversion) captures decisions that are driven purely by self-interest or aversion to unequal outcomes, including a parameter reflecting disutility from advantageous inequality, which represents self-oriented motives. If participants’ behavior were solely guided by the payoff-dominant strategy, this model should have provided the best fit. However, our model comparison showed that Model 5 (social reward) performed better in both adolescents and adults, suggesting that cooperative behavior is better explained by valuing social outcomes beyond payoff structures.
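      To make the payoff-versus-preference argument concrete, here is a minimal sketch assuming the standard Fehr–Schmidt inequality-aversion utility and hypothetical Prisoner's Dilemma payoffs (3/3, 0/5, 5/0, 1/1); these are not the task's actual payoffs, and the authors' Model 4 may be parameterized differently. The function names are illustrative. The sketch shows that with no inequality aversion defection is the best response to either partner choice, whereas a sufficiently large weight on advantageous inequality makes cooperation the best response to a cooperating partner while defection remains the best response to a defecting one.

```python
# Hypothetical PD payoffs (own, partner), indexed by (own_choice, partner_choice).
# Illustrative values only; they are not the payoffs used in the study.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def fehr_schmidt_utility(own, other, alpha, beta):
    """Fehr-Schmidt inequality-averse utility.

    alpha: disutility from disadvantageous inequality (partner earns more).
    beta:  disutility from advantageous inequality (self earns more).
    """
    return own - alpha * max(other - own, 0) - beta * max(own - other, 0)


def best_response(partner_choice, alpha, beta):
    """Return 'C' or 'D', whichever has the higher utility given the partner's choice."""
    utilities = {
        own: fehr_schmidt_utility(*PAYOFFS[(own, partner_choice)], alpha, beta)
        for own in ("C", "D")
    }
    return max(utilities, key=utilities.get)
```

      Under these illustrative payoffs, `best_response("C", 0.0, 0.0)` is `"D"`, but `best_response("C", 0.0, 0.6)` is `"C"`: defecting against a cooperator yields 5 - 0.6 * 5 = 2, less than the 3 from mutual cooperation. Against a defecting partner, cooperating is never preferred for any non-negative weights.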

      Moreover, if adolescents’ lower cooperation reflected a strategic response to the payoff structure, that is, adopting defection as the more rewarding option, then adolescents should show reduced cooperation across all rounds. Instead, adolescents and adults behaved similarly when partners defected, but adolescents cooperated less when partners cooperated and showed little increase in cooperation even after consecutive cooperative responses. This pattern suggests that adolescents’ lower cooperation cannot be explained solely by strategic responses to payoff structures but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded our Discussion to acknowledge this important point and to clarify how the behavioral and modeling results address the reviewer’s concern.

      “Overall, these findings indicate that adolescents’ lower cooperation is unlikely to be driven solely by strategic considerations, but may instead reflect differences in the valuation of others’ cooperation or reduced motivation to reciprocate. Although defection is the payoff-dominant strategy in the Prisoner’s Dilemma, the selective pattern of adolescents’ cooperation and the model comparison results indicate that their reduced cooperation cannot be fully explained by strategic incentives, but rather reflects weaker valuation of social reciprocity.”

      Appraisal & Discussion:

      (Q3) The authors have partially achieved their aims, but I believe the manuscript would benefit from additional methodological clarification, specifically regarding the use of hierarchical model fitting and the inclusion of Bayes Factors, to more robustly support their conclusions. It would also be important to investigate the source of the model confusion observed in two of their models.

      We thank the reviewer for this comment. In the revised manuscript, we have clarified the hierarchical Bayesian modeling procedure for the best-fitting model, including the group- and individual-level structure and convergence diagnostics. The hierarchical approach produced results that fully replicated those obtained from the original maximum-likelihood estimation, confirming the robustness of our findings. Please also see the response to Q1.

      Regarding the model confusion between the inequality aversion (Model 4) and social reward (Model 5) models in the model recovery analysis, both models’ simulated behaviors were best captured by the baseline model. This pattern arises because neither model includes learning or updating processes. Given that our task involves dynamic, multi-round interactions, models lacking a learning mechanism cannot adequately capture participants’ trial-by-trial adjustments, resulting in similar behavioral patterns that are better explained by the baseline model during model recovery. We have added a clarification of this point to the Results:

      “The overlap between Models 4 and 5 likely arises because neither model incorporates a learning mechanism, making them less able to account for trial-by-trial adjustments in this dynamic task.”

      (Q4) I am unconvinced by the claim that failures in mentalising have been empirically ruled out, even though I am theoretically inclined to believe that adolescents can mentalise using the same procedures as adults. While reinforcement learning models are useful for identifying biases in learning weights, they do not directly capture formal representations of others' mental states. Greater clarity on this point is needed in the discussion, or a toning down of this language.

      We sincerely thank the reviewer for this professional comment. We agree that our prior wording regarding adolescents’ capacity to mentalise was somewhat overgeneralized. Accordingly, we have toned down the language in both the Abstract and the Discussion to better align our statements with what the present study directly tests. Specifically, our revisions focus on adolescents’ and adults’ ability to predict others’ cooperation in social learning. This is consistent with the evidence from our analyses examining adolescents’ and adults’ model-based expectations and self-reported scores on partner cooperativeness (see Figure 4). In the revised Discussion, we state:

      “Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in predicting others’ cooperation in social learning”.

      (Q5) Additionally, a more detailed discussion of the incentives embedded in the Prisoner's Dilemma task would be valuable. In particular, the authors' interpretation of reduced adolescent cooperativeness might be reconsidered in light of the zero-sum nature of the game, which differs from broader conceptualisations of cooperation in contexts where defection is not structurally incentivised.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma. However, our behavioral and computational evidence suggests that this pattern cannot be explained solely by strategic responses to payoff structures, but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded the Discussion to acknowledge this point and to clarify how both behavioral and modeling results address the reviewer’s concern (see also our response to Q2).

      (Q6) Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

      We thank the reviewer for the professional comments, which have helped us improve our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.
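      As an illustration of the asymmetric updating described in this summary, here is a minimal sketch assuming a Rescorla–Wagner rule on the expected probability that the partner cooperates, with α+ applied to positive prediction errors and α− to negative ones. The function names are hypothetical, and the choice rule (β) and social-reward term (ω) of the authors' model are deliberately omitted; this shows only the belief-update component under those assumptions.

```python
def update_belief(p_coop, partner_cooperated, alpha_pos, alpha_neg):
    """One Rescorla-Wagner step on the expected probability of partner cooperation.

    alpha_pos scales positive prediction errors (partner more cooperative than
    expected); alpha_neg scales negative ones (partner less cooperative).
    """
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p_coop                      # prediction error
    rate = alpha_pos if delta > 0 else alpha_neg  # asymmetric learning rate
    return p_coop + rate * delta


def belief_trajectory(partner_choices, p0, alpha_pos, alpha_neg):
    """Expected cooperation before the first trial and after each observed partner choice."""
    beliefs = [p0]
    for cooperated in partner_choices:
        beliefs.append(update_belief(beliefs[-1], cooperated, alpha_pos, alpha_neg))
    return beliefs
```

      Starting from 0.5, three consecutive cooperative choices raise the expectation to 0.9375 with α+ = 0.5 but only to about 0.64 with α+ = 0.1. A smaller α+ therefore produces the slow rise in expected cooperation after repeated partner cooperation that a reduced learning rate for cooperative outcomes implies.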

      Strengths:

      (1) Rigorous model comparison and parameter recovery procedure.

      (2) Conceptually comprehensive model space.

      (3) Well-powered samples.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      (Q1) A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of learning rate.

      We thank the reviewer for this thoughtful comment. We agree that social learning from human partners may involve higher-order inferences beyond simple reinforcement learning from non-human sources. To address this, we had previously included such mechanisms in our behavioral modeling. In Model 7 (Social Reward Model with Influence), we tested a higher-order belief-updating process in which participants’ expectations about their partner’s cooperation were shaped not only by the partner’s previous choices but also by the inferred influence of their own past actions on the partner’s subsequent behavior. In other words, participants could adjust their belief about the partner’s cooperation by considering how their partner’s belief about them might change. Model comparison showed that Model 7 did not outperform the best-fitting model, suggesting that incorporating higher-order influence updates added limited explanatory value in this context. As suggested by the reviewer, we have further clarified this point in the revised manuscript.

      Regarding trait-based frameworks, we appreciate the reviewer’s reference to Hackel et al. (2015). That study elegantly demonstrated that learners form relatively stable beliefs about others’ social dispositions, such as generosity, especially when the task structure provides explicit cues for trait inference (e.g., resource allocations and giving proportions). By contrast, our study was not designed to isolate trait learning, but rather to capture how participants update their expectations about a partner’s cooperation over repeated interactions. In this sense, cooperativeness in our framework can be viewed as a trait-like latent belief that evolves as evidence accumulates. Thus, while our model does not include a dedicated trait module that directly modulates learning rates, the belief-updating component of our best-fitting model effectively tracks a dynamic, partner-specific cooperativeness, potentially reflecting a prosocial tendency.

      (Q2) This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.

      We thank the reviewer for the suggestion. Following this comment, we implemented an additional model incorporating a dynamic learning rate based on the magnitude of prediction errors. Specifically, we developed Model 9: Social reward model with Pearce–Hall learning algorithm (dynamic learning rate), in which participants’ beliefs about their partner’s cooperation probability are updated using a Rescorla–Wagner rule with a learning rate dynamically modulated by the Pearce–Hall (PH) error-learning mechanism. In this framework, the learning rate increases following surprising outcomes (larger prediction errors) and decreases as expectations become more stable (see Appendix Analysis section for details).
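      The dynamic learning rate described above can be sketched with one common hybrid Rescorla–Wagner/Pearce–Hall formulation, in which the associability α tracks recent absolute prediction errors. This is a minimal illustration under that assumption, not the exact specification of Model 9; the parameter names `kappa` and `eta`, their default values, and the function names are hypothetical.

```python
def pearce_hall_step(v, alpha, outcome, kappa, eta):
    """One step of a hybrid Rescorla-Wagner / Pearce-Hall update.

    v:       expected probability that the partner cooperates
    alpha:   associability (dynamic learning rate) carried over from the last trial
    outcome: 1.0 if the partner cooperated, else 0.0
    kappa:   fixed scaling of the learning rate
    eta:     how quickly associability tracks recent surprise
    """
    delta = outcome - v                                  # prediction error
    v_new = v + kappa * alpha * delta                    # belief update
    alpha_new = eta * abs(delta) + (1.0 - eta) * alpha   # surprise raises alpha
    return v_new, alpha_new


def run_pearce_hall(outcomes, v0=0.5, alpha0=0.5, kappa=1.0, eta=0.3):
    """Return the final belief and the associability before and after each trial."""
    v, alpha = v0, alpha0
    alphas = [alpha]
    for outcome in outcomes:
        v, alpha = pearce_hall_step(v, alpha, outcome, kappa, eta)
        alphas.append(alpha)
    return v, alphas
```

      For example, `run_pearce_hall([1.0] * 10 + [0.0])` shows α shrinking over ten consecutive cooperative choices as prediction errors fall, then jumping after the single surprising defection. This is the behavior the reviewer's suggestion targets.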

      The results showed that this dynamic learning rate model did not outperform our best-fitting model in either adolescents or adults (see Figure supplement 6). We greatly appreciate the reviewer’s suggestion, which has strengthened the scope of our analysis. We have now added these analyses to the Appendix Analysis section (also Figure Supplement 6) and expanded the Discussion to acknowledge this modeling extension and further discuss its implications.

      (Q3) Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.

      We thank the reviewer for this helpful comment. In addition to the linear analyses, we conducted exploratory analyses to examine potential non-linear relationships between age and the model parameters. Specifically, we fit LMMs with each of the four parameters (α<sup>+</sup>, α<sup>−</sup>, β, and ω) as the outcome. The fixed effects included age, a quadratic age term, and gender, and the random effects included subject-specific random intercepts and random slopes for age and gender. Model comparison using BIC did not indicate improvement for the quadratic models over the linear models for α<sup>+</sup> (ΔBIC<sub>quadratic-linear</sub> = 5.09), α<sup>−</sup> (ΔBIC<sub>quadratic-linear</sub> = 3.04), β (ΔBIC<sub>quadratic-linear</sub> = 3.9), or ω (ΔBIC<sub>quadratic-linear</sub> = 0). Moreover, the quadratic age term was not significant for α<sup>+</sup>, α<sup>−</sup>, or β (all ps > 0.10). For ω, we observed a significant linear age effect (b = 1.41, t = 2.65, p = 0.009) and a significant quadratic age effect (b = −0.03, t = −2.39, p = 0.018; see Author response image 1). This pattern is broadly consistent with the group effect reported in the main text. The shaded area in the figure represents the 95% confidence interval; the interval widens at older ages (≥ 26 years) because fewer participants fall in that range, which limits the robustness of the inferred quadratic effect. Given the limited precision at older ages and the lack of BIC improvement, we did not emphasize the quadratic effect in the revised manuscript and present these results here as exploratory.

      Author response image 1.

      Linear and quadratic model fits showing the relationship between age and the ω parameter, with 95% confidence intervals.
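      To make the ΔBIC<sub>quadratic-linear</sub> logic concrete, here is a minimal sketch that fits linear and quadratic age polynomials by ordinary least squares and compares their Gaussian BIC. It is a simplification under stated assumptions: it ignores the random intercepts and slopes of the LMMs above, and the age range and outcome values in the usage example are hypothetical. A positive ΔBIC (quadratic minus linear) favors the simpler linear model, as reported for α<sup>+</sup>, α<sup>−</sup>, and β.

```python
import math


def _solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_bic(x, y, degree):
    """Fit y ~ polynomial(x, degree) by OLS and return the Gaussian BIC (fixed effects only)."""
    n = len(y)
    k = degree + 1
    design = [[xi ** d for d in range(k)] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coef = _solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2 for row, yi in zip(design, y))
    # Log-likelihood at the maximum-likelihood residual variance rss / n.
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    n_params = k + 1  # polynomial coefficients plus the residual variance
    return n_params * math.log(n) - 2 * loglik


# Hypothetical ages and an outcome with genuine curvature plus a small deterministic wiggle.
ages = list(range(12, 31))
omega_like = [-0.03 * (a - 20) ** 2 + 0.1 * ((a * 7) % 5 - 2) for a in ages]
delta_bic = ols_bic(ages, omega_like, 2) - ols_bic(ages, omega_like, 1)
```

      For these strongly curved hypothetical data ΔBIC is clearly negative; the quadratic model earns its extra parameter only when the drop in residual error outweighs the ln(n) penalty.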

      (Q4) Finally, the two age groups compared - adolescents (high school students) and adults (university students) - differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogenous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.

      We appreciate this comment. Indeed, adolescents (high school students) and adults (university students) differ not only in age but also in sociocultural and socioeconomic backgrounds. In our study, all participants were recruited from Beijing and surrounding regions, which helps minimize large regional and cultural variability. Moreover, we accounted for individual-level random effects and included participants’ social value orientation (SVO) as an individual difference measure.

      Nonetheless, we acknowledge that other contextual factors, such as differences in financial independence, socioeconomic status, and social experience, may also contribute to group differences in cooperative behavior and reward valuation. Although our results are broadly consistent with developmental theories of reward sensitivity and social decision-making, sociocultural influences cannot be entirely ruled out. Future work with more demographically matched samples or with socioeconomic and regional variables explicitly controlled will help clarify the relative contributions of biological and contextual factors. Accordingly, we have revised the Discussion to include the following statement:

      “Third, although both age groups were recruited from Beijing and nearby regions, minimizing major regional and cultural variation, adolescents and adults may still differ in socioeconomic status, financial independence, and social experience. Such contextual differences could interact with developmental processes in shaping cooperative behavior and reward valuation. Future research with demographically matched samples or explicit measures of socioeconomic background will help disentangle biological from sociocultural influences.”

      Reviewer #3 (Public review):

      Summary:

      Wu and colleagues find that in a repeated Prisoner's Dilemma, adolescents, compared to adults, are less likely to increase their cooperation behavior in response to repeated cooperation from a simulated partner. In contrast, after repeated defection by the partner, both age groups show comparable behavior.

      To uncover the mechanisms underlying these patterns, the authors compare eight different models. They report that a social reward learning model, which includes separate learning rates for positive and negative prediction errors, best fits the behavior of both groups. Key parameters in this winning model vary with age: notably, the intrinsic value of cooperating is lower in adolescents. Adults and adolescents also differ in learning rates for positive and negative prediction errors, as well as in the inverse temperature parameter.

      Strengths:

      The modeling results are compelling in their ability to distinguish between learned expectations and the intrinsic value of cooperation. The authors skillfully compare relevant models to demonstrate which mechanisms drive cooperation behavior in the two age groups.

      We thank the reviewer for recognizing our work’s strengths.

      Weaknesses:

      (Q1) Some of the claims made are not fully supported by the data:

      The central parameter reflecting preference for cooperation is positive in both groups. Thus, framing the results as self-interest versus other-interest may be misleading.

      We thank the reviewer for this insightful comment. In the social reward model, the cooperation preference parameter is positive by definition, as defection in the rPDG always yields a +2 monetary advantage regardless of the partner’s action. This positive value represents the additional subjective reward assigned to mutual cooperation (e.g., reciprocity value) that counterbalances the monetary gain from defection. Although the estimated social reward parameter ω was positive, the effective advantage of cooperation is Δ = p × ω − 2. Given participants’ inferred beliefs p, Δ was negative for most trials (p × ω < 2), indicating that the social reward was insufficient to offset the +2 advantage of defection. Thus, both adolescents and adults valued cooperation positively, but adolescents’ smaller ω and weaker responsiveness to sustained partner cooperation suggest a stronger weighting on immediate monetary payoffs.

      In this light, our framing of adolescents as more self-interested derives from their behavioral pattern: even when they recognized sustained partner cooperation and held high expectations of partner cooperation, adolescents showed lower cooperative behavior and reciprocity rewards compared with adults. Whereas adults increased cooperation after two or three consecutive partner cooperations, this pattern was absent among adolescents. We therefore interpret their behavior as relatively more self-interested, reflecting reduced sensitivity to the social reward from mutual cooperation rather than a categorical shift from self-interest to other-interest, as elaborated in the Discussion.
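      The arithmetic behind Δ = p × ω − 2 can be sketched in a few lines. Setting Δ = 0 gives a break-even belief p* = 2 / ω: cooperation has a positive effective advantage only when the expected partner cooperation exceeds p*, and if ω < 2, Δ stays negative for every belief p ≤ 1. The ω values and the function names below are hypothetical illustrations, not fitted estimates.

```python
def cooperation_advantage(p, omega, defect_bonus=2.0):
    """Effective advantage of cooperating: social reward p * omega minus the +2 defection bonus."""
    return p * omega - defect_bonus


def break_even_belief(omega, defect_bonus=2.0):
    """Belief p at which cooperating and defecting are equally valued (p * omega = defect_bonus)."""
    return defect_bonus / omega


# Hypothetical omega values: a smaller social reward weight raises the break-even belief.
for omega in (2.5, 4.0):
    print(omega, break_even_belief(omega), cooperation_advantage(0.6, omega))
```

      With ω = 2.5 the break-even belief is 0.8, so a participant who expects 60% partner cooperation still faces Δ = −0.5; with ω = 4 the break-even belief drops to 0.5 and the same expectation yields a positive Δ. This is why a smaller ω can lower cooperation even when beliefs about the partner are similar.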

      (Q2) It is unclear why the authors assume adolescents and adults have the same expectations about the partner's cooperation, yet simultaneously demonstrate age-related differences in learning about the partner. To support their claim mechanistically, simulations showing that differences in cooperation preference (i.e., the ω parameter), rather than differences in learning, drive behavioral differences would be helpful.

      We thank the reviewer for raising this important point. In our model, both adolescents and adults updated their beliefs about partner cooperation using an asymmetric reinforcement learning (RL) rule. Although adolescents exhibited a higher positive and a lower negative learning rate than adults, the two groups did not differ significantly in their overall updating of partner cooperation probability (Fig. 4a-b). We then examined the social reward parameter ω, which was significantly smaller in adolescents and determined the intrinsic value of mutual cooperation (i.e., p×ω). This variable differed significantly between groups and closely matched the behavioral pattern.

      Following the reviewer’s suggestion, we conducted additional simulations varying one model parameter at a time while holding the others constant. The difference in mean cooperation probability between adults and adolescents served as the index (positive = higher cooperation in adults). As shown in Author response image 2, decreases in ω most effectively reproduced the observed group difference (shaded area), indicating that age-related differences in cooperation are primarily driven by variation in the social reward parameter ω rather than by the other parameters.

      Author response image 2.

      Simulation results showing how variations in each model parameter affect the group difference in mean cooperation probability (Adults – Adolescents). Based on the best-fitting Model 8 and parameters estimated from all participants, each line represents one parameter (i.e., α+, α-, ω, β) systematically varied within the tested range (α±: 0.1–0.9; ω, β: 1–9) while the other parameters were held constant. Positive values indicate higher cooperation in adults. Smaller ω values most strongly reproduced the observed group difference, suggesting that reduced social reward weighting primarily drives adolescents’ lower cooperation.
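      A minimal sketch of a one-parameter-at-a-time sweep of this kind. It assumes an asymmetric Rescorla–Wagner belief update (α+ for positive and α− for negative prediction errors) and a logistic choice rule on Δ = p × ω − 2 with inverse temperature β; the partner schedule, the baseline parameter values, and the function names are hypothetical placeholders rather than the estimates behind Author response image 2. Because the simulated partner's actions are predetermined, beliefs do not depend on the participant's choices, so the expected cooperation rate can be computed without random sampling.

```python
import math


def mean_cooperation(partner_actions, alpha_pos, alpha_neg, omega, beta,
                     p0=0.5, defect_bonus=2.0):
    """Expected cooperation rate under an asymmetric-RL belief and a social-reward choice rule."""
    p = p0
    total = 0.0
    for outcome in partner_actions:
        delta_u = p * omega - defect_bonus                     # advantage of cooperating now
        total += 1.0 / (1.0 + math.exp(-beta * delta_u))       # two-option softmax (logistic)
        pe = outcome - p                                       # prediction error about the partner
        p += (alpha_pos if pe > 0 else alpha_neg) * pe         # asymmetric Rescorla-Wagner update
    return total / len(partner_actions)


def sweep(partner_actions, base, name, values):
    """Vary one parameter over `values` while holding the rest of `base` fixed."""
    return [mean_cooperation(partner_actions, **dict(base, **{name: v})) for v in values]


# Hypothetical schedule and baseline parameters (placeholders, not fitted values).
schedule = [1] * 60 + [0] * 60
base = {"alpha_pos": 0.5, "alpha_neg": 0.5, "omega": 4.0, "beta": 3.0}
omega_curve = sweep(schedule, base, "omega", range(1, 10))
```

      Subtracting two such curves (for example, one per group's parameter set) gives a difference index analogous to the one plotted above. In this sketch, mean cooperation rises strictly with ω whenever β > 0 and beliefs stay above zero, which is the mechanism the simulation result attributes to the group difference.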

      (Q3) Two different schedules of 120 trials were used: one with stable partner behavior and one with behavior changing after 20 trials. While results for order effects are reported, the results for the stable vs. changing phases within each schedule are not. Since learning is influenced by reward structure, it is important to test whether key findings hold across both phases.

      We thank the reviewer for this thoughtful and professional comment. In our GLMM and LMM analyses, we focused on trial order rather than explicitly including the stable vs. changing phase factor, due to concerns about multicollinearity. In our design, phases occur in specific temporal segments, which introduces strong collinearity with trial order. In multi-round interactions, order effects also capture variance related to phase transitions.

      Nonetheless, to directly address this concern, we conducted additional robustness analyses by adding a phase variable (stable vs. changing) to GLMM1, LMM1, and LMM3 alongside the original covariates. Across these specifications, the key findings were replicated (see GLMM<sub>sup</sub>2 and LMM<sub>sup</sub>4–5; Tables 9-11), and the direction and significance of main effects remained unchanged, indicating that our conclusions are robust to phase differences.

      (Q4) The division of participants at the legal threshold of 18 years should be more explicitly justified. The age distribution appears continuous rather than clearly split. Providing rationale and including continuous analyses would clarify how groupings were determined.

      We thank the reviewer for this thoughtful comment. We divided participants at the legal threshold of 18 years for both conceptual and practical reasons grounded in prior literature and policy. In many countries and regions, 18 marks the age of legal majority and is widely used as the boundary between adolescence and adulthood in behavioral and clinical research. Empirically, prior studies indicate that psychosocial maturity and executive functions approach adult levels around this age, with key cognitive capacities stabilizing in late adolescence (Icenogle et al., 2019; Tervo-Clemmens et al., 2023). We have clarified this rationale in the Introduction section of the revised manuscript.

      “Based on legal criteria for majority and prior empirical work, we adopt 18 years as the boundary between adolescence and adulthood (Icenogle et al., 2019; Tervo-Clemmens et al., 2023).”

      We fully agree that the underlying age distribution is continuous rather than sharply divided. To address this, we conducted additional analyses treating age as a continuous predictor (see GLMM<sub>sup</sub>1 and LMM<sub>sup</sub>1–3; Tables S1-S4), which generally replicated the patterns observed with the categorical grouping. Nevertheless, given the limited age range of our sample, the generalizability of these findings to fine-grained developmental differences remains constrained. Therefore, our primary analyses continue to focus on the contrast between adolescents and adults, rather than attempting to model a full developmental trajectory.

      (Q5) Claims of null effects (e.g., in the abstract: "adults increased their intrinsic reward for reciprocating... a pattern absent in adolescents") should be supported with appropriate statistics, such as Bayesian regression.

      We thank the reviewer for highlighting the importance of rigor when interpreting potential null effects. To address this concern, we conducted Bayes factor analyses of the intrinsic reward for reciprocity and reported the corresponding BF<sub>10</sub> for all relevant post hoc comparisons. This approach quantifies the relative evidence for the alternative versus the null hypothesis, thereby providing a more direct assessment of null effects. The analysis procedure is now described in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”
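      For readers unfamiliar with the Cauchy prior scale, the sketch below computes a JZS Bayes factor BF<sub>10</sub> for a two-sample t statistic in the form given by Rouder et al. (2009), with the effect size given a Cauchy(0, r) prior and r = 0.707. We assume, but have not verified here, that this matches what the cited toolbox computes; it is written in plain Python with Simpson's rule instead of a library integrator, and the group sizes and t values in the usage line are hypothetical.

```python
import math


def jzs_bf10_two_sample(t, n1, n2, r=0.707, steps=4000):
    """JZS Bayes factor BF10 for a two-sample t statistic (Cauchy prior on effect size, scale r).

    The Cauchy prior is written as a normal prior whose variance g * r^2 has an
    inverse-gamma(1/2, 1/2) prior on g; the integral over g is done numerically.
    `steps` must be even for Simpson's rule.
    """
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size for a two-sample design
    nu = n1 + n2 - 2             # degrees of freedom

    def integrand(u):
        g = math.exp(u)  # substitute g = e^u so the integration range is finite in practice
        a = 1.0 + n_eff * g * r * r
        lik = a ** -0.5 * (1.0 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1.0 / (2 * g))
        return lik * prior * g  # Jacobian of the substitution: dg = g du

    lo, hi = -15.0, 40.0
    h = (hi - lo) / steps
    s = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * integrand(lo + i * h)
    marginal_alt = s * h / 3
    marginal_null = (1.0 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_alt / marginal_null


bf_null_like = jzs_bf10_two_sample(0.0, 30, 30)    # hypothetical t and group sizes
bf_effect_like = jzs_bf10_two_sample(5.0, 30, 30)
```

      A BF<sub>10</sub> below 1 favors the null: at t = 0 the sketch always returns a value below 1, which is what lets a Bayes factor express positive evidence for the absence of an effect rather than mere non-significance.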

      (Q6) Once claims are more closely aligned with the data, the study will offer a valuable contribution to the field, given its use of relevant models and a well-established paradigm.

      We are grateful for the reviewer’s generous appraisal and insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I commend the authors on a well-structured, clear, and interesting piece of work. I have several questions and recommendations that, if addressed, I believe will strengthen the manuscript.

      We thank the reviewer for commending the organization of our paper.

      (2) Introduction: - Why use a zero-sum (Prisoner's Dilemma; PD) versus a mixed-motive game (e.g. Trust Task) to study cooperation? In a finite set of rounds, the dominant strategy can be to defect in a PD.

      We thank the reviewer for this helpful comment. We agree that both the rationale for using the repeated Prisoner’s Dilemma (rPDG) and the limitations of this framework should be clarified. We chose the rPDG to isolate the core motivational conflict between self-interest and joint welfare, as its symmetric and simultaneous structure avoids the sequential dependencies and accumulation of trust and reputation inherent to asymmetric tasks such as the Trust Game (King-Casas et al., 2005; Rilling et al., 2002).

      Although a finitely repeated rPDG theoretically favors defection, extensive prior research shows that cooperation can still emerge in long repeated interactions when players rely on learning and reciprocity rather than backward induction (Rilling et al., 2002; Fareri et al., 2015). Our design employed 120 consecutive rounds, allowing participants to update expectations about partner behavior and to establish stable reciprocity patterns over time. We have added the following clarification to the Introduction:

      “The rPDG provides a symmetric and simultaneous framework that isolates the motivational conflict between self-interest and joint welfare, avoiding the sequential trust and reputation dynamics characteristic of asymmetric tasks such as the Trust Game (Rilling et al., 2002; King-Casas et al., 2005).”

      (3) Methods:

      Did the participants know how long the PD would go on for?

      Were the participants informed that the partner was real/simulated?

      Were the participants informed that the partner was going to be the same for all rounds?

      We thank the reviewer for the meticulous review, which helped us present the experimental design and reporting details more clearly. We provide the following clarifications:

      I. Participants were not informed of the total number of rounds in the rPDG. This prevented endgame expectations and avoided distraction from counting rounds, which could introduce additional effects.

      II. Participants were told that their partner was another human participant in the laboratory. However, the partner’s behavior was predetermined by a computer program. This design enabled tighter experimental control and ensured consistent conditions across age groups, supporting valid comparisons.

      III. Participants were informed that they would interact with the same partner across all rounds, consistent with the multi-round interaction paradigm and stabilizing partner-related expectations.

      For transparency, we have clarified these points in the Methods and Materials section:

      “Participants were told that their partner was another human participant in the laboratory and that they would interact with the same partner across all rounds. However, in reality, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. Participants were not informed of the total number of rounds in the rPDG.”

      (4) The authors mention that an SVO was also recorded to indicate participant prosociality. Where are the results of this? Did this track game play at all? Could cooperativeness be explained broadly as an SVO preference that penetrated into game-play behaviour?

      We thank the reviewer for pointing this out. We agree that individual differences in prosociality may shape cooperative behavior, so we conducted additional analyses incorporating SVO. Specifically, we extended GLMM1 and LMM3 by adding the measured SVO as a fixed effect with random slopes, yielding GLMM<sub>sup</sub>3 and LMM<sub>sup</sub>6 (Tables 12–13). The results showed that higher SVO was associated with greater cooperation, whereas its effect on the reward for reciprocity was not significant. Importantly, the primary findings remained unchanged after controlling for SVO. These results indicate that cooperativeness in our task cannot be explained solely by a broad SVO preference, although a more prosocial orientation was associated with greater cooperation. We have reported these analyses and results in the Appendix Analysis section.

      (5) Why was AIC chosen rather than BIC to compare model dominance?

      We apologize for the lack of clarity. Both the Akaike Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978) are information-theoretic criteria for model comparison, and neither requires the compared models to be nested within one another (Burnham et al., 2002). We have added the following clarification to the Methods:

      “We chose to use the AICc as the metric of goodness-of-fit for model comparison for the following statistical reasons. First, BIC is derived based on the assumption that the “true model” must be one of the models in the limited model set one compares (Burnham et al., 2002; Gelman & Shalizi, 2013), which is unrealistic in our case. In contrast, AIC does not rely on this unrealistic “true model” assumption and instead selects the model that has the highest predictive power in the model set (Gelman et al., 2014). Second, AIC is also more robust than BIC for finite sample sizes (Vrieze, 2012).”
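      The three criteria differ only in their complexity penalty, which a short sketch makes explicit: AIC = 2k − 2 ln L, AICc adds the small-sample correction 2k(k + 1)/(n − k − 1), and BIC = k ln(n) − 2 ln L. In the usage line we assume, for illustration only, a hypothetical log-likelihood for one participant's 120 trials and a four-parameter model (α+, α−, ω, β); the numbers are not taken from the manuscript.

```python
import math


def information_criteria(log_lik, k, n):
    """AIC, AICc and BIC for a model with k free parameters fitted to n observations."""
    if n <= k + 1:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_lik
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * log_lik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical fit: log-likelihood of -60 for 120 choices with 4 free parameters.
scores = information_criteria(-60.0, k=4, n=120)
```

      With n = 120, BIC charges about ln(120) ≈ 4.79 per parameter versus 2 for AIC (plus a correction of roughly 0.35 in total for AICc here), so BIC tends to favor smaller models in this design.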

      (6) I believe the model fitting procedure might benefit from hierarchical estimation, rather than maximum likelihood methods. Adolescents in particular seem to show multiple outliers in a^+ and w^+ at the lower end of the distributions in Figure S2. There are several packages to allow hierarchical estimation and model comparison in MATLAB (which I believe is the language used for this analysis; see https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007043).

      We thank the reviewer for this helpful comment and for referring us to relevant methodological work (Piray et al., 2019). We have addressed this point by incorporating hierarchical Bayesian estimation, which mitigates outlier effects and improves model identifiability. The results replicated those obtained with MLE fitting and further revealed group-level differences in key parameters. Please see our detailed response to Reviewer #1 Q1 for the full description of this analysis and results.

      (7) Results: Model confusion seems to show that the inequality aversion and social reward models were consistently confused with the baseline model. Is this explained or investigated? I could not find an explanation for this.

      The apparent overlap between the inequality aversion (Model 4) and social reward (Model 5) models in the recovery analysis likely arises because neither model includes a learning mechanism, making them unable to capture trial-by-trial adjustments in this dynamic task. Consequently, both were best fit by the baseline model. Please see Response to Reviewer #1 Q3 for related discussion.

      (8) Figures 3e and 3f show the correlation between asymmetric learning rates and age. It seems that both a^+ and a^- are around 0.35-0.40 for young adolescents, and this becomes more polarised with age. Could it be that with age comes an increasing discernment of positive and negative outcomes on beliefs, and younger ages compress both positive and negative values together? Given the higher stochasticity in younger ages (\beta), it may also be that these values simply represent higher uncertainty over how to act in any given situation within a social context (assuming the differences in groups are true).

      We appreciate this insightful interpretation. Indeed, both α+ and α- cluster around 0.35–0.40 in younger adolescents and become increasingly polarized with age, suggesting that sensitivity to positive versus negative feedback is less differentiated early in development and becomes more distinct over time. This interpretation remains tentative and warrants further validation.

      We also clarify that in our model β denotes the inverse temperature parameter; higher β reflects greater choice precision and value sensitivity, not higher stochasticity. Accordingly, adolescents showed higher β values, indicating more value-based and less exploratory choices, whereas adults displayed relatively greater exploratory cooperation. These group differences were also replicated using hierarchical Bayesian estimation (see Response to Reviewer #1 Q1). In response to this comment, we have added a statement in the Discussion highlighting this developmental interpretation.

      “Together, these findings suggest that the differentiation between positive and negative learning rates changes with age, reflecting more selective feedback sensitivity in development, while higher β values in adolescents indicate greater value sensitivity. This interpretation remains tentative and requires further validation in future research.”
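      The role of β as an inverse temperature can be seen in a two-option softmax, which we assume here reduces to a logistic function of the utility difference Δ between cooperating and defecting. The Δ values used below are hypothetical; the point is only that a larger β sharpens choices around the same values.

```python
import math


def p_cooperate(delta_u, beta):
    """Two-option softmax: probability of cooperating given the utility advantage delta_u."""
    return 1.0 / (1.0 + math.exp(-beta * delta_u))


# Same hypothetical value differences, two levels of inverse temperature.
for beta in (1.0, 5.0):
    print(beta, round(p_cooperate(0.5, beta), 3), round(p_cooperate(-0.5, beta), 3))
```

      At β = 1 a ±0.5 advantage yields cooperation probabilities of about 0.62 and 0.38, whereas at β = 5 the same advantage yields about 0.92 and 0.08. Higher β therefore means choices track value more closely (greater precision), not more randomness.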

      (9) A parameter partial correlation matrix (off-diagonal) would be helpful to understand the relationship between parameters in both adolescents and adults separately. This may provide a good overview of how the model properties may change with age (e.g. a^+'s relation to \beta).

      We thank the reviewer for this helpful comment. We fully agree that a parameter partial correlation matrix can further elucidate the relationships among parameters. Accordingly, we conducted a partial correlation analysis and added the visually presented results to the revised manuscript as Figure 2-figure supplement 4.
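      One standard way to obtain such a matrix, sketched here under the assumption that each row holds one participant's fitted parameters (for example α+, α−, ω, β) and that the analysis is run separately per age group, is to invert the correlation matrix and rescale its off-diagonal entries: r<sub>ij·rest</sub> = −P<sub>ij</sub> / √(P<sub>ii</sub> P<sub>jj</sub>). The helper names are illustrative, and we do not claim this is the exact procedure behind the figure supplement.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (one row per participant)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(k)] for i in range(k)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small k x k matrix)."""
    k = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [v / d for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[k:] for row in a]


def partial_correlations(rows):
    """Partial correlation of each pair of columns, controlling for all remaining columns."""
    prec = invert(correlation_matrix(rows))
    k = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(k)] for i in range(k)]
```

      With only two parameters the partial correlation reduces to the ordinary Pearson correlation; with four, each off-diagonal cell shows the association between two parameters (for example α+ and β) after removing what both share with the other two.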

      (10) It would be helpful to have Bayes Factors reported with each statistical test given that several p-values fall between 0.01 and 0.10.

      We thank the reviewer for this important recommendation. We have conducted Bayes factor analyses and reported BF<sub>10</sub> for all relevant post hoc comparisons. We also clarified our analysis in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      (11) Discussion: I believe the language around ruling out failures in mentalising needs to be toned down. RL models do not enable formal representational differences required to assess mentalising, but they can distinguish biases in value learning, which in itself is interesting. If the authors were to show that more complex 'ToM-like' Bayesian models were beaten by RL models across the board, and this did not differ across adults and adolescents, there would be a stronger case to make this claim. I think the authors either need to include Bayesian models in their comparison, or tone down their language on this point, and/or suggest ways in which this point might be more thoroughly investigated (e.g., using structured models on the same task and running comparisons: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087619).

      We thank the reviewer for the comments. Please see our response to Reviewer #1 (Appraisal & Discussion section) for details.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors may want to show the winning model earlier (perhaps near the beginning of the Results section, when model parameters are first mentioned).

      We thank the reviewer for this suggestion. We agree that highlighting the winning model early improves clarity. The winning model is already introduced before the Results section; specifically, in the penultimate paragraph of the Introduction we state:

      “We identified the asymmetric RL learning model as the winning model that best explained the cooperative decisions of both adolescents and adults.”

      Reviewer #3 (Recommendations for the authors):

      (1) In addition to the points mentioned above, I suggest the following:

      Clarify plots by clearly explaining each variable. In particular, the indices 1 vs. 1,2 vs 1,2,3 were not immediately understandable.

      We thank the reviewer for this suggestion. We agree that the indices were not immediately clear. We have revised the captions of Figures 1 and 4 to define these terms explicitly:

      “The x-axis represents the consistency of the partner’s actions in previous trials (t<sub>−1</sub>: last trial; t<sub>−1,2</sub>: last two trials; t<sub>−1,2,3</sub>: last three trials).”

      (2) It's unclear why the index stops at 3. If this isn't the maximum possible number of consecutive cooperation trials, please consider including all relevant data, as adolescents might show a trend similar to adults over more trials.

      We thank the reviewer for raising this point. In our exploratory analyses, we also examined longer streaks of consecutive partner cooperation or defection (up to four or five trials). Two empirical considerations led us to set the cutoff at three in the final analyses. First, the influence of partner behavior diminished sharply with temporal distance. In both GLMMs and LMMs, coefficients for earlier partner choices were small and unstable, and their inclusion substantially increased model complexity and multicollinearity. This recency pattern is consistent with learning and decision models emphasizing stronger weighting of recent evidence (Fudenberg & Levine, 2014; Fudenberg & Peysakhovich, 2016). Second, streaks longer than three were rare, especially among some participants, leading to data sparsity and inflated uncertainty. Including these sparse conditions risked biasing group estimates rather than clarifying them. Balancing informativeness and stability, we therefore restricted the index to three consecutive partner choices in the main analyses, which we believe sufficiently capture individuals’ general tendencies in reciprocal cooperation.
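      The streak index and its cutoff can be sketched as follows. The function counts how many consecutive trials immediately before trial t the partner chose a given action, capped at three, and a companion helper tallies uncapped streak lengths to show how quickly longer streaks become sparse. Whether the t<sub>−1</sub>, t<sub>−1,2</sub>, and t<sub>−1,2,3</sub> conditions are nested or mutually exclusive is not specified here, so the sketch only returns the capped length; the names and the example sequence are hypothetical.

```python
from collections import Counter


def streak_length(partner_actions, t, action, max_len=3):
    """Consecutive trials immediately before trial t (0-indexed) on which the partner chose
    `action`, capped at max_len."""
    n = 0
    while n < max_len and t - 1 - n >= 0 and partner_actions[t - 1 - n] == action:
        n += 1
    return n


def streak_histogram(partner_actions, action):
    """Count trials by the uncapped length of the streak of `action` that precedes them."""
    counts = Counter()
    for t in range(1, len(partner_actions)):
        counts[streak_length(partner_actions, t, action, max_len=len(partner_actions))] += 1
    return counts


# Hypothetical partner sequence: 1 = cooperate, 0 = defect.
seq = [1, 1, 0, 1, 1, 1, 1]
capped = streak_length(seq, 7, action=1)  # four cooperations precede trial 7, capped at 3
```

      Running `streak_histogram` on a real schedule would show how many trials support each streak length, which is the sparsity argument for stopping at three.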

      (3) The term "reciprocity" may not be necessary. Since it appears to reflect a general preference for cooperation, it may be clearer to refer to the specific behavior or parameter being measured. This would also avoid confusion, especially since adolescents do show negative reciprocity in response to repeated defection.

      We thank you for this comment. In our work, we compute the intrinsic reward for reciprocity as p × ω, where p is the partner cooperation expectation and ω is the cooperation preference. In the rPDG, this value framework manifests as a reciprocity-derived reward: sustained mutual cooperation maximizes joint benefits, and the resulting choice pattern reflects a value for reciprocity, contingent on the expected cooperation of the partner. This quantity enters the trade-off between U<sub>cooperation</sub> and U<sub>defection</sub> and captures the participant’s intrinsic reward for reciprocity versus the additional monetary payoff of defection. Therefore, we consider “reciprocity” an appropriate term for this construct.

      (4) Interpretation of parameters should closely reflect what they specifically measure.

      We thank the reviewer for pointing this out. We have refined the relevant interpretations of parameters in the current Results and Discussion sections.

      (5) Prior research has shown links between Theory of Mind (ToM) and cooperation (e.g., Martínez-Velázquez et al., 2024). It would be valuable to test whether this also holds in your dataset.

      We thank the reviewer for this thoughtful comment. Although we did not directly measure participants’ ToM, our design allowed us to estimate participants’ trial-by-trial inferences (i.e., expectations) about their partner’s cooperation probability. We therefore treat these cooperation expectations as an indirect index of belief inference, which is related to ToM processes. To test whether this belief-inference component relates to cooperation in our dataset, we conducted an exploratory analysis (GLMM<sub>sup</sub>4) in which participants’ choices were regressed on their cooperation expectations, group, and the group × cooperation-expectation interaction, controlling for trial number and gender, with random effects. Consistent with the ToM–cooperation link in prior research (Martínez-Velázquez et al., 2024), participants’ expectations about their partner’s cooperation significantly predicted their cooperative behavior (Table 14), suggesting that decisions were shaped by social learning about others’ inferred actions. Moreover, the interaction between group and cooperation expectation was not significant, indicating that this inference-driven social learning process likely operates similarly in adolescents and adults. This aligns with our primary modeling results showing that both age groups update beliefs via an asymmetric learning process. We have reported these analyses in the Appendix Analysis section.

      (6) More informative table captions would help the reader. Please clarify how variables are coded (e.g., is female = 0 or 1? Is adolescent = 0 or 1?), to avoid the need to search across the manuscript for this information.

      We thank the reviewer for raising this point. We have added clear and standardized variable coding in the table notes of all tables to make them more informative and avoid the need to search the paper. We have ensured consistent wording and formatting across all tables.

      (7) I hope these comments are helpful and support the authors in further strengthening their manuscript.

      We thank the three reviewers for their comments, which have been helpful in strengthening this work.

      References

      (1) Fudenberg, D., & Levine, D. K. (2014). Recency, consistent learning, and Nash equilibrium. Proceedings of the National Academy of Sciences of the United States of America, 111(Suppl. 3), 10826–10829. https://doi.org/10.1073/pnas.1400987111

      (2) Fudenberg, D., & Peysakhovich, A. (2016). Recency, records, and recaps: Learning and nonequilibrium behavior in a simple decision problem. ACM Transactions on Economics and Computation, 4(4), Article 23, 1–18. https://doi.org/10.1145/2956581

      (3) Hackel, L., Doll, B., & Amodio, D. (2015). Instrumental learning of traits versus rewards: Dissociable neural correlates and effects on choice. Nature Neuroscience, 18, 1233–1235. https://doi.org/10.1038/nn.4080

      (4) Icenogle, G., Steinberg, L., Duell, N., Chein, J., Chang, L., Chaudhary, N., Di Giunta, L., Dodge, K. A., Fanti, K. A., Lansford, J. E., Oburu, P., Pastorelli, C., Skinner, A. T., Sorbring, E., Tapanya, S., Uribe Tirado, L. M., Alampay, L. P., Al-Hassan, S. M., Takash, H. M. S., & Bacchini, D. (2019). Adolescents’ cognitive capacity reaches adult levels prior to their psychosocial maturity: Evidence for a “maturity gap” in a multinational, cross-sectional sample. Law and Human Behavior, 43(1), 69–85. https://doi.org/10.1037/lhb0000315

      (5) Krekelberg, B. (2024). Matlab Toolbox for Bayes Factor Analysis (v3.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.13744717

      (6) Martínez-Velázquez, E. S., Ponce-Juárez, S. P., Díaz Furlong, A., & Sequeira, H. (2024). Cooperative behavior in adolescents: A contribution of empathy and emotional regulation? Frontiers in Psychology, 15, 1342458. https://doi.org/10.3389/fpsyg.2024.1342458

      (7) Tervo-Clemmens, B., Calabro, F. J., Parr, A. C., et al. (2023). A canonical trajectory of executive function maturation from adolescence to adulthood. Nature Communications, 14, 6922. https://doi.org/10.1038/s41467-023-42540-8

      (8) King-Casas, B., Tomlin, D., Anen, C., Camerer, C. F., Quartz, S. R., & Montague, P. R. (2005). Getting to know you: reputation and trust in a two-person economic exchange. Science, 308(5718), 78-83. https://doi.org/10.1126/science.1108062

      (9) Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35(2), 395-405. https://doi.org/10.1016/s0896-6273(02)00755-9

      (10) Fareri, D. S., Chang, L. J., & Delgado, M. R. (2015). Computational substrates of social value in interpersonal collaboration. Journal of Neuroscience, 35(21), 8170-8180. https://doi.org/10.1523/JNEUROSCI.4775-14.2015

      (11) Akaike, H. (2003). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. https://doi.org/10.1109/TAC.1974.1100705

      (12) Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

      (13) Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer. https://doi.org/10.1007/b97636

      (14) Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x

      (15) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018

      (16) Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Psychological Methods, 17(2), 228–243. https://doi.org/10.1037/a0027127

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work by Reitz, Z. L. et al. developed an automated tool for high-throughput identification of microbial metallophore biosynthetic gene clusters (BGCs) by integrating knowledge of chelating moiety diversity and transporter gene families. The study aimed to create a comprehensive detection system combining chelator-based and transporter-based identification strategies, validate the tool through large-scale genomic mining, and investigate the evolutionary history of metallophore biosynthesis across bacteria.

      Major strengths include providing the first automated, high-throughput tool for metallophore BGC identification, representing a significant advancement over manual curation approaches. The ensemble strategy effectively combines complementary detection methods, and experimental validation using HPLC-HRMS strengthens confidence in computational predictions. The work pioneers a global analysis of metallophore diversity across the bacterial kingdom and provides a valuable dataset for future computational modeling.

      Some limitations merit consideration. First, ground truth datasets derived from manual curation may introduce selection bias toward well-characterized systems, potentially affecting performance assessment accuracy. Second, the model's dependence on known chelating moieties and transporter families constrains its ability to detect novel metallophore architectures, limiting discovery potential in metagenomic datasets. Third, while the proposed evolutionary hypothesis is internally consistent, it lacks direct validation and remains speculative without additional phylogenetic studies.

      The authors successfully achieved their stated objectives. The tool demonstrates robust performance metrics and practical utility through large-scale application to representative genomes. Results strongly support their conclusions through rigorous validation, including experimental confirmation of predicted metallophores via HPLC-HRMS analysis.

      The work provides a significant and immediate impact by enabling the transition from labor-intensive manual approaches to automated screening. The comprehensive phylogenetic framework advances understanding of bacterial metal acquisition evolution, informing future studies on microbial metal homeostasis. Community utility is substantial, since the tool and accompanying dataset create essential resources for comparative genomics, algorithm development, and targeted experimental validation of novel metallophores.

      We thank the reviewer for their valuable feedback. We appreciate the positive words, and agree with their listed limitations. Regarding the following comment:

      “Third, while the proposed evolutionary hypothesis is internally consistent, it lacks direct validation and remains speculative without additional phylogenetic studies.”

      We agree that additional phylogenetic analyses are needed in future studies. For the revised manuscript, we have validated our evolutionary hypotheses by additionally analyzing two gene families using the likelihood-based tool AleRax, which implements a probabilistic DTL model. The results were consistent with the eMPRess parsimony-based reconstructions, showing comparable patterns of rare duplication, moderate gene loss, and extensive horizontal transfer. Both methods identified similar lineages as the most probable origin and major recipients of transfer events. This agreement between independent reconciliation frameworks supports the reliability of our evolutionary conclusions. We have added a statement referencing this cross-method validation in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study presents a systematic and well-executed effort to identify and classify bacterial NRP metallophores. The authors curate key chelator biosynthetic genes from previously characterized NRP-metallophore biosynthetic gene clusters (BGCs) and translate these features into an HMM-based detection module integrated within the antiSMASH platform.

      The new algorithm is compared with a transporter-based siderophore prediction approach, demonstrating improved precision and recall. The authors further apply the algorithm to large-scale bacterial genome mining and, through reconciliation of chelator biosynthetic gene trees with the GTDB species tree using eMPRess, infer that several chelating groups may have originated prior to the Great Oxidation Event.

      Overall, this work provides a valuable computational framework that will greatly assist future in silico screening and preliminary identification of metallophore-related BGCs across bacterial taxa.

      Strengths:

      (1) The study provides a comprehensive curation of chelator biosynthetic genes involved in NRP-metallophore biosynthesis and translates this knowledge into an HMM-based detection algorithm, which will be highly useful for the initial screening and annotation of metallophore-related BGCs within antiSMASH.

      (2) The genome-wide survey across a large bacterial dataset offers an informative and quantitative overview of the taxonomic distribution of NRP-metallophore biosynthetic chelator groups, thereby expanding our understanding of their phylogenetic prevalence.

      (3) The comparative evolutionary analysis, linking chelator biosynthetic genes to bacterial phylogeny, provides an interesting and valuable perspective on the potential origin and diversification of NRP-metallophore chelating groups.

      We greatly appreciate these comments.

      Weaknesses:

      (1) Although the rule-based HMM detection performs well in identifying major categories of NRP-metallophore biosynthetic modules, it currently lacks the resolution to discriminate between fine-scale structural or biochemical variations among different metallophore types.

      We agree that this is a current limitation to the methodology. More specific metallophore structural prediction is among our future goals for antiSMASH. We have added a statement to this effect in the conclusion.

      (2) While the comparison with the transporter-based siderophore prediction approach is convincing overall, more information about the dataset balance and composition would be appreciated. In particular, specifying the BGC identities, source organisms, and Gram-positive versus Gram-negative classification would improve transparency. In the supplementary tables, the "Just TonB" section seems to include only BGCs from Gram-negative bacteria - if so, this should be clearly stated, as Gram type strongly influences siderophore transport systems.

      The reviewer raises good points here. An additional ZIP file containing all BGCs used for the manual curation was inadvertently left out of the supplemental dataset for the first version of the manuscript. We have added columns with source organisms and Gram stain (retrieved from BacDive) to Table S2. F1 scores were similar for the Gram-positive and Gram-negative subsets, as seen in the new Table S2.

      We thank the reviewer for suggesting this additional analysis, and have added a brief statement in the revised manuscript.

      The “Just TonB” section (in which we tested the performance of requiring TonB without another transporter) was not used for the manuscript. We will preserve it in the revised Table S2 for transparency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In line 43:

      "excreted" should be replaced by "secreted".

      Done.

      (2) In lines 158-159:

      "we manually predicted metallophore production among a large set of BGCs."

      If they are first "annotated with default antiSMASH v6.1", then it is not entirely manual, right? I would suggest making this sentence clearer.

      We have revised the language.

      (3) In lines 165-169:

      It would be good to show the confusion matrix of these results.

      The confusion matrices are found in Table S2, columns AL-AR.
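      For readers who want to recompute the F1 scores from those confusion matrices (for example, separately for the Gram-positive and Gram-negative subsets), the arithmetic is sketched below in Python. This is a minimal illustration, not the authors' analysis code; the counts used are hypothetical and are not taken from Table S2.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts for a BGC detection method."""
    tp: int  # metallophore BGCs correctly flagged
    fp: int  # non-metallophore BGCs incorrectly flagged
    fn: int  # metallophore BGCs that were missed

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall, equal to 2TP / (2TP + FP + FN).
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only.
example = Confusion(tp=80, fp=10, fn=20)
print(f"precision={example.precision():.3f} "
      f"recall={example.recall():.3f} F1={example.f1():.3f}")
```

      True negatives are not needed for F1, which is one reason it is a convenient summary when non-metallophore BGCs vastly outnumber metallophore BGCs.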

      (4) In Table 1:

      Method names (AntiSMASH rules/Transporter genes) could be misleading, since they are all AntiSMASH-based, right?

      We have adjusted the methods to clarify that while the transporter genes were detected using a modified version of antiSMASH, they are not related to our chelator-based detection rule (which is now correctly singular throughout the text).

      (5) Line 198:

      There are accidental spaces and characters inserted here.

      We could not find any accidental spaces and characters here.

      (6) Line 209:

      "In total, 3,264 NRP metallophore BGC regions were detected"

      Is this number correct? I don't see a correspondence in Table 1.

      We have added the following sentence to the Table 1 legend: “An additional 54 BGC regions were detected as NRP metallophores without meeting the requirements for the antiSMASH NRPS rule.”

      (7) Line 294:

      "From B. brennerae, we identified four catecholic compounds"

      From the bacterial cells or the culture supernatant? I think it is important to state this in a more precise way. If it is from the supernatant, it could be from EVs.

      We state in line 292 that “organic compounds were extracted from the culture supernatants”. As our goal was only to confirm the ability of the strains to produce the predicted metallophores, the precise localization (including cell pellet or EVs) was not explored.

      (8) Lines 349-357:

      These results would benefit greatly from a visualization strategy.

      Thank you, we have added a reference to the existing visualization in Fig. 5, Ring C.

      (9) Lines 452-454:

      How could clusters be de-replicated? Is there an identity equivalence scheme or similarity metric?

      The BGC regions were de-replicated with BiG-SCAPE, which uses multiple similarity metrics as described in Navarro-Muñoz et al., 2020. Clusters could be dereplicated further using a stricter cutoff.

      (10) Line 457:

      "relatively low number of published genomes."

      Could metagenome-assembled genomes help in that matter?

      This is a good question, but we find that MAGs are usually too fragmented to yield complete NRPS BGC regions. We’ve added additional sentences earlier in the discussion: “Detection rates were also lower for fragmented genomes; unfortunately, this limitation (inherent to antiSMASH itself) may hinder the identification of metallophore biosynthesis in metagenomes. As long-read sequencing of metagenomes becomes more common, we expect that detection will improve.”

      (11) Lines 514-515:

      "Adequately-performing pHMMs for Asp and His β-hydroxylase subtypes could not be constructed using the above method."

      What is the overall impact of this discrepancy in the methodology for these specific groups?

      The phylogeny-based methodology was used to reduce false positives. We expect this method will have improved precision at the possible expense of recall.

      (12) Lines 543-545:

      "RefSeq representative bacterial genomes were dereplicated at the genus level using R, randomly selecting one genome for each of the 330 genera determined by GTDB"

      Isn't it more of a random sampling than a dereplication? Dereplication would involve methods such as ANI computation.

      You are correct; we have adjusted the language to clarify.
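      To make the distinction concrete: the procedure is a stratified random subsample (one genome drawn per GTDB genus), not a similarity-based dereplication such as ANI clustering. The authors used R; the Python sketch below is only an illustration of the sampling logic, and the accessions, genus labels, and the function name `sample_one_per_genus` are hypothetical.

```python
import random
from collections import defaultdict


def sample_one_per_genus(genomes, seed=0):
    """Return {genus: accession}, drawing one genome at random per genus.

    `genomes` is an iterable of (accession, genus) pairs. No sequence
    similarity is computed; this is random sampling within each genus.
    """
    by_genus = defaultdict(list)
    for accession, genus in genomes:
        by_genus[genus].append(accession)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    # Sorting genera and accessions makes the draw independent of input order.
    return {genus: rng.choice(sorted(accs))
            for genus, accs in sorted(by_genus.items())}


# Hypothetical accessions and genus assignments.
genomes = [
    ("GCF_A1", "Streptomyces"),
    ("GCF_A2", "Streptomyces"),
    ("GCF_B1", "Pseudomonas"),
    ("GCF_C1", "Bacillus"),
    ("GCF_C2", "Bacillus"),
]
subsample = sample_one_per_genus(genomes)
print(subsample)
```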

      (13) Lines 559-560: "were filtered to remove clusters on contig edges."

      This sentence is confusing because networks will be mentioned soon, and they also have edges (not the edges mentioned here), and they could also be clustered (not the clusters mentioned here). Is there a way to make the terminology clearer?

      Thank you, we have adjusted the text to read “BGC regions on contig boundaries”.

      (14) Line 560:

      "The resulting 2,523 BGC regions, as well as 78 previously reported BGCs "

      How many were there before filtering?

      We have added the number: 3,264.

      (15) Lines 579-580:

      Confusing terminology, as mentioned in Lines 559-560.

      Adjusted as above.

      General comments and questions:

      An objective suggestion to enrich the discussion is to address the role of bacterial extracellular vesicles (EVs) as metallophore carriers. Studies show that EVs, such as outer membrane vesicles, can transport siderophores or other metallophores for iron acquisition in various bacteria, functioning as "public goods" for community-wide nutrient sharing. Highlighting this mechanism would add ecological and functional context to the manuscript. In the future, EV-associated metallophore transport could also be considered for integration into computational detection tools.

      We thank the reviewer for the suggestion; however, we do not think that such a discussion is needed. We briefly discuss the ecological function of metallophores as public goods (and public bads) in the first paragraph of the introduction. We did not find any reports that EV-associated genes co-localize with metallophore BGCs, which would be required for their presence to be a useful marker of metallophore production.

      Is there a feasible path to more generalizable detection of chelating motifs using chemistry-aware features? For example, a machine learning classifier trained on submolecular descriptors (e.g., functional groups, coordination motifs, SMARTS patterns, graph fingerprints, metal-binding propensity scores) could complement the current genome-based approach and broaden coverage beyond known metallophore families. While the discussion mentions future extensions centered on genomic features, integrating chemical information from predicted or known products (or biosynthetic logic inferred from BGC composition) could be explored. A hybrid framework-linking BGC-derived features with chemistry-derived features-may improve both recall for novel metallophore classes and precision in distinguishing true chelators from confounders, thereby increasing overall accuracy.

      We can envision a classifier that uses submolecular descriptors to predict the ability of a molecule to bind metal ions. However, starting with a BGC and accurately predicting the structure of a hitherto unknown chelating moiety will likely prove difficult. We have added a sentence to the discussion stating that a future tool could use accessory genes to more completely predict chemical structure.

      Although the initial analysis was conducted using RefSeq genomes, what are the anticipated challenges and limitations when scaling this method for BGC prospecting in metagenome-assembled genomes (MAGs), particularly considering the inherent quality differences, assembly fragmentation, and taxonomic uncertainties that characterize MAG datasets compared to curated reference genomes?

      Please see our response to comment 10, line 457. Our pHMM-based approach is designed to be robust to organism taxonomy; however, fragmentation is a significant barrier to accurate antiSMASH-based BGC detection (including in contig-level single-isolate genomes, see Table 1).

      Reviewer #2 (Recommendations for the authors):

      (1) In the "Chemical identification of genome-predicted siderophores across taxa" section, it would be helpful to annotate the cross-species similarities between predicted metallophore BGCs and their reference clusters (Ref BGCs). As currently described, the main text seems to highlight the cross-species resolving power of BiG-SCAPE itself rather than demonstrating the taxonomic generalizability of the chelator HMM-based detection module.

      Thank you for this comment. We intended to display that the new rule is useful for detecting BGCs in unexplored taxa, but we acknowledge that there is not a great diversity in the strains we selected. We have removed “across taxa” to avoid misleading the reader and clarify our intent.

      (2) In addition to using eMPRess for gene-species reconciliation, it may be beneficial to explore or at least reference alternative reconciliation tools to validate the inferred duplication, transfer, and loss (DTL) scenarios. Incorporating such cross-method comparisons would enhance the robustness and credibility of the evolutionary conclusions.

      We appreciate this valuable suggestion. To validate the robustness of our reconciliation-based inferences, we additionally analyzed two gene families using the likelihood-based tool AleRax, which implements a probabilistic DTL model. The results were consistent with the eMPRess parsimony-based reconstructions, showing comparable patterns of rare duplication, moderate gene loss, and extensive horizontal transfer. Both methods identified similar lineages as the most probable origin and major recipients of transfer events. This agreement between independent reconciliation frameworks supports the reliability of our evolutionary conclusions. We have added a brief statement referencing this cross-method validation in the revised manuscript.

    1. As archivists we like these questions because they tell us that people are eager for access to archival records. They also show that people realize that not everything is digitized. Indeed only a tiny fraction of the world’s primary resources are available digitally.

      Granted, some individuals may be more eager for physical records, but there should be no question that digital archives are significantly easier to access. I think that is a big factor to consider.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers and editors for their careful evaluation of our manuscript and their positive comments on the importance and rigor of the work. Below you will find our point-by-point response to each reviewer's suggestions. We believe that we have addressed (in the response and the revised manuscript) all of the concerns. Please note that in some cases, we have numbered a reviewer's comments for clarity, however beyond this, we have not altered any of the reviewers' text.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Lo et al., report a high-throughput functional profiling study on the gene encoding for argininosuccinate synthase (ASS1), done in a yeast experimental system. The study design is robust (see lines 141-143, main text, Methods), whereby "approximately three to four independent transformants of each variant would be isolated and assayed." (lines 140 - 141, main text, Methods). Such a manner of analysis will allow for uncertainty of the functional readout for the tested variants to be accounted for.

      This is an outstanding study providing insights on the functional landscape of ASS1. Functionally impaired ASS1 may cause citrullinemia type I, and disease severity varies according to the degree of enzyme impairment (line 30, main text; Abstract). Data from this study forms a valuable resource in allowing for functional interpretation of protein-altering ASS1 variants that could be newly identified from large-scale whole-genome sequencing efforts done in biobanks or national precision medicine programs. I have some suggestions for the Authors to consider:

      1. The specific function of ASS1 is to condense L-citrulline and L-aspartate to form argininosuccinate. Instead of measuring either depletion of substrate or formation of product, the Authors elected to study 'growth' of the yeast cells. This is a broader phenotype which could be determined by other factors outside of ASS1. While I agree that the experiments were beautifully done, the selection of an indirect phenotype such as the ability of the yeast cells to grow could be more vigorously discussed.

      We appreciate the reviewer's point regarding the indirect nature of growth as a functional readout. In our system, yeast growth is tightly and specifically coupled to ASS enzymatic activity. The strains used are isogenic and lack the native yeast argininosuccinate synthetase, such that arginine biosynthesis, and therefore yeast replication on minimal medium lacking arginine, depends exclusively on the activity of human ASS1. Under these defined and limiting conditions, growth provides a quantitative proxy for ASS1 function. However, we acknowledge that this assay does not resolve specific molecular mechanisms underlying reduced function, such as altered catalytic activity versus effects on protein stability. We have updated the text to clarify these points.

      "While growth is an indirect phenotype relative to direct measurement of substrate turnover or product formation, it is tightly coupled to ASS enzymatic activity in this system and is expected to be impaired by amino acid substitutions that reduce catalytic activity or protein stability. Therefore, growth on minimal medium lacking arginine is a quantitative measure of ASS enzyme function, allowing the impact of ASS1 missense variants to be assessed at scale through a high-throughput growth assay, in a single isogenic strain background, under controlled, defined conditions that limit confounding factors unrelated to ASS1 activity. We expect that the assay will detect reductions in both catalytic activity and protein stability but will not distinguish between these mechanisms."

      2. One of the key reasons why studies such as this one are valuable is due to the limitations of current variant classification methods that rely on 'conservation' status of amino acid residues to predict which variants might be 'pathogenic' and which variants might be 'likely benign'. However, there are serious limitations, and Figures 2 and 6 in the main text show this clearly. Specifically, there is an appreciable number of variants that, despite being classified as "ClinVar Pathogenic", were shown by the assay to be unlikely to be functionally impaired. This should be discussed vigorously. Could these inconsistencies be potentially due to the readout (growth instead of a more direct evaluation of ASS1 function)?

      We interpret this discrepancy as reflecting a sensitivity limitation of the growth-based readout rather than a fundamental disagreement between functional effect and clinical annotation. Specifically, we believe that our assay is unable to resolve the very mildest hypomorphic variants from true wild type, i.e., the residual activity of these variants is sufficient to fully support yeast growth under the conditions used. On this basis, we have chosen not to treat wild-type-like growth in our assay as informative for benignity; conversely, reduced growth provides evidence supporting pathogenicity (all clinically validated variants examined in this range are pathogenic).

      We have revised the manuscript to clarify this point explicitly and to frame these variants as lying outside the effective resolution limit of the assay rather than representing true false positives. Additional discussion of this limitation and its implications is provided in our responses to Reviewer 2 (points 1 and 4) along with specific changes made to the text.

      3. Figure 3 is very interesting, showing a continuum of functional readout ranging from 'wild-type' to 'null'. It is very interesting that the Authors used a threshold of less than 0.85 as functionally hypomorphic. What does this mean? It would be very nice if they have data from patients carrying two hypomorphic ASS1 alleles, and correlate their functional readout with severity of clinical presentation. The reader might be curious as to the clinical presentation of individuals carrying, for example, two ASS1 alleles with normalized growth of 0.7 to 0.8.

      I hope you will find these suggestions helpful.

      We thank the reviewer for this thoughtful comment. Figure 3 indeed illustrates a continuum of functional effects, and we agree that careful interpretation of the thresholds used is important. To clarify the rationale for the hypomorphic threshold, the interpretation of intermediate growth values, and to emphasize that these labels reflect only behavior in the functional assay, we have rewritten the relevant section of the Results:

      "The normalized growth scores of the 2,193 variants tested in our functional assay form a clear bimodal distribution (Figure 3), with two distinct peaks corresponding to functional extremes, as is commonly reported in large-scale functional assays of protein function [9, 10]. The smaller peak, centered around the null control (normalized growth = 0), represents variants that fail to support growth in the assay (growth 0.85). Variants with growth values falling between these two peak-based thresholds display partial functional impairment and are classified as functionally hypomorphic (n = 323). Crucially, these classifications are entirely derived from the observed peaks in the distribution of growth values and reflect differences in functional activity under the assay conditions. They do not provide direct evidence for clinical pathogenicity or benignity and should not be used for clinical variant interpretation without proper benchmarking against clinical reference datasets, as implemented below within an OddsPath framework."
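      The peak-based classification described in this passage can be summarized as a simple rule applied to normalized growth scores. The Python sketch below is illustrative only and makes several assumptions not stated in this text: that normalization maps the null control to 0 and wild type to 1 (only the null = 0 anchor is stated here), that a score of exactly 0.85 counts as wild-type-like, and that the lower (null) threshold is supplied by the user, since its value does not appear in this excerpt. The function names and the 0.2 value in the example are hypothetical.

```python
def normalize_growth(raw, null_growth, wt_growth):
    """Rescale raw growth so the null control maps to 0 and wild type to 1.

    Assumes linear rescaling between the two controls.
    """
    if wt_growth == null_growth:
        raise ValueError("wild-type and null controls must differ")
    return (raw - null_growth) / (wt_growth - null_growth)


def classify(score, null_threshold, wt_threshold=0.85):
    """Assign an assay-only functional class to a normalized growth score.

    `null_threshold` is a required placeholder: its value is not given in
    this excerpt. These labels describe assay behavior only and are not
    clinical pathogenicity calls.
    """
    if not null_threshold < wt_threshold:
        raise ValueError("null_threshold must be below wt_threshold")
    if score < null_threshold:
        return "null-like"
    if score >= wt_threshold:
        return "wild-type-like"
    return "hypomorphic"


# Hypothetical raw growth values and control levels.
for raw in (0.06, 0.5, 0.94):
    score = normalize_growth(raw, null_growth=0.05, wt_growth=0.95)
    print(f"raw={raw:.2f} normalized={score:.3f} "
          f"class={classify(score, null_threshold=0.2)}")
```

      Keeping the null threshold as an explicit argument, rather than hard-coding a guess, reflects that in the study both thresholds are derived from the observed peaks of the score distribution.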

      We agree with the reviewer that correlating functional measurements with clinical severity in individuals carrying two hypomorphic ASS1 alleles would be highly informative, particularly given that ASS1 deficiency is an autosomal recessive disorder. While mild hypomorphic variants (for example, variants with normalized growth values of 0.7-0.8 in our assay) could plausibly contribute to disease when paired with a complete loss-of-function allele, systematic analysis of combinatorial genotype effects and genotype-phenotype correlations is beyond the scope of the present study, which focuses on the functional effects of individual variants. We view this as an important direction for future work.

      Reviewer #1 (Significance (Required)):

      This is an outstanding study providing insights on the functional landscape of ASS1. Functionally impaired ASS1 may cause citrullinemia type I, and disease severity varies according to the degree of enzyme impairment (line 30, main text; Abstract). Data from this study forms a valuable resource in allowing for functional interpretation of protein-altering ASS1 variants that could be newly identified from large-scale whole-genome sequencing efforts done in biobanks or national precision medicine programs.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Lo et al. characterize the phenotypic effect of ~90% of all possible ASS1 missense mutations using an elegant yeast-based system, and use this dataset to aid the interpretation of clinical ASS1 variants. Overall, the manuscript is well-written and the experimental data are interpreted rigorously. Of particular interest is the identification of pairs of deleterious alleles that rescue ASS1 activity in trans. My comments mainly pertain to the relevance of using a yeast screening methodology to infer functional effects of human ASS1 mutations.

      1. Since human ASS1 is heterologously expressed in yeast for this mutational screen, direct comparison of native expression levels between human cells and yeast is not possible. Could the expression level of human ASS1 (driven by the pARG1 promoter) in yeast alter the measured fitness defect of each variant? For instance, if ASS1 expression in yeast is sufficiently high to mask modest reductions in catalytic activity, such variants may be misclassified as hypomorphic rather than amorphic. Conversely, if expression is intrinsically low, even mild catalytic impairments could appear deleterious. While it is helpful that the authors used non-human primate SNV data to calibrate their assay, experiments could be performed to directly address this possibility.

      The nature of the relationship between yeast growth and availability of functional ASS1 could also influence the interpretation of results from the yeast-based screen. Does yeast growth scale proportionately with ASS1 enzymatic activity?

      We completely agree that the expression level of human ASS1 in yeast could influence the measured fitness effects of individual variants. We expect the rank ordering of variants in our growth assay to reflect their relative enzymatic activity (i.e. a monotonic relationship) but acknowledge that the precise mapping between activity and growth is unknown and may include ceiling and floor effects that limit the assay's dynamic range. As the reviewer notes, under high expression conditions moderate loss-of-function variants could appear indistinguishable from wild type (ceiling effect), whereas under lower expression the same variants could behave closer to the null control (floor effect).
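      The following is a purely hypothetical illustration of the ceiling and floor effects described above. The sketch assumes a piecewise-linear, saturating mapping from relative enzyme activity to normalized growth. The `floor` and `ceiling_at` parameters are invented for illustration and are not estimates from the assay. The only point it makes is that a monotonic mapping preserves rank order while collapsing distinct activities at either extreme into identical growth values.

```python
def toy_growth(activity: float, floor: float = 0.1, ceiling_at: float = 0.5) -> float:
    """Map relative activity (0 = null, 1 = wild type) to normalized growth.

    Hypothetical model: activity at or below `floor` gives no growth (floor
    effect), activity at or above `ceiling_at` gives wild-type growth (ceiling
    effect), and growth rises linearly in between.
    """
    if not 0.0 <= floor < ceiling_at:
        raise ValueError("require 0 <= floor < ceiling_at")
    if activity <= floor:
        return 0.0
    if activity >= ceiling_at:
        return 1.0
    return (activity - floor) / (ceiling_at - floor)


def is_non_decreasing(values):
    """True if the sequence never decreases, i.e. rank order is preserved."""
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    activities = [0.0, 0.05, 0.2, 0.3, 0.6, 0.8, 1.0]
    growth = [toy_growth(a) for a in activities]
    for a, g in zip(activities, growth):
        print(f"activity={a:.2f} -> growth={g:.2f}")
    # Rank order is preserved, but 60%, 80% and 100% activity all read as 1.00.
    print("monotonic:", is_non_decreasing(growth))
```

      In this toy model, a mild hypomorph with 60% activity cannot be distinguished from wild type. This is the situation in which high growth cannot be taken as evidence toward benignity.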

      In our system, ASS1 is expressed from the pARG1 promoter, chosen under the assumption that the native expression level of ARG1 (the yeast ASS1 ortholog) is appropriately tuned for yeast growth. Crucially, rather than assuming a fixed mapping from assay growth to clinical pathogenicity (given potential nonlinearities in the relationship between ASS function and growth), we benchmark the assay against external data, including known pathogenic and benign variants and non-human primate SNVs, to calibrate thresholds and guide interpretation within an OddsPath framework. This benchmarking indicates that ceiling effects are likely present, with some mild loss-of-function pathogenic variants appearing indistinguishable from wild type in the growth assay. We explicitly account for this by not using high-growth scores as evidence toward benignity. We have made the following changes to the manuscript:

      "A subset of clinically pathogenic ASS1 variants exhibit near-wild-type growth in our yeast assay. In general, we expect a monotonic relationship between ASS function and yeast growth, but with the potential for floor and ceiling effects that constrain the assay's dynamic range. In this context, we interpret high-growth pathogenic variants as likely causing mild loss of function that cannot be distinguished from wild type in our assay."

      "Based on these findings and given that 22/56 pathogenic variants show >85% growth, we conclude that growth above this threshold should not be used as evidence toward benignity."
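      The evidence rule above can be written as a short function. This is a minimal sketch. Treating a value of exactly 0.85 as falling in the range that supports pathogenicity is an assumption made only for this illustration. The sketch also does not model the evidence strength assigned (for example, PS3 versus PS3_Supporting).

```python
GROWTH_THRESHOLD = 0.85  # normalized growth threshold used in the manuscript


def functional_evidence(normalized_growth: float,
                        threshold: float = GROWTH_THRESHOLD) -> str:
    """Describe how a single variant's assay result is used.

    Growth at or below the threshold counts as functional evidence toward
    pathogenicity. Growth above it is treated as uninformative, not as
    evidence toward benignity, because of the assay's ceiling effect.
    Counting a value exactly equal to the threshold as 'low' is an assumption.
    """
    if normalized_growth <= threshold:
        return "supports pathogenicity"
    return "uninformative"


if __name__ == "__main__":
    for g in (0.20, 0.85, 0.90, 1.02):
        print(g, functional_evidence(g))
    # Of the 56 pathogenic variants quoted above, 22 grow above 0.85,
    # so 56 - 22 = 34 (about 61%) fall in the range that supports pathogenicity.
    print(f"{(56 - 22) / 56:.1%}")
```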

      2. It would be helpful to add an additional diagram to Figure 1A explaining how the screen was performed, in particular: when genotype and phenotype were measured, relative to plating on selective vs non-selective media? This is described in "Variant library sequence confirmation" and "Measuring the growth of individual isolates" of the Methods section but could also be distilled into a diagram.

      We thank the reviewer for this helpful suggestion. We have updated Figure 1 by adding a new schematic panel (Figure 1C) that distills the experimental workflow into a visual overview. This diagram is intended to complement the detailed descriptions in the Methods and improve clarity for the reader.

      3. The authors rationalize the biochemical consequences of ASS1 mutations in the context of ASS1 per se - for example, mutations in the active site pocket impair substrate binding and therefore catalytic activity, which is expected. Does ASS1 physically interact with other proteins in human cells, and could these interactions be altered in the presence of specific ASS1 mutations? Such effects may not be captured by performing mutational scanning in yeast.

      We are not aware of any specific protein-protein interactions involving ASS that are required for its enzymatic function. However, we agree that ASS could engage in non-essential interactions with other human proteins that might be altered by specific missense variants and that such interactions would not necessarily be captured in a yeast-based assay.

      Importantly, our complementation system depends on human ASS providing the essential enzymatic activity required for arginine biosynthesis in yeast. If ASS1 required obligate human-specific protein interactions to function, even the wild-type enzyme would fail to support yeast growth, which is clearly not the case. We therefore conclude that the assay robustly reports on the intrinsic enzymatic activity of ASS, while acknowledging that non-essential human-specific interactions may not be assessed. We have updated the manuscript to reflect this point.

      "Importantly, successful functional complementation indicates that ASS enzymatic activity does not depend on any obligate human-specific protein interactions."

      4. The authors note that only a small number (2/11) of mutations at the ASS1 monomer-monomer interface lead to growth defects in yeast. It would be helpful for the authors to discuss this further.

      As discussed in response to the reviewer's comments on the relationship between ASS activity and yeast growth (point 1 above), we expect growth to be a monotonic but nonlinear function of enzymatic activity, with potential ceiling effects at high activity. Under this model, variants causing weak or moderate loss of function may remain indistinguishable from wild type when residual activity is sufficient to support normal growth. We favor this explanation for the observation that only 2/11 interface variants show reduced growth, as many pathogenic interface substitutions are associated with milder disease presentations, consistent with higher residual enzyme function. Consistent with this interpretation, variants affecting the active site, where substitutions are expected to cause large reductions in catalytic activity, are readily detected by the assay.

      Although we cannot exclude partial buffering of dimerization defects in yeast, we interpret the reduced sensitivity to interface variants primarily as a general limitation of growth-based assays. Accordingly, our decision not to use growth >85% as evidence toward benignity is conservative relative to approaches that would classify high-growth variants as benign except at the monomer-monomer interface, avoiding reliance on structural subclassification and minimizing the risk of false benign interpretation. Reduced growth, by contrast, provides strong evidence of loss of ASS1 function and pathogenicity, validated under the OddsPath framework.

      We have updated the Results and Discussion sections to clarify these points (also see response to the reviewer's point 1).

      "A subset of clinically pathogenic ASS1 variants exhibit near-wild-type growth in our yeast assay. In general, we expect a monotonic relationship between ASS function and yeast growth, but with the potential for floor and ceiling effects that constrain the assay's dynamic range. In this context, we interpret high-growth pathogenic variants as likely causing mild loss of function that cannot be distinguished from wild type in our assay. Consistent with this view, many pathogenic variants with high assay growth are located at the monomer-monomer interface rather than the active site, and are associated with milder or later-onset clinical presentations, suggesting partial enzymatic impairment that is clinically relevant in humans but not resolved by the yeast assay."

      "Based on these findings and given that 22/56 pathogenic variants show >85% growth, we conclude that growth above this threshold should not be used as evidence toward benignity. Notably, this approach is conservative relative to treating high-growth variants as benign except at the monomer-monomer interface, avoiding reliance on structural subclassification and minimizing the risk of false benign interpretation arising from assay ceiling effects. Conversely, the variants with

      Reviewer #2 (Significance (Required)):

      This study presents the first comprehensive mutational profiling of human ASS1 and would be of broad interest to clinical geneticists as well as those seeking biochemical insights into the enzymology of ASS1. The authors' use of a yeast system to profile human mutations would be particularly useful for researchers performing deep mutational scans, given that it provides functional insights in a rapid and inexpensive manner.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Section 1 - Evidence, reproducibility, and clarity

      Summary

      This manuscript presents a comprehensive functional profiling of 2,193 ASS1 missense variants using a yeast complementation assay, providing valuable data for variant interpretation in the rare disease citrullinemia type I. The dataset is extensive, technically sound, and clinically relevant. The demonstration of intragenic complementation in ASS1 is novel and conceptually important. Overall, the study represents a substantial contribution to functional genomics and rare disease variant interpretation.

      Major comments

      1. This is an exciting paper as it can provide support to clinicians to make actionable decisions when diagnosing infants. I have a few major comments, but I want to emphasize that the label of "functionally unimpaired" variants is misleading. The authors explain that there are several pathogenic ClinVar variants that fall into this category (above the 0.85 growth threshold), but I think this category needs a more specific name, and I would ask the authors to reiterate the shortcomings of the assay in the Discussion section.

      We thank the reviewer for raising this important point. We agree that the label "functionally unimpaired" could be misleading if interpreted as implying clinical benignity rather than assay behavior. We have therefore clarified that this designation refers strictly to variant behavior in the yeast growth assay and does not imply absence of pathogenicity.

      In addition, we have expanded the Discussion to explicitly address the existence of clinically pathogenic variants with high growth scores (>0.85), emphasizing that these likely reflect a ceiling effect of the assay and represent a key limitation for interpretation. This clarification reiterates that high-growth scores should not be used as evidence toward benignity, while reduced growth provides strong functional evidence of pathogenicity. Relevant revisions are described in our responses to Reviewers 1 and 2.

      2. I think there's an important discussion to be had here: is the assay detecting variants that alter the function of ASS, or is it detecting a complete ablation of enzymatic activity? The results might be strengthened with a follow-up experiment that identifies stably expressed ASS1 variants.

      We agree with the reviewer that distinguishing between stability and enzyme activity would be valuable information. Unfortunately, we do not currently have the resources to perform this type of large-scale study. We have acknowledged in the text that our assay does not distinguish between enzyme activity and protein stability:

      "We expect that the assay will detect reductions in both catalytic activity and protein stability, but will not distinguish between these mechanisms."

      At the very least, it would be great to see the authors replicate some of their interesting results from the high-throughput screen by down-selecting to ~12 variants of uncertain significance that could be newly considered pathogenic.

      We have included new analysis of all 25 VUS variants falling in the pathogenic range of our assay (Supplemental Table S7). Reclassification under current guidelines (in the absence of our data) shifts six variants to Pathogenic/Likely Pathogenic and 11 more are reclassified to Likely Pathogenic with the application of our functional data as PS3_Supporting. The remaining eight VUS are all reclassified to Likely Pathogenic when inclusion of homozygous PrimateAI-benign variants allows the assay to satisfy full PS3 criteria.

      3. I would ask the authors to provide more citations of the literature in the introduction of the manuscript. I would be especially interested in knowing more about human ASS being identified as a homolog of yeast ARG1, as they share little sequence similarity (27.5%) at the protein level. That said, I find the yeast complementation assay exciting.

      We thank the reviewer for this suggestion. Human ASS and yeast Arg1 catalyze the same biochemical reaction and share approximately 49% amino acid sequence identity. We have revised the Introduction to clarify this relationship and to note explicitly that the Saccharomyces Genome Database (SGD) identifies the human gene encoding argininosuccinate synthase (ASS1) as the ortholog of yeast ARG1. An appropriate citation has been added to support this statement. The protein alignments have been provided as File S2.

      "This assay is based on the ability of human ASS to functionally replace (complement) its yeast ortholog (Arg1) in S. cerevisiae (Saccharomyces Genome Database, 2026). Importantly, successful functional complementation indicates that ASS enzymatic activity does not depend on any obligate human-specific protein interactions. At the protein level, human ASS and yeast Arg1 display 49% sequence identity (File S2) and share identical enzymatic roles in converting citrulline and aspartate into argininosuccinate."
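      A reported percent identity depends on the alignment used and on which alignment columns are counted. This is worth keeping in mind when comparing different published figures. The sketch below uses one common convention: identical residues are counted over columns where neither sequence has a gap. It is an illustration only and is not necessarily the convention behind File S2.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Identity is counted over columns where neither sequence has a gap ('-').
    Other conventions, such as dividing by the full alignment length or by
    the shorter ungapped sequence, give different numbers.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment: 3 of 4 ungapped columns are identical.
    print(percent_identity("MKT-A", "MRTLA"))  # 75.0
```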

      4. I appreciate the efforts made by the authors to share their work and make this study more reproducible, such as depositing the hASS1 and yASS1 plasmids in NCBI GenBank (Line 121) and publishing the ONT reads on SRA (Line 154). I also request that additional data be shared, such as the custom method/code for codon optimization and a table of the Twist variant cassettes that were ordered. I would also love to see these results shared on MaveDB.org.

      We thank the reviewer for these suggestions regarding data sharing and reproducibility. As requested, we have provided the custom codon optimization script as File S1 and the amino acid alignment used to perform codon harmonization as File S2. The sequence of the underlying variant cassette is included in the corresponding GenBank entry, and we have clarified this point in the legend of Figure 1. For each amino acid substitution, Twist Bioscience used a yeast-specific codon scheme with a single consistent codon per amino acid; accordingly, the sequence of each variant cassette can be inferred from the base construct and the specified amino acid change. A complete list of variant amino acid substitutions used in this study is provided in Table S3.
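      Each substitution used a single, fixed codon per amino acid. A variant cassette can therefore be reconstructed by replacing one codon of the base open reading frame. The sketch below assumes a 1-based amino acid position within an in-frame ORF. The table `PLACEHOLDER_CODON_FOR` is hypothetical: it stands in for the Twist yeast codon scheme, which is not reproduced here.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical one-codon-per-amino-acid scheme (placeholder, not Twist's table).
PLACEHOLDER_CODON_FOR = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATT",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
}


def apply_substitution(orf, position, ref_aa, alt_aa,
                       codon_for=PLACEHOLDER_CODON_FOR):
    """Return the ORF with the codon at `position` (1-based) replaced.

    Checks that the existing codon encodes `ref_aa` before substituting.
    """
    if position < 1:
        raise ValueError("position must be >= 1")
    start = 3 * (position - 1)
    ref_codon = orf[start:start + 3].upper()
    if len(ref_codon) != 3:
        raise ValueError("position outside the open reading frame")
    if CODON_TABLE.get(ref_codon) != ref_aa:
        raise ValueError(f"codon {ref_codon} does not encode {ref_aa}")
    return orf[:start] + codon_for[alt_aa] + orf[start + 3:]


if __name__ == "__main__":
    base = "ATGGCTAAA"  # toy ORF encoding Met-Ala-Lys
    print(apply_substitution(base, 2, "A", "V"))  # ATGGTTAAA
```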

      5. I find this manuscript very exciting as the authors have a compelling assay that identifies pathogenic variants, but I was generally disappointed by the quality and organization of the figures. For example, Figure 4 provides very little insight, but could be dramatically improved with an overlay of the normalized growth score data or by highlighting variants surrounding the substrate or ATP interfaces. There are some very interesting aspects of this manuscript that could shine through with some polished figures.

      We thank the reviewer for this feedback and agree that clear and well-organized figures are essential for conveying the key results of the study. In response, we have substantially revised Figure 4 by adding colored overlays showing residue conservation and median normalized growth scores (new panels Figure 4C and 4D), which more directly link structural context to functional outcomes and highlight patterns surrounding the active site and substrate interfaces.

      I would also encourage the authors to generate a heatmap of the data represented in Figure 2 (see Fowler and Fields 2014 PMID 25075907, Figure 2), this would be more helpful reference to the readers.

      The reviewer also suggested that a heatmap representation, similar to that used in Fowler and Fields (2014), might aid interpretation of the data shown in Figure 2. Because our dataset consists of sparse single-amino acid substitutions rather than a complete mutational scan, such heatmaps are inherently less dense and less effective at conveying patterns than in saturation mutagenesis studies. Nevertheless, to aid readers who may find this visualization useful, we have generated and included a single-nucleotide variant heatmap as Supplemental Figure S1.
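      One way to see why such a heatmap is sparse is to count what a single-nucleotide change can reach. One nucleotide change reaches at most nine alternative codons, so at each residue only a subset of the 19 possible alternative amino acids is accessible. The sketch below assumes the standard genetic code and a hypothetical sparse score dictionary. It enumerates the SNV-accessible amino acids for a codon. It then lays scores out as a residue-by-amino-acid grid, leaving untested substitutions as empty cells.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def snv_missense(codon):
    """Amino acids reachable from `codon` by one nucleotide change.

    Synonymous and stop-gain changes are excluded.
    """
    ref = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa not in (ref, "*"):
                reachable.add(aa)
    return reachable


def score_grid(scores, length):
    """Rows = residues 1..length, columns = AMINO_ACIDS, None = not tested.

    `scores` maps (1-based position, alternative amino acid) -> growth.
    """
    column = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    grid = [[None] * len(AMINO_ACIDS) for _ in range(length)]
    for (position, aa), value in scores.items():
        grid[position - 1][column[aa]] = value
    return grid


if __name__ == "__main__":
    print(sorted(snv_missense("ATG")))  # ['I', 'K', 'L', 'R', 'T', 'V']
    toy = {(1, "V"): 0.95, (1, "K"): 0.10}  # hypothetical scores
    grid = score_grid(toy, length=2)
    filled = sum(v is not None for row in grid for v in row)
    print(filled, "of", 2 * len(AMINO_ACIDS), "cells filled")
```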

      My major comments are as follows: 6. Citations needed - especially in the introduction and for establishing that hASS is a homolog of yARG1

      We have added the requested citations and clarified the ASS1-ARG1 orthology in the Introduction, as described in our response to point 3 above.

      7. Generally, the authors do a nice job distinguishing the ASS1 gene from the ASS enzyme, though I found some ambiguities (Line 685). Please double-check the use of each throughout the manuscript.

      We have edited the manuscript to ensure consistent and unambiguous use of gene and enzyme nomenclature throughout.

      8. Generally, I'm confused about what strain was used for integrating all these variants: was it the arg1 knock-out strain from the yeast knockout collection or was it FY4? I think FY4 was used for the preliminary experiments, then the KO collection strain was used for making the variant library, but I think this could be made clearer in the text and figures. Lines 226-229 describe introducing the hASS1 and yASS1 sequences into the native ARG1 locus in strain FY4, but the Fig1A image depicts the ASS1 variants going into the arg1 KO locus. Fig1A should be moved to Fig2.

      We agree that the strain construction steps were not described as clearly as they could have been. We have therefore clarified the strain construction workflow in the Materials & Methods and Results sections, as well as in the Figure 1 legend, to explicitly distinguish preliminary experiments performed in strain FY4 from construction of the variant library in the arg1 knockout background.

      As we have also added an additional panel to Figure 1 that schematically explains how the screen was performed (per Reviewer #2's request), we believe that Figure 1A is appropriately placed and should remain in Figure 1.

      9. Line 303 - "We classify these variants as 'functionally unimpaired'", this is not an accurate description of these variants as Figure 2 highlights 24 pathogenic ClinVar variants that would fall into this category of "functionally unimpaired". The yeast growth assay appears to capture pathogenic variants, but there is likely some nuance of human ASS functionality that is not being assessed here. I would make the language more specific, e.g. "complementary to Arg1" or "growth-compatible".

      We agree that the label "functionally unimpaired" could be misinterpreted if read as implying clinical benignity. We have therefore clarified within the manuscript that this designation refers strictly to variant behavior in the yeast growth assay (i.e., wild-type-like growth under assay conditions) and does not imply absence of pathogenicity. We also expanded the Discussion to explicitly address the subset of clinically pathogenic variants with high growth scores (>0.85), consistent with a ceiling effect of the assay and a key limitation for interpretation. See response to reviewer #3 point 1. Relevant revisions are also discussed in our responses to Reviewers #1 and #2.

      10. Lines 345-355 - It is interesting that there are variants that appear functional at the substrate-interfacing sites. Is there anything common across these variants? Are they maintaining the polarity or hydrophobicity of the WT residue? Are any of these variants included in ClinVar or gnomAD? Are pathogenic variants found at any of these sites?

      Yes. For highly sensitive active-site residues that have few permissible variants, the vast majority of amino acid substitutions that do retain activity preserve key physicochemical properties of the wild-type residue, such as hydrophobicity or charge. We have added this important observation to the manuscript:

      "Any variants at these sensitive residues that are permissive for activity in our assay retain hydrophobicity or charged states relative to the original amino acid side chain (Figure 5A & Table S5)."

      None of these variants are present in ClinVar. Only L15V and E191D are present in gnomAD (Table S4).
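      The property check described above can be sketched with a simple residue classification. The grouping below is a simplified, hypothetical scheme chosen for illustration. Different schemes place His, Cys, Gly, Pro and Tyr differently, and this grouping is not necessarily the criterion used for Table S5.

```python
# Simplified, hypothetical grouping of the 20 standard amino acids.
RESIDUE_CLASS = {
    aa: cls
    for cls, members in (
        ("hydrophobic", "AVLIMFWC"),
        ("negative", "DE"),
        ("positive", "KRH"),
        ("polar/other", "STNQGPY"),
    )
    for aa in members
}


def preserves_class(ref_aa: str, alt_aa: str) -> bool:
    """True if the substitution keeps the residue in the same class."""
    return RESIDUE_CLASS[ref_aa] == RESIDUE_CLASS[alt_aa]


if __name__ == "__main__":
    for ref, alt in [("L", "V"), ("E", "D"), ("E", "K")]:
        print(f"{ref}->{alt}:", preserves_class(ref, alt))
```

      Under this grouping, both gnomAD variants noted above (L15V and E191D) preserve class.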

      11. Lines 423-430 - The OddsPath calculation would seem to rely heavily on the threshold of 0.85 for normalized growth. The OddsPath calculation could be bolstered with some additional analysis that demonstrates its robustness to alternative thresholds.

      We agree that the sensitivity of the OddsPath calculation to the choice of growth thresholds is an important consideration. In our assay, benign ClinVar variants and non-human primate variants are observed exclusively within the peak centered on wild-type growth, whereas clinically annotated variants falling below this peak are exclusively pathogenic. On this basis, we defined the upper boundary of the assay range interpreted as supporting pathogenicity as the lower boundary of the wild-type-centered peak in the growth distribution (as defined in Figure 3), rather than selecting a cutoff by direct optimization of the OddsPath. This choice reflects the observed concordance, in our dataset, between the onset of measurable functional impairment in the assay and clinical pathogenic annotation. Importantly, in practice the OddsPath value is locally robust to the precise placement of this boundary, remaining invariant across the range 0.82-0.88. Supporting our chosen threshold of 0.85, the lowest-growth benign or primate variant observed has a normalized growth value of 0.88, while the lowest growth observed among variants present as homozygotes in gnomAD was 0.86. We have clarified this rationale and analysis in the revised manuscript.

      "Notably, among all nine of the human ASS1 missense variants observed as homozygotes in gnomAD that were tested as amino acid substitutions in our assay, the lowest observed growth value was 0.86 (Ala258Val), consistent with the lower boundary of the PrimateAI variants, which was a growth value of 0.87 (Ala81Thr) (Figure 6), and with our use of a 0.85 classification threshold."

      "If we treat PrimateAI variants as benign (solely for OddsPath calculation purposes), the OddsPath for growth
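      Readers who wish to check the threshold sensitivity can use the sketch below. It implements the OddsPath formula from the ClinGen SVI recommendations (Brnich et al., 2019), OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], and evaluates it across thresholds. Here P1 is the proportion of pathogenic variants among all control variants, and P2 is the proportion of pathogenic variants among controls with an abnormal (low-growth) readout. The control data shown are hypothetical. No pseudocount convention is applied when P2 = 1; the function returns infinity in that case.

```python
import math
from typing import Iterable, Tuple

Control = Tuple[float, bool]  # (normalized growth, is_pathogenic)


def odds_path(controls: Iterable[Control], threshold: float) -> float:
    """OddsPath when growth <= threshold is called an abnormal readout."""
    controls = list(controls)
    n_total = len(controls)
    n_path = sum(1 for _, is_path in controls if is_path)
    abnormal = [is_path for growth, is_path in controls if growth <= threshold]
    if n_path == 0 or n_path == n_total or not abnormal:
        raise ValueError("need pathogenic and benign controls and >= 1 abnormal readout")
    p1 = n_path / n_total               # prior proportion pathogenic
    p2 = sum(abnormal) / len(abnormal)  # proportion pathogenic among abnormal readouts
    if p2 == 1.0:
        return math.inf  # no benign control reads as abnormal
    return (p2 * (1 - p1)) / ((1 - p2) * p1)


if __name__ == "__main__":
    # Hypothetical controls for illustration only.
    toy = [(0.10, True), (0.50, True), (0.60, False), (0.90, True),
           (0.92, True), (0.95, False), (1.00, False)]
    for t in (0.82, 0.85, 0.88, 0.91):
        print(t, round(odds_path(toy, t), 3))
```

      With these toy values, the result is 1.5 at thresholds of 0.82, 0.85 and 0.88 because no control falls in that interval. It changes (to 2.25) only once the threshold passes the control at 0.90. This is the sense in which the value is locally invariant to the threshold placement.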

      12. Lines 432-441 - This is an interesting idea to use variants observed in primates; has ACMG weighed in on this? I understand that CTLN1 is an autosomal recessive disorder, but I'd still be interested in seeing how the observed ASS1 missense variants in gnomAD perform in your growth assay, possibly as a supplemental figure?

      To our knowledge, the ACMG/AMP guidelines do not currently address the use of homozygous missense variants observed in non-human primates. We are currently in discussion with two ClinGen working groups about the possibility of formalizing the use of this data source.

      We agree that comparison with human population data is also important. Accordingly, total gnomAD allele counts and homozygous counts for all applicable ASS1 missense variants are provided in Table S4, and the growth behavior of ASS1 missense variants observed in the homozygous state in gnomAD is shown in Figure 6. These homozygous variants uniformly exhibit high growth in our assay, consistent with the absence of strong loss-of-function effects. We have updated the manuscript text to clarify these points.

      Minor comments

      1. Lines 53-59 - This paragraph needs to cite the literature, especially lines 56, 57, and 59.
      2. Line 61 - no need to repeat "citrullinemia type I", just use the abbreviation as it was introduced in the paragraph above.
      3. Lines 61-71 - again, this paragraph needs more literature citations.
      4. Line 62 - change to "results".

      The changes suggested in points 1-4 have all been implemented in the revised manuscript.

      5. Lines 74-75 - "RUSP" acronym not needed as it's never used in the manuscript; the same goes for "HHS".

      We agree that the acronyms "RUSP" and "HHS" are not reused elsewhere in the manuscript. We have nevertheless retained them at first mention, alongside the expanded names, because these acronyms are commonly used in newborn screening and public health policy contexts and may be more familiar to some readers than the expanded terms. We would be happy to remove the acronyms if preferred.

      6. Line 86 - "ASS1" I think is referring to the enzyme and should just be "ASS"? If referring to the gene then italicize to "ASS1"
      7. Lines 91-93 - It would be helpful to mention this is a functional screen in yeast
      8. Line 101 - It would be helpful to the readers to define SD before using the acronym, consider changing to "minimal synthetic defined (SD) medium" and afterwards can refer to as "SD medium"
      9. Lines 109-114 - It would be great if you could share your method for designing the codon-harmonized yASS1 gene, consider sharing as a supplemental script or creating a GitHub repository linked to a Zenodo DOI for publication.

      The changes suggested in points 6-9 have all been implemented in the revised manuscript. The codon harmonization script has been provided as File S1.

      10. Lines 135-137 - I think it's helpful to provide a full table of the cassettes ordered from Twist as well as the primers used to amplify them, consider a supplemental table.

      Details of the Twist cassettes and the primer sequences used for amplification have been added to the Materials & Methods.

      11. Line 138 - "standard methods" is a bit vague; I'm guessing this is the Gietz and Schiestl 2007 LiAc/ssDNA protocol (PMID 17401334)? Also, was clonNAT used to select for natMX colonies?

      The reviewer is correct about which protocol was used, and we have added the citation. We have also clarified that selection was carried out based on resistance to nourseothricin.

      12. Line 150 - change to "sequence the entire open reading frame, as previously described [4]."
      13. Lines 222-223 - remove "replace" and just use "complement" (and remove the parenthesis)
      14. Line 249 - It would be great to see a supplemental alignment of the hASS1 and yASS1 sequences.
      15. Line 261 - spelling "citrullemia" should be corrected to "citrullinemia"
      16. Line 280 - "using Oxford Nanopore sequencing" is a bit vague, I suggest specifying the equipment used (e.g. Oxford Nanopore Technologies MinION platform) or simplify to "via long-read sequencing (see Materials & Methods)"

      The changes suggested in points 12-16 have all been implemented in the revised manuscript. An alignment of the ASS and Arg1 protein sequences has been provided as File S2.

      17. Lines 287-289 - It would be great to see the average number of isolates per variant, as well as a plot of the variant growth estimate vs individual isolate growth.

      We agree with the reviewer that conveying measurement precision is important. The number of isolates assayed per variant is provided in Table S4, and we have added explicit mention of this in the text. Because variants were assayed with a mixture of 1, 2, or ≥3 independent isolates, a scatterplot of variant-level growth estimates versus individual isolate measurements would be difficult to interpret and potentially misleading. Instead, we report standard error estimates for each variant in Table S4, derived from the linear model used to estimate growth effects, which more appropriately summarizes measurement uncertainty given the experimental design.
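      In a linear model, the residual variance is pooled across all variants. This is why a variant assayed with a single isolate can still receive a standard error. The sketch below assumes the simplest such model, one mean per variant with a common residual variance. The model used for Table S4 may include additional terms (for example, plate or batch effects), and this sketch does not attempt to reproduce it. The variant names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def variant_estimates(
    measurements: Iterable[Tuple[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """Per-variant mean growth and standard error from isolate-level data.

    Fits growth ~ variant by ordinary least squares (one mean per variant),
    pools the residual variance across variants, and returns
    {variant: (mean, SE)} with SE = sqrt(pooled_variance / n_isolates).
    """
    groups = defaultdict(list)
    for variant, growth in measurements:
        groups[variant].append(growth)
    means = {v: sum(xs) / len(xs) for v, xs in groups.items()}
    rss = sum((x - means[v]) ** 2 for v, xs in groups.items() for x in xs)
    df = sum(len(xs) for xs in groups.values()) - len(groups)
    if df <= 0:
        raise ValueError("need replicate isolates for at least one variant")
    pooled_var = rss / df
    return {v: (means[v], math.sqrt(pooled_var / len(xs)))
            for v, xs in groups.items()}


if __name__ == "__main__":
    # Hypothetical isolates: variant_1 has two isolates, variant_2 has one.
    data = [("variant_1", 1.0), ("variant_1", 0.8), ("variant_2", 0.5)]
    for name, (mean, se) in variant_estimates(data).items():
        print(f"{name}: mean={mean:.3f} SE={se:.3f}")
```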

      18. Lines 324-325 - consider removing the last sentence of this paragraph; it is redundant as the following paragraph starts with the same statement.

      We have removed this sentence.

      19. Lines 327-335 - This is interesting and would benefit from its own subpanel or plot in which the normalized growth score is plotted against variants that are at conserved or diverse residues in human ASS, and see if there's a statistical difference in score between the two groupings.

      As suggested by the reviewer, we have added Supplemental Figure 2 (Figure S2) in which the normalized growth score of each variant is plotted against the conservation of the corresponding residue, as measured by ConSurf. The manuscript already includes a statistical analysis of the relationship between residue conservation and functional impact, showing that amorphic variants occur significantly more frequently at highly conserved residues than unimpaired variants do (one-sided Fisher's exact test). We now refer to this new supplemental figure in the relevant Results section.
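      The one-sided test can be computed directly from the hypergeometric distribution. The sketch below arranges counts as a 2 × 2 table. Rows are amorphic and unimpaired variants, and columns are highly conserved versus other residues. The counts in the usage example are hypothetical. `scipy.stats.fisher_exact(table, alternative="greater")` should give the same p-value.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test (alternative: odds ratio > 1).

    Table layout (labels are illustrative):
                     conserved   other
        amorphic         a         b
        unimpaired       c         d
    Returns P(X >= a), where X is hypergeometric with the table's margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(fisher_exact_greater(8, 2, 3, 7))
```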

      20. Lines 339-341 - As written, it is unclear if aspartate interacts with all of the same residues as citrulline or just Asn123 and Thr119.
      21. Lines 345-355 - As with my above comment, I find this interesting and would
      22. Line 353 - add a period to "al" in "Diez-Fernandez et al."

      The issues raised in points 20 and 22 have been addressed. Point 21 appears to be truncated.

      23. Figure 1 a. Remove "Figure" from the subpanels and show just "A" and "B" (as you do for Figure 4) and combine the two images into a single image. Also make this correction to Figure 5 and Figure 8. b. Panel A - I thought the hASS1 and yASS1 were dropped into FY4, not the arg1 KO strain. This needs clarification. c. Panel A - I'm assuming the natMX cassette contains its own promoter; you could use a right-angled arrow to indicate where the promoters are in your construct. d. Panel B - I'm not sure the bar graph is necessary; it would be more helpful to see calculations of the colony size (or growth curves for each strain) and plot the raw values (maybe pixel counts?) for each replicate rather than normalizing to yeast ARG1. It would be great to have a supplemental figure showing all the replicates side-by-side. e. Panel B - Would be helpful to denote the pathogenic and benign ClinVar variants with an icon or colored text.

      f. Figure 1 Caption - make "A)" and "B)" bold.

      We have implemented the requested changes in Figure 1 with the following exceptions. We have retained panels A and B as separate subfigures because they illustrate distinct experimental concepts. In addition, we respectfully disagree with point (d). The bar graph is intended to provide a clear, high-level comparison of functional complementation by hASS1 versus yASS1 and to illustrate the gross differences in growth between benign and pathogenic proof-of-principle variants. As the bar graph includes error bars for standard deviations, presenting raw colony size measurements or growth curves for individual replicates would substantially complicate the figure without materially improving interpretability for this purpose.

      24. Figure 2 a. "Shown in magenta are amino acid substitutions corresponding to ClinVar pathogenic, pathogenic/likely pathogenic, and likely pathogenic variants" is repeated in the figure caption. b. "Shown in green are amino acid substitutions corresponding to ClinVar benign and likely benign variants." I don't see any green points. c. Identify the colors used for ASS1 substrate binding residues. d. This plot would benefit from a depiction of the human ASS secondary structure and any protein domains (nucleotide-binding domain, synthase domain, and C-terminal helix from Fig4B).

      e. Line 685 675 - "ASS1" is being used in reference to the enzyme, is this correct or should it be "ASS"?

      We have made the requested changes to Figure 2. The repeated caption text has been removed, and references to green points have been corrected to orange points to match the figure. The colors used to indicate ASS substrate-binding residues are explicitly described in the figure key. Secondary structure annotations have been added. References to the enzyme have been corrected to "ASS" rather than "ASS1" where appropriate.

      1. Figure 3 a. Rename the "unimpaired" category as there are several pathogenic ClinVar variants that fall into this category.

      To address this point, we have clarified the labeling by adding "in our yeast assay" to the figure legend, making explicit that the "unimpaired" category refers only to wild-type-like behavior under assay conditions and does not imply clinical benignity. See also response to Reviewer #3, Major Comment 1.

      1. Figure 4 a. List the PDB or AlphaFold accession used for this structure. b. Panel A - state which colors are used to depict each monomer. It is confusing to see several shades of pink/purple used to depict a single monomer in Panel A. c. It is very difficult to make out the aspartate and citrulline substrates in the catalytic binding site; consider making an inset zooming in on this domain and displaying a ribbon diagram of the structure rather than the surface. d. Generally, it would be more helpful here to label any particular residues that were identified as pathogenic from your screen, or to overlay average growth scores per residue onto the structure.

      We have implemented the requested changes to Figure 4. The relevant PDB/AlphaFold accession is now listed, and the colors used to depict each monomer in Panel A are clarified in the figure legend. An inset focusing on the active site has been added to improve visualization of the citrulline and aspartate substrates. In addition, we have added new panels (Figure 4C and 4D) overlaying pathogenic residues and average growth scores onto the structure to more directly link structural context with functional data.

      1. Figure 5 a. Line 716 - Insert a page break to place Figure 5 on its own page b. I suggest using a heatmap for this type of plot, as it is very difficult to track which color corresponds to which residue.

      c. Fig5A - This plot could be improved by identifying which residue positions interface with which substrate.

      We have placed Figure 5 on its own page and added information to the legend identifying which residue positions interface with each substrate. Regarding point (b), we have retained the active-site variant strip charts rather than converting them to a heatmap, as we believe they effectively illustrate how the distribution of variant effects differs between residues. In addition, we have provided a supplemental heatmap showing variant growth across the entire protein (Figure S1), and individual variant scores for all residues are provided in Table S4.

      1. Figure 7 a. Line 735 - Insert page break to place figure on a new page

      b. List the PDB accession used for these images. c. For clarity I would mention "human ASS" in the figure title. d. State the colors of the substrates. e. Panels A and B could be combined into a single panel, making it easier to distinguish the active site and dimerization variants.

      f. Could be interesting to get SASA scores for the ClinVar structural variants to determine if they are surface-accessible

      We have implemented the requested changes in Figure 7 with the following exceptions. For point (e), there is no single orientation of the structure that allows a clear simultaneous view of both active-site and dimerization variants; accordingly, we have retained panels A and B as separate subfigures to preserve clarity. With respect to point (f), we agree that solvent accessibility analysis could be informative in other contexts. However, such an analysis does not integrate naturally with the functional and assay-based framework of the present study and was therefore not included.

      1. Figure 8 a. Panel B - overlay a square frame in the larger protein structure that depicts where the below inset is focused, and frame inset image as well.

      We have framed the inset image as requested. We did not add a corresponding frame to the full protein structure, as doing so obscured structural details in the region of interest.

      Reviewer #3 (Significance (Required)):

      Section 2 - Significance This study represents a substantial technical, functional, and translational advance in the interpretation of missense variation in ASS1, a gene of high clinical relevance for the rare disease citrullinemia type I. Its principal strength lies in the generation of an experimentally validated functional atlas of ASS1 missense variants that covers ~90% of all SNV-accessible substitutions. The scale, internal reproducibility, and careful benchmarking of the yeast complementation assay against known pathogenic and benign variants provide a robust foundation for identifying pathogenic ASS1 variants. Particularly strong aspects include the rigorous quality control of variant identities, the quantitative nature of the functional readout, and the thoughtful integration of results into the ACMG/AMP OddsPath framework. The discovery of intragenic complementation between variants affecting distinct structural regions of the enzyme is a notable conceptual and mechanistic contribution. Limitations include the assay's reduced sensitivity to variants impacting oligomerization or subtle folding defects, and the use of yeast as a heterologous system, which may mask disease-relevant mechanisms as several pathogenic ClinVar variants were found to be "functionally unimpaired". Future work extending functional testing to additional cellular contexts or expanding genotype-level combinatorial analyses would further enhance clinical applicability. Relative to prior studies, which have relied on small numbers of patient-derived variants or low-throughput biochemical assays, this work extends the field decisively by delivering a comprehensive, variant-resolved functional map for ASS1. 
To the best of my current knowledge, this is the first systematic functional screen of ASS1 at this scale and the first direct experimental demonstration that ASS active sites span multiple subunits, enabling intragenic complementation consistent with Crick and Orgel's classic variant sequestration model. As such, the advance is simultaneously technical (high-throughput functional genomics), mechanistic (defining structural contributors to catalysis and epistasis), and clinical (enabling evidence-based reclassification of VUS). I find the use of homozygous non-human primate variants as an orthogonal benign calibration set both creative and controversial; my hope is that this manuscript will prompt a productive discussion.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      This manuscript presents a comprehensive functional profiling of 2,193 ASS1 missense variants using a yeast complementation assay, providing valuable data for variant interpretation in the rare disease citrullinemia type I. The dataset is extensive, technically sound, and clinically relevant. The demonstration of intragenic complementation in ASS1 is novel and conceptually important. Overall, the study represents a substantial contribution to functional genomics and rare disease variant interpretation.

      Major comments

      This is an exciting paper as it can provide support to clinicians to make actionable decisions when diagnosing infants. I have a few major comments, but I want to emphasize that the label "functionally unimpaired" is misleading. The authors explain that there are several pathogenic ClinVar variants that fall into this category (above the >.85 growth threshold), but I think this category needs a more specific name, and I would ask the authors to reiterate the shortcomings of the assay in the Discussion section. I think there's an important discussion to be had here: is the assay detecting variants that alter the function of ASS, or is it detecting a complete ablation of enzymatic activity? The results might be strengthened with a follow-up experiment that identifies stably expressed ASS1 variants. At the very least, it would be great to see the authors replicate some of their interesting results from the high-throughput screen by down-selecting to ~12 variants of uncertain significance that could be newly considered pathogenic. I would ask the authors to provide more citations of the literature in the introduction of the manuscript. I would be especially interested in knowing more about human ASS being identified as a homolog of yeast ARG1, as they share little sequence similarity (27.5%) at the protein level. That said, I find the yeast complementation assay exciting. I appreciate the efforts made by the authors to share their work and make this study more reproducible, such as depositing the hASS1 and yASS1 plasmid sequences in NCBI GenBank (Line 121) and publishing the ONT reads on SRA (Line 154). I made requests for additional data to be shared, such as the custom method/code for codon optimization and a table of the Twist variant cassettes that were ordered. I would also love to see these results shared on MaveDB.org.
I find this manuscript very exciting as the authors have a compelling assay that identifies pathogenic variants, but I was generally disappointed by the quality and organization of the figures. For example, Figure 4 provides very little insight, but could be dramatically improved with an overlay of the normalized growth score data or by highlighting variants surrounding the substrate or ATP interfaces. There are some very interesting aspects of this manuscript that could shine through with some polished figures. I would also encourage the authors to generate a heatmap of the data represented in Figure 2 (see Fowler and Fields 2014 PMID 25075907, Figure 2); this would be a more helpful reference for the readers.

      My major comments are as follows:

      1. Citations needed - especially in the introduction and for establishing that hASS is a homolog of yARG1
      2. Generally, the authors do a nice job distinguishing the ASS1 gene from the ASS enzyme, though I found some ambiguities (Line 685). Please double-check the use of each throughout the manuscript
      3. Generally, I'm confused about which strain was used for integrating all these variants: was it the arg1 knockout strain from the yeast knockout collection or was it FY4? I think FY4 was used for the preliminary experiments, then the KO collection strain was used for making the variant library, but this could be made clearer in the text and figures. Lines 226-229 describe introducing the hASS1 and yASS1 sequences into the native ARG1 locus in strain FY4, but the Fig1A image depicts the ASS1 variants going into the arg1 KO locus. Fig1A should be moved to Fig2.
      4. Line 303 - "We classify these variants as 'functionally unimpaired'", this is not an accurate description of these variants as Figure 2 highlights 24 pathogenic ClinVar variants that would fall into this category of "functionally unimpaired". The yeast growth assay appears to capture pathogenic variants, but there is likely some nuance of human ASS functionality that is not being assessed here. I would make the language more specific, e.g. "complementary to Arg1" or "growth-compatible".
      5. Lines 345-355 - It is interesting that there are variants that appear functional at the substrate-interfacing sites. Is there anything common across these variants? Are they maintaining the polarity or hydrophobicity of the WT residue? Are any of these variants included in ClinVar or gnomAD? Are pathogenic variants found at any of these sites?
      6. Lines 423-430 - The OddsPath calculation would seem to rely heavily on the thresholds of <.05 and >.85 for normalized growth. The OddsPath calculation could be bolstered with some additional analysis that emphasizes the robustness to alternative thresholds.
      7. Lines 432-441 - This is an interesting idea to use variants observed in primates, has ACMG weighed in on this? I understand that CTLN1 is an autosomal recessive disorder but I'd still be interested in seeing how the observed ASS1 missense variants in gnomAD perform in your growth assay, possibly a supplemental figure?
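      Comment 6 asks whether the OddsPath estimate is robust to the chosen growth thresholds. As an illustration only, the check could look like the Python sketch below, assuming the ClinGen SVI formulation of Brnich et al. (2019), OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], where P1 is the proportion of pathogenic variants in the labeled control set and P2 is the proportion of pathogenic variants among controls that receive a given assay call. The function names, scores, and labels are hypothetical placeholders, not the authors' data or code.

```python
from typing import Dict, Optional, Sequence


def odds_path(p1: float, p2: float) -> float:
    """OddsPath = [P2 * (1 - P1)] / [(1 - P2) * P1] (Brnich et al. 2019 form)."""
    if p2 == 1.0:
        return float("inf")  # every control with this call is pathogenic
    return (p2 * (1.0 - p1)) / ((1.0 - p2) * p1)


def odds_path_for_call(
    is_pathogenic: Sequence[bool], called: Sequence[bool]
) -> Optional[float]:
    """OddsPath for the subset of control variants that received a given call."""
    p1 = sum(is_pathogenic) / len(is_pathogenic)
    if not 0.0 < p1 < 1.0:
        raise ValueError("control set needs both pathogenic and benign variants")
    in_call = [p for p, c in zip(is_pathogenic, called) if c]
    if not in_call:
        return None  # no control variant received this call at this cutoff
    p2 = sum(in_call) / len(in_call)
    return odds_path(p1, p2)


def threshold_sweep(
    scores: Sequence[float],
    is_pathogenic: Sequence[bool],
    thresholds: Sequence[float],
) -> Dict[float, Optional[float]]:
    """Recompute the 'abnormal' (score < t) OddsPath for each candidate cutoff t."""
    return {
        t: odds_path_for_call(is_pathogenic, [s < t for s in scores])
        for t in thresholds
    }


# Hypothetical normalized growth scores and ClinVar-style labels (True = pathogenic).
scores = [0.01, 0.02, 0.03, 0.40, 0.90, 0.95]
labels = [True, True, False, True, False, False]

# Evidence toward pathogenicity: sweep the lower ("damaging") cutoff.
print(threshold_sweep(scores, labels, [0.001, 0.025, 0.05, 0.5]))
# Evidence toward benignity: controls called "normal" above an upper cutoff.
print(odds_path_for_call(labels, [s > 0.85 for s in scores]))
```

Reporting the sweep over a range of cutoffs, rather than the single pair of thresholds, shows whether the evidence strength assigned to each call changes near the chosen boundaries.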

      Minor comments

      1. Lines 53-59 - This paragraph needs to cite the literature, especially lines 56, 57, and 59
      2. Line 61 - no need to repeat "citrullinemia type I", just use the abbreviation as it was introduced in the paragraph above
      3. Lines 61-71 - again, this paragraph needs more literature citations
      4. Line 62 - change to "results"
      5. Line 74-75 - "RUSP" acronym not needed as it's never used in the manuscript, the same goes for "HHS"
      6. Line 86 - "ASS1" I think is referring to the enzyme and should just be "ASS"? If referring to the gene then italicize to "ASS1"
      7. Lines 91-93 - It would be helpful to mention this is a functional screen in yeast
      8. Line 101 - It would be helpful to the readers to define SD before using the acronym, consider changing to "minimal synthetic defined (SD) medium" and afterwards can refer to as "SD medium"
      9. Lines 109-114 - It would be great if you could share your method for designing the codon-harmonized yASS1 gene; consider sharing it as a supplemental script or creating a GitHub repository linked to a Zenodo DOI for publication.
      10. Lines 135-137 - I think it's helpful to provide a full table of the cassettes ordered from Twist as well as the primers used to amplify them, consider a supplemental table
      11. Line 138 - "standard methods" is a bit vague. I'm guessing this is the Gietz and Schiestl 2007 LiAc/ssDNA protocol (PMID 17401334)? Also, was ClonNAT used to select for natMX colonies?
      12. Line 150 - change to "sequence the entire open reading frame, as previously described [4]."
      13. Line 222-223 - remove "replace" and just use "complement" (and remove the parenthesis)
      14. Line 249 - It would be great to see a supplemental alignment of the hASS1 and yASS1 sequences
      15. Line 261 - spelling "citrullemia" should be corrected to "citrullinemia"
      16. Line 280 - "using Oxford Nanopore sequencing" is a bit vague, I suggest specifying the equipment used (e.g. Oxford Nanopore Technologies MinION platform) or simplify to "via long-read sequencing (see Materials & Methods)"
      17. Line 287-289 - It would be great to see the average number of isolates per variant, as well as a plot of the variant growth estimate vs individual isolate growth
      18. Lines 324-25 - consider removing the last sentence of this paragraph, it is redundant as the following paragraph starts with the same statement
      19. Lines 327-335 - This is interesting and would benefit from its own subpanel or plot in which the normalized growth score is plotted against variants that are at conserved or diverse residues in human ASS, and see if there's a statistical difference in score between the two groupings
      20. Lines 339-341 - As written, it is unclear if aspartate interacts with all of the same residues as citrulline or just Asn123 and Thr119.
      21. Lines 345-355 - As with my above comment, I find this interesting and would
      22. Line 353 - add a period to "al" in "Diez-Fernandez et al."
      23. Figure 1

      a. Remove "Figure" from the subpanels and show just "A" and "B" (as you do for Figure 4) and combine the two images into a single image. Also make this correction to Figure 5 and Figure 8

      b. Panel A - I thought the hASS1 and yASS1 were dropped into FY4, not the arg1 KO strain. This needs clarification

      c. Panel A - I'm assuming the natMX cassette contains its own promoter; you could use a right-angled arrow to indicate where the promoters are in your construct

      d. Panel B - I'm not sure the bar graph is necessary; it would be more helpful to see calculations of the colony size (or growth curves for each strain) and plot the raw values (maybe pixel counts?) for each replicate rather than normalizing to yeast ARG1. It would be great to have a supplemental figure showing all the replicates side-by-side

      e. Panel B - It would be helpful to denote the pathogenic and benign ClinVar variants with an icon or colored text

      f. Figure 1 Caption - make "A)" and "B)" bold

      24. Figure 2

      a. "Shown in magenta are amino acid substitutions corresponding to ClinVar pathogenic, pathogenic/likely pathogenic, and likely pathogenic variants" is repeated in the figure caption

      b. "Shown in green are amino acid substitutions corresponding to ClinVar benign and likely benign variants." I don't see any green points

      c. Identify the colors used for ASS1 substrate binding residues

      d. This plot would benefit from a depiction of the human ASS secondary structure and any protein domains (nucleotide-binding domain, synthase domain, and C-terminal helix from Fig4B)

      e. Line 685 - "ASS1" is being used in reference to the enzyme; is this correct or should it be "ASS"?

      25. Figure 3

      a. Rename the "unimpaired" category as there are several pathogenic ClinVar variants that fall into this category

      26. Figure 4

      a. List the PDB or AlphaFold accession used for this structure

      b. Panel A - state which colors are used to depict each monomer. It is confusing to see several shades of pink/purple used to depict a single monomer in Panel A

      c. It is very difficult to make out the aspartate and citrulline substrates in the catalytic binding site; consider making an inset zooming in on this domain and displaying a ribbon diagram of the structure rather than the surface.

      d. Generally, it would be more helpful here to label any particular residues that were identified as pathogenic from your screen, or to overlay average growth scores per residue onto the structure

      27. Figure 5

      a. Line 716 - Insert a page break to place Figure 5 on its own page

      b. I suggest using a heatmap for this type of plot, as it is very difficult to track which color corresponds to which residue

      c. Fig5A - This plot could be improved by identifying which residue positions interface with which substrate

      28. Figure 7

      a. Line 735 - Insert page break to place figure on a new page

      b. List the PDB accession used for these images

      c. For clarity I would mention "human ASS" in the figure title

      d. State the colors of the substrates

      e. Panels A and B could be combined into a single panel, making it easier to distinguish the active site and dimerization variants

      f. It could be interesting to obtain SASA scores for the ClinVar structural variants to determine whether they are surface-accessible

      29. Figure 8

      a. Panel B - overlay a square frame in the larger protein structure that depicts where the below inset is focused, and frame inset image as well.

      Significance

      This study represents a substantial technical, functional, and translational advance in the interpretation of missense variation in ASS1, a gene of high clinical relevance for the rare disease citrullinemia type I. Its principal strength lies in the generation of an experimentally validated functional atlas of ASS1 missense variants that covers ~90% of all SNV-accessible substitutions. The scale, internal reproducibility, and careful benchmarking of the yeast complementation assay against known pathogenic and benign variants provide a robust foundation for identifying pathogenic ASS1 variants. Particularly strong aspects include the rigorous quality control of variant identities, the quantitative nature of the functional readout, and the thoughtful integration of results into the ACMG/AMP OddsPath framework. The discovery of intragenic complementation between variants affecting distinct structural regions of the enzyme is a notable conceptual and mechanistic contribution. Limitations include the assay's reduced sensitivity to variants impacting oligomerization or subtle folding defects, and the use of yeast as a heterologous system, which may mask disease-relevant mechanisms as several pathogenic ClinVar variants were found to be "functionally unimpaired". Future work extending functional testing to additional cellular contexts or expanding genotype-level combinatorial analyses would further enhance clinical applicability.

      Relative to prior studies, which have relied on small numbers of patient-derived variants or low-throughput biochemical assays, this work extends the field decisively by delivering a comprehensive, variant-resolved functional map for ASS1. To the best of my current knowledge, this is the first systematic functional screen of ASS1 at this scale and the first direct experimental demonstration that ASS active sites span multiple subunits, enabling intragenic complementation consistent with Crick and Orgel's classic variant sequestration model. As such, the advance is simultaneously technical (high-throughput functional genomics), mechanistic (defining structural contributors to catalysis and epistasis), and clinical (enabling evidence-based reclassification of VUS). I find the use of homozygous non-human primate variants as an orthogonal benign calibration set both creative and controversial; my hope is that this manuscript will prompt a productive discussion.

    1. You should always say, ma’am and sir. You should never say, ma’am and sir.

      Points like this remind us that what we consider as "right" or "proper" or even kind can come across as offensive or blatantly wrong to others. What does it look like for us to be humble and open enough to the fact that our conceptions of what is acceptable may not be as objective as we think?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were not able to express and purify full-length recombinant Lfat1 to perform fatty acylation of small GTPases in vitro. However, when overexpressed in cellulo, the Y240A mutant still retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Figure 6-figure supplement 2). We postulate that under infection conditions, actin binding might be required to fatty acylate certain GTPases because only small amounts of the effector protein are secreted into the host cell.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We have mutated K178 on RheB and showed that this mutation abolished its fatty acylation by Lfat1 (see Author response image 1 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 1.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD domain does not form oligomers, as evidenced by its gel filtration profile. However, we do see F-actin bundling in our in vitro F-actin polymerization experiment when both actin and the ABD are at high concentrations (data not shown). At low ABD concentrations, we observed no aggregation or bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have revised the text and reworded the sentence by removing "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We were unsure which aspect of the color scheme in the structural figures the reviewer was referring to.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
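      For readers unfamiliar with the quadratic model mentioned in point (6): the simple hyperbola assumes that free titrant equals total titrant, which breaks down when the fixed binding partner is present at a concentration comparable to or above the Kd. A minimal Python sketch, assuming 1:1 binding in which a fixed total concentration of one partner (p_total, e.g. the ABD) is titrated with the other (l_total, e.g. F-actin), solves Kd = (P_t − C)(L_t − C)/C exactly for the complex C. The function names are hypothetical, and this is not the authors' fitting code.

```python
import math


def fraction_bound_quadratic(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of P in complex for 1:1 binding, accounting for depletion of L.

    Solves C**2 - (P_t + L_t + Kd) * C + P_t * L_t = 0 for the physically
    meaningful (smaller) root C, then returns C / P_t.
    """
    b = p_total + l_total + kd
    disc = b * b - 4.0 * p_total * l_total  # always >= 0 for non-negative inputs
    # Smaller root written as 2*P*L / (b + sqrt(disc)); algebraically equal to
    # (b - sqrt(disc)) / 2 but avoids cancellation error when P_t << Kd.
    complex_conc = 2.0 * p_total * l_total / (b + math.sqrt(disc))
    return complex_conc / p_total


def fraction_bound_hyperbolic(l_total: float, kd: float) -> float:
    """Simple hyperbola; valid only when P_t << Kd so that free L ~ total L."""
    return l_total / (kd + l_total)


# Same titrant and Kd, two fixed-partner concentrations (arbitrary units).
for p in (1e-6, 5.0):
    print(p, fraction_bound_quadratic(p, 2.0, 1.0), fraction_bound_hyperbolic(2.0, 1.0))
```

When p_total is far below Kd, the two models agree; when it is not, the hyperbola overestimates the bound fraction at a given total titrant concentration, which is why a quadratic fit can describe such data better. In a real fit, Kd (and usually a signal amplitude) would be the free parameters passed to a nonlinear least-squares routine.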

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (8) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1; please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å is the super-resolution pixel size. We have updated the cryo-EM table accordingly.

      Reviewer #2:

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT activity of Lfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies showing that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      We have new data showing that actin binding does not affect Lfat1 enzymatic activity (see response to Reviewer #1). We have added these new data as Figure S7 to the paper. Accordingly, we also revised the Discussion by adding the following paragraph.

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins  (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the ABD domain of Lfat1 as an F-actin probe is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe with commonly used ones, such as Lifeact, for labeling the actin cytoskeleton.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

      For all other minor points, we have made corrections/changes in our revised text and figures.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yamamoto et al. presents a model by which the four main axes of the limb are required for limb regeneration to occur in the axolotl. A longstanding question in regeneration biology is how existing positional information is used to regenerate the correct missing elements. The limb provides an accessible experimental system by which to study the involvement of the anteroposterior, dorsoventral, and proximodistal axes in the regenerating limb. Extensive experimentation has been performed in this area using grafting experiments. Yamamoto et al. use the accessory limb model and some molecular tools to address this question. There are some interesting observations in the study. In particular, one strength is the potent induction of accessory limbs in the dorsal axis with BMP2+Fgf2+Fgf8, which is very interesting. Although interesting, the study makes bold claims about determining the molecular basis of DV positional cues, but the experimental evidence is not definitive and does not take into account the previous work on DV patterning in the amniote limb. Also, testing the hypothesis on blastemas after limb amputation would be needed to support the strong claims in the study.

      Strengths:

      The manuscript presents some novel phenotypes generated in axolotl limbs due to Wnt signaling. This is generally the first example in which Wnt signaling has provided a gain of function in the axolotl limb model. They also present a potent way of inducing limb patterning in the dorsal axis by the addition of just beads loaded with Bmp2+Fgf8+Fgf2.

      Comments on revised version:

      Re-evaluation: The authors have significantly improved the manuscript and their conclusions reflect the current state of knowledge in DV patterning of tetrapod limbs. My only point of consideration is their claim of mesenchymal and epithelial expression of Wnt10b and the finding that Fgf2 and Wnt10b are lowly expressed. It is based upon the failed ISH, but this doesn't mean they aren't expressed. In interpreting the Li et al. scRNAseq dataset, conclusions depend heavily on how one analyzes and interprets it. The 7DPA sample shows a very low representation of epithelial cells compared to other time points, but this is likely a technical issue. Even the epithelial marker, Krt17, and the CT/fibroblast marker show some expression elsewhere. If other time points are included in the analysis, Wnt10b would be interpreted as relatively highly expressed almost exclusively in the epithelium. By selecting the 7dpa timepoint, which may or may not represent the MB stage as it wasn't shown in the paper, the conclusions may be based upon incomplete data. I don't expect the authors to do more work, but it is worth mentioning this possibility. The authors have considered and made efforts to resolve previous concerns.

      We are grateful for the constructive comments. As Reviewer #1 suggested, we noted that clearer expression patterns of Wnt10b and Fgf2 may be detectable in scRNA-seq analyses at other stages, and we also clarified that low-level signals of epithelial and CT/fibroblast markers outside their expected clusters may reflect technical bias in the Discussion section. In addition, we agree with the reviewer’s point that our unsuccessful ISH experiments and the low abundance detected by RT-qPCR do not demonstrate absence of expression, and that conclusions from reanalyzing the Li et al. scRNA-seq dataset can depend strongly on analytical choices; therefore, while we focused on the 7 dpa sample because our RT-qPCR data suggested that Wnt10b and Fgf2 may be most enriched around the MB stage (the original study refers to 7 dpa as MB), we explicitly acknowledged that analyzing a single time point—especially one with a low representation of epithelial cells—may yield incomplete or stage-biased interpretations, and that inclusion of additional datasets could reveal clearer and potentially different expression patterns in the Discussion section. We also tempered our wording regarding the inferred cellular sources to avoid over-interpretation based on the current data in the Results section.

      Reviewer #2 (Public review):

      Summary:

      This study explores how signals from all sides of a developing limb, front/back and top/bottom, work together to guide the regrowth of a fully patterned limb in axolotls, a type of salamander known for its impressive ability to regenerate limbs. Using a model called the Accessory Limb Model (ALM), the researchers created early staged limb regenerates (called blastemas) with cells from different sides of the limb. They discovered that successful limb regrowth only happens when the blastema contains cells from both the top (dorsal) and bottom (ventral) of the limb. They also found that a key gene involved in front/back limb patterning, called Shh (Sonic hedgehog), is only turned on when cells from both the dorsal and ventral sides come into contact. The study identified two important molecules, Wnt10B and FGF2, that help activate Shh when dorsal and ventral cells interact. Finally, the authors propose a new model that explains how cells from all four sides of a limb, dorsal, ventral, anterior (front), and posterior (back), contribute at both the cellular and molecular level to rebuilding a properly structured limb during regeneration.

      Strengths:

      The techniques used in this study, like delicate surgeries, tissue grafting, and implanting tiny beads soaked with growth factors, are extremely difficult, and only a few research groups in the world can do them successfully. These methods are essential for answering important questions about how animals like axolotls regenerate limbs with the correct structure and orientation. To understand how cells from different sides of the limb communicate during regeneration, the researchers used a technique called in situ hybridization, which lets them see where specific genes are active in the developing limb. They clearly showed that the gene Shh, which helps pattern the front and back of the limb, only turns on when cells from both the top (dorsal) and bottom (ventral) sides are present and interacting. The team also took a broad, unbiased approach to figure out which signaling molecules are unique to dorsal and ventral limb cells. They tested these molecules individually and discovered which could substitute for actual dorsal and ventral cells, providing the same necessary signals for proper limb development. Overall, this study makes a major contribution to our understanding of how complex signals guide limb regeneration, showing how different regions of the limb work together at both the cellular and molecular levels to rebuild a fully patterned structure.

      Weaknesses:

      Because the expression analyses were performed on thin sections of regenerating tissue, the original manuscript provided only a limited view of the gene expression patterns in these experiments, leaving open the possibility that expression in other regions of the blastema was missed. Additionally, the quantification of expression phenotypes in most of the experiments did not appear to be based on a rigorous methodology. In the revised manuscript, the inclusion of an alternative expression analysis, qRT-PCR on the entire blastema, helps confirm that the authors are not missing expression elsewhere.

      Overall, the number of replicates per sample group in the original manuscript was quite low (sometimes as low as 3), which was especially risky with challenging techniques like the ones the authors employ. The authors have improved the rigor of the experiments in the revised manuscript by increasing the number of replicates. The authors have not performed a power analysis to determine the number of animals needed in each experiment to detect possible statistical differences between groups. However, the authors have indicated that there were insufficient preliminary data to perform such calculations.

      Likewise, in the original manuscript, the authors used an AI-generated algorithm to quantify symmetry on the dorsal/ventral axis, and my concern was that this approach does not appear to account for possible biases due to tissue sectioning angles. They also seemed to arbitrarily pick locations in each sample group to compare symmetry measurements. There are other methods, including the use of specific muscle groups and nerve bundles as dorsal/ventral landmarks, that would more clearly show differences in symmetry. The authors have now sufficiently addressed this concern by including transverse sections of the limbs and have explained the limitations of using a landmark-based approach in their quantification strategy.

      We are grateful for the careful evaluation of the technical rigor and quantification. We have benefited from the reviewer’s earlier feedback, which guided revisions that improved the manuscript’s rigor and presentation.

      Reviewer #3 (Public review):

      Summary:

      After salamander limb amputation, the cross-section of the stump has two major axes: anterior-posterior and dorsal-ventral. Cells from all axial positions (anterior, posterior, dorsal, ventral) are necessary for regeneration, yet the molecular basis for this requirement has remained unknown. To address this gap, Yamamoto et al. took advantage of the ALM assay, in which defined positional identities can be combined on demand and their effects assessed through the outgrowth of an ectopic limb. They propose a compelling model in which dorsal and ventral cells communicate by secreting Wnt10b and Fgf2 ligands, respectively, with this interaction inducing Shh expression in posterior cells. Shh was previously shown to induce limb outgrowth in collaboration with anterior Fgf8 (PMID: 27120163). Thus, this study completes a concept in which four secreted signals from four axial positions interact for limb patterning. Notably, this work firmly places dorsal-ventral interactions upstream of anterior-posterior interactions, which is striking for a field that has been focussed on anterior-posterior communication. The ligands identified (Wnt10b, Fgf2) are different from those implicated in dorsal-ventral patterning in the non-regenerative mouse and chick models. The strength of this study is in the context of ALM/ectopic limb engineering. Although the authors attempted to assay the expression of Wnt10b and Fgf2 during limb regeneration after amputation, they were unable to pinpoint the precise expression domains of these genes beyond 'dorsal' and 'ventral' blastema. Given that experimental perturbations were performed almost exclusively under ALM conditions rather than in regenerating limbs, this reviewer finds the title "Dorsoventral-mediated Shh induction is required for axolotl limb regeneration" a little misleading.

      Strengths:

      (1) The ALM and use of GFP grafts for lineage tracing (Figures 1-3) take full advantage of the salamander model's unique ability to outgrow patterned limbs under defined conditions. As far as I am aware, the ALM has not been combined with precise grafts that assay 2 axial positions at once, as performed in Figure 3. The number of ALMs performed in this study deserves special mention, considering the challenging surgery involved.

      (2) The authors identify that posterior Shh is not expressed unless both dorsal and ventral cells are present. This echoes previous work in mouse limb development models (AER/ectoderm-mesoderm interaction) but this link between axes was not known in salamanders. The authors elegantly reconstitute dorsal-ventral communication by grafting, finding that this is sufficient to trigger Shh expression (Figure 3 - although see also section on Weaknesses).

      (3) Impressively, the authors discovered two molecules sufficient to substitute for dorsal or ventral cells through electroporation into dorsal- or ventral-depleted ALMs (Figure 5). These molecules did not change the positional identity of target cells. The same group previously identified the ventral factor (Fgf2) as a nerve-derived factor essential for regeneration. In Figure 6, the authors demonstrate that nerve-derived factors, including Fgf2, are sufficient on their own to grow out ectopic limbs from a dorsal wound. Limb induction with a 3-factor cocktail, without supplementing additional cells, is conceptually important for regenerative engineering.

      (4) The writing style and presentation of results is very clear.

      Overall appraisal:

      This is a logical and well-executed study that creatively uses the axolotl model to advance an important framework for understanding limb patterning. The relevance of the mechanisms to normal limb regeneration is not yet substantiated, in the opinion of this reviewer. Additionally, Wnt10b and Fgf2 should be considered molecules sufficient to substitute for dorsal and ventral identity (solely in terms of inducing Shh expression). It is not yet clear whether these molecules are truly necessary (loss of function would address this).

      Comments on revisions:

      Congratulations - I still find this an elegant and easy-to-read study with significant implications for the field! Linking your mechanisms to normal limb regeneration (i.e. regenerating blastema, not ALM), as well as characterising the cell populations involved, will be interesting directions for the future.

      We are grateful for the constructive comments. To address the concerns raised by Reviewer #3, we now cite in the Introduction previous studies in which the ALM was used as an alternative experimental system for studying limb regeneration (Nacu et al., 2016, Nature, PMID: 27120163; Satoh et al., 2007, Developmental Biology, PMID: 17959163). We are confident that our ALM-based data provide a reasonable basis for understanding limb regeneration. We agree that important questions remain, such as which cell populations express Wnt10b and Fgf2 and how endogenous WNT10B and FGF2 signals induce Shh expression during normal regeneration, and that these should be investigated in future studies to deepen our understanding of limb regeneration.


      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors should be commended for addressing an important gap: how cues from the DV axis interact with the AP axis during limb regeneration. Overall, the concept presented in this manuscript is extremely interesting and could be of high value to the field. However, in its current form the manuscript lacks some important data and the resolution needed to fully support its conclusions, and the following points need to be addressed before publication:

      (1) ISH data on Wnt10b and FGF2 from various regeneration time points are essential to support the conclusions. Preferably, multiplex ISH of Wnt10b/Fgf2/Shh, or at least conventional ISH on serial sections, should be used to demonstrate their expression in dermis/epidermis and the order of gene expression, i.e., that Shh is expressed only after Wnt10b/FGF2. It would certainly help if this could also be shown in regular blastemas.

      We are grateful for the constructive suggestion on assessing Wnt10b and Fgf2 expression during regular regeneration, and we agree that clarifying their expression patterns in regular blastemas is important for strengthening the conclusions of our study. Because we cannot currently ensure sufficient sensitivity with multiplex FISH in our laboratory (partly due to high background), we conducted conventional ISH on serial sections of regular blastemas at several time points (Fig. S5A). However, the expression patterns of Wnt10b and Fgf2 were not clear. To complement the ISH results, we performed RT-qPCR on microdissected dorsal and ventral halves of regular blastemas at the MB stage (Fig. S5B). We found that Wnt10b and Fgf2 were expressed at significantly higher levels in the dorsal and ventral halves, respectively, than in the opposite half. This dorsally/ventrally biased expression of Wnt10b/Fgf2 is consistent with our RNA-seq data. We further quantified expression levels of Wnt10b, Fgf2, and Shh across stages (intact, EB, MB, LB, and ED) and found that Wnt10b and Fgf2 peaked at the MB stage, whereas Shh peaked at the LB stage, which addresses the editor's point regarding the order of gene expression (Fig. S5C). This temporal offset in upregulation supports our model. These results are now included in the revised manuscript (Line 294‒306).

      To identify the cell types expressing Wnt10b or Fgf2, we analyzed published single-cell RNA-seq data (7 dpa blastema (MB), Li et al., 2021). Fgf2 expression was observed in the mesenchymal cluster, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. This low abundance likely contributes to the weak ISH signals and reflects current technical limitations. In addition, Wnt10b and Fgf2 expression did not coincide with Lmx1b expression (Fig. S6J, K), and Wnt10b and Fgf2 themselves were not mutually exclusive (Fig. S6L). These results are now included in the revised manuscript (Line 307‒321). Together with the RT-qPCR data (Fig. S5B), these results suggest that Wnt10b and Fgf2 are not exclusively confined to purely dorsal or ventral cells at the single-cell level, even though they show a dorsoventral bias when assessed in bulk tissue. In other words, Wnt10b/Fgf2 expression is not restricted to dorsal/ventral cells but is mediated by them, and the co-existence of both signals should provide a permissive environment for Shh induction. Defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will therefore be an important goal for future work.

      (2) Validation of the absence of gene expression via qRT PCR in the given sample will increase the rigor, as suggested by reviewers.

      We thank the editor for this important suggestion and agree that validation by qRT-PCR increases the rigor of our study. Accordingly, we performed RT-qPCR on AntBL, PostBL, DorBL, and VentBL to corroborate the ISH results; these results are now included in Fig. 2. We also used RT-qPCR to verify Shh expression following electroporation, and the quantitative results are now provided in Fig. 5.

      (3) Please increase n for experiments where necessary and mention n values in the figures.

      We thank the editor for this helpful comment and agree on the importance of providing sufficient sample sizes. Accordingly, we increased the n for the relevant experiments and have indicated the n values in the corresponding figure legends.

      (4) Most comments by all three reviewers are constructive and largely focus on improving the tone and language of the manuscript, and I expect that the authors should take care of them.

      We thank the reviewers for their constructive feedback on the tone and language of the manuscript. We have carefully revised the text according to each comment, and we hope these modifications have improved both clarity and readability.

      In addition, in revising the manuscript we also refined the conceptual framework. Our new analysis of Wnt10b and Fgf2 expression during normal regeneration suggests that these genes are not expressed in a strictly dorsal- or ventral-specific manner at the single-cell level. When these observations are considered together with (i) the RNA-seq comparison of dorsally and ventrally induced ALM blastemas, (ii) RT-qPCR of microdissected dorsal and ventral halves of regenerating blastemas, and (iii) the functional electroporation experiments, our interpretation is that Wnt10b and Fgf2 act as dorsal- and ventral-mediated signals, respectively: their production is regulated by dorsal or ventral cells, and the presence of both signals is required to induce Shh expression. Given these findings, we think our conclusions can be explained without the potentially confusing term “positional cue.” Because, as the reviewers noted, the distinction between “positional cue” and “positional information” could be confusing, we have rewritten the manuscript without using “positional cue.”

      Reviewer #1 (Recommendations for the authors):

      (1) Line 61: More explanation for what a double-half limb means is needed.

      We thank the reviewer for this suggestion. We have revised the manuscript (Line 73‒76). Specifically, we now explain that a double-dorsal limb, for example, is a chimeric limb generated by excising the ventral half and replacing it with a dorsal half from the contralateral limb while preserving the anteroposterior orientation.

      (2) Line 63-65: "Such blastemas form hypomorphic, spike-like structures or fail to regenerate entirely." This statement does not represent the breadth of work on the APDV axis in limb regeneration. The cited Bryant 1976 reference tested only double-posterior and double-anterior newt limbs, demonstrating the importance of disposition along the AP axis, not DV. Others have shown that the regeneration of double-half limbs depends upon the age of the animal and the length of time between the grafting of double-half limbs and amputation. Also, some double-dorsal or double-ventral limbs will regenerate complete AP axes with symmetrical DV duplications (Burton, Holder, and Jesani, 1986). Also, sometimes half dorsal stylopods regenerate half dorsal and half ventral, or regenerate only half ventral, suggesting there are no inductive cues across the DV axis as there are along the AP axis. Considering this is the basis of the study under question, more is needed to convince that the DV axis is necessary for the generation of the AP axis.

      We thank the reviewer for this detailed and constructive comment. We acknowledge that previous studies have reported a range of outcomes for double-half limbs. For example, Burton et al. (1986) described regeneration defects in double-dorsal (DD) and double-ventral (VV) limbs, although limb patterning did occur in some cases (Burton et al., 1986, Table 1). As the reviewer notes, regenerative outcomes depend on variables such as animal age and the interval between construction of the double-half limb and amputation, sometimes called the effect of healing time (Tank and Holder, 1978). Moreover, variability has been reported not only in DD/VV limbs but also in double-anterior (AA) and double-posterior (PP) limbs (e.g., Bryant, 1976; Bryant and Baca, 1978; Burton et al., 1986). In the revised manuscript, we have therefore modified the statement to avoid over-generalization and to emphasize that regeneration can be incomplete under these conditions (Line 76‒82). Importantly, in order to provide the additional evidence requested and to directly re-evaluate whether dorsal and ventral cells are required for limb patterning, we performed the ALM experiments shown in Fig. 1. The ALM system allows us to assess this question in a binary manner (regeneration vs. non-regeneration), thereby strengthening the rationale for our conclusions regarding the necessity of the APDV orientations. We also revised a sentence at the beginning of the Results section to emphasize this point (Line 139‒140).

      (3) Line 71: "These findings suggest that specific signals from all four positional domains must be integrated for successful limb patterning, such that the absence of any one of them leads to failure." I was under the impression that half posterior limbs can grow all elements, but half anterior can only grow anterior elements.

      We thank the reviewer for this helpful clarification. As summarized by Stocum, half-limb experiments show that while some digit formation can occur, limb patterning remains incomplete in both anterior-half and posterior-half limbs in some cases (Stocum, 2017). We see this point as closely related to the broader question of whether proper limb patterning requires the integration of signals from all four positional domains. As noted in our response above, our ALM experiments in Fig. 1 were designed to test this point directly, and our data support the interpretation that cells from all four orientations are necessary for correct limb patterning.

      (4) Line 79-81: This is stated later in lines 98-105. I suggest expanding here or removing it here.

      We thank the reviewer for this suggestion. In the original version, lines 79–81 introduced our use of the terms “positional cue” and “positional information,” and this content partially overlapped with what later appeared in lines 98–105. In the revised manuscript, we have substantially rewritten this section (Line 82‒84), including the sentences corresponding to lines 79–81 in the original version, to remove the term “positional cue,” as explained in our response to the Editor’s comment (4). This revision reflects new analyses indicating that Wnt10b and Fgf2 appear not to be strictly restricted to dorsal or ventral cell populations, and we now describe these factors as dorsal- or ventral-mediated signals that act across dorsoventral domains to induce Shh expression. Accordingly, we no longer retain the original distinction between “positional cue” and “positional information.”

      (5) Line 92 - 93: "Similarly, an ALM blastema can be induced in a position-specific manner along the limb axes. In this case, the induced ALM blastema will lack cells from the opposite side." This sentence is difficult to follow. Isn't it the same thing stated in lines 88-90?

      We thank the reviewer for this comment. We revised the sentence to improve readability and to avoid redundancy with original Lines 88–90 (Line 104‒106).

      (6) Line 107: I think the appropriate reference is McCusker et al., 2014 (Position-specific induction of ectopic limbs in non-regenerating blastemas on axolotl forelimbs), although Vieira et al., 2019 can be included here. In addition, Ludolph et al 1990 should be cited.

      We thank the reviewer for this suggestion. We have added McCusker et al. (2014) and Ludolph et al. (1990) as references in the revised manuscript (Line 120‒121).

      (7) Line 107-109: A missing point is how the ventral information is established in the amniote limb. From what I remember, it is the expression of Engrailed 1, which inhibits the ventral expression of Wnt7a, and hence Lmx1b. This would suggest that there is no secreted ventral cue. This is a relatively large omission in the manuscript.

      We thank the reviewer for this comment. We agree that ventral fate in amniotes is specified by En1 in the ventral ectoderm, which represses Wnt7a and thereby prevents induction of Lmx1b; accordingly, a secreted ventral morphogen analogous to dorsal Wnt7a has not been established. We added this point to the revised Introduction (Line 61‒64).

      By contrast, in axolotl limb regeneration, our previous work on Lmx1b expression suggests that DV identities reflect the original positional identity rather than being re-specified during regeneration (Yamamoto et al., 2022). Within this framework, our original use of the term “ventral positional cue” did not imply a ventral patterning morphogen in the amniote sense; rather, it denoted downstream signals induced by cells bearing ventral identity that are required for the blastema to form a patterned limb. This interpretation is consistent with classic studies on double-half chimeras and ectopic contacts between opposite regions (Iten & Bryant, 1975; Bryant & Iten, 1976; Maden, 1980; Stocum, 1982) as well as with our ALM data (Fig. 1). For this reason, in the original text we intentionally used the term “positional cues” to refer to signals provided by cells bearing ventral identity, which can be considered separable from the DV patterning mechanism itself. As explained in our response to the Editor’s comment (4), in the revised manuscript we now describe these signals as “signals mediated by dorsal/ventral cells” rather than “positional cues.”

      The necessity of dorsal- and ventral-mediated signals is supported by classic double-half experiments. In the non-regenerating cases, structural patterns along the anteroposterior axis appear to be lost even though both anterior and posterior cells should, in principle, be present in a blastema induced from a double-dorsal or double-ventral limb. In limb development of amniotes, Wnt7a/Lmx1b or En-1 mutants show that limbs can exhibit anteroposterior patterning even when tissues are dorsalized or ventralized, that is, in the relative absence of ventral or dorsal cells, respectively (Riddle et al., 1995; Chen et al., 1998; Loomis et al., 1996). Taken together, these observations suggest that axolotl limb regeneration, in which the presence of both dorsal and ventral cells plays a role in anteroposterior patterning, may differ in this respect from limb development in other model organisms. It is therefore reasonable to predict the existence of dorsal- and ventral-mediated signals in axolotl limb regeneration. We have included this point in the revised manuscript (Line 82‒89). However, there is no evidence that these signals are secreted molecules. For this reason, we have carefully used the term “dorsal-/ventral-mediated signals” in the Introduction without implying secretion.

      (8) Introduction - In general, the argument is a bit misleading. It is written as if it is known that a ventral cue is necessary, but the evidence from other animal models is lacking, from what I know. I may be wrong, but further argument would strengthen the reasoning for the study.

      We thank the reviewer for this thoughtful comment. We agree that it should not read as if it is known that a ventral cue is necessary. In the revised Introduction, we have addressed this in several ways. First, as described in our response to comment (7), we now explicitly note that in amniote limb development ventral identity is specified by En1-mediated repression of Wnt7a, and that a secreted ventral morphogen equivalent to dorsal Wnt7a has not been established. Second, we removed the term “positional cue” and no longer present “ventral positional cue” as a defined entity. Instead, we use mechanistic phrasing such as “signals mediated by ventral cells” and “signals mediated by dorsal cells,” which does not assume that such signals are secreted morphogens or universally conserved. Third, we have reframed the role of dorsal- and ventral-mediated signals as a working hypothesis specific to axolotl limb regeneration, rather than as a general conclusion across model systems.

      (9) Line 129: Remove "As mentioned before".

      We thank the reviewer for this suggestion. We have removed the phrase “As mentioned before” in the revised manuscript (Line 143).

      (10) Figure 1: Are Lmx1, Fgf8, and Shh mutually exclusive? Multiplexed FISH would provide this information, and is a relatively important question considering the strong claims in the study.

      We thank the reviewer for raising this important point. As noted in our response to the editor’s comment, we cannot currently ensure sufficiently high detection sensitivity with multiplex FISH in our laboratory. However, based on previous reports (Nacu et al., 2016), Fgf8 and Shh should be mutually exclusive. In contrast, our analysis suggests that Lmx1b expression is not mutually exclusive with either Fgf8 or Shh, at least at the level of their expression domains. To confirm this, we analyzed the published scRNA-seq data and added the results to supplemental Fig. S6. Fgf8 and Shh were each expressed in both Lmx1b-positive and Lmx1b-negative cells (Fig. S6H, I), whereas Fgf8 and Shh themselves were mutually exclusive (Fig. S6M). This point is now included in the revised manuscript (Line 314‒317).

      (11) Results section and Figure 2: More evidence is needed for the lack of Shh expression ISH in tissue sections. Demonstrating the absence of something needs some qPCR or other validation to make such a claim.

      We thank the reviewer for this suggestion. We performed qRT-PCR on ALM blastemas to complement the ISH data (Fig. 2).

      (12) Line 179: I think they are likely leucistic d/d animals and not wild-type animals based upon the images.

      We thank the reviewer for this observation. In the revised manuscript, we have corrected the description to “leucistic animals” (Line 194).

      (13) Line 183-186: I'm a bit confused about this interpretation. If Shh turns on in just a posterior blastema, wouldn't it turn on in a grafted posterior tissue into a dorsal or ventral region? Isn't this independent of environment, meaning Shh turns on if the cells are posterior, regardless of environment?

      Our interpretation is that only posterior-derived cells possess the competency to express Shh. In other words, whether a cell is capable of expressing Shh depends on its original positional identity (Iwata et al., 2020), but whether it actually expresses Shh depends on the environment in which the cell is placed. The results of Fig. 3E and G indicate that Shh activation is dependent on environment and that the posterior identity is not sufficient to activate Shh expression. We have revised the manuscript to emphasize this distinction more clearly (Line 198‒203).

      (14) Figure 4: Do the limbs have an elbow, or is it just a hand?

      We thank the reviewer for this thoughtful question. From the appearance, an elbow-like structure can occasionally be seen; however, we did not examine the skeletal pattern in detail because all regenerated limbs used for this analysis were sectioned for the purpose of symmetry evaluation, and we therefore cannot state this conclusively. While this is indeed an important point, analyzing proximodistal patterning would require a very large number of additional experiments, which falls outside the main focus of the present study. For this reason, and also to minimize animal use in accordance with ethical considerations, we did not pursue further experiments here. In response to this point, we have added a description of the skeletal morphology of ectopic limbs induced by BMP2+FGF2+FGF8 bead implantation (Fig. 6). In these experiments, multiple ectopic limbs were induced along the same host limb. In most cases, these ectopic limbs did not show fusion with the proximal host skeleton, similar to standard ALM-induced limbs, although in one case we observed fusion at the stylopod level. We now note this observation in the revised manuscript (Line 347‒354).

      We regard the relationship between APDV positional information and proximodistal patterning as an important subject for future investigation.

      (15) Line 203 - 237: I appreciate the symmetry score to estimate the DV axis. Are there landmarks that would better suggest a double-dorsal or double-ventral phenotype, like was done in the original double-half limb papers?

      We thank the reviewer for this thoughtful comment. In most cases, the limbs induced by the ALM exhibit abnormal and highly variable morphologies compared to normal limbs, making it difficult to apply consistent morphological landmarks as used in the original double-half limb studies. For this reason, we focused our analysis on “morphological symmetry” as a quantitative measure of DV axis patterning, and we have added this explanation to the manuscript (Line 232‒235). Additionally, we provided transverse sections along the proximodistal axis as supplemental figures (Figs. S2 and S4). In addition to reporting the symmetry score, we have explicitly stated in the text that symmetry was also assessed by visual inspection of these sections.

      (16) Line 245-247: The experiment was done using bulk sequencing, so both the epithelium and mesenchyme were included in the sample. The posterior (Shh) and anterior (Fgf8) patterning cues are mesenchymally expressed. In amniotes, the dorsal cue has been thought to be Wnt7a from the epithelium. Can ISH, FISH, or previous scRNAseq data be used to identify genes expressed in the mesenchyme versus epithelium? This is very important if the authors want to make the claim for defining "The molecular basis of the dorsal and ventral positional cues" as was stated by the authors.

      We thank the reviewer for highlighting this important point. As the reviewer notes, our bulk RNA-seq data do not distinguish between epithelial and mesenchymal expression domains. As noted in our response to the editor’s comment, we performed ISH and qPCR on regular blastemas. However, these approaches did not provide definitive information regarding the specific cell types expressing Wnt10b and Fgf2. To complement them, we re-analyzed publicly available single-cell RNA-seq data (Li et al., 2021). Fgf2 was expressed mainly in mesenchymal cells, whereas Wnt10b expression was observed in both mesenchymal and epithelial cells. These results are now included in the revised manuscript (Line 294‒321) and in supplemental figures (Figs. S6 and S7).

      (17) Was engrailed 1, lmx1b, or Wnt7a differentially expressed along the DV axis, suggesting similar signaling between? Are these expressed in mesenchyme? Previous work suggests Wnt7a is expressed throughout the mesenchyme, but publicly available scRNAseq suggests that it is expressed in the epithelium.

      We thank the reviewer for this important comment. As noted, the reported expression patterns of DV-related genes are not consistent across studies, which likely reflects the technical difficulty of detecting these genes with high sensitivity. In our own experiments, expression of DV markers other than Lmx1b has been very weak or unclear by ISH, and whether these genes are expressed in the epithelium or mesenchyme also appears to vary with the detection method used. In our RNA-seq dataset, Wnt7a expression was detected at very low levels and showed no significant difference along the DV axis, while En1 expression was nearly absent. We have clarified these results in the revised manuscript (Line 437‒441). Our reanalysis of the published scRNA-seq data likewise detected Wnt7a in only a very small fraction of cells. We therefore consider it premature to draw a definitive conclusion, as prior reports have done, about whether Wnt7a is broadly mesenchymal or restricted to the epithelium. We also note that the epithelial or mesenchymal localization of Wnt7a does not affect the conclusions of the present study. Although the roles of Wnt7a and En1 in axolotl DV patterning are certainly important, resolving this issue lies beyond the scope of the present study, and we have therefore limited our description to a straightforward presentation of the data.

      (18) Line 247-249: The sentence suggests that all the ligands were tried. This should be included in the supplemental data.

      We thank the reviewer for this clarification. In fact, we tested only Wnt4, Wnt10b, Fgf2, Fgf7, and Tgfb2, and all of these results are presented in the figures. To avoid misunderstanding, we have revised the text to explicitly state that our analysis focused on these five genes (Line 272‒274).

      (19) Line 249: An n =3 seems low and qPCR would be a more sensitive means of measuring gene induction compared to ISH. The ISH would confirm the qPCR results. Figure 5C is also not the most convincing image of Shh induction without support from a secondary method.

      We have increased the sample size for these experiments (Line 277‒280). To complement the ISH results, we also confirmed Shh induction by qPCR following electroporation of Wnt10b and Fgf2 (Fig. 5D, E). Because the Shh signal in the Wnt10b-electroporated VentBL images was particularly weak and difficult to discern, we also replaced that panel with a representative example in which the Shh signal is more clearly visible. These data are now included in the revised manuscript (Line 280‒282).

      (20) Line 253: It is confusing why Wnt10b, but not Wnt4 would work? As far as I know, both are canonical Wnt ligands. Was Wnt7a identified as expressed in the RNAseq, but not dorsally localized? Would electroporation of Wnt7a do the same thing as Wnt10b and hence have the same dorsalizing patterning mechanisms as amniotes?

      We thank the reviewer for raising this challenging but important question. Wnt10b was identified directly from our bulk RNA-seq analysis, as was Wnt4. The difference in the ability of Wnt10b and Wnt4 to induce Shh expression in VentBL may reflect differences in how these ligands activate downstream WNT signaling programs. WNT10B is a potent activator of the canonical WNT/β-catenin pathway (Bennett et al., 2005), although WNT10B has also been reported to trigger a β-catenin–independent pathway (Lin et al., 2021). By contrast, WNT4 can signal through both canonical and non-canonical (β-catenin–independent) pathways, and the balance between these outputs is known to depend on cellular context (Li et al., 2013; Li et al., 2019). Consistent with a requirement for canonical WNT signaling, we found that pharmacological activation of canonical WNT signaling with BIO (a GSK3 inhibitor) was also sufficient to induce Shh expression in VentBL. Nevertheless, it remains unclear why Wnt10b, but not Wnt4, was able to induce Shh under our experimental conditions. One possible explanation is that different WNT ligands can engage the same receptors (e.g., Frizzled/LRP6) yet drive distinct downstream transcriptional programs, resulting in ligand-specific outputs that may depend on the state of the responding cells (Voss et al., 2025). This point is now included in the revised Discussion (Line 402‒412). At present, we cannot distinguish between these possibilities experimentally, and we therefore refrain from making a stronger mechanistic claim.

      With respect to Wnt7a, we detected Wnt7a expression at very low levels, and without a clear dorsoventral bias, in our RNA-seq analysis of ALM blastemas (we describe this point in Line 437‒440). This is consistent with previous work suggesting that axolotl Wnt7a is not restricted to the dorsal region in regeneration. Because of this low and unbiased expression, and because our data already implicated Wnt10b as a dorsal-mediated signal that can act across dorsoventral domains to permit Shh induction, we did not prioritize Wnt7a electroporation in the present study. We therefore cannot conclude whether Wnt7a would behave similarly to Wnt10b in this context.

      Importantly, these uncertainties about ligand-specific mechanisms do not alter our main conclusion. Our data support the idea that a dorsal-mediated WNT signal (represented here by WNT10B and canonical WNT activation) and a ventral-mediated FGF signal (FGF2) must act together to permit Shh induction, and that the coexistence of these dorsal- and ventral-mediated signals is required for patterned limb formation in axolotl limb regeneration.

      (21) Is canonical Wnt signaling induced after electroporation of Wnt10b or Wnt4? qPCR of Lef1 and axin is the most common way of showing this.

      We thank the reviewer for this helpful suggestion. In addition to examining Shh expression, we assessed canonical WNT signaling by qPCR analysis of Axin2 and Lef1 following Wnt10b electroporation. These data are now included in Fig. 5.

      (22) Line 255-256: qPCR was presented for Figure 5D, but ISH was used for everything else. Is there a technical reason that just qPCR was used for the bead experiments?

      We thank the reviewer for this helpful comment. In the original submission, our goal was to test whether treatment with commercial FGF2 protein or BIO could reproduce the results obtained by electroporation. In the revised manuscript, to avoid confusion between distinct experimental aims, we removed the FGF2–bead data from this section and instead used RT-qPCR to quantitatively corroborate Shh induction after electroporation (Fig. 5D–E). RT-qPCR provided a sensitive, whole-blastema readout and allowed a paired design (left limb: factor; right limb: GFP control) that increased statistical power while minimizing animal use. To address the reviewer’s point more directly, we additionally performed ISH for the BIO treatment and now include those results in Supplementary Figure 3 (Line 287‒288).
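      For readers less familiar with the paired RT-qPCR design mentioned above (left limb: factor; right limb: GFP control), the sketch below illustrates one common way such data can be analyzed: the comparative 2^-ΔΔCt method, with each animal's treated limb normalized against its own control limb, followed by a paired t statistic on the per-animal log2 fold changes. This is a hypothetical illustration, not the analysis pipeline used in the manuscript; it assumes a single reference gene measured in every sample, roughly 100% amplification efficiency, and invented Ct values. The function names are illustrative only.

```python
import math
from statistics import mean, stdev


def delta_delta_ct(target_ct, ref_ct, target_ct_ctrl, ref_ct_ctrl):
    """ΔΔCt for one animal: factor-electroporated limb minus its GFP control limb.

    Each limb's target Ct (e.g. Shh, Axin2 or Lef1) is first normalized to a
    reference gene measured in the same sample (ΔCt); the control limb's ΔCt
    is then subtracted.
    """
    return (target_ct - ref_ct) - (target_ct_ctrl - ref_ct_ctrl)


def fold_change(ddct):
    """Relative expression 2^-ΔΔCt, assuming ~100% amplification efficiency."""
    return 2.0 ** (-ddct)


def paired_t(log2_fold_changes):
    """One-sample t statistic on per-animal log2 fold changes (H0: mean = 0).

    Because each animal contributes one treated/control pair, this equals a
    paired t-test on the two limbs' ΔCt values, up to sign.
    """
    n = len(log2_fold_changes)
    return mean(log2_fold_changes) / (stdev(log2_fold_changes) / math.sqrt(n))


# Hypothetical Ct values: ((target, reference) treated, (target, reference) control).
animals = [
    ((25.0, 20.0), (27.0, 20.0)),
    ((26.0, 21.0), (27.5, 21.0)),
    ((24.5, 19.0), (27.0, 19.0)),
]
# log2 fold change = -ΔΔCt for each animal.
log2fc = [-delta_delta_ct(*treated, *control) for treated, control in animals]
print([round(fold_change(-x), 2) for x in log2fc])
print(round(paired_t(log2fc), 2))
```

      Because each animal serves as its own control, the test is computed on per-animal differences rather than on two independent groups; with SciPy available, `scipy.stats.ttest_rel` applied to the treated and control ΔCt values gives the same statistic up to sign, together with a p-value.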

      (23) Line 261-263: The authors did not show where Wnt10B or Fgf2 is expressed in the limb as claimed. The RNAseq was bulk, so ISH of these genes is needed to make this claim. Where are Wnt10b and Fgf2 expressed in the amputated limb? Do they show a dorsal (Wnt10b) and ventral (Fgf2) expression pattern?

      We thank the reviewer for raising this important point. As noted in our response to the editor’s comment, we performed ISH on serial sections of regular blastemas at several time points (Fig. S5A). However, the expression patterns of Wnt10b and Fgf2 along the dorsoventral axis were not clear. To complement the ISH results, we performed RT-qPCR on microdissected dorsal and ventral halves of regular blastemas at the MB stage (Fig. S5B). Wnt10b and Fgf2 were expressed at significantly higher levels in the dorsal and ventral halves, respectively, than in the opposite half, consistent with our RNA-seq data. To identify the cell types expressing Wnt10b or Fgf2, we analyzed published single-cell RNA-seq data (7 dpa blastema (MB), Li et al., 2021). Fgf2 expression was observed in the mesenchymal cluster, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. This low abundance likely contributes to the weak ISH signals and reflects current technical limitations. In addition, Wnt10b and Fgf2 expression did not coincide with Lmx1b expression (Fig. S6J, K), and Wnt10b and Fgf2 expression were not mutually exclusive (Fig. S6L). Together with the RT-qPCR data (Fig. S5B), these results suggest that, although Wnt10b and Fgf2 show a dorsoventral bias in bulk tissue, they are not confined to purely dorsal or ventral cells at the single-cell level; that is, their expression is mediated by dorsal/ventral cells rather than being dorsal-/ventral-specific. Defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will therefore be an important goal for future work. These points are now included in the revised manuscript (Line 485‒501).

      (24) Line 266-288: The formation of multiple limbs is impressive. Do these new limbs correspond to the PD location they are generated?

      We thank the reviewer for this interesting question. In our observations, the induced limbs do appear to vary in length depending on their PD location. The skeletal patterns of the induced multiple limbs are now included in Fig. 6. However, as noted earlier, the supernumerary limbs exhibit highly variable morphologies, and a rigorous analysis of PD correlation would require a large number of induced limbs. Since this lies outside the main focus of the present study, we have not pursued this point further in the manuscript.

      (25) Line 288: The minimal requirement for claiming that the molecular basis for DV signaling was identified is to perform ISH or multiplexed FISH for Wnt10b and Fgf2 in amputated limb blastemas, showing whether they are expressed in the mesenchyme or epithelium and whether they are dorsally and ventrally expressed, respectively. In addition, the current understanding of DV patterning through Wnt7a, Lmx1b, and En1 would need to be shown not to be important in this model.

      We thank the reviewer for this comment and fully agree with the point raised. We would like to clarify that we are not claiming to have identified the molecular basis of DV patterning. As the reviewer notes, molecules such as Lmx1b, Wnt7a, and En1 are well established in other animal models as key regulators of DV positional identity, and there is no doubt that they play central roles in DV patterning. However, in axolotl limb regeneration, clear DV-specific expression has not been demonstrated for these genes except for Lmx1b. Therefore, further studies will be required to elucidate the molecular basis of DV patterning in axolotls.

      Our focus here is more limited: we aim to identify the molecular basis of the mechanisms by which positional domain-mediated signals (FGF8, SHH, WNT10B, and FGF2) regulate the limb patterning process, rather than the molecular basis of DV patterning. Indeed, our results suggest that Wnt10b and Fgf2 do not affect dorsoventral identity.

      We recognize that this distinction was not sufficiently clear in the original text, and we have revised the manuscript to describe DV patterning mechanisms in other animals and to clarify that the dorsal- and ventral-mediated signals are distinct from DV patterning (Line 444‒450). In particular, we no longer claim that the molecular basis for DV signaling was identified.

      (26) Line 335: References are needed for this statement. From what I found, Wnt4 can be canonical or non-canonical.

      We thank the reviewer for this helpful comment. We have revised the manuscript accordingly (Line 404‒407): we added the supporting citations at the relevant location and adjusted nearby wording to avoid implying pathway exclusivity, in line with our response to comment (20).

      (27) Line 337-338: The authors cannot claim "that canonical, but not non-canonical, WNT signaling contributes to Shh induction", as this was not thoroughly tested and is based upon the negative result that Wnt4 electroporation did not induce Shh expression.

      We thank the reviewer for this important clarification. We agree that our data do not allow us to conclude that non-canonical WNT signaling in general does not contribute to Shh induction. Accordingly, we have removed the phrase “but not non-canonical” and revised the text to emphasize that, within the scope of our experiments, Shh induction was not observed following Wnt4 electroporation, whereas it was observed with Wnt10b.

      (28) Line 345: In order to claim "WNT10B via the canonical WNT pathway...appears to regulate Shh expression" needs at least qPCR to show WNT10B induces canonical signaling.

      We thank the reviewer for this comment. As noted in our response to comment (21), we also assessed canonical WNT signaling by qPCR analysis of Axin2 and Lef1 following Wnt10b electroporation (Line 282‒285).

      (29) Lines 361-372: A few studies have been performed on DV patterning of the mouse digit regeneration in regards to Lmx1b and En1. It may be good to discuss how the current study aligns with these findings.

      We appreciate the reviewer’s suggestion. As the reviewer notes, several studies have examined dorsoventral (DV) patterning in mouse digit tip regeneration in relation to Lmx1b and En1 (e.g., Johnson et al., 2022; Castilla-Ibeas et al., 2023). However, the main conclusion of the present study differs in scope from these studies. We show that, in the axolotl, pre-existing dorsal and ventral identities (as reflected by dorsally derived and ventrally derived cells in the ALM blastema) are required together to induce Shh expression, and that this Shh induction in turn supports anteroposterior interaction at the limb level. This mechanism, in which dorsal-mediated and ventral-mediated signals act in combination to permit Shh expression, does not have a clear direct counterpart in the mouse digit tip literature. Moreover, even with respect to Lmx1b, the two systems behave differently. In mouse digit tip regeneration, loss of Lmx1b during regeneration does not grossly affect the DV morphology of the regenerate (Johnson et al., 2022). By contrast, in our axolotl ALM system, the presence or absence of Lmx1b-positive dorsal tissue correlates with the final dorsoventral organization of the induced limb-like structures (e.g., production of double-dorsal or double-ventral symmetric structures in the absence of appropriate dorsoventral contact). Thus, the role of dorsoventral identity in our model is directly tied to patterned limb outgrowth at the whole-limb scale, whereas in the mouse digit tip it has been reported primarily in the context of digit tip regrowth and bone regeneration competence, not robust DV repatterning (Johnson et al., 2022).

      For these reasons, we believe that an extended discussion of mouse digit tip regeneration would risk implying a mechanistic equivalence between axolotl limb regeneration and mouse digit tip regeneration that is not supported by current data. Because the regenerative contexts differ, and because Lmx1b does not appear to re-establish DV patterning in the mouse regenerates (Johnson et al., 2022), we have chosen not to include an explicit discussion of mouse digit tip regeneration in the main text.

      (30) Line 408-433: Although I appreciate generating a model, this section takes some liberties to tell a narrative that is not entirely supported by previous literature or this study. For example, lines 415-416 state "Wnt10b and Fgf2 are expressed at higher levels in dorsal and the ventral blastemal cells, respectively" which were not shown in the study or other studies.

      We thank the reviewer for this important comment. We agree that the original model, which was based on RNA-seq data, overstated the evidence. To address this point experimentally, we examined Wnt10b and Fgf2 expression in regular blastemas (Figs. S5 and S6). Accordingly, our model is now framed as an inductive mechanism for Shh expression, supported by the ALM results (WNT10B in VentBL; FGF2 in DorBL) and by DV-biased expression. Concretely, the sentence previously paraphrased as “Wnt10b and Fgf2 are expressed at higher levels in dorsal and ventral blastemal cells, respectively” has been replaced with wording that (i) avoids implying single-cell DV specificity and (ii) emphasizes dorsal-/ventral-mediated regulation and the requirement for both signals to allow Shh induction (Line 510‒511).

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction:

      The authors' definitions of positional cues vs positional information are a little hard to follow, and do not appear to be completely accurate. From my understanding of what the authors explain, "positional information" is defined as a signal that generates positional identities in the regenerating tissue. This is a somewhat different definition than what I previously understood, which is the intrinsic (likely epigenetic) cellular identity associated with specific positional coordinates. On the other hand, the authors define "positional cues" as signals that help organize the cells according to the different axes, but don't actually generate positional identities in the regenerating cells. The authors provide two examples: Wnt7a as an example of positional information, and FGF8 as a positional cue. I think that according to the authors' definitions, FGF8 (and probably Shh) are bona fide positional cues, since both signals work together to organize the regenerating limb cells, yet do not generate positional identities, because ectopic limbs formed from blastemas where these pathways have been activated do not regenerate (Nacu et al 2016). However, I am not sure Wnt7a constitutes an example of a "positional information" signal, since as far as I know, it has not been shown to generate stable dorsal limb identities (that remain after the signal has stopped), at least not yet. If it has, the authors should cite the paper that showed this. I think that some sort of diagram to help define these terms visually would be really helpful, especially to people who do not study regenerative patterning.

      We thank the reviewer for this thoughtful comment. We agree that our use of “positional cue” and “positional information” may have been confusing. In the revision, as noted in our response to the Editor’s comment (4), we have removed the term “positional cue” and no longer attempt to contrast it with “positional information.” Instead, we adopt phrasing that reflects our data and hypothesis: during limb patterning, dorsal-mediated signals act on ventral cells and ventral-mediated signals act on dorsal cells to induce Shh expression. This wording avoids implying that these signals specify dorsoventral identity.

      Regarding WNT7A, we agree it has not been shown to generate a stable dorsal identity after signal withdrawal. In the revised Introduction we therefore describe WNT7A in amniote limb development as an extracellular regulator that induces Lmx1b in dorsal mesenchyme (with En1 repressing Wnt7a ventrally), rather than labeling it as “positional information” in a strict, identity-imprinting sense. We highlight this contrast because, in our axolotl experiments, WNT10B and FGF2 did not alter Lmx1b expression or dorsal–ventral limb characteristics when overexpressed, consistent with the idea that they act downstream of DV identity to enable Shh induction, not to establish DV identity.

      (2) Results:

      It would be helpful if the number of replicates per sample group were reported in the figure legends.

      We thank the reviewer for this suggestion. In accordance with the comment, we have added the number of replicates (n) for each sample group in the figure legends.

      Figure 2 shows ISH for A/P and D/V transcripts in different-positioned blastemas without tissue grafts. The images show interesting patterns, including the lack of Shh expression in all blastemas except in posterior-located blastemas, and localization of the dorsal transcript (Lmx1b) to the dorsal half of A or P located blastemas. My only concern about this data is that the expression patterns are described in only a small part of the ectopic blastema (how representative is it?) and the diagrams infer that these expression patterns are reflective of the entire blastema, which can't be determined by the limited field of view. It is okay if the expression patterns are not present in the entire blastema -in fact, that might be an important observation in terms of who is generating (and might be receiving) these signals.

      We thank the reviewer for this insightful comment. Because Fgf8 and Shh expression was detectable only in a limited subset of cells, the original submission included only high-magnification images. In response to the reviewer’s valid concern about representativeness, we have now added low-magnification overviews of the entire blastema as a supplemental figure (Fig. S1) and clarified in the figure legend that these expression patterns can be focal rather than pan-blastemal (Line 795‒796).

      In Figure 3, they look at all of these expression patterns in the grafted blastemas, showing that Shh expression is only visible when both D and V cells are present in the blastema. My only concern about this data is that the number of replicates is very low (some groups having only an N=3), and it is unclear how many sections the authors visualized for each replicate. This is especially important for the sample groups where they report no Shh expression -I agree that it is not observable in the single example sections they provide, but it is uncertain what is happening in other regions of the blastema.

      We thank the reviewer for this important comment. To increase the reliability of the results, we have increased the number of biological replicates in groups where n was previously low. For all samples, we collected serial sections spanning the entire blastema. For blastemas in which Shh expression was observed, we present representative sections showing the signal. For blastemas without detectable Shh expression, we show a section from the central region that contains GFP-positive cells. We have added these clarifications to the Fig. 3 legend (Line 811‒815).

      Figure 4: Shh overexpression in A/P/D/V blastemas induces ectopic limbs in A/D/V locations. They analyzed the symmetry of these regenerates (assuming that D- and V-located blastemas will exhibit D/V symmetry because they only contain cells from one side of that axis). I am a little concerned about how the symmetry assay is performed, since oblique sections through the digits could look asymmetric while they are actually symmetric. It is also unclear how the angle of the boxes on which the symmetry scores were based was decided; I imagine that the score would change depending on the angle. It also appears that the authors picked different digits to perform this analysis on the different sample groups. I also admit that the logic of the AI-based classification scheme the authors used to perform their symmetry scoring analysis (in both Figures 4 and 5) is elusive to me. I think it would have been more informative if the authors had leveraged structural landmarks, like the localization of specific muscle groups (if this experiment were performed in WT animals, the authors could have used pigment cell localization), or generated more proximal sections to look at landmarks in the zeugopod.

      We thank the reviewer for these detailed comments regarding the symmetry analysis. Because reliance on a computed symmetry score alone could raise the concerns noted by the reviewer, we now provide transverse sections along the proximodistal axis as supplemental figures (Figs. S2 and S4). These include levels corresponding to the distal end of the zeugopod and the proximal end of the autopod. In addition to reporting the symmetry score, we have explicitly stated in the text that symmetry was also assessed by visual inspection of these sections.

      As also noted in our response to Reviewer #1 (comment 15), ALM-induced limbs frequently exhibit abnormal and highly variable morphologies, which makes it difficult to use consistent anatomical landmarks such as particular digits or muscle groups. For this reason, we focused our analysis on morphological symmetry rather than landmark-based metrics, and we emphasize this rationale in the revised text (Line 232‒235).

      Regarding the use of bounding boxes, this procedure was chosen to minimize the effects of curvature or fixation-induced distortion. For each section, the box angle was adjusted so that the outer contour (epidermal surface) was aligned symmetrically; this procedure was applied uniformly across all conditions to avoid bias. We analyzed multiple biological replicates in each group, which helps mitigate potential artifacts due to oblique sectioning. To further reduce bias, we increased the number of fields included in the analysis to n = 24 per group in the revised version.

      In addition, staining intensity varied among samples, such that a region identified as “muscle” in one sample could be assigned differently in another if classification were based solely on color. To avoid this problem, we used a machine-learning classifier trained separately for each sample, allowing us to group the same tissues consistently within that sample irrespective of intensity differences. In the context of ALM-induced limbs, where stable anatomical landmarks are not available, we consider this strategy the most appropriate. We have added this rationale to the revised manuscript for clarity (Line 239‒247).
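      The response does not spell out how the symmetry score is computed from the classified sections. As a purely hypothetical illustration of the general idea (a bounding box aligned to the epidermal contour, with pixels labeled by tissue class within each sample), one simple mirror-based measure is the fraction of tissue pixels whose class matches the pixel reflected across the midline separating the dorsal and ventral halves of the box. The function name, the integer label encoding, and the choice of background class are assumptions; this is not presented as the score used in the manuscript.

```python
from typing import List


def mirror_symmetry(labels: List[List[int]], background: int = 0) -> float:
    """Fraction of tissue pixels whose class matches their mirror image.

    `labels` is a hypothetical per-pixel tissue map (e.g. from a per-sample
    classifier), already rotated and cropped to the bounding box, with rows
    running from the dorsal edge (top) to the ventral edge (bottom). Each
    pixel in the top half is compared with the pixel reflected across the
    horizontal midline; pairs where both pixels are background are skipped.
    For an odd number of rows, the middle row maps onto itself and is ignored.
    """
    height = len(labels)
    matched = total = 0
    for r in range(height // 2):
        top, bottom = labels[r], labels[height - 1 - r]
        for a, b in zip(top, bottom):
            if a == background and b == background:
                continue  # nothing to compare outside the tissue
            total += 1
            if a == b:
                matched += 1
    # An all-background box is treated as trivially symmetric.
    return matched / total if total else 1.0


# Toy 4 x 3 map: 1 = muscle, 2 = cartilage, 0 = background (all hypothetical).
section = [
    [0, 1, 0],
    [1, 2, 1],
    [1, 2, 1],
    [0, 1, 1],
]
print(mirror_symmetry(section))
```

      Because the comparison is made within a single sample's own label map, a measure of this kind is insensitive to staining-intensity differences between samples, but its value still depends on how the box is rotated and cropped, which is why a uniform alignment rule across conditions matters.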

      Figure 5: The number of replicates in sample groups is relatively low and quite variable between groups (ranging between 3 and 7 replicates). The zoomed-in window used to visualize Shh expression is small relative to the blastema, and it is difficult to discern why the authors positioned the window where they did and how they maintained consistency among their different sample groups. In the examples of positive Shh expression, the signal is low and hard to see. Validating these expression patterns using a quantitative transcriptional assay (like qRT-PCR) would increase the rigor of this experiment, especially given that it would allow analysis of gene expression in the entire blastema as opposed to sections that might not capture localized expression.

      We thank the reviewer for this important comment. To increase the rigor of these experiments, we have increased the number of biological replicates in groups where n was previously low. In addition, because Shh signal in the Wnt10b-electroporated VentBL images was particularly weak and difficult to discern, we replaced that panel with a representative example in which Shh signal is more clearly visible. We also validated the Shh expression for Wnt10b–electroporated VentBL and Fgf2–electroporated DorBL by RT-qPCR, which assesses gene expression across the entire blastema. These results are now included in Fig. 5 and Line 280‒282. Finally, we clarified in the figure legend how the “window” for imaging was chosen: for samples with detectable Shh expression, the window was placed in the region where the signal was observed; for conditions without detectable Shh expression, the window was positioned in a comparable region containing GFP-positive cells (Line 836‒839). These revisions are included in the revised manuscript.

      Figure 6: They treat dorsal and ventral wounds with gelatin beads soaked in a combination of BMP2+FGF8 (nerve factors) and FGF2 (proposed ventral factor). Remarkably, they observe ectopic limb formation only in dorsal wounds, further supporting the idea that FGF2 provides the "ventral" signal. They show examples of this impressive phenotype on limbs with multiple ectopic structures that formed along the Pr/Di axis. Including images of tubulin staining (as they have in Figures 1 and 2) would help ensure that the blastemas (or final regenerates) are devoid of nerves. The authors' whole-mount skeletal staining shows fusion of the ectopic humerus with the host humerus, a phenotype associated with deep wounding, which could provide an opportunity for more cellular contribution from different limb axes.

      We thank the reviewer for these constructive comments. As noted in a previous study, when beads are used to induce blastemas without surgical nerve deviation, fine nerve ingrowth can still occur (Makanae et al., 2014), so the induced blastemas are not completely devoid of nerves. Although it remains uncertain whether these recruited nerves are functional after blastema induction, this is an important point, and we have added sentences addressing it in the revised manuscript (Line 341‒345).

      Regarding the skeletal phenotype, despite careful implantation to avoid injuring deep tissues, bead-induced ectopic limbs on the dorsal side occasionally displayed fusion of the stylopod with the host humerus—a phenotype associated with deep wounding, as the reviewer notes. This observation suggests that contributions from a broader cellular population cannot be excluded. However, because fusion was observed in only 1 of 16 induced limbs analyzed, and because ectopic limbs induced at the forearm (zeugopod) level did not exhibit such fusion (n=1/6 for stylopod-level inductions; n=0/10 for zeugopod-level inductions), we believe that our main conclusion remains valid. Because fusion is not a typical outcome, we now present representative non-fusion cases—including zeugopod-origin examples—in the figure (Fig. 6L1, L2), and we report the fusion incidence explicitly in the text (Line 350‒354). We also note in the revised manuscript that stylopod fusion can occur in a minority of cases (Line 347‒349).

      Figure 7 nicely summarizes their findings and model for patterning.

      We thank the reviewer for this positive comment.

      The table is cut off in the PDF, so it cannot be evaluated at this time.

      In our copy of the PDF, the table appears in full, so this may have been a formatting issue. We have carefully checked the file and ensured that the table is completely included in the revised submission.

      There is a supplemental figure that doesn't seem to be referenced in the text.

      The supplemental figure (Fig. S1 of the original manuscript) is referenced in the text, but it may have been overlooked. To improve clarity, we have expanded the description in the manuscript so that the supplemental figure is more clearly referenced (Line 285‒291).

      (3) Materials and Methods:

      No power analysis was performed to calculate sample group sizes. The authors have used these experimental techniques in the past and could have easily used past data to inform these calculations.

      We thank the reviewer for this important comment. We did not include a power analysis in the manuscript because this was the first time we compared Shh and other gene expression levels among ALM blastemas of different positional origins using RT-qPCR in our experimental system. As we did not have prior knowledge of the expected variability under these specific conditions, it was difficult to predetermine appropriate sample sizes.

      Reviewer #3 (Recommendations for the authors):

      General:

      Congratulations - I found this an elegant and easy-to-read study with significant implications for the field! If possible, I would urge you to consider adding some more characterisation of Wnt10b and Fgf2- which cell types are they expressed in? If you can link your mechanisms to normal limb regeneration too (i.e., regenerating blastema, not ALM), this would significantly elevate the interest in your study.

      We sincerely thank the reviewer for these encouraging comments. As also noted in our response to the editor’s comment, we have analyzed the expression patterns of Wnt10b and Fgf2 in regular blastemas (Line 294‒306). Although clear expression patterns specific to the dorsoventral axis were not detected by ISH, likely because of the limited sensitivity of the technique, RT-qPCR revealed significantly higher expression of Wnt10b in the dorsal half and of Fgf2 in the ventral half of a regular blastema (Fig. S5). In addition, we analyzed published single-cell RNA-seq data (7 dpa blastema, Li et al., 2021) (Line 307‒321). Fgf2 expression was observed in the mesenchymal clusters, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. Defining the precise spatial patterns of Wnt10b and Fgf2 during regular regeneration will therefore be an important goal for future work.

      Data availability:

      I assume that the RNA-sequencing data will be deposited at a public repository.

      RNA-seq FASTQ files have been deposited in the DNA Data Bank of Japan (DDBJ; https://www.ddbj.nig.ac.jp/) under BioProject accession PRJDB38065. We have added a Data availability section to the revised manuscript.

      References

      Castilla-Ibeas, A., Zdral, S., Oberg, K. C., & Ros, M. A. (2024). The limb dorsoventral axis: Lmx1b’s role in development, pathology, evolution, and regeneration. Developmental Dynamics, 253(9), 798–814. https://doi.org/10.1002/dvdy.695

      Johnson, G. L., Glasser, M. B., Charles, J. F., Duryea, J., & Lehoczky, J. A. (2022). En1 and Lmx1b do not recapitulate embryonic dorsal-ventral limb patterning functions during mouse digit tip regeneration. Cell Reports, 41(8), 111701. https://doi.org/10.1016/j.celrep.2022.111701

      Stocum, D. (2017). Mechanisms of urodele limb regeneration. Regeneration, 4. https://doi.org/10.1002/reg2.92

      Tank, P. W., & Holder, N. (1978). The effect of healing time on the proximodistal organization of double-half forelimb regenerates in the axolotl, Ambystoma mexicanum. Developmental Biology, 66(1), 72–85. https://doi.org/10.1016/0012-1606(78)90274-9

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #3 (Public review):

      To summarize: The authors' overfilling hypothesis depends crucially on the premise that the very quickly reverting paired-pulse depression seen after unusually short rest intervals of << 50 ms is caused by depletion of release sites, whereas Dobrunz and Stevens (1997) concluded that the cause was some other mechanism that does not involve depletion. The authors now include experiments where switching extracellular Ca2+ from 1.2 to 2.5 mM increases synaptic strength on average, but not by as much as at other synapse types. They contend that the result supports the depletion hypothesis. I didn't agree because the model used to generate the hypothesis had no room for any increase at all, and because a more granular analysis revealed a mixed population with a subset where: (a) synaptic strength increased by as much as at standard synapses; and yet (b) the quickly reverting depression for the subset was the same as for the overall population.

      The authors raise the possibility of additional experiments, and I do think this could clarify things if they pre-treat with EGTA as I recommended initially. They've already shown they can do this routinely, and it would allow them to elegantly distinguish between pv and pocc explanations for both the increases in synaptic strength and the decreases in the paired pulse ratio upon switching Ca2+ to 2.5 mM. Plus/minus EGTA pre-treatment trials could be interleaved and done blind with minimal additional effort.

      Showing reversibility would be a great addition too, because, in our experience, this does not always happen in whole-cell recordings in ex-vivo tissue even when electrical properties do not change. If the goal is to show that L2/3 synapses are less sensitive to changes in Ca2+ compared to other synapse types - which is interesting but a bit off point - then I would additionally include a positive control, done by the same person with the same equipment, at one of those other synapse types using the same kind of presynaptic stimulation (i.e. ChRs).

      Specific points (quotations are from the Authors' rebuttal)

      (1) Regarding the Author response image 1, I was instead suggesting a plot of PPR in 1.2 mM Ca2+ versus the relative increase in synaptic strength in 2.5 versus in 1.2 mM. This continues to seem relevant.

      Following your suggestion, we studied the effects of external [Ca<sup>2+</sup>] ([Ca<sup>2+</sup>]<sub>o</sub>) after pre-incubating the slices in aCSF containing 50 μM EGTA-AM, and added the results as Figure 3—figure supplement 3C-D. Elevation of [Ca<sup>2+</sup>]<sub>o</sub> from 1.3 to 2.5 mM produced no significant change in either baseline EPSC amplitude or PPR. This supports the view that p<sub>v</sub> is already saturated at 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub> and implies that the modest Ca<sup>2+</sup> dependence of baseline EPSCs and PPR in the absence of EGTA (Figure 3—figure supplement 3A-B) is mediated by a change in the baseline vesicular occupancy of release sites (p<sub>occ</sub>) rather than in the fusion probability of docked vesicles (p<sub>v</sub>).

      We found some correlation between the high-Ca<sup>2+</sup>-induced relative increase in synaptic strength and the PPR at low Ca<sup>2+</sup> (Author response image 1-A). However, this correlation was also abolished by pre-incubating the slices in EGTA-AM (Author response image 1-B). It should be noted that a high PPR does not always mean a low p<sub>v</sub>. For example, when replenishment is equal between synapses with high and low baseline p<sub>occ</sub>, the PPR would be higher at low-p<sub>occ</sub> synapses than at high-p<sub>occ</sub> synapses, even if p<sub>v</sub> is close to unity. Therefore, a high baseline release probability (Pr), whether it is attributable to high p<sub>v</sub> or to high p<sub>occ</sub>, can result in a low PPR, given that Pr = p<sub>occ</sub> × p<sub>v</sub>.
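      The occupancy argument can be checked with a toy calculation. The sketch below is a minimal illustration, not the authors' model: it assumes a pure depletion scheme without facilitation, in which the first pulse empties a fraction p<sub>v</sub> of occupied sites, and it assumes (our reading of "equal replenishment") that the same fraction of all sites, `refill`, becomes occupied during the interval at every synapse. The function names are hypothetical.

```python
def release_probability(p_occ: float, p_v: float) -> float:
    """Pr = p_occ * p_v: probability that a site is occupied and its vesicle fuses."""
    return p_occ * p_v


def paired_pulse_ratio(p_occ: float, p_v: float, refill: float) -> float:
    """PPR in a toy depletion model without facilitation (illustrative only).

    refill is the fraction of ALL release sites re-occupied during the
    inter-stimulus interval, assumed identical across synapses.
    """
    occ_after_first = p_occ * (1.0 - p_v)               # sites still occupied after pulse 1
    occ_before_second = min(1.0, occ_after_first + refill)
    return (occ_before_second * p_v) / release_probability(p_occ, p_v)


if __name__ == "__main__":
    # p_v close to unity at both synapses; only baseline occupancy differs.
    for p_occ in (0.3, 0.6):
        pr = release_probability(p_occ, 0.95)
        ppr = paired_pulse_ratio(p_occ, 0.95, refill=0.15)
        print(f"p_occ={p_occ}: Pr={pr:.3f}, PPR={ppr:.3f}")
```

      With p<sub>v</sub> = 0.95 at both synapses and refill = 0.15, the synapse with p<sub>occ</sub> = 0.3 has the lower Pr (0.285 vs 0.57) but the higher PPR (0.55 vs 0.30), illustrating under these assumptions that a high PPR need not imply a low p<sub>v</sub>.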

      As we mentioned in our previous letter, the relationship between PPR and refilling rate is complicated and can be bidirectional, whereas an increase in p<sub>v</sub> always reduces PPR. For example, PPR can be reduced by both a decrease and an increase in the refilling rate (Figure 2—figure supplement 1 and Lin et al., 2025). Therefore, PPR analysis alone is insufficient to differentiate the contributions of p<sub>v</sub> and p<sub>occ</sub>. Thanks to your suggestion, we could resolve this ambiguity with the EGTA-AM pre-incubation study (Figure 3—figure supplement 3C-D).

      Author response image 1.

      Plot of PPR at low [Ca<sup>2+</sup>]<sub>o</sub> (1.3 mM) as a function of the baseline EPSC at high [Ca<sup>2+</sup>]<sub>o</sub> (2.5 mM) normalized to that at low [Ca<sup>2+</sup>]<sub>o</sub> measured at recurrent excitatory synapses in L2/3 of the prelimbic cortex under the conditions without EGTA-AM (A) and after pre-incubating the slices in EGTA-AM (50 μM) (B)

      (2) "Could you explain in detail why two-fold increase implies pv < 0.2?"

      (a) start with power((2.5/(1 + (2.5/K1) + 1/2.97)),4) = 2*power((1.3/(1 + (1.3/K1) + 1/2.97)),4);

      (b) solve for K1 (this turns out to be 0.48);

      (c) then implement the premise that pv -> 1.0 when Ca2+ is high by calculating Max = power((C/(1 + (C/K1) + 1/2.97)),4) where C is [Ca] -> infinity.

      (d) pv when [Ca] = 1.3 mM must then be power((1.3/(1 + (1.3/K1) + 1/2.97)),4)/Max, which is <0.2. Note that modern updates of Dodge and Rahamimoff typically include a parameter that prevents pv from approaching 1.0; this is the gamma parameter in the versions from the Neher group.
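      Steps (a) to (d) can be reproduced numerically. The sketch below is a minimal check of the reviewer's arithmetic using the formula exactly as written above; the bisection solver, the function names, and the reading of the constant 1/2.97 as the [Mg<sup>2+</sup>]/K<sub>Mg</sub> term of the Dodge-Rahamimoff equation are our assumptions. Because f(C) tends to K1<sup>4</sup> as C grows, that limit serves as the maximum in step (c).

```python
CONST_TERM = 1.0 / 2.97  # fixed term in the reviewer's formula (presumably [Mg2+]/K_Mg)


def dodge_rahamimoff(ca_mm: float, k1: float) -> float:
    """Unnormalized fourth-power term from step (a); ca_mm and k1 in mM."""
    return (ca_mm / (1.0 + ca_mm / k1 + CONST_TERM)) ** 4


def solve_k1(ca_low: float = 1.3, ca_high: float = 2.5, fold: float = 2.0,
             lo: float = 1e-6, hi: float = 100.0, tol: float = 1e-12) -> float:
    """Step (b): bisection for K1 such that f(ca_high) = fold * f(ca_low).

    f(ca_high)/f(ca_low) increases monotonically with K1, from about 1
    (K1 -> 0) towards (ca_high/ca_low)**4 (K1 -> infinity), so a single
    root lies in [lo, hi] whenever 1 < fold < (ca_high/ca_low)**4.
    """
    def excess(k1: float) -> float:
        return dodge_rahamimoff(ca_high, k1) / dodge_rahamimoff(ca_low, k1) - fold

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid  # ratio still below the target fold change
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normalized_pv(ca_mm: float, k1: float) -> float:
    """Steps (c) and (d): f(C) -> k1**4 as C -> infinity; that limit is taken as pv = 1."""
    return dodge_rahamimoff(ca_mm, k1) / k1 ** 4


if __name__ == "__main__":
    k1 = solve_k1()
    print(f"K1 = {k1:.3f} mM, normalized pv at 1.3 mM = {normalized_pv(1.3, k1):.4f}")
```

      Running it gives K1 ≈ 0.48 mM and a normalized pv just below 0.2, matching step (d). The check reproduces the arithmetic only; whether this model applies to these synapses is the point disputed in the response that follows.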

      Thank you very much for your kind explanation. This interpretation, however, rests on the premises that p<sub>v</sub> is not saturated at low [Ca<sup>2+</sup>]<sub>o</sub> and that Pr = p<sub>v</sub>. In the present study, we presented multiple convergent lines of evidence that p<sub>v</sub> is already saturated at 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub>: (1) little effect of EGTA-AM on baseline EPSCs (Figure 2—figure supplement 1); (2) high double failure rates (Figure 3—figure supplement 2); and (3) little effect of high [Ca<sup>2+</sup>]<sub>o</sub> on baseline EPSCs (Figure 3—figure supplement 3). Therefore, our results suggest that the classical Dodge-Rahamimoff fourth-power relationship cannot be applied to estimate p<sub>v</sub> at the L2/3 recurrent excitatory synapses.

      (3) "If so, we can not understand why depletion-dependent PPD should lead to PPF." When PPD is caused by depletion and pv < 0.2, the number of occupied release sites should not be decreased by more than one-fifth at the second stimulus, so without facilitation PPR should be > 0.8. The EGTA results then indicate there should be strong facilitation, driving PPR to something like 1.2 with conservative assumptions. And yet, a value of < 0.4 is measured, which is a large miss.
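      The bound in this argument follows from a one-line depletion calculation. The sketch below is illustrative and assumes that depletion is the only process acting between pulses (no facilitation); any refilling during the interval can only raise PPR above the value computed here, so 1 − pv is a lower bound. The function names are hypothetical.

```python
def depletion_only_ppr(p_v: float) -> float:
    """PPR floor when depletion is the only process (no refilling, no facilitation).

    The first pulse empties a fraction p_v of occupied sites, so the second
    response scales with the remaining fraction (1 - p_v).
    """
    return 1.0 - p_v


def min_pv_for_observed_ppr(ppr: float) -> float:
    """Smallest p_v consistent with an observed PPR if there is no facilitation."""
    return 1.0 - ppr


if __name__ == "__main__":
    print(f"p_v = 0.2 -> PPR floor {depletion_only_ppr(0.2):.2f}")
    print(f"PPR = 0.4 -> p_v of at least {min_pv_for_observed_ppr(0.4):.2f}")
```

      For pv = 0.2 the floor is 0.8, whereas a measured PPR of 0.4 would require pv ≥ 0.6 under these assumptions, which is the size of the discrepancy the reviewer points to.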

      As mentioned above, the framework used to infer that p<sub>v</sub> < 0.2, the Dodge-Rahamimoff equation, is not applicable to our experimental system. Consequently, the subsequent deduction, that depletion-dependent PPD should logically lead to PPF, rests on a model that is not compatible with the multiple convergent lines of evidence described above, which support high p<sub>v</sub> rather than a low-p<sub>v</sub> facilitation model.

      (4) Despite the authors' suggestion to the contrary, I continue to think there is a substantial chance that Ca2+-channel inactivation is the mechanism underlying the very quickly reverting paired-pulse depression. However, this is only one example of a non-depletion mechanism among many, with the main point being that any non-depletion mechanism would undercut the reasoning for overfilling. And, this is what Dobrunz and Stevens claimed to show; that the mechanism - whatever it is - does not involve depletion. The most effective way to address this would be affirmative experiments showing that the quickly reverting depression is caused by depletion after all. Attempting to prove that Ca2+-channel inactivation does not occur does not seem like a worthwhile strategy because it would not address the many other possibilities.

      We have systematically ruled out alternative possibilities that may underlie the strong PPD observed at our synapses and demonstrated, through multiple independent lines of evidence, that it arises from high p<sub>v</sub>-induced vesicle depletion. First, we excluded (1) AMPAR desensitization or saturation (Figure 1—figure supplement 5), (2) Ca<sup>2+</sup> channel inactivation (Figure 2—figure supplement 2), (3) channelrhodopsin inactivation (Figure 1—figure supplement 2), (4) artificial bouton stimulation (Figure 1—figure supplement 4), and (5) transient vesicle undocking (Figure 5; addressed in our previous rebuttal). Second, EGTA-AM experiments (Figure 2, Figure 2—figure supplement 1) revealed that release sites are tightly coupled to Ca<sup>2+</sup> channels, and that EGTA further exacerbates PPD. Third, we validated high baseline p<sub>v</sub> through analysis of double failure rates (Figure 3—figure supplement 2). Fourth, the minimal increase in baseline EPSCs upon elevation of [Ca<sup>2+</sup>]<sub>o</sub> (Figure 3—figure supplement 3) further supports the view that baseline p<sub>v</sub> is already saturated at low [Ca<sup>2+</sup>]<sub>o</sub>. To further test our hypothesis, we also performed the specific experiment suggested by the reviewer: we added EGTA pre-incubation experiments (Figure 3—figure supplement 3C-D) and revised the manuscript accordingly. When slices were pre-incubated with 50 μM EGTA-AM, elevation of [Ca<sup>2+</sup>]<sub>o</sub> from 1.3 to 2.5 mM produced no significant change in either baseline EPSC amplitude or PPR, strongly supporting the view that the high-[Ca<sup>2+</sup>]<sub>o</sub> effects in the absence of EGTA are primarily mediated by changes in p<sub>occ</sub> rather than p<sub>v</sub>.

      (5) True that Kusick et al. observed morphological re-docking, but then vesicles would have to re-prime and Mahfooz et al. (2016) showed that re-priming would have to be slower than 110 ms (at least during heavy use at calyx of Held).

      As previously discussed, Kusick et al. (2020) demonstrated that the transient destabilization of the docked vesicle pool recovers very rapidly, within 14 ms after stimulation. This implies that any post-stimulation undocking events are likely to have recovered before the 20 ms ISI used in our PPR experiments. Consequently, transient undocking/re-docking events are unlikely to influence the PPR measured at this interval significantly. Regarding the slow re-priming kinetics (>100 ms) reported by Mahfooz et al. (2016) and Kusick et al. (2020), our 20 ms ISI falls into a time window that avoids the potential confounds of both processes: it is long enough for the rapid morphological recovery (~14 ms) of docked vesicles to occur, yet too short for the slow re-priming process to make a substantial contribution. Furthermore, Vevea et al. (2021) showed that post-stimulus undocking is facilitated in synaptotagmin-7 (Syt7) knockout synapses. In our study, however, Syt7 knockdown did not affect PPR at a 20 ms ISI, suggesting that the undocking process described by Kusick et al. (2020) is not a major contributor to the PPD observed at 20 ms intervals in our experiments. Therefore, we conclude that the 20 ms ISI used in our experiments falls within a time window that is influenced neither by rapid undocking (<14 ms) nor by the slow re-priming process (>100 ms).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The revised manuscript presents an interesting and technically competent set of experiments exploring the role of the infralimbic cortex (IL) in extinction learning. The inclusion of histological validation in the supplemental material improves the transparency and credibility of the results, and the overall presentation has been clarified. However, several key issues remain that limit the strength of the conclusions.

      We thank the Reviewer for their positive assessment of our revised manuscript. We discuss the issues raised by the Reviewer below.

      The behavioral effects reported are modest, as evident from the trial-by-trial data included in the supplemental figures. Although the authors interpret their findings as evidence that IL stimulation facilitates extinction only after prior inhibitory learning, this conclusion is not directly supported by their data. The experiments do not include a condition in which IL stimulation is delivered during extinction training alone, without prior inhibitory experience. Without this control, the claim that prior inhibitory memory is necessary for facilitation remains speculative.

      The manuscript provides evidence across five experiments (Figures 2-6) that IL stimulation fails to facilitate extinction training in the absence of prior inhibitory experience. We therefore remain confident that the data support our conclusion: prior inhibitory learning enables IL stimulation to facilitate subsequent inhibitory learning.

      The electrophysiological example provided shows that IL stimulation induces a sustained inhibition that outlasts the stimulation period. This prolonged suppression could potentially interfere with consolidation processes following tone presentation rather than facilitating them. The authors should consider and discuss this alternative interpretation in light of their behavioral data.

      The possibility that IL stimulation exerted its effects by interfering with consolidation processes is inconsistent with the literature. Disrupting consolidation processes in the IL impairs extinction learning (1), even when animals have prior inhibitory learning experience (2). Yet our experiments found that IL stimulation failed to interfere with initial extinction learning but instead facilitated subsequent learning. Furthermore, the electrophysiological example demonstrates that the inhibitory effect is transient: the cell returned to firing properties similar to those observed pre-stimulation, making it unlikely that inhibition persists during the consolidation window.

      It is unfortunate that several animals had to be excluded after histological verification, but the resulting mismatch between groups remains a concern. Without a power analysis indicating the number of subjects required to achieve reliable effects, it is difficult to determine whether the modest behavioral differences reflect genuine biological variability or insufficient statistical power. Additional animals may be needed to properly address this imbalance.

      As noted in the revised manuscript, we are confident about the reliability of the findings reported. The manuscript provides evidence across five experiments that IL stimulation fails to facilitate brief extinction in the absence of prior inhibitory experience, replicating previous findings (3, 4). The manuscript also replicates these prior studies by demonstrating that experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the present experiments replicate the facilitative effects of IL stimulation following fear or appetitive backward conditioning.

      Overall, while the manuscript is improved in clarity and methodological detail, the behavioral effects remain weak, and the mechanistic interpretation requires stronger experimental support and consideration of alternative explanations.

      We respectfully disagree with the assertion that the reported results are weak. The manuscript replicates all main findings internally or reproduces findings from previously published studies. While alternative explanations cannot be entirely excluded, we are not aware of any competing account that predicts the pattern of results reported here.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      We thank the Reviewer for their positive assessment.

      Strengths to highlight:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, also are considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition. The authors have addressed the prior reviews. I still think it is unfortunate that the groups were not properly balanced in some of the figures (as noted by the authors, they were matched appropriately in real time, but some animals had to be dropped after histology, which caused some balancing issues). I think the overall pattern of results is compelling enough that more subjects do not need to be added, but it would still be nice to see more acknowledgement and statistical analyses of how these pre-existing differences may have impacted test performance.

      We thank the Reviewer for their positive assessment of our revised manuscript. We discuss the comments regarding group balancing below.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      The various group differences in Figure 2 prior to any manipulation are still problematic. There was a reliable effect of subsequent group assignment in Figure 2 (p<0.05, described as "marginal" in multiple places). Then there are differences in extinction (nonsignificant at p=.07). The test difference between ReExt OFF/ON is identical to the difference at the end of extinction and the beginning of Forward 2, in terms of absolute size. I really don't think much can be made of the test result. The authors state in their response that this difference was not evident during the forward phase, but there clearly is a large ordinal difference on the first trial. I think it is appropriate to only focus on test differences when groups are appropriately matched, but when there are pre-existing differences (even when not statistically significant) then they really need to be incorporated into the statistical test somehow.

      We carefully considered the Reviewer's suggestion, but it is not possible to adjust the statistical analyses at test because these analyses do not directly compare the two ReExt groups. Any scaling of performance would require including the two Ext groups, which is not feasible since these groups did not receive initial extinction. Moreover, the analyses provide no conclusive evidence of pre-existing differences between the two ReExt groups: the difference was not significant during initial extinction and was absent during the Forward 2 stage. We acknowledge that closer performance between the two ReExt groups during initial extinction would have been preferable. However, we remain confident in the results obtained because they replicate previous experiments in which the two ReExt groups displayed identical performance during initial extinction.

      The same problem is evident in Figure 4B, but here the large differences in the Same groups are opposite to the test differences. It's hard to say how those large differences ultimately impacted the test results. I suppose it is good that the differences during Forward conditioning did not ultimately predict test differences, but this really should have been addressed with more subjects in these experiments. The authors explore the interactions appropriately but with n=6 in the various subgroups, it's not surprising that some of these effects were not detected statistically.

      As the Reviewer noted, the unexpected differences in Figure 4B are opposite in direction to the test differences. Importantly, Figure 4B replicates the main findings from Figure 3, which did not show these unexpected differences.

      It is useful to see the trial-by-trial test data now presented in the supplement. I think the discussion does a good job of addressing the issues of retrieval, but the ideas of Estes about session cues that the authors bring up in their response haven't really held up over the years (e.g., Robbins, 1990, who explicitly tested this; other demonstrations of within-session spontaneous recovery), for what it's worth.

      We thank the Reviewer for drawing our attention to Robbins’ work on session cues. We understand that the issue of retrieval is important, but as we noted before, our manuscript and its conclusions do not claim to differentiate retrieval from additional learning.

      References

      (1) K. E. Nett, R. T. LaLumiere, Infralimbic cortex functioning across motivated behaviors: Can the differences be reconciled? Neurosci Biobehav Rev 131, 704–721 (2021).

      (2) V. Laurent, R. F. Westbrook, Inactivation of the infralimbic but not the prelimbic cortex impairs consolidation and retrieval of fear extinction. Learn Mem 16, 520–529 (2009).

      (3) N. W. Lingawi, R. F. Westbrook, V. Laurent, Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex 27, 5547–5556 (2017).

      (4) N. W. Lingawi, N. M. Holmes, R. F. Westbrook, V. Laurent, The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem 150, 64–74 (2018).


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      We thank the Reviewer for their positive assessment.

      (2) The experimental design of probing different inhibitory learning approaches to probe how IL activation facilitates extinction learning was creative and innovative.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      ChR2 was intentionally expressed in the infralimbic cortex (IL) without distinction between local neuronal populations for two reasons. First, the primary aim of this study was to uncover some of the features characterizing the encoding of inhibitory memories in the IL, and this encoding likely engages interactions among various neuronal populations within the IL. Second, the hypotheses tested in the manuscript derived from findings that indiscriminately stimulated the IL using the GABA<sub>A</sub> receptor antagonist picrotoxin, an effect best mimicked by the approach taken here. We agree that it is also important to determine the respective contributions of distinct IL neuronal populations to inhibitory encoding; however, the global approach implemented in the present experiments represents a necessary first step. These matters have been incorporated into the Discussion of the revised manuscript.

      (2) Extinction retrieval test conflates processes

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      It is unclear when retrieval of what has been learned across extinction ceases and additional extinction learning occurs. In fact, it is only the first stimulus presentation that unequivocally permits a distinction between retrieval and additional extinction learning, as the conditions for this additional learning have not been fulfilled at that presentation. However, confining evidence for retrieval to the first stimulus presentation introduces concerns that other factors could influence performance. For instance, processing of the stimulus present at the start of the session may differ from that present at the end of the previous session, thereby affecting what is retrieved. Such differences between the stimuli present at the start and end of an extinction session have been long recognized as a potential explanation for spontaneous recovery (Estes, 1955). More importantly, whether the test data presented confound retrieval and additional extinction learning or not, the interpretation remains the same with respect to the effects of a prior history of inhibitory learning on enabling the facilitative effects of IL stimulation. Finally, it is unclear how these facilitative effects could occur in the absence of the subjects retrieving the extinction memory formed under the stimulation. Nevertheless, the revised manuscript now provides the trial-by-trial performance (see Supplemental Figure 3) during the post-extinction retrieval tests and addresses this issue in the Discussion.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      Efforts were made to match group performance upon completion of each training stage and before IL stimulation. Unfortunately, these efforts were not completely successful due to exclusions following post-mortem analyses. This has been made explicit in the revised manuscript (Materials and Methods, Subjects section). However, we acknowledge that the unexpected interactions deserve further discussion, and this has been incorporated into the revised manuscript (see also comment from Reviewer 2). Although we cannot exclude the possibility that sample sizes may have contributed to some of these interactions, we remain confident about the reliability of the main findings reported, especially given their replication across the various protocols. Overall, the manuscript provides evidence that IL stimulation does not facilitate brief extinction in the absence of prior inhibitory experience in five different experiments, replicating previous findings (Lingawi et al., 2018; Lingawi et al., 2017). It also replicates these previous findings by showing that prior experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the facilitative effects of such stimulation following fear or appetitive backward conditioning are replicated in the present manuscript. These points are addressed in the Discussion of the revised manuscript.

      (4) Incomplete presentation of conditioning data

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      We apologize, as we incorrectly labeled the X axis for the backward conditioning data in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. This error has been corrected in the revised manuscript (see also second comment from Reviewer 2).

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect nonspecific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al., 2015; Chen et al., 2021), which the authors do not directly test in this study.

      As noted above, the interpretations of the main findings stand whether or not the test data confound retrieval with additional extinction learning. The revised manuscript also clarifies the plotting of the data for the backward conditioning stages. We do agree that further discussion of the unexpected interactions is necessary, and this has been incorporated into the revised manuscript. However, the various replications of the core findings provide strong evidence for their reliability and the interpretations advanced in the original manuscript. The proposal that the results reflect non-specific facilitation or disruption of behavior seems highly unlikely. Indeed, the present experiments and previous findings (Lingawi et al., 2018; Lingawi et al., 2017) provide multiple demonstrations that IL stimulation fails to produce any facilitation in the absence of prior inhibitory experience with the target stimulus. Although these demonstrations appear inconsistent with previous studies (Do-Monte et al., 2015; Chen et al., 2021), this inconsistency is likely explained by the fact that these studies manipulated activity in specific IL neuronal populations. Previous work has already revealed differences between manipulations targeting discrete IL neuronal populations as opposed to general IL activity (Kim et al., 2016). Importantly, as previously noted, the present manuscript aimed to generally explore inhibitory encoding in the IL that is likely to engage several neuronal populations within the IL. Adequate statements on these matters have been included in the Discussion of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure.

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      Efforts were made daily to match group performance across the training stages, but these efforts were ultimately hampered by the necessary exclusions following postmortem analyses. This has been made explicit in the revised manuscript (Materials and Methods, Subjects section). Regarding freezing during Extinction 1, as noted by the Reviewer, the difference, which was not statistically significant, was absent across trials during the subsequent forward fear conditioning stage. Likewise, the protocol difference observed during the initial forward fear conditioning was absent in subsequent stages. We are therefore confident that these initial differences (significant or not) did not impact the main findings at test. Importantly, these findings replicate previous work using identical protocols in which no differences were present during the training stages. These considerations have been addressed in the revised manuscript (see Results for Experiment 1).

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      We apologize, as noted above, for having incorrectly labeled the X axis across the backward conditioning data sets in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. The data shown in these Figures use the average of all trials on a given day. This has been clarified in the methods section of the revised manuscript (Statistical Analyses section). The labeling errors on the Figures have been corrected.

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      We agree with the Reviewer that further discussion of the Protocol x Virus interaction that emerged during the backward conditioning and forward conditioning stages of Experiment 3 is warranted. This discussion has been provided in the revised manuscript (see Results section). Briefly, during both stages, follow-up analyses did not reveal any differences (main effects or interactions) between the two groups trained with the light stimulus (Diff-EYFP and Diff-ChR2). By contrast, the ChR2 group trained with the tone (Back-ChR2) froze more overall than the EYFP group (Back-EYFP), but there were no other significant differences between the two groups. Based on these analyses, the Protocol x Virus interaction appears to be driven by greater freezing in the ChR2 group trained with the tone rather than a difference in the backward conditioning performance based on stimulus identity. Consistent with this, the statistical analyses did not reveal a main effect of Protocol during either the backward conditioning stage or the stimulus trials during the forward conditioning stage. Nevertheless, during this latter stage, a main effect of Protocol emerged during baseline performance, but once again, this seems to be driven by the Back-ChR2 group. Critically, it is unclear how greater stimulus freezing in the Back-ChR2 group during forward conditioning would lead to lower freezing during the post-extinction retrieval test.

      We note that an unexpected Protocol x Period interaction was found during appetitive backward conditioning in Experiment 5. For consistency, we conducted additional analyses to determine the source of this interaction (see Results section). As previously noted, performance during appetitive backward conditioning is noisy and cannot be taken as a failure to generate inhibitory learning. It is therefore unlikely that this interaction implied a difference in such learning.

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      We confirm that overall, there was a significant decline in freezing across the extinction session shown in Figure 4B. The Reviewer is correct to point out that this decline was modest (if not negligible) in the Diff-EYFP group, which was receiving its first inhibitory training with the target tone stimulus. It is worth noting that across all experiments, most groups that did not receive infralimbic stimulation displayed a modest decline in freezing during the extinction session since it was relatively brief, involving only 6 or 8 tone alone presentations. This was intentional, as we aimed for the brief extinction session to generate minimal inhibitory learning and thereby to detect any facilitatory effect of infralimbic stimulation. This has been clarified and explained in the revised version of the manuscript (see Results section, description of Experiment 1).

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

      In line with the Reviewer’s suggestion (see also Reviewer 3), the Discussion section has been substantially altered in the revised manuscript. Among other things, it does mention that future studies will need to examine the role of additional brain regions in the effects reported and it acknowledges the need to further explore sex differences and IL functions.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      All experimental parameters were based on previously published experiments showing the capacity of the backward conditioning protocols to generate inhibitory learning and the forward conditioning protocols to produce excitatory learning. Although this was mentioned in the Methods section, we acknowledge that further explanation was required to justify the need for multiple days of backward training. This has been provided in the revised manuscript (see Results section and the description of the backward conditioning parameters).

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

      The Discussion has been substantially condensed, and broader implications are now discussed with respect to the existing literature on the neural circuitry underlying inhibitory learning.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Re-analyze extinction retrieval, focusing only on the first 2-4 tones to capture extinction expression.

      This recommendation corresponds to the second public comment made by the Reviewer, and we have replied to this comment.

      (2) Directly test whether activation of IL during fear extinction is insufficient to facilitate extinction retrieval without prior extinction training.

      The manuscript provides five separate demonstrations that the optogenetic approach to stimulate IL activity did not facilitate the initial brief extinction session. This reproduces what had been found with indiscriminate pharmacological stimulation in our previous research (Lingawi et al., 2018; Lingawi et al., 2017). We appreciate that other work that stimulated specific IL neuronal populations has observed facilitation of extinction, but the present manuscript focuses on the role of all IL neuronal populations in encoding inhibitory memories. The Reviewer’s request would imply contrasting the role of various neuronal populations, which is beyond the scope of this manuscript. Nevertheless, we have modified our discussion to indicate that future research should establish which IL neuronal population(s) contribute to the effects reported here.

      (3) Show the percentage of neurons that exhibit excitatory or inhibitory responses in IL after non-specific optogenetic activation to better understand how this manipulation is affecting IL circuitry.

      All electrophysiological recordings (n = 10 cells) are presented in Figure 1C. ChR2 excitation was substantial and overwhelming. Based on the physiological and morphological characteristics of the recorded cells, one was non-pyramidal and was excited by LED light delivery. The remaining 9 cells were pyramidal. One did not respond to LED delivery, but we cannot exclude the possibility that this was due to a lack of ChR2 expression in the somatic compartment. Another cell showed a mild reduction in activity following LED stimulation, while the remaining 7 cells displayed clear excitation upon LED stimulation. We have modified our manuscript to reflect these observations. We did not include percentages since only 10 recordings are shown.

      (4) Present data from all five conditioning sessions, not just one, to allow evaluation of learning history.

      This recommendation corresponds to the fourth public comment made by the Reviewer, and we have replied to this comment.

      (5) Address the issue of small and poorly matched groups, particularly in Figures 2b, 3b, 6b, and 6c.

      This recommendation corresponds to the third public comment made by the Reviewer, and we have replied to this comment.

      (6) Temper the conclusions to reflect the limitations of sampling, group matching, and the lack of specificity in the manipulation.

      We have modified our Discussion to address potential issues related to sampling and group matching. However, we are unsure how the lack of specificity of the IL stimulation has any impact on the interpretations made, since no statement is made about neuronal specificity. That said, as noted above, “we have modified our discussion to indicate that future research should establish which IL neuronal population(s) contribute to the effects reported here”.

      Reviewer #2 (Recommendations for the authors):

      Nothing additional to include beyond what is written for public view.

      Reviewer #3 (Recommendations for the authors):

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition. I only have a couple of comments that the authors may want to consider.

      We thank the Reviewer for their positive assessment.

      First, in Figure 2, it is unfortunate that there is a general effect of the LED assignment before the LED experience (p=.07 during that first extinction session). This is in the same direction as the difference during the test, so it is not clear that the test difference really reflects differences due to Extinction 2 treatment or to preexisting differences based on group assignments.

      The Reviewer’s comment is identical to the first public comment of Reviewer 2, which has been addressed.

      Second, it is notable that the backwards fear conditioning phase was conducted over 5 days, but the forward conditioning phase was conducted over one day. The rationale for these differences should be presented. There is an old idea going back to Konorski that backwards conditioning may lead to excitation initially, and it is only after more extensive trials that inhibitory conditioning occurs (a finding supported by Heth, 1976). Some discussion of the potential biphasic nature of backwards conditioning would be useful, especially for people who want to run this type of experiment but with only a single session of backwards conditioning.

      In line with the Reviewer’s suggestion, the revised manuscript (see Results section) provides an explanation for conducting backward conditioning across multiple days.

      Third, as written, each paragraph of the discussion is mostly a recapitulation of the findings from each experiment. This could be condensed significantly, and it would be nice to see more integration with the current literature and how these results challenge or suggest nuance in current thinking about IL function.

      We have significantly condensed the recapitulation of our findings in the Discussion of the revised manuscript. The Discussion now dedicates space to address comments from the other Reviewers and integrate the present findings with the current literature.

      References

      Chen, Y.-H., Wu, J.-L., Hu, N.-Y., Zhuang, J.-P., Li, W.-P., Zhang, S.-R., Li, X.-W., Yang, J.-M., & Gao, T.-M. (2021). Distinct projections from the infralimbic cortex exert opposing effects in modulating anxiety and fear. J Clin Invest, 131(14), e145692. https://doi.org/10.1172/JCI145692

      Do-Monte, F. H., Manzano-Nieves, G., Quiñones-Laracuente, K., Ramos-Medina, L., & Quirk, G. J. (2015). Revisiting the role of infralimbic cortex in fear extinction with optogenetics. J Neurosci, 35(8), 3607-3615. https://doi.org/10.1523/JNEUROSCI.3137-14.2015

      Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychol Rev, 62(3), 145-154. https://doi.org/10.1037/h0048509

      Kim, H.-S., Cho, H.-Y., Augustine, G. J., & Han, J.-H. (2016). Selective Control of Fear Expression by Optogenetic Manipulation of Infralimbic Cortex after Extinction. Neuropsychopharmacology, 41(5), 1261-1273. https://doi.org/10.1038/npp.2015.276

      Lingawi, N. W., Holmes, N. M., Westbrook, R. F., & Laurent, V. (2018). The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem, 150, 64-74. https://doi.org/10.1016/j.nlm.2018.03.001

      Lingawi, N. W., Westbrook, R. F., & Laurent, V. (2017). Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex, 27(12), 5547-5556. https://doi.org/10.1093/cercor/bhw322

    1. Abstract

      Advances in spatial omics enable measurement of genes (spatial transcriptomics) and peptides, lipids, or N-glycans (mass spectrometry imaging) across thousands of locations within a tissue. While detecting spatially variable molecules is a well-studied problem, robust methods for identifying spatially varying co-expression between molecule pairs remain limited. We introduce SpaceBF, a Bayesian fused modeling framework that estimates co-expression at both local (location-specific) and global (tissue-wide) levels. SpaceBF enforces spatial smoothness via a fused horseshoe prior on the edges of a predefined spatial adjacency graph, allowing large, edge-specific differences to escape shrinkage while preserving overall structure. In extensive simulations, SpaceBF achieves higher specificity and power than commonly used methods that leverage geospatial metrics, including bivariate Moran’s I and Lee’s L. We also benchmark the proposed prior against standard alternatives, such as intrinsic conditional autoregressive (ICAR) and Matérn priors. Applied to spatial transcriptomics and proteomics datasets, SpaceBF reveals cancer-relevant molecular interactions and patterns of cell–cell communication (e.g., ligand–receptor signaling), demonstrating its utility for principled, uncertainty-aware co-expression analysis of spatial omics data.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag006), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Daniel Domovic

      Dear authors,

      I read your manuscript "SpaceBF: Spatial coexpression analysis using Bayesian Fused approaches in spatial omics datasets" with interest.

      The manuscript presents SpaceBF, a Bayesian method for detecting spatial co-expression between pairs of molecules in spatial omics data. The topic is relevant, since new technologies like spatial transcriptomics, mass spectrometry imaging, and multiplex immunofluorescence produce large datasets, but current tools for co-expression analysis are limited. The authors try to address this gap with a new model, and they also test it on real datasets. The paper is technical, but it also gives biological examples, which is helpful for readers.

      The paper has many strong points. First, the idea of using a Bayesian fused horseshoe prior together with an MST spatial structure is new and well explained. Second, the authors apply their method to three real datasets and show interesting biology, for example the IGF2-IGF1R relation, keratin isoform consistency, and stromal ECM peptides. Third, I appreciate that the code is openly available on GitHub. Also, the paper compares SpaceBF with other methods and avoids the common problem of choosing a variance-stabilizing transform by modeling UMI counts directly with a negative binomial distribution.

      Overall, the work is clear and well organized, but there are some points where more explanation or clarification would help. In my review I give major and minor remarks that I hope will improve the paper.

      Major remarks:

      1. Were you concerned that choosing an MST may oversimplify spatial relationships, since many meaningful local neighborhoods may be excluded? Would the results of SpaceBF be significantly different if a different spatial graph, such as kNN, Delaunay triangulation, or a kernel-based graph, were used instead of the MST?
      2. Since MST edges depend heavily on pairwise L2 distances, how stable are the results if the spatial coordinates are slightly noisy, or if there are tissue registration errors?
      3. The model treats one molecule as the outcome and the other as the predictor. Are the co-expression estimates the same if you switch roles?
      4. In the Results you mention "FDR < 0.1." Can you explain which method you used to control the FDR? Also, are the discoveries robust if you change the threshold (for example, 0.05 vs 0.1)?
      5. Do the simulation parameters (lengthscale, slope, dispersion) correspond to realistic biological signal strengths and spatial scales observed in real datasets? Three values of the lengthscale l are considered: l = 3.6, 7.2, and 18. Why exactly these values? What does ν = 0.75 mean in terms of effect size? How does l = 18 compare to real tissue lengthscales?
      6. Can you describe the runtime and memory requirements for larger datasets, such as 10X Visium with 5,000-20,000 spots? Is the current MCMC practical at this scale, or do you think approximate inference (such as variational Bayes or INLA) is needed?

      Minor remarks:

      1. How sensitive are the results to the choice of hyperparameters for the horseshoe prior?
      2. In the Results you state that keratins "co-express highly, meaning their binding patterns with any specific type 1 keratin should be similar." Please make clear that SpaceBF measures co-expression, not direct binding, so that conclusions are not overstated.
      3. You mention SpatialCorr and Copulacci, but the comparison was not successful. Even if the results were sensitive to parameters, I think one short numerical comparison in the supplement would be helpful.
      4. You filter out genes with fewer than ~59 total reads (0.2 × the number of spots). Can you justify the choice of this threshold and show whether the results are stable for other thresholds (for example, 0.1× or 0.5×)? Since many ligands and receptors are lowly expressed, is there a risk of losing meaningful biology? Because the dataset has only 293 spots, thresholds can have a strong effect.

    1. 1.2. Kumail Nanjiani’s Reflections on Ethics in Tech

      Kumail Nanjiani was a star of the Silicon Valley TV show, which was about the tech industry. He posted these reflections on ethics in tech on Twitter (@kumailn) on November 1, 2017:

      As a cast member on a show about tech, our job entails visiting tech companies/conferences etc. We meet ppl eager to show off new tech. Often we’ll see tech that is scary. I don’t mean weapons etc. I mean altering video, tech that violates privacy, stuff w obv ethical issues. And we’ll bring up our concerns to them. We are realizing that ZERO consideration seems to be given to the ethical implications of tech. They don’t even have a pat rehearsed answer. They are shocked at being asked. Which means nobody is asking those questions. “We’re not making it for that reason but the way ppl choose to use it isn’t our fault. Safeguard will develop.” But tech is moving so fast. That there is no way humanity or laws can keep up. We don’t even know how to deal with open death threats online. Only “Can we do this?” Never “should we do this?” We’ve seen that same blasé attitude in how Twitter or Facebook deal w abuse/fake news. You can’t put this stuff back in the box. Once it’s out there, it’s out there. And there are no guardians. It’s terrifying. The end.

      Kumail Nanjiani

      1.2.1. Reflection questions:

      What do you think is the responsibility of tech workers to think through the ethical implications of what they are making? Why do you think the people who Kumail talked with didn’t have answers to his questions?

      I think tech workers have a responsibility to consider the ethical implications of what they create, because technology can shape behavior, privacy, and power in ways that are difficult to reverse. As Kumail Nanjiani points out, once technology is released, it cannot simply be taken back, so ethical thinking should happen before harm occurs.

      I think the people Kumail spoke with lacked answers because ethical reflection is often not prioritized in tech culture. Many developers focus on whether something can be built rather than whether it should be built, and since these questions are rarely asked, they may not be prepared to address them.

    1. R0:

      Reviewer #1: Peer Reviewer’s report for the submission “Reaching the 100 by 2027 target for universal access to rapid diagnostic tests for tuberculosis in Africa: in-sight but out of reach”

      Recommendation: Minor Revisions

      General Comment: This paper addresses a pertinent global health subject, a WHO priority research gap. The methods are sound and innovative. However, the authors need to improve the clarity of the paper.

      Abstract:
      - The authors did fantastic work summarizing the study in this abstract.
      - Kindly break the abstract into the standard sections: background, methods, results, conclusion.
      - Please clearly state the name of the study design used in this study. Is this an ecological study with mixed methods, or something else?

      Background:
      - Great job introducing the research gap and the pertinence of the research.
      - A brief perspective on funding gaps for diagnostics might strengthen this section.
      - Do not overestimate readers’ knowledge of the subject; briefly describe what WRDs are and list them. Why are they so important?

      Methods:
      - This section is a bit too brief and does not present the work in a way that readers could easily reproduce. Use standard sub-headers such as study design, study population, study period, data collection and data analysis for clarity.
      - Again, what is the design of this study?
      - WRDs were recommended 10 years ago; what is the rationale for the period 2021-2023? I think the key landmarks are 2015 for End-TB, 2018 for the first UNHLM and 2023 for the second UNHLM.
      - Lines 98-101: How were these cutoffs decided?
      - The study area is completely absent. It is important to shed more light on the 24 countries. Which countries are they, what is the TB burden there, and are there any peculiarities?
      - Benchmarks that required a secondary calculation after extraction need to be presented clearly, showing the variables used as numerator and denominator.

      Results:
      - Kindly provide the exact number of cases tested in each year before giving proportions. A standalone table could resolve this.
      - Lines 151-161: I find it hard to see trends with only three annual data points. The number of years would probably need to be increased to discuss trends.
      - Did the Table 2 strategies come from the TB staff or from the authors? They appear to have come from the authors, in which case I do not think they belong in the Results; at best, they belong in the recommendations.

      Discussion:
      - The authors did a superb job discussing the available findings of the study.
      - As this study has policy implications, kindly include a sub-header for the policy implications of the findings and state them clearly.
      - Include sub-headers for strengths and limitations and outline them clearly.

      Reviewer #2: Review of Title: Reaching the 100 by 2027 target for universal access to rapid diagnostic tests for tuberculosis in Africa: in-sight but out of reach

      Summary of research and overall impression This is a well-written and researched article reporting on the availability and use of WHO-recommended rapid diagnostics for TB in African countries where there is significant burden. The authors use routinely reported data to assess access to WRDs, and a small survey of programme staff from a subset of countries to identify barriers and facilitators to the inclusion of WRDs in diagnostic algorithms. The paper makes an important contribution to the TB literature by mapping the gaps in terms of access to and usage of WRDs, which is needed to strengthen TB control efforts. There are minor comments for the authors to address to strengthen the paper.

      Methods
      1. Include brief details on how/why the 24 countries included in the review were selected.
      2. More details are needed to describe the process for the country stakeholder survey. For example:

      • Specify what the questionnaire consisted of, i.e., closed and open-ended questions? What topic areas/sections were included/asked about? How/by whom was the questionnaire designed/developed, using/adapting an existing framework/questionnaire?
      • How were the questionnaires sent out? Were specific people targeted? How many were sent out? What was the timeframe?
      • Provide details of how/why the 6 countries were selected – e.g., 1-2 from each region? Who inputted on these decisions? The authors mention later that these were also selected based on WRD access, which should be mentioned here in methods.

      • It is unclear under ‘statistical analysis’ if this refers to analysis of all data, or just the data review. Suggest revising to clarify analysis for data review, and analysis for the stakeholder survey. Two things to consider: 1) Provide details on the data extracted and the analysis conducted. 2) It is unclear what is meant here: “The first author used topic guides that reflected content areas such as barriers and contextual factors influencing WRD use and the themes that emerged during the review of the survey responses to manually organise the data into thematic codes.” Is this referring to the stakeholder surveys? Suggest revising for clarity on the analysis process. Were any frameworks used in analysis to categorise barriers into categories and develop mitigation strategies? This process needs to be detailed in the methods to lead into the results.

      • Please clarify/confirm the ethics of surveying country stakeholders without a consent process, even if participants (country stakeholders) are not identifiable.

      Results: Provide details of how many survey responses were received. Is it only 6 from 6 countries (as in lines 182-186)? How were respondents distributed across the 6 countries? Could they speak to the different country contexts? Later in the text there is a mention of 16; please clarify this in the Results.

      In lines 163 onwards, when referring to the analysed gaps in the TB diagnostic cascade, please clarify in the text throughout what is meant with ‘countries reported’ – is this a comparison of what is found in the data review with what is reported by country stakeholders?

      As mentioned earlier, the process for categorising the barriers and developing mitigation strategies must be introduced in the methods: “We then distilled the barriers into five categories and developed mitigation strategies (Table 3) to improve the use of WRDs across all 24 LabCoP countries.” Did you use a framework to guide this at different health-system levels? Suggest revising the three theme headings, as they currently read more like recommendation statements than findings (i.e., optimise…, strengthen…). To read as findings about barriers and facilitators, they should describe what was found.
      - Theme 1: ‘optimise WRD capacity’ – clarify what ‘capacity’ refers to. This heading includes multiple aspects, i.e., policies and guidelines, as well as examples of how access to WRDs has been improved; are these examples of optimising WRD capacity?
      - Theme 2: seems to address two things, sample transportation and access to testing via active case finding. Clarify if/how these are linked.
      - Theme 3: insufficient financing, staffing, and infrastructure to implement WRDs.

      Discussion: Under strengths and limitations, the authors mention that ‘a planned report from our annual meeting will capture responses from all 24 countries’ (lines 362-363). This statement has limited relevance to the article unless the report is already publicly available and can be referenced. Suggest deleting it.

      The authors also mention that they ‘only reached out to the selected countries’ (line 361). Suggest phrasing this more positively, i.e., ‘we purposively selected a subset of 6 countries from the 24 within the LabCoP network, which may limit…’

      R1:

      Reviewer #2: Well done on an exceptionally well-written and important paper. I do have one pending comment about the number of survey responses, which I do not see reported in the results. It is important to include the number of respondents and how they were distributed across the 6 countries included in the survey.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons that exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum, where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function, which ultimately allows the animal to correct its orientation. It represents an example of systems neuroscience explaining how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective, showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear, and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims, and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

      Thank you very much for these comments.

      Weaknesses:

      The evidence supporting the claim that the neural circuitry presented here controls the cilia beating is more correlational because it only relies on the fact that the location of the two types of ANN neurons coincides with the quadrants that are affected in the behavioral recordings. Discussing ways by which causality could be established might be helpful.

      We have now added further discussion in a new “Future Directions” section explaining that, for example, calcium imaging or targeted neuron ablations could be used in future work to establish causality. This would require the development of genetic delivery techniques, e.g. to introduce the GCaMP calcium sensor or transgenic reporters.

      The explanation of the relevance of this work could be improved. The conclusion that the work hints at coordination instead of feedforward sensory-motor control is explained over only a few lines. The authors could provide a more detailed explanation of how the two models compete (coordination vs feedforward sensory-motor control), and why choosing one option over the other could provide advantages in this context.

      We added a more detailed explanation of the two types of model and why we believe that a coordination model is more compatible with our connectome data.

      “An alternative model for the function of the nerve net would be a feedforward sensory-motor system, in which balancer cells provide mechanosensory input to motor effectors via the nerve net, similar to a reflex arc. None of our observations support such a sensory-motor model. There are no synaptic pathways from balancer cells or any other sensory cells to the nerve net. The only synaptic input to ANNs comes from the bridge cells (discussed below) and from each other. The three synaptically interconnected ANNs may generate an endogenous rhythm that controls balancer cilia and is influenced by bridge input. ANNs may also be influenced by neuropeptides secreted by other aboral organ neurons. Such chemical inputs may underlie the flexibility of gravitaxis and its modulation by other cues (e.g. light). Overall, the coordination model parsimoniously explains both the ANN wiring topology and the observed dynamics, whereas a simple feedforward reflex does not.”

      Since the fact that the ANN neurons form a syncytium is an important finding of this study, it would be useful to have additional illustrations of it. For instance, pictures showing anastomosing membranes could typically be added in Figure 2.

      We have now included a movie (Video 3) showing a volumetric reconstruction of a segment of an ANN neuron, which highlights the anastomosing morphology in greater detail than static images.

      “Video 3. Volumetric reconstruction of a single ANN Q1-4 neuron showing syncytial soma (cyan) and nuclei (magenta). The rotating view highlights the anastomosing morphology, although not all fine details could be reconstructed due to data limitations.”

      Also, to better establish the importance of the study, it could be useful to explain why the balancers’ cilia spontaneously beat in the first place (instead of being static and just acting as stretch sensors).

      We have discussed in more detail why it may be important for the balancer cilia to beat.

      “The observation that balancer cilia beat spontaneously, even in the absence of external tilt, suggests that they are active sensory oscillators rather than static stretch sensors. Their spontaneous beating could set a dynamic baseline of sensitivity, which can then be modulated by ANN inputs or sensory changes during tilt. Such a dynamic system may be more sensitive to small deflections and be more responsive [@Lowe1997]. Thus, the regulated beating of balancer cilia should not be seen as noise, but as an adaptive feature that enables flexible and robust graviceptive responses. The ctenophore balancer may thus use active ciliary oscillations for enhanced sensorimotor integration similar to other sensory systems [@Wan_2023].”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ’s balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day-old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because it is apparently the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

      We thank the reviewer for these comments.

      Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day-old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons. This also gives some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Thank you for these positive comments on the paper.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In consultation, the reviewers recommend that improving the evidence to “exceptional” would require additional perturbation experiments (e.g., ablation of specific neurons), as Reviewer 1 suggests. They also recommend adding a “Future Directions” section to the manuscript, because it opens up so many new experimental directions.

      We have added a new “Future Directions” section at the end of the Discussion. To carry out the proposed perturbation or calcium imaging experiments would require significant additional work and method development. We are actively working on establishing mRNA and DNA injection into ctenophore zygotes to enable live imaging, cell labelling or ablations in the future.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data, or analyses:

      To establish causality (neurons control balancer cilia), an important experiment would be to manipulate each of these neuronal populations (e.g., by ablating them) and measure the effect of these ablations on the beating frequency of the balancer cilia of the four quadrants. Moreover, direct observation of neuronal activity (e.g., by using calcium imaging) would also provide more compelling evidence for neuronal control.

      We agree with the reviewer that such perturbation experiments would be needed to establish causality. Such experiments are currently still not possible in ctenophores and would require significant technology development. We discuss such experiments in the “Future Directions” section and also place this in the context of the currently available techniques in ctenophores. We are actively working on this, but waiting for such technological breakthroughs and new experiments would significantly delay the publication of a version of record of the paper.

      Recommendations for improving the writing and presentation:

      ANN neurons are described in great detail, though SNN neurons are described more loosely. Perhaps a more detailed description of SNN neurons would be helpful.

      We added the information on SNNs to show that these cells are distinct from the ANN neurons. Since our focus is on the aboral organ, we did not aim for a comprehensive reconstruction of SNNs. Several of the processes of the SNNs are also truncated and outside our EM volume. We have nevertheless added additional details about the morphology and connectivity of SNN neurons.

      “Near the periphery of the aboral organ, we identified four further anastomosing nerve-net neurons. These resembled the previously reported syncytial subepithelial nerve net (SNN) neurons in the body wall of Mnemiopsis (Figure 2–figure supplement 1C–G) and were clearly distinct from the ANN neurons (both in location and morphology). SNN neurons show a blebbed morphology and contain dense core vesicles [@Burkhardt2023] but no synapses.”

      Minor corrections to the text and figures:

      (1) Figure 2 C): “mitochondia” instead of “mitochondria”.

      corrected

      (2) Figure 3. Title: “balancer and and bridge”.

      corrected

      (3) Figure 3.C) “shown in xxx color”

      corrected

      Reviewer #2 (Recommendations for the authors):

      Clearer usage of the terms statocyst, aboral organ, aboral nerve net, statolith, dome, and lithocytes would be helpful. For readers not familiar with ctenophore anatomy, things can get a bit confusing. A single schematic with all of these terms would be helpful. In Figure 1E, there is a label “dc”. Should this be “do”?

      We have added an annotated schematic to Figure 1, explaining these terms.

      Figure 1C “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      Reviewer #3 (Recommendations for the authors):

      My comments are numerous, but mostly minor suggestions for improving the clarity.

      [Suggested insertions/changes are indicated by square brackets]

      (1) [It would be much easier to review this if there were line numbers, or with a double-spaced manuscript that was more accommodating for markup.]

      Thank you for this comment. We have increased the line spacing in the revised version. (We set the CSS line-height property on the html ‘body’ element to 2em).

      (2) The terms statolith, statocyst, and lithocytes can be confusing, so it would be nice to have an upfront definition of how they relate to each other.

      We now explain these terms in the Introduction and have also improved the annotation of Figure 1.

      Figure 1C: “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      (3) Statolith is spelled as statolyth in the early pages, but statolith in the later pages. I think -lith is more common, but in any case, these should be standardized.

      corrected to ‘statolith’

      ABSTRACT:

      (1) Differential load[s] on the balancer cilia [lead] to altered

      changed

      (2) We used volume electron microscopy (vEM) to image the aboral organ.

      changed

      (3) also form reciprocal connections with the bridge cells.

      corrected

      INTRODUCTION:

      (1) “identify conserved neuronal markers in ctenophores” - confusing - does this mean conserved across ctenophores, or conserved in ctenophores and other animals?

      changed to “classical neuronal markers”

      (2) “either increase or decrease their [ciliary] activity, indicating” - otherwise it sounds like the balancers are increasing activity.

      changed to “balancer cells may either increase or decrease their ciliary activity”

      (3) After “matches the setup used in high-speed imaging experiments”, it might be nice to add a statement like “Future studies could potentially investigate activity in the inverted orientation, when the statolith is suspended below the cilia, to see if the response differs.”

      In this sentence we referred to the orientation of the animals in our figures. There is a consensus among ctenophore researchers that when depicting ctenophores, the aboral organ should face downwards. However, for this paper we chose the opposite orientation to better match our experiments and help interpret the results. We changed the text to: “In this study, we represent ctenophores with their aboral organ facing upwards (“balancer-up” posture), as this configuration facilitates intuitive interpretation of balance-like functions and matches the setup used in high-speed imaging experiments.”

      We added the sentences “Future experiments could also explore how orientation affects the response of balancer cilia. For example, when the statolith is suspended below the cilia (the “balancer-down” posture), ciliary beating patterns may differ from what we observed here in the “balancer-up” configuration.” to the “Future Directions” section.

      (4) “abolished by calcium[-]channel inhibitors”

      corrected

      (5) “By functional imaging, we uncovered” - It is not clear what functional imaging is. Maybe add a few-word definition here, and be sure to explain it in the methods.

      changed to “By high-speed ciliary imaging”. The details of the imaging are explained in the Methods section under “Imaging the Activity of Balancer Cilia”.

      RESULTS:

      (1) “five-day-old” - is it worth saying post-fertilization here?

      Thank you for pointing this out. In accordance with Presnell et al. (2022), we use post-hatching as the reference. We have revised the text in the Materials and Methods section to read: “5-day-old (5 days post-hatching)”

      (2) “We classified these cells into cell types [based on …]” - specify a bit about how you classified them based on morphology, the presence of organelles, etc.

      We added a clarification. “Our classification was based on i) ultrastructural features (e.g. number of cilia), ii) cell morphology (e.g. nerve net or bridge cells), iii) unique organelles (e.g. lamellate body, plumose cells), and iv) similarities to cell types previously described by EM. Our classification agrees with the cell types identified in the 1-day-old larva [@ferraioli2025].”

      (3) “CATMAID only supports [bifurcating] skeleton trees” - Correct?

      yes, a node in CATMAID cannot be fused to another node of the same skeleton to represent anastomoses

      FIGURE 1:

      (1) It is not worth redrawing and renumbering everything, but I wish the lateral view in A matched the rotated aboral view in B, instead of having to do two rotations to get the alignment to coincide. (Rotating panel B 90° clockwise would make them match, but then it wouldn’t coincide with all the subsequent figures.)

      Thank you for the suggestion. We have replaced panel A with a lateral view that now matches panel B.

      (2) The labels on Figure 1 are a mix of two typefaces (Helvetica and Myriad?). They should be standardized to all use one typeface (preferably Helvetica).

      we have changed the font to Helvetica

      (3) Panel C legend: arrows are not really arrows. Say “Eye icons” or something like that. Can you show the location of the anal pores in the DIC image?

      Changed to ‘eye icons’. The anal pores are usually closed and only open briefly; therefore, it is not clear where exactly they would be, so indicating their position would be misleading.

      (4) Panel F, I cannot see the lines mentioned in the legend at all, except for maybe a tiny wisp in a couple of places. Either omit or make visible.

      changed to “The spheres indicate the position of nuclei in the reconstructed cells.”

      (5) Panel G. “Cells are color coded according to quadrants”… but unfortunately, the color scale is 90° off of what is presented in the rest of the panels and the paper. Q1 and Q3 have been blue, but now Q2+4 are blue/purple, while Q1+3 are orange/yellow. Again, it seems like too much work to recolor panel G, but in future, it would be nice to maintain that consistency, especially since other panels specifically mention the consistent colors.

      We have changed the color code in panels B, C and E to match G and the subsequent panels/figures.

      RESULTS: Aboral synaptic nerve net

      (1) “We reconstructed three aboral nerve-net (ANN) neurons” - out of how many total? Were these three just the first ones traced, or are they likely to be all of the multi-domain neurons? One can’t tell if these are the top 3 (out of X), or if there are other multi-quad neurons that were not traced. Are there any Q1Q4 or Q2Q3 neurons? Specify overall composition.

      There are only three ANN neurons in the aboral organ. These are all completely reconstructed and contained within the volume. We have clarified this in the text. “We identified and reconstructed three aboral nerve-net (ANN) neurons, each exhibiting a syncytial morphology characterized by anastomosing membranes and multiple nuclei (ranging from two to five) (Figure 2A and B, Figure 2–figure supplement 1C). These three neurons are the only fully reconstructed ANN neurons contained within the volume. Several small ANN-like fragments were also observed at the periphery of the aboral organ, but their connectivity to the main ANN remains uncertain.”

      FIGURE 2:

      (1) Panel C: “N > 2 cells for each cell type” - is that supposed to say “N > 2 mitochondria”? More than 2 cells in all the types shown in the graph.

      It is the number of cells for each cell type.

      (2) Panel D: Is this the wrong caption? I can only see green and black circles, not red, yellow, or blue. Make them larger or “flat” (circled, not shaded spheres) if they are supposed to be visible

      Thank you for pointing this out. The caption was incorrect and has been corrected to match the figure.

      (3) Panel E: Amazing to see the cross-network connections!

      Thank you

      (4) Again, it is great to see the three ANN mapped out, but … are there other connections that weren’t mapped in this study? Other high-level coordinating neurons? ANN_Q1Q4 or Q2Q3?

      The reconstruction is complete and there are no other neurons or connections. Given the large size of ctenophore synapses, we are confident that we identified all or most synapses and their connections.

      RESULTS: Synaptic connectome

      (1) “displaying rotational symmetry” - This is one of the things I am most curious about. Where is the evidence of rotational symmetry in the network diagram? Is it the larger number of connections to Q2 and Q4? Any evidence of rotational symmetry, like Q1 and Q3 connect to Q2 and Q4 respectively, but not the other way around?

      Changed to “displaying biradial symmetry”. We do not consider the slight difference in synapse number from ANN Q1-4 to the Q1-Q3 vs. Q2-Q4 balancers as significant or strong enough evidence for a single rotational symmetry (i.e. 180 degrees rotation).

      (2) “Surprisingly” - this *was* really surprising. There have to be some afferent neurons connecting from the balancers, don’t there? I can’t remember the connections to the SNN, but is there a tertiary set of ANNs that connect between the balancers and the top 3 ANNs? I would like a little more discussion about this.

      Indeed, this is why this is so surprising. Most people would have expected some output connections from the balancer to the nerve net or elsewhere. There are none. We have the complete balancer network, and all balancer cells are ‘sink nodes’ (inputs only) (Figure 3–figure supplement 1).

      We added a short statement at the beginning of the “Bridge Cells as Feedback Regulators of Ciliary Rhythms” section noting that no direct connections from the balancers to the ANN were found and that all balancer cells act as sink nodes (inputs only; Figure 3–figure supplement 1). This highlights that bridge cells are indeed the sole neuronal input to the ANN circuit.

      Figure 3:

      (1) As you know, during development, the diagonally opposite cells have a shared heritage and shared functionality. Are there neuronal signatures that correspond to the rotational symmetry that we see, for example, in the position of the anal pores?

      We did not find any evidence in neuronal complement for a diagonal symmetry, suggesting that neuronal organization does not simply mirror the organism’s rotational body symmetry.

      (2) Do you have the information to say whether there are any diagonal or asymmetric connections? Can’t tell if those would have shown up in the mapping efforts or if you focused on the major ones only.

      Based on our complete mapping, we did not find evidence for a diagonal pattern. The connectivity instead shows a biradial organization.

      (3) “extending across opposite quadrant regions” - to me, opposite would be diagonally opposite, but this looks like a set of cells between Q1 and Q2 is connecting to a sister-set in Q3+Q4. I wonder if, in a more detailed view, you could see whether this is a rotational correspondence, rather than a reflection. There are some subtle hints of this in the aboral view, with some cells on the right of the blue cluster and the left of the magenta cluster.

      We changed this to “extending across tentacular-axis-symmetric quadrant regions” for clarity.

      (4) As with Figure 2, I do not see any circles/spheres that are yellow, red, or blue! There are some traces of what appear to be other neurons that have these colors, but nothing that would suggest the localization of mitochondria.

      Thank you for pointing this out. We have corrected the caption to match the figure, as in the previous item.

      (5) The connectivity map is very cool, but the caption does not seem to correspond to the version included in the manuscript. I don’t see any hexagons; all arrows seem to have the same thickness.

      We changed the caption to: “Complete connectivity map of the gravity-sensing neural circuit. Cells belonging to the same group are shown as diamonds, and the number of cells is added to their labels. The number of synapses is shown on the arrows.”

      RESULTS: Dynamics of balancer cilia

      (1) The orientation of the stage+larvae is a bit hard to follow. Maybe say the sagittal or tentacular plane is parallel to the sample stage and the gravity vector?

      We added “Larvae were oriented with their sagittal or tentacular plane parallel to the sample stage.”

      (2) “We could simultaneously image Q1(3) and Q2(4).” The meaning of the numbers in parentheses is not clear. Either way I try to interpret them, they do not match the diagrams. Should this say that, viewing the tentacular plane, you can image Q1 and 4 or Q2 and 3?

      Thank you for spotting this mistake, we have changed to: “In larvae with their sagittal plane facing the objective, we could compare balancer-cilia movements between Q1 vs. Q2 or Q3 vs. Q4. In other larvae oriented in the tentacular plane, we could simultaneously image Q1 and Q4 or Q2 and Q3.”

      (3) Typo: episod[e]s were excluded

      Corrected.

      DISCUSSION:

      This section is quite clean. Maybe mention some future directions:

      We have added a “Future Directions” section

      (1) Do these networks change during development? Five-days-old is still quite undeveloped - what would it look like in an adult specimen? Would you expect a larger version of the same or more diverse connections?

      As far as we know from work on aboral organs in adult ctenophores, the same structures and cells can be found. We do not know how the network will develop. We know that at 5 days the balancer is fully functional and the animals can orient and their behaviour is coordinated. So the wiring may not change extensively later in development. In the 1-day-old larva, Ferraioli et al. did not distinguish ANN neurons as a separate population, as these were merged with SNNs in their dataset. This suggests that significant cellular and circuit maturation likely occurs between 1 and 5 days.

      METHODS: Imaging the Activity of Balancer Cilia

      (1) “we selected only larvae whose aboral-oral axis was oriented nearly perpendicular to the gravitational vector”. Shouldn’t this be “nearly parallel to the gravity vector” not perpendicular?

      Thank you for spotting this, corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      We appreciate the reviewer’s clear summary of our work.

      Thanks to the authors for the revised version of the manuscript. A few concerns remain after the revision:

      (1) We appreciate the additional computational analysis the authors have performed on normalizing the titers with the geometric mean titer for each individual, as shown in the new Supplemental Figure 6. We agree with the authors’ statement that, after averaging again within specific age groups, "there are no obvious age group-specific patterns." A discussion of this should be added to the revised manuscript, for example in the section "Pooled sera fail to capture the heterogeneity of individual sera," referring to the new Supplemental Figure 6.

      However, we also suggested that after this normalization, patterns might emerge that are not necessarily defined by birth cohort. This possibility remains unexplored and could provide an interesting addition to support potential effects of substitutions at sites 145 and 275/276 in individuals with specific titer profiles, which as stated above do not necessarily follow birth cohort patterns.

      The reviewer is correct that there remains heterogeneity among the serum titers to different strains that we cannot easily explain via age group, and suggests that additional patterns could emerge. We certainly agree that explaining this heterogeneity remains an interesting goal, but as described in the manuscript we have analyzed the possible causes of the heterogeneity as exhaustively as possible given the available metadata. At this point, the most we can say is that the strain-specific neutralization titers are highly heterogeneous in a way that cannot be completely explained by birth cohort. We agree that further analysis of the cause is an area for future work, and have made all of our data available so that others can continue to explore additional hypotheses. It may be that these questions can only be answered by experiments on sera from newer cohorts where more detailed metadata on infection and vaccination history are available.
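      A minimal sketch of the per-individual normalization discussed in point (1): each serum's titers are divided by that serum's geometric mean titer (GMT) across strains, and the normalized titers are then summarized within groups (such as age groups) by a geometric mean. The dictionary layout, function names, and example titers are hypothetical, and this is not the code behind Supplemental Figure 6.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers (titers below the detection limit must first be set to a positive value)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_by_gmt(titers):
    """Divide each serum's titers by that serum's GMT across all strains.

    titers maps serum ID -> {strain: titer}; the result has the same layout,
    with values expressed as fold-differences from the serum's own GMT.
    """
    normalized = {}
    for serum, by_strain in titers.items():
        gmt = geometric_mean(list(by_strain.values()))
        normalized[serum] = {strain: t / gmt for strain, t in by_strain.items()}
    return normalized


def group_geometric_means(normalized, groups):
    """Per-strain geometric mean of normalized titers within each group (e.g., age group).

    groups maps serum ID -> group label; every serum is assumed to be titrated
    against the same set of strains.
    """
    members = {}
    for serum, group in groups.items():
        members.setdefault(group, []).append(serum)
    summary = {}
    for group, sera in members.items():
        strains = normalized[sera[0]].keys()
        summary[group] = {s: geometric_mean([normalized[x][s] for x in sera]) for s in strains}
    return summary


# Two hypothetical sera: same fourfold difference between strains, twofold difference in overall magnitude.
titers = {"serum_1": {"A": 40.0, "B": 160.0}, "serum_2": {"A": 80.0, "B": 320.0}}
normalized = normalize_by_gmt(titers)
print(normalized)
print(group_geometric_means(normalized, {"serum_1": "child", "serum_2": "child"}))
```

      After normalization both hypothetical sera share the same profile (strain A at about half, and strain B at about twice, the serum's own GMT), illustrating that the step removes differences in overall titer magnitude and leaves only the relative pattern across strains.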

      (2) Thank you for elaborating further on the method used to estimate growth rates in your reply to the reviewers. To clarify: the reason that we infer from Fig. 5a that A/Massachusetts has a higher fitness than A/Sydney is not because it reaches a higher maximum frequency, but because it seems to have a higher slope. The discrepancy between this plot and the MLR inferred fitness could be clarified by plotting the frequency trajectories on a log-scale.

      For the MLR, we understand that the initial frequency matters in assessing a variant's growth. However, when starting points of two clades differ in time (i.e., in different contexts of competing clades), this affects comparability, particularly between A/Massachusetts and A/Ontario, as well as for other strains. We still think that mentioning these time-dependent effects, which are not captured by the MLR analysis, would be appropriate. To support this, it could be helpful to include the MLR fits as an appendix figure, showing the different starting and/or time points used.

      Multinomial logistic regression is a widely used technique to estimate viral growth rates from sequencing counts (PLoS Computational Biology, 20:e1012443; Nature, 597:703-708; Science, 376:1327-1332). As the reviewer points out, it does assume that the relative viral growth rates are constant over the time period analyzed. However, most of the patterns mentioned by the reviewer are not deviations from this assumption, but rather just due to the fact that frequencies are plotted on a linear scale. More specifically, our multinomial logistic regression implementation defines two parameters per variant: the initial frequency and the growth rate. The absolute variant growth rate is effectively the slope of the logit-transformed variant frequencies. Each variant's relative fitness depends on that variant's growth rate relative to a predefined baseline variant. Plotting frequencies on a logit scale does help emphasize the importance of the slope by showing exponential growth as a linear trajectory. We have added a new Supplemental Figure 9 that plots the frequencies from Figure 5A on a logit scale. As can be seen the frequency trajectories are closer to linear on the logit scale.
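      To make the two-parameters-per-variant model described above concrete, the standard multinomial logistic regression frequency model can be written as follows. The symbols are illustrative and not taken from the manuscript: a_v sets strain v's initial frequency relative to the baseline strain b at t = 0, and r_v is its growth rate relative to b, with a_b = r_b = 0. The log-ratio of a strain's frequency to the baseline's is exactly linear in time with slope r_v; the logit of f_v coincides with this log-ratio only when there are just two strains, and with more strains the two quantities differ.

```latex
% Multinomial logistic regression frequency model; baseline strain b has a_b = r_b = 0
f_v(t) = \frac{\exp(a_v + r_v t)}{\sum_{u} \exp(a_u + r_u t)}
\quad\Longrightarrow\quad
\log\frac{f_v(t)}{f_b(t)} = a_v + r_v t,
\qquad
r_v = \frac{d}{dt}\,\log\frac{f_v(t)}{f_b(t)} .

% Two-strain case (v and b only): 1 - f_v(t) = f_b(t), hence
\operatorname{logit} f_v(t) = \log\frac{f_v(t)}{1 - f_v(t)} = a_v + r_v t .
```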

      We have updated the results text to clarify the nature of the fixed relative growth rates per strain and to refer to this new supplemental figure as follows:

      To estimate the evolutionary success of different human H3N2 influenza strains during 2023, we used multinomial logistic regression, which uses sequence counts to estimate fixed strain growth rates relative to a baseline strain for the entire analysis time period (in this case, 2023) [50–52]. Relative growth rates estimated by multinomial logistic regression represent relative fitnesses of strains over that time period. There were sufficient sequencing counts to reliably estimate growth rates in 2023 for 12 of the HAs for which we measured titers using our sequencing-based neutralization assay libraries (Figure 5a,b and Supplemental Figure 9). We estimated strain growth rates relative to the baseline strain of A/Massachusetts/18/2022. Note that these growth rates estimate how rapidly each strain grows relative to the baseline strain, rather than the absolute highest frequency reached by each strain. Each strain’s absolute growth rate corresponds to the slope of the strain’s logit-transformed frequencies at the end of the analysis time period (Supplemental Figure 9).
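      A minimal sketch of how fixed relative growth rates can be estimated from sequence counts by maximum likelihood, assuming counts are tallied per time point and strain, time is rescaled to [0, 1], and strain 0 is the baseline (the role A/Massachusetts/18/2022 plays in the text). The function names, the plain gradient-ascent optimizer, and the toy data are illustrative assumptions; this is not the implementation used in the manuscript, whose methods are described in the cited references [50–52].

```python
import math


def mlr_frequencies(intercepts, rates, t):
    """Strain frequencies at time t under a multinomial logistic model.

    Index 0 is the baseline strain; its intercept and rate are fixed at 0,
    so every other strain's rate is a growth rate relative to the baseline.
    """
    logits = [a + r * t for a, r in zip(intercepts, rates)]
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def fit_mlr(counts, times, n_iter=20000, lr=0.5):
    """Maximum-likelihood intercepts and relative growth rates via gradient ascent.

    counts[i][v] is the number of sequences of strain v observed at times[i].
    The multinomial log-likelihood is concave in these parameters, so plain
    gradient ascent with a small step converges to the maximum.
    """
    n_strains = len(counts[0])
    total = sum(sum(row) for row in counts)
    a = [0.0] * n_strains
    r = [0.0] * n_strains
    for _ in range(n_iter):
        grad_a = [0.0] * n_strains
        grad_r = [0.0] * n_strains
        for row, t in zip(counts, times):
            n_t = sum(row)
            p = mlr_frequencies(a, r, t)
            for v in range(1, n_strains):  # the baseline (v = 0) stays fixed at 0
                resid = row[v] - n_t * p[v]  # observed minus expected count
                grad_a[v] += resid
                grad_r[v] += resid * t
        for v in range(1, n_strains):
            # Dividing by the total count keeps the step size independent of sample size.
            a[v] += lr * grad_a[v] / total
            r[v] += lr * grad_r[v] / total
    return a, r


# Toy usage: counts generated exactly from known parameters, so the fit should recover them.
times = [i / 10 for i in range(11)]
true_a = [0.0, 0.5, -1.0]
true_r = [0.0, 1.0, 3.0]
counts = [[1000 * p for p in mlr_frequencies(true_a, true_r, t)] for t in times]
a_hat, r_hat = fit_mlr(counts, times)
print([round(x, 3) for x in r_hat])
```

      Plain gradient ascent is used here only to keep the sketch dependency-free; an actual analysis would more likely rely on an established optimizer or inference library and would also report uncertainty in the estimated growth rates.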

      As the reviewer notes, the multinomial logistic regression implementation assumes a fixed growth rate for each strain over the time period being analyzed. This limitation causes the inferred growth rates to emphasize the latest trends in the analysis time period. For example, at the end of December 2023 in Figure 5A, the A/Ontario/RV00796/2023 strain is growing rapidly and replacing all other variants. Correspondingly, the multinomial logistic regression infers a high growth rate for that Ontario strain relative to the A/Massachusetts/18/2022 baseline strain. However, the A/Massachusetts/18/2022 strain was growing relative to other strains in the first half of 2023 because it has a higher growth rate than they do. The modest deviations from linearity on the logit scale in the new Supplemental Figure 9 likely arise because the assumption of a fixed set of relative growth rates over the analyzed time period is an approximation.

      We have added the following text to the discussion to highlight this limitation of the multinomial logistic regression:

      Our comparisons of the neutralization titers to the growth rates of different H3N2 strains were limited by the fact that only a modest number of strains had adequate sequence data to estimate their growth rates. Strains with more sequencing counts tend to be those with moderate-to-high fitness, which limited the dynamic range of growth rates across the strains we were able to analyze. Relatedly, the multinomial logistic regression infers a single fixed growth rate per strain for the entire analysis time period of 2023, and cannot represent changes in relative fitness of strains over that relatively short time period. Additionally, because the strains for which we estimated growth rates are phylogenetically related, it is difficult to assess the statistical significance of the correlation [53], so it will be important for future work to reassess the correlations with new neutralization data against the dominant strains in future years.

      (3) Regarding my previous suggestion to test an older vaccine strain than A/Texas/50/2012 to assess whether the observed peak in titer measurements is virus-specific: We understand that the authors want to focus the scope of this paper on the relative fitness of contemporary strains, and that this additional experimental effort would go beyond the main objectives outlined in this manuscript. However, the authors explicitly note that "Adults across age groups also have their highest titers to the oldest vaccine strain tested, consistent with the fact that these adults were first imprinted by exposure to an older strain." This statement gives the impression that imprinting effects increase titers for older strains, whereas this does not seem to be true from their results, but only true for A/Texas. It should be modified accordingly.

      We agree with the reviewer’s suggestion that the specific language describing the potential trend of adults having the highest titers to the oldest strain tested could be further caveated. To this end, we have made the following edits to the portion of the main text that they highlighted:

      Adults across age groups also have their highest titers to the oldest vaccine strain tested (Figure 6), consistent with the fact that these adults were likely first imprinted by exposure to an older strain more antigenically similar to A/Texas/50/2012 (the oldest strain tested here) than more recent strains. Note that a similar trend towards adult sera having higher titers to older vaccine strains was also observed in a more recent study we have performed using the same methodology described here [60].

      Notably, this trend of adults across age groups having the highest titers to the oldest vaccine strains tested has held true in subsequent work we’ve performed with H1N1 viruses (Kikawa et al., 2025 Virus Evolution, DOI: https://doi.org/10.1093/ve/veaf086). In that more recent study, we again saw that adults (cohorts EPIHK, NIID, and UWMC) tended to have their highest titers to the oldest cell-passaged strain tested (A/California/07/2009), whereas children (cohort SCH) had more similar neutralization titers across strains. These additional data therefore support the idea that adults tend to have their highest titers to older vaccine strains, a finding that is also consistent with substantial prior work (e.g., Science, 346:996-1000).

      Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, that will be relevant across pathogens (assuming the assay can be appropriately adapted). I only had a few comments, focused on maximising the information provided by the sera. These concerns were all addressed in the revised paper.

      We thank this reviewer for the summary of our work and their helpful comments in the first revision.

      Reviewer #3 (Public review):

      The authors use high throughput neutralisation data to explore how different summary statistics for population immune responses relate to strain success, as measured by growth rate during the 2023 season. The question of how serological measurements relate to epidemic growth is an important one, and I thought the authors present a thoughtful analysis tackling this question, with some clear figures. In particular, they found that stratifying the population based on the magnitude of their antibody titres correlates more with strain growth than using measurements derived from pooled serum data. The updated manuscript has a stronger motivation, and there is substantial potential to build on this work in future research.

      Comments on revisions:

      I have no additional recommendations. There are several areas where the work could be further developed, which were not addressed in detail in the responses, but given this is a strong manuscript as it stands, it is fine that these aspects are for consideration only at this point.

      We appreciate this reviewer’s summary of our work, and we are glad they feel the motivation is stronger in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study provides insights into the role of Pten mutations in SHH-medulloblastoma, by using mouse models to resolve the effects of heterozygous vs homozygous mutations on proliferation and cell death throughout tumorigenesis. The experiments presented are convincing, with rigorous quantifications and orthogonal experimentation provided throughout, and the models employing sporadic oncogene induction, rather than EGL-wide genetic modifications, represent an advancement in experimental design. However, the study remains incomplete, such that the biological conclusions do not extend greatly from those in the extant literature; this could be addressed with additional experimentation focused on cell cycle kinetic changes at early stages, as well as greater characterization of macrophage phenotypes (e.g., microglia vs circulating monocytes). The work will be of interest to medical biologists studying general cancer mechanisms, as the function of Pten may be similar across tumor types.

      We appreciate the summary of the importance of our work and agree that it provides a foundation for future experiments addressing underlying mechanisms, including the role of macrophages in tumor progression/regression.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates how Pten loss influences the development of medulloblastoma using mouse models of Shh-driven MB. Previous studies have shown that Pten heterozygosity can accelerate tumorigenesis in models where the entire GNP compartment has MB-promoting mutations, raising questions about how Pten levels and context interact, especially when cancer-causing mutations are more sporadic. Here, the authors create an allelic series combining sporadic, cell-autonomous induction of SmoM2 with Pten loss in granule neuron progenitors. In their models, Pten heterozygosity does not significantly impact tumor development, whereas complete Pten loss accelerates tumour onset. Notably, Pten-deficient tumours accumulate differentiated cells, reduced cell death, and decreased macrophage infiltration. At early stages, before tumour establishment, they observe EGL hyperplasia and more pre-tumour cells in S phase, leading them to suggest that Pten loss initially drives proliferation but later shifts towards differentiation and accumulation of death-resistant, postmitotic cells. Overall, this is a well-executed and technically elegant study that confirms and extends earlier findings with more refined models. The phenotyping is strong, but the mechanistic insight is limited, especially with respect to dosage effects and macrophage biology.

      Strengths:

      The work is carefully executed, and the models, which use sporadic oncogene induction rather than EGL-wide genetic manipulations, represent an advance in experimental design. The deeper phenotyping, including single-cell RNA-seq and target validation, adds rigor.

      Weaknesses:

      The biological conclusions largely confirm findings from previous studies (Castellino et al, 2010; Metcalf et al, 2013), showing that germline or conditional Pten heterozygosity accelerates tumorigenesis, generates tumors with a very similar phenotype, including abundant postmitotic cells, and reduced cell death.

      We would respectfully like to point out that we have added new insights not covered in the previous, more abbreviated studies. First, we are the first to show that in a sporadic model, heterozygous loss of Pten does not lead to accelerated or more aggressive disease. This is an important finding, since this is the case for many patients, and only humans with germline PTEN mutations are likely to have more aggressive tumors. Also, the previous studies did not examine tumor progression by analyzing neonatal stages, nor did they analyze spinal cord metastasis. We found a different phenotype at some early stages than at end stage; thus, these analyses provide new insights. Our study is also the only one to apply a mosaic analysis to study cell behaviors at early stages of progression, including proliferation and differentiation/survival. We are also the first to demonstrate a reduction in macrophages in Pten mutant SHH-MB.

      The second stated goal - to understand why Pten dosage might matter - remains underdeveloped. The difference between earlier models using EGL-wide SmoA1 or Ptch loss versus sporadic cell-autonomous SmoM2 induction and Pten loss in this study could reflect model-specific effects or non-cell-autonomous contributions from Pten-deficient neighbouring cells in the EGL, for example. However, the study does not explore these possibilities. For instance, examining germline Pten loss in the sporadic SmoM2 context could have provided insight into whether dosage effects are cell-autonomous or dependent on the context.

      We thank the reviewer for suggesting this experiment and agree it would be an informative one for other groups to perform as a follow up to our work to allow a direct comparison in the same sporadic SHH-MB model of mosaic vs germline loss of Pten. Also, we would like to point out that we do show a dosage effect of lowering vs removing Pten when only sporadic GCPs also have an activating mutation in SMO. Please see above comments for additional new mechanistic insight we have provided.

      The observations on macrophages are intriguing but preliminary. The reduction in Iba1+ cells could reflect changes in microglia, barrier-associated macrophages, or infiltrating peripheral macrophages, but these populations are not distinguished. Moreover, the functional relevance of these immune changes for tumor initiation or progression remains unexplored.

      We agree, further studies of the influence of Pten mutations on macrophage phenotypes will be interesting.

      Reviewer #2 (Public review):

      The authors sought to answer several questions about the role of the tumor suppressor PTEN in SHH-medulloblastoma formation. Namely, whether Pten loss increases metastasis, understanding why Pten loss accelerates tumor growth, and the effect of single-copy vs double-copy loss on tumorigenesis. Using an elegant mouse model, the authors found that Pten mutations do not increase metastasis in a SmoD2-driven SHH-medulloblastoma mouse model, based on extensive characterization of the presence of spinal cord metastases. Upon examining the cellular phenotype of Pten-null tumors in the cerebellum, the authors made the interesting and puzzling observation that Pten loss increased the differentiation state of the tumor, with fewer cycling cells, seemingly in contrast to the higher penetrance and decreased latency of tumor growth.

      The authors then examined the rate of cell death in the tumor. Interestingly, Pten-null tumors had fewer dying cells, as assessed by TUNEL. In addition, the tumors expressed differentiation markers NeuN and SyP, which are rare in SHH-MB mouse models. This reduction in dying cells is also evident at earlier stages of tumor growth. By looking shortly after Pten-loss induction, the authors found that Pten loss had an immediate impact on increasing the proliferative state of GCPs, followed by enhancing the survival of differentiated cells. These two pro-tumor features together account for the increased penetrance and decreased latency of the model. While heterozygous loss of Pten also promoted proliferation, it did not protect against cell death.

      Interestingly, loss of Pten alone in GCPs caused an increase in cerebellar size throughout development. The authors suggest that Pten normally constrains GCP proliferation, although they did not check whether reduced cell death is also contributing to cerebellum size.

      Lastly, the authors examined macrophage infiltration and found that there was less macrophage infiltration in the Pten-null tumors. Using scRNA-seq, they suggest that the observed reduction in macrophages might be due to an immunosuppressive tumor microenvironment.

      This mouse model will be of high relevance to the medulloblastoma community, as current models do not reflect the heterogeneity of the disease. In addition, the elegant experimentation into Pten function may be relevant to cancer biologists outside of the medulloblastoma field.

      Strengths:

      The in-depth characterisation of the mouse model is a major strength of the study, including multiple time points and quantifications. The single-cell sequencing adds a nice molecular feature, and this dataset may be relevant to other researchers with specific questions of Pten function.

      Weaknesses:

      One weakness of the study was the examination of the macrophage phenotype, which did not include quantification (only single images), so it is difficult to assess whether this reduction of macrophages holds true across multiple samples. Future studies will also be needed to assess whether Pten-mutated patient medulloblastomas also have a differentiation phenotype, but this is difficult to assess given the low number of samples worldwide.

      We thank the reviewer for highlighting the importance of our sporadic mutant approach and new findings. As stated above, we agree that further studies of the influence of Pten mutations on macrophage phenotypes will be interesting, as will studies of human samples once large numbers can be obtained. All conclusions about macrophages are based on analyzing 3 independent tumors per genotype, as stated in the figure legends. For all end-stage tumors, sections were collected from one lateral edge of the tumor to the midline, and for earlier stages from one side of the brain to the other; thus, we believe the reported phenotypes are consistent within tumors and stages.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points 

      (1) The authors should state explicitly that early EGL analyses sample the same cerebellar region across animals (e.g., matched lobule or distance from the midline) because position-dependent effects are possible. 

      We agree this is an important aspect of the rigor of the study and are sorry this was not clear enough. We had stated in the legends to Figures 4 and 5 that midline sections were analyzed and, when the entire EGL was not quantified, that the region analyzed was shown, but we now include more details in all relevant figure legends and in the Methods section.

      (2) It is not clear from Figure 3i-k that TUNEL density in Syp-high regions differs between Pten+/- and Pten-/- tumors. 

      We have added a new graph as Figure 3 Supplemental Figure 1D with this direct comparison. Indeed, there is no difference between the Syp-high regions of Pten+/- and Pten-/- tumors as these regions of Pten+/- tumors have no detectable PTEN protein and thus have the same behavior as Pten-/- tumors (reduced cell death).

      (3) The authors interpret the increase in the %EdU+ GFP+ cells in the EGL as evidence of a faster cell cycle. However, EdU labeling alone does not demonstrate altered cell cycle kinetics; this would require a dedicated assay. It would also be informative to combine EdU with Ki67 staining. This could clarify whether the effect reflects changes in differentiation (for example, if a higher proportion of GFP+ pre-tumor cells remain Ki67+) or whether the increase in EdU simply reflects a greater fraction of cells being in cycle. Such an analysis might even reveal no change in cycling if the proliferation index in controls is lower.

      We are sorry we did not make our analysis sufficiently clear in Figure 5 and Figure 6. The quantification of EdU+ cells was restricted to the outer EGL (the region defined as containing GFP+ and EdU+ cells), where all cells should be Ki67+. We cannot perform co-staining of Ki67 and GFP, since antigen retrieval for Ki67 removes the epitope for our GFP antibody. We have revised the wording in the figure legends and Results sections.

      (4) Some of the stains are unconvincing - for example, Figure 2 E,F, the p27 staining is difficult to distinguish from the background, Figure 7G,E- CD31+ blood vessels are difficult to see. 

      As requested, in Fig. 2 we adjusted the level of the green color for P27 in Photoshop to reduce the background in panels A, B, E, and F. In Fig. 7G, H we adjusted the level of the green color for CD31 to reduce the background.

      (5) Line 158: "unlike a SmoA2 model with germline or broad deletion of Pten in the cerebellum, where heterozygous deletion is sufficient..." That paper refers to the Neuro-D2SmoA1 mouse model. So this statement should be clarified.  

      We have made this edit.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the final discussion paragraph about Kmt2d does not add much to the study, as it seems obvious that the mechanisms of tumor formation would differ between two different tumor suppressor genes, but this is only my opinion. 

      We respectfully think it is interesting, even if expected, so have left it in the Discussion.

      (2) There is also a typo on line 342 that changes the meaning of the sentence: mTORC1 signaling is significantly 'unregulated'.

      We thank the reviewer for noticing this mistake. We have changed ‘unregulated’ to ‘upregulated’.

      (3) Figure 9Q,R mislabeled: not mTORC1, but instead UPR  

      Asns is included in the mTOR pathway in the Hallmark MTORC1 signaling gene set as well as in the Unfolded Protein Response gene list. We have made a note of this in the figure legend.

    1. 7.6.3. Trolling and Nihilism

      While trolling can be done for many reasons, some trolling communities take on a sort of nihilistic philosophy: it doesn’t matter if something is true or not, it doesn’t matter if people get hurt, the only thing that might matter is if you can provoke a reaction. We can see this nihilism show up in one of the versions of the self-contradictory “Rules of the Internet”:

      8. There are no real rules about posting
      …
      20. Nothing is to be taken seriously
      …
      42. Nothing is Sacred

      YouTuber Innuendo Studios talks about the way arguments are made in a community like 4chan:

      You can’t know whether they mean what they say, or are only arguing as though they mean what they say. And entire debates may just be a single person stirring the pot [e.g., sockpuppets]. Such a community will naturally attract people who enjoy argument for its own sake, and will naturally trend toward the most extreme version of any opinion. In short, this is the free marketplace of ideas. No code of ethics, no social mores, no accountability. … It’s not that they’re lying, it’s that they just don’t care. […] When they make these kinds of arguments they legitimately do not care whether the words coming out of their mouths are true. If they cared, before they said something is true, they would look it up.

      The Alt-Right Playbook: The Card Says Moops by Innuendo Studios

      While there is a nihilistic worldview where nothing matters, we can see how this plays out practically, which is that they tend to protect their group (normally white and male), and tend to be extremely hostile to any other group. They will express extreme misogyny (like we saw in the Rules of the Internet: “Rule 30. There are no girls on the internet. Rule 31. TITS or GTFO - the choice is yours”), and extreme racism (like an invented Nazi My Little Pony character). Is this just hypocritical, or is it ethically wrong? It depends, of course, on what tools we use to evaluate this kind of trolling.

      If the trolls claim to be nihilists about ethics, or indeed if they are egoists, then they would argue that this doesn’t matter and that there’s no normative basis for objecting to the disruption and harm caused by their trolling. But on just about any other ethical approach, there are one or more reasons available for objecting to the disruptions and harm caused by these trolls! If the only way to get a moral pass on this type of trolling is to choose an ethical framework that tells you harming others doesn’t matter, then it looks like this nihilist viewpoint isn’t deployed in good faith[1]. Rather, with any serious (i.e., non-avoidant) moral framework, this type of trolling is ethically wrong for one or more reasons (though how we explain it is wrong depends on the specific framework).

      This section helped me think about trolling in a much more nuanced way, especially the idea that disruption itself isn’t automatically good or bad. I found the discussion about group formation and norm enforcement really useful, because it explains why trolling can feel threatening—it challenges the patterns and signals that groups rely on to define who belongs. The comparison between trolling, protest, and revolution also stood out to me, since it shows how moral judgment often depends on whether we see the existing social order as legitimate. Overall, this section made it clear that evaluating trolling ethically requires looking beyond intent or humor and examining what is being disrupted and who is harmed or protected by that disruption.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank all three Reviewers for appreciating our work and for sharing constructive feedback to further enhance the quality of our study. It is gratifying to read that the Reviewers find this work interesting, novel, and of interest to a broad audience. Therefore, we believe that it will be suitable for a high-profile journal. Further, the experiments suggested by the Reviewers have added value to the work and have substantiated our findings. It is important to highlight that we have performed all the suggested experiments. Please find below the detailed point-by-point response to the Reviewers’ comments.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript entitled, "IP3R2 mediated inter-organelle Ca2+ signaling orchestrates melanophagy" is a rather diffuse study of the relationship between IP3R2 and melanin production. While this is an interesting and understudied area, the study lacks a clear focus. The model seems to be that IP3R2 is essential for mitochondrial calcium loading and that its absence increases lysosomal calcium loading. There are also a number of incomplete and/or unconvincing links to autophagy/melanophagy, TMEM165, TRPML1 and even gene transcription. In this kind of diffuse study, each step needs to be convincing to get to the next one, which is not the case here. There are also references to altered proteasome function, despite the total absence of any direct data on the proteasome. Finally, I felt it was sometimes unclear whether the authors were referring to melanosomes or lysosomes at various points throughout the study.

      While I suspect that, somewhere in here, there are some novel relationships worthy of further investigation, this is a case where the many parts make the overall product less convincing. What effects here are directly relevant to IP3R2? This study should stop there, leaving investigations of peripheral factors for future investigations, as the further you get from where you start, the less clear what you are studying becomes. And the less direct.

      Response: We thank the Reviewer for finding our study interesting and recognizing that this is an understudied area. Further, we appreciate the constructive feedback given by the Reviewer. We have addressed all the Reviewer’s comments. Please find below point-wise responses to the comments.

      Specific Comments:

      Comment 1. The separation of Figures 1F and 1J makes it impossible to assess the effect of αMSH on IP3R2 expression. This presentation makes interpretation difficult; a simple 4-lane Western would be more informative.

      Response: We apologize to the Reviewer for the lack of clarity. We separated these data sets because they come from two independent experimental conditions: Figure 1F illustrates data from the LD-based pigmentation model, whereas Supplementary Figure 1K (previously Fig 1J) depicts data from the α-MSH–induced pigmentation model.

      Comment 2. One of the most attractive points made by this study is that there is a specific link between IP3R2 and melanin production. In my opinion, the null hypothesis is that this is just about the amount of IP3Rs expressed per cell. To reject this concept, the authors should show data demonstrating the relative expression of all 3 IP3Rs. Without this information, the null hypothesis that IP3R2 is the most expressed IP3R isoform, and that this is why its knockdown has the most dramatic effect, cannot be rejected. It would also be helpful to show where the different IP3Rs are expressed within the cell.

      Response: We thank the Reviewer for raising this interesting point and for the constructive comment. We would like to clarify that the relative expression of all three IP₃R isoforms has already been analyzed in our study. Specifically, in Figure 1B, we show the expression pattern of IP₃R isoforms in our experimental system, where IP₃R2 shows the highest expression level, followed by IP₃R3 and IP₃R1 (IP₃R2 > IP₃R3 > IP₃R1). Further, in the revised manuscript, we additionally analyzed publicly available datasets for IP₃R expression. The Human Protein Atlas reports higher expression of IP₃R2 in melanocytes compared to the other IP₃R isoforms (Supplementary Fig 1A). Therefore, we agree with the Reviewer that the relatively higher expression of IP₃R2 may be one of the important factors that regulate pigmentation levels. Indeed, our analysis of a microarray dataset from African vs Caucasian skin revealed greater IP₃R2 expression in African skin compared to Caucasian skin (Figure 1L).

      With respect to subcellular localization, all three IP₃R isoforms are predominantly localized to the endoplasmic reticulum, consistent with their established role as ER-resident Ca²⁺ release channels. However, their expression levels are known to be highly cell and tissue specific (Bartok et al., Nature Communications 2019), supporting the idea that higher IP₃R2 levels play a functionally specialized role in melanogenesis.

      Comment 3. It would be helpful to label Figs 3F-I with the conditions used. The description in the text is of increased LC3II levels; however, the ratio of LC3I to LC3II might be more meaningful. Irrespective of this, although the graph shows an increase in LC3II, the Western really doesn't show much. As a standalone finding, I don't find this figure to be very convincing; there are better options to demonstrate this proposed relationship between IP3R2 and autophagy than what is shown.

      Response: We sincerely thank the Reviewer for this thoughtful and critical evaluation, which has helped us improve the clarity and precision of this analysis. To address this concern, in the revised manuscript we have labeled Supplementary Fig 2A-B (previously Fig 4F-I) with the corresponding experimental condition (‘LD’) for clarity. In addition, we reanalyzed the data by calculating the LC3II/LC3I ratio in all figures of the revised manuscript that include LC3II expression, which provides a more meaningful and robust assessment of autophagic flux. This revised analysis yields a clearer representation of LC3 dynamics and strengthens the interpretation of the western blotting data in support of the relationship between IP₃R2 and autophagy. Further, we have shown by confocal imaging that IP3R2 silencing significantly reduced the GFP/RFP ratio of the pMRX-IP-GFP-LC3-RFP reporter system compared to the control condition (Fig 4M-N), demonstrating the relationship between IP3R2 and autophagy. Collectively, these autophagy flux assays and biochemical experiments clearly demonstrate a direct relationship between IP3R2 and autophagy.

      Comment 4. The following statement at the beginning of page 22 "We observed an impaired proteasomal degradation of critical melanogenic proteins localized on melanosomes in the IP3R2 knockdown condition" is insufficiently supported by data to be made. Even if I was convinced that autophagy was enhanced, there is no data of any kind about the proteasome in this manuscript.

      Response: We appreciate the Reviewer’s careful scrutiny of this statement and the opportunity to clarify and strengthen our interpretation. To directly address the concern regarding proteasomal involvement, in the revised manuscript, we performed additional experiments using MG132, a well-established inhibitor of proteasomal degradation. These experiments were designed to assess whether the altered stability of melanogenic proteins observed upon IP₃R2 knockdown could be attributed to changes in proteasome-mediated turnover.

      In the revised manuscript, our new data show that treatment with MG132 leads to a marked reduction in the levels of melanosome-associated melanogenic proteins, including GP100 and DCT, compared to the DMSO control (Fig. 4A–D). This response contrasts with that of non-melanosomal proteins, such as IP₃R2 and Calnexin, which are localized to the endoplasmic reticulum and exhibit increased accumulation upon MG132 treatment (Fig. 4E–H), consistent with canonical proteasomal inhibition. These differential outcomes suggest that melanosome-resident proteins respond distinctly to proteasomal blockade, likely due to their compartmentalized localization on melanosomes.

      Previous studies have shown that impairment of proteasomal function can activate autophagy as a compensatory, cytoprotective mechanism (Williams et al, 2013; Li et al, 2019; Su & Wang, 2020; Pan et al, 2020). Indeed, we observed a significant increase in LC3II/LC3I levels in the IP3R2 knockdown plus MG132 condition compared to IP3R2 knockdown plus the DMSO control (Fig. 4I–J).

      To investigate whether impairment of proteasomal degradation upon IP3R2 silencing, alone or together with MG132, selectively triggers melanophagy, we assessed melanophagy using the melanophagy reporter mCherry-Tyrosinase-eGFP following IP3R2 silencing along with MG132 treatment. We observed an increase in melanophagy flux with IP3R2 silencing and MG132 treatment compared to siNT with the DMSO control (Fig 5K-L). This suggests that the inhibition of proteasomal degradation induced by IP3R2 silencing activates melanophagy. Taken together, these findings indicate that compromised proteasomal degradation engages the autophagy machinery, providing a mechanistic link between proteasome dysfunction, enhanced autophagy, and altered melanogenic protein turnover.

      Comment 5. In Figure 5, the authors create a new ratiometric dye to detect melanosome stability based on the principle that tyrosinase is exclusively found in melanosomes. Unfortunately, there is no validation that this new construct is found exclusively in melanosomes upon expression. In addition, there is discussion about the pH of lysosomes, but not of melanosomes. Ultimately, these data cannot be considered at face value without any type of validation; I also note that the pictures lack sufficient detail to support identification of these structures as melanosomes. While I maintain the above concerns, I note that the data in Supplementary Figure 3 are MUCH more convincing than what is in the figure. Both the writing and the figure design should be rethought.

      Response: We appreciate the Reviewer’s thorough evaluation and constructive critique of Figure 5, which has helped us to better clarify and validate this aspect of the study. To directly address the concern regarding the subcellular specificity of the ratiometric probes, in the revised manuscript we performed detailed colocalization analysis using established melanosome markers. Specifically, we assessed the localization of the melanophagy detection probes mCherry–Tyr–eGFP and tyrosinase–mKeimaN1 with the melanosome-resident protein GP100, detected by anti-HMB45 (Supplementary Fig 2E-F and 2K-L). These analyses revealed a very high degree of colocalization, reflected by strong Pearson’s correlation and overlap coefficients, thereby validating that the expressed probes are predominantly localized to melanosomes.

      Regarding lysosomal and melanosomal pH, our ratiometric melanophagy detection probes, mCherry–Tyrosinase–eGFP (sensitive to acidic pH via eGFP) and tyrosinase–mKeimaN1 (sensitive to acidic pH via Keima), are specifically designed to detect melanosome degradation, which occurs upon melanosome fusion with lysosomes. Consequently, the observed signal shifts indicate melanosome turnover rather than merely reflecting lysosomal pH.

      To further corroborate the microscopic observations, we performed biochemical assays to study melanophagy flux upon IP3R2 silencing. We employed Bafilomycin A1, an inhibitor of autophagosome-lysosome fusion, to examine melanosomal protein accumulation. Upon Bafilomycin A1 treatment, IP3R2-silenced cells showed enhanced accumulation of melanosomes, as indicated by elevated tyrosinase levels compared with siNT controls (Supplementary Fig 3C-D), indicating elevated melanophagy flux upon IP3R2 knockdown. In the revised manuscript, we employed additional melanophagy detection strategies to further strengthen our findings. Specifically, we used Retagliptin phosphate (RTG), a well-established selective inducer of melanophagy, and observed a marked increase in melanophagy using the mCherry–Tyrosinase–eGFP melanophagy probe (Supplementary Fig 2G-H). Additionally, we performed independent validation by assessing colocalization of melanosomes (recognized by the anti-HMB45 antibody, which identifies the melanosomal structural protein GP100) with LC3 (Supplementary Fig 3A-B). This analysis revealed a significant increase in melanosome colocalization with LC3 upon IP₃R2 silencing compared to control conditions.

      Collectively, these independent approaches clearly demonstrate that the melanophagy probes localize to melanosomes and detect melanophagy (by responding to melanosome fusion to lysosomes).

      Comment 6. Given the increase in ER Ca2+ content after IP3R2 knockdown, ER calcium content should be emptied before attempting to estimate lysosomal Ca2+ content with GPN or Bafilomycin. Otherwise, the source of calcium is less than clear.

      Response: We appreciate the Reviewer’s careful consideration of the Ca²⁺ source, which is critical for accurate interpretation of these experiments. Therefore, as suggested, in the revised manuscript we pre-treated cells with Thapsigargin (Tg) to deplete ER Ca²⁺ stores before examining lysosomal Ca²⁺ release using GPN or Bafilomycin (Supplementary Fig 6I-N). Even under these conditions, we observed increased lysosomal Ca²⁺ release in IP₃R2 knockdown cells, confirming that the observed Ca²⁺ signals originate from lysosomes rather than from residual ER Ca²⁺. Importantly, this approach allowed us to minimize ER-derived contributions to the measured lysosomal Ca²⁺ release.


      Reviewer #1 (Significance (Required)):

      The manuscript entitled, "IP3R2 mediated inter-organelle Ca2+ signaling orchestrates melanophagy" is a rather diffuse study of the relationship between IP3R2 and melanin production. While this is an interesting and understudied area, the study lacks a clear focus. The model seems to be that IP3R2 is essential for mitochondrial calcium loading and that its absence increases lysosomal calcium loading. There are also a number of incomplete and/or unconvincing links to autophagy/melanophagy, TMEM165, TRPML1 and even gene transcription. In this kind of diffuse study, each step needs to be convincing to get to the next one, which is not the case here. There are also references to altered proteasome function, despite the total absence of any direct data on the proteasome. Finally, I felt it was sometimes unclear whether the authors were referring to melanosomes or lysosomes at various points throughout the study.

      Response: We thank the Reviewer for finding our work interesting and for recognizing that this is an understudied field. We also thank the Reviewer for the constructive feedback on our study. We have performed several additional experiments and substantially revised the manuscript to address all of the Reviewer’s comments.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In the present manuscript, Saurav et al. identify IP3R2-mediated ER calcium release as a key suppressor of melanophagy, thereby sustaining pigmentation in melanocytes. Using in vitro (B16 murine melanoma cells, primary human melanocytes) and in vivo (zebrafish) models, the authors report that IP3R2 expression is positively correlated with pigmentation. They then investigate the impact of IP3R2 knockdown and find that IP3R2 silencing enhances the stability of melanogenic proteins, while also inducing autophagic degradation of melanosomes (i.e., melanophagy). Concomitantly, they find that IP3R2 silencing decreases mitochondrial calcium uptake, increases lysosomal calcium loading, and lowers lysosomal pH. They propose a pathway wherein, in IP3R2 knockdown cells, impaired mitochondrial calcium uptake induces the activation of AMPK-ULK1, and increased lysosomal calcium activates TRPML1 via TMEM165 and closer proximity interactions between ER and lysosomes, TFEB nuclear translocation, and upregulation of melanophagy-related genes, namely OPTN and RCHY1. The work is placed within the context of emerging roles of organelle calcium signaling in pigmentation biology, where extracellular calcium influx pathways are known regulators, but the contribution of ER-mitochondria-lysosome crosstalk to melanosome turnover remains largely unknown.

      Response: We thank the Reviewer for appreciating our work and for highlighting that the contribution of ER-mitochondria-lysosome crosstalk to melanosome turnover remains largely unappreciated.

      Major comments:

      Comment 1- The central finding is that IP3R2 knockdown induces melanophagy and reduces pigmentation. However, the manuscript does not identify any physiological or pathological context in which IP3R2 expression or activity is naturally downregulated in melanocytes. Without such context, the knockdown may represent an artificial perturbation that broadly alters ER calcium handling and triggers melanophagy as part of a general stress-induced autophagy response. This raises uncertainty about whether the pathway operates in vivo under normal or disease conditions. It would strengthen the study to identify upstream cues that reduce IP3R2 function and to test whether these also trigger melanophagy through the proposed mechanism.


      Response: We thank the Reviewer for asking such an important question. The Reviewer asked us to identify any physiological or pathological context in which IP3R2 expression is naturally downregulated in melanocytes. To address this question, in the revised manuscript, we analyzed publicly available microarray datasets comparing skin samples from Caucasian and African populations (Yin et al., Experimental Dermatology 2014). This unbiased analysis revealed considerably lower IP₃R2 expression in Caucasian skin than in African skin (Fig. 1L). These data support a physiological correlation between IP₃R2 expression and pigmentation level, reinforcing the physiological relevance of the proposed pathway.


      Comment 2- While the data link IP3R2 knockdown to decreased pigmentation and increased melanophagy, the causality between altered organelle calcium dynamics and the melanophagy induction is inferred from correlation and partial rescue experiments. More direct interventions in the proposed downstream pathways (e.g., acute mitochondrial calcium uptake restoration, lysosomal calcium buffering) would strengthen mechanistic claims.

      Response: We appreciate the Reviewer’s recommendation on strengthening the mechanistic causality between organelle Ca²⁺ dynamics and melanophagy. As suggested, in the revised manuscript, we restored acute mitochondrial Ca²⁺ uptake by MCU overexpression in the IP₃R2 knockdown background, which markedly reduced melanophagy and increased mitochondrial Ca²⁺ uptake compared to the control (Fig 6I-L). These data demonstrate that, downstream of IP₃R2 silencing, restoring mitochondrial Ca²⁺ uptake rescues the melanophagy phenotype, revealing a causal link between mitochondrial Ca²⁺ dynamics and melanophagy.

      Similarly, to assess the causality between lysosomal Ca²⁺ dynamics and melanophagy, we silenced TMEM165 in the IP₃R2 knockdown background. Upon TMEM165 knockdown, we observed a reduction in melanophagy, concomitant with a decrease in lysosomal Ca²⁺ levels under IP₃R2 silencing conditions (Supplementary Fig 7I-L). Together, these direct manipulations support a causal role for altered organelle Ca²⁺ dynamics in driving melanophagy.


      We believe that these experiments address the Reviewer’s concern. However, if there are any other specific experiments that the Reviewer would like us to perform, we would be happy to carry them out as well.

      Comment 3- Zebrafish assays convincingly show altered pigmentation with altered IP3R2 levels, but do not connect this to in vivo melanophagy measurements or TRPML1/TFEB activity, which would link the cell biology to organismal phenotype more directly.

      Response: We thank the Reviewer for appreciating our in vivo zebrafish experiments. Further, we acknowledge the Reviewer’s point about linking the cellular mechanisms to organismal phenotypes in vivo. Therefore, as suggested, we activated TRPML1 in the zebrafish model system. In the revised manuscript, we investigated the role of the TRPML1–TFEB axis in pigmentation in vivo by pharmacological activation of TRPML channels with MLSA1. MLSA1 treatment resulted in a marked reduction in zebrafish pigmentation compared to vehicle-treated controls (Fig. 8M). This phenotypic change was further substantiated by quantitative melanin content assays, which confirmed a significant decrease in melanin levels following MLSA1 treatment (Fig. 8M–N). These in vivo findings support the involvement of TRPML1-mediated lysosomal signaling in pigmentation regulation.

      Comment 4- The work suggests therapeutic potential for pigmentary disorders, but no disease models are tested. It is unclear whether the observed mechanisms operate under physiological stressors.

      Response: We appreciate the Reviewer’s comment regarding physiological relevance and disease context. As addressed in Comment 1, we examined publicly available human skin microarray datasets for IP₃R2 expression in Caucasian and African populations. This analysis revealed a positive correlation between IP₃R2 expression and human skin pigmentation, supporting that modulation of IP₃R2 occurs under physiological conditions rather than representing an artificial perturbation.

      While formal pigmentary disease models were not examined in this study, the observed correlation between IP₃R2 expression and physiological pigmentation differences along with our robust in vivo zebrafish data suggests that IP₃R2 plays an important role in physiological pigmentation. As highlighted by Reviewer 1 and Reviewer 3, the manuscript is already too long. Therefore, we plan to delineate the precise role of IP₃R2 in pigmentary disorders as an independent study.

      Comment 5- The paradox between the observed enhanced stability of melanogenic proteins and increased melanophagy is insufficiently addressed. DCT, Tyrosinase and GP100 are all melanosome-associated and their stability or degradation is in prior literature often interpreted as reflecting melanosome biogenesis and turnover. This discrepancy needs to be resolved, as it complicates interpretation of melanophagy assays.

      Response: We appreciate the Reviewer’s careful consideration of this apparent paradox. This point was also raised by Reviewer 1. We have addressed the query in detail in response to Comment 4 of Reviewer 1. Briefly, the enhanced stability of melanosome-associated proteins reflects impaired proteasomal degradation and prolonged protein half-life, while the concurrent increase in melanophagy represents a compensatory turnover mechanism for degrading such dysfunctional melanosomes.

      Thus, increased melanophagy and apparent stabilization of melanogenic proteins are not contradictory but instead represent parallel outcomes of disrupted proteostasis. This interpretation is supported by our proteasomal inhibition experiments (Fig 4A-H) and autophagy analyses (Fig 4I-P), which collectively reconcile the observed protein stability with enhanced melanosome turnover.


      Comment 6- The authors propose that mitophagy and ER-phagy are reduced in IP3R2 knockdown cells, suggesting specific induction of melanophagy, but the rationale for why increased autophagic flux only targets melanosomes is insufficiently addressed. Also, these conclusions are solely based on Keima assays, and positive controls for mitophagy and ER-phagy are lacking.

      Response: We appreciate the Reviewer’s critical assessment of the specificity of autophagic targeting in the IP₃R2 knockdown condition and the need for appropriate validation controls. In the revised manuscript, we have repeated both the mitophagy and ER-phagy assays with well-established positive controls. Carbonyl cyanide-p-trifluoromethoxyphenylhydrazone (FCCP) was employed as a positive control to robustly induce mitophagy (Supplementary Fig 4E-F), while 4-phenylbutyric acid (4PBA) was used as a positive control for ER-phagy/reticulophagy (Supplementary Fig 4G-H). In addition, we have validated the microscopy data with biochemical assays by examining the levels of ER-resident proteins (Fig 4E-H) and the mitochondria-resident protein MCU.

      To provide a mechanistic rationale for the specific induction of melanophagy, we examined recently identified regulators of melanophagy, RCHY1 and OPTN (Lee et al., PNAS 2024). Bioinformatic analysis identified multiple TFEB binding sites on the promoters of both genes, which was supported by increased RCHY1 and OPTN expression following IP₃R2 knockdown. Further, in the revised manuscript, we performed additional loss-of-function experiments to demonstrate that co-silencing IP3R2 along with RCHY1 or OPTN significantly reduced melanophagy flux compared to IP₃R2 knockdown alone (Fig. 9H–K). Taken together, these data explain why enhanced autophagic flux downstream of IP₃R2 silencing is preferentially directed toward melanosomes.

      Comment 7- The melanophagy probes are novel and validated with rapamycin/bafilomycin, but quantitative calibration of GFP/mCherry or Keima signal to actual lysosomal delivery rates is missing; photobleaching, pH heterogeneity (incl., observed decrease in lysosomal pH), and melanin autofluorescence (see below) could confound ratios. Also, side-by-side comparison with other melanophagy detection approaches (e.g., colocalization of melanosomes with LC3) is lacking.

      Response: We appreciate the Reviewer’s careful evaluation of the melanophagy probes and the potential technical confounders. In the revised manuscript, we have performed a variety of experiments to further characterize and validate the probes. First, the ratiometric melanophagy detection probes (mCherry–Tyrosinase–eGFP and tyrosinase–mKeimaN1) are built on well-established and extensively validated backbones. Further, we used appropriate controls (empty vectors, non-targeting siRNAs, and vehicle controls) in all experiments to analyze the relative fluorescence changes in the test condition versus the control. Any confounding factors should therefore be present in both the test and control conditions. For this reason, we initially did not perform a side-by-side comparison with other melanophagy detection approaches.

      In the revised manuscript, as suggested by the Reviewer, we employed additional melanophagy detection strategies to further strengthen our findings. Specifically, we used Retagliptin phosphate (RTG), a well-established selective inducer of melanophagy, and observed a marked increase in melanophagy using the mCherry–Tyrosinase–eGFP melanophagy probe (Supplementary Fig 2G-H). Additionally, we performed independent validation by assessing colocalization of melanosomes (recognized by the anti-HMB45 antibody, which identifies the melanosomal structural protein GP100) with LC3 (Supplementary Fig 3A-B). This analysis revealed a significant increase in melanosome colocalization with LC3 upon IP₃R2 silencing compared to control conditions. Further, to minimize the contribution of melanin autofluorescence, non-transfected cells were imaged under identical settings, and the background signals obtained from these cells were subtracted from all acquired images during fluorescence quantitation. Potential effects of photobleaching and pH heterogeneity were minimized by uniform acquisition parameters and ratiometric analysis. Taken together, we believe these complementary approaches address the Reviewer’s concerns and reinforce the robustness of our melanophagy measurements.

      Comment 8- Melanosomes exhibit broad autofluorescence, particularly upon excitation at 405-488 nm and extending into the red channel. This signal can overlap with the detection ranges for GFP, mCherry, and mKeima reporters, potentially confounding quantitative readouts unless appropriate controls (e.g., untransfected cells, spectral unmixing) are used. Throughout this manuscript, it is not addressed how melanosome autofluorescence was controlled for or excluded in the reported fluorescence measurements.

      Response: We apologize to the Reviewer for not stating clearly how melanosome autofluorescence was controlled. Autofluorescence was systematically evaluated in non-transfected control cells imaged under the same excitation and emission settings used for the GFP, mCherry, and mKeima reporters. These controls defined the baseline autofluorescence profile arising from melanosomes across the relevant spectral ranges, and this background signal was subtracted from the acquired images during quantitation. These details are included in the Methods section.

      Comment 9- While OPTN and RCHY1 expression is elevated upon IP3R2 knockdown, functional engagement (e.g., OPTN localization to melanosomes, melanosome ubiquitination by RCHY1), or necessity (e.g., siRNA knockdown of these in the IP3R2-deficient background), are not tested.

      Response: We appreciate the Reviewer’s point on establishing the necessity of OPTN and RCHY1 in IP₃R2 knockdown–induced melanophagy. In the revised manuscript, we performed targeted loss-of-function analyses for both OPTN and RCHY1 in the IP₃R2-deficient background. We assessed melanophagy using the mCherry–Tyrosinase–eGFP melanophagy probe following co-silencing of IP₃R2 with either OPTN or RCHY1. Quantitative analysis revealed a significant reduction in melanophagy flux upon co-silencing of either gene compared to IP₃R2 silencing alone (Fig. 9H–K). These findings establish the functional requirement for OPTN and RCHY1 downstream of IP₃R2 loss in driving melanophagy. Since the functional engagement of OPTN and RCHY1 on melanosomes is already well established (Lee et al. PNAS 2024 and Park et al. Autophagy 2024), we have not repeated these experiments. Taken together, our data demonstrate that OPTN and RCHY1 are not only overexpressed but also act as critical mediators of melanophagy downstream of IP₃R2 silencing.

      Comment 10- While siRNA/shRNA efficacy is shown, functional rescue with pore-dead mutants sometimes fails to return to control values. The possibility of partial off-target or compensatory effects is not fully excluded.

      Response: We thank the Reviewer for raising this point. In this study, we employed pore-dead mutants of IP₃R2 (IP₃R2-M) and TRPML1 (TRPML1-M), both of which are well characterized, widely validated, and extensively used by a number of leading groups in the field. Our review of the literature identified multiple studies in which only partial rescue was reported with these pore-dead mutants. Therefore, we believe it is not surprising that we also observe partial rescue in some of our assays.

      It is important to note that we observe rescue of function and phenotype in every experiment carried out with the mutants. We agree with the Reviewer that the extent of rescue does not reach control levels in a few experiments. This can be attributed to differences in the level of mutant expression across experiments. Moreover, the combined use of genetic silencing and pharmacological inhibition/activation, alongside the mutant rescue experiments, supports the specificity of the observed phenotypes.

      Comment 11- The mitochondrial and lysosomal calcium measurements are largely endpoint peak quantifications; kinetic analyses and buffering capacity measurements would provide more mechanistic depth, especially for the TMEM165 contribution. Also, TMEM165 necessity for melanophagy induction upon IP3R2 knockdown has not been directly addressed.

      Response: We appreciate the Reviewer’s request for greater mechanistic depth regarding organelle Ca²⁺ dynamics and the specific contribution of TMEM165. We had previously demonstrated that TMEM165 silencing decreases lysosomal Ca²⁺ levels using Oregon BAPTA–dextran–based measurements (Supplementary Fig 7C-D), establishing its role in regulating lysosomal Ca²⁺ buffering. Building on this, in the revised manuscript we performed kinetic analyses of lysosomal Ca²⁺ levels following IP₃R2 and TMEM165 silencing. These kinetic analyses validated our endpoint measurements: IP₃R2 knockdown increases lysosomal Ca²⁺ levels, whereas TMEM165 silencing decreases lysosomal Ca²⁺ content relative to control. These results highlight the distinct and opposing effects of IP₃R2 and TMEM165 on lysosomal Ca²⁺ kinetics.

      Further, we directly evaluated the necessity of TMEM165 for melanophagy induction in the IP₃R2-deficient background. TMEM165 knockdown alone resulted in a significant reduction in melanophagy (Supplementary Fig 7G-H). Moreover, co-silencing of TMEM165 with IP₃R2 attenuated melanophagy compared to IP₃R2 knockdown alone (Supplementary Fig 7K-L). Collectively, these kinetic Ca²⁺ assays and genetic loss-of-function analyses provide mechanistic depth to the organelle Ca²⁺ measurements and establish TMEM165 as a critical regulator of melanophagy downstream of IP₃R2 silencing.

      Comment 12- The proximity ligation assay between VAP-A and LAMP1 is interpreted as showing increased ER-lysosome contacts in IP3R2 knockdown cells. However, additional controls are needed and quantitative TEM should be included to substantiate changes in organelle contact frequency and distance.

      Response: We thank the Reviewer for his/her emphasis on strengthening the validation of the proximity ligation assay (PLA) findings and on providing ultrastructural evidence of altered organelle interactions. The PLA data revealed a significant increase in VAP-A–LAMP1 interaction signals in IP₃R2-silenced cells compared to control conditions (Fig. 7L–M). In the revised manuscript, this increase was not observed upon treatment with bafilomycin A1, a V-ATPase inhibitor that blocks lysosomal acidification, or when one of the primary antibodies was omitted, confirming the specificity of the PLA signal (Fig. 7L–M). These controls support the interpretation that IP₃R2 downregulation enhances ER–lysosome interactions.

      To further substantiate the changes in organelle contact frequency and distance, we performed ultrastructural analyses using transmission electron microscopy (TEM). The quantitative TEM measurements revealed no significant change in the frequency of ER–mitochondria or ER–lysosome contacts upon IP₃R2 silencing (Fig. 7N–P). Similarly, ER–mitochondria distances remained unchanged. However, we observed a significant reduction in the distance between the ER and lysosomes in IP₃R2 knockdown cells compared to control (Fig. 7N, 7Q–R). Together, these complementary approaches demonstrate that IP₃R2 silencing specifically increases ER–lysosome proximity without altering overall contact frequency, thereby strengthening the conclusion that IP₃R2 regulates ER–lysosome coupling.

      Comment 13- Some assays report small biological n (e.g., three independent experiments with relatively small per-condition cell counts).

      Response: We appreciate the Reviewer’s comment regarding sample size. All experiments were performed with a minimum of three independent biological replicates, which is consistent with standard practice in the field. For imaging-based assays, multiple fields of view and cells were analyzed per condition in each independent experiment, and quantitative analyses were performed on pooled data across replicates. As suggested by the Reviewer, we have increased the cell numbers in some experiments. The detailed information on biological replicates and cell numbers analyzed is provided in the respective figure legends.

      Minor comments:

      Comment 1- The title "IP3R2-mediated inter-organelle Ca2+ signaling orchestrates melanophagy" could be misread as indicating IP3R2 'promotes' melanophagy; consider rewording to make clear that IP3R2 suppresses melanophagy to maintain pigmentation. Similarly, the running title "IP3R2 negatively regulates melanophagy" would be clearer as "IP3R2 suppresses melanophagy".

      Response: As suggested by the Reviewer, we have modified the title and running title in the revised manuscript.

      Comment 2- Unify the framing of "positively regulates pigmentation" vs. "negatively regulates melanophagy" in the Introduction/Discussion.

      Response: As recommended, we have unified the framing in the suggested sections.

      Comment 3- Adding schematic flow diagrams summarizing each pathway at the end of relevant results (figure) sections could help accessibility.

      Response: We appreciate the Reviewer’s suggestion to improve accessibility of the presented pathways. Accordingly, we have included schematic diagrams at the end of the relevant figures. These schematics summarize: (i) ER–mitochondria interactions in the context of melanophagy (Fig. 6P); (ii) differences in Ca²⁺ and pH regulation between wild-type and IP₃R2-silenced cells (Fig. 7S); and (iii) TRPML1-mediated Ca²⁺ release driving melanophagy via TFEB translocation (Fig. 9L). Together, these diagrams provide a concise visual overview of the key mechanistic pathways described in the study.

      Comment 4- While the introduction summarizes extracellular calcium signaling in pigmentation, there is less coverage of recent work on selective autophagy of other lysosome-related organelles (e.g., platelet dense granules, lytic granules), which could provide broader mechanistic context.

      Response: As suggested by the Reviewer, we have discussed selective autophagy of other lysosome-related organelles in the introduction.

      Reviewer #2 (Significance (Required)):

      This study addresses an important gap in pigmentation biology by identifying IP3R2-mediated ER calcium release as a suppressor of melanophagy and a positive regulator of pigmentation. The strongest aspects are the integration of in vitro and in vivo models, the multi-faceted mechanistic exploration linking altered organelle calcium dynamics to selective melanosome turnover, and the development of novel ratiometric fluorescent probes for live-cell melanophagy measurement. Conceptually, the work extends prior literature that has focused on extracellular calcium influx and melanosome biogenesis, revealing a new inter-organelle calcium signaling module that controls melanosome degradation via AMPK-ULK1 and TMEM165-TRPML1-TFEB pathways.

      • However, several limitations reduce the strength of the mechanistic claims. Some key pathway steps are inferred from correlation and partial rescue rather than direct necessity/sufficiency tests (e.g., mitochondrial calcium uptake restoration, lysosomal calcium buffering). The paradoxical observation that IP3R2 knockdown both increases melanophagy and stabilizes melanosome-resident proteins (DCT, Tyrosinase, GP100) is not resolved, complicating interpretation of the melanophagy assays. The specificity for melanophagy over other selective autophagy pathways is asserted but not fully explained mechanistically, and positive controls for mitophagy/ER-phagy are missing. Potential technical confounds, such as melanin autofluorescence in the detection ranges of GFP, mCherry, and mKeima, are not explicitly addressed, and alternative assays for these key data were insufficiently employed. In vivo results do not yet connect altered pigmentation to melanophagy readouts or downstream TRPML1/TFEB activation. Importantly, the study does not identify any physiological or pathological scenario in which IP3R2 expression or activity is naturally reduced in melanocytes. In the absence of such upstream cues, IP3R2 knockdown may represent an artificial perturbation that triggers melanophagy as part of a broader stress-induced autophagy response, raising questions about the in vivo relevance of the proposed pathway.

      • The work's primary audience is specialized (cell biologists, autophagy researchers, and pigmentation/skin biology specialists), but the mechanistic framework on organelle crosstalk and selective autophagy will interest a broader basic research readership, including those studying lysosome-related organelles in other systems. The ratiometric probes could be adapted for future melanophagy research, and the pathway insights may guide translational studies in pigmentary disorders or melanoma. My expertise is in mitochondrial and lysosomal calcium signaling, autophagy, and microscopy-based functional assays; I do not have detailed expertise in zebrafish developmental genetics, though the phenotypic analysis appears sound.

      Response: We thank the Reviewer for appreciating our work and stating that our study “addresses an important gap in pigmentation biology”. We also thank him/her for recognizing that this work will be of interest to a broad basic research readership, and for valuing the importance and potential significance of the ratiometric melanophagy probes generated in this study. Finally, we acknowledge the Reviewer’s constructive feedback, which has helped us enhance the quality of our manuscript. We have performed a variety of additional in vitro experiments and in vivo zebrafish studies, and have substantially revised the manuscript to address all of the Reviewer’s comments.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is a robust and extensive study showing that IP3R2 selectively initiates a calcium signalling pathway leading to melanophagy, that is, the degradation of melanosomes. This reduces pigmentation and UV light protection. A strength of the paper is that it combines detailed cellular studies with in vivo studies in the zebrafish model. They show that knockdown of IP3R2 reverses this process, perhaps leading to a strategy to enhance melanosome number and hence to afford protection from UV irradiation. The authors use a battery of fluorescent probes (mainly genetically encoded reporters) to investigate the signalling cascade leading to melanophagy or its reduction. This involves reporters for a number of different organelles involved in this process. The experiments are generally well performed, with clear controls for the probes in many cases. My main issue is that the panels contain too much data, which may obscure the message; a good deal could be moved to the supplementary data. The manuscript investigates many mechanisms in distinct organelles, which is remarkable for a two-author paper. Particularly interesting was the design of novel fluorescent protein reporters for melanophagy itself. One area not explored is ion fluxes across melanosomes themselves, which are lysosome-related organelles and may exhibit properties and signalsomes similar to those of lysosomes.

      Specifically, the authors show that a REDUCTION of IP3R2-mediated calcium release leads to a calcium flux from the ER by a different mechanism (possibly via TMBIM6). This increases calcium loading of the lysosome via TMEM165, at the expense of calcium transfer to mitochondria, and an acidification.

      • This leads to TRPML1 activation, and the resulting lysosomal calcium release activates TFEB translocation to the nucleus, increasing the transcription of autophagy/melanophagy genes, together with activation of the AMPK-ULK1 pathway (rather than mTOR). This is a complex pathway, and evidence is presented for many of the steps involved.

      • This is a tour de force investigating organelle communication during the process of melanophagy, that is little understood. It highlights many important organelle ion transport events that are important findings in their own right. For example, the importance of TMEM165 in calcium filling of lysosomes.

      Response: We thank the Reviewer for appreciating our study and considering it a robust and extensive study in a highly understudied area. We appreciate the Reviewer’s acknowledgement that our manuscript combines detailed cellular studies with in vivo studies in the zebrafish model. Further, we thank the Reviewer for his/her constructive feedback on our work.

      Major points:

      Comment 1- The authors state that TPC activation does not activate TFEB translocation to the nucleus. This is now known not to be the case and should at least be examined. What is the role of endolysosomal channels on the melanosomes themselves in melanophagy?

      Response: We appreciate the Reviewer’s comment regarding the potential contribution of TPC channels to TFEB activation and melanophagy. In the revised manuscript, we assessed Ca²⁺ release from TPC2 under IP₃R2 knockdown conditions using the selective TPC2 agonist TPC2-A1-N (Supplementary Fig 9G-H). Additionally, we evaluated TFEB nuclear translocation following TPC2-mediated Ca²⁺ release using TPC2-A1-N (Supplementary Fig 9I-J). Our analyses revealed no significant differences in TPC2 activity or TFEB nuclear translocation upon IP₃R2 silencing compared to control conditions. These findings suggest that, in our system, TPC2-mediated Ca²⁺ signaling does not contribute significantly to TFEB activation or melanophagy downstream of IP₃R2 silencing, indicating a more prominent role for TRPML1-dependent Ca²⁺ signaling in this context.

      Comment 2- How does reduction in IP3R2 mediated calcium fluxes enhance lysosomal acidity?

      Response: We thank the Reviewer for this question regarding the mechanistic link between reduced IP₃R2-mediated Ca²⁺ flux and enhanced lysosomal acidity. In the revised manuscript, we show that IP₃R2 silencing results in significant upregulation of the lysosomal proton pump (vacuolar H⁺-ATPase) subunits ATP6V0D1 and ATP6V1H (Supplementary Fig 6E-F). Increased H⁺-ATPase expression is expected to promote proton influx into the lysosomal lumen, thereby enhancing lysosomal acidification. These findings provide a mechanistic basis for how IP₃R2 silencing can increase lysosomal acidity.

      Comment 3- What mediates the ER source for calcium filling of lysosomes?

      Response: We appreciate the Reviewer’s interest in the mechanism underlying ER-to-lysosome Ca²⁺ transfer. Recently, an independent study also reported that IP₃R2 silencing enhances lysosomal Ca²⁺ levels and lysosomal Ca²⁺ release (Zheng et al. Cell 2022). The literature suggests that lysosomal Ca²⁺ refilling depends on Ca²⁺ fluxes originating from the endoplasmic reticulum, particularly through ER Ca²⁺ leak pathways at ER–lysosome contact sites. In this context, ER-resident Ca²⁺ leak channels such as TMBIM6 (also known as Bax inhibitor-1) play an important role in maintaining basal cytosolic Ca²⁺ levels that can subsequently be taken up by lysosomes (Kim et al. Autophagy 2020). TMBIM6-mediated Ca²⁺ leak from the ER provides a continuous, low-level Ca²⁺ source that supports lysosomal Ca²⁺ loading (Kim et al. Autophagy 2020). This mechanism allows lysosomes to replenish their Ca²⁺ stores via Ca²⁺ uptake systems operating at ER–lysosome contact sites. Thus, ER Ca²⁺ leak channels represent a key conduit linking ER Ca²⁺ homeostasis to lysosomal Ca²⁺ filling and function.

      Recently, lysosome-localized TMEM165 was identified as playing an important role in Ca²⁺ filling of lysosomes (Zajac et al. Science Advances 2024). In our study, we observe that TMEM165 drives lysosomal Ca²⁺ influx in melanocytes.

      Comment 4- Oregon-green-dextran is not a great probe for lysosomal calcium. Its Kd is 170nM and even in the acidic environment this may be lowered to low micromolar which may not be great for measuring changes around luminal concentrations of around 500uM. Additionally, it is usual to correct for pH effects simultaneously since the dye is also a pH reporter and has been used as such. However, I take the point that they still see an increase in fluorescence whilst pH falls probably indicating an increase in luminal lysosomal calcium confirmed by increased perilysosomal calcium.

      Response: We thank the Reviewer for the careful and balanced assessment of the Oregon Green–dextran measurements. We appreciate the acknowledgment that, despite the known limitations of this probe and its pH sensitivity, the observed increase in fluorescence concurrent with reduced lysosomal pH is consistent with elevated luminal lysosomal Ca²⁺ levels. We are grateful for this positive interpretation, which strengthens our conclusions when considered alongside the large amount of supporting data.

      Comment 5- The major point is to reduce the number of main data panels, with consignment of some controls perhaps to the supplementary data. This would increase the comprehensibility of the paper.

      Response: We thank the Reviewer for this constructive and positive suggestion. We appreciate the emphasis on reducing the data in the main figures and, as suggested, have moved a considerable amount of data to the supplementary figures. However, owing to the additional experiments performed to address the concerns of other Reviewers, some main figures may still appear somewhat busy. We hope the Reviewer will understand our situation.

      Minor points

      Comment 1- Fig 10 needs a clear legend with symbols in the diagram explained. eg ER calcium release proteins.

      Response: We thank the Reviewer for this helpful and constructive comment. Accordingly, we have revised the Figure 10 legend to clearly explain all symbols used in the schematic illustration.

      Reviewer #3 (Significance (Required)):

      This is a tour de force investigating organelle communication during the process of melanophagy, that is little understood. It highlights many important organelle ion transport events that are important findings in their own right. For example, the importance of TMEM165 in calcium filling of lysosomes.

      Response: We sincerely thank the Reviewer for considering our work “a tour de force investigation” and for appreciating that our study presents several important organelle ion transport events.

    1. Author response:

      eLife Assessment 

      This study presents a valuable finding on maternal SETDB1 as a key chromatin repressor that shuts down the 2C gene program and enables normal mouse embryonic development. The evidence supporting the claims of the authors is solid, although the inclusion of a causality test, a mechanistic understanding of SETDB1 targeting, and phenotypic quantification would have greatly strengthened the study. The work will be of broad interest to biologists working on embryonic development, stem cells and gene regulation.

      Thank you for this positive evaluation of our work. Please find the point-by-point responses to the Reviewers’ comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      During the earliest stages of mouse development, the zygote and 2-cell (2C) embryo are totipotent, capable of generating all embryonic and extra-embryonic lineages, and they transiently express a distinctive set of "2C-stage" genes, many driven by MERVL long terminal repeat (LTR) promoters. Although activation of these transcripts is a normal feature of totipotency, they must be rapidly silenced as development proceeds to the 4-cell and 8-cell stages; failure to shut down the 2C program results in developmental arrest. This study examines the role of maternal SETDB1, a histone H3K9 methyltransferase, in suppressing the 2C transcriptional network. Using an oocyte-specific conditional knockout that removes maternal Setdb1 while leaving the paternal allele intact, the authors demonstrate that embryos lacking maternal SETDB1 arrest during cleavage, with very few progressing beyond the 8-cell stage and no morphologically normal blastocysts forming. Transcriptomic analyses reveal persistent expression of MERVL-LTR-driven transcripts and other totipotency markers, indicating a failure to terminate the totipotent state. Together, the data demonstrate that maternally deposited SETDB1 is required to silence the MERVL-driven 2C program and enable the transition from totipotency to pluripotency. More broadly, the work identifies maternal SETDB1 as a key chromatin repressor that deposits repressive H3K9 methylation to shut down the transient 2C gene network and to permit normal preimplantation development. 

      Strengths: 

      (1) Closes a key knowledge gap. 

      The study tackles a central open question - how embryos exit the totipotent 2-cell (2C) state - and provides direct in vivo evidence that epigenetic repression is required to terminate the 2C program for development to proceed. By identifying maternal SETDB1 as the responsible factor, the work substantially advances our understanding of the maternal-to-zygotic transition and early lineage specification. 

      (2) Clean genetics paired with rigorous genomics. 

      An oocyte-specific Setdb1 knockout cleanly isolates a maternal-effect requirement, ensuring that early phenotypes arise from loss of maternal protein. The resulting cleavage-stage arrest is unambiguous (most embryos stall before or around the 8-cell stage). State-of-the-art single-embryo RNA-seq across stages - well-matched to low-cell-number constraints - captures genome-wide mis-expression, including persistent 2C transcripts in mutants, strongly supporting the conclusions. 

      (3) Compelling molecular linkage to phenotype. 

      Transcriptome data show that without maternal SETDB1, embryos fail to repress a suite of 1-cell/2C-specific genes by the 8-cell stage. The tight correlation between continued activation of the MERVL-driven totipotency network and developmental arrest provides a specific molecular explanation for the observed failure to progress. 

      (4) Mechanistic insight grounded in chromatin biology. 

      SETDB1, an H3K9 methyltransferase classically linked to heterochromatin and transposon repression, targets MERVL LTRs and MERVL-driven chimeric transcripts in early embryos. Bioinformatic evidence indicates that these loci normally acquire H3K9me3 during the 2C→4C transition. The data articulate a coherent mechanism: maternal SETDB1 deposits repressive H3K9me3 at 2C gene loci to shut down the totipotency network, extending observations from ESC systems to bona fide embryos. 

      (5) Broad implications for development and stem-cell biology. 

      By pinpointing a maternal gatekeeper of the totipotent-to-pluripotent transition, the work suggests that some cases of cleavage-stage arrest (e.g., in IVF) may reflect faulty epigenetic silencing of transposon-driven genes. It also informs stem-cell efforts to control totipotent-like states in vitro (e.g., 2C-like cells), linking epigenetic reprogramming, transposable-element regulation, and developmental potency.

      We thank Reviewer 1 for recognizing the strengths in our work and for the suggestions below.

      Weaknesses: 

      (1) Causality not directly demonstrated. 

      The link among loss of SETDB1, persistence of 2C transcripts, and developmental arrest is compelling but remains correlative. No rescue experiments test whether dampening the 2C/MERVL program restores development. Targeted interventions (e.g., knocking down key 2C drivers such as Dux, or pharmacologically curbing MERVL-linked transcription in maternal Setdb1 mutants) would strengthen the claim that unchecked 2C activity is causal rather than a by-product of other SETDB1 functions.

      We agree that rescue experiments might strengthen causality. Those experiments, however, would be extremely challenging technically because the knockdowns would need to be precisely timed to follow (and not prevent) the wave of 2C-specific activation. Knocking down 2C drivers in the zygote, for example, may prevent switching on the totipotency program. In addition, while sustained MERVL expression, such as that induced by forced DUX expression, disrupts totipotency exit and embryo development (1, 2), transcriptional derepression is very broad in Setdb1^mat-/+ embryos, and knocking down individual 2C drivers may not be sufficient to rescue development or restore the exit from totipotency.

      (2) Limited mechanistic resolution of SETDB1 targeting. 

      The study establishes a requirement for maternal SETDB1 but does not define how it is recruited to MERVL loci. Given SETDB1's canonical cooperation with TRIM28/KAP1 and KRAB-ZNFs, upstream sequence-specific factors and/or pre-existing chromatin features likely guide targeting. Direct occupancy and mark-placement evidence (e.g., SETDB1/TRIM28 CUT&RUN or ChIP, and H3K9me3 profiling at MERVL LTRs during the 2C→4C window) would convert inferred mechanisms into demonstrated ones.

      We do show H3K9me3 patterns at MERVL LTRs across the early 2C, late 2C, 2C, 4C, 8C, and morula stages from a published dataset. Please see the genome browser images in Figures 4C, 4D, 4E, 6D, 6E and Figure S6. We agree that mapping SETDB1/TRIM28 to those locations would strengthen the mechanistic insight. However, ChIP-seq or CUT&RUN for those proteins in preimplantation embryos is not technically feasible. We do provide genetic evidence for collaboration between SETDB1 and DUXBL, a DNA-binding factor, by showing that DUXBL cannot switch off its top targets without SETDB1 (Figure 6). Future studies will characterize the molecular mechanisms underlying this (likely indirect) collaboration. We do not think that DUXBL and SETDB1 directly interact, because such an interaction was not detected by DUXBL IP-MS (3).

      (3) Narrow scope on MERVL; broader epigenomic consequences underexplored. 

      Maternal SETDB1 may restrain additional repeat classes or genes beyond the 2C network. A systematic repeatome analysis (LINEs/SINEs/ERV subfamilies) would clarify specificity versus a general loss of heterochromatin control. Moreover, potential effects on imprinting or DNA methylation balance are not examined; perturbations there could also contribute to arrest. Bisulfite-based DNA methylation maps at imprinted loci and allele-specific expression analyses would help rule in/out these mechanisms.

      We did examine genes and repeat elements beyond the 2C network, evaluating gene and TE expression changes using four-way comparisons. Results on gene expression are presented in Figure 1C-J, Figures S2, S3 and S4, and Tables S2, S3 and S4. Results on TE expression are presented in Figure S5, Tables S6, S7 and S8, and in the text. We agree that DNA methylation may be altered in Setdb1^mat-/+ embryos. In our hands, evaluating this possibility using bisulfite sequencing requires a larger number of embryos than we can feasibly obtain (the number of mutant embryos obtained is very small). Regarding imprinted gene expression, it cannot be fully assessed and interpreted in preimplantation embryos before the maternally deposited transcripts have been cleared. We reported earlier that clear somatic parental-specific patterns of imprinted gene expression may only begin later in development, around 8.5 dpc (4).

      (4) Phenotype quantitation and transcriptomic breadth could be clearer. 

      The developmental phenotype is described qualitatively ("very few beyond 8-cell") without precise stage-wise arrest rates or representative morphology. Tabulated counts (2C/4C/8C/blastocyst), images, and statistics would increase clarity. On the RNA-seq side, the narrative emphasizes known 2C markers; reporting novel/unannotated misregulated transcripts, as well as downregulated pathways (e.g., failure to activate normal 8-cell programs, metabolism, or early lineage markers), would present a fuller portrait of the mutant state.

      Tabulated counts are displayed in Figure 1A, and morphology is shown in Figure S1A. We report that 4% of Setdb1<sup>mat-/+</sup> embryos reached the 8-cell stage by 2.5 dpc. We recovered no Setdb1<sup>mat-/+</sup> blastocysts at 4.5 dpc (data not shown). On the RNA-seq side, we do report a more global assessment of gene and TE transcription (please see our response to point 3 above), including novel chimeric transcripts (Table S6). Developmental pathways are shown in Figure S3 and Figure S4, and metabolic pathways in Figure S2.

      Reviewer #2 (Public review): 

      Zeng et al. report that Setdb1-/- embryos fail to extinguish the 1- and 2-cell embryo transcriptional program and have permanent expression of MERVL transposable elements. The manuscript is technically sound and well performed, but, in my opinion, the results lack conceptual novelty.

      (1) The manuscript builds on previous observations that: first, Setdb1 is necessary for early mouse development, with knockout embryos rarely reaching the 8-cell stage; second, SETDB1 mediates H3K9me3 deposition at transposable elements in mouse ESCs; and third, SETDB1 silences MERVLs to prevent 2CLC-state acquisition in mouse ESCs. The strength of the current work is the demonstration that this is not due to a general transcriptional collapse; otherwise, however, the findings are not surprising. The well-known crosstalk between m6A RNA modification and H3K9me3 in preventing 2CLC generation (reported several years ago in several Nature papers) also partly compromises the novelty of this work.

      We thank the Reviewer for appreciating the technical quality of our work. Regarding novelty, please consider that prior work in ES cells included contradictory findings (please see our Introduction), and that prior embryology work (please see our Introduction) did not explain the preimplantation-stage phenotype. We greatly value those earlier studies. Our work tests the predictions drawn from them and unequivocally shows that SETDB1 carries out the developmentally essential function of suppressing MERVLs and the 2-cell program in the mouse embryo.

      (2) The conclusions regarding H3K9me3 deposition are inferred based on previously reported datasets, but there is no direct demonstration.

      Dynamic H3K9me3 deposition at MERVL LTRs across the early 2C, late 2C, 2C, 4C, 8C, and morula window is displayed in Figures 4C, 4D, 4E, 6D, 6E and Figure S6, using very high-quality data from a published study. We agree that demonstrating loss of H3K9me3 in Setdb1<sup>mat-/+</sup> embryos would confirm that the H3K9me3 histone methyltransferase (HMT) function of SETDB1 (as opposed to any yet-unidentified, non-HMT activity of SETDB1) is responsible for shutting down MERVL LTRs. However, ChIP-seq, CUT&RUN, and similar assays are not feasible given the rarity of Setdb1<sup>mat-/+</sup> embryos.

      (3) The detection of chimeric transcripts is somewhat unreliable using short-read sequencing.

      We used single-embryo total RNA-seq, which is considered more reliable than mRNA-seq for detecting chimeric transcripts because many of them are not polyadenylated, and we report the detected chimeric transcripts in Table S6. We acknowledge, however, that long-read sequencing, which has recently become available but is still very expensive, is currently the most powerful method for detecting chimeric transcripts. This does not, however, affect the major conclusions or the significance of our work.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We are grateful to the Review Commons reviewers for their constructive feedback, which has significantly strengthened the manuscript. In response, we have performed additional experiments, revised and expanded multiple figures, incorporated new statistical and functional analyses, and carefully edited the text to improve clarity and precision. A detailed point-by-point response to all reviewer comments, together with a summary of revised figures, is provided.

      To address the reviewers' suggestions, we have conducted additional experiments that are now incorporated into new figures, or we have added new images to several existing figures where appropriate.

      For this reason, please note that all figures have been renumbered to improve clarity and facilitate cross-referencing throughout the text. As recommended by Referee #3, all figure legends have been thoroughly revised to reflect these updates and are now labeled following the standard A-Z panel format, enhancing readability and ensuring easier identification. In addition, all figure legends now include the sample size for each statistical analysis.

      For clarity and ease of reference, we provide below a comprehensive list of all figures included in the revised version, indicating for each one whether and how it has been modified.

      Figure 1. The first spermatogenesis wave in prepuberal mice.

      This figure now includes magnified images of representative spermatocytes and a summary schematic illustrating the timeline of spermatogenesis. It also now presents the statistical analysis of spermatocyte quantification supporting the visual data.

      Figure 2. Cilia emerge across all stages of prophase I in spermatocytes during the first spermatogenesis wave.

      The images in this figure are unchanged from the original submission, but all graphs now include the statistical analysis of spermatocyte quantification.

      Figure 3. Ultrastructure and markers of prepuberal meiotic cilia.

      This figure remains unchanged from the original submission; however, we have replaced the ARL3-labelled spermatocyte image (A) with one displaying a clearer and more representative signal.

      Figure 4. Testicular tissue presents spermatocyte cysts in prepuberal mice and adult humans.

      This figure remains unchanged from the original submission.

      Figure 5. Cilia and flagella dynamics are correlated during prepuberal meiosis.

      This figure remains unchanged from the original submission.

      Figure 6. Comparative proteomics identifies potential regulators of ciliogenesis and flagellogenesis.

      This figure remains unchanged from the original submission.

      Figure 7. Deciliation induces persistence of DNA damage in meiosis.

      This figure has been substantially revised and now includes additional experiments analyzing chloral hydrate treatment, aimed at more accurately assessing DNA damage under both control and treated conditions. Images F-I and graph J are new.

      Figure 8. Aurora kinase A is a regulator of cilia disassembly in meiosis.

      This figure has been remodelled because the original version contained an error in former panel II; the corresponding graph, now Figure 8I, has been corrected. In addition, the figure now includes α-Tubulin staining of arrested ciliated metaphase I spermatocytes after AURKA inhibition (new panel L1´).

      Figure 9. Schematic representation of the prepuberal versus adult seminiferous epithelium.

      This figure remains unchanged from the original submission.

      Supplementary Figure 1. Meiotic stages during the first meiotic wave.

      This figure remains unchanged from the original submission.

      Supplementary Figure 2 (new).

      This is a new figure that includes additional data requested by the reviewers: additional cilia markers in spermatocytes (glutamylated Tubulin/GT335) and control data for cilia markers in non-ciliated spermatocytes. It also now includes the separate quantification of ciliated spermatocytes at each stage, as requested by the reviewers, complementing the graphs in Figure 2.

      Please note that with the inclusion of this new Supplementary Figure 2, the numbering of subsequent supplementary figures has been updated accordingly.

      Supplementary Figure 3 (previously Suppl. Fig. 2). Ultrastructure of prophase I spermatocytes.

      The content of this figure is unchanged from the original submission, but annotations have been added.

      Supplementary Figure 4 (previously Suppl. Fig. 3). Meiotic centrosome under the electron microscope.

      This figure remains unchanged from the original submission, but additional annotations have been included.

      Supplementary Figure 5 (previously Suppl. Fig. 4). Human testis contains ciliated spermatocytes.

      This figure has been revised and now includes additional H2AX staining to better determine the stage of ciliated spermatocytes and improve their identification.

      Supplementary Figure 6 (previously Suppl. Fig. 5). GLI1 and GLI3 readouts of Hedgehog signalling are not visibly affected in prepuberal mouse testes.

      This figure has been remodeled and now includes the quantification of GLI1 and GLI3 with the corresponding statistical analysis. It also uses Tubulin instead of GAPDH as the loading control.

      Supplementary Figure 7 (previously Suppl. Fig. 6). CH and MLN8237 optimization protocol.

      This figure has been remodeled to incorporate control experiments using 1-hour organotypic culture treatment.

      Supplementary Figure 8 (previously Suppl. Fig. 7). Tracking first meiosis wave with EdU pulse injection during prepubertal meiosis.

      This figure remains unchanged from the original submission.

      Supplementary Figure 9 (previously Suppl. Fig. 8). PLK1 and AURKA inhibition in cultured spermatocytes.

      This figure has been remodeled and now includes additional data on spindle detection in control and AURKA-inhibited spermatocytes (both ciliated and non-ciliated).

      DETAILED POINT-BY-POINT RESPONSE TO THE REVIEWERS

      We will submit both the PDF version of the revised manuscript and the Word file with tracked changes relative to the original submission. Each modification made in response to the reviewers' suggestions is annotated in the Word document within the corresponding section of the text. All new figures have also been uploaded to the system.

      Response to the Referee #1

      In this manuscript by Perez-Moreno et al., titled "The dynamics of ciliogenesis in prepubertal mouse meiosis reveal new clues about testicular maturation during puberty", the authors characterize the development of primary cilia during meiosis in juvenile male mice. The authors catalog a variety of testicular changes that occur as juvenile mice age, such as changes in testis weight and germ cell-type composition. They next show that meiotic prophase cells initially lack cilia, and ciliated meiotic prophase cells are detected after 20 days postpartum, coinciding with the time when post-meiotic spermatids within the developing testes acquire flagella. They describe that germ cells in juvenile mice harbor cilia at all substages of meiotic prophase, in contrast to adults where only zygotene stage meiotic cells harbor cilia. The authors also document that cilia in juvenile mice are longer than those in adults. They characterize cilia composition and structure by immunofluorescence and EM, highlighting that cilia polymerization may initially begin inside the cell, followed by extension beyond the cell membrane. Additionally, they demonstrate ciliated cells can be detected in adult human testes. The authors next perform proteomic analyses of whole testes from juvenile mice at multiple ages; these may not provide direct information about the extremely small numbers of ciliated meiotic cells in the testis and lack follow-up experiments, but they do serve as a valuable resource for the community. Finally, the authors use a seminiferous tubule culturing system to show that chemical inhibition of Aurora kinase A likely inhibits cilia depolymerization upon meiotic prophase I exit and leads to an accumulation of metaphase-like cells harboring cilia. They also assess meiotic recombination progression using their culturing system, but this is less convincing.

      Author response: We sincerely thank Ref #1 for the thorough and thoughtful evaluation of our manuscript. We are particularly grateful for the reviewer's careful reading and constructive feedback, which have helped us refine several sections of the text and strengthen our discussion. All comments and suggestions have been carefully considered and addressed, as detailed below.

      Major comments:

      1. There are a few issues with the experimental set up for assessing the effects of cilia depolymerization on DNA repair (Figure 7-II). First, how were mid pachytene cells identified and differentiated from early pachytene cells (which would have higher levels of γH2AX) in this experiment? I suggest either using H1t staining (to differentiate early/mid vs late pachytene) or the extent of sex chromosome synapsis. This would ensure that the authors are comparing similarly staged cells in control and treated samples. Second, what were the γH2AX levels at the starting point of this experiment? A more convincing set up would be if the authors measure γH2AX immediately after culturing in early and late cells (early would have higher γH2AX, late would have lower γH2AX), and then again after 24 h in late cells (upon repair disruption the sampled late cells would have high γH2AX). This would allow them to compare the decline in γH2AX (i.e., repair progression) in control vs treated samples. Also, it would be informative to know the starting γH2AX levels in ciliated vs non-ciliated cells as they may vary.

      Response:

      We thank Ref #1 for this valuable comment, which significantly contributed to improving both the design and interpretation of the cilia depolymerization assay.

      Following this suggestion, we repeated the experiment with 1-hour (i.e., immediately after culture onset) and 24-hour cultures for both control and chloral hydrate (CH)-treated samples (n = 3 biological replicates). To ensure accurate staging, we now use triple immunolabelling for γH2AX, SYCP3, and H1T, which allows clear distinction of zygotene and early pachytene cells (both H1T−, distinguishable by the extent of synapsis revealed by SYCP3) from late pachytene cells (H1T+). The revised data (Figure 7) now provide a more complete and statistically robust analysis of DNA damage dynamics. These results confirm that CH-induced deciliation leads to persistence of the γH2AX signal at 24 hours, indicating impaired DNA repair progression in pachytene spermatocytes. The new images and graphs are included in the revised Figure 7.

      Regarding the reviewer's final point about the comparison of γH2AX levels between ciliated and non-ciliated cells, we regret that direct comparison of γH2AX levels between ciliated and non-ciliated cells is not technically feasible. To preserve cilia integrity, all cilia-related imaging is performed using the squash technique, which maintains the three-dimensional structure of the cilia but does not allow reliable quantification of DNA damage markers due to nuclear distortion. Conversely, the nuclear spreading technique, used for DNA damage assessment, provides optimal visualization of repair foci but results in the loss of cilia due to cytoplasmic disruption during the hypotonic step. Given that spermatocytes in juvenile testes form developmentally synchronized cytoplasmic cysts, we consider that analyzing a statistically representative number of spermatocytes offers a valid and biologically meaningful measure of tissue-level effects.

      In conclusion, we believe that the additional experiments and clarifications included in revised Figure 7 strengthen our conclusion that cilia depolymerization compromises DNA repair during meiosis. Further functional confirmation will be pursued in future works, since we are currently generating a conditional genetic model for a ciliopathy in our laboratory.

      The authors analyze meiotic progression in cells cultured with/without AURKA inhibition in Figure 8-III and conclude that the distribution of prophase I cells does not change upon treatment. Are Figure 8-III A and B the same data? The legend text is incorrect, so it is hard to follow. Figure 8-III A shows a depletion of EdU-labelled pachytene cells upon treatment. Moreover, the conclusion that a higher proportion of ciliated zygotene cells upon treatment (Figure 8-II C) suggests that AURKA inhibition delays cilia depolymerization (page 13 line 444) does not make sense to me.

      Response:

      We thank Ref#1 for identifying this issue and for the careful examination of Figure 8. We discovered that the submitted version of Figure 8 contained a mismatch between the figure legend and the figure panels. The legend text was correct; however, the figure inadvertently included a non-corresponding graph (previously panel II-A), which actually belonged to Supplementary Figure 7 in the original submission. We apologize for this mistake.

      This error has been corrected in the revised version. The updated Figure 8 now accurately presents the distribution of EdU-labelled spermatocytes across prophase I substages in control and AURKA-inhibited cultures (previously Figure 8-II B, now Figure 8-A). The corrected data show no significant differences in the proportions of EdU-labelled spermatocytes among prophase I substages after 24 hours of AURKA inhibition, confirming that meiotic progression is not delayed and that no accumulation of zygotene cells occurs under this treatment. Therefore, the observed increase in ciliated zygotene spermatocytes upon AURKA inhibition (new Figure 8 H-I) is best explained by a delay in cilia disassembly, rather than by an arrest or slowdown in meiotic progression. The figure legend and main text have been revised accordingly.

      How do the authors know that there is a monopolar spindle in Figure 8-IV treated samples? Perhaps the authors can use a different Tubulin antibody (that does not detect only acetylated Tubulin) to show that there is a monopolar spindle.

      Response:

      We thank Ref#1 for this excellent suggestion. In the original submission (lines 446-447), we described that ciliated metaphase I spermatocytes in AURKA-inhibited samples exhibited monopolar spindle phenotypes. This description was based on previous reports showing that AURKA or PLK1 inhibition produces metaphases with monopolar spindles characterized by aberrant yet characteristic SYCP3 patterns, abnormal chromatin compaction, and circular bivalent alignment around non-migrated centrosomes (1). In our study, we observed SYCP3 staining consistent with these characteristic features of monopolar metaphases I.

      However, we agree with Ref #1 that this claim should be better supported by data. Following the reviewer's suggestion, we performed additional immunostaining for α-Tubulin, which labels total microtubules rather than only the acetylated fraction. The revised Figure 8 now includes α-Tubulin staining in the same ciliated metaphase I cells shown in the original submission, confirming defective microtubule polymerization and defective spindle organization. For clarity, we now refer to these ciliated metaphases I as "arrested MI". These new data further support our conclusion that AURKA inhibition disrupts spindle bipolarization and prevents cilia depolymerization, indicating that cilia maintenance and bipolar spindle organization are mechanistically incompatible events during male meiosis. The Abstract, Results, and Discussion sections have been revised accordingly. In particular, the Discussion now emphasizes that the persistence of cilia may interfere with centrosome separation and microtubule polymerization under AURKA inhibition, in contrast to invertebrate systems such as Drosophila (2) and P. brassicae (3), in which meiotic cilia persist through metaphase I without impairing bipolar spindle assembly.

      1. Alfaro et al. EMBO Rep 22 (2021). DOI: 10.15252/embr.202051030 (PMID: 33615693)
      2. Riparbelli et al. Dev Cell (2012). DOI: 10.1016/j.devcel.2012.05.024 (PMID: 22898783)
      3. Gottardo et al. Cytoskeleton (Hoboken) (2023). DOI: 10.1002/cm.21755 (PMID: 37036073)

      The authors state in the abstract that they provide evidence suggesting that centrosome migration and cilia depolymerization are mutually exclusive events during meiosis. This is not convincing with the data present in the current manuscript. I suggest amending this statement in the abstract.

      Response:

      We thank Ref#1 for this valuable observation, with which we fully agree. To avoid overstatement, the original statement has been removed from the Abstract, Results, and Discussion, and replaced with a more accurate formulation indicating that cilia maintenance and bipolar spindle formation are mutually exclusive events during mouse meiosis.

      This revised statement is now directly supported by the new data presented in Figure 8, which demonstrate that AURKA inhibition prevents both spindle bipolarization and cilia depolymerization. We are grateful to the reviewer for highlighting this important clarification.

      Minor comments:

      The presence of cilia in all stages of meiotic prophase I in juvenile mice is intriguing. Why is the cellular distribution and length of cilia different in prepubertal mice compared to adults (where shorter cilia are present only in zygotene cells)? What is the relevance of these developmental differences? Do cilia serve prophase I functions in juvenile mice (in leptotene, pachytene etc.) that are perhaps absent in adults?

      Related to the above point, what is the relevance of the absence of cilia during the first meiotic wave? If cilia serve a critical function during prophase I (for instance, facilitating DSB repair), does the lack of cilia during the first wave imply differing cilia (and repair) requirements during the first vs latter spermatogenesis waves?

      In my opinion, these would be interesting points to discuss in the discussion section.

      Response:

      We thank the reviewer for these thoughtful observations, which we agree are indeed intriguing.

      We believe that our findings likely reflect a developmental role for primary cilia during testicular maturation. We hypothesize that primary cilia at this stage might act as signaling organelles, receiving cues from Sertoli cells or neighboring spermatocytes and transmitting them through the cytoplasmic cysts shared by spermatocytes. Such intercellular communication could be essential for coordinating tissue maturation and meiotic entry during puberty. Although speculative, this hypothesis aligns with the established role of primary cilia as sensory and signaling hubs for GPCR and RTK pathways regulating cell differentiation and developmental patterning in multiple tissues (e.g., 1, 2). The Discussion section has been expanded to include these considerations.

      1. Goetz et al. Nat Rev Genet (2010). DOI: 10.1038/nrg2774 (PMID: 20395968)
      2. Nachury and Mick. Nat Rev Mol Cell Biol (2019). DOI: 10.1038/s41580-019-0116-4 (PMID: 30948801)

      Our study focuses on the first spermatogenic wave, which represents the transition from the juvenile to the reproductive phase. It is therefore plausible that the transient presence of longer cilia during this period reflects a developmental requirement for external signaling that becomes dispensable in the mature testis. Given that this is only the second study to date examining mammalian meiotic cilia, there remains a vast area of research to explore. We plan to address potential signaling cascades involved in these processes in future studies.

      On the other hand, while we cannot confirm that the cilia observed in zygotene spermatocytes persist until pachytene within the same cell, it is reasonable to speculate that they do, serving as longer-lasting signaling structures that facilitate testicular development during the critical pubertal window. In addition, the observation of ciliated spermatocytes at all prophase I substages at 20 dpp, together with our proteomic data, supports the idea that the emergence of meiotic cilia exerts a significant developmental impact on testicular maturation.

      In summary, although we cannot yet define specific prophase I functions for meiotic cilia in juvenile spermatocytes, our data demonstrate that the first meiotic wave differs from later waves in cilia dynamics, suggesting distinct regulatory requirements between puberty and adulthood. These findings underscore the importance of considering developmental context when using the first meiotic wave as a model for studying spermatogenesis.

      The authors state on page 9 lines 286-288 that the presence of cytoplasmic continuity via intercellular bridges (between developmentally synchronous spermatocytes) hints towards a mechanism that links cilia and flagella formation. Please clarify this statement. While the correlation between the timing of appearance of cilia and flagella in cells that are located within the same segment of the seminiferous tubule may be hinting towards some shared regulation, how would cytoplasmic continuity participate in this regulation? Especially since the cytoplasmic continuity is not between the developmentally distinct cells acquiring the cilia and flagella?

      Response:

      We thank Ref#1 for this excellent question and for the opportunity to clarify our statement.

      The presence of intercellular bridges between spermatocytes is well known and has long been proposed to support germ cell communication and synchronization (1,2) as well as sharing mRNA (3) and organelles (4). A classic example is the Akap gene, located on the X chromosome and essential for the formation of the sperm fibrous sheath; cytoplasmic continuity through intercellular bridges allows Akap-derived products to be shared between X- and Y-bearing spermatids, thereby maintaining phenotypic balance despite transcriptional asymmetry (5). In addition, more recent work has further demonstrated that these bridges are critical for synchronizing meiotic progression and for processes such as synapsis, double-strand break repair, and transposon repression (6).

      In this context, and considering our proteomic data (Figure 6), our statement did not intend to imply direct cytoplasmic exchange between ciliated and flagellated cells. Although our current methods do not allow comprehensive tracing of cytoplasmic continuity from the basal to the luminal compartment of the seminiferous epithelium, we plan to address this limitation using high-resolution 3D and ultrastructural imaging approaches in future studies.

      Based on our current data, we propose that cytoplasmic continuity within developmentally synchronized spermatocyte cysts could facilitate the coordinated regulation of ciliogenesis, and similarly enable the sharing of regulatory factors controlling flagellogenesis within spermatid cysts. This coordination may occur through the diffusion of centrosomal or ciliary proteins, mRNAs, or signaling intermediates involved in the regulation of microtubule dynamics. However, we cannot exclude the possibility that such cytoplasmic continuity extends across all spermatocytes derived from the same spermatogonial clone, potentially providing a larger regulatory network. This mechanism could help explain the temporal correlation we observe between the appearance of meiotic cilia and the onset of flagella formation in adjacent spermatids within the same seminiferous segment.

      We have revised the Discussion to explicitly clarify this interpretation and to note that, although hypothetical, it is consistent with established literature on cytoplasmic continuity and germ cell coordination.

      1. Dym et al. Biol Reprod (1971). DOI: 10.1093/biolreprod/4.2.195 (PMID: 4107186)
      2. Braun et al. Nature (1989). DOI: 10.1038/337373a0 (PMID: 2911388)
      3. Greenbaum et al. Proc Natl Acad Sci USA (2006). DOI: 10.1073/pnas.0505123103 (PMID: 16549803)
      4. Ventelä et al. Mol Biol Cell (2003). DOI: 10.1091/mbc.e02-10-0647 (PMID: 12857863)
      5. Turner et al. J Biol Chem (1998). DOI: 10.1074/jbc.273.48.32135 (PMID: 9822690)
      6. Sorkin et al. Nat Commun (2025). DOI: 10.1038/s41467-025-56742-9 (PMID: 39929837)

      *Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.

      Individual germ cells in H&E-stained testis sections in Figure 1-II are difficult to see. I suggest adding zoomed-in images where spermatocytes/round spermatids/elongated spermatids are clearly distinguishable.

      Response:

      We agree with Ref#1. We have revised Figure 1 to improve the quality of the H&E-stained testis sections and have added zoomed-in panels where spermatocytes, round spermatids, and elongated spermatids are clearly distinguishable. These additions significantly enhance the clarity and interpretability of the figure.

      In Figure 2-II B, the authors document that most ciliated spermatocytes in juvenile mice are pachytene. Is this because most meiotic cells are pachytene? Please clarify. If the data are available (perhaps could be adapted from Figure 1-III), it would be informative to see a graph representing what proportions of each meiotic prophase substages have cilia.

      Response:

      We thank the reviewer for this valuable observation. Indeed, the predominance of ciliated pachytene spermatocytes reflects the fact that most meiotic cells in juvenile testes are at the pachytene stage (Figure 1). We have clarified this point in the text and have added a new supplementary figure (Supplementary Figure 2, new figure) presenting a graph showing the proportion of spermatocytes at each prophase I substage that possess primary cilia. This visualization provides a clearer quantitative overview of ciliation dynamics across meiotic substages.
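      The distinction behind this point can be made explicit with a small calculation. The sketch below uses hypothetical function names and invented counts (not data from Figure 1 or Supplementary Figure 2) to contrast two quantities: the fraction of cells at each substage that are ciliated, and the share of the ciliated pool contributed by each substage. When every substage is ciliated at the same rate, the most abundant substage still dominates the ciliated pool.

      ```python
      from __future__ import annotations


      def ciliated_fraction_by_stage(total: dict[str, int], ciliated: dict[str, int]) -> dict[str, float]:
          """Fraction of spermatocytes at each prophase I substage that carry a cilium."""
          fractions = {}
          for stage, n_total in total.items():
              n_cil = ciliated.get(stage, 0)
              if n_total <= 0 or n_cil > n_total:
                  raise ValueError(f"inconsistent counts for {stage}")
              fractions[stage] = n_cil / n_total
          return fractions


      def share_of_ciliated_pool(ciliated: dict[str, int]) -> dict[str, float]:
          """Proportion of all ciliated spermatocytes that belong to each substage."""
          pool = sum(ciliated.values())
          if pool == 0:
              raise ValueError("no ciliated cells counted")
          return {stage: n / pool for stage, n in ciliated.items()}


      # Invented counts for illustration only: every substage is 20% ciliated,
      # but pachytene cells are the most numerous.
      total = {"leptotene": 20, "zygotene": 40, "pachytene": 160}
      ciliated = {"leptotene": 4, "zygotene": 8, "pachytene": 32}

      print(ciliated_fraction_by_stage(total, ciliated))  # 0.2 for every substage
      print(share_of_ciliated_pool(ciliated))  # pachytene is ~0.73 of the ciliated pool
      ```

      A per-substage graph of the first quantity is what separates a genuine stage preference from a simple abundance effect.
      
      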

      I suggest annotating the EM images in Sup Figure 2 and 3 to make it easier to interpret.

      Response:

      We thank the reviewer for this helpful suggestion. We have now added annotations to the EM images in Supplementary Figures 3 and 4 to facilitate their interpretation. These visual guides help readers more easily identify the relevant ultrastructural features described in the text.

      The authors claim that the ratio between GLI3-FL and GLI3-R is stable across their analyzed developmental window in whole testis immunoblots shown in Sup Figure 5. Quantifying the bands and normalizing to the loading control would help strengthen this claim, as it is hard to interpret the immunoblot in its current form.

      Response:

      We thank the reviewer for this valuable suggestion. Following this recommendation, Supplementary Figure 5 has been revised to include quantification of GLI1 and GLI3 protein levels, normalized to the loading control.

      After quantification, we observed statistically significant differences across developmental stages. Specifically, GLI1 expression is slightly higher at 21 dpp compared to 8 dpp. For GLI3, we performed two complementary analyses:

      • Total GLI3 protein (sum of full-length and repressor forms normalized to loading control) shows a progressive decrease during development, with the lowest levels at 60 dpp (Supplementary Figure 5D).
      • GLI3 activation status, assessed as the GLI3-FL/GLI3-R ratio, is highest during the 19-21 dpp window, compared to 8 dpp and 60 dpp. Although these results suggest a possible transient activation of GLI3 during testicular maturation, we caution that this cannot automatically be attributed to increased Hedgehog signaling, as GLI3 processing can also be affected by other processes, such as changes in ciliogenesis. Furthermore, because the analysis was performed on whole-testis protein extracts, these changes cannot be specifically assigned to ciliated spermatocytes.
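      As a minimal sketch of the two GLI3 readouts defined above, assuming band intensities have already been measured by densitometry, the normalizations could be written as follows. The class and function names are hypothetical, the intensities are invented, and this is not presented as the pipeline used for the revised figure. Because both GLI3 bands come from the same lane, the loading control cancels out of the FL/R ratio and is needed only for total GLI3.

      ```python
      from dataclasses import dataclass


      @dataclass
      class Lane:
          """Densitometry values (arbitrary units) for one whole-testis lane (hypothetical structure)."""
          gli3_fl: float   # full-length GLI3 band
          gli3_r: float    # GLI3 repressor band
          loading: float   # loading-control band (Tubulin in the revised figure)


      def total_gli3(lane: Lane) -> float:
          """Total GLI3: (full-length + repressor) normalized to the loading control."""
          if lane.loading <= 0:
              raise ValueError("loading-control intensity must be positive")
          return (lane.gli3_fl + lane.gli3_r) / lane.loading


      def fl_to_r_ratio(lane: Lane) -> float:
          """GLI3 activation status as the FL/R ratio; the loading control cancels out."""
          if lane.gli3_r <= 0:
              raise ValueError("repressor band intensity must be positive")
          return lane.gli3_fl / lane.gli3_r


      # Invented intensities for illustration only.
      lane = Lane(gli3_fl=1200.0, gli3_r=800.0, loading=1000.0)
      print(total_gli3(lane))     # 2.0
      print(fl_to_r_ratio(lane))  # 1.5
      ```
      
      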

      We have expanded the Discussion to address these findings and to highlight the potential involvement of the Desert Hedgehog (DHH) pathway, which plays key roles in testicular development, Sertoli-germ cell communication, and spermatogenesis (1, 2, 3). We plan to investigate these pathways further in future studies.

      1. Bitgood et al. Curr Biol. (1996). DOI: 10.1016/s0960-9822(02)00480-3 (PMID: 8805249)
      2. Clark et al. Biol Reprod. (2000) DOI: 10.1095/biolreprod63.6.1825 (PMID: 11090455)
      3. O'Hara et al. BMC Dev Biol. (2011) DOI: 10.1186/1471-213X-11-72 (PMID: 22132805)

      *Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.

      There are a few typos throughout the manuscript. Some examples: page 5 line 172, Figure 3-I legend text, Sup Figure 5-II callouts, Figure 8-III legend, page 15 line 508, page 17 line 580, page 18 line 611.

      Response:

      We thank the reviewer for detecting this. All typographical errors have been corrected, and figure callouts have been reviewed for consistency.

      Response to the Referee #2

      This study focuses on the dynamic changes of ciliogenesis during meiosis in prepubertal mice. It was found that primary cilia are not an intrinsic feature of the first wave of meiosis (initiating at 8 dpp); instead, they begin to polymerize at 20 dpp (after the completion of the first wave of meiosis) and are present in all stages of prophase I. Moreover, prepubertal cilia (with an average length of 21.96 μm) are significantly longer than adult cilia (10 μm). The emergence of cilia coincides temporally with flagellogenesis, suggesting a regulatory association in the formation of axonemes between the two. Functional experiments showed that disruption of cilia by chloral hydrate (CH) delays DNA repair, while the AURKA inhibitor (MLN8237) delays cilia disassembly, and centrosome migration and cilia depolymerization are mutually exclusive events. These findings represent the first detailed description of the spatiotemporal regulation and potential roles of cilia during early testicular maturation in mice. The discovery of this phenomenon is interesting; however, there are certain limitations in functional research.

      We thank Referee #2 for their careful reading of the manuscript and for highlighting important limitations regarding functional interpretation.

      Our primary objective in this study was to provide a rigorous structural, temporal, and developmental characterization of meiotic ciliogenesis in the mammalian testis, a process for which almost no prior data exist. Given this lack of foundational information, we focused on establishing when, where, and in which meiotic stages primary cilia form during prepubertal development, and on identifying candidate regulatory pathways using complementary imaging, proteomic, and pharmacological approaches.

      We agree that genetic ablation models would provide the most direct means to test ciliary function during spermatogenesis. However, we believe that such functional analyses must be preceded by a detailed developmental and phenotypic framework, which was previously unavailable. The present study therefore represents a necessary first step, defining the dynamics, ultrastructure, and molecular context of meiotic cilia during the transition from juvenile to adult spermatogenesis. We are currently generating conditional genetic models to directly address functional mechanisms in future work.

      Regarding the temporal coincidence between the emergence of meiotic cilia and the onset of flagellogenesis, we do not interpret this observation as evidence of stochastic or non-functional protein expression. Rather, we present it as a developmental correlation that may reflect shared regulatory constraints on axonemal assembly during testicular maturation. We have clarified in the revised manuscript that this relationship is descriptive and hypothesis-generating, and we avoid assigning direct causal roles.

      With respect to the proteomic analysis, we agree that proteomics alone cannot establish function. Our intent was not to assign causality, but to provide a developmental, hypothesis-generating dataset identifying candidate regulators that are enriched at the precise developmental window when both meiotic cilia and spermatid flagella first emerge. We have revised the text to explicitly frame these data as a resource for future mechanistic studies, rather than as direct functional evidence.

      Taken together, we believe that the revised manuscript now more accurately reflects the scope and limitations of the study, while providing a robust and much-needed developmental framework for future genetic and functional analyses of meiotic ciliogenesis in mammals. We would be happy to further clarify any aspect of these interpretations if the reviewer or editor considers it helpful.

      Major points:

      1. No specific genetic ablation was used to block the formation of the prepubertal spermatocyte cilia discovered by the authors, which makes it impossible to evaluate whether such cilia truly have functions. This matters because this type of cilium does not seem to be essential in either the first wave of spermatogenesis or adult spermatogenesis. In addition, the authors also imply that the formation of such cilia appears to be synchronized with the formation of sperm flagella. This suggests that the production of such cilia may merely be transient protein expression noise rather than a functionally meaningful cellular structure.

      Response:

      We agree that a genetic ablation model would represent the ideal approach to directly test cilia function in spermatogenesis. However, given the complete absence of prior data describing the dynamics of ciliogenesis during testis development, our priority in this study was to establish a rigorous structural and temporal characterization of this process in the main mammalian model organism, the mouse. This systematic and rigorous phenotypic characterization is a necessary first step before any functional genetics could be meaningfully interpreted.

      To our knowledge, this study represents the first comprehensive analysis of ciliogenesis during prepubertal mouse meiosis, extending our previous work on adult spermatogenesis (1). Beyond these two contributions, only four additional studies have addressed meiotic cilia: two in zebrafish (2, 3), one in Drosophila (4) and a recent one in butterfly (5). Of these, Mytlis et al. also provide preliminary observations relevant to prepubertal male meiosis, which we discuss in the present work. No additional information exists for mammalian gametogenesis to date.

      1. López-Jiménez et al. Cells (2022) DOI: 10.3390/cells12010142 (PMID: 36611937)
      2. Mytlis et al. Science (2022) DOI: 10.1126/science.abh3104 (PMID: 35549308)
      3. Xie et al. J Mol Cell Biol (2022) DOI: 10.1093/jmcb/mjac049 (PMID: 35981808)
      4. Riparbelli et al. Dev Cell (2012) DOI: 10.1016/j.devcel.2012.05.024 (PMID: 22898783)
      5. Gottardo et al. Cytoskeleton (Hoboken) (2023) DOI: 10.1002/cm.21755 (PMID: 37036073)

      We therefore consider this descriptive and analytical foundation to be essential before the development of functional genetic models. Indeed, we are currently generating a conditional genetic model for a ciliopathy in our laboratory. These studies are ongoing and will directly address the type of mechanistic questions raised here, but they extend well beyond the scope and feasible timeframe of the present manuscript.

      We thus maintain that the present work constitutes a necessary and timely contribution, providing a robust reference dataset that will facilitate and guide future functional studies in the field of cilia and meiosis.

      Taking this into account, we would be very pleased to address any additional, concrete suggestions from Ref#2 that could further strengthen the current version of the manuscript.

      The high expression of axoneme assembly regulators such as TRiC complex and IFT proteins identified by proteomic analysis is not particularly significant. This time point is precisely the critical period for spermatids to assemble flagella, and TRiC, as a newly discovered component of flagellar axonemes, is reasonably highly expressed at this time. No intrinsic connection with the argument of this paper is observed. In fact, this testicular proteomics has little significance.

      Response:

      We appreciate this comment but respectfully disagree with the reviewer's interpretation of our proteomic data. To our knowledge, this is the first proteomic study explicitly focused on identifying ciliary regulators during testicular development at the precise window (19-21 dpp) when both meiotic cilia and spermatid flagella first emerge.

      While Piprek et al. (1) analyzed the expression of primary cilia in developing gonads, proteomic data specifically covering the developmental transition at 19-21 dpp were not previously available. Furthermore, a recent cell-sorting study (2) detected expression of cilia proteins in pachytene spermatocytes compared to round spermatids, but did not explore their functional relevance or integrate these data with developmental timing or histological context.

      In contrast, our dataset integrates histological staging, high-resolution microscopy, and quantitative proteomics, revealing a set of candidate regulators (including DCAF7, DYRK1A, TUBB3, TUBB4B, and TRiC) potentially involved in cilia-flagella coordination. We view this as a hypothesis-generating resource that outlines specific proteins and pathways for future mechanistic studies on both ciliogenesis and flagellogenesis in the testis.

      Although we fully agree that proteomics alone cannot establish causal function, we believe that dismissing these data as having little significance overlooks their value. They are the first molecular map of the testis, and the first integrated view of proteins associated with ciliary and flagellar structures, at the developmental window when both axonemal organelles first appear. We therefore consider our proteomic dataset an important and novel contribution to the understanding of testicular development and ciliary biology.

      Considering this, we would again welcome any specific suggestions from Ref#2 on additional analyses or clarifications that could make the relevance of this dataset even clearer to readers.

      1. Piprek et al. Int J Dev Biol. (2019) doi: 10.1387/ijdb.190049rp (PMID: 32149371).
      2. Fang et al. Chromosoma. (1981) doi: 10.1007/BF00285768 (PMID: 7227045).

      Response to the Referee #3

      In "The dynamics of ciliogenesis in prepubertal mouse meiosis reveals new clues about testicular development" Pérez-Moreno, et al. explore primary cilia in prepubertal mouse spermatocytes. Using a combination of microscopy, proteomics, and pharmacological perturbations, the authors carefully characterize prepubertal spermatocyte cilia, providing foundational work regarding meiotic cilia in the developing mammalian testis.

      Response: We sincerely thank Ref#3 for their positive assessment of our work and for the thoughtful suggestions that have helped us strengthen the manuscript. We are pleased that the reviewer recognizes both the novelty and the relevance of our study in providing foundational insights into meiotic ciliogenesis during prepubertal testicular development. All specific comments have been carefully considered and addressed as detailed below.

      Major concerns:

      1. The authors provide evidence consistent with cilia not being present in a larger percentage of spermatocytes or in other cells in the testis. The combination of electron microscopy and acetylated tubulin antibody staining establishes the presence of cilia; however, proving a negative is challenging. While acetylated tubulin is certainly a common marker of cilia, it does not label some cilia, such as those in neurons. The authors should use at least one additional cilia marker to better support their claim of cilia being absent.

      Response:

      We thank the reviewer for this helpful suggestion. In the revised version, we have strengthened the evidence for cilia identification by including an additional ciliary marker, glutamylated tubulin (GT335), in combination with acetylated tubulin and ARL13B (which were included in the original submission). These data are now presented in the new Supplementary Figure 2, which also includes an example of a non-ciliated spermatocyte showing absence of both ARL13B and AcTub signals.

      Taken together, these markers provide a more comprehensive validation of cilia detection and confirm the absence of ciliary labelling in non-ciliated spermatocytes.

      The conclusion that IFT88 localizes to centrosomes is premature as key controls for the IFT88 antibody staining are lacking. Centrosomes are notoriously "sticky", often showing non-specific antibody staining. The authors must include controls to demonstrate the specificity of the staining they observe, such as staining in a genetic mutant or an antigen competition assay.

      Response:

      We appreciate the reviewer's concern and fully agree that antibody specificity is critical when interpreting centrosomal localization. The IFT88 antibody used in our study is commercially available and has been extensively validated in the literature as both a cilia marker (1, 2) and a centrosome marker in somatic cells (3). Labelling of IFT88 in centrosomes has also been previously described using other antibodies (4, 5). In our material, the IFT88 signal consistently appears at one of the duplicated centrosomes and at both spindle poles, patterns identical to those reported in somatic cells. We therefore consider the reported meiotic IFT88 staining as specific and biologically reliable.

      That said, we agree that genetic validation would provide the most definitive confirmation. We are currently generating a conditional genetic model for a ciliopathy in our laboratory that will directly assess both antibody specificity and the functional consequences of cilia loss during meiosis. These experiments are in progress and will be reported in a follow-up study.

      1. Wong et al. Science (2015). DOI: 10.1126/science.aaa5111 (PMID: 25931445)
      2. Ocbina et al. Nat Genet (2011). DOI: 10.1038/ng.832 (PMID: 21552265)
      3. Vitre et al. EMBO Rep (2020). DOI: 10.15252/embr.201949234 (PMID: 32270908)
      4. Robert A. et al. J Cell Sci (2007). DOI: 10.1242/jcs.03366 (PMID: 17264151)
      5. Singla et al. Developmental Cell (2010). DOI: 10.1016/j.devcel.2009.12.022 (PMID: 20230748)

      *Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.

      There are many inconsistent statements throughout the paper regarding the timing of the first wave of spermatogenesis. For example, the authors state that round spermatids can be detected at 21 dpp on line 161, but on line 180 say round spermatids can be detected at 19 dpp. Not only does this lead to confusion, but such discrepancies undermine the validity of the rest of the paper. A summary graphic displaying key events and their timing in the first wave of spermatogenesis would be instrumental for reader comprehension and could be used by the authors to ensure consistent claims throughout the paper.

      Response:

      We thank the reviewer for identifying this inconsistency and apologize for the confusion. We confirm that early round spermatids first appear at 19 dpp, as shown in the quantitative data (Figure 1J). This can be detected in squashed spermatocyte preparations, where individual spermatocytes and spermatids can be accurately quantified. The original text contained an imprecise reference to the histological image of 21 dpp (previous line 161), since certain H&E sections did not clearly show all cell types simultaneously. However, we have now revised Figure 1, improving the image quality and adding a zoomed-in panel highlighting early round spermatids. Image for 19 dpp mice in Fig 1D shows early, yet still aflagellated spermatids. The first ciliated spermatocytes and the earliest flagellated spermatids are observed at 20 dpp. This has been clarified in the text.

      In addition, we also thank the reviewer for the suggestion of adding a summary graphic, which we agree greatly facilitates reader comprehension. We have added a new schematic summary (Figure 1K) illustrating the key stages and timing of the first spermatogenic wave.

      In the proteomics experiments, it is unclear why the authors assume that changes in protein expression are predominantly due to changes within the germ cells in the developing testis. The analysis is on whole testes including both the somatic and germ cells, which makes it possible that protein expression changes in somatic cells drive the results. The authors need to justify why and how the conclusions drawn from this analysis warrant such an assumption.

      Response:

      We agree with the reviewer that our proteomic analysis was performed on whole testis samples, which contain both germ and somatic cells. Although isolation of pure spermatocyte populations by FACS would provide higher resolution, obtaining sufficient prepubertal material for such analysis would require an extremely large number of animals. To remain compliant with the 3Rs principle for animal experimentation, we therefore used whole-testis samples from three biological replicates per age.

      We acknowledge that our assumption (that the main differences arise from germ cells) is a simplification. However, germ cells constitute the vast majority of testicular cells during this developmental window and are the population undergoing major compositional changes between 15 dpp and adulthood. It is therefore reasonable to expect that a substantial fraction of the observed proteomic changes reflects alterations in germ cells. We have clarified this point in the revised text and have added a statement noting that changes in somatic cells could also contribute to the proteomic profiles.

      The authors should provide details on how proteins were categorized as being involved in ciliogenesis or flagellogenesis, specifically in the distinction criteria. It is not clear how the categorizations were determined or whether they are valid. Thus, no one can repeat this analysis or perform this analysis on other datasets they might want to compare.

      Response:

      We thank the reviewer for this opportunity to clarify our approach. The categorization of proteins as being involved in ciliogenesis or flagellogenesis was based on their Gene Ontology (GO) cellular component annotations, obtained from the PANTHER database (Version 19.0) using the gene IDs of the Differentially Expressed Proteins (DEPs). Specifically, we used the GO terms cilium (GO:0005929) and motile cilium (GO:0031514). Motile cilium is a subcategory of cilium. Proteins annotated with the general cilium term but not with motile cilium were therefore considered to be associated with primary cilia or with structural components shared by different types of cilia. These GO terms are represented in the bottom panel of Figure 6.

      This information has been added to the Methods section and referenced in the Results for transparency and reproducibility.
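      The GO-based rule described above fits in a few lines, sketched here. It is an illustration and not the analysis code used for Figure 6. It assumes each protein's GO cellular component annotations (for example, as exported from PANTHER) are available as a set of GO IDs that already includes inherited parent terms. The function names and the protein identifiers in the usage example are hypothetical placeholders.

```python
from typing import Dict, Optional, Set

CILIUM = "GO:0005929"         # cilium (parent term)
MOTILE_CILIUM = "GO:0031514"  # motile cilium (child term of cilium)


def categorize(go_terms: Set[str]) -> Optional[str]:
    """Assign one category to a protein from its GO cellular component IDs."""
    # Check the child term first: a motile-cilium protein also carries the parent term.
    if MOTILE_CILIUM in go_terms:
        return "motile cilium"
    # Parent term only: primary cilium or components shared by different cilia types.
    if CILIUM in go_terms:
        return "cilium (primary or shared)"
    return None  # not cilia-related


def categorize_deps(annotations: Dict[str, Set[str]]) -> Dict[str, str]:
    """Keep only cilia-related proteins, mapped to their category."""
    result: Dict[str, str] = {}
    for protein, terms in annotations.items():
        category = categorize(terms)
        if category is not None:
            result[protein] = category
    return result
```

      Consider placeholder proteins annotated as P1 = {GO:0005929, GO:0031514}, P2 = {GO:0005929} and P3 = {GO:0005634}. P1 is assigned to motile cilium, P2 to cilium (primary or shared), and P3 is left out. The order of the checks matters. Testing the parent term first would misassign every motile-cilium protein, because propagated annotations also include the parent cilium term.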

      In the pharmacological studies, the authors conclude that the phenotypes they observe (DNA damage and reduced pachytene spermatocytes) are due to loss of or persistence of cilia. This overinterprets the experiment. Chloral hydrate and MLN8237 certainly impact ciliation as claimed, but have additional cellular effects. Thus, it is possible that the observed phenotypes were not a direct result of cilia manipulation. Either additional controls must address this or the conclusions need to be more specific and toned down.

      Response:

      We thank the reviewer for this fair observation and have taken steps to strengthen and refine our interpretation. In the revised version, we now include data from 1-hour and 24-hour cultures for both control and chloral hydrate (CH)-treated samples (n = 3 biological replicates). The triple immunolabelling with γH2AX, SYCP3, and H1T allows accurate staging of zygotene (H1T⁻), early pachytene (H1T⁻), and late pachytene (H1T⁺) spermatocytes.

      The revised Figure 7 now provides a more complete and statistically supported analysis of DNA damage dynamics, confirming that CH-induced deciliation leads to persistent γH2AX signal at 24 hours, indicative of delayed or defective DNA repair progression. We have also toned down our interpretation in the Discussion, acknowledging that CH could affect other cellular pathways.

      As mentioned before, the conditional genetic model that we are currently generating will allow us to evaluate the role of cilia in meiotic DNA repair in a more direct and specific way.

      Assuming the conclusions of the pharmacological studies hold true with the proper controls, the authors still conflate their findings with meiotic defects. Meiosis is not directly assayed, which makes this conclusion an overstatement of the data. The conclusions need to be rephrased to accurately reflect the data.

      Response:

      We agree that this aspect required clarification. As noted above, we have refined both the Results and Discussion sections to make clear that our assays specifically targeted meiotic spermatocytes.

      We now present data for meiotic stages at zygotene, early pachytene and late pachytene. These stages are identified by labelling for SYCP3 and H1T, both specific markers of meiosis that are not detectable in non-meiotic cells. We believe that this is indeed a way to assay meiotic cells; however, we now specify in the text that we are analysing potential defects in meiotic progression. We apologize if this was not properly explained in the original manuscript. It has been rephrased in both the Results and Discussion sections of the new version.

      It is not clear why the authors chose not to use widely accepted assays of Hedgehog signaling. Traditionally, pathway activation is measured by transcriptional output, not GLI protein expression because transcription factor expression does not necessarily reflect transcription levels of target genes.

      Response:

      We agree with the reviewer that measuring mRNA levels of Hedgehog pathway target genes, typically GLI1 and PTCH1, is the most common method for measuring pathway activation, and is widely accepted by researchers in the field. However, the methods we use in this manuscript (GLI1 and GLI3 immunoblots) are also quite common and widely accepted:

      Regarding GLI1 immunoblot, many articles have used this method to monitor Hedgehog signaling, since GLI1 protein levels have repeatedly been shown to also go up upon pathway activation, and down upon pathway inhibition, mirroring the behavior of GLI1 mRNA. Here are a few publications that exemplify this point:

      • Banday et al. 2025 Nat Commun. DOI: 10.1038/s41467-025-56632-0 (PMID: 39894896)
      • Shi et al 2022 JCI Insight DOI: 10.1172/jci.insight.149626 (PMID: 35041619)
      • Deng et al. 2019 eLife, DOI: 10.7554/eLife.50208 (PMID: 31482846)
      • Zhu et al. 2019 Nat Commun, DOI: 10.1038/s41467-019-10739-3 (PMID: 31253779)
      • Caparros-Martin et al 2013 Hum Mol Genet, DOI: 10.1093/hmg/dds409 (PMID: 23026747)

      *Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.

      As for GLI3 immunoblot, Hedgehog pathway activation is well known to inhibit GLI3 proteolytic processing from its full length form (GLI3-FL) to its transcriptional repressor (GLI3-R), and such processing is also commonly used to monitor Hedgehog signal transduction, of which the following are but a few examples:

      • Pedraza et al 2025 eLife, DOI: 10.7554/eLife.100328 (PMID: 40956303)
      • Somatilaka et al 2020 Dev Cell, DOI: 10.1016/j.devcel.2020.06.034 (PMID: 32702291)
      • Infante et al 2018, Nat Commun, DOI: 10.1038/s41467-018-03339-0 (PMID: 29515120)
      • Wang et al 2017 Dev Biol DOI: 10.1016/j.ydbio.2017.08.003 (PMID: 28800946)
      • Singh et al 2015 J Biol Chem DOI: 10.1074/jbc.M115.665810 (PMID: 26451044)

      *Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.

      In summary, we think that we have used two well established markers to look at Hedgehog signaling (three, if we include the immunofluorescence analysis of SMO, which we could not detect in meiotic cilia).

      These Hh pathway analyses did not provide any convincing evidence that the prepubertal cilia we describe here are actively involved in this pathway, even though Hh signaling is cilia-dependent and is known to be active in the male germline (Sahin et al 2014 Andrology PMID: 24574096; Mäkelä et al 2011 Reproduction PMID: 21893610; Bitgood et al 1996 Curr Biol. PMID: 8805249).

      That said, we fully agree that our current analyses do not allow us to draw definitive conclusions regarding Hedgehog pathway activity in meiotic cilia, and we now state this explicitly in the revised Discussion.

      Also in the Hedgehog pathway experiment, it is confusing that the authors report no detection of SMO yet detect little to no expression of GLIR in their western blot. Undetectable SMO indicates Hedgehog signaling is inactive, which results in high levels of GLIR. The impact of this is that it is not clear what is going on with Hh signaling in this system.

      Response:

      It is true that, when Hh signaling is inactive (and hence SMO not ciliary), the GLI3FL/GLI3R ratio tends to be low.

      Our data in prepubertal mouse testes show a strong reduction in total GLI3 protein levels (GLI3FL+GLI3R) as these mice grow older. However, this downregulation of total GLI3 occurs without any major changes in the GLI3FL/GLI3R ratio, which is only modestly affected (Supplementary Figure 6).

      Hence, since it is the ratio that correlates with Hh signaling rather than total levels, we do not think that the GLI3R reduction we see is incompatible with our non-detection of SMO in cilia: it seems more likely that overall GLI3 expression is being downregulated in developing testes via a Hh-independent mechanism.

      Also potentially relevant here is the fact that some cell types depend more on GLI2 than on GLI3 for Hh signaling. For instance, in mouse embryos, Hh-mediated neural tube patterning relies more heavily on GLI2 processing into a transcriptional activator than on the inhibition of GLI3 processing into a repressor. In contrast, the opposite is true during Hh-mediated limb bud patterning (Nieuwenhuis and Hui 2005 Clin Genet. PMID: 15691355). We have not looked at GLI2, but it is conceivable that it could play a bigger role than GLI3 in our model.

      Moreover, several forms of GLI-independent non-canonical Hh signaling have been described, and they could potentially play a role in our model, too (Robbins et al 2012 Sci Signal. PMID: 23074268).

      We have revised the discussion to clarify some of these points.

      All in all, we agree that our findings regarding Hh signaling are not conclusive, but we still think they add important pieces to the puzzle that will help guide future studies.

      There are multiple instances where it is not clear whether the authors performed statistical analysis on their data, specifically when comparing the percent composition of a population. The authors need to include appropriate statistical tests to make claims regarding this data. While the authors state some impressive sample sizes, once evaluated in individual categories (eg specific cell type and age) the sample sizes of evaluated cilia are as low as 15, which is likely underpowered. The authors need to state the n for each analysis in the figures or legends.

      Response:

      We thank the reviewer for highlighting this important issue. We have now included the sample size (n) for every analysis directly in the figure legends. Although this adds length, it improves transparency and reproducibility.

      Regarding Ref#3's concern about the different sample sizes, the number of spermatocytes quantified at each stage reflects the distribution of stages in meiosis. For example, pachytene lasts for 10 days, so this stage is widely represented in the preparations. Metaphase I cells are much harder to quantify because this stage lasts less than 24 hours and is therefore less frequent. Taking this into account, we ensured that all analyses remain statistically valid and representative, applying the appropriate statistical tests for each dataset. These details are now clearly indicated in the revised figures and legends.

      Minor concerns:

      1. The phrase "lactating male" is used throughout the paper and is not correct. We assume this term to mean male pups that have yet to be weaned from their lactating mother, but "lactating male" suggests a rare disorder requiring medical intervention. Perhaps "pre-weaning males" is what the authors meant.

      Response:

      We thank the reviewer for noticing this terminology error. The expression has been corrected to "pre-weaning males" throughout the manuscript.

      The convention used to label the figures in this paper is confusing and difficult to read as there are multiple panels with the same letter in the same figure (albeit distinct sections). Labeling panels in the standard A-Z format is preferred. "Panel Z" is easier to identify than "panel III-E".

      Response:

      We thank the reviewer for this suggestion. All figures have been relabelled using the standard A-Z panel format, ensuring consistency and easier readability across the manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (R1)

      R1 General statement: Here, Escalera-Maurer and colleagues, present an up-to-date distribution of homologues of Hok toxic proteins belonging to the well-annotated, but otherwise functionally obscure, hok/Sok type I toxin-antitoxin system, across the RefSeq database. Although such computational analyses have been done in the past, the authors here find many more hok homologs than described before, and they categorise their distribution based on whether they are encoded on chromosomes, plasmids, or (pro)phages. These computational analyses are in general tricky with T1TAs, as their toxins are quite short (~50 amino acids, as is the case for Hok), which is why the authors here used three separate approaches to expand their search (nucleotide-level BLAST, protein-homology, or both combined with Infernal). The authors cluster the Hok homologues they find based on a 60% sequence identity cut-off (expanding the known clusters in the process), and proceeded to test 31 candidates belonging to 15 sequence-clusters for their toxicity in Salmonella Typhimurium LT2, showing that 30/31 were toxic upon induction. An interesting finding from their endeavours is that hok/Sok homologues are enriched within prophages and large plasmids, but are not enriched near bacterial anti-phage defense systems (in contrast to the SymE/SymR T1TA). The findings suggest that hok/Sok are indeed sometimes linked to phage and plasmid biology, although they might not be antiphage defenses per se (they have been clearly shown in the past to be addiction modules, and this is still clearly true).

      Authors' answer to R1 General statement: We do not state here that hok/Sok are not anti-phage defense systems; we simply observe that they do not cluster with anti-phage defense systems. We have also observed (unpublished data) that known defense systems do not systematically cluster with other defense systems. Therefore, a strong association with other defense systems would have been a strong indication of a role in phage defense, but the absence of such an association does not exclude that hok/Sok are involved in phage defense.

      R1_C1: My expertise lies towards the experimental side of the authors' work, I thus cannot comment on the accuracy/robustness of the computational analyses performed here. The authors do a fine job in clearly stating their findings overall; I could follow most of the conclusions, and I deemed that most of them were supported by their work. Additionally, I find that this paper is a missed opportunity to uncover even more novel biology connected to the interesting hok/Sok T1TAs. The paper does not provide a new framework to think about what is the function of the chromosomal/prophage hok/Sok T1TA systems, although I realize that this is very difficult to accomplish, especially when considering that hok/Sok systems have been around in the literature for almost 40 years.

      Authors' answer to R1_C1: We agree with the reviewer; we performed this analysis with the aim of clarifying the role of hok/Sok systems. However, we believe that our comprehensive survey of Hok loci highlights their enrichment in various mobile genetic elements, such as prophages and large conjugative plasmids, which we believe is linked to their function. In addition, our study will guide future experimental efforts to uncover the function of these systems, for example by helping researchers select relevant homologs to test for a specific function.

      R1_C2: My major comment is in regard to the Hok toxicity assays (Fig. 2). The authors state in the discussion that "Hok peptides originating from chromosomes are as toxic as those from plasmids", but I believe that the way that they tested their constructs might not have allowed them to see toxicity differences between the two groups. Specifically, using the multi-copy plasmid pAZ3 (pBR322 origin of replication; ~15-20 plasmid copies per chromosome) to induce the different Hok toxin homologues in Salmonella Typhimurium LT2 with arabinose might have masked toxicity differences that would otherwise be apparent on the chromosomal expression-level.

      Some of the authors themselves have previously used the FASTBAC-Seq method to study the Hok homologue from plasmid R1, a useful technique in which a toxin is integrated into the chromosome in order to study its toxicity under natural levels of expression. I believe that an ideal scenario would be to apply FASTBAC-Seq to some of the 31 Hok homologues described here (e.g., a subset of plasmidic vs chromosomal Hok homologues) to shed light on potential toxicity differences between the Hok clusters. This would increase the value of the presented study.

      Alternatively, the authors could employ an L-arabinose concentration gradient to titrate the expression levels of the Hok toxins in order to potentially see different toxicity levels from the different homologues. However, this is not going to work in the system as they are using it now for two reasons:

      a) The S. Typhimurium LT2 (STm) strain used here has its arabinose utilization operon (araBAD) intact, which means that Salmonella can catabolize arabinose as a carbon source. This catabolism interferes with arabinose induction (i.e., Salmonella consumes arabinose instead of using it as the Hok inducer). To avoid this, the authors could delete the araBAD operon in STm, rendering it unable to catabolize arabinose, and repeat the experiments in that strain. Alternatively, they could use E. coli BW25113, which already lacks the araBAD operon, as the expression host (it is not clear to me why the different Hok homologues would not be toxic in E. coli, given how diverse in sequence they are, as the authors found here).
      b) Even with the araBAD operon deleted, arabinose induction would be bimodal (on or off) across the population, owing to the bimodal expression of the arabinose transporter (AraE; see Khlebnikov et al., 2002). This again would not allow titratable expression across arabinose concentrations. The solution would be to introduce a separate plasmid expressing araE, which makes all cells equally permeable to arabinose and the system therefore titratable (as explained in Khlebnikov et al., 2002). Thus, if the authors wish to follow this route, they would first have to delete araBAD from STm, then transform STm with an araE plasmid, and redo the experiments. In addition, I would suggest that the authors use the drop plate method (agar plate-based), which is more sensitive than the liquid assays employed here.

      Having said all that, I understand that all this experimental work would be strenuous and time-consuming, and although I would like to see it happen, this is not my paper. I would be content therefore if the authors toned down the claim that plasmidic vs chromosomal Hok homologues have the same toxicity, and discuss that chromosomal levels of toxicity are an important caveat that has not been explored here.

      Authors' answer to R1_C2: We thank the reviewer for the detailed suggestion on how to better assess toxicity differences by using an araBAD deletion mutant overexpressing araE. We repeated the arabinose induction assays as drop assays, using strain BW25223 carrying plasmid pJAT13araE and our pAZ3-based plasmids carrying the Hok CDS homologs. However, we obtained similar results: even at different arabinose concentrations, we could not distinguish between the toxicity of chromosomal and plasmidic CDSs. This is probably because low concentrations of Hok protein are sufficient for activity, and because we bypass all post-transcriptional silencing of the native hok mRNAs by expressing the protein directly from a multicopy plasmid. We have now included drop assays with 0.01% arabinose induction in the manuscript, as the data obtained with other arabinose concentrations did not provide new information. In any case, these assays still do not reflect native expression levels, for two reasons: 1/ toxicity at chromosomal expression levels was not explored here, and 2/ only the coding sequence, not the full mRNA, was tested. Indeed, the exact sequences of the hok homolog mRNAs are unknown, and determining them is beyond the scope of this study. These remarks have been added to the discussion.

      We agree that the sentence "Hok peptides originating from chromosomes are as toxic as those from plasmids" was too strong, and we have added the caveats of our experimental design to the discussion. While we indeed did not compare the toxicity of the peptides, we still showed that chromosomal Hok can be toxic upon overexpression, which would not be the case if the sequences had degenerated.

      The reviewer also suggests using the FASTBAC-Seq method, which we previously used to study Hok from the R1 plasmid and which allows type I toxins to be studied at native expression levels. While FASTBAC-Seq identifies loss-of-function mutants of a system, it does not by itself reveal differences in toxicity between systems. In addition, FASTBAC-Seq has always been performed with the full mRNA, not only the coding sequence, and these sequences are currently unknown for most homologs.

      Other comments:

      R1_C3: a) There is barely any discussion of the Sok component (RNA antitoxin) of the homologues; why is that? Could you please discuss Sok differences across the homologues, or at least explain why this is not discussed at all in the paper (e.g., in the discussion)?

      Authors' answer to R1_C3: Identifying the Sok RNA sequence is not trivial, which is why it was not done in this study. A paragraph explaining this has been added to the discussion.

      R1_C4: b) In the results section, the Hok clusters are referred to as 62 in number ("Because Hok sequences were too short and variable to construct a meaningful phylogenetic tree, we clustered the Hok sequences with a 60% identity threshold and obtained 62 clusters"), but then in the discussion section, the cluster number becomes 74 ("We highlighted the high sequence variability within Hok peptides by obtaining a total of 74 clusters with 60% identity (Fig. S7)."). Which one is the right number, and why is there a discrepancy?

      Authors' answer to R1_C4: We apologize for the discrepancy between the numbers. The first number corresponded to the Hok hits from RefSeq; we then added the Hok hits from the plasmid and virus databases (searched later in the manuscript). We have clarified this in both the Results and Discussion text (61 clusters from RefSeq and 79 in total; 74 was a typo).

      R1 Significance: The most well-clarified aspect of the paper presented here is the distribution of Hok homologues, with the novel aspect of the location in which the hok/Sok T1TAs reside (i.e., chromosome, plasmid, or phage). There is room for the molecular genetics part to be developed further, as I discussed earlier; however, this study is the most up-to-date characterization of the diversity of Hok homologues, and will be of interest to the T1TA and the general toxin-antitoxin field.

      Reviewer #2 (R2)

      R2 General statement: The authors examined how the Hok toxins are spread across bacterial genomes. The manuscript, including its figures, is hard to read and understand. I commented on Figure 1 in detail, but similar comments apply to the other figures. Overall, the data lack clarity and precision. Finding information about sequences and clusters in the supplementary materials was not easy. The manuscript should be thoroughly revised. In addition, I believe that other aspects should be developed to expand the interest of the study, such as the co-occurrence of multiple systems on chromosomes and plasmids, and whether they are able to crosstalk. This might provide some evolutionary insights into the biology of these toxins.

      Authors' answer to R2 General statement: We designed all figures according to established standards for scientific data visualization, although we recognize that different presentations may work better for different audiences. In our detailed response on Figure 1A, we explain how UpSet plots are constructed and interpreted, which we hope clarifies the visualization approach for the full dataset. We are open to discussing specific improvements if the reviewer has suggestions for enhanced clarity. To address concerns about accessibility, we clarify that all sequences are compiled in Table S1 with their clus100 identifiers, making them easy to locate. We are open to reorganizing the supplementary materials if a different structure would be more user-friendly. Finally, we agree that an extensive analysis of co-occurrence and crosstalk would be valuable. However, predicting crosstalk bioinformatically across all genomes is challenging, as it would require predicting RNA:RNA interactions between hok mRNAs and Sok sequences, which are currently unknown. Given these limitations, this analysis was beyond the scope of the current study.

      R2_C1: The introduction lacks information regarding the Hok protein (size, structure prediction, localization) as well as some explanation of the reason for looking at these toxins. The description of the potential roles should be expanded a bit.

      Authors' answer to R2_C1: Following the comment from the reviewer, we have provided additional information about Hok in the introduction.

      R2_C2: When the authors talk about 'loci', they mean genes encoding Hok homologs, if I understand correctly. They did not look for the Sok sequences (hok-sok loci).

      Authors' answer to R2_C2: Indeed, we did not look for the Sok sequences; we describe only Hok homolog loci, which may or may not encode a Sok homolog.

      R2_C3: It is not clear what the authors did with the sequences for which they could not detect a start codon and an SD (although it is unusual to refer to an SD in the context of a protein sequence).

      Authors' answer to R2_C3: The peptides were annotated by extending the initial hit until the first start codon. Therefore, all annotated peptides have a start codon. Shine-Dalgarno sequences were annotated when confidently predicted, to provide additional information. Sequences were not excluded based on the presence or absence of the SD.

      R2_C4: Figure 1A is not clear. The total of the bars equals 32,532, which is the number of 'loci' detected by the combination of the different methods. However, it is not clear to me how many are redundant. For instance, I suppose that all the 8483 sequences that were retrieved using blastn and Infernal were retrieved using MMseqs2, blastn and Infernal. So, what is the actual number of sequences that were found? When the authors talk about 1264 distinct peptides, what do they mean? What are the numbers on the X axis (18209, 2260, 27728)?

      Authors' answer to R2_C4: Figure 1A is a standard "UpSet" plot, as indicated in the legend (A. Lex, N. Gehlenborg, H. Strobelt, R. Vuillemot and H. Pfister, "UpSet: Visualization of Intersecting Sets," in IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 1983-1992, 31 Dec. 2014, doi: 10.1109/TVCG.2014.2346248). These plots are a data visualization method for data with more than two intersecting sets. The Hok sequence hits were obtained by the 3 different methods listed in the rows (MMseqs2, blastn and Infernal); therefore, 18209 is the number of hits by MMseqs2, 22680 the number of hits by blastn and 27728 the number of hits by Infernal. The columns show the intersections between these three sets, and each hit is counted in exactly one column. For example, the 8483 sequences mentioned (second column) were found only by blastn and Infernal, not by MMseqs2. The total number of sequences found is indeed 32,532. The 1264 distinct peptides are peptides with different sequences: after removing false positives, degenerate sequences and small peptides, we obtained 1264 unique Hok sequences, which are found in the 32,532 bacterial loci.
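      The bookkeeping behind such a plot can be sketched in a few lines. This is a minimal illustration, assuming each method's hits are available as a set of locus identifiers; the identifiers below are hypothetical toy values, not data from the study. Every locus is assigned to exactly one column (the exact combination of methods that found it), so the column bars sum to the number of distinct loci, whereas the row totals overlap and sum to more.

```python
def exclusive_intersections(hits_by_method: dict[str, set[str]]) -> dict[tuple[str, ...], int]:
    """Count loci per UpSet column, i.e. per exact combination of methods.

    Each locus is counted once, under the tuple of methods that detected it
    (and no others), so the counts sum to the size of the union.
    """
    counts: dict[tuple[str, ...], int] = {}
    all_loci = set().union(*hits_by_method.values())
    for locus in all_loci:
        # Methods are listed in the insertion order of the input dict.
        combo = tuple(m for m, hits in hits_by_method.items() if locus in hits)
        counts[combo] = counts.get(combo, 0) + 1
    return counts


# Hypothetical toy hits (locus IDs), not the study's data.
hits = {
    "MMseqs2": {"a", "b", "c", "d"},
    "blastn": {"b", "c", "e", "f"},
    "Infernal": {"c", "d", "e", "f", "g"},
}
columns = exclusive_intersections(hits)
print(sum(len(h) for h in hits.values()))  # row totals: 13
print(sum(columns.values()))               # distinct loci: 7
print(columns[("blastn", "Infernal")])     # found only by blastn and Infernal: 2
```

      In the same way, the row totals in Figure 1A (18209 + 22680 + 27728) exceed the 32,532 distinct loci, because a locus found by several methods is counted once in each of the corresponding rows.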

      R2_C5: About Infernal: first, the authors state that only 8% of the sequences are lost when not considering the mRNA structure, which they seem to consider negligible. Then, in the next section, they state that Infernal is the best tool for identifying clusters that are not detected otherwise. This seems a bit contradictory.

      Authors' answer to R2_C5: We appreciate the reviewer pointing out this apparent contradiction, and we have clarified this part in the revised manuscript. Infernal uses both sequence and structure information simultaneously for homology detection. While only 8% of Infernal's hits were detected uniquely when structural information was considered, these sequences account for 9 additional clusters with notably high sequence diversity, which would otherwise have gone undetected. Therefore, we believe that Infernal is the best tool to capture novel cluster diversity.

      R2_C6: Cluster determination. The threshold was set at 60% identity. What is the rationale for 60% identity? Given that Hok sequences (like toxins and antitoxins from TA systems in general) are highly variable, this leads to a high number of clusters. I'm not sure of the relevance of these clusters. Are there any other criteria to define clusters?

      Authors' answer to R2_C6: We selected 60% identity as a balance between capturing sequence diversity and generating interpretable results. We also tested 70, 80 and 90% identity and obtained 128, 221 and 377 clusters, respectively, which would be too many for meaningful visualization and interpretation. Ideally, sequences would be grouped using a phylogenetic tree. However, as explained in the discussion, the high sequence diversity prevented the construction of a reliable phylogenetic tree, so clustering was used as an alternative strategy to identify and interpret patterns of sequence variability.
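      To illustrate why raising the identity threshold increases the number of clusters, here is a minimal sketch of greedy incremental clustering (a CD-HIT-style approach, in which each sequence joins the first representative it matches at or above the threshold, otherwise it founds a new cluster). This is not necessarily the algorithm or tool used in the study; for simplicity it assumes the peptides are already aligned to a common length with '-' gaps, and the toy peptides are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned sequences.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.6) -> dict[str, list[str]]:
    """Greedy incremental clustering; longer sequences become representatives first."""
    # sorted() is stable even with reverse=True, so ties keep input order.
    order = sorted(seqs, key=lambda name: len(seqs[name].replace("-", "")), reverse=True)
    clusters: dict[str, list[str]] = {}
    for name in order:
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical pre-aligned toy peptides (not real Hok sequences).
peptides = {
    "s1": "MKLPRSVLLA",
    "s2": "MKLPRSVLIA",  # 90% identical to s1
    "s3": "MQWTESGRCA",  # 30% identical to s1
    "s4": "MQWTESGRC-",  # 90% identical to s3
}
print(greedy_cluster(peptides, 0.6))        # {'s1': ['s1', 's2'], 's3': ['s3', 's4']}
print(len(greedy_cluster(peptides, 0.95)))  # 4
```

      With these toy sequences, a 60% threshold yields 2 clusters while a 95% threshold splits every peptide into its own cluster, mirroring the trend reported above for 70, 80 and 90% identity.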

      __R2_C7: __The authors claim that most of the Hok diversity is found on chromosomes. However, the number of chromosomal Hok is higher than that located on plasmids, which might be related to the different sizes of the different replicons ie, chromosomes being larger than plasmids. Is there a way to normalize by determining the density per size?

      Authors' answer to R2_C7: We do not claim that chromosomes contain most of the Hok diversity, as this would indeed be influenced by database biases. We simply report that most of the diversity we found was on chromosomes; we cannot conclude whether this reflects the true frequencies in nature.

      R2_C8: '46 of the 62 clusters contained 10 or less distinct sequences and might be in the process of degenerating'. The authors also linked this with SD detection. Please explain. From what was indicated earlier, I understand that sequences with premature stop codons or short sequences (

      Authors' answer to R2_C8: We did not remove sequences for which we could not predict the SD. Indeed, the absence of an SD could indicate that the hok mRNA cannot fulfil its biological role and that the sequence has degenerated. To evaluate this hypothesis, we experimentally tested 5 sequences without a predicted SD, and two of those were not toxic (see Table S2). To assess whether the low-abundance clusters contained degenerate sequences, we experimentally tested representatives of some of the clusters containing only one Hok CDS and found most of them to be toxic.

      R2_C9: 'Only 7.3% of the unique sequences were found on both plasmids and chromosomes'. From this observation, the authors conclude that 'there is little stable transfer from chromosomes to plasmids or vice-versa'. I don't understand what this means. Do they mean identical sequences? The fact that sequences differ from chromosomes to plasmids does not rule out 'stable transfer'. What do they actually mean by stable transfer? Once the gene is horizontally transferred, it is fixed and vertically transmitted? Same comments apply to the inter-genera horizontal transfer by plasmids.

      Authors' answer to R2_C9: Because a reliable phylogenetic tree could not be constructed, we used sequence identity across localizations or genera as our marker of recent, stable transfer events. We define stable transfer as the persistence of a sequence in unchanged form after horizontal transfer, long enough to be detected in current databases. Our approach likely underestimates the total number of transfer events, as sequences accumulating mutations after transfer would not be captured. If frequent exchange were occurring, we would expect to observe numerous identical sequences on both plasmids and chromosomes, unless rapid mutation after transfer prevented their detection as identical sequences. We have added a sentence to clarify this in the manuscript and removed the term "stable transfer".
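      The identity-based criterion described above can be sketched as follows. This is a minimal illustration, assuming each locus is reduced to a (peptide sequence, localization) record; the records are hypothetical, and using all unique sequences as the denominator is an assumption about how such a percentage could be computed. Because only identical sequences count as shared, a single substitution (MQWTESG vs MQWTESA below) is enough for a transfer to go undetected, which is why this approach underestimates transfer events.

```python
from collections import defaultdict


def shared_fraction(records, loc_a="chromosome", loc_b="plasmid"):
    """Fraction of unique sequences observed in both loc_a and loc_b.

    Only identical sequences count as shared; any substitution makes
    two sequences distinct.
    """
    locations = defaultdict(set)  # sequence -> set of localizations
    for seq, loc in records:
        locations[seq].add(loc)
    if not locations:
        return 0.0
    shared = sum(1 for locs in locations.values() if {loc_a, loc_b} <= locs)
    return shared / len(locations)


# Hypothetical toy records, not the study's data.
records = [
    ("MKLPRSV", "chromosome"),
    ("MKLPRSV", "plasmid"),
    ("MKLPRSV", "chromosome"),
    ("MQWTESG", "chromosome"),
    ("MQWTESA", "plasmid"),  # one substitution away from MQWTESG
    ("MAVTRQL", "prophage"),
]
print(shared_fraction(records))  # 0.25: 1 of 4 unique sequences is on both
```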

      R2_C10: I don't understand the next section about 'family'. What do the authors mean by 'family'? Genera? The same applies to the next section about the Y to C recoding. Did the authors make point mutations in the conserved amino acids/codons to test whether they are important for toxicity? Some Hok variants lack some of the conserved amino acids and are toxic (under overexpression conditions in Salmonella). What about T18, C31 and E42?

      Authors' answer to R2_C10: Families (e.g., Enterobacteriaceae, Vibrionaceae) and genera (e.g., Escherichia, Salmonella) refer to taxonomic ranks. Following the reviewer's comment, we experimentally assessed the toxicity of Hok from the R1 plasmid after mutating the conserved amino acids to alanine. All the mutants were toxic under our expression conditions.

      R2_C11: The prevalence of Hok on chromosomes or plasmids might depend on various confounding parameters, such as size and the number of sequences available, among others. The authors should find methods to correct for all that.

      Authors' answer to R2_C11: Normalization would indeed be needed if we were comparing prevalence on chromosomes with prevalence on plasmids. Here, we do not claim that Hok homologs are more prevalent on plasmids or chromosomes; we only describe where we found them.

      R2_C12: Link with defense systems. The threshold was set at 20 kb. Why this threshold?

      Authors' answer to R2_C12: In a previous report, defense islands were approximately 40 kb in size (https://doi.org/10.1126/science.aar4120). By setting a 20 kb threshold, we searched for defense systems within a 40 kb region around each homolog (20 kb on either side). If a homolog were part of a defense island, we would expect it to be less than 20 kb away from at least one defense system.

      R2 Significance: The paper in its current state appears to serve the role of a data repository rather than a thorough and original analysis. It requires extensive revisions before it can be of interest to experts in the toxin-antitoxin field.

      Reviewer #3 (R3):

      R3 General statement: In the manuscript "The Hok bacterial toxin: diversity, toxicity, distribution and genomic localization," Escalera-Maurer et al. investigate the distribution of Hok type I toxin proteins across bacterial species. The Hok-Sok type I toxin-antitoxin system was first described on plasmids, where it serves to maintain the plasmid in a population of bacterial cells: translation of the hok mRNA is prevented by the small antitoxin RNA Sok. Upon plasmid loss, with no new transcription of sok, the highly stable hok mRNA is translated into a small protein, killing the plasmid-less cell. Homologues of the system were identified in the chromosome of E. coli in the 1990s, and subsequent analyses have identified identical systems in other bacterial chromosomes, though these are close relatives of E. coli. Given the increased number of sequenced bacterial genomes, the group examined how widespread Hok may be across bacteria. They used a combination of BLASTn, MMseqs2 (protein) and Infernal (RNA) to identify, as well as possible, all homologs. They then used sequence identity cut-offs to form Hok "clusters", identified key features of the clusters, and tested the toxicity of overproduction of 31 homologs in a strain of Salmonella. Overall, through a variety of bioinformatic predictions and analyses, the manuscript identifies an expanded number of Hok members not previously identified and broadens the range of species in which it is found, supports that Hok is not associated with defense systems, and provides additional support that horizontal transfer of hok genes is likely via plasmids (where hok is presumed to have originated).

      Major comments: There are some areas of the text that are a bit too definitive (these can be fixed or better explained in the text) and a few questions raised about the analyses and interpretations.

      Authors' answer to R3 Major Comment: As suggested by the reviewer, we rephrased parts of the manuscript.

      These are the specific comments:

      Introduction R3_C1: First paragraph: "Toxin production leads to the death of the cell encoding it" For many chromosomally encoded systems, toxicity has only been observed via artificial overexpression. This is an important point, as for many systems, a true biological function remains unknown. Further, add caveats regarding toxin function (for systems with validated function, they are involved in...). Again, there are still many questions for many t-at systems, in particular the Type I systems.

      Authors' answer to R3_C1: Indeed, the function of type I TA systems, in particular chromosomal ones, is still a matter of debate. While we previously showed that hok/Sok from R1 causes cell death when expressed at the chromosomal level, this has not been shown for all TA systems (Le Rhun et al., NAR, 2023). We now state that toxin production can lead to either cell death or growth arrest, and we incorporated the reviewer's suggested changes into the description of TA functions.

      R3_C2: Introduction: type I systems are narrower in distribution, but much of this is due to their size and lack of biochemical domains. Again, please clarify more here.

      Authors' answer to R3_C2: We added the reviewer's suggestion to the text.

      R3_C3: Introduction: while Hok homologs have been found on chromosomes, in E. coli strains there is clear evidence that many are inactive. This comes up in the discussion, but it is worth including briefly in the introduction.

      Authors' answer to R3_C3: We have now added to the introduction that in the K12 laboratory strain, most chromosomal hok/Sok loci were found to be inactive.

      R3_C4: For the predicted transmembrane domain: it would be worth including a box/indication as to where it lies within the peptide (with the understanding that it may not be exact). Is there more/less variation here? I'm assuming all clusters/families have a predicted TM domain?

      Authors' answer to R3_C4: Using DeepTMHMM 1.0 (https://services.healthtech.dtu.dk/services/DeepTMHMM-1.0/), 227 of the 1264 unique Hok sequences are predicted to have a transmembrane domain (TM), 7 to have both a signal peptide (SP) and a TM, and 1025 to have an SP. For the consensus sequence (most abundant amino acid at each position) shown in Fig. 1D, the region A8 to L25 is predicted to be inserted in the membrane, with the N-terminus inside and the C-terminus outside.

      R3_C5: What is the cutoff for being a Hok? Did they take the "last hit" and use it in additional searches to see if more appeared? If that was done and the search was exhaustive, this is really important to add for the reader.

      Authors' answer to R3_C5: The MMseqs2 search was performed with 5 iterations, as indicated in the Materials and Methods, meaning that the hits of each round were used to search the database again, for five rounds in a row. Importantly, increasing the number of iterations to 10 did not significantly increase the number of hits. Therefore, at least for the MMseqs2 search of the RefSeq database, we are close to being exhaustive.

      R3_C6: Figure S4: the authors state that there was no difference in the degree of toxicity between the clusters. There do appear to be some peptides tested that, at the arabinose concentration used, did not repress growth as immediately as others. If a higher arabinose concentration is used, does that eliminate these differences? Or are many of these suppressors: if diluted back again, do they grow as if they are non-toxic in arabinose?

      Authors' answer to R3_C6: As suggested by Reviewer 1 (R1_C2), we titrated arabinose in a ΔaraBAD strain overexpressing araE but did not detect differences in toxicity under our conditions; see also our answer to R1_C2.

      R3_C7: Discussion: "because non-functional homologs are expected to quickly accumulate mutations..." is a bit problematic. Hok is highly regulated, as are some of the other well-described type I toxins. In MG1655, while the coding sequence may be intact, there are other mutations and/or insertion elements that prevent expression (and, by extension, function). Given the lack of consensus data for type I systems, it is best to provide more context for this. If the authors wish to argue that they should quickly accumulate mutations, it would be good to provide additional rates/evidence (even for other loci) from the Enterobacteriaceae.

      Authors' answer to R3_C7: We agree that this statement would need further support, and we have removed this sentence to address this concern.

      Minor comments:

      R3_C8: For the sequences used in the search: please provide the sequence used in addition to the reference to the T1TAdb. Was the full-length hok mRNA, including mok, used? Please provide the nucleic acid sequence (and include a description of whether it is full-length, etc.) in Materials and Methods or in the Supplemental.

      Authors' answer to R3_C8: Sequences and code were deposited at https://gitub.u-bordeaux.fr/alerhun/Escalera-Maurer_2025. The files curated_Hok.fasta and hok.fa, corresponding to the Hok protein and mRNA sequences respectively, are available in "T1TAdb input".

      R3_C9: 60% identity was used for clustering. Did this become a problem, meaning separation of amino acids with the same properties?

      Authors' answer to R3_C9: We checked the amino acid signatures of each cluster (Fig. S2) but could not find anything relevant.

      R3_C10: Fig. S2: for the clusters shown, please add HokB, HokE, etc., to better correspond to Figure 1 in the main text.

      Authors' answer to R3_C10: The clusters were annotated according to this suggestion.

      R3_C11: Fig. S1: this figure is challenging to orient. What are the numbers (8_10_85)?

      Authors' answer to R3_C11: The figure was generated using the CLANS tool, with each unique sequence retrieved by our analysis shown as a dot. Hok homologous sequences are shown in red and cluster together; the outlier clusters are annotated with the numbers corresponding to their 60% identity clusters. We understand that separating the numbers with underscores could be confusing, so we now separate them with commas.

      R3_C12: Please make a separate table or sheet for the experimentally tested peptides. Table S1 is quite large and a separate table/sheet would make this easier to find. If possible, please give the files more descriptive names (Table S1 in the name, for example). This may be an issue with Review Commons, but the individual file names were nondescript and the descriptions on the webpage did not indicate what the files contained.

      Authors' answer to R3_C12: We named the files Table S1 and File_S1 to S7. We added a Table S2 with the experimentally tested peptides. Note that identical peptides can sometimes be found in several bacterial loci.

      R3_C13: Figure S9: the black arrow for Hok is hard to see; it appears that the long grey bar going through multiple loci is indicative of Hok. Perhaps label this differently to make it easier on the reader (the line initially seemed to be a formatting issue and not indicative of the position of Hok).

      Authors' answer to R3_C13: We have now added a new label to indicate where Hok is, and clarified it in the figure legend.

      R3_C14: While the authors focused on Hok for this approach, which is fine and appropriate, can they comment at all on whether mok is present in these new clusters/sub-families? Sok potential?

      Authors' answer to R3_C14: We added a paragraph about Mok to the discussion.

      R3 Significance: Overall the paper is a sound bioinformatic exercise and is improved by the testing of numerous "new" Hok proteins. Most of the comments can be addressed with some clarifications and perhaps some additional analyses and/or verification, which should take minimal time. The authors are over-emphatic at points, as indicated, and need to be more careful and precise with their language.

      In terms of advancement, it advances the distribution of these systems and adds to the depth of sub-classes. The audience will be more specialized to those who study these systems.

      Expertise: I have been studying type I toxin-antitoxin systems since the mid-2000s. We published a study examining (and mentioned well by this article!) the distribution in chromosomes of type I toxin-antitoxin systems, identified brand-new systems (that were chromosomally-limited at the time). My lab has continued to study regulation of type I toxins and distribution of chromosomally-only-encoded systems (so not Hok).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the Bicc1/Pkd1/Pkd2 complex retains Bicc1 in the cytoplasm and thus restricts its participation in posttranscriptional regulation (see Author response image 1). We, however, do not yet have data to support this and thus have not included this model in the manuscript. Yet, we have updated the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      We have updated the discussion to include a discussion on the potential consequences on posttranscriptional regulation by Bicc1.

      Author response image 1.

      Model of BICC1, PC1 and PC2 self-regulation. In this model, Bicc1 acts as a positive regulator of PKD gene expression. In the presence of ‘sufficient’ amounts of the PC1/PC2 complex, Bicc1 is tethered to the complex and remains biologically inactive (Fig. 1A). However, once the levels of the PC1/PC2 complex are reduced, Bicc1 is free in the cytoplasm to promote expression of the PKD proteins, thereby raising their levels (Fig. 4B), which in turn ‘shuts down’ Bicc1 activity by again tethering it to the plasma membrane.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. Vishal Patel's group reported that PKD1 is regulated through a miR-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require the mice described in the above reference, which is beyond the scope of this manuscript. We have, however, revised the discussion to elaborate on this potential mechanism.

      We have updated the discussion to include a statement on the potential direct regulation of Pkd1 mRNA by Bicc1.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using our complete Bicc1 knockout, which we previously reported (PMID 20215348), focusing on compound heterozygotes. Yet, similar to the Pkd1/Pkd2 compound heterozygotes (PMID 12140187), no cyst development was observed when we sacrificed the mice as late as P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice exhibit glomerular cysts as adults (PMID: 7723240); we will investigate this interesting suggestion. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney, e.g., when crossed to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are beyond the timeframe of this revision.

      No changes were made in the revised manuscript. 

      Reviewer #2 (Public review):

      (1) These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed. 

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and PKD1/PKD2 and whether this interaction is functionally important. How this translates into clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      (2) The manuscript contains factual errors, imprecisions, and language ambiguities. This makes this reviewer wonder how thorough the reported research and analyses have been.

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. As presented below, most of the criticisms raised by the reviewer have been easily addressed in the revised version of the manuscript. Yet, none of the critiques seems to directly impact the overall interpretation of the data. 

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript requires further editing. For example, figure panels and legends are mismatched in Figure 1.

      We have corrected the labeling of Figure 1. 

      (2) Y-axis units and values are inconsistent in Figures 4b-4g, Supplementary Figures S2e and S2f are not referenced in the text, genotypes are missing in Supplementary Figure S3f, and numerous typographical errors are present.

      With respect to the y-axis in Figure 4b-g, the scale differs between panels, but that is intentional, as the differences would be lost if all panels were scaled identically. We have now mentioned this in the figure legend to make the reader aware of it. With respect to Supplemental Figure S2e,f, we included the panels in the description of the mutant BICC1 lines but unfortunately forgot to reference them. This has now been done.

      We have updated the labeling of the y-axis for the cystic indices, adding “[%]” as the unit, and updated the figure legend of Figure 4. We have included the genotypes in Supplementary Figure S3f. Supplementary Figure S2e,f is now referenced in the supplemental material (page 9, 2<sup>nd</sup> paragraph).

      Reviewer #2 (Recommendations for the authors):

      (1) "Previous data from mouse, Xenopus, and zebrafish suggest a crucial role for the RNA-binding protein Bicc1 in the pathogenesis of PKD, although BICC1 mutations in human PKD have not been previously reported." The cited sources (and others that were not cited) link Bicc1 mutations to renal cysts, similar to a report by Kraus (PMID: 21922595) that the authors cite later. However, a more direct link to PKD was reported by Lian and colleagues using whole Pkd1 mice (PMID: 20219263) and by Gamberi and colleagues using Pkd1 kidneys and human microarrays (PMID: 28406902). Although relevant, neither is cited here, and only the former is cited later in the manuscript.

      Thanks for pointing this out. We have added these three citations.

      We have added these three citations (PMID: 21922595, PMID: 20219263 and PMID: 28406902) in the indicated sentence.

      (2) In Figure 1B, the lanes do not seem to correspond among panels, particularly evident in the panel with myc-mBicc1. Hence, it is difficult to agree with the presented conclusions.

      We have corrected the labeling of the lanes in Figure 1b.

      (3) In the Figure 1 legend: "(g) Western blot analysis following co-IP experiments, using an anti-mouse Bicc1 or anti-goat PC2 antibody as bait, identified protein interactions between endogenous PC2 and BICC1 in UCL93 cells. Non-immune goat and mouse IgG were included as a negative control." There is no mention of panel H, although this reviewer can imagine what the authors meant. The capitalization differs in the figure and legend. More troublingly, in panel G, a non-defined star indicates a strong band present in both immune and non-immune control.

      We have corrected the figure legend of Figure 1 and clarified the non-specific band in the figure legend.

      (4) In Figure 4, the authors do not show the matched control for the Bicc1 Pkd1 interaction in panel d, nor do they show a scale bar in either a) or d). Thus, the phenotypic severity cannot be properly assessed.

      Thanks for pointing out the missing scale bars, which have now been added. The two kidneys shown in Figure 4d are from littermates and illustrate kidney size, in agreement with the cumulative data shown in Figure 4e. Unfortunately, this litter did not include a wildtype control. As the data analysis in Figure 4e is based on littermates, mixing and matching kidneys from different litters does not seem appropriate. Thus, we have omitted a wildtype control in this panel. However, the size of a wildtype kidney can be seen in Figure 4a.

      We have added the scale bar to both panels and have updated the figure legend to emphasize that the kidneys shown are from littermates and that no wildtype littermate was present in this litter.

      (5) "Surprisingly, an 8-fold stronger interaction was observed between full-length PC1 and myc-mBicc1-ΔKH compared to myc-mBicc1 or myc-mBicc1-ΔSAM." Assuming all the controls for protein folding and expression levels have been carried out and not shown/mentioned, this sentence seems to contradict the previous statement that Bicc1deltaSAM reduced the interaction with PC1 by 55%. Because the full length and SAM deletion have different interaction strengths, the latter sentence makes no sense.

      The reduction in PC1 binding of myc-mBicc1-ΔSAM compared with wildtype myc-mBicc1 was not significant. We have clarified this in the text.

      We have corrected the sentence and modified the Figure accordingly. 

      (6) Imprecise statements make a reader wonder how to interpret the data: "More than three independent experiments were analyzed." Stating the sample size or including it in the figure would save space and improve confidence in the data presented.

      We have stated the exact number of animals per condition above each bar.

      (7) "Next, we performed a similar mouse study for Pkd1 by reducing the gene dose of Pkd1 postnatally in the collecting ducts using a Pkhd1-Cre as previously described40" What did the authors mean?

      We included the reference to cite the mouse strain but realized that it could be misinterpreted as indicating that the exact experiment had been performed previously. We have clarified this in the text.

      We have reworded the sentence to avoid misinterpretation. 

      (8) The authors examined the additive effects of knocking down Bicc1, Pkd1, and Pkd2 with morpholinos in Xenopus and, genetically, in mice. While the Bicc1[+/-] Pkd1 or 2[+/-] double heterozygote mice did not show phenotypes, the authors report that the Bicc1[-/-] Pkd1 or 2 [+/-] did instead show enlarged kidneys. What is the phenotype of a Bicc1[+/-] Pkd1 or 2 [-/-]? What we learn from the authors' findings among the PKD population suggests that the latter situation would be potentially translationally relevant.

      The mouse experiments were designed to address cooperativity between Bicc1 and either Pkd1 or Pkd2, and whether removal of one copy of Pkd1 or Pkd2 would further worsen the Bicc1 cystic kidney phenotype. Thus, the parental crosses were chosen to maximize the number of animals obtained for these genotypes. Unfortunately, these crosses did not yield the genotypes requested by the reviewer. To address the contribution of Bicc1 in the PKD population, we will need to perform a different cross, in which we eliminate Pkd1 or Pkd2 postnatally in adult mice on a floxed Bicc1 background. While we are gearing up to perform such an experiment, it is beyond the time frame of this manuscript. In addition, please note that we already addressed the question of translation to the PKD population in the discussion of the original submission (page 13/14, last/first paragraph).

      No changes have been made to the revised version of the manuscript.

      (9) How do the authors interpret the milder effects of the Bicc1[-/-] Pkd1[+/-] compared to Bicc1[-/-] Pkd2[+/-] relative to the respective protein-protein interactions?

      The milder effects are due to the nature of the crosses. While the Pkd2 mutant is a germline mutation, the Pkd1 mutant is a conditional allele that eliminates Pkd1 only in the collecting ducts of the kidney. We thus spare other nephron segments, such as the proximal tubules, which also contribute significantly to the cyst load. Consequently, these mouse data support the interaction of both Pkd1 and Pkd2 with Bicc1 but do not allow us to compare the outcomes directly. While this was mentioned in the previous version of the manuscript, we have expanded on it in the revised version.

      We have expanded the results section in the revised version of the manuscript highlighting that the two different approaches cannot be directly compared.

      (10) How do the authors interpret that the strong Bicc1[Bpk] Pkd1 or Pkd2 double heterozygote mice did not have defects and "kidneys from Bicc1+/-:Pkd2+/- did not exhibit cysts (data not shown)", when the VEO PKD patients and - although not a genetic reduction - also the morpholino-treated Xenopus did?

      VEO PKD patients are characterized by a loss of function of PKD1 or PKD2, and, as we propose in this manuscript, BICC1 further aggravates the phenotype. Yet, in neither the mouse nor the Xenopus experiments do we address whether BICC1 is a genetic modifier; we are simply addressing whether the two genes show a genetic interaction. In the mouse studies, we eliminate one copy of Pkd1 or Pkd2 in the background of a hypomorphic allele of Bicc1. Similarly, in the Xenopus experiments, we employ suboptimal doses of the morpholino oligomers, i.e., concentrations that did not yield a phenotypic change on their own, and then ask whether knocking down both genes together reveals cooperativity. Importantly, "suboptimal" is defined by a biological readout, not by the amount of protein. While we described this in the original manuscript (page 7, first paragraph), we have amended our description of the Xenopus experiment to make this even clearer.

      Finally, we agree with the reviewer that to address whether Bicc1 is a modifier of the PKD phenotype in mice, we would need to reduce Bicc1 function in Pkd1 or Pkd2 mutants. We already acknowledged this in the discussion of the initial version of the manuscript (page 14, first paragraph).

      We have expanded the results section when discussing the suboptimal amounts of the morpholino oligos (Page 6, 1<sup>st</sup> paragraph).

      (11) Unclear: "While variants in BICC1 are very rare, we could identify two patients with BICC1 variants harboring an additional PKD2 or PKD1 variant in trans, respectively." Shortly after, the authors state in apparent contradiction that "the patients had no other variants in any of other PKD genes or genes which phenocopy PKD including PKD1, PKD2, PKHD1, HNF1B, GANAB, IFT140, DZIP1L, CYS1, DNAJB11, ALG5, ALG8, ALG9, LRP5, NEK8, OFD1, or PMM2."

      The reviewer is correct. This should have been phrased differently. We have now added “Besides the variants reported below” to clarify this more adequately.

      The sentence was changed to start with “Besides the variants reported below, […].”

      (12) "The demonstrated interaction of BICC1, PC1, and PC2 now provides a molecular mechanism that can explain some of the phenotypic variability in these families." How do the authors reconcile this statement with their reported ultra-rare occurrence of the BICC1 mutations?

      As mentioned in the manuscript and also in response to the other two reviewers, Bicc1 has been shown to regulate Pkd2 gene expression in mice and frogs via an interaction with the miR-17 family of microRNAs. Moreover, the miR-17 family has been demonstrated to be critical in PKD (PMID: 30760828, PMID: 35965273, PMID: 31515477). In fact, both other reviewers have pointed out that we should stress this more, since Bicc1 is part of this regulatory pathway. Future experiments are needed to address whether Bicc1 contributes to the variability in ADPKD onset/severity. Yet, this is beyond the scope of this study.

      Based on the comments of the two other reviewers we have further addressed the Bicc1/miR-17 interaction.

      (13) The manuscript should use correct genetic conventions of italicization and capitalization. This is an issue affecting the entire manuscript. Some exemplary instances are listed below.

      (a) "We also demonstrate that Pkd1 and Pkd2 modifies the cystic phenotype in Bicc1 mice in a dose-dependent manner and that Bicc1 functionally interacts with Pkd1, Pkd2 and Pkhd1 in the pronephros of Xenopus embryos." Genes? Proteins?

      The data presented in this section show that a hypomorphic allele of Bicc1 in mouse and a knockdown in Xenopus yields this. As both affect the proteins, the spelling should reflect the proteins.

      No changes have been made in the revised manuscript.

      (b) The sentence seems to use both the human and mouse genetic capitalization, although it refers to experiments in the mouse system “to define the Bicc1 interacting domains for PC2 (Fig. 2d,e). Full-length PC2 (PC2-HA) interacted with full-length myc-mBICC1.”

      We agree with the reviewer that stating the species of the molecules used is critical; we have adopted a nomenclature in which BICC1 denotes the human homologue, mBicc1 the mouse homologue and xBicc1 the Xenopus one.

      We have highlighted the species spelling in the methods section and labeled the species accordingly throughout the manuscript and figures. 

      (14) “Together these data supported our biochemical interaction data and demonstrated that BICC1 cooperated with PKD1 and PKD2.” Are the authors implying that these results in mice will translate to the human protein?

      We agree that we have not formally shown that the same applies to the human proteins. Thus, we have changed the spelling accordingly.

      We have revised the capitalization of the proteins. 

      (15) The text is often unclear, terse, or inconsistent.

      (a) “These results suggested that the interaction between PC1 and Bicc1 involves the SAM but not the KH/KHL domains (or the first 132 amino acids of Bicc1). It also suggests that the N-terminus could have an inhibitory effect on PC1-BICC1 association.” How do the authors define the N-terminus? The first 132 aa? KH/KHL domains?

      This was illustrated in the original Figure 2A. The ΔKH constructs lack the first 351 amino acids.

      To make this more evident, we have specified this in the text as well.

      (b) Similarly, the authors state below, "Unlike PC1, PC2 interacted with myc-mBICC1-ΔSAM, but not myc-mBICC1-ΔKH suggesting that PC2 binding is dependent on the N-terminal domains but not the SAM domain." It is unclear if the authors refer to the KH/KHL domains or others. Whatever the reference to the N-terminal region, it should also be consistent with the section above.

      This is now specified in the text.

      (c) Unclear: "We have previously demonstrated that Pkd2 levels are reduced in a complete Bicc1 null mice,22 performing qRT-PCR of P4 kidneys (i.e. before the onset of a strong cystic phenotype), revealed that Bicc1, Pkd1 and Pkd2 were statistically significantly downregulated (Fig. 4h-j)".

      We have changed the text to clarify this. 

      (d) “Utilizing recombinant GST domains of PC1 and PC2, we demonstrated that BICC1 binds to both proteins in GST-pulldown assays (Fig. 1a, b)." GST-tagged domains? Fusions?

      We have changed the text to clarify this. 

      (e) "To study the interaction between BICC1, PKD1 and PKD2 we combined biochemical approaches, knockout studies in mice and Xenopus, genetic engineered human kidney cells" > genetically engineered.

      We have changed the text to clarify this.

      (f) Capitalization (e.g., see Figure S3, ref. the Bpk allele) and annotation (e.g., Gly821Glu and G821E) are inconsistent.

      We have homogenized the labeling of the capitalization and annotations throughout the manuscript. 

      (g) What do the authors mean by "homozygous evolutionarily well-conserved missense variant"?

      We have changed this in the revised version of the manuscript.

      Reviewer #3 (Public review/Recommendations to the authors):

      (1) A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      (2) This study should ideally include experiments in HUREC material obtained from patients/families with BICC1 mutations and studying its effects on the PKD1/2 complex in primary cell lines.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, apart from DNA and the phenotypic data described in the manuscript, no human tissue or primary patient-derived cells were collected before the two patients with the BICC1 p.Ser240Pro variant passed away.

      No changes to the revised manuscript have been made to address this point.

      (3) Please remove repeated words in the following sentence in paragraph 2 of the introduction: "BICC1 encodes an evolutionarily conserved protein that is characterized by 3 K-homology (KH) and 2 KH-like (KHL) RNA-binding domains at the N-terminus and a SAM domain at the C-terminus, which are separated by a by a disordered intervening sequence (IVS).23-28".

      This has been changed.

    1. Author response:

      Reviewer #1 (Public review):

      The authors analysed large-scale brain-state dynamics while humans watched a short video. They sought to identify the role of thalamocortical interactions.

      Major concerns

      (1) Rationale for using the naturalistic stimulus

      In terms of brain state dynamics, previous studies have already reported large-scale neural dynamics by applying some data-driven analyses, like energy landscape analysis and Hidden Markov Model, to human fMRI/EEG data recorded during resting/task states. Considering such prior work, it'd be critical to provide sufficient biological rationales to perform a conceptually similar study in a naturalistic condition, i.e., not just "because no previous work has been done". The authors would have to clarify what type of neural mechanisms could be missed in conventional resting-state studies using, say, energy landscape analysis, but could be revealed in the naturalistic condition.

      We appreciate your insightful comments regarding the need for a biological rationale in our study. As you mentioned, there are similar studies: Meer et al. used Hidden Markov Models to identify various activation modes of brain networks that included subcortical regions[1], and Song et al. linked brain states to narrative understanding and attentional dynamics[2, 3]. These studies help motivate our use of naturalistic stimulus datasets. Moreover, there is evidence that the thalamus plays a crucial role in processing information in more naturalistic contexts, highlighting the vital role of thalamocortical communication[4, 5]. We therefore aimed to bridge thalamic activity and cortical state transitions using the energy landscape description.

      To address these gaps in conventional resting-state studies, we explored an alternative method, maximum entropy modeling based on the energy landscape, which allowed us to assess how the thalamus responds to cortical state transitions. To enhance clarity, we will update our introduction to emphasize the motivations behind our research and the significance of examining these neural mechanisms in a naturalistic setting.

      (2) Effects of the uniqueness of the visual stimulus and reproducibility

      One of the main drawbacks of the naturalistic condition is the unexpected effects of the stimuli. That is, this study looked into the data recorded from participants who were watching Sherlock, but what would happen to the results if we analyzed the brain activity data obtained from individuals who were watching different movies? To ensure the generalizability of the current findings, it would be necessary to demonstrate qualitative reproducibility of the current observations by analysing different datasets that employed different movie stimuli. In fact, it'd be possible to find such open datasets, like www.nature.com/articles/s41597-023-02458-8.

      We appreciate your concern regarding the reproducibility of our findings. The dataset from the "Sherlock" study is of high quality and has shown good generalizability in various research contexts. We acknowledge the importance of validating our results with different datasets to enhance the robustness of our conclusions. While we are open to exploring additional datasets, we intend to pursue this validation once we identify a suitable alternative. Currently, we are considering a comparison with the dataset from "Forrest Gump" as part of our initial plan.

      (3) Spatial accuracy of the "Thalamic circuit" definition

      One of the main claims of this study heavily relies on the accuracy of the localization of two different thalamic architectures: matrix and core. Given the conventional or relatively low spatial resolution of the fMRI data acquisition (3x3x3 mm^3), it appears to be critically essential to demonstrate that the current analysis accurately distinguished fMRI signals between the matrix and core parts of the thalamus for each individual.

      We acknowledge the importance of accurately localizing the different thalamic architectures, specifically the matrix and core regions. To address this, we downsampled the atlas of matrix and core cell populations from the previous study from a resolution of 2x2x2 mm<sup>3</sup> to 3x3x3 mm<sup>3</sup>, which matches our fMRI data acquisition. We will report the downsampled atlas as a Supplementary Figure in our revision.

      (4) More detailed analysis of the thalamic circuits

      In addition, if such thalamic localisation is accurate enough, it would be greatly appreciated if the authors perform similar comparisons not only between the matrix and core architectures but also between different nuclei. For example, anterior, medial, and lateral groups (e.g., pulvinar group). Such an investigation would meet the expectations of readers who presume some microscopic circuit-level findings.

      We appreciate your suggestion regarding a more detailed analysis of thalamic circuits. We have touched upon this in the discussion section as a forward-looking consideration. However, we believe that nucleus-level segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. That said, we are interested in exploring these nucleus-specific pathways to cortical areas in future studies with a suitable naturalistic 7T fMRI dataset.

      (5) Rationale for different time window lengths

      The authors adopted two different time window lengths to examine the neural dynamics. First, they used a 21-TR window for signal normalisation. Then, they narrowed down the window length to 13-TR periods for the following statistical evaluation. Such a seemingly arbitrary choice of the shorter time window might be misunderstood as a measure to relax the threshold for the correction of multiple comparisons. Therefore, it'd be appreciated if the authors stuck to the original 21-TR time window and performed statistical evaluations based on the setting.

      Thank you for your valuable feedback regarding the choice of time window lengths. We aimed to maintain consistency in window lengths across our analyses. In light of your comments and suggestions from other reviewers, we plan to test our results using different time window lengths and report findings that generalize across these variations. Should the results differ significantly, we will discuss the implications of this variability in our revised manuscript.

      (6) Temporal resolution

      After identifying brain states with energy landscape analysis, this study investigated the brain state transitions by directly looking into the fMRI signal changes. This manner seems to implicitly assume that no significant state changes happen in one TR (=1.5sec), which needs sufficient validation. Otherwise, like previous studies, it'd be highly recommended to conduct different analyses (e.g., random-walk simulation) to address and circumvent this problem.

      Thank you for raising this important point regarding temporal resolution. Many fMRI studies, such as those examining event boundaries during movie watching, operate under similar assumptions about state changes within one TR. For example, Barnett et al. computed dynamic functional connectivity (dFC) with a window of 20 TRs (24.4 s). We therefore regard this less as a limitation specific to our study than as a general consequence of fMRI acquisition parameters. To strengthen our analysis of state transitions and ensure they are not merely coincidental, we plan to conduct random-walk simulations, as suggested, in accordance with methodologies used in previous research.
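      The random-walk check discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes the standard pairwise maximum entropy energy, E(s) = -Σ h_i s_i - ½ Σ J_ij s_i s_j, with states coded as +1/-1. It also assumes a single-node Metropolis flip as the walk rule, which is one common choice in energy landscape studies. The parameters h and J are hypothetical placeholders for fitted values. Because this walk satisfies detailed balance, its long-run visiting frequencies should converge to the model probabilities exp(-E)/Z. Transitions can therefore be counted on the fitted landscape itself, independent of the 1.5 s TR.

```python
import itertools

import numpy as np


def energy(state, h, J):
    """Pairwise maximum entropy energy E = -h.s - 0.5 * s.J.s (J symmetric, zero diagonal)."""
    return -h @ state - 0.5 * state @ J @ state


def boltzmann(h, J):
    """Enumerate all +1/-1 states and return them with model probabilities exp(-E)/Z."""
    states = np.array(list(itertools.product([-1, 1], repeat=len(h))))
    energies = np.array([energy(s, h, J) for s in states])
    weights = np.exp(-energies)
    return states, weights / weights.sum()


def state_index(states):
    """Map +1/-1 state vectors to integers, in the same order as boltzmann()."""
    bits = (np.atleast_2d(states) + 1) // 2          # -1 -> 0, +1 -> 1
    powers = 2 ** np.arange(bits.shape[1] - 1, -1, -1)
    return bits @ powers


def metropolis_walk(h, J, n_steps, rng):
    """Propose flipping one random node per step; accept with probability min(1, exp(-dE))."""
    n = len(h)
    state = rng.choice([-1, 1], size=n)
    e = energy(state, h, J)
    trajectory = np.empty((n_steps, n), dtype=int)
    for t in range(n_steps):
        proposal = state.copy()
        proposal[rng.integers(n)] *= -1
        e_new = energy(proposal, h, J)
        if e_new <= e or rng.random() < np.exp(e - e_new):
            state, e = proposal, e_new
        trajectory[t] = state
    return trajectory


# Hypothetical parameters for six networks (placeholders for fitted h and J).
rng = np.random.default_rng(0)
n_net = 6
h = 0.2 * rng.standard_normal(n_net)
A = 0.2 * rng.standard_normal((n_net, n_net))
J = (A + A.T) / 2
np.fill_diagonal(J, 0.0)

states, p_model = boltzmann(h, J)
walk = metropolis_walk(h, J, 100_000, rng)
n_changes = int(np.sum(np.any(walk[1:] != walk[:-1], axis=1)))
print(f"{n_changes} state changes in {len(walk) - 1} simulated steps")
```

      In a full analysis, the fitted h and J would replace the placeholders. Transitions in the simulated trajectory could then be tallied between basins of the landscape's local minima rather than between individual states.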

      Reviewer #2 (Public review):

      Summary:

      In this study, Liu et al. investigated cortical network dynamics during movie watching using an energy landscape analysis based on a maximum entropy model. They identified perception- and attention-oriented states as the dominant cortical states during movie watching and found that transitions between these states were associated with inter-subject synchronization of regional brain activity. They also showed that distinct thalamic compartments modulated distinct state transitions. They concluded that cortico-thalamo-cortical circuits are key regulators of cortical network dynamics.

      Strengths:

      A mechanistic understanding of cortical network dynamics is an important topic in both experimental and computational neuroscience, and this study represents a step forward in this direction by identifying key cortico-thalamo-cortical circuits. The analytical strategy employed in this study, particularly the LASSO-based analysis, is interesting and would be applicable to other data types, such as task- and resting-state fMRI.

      We thank you for this comment and encouragement.

      Weaknesses:

      Due to issues related to data preprocessing, support for the conclusions remains incomplete. I also believe that a more careful interpretation of the "energy" derived from the maximum entropy model would greatly clarify what the analysis actually revealed.

      Thank you for your valuable suggestions, and we apologize for any misunderstandings regarding the interpretation of the energy landscape in our study. To address this issue, we will include a dedicated paragraph in both the methods and results sections to clarify our use of the term "energy" derived from the maximum entropy model. This addition aims to eliminate any ambiguity and provide a clearer understanding of what our analysis reveals.
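      For reference, consider the standard pairwise maximum entropy model used in energy landscape analysis. This form is assumed here, and the manuscript's exact parameterization may differ. In this model, the energy of a binarized state σ = (σ_1, …, σ_N), with σ_i ∈ {−1, +1}, is defined through fitted biases h_i and couplings J_ij. It is a rescaled negative log-probability:

```latex
\[
P(\boldsymbol{\sigma}) = \frac{\exp\!\left[-E(\boldsymbol{\sigma})\right]}{Z},
\qquad
E(\boldsymbol{\sigma}) = -\sum_{i=1}^{N} h_i \sigma_i - \frac{1}{2}\sum_{i \neq j} J_{ij}\,\sigma_i \sigma_j,
\qquad
Z = \sum_{\boldsymbol{\sigma}'} \exp\!\left[-E(\boldsymbol{\sigma}')\right]
\]
\[
\Longrightarrow\quad
E(\boldsymbol{\sigma}) = -\ln P(\boldsymbol{\sigma}) - \ln Z,
\qquad
E(\boldsymbol{\sigma}_a) - E(\boldsymbol{\sigma}_b) = \ln\frac{P(\boldsymbol{\sigma}_b)}{P(\boldsymbol{\sigma}_a)}
\]
```

      Under this model, a lower-energy state is one that the fitted model predicts to occur more often. Energy differences are log-probability ratios, not a physical or metabolic cost.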

      (1) I think the method used for binarization of BOLD activity is problematic in multiple ways.

      a) Although the authors appear to avoid using global signal regression (page 4, lines 114-118), the proposed method effectively removes the global signal. According to the description on page 4, lines 117-122, the authors binarized network-wise ROI signals by comparing them with the cross-network BOLD signal (i.e., the global signal): at each time point, network-wise ROI signals above the cross-network signal were set to 1, and the rest were set to −1. If I understand the binarization procedure correctly, this approach forces the cross-network signal to be zero (up to some noise introduced by the binarization of network-wise signals), which is essentially equivalent to removing the global signal. Please clarify what the authors meant by stating that "this approach maintained a diverse range of binarized cortical states in data where the global signal was preserved" (page 4, lines 121-122).

      Thank you for highlighting the potential issue with our binarization method. We appreciate your insights regarding the comparison of network-wise ROI signals with the cross-network BOLD signal, as this may inadvertently remove the global signal. To address this, we will conduct a comparative analysis of results obtained from both our current approach and the original pipeline. If we decide to retain our current method, we will carefully reconsider the rationale and rephrase our descriptions to ensure clarity regarding the preservation of the global signal and the diversity of binarized cortical states.
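Point (a) can be checked on a toy signal. The sketch below uses hypothetical data: seven network-wise time series that share one global fluctuation plus fixed per-network offsets, with no network-specific dynamics. The function names are illustrative and are not the authors' pipeline. Binarizing each time point against the cross-network mean collapses the data to a single state, so the global fluctuation disappears. Binarizing each network against its own temporal mean (the conventional approach) recovers the all-active and all-inactive states.

```python
import math

def binarize_per_timepoint(signals):
    """Binarize each time point against the cross-network mean at that time point.

    signals: list of time points, each a list of network-wise ROI values.
    """
    states = []
    for frame in signals:
        cross_network = sum(frame) / len(frame)
        states.append(tuple(1 if x > cross_network else -1 for x in frame))
    return states

def binarize_within_network(signals):
    """Binarize each network against its own temporal mean (conventional approach)."""
    n_networks = len(signals[0])
    means = [sum(frame[i] for frame in signals) / len(signals) for i in range(n_networks)]
    return [tuple(1 if frame[i] > means[i] else -1 for i in range(n_networks))
            for frame in signals]

# Hypothetical toy data: 7 networks share one global fluctuation g(t)
# plus a fixed offset per network.
offsets = [0.3, -0.2, 0.1, -0.1, 0.2, -0.3, 0.4]
T = 100
g = [math.sin(2 * math.pi * (t + 0.5) / 20) for t in range(T)]
signals = [[g[t] + c for c in offsets] for t in range(T)]

per_tp = binarize_per_timepoint(signals)
within = binarize_within_network(signals)
print(set(per_tp))          # {(1, -1, 1, -1, 1, -1, 1)}: only the offsets survive
print(sorted(set(within)))  # [(-1, -1, -1, -1, -1, -1, -1), (1, 1, 1, 1, 1, 1, 1)]
```

Under the per-time-point rule, the shared fluctuation cancels at every time point. This is why that rule behaves like global signal removal.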

      b) The authors might argue that they maintained a diverse range of cortical states by performing the binarization at each time point (rather than within each network). However, I believe this introduces another problem, because binarizing network-wise signals at each time point distorts the distribution of cortical states. For example, because the cross-network signal is effectively set to zero, the network cannot take certain states, such as all +1 or all −1. Similarly, this binarization biases the system toward states with similar numbers of +1s and −1s, rather than toward unbalanced states such as (+1, −1, −1, −1, −1, −1). These constraints and biases are not biological in origin but are simply artifacts of the binarization procedure. Importantly, the energy landscape and its derivatives (e.g., hard/easy transitions) are likely to be affected by these artifacts. I suggest that the authors try a more conventional binarization procedure (i.e., binarization within each network), which is more robust to such artifacts.

      Related to this point, I have a question regarding Figure S1, in which the authors plotted predicted versus empirical state probabilities. As argued above, some empirical state probabilities should be zero because of the binarization procedure. However, in Figure S1, I do not see data points corresponding to these states (i.e., there should be points on the y-axis). Did the authors plot only a subset of states in Figure S1? I believe that all states should be included. The correlation coefficient between empirical and predicted probabilities (and the accuracy) should also be calculated using all states.

      Thank you for your thoughtful examination of our data processing pipeline. We agree that a comparison between the conventional binarization method and our current approach is warranted, and we appreciate your suggestion. Upon reviewing Figure S1, we found that there was indeed an error related to the "log10" plotting style. As you correctly pointed out, states in which all networks are either activated or deactivated should have zero probability. We are very interested in exploring the state distributions obtained from both the original and current approaches, as your comments highlight important considerations. We sincerely appreciate your insightful feedback and will address these points thoroughly in our first revision.
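To make the reviewer's request concrete, the sketch below tabulates empirical probabilities over all 2^n states, so that unobserved states enter the comparison with probability zero instead of being silently dropped. On a log10 axis such states cannot appear, because log10(0) is undefined. The helper names and toy states are hypothetical. The predicted probabilities would come from the fitted maximum entropy model and are treated here as an input.

```python
import math
from collections import Counter
from itertools import product

def empirical_state_probabilities(states, n_networks):
    """Empirical probability of every one of the 2**n_networks states.

    Unobserved states are kept with probability 0.0 instead of being dropped.
    """
    counts = Counter(states)
    total = len(states)
    return {s: counts.get(s, 0) / total for s in product((-1, 1), repeat=n_networks)}

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def compare_all_states(empirical, predicted):
    """Correlate empirical and model-predicted probabilities over ALL states."""
    keys = sorted(empirical)
    return pearson_r([empirical[k] for k in keys], [predicted[k] for k in keys])

# Toy example with 3 networks (hypothetical binarized data)
states = [(1, -1, -1), (1, -1, -1), (-1, 1, 1), (1, 1, -1)]
emp = empirical_state_probabilities(states, 3)
zero_states = [s for s, p in emp.items() if p == 0.0]
print(len(emp), len(zero_states))  # 8 5
```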

      c) The current binarization procedure likely inflates non-neuronal noise and obscures the relationship between the true BOLD signal and its binarized representation. For example, consider two ROIs (A and B): both (+2%, +1%) and (+0.01%, −0.01%) in BOLD signal changes would be mapped to (+1, −1) after binarization. This suggests that qualitatively different signal magnitudes are treated identically. I believe that this issue could be alleviated if the authors were to binarize the signal within each network, rather than at each time point.

      Thank you for your important observation regarding the potential inflation of non-neuronal noise in our current binarization procedure. We recognize that this process could lead to qualitatively different signal magnitudes being treated similarly after binarization, as you illustrated with your example. While we acknowledge your point, we believe that conventional binarization pipelines may also encounter this issue, albeit by comparing signals to a network's temporal mean activity. To address this concern and maintain consistency with previous studies, we will discuss this limitation in our revised manuscript. Additionally, if deemed necessary, we will explore implementing a percentile-based threshold above the baseline to further refine our binarization approach. Your suggestion provides a valuable perspective, and we appreciate your insights.
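The reviewer's two-ROI example can be reproduced directly. The sketch below shows that the per-time-point rule maps (+2%, +1%) and (+0.01%, -0.01%) to the same state. For contrast, it applies a within-network threshold at a hypothetical 75th percentile of each network's own time course. That threshold is one possible reading of the "percentile-based threshold" mentioned above: both the percentile value and the time course are assumptions.

```python
def binarize_frame_per_timepoint(frame):
    """Per-time-point rule: +1 if an ROI exceeds the cross-ROI mean at that time point."""
    mean = sum(frame) / len(frame)
    return tuple(1 if x > mean else -1 for x in frame)

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)

def binarize_series_percentile(series, q=75):
    """Hypothetical within-network rule: +1 only above the network's own q-th percentile."""
    threshold = percentile(series, q)
    return [1 if x > threshold else -1 for x in series]

# Reviewer's example (percent BOLD signal change of ROIs A and B)
print(binarize_frame_per_timepoint([2.0, 1.0]))    # (1, -1)
print(binarize_frame_per_timepoint([0.01, -0.01]))  # (1, -1)

# Hypothetical time course of ROI A: large and near-zero deflections now differ
print(binarize_series_percentile([2.0, 0.01, -0.5, 0.3, 1.5], q=75))  # [1, -1, -1, -1, -1]
```

Any binarization still discards magnitude above the threshold. The within-network rule only ensures that near-zero fluctuations are not promoted to +1 by comparison with an equally small cross-ROI mean.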

      (2) As the authors state (page 5, lines 145-148), the "energy" described in the energy landscape is not biological energy but rather a statistical transformation of probability distributions derived from the Boltzmann distribution. If this is the case, I believe that Figure 2A is potentially misleading and should be removed. This type of schematic may give the false impression that cortical state dynamics are governed by the energy landscape derived from the maximum entropy model (which is not validated).

      Thank you for your valuable feedback regarding Figure 2A. We apologize for any confusion it may have created. While we recognize that similar figures are commonly used in the literature on energy landscapes (maximum entropy models), we agree that Figure 2A may mislead readers into thinking that cortical state dynamics are directly governed by the energy landscape derived from the maximum entropy model, which has not been validated. In light of your comments, we will remove Figure 2A and instead emphasize the analytical strategy presented in Figure 2B. Additionally, we will provide a simplified line graph as an illustrative example to clarify the concepts without the potential for misinterpretation.
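To illustrate why the "energy" is a statistical rather than a biological quantity, consider the standard pairwise maximum entropy model. With binarized states s_i = ±1, E(s) = -Σ_i h_i s_i - Σ_{i<j} J_ij s_i s_j and P(s) = exp(-E(s)) / Z. It follows that E(s) = -ln P(s) - ln Z: the energy is a negative log-probability shifted by a constant. The sketch below checks this by exact enumeration for three networks. The parameters h and J are hypothetical, not fitted values from the study.

```python
import math
from itertools import product

def energy(state, h, J):
    """Pairwise maximum entropy energy: E = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j."""
    n = len(state)
    e = -sum(h[i] * state[i] for i in range(n))
    e -= sum(J[i][j] * state[i] * state[j] for i in range(n) for j in range(i + 1, n))
    return e

def boltzmann_probabilities(h, J):
    """Exact P(s) = exp(-E(s)) / Z over all 2**n states (feasible only for small n)."""
    states = list(product((-1, 1), repeat=len(h)))
    weights = [math.exp(-energy(s, h, J)) for s in states]
    Z = sum(weights)
    return {s: w / Z for s, w in zip(states, weights)}, Z

# Hypothetical parameters for three networks
h = [0.2, -0.1, 0.0]
J = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.1],
     [-0.3, 0.1, 0.0]]
probs, Z = boltzmann_probabilities(h, J)

for s, p in probs.items():
    # Energy equals -log P up to the constant log Z; it carries no metabolic meaning
    assert math.isclose(energy(s, h, J), -math.log(p) - math.log(Z), abs_tol=1e-9)

print(max(probs, key=probs.get))  # (1, 1, -1): the lowest-energy state is the most probable
```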

      Reviewer #3 (Public review):

      Summary:

      In this study, Liu et al. analyze fMRI data collected during movie watching and apply an energy landscape method with pairwise maximum entropy models. They identify a set of brain states defined at the level of canonical functional networks and quantify how the brain transitions between these states. Transitions are classified as "easy" or "hard" based on changes in the inferred energy landscape, and the authors relate transition probabilities to inter-subject correlation. A major emphasis of the work is the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions.

      Strengths:

      The study is methodologically complex and technically sophisticated. It applies advanced analytical methods to high-dimensional fMRI data. The application of energy landscape analysis to movie-watching data also appears to be novel. The finding that the thalamus is involved in energy state transitions provides a strong link to several theories of thalamic control functions, which is a notable strength.

      Thanks for your comments on the novelty of our study.

      Weaknesses:

      The main weakness is the conceptual clarity and advances that this otherwise sophisticated set of analyses affords. A central conceptual ambiguity concerns the energy landscape framework itself. The authors note that the "energy" in this model is not biological energy but a statistical quantity derived from the Boltzmann distribution. After multiple reads, I still have major trouble mapping this measure onto any biological and cognitive operations. BOLD signal is a measure of oxygenation as a proxy of neural activity, and correlated BOLD (functional connectivity) is thought to measure the architecture of information communication of brain systems. The energy framework described in the current format is very difficult for most readers to map onto any neural or cognitive knowledge base on the structure and function of brain systems. Readers unfamiliar with maximum entropy models may easily misinterpret energy changes as reflecting metabolic cost, neural effort, or physiological variables, and it is just very unclear what that measure is supposed to reflect. The manuscript does not clearly articulate what conceptual and mechanistic advances the energy formalism provides beyond a mathematical and statistical report. In other words, beyond mathematical description, it is very hard for most readers to understand the process and function of what this framework is supposed to tell us in regards to functional connectivity, brain systems, and cognition. The brain is not a mathematical object; it is a biological organ with cognitive functions. The impact of this paper is severely limited until connections can be made.

      Thank you for your insightful and constructive comments regarding the conceptual clarity of our energy landscape framework. We appreciate your perspective on the challenges of mapping the statistical measure of "energy" derived from the Boltzmann distribution onto biological and cognitive operations. To address these concerns, we will revise our manuscript to clarify our expressions surrounding "energy" and emphasize its probabilistic nature. Additionally, we will incorporate a series of analyses that explicitly relate the features of the energy landscape to cognitive processes and key parameters, such as brain integration and functional connectivity. We believe these changes will help bridge the gap between our mathematical framework and its relevance to understanding brain systems and cognitive functions.

      Relatedly, the use of metaphors such as "valleys," "hills," and "routes" in multidimensional measures lacks grounding. Valleys and hills of what is not intuitive to understand. Based on my reading, these features correspond to local minima and barriers in a probability distribution over binarized network activation patterns, but similar to the first point, the manuscript does not clearly explain what it means conceptually, neurobiologically, or computationally for the brain to "move" through such a landscape. The brain is not computing these probabilities; they are measurement tools of "something". What is it? To advance beyond mathematical description, these measurements must be mapped onto neurobiological and cognitive information.

      Thank you for your valuable feedback. In our revisions, we aim to link the concept of rapid transition routes in the energy landscape to cognitive processes, such as narrative understanding and related features. By exploring these connections, we hope to provide a clearer context for how our framework can enhance understanding of cognitive functions and their neural correlates.
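The "valleys" and "hills" discussed above can be stated combinatorially. A common convention in energy-landscape analyses, assumed here (the authors' exact definitions may differ), works on the set of binarized states. A valley is a state whose energy is lower than that of every state reachable by flipping a single network. The barrier between two valleys is set by the lowest possible peak energy along any path of single flips between them. The sketch uses a hypothetical toy energy table for three networks and finds that peak with a union-find sweep over states in order of increasing energy.

```python
from itertools import product

def neighbours(state):
    """States reachable by flipping the activity of exactly one network."""
    return [state[:i] + (-state[i],) + state[i + 1:] for i in range(len(state))]

def local_minima(energies):
    """'Valleys': states whose energy is lower than that of every single-flip neighbour."""
    return [s for s, e in energies.items()
            if all(e < energies[nb] for nb in neighbours(s))]

def saddle_energy(energies, a, b):
    """Lowest possible peak energy on any single-flip path from state a to state b."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add states from low to high energy; a and b connect at the saddle level
    for s in sorted(energies, key=energies.get):
        parent[s] = s
        for nb in neighbours(s):
            if nb in parent:
                parent[find(nb)] = find(s)
        if a in parent and b in parent and find(a) == find(b):
            return energies[s]
    return None

# Hypothetical toy landscape: E(s) = -sum_{i<j} s_i s_j (networks prefer to agree)
E = {s: -sum(s[i] * s[j] for i in range(3) for j in range(i + 1, 3))
     for s in product((-1, 1), repeat=3)}
minima = local_minima(E)
print(minima)  # [(-1, -1, -1), (1, 1, 1)]
print(saddle_energy(E, minima[0], minima[1]) - E[minima[0]])  # barrier height: 4
```

Under this reading, "moving through the landscape" simply means that the observed sequence of binarized states passes from one valley's basin to another's. The barrier height measures how improbable the least improbable intermediate state is.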

      This conceptual ambiguity goes back to the Introduction. At the level of motivation, the purpose and deliverables of the study are not defined in the Introduction. The stated goal is "Transitions between distinct cortical brain states modulate the degree of shared neural processing under naturalistic conditions". I do not know if readers will have a clear answer to this question at the end. Is the claim that state transitions cause changes in inter-subject correlation, that they index moments of narrative alignment, or that they reflect changes in attentional or cognitive mode? This level of explanation is largely dissociated from the methods in their current form.

      Thank you for highlighting this important point regarding the conceptual clarity in our Introduction. We appreciate your feedback about the motivation and objectives of the study. To clarify the stated goal of investigating how transitions between distinct cortical brain states modulate shared neural processing under naturalistic conditions, we will revise the manuscript to explicitly define the specific claims we aim to address. We will ensure that these explanations are closely tied to the methods employed in our study, providing a clearer framework for our readers.

      Several methodological choices can use clarification. The use of a 21-TR window centered on transition offsets is unusually long relative to the temporal scale of fMRI dynamics and to the hypothesized rapidity of state transitions. On a related note, what is the temporal scale of state transition? Is it faster than 21 TRs?

      Thank you for your insightful questions regarding our methodological choices. Our focus on specific state transitions necessitated the use of a 21-TR window. While other transitions may occur within this window, averaging across occurrences of the same transition at different times allows us to identify distinctive thalamic BOLD patterns that precede cortical state transitions. This approach enables us to capture relevant dynamics while focusing on the transitions of interest. We appreciate your feedback, and this clarification will be included in our revised manuscript. We will also add a figure describing the dwell times of cortical states.
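The two quantities discussed in this exchange can be sketched as follows. The sketch assumes a per-TR sequence of cortical state labels and one thalamic value per TR. It takes the transition time as the first TR of the new state, which is an assumption; the function names are hypothetical. A half-width of 10 TRs gives the 21-TR window, and windows that would run past the edges of the scan are skipped. Dwell times are the lengths of consecutive runs of the same state.

```python
def transition_onsets(states, source, target):
    """TR indices t where the state changes from `source` (at t-1) to `target` (at t)."""
    return [t for t in range(1, len(states))
            if states[t - 1] == source and states[t] == target]

def event_locked_average(signal, events, half_width=10):
    """Average `signal` in a (2*half_width + 1)-TR window centred on each event.

    Events whose window would leave the run are skipped; half_width=10 gives 21 TRs.
    """
    windows = [signal[t - half_width:t + half_width + 1]
               for t in events
               if t - half_width >= 0 and t + half_width < len(signal)]
    if not windows:
        return None
    return [sum(col) / len(windows) for col in zip(*windows)]

def dwell_times(states):
    """Lengths (in TRs) of consecutive runs of the same state."""
    if not states:
        return []
    runs, length = [], 1
    for prev, cur in zip(states, states[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return runs

# Hypothetical usage
states = ["A", "A", "B", "B", "B", "A"]
print(dwell_times(states))                  # [2, 3, 1]
print(transition_onsets(states, "A", "B"))  # [2]
```

Averaging the thalamic signal over many occurrences of the same transition is intended to attenuate contributions from unrelated transitions that happen to fall inside individual windows.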

      The choice of movie-watching data is a strength. But, many of the analyses performed here, energy landscape estimation, clustering of states, could in principle be applied to resting-state data. The manuscript does not clearly articulate what is gained, mechanistically or cognitively, by using movie stimuli beyond the availability of inter-subject correlation.

      Thank you for your question, which closely aligns with a concern raised by Reviewer #1. Our core hypothesis posits that naturalistic stimuli yield a broader set of brain states compared to those observed during resting-state conditions. To support this assertion, we will clearly articulate the findings from previous studies that relate to this hypothesis. Additionally, if appropriate, we will provide a comparative analysis between our data and resting-state data to highlight the differences and emphasize the uniqueness of the brain states elicited by naturalistic stimuli.

      Because of the above issues, a broader concern throughout the results is the largely descriptive nature of the findings. For example, the LASSO analysis shows that certain state transitions predict ISC in a subset of regions, with respectable R² values. While statistically robust, the manuscript provides little beyond why these particular transitions should matter, what computations they might reflect, or how they relate to known cognitive operations during movie watching. Similar issues arise in the clustering analyses. Clustering high-dimensional fMRI-derived features will almost inevitably produce structure, whether during rest, task, or naturalistic viewing. What is missing is an explanation of why these specific clusters are meaningful in functional or mechanistic terms.

      Thank you for your questions. In our revisions, we will perform additional analyses aimed at linking state transitions to cognitive processes more explicitly. Regarding clustering, we will provide a thorough discussion in the revised manuscript.
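For readers unfamiliar with the LASSO analysis mentioned above, the sketch below illustrates the general idea rather than the authors' implementation. It regresses a region's inter-subject correlation on counts of each transition type and keeps only the transitions whose coefficients survive the L1 penalty. All data are synthetic; the feature construction, `alpha`, and the absence of cross-validation are assumptions. The objective (1/(2n))||y - Xb||^2 + alpha*||b||_1 is the same one minimized by scikit-learn's `Lasso`. Here it is solved by plain coordinate descent on standardized features, so it has no dependency beyond NumPy.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))||y - b0 - Xb||^2 + alpha * ||b||_1.

    Features are standardized internally; coefficients are returned on the
    original scale together with an unpenalized intercept.
    """
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    y_mean = y.mean()
    r = y - y_mean                      # residual for b = 0
    b = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue                # constant feature: coefficient stays zero
            old = b[j]
            rho = Z[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho, alpha) / col_sq[j]
            if new != old:
                r -= Z[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    coef = b / sd
    intercept = y_mean - mu @ coef
    return coef, intercept

def r_squared(y, y_hat):
    """Coefficient of determination."""
    ss_res = float(((y - y_hat) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Synthetic usage: only transition types 0 and 3 influence the hypothetical ISC
rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(120, 6)).astype(float)
y = 0.05 * X[:, 0] - 0.03 * X[:, 3] + rng.normal(0.0, 0.02, size=120)
coef, intercept = lasso_cd(X, y, alpha=0.005)
print(np.round(coef, 3), round(r_squared(y, intercept + X @ coef), 2))
```

An in-sample R² like the one printed here is optimistic. Held-out or cross-validated R² is the more informative quantity when judging which transitions predict ISC.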

      Finally, the treatment of the thalamus, while very exciting, could use a bit more anatomical and circuit-level specificity. The manuscript largely treats the thalamus as a unitary structure, despite decades of work demonstrating big functional and connectivity differences across thalamic nuclei. A whole-thalamus analysis without more detailed resolution is increasingly difficult to justify. The subsequent subdivision into PVALB- and CALB-associated regions partially addresses this, but these markers span multiple nuclei with overlapping projection patterns.

      This suggestion aligns with the feedback from Reviewer #1. We believe that performing nuclei segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. Therefore, investigating core and matrix cell projections across different thalamic nuclei using 7T fMRI presents a promising avenue for further study.

      (1) Van Der Meer J N, Breakspear M, Chang L J, et al. Movie viewing elicits rich and reliable brain state dynamics [J]. Nature Communications, 2020, 11(1): 5004.

      (2) Song H, Park B Y, Park H, et al. Cognitive and Neural State Dynamics of Narrative Comprehension [J]. Journal of Neuroscience, 2021, 41(43): 8972-8990.

      (3) Song H, Shim W M, Rosenberg M D. Large-scale neural dynamics in a shared low-dimensional state space reflect cognitive and attentional dynamics [J]. Elife, 2023, 12.

      (4) Shine J M, Lewis L D, Garrett D D, et al. The impact of the human thalamus on brain-wide information processing [J]. Nature Reviews Neuroscience, 2023, 24(7): 416-430.

      (5) Yang M Y, Keller D, Dobolyi A, et al. The lateral thalamus: a bridge between multisensory processing and naturalistic behaviors [J]. Trends in Neurosciences, 2025, 48(1): 33-46.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Acosta-Bayona et al. aim to better understand how environmental conditions could have influenced specific gene functions that may have been selected for during the domestication of teosinte parviglumis into domesticated maize. The authors are particularly interested in identifying the initial phenotypic changes that led to the original divergence of these two subspecies. They selected heavy metal (HM) stress as the condition to investigate. While the justification for this choice remains speculative, paleoenvironmental data would add value; the authors hypothesize that volcanic activity near the region of origin could have played a role.

      The justification for our choice to investigate the effects of heavy metal stress is not speculative. As now mentioned in the Abstract, the elucidation of the genome of the Palomero toluqueño maize landrace revealed heavy metal effects during domestication (Vielle-Calzada et al., Science 2009). Our aim was to test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte parviglumis to maize.

      (1) Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      We appreciate the depth and value of this comment.

      Maize phenotypic responses to sublethal concentrations of heavy metals, copper (Cu) and cadmium (Cd) in particular, are well characterized and published, and are in agreement with our results. In the first section of the Results (pgs 7 and 8), we added pertinent references to clearly show which observations are already known. By contrast, all the teosinte parviglumis responses are novel. To our knowledge, this is the first study to analyze in detail the phenotypic response of teosinte to sublethal concentrations of heavy metals, specifically Cu and Cd. We have now emphasized the novelty of these observations (pg 8).

      To address the fact that we only focused on three known HM-related genes without discussing others in the statistically significant region identified via LOD score on chr.5, we have added a full section that reads as follows (pgs. 11 to 13 of the new version):

      “Large-scale genomic and transcriptomic comparisons indicate that many HM response genes were positively selected across the maize genome.

      To expand the results well beyond the analysis of the three genes previously described, we performed a detailed analysis of genetic diversity across the 11.47 Mb genomic region between ZmSKUs5 and ZmHMA1. This additional analysis reveals general tendencies in the number and nature of loci that were affected by positive selection during the teosinte parviglumis to maize transition in a region identified via LOD score on chr.5. We compared nucleotide variability using 100 bp bins covering loci composed of the coding sequence and two 30 Kb segments upstream and downstream of it, for 173 genes present within the genomic region between ZmSKUs5 and ZmHMA1 (Figure S1 and Supplementary File 6). Two types of statistical tests (ANOVA and Wilcoxon) were applied to nucleotide variability comparisons using the entirety of each locus. The Benjamini-Hochberg procedure was used to control the false discovery rate (FDR<0.05) and thus limit type I errors (false positives). Although some individual loci are classified differently depending on the statistical test applied (22 out of 173 loci), the general differences in nucleotide variability are consistently maintained within the subregions described below. We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. The first six loci are consecutively ordered in a 402 Kb subregion that includes ZmSKUs5. A second group of 13 consecutive loci extends over a 1.44 Mb subregion that contains NRAMP ALUMINUM TRANSPORTER1, also involved in HM response through uptake of divalent ions. A third group of 17 consecutive loci extends over 1.28 Mb; eleven of these contain genes encoding uncharacterized proteins.
The fourth group is composed of 57 consecutive loci extending over 3.22 Mb and contains genes encoding DEFECTIVE KERNEL55, AUXIN RESPONSE FACTOR16, and peroxidases involved in responses to oxidative stress. The fifth group contains 12 consecutive loci extending over 713 Kb and includes ZmHMA1. An additional segment of approximately 1.17 Mb, containing 25 consecutive positively selected loci, extends beyond the ZmSKUs5-ZmHMA1 segment; it also contains several genes encoding peroxidases. Although multiple loci include genes that could be involved in abiotic stress and oxidative responses, these results suggest that factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr.5 during the teosinte parviglumis to maize transition.
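For reference, the Benjamini-Hochberg step-up procedure named above can be sketched as follows. This is the textbook procedure, not the authors' code. In practice an established implementation would normally be used, for example statsmodels' `multipletests` with `method='fdr_bh'`.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per p-value: True where the null is rejected at FDR level q.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * q and reject
    the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

print(benjamini_hochberg([0.01, 0.045, 0.025, 0.005, 0.5]))
# [True, False, True, True, False]
```

Because the rule is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value passes. For example, [0.03, 0.035] at q = 0.05 rejects both.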

      To further analyze whether HM response could have played a role in maize emergence and subsequent domestication, we analyzed large-scale transcriptomic data from independent experiments examining the response of maize roots to HM stress. Six available transcriptomes were selected for in-depth analysis because they reported fold changes strictly higher than 1 and their results were supported by false discovery rates (FDR<0.05). These six transcriptomes (Table S5) included HM response datasets obtained under growth conditions that incorporated not only Cu, but also lead (Pb) and chromium (Cr), which were not included in the substrate of our experiments. Transcriptional profiles were obtained from roots of plants at different stages: maize seedlings (Shen et al., 2012; Gao et al., 2015; Zhang et al., 2024a), three-week-old plantlets (Yang et al., 2023), and plants at V2 stage (Zhang et al., 2024b; Fengxia et al., 2025). A total of 120 genes shared by all six transcriptomes were differentially expressed under HM stress conditions (66 upregulated and 54 downregulated; Figure S3), including ZmSKUs5, ZmHMA1 and ZmHMA7. Of these, 52 (43.3%) are located in maize loci showing less than 70% of the nucleotide variability found in teosinte parviglumis, suggesting that they were affected by positive selection (Yamasaki et al., 2005; Supplementary File 7). Of the 18 that map to chr.5, twelve are within the 82 cM that fractionates into multiple QTLs under selection during the parviglumis to maize transition. Interestingly, five additional loci containing HM response genes completely lack SNPs over their total length in both parviglumis and maize, and 19 additional loci lack SNPs in at least one 30 Kb segment or in their coding region (Supplementary File 7). This suggests that ultraconserved genomic regions are frequent in loci containing HM response genes.
When the same analysis was conducted on a set of loci comprising 63 genes previously identified as differentially expressed in response to abiotic stress not directly related to HM responses (hypoxia, nutritional deficiency, soil alkalinity, drought, soil salinity), 18 loci (28.6%) showed less than 70% of the nucleotide variability found in teosinte parviglumis. Only one of them maps to chr.5, and none contained segments or coding regions lacking SNPs in parviglumis or maize. These results suggest that, in contrast to other types of abiotic stress response genes, loci comprising a large set of genes that unambiguously respond to HM stress caused by chemical elements of diverse nature were affected by positive selection during the parviglumis to maize transition, irrespective of their position in the genome.”
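The response does not state which diversity estimator underlies the "less than 70% of the nucleotide variability" criterion. The sketch below assumes Nei's nucleotide diversity (π, mean pairwise differences per site) computed on aligned sequences. It also shows the per-bin computation used for the 100 bp bins. Sites with a gap or `N` are skipped pairwise, which is an assumption. The toy sequences are hypothetical.

```python
import math
from itertools import combinations

def nucleotide_diversity(seqs):
    """Nei's pi: pairwise differences per compared site over all sequence pairs.

    Sites with a gap ('-') or 'N' in either sequence of a pair are skipped for that pair.
    """
    total_diff = 0
    total_sites = 0
    for a, b in combinations(seqs, 2):
        for x, y in zip(a, b):
            if x in "-N" or y in "-N":
                continue
            total_sites += 1
            total_diff += x != y
    return total_diff / total_sites if total_sites else float("nan")

def binned_diversity(seqs, bin_size=100):
    """Pi in consecutive bins of `bin_size` aligned positions."""
    length = len(seqs[0])
    return [nucleotide_diversity([s[i:i + bin_size] for s in seqs])
            for i in range(0, length, bin_size)]

def diversity_ratio(maize_seqs, teosinte_seqs):
    """pi(maize) / pi(teosinte); None when teosinte shows no variability."""
    pi_t = nucleotide_diversity(teosinte_seqs)
    if not pi_t or math.isnan(pi_t):
        return None
    return nucleotide_diversity(maize_seqs) / pi_t

def below_cutoff(maize_seqs, teosinte_seqs, cutoff=0.7):
    """Flag a locus whose maize diversity is below `cutoff` times teosinte diversity."""
    ratio = diversity_ratio(maize_seqs, teosinte_seqs)
    return ratio is not None and ratio < cutoff

teosinte = ["AAAA", "AAAT", "AATT"]   # pi = 4 differences / 12 site comparisons = 1/3
maize = ["AAAA", "AAAA", "AAAT"]      # pi = 2 / 12 = 1/6
print(diversity_ratio(maize, teosinte), below_cutoff(maize, teosinte))  # 0.5 True
```

Returning `None` when teosinte is monomorphic mirrors the loci described above that completely lack SNPs. For such loci the ratio is undefined rather than evidence of selection.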

      The detailed analysis of genetic diversity across 11.47 Mb of chr.5, in the genomic region between ZmSKUs5 and ZmHMA1, is presented in Supplementary File 6.

      The analysis of genetic diversity in loci encompassing heavy metal response genes shared by six transcriptomes and abiotic stress controls are described in Supplementary File 7.

      In the Discussion (pgs. 21 and 22), we added a paragraph that reads as follows:

      “Although loss of genetic diversity is usually the result of human selection during domestication, it can also be a consequence of natural selective pressures favoring the fitness of specific teosinte parviglumis allelic variants that were better adapted to environmental changes and were subsequently affected by human selection during the domestication process. This possibility is reflected by widespread selective sweeps affecting a large portion of chr.5 that contains hundreds of genes showing signatures of positive selection. The analysis of 11.47 Mb covering the ZmHMA1-ZmSKUs5 segment confirms the presence of large but discrete genomic subregions that were positively selected during the teosinte parviglumis to maize transition. Although several contain genes involved in HM response and oxidative stress, the diversity of gene functions does not necessarily favor abiotic stress over other factors that could be at the origin of selective forces affecting these regions. By contrast, a large-scale transcriptomic survey indicates that genes consistently responding to HMs (Cu, Cd, Pb and Cr) show signatures of positive selection at an unusually high frequency (43.3%) compared with loci containing genes responding to other types of abiotic stress (28.6%). Our identification of HM response genes affected by positive selection is far from exhaustive. Nevertheless, it agrees with the expected effects of a widespread selective sweep caused by environmental changes that influenced the parviglumis to maize transition at the genetic level. Of particular interest are 24 loci that partially or completely lack SNPs in both teosinte parviglumis and maize, suggesting that genetic bottlenecks may have occurred before the teosinte to maize transition. Examples of other edaphic factors driving genetic divergence in either the teosintes or maize include local adaptation to phosphorus concentration in mexicana and parviglumis (Aguirre-Liguori et al. 2019), and fast maize adaptation to changing iron availability through the action of genes involved in its mobilization, uptake, and transport (Benke and Stich 2011). Our results reveal an environmental plasticity in teosinte parviglumis that could be related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition. Previous studies have demonstrated that transposable elements (TEs) contribute to the activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      The mutagenic analysis of ZmHMA7 and ZmSKUs5 will be included in a different publication.

      (2) The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      We think that the detailed analysis of genetic diversity across the 11.47 Mb covering the ZmSKUs5 to ZmHMA1 genomic segment, together with its statistical validation, provides a precise understanding of the dimensions of the selective sweep in chr.5.

      We do agree that lower nucleotide diversity values in maize are not sufficient to infer human selection. Because many HM response loci show unusually low nucleotide variability in teosinte parviglumis (see the results of the transcriptomic analysis presented above), we cannot discard the possibility that natural selection forces related to environmental changes could have affected native populations of teosinte parviglumis.

      To further explore the link between environmental factors, natural or human-driven selection, and the paleoenvironmental context of the parviglumis to maize transition, we reviewed paleoenvironmental and geological records and added results in two sections that read as follows (pgs. 17 to 20):

      “Paleoenvironmental studies reveal periods of climatic instability in the presumed region of maize emergence during the early Holocene.

      It is well accepted that temperature fluctuations, volcanism and anthropogenic impact shaped the distribution and abundance of plant species in the Transmexican Volcanic Belt (TMVB) during the last 14,000 years (Torrescano-Valle et al. 2019). The TMVB has produced close to 8000 volcanic structures (Ferrari et al., 2011), transforming the relief multiple times and causing hydrographic and soil changes that actively modified the distribution and composition of plant communities in Central Mexico. Detailed paleoenvironmental data for the Pleistocene and Holocene are available for several lacustrine zones located within 50 to 100 km of the region currently considered the cradle of maize domestication (Matsuoka et al. 2002; Figure 5a). In Lake Zirahuén (102°44′ W; 19°26′ N and approximately 2075 meters above sea level; index [i] in Figure 5a), pollen, microcharcoal and magnetic susceptibility analyses of two sedimentary sequences reveal three periods of major ecological change during the early and middle Holocene.

      Between 9500 and 9000 calibrated years before present (cal yr BP), pine forests appear to have been associated with increases in summer insolation. A second peak of forest change occurred at around 8200 cal yr BP, coinciding with cold oscillations documented in the North Atlantic. Finally, events that occurred between 7500 and 7100 cal yr BP show an abrupt change in the plant community related to humid Holocene climates and a presumed volcanic event (Lozano-García et al., 2013). The environmental history of the central Balsas watershed has also been documented by pollen, charcoal, and sedimentary analyses conducted in three lakes and a swamp of the Iguala valley (Piperno et al. 2007). Paleoecological records of Lake Ixtacyola (18°20′ N, 99°35′ W and approximately 720 meters above sea level; index [ii] in Figure 5a) and Lake Ixtapa (18°21′ N, 99°26′ W) indicate that an important increase in temperature and precipitation occurred between 13000 and 10000 cal yr BP. The pollen record of Ixtacyola showed that members of the genus Zea were already part of the vegetation cover by 12900 to 13000 cal yr BP, suggesting that some teosintes, likely including parviglumis, were commonly found at elevations where they do not presently occur. Lake Almoloya (also named Chignahuapan; 19°05′ N, 99°20′ W and approximately 2575 meters above sea level; index [iii] in Figure 5a) in the upper Lerma basin is only 20 km from the crater of the Nevado de Toluca, which is responsible for the late Pleistocene Upper Toluca Pumice layer over which the Lerma basin is deposited. Pollen records indicate the presence of Zea species by 11080 to 10780 cal yr BP. As at other locations, an important period of climatic instability prevailed between 11500 and 8500 cal yr BP (Ludlow-Wiechers et al., 2005). Humidity fluctuations occurred until 8000 cal yr BP, with a stable temperate climate between 8500 and 5000 cal yr BP.
Although pollen and diatom studies are often difficult to interpret at a regional scale, the overall results presented above suggest consistent periods of Zea plants present in periods of environmental and climatic instability that correlate with the history of volcanic activity during the early Holocene, as described in the next section.

      Temporal and geographical convergence between volcanic eruptions and maize emergence during the Holocene.

      Current evidence indicates that the emergence and domestication of maize began in Mesoamerica sometime around 9,000 yr BP (Matsuoka et al. 2002). The teosinte parviglumis populations that are phylogenetically most closely allied with maize are currently distributed in a region located between the Michoacan-Guanajuato Volcanic Field (MGVF) to their northwest, and the Nevado de Toluca and Popocatépetl volcanoes to their east and northeast (Figure 5a; Matsuoka et al. 2002). Precise field records indicate that ten accessions were collected in the Balsas river drainage near Teloloapan and Sierra de Huautla (Guerrero), approximately 100 km south of the Nevado de Toluca crater. Three other accessions (8762, JSG y LOS-161, and JSG-391) were collected near Tejupilco de Hidalgo and Zacazonapan (Estado de México), approximately 50 to 60 km from the Nevado de Toluca crater. Four other accessions were located in Michoacan, either within the MGVF (accession 8763) or midway between the MGVF and the Nevado de Toluca crater (accessions JSG y LOS-130, 8761, and 8766).

      The most important source of HMs in ancient soils of Mesoamerica is TMVB-dependent volcanic activity, through short- and long-term effects related to lava deposits, ores, hydrothermal flow, and ash (Torrescano-Valle et al. 2019). The Nevado de Toluca volcano produced one of the most powerful eruptions of central Mesoamerica in the Holocene, giving rise to the Upper Toluca Pumice deposit at 12621 to 12025 cal yr BP (Arce et al., 2003; Figure 5b). The pumice fallout blanketed the Lerma and Mexico basins with 40 cm of coarse ash (Bloomfield and Valastro 1977; Arce et al. 2003). A second eruption, dated by ³⁶Cl exposure, occurred at 9700 cal yr BP (Arce et al. 2003; Figure 5b), and the most recent eruption occurred at 3580 to 3831 cal yr BP (Macías et al. 1997). During the early and middle Holocene, the Popocatépetl volcano produced at least four eruptions, dated 13037–12060, 10775–9564, 8328–7591, and 6262–5318 cal yr BP (Siebe et al. 1997); three other important eruptions occurred during the late Holocene, between 2713 and 733 cal yr BP (Siebe and Macías, 2006). In addition, the MGVF is a monogenetic volcanic field in which 23 independent eruptions have been documented during the Holocene, 21 of them located towards the southern part of the field, close to the region harboring some of the teosinte parviglumis populations most closely related to maize. Three of these eruptions occurred in the early Holocene (El Huanillo, 1130 to 9688 cal yr BP; La Taza, 10649 to 10300 cal yr BP; Cerro Grande, 10173 to 9502 cal yr BP; Figure 5b), and three others during the initial period of the middle Holocene, between 8400 and 7696 cal yr BP (La Mina, Los Caballos, and Cerro Amarillo; Figure 5b). On average, a new volcano forms every ~435 years in the MGVF (Macías and Arce, 2019). No fewer than 16 other eruptions occurred between 7159 cal yr BP and the present time (Figure 5b). 
Soils of volcanic origin (andosols) are currently distributed in regions northwest of the Nevado de Toluca and Popocatépetl craters, close to the teosinte parviglumis populations most closely related to maize (Figure S5). Although the modern distribution of teosinte populations may differ from their distribution around 9000 yr BP, and unknown populations more closely related to maize may yet be discovered, these data indicate that the date and region where maize emerged converge with the dates and locations of several volcanic eruptions that occurred during the Holocene in that same region.”

      (3) Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. We have substantially toned down our conclusion in this section by modifying the end of the section on pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg. 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that the formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot rule out the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression of ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      On the other hand, the transcriptional analysis allowed the identification of 52 additional HM response genes showing signatures of positive selection during the parviglumis to maize transition; 12 of them map to the region of the short arm of chr. 5 that harbors linked QTLs. So far, genes involved in HM response and oxidative stress represent the most prevalent class of genes identified within the genomic region of chr. 5 that shows pleiotropic effects on domestication and multiple linked QTLs.

      Reviewer #2 (Public review):

      Summary:

      This work explores the phenotypic developmental traits associated with Cu and Cd responses in teosinte parviglumis, a species evolutionarily related to extant maize crops. Cu and Cd could serve as a proxy for heavy metals present in the soils. The manuscript explores potential genetic loci associated with heavy metal responses and domestication identified in previous studies. This includes heavy metal transporters, which are upregulated during stress. To study that, the authors compare the plant architecture of maize defective in ZmHMA1 and speculate on its association with domestication.

      Strengths:

      Very few studies covered the responses of teosintes to heavy metal stress. The physiological function of ZmHMA1 in maize also gives some novelty in this study. The idea and speculation section is interesting and well-implemented.

      Weaknesses:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes responding to heavy metals show phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize rather than a driving factor. The same applies to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to draw conclusions from the provided results to understand their importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb1 is not very clear either.

      Thank you for these comments. We have now emphasized our hypothesis in the abstract and the last paragraph of the Introduction (pg. 6):

      “To test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte to maize, we exposed both subspecies to sublethal concentrations of copper and cadmium etc…”

      A more comprehensive panel of heavy metals would not more accurately simulate the composition of soils evolving across 9,000 years in the region where maize presumably emerged. Copper (Cu) and cadmium (Cd) each correspond to a different affinity group for proteins of the ZmHMA family: ZmHMA1 has preferential affinity for Cu and Ag (silver), whereas ZmHMA7 has preferential affinity for Cd, Zn (zinc), Co (cobalt), and Pb (lead). Since these P1B-ATPase transporters mediate the movement of divalent cations, their function remains consistent regardless of the specific metal tested, provided it belongs to the respective affinity group. By applying sublethal concentrations of Cd (16 mg/kg) and Cu (400 mg/kg), we elicited a measurable physiological response while allowing plants to complete their life cycle, including the reproductive phase, facilitating a comprehensive analysis of metal stress adaptation. Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023).

      Based on comments by both reviewers, we now present a large transcriptional analysis that incorporates HM responses to lead (Pb) and chromium (Cr), in addition to Cu. Results show that many genes responding to Pb and Cr were also positively selected across the maize genome, suggesting that HM stress led to a ubiquitous rather than a specific evolutionary response to heavy metals (please see our response to Reviewer #1 and the sections on pgs. 11 to 13).

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially toned down our conclusion in this section by modifying the end of the section on pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      There are two phenotypic changes clearly connected with the genetic mechanisms involved in the parviglumis to maize transition: plant height and the number of seminal roots (not nodal roots). These changes have been now emphasized in the Abstract and the description of the results.

      Regarding the possibility that HM stress represented a confounding factor in the selection of maize rather than a driving factor, we expanded the genomic analysis of genetic diversity well beyond the three genes under initial study, to cover an 11.47 Mb segment between ZmSKUs5 and ZmHMA1. We compared nucleotide variability using 100 bp bins covering loci composed of the coding sequence itself plus two 30 kb segments upstream and downstream of it, for 173 genes present within the genomic region between ZmSKUs5 and ZmHMA1 (Figure S1 and Supplementary File 6). The full analysis is presented in a new section on pgs. 11 and 12. We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. Four out of five subregions contain more than one HM or oxidative stress response gene within loci showing signatures of positive selection. Although multiple factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr. 5, large-scale transcriptomic data from independent experiments on the response of maize roots to HM stress allowed the identification of 49 additional HM response genes within loci showing positive selection across the genome, a proportion (43.3%) far greater than the proportion of loci containing genes responding to other types of abiotic stress not related to HMs (28.6%). These results are described in detail on pgs. 12 and 13 (Figure S3 and Supplementary File 7). They provide strong evidence in favor of HM stress, rather than another factor, driving positive selection.
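      The binned comparison of nucleotide variability described above can be illustrated with a minimal sketch. It assumes equal-length aligned sequences with no gaps or missing data, and estimates per-site nucleotide diversity (π, the mean number of pairwise differences divided by the number of sites) in non-overlapping 100 bp bins. The function names and toy haplotypes are hypothetical; this is not the authors' pipeline and uses none of their data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity (pi) for equal-length aligned sequences.

    pi = (mean number of pairwise differences) / (number of sites).
    Hypothetical helper: assumes no gaps or missing data.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences
    total_diffs = sum(sum(1 for x, y in zip(a, b) if x != y) for a, b in pairs)
    return total_diffs / len(pairs) / n_sites


def pi_per_bin(seqs, bin_size=100):
    """Split the alignment into non-overlapping bins and return pi for each bin."""
    length = len(seqs[0])
    return [
        nucleotide_diversity([s[start:start + bin_size] for s in seqs])
        for start in range(0, length, bin_size)
    ]


# Toy example: three 200 bp haplotypes with one variable site in the second bin
haplotypes = ["A" * 200, "A" * 100 + "C" + "A" * 99, "A" * 200]
print(pi_per_bin(haplotypes))
```

      Comparing such per-bin profiles between maize and teosinte parviglumis is one common way to look for reduced diversity consistent with selection; the authors' exact criteria for calling positive selection are not reproduced here.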

      We now provide precise and pertinent paleoenvironmental data on the potential influence of heavy metals in the field. In sections on pgs. 17 to 20 we review paleoenvironmental studies revealing periods of climatic instability in the presumed region of maize emergence during the early Holocene, and data indicating that the date and region where maize emerged converge with the dates and locations of several volcanic eruptions that occurred during the early and middle Holocene in that same region. Please see our responses to Reviewer #1 for details.

      We agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially toned down our conclusion in this section by modifying the end of the section on pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg. 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that the formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot rule out the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression of ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the dataset generated provides an interesting foundation for hypothesis testing on HM stress and domestication, the current data do not sufficiently support the conclusions of the manuscript.

      (1) The description of maize and teosinte architecture under HM stress is well presented.

      However, traits like shoot height, leaf size reduction, and biomass loss also occur under other environmental stresses such as drought and salinity. Additional evidence beyond shoot and root architecture would help validate the link between tb1 expression and specific ZmHMA genes under HM stress, or whether it reflects a more generalized stress response.

      We have already addressed this point in detail in the public response to Reviewer #1.

      (2) The nucleotide variability analysis is interesting, but I would have liked to see additional information to clarify the choice of the data selection and the strength of the conclusions with human selection.

      We have already addressed this point in detail in the public response to Reviewer #1.

      a) The choice of Tripsacum dactyloides as the outgroup to determine nucleotide variability seems to be distant, and I wonder whether other combinations with a closer outgroup or multiple outgroups were tried to provide a more accurate context.

      Nucleotide variability in Tripsacum dactyloides is used to graphically illustrate an external reference, not as an outgroup, in the extended analysis of genetic diversity at the locus and genomic level. We did not use Tripsacum dactyloides as an outgroup in our statistical analysis. We could indeed have used a closer teosinte subspecies as an outgroup, but at this stage no data indicate whether environmentally related selective pressures could have affected genetic diversity in other teosintes. This possibility is currently being investigated.

      b) Evolutionary differences not related to human influence could affect the results. The phrase "order of magnitude difference in π values" needs statistical validation (e.g., confidence intervals, p-values).

      We agree and have eliminated the sentence, as it is no longer relevant in light of the detailed genomic analysis of genetic diversity presented in Supplementary File 6.

      c) The comparison with ZmGLB1, a neutral control locus, suggests that domestication-related changes in nucleotide variability are specific to the three candidate genes. However, the concept of neutrality is complex, and while ZmGLB1 may be considered neutral in this case, the argument does not address the possibility of other factors, such as linked selection, that could influence variability in these genes. Referencing Hufford et al. is insufficient and would require a deeper argument.

      We also agree with this comment. We think that the influence and consequences of linked selection are now well documented for the 11.46 Mb region analyzed on chr. 5 (main text, pgs. 11 and 12, and Supplementary File 6).

      (3) The statement: "Our evidence indicates that HM stress revealed a teosinte parviglumis environmental plasticity that is directly related to the function of specific HM response genes that were affected by domestication through human selection" is not supported by the presented data. The rationale for the specific Cd/Cu dosage used is unclear. A dose-response gradient would better demonstrate the nature and strength of the plastic response.

      Previous reports support the rationale for the specific HM dosage used in this study; Cu/Cd dose-response gradients have been conducted in maize (AbdElgawad et al. 2020; Atta et al., 202), but since no such studies have been conducted in teosinte, we reasoned that it was important to apply the same treatment to both subspecies. We have now emphasized this rationale by adding the following on pg XX: “Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023)”.

      We agree that the statement raised by the reviewer needed revision in light of our results. We revised the statement to accurately reflect our current evidence as follows: “Our results reveal a teosinte parviglumis environmental plasticity that is likely related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition.”

      (4) In maize, TEs are known to influence gene expression under abiotic stress, including for tb1 (PMID: 25569788). Since the authors appear to draw a causative conclusion between ZmHMA1, TB1, and HM stress, I would have liked to see a whole-transcriptome analysis, rather than a curation of two genes, to determine whether other factors, such as TEs, could lead to similar outcomes.

      We agree that this is definitely a possibility that we have not investigated at this stage. However, we added a paragraph to reflect this pertinent suggestion:

      “Previous studies have demonstrated that transposable elements (TEs) contribute to activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of some specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      (5) I would suggest that the authors carefully review the tables, figures, and the corresponding legends. For example :

      a) Table 2 is called before Table 1, I would therefore suggest changing the numbering to reflect the paragraph order.

      Thank you for your help; we have changed the order of the Tables in the new version.

      b) In Table 2, it is not clear whether the P value applies to the mean difference between WT and the mutant zmhma1, either in the presence or the absence of heavy metals. In addition, the authors need to use the P-value to estimate the differences between WT in the absence vs presence of HM, and WT in the absence of HM versus the mutant in the absence of HM (idem for presence).

      We have addressed this issue in detail and added P-values and specific pairwise comparisons to that Table (now Table 1). Data are presented as mean ± standard deviation and were tested by a paired Student’s t-test. When the effects were significant according to the t-test, the treatments were compared with the Welch two-sample t-test at P < 0.05.
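      For readers less familiar with the Welch two-sample t-test mentioned above, the following minimal sketch computes its statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The input values are hypothetical and not taken from Table 1; in practice a library routine such as `scipy.stats.ttest_ind(x, y, equal_var=False)` also returns the two-sided P-value.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's unequal-variance two-sample t statistic and degrees of freedom."""
    nx, ny = len(x), len(y)
    # Squared standard errors of each group mean (sample variance uses n - 1)
    se2x = variance(x) / nx
    se2y = variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(se2x + se2y)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2x + se2y) ** 2 / (se2x ** 2 / (nx - 1) + se2y ** 2 / (ny - 1))
    return t, df


# Hypothetical measurements for two groups (e.g. without and with HMs)
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.1f}")
```

      Unlike Student's pooled-variance test, Welch's version does not assume equal variances in the two groups, which is why the degrees of freedom are estimated rather than fixed at nx + ny - 2.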

      c) Table 1 and Table 2: Indicate what type of statistical test was used and the number of plants used for each experiment (n). Also, I recommend the use of scientific notation for the P-values.

      The statistical tests have now been indicated, scientific notation has been added to the P-values; the number of plants and biological replicates are indicated in the Methods section.

      d) Lines 202 and 204: I assume Table 1 should be called instead of Table 2.

      This error has been corrected.

      e) General: In the text, when significance is highlighted along with measurements, the p-value needs to be added.

      We have added the P-value along the measurement for all significant differences.

      f) In the text, it is also mentioned that "the expression of ZMHMA1 was significantly increased in the presence of HMs (Figure 3c)". We are looking here at an RT-PCR, which is qualitative and without a robust quantitative comparison and statistics, I cannot conclude this assessment based on the presented evidence. No statistical measure is indicated here.

      Panel 3c is not an end-point RT-PCR but a real-time qPCR, showing relative fold change normalized to actin, with three technical replicates for each of three biological replicates. We have added error bars (SD) and P-values represented by asterisks (calculated with Student's t statistic) to support significant differences (P < 0.05 and P < 0.01). ZmHMA1 expression was significantly increased in the presence of HMs only in teosinte; there was no significant difference in maize.
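      The response above does not spell out how relative fold change was computed. A widely used option for actin-normalized qPCR data is the 2^-ΔΔCt method, sketched below under the assumption of roughly 100% amplification efficiency for both the target and the reference gene. The Ct values and the function name are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def fold_change_ddct(target_treated, ref_treated, target_control, ref_control):
    """Relative expression (treated vs control) by the 2^-ddCt method.

    Each argument is a list of Ct values, e.g. the technical triplicate of one
    biological replicate. Assumes ~100% PCR efficiency for target and reference.
    """
    # Normalize the target gene to the reference gene (e.g. actin) in each condition
    d_ct_treated = mean(target_treated) - mean(ref_treated)
    d_ct_control = mean(target_control) - mean(ref_control)
    # Compare the normalized values between conditions
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier with HMs
fc = fold_change_ddct([24.0, 24.1, 23.9], [18.0, 18.0, 18.0],
                      [26.0, 26.1, 25.9], [18.0, 18.0, 18.0])
print(round(fc, 2))
```

      In a full analysis, fold changes would typically be computed per biological replicate, and the SD and t-test applied across those replicates.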

      g) Figure 3 should at least have the gene name in the figure to quickly understand the figure panel. The key conserved domains should also be identified.

      We agree and apologize for the omission. The gene names have been added adjacent to the structures.

      h) Sentence at lines 459-460 lacks words and punctuation.

      This unfortunate error has also been corrected.

      i) Figure S1, the reference Lemmon and Doebley, 2024 should be Lemmon and Doebley, 2014 to harmonize with the text.

      The correct year is 2014. We have corrected this error.

      Reviewer #2 (Recommendations for the authors):

      (1) The narrative should be clearer, starting with a clearer hypothesis that is later sustained or not in the results, and then discussed in the idea and speculation section.

      Thank you for the comment. We have clarified the hypothesis; it is included in the abstract and the last paragraph of the Introduction. We hope it is now clear that the evidence presented supports our hypothesis.

      (2) Focus more on traits that are relevant, for example, nodal and seminal roots.

      We modified the text to emphasize three relevant traits. In the case of teosinte under HM stress: absence of tillering and an increase in the number of female inflorescences. In the case of the zmhma1 mutant under HM stress: differences in the number of nodal roots and in height.

      (3) RNA-seq in Cu/Cd stress could make the work much more useful and complete.

      As previously mentioned, we have incorporated a large-scale transcriptional analysis based on six statistically validated transcriptomes (Table S5). Please see the sections on pgs. 11 to 13 for details.

    1. Lady Susan to Mrs. Johnson. Churchhill. Never, my dearest Alicia, was I so provoked in my life as by a letter this morning from Miss Summers. That horrid girl of mine has been trying to run away. I had not a notion of her being such a little devil before, she seemed to have all the Vernon milkiness; but on receiving the letter in which I declared my intention about Sir James, she actually attempted to elope; at least, I cannot otherwise account for her doing it. She meant, I suppose, to go to the Clarkes in Staffordshire, for she has no other acquaintances. But she shall be punished, she shall have him. I have sent Charles to town to make matters up if he can, for I do not by any means want her here. If Miss Summers will not keep her, you must find me out another school, unless we can get her married immediately. Miss S. writes word that she could not get the young lady to assign any cause for her extraordinary conduct, which confirms me in my own previous explanation of it. Frederica is too shy, I think, and too much in awe of me to tell tales, but if the mildness of her uncle should get anything out of her, I am not afraid. I trust I shall be able to make my story as good as hers. If I am vain of anything, it is of my eloquence. Consideration and esteem as surely follow command of language as admiration waits on beauty, and here I have opportunity enough for the exercise of my talent, as the chief of my time is spent in conversation. Reginald is never easy unless we are by ourselves, and when the weather is tolerable, we pace the shrubbery for hours together. I like him on the whole very well; he is clever and has a good deal to say, but he is sometimes impertinent and troublesome. There is a sort of ridiculous delicacy about him which requires the fullest explanation of whatever he may have heard to my disadvantage, and is never satisfied till he thinks he has ascertained the beginning and end of everything. 
This is one sort of love, but I confess it does not particularly recommend itself to me. I infinitely prefer the tender and liberal spirit of Mainwaring, which, impressed with the deepest conviction of my merit, is satisfied that whatever I do must be right; and look with a degree of contempt on the inquisitive and doubtful fancies of that heart which seems always debating on the reasonableness of its emotions. Mainwaring is indeed, beyond all compare, superior to Reginald—superior in everything but the power of being with me! Poor fellow! he is much distracted by jealousy, which I am not sorry for, as I know no better support of love. He has been teazing me to allow of his coming into this country, and lodging somewhere near incog.; but I forbade everything of the kind. Those women are inexcusable who forget what is due to themselves, and the opinion of the world. Yours ever, S. VERNON.

      There is a lot to unpack in this passage. Lady Susan has received word from Miss Summers about Frederica, who has tried to run away, presumably to the Clarkes in Staffordshire, after hearing of her mother's intention to marry her to Sir James. She explains to her dear friend Alicia (Mrs. Johnson) that she feels Frederica is too scared of her to tell her anything, so she has sent her uncle to 'truly scare' her in the hope she'll start behaving correctly. She also says she is sure that Frederica will speak "lies" of her to her uncle, so Lady Susan will have to find a way to make Frederica's stories sound like misunderstandings and to cast herself as the victim. After that first part of the passage, she switches to telling her friend about the new romantic aspects of her life. I believe she's making Reginald sound like a possibly interesting affair, but she would never plan to marry him, as he isn't serious and is far too cocky. She then goes back to her yearning for Mr. Mainwaring, the married man, and she sounds semi-delusional addressing his jealousy. Finally, she mentions how she refuses to let him come into the country and lodge nearby incognito ("incog."), since women who forget what is due to themselves and to the opinion of the world are inexcusable.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      We thank the Reviewers for their positive assessment and recognition of the paper achievements. The insightful comments will strengthen the data and manuscript.

      Referee #1

      Minor comments

      1. Fig 1B - add arrows showing mRNAs being translated or not (the latter, mentioned in line 113, is not easy to see).

      We have magnified the inset of the colocalisation in the right column; we added arrows and arrowheads to differentiate colocalised and non-colocalised bcd with translating SunTag.

      2. Fig 2A - add a sentence explaining why 1,6HD, 2,5HD and NaCl disrupt P bodies.

      We have added the information on the use of 1,6HD, 2,5HD, and NaCl to disrupt P-bodies as below. Revised line 158: “To further show that bcd storage in P bodies is required for translational repression, we treated mature eggs with chemicals known to disrupt RNP granule integrity (31, 37, 69-72). Previous work has shown that the physical properties of P bodies in mature Drosophila oocytes can be shifted from an arrested to a more liquid-like state by addition of the aliphatic alcohol hexanediol (HD) (Sankaranarayanan et al., 2021; Ribbeck and Görlich, 2002; Kroschwald et al., 2017). While 1,6 HD has been widely used to probe the physical state of phase-separated condensates both in vivo and in vitro (Alberti et al., 2019; McSwiggen et al., 2019; Gao et al., 2022), in some cells it appears to have unwanted cellular consequences (Ulianov et al., 2021). These include potentially lethal effects that may indirectly affect the ability of condensates to form (Kroschwald et al., 2017) and wider cellular implications thought to involve altered kinase activity (Düster et al., 2021). While we did not observe any noticeable cellular issues in mature Drosophila oocytes with 1,6 HD, we also used 2,5 HD, known to be less problematic in most tissues (Ulianov et al., 2021), and the monovalent salt sodium chloride (NaCl), which changes electrostatic interactions (Sankaranarayanan et al., 2021).”

      Fig 4C - explain in the legend what the white lines drawn over the image represent. And why is there such an obvious distinction in the staining where suddenly the DAPI is much more evident (is the image from tile scans)?

      Figure 4C is a tile-scan image of an n.c. 10 embryo; the white lines divide the image into four quadrants. We used this image to quantify the extent of bcd (magenta) colocalisation with SunTag (green) in the anterior and posterior domains of the embryo, shown in the bar graph in panel C’. The abrupt change in DAPI signal is due to a formatting error in the image, which we will correct in the revised version. We will also describe the white lines in the legend. Finally, in response to further reviewer comments, these data will be moved to the supplementary information in the revised version.

      • Line 215 - 'We did not see any significant differences in the translation of bcd based on their position, however, there appears an enhanced translation of bcd localised basally to the nuclei (Figure S5).' Since the difference is not significant, I do not think the authors should conclude that translation is enhanced basally.

      We agree with the reviewer. In this preliminary revision, we have changed this statement to: “We did not see any differences in the translation of bcd based on its position relative to the nuclei (Figure S5)” (revised lines 238-239).

      Line 218: 'The interphase nuclei and their subsequent mitotic divisions appeared to displace bcd towards the apical surface (Figure S6B).' Greater explanation is needed in the legend to Fig S6B to support this statement as the data just seem to show a nuclear division - I would have thought an apical-basal view is needed to conclude this.

      We have rearranged this figure (now Figure S8) to show more clearly the apical-basal view of the blastoderm nuclei and the displacement of bcd from the surface of the blastoderm.

      New Figure S8: n.c. 8, before cortical migration; n.c. 12 and 14, after cortical migration; mitotic stages of n.c. 9-10. The cortical interphase nuclei at n.c. 12 and 14 displace bcd: the nuclear area (DAPI, cyan), indicated by blue stars, contains no bcd particles (magenta). The mitotic nuclei (yellow arrowheads, yellow stars) displace bcd along the plane of nuclear division (double-headed yellow arrows).

      Fig 5B - the authors compare Bcd protein distribution across developmental time. However, at the early time points cytoplasmic Bcd is measured (presumably as it does not appear nuclear until nc8 onwards), and this distribution is compared with nuclear Bcd intensities from nc9 onwards. Is most/all of the Bcd protein nuclear localised from nc9, to validate the nuclear quantitation? Does the distribution look the same if total Bcd protein is measured per volume rather than just the nuclear signal? Are the authors assuming a constant fast rate of nuclear import?

      From n.c. 8 onwards, the Bcd signal builds up in interphase nuclei, and the nuclear intensity becomes very high compared with cytoplasmic Bcd. However, we do see significant Bcd signal in the cytoplasm (i.e., above background). In earlier work, the gradients of nuclear Bcd and of a nuclear-import mutant Bcd overlapped closely (Figure 1B, Grimm et al., 2010), suggesting that the nuclear Bcd gradient reflects the corresponding gradient of cytoplasmic Bcd. Further, nuclear import of Bcd occurs rapidly after photobleaching (Gregor et al., 2007). Based on these observations and our own measurements, before n.c. 9 the cytoplasmic gradient is likely a good approximation of the overall gradient shape, whereas after n.c. 9 the Bcd signal is largely nuclear. Further, the overall profile does not depend on the nuclear volume.

      • Line 259 - 'We then asked if considering the spatiotemporal pattern of bcd translation' - the authors should clarify what new information was included in the model. Similarly in line 286, 'By including more realistic bcd mRNA translation' - what does this actually mean? In line 346, 'We see that the original SDD model .... was too simple.' It would be nice to compare the outputs from the original vs modified SDD models to support the statement that the original model was too simple.

      We will improve the link between the results and the model. The key point is that the timing and location of Bcd production are represented more faithfully than in previous approximations. By including more realistic production domains, we can replicate the observed Bcd gradient within the SDD paradigm without resorting to more complex models.

      Fig S1A - clarify what the difference is between the 2 +HD panels shown.

      The two +HD panels at stage 14 show the two outcomes observed after adding HD: 70% of the embryos have no particles, and 30% show reduced numbers of particles. We will add this information to the figure legend.

      • Fig S2E - the graph axis label/legend says it is intensity/molecule. Since intensity/molecule is higher in the anterior for bcd RNAs, is this because there are clumps of mRNAs (in which case it's actually intensity/puncta)?

      The density of mRNA is very high at the anterior pole, so an imaged punctum may contain more than one bcd particle because of the limited optical resolution. We will therefore relabel the y-axis from average intensity per molecule to average intensity per punctum.
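      One common way to separate intensity per molecule from intensity per punctum in dense regions is to express each punctum's integrated intensity in units of a single-molecule intensity measured where puncta are sparse. The sketch below assumes that approach, a linear detector response, and comparable imaging conditions in both regions; the function name molecules_per_punctum is hypothetical and this is not the analysis used for Fig S2E.

```python
import numpy as np


def molecules_per_punctum(punctum_intensities, reference_intensities):
    """Estimate mRNA copies per punctum from integrated intensities.

    reference_intensities should come from puncta assumed to be single
    molecules (e.g. a sparse region); their median sets the unit intensity.
    """
    unit = float(np.median(np.asarray(reference_intensities, dtype=float)))
    ratio = np.asarray(punctum_intensities, dtype=float) / unit
    # Every detected punctum contains at least one molecule.
    return np.maximum(1, np.rint(ratio)).astype(int)


# Hypothetical intensities: sparse reference puncta and anterior puncta.
estimates = molecules_per_punctum([100, 205, 290, 40], [90, 100, 110])
```

      If the single-molecule reference is biased (for example, because sparse regions also contain occasional clumps), all estimates scale accordingly, so the reference region matters as much as the arithmetic.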


      • Fig S4 - I think this line is included in error: '(B) The line plots of bcd spread on the Dorsal vs. Ventral surfaces.'

      Yes, we will correct this in the revision.

      • In B, D, E - is the plot depth from the dorsal surface? I would have preferred to see actual mRNA numbers rather than normalised mRNAs. In Fig S4D moderate, from 10um onwards there are virtually no mRNA counts based on the normalised value, but what is the actual number? The equivalent % translated data in Fig S4E look noisy so I wonder if this is due to there being a tiny mRNA number. The same is true for Figs S4D, E 10um+ in the low region.

      Beyond 10 µm from the dorsal surface, bcdSun10 counts are very low, and they become negligible in the moderate and low domains. We will provide the actual mRNA counts in all these domains as a supplementary table in the revised version.

      General assessment: Strengths are: 1) the data are of high quality; 2) the study advances the field by directly visualising bcd mRNA translation during early Drosophila development; 3) the data showing re-localisation of bcd mRNAs to P bodies at nc14 provide new mechanistic insight into its degradation; 4) a new SDD model for Bcd gradient formation is presented. Limitations of the study are: 1) there was already strong evidence (but no direct demonstration) that bcd mRNA translation was associated with release from P bodies at egg activation; 2) it is not totally clear to me how exactly the modified SDD model differs from the original one, both in terms of the parameters included and the model output.

      This is the first direct demonstration that bcd mRNAs released as single mRNAs from P bodies are translated. We previously showed that P body disruption releases single bcd mRNAs from the condensates (31). Here, we comprehensively characterise the translation status of individual bcd mRNAs, from their release from P bodies at the end of oocyte maturation until the end of blastoderm formation.

      The underlying SDD model, with localised production, diffusion, and degradation, is unchanged (apart from spatially varying diffusion). However, the model as originally formulated did not fit all aspects of the data, especially the system dynamics. Here, we demonstrate that by including more accurate approximations of when and where Bcd is produced, we can explain the formation of the Bcd morphogen gradient without recourse to any further mechanism.
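      To make the three SDD ingredients concrete, the following is a minimal one-dimensional sketch, not the model used in the manuscript. It assumes a single anterior production domain, a constant diffusion coefficient (spatially varying diffusion is omitted), first-order degradation, and production that switches on at a chosen time, i.e. ∂c/∂t = D ∂²c/∂x² − k·c + β·s(x, t), where s is 1 inside the production domain while production is active and 0 otherwise. The function name simulate_sdd and every parameter value are hypothetical placeholders.

```python
import numpy as np


def simulate_sdd(length_um=500.0, n_points=251, D=1.0, k=1e-3,
                 source_um=100.0, beta=1.0, t_on_s=0.0, t_end_s=3600.0):
    """Minimal 1D synthesis-diffusion-degradation (SDD) sketch.

    Protein is produced at rate `beta` only where x < source_um and only
    once t >= t_on_s; it diffuses with a constant coefficient D (um^2/s)
    and is degraded at a first-order rate k (1/s). Both poles are
    zero-flux boundaries. All default values are illustrative, not fitted.
    """
    x = np.linspace(0.0, length_um, n_points)
    dx = x[1] - x[0]
    # Explicit Euler is stable for dt <= dx^2 / (2 D); stay safely below it.
    dt = 0.4 * dx**2 / D
    source = (x < source_um).astype(float)
    c = np.zeros(n_points)
    t = 0.0
    while t < t_end_s:
        # Edge padding copies the boundary value into a ghost cell,
        # which makes the gradient (flux) zero at both ends.
        padded = np.pad(c, 1, mode="edge")
        laplacian = (padded[2:] - 2.0 * c + padded[:-2]) / dx**2
        production = beta * source if t >= t_on_s else 0.0
        c = c + dt * (D * laplacian - k * c + production)
        t += dt
    return x, c


x, c = simulate_sdd()
```

      At steady state, the profile outside the production domain decays approximately as exp(−x/λ) with λ = √(D/k), about 32 µm for the placeholder values. Changing t_on_s, or replacing the constant beta with a function of time, is where "when" Bcd is produced would be encoded; source_um is where "where" it is produced would be encoded.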


      Referee #2

      1. Line 114: The authors claim to have validated the SunTag using a fluorescent reporter, but do not show any data. Ref 60 is a general reference to the SunTag, and not the Bcd results in this paper. Perhaps place their data into a supplemental figure or movie?

      To show the validation of our bcdSun32 line, we have composed a new Figure S1 showing translating bcdSun32 (magenta) colocalising with scFv-mSGFP2 (green). In the zoom (right panel), yellow arrowheads point to translating bcdSun32 (magenta) and red arrowheads point to untranslated bcdSun32. In addition, we show the validation of bcdSun32 by anti-GCN4 staining in main Figure 1B.

      Further, we have dedicated supplementary Figure S3 (previously Figure S2) to the validation of our bcdSun10 construct. Briefly, bcdSun10 is inserted into the attP40 site on chromosome 2. In a rescue experiment, bcdSun10 rescued the lethality of the homozygous bcdE1 null mutant. We then performed a colocalisation experiment using smFISH, which demonstrated that almost all bcd at the anterior pole is bcdSun10. We used specific fluorescent FISH probes against the 10xSunTag sequence (magenta, Figure S2A) and the bcd coding sequence (magenta, Figure S2A). In the colocalisation analysis, ~90% of the mRNAs were of the bcdSun10 type; the remaining 10% likely reflects noise (Figure S2B). We will make sure these points are clear in the revised manuscript.
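      A colocalised fraction such as the ~90% above is typically computed from detected spot centroids. The following is a hypothetical sketch of one simple criterion, not the pipeline used here: a reference spot (e.g. a bcd coding-sequence probe spot) counts as colocalised if any test spot (e.g. a 10xSunTag probe spot) lies within a distance threshold. The function name and the 0.3 µm threshold are placeholders; in practice the threshold should reflect optical resolution and chromatic shift.

```python
import numpy as np


def colocalised_fraction(ref_xyz, test_xyz, max_dist_um=0.3, chunk=1024):
    """Fraction of reference spots with a test spot within max_dist_um.

    ref_xyz, test_xyz: (N, 3) arrays of spot centroids in micrometres.
    Matching is not one-to-one: two reference spots may share a test spot.
    """
    ref = np.asarray(ref_xyz, dtype=float).reshape(-1, 3)
    test = np.asarray(test_xyz, dtype=float).reshape(-1, 3)
    if len(ref) == 0:
        return float("nan")
    if len(test) == 0:
        return 0.0
    nearest = np.empty(len(ref))
    # Process reference spots in chunks so the pairwise distance matrix
    # stays small even when there are thousands of spots.
    for start in range(0, len(ref), chunk):
        block = ref[start:start + chunk]
        d2 = ((block[:, None, :] - test[None, :, :]) ** 2).sum(axis=-1)
        nearest[start:start + chunk] = np.sqrt(d2.min(axis=1))
    return float(np.mean(nearest <= max_dist_um))
```

      For very dense anterior regions, a KD-tree (for example scipy.spatial.cKDTree) would be a faster way to find nearest neighbours; the brute-force version is used here only to keep the sketch dependency-light.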

      Line 128 and Fig. 1E: The claim that bcd becomes dispersed is difficult to verify by looking at the image. The language could also be more precise. What does it mean to lose tight association? Perhaps the authors could quantify the distribution, and summarize it by a length scale parameter? This is one of the main claims of the paper (cf. Line 23 of the abstract) but it is described vaguely and tersely here.

      We have changed the text from, “We also confirmed that bcd becomes dispersed, losing its tight association with the anterior cortex (Figure 1E) (31)” to, “We also confirmed that bcd is released from the anterior cortex at egg activation (Figure 1E) (31, 21).” (Revised line 131).

      The release of bcd mRNA at egg activation was first shown in 2008 (Ref 21, Figure 4D-E) and again in 2021 (Ref 31, Figure 7B and E). The main point of lines 127-128 is that “P bodies disassembled and bcd was no longer colocalised with P bodies”, and the novel aspect of line 23 is that translation was observed. The distribution of bcd mRNA after egg activation was not the focus of this section. We have improved the writing in the revision to make this clearer.

      Line 146, Fig. 1G: This is a really important figure in the paper, but it is confusing because it seems the authors use the word "translation," when they mean "presence of Bcd protein." In other places in the paper, the authors give the impression that "bcd translation" means translation in progress (assayed by the colocalization of GCN4 and bcd mRNA). However, in Fig. 1G, the focus is only on GCN4. Detecting Bcd protein only at the anterior does not mean that translation happens only at the anterior (e.g., diffusion or spatially-restricted degradation could be in play).

      In Figure 1G, we show only translated Bcd, detected by anti-GCN4 staining. We have changed line 146 from “Consistent with previous findings, we only observed bcd translation at the anterior of the activated egg and early embryo (Figure 1G-H) (3, 68)” to “Consistent with previous findings, we only observed the presence of Bcd protein at the anterior of the activated egg and early embryo (Figure 1G-H) (3, 68).” (Revised lines 151-153). Elsewhere in the manuscript, we will use “translating bcd” or “bcd in translation” where we show colocalisation of bcd with BcdSun10 or BcdSun32.

      We did not mean to claim that translation occurs only at the anterior pole. We show that bcd is very abundant at the anterior pole (in agreement with previous work) and that this is where most of the observed translation events take place. Indeed, we have also shown that bcd mRNAs at the posterior pole have the same BcdSun10 intensity per bcd punctum as those at the anterior (Figures 3B, 4C’ and S2E), but these are much fewer in number.

      It would also be helpful to show a plot with quantification of Bcd detection (or translation) on the y-axis and a continuous AP coordinate on the x-axis, instead of just two points (anterior and posterior poles, the latter of which is uninteresting because observing no Bcd at the posterior pole is expected).

      In Figure 1G,H, our aim was to test whether release from P bodies allowed for bcd mRNA to be translated. We used the presence of Bcd protein at the anterior domain of the oocytes to show this. The posterior pole was included as an internal control. To show the spatial distribution of bcd mRNA and its translation, we used early blastoderm (Figure 3, Figure S4).


      Another issue with Fig. 1G is that the A and P panels presumably have different brightness and contrast. If not, just from looking at the A and P panels, the conclusion would be that Bcd protein is diffuse (and abundant) in the posterior and concentrated into puncta in the anterior. The authors should either make the brightness and contrast consistent or state that the P panel had a much higher brightness than the A panel.

      We agree with this point. We have now added the following to the Figure 1 legend to clarify this observation: “G: Representative fixed 10 µm Z-stack images (from 10 samples) showing that BcdSun32 protein (anti-GCN4) is only present at the anterior of an in vitro activated egg or an early embryo 30 minutes post fertilization. BcdSun32 protein is not detected in these samples at the posterior pole (image contrast increased to highlight the lack of distinct particles at the posterior). BcdSun32 protein is also not detected at the anterior or posterior of a mature oocyte or of an in vitro activated egg incubated with NS8593 (image contrast increased to highlight the lack of distinct particles). Scale bar: 20 µm; zoom 2 µm.” (Revised line 623).

      • Line 176: This section is very confusing, because at this point the authors already addressed the spatial localization of translation in Fig. 1G,H (see my above comment). However, here it seems the authors have switched the definition of translation back to "translation in progress." Therefore, the confusion here could be eliminated by addressing the above point.

      In the revised version, we will use “Bcd protein” where it is detected by anti-GCN4 staining alone, and “translating bcd” or “bcd in translation” where we show colocalisation of bcd with anti-GCN4 (BcdSun10 or BcdSun32). We will change the corresponding text accordingly.

      Line 185: The sentence here is seemingly contradictory: "most...within 100 microns" implies that at least some are beyond 100 microns, while the sentence ends with "[none]...more than 100 microns." The language could perhaps be altered to be less vague/contradictory.

      We will clarify this in the revised version. A few particles are visible beyond 100 µm; the lower panel of Figure 3B shows some particles in the posterior domain. However, their number is negligible compared with bcd counts within 100 µm (Figure 3C). Nonetheless, the few bcd particles we observe there do appear to be translating (quantified in Figure 4C’ and Figure S2E).

      • Line 204: It would be really nice to have quantification of the translation events, such as curves of rate of translation as a function of a continuous AP coordinate, and a curve for each nc.

      In the revised version, we will provide quantification of translation events along the anterior-posterior axis. This will clarify how the presence and translation of bcd in the posterior domain change over time.

      Our colocalisation analysis is semi-automated. It combines automated counting of individual bcd particles with manual scoring of BcdSun10 protein colocalised with bcd particles (distinct spots above noise) (Figure S3D). The bcd particle counts ran into the thousands in each cyan square box (measuring 50 µm radius and ~20 µm deep from the dorsal surface). We selected three such boxes covering 150 µm continuously from the anterior pole along the A-P axis and 20 µm deep in the flattened embryo mounts along the D-V axis (Figure 3A-C, Figure S4). We also scanned the scarce particles in the posterior; however, bcd counts there are very low compared with the anterior. In Figure 4, we repeated the same approach to measure translation of bcd particles in embryos at different nuclear cycles.

      We have also shown continuous intensity measurements of bcd particles with the corresponding BcdSun10 gradient along the A-P axis at different nuclear cycles (Figure 5). Here, BcdSun10 intensity arises not only from “translating” bcd (BcdSun10 colocalised with bcd particles) but also from freely diffusing, already translated BcdSun10 (not colocalised with bcd particles). As requested by the reviewer, in the revised version we will add bcd counts and their translation status along the anterior-posterior axis for each nuclear cycle.
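      Given per-particle A-P positions and a translating/not-translating status for each bcd particle, a continuous A-P profile of the kind requested can be obtained by binning. The sketch below is hypothetical (function name, 25 µm bin width, and 500 µm embryo length are placeholders); it returns raw counts alongside the translating fraction so that bins with very few particles, such as the posterior, are easy to recognise.

```python
import numpy as np


def translation_profile(ap_um, translating, bin_um=25.0, length_um=500.0):
    """Bin particles along the A-P axis and report counts and fractions.

    ap_um: distance of each bcd particle from the anterior pole (um).
    translating: boolean flag per particle (True if scored as translating).
    Particles beyond the last bin edge are ignored.
    Returns bin centres, total counts, translating counts, and the
    translating fraction (NaN for empty bins).
    """
    ap = np.asarray(ap_um, dtype=float)
    flag = np.asarray(translating, dtype=bool)
    edges = np.arange(0.0, length_um + bin_um, bin_um)
    total, _ = np.histogram(ap, bins=edges)
    active, _ = np.histogram(ap[flag], bins=edges)
    # Empty bins would give 0/0; report them as NaN rather than 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        fraction = np.where(total > 0, active / total, np.nan)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, total, active, fraction
```

      Running this once per nuclear cycle (and per embryo, before pooling) gives one curve per n.c., and the counts make clear how much each fraction can be trusted.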

      In future work, we plan to generate MS2-tagged bcdSun10 to measure translation rates live across all nuclear cycles.


      Line 209 and Fig 4C: The authors use the terms "intensity of translation events" or "translation intensity" without clearly defining them. From the figure (specifically from the y-axis label), it looks like the authors are quantifying the intensity per molecule (which is not clearly the same thing as "translation intensity"), but it would be nice if that were stated explicitly.

      In the relevant Results section, we now use the phrase “the intensity of translation events” when describing the results in Figure 4C’.

      • In addition, the authors again quantify only two points. This is a continuously frustrating part of the manuscript, which applies to nearly all figures where the authors looked only at two points in space. At a typical sample size of N = 3, it seems well within time constraints to image at multiple points along the AP axis.

      In addition to the quantification at the anterior and posterior of the embryo shown in Figures 3 and 4, in the revised version we will show quantification of translation events at all positions from the anterior to the posterior. We will use three embryos for each nuclear cycle from n.c. 1 to 14.

      • Furthermore, it sounds like the authors are saying the "translation intensity" is the same in anterior and the posterior, which is counterintuitive. The expectation is that translation would be undetectable at the posterior end, in part because bcd mRNA would not be present. (Note that this expectation is even acknowledged by the authors on Line 185, which I comment on above, and also on Line 197). There should also be very low levels of Bcd protein (possibly undetectable) at the posterior pole. As such, the authors should explain how they think their claim of the same "translation intensities" in the anterior vs posterior fits into the bigger picture of what we know about Bcd and what they have already stated in the manuscript. They should also explain how they observed enough molecules to quantify at the posterior end. The authors should also disclose how many points are in each box in the boxplot. For example, the sample size is N = 3 embryos. In just three embryos, how many bcd/GCN4 colocalizations did the authors observe at the posterior end of the embryo?

      In Figure 3 (n.c. 4), we saw few bcd particles in the posterior. However, at n.c. 10 (Figure 4C’), the number of posterior bcd particles is higher than at earlier stages, and we quantified these in Figure 4C’. We will clarify this using the new quantification of translation along the A-P axis that we are currently undertaking for the revision.

      Finally, we will also provide the bcd particle counts and their colocalisation with anti-GCN4 as a supplementary table.

      • Line 215: The sentence that starts on this line seems self-contradictory: I cannot tell whether or not there is a difference in translation based on position.

      We have not observed any difference in the translation of bcd particles depending on the position along the Z-axis. We will edit this in our revised version.

      • Line 229: Long-ranged is a relative term. From the graph, one could state there is some spatial extent to the mRNA gradient, so it is unclear what the authors mean when they say it is not "long-ranged." Could the mRNA gradient be quantified, such as with a spatial length scale? This would provide more information for readers to make their own conclusions about whether it is long-ranged.

      We have quantified the bcd mRNA gradient for each n.c. (Figure 5B-C): absolute bcd intensities in the left panel of Figure 5B and normalised intensities in Figure 5C. The extent of the mRNA spread appears constant across all nuclear cycles, with a half-maximum length of ~75 µm. Our conclusion that the Bcd gradient is long-ranged is based on comparing the half-maximum lengths of bcd particles and BcdSun10 (Figure 5D).
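      As an illustration of how a half-maximum length can be read off a profile, the sketch below normalises a 1D profile to its maximum and finds the first position beyond the maximum where it falls to 0.5, interpolating linearly between samples. It assumes positions increase from the anterior pole; the function name is hypothetical and this is not necessarily the procedure used for Figure 5D.

```python
import numpy as np


def half_max_length(x_um, intensity):
    """Distance at which a profile, normalised to its maximum, first
    falls to 0.5 beyond the maximum. Returns NaN if it never does.

    x_um must increase from the anterior pole; linear interpolation is
    used between the two samples that bracket the half-maximum.
    """
    x = np.asarray(x_um, dtype=float)
    y = np.asarray(intensity, dtype=float)
    y = y / y.max()  # normalise to the profile maximum
    i_max = int(np.argmax(y))
    below = np.nonzero(y[i_max:] <= 0.5)[0]
    if below.size == 0:
        return float("nan")
    j = i_max + int(below[0])  # first sample at or below half maximum
    x0, x1, y0, y1 = x[j - 1], x[j], y[j - 1], y[j]
    return float(x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0))
```

      Because the profile is normalised first, the result does not depend on absolute intensity. For a purely exponential profile exp(−x/λ), this quantity equals λ·ln 2, which links it directly to the exponential decay length λ.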

      Line 230: When the authors claim the Bcd gradient is steeper earlier, a quantification of the spatial extent (exponential decay length scale) would be appropriate. Indeed, lambda as a function of time would be beneficial. It should also be placed in context of earlier papers that claim the spatial length scale is constant.

      We will show this effectively from the live movies of bcdSun10/nanos-scFv-sGFP2 in the revised version.
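      The decay length λ requested here is commonly estimated by fitting I(x) = A·exp(−x/λ) + background. The sketch below uses a simple log-linear least-squares fit after subtracting an assumed, known background; the function name and the fitting window arguments are hypothetical. A log-linear fit weights low-intensity tail points heavily, so a nonlinear fit (for example scipy.optimize.curve_fit) is a reasonable alternative.

```python
import numpy as np


def fit_decay_length(x_um, intensity, background=0.0, x_min=None, x_max=None):
    """Fit I(x) - background = A * exp(-x / lam) by least squares on log(I).

    Returns (lam, A). Only background-subtracted values > 0 inside
    [x_min, x_max] are used, since the logarithm is undefined otherwise.
    """
    x = np.asarray(x_um, dtype=float)
    y = np.asarray(intensity, dtype=float) - background
    keep = y > 0
    if x_min is not None:
        keep &= x >= x_min
    if x_max is not None:
        keep &= x <= x_max
    if keep.sum() < 2:
        raise ValueError("need at least two positive points to fit")
    slope, intercept = np.polyfit(x[keep], np.log(y[keep]), 1)
    return -1.0 / slope, float(np.exp(intercept))
```

      Applying the same fit, with the same window and background convention, to the profile of each nuclear cycle (or each movie frame) would give λ as a function of time, which is what makes comparison with earlier reports of a constant length scale possible.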

      • Lines 235-236: The two sentences that start on these two lines are vague and seemingly contradictory. The first sentence says there is a spatial shift, but the second sentence sounds like it is saying there is no spatial change. The language could be more precise to explain the conclusions.

      We agree with the reviewer and will edit this in the revision.

      Minor comments

        • Line 81: Probably meant "evolutionarily conserved"

      Yes, we have changed “P bodies are an evolutionarily cytoplasmic RNP granule” to “P bodies are an evolutionarily conserved cytoplasmic RNP granule.” (Revised lines 84-85).

      Figure 1 legend: part B says "from 15 samples" but also says N = 20. Which is it, or do these numbers refer to different things?

      We have edited this from, “early embryo (from 15 samples)” to, “early embryo (from 20 samples)”. (Revised line 602).

      • Line 217: migration of what?

      Edited to “cortical nuclear migration”.

      • Line 228: "early embryo" is vague. The authors should give specific time points or nuclear cycle numbers.

      Edited to “nuclear cycles 1-8”.

      • Line 301: Other locations in the paper say 75 microns or 100 microns.

      We will correct this; the correct value is 100 µm.

      • Fig. 5: all images should be oriented such that the dorsal midline is on the upper half of the embryo/image.

      We will flip the image to match.

      • Fig. 5B: There are light tan and/or light orange curves (behind the bold curves) that are not explained.

      These curves show the standard deviation. We will explain this in the legend.

      • Fig. 5C: the plot says "normalized" but nowhere do the authors describe what the curves are normalized to. There is also no explanation for what the broad areas of light color correspond to.

      The curves are normalised to the maximum bcd intensity. We will explain this in the legend.

      Significance

      The results, if upheld, are highly significant, as they are foundational measurements addressing a longstanding question of how morphogen gradients are formed, using Bcd (the foundational morphogen gradient) as a model. They also address fundamental questions in genetics and molecular biology: namely, control of mRNA distribution and translation.

      We thank Reviewer 2 for highlighting the importance of our work in the field. We are confident that the new set of quantifications we are currently working on will address the issues raised by Reviewer 2.

      Referee #3

        • It is not evident from the main results and methods text that the new SDD model incorporates the phenomenon reported in figure 4B. From my reading, the parameter beta accounts for the Bcd translation rate, which according to figure 7B(ii) effectively switches from off to on around fertilization and thereafter remains constant. Figure 4B shows that the fraction of bcd mRNA engaged in translation decreases beginning around NC12/13, and this is one of the more powerful results that comes from monitoring translation in addition to RNA localization/abundance/stability. My expectation based on figure 4B would be that parameter beta should decrease over time beginning around 90-100 minutes and approach zero by ~150 minutes. This rate could be fit to the experimental data that yields figure 4B. The modeling should be repeated while including this information.

      This is a good observation. Currently, the reduced rate of bcd translation is modelled as an increased rate of bcd mRNA degradation. Alternatively, the same reduction could arise from a direct change in the rate of translation. As stated already, the beta parameter is the least well characterised. In the revision, we will include a model in which beta changes but the mRNA degradation rate does not. We will improve the discussion to make this point clearer.
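      The referee's suggestion can be written as a time-dependent translation rate. The sketch below is one hypothetical form: β is zero before fertilization (t = 0), constant until about 95 min, then falls linearly to zero at 150 min. The function name and the breakpoints are placeholders taken from the referee's approximate description, not values fitted to the data underlying Figure 4B.

```python
import numpy as np


def beta_schedule(t_min, beta0=1.0, t_on=0.0, t_decline=95.0, t_zero=150.0):
    """Hypothetical time-dependent translation rate beta(t), t in minutes.

    Zero before t_on, constant at beta0 until t_decline, then falling
    linearly to zero at t_zero.
    """
    t = np.asarray(t_min, dtype=float)
    # Fraction of the full rate remaining: 1 before t_decline, 0 after t_zero.
    remaining = np.clip((t_zero - t) / (t_zero - t_decline), 0.0, 1.0)
    return np.where(t >= t_on, beta0 * remaining, 0.0)
```

      Because protein production in this kind of model scales with β multiplied by the amount of available mRNA, lowering β with a schedule like this and raising the mRNA degradation rate can both reduce production late in development; fitting both variants to the same data is one way to compare them.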
      1. The presentation of the SDD model should be expanded to address how well the characteristic decay length fits A) measured Bcd protein distributions, B) measured at different nuclear cycles. This would strengthen the claim that the new SDD model better captures gradient dynamics given the addition of translation and RNA distribution. These experimental data already exist as reported in Figure 5. In the current Figure 7, panels D and D' add little to the story and could be moved to a supplement if the authors want to include it (in any case, please fix the typo on the time axis of fig 7D' to read "hours"). The model per cell cycle and the comparison of experimental and modeled decay lengths could replace current D and D'.

      Originally, we limited the discussion of the SDD model to the core points. It is clear from all Reviewers that expanding this discussion is important. In the revision, we will refocus Figure 7 on the new insights the model provides. As outlined in the responses above, this paper reveals an important insight: the SDD model, with suitable modifications such as temporally restricted Bcd production, can explain all observed properties of Bcd gradient formation. Other mechanisms, such as bcd mRNA gradients, are not required.

      • The exposition of the manuscript would benefit significantly by including a section either in the introduction or the appropriate section of the results that defines the competing models for gradient formation. In the current version, these models are only cited, and the key details only come out late (e.g., lines 302 onward, in the Discussion). Nevertheless, some of the results are presented as if in dialog with these models, but it reads as a one-sided conversation. For instance: Figure 3. The undercurrent in this figure is the RNA-gradient model. In the context of this model, the results clearly show that translation of bcd is restricted to the anterior. Without this context, Figure 3 could read as a fairly unremarkable observation that translation occurs wherever there is mRNA. Restructuring the manuscript to explicitly name competing models and to address how experimental results support or detract from each competing model would greatly enhance the impact of the exposition.

      We thank the reviewer for this suggestion. We will describe the current models of Bcd gradient formation in the Introduction and will revise the narrative of the Results to address these models explicitly.

      (4A) Related to point 3: The entire results text surrounding Figure 2 should be revised to include more detail about A) what specific hypotheses are being tested; and B) to critically evaluate the limitations of the experimental approaches used to evaluate these hypotheses. Hexanediol and high salt conditions are not named explicitly in the text, but the text touts these as "chemicals" that "disrupt P-body integrity." This implies that the treatments are specific to P-bodies. Neither of these approaches are only disrupting P Body integrity. This does not invalidate this approach, but the manuscript needs to state what hypothesis HD and NaCl treatment addresses, and acknowledge the caveats of the approach (such as the non-specificity and the assumptions about the mechanism of action for HD).

      We have made the following edits to resolve this point. Revised line 158: “To further show that bcd storage in P bodies is required for translational repression, we treated mature eggs with chemicals known to disrupt RNP granule integrity (31, 37, 69-72). Previous work has shown that the physical properties of P bodies in mature Drosophila oocytes can be shifted from an arrested to a more liquid-like state by addition of the aliphatic alcohol hexanediol (HD) (Sankaranarayanan et al., 2021; Ribbeck and Görlich, 2002; Kroschwald et al., 2017). While 1,6-HD has been widely used to probe the physical state of phase-separated condensates both in vivo and in vitro (Alberti et al., 2019; McSwiggen et al., 2019; Gao et al., 2022), in some cells it appears to have unwanted cellular consequences (Ulianov et al., 2021). These include potentially lethal effects that may indirectly affect the ability of condensates to form (Kroschwald et al., 2017) and wider effects thought to alter kinase activity (Düster et al., 2021). While we did not observe any noticeable cellular issues in mature Drosophila oocytes treated with 1,6-HD, we also used 2,5-HD, which is known to be less problematic in most tissues (Ulianov et al., 2021), and the monovalent salt sodium chloride (NaCl), which changes electrostatic interactions (Sankaranarayanan et al., 2021).”

      (4B) Continuing the comment above: it is good that the authors checked that HD and NaCl treatment does not cause egg activation. But no one outside of the field of Drosophila egg activation knows what the 2-minute bleach test is and shouldn't have to delve into the literature to understand this sentence. Please explain in one sentence that "if eggs are activated, then x happens following a short exposure to bleach (citations). We exposed HD and NaCl treated eggs to bleach and observed... ."

      We have made the following edits to resolve this point. Revised line 174: “After treating mature eggs with these solutions, we observed BcdSun32 protein in the oocyte anterior (Figure 2A-B). One caveat to this experiment is that treating mature eggs with these chemicals could cause egg activation, which would in turn generate Bcd protein. To eliminate this possibility, we first screened for phenotypic egg activation markers, including swelling and a change in the chorion (73). We also applied the classic approach of bleaching eggs for two minutes, which lyses unactivated eggs (74). All chemically treated eggs failed this bleaching test, meaning they were not activated (74). While we are unable to rule out non-specific actions of these treatments, these experiments corroborate that storage in P bodies that adopt an arrested physical state is crucial to maintain bcd translational repression (31).”

      (4C) Continuing the comment above: The section of the results related to the endos mutation needs additional information. It is not apparent to the average reader how the endos mutation results in changes in RNP granules, nor what outcome of such an effect would be expected to "further test the model" set up by the HD and NaCl experiments. The average reader needs more hand-holding throughout this entire section (related to figure 2) to follow the exposition of the results.

      We have made the following edits to resolve this point. Edited line 185: “Finally, we used a genetic manipulation to change the physical state of P bodies in mature oocytes. Mutations in Drosophila Endosulfine (Endos), which is part of the conserved phosphoprotein α-endosulfine (ENSA) family (75), cause a liquid-like P body state after oocyte maturation, similar to that observed with chemical treatment (Figure 2C) (31). This temporal effect matches the known role of Endos as the master regulator of oocyte maturation (75, 76). endos mutant oocytes lost the colocalisation of bcd mRNA and P bodies, concurrent with P bodies becoming less viscous during oocyte maturation (Figure 2D, Figure S1). Particle size and position analysis showed that bcd mRNA prematurely exhibits an embryo distribution in these mutants (Figure 2E). Due to genetic and antibody constraints, we are unable to test for translation of bcd in the endos mutant. However, it follows that bcd observed in this diffuse distribution outside of P bodies would be translationally active (Figure 2E-F).”

      • (4D) Continuing the comment above: The average reader also needs a better explanation of what hypothesis is being tested in Figure 1 with the pharmacological inhibition of calcium.

      We have made the following edits to resolve this point. Revised line 138: “We next sought to maintain the relationship between bcd mRNA and P bodies through egg activation. This would act as a control to further test whether colocalisation of bcd to P bodies was necessary for its translational repression. Previous work has shown that a calcium wave is required at egg activation for further development (Kaneuchi et al., 2015; York-Anderson et al., 2019; Hu and Wolfner, 2019). Chemical treatment with NS8593 disrupts this calcium wave, while other phenotypic markers of egg activation are still observed (58). Using NS8593 to disrupt the calcium wave in the activated egg, we show that P bodies are retained during ex vivo egg activation (Figure 1E). In these treated eggs, bcd mRNA remains colocalised with the retained P bodies (Figure 1F). Based on these results and previous observations (31, 66), we hypothesised that the loss of colocalisation between bcd and P bodies correlates with bcd translation.”

      It is unclear why Bcd translation could not be measured in the endos mutant background, but it would be necessary to measure Bcd translation in the endos background. If genotypically it is not possible or inconvenient to invoke the suntag reporter in the endos background, would it not be sufficient to immunostain against Bcd itself? Different Bcd antisera have recently been reported and distributed by the Wieschaus and the Zeitlinger groups.

      We have recently received the Bcd antibody from the Zeitlinger group. This has not been shown to work for immunostaining. It remains unclear if it will be successful in this capacity, but we are currently testing it and will include this experiment in the revision if successful.

      Figure 4 overall is glorious, but there is a problem with panel C. What are the white lines? Why does the intensity for the green and magenta channel change abruptly in the middle of the embryo?

      These white lines divide the embryo into 4 compartments. We used this method to quantify the intensity of Bcd translation with respect to the bcd puncta. We will correct this image, which was affected by a formatting error.

      It is noted that neither the Methods section nor the supplement contains any mention of how the modeling was performed. How was parameter beta fit? At least a brief section should be added to the Methods describing how beta was fit (pending adjustments suggested in comment 1 above). A platinum-level addition would include a modeling supplement that reports the sensitivity of model outcomes to changes in parameters.

      We apologise for this omission and will include full methodological details in the revision.

      Minor Comments:

      • Line 28: "Source-Diffusion-Degradation" should be changed to "Synthesis-..."

      We will edit in the revised version.

      • Line 39: "blastocyst" should be "blastoderm stage embryo".

      We will edit in the revised version.

      • Line 81: "P bodies are an evolutionarily cytoplasmic RNP granule." Is "conserved" missing here?

      We will edit in the revised version.

      • Throughout the manuscript, there should be better reporting of the imaged genotypes and whether the suntag is being visualized by indirect immunostaining of fixed tissues or through an encoded nanobody-GFP fusion.

      We will explain in detail in the revised version.

      • Figure 1G: Why is the background staining so different across conditions? Is this a normalization artifact?

      We acknowledge this shortcoming. We have now added the following to the figure legend to clarify this observation. “G: Representative fixed 10 µm Z-stack images (from 10 samples) showing that BcdSun32 protein (anti-GCN4) is only present at the anterior of an in vitro activated egg or early embryo 30 minutes post-fertilization. BcdSun32 protein is not detected in these samples at the posterior pole (image contrast increased to highlight the lack of distinct particles at the posterior). BcdSun32 protein is also not detected at the anterior or posterior of a mature oocyte or an in vitro activated egg incubated with NS8593 (image contrast increased to highlight the lack of distinct particles). Scale bar: 20 µm; zoom 2 µm.” (Revised line 623).

      Figure 2 legend: what is +Sch in the x-axis labels of Figure 2B? The legend says that 2B is the quantification of the data in 2A, but there is no (presumed control) +Sch image in 2A.

      Thank you for this suggestion; we have added these data to Figure 2A.

      • Figure 5A largely repeats information presented in Figure 4A. Please consider moving it to a supplement. Also, please re-orient embryos to follow the convention that dorsal-most surfaces be presented on the top of the displayed images.

      Thank you for this suggestion. We will consider moving Figure 5A to the supplementary.

      • The lower-case roman numerals referred to in the text for Figure 7B are not included in the corresponding figure panel.

      We will edit in the revised version.

      • Figure 7C y-axis typo (concentration).

      We will edit in the revised version.

      • Line 222: "make a long-range functional gradient": more accurate to say, "but also marks mature, Bcd protein which resolves in the expected long-range gradient."

      We will edit in the revised version.

      • Methods: Please check that all buffers referred to as acronyms are both compositionally defined in the reagents table, and that full names are written out at the time of first mention in the presented order. For instance, Schneider's media is referred to a few times before defining the acronym about midway through the Methods section.

      We have added to Figure 2B: “Quantification of experiments shown in A. The number of oocytes that displayed Bcd protein at the anterior, as measured by the presence of BcdSun32 at the anterior of the oocyte but not the posterior. Schneider’s Insect Medium (+Sch) was used as a negative control. N = 30 oocytes for each treatment. Scale bar: 5 µm.” (Revised line 646).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is both necessary for late-born NB7-1 neurons and sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: how can cell identities defined by initial, transient developmental cues be maintained in the progeny cells? The underlying molecular mechanism, however, remains to be investigated. In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Thanks for the accurate summary and positive comments!

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage as a factor that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progeny. They further demonstrate that Fd4 is required for later-born NB7-1 progeny and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g., in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for all experiments.

      Thanks for the positive comments!

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      We quantified the percentage of Hb+ and Runt+ cells among Eve+ cells with sca-gal4, and the results are shown in Figure 4-figure supplement 1. We found that the proportion of early-born cells is slightly reduced but the proportion of later-born cells remains similar. Interestingly, we also found a subset of Eve+ cells with a mixed fate (Hb+Runt+), but the reason remains unclear.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      Because every hemisegment in fd4 single mutants is normal, we have added this result as text: “In fd4 mutants, we observe no change in the number of Eve+ neurons or Dbx+ neurons (n=40 hemisegments).”

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that the fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      We agree; thanks for raising this point. We have added the following text to the Discussion: “Interestingly, the fd4 fd5 mutant maintains expression of fd4:gal4, suggesting that the fd4/fd5 locus may have established a chromatin state that allows “permanent” expression in the absence of Vnd, En, and Fd4/Fd5 proteins.”

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating that some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4-overexpressing NB5-6 and NB7-3 progeny, as was done in Seroka and Doe (2019), would be an option.

      We agree it is interesting that the NB7-3 and NB5-6 drivers remain on following Fd4 misexpression. To explore this, we used sca-gal4 to overexpress Fd4 and observed that Lbe expression persisted while Eg was largely repressed (Author response image 1). The results show that Lbe and Eg respond differently to Fd4. A non-mutually exclusive possibility is that the continued expression of lbe-Gal4 UAS-GFP or eg-Gal4 UAS-GFP may be due to the lengthy perdurance of both Gal4 and GFP.

      Author response image 1.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4 overexpression is interesting. It might be worthwhile for the authors to discuss whether it is an Fd4 feature or an NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4 overexpression.

      In the NB7-3 lineages misexpressing Fd4, only 5 lineages generated Dbx+ cells (0.1±0.4, n=64 hemisegments), suggesting that the low penetrance of Dbx+ induction is an intrinsic feature of Fd4 rather than lineage context. We have added this information in the results section.

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly, so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      When we used the en-gal4 driver to express UAS-vnd in the fd4/fd5 mutant background, we found an average of 7.4±2.2 Eve+ cells per hemisegment (n=36), significantly higher than the fd4/fd5 mutant alone (3.9±0.8 cells, n=52, p=2.6×10⁻¹¹) (Figure 3J). In addition, 0.2±0.5 Eve+ cells were ectopic Hb+ (excluding U1/U2), indicating that Vnd-En integration is sufficient to generate both early-born and late-born Eve+ cells in the fd4/fd5 mutants. We have added the results to the text.

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

      Thanks for the suggestion. Because the results are identical to wild type, we don’t think it is necessary to provide additional images or analysis as supplemental information.

      Reviewer #2 (Public review):

      Via a detailed expression analysis, the authors find that Fd4 is selectively expressed in embryonic NB7-1 and newly born neurons within this lineage. They also undertake a comprehensive genetic analysis to provide evidence that fd4 is necessary and sufficient for the identity of NB7-1 progeny.

      Thanks for the accurate summary!

      The analysis is both careful and rigorous, and the findings are of interest to developmental neurobiologists interested in molecular mechanisms underlying the generation of neuronal diversity. Great care was taken to make the figures clear and accessible. This work takes great advantage of years of painstaking descriptive work that has mapped embryonic neuroblast lineages in Drosophila.

      Thanks for the positive comments!

      The argument that Fd4 is necessary for NB7-1 lineage identity is based on a Fd4/Fd5 double mutant. Loss of fd4 alone did not alter the number of NB7-1-derived Eve+ or Dbx+ neurons. The authors clearly demonstrate redundancy between fd4 and fd5, and the fact that the LOF analysis is based on a double mutant should be better woven through the text. The authors generated an Fd5 mutant. I assume that Fd5 single mutants do not display NB7-1 lineage defects, but this is not stated. The focus on Fd4 over Fd5 is based on its highly specific expression profile and the dramatic misexpression phenotypes. But the LOF analysis demonstrates redundancy, and the conclusions in the abstract and throughout the results should reflect the contribution of Fd5.

      We agree, and have added new text to clarify the single mutant phenotypes (there are none) and the double mutant phenotype (loss of NB7-1 molecular and morphological features). The following text is added to the manuscript: “Not surprisingly, we found that fd4 single mutants or fd5 single mutants had no phenotype (Eve+ neurons were all normal). Thus, to assess their roles, we generated a fd4 and fd5 double mutant. Because many Eve+ and Dbx+ cells are generated outside of the NB7-1 lineage, it was also essential to identify the Eve+ or Dbx+ cells within the NB7-1 lineage in wild type and fd4 mutant embryos. To achieve this, we replaced the open reading frame of fd4 with gal4 (called fd4-gal4) (see Methods); this stock simultaneously knocked out both fd4 and fd5 (called fd4/fd5 mutant hereafter) while specifically labeling the NB7-1 lineage. For the remainder of this paper we use the fd4/fd5 double mutant to assay for loss of function phenotypes.”

      It is notable that Fd4 overexpression can rewire motor circuits. This analysis adds another dimension to the changes in transcription factor expression and, importantly, demonstrates functional consequences. Could the authors test whether U4 and U5 motor axon targeting changes in the fd4/fd5 double mutant? To strengthen claims regarding the importance of fd4/fd5 for lineage identity, it would help to address terminal features of U motorneuron identity in the LOF condition.

      Thanks for raising this important point. We examined the axon targeting on body wall muscles in both wild type and in fd4/fd5 mutant background and added the results in Figure 3-figure supplement 2. We found that the axon targeting in the late-born neuron region (LL1) is significantly reduced, suggesting that the loss of late-born neurons in fd4/fd5 mutant leads to the absence of innervation of corresponding muscle targets.

      Reviewer #3 (Public review):

      The goal of the work is to establish the linkage between the spatial transcription factors (STFs) that function transiently to establish the identities of the individual NBs and the terminal selector genes (typically homeodomain genes) that appear in the newborn postmitotic neurons. How is the identity of the NB maintained and carried forward after the spatial genes have faded away? Focusing on a single neuroblast (NB 7-1), the authors present evidence that the forkhead transcription factor, fd4, provides a bridge linking the transient spatial cues that initially specified neuroblast identity with the terminal selector genes that establish and maintain the identity of the stem cell's progeny.

      Thanks for the positive comments!

      The study is systematic, concise, and takes full advantage of 40+ years of work on the molecular players that establish neuronal identities in the Drosophila CNS. In the embryonic VNC, fd4 is expressed only in the NB 7-1 and its lineage. They show that Fd4 appears in the NB while the latter is still expressing the Spatial Transcription Factors and continues after the expression of the latter fades out. Fd4 is maintained through the early life of the neuronal progeny but then declines as the neurons turn on their terminal selector genes. Hence, fd4 expression is compatible with it being a bridging factor between the two sets of genes.

      Thanks for the accurate summary!

      Experimental support for the "bridging" role of Fd4 comes from a set of loss-of-function and gain-of-function manipulations. The loss of function of Fd4, and the partially redundant gene Fd5, from lineage 7-1 does not affect the size of the lineage, but terminal markers of late-born neuronal phenotypes, like Eve and Dbx, are reduced or missing. By contrast, ectopic expression of fd4, but not fd5, results in ectopic expression of the terminal markers eve and Dbx throughout diverse VNC lineages.

      Thanks for the accurate summary!

      A detailed test of fd4 misexpression was then carried out using lineages 7-3 and 5-6, two well-characterized lineages in Drosophila. Lineage 7-3 is much smaller than 7-1 and continues to be so when subjected to fd4 misexpression. However, under the influence of ectopic Fd4 expression, the lineage 7-3 neurons lost their expected serotonin and corazonin expression and showed Eve expression as well as motoneuron phenotypes that partially mimic the U motoneurons of lineage 7-1.

      Thanks for the positive comments!

      Ectopic expression of Fd4 also produced changes in the 5-6 lineage. Expression of apterous, a feature of lineage 5-6, was suppressed, and expression of the 7-1 marker, Eve, was evident. Dbx expression was also evident in the transformed 5-6 lineages, but extremely restricted as compared to a normal 7-1 lineage. Considering the partial redundancy of fd4 and fd5, it would have been interesting to express both genes in the 5-6 lineage. The anatomical changes that are exhibited by motoneurons in response to Fd4 expression confirm that these cells do, indeed, show a shift in their cellular identity.

      We appreciate the positive comments. We agree double misexpression of Fd4 and Fd5 might give a stronger phenotype (as the reviewer says) but the lack of this experiment does not change the conclusions that Fd4 can promote NB7-1 molecular and morphological aspects at the expense of NB5-6 molecular markers.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The title of Figure 4 may be intended to include the term "Widespread", not "Wild spread". (Though the expansion of the Eve and Dbx with Fd4 is quite remarkable…).

      Done!

      Reviewer #3 (Recommendations for the authors):

      (1) Line 138. Is part of the sentence missing? Did the authors mean to say "that fd5 is coexpressed with fd4 in NB7-1 and its .....".

      Done!

      (2) ln 237: In trying to explain the "U-like" phenotype of the transformed motoneurons in lineage 7-3, the authors speculate that "perhaps their late birth did not give them time to extend to the most distant dorsal muscles ". It is very difficult to convince a motoneuron to stop growing in the absence of a target! An alternate possibility is that since there is only one or two U neurons made instead of the normal five, the growing motoneuron has enough information to direct them to the dorsal domain, but they lack the specification that allows them to recognize a specific muscle target.

      We agree there are additional possibilities, and have updated the text to say: “We observed that these transformed neurons did not innervate the dorsal muscles; perhaps their late birth did not give them time to extend to the most distant dorsal muscles, or they were incompletely specified.”

      (3) In the References, I think that the Anderson et al. reference should also include "BioRxiv" before the DOI.

      Done!

      (4) Figure 6A for wild-type 7-3 lineage. Corazonin appears to be expressed in EW2 as well as EW3. This should be explained.

      We agree it looks that way, due to the 3D rotation used; we have replaced it with a more representative image. Note that our quantification always shows a single Cor+ neuron per hemisegment.

      (5) Figure 7: Issues of terminology. The designation of "longitudinal" for muscles is traditionally in reference to the body axis, such as the Dorsal Longitudinal Muscles (DLM) of the adult thorax. The "longitudinal" muscles in the figure are really "transverse" muscles. I also suggest using "axon" or "neurites" rather than "filament". For the middle and bottom parts of E and F, are these lateral and ventral views? They should be designated as such.

      Thanks, we agree and have made the changes, using Axon instead of Filament, and labeling the views (lateral and ventro-lateral).